Machine Learning (ML) is being applied successfully across many industries and by all kinds of organizations, from start-ups to large corporations. However, many encounter unexpected problems when trying to apply ML to their use case. In this post, we are going to see what it takes to develop a successful ML product, and how Machine Learning Operations (MLOps) can help.
There is a quote that circulates widely in ML circles, and it goes loosely like this: “More than [scary number] percent of ML projects never reach production”. In other words, an organization has invested money and time developing a proof-of-concept ML application, but failed to exploit it in a real product or application. While many estimate that scary number to be as high as 80%, a widely cited study by Gartner puts it at around 50%.
So what causes so many ML projects to fail? Several factors are involved but - contrary to a common misconception - a lack of sophisticated modeling techniques is usually not one of them. These days it is relatively easy for data professionals to develop ML models using well-known techniques, for which training material and free, high-quality open-source tools are readily available. In many cases (though not all), with a bit of coding and a few months of practice, anyone with a computer can learn to train basic ML models that can, in principle, provide real value. There are even several sophisticated AutoML tools that require no in-depth ML knowledge at all.
The barrier to entry, in other words, is not very high. This is a monumental achievement for which the ML community and the open-source movement deserve a lot of credit.
What, then, is the issue? Training an ML model is only a small part of what it takes to build a real ML product or application - one that people can actually use to accomplish something.
Several objectives determine whether an ML project succeeds: the model must make it to production, it must keep performing well once deployed, and the development process must remain efficient, transparent, and reproducible over the lifespan of the project.
Machine Learning Operations (MLOps) is an ensemble of techniques, tools, and best practices that help achieve the objectives delineated in the previous section. With MLOps you can lay out a development process for ML that is efficient, frictionless, transparent, and reproducible. MLOps cannot, of course, guarantee the success of an ML project - only you can - but it is an essential tool that reduces effort, shortens development time, and lowers costs across the lifespan of the project, well beyond the first deployment.
In many ways, MLOps is to machine learning what DevOps is to software development. However, there are important differences. While DevOps deals only with software and its artifacts, MLOps must deal with three components: the software used to train the model (version control, dependencies, containerization…), the data (data version control, feature management…), and the model itself (experiment tracking, model repository, deployment…).
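To make the three components concrete, here is a minimal, hypothetical sketch (plain Python, not a real MLOps tool) of what tracking a training run might record: a version fingerprint of the code, a version fingerprint of the data, and the parameters and metrics of the resulting model. All names and values below are illustrative.

```python
import hashlib
import json

def fingerprint(data: bytes) -> str:
    """Content hash used as a lightweight version identifier."""
    return hashlib.sha256(data).hexdigest()[:12]

def log_experiment(code: bytes, data: bytes, params: dict, metrics: dict) -> dict:
    """Record one training run: the code and data versions it used,
    plus its hyperparameters and resulting metrics."""
    return {
        "code_version": fingerprint(code),
        "data_version": fingerprint(data),
        "params": params,
        "metrics": metrics,
    }

# Illustrative run: toy code snippet, toy dataset, made-up metrics.
run = log_experiment(
    code=b"def train(): ...",
    data=b"age,income\n34,52000\n",
    params={"learning_rate": 0.1},
    metrics={"accuracy": 0.91},
)
print(json.dumps(run, indent=2))
```

Real tools (MLflow, DVC, and others) implement this idea far more robustly, but the principle is the same: every trained model should be traceable back to the exact code and data that produced it.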
Moreover, while software development is ideally linear (see the figure above), ML development is iterative by nature: you run experiments with different candidate solutions to the same problem and then select the best-performing one. Finally, testing ML models requires not only unit tests and integration tests, as for software, but also data testing and validation, as well as model performance evaluation. Consequently, the MLOps process tends to be more complex than the DevOps process and typically involves more people and more tools.
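The iterative loop and the extra testing layers can be sketched in a few lines. This is a hypothetical example with a toy dataset, simple data checks, and made-up validation scores; the point is only the shape of the workflow, not any specific library.

```python
def validate_data(rows):
    """Basic data tests: schema and value checks run before training."""
    for row in rows:
        assert set(row) == {"age", "income"}, "unexpected schema"
        assert row["age"] >= 0, "negative age"
    return True

# Toy dataset passing validation.
rows = [{"age": 34, "income": 52000}, {"age": 51, "income": 67000}]
validate_data(rows)

# Toy "experiments": candidate models and their (made-up) validation
# accuracies; the iterative loop ends by selecting the best performer.
experiments = {"logreg": 0.87, "gbm": 0.91, "mlp": 0.89}
best = max(experiments, key=experiments.get)
print(best)  # gbm
```

In practice the data checks would be richer (distribution drift, missing values, label balance) and the model selection would also weigh latency, cost, and interpretability, but the structure stays the same.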
Taming this complexity is generally the main challenge with MLOps, especially at this stage, where the landscape of tools and practices is still relatively immature. You also need an MLOps solution that matches the development stage of your ML use case. If you are deploying your first ML model, you don’t need to invest in the same infrastructure you would need with 50 models already in production. Instead, the infrastructure can grow with your company and your ML use cases.
Given the complexity of the processes and tools involved in an MLOps solution, it is evident that developing ML solutions in an MLOps workflow requires diverse profiles and skills. Depending on the organization, the titles might change or be used in different ways, but the roles themselves stay largely the same.
On the technical side, data scientists and ML engineers (DS/MLE) are involved. They are the people primarily responsible for developing the ML model and the training pipelines, as well as performing the relevant analyses, for example on data quality and model performance.
Data engineers are also involved: they are responsible for the data ingestion pipelines and the quality of the data that the DS/MLE receive, as well as for provisioning the right data at inference time.
From the beginning, software engineers - also called platform engineers in some organizations - should be involved as well. They are responsible for the production environment, both front-end and back-end, and they are necessary partners when thinking about how a given ML model could be deployed to production and what the constraints are in terms of processing power, latency, throughput, and so on. Ideally, your MLOps process should not require the software engineers to re-implement the ML models. Instead, the models developed by the DS/MLE should be usable out of the box, typically as API endpoints.
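To illustrate the "usable out of the box" idea: the scoring logic an endpoint exposes can be packaged as a plain function that maps a request payload to a prediction, so the serving layer only has to wrap it. This is a hypothetical sketch with made-up coefficients, not a production serving setup.

```python
import json

# Hypothetical model artifact: here just learned linear coefficients.
MODEL = {"intercept": 0.5, "coef": {"age": 0.01}}

def predict(payload: dict) -> dict:
    """Pure scoring function the API endpoint wraps; the software
    engineers never need to re-implement the model itself."""
    score = MODEL["intercept"] + sum(
        MODEL["coef"].get(name, 0.0) * value for name, value in payload.items()
    )
    return {"score": round(score, 4)}

# What an endpoint handler would do with an incoming request body:
body = json.loads('{"age": 30}')
print(predict(body))  # {'score': 0.8}
```

In a real deployment this function would sit behind a web framework or a model-serving tool, with the model artifact loaded from a model repository rather than hard-coded.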
Finally, MLOps engineers and DevOps engineers handle the infrastructure: training servers, the various MLOps tools, and any other infrastructure needed to train and deploy a model.
In this post on machine learning in production and MLOps, we have seen some key goals for a successful ML project, how MLOps can help achieve them, and which roles are involved in an MLOps workflow. Subscribe to our newsletter to stay tuned for more MLOps-related content coming soon!