Through automation and high-level orchestration, MLOps eliminates the manual toil and mistakes that come with juggling multiple versions of data preprocessing and feature engineering pipelines, datasets, the corresponding train/validation/inference source code, and models.
Why would you want to do MLOps?
If your team works on multiple DS/AI directions and/or approaches, it will face a chaos of versions of data preprocessing pipelines, datasets, features, and models in no time. You will also be surprised to discover that Data Scientists are not as good as you would like them to be at infrastructure work – for instance, creating Kubernetes deployments for training and operationalizing ML models.
MLOps is also a long-term investment. Its returns are the following:
Fast and reliable iterations of experiments. No need to rewrite Python/Java/R code on every iteration. Set up training and validation workflows once, automate them with MLOps, and from then on only adjust hyperparameters.
Flexible scaling of computational resources. Computational resources are sometimes in high demand, but your team doesn’t need them spinning overnight or over a weekend. And there is always somebody who forgets to shut down that expensive GPU instance – unless the shutdown is automated, of course. Do you like paying for resources nobody actually used? We hope you don’t.
Consistent tracking of experiment results. Hyperparameters are logged automatically, aligned with the validation results they produced – something people reliably forget to do by hand.
Seamless integration and deployment. Your service requires iterative updates. Believe us, you don’t want to do that manually.
Risk mitigation. Compare new ML models to the existing ones, detect and avoid input drift. Eliminate human mistakes at all stages of the ML roadmap.
Automatic monitoring and feedback loop. The live model scoring environment and the model training environment are distinct, so test-environment scores are likely to differ from real-world ones. To mitigate this risk, the model is exposed to the actual environment through a canary deployment. Observing and recording real-world performance metrics and predictions makes it possible to detect corner cases, iterate, and develop a better model – without human intervention.
Data Warehouse
A data warehouse is persistent, secure, enterprise-grade storage for the curated data that has been extracted and transformed during the ETL stage. This data is ready for analytics and AI tasks.
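To make the hand-off concrete, here is a minimal sketch of a training job reading curated data out of the warehouse. The connection string, schema, table, and column names are illustrative assumptions, and pandas with SQLAlchemy is just one common way to do it.

```python
# Sketch: loading curated warehouse data for a training job.
# The connection string, schema, and column names below are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Assumed warehouse endpoint; replace with your actual connection string.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

# Assumed table produced by the ETL stage.
query = """
    SELECT feature_1, feature_2, feature_3, label
    FROM curated.training_examples
    WHERE snapshot_date = '2024-01-01'
"""

df = pd.read_sql(query, engine)  # one DataFrame, ready for feature engineering
print(df.shape)
```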
Model Development
The outcomes of the Model Development stage are:
the best trained model, stored as a binary file
supporting source code pushed to the repository
recorded results of data exploration and model evaluation metrics
The following ML routines are automated:
Data Exploration. At this stage, the following questions are answered: How does your data look? Does it allow you to solve your problem effectively? What gaps does it have?
Feature Engineering. Record the current list of features and how they were computed, so they can be replayed or analyzed later.
Recording Architectures, Hyperparameters, and Performance Metrics. During the ML trials stage, engineers test multiple architectures, hyperparameters, and metrics. It is best to keep track of all outcomes automatically, as people tend to overlook things. A comprehensive log of trials simplifies choosing the best-performing model.
Model Evaluation. Before the model faces its target environment, its performance is evaluated on a separate data subset that was not used in training. Usually, this subset is the best possible approximation of real-world inputs. The obtained metrics are a significant indicator of the model’s generalization capabilities and set expectations for its behavior in the target environment.
Selection of the Best Model. Once all experiments’ performance metrics and predictions have been recorded, selecting the best model becomes trivial and requires no human effort.
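To show how these routines can be wired together, here is a minimal sketch using scikit-learn and a plain JSON trial log. The dataset, hyperparameter grid, and file names are illustrative assumptions, not something this workflow prescribes.

```python
# Sketch: automated trial logging, hold-out evaluation, and best-model selection.
# Dataset, hyperparameter grid, and file names are illustrative assumptions.
import json
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Hold-out subset never used in training, approximating real-world inputs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

trials = []
for n_estimators in (50, 100, 200):  # hypothetical hyperparameter grid
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    trials.append({"params": {"n_estimators": n_estimators}, "accuracy": acc})
    joblib.dump(model, f"model_{n_estimators}.joblib")  # binary artifact per trial

# Every trial is recorded automatically; nothing relies on human memory.
with open("trials.json", "w") as f:
    json.dump(trials, f, indent=2)

# Best-model selection becomes a one-liner over the recorded results.
best = max(trials, key=lambda t: t["accuracy"])
print("Best trial:", best)
```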
Preparing for Production
The main deliverable of the Preparing for Production step is a standalone service that wraps the trained ML model binary, the supporting source code, and all other required artifacts. The service does not face the production environment yet, but it is ready to; a minimal serving sketch follows the list below.
The following steps are automated:
Runtime Environment Establishment. Whether it is a custom-built service, a data science platform, a dedicated serving system like TensorFlow Serving, low-level infrastructure like a Kubernetes cluster, a JVM on an embedded system, or several heterogeneous production environments coexisting.
Quality Assurance. Elimination of bugs and errors in data preparation as well as in model design, training, and evaluation; validation of the technical compatibility between the model and its runtime environment; verification of the origins of all input datasets, pre-trained models, and other assets, as they could be subject to regulations or copyright; and automation of the validation operations to ensure their appropriateness and consistency while maintaining the ability to deploy quickly.
Reproducibility and Auditability. Provide the ability to easily rerun the same experiment and get the same results. The model architecture and hyperparameters are recorded together with the data used for training and testing, the metrics reached, and the full specification of the training environment.
Security. Machine learning introduces a new range of potential threats in which an attacker intentionally provides malicious data designed to cause the model to make a mistake.
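As an illustration of what such a standalone service might look like, here is a minimal sketch using FastAPI and a joblib-serialized model. The framework, model file name, and input schema are assumptions, not requirements of this step.

```python
# Sketch: wrapping a trained model binary into a standalone prediction service.
# FastAPI, the model file name, and the input schema are illustrative choices.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model_200.joblib")  # artifact produced during Model Development


class PredictRequest(BaseModel):
    features: list[float]  # assumed flat feature vector


@app.post("/predict")
def predict(request: PredictRequest):
    prediction = model.predict([request.features])[0]
    return {"prediction": int(prediction)}

# Run locally with: uvicorn service:app --port 8080
# The service is self-contained and can be containerized for any runtime environment.
```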
Deployment and Integration
After successful model development, a data scientist pushes the code, metadata, and documentation to a central repository. This action triggers an integration and deployment workflow.
During the Deploying to Production stage, the following routines are automated:
Build the model. Build the model artifacts, send the artifacts to long-term storage, run basic checks (smoke tests and/or sanity checks), and generate fairness and explainability reports.
Deploy to a test environment. Run tests that validate both ML performance and computational performance.
Deploy to the production environment. Deploy the model as a canary, verify correct behavior and metrics, then fully deploy the model.
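Below is a minimal sketch of the kind of basic check the pipeline could run against a test deployment before the canary stage. The endpoint URL, payload shape, and latency budget are assumptions.

```python
# Sketch: a smoke test a CI/CD pipeline could run against a freshly deployed test instance.
# The URL, payload, and latency budget below are hypothetical.
import time
import requests

TEST_ENDPOINT = "http://test-env.internal:8080/predict"  # assumed test deployment
SAMPLE_PAYLOAD = {"features": [0.1] * 30}                 # assumed feature vector shape

start = time.time()
response = requests.post(TEST_ENDPOINT, json=SAMPLE_PAYLOAD, timeout=5)
latency = time.time() - start

# Sanity checks: the service answers, returns a prediction, and is fast enough.
assert response.status_code == 200, f"unexpected status: {response.status_code}"
assert "prediction" in response.json(), "response is missing the prediction field"
assert latency < 1.0, f"latency budget exceeded: {latency:.2f}s"

print("Smoke test passed, ready for canary deployment.")
```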
Monitoring and Feedback Loop
Production machine learning models can degrade in quality quickly and without warning – often unnoticed until it is too late and the business is already affected. That is why model monitoring is a crucial step in the ML model life cycle and a critical piece of MLOps.
Monitoring aims to address two major concerns:
Technical. Is the system alive? Are CPU, RAM, network usage, and disk space as expected? Are requests being processed at the expected rate?
Performance. Is the model still accurate? Is it performing as well as it did during the design phase?
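One possible way to expose both kinds of signals from a scoring service, sketched with the Prometheus Python client; the metric names and the simulated request handler are assumptions.

```python
# Sketch: exposing technical and model-performance metrics from a scoring service.
# prometheus_client is one common choice; metric names here are illustrative.
import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUESTS = Counter("predict_requests_total", "Number of prediction requests served")
LATENCY = Histogram("predict_latency_seconds", "Prediction request latency in seconds")
ROLLING_ACCURACY = Gauge("model_rolling_accuracy", "Accuracy over recent labeled feedback")


def handle_request():
    REQUESTS.inc()
    with LATENCY.time():  # records how long scoring takes
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real model inference


if __name__ == "__main__":
    start_http_server(8000)  # metrics scraped from http://localhost:8000/metrics
    while True:
        handle_request()
        # In a real system this value would come from the feedback loop, not a constant.
        ROLLING_ACCURACY.set(0.93)
```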
The Feedback Loop is the information flow from the production environment back to the model training environment for further improvement.
Production feedback is recorded continuously. It is used to detect the model’s performance degradation and to augment the training dataset. Once degradation is detected, an update is triggered: either retraining the model on the augmented dataset or developing a new model with additional features.
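A minimal sketch of how such an update could be triggered from recorded feedback; the accuracy threshold, feedback file format, and retraining entry point are assumptions.

```python
# Sketch: detecting performance degradation from recorded production feedback
# and triggering retraining on the augmented dataset. Thresholds, file names,
# and the retrain hook are hypothetical.
import json
import subprocess

ACCURACY_THRESHOLD = 0.90  # assumed acceptable floor, set during Model Development


def rolling_accuracy(feedback_path: str) -> float:
    """Compute accuracy over recently labeled production predictions."""
    with open(feedback_path) as f:
        records = [json.loads(line) for line in f]  # {"prediction": ..., "label": ...}
    correct = sum(1 for r in records if r["prediction"] == r["label"])
    return correct / max(len(records), 1)


accuracy = rolling_accuracy("production_feedback.jsonl")
print(f"Rolling production accuracy: {accuracy:.3f}")

if accuracy < ACCURACY_THRESHOLD:
    # Degradation detected: retrain on the augmented dataset without human intervention.
    # 'retrain.py' stands in for whatever training workflow the team has automated.
    subprocess.run(["python", "retrain.py", "--data", "augmented_dataset.parquet"], check=True)
```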