⚡ MLOps (Machine Learning Operations) is the discipline of reliably deploying, monitoring, and maintaining ML models in production. Building a model in a notebook is the easy part. MLOps is everything after that: getting the model to serve millions of requests per day, detecting when it starts making poor predictions, retraining it as the world changes, and rolling back safely when updates go wrong.
Category: MLOps · Difficulty: Intermediate · Last updated: 15 May 2026 · 5 min read
MLOps — What It Is and Why Getting AI Into Production Is Harder Than Building It
What is MLOps?
An often-quoted industry estimate, attributed to Gartner, holds that fewer than 15% of ML projects ever make it to production. The gap between a working model in a notebook and a reliable ML system serving real users is enormous, and most of it has little to do with the model itself.
The model is maybe 20% of the problem. The other 80% is everything around it: Where does the training data come from and how is it validated? How is the model packaged for serving? How does it get deployed without downtime? How do you know if it starts making bad predictions? What happens when the real-world data distribution shifts? How do you retrain and re-deploy safely? How do you track which version of the model is in production and which data it was trained on?
MLOps is the engineering discipline that answers all of these questions — applying the rigour of software engineering (version control, testing, CI/CD, monitoring) to the unique challenges of machine learning systems.
THE ML LIFECYCLE
Data ingestion and validation — ensure training data is complete, correctly formatted, and not corrupted before training begins.
Feature engineering and the feature store — compute and store features consistently across training and serving so there is no training-serving skew.
Experiment tracking — log every experiment: hyperparameters, training data version, evaluation metrics, model artefacts. MLflow and Weights & Biases make this reproducible.
Model training pipeline — automated, reproducible training that runs on new data without manual intervention.
Model evaluation and validation — automated tests on held-out data, bias audits, performance benchmarks before any model is promoted to production.
Model registry — a versioned store of trained models with metadata about training data, evaluation results, and deployment history.
Model serving — packaging models as APIs or batch jobs and deploying them to serve predictions at scale.
Monitoring — track real-time prediction performance, data drift, model accuracy against ground truth, latency, and error rates.
Retraining triggers — automated or manual retraining when monitoring detects performance degradation or data drift.
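Several lifecycle stages above (data validation, training, evaluation, model registry and a promotion gate) can be sketched end to end in plain Python. This is a minimal sketch, not how MLflow or any real registry is implemented. The schema, the toy threshold "model", the in-memory `REGISTRY` list and the function names `validate`, `data_version`, `train`, `evaluate`, `register_and_gate` and `run_pipeline` are all hypothetical stand-ins for a validation library, a training job and a registry service.

```python
import hashlib
import json
from statistics import mean

# Hypothetical schema for this sketch: a numeric "amount" in range and a boolean "label".
AMOUNT_RANGE = (0.0, 1_000_000.0)

def validate(rows):
    """Data validation: collect problems instead of silently training on bad data."""
    errors = []
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if not isinstance(amount, (int, float)):
            errors.append(f"row {i}: missing or non-numeric amount")
        elif not AMOUNT_RANGE[0] <= amount <= AMOUNT_RANGE[1]:
            errors.append(f"row {i}: amount {amount} out of range")
        if not isinstance(row.get("label"), bool):
            errors.append(f"row {i}: missing label")
    if not {True, False} <= {row.get("label") for row in rows}:
        errors.append("training data must contain both classes")
    return errors

def data_version(rows):
    """Content hash of the training data, stored with the model so it can be traced and reproduced."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def train(rows):
    """Toy 'model': a threshold halfway between the mean amount of each class."""
    positive = mean(r["amount"] for r in rows if r["label"])
    negative = mean(r["amount"] for r in rows if not r["label"])
    return (positive + negative) / 2

def evaluate(threshold, holdout):
    """Accuracy on held-out rows that were not used for training."""
    correct = sum((r["amount"] > threshold) == r["label"] for r in holdout)
    return correct / len(holdout)

# Hypothetical in-memory registry; a real registry (e.g. MLflow's) persists this metadata.
REGISTRY = []

def register_and_gate(record):
    """Promotion gate: a candidate replaces production only if its accuracy is strictly higher."""
    current = next((r for r in REGISTRY if r["stage"] == "production"), None)
    if current is None or record["accuracy"] > current["accuracy"]:
        if current is not None:
            current["stage"] = "archived"  # kept, so rollback is possible
        record["stage"] = "production"
    else:
        record["stage"] = "rejected"
    REGISTRY.append(record)
    return record["stage"]

def run_pipeline(train_rows, holdout):
    """Validate, train, evaluate, register, recording what is needed to reproduce the run."""
    errors = validate(train_rows)
    if errors:
        raise ValueError(f"data validation failed: {errors}")
    threshold = train(train_rows)
    record = {
        "version": len(REGISTRY) + 1,
        "data_version": data_version(train_rows),
        "params": {"threshold": threshold},
        "accuracy": evaluate(threshold, holdout),
    }
    return register_and_gate(record)
```

Running the pipeline twice on the same data produces the same `data_version` hash. The second run is rejected because it does not beat the production model. When a later run wins, the previous production entry is archived rather than deleted, so it is the one a rollback would restore.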
Real-world examples
- Netflix’s recommendation system runs a continuous MLOps pipeline — models are retrained daily on new viewing data, automatically evaluated against the current production model, and promoted only if they improve key metrics. The entire cycle runs without human intervention.
- Uber’s Michelangelo MLOps platform processes hundreds of millions of predictions per day across dozens of ML use cases — serving fraud detection, ETA prediction, surge pricing, and driver matching from a unified infrastructure with consistent monitoring and retraining.
- A hospital deploying a sepsis prediction model uses MLOps practices to monitor prediction calibration against clinical outcomes — detecting when the model’s risk scores diverge from actual sepsis rates and triggering a review before patient harm occurs.
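The calibration check in the hospital scenario can be sketched by grouping patients into risk bands and comparing mean predicted risk with the observed outcome rate in each band. This is a minimal sketch that assumes model scores and confirmed outcomes (1 = sepsis, 0 = no sepsis) have already been joined per patient. The band edges, the 0.1 tolerance, the minimum band size and the names `calibration_report` and `needs_review` are hypothetical choices for illustration, not clinical thresholds.

```python
from statistics import mean

def calibration_report(scores, outcomes, edges=(0.0, 0.2, 0.5, 1.0)):
    """Per risk band: number of patients, mean predicted risk, and observed outcome rate."""
    report = []
    last = len(edges) - 2
    for b, (lo, hi) in enumerate(zip(edges, edges[1:])):
        # Bands are [lo, hi); the top band also includes a score of exactly 1.0.
        pairs = [(s, y) for s, y in zip(scores, outcomes)
                 if lo <= s < hi or (b == last and s == hi)]
        if not pairs:
            continue
        report.append({
            "band": (lo, hi),
            "n": len(pairs),
            "predicted": mean(s for s, _ in pairs),
            "observed": mean(y for _, y in pairs),
        })
    return report

def needs_review(report, tolerance=0.1, min_count=20):
    """Bands with enough patients whose observed rate has moved away from the predicted risk."""
    return [r["band"] for r in report
            if r["n"] >= min_count and abs(r["observed"] - r["predicted"]) > tolerance]
```

The `min_count` guard keeps a handful of cases in a sparse band from triggering a review on noise alone. A flagged band is a prompt for human review, not an automatic retrain.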
Common pitfalls
- Training-serving skew — when the features computed during training are computed differently during serving (different preprocessing, different data sources), model accuracy drops despite no model changes. Feature stores with shared computation logic prevent this.
- Silent failures — models can degrade gradually without obvious errors. A recommendation model that slowly loses relevance does not throw exceptions — it just serves worse recommendations. Only monitoring catches this.
- Over-engineering early — teams sometimes build complex MLOps infrastructure before they have a model worth deploying. Start simple, add complexity as scale demands it.
- Model and data versioning — if you cannot reproduce a production model exactly (same data, same code, same hyperparameters), you cannot debug failures or safely roll back. Version everything from day one.
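Training-serving skew is easiest to avoid when each feature transformation is written once and the offline training job and the online serving path both call it. The sketch below is a minimal, hypothetical illustration: the names `compute_features`, `build_training_set` and `handle_request` and the event fields are invented. Feature stores such as Feast or Tecton provide shared definitions together with offline and online storage at scale, which this sketch does not attempt.

```python
import math
from datetime import datetime, timezone

def compute_features(raw):
    """Single definition of the feature logic, imported by both training and serving code."""
    ts = datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc)
    return {
        "log_amount": math.log1p(max(raw["amount"], 0.0)),
        "hour_utc": ts.hour,
        "is_weekend": ts.weekday() >= 5,
    }

def build_training_set(history):
    """Offline path: logged events become (features, label) pairs."""
    return [(compute_features(event), event["label"]) for event in history]

def handle_request(request, model):
    """Online path: the live request passes through the same feature function before scoring."""
    return model(compute_features(request))
```

Because both paths call `compute_features`, a change to the feature logic (for example, switching from local time to UTC) reaches training and serving together instead of only one of them.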
Frequently asked questions
QUESTION 1 What is MLOps in simple terms?
ANSWER 1 The engineering discipline of getting ML models from notebook to production and keeping them working reliably — deployment, monitoring, retraining, and rollback.
QUESTION 2 Why do most ML projects never reach production?
ANSWER 2 Building a prototype model is comparatively easy. Packaging, serving, monitoring, retraining, and versioning it reliably in production is the hard part, and most teams underestimate it.
QUESTION 3 What is data drift and why does it matter?
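ANSWER 3 Data drift is a change in the distribution of the data a model sees in production compared with the data it was trained on. Nothing crashes when this happens, so the model can degrade silently. Without drift monitoring, nobody notices until the business impact is already felt.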
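One common way to quantify data drift is the Population Stability Index (PSI). PSI bins a feature using its training-time distribution and sums (a − e) · ln(a / e) over the bins, where e is the training share and a is the production share in each bin. The sketch below is a minimal plain-Python version. The equal-width binning, the `eps` floor for empty bins and the 0.2 alert threshold are assumptions: thresholds somewhere between 0.1 and 0.25 are common rules of thumb rather than a standard. The function names are hypothetical. Monitoring tools such as Evidently or Arize offer ready-made drift tests instead.

```python
import math
from bisect import bisect_right

def bin_fractions(values, edges, eps=1e-4):
    """Share of values per bin; eps floors empty bins so the log term stays finite."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Clamp to the outer bins so out-of-range production values are still counted.
        i = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    return [max(c / len(values), eps) for c in counts]

def psi(expected, actual, n_bins=10):
    """Population Stability Index over equal-width bins derived from the training sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    edges = [lo + k * width for k in range(n_bins)] + [hi]
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ak - ek) * math.log(ak / ek) for ak, ek in zip(a, e))

def drift_alert(expected, actual, threshold=0.2):
    """Flag drift when PSI exceeds a rule-of-thumb threshold (0.2 is an assumption)."""
    return psi(expected, actual) > threshold
```

Identical distributions give a PSI of exactly zero. A sample shifted halfway out of the training range piles into the edge bins and scores far above the threshold. In practice this check runs per feature on a schedule, and an alert feeds the retraining trigger described in the lifecycle.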
QUESTION 4 What tools are used in MLOps?
ANSWER 4 MLflow (experiment tracking), Kubeflow/Airflow (pipelines), TorchServe/Triton (serving), Feast/Tecton (feature stores), Evidently/Arize (monitoring).