Roadmap · Updated May 2026

The MLOps Engineer trek

The full ML lifecycle in production: experiment tracking, feature stores, model serving, drift detection, CI/CD for ML, and LLMOps for large language model systems.

Stages: 12
Estimated time: 7 months
Level: Intermediate → Advanced
Maintained by: 3 practitioners
Stage 01

ML lifecycle & fundamentals

Understand the full ML lifecycle before automating it: training, evaluation, deployment, and monitoring — and why each stage needs engineering rigor.

ML Lifecycle · Fundamentals · Beginner
Stage 02

Experiment tracking

MLflow and Weights & Biases for tracking experiments, comparing runs, and never losing a model again.

MLflow · W&B · Experiment Tracking
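The core idea behind trackers like MLflow and W&B fits in a few lines: every run records its parameters and metrics so runs can be compared later. A library-agnostic sketch (the `Tracker` class and its methods are illustrative, not a real MLflow or W&B API):

```python
import uuid

class Tracker:
    """Toy experiment tracker: stores params and metrics per run."""

    def __init__(self):
        self.runs = []

    def start_run(self, **params):
        run = {"id": uuid.uuid4().hex[:8], "params": params, "metrics": {}}
        self.runs.append(run)
        return run

    def log_metric(self, run, name, value):
        run["metrics"][name] = value

    def best_run(self, metric):
        # Compare all runs on one metric -- the query you actually care about.
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = Tracker()
for lr in (0.1, 0.01, 0.001):
    run = tracker.start_run(lr=lr)
    # Stand-in for training; imagine val_acc coming from a real eval loop.
    tracker.log_metric(run, "val_acc", 0.9 - abs(lr - 0.01) * 5)

best = tracker.best_run("val_acc")
print(best["params"])  # {'lr': 0.01} -- the run that scored highest
```

The real tools add persistence, a UI, and artifact storage on top, but "query the best run by metric" is the workflow this stage teaches.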
Stage 03

Data versioning & pipelines

DVC for versioning data and pipelines, ensuring every experiment is reproducible from data to model.

DVC · Data Versioning · Pipelines
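DVC's reproducibility guarantee rests on a simple mechanism: identify a dataset by a checksum of its bytes (DVC uses MD5), so a pipeline stage can tell whether its input data actually changed. A minimal sketch of that idea, using a temporary file as the "dataset":

```python
import hashlib
import tempfile
from pathlib import Path

def dataset_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets don't need to fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    data = Path(d) / "train.csv"
    data.write_text("x,y\n1,2\n")
    v1 = dataset_fingerprint(data)
    data.write_text("x,y\n1,2\n3,4\n")   # data changed -> new version
    v2 = dataset_fingerprint(data)

print(v1 != v2)  # True: downstream stages must re-run
```

DVC stores these fingerprints in small `.dvc` files that live in Git, which is how "every experiment is reproducible from data to model" becomes checkable rather than aspirational.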
Stage 04

Feature stores

Offline and online feature serving, feature sharing across teams, and avoiding training-serving skew.

Feature Store · Feast · Tecton
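Training-serving skew usually comes from implementing the same feature logic twice: once in the batch pipeline, once in the serving path. A feature store avoids it by computing features from one definition and writing the same values to both an offline store (training sets) and an online store (low-latency lookup). A hypothetical sketch with plain Python containers standing in for Parquet and Redis:

```python
from datetime import datetime, timezone

def compute_features(raw: dict) -> dict:
    """Single definition of the feature logic, shared by both serving paths."""
    return {
        "amount_log_bucket": min(int(raw["amount"]).bit_length(), 16),
        "is_weekend": raw["ts"].weekday() >= 5,
    }

offline_store = []   # rows for training sets (Parquet files in reality)
online_store = {}    # entity key -> latest features (Redis in reality)

def ingest(entity_id: str, raw: dict):
    feats = compute_features(raw)
    offline_store.append({"entity_id": entity_id, **feats})
    online_store[entity_id] = feats   # same values, served at request time

ingest("user-42", {"amount": 300,
                   "ts": datetime(2026, 5, 2, tzinfo=timezone.utc)})
print(online_store["user-42"]["is_weekend"])  # True: 2026-05-02 is a Saturday
```

Feast and Tecton add the hard parts (point-in-time correct joins, TTLs, materialization schedules), but the invariant to internalize is this one: training and serving must read features produced by the same code.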
Stage 05

Model packaging & serving

Packaging models with MLflow and ONNX, serving with FastAPI and TorchServe, and building low-latency inference endpoints.

Model Serving · FastAPI · TorchServe
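Stripped of framework details, a serving endpoint is: load the model once at startup, validate each request, predict, and return a structured, traceable response. A hypothetical sketch of the handler you might mount on a FastAPI route; the "model" is a stub linear scorer:

```python
# Loaded once at process start, not per request.
MODEL = {"weights": [0.4, 0.6], "bias": -0.1, "version": "2026-05-01"}

def predict(payload: dict) -> dict:
    features = payload.get("features")
    # Validate before touching the model -- bad input should be a 422, not a crash.
    if not isinstance(features, list) or len(features) != len(MODEL["weights"]):
        return {"error": "expected 'features' with 2 numbers", "status": 422}
    score = sum(w * x for w, x in zip(MODEL["weights"], features)) + MODEL["bias"]
    # Echoing the model version makes predictions traceable during rollouts.
    return {"score": round(score, 4),
            "model_version": MODEL["version"],
            "status": 200}

print(predict({"features": [1.0, 2.0]})["score"])   # 1.5
print(predict({"features": "oops"})["status"])       # 422
```

FastAPI gives you the HTTP layer and Pydantic validation; TorchServe and ONNX Runtime replace the stub with real inference. The shape of the handler stays the same.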
Stage 06

CI/CD for ML

Automated training, testing, and deployment pipelines for ML models — triggered by code changes, data changes, or model performance degradation.

CI/CD · GitHub Actions · Automated Training
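The piece that makes ML CI/CD different from ordinary CI/CD is the promotion gate: the pipeline trains a candidate, evaluates it, and deploys only if it beats the production model on the metrics that matter. A sketch of such a gate; the metric names and thresholds are made-up examples, not a standard:

```python
def should_promote(candidate: dict, production: dict,
                   min_gain: float = 0.005,
                   max_latency_ms: float = 50.0) -> bool:
    """Promote only if the candidate is measurably better AND fast enough."""
    better = candidate["auc"] >= production["auc"] + min_gain
    fast_enough = candidate["p95_latency_ms"] <= max_latency_ms
    return better and fast_enough

prod = {"auc": 0.91, "p95_latency_ms": 30.0}
print(should_promote({"auc": 0.92, "p95_latency_ms": 28.0}, prod))  # True
print(should_promote({"auc": 0.93, "p95_latency_ms": 80.0}, prod))  # False: too slow
```

In a GitHub Actions workflow this function becomes a pipeline step whose exit code decides whether the deploy job runs, which is exactly how "triggered by model performance degradation" gets enforced in practice.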
Stage 07

Model monitoring & drift detection

Keeping models accurate after they ship: data drift, concept drift, prediction monitoring, and automated retraining triggers.

Monitoring · Drift Detection · Evidently
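One widely used data-drift score, and one of the metrics Evidently can compute, is the Population Stability Index (PSI): bucket the baseline and live feature values, then compare the bucket proportions. A self-contained sketch (the often-cited rule of thumb is PSI < 0.1 stable, > 0.25 significant drift):

```python
import math
import random

def psi(baseline, live, bins=10):
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bucket index
        return [max(c / len(values), 1e-6) for c in counts]  # avoid log(0)

    b, l = proportions(baseline), proportions(live)
    return sum((lv - bv) * math.log(lv / bv) for bv, lv in zip(b, l))

rng = random.Random(0)
baseline = [rng.gauss(0, 1) for _ in range(5000)]   # training-time feature
same = [rng.gauss(0, 1) for _ in range(5000)]       # live traffic, unchanged
shifted = [rng.gauss(1, 1) for _ in range(5000)]    # mean moved by 1 sigma

print(psi(baseline, same) < 0.1)      # True: no drift
print(psi(baseline, shifted) > 0.25)  # True: retraining-trigger territory
```

A monitoring job runs this per feature on a schedule; crossing the threshold is what fires the "automated retraining trigger" this stage describes.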
Stage 08

Orchestration for ML

Airflow, Prefect, and Kubeflow for orchestrating complex ML pipelines with dependencies, retries, and scheduling.

Airflow · Kubeflow · Orchestration
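Under the scheduling, UIs, and executors, an orchestrator does two things: run tasks in dependency order, and retry transient failures. A toy DAG runner that captures both (not an Airflow API, just the mechanism):

```python
def run_dag(tasks: dict, deps: dict, retries: int = 2):
    """tasks: name -> callable; deps: name -> list of upstream task names."""
    done, order = set(), []
    while len(done) < len(tasks):
        ready = [t for t in tasks if t not in done
                 and all(d in done for d in deps.get(t, []))]
        if not ready:
            raise RuntimeError("cycle or unsatisfiable dependency")
        for t in ready:
            for attempt in range(retries + 1):
                try:
                    tasks[t]()
                    break                 # task succeeded
                except Exception:
                    if attempt == retries:
                        raise             # retries exhausted: fail the run
            done.add(t)
            order.append(t)
    return order

log, flaky_state = [], {"calls": 0}

def flaky_extract():
    flaky_state["calls"] += 1
    if flaky_state["calls"] == 1:
        raise IOError("transient source outage")  # first attempt fails
    log.append("extract")

order = run_dag(
    tasks={"extract": flaky_extract,
           "train": lambda: log.append("train"),
           "deploy": lambda: log.append("deploy")},
    deps={"train": ["extract"], "deploy": ["train"]},
)
print(order)  # ['extract', 'train', 'deploy'] despite the flaky first attempt
```

Airflow, Prefect, and Kubeflow layer scheduling, distributed execution, and observability on top, but reading their DAG definitions gets much easier once this core loop is familiar.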
Stage 09

Cloud ML platforms

AWS SageMaker, Google Vertex AI, and Azure ML — managed platforms for training, deploying, and monitoring ML models at scale.

SageMaker · Vertex AI · Cloud ML
Stage 10

LLMOps

The emerging discipline of operating large language models in production: prompt versioning, LLM evaluation, RAG pipelines, and cost optimization.

LLMOps · LLMs · Evaluation
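A recurring LLMOps pattern is treating prompts like code: version each template (a content hash works), and run a small eval set against a version before rolling it out. A sketch of that operational shell; `fake_llm` is a stub standing in for a real model call, and the whole harness is illustrative:

```python
import hashlib

def prompt_version(template: str) -> str:
    """Content-address the prompt so every output is traceable to a version."""
    return hashlib.sha256(template.encode()).hexdigest()[:8]

def fake_llm(prompt: str) -> str:
    # Stub: a real system would call a model API here.
    return "Paris" if "capital of France" in prompt else "unsure"

def evaluate(template: str, eval_set: list[tuple[str, str]]) -> float:
    """Accuracy of a prompt version over (question, expected-answer) pairs."""
    hits = sum(fake_llm(template.format(q=q)).strip() == expected
               for q, expected in eval_set)
    return hits / len(eval_set)

template = "Answer concisely. Question: {q}"
eval_set = [("What is the capital of France?", "Paris")]
print(prompt_version(template), evaluate(template, eval_set))
```

Logging the prompt version alongside every production response is what lets you attribute a quality regression to a prompt change rather than to the model, which is the traceability habit this stage is really about.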
Stage 11

ML platform engineering

Building internal ML platforms: self-service tooling, governance, and the shared infrastructure that lets data science teams ship models without rebuilding the plumbing each time.

ML Platform · Internal Tools · Governance
Stage 12

Capstone — production ML system

Build, deploy, monitor, and iterate on a complete production ML system from data to drift-monitored endpoint.

Capstone · Advanced · Portfolio

Trek complete. What's next?

You've walked the full roadmap. Now ship the capstone, write about it, and share the path with the next engineer who needs it.

Read the blog · Explore more roadmaps