The Data Scientist trek
Statistics, SQL, Python, machine learning, deep learning, experimentation, big data, and communication. Everything from raw data to deployed model to executive insight.
Statistics & probability foundations
Probability, distributions, hypothesis testing, and the statistical thinking that separates data scientists from people who make charts. This is the hardest and most important stage.
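The statistical thinking this stage teaches can be seen in a permutation test, one of the simplest rigorous hypothesis tests. This is a minimal sketch on hypothetical data (two made-up page variants): under the null hypothesis the group labels don't matter, so we shuffle them and count how often chance alone produces a difference as large as the one observed.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: page-load times (s) for two site variants.
control = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 12.0]
variant = [11.0, 11.6, 10.9, 11.8, 11.2, 11.5, 10.7, 11.3]

observed = statistics.mean(control) - statistics.mean(variant)

# Permutation test: shuffle the pooled labels many times and record
# how often a difference at least this large arises by chance.
pooled = control + variant
n_control = len(control)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_control]) - statistics.mean(pooled[n_control:])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.2f}s, p-value: {p_value:.4f}")
```

A small p-value here means the observed gap would almost never appear if the labels were arbitrary — the core logic behind every hypothesis test you'll meet later.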
SQL for analytics
Window functions, CTEs, recursive queries, and the SQL patterns that let you answer business questions directly from a database without writing a line of Python.
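A taste of the pattern this stage covers — a CTE plus a window function to answer "each customer's most recent order" in one query. This sketch runs the SQL through Python's bundled sqlite3 (window functions need SQLite 3.25+); the table and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  ('ada',   '2024-01-05', 120.0),
  ('ada',   '2024-02-11',  80.0),
  ('grace', '2024-01-20', 200.0),
  ('grace', '2024-03-02',  50.0),
  ('grace', '2024-03-15',  75.0);
""")

# CTE + ROW_NUMBER(): rank each customer's orders newest-first,
# then keep only rank 1 — the latest order per customer.
query = """
WITH ranked AS (
  SELECT customer, order_date, amount,
         ROW_NUMBER() OVER (
           PARTITION BY customer ORDER BY order_date DESC
         ) AS rn
  FROM orders
)
SELECT customer, order_date, amount FROM ranked WHERE rn = 1
ORDER BY customer;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same shape — partition, order, rank, filter — answers a huge share of real analytics questions without a line of application code.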
Python for data science
NumPy, pandas, and Polars. Slicing, joining, reshaping, and writing data pipelines that are fast, readable, and reproducible.
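Here's the join-then-aggregate shape those libraries make routine, sketched in pandas on two hypothetical tables (the column names and data are invented):

```python
import pandas as pd

# Hypothetical tables: users and their purchase events.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["DE", "US", "US"],
})
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "amount": [10.0, 20.0, 6.0, 7.5, 2.5, 15.0],
})

# Join, then aggregate: revenue per country as a tidy frame.
revenue = (
    purchases
    .merge(users, on="user_id", how="left")   # attach country to each purchase
    .groupby("country", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "revenue"})
)
print(revenue)
```

Writing the pipeline as one method chain — merge, group, aggregate, rename — keeps it readable top to bottom and easy to rerun, which is most of what "reproducible" means in practice.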
Data wrangling & feature engineering
The craft of turning raw data into model-ready features. Encoding, scaling, imputation, feature selection, and handling the real-world messiness of production data.
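Three of those steps — imputation, scaling, and encoding — fit in a short stdlib-only sketch. The records and columns are hypothetical; real pipelines would use scikit-learn transformers, but the arithmetic is the same:

```python
import statistics

# Hypothetical raw records: a missing value and a categorical column.
rows = [
    {"age": 34,   "plan": "pro"},
    {"age": None, "plan": "free"},
    {"age": 51,   "plan": "free"},
    {"age": 29,   "plan": "pro"},
]

# 1. Impute missing ages with the median of the observed values.
observed_ages = [r["age"] for r in rows if r["age"] is not None]
median_age = statistics.median(observed_ages)
for r in rows:
    if r["age"] is None:
        r["age"] = median_age

# 2. Standard-scale age: subtract the mean, divide by the stdev.
mean = statistics.mean(r["age"] for r in rows)
std = statistics.pstdev(r["age"] for r in rows)

# 3. One-hot encode the plan column (one 0/1 column per category).
plans = sorted({r["plan"] for r in rows})
features = [
    [(r["age"] - mean) / std] + [1.0 if r["plan"] == p else 0.0 for p in plans]
    for r in rows
]
print(features)
```

Note the order: impute before scaling, so the fill value doesn't distort the mean and stdev you scale by.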
Machine learning fundamentals
Linear/logistic regression, trees, ensembles, and clustering. Build the intuitions that demystify complex models — and know when to reach for each.
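The intuition starts with the simplest model of all. A least-squares line fit on toy data, stdlib only — the slope is just covariance over variance, and the intercept forces the line through the point of means:

```python
# Least-squares fit of y = w*x + b on toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Slope = cov(x, y) / var(x); intercept makes the line pass
# through (x_mean, y_mean).
w = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
b = y_mean - w * x_mean
print(f"y ≈ {w:.2f}x + {b:.2f}")
```

Every fancier model in this stage — logistic regression, trees, ensembles — is a different answer to the same question this two-parameter fit poses: what function of the inputs best predicts the target?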
Model evaluation & selection
Beyond accuracy: AUC, calibration, fairness, cross-validation, and the rigorous evaluation practices that prevent you from shipping models that hurt users.
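AUC is less mysterious once you compute it by hand: it's the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A sketch on hypothetical model scores:

```python
from itertools import product

# Hypothetical model scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]

# AUC = P(random positive scores above random negative),
# counting ties as half a win.
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in product(pos, neg)
)
auc = wins / (len(pos) * len(neg))
print(f"AUC = {auc:.3f}")
```

Seen this way, it's obvious why AUC ignores the classification threshold entirely — it only measures ranking, which is exactly why calibration needs checking separately.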
Deep learning basics
Neural networks, PyTorch, CNNs, and how to train models that learn representations. The practical skills for when tabular ML isn't enough.
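Before PyTorch automates it, it helps to run the forward/backward loop once by hand. A single sigmoid neuron trained by gradient descent to learn OR — a toy sketch, but mechanically the same loop that trains a CNN:

```python
import math
import random

random.seed(0)

# Toy dataset: the OR function.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = 0.0
lr = 0.5

def forward(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 / (1 + math.exp(-z))  # sigmoid activation

for _ in range(2000):
    for x, y in data:
        p = forward(x)
        grad = p - y  # d(cross-entropy loss)/dz for a sigmoid output
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad

preds = [round(forward(x)) for x, _ in data]
print(preds)  # recovers OR: [0, 1, 1, 1]
```

Everything PyTorch adds — autograd, tensors, GPU kernels, layers — is scaling this exact pattern: forward pass, loss gradient, parameter update.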
NLP & text analytics
Text preprocessing, embeddings, sentiment analysis, topic modeling, and working with transformer models for text tasks.
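The simplest text representation — bag-of-words plus cosine similarity — already captures the idea behind embeddings: documents become vectors, and similarity becomes geometry. A stdlib sketch on invented sentences:

```python
import math
from collections import Counter

def bow(text):
    # Lowercase and split on whitespace: a minimal bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm

doc1 = bow("the model predicts churn")
doc2 = bow("the model predicts revenue")
doc3 = bow("bake the cake slowly")
sim12 = cosine(doc1, doc2)
sim13 = cosine(doc1, doc3)
print(f"doc1~doc2: {sim12:.2f}, doc1~doc3: {sim13:.2f}")
```

Transformer embeddings replace the count vector with a learned dense one, but the retrieval step — compare vectors by cosine — is the same.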
Experimentation & causal inference
Designing experiments that change decisions, avoiding the traps that make most A/B tests misleading, and going beyond correlation to causation.
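The workhorse calculation behind a conversion A/B test is a two-proportion z-test. A stdlib sketch on hypothetical traffic numbers (the counts are invented):

```python
import math

# Hypothetical A/B test: conversions and sample sizes per arm.
conv_a, n_a = 200, 4000   # control: 5.0% conversion
conv_b, n_b = 260, 4000   # variant: 6.5% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)

# Standard error under the null (both arms share the pooled rate).
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# Two-sided p-value from the standard normal CDF (via math.erf).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The traps this stage warns about live outside this arithmetic: peeking at results early, running many variants without correction, and stopping when the number looks good all invalidate the p-value the formula reports.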
Big data & distributed computing
When data doesn't fit in memory: Spark, Dask, BigQuery, Snowflake, and the architecture of modern data lakehouses.
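Before reaching for Spark, it's worth seeing the core trick in miniature: stream the data in one pass and keep only the running aggregates, so memory scales with the number of groups, not the number of rows. A stdlib sketch with an in-memory CSV standing in for a file far larger than RAM:

```python
import csv
import io
from collections import defaultdict

# Hypothetical event log; in practice this would be a huge file on disk.
raw = io.StringIO(
    "region,amount\n"
    + "\n".join(f"{'eu' if i % 3 else 'us'},{i % 7}" for i in range(10_000))
)

# Stream row by row: memory stays O(groups), not O(rows) — the same
# map/reduce idea Spark and Dask apply across a cluster.
totals = defaultdict(float)
for row in csv.DictReader(raw):
    totals[row["region"]] += float(row["amount"])

print(dict(totals))
```

Distributed engines add partitioning, shuffles, and fault tolerance on top, but the mental model — map each record, reduce into per-group state — is this loop.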
Data visualization & storytelling
Matplotlib, seaborn, Plotly, Tableau, and the narrative structure that makes data findings actually change decisions.
Model deployment & MLOps basics
Serving models in production: FastAPI, Docker, monitoring, drift detection, and the basics of keeping models accurate after they ship.
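Drift detection can start much simpler than it sounds. A hedged sketch of one basic check — compare a feature's live mean against its training-time distribution and flag when it moves more than a few standard deviations (the data and the 3-sigma threshold are illustrative choices, not a standard):

```python
import statistics

# Hypothetical feature values: training-time snapshot vs. live traffic.
train = [10.2, 9.8, 10.5, 10.0, 9.9, 10.1, 10.3, 9.7]
live = [12.9, 13.4, 12.7, 13.1, 13.0, 12.8, 13.3, 13.2]

train_mean = statistics.mean(train)
train_std = statistics.stdev(train)

# Flag drift when the live mean sits more than 3 training-time
# standard deviations from the training mean.
shift = abs(statistics.mean(live) - train_mean) / train_std
drifted = shift > 3.0
print(f"shift = {shift:.1f} sigma, drifted = {drifted}")
```

Production systems use richer tests (PSI, KS tests, per-segment monitors), but wiring even this check into an alert catches the most common failure: the world changing under a model that still returns confident answers.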
Capstone — end-to-end data science project
Research question → data → model → deployment → stakeholder recommendation. A real artifact for your portfolio.
Trek complete. What's next?
You've walked the full roadmap. Now ship the capstone, write about it, and share the path with the next data scientist who needs it.