The Data Scientist trek
Statistics, SQL, Python, machine learning, deep learning, experimentation, big data, and communication. Everything from raw data to deployed model to executive insight.
Statistics & probability foundations
Probability, distributions, hypothesis testing, and the statistical thinking that separates data scientists from people who make charts. This is the hardest and most important stage.
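The statistical thinking this stage teaches can be seen in a permutation test, one of the simplest rigorous hypothesis tests. This is a minimal sketch on hypothetical data (two made-up page variants): under the null hypothesis the group labels don't matter, so we shuffle them and count how often chance alone produces a difference as large as the one observed.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: page-load times (s) for two site variants.
control = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 12.0]
variant = [11.0, 11.6, 10.9, 11.8, 11.2, 11.5, 10.7, 11.3]

observed = statistics.mean(control) - statistics.mean(variant)

# Permutation test: shuffle the pooled labels many times and record
# how often a difference at least this large arises by chance.
pooled = control + variant
n_control = len(control)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_control]) - statistics.mean(pooled[n_control:])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.2f}s, p-value: {p_value:.4f}")
```

A small p-value here means the observed gap would almost never appear if the labels were arbitrary — the core logic behind every hypothesis test you'll meet later.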
SQL for analytics
Window functions, CTEs, recursive queries, and the SQL patterns that let you answer business questions directly from a database without writing a line of Python.
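A taste of the pattern this stage covers — a CTE plus a window function to answer "each customer's most recent order" in one query. This sketch runs the SQL through Python's bundled sqlite3 (window functions need SQLite 3.25+); the table and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  ('ada',   '2024-01-05', 120.0),
  ('ada',   '2024-02-11',  80.0),
  ('grace', '2024-01-20', 200.0),
  ('grace', '2024-03-02',  50.0),
  ('grace', '2024-03-15',  75.0);
""")

# CTE + ROW_NUMBER(): rank each customer's orders newest-first,
# then keep only rank 1 — the latest order per customer.
query = """
WITH ranked AS (
  SELECT customer, order_date, amount,
         ROW_NUMBER() OVER (
           PARTITION BY customer ORDER BY order_date DESC
         ) AS rn
  FROM orders
)
SELECT customer, order_date, amount FROM ranked WHERE rn = 1
ORDER BY customer;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same shape — partition, order, rank, filter — answers a huge share of real analytics questions without a line of application code.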
Python for data science
NumPy, pandas, and Polars. Slicing, joining, reshaping, and writing data pipelines that are fast, readable, and reproducible.
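Here's the join-then-aggregate shape those libraries make routine, sketched in pandas on two hypothetical tables (the column names and data are invented):

```python
import pandas as pd

# Hypothetical tables: users and their purchase events.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["DE", "US", "US"],
})
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "amount": [10.0, 20.0, 6.0, 7.5, 2.5, 15.0],
})

# Join, then aggregate: revenue per country as a tidy frame.
revenue = (
    purchases
    .merge(users, on="user_id", how="left")   # attach country to each purchase
    .groupby("country", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "revenue"})
)
print(revenue)
```

Writing the pipeline as one method chain — merge, group, aggregate, rename — keeps it readable top to bottom and easy to rerun, which is most of what "reproducible" means in practice.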
Data wrangling & feature engineering
The craft of turning raw data into model-ready features. Encoding, scaling, imputation, feature selection, and handling the real-world messiness of production data.
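Three of those steps — imputation, scaling, and encoding — fit in a short stdlib-only sketch. The records and columns are hypothetical; real pipelines would use scikit-learn transformers, but the arithmetic is the same:

```python
import statistics

# Hypothetical raw records: a missing value and a categorical column.
rows = [
    {"age": 34,   "plan": "pro"},
    {"age": None, "plan": "free"},
    {"age": 51,   "plan": "free"},
    {"age": 29,   "plan": "pro"},
]

# 1. Impute missing ages with the median of the observed values.
observed_ages = [r["age"] for r in rows if r["age"] is not None]
median_age = statistics.median(observed_ages)
for r in rows:
    if r["age"] is None:
        r["age"] = median_age

# 2. Standard-scale age: subtract the mean, divide by the stdev.
mean = statistics.mean(r["age"] for r in rows)
std = statistics.pstdev(r["age"] for r in rows)

# 3. One-hot encode the plan column (one 0/1 column per category).
plans = sorted({r["plan"] for r in rows})
features = [
    [(r["age"] - mean) / std] + [1.0 if r["plan"] == p else 0.0 for p in plans]
    for r in rows
]
print(features)
```

Note the order: impute before scaling, so the fill value doesn't distort the mean and stdev you scale by.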
Machine learning fundamentals
Linear/logistic regression, trees, ensembles, and clustering. Build the intuitions that demystify complex models — and know when to reach for each.
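The intuition starts with the simplest model of all. A least-squares line fit on toy data, stdlib only — the slope is just covariance over variance, and the intercept forces the line through the point of means:

```python
# Least-squares fit of y = w*x + b on toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Slope = cov(x, y) / var(x); intercept makes the line pass
# through (x_mean, y_mean).
w = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
b = y_mean - w * x_mean
print(f"y ≈ {w:.2f}x + {b:.2f}")
```

Every fancier model in this stage — logistic regression, trees, ensembles — is a different answer to the same question this two-parameter fit poses: what function of the inputs best predicts the target?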
Model evaluation & selection
Beyond accuracy: AUC, calibration, fairness, cross-validation, and the rigorous evaluation practices that prevent you from shipping models that hurt users.
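AUC is less mysterious once you compute it by hand: it's the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A sketch on hypothetical model scores:

```python
from itertools import product

# Hypothetical model scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]

# AUC = P(random positive scores above random negative),
# counting ties as half a win.
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in product(pos, neg)
)
auc = wins / (len(pos) * len(neg))
print(f"AUC = {auc:.3f}")
```

Seen this way, it's obvious why AUC ignores the classification threshold entirely — it only measures ranking, which is exactly why calibration needs checking separately.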
Deep learning basics
Neural networks, PyTorch, CNNs, and how to train models that learn representations. The practical skills for when tabular ML isn't enough.
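Before PyTorch automates it, it helps to run the forward/backward loop once by hand. A single sigmoid neuron trained by gradient descent to learn OR — a toy sketch, but mechanically the same loop that trains a CNN:

```python
import math
import random

random.seed(0)

# Toy dataset: the OR function.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = 0.0
lr = 0.5

def forward(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 / (1 + math.exp(-z))  # sigmoid activation

for _ in range(2000):
    for x, y in data:
        p = forward(x)
        grad = p - y  # d(cross-entropy loss)/dz for a sigmoid output
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad

preds = [round(forward(x)) for x, _ in data]
print(preds)  # recovers OR: [0, 1, 1, 1]
```

Everything PyTorch adds — autograd, tensors, GPU kernels, layers — is scaling this exact pattern: forward pass, loss gradient, parameter update.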
NLP & text analytics
Text preprocessing, embeddings, sentiment analysis, topic modeling, and working with transformer models for text tasks.
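The simplest text representation — bag-of-words plus cosine similarity — already captures the idea behind embeddings: documents become vectors, and similarity becomes geometry. A stdlib sketch on invented sentences:

```python
import math
from collections import Counter

def bow(text):
    # Lowercase and split on whitespace: a minimal bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm

doc1 = bow("the model predicts churn")
doc2 = bow("the model predicts revenue")
doc3 = bow("bake the cake slowly")
sim12 = cosine(doc1, doc2)
sim13 = cosine(doc1, doc3)
print(f"doc1~doc2: {sim12:.2f}, doc1~doc3: {sim13:.2f}")
```

Transformer embeddings replace the count vector with a learned dense one, but the retrieval step — compare vectors by cosine — is the same.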
Experimentation & causal inference
Designing experiments that change decisions, avoiding the traps that make most A/B tests misleading, and going beyond correlation to causation.
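The workhorse calculation behind a conversion A/B test is a two-proportion z-test. A stdlib sketch on hypothetical traffic numbers (the counts are invented):

```python
import math

# Hypothetical A/B test: conversions and sample sizes per arm.
conv_a, n_a = 200, 4000   # control: 5.0% conversion
conv_b, n_b = 260, 4000   # variant: 6.5% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)

# Standard error under the null (both arms share the pooled rate).
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# Two-sided p-value from the standard normal CDF (via math.erf).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The traps this stage warns about live outside this arithmetic: peeking at results early, running many variants without correction, and stopping when the number looks good all invalidate the p-value the formula reports.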
Big data & distributed computing
When data doesn't fit in memory: Spark, Dask, BigQuery, Snowflake, and the architecture of modern data lakehouses.
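Before reaching for Spark, it's worth seeing the core trick in miniature: stream the data in one pass and keep only the running aggregates, so memory scales with the number of groups, not the number of rows. A stdlib sketch with an in-memory CSV standing in for a file far larger than RAM:

```python
import csv
import io
from collections import defaultdict

# Hypothetical event log; in practice this would be a huge file on disk.
raw = io.StringIO(
    "region,amount\n"
    + "\n".join(f"{'eu' if i % 3 else 'us'},{i % 7}" for i in range(10_000))
)

# Stream row by row: memory stays O(groups), not O(rows) — the same
# map/reduce idea Spark and Dask apply across a cluster.
totals = defaultdict(float)
for row in csv.DictReader(raw):
    totals[row["region"]] += float(row["amount"])

print(dict(totals))
```

Distributed engines add partitioning, shuffles, and fault tolerance on top, but the mental model — map each record, reduce into per-group state — is this loop.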
Data visualization & storytelling
Matplotlib, seaborn, Plotly, Tableau, and the narrative structure that makes data findings actually change decisions.
Model deployment & MLOps basics
Serving models in production: FastAPI, Docker, monitoring, drift detection, and the basics of keeping models accurate after they ship.
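Drift detection can start much simpler than it sounds. A hedged sketch of one basic check — compare a feature's live mean against its training-time distribution and flag when it moves more than a few standard deviations (the data and the 3-sigma threshold are illustrative choices, not a standard):

```python
import statistics

# Hypothetical feature values: training-time snapshot vs. live traffic.
train = [10.2, 9.8, 10.5, 10.0, 9.9, 10.1, 10.3, 9.7]
live = [12.9, 13.4, 12.7, 13.1, 13.0, 12.8, 13.3, 13.2]

train_mean = statistics.mean(train)
train_std = statistics.stdev(train)

# Flag drift when the live mean sits more than 3 training-time
# standard deviations from the training mean.
shift = abs(statistics.mean(live) - train_mean) / train_std
drifted = shift > 3.0
print(f"shift = {shift:.1f} sigma, drifted = {drifted}")
```

Production systems use richer tests (PSI, KS tests, per-segment monitors), but wiring even this check into an alert catches the most common failure: the world changing under a model that still returns confident answers.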
Capstone — end-to-end data science project
Research question → data → model → deployment → stakeholder recommendation. A real artifact for your portfolio.
Trek complete. What's next?
You've walked the full roadmap. Now ship the capstone, write about it, and share the path with the next data scientist who needs it.