The Data Engineer trek
Build the infrastructure that powers data teams. SQL, Python, Spark, Kafka, Airflow, cloud data warehouses, and the architecture of modern data lakehouses.
SQL & data modeling foundations
Data engineering starts with SQL mastery and understanding how to model data for analytical workloads.
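The analytical shape this module builds toward can be sketched with the stdlib `sqlite3` module — a minimal star schema with a fact table joined to a dimension. Table and column names here are invented for illustration:

```python
# A minimal star-schema sketch using Python's stdlib sqlite3 module.
# Table and column names are illustrative, not from the trek itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_orders  (order_id INTEGER PRIMARY KEY,
                               customer_id INTEGER REFERENCES dim_customer,
                               amount REAL);
    INSERT INTO dim_customer VALUES (1, 'EU'), (2, 'US');
    INSERT INTO fact_orders  VALUES (10, 1, 99.0), (11, 1, 1.0), (12, 2, 50.0);
""")

# The classic analytical query shape: join fact to dimension,
# aggregate by a dimension attribute.
rows = conn.execute("""
    SELECT c.region, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer c USING (customer_id)
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('EU', 100.0), ('US', 50.0)]
```

Dimensional modeling is largely about making this join-then-aggregate pattern cheap and obvious.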
Python for data engineering
Python for building pipelines, not analysis. File I/O, API calls, data transformation, and packaging reusable code.
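The pipeline style this module teaches can be sketched as three small functions — extract, transform, load — wired together. File names and the record shape are invented for illustration:

```python
# A toy extract-transform-load pipeline in plain Python.
import csv
import json
import tempfile
from pathlib import Path

def extract(path: Path) -> list[dict]:
    """Read newline-delimited JSON records."""
    return [json.loads(line) for line in path.read_text().splitlines() if line]

def transform(records: list[dict]) -> list[dict]:
    """Normalize fields and drop records that fail basic validation."""
    return [
        {"user": r["user"].strip().lower(), "amount": float(r["amount"])}
        for r in records
        if r.get("user", "").strip() and r.get("amount") is not None
    ]

def load(records: list[dict], path: Path) -> None:
    """Write the cleaned records as CSV."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user", "amount"])
        writer.writeheader()
        writer.writerows(records)

tmp = Path(tempfile.mkdtemp())
(tmp / "raw.jsonl").write_text(
    '{"user": " Ada ", "amount": "3.5"}\n{"user": "", "amount": "1"}\n'
)
load(transform(extract(tmp / "raw.jsonl")), tmp / "clean.csv")
print((tmp / "clean.csv").read_text())
```

Keeping each stage a pure function over plain data is what makes pipeline code testable and packageable.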
Cloud data warehouses
BigQuery, Snowflake, and Redshift — the analytical databases that power modern data teams.
dbt — analytics engineering
dbt transforms raw data into analytics-ready models, tested and documented, as code.
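dbt's core loop can be hand-rolled for intuition: materialize a SELECT as a model, then run an assertion against it. This sketch uses sqlite via Python purely as an analogy — real dbt models are SQL files with YAML-declared tests, and dbt adds dependency resolution and docs on top:

```python
# dbt's core loop, hand-rolled for illustration: materialize a SQL
# "model", then run a "test" against it. Names below are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, status TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 'paid', 10.0), (2, 'refunded', 5.0), (1, 'paid', 10.0);
""")

# "Model": a SELECT materialized as a table, like a dbt model file.
conn.execute("""
    CREATE TABLE stg_orders AS
    SELECT DISTINCT order_id, status, amount FROM raw_orders
""")

# "Test": the same idea as dbt's built-in `unique` test on a column.
dupes = conn.execute("""
    SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1
""").fetchall()
print("unique test:", "pass" if not dupes else "fail")
```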
Apache Spark & distributed computing
When data doesn't fit on one machine: Spark's DataFrame API, optimization, and the mental model for distributed computation.
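The distributed mental model can be built with a toy in plain Python: map each partition independently, shuffle by key, then reduce per key. Spark's DataFrame API hides these stages, but they are what runs underneath:

```python
# Map / shuffle / reduce over "partitions" — a toy word count that
# mirrors the stages behind a distributed group-by.
from collections import defaultdict

partitions = [["a b a", "b"], ["a c"]]  # data split across "nodes"

# Map stage: each partition independently emits (key, value) pairs.
mapped = [[(w, 1) for line in part for w in line.split()]
          for part in partitions]

# Shuffle stage: route values for the same key to the same reducer.
shuffled = defaultdict(list)
for part in mapped:
    for key, value in part:
        shuffled[key].append(value)

# Reduce stage: combine per key.
counts = {key: sum(values) for key, values in shuffled.items()}
print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```

The shuffle is the expensive step — it moves data across the network — which is why so much Spark optimization is about avoiding or shrinking it.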
Stream processing with Kafka
Real-time data ingestion: Kafka fundamentals, Kafka Streams, Flink, and building event-driven pipelines.
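Kafka's core abstraction can be sketched in memory: an append-only log per partition, with each consumer tracking its own offset. Real Kafka layers brokers, replication, and consumer groups on top of this idea; the classes below are invented for illustration:

```python
# Kafka's core abstraction in miniature: an append-only log, with
# consumer position held as consumer-side state (the offset).
class Partition:
    def __init__(self):
        self.log = []  # append-only sequence of events

    def produce(self, event) -> int:
        self.log.append(event)
        return len(self.log) - 1  # the event's offset

class Consumer:
    def __init__(self, partition: Partition):
        self.partition = partition
        self.offset = 0  # this consumer's read position

    def poll(self, max_events: int = 10) -> list:
        events = self.partition.log[self.offset:self.offset + max_events]
        self.offset += len(events)
        return events

p = Partition()
for e in ("signup", "click", "purchase"):
    p.produce(e)

fast, slow = Consumer(p), Consumer(p)
print(fast.poll())   # ['signup', 'click', 'purchase']
print(slow.poll(1))  # ['signup'] — independent offsets over the same log
```

Because the log is never mutated, many consumers can read at their own pace, which is what makes event-driven pipelines replayable.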
Workflow orchestration with Airflow
Airflow, Prefect, and Dagster — scheduling, monitoring, and making data pipelines reliable in production.
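What an orchestrator fundamentally does can be shown in miniature: resolve a DAG of task dependencies and run tasks in a valid order. Airflow, Prefect, and Dagster add scheduling, retries, and observability around this core; task names here are invented:

```python
# A tiny orchestrator core: topologically sort a task DAG, then run it.
from graphlib import TopologicalSorter

ran = []
tasks = {
    "extract":   lambda: ran.append("extract"),
    "transform": lambda: ran.append("transform"),
    "load":      lambda: ran.append("load"),
}
# downstream -> set of upstream dependencies,
# analogous to Airflow's extract >> transform >> load
deps = {"transform": {"extract"}, "load": {"transform"}}

for name in TopologicalSorter(deps).static_order():
    tasks[name]()

print(ran)  # ['extract', 'transform', 'load']
```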
Data quality & testing
Pipelines without data quality checks are time bombs. Great Expectations, dbt tests, and systematic data quality frameworks.
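The pattern those frameworks formalize can be hand-rolled: declarative checks that run against every batch before it ships. The column names and thresholds below are invented:

```python
# Hand-rolled expectations, the idea behind Great Expectations and
# dbt tests: named, declarative checks over a batch of rows.
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},  # should fail the range check
]

def expect_not_null(rows, col):
    return all(r.get(col) is not None for r in rows)

def expect_unique(rows, col):
    values = [r[col] for r in rows]
    return len(values) == len(set(values))

def expect_between(rows, col, lo, hi):
    return all(lo <= r[col] <= hi for r in rows)

checks = {
    "order_id not null": expect_not_null(rows, "order_id"),
    "order_id unique":   expect_unique(rows, "order_id"),
    "amount in range":   expect_between(rows, "amount", 0, 10_000),
}
failures = [name for name, ok in checks.items() if not ok]
print(failures)  # ['amount in range']
```

The frameworks add what this sketch lacks: profiling, documentation, and the plumbing to fail a pipeline run when a check fails.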
Data lakehouse architecture
Delta Lake, Apache Iceberg, and the lakehouse pattern that gives data lakes ACID transactions and schema enforcement.
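Two of those guarantees can be sketched in miniature: schema enforcement (reject writes that don't match the declared schema) and atomic commits (a batch is either fully visible or not at all). Delta Lake and Iceberg implement these with transaction logs over object storage; this toy class is invented for illustration:

```python
# Schema enforcement + all-or-nothing appends, lakehouse-style, in a toy.
SCHEMA = {"event_id": int, "amount": float}

class Table:
    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def append(self, batch: list[dict]):
        # Validate the whole batch before touching the table:
        # all-or-nothing, like a transaction-log commit.
        for row in batch:
            if set(row) != set(self.schema) or any(
                not isinstance(row[c], t) for c, t in self.schema.items()
            ):
                raise ValueError(f"schema violation: {row}")
        self.rows.extend(batch)  # only reached if every row passed

t = Table(SCHEMA)
t.append([{"event_id": 1, "amount": 9.5}])
try:
    t.append([{"event_id": 2, "amount": 3.0},
              {"event_id": 3, "amount": "oops"}])
except ValueError:
    pass
print(len(t.rows))  # 1 — the bad batch left no partial write behind
```

Plain data lakes lack exactly this: a failed write can leave half a batch of files behind, which downstream readers then see.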
Data governance & security
Data cataloging, lineage, access control, PII handling, and the compliance requirements data engineers must operationalize.
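One piece of PII handling can be sketched concretely: pseudonymize identifiers with a salted hash (stable, so they stay joinable) and redact free-text fields outright. Field names and the salt are invented; in practice the column classifications come from a catalog, not a hardcoded set:

```python
# A small PII-handling sketch: hash identifiers for joinability
# without exposure, and redact free-text fields.
import hashlib

PII_HASH = {"email"}  # pseudonymize: stable hash, still joinable
PII_DROP = {"notes"}  # redact outright

def mask(record: dict, salt: str = "per-env-secret") -> dict:
    out = {}
    for key, value in record.items():
        if key in PII_HASH:
            digest = hashlib.sha256((salt + str(value)).encode())
            out[key] = digest.hexdigest()[:12]
        elif key in PII_DROP:
            out[key] = "[REDACTED]"
        else:
            out[key] = value
    return out

row = {"user_id": 7, "email": "ada@example.com", "notes": "called support"}
print(mask(row))
```

Because the hash is deterministic per salt, the masked column still joins across tables — but rotate or leak the salt and that property changes, which is why salts are environment-scoped secrets.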
Infrastructure for data
Terraform for data infrastructure, cost optimization for data warehouses, and running data platforms on Kubernetes.
Real-time analytics
Combining batch and streaming for hybrid architectures. ksqlDB, Flink SQL, and building dashboards on live data.
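The core streaming-SQL operation can be hand-rolled for intuition: a tumbling window buckets events by time and aggregates each bucket, like a ksqlDB or Flink SQL windowed `GROUP BY`. The event data below is invented:

```python
# A tumbling-window aggregation: bucket events into fixed,
# non-overlapping time windows and aggregate each bucket.
from collections import defaultdict

events = [  # (epoch seconds, value) — invented clickstream-style data
    (0, 1), (12, 1), (31, 1), (45, 1), (59, 1), (61, 1),
]

def tumbling_counts(events, size_s: int = 30) -> dict[int, int]:
    windows = defaultdict(int)
    for ts, value in events:
        window_start = (ts // size_s) * size_s  # bucket the timestamp
        windows[window_start] += value
    return dict(windows)

print(tumbling_counts(events))  # {0: 2, 30: 3, 60: 1}
```

The hard parts the engines solve — late-arriving events, watermarks, and emitting results before a window closes — are exactly what this batch-style sketch sidesteps.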
Capstone — build a data platform
Design, build, and document a production data platform that a data team can rely on.
Trek complete. What's next?
You've walked the full roadmap. Now ship the capstone, write about it, and share the path with the next engineer who needs it.