Dagster

Table of contents

What is Dagster?

Dagster is a modern data orchestration platform designed for building, deploying, and maintaining data applications. It was created by Elementl (now Dagster Labs) and provides a Python-native approach to data engineering with a focus on development productivity, observability, and data quality.

Dagster is particularly well-suited for:

  • Data Engineering: ETL/ELT pipelines, data transformation workflows
  • Data Platforms: Building internal data platforms and data products
  • ML Operations: Model training pipelines, feature stores, ML workflows
  • Analytics Engineering: dbt integration, analytics workflows

Key features of Dagster include:

  • Python-native: Workflows are defined as Python code using assets and ops
  • Asset-centric: Data assets are first-class citizens, enabling data lineage and dependency tracking
  • Type system: Built-in type checking and validation for data quality
  • Observability: Rich UI for monitoring assets, runs, and data lineage
  • Development experience: Local development, testing, and debugging tools
  • Integration: Strong integration with popular data tools (dbt, Spark, Pandas, etc.)