SLIDE 1 Nick Schrock
Founder, Elementl
@schrockn
SLIDE 2
SLIDE 3
“Our data is totally broken”
SLIDE 4 “Our data is totally broken”
- We don’t know where our data comes from
- We don’t know what it means
- We cannot reliably process and test it
- Our engineers don’t want to deal with it
- It isn’t “fun.” It isn’t “sexy.”
SLIDE 5
SLIDE 6
SLIDE 7 WHAT THEY SAY
Data Cleaning, My Job
SLIDE 8 WHAT THEY MEAN
My Job, Not my job, Data Cleaning
SLIDE 9 WHAT THEY MEAN
My Job, Not my job, Data Cleaning
SLIDE 10 WHAT THEY MEAN
- Rolling their own infrastructure
- Repeated work
- Maintaining unreliable processes
My Job, Not my job, Data Cleaning
SLIDE 11
FAILURE IS THE NORM
SLIDE 12
Business Leader: Failure is the norm.
Data Scientist: I waste most of my time.
Engineers: I don’t want to touch it.
SLIDE 13 2009: UI development is awful
- I spend 80% of my time fighting the browser
SLIDE 14 2009: UI development is awful
- We can’t change our UI: there’s no testing
- It breaks all the time
- Our engineers don’t want to touch it
- I spend 80% of my time fighting the browser
SLIDE 15 2019: A (UI) world transformed
Browsers did get better. But it was the software abstractions that proved decisive.
SLIDE 16
React acknowledged complexity
It respected the discipline
Scripts → Full applications
SLIDE 17
React: Frontend Applications
Dagster: Data Applications
SLIDE 18 PRINCIPLES
- Solves a real problem
- Incremental adoption path
- Preserve tools that work
- Immediate value and productivity gains
SLIDE 19
SLIDE 20 Data Applications
Graphs of functional computations that produce and consume data assets
SLIDE 21
> pip install dagster
SLIDE 22
SLIDE 23
SLIDE 24 DAGSTER CONCEPTS
- Solid: A unit of functional computation
- Pipeline: A DAG of solids
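The two concepts above can be sketched in plain Python. This is a minimal illustration and not Dagster's API: the function names, the `PIPELINE` dictionary, and `run_pipeline` are all hypothetical, and `graphlib` assumes Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for illustration only; this is NOT Dagster's API.


def load_numbers():
    """A 'solid' with no upstream inputs: it produces data."""
    return [1, 2, 3]


def double(numbers):
    """A 'solid' that consumes one upstream output and produces a new one."""
    return [n * 2 for n in numbers]


def total(numbers):
    """A 'solid' that reduces its input to a single value."""
    return sum(numbers)


# The 'pipeline': each solid name maps to (function, names of upstream solids).
PIPELINE = {
    "load_numbers": (load_numbers, []),
    "double": (double, ["load_numbers"]),
    "total": (total, ["double"]),
}


def run_pipeline(pipeline):
    """Run every solid in dependency order and return all outputs by name."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = pipeline[name]
        outputs[name] = fn(*(outputs[dep] for dep in deps))
    return outputs


results = run_pipeline(PIPELINE)
```

Each function stays an ordinary, testable unit; the graph only records how outputs flow between them.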
SLIDE 26
SLIDE 27
SLIDE 28 DAGSTER CONCEPTS
- Solid
- Inputs: Inputs are the data
- Config: Config modifies how data is computed
- Pipeline
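The split between inputs and config might look like the sketch below. It is plain Python rather than Dagster's API; `filter_rows` and the `min_value` key are hypothetical names chosen for illustration.

```python
# Hypothetical sketch; NOT Dagster's API.


def filter_rows(rows, config):
    """`rows` is the input data; `config` changes how it is computed."""
    threshold = config["min_value"]
    return [row for row in rows if row["value"] >= threshold]


rows = [{"id": 1, "value": 5}, {"id": 2, "value": 50}]

# Same input data, two different configurations.
strict = filter_rows(rows, {"min_value": 10})
lenient = filter_rows(rows, {"min_value": 0})
```

Keeping config separate from inputs lets the same computation be re-run with different settings without touching the upstream data.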
SLIDE 29
SLIDE 30
SLIDE 31
SLIDE 32
SLIDE 33 DAGSTER CONCEPTS
- Solid
- Inputs & Config
- Pipeline
- Dependencies
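Dependencies are what make a pipeline a DAG: a solid can fan in from several upstream solids, but cycles are not allowed. A minimal sketch of both properties, using hypothetical solid names and the standard-library `graphlib` module (Python 3.9 or newer) rather than Dagster itself:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical solid names; each maps to the set of solids it depends on.
deps = {
    "clean_orders": {"load_orders"},
    "clean_users": {"load_users"},
    "join": {"clean_orders", "clean_users"},  # fan-in from two branches
}

# One valid execution order: every solid appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())


def is_valid_dag(graph):
    """Return True if the dependency graph has no cycles."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False
```

Here `is_valid_dag({"a": {"b"}, "b": {"a"}})` is rejected because the two solids depend on each other.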
SLIDE 35
SLIDE 36
SLIDE 37 DAGSTER CONCEPTS
- Solid
- Inputs & Config
- Pipeline
- Dependencies
- Context
- Logging: Structured Logging
- Resources: Connections, Services, etc.
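One way to picture the context is as an object handed to each solid that carries a logger and named resources. The sketch below is an assumption-laden illustration, not Dagster's classes: `Context`, `InMemoryWarehouse`, and `persist_users` are hypothetical names.

```python
import logging
from dataclasses import dataclass, field

# Hypothetical sketch; NOT Dagster's API.


@dataclass
class Context:
    """Carries the logger and the resources a solid may need."""
    log: logging.Logger
    resources: dict = field(default_factory=dict)


class InMemoryWarehouse:
    """Stand-in resource; a real one might wrap a database connection."""

    def __init__(self):
        self.tables = {}

    def write(self, table, rows):
        self.tables[table] = list(rows)


def persist_users(context, users):
    """A solid that logs through the context and writes via a resource."""
    context.log.info("writing %d rows to users", len(users))
    context.resources["warehouse"].write("users", users)
    return len(users)


warehouse = InMemoryWarehouse()
context = Context(
    log=logging.getLogger("pipeline"),
    resources={"warehouse": warehouse},
)
written = persist_users(context, [{"id": 1}, {"id": 2}])
```

Because the solid reaches the warehouse only through the context, a test can swap in an in-memory resource while production supplies a real connection.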
SLIDE 38 Beautiful, High-Quality Tools
Dagit: DAG View, Editor, Console
API
Dagster: Python library
Libraries and Integrations: PySpark
SLIDE 39 API
- Queryable and Introspectable
- Operable
- Executable and Configurable
- Monitorable
- Logging and Live Subscriptions
Dagster: a platform for building tools
Graph of Functional Computations
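To make "queryable and introspectable" concrete: if a pipeline is described as data, tools can render or inspect it without running it. The following is a rough sketch using only the standard library; `describe` and the example solids are hypothetical and do not reflect Dagster's actual API.

```python
import inspect
import json

# Hypothetical sketch; NOT Dagster's API.


def load_users():
    """Load raw user records."""
    return [{"id": 1}]


def count_users(users):
    """Count user records."""
    return len(users)


SOLIDS = [load_users, count_users]


def describe(solids):
    """Return a JSON-serializable description a tool could query or render."""
    return [
        {
            "name": fn.__name__,
            "description": inspect.getdoc(fn),
            "inputs": list(inspect.signature(fn).parameters),
        }
        for fn in solids
    ]


description = describe(SOLIDS)
print(json.dumps(description, indent=2))
```

A DAG viewer or an editor could be built on top of a description like this, which is the sense in which the graph acts as a platform for tools.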
SLIDE 40 Beautiful, High-Quality Tools
Dagit: DAG View, Editor, Console
API
Dagster: Python library
Libraries and Integrations: PySpark
SLIDE 41 Beautiful, High-Quality Tools
Dagit: DAG View, Editor, Console
API
Dagster: Python library
Libraries and Integrations: PySpark, Spark Runtime, Scala, SQL, DBs (Snowflake etc.)
SLIDE 42
SLIDE 43
SLIDE 44
- Open Source, Python Library
- Multi-lingual integration
- Beautiful Tooling
SLIDE 45 WHAT ABOUT THOSE DATA SCIENTISTS?
Current Status Quo: Engineering, Data
Where we need to go: Engineering, Data
Overlap is cultural, driven by
SLIDE 46
SLIDE 47
SLIDE 48
SLIDE 49
SLIDE 50
SLIDE 51
SLIDE 52 Beautiful, High-Quality Tools
Dagit: DAG View, Editor, Console
API
Dagster: Python library
Libraries and Integrations: PySpark, Spark Runtime, Scala, SQL, DBs (Snowflake etc.)
SLIDE 53 Beautiful, High-Quality Tools
Dagit: DAG View, Editor, Console
API
Dagster: Python library
Libraries and Integrations: PySpark, Spark Runtime, Scala, Jupyter, Papermill, SQL, DBs (Snowflake etc.)
SLIDE 54
Dagster: Python library
Libraries and Integrations: PySpark, Spark Runtime, Scala, Jupyter, Papermill, SQL, DBs (Snowflake etc.)
Data Engineering, Analysts, Data Science
SLIDE 55 Data Application: A Graph of Computations
dagit, dagster cli
API
Local Executor, dagster-airflow
Dagster Libraries and Integrations
Data Engineering, Analysts, Data Science
SLIDE 56
SLIDE 57
SLIDE 58 DATA ENGINEERING
- An emerging discipline
- At an inflection point
Scripts → Data Applications
SLIDE 59
ELEGANT PROGRAMMING MODEL
NEW, BEAUTIFUL TOOLING
FLEXIBLE AND INCREMENTAL
SLIDE 60 FLEXIBLE AND INCREMENTAL
- Use your tools
- Preserve your code
- Deploy to your infrastructure
- Adopt incrementally
SLIDE 61
And there is a ton of work to do
SLIDE 62
TEAM
Nate Kupp, Alex Langenfeld, Max Gasner
SLIDE 63
THANK YOU
Mikhail Novikov, Ben Gotow, Uma Roy
SLIDE 64
THANK YOU
Abe Gong, Superconductive Health
SLIDE 65 schrockn@elementl.com
@schrockn
https://github.com/dagster-io/dagster
Join the team. Partner with us. https://elementl.com
SLIDE 66