Nick Schrock, Founder, Elementl (@schrockn): “Our data is totally broken”

SLIDE 1

Nick Schrock

Founder, Elementl

@schrockn

SLIDE 2
SLIDE 3

“Our data is totally broken”

SLIDE 4

“Our data is totally broken”

  • We don’t know where our data comes from
  • We don’t know what it means
  • We cannot reliably process and test it
  • Our engineers don’t want to deal with it
  • It isn’t “fun.” It isn’t “sexy.”
SLIDE 5
SLIDE 6
SLIDE 7

WHAT THEY SAY

(Diagram: Data Cleaning, My Job)

SLIDE 8

WHAT THEY MEAN

(Diagram: My Job, Not my job, Data Cleaning)

SLIDE 9

WHAT THEY MEAN

(Diagram: My Job, Not my job, Data Cleaning)

SLIDE 10

WHAT THEY MEAN

  • Rolling their own infrastructure
  • Repeated work
  • Maintaining unreliable processes

(Diagram: My Job, Not my job, Data Cleaning)

SLIDE 11

FAILURE IS THE NORM

SLIDE 12

  • Business leader: “Failure is the norm.”
  • Data scientist: “I waste most of my time.”
  • Engineers: “I don’t want to touch it.”

SLIDE 13

2009: UI development is awful

  • I spend 80% of my time fighting the browser
SLIDE 14

2009: UI development is awful

  • We can’t change our UI: there’s no testing
  • It breaks all the time.
  • Our engineers don’t want to touch it
  • I spend 80% of my time fighting the browser
SLIDE 15

2019: A (UI) world transformed

Browsers did get better. But it was the software abstractions that proved decisive.

SLIDE 16

React acknowledged complexity. It respected the discipline.

From scripts to full applications

SLIDE 17

React is to frontend applications as Dagster is to data applications.

SLIDE 18
PRINCIPLES

  • Solves a real problem
  • Incremental adoption path
  • Preserve tools that work
  • Immediate value and productivity gains

SLIDE 19
SLIDE 20

Data Applications

Graphs of functional computations that produce and consume data assets

SLIDE 21

> pip install dagster

SLIDE 22
SLIDE 23
SLIDE 24

DAGSTER CONCEPTS

  • Solid: A unit of functional computation
  • Pipeline: A DAG of solids
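To make the two concepts concrete, here is a minimal plain-Python sketch. It is not Dagster's API: the `Solid` class and `run_pipeline` function are hypothetical stand-ins that model a solid as a named function and a pipeline as a DAG executed in dependency order (using the standard-library `graphlib`, Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


@dataclass
class Solid:
    """Hypothetical stand-in for a solid: a named function plus the
    names of the upstream solids whose outputs it consumes (in order)."""
    name: str
    fn: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)


def run_pipeline(solids: List[Solid]) -> Dict[str, Any]:
    """Run every solid after its upstream solids, passing their outputs in."""
    by_name = {s.name: s for s in solids}
    # graphlib expects a mapping of node -> set of predecessor nodes.
    graph = {s.name: set(s.inputs) for s in solids}
    outputs: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        solid = by_name[name]
        outputs[name] = solid.fn(*(outputs[dep] for dep in solid.inputs))
    return outputs


pipeline = [
    Solid("load_numbers", lambda: [1, 2, 3, 4]),
    Solid("double", lambda xs: [2 * x for x in xs], inputs=["load_numbers"]),
    Solid("total", sum, inputs=["double"]),
]
results = run_pipeline(pipeline)
print(results["total"])  # 20
```

Because each solid is a pure function of its inputs, the runner alone decides ordering; the functions never call each other.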
SLIDE 25

PageRank
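The slide's own PageRank code is not recoverable. As a hedged illustration only, here is a minimal power-iteration PageRank split into two functional steps (build the link graph, then iterate), the kind of computation that would map onto separate solids. The function names, the damping factor of 0.85, and the fixed 50 iterations are assumptions, not the deck's implementation.

```python
from typing import Dict, List, Tuple


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each page to the pages it links to (every page appears as a key)."""
    adj: Dict[str, List[str]] = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    return adj


def pagerank(
    adj: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """Power iteration; rank held by pages with no out-links is spread evenly."""
    n = len(adj)
    ranks = {page: 1.0 / n for page in adj}
    for _ in range(iterations):
        dangling = sum(ranks[p] for p, outs in adj.items() if not outs)
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in adj}
        for page, outs in adj.items():
            if outs:
                share = damping * ranks[page] / len(outs)
                for dst in outs:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = pagerank(build_adjacency(edges))
print(sorted(ranks, key=ranks.get, reverse=True))
```

Each iteration preserves the total rank of 1.0, so the result can be read directly as a probability distribution over pages.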

SLIDE 26
SLIDE 27
SLIDE 28

DAGSTER CONCEPTS

  • Solid
  • Inputs: the data the solid computes on
  • Config: settings that modify how that data is computed
  • Pipeline
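The inputs/config split can be sketched in plain Python. The `top_scores` function and its `min_score`/`limit` config keys are hypothetical, not Dagster's API: the point is only that the same input data yields different results under different configurations, while the function body stays fixed.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def top_scores(config: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """`rows` is the input data; `config` only changes how it is processed."""
    threshold = config.get("min_score", 0)
    limit = config.get("limit")  # None means "keep everything"
    kept = sorted(
        (r for r in rows if r["score"] >= threshold),
        key=lambda r: r["score"],
        reverse=True,
    )
    return kept if limit is None else kept[:limit]


rows = [
    {"name": "a", "score": 5},
    {"name": "b", "score": 12},
    {"name": "c", "score": 30},
]
# Same input data, two different configurations.
strict = top_scores({"min_score": 10}, rows)
single = top_scores({"limit": 1}, rows)
print([r["name"] for r in strict])  # ['c', 'b']
print([r["name"] for r in single])  # ['c']
```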
SLIDE 29
SLIDE 30
SLIDE 31
SLIDE 32
SLIDE 33
DAGSTER CONCEPTS

  • Solid
  • Inputs & Config
  • Pipeline
  • Dependencies

SLIDE 34

(Figure: Before / After)

SLIDE 35
SLIDE 36
SLIDE 37

DAGSTER CONCEPTS

  • Solid
  • Inputs & Config
  • Pipeline
  • Dependencies
  • Context
  • Logging: structured logging
  • Resources: connections, services, etc.
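A context object bundles logging and resources so a solid does not construct them itself. The `Context` and `StructuredLog` classes below are hypothetical, not Dagster's API; an in-memory SQLite connection stands in for a real database resource. Swapping that resource (test database vs. production warehouse) would not require changing `count_users`.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StructuredLog:
    """Hypothetical structured logger: each event is a dict, not a format string."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def info(self, message: str, **fields: Any) -> None:
        self.records.append({"level": "INFO", "message": message, **fields})


@dataclass
class Context:
    """Hypothetical execution context handed to every solid."""
    solid_name: str
    log: StructuredLog
    resources: Dict[str, Any]


def count_users(context: Context) -> int:
    db = context.resources["db"]  # the solid never opens its own connection
    (count,) = db.execute("SELECT COUNT(*) FROM users").fetchone()
    context.log.info("counted users", solid=context.solid_name, count=count)
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("ada",), ("grace",)])
log = StructuredLog()
result = count_users(Context("count_users", log, {"db": conn}))
print(result)  # 2
print(log.records)
conn.close()
```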
SLIDE 38

(Diagram: Beautiful, High-Quality Tools: Dagit (DAG View, Editor, Console); API; Python library; Dagster Libraries and Integrations: PySpark)

SLIDE 39

Dagster: a platform for building tools

Graph of Functional Computations

API

  • Queryable and Introspectable
  • Operable
  • Executable and Configurable
  • Monitorable
  • Logging and Live Subscriptions
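“Queryable and introspectable” means the graph is data that tools can inspect without running it. In this hedged sketch the `PIPELINE` dictionary, `upstream_of`, and `to_json` are hypothetical helpers, not Dagster's API: one answers a lineage question, and the other serializes the graph for a separate tool such as a DAG viewer.

```python
import json
from typing import Dict, List, Set

# Hypothetical pipeline description: solid name -> names of the solids it depends on.
PIPELINE: Dict[str, List[str]] = {
    "extract": [],
    "clean": ["extract"],
    "train": ["clean"],
    "report": ["clean", "train"],
}


def upstream_of(pipeline: Dict[str, List[str]], name: str) -> Set[str]:
    """All solids that `name` transitively depends on."""
    seen: Set[str] = set()
    stack = list(pipeline[name])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(pipeline[dep])
    return seen


def to_json(pipeline: Dict[str, List[str]]) -> str:
    """Serialize the graph so a separate tool (e.g. a DAG viewer) can render it."""
    nodes = [{"name": n, "depends_on": deps} for n, deps in pipeline.items()]
    return json.dumps({"nodes": nodes})


print(sorted(upstream_of(PIPELINE, "report")))  # ['clean', 'extract', 'train']
```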

SLIDE 40

(Diagram: Beautiful, High-Quality Tools: Dagit (DAG View, Editor, Console); API; Python library; Dagster Libraries and Integrations: PySpark)

SLIDE 41

(Diagram: Beautiful, High-Quality Tools: Dagit (DAG View, Editor, Console); API; Python library; Dagster Libraries and Integrations: PySpark, Spark Runtime, Scala, DBs (Snowflake, etc.), SQL)

SLIDE 42
SLIDE 43
SLIDE 44
  • Open Source, Python Library
  • Multi-lingual integration
  • Beautiful Tooling
SLIDE 45

WHAT ABOUT THOSE DATA SCIENTISTS?

(Diagram: Engineering and Data, current status quo; Engineering and Data, where we need to go)

Overlap is cultural, driven by

SLIDE 46
SLIDE 47
SLIDE 48
SLIDE 49
SLIDE 50
SLIDE 51
SLIDE 52

(Diagram: Beautiful, High-Quality Tools: Dagit (DAG View, Editor, Console); API; Python library; Dagster Libraries and Integrations: PySpark, Spark Runtime, Scala, DBs (Snowflake, etc.), SQL)

SLIDE 53

(Diagram: Beautiful, High-Quality Tools: Dagit (DAG View, Editor, Console); API; Python library; Dagster Libraries and Integrations: PySpark, Spark Runtime, Scala, Jupyter, Papermill, DBs (Snowflake, etc.), SQL)

SLIDE 54

(Diagram: Data Engineering, Analysts, Data Science; Python library; Dagster Libraries and Integrations: PySpark, Spark Runtime, Scala, Jupyter, Papermill, DBs (Snowflake, etc.), SQL)

SLIDE 55

(Diagram: Data Application: A Graph of Computations; Data Engineering, Analysts, Data Science; API; Dagster Libraries and Integrations; dagster-; Local Executor; Airflow; dagit; dagster cli)

SLIDE 56
SLIDE 57
SLIDE 58

DATA ENGINEERING

  • An emerging discipline
  • At an inflection point

From scripts to data applications

SLIDE 59

  • ELEGANT PROGRAMMING MODEL
  • NEW, BEAUTIFUL TOOLING
  • FLEXIBLE AND INCREMENTAL

SLIDE 60
FLEXIBLE AND INCREMENTAL

  • Use your tools
  • Preserve your code
  • Deploy to your infrastructure
  • Adopt incrementally

SLIDE 61

And there is a ton of work to do

SLIDE 62

TEAM

Nate Kupp, Alex Langenfeld, Max Gasner

SLIDE 63

THANK YOU

Mikhail Novikov, Ben Gotow, Uma Roy

SLIDE 64

THANK YOU

Abe Gong, Superconductive Health

SLIDE 65

schrockn@elementl.com

@schrockn

https://github.com/dagster-io/dagster

Join the team. Partner with us. https://elementl.com

SLIDE 66