AIP-31: Airflow functional DAG (Airflow Summit 2020)





slide-1
SLIDE 1

Airflow Summit 2020

AIP-31: Airflow functional DAG

July 10, 2020

slide-2
SLIDE 2

1. Introduction
2. Why functional DAG?
3. Explicit XCom: XComArg
4. @task decorator
5. Future work

slide-3
SLIDE 3

Intro 👌

slide-4
SLIDE 4

Gerard Casas Saez

Software Engineer ML Platform - Cortex @ Twitter Follow me @casassaez

slide-5
SLIDE 5

Why functional DAG?

slide-6
SLIDE 6

Extract
  • GET request to HttpBin /get endpoint
  • Data out: HttpBin JSON string

Transform
  • Parse JSON
  • Extract origin parameter
  • Format email subject and content
  • Data out: Email subject + content strings

Load
  • Send email to myself to get current IP

Example ETL pipeline

slide-7
SLIDE 7
  • XCom value vs. execution-date-based file paths
  • Preferred: XCom. Why?
    • Sometimes the data fits in the DB! Ex: model training metrics.
    • More flexible paths: not only the date is needed, but also custom config (HDFS cluster, GCS vs HDFS…)
    • XComs are visible from the Web UI, so they are easier to debug
    • Better reusability of operators
    • Already used by a lot of OSS Airflow operators!

Passing data between operators

slide-8
SLIDE 8

Example DAG

slide-9
SLIDE 9

Example DAG

slide-10
SLIDE 10
  • ETL workflows resemble functions: Functional Data Engineering
  • Variable == data artifact ≈ XCom metadata
  • Function == operator
  • Data artifacts are implicit in Airflow (XCom table for metadata)
  • Task dependencies need to be declared explicitly
  • Turning a custom function into an operator is hard-ish (PythonOperator)

AIP-31: Motivation
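
The analogy in the bullets above can be made concrete by writing the example ETL pipeline as plain Python. This is a minimal sketch, not Airflow code: the function names, the hard-coded HttpBin-style response, and the IP address in it are illustrative assumptions, and no HTTP request or email is actually sent.

```python
import json

# Hypothetical stand-in for the body returned by a GET to HttpBin's /get endpoint.
SAMPLE_RESPONSE = '{"origin": "203.0.113.7", "url": "https://httpbin.org/get"}'


def extract() -> str:
    """Extract: return the raw JSON string (no real HTTP call in this sketch)."""
    return SAMPLE_RESPONSE


def transform(raw_json: str) -> dict:
    """Transform: parse the JSON, pull out `origin`, format the email fields."""
    origin = json.loads(raw_json)["origin"]
    return {
        "subject": f"Server connected from {origin}",
        "body": f"Your server is connected from the external IP {origin}.",
    }


def load(subject: str, body: str) -> str:
    """Load: stand-in for sending the email; returns the message text instead."""
    return f"Subject: {subject}\n\n{body}"


# Each variable plays the role of a data artifact; each call is an "operator".
raw = extract()
email = transform(raw)
message = load(email["subject"], email["body"])
print(message)
```

In plain Python the data flow between the three calls already defines the execution order, which is the property the motivation bullets say is missing when task dependencies have to be declared separately.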

slide-11
SLIDE 11
  • Streamlined (Functional) Airflow roadmap
  • TypedXComArg in ML Workflows (internal Twitter Airflow fork)
  • ML pipelines investigation
  • Prefect Functional DAG
  • Dagster pipelines and solids
  • TensorFlow Extended pipelines

  • Square’s Bionic pipelines
  • Netflix Metaflow pipelines

Prior art/Inspiration

slide-12
SLIDE 12

Explicit XCom: XComArg class

slide-13
SLIDE 13
  • Resolved on operator execution for templated fields
    • XComArg(op, 'subject') == "{{ context['ti'].xcom_pull(task_ids='op_id', key='subject') }}"
    • XComArg(op, 'subject').resolve(context) == ti.xcom_pull(task_ids='op_id', key='subject')
  • Used in DAG definition
    • Change XComArg key using __getitem__: val['body']
    • BaseOperator property to generate default XComArg: .output
    • Implicit task dependency based on XComArg dependency

XComArg: Reference to future XCom value
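
The behaviours listed above (a reference that renders as a template, resolves at execution time, changes key through `__getitem__`, and implies a task dependency) can be modelled in a few lines of plain Python. This is a toy model under stated assumptions, not Airflow's implementation: `ToyOperator`, `ToyXComArg`, the `use` method, and the dictionary standing in for the XCom table are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOperator:
    """Hypothetical operator: a task id plus the ids of its upstream tasks."""
    task_id: str
    upstream: set = field(default_factory=set)

    @property
    def output(self) -> "ToyXComArg":
        # Default reference, mirroring the `.output` property named on the slide.
        return ToyXComArg(self)

    def use(self, arg: "ToyXComArg") -> None:
        # Consuming a reference records an implicit dependency on its producer.
        self.upstream.add(arg.operator.task_id)


@dataclass
class ToyXComArg:
    """Reference to a value that will only exist once `operator` has run."""
    operator: ToyOperator
    key: str = "return_value"

    def __getitem__(self, key: str) -> "ToyXComArg":
        # Same producer, different key: val['body'] on the slide.
        return ToyXComArg(self.operator, key)

    def __str__(self) -> str:
        # Template form, rendered later for templated fields.
        return "{{ ti.xcom_pull(task_ids='%s', key='%s') }}" % (
            self.operator.task_id,
            self.key,
        )

    def resolve(self, store: dict):
        # `store` is a toy XCom table keyed by (task_id, key).
        return store[(self.operator.task_id, self.key)]


transform = ToyOperator("transform")
send_email = ToyOperator("send_email")

subject = transform.output["subject"]  # a reference; no value exists yet
send_email.use(subject)                # dependency inferred from the reference

store = {("transform", "subject"): "Server connected from 203.0.113.7"}
print(str(subject))
print(subject.resolve(store))
print(send_email.upstream)
```

The point of the design is that the reference is created at DAG-definition time, while the value is only looked up when the consuming task executes.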

slide-14
SLIDE 14

Example DAG

slide-15
SLIDE 15

Example DAG

slide-16
SLIDE 16

@task decorator

slide-17
SLIDE 17

Python function to Airflow operator

slide-18
SLIDE 18
  • Usage:
    • @airflow.decorators.task
    • @dag.task
  • Calling the decorated function generates a PythonOperator
    • Sets op_args and op_kwargs
  • Multiple outputs support: return a dictionary with string keys
  • Generates task ids automatically
  • Returns the default XComArg when called
  • [UPCOMING] No context kwarg support; use get_current_context() instead

@task decorator
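
To illustrate the mechanics in the bullets above, here is a toy decorator written in plain Python. It is a sketch under stated assumptions rather than Airflow's `@task`: `ToyTask`, this `task` function, the `name__N` id scheme, and the dictionary used as an XCom table are hypothetical. For brevity, calling the decorated function returns the toy task object itself, whereas the slide describes returning the default XComArg.

```python
import functools
from collections import Counter

_calls = Counter()  # how many tasks were already created per function name


class ToyTask:
    """Hypothetical stand-in for the operator created by a decorated call."""

    def __init__(self, func, op_args, op_kwargs, multiple_outputs):
        n = _calls[func.__name__]
        _calls[func.__name__] += 1
        # Toy id scheme: first call keeps the function name, later calls get a suffix.
        self.task_id = func.__name__ if n == 0 else f"{func.__name__}__{n}"
        self.func = func
        self.op_args = op_args
        self.op_kwargs = op_kwargs
        self.multiple_outputs = multiple_outputs

    def execute(self, store: dict):
        """Run the wrapped function and record its result in the toy XCom table."""
        result = self.func(*self.op_args, **self.op_kwargs)
        store[(self.task_id, "return_value")] = result
        if self.multiple_outputs:
            # A returned dict with string keys is also stored one key at a time.
            for key, value in result.items():
                store[(self.task_id, key)] = value
        return result


def task(func=None, *, multiple_outputs=False):
    """Toy decorator: calling the decorated function builds a task, it does not run it."""

    def decorator(f):
        @functools.wraps(f)
        def factory(*args, **kwargs):
            return ToyTask(f, args, kwargs, multiple_outputs)

        return factory

    # Support both `@task` and `@task(multiple_outputs=True)`.
    return decorator(func) if func is not None else decorator


@task(multiple_outputs=True)
def prepare_email(origin):
    return {"subject": f"Server connected from {origin}", "body": f"IP: {origin}"}


first = prepare_email("203.0.113.7")           # captured as op_args
second = prepare_email(origin="198.51.100.2")  # captured as op_kwargs

store = {}
first.execute(store)
print(first.task_id, second.task_id)
print(store[(first.task_id, "subject")])
```

Deferring the function body until `execute` is what lets the call site inside a DAG definition stay cheap: the call only records arguments and an id.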

slide-19
SLIDE 19

Example DAG

slide-20
SLIDE 20

Example DAG

slide-21
SLIDE 21

Future work! 🚁

slide-22
SLIDE 22
  • @dag decorator: same concept as @task, but to create a DAG
    • Function kwargs == DAG parameters
  • Type hints support for multiple outputs
    • Automatically detect if the output must be split into different XCom values
  • Custom XCom backends
    • Handle serialization for specific Python classes
    • Handle I/O for different centralized local file systems: HDFS, GCS, S3...
    • Ex: Serialize/deserialize pandas from/into CSV in HDFS when used for XCom values

Future work + Contributions
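
The custom XCom backend idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: `ToyCsvXComBackend` and its method names are hypothetical, a temporary local directory stands in for HDFS, a list of dictionaries stands in for a pandas DataFrame, and the metric values are made up.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for a shared store such as HDFS, GCS, or S3 (assumption for this sketch).
STORAGE_ROOT = Path(tempfile.mkdtemp())


class ToyCsvXComBackend:
    """Hypothetical backend: write tabular data to CSV, keep only the path as the XCom value."""

    @staticmethod
    def serialize_value(rows, name):
        # Write the rows out and return a small reference that fits in the metadata DB.
        path = STORAGE_ROOT / f"{name}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        return str(path)

    @staticmethod
    def deserialize_value(stored_path):
        # Load the rows back when a downstream task asks for the value.
        with open(stored_path, newline="") as handle:
            return list(csv.DictReader(handle))


# Made-up rows; values are strings because CSV does not preserve types.
rows = [
    {"metric": "accuracy", "value": "0.93"},
    {"metric": "loss", "value": "0.21"},
]
reference = ToyCsvXComBackend.serialize_value(rows, "training_metrics")
restored = ToyCsvXComBackend.deserialize_value(reference)
print(reference)
print(restored == rows)
```

Storing only the path keeps large artifacts out of the metadata database while tasks still exchange them as if they were ordinary XCom values.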

slide-23
SLIDE 23

Custom XCom backend

slide-24
SLIDE 24

@dag decorator

slide-25
SLIDE 25

Last but not least. Not working alone: Functional Ops SIG

slide-26
SLIDE 26

Kudos to...

  • Contributors for AIP-31
  • Tomek Urbaszek
  • Evgeny Shulman
  • Jonathan Shir

+ Airflow reviewers and committers (Kaxil, Ash, Jarek, Dan…)

slide-27
SLIDE 27

Questions? 🤕

slide-28
SLIDE 28

Thank you. 👌