Production Docker Image for Apache Airflow
Airflow Summit 2020 - 14.07.2020
Polidea
Hi!
Jarek Potiuk
Apache Airflow: PMC Member and Committer
Polidea: Principal Software Engineer (ex-CTO)
Airflow Summit: Co-Organizer: Content (Lead)
@higrys
Intro: What questions will be answered?
○ What are container images and why are they important?
○ What did it look like so far?
○ What is it going to look like now?
○ What is in the image?
○ How do we test the image?
○ How to extend the Airflow image?
○ How to customize the Airflow image?
○ How can you use the image?
○ What’s next?
Intro: What is this talk NOT about?
○ https://docker-curriculum.com/
○ https://github.com/apache/airflow/blob/master/IMAGES.rst
○ “Airflow on Kubernetes” by Michael Hewitt: https://www.crowdcast.io/e/airflowsummit/6
Intro: Who is the talk for?
Context: What is a container?
○ OCI: https://opencontainers.org/
[Diagram: Container, Container image]
Context: Container ≠ Docker
○ Building, Running, Sharing containers
[Diagram: Container execution engine, Container registry, Container management CLI]
Context: What is a Container file?
FS Layers
Context: Container Lifecycle: Build
[Diagram: Container Image file (Dockerfile), Container execution engine, Container image, Container registry; step shown: Build]
Context: Container Lifecycle: Run
[Diagram: Container Image file (Dockerfile), Container execution engine, Container image, Container registry; step shown: Run]
Context: Container Lifecycle: Push
[Diagram: Container Image file (Dockerfile), Container execution engine, Container image, Container registry; step shown: Push]
Context: Container Lifecycle: Pull
[Diagram: Container Image file (Dockerfile), Container execution engine, Container image, Container registry; step shown: Pull]
Context: Why are containers important?
Status: History of Containers in Airflow: CI
Status: History of Containers in Airflow: Prod
○ Used by many users in production
○ Used by the publicly available Helm Chart (not managed by the community)
○ Alpha-quality community image in 1.10.10
○ Beta-quality community image in 1.10.11 (now!)
Status: State of the Official Production image
Internals: DockerHub releases
[Diagram: Released image]
Internals: Releasing the image
Container Image or Container File?
Internals: Features of the production image
○ async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv
Internals: Features of the production image file
Internals: build image
[Diagram: Build image]
(side comment) ~730 modules, ~360 MB
Install to ${HOME}/.local
Internals: main image
Main image
Internals: entrypoint
missing (OpenShift)
command
Internals: .dockerignore
want by “!”
subdirectories/patterns
sources
takes time
unneeded artifacts
Internals: How do we test the image?
Usage: Extending Airflow image - use released image
Container image
Container registry
apache/airflow:1.10.11
docker build . -t yourcompany/airflow:1.10.11-BUILD_ID
yourcompany/airflow:1.10.11-BUILD_ID
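A minimal sketch of a Dockerfile that the build command above could use. It assumes the released image provides an unprivileged `airflow` user and that Python packages are installed into ${HOME}/.local (hence `--user`). `beautifulsoup4` is only a stand-in for whatever extra dependency your DAGs need; the talk does not prescribe it.

```dockerfile
# Start from the released community image.
FROM apache/airflow:1.10.11

# Assumption: the image defines an unprivileged "airflow" user.
USER airflow

# Example dependency only; replace with your own requirements.
RUN pip install --no-cache-dir --user beautifulsoup4
```

Running `docker build . -t yourcompany/airflow:1.10.11-BUILD_ID` next to this file would then produce the company-specific tag, with BUILD_ID replaced by your own build identifier.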
Usage: Extending image - Pros & Cons
Pros
Cons
dependencies
Usage: Customising Airflow image - default docker build
Container image
Same as apache/airflow:1.10.11
Usage: Customising Airflow image - use build args
Usage: Image Customization options
See IMAGES.rst in the Airflow repo.
Usage: It’s a Breeze to build images
environment
production images:
https://s.apache.org/airflow-breeze
See BREEZE.rst in the Airflow repo
Usage: Customising image - Pros & Cons
Pros
(security reviews!)
Cons
Usage: Why not have your cake and eat it too?
Base Container image: rebuilt when dependencies change (base-image-for-your-company:1.10.11-2020-07-14)
Runtime Container image: rebuilt when DAGs change
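One way to read the base/runtime split, as a sketch: the base image carries the dependencies and is rebuilt rarely, while a thin runtime image only adds DAGs on top of it. The sketch assumes a local `dags/` directory in the build context and that the base image sets `AIRFLOW_HOME`; neither is stated on the slide.

```dockerfile
# Runtime image: rebuilt whenever DAGs change.
# The base tag is the one from the slide; the base is rebuilt only when
# dependencies change.
FROM base-image-for-your-company:1.10.11-2020-07-14

# Assumption: DAGs live in ./dags next to this Dockerfile and
# AIRFLOW_HOME is inherited from the base image.
COPY dags/ ${AIRFLOW_HOME}/dags/
```

Because only the final COPY layer differs between DAG releases, each runtime build can reuse the cached base layers.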
Usage: How to deploy the images?
○ Managed: Amazon ECS, Google Container on VMs, Azure Container Instances
○ Helm Chart
○ Airflow Operator (not recommended yet)
Future: What is the future for Airflow images?
○ ONBUILD support?
○ AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD, AIRFLOW__CELERY__BROKER_URL_CMD support?
○ Automated user creation?
hello@polidea.com