Greg Neiheisel CTO Astronomer Data Engineering Platform Streaming - - PowerPoint PPT Presentation

SLIDE 1

Greg Neiheisel

CTO

SLIDE 2

Astronomer

Streaming data
Data pipelines
Code first ETL

Data Engineering Platform

SLIDE 3

Early Priorities

Quick prototyping
Get data in motion
Ease of scale

SLIDE 4

Astronomer V1

Lambda + API Gateway
CloudWatch for monitoring
Kinesis + Elastic Beanstalk

SLIDE 5

Trouble in paradise

SLIDE 6

Strategic Obstacles

Companies view Amazon as direct competition
Acquisition talks
Open source philosophy

SLIDE 7

Engineering Obstacles

Access to customer data
Need a better tool for ETL
Deeply ingrained in the AWS ecosystem

SLIDE 8

Single Unified Platform

SLIDE 9

DC/OS at Astronomer

Apache Airflow & Spark on Mesos
Marathon (Kubernetes?) replaces Elastic Beanstalk
Foundation for open source DE platform

SLIDE 10

Apache Airflow

SLIDE 11

Airflow on Mesos

Leverage community-contributed Mesos executor
Up and running quickly
Scales to millions of tasks daily

SLIDE 12

Airflow at Astronomer

Behind the scenes to managed service
Intelligent Redshift loading
Dependency driven tasks
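
"Dependency driven tasks" can be illustrated with a minimal sketch: each task names the tasks it depends on, and the scheduler derives a valid run order. This uses only Python's standard library (`graphlib`, Python 3.9+) and is not the Airflow API; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "extract_events": set(),
    "transform_events": {"extract_events"},
    "load_redshift": {"transform_events"},
    "refresh_reports": {"load_redshift"},
}


def run_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In this sketch the Redshift load can only start once the transform step has finished, which is the property a dependency-driven scheduler is meant to guarantee.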

SLIDE 13

Not all AWS tools are created equal

SLIDE 14

Kinesis to Kafka

SLIDE 15

Issues with Kinesis

Buggy Kinesis Client Library
Not available everywhere
Unable to tap into the Kafka ecosystem

SLIDE 16

The road to Kafka

Rewriting API and processors in Go
Improve provisioning, monitoring and testing
Run systems in parallel

SLIDE 17

Kong and the inevitable end of API Gateway

SLIDE 18

Kong

Replaces API Gateway
Auth, rate limiting, Lambda invocations for APIs
Backed by Cassandra
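
As an illustration of the rate limiting mentioned above, here is a minimal sketch of a fixed-window limiter, one common way a gateway can cap requests per consumer. It is not Kong's implementation or configuration; the class name, parameters, and in-memory store are assumptions made for the example.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per consumer in each fixed time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.counts = {}  # (consumer, window index) -> requests seen so far

    def allow(self, consumer):
        """Return True and count the request if the consumer is under its limit."""
        window = int(self.clock() // self.window_seconds)
        key = (consumer, window)
        count = self.counts.get(key, 0)
        if count >= self.limit:
            return False
        # Old windows are never evicted here; a real store would expire them.
        self.counts[key] = count + 1
        return True
```

A gateway would typically keep these counters in a shared datastore rather than in process memory, so that every gateway node sees the same counts.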

SLIDE 19

CloudFormation + Ansible to Terraform

SLIDE 20

Terraform

Infrastructure as code
100% repeatable installs
Ease of scale

SLIDE 21

Rebuilding CloudWatch

SLIDE 22

Prometheus

All nodes monitored out of the box
Write our own exporters
Ease of scale
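
To make "write our own exporters" concrete: an exporter is a process that serves metrics over HTTP in the Prometheus text exposition format. The sketch below only renders that text for a gauge, using the standard library; the metric name and labels are hypothetical, and label values are assumed not to need escaping.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_text}}}" if label_text else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


print(render_gauge("queue_depth", "Messages waiting", [({"topic": "events"}, 3)]))
```

In practice an official client library would usually handle rendering and serving; the sketch is only meant to show how little a custom exporter has to produce.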

SLIDE 23

ELK

Centralized logging
Aggregated queries across instances

SLIDE 24

KairosDB

Time series events collected via REST
Extremely durable, backed by Cassandra
Rollups must be handled externally
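
"Rollups must be handled externally" means some separate job has to turn raw points into coarser summaries. A minimal sketch of such a job's core, assuming raw points arrive as (timestamp in milliseconds, value) pairs and that a per-bucket mean is the desired rollup; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def rollup(datapoints, bucket_ms):
    """Average raw (timestamp_ms, value) points into fixed-width time buckets.

    Returns a list of (bucket_start_ms, mean_value) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp_ms, value in datapoints:
        # Align each point to the start of the bucket that contains it.
        bucket_start = timestamp_ms - (timestamp_ms % bucket_ms)
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


raw = [(0, 1.0), (30_000, 3.0), (60_000, 10.0), (90_000, 20.0)]
print(rollup(raw, 60_000))  # one-minute buckets
```

The rolled-up points could then be written back as a separate, lower-resolution series.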

SLIDE 25

R&D

Kafka Connect sources/sinks
Ceph or Minio
Istio, Weave, Kubernetes
Druid

SLIDE 26

Astronomer.io

Greg Neiheisel
Twitter: @schniebot
LinkedIn: greg-neiheisel