The history and anatomy of Apache Superset, Maxime Beauchemin

SLIDE 1

The history and anatomy of Apache Superset

SLIDE 2

Maxime Beauchemin

  • Open source leader & community builder

      ○ Creator of Apache Superset
      ○ Creator of Apache Airflow

  • Digital artist
  • Influencer in the data engineering space
  • 15+ years of experience in data & analytics
  • Entrepreneur
SLIDE 3

Apache Superset

A data visualization and exploration platform

SLIDE 4

SLIDE 5

SLIDE 6

SLIDE 7

SLIDE 8

SLIDE 9

Apache Superset

A data visualization and exploration platform

  • Easy-to-use & fast “time-to-dashboard”
  • Enterprise-ready (RBAC) & cloud-native
  • Richest set of visualizations (50+)
      ○ Solid geospatial visualization
  • Lightweight semantic layer
  • Works with a wide array of databases
  • Deep integration with Druid
  • A thriving and growing community
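
One way to picture the “lightweight semantic layer” bullet: metrics and dimensions are named SQL expressions attached to a table, and a chart request is turned into a query by looking those names up. The sketch below only illustrates that idea and is not Superset's implementation; `METRICS`, `DIMENSIONS`, `build_query` and the `orders` table are hypothetical names.

```python
from typing import Dict, List

# Hypothetical semantic definitions for a single table: name -> SQL expression.
METRICS: Dict[str, str] = {
    "count": "COUNT(*)",
    "revenue": "SUM(price * quantity)",
}
DIMENSIONS: Dict[str, str] = {
    "country": "country",
    "order_month": "DATE_TRUNC('month', ordered_at)",
}


def build_query(table: str, metrics: List[str], dimensions: List[str]) -> str:
    """Expand metric and dimension names into a GROUP BY query string."""
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
    return sql


print(build_query("orders", ["revenue"], ["country"]))
```

The chart author picks `revenue` by `country`; the SQL expression behind each name is defined once, next to the table.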

SLIDE 10

The early days

Panoramix, then Caravel (the project's earlier names)

SLIDE 11

SLIDE 12

The Superset Project

  • Thriving & accelerating open source community
  • Most promising open source BI solution
  • 1,500 weekly active users (WAU) at Airbnb (where it replaced Tableau), 400 WAU at Lyft
  • 12+ committed engineers at 3 leading tech companies

SLIDE 13

Stack

ES6 JavaScript Frontend

  • React / Redux
  • webpack / eslint / jest
  • Broken down into many packages (@superset-ui/*)
  • nvd3, data-ui (VX), blocks, ...

Python Backend

  • Flask.* + Flask-AppBuilder
  • Pandas
  • SQLAlchemy (ORM + SQL Toolkit)
  • Many utility libs (sqlparse, dateutil, ...)

SLIDE 14

Architecture

  • WSGI Web Server(s)
  • Metadata: MySQL, Postgres, MariaDB, ...
  • Chart Cache (optional): Redis, Memcached, ...
  • Analytics Databases: Druid, Presto, Hive, Redshift, BigQuery, MySQL, Postgres, Snowflake, MemCached, ...

Async Infra [optional]

  • Celery Worker(s)
  • Message Queue: Redis, RabbitMQ, ...
  • Results Cache [optional]: S3, HDFS, ...
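
A minimal sketch of how the pieces on this slide might be wired together in a `superset_config.py`. The key names follow Superset's documented configuration as best I recall for the incubator-era releases, so check them against the version you run; every hostname, port, credential and database name is a placeholder, not a value from the talk. The results cache (S3, HDFS, ...) is left out because it needs a cache object from an extra library.

```python
# superset_config.py (sketch). All URLs below are placeholders.

# Metadata database (MySQL, Postgres, MariaDB, ...), reached through SQLAlchemy.
SQLALCHEMY_DATABASE_URI = "postgresql://superset:superset@metadata-db:5432/superset"

# Optional chart cache (Redis, Memcached, ...), configured Flask-Caching style.
CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60,  # seconds
    "CACHE_KEY_PREFIX": "superset_chart_",
    "CACHE_REDIS_URL": "redis://cache:6379/0",
}


class CeleryConfig:
    # Message queue (Redis, RabbitMQ, ...) through which the web servers
    # hand long-running queries to the Celery workers.
    BROKER_URL = "redis://queue:6379/1"
    # Where Celery keeps task state.
    CELERY_RESULT_BACKEND = "redis://queue:6379/2"
    # Module that holds the asynchronous SQL Lab tasks.
    CELERY_IMPORTS = ("superset.sql_lab",)


# Optional async infrastructure: leave this out to run queries synchronously.
CELERY_CONFIG = CeleryConfig
```

Only the metadata database is mandatory here; the cache and Celery sections can be dropped for a small single-node install.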

SLIDE 15

Challenges

SLIDE 16

Challenge: a fast-paced repo

SLIDE 17

Challenge: a huge dependency tree

JavaScript:

  • 88 production packages
  • 68 dev packages
  • ls node_modules/ | wc -l == 1242

Python:

  • 35 direct dependencies
  • ~66 leaves in the dep tree
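
The gap between “direct dependencies” and “leaves in the dep tree” comes from transitive requirements. A small sketch of how those counts can be derived from a dependency graph; the `DEPS` graph is a made-up toy example, not Superset's actual tree, and `transitive_deps` and `leaves` are hypothetical helper names.

```python
from typing import Dict, List, Set

# Toy dependency graph: package -> packages it requires directly.
DEPS: Dict[str, List[str]] = {
    "app": ["flask", "pandas"],
    "flask": ["werkzeug", "jinja2"],
    "jinja2": ["markupsafe"],
    "pandas": ["numpy", "python-dateutil"],
    "python-dateutil": ["six"],
    "werkzeug": [],
    "markupsafe": [],
    "numpy": [],
    "six": [],
}


def transitive_deps(root: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Every package reachable from root, excluding root itself."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    return seen


def leaves(root: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Reachable packages that have no dependencies of their own."""
    return {pkg for pkg in transitive_deps(root, graph) if not graph.get(pkg)}


print(len(DEPS["app"]), len(transitive_deps("app", DEPS)), len(leaves("app", DEPS)))
# prints: 2 8 4
```

Two direct dependencies already pull in eight packages in this toy graph, four of them leaves.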
SLIDE 18

Challenge: Release Management

SLIDE 19

Challenge: Coordination

SLIDE 20

Challenge: ASF bureaucracy

SLIDE 21

Roadmap

  • Steady Apache-approved releases
  • Quality & polish ++
  • Thumbnails + cards!
  • A formal data access layer API
  • Embeddable components
  • Schedule simple data pipelines

SLIDE 22

What’s next!?

  • Automated root cause analysis & anomaly detection
  • Assisted dashboard generation
  • Collaborative workspaces & social features
  • Mobile!
  • Data governance & auditing
  • Integrated notebooks
  • Storytelling
  • Specialized visualization packages
  • ML models introspection
  • Alerts, notifications, email/mobile delivery

SLIDE 23

Conclusion

  • I’m looking to help companies onboard!
  • Interested in working on Superset!?

max@preset.io
github.com/apache/incubator-superset
apache-superset.slack.com