The history and anatomy of Apache Superset, by Maxime Beauchemin


  1. The history and anatomy of Apache Superset

  2. Maxime Beauchemin
      ● Open source leader & community builder
          ○ Creator of Apache Superset
          ○ Creator of Apache Airflow
      ● Digital artist
      ● Influencer in the data engineering space
      ● 15+ years of experience in data & analytics
      ● Entrepreneur

  3. Apache Superset: A data visualization and exploration platform

  9. Apache Superset: A data visualization and exploration platform
      ● Easy-to-use & fast “time-to-dashboard”
      ● Enterprise-ready (RBAC) & cloud-native
      ● Richest set of visualizations (50+)
          ○ Solid geospatial visualization
      ● Lightweight semantic layer
      ● Works with a wide array of databases
      ● Deep integration with Druid
      ● A thriving and growing community

  10. The early days: Caravel, Panoramix (the project's earlier names)

  12. The Superset Project
      ● Thriving & accelerating open source community
      ● Most promising open source BI solution
      ● 1500 WAU at Airbnb (replaced Tableau), 400 WAU at Lyft
      ● 12+ committed engineers at 3 leading tech companies

  13. Stack
      ES6 JavaScript frontend
      ● React / Redux
      ● webpack / eslint / jest
      ● Broken down into many packages: @superset-ui/*
      ● nvd3, data-ui (VX), blocks, ...
      Python backend
      ● Flask.* + Flask App Builder
      ● Pandas
      ● SQLAlchemy (ORM + SQL Toolkit)
      ● Many utility libs (sqlparse, dateutils, ...)

  14. Architecture
      ● WSGI Web Server(s)
      ● Metadata database: MySQL, Postgres, MariaDB, ...
      ● Chart Cache (optional): Redis, Memcached, ...
      ● Async Infra [optional]
          ○ Celery Worker(s)
          ○ Message Queue: Redis, RabbitMQ, ...
          ○ Results Cache [optional]: S3, HDFS, ...
      ● Analytics Databases: Druid, Presto, Hive, Redshift, BigQuery, MySQL, Postgres, Snowflake, MemCached, ...
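
The role of the optional chart cache in this architecture can be illustrated with a small cache-aside sketch. It is not Superset's code: `ChartDataService`, `make_analytics_db`, and the `events` table are hypothetical names, an in-memory SQLite database stands in for the analytics database, and a plain dict stands in for Redis or Memcached.

```python
import hashlib
import json
import sqlite3


def make_analytics_db():
    """Build a tiny in-memory database (stand-in for Druid, Presto, ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (country TEXT, clicks INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("US", 3), ("US", 4), ("FR", 5)],
    )
    return conn


class ChartDataService:
    """Hypothetical web-server helper: look in the cache first, else query."""

    def __init__(self, conn, cache=None):
        self.conn = conn
        # A dict stands in for the chart cache (Redis, Memcached, ...).
        self.cache = cache if cache is not None else {}
        self.db_hits = 0  # counts queries that reached the database

    @staticmethod
    def cache_key(sql, params):
        # Hash the query text and parameters so equal requests share a key.
        payload = json.dumps({"sql": sql, "params": list(params)}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_chart_data(self, sql, params=()):
        key = self.cache_key(sql, params)
        if key in self.cache:
            return self.cache[key]  # cache hit: the database is not touched
        self.db_hits += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows  # populate the cache for later requests
        return rows


service = ChartDataService(make_analytics_db())
QUERY = "SELECT country, SUM(clicks) FROM events GROUP BY country ORDER BY country"
first = service.get_chart_data(QUERY)   # goes to the database
second = service.get_chart_data(QUERY)  # served from the cache
```

A real deployment would also need expiry and invalidation, which this sketch leaves out.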

  15. Challenges

  16. Challenge: a fast-paced repo

  17. Challenge: a huge dependency tree
      JavaScript
      ● 88 production packages
      ● 68 dev packages
      ● ls node_modules/ | wc -l == 1242
      Python
      ● 35 direct dependencies
      ● ~66 leaves in the dep tree
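
The gap between direct dependencies and leaves of the dependency tree can be shown on a toy graph. The slide does not say how its numbers were computed, so this is only one plausible reading: a leaf is taken to be a reachable package that depends on nothing else. The function `leaf_packages` and every package name below are made up for illustration.

```python
def leaf_packages(dep_graph, roots):
    """Return packages reachable from `roots` that have no dependencies."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already visited through another path
        seen.add(pkg)
        stack.extend(dep_graph.get(pkg, []))
    return {pkg for pkg in seen if not dep_graph.get(pkg)}


# Toy dependency graph: package name -> packages it depends on.
TOY_GRAPH = {
    "app": ["web", "orm"],
    "web": ["templating", "routing"],
    "orm": ["driver"],
    "templating": ["markup"],
    "routing": [],
    "driver": [],
    "markup": [],
}

direct = TOY_GRAPH["app"]                  # what the project declares itself
leaves = leaf_packages(TOY_GRAPH, direct)  # what ends up at the bottom of the tree
```

Here two direct dependencies pull in three leaves, the same kind of fan-out the slide reports at a much larger scale.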

  18. Challenge: Release Management

  19. Challenge: Coordination

  20. Challenge: ASF bureaucracy

  21. Roadmap
      ● Steady Apache-approved releases
      ● Quality & polish ++
      ● Thumbnails + cards!
      ● A formal data access layer API
      ● Embeddable components
      ● Schedule simple data pipelines

  22. What’s next!?
      ● Automated root cause analysis & anomaly detection
      ● Assisted dashboard generation
      ● Collaborative workspaces & social features
      ● Mobile!
      ● Data governance & auditing
      ● Integrated notebooks
      ● Storytelling
      ● Specialized visualization packages
      ● ML models introspection
      ● Alerts, notifications, email/mobile delivery

  23. Conclusion
      ● I’m looking to help companies onboard!
      ● Interested in working on Superset!?
      max@preset.io
      github.com/apache/incubator-superset
      apache-superset.slack.com
