Presto Summit NYC 2019
December 11, 2019
Slack handles: @cheolsoo; @abhonsule slack-corp.com
Mission: Make people's working lives simpler, more pleasant and more productive.

Data Engineering at Slack
Logs: 215B
Daily messages: +270M
Daily records: 700B
Messages table: 250B
Custodian of all data generated within Slack, the product. We provide the infrastructure and tooling stakeholders need to reliably access product data, both for user-facing features and for product and business insights.
Systems around Presto:
Databooks: tool used by analysts, data scientists, marketing, sales and finance
AB Testing framework: Slack's AB testing/experiments framework
BI portal: BI tool used by Corp/Biztech
Airflow: DAGs running on the ETL scheduling system
Analytics .ts: Slack's internal analytics portal, used by product managers, engineers, analysts, data scientists, sales, marketing and finance
Sqooper: batch ingestion system
clog queries: querying client logs
Past: Presto on EMR, single cluster
Present: Starburst on EC2, multiple clusters
Future: federated clusters
(Charts: query success rate; query count)
balancing
properties
capacity planning
cluster in parallel
config changes or version upgrades
with 25 lines of code
with spot
per cluster
groups config
scheduling policies config
weighted_fair (etl)
resource groups
scheduling policies
weighted_fair (etl)
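The resource-group slides mention a `weighted_fair` policy for ETL. As a minimal sketch, assuming Presto's file-based resource group manager, the Python below builds such a configuration as JSON. The group names (`global`, `etl`, `adhoc`), weights, limits and the Airflow source selector are hypothetical values chosen for illustration, not Slack's actual settings.

```python
import json


def build_resource_groups(etl_weight: int = 3, adhoc_weight: int = 1) -> dict:
    """Build a hypothetical file-based resource group config for Presto."""

    def group(name: str, weight: int, concurrency: int) -> dict:
        # Each subgroup needs memory, concurrency and queue limits;
        # schedulingWeight is only used by the parent's weighted policies.
        return {
            "name": name,
            "softMemoryLimit": "50%",
            "hardConcurrencyLimit": concurrency,
            "maxQueued": 500,
            "schedulingWeight": weight,
        }

    return {
        "rootGroups": [
            {
                "name": "global",
                "softMemoryLimit": "90%",
                "hardConcurrencyLimit": 100,
                "maxQueued": 1000,
                # weighted_fair: the next query goes to the subgroup whose
                # running-query count is furthest below its weighted share.
                "schedulingPolicy": "weighted_fair",
                "subGroups": [
                    group("etl", etl_weight, 60),
                    group("adhoc", adhoc_weight, 40),
                ],
            }
        ],
        "selectors": [
            # Assumption: ETL queries identify themselves with an "airflow" source.
            {"source": ".*airflow.*", "group": "global.etl"},
            # Catch-all: everything else lands in the ad hoc group.
            {"group": "global.adhoc"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_resource_groups(), indent=2))
```

Generating the file from code keeps the weights and per-cluster limits in one place, which can help when several clusters share a layout but need different capacities.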
JVM metrics via the JMX exporter: 7071:/usr/local/jmx_exporter/exporter.yml
Prometheus, with Presto targets discovered through Consul:
    self.consul_job(
        'presto',
        datacenters=[env + '-us-east-1-dw1'],
        services=['presto']
    )
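The slides do not show what `consul_job` produces. A minimal sketch, assuming it emits a Prometheus `scrape_config` that uses Consul service discovery: the function below is a hypothetical stand-in, and the relabeling rule that points scrapes at the JMX exporter port 7071 is an assumption about how the pieces fit together.

```python
import json

JMX_EXPORTER_PORT = 7071  # port from the exporter line above


def consul_job(job_name: str, datacenters: list[str], services: list[str],
               port: int = JMX_EXPORTER_PORT) -> dict:
    """Hypothetical helper returning one Prometheus scrape_config entry."""
    return {
        "job_name": job_name,
        # consul_sd_config accepts a single datacenter, so emit one per datacenter.
        "consul_sd_configs": [
            {"datacenter": dc, "services": services} for dc in datacenters
        ],
        "relabel_configs": [
            {
                # Keep the host from Consul but scrape the JMX exporter port
                # instead of the service's registered port.
                "source_labels": ["__address__"],
                "regex": r"([^:]+)(?::\d+)?",
                "target_label": "__address__",
                "replacement": f"${{1}}:{port}",
            }
        ],
    }


if __name__ == "__main__":
    env = "prod"  # hypothetical environment name
    job = consul_job("presto", datacenters=[env + "-us-east-1-dw1"], services=["presto"])
    print(json.dumps(job, indent=2))
```

Discovering targets through Consul means new workers (for example, spot instances joining an autoscaling group) are scraped without editing the Prometheus config by hand.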
Graceful decommission:
    curl -XPUT localhost:8889/v1/info/state -d '"SHUTTING_DOWN"' -H "Content-type: application/json"

Chef role:
    "auto_scaling_group": {
      "prepare_for_termination_cmd": "<cmd>"
    }
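The same graceful-decommission call can be written as a small Python helper, for instance as the body of a termination hook. This is a sketch under stated assumptions: the helper names are hypothetical, the defaults mirror the `localhost:8889` curl example, and depending on the Presto version and security setup the endpoint may require additional headers or authentication.

```python
import json
import urllib.request


def shutdown_request(host: str = "localhost", port: int = 8889) -> urllib.request.Request:
    """Build (but do not send) the PUT that asks a Presto worker to shut down."""
    # The body must be a JSON string literal: "SHUTTING_DOWN" including the quotes.
    body = json.dumps("SHUTTING_DOWN").encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/v1/info/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def request_graceful_shutdown(host: str = "localhost", port: int = 8889,
                              timeout: float = 10.0) -> int:
    """Send the shutdown request and return the HTTP status code."""
    # After accepting this, the worker waits for its active tasks to finish
    # before exiting, so in-flight queries are not killed by the scale-in.
    with urllib.request.urlopen(shutdown_request(host, port), timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending makes the payload easy to check without a running worker.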
balancing
impact of rogue queries