
Presto at Wayfair, by Vinay Narayana (PowerPoint presentation)



  1. Presto at Wayfair
     Vinay Narayana
     https://www.linkedin.com/in/vinaynarayana/
     @nvinay26

  2. Agenda
     1. Problem Statement
     2. Why Presto?
     3. Presto at Wayfair: Deployment, Adoption, Performance, Monitoring
     4. What's Next

  3. Problem Statement

  4. Remedies
     1. Optimize Hive queries
     2. Set up queues to prioritize batch jobs
     3. Throttle users to 2 ad hoc Hive queries
     4. Move jobs from Hive to Spark
     5. Conduct SME training sessions on both Hive and Spark

  5. Why Presto?
     ● It's VERY fast!
     ● ANSI SQL support
     ● Presto runs separately from the HDFS storage cluster, which makes it well suited to interactive queries
     ● A single SQL query can access, combine and analyze data from multiple data sources (unlike Impala)
     ● Presto is easier to understand and use than Spark

  6. Presto At Wayfair
     Presto Ad Hoc Cluster
     [Architecture diagram: Clients, Presto Coordinator, Presto Workers, Hive Metastore]

  7. Presto At Wayfair
     Presto Ad Hoc Cluster
     ● Presto CLI
     ● Presto ad hoc (read-only cluster)
     ● Version: 0.217
     ● 301 VMs (8*64) with 1 coordinator, 300 workers
     ● Total available memory: ~20 TB
     ● Total available CPU: 2,400 vcores
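The hardware figures on this slide are consistent with each other if "8*64" is read as 8 vCPUs and 64 GB of RAM per VM. That reading is an assumption; the slide does not state the units. The short Python sketch below (constant and function names are hypothetical) multiplies out the worker count and suggests that the stated totals count the 300 workers only, not the coordinator:

```python
# Assumption: "8*64" means 8 vCPUs and 64 GB of RAM per VM.
VCPUS_PER_VM = 8
MEMORY_GB_PER_VM = 64
WORKERS = 300
COORDINATORS = 1


def worker_capacity(workers: int, vcpus: int, mem_gb: int) -> tuple:
    """Return (total vCPUs, total memory in decimal TB) across worker VMs only."""
    total_vcpus = workers * vcpus
    total_mem_tb = workers * mem_gb / 1000  # GB -> TB (decimal units)
    return total_vcpus, total_mem_tb


if __name__ == "__main__":
    vcpus, mem_tb = worker_capacity(WORKERS, VCPUS_PER_VM, MEMORY_GB_PER_VM)
    print(f"VMs: {WORKERS + COORDINATORS}")
    print(f"Worker vCPUs: {vcpus}")
    print(f"Worker memory: {mem_tb:.1f} TB")
```

Running it prints 301 VMs, 2400 worker vCPUs and 19.2 TB of worker memory. These figures line up with the slide's "2400 vcores" and "~20TB" under the assumed reading of "8*64".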

  8. Adoption – before & after
     Presto Ad Hoc Cluster
     [Chart: Hive queries per month prior to Presto; labels "40%" and "80K Queries"]
     Presto's performance won almost half of Hive activity in just two months.

  9. Adoption – after
     Presto Ad Hoc Cluster
     [Charts: Presto users growth over the year; Presto queries per month; callout "6x Growth"]

  10. Query Throttling
     Presto Ad Hoc Cluster
     ● SELECT only
     ● 2 queries per user
     ● 2 queued queries per user
     ● Increased the time limit from 5 to 10 mins
     Avg execution time dropped from 51 secs to 20 secs
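The per-user limits on this slide amount to a simple admission policy: at most 2 running and 2 queued SELECT queries per user, with anything beyond that rejected. The Python sketch below models that policy with a hypothetical `PerUserThrottle` class. It illustrates the rules as stated on the slide; it is not Presto's resource-group implementation, and it does not model the execution time limit. A small helper also works out the reported drop in average execution time.

```python
from collections import defaultdict, deque
from typing import Optional

# Limits as stated on the slide.
MAX_RUNNING_PER_USER = 2
MAX_QUEUED_PER_USER = 2


class PerUserThrottle:
    """Hypothetical per-user admission control: run, queue, or reject."""

    def __init__(self, max_running: int = MAX_RUNNING_PER_USER,
                 max_queued: int = MAX_QUEUED_PER_USER) -> None:
        self.max_running = max_running
        self.max_queued = max_queued
        self.running = defaultdict(set)    # user -> ids of running queries
        self.queued = defaultdict(deque)   # user -> FIFO of queued query ids

    def submit(self, user: str, query_id: str, sql: str) -> str:
        # Naive read-only check: only statements starting with SELECT pass.
        if not sql.lstrip().upper().startswith("SELECT"):
            return "rejected: SELECT only"
        if len(self.running[user]) < self.max_running:
            self.running[user].add(query_id)
            return "running"
        if len(self.queued[user]) < self.max_queued:
            self.queued[user].append(query_id)
            return "queued"
        return "rejected: queue full"

    def finish(self, user: str, query_id: str) -> Optional[str]:
        """Mark a query finished; promote the user's oldest queued query, if any."""
        self.running[user].discard(query_id)
        if self.queued[user] and len(self.running[user]) < self.max_running:
            promoted = self.queued[user].popleft()
            self.running[user].add(promoted)
            return promoted
        return None


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    throttle = PerUserThrottle()
    for i in range(5):
        print(f"q{i}:", throttle.submit("alice", f"q{i}", "SELECT 1"))
    print("promoted:", throttle.finish("alice", "q0"))
    print(f"avg execution time cut by {percent_reduction(51, 20):.0f}%")
```

In the usage example, five submissions from one user yield two running queries, two queued and one rejected. Finishing a running query promotes the oldest queued one. Since (51 - 20) / 51 is about 0.61, the reported change is roughly a 61% cut in average execution time.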

  11. Presto Read/Write Cluster (Beta)
     ● In beta with 50 nodes
     ● Limited # of users, using a default namespace
     ● Faster writes/inserts than Hive
     ● Resource grouping is enabled via queues
     ● 10 min limit on query execution time

  12. POC: Starburst Presto Distribution

  13. Monitoring Presto: Skynet (internal)

  14. What's Next
     ● Rationalize Google BigQuery vs Presto in GCP
     ● Continue migrating jobs to Presto
     ● Presto to Tableau connector
     ● Presto in Cloud

  15. Questions?

  16. Adoption – surprises
     Presto Ad Hoc Cluster
     ● Overall, very few surprises!
     ● There is no official Presto connector for Vertica (very popular at Wayfair), and Vertica's internal libraries are closed, so we wrote our own connector
     ● Performance and unification have made Presto so popular that devs are now asking to point their applications to Presto as a data interface layer (evaluating…)
