Presto at Wayfair
Vinay Narayana
https://www.linkedin.com/in/vinaynarayana/ @nvinay26
1. Problem Statement
2. Why Presto?
3. Presto at Wayfair: Deployment, Adoption, Performance, Monitoring
4. What’s Next
1. Optimize Hive queries
2. Set up queues to prioritize batch jobs
3. Throttle users to 2 ad-hoc Hive queries
4. Move jobs from Hive to Spark
5. Conduct SME training sessions for both Hive and Spark
HDFS cluster making it great for interactive queries
analyze data from multiple data sources (unlike Impala)
Spark
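The multi-source point above can be illustrated with a small sketch. Presto addresses tables as catalog.schema.table, so a single query can join tables that live in different systems. The catalog, schema, and table names below are hypothetical and are not taken from the Wayfair deployment; the helper only inspects the SQL text and does not connect to a cluster.

```python
import re

# Hypothetical federated query: joins a Hive table with a MySQL table.
# Presto resolves each table through its catalog (catalog.schema.table).
FEDERATED_QUERY = """
SELECT c.region, count(*) AS orders
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.region
"""


def catalogs_used(sql: str) -> list[str]:
    """Return the catalog part of every fully qualified table after FROM or JOIN."""
    return re.findall(r"\b(?:FROM|JOIN)\s+(\w+)\.\w+\.\w+", sql)


print(catalogs_used(FEDERATED_QUERY))
```

Each catalog name maps to one configured connector, which is what lets one query span several data sources.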
Presto Ad Hoc Cluster: Clients, Presto Coordinator, Presto Workers, Hive Metastore
Presto ad hoc (Read Only Cluster)
Version: 0.217
301 VMs (8*64) with 1 Coordinator, 300 Workers
Total available memory: ~20TB
Total CPU available: 2400 vcores
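As a quick sanity check on the totals above, the arithmetic can be sketched as follows. It assumes that "(8*64)" means 8 vcores and 64 GB of memory per VM, which the slide does not state explicitly, and that only the 300 workers count toward the totals.

```python
# Assumption: "(8*64)" is read as 8 vcores and 64 GB of RAM per VM.
WORKERS = 300
VCORES_PER_VM = 8
MEMORY_GB_PER_VM = 64

total_vcores = WORKERS * VCORES_PER_VM
total_memory_tb = WORKERS * MEMORY_GB_PER_VM / 1000  # decimal terabytes

print(total_vcores)     # 2400
print(total_memory_tb)  # 19.2
```

Under that reading, the worker count reproduces the 2400 vcores exactly and gives about 19.2 TB, consistent with the "~20TB" figure.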
Presto CLI
80K Hive queries per month prior to Presto
40%: Presto’s performance won almost half
Presto user growth over the year
Presto queries per month: 6x growth
from 5 to 10 mins
Avg execution time dropped from 51 secs to … secs
namespace
Skynet (internal)
Continue migrating jobs to Presto
Presto to Tableau Connector
Presto in Google Cloud
Rationalize BigQuery vs Presto in GCP
Vertica internal libraries are closed... so we wrote our own connector
point their applications to Presto as a data interface layer (evaluating…)