Presto at Wayfair, Vinay Narayana (PowerPoint presentation)



SLIDE 1

Presto at Wayfair

Vinay Narayana

https://www.linkedin.com/in/vinaynarayana/ @nvinay26

SLIDE 2

1. Problem Statement
2. Why Presto?
3. Presto at Wayfair
     • Deployment
     • Adoption
     • Performance
     • Monitoring
4. What’s Next

SLIDE 3


Problem Statement

SLIDE 4

Remedies

1. Optimize Hive queries
2. Set up queues to prioritize batch jobs
3. Throttle users to 2 ad-hoc Hive queries
4. Move jobs from Hive to Spark
5. Conduct SME training session for both Hive and Spark

SLIDE 5

Why Presto?

  • It’s VERY fast!
  • ANSI SQL support
  • Presto can run separately from the storage HDFS cluster, making it great for interactive queries
  • Single SQL query to access, combine and analyze data from multiple data sources (unlike Impala)
  • Presto is easier to understand and use versus Spark
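
The "single SQL query across multiple data sources" point can be sketched briefly. Presto addresses tables as catalog.schema.table, so one statement can join tables that live in different catalogs. The catalog, schema, table and column names below are hypothetical and do not come from the talk; the helper only builds the query text and does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(orders_table: str, customers_table: str) -> str:
    """Build one SELECT that joins tables living in two different catalogs."""
    return (
        "SELECT c.region, count(*) AS order_count\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.region"
    )


# Hypothetical catalogs: 'hive' (HDFS-backed) and 'mysql' (an operational database).
orders = qualified("hive", "web", "orders")
customers = qualified("mysql", "crm", "customers")
query = build_federated_query(orders, customers)
print(query)
```

The resulting text could be pasted into the Presto CLI or sent through any Presto client, provided catalogs with those names are configured.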

SLIDE 6

Presto At Wayfair

Presto Ad Hoc Cluster

Diagram components: Clients, Presto Coordinator, Presto Workers, Hive Metastore

SLIDE 7

Presto At Wayfair

Presto Ad Hoc Cluster (read-only)

  • Version: 0.217
  • 301 VMs (8*64): 1 coordinator, 300 workers
  • Total available memory: ~20 TB
  • Total available CPU: 2400 vcores

Presto CLI
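
The cluster totals can be checked with a few lines of arithmetic. This is a sketch under one assumption that the slide does not spell out: "8*64" is read as 8 vcores and 64 GB of RAM per VM. Under that reading, the quoted totals match the 300 workers alone, with the coordinator excluded.

```python
# Assumption (not stated on the slide): "8*64" means 8 vcores and 64 GB RAM per VM.
VCORES_PER_VM = 8
MEMORY_GB_PER_VM = 64
WORKERS = 300
COORDINATORS = 1

total_vms = WORKERS + COORDINATORS
worker_vcores = WORKERS * VCORES_PER_VM
worker_memory_tb = WORKERS * MEMORY_GB_PER_VM / 1000  # decimal terabytes

print(f"VMs: {total_vms}")
print(f"Worker vcores: {worker_vcores}")
print(f"Worker memory: {worker_memory_tb:.1f} TB")
```

With these inputs the worker pool comes to 2400 vcores and roughly 19.2 TB, consistent with the "2400 vcores" and "~20TB" figures on the slide.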

SLIDE 8

Adoption – before & after

Presto Ad Hoc Cluster

  • 80K Queries: Hive queries per month prior to Presto
  • 40%: Presto’s performance won almost half of Hive activity in just two months

SLIDE 9

Adoption – after

Presto Ad Hoc Cluster

  • Presto users growth over the year
  • Presto queries per month: 6x growth

SLIDE 10

Query Throttling

Presto Ad Hoc Cluster

  • SELECT only
  • 2 queries per user
  • 2 queued queries per user
  • Increased the time limit from 5 to 10 mins

Avg execution time dropped from 51 secs to 2 secs
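
The throttling rules above can be sketched as a toy admission check. This is an illustration of the policy only, not Presto's resource-group implementation and not Wayfair's configuration; the class and function names are hypothetical, and the limits are the ones listed on the slide.

```python
from collections import defaultdict

MAX_RUNNING_PER_USER = 2        # from the slide
MAX_QUEUED_PER_USER = 2         # from the slide
TIME_LIMIT_SECONDS = 10 * 60    # from the slide (raised from 5 to 10 mins)


class UserThrottle:
    """Toy per-user admission check (hypothetical, for illustration only)."""

    def __init__(self):
        self.running = defaultdict(int)
        self.queued = defaultdict(int)

    def submit(self, user: str, sql: str) -> str:
        """Return 'RUNNING', 'QUEUED' or 'REJECTED' for a new query."""
        # Simplistic read-only check: anything not starting with SELECT is refused.
        if not sql.lstrip().upper().startswith("SELECT"):
            return "REJECTED"
        if self.running[user] < MAX_RUNNING_PER_USER:
            self.running[user] += 1
            return "RUNNING"
        if self.queued[user] < MAX_QUEUED_PER_USER:
            self.queued[user] += 1
            return "QUEUED"
        return "REJECTED"

    def finish(self, user: str) -> None:
        """Mark one running query as done and promote a queued one, if any."""
        if self.running[user] == 0:
            return
        self.running[user] -= 1
        if self.queued[user] > 0:
            self.queued[user] -= 1
            self.running[user] += 1


def exceeds_time_limit(elapsed_seconds: float) -> bool:
    """True when a query has run longer than the 10 minute limit."""
    return elapsed_seconds > TIME_LIMIT_SECONDS


throttle = UserThrottle()
results = [throttle.submit("alice", "SELECT 1") for _ in range(5)]
print(results)
```

In this sketch a user's third and fourth queries wait in the queue and the fifth is refused; finishing a running query promotes one queued query for that user.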

SLIDE 11

Presto Read/Write Cluster Beta

  • In beta with 50 nodes
  • Limited # of users using a default namespace
  • Faster writes/inserts than Hive
  • Resource grouping is enabled via queues
  • 10 min limit on query execution time

SLIDE 12


POC: Starburst Presto Distribution

SLIDE 13


Monitoring Presto

Skynet (internal)

SLIDE 14

What’s Next

  • Continue migrating jobs to Presto
  • Presto to Tableau Connector
  • Presto in Google Cloud
  • Rationalize BigQuery vs Presto in GCP

SLIDE 15


Questions?

SLIDE 16
SLIDE 17

Adoption – surprises

Presto Ad Hoc Cluster

  • Overall, very few surprises!
  • No official Presto connector for Vertica (very popular at Wayfair), and Vertica internal libraries are closed... so we wrote our own connector
  • Performance & unification have become so popular that devs are now asking to point their applications to Presto as a data interface layer (evaluating…)