Hadoop Infrastructure @Uber: Past, Present and Future (Mayank Bansal)


SLIDE 1

U B E R | Data

Hadoop Infrastructure @Uber: Past, Present and Future

Mayank Bansal

SLIDE 2

Uber’s Mission

“Transportation as reliable as running water, everywhere, for everyone”

75+ Countries, 500+ Cities, and growing…

SLIDE 3

How Uber works

SLIDE 4

How Uber works

SLIDE 5

How Uber works

SLIDE 6

Data Driven Decisions

SLIDE 7

Data Infra Once Upon a Time… (2014)

Diagram labels:
  • Sources: Kafka Logs, Key-Val DB, RDBMS DBs, …
  • Storage and compute: S3, EMR
  • ETL into the Vertica Data Warehouse
  • Consumers: Applications, Business Ops, A/B Experiments, Adhoc Analytics, City Ops, Data Science

SLIDE 8

Data Infrastructure Today

Diagram labels:
  • Sources: Kafka8 Logs, Schemaless DB, SOA DBs, …
  • ETL and storage: ETL, HDFS
  • Query and processing engines: Spark | Presto, Hive
  • Consumers: Service Accounts, Machine Learning, Experimentation, Data Science, Adhoc Analytics, Ops/Data Science, City Ops

SLIDE 9

Few Takeaways …

  • Strict Schema Management
    ○ Because our largest data audience is SQL-savvy (1000s of Uber Ops!)
    ○ SQL = Strict Schema
  • Big Data Processing Tools Unlocked: Hive, Presto and Spark
    ○ Migrate SQL-savvy users from Vertica to Hive & Presto (1000s of Ops & 100s of data scientists & analysts)
    ○ Spark for more advanced users (100s of data scientists)

SLIDE 10

Hadoop Evolution @ Uber

  • 2014: 1X Nodes, 1X PB Data
  • 2015: 10X Nodes, 4X PB Data
  • 2016: 90X Nodes, 40X PB Data

Cluster size: 3000+ nodes, 30,000+ cores, 50+ PB

SLIDE 11

Hadoop Cluster Utilization

  • Over-provisioning for the peak loads
  • Over capacity in anticipation of future growth

SLIDE 12

Mesos Evolution @ Uber

  • 2014: 0 Nodes
  • 2015: X Nodes
  • 2016: 300X Nodes

SLIDE 13

Mesos Cluster Utilization

  • Over-provisioning for the peak loads
  • Over capacity in anticipation of future growth

SLIDE 14

End Goal

Online Presto

SLIDE 15

What do we need?

GLOBAL VIEW OF RESOURCES

SLIDE 16

Available Resource Managers

SLIDE 17

Mesos vs YARN

YARN                             | Mesos                                          | Note
---------------------------------+------------------------------------------------+-------------------------
Single-level scheduler           | Two-level scheduler                            | Scales better
Uses cgroups for isolation       | Uses cgroups for isolation                     | Similar isolation
CPU and memory as resources      | CPU, memory and disk as resources              | Disk is better
Works well with Hadoop workloads | Works well with longer-running services        | This is important
Supports time-based reservations | No support for reservations                    | Important for batch SLAs
Dominant resource scheduling     | Scheduling is done by frameworks, case by case | Better for batch
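
The "Dominant resource scheduling" row refers to Dominant Resource Fairness (DRF), which YARN's schedulers can apply (for example the Fair Scheduler's `drf` scheduling policy). The following is a minimal, simplified Python sketch of DRF's progressive filling, not YARN's implementation; `User`, `dominant_share` and `drf` are hypothetical names. Each round gives one task to the user whose dominant share (the largest fraction of any single resource it already holds) is smallest. The numbers are the worked example from the original DRF paper by Ghodsi et al.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical user/queue with a fixed per-task demand, e.g. {"cpu": 1, "mem_gb": 4}.
    name: str
    demand: dict
    allocated: dict = field(default_factory=dict)
    tasks: int = 0


def dominant_share(user: User, capacity: dict) -> float:
    """Largest fraction of any one resource type that this user currently holds."""
    return max(user.allocated.get(r, 0) / capacity[r] for r in capacity)


def drf(capacity: dict, users: list) -> dict:
    """Simplified DRF progressive filling: one task per round to the lowest dominant share."""
    if any(not any(u.demand.values()) for u in users):
        raise ValueError("every task must demand some resource")
    consumed = {r: 0 for r in capacity}
    active = list(users)
    while active:
        user = min(active, key=lambda u: dominant_share(u, capacity))
        fits = all(consumed[r] + user.demand.get(r, 0) <= capacity[r] for r in capacity)
        if not fits:
            # This user's next task no longer fits; stop considering it.
            active.remove(user)
            continue
        for r in capacity:
            need = user.demand.get(r, 0)
            consumed[r] += need
            user.allocated[r] = user.allocated.get(r, 0) + need
        user.tasks += 1
    return {u.name: u.tasks for u in users}


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem_gb": 18}
    a = User("A", {"cpu": 1, "mem_gb": 4})  # memory-heavy tasks
    b = User("B", {"cpu": 3, "mem_gb": 1})  # CPU-heavy tasks
    print(drf(cluster, [a, b]))  # {'A': 3, 'B': 2}
```

Both users end with the same dominant share (2/3): A is bounded by memory, B by CPU, which is the fairness property DRF aims for.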

SLIDE 18

Let’s tie them together

In a Nutshell

  • YARN is good for Hadoop
  • Mesos is good for longer-running services

SLIDE 19

SLIDE 20

  • Myriad is a Mesos framework for Apache YARN
  • Mesos manages data center resources
  • YARN manages Hadoop workloads
  • Myriad
    ○ Gets resources from Mesos
    ○ Launches Node Managers
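
Myriad fits Mesos's two-level model: Mesos (level one) offers resources, and the framework (level two) decides whether to accept them. The sketch below illustrates that decision for a framework that launches at most one NodeManager per host. `Offer`, `NodeManagerProfile` and `handle_offers` are hypothetical names for illustration only, not Myriad's or Mesos's API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    # Hypothetical resource offer from one Mesos agent.
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class NodeManagerProfile:
    # Hypothetical fixed size for each NodeManager the framework launches.
    cpus: float
    mem_mb: int


def handle_offers(offers, profile, running):
    """Framework-side decision: accept an offer only if a NodeManager fits on that agent
    and none is running there yet; decline everything else back to Mesos."""
    launched, declined = [], []
    for offer in offers:
        too_small = offer.cpus < profile.cpus or offer.mem_mb < profile.mem_mb
        if offer.agent in running or too_small:
            declined.append(offer.agent)
            continue
        running.add(offer.agent)  # These resources now belong to YARN while the NM runs.
        launched.append(offer.agent)
    return launched, declined


if __name__ == "__main__":
    offers = [Offer("agent-1", 4, 8192), Offer("agent-2", 1, 2048), Offer("agent-3", 8, 16384)]
    running = {"agent-3"}
    print(handle_offers(offers, NodeManagerProfile(2, 4096), running))
    # (['agent-1'], ['agent-2', 'agent-3'])
```

Once an offer is accepted, those CPUs and memory stay with YARN for as long as the NodeManager runs, which is where the static partitioning limitation on the following slide comes from.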

SLIDE 21

Myriad’s Limitations: Static Resource Partitioning

  • YARN will handle the resources handed over to it.
  • Mesos will work on the rest of the resources.
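
A small sketch of why static partitioning hurts: a YARN job can wait for capacity even while the Mesos side of the same cluster sits idle. The core counts are made-up numbers for illustration, and `fits_static` / `fits_pooled` are hypothetical helpers.

```python
def fits_static(job_cores: int, partitions: dict, used: dict, side: str) -> bool:
    """With static partitioning, a job may only use free cores in its own partition."""
    return job_cores <= partitions[side] - used[side]


def fits_pooled(job_cores: int, partitions: dict, used: dict) -> bool:
    """With one shared pool, any free core in the cluster is usable."""
    return job_cores <= sum(partitions.values()) - sum(used.values())


if __name__ == "__main__":
    partitions = {"yarn": 40, "mesos": 60}  # hypothetical split of a 100-core cluster
    used = {"yarn": 40, "mesos": 20}        # YARN is full, Mesos has 40 idle cores
    print(fits_static(30, partitions, used, "yarn"))  # False: the batch job must wait
    print(fits_pooled(30, partitions, used))          # True: 40 cores are free overall
```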

SLIDE 22

Myriad’s Limitations: Resource Over-Subscription

  • YARN will never be able to do over-subscription.
  • Node Manager will go away
  • Fragmentation of resources
  • Mesos over-subscription can kill YARN too
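
Over-subscription hands out capacity that is allocated but currently unused, and takes it back when the owner needs it; Mesos models this as revocable resources. The sketch below uses hypothetical helpers and plain CPU counts to show why a NodeManager launched on such capacity can simply go away when the slack shrinks. Killing the newest task first is an assumption made for the example.

```python
def revocable_slack(allocated: float, used: float) -> float:
    """Capacity that is allocated to its owner but not currently used."""
    return max(0.0, allocated - used)


def reclaim(revocable_tasks: list, slack: float):
    """Kill the most recently launched revocable tasks until the rest fit in the slack.

    revocable_tasks is a list of (name, cpus) tuples in launch order.
    """
    kept, killed = list(revocable_tasks), []
    while kept and sum(cpus for _, cpus in kept) > slack:
        killed.append(kept.pop())
    return kept, killed


if __name__ == "__main__":
    tasks = [("nodemanager-1", 6.0), ("batch-2", 3.0)]
    print(revocable_slack(16.0, 6.0))  # 10.0: both revocable tasks fit
    print(reclaim(tasks, revocable_slack(16.0, 10.0)))  # owner busier, slack 6.0
    # ([('nodemanager-1', 6.0)], [('batch-2', 3.0)])
    print(reclaim(tasks, revocable_slack(16.0, 13.0)))  # slack 3.0: the NM goes too
    # ([], [('batch-2', 3.0), ('nodemanager-1', 6.0)])
```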

SLIDE 23

Myriad’s Limitations

  • No Global Quota Enforcement
  • No Global Priorities
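
Global quota enforcement and global priorities mean one admission decision across every workload type, instead of separate decisions inside Mesos and inside YARN. A minimal sketch, assuming CPU-only quotas per team; the team names, request fields and `admit` function are hypothetical.

```python
def admit(requests: list, quotas: dict) -> list:
    """Admit requests in global priority order while each team stays within its quota.

    Every request, online or batch, is checked against the same per-team quota.
    """
    usage = {team: 0 for team in quotas}
    admitted = []
    # Higher priority first; sorted() is stable, so ties keep submission order.
    for req in sorted(requests, key=lambda r: r["priority"], reverse=True):
        team = req["team"]
        if usage[team] + req["cpus"] <= quotas[team]:
            usage[team] += req["cpus"]
            admitted.append(req["id"])
    return admitted


if __name__ == "__main__":
    quotas = {"team-a": 10, "team-b": 6}
    requests = [
        {"id": "spark-job", "team": "team-a", "cpus": 6, "priority": 1},     # batch
        {"id": "online-svc", "team": "team-a", "cpus": 6, "priority": 5},    # online
        {"id": "presto-query", "team": "team-b", "cpus": 4, "priority": 3},  # batch
    ]
    print(admit(requests, quotas))  # ['online-svc', 'presto-query']
```

The low-priority batch job is rejected because the online service already used most of team-a's quota, a decision that needs both workloads visible to one scheduler.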

SLIDE 24

Myriad’s Limitations

  • Elastic Resource Management
  • Bin Packing
  • Stability
  • Long List …
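
Bin packing places work so that fewer hosts are left partially used. First-fit decreasing is one common heuristic, shown here only as an illustration (the slides do not say which heuristic is used); task sizes are hypothetical single-dimension CPU demands and `first_fit_decreasing` is a hypothetical helper.

```python
def first_fit_decreasing(tasks: dict, host_capacity: int):
    """Place each task on the first host with room, largest tasks first.

    Returns (placement, hosts_used), where placement maps task name -> host index.
    """
    free = []  # remaining capacity of each opened host
    placement = {}
    for name, size in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        if size > host_capacity:
            raise ValueError(f"task {name} does not fit on any host")
        for i, room in enumerate(free):
            if size <= room:
                free[i] -= size
                placement[name] = i
                break
        else:
            # No opened host has room: open a new one.
            free.append(host_capacity - size)
            placement[name] = len(free) - 1
    return placement, len(free)


if __name__ == "__main__":
    tasks = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}  # hypothetical CPU demands
    print(first_fit_decreasing(tasks, host_capacity=8))
    # ({'a': 0, 'b': 1, 'c': 0, 'd': 1, 'e': 1}, 2)
```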

SLIDE 25

Unified Scheduler

SLIDE 26

High Level Characteristics

  • Global Quota Management
  • Central Scheduling Policies
  • Over-subscription for both Online and Batch
  • Isolation and Bin Packing
  • SLA guarantees at Global Level
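
Several of these characteristics can be combined in one scheduling pass over a single global pool: online services are placed first against physical capacity, and batch work fills the remainder, optionally over-subscribing into capacity that is marked revocable. This is a minimal sketch under those assumptions; `schedule` and its `oversub_ratio` knob are hypothetical and do not describe Uber's unified scheduler.

```python
def schedule(capacity: float, online: list, batch: list, oversub_ratio: float = 1.0):
    """Place online work within physical capacity, then batch up to capacity * oversub_ratio.

    Returns (placed, revocable, pending). Any batch job that pushes usage past physical
    capacity is revocable and would be preempted first if online load grows.
    """
    placed, revocable, pending = [], [], []
    used = 0.0
    for name, cpus in online:  # online services: guaranteed, never over-subscribed
        if used + cpus <= capacity:
            used += cpus
            placed.append(name)
        else:
            pending.append(name)
    limit = capacity * oversub_ratio
    for name, cpus in batch:  # batch fills what is left, then the over-subscribed slack
        if used + cpus <= limit:
            used += cpus
            placed.append(name)
            if used > capacity:
                revocable.append(name)
        else:
            pending.append(name)
    return placed, revocable, pending


if __name__ == "__main__":
    online = [("api", 40), ("dispatch", 30)]           # hypothetical services
    batch = [("etl", 20), ("spark", 20), ("hive", 30)]  # hypothetical batch jobs
    print(schedule(100, online, batch, oversub_ratio=1.2))
    # (['api', 'dispatch', 'etl', 'spark'], ['spark'], ['hive'])
```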

SLIDE 27

Unified Scheduler

SLIDE 28

Few Takeaways …

  • We need one scheduling layer across all workloads
  • Partitioning resources is not good
  • Can save at least 30% of resources
  • Stability and simplicity win in production
  • Multiple levels of resource management and scheduling will not be scalable
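
The saving from one scheduling layer comes from sizing for the peak of the combined load rather than the sum of each partition's own peak. A worked sketch with made-up demand over four time slots (online busy by day, batch busy by night); the numbers are hypothetical and are not the data behind the 30% figure above.

```python
def partitioned_capacity(*demands):
    """Static partitions: each workload is provisioned for its own peak."""
    return sum(max(d) for d in demands)


def pooled_capacity(*demands):
    """One shared pool: provision for the peak of the combined demand."""
    return max(sum(step) for step in zip(*demands))


if __name__ == "__main__":
    online = [20, 60, 80, 40]  # hypothetical cores needed in four time slots
    batch = [80, 40, 20, 60]
    static = partitioned_capacity(online, batch)  # 80 + 80 = 160
    pooled = pooled_capacity(online, batch)       # max(100, 100, 100, 100) = 100
    print(static, pooled, 1 - pooled / static)    # 160 100 0.375
```

The more the workloads' peaks are offset in time, the larger the gap; if both peaked in the same slot, pooling would save nothing.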

SLIDE 29

SLIDE 30

Questions? mabansal@uber.com, mayank@apache.org

SLIDE 31

Thank You!!!