Computation Reuse in Analytics Job Service at Microsoft

slide-1
SLIDE 1

Computation Reuse in Analytics Job Service at Microsoft

Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao
Microsoft

slide-3
SLIDE 3

A brief history of Views

Typical Materialized View Assumptions:

  • Tuning few databases
  • Relatively static data with some updates
  • Views materialized a priori and offline
  • Accurate estimates of utility/cost of view materialization

[Timeline figure; markers: First VLDB, First SIGMOD. Topics shown: Materialized Views, Logical Database Design, View Maintenance, Optimizing Queries, View Selection, XML, XQuery, Knowledge Bases, Incremental, Dynamic, Partial]

slide-4
SLIDE 4

What’s new: Analytics-as-a-Service!

Also called Job Service or Serverless Analytics:

  • Users need not manage h/w or s/w
  • Users only provide SQL queries over stored data
  • Service provider takes care of the execution
  • Users only pay for the processing cost

Typical Materialized View Assumptions:

  • Tuning few databases
  • Relatively static data with some updates
  • Views materialized a priori and offline
  • Accurate estimates of utility/cost of view materialization

Experience from SCOPE Job Service:

  • Cluster-wide computation overlaps
  • Recurring jobs with new inputs
  • Always online with SLA requirements
  • Cost estimations very challenging

SCOPE Job Service at Microsoft:

  • ~10^5 machines
  • ~10^5 analytical jobs
  • ~10^3 developers across Microsoft
  • ~EBs of data processed per day
slide-5
SLIDE 5

Reassigning Passengers to Planes in Mid-Air

[Figure: flight routes Boston -> Paris -> Tokyo, Boston -> Paris, and Boston -> Tokyo]

slide-7
SLIDE 7

CloudViews Overview

  • Assumption: Recurring Workloads
  • Assumption: Exact Subexpression Matches

slide-8
SLIDE 8

CloudViews Overview

[Architecture diagram. Components shown: Recurring Workload, Feedback Loop, View Selection, Physical Design, Expiry, Metadata Service, User Interfaces & Tooling, Online Materialization, Rewrite Queries Using Views, Synchronization, Job Coordination]

slide-9
SLIDE 9

Recurring Workloads

slide-10
SLIDE 10

Recurring Workloads

  • Periodic queries with different inputs and parameters
  • Structured/unstructured data; custom user code

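[Figure: recurring queries over three days. June 5, 2018: Q1, Q2; June 6, 2018: Q1', Q2'; June 7, 2018: Q1'', Q2''. Signature labels: sig, sig', sig''. Phase labels: Analysis, Reuse. Time axis: 8:00 am, 9:00 am, 10:00 am]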

slide-11
SLIDE 11

Reuse over Recurring Workloads

  • Problem: detect/reuse common subexpressions when new data arrives in each recurring interval
  • Solution: precise/normalized query signatures
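
The slide names precise and normalized signatures without defining them. The sketch below shows one way the pair could work, under assumptions that are not from the deck: a plan is a hypothetical (operator, params, children) tuple, the only thing that changes between recurring runs is a date embedded in the parameters, and SHA-256 stands in for whatever hash the real system uses. The precise signature identifies one concrete run; the normalized signature masks the date so the same subexpression is recognized across runs.

```python
import hashlib
import re

# Assumption: dates written as YYYY-MM-DD are the only recurring change.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def _digest(text):
    """Short, stable hash of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def precise_signature(node):
    """Hash the operator, its exact parameters, and its children's hashes."""
    op, params, children = node
    child_sigs = ",".join(precise_signature(c) for c in children)
    return _digest(f"{op}|{params}|{child_sigs}")


def normalized_signature(node):
    """Same as precise_signature, but with dates replaced by a placeholder."""
    op, params, children = node
    stable = DATE_PATTERN.sub("<DATE>", params)
    child_sigs = ",".join(normalized_signature(c) for c in children)
    return _digest(f"{op}|{stable}|{child_sigs}")


def plan_for(day):
    """Hypothetical recurring job: same logic, different input path per day."""
    scan = ("Scan", f"/logs/{day}/clicks", [])
    filt = ("Filter", "country == 'US'", [scan])
    return ("Aggregate", "count by user", [filt])


june5 = plan_for("2018-06-05")
june6 = plan_for("2018-06-06")

# The normalized signature is stable across days; the precise one is not.
same_template = normalized_signature(june5) == normalized_signature(june6)
same_instance = precise_signature(june5) == precise_signature(june6)
```

In this sketch, the normalized signature would be the key for analysis over past runs, and the precise signature would be the key for matching a view that holds exactly the data a new run needs.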
slide-12
SLIDE 12

Metadata Service

slide-13
SLIDE 13

Metadata Service

  • Materialized view lookup
  • Consistent view materialization
  • Quick view discovery
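
The three responsibilities above can be illustrated with a small in-memory stand-in. This is only a sketch: the class name, method names, and the build-lock protocol are hypothetical and are not taken from the deck. It shows lookup by signature, a guard so that only one job materializes a given view, and constant-time discovery through a dictionary.

```python
import threading


class MetadataService:
    """Toy metadata service keyed by query signature (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._views = {}     # signature -> path of the materialized view
        self._building = {}  # signature -> id of the job building the view

    def lookup(self, signature):
        """Return the view path for a signature, or None if there is none."""
        with self._lock:
            return self._views.get(signature)

    def try_acquire_build(self, signature, job_id):
        """Grant the right to build a view to at most one job."""
        with self._lock:
            if signature in self._views or signature in self._building:
                return False
            self._building[signature] = job_id
            return True

    def publish(self, signature, job_id, path):
        """Make a finished view visible; only the lock holder may publish."""
        with self._lock:
            if self._building.get(signature) != job_id:
                raise RuntimeError("job does not hold the build lock")
            del self._building[signature]
            self._views[signature] = path


service = MetadataService()
first_wins = service.try_acquire_build("sig-1", "job-A")
second_loses = service.try_acquire_build("sig-1", "job-B")
service.publish("sig-1", "job-A", "/views/sig-1")
```

After `publish`, any job that looks up "sig-1" sees the view, and no further build lock is granted for it.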
slide-14
SLIDE 14

Query Rewriting / Online Materialization

slide-15
SLIDE 15

Query Rewriting / Online Materialization

  • Query Rewriting using Views
  • Online View Materialization
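
The two steps named on this slide can be sketched together, assuming exact subexpression matches by signature as stated in the overview. Everything concrete here is hypothetical: the (operator, params, children) plan tuples, the `ViewScan` and `Spool` operator names, and the `rewrite` function. A subplan whose signature matches an existing view is replaced by a scan of that view; a subplan selected for materialization but not yet available is wrapped so the job writes it out while it runs.

```python
import hashlib


def signature(node):
    """Exact-match signature of a plan subtree."""
    op, params, children = node
    child_sigs = ",".join(signature(c) for c in children)
    text = f"{op}|{params}|{child_sigs}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def rewrite(node, available_views, selected, to_materialize):
    """Rewrite a plan top-down.

    available_views: dict mapping signature -> view path
    selected: set of signatures chosen for materialization
    to_materialize: list that collects signatures this job will write
    """
    sig = signature(node)
    if sig in available_views:
        # Reuse: read the view instead of recomputing the subtree.
        return ("ViewScan", available_views[sig], [])
    op, params, children = node
    rewritten = (
        op,
        params,
        [rewrite(c, available_views, selected, to_materialize) for c in children],
    )
    if sig in selected:
        # Online materialization: write the result as a side effect.
        to_materialize.append(sig)
        return ("Spool", sig, [rewritten])
    return rewritten


scan = ("Scan", "/logs/2018-06-05/clicks", [])
filt = ("Filter", "country == 'US'", [scan])
agg = ("Aggregate", "count by user", [filt])
filt_sig = signature(filt)

# First job: no view exists yet, so the selected subplan is spooled.
pending = []
first_plan = rewrite(agg, {}, {filt_sig}, pending)

# Later job: the view exists, so the subplan is replaced by a view scan.
views = {filt_sig: "/views/" + filt_sig}
second_plan = rewrite(agg, views, {filt_sig}, [])
```

Matching top-down means the largest available subexpression is replaced first, so smaller views nested inside it are never considered for that job.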

slide-16
SLIDE 16

Analyzing Production Workloads

  • Cluster-wide overlaps:
      • 45% jobs
      • 65% users
      • 80% subgraphs
  • Operator-wise overlaps:
      • Up to 1000s of overlaps

[Chart labels: Shuffle, Sort, Joins, Filters]
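
Overlap figures like those above could be derived by grouping subgraph signatures across jobs. The sketch below is a toy version with assumed definitions that may differ from the ones behind the slide's numbers: a subgraph is "shared" if it appears in more than one job, and a job is "overlapping" if it contains at least one shared subgraph. The job names and signatures are made-up toy data.

```python
from collections import defaultdict


def overlap_stats(jobs):
    """Compute overlap percentages.

    jobs: dict mapping job id -> iterable of subgraph signatures
    (hypothetical input format).
    """
    if not jobs:
        return {"overlapping_jobs_pct": 0.0, "shared_subgraphs_pct": 0.0}

    # Group jobs by the signatures they contain.
    jobs_by_sig = defaultdict(set)
    for job_id, sigs in jobs.items():
        for sig in sigs:
            jobs_by_sig[sig].add(job_id)

    # A signature is shared when more than one job contains it.
    shared = {sig for sig, owners in jobs_by_sig.items() if len(owners) > 1}
    overlapping = {j for j, sigs in jobs.items() if shared.intersection(sigs)}

    total_sigs = len(jobs_by_sig)
    return {
        "overlapping_jobs_pct": 100.0 * len(overlapping) / len(jobs),
        "shared_subgraphs_pct": 100.0 * len(shared) / total_sigs if total_sigs else 0.0,
    }


# Toy data: "a" and "b" each appear in two jobs; "c" and "d" appear once.
toy_jobs = {
    "job1": ["a", "b"],
    "job2": ["b", "c"],
    "job3": ["d"],
    "job4": ["a"],
}
stats = overlap_stats(toy_jobs)
```

For the toy data, three of four jobs contain a shared subgraph and two of four distinct subgraphs are shared.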

slide-17
SLIDE 17

Performance Impact

  • Workload: 32 queries
  • Latency:
      • Improvements depend on the critical path
      • Some queries slower due to materialization
  • Processing time:
      • Additional processing time for read/write
      • Savings in general
  • Overheads:
      • Workload analysis in an hour
      • ~10ms metadata service lookup
      • Optimization time higher/lower when creating/using views

  • Avg. Speedup: 43%
  • Avg. Speedup: 36%
slide-18
SLIDE 18

Lessons Learned

  • Discovering hidden redundancies, static computations
  • Important to get the view physical design right in big data systems
  • Interesting side effects: failure recovery, cost estimates
  • User expectations: automation, debuggability, privacy regulations
  • Even classic database concepts take a lot of time to bake in industry
  • Challenge: some of the assumptions may not hold
  • Industrial research is fun! ☺
slide-19
SLIDE 19

Thanks!

Coming up:

Selecting Subexpressions to Materialize at Datacenter Scale

Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel VLDB 2018/PVLDB, Rio de Janeiro, Brazil

See you at:

Poster Session 1, Wednesday 16:00-18:00, Houston 567

slide-20
SLIDE 20

✓ Materialized views over recurring workloads
✓ CloudViews Analyzer
    ✓ Feedback Loop
    ✓ View Selection
    ✓ Physical Design
    ✓ View Expiry
✓ CloudViews Runtime
    ✓ Metadata Service
    ✓ Online Materialization
    ✓ Query Rewriting
    ✓ Synchronization
    ✓ Job Coordination

  • What do we mean by computation reuse?
  • What is a “job service”? How is it different from “databases”?
  • What does a job service look like at Microsoft?
  • Why is computation reuse challenging in a job service?
  • What is our solution, key insights, and takeaways?

Computation Reuse in Analytics Job Service at Microsoft

Alekh Jindal (Microsoft), Shi Qiao (Microsoft), Hiren Patel (Microsoft), Zhicheng Yin (Microsoft), Jieming Di (Microsoft), Malay Bag (Microsoft), Marc Friedman (Microsoft), Yifung Lin (Microsoft), Konstantinos Karanasos (Microsoft), Sriram Rao (Microsoft)

Key Ingredients Architecture Questions

Industry 1 Tue, 11-12:30