Rack-scale Data Processing System


slide-1
SLIDE 1

Rack-scale Data Processing System

Jana Giceva, Darko Makreshanski, Claude Barthels, Alessandro Dovis, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich

slide-2
SLIDE 2

Rack-scale Data Processing System

Application’s perspective of a rack
slide-3
SLIDE 3

FruitBox – a data processing system

MULTI FLAVOR DATA PROCESSING

  • Analytical QP
  • Transactional QP
  • Graph processing
  • Machine learning
  • Ad-hoc BI QP

Workshop for Rack-scale Computing

slide-4
SLIDE 4

FruitBox

  • Building a system for multi-flavor data processing:

1. Hardware that meets the resource demand.
2. System architecture to support workload heterogeneity.
3. Aim for tens to hundreds of millions of requests per second.
4. Efficient resource utilization.

slide-5
SLIDE 5

FruitBox – a rack-scale data processing system

  • Which box could run such a heterogeneous workload?
  • A multicore is not enough
  • A rack-scale system:
      • More resources
      • Better isolation
      • Blurring the machine-cluster boundaries

RACK-SCALE SYSTEM: 1000s of cores, TBs of RAM, InfiniBand

slide-6
SLIDE 6

Rack-scale data processing system

Custom build a rack-scale system for data processing?

Many such commercial systems exist: data appliances

Oracle Exadata, Netezza (IBM) TwinFin, and many more …

slide-7
SLIDE 7
System design for Multi-flavor data processing

  • Separate data-storage from data-processing
  • Achieve both physical and logical data independence

[Figure: data processing (transactional processing, analytical processing, machine learning) on top of a storage engine]

slide-8
SLIDE 8

Storage Engine

[Figure: data processing (transactional processing, analytical processing, machine learning) on top of the storage engine]

Tuple- and batch-based interface to the storage engine.
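A minimal sketch of what a tuple- and batch-based interface could look like. This is not FruitBox's actual API: the class `StorageEngine` and the methods `put`, `get`, and `scan_batch` are hypothetical names chosen for illustration, and the in-memory dictionary stands in for real storage.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


class StorageEngine:
    """Toy in-memory storage engine (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}

    # Tuple-based interface: point access by key.
    def put(self, key: int, row: Row) -> None:
        self._rows[key] = dict(row)

    def get(self, key: int) -> Optional[Row]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    # Batch-based interface: several predicates are answered in one call.
    def scan_batch(self, predicates: List[Predicate]) -> List[List[Row]]:
        results: List[List[Row]] = [[] for _ in predicates]
        for row in self._rows.values():
            for i, predicate in enumerate(predicates):
                if predicate(row):
                    results[i].append(dict(row))
        return results


engine = StorageEngine()
engine.put(1, {"item": "apple", "price": 3})
engine.put(2, {"item": "pear", "price": 5})

point = engine.get(1)
cheap, pricey = engine.scan_batch(
    [lambda r: r["price"] < 4, lambda r: r["price"] >= 4]
)
```

The point lookups would serve the transactional side, while `scan_batch` would serve the analytical side with one call per batch of requests.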

slide-9
SLIDE 9

Storage Engine

[Figure: data processing (transactional processing, analytical processing, machine learning) on top of the storage engine, which contains KV stores and scans]

Storage engine components:

  • KV Stores (B-tree)
  • Crescando Scans

KV Stores → transactional
Scans → analytical

Tuple- and batch-based interface to the storage engine.

slide-10
SLIDE 10

Storage Engine

[Figure: data processing (transactional processing, analytical processing, machine learning) on top of the storage engine, which contains KV stores and scans]

Storage engine components:

  • KV Stores (B-tree)
  • Crescando Scans

KV Stores → transactional
Scans → analytical

Tuple- and batch-based interface to the storage engine.

Transaction logic separated from query processing.

MVCC: Snapshot isolation

Hyder [SIGMOD’15], HyPer [VLDB’11], Hekaton [SIGMOD’14], SharedDB [Giannikis PhD’14], Tell [SIGMOD’15], Multimed [Eurosys’11]
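To make "MVCC: Snapshot isolation" concrete, here is a minimal sketch of a multi-version key-value store. It is an assumption-laden toy, not the design of any system cited on this slide: `MVCCStore`, `Transaction`, and all method names are hypothetical, timestamps are a simple counter, and write-write conflicts are resolved with a first-committer-wins rule.

```python
from typing import Any, Dict, List, Optional, Tuple


class MVCCStore:
    """Toy multi-version store (hypothetical names, single-threaded)."""

    def __init__(self) -> None:
        # key -> list of (commit_ts, value), in increasing commit_ts order
        self._versions: Dict[str, List[Tuple[int, Any]]] = {}
        self._clock = 0  # last commit timestamp handed out

    def begin(self) -> "Transaction":
        return Transaction(self, snapshot_ts=self._clock)

    def read_at(self, key: str, ts: int) -> Optional[Any]:
        # Newest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def last_commit_ts(self, key: str) -> int:
        versions = self._versions.get(key)
        return versions[-1][0] if versions else 0

    def install(self, writes: Dict[str, Any]) -> None:
        self._clock += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))


class Transaction:
    def __init__(self, store: MVCCStore, snapshot_ts: int) -> None:
        self._store = store
        self._snapshot_ts = snapshot_ts
        self._writes: Dict[str, Any] = {}

    def read(self, key: str) -> Optional[Any]:
        if key in self._writes:  # a transaction sees its own writes
            return self._writes[key]
        return self._store.read_at(key, self._snapshot_ts)

    def write(self, key: str, value: Any) -> None:
        self._writes[key] = value  # buffered until commit

    def commit(self) -> bool:
        # First committer wins: abort if a written key changed after our snapshot.
        for key in self._writes:
            if self._store.last_commit_ts(key) > self._snapshot_ts:
                return False
        self._store.install(self._writes)
        return True


store = MVCCStore()
setup = store.begin()
setup.write("x", 1)
setup_ok = setup.commit()

reader = store.begin()  # both transactions start from the same snapshot
writer = store.begin()
writer.write("x", 2)
writer_ok = writer.commit()

seen_by_reader = reader.read("x")  # still the value from the reader's snapshot
reader.write("x", 3)
reader_ok = reader.commit()  # conflicts with the writer's committed update
seen_by_new_txn = store.begin().read("x")
```

Readers never block writers here because each transaction reads from its own snapshot; only conflicting writers are aborted.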

slide-11
SLIDE 11

Handling millions of requests/second

  • It makes no sense to process them individually if they access the same data.
  • Why should each query scan a TB of data?
  • Batch requests: share data, computation, bandwidth

… for higher throughput and predictable performance, trading off a bit of latency.

IBM Blink, MonetDB/X100 [VLDB’07], CJOIN [VLDB’09], Crescando [VLDB’09], SharedDB [VLDB’12,’14]
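The batching argument can be illustrated with a small sketch that counts how often table rows are read. This is a simplified stand-in, not the algorithm of any cited system: the function names `scan_individually` and `scan_shared`, the table layout, and the queries are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Query = Callable[[Row], bool]


def scan_individually(table: List[Row], queries: List[Query]) -> Tuple[List[List[Row]], int]:
    """One full pass over the table per query. Returns (results, rows_read)."""
    results: List[List[Row]] = []
    rows_read = 0
    for query in queries:
        matches = []
        for row in table:
            rows_read += 1
            if query(row):
                matches.append(row)
        results.append(matches)
    return results, rows_read


def scan_shared(table: List[Row], queries: List[Query]) -> Tuple[List[List[Row]], int]:
    """One pass for the whole batch: each row is read once and tested against every query."""
    results: List[List[Row]] = [[] for _ in queries]
    rows_read = 0
    for row in table:
        rows_read += 1
        for i, query in enumerate(queries):
            if query(row):
                results[i].append(row)
    return results, rows_read


table = [{"id": i, "price": i % 10} for i in range(1000)]
queries = [lambda r, p=p: r["price"] == p for p in range(5)]

individual_results, individual_reads = scan_individually(table, queries)
shared_results, shared_reads = scan_shared(table, queries)
```

Both variants evaluate the same number of predicates; what the shared scan saves is data movement, since the table is streamed once per batch instead of once per query. The cost is that a request waits for its batch, which is the latency trade-off named above.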

slide-12
SLIDE 12

Efficient resource utilization

  • Getting the most out of such a complex system requires cross-layer optimization.
      • e.g. DB/OS co-design
      • Already some work on multicore systems.

  • Noisy system environment
  • Load interaction

Unpredictable performance
Not meeting SLAs

  • Resource overprovisioning

Inefficiency and higher cost

slide-13
SLIDE 13

COD: DB/OS co-design

What is the knowledge we have? Who knows what?

[Figure: OS and DB side by side. Labels: application requirements and characteristics; system state and utilization of resources + hardware & architecture]

Big semantic gap!

COD: Database/Operating System co-design [CIDR’12]

slide-14
SLIDE 14

COD’s interface

[Figure: the DBMS with its DB storage engine, other apps, and the OS with its OS policy engine, connected through the DB/OS interface]

DB/OS Interface:

  • Constraints and requirements
  • Notification on updates
  • Explicit allocation
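The three parts of the interface can be pictured with a toy policy engine. This is only a sketch under stated assumptions: `PolicyEngine` and every method name below are hypothetical and do not reproduce COD's real interface, and resources are reduced to a set of core IDs.

```python
from typing import Callable, Dict, List, Set


class PolicyEngine:
    """Toy OS-side policy engine (hypothetical names throughout)."""

    def __init__(self, cores: int) -> None:
        self.free: Set[int] = set(range(cores))
        self.requirements: Dict[str, int] = {}
        self.allocations: Dict[str, Set[int]] = {}
        self.listeners: List[Callable[[str], None]] = []

    # 1. Constraints and requirements: the DB states what it needs.
    def declare_requirement(self, app: str, min_cores: int) -> None:
        self.requirements[app] = min_cores

    def satisfied(self, app: str) -> bool:
        return len(self.allocations.get(app, set())) >= self.requirements.get(app, 0)

    # 2. Notification on updates: the DB asks to be told about state changes.
    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.listeners.append(callback)

    # 3. Explicit allocation: the DB requests specific cores.
    def allocate(self, app: str, cores: Set[int]) -> bool:
        if not cores <= self.free:
            return False
        self.free -= cores
        self.allocations.setdefault(app, set()).update(cores)
        return True

    # Called when another task occupies a core (the "noise" case).
    def core_taken(self, core: int) -> None:
        self.free.discard(core)
        for owned in self.allocations.values():
            owned.discard(core)
        for notify in self.listeners:
            notify(f"core {core} is no longer available")


events: List[str] = []
os_policy = PolicyEngine(cores=4)
os_policy.declare_requirement("db", min_cores=2)
os_policy.subscribe(events.append)

granted = os_policy.allocate("db", {0, 1})
os_policy.core_taken(0)                      # noise lands on core 0
satisfied_after_noise = os_policy.satisfied("db")
regranted = os_policy.allocate("db", {2})    # the DB reacts to the notification
satisfied_after_move = os_policy.satisfied("db")
```

The design point is that neither side guesses: the DB tells the OS what it needs, and the OS tells the DB when the system state changes.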

slide-15
SLIDE 15

Adaptability to dynamic system state

[Figure: "Adaptability - Latency". Axes: latency [sec] and elapsed time [min]. Series: naïve datastore engine and COD, with the SLA marked.]

Experiment setup

  • AMD Magny-Cours
  • 4 x 2.2GHz AMD Opteron 6174 processors
  • total Datastore size 53GB
  • Noise: another CPU-intensive task running on core 0

slide-16
SLIDE 16

Resource efficient deployment

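[Figure: the deployment algorithm with its inputs and output, spanning DB and OS]

Inputs to the deployment algorithm:

  • Query plan: data dependency graph
  • Resource requirements of operators: Resource Activity Vectors
  • Multicore machine: model of multicore machine

Output: deployment of operators to CPU cores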

Deployment of query plans on multicores [VLDB’15]
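As a rough illustration of mapping operators to cores, the sketch below packs operators onto as few cores as possible. It is not the algorithm from the cited paper: a plain first-fit-decreasing bin packing is swapped in as a stand-in, the Resource Activity Vectors are reduced to a single assumed CPU-utilization number per operator, and the operator names and values are made up.

```python
from typing import Dict, List


def pack_operators(cpu_demand: Dict[str, float], core_capacity: float = 1.0) -> List[List[str]]:
    """First-fit decreasing: place each operator on the first core with room."""
    cores: List[List[str]] = []
    load: List[float] = []
    for op in sorted(cpu_demand, key=cpu_demand.get, reverse=True):
        demand = cpu_demand[op]
        for i in range(len(cores)):
            # Small tolerance so float rounding does not reject an exact fit.
            if load[i] + demand <= core_capacity + 1e-9:
                cores[i].append(op)
                load[i] += demand
                break
        else:
            # No existing core has room: open a new one.
            cores.append([op])
            load.append(demand)
    return cores


# Hypothetical per-operator CPU utilization (fraction of one core).
demand = {
    "scan_item": 0.6,
    "join_orders": 0.5,
    "sort": 0.3,
    "group_by": 0.2,
    "filter": 0.1,
}
placement = pack_operators(demand)
```

With these made-up numbers, five operators fit on two cores instead of five. A real deployment would also have to respect the data dependency graph and the machine model (caches, NUMA nodes), which this sketch ignores.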

slide-17
SLIDE 17

Evaluation

Query plan

  • SharedDB’s TPC-W [1]
  • 11 web-interactions in one query plan
  • 44 operators
  • 20GB dataset

AMD Magny-Cours

  • 4 x 2 dies:
      • 6 cores
      • 5 MB L3 cache
      • 16 GB NUMA node

[Figure: diagram of the AMD Magny-Cours machine, showing dies with L3 cache and DRAM]

[1] SharedDB – Giannikis et al. VLDB’12

slide-18
SLIDE 18

Comparison with standard approaches

Approach              # cores  Throughput [WIPS]   Response Time [ms]
                               Average  Stdev      50th    90th    99th
Default OS            48
Operator per core     44
Deployment algorithm

slide-19
SLIDE 19

Comparison with standard approaches

Approach              # cores  Throughput [WIPS]   Response Time [ms]
                               Average  Stdev      50th    90th    99th
Default OS            48       317.30   31.11      8.22    72.43   82.03
Operator per core     44       425.86   54.34      14.59   22.93   36.08
Deployment algorithm

slide-20
SLIDE 20

Comparison with standard approaches

Approach              # cores  Throughput [WIPS]   Response Time [ms]
                               Average  Stdev      50th    90th    99th
Default OS            48       317.30   31.11      8.22    72.43   82.03
Operator per core     44       425.86   54.34      14.59   22.93   36.08
Deployment algorithm  6

slide-21
SLIDE 21

Comparison with standard approaches

Approach              # cores  Throughput [WIPS]   Response Time [ms]
                               Average  Stdev      50th    90th    99th
Default OS            48       317.30   31.11      8.22    72.43   82.03
Operator per core     44       425.86   54.34      14.59   22.93   36.08
Deployment algorithm  6        428.07   32.80      15.36   23.73   36.13

slide-22
SLIDE 22

Comparison with standard approaches

Approach              # cores  Throughput [WIPS]   Response Time [ms]
                               Average  Stdev      50th    90th    99th
Default OS            48       317.30   31.11      8.22    72.43   82.03
Operator per core     44       425.86   54.34      14.59   22.93   36.08
Deployment algorithm  6        428.07   32.80      15.36   23.73   36.13

Performance / Resource efficiency savings of x7.37
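The slide does not state how the 7.37 figure is derived. One reading that reproduces it from the numbers on this slide is average throughput per core, comparing the deployment algorithm with the operator-per-core approach; the variable names below are chosen for illustration.

```python
# (average throughput in WIPS, number of cores), as reported on this slide
results = {
    "Default OS": (317.30, 48),
    "Operator per core": (425.86, 44),
    "Deployment algorithm": (428.07, 6),
}

# Throughput delivered per core for each approach.
per_core = {name: wips / cores for name, (wips, cores) in results.items()}

# Efficiency of the deployment algorithm relative to one operator per core.
gain_vs_operator_per_core = (
    per_core["Deployment algorithm"] / per_core["Operator per core"]
)
print(round(gain_vs_operator_per_core, 2))  # prints 7.37
```

In other words, throughput and tail latency stay close to the operator-per-core deployment while the number of cores drops from 44 to 6.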

slide-23
SLIDE 23

Conclusion

  • Multi-flavor data processing system
  • We have all the pieces of the puzzle:
      • Separate data-storage from data-processing
      • Efficient resource management
      • Batching as a first class citizen
      • … on a rack-scale system

Putting them together opens a lot of opportunities.

slide-24
SLIDE 24

Conclusion

  • Multi-flavor data processing system
  • We have all the pieces of the puzzle:
      • Separate data-storage from data-processing
      • Efficient resource management
      • Batching as a first class citizen
      • … on a rack-scale system

Putting them together opens a lot of opportunities.

  • Intelligent storage engine:
      • Co-processors, active-memory, hardware specialization (FPGAs)
  • Optimizing the network stack:
      • … for different memory access patterns
  • Extend the cross-layer interface:
      • DB optimizer that is aware of the complexity of the rack
      • Rack-scale resource management