Benchmarking Elastic Cloud Big Data Services under SLA Constraints - PowerPoint PPT Presentation



SLIDE 1

Benchmarking Elastic Cloud Big Data Services under SLA Constraints

Nicolas Poggi, Victor Cuevas-Vicenttín, David Carrera, Josep Lluis Berral, Thomas Fenech, Gonzalo Gomez, Davide Brini, Alejandro Montero, Umar Farooq Minhas, Jose A. Blakeley, Donald Kossmann, Raghu Ramakrishnan and Clemens Szyperski.

TPCTC - August 2019

SLIDE 2

Outline

  • 1. Intro to TPCx-BB
  • a. Limitations for cloud systems
  • b. Contributions
  • 2. Realistic workload generation
  • a. Production datasets
  • b. Job arrival rates
  • 3. Elasticity Test
  • a. Current metric
  • b. SLA-based addition
  • 4. Experimental evaluation
  • a. Elasticity Test
  • b. Load, Power, Throughput tests
  • c. Metric evaluation
  • 5. Conclusions
  • a. Future directions


SLIDE 3

Benchmarking and TPCx-BB

  • Benchmarks capture the solution to a problem and guide decisions.
  • Widely used in development, configuration, and testing.
  • TPCx-BB (BigBench) is the first standardized big data benchmark
  • Collaboration between industry and academia
  • Follows the retailer model of TPC-DS
  • Adds:
  • Semi and unstructured data
  • SQL, UDF, ML, and NLP queries

Retailer data model

SLIDE 4

TPCx-BB benchmark workflow

  • Similar to previous TPC database benchmarks:
  • Load Test (TLD):
  • Generates the DB
  • imports raw data, metastore, stats, columnar
  • Power Test (TPT)
  • Runs queries sequentially
  • Throughput Test (TTT)
  • Runs queries concurrently
  • Includes a data refresh stage
  • Produces a final performance metric
  • BB queries per minute

[Workflow diagram: Load data → DB @ SF; Power Test: sequential run of q1 … q30; Throughput Test: User1 (q15 q21 … q16), User2 (q12 q18 … q2), …, UserN run concurrently → final Metric]

SLIDE 5

Limitations of the concurrency test

Drawback 1:

  • Constant concurrency: all workloads run at the same scale

Drawback 2:

  • Does not consider QoS (isolation)
  • Query time degradation is not obvious from the final metric
  • We found poor scalability under concurrency in BB [1]

[Diagram: concurrent streams Stream1 (q15 q21 … q16), Stream2 (q12 q18 … q2), Stream3 (q16 q30 … q19)]

[1] Characterizing BigBench queries, Hive, and Spark in multi-cloud environments. TPCTC'17. [Chart: Q4 from 10 to 100GB, over 15X slower]
SLIDE 6

Proposal and contributions

  • 1. Build a realistic big data workload generator
  • Based on production workloads
  • 2. Measure QoS in the form of per-query SLAs
  • Apply the results in a new metric
  • With minimal parameters
  • 3. Extend TPCx-BB with a new concurrency test and metric
  • Implement a driver and evaluate differences
SLIDE 7

Realistic workload generation

SLIDE 8

Analyzing production big data workloads

  • Cosmos cluster operated within Microsoft
  • Sample of 350,000 job submissions
  • Over a month of data in 2017
  • Objectives:
  • 1. Model job submission patterns
  • 2. Workload characterization

[Chart: job submissions over time, with peaks and valleys]

SLIDE 9-10

Modeling arrival rates

  • Use a Hidden Markov Model (HMM) to model the temporal pattern in the workload
  • Transition probabilities between a finite number of states
  • The HMM allows scaling the workload
  • Fluctuations are captured by 4 states and the transitions between them

[Chart: job submissions over time, with peaks and valleys]
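A minimal sketch of such an arrival-rate model follows. The transition matrix and per-state rates below are illustrative placeholders, not the parameters fitted to the Cosmos trace:

```python
import random

# Illustrative 4-state model of job arrival rates (placeholder values,
# NOT the parameters fitted to the production trace).
# States can be read as: valley, low, high, peak.
TRANSITIONS = {
    0: [0.70, 0.20, 0.08, 0.02],
    1: [0.20, 0.60, 0.15, 0.05],
    2: [0.05, 0.20, 0.60, 0.15],
    3: [0.02, 0.08, 0.30, 0.60],
}
RATES = [1, 4, 10, 25]  # mean job submissions per interval in each state

def simulate_arrivals(n_intervals, seed=42, scale=1.0):
    """Sample per-interval submission counts; `scale` scales the workload
    up or down, which is what the HMM approach allows."""
    rng = random.Random(seed)
    state = 0
    counts = []
    for _ in range(n_intervals):
        # move to the next hidden state, then emit a job count
        state = rng.choices(range(4), weights=TRANSITIONS[state])[0]
        lam = RATES[state] * scale
        # Poisson draw via Knuth's multiplication method
        threshold, k, p = 2.718281828 ** -lam, 0, 1.0
        while p > threshold:
            k += 1
            p *= rng.random()
        counts.append(k - 1)
    return counts

print(simulate_arrivals(10))
```

Peaks and valleys emerge from dwelling in the high- and low-rate states, and a single `scale` knob resizes the whole workload.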

SLIDE 11

Job input data size

  • No general temporal pattern was found
  • A cumulative distribution is sufficient for modeling SF
  • The CDF is used to generate random variates mapped to SF
  • 1, 10, 100, 1000 GB
  • Studied further in [2]
  • Findings:
  • 55% < 1GB
  • 90% < 1TB

[Chart: CDF of the job's input data size]

[2] Big Data Management Systems performance analysis using Aloja and BigBench. Master thesis
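The CDF-to-SF mapping can be sketched as inverse-CDF sampling. The 55% and 90% breakpoints come from the findings above; the intermediate split at 10 GB is an illustrative assumption:

```python
import random

# Empirical CDF mapping a uniform variate to a TPCx-BB scale factor.
# The 55% (< 1 GB) and 90% (< 1 TB) breakpoints are from the trace
# analysis; the intermediate point at 10 GB is an assumption.
SF_CDF = [
    (0.55, 1),     # 55% of jobs: input under ~1 GB  -> SF 1
    (0.75, 10),    # assumed intermediate point
    (0.90, 100),   # 90% of jobs: input under ~1 TB
    (1.00, 1000),
]

def sample_scale_factor(rng):
    """Inverse-CDF sampling: draw a random variate and map it to an SF."""
    u = rng.random()
    for cum_p, sf in SF_CDF:
        if u <= cum_p:
            return sf
    return SF_CDF[-1][1]

rng = random.Random(7)
print([sample_scale_factor(rng) for _ in range(8)])
```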

SLIDE 12

Elasticity Test

SLIDE 13

Methodology for generating workloads

  • 1. Set scale (max concurrent submissions)
  • Defaults to n
  • Total queries = n * number of benchmark queries
  • 2. Generate model (queries per interval)
  • Assign queries to each batch randomly
  • Query repetition avoided within a batch
  • Multiple scale factors can be set
  • Include all standard smaller SF
  • 3. Define granularity
  • Set time between batches
  • Defaults to 60s.

SLIDE 14

Methodology for generating workloads (example)

Elasticity Test sequence (time intervals, # queries / batch):

  t1: q17
  t2: q7
  t3: q15 q21
  t4: q6 q9 q14
  t5: q9 q14
  t6: q11 q22 q21
  t7: q16 q15
  t8: q24
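The three steps can be sketched as follows. This is an illustrative driver, not the authors' implementation; the per-interval arrival counts would come from the HMM model:

```python
import random

QUERIES = tuple(f"q{i}" for i in range(1, 31))  # the 30 TPCx-BB queries

def generate_elasticity_workload(arrivals, queries=QUERIES,
                                 interval_s=60, seed=1):
    """Build an Elasticity Test sequence: one batch per time interval,
    batch sizes taken from the arrival model, queries assigned randomly,
    with no query repeated within a batch (random.sample draws without
    replacement). `interval_s` is the granularity, defaulting to 60s."""
    rng = random.Random(seed)
    schedule = []
    for i, n_jobs in enumerate(arrivals):
        batch = rng.sample(queries, min(n_jobs, len(queries)))
        schedule.append({"t": i * interval_s, "queries": batch})
    return schedule

for batch in generate_elasticity_workload([1, 2, 3]):
    print(batch)
```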

SLIDE 15-17

New SLA-aware benchmark metric

  • Query-specific SLAs on concurrency
  • Sets a limit for query completion time
  • Measures:
  • Number of misses
  • Distance to SLA
  • Indirectly, isolation and dependencies
  • Currently defined ad-hoc:
  • Uses Power Test times for the SUT(s)
  • Adds a 25% margin tolerance
  • Benefits:
  • Works on all SF and is future-proof to new technology

Example: q1 took 38s. in isolation, so the SLA for q1 = 47.5s.

Elasticity Test sequence (# queries / batch over time, with SLA distance):

  t1: q17
  t2: q7
  t3: q15 q21
  t4: q6 q9 q14
  t5: q9 q14
  t6: q11 q22 q21
  t7: q16 q15
  t8: q24
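The ad-hoc SLA definition above (Power Test time in isolation plus a 25% margin) can be expressed directly:

```python
def derive_slas(power_test_times, margin=0.25):
    """Per-query SLA: the query's Power Test (isolation) time plus a 25%
    tolerance margin, matching the example above (q1: 38s -> 47.5s)."""
    return {q: t * (1.0 + margin) for q, t in power_test_times.items()}

slas = derive_slas({"q1": 38.0, "q2": 120.0})
print(slas["q1"])  # 47.5
```

Because the SLAs are derived from the SUT's own Power Test run, the definition works at any scale factor and adapts to future systems.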

SLIDE 18-22

Current TPCx-BB performance metric

[Equation walkthrough of the BBQpm@SF metric, annotating the scale factor and the total number of queries]
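The metric equation itself was an image on these slides and did not survive extraction. As defined in the TPCx-BB specification, the current metric has the form (sketched here for reference; consult the specification for the normative definition):

```latex
% BBQpm@SF per the TPCx-BB specification:
%   M    = total number of queries (30)
%   n    = number of concurrent streams in the Throughput Test
%   Q(i) = elapsed time of query i in the Power Test
\[
  BBQpm@SF = \frac{SF \cdot 60 \cdot M}{T_{LD} + \sqrt{T_{PT} \cdot T_{TT}}}
\]
\[
  T_{LD} = 0.1 \cdot T_{Load}, \qquad
  T_{PT} = M \cdot \sqrt[M]{\prod_{i=1}^{M} Q(i)}, \qquad
  T_{TT} = \frac{T_{Tput}}{n}
\]
```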

SLIDE 23-28

New SLA-aware benchmark metric: BB++Qpm

[Equation walkthrough of the BB++Qpm metric, annotating: the interval between each batch of queries, the SLA distance, the SLA factor, and the total execution time of the Elasticity Test]

SLIDE 29-32

SLA distance

  • Distance between the actual execution time and the specified SLA
  • Queries that complete within their SLA do not contribute to the sum
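A minimal sketch of the SLA-distance term as described: only queries that exceed their SLA contribute (any normalization applied in the full BB++ formula is not reproduced here):

```python
def sla_distance(exec_times, slas):
    """Total distance past the SLA: queries that finish within their SLA
    contribute zero to the sum, per the definition above."""
    return sum(max(0.0, exec_times[q] - slas[q]) for q in exec_times)

# q1 meets its SLA (contributes 0); q2 misses its 50s SLA by 10s
print(sla_distance({"q1": 40.0, "q2": 60.0},
                   {"q1": 47.5, "q2": 50.0}))  # 10.0
```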
SLIDE 33-35

SLA factor

  • < 1 when less than 25% of the queries fail their SLA; > 1 when more than 25% of the queries fail their SLA
  • Based on the number of queries that fail to meet their SLA
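One plausible reading of the SLA factor is the miss ratio normalized by the 25% tolerance, which crosses 1 exactly at the stated threshold. The exact functional form on the slide was an image, so this is an illustrative assumption:

```python
def sla_factor(exec_times, slas, tolerance=0.25):
    """Miss ratio normalized by the 25% tolerance: below 1 when fewer
    than 25% of queries miss their SLA, above 1 when more than 25% do.
    (Illustrative form; the slide's exact formula was lost.)"""
    misses = sum(1 for q in exec_times if exec_times[q] > slas[q])
    return (misses / len(exec_times)) / tolerance

times = {"q1": 40.0, "q2": 60.0, "q3": 30.0, "q4": 20.0}
slas = {"q1": 47.5, "q2": 50.0, "q3": 35.0, "q4": 25.0}
print(sla_factor(times, slas))  # exactly 25% miss -> 1.0
```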

SLIDE 36

Experimental evaluation

SLIDE 37

Experimental evaluation

  • Experiments performed on Apache Hive (2.2/2.3) and Spark (2.1/2.2)
  • Benchmark runs limited to the 14 SQL queries of TPCx-BB
  • Due to errors and scalability limitations
  • Using a fixed scale factor
  • Total 512 cores and 2TB of RAM
  • 32 workers: 16 vcpus and 64GB RAM
  • Ran on 3 major cloud providers using block storage
  • Results anonymized
  • (Only results for Provider 1 at 10TB presented)

SLIDE 38

Elasticity Test at 10TB and 2 streams

[Chart: Provider A: Hive]

SLIDE 39

Elasticity Test at 10TB and 2 streams

[Charts: Provider A: Hive and Provider A: Spark]

SLIDE 40

Complete TPCx-BB test times at 10TB

                         Provider A: Hive    Provider A: Spark
  Load time (s)                 5,124                5,124
  Power Time (s)                5,036                5,520
  Throughput Time (s)          12,878                6,496
  Elasticity Time (s)           7,084                6,603
  Total Time (s)               30,122               23,743

SLIDE 41-42

Comparison of the two scores at 10TB

                       BBQpm (old)    BB++Qpm (new)
  Provider A: Hive         1,352              295
  Provider A: Spark        1,767            1,286

  • Hive gets a 4.3x lower score with the new metric
  • Spark also gets a lower score (30% difference)

SLIDE 43

Summary and future directions

SLIDE 44

Summary

  • The throughput test in TPC DB benchmarks provides limited signal
  • Closed-loop system (constant load)
  • Does not consider temporal patterns
  • Limited test of load balancers and schedulers (no queueing)
  • Modeling a real-world big data cluster we have produced:
  • A workload generator with job arrival rates
  • Multi-data-scales test
  • Extended TPCx-BB with the Elasticity Test
  • Incorporating SLAs and proposing a new metric
  • Evaluated its applicability to cloud big data systems
  • And how scores differ from the current metric


SLIDE 45

Conclusions and future work

  • The Elasticity Test considers aspects crucial for the cloud
  • Dynamic workloads in accordance with real-world behavior
  • QoS at the query level, or isolation
  • The ET can improve the development of elastic cloud systems
  • By rewarding systems that can keep QoS under concurrency
  • While saving costs in periods of low intensity

Future directions

  • Test elastic DBaaS / QaaS under concurrency
  • Specification of SLAs needs to be studied further
  • Work with this community and gather feedback and next steps
SLIDE 46

Thanks, questions?

Follow up / feedback: Npoggi@ac.upc.edu

Benchmarking Elastic Cloud Big Data Services under SLA Constraints

TPCTC - August 2019

SLIDE 47

Extra slides

SLIDE 48

Elasticity Test at 1TB Hive: Prov A and B

SLA tester (sample)

SLIDE 49

Sample total queries and arrivals

Workload parameters:

  • 10 TB scale factor
  • 2 streams of 14 SQL queries
  • total of 28 queries
  • λbatch = 240 sec (4 min)
SLIDE 50

Experiments at 100GB with 8-streams (112 total queries)

[Charts: a fast system vs. a slow system, the latter showing queueing and degraded performance]