Changing the Face of Database Cloud Services with Personalized Service Level Agreements


slide-1
SLIDE 1

Changing the Face of Database Cloud Services with Personalized Service Level Agreements

Jennifer Ortiz, Victor Teixeira de Almeida, Magdalena Balazinska

University of Washington, Computer Science and Engineering
PETROBRAS S.A., Rio de Janeiro, RJ, Brazil

1

CIDR 2015

slide-3
SLIDE 3

Many Data Management & Analytics Systems Available

3

slide-4
SLIDE 4

Many Systems are Available as Cloud Services

4

slide-5
SLIDE 5

Cloud Services Today

Amazon EMR:

  • How many instances of the service?
  • Which Hadoop version?
  • Pig or Hive?

slide-7
SLIDE 7

7

Cloud Services Today

How long will my query take?

BigQuery

slide-8
SLIDE 8

8

Cloud Services Can Do Better!

System Internals

slide-10
SLIDE 10

10

Cloud Services Can Do Better!

Query: SELECT … FROM … WHERE …

  • Query Capabilities
  • Time
  • Money

slide-11
SLIDE 11

Time to Re-think the Interface…

A new proposal: Personalized Service Level Agreements (PSLAs)

  • Hide details of cluster deployment and resources
  • Show users monetary costs and performance estimates on their data
  • Let users pick the desired trade-off between the options shown

slide-12
SLIDE 12

A PSLA Example

Tier 1: $0.10/hour

Within 20 seconds:
  SELECT <up to 10 attributes> FROM <Fact | Dimension> WHERE <up to 100% of data>
Within 1 minute:
  SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data>
Within 10 minutes:
  SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data>

  • Fixed, hourly price
  • Expected performance
  • Templates capture capabilities

slide-13
SLIDE 13

A PSLA Example

Tier 1: $0.10/hour

Within 20 seconds:
  SELECT <up to 10 attributes> FROM <Fact | Dimension> WHERE <up to 100% of data>
Within 1 minute:
  SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data>
Within 10 minutes:
  SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data>

Tier 2: $0.50/hour

Within 1 second:
  SELECT <up to 10 attributes> FROM <Fact | Dimension> WHERE <up to 100% of data>

  • Fixed, hourly price
  • Expected performance
  • Templates capture capabilities
  • Different tiers of service
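
One way to read the PSLA example above is as a small data structure: priced tiers, each listing time thresholds and the query shapes promised within them. The sketch below is a hypothetical encoding, not PSLAManager's actual representation; the names `Template`, `Tier`, and `cheapest_tier` are assumptions, and a single-table query (`<Fact | Dimension>`) is modeled as joining 0 dimensions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Template:
    """One PSLA line: queries up to this shape finish within `seconds`."""
    seconds: float        # promised time threshold
    max_attributes: int   # SELECT <up to N attributes>
    max_dimensions: int   # dimensions joined with Fact (0 = single table, an assumption)
    max_data_pct: float   # WHERE <up to P% of data>

    def covers(self, attributes: int, dimensions: int, data_pct: float) -> bool:
        return (attributes <= self.max_attributes
                and dimensions <= self.max_dimensions
                and data_pct <= self.max_data_pct)


@dataclass(frozen=True)
class Tier:
    name: str
    dollars_per_hour: float
    templates: Tuple[Template, ...]


# Values copied from the slide; only the templates shown there are listed.
PSLA = (
    Tier("Tier 1", 0.10, (Template(20, 10, 0, 100),
                          Template(60, 5, 4, 10),
                          Template(600, 10, 8, 100))),
    Tier("Tier 2", 0.50, (Template(1, 10, 0, 100),)),
)


def cheapest_tier(psla: Sequence[Tier], attributes: int, dimensions: int,
                  data_pct: float, deadline_s: float) -> Optional[Tier]:
    """Return the cheapest tier that promises this query shape within the deadline."""
    candidates = [
        tier for tier in psla
        if any(t.seconds <= deadline_s and t.covers(attributes, dimensions, data_pct)
               for t in tier.templates)
    ]
    return min(candidates, key=lambda tier: tier.dollars_per_hour, default=None)
```

For instance, a full scan of one table with a 20-second deadline fits Tier 1, while the same query with a 5-second deadline needs Tier 2.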

slide-14
SLIDE 14

Goals

14

Database D, Cloud C

slide-15
SLIDE 15

Goals

15

Database D, Cloud C → PSLA P (Money, Time, Capabilities)

slide-16
SLIDE 16

Goals

16

Database D, Cloud C → PSLAManager → PSLA P (Money, Time, Capabilities)

slide-18
SLIDE 18

Example of a Real PSLA

18

slide-19
SLIDE 19

TPC-H Star Schema Benchmark

19

  • Based on TPC-H
  • 10GB
slide-20
SLIDE 20

Myria is a cloud data management service that we built at UW. It has a parallel, shared-nothing back-end query execution engine called MyriaX.

20

slide-21
SLIDE 21

PSLA for Myria

21

slide-26
SLIDE 26

PSLAManager

26

Database D, Cloud C → PSLAManager → PSLA P

slide-27
SLIDE 27

PSLAManager Workflow

  • Perf. Modeling
  • Workload Generation
  • Runtime Prediction
  • Tier Selection
  • Workload Compression into PSLA (repeat for each tier)
    – Query Clustering
    – Template Generation
    – Dropping Queries with Similar Times

Diagram also shows: Data Service, PSLA. Each stage is marked as either Perform Offline or Perform Online.

slide-34
SLIDE 34

Query Workload Generation

  • Which queries to generate?
    – Joins drive performance
  • Think about possible combinations of joins
    – Only consider the most expensive queries
    – Build toward more complex queries; include selections and projections

Tables in order by size: Lineitem, Part, Customer, Supplier, Date

Consider: all possible 2-way joins
(Lineitem ⋈ Part), (Customer ⋈ Date), (Lineitem ⋈ Supplier), etc.
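
The enumeration of 2-way joins can be sketched in a few lines. This is only an illustration, assuming that "all possible 2-way joins" means every unordered pair of the five listed tables; the function name `two_way_joins` is hypothetical, and the real generator would go on to add selections, projections, and larger joins.

```python
from itertools import combinations

# Tables in order by size, largest first, as listed on the slide.
TABLES = ["Lineitem", "Part", "Customer", "Supplier", "Date"]


def two_way_joins(tables):
    """Every unordered pair of tables; the larger table of each pair comes first."""
    return list(combinations(tables, 2))


joins = two_way_joins(TABLES)
for left, right in joins:
    print(f"({left} JOIN {right})")
```

Because `combinations` preserves input order, keeping the table list sorted by size also keeps each pair ordered by size.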

slide-38
SLIDE 38

Tier Selection

[Figure: Runtime Distributions of Query Workload Per Configuration in Myria. x-axis: Workers (Configurations), ticks 2 to 18; y-axis: Seconds, ticks 50 to 350. EMD values shown: 17.43, 7.07, 13.74.]
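
The figure compares runtime distributions across configurations using EMD (earth mover's distance). The slide does not spell out the selection rule, so the sketch below assumes a simple greedy one: walk the configurations from smallest to largest and keep one as a tier only if its runtime distribution is at least `min_emd` away from the last kept tier. The runtimes, the threshold, and the names `emd_1d` and `select_tiers` are all hypothetical.

```python
def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1-D samples.

    For equal-size samples this is the mean absolute difference
    between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def select_tiers(runtimes_by_workers, min_emd):
    """Greedy pass (an assumed rule): keep a configuration only if it
    differs enough from the most recently kept one."""
    selected = []
    for workers in sorted(runtimes_by_workers):
        if not selected:
            selected.append(workers)
            continue
        last = runtimes_by_workers[selected[-1]]
        if emd_1d(last, runtimes_by_workers[workers]) >= min_emd:
            selected.append(workers)
    return selected


# Hypothetical runtimes (seconds) of the same four queries per configuration.
runtimes = {
    4: [40.0, 90.0, 200.0, 320.0],
    6: [38.0, 88.0, 195.0, 310.0],
    8: [25.0, 60.0, 140.0, 230.0],
    16: [12.0, 30.0, 70.0, 110.0],
}
tiers = select_tiers(runtimes, min_emd=10.0)
print(tiers)
```

With these made-up numbers the 6-worker configuration is skipped because it performs almost the same as the 4-worker one, so offering it as a separate tier would add little.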

slide-39
SLIDE 39

39

Tier Selection

slide-41
SLIDE 41

Workload Compression

STEP 1: Query Clustering

  • Threshold-based
  • Density-based

[Plot; x-axis: Workers, ticks 2 to 18]

slide-42
SLIDE 42

Workload Compression

STEP 1: Query Clustering

[Plot; x-axis: Workers, ticks 2 to 18; a threshold "th" is marked]

slide-44
SLIDE 44

Workload Compression

STEP 1: Query Clustering

Tier 1: $0.XX/hour

Within Cluster Max Threshold: Query Templates in Cluster…
Within Cluster Max Threshold: Query Templates in Cluster…
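
Threshold-based clustering can be sketched as bucketing queries by runtime, so that each cluster is advertised under its maximum threshold. The code below is a minimal illustration, not the PSLAManager implementation: the function name, the query names, the runtimes, and the log-spaced thresholds are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def threshold_clusters(runtimes, thresholds):
    """Group queries under the smallest threshold that is >= their runtime."""
    thresholds = sorted(thresholds)
    clusters = defaultdict(list)
    for query, seconds in runtimes.items():
        i = bisect_left(thresholds, seconds)
        if i == len(thresholds):
            raise ValueError(f"{query} exceeds the largest threshold")
        clusters[thresholds[i]].append(query)
    return dict(clusters)


# Hypothetical runtimes in seconds for one tier.
runtimes = {"q1": 0.8, "q2": 7.5, "q3": 9.0, "q4": 42.0, "q5": 310.0}
log_thresholds = [1, 10, 100, 1000]  # assumed log-spaced interval boundaries

clusters = threshold_clusters(runtimes, log_thresholds)
for threshold, queries in sorted(clusters.items()):
    print(f"Within {threshold} s: {queries}")
```

Each key of `clusters` plays the role of a "Within Cluster Max Threshold" line; the templates listed under it would come from the next step.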

slide-45
SLIDE 45

Workload Compression

STEP 2: Template Generation

Queries → Query Templates

Query Dominance

  SELECT (5 ATT) FROM (5 TABLES) WHERE 10%
  SELECT (4 ATT) FROM (4 TABLES) WHERE 1%

[Plot axes: Time(s), Configuration]

slide-46
SLIDE 46

Workload Compression

STEP 2: Template Generation

Query Dominance

  SELECT (5 ATT) FROM (5 TABLES) WHERE 10%
  SELECT (4 ATT) FROM (4 TABLES) WHERE 1%

Template dimensions: Attributes Projected, Tables, Selectivity

[Plot axes: Time(s), Configuration]

slide-48
SLIDE 48

48

Workload Compression

STEP 2: Template Generation

Time(s) Configuration

slide-49
SLIDE 49

Workload Compression

STEP 2: Template Generation

Root Query Template: We call a query template a root query template if no other query template in the same cluster dominates it.

[Plot axes: Time(s), Configuration]
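
The root-template definition can be illustrated with a small sketch. The slides name the three template dimensions (attributes projected, tables, selectivity) but not the exact dominance rule, so the code assumes that one template dominates another when it is at least as large on every dimension and the two are not identical. The names `Template`, `dominates`, and `root_templates` are hypothetical.

```python
from typing import NamedTuple


class Template(NamedTuple):
    attributes: int      # attributes projected
    tables: int          # tables joined
    selectivity: float   # percent of data selected


def dominates(a, b):
    """Assumed rule: a >= b on every dimension, and a differs from b."""
    return a != b and all(x >= y for x, y in zip(a, b))


def root_templates(cluster):
    """Templates that no other template in the same cluster dominates."""
    return [t for t in cluster
            if not any(dominates(other, t) for other in cluster)]


cluster = [
    Template(5, 5, 10.0),  # SELECT (5 ATT) FROM (5 TABLES) WHERE 10%
    Template(4, 4, 1.0),   # SELECT (4 ATT) FROM (4 TABLES) WHERE 1%
    Template(6, 2, 1.0),   # hypothetical extra template
]
roots = root_templates(cluster)
print(roots)
```

Here the 5-attribute template dominates the 4-attribute one, so only the first and third templates remain as roots; showing a root implicitly covers everything it dominates.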

slide-50
SLIDE 50

Workload Compression

STEP 3: Dropping Queries with Similar Times

[Plot: Time(s) for Configuration 1]

slide-51
SLIDE 51

Workload Compression

STEP 3: Dropping Queries with Similar Times

Queries → Query Templates
Root Query Templates

[Plot: Time(s) for Configuration 1 and Configuration 2]

slide-52
SLIDE 52

Workload Compression

STEP 3: Dropping Queries with Similar Times

[Plot: Time(s) for Configuration 1 and Configuration 2]
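
A minimal sketch of this step, under an assumption the slides do not state explicitly: a template is dropped from the second (pricier) configuration when its time there is not faster than in the first configuration by more than some tolerance. The function name `drop_similar`, the tolerance, and the runtimes are hypothetical.

```python
def drop_similar(cheaper, pricier, tolerance_s):
    """Keep from `pricier` only the templates that are clearly faster there.

    Both arguments map a template string to its time threshold in seconds.
    """
    kept = {}
    for template, seconds in pricier.items():
        baseline = cheaper.get(template)
        if baseline is None or baseline - seconds > tolerance_s:
            kept[template] = seconds
    return kept


big = "SELECT (5 ATT) FROM (5 TABLES) WHERE 10%"
small = "SELECT (4 ATT) FROM (4 TABLES) WHERE 1%"

config1 = {big: 60.0, small: 10.0}  # hypothetical times, Configuration 1
config2 = {big: 20.0, small: 9.0}   # hypothetical times, Configuration 2

kept = drop_similar(config1, config2, tolerance_s=5.0)
print(kept)
```

In this example the small query gains only one second in Configuration 2, so it is dropped there; the larger query stays because the pricier configuration makes a visible difference.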

slide-55
SLIDE 55

PSLA Quality Assessment

55

slide-56
SLIDE 56

PSLA Quality Metrics

  • PSLA Query Capabilities
  • PSLA Complexity
  • PSLA Performance Error Metric

56

slide-57
SLIDE 57

Quality Metric Trade-offs

  • High Complexity, Low Error
  • Low Complexity, High Error

[Two plots over Time(s), one per case]

Found that a log interval is best without tuning
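
The trade-off can be made concrete with a toy calculation. The slides name the metrics without defining them, so the definitions here are assumptions for illustration only: complexity is the number of clusters actually used, and error is the mean relative gap between the promised threshold and the real runtime. All runtimes and thresholds are made up.

```python
from bisect import bisect_left


def assess(runtimes, thresholds):
    """Return (complexity, error) under the assumed definitions above.

    Assumes every runtime is <= the largest threshold.
    """
    thresholds = sorted(thresholds)
    used, gaps = set(), []
    for seconds in runtimes:
        promised = thresholds[bisect_left(thresholds, seconds)]
        used.add(promised)
        gaps.append((promised - seconds) / promised)
    return len(used), sum(gaps) / len(gaps)


runtimes = [2.0, 4.0, 8.0, 40.0, 80.0]  # hypothetical runtimes in seconds

fine = assess(runtimes, [2, 4, 8, 40, 80])  # one cluster per query
coarse = assess(runtimes, [100])            # a single cluster
log = assess(runtimes, [1, 10, 100])        # log-spaced intervals

for name, (complexity, error) in [("fine", fine), ("coarse", coarse), ("log", log)]:
    print(f"{name}: complexity={complexity}, error={error:.2f}")
```

With these numbers, one cluster per query gives zero error but the longest PSLA, a single cluster gives the shortest PSLA but the largest error, and log-spaced intervals land in between.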

slide-58
SLIDE 58

58

PSLA Evaluation on Predicted Runtimes

slide-61
SLIDE 61

Performance Model (Offline)

  • Train the model offline on other data and queries
  • Predict runtime from query features

Query workload → Query Feature Vector (Est. Rows, Est. IO, Avg. Row) + Cloud Configuration → runtime

Based on Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning [Ganapathi et al. 2009]
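
To make the train-offline, predict-online split concrete, here is a deliberately simple stand-in: a k-nearest-neighbor predictor over the query feature vector plus the configuration. It is not the model from the cited work and not the Myria implementation; the feature values, worker counts, runtimes, and the function `predict_runtime` are hypothetical.

```python
import math


def predict_runtime(training, features, workers, k=2):
    """Average the runtimes of the k training points nearest to the query.

    `training` holds (feature_list, workers, runtime_seconds) rows gathered
    offline. Distances use raw values here; a real model would normalize
    the features first.
    """
    point = list(features) + [workers]

    def distance(row):
        return math.dist(list(row[0]) + [row[1]], point)

    nearest = sorted(training, key=distance)[:k]
    return sum(row[2] for row in nearest) / len(nearest)


# Hypothetical offline training data:
# ([est. rows, est. IO, avg. row], workers, runtime in seconds)
training = [
    ([1.0, 2.0, 0.5], 4, 30.0),
    ([1.0, 2.0, 0.5], 8, 16.0),
    ([5.0, 9.0, 0.5], 4, 120.0),
    ([5.0, 9.0, 0.5], 8, 65.0),
]

exact = predict_runtime(training, [1.0, 2.0, 0.5], 4, k=1)
blended = predict_runtime(training, [5.0, 9.0, 0.5], 8, k=2)
print(exact, blended)
```

The point of the sketch is the interface: the training rows come from other data and queries, while prediction needs only the optimizer-style features of a new query and the candidate configuration.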

slide-62
SLIDE 62

Training Dataset

  • Synthetic Dataset
    – 10 GB
    – 6 Tables
    – 61 Attributes

slide-64
SLIDE 64

Predicted Myria PSLA (Predicted Runtimes)

64

slide-65
SLIDE 65

Looking Forward

  • Direct extensions to the approach
    – Add support for indexes
    – Improve time predictions
  • Longer-term future work
    – Can we guarantee the runtimes?
    – Can we update the PSLA as the user queries the data? The goal is to show increasingly complex queries.
    – Usability testing

slide-66
SLIDE 66

Conclusion

  • Many cloud DBMSs exist
  • They require users to reason about resources
  • We propose to re-think that interface
  • Personalized Service Level Agreements
    – Service Tiers
    – Price/Capabilities/Performance
  • Important direction for cloud DBMSs