
slide-1
SLIDE 1

Stout

An Adaptive Interface to Scalable Cloud Storage

John C. McCullough John Dunagan† Alec Wolman† Alex C. Snoeren

UC San Diego Microsoft Research†

June 23, 2010

slide-2
SLIDE 2

Scalable Multi-tiered Services

[Diagram, built up across several slides: clients connect to a web (www) tier, which calls an application (app) tier, which persists to a storage (store) tier; multiple services, e.g. a Spreadsheet app and other apps, share the same scaled-out store.]

2

slide-16
SLIDE 16

Key-Value Storage

◮ Simple interface
  ◮ read(key) → value
  ◮ write(key, value)
◮ Natural to send requests right away
◮ Block for response to survive failures
◮ Performance characteristics:

[Plot: end-to-end latency vs. load (requests/s); latency climbs sharply as the store reaches saturation]

3
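The interface above can be sketched in a few lines. This is an illustrative stand-in, not Stout's API: an in-memory dict plays the role of the store, and the class name is an assumption.

```python
# Minimal sketch of the blocking read/write interface from the slide.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def read(self, key):
        # read(key) -> value (None if the key is absent)
        return self._data.get(key)

    def write(self, key, value):
        # write(key, value); a real store blocks here until the update
        # is durable, so acknowledged writes survive failures
        self._data[key] = value

store = KeyValueStore()
store.write("x", 5)
print(store.read("x"))  # 5
```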

slide-20
SLIDE 20

Improving Performance Under Load

◮ Application server handles requests for many clients
◮ Storage request overheads
  ◮ Networking delay
  ◮ Protocol processing
  ◮ Disk seeks
  ◮ etc.
◮ Batch to amortize overheads

4
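The amortization argument can be made concrete with a toy cost model (illustrative only; the overhead and per-item constants are assumptions, not measurements from the talk): one store round trip per batch instead of one per request.

```python
# Toy illustration of batching to amortize per-request overhead.
OVERHEAD_MS = 5.0   # assumed fixed cost per store request (RTT, protocol, seek)
PER_ITEM_MS = 0.1   # assumed marginal cost per item in a request

def unbatched_cost(n_requests):
    # every request pays the full overhead
    return n_requests * (OVERHEAD_MS + PER_ITEM_MS)

def batched_cost(n_requests, batch_size):
    # the overhead is paid once per batch
    n_batches = -(-n_requests // batch_size)  # ceiling division
    return n_batches * OVERHEAD_MS + n_requests * PER_ITEM_MS

print(unbatched_cost(1000))     # 5100.0 ms of store work
print(batched_cost(1000, 100))  # 150.0 ms of store work
```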

slide-28
SLIDE 28

Selecting a Batching Interval

◮ Most apps use a fixed batching interval
◮ Latency/throughput tradeoff: a short interval gives better latency, a long interval gives better throughput
◮ Want a flexible batching interval
  ◮ Short when lightly loaded
  ◮ Long when heavily loaded

[Plot: end-to-end latency vs. load (requests/s) for several fixed intervals; each interval is best over a different load range]

5
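The tradeoff can be seen in a toy queueing model (all constants and the queueing form are assumptions for illustration, not from the talk): a request waits roughly half an interval for the next batch, while each batch imposes a fixed cost on the store, so short intervals saturate the store sooner.

```python
# Toy model of the fixed-interval latency/throughput tradeoff.
CAPACITY = 1000.0   # assumed store budget: ms of work per second
OVERHEAD_MS = 5.0   # assumed fixed cost per batched write
PER_ITEM_MS = 0.02  # assumed marginal cost per request

def latency_ms(interval_ms, load_rps):
    batches_per_s = 1000.0 / interval_ms
    work = batches_per_s * OVERHEAD_MS + load_rps * PER_ITEM_MS
    util = work / CAPACITY
    if util >= 1.0:
        return float("inf")               # store saturated
    service = PER_ITEM_MS / (1.0 - util)  # queueing-style blow-up near saturation
    return interval_ms / 2.0 + service    # plus average wait for the next batch

print(latency_ms(10, 1000), latency_ms(100, 1000))    # short interval wins
print(latency_ms(10, 40000), latency_ms(100, 40000))  # long interval wins
```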

slide-35
SLIDE 35

Solution: Stout

[Diagram: two services (app1, app2) each interpose Stout between their app tier and a shared, scaled-out store.]

◮ Stout is a storage interposition library
◮ Our contribution is a technique by which each server independently adjusts its batching interval

6

slide-36
SLIDE 36

Outline

  • 1. Introduction
  • 2. Application Structure
  • 3. Adaptive Batching
  • 4. Evaluation

7

slide-37
SLIDE 37

Overlapped Request Processing

Without Stout, each request persists its state synchronously:

    ProcessRequest(req):
        key = Parse(req)
        Process(key, req)
        PersistState(key)
        reply = MakeReply(req)
        SendReply(reply)

With Stout, requests only mark state dirty, and a batching loop persists it:

    ProcessRequest(req):
        key = Parse(req)
        Process(key, req)
        MarkDirty(key)
        reply = MakeReply(req)
        SafeReply(key, reply)

    BatchingLoop:
        keys = DirtyKeys()
        replies = Depends(keys)
        AsyncWrite(keys, replies)
        Sleep(interval)

8
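The structure above can be made runnable as a small sketch. This is not Stout's implementation: MarkDirty/SafeReply/DirtyKeys/Depends/AsyncWrite are modeled with plain data structures, a dict stands in for the store, and the write is synchronous rather than asynchronous.

```python
# Sketch of dirty-key tracking with replies held until the batch commits.
class StoutSketch:
    def __init__(self):
        self.state = {}   # in-memory application state
        self.store = {}   # stands in for the key-value store
        self.dirty = set()  # keys modified since the last batch
        self.held = []    # (key, reply) pairs awaiting commit (SafeReply)
        self.sent = []    # replies released to clients

    def process_request(self, key, value):
        self.state[key] = value          # Process(key, req)
        self.dirty.add(key)              # MarkDirty(key)
        reply = f"ok {key}={value}"      # MakeReply(req)
        self.held.append((key, reply))   # SafeReply: hold until durable

    def batching_loop_once(self):
        keys = self.dirty                # DirtyKeys()
        # write all dirty keys in one batch (AsyncWrite, done inline here)
        self.store.update({k: self.state[k] for k in keys})
        self.dirty = set()
        # release every held reply whose key has now been committed
        self.sent += [r for k, r in self.held if k in self.store]
        self.held = [(k, r) for k, r in self.held if k not in self.store]

s = StoutSketch()
s.process_request("x", 5)
s.process_request("x", 6)
print(s.sent)             # [] -- replies held until the batch commits
s.batching_loop_once()
print(s.sent)             # both replies released after one write of x
```

Note that the two updates to `x` commit in a single store write, which is exactly the write collapsing discussed later.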

slide-53
SLIDE 53

Staying Safe: Consistency

◮ Don’t reveal uncommitted state
◮ Potential async: inconsistency on failure
◮ Stout provides serialized update semantics

[Timeline diagrams: synchronous — the app replies to x=5 only after the store acknowledges the write; potential async — the app replies before the write commits, so a failure during the interval reveals uncommitted state; Stout async — the reply is held until the batched write commits, at most one interval later]

9

slide-66
SLIDE 66

Benefit: Write Collapsing

◮ Batched commits enable further optimization
◮ Can write most recent version only
◮ Reduces load at the store

[Example: within one batching interval the app writes x=5, then x=6, then x=7; only x=7 is sent to the store]

10
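Write collapsing falls out naturally when pending writes are buffered per key: a newer value simply overwrites the older buffered one. A minimal sketch (names are illustrative, not Stout's API):

```python
# Last-writer-wins buffering: N application writes to one key
# collapse into a single store write per interval.
pending = {}  # key -> latest uncommitted value

def buffer_write(key, value):
    pending[key] = value   # newer value replaces the older one

def flush(store):
    store.update(pending)  # one batched write covering all keys
    n = len(pending)       # writes actually sent to the store
    pending.clear()
    return n

store = {}
for v in (5, 6, 7):
    buffer_write("x", v)   # three application writes...
print(flush(store), store) # ...collapse to 1 store write: x=7
```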

slide-72
SLIDE 72

Outline

  • 1. Introduction
  • 2. Application Structure
  • 3. Adaptive Batching
  • 4. Evaluation

11

slide-73
SLIDE 73

Adapting to Shared Storage

◮ Storage system is a shared medium
◮ Each server must independently reach an efficient fair share
◮ Delay as congestion indicator, rather than modifying the storage system for explicit notification

[Diagram: several app servers, each with Stout, share a store whose queue is the congested resource]

12

slide-74
SLIDE 74

Delay-based Congestion Control

◮ Unknown bottleneck capacity
◮ Traditional TCP is signaled via packet loss
◮ Delay-based congestion control is triggered by latency changes

[Diagram: a router queue builds as senders exceed the bottleneck capacity]

13

slide-75
SLIDE 75

Applications to Storage

    Mechanism     Networking (Change Rate)   Storage (Change Size)
    ACCELERATE    Send faster                Batch less
    BACK-OFF      Send slower                Batch more

14

slide-76
SLIDE 76

Algorithm

    if perf < recent_perf:
        BACK-OFF
    else:
        ACCELERATE

15

slide-77
SLIDE 77

Algorithm: Estimating Storage Performance

    if perf < recent_perf:
        BACK-OFF
    else:
        ACCELERATE

Performance estimate:

    perf = batch_size / (latency + interval)

16

slide-78
SLIDE 78

Algorithm: Estimating Storage Capacity

    if perf < recent_perf:
        BACK-OFF
    else:
        ACCELERATE

Capacity estimate:

    if backed-off:
        EWMA(batch_size_i) / (EWMA(lat_i) + EWMA(interval_i))
    else:  # accelerated
        MAX_i( batch_size_i / (lat_i + interval_i) )

17

slide-79
SLIDE 79

Algorithm: Achieving Fair Share

    if perf < recent_perf:
        BACK-OFF:    interval_i ← (1 + α) ∗ interval_i
    else:
        ACCELERATE:  interval_i ← (1 − β) ∗ interval_i + β ∗ interval_target

BACK-OFF lengthens the interval multiplicatively; ACCELERATE is an EWMA that smooths the interval toward a shorter target derived from the capacity estimate.

[Plot: interval vs. time; competing servers converge to similar intervals]

18
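The controller can be sketched as follows. ALPHA, BETA, and the choice of target interval are illustrative assumptions; Stout's actual controller derives its target from the EWMA-based capacity estimates on the preceding slides.

```python
# Sketch of the back-off/accelerate rule for the batching interval.
ALPHA = 0.1  # multiplicative back-off factor (assumed value)
BETA = 0.5   # EWMA gain when accelerating (assumed value)

def perf(batch_size, latency_s, interval_s):
    # performance estimate from the preceding slides
    return batch_size / (latency_s + interval_s)

def adapt(interval_s, cur_perf, recent_perf, target_interval_s):
    if cur_perf < recent_perf:
        # BACK-OFF: batch more by lengthening the interval
        return (1 + ALPHA) * interval_s
    # ACCELERATE: EWMA toward a shorter target interval
    return (1 - BETA) * interval_s + BETA * target_interval_s

i = 0.040
i = adapt(i, 900.0, 1000.0, 0.020)   # perf dropped -> back off (~44 ms)
i = adapt(i, 1100.0, 1000.0, 0.020)  # perf held up -> accelerate (~32 ms)
print(i)
```

Because back-off is multiplicative and acceleration is a damped move toward a shared target, competing servers drift toward similar intervals, which is the fair-share behavior the slide's plot shows.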

slide-83
SLIDE 83

Outline

  • 1. Introduction
  • 2. Application Structure
  • 3. Adaptive Batching
  • 4. Evaluation

19

slide-84
SLIDE 84

Evaluation

◮ Baseline storage system performance
  ◮ Benefits of batching
  ◮ Benefits of write-collapsing
◮ Stout
  ◮ Versus fixed batching intervals
  ◮ Workload variation

20

slide-85
SLIDE 85

Evaluation

Sectioned Document Store — Our Workload

◮ 256-byte documents: IOPS dominated
◮ 50% read, 50% write

21

slide-88
SLIDE 88

Evaluation: Configuration

Evaluation Platform

◮ 50 machines
  ◮ 1 experiment controller
  ◮ 1 lease manager
  ◮ 12 frontends
  ◮ 32 middle tiers
  ◮ 4 storage servers (partitioned key-value store with MSSQL as backing storage)

[Diagram: 12× www tier, 32× app tier, 4× store tier]

22

slide-89
SLIDE 89

Baseline: Importance of Batching

[Plot: end-to-end latency (ms) vs. load (2k–18k requests/s) for no-batching, 10 ms, and 20 ms fixed intervals; batching sustains higher load before latency climbs]

◮ Batching improves performance

23

slide-92
SLIDE 92

Baseline: Importance of Write-Collapsing

[Plot: end-to-end latency (ms) vs. load (4k–20k requests/s) for 10 ms and 20 ms intervals under low and high collapsing]

Low collapsing: 10k documents. High collapsing: 100 documents.

◮ Improvement dependent on workload

24

slide-96
SLIDE 96

Evaluation: Stout vs. Fixed Intervals

[Plot: end-to-end latency (ms) vs. load (5k–45k requests/s) for fixed 20 ms, 40 ms, 80 ms, and 160 ms intervals, and for Stout]

◮ Stout better than any fixed interval across a wide range of workloads

25

slide-101
SLIDE 101

Evaluation: Workload Variation

[Plot: end-to-end latency (ms) vs. time (60–120 s) for a fixed 20 ms interval and for Stout, under a load decrease and a load increase]

Decrease: 12k requests/s → 8k requests/s. Increase: 12k requests/s → 18k requests/s.

26

slide-105
SLIDE 105

Additional Evaluation

◮ Fairness (Jain’s fairness index of 0.96)
◮ Stout achieves similar performance with:
  ◮ PacificA
  ◮ SQL Data Services

27

slide-106
SLIDE 106

Conclusion

◮ Batching improves storage performance
◮ Current practice is a fixed latency/throughput tradeoff
◮ Stout introduces a distributed adaptation technique
◮ Achieves 3× higher throughput than the low-latency fixed interval for a modified Live Mesh service

28

slide-107
SLIDE 107

Questions?

29