SLIDE 1

Fourier-Assisted Machine Learning of Hard Disk Drive Access Time Models

Adam Crume (1), Carlos Maltzahn (1), Lee Ward (2), Thomas Kroeger (2), Matthew Curry (2), Ron Oldfield (2), Patrick Widener (2)

(1) University of California, Santa Cruz

{adamcrume, carlosm}@cs.ucsc.edu

(2) Sandia National Laboratories, Livermore, CA

{lee, tmkroeg, mlcurry, raoldfi, pwidene}@sandia.gov

November 18, 2013

Crume, Maltzahn, Ward, Kroeger, Curry, Oldfield, Widener Hard Drive Access Times

SLIDE 2

Predicting hard drive performance

Use cases:
  • System simulations
  • File system design
  • Quality of service / real-time guarantees (Anna Povzner’s work with Fahrrad)

SLIDE 3

Complexity, part 1

[Figure: disk geometry. Labeled parts: spindle, arm, read/write head. Track 0 (25 sectors) and Track 1 (24 sectors) each show sectors 0, 1, 2, ... and spare sectors. Also labeled: skew, and a bad sector that has been remapped.]

Rotational latency = 8.33 ms max at 7200 RPM
Seek = ~22 ms max
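The 8.33 ms figure follows from the spindle speed: in the worst case the head waits one full revolution. A minimal sketch of that arithmetic (the function name is illustrative, not from the slides):

```python
def max_rotational_latency_ms(rpm):
    """Worst-case rotational latency: the time for one full revolution."""
    milliseconds_per_minute = 60_000.0
    return milliseconds_per_minute / rpm


latency_ms = max_rotational_latency_ms(7200)
print(f"{latency_ms:.2f} ms")
```
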

SLIDE 4

Complexity, part 2

[Figure: two spindle-and-platter diagrams showing how tracks are grouped into serpentines. First diagram, track order: 0, 1, 2, 5, 4, 3, 6, 7, 8, 11, 10, 9. Second diagram, track order: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.]

SLIDE 5

Complexity, part 3

Additionally:
  • Queueing
  • Scheduling
  • Caching
  • Readahead
  • Write-back

SLIDE 6

Machine learning of hard drive performance

Short term goals (this presentation):
  • Automated
  • Fast
Long term goals (future work):
  • Flexible
  • Future-proof
  • Device-independent

SLIDE 7

Offline vs. online machine learning

Offline: separate train and test phases
  Training: Device → Training data → Model
  Testing (model frozen): Requests → Model → Predictions

Online: feedback from real device
  Requests → Model → Predictions
  Requests → Device
  Compare predictions with the device → feedback to the model
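The difference between the two modes can be sketched with a deliberately trivial model. Everything here is an assumption for illustration: `MeanModel`, `run_offline`, `run_online`, and the `device` callable are hypothetical names, and a running mean stands in for the real learners discussed later.

```python
class MeanModel:
    """Toy model: predicts the mean of all latencies it has been shown."""

    def __init__(self):
        self.total_ms = 0.0
        self.count = 0

    def predict(self, request):
        return self.total_ms / self.count if self.count else 0.0

    def update(self, request, latency_ms):
        self.total_ms += latency_ms
        self.count += 1


def run_offline(model, training_data, requests):
    """Train on recorded (request, latency) pairs, then freeze and predict."""
    for request, latency_ms in training_data:
        model.update(request, latency_ms)
    # The model is frozen from here on: no further updates.
    return [model.predict(request) for request in requests]


def run_online(model, requests, device):
    """Predict each request, then learn from the latency the device reports."""
    predictions = []
    for request in requests:
        predictions.append(model.predict(request))
        model.update(request, device(request))  # feedback from the real device
    return predictions


offline_predictions = run_offline(MeanModel(), [(0, 4.0), (1, 6.0)], [7, 8])
online_predictions = run_online(MeanModel(), [1, 2, 3], lambda r: 2.0 * r)
```

In the offline run every prediction comes from the same frozen state; in the online run each prediction reflects all feedback received so far.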

SLIDE 8

Offline vs. online use cases

  • System simulations (offline)
  • File system design (offline)
  • Quality of service / real-time guarantees (offline/online)

SLIDE 9

Existing machine learning approaches: demerit

[Figure: actual CDF and predicted CDF of latency. Axes: latency (ms) and percentile (up to 1).]
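The slide does not define demerit. The sketch below assumes the common definition from the disk-modeling literature: the root mean square of the horizontal distance between the actual and predicted latency CDFs. With equal-sized samples this reduces to comparing the sorted latencies pairwise. The function name and sample values are illustrative.

```python
import math


def demerit(actual_ms, predicted_ms):
    """RMS horizontal distance between two empirical latency CDFs.

    Assumes both samples have the same size, so that the i-th smallest
    value of each corresponds to the same percentile.
    """
    if len(actual_ms) != len(predicted_ms):
        raise ValueError("samples must have the same length")
    pairs = zip(sorted(actual_ms), sorted(predicted_ms))
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(actual_ms))


# Every individual prediction is wrong, yet the distributions match exactly.
shuffled = demerit([1.0, 2.0, 3.0], [3.0, 1.0, 2.0])
shifted = demerit([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The shuffled case shows why demerit is an aggregate metric: it compares distributions, so it can be zero even when no single request is predicted correctly.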

SLIDE 10

Existing machine learning approaches: average in time slice

[Figure: latency (ms) versus wall clock time (s), with the time axis divided into time slices 0 through 4.]

For each time slice: predict each request individually (r0 → prediction0, r1 → prediction1, ..., rn → predictionn), average the predictions, and compare with the real average.
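The per-slice comparison can be sketched as follows. The tuple layout `(arrival_time_s, actual_latency_ms, request)`, the function name, and the stand-in predictor are assumptions made for this example.

```python
def slice_average_errors(records, slice_length_s, predict):
    """Per time slice: mean predicted latency minus mean actual latency.

    records is a list of (arrival_time_s, actual_latency_ms, request).
    """
    slices = {}
    for arrival_s, actual_ms, request in records:
        index = int(arrival_s // slice_length_s)
        sums = slices.setdefault(index, [0.0, 0.0, 0])
        sums[0] += predict(request)  # one prediction per request
        sums[1] += actual_ms
        sums[2] += 1
    return {
        index: predicted / n - actual / n
        for index, (predicted, actual, n) in sorted(slices.items())
    }


records = [(0.5, 4.0, "r0"), (1.5, 6.0, "r1"), (5.2, 3.0, "r2")]
errors = slice_average_errors(records, 5.0, lambda request: 5.0)
```

Slice 0 has zero error here even though both of its individual predictions are off by 1 ms, which is the weakness of scoring on averages.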

SLIDE 11

Existing machine learning approaches: predict average

[Figure: latency (ms) versus wall clock time (s), with the time axis divided into time slices 0 through 4.]

For each time slice: aggregate the requests r0, r1, ..., rn, predict the average from the aggregate, and compare with the real average.

SLIDE 12

Existing machine learning approaches: limitations

All aggregate. None predict individual latencies with low error. Hard part? Access times.

SLIDE 13

Workload

Characteristics:
  • Random
  • Read-only
  • Single-sector
  • Full utilization
  • First serpentine
Minimizes:
  • Caching
  • Readahead
  • Write-back
  • Transfer time
  • Request arrival time sensitivity
  • Track length variation

The workload emphasizes access time (which is a hard problem by itself) and de-emphasizes everything else. Other workloads are future work.

SLIDE 14

Access time breakdown

[Figure: access time (ms, up to 35) versus distance from sector 0 (up to 1e+09), synthetic data. Labeled components: long seek latency, short seek latency, rotational latency.]

SLIDE 18

Access times: unpredictable?

Why are access times hard to predict?

  • Rotational layout
  • Serpentines
  • Sector sparing
  • Skew

What do these have in common?

Periodicity! Most machine learning algorithms cannot directly predict periodic functions well.

SLIDE 19

Access time function

[Figure: heat map of access time (ms, color scale 1 to 9) as a function of start sector (a) and end sector (b), each axis up to 10000.]

The full table would have 1 billion by 1 billion entries. Capturing it would take approximately 500 million years, and storing it would take 3.5 exabytes. Extremely sparse sampling is therefore required, and predictions must be computed on the fly.
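The per-entry costs behind these totals are not stated on the slide, but they can be back-calculated. The sketch below assumes 1 exabyte = 10^18 bytes and a 365.25-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_costs(sectors, capture_years, storage_bytes):
    """Back out the per-entry time and space implied by the slide's totals."""
    entries = sectors * sectors  # one entry per (start, end) sector pair
    ms_per_entry = capture_years * SECONDS_PER_YEAR * 1000.0 / entries
    bytes_per_entry = storage_bytes / entries
    return entries, ms_per_entry, bytes_per_entry


entries, ms_per_entry, bytes_per_entry = implied_costs(10**9, 500e6, 3.5e18)
print(f"{entries:.0e} entries, {ms_per_entry:.1f} ms and "
      f"{bytes_per_entry:.1f} bytes per entry")
```

Under these assumptions the totals correspond to roughly 16 ms per measurement, which is in the range of a single disk access, and about 3.5 bytes per stored value.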

SLIDE 20

Input augmentation

Key idea 1 of 2: add sines and cosines to inputs

(a, b) → (a, sin(2πa/p1), cos(2πa/p1), sin(2πa/p2), cos(2πa/p2), ..., b, sin(2πb/p1), cos(2πb/p1), sin(2πb/p2), cos(2πb/p2), ...)

(a is the start sector, b is the end sector)
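A minimal sketch of this augmentation, assuming the periods p1, p2, ... are already known. The function name and the example periods are hypothetical.

```python
import math


def augment(a, b, periods):
    """Build the augmented input vector for start sector a and end sector b."""
    features = []
    for sector in (a, b):
        features.append(float(sector))
        for p in periods:
            angle = 2.0 * math.pi * sector / p
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# Two periods give 2 * (1 + 2 * 2) = 10 inputs instead of the original 2.
vector = augment(100, 350, [25, 50])
```

The sine and cosine pair encodes a sector's phase within each period, so two sectors that are a whole number of periods apart receive identical periodic features. That gives a learner direct access to structure it would otherwise have to approximate from the raw sector number alone.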

SLIDE 22

Fourier transform

Key idea 2 of 2: search on diagonal to limit computation

[Figure: magnitude of the 2-D Fourier transform, |f̂(u, v)|, plotted for u and v from -200 to 200, with a color scale up to 1.]
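The slide does not say which diagonal is searched or how. The sketch below is one possible reading, under loudly stated assumptions: access time depends mainly on the offset b - a, so the spectrum concentrates on the line v = -u, and the transform is evaluated only at candidate frequencies on that line using sparse random samples. The synthetic access time function, the track length of 25, and all names are hypothetical.

```python
import cmath
import random


def diagonal_spectrum(samples, candidate_periods):
    """Magnitude of the Fourier sum on the line v = -u, one value per period.

    samples is a list of (a, b, t): start sector, end sector, access time.
    """
    mean_t = sum(t for _, _, t in samples) / len(samples)
    spectrum = {}
    for period in candidate_periods:
        u = 1.0 / period
        total = 0j
        for a, b, t in samples:
            # With v = -u, the phase u*a + v*b reduces to u * (a - b).
            total += (t - mean_t) * cmath.exp(-2j * cmath.pi * u * (a - b))
        spectrum[period] = abs(total) / len(samples)
    return spectrum


def make_synthetic_samples(n, max_sector, track_length, seed=0):
    """Synthetic stand-in: a pure rotational delay with a fixed track length."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.randrange(max_sector)
        b = rng.randrange(max_sector)
        samples.append((a, b, ((b - a) % track_length) / track_length))
    return samples


samples = make_synthetic_samples(2000, 10_000, 25)
spectrum = diagonal_spectrum(samples, range(2, 61))
best_period = max(spectrum, key=spectrum.get)
```

Evaluating one line of candidates costs time proportional to the number of candidates, rather than to its square as a full 2-D grid would, which is the computational saving the key idea refers to.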

SLIDE 23

Decision trees

[Figure: decision tree example, plotted with both axes from -1.5 to 1.5.]

SLIDE 26

Interdependence

[Figure: interdependence example, plotted with both axes from -1 to 1.]

SLIDE 27

Neural net basics

Neuron: y = f(Σ_i w_i x_i + b), with inputs x1, x2, x3 and output y.

Usually, f(x) = tanh(x) (or similar). The final output may use f(x) = x.

Training: given inputs x_i and desired output y*, adjust the weights w_i and the bias b so that y ≈ y*.
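The neuron formula translates directly into code. This is a sketch of the forward pass only; the function name and the example weights are illustrative.

```python
import math


def neuron(inputs, weights, bias, activation=math.tanh):
    """Single neuron: activation applied to the weighted input sum plus bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# A hidden neuron with the usual tanh activation.
hidden = neuron([1.0, 2.0, 3.0], [0.1, -0.2, 0.05], 0.0)

# An output neuron with the identity activation f(x) = x.
output = neuron([1.0, 2.0], [0.5, 0.25], 1.0, activation=lambda x: x)
```
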

SLIDE 28

Flat neural net

Inputs: a, sin(2πa/p1), cos(2πa/p1), ..., b, sin(2πb/p1), cos(2πb/p1), ...

Output: access time

(a is the start sector, b is the end sector)

SLIDE 29

Neural net with shared weights

Inputs: a, sin(2πa/p1), cos(2πa/p1), ..., b, sin(2πb/p1), cos(2πb/p1), ...

Hidden structure: Subnet 1, Subnet 2, ...

Output: access time

(a is the start sector, b is the end sector)

SLIDE 30

Access times

[Figure: two heat maps of access time (ms, color scale 1 to 9) as a function of start sector (a) and end sector (b), each axis up to 10000. First panel: actual. Second panel: predicted.]

SLIDE 31

Cross validation

[Figure: the data is split into Set 1 through Set 5, which are divided between a training set and a testing set.]
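The figure shows five sets divided between training and testing. The sketch below assumes the standard k-fold scheme, in which each set serves once as the testing set while the other four form the training set. The slides do not state how the sets were assigned, so the round-robin assignment and the function name are illustrative.

```python
def k_fold_splits(samples, k=5):
    """Yield (training, testing) pairs, holding out one fold per round."""
    folds = [samples[i::k] for i in range(k)]  # round-robin assignment
    for held_out in range(k):
        testing = folds[held_out]
        training = [
            sample
            for index, fold in enumerate(folds)
            if index != held_out
            for sample in fold
        ]
        yield training, testing


splits = list(k_fold_splits(list(range(10)), k=5))
```

Averaging the test error over the rounds, and reporting its spread, is one common way to arrive at error figures of the "mean ± deviation" form.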

SLIDE 36

Results

Configuration                           Error (ms)
constant value                          2.013 ± 0.024
Decision trees
  no periods, w/o bagging               2.075 ± 0.014
  no periods, with bagging              2.067 ± 0.001
  6 random periods, w/o bagging         2.019 ± 0.013
  6 random periods, with bagging        2.015 ± 0.013
  6 periods, w/o bagging                1.649 ± 0.154
  6 periods, with bagging               1.123 ± 0.009
Neural nets
  no periods, w/o subnets               2.014 ± 0.034
  no periods, with subnets              2.012 ± 0.019
  6 random periods, w/o subnets         1.924 ± 0.176
  6 random periods, with subnets        1.992 ± 0.059
  6 periods, w/o subnets                0.954 ± 0.052
  6 periods, with subnets               0.830 ± 0.031

RMS errors for predictions over the first 237,631 sectors (94 tracks) with a random read workload.

Speedup of 40X compared to DiskSim (not counting trace load time).
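For reference, a minimal sketch of the RMS error metric and of a constant-value baseline. The slides do not say which constant is used; the mean of the training latencies is assumed here, and the function names are illustrative. The last lines compute the relative improvement of the best configuration over the constant baseline from the reported figures.

```python
import math


def rms_error(predicted_ms, actual_ms):
    """Root mean square of the per-request prediction errors."""
    squared = [(p - a) ** 2 for p, a in zip(predicted_ms, actual_ms)]
    return math.sqrt(sum(squared) / len(squared))


def constant_baseline_error(train_ms, test_ms):
    """RMS error when every request is predicted as the training mean."""
    mean_ms = sum(train_ms) / len(train_ms)
    return rms_error([mean_ms] * len(test_ms), test_ms)


# Reported figures: constant value versus neural net with 6 periods and subnets.
constant_ms = 2.013
best_ms = 0.830
relative_reduction = 1.0 - best_ms / constant_ms
print(f"error reduced by {relative_reduction:.1%}")
```
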

SLIDE 37

Conclusions

  • Periodicity information improves multiple algorithms
  • High-level assumption, likely to apply to many devices
  • Machine learning of per-request latencies is possible
