
Fourier-Assisted Machine Learning of Hard Disk Drive Access Time Models



  1. Fourier-Assisted Machine Learning of Hard Disk Drive Access Time Models. Adam Crume¹, Carlos Maltzahn¹, Lee Ward², Thomas Kroeger², Matthew Curry², Ron Oldfield², Patrick Widener². ¹University of California, Santa Cruz, {adamcrume, carlosm}@cs.ucsc.edu. ²Sandia National Laboratories, Livermore, CA, {lee, tmkroeg, mlcurry, raoldfi, pwidene}@sandia.gov. November 18, 2013.

  2. Predicting hard drive performance. Use cases: system simulations; file system design; quality of service / real-time guarantees (Anna Povzner's work with Fahrrad).

  3. Complexity, part 1. Rotational latency is at most 8.33 ms at 7200 RPM (one full revolution: 60 s / 7200 ≈ 8.33 ms). [Figure: platter diagram showing the spindle, arm, and read/write head, with Track 0 (2500 sectors), Track 1 (2400 sectors), numbered sectors, and the skew between tracks.]

  4. Complexity, part 2. Serpentine layout. [Figure: two platter diagrams showing tracks 0–11 laid out in serpentine order around the spindle.]

  5. Complexity, part 3. Additionally: queueing, scheduling, caching, readahead, write-back.

  6. Machine learning of hard drive performance. Short-term goals (this presentation): automated, fast. Long-term goals (future work): flexible, future-proof, device-independent.

  7. Offline vs. online machine learning. Offline: separate train and test phases. [Diagram: requests go to the device to collect training data; the model is trained once, then frozen, and serves predictions.] Online: feedback from the real device. [Diagram: the model predicts each request's latency, the prediction is compared with the real device's behavior, and the feedback updates the model.]

  8. Offline vs. online use cases. System simulations (offline); file system design (offline); quality of service / real-time guarantees (offline/online).

  9. Existing machine learning approaches: demerit. [Figure: actual vs. predicted latency CDFs, plotting percentile (0–1) against latency (0–10 ms).]
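The demerit figure compares distributions rather than individual requests. Here is a minimal sketch of one common definition, the Ruemmler and Wilkes root-mean-square horizontal distance between the actual and predicted latency CDFs, assuming two equal-length sample sets in milliseconds (that definition is an assumption; the slide does not spell it out):

    import numpy as np

    def demerit(actual_ms, predicted_ms):
        """RMS horizontal distance between two latency CDFs.

        Sorting each equal-length sample set lines up the latencies at
        matching percentiles, so the per-percentile gap is just the
        difference of the sorted arrays.
        """
        a = np.sort(np.asarray(actual_ms, dtype=float))
        p = np.sort(np.asarray(predicted_ms, dtype=float))
        return float(np.sqrt(np.mean((a - p) ** 2)))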

  10. Existing machine learning approaches: average in time slice. [Figure: latency (ms) vs. wall clock time (s), divided into time slices 0–4.] For each time slice: r₀ → prediction₀, r₁ → prediction₁, …, rₙ → predictionₙ; the per-request predictions are averaged and compared with the real average.

  11. Existing machine learning approaches: predict average. [Figure: latency (ms) vs. wall clock time (s), divided into time slices 0–4.] For each time slice: r₀, r₁, …, rₙ are aggregated and the model predicts the average directly, which is compared with the real average.

  12. Existing machine learning approaches: limitations. All of them aggregate; none predict individual latencies with low error. The hard part? Access times.

  13. Workload. Characteristics: random, read-only, single-sector, full utilization, first serpentine. Minimizes: caching, readahead, write-back, transfer time, request arrival time sensitivity, track length variation. The workload emphasizes access time (which is a hard problem by itself) and de-emphasizes everything else. Other workloads are future work.

  14. Access time breakdown. [Figure: access time (0–35 ms) vs. distance from sector 0 (0 to 1e9 sectors), decomposed into rotational latency, short seek latency, and long seek latency (synthetic data).]

  15.–18. Access times: unpredictable? Why are access times hard to predict? Rotational layout, serpentines, sector sparing, skew. What do these have in common? Periodicity! Most machine learning algorithms cannot directly predict periodic functions well.

  19. Access time function. [Figure: heat map of access time (ms) as a function of start sector a and end sector b over the first 10,000 sectors.] The full table would be 1 billion by 1 billion entries: capturing the data would take approximately 500 million years, and storing it would take roughly 3.5 exabytes. Extremely sparse sampling is required, and access times must be computed on the fly.
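A back-of-the-envelope check of those numbers, assuming roughly 16 ms to measure one access and 4 bytes to store one entry (both are assumptions; the slide does not state them):

    SECTORS = 10**9                    # ~1 billion sectors on the drive
    entries = SECTORS * SECTORS        # one entry per (start, end) pair: 10^18

    storage_eib = entries * 4 / 2**60  # 4 bytes per entry -> ~3.5 EiB
    years = entries * 0.016 / (3600 * 24 * 365.25) / 1e6
    print(f"{storage_eib:.1f} EiB, {years:.0f} million years")  # ~3.5, ~500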

  20. Input augmentation. Key idea 1 of 2: add sines and cosines to the inputs: (a, b) → (a, sin(2πa/p₁), cos(2πa/p₁), sin(2πa/p₂), cos(2πa/p₂), …, b, sin(2πb/p₁), cos(2πb/p₁), sin(2πb/p₂), cos(2πb/p₂), …). (a is the start sector, b is the end sector.)
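A minimal sketch of this augmentation, where `periods` holds the candidate periods p₁, p₂, … (the concrete values below are made up; the real periods come from the Fourier analysis on the following slides):

    import numpy as np

    def augment(a, b, periods):
        """Map (a, b) to (a, sin/cos of a at each period, b, sin/cos of b)."""
        feats = [float(a)]
        for p in periods:
            feats += [np.sin(2 * np.pi * a / p), np.cos(2 * np.pi * a / p)]
        feats.append(float(b))
        for p in periods:
            feats += [np.sin(2 * np.pi * b / p), np.cos(2 * np.pi * b / p)]
        return np.array(feats)

    x = augment(a=123_456, b=789_012, periods=[2500.0, 63.0])  # made-up periods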

  21. Fourier transform. [Figure: heat map of the magnitude |f̂(u, v)| of the 2-D Fourier transform over u, v ∈ [−200, 200], normalized to 0–1.]

  22. Fourier transform. Key idea 2 of 2: search on the diagonal to limit computation. [Figure: the same |f̂(u, v)| heat map; the dominant energy lies along a diagonal, so only diagonal frequencies need to be examined.]
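A minimal sketch of the diagonal search on a toy signal. The synthetic f below, periodic in b − a the way rotational latency is, is a stand-in for measured access times; real data would come from timing requests on the device:

    import numpy as np

    N = 256
    a, b = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    f = np.sin(2 * np.pi * (b - a) / 32.0)  # toy rotational-latency term

    F = np.fft.fftshift(np.fft.fft2(f))     # full spectrum, DC at the center
    c = N // 2
    ks = np.arange(-c + 1, c)
    diag_mag = np.abs(F[c + ks, c - ks])    # scan only the diagonal where a
                                            # (b - a)-periodic signal lands
    top = ks[np.argsort(diag_mag)[-2:]]     # strongest diagonal components
    print("dominant diagonal frequencies:", sorted(top))  # [-8, 8] here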

  23.–25. Decision trees. [Figure: three-slide build on axes from −1.5 to 1.5.]

  26. Interdependence. [Figure: plot on axes from −1 to 1.]

  27. Neural net basics. y = f(Σᵢ wᵢxᵢ + b). [Diagram: a single neuron with inputs x₁, x₂, x₃ and output y.] Usually f(x) = tanh(x) (or similar); the final output may use f(x) = x. Training: given input xᵢ and desired output y*, adjust wᵢ and b such that y = y*.
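A minimal sketch of this neuron with f = tanh and one gradient step on a squared-error loss (the concrete numbers are made up for illustration):

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
    w = np.array([0.1, 0.2, -0.3])   # weights w1, w2, w3
    b, lr = 0.05, 0.1                # bias and learning rate

    y = np.tanh(w @ x + b)           # forward pass: y = f(sum(w_i x_i) + b)
    y_star = 0.8                     # desired output

    # One gradient step on 0.5*(y - y*)^2, using tanh'(z) = 1 - tanh(z)^2.
    delta = (y - y_star) * (1 - y**2)
    w -= lr * delta * x
    b -= lr * delta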

  28. Flat neural net. [Diagram: one network maps all augmented inputs a, sin(2πa/p₁), cos(2πa/p₁), …, b, sin(2πb/p₁), cos(2πb/p₁), … to the access time.] (a is the start sector, b is the end sector.)

  29. Neural net with shared weights. [Diagram: subnet 1 takes a, sin(2πa/p₁), cos(2πa/p₁), …; subnet 2 takes b, sin(2πb/p₁), cos(2πb/p₁), …; the two subnets share their weights, and their outputs feed a final stage that produces the access time.] (a is the start sector, b is the end sector.)
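A minimal numpy sketch of the weight sharing: one subnet (one set of W₁, b₁) is applied to both the start-sector features and the end-sector features, and a top layer combines the two outputs. Layer sizes and initialization are illustrative assumptions, not the paper's actual configuration:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden = 5, 16                        # features per sector, width
    W1 = 0.1 * rng.normal(size=(n_hidden, n_in))  # shared subnet weights
    b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.normal(size=2 * n_hidden)      # combining output layer
    b2 = 0.0

    def subnet(feats):
        return np.tanh(W1 @ feats + b1)    # identical weights for both sectors

    def predict(a_feats, b_feats):
        h = np.concatenate([subnet(a_feats), subnet(b_feats)])
        return W2 @ h + b2                 # linear output: predicted access time

    t = predict(rng.normal(size=n_in), rng.normal(size=n_in))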
