1
Markov Model Prediction of I/O Requests for Scientific Applications
James Oly and Daniel A. Reed
Pablo Research Group, UIUC
2
Outline
- Markov Models for I/O
- Description of Scientific Application Traces
- Trace-driven Simulations
  - Prediction accuracy results
  - Cache simulation results
- Experimental Results
- Summary
3
Introduction
- I/O continues to be a bottleneck in scientific computing
- Knowledge of the I/O request pattern can be crucial to improving performance
- Exact descriptions of request patterns:
  - May require expensive, multilevel instrumentation to capture complex patterns
  - May vary due to data dependence or user interaction
4
Markov Models for I/O
- Probabilistic models can help
- Markov property: the probability of encountering a future state depends solely on the current state
  - Little history -> compact model
- Each file block is represented by a state in the model
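
As an illustration of the block-per-state idea, the following minimal sketch estimates transition probabilities from a trace of request offsets. The function name `build_markov_model`, the dictionary layout, and the sample trace are assumptions made for this example; they are not taken from the presentation.

```python
from collections import Counter, defaultdict


def build_markov_model(offsets, block_size):
    """Estimate first-order transition probabilities between file blocks.

    offsets: byte offsets of successive requests (hypothetical trace format).
    Returns {block: {next_block: probability}}.
    """
    # Map every request to the file block (state) that contains it.
    blocks = [off // block_size for off in offsets]

    # Count how often each block is followed by each other block.
    counts = defaultdict(Counter)
    for cur, nxt in zip(blocks, blocks[1:]):
        counts[cur][nxt] += 1

    # Normalize the counts of each state into probabilities.
    model = {}
    for cur, successors in counts.items():
        total = sum(successors.values())
        model[cur] = {nxt: n / total for nxt, n in successors.items()}
    return model


# Made-up trace: blocks 0, 1, 2, 0, 1, 3 with a 4096-byte block size.
trace = [0, 4096, 8192, 0, 4096, 12288]
model = build_markov_model(trace, block_size=4096)
print(model[1])  # {2: 0.5, 3: 0.5}
```

Because only the current block is remembered, the model grows with the number of distinct blocks and observed transitions, not with the length of the trace.
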
5
Example Model
6
Prediction Algorithms
- Greedy
  - Always chooses the most likely transition from last chosen state
- Path-based
  - Depth-limited search for most likely path
- Amortized
  - Finds most likely state for 1, 2, 3... transitions from current state
  - Generated using state occupancy vectors and Kolmogorov equation: π(t+1) = π(t) P
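
The three strategies above can be sketched as follows. This is one possible reading of the slide, not the authors' implementation: the function names, the row-stochastic list-of-lists matrix `P`, the tie-breaking rule (lowest state index wins), and the pruning in the path search are all assumptions, and the 4-state matrix is made up.

```python
def greedy(P, start, length):
    """Repeatedly follow the most likely transition from the last chosen state."""
    path, state = [], start
    for _ in range(length):
        state = max(range(len(P)), key=lambda j: P[state][j])
        path.append(state)
    return path


def path_based(P, start, depth):
    """Depth-limited search for the most probable path of exactly `depth` steps."""
    best_prob, best_path = 0.0, []

    def search(state, prob, path):
        nonlocal best_prob, best_path
        if len(path) == depth:
            if prob > best_prob:
                best_prob, best_path = prob, path
            return
        for nxt, p in enumerate(P[state]):
            # Path probability can only shrink, so prune hopeless branches.
            if p > 0 and prob * p > best_prob:
                search(nxt, prob * p, path + [nxt])

    search(start, 1.0, [])
    return best_path


def amortized(P, start, length):
    """Most likely state after 1, 2, ... steps, using pi(t+1) = pi(t) P."""
    n = len(P)
    pi = [0.0] * n
    pi[start] = 1.0  # occupancy vector: all probability mass on the current state
    out = []
    for _ in range(length):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        out.append(max(range(n), key=lambda j: pi[j]))
    return out


# Made-up 4-state model on which the three strategies disagree.
P = [
    [0.0, 0.6, 0.4, 0.0],
    [0.6, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
print(greedy(P, 0, 2))      # [1, 0]
print(path_based(P, 0, 2))  # [2, 3]
print(amortized(P, 0, 2))   # [1, 3]
```

In this example the greedy choice commits to state 1 and then to state 0, the path search finds that 0 -> 2 -> 3 is the single most probable path (0.4 versus 0.36), and the occupancy vector shows that state 3 is the most likely location after two transitions (0.64) regardless of the route taken.
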
7
Application Traces
- Cactus
  - Modular environment for numerical relativity
  - Small reads (<16 bytes), 65% sequential
- Dyna3D
  - Explicit finite-element code analyzing transient dynamic response of 3D solids and structures
  - 2 MB worth of one-byte reads, 100% sequential
8
Application Traces
- CONTINUUM
  - Unstructured mesh continuum mechanics code
  - Widely varying request size, 43% sequential
9
Application Traces
- Hartree-Fock
  - Calculates interactions among atomic nuclei and electrons in reaction paths, storing numerical quadrature data for subsequent reuse
  - 80 KB reads, 100% sequential; read six times
- SAR
  - Produces surface images from aircraft- or satellite-mounted radar data
  - 370 KB to 2 MB requests, 67% sequential
10
Application Traces
- HYDRO
  - Block-structured mesh hydrodynamics code
  - Widely varying request size, 67% sequential
11
Prediction Accuracy
- Judges how accurately the model represents the original pattern
- A prediction of length L is created before each block request
- The prediction is compared to the next L blocks actually requested
- The percentages of correctly predicted blocks are averaged to form the overall accuracy rating
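
The accuracy rating described above can be sketched as below. The slide does not say how a "correctly predicted" block is counted, so this example assumes that a predicted block counts as correct if it appears anywhere among the next L requested blocks. It also assumes that each prediction is made from the most recently requested block and that windows running past the end of the trace are skipped. The names `prediction_accuracy` and `sequential` are hypothetical.

```python
def prediction_accuracy(blocks, predict, length):
    """Average fraction of predicted blocks found among the next `length` requests.

    blocks:  sequence of requested block numbers.
    predict: callable (current_block, length) -> list of predicted blocks.
    """
    scores = []
    # Start at 1: a prediction needs a current state (the previous request).
    for i in range(1, len(blocks) - length + 1):
        predicted = predict(blocks[i - 1], length)
        actual = set(blocks[i:i + length])
        hits = sum(1 for b in predicted if b in actual)
        scores.append(hits / length)
    return sum(scores) / len(scores) if scores else 0.0


def sequential(block, length):
    """Stand-in predictor: assume the next `length` blocks follow sequentially."""
    return [block + k for k in range(1, length + 1)]


# Made-up trace with one jump (3 -> 7) that the sequential predictor misses.
acc = prediction_accuracy([0, 1, 2, 3, 7, 8], sequential, length=2)
print(acc)  # 0.625
```

Any of the Markov-based predictors can be passed in place of `sequential`, as long as it accepts the current block and the prediction length.
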
12
Prediction Accuracy
- Block size and prediction length
13
Prediction Accuracy
- Prediction algorithm
  - Most cases showed little difference between the algorithms, with amortized usually performing slightly better
  - Amortized performed far better for Hartree-Fock
14
Cache Simulation
- LRU replacement
- Prefetched blocks inserted at LRU end of the replacement queue
- Block sizes ranging from 1 KB to 64 KB
- Prediction horizon ranging from 1 to 10 blocks
- Compared hit ratios with N-block readahead policy
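
A trace-driven cache simulation along these lines might look like the sketch below. It is an assumption-laden illustration rather than the simulator used in the study: the class `PrefetchCache`, the helper names, the decision to make room for a whole batch of prefetched blocks before inserting them, and the tiny trace are all hypothetical.

```python
from collections import OrderedDict


class PrefetchCache:
    """LRU block cache; prefetched blocks enter at the LRU end of the queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()  # first key = LRU end, last key = MRU end

    def _evict_to(self, size):
        while len(self.queue) > size:
            self.queue.popitem(last=False)  # drop from the LRU end

    def access(self, block):
        """Demand access: returns True on a hit; the block becomes MRU."""
        hit = block in self.queue
        if hit:
            self.queue.move_to_end(block)
        else:
            self._evict_to(self.capacity - 1)
            self.queue[block] = True
        return hit

    def prefetch(self, blocks):
        """Insert predicted blocks at the LRU end (first to be evicted)."""
        new = list(dict.fromkeys(b for b in blocks if b not in self.queue))
        new = new[:self.capacity - 1]  # never displace the MRU block
        self._evict_to(self.capacity - len(new))
        for b in new:
            self.queue[b] = True
            self.queue.move_to_end(b, last=False)


def hit_ratio(trace, capacity, predict):
    """Replay a block trace, prefetching predict(block) after each access."""
    cache = PrefetchCache(capacity)
    hits = 0
    for block in trace:
        hits += cache.access(block)
        cache.prefetch(predict(block))
    return hits / len(trace)


def readahead(n):
    """N-block readahead policy: prefetch the next n sequential blocks."""
    return lambda block: [block + k for k in range(1, n + 1)]


trace = list(range(8))  # made-up, purely sequential trace
print(hit_ratio(trace, 4, readahead(1)))    # 0.875
print(hit_ratio(trace, 4, lambda b: []))    # 0.0 (no prefetching)
```

A Markov-based policy can be compared against readahead by passing a function that returns the model's predicted blocks for the current block. Note that in a very small cache, blocks prefetched at the LRU end are also the first to be displaced by later prefetches.
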
15
Cache Simulation
- All strategies yield relatively high hit ratios
- Markov model predictions usually have slightly higher hit rates, especially for N>1
16
Experimental Results
- Cactus
  - 50x50x50 grid, 2000 iterations
  - 2 million block reads over a 2 GB file
- Synthetic Hartree-Fock
  - Synthetic application that mimics Hartree-Fock
  - Sequential 80 KB reads to a 2.64 MB file, repeated six times
  - ~1 ms compute time between requests
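
For orientation, the synthetic Hartree-Fock request stream described above can be reproduced with a few lines. The sketch assumes decimal units (80 KB = 80,000 bytes and 2.64 MB = 2,640,000 bytes), which gives exactly 33 reads per pass. The function name `synthetic_trace` is hypothetical, and the ~1 ms compute delay is not modeled.

```python
READ_SIZE = 80 * 1000   # 80 KB per read (decimal units assumed)
FILE_SIZE = 2_640_000   # 2.64 MB file (decimal units assumed)
PASSES = 6              # the file is read six times


def synthetic_trace(read_size=READ_SIZE, file_size=FILE_SIZE, passes=PASSES):
    """Return the byte offsets of sequential reads over the file, repeated."""
    one_pass = list(range(0, file_size, read_size))
    return one_pass * passes


trace = synthetic_trace()
print(len(trace) // PASSES)  # 33 reads per pass
print(len(trace))            # 198 reads in total
```
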
17
Experimental Results
- Applications modified for PPFS 2, our user-level parallel filesystem
- Prediction policies compared
  - No prefetching
  - N-block readahead
  - Markov model (greedy prediction)
18
Experimental Results (Cactus)
- Execution time decreased by up to 10% compared to not prefetching
- N-block readahead was not effective
19
Experimental Results (H-F)
- Sequential pattern is ideal for N-block readahead
- Markov model prediction matches its performance
20
Summary
- Markov models can accurately represent the access patterns of scientific applications
  - Usually have minimal loss of accuracy over a wide range of block sizes
  - More sophisticated prediction policies can reduce error
- Markov models for prefetching