The Anytime Automaton
Joshua San Miguel, Natalie Enright Jerger
Summary
We propose the Anytime Automaton:
- A new computation model for approximate computing.
[Figure: quality of the final output over application execution.]
Approximate Computing
Many applications are inherently noisy and imprecise.
Examples: data mining, computer vision, audio and video processing, gaming, machine learning, dynamical simulation.
Image sources: http://www.zentut.com/ http://www.businessweek.com/ http://www.cc.gatech.edu/~cnieto6/ http://www.analyticbridge.com/ http://themusicparlour.blogspot.ca/ http://www.scientific-computing.com/
But how can we apply approximate computing techniques and still ensure acceptability in final output?
Approximate Computing
program () {
    foos_on_first();
    bars_on_second();
    hello_worlds_on_third();
}
[Figure: timeline of foos_on_first, bars_on_second and hello_worlds_on_third running one after another.]
program () {
    approx_foos_on_first();
    bars_on_second();
    hello_worlds_on_third();
}
[Figure: same timeline, with foos_on_first tuned for quality (runtime-quality tradeoff).]
program () {
    approx_foos_on_first();
    approx_bars_on_second();
    hello_worlds_on_third();
}
[Figure: same timeline, with the local quality of foos_on_first and of bars_on_second each tuned separately.]
But final output may not be acceptable!
Difficult to ensure acceptability of final output on-the-fly, since quality control is limited to local approximations and not their composition. (Challenge #1: Holistic Quality Control)
Real-Time Computing
Real-time systems impose strict runtime constraints; a loss in output quality is more tolerable than not finishing in time.
Examples: streaming multimedia, automotive systems, telecommunications.
Image sources: http://www.streamingvideoprovider.co.uk/ http://www.beaudaniels-illustration.com/ http://m.exed.hec.edu/
program () {
    approx_foos_on_first();
    approx_bars_on_second();
    hello_worlds_on_third();
}
[Figure: timeline of foos_on_first, bars_on_second and hello_worlds_on_third against a strict target runtime, with the quality of the two approximate stages tuned.]
But actual runtimes are variable!
Difficult to ensure strict real-time constraints are met (i.e., interrupt the application), since runtime-quality tradeoffs vary dynamically. (Challenge #2: Interruptibility)
User-Interactive Computing
In user-interactive environments, users dictate quality requirements on-the-fly.
Examples: gaming, mobile vision, search/recommendation.
Image sources: http://www.businessweek.com/ http://www.pcadvisor.co.uk/ http://www.expressvpn.com/
program () {
    approx_foos_on_first();
    approx_bars_on_second();
    hello_worlds_on_third();
}
[Figure: timeline of foos_on_first, bars_on_second and hello_worlds_on_third. Tune quality?]
But target quality is unknown!
Difficult to ensure acceptability for a given user at a given context, since acceptable quality cannot be determined a priori. (Challenge #3: User Flexibility)
Anytime Automaton
We propose the Anytime Automaton:
- A new computation model for approximate computing.
- Revisits and generalizes concepts from anytime (or iterative) algorithms, originally studied for real-time decision problems.
- A recipe for applying approximate computing techniques such that the final output is available early and improves in quality over time.
[Figure: output quality over application execution. Conventionally, a single final output is produced; with the Anytime Automaton, output quality improves over time up to the precise output.]
- Holistic quality control: final output available early.
- Interruptibility: use current output if needed (e.g., at a strict target runtime).
- User flexibility: wait longer for better quality.
Outline
Anytime Automaton
- The Model
- The Approximations
Evaluation
- Methodology
- Experimental Results
Conclusion
Anytime Automaton
[Figure: a program drawn as computations connected by data dependences.]
A computation can be of any granularity:
- inst A;
- inst A; inst B; inst C;
- func (...);
- kernel () { ... }
Dataflow Model
[Figure: the program as a graph of computations; the input flows through it step by step.]
Anytime Automaton
Conventionally, each computation produces a single result. Here, each computation produces multiple approximate versions of its result (i.e., it is anytime), ending with the precise version.
Data Diffusion Model
[Figure: the input diffusing through the graph of computations.]
- The child works on the parent's approximate result.
- Meanwhile, the parent works on producing a better approximate result.
- The final output is available early.
- Eventually, the precise output is produced.
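A minimal Python sketch of the parent/child behaviour described above, not the authors' implementation. The names `anytime_parent` and `anytime_child`, the running-mean computation and the chunk size are assumptions chosen for illustration. Generators run the two stages in lockstep, which only imitates the overlap a real pipeline would have.

```python
from typing import Iterable, Iterator, List


def anytime_parent(data: List[float], chunk: int) -> Iterator[float]:
    """Hypothetical anytime parent: yields a mean over a growing prefix of
    `data`. The last value yielded is the precise mean."""
    total, seen = 0.0, 0
    for start in range(0, len(data), chunk):
        block = data[start:start + chunk]
        total += sum(block)
        seen += len(block)
        yield total / seen  # one more approximate version of the result


def anytime_child(parent_results: Iterable[float], scale: float) -> Iterator[float]:
    """Hypothetical child: recomputes its own output from each version the
    parent publishes, so a final output exists after the first version."""
    for approx in parent_results:
        yield scale * approx


data = [4.0, 8.0, 6.0, 2.0, 10.0, 6.0]
outputs = list(anytime_child(anytime_parent(data, chunk=2), scale=10.0))
print(outputs)  # [60.0, 50.0, 60.0]; the last entry is the precise output
```

The first entry is an early final output; each later entry replaces it with a version computed from more of the input.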
Anytime Automaton – The Model
1. Ensure precise output is always produced eventually.
   kernel () { ... }
   - pure function
   - single-writer, updates isolated
2. Create the effect of improving accuracy over time.
Anytime (or iterative) algorithms have been studied before but are traditionally built into the coarse-grained derivation of an application. Approximate computing techniques have proliferated recently and have been shown to have general fine-grained applicability.
[Figure: rather than looking for an anytime algorithm, apply approximate computing techniques to each kernel () { ... }.]
3. Enable interruptibility via pipelining.
[Figure: without pipelining, computation A, computation B and computation C run one after another over time; if interrupted while computation C is still running, the final output is not ready!]
[Figure: with pipelining, computation A, computation B and computation C overlap in time; when interrupted, an approximate output is ready!]
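The effect of pipelining on interruptibility can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the three computations (`computation_a`, and the doubling and increment standing in for B and C) are hypothetical, and a budget of output versions stands in for a wall-clock deadline.

```python
from itertools import islice
from typing import Iterator, List, Optional


def computation_a(values: List[int]) -> Iterator[int]:
    """Hypothetical anytime computation A: a running sum whose last
    version is the precise sum."""
    total = 0
    for v in values:
        total += v
        yield total


def pipeline(values: List[int]) -> Iterator[int]:
    """Computations B (doubling) and C (adding one) consume each version
    of A as soon as it exists, so every step ends with a usable output."""
    for a in computation_a(values):
        b = 2 * a
        yield b + 1


def run_until_interrupted(outputs: Iterator[int], budget: int) -> Optional[int]:
    """Consume at most `budget` output versions, then stop. Returns the
    latest output seen, or None if interrupted before any was produced."""
    latest = None
    for out in islice(outputs, budget):
        latest = out
    return latest


values = [1, 2, 3, 4]
early = run_until_interrupted(pipeline(values), budget=2)
full = run_until_interrupted(pipeline(values), budget=len(values))
print(early, full)  # 7 21
```

With a budget of 2 the caller still gets an approximate output (7); given the full budget it gets the precise one (21).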
Anytime Automaton – The Approximations
1. General case: apply approximations iteratively.
Quality knob for each technique (direction that improves quality):
- loop perforation: smaller perforation stride
- load value approximation: lower approximation degree
- neural acceleration: higher neural network complexity
- SRAM bit upsets: higher supply voltage
- floating-point precision: more mantissa bits
Example: loop perforation
perforation stride 20: for i = 0, 20, 40, 60, 80, 100, ..., N-1
perforation stride 15: for i = 0, 15, 30, 45, 60, 75, ..., N-1
perforation stride 10: for i = 0, 10, 20, 30, 40, 50, ..., N-1
perforation stride 5: for i = 0, 5, 10, 15, 20, 25, ..., N-1
perforation stride 1: for i = 0, 1, 2, 3, 4, 5, 6, 7, ..., N-1
Achieves desired effect of improving quality over time, but can yield redundant work.
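A small Python sketch of iterative loop perforation with the strides listed above, counting how many loop iterations are spent in total. The mean computation, the input data and N = 100 are assumptions for illustration only.

```python
from typing import List, Tuple


def perforated_mean(data: List[float], stride: int) -> Tuple[float, int]:
    """Mean over every `stride`-th element; also returns the number of
    loop iterations (elements visited) this pass cost."""
    sampled = data[::stride]
    return sum(sampled) / len(sampled), len(sampled)


data = [float(i % 7) for i in range(100)]  # hypothetical input, N = 100
strides = [20, 15, 10, 5, 1]               # smaller stride = better quality

estimates = []
total_work = 0
for stride in strides:
    estimate, work = perforated_mean(data, stride)
    estimates.append(estimate)
    total_work += work  # each pass restarts from i = 0

print(total_work, len(data))  # 142 100
```

The final pass (stride 1) is precise, but the five passes together visit 142 elements for an input of 100, because elements such as index 0 are revisited in every pass.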
2. Better case: apply diffusive approximations.
- Each approximation builds on the previous one.
- Data dependences: each approximate result contributes usefully to the precise result.
Quality knob for each technique (direction that improves quality):
- data sampling: more samples
- integer/fixed-point precision: more bits
- Input sampling (e.g., generating a distribution)
[Figure: sampled input elements are accumulated into the result with a commutative operation.]
To improve quality, no need to reiterate from beginning; therefore, diffusive. (e.g., just add more samples to current result)
Minimal redundant work since each element is processed exactly once.
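As a hedged Python illustration of diffusive input sampling, the sketch below builds a histogram (a distribution) by adding batches of samples to the current result. The function name `anytime_histogram`, the seeded random visiting order and the batch size are assumptions, not details taken from the slides.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List


def anytime_histogram(data: List[int], batch: int, seed: int = 0) -> Iterator[Dict[int, int]]:
    """Visit each input element exactly once, in a pseudo-random order
    (an assumption), and publish the histogram after every batch."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    hist: Counter = Counter()
    for start in range(0, len(order), batch):
        for idx in order[start:start + batch]:
            hist[data[idx]] += 1  # commutative update: visiting order does not matter
        yield dict(hist)          # snapshot built on top of the previous one


data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
versions = list(anytime_histogram(data, batch=4))
print([sum(v.values()) for v in versions])  # [4, 8, 10]
print(versions[-1] == dict(Counter(data)))  # True
```

Each version reuses all earlier work, so the total number of updates equals the number of input elements.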
- Output sampling (e.g., generating an image)
[Figure: output elements of an image generated in order of a sequential permutation.]
[Figure: output elements of an image generated in order of a tree permutation.]
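The difference between the two visiting orders can be sketched in Python on a one-dimensional output. The slides do not define the tree permutation, so the bit-reversed order used here is an assumed stand-in that spreads early samples evenly across the output; `partial_output` and the squaring kernel are likewise hypothetical.

```python
from typing import Callable, List, Optional


def sequential_permutation(n: int) -> List[int]:
    """Visit output elements left to right."""
    return list(range(n))


def tree_permutation(n: int) -> List[int]:
    """Assumed 1-D tree order: bit-reversed indices, so each doubling of
    the sample count halves the spacing between computed elements.
    `n` must be a power of two."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def partial_output(kernel: Callable[[int], int], n: int,
                   order: List[int], budget: int) -> List[Optional[int]]:
    """Compute only the first `budget` elements of `order`; the rest stay
    None. Each element is computed at most once."""
    out: List[Optional[int]] = [None] * n
    for idx in order[:budget]:
        out[idx] = kernel(idx)
    return out


square = lambda i: i * i
print(tree_permutation(8))                                   # [0, 4, 2, 6, 1, 5, 3, 7]
print(partial_output(square, 8, sequential_permutation(8), 2))
print(partial_output(square, 8, tree_permutation(8), 2))
```

After two samples the sequential order has filled only indices 0 and 1, while the tree order has filled indices 0 and 4, giving a coarse view of the whole output.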
- Integer/fixed-point precision (e.g., dot product)
[ X Y Z ] ● [ 10.1101 01.0010 11.0110 ]
Conventional order over time: X * 10.1101, then Y * 01.0010, then Z * 11.0110. If interrupted partway, the final result is not ready!
Diffusive order over time, from MSb to LSb:
- X * 10, Y * 01, Z * 11
- X * 11, Y * 00, Z * 01
- X * 01, Y * 10, Z * 10
An approximate result is ready after each pass!
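A worked Python sketch of the MSb-to-LSb dot product above. It assumes the weights are unsigned fixed-point values with 2 integer bits and 4 fractional bits, processed 2 bits per pass as in the slide, and picks X = 1, Y = 2, Z = 3 purely for illustration; `anytime_dot` is a hypothetical name.

```python
from typing import Iterator, List


def anytime_dot(xs: List[int], weights: List[int], total_bits: int = 6,
                frac_bits: int = 4, chunk_bits: int = 2) -> Iterator[float]:
    """Accumulate x * w using one chunk of weight bits per pass, most
    significant chunk first. Each pass adds to the previous accumulator,
    so no work is repeated and the last value yielded is precise."""
    mask = (1 << chunk_bits) - 1
    acc = 0
    for shift in range(total_bits - chunk_bits, -1, -chunk_bits):
        acc += sum(x * (((w >> shift) & mask) << shift)
                   for x, w in zip(xs, weights))
        yield acc / (1 << frac_bits)  # rescale from fixed point


xs = [1, 2, 3]                            # assumed values for X, Y, Z
weights = [0b101101, 0b010010, 0b110110]  # 10.1101, 01.0010, 11.0110
versions = list(anytime_dot(xs, weights))
print(versions)  # [13.0, 14.5, 15.1875]
```

The first pass uses the chunks 10, 01 and 11 and already gives 13.0; the third pass reaches the precise value 15.1875, which equals (1*45 + 2*18 + 3*54) / 16.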
Anytime Automaton
More details in paper:
- Asynchronous/synchronous pipelining
- Data locality with sampling
- Approximate storage techniques
- Thread scheduling
Evaluation – Methodology
Experiments:
- IBM Power 780 system
- 4 POWER7+ cores
- 32 total hardware threads
Applications:
- PERFECT and AxBench suites
- 2D convolution (output sampling, reduced precision†, SRAM bit upsets†)
- debayer (output sampling)
- discrete wavelet transform (loop perforation)
- histogram equalization (input and output sampling)
- k-means clustering (output sampling)
†see paper
Evaluation – 2D Convolution
[Figure: SNR (dB), with an inf tick, versus runtime (normalized to baseline) for 2D convolution; an arrow marks the better direction.]
Evaluation – Debayer
[Figure: SNR (dB), with an inf tick, versus runtime (normalized to baseline) for debayer; an arrow marks the better direction.]
Evaluation – Discrete Wavelet Transform
[Figure: SNR (dB), with an inf tick, versus runtime (normalized to baseline) for discrete wavelet transform; an arrow marks the better direction.]
Evaluation – Summary
[Figure: how acceptable the output is versus how much time is expended.]
Conclusion
We propose the Anytime Automaton:
- A new computation model for approximate computing.
[Figure: output quality over application execution, improving up to the precise output.]
- Holistic quality control: final output available early.
- Interruptibility: use current output if needed.
- User flexibility: wait longer for better quality.
Thank you
The Anytime Automaton
Joshua San Miguel, Natalie Enright Jerger
Special thanks to IBM collaborators: Viji Srinivasan, Ravi Nair, Dan Prener