Xiangyu (Mike) Zhang, Ramin Bashizade, Craig LaBoda, Chris Dwyer*, Alvin R. Lebeck
Architecting a Stochastic Computing Unit with Molecular Optical Devices
Duke University *Parabon Labs
Stochastic (Probabilistic) Computing
Computer vision
[Geiger et al., 2012] [Shin et al., 2015]
Earthquake prediction Medical statistics
Image source: mayo.edu [Hamra et al., 2013]
Statistical Machine Learning
image pairs.
Using Markov Chain Monte Carlo method
[Figure: left image and right image]
Gibbs Sampling
Using Markov Chain Monte Carlo method
while (not converged) {
    for each pixel {
        1) compute probabilities of each possible label;
        2) randomly assign new label based on the probabilities;
    }
}
Energy function: $E(\text{data}, \text{labels})$
Label dependency: each pixel's label value is tied to the label values of its four neighbors (nbr1's, nbr2's, nbr3's, and nbr4's label values) and to the data.
[Figure: left image and right image (image source: tricks.co); a pixel and its possible matchings, with example probabilities Pr: 0.1, 0.5, 0.2, 0.1; resulting disparity map (lighter is closer)]
Emerging Technology + Hardware specialization [Wang et al., 2016]
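The Gibbs sampling loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The energy used here (a per-pixel data cost plus a weighted absolute label difference with the four neighbors) and the names `gibbs_sweep`, `data_cost`, and `smooth_weight` are assumptions introduced for the example.

```python
import math
import random

def gibbs_sweep(labels, data_cost, num_labels, smooth_weight, rng):
    """One sweep: resample every pixel's label from its conditional distribution."""
    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            # Labels of the 4-connected neighbors that fall inside the image.
            nbrs = [labels[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width]
            # Step 1: unnormalized probability exp(-energy) of each candidate label.
            weights = []
            for lab in range(num_labels):
                # Assumed energy: data cost plus neighbor disagreement penalty.
                energy = data_cost[y][x][lab]
                energy += smooth_weight * sum(abs(lab - n) for n in nbrs)
                weights.append(math.exp(-energy))
            # Step 2: draw the new label in proportion to those probabilities.
            labels[y][x] = rng.choices(range(num_labels), weights=weights)[0]
    return labels

rng = random.Random(0)
# Toy 4x4 problem with 3 labels; the data cost favors label 0 everywhere.
data_cost = [[[0.0, 5.0, 5.0] for _ in range(4)] for _ in range(4)]
labels = [[rng.randrange(3) for _ in range(4)] for _ in range(4)]
for _ in range(20):  # a fixed sweep count stands in for the convergence test
    gibbs_sweep(labels, data_cost, 3, 1.0, rng)
```

The inner loop over candidate labels corresponds to step 1 of the slide and the weighted draw to step 2; the draw in step 2 is the part the deck proposes to hand to molecular optical devices.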
Review: Sampling Using Molecules
[Figure: light source, fluorescent molecules, photon detector; fluorescence PDF over time t]
Fluorescence PDF: $f(t) = \lambda e^{-\lambda t}$
$\lambda \propto \text{intensity} \times \text{concentration}$
RET Circuit
RSU-G
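To make the exponential fluorescence PDF concrete, the sketch below draws emission times in software by inverse-transform sampling. It only mimics the statistics of the molecular process; the function name `sample_emission_time` and the rate value are assumptions chosen for illustration.

```python
import math
import random

def sample_emission_time(decay_rate, rng):
    """Draw t with density decay_rate * exp(-decay_rate * t) via inverse transform."""
    u = rng.random()                      # uniform in [0, 1)
    return -math.log(1.0 - u) / decay_rate

rng = random.Random(1)
decay_rate = 2.0                          # hypothetical rate, arbitrary time units
samples = [sample_emission_time(decay_rate, rng) for _ in range(100_000)]
mean_time = sum(samples) / len(samples)   # should approach 1 / decay_rate
```

A larger decay rate (higher intensity or concentration, per the slide) shortens the expected time to the first detected photon.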
Review: RET-based Gibbs Sampling Unit (RSU-G)
RET + CMOS Hybrid
Application values → Probability Values (CMOS): from Data and Labels, $\lambda = \exp(-E(\text{data}, \text{labels}))$
Sample generation (RET + CMOS Hybrid, four RET Circuits shown): $f(t) = \lambda \exp(-\lambda t)$
RET samples → Application Values (CMOS): "First-to-fire" [Wang et al., 2015], $\text{label} = \arg\min(t_1, t_2, \dots)$
1) compute probabilities of each possible label; 2) randomly assign new label based on the probabilities;
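Why first-to-fire selects each label with the intended probability can be checked with a short derivation. It assumes the samples $t_i$ are independent and exponentially distributed with rates $\lambda_i = \exp(-E_i)$, where $E_i$ is shorthand introduced here for the energy of candidate label $i$.

```latex
\begin{align*}
\Pr\bigl[t_i = \min_j t_j\bigr]
  &= \int_0^\infty \lambda_i e^{-\lambda_i t} \prod_{j \ne i} \Pr[t_j > t] \, dt \\
  &= \int_0^\infty \lambda_i e^{-\lambda_i t} \prod_{j \ne i} e^{-\lambda_j t} \, dt \\
  &= \lambda_i \int_0^\infty e^{-\left(\sum_j \lambda_j\right) t} \, dt
   = \frac{\lambda_i}{\sum_j \lambda_j}
   = \frac{\exp(-E_i)}{\sum_j \exp(-E_j)}.
\end{align*}
```

Under those assumptions the normalization over labels happens implicitly in the race between samples, so no explicit division is needed.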
Review: RET-based Gibbs Sampling Unit (RSU-G)
Original Prototype result: Image Segmentation [Wang et al., 2016]
Stereo vision teddy dataset: Software disparity map vs. RSU-G disparity map
How to preserve result quality?
How to do this? Naïve Approach
Application values → Probability Values
Sample generation (RET Circuits)
RET samples → Application Values
Progressive Quality Analysis
Improving Decay Rate ($\lambda$) Dynamic Ranges
Application values → Probability Values (CMOS): $\lambda = \exp(-E(\text{data}, \text{labels}))$
Sample generation (RET + CMOS Hybrid)
RET samples → Application Values (CMOS)
High result quality. Minimal area/power.
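One way to picture the dynamic-range problem: decay rates of the form exp(-energy) can span many orders of magnitude, while hardware can only realize a finite ratio between the largest and smallest rate. The sketch below is a hypothetical mapping, not the scheme from the deck. The name `energies_to_decay_rates`, the `dynamic_range` parameter, and the choice to clamp small rates up to a floor are all assumptions.

```python
import math

def energies_to_decay_rates(energies, dynamic_range):
    """Map label energies to decay rates exp(-E) with a bounded max/min ratio."""
    lowest = min(energies)
    # Shifting every energy by the same constant scales all rates equally, so
    # the relative probabilities are unchanged and the largest rate becomes 1.
    rates = [math.exp(-(e - lowest)) for e in energies]
    floor = 1.0 / dynamic_range
    # Assumption: rates below the representable floor are clamped up to it.
    return [max(r, floor) for r in rates]

rates = energies_to_decay_rates([1.0, 2.0, 10.0], dynamic_range=100.0)
```

Clamping distorts the probabilities of very unlikely labels, which is one reason a wider dynamic range can matter for result quality.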
Exploring Sample Generation
Application values → Probability Values (CMOS)
Sample generation (RET + CMOS Hybrid): $f(t) = \lambda \exp(-\lambda t)$
RET samples → Application Values (CMOS)
[Figure: plot over time t]
Timing Resolution vs. Truncation Probability
[Figure: Bad-pixel Percentage plot; annotated regions: "High Quality (up to a point)", "Low Cost", "High Quality"; marked design points: Previous RSU-G, New RSU-G]
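The two knobs in this slide's title can be illustrated in software. In the sketch below, a sample is truncated when no photon arrives inside the observation window, and timing resolution is modeled as the number of bits used to bin the arrival time. The names `window` and `time_bits` and the binning scheme are assumptions for illustration, not the deck's circuit parameters.

```python
import math
import random

def truncation_probability(decay_rate, window):
    """P(t > window) for an exponential sample with the given decay rate."""
    return math.exp(-decay_rate * window)

def quantized_sample(decay_rate, window, time_bits, rng):
    """Return the time bin of one sample, or None if it falls past the window."""
    t = rng.expovariate(decay_rate)
    if t >= window:
        return None                       # truncated: nothing seen in the window
    num_bins = 1 << time_bits             # bin width = window / num_bins
    return min(int(t / window * num_bins), num_bins - 1)

rng = random.Random(2)
bins = [quantized_sample(1.0, 1.0, 4, rng) for _ in range(10)]
```

Fewer time bits make ties between competing samples more likely, while a shorter window raises the truncation probability; both would be expected to affect the bad-pixel percentage.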
Re-design RET Circuit
Previous RET Circuit: QDLEDs, RET, PD; $\lambda \propto \text{intensity}$
New RET Circuit: QDLED0 … QDLED7, RET, multiple PDs, MUX; $\lambda \propto \text{concentration}$, $\lambda \propto \text{intensity}$
Sharing Light Sources
[Figure: QDLED0 … QDLED7 shared by the RET Circuits of RSU-G1, RSU-G2, and RSU-G3]
New RSU-G
Result Quality
[Figure: Stereo vision result quality (lower is better); Bad-pixel Percentage (BP), axis 0% to 35%; datasets: teddy, poster, art; series: Software, new_RSU-G]
[Figure: Stereo vision teddy dataset; Software disparity map vs. New RSU-G disparity map]
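For readers unfamiliar with the metric, bad-pixel percentage is commonly computed as the share of pixels whose disparity differs from the ground truth by more than a threshold. The sketch below follows that common definition. The threshold of 1.0 and the function name are assumptions, since the slides do not state the exact setting used.

```python
def bad_pixel_percentage(disparity, ground_truth, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds the threshold."""
    total = 0
    bad = 0
    for row, truth_row in zip(disparity, ground_truth):
        for d, g in zip(row, truth_row):
            total += 1
            if abs(d - g) > threshold:
                bad += 1
    return 100.0 * bad / total

# Toy 2x2 maps: one of four pixels is off by more than the threshold.
bp = bad_pixel_percentage([[1, 2], [3, 10]], [[1, 2], [3, 4]])
```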
Performance / Area / Power
[Figure: Speedup over GPU; Stereo vision, New RSU-G augmented GPU; image sizes 320x320 and 1920x1080; axis 1 to 6; annotated value: 5.3x]
[Figure: Area: RSU-G vs. CMOS alternatives; Area (um2), axis 500 to 4000; bars: RSU-G (no sharing), RSU-G (4 sharing), Intel DRNG (AES part) [Hofemeier, 2012], LFSR (19-bit)]
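As background for the CMOS alternatives in the area comparison, a 19-bit LFSR is a shift register with XOR feedback that produces a pseudo-random bit sequence. The sketch below is a generic Galois-form LFSR. The feedback polynomial x^19 + x^18 + x^17 + x^14 + 1 is a commonly tabulated maximal-length choice and is an assumption here, since the slides do not say which taps the compared design uses.

```python
def lfsr19_step(state):
    """Advance a 19-bit Galois LFSR by one step and return the new state."""
    lsb = state & 1
    state >>= 1
    if lsb:
        # Toggle mask for the assumed taps 19, 18, 17, 14 (bits 18, 17, 16, 13).
        state ^= 0x72000
    return state

state = 1                  # any nonzero 19-bit seed
bits = []
for _ in range(16):
    bits.append(state & 1) # output bit is the low bit before each shift
    state = lfsr19_step(state)
```

Unlike the RET-based samples, an LFSR output is uniform and deterministic, so producing exponential or label-weighted samples from it needs additional logic.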
Conclusion
accelerators.
High result quality. Minimal area/power. Sizable performance benefits. More flexibility.