Architecting a Stochastic Computing Unit with Molecular Optical Devices



slide-1
SLIDE 1

Xiangyu (Mike) Zhang, Ramin Bashizade, Craig LaBoda, Chris Dwyer*, Alvin R. Lebeck

Architecting a Stochastic Computing Unit with Molecular Optical Devices

Duke University *Parabon Labs

slide-2
SLIDE 2

Stochastic (Probabilistic) Computing


Computer vision

[Geiger et al., 2012] [Shin et al., 2015]

Earthquake prediction Medical statistics

Image source: mayo.edu [Hamra et al., 2013]

  • Probabilistic algorithms (e.g., Markov Chain Monte Carlo).
  • Key to performance: generating samples.
  • Problem: sampling overhead is too high.
  • A sample from a simple distribution takes more than 500 cycles.

Statistical Machine Learning

slide-3
SLIDE 3
Using Markov Chain Monte Carlo method

  • Stereo vision: reconstruct depth information from image pairs.
  • Match corresponding pixels.
  • Calculate disparity: the location difference $x_l - x_r$.

Figure: right image (pixel at $x_r$) and left image (pixel at $x_l$).

slide-4
SLIDE 4

Gibbs Sampling

Using Markov Chain Monte Carlo method

while (not converged) {
    for each pixel {
        1) compute probabilities of each possible label;
        2) randomly assign new label based on the probabilities;
    }
}

  • Label: the disparity at that pixel.

Figure: left image, right image, and the label dependency graph (a pixel's label value, the label values of nbr1 to nbr4, and the data).

Energy function: $E(\mathit{Data}, \mathit{Labels})$

slide-6
SLIDE 6

Using Markov Chain Monte Carlo method

while (not converged) {
    for each pixel {
        1) compute probabilities of each possible label;
        2) randomly assign new label based on the probabilities;
    }
}

  • Label: the disparity at that pixel.

Figure: left image, right image, and the label dependency graph (a pixel's label value, the label values of nbr1 to nbr4, and the data). A pixel in the left image has several possible matchings; the probabilities shown are Pr: 0.1, 0.5, 0.2, 0.1.

Energy function: $E(\mathit{Data}, \mathit{Labels})$

slide-13
SLIDE 13

Using Markov Chain Monte Carlo method

while (not converged) {
    for each pixel {
        1) compute probabilities of each possible label;
        2) randomly assign new label based on the probabilities;
    }
}

  • Label: the disparity at that pixel.

Figure: left image, right image, the label dependency graph (a pixel's label value, the label values of nbr1 to nbr4, and the data), and the resulting disparity map (lighter is closer).

Energy function: $E(\mathit{Data}, \mathit{Labels})$

Emerging Technology + Hardware specialization [Wang et al., 2016]

slide-20
SLIDE 20

Review: Sampling Using Molecules

Figure: RET circuit with a light source, fluorescent molecules, and a photon detector; fluorescence PDF over time t.

  • Single fluorophore: exponential distribution, $p(t) = \lambda e^{-\lambda t}$.
  • Record the time to fluorescence $t_f$.
  • Parameterize distributions by decay rate ($\lambda$): $\lambda \propto \text{intensity} \times \text{concentration}$.
  • Implemented in a Resonance Energy Transfer (RET) circuit.
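As a software stand-in for the fluorophore, a time to fluorescence can be drawn from $p(t) = \lambda e^{-\lambda t}$ by inverse-transform sampling. The sketch below is only an illustration: the proportionality constant `k` and the function names `decay_rate` and `sample_ttf` are assumptions, since the slides give the proportionality but no constants.

```python
import math
import random

def decay_rate(intensity, concentration, k=1.0):
    """lambda is proportional to intensity x concentration; k is a hypothetical constant."""
    return k * intensity * concentration

def sample_ttf(lam, rng):
    """Draw t_f from p(t) = lam * exp(-lam * t) by inverse-transform sampling."""
    u = rng.random()                   # uniform in [0, 1)
    return -math.log(1.0 - u) / lam    # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(0)
lam = decay_rate(intensity=2.0, concentration=1.5)
samples = [sample_ttf(lam, rng) for _ in range(100_000)]
mean_ttf = sum(samples) / len(samples)   # should be close to 1 / lam
```

Doubling either the intensity or the concentration doubles `lam` and halves the mean time to fluorescence, which is how a single physical mechanism can be tuned to represent different distributions.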
slide-24
SLIDE 24

Review: RET-based Gibbs Sampling Unit (RSU-G)

  • Accelerates inner loop computation:
      1) compute probabilities of each possible label;
      2) randomly assign new label based on the probabilities;

Figure: RSU-G pipeline.
  • CMOS: application values → probability values. Data ($D$) and labels ($L$s) give $\lambda = \exp(-E(D, L))$.
  • RET + CMOS hybrid: sample generation in the RET circuits (4 shown), $p(t) = \lambda \exp(-\lambda t)$.
  • CMOS: RET samples → application values, $L = \operatorname{argmin}(t_1, t_2, \ldots)$, i.e. “first-to-fire” [Wang et al., 2015].
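First-to-fire works because the minimum of independent exponential variables selects index $i$ with probability $\lambda_i / \sum_j \lambda_j$, so taking the argmin of the times samples a label from the normalized $\exp(-E)$ distribution without ever computing the normalizer. A minimal software check of that property, with `random.expovariate` standing in for the RET circuits and made-up energies:

```python
import math
import random
from collections import Counter

def first_to_fire(energies, rng):
    """Return the label whose simulated fluorophore fires first."""
    rates = [math.exp(-e) for e in energies]                # lambda = exp(-E)
    times = [rng.expovariate(lam) for lam in rates]         # t_i ~ Exp(lambda_i)
    return min(range(len(times)), key=times.__getitem__)    # L = argmin(t_1, t_2, ...)

rng = random.Random(1)
energies = [0.0, 1.0, 2.0]      # hypothetical energies for three labels
trials = 50_000
counts = Counter(first_to_fire(energies, rng) for _ in range(trials))

total_rate = sum(math.exp(-e) for e in energies)
expected = [math.exp(-e) / total_rate for e in energies]
observed = [counts[i] / trials for i in range(len(energies))]
```

`observed` should track `expected` to within sampling noise; the lowest-energy label fires first most often.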

slide-27
SLIDE 27

Review: RET-based Gibbs Sampling Unit (RSU-G)

  • First experimental demonstration of RSU-G (original prototype result: image segmentation) [Wang et al., 2016].
  • Speedups over GPU:
      • Discrete accelerator: 21-84x
      • Augmented GPU: 3-34x
  • Result quality: does not reach the level we need.

Figure: software disparity map vs. RSU-G disparity map (stereo vision, teddy dataset).

How to preserve result quality?

slide-29
SLIDE 29

How to do this? Naïve Approach

Figure: application values → probability values; sample generation (4 RET circuits); RET samples → application values.

  • Lacks both precision and dynamic range.

Progressive Quality Analysis

slide-31
SLIDE 31

How to do this? Naïve Approach

Figure: application values → probability values; sample generation (4 RET circuits); RET samples → application values.

  • Lacks both precision and dynamic range.
  • Naively increasing the number of bits costs too much area and power.
  • Need a new microarchitecture to achieve:
      • High result quality.
      • Minimal area/power.
      • Sizable performance benefits.
      • More flexibility.
slide-34
SLIDE 34

Improving Decay Rate ($\lambda$) Dynamic Ranges

Figure: CMOS (application values → probability values, $\lambda = \exp(-E(D, L))$); RET + CMOS hybrid (sample generation, 4 RET circuits); CMOS (RET samples → application values).

  • New microarchitectural techniques:
      • Decay rate scaling.
      • Probability cut-off.
      • $2^n$ approximation.
  • Requires only 4 unique decay rates.
  • Details are described in the paper.

High result quality. Minimal area/power.
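The slides name the three techniques but defer the details to the paper, so the sketch below is only one plausible reading and should not be taken as the paper's design. It assumes that scaling shifts the energies so the most likely label gets $\lambda = 1$, that the $2^n$ approximation rounds each rate to the nearest power of two, and that the cut-off drops any label whose rate falls below the smallest supported value. The function name `quantize_rates` and the parameter `num_rates` are hypothetical.

```python
import math

def quantize_rates(energies, num_rates=4):
    """Map energies to a small set of power-of-two decay rates.

    Illustrative interpretation (an assumption, not the paper's method):
      scaling : shift energies so the most likely label has lambda = 1
      2^n     : round each lambda to the nearest power of two (in log2 space)
      cut-off : drop labels whose rate falls below 2^-(num_rates - 1)
    """
    e_min = min(energies)
    rates = []
    for e in energies:
        lam = math.exp(-(e - e_min))          # scaled so the largest lambda is 1
        exponent = round(math.log2(lam))      # nearest power of two
        if exponent < -(num_rates - 1):
            rates.append(0.0)                 # cut off: this label never fires
        else:
            rates.append(2.0 ** exponent)
    return rates

rates = quantize_rates([3.0, 3.5, 4.2, 5.0, 9.0])
unique_rates = sorted({r for r in rates if r > 0.0})
```

Under these assumptions at most `num_rates` distinct nonzero rates survive, which matches the slide's claim that only 4 unique decay rates are needed.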

slide-35
SLIDE 35

Exploring Sample Generation

Figure: CMOS (application values → probability values); RET + CMOS hybrid (sample generation, 4 RET circuits); CMOS (RET samples → application values).

  • Measuring Time to Fluorescence (TTF): $p(t) = \lambda \exp(-\lambda t)$.
  • Two parameters:
      • Timing resolution.
      • Truncation probability.
slide-41
SLIDE 41

Timing Resolution vs. Truncation Probability

Figure: bad-pixel percentage plot with axis label ($\lambda_{\min}$); regions marked "High Quality (up to a point), Low Cost" and "High Quality"; points marked ∗ for the previous RSU-G and the new RSU-G.

  • Dashed line: equal result quality.
  • $\Pr(t \ge t_{\max}) = 0.5$ for $\lambda_{\min}$ ($< 0.5$ for other $\lambda$s).
  • Fluorophores can emit photons beyond the max detection time.
  • Need 8 RET replicas to avoid structural hazards.

slide-44
SLIDE 44

Re-design RET Circuit

  • Previous RET circuit: QDLEDs dominate the area.
  • New RET circuit:
      • Fixed intensity, 8 waveguides lit in round robin.
      • Exploit 4 unique decay rates → 4 unique concentrations.

Figure (previous RET circuit): QDLEDs, RET, PD; $\lambda_i \propto$ intensity; replicated ×8.

Figure (new RET circuit): QDLED0 … QDLED7, PDs, MUX; $\lambda_i \propto$ concentration.

slide-45
SLIDE 45

Sharing Light Sources

  • Multiple RET circuits can share light sources.
  • Further reduces area/power.
  • Opportunity to use one light source per chip.

Figure: QDLED0 … QDLED7 shared by the RET circuits of RSU-G1, RSU-G2, RSU-G3, …

slide-46
SLIDE 46

New RSU-G

  • Circuit and microarchitecture changes:
      • Improved decay rate ($\lambda$) dynamic ranges.
      • New RET circuit and peripheral circuits.
      • Support for multiple energy functions $E(D, L)$ for more applications.
      • Efficient $\lambda$ conversion.
  • Software-visible (ISA) change:
      • Additional interface for simulated annealing.
  • Details of the pipeline are in the paper.
slide-47
SLIDE 47

Result Quality

  • Standard benchmarks and metrics:
      • Stereo vision.
      • Motion estimation.
      • Image segmentation.
  • New RSU-G provides the same result quality as software.
  • Results for the other two applications are in the paper.

Figure: stereo vision result quality, bad-pixel percentage (BP, lower is better), software vs. new RSU-G on the teddy, poster, and art datasets (axis 0% to 35%).

Figure: software disparity map vs. new RSU-G disparity map (stereo vision, teddy dataset).
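Bad-pixel percentage is commonly computed as the share of pixels whose disparity differs from the ground truth by more than a threshold. The slides do not state the threshold used, so the default of 1.0 below is an assumption, as are the function name and the flat-list input format.

```python
def bad_pixel_percentage(disparity, ground_truth, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds threshold."""
    if len(disparity) != len(ground_truth) or not disparity:
        raise ValueError("inputs must be non-empty and the same length")
    bad = sum(abs(d - g) > threshold for d, g in zip(disparity, ground_truth))
    return 100.0 * bad / len(disparity)

# Errors are 0, 1, 4, 0, so only the third pixel exceeds the threshold.
bp = bad_pixel_percentage([3, 4, 10, 7], [3, 5, 6, 7])
```

Lower is better, so a hardware sampler matches software when both produce roughly the same BP on the same dataset.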

slide-51
SLIDE 51

Performance / Area / Power

  • Preserve speedups.
  • New RSU-G: 1.00x area, 1.27x power vs. the previous RSU-G.
  • Replace RET part with CMOS sample generation designs:
      • 0.62x area vs. Intel DRNG (AES part).
      • 1.05x area vs. 19-bit LFSR.
  • RSU-G: high-quality quantum randomness.

[Hofemeier, 2012]

Figure: speedup over GPU for stereo vision, new RSU-G augmented GPU, at 320x320 and 1920x1080 (axis 1 to 6; 5.3x labeled).

Figure: area (um2) of RSU-G vs. CMOS alternatives: RSU-G (no sharing), RSU-G (4 sharing), Intel DRNG (AES part), LFSR (19-bit) (axis 500 to 4000).

slide-55
SLIDE 55

Conclusion

  • MCMC: a general framework in statistical machine learning.
  • Quality analysis on standard benchmarks and metrics.
  • Previous RSU-G: did not provide the required result quality.
  • New RSU-G: high result quality, minimal area/power, sizable performance benefits, more flexibility.
  • Opportunities to further reduce area/power/fabrication cost.
  • Some techniques are applicable to pure-CMOS accelerators.
  • Future work:
      • Supporting more mathematical MCMC models.
      • Quality analysis on wider application domains.

slide-56
SLIDE 56

Thank you

  • Q&A