SLIDE 1

Silicon Valley AI Lab

Exploring Sparsity in Recurrent Neural Networks

Sharan Narang

May 9, 2017

SLIDE 2

Speech Recognition with Deep Learning

English

SLIDE 3

Scaling with Data

[Figure: Comparison of Speech Recognition Approaches. Accuracy vs. data + model size (speed), with curves for deep learning and traditional methods.]

SLIDE 4

Model Sizes

Baidu Speech Models:

  Model                  Parameters (millions)   Size (MB)
  Deep Speech 1            8.14                    32.56
  Deep Speech 2 (RNN)     67.70                   270.79
  Deep Speech 2 (GRU)    115.47                   461.87
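The sizes shown are consistent with 32-bit floating-point storage, i.e. 4 bytes per parameter. A minimal check, assuming FP32 weights (the storage format is not stated on the slide):

```python
# Convert a parameter count (in millions) to a model size in MB,
# assuming 32-bit floats (4 bytes per parameter).
def model_size_mb(params_millions, bytes_per_param=4):
    return params_millions * bytes_per_param  # millions of params * bytes/param = MB

print(model_size_mb(8.14))    # ~32.56 MB, matches Deep Speech 1
print(model_size_mb(115.47))  # ~461.88 MB, close to the 461.87 MB shown for Deep Speech 2 (GRU)
```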

SLIDE 5

Future Vision

SLIDE 6

Sparse Neural Networks

SLIDE 7

Pruning Weights

[Figure: training timeline over epochs. A dense initial network at the start of training, weights are pruned during training, and a sparse final network remains at the end of training.]
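A minimal sketch of magnitude-based pruning with a persistent binary mask (illustrative; not the exact implementation from the talk): weights whose magnitude falls below the current threshold are zeroed, and once pruned they stay at zero.

```python
import numpy as np

def prune_step(weights, mask, threshold):
    """Zero weights whose magnitude is below `threshold` and update the mask.
    Weights that were already pruned (mask == False) stay pruned."""
    mask &= np.abs(weights) >= threshold   # newly sub-threshold weights are dropped
    return weights * mask, mask

# Toy example on a single recurrent weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(1760, 1760)).astype(np.float32)
mask = np.ones_like(W, dtype=bool)

W, mask = prune_step(W, mask, threshold=0.15)
print("sparsity:", 1.0 - mask.mean())
```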

SLIDE 8

Pruning Approach

[Figure: prune threshold vs. epoch, with the threshold shown from 0.1 to 0.5 over epochs 5 to 20 and separate curves for recurrent and linear layers.]
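The threshold is increased over the course of training according to a schedule with its own hyperparameters per layer type. The sketch below uses a simplified linear ramp between an assumed start and end epoch; it is not the exact published schedule.

```python
def prune_threshold(epoch, start_epoch=5, end_epoch=20,
                    start_thresh=0.1, end_thresh=0.5):
    """Simplified linearly ramping prune threshold (illustrative only).
    No pruning before start_epoch; the threshold is held fixed after end_epoch."""
    if epoch < start_epoch:
        return 0.0
    if epoch >= end_epoch:
        return end_thresh
    frac = (epoch - start_epoch) / (end_epoch - start_epoch)
    return start_thresh + frac * (end_thresh - start_thresh)

# Used together with prune_step() above, e.g. once per epoch,
# with separate hyperparameters for recurrent and linear layers.
```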

SLIDE 9

Pruning Layers

[Figure: percentage of weights pruned (sparsity) per layer for layers 1 through 14; y-axis from 70% to 100%.]

SLIDE 10

Results

  Model        Layer Size   # of Params     CER     Relative Perf
  RNN Dense    1760         67 million      10.67     0.0%
  RNN Sparse   1760         8.3 million     12.88   -20.71%
  RNN Sparse   2560         11.1 million    10.59    +0.75%
  RNN Sparse   3072         16.7 million    10.25    +3.95%
  GRU Dense    2560         115 million      9.55     0.0%
  GRU Sparse   2560         13 million      10.87   -13.82%
  GRU Sparse   3568         17.8 million     9.76    -2.2%
SLIDE 11

Equal Parameter Networks

[Figure: CTC cost vs. epoch number for equal-parameter networks; curves for small_dense_train, small_dense_dev0, large_sparse_train, and large_sparse_dev0.]

SLIDE 12

Sparsity vs. Accuracy

[Figure: relative accuracy (from -70% to +10%) vs. sparsity (0% to 100%), with a baseline reference and annotated points at 10.89 CER, 17.4 CER, and 13.0 CER.]

SLIDE 13

Models don’t need to be retrained

SLIDE 14

Compression

Compression achieved by the sparse RNN models:

  1760 Sparse: 8.11x
  2560 Sparse: 6.06x
  3072 Sparse: 4.04x
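These ratios track the reduction in parameter count relative to the dense RNN baseline from the results table (for example, 67 M / 8.3 M is roughly 8.1). As an illustrative sketch of where the storage savings come from, here is a dense-vs-CSR size comparison with SciPy; the actual storage format used for these models is not stated in the deck.

```python
import numpy as np
from scipy import sparse

# Illustrative: dense vs. CSR storage for a ~90%-sparse weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(1760, 1760)).astype(np.float32)
W[rng.random(W.shape) < 0.9] = 0.0          # zero out roughly 90% of the weights

W_csr = sparse.csr_matrix(W)
dense_bytes = W.nbytes
csr_bytes = W_csr.data.nbytes + W_csr.indices.nbytes + W_csr.indptr.nbytes
print(f"compression: {dense_bytes / csr_bytes:.2f}x")
```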

SLIDE 15

Speedup

Measured vs. expected speedup for the sparse models:

  1760 Sparse: measured 2.90x, expected 10x
  2560 Sparse: measured 1.93x, expected 5.33x
  3072 Sparse: measured 1.16x, expected 3.89x
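A minimal sketch of this kind of dense-vs-sparse matrix-vector comparison, using SciPy's CSR format on CPU. It is illustrative only: achievable speedups depend heavily on the sparse matrix-vector (SpMV) library and hardware, which is the gap between measured and expected speedup above.

```python
import time
import numpy as np
from scipy import sparse

# Illustrative CPU timing of dense vs. sparse matrix-vector products.
rng = np.random.default_rng(0)
W = rng.normal(size=(1760, 1760)).astype(np.float32)
W[rng.random(W.shape) < 0.9] = 0.0          # ~90% sparsity
W_csr = sparse.csr_matrix(W)
x = rng.normal(size=1760).astype(np.float32)

def bench(fn, iters=200):
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return time.perf_counter() - start

t_dense = bench(lambda: W @ x)
t_sparse = bench(lambda: W_csr @ x)
print(f"measured speedup: {t_dense / t_sparse:.2f}x")
```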

SLIDE 16

Conclusion

  • Sparse neural networks can achieve good accuracy while significantly reducing the number of parameters.
  • The threshold-based approach works for fully connected layers, recurrent layers, and GRU layers.
  • Improvements in sparse matrix-vector libraries can result in higher speedups for sparse neural networks.

SLIDE 17

Thank You!

SLIDE 18

Sharan Narang sharan@baidu.com http://research.baidu.com

Silicon Valley AI Lab