Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators (PowerPoint PPT presentation transcript)



  1. Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators. Schuyler Eldridge, Ajay Joshi. Department of Electrical and Computer Engineering, Boston University. schuye@bu.edu. January 30, 2015. This work was supported by a NASA Office of the Chief Technologist's Space Technology Research Fellowship.

  2. Motivation. Leveraging CMOS scaling for improved performance is becoming increasingly hard. Contributing factors include fixed power budgets and an eventual slowdown of Moore's Law, so computer engineers increasingly turn toward alternative designs. Alternative designs: others are investigating general- and special-purpose accelerators, and one actively researched accelerator architecture is the neural network accelerator.

  3. Artificial Neural Networks. An artificial neural network is a directed graph of neurons in which the edges between neurons are weighted. Used in applications such as machine learning, big data, approximate computing, and state prediction. Figure: a two-layer neural network with i × h × o nodes (inputs X1 ... Xi, hidden neurons H1 ... Hh plus a bias, and outputs Y1 ... Yo).
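For reference, each neuron computes a weighted sum of its inputs followed by an activation. This is standard MLP notation rather than anything spelled out on the slide (w and v are the layer weight matrices, b the biases, \varphi the activation function):

    \[ H_j = \varphi\Big( b_j + \sum_{k=1}^{i} w_{jk} X_k \Big), \qquad
       Y_m = \varphi\Big( b_m + \sum_{j=1}^{h} v_{mj} H_j \Big) \]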

  4. Neural Networks and Fault-Tolerance. "The brain is fault-tolerant, ergo neural networks are fault-tolerant." This isn't generally the case! Do neural networks have the potential for fault-tolerance? They do have a redundant structure: there are multiple paths from input to output. Regression tasks often approximate smooth functions, so small changes in inputs or internal computations may cause only small changes in the output. However, there is no implicit guarantee of fault-tolerance unless a neural network is specifically trained to demonstrate those properties.

  5. N-MR Technique. Steps for an amount of redundancy N: (1) replicate each hidden neuron N times; (2) replicate each hidden-neuron connection for each new neuron; (3) multiply all connection weights by 1/N. Figure: N-MR-1, the original network (inputs I1, I2; hidden neurons H1, H2 plus bias; outputs O1, O2).

  6-8. N-MR Technique (continued). Figures: the N-MR-1 network shown side by side with its N-MR-2, N-MR-3, and N-MR-4 counterparts, whose hidden layers grow from 2 neurons to 4, 6, and 8 replicated neurons respectively.
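The three steps above are easy to sketch in NumPy (a minimal illustration under my own naming, not the authors' implementation). One consistent reading of step 3 is to scale only the replicated hidden-to-output edges by 1/N, which preserves the nonlinear hidden activations and leaves the fault-free output of the network unchanged:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x, W_in, b_h, W_out, b_o):
        """Two-layer MLP forward pass: i inputs -> h hidden -> o outputs."""
        h = sigmoid(W_in @ x + b_h)
        return sigmoid(W_out @ h + b_o)

    def apply_nmr(W_in, b_h, W_out, n):
        """Replicate each hidden neuron n times (N-MR-n).

        Incoming weights are copied verbatim so every replica computes
        the same activation; the replicated hidden-to-output edges are
        scaled by 1/n so the n replicas sum to the original neuron's
        contribution.
        """
        W_in_r = np.repeat(W_in, n, axis=0)        # (n*h, i)
        b_h_r = np.repeat(b_h, n)                  # (n*h,)
        W_out_r = np.repeat(W_out, n, axis=1) / n  # (o, n*h)
        return W_in_r, b_h_r, W_out_r

    # The fault-free output is unchanged by the transformation:
    rng = np.random.default_rng(0)
    W_in, b_h = rng.normal(size=(2, 2)), rng.normal(size=2)
    W_out, b_o = rng.normal(size=(2, 2)), rng.normal(size=2)
    x = rng.normal(size=2)
    W3, b3, V3 = apply_nmr(W_in, b_h, W_out, 3)    # N-MR-3
    assert np.allclose(forward(x, W_in, b_h, W_out, b_o),
                       forward(x, W3, b3, V3, b_o))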

  9. Neural Network Accelerator Architecture. Figure: block diagram of our neural network accelerator, comprising a core, a control unit, NN config and data storage, intermediate storage, a communication interface, and a pool of processing elements (PEs). Basic operation in a multicore environment: threads communicate neural network computation requests to this accelerator, and the accelerator allocates PEs to compute the outputs of all pending requests.
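The slides do not specify an allocation policy, so the following is a deliberately simple round-robin sketch of the dispatch step (all names hypothetical). Slide 18's future-directions note suggests a fault-aware variant would additionally place N-MR replicas of the same logical neuron on distinct PEs:

    def allocate(pending_neurons, num_pes=4):
        """Round-robin assignment of pending neuron computations to PEs.

        pending_neurons: neuron ids gathered from all outstanding
        requests. Returns {pe_id: [neuron ids]}.
        """
        schedule = {pe: [] for pe in range(num_pes)}
        for idx, neuron in enumerate(pending_neurons):
            schedule[idx % num_pes].append(neuron)
        return schedule

    # Six hidden-neuron evaluations spread across four PEs:
    print(allocate(["h1", "h2", "h3", "h4", "h5", "h6"]))
    # {0: ['h1', 'h5'], 1: ['h2', 'h6'], 2: ['h3'], 3: ['h4']}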


  14. Evaluation Overview.

Table: Evaluated neural networks and their topologies

  Application           NN Topology    Description
  blackscholes (b) [1]  6 × 8 × 8 × 1  Financial option pricing
  rsa (r) [2]           30 × 30 × 30   Brute-force prime factorization
  sobel (s) [1]         9 × 8 × 1      3 × 3 Sobel filter

Methodology: we vary the amount of N-MR for the applications in Table 1 running on our NN accelerator architecture. We introduce a random fault into a neuron and measure the resulting accuracy and latency.

[1] R. St. Amant et al., “General-purpose code acceleration with limited-precision analog computation,” in ISCA, 2014, pp. 505-516.
[2] A. Waterland et al., “ASC: Automatically scalable computation,” in ASPLOS, 2014, pp. 575-590.
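A sketch of the fault-injection loop the methodology describes, reusing sigmoid, forward, and apply_nmr from the N-MR sketch above. The stuck-at-zero fault model and all helper names are my assumptions; the talk does not define the injected fault:

    def forward_with_fault(x, W_in, b_h, W_out, b_o, dead):
        """Forward pass with one hidden neuron's output forced to zero."""
        h = sigmoid(W_in @ x + b_h)
        h[dead] = 0.0                  # injected fault: dead hidden neuron
        return sigmoid(W_out @ h + b_o)

    def mse_under_fault(xs, fault_free, nmr, b_o, rng):
        """Mean squared error of a faulty N-MR network vs. the golden model."""
        W_in, b_h, W_out = fault_free
        W_r, b_r, V_r = nmr
        errs = []
        for x in xs:
            ref = forward(x, W_in, b_h, W_out, b_o)    # golden output
            dead = rng.integers(len(b_r))              # random faulty neuron
            out = forward_with_fault(x, W_r, b_r, V_r, b_o, dead)
            errs.append(np.mean((out - ref) ** 2))
        return float(np.mean(errs))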

  15. Evaluation – Normalized Latency. Work, where work is the number of edges to compute, scales with N-MR. However, latency scales sublinearly for our accelerator: increasing N-MR means more work, but also more efficient use of the accelerator. Figure: latency normalized to N-MR-1 for rsa, blackscholes, and sobel against a linear baseline, for N-MR amounts 1 through 7.
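A quick edge count (single hidden layer, bias edges ignored; the slide gives no formula) makes the linear-work claim concrete:

    \[ \mathrm{edges}(N) = i \cdot (Nh) + (Nh) \cdot o = N h (i + o) \]

This linear growth is the plotted linear baseline; the measured latency falls below it because the extra pending neurons keep more of the accelerator's PEs busy at once.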

  16. Evaluation – Accuracy. Generally, accuracy improves with increasing N-MR. Figure: left, percentage error increase; right, accuracy normalized to N-MR-1; both for blackscholes (MSE), rsa (% correct), and sobel (MSE) at N-MR amounts 1 through 7.
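The MSE reported for blackscholes and sobel is presumably the standard mean squared error over n test outputs (the slide does not restate it):

    \[ \mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} (y_k - \hat{y}_k)^2 \]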

  17. Evaluation – Combined Metrics. We evaluate the cost of N-MR using the energy-delay product (EDP). The cost is high because N-MR increases both energy and delay. Figure: normalized EDP for varying N-MR (rsa, blackscholes, sobel; N-MR amounts 1 through 7).
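EDP is the usual product of energy consumed and execution time:

    \[ \mathrm{EDP} = E \times t \]

If both energy and delay grow roughly linearly with N (more edges to compute, more PE activity), EDP grows roughly quadratically in N, consistent with the steep cost shown on the slide.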

  18. Discussion and Conclusion. An initial approach: as neural network accelerators become mainstream, approaches to improve their fault-tolerance will have increased value. N-MR is a preliminary step to leverage the potential for fault-tolerance in neural networks. Other approaches do exist: training with faults, and splitting important neurons while pruning unimportant ones. Future directions: varying N-MR at run-time; faults are currently assumed to be intermittent, but by varying internal PE structure and enforcing the scheduling of neurons on different PEs, a more robust approach can be developed; run-time splitting of important nodes, or not computing unimportant nodes.

  19. Summary and Questions. Figures: the latency, accuracy, and combined-metric results; the NN accelerator architecture block diagram; and a two-layer NN.
