On-line Learning Architectures




  1. On-line Learning Architectures. March 2017. Tarek M. Taha, Electrical and Computer Engineering Department, University of Dayton.

  2. Device SPICE Model. Our memristor SPICE model reproduces the measured current-voltage behavior of actual devices from Boise State, the University of Michigan, and HP Labs. [Figure: measured and simulated I-V curves for the three devices.] C. Yakopcic, T. M. Taha, G. Subramanyam, and R. E. Pino, "Memristor SPICE Model and Crossbar Simulation Based on Devices with Nanosecond Switching Time," IEEE International Joint Conference on Neural Networks (IJCNN), August 2013. [Best Paper Award]
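  Below is a minimal Python sketch of a generalized memristor device model in the spirit of the cited work: a hyperbolic-sine I-V relation scaled by a state variable, with threshold-gated state motion. All parameter values (A1, A2, B, VP, VN, AP, AN) are illustrative placeholders, not the fitted values from the paper.

```python
import numpy as np

# Minimal sketch of a generalized memristor model in the spirit of the
# cited SPICE model. The state variable x in [0, 1] tracks how far the
# device has switched; ALL parameters are illustrative placeholders.
A1, A2, B = 0.17, 0.17, 0.05   # sinh I-V amplitudes (assumed)
VP, VN = 0.65, 0.56            # positive/negative switching thresholds (assumed)
AP, AN = 4000.0, 4000.0        # state-change rate constants (assumed)

def current(v, x):
    """I-V relation: sinh conduction scaled by the state variable x."""
    return (A1 if v >= 0 else A2) * x * np.sinh(B * v)

def dxdt(v, x):
    """State motion: the device only switches beyond a voltage threshold."""
    if v > VP:
        g = AP * (np.exp(v) - np.exp(VP))      # drift toward the ON state
    elif v < -VN:
        g = -AN * (np.exp(-v) - np.exp(VN))    # drift toward the OFF state
    else:
        g = 0.0                                # below threshold: state retained
    return g * x * (1.0 - x)                   # window keeps x inside [0, 1]

# Forward-Euler transient: sweep a 1 kHz sine and trace the device response.
dt, x, vs, cs = 1e-6, 0.1, [], []
for t in np.arange(0.0, 2e-3, dt):
    v = 1.5 * np.sin(2 * np.pi * 1e3 * t)
    vs.append(v)
    cs.append(current(v, x))
    x = np.clip(x + dxdt(v, x) * dt, 0.0, 1.0)
# Plotting cs against vs shows the memristor's pinched hysteresis loop.
```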

  3. Memristor Backpropagation Circuit. Online training is performed by backpropagation circuits: the same memristor crossbar implements both the forward pass and the backward pass. [Figure: a crossbar with inputs A, B, C and bias β; training units for layers L1 and L2 compare the outputs (output_1, output_2) against the targets (target_1, target_2), form the output deltas δ_L2, and drive error signals ε back through the crossbar to the first layer.]
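  The numpy sketch below shows why one crossbar can serve both passes: driving the crossbar rows computes W x for the forward pass, while driving its columns computes the transposed product for the backward pass. The layer sizes, learning rate, and sigmoid activation are our assumptions, not the circuit's actual configuration.

```python
import numpy as np

# One weight array, two passes: W @ x (rows driven) for the forward pass,
# W.T @ delta (columns driven) for the backward pass. Sizes are illustrative.
rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.4, 1.0])   # inputs A, B, C plus bias beta
W1 = rng.normal(0, 0.5, (6, 4))      # hidden-layer crossbar (6 neurons)
W2 = rng.normal(0, 0.5, (2, 7))      # output-layer crossbar (6 hidden + bias)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass: drive the crossbar rows with the inputs.
h = sigmoid(W1 @ x)
hb = np.append(h, 1.0)               # hidden activations plus bias
y = sigmoid(W2 @ hb)

# Backward pass: the layer-2 training unit forms delta from target - output.
target = np.array([1.0, 0.0])
delta2 = (target - y) * y * (1 - y)

# The SAME W2 crossbar, driven from its columns, returns the hidden error.
eps1 = (W2.T @ delta2)[:-1]          # drop the bias row
delta1 = eps1 * h * (1 - h)

# Weight updates are outer products of errors and inputs (applied as
# programming pulses to the memristors in hardware).
eta = 0.1
W2 += eta * np.outer(delta2, hb)
W1 += eta * np.outer(delta1, x)
```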

  4. Socrates: Online Training Architecture. Socrates is a multicore processing system of learning cores (NC = neural core) connected by routers (R) that implements backpropagation-based online training. Two versions of the core exist: (1) a memristor learning core and (2) a fully digital learning core. Three 3D-stacked memories (eDRAM) store the training data. [Figure: the 3D stacks feeding a 4x4 grid of neural cores and routers.]
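  As a loose illustration (the function names and structure are ours, not the architecture's actual control flow), "online training" here means fetching one sample at a time from the training memory and updating the weights immediately, with no batching:

```python
# Sketch of a per-sample online training loop: each sample is fetched
# from the training-data memory, pushed through forward/backward passes
# on the cores, and the weights are updated in place. train_step is a
# hypothetical stand-in for the crossbar passes sketched above.
def online_train(samples, targets, train_step, epochs=10):
    for _ in range(epochs):
        for x, t in zip(samples, targets):   # one sample at a time, no batch
            train_step(x, t)                 # forward, backward, update
```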

  5. FPGA Implementation. We verified the system functionality by implementing the digital version of the system on an FPGA (NC = neural core, R = router). [Figure: the 4x4 grid of cores and routers; each router's 8-bit routing bus with input and output ports; actual images passing into and out of the FPGA-based multicore neural processor.]

  6. Training Efficiency. Training compared to a GTX 980 Ti GPU. Energy efficiency: - Digital: 5,900x - Memristor: 70,000x. Speedup: - Digital: 14x - Memristor: 7x (batch processing is used on the GPU but not on the specialized processors).
  Accuracy (%):
    Dataset     GPU    Digital   Memristor
    MNIST       99%    97%       97%
    COIL-20     100%   98%       97%
    COIL-100    99%    99%       99%

  7. Inference Efficiency. Inference compared to a GTX 980 Ti GPU. Energy efficiency: - Digital: 7,000x - Memristor: 358,000x. Speedup: - Digital: 29x - Memristor: 60x. The memristor learning core is about 35x smaller in area: - Digital: 0.52 mm² - Memristor: 0.014 mm²

  8. Large Crossbar Simulation. Studying training on memristor crossbars needs to be done in SPICE to account for circuit parasitics, but simulating one cycle of a large crossbar can take over 24 hours in SPICE; since training requires thousands of iterations, it is nearly impossible to run there. We developed a fast (~30 ms) algorithm that approximates the SPICE output with over 99% accuracy, which enables us to examine learning on memristor crossbars. [Figure: crossbar rows driven at V_1 = ... = V_M = 1 V through op-amp sense stages; for a 300x300 crossbar, the potential across each memristor (roughly 0.9-1.0 V) matches closely between the MATLAB approximation and SPICE.]
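  The sketch below is not the published algorithm; it only illustrates the kind of shortcut involved: treating the wire resistance accumulated along each device's row and column as a series divider with the device itself, so the voltage across interior memristors sags without a full SPICE solve. The R_wire and R_mem values are assumptions.

```python
import numpy as np

# Crude first-order illustration of crossbar parasitics without SPICE.
# NOT the published algorithm: wire resistance accumulated along a
# device's row and column forms a divider with the device, so the
# potential across interior memristors sags below the 1 V input.
N = 300                          # 300x300 crossbar, as in the slide
R_wire = 1.0                     # resistance per wire segment (ohms, assumed)
R_mem = np.full((N, N), 10e3)    # device resistances (ohms, assumed)
V_in = np.ones(N)                # rows driven at 1 V, columns at virtual ground

rows = np.arange(N).reshape(-1, 1)   # segments traversed along the column wire
cols = np.arange(N).reshape(1, -1)   # segments traversed along the row wire
R_path = R_wire * (rows + cols)      # series wire resistance to device (i, j)

# Voltage actually dropped across each memristor (simple divider).
V_mem = V_in.reshape(-1, 1) * R_mem / (R_mem + R_path)
print(V_mem.min(), V_mem.max())      # the far corner of the array sags most
```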

  9. Memristor for On-Line Learning. A memory memristor switches all the way between R_off and R_on with a single fast (10 ns) pulse. An on-line learning memristor should instead traverse R_off to R_on slowly (e.g., over 500 ns), so that the same 10 ns pulses move its conductance in small, controllable steps. On-line learning therefore requires "highly tunable" or "slow" memristors. This does not affect inference speed.
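  A worked illustration of the resulting tunability, using the switching times from the slide (the linear-switching assumption is ours):

```python
# A device that takes 500 ns to switch fully, programmed with 10 ns
# pulses, passes through ~50 intermediate conductance levels; a 10 ns
# device jumps straight from R_off to R_on with a single pulse.
switch_time_memory = 10e-9      # fast "memory" device
switch_time_learning = 500e-9   # slow "learning" device
pulse_width = 10e-9             # programming pulse used by the training circuit

levels = lambda t_switch: max(1, round(t_switch / pulse_width))
print(levels(switch_time_memory))    # 1  -> effectively binary
print(levels(switch_time_learning))  # 50 -> fine-grained weight updates
```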

  10. LiNbO3 Memristor for Online Learning. This device is quite slow, with switching times on the order of tens of milliseconds, which makes it useful for backpropagation training. [Figure: the device's I-V hysteresis loop over ±3 V and current vs. programming pulse number.]
  Switching speed of devices and the minimum programming pulse needed in neuromorphic training circuits:
    Approx. switching time   Switching voltage   Minimum pulse width required
    10 ns                    6 V                 1 ps
    10 ns                    4.5 V               10 ps
    100 ns                   4.5 V               100 ps
    1 µs                     4.5 V               1 ns
    10 µs                    4.5 V               10 ns

  11. Unsupervised Clustering. We extended the backpropagation circuit to implement autoencoder-based unsupervised learning in memristor crossbars. The circuit learns to cluster data in an unsupervised manner. [Figure: before- and after-training scatter plots for the Wisconsin breast cancer data (benign vs. malignant), the Iris data (Setosa, Versicolor, Virginica), and the Wine data (three classes), alongside the autoencoder network x_1..x_6 -> n_1..n_3 -> x_1'..x_6' spanning layers L1 and L2.]
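  Below is a minimal numpy sketch of the same idea (6 inputs, 3 hidden units, 6 reconstructed outputs, matching the network drawn on the slide), trained online one sample at a time; the dataset, initialization, and learning rate are placeholders.

```python
import numpy as np

# Minimal autoencoder: 6 inputs -> 3 hidden -> 6 reconstructed, trained
# with per-sample backpropagation. Data and hyperparameters are placeholders.
rng = np.random.default_rng(1)
X = rng.random((150, 6))                 # placeholder dataset, rows in [0, 1]
W1 = rng.normal(0, 0.3, (3, 6))          # encoder crossbar
W2 = rng.normal(0, 0.3, (6, 3))          # decoder crossbar
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.5

for _ in range(200):                     # online training, one sample at a time
    for x in X:
        h = sigmoid(W1 @ x)              # hidden code n_1..n_3
        xr = sigmoid(W2 @ h)             # reconstruction x_1'..x_6'
        d2 = (x - xr) * xr * (1 - xr)    # output delta (squared-error loss)
        d1 = (W2.T @ d2) * h * (1 - h)   # hidden delta, back through decoder
        W2 += eta * np.outer(d2, h)
        W1 += eta * np.outer(d1, x)

codes = sigmoid(X @ W1.T)                # hidden activations: cluster/plot these
```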

  12. Application: Cybersecurity. The autoencoder-based unsupervised training circuits were used to learn normal network packets. When the memristor circuits were then presented with anomalous network packets, these were detected with 96.6% accuracy and a 4% false positive rate. [Figure: distance between input and reconstruction for normal vs. anomalous packets, and the resulting detection rate vs. false detection curve.] The circuits could also implement rule-based malicious packet detection (similar to SNORT) in a very compact circuit, e.g. the rule regex user\s[^\n]{10}.
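  A sketch of both detection modes follows; the reconstruct() helper and the threshold are hypothetical stand-ins for the trained autoencoder circuit above, while the regex is the rule from the slide.

```python
import re
import numpy as np

# Mode 1: anomaly detection by reconstruction distance. reconstruct()
# is a placeholder for the trained autoencoder; threshold is assumed.
def anomaly_score(x, reconstruct):
    """Distance between a packet's features and their reconstruction."""
    return np.linalg.norm(x - reconstruct(x))

def is_anomalous(x, reconstruct, threshold=0.5):
    # A packet the autoencoder cannot reconstruct well was not part of
    # the "normal" traffic it was trained on.
    return anomaly_score(x, reconstruct) > threshold

# Mode 2: rule-based detection, using the SNORT-style rule from the slide.
rule = re.compile(r"user\s[^\n]{10}")
def matches_rule(payload: str) -> bool:
    return rule.search(payload) is not None
```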

  13. Application: Autonomous Agent for UAVs. The Cognitively Enhanced Complex Event Processing (CECEP) architecture is geared towards autonomous decision making in a variety of applications, including autonomous UAVs. We are exploring implementations on the IBM TrueNorth processor, on memristor circuits, and on high-performance clusters of CPUs and GPUs. [Figure: CECEP I/O adapters feeding event streams into TrueNorth and memristor systems.]

  14. Parallel Cognitive Systems Laboratory.
  Research engineers: Raqib Hasan, PhD; Chris Yakopcic, PhD; Wei Song, PhD; Tanvir Atahary, PhD.
  Doctoral students: Zahangir Alom, Rasitha Fernando, Ted Josue, Will Mitchell, Yangjie Qi, Nayim Rahman, Stefan Westberg, Ayesha Zaman, Shuo Zhang.
  Master's students: Omar Faruq, Tom Mealy.
  Sponsors: [logos shown on slide]
  http://taha-lab.org
