On-line Learning Architectures
March 2017
Tarek M. Taha
Electrical and Computer Engineering Department, University of Dayton
Device SPICE Model
C. Yakopcic, T. M. Taha, G. Subramanyam, and R. E. Pino, "Memristor SPICE Model and Crossbar Simulation Based on Devices with Nanosecond Switching Time," IEEE International Joint Conference on Neural Networks (IJCNN), August 2013. [BEST PAPER AWARD]
[Figure: measured I-V curves and SPICE model fits for memristor devices from Boise State, Univ of Michigan, and HP Labs; voltage (V) vs. current (mA or µA).]
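As a rough illustration of what such a behavioral device model captures, here is a heavily simplified Python sketch: a sinh-shaped I-V scaled by a state variable x in [0, 1], where x moves only when the applied voltage exceeds a programming threshold. Every parameter value below is an illustrative assumption, not a fit to the devices shown.

    import numpy as np

    def memristor_step(x, V, dt, a=1e-4, b=3.0, Ap=1.0, An=1.0, Vp=1.0, Vn=1.0):
        """Advance the device state by one timestep dt; return (x, I).
        All parameters are illustrative, not fitted to a real device."""
        I = a * x * np.sinh(b * V)              # conduction through the device
        if V > Vp:                              # positive programming pulse
            dx = Ap * (np.exp(V) - np.exp(Vp))
        elif V < -Vn:                           # negative programming pulse
            dx = -An * (np.exp(-V) - np.exp(Vn))
        else:
            dx = 0.0                            # below threshold: state holds
        x = min(max(x + dx * dt, 0.0), 1.0)     # hard window on the state
        return x, I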
Memristor Backpropagation Circuit
Online training via backpropagation circuits.
The same memristor crossbar implements both the forward and backward passes.
[Figure: two-layer memristor backpropagation circuit shown in its forward-pass and backward-pass configurations: crossbar inputs A, B, C; training units for layers L1 and L2; op-amp summing stages with control signals ctrl1/ctrl2; output-layer errors δL2,1 and δL2,2 formed from target_1 - output_1 and target_2 - output_2; and propagated error terms εM1,1...εM1,6 and εM2,1, εM2,2.]
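The computation this circuit carries out can be summarized in a short numpy sketch. The layer sizes, tanh activation, and learning rate below are illustrative assumptions; in the hardware, W1 and W2 are stored as memristor conductances, the crossbars compute the forward vector-matrix products, the same crossbars driven in reverse compute the transposed products for the δ terms, and the training units apply the outer-product weight updates as programming pulses.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.1, 0.1, (4, 6))    # layer L1 weights (first crossbar)
    W2 = rng.uniform(-0.1, 0.1, (2, 4))    # layer L2 weights (second crossbar)
    f = np.tanh                            # activation (illustrative choice)
    df = lambda a: 1.0 - a**2              # tanh derivative, from the output

    def train_step(x, target, lr=0.05):
        """One forward + backward pass with an immediate weight update."""
        global W1, W2
        a1 = f(W1 @ x)                     # forward pass, crossbar 1
        a2 = f(W2 @ a1)                    # forward pass, crossbar 2
        d2 = (target - a2) * df(a2)        # output error (target - output)
        d1 = (W2.T @ d2) * df(a1)          # backward pass through crossbar 2
        W2 += lr * np.outer(d2, a1)        # training-unit programming pulses
        W1 += lr * np.outer(d1, x)
        return a2

    x = rng.uniform(0, 1, 6)
    target = np.array([0.8, -0.8])
    for _ in range(200):
        out = train_step(x, target)        # out converges toward target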
Socrates: Online Training Architecture
[Figure: 4x4 grid of neural cores (NC) and routers (R) with 3D-stacked eDRAM.]
Multicore processing system.
Implements backpropagation-based online training.
Two versions: (1) memristor core and (2) fully digital core.
Has three 3D-stacked memories for storing training data.
[Figure: memristor learning core vs. digital learning core.]
FPGA Implementation
We verified the system's functionality by implementing the digital design on an FPGA.
[Figure: 4x4 mesh of neural cores (NC) and routers (R); router detail showing input and output ports and an 8-bit routing bus; actual images going into and out of the FPGA-based multicore neural processor.]
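As an illustration of how data might traverse such a mesh, here is a dimension-ordered (XY) routing sketch; the routing policy, coordinates, and packet handling are assumptions, since the slide specifies only the NC/R mesh and an 8-bit routing bus.

    def next_hop(cur, dst):
        """cur, dst: (x, y) router coordinates. Route X first, then Y."""
        (cx, cy), (dx, dy) = cur, dst
        if cx != dx:
            return (cx + (1 if dx > cx else -1), cy)
        if cy != dy:
            return (cx, cy + (1 if dy > cy else -1))
        return cur                      # arrived: deliver to the local core

    # Path from core (0, 0) to core (3, 2):
    hop = (0, 0)
    while hop != (3, 2):
        hop = next_hop(hop, (3, 2))
        print(hop)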
Training Efficiency
Accuracy (%)   GPU    Digital   Memristor
MNIST          99%    97%       97%
COIL-20        100%   98%       97%
COIL-100       99%    99%       99%

Training compared to a GTX 980 Ti GPU:
Energy efficiency: Digital 5,900x; Memristor 70,000x
Speedup: Digital 14x; Memristor 7x
(Batch processing was used on the GPU but not on the specialized processors.)
Inference Efficiency
Inference compared to a GTX 980 Ti GPU:
Energy efficiency: Digital 7,000x; Memristor 358,000x
Speedup: Digital 29x; Memristor 60x
The memristor learning core is about 35x smaller in area: Digital 0.52 mm²; Memristor 0.014 mm².
Large Crossbar Simulation
[Figure: potential across each memristor in a 300×300 crossbar, SPICE vs. MATLAB model (both near 0.9-1.0 V); crossbar read circuit with 1 V inputs V1...VM and op-amp summing outputs y1...yN/2 with feedback resistance Rf.]
Studying training on memristor crossbars needs to be done in SPICE to account for circuit parasitics.
Simulating one cycle of a large crossbar can take over 24 hours in SPICE; training requires thousands of iterations and hence is nearly impossible in SPICE.
We developed a fast (~30 ms) algorithm that approximates the SPICE output with over 99% accuracy.
This enables us to examine learning on memristor crossbars.
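For context, what SPICE solves here is a large resistive network: every crosspoint adds a row-wire node and a column-wire node, coupled by the device conductance and by wire-segment resistances. The sketch below builds that nodal-analysis system for a small crossbar and solves it directly with numpy. It illustrates the problem being approximated, not the authors' ~30 ms algorithm; the wire resistance Rw, the device conductances, and the 1 V drive scheme are assumptions.

    import numpy as np

    def crossbar_node_voltages(G, Vin, Rw=1.0):
        """G: (N, M) memristor conductances (S). Vin: (N,) row drive voltages.
        Returns (Vrow, Vcol): voltages at every crosspoint on the row and
        column wires. Columns end in a virtual ground (ideal op-amp input)."""
        N, M = G.shape
        gw = 1.0 / Rw                       # conductance of one wire segment
        n = 2 * N * M                       # unknowns: row nodes, column nodes
        A = np.zeros((n, n))
        b = np.zeros(n)
        r = lambda i, j: i * M + j          # index of row node (i, j)
        c = lambda i, j: N * M + i * M + j  # index of column node (i, j)
        for i in range(N):
            for j in range(M):
                g = G[i, j]
                # KCL at row node (i, j): memristor down + wire left/right
                A[r(i, j), r(i, j)] += g + gw   # device + left segment
                A[r(i, j), c(i, j)] -= g
                if j == 0:
                    b[r(i, j)] += gw * Vin[i]   # left segment to the driver
                else:
                    A[r(i, j), r(i, j - 1)] -= gw
                if j < M - 1:                   # right segment
                    A[r(i, j), r(i, j)] += gw
                    A[r(i, j), r(i, j + 1)] -= gw
                # KCL at column node (i, j): memristor + wire up/down
                A[c(i, j), c(i, j)] += g
                A[c(i, j), r(i, j)] -= g
                if i > 0:
                    A[c(i, j), c(i, j)] += gw
                    A[c(i, j), c(i - 1, j)] -= gw
                if i < N - 1:
                    A[c(i, j), c(i, j)] += gw
                    A[c(i, j), c(i + 1, j)] -= gw
                else:
                    A[c(i, j), c(i, j)] += gw   # bottom segment to virtual ground
        v = np.linalg.solve(A, b)
        return v[:N * M].reshape(N, M), v[N * M:].reshape(N, M)

    # Potential actually dropped across each device (cf. the figure): Vrow - Vcol.
    G = np.full((16, 16), 1e-4)                 # 10 kOhm devices
    Vr, Vc = crossbar_node_voltages(G, np.ones(16), Rw=2.0)
    print((Vr - Vc).min())                      # worst-case device voltage

For a 300×300 array the same formulation has 180,000 unknowns and calls for a sparse solver, but a single linear solve per cycle remains far cheaper than a full SPICE transient analysis.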
Memristor for On-Line Learning
On-line learning requires "highly tunable" or "slow" memristors. This does not affect inference speed.
[Figure: Ron/Roff switching of a memory memristor (~10 ns) vs. an on-line learning memristor (~10 ns to 500 ns).]
LiNbO3 Memristor for Online Learning
The device is quite slow: on the order of tens of milliseconds switching time.
This makes it useful for backpropagation training.
[Figure: measured I-V curve of the LiNbO3 memristor; voltage (V) vs. current (mA).]
Switching speed of devices and minimum programming pulse needed in neuromorphic training circuits:

Approx. Switching Time   Switching Voltage   Minimum Pulse Width Required
10 ns                    6 V                 1 ps
10 ns                    4.5 V               10 ps
100 ns                   4.5 V               100 ps
1 µs                     4.5 V               1 ns
10 µs                    4.5 V               10 ns
[Figure: device current (mA) vs. programming pulse number, showing gradual conductance change over tens to thousands of pulses.]
Unsupervised Clustering
[Figure: clustering before and after training on the Wisconsin breast cancer (Benign/Malignant), Iris (Setosa/Versicolor/Virginica), and Wine (Class 1-3) datasets.]
[Figure: 6-3-6 autoencoder: inputs x1...x6 plus a bias, hidden neurons n1...n3 plus a bias, reconstructions x1'...x6'; crossbar layers L1 and L2.]
We extended the backpropagation circuit to implement autoencoder-based unsupervised learning in memristor crossbars.
These graphs show the circuit learning and clustering data in an unsupervised manner.
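The change relative to the supervised backpropagation sketch earlier is small, as shown below: the target is the input itself, so no labels are needed. The 6-3-6 shape matches the diagram; the activation and learning rate are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.uniform(-0.1, 0.1, (3, 6))    # encoder crossbar (layer L1)
    W2 = rng.uniform(-0.1, 0.1, (6, 3))    # decoder crossbar (layer L2)
    f = np.tanh
    df = lambda a: 1.0 - a**2

    def autoencoder_step(x, lr=0.05):
        """One unsupervised training step: reconstruct x from a 3-value code."""
        global W1, W2
        h = f(W1 @ x)                      # hidden code n1..n3
        xr = f(W2 @ h)                     # reconstruction x1'..x6'
        d2 = (x - xr) * df(xr)             # the target is the input itself
        d1 = (W2.T @ d2) * df(h)
        W2 += lr * np.outer(d2, h)
        W1 += lr * np.outer(d1, x)
        return h, xr                       # code for clustering, reconstruction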
Application: Cybersecurity
[Figure: distance between input and reconstruction for normal vs. anomalous packets, and detection rate (%) vs. false detection rate (%).]
The autoencoder-based unsupervised training circuits were used to learn normal network packets.
When the memristor circuits were presented with anomalous network packets, these were detected with 96.6% accuracy and a 4% false-positive rate.
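The detection rule itself is simple and is sketched below: run the packet's feature vector through the trained autoencoder and flag it when the input-to-reconstruction distance is large. The feature extraction and the threshold value are assumptions; the threshold would be tuned on held-out normal traffic to trade detection rate against false positives.

    import numpy as np

    def reconstruction_distance(x, W1, W2, f=np.tanh):
        xr = f(W2 @ f(W1 @ x))             # forward pass through the autoencoder
        return np.linalg.norm(x - xr)

    def is_anomalous(x, W1, W2, threshold=0.5):
        # threshold=0.5 is a hypothetical value, not from the slides
        return reconstruction_distance(x, W1, W2) > threshold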
We could also implement rule-based malicious packet detection (similar to SNORT) in a very compact circuit.
Example regex: user\s[^\n]{10}
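The slide's rule, taken verbatim, can be checked in software before mapping it to a circuit; the test string below is a made-up illustration.

    import re

    # "user", one whitespace, then exactly ten non-newline characters
    rule = re.compile(r"user\s[^\n]{10}")
    print(bool(rule.search("user AAAAAAAAAA trailing")))  # True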
Application: Autonomous Agent for UAVs
Cognitively Enhanced Complex Event Processing (CECEP) architecture, geared towards autonomous decision making in a variety of applications, including autonomous UAVs.
Exploring implementations on:
- IBM TrueNorth processor
- Memristor circuits
- High-performance clusters: CPUs and GPUs
[Figure: CECEP block diagram with event output streams and IO "adapters", mapped onto memristor systems.]
Parallel Cognitive Systems Laboratory
Research Engineers:
- Raqib Hasan, PhD
- Chris Yakopcic, PhD
- Wei Song, PhD
- Tanvir Atahary, PhD
Sponsors:
http://taha-lab.org
Doctoral Students:
- Zahangir Alom
- Rasitha Fernando
- Ted Josue
- Will Mitchell
- Yangjie Qi
- Nayim Rahman
- Stefan Westberg
- Ayesha Zaman
- Shuo Zhang
Master’s Students:
- Omar Faruq
- Tom Mealy