On-line Learning Architectures. March 2017. Tarek M. Taha, Electrical and Computer Engineering.

SLIDE 1

On-line Learning Architectures

Tarek M. Taha

Electrical and Computer Engineering Department University of Dayton

March 2017

SLIDE 2

Device SPICE Model

C. Yakopcic, T. M. Taha, G. Subramanyam, and R. E. Pino, "Memristor SPICE Model and Crossbar Simulation Based on Devices with Nanosecond Switching Time," IEEE International Joint Conference on Neural Networks (IJCNN), August 2013. [BEST PAPER AWARD]

(Figure: measured I-V characteristics of memristor devices from the Univ of Michigan, Boise State, and HP Labs, each overlaid with the SPICE model fit to the actual device; axes are voltage (V) vs. current (mA or µA).)
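Behavioral memristor models of the kind fitted above generally pair a nonlinear I-V relation with a threshold-gated state variable. The sketch below illustrates that general structure only; it is not the published Yakopcic SPICE model, and the sinh-based I-V term and all parameter values (a1, a2, b, vp, vn, kp, kn) are illustrative assumptions.

```python
import numpy as np

def memristor_current(v, x, a1=0.2, a2=0.2, b=0.05):
    """I-V relation: current scales with state x and grows roughly as sinh(v)."""
    a = a1 if v >= 0 else a2
    return a * x * np.sinh(b * v)

def state_derivative(v, x, vp=1.5, vn=1.5, kp=1e3, kn=1e3):
    """State changes only when |v| exceeds a threshold (a slow, tunable device)."""
    if v > vp:
        return kp * (v - vp) * (1.0 - x)   # window term keeps x <= 1
    if v < -vn:
        return kn * (v + vn) * x           # window term keeps x >= 0
    return 0.0

def simulate(voltages, dt=1e-6, x0=0.5):
    """Forward-Euler integration of the state under a voltage waveform."""
    x, xs, currents = x0, [], []
    for v in voltages:
        x = float(np.clip(x + dt * state_derivative(v, x), 0.0, 1.0))
        xs.append(x)
        currents.append(memristor_current(v, x))
    return np.array(xs), np.array(currents)

# A positive programming pulse train above threshold raises the state
# (lowers resistance); a sub-threshold read voltage leaves it unchanged.
xs, i = simulate(np.full(1000, 2.0))
```

Sub-threshold reads leaving the state untouched is what allows the same device to be read repeatedly during inference without disturbing its stored weight.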

SLIDE 3

Memristor Backpropagation Circuit


Online training via backpropagation circuits

The same memristor crossbar implements both forward and backward passes

(Figure: two copies of the training circuit, one annotated for the forward pass and one for the backward pass. Inputs feed a crossbar whose columns produce output_1 and output_2; these are compared with target_1 and target_2 to form error terms δL2,1 and δL2,2 in Training Unit (L2), which are driven back through the crossbar (error signals εM1,1...εM1,6 and εM2,1, εM2,2) to Training Unit (L1) under control signals ctrl1 and ctrl2.)
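The forward and backward passes the circuit realizes correspond to standard two-layer backpropagation, with the crossbar conductances playing the role of the weight matrices. A minimal numerical sketch, assuming sigmoid activations and illustrative layer sizes (not the exact circuit behavior):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.5, size=(3, 6))   # conductances of crossbar 1 (layer L1)
W2 = rng.normal(scale=0.5, size=(2, 3))   # conductances of crossbar 2 (layer L2)

def train_step(x, target, lr=0.5):
    """One forward + backward pass; returns the mean squared output error."""
    global W1, W2
    h = sigmoid(W1 @ x)                    # forward through crossbar 1
    y = sigmoid(W2 @ h)                    # forward through crossbar 2
    d2 = (y - target) * y * (1 - y)        # output error delta (the δL2 terms)
    d1 = (W2.T @ d2) * h * (1 - h)         # backward pass reuses W2 transposed
    W2 -= lr * np.outer(d2, h)             # weight update = conductance change
    W1 -= lr * np.outer(d1, x)
    return np.mean((y - target) ** 2)
```

Reusing `W2.T` for the backward pass is the software analog of the slide's key point: the same crossbar carries both the forward currents and the transposed error currents.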

SLIDE 4

Socrates: Online Training Architecture

(Figure: grid of neural cores (NC) and routers (R) with 3D-stacked eDRAM.)

• Multicore processing system
• Implements backpropagation-based online training
• Two versions: (1) memristor core and (2) fully digital core
• Has three 3D memory stacks for storing training data

(Figure: the learning core in each tile is either a memristor learning core or a fully digital learning core.)
SLIDE 5

FPGA Implementation

NC = Neural Core R = Router

(Figure: router and neural core detail within the grid of neural cores (NC) and routers (R); each router connects input and output ports over an 8-bit routing bus.)

(Figure: actual images into and out of the FPGA-based multicore neural processor.)

• We verified the system functionality by implementing the digital system on an FPGA.

SLIDE 6

Training Efficiency

Accuracy (%):

Dataset     GPU     Digital   Memristor
MNIST       99%     97%       97%
COIL-20     100%    98%       97%
COIL-100    99%     99%       99%

Training compared to GTX980Ti GPU:

Energy Efficiency:

• Digital: 5900x
• Memristor: 70000x

Speedup:

• Digital: 14x
• Memristor: 7x
• Batch processing on the GPU but not on the specialized processors
SLIDE 7

Inference Efficiency

Inference compared to GTX980Ti GPU:

Energy Efficiency:

  • Digital: 7000x
  • Memristor: 358000x

Speedup:

  • Digital: 29x
  • Memristor: 60x

Memristor learning core is about 35x smaller in area:

  • Digital: 0.52 mm2
  • Memristor: 0.014 mm2
SLIDE 8

Large Crossbar Simulation

(Figure: potential across each memristor in a 300×300 crossbar (up to 8×10^4 devices), simulated in SPICE and in MATLAB; both plots span roughly 0.9-1 V and show close agreement. Inset: crossbar read circuit with row inputs V1...VM = 1 V driving op-amp summing outputs y1...yN/2 through resistances R and Rf.)

• Studying training using memristor crossbars needs to be done in SPICE to account for circuit parasitics.
• Simulating one cycle of a large crossbar can take over 24 hours in SPICE; training requires thousands of iterations and hence is nearly impossible in SPICE.
• Developed a fast (~30 ms) algorithm to approximate the SPICE output with over 99% accuracy.
• Enables us to examine learning on memristor crossbars.
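The fast approximation starts from the ideal, parasitic-free crossbar, where one read cycle collapses from a full SPICE solve to a single matrix-vector product. A sketch of that baseline (the published algorithm's wire-resistance correction is not reproduced here; the conductance range is an illustrative assumption):

```python
import numpy as np

def crossbar_read(G, v_in):
    """Column currents of an ideal crossbar: I_j = sum_i G[i, j] * V[i]."""
    return G.T @ v_in

# 300x300 crossbar, matching the slide's simulation size.
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(300, 300))   # conductances in siemens (assumed range)
v = np.ones(300)                                # 1 V applied to every row
currents = crossbar_read(G, v)
```

This is why the approximation runs in milliseconds: each evaluation is O(N²) arithmetic instead of an iterative nonlinear circuit solve over the same N² devices.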

SLIDE 9

Memristor for On-Line Learning

• On-line learning requires “highly tunable” or “slow” memristors
• This does not affect inference speed

(Figure: switching between Ron and Roff; the memory memristor switches in ~10 ns, while the on-line learning memristor switches in ~500 ns.)

SLIDE 10

LiNbO3 Memristor for Online Learning

• Device is quite slow: on the order of tens of milliseconds switching time.
• Useful for backpropagation training.

(Figure: I-V characteristic of the LiNbO3 device; voltage (V) vs. current (mA).)

Switching speed of devices and minimum programming pulse needed in neuromorphic training circuits:

Approx. Switching Time   Switching Voltage   Minimum Pulse Width Required
10 ns                    6 V                 1 ps
10 ns                    4.5 V               10 ps
100 ns                   4.5 V               100 ps
1 µs                     4.5 V               1 ns
10 µs                    4.5 V               10 ns

(Figure: device current (mA) vs. programming pulse number, showing gradual conductance change over hundreds to thousands of pulses.)

SLIDE 11

Unsupervised Clustering

(Figure: hidden-layer outputs before and after training for three datasets: Wisconsin breast cancer (benign vs. malignant), Iris (Setosa, Versicolor, Virginica), and Wine (three classes); clusters separate clearly after training. Inset: 6-3-6 auto-encoder network with inputs x1...x6, hidden nodes n1...n3, and reconstructions x1'...x6' across layers L1 and L2.)

We extended the backpropagation circuit to implement auto-encoder based unsupervised learning in memristor crossbars.

These graphs show the circuit learning and clustering data in an unsupervised manner.

SLIDE 12

Application: Cybersecurity

(Figure: distance between input and reconstruction for normal packets vs. anomalous packets, and the resulting detection rate (%) vs. false detection rate (%) curve.)

• The autoencoder-based unsupervised training circuits were used to learn normal network packets.
• When the memristor circuits were presented with anomalous network packets, these were detected with 96.6% accuracy and a 4% false positive rate.
• Could also implement rule-based malicious packet detection (similar to SNORT) in a very compact circuit.

Regex : user\s[^\n]{10}
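The detection scheme described above reduces to thresholding the distance between a packet and its auto-encoder reconstruction. A minimal sketch, assuming a hypothetical stand-in `reconstruct` function and an illustrative threshold (not the trained memristor circuit):

```python
import numpy as np

def is_anomalous(x, reconstruct, threshold=0.5):
    """Flag a packet vector x when its reconstruction distance is large."""
    return np.linalg.norm(x - reconstruct(x)) > threshold

# Stand-in model: pretend the trained auto-encoder reproduces "normal"
# packets (here, all-zero feature vectors) exactly.
reconstruct = lambda x: np.zeros_like(x)

normal = np.zeros(8)      # reconstruction distance 0 -> passes
anomalous = np.ones(8)    # reconstruction distance sqrt(8) -> flagged
```

In practice the threshold sets the trade-off the slide's detection-rate vs. false-detection curve visualizes.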

SLIDE 13

Application: Autonomous Agent for UAVs


• Exploring implementation on:
  • IBM TrueNorth processor
  • Memristor circuits
  • High performance clusters: CPUs and GPUs
• Cognitively Enhanced Complex Event Processing (CECEP) Architecture
• Geared towards autonomous decision making in a variety of applications including autonomous UAVs

(Figure: CECEP architecture diagram with event output streams, IO “adapters”, and memristor systems.)

SLIDE 14

Parallel Cognitive Systems Laboratory

Research Engineers:

  • Raqib Hasan, PhD
  • Chris Yakopcic, PhD
  • Wei Song, PhD
  • Tanvir Atahary, PhD

Sponsors:

http://taha-lab.org

Doctoral Students:

  • Zahangir Alom
  • Rasitha Fernando
  • Ted Josue
  • Will Mitchell
  • Yangjie Qi
  • Nayim Rahman
  • Stefan Westberg
  • Ayesha Zaman
  • Shuo Zhang

Master’s Students:

  • Omar Faruq
  • Tom Mealy