SLIDE 1

Hardware-Software Co-design of Slimmed Optical Neural Networks

Zheng Zhao¹, Derong Liu¹, Meng Li¹, Zhoufeng Ying¹,

Lu Zhang², Biying Xu¹, Bei Yu², Ray Chen¹, David Pan¹

¹The University of Texas at Austin  ²The Chinese University of Hong Kong

SLIDE 2

Introduction

• Emergence of dedicated AI accelerators

› Optical neural network (ONN) processor: light in and light out

» Speed-of-light floating-point matrix-vector multiplication
» >100 GHz detection rate
» Ultra-low energy consumption once configured

› Challenges: a great number of components and sensitivity to noise

[Shen+, Nature Photonics 2017]

SLIDE 3

Previous Optical Neural Network (ONN)

• SVD decomposition: W = U Σ V* (a numerical sketch follows the figure note below)

• U and V* are unitary matrices

› A unitary X satisfies XX* = I
› Implemented by a Mach-Zehnder interferometer (MZI) array

• Σ is a diagonal matrix

› Diagonal values are non-negative reals
› Implemented by optical attenuators

• σ is the non-linear activation

› Implemented by a saturable absorber

[Shen+, Nature Photonics 2017]

[Figure: one ONN layer implements W = UΣV* as the cascade V* → Σ → U followed by the nonlinearity σ, from the input layer through the hidden layers to the output layer; the unitary MZI meshes are the most area-expensive parts.]
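To make the decomposition concrete, here is a minimal numpy sketch (not code from the paper) that factors a small weight matrix as W = UΣV* and checks the properties listed above; the m×n shape convention follows slide 5.

```python
# Minimal sketch: SVD-decompose a trained weight W into U (unitary, MZI mesh),
# Sigma (diagonal, non-negative; attenuators) and V* (unitary, MZI mesh).
import numpy as np

m, n = 6, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))                    # a stand-in for a trained weight

U, s, Vh = np.linalg.svd(W, full_matrices=True)    # U: m x m, Vh = V*: n x n
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)                         # non-negative singular values

print(np.allclose(U @ U.conj().T, np.eye(m)))      # True: U is unitary
print(np.allclose(Vh @ Vh.conj().T, np.eye(n)))    # True: V* is unitary
print(np.allclose(U @ Sigma @ Vh, W))              # True: W = U Sigma V* is recovered
```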

SLIDE 4

Implementing Unitary U and V*

• Mach-Zehnder interferometers (MZIs) for U and V*

› A single MZI implements a 2-dim unitary
› An array of n(n-1)/2 MZIs implements an n-dim unitary

• Given an n-dim unitary, the phases φ can be uniquely computed (see the sketch below)

[Figure: a single MZI built from two couplers and a phase shifter φ (in → out); in the mesh, each block Ti,j acts on the i-th and j-th rows/columns.]
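The forward direction of the MZI-array claim can be checked numerically: composing n(n-1)/2 two-dimensional unitary blocks, one per waveguide pair, yields an n-dim unitary. The 2×2 parametrization below is a simplified stand-in for a real MZI (device conventions vary), and the sketch does not perform the reverse step of recovering the phases φ from a target unitary, which is typically done with a Reck- or Clements-style decomposition.

```python
# Sketch: an "array" of n(n-1)/2 two-dim unitary blocks composes to an n-dim unitary.
import numpy as np

def mzi_block(theta, phi):
    """A 2x2 unitary standing in for one MZI (simplified parametrization)."""
    return np.array([[np.exp(1j * phi) * np.cos(theta), -np.sin(theta)],
                     [np.exp(1j * phi) * np.sin(theta),  np.cos(theta)]])

def embed(block, i, j, n):
    """Embed the 2x2 block into an n-dim identity so it acts on waveguides i and j."""
    T = np.eye(n, dtype=complex)
    T[np.ix_([i, j], [i, j])] = block
    return T

n = 4
rng = np.random.default_rng(0)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]   # n(n-1)/2 pairs
U = np.eye(n, dtype=complex)
for i, j in pairs:
    theta, phi = rng.uniform(0, 2 * np.pi, size=2)
    U = embed(mzi_block(theta, phi), i, j, n) @ U

print(len(pairs))                                  # 6 = n(n-1)/2 MZIs for n = 4
print(np.allclose(U @ U.conj().T, np.eye(n)))      # True: the composed mesh is unitary
```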

SLIDE 5

Previous ONN Overview

• Layer size measured by # of MZIs: m(m-1)/2 + n(n-1)/2
• Software training and hardware implementation

› Train W directly in software → SVD-decompose to obtain U, Σ, V*

[Figure: software training produces W (m×n); SVD decomposition gives U (m×m), Σ (m×n), and V* (n×n) for the optical implementation, which maps an (n×1) input to an (m×1) output through V* → Σ → U and σ.]

SLIDE 6

Slimmed Architecture

• T: sparse tree network
• U: unitary network
• Σ: diagonal network
• Uses fewer MZIs: n(n-1)/2 (see the counting sketch below)

› A single unitary matrix to maintain the expressivity
› An area-efficient tree network to match the dimensions

[Figure: the slimmed layer maps an (n×1) input to an (m×1) output through Σ (n×n), U (n×n), the sparse tree T (m×n), and σ, equivalent to a weight W (m×n); Σ and U satisfy the same constraints as in the previous architecture.]
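The area saving can be illustrated with a small counting sketch. One assumption here (consistent with the shapes above but not spelled out on the slide) is that the slimmed layer keeps the unitary on its larger, input-side dimension, while the sparse tree bridges to the smaller output dimension using only a linear number of single-output MZIs.

```python
# Counting sketch (not from the paper): per-layer MZI counts for the two architectures.
def previous_mzi_count(in_dim, out_dim):
    # slide 5: V* (in_dim x in_dim) and U (out_dim x out_dim) MZI meshes
    return in_dim * (in_dim - 1) // 2 + out_dim * (out_dim - 1) // 2

def slimmed_mzi_count(in_dim, out_dim):
    # slide 6: a single in_dim x in_dim unitary mesh, i.e. n(n-1)/2 MZIs
    return in_dim * (in_dim - 1) // 2

# illustrative layer: 784 inputs -> 400 outputs, as in the 28x28 MNIST configurations
print(previous_mzi_count(784, 400))    # 386736
print(slimmed_mzi_count(784, 400))     # 306936
```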

SLIDE 7

Co-design Overview

• An arbitrary weight W is not TUΣ-decomposable
• Co-design solution: training and implementation are coupled

› T and Σ: train the device parameters directly, with the hardware constraints embedded
› U: add a unitary regularization in training, then approximate with a true unitary

[Figure: co-design flow: software training produces T and Σ, which are implemented as-is, and a regularized U, which is approximated by a true unitary for the optical implementation; contrast with the previous flow of training W and SVD-decomposing it into U, Σ, V*.]

SLIDE 8

Sparse Tree Network

• Sparse tree network (T) matches the differing input and output dimensions

› Suppose in-dim > out-dim
› α: linear transfer coefficient

[Figure: the tree network is built from N×1 subtrees; each subtree combines inputs x1 … xN into one output y, and the 1st, 2nd, 3rd, … subtrees together reduce the input dimension to the output dimension.]

SLIDE 9

Sparse Tree Network Implementation

• Implemented with MZIs or directional couplers
• A 2×1 subtree can be implemented with a single-output MZI or directional coupler (a minimal sketch follows the figure note below)

[Figure: a 2×1 subtree combining x1 and x2 into y, realized as a single-output MZI (coupler, phase shifter φ, coupler); its transfer coefficients satisfy energy conservation.]
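A minimal sketch of how the energy-conservation constraint can be embedded rather than checked afterwards: instead of training the two transfer coefficients directly, a single angle φ is trained, so α1² + α2² = 1 holds by construction. The cosine/sine parametrization is an assumption chosen for illustration, not necessarily the paper's exact device model.

```python
# Sketch: a 2x1 subtree whose coefficients automatically satisfy energy conservation.
import numpy as np

def combiner(phi):
    """Transfer coefficients of one single-output combiner: y = a1*x1 + a2*x2."""
    return np.cos(phi), np.sin(phi)

phi = 0.7                    # the trainable device parameter
a1, a2 = combiner(phi)
x1, x2 = 0.3, -1.2
y = a1 * x1 + a2 * x2
print(np.isclose(a1 ** 2 + a2 ** 2, 1.0))   # True: energy conservation holds by construction
```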

SLIDE 10

Sparse Tree Network Implementation

• Any N-input subtree with arbitrary α's satisfying energy conservation can be implemented by cascading (N−1) single-output MZIs (see the sketch below)

• Energy conservation is embedded in the training
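The following sketch illustrates the cascading claim: any energy-conserving coefficient vector over N inputs can be realized by N−1 single-output combiners, each contributing one angle. The angle-recovery procedure is one straightforward construction (an assumption, not code from the paper), written for non-negative coefficients; sign flips would be absorbed by additional phase shifts in practice.

```python
# Sketch: realize y = sum_i alpha_i * x_i with N-1 cascaded single-output combiners.
import numpy as np

def cascade_angles(alphas):
    """Angles phi_1..phi_{N-1} for the cascade (alphas non-negative, sum of squares = 1)."""
    alphas = np.asarray(alphas, dtype=float)
    assert np.isclose(np.sum(alphas ** 2), 1.0), "energy conservation violated"
    angles, remaining = [], 1.0
    for a in alphas[:-1]:
        phi = np.arccos(np.clip(a / np.sqrt(remaining), 0.0, 1.0))
        angles.append(phi)
        remaining *= np.sin(phi) ** 2         # energy passed on to the rest of the cascade
    return angles

def cascade_coeffs(angles):
    """Effective alpha_i realized by the cascade of single-output combiners."""
    coeffs, passthrough = [], 1.0
    for phi in angles:
        coeffs.append(passthrough * np.cos(phi))
        passthrough *= np.sin(phi)
    coeffs.append(passthrough)
    return np.array(coeffs)

alphas = np.array([0.5, 0.5, 0.5, 0.5])             # 4 inputs, energy-conserving
angles = cascade_angles(alphas)
print(len(angles))                                  # 3 = N - 1 combiners
print(np.allclose(cascade_coeffs(angles), alphas))  # True: the cascade realizes the alphas
```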

SLIDE 11

Unitary Network in Training

• For the unitary network U, which must satisfy UU* = I, add the regularization (see the sketch below)

reg = ∥UU* − I∥F

• Training loss function:

Loss = Data Loss + Regularization Loss

leading to a nearly implementable ONN with high accuracy

• The trained Ut is approximately unitary, but only a true unitary is implementable by MZIs
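A minimal numpy sketch of the regularizer, written for a real-valued U so that U* is simply the transpose (an assumption made for brevity; the paper's training was done in TensorFlow and this is not its code).

```python
# Sketch: the unitary regularizer reg = ||U U* - I||_F and its use in the total loss.
import numpy as np

def unitary_reg(U):
    """Frobenius-norm deviation from unitarity; 0 exactly when U U* = I."""
    return np.linalg.norm(U @ U.conj().T - np.eye(U.shape[0]), ord='fro')

def total_loss(data_loss, U, lam=1.0):
    """Loss = Data Loss + Regularization Loss (lam weights the trade-off)."""
    return data_loss + lam * unitary_reg(U)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))                  # an exactly unitary matrix
print(np.isclose(unitary_reg(Q), 0.0))                            # True: no penalty
print(unitary_reg(Q + 0.05 * rng.standard_normal((4, 4))) > 0)    # True: a near-unitary U is penalized
```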

SLIDE 12

Unitary Network in Implementation

• Approximate Ut by a true unitary Ua
• SVD-decompose Ut = PSQ* → Ua = PQ*
• Claim: minimizing the regularization ⇔ finding the best approximation (see the sketch below)

› min reg ⇔ min ∥Ut − Ua∥F
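A sketch of the approximation step, assuming it is the standard nearest-unitary (orthogonal Procrustes) projection that the slide's formula describes: SVD-decompose Ut = PSQ* and drop the singular values.

```python
# Sketch: project a nearly-unitary trained matrix Ut onto the closest true unitary Ua.
import numpy as np

def nearest_unitary(Ut):
    """Ut = P S Q*  ->  Ua = P Q*, the unitary closest to Ut in Frobenius norm."""
    P, _, Qh = np.linalg.svd(Ut)
    return P @ Qh

rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # a reference true unitary
Ut = Q0 + 0.01 * rng.standard_normal((4, 4))         # nearly unitary, as after regularized training
Ua = nearest_unitary(Ut)

print(np.allclose(Ua @ Ua.conj().T, np.eye(4)))      # True: Ua is exactly unitary
print(np.linalg.norm(Ut - Ua, 'fro') <= np.linalg.norm(Ut - Q0, 'fro'))  # True: Ua is the closest unitary
```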

SLIDE 13

Simulation Results

• Implemented in TensorFlow for various ONN setups
• Tested on an Intel Core i9-7900X CPU and an NVIDIA TITAN Xp GPU
• Evaluated on the MNIST handwritten-digit dataset
• Network configurations:

› N1: (14×14)-100-10
› N2: (14×14)-150-10
› N3: (28×28)-400-10
› N4: (14×14)-150-150-10
› N5: (28×28)-400-400-10
› N6: (28×28)-600-300-10
› N7: (14×14)-150-150-150-10
› N8: (28×28)-400-400-200-10
› N9: (28×28)-600-600-300-10

SLIDE 14

Simulation Results

• N1–N9: network configurations
• Our architecture uses 15%–38% fewer MZIs
• Similar accuracy (~0 accuracy loss)

› Maximum accuracy loss is 0.0088
› Average accuracy loss is 0.0058

[Figure: per-configuration comparison of # of MZIs and accuracy.]

SLIDE 15

Noise Robustness

• Better resilience due to fewer cascaded components

[Figure: accuracy vs. noise amplitude for the previous ONN and our ONN.]

SLIDE 16

Training Curve

• Converged in 300 epochs
• Balance of the accuracy and the unitary approximation

[Figure: training curves of accuracy and unitary regularization versus epoch.]

SLIDE 17

Contributions of This Work

• A new architecture for ONNs

› Area efficiency
› ~0 accuracy loss
› Better robustness to noise

• Hardware and software co-design methodology

› Software-embedded hardware parameters
› Hardware constraints guaranteed by software

SLIDE 18

Future Work

• Better MZI pruning methods

› MZIs with ~0 phase → pruned, followed by accuracy recovery
› MZI-sparse unitary matrices

• Design for robustness

› Adjust the noise distribution in training

• Online training
• ONN for other neural network architectures

› CNN, RNN, etc.
