

SLIDE 1

Towards Area-Efficient Optical Neural Networks: An FFT-Based Architecture

Jiaqi Gu, Zheng Zhao, Chenghao Feng, Mingjie Liu, Ray T. Chen, David Z. Pan
ECE Department, The University of Texas at Austin
This work is supported in part by MURI

SLIDE 2

AI Acceleration and Challenges

⧫ ML models and datasets keep growing -> more computation demand
› Low latency
› Low power
› High bandwidth

[Images: autonomous vehicle, data center]

⧫ Moore's law is struggling to deliver higher-performance computation

SLIDE 3

AI Acceleration and Challenges

⧫ Using light to continue Moore's law
⧫ Promising technology for next-generation AI accelerators

[Figure: energy efficiency (GFLOP/W), spanning ~10^2 to ~10^10, across Core-i7 5930K 22nm CPU, TitanX 28nm GPU, Tegra K1 28nm GPU, DaDiannao 28nm ASIC, NVIDIA V100 ASIC, optical-electrical hybrid chip, and fully optical chip (theoretical limit); Shen+, Nature Photonics 2017]

SLIDE 4

Optical Neural Networks (ONN)

⧫ Emergence of neuromorphic platforms for AI acceleration
⧫ Optical neural networks (ONNs)
› Ultra-fast execution speed (light in, light out)
› >100 GHz photo-detection rate
› Near-zero energy consumption once configured

[Shen+, Nature Photonics 2017]

⧫ Unsatisfactory hardware area cost
› Mach-Zehnder interferometers (MZIs) are relatively large
› Previous architectures require many MZIs (area-inefficient)
› Previous architectures are incompatible with network pruning

SLIDE 5

Previous MZI-based ONN Architecture

⧫ Map weight matrix to MZI arrays
⧫ Singular value decomposition (see the sketch below)
› U and V* are square unitary matrices
› Σ is a diagonal matrix

⧫ Unitary group parametrization
› R_ij is a planar (Givens) rotation matrix
› R_ij with a phase can be implemented by an MZI

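For reference, a compact sketch of the decomposition and parametrization described above, with generic weight dimensions (m × n) assumed; the product ordering follows the standard Reck-style mesh decomposition:

```latex
% SVD maps a weight matrix onto two MZI meshes and a diagonal section
W = U \Sigma V^{*}, \qquad
U \in \mathbb{C}^{m\times m},\;
\Sigma \in \mathbb{R}^{m\times n}\ \text{(diagonal)},\;
V^{*} \in \mathbb{C}^{n\times n}

% each unitary factors into planar rotations, one MZI per R_{ij}
U = D \prod_{(i,j)} R_{ij}
```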

SLIDE 6

Previous MZI-based ONN Architecture

⧫ Slimmed ONN architecture [Zhao+, ASPDAC 2019]
⧫ TΣU decomposition
› T is a sparse tree network for dimension matching
› Σ is a diagonal matrix
› U is a square unitary matrix

⧫ Uses fewer MZIs
⧫ Limitation: only eliminates the smaller unitary matrix


SLIDE 7

Our Proposed FFT-ONN Architecture

⧫ Efficient circulant matrix multiplication in the Fourier domain
⧫ 2.2~3.7X area reduction
⧫ No accuracy loss

ST/CT: splitter/combiner tree (signal fanout/accumulation)
OFFT/OIFFT: optical FFT/IFFT (Fourier-domain transform)
EM: element-wise multiplication (weight encoding in the Fourier domain)


SLIDE 8

Block-circulant Matrix Multiplication

⧫ Not general matrix multiplication
⧫ Block-circulant matrix: each k × k block is a circulant matrix
⧫ Efficient algorithm in the Fourier domain (see the sketch below)
⧫ Comparable expressiveness to classical NNs [ICLR'18 Li+]

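As a minimal numpy sketch of this algorithm (the function name and shapes are illustrative, not from the paper): each k × k circulant block is fully described by one length-k vector, so the matrix-vector product reduces to FFT, element-wise multiply, accumulate, and IFFT.

```python
import numpy as np

def block_circulant_matvec(w, x, k):
    """y = W @ x where W is block-circulant with k x k blocks.

    w: (p, q, k) array; w[i, j] is the first column of block (i, j)
    x: length q*k input vector
    """
    p, q, _ = w.shape
    x_f = np.fft.fft(x.reshape(q, k), axis=-1)   # transform input segments
    y = np.empty((p, k))
    for i in range(p):
        # element-wise multiply in the Fourier domain, accumulate over q,
        # then one inverse transform per output segment
        y[i] = np.fft.ifft((np.fft.fft(w[i], axis=-1) * x_f).sum(axis=0)).real
    return y.reshape(p * k)

# quick check against an explicitly built 4 x 4 circulant block
c = np.array([1.0, 2.0, 3.0, 4.0])
C = np.stack([np.roll(c, s) for s in range(4)], axis=1)  # first column is c
x = np.arange(4.0)
assert np.allclose(C @ x, block_circulant_matvec(c[None, None], x, 4))
```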

SLIDE 9

OFFT/OIFFT

⧫ Basic structure: 2-point FFT butterfly (see the check below)
› 2 × 2 directional coupler
› −π/2 phase shifter

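A small numpy sanity check of how this butterfly can arise, assuming the standard 50:50 coupler transfer matrix (cross port picks up a factor of j) and placing a −π/2 shifter at the second input and second output port; this exact shifter placement is an illustrative assumption, and a single-shifter variant realizes the same butterfly up to constant phases that can be absorbed elsewhere in the mesh.

```python
import numpy as np

# 50:50 directional coupler: cross port picks up a factor of 1j
T = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
# -pi/2 phase shifter on the second port
P = np.diag([1, np.exp(-1j * np.pi / 2)])

butterfly = P @ T @ P                          # shifter at input and output
F2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # unitary 2-point DFT
assert np.allclose(butterfly, F2)
```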

SLIDE 10

Weight Encoding

⧫ Multiplication in the Fourier domain (see the sketch below)
› Attenuator: magnitude modulation
› Phase shifter: phase modulation

⧫ Enables online/on-chip training
› No complicated decomposition
› Gradient backpropagation friendly

⧫ Splitter tree: fanout
⧫ Combiner tree: accumulation
› Fewer waveguide crossings: 𝒪(o)

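A short numpy illustration of the encoding above (the length-4 block weight vector is made up): each complex Fourier-domain weight splits into a magnitude for the attenuator and a phase for the phase shifter.

```python
import numpy as np

# hypothetical Fourier-domain weights for one 4 x 4 circulant block
w_freq = np.fft.fft(np.array([0.5, -0.2, 0.1, 0.3]))
attenuation = np.abs(w_freq)    # programmed into the attenuators
phase = np.angle(w_freq)        # programmed into the phase shifters
# together the two devices realize element-wise multiplication by w_freq
assert np.allclose(attenuation * np.exp(1j * phase), w_freq)
```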

SLIDE 11

ONN Structured Pruning Flow

⧫ Two-phase structured pruning (see the sketch below)
› Group lasso regularization
› Saves 30%~40% of components
› Without accuracy loss (<0.5%)

[Figure: pruning flow with masked weight and pruning mask; a masked 4 × 4 block eliminates the corresponding hardware]

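A minimal numpy sketch of the group-lasso regularizer behind the flow above (function names, shapes, and the threshold are illustrative assumptions):

```python
import numpy as np

def group_lasso(w, lam=1e-3):
    """lam * sum_g ||w_g||_2 over circulant blocks; w has shape (p, q, k)."""
    return lam * np.linalg.norm(w, axis=-1).sum()

def prune_mask(w, tau=1e-2):
    """Mask whole k x k blocks whose defining vector norm fell below tau;
    each masked block removes the corresponding OFFT/EM/OIFFT hardware."""
    return np.linalg.norm(w, axis=-1) > tau  # shape (p, q), one bit per block
```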

SLIDE 12

Training Curve

⧫ Same convergence speed as without pruning
⧫ Negligible accuracy loss (<0.5%)

SLIDE 13

Pruning-compatibility Comparison

⧫ Ours: direct pruning, no accuracy loss
⧫ MZI-based ONN: indirect and complicated pruning, severe accuracy degradation

SLIDE 14

Experimental Results

⧫ 2.2~3.7X area cost reduction across various network configurations
⧫ Similar accuracy (<0.5% difference)

SVD: [Shen+, Nature Photonics 2017] TΣU: [Zhao+, ASPDAC 2019]

#Components: 𝒪(n² + o²) for the MZI-based designs vs. 𝒪((no/l)·log₂ l) for ours (l: circulant block size)

SLIDE 15

Simulation Validation

⧫ Lumerical INTERCONNECT tool
⧫ Device-level numerical simulation

SLIDE 16

Simulation Validation

⧫ Lumerical INTERCONNECT simulation (<1.2% maximum error)

› 4 × 4 identity projection
› 4 × 4 circulant matrix multiplication


SLIDE 17

FFT-based ONN Summary

⧫ A new ONN architecture
› No MZIs
› 2.2X~3.7X lower area cost
› Near-zero accuracy degradation

⧫ Fourier-domain ONN
› Efficient neuromorphic computing with Fourier optics
› Better compatibility with NN compression
› Enables on-chip learning


SLIDE 18

Extension and Potential

⧫ Beyond classical real-valued matrix multiplication
› Enhanced expressiveness with latent weights in the complex domain

⧫ Beyond 1-D multi-layer perceptrons
› Extensible to 2-D frequency-domain optical convolutional neural networks

⧫ Beyond inference acceleration
› Efficient on-chip training / self-learning


SLIDE 19

Future Directions

› Design for better robustness: FFT non-ideality, weight-encoding error
› On-chip training framework for the FFT-based ONN architecture
› Chip tapeout and experimental testing
