

SLIDE 1

Novel tensor framework for neural networks and model reduction

Shashanka Ubaru¹, Lior Horesh¹, Misha Kilmer², Elizabeth Newman², Haim Avron³, Osman Malik⁴

¹IBM TJ Watson Research Center  ²Tufts University  ³Tel Aviv University  ⁴University of Colorado, Boulder

ICERM Workshop on Algorithms for Dimension and Complexity Reduction

IBM Research / March 2020 / © 2020 IBM Corporation

SLIDE 2

Outline

- Brief introduction to tensors
- Tensor-based graph neural networks
- Tensor neural networks
- Numerical results
- Model reduction for NNs?

SLIDE 3

Introduction

- Much of real-world data is inherently multidimensional.
- Many operators and models are natively multi-way.

SLIDE 4

Tensor Applications

- Machine vision
- Latent semantic tensor indexing
- Medical imaging
- Video surveillance, streaming

Ivanov, Mathies, Vasilescu, Tensor subspace analysis for viewpoint recognition, ICCV, 2009.
Shi, Ling, Hu, Yuan, Xing, Multi-target tracking with motion context in tensor power iteration, CVPR, 2014.

SLIDE 5

Background and Notation

Notation: A ∈ R^(n1×n2×···×nd) - a d-th order tensor

◮ 0th-order tensor - scalar
◮ 1st-order tensor - vector
◮ 2nd-order tensor - matrix
◮ 3rd-order tensor - ...

SLIDE 6

Inside the Box

- Fiber: a vector obtained by fixing all indices of the tensor but one.
- Slice: a matrix obtained by fixing all indices of the tensor but two.

SLIDE 7

Tensor Multiplication

Definition

The k-mode multiplication of a tensor A ∈ R^(n1×n2×···×nd) with a matrix U ∈ R^(j×nk) is denoted by A ×_k U and is of size n1 × ··· × n(k−1) × j × n(k+1) × ··· × nd. Element-wise,

(A ×_k U)_(i1 ··· i(k−1) j i(k+1) ··· id) = Σ_(ik=1)^(nk) a_(i1 i2 ··· id) u_(j ik).
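As a concrete illustration (not from the slides), the k-mode product takes a few lines of NumPy; `mode_k_product` is our own helper name and this is a minimal sketch rather than a reference implementation:

```python
import numpy as np

def mode_k_product(A, U, k):
    """k-mode product A ×_k U for A of shape (n1, ..., nd) and U of shape (j, nk).

    Contracts mode k of A against the second axis of U (of size nk), giving a
    tensor of shape (n1, ..., n_{k-1}, j, n_{k+1}, ..., nd).
    """
    # tensordot sums over A's axis k and U's axis 1, appending the new
    # j-sized axis at the end; moveaxis puts it back in position k.
    return np.moveaxis(np.tensordot(A, U, axes=(k, 1)), -1, k)

# Shape check: a 3 x 4 x 5 tensor times a 7 x 4 matrix along mode 1.
A = np.random.randn(3, 4, 5)
U = np.random.randn(7, 4)
assert mode_k_product(A, U, 1).shape == (3, 7, 5)
```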

SLIDE 8

The ⋆M-Product

Given A ∈ R^(ℓ×p×n), B ∈ R^(p×m×n), and an invertible n × n matrix M, then

C = A ⋆M B = (Â ❆ B̂) ×_3 M⁻¹,

where C ∈ R^(ℓ×m×n), Â = A ×_3 M, B̂ = B ×_3 M, and ❆ multiplies corresponding frontal slices in parallel (the facewise product).

Useful properties: tensor transpose, identity tensor, connection to the Fourier transform, invariance to circulant shifts, ...
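For concreteness, a minimal NumPy sketch of the ⋆M-product, assuming a real invertible M; `m_product` is our name:

```python
import numpy as np

def m_product(A, B, M):
    """C = A ⋆M B for A (l, p, n), B (p, m, n), and an invertible M (n, n)."""
    A_hat = np.tensordot(A, M, axes=(2, 1))   # Â = A ×_3 M
    B_hat = np.tensordot(B, M, axes=(2, 1))   # B̂ = B ×_3 M
    # Facewise product: multiply matching frontal slices Â[:, :, i] @ B̂[:, :, i].
    C_hat = np.einsum('lpn,pmn->lmn', A_hat, B_hat)
    return np.tensordot(C_hat, np.linalg.inv(M), axes=(2, 1))   # ×_3 M⁻¹
```

With M a DFT matrix this reduces to the familiar t-product; the CIFAR-10 experiments later in the talk use a DCT matrix for M.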

SLIDE 9

Tensor Graph Convolutional Networks


SLIDE 10

Dynamic Graphs

Graphs are ubiquitous data structures that represent interactions and structural relationships. In many real-world applications, the underlying graph changes over time. Learning representations of dynamic graphs is essential.

SLIDE 11

Dynamic Graphs - Applications

Corporate/financial networks, natural language understanding (NLU), social networks, neural activity networks, traffic prediction.

SLIDE 12

Graph Convolutional Networks

Graph Neural Networks (GNNs) are popular tools for exploring graph-structured data. Graph Convolutional Networks (GCNs), built on graph convolution filters, extend convolutional neural networks (CNNs) to irregular graph domains. These GNN models operate on a given, static graph.

Image courtesy of Kipf & Welling (2016).

SLIDE 13

Graph Convolutional Networks

Motivation:
- Convolution of two signals x and y: x ⊗ y = F⁻¹(Fx ⊙ Fy), where F is the Fourier transform (DFT matrix).
- Convolution of two node signals x and y on a graph with Laplacian L = UΛU^⊤: x ⊗ y = U(U^⊤x ⊙ U^⊤y).
- Filtered convolution: x ⊗_filt y = h(L)x ⊙ h(L)y, with matrix filter function h(L) = U h(Λ) U^⊤.
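A direct, dense sketch of this spectral convolution, assuming a symmetric Laplacian (fine for small graphs; large-scale GCNs avoid the explicit eigendecomposition; `graph_conv` is our name):

```python
import numpy as np

def graph_conv(L, x, y):
    """x ⊗ y = U (Uᵀx ⊙ Uᵀy), where L = U Λ Uᵀ is a symmetric Laplacian."""
    _, U = np.linalg.eigh(L)                # graph Fourier basis
    return U @ ((U.T @ x) * (U.T @ y))      # transform, multiply pointwise, invert
```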

SLIDE 14

Graph Convolutional Neural Networks

Layer of the initial convolution-based GNNs (Bruna et al., 2016): given a graph Laplacian L ∈ R^(N×N) and node features X ∈ R^(N×F),

H_(i+1) = σ(h_θ(L) H_i W^(i)),

where h_θ is a filter function parametrized by θ, σ is a nonlinear function (e.g., ReLU), and W^(i) is a weight matrix, with H_0 = X.

Defferrard et al. (2016) used a Chebyshev approximation: h_θ(L) = Σ_(k=0)^(K) θ_k T_k(L).

GCN (Kipf & Welling, 2016): each layer takes the form σ(LXW). 2-layer example: Z = softmax(L σ(L X W^(0)) W^(1)).
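The two-layer GCN then amounts to two matrix products per layer and a softmax. A minimal NumPy sketch, assuming L is the pre-normalized operator of Kipf & Welling and σ = ReLU (`gcn_two_layer` is our name):

```python
import numpy as np

def gcn_two_layer(L, X, W0, W1):
    """Z = softmax(L · ReLU(L X W0) · W1); rows of Z are class probabilities."""
    H = np.maximum(L @ X @ W0, 0.0)             # first layer, σ = ReLU
    logits = L @ H @ W1                         # second layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # row-wise softmax
```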

SLIDE 15

GCN for dynamic graphs

We consider time-varying, or dynamic, graphs.

Goal: extend the GCN framework to the dynamic setting for tasks such as node and edge classification and link prediction.

Our approach: use the tensor framework
- T adjacency matrices A_(::t) ∈ R^(N×N) stacked into a tensor A ∈ R^(N×N×T)
- T node feature matrices X_(::t) ∈ R^(N×F) stacked into a tensor X ∈ R^(N×F×T)

SLIDE 16

TensorGCN

[Figure: TensorGCN pipeline. A dynamic graph over time steps 1, 2, ..., T is encoded as an adjacency tensor (A_1, ..., A_T) and a feature tensor (X_1, ..., X_T); TensorGCN produces an embedding used for graph tasks: link prediction, edge classification, node classification.]

SLIDE 17

TensorGCN

We use the ⋆M-product to extend the standard GCN to dynamic graphs. We propose the tensor GCN model σ(A ⋆M X ⋆M W). 2-layer example:

Z = softmax(A ⋆M σ(A ⋆M X ⋆M W^(0)) ⋆M W^(1))    (1)

We choose M to be lower triangular and banded:

M_(tk) = 1/min(b, t) if max(1, t − b + 1) ≤ k ≤ t, and 0 otherwise.

This can be shown to be consistent with a spatio-temporal message-passing model.

O. Malik, S. Ubaru, L. Horesh, M. Kilmer, and H. Avron, Tensor graph convolutional networks for prediction on dynamic graphs, 2020.
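For concreteness, a sketch of that banded lower-triangular M (0-indexed, so row t averages the current and up to b − 1 preceding time steps, matching M_tk = 1/min(b, t) in the slide's 1-indexed notation; `banded_m` is our name):

```python
import numpy as np

def banded_m(T, b):
    """T × T lower-triangular, banded averaging matrix for TensorGCN."""
    M = np.zeros((T, T))
    for t in range(T):
        lo = max(0, t - b + 1)                  # start of the band in row t
        M[t, lo:t + 1] = 1.0 / (t - lo + 1)     # equals 1 / min(b, t+1) here
    return M
```

Because M is lower triangular, row t touches only time steps k ≤ t, so the embedding at time t mixes current and past information only.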

SLIDE 18

Tensor Neural Networks


SLIDE 19

Neural Networks

Let a_0 be a feature vector with an associated target vector c. Let f be a function which propagates a_0 through connected layers:

a_(j+1) = σ(W_j · a_j + b_j) for j = 0, ..., N−1,

where σ is some nonlinear, monotonic activation function.

Goal: learn the function f which optimizes

min_(f∈H) E(f) ≡ (1/m) Σ_(i=1)^(m) V(c^(i), f(a_0^(i)))  [loss function]  +  R(f)  [regularizer]

H - hypothesis space of functions: rich, restrictive, efficient.

SLIDE 21

Reduced Parameterization

Given an n × n image A_0, stored as a vector a_0 ∈ R^(n²×1) or as a tensor A_0 ∈ R^(n×1×n):

Matrix: a_(j+1) = σ(W_j · a_j + b_j) - n⁴ + n² parameters
Tensor: A_(j+1) = σ(W_j ⋆M A_j + B_j) - n³ + n² parameters
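As a quick sanity check (our arithmetic, not on the slide): for n = 28, as in MNIST, a single matrix layer needs 28⁴ + 28² = 615,440 parameters, while the corresponding tensor layer needs 28³ + 28² = 22,736, roughly a 27× reduction.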

SLIDE 22

Improved Parametrization

Given an n × n image A_0, stored as a_0 ∈ R^(n²×1) and A_0 ∈ R^(n×1×n).

SLIDE 23

Tensor Neural Networks (tNNs)

Forward propagation:
A_(j+1) = σ(W_j ⋆M A_j + B_j)

Objective function:
E = (1/2) ‖W_N · unfold(A_N) − c‖_F²

Backward propagation:
δA_j = W_j^⊤ ⋆M (δA_(j+1) ⊙ σ′(Z_(j+1)))
δW_j = (δA_(j+1) ⊙ σ′(Z_(j+1))) ⋆M A_j^⊤
δB_j = δA_(j+1) ⊙ σ′(Z_(j+1))

where Z_(j+1) = W_j ⋆M A_j + B_j, ⊙ is the pointwise product, and
δA_j := ∂E/∂A_j = (∂E/∂A_(j+1)) · (∂A_(j+1)/∂Z_(j+1)) · (∂Z_(j+1)/∂A_j).

Update parameters: gradient descent!

M. Nielsen, Neural networks and deep learning, 2017
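These update formulas translate directly into code. A minimal NumPy sketch of one tNN layer's forward and backward pass, assuming σ = tanh and a real invertible transform M; `m_product`, `tnn_forward`, and `tnn_backward` are our names, not the authors':

```python
import numpy as np

def m_product(A, B, M):
    """C = A ⋆M B for A (l, p, n), B (p, m, n), invertible M (n, n)."""
    A_hat = np.tensordot(A, M, axes=(2, 1))
    B_hat = np.tensordot(B, M, axes=(2, 1))
    C_hat = np.einsum('lpn,pmn->lmn', A_hat, B_hat)   # facewise product
    return np.tensordot(C_hat, np.linalg.inv(M), axes=(2, 1))

def tnn_forward(W, A, B, M):
    """A_{j+1} = σ(W_j ⋆M A_j + B_j), σ = tanh; returns (A_{j+1}, Z_{j+1})."""
    Z = m_product(W, A, M) + B
    return np.tanh(Z), Z

def tnn_backward(W, A, Z, dA_next, M):
    """Slide's backprop with S = δA_{j+1} ⊙ σ′(Z_{j+1}); returns (δA_j, δW_j, δB_j).

    For a real M, the tensor transpose amounts to transposing each frontal
    slice, i.e. swapping the first two modes.
    """
    S = dA_next * (1.0 - np.tanh(Z) ** 2)          # σ′(Z) for σ = tanh
    dA = m_product(W.transpose(1, 0, 2), S, M)     # W_j^⊤ ⋆M S
    dW = m_product(S, A.transpose(1, 0, 2), M)     # S ⋆M A_j^⊤
    return dA, dW, S                               # δB_j = S
```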

SLIDE 30

Numerical Results


SLIDE 31

TensorGCN - Datasets

Table: Dataset statistics. Partitioning the data into windows of the specified length results in the given number of graphs.

Dataset       | Nodes | Edges   | No. graphs | Window length | Classes | S_train | S_val | S_test
Bitcoin OTC   | 6,005 | 35,569  | 135        | 14 days       | 2       | 95      | 20    | 20
Bitcoin Alpha | 7,604 | 24,173  | 135        | 14 days       | 2       | 95      | 20    | 20
Reddit        | 3,818 | 163,008 | 86         | 14 days       | 2       | 66      | 10    | 10
Chess         | 7,301 | 64,958  | 100        | 31 days       | 3       | 80      | 10    | 10

[Figure: Partitioning of the N × N × T tensor A into training, validation, and testing data along the time mode.]

SLIDE 32

TensorGCN - Edge classification results

Table: Results for edge classification. Performance measure is F1 score.

Method               | Bitcoin OTC | Bitcoin Alpha | Reddit | Chess
WD-GCN               | 0.2062      | 0.1920        | 0.2337 | 0.4311
EvolveGCN            | 0.3284      | 0.1609        | 0.2012 | 0.4351
GCN                  | 0.3317      | 0.2100        | 0.1805 | 0.4342
TensorGCN (proposed) | 0.3529      | 0.2331        | 0.2028 | 0.4708

F1 score = 2 · precision · recall / (precision + recall)

SLIDE 33

TensorGCN - Link Prediction results

Table: Results for link prediction. Performance measure is Mean Average Precision (MAP).

Method               | Bitcoin OTC | Bitcoin Alpha | Reddit | Chess
WD-GCN               | 0.6979      | 0.8067        | 0.1818 | 0.1077
EvolveGCN            | 0.6019      | 0.3474        | 0.1730 | 0.0655
GCN                  | 0.6872      | 0.7392        | 0.1788 | 0.0852
TensorGCN (proposed) | 0.7817      | 0.8094        | 0.1601 | 0.1736

precision = TP / (TP + FP),  recall = TP / (TP + FN)

SLIDE 34

Tensor vs. Matrix Learning: MNIST Database Results

Data: 28 × 28 grayscale images of handwritten digits; 60,000 train, 10,000 test.
Fixed parameters: h = 0.1, α = 0.1, σ = tanh, batch size = 20, 100 epochs.
Learnable parameters: matrix - (28⁴)N + (28²)N; tensor - (28³)N + (28²)N.

E. Newman, L. Horesh, H. Avron, M. Kilmer, Stable tensor neural networks for rapid deep learning, 2019.

SLIDE 35

Tensor vs. Matrix Learning: CIFAR-10 Database Results

Data: 32 × 32 × 3 RGB images from 10 classes; 50,000 train, 10,000 test.
Fixed parameters: h = 0.1, α = 0.01, σ = tanh, batch size = 100, 300 epochs, M = DCT matrix.
Learnable parameters: matrix - (3² · 32⁴)N + (3 · 32²)N; tensor - (3² · 32³)N + (3 · 32²)N.

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.
E. Newman, L. Horesh, H. Avron, M. Kilmer, Stable tensor neural networks for rapid deep learning, 2019.

SLIDE 36

Model reduction for NN?


SLIDE 37

Recall - Proper Orthogonal Decomposition

Dynamical system (scalar nonlinear PDE):

∂y(t)/∂t = Ay(t) + F(y(t)),

where t ∈ [0, T] denotes time, y(t) = [y_1(t), ..., y_n(t)]^⊤ ∈ R^n, A ∈ R^(n×n) is a constant matrix, and F is a nonlinear function applied entrywise, F = [F(y_1(t)), ..., F(y_n(t))]^⊤.

The discretized system: A y(µ) + F(y(µ)) = 0, with corresponding Jacobian

J(y(µ)) := A + J_F(y(µ)), where J_F(y(µ)) = diag{F′(y_1(µ)), ..., F′(y_n(µ))} ∈ R^(n×n),

and F′ denotes the first derivative of F.

SLIDE 38

Proper Orthogonal Decomposition

POD uses the first k left singular vectors of the snapshot matrix Y = [y_1, ..., y_ns]. Given the SVD of Y, Y = VΣW^⊤, project the system:

∂ỹ(t)/∂t = V_k^⊤ A V_k ỹ(t) + V_k^⊤ F(V_k ỹ(t)).

The reduced-order system becomes

Ã ỹ(µ) + V_k^⊤ F(V_k ỹ(µ)) = 0,

with the corresponding Jacobian

J̃(ỹ(µ)) := Ã + V_k^⊤ J_F(V_k ỹ(µ)) V_k,  where Ã = V_k^⊤ A V_k.
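A minimal sketch of the POD projection via the SVD, in the notation above (function names are ours):

```python
import numpy as np

def pod_basis(Y, k):
    """First k left singular vectors V_k of the snapshot matrix Y (n, ns)."""
    V, _, _ = np.linalg.svd(Y, full_matrices=False)
    return V[:, :k]

def reduced_residual(A, F, Vk, y_red):
    """Galerkin residual Ã ỹ + V_kᵀ F(V_k ỹ), where Ã = V_kᵀ A V_k."""
    y = Vk @ y_red                      # lift the reduced state to R^n
    return Vk.T @ (A @ y + F(y))        # project the full residual to R^k
```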

SLIDE 39

Discrete empirical interpolation method

DEIM approximates the nonlinear function by projecting it onto a subspace of dimension m ≪ n. Given a basis U = [u_1, ..., u_m], we approximate F(τ) ≈ U c(τ), where c(τ) is the corresponding coefficient vector.

Interpolation matrix: P = [e_φ1, ..., e_φm] ∈ R^(n×m), where e_φi is the φi-th column of the identity matrix.

The nonlinear function in the PDE and the Jacobian are approximated as:

F(V_k ỹ(µ)) ≈ U (P^⊤ U)⁻¹ F(P^⊤ V_k ỹ(µ)),
J̃_F(ỹ(µ)) ≈ V_k^⊤ U (P^⊤ U)⁻¹ J_F(P^⊤ V_k ỹ(µ)) P^⊤ V_k.

Chaturantabut and Sorensen, Discrete empirical interpolation for nonlinear model reduction, 2009.
Saibaba, Randomized discrete empirical interpolation method for nonlinear model reduction, 2019.
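The interpolation indices φ are typically chosen greedily; a sketch of the classical Chaturantabut–Sorensen selection (our function name):

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM indices for a basis U (n, m), so that F ≈ U (PᵀU)⁻¹ PᵀF."""
    n, m = U.shape
    phi = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, m):
        # Interpolate the next basis vector at the points chosen so far ...
        c = np.linalg.solve(U[phi, :j], U[phi, j])
        r = U[:, j] - U[:, :j] @ c
        # ... and add the point where the interpolation residual is largest.
        phi.append(int(np.argmax(np.abs(r))))
    return np.array(phi)
```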

SLIDE 40

PDE based NNs

Residual networks (ResNets): given training data Y = [y_1, y_2, ..., y_s] ∈ R^(n×s) and targets C = [c_1, c_2, ..., c_s] ∈ R^(d×s), an N-layer ResNet is given by

Y_(j+1) = Y_j + σ(A_j Y_j + b_j) for j = 0, ..., N−1.

General formulation: F(θ, Y) = A_2(θ^(3)) σ(N(A_1(θ^(1)) Y, θ^(2))), with forward propagation Y_(j+1) = Y_j + F(θ^(j), Y_j) for j = 0, ..., N−1.

This is a forward Euler discretization of the initial value problem

∂_t Y(θ, t) = F(θ(t), Y(t)) for t ∈ (0, T],  Y(θ, 0) = Y_0.

Ruthotto and Haber, Deep neural networks motivated by partial differential equations, 2019.
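Viewed this way, a ResNet forward pass is just an explicit Euler loop; a sketch with step size h (the experiments earlier used h = 0.1) and σ = tanh:

```python
import numpy as np

def resnet_forward(Y0, layers, h=1.0):
    """Y_{j+1} = Y_j + h·σ(A_j Y_j + b_j): forward Euler for ∂_t Y = σ(A Y + b)."""
    Y = Y0
    for A, b in layers:                          # one (A_j, b_j) pair per layer
        Y = Y + h * np.tanh(A @ Y + b[:, None])  # b broadcast across the s samples
    return Y
```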

SLIDE 41

Model reduction for PDE-NN?

Consider the stable variant of convolutional ResNets with a symmetric layer:

F_sym(θ, Y) = −A(θ)^⊤ σ(N(A(θ) Y, θ)).

The Jacobian of this function with respect to the features is

J_F(Y) = −A(θ)^⊤ diag(σ′(A(θ) Y)) A(θ),

where σ′ is the derivative of the pointwise nonlinearity (so, for monotonic σ, the Jacobian is symmetric negative semidefinite).

Reduced parameters via DEIM: precompute the projection basis U and interpolation matrix P; then

F_sym(θ, Y) ≈ U (P^⊤ U)⁻¹ F̃_sym(θ, P^⊤ Y), where F̃_sym(θ, P^⊤ Y) = −Ã(θ)^⊤ σ(N(Ã(θ) P^⊤ Y, θ)),

and Ã ∈ R^(d×m) is the reduced weight matrix to be learned.

Open question: an effective approach to compute U and P, since the function F depends on θ.
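A sketch of the symmetric layer and its DEIM-reduced counterpart, assuming σ = tanh, interpolation indices φ as returned by the DEIM selection above, and our own hypothetical names (`A_red` plays the role of Ã):

```python
import numpy as np

def f_sym(A, Y):
    """F_sym(θ, Y) = −A(θ)ᵀ σ(A(θ) Y), with σ = tanh and bias omitted."""
    return -A.T @ np.tanh(A @ Y)

def f_sym_deim(A_red, U, phi, Y):
    """F_sym ≈ U (PᵀU)⁻¹ F̃_sym(θ, PᵀY); Pᵀ Y is simply the rows Y[phi]."""
    F_tilde = -A_red.T @ np.tanh(A_red @ Y[phi])    # reduced layer at m points
    return U @ np.linalg.solve(U[phi, :], F_tilde)  # lift through (PᵀU)⁻¹
```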

SLIDE 42

Thank you!
