Novel tensor framework for neural networks and model reduction


  1. Novel tensor framework for neural networks and model reduction. Shashanka Ubaru¹, Lior Horesh¹, Misha Kilmer², Elizabeth Newman², Haim Avron³, Osman Malik⁴. ¹IBM T. J. Watson Research Center, ²Tufts University, ³Tel Aviv University, ⁴University of Colorado, Boulder. ICERM Workshop on Algorithms for Dimension and Complexity Reduction, IBM Research, March 2020. © 2020 IBM Corporation.

  2. Outline: Brief introduction to tensors; Tensor-based graph neural networks; Tensor neural networks; Numerical results; Model reduction for NNs?

  3. Introduction. Much of real-world data is inherently multidimensional, and many operators and models are natively multi-way.

  4. Tensor Applications: medical imaging, machine vision, latent semantic tensor indexing, video surveillance and streaming. Ivanov, Mathies, Vasilescu, Tensor subspace analysis for viewpoint recognition, ICCV, 2009. Shi, Ling, Hu, Yuan, Xing, Multi-target tracking with motion context in tensor power iteration, CVPR, 2014.

  5. Background and Notation. A ∈ ℝ^{n_1×n_2×⋯×n_d} denotes a d-th order tensor: a 0th order tensor is a scalar, a 1st order tensor is a vector, a 2nd order tensor is a matrix, a 3rd order tensor is a three-way array, and so on.

  6. Inside the Box. Fiber: a vector obtained by fixing all indices of a tensor except one. Slice: a matrix obtained by fixing all indices except two.
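
For a concrete picture, here is a small numpy sketch (not from the slides) of fibers and slices of a 3rd-order tensor; the array and the particular indices are arbitrary examples.

```python
import numpy as np

A = np.arange(24).reshape(3, 4, 2)   # a 3 x 4 x 2 third-order tensor

# Fibers: fix all indices but one
mode1_fiber = A[:, 1, 0]             # a mode-1 (column) fiber, length 3
mode3_fiber = A[2, 0, :]             # a mode-3 (tube) fiber, length 2

# Slices: fix all indices but two
frontal_slice = A[:, :, 1]           # a 3 x 4 frontal slice
lateral_slice = A[:, 2, :]           # a 3 x 2 lateral slice
```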

  7. Tensor Multiplication. Definition: the k-mode multiplication of a tensor A ∈ ℝ^{n_1×n_2×⋯×n_d} with a matrix U ∈ ℝ^{j×n_k}, denoted A ×_k U, has size n_1 × ⋯ × n_{k−1} × j × n_{k+1} × ⋯ × n_d. Element-wise: (A ×_k U)_{i_1⋯i_{k−1} j i_{k+1}⋯i_d} = Σ_{i_k=1}^{n_k} a_{i_1 i_2⋯i_d} u_{j i_k}.
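
A minimal numpy sketch of the k-mode product defined above; the helper name mode_k_product and the shapes are illustrative, not from the slides.

```python
import numpy as np

def mode_k_product(A, U, k):
    """k-mode product A x_k U for U of shape (j, n_k): contract mode k of A
    with U, as in the element-wise definition above."""
    out = np.tensordot(U, A, axes=([1], [k]))  # new mode j ends up in front
    return np.moveaxis(out, 0, k)              # move it back to position k

A = np.random.randn(4, 5, 6)                   # n1 x n2 x n3 tensor
U = np.random.randn(3, 5)                      # j x n2 matrix
C = mode_k_product(A, U, 1)
print(C.shape)                                 # (4, 3, 6) = n1 x j x n3
```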

  8. The ⋆_M-Product. Given A ∈ ℝ^{ℓ×p×n}, B ∈ ℝ^{p×m×n}, and an invertible n × n matrix M, the product is C = A ⋆_M B = (Â △ B̂) ×_3 M^{−1}, where C ∈ ℝ^{ℓ×m×n}, Â = A ×_3 M, B̂ = B ×_3 M, and △ is the facewise product, which multiplies corresponding frontal slices in parallel. Useful properties: tensor transpose, identity tensor, connection to the Fourier transform, invariance to circulant shifts, ...
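
A sketch of the ⋆_M product under the stated assumptions (real, invertible M; frontal slices indexed by the third mode); the function name star_M and the example shapes are mine. With M the DFT matrix, this construction reduces to the familiar t-product.

```python
import numpy as np

def star_M(A, B, M):
    """C = A *_M B for A (l x p x n), B (p x m x n), invertible n x n M:
    transform along mode 3, multiply frontal slices facewise, transform back."""
    A_hat = np.einsum('kt,lpt->lpk', M, A)                    # A x_3 M
    B_hat = np.einsum('kt,pmt->pmk', M, B)                    # B x_3 M
    C_hat = np.einsum('lpk,pmk->lmk', A_hat, B_hat)           # facewise products
    return np.einsum('kt,lmt->lmk', np.linalg.inv(M), C_hat)  # C_hat x_3 M^{-1}

A, B = np.random.randn(3, 4, 5), np.random.randn(4, 2, 5)
M = np.linalg.qr(np.random.randn(5, 5))[0]     # an arbitrary invertible (orthogonal) M
print(star_M(A, B, M).shape)                   # (3, 2, 5)
```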

  9. Tensor Graph Convolutional Networks

  10. Dynamic Graphs. Graphs are ubiquitous data structures that represent interactions and structural relationships. In many real-world applications, the underlying graph changes over time, so learning representations of dynamic graphs is essential.

  11. Dynamic Graphs - Applications: corporate/financial networks, natural language understanding (NLU), social networks, neural activity networks, traffic prediction.

  12. Graph Convolutional Networks. Graph Neural Networks (GNNs) are popular tools for exploring graph-structured data. Graph Convolutional Networks (GCNs), which are based on graph convolution filters, extend convolutional neural networks (CNNs) to irregular graph domains. These GNN models operate on a given, static graph. Image courtesy of Kipf & Welling (2016).

  13. Graph Convolutional Networks. Motivation: the convolution of two signals x and y is x ⊗ y = F^{−1}(Fx ⊙ Fy), where F is the Fourier transform (DFT matrix). The convolution of two node signals x and y on a graph with Laplacian L = U Λ U^T is x ⊗ y = U(U^T x ⊙ U^T y). Filtered convolution: x ⊗_filt y = h(L)x ⊙ h(L)y, with matrix filter function h(L) = U h(Λ) U^T.
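
A small numpy sketch of the filtered graph convolution above; the 5-node path graph and the exponential filter h are arbitrary illustrative choices.

```python
import numpy as np

def filtered_graph_convolution(x, y, L, h):
    """h(L)x ⊙ h(L)y with h(L) = U h(Λ) U^T, for a symmetric Laplacian L."""
    lam, U = np.linalg.eigh(L)                 # eigendecomposition L = U Λ U^T
    hL = U @ np.diag(h(lam)) @ U.T             # matrix filter function h(L)
    return (hL @ x) * (hL @ y)                 # pointwise (Hadamard) product

A = np.diag(np.ones(4), 1); A = A + A.T        # adjacency of a 5-node path graph
L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
x, y = np.random.randn(5), np.random.randn(5)
z = filtered_graph_convolution(x, y, L, lambda lam: np.exp(-lam))
```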

  14. Graph Convolutional Neural Networks. Layer of the initial convolution-based GNNs (Bruna et al., 2016): given the graph Laplacian L ∈ ℝ^{N×N} and node features X ∈ ℝ^{N×F}, H_{i+1} = σ(h_θ(L) H_i W^{(i)}), where h_θ is a filter function parametrized by θ, σ is a nonlinear function (e.g., ReLU), W^{(i)} is a weight matrix, and H_0 = X. Defferrard et al. (2016) used a Chebyshev approximation: h_θ(L) = Σ_{k=0}^{K} θ_k T_k(L). GCN (Kipf & Welling, 2016): each layer takes the form σ(LXW); 2-layer example: Z = softmax(L σ(L X W^{(0)}) W^{(1)}).
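
A minimal sketch of the 2-layer GCN forward pass Z = softmax(L σ(L X W^{(0)}) W^{(1)}), with ReLU for σ; the identity stand-in for L and the layer sizes are hypothetical.

```python
import numpy as np

def gcn_two_layer(L, X, W0, W1):
    """Z = softmax(L relu(L X W0) W1), softmax taken row-wise over classes."""
    H = np.maximum(L @ X @ W0, 0.0)                    # first layer + ReLU
    logits = L @ H @ W1                                # second layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

N, F, hidden, classes = 5, 8, 16, 3                    # hypothetical sizes
L = np.eye(N)                                          # stand-in for the graph operator
X = np.random.randn(N, F)
Z = gcn_two_layer(L, X, np.random.randn(F, hidden), np.random.randn(hidden, classes))
print(Z.shape)                                         # (5, 3), rows sum to 1
```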

  15. GCN for Dynamic Graphs. We consider time-varying, or dynamic, graphs. Goal: extend the GCN framework to the dynamic setting for tasks such as node and edge classification and link prediction. Our approach: use the tensor framework. The T adjacency matrices A_{::t} ∈ ℝ^{N×N} are stacked into a tensor A ∈ ℝ^{N×N×T}, and the T node feature matrices X_{::t} ∈ ℝ^{N×F} are stacked into a tensor X ∈ ℝ^{N×F×T}.
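
A one-line-per-tensor sketch of the stacking step; the snapshot lists and sizes are hypothetical.

```python
import numpy as np

N, F, T = 6, 4, 5                                                # hypothetical sizes
adjacency_snapshots = [np.random.rand(N, N) for _ in range(T)]   # A_::t
feature_snapshots = [np.random.rand(N, F) for _ in range(T)]     # X_::t

A = np.stack(adjacency_snapshots, axis=2)              # N x N x T adjacency tensor
X = np.stack(feature_snapshots, axis=2)                # N x F x T feature tensor
print(A.shape, X.shape)                                # (6, 6, 5) (6, 4, 5)
```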

  16. TensorGCN. [Figure: a dynamic graph over time steps 1, ..., T is encoded as an adjacency tensor A and a feature tensor X; TensorGCN maps them to an embedding used for graph tasks such as link prediction, edge classification, and node classification.]

  17. TensorGCN. We use the ⋆_M-product to extend the standard GCN to dynamic graphs, with the tensor GCN layer σ(A ⋆_M X ⋆_M W); 2-layer example: Z = softmax(A ⋆_M σ(A ⋆_M X ⋆_M W^{(0)}) ⋆_M W^{(1)}). We choose M to be lower triangular and banded, with M_{tk} = 1/min(b,t) if max(1, t−b+1) ≤ k ≤ t and 0 otherwise, which can be shown to be consistent with a spatio-temporal message-passing model. O. Malik, S. Ubaru, L. Horesh, M. Kilmer, and H. Avron, Tensor graph convolutional networks for prediction on dynamic graphs, 2020.
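
A sketch of one way to build the banded, lower-triangular M described above, assuming each nonzero entry of row t equals 1/min(b, t) (the averaging normalization reconstructed from the slide).

```python
import numpy as np

def banded_lower_triangular_M(T, b):
    """Row t (1-based) averages time steps max(1, t-b+1), ..., t."""
    M = np.zeros((T, T))
    for t in range(1, T + 1):
        lo = max(1, t - b + 1)
        M[t - 1, lo - 1:t] = 1.0 / min(b, t)
    return M

print(banded_lower_triangular_M(4, 2))
# [[1.   0.   0.   0. ]
#  [0.5  0.5  0.   0. ]
#  [0.   0.5  0.5  0. ]
#  [0.   0.   0.5  0.5]]
```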

  18. Tensor Neural Networks

  19–20. Neural Networks. Let a_0 be a feature vector with an associated target vector c, and let f be a function which propagates a_0 through connected layers: a_{j+1} = σ(W_j a_j + b_j) for j = 0, ..., N−1, where σ is some nonlinear, monotonic activation function. Goal: learn the function f ∈ H which minimizes E(f) ≡ (1/m) Σ_{i=1}^{m} V(c^{(i)}, f(a_0^{(i)})) + R(f), where V is the loss function, R is a regularizer, and H is a hypothesis space of functions that should be rich yet restrictive and efficient.
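
Before the tensor version, a minimal sketch of the matrix forward pass and a squared-error instance of the objective above; tanh for σ, the omission of R(f), and the layer sizes are illustrative choices.

```python
import numpy as np

def forward(a0, weights, biases):
    """a_{j+1} = sigma(W_j a_j + b_j), here with sigma = tanh."""
    a = a0
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)
    return a

def empirical_risk(samples, targets, weights, biases):
    """(1/m) sum_i V(c_i, f(a0_i)) with V the squared error and no regularizer."""
    m = len(samples)
    return sum(0.5 * np.linalg.norm(forward(a, weights, biases) - c) ** 2
               for a, c in zip(samples, targets)) / m

weights = [np.random.randn(3, 4), np.random.randn(2, 3)]   # a 4 -> 3 -> 2 network
biases = [np.random.randn(3), np.random.randn(2)]
samples = [np.random.randn(4) for _ in range(10)]
targets = [np.random.randn(2) for _ in range(10)]
print(empirical_risk(samples, targets, weights, biases))
```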

  21. Reduced Parameterization. Given an n × n image A_0, stored either as a vector a_0 ∈ ℝ^{n^2×1} or as a tensor A_0 ∈ ℝ^{n×1×n}. Matrix layer: a_{j+1} = σ(W_j a_j + b_j), with n^4 + n^2 parameters. Tensor layer: A_{j+1} = σ(W_j ⋆_M A_j + B_j), with n^3 + n^2 parameters.
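
A quick check of the two counts for a hypothetical n = 28 (MNIST-sized images):

```python
# Matrix layer: W_j is n^2 x n^2, b_j is n^2       ->  n^4 + n^2 parameters
# Tensor layer: W_j is n x n x n, B_j is n x 1 x n ->  n^3 + n^2 parameters
n = 28
matrix_params = n**4 + n**2            # 615,440
tensor_params = n**3 + n**2            # 22,736
print(matrix_params / tensor_params)   # roughly 27x fewer parameters per layer
```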

  22. Improved Parametrization. Given an n × n image A_0, stored either as a vector a_0 ∈ ℝ^{n^2×1} or as a tensor A_0 ∈ ℝ^{n×1×n}.

  23–28. Tensor Neural Networks (tNNs). Training follows the usual loop of forward propagation, objective function, backward propagation, and parameter updates (M. Nielsen, Neural networks and deep learning, 2017).
     Forward propagation: A_{j+1} = σ(W_j ⋆_M A_j + B_j).
     Objective function: E = (1/2) ‖W_N · unfold(A_N) − c‖_F^2.
     Backward propagation: δA_j = W_j^T ⋆_M (δA_{j+1} ⊙ σ'(Z_{j+1})), where Z_{j+1} = W_j ⋆_M A_j + B_j, ⊙ is the pointwise product, and δA_j := ∂E/∂A_j = (∂E/∂A_{j+1}) (∂A_{j+1}/∂Z_{j+1}) (∂Z_{j+1}/∂A_j).
     Parameter updates: δW_j = (δA_{j+1} ⊙ σ'(Z_{j+1})) ⋆_M A_j^T and δB_j = δA_{j+1} ⊙ σ'(Z_{j+1}).
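
A compact sketch of one tNN layer's forward and backward pass under the formulas above. A variant of the earlier star_M helper is inlined so the block is self-contained; tanh for σ, the layer sizes, and the use of a facewise transpose (which matches the ⋆_M transpose for real M) are illustrative assumptions, and the final fully connected output layer is omitted.

```python
import numpy as np

def star_M(A, B, M, Minv):
    """*_M product of A (l x p x n) and B (p x m x n) for real, invertible M."""
    A_hat = np.einsum('kt,lpt->lpk', M, A)
    B_hat = np.einsum('kt,pmt->pmk', M, B)
    return np.einsum('kt,lmt->lmk', Minv, np.einsum('lpk,pmk->lmk', A_hat, B_hat))

def tnn_forward(W, A, B, M, Minv):
    """A_{j+1} = sigma(W_j *_M A_j + B_j), with sigma = tanh."""
    Z = star_M(W, A, M, Minv) + B
    return np.tanh(Z), Z

def tnn_backward(W, A, Z, dA_next, M, Minv):
    """dA_j = W_j^T *_M (dA_{j+1} ⊙ sigma'(Z_{j+1})),
       dW_j = (dA_{j+1} ⊙ sigma'(Z_{j+1})) *_M A_j^T,
       dB_j = dA_{j+1} ⊙ sigma'(Z_{j+1})."""
    dZ = dA_next * (1.0 - np.tanh(Z) ** 2)   # sigma'(Z) for tanh
    W_T = np.transpose(W, (1, 0, 2))         # facewise transpose (⋆_M transpose for real M)
    A_T = np.transpose(A, (1, 0, 2))
    dA = star_M(W_T, dZ, M, Minv)
    dW = star_M(dZ, A_T, M, Minv)
    dB = dZ
    return dA, dW, dB

# One layer with W: l x p x n, A: p x 1 x n, B: l x 1 x n (hypothetical sizes)
l, p, n = 3, 4, 5
M = np.linalg.qr(np.random.randn(n, n))[0]; Minv = M.T     # orthogonal M, so M^{-1} = M^T
W, A, B = np.random.randn(l, p, n), np.random.randn(p, 1, n), np.random.randn(l, 1, n)
A_next, Z = tnn_forward(W, A, B, M, Minv)
dA, dW, dB = tnn_backward(W, A, Z, np.ones_like(A_next), M, Minv)
```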
