
Neural Networks and Sparse Coding from the Signal Processing Perspective

Gerald Schuller
Ilmenau University of Technology and Fraunhofer Institute for Digital Media Technology (IDMT)
April 6, 2016


Introduction

Goal: show the connections and shared principles between neural networks, sparse coding, optimization, and signal processing.

You will see programming examples in Python. They make the algorithms easier to understand, let us test whether and how they work, and make the results reproducible, so that the algorithms are testable and useful for other researchers.

Introduction: Optimization

Optimization is needed for neural networks, sparse coding, and compressed sensing. Whether these methods are feasible in practice often depends on having a fast and practical optimization algorithm.

Introduction: Optimization

The goal of optimization is to find the vector $\mathbf{x}$ that minimizes the error function $f(\mathbf{x})$. We know that at a minimum the function's derivative is zero:

$$f'(x) := \frac{df(x)}{dx} = 0.$$

Newton's Algorithm: Newton's Method

Newton's method is an approach to iteratively find the zero of a function. Take some function $f(x)$, where $x$ is not a vector but just a number; then we can find its zero as depicted in the following picture.

Newton's Algorithm: Newton's Method

Newton's method uses the iteration

$$x_{new} = x_{old} - \frac{f(x_{old})}{f'(x_{old})}.$$

Now we want to find the zero not of $f(x)$ but of $f'(x)$, so we simply replace $f(x)$ by $f'(x)$ and obtain the iteration

$$x_{new} = x_{old} - \frac{f'(x_{old})}{f''(x_{old})}.$$
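A minimal sketch of both one-dimensional iterations in plain Python. The helper names `newton_zero` and `newton_minimum`, the test functions, the starting points, and the fixed iteration count are illustrative assumptions. Note that the second iteration only finds a point where $f'(x) = 0$; it lands on a minimum when $f''(x) > 0$ near the solution.

```python
import math

def newton_zero(f, df, x, iterations=20):
    """Newton's method for a zero of f: x_new = x_old - f(x_old) / f'(x_old)."""
    for _ in range(iterations):
        x = x - f(x) / df(x)
    return x

def newton_minimum(df, d2f, x, iterations=20):
    """Apply the same iteration to f' instead of f:
    x_new = x_old - f'(x_old) / f''(x_old)."""
    return newton_zero(df, d2f, x, iterations)

# Zero of f(x) = x**2 - 2, i.e. sqrt(2)
root = newton_zero(lambda x: x**2 - 2, lambda x: 2 * x, x=1.0)

# Minimum of f(x) = cos(x): f'(x) = -sin(x), f''(x) = -cos(x); the minimum is at pi
x_min = newton_minimum(lambda x: -math.sin(x), lambda x: -math.cos(x), x=2.5)

print(root)   # approximately 1.41421356
print(x_min)  # approximately 3.14159265
```

Starting at $x = 2.5$, where $f''(x) = -\cos(2.5) > 0$, the iteration heads for the minimum at $\pi$; a start where $f''(x) < 0$ could instead converge to a maximum.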

Newton's Algorithm: Newton's Method

For a multi-dimensional function, where the argument $\mathbf{x}$ is a vector, the first derivative is a vector called the gradient, written with the nabla symbol $\nabla$, because we need the derivative with respect to each element of the argument vector $\mathbf{x}$:

$$\nabla f(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

where $n$ is the number of unknowns in the argument vector $\mathbf{x}$.
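To make the gradient concrete, the sketch below approximates each partial derivative $\partial f / \partial x_i$ with a central finite difference and compares the result with the analytic gradient of the example function $f(x_0, x_1) = \cos(x_0) - \sin(x_1)$ used in the gradient descent slides. The helper `numerical_gradient` and the step `h = 1e-6` are assumptions for illustration; note also that NumPy indexes from 0, so $x_1, \ldots, x_n$ become `x[0], ..., x[n-1]`.

```python
import numpy as np

def f(x):
    # Example function: f(x0, x1) = cos(x0) - sin(x1)
    return np.cos(x[0]) - np.sin(x[1])

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient vector [df/dx_1, ..., df/dx_n]
    using one central difference per element of x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h  # perturb only the i-th element
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.array([2.0, 2.0])
approx = numerical_gradient(f, x)
exact = np.array([-np.sin(x[0]), -np.cos(x[1])])  # analytic gradient
print(approx)
print(exact)
```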

Newton's Algorithm: Newton's Method

For the second derivative, we take each element of the gradient vector and again differentiate it with respect to each element of the argument vector. Hence we obtain a matrix of second derivatives, the Hesse matrix (Hessian):

$$H_f(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}$$

Observe that the Hesse matrix is symmetric about its main diagonal: for a twice continuously differentiable $f$ the order of differentiation does not matter, so $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$.
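A minimal finite-difference sketch of the Hesse matrix, assuming a hypothetical test function $f(x_0, x_1) = x_0^2 x_1 + x_1^3$, chosen because its mixed derivative is non-zero. Its exact Hesse matrix is $\begin{bmatrix} 2x_1 & 2x_0 \\ 2x_0 & 6x_1 \end{bmatrix}$, so at $(1, 2)$ it is $\begin{bmatrix} 4 & 2 \\ 2 & 12 \end{bmatrix}$. The helper `numerical_hessian` and its step size are illustrative assumptions.

```python
import numpy as np

def f(x):
    # Hypothetical test function: f(x0, x1) = x0**2 * x1 + x1**3
    return x[0] ** 2 * x[1] + x[1] ** 3

def numerical_hessian(f, x, h=1e-4):
    """Approximate H[i, j] = d^2 f / (dx_i dx_j) with central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    steps = np.eye(n) * h  # steps[i] moves only the i-th element by h
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                       - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])) / (4 * h * h)
    return H

H = numerical_hessian(f, [1.0, 2.0])
print(np.round(H, 6))       # close to [[4, 2], [2, 12]]
print(np.allclose(H, H.T))  # symmetric about the main diagonal
```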

Newton's Algorithm: Newton's Method

Using these definitions we can generalize Newton's algorithm to the multi-dimensional case. The one-dimensional iteration

$$x_{new} = x_{old} - \frac{f'(x_{old})}{f''(x_{old})}$$

turns into the multi-dimensional iteration

$$\mathbf{x}_{new} = \mathbf{x}_{old} - H_f^{-1}(\mathbf{x}_{old}) \, \nabla f(\mathbf{x}_{old}).$$
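A minimal NumPy sketch of the multi-dimensional Newton iteration, assuming the example function $f(x_0, x_1) = \cos(x_0) - \sin(x_1)$ from the gradient descent slides, whose gradient is $[-\sin(x_0), -\cos(x_1)]$ and whose Hesse matrix is $\mathrm{diag}(-\cos(x_0), \sin(x_1))$. Instead of forming $H_f^{-1}$ explicitly, each step solves the linear system $H_f \, \Delta\mathbf{x} = \nabla f$ with `numpy.linalg.solve`, which is the usual numerically preferable way to apply an inverse. The starting point is an arbitrary choice.

```python
import numpy as np

def grad(x):
    # Gradient of f(x0, x1) = cos(x0) - sin(x1)
    return np.array([-np.sin(x[0]), -np.cos(x[1])])

def hesse(x):
    # Hesse matrix of the same f; the mixed derivatives are zero
    return np.array([[-np.cos(x[0]), 0.0],
                     [0.0, np.sin(x[1])]])

def newton(x, iterations=10):
    """x_new = x_old - H^{-1}(x_old) grad f(x_old), computed via a linear solve."""
    x = np.asarray(x, dtype=float)
    for _ in range(iterations):
        step = np.linalg.solve(hesse(x), grad(x))
        x = x - step
    return x

x_opt = newton([2.5, 1.0])
print(x_opt)  # approximately [3.14159265 1.57079633]
```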

Gradient Descent

At a minimum, $H_f(\mathbf{x})$ is typically positive definite (all eigenvalues positive); this is the standard second-order condition for a strict local minimum. The problem is that the Hesse matrix requires computing $n^2$ second derivatives, which can be computationally too expensive, and then the matrix must also be inverted. Hence we make the simplifying assumption that the Hesse matrix can be written as a diagonal matrix with identical values on the diagonal. This leads to the widely used gradient descent, or steepest descent, method.
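The positive-definiteness condition can be checked numerically through the eigenvalues. Below is a minimal sketch with two hypothetical symmetric 2×2 matrices; `numpy.linalg.eigvalsh` is used because it is intended for symmetric matrices such as the Hesse matrix. For $n$ unknowns the full matrix has $n^2$ entries, e.g. one million second derivatives for $n = 1000$, which illustrates the cost mentioned above.

```python
import numpy as np

def is_positive_definite(H):
    """A symmetric matrix is positive definite iff all its eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(H) > 0))

H_min = np.array([[2.0, 1.0],
                  [1.0, 2.0]])     # eigenvalues 1 and 3
H_saddle = np.array([[1.0, 2.0],
                     [2.0, 1.0]])  # eigenvalues -1 and 3

print(is_positive_definite(H_min))     # True
print(is_positive_definite(H_saddle))  # False
```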

Gradient Descent

We approximate the Hesse matrix as

$$H_f(\mathbf{x}_k) = \frac{1}{\alpha} \cdot I.$$

Observe that this is usually a very crude approximation, but since the iteration consists of many small updates it can still work. The best value of $\alpha$ depends on how well $\frac{1}{\alpha} I$ approximates the Hesse matrix.

Gradient Descent

Hence our iteration

$$\mathbf{x}_{new} = \mathbf{x}_{old} - H_f^{-1}(\mathbf{x}_{old}) \, \nabla f(\mathbf{x}_{old})$$

with $H_f^{-1} = \alpha \cdot I$ turns into

$$\mathbf{x}_{new} = \mathbf{x}_{old} - \alpha \, \nabla f(\mathbf{x}_{old}),$$

which is much simpler to compute. This is also called “steepest descent”, because the negative gradient points in the direction of steepest descent, or “gradient descent”, because the update direction is along the gradient.
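A minimal sketch of the gradient descent update as a reusable loop. The function name `gradient_descent`, the fixed iteration count, and the test function are assumptions for illustration. The quadratic $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{x} - \mathbf{c}\|^2$ has gradient $\mathbf{x} - \mathbf{c}$ and a Hesse matrix of exactly $I$, so with $\alpha = 1$ the approximation $H_f = \frac{1}{\alpha} I$ is exact and a single step lands on the minimum $\mathbf{c}$; with $\alpha = 0.5$ each step only halves the remaining error.

```python
import numpy as np

def gradient_descent(grad, x0, alpha, iterations):
    """Repeat x_new = x_old - alpha * grad f(x_old) a fixed number of times."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        x = x - alpha * grad(x)
    return x

c = np.array([3.0, -1.0])  # hypothetical location of the minimum

def grad_quadratic(x):
    # Gradient of f(x) = 0.5 * ||x - c||^2; its Hesse matrix is the identity I
    return x - c

one_step = gradient_descent(grad_quadratic, [0.0, 0.0], alpha=1.0, iterations=1)
halving = gradient_descent(grad_quadratic, [0.0, 0.0], alpha=0.5, iterations=20)
print(one_step)  # equals c after a single step
print(halving)   # error shrinks by a factor of 2 per step
```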

Gradient Descent

We see that the update of $\mathbf{x}$ consists only of the gradient $\nabla f(\mathbf{x}_k)$ scaled by the factor $\alpha$. In each step, we reduce the value of $f(\mathbf{x})$ by moving $\mathbf{x}$ in the direction opposite to the gradient. If we make $\alpha$ larger, we obtain larger update steps and hence faster progress toward the minimum, but the iteration may oscillate around the minimum, or even diverge if $\alpha$ is too large. For smaller $\alpha$ the steps become smaller and convergence is slower, but the iteration settles more precisely into the minimum.
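The effect of $\alpha$ is easy to see on the one-dimensional function $f(x) = x^2$, a hypothetical example chosen because its update has a closed form: with $f'(x) = 2x$ the iteration becomes $x_{new} = (1 - 2\alpha)\, x_{old}$. Since $f''(x) = 2$, the value $\alpha = 0.5$ makes $\frac{1}{\alpha}$ equal to the true second derivative and reaches the minimum in one step; $\alpha = 0.1$ creeps toward it, $\alpha = 0.9$ oscillates around it, and $\alpha = 1.1$ diverges.

```python
def descend(alpha, x=1.0, steps=10):
    """Gradient descent on f(x) = x**2 (f'(x) = 2x); returns all iterates."""
    xs = [x]
    for _ in range(steps):
        x = x - alpha * 2 * x
        xs.append(x)
    return xs

for alpha in (0.1, 0.5, 0.9, 1.1):
    xs = descend(alpha)
    print(f"alpha={alpha}: first iterates {[round(v, 3) for v in xs[:4]]}, x_10={xs[-1]:.4g}")
```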

Gradient Descent: Example

Find the 2-dimensional minimum of the function

$$f(x_0, x_1) = \cos(x_0) - \sin(x_1).$$

Its gradient is

$$\nabla f(x_0, x_1) = [-\sin(x_0), -\cos(x_1)].$$

Observe: the Hesse matrix of second derivatives is diagonal (since $f$ is a sum of one-dimensional functions of $x_0$ and $x_1$ separately), although not necessarily with the same entries on the diagonal; hence this function is a good fit for gradient descent.
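The diagonal structure can be written out directly: the Hesse matrix of this $f$ is $\mathrm{diag}(-\cos(x_0), \sin(x_1))$. A short NumPy check (the function name `hesse` is an assumption) shows that the diagonal entries differ in general, but at the minimum $(\pi, \pi/2)$ both equal 1, so the Hesse matrix there is exactly $I$. With $\alpha = 1$, the approximation $H_f = \frac{1}{\alpha} I$ is therefore exact at the minimum, which helps explain why the Python example converges so quickly.

```python
import numpy as np

def hesse(x0, x1):
    # Second derivatives of f = cos(x0) - sin(x1); mixed terms are zero
    return np.array([[-np.cos(x0), 0.0],
                     [0.0, np.sin(x1)]])

H_start = hesse(2.0, 2.0)        # at the starting point of the example
H_min = hesse(np.pi, np.pi / 2)  # at the minimum
print(H_start)  # diagonal, with unequal entries (about 0.416 and 0.909)
print(H_min)    # identity matrix
```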

Gradient Descent: Example in Python

```python
# Originally typed into an interactive session started with: ipython --pylab
# The explicit import below makes it run as a plain Python script as well.
from numpy import array, sin, cos, pi

alpha = 1                # step size
x = array([2.0, 2.0])    # starting point [x0, x1]

# Gradient Descent update: x <- x - alpha * grad f(x)
x = x - alpha * array([-sin(x[0]), -cos(x[1])])
print(x)  # [2.90929743 1.58385316]
x = x - alpha * array([-sin(x[0]), -cos(x[1])])
print(x)  # [3.13950913 1.5707967 ]
x = x - alpha * array([-sin(x[0]), -cos(x[1])])
print(x)  # [3.14159265 1.57079633]
print(pi, pi / 2)  # 3.141592653589793 1.5707963267948966
```

Gradient Descent: Example in Python

Observe: after only 3 iterations we obtain $\pi$ and $\pi/2$ to about 9 significant digits! Keep in mind: gradient descent works well only when its assumption of a diagonal Hesse matrix with identical diagonal values is (approximately) true!

Gradient Descent: Example 2 in Python

Find the 2-dimensional minimum of the function

$$f(x_0, x_1) = \exp\big(\cos(x_0) - \sin(x_1)\big).$$

Observe: it has the same minima as before (because exp is monotonically increasing), and it resembles the non-linear functions used in neural networks. Its gradient is

$$\nabla f(x_0, x_1) = \exp\big(\cos(x_0) - \sin(x_1)\big) \cdot [-\sin(x_0), -\cos(x_1)].$$

Observe: the Hesse matrix of second derivatives is no longer diagonal, because the non-linear exp function couples the two variables (e.g. $\frac{\partial^2 f}{\partial x_0 \partial x_1} = \exp\big(\cos(x_0) - \sin(x_1)\big) \sin(x_0) \cos(x_1)$); hence this function is no longer a good fit for gradient descent.
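A minimal sketch comparing gradient descent with $\alpha = 1$ on both examples, starting from $[2, 2]$ as in the first Python example. The stopping rule (update length below `tol = 1e-8`), the iteration limit, and the helper names are assumptions. For the exp version the loop needs many more iterations: away from the minimum the mixed derivative is non-zero, and near the minimum the Hesse matrix approaches $e^{-2} I \approx 0.135\, I$, so $\alpha = 1$ is far from the value $\alpha \approx 1/0.135$ that the approximation $H_f = \frac{1}{\alpha} I$ would call for, and the steps become too small.

```python
import numpy as np

def grad1(x):
    # Example 1: f = cos(x0) - sin(x1)
    return np.array([-np.sin(x[0]), -np.cos(x[1])])

def grad2(x):
    # Example 2: f = exp(cos(x0) - sin(x1)); the chain rule gives exp(...) * grad1
    return np.exp(np.cos(x[0]) - np.sin(x[1])) * grad1(x)

def gradient_descent(grad, x, alpha=1.0, tol=1e-8, max_iter=10000):
    """Iterate x <- x - alpha * grad(x) until the update is shorter than tol.
    Returns the final x and the number of iterations used."""
    x = np.asarray(x, dtype=float)
    for k in range(1, max_iter + 1):
        step = alpha * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            return x, k
    return x, max_iter

x1, n1 = gradient_descent(grad1, [2.0, 2.0])
x2, n2 = gradient_descent(grad2, [2.0, 2.0])
print("example 1:", n1, "iterations ->", x1)
print("example 2:", n2, "iterations ->", x2)
```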
