SLIDE 1

Lecture 24:

  • Autoencoders
  • ICA

Aykut Erdem
December 2017, Hacettepe University

SLIDE 2

Last time… Dimensionality Reduction

  • Clustering
  • One way to summarize a complex real-valued data point with a single categorical variable

  • Dimensionality reduction
  • Another way to simplify complex high-dimensional data
  • Summarize data with a lower-dimensional real-valued vector

  • Given data points in d dimensions
  • Convert them to data points in r < d dimensions
  • With minimal loss of information

slide by Fereshteh Sadeghi
SLIDE 3

Last time… Principal Component Analysis

  • PCA vectors originate from the center of mass.
  • Principal component #1 points in the direction of the largest variance.
  • Each subsequent principal component
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace (see the sketch below)

slide by Barnabás Póczos and Aarti Singh
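To make the recipe above concrete, here is a minimal NumPy sketch of PCA (an illustration of mine, not from the slides): center the data, eigendecompose the sample covariance, and keep the r directions of largest variance.

```python
import numpy as np

def pca(X, r):
    """Project an N x d data matrix X onto its top-r principal components."""
    Xc = X - X.mean(axis=0)                 # vectors originate from the center of mass
    cov = np.cov(Xc, rowvar=False)          # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues (symmetric matrix)
    top = np.argsort(eigvals)[::-1][:r]     # indices of the r largest-variance directions
    return Xc @ eigvecs[:, top]             # N x r low-dimensional codes
```

The kept columns of eigvecs are mutually orthogonal, matching the bullets above: component #1 carries the largest variance, and each later component maximizes variance in the residual subspace.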

SLIDE 4

Last time… PCA Applications

  • Face recognition
  • Image compression
  • Noise filtering

[Figure: eigenface montage, compressed image patches, and a noise-filtering pipeline x → U → x′]

SLIDE 5

Today

  • PCA shortcomings
  • Autoencoders
  • ICA


SLIDE 6

PCA Shortcomings


SLIDE 7

Problematic Data Set for PCA

  • PCA doesn’t know labels!


slide by Barnabás Póczos and Aarti Singh

SLIDE 8

PCA vs. Fisher Linear Discriminant

  • Principal Component Analysis
  • higher variance
  • bad for discriminability

  • Fisher Linear Discriminant
  • smaller variance
  • good discriminability

slide by Javier Hernandez Rivera

SLIDE 9

Problematic Data Set for PCA

  • PCA cannot capture NON-LINEAR structure!


slide by Barnabás Póczos and Aarti Singh

SLIDE 10

PCA Conclusions

  • PCA
  • Finds an orthonormal basis for the data
  • Sorts dimensions in order of “importance”
  • Discards low-significance dimensions

  • Uses:
  • Get a compact description
  • Ignore noise
  • Improve classification (hopefully)

  • Not magic:
  • Doesn’t know class labels
  • Can only capture linear variations

  • One of many tricks to reduce dimensionality!

slide by Barnabás Póczos and Aarti Singh

SLIDE 11

Autoencoders


SLIDE 12

Relation to Neural Networks

  • PCA is closely related to a particular form of neural network
  • An autoencoder is a neural network trained so that its outputs reproduce its own inputs
  • The goal is to minimize reconstruction error

slide by Sanja Fidler

SLIDE 13

Autoencoders

  • Define: $z = f(Wx)$, $\hat{x} = g(Vz)$

slide by Sanja Fidler

SLIDE 14

Autoencoders

  • Define: $z = f(Wx)$, $\hat{x} = g(Vz)$

  • Goal:
    $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \left\| x^{(n)} - \hat{x}^{(n)} \right\|^2$

slide by Sanja Fidler

SLIDE 15

Autoencoders

  • Define: $z = f(Wx)$, $\hat{x} = g(Vz)$

  • Goal:
    $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \left\| x^{(n)} - \hat{x}^{(n)} \right\|^2$

  • If g and f are linear:
    $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \left\| x^{(n)} - VW x^{(n)} \right\|^2$

slide by Sanja Fidler

SLIDE 16

Autoencoders

  • Define: $z = f(Wx)$, $\hat{x} = g(Vz)$

  • Goal:
    $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \left\| x^{(n)} - \hat{x}^{(n)} \right\|^2$

  • If g and f are linear:
    $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \left\| x^{(n)} - VW x^{(n)} \right\|^2$

  • In other words, the optimal solution is PCA (see the sketch below)

slide by Sanja Fidler
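As a sanity check on this equivalence, here is a small NumPy sketch (my own illustration on made-up synthetic data, not from the slides) that minimizes the linear objective by plain gradient descent. At the optimum, the reconstruction error should approach the PCA value: half the variance in the d − r discarded directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, N = 10, 3, 500
# centered synthetic data with decaying variances (a stand-in for real inputs)
X = rng.standard_normal((d, N)) * np.linspace(3.0, 0.5, d)[:, None]
X -= X.mean(axis=1, keepdims=True)

W = 0.1 * rng.standard_normal((r, d))    # encoder:  z = W x      (f = identity)
V = 0.1 * rng.standard_normal((d, r))    # decoder:  x_hat = V z  (g = identity)

lr = 0.02
for _ in range(5000):
    R = X - V @ (W @ X)                  # residuals x(n) - VW x(n), one per column
    V += lr / N * R @ (W @ X).T          # descent on (1/2N) sum_n ||x(n) - VW x(n)||^2
    W += lr / N * V.T @ R @ X.T

loss = 0.5 / N * np.sum(R**2)
lam = np.linalg.eigvalsh(X @ X.T / N)    # sample-covariance eigenvalues, ascending
print(loss, 0.5 * lam[:-r].sum())        # loss approaches half the discarded variance
```

The learned rows of W need not equal the principal components themselves; only the subspace spanned by VW is pinned down, which is why the loss value, rather than the individual vectors, is compared.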

SLIDE 17

Autoencoders: Nonlinear PCA

  • What if g(·) is not linear?
  • Then we are basically doing nonlinear PCA
  • There are some subtleties, but in general this is an accurate description (see the sketch below)

slide by Sanja Fidler
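For concreteness, a short PyTorch sketch of a nonlinear (deep) autoencoder; all sizes here are illustrative assumptions of mine. The 784-d input and 30-d code loosely mirror the "30-d deep autoencoder" compared on the next slide, but this is not a reproduction of that model.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 784-d inputs (e.g., flattened 28x28 images), 30-d code.
encoder = nn.Sequential(
    nn.Linear(784, 256), nn.Sigmoid(),   # nonlinear f
    nn.Linear(256, 30),
)
decoder = nn.Sequential(
    nn.Linear(30, 256), nn.Sigmoid(),    # nonlinear g
    nn.Linear(256, 784),
)

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()                   # mean-square reconstruction error

def train_step(x):                       # x: a batch of inputs, shape (B, 784)
    opt.zero_grad()
    x_hat = decoder(encoder(x))          # reconstruct the input from the 30-d code
    loss = loss_fn(x_hat, x)
    loss.backward()
    opt.step()
    return loss.item()

print(train_step(torch.rand(64, 784)))   # one step on random stand-in data
```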

SLIDE 18

Comparing Reconstructions

[Figure: sample reconstructions, one row each for real data, a 30-d deep autoencoder, 30-d logistic PCA, and 30-d PCA]

slide by Sanja Fidler

SLIDE 19

Independent Component Analysis (ICA)

SLIDE 20

A Serious Limitation of PCA

  • Recall that PCA looks at the covariance matrix only. What if the data is not well described by the covariance matrix?

  • The only distribution uniquely specified by its covariance (once the mean has been subtracted) is the Gaussian distribution. Distributions that deviate from the Gaussian are poorly described by their covariances.

slide by Kornel Laskowski and Dave Touretzky

SLIDE 21

Faithful vs. Meaningful Representations

  • Even with non-Gaussian data, variance maximization leads to the most faithful representation in a reconstruction-error sense (recall that we trained our autoencoder network using a mean-square error on the input reconstruction).

  • The mean-square error measure implicitly assumes Gaussianity, since it penalizes datapoints close to the mean less than those that are far away.

  • But it does not in general lead to the most meaningful representation.

  • We need to perform gradient descent in some function other than the reconstruction error.

slide by Kornel Laskowski and Dave Touretzky

SLIDE 22

A Criterion Stronger than Decorrelation

  • The way to circumvent these problems is to look for components which are statistically independent, rather than just uncorrelated.

  • For statistical independence, we require that
    $p(\xi_1, \xi_2, \ldots, \xi_N) = \prod_{i=1}^{N} p(\xi_i)$

  • For uncorrelatedness, all we required was that
    $\langle \xi_i \xi_j \rangle - \langle \xi_i \rangle \langle \xi_j \rangle = 0, \quad i \neq j$

  • Independence is a stronger requirement; under independence,
    $\langle g_1(\xi_i)\, g_2(\xi_j) \rangle - \langle g_1(\xi_i) \rangle \langle g_2(\xi_j) \rangle = 0, \quad i \neq j$
    for any functions $g_1$ and $g_2$ (a numerical check follows below).

slide by Kornel Laskowski and Dave Touretzky
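A quick numerical check of this distinction (my own example, not from the slides): take ξ1 uniform on [−1, 1] and ξ2 = ξ1². The pair is uncorrelated because E[ξ1³] = 0, yet choosing g1(u) = g2(u) = u² exposes the dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
xi1 = rng.uniform(-1, 1, 100_000)
xi2 = xi1**2                     # a deterministic function of xi1: clearly dependent

# Uncorrelatedness holds: <xi1 xi2> - <xi1><xi2> = E[xi1^3] = 0 by symmetry
print(np.mean(xi1 * xi2) - np.mean(xi1) * np.mean(xi2))    # ~ 0

# Independence fails: with g1(u) = g2(u) = u^2 the condition is violated
g1, g2 = xi1**2, xi2**2
print(np.mean(g1 * g2) - np.mean(g1) * np.mean(g2))        # ~ 0.076, far from 0
```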

SLIDE 23

Independent Component Analysis (ICA)

  • Like PCA, except that we’re looking for a transformation subject to the stronger requirement of independence, rather than uncorrelatedness.

  • In general, no analytic solution (like the eigenvalue decomposition for PCA) exists, so ICA is implemented using neural network models.

  • To do this, we need an architecture and an objective function to descend/climb in.

  • Leads to N independent (or as independent as possible) components in N-dimensional space; they need not be orthogonal.

  • When are independent components identical to uncorrelated (principal) components? When the generative distribution is uniquely determined by its first and second moments. This is true of only the Gaussian distribution.

slide by Kornel Laskowski and Dave Touretzky

SLIDE 24

Neural Network for ICA

  • Single-layer network:

[Figure: single-layer network mapping input patterns ξ through weight matrix W to outputs y]

  • Patterns {ξ} are fed into the input layer.
  • Inputs are multiplied by the weights in matrix W.
  • Outputs are logistic (in vector notation):
    $\bar{y} = \dfrac{1}{1 + e^{-W^T \bar{\xi}}}$

slide by Kornel Laskowski and Dave Touretzky

SLIDE 25

Objective Function for ICA

  • Want to ensure that the outputs yi are maximally independent.
  • This is identical to requiring that the mutual information be small, or alternately that the joint entropy be large.

    $H(p)$: entropy of the distribution p of the first neuron’s output
    $H(p \mid q)$: conditional entropy
    $I(p; q) = H(p) - H(p \mid q) = H(q) - H(q \mid p)$: mutual information

  • Gradient ascent in this objective function is called infomax (we’re trying to maximize the enclosed area representing information quantities); a sketch of one such update follows below.

slide by Kornel Laskowski and Dave Touretzky
SLIDE 26

Blind Source Separation (BSS)

  • The most famous application of ICA.

  • Have K sources {sk[t]} and K signals {xk[t]}. Both {sk[t]} and {xk[t]} are time series (t is a discrete time index).

  • Each signal is a linear mixture of the sources,
    $x[t] = A\, s[t] + n[t]$
    where nk[t] is the noise contribution to the kth signal xk[t], and A is a mixing matrix.

  • The problem: given xk[t], determine A and sk[t] (a sketch with an off-the-shelf solver follows below).

slide by Kornel Laskowski and Dave Touretzky
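As one concrete way to attack this problem, here is a sketch using scikit-learn's FastICA (a fixed-point ICA algorithm, not the infomax network above); the two sources and the mixing matrix are made-up toy choices.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]     # K = 2 sources s_k[t]
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                            # mixing matrix
x = s @ A.T + 0.02 * rng.standard_normal(s.shape)     # signals x[t] = A s[t] + n[t]

ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)   # recovered sources, rows aligned with time index t
A_hat = ica.mixing_            # estimated mixing matrix
```

ICA can recover the sources only up to permutation, sign, and scale, since any such change can be absorbed into A; and, per the earlier slides, it needs the sources to be non-Gaussian.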

SLIDE 27

The Cocktail Party

[Figure: ICA estimation pipeline: sources s(t), mixing x(t) = As(t), observation, and unmixing y(t) = Wx(t)]

slide by Barnabás Póczos and Aarti Singh

SLIDE 28

Demo: The Cocktail Party

  • Frequency-domain ICA (1995), by Paris Smaragdis

[Audio demo: input mix and extracted speech]

http://paris.cs.illinois.edu/demos/index.html