Introduction to Machine Learning CMU-10701, Lecture 20: Independent Component Analysis (Barnabás Póczos)


SLIDE 1

Introduction to Machine Learning CMU-10701

  • 20. Independent Component Analysis

Barnabás Póczos

SLIDE 2

Contents

  • ICA model
  • ICA applications
  • ICA generalizations
  • ICA theory

SLIDE 3

Independent Component Analysis

SLIDE 4

Goal: recover statistically independent source signals from their observed linear mixtures.

Independent Component Analysis

SLIDE 5

[Figure: original source signals, their observed mixtures under the model, and the ICA-estimated signals]

Independent Component Analysis

SLIDE 6

We observe only the mixtures x. Model: x = As, where A is an unknown mixing matrix and the sources s are independent. We want the demixing matrix W. Goal: y = Wx should recover s.

Independent Component Analysis

SLIDE 7

  • Both perform linear transformations
  • Both are matrix factorizations

PCA: X = US, a low-rank matrix factorization for compression (rank M < N).
ICA: X = AS, a full-rank (N×N) matrix factorization that removes the dependency among the rows of S.

ICA vs PCA, Similarities

SLIDE 8

  • PCA: X = US, with U^T U = I
  • ICA: X = AS
  • PCA does compression (M < N)
  • ICA does not do compression (same number of features, M = N)
  • PCA only removes correlations, not higher-order dependence
  • ICA removes correlations and higher-order dependence
  • PCA: some components are more important than others (based on eigenvalues)
  • ICA: components are equally important

ICA vs PCA, Differences
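To make the contrast concrete, here is a small sketch (my illustration, using scikit-learn's PCA and FastICA as stand-ins for the factorizations X = US and X = AS): PCA components come out orthogonal, while ICA components in general do not.

```python
# A PCA-vs-ICA comparison sketch (assumption: scikit-learn's PCA and FastICA
# as stand-ins for the two factorizations; not code from the lecture).
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(5000, 2))          # non-Gaussian independent sources
X = S @ rng.normal(size=(2, 2)).T        # mixed observations

pca = PCA(n_components=2).fit(X)
ica = FastICA(n_components=2, random_state=0).fit(X)

# PCA components are orthonormal; ICA components in general are not.
print(np.round(pca.components_ @ pca.components_.T, 3))   # identity matrix
print(np.round(ica.components_ @ ica.components_.T, 3))   # generally not identity
```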

SLIDE 9

Note

  • PCA vectors are orthogonal
  • ICA vectors are not orthogonal

ICA vs PCA

SLIDE 10

ICA vs PCA

SLIDE 11

Sources: s(t)
Mixing (observation): x(t) = As(t)
PCA estimation: y(t) = Wx(t)

The Cocktail Party Problem: solving with PCA

SLIDE 12

Sources: s(t)
Mixing (observation): x(t) = As(t)
ICA estimation: y(t) = Wx(t)

The Cocktail Party Problem: solving with ICA
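As a quick illustration of this cocktail-party setup, a toy sketch (my code, using scikit-learn's FastICA, not the lecture's implementation):

```python
# Mix two synthetic source signals with a random matrix A, then demix
# with FastICA. A toy sketch, not code from the lecture.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # sinusoidal source
s2 = np.sign(np.sin(3 * t))              # square-wave source
S = np.c_[s1, s2]                        # sources, shape (n_samples, n_sources)

A = rng.normal(size=(2, 2))              # "unknown" mixing matrix
X = S @ A.T                              # observed mixtures, x = As

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)                 # estimated sources, y = Wx
# Y matches S only up to permutation and scaling, the inherent ICA ambiguities.
```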

SLIDE 13

STATIC

  • Image denoising
  • Microarray data processing
  • Decomposing the spectra of galaxies
  • Face recognition
  • Facial expression recognition
  • Feature extraction
  • Clustering
  • Classification
  • Deep Neural Networks

TEMPORAL

  • Medical signal processing: fMRI, ECG, EEG
  • Brain-computer interfaces
  • Modeling of the hippocampus, place cells
  • Modeling of the visual cortex
  • Time series analysis
  • Financial applications
  • Blind deconvolution

Some ICA Applications

SLIDE 14

  • EEG ~ neural cocktail party
  • Severe contamination of EEG activity by:
      • eye movements
      • blinks
      • muscle activity
      • heart (ECG) artifacts
      • vessel pulse
      • electrode noise
      • line noise, alternating current (60 Hz)
  • ICA can improve the signal: it can effectively detect, separate, and remove activity in EEG records from a wide variety of artifactual sources (Jung, Makeig, Bell, and Sejnowski).
  • ICA weights help find the locations of sources.

ICA Application, Removing Artifacts from EEG

SLIDE 15

Fig from Jung

ICA Application, Removing Artifacts from EEG

SLIDE 16

Fig from Jung

Removing Artifacts from EEG

SLIDE 17

[Figure: original, noisy, Wiener-filtered, median-filtered, and ICA-denoised images]

ICA for Image Denoising

(Hoyer, Hyvarinen)

SLIDE 18

  • Method for analysis and synthesis of human motion from motion-captured data
  • Provides perceptually meaningful components
  • 109 markers, 327 parameters ⇒ 6 independent components (emotion, content, …)

ICA for Motion Style Components

(Mori & Hoshino 2002, Shapiro et al 2006, Cao et al 2003)

SLIDE 19

[Videos: walk, sneaky, walk with sneaky, sneaky with walk]

SLIDE 20

Gabor wavelets, edge detection, receptive fields of V1 cells...

ICA basis vectors extracted from natural images

SLIDE 21

PCA basis vectors extracted from natural images

SLIDE 22

ICA Theory

SLIDE 23

  • uncorrelated and independent variables
  • entropy, joint entropy, negentropy
  • mutual information
  • Kullback-Leibler divergence

Basic terms, definitions

SLIDE 24

Definition: X and Y are independent if p(x, y) = p(x) p(y).

Lemma: if X and Y are independent, then E[f(X) g(Y)] = E[f(X)] E[g(Y)] for all functions f, g for which these expectations exist. Proof: homework.

Definition: X and Y are uncorrelated if E[XY] = E[X] E[Y].

Statistical (in)dependence

SLIDE 25

Definition: [formula not recovered]
Lemma: [formula not recovered] Proof: homework.
Lemma: [formula not recovered] Proof: homework.
Lemma: [formula not recovered] Proof: homework.

Correlation
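The formulas on this slide were images and did not survive extraction; for reference, a standard set of statements matching this outline (my reconstruction, not verbatim from the lecture):

```latex
\operatorname{cov}(X, Y) := E[XY] - E[X]\,E[Y]
```

Independence implies cov(X, Y) = 0 (take f(x) = x and g(y) = y in the lemma of the previous slide). The converse fails: for X ~ N(0, 1) and Y = X², cov(X, Y) = E[X³] = 0, yet X and Y are clearly dependent.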

SLIDE 26

Definition (Shannon entropy): H(X) = −Σ_x p(x) log p(x)

Definition (KL divergence): D(p‖q) = Σ_x p(x) log(p(x) / q(x))

Definition (Mutual Information): I(X; Y) = D(p(x, y) ‖ p(x) p(y)) = H(X) + H(Y) − H(X, Y)

Mutual Information, Entropy

SLIDE 27

Solving the ICA problem with i.i.d. sources

SLIDE 28

Solving the ICA problem with i.i.d. sources
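The slide's formulas were images; one standard observation that belongs to this discussion (my reconstruction) is that the sources can only be identified up to permutation and scaling, since

```latex
x = As = \underbrace{(A \Lambda^{-1} P^{-1})}_{\tilde{A}}\;\underbrace{(P \Lambda s)}_{\tilde{s}}
```

for any permutation matrix P and invertible diagonal matrix Λ, and the permuted, rescaled sources remain independent.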

SLIDE 29

Definition: x is white if E[x] = 0 and E[xx^T] = I.

Theorem (Whitening): let cov(x) = E D E^T be the eigendecomposition of the covariance matrix (E orthogonal, D diagonal with positive entries). Then z = D^(-1/2) E^T x is white.

Note: whitening is only unique up to rotation; if z is white, then Qz is white for any orthogonal Q.

Whitening

SLIDE 30

Proof of the whitening theorem
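The proof itself was an image on the slide; the standard one-line computation, assuming E[x] = 0, cov(x) = E D E^T, and z = D^(-1/2) E^T x, is:

```latex
E[zz^{\top}]
  = D^{-1/2} E^{\top}\, E[xx^{\top}]\, E\, D^{-1/2}
  = D^{-1/2} E^{\top} (E D E^{\top}) E\, D^{-1/2}
  = D^{-1/2} D\, D^{-1/2} = I.
```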

SLIDE 31

We can use PCA for whitening!

Proof of the whitening theorem
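A minimal numerical sketch of PCA whitening (my toy code, not the lecture's):

```python
# PCA whitening: center the data, eigendecompose the sample covariance,
# and rescale so that the whitened data satisfies E[z z^T] = I.
import numpy as np

def whiten(X):
    """Whiten data X of shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)                  # remove the mean, E[x] = 0
    cov = Xc.T @ Xc / len(Xc)                # sample covariance
    d, E = np.linalg.eigh(cov)               # cov = E diag(d) E^T
    return Xc @ E / np.sqrt(d)               # z = D^{-1/2} E^T x per sample

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # correlated data
Z = whiten(X)
print(np.round(Z.T @ Z / len(Z), 2))         # approximately the identity
```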

SLIDE 32

[Figure: original, mixed, and whitened data]

Note: a general N×N mixing matrix has N² free parameters; after whitening, only an orthogonal matrix with N(N−1)/2 free parameters (about half as many) remains to be found ⇒ whitening solves half of the ICA problem.

Whitening solves half of the ICA problem

SLIDE 33

ICA task: given x, find y (the estimate of s) and W (the estimate of A^-1). ICA solution: y = Wx.

  • Remove the mean, E[x] = 0
  • Whitening, E[xx^T] = I
  • Find an orthogonal W optimizing an objective function, via a sequence of 2-d Jacobi (Givens) rotations (a toy sketch follows below)

[Figure: original, mixed, whitened, and rotated (demixed) data]

Solving ICA
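Putting the three steps together on a 2-d toy problem (my illustration, not the lecture's code; the objective here is the sum of |kurtosis|, one of the contrast functions discussed later):

```python
# Center, whiten with PCA, then grid-search a single 2-d Givens rotation
# angle that maximizes the sum of |kurtosis| of the rotated coordinates.
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2, 5000))           # independent uniform sources
X = rng.normal(size=(2, 2)) @ S                  # mixed observations

Xc = X - X.mean(axis=1, keepdims=True)           # step 1: remove the mean
d, E = np.linalg.eigh(np.cov(Xc))                # step 2: whiten via PCA
Z = np.diag(d ** -0.5) @ E.T @ Xc

def abs_kurtosis_sum(Y):
    # kurtosis of unit-variance data: E[y^4] - 3
    return np.abs((Y ** 4).mean(axis=1) - 3).sum()

def givens(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Step 3: search for the remaining orthogonal (rotation) part.
thetas = np.linspace(0, np.pi / 2, 500)
best = max(thetas, key=lambda t: abs_kurtosis_sum(givens(t) @ Z))
Y = givens(best) @ Z                             # demixed source estimates
```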

SLIDE 34

A Jacobi (Givens) rotation G(p, q, θ) equals the identity matrix except in rows and columns p and q, where it applies a 2-d rotation by an angle θ.

Optimization Using Jacobi Rotation Matrices
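The rotation matrix on the slide did not survive extraction; the standard form of the Jacobi (Givens) rotation acting on the coordinate pair (p, q) is:

```latex
G(p, q, \theta)_{ii} = 1 \ \ (i \neq p, q), \qquad
\begin{pmatrix} G_{pp} & G_{pq} \\ G_{qp} & G_{qq} \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
```

with all other off-diagonal entries zero. Composing such rotations over all pairs (p, q) parameterizes any N×N rotation, which is exactly the freedom left after whitening.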

SLIDE 35

The Gaussian distribution is spherically symmetric (after whitening): mixing it with an orthogonal matrix produces the same distribution, so there is no hope of recovering the original Gaussian sources.

However, the Gaussian is the only 'nice' distribution we cannot recover: separation is possible as long as at most one source is Gaussian.

Gaussian sources are problematic

SLIDE 36

⇒ Move away from the normal distribution.

ICA Cost Functions

SLIDE 37

The sum of independent random variables converges (suitably normalized) to the normal distribution.

⇒ For separation, move as far from the normal distribution as possible ⇒ maximize negentropy or |kurtosis|.

Figs borrowed from Ata Kaban

Central Limit Theorem
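The first of these contrast functions can be made precise; a standard definition of negentropy (added here for reference) is:

```latex
J(y) = H(y_{\mathrm{Gauss}}) - H(y) \;\ge\; 0,
```

where y_Gauss is a Gaussian variable with the same covariance as y. Since the Gaussian maximizes entropy for a fixed covariance, J(y) = 0 if and only if y is Gaussian, so maximizing negentropy drives the estimate away from normality.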

SLIDE 38

ICA Algorithms

SLIDE 39

There are more than 100 different ICA algorithms…

  • Mutual information (MI) estimation
      • Kernel-ICA [Bach & Jordan, 2002]
  • Entropy, negentropy estimation
      • Infomax ICA [Bell & Sejnowski, 1995]
      • RADICAL [Learned-Miller & Fisher, 2003]
      • FastICA [Hyvarinen, 1999]
      • [Girolami & Fyfe, 1997]
  • ML estimation
      • KDICA [Chen, 2006]
      • EM-ICA [Welling]
      • [MacKay 1996; Pearlmutter & Parra 1996; Cardoso 1997]
  • Higher-order moments, cumulant-based methods
      • JADE [Cardoso, 1993]
  • Nonlinear correlation based methods
      • [Jutten and Herault, 1991]

Algorithms

SLIDE 40

David J.C. MacKay (1997): maximum likelihood ICA, optimizing over the rows of W.

Maximum Likelihood ICA Algorithm
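The derivation leads to a simple gradient rule; below is a hedged sketch of the standard natural-gradient ML/Infomax update with φ(y) = tanh(y), which may differ in detail from the slide's exact algorithm:

```python
# Maximum-likelihood / Infomax-style ICA via the natural-gradient rule
# dW ∝ (I - phi(y) y^T) W, assuming super-Gaussian sources (phi = tanh).
# A standard formulation, not necessarily the slide's exact derivation.
import numpy as np

def ml_ica(X, n_iter=2000, lr=0.01, seed=0):
    """X: whitened data, shape (n_features, n_samples). Returns demixing W."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.eye(n) + 0.1 * rng.normal(size=(n, n))
    for _ in range(n_iter):
        Y = W @ X                                    # current source estimates
        grad = (np.eye(n) - np.tanh(Y) @ Y.T / X.shape[1]) @ W
        W += lr * grad                               # natural-gradient ascent
    return W
```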

SLIDE 41

Kurtosis = 4th-order cumulant. It measures:

  • the distance from normality
  • the degree of peakedness

ICA algorithm based on Kurtosis maximization
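For reference, the standard definition (the slide's formula was an image):

```latex
\operatorname{kurt}(y) = E[y^4] - 3\,\bigl(E[y^2]\bigr)^2,
```

which is zero for a Gaussian; for whitened data (E[y²] = 1) it reduces to E[y⁴] − 3. Sub-Gaussian sources (e.g., uniform) have negative kurtosis and super-Gaussian ones (e.g., Laplace) positive kurtosis, which is why the algorithm maximizes |kurt(y)|.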

SLIDE 42

Probably the most famous ICA algorithm

The Fast ICA algorithm (Hyvarinen)
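A hedged sketch of the one-unit FastICA fixed-point iteration with g = tanh (the standard form; details may differ from the lecture's slides):

```python
# One-unit FastICA: fixed-point iteration w <- E[z g(w^T z)] - E[g'(w^T z)] w,
# followed by renormalization, with g = tanh. A standard sketch.
import numpy as np

def fastica_unit(Z, n_iter=200, seed=0):
    """Z: whitened data, shape (n_features, n_samples). Returns one row of W."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wz = w @ Z                                   # projections w^T z
        w_new = Z @ np.tanh(wz) / Z.shape[1] - (1 - np.tanh(wz) ** 2).mean() * w
        w = w_new / np.linalg.norm(w_new)            # keep w on the unit sphere
    return w
```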

SLIDE 43

Independent Subspace Analysis

Independent Subspace Analysis (ISA) generalizes ICA: the sources are grouped into multidimensional components (subspaces) that are independent of each other, while dependence is allowed within each group.

SLIDE 44

[Figure: original, mixed, and separated signals, with a Hinton diagram of the performance matrix]

Independent Subspace Analysis

SLIDE 45

Numerical simulations: 2D letters (i.i.d. sources)

[Figure: sources, observation, estimated sources, and the performance matrix]

Independent Subspace Analysis

SLIDE 46

Independent Subspace Analysis

SLIDE 47

Thanks for your attention!