Introduction to Machine Learning CMU-10701, Lecture 20: Independent Component Analysis (Barnabás Póczos)


SLIDE 1

Introduction to Machine Learning CMU-10701

  • 20. Independent Component Analysis

Barnabás Póczos

SLIDE 2

Contents

  • ICA model
  • ICA applications
  • ICA generalizations
  • ICA theory

SLIDE 3

Independent Component Analysis

SLIDE 4

Goal: recover statistically independent source signals from their observed linear mixtures.

Independent Component Analysis

SLIDE 5

[Figure: original source signals, their observed mixtures under the model, and the ICA-estimated signals]

Independent Component Analysis

SLIDE 6

We observe only the mixtures x. Model: x = As, where A is an unknown mixing matrix and the sources s are independent. We want the demixing matrix W. Goal: y = Wx should recover s.

Independent Component Analysis

SLIDE 7

  • Both perform linear transformations
  • Both are matrix factorizations

PCA: X = US, a low-rank matrix factorization for compression (rank M < N).
ICA: X = AS, a full-rank (N×N) matrix factorization that removes the dependency among the rows of S.

ICA vs PCA, Similarities

SLIDE 8

  • PCA: X = US, with U^T U = I
  • ICA: X = AS
  • PCA does compression (M < N)
  • ICA does not do compression (same number of features, M = N)
  • PCA only removes correlations, not higher-order dependence
  • ICA removes correlations and higher-order dependence
  • PCA: some components are more important than others (based on eigenvalues)
  • ICA: components are equally important

ICA vs PCA, Differences
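To make the contrast concrete, here is a small sketch (my illustration, using scikit-learn's PCA and FastICA as stand-ins for the factorizations X = US and X = AS): PCA components come out orthogonal, while ICA components in general do not.

```python
# A PCA-vs-ICA comparison sketch (assumption: scikit-learn's PCA and FastICA
# as stand-ins for the two factorizations; not code from the lecture).
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(5000, 2))          # non-Gaussian independent sources
X = S @ rng.normal(size=(2, 2)).T        # mixed observations

pca = PCA(n_components=2).fit(X)
ica = FastICA(n_components=2, random_state=0).fit(X)

# PCA components are orthonormal; ICA components in general are not.
print(np.round(pca.components_ @ pca.components_.T, 3))   # identity matrix
print(np.round(ica.components_ @ ica.components_.T, 3))   # generally not identity
```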

SLIDE 9

Note

  • PCA vectors are orthogonal
  • ICA vectors are not orthogonal

ICA vs PCA

SLIDE 10

ICA vs PCA

SLIDE 11

Sources: s(t)
Mixing (observation): x(t) = As(t)
PCA estimation: y(t) = Wx(t)

The Cocktail Party Problem: solving with PCA

SLIDE 12

Sources: s(t)
Mixing (observation): x(t) = As(t)
ICA estimation: y(t) = Wx(t)

The Cocktail Party Problem: solving with ICA
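As a quick illustration of this cocktail-party setup, a toy sketch (my code, using scikit-learn's FastICA, not the lecture's implementation):

```python
# Mix two synthetic source signals with a random matrix A, then demix
# with FastICA. A toy sketch, not code from the lecture.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # sinusoidal source
s2 = np.sign(np.sin(3 * t))              # square-wave source
S = np.c_[s1, s2]                        # sources, shape (n_samples, n_sources)

A = rng.normal(size=(2, 2))              # "unknown" mixing matrix
X = S @ A.T                              # observed mixtures, x = As

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)                 # estimated sources, y = Wx
# Y matches S only up to permutation and scaling, the inherent ICA ambiguities.
```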

SLIDE 13

STATIC

  • Image denoising
  • Microarray data processing
  • Decomposing the spectra of galaxies
  • Face recognition
  • Facial expression recognition
  • Feature extraction
  • Clustering
  • Classification
  • Deep Neural Networks

TEMPORAL

  • Medical signal processing: fMRI, ECG, EEG
  • Brain-computer interfaces
  • Modeling of the hippocampus, place cells
  • Modeling of the visual cortex
  • Time series analysis
  • Financial applications
  • Blind deconvolution

Some ICA Applications

SLIDE 14

  • EEG ~ neural cocktail party
  • Severe contamination of EEG activity by:
      • eye movements
      • blinks
      • muscle activity
      • heart (ECG) artifacts
      • vessel pulse
      • electrode noise
      • line noise, alternating current (60 Hz)
  • ICA can improve the signal: it can effectively detect, separate, and remove activity in EEG records from a wide variety of artifactual sources (Jung, Makeig, Bell, and Sejnowski).
  • ICA weights help find the locations of sources.

ICA Application, Removing Artifacts from EEG

SLIDE 15

Fig from Jung

ICA Application, Removing Artifacts from EEG

SLIDE 16

Fig from Jung

Removing Artifacts from EEG

SLIDE 17

[Figure: original, noisy, Wiener-filtered, median-filtered, and ICA-denoised images]

ICA for Image Denoising

(Hoyer, Hyvarinen)

SLIDE 18

  • Method for analysis and synthesis of human motion from motion-captured data
  • Provides perceptually meaningful components
  • 109 markers, 327 parameters ⇒ 6 independent components (emotion, content, …)

ICA for Motion Style Components

(Mori & Hoshino 2002, Shapiro et al 2006, Cao et al 2003)

SLIDE 19

[Videos: walk, sneaky, walk with sneaky, sneaky with walk]

SLIDE 20

Gabor wavelets, edge detection, receptive fields of V1 cells...

ICA basis vectors extracted from natural images

SLIDE 21

PCA basis vectors extracted from natural images

SLIDE 22

ICA Theory

SLIDE 23

  • uncorrelated and independent variables
  • entropy, joint entropy, negentropy
  • mutual information
  • Kullback-Leibler divergence

Basic terms, definitions

SLIDE 24

Definition: X and Y are independent if p(x, y) = p(x) p(y).

Lemma: if X and Y are independent, then E[f(X) g(Y)] = E[f(X)] E[g(Y)] for all functions f, g for which these expectations exist. Proof: homework.

Definition: X and Y are uncorrelated if E[XY] = E[X] E[Y].

Statistical (in)dependence

SLIDE 25

Definition: [formula not recovered]
Lemma: [formula not recovered] Proof: homework.
Lemma: [formula not recovered] Proof: homework.
Lemma: [formula not recovered] Proof: homework.

Correlation
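The formulas on this slide were images and did not survive extraction; for reference, a standard set of statements matching this outline (my reconstruction, not verbatim from the lecture):

```latex
\operatorname{cov}(X, Y) := E[XY] - E[X]\,E[Y]
```

Independence implies cov(X, Y) = 0 (take f(x) = x and g(y) = y in the lemma of the previous slide). The converse fails: for X ~ N(0, 1) and Y = X², cov(X, Y) = E[X³] = 0, yet X and Y are clearly dependent.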

SLIDE 26

Definition (Shannon entropy): H(X) = −Σ_x p(x) log p(x)

Definition (KL divergence): D(p‖q) = Σ_x p(x) log(p(x) / q(x))

Definition (Mutual Information): I(X; Y) = D(p(x, y) ‖ p(x) p(y)) = H(X) + H(Y) − H(X, Y)

Mutual Information, Entropy

SLIDE 27

Solving the ICA problem with i.i.d. sources

SLIDE 28

Solving the ICA problem with i.i.d. sources
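The slide's formulas were images; one standard observation that belongs to this discussion (my reconstruction) is that the sources can only be identified up to permutation and scaling, since

```latex
x = As = \underbrace{(A \Lambda^{-1} P^{-1})}_{\tilde{A}}\;\underbrace{(P \Lambda s)}_{\tilde{s}}
```

for any permutation matrix P and invertible diagonal matrix Λ, and the permuted, rescaled sources remain independent.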

SLIDE 29

Definition: x is white if E[x] = 0 and E[xx^T] = I.

Theorem (Whitening): let cov(x) = E D E^T be the eigendecomposition of the covariance matrix (E orthogonal, D diagonal with positive entries). Then z = D^(-1/2) E^T x is white.

Note: whitening is only unique up to rotation; if z is white, then Qz is white for any orthogonal Q.

Whitening

SLIDE 30

Proof of the whitening theorem
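The proof itself was an image on the slide; the standard one-line computation, assuming E[x] = 0, cov(x) = E D E^T, and z = D^(-1/2) E^T x, is:

```latex
E[zz^{\top}]
  = D^{-1/2} E^{\top}\, E[xx^{\top}]\, E\, D^{-1/2}
  = D^{-1/2} E^{\top} (E D E^{\top}) E\, D^{-1/2}
  = D^{-1/2} D\, D^{-1/2} = I.
```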

SLIDE 31

We can use PCA for whitening!

Proof of the whitening theorem
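A minimal numerical sketch of PCA whitening (my toy code, not the lecture's):

```python
# PCA whitening: center the data, eigendecompose the sample covariance,
# and rescale so that the whitened data satisfies E[z z^T] = I.
import numpy as np

def whiten(X):
    """Whiten data X of shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)                  # remove the mean, E[x] = 0
    cov = Xc.T @ Xc / len(Xc)                # sample covariance
    d, E = np.linalg.eigh(cov)               # cov = E diag(d) E^T
    return Xc @ E / np.sqrt(d)               # z = D^{-1/2} E^T x per sample

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # correlated data
Z = whiten(X)
print(np.round(Z.T @ Z / len(Z), 2))         # approximately the identity
```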

SLIDE 32

[Figure: original, mixed, and whitened data]

Note: a general N×N mixing matrix has N² free parameters; after whitening, only an orthogonal matrix with N(N−1)/2 free parameters (about half as many) remains to be found ⇒ whitening solves half of the ICA problem.

Whitening solves half of the ICA problem

SLIDE 33

ICA task: given x, find y (the estimate of s) and W (the estimate of A^-1). ICA solution: y = Wx.

  • Remove the mean, E[x] = 0
  • Whitening, E[xx^T] = I
  • Find an orthogonal W optimizing an objective function, via a sequence of 2-d Jacobi (Givens) rotations (a toy sketch follows below)

[Figure: original, mixed, whitened, and rotated (demixed) data]

Solving ICA
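Putting the three steps together on a 2-d toy problem (my illustration, not the lecture's code; the objective here is the sum of |kurtosis|, one of the contrast functions discussed later):

```python
# Center, whiten with PCA, then grid-search a single 2-d Givens rotation
# angle that maximizes the sum of |kurtosis| of the rotated coordinates.
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2, 5000))           # independent uniform sources
X = rng.normal(size=(2, 2)) @ S                  # mixed observations

Xc = X - X.mean(axis=1, keepdims=True)           # step 1: remove the mean
d, E = np.linalg.eigh(np.cov(Xc))                # step 2: whiten via PCA
Z = np.diag(d ** -0.5) @ E.T @ Xc

def abs_kurtosis_sum(Y):
    # kurtosis of unit-variance data: E[y^4] - 3
    return np.abs((Y ** 4).mean(axis=1) - 3).sum()

def givens(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Step 3: search for the remaining orthogonal (rotation) part.
thetas = np.linspace(0, np.pi / 2, 500)
best = max(thetas, key=lambda t: abs_kurtosis_sum(givens(t) @ Z))
Y = givens(best) @ Z                             # demixed source estimates
```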

SLIDE 34

A Jacobi (Givens) rotation G(p, q, θ) equals the identity matrix except in rows and columns p and q, where it applies a 2-d rotation by an angle θ.

Optimization Using Jacobi Rotation Matrices
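The rotation matrix on the slide did not survive extraction; the standard form of the Jacobi (Givens) rotation acting on the coordinate pair (p, q) is:

```latex
G(p, q, \theta)_{ii} = 1 \ \ (i \neq p, q), \qquad
\begin{pmatrix} G_{pp} & G_{pq} \\ G_{qp} & G_{qq} \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
```

with all other off-diagonal entries zero. Composing such rotations over all pairs (p, q) parameterizes any N×N rotation, which is exactly the freedom left after whitening.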

SLIDE 35

The Gaussian distribution is spherically symmetric (after whitening): mixing it with an orthogonal matrix produces the same distribution, so there is no hope of recovering the original Gaussian sources.

However, the Gaussian is the only 'nice' distribution we cannot recover: separation is possible as long as at most one source is Gaussian.

Gaussian sources are problematic

SLIDE 36

⇒ Move away from the normal distribution.

ICA Cost Functions

SLIDE 37

The sum of independent random variables converges (suitably normalized) to the normal distribution.

⇒ For separation, move as far from the normal distribution as possible ⇒ maximize negentropy or |kurtosis|.

Figs borrowed from Ata Kaban

Central Limit Theorem
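The first of these contrast functions can be made precise; a standard definition of negentropy (added here for reference) is:

```latex
J(y) = H(y_{\mathrm{Gauss}}) - H(y) \;\ge\; 0,
```

where y_Gauss is a Gaussian variable with the same covariance as y. Since the Gaussian maximizes entropy for a fixed covariance, J(y) = 0 if and only if y is Gaussian, so maximizing negentropy drives the estimate away from normality.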

SLIDE 38

ICA Algorithms

SLIDE 39

There are more than 100 different ICA algorithms…

  • Mutual information (MI) estimation
      • Kernel-ICA [Bach & Jordan, 2002]
  • Entropy, negentropy estimation
      • Infomax ICA [Bell & Sejnowski, 1995]
      • RADICAL [Learned-Miller & Fisher, 2003]
      • FastICA [Hyvarinen, 1999]
      • [Girolami & Fyfe, 1997]
  • ML estimation
      • KDICA [Chen, 2006]
      • EM-ICA [Welling]
      • [MacKay 1996; Pearlmutter & Parra 1996; Cardoso 1997]
  • Higher-order moments, cumulant-based methods
      • JADE [Cardoso, 1993]
  • Nonlinear correlation based methods
      • [Jutten and Herault, 1991]

Algorithms

SLIDE 40

David J.C. MacKay (1997): maximum likelihood ICA, optimizing over the rows of W.

Maximum Likelihood ICA Algorithm
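The derivation leads to a simple gradient rule; below is a hedged sketch of the standard natural-gradient ML/Infomax update with φ(y) = tanh(y), which may differ in detail from the slide's exact algorithm:

```python
# Maximum-likelihood / Infomax-style ICA via the natural-gradient rule
# dW ∝ (I - phi(y) y^T) W, assuming super-Gaussian sources (phi = tanh).
# A standard formulation, not necessarily the slide's exact derivation.
import numpy as np

def ml_ica(X, n_iter=2000, lr=0.01, seed=0):
    """X: whitened data, shape (n_features, n_samples). Returns demixing W."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.eye(n) + 0.1 * rng.normal(size=(n, n))
    for _ in range(n_iter):
        Y = W @ X                                    # current source estimates
        grad = (np.eye(n) - np.tanh(Y) @ Y.T / X.shape[1]) @ W
        W += lr * grad                               # natural-gradient ascent
    return W
```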

SLIDE 41

Kurtosis = 4th-order cumulant. It measures:

  • the distance from normality
  • the degree of peakedness

ICA algorithm based on Kurtosis maximization
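For reference, the standard definition (the slide's formula was an image):

```latex
\operatorname{kurt}(y) = E[y^4] - 3\,\bigl(E[y^2]\bigr)^2,
```

which is zero for a Gaussian; for whitened data (E[y²] = 1) it reduces to E[y⁴] − 3. Sub-Gaussian sources (e.g., uniform) have negative kurtosis and super-Gaussian ones (e.g., Laplace) positive kurtosis, which is why the algorithm maximizes |kurt(y)|.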

SLIDE 42

Probably the most famous ICA algorithm

The Fast ICA algorithm (Hyvarinen)
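A hedged sketch of the one-unit FastICA fixed-point iteration with g = tanh (the standard form; details may differ from the lecture's slides):

```python
# One-unit FastICA: fixed-point iteration w <- E[z g(w^T z)] - E[g'(w^T z)] w,
# followed by renormalization, with g = tanh. A standard sketch.
import numpy as np

def fastica_unit(Z, n_iter=200, seed=0):
    """Z: whitened data, shape (n_features, n_samples). Returns one row of W."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wz = w @ Z                                   # projections w^T z
        w_new = Z @ np.tanh(wz) / Z.shape[1] - (1 - np.tanh(wz) ** 2).mean() * w
        w = w_new / np.linalg.norm(w_new)            # keep w on the unit sphere
    return w
```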

SLIDE 43

Independent Subspace Analysis

Independent Subspace Analysis (ISA) generalizes ICA: the sources are grouped into multidimensional components (subspaces) that are independent of each other, while dependence is allowed within each group.

SLIDE 44

[Figure: original, mixed, and separated signals, with a Hinton diagram of the performance matrix]

Independent Subspace Analysis

SLIDE 45

Numerical simulations: 2D letters (i.i.d. sources)

[Figure: sources, observation, estimated sources, and the performance matrix]

Independent Subspace Analysis

SLIDE 46

Independent Subspace Analysis

SLIDE 47

Thanks for your attention!