Introduction to Machine Learning 10701: Independent Component Analysis (PowerPoint PPT Presentation)


SLIDE 1

Introduction to Machine Learning 10701

Independent Component Analysis

Barnabás Póczos & Aarti Singh

SLIDE 2

Independent Component Analysis

SLIDE 3

[Figure] Model: original signals → mixing → observations (mixtures) → ICA estimated signals

Independent Component Analysis

SLIDE 4

We observe: x(t) = A s(t) (the model), where the sources s(t) have independent components and the mixing matrix A is unknown. We want: a demixing matrix W such that y(t) = W x(t) recovers s(t). Goal: estimate the independent sources from the observed mixtures alone.

Independent Component Analysis

SLIDE 5

[Figure] Sources s(t) → Mixing → Observation x(t) = A s(t) → PCA Estimation y(t) = W x(t)

The Cocktail Party Problem

SOLVING WITH PCA

SLIDE 6

[Figure] Sources s(t) → Mixing → Observation x(t) = A s(t) → ICA Estimation y(t) = W x(t)

The Cocktail Party Problem

SOLVING WITH ICA
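To make the mixing/demixing pipeline concrete, here is a minimal sketch (my own illustration, not the lecture's demo): two synthetic sources are mixed by an assumed matrix A, and PCA and FastICA recoveries are compared. The sine/square sources, the mixing coefficients, and the scikit-learn usage are all illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(3 * t),                     # source 1: sinusoid
          np.sign(np.sin(7 * t))]            # source 2: square wave
A = np.array([[1.0, 0.5],
              [0.7, 1.0]])                   # assumed (unknown) mixing matrix
X = S @ A.T                                  # observations x(t) = A s(t)

Y_pca = PCA(n_components=2).fit_transform(X)                      # decorrelates only
Y_ica = FastICA(n_components=2, random_state=0).fit_transform(X)  # separates
# Y_ica matches S up to permutation, sign, and scale; Y_pca generally does not.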

SLIDE 7

  • Both perform linear transformations
  • Both are matrix factorizations

PCA: X = U S, a low-rank matrix factorization for compression (U is N × M with M < N).
ICA: X = A S, a full-rank matrix factorization (M = N) to remove dependency among the rows of S.

Columns of U = PCA vectors; columns of A = ICA vectors.

ICA vs PCA, Similarities

SLIDE 8

  • PCA: X = U S, with UᵀU = I
  • ICA: X = A S, with A invertible
  • PCA does compression (M < N)
  • ICA does not do compression: same number of features (M = N)
  • PCA just removes correlations, not higher-order dependence (see the numeric sketch below)
  • ICA removes correlations and higher-order dependence
  • PCA: some components are more important than others (based on eigenvalues)
  • ICA: the components are equally important

ICA vs PCA, Differences
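As a quick numeric illustration of the correlation vs. higher-order dependence bullet (a toy construction of my own, not from the slides): two variables can have zero correlation, so decorrelation-based PCA sees nothing to undo, while remaining completely dependent.

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 100_000)
v = u**2 - np.mean(u**2)            # a deterministic function of u, centered

print(np.corrcoef(u, v)[0, 1])      # ~0: no linear correlation
print(np.corrcoef(u**2, v)[0, 1])   # ~1: higher-order dependence remains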

SLIDE 9

Note

  • PCA vectors are orthogonal
  • ICA vectors are not orthogonal

ICA vs PCA

SLIDE 10

ICA vs PCA

SLIDE 11

Gabor wavelets, edge detection, receptive fields of V1 cells..., deep neural networks

ICA basis vectors extracted from natural images

SLIDE 12

PCA basis vectors extracted from natural images

SLIDE 13

STATIC

  • Image denoising
  • Microarray data processing
  • Decomposing the spectra of galaxies
  • Face recognition
  • Facial expression recognition
  • Feature extraction
  • Clustering
  • Classification
  • Deep Neural Networks

TEMPORAL

  • Medical signal processing: fMRI, ECG, EEG
  • Brain Computer Interfaces
  • Modeling of the hippocampus, place cells
  • Modeling of the visual cortex
  • Time series analysis
  • Financial applications
  • Blind deconvolution

Some ICA Applications

SLIDE 14

• EEG ~ a neural cocktail party
• Severe contamination of EEG activity by:
    • eye movements
    • blinks
    • muscle
    • heart, ECG artifact
    • vessel pulse
    • electrode noise
    • line noise, alternating current (60 Hz)
• ICA can improve the signal: it can effectively detect, separate, and remove activity in EEG records coming from a wide variety of artifactual sources. (Jung, Makeig, Bell, and Sejnowski)
• ICA weights (the mixing matrix) help find the location of the sources.

ICA Application, Removing Artifacts from EEG

SLIDE 15

[Figure from Jung et al.]

ICA Application, Removing Artifacts from EEG

SLIDE 16

[Figure from Jung et al.]

Removing Artifacts from EEG

SLIDE 17

[Figure] original, noisy, Wiener filtered, median filtered, and ICA denoised images

ICA for Image Denoising

(Hoyer, Hyvarinen)

SLIDE 18

• Method for the analysis and synthesis of human motion from motion-captured data
• Provides perceptually meaningful "style" components
• 109 markers (327-dimensional data)
• Motion capture ⇒ data matrix

Goal: find the motion style components.

ICA ⇒ 6 independent components (emotion, content, …)

ICA for Motion Style Components

(Mori & Hoshino 2002, Shapiro et al 2006, Cao et al 2003)

SLIDE 19

[Motion clips] walk, sneaky, walk with sneaky, sneaky with walk

SLIDE 20

ICA Theory

SLIDE 21

Statistical (in)dependence

Definition (Independence)
Definition (Shannon entropy)
Definition (KL divergence)
Definition (Mutual Information)

(Standard forms of these definitions are given below.)
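The slide's formulas did not survive extraction; the following LaTeX block restates the standard definitions the slide names, in the usual discrete form (a reconstruction, not a verbatim copy).

% Standard forms of the four definitions named above (reconstructed):
\begin{align}
\text{Independence:}\quad & p(x,y) = p(x)\,p(y) \\
\text{Shannon entropy:}\quad & H(X) = -\textstyle\sum_x p(x)\log p(x) \\
\text{KL divergence:}\quad & D_{\mathrm{KL}}(p\,\|\,q) = \textstyle\sum_x p(x)\log\frac{p(x)}{q(x)} \\
\text{Mutual information:}\quad & I(X;Y) = D_{\mathrm{KL}}\big(p(x,y)\,\|\,p(x)\,p(y)\big) = H(X) + H(Y) - H(X,Y)
\end{align}
% I(X;Y) >= 0, with equality if and only if X and Y are independent.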

SLIDE 22

Solving the ICA problem with i.i.d. sources

SLIDE 23

Solving the ICA problem

SLIDE 24

Whitening

(We assumed centered data)

SLIDE 25

Whitening

We have the eigendecomposition Cov(x) = E D Eᵀ; setting z = E D^(-1/2) Eᵀ x then gives E[z zᵀ] = I.

SLIDE 26

[Figure] original, mixed, and whitened data

Whitening solves half of the ICA problem

Note: after whitening, the remaining mixing matrix is orthogonal. An N by N orthogonal matrix has N(N-1)/2 free parameters, roughly half of the N² parameters of a general matrix ⇒ whitening solves half of the ICA problem.
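A minimal numpy sketch of the whitening step described above (my own illustration; the data layout with rows = features and the symmetric eigendecomposition route are assumptions).

import numpy as np

def whiten(X):
    # Whiten data X (rows = features, columns = samples):
    # Cov(x) = E D E^T, and z = E D^(-1/2) E^T x has E[z z^T] = I.
    X = X - X.mean(axis=1, keepdims=True)   # remove mean, E[x] = 0
    C = np.cov(X)                           # empirical covariance E[x x^T]
    d, E = np.linalg.eigh(C)                # eigendecomposition C = E diag(d) E^T
    V = E @ np.diag(d ** -0.5) @ E.T        # whitening matrix V = C^(-1/2)
    return V @ X, V

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 2)) @ rng.standard_normal((2, 5000))
Z, V = whiten(X)
print(np.cov(Z).round(2))                   # approximately the identity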

SLIDE 27

ICA task: given x,
  • find y (the estimation of s)
  • find W (the estimation of A⁻¹)

ICA solution: y = W x

• Remove the mean, E[x] = 0
• Whitening, E[x xᵀ] = I
• Find an orthogonal W optimizing an objective function
    • Sequence of 2-d Jacobi (Givens) rotations

[Figure] original, mixed, whitened, rotated (demixed)

Solving ICA

SLIDE 28

[Figure] The Jacobi (Givens) rotation matrix G(p, q, θ): the identity matrix with cos θ placed at entries (p, p) and (q, q), and ±sin θ at entries (p, q) and (q, p)

Optimization Using Jacobi Rotation Matrices
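Since the slide's matrix did not survive extraction, here is a sketch of how a single Jacobi (Givens) rotation can be used on whitened data: build G(p, q, θ) and search for the angle that maximizes a non-Gaussianity contrast. The sum-of-squared-kurtoses objective and the grid search are my assumptions; real implementations typically optimize θ analytically.

import numpy as np

def givens(n, p, q, theta):
    # n x n identity with a 2-D rotation embedded in the (p, q) plane
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p], G[q, q] = c, c
    G[p, q], G[q, p] = -s, s
    return G

def excess_kurtosis(Y):
    # rows of Y are assumed zero-mean with unit variance (whitened data)
    return np.mean(Y**4, axis=1) - 3.0

def best_rotation(Z, p=0, q=1, grid=720):
    # brute-force the angle maximizing the sum of squared kurtoses
    thetas = np.linspace(0.0, np.pi / 2, grid)
    scores = [np.sum(excess_kurtosis(givens(Z.shape[0], p, q, th) @ Z) ** 2)
              for th in thetas]
    return givens(Z.shape[0], p, q, thetas[int(np.argmax(scores))])

rng = np.random.default_rng(0)
S = np.vstack([rng.uniform(-1, 1, 20000), rng.laplace(size=20000)])
S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
Z = givens(2, 0, 1, 0.6) @ S    # mix unit-variance sources by a pure rotation
Y = best_rotation(Z) @ Z        # approximately undoes it (up to order/sign)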

SLIDE 29

ICA Cost Functions

Lemma (Proof: Homework)

Therefore,

SLIDE 30

The covariance is fixed at I. Which distribution has the largest entropy? Among all distributions with identity covariance, the Gaussian has the largest entropy.

⇒ To separate the sources, go away from the normal distribution.

ICA Cost Functions

SLIDE 31

The (normalized) sum of independent random variables converges to the normal distribution.

⇒ For separation, go far away from the normal distribution
⇒ Negentropy, |kurtosis| maximization

(A numerical illustration follows below.)

Figs from Ata Kaban

Central Limit Theorem
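A small numeric check of the CLT statement above (my own illustration): averaging more and more independent uniforms drives the excess kurtosis from the uniform's -1.2 toward the Gaussian's 0, which is why separation must push in the opposite direction.

import numpy as np

rng = np.random.default_rng(0)
for k in (1, 2, 8, 32):
    x = rng.uniform(-1, 1, (k, 200_000)).mean(axis=0)  # average of k uniforms
    x = (x - x.mean()) / x.std()                        # standardize
    print(k, np.mean(x**4) - 3)    # excess kurtosis: -1.2 for k=1, -> 0 as k grows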

SLIDE 32

ICA Algorithms

SLIDE 33

rows of W

Maximum Likelihood ICA Algorithm

David J.C. MacKay (1997)

SLIDE 34

Maximum Likelihood ICA Algorithm
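The slide's derivation did not survive extraction, so the following is a hedged sketch of the usual MacKay-style maximum-likelihood update: the log-likelihood is log|det W| + Σᵢ log p(yᵢ) with y = W x, and assuming the common 1/cosh source prior the score function is -tanh, giving the gradient W^(-T) - tanh(y) xᵀ. Plain gradient ascent, the learning rate, and the iteration count are my choices.

import numpy as np

def ml_ica(X, n_iter=2000, lr=0.01, seed=0):
    # Maximum-likelihood ICA sketch: ascend the log-likelihood in W.
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    for _ in range(n_iter):
        Y = W @ X
        grad = np.linalg.inv(W).T - (np.tanh(Y) @ X.T) / T
        W += lr * grad                   # plain gradient ascent
    return W                             # rows of W estimate rows of A^(-1)

# toy demo: two Laplace (super-Gaussian) sources mixed by a random matrix
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))
X = rng.standard_normal((2, 2)) @ S
Y = ml_ica(X) @ X                        # estimates S up to permutation/scale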

SLIDE 35

Kurtosis = the 4th order cumulant: for zero-mean x, kurt(x) = E[x⁴] − 3 (E[x²])².

Measures
  • the distance from normality
  • the degree of peakedness

(A numeric check follows below.)

ICA algorithm based on kurtosis maximization
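A quick numeric check of these two properties (the toy distributions are my own choice): excess kurtosis is about 0 for Gaussian samples, positive for the peaked (super-Gaussian) Laplace, and negative for the flat (sub-Gaussian) uniform.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
for name, x in [("gaussian", rng.standard_normal(n)),
                ("laplace ", rng.laplace(size=n)),      # peaked: kurt ~ +3
                ("uniform ", rng.uniform(-1, 1, n))]:   # flat:   kurt ~ -1.2
    x = (x - x.mean()) / x.std()
    print(name, np.mean(x**4) - 3)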

SLIDE 36

Probably the most famous ICA algorithm

The Fast ICA algorithm (Hyvarinen)

(λ: Lagrange multiplier)

Solve this equation with the Newton-Raphson method. (The fixed-point condition is reconstructed below.)
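The equation the slide refers to did not survive extraction; below is the standard constrained-optimization form FastICA solves, reconstructed in LaTeX (not a verbatim copy): maximize a contrast E[G(wᵀz)] over whitened data z subject to ‖w‖² = 1.

% Constrained form behind FastICA (reconstructed): maximize E[G(w^T z)]
% subject to ||w||^2 = 1, with Lagrange multiplier lambda.
\begin{align}
\mathcal{L}(w,\lambda) &= \mathbb{E}\!\left[G(w^{\top} z)\right] - \lambda\,(w^{\top} w - 1) \\
F(w) := \nabla_w \mathcal{L} &= \mathbb{E}\!\left[z\, g(w^{\top} z)\right] - \beta w = 0,
\qquad g = G',\ \beta = 2\lambda
\end{align}
% Solve F(w) = 0 with the Newton-Raphson method, as on the following slides.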

SLIDE 37

Newton method for finding a root

SLIDE 38

Newton Method for Finding a Root

Linear approximation (1st order Taylor approximation): f(x) ≈ f(x0) + f'(x0)(x - x0). Goal: f(x) = 0. Therefore, x = x0 - f(x0) / f'(x0).
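A worked instance of the update just derived (the choice f(x) = x² - 2, i.e. finding sqrt(2), is mine):

def newton(f, fprime, x, n_steps=6):
    # Newton-Raphson iteration: x <- x - f(x) / f'(x)
    for _ in range(n_steps):
        x = x - f(x) / fprime(x)
        print(x)
    return x

newton(lambda x: x * x - 2, lambda x: 2 * x, x=1.0)  # converges to 1.41421...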

SLIDE 39

Illustration of Newton’s method

Goal: finding a root

In the next step we will linearize here in x

SLIDE 40

Example: Finding a Root

http://en.wikipedia.org/wiki/Newton%27s_method

SLIDE 41

Newton Method for Finding a Root

This can be generalized to multivariate functions. Therefore, x_{k+1} = x_k - J_F(x_k)^(-1) F(x_k), where J_F is the Jacobian of F [use the pseudo-inverse if there is no inverse].
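A small multivariate instance (the toy system is my own choice): Newton steps x ← x - J_F(x)^(-1) F(x) on F(x, y) = (x² + y² - 1, x - y), whose positive root is (1/√2, 1/√2).

import numpy as np

def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 1, x - y])

def J(v):
    # Jacobian of F
    x, y = v
    return np.array([[2 * x, 2 * y],
                     [1.0,  -1.0]])

v = np.array([1.0, 0.0])
for _ in range(8):
    # solve J(v) d = F(v); use np.linalg.pinv instead if J is singular
    v = v - np.linalg.solve(J(v), F(v))
print(v)                                 # -> approximately (0.7071, 0.7071)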

SLIDE 42

Newton method for FastICA

SLIDE 43

The Fast ICA algorithm (Hyvarinen)

Solve: F(w) = E[z g(wᵀz)] − βw = 0

The derivative of F: J_F(w) = E[z zᵀ g'(wᵀz)] − βI

Note: because the data are whitened, E[z zᵀ g'(wᵀz)] ≈ E[z zᵀ] E[g'(wᵀz)] = E[g'(wᵀz)] I.

SLIDE 44

The Fast ICA algorithm (Hyvarinen)

Therefore, J_F(w) ≈ (E[g'(wᵀz)] − β) I: the Jacobian matrix becomes diagonal and can easily be inverted. After rescaling, the Newton step simplifies to w⁺ = E[z g(wᵀz)] − E[g'(wᵀz)] w, followed by normalization w = w⁺ / ‖w⁺‖.
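Putting the pieces together, here is a one-unit FastICA sketch assembled from the updates above (the standard form with g = tanh, g' = 1 - tanh²; the convergence test and iteration cap are my assumptions). The data Z must already be centered and whitened.

import numpy as np

def fastica_one_unit(Z, n_iter=200, tol=1e-8, seed=0):
    # One-unit FastICA on whitened data Z (rows = features, cols = samples).
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        # fixed-point step: w+ = E[z g(w^T z)] - E[g'(w^T z)] w
        w_new = (Z * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y)**2).mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1) < tol:    # converged (up to sign)
            return w_new
        w = w_new
    return w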

SLIDE 45

Other Nonlinearities

SLIDE 46

Other Nonlinearities

Newton method: the same fixed-point iteration applies with other contrast nonlinearities, e.g. g(u) = tanh(a u) or g(u) = u exp(−u²/2), in place of the kurtosis-based g(u) = u³. Algorithm: substitute the chosen g and g' into the update w⁺ = E[z g(wᵀz)] − E[g'(wᵀz)] w, then normalize.

SLIDE 47

Fast ICA for several units
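The slide's details did not survive extraction; what follows is a hedged sketch of the usual deflation scheme for estimating several units: after each fixed-point step, the current vector is decorrelated (Gram-Schmidt) against the rows already found, then renormalized, so each new unit finds a different source.

import numpy as np

def fastica_deflation(Z, n_components, n_iter=200, seed=0):
    # FastICA for several units via deflation, on whitened data Z.
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    W = np.zeros((n_components, n))
    for k in range(n_components):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            w = (Z * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y)**2).mean() * w
            w -= W[:k].T @ (W[:k] @ w)   # decorrelate from earlier rows
            w /= np.linalg.norm(w)
        W[k] = w
    return W                             # estimated demixing matrix, y = W z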