

Introduction to Principal Component Analysis and Independent Component Analysis

Tristan A. Hearn
Bioscience and Technology Branch, NASA Glenn Research Center
May 29, 2010

Table of Contents

Introduction
The Blind Source Separation Problem
PCA
ICA
The Cocktail Party Problem
Application to Images
References

Introduction

Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are both types of transformations that may be performed on a given matrix $A \in \mathbb{R}^{M \times N}$. In each case, a set of basis vectors is computed to satisfy statistical properties associated with the given data; the basis may be orthogonal or biorthogonal.

Introduction

PCA and ICA both seek a linear transformation on A such that the column (or row) vectors of A, represented in the new basis, maximize some measure related to statistical independence.

Definition

Two random variables $X_1$, $X_2$ are said to be independent if and only if their joint density is the product of the marginal densities:

$$f_{X_1, X_2}(x, y) = f_{X_1}(x)\, f_{X_2}(y)$$

The Blind Source Separation (BSS) Problem

Problem

Consider solving the system

$$x_1(t) = a_{11} s_1(t) + \dots + a_{1N} s_N(t)$$
$$\vdots$$
$$x_N(t) = a_{N1} s_1(t) + \dots + a_{NN} s_N(t)$$

or, in matrix form,

$$\begin{pmatrix} x_1(t) \\ \vdots \\ x_N(t) \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{pmatrix} \begin{pmatrix} s_1(t) \\ \vdots \\ s_N(t) \end{pmatrix} \;\Rightarrow\; AS = X,$$

where $x_i, s_i \in \mathbb{R}^{M \times 1}$: only the mixtures $x_i$ are observed, while both the mixing matrix $A$ and the sources $s_i$ are unknown.
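As a concrete illustration of this data model, here is a minimal numpy sketch; the particular sources, sample count, and mixing matrix are hypothetical choices for the demo, not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical non-gaussian sources, each sampled at M points.
M = 1000
t = np.linspace(0.0, 1.0, M)
s1 = np.sign(np.sin(7.0 * 2.0 * np.pi * t))   # square wave
s2 = rng.uniform(-1.0, 1.0, M)                # uniform noise
S = np.vstack([s1, s2])                       # N x M matrix of sources

# The mixing matrix A (unknown to the analyst in practice).
A = np.array([[0.60, 0.40],
              [0.45, 0.55]])

# Observed mixtures: only X is available in the blind problem.
X = A @ S
```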

Principal Component Analysis

PCA aims to compute a 'more meaningful' basis in which to represent given data, where 'more meaningful' means that noise and redundancy in the data are reduced. The goal is to separate sources, filter data, and reveal 'hidden' dynamics. PCA begins by assuming that the transformation to the new basis is linear:

$$PX = Y \;\Rightarrow\; y_i = \begin{pmatrix} p_1 \cdot x_i \\ \vdots \\ p_m \cdot x_i \end{pmatrix}$$

where $x_i$, $y_i$ represent columns of the source and transformed data matrices $X$, $Y$, and $p_i$ represents a row of the transform matrix $P$. So the rows of $P$ form a new basis for the columns of $X$; they are the Principal Components of the given data.

Principal Component Analysis

To minimize redundancy in the new basis, the sampled data should be uncorrelated in the new basis.

Definition

Two random samples x, y are uncorrelated if their sample covariance is 0:

$$\sigma^2_{x,y} = \frac{1}{n-1} (x - \bar{x})(y - \bar{y})^T = 0,$$

and $\sigma^2_{x,x} > 0$ is simply the variance of x.

Principal Component Analysis

Definition

n random samples $y_1, y_2, \dots$ are uncorrelated if their sample covariance matrix is diagonal:

$$S_Y = \frac{1}{n-1} \left(Y - \bar{Y}\mathbf{1}\right) \left(Y - \bar{Y}\mathbf{1}\right)^T = \begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_n \end{pmatrix}$$

$S_Y$ is always a square, symmetric matrix. Its diagonal elements are the individual variances of $y_1, y_2, \dots$, its off-diagonal elements are the covariances of $y_1, y_2, \dots$, and so $S_Y$ quantifies the correlation between all possible pairings of $\{y_1, \dots, y_n\}$.
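A minimal numpy sketch of this estimator (the random test data is purely illustrative); note that numpy's `np.cov` uses the same 1/(n-1) normalization:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(3, 500))           # 3 variables, n = 500 samples
n = Y.shape[1]

Ybar = Y.mean(axis=1, keepdims=True)    # row means (the Y-bar term above)
S_Y = (Y - Ybar) @ (Y - Ybar).T / (n - 1)

assert np.allclose(S_Y, np.cov(Y))      # agrees with numpy's estimator
```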

Principal Component Analysis

So to remove redundancy, we must find new basis vectors (the Principal Components) such that the covariance matrix of the transformed data is diagonal. PCA also assumes that the basis vectors are orthogonal, which simplifies the computation of the new basis.

Definition

Two vectors x, y are said to be orthogonal if their dot product is zero:

$$x \cdot y = \sum_{i=1}^{n} x_i y_i = 0$$

Principal Component Analysis

Summary of assumptions:

Linearity of the transformation.
The sample mean and sample variance are sufficient statistics for the underlying separation problem.
Large variances in X correspond to important dynamics in the underlying system.
The principal components are orthogonal.

Definition

A function T(x) is said to be a sufficient statistic for the random variable x if the conditional probability distribution of x, given T(x), is not a function of any unknown distribution parameters:

$$P(X = x \mid T(x), \theta \in \Omega) = P(X = x \mid T(x))$$

Principal Component Analysis

Solving for the PCs: WLOG, assume X is normalized to have zero mean. We seek an orthonormal matrix P (where Y = PX) such that

$$S_Y = \frac{1}{n-1} Y Y^T$$

is diagonalized; the rows of P will then be the principal components of X. So:

$$S_Y = \frac{1}{n-1} Y Y^T = P \underbrace{\left(\frac{1}{n-1} X X^T\right)}_{\text{symmetric!}} P^T$$

Principal Component Analysis

Any real, symmetric matrix is diagonalized by an orthonormal matrix of its eigenvectors. Therefore, normalizing the data matrix X and computing the eigenvectors of

$$\frac{1}{n-1} X X^T = S_X$$

will give the principal components! The best approach for this computation is the singular value decomposition (SVD).

Principal Component Analysis

Definition

The singular value decomposition of a real m × n matrix X is given by:

$$X = U \Sigma V^T$$

where U is an m × m matrix containing the eigenvectors of $XX^T$, V is an n × n matrix containing the eigenvectors of $X^T X$, and Σ is an m × n matrix with the square roots of the eigenvalues of $XX^T$ along its main diagonal. The singular values σ (the elements of Σ) are ordered from greatest to least, and each corresponds to a basis vector in U and V.

Dimension reduction: choose a minimum acceptable value for the σs, and retain as principal components only the vectors corresponding to σs larger than the chosen threshold.

Principal Component Analysis

The SVD is a very important matrix factorization with a wide variety of applications. For PCA, note that if we set

$$Z = \frac{1}{\sqrt{n-1}} X^T,$$

then

$$Z^T Z = \left(\frac{1}{\sqrt{n-1}} X^T\right)^T \left(\frac{1}{\sqrt{n-1}} X^T\right) = \frac{1}{n-1} X X^T = S_X.$$

So the matrix V given by the SVD of Z contains the eigenvectors of $S_X$, which are the principal components! Therefore $P = V^T$. Once P is found, the data can be transformed: Y = PX.
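Putting the recipe together, a minimal sketch in numpy, assuming as above that rows of X are measurement types and columns are samples (the helper name `pca` is ours):

```python
import numpy as np

def pca(X):
    """PCA via the SVD of Z = X^T / sqrt(n - 1), following the slides.

    X : (m, n) array; m measurement types, n samples per type.
    Returns P (rows = principal components), Y = P X, and the
    singular values sigma.
    """
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)   # normalize to zero mean
    Z = Xc.T / np.sqrt(n - 1)                # so that Z^T Z = S_X
    U, sigma, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt                                   # rows are eigenvectors of S_X
    Y = P @ Xc                               # data in the new basis
    return P, Y, sigma
```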

Principal Component Analysis

2D Example

Let $x_1 = [x_{1,1}, \dots, x_{1,1000}]$, $x_2 = [x_{2,1}, \dots, x_{2,1000}]$ be random variables such that $x_{1,i} \overset{\text{i.i.d.}}{\sim} P_1$ and $x_{2,j} \overset{\text{i.i.d.}}{\sim} P_2$ for all $i, j$, with the two distributions $P_1$, $P_2$ unknown. So $x_1$, $x_2$ are two different measurement types (sensors, etc.), each containing 1000 measurements.

Principal Component Analysis

2D Example

We can plot the $x_1$ vs. $x_2$ data to show that they are strongly correlated. [Scatter plot of $x_1$ vs. $x_2$.]

Principal Component Analysis

The SVD of $X = [x_1, x_2]^T$ is computed to be:

$$U = \begin{pmatrix} 3.77 \times 10^{-2} & \cdots & -3.61 \times 10^{-2} \\ \vdots & \ddots & \vdots \\ -4.57 \times 10^{-2} & \cdots & 0.97 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 142.85 & 0 \\ 0 & 43.61 \end{pmatrix}, \quad V^T = \begin{pmatrix} 0.63 & 0.77 \\ -0.77 & 0.63 \end{pmatrix}$$


Principal Component Analysis

2D Example

PCA provides a transformation into a new basis in which the data becomes uncorrelated.

Principal Component Analysis

3D Example

Let us introduce a new component, so that the data is 3-dimensional: $x_3 = x_1 - x_2$. Then $x_3$ provides no new information about the underlying system! Thanks to the SVD, PCA provides a mechanism for detecting this and removing the redundant dimension.

Principal Component Analysis

The SVD of $X = [x_1, x_2, x_3]^T$ is computed to be:

$$U = \begin{pmatrix} 3.77 \times 10^{-2} & \cdots & -3.61 \times 10^{-2} \\ \vdots & \ddots & \vdots \\ -4.57 \times 10^{-2} & \cdots & 0.97 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 142.97 & & \\ & 73.35 & \\ & & 4.29 \times 10^{-14} \end{pmatrix},$$

$$V^T = \begin{pmatrix} 0.61 & 0.77 & -0.16 \\ 0.54 & -0.25 & 0.80 \\ -0.577 & 0.577 & 0.577 \end{pmatrix}$$

Principal Component Analysis

3D Example

Since the singular value corresponding to the third PC is vanishingly small, the contribution of that axis in the new basis is minimal. Projection onto the first two PCs is therefore sufficient to characterize the data!
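The redundant dimension can be detected with a simple threshold on the singular values, as suggested above; a sketch using the values from this example (the relative tolerance is an arbitrary illustrative choice):

```python
import numpy as np

# Singular values from the SVD above, sorted greatest to least
# (numpy's svd returns them that way).
sigma = np.array([142.97, 73.35, 4.29e-14])

tol = 1e-8 * sigma[0]              # hypothetical relative threshold
k = int(np.sum(sigma > tol))       # number of significant components
print(k)                           # -> 2: the third axis is redundant
```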

Principal Component Analysis

2-Source Audio Example

[Audio clips.]

In the previous two examples, PCA was not successful in completely separating the mixed signals. What is needed is a transformation driven by a stronger measure of independence.

Independent Component Analysis

ICA, like PCA, aims to compute a 'more meaningful' basis in which to represent given data, where 'more meaningful' means that noise and redundancy in the data are reduced. The goal is to separate sources, filter data, and reveal 'hidden' dynamics. ICA also begins by assuming that the transformation to the new basis is linear:

$$WX = Y \;\Rightarrow\; y_i = \begin{pmatrix} w_1 \cdot x_i \\ \vdots \\ w_m \cdot x_i \end{pmatrix}$$

where $x_i$, $y_i$ represent columns of the source and transformed data matrices $X$, $Y$, and $w_i$ represents a row of the transform matrix $W$. So the rows of $W$ form a new basis for the columns of $X$; they are the Independent Components of the given data.

Independent Component Analysis

However, unlike PCA:

The vectors of the new basis are not assumed to be orthogonal.
Directions of highest variance are not assumed to be strongly characteristic of the underlying dynamics of the system.
Measures based on higher-order statistics (order > 2) are assumed to be necessary to separate the sources in a problem.
There is no standard measure of independence or computational algorithm to perform ICA: algorithms are iterative and tend to be much more computationally expensive than the SVD, and in general, well-posedness is not guaranteed.

Independent Component Analysis

Also:

There is no framework for reducing the dimensionality of data within ICA (PCA must be performed first!).
Computationally efficient estimators used to approximate higher-order statistics are typically biased.
The variances of the original sources cannot be recovered.
The signs of the original sources cannot be recovered.
Any ordering of the sources which existed prior to mixing cannot be recovered.

Independent Component Analysis

We seek W, Y such that Y = WX and each row of Y maximizes some higher-order measure of independence. Typical perspectives:

Maximum likelihood
Direct higher-order moments
Maximization of mutual information
Maximization of negative information entropy (negentropy)

The optimization for any choice of the above measures is motivated by the Central Limit Theorem.

Independent Component Analysis

Central Limit Theorem (Lyapunov)

Let $X_n$, $n \in \mathbb{N}$, be any sequence of independent random variables, each with finite mean $\mu_n$ and variance $\sigma_n^2$. Define $S_N^2 = \sum_{i=1}^{N} \sigma_i^2$. If for some $\delta > 0$ the expectations $E\left[|X_k - \mu_k|^{2+\delta}\right]$ are finite for every $k \in \mathbb{N}$ and the condition

$$\lim_{N \to \infty} \frac{1}{S_N^{2+\delta}} \sum_{i=1}^{N} E\left[|X_i - \mu_i|^{2+\delta}\right] = 0$$

is satisfied, then:

$$\frac{\sum_{i=1}^{N} (X_i - \mu_i)}{S_N} \xrightarrow{\text{distr.}} \mathrm{Normal}(0, 1) \quad \text{as } N \to \infty$$

Independent Component Analysis

Heuristic argument: the sum of any group of independent random variables is 'more gaussian' than any of the individual random variables. Assume that none of the original sources has a gaussian distribution. Then minimizing gaussianity with respect to higher-order statistical measures should separate the sources in X!

Independent Component Analysis

Definition

The kurtosis of a random variable x is defined to be:

$$\kappa(x) = E\left[x^4\right] - 3\left(E\left[x^2\right]\right)^2$$

Kurtosis is a measure of the 'peakedness' and thickness of tails of a distribution. Note that if x is gaussian, then $E[x^4] = 3(E[x^2])^2$, so:

$$\kappa(x) = 3\left(E\left[x^2\right]\right)^2 - 3\left(E\left[x^2\right]\right)^2 = 0$$

So, simultaneously maximizing $|\kappa(Y_1)|, \dots, |\kappa(Y_m)|$ or $\kappa(Y_1)^2, \dots, \kappa(Y_m)^2$ can provide a basis in which the recovered sources are (in one sense) maximally non-gaussian.
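A sample estimate of kurtosis is straightforward; a short sketch for centered data (the test distributions are illustrative):

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis kappa(x) = E[x^4] - 3 (E[x^2])^2 for zero-mean x."""
    x = x - x.mean()                 # center, to match the definition
    return np.mean(x**4) - 3.0 * np.mean(x**2)**2

rng = np.random.default_rng(2)
print(kurtosis(rng.normal(size=100_000)))      # ~ 0 for gaussian data
print(kurtosis(rng.uniform(-1, 1, 100_000)))   # < 0 (sub-gaussian)
print(kurtosis(rng.laplace(size=100_000)))     # > 0 (super-gaussian)
```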

Independent Component Analysis

Drawbacks of using kurtosis as an optimality criterion: it is very sensitive to outliers, and is therefore not a robust measure of gaussianity. A more suitable measure of gaussianity is required to produce stable ICA methods.

Independent Component Analysis

Definition

The differential entropy of a continuous random variable X with density function $f_X(x)$ is defined to be:

$$H(X) = -\int f_X(x) \log f_X(x)\, dx$$

It can be interpreted as the degree of information carried by a random variable. A fundamental result in information theory: a gaussian random variable has the greatest entropy among all random variables of equal variance.
slide-92
SLIDE 92

National Aeronautics and Space Administration

www.nasa.gov

Independent Component Analysis

Consider the following:

Definition

The Negative Entropy (or Negentropy) of a a continuous random variable X with density function fX (x)is defined to be: J (X) = H (Xgauss) − H (X) where Xgauss is a random variable with identical variance to X (or identical covariance matrix). Advantages: Always non-negative; equal to 0 for a gaussian random variable. Not sensitive to sample outliers.

slide-93
SLIDE 93

National Aeronautics and Space Administration

www.nasa.gov

Independent Component Analysis

Consider the following:

Definition

The Negative Entropy (or Negentropy) of a a continuous random variable X with density function fX (x)is defined to be: J (X) = H (Xgauss) − H (X) where Xgauss is a random variable with identical variance to X (or identical covariance matrix). Advantages: Always non-negative; equal to 0 for a gaussian random variable. Not sensitive to sample outliers.

slide-94
SLIDE 94

National Aeronautics and Space Administration

www.nasa.gov

Independent Component Analysis

Consider the following:

Definition

The Negative Entropy (or Negentropy) of a a continuous random variable X with density function fX (x)is defined to be: J (X) = H (Xgauss) − H (X) where Xgauss is a random variable with identical variance to X (or identical covariance matrix). Advantages: Always non-negative; equal to 0 for a gaussian random variable. Not sensitive to sample outliers.

Independent Component Analysis

Difficulties: negentropy optimization is computationally difficult to deal with directly. Estimates include:

$$J(y) \approx \frac{1}{12} E\left[y^3\right]^2 + \frac{1}{48} \kappa(y)^2$$

which has the same problems as in the case of just using kurtosis, and

$$J(y) \approx \sum_{i=1}^{n} k_i \left(E\left[G_i(y)\right] - E\left[G_i(v)\right]\right)^2,$$

where $\{k_i\}$ are positive constants, v is a standard gaussian random variable, and $\{G_i\}$ are some non-quadratic functions.

Independent Component Analysis

Typically, all of the $G_i$ are taken to be the same function G. Very good results have been demonstrated using:

$$G(u) = \frac{1}{\alpha_1} \log\left[\cosh(\alpha_1 u)\right], \quad \text{for some constant } 1 \le \alpha_1 \le 2,$$

$$G(u) = -\exp\left(-u^2/2\right)$$
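As a sketch, these two contrast functions and their derivatives g = G' (used by the FastICA iteration below) might be coded as follows, taking alpha_1 = 1:

```python
import numpy as np

ALPHA1 = 1.0   # any constant 1 <= alpha_1 <= 2

def G_logcosh(u):
    return np.log(np.cosh(ALPHA1 * u)) / ALPHA1

def g_logcosh(u):                    # derivative of G_logcosh
    return np.tanh(ALPHA1 * u)

def G_exp(u):
    return -np.exp(-u**2 / 2.0)

def g_exp(u):                        # derivative of G_exp
    return u * np.exp(-u**2 / 2.0)
```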
Independent Component Analysis

Consider the computation of the first independent component $w_1$ based on maximizing negentropy, and let $g = G'$ denote the derivative of the non-quadratic function G. The maxima of the negentropy approximations of $w_1^T X$ are obtained at certain optima of $E\left[G(w_1^T X)\right]$. By the KKT conditions, the optima of $E\left[G(w_1^T X)\right]$ under the constraint $E\left[(w_1^T X)^2\right] = \|w_1\|^2 = 1$ are obtained at points where:

$$E\left[X g(w_1^T X)\right] - \beta w_1 = 0$$

So the Jacobian has the form:

$$J(w_1) = E\left[X X^T g'(w_1^T X)\right] - \beta I$$
Independent Component Analysis

If we use the approximation

$$E\left[X X^T g'(w_1^T X)\right] \approx E\left[X X^T\right] E\left[g'(w_1^T X)\right] = E\left[g'(w_1^T X)\right] I$$

(since X has been whitened, $E[XX^T] = I$), we get the Newton-Raphson iteration:

$$w_1 \leftarrow w_1 - \frac{E\left[X g(w_1^T X)\right] - \beta w_1}{E\left[g'(w_1^T X)\right] - \beta}$$

Multiplying both sides by $\beta - E\left[g'(w_1^T X)\right]$ and simplifying gives:

$$w_1^+ = E\left[X g(w_1^T X)\right] - E\left[g'(w_1^T X)\right] w_1$$

This is the basic iterate of the FastICA algorithm [4].

Independent Component Analysis

The computation of a single independent component (FastICA):

Choose an initial random vector $w_1$.
Compute $w_1^+ = E\left[X g(w_1^T X)\right] - E\left[g'(w_1^T X)\right] w_1$.
Set $w_1 = w_1^+ / \|w_1^+\|$.
Repeat until convergence.
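A minimal sketch of this one-unit iteration, assuming X has already been centered and whitened (e.g. via PCA) and using the log-cosh non-linearity g = tanh with g' = 1 - tanh^2; the helper name and convergence test are our choices:

```python
import numpy as np

def fastica_one_unit(X, tol=1e-6, max_iter=200, seed=0):
    """Estimate one independent component from whitened data X (m x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = rng.normal(size=m)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ X                                  # w^T X, shape (n,)
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx)**2
        # Sample versions of E[X g(w^T X)] and E[g'(w^T X)] w
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:         # converged (up to sign)
            return w_new
        w = w_new
    return w
```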

Independent Component Analysis

The computation of the full Independent Component Analysis (FastICA):

Assume $w_1, \dots, w_p$ independent components have been estimated.
Run the single-component method for a vector $w_{p+1}$, and after every iteration subtract from $w_{p+1}$ its projections $(w_{p+1}^T w_j) w_j$ onto the previously estimated components, for $j = 1, \dots, p$:

$$w_{p+1} = w_{p+1} - \sum_{j=1}^{p} \left(w_{p+1}^T w_j\right) w_j$$

Renormalize: $w_{p+1} = w_{p+1} / \sqrt{w_{p+1}^T w_{p+1}}$
Repeat until p = m.
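Combining the one-unit iterate with this deflation scheme gives a compact sketch of the full algorithm (again assuming centered, whitened input; the function name and defaults are illustrative):

```python
import numpy as np

def fastica(X, n_components, max_iter=200, seed=0):
    """Deflation FastICA on whitened data X (m x n). Returns W (rows = ICs)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = np.zeros((n_components, m))
    for p in range(n_components):
        w = rng.normal(size=m)
        for _ in range(max_iter):
            wx = w @ X
            g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx)**2
            w = (X * g).mean(axis=1) - g_prime.mean() * w
            w -= W[:p].T @ (W[:p] @ w)   # subtract projections on found ICs
            w /= np.linalg.norm(w)
        W[p] = w
    return W

# Usage: Y = fastica(X_whitened, n_components=2) @ X_whitened
```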

Independent Component Analysis

Properties of FastICA [4][3][1]:

Convergence is cubic (assuming the ICA data model).
There is no step-size parameter to be chosen.
It is a type of neural algorithm.
It directly computes ICs using practically any non-linearity g; the choice of g does affect performance.
It is parallel, distributed, computationally simple, and requires little memory.
It may still prematurely converge to local optima.
PCA must be conducted on X prior to use of FastICA.

slide-120
SLIDE 120

National Aeronautics and Space Administration

www.nasa.gov

Independent Component Analysis

2-Source Audio Example

⊲ ⊲

Example (Cocktail Party Problem)

These 6 audio recordings are assumed to be a linear mix of unknown sources (via multiplication with an unknown matrix): [six audio clips]

Separation via PCA

The following 6 signals were retrieved from the mixed sources using PCA: [six audio clips]

Separation via ICA

The following 6 signals were retrieved from the mixed sources using ICA: [six audio clips]

Separation of Mixed Images

These 8 images are assumed to be a linear mix of unknown sources (via multiplication with an unknown matrix): [eight mixed images]

First, we precondition the system by executing PCA (via the SVD). A stem plot of the singular values σ shows that, of the 8 observed images, only 5 significant components are detected via PCA/SVD.

Separation of Mixed Images

The following 5 images were retrieved from the mixed sources using ICA: [five recovered images]

References

[1] Aapo Hyvärinen and Erkki Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7):1483–1492, October 1997.

[2] Fabrizio Esposito, Erich Seifritz, Elia Formisano, Renato Morrone, Tommaso Scarabino, Gioacchino Tedeschi, Sossio Cirillo, Rainer Goebel, and Francesco Di Salle. Real-time independent component analysis of fMRI time-series. NeuroImage, 20(4):2209–2224, December 2003.

[3] Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, April 1994.

[4] Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.


Questions?