SLIDE 1

Free Component Analysis

Raj Rao
  • Dept. of Electrical Engg. & Computer Science
  • www.eecs.umich.edu/~rajnrao

Joint work with Hao Wu. Funding by ONR, DARPA, ARO.

SLIDE 2

Free Component Analysis (FCA): An experiment in pictures

SLIDE 3

I1 = Hedgehog

SLIDE 4

I2 = Panda

SLIDE 5

Mixed Image 1

SLIDE 6

Mixed Image 2

SLIDE 7

Mixing Images

SLIDE 8

This talk: Unmixing Images from Mixed Images

(Toy) Mixing model: M1 = 0.5 I1 + 0.5 I2, M2 = 0.5 I1 − 0.5 I2 (see the sketch after this list)

Perfect-unmixing algorithm: U1 = 1·M1 + 1·M2, U2 = 1·M1 − 1·M2

Goal of FCA:
  • Unmix images with no prior knowledge of
    – the mixing model
    – the image structure, e.g. "what does a typical hedgehog look like?"
  • Questions answered in this talk: Can this be done? When? How well? Theory?
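A minimal numeric sketch of this toy model (our own illustration; random arrays stand in for the hedgehog and panda images):

```python
import numpy as np

rng = np.random.default_rng(0)
I1 = rng.standard_normal((64, 64))   # stand-in for image I1 (hedgehog)
I2 = rng.standard_normal((64, 64))   # stand-in for image I2 (panda)

# Toy mixing model from the slide.
M1 = 0.5 * I1 + 0.5 * I2
M2 = 0.5 * I1 - 0.5 * I2

# Perfect-unmixing algorithm from the slide: M1 + M2 = I1 and
# M1 - M2 = I2 exactly, so U1 and U2 recover the originals.
U1 = 1 * M1 + 1 * M2
U2 = 1 * M1 - 1 * M2
assert np.allclose(U1, I1) and np.allclose(U2, I2)
```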

SLIDE 9

Algorithm(s) for FCA

Perfect-unmixing algorithm: U1 = 1·M1 + 1·M2

Strategy: Cast as the optimization problem

  (w1, w2) = arg max f(w1 M1 + w2 M2) subject to w1^2 + w2^2 = 2

Choice of f: ⇐ Important!

SLIDE 10

An algorithm for FCA

Strategy: (w1, w2) = arg max |f(w1 M1 + w2 M2)| subject to w1^2 + w2^2 = 2

Choice of f: ⇐ where the magic happens!

  • Let X be an n × m matrix and σ_i(X) its i-th singular value. Then

    f(X) = (1/n) Σ_i σ_i^4(X) − (1 + n/m) [ (1/n) Σ_i σ_i^2(X) ]^2

  • f(X) is the "free" fourth (rectangular) cumulant of X
  • Insight: w1 = 1 and w2 = 1 ⇒ success
  • Display w1 M1 + w2 M2 ... what do we get? (a sketch follows)
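A sketch of this objective in code (our own illustration; M1 and M2 are the mixed images from the earlier sketch, ideally real images rather than the random stand-ins, and the grid scan is a stand-in for the talk's optimization):

```python
import numpy as np

def free_kurtosis(X):
    """The slide's f(X): the "free" fourth rectangular cumulant of n x m X."""
    n, m = X.shape
    s = np.linalg.svd(X, compute_uv=False)   # singular values sigma_i(X)
    m2 = np.sum(s**2) / n                    # (1/n) sum_i sigma_i^2
    m4 = np.sum(s**4) / n                    # (1/n) sum_i sigma_i^4
    return m4 - (1 + n / m) * m2**2

# Brute-force scan of the constraint circle w1^2 + w2^2 = 2. The maximizer
# of |f| should land near (w1, w2) = (1, 1) or (1, -1), which display as
# the unmixed images.
theta = np.linspace(0.0, 2 * np.pi, 721)
candidates = np.sqrt(2) * np.column_stack([np.cos(theta), np.sin(theta)])
w1, w2 = max(candidates, key=lambda w: abs(free_kurtosis(w[0] * M1 + w[1] * M2)))
```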

SLIDE 11

Mixed Images vs Original Images

SLIDE 12

Mixed vs Unmixed Images

SLIDE 13

Unmixed Image 1 vs Image 1

SLIDE 14

Zoom into Unmixed Image 1

  • Near-perfect unmixing: Did we get lucky?

SLIDE 15

This talk: Unmixing Images from Mixed Images

(Toy) Mixing model: M1 = 0.5 I1 + 0.5 I2, M2 = 0.5 I1 − 0.5 I2

Perfect-unmixing algorithm: U1 = 1·M1 + 1·M2, U2 = 1·M1 − 1·M2

Goal of FCA:
  • Unmix images with no prior knowledge of
    – the mixing model
    – the image structure, e.g. "what does a typical hedgehog look like?"
  • Questions answered in this talk: Can this be done? When? How well? Theory?

SLIDE 16

I1

SLIDE 17

I2

SLIDE 18

Mixed Images

SLIDE 19

Unmixed Images: FCA

SLIDE 20

Unmixed Images: ICA

SLIDE 21

Free Component Analysis (FCA): A great but not-perfect unmixing example

SLIDE 22

I1 = NYC

SLIDE 23

I2 = Berlin

SLIDE 24

Mixed Image 1

SLIDE 25

Mixed Image 2

SLIDE 26

Mixed vs FCA Unmixed

SLIDE 27

Unmixed Image 1

SLIDE 28

Unmixed Image 1: Zoom In

  • Great but not near-perfect unmixing

SLIDE 29

Free Component Analysis (FCA): A not-great-at-all example

SLIDE 30

I1

SLIDE 31

I2

SLIDE 32

Mixed vs FCA Unmixed

  • Quiz: How to make FCA great again? (Q: Why does this not work? More in a bit...)

SLIDE 33

Theory: FCA – Setup

Mixing model: Assume s non-commutative random variables are mixed as

  (y1, . . . , ys)^T = A (x1, . . . , xs)^T

Covariance matrix: Σ_ij = Σ_ji = φ(x_i x_j^*)

Random matrix connection:

  φ(x_i x_j^*) "=" lim_{n→∞} E[ (1/n) Tr(X_i X_j^*) ] ≈ (1/n) Tr(X_i X_j^*)
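A quick numeric illustration of this connection (our own sketch): for a large random matrix, one realization of the normalized trace already sits close to its expectation, so a single sample can stand in for φ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# X_i with i.i.d. N(0, 1/n) entries, so that phi(x_i x_i^*) = 1 in the limit.
Xi = rng.standard_normal((n, n)) / np.sqrt(n)

# A single sample of (1/n) Tr(X_i X_i^T) concentrates near 1.
print(np.trace(Xi @ Xi.T) / n)
```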

SLIDE 34

Theory: FCA – Recovery guarantee

Mixing model: (y1, . . . , ys)^T = A (x1, . . . , xs)^T

FCA: Find s directions that maximize the absolute value of the free kurtosis (+ a bit more)

Theorem [N. and Wu, '17]: FCA with the free kurtosis objective function perfectly unmixes signals if
  • x1, . . . , xs are freely independent
  • A is invertible and Σ = I
  • Free kurtosis ≠ 0
  • At most one "i.i.d. Gaussian"-like random matrix

SLIDE 35

Proof: FCA – Recovery guarantee

Orthogonal mixing model: Q = [q1 . . . qs] is an orthogonal matrix. Let

  (y1, . . . , ys)^T = Q (x1, . . . , xs)^T

Algorithm: w_opt = arg max_{||w||_2 = 1} |κ4(w^T y)|

Claim: Assume |κ4(x1)| > |κ4(x2)| > . . . > |κ4(xs)|; then w_opt = ±q1

SLIDE 36

Sketch of proof

Step 1: Change of variables:

  κ4(w^T y) = κ4(w^T Q x) = κ4(w̃^T x), where w̃^T = w^T Q ⇒ ||w̃||_2 = 1 as well

  • w̃_opt = arg max_{||w̃||_2 = 1} |κ4(w̃^T x)|

Claim: We are done if we can show that w̃_opt = ±e1

SLIDE 37

Sketch of Proof – continued

Equivalent optimization problem:

  w̃_opt = arg max_{||w̃||_2 = 1} |κ4(w̃^T x)|

Expanding terms on the right-hand side: κ4(w̃^T x) = κ4(w̃1 x1 + . . . + w̃s xs)

Properties of (free) cumulants: If x1 and x2 are (freely) independent:
  • Additivity: κ_i(x1 + x2) = κ_i(x1) + κ_i(x2)
  • Homogeneity: κ_i(c x) = c^i κ_i(x)
  • The first cumulant is the mean, the second cumulant is the variance, and so on
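A numeric sanity check of these two properties for matrices (our own sketch; free_kurtosis is the f(X) from the SLIDE 10 sketch, playing the role of κ4):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
X = rng.standard_normal((n, n))

# Homogeneity with i = 4: kappa_4(c x) = c^4 kappa_4(x). This holds exactly
# for free_kurtosis, since each singular value of cX is |c| * sigma_i(X).
c = 0.7
assert np.isclose(free_kurtosis(c * X), c**4 * free_kurtosis(X))

# The "Gaussian-like" case from the theorem: a normalized i.i.d. Gaussian
# matrix has free kurtosis tending to 0 (the free analog of zero excess
# kurtosis), which is why at most one such component is allowed.
G = rng.standard_normal((n, n)) / np.sqrt(n)
print(free_kurtosis(G))   # close to 0 for large n
```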

SLIDE 38

Sketch of Proof – continued

Properties of (free) cumulants: If x1 and x2 are (freely) independent:
  • Additivity: κ_i(x1 + x2) = κ_i(x1) + κ_i(x2)
  • Homogeneity: κ_i(c x) = c^i κ_i(x)

Expanding terms on the right-hand side:

  κ4(w̃^T x) = κ4(w̃1 x1 + . . . + w̃s xs)
            = κ4(w̃1 x1) + . . . + κ4(w̃s xs)         by additivity
            = w̃1^4 κ4(x1) + . . . + w̃s^4 κ4(xs)     by homogeneity

SLIDE 39

Sketch of Proof – continued

Expanding terms on the right-hand side: κ4(w̃^T x) = Σ_i κ4(x_i) w̃_i^4

Bounding the absolute kurtosis:

  ⇒ |κ4(w̃^T x)| ≤ max_i |κ4(x_i)| · Σ_{i=1}^{s} w̃_i^4
                 ≤ |κ4(x1)| · Σ_i w̃_i^2     since w̃_i^4 ≤ w̃_i^2 on the sphere
                 ≤ |κ4(x1)| · 1              since Σ_i w̃_i^2 = 1

SLIDE 40

Sketch of Proof – final piece

Upper bound: ⇒ |κ4(w̃^T x)| ≤ |κ4(x1)| ⇒ w̃_opt = ±e1

Change of variables: Since w_opt = Q w̃_opt, we have

  w_opt = ±q1 = arg max_{||w||_2 = 1} |κ4(w^T y)|

Linear unmixing transformation: ⇒ w_opt^T y = ±e1^T x = ±x1

  • Same problem + spherical + orthogonality constraints ⇒ recover x2, . . . , xs

SLIDE 41

Sketch of proof: FCA – Recovery guarantee

Key inequality: Uses the fact that cumulants of free random variables add, plus the spherical constraint:

  |κ4(q^T x)| ≤ max(|κ4(x1)|, |κ4(x2)|)

Extremal recovery property:

  arg max_{||q|| = 1} |κ4(q^T x)| =
    q1 = 1, q2 = 0         if |κ4(x1)| > |κ4(x2)|
    q1 = 0, q2 = 1         if |κ4(x1)| < |κ4(x2)|
    either of the above    if |κ4(x1)| = |κ4(x2)|

  • A similar property holds for other higher-order free cumulants

SLIDE 42

FCA algorithm: Whitening step

Input: Z = [Z1, · · · , Zs]^T ∈ R^{sN×M}, where Z_i ∈ R^{N×M}.

  1. Compute Z̄ = [ (1/M) Z1 1_M 1_M^T, · · · , (1/M) Zs 1_M 1_M^T ]^T.
  2. Compute Z̃ = Z − Z̄.
  3. Compute the s × s covariance matrix C, where for i, j = 1, . . . , s:
     C_ij = (1/N) Tr( Z̃_i Z̃_j^T ).
  4. Compute the eigenvalue decomposition C = U Λ^2 U^T.
  5. Compute Y = ((Λ^{−1} U^T) ⊗ I_N) Z̃.
  6. Return Y, Λ, and U.
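A sketch of this whitening step in code (our own rendering of the steps above, keeping the s matrices in a Python list rather than a stacked sN × M array):

```python
import numpy as np

def fca_whiten(Z_list):
    """Whiten s matrices Z_1, ..., Z_s, each N x M (steps 1-5 above)."""
    s = len(Z_list)
    N, M = Z_list[0].shape
    # Steps 1-2: (1/M) Z_i 1_M 1_M^T is the matrix of row means, so
    # subtracting it centers every row of each Z_i.
    Zc = [Zi - Zi.mean(axis=1, keepdims=True) for Zi in Z_list]
    # Step 3: s x s covariance matrix of normalized traces.
    C = np.array([[np.trace(Zi @ Zj.T) / N for Zj in Zc] for Zi in Zc])
    # Step 4: eigendecomposition C = U Lam^2 U^T.
    lam2, U = np.linalg.eigh(C)
    # Step 5: (Lam^{-1} U^T) kron I_N acts blockwise, i.e. it forms s
    # linear combinations of the centered matrices.
    T = np.diag(1.0 / np.sqrt(lam2)) @ U.T
    Y = [sum(T[i, j] * Zc[j] for j in range(s)) for i in range(s)]
    return Y, np.sqrt(lam2), U
```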

SLIDE 43

FCA algorithm: Optimal orthogonal matrix finding step

Input: Z = [Z1, · · · , Zs]^T ∈ R^{sN×M}, where Z_i ∈ R^{N×M}.

  1. Compute Y, Λ, U by whitening Z.
  2. Compute

     Ŵ = arg max_{W ∈ O(s)} Σ_i |κ4( (W̃^T Y)_i )|, where W̃ = W ⊗ I_N and

     κ4(X) = (1/N) Tr( (X X^T)^2 ) − (1 + N/M) [ (1/N) Tr(X X^T) ]^2.

  3. Compute Â = U Λ Ŵ and X̂ = (Â^{−1} ⊗ I_N) Z, so that Z = (Â ⊗ I_N) X̂.
  4. Return Â and X̂.
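A sketch of this step for s = 2, where O(2) can be scanned by a single rotation angle; a full implementation would use manifold optimization over O(s). This reuses free_kurtosis and fca_whiten from the earlier sketches, and recovery is only up to signs and permutation:

```python
import numpy as np

def fca_unmix_pair(Z1, Z2):
    """Unmix two N x M matrices by maximizing the summed |free kurtosis|."""
    Y, lam, U = fca_whiten([Z1, Z2])

    def score(t):  # sum of |kappa_4| over the two blocks of W^T Y
        c, s = np.cos(t), np.sin(t)
        return (abs(free_kurtosis(c * Y[0] + s * Y[1]))
                + abs(free_kurtosis(-s * Y[0] + c * Y[1])))

    t = max(np.linspace(0.0, np.pi, 721), key=score)   # brute-force scan
    W = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    # X_hat = (W^T kron I_N) Y and A_hat = U diag(lam) W, so the centered
    # data satisfies Z ~ (A_hat kron I_N) X_hat, as in step 3 above.
    X_hat = [W[0, 0] * Y[0] + W[1, 0] * Y[1],
             W[0, 1] * Y[0] + W[1, 1] * Y[1]]
    A_hat = (U * lam) @ W
    return A_hat, X_hat
```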

SLIDE 44

Asymptotic Freeness

Let (X, φ) be a non-commutative probability space and fix a positive integer n ≥ 1. For each i ∈ I, let X_i ⊂ X be a unital subalgebra. The subalgebras (X_i)_{i∈I} are called freely independent (or simply free) if, for all k ≥ 1, φ(x1 · · · xk) = 0 whenever φ(x_j) = 0 for all j = 1, · · · , k, and neighboring elements are from different subalgebras, i.e. x_j ∈ X_{i(j)} with i(1) ≠ i(2), i(2) ≠ i(3), · · · , i(k − 1) ≠ i(k).

  • Analog of E[x1 x2 x3] = 0 whenever E[x1] = 0, E[x2] = 0 and E[x3] = 0

Random matrix connection:

  φ(x_i x_j^*) = lim_{n→∞} E[ (1/n) Tr(X_i X_j^T) ] ≈ (1/n) Tr(X_i X_j^*)
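A rough numeric illustration of asymptotic freeness (our own sketch): two Wigner matrices in "generic position" satisfy the freeness prediction φ(abab) = 0 for centered free elements, while the same matrix paired with itself, with an identical spectrum, does not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1500
phi = lambda X: np.trace(X) / n   # the normalized trace from the slide

A = rng.standard_normal((n, n)) / np.sqrt(n)
A = (A + A.T) / np.sqrt(2)                      # Wigner matrix, phi(A) ~ 0
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
B = Q @ A @ Q.T                                 # same spectrum, random position

# For free, centered a and b, the definition forces phi(abab) = 0.
print(phi(A @ B @ A @ B))   # approx 0: A and B are asymptotically free
print(phi(A @ A @ A @ A))   # approx 2: A is maximally "unfree" of itself
```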

SLIDE 45

(Asymptotically) Free vs UnFree

Free: Singular vectors of the matrix pair are "incoherently related" (as incoherent as can be); this can be relaxed.
UnFree: Singular vectors of the matrix pair are "coherently related".

  • Insight: FCA failure/success ⇔ Trump/Clinton vs. Berlin/NYC vs. Panda/Hedgehog

SLIDE 46

Making FCA great again

  • Insight: Is there a sub-matrix (pair) that is "more" free? Question: Where?

SLIDE 47

ICA vs FCA

Mixing model: (y1, . . . , ys)^T = A (x1, . . . , xs)^T

FCA: Find s directions that maximize the absolute value of the free kurtosis (+ a bit more)
ICA: Find s directions that maximize the absolute value of the classical kurtosis (+ a bit more)

  • FCA ⇔ Matrices
  • ICA ⇔ Scalars
  • Note: both FCA and ICA can be applied to the same data!
  • Question: Which unmixes better?

SLIDE 48

Unmixed Images: FCA vs ICA

SLIDE 49

ICA vs FCA

SLIDE 50

ICA vs FCA: FCA wins!

SLIDE 51

ICA vs FCA: Objective functions

  • Insight: scalar κ4 ≈ 0, matrix κ4 ≫ 0 ⇒ Power of embedding!
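One way to see this insight numerically (our own sketch, reusing free_kurtosis from the SLIDE 10 sketch): a Haar orthogonal matrix has entries that look Gaussian one at a time, so the scalar kurtosis ICA relies on is near 0, while its matrix free kurtosis is far from 0 in absolute value.

```python
import numpy as np
from scipy.stats import kurtosis   # classical (excess) kurtosis

rng = np.random.default_rng(5)
n = 1000
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]   # Haar orthogonal matrix
G = rng.standard_normal((n, n)) / np.sqrt(n)       # i.i.d. Gaussian matrix

# Scalar view: the entries of both matrices have excess kurtosis near 0,
# so the classical kappa_4 cannot tell them apart.
print(kurtosis(Q.ravel()), kurtosis(G.ravel()))    # both approx 0

# Matrix view: every singular value of Q equals 1, so its free kurtosis is
# approx 1 - 2 = -1, far from the approx 0 of the Gaussian matrix.
print(free_kurtosis(Q), free_kurtosis(G))
```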

SLIDE 52

FCA: Free entropy vs Free Kurtosis Maximization

SLIDE 53

ICA vs FCA

  • Insight: FCA with the free entropy objective ≫ FCA with the free kurtosis objective!

SLIDE 54

Application: FCA to unmix speech

  • How to apply FCA to mixed speech signals?

SLIDE 55

FCA with Embedding

  • Embed signals into matrices via Fourier analysis (a sketch follows)
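A hedged sketch of one such embedding: the short-time Fourier transform turns a 1-D signal into a frequency-by-time matrix. The talk does not pin down the exact embedding; the use of scipy, the sample rate, and nperseg are our assumptions.

```python
import numpy as np
from scipy.signal import stft

fs = 16_000                              # assumed sample rate
rng = np.random.default_rng(6)
x = rng.standard_normal(4 * fs)          # stand-in for a speech signal

# STFT embedding: X is a complex frequency x time matrix (a spectrogram),
# the kind of matrix-valued object FCA can operate on.
f, t, X = stft(x, fs=fs, nperseg=512)
print(X.shape)
```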

SLIDE 56

Big picture: Free Component Analysis (FCA)

Mixing model: (y1, . . . , ys)^T = A (x1, . . . , xs)^T

FCA: Unmix by finding s directions that maximize the absolute free kurtosis (+ a bit more)

  • Recovery guarantees
  • Free entropy FCA ≫ free kurtosis FCA
  • Manifold optimization is integral to FCA
  • FCA via an eigenvalue (or tensor) computation
  • Math surprise: Matrices in the wild are free-er than we might fear!

SLIDE 57

Application Prologue

SLIDE 58

Physics magic: (Not-so-free) Polarizers

SLIDE 59

Math Magic: Free probability
