Approximate Factor Analysis Models, Lorenzo Finesso, Peter Spreij



slide-1
SLIDE 1

Approximate Factor Analysis Models

Lorenzo Finesso, Peter Spreij Brixen – July 19, 2007

slide-2
SLIDE 2

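\[ P_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}, \qquad P_2 = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

\[ P_2 P_1 = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} \]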
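The product on this slide can be checked numerically. The sketch below assumes the matrices read as 3 x 3 arrays with the zero entries restored; the idempotency check is an added observation, not something stated on the slide.

```python
import numpy as np

# Matrices as read from the slide (zero entries restored).
P1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.5, 0.5]])
P2 = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.0, 1.0]])

product = P2 @ P1
print(product)


def is_idempotent(M):
    """True when M @ M equals M up to rounding."""
    return np.allclose(M @ M, M)


# Each factor is idempotent, while their product is not.
flags = (is_idempotent(P1), is_idempotent(P2), is_idempotent(product))
print(flags)
```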

slide-3
SLIDE 3

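Factor Analysis models

\[ Y = HX + \varepsilon \]

where $X \in \mathbb{R}^k$ and $\varepsilon \in \mathbb{R}^n$ are independent zero-mean normal vectors ($k < n$), with

\[ \operatorname{Cov}(X) = I, \qquad \operatorname{Cov}(\varepsilon) = D > 0 \ \text{diagonal}, \]

therefore

\[ \operatorname{Cov}(Y) := \Sigma_0 = HH^\top + D, \qquad \operatorname{Cov}(Y \mid X) = D \ \text{diagonal}. \]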
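A minimal simulation sketch of the model on this slide. The sizes, the seed, and the particular H and D are illustrative assumptions; the point is only that the sample covariance of Y = HX + ε approaches HH⊤ + D.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2                                # illustrative sizes with k < n

H = rng.normal(size=(n, k))                # hypothetical loading matrix
D = np.diag([0.5, 1.0, 1.5, 2.0])          # hypothetical diagonal noise covariance

sigma0 = H @ H.T + D                       # model covariance of Y

# Draw samples of Y = H X + eps with Cov(X) = I and Cov(eps) = D.
m = 200_000
X = rng.normal(size=(m, k))
eps = rng.normal(size=(m, n)) * np.sqrt(np.diag(D))
Y = X @ H.T + eps

sample_cov = Y.T @ Y / m                   # zero-mean model, so no centering
max_abs_error = np.abs(sample_cov - sigma0).max()
print(max_abs_error)
```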

slide-4
SLIDE 4

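Exact (weak) realization of FA models

Problem. Given the positive definite covariance matrix $\Sigma_0 \in \mathbb{R}^{n \times n}$ and the integer $k < n$, find $(H, D)$ such that

\[ H \in \mathbb{R}^{n \times k}, \qquad D > 0 \ \text{diagonal}, \ n \times n, \qquad \Sigma_0 = HH^\top + D. \]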

slide-5
SLIDE 5

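Informational divergence between normal measures

Given probability measures $P_1 \ll P_2$ on the same space,

\[ D(P_1 \| P_2) = E_{P_1} \log \frac{dP_1}{dP_2}. \]

Normal case: $P_1 = N(0, \Sigma_1)$, $P_2 = N(0, \Sigma_2)$ on $\mathbb{R}^n$,

\[ D(P_1 \| P_2) := D(\Sigma_1 \| \Sigma_2) = \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2} \operatorname{tr}(\Sigma_2^{-1} \Sigma_1) - \frac{n}{2}. \]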
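The normal-case formula translates directly into a few lines of NumPy. This is a sketch; the function name `gaussian_divergence` and the two test matrices are illustrative choices, not part of the slides.

```python
import numpy as np


def gaussian_divergence(sigma1, sigma2):
    """D(Sigma1 || Sigma2) for zero-mean normal laws on R^n."""
    n = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))   # tr(Sigma2^{-1} Sigma1)
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * n


sigma_a = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma_b = np.eye(2)

d_same = gaussian_divergence(sigma_a, sigma_a)   # zero when both arguments agree
d_ab = gaussian_divergence(sigma_a, sigma_b)
d_ba = gaussian_divergence(sigma_b, sigma_a)     # differs from d_ab: not symmetric
print(d_same, d_ab, d_ba)
```

Using `slogdet` and `solve` avoids forming determinants and inverses explicitly, which is a numerical convenience rather than anything the slide prescribes.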

slide-6
SLIDE 6

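Approximate FA models

Problem. Given $\Sigma_0 \in \mathbb{R}^{n \times n}$ positive definite and the integer $k < n$, minimize

\[ D(\Sigma_0 \| HH^\top + D) = \frac{1}{2} \log \frac{|HH^\top + D|}{|\Sigma_0|} + \frac{1}{2} \operatorname{tr}\big((HH^\top + D)^{-1} \Sigma_0\big) - \frac{n}{2} \]

over $(H, D)$, where $H \in \mathbb{R}^{n \times k}$ and $D > 0$ is diagonal of size $n$.

Proposition. The approximate FA problem admits a (nonunique) solution.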
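One elementary source of nonuniqueness (an added remark, not spelled out on the slide) is that HH⊤ does not change when H is multiplied on the right by an orthogonal matrix, so the objective takes the same value at (H, D) and (HU, D). A sketch with hypothetical data:

```python
import numpy as np


def gaussian_divergence(sigma1, sigma2):
    """D(Sigma1 || Sigma2) for zero-mean normal laws on R^n."""
    n = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * n


def fa_objective(sigma0, H, d):
    """D(Sigma0 || H H^T + D), with d holding the diagonal of D."""
    return gaussian_divergence(sigma0, H @ H.T + np.diag(d))


rng = np.random.default_rng(1)
n, k = 5, 2
A = rng.normal(size=(n, n))
sigma0 = A @ A.T + n * np.eye(n)        # hypothetical positive definite target
H = rng.normal(size=(n, k))
d = rng.uniform(0.5, 2.0, size=n)

theta = 0.7                             # arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])   # orthogonal k x k matrix

value = fa_objective(sigma0, H, d)
rotated_value = fa_objective(sigma0, H @ U, d)
print(value, rotated_value)
```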

slide-7
SLIDE 7

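Lifted version of the problem

Definitions

\[ \boldsymbol{\Sigma} = \left\{ \Sigma \in \mathbb{R}^{(n+k) \times (n+k)} : \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} > 0 \right\} \]

with $\Sigma_{11}$ of size $n \times n$. Two subsets of $\boldsymbol{\Sigma}$ will play a special role:

\[ \boldsymbol{\Sigma}_0 = \{ \Sigma \in \boldsymbol{\Sigma} : \Sigma_{11} = \Sigma_0 \} \]

\[ \boldsymbol{\Sigma}_1 = \left\{ \Sigma \in \boldsymbol{\Sigma} : \Sigma = \begin{pmatrix} HH^\top + D & HQ \\ (HQ)^\top & Q^\top Q \end{pmatrix} \right\} \]

Elements of $\boldsymbol{\Sigma}_1$ will often be denoted by $\Sigma(H, D, Q)$.

Remark. $Y \sim N(0, \Sigma_0)$ admits an exact FA model of size $k$ iff

\[ \boldsymbol{\Sigma}_0 \cap \boldsymbol{\Sigma}_1 \neq \emptyset. \]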
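A short sketch of how an element Σ(H, D, Q) of the second subset can be assembled, and how membership in the two subsets can be tested when Σ0 has an exact FA model. The helper name `lifted_sigma` and all numerical values are assumptions made for illustration.

```python
import numpy as np


def lifted_sigma(H, d, Q):
    """Assemble Sigma(H, D, Q) with D = diag(d) as an (n+k) x (n+k) matrix."""
    top = np.hstack([H @ H.T + np.diag(d), H @ Q])
    bottom = np.hstack([(H @ Q).T, Q.T @ Q])
    return np.vstack([top, bottom])


rng = np.random.default_rng(2)
n, k = 4, 2
H = rng.normal(size=(n, k))
d = rng.uniform(0.5, 2.0, size=n)
Q = np.array([[2.0, 1.0],
              [0.0, 1.0]])                 # hypothetical invertible k x k matrix

sigma = lifted_sigma(H, d, Q)
sigma0 = H @ H.T + np.diag(d)              # a covariance with an exact FA model

is_positive_definite = bool(np.linalg.eigvalsh(sigma).min() > 0)
has_block_sigma0 = bool(np.allclose(sigma[:n, :n], sigma0))
print(is_positive_definite, has_block_sigma0)
```

Here the same matrix lies in both subsets, which is the situation described in the remark.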

slide-8
SLIDE 8

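Lifted problem

Problem.

\[ \min_{\Sigma' \in \boldsymbol{\Sigma}_0,\ \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma' \| \Sigma_1) \]

Proposition. Let $\Sigma_0$ be given. It holds that

\[ \min_{H, D} D(\Sigma_0 \| HH^\top + D) = \min_{\Sigma' \in \boldsymbol{\Sigma}_0,\ \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma' \| \Sigma_1). \]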

slide-9
SLIDE 9

First partial minimization

Problem.

\[ \min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma' \| \Sigma) \]

This problem has a unique solution.

slide-10
SLIDE 10

First partial minimization - general solution

Proposition. Let $(Y, X) \sim Q = Q^{Y,X}$ and let

\[ \mathcal{P} = \{ P = P^{Y,X} : P^{Y} = P_0 \} \]

for a given $P_0 \ll Q^{Y}$. Then

\[ \min_{P \in \mathcal{P}} D(P \| Q) = D(P^* \| Q), \]

where $P^*$ is given by $P^{*Y} = P_0$ and $P^{*X|Y} = Q^{X|Y}$. Moreover, for any $P \in \mathcal{P}$, one has the Pythagorean law

\[ D(P \| Q) = D(P \| P^*) + D(P^* \| Q). \]
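The proposition can be illustrated on a finite alphabet, where the Pythagorean law is easy to verify numerically. The sketch assumes strictly positive discrete distributions; the alphabet sizes and the marginal P0 are arbitrary choices.

```python
import numpy as np


def kl(p, q):
    """D(p || q) for strictly positive discrete distributions of equal shape."""
    return float(np.sum(p * np.log(p / q)))


rng = np.random.default_rng(3)


def random_joint(shape):
    """A strictly positive joint probability table."""
    w = rng.uniform(0.1, 1.0, size=shape)
    return w / w.sum()


Q = random_joint((3, 4))            # joint law of (Y, X): rows index y, columns x
P0 = np.array([0.2, 0.5, 0.3])      # prescribed marginal of Y

# P*: Y-marginal P0, conditional law of X given Y copied from Q.
P_star = P0[:, None] * (Q / Q.sum(axis=1, keepdims=True))

# Another P with the same Y-marginal but a different conditional law.
other = random_joint((3, 4))
P = P0[:, None] * (other / other.sum(axis=1, keepdims=True))

lhs = kl(P, Q)
rhs = kl(P, P_star) + kl(P_star, Q)
print(lhs, rhs)
```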

slide-11
SLIDE 11

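First partial minimization, normal case

Proposition. Let $Q \sim N(0, \Sigma)$ and $P_0 \sim N(0, \Sigma_0)$, where $\Sigma \in \boldsymbol{\Sigma}$ and $\Sigma_0 \in \mathbb{R}^{n \times n}$. Then $\min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma' \| \Sigma)$ is attained by $P^* \sim N(0, \Sigma^*)$ with

\[ \Sigma^* = \begin{pmatrix} \Sigma_0 & \Sigma_0 \Sigma_{11}^{-1} \Sigma_{12} \\ \Sigma_{21} \Sigma_{11}^{-1} \Sigma_0 & \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} (\Sigma_{11} - \Sigma_0) \Sigma_{11}^{-1} \Sigma_{12} \end{pmatrix}. \]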
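A sketch of the closed-form minimizer Σ*, together with a numerical check of the Pythagorean law against one competitor whose upper-left block is also Σ0. Function names and test matrices are hypothetical.

```python
import numpy as np


def gaussian_divergence(sigma1, sigma2):
    """D(Sigma1 || Sigma2) for zero-mean normal laws."""
    m = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * m


def first_partial_min(sigma, sigma0):
    """Sigma* minimizing D(Sigma' || Sigma) over Sigma' with upper-left block Sigma0."""
    n = sigma0.shape[0]
    s11, s12, s22 = sigma[:n, :n], sigma[:n, n:], sigma[n:, n:]
    G = np.linalg.solve(s11, s12)                   # Sigma11^{-1} Sigma12
    upper_right = sigma0 @ G
    lower_right = s22 - G.T @ (s11 - sigma0) @ G
    return np.block([[sigma0, upper_right], [upper_right.T, lower_right]])


rng = np.random.default_rng(4)
n, k = 3, 2


def random_pd(m):
    """A hypothetical positive definite m x m matrix."""
    A = rng.normal(size=(m, m))
    return A @ A.T + m * np.eye(m)


sigma = random_pd(n + k)
sigma0 = random_pd(n)
sigma_star = first_partial_min(sigma, sigma0)

# Competitor with the same upper-left block Sigma0 but another conditional law.
B = rng.normal(size=(n, k))
competitor = np.block([[sigma0, sigma0 @ B],
                       [B.T @ sigma0, random_pd(k) + B.T @ sigma0 @ B]])

lhs = gaussian_divergence(competitor, sigma)
rhs = gaussian_divergence(competitor, sigma_star) + gaussian_divergence(sigma_star, sigma)
print(lhs, rhs)
```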

slide-12
SLIDE 12

Second partial minimization

Problem.

\[ \min_{\Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma \| \Sigma_1) \]

This problem has a unique solution $\Sigma_1^* = \Sigma(H^*, D^*, Q^*)$.

slide-13
SLIDE 13

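Second partial minimization, normal case

Notation. For a square matrix $M$, let $\Delta(M)$ be the diagonal matrix with $\Delta(M)_{ii} = M_{ii}$.

Proposition. An optimal point is $(H^*, D^*, Q^*)$ with

\[ H^* = \Sigma_{12} \Sigma_{22}^{-1/2}, \qquad D^* = \Delta(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}), \qquad Q^* = \Sigma_{22}^{1/2}, \]

thus

\[ \Sigma_1^* = \begin{pmatrix} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} + \Delta(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}) & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \]

Moreover,

\[ D(\Sigma \| \Sigma(H, D, Q)) = D(\Sigma \| \Sigma_1^*) + D(\Sigma_1^* \| \Sigma(H, D, Q)) \]

for any $\Sigma(H, D, Q) \in \boldsymbol{\Sigma}_1$.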
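A sketch of the optimal point (H*, D*, Q*), using the symmetric square root of Σ22 computed by an eigendecomposition (one possible choice of root, assumed here). The Pythagorean law is then checked against an arbitrary hypothetical Σ(H, D, Q).

```python
import numpy as np


def gaussian_divergence(sigma1, sigma2):
    """D(Sigma1 || Sigma2) for zero-mean normal laws."""
    m = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * m


def lifted_sigma(H, d, Q):
    """Assemble Sigma(H, D, Q) with D = diag(d)."""
    return np.block([[H @ H.T + np.diag(d), H @ Q], [(H @ Q).T, Q.T @ Q]])


def second_partial_min(sigma, n):
    """Return (H*, d*, Q*); d* is the diagonal of D*, Q* the symmetric root."""
    s11, s12, s22 = sigma[:n, :n], sigma[:n, n:], sigma[n:, n:]
    w, V = np.linalg.eigh(s22)
    Q = (V * np.sqrt(w)) @ V.T                      # Sigma22^{1/2}
    H = s12 @ (V / np.sqrt(w)) @ V.T                # Sigma12 Sigma22^{-1/2}
    d = np.diag(s11 - s12 @ np.linalg.solve(s22, s12.T))
    return H, d, Q


rng = np.random.default_rng(5)
n, k = 3, 2
A = rng.normal(size=(n + k, n + k))
sigma = A @ A.T + (n + k) * np.eye(n + k)           # hypothetical Sigma > 0

H_star, d_star, Q_star = second_partial_min(sigma, n)
sigma1_star = lifted_sigma(H_star, d_star, Q_star)

# Arbitrary competitor Sigma(H, D, Q) in the second subset.
other = lifted_sigma(rng.normal(size=(n, k)),
                     rng.uniform(0.5, 2.0, size=n),
                     np.array([[2.0, 1.0], [0.0, 1.0]]))

lhs = gaussian_divergence(sigma, other)
rhs = gaussian_divergence(sigma, sigma1_star) + gaussian_divergence(sigma1_star, other)
print(lhs, rhs)
```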

slide-14
SLIDE 14

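Alternating minimization algorithm

Given $\Sigma_0 > 0$, pick $(H_0, D_0, Q_0)$, let $\Sigma_1^{(0)} = \Sigma(H_0, D_0, Q_0)$, and construct the sequence

\[ \Sigma_1^{(0)} \longrightarrow \Sigma'^{(1)} \longrightarrow \Sigma_1^{(1)} \longrightarrow \Sigma'^{(2)} \longrightarrow \Sigma_1^{(2)} \longrightarrow \cdots \]

where

\[ D(\Sigma'^{(t+1)} \| \Sigma_1^{(t)}) = \min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma' \| \Sigma_1^{(t)}) \]

and

\[ D(\Sigma'^{(t+1)} \| \Sigma_1^{(t+1)}) = \min_{\Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma'^{(t+1)} \| \Sigma_1). \]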

slide-15
SLIDE 15

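Algorithm

At the $t$-th iteration the matrices $H_t$, $D_t$ and $Q_t$ are available.

\[ Q_{t+1} = \Big( Q_t^\top Q_t - Q_t^\top H_t^\top (H_t H_t^\top + D_t)^{-1} H_t Q_t + Q_t^\top H_t^\top (H_t H_t^\top + D_t)^{-1} \Sigma_0 (H_t H_t^\top + D_t)^{-1} H_t Q_t \Big)^{1/2} \]

\[ H_{t+1} = \Sigma_0 (H_t H_t^\top + D_t)^{-1} H_t Q_t Q_{t+1}^{-1} \]

\[ D_{t+1} = \Delta(\Sigma_0 - H_{t+1} H_{t+1}^\top) \]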
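One iteration of these update rules can be sketched as follows. The square root is taken as the symmetric root (an assumption), and the starting point and Σ0 are hypothetical.

```python
import numpy as np


def gaussian_divergence(sigma1, sigma2):
    """D(Sigma1 || Sigma2) for zero-mean normal laws."""
    m = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * m


def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T


def am_step(sigma0, H, d, Q):
    """Map (H_t, D_t, Q_t) to (H_{t+1}, D_{t+1}, Q_{t+1}); d is the diagonal of D_t."""
    S = H @ H.T + np.diag(d)
    G = np.linalg.solve(S, H @ Q)                   # (H H^T + D)^{-1} H Q
    Q_next = sym_sqrt(Q.T @ Q - (H @ Q).T @ G + G.T @ sigma0 @ G)
    H_next = sigma0 @ G @ np.linalg.inv(Q_next)
    d_next = np.diag(sigma0 - H_next @ H_next.T)
    return H_next, d_next, Q_next


rng = np.random.default_rng(6)
n, k = 5, 2
A = rng.normal(size=(n, n))
sigma0 = A @ A.T + n * np.eye(n)                    # hypothetical Sigma0 > 0
H0, d0, Q0 = rng.normal(size=(n, k)), np.ones(n), np.eye(k)

H1, d1, Q1 = am_step(sigma0, H0, d0, Q0)
before = gaussian_divergence(sigma0, H0 @ H0.T + np.diag(d0))
after = gaussian_divergence(sigma0, H1 @ H1.T + np.diag(d1))
print(before, after)
```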

slide-16
SLIDE 16

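Algorithm

Notice. The update rules can be written in terms of $(H_t, D_t)$ only:

\[ R_t = I - H_t^\top (H_t H_t^\top + D_t)^{-1} (H_t H_t^\top + D_t - \Sigma_0) (H_t H_t^\top + D_t)^{-1} H_t \]

\[ H_{t+1} = \Sigma_0 (H_t H_t^\top + D_t)^{-1} H_t R_t^{-1/2} \]

\[ D_{t+1} = \Delta(\Sigma_0 - H_{t+1} H_{t+1}^\top) \]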
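A sketch of the full iteration in the (H, D)-only form, tracking the objective along the way. The symmetric inverse square root of R_t, the fixed iteration count of 50, and the random starting point are all assumptions made for illustration; no stopping rule is given on the slide.

```python
import numpy as np


def gaussian_divergence(sigma1, sigma2):
    """D(Sigma1 || Sigma2) for zero-mean normal laws."""
    m = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1) + 0.5 * trace_term - 0.5 * m


def am_step_hd(sigma0, H, d):
    """Map (H_t, D_t) to (H_{t+1}, D_{t+1}); d is the diagonal of D_t."""
    S = H @ H.T + np.diag(d)
    G = np.linalg.solve(S, H)                       # (H H^T + D)^{-1} H
    R = np.eye(H.shape[1]) - G.T @ (S - sigma0) @ G
    w, V = np.linalg.eigh(R)
    R_inv_sqrt = (V / np.sqrt(w)) @ V.T             # symmetric R^{-1/2}
    H_next = sigma0 @ G @ R_inv_sqrt
    d_next = np.diag(sigma0 - H_next @ H_next.T)
    return H_next, d_next


rng = np.random.default_rng(6)
n, k = 5, 2
A = rng.normal(size=(n, n))
sigma0 = A @ A.T + n * np.eye(n)                    # hypothetical Sigma0 > 0
H = rng.normal(size=(n, k))
d = np.ones(n)

objective = [gaussian_divergence(sigma0, H @ H.T + np.diag(d))]
for _ in range(50):                                 # fixed number of sweeps
    H, d = am_step_hd(sigma0, H, d)
    objective.append(gaussian_divergence(sigma0, H @ H.T + np.diag(d)))

print(objective[0], objective[-1])
```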

slide-17
SLIDE 17

Some properties of the algorithm

Proposition.
(a) $D_t > 0$
(b) $R_t$ is invertible
(c) If $H_0$ is of full column rank, so is $H_t$
(e) If $\Sigma_0 = H_t H_t^\top + D_t$ the algorithm stops
(f) The objective function decreases at each iteration
(g) The limit points $(H, D)$ of the algorithm satisfy the relations

\[ H = (\Sigma_0 - HH^\top) D^{-1} H, \qquad D = \Delta(\Sigma_0 - HH^\top) \]
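Properties (e) and (g) can be checked on a covariance that has an exact FA model: in that case the relations in (g) hold and the (H, D)-only update leaves H unchanged. A sketch with hypothetical H and D:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 5, 2
H = rng.normal(size=(n, k))
d = rng.uniform(0.5, 2.0, size=n)          # diagonal of D
sigma0 = H @ H.T + np.diag(d)              # Sigma0 with an exact FA model

# Relations in (g): H = (Sigma0 - H H^T) D^{-1} H and D = Delta(Sigma0 - H H^T).
residual = sigma0 - H @ H.T
h_relation_gap = np.abs(residual @ (H / d[:, None]) - H).max()
d_relation_gap = np.abs(np.diag(residual) - d).max()

# Property (e): here R_t equals the identity, so H_{t+1} = H_t.
S = H @ H.T + np.diag(d)
G = np.linalg.solve(S, H)                  # (H H^T + D)^{-1} H
R = np.eye(k) - G.T @ (S - sigma0) @ G
H_next = sigma0 @ G                        # R^{-1/2} is the identity here
print(h_relation_gap, d_relation_gap)
```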