
slide-1
SLIDE 1

Matrix Factorization with Binary Components – Uniqueness in a randomized model

Felix Krahmer, TU München

Joint work with: Matthias Hein, Saarland University, David James, University of Göttingen
slide-2
SLIDE 2

Matrix Factorization

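Given a data matrix $D \in \mathbb{R}^{m \times n}$, where $n$ is the number of data points and $m$ the number of features, find matrices $T \in \mathbb{R}^{m \times r}$ and $A \in \mathbb{R}^{r \times n}$ such that

$D = TA$ (exact case)

or

$\min_{T \in \mathbb{R}^{m \times r},\, A \in \mathbb{R}^{r \times n}} \|D - TA\|_F^2$ (approximate case),

where $r$ is typically small.

Globally optimal solution: Singular Value Decomposition (SVD)

$D = U \Sigma V^T \implies T = U\Sigma,\ A = V^T.$

The best rank-$r$ approximation is obtained by keeping the top $r$ singular values.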

Problem: Factors often lack interpretation
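As a minimal sketch of the SVD route on this slide, the following NumPy snippet builds $T = U\Sigma$ and $A = V^T$ from the top $r$ singular triplets. The function name `svd_factors` and the random test matrix are illustrative assumptions, not part of the talk.

```python
import numpy as np

def svd_factors(D, r):
    """Return T = U_r * Sigma_r and A = V_r^T from the rank-r truncated SVD of D."""
    U, sigma, Vt = np.linalg.svd(D, full_matrices=False)
    T = U[:, :r] * sigma[:r]   # scale the first r left singular vectors
    A = Vt[:r, :]              # first r right singular vectors, as rows
    return T, A, sigma

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 10))   # hypothetical data: m = 6 features, n = 10 points
r = 2
T, A, sigma = svd_factors(D, r)

# The Frobenius error of the truncation equals the norm of the discarded singular values.
err = np.linalg.norm(D - T @ A, "fro")
tail = np.sqrt(np.sum(sigma[r:] ** 2))
print(err, tail)

# The factors generally contain entries of both signs, which is why they are hard to interpret.
print((T < 0).any(), (A < 0).any())
```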

Felix Krahmer, TUM Matrix Factorization with Binary Components 2 of 23

slide-3
SLIDE 3

Nonnegative Matrix Factorization (NMF)

Given a data matrix $D \in \mathbb{R}^{m \times n}$, find matrices $T \in \mathbb{R}_+^{m \times r}$ and $A \in \mathbb{R}_+^{r \times n}$ such that

$D = TA$

or

$\min_{T \in \mathbb{R}_+^{m \times r},\, A \in \mathbb{R}_+^{r \times n}} \|D - TA\|_F^2.$

(Figure taken from Lee, Seung: Learning the parts of objects by NMF, Nature (1999))

slide-4
SLIDE 4

Nonnegative Matrix Factorization (NMF)


Prior work:

NMF is used for finding latent factors/components $T$.

It is solved via alternating least squares, but convergence can only be proven to a critical point ⇒ no guarantee of finding the global optimum.

In 2012, Arora, Ge, Kannan, and Moitra proposed an algorithm for exact NMF with runtime $O((nm)^{r^2})$.

In the case where $T$ is separable, the algorithm runs in polynomial time (improved by Bittorf et al. (2013)).

Goal: extend the conditions on NMF under which a solution can be found efficiently.
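To illustrate the kind of local-search heuristic referred to under prior work, here is a small sketch. It deliberately uses the Lee-Seung multiplicative updates rather than alternating least squares, because they need only NumPy; like ALS, they come without a guarantee of reaching the global optimum. The function name, iteration count, and test data are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(D, r, iters=200, seed=0, eps=1e-12):
    """Heuristic NMF via multiplicative updates (not the ALS scheme named on the slide)."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    T = rng.random((m, r)) + 0.1   # strictly positive initialization
    A = rng.random((r, n)) + 0.1
    errors = [np.linalg.norm(D - T @ A, "fro")]
    for _ in range(iters):
        # Elementwise updates keep both factors nonnegative.
        A *= (T.T @ D) / (T.T @ T @ A + eps)
        T *= (D @ A.T) / (T @ A @ A.T + eps)
        errors.append(np.linalg.norm(D - T @ A, "fro"))
    return T, A, errors

rng = np.random.default_rng(1)
D = rng.random((8, 3)) @ rng.random((3, 12))   # hypothetical nonnegative rank-3 data
T, A, errors = nmf_multiplicative(D, r=3)
print(errors[0], errors[-1])
```

Different seeds can end in different factor pairs, which is the practical face of the "critical point only" caveat.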

slide-5
SLIDE 5

Gene expression data analysis

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.

[Figure: schematic factorization of a gene expression matrix with a binary 0/1 factor; labels: gene product, gene expression, genes]

Goal: Decompose gene expression data into functional processes

slide-8
SLIDE 8

Matrix Factorization with Binary Components

Our model:

$D = TA$ with $D \in \mathbb{R}^{m \times n}$, $T \in \{0,1\}^{m \times r}$, $A \in \mathbb{R}^{r \times n}$

Our Goal: factor $D = TA$

Assumptions:

$\operatorname{rank}(D) = r \ll m$,

$\operatorname{rank}(A) = r$, $\mathbf{1}^T A = \mathbf{1}^T$,

the columns of $T$ are affinely independent, i.e.

$\forall \lambda \in \mathbb{R}^r$ with $\lambda^T \mathbf{1}_r = 0$ and $T\lambda = 0 \implies \lambda = 0$.

slide-9
SLIDE 9

Key idea

Lemma

The affine hulls of $T$ and $D$ agree: $\operatorname{aff}(D) = \operatorname{aff}(T)$.

Illustration for $m = 3$: note that $\operatorname{aff}(D) \cap \{0,1\}^m = T$.

Theorem (Slawski, Hein, Lutsik (NIPS 2013))

Some exact factorization can be computed in $O(r m 2^r)$ by computing $\operatorname{aff}(T) \cap \{0,1\}^m = \operatorname{aff}(D) \cap \{0,1\}^m$.
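The key idea can be tried out on a toy instance. The sketch below enumerates all of $\{0,1\}^m$ and keeps the points lying in $\operatorname{aff}(D)$; this brute force costs $2^m$ membership tests and is not the $O(rm2^r)$ algorithm of the theorem. The function name, tolerance, and the particular $T$ and $A$ are assumptions chosen for illustration.

```python
import itertools
import numpy as np

def binary_points_in_affine_hull(D, tol=1e-9):
    """Return all v in {0,1}^m lying in the affine hull of the columns of D (brute force)."""
    m = D.shape[0]
    base = D[:, 0]
    directions = D[:, 1:] - base[:, None]   # spans aff(D) - base
    hits = []
    for bits in itertools.product((0.0, 1.0), repeat=m):
        v = np.array(bits) - base
        coef, *_ = np.linalg.lstsq(directions, v, rcond=None)
        if np.linalg.norm(directions @ coef - v) < tol:
            hits.append(tuple(int(b) for b in bits))
    return hits

# Hypothetical binary factor with affinely independent columns (m = 5, r = 3).
T = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(3), size=6).T     # columns sum to one, so 1^T A = 1^T
D = T @ A

hits = binary_points_in_affine_hull(D)
print(hits)   # the binary points of aff(D); here they are the columns of T
```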

slide-10
SLIDE 10

Uniqueness of the Factorization

Solutions are not guaranteed to be nonnegative ⇒ if two solutions exist, we may find one that is not nonnegative.

Uniqueness is crucial for the interpretability of the factors!

[Figure: the same data matrix $D$ written as $D = TA$ and, possibly, as $D = T'A'$ with a different binary matrix $T'$]

Factorization is unique if $\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \ldots, T_{:,r}\}$.

slide-11
SLIDE 11

Matrix Factorization with Random Binary Components

Our model:

$D = TA$ with $D \in \mathbb{R}^{m \times n}$, $T = (t_{ij}) \in \{0,1\}^{m \times r}$ a random matrix, $A \in \mathbb{R}^{r \times n}$, $\mathbf{1}^T A = \mathbf{1}^T$.

The entries $t_{ij}$ are drawn independently from $\{0,1\}$ with probabilities

$P[t_{ij} = 0] = p$ and $P[t_{ij} = 1] = 1 - p$.

Choose $p$ large to simulate sparse binary components.

Task: bound the probability that $\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \ldots, T_{:,r}\}$.
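For small dimensions, the probability in question can be estimated by simulation. The following sketch samples $T$ from the random model and tests by brute force whether the affine hull of its columns contains any further binary point. All function names and parameter values are hypothetical, and the enumeration over $\{0,1\}^m$ is only feasible for small $m$.

```python
import itertools
import numpy as np

def sample_T(m, r, p, rng):
    """Entries are 0 with probability p and 1 with probability 1 - p."""
    return (rng.random((m, r)) >= p).astype(float)

def hull_hits_only_columns(T, tol=1e-9):
    """True if aff(T) contains no point of {0,1}^m other than columns of T (needs r >= 2)."""
    m, r = T.shape
    base = T[:, 0]
    directions = T[:, 1:] - base[:, None]
    columns = {tuple(int(t) for t in T[:, j]) for j in range(r)}
    for bits in itertools.product((0, 1), repeat=m):
        if bits in columns:
            continue
        v = np.array(bits, dtype=float) - base
        coef, *_ = np.linalg.lstsq(directions, v, rcond=None)
        if np.linalg.norm(directions @ coef - v) < tol:
            return False   # found an extra binary point in the affine hull
    return True

def estimate_uniqueness(m, r, p, trials, seed=0):
    """Monte Carlo estimate of P[aff(T) ∩ {0,1}^m = columns of T]."""
    rng = np.random.default_rng(seed)
    wins = sum(hull_hits_only_columns(sample_T(m, r, p, rng)) for _ in range(trials))
    return wins / trials

print(estimate_uniqueness(m=6, r=3, p=0.7, trials=40))
```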

slide-14
SLIDE 14

Idea

Replace $T$ with $M$ taking values in $\{-1,+1\}$ with the same probability distribution:

$P[\operatorname{aff}(T) \cap \{0,1\}^m \neq \{T_{:,1}, \ldots, T_{:,r}\}] = P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \ldots, M_{:,r}\}]$

Define

$R_s = P[\exists x \in \mathbb{R}^r,\ |\operatorname{supp}(x)| = s,\ Mx \in \{-1,+1\}^m]$,

$P_s = P[\exists x \in \mathbb{R}^r,\ \operatorname{supp}(x) = \{1, \ldots, s\} : Mx \in \{-1,+1\}^m]$,

then

$P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \ldots, M_{:,r}\}] \le \sum_{s=2}^{r} R_s \le \sum_{s=2}^{r} \binom{r}{s} P_s$.
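The step from $T$ to $M$ can be spelled out as follows. This is a short derivation added for illustration; the map $\varphi$ is notation introduced here, not taken from the slides.

```latex
\[
M = 2T - \mathbf{1}_m \mathbf{1}_r^{T}, \qquad \varphi(v) = 2v - \mathbf{1}_m .
\]
% Entry 0 of T (probability p) becomes -1, entry 1 (probability 1-p) becomes +1.
For every $\lambda \in \mathbb{R}^r$ with $\mathbf{1}_r^{T}\lambda = 1$,
\[
M\lambda = 2T\lambda - \mathbf{1}_m(\mathbf{1}_r^{T}\lambda)
         = 2T\lambda - \mathbf{1}_m = \varphi(T\lambda),
\]
so $\varphi(\operatorname{aff}(T)) = \operatorname{aff}(M)$.
Since $\varphi$ is a bijection of $\mathbb{R}^m$ with
$\varphi(\{0,1\}^m) = \{-1,+1\}^m$ and $\varphi(T_{:,j}) = M_{:,j}$,
\[
\varphi\bigl(\operatorname{aff}(T) \cap \{0,1\}^m\bigr)
  = \operatorname{aff}(M) \cap \{-1,+1\}^m ,
\]
and the two uniqueness events coincide.
```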

slide-15
SLIDE 15

Odlyzko 1988

Theorem (Odlyzko 1988)

Let $M$ be a random $m \times r$ matrix whose entries are drawn independently from $\{-1,+1\}$ with equal probabilities ($p = 1/2$). If

$r \le m\left(1 - \frac{10}{\log m}\right)$,

then

$P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \ldots, M_{:,r}\}] \le \binom{r}{3} P_3 + O\left(\left(\frac{7}{10}\right)^m\right)$

with $P_3 = 4\left(\frac{3}{4}\right)^m$, as $m$ tends to infinity.

slide-17
SLIDE 17

Conjecture - Uniqueness under Random Sampling

Conjecture

Let $M$ be a random $m \times r$ matrix whose entries are drawn independently from $\{-1,+1\}$ with probabilities

$P[m_{ij} = -1] = p$ and $P[m_{ij} = 1] = 1 - p$.

If there is some fixed $\varepsilon > 0$ such that

$r < m(1 - \varepsilon)$,

then

$P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \ldots, M_{:,r}\}] \le \binom{r}{3} P_3 + o(P_3)$

with $P_3 = 4(1 - p(1-p))^m$, as $m$ tends to infinity.

Note that $1 - p(1-p) < 1$ for $p \in (0,1)$, and $1 - \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{3}{4}$ for $p = \frac{1}{2}$.
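To get a feeling for the conjectured rate, the leading term $\binom{r}{3} P_3$ can be evaluated numerically. The sketch below ignores the $o(P_3)$ remainder, and the values of $m$, $r$, and $p$ are arbitrary choices for illustration.

```python
from math import comb

def p3(m, p):
    """P_3 = 4 (1 - p(1 - p))^m, as stated in the conjecture."""
    return 4 * (1 - p * (1 - p)) ** m

def leading_term(m, r, p):
    """Conjectured leading term binom(r, 3) * P_3 (remainder o(P_3) ignored)."""
    return comb(r, 3) * p3(m, p)

for p in (0.5, 0.7, 0.9):
    base = 1 - p * (1 - p)   # equals 3/4 at p = 1/2 and approaches 1 as p -> 1
    print(p, base, leading_term(m=200, r=20, p=p))
```

The base $1 - p(1-p)$ moves towards 1 as $p$ grows, so sparser binary components make the leading term decay more slowly in $m$.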

slide-18
SLIDE 18

Partial result

Theorem (almost/work in progress)

Let $M$ be a random $m \times r$ matrix whose entries are drawn independently from $\{-1,+1\}$ with probabilities

$P[m_{ij} = -1] = p$ and $P[m_{ij} = 1] = 1 - p$.

If

$r \le 32$,

then

$P[\operatorname{aff}(M) \cap \{-1,+1\}^m \neq \{M_{:,1}, \ldots, M_{:,r}\}] \le \binom{r}{3} P_3 + o(P_3)$

with $P_3 = 4(1 - p(1-p))^m$, as $m$ tends to infinity.

As before, $1 - p(1-p) < 1$ for $p \in (0,1)$, and $1 - \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{3}{4}$.

slide-21
SLIDE 21

Sperner family and Sperner's Lemma

Definition (Sperner (1928))

A family of sets that does not include two sets $X$ and $Y$ with $X \subset Y$ is called a Sperner family.

Definition

Let $M \in \{-1,+1\}^{N \times s}$ be a binary matrix. Define $X_i = \{j \mid m_{ij} = +1\}$. We say that $M$ has Sperner rows if $\mathcal{A} = \{X_1, \ldots, X_N\}$ is a Sperner family.

Lemma (Hein, FK, James)

Let $M \in \{-1,+1\}^{N \times s}$ be a binary matrix, $x \in \mathbb{R}^s$ with $x_i \neq 0$ for all $i$, and $e \in \{\pm 1\}^N$. Then $Mx = e \implies D_e M D_{\operatorname{sgn}(x)}$ has Sperner rows, where $D_e$ and $D_{\operatorname{sgn}(x)}$ are the diagonal matrices with diagonals $e$ and $\operatorname{sgn}(x)$, respectively.
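The lemma can be checked on a small example. In the sketch below, the vector `x` and the way `M` is assembled (all sign rows $m$ with $\langle m, x\rangle \in \{\pm 1\}$) are assumptions chosen for illustration; `has_sperner_rows` reads $X \subset Y$ as a proper subset.

```python
import itertools
import numpy as np

def has_sperner_rows(M):
    """True if the sets X_i = {j : m_ij = +1} form a Sperner family."""
    family = {frozenset(np.flatnonzero(row == 1).tolist()) for row in M}
    return not any(X < Y for X in family for Y in family)   # '<' is proper subset

x = np.array([2, -1, 2])                      # all entries nonzero
rows = [m for m in itertools.product((-1, 1), repeat=len(x))
        if abs(int(np.dot(m, x))) == 1]       # keep rows with <m, x> in {-1, +1}
M = np.array(rows)
e = M @ x                                     # by construction e is in {-1, +1}^N

signed = np.diag(e) @ M @ np.diag(np.sign(x)) # D_e M D_sgn(x)
print(has_sperner_rows(M), has_sperner_rows(signed))
```

The reason behind the lemma is visible here: every row of `signed` has inner product 1 with the positive vector $|x|$, so no row's set of $+1$ positions can strictly contain another's.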

slide-24
SLIDE 24

Theorem

Theorem (Hein, FK, James)

Let $M \in \{-1,+1\}^{N \times s}$ be a random matrix whose entries are independent Bernoulli random variables with parameter $p$. Then the probability $P_s$ that there exist an $x \in (\mathbb{R} \setminus \{0\})^s$ and an $e \in \{\pm 1\}^N$ such that $Mx = e$ can be bounded from above by

$P_s \le 2^{s-1} \binom{s}{\lfloor \frac{s-1}{2} \rfloor} \cdot W_s^N$,

where $W_s$ is the sum of the $\binom{s}{\lfloor \frac{s-1}{2} \rfloor}$ largest and smallest monomials of the polynomial

$(p + (1-p))^s - \left(p^{s-1}(1-p) + p(1-p)^{s-1}\right) \cdot \mathbf{1}_{s \ge 3}$.

If $s$ is constant, then $P_s = O(W_s^N)$.

slide-27
SLIDE 27

Plot of $W_s$

[Plot: $W_2, \ldots, W_{12}$ as functions of $p \in [0.5, 1]$; vertical axis from 0.4 to 1]

slide-33
SLIDE 33

Bounding $P_s$, $s \ge 6$

Consider individual rows instead of all rows simultaneously to bound $P_s$.

Let $s \le \sqrt{m}$. Then

$P_s \le 2^s \binom{s+q}{s} L_{p,s}^{m-(s+q)} + Q_{s,q}$, where $M_S = \begin{pmatrix} A \\ B \end{pmatrix}$.

1. Divide the rows of the matrix $M_S$ into two parts: $A \in \{-1,+1\}^{(s+q) \times s}$ and $B \in \{-1,+1\}^{(m-s-q) \times s}$.
2. Condition on the event that $A$ has full rank; $Q_{s,q} = P[A \text{ does not have full rank}]$.
3. For some subset $\sigma \subset [s+q]$ of cardinality $s$, $A_\sigma$ is invertible ⇒ for each $\alpha \in \{-1,+1\}^s$, there exists a unique $x \in \mathbb{R}^S$ such that $A_\sigma x = \alpha$.
4. For each such $x$, check whether $Bx$ is also in $\{-1,+1\}^{m-s-q}$.
5. $L_{p,s} := \sup_{x:\, |\operatorname{supp}(x)| = s} P[\langle m, x \rangle \in \{-1,+1\}]$, where $m$ is a row of $M$.
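Steps 3 and 4 translate into an exact search on a given sign matrix. The sketch below is a simplified illustration: it looks for an invertible $s \times s$ block among all rows rather than only among the first $s+q$, and it handles only the full-rank case. The function name and both test matrices are assumptions made for this example.

```python
import itertools
import numpy as np

def find_pm1_combination(M, tol=1e-9):
    """Search for x with all entries nonzero such that M x lies in {-1,+1}^m.

    M is an m x s matrix with entries in {-1,+1}. Returns such an x, or None.
    Only the full-column-rank case is handled (assumption of this sketch).
    """
    m, s = M.shape
    # Step 3: find s rows forming an invertible block A_sigma.
    for sigma in itertools.combinations(range(m), s):
        A_sigma = M[list(sigma), :].astype(float)
        if abs(np.linalg.det(A_sigma)) > tol:
            break
    else:
        raise ValueError("M does not have full column rank")
    B = M[[i for i in range(m) if i not in sigma], :].astype(float)
    # Any valid x must solve A_sigma x = alpha for some sign pattern alpha.
    for alpha in itertools.product((-1.0, 1.0), repeat=s):
        x = np.linalg.solve(A_sigma, np.array(alpha))
        if np.any(np.abs(x) < tol):
            continue                                   # support must have size s
        # Step 4: the remaining rows must also map x into {-1,+1}.
        if np.all(np.abs(np.abs(B @ x) - 1.0) < tol):
            return x
    return None

M_yes = np.array([[1, 1, 1], [1, -1, 1], [-1, 1, 1], [1, -1, -1]])
M_no = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]])
x_yes = find_pm1_combination(M_yes)
x_no = find_pm1_combination(M_no)
print(x_yes, x_no)
```

Only $2^s$ candidate vectors are examined, which is the source of the factor $2^s$ in the bound; the factor $\binom{s+q}{s}$ accounts for the choice of $\sigma$.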

slide-36
SLIDE 36

The Littlewood-Offord Lemma

By similar arguments, we see that, for some $d$,

$Q_{s,q} \le \sum_{k=2}^{d} 2^{k-1} \binom{k}{\lfloor \frac{k-1}{2} \rfloor} \cdot W_k^m + s \sum_{k=d+1}^{s-1} \binom{s}{k} \binom{s+q}{k} L_{p,k+1}^{s+q-k}$.

Now choosing parameters $q = m/2$ and $s = O(1)$, it follows that

$P_s = O\left(\max\left\{W_2, \ldots, W_d, \sqrt{L_{p,d+1}}, \ldots, \sqrt{L_{p,s}}\right\}^m\right)$ as $m \to \infty$.

Key Problem: bound

$L_{p,s} = P[\langle m, x \rangle \in \{+1, -1\}]$, where $m$ is a row of $M$ and $|\operatorname{supp}(x)| = s$.

slide-37
SLIDE 37

The Littlewood-Offord Lemma

Theorem (Littlewood-Offord)

Let $x \in \mathbb{R}^n$, $x_i > 0$, $i = 1, \ldots, n$, and $y \in \mathbb{R}$. Then

$P\left[\sum_{j=1}^{n} \epsilon_j x_j = y\right] \le \max_k \binom{n}{k} p^k (1-p)^{n-k}$,

$P\left[\sum_{j=1}^{n} \epsilon_j x_j \in \{\pm y\}\right] \le \max_k \binom{n}{k} \left(p^k (1-p)^{n-k} + p^{n-k} (1-p)^k\right)$.

Bottleneck: $x_i > 0$
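The first inequality can be checked by enumeration for small $n$. The sketch below assumes the convention $P[\epsilon_j = +1] = p$ (the slides do not fix which sign carries probability $p$; the bound is symmetric under swapping $p$ and $1-p$) and uses integer weights so that sums compare exactly. Function names and weight vectors are illustrative.

```python
from collections import defaultdict
from itertools import product
from math import comb

def lo_bound(n, p):
    """Right-hand side max_k binom(n, k) p^k (1 - p)^(n - k)."""
    return max(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

def sum_distribution(x, p):
    """Exact distribution of sum_j eps_j x_j with P[eps_j = +1] = p (assumed convention)."""
    dist = defaultdict(float)
    for eps in product((1, -1), repeat=len(x)):
        k = eps.count(1)
        dist[sum(e * xi for e, xi in zip(eps, x))] += p**k * (1 - p) ** (len(x) - k)
    return dict(dist)

p = 0.7
for x in [(1, 1, 1, 1, 1), (1, 2, 3, 4, 5)]:
    worst = max(sum_distribution(x, p).values())   # largest point probability over all y
    print(x, worst, lo_bound(len(x), p))
```

For the all-ones vector the sum is determined by the number of positive signs, so the largest point probability attains the bound.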

slide-38
SLIDE 38

Generalized Littlewood-Offord Theorem

Theorem (Hein, FK, James - known before?)

Let $x \in \mathbb{R}^n$, $x_i \neq 0$, $i = 1, \ldots, n$, and $y \in \mathbb{R}$. If $n$ is even, then

$P\left[\sum_{j=1}^{n} \epsilon_j x_j = y\right] \le \sum_{0 \le j \le \frac{n}{2}} \binom{n/2}{j}^2 p^{2j} (1-p)^{n-2j}$.

If $n$ is odd and $p \ge \frac{1}{2}$, then

$P\left[\sum_{j=1}^{n} \epsilon_j x_j = y\right] \le \sum_{0 \le j \le \lfloor \frac{n}{2} \rfloor} \binom{\lfloor n/2 \rfloor}{j} \binom{\lceil n/2 \rceil}{j+1} p^{2j+1} (1-p)^{n-(2j+1)}$.

Both inequalities are sharp, and reduce to standard LO for $p = \frac{1}{2}$.

Still not enough: we need a bound on $P\left[\sum_{j=1}^{n} \epsilon_j x_j \in \{\pm y\}\right]$ (work in progress).

The union bound works, e.g., for $p < 0.85$, $s > 6$, or $p < 0.95$, $s > 32$.
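Both remarks after the theorem (sharpness and the reduction to standard LO at $p = \frac{1}{2}$) can be verified numerically for small $n$. The sketch assumes $P[\epsilon_j = +1] = p$; the extremal vectors used, roughly half entries $+1$ and half $-1$, are a choice made here for illustration and are not taken from the slides.

```python
from itertools import product
from math import comb, isclose

def glo_bound(n, p):
    """Right-hand side of the generalized bound (odd case stated for p >= 1/2)."""
    if n % 2 == 0:
        h = n // 2
        return sum(comb(h, j) ** 2 * p ** (2 * j) * (1 - p) ** (n - 2 * j)
                   for j in range(h + 1))
    lo, hi = n // 2, n - n // 2
    return sum(comb(lo, j) * comb(hi, j + 1) * p ** (2 * j + 1) * (1 - p) ** (n - 2 * j - 1)
               for j in range(lo + 1))

def point_prob(x, y, p):
    """P[sum_j eps_j x_j = y] by enumeration, with P[eps_j = +1] = p (assumed convention)."""
    total = 0.0
    for eps in product((1, -1), repeat=len(x)):
        if sum(e * xi for e, xi in zip(eps, x)) == y:
            k = eps.count(1)
            total += p**k * (1 - p) ** (len(x) - k)
    return total

p = 0.7
# Even n: x with n/2 entries +1 and n/2 entries -1 attains the bound at y = 0.
print(point_prob((1, 1, -1, -1), 0, p), glo_bound(4, p))
# Odd n: one more +1 than -1 attains the bound at y = 1.
print(point_prob((1, 1, 1, -1, -1), 1, p), glo_bound(5, p))
# At p = 1/2 the bound equals the classical binom(n, floor(n/2)) / 2^n.
print([isclose(glo_bound(n, 0.5), comb(n, n // 2) / 2**n) for n in range(2, 9)])
```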

slide-39
SLIDE 39

Generalized Littlewood-Offord Theorem II

Theorem (Hein, FK, James)

$P\left[\sum_{j=1}^{n} \epsilon_j x_j \in \{\pm y\}\right] \le \max_{0 \le n_- \le \lfloor \frac{n}{2} \rfloor} \sum_{l=0}^{n_-} \sum_{k=0}^{n-n_-} |A_k^l| \left(p^{k+n_- -l}(1-p)^{s-n_- -k-l} + (1-p)^{k+n_- -l} p^{s-n_- -k-l}\right)$

Here the $A_k^l$ are sets of cardinality $l$ in certain Sperner families.

$p = \frac{1}{2} \implies$ the bound equals $2^{-s} \binom{s}{\lfloor \frac{s}{2} \rfloor}$ for $s$ odd and $2^{-s} \binom{s}{\lfloor \frac{s}{2} - 1 \rfloor}$ for $s$ even.

If $p > \frac{1}{2}$, no closed analytic form is available; use a relaxation to a linear program and solve.

slide-42
SLIDE 42

Plots of $\min\{W_s, \sqrt{L_{p,s}}\}$

[Plot: $\min\{W_s, \sqrt{L_{p,s}}\}$ as functions of $p \in [0.5, 1]$ for $s = 2, \ldots, 32$; vertical axis from 0.2 to 1]

slide-43
SLIDE 43

Conclusion and Outlook

Conclusion:

Generalization of the Littlewood-Offord Lemma.

'Proof by picture' of uniqueness in binary matrix factorization under a random model for the binary component, as long as either $p$ is not too close to 1 or $r$ is not too large.

Outlook:

Further generalize the LO lemma to develop a proof for all $p$ and $r$.

slide-44
SLIDE 44

Thank you

Thank you for your attention. Questions?
