Bayesian Estimation of Low-rank Matrices

Pierre Alquier
Journées de Statistique du Sud, Barcelona, 09/06/2014

Outline: Introduction · Bayesian Reduced Rank Regression · Bayesian Matrix Completion

Introduction

Many problems arising in statistics and signal processing involve the estimation or recovery of a low-rank matrix:

  • PCA,
  • matrix completion for recommender systems,
  • reduced rank regression / multitask learning,
  • video processing: separation of a moving object from a static background,
  • quantum statistics, ...

Reduced rank regression

Consider m regression models Y_j = X B_j + ε_j with the same regressors X, where each Y_j is a column vector in R^n and X is n × p. Stacking the columns,

    (Y_1 | ... | Y_m) = X (B_1 | ... | B_m) + (ε_1 | ... | ε_m),
            Y                    B                    E

i.e. Y = XB + E, where Y is n × m, X is n × p, and B is p × m.

Economic theory: rank(B) ≪ min(p, m). Example:

  • M. R. Gibbons & W. Ferson (1985). Testing asset pricing models with changing expectations and an unobserved market portfolio. Journal of Financial Economics 14 217–236.
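As a concrete illustration (a hypothetical NumPy sketch, not part of the talk; all variable names and sizes are invented), the stacked model Y = XB + E with a low-rank coefficient matrix can be simulated directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m, k = 50, 10, 8, 2      # samples, regressors, responses, true rank

# Low-rank coefficient matrix: B = M N^T has rank at most k
M = rng.normal(size=(p, k))
N = rng.normal(size=(m, k))
B = M @ N.T                    # p x m

X = rng.normal(size=(n, p))    # common design matrix for all m regressions
E = rng.normal(scale=0.1, size=(n, m))
Y = X @ B + E                  # the m stacked regressions Y_j = X B_j + eps_j
```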


Bayesian Matrix Completion

            StarWars I   StarWars IV    π    ...
Claire          4            ?          3    ...
Nial            ?            4          ?    ...
Brendon         2            ?          4    ...
Andrew          ?            4          ?    ...
Adrian          1            ?          ?    ...
Jason           2            4          5    ...
Pierre          3            5          5    ...
...

  • J. Bennett & S. Lanning (2007). The Netflix Prize. Proceedings of KDD Cup and Workshop'07 3–6.

Parameter: B = (B_{i,j}), 1 ≤ i ≤ p, 1 ≤ j ≤ m, where B_{i,j} is the rating of user i for movie j.

Observations: Y_{i,j} = B_{i,j} + ε_{i,j}, (i, j) ∈ I ⊂ {1, ..., p} × {1, ..., m}.

Objective: estimate B by B̂, and advertise to user i the movies j with B̂_{i,j} ≃ 5.

Assumption: rank(B) ≪ min(m, p).
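A minimal simulation of this observation model (hypothetical NumPy sketch; the dimensions and noise level are made up, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, k, n = 30, 20, 3, 200    # users, movies, true rank, observed entries

# True low-rank ratings matrix B, rank <= k
B = rng.normal(size=(p, k)) @ rng.normal(size=(k, m))

# Observed cells: a random subset I of {1..p} x {1..m}, with additive noise
i_idx = rng.integers(0, p, size=n)
j_idx = rng.integers(0, m, size=n)
Y = B[i_idx, j_idx] + rng.normal(scale=0.1, size=n)
```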

(Non-Bayesian) estimation

Usually: fit the data subject to a constraint, or a penalty, on rank(B). This is feasible in reduced rank regression, e.g.:

  • F. Bunea, Y. She & M. Wegkamp (2011). Optimal selection of reduced rank estimators of high-dimensional matrices. The Annals of Statistics 39 1282–1309.

The nuclear norm ‖B‖_* = Tr[(B^T B)^{1/2}] leads to feasible algorithms in matrix completion, e.g.:

  • E. Candès & Y. Plan (2009). Matrix completion with noise. Proceedings of the IEEE 98 925–936.

(among others).
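The identity behind this penalty, ‖B‖_* = Tr[(B^T B)^{1/2}] = sum of the singular values of B, can be checked numerically (a NumPy sketch added here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(5, 4))

# Nuclear norm as the sum of singular values
nuc_svd = np.linalg.svd(B, compute_uv=False).sum()

# Nuclear norm as Tr[(B^T B)^{1/2}], via the eigenvalues of the PSD matrix B^T B
eig = np.linalg.eigvalsh(B.T @ B)
nuc_tr = np.sqrt(np.clip(eig, 0.0, None)).sum()

assert np.isclose(nuc_svd, nuc_tr)
assert np.isclose(nuc_svd, np.linalg.norm(B, 'nuc'))  # NumPy's built-in
```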

Bayesian estimators - known rank

Survey of Bayesian estimation in econometrics when rank(B) = k is known:

  • J. Geweke (1996). Bayesian reduced rank regression in econometrics. Journal of Econometrics 75 121–146.

Define B = M N^T, where B is p × m, M is p × k, and N^T is k × m. Let M_{·,ℓ} ∼ N(0, γI) be the ℓ-th column of M; we have

    B = Σ_{ℓ=1}^{k} M_{·,ℓ} (N_{·,ℓ})^T  ⇒  rank(B) ≤ k.
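The rank bound follows because B is a sum of k rank-one terms; a quick numerical check (illustrative NumPy sketch, names invented):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, k, gamma = 12, 9, 3, 1.0

# Prior draw: the columns of M (and N) are i.i.d. N(0, gamma I)
M = rng.normal(scale=np.sqrt(gamma), size=(p, k))
N = rng.normal(scale=np.sqrt(gamma), size=(m, k))

# Sum of k rank-one matrices M_{.,l} N_{.,l}^T equals the product M N^T
B = sum(np.outer(M[:, l], N[:, l]) for l in range(k))

assert np.allclose(B, M @ N.T)        # the two expressions agree
assert np.linalg.matrix_rank(B) <= k  # hence rank(B) <= k
```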

Bayesian estimators - adaptation

Among others:

  • R. Salakhutdinov & A. Mnih (2008). Bayesian probabilistic matrix factorization using MCMC. Proceedings of ICML'08.

Take B = Σ_{ℓ=1}^{k} M_{·,ℓ} (N_{·,ℓ})^T with k large, e.g. k = min(p, m), and M_{·,ℓ} ∼ N(0, γ_ℓ I), where γ_ℓ is itself random, with a prior such that most of the γ_ℓ ≃ 0.

Conjugate priors on γ = (γ_1, ..., γ_k)

The γ_ℓ are independent, and:

  name             prior                              posterior
  discrete         (1 − p) δ_ε + p δ_C                (1 − p') δ_ε + p' δ_C
  inverse-gamma    1/γ_ℓ ∼ Γ(a, b)                    1/γ_ℓ | obs. ∼ Γ(a', b')
  gamma            γ_ℓ ∼ Γ((m + p + 1)/2, β²/2)       γ_ℓ | obs. ∼ IG(µ, ν)
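For the inverse-gamma row, the update is the standard conjugate one for a Gaussian scale: if 1/γ_ℓ ∼ Γ(a, b) and M_{·,ℓ} | γ_ℓ ∼ N(0, γ_ℓ I_p), then 1/γ_ℓ | M_{·,ℓ} ∼ Γ(a + p/2, b + ‖M_{·,ℓ}‖²/2). A one-step Gibbs sketch (hypothetical NumPy code, with made-up hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(4)
p, a, b, gamma_true = 200, 2.0, 1.0, 0.5

# One factor column drawn from the model: M_col ~ N(0, gamma_true I_p)
M_col = rng.normal(scale=np.sqrt(gamma_true), size=p)

# Conjugate posterior parameters: a' = a + p/2, b' = b + ||M_col||^2 / 2
a_post = a + p / 2
b_post = b + 0.5 * M_col @ M_col

# One Gibbs draw of gamma_l from its full conditional (inverse-gamma)
gamma_draw = 1.0 / rng.gamma(shape=a_post, scale=1.0 / b_post)
```

With p = 200 observations in the column, the draw concentrates near the true scale 0.5.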

Estimator in Reduced Rank Regression (RRR)

Y = XB^0 + E

Given the prior π on B, we define:

    ρ̂_λ(dB) ∝ exp(−λ ‖Y − XB‖²_F) π(dB)   and   B̂_λ = ∫ B ρ̂_λ(dB).

Note that when the noise is N(0, σ²) and λ = 1/(2σ²), this is the Bayesian estimator (posterior mean).
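ρ̂_λ is generally not available in closed form, but it can be approximated by MCMC. Purely for illustration, here is a random-walk Metropolis sketch in which a plain Gaussian prior stands in for π (a toy substitute, not the talk's low-rank prior; all names and tuning constants are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, m, sigma = 30, 4, 3, 0.1
X = rng.normal(size=(n, p))
B0 = rng.normal(size=(p, 1)) @ rng.normal(size=(1, m))  # rank-1 truth
Y = X @ B0 + rng.normal(scale=sigma, size=(n, m))

lam, tau2 = 1.0 / (2 * sigma**2), 10.0   # lambda = 1/(2 sigma^2)

def log_target(B):
    # log of exp(-lam ||Y - XB||_F^2) pi(B), with pi = N(0, tau2) per entry
    return -lam * np.sum((Y - X @ B) ** 2) - np.sum(B**2) / (2 * tau2)

# Random-walk Metropolis, started at the least-squares fit;
# averaging the kept states approximates B_hat_lambda (the posterior mean)
B = np.linalg.lstsq(X, Y, rcond=None)[0]
lp, draws = log_target(B), []
for t in range(2000):
    prop = B + 0.02 * rng.normal(size=(p, m))
    lp_prop = log_target(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        B, lp = prop, lp_prop
    if t >= 500:
        draws.append(B)
B_hat = np.mean(draws, axis=0)
```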

Theorem - RRR

Theorem. Assume Gaussian or bounded noise with known variance σ², and the inverse-gamma prior on the γ_ℓ. There is a calibration of the parameters such that

    E ‖X B̂_λ − X B^0‖²_F ≤ inf_r  inf_{M (p × r), N (m × r)} { ‖X(M N^T − B^0)‖²_F
        + C [ σ² r max(m, p) log(nmp/σ²) + σ² (1 + ‖X‖²_F/(np)) (‖N‖²_F + ‖M‖²_F + σ²) ] }.

Theorem - RRR (bounded design)

When the entries of X are bounded by c,

    E ‖X B̂_λ − X B^0‖²_F ≤ inf_r  inf_{M (p × r), N (m × r), |M_{i,j}|, |N_{i,j}| < c'} { ‖X(M N^T − B^0)‖²_F
        + C(c, c') [ σ² r max(m, p) log(nmp/σ²) + σ⁴ ] }.

Rate: r max(m, p) log(nmp/σ²). Minimax-optimal rate: r max(m, p), see e.g.:

  • V. Koltchinskii, K. Lounici & A. Tsybakov (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics 39 2302–2329.

Tools used in the proofs

PAC-Bayesian inequalities:

  • O. Catoni (2004). Statistical Learning Theory and Stochastic Optimization. Saint-Flour Summer School on Probability Theory 2001 (Jean Picard, ed.), Lecture Notes in Mathematics, Springer.
  • A. Dalalyan & A. Tsybakov (2008). Aggregation by exponential weighting, sharp oracle inequalities and sparsity. Machine Learning 72 39–61.

Toy example: denoising Mondrian paintings

[Figures omitted in this extraction.]

Estimator in the matrix completion problem

We observe (i_k, j_k) ∈ {1, ..., p} × {1, ..., m}, i.i.d. uniform, and

    Y_k = B^0_{i_k, j_k} + ε_k,   1 ≤ k ≤ n.

Given the prior π on B, we define:

    ρ̂_λ(dB) ∝ exp(−λ Σ_{k=1}^{n} (Y_k − B_{i_k, j_k})²) π(dB)   and   B̂_λ = ∫ B ρ̂_λ(dB).
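With a discrete (finitely supported) prior, ρ̂_λ reduces to exponential weighting over the support, so the estimator can be computed exactly. A toy sketch (hypothetical NumPy code; the candidate set and constants are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
p, m, n, lam = 6, 5, 40, 2.0

B0 = rng.normal(size=(p, 1)) @ rng.normal(size=(1, m))   # rank-1 truth
i_idx, j_idx = rng.integers(0, p, n), rng.integers(0, m, n)
Y = B0[i_idx, j_idx] + rng.normal(scale=0.1, size=n)

# Uniform discrete prior over a few candidate matrices, one of them near B0
cands = [rng.normal(size=(p, m)) for _ in range(9)]
cands.append(B0 + 0.01 * rng.normal(size=(p, m)))

# rho_hat_lambda puts weight w_i on candidate i, w_i ∝ exp(-lam * loss_i);
# B_hat_lambda is the weighted average of the candidates
loss = np.array([np.sum((Y - B[i_idx, j_idx]) ** 2) for B in cands])
w = np.exp(-lam * (loss - loss.min()))
w /= w.sum()
B_hat = sum(wi * B for wi, B in zip(w, cands))
```

The weights concentrate sharply on the candidate with the smallest empirical loss, here the one close to B0.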

Theorem - matrix completion

Theorem. Assume sub-Gaussian noise with known variance σ², B^0 = M N^T (M is p × r, N^T is r × m) with |M_{i,j}|, |N_{i,j}| ≤ c/min(m, p), and the discrete prior on the γ_ℓ. There is a calibration of the parameters such that, with probability at least 1 − η:

    (1/(mp)) ‖B̂_λ − B^0‖²_F ≤ C(σ², c) [ r max(m, p) log(nmp) / (n min(m, p)) + log(1/η) / √n ].


Minimax rate: r max(m, p) / n.

Gibbs and Variational Bayes on MovieLens 100K

  RMSE    GS      VB
  all     0.93    0.94
  1       1.86    1.76
  2       1.20    1.16
  3       0.66    0.68
  4       0.56    0.62
  5       1.05    1.09

VB on MovieLens 100K/1M/10M

[Figure omitted in this extraction.]

Conclusion

Open question: can we get the minimax rate in the matrix completion problem?

The theorem on reduced rank regression:

  • P. Alquier (2013). Bayesian estimation of low-rank matrices: short survey and theoretical study. Proceedings of ALT'13 309–323.

The theorem on matrix completion and the simulations:

  • P. Alquier, N. Chopin, V. Cottet & J. Rousseau (2014). Bayesian matrix completion: prior specification and consistency. Preprint arXiv:1406.1440.