

SLIDE 1

Finding low-rank structure in messy data

Laura Balzano

University of Michigan

Michigan Institute for Data Science, March 2017


SLIDE 2

Big Data means Messy Data

[Figure: ozone concentration versus longitude and latitude (approx. 38–42°N, 75–81°W).]


SLIDE 3

Big Data means Messy Data

!"#$"% &'()#"% *$#+),-'.% &'+#/).0%%%1%%%'*-&)23-'.% Laura Balzano University of Michigan Low-rank structure in messy data

SLIDE 4

Structure

In all these cases, we believe there is some structure in the data. That structure can help us predict, interpolate, detect anomalies, etc.


SLIDE 5

Structure

In all these cases, we believe there is some structure in the data. That structure can help us predict, interpolate, detect anomalies, etc. Much of my work focuses on low-rank structure.


SLIDE 6

Subspace Representations

[Figure: ordered singular values (normalized) of byte-count data from the UW network and of temperature data from the UCLA Sensornet.]

SLIDE 7

Subspace Representations


SLIDE 8

Subspace Representations


SLIDE 9

Low-rank structure for Messy Data

• Structured single index models: $\mathbb{E}[y \mid x] = g(x^T w)$   [Figure: idealized ISE response curve.]
• Matrix completion or factorization with streaming data
• PCA with heteroscedastic data
• Union-of-subspaces data: active clustering or completion


SLIDE 10

Collaborators

NSF, Army Research Office, MCubed


SLIDE 11

LRMC with Monotonic Observations

Low-rank Matrix Completion under Monotonic Transformation: Can we recover a low-rank matrix where every entry has been perturbed using an unknown monotonic function?


SLIDE 12

Low-rank Matrix Completion

We have an n × m, rank r matrix X. However, we only observe a subset of the entries, Ω ⊂ {1, . . . , n} × {1, . . . , m}.


SLIDE 13

Example 1: Recommender Systems

!"#$"% &'()#"% *$#+),-'.% &'+#/).0%%%1%%%'*-&)23-'.%

Laura Balzano University of Michigan Low-rank structure in messy data

SLIDE 14

Example 1: Recommender Systems

Netflix Prize Competition, 2006–2009. The winning team received $1M.


SLIDE 15

Low-rank Matrix Completion

We have an n × m, rank r matrix X. However, we only observe a subset of the entries, Ω ⊂ {1, . . . , n} × {1, . . . , m}. We may find a solution by solving the following NP-hard optimization:

$$\min_{M} \ \operatorname{rank}(M) \quad \text{subject to} \quad M_\Omega = X_\Omega$$


SLIDE 16

Low-rank Matrix Completion

We have an n × m, rank r matrix X. However, we only observe a subset of the entries, Ω ⊂ {1, . . . , n} × {1, . . . , m}. Or we may solve this convex problem:

$$\min_{M} \ \|M\|_* = \sum_{i=1}^{n} \sigma_i(M) \quad \text{subject to} \quad M_\Omega = X_\Omega$$

Exact recovery guarantees: X is exactly low-rank and incoherent. MSE guarantees: X is nearly low-rank with bounded (r + 1)-th singular value.


SLIDE 17

Low-rank Matrix Completion Algorithms

There is a plethora of algorithms for solving the nuclear norm problem or its reformulations:
• LMaFit, APGL, FPCA
• Singular value thresholding: iterated SVD, SVT, FRSVT
• Grassmannian: OptSpace, GROUSE
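To make the thresholding family concrete, here is a minimal numpy sketch of an SVT-style iteration. It is not any of the implementations above; the threshold τ and step size are rule-of-thumb choices, and the toy problem is synthetic.

```python
import numpy as np

def svt_complete(Y, mask, tau=None, step=1.2, iters=300):
    # SVT-style iteration: alternate singular-value shrinkage with a
    # gradient step that enforces agreement on the observed entries.
    n, m = Y.shape
    if tau is None:
        tau = 5 * np.sqrt(n * m)          # heuristic threshold
    Z = np.zeros_like(Y, dtype=float)     # dual iterate, supported on mask
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        M = U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)  # shrink singular values
        Z = Z + step * mask * (Y - M)     # step on observed entries only
    return M

# Toy usage: complete a rank-2 matrix from 40% of its entries.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 50))
mask = rng.random(X.shape) < 0.4
X_hat = svt_complete(X * mask, mask)
print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))  # relative error
```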


SLIDE 18

Example 1: Recommender Systems

!"#$"% &'()#"% *$#+),-'.% &'+#/).0%%%1%%%'*-&)23-'.%

Laura Balzano University of Michigan Low-rank structure in messy data

SLIDE 19

Example 2: Blind Sensor Calibration


SLIDE 20

Example 2: Blind Sensor Calibration

Ion Selective Electrodes have a nonlinear response to their ions (pH, ammonium, calcium, etc.).

[Figure: idealized ISE response curve.]

SLIDE 21

Single Index Model

Suppose we have predictor variables x and response variables y, and we seek a transformation g and a vector w relating the two such that

$$\mathbb{E}[y \mid x] = g(x^T w).$$

Generalized Linear Model: g is known, and y | x is drawn from an exponential family distribution parameterized by w. Includes linear regression, log-linear regression, and logistic regression.

Single Index Model: both g and w are unknown.


SLIDE 22

Single Index Model Learning

Algorithm 1: Lipschitz-Isotron [Kakade et al., 2011]
Given T > 0 and data $(x_i, y_i)_{i=1}^{p}$; set $w^{(1)} := 1$.
for t = 1, 2, . . . , T do
    Update g using Lipschitz-PAV: $g^{(t)} = \mathrm{LPAV}\big( (x_i^T w^{(t)}, y_i)_{i=1}^{p} \big)$
    Update w using gradient descent: $w^{(t+1)} = w^{(t)} + \frac{1}{p} \sum_{i=1}^{p} \big( y_i - g^{(t)}(x_i^T w^{(t)}) \big)\, x_i$
end for
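Here is a minimal Python sketch of this loop, with scikit-learn's isotonic regression (plain PAV, not the Lipschitz-constrained LPAV of the algorithm above) standing in for the monotone fit; the toy data and parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotron(X, y, T=100):
    p, d = X.shape
    w = np.ones(d)                        # w^(1) := 1
    iso = IsotonicRegression(out_of_bounds="clip")
    for _ in range(T):
        z = X @ w                         # index values x_i^T w^(t)
        g_z = iso.fit(z, y).predict(z)    # monotone fit g^(t)
        w = w + (y - g_z) @ X / p         # gradient step on w
    return w, iso

# Toy usage: y = sigmoid(x^T w*) + noise; compare directions of w_hat and w*.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
w_star = rng.standard_normal(5)
y = 1 / (1 + np.exp(-(X @ w_star))) + 0.05 * rng.standard_normal(500)
w_hat, _ = isotron(X, y)
print(w_hat @ w_star / (np.linalg.norm(w_hat) * np.linalg.norm(w_star)))
```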


SLIDE 23

Lipschitz Pool Adjacent Violator

The Pool Adjacent Violators (PAV) algorithm pools points and averages to minimize the mean squared error $\frac{1}{p}\sum_i (g(x_i) - y_i)^2$ over monotone functions g. LPAV adds the additional constraint that g have a given Lipschitz constant.

[Figure: data with PAV and LPAV fits overlaid.]


SLIDE 24

High-rank (and effective rank) matrices

For Z low-rank and $Y_{ij} = g(Z_{ij}) = \frac{1}{1 + e^{-\gamma Z_{ij}}}$, Y has full rank. For $Y_{ij} = g(Z_{ij}) = \text{quantize-to-grid}(Z_{ij})$, Y has full rank. These matrices even have high effective rank. For a rank-50, 1000 × 1000 matrix Z, we can plot the effective rank of Y:

[Figure: two panels. Left: effective rank (ε = 0.01) of Y versus γ for the logistic function. Right: effective rank (ε = 0.001) of Y versus the number of grid points for quantization.]
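A quick numerical check of this phenomenon, sketched under assumed sizes and an assumed γ, using the effective rank of Definition 4 from the appendix:

```python
import numpy as np

def effective_rank(Y, eps=0.01):
    # Smallest k whose tail singular-value energy fraction is at most eps.
    s2 = np.linalg.svd(Y, compute_uv=False) ** 2
    tail = 1.0 - np.cumsum(s2) / s2.sum()
    return int(np.argmax(tail <= eps)) + 1

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 1000))  # rank 50
Y = 1.0 / (1.0 + np.exp(-0.3 * Z))     # entrywise logistic with gamma = 0.3
print(effective_rank(Z), effective_rank(Y))  # Y's is typically far above rank(Z)
```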

SLIDE 25

Optimization Formulation

We observe $Y_{ij} = g^*(Z^*_{ij}) + N_{ij}$ for $(i, j) \in \Omega$, where Ω is the set of observed entries.

$$\min_{g, Z} \ \sum_{(i,j) \in \Omega} \big( g(Z_{i,j}) - Y_{i,j} \big)^2 \quad \text{subject to} \quad g : \mathbb{R} \to \mathbb{R} \text{ is Lipschitz and monotone}, \ \ \operatorname{rank}(Z) \le r$$

This is non-convex in each variable, but we can alternate the standard approaches: use gradient descent and projection onto the low-rank cone for Z; use LPAV for g. We call this algorithm MMC-LS.


SLIDE 26

MMC-c Algorithm

Algorithm 2: MMC-calibrated [Ganti et al., 2015]
Given max iterations T > 0, step size η > 0, rank r, data $Y_\Omega$.
Init $\hat{g}^{(0)}(z) = \frac{|\Omega|}{mn} z$ and $\hat{Z}^{(0)} = \frac{mn}{|\Omega|} Y_0$, where $Y_0$ is the zero-filled $Y_\Omega$.
for t = 1, 2, . . . , T do
    Update $\hat{Z}$ using gradient descent: $\hat{Z}^{(t)}_{i,j} = \hat{Z}^{(t-1)}_{i,j} - \eta \big( \hat{g}^{(t-1)}(\hat{Z}^{(t-1)}_{i,j}) - Y_{i,j} \big)\, \mathbb{I}_{(i,j) \in \Omega}$
    Project: $\hat{Z}^{(t)} = P_r(\hat{Z}^{(t)})$
    Update g: $\hat{g}^{(t)} = \mathrm{LPAV}\big( \{ (\hat{Z}^{(t)}_{i,j}, Y_{i,j}) : (i,j) \in \Omega \} \big)$
end for

(Calibrated loss: see the appendix slide "Optimization of Calibrated Loss.")
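A minimal Python sketch of this loop follows. Plain isotonic regression stands in for LPAV, so this is a simplification of MMC-c rather than the paper's implementation; the initialization and update order follow Algorithm 2 above.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def mmc_c(Y, mask, r, eta=1.0, T=50):
    # Y: data, zero off-support; mask: boolean array of observed entries.
    m, n = Y.shape
    frac = mask.sum() / (m * n)                    # |Omega| / (mn)
    Z = (Y * mask) / frac                          # Z^(0) = (mn/|Omega|) Y_0
    g = lambda z: frac * z                         # g^(0)(z) = (|Omega|/mn) z
    iso = IsotonicRegression(out_of_bounds="clip")
    for _ in range(T):
        Z = Z - eta * mask * (g(Z) - Y)            # gradient step, observed entries
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z = U[:, :r] @ (s[:r, None] * Vt[:r])      # project onto rank <= r
        iso.fit(Z[mask], Y[mask])                  # monotone refit on observed pairs
        g = lambda z: iso.predict(np.ravel(z)).reshape(np.shape(z))
    return Z, g
```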

SLIDE 27

Remarks

MMC consists of three steps: gradient descent, projection, and LPAV. The gradient descent step requires a step size parameter η; we chose a small constant step size by cross-validation. The projection requires a rank r; in our implementation, we started with a small r and increased it, in the same vein as [Wen et al., 2012]. LPAV is the solution of a QP; Ravi developed an ADMM implementation as well.


SLIDE 28

Experiments on Data

Paper recommendation: 3426 features from 50 scholars' research profiles. Jester: 4.1 million continuous ratings (−10.00 to +10.00) of 100 jokes from 73,421 users. MovieLens: 100,000 ratings from 1000 users on 1700 movies. Cameraman: dictionary learning on patches of the image.

Dataset    | Dimension   | |Ω|           | r_0.01(Y)
PaperReco  | 3426 × 50   | 34294 (20%)   | 47
Jester-3   | 24938 × 100 | 124690 (5%)   | 66
ML-100k    | 1682 × 943  | 64000 (4%)    | 391
Cameraman  | 1536 × 512  | 157016 (20%)  | 393


SLIDE 29

Real Data Performance

RMSE on a held-out test set:

Dataset    | |Ω|/mn | LMaFit-A | MMC-c (T = 1) | MMC-c
PaperReco  | 20%    | 0.4026   | 0.4247        | 0.2965
Jester-3   | 5%     | 6.8728   | 5.327         | 5.2348
ML-100k    | 4%     | 3.3101   | 1.388         | 1.1533
Cameraman  | 20%    | 0.0754   | 0.1656        | 0.06885


SLIDE 30

Next Steps

We have generalized this to arbitrary atomic structure on the weight vector. We still seek a full convergence theory in both cases, and we seek results on relative-order recovery when the nonlinearity is more severe.


SLIDE 31

PCA for Heteroscedastic Data

PCA for Heteroscedastic High-Dimensional Data: How does PCA perform with data of different quality?


SLIDE 32

Motivation

Suppose we have different data sources for the same phenomenon, all of differing quality.

Images from http://www.medicalnewstoday.com/articles/153201.php, https://www.nasa.gov/multimedia/imagegallery/iotd.html, and http://www.livescience.com/27992-portable-pollution-sensors-improve-data-nsf-ria.html

SLIDE 33

Problem Formulation

Homoscedastic noise: 100% of samples have $\sigma^2 = 1$. Heteroscedastic noise: 50% have $\sigma_1^2 = 0.1$ and 50% have $\sigma_2^2 = 1.9$.

SLIDE 34

Problem Formulation

Homoscedastic noise: 100% of samples have $\sigma^2 = 1$. Heteroscedastic noise: 50% have $\sigma_1^2 = 0.1$ and 50% have $\sigma_2^2 = 1.9$.

We model n heteroscedastic sample vectors $y_1, \dots, y_n \in \mathbb{C}^d$ from a k-dimensional subspace $\tilde{U} \in \mathbb{C}^{d \times k}$ as

$$y_i = \tilde{U} \tilde{\Theta} \tilde{z}_i + \eta_i \varepsilon_i = \sum_{j=1}^{k} \tilde{\theta}_j \tilde{u}_j \big( \tilde{z}_i^{(j)} \big)^* + \eta_i \varepsilon_i. \tag{1}$$


SLIDE 35

Problem Formulation

$$y_i = \tilde{U} \tilde{\Theta} \tilde{z}_i + \eta_i \varepsilon_i = \sum_{j=1}^{k} \tilde{\theta}_j \tilde{u}_j \big( \tilde{z}_i^{(j)} \big)^* + \eta_i \varepsilon_i, \qquad i = 1, \dots, n. \tag{2}$$

• $\tilde{U} = [\tilde{u}_1 \cdots \tilde{u}_k] \in \mathbb{C}^{d \times k}$ forms an orthonormal subspace basis,
• $\tilde{\Theta} = \operatorname{diag}(\tilde{\theta}_1, \dots, \tilde{\theta}_k) \in \mathbb{R}_+^{k \times k}$ is a diagonal matrix of amplitudes,
• $\tilde{z}_i \in \mathbb{C}^k$ are IID sample coefficient vectors,
• $\eta_i \in \{\sigma_1, \dots, \sigma_L\}$ selects one of L noise standard deviations,
• $\varepsilon_i \in \mathbb{C}^d$ are zero-mean, unit-variance IID noise vectors,

and we define $n_j$ to be the number of samples with $\eta_i = \sigma_j$, where $n_1 + \cdots + n_L = n$.
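As a concrete instance, here is a small sketch that draws real-valued data from model (2) and runs PCA on it. The sizes, amplitudes, and noise levels are illustrative assumptions (the model also allows complex-valued data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 100, 2
thetas = np.array([1.0, 0.8])                      # amplitudes theta_1, theta_2
sigmas = np.sqrt([0.1, 1.9])                       # two noise standard deviations
U = np.linalg.qr(rng.standard_normal((d, k)))[0]   # orthonormal basis U tilde
z = rng.standard_normal((n, k))                    # IID coefficients z_i
eta = rng.choice(sigmas, size=n, p=[0.5, 0.5])     # per-sample noise level eta_i
Y = (z * thetas) @ U.T + eta[:, None] * rng.standard_normal((n, d))

# PCA: principal components are the top right singular vectors of Y.
u_hat = np.linalg.svd(Y / np.sqrt(n), full_matrices=False)[2][:k]
print(np.abs(u_hat @ U))   # overlap of each estimate with the true basis
```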


SLIDE 36

Performance of PCA

Theorem 1 (PCA asymptotic recovery [Hong et al., 2017]). Suppose that the sample-to-dimension ratio n/d → c > 0 and the proportions $n_\ell / n \to p_\ell$ for ℓ = 1, . . . , L as n, d → ∞. Then the i-th PCA amplitude $\hat{\theta}_i$ is such that

$$\hat{\theta}_i^2 \xrightarrow{\text{a.s.}} \frac{1}{c} \max\{\alpha, \beta_i\} \left( 1 + c \sum_{\ell=1}^{L} \frac{p_\ell \sigma_\ell^2}{\max\{\alpha, \beta_i\} - \sigma_\ell^2} \right). \tag{3}$$

If $A(\beta_i) > 0$, then the i-th principal component $\hat{u}_i$ is such that

$$\big| \big\langle \hat{u}_i, \operatorname{Span}\{\tilde{u}_j : \tilde{\theta}_j = \tilde{\theta}_i\} \big\rangle \big|^2 \xrightarrow{\text{a.s.}} \frac{A(\beta_i)}{\beta_i B_i'(\beta_i)}; \tag{4}$$

otherwise, $\big| \big\langle \hat{u}_i, \operatorname{Span}\{\tilde{u}_j : \tilde{\theta}_j = \tilde{\theta}_i\} \big\rangle \big|^2 \xrightarrow{\text{a.s.}} 0$.


SLIDE 37

Performance of PCA

Theorem 2 (PCA asymptotic recovery cont’d [Hong et al., 2017])

the normalized score vector ˆ z(i)/√n is such that

  • ˆ

z(i) √n , Span{˜ z(j) : ˜ θj = ˜ θi}

  • 2

a.s.

− → A(βi) c(βi + (1 − c)˜ θ2

i )B′ i (βi)

(5)

  • ˆ

z(i) √n , Span{˜ z(j) : ˜ θj = ˜ θi}

  • 2

a.s.

− → 0, where α and βi are, respectively, the largest real roots of A(x) := 1 − c

L

  • ℓ=1

pℓσ4

(x − σ2

ℓ)2 ,

Bi(x) := 1 − c ˜ θ2

i L

  • ℓ=1

pℓ x − σ2

. (6)

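The limits in (3)–(6) are easy to evaluate numerically. Below is a sketch that computes the asymptotic subspace recovery (4): both A and $B_i$ increase toward 1 to the right of $\max_\ell \sigma_\ell^2$, so the largest real root of $B_i$ can be bracketed there. The parameter values in the usage line match the simulation shown a couple of slides later ($\sigma_1^2 = 0.1$, $\sigma_2^2 = 3.25$, c = 10); this is a numerical sketch of the formulas, not the authors' code.

```python
import numpy as np
from scipy.optimize import brentq

def pca_recovery(c, theta, sigma2, p):
    # Asymptotic subspace recovery (4): A(beta)/(beta * B'(beta)), or 0
    # when A(beta) <= 0. sigma2 holds noise *variances*, p their proportions.
    sigma2, p = np.asarray(sigma2, float), np.asarray(p, float)
    A = lambda x: 1 - c * np.sum(p * sigma2**2 / (x - sigma2) ** 2)
    B = lambda x: 1 - c * theta**2 * np.sum(p / (x - sigma2))
    Bp = lambda x: c * theta**2 * np.sum(p / (x - sigma2) ** 2)   # B_i'
    lo, hi = sigma2.max() + 1e-9, sigma2.max() + 1e9
    beta = brentq(B, lo, hi)          # largest real root of B_i
    a = A(beta)
    return a / (beta * Bp(beta)) if a > 0 else 0.0

# Two noise levels with 30% contamination.
print(pca_recovery(c=10, theta=1.0, sigma2=[0.1, 3.25], p=[0.7, 0.3]))
```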

SLIDE 38

Performance of PCA

Suppose all $\sigma_\ell = 0$. Then $A(x) = 1$ for all x, and

$$B_i(x) = 1 - \frac{c \tilde{\theta}_i^2}{x} \implies \beta_i = c \tilde{\theta}_i^2,$$

and therefore

$$\hat{\theta}_i^2 \xrightarrow{\text{a.s.}} \frac{1}{c} \max\{\alpha, \beta_i\} = \tilde{\theta}_i^2,$$

$$\big| \big\langle \hat{u}_i, \operatorname{Span}\{\tilde{u}_j : \tilde{\theta}_j = \tilde{\theta}_i\} \big\rangle \big|^2 \xrightarrow{\text{a.s.}} \frac{A(\beta_i)}{\beta_i B_i'(\beta_i)} = 1,$$

$$\left| \left\langle \frac{\hat{z}^{(i)}}{\sqrt{n}}, \operatorname{Span}\{\tilde{z}^{(j)} : \tilde{\theta}_j = \tilde{\theta}_i\} \right\rangle \right|^2 \xrightarrow{\text{a.s.}} \frac{A(\beta_i)}{c \big( \beta_i + (1 - c) \tilde{\theta}_i^2 \big) B_i'(\beta_i)} = 1.$$

That is, with no noise, PCA recovers the amplitudes, subspace, and coefficients exactly in the limit.


SLIDE 39

Principal Component recovery concentration

[Figure: $|\langle \hat{u}_i, \tilde{u}_i \rangle|^2$ versus $p_2$ for i = 1, 2. Left panel: $10^3$ samples in $10^2$ dimensions. Right panel: $10^4$ samples in $10^3$ dimensions.]

Simulated subspace recovery (4) for two noise levels as a function of the contamination fraction $p_2$. The noise has $\sigma_1^2 = 0.1$ and $\sigma_2^2 = 3.25$; the other parameters are c = 10, $\tilde{\theta}_1 = 1$, and $\tilde{\theta}_2 = 0.8$. The simulation mean (blue curve) and interquartile interval (light blue ribbon) are shown with the asymptotic recovery (4) of Theorem 1 (green curve). The region where $A(\beta_i) \le 0$ is the red horizontal segment with value zero (a conjecture). Increasing the data size shows the expected concentration behavior.

SLIDE 40

Subspace recovery for two noise variances

[Figure: heat map of $|\langle \hat{u}_i, \tilde{u}_i \rangle|^2$ over $\sigma_1^2 \in [1, 4]$ and $\sigma_2^2 \in [1, 5]$.]

Asymptotic subspace recovery (4) as a function of the noise variances $\sigma_1^2$ and $\sigma_2^2$, occurring in proportions $p_1 = 70\%$ and $p_2 = 30\%$, where c = 10 and $\tilde{\theta}_i = 1$. Contours are overlaid in black, and the region where $A(\beta_i) \le 0$ is shown as zero (a conjecture). Along the dashed cyan line the average noise variance is $\bar{\sigma}^2 \approx 1.74$, and the best performance occurs when $\sigma_1^2 = \sigma_2^2 = \bar{\sigma}^2$.

SLIDE 41

Principal Component Recovery

[Figure: two heat maps of $|\langle \hat{u}_i, \tilde{u}_i \rangle|^2$ over $c \in [0.5, 2]$ and $\tilde{\theta}_i \in [0.5, 2.5]$. Left: homoscedastic noise with $\sigma_1^2 = 1$. Right: heteroscedastic noise with 80% of samples at $\sigma_1^2 = 0.8$ and 20% at $\sigma_2^2 = 1.8$.]


SLIDE 42

Optimality of homoscedasticity

Consider the following three settings:
1. 99% of samples have noise variance 0.01, but 1% have variance 99.01.
2. 99% of samples have noise variance 1.01, but 1% have variance 0.01.
3. All samples have noise variance 1 (i.e., the data are homoscedastic).
In all three settings, the average noise variance is 1.


SLIDE 43

Optimality of homoscedasticity

Theorem 3 (Optimality of homoscedasticity). Homoscedastic noise produces the best asymptotic PCA amplitude (3), subspace recovery (4), and coefficient recovery (5) in Theorems 1 and 2 for a given average noise variance $\bar{\sigma}^2 = p_1 \sigma_1^2 + \cdots + p_L \sigma_L^2$, over all distributions of noise variances for which $A(\beta_i) > 0$.
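Plugging the three settings from the previous slide into the recovery formula (4), via the pca_recovery sketch from earlier (with illustrative values c = 10 and $\tilde{\theta}_i = 1$, which are assumptions rather than values from the talk), shows the theorem numerically:

```python
# Recovery (4) in the three settings; average noise variance is 1 in each.
# Uses the pca_recovery sketch defined earlier; c = 10 and theta = 1 are
# illustrative choices.
settings = [([0.01, 99.01], [0.99, 0.01]),
            ([1.01, 0.01], [0.99, 0.01]),
            ([1.0], [1.0])]
for sigma2, p in settings:
    print(sigma2, p, pca_recovery(c=10, theta=1.0, sigma2=sigma2, p=p))
# The homoscedastic setting (last line) prints the largest recovery.
```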


SLIDE 44

Next Steps

We are interested in a weighted version of PCA to leverage the final theorem; according to the theory, $1/\sigma_i$ weighting is not optimal. We also wish to study the finite-sample behavior so that we can understand when to throw away data.


SLIDE 45

Conclusion

Low-rank structure shows up in many big data problems. Big datasets are messy datasets! We can recover low-rank structure both when data are
• observed through an unknown nonlinear monotonic function, and
• observed with different noise variances.


SLIDE 46

Thank you! Questions?

Ganti, R. S., Balzano, L., and Willett, R. (2015). Matrix completion under monotonic single index models. In Advances in Neural Information Processing Systems 28, pages 1864–1872. Curran Associates, Inc.

Hong, D., Balzano, L., and Fessler, J. (2017). Asymptotic performance of PCA for high-dimensional heteroscedastic data. Coming on arXiv soon!

Kakade, S. M., Kanade, V., Shamir, O., and Kalai, A. (2011). Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems, pages 927–935.

Wen, Z., Yin, W., and Zhang, Y. (2012). Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Mathematical Programming Computation, 4(4):333–361.

SLIDE 47

The Pool Adjacent Violators (PAV) algorithm pools points and averages to solve

$$\underset{\text{monotone } g}{\arg\min} \ \frac{1}{p} \sum_{i=1}^{p} \big( g(x_i) - y_i \big)^2.$$

(Back to LPAV.)

[Figure: data with PAV and LPAV fits overlaid.]


SLIDE 48

High-rank Matrices: Effective rank

Definition 4. The effective rank of an n × m matrix Y, m < n, with singular values $\sigma_j$ is

$$r_\epsilon(Y) = \min \left\{ k \in \mathbb{N} : \frac{\sum_{j=k+1}^{m} \sigma_j^2}{\sum_{j=1}^{m} \sigma_j^2} \le \epsilon \right\}.$$

(Back to Matrix Completion.)

SLIDE 49

MMC-LS Algorithm

Algorithm 3: MMC-LS
Given max iterations T > 0, step size η > 0, rank r, data $Y_\Omega$.
Init $\hat{g}^{(0)}(z) = \frac{|\Omega|}{mn} z$ and $\hat{Z}^{(0)} = \frac{mn}{|\Omega|} Y_0$, where $Y_0$ is the zero-filled $Y_\Omega$.
for t = 1, 2, . . . , T do
    Update $\hat{Z}$ using gradient descent: $\hat{Z}^{(t)}_{i,j} = \hat{Z}^{(t-1)}_{i,j} - \eta \big( \hat{g}^{(t-1)}(\hat{Z}^{(t-1)}_{i,j}) - Y_{i,j} \big) \big( \hat{g}^{(t-1)} \big)'(\hat{Z}^{(t-1)}_{i,j}) \, \mathbb{I}_{(i,j) \in \Omega}$
    Project: $\hat{Z}^{(t)} = P_r(\hat{Z}^{(t)})$
    Update $\hat{g}$: $\hat{g}^{(t)} = \mathrm{LPAV}\big( \{ (\hat{Z}^{(t)}_{i,j}, Y_{i,j}) : (i,j) \in \Omega \} \big)$
end for


SLIDE 50

Optimization of Calibrated Loss

Let $\Phi : \mathbb{R} \to \mathbb{R}$ be a differentiable function that satisfies $\Phi' = g^*$. Since $g^*$ is monotonic, Φ is convex. Consider

$$L(\Phi, Z) = \sum_{(i,j) \in \Omega} \Phi(Z_{i,j}) - Y_{i,j} Z_{i,j}.$$

Differentiating with respect to $Z_{i,j}$, a minimizer satisfies $g^*(Z_{i,j}) - Y_{i,j} = 0$ for $(i,j) \in \Omega$; since $\mathbb{E}[Y_{i,j}] = g^*(Z^*_{i,j})$, $Z^*$ is a minimizer in expectation. So $L(\Phi, Z)$ is a calibrated loss for our problem.

(Back to MMC-calibrated.)