On Robustness of Principal Component Regression (Anish Agarwal) - PowerPoint PPT Presentation



SLIDE 1

On Robustness of Principal Component Regression

Anish Agarwal Devavrat Shah, Dennis Shen, Dogyoon Song MIT

SLIDE 2

What is PCR?


SLIDE 5

What is PCR?

Step 1: PCA (k components)

SLIDE 6

What is PCR?

Step 2: Regression (minimize the least-squares objective over the k-dimensional projections)

SLIDE 7

What is PCR?

Step 3: Prediction
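The three steps above can be sketched in a few lines. This is a minimal illustration with synthetic data; the function name and all variables are our own, not the authors' code:

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, k):
    """Principal Component Regression: PCA on the covariates, least
    squares on the k-dimensional projections, then prediction."""
    # Step 1: PCA -- top-k right singular vectors of the covariate matrix.
    _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
    V_k = Vt[:k].T                       # (p, k) principal directions

    # Step 2: Regression -- least squares in the reduced space.
    Z_train = X_train @ V_k              # (n, k) scores
    beta_k, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)

    # Step 3: Prediction -- project test covariates, apply fitted model.
    return (X_test @ V_k) @ beta_k

# Tiny example: exactly rank-1 covariates, linear response.
rng = np.random.default_rng(0)
u = rng.normal(size=(50, 1)); v = rng.normal(size=(1, 5))
X = u @ v                                # rank-1 covariate matrix
y = X @ np.ones(5)
pred = pcr_fit_predict(X, y, X, k=1)
print(np.allclose(pred, y))              # True: k=1 captures everything
```

With one component the fit is exact here because the response lies in the one-dimensional column space of the rank-1 covariate matrix.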

SLIDE 8

When & Why Use PCR

SLIDE 9

Data Science Folklore

“IF DATA IS (APPROXIMATELY) LOW-DIMENSIONAL, USE PCR!”

- Anonymous Data Scientists

When exactly should we be using PCR?

SLIDE 10

Key Questions We Answer

What are the theoretical properties of PCR? Is dimension reduction the only benefit of PCR?

SLIDE 11

Our theoretical analysis of PCR helps answer the following questions:

How many principal components should we pick? How low-rank do the covariates need to be? How well does PCR perform on test data (i.e., its generalization properties)?

SLIDE 12

Is Dimension-Reduction the Only Benefit?

NO!

SLIDE 13

PCR (as is) works for a wide variety of settings!

[Figure: covariate matrices that are missing entries, mixed-valued (e.g. 1, 3, 3.14), sensitive, or noisy]
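One of these settings, missing covariate entries, can be handled before the PCA step with the standard fill-with-zeros-and-rescale heuristic; a minimal sketch on synthetic data (the estimate `p_hat` and all names are our own illustration, not the authors' code):

```python
import numpy as np

# Sketch: PCR tolerates missing covariate entries by filling them with 0,
# rescaling by the observed fraction, and letting the low-rank SVD step
# denoise the result.
rng = np.random.default_rng(1)
u = rng.normal(size=(100, 1)); v = rng.normal(size=(1, 8))
X_full = u @ v                                  # rank-1 ground truth
mask = rng.random(X_full.shape) < 0.7           # ~70% entries observed
p_hat = mask.mean()                             # estimated observed fraction
X_obs = np.where(mask, X_full, 0.0) / p_hat     # fill with 0, then rescale

# Step 1 of PCR: keep only the top singular direction.
U, s, Vt = np.linalg.svd(X_obs, full_matrices=False)
X_denoised = s[0] * np.outer(U[:, 0], Vt[0])

# Relative recovery error of the rank-1 reconstruction.
rel_err = np.linalg.norm(X_denoised - X_full) / np.linalg.norm(X_full)
print(round(float(rel_err), 3))
```

The rescaling makes the filled matrix an unbiased estimate of the full one, and the rank-1 truncation discards most of the energy the mask injected.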

SLIDE 14

Main Contribution of this Work

We show PCR is surprisingly robust to problems that plague large-scale modern datasets.
SLIDE 15

Error-In-Variable Regression (Setting We Consider)

SLIDE 16

Classical (high-dimensional) Regression

SLIDE 17

Error-in-Variable (EIV) Regression

[Figure: covariate matrix with missing/corrupted entries]

Representative of modern datasets.
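A minimal simulation of the EIV setting (dimensions, noise level, and names are our choices) shows how the PCA truncation inside PCR pulls the observed covariates back toward the true ones:

```python
import numpy as np

# Error-in-variable sketch: we never see the true covariates X, only a
# noisy version Z = X + H. The PCA step of PCR truncates Z to its top-k
# singular directions, stripping most of H before any regression happens.
rng = np.random.default_rng(2)
n, p, k = 200, 10, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))  # true rank-k covariates
Z = X + 0.5 * rng.normal(size=(n, p))                  # observed noisy version

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
Z_k = (U[:, :k] * s[:k]) @ Vt[:k]                      # rank-k truncation of Z

err_raw = np.linalg.norm(Z - X)         # how noisy the raw observations are
err_denoised = np.linalg.norm(Z_k - X)  # after the implicit denoising step
print(err_denoised < err_raw)           # truncation should move Z toward X
```

The noise spreads its energy over all ten directions, while the signal lives in two strong ones, so keeping only those two removes most of the corruption.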

SLIDE 18

EIV - Surprising Number of Applications

Applications: Causal Inference (Synthetic Control), Time Series Analysis, Differentially-private Regression, Mixed-Valued Regression
Noise types: noise by design, measurement noise, structural noise, measurement noise


SLIDE 20

Formal Results

SLIDE 21

Theorem (Informal): Training Error

If the principal components are chosen correctly (k equal to the rank of the covariates), PCR matches the OLS minimax error rate of the low-dimensional, noiseless, fully observed setting: PCR implicitly denoises the covariates! (The rate depends on the fraction of observed entries and the number of covariates.)

SLIDE 22

Theorem (Informal): Testing Error

If the principal components are not chosen correctly (k not equal to the rank):

Test Error ≤ Train Error with PCR(k) + additional terms

PCR implicitly de-noises covariates; PCR implicitly performs ℓ2-regularization.

Choose the k that minimizes the bound above.
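In practice one can pick k by minimizing held-out prediction error, in the spirit of minimizing such a bound; a sketch on synthetic data (split sizes, noise levels, and names are our own illustration):

```python
import numpy as np

# Sketch: choose the number of components k by minimizing validation error.
rng = np.random.default_rng(3)
n, p, r = 300, 12, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))   # true rank-r covariates
beta = rng.normal(size=p)
y = X @ beta + 0.1 * rng.normal(size=n)                 # noisy response
Z = X + 0.3 * rng.normal(size=(n, p))                   # noisy covariates

train, val = np.arange(0, 200), np.arange(200, 300)     # simple split

def pcr_val_error(k):
    """Fit PCR(k) on the training split, return validation MSE."""
    _, _, Vt = np.linalg.svd(Z[train], full_matrices=False)
    Vk = Vt[:k].T
    b, *_ = np.linalg.lstsq(Z[train] @ Vk, y[train], rcond=None)
    pred = (Z[val] @ Vk) @ b
    return np.mean((pred - y[val]) ** 2)

errors = {k: pcr_val_error(k) for k in range(1, p + 1)}
best_k = min(errors, key=errors.get)
```

Sweeping k and keeping the validation minimizer is a simple stand-in for minimizing the theoretical bound, since both trade the denoising benefit of small k against the approximation error of truncating too aggressively.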

SLIDE 23

When To (and Not To) Use PCR? Look at the Spectrum

[Figure: magnitude of singular values (ordered by magnitude) for four cases; sharply decaying spectra suggest "Use PCR!", flat spectra suggest "Don't Use PCR!"]
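"Looking at the spectrum" is a two-line computation: take the singular values of the covariate matrix and check how much spectral energy the top components carry. A sketch with our own synthetic examples and helper name:

```python
import numpy as np

# Sketch: inspect the singular-value spectrum of a covariate matrix to
# decide whether PCR is appropriate. A sharp decay means approximate
# low rank (use PCR); a flat spectrum does not.
rng = np.random.default_rng(4)
low_rank = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))  # rank 2
flat = rng.normal(size=(100, 20))                                # full rank

def spectrum_decay(X, k=2):
    """Fraction of spectral energy in the top-k singular values."""
    s = np.linalg.svd(X, compute_uv=False)   # ordered by magnitude
    return (s[:k] ** 2).sum() / (s ** 2).sum()

print(spectrum_decay(low_rank))   # close to 1: energy concentrated on top
print(spectrum_decay(flat))       # well below 1: energy spread out
```

In practice one would plot the ordered singular values and look for the sharp drop rather than fix a threshold.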

SLIDE 24

An exponentially decaying spectrum is ubiquitous in real-world data

GDP Trajectories (Macroeconomics)

SLIDE 25

An exponentially decaying spectrum is ubiquitous in real-world data

Avito Ad-Click Dataset (E-Commerce)

SLIDE 26

An exponentially decaying spectrum is ubiquitous in real-world data

Cricket Trajectories (Sports)

SLIDE 27

Surprising Applications of PCR

SLIDE 28

Applications of Error-In-Variable Regression

Applications: Causal Inference (Synthetic Control), Time Series Analysis, Differentially-private Regression, Mixed-Valued Regression
Noise types: noise by design, measurement noise, structural noise, measurement noise

SLIDE 29

Data privacy is top-of-mind as we increasingly apply ML to sensitive user data (genetic data, purchase history, etc.)

SLIDE 30

Standard Notion of Privacy in ML: ε-Differential Privacy

Intuitively, an algorithm is ε-differentially private if the outcome of a statistical query on a database cannot change by more than ε due to the presence/absence of any user's data record.

Example of a Statistical Query: “Average income of all users between ages 25 and 30”

SLIDE 31

How to achieve ε-differential privacy? The Laplace Mechanism: add Laplacian noise, with scale set by the query's sensitivity divided by ε, to the output of each statistical query on the database.
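The Laplace mechanism itself is a one-liner. The sketch below privatizes the slide's example query (average income); the sensitivity bound, the numbers, and the function name are our own illustration, not a calibrated implementation:

```python
import numpy as np

# Sketch of the Laplace mechanism: release a statistic plus
# Laplace(sensitivity / epsilon) noise to get epsilon-differential privacy.
# `sensitivity` must bound how much the statistic can change when one
# user's record is added or removed.
def laplace_mechanism(value, sensitivity, epsilon, rng):
    return value + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(5)
incomes = np.array([40_000.0, 55_000.0, 62_000.0, 48_000.0])
true_avg = incomes.mean()

# Query: average income. If incomes are assumed bounded by 100_000, one
# record changes the average by at most 100_000 / n (a crude bound).
private_avg = laplace_mechanism(true_avg,
                                sensitivity=100_000 / len(incomes),
                                epsilon=1.0, rng=rng)
```

Smaller ε means a larger noise scale, which is exactly the accuracy-vs-privacy tradeoff discussed on the next slides.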

SLIDE 32

Predictive Accuracy vs. Privacy Tradeoff

Can we achieve good prediction error and still maintain privacy? Yes!

SLIDE 33

Predictive Accuracy vs. Privacy Tradeoff

Can we achieve good prediction error and still maintain privacy?

Step 1: Data owner adds Laplacian noise.
Step 2: Analyst performs PCR.
Done!
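The two-step recipe can be sketched end-to-end on synthetic data (the noise scale, dimensions, and names are our choices; this is a toy illustration, not a calibrated privacy implementation):

```python
import numpy as np

# Sketch of the two-step recipe: the data owner privatizes the covariates
# with Laplacian noise, then the analyst runs plain PCR on what is released.
rng = np.random.default_rng(6)
n, p, k = 300, 10, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))  # true covariates
y = X @ rng.normal(size=p)                             # response

# Step 1 (data owner): release covariates with entry-wise Laplacian noise.
Z = X + rng.laplace(scale=0.5, size=(n, p))

# Step 2 (analyst): ordinary PCR on the privatized covariates.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
Vk = Vt[:k].T
beta_k, *_ = np.linalg.lstsq(Z @ Vk, y, rcond=None)
pred = (Z @ Vk) @ beta_k

# Relative prediction error despite never seeing the clean covariates.
rel_err = np.linalg.norm(pred - y) / np.linalg.norm(y)
```

The analyst needs no special privacy-aware algorithm: the PCA step of PCR absorbs the injected Laplacian noise the same way it absorbs measurement noise.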

SLIDE 34

What is the sample complexity cost of ε-differential privacy?

[Figure: prediction error]

Does the de-noising step (PCA) break privacy? No: PCA only de-noises the covariates on average with respect to the norm.

SLIDE 35

Conclusion

SLIDE 36

Inspect the spectrum of your covariate matrix.

[Figure: magnitude of singular values (ordered by magnitude); a sharply decaying spectrum means PCR both de-noises and regularizes. Use PCR!]

SLIDE 37

Possible Implications for Modern ML

Step 1: Dimension Reduction

Linear Case (PCA): linear low-dimensional covariate pre-processing has many implicit benefits (e.g. de-noising, regularizing).

Non-Linear Case (GANs?): does non-linear covariate pre-processing have similar benefits for unstructured data?

SLIDE 38

Come Meet Us At Our Poster: Poster #3, East Exhibition Hall B + C, 5-7pm, Thursday

Shameless Plug :) PCR for Time Series Analysis: tspdb.mit.edu; PCR for Causal Inference: github.com/Romcos/SC_demo