SLIDE 1

Risk and Noise Estimation in High Dimensional Statistics via State Evolution

Mohsen Bayati, Stanford University. Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari

SLIDE 2

Statistical learning motivations

SLIDE 3

Data → Prediction

  • Online advertising:

– Predict probability of click on an ad

  • Healthcare

– Predict occurrence of diabetes

  • Finance

– Predict change in stock prices

SLIDE 4

Formulation

  • Patient record i: a pair (a_i, y_i), with a_i ∈ R^p the vector of p measurements and y_i ∈ R the outcome
  • Given n records: stack them into y ∈ R^n and A ∈ R^{n×p}, whose i-th row is a_i
  • Posit a linear model: y = A x_0 + w, with unknown coefficients x_0 ∈ R^p and noise w
  • Goal: find a good estimate of x_0

SLIDE 5

Massive amounts of measurements

Traditional clinical decision making relies on a few important measurements (small p): Monitoring, MD exam, Nurse observation

Electronic health records → many cheap measurements (large p): Labs, Medications, Radiology, Location tracking, Real-time vital signs, Testing for wellness, Genomic data, Personalized medicine, Smartphone apps

SLIDE 6

How to use more measurements?

  • Standard least squares → many solutions

– Most solutions predict poorly on future outcomes (due to noise)

  • Main problem: for large p, find the few important measurements
  • Infer a sparse x_0

SLIDE 7
Learning recipe

  • Define a loss function: L(x; y, A)
  • Example: least squares, for Gaussian noise: L(x; y, A) = ||y − A x||^2 / 2
  • Estimate by minimizing the loss: x̂ = argmin_x L(x; y, A)

SLIDE 8

NP-hard problem

  • Estimate by x̂ = argmin_x { ||y − A x||^2 / 2 + λ ||x||_0 }, where ||x||_0 counts the nonzero entries of x: a combinatorial problem, NP-hard in general

SLIDE 9

Convex relaxation (LASSO)

  • Estimate by x̂(λ) = argmin_x { ||y − A x||^2 / 2 + λ ||x||_1 }

– Tibshirani’96, Chen-Donoho’95
– Automatically selects few important measurements
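
A minimal numpy sketch of solving this convex program by iterative soft thresholding (ISTA), a standard proximal-gradient method. The step size comes from the spectral norm of A; the toy sizes, sparsity level, and λ below are illustrative assumptions, not values from the talk.

```python
import numpy as np

def soft_threshold(z, theta):
    """Soft thresholding eta(z; theta) = sign(z) * max(|z| - theta, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def ista_lasso(y, A, lam, n_iter=500):
    """Minimize ||y - A x||^2 / 2 + lam * ||x||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Toy usage: sparse x0, Gaussian design (illustrative sizes).
rng = np.random.default_rng(0)
n, p = 100, 200
A = rng.normal(size=(n, p)) / np.sqrt(n)
x0 = np.zeros(p)
x0[:10] = 1.0
y = A @ x0 + 0.1 * rng.normal(size=n)
x_hat = ista_lasso(y, A, lam=0.05)
```
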

SLIDE 10

Model selection

Source: Elements of Statistical Learning, Hastie et al 2009

[Figure: bias-variance tradeoff; model complexity from high (small λ) to low (large λ)]

SLIDE 11

Mathematical questions

SLIDE 12

Characteristics of the solution

What performance should we expect?

– What is the MSE, or other error measures, for each λ?
– How to choose the best λ?

SLIDE 13

Growing theory on LASSO

  • Zhao, Yu (2006)
  • Candes, Romberg, Tao (2006)
  • Candes, Tao (2007)
  • Bunea, Tsybakov, Wegkamp (2007)
  • Bickel, Ritov, Tsybakov (2009)
  • Bühlmann, van de Geer (2009)
  • Zhang (2009)
  • Meinshausen, Yu (2009)
  • Wainwright (2009, 2011)
  • Talagrand (2010)
  • Belloni, Chernozhukov et al (2009-13)
  • Maleki et al (2011)
  • Bickel et al (2012)
  • There are many more, not listed due to space limitations.

SLIDE 14

General random convex functions

  • Consider minimizing a random convex cost of the form ||y − A x||^2 / 2 + Σ_i f(x_i)
  • Let A be a Gaussian matrix
  • Let f be strictly convex (or strongly convex)
  • Talagrand’10: finds generic properties of the minimizer; in particular, the MSE can be calculated when certain replica-symmetric equations have solutions
  • Chapter 3 of Mean Field Models for Spin Glasses, Vol. 1
  • Does not apply to our case, where f(x) = λ|x| is not strictly convex

SLIDE 15

Some intuition: scalar case (p=n=1)

  • For y = x_0 + w, with noise w ~ N(0, σ^2)
  • LASSO estimate is x̂(λ) = argmin_x { (y − x)^2 / 2 + λ|x| }
  • Simple calculus: x̂(λ) = η(y; λ), the soft-thresholding function η(z; θ) = sign(z) (|z| − θ)_+
  • Then MSE is E[(η(X_0 + σZ; λ) − X_0)^2]

– With independent Z ~ N(0, 1) and X_0 drawn from the signal prior
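
A quick Monte Carlo check of this scalar formula. The spike-and-slab prior on X_0 and the values of σ and λ are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

rng = np.random.default_rng(1)
sigma, lam, N = 0.5, 0.3, 10**6

# Illustrative prior: X0 = 0 with probability 0.9, otherwise N(0, 1).
X0 = rng.normal(size=N) * (rng.random(N) < 0.1)
Z = rng.normal(size=N)

mse = np.mean((soft_threshold(X0 + sigma * Z, lam) - X0) ** 2)
print(f"E[(eta(X0 + sigma Z; lam) - X0)^2] = {mse:.4f}")
```
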

SLIDE 16

Main result

  • Theorem (Bayati-Montanari): for n, p → ∞ with n/p → δ and Gaussian A, almost surely

lim_{p→∞} (1/p) ||x̂(λ) − x_0||^2 = E[(η(X_0 + τ_* Z; θ_*) − X_0)^2]

– Like the scalar case, with σ replaced by τ_*
– Where: τ_* solves τ^2 = σ^2 + (1/δ) E[(η(X_0 + τ Z; ατ) − X_0)^2], with threshold θ_* = α τ_* calibrated to λ

SLIDE 17

Main result

  • Theorem (Bayati-Montanari): for n/p → δ and Gaussian A, as above

– In fact we prove: each coordinate pair (x̂_i, x_{0,i}) behaves like (η(X_0 + τ_* Z; θ_*), X_0)

  • Problem asymptotically decouples into p scalar sub-problems with increased Gaussian noise

– And we find a formula for the noise: τ_*^2 = σ^2 + (1/δ) E[(η(X_0 + τ_* Z; θ_*) − X_0)^2]
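
The noise formula can be solved numerically by fixed-point iteration, evaluating the expectation by Monte Carlo. A sketch, assuming a spike-and-slab prior on X_0 and illustrative values of σ, δ = n/p, and the calibration parameter α:

```python
import numpy as np

def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def state_evolution(x0_samples, sigma, delta, alpha, n_iter=100):
    """Iterate tau^2 <- sigma^2 + E[(eta(X0 + tau Z; alpha tau) - X0)^2] / delta."""
    rng = np.random.default_rng(2)
    Z = rng.normal(size=x0_samples.shape)
    tau2 = sigma**2 + np.mean(x0_samples**2) / delta   # conservative starting point
    for _ in range(n_iter):
        tau = np.sqrt(tau2)
        mse = np.mean((soft_threshold(x0_samples + tau * Z, alpha * tau) - x0_samples) ** 2)
        tau2 = sigma**2 + mse / delta
    return np.sqrt(tau2), mse

rng = np.random.default_rng(3)
x0 = rng.normal(size=500_000) * (rng.random(500_000) < 0.1)  # illustrative prior
tau_star, mse_star = state_evolution(x0, sigma=0.2, delta=0.5, alpha=1.5)
print(f"tau_* = {tau_star:.4f}, predicted MSE = {mse_star:.4f}")
```
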

SLIDE 18

Main result (general case)

  • Theorem (Bayati-Montanari): for n/p → δ, Gaussian A, and any pseudo-Lipschitz function ψ, almost surely

lim_{p→∞} (1/p) Σ_i ψ(x̂_i, x_{0,i}) = E[ ψ(η(X_0 + τ_* Z; θ_*), X_0) ]

– In fact we prove: the problem asymptotically decouples into p scalar sub-problems with increased Gaussian noise

SLIDE 19

Main result (general case)

  • Theorem (Bayati-Montanari): for n/p → δ and Gaussian A, as stated above

– Note: there is strong empirical and theoretical evidence that the Gaussian assumption on A is not necessary

  • Donoho-Maleki-Montanari’09
  • Bayati-Lelarge-Montanari’12-13
SLIDE 20

Algorithmic Analysis

SLIDE 21

Proof strategy

  • 1. Construct a sequence x^0, x^1, x^2, … (the AMP iterates) that converges to the LASSO solution x̂(λ)
  • 2. Show that every iterate obeys a scalar limit:

lim_{p→∞} (1/p) ||x^t − x_0||^2 = E[(η(X_0 + τ_t Z; θ_t) − X_0)^2]

– With τ_t tracked by a one-dimensional recursion (state evolution)

SLIDE 22

Start with Belief Propagation

1. Gibbs measure: μ(dx) ∝ exp( −β ( ||y − A x||^2 / 2 + λ ||x||_1 ) )
2. Write cavity (belief propagation) equations at inverse temperature β
3. Each message is a probability distribution, with mean satisfying a fixed-point equation

SLIDE 23

AMP algorithm: derivation

1. Gibbs measure
2. Message-passing (MP) algorithm
3. Look at first-order approximation of the messages (AMP)

SLIDE 24

AMP algorithm

  • Approximate message passing (AMP)

– Donoho, Maleki, Montanari’09

x^{t+1} = η(x^t + A^T z^t; θ_t)
z^t = y − A x^t + (||x^t||_0 / n) z^{t−1}

– The last term is the Onsager reaction term.
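
A minimal numpy sketch of this iteration. The threshold rule θ_t = α τ_t with τ_t = ||z^t|| / √n is one common calibration, used here as an assumption rather than the only choice in the literature:

```python
import numpy as np

def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def amp_lasso(y, A, alpha, n_iter=30):
    """AMP for the LASSO:
        x^{t+1} = eta(x^t + A^T z^t; theta_t)
        z^t     = y - A x^t + (||x^t||_0 / n) z^{t-1}   (Onsager reaction term)
    """
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(n_iter):
        tau = np.linalg.norm(z) / np.sqrt(n)            # effective noise estimate
        x = soft_threshold(x + A.T @ z, alpha * tau)
        z = y - A @ x + (np.count_nonzero(x) / n) * z   # Onsager term uses previous z
    return x
```
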

SLIDE 25

AMP and compressed sensing

  • AMP was originally designed to solve this problem: min_x ||x||_1 subject to y = A x
  • This is equivalent to the LASSO solution when λ → 0
  • In this case the assumption is that x0 is the solution to the above optimization when the L1 norm is replaced by the L0 norm

SLIDE 26

Phase transition line and algorithms

Source: Arian Maleki’s PhD Thesis

SLIDE 27

AMP algorithm

  • Approximate message passing (AMP)

– Donoho, Maleki, Montanari’09

  • For Gaussian A, as t → ∞ the iteration converges, and accuracy ε can be achieved after O(log(1/ε)) iterations

– Bayati, Montanari’12

SLIDE 28

Main steps of the proof

1. We use a conditioning technique (due to Bolthausen) to prove state evolution for the AMP iterates:

(1/p) Σ_i ψ(x_i^t, x_{0,i}) → E[ ψ(η(X_0 + τ_t Z; θ_t), X_0) ]

– Where τ_t follows the one-dimensional state evolution recursion

2. We show that the AMP iterates converge to the LASSO solution: lim_{t→∞} lim_{p→∞} (1/p) ||x^t − x̂(λ)||^2 = 0

  • 3. Therefore the algorithm’s estimate satisfies the main claim.
SLIDE 29

Recall the main result

  • Theorem (Bayati-Montanari): for n/p → δ and Gaussian A, almost surely

lim_{p→∞} (1/p) ||x̂(λ) − x_0||^2 = E[(η(X_0 + τ_* Z; θ_*) − X_0)^2]

  • Problem: the right-hand side requires knowledge of the noise level σ and of the distribution of x0. What can be done when we do not have that?

SLIDE 30

Objective

  • Recall the problem: y = A x_0 + w
  • Given (y, A), construct estimators for the MSE and the noise level σ^2 (*)
  • So far we used knowledge of the noise and of the distribution of x0, which is not realistic.
  • Next, we’ll demonstrate how to solve (*).
SLIDE 31

Recipe (Columns of A are iid)

  • 1. Let τ̂^2 = ||y − A x̂(λ)||^2 / [ n (1 − ||x̂(λ)||_0 / n)^2 ]
  • 2. Define pseudo-data x̂^u = x̂(λ) + A^T (y − A x̂(λ)) / (1 − ||x̂(λ)||_0 / n)
  • 3. The estimators are

R̂ = (1/p) ||x̂(λ) − x̂^u||^2 + τ̂^2 ( 2 ||x̂(λ)||_0 / p − 1 )   and   σ̂^2 = τ̂^2 − R̂ / δ

where δ = n/p and ||x̂(λ)||_0 is the number of nonzero coordinates of x̂(λ)
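
A numpy sketch of this recipe, taking the LASSO solution x̂(λ) as input. The normalizations follow the formulas above, which are my reconstruction of the slide's (lost) equations, so treat them as assumptions rather than a verbatim transcription:

```python
import numpy as np

def risk_and_noise_estimates(y, A, x_hat):
    """Estimate the effective noise tau^2, the LASSO risk (MSE), and the
    noise level sigma^2 from (y, A) and the LASSO solution x_hat alone."""
    n, p = A.shape
    delta = n / p
    df = np.count_nonzero(x_hat)              # degrees of freedom, ||x_hat||_0
    resid = y - A @ x_hat
    c = 1.0 - df / n
    tau2 = np.sum(resid**2) / (n * c**2)      # step 1: effective noise estimate
    x_u = x_hat + (A.T @ resid) / c           # step 2: pseudo-data, ~ x0 + tau * Z
    # Step 3: SURE-type risk estimate and the induced noise-level estimate.
    R_hat = np.sum((x_hat - x_u)**2) / p + tau2 * (2.0 * df / p - 1.0)
    sigma2_hat = tau2 - R_hat / delta         # from tau^2 = sigma^2 + MSE / delta
    return tau2, R_hat, sigma2_hat
```
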

SLIDE 32

Main Result

Theorem (Bayati-Erdogdu-Montanari’13): for n/p → δ and Gaussian A, the estimators are consistent: R̂ converges to the true MSE, and σ̂^2 to the true noise level, almost surely as p → ∞.

  • For correlated columns we have a similar (non-rigorous) formula that relies on a conjecture, based on the replica method, due to Javanmard-Montanari’13.

SLIDE 33

Sketch of the proof

  • By the decoupling result, the pseudo-data behaves like x̂^u ≈ x_0 + τ_* Z

– where Z has iid standard Gaussian entries

  • Using Stein’s SURE estimate: an unbiased estimate of the risk of soft thresholding applied to x̂^u (see below)
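
For reference, Stein's unbiased risk estimate (SURE) for soft thresholding of pseudo-data u = x_0 + τZ takes the standard form below, using that the divergence of η(·; θ) equals the number of nonzero output coordinates:

```latex
\frac{1}{p}\,\mathbb{E}\,\bigl\|\eta(u;\theta)-x_0\bigr\|^2
  \;=\; \frac{1}{p}\,\mathbb{E}\,\bigl\|\eta(u;\theta)-u\bigr\|^2
  \;+\; \frac{2\tau^{2}}{p}\,\mathbb{E}\,\bigl\|\eta(u;\theta)\bigr\|_{0}
  \;-\; \tau^{2}.
```
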
SLIDE 34

MSE Estimation (iid Gaussian Data)

SLIDE 35

MSE Estimation (Correlated Gaussian Data)

Relies on a replica method conjecture

SLIDE 36

Comparison with noise estimation methods

  • Belloni, Chernozhukov (2009)
  • Fan, Guo, Hao (2010)
  • Sun, Zhang (2010)
  • Zhang (2010)
  • Städler, Bühlmann, van de Geer (2010, 2012)
SLIDE 37

Noise Estimation (iid Gaussian Data)

SLIDE 38

Noise Estimation (Correlated Gaussian Data)

SLIDE 39

Extensions to general random matrices

SLIDE 40

Recall AMP and MP algorithm

1. Gibbs measure
2. Message-passing algorithm
3. Look at first-order approximation of the messages.

SLIDE 41

General random matrices (i.n.i.d.)

Theorem (Bayati-Lelarge-Montanari’12):
1) As n, p → ∞, finite marginals of the AMP iterates are asymptotically insensitive to the distribution of the entries of A, provided it has sub-exponential tails.
2) The entries are asymptotically Gaussian with zero mean, and variance that can be calculated by a one-dimensional equation.
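
A small self-contained experiment in the spirit of this theorem: run AMP with a Rademacher (±1/√n) sensing matrix and check that the effective observation x^t + A^T z^t behaves like x_0 plus Gaussian noise of standard deviation roughly τ_t. Problem sizes, prior, and threshold rule are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

rng = np.random.default_rng(4)
n, p, sigma, alpha = 1000, 2000, 0.1, 1.5

A = rng.choice([-1.0, 1.0], size=(n, p)) / np.sqrt(n)   # non-Gaussian entries
x0 = rng.normal(size=p) * (rng.random(p) < 0.1)         # illustrative sparse signal
y = A @ x0 + sigma * rng.normal(size=n)

x, z = np.zeros(p), y.copy()
for _ in range(15):
    tau = np.linalg.norm(z) / np.sqrt(n)
    s = x + A.T @ z                     # effective observation, ~ x0 + tau * Z
    x = soft_threshold(s, alpha * tau)
    z = y - A @ x + (np.count_nonzero(x) / n) * z

print(f"tau_t = {tau:.4f}, empirical std of (s - x0) = {(s - x0).std():.4f}")
```
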

SLIDE 42

Main steps of the proof

Step 1: AMP is asymptotically equivalent to its belief propagation (MP) counterpart (w.l.o.g. assume A is symmetric)


SLIDE 43

Main steps of the proof

Step 2: MP messages are summations over non-backtracking trees.

[Figure: example of a non-backtracking tree on nodes a, b, i, l]


SLIDE 45

Main steps of the proof

Step 2: (continued)

SLIDE 46

Main steps of the proof

Step 2: (continued) The first term is independent of the distribution and only depends on the second moments.

– Each edge is repeated twice
– Converges to 0 as p grows

SLIDE 47

Extensions and open directions

  • Setting: general distributions on A, other cost functions/regularizers
  • Promising progress:

– Rangan et al ’10-12
– Schniter et al ’10-12
– Donoho-Johnston-Montanari ’11
– Maleki et al ’11
– Krzakala-Mézard-Sausset-Sun-Zdeborová ’11-12
– Bean-Bickel-El Karoui-Yu ’12
– Bayati-Lelarge-Montanari ’12
– Javanmard-Montanari ’12-13
– Kabashima et al ’12-14
– Manoel-Krzakala-Tramel-Zdeborová ’14
– Caltagirone-Krzakala-Zdeborová ’14
– Schülke-Caltagirone-Zdeborová ’14

SLIDE 48

Thank you!