SLIDE 1

M-Estimation under High-Dimensional Asymptotics

DLD, Andrea Montanari 2014-05-01

SLIDE 2

“An out-of-the-park grand-slam home run”∗

Annals of Mathematical Statistics, 1964

∗ Richard Olshen

SLIDE 3

[image-only slide]

SLIDE 4

M-estimation Basics

Location model: Yᵢ = θ + Zᵢ, i = 1, …, n. Errors: Zᵢ ∼ F, not necessarily Gaussian.

“Loss” function ρ(t), e.g. t², |t|, −log f(t), …

(M)   min_θ ∑_{i=1}^{n} ρ(Yᵢ − θ)

Asymptotic distribution: √n(θ̂ₙ − θ) ⇒_D N(0, V), n → ∞.

Asymptotic variance, with ψ = ρ′:

V(ψ, F) = ∫ ψ² dF / (∫ ψ′ dF)²

Information bound: V(ψ, F) ≥ 1/I(F).
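To make the formula concrete, here is a minimal numerical sketch (assuming NumPy/SciPy are available; the Huber threshold λ = 1.345 and the choice F = N(0, 1) are illustrative, not from the slides). It computes the location M-estimate by direct minimization and evaluates V(ψ, F) = ∫ψ²dF / (∫ψ′dF)² by Monte Carlo.

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam = 1.345  # illustrative Huber threshold

def rho(t):  # Huber loss: quadratic near zero, linear in the tails
    return np.where(np.abs(t) <= lam, t**2 / 2, lam * np.abs(t) - lam**2 / 2)

def psi(t):  # score psi = rho'
    return np.clip(t, -lam, lam)

rng = np.random.default_rng(0)
n, theta_true = 10_000, 2.0
Y = theta_true + rng.standard_normal(n)   # location model with F = N(0, 1)

# (M): minimize the empirical risk over the scalar theta
theta_hat = minimize_scalar(lambda th: rho(Y - th).sum()).x

# Monte Carlo for A = integral psi^2 dF and B = integral psi' dF
Z = rng.standard_normal(10**6)
A = np.mean(psi(Z)**2)
B = np.mean(np.abs(Z) <= lam)             # psi' = 1 on |z| < lam, else 0
print(theta_hat, A / B**2)                # the estimate, and V(psi, F)
```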

SLIDE 5

The One-Step Viewpoint

[Facsimile of the first page of: P. J. Bickel, “One-Step Huber Estimates in the Linear Model,” Journal of the American Statistical Association, June 1975, Vol. 70, No. 350 (Theory and Methods), 428. Abstract: “Simple ‘one-step’ versions of Huber’s (M) estimates for the linear model are introduced. Some relevant Monte Carlo results obtained in the Princeton project [1] are singled out and discussed. The large sample behavior of these procedures is examined under very mild regularity conditions.”]

JASA 1975

SLIDE 6

Regression M-estimation: the One-Step Viewpoint

Regression model: Yᵢ = Xᵢ′θ + Zᵢ, Zᵢ ∼ᵢᵢd F, i = 1, …, n.

Objective function of (M): R(ϑ) = ∑_{i=1}^{n} ρ(Yᵢ − Xᵢ′ϑ)

(M)   min_ϑ R(ϑ)

One-step estimate: with θ̃ₙ any √n-consistent estimate of θ,

θ̂₁ = θ̃ₙ − [Hess R|_{θ̃ₙ}]⁻¹ ∇R|_{θ̃ₙ}.

Effectiveness: with θ̂ the true solution of the (M)-equation,

E(θ̂₁ − θ̂)(θ̂₁ − θ̂)′ = o(n⁻¹).
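A minimal sketch of the one-step construction (NumPy only; the Huber score with λ = 1.345, the least-squares initializer, and t-distributed errors are illustrative assumptions): one Newton step on R(ϑ) from a √n-consistent starting point.

```python
import numpy as np

lam = 1.345                                         # illustrative threshold
psi = lambda t: np.clip(t, -lam, lam)               # rho'
dpsi = lambda t: (np.abs(t) <= lam).astype(float)   # rho'' (a.e.)

rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.standard_normal((n, p))
theta = rng.standard_normal(p)
Y = X @ theta + rng.standard_t(df=3, size=n)        # heavy-tailed errors

theta_tilde = np.linalg.lstsq(X, Y, rcond=None)[0]  # sqrt(n)-consistent start
r = Y - X @ theta_tilde                             # residuals at theta_tilde
grad = -X.T @ psi(r)                                # gradient of R
hess = X.T @ (dpsi(r)[:, None] * X)                 # Hessian of R
theta_one_step = theta_tilde - np.linalg.solve(hess, grad)
```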

SLIDE 7

Driving Idea of Classical Asymptotics

The M-estimate is asymptotically equivalent to a single step of Newton’s method for finding a zero of ∇R, starting at the true underlying parameter. This goes back to Fisher’s ‘Method of Scoring’ for MLE.

SLIDE 8

Derivation of Asymptotic Variance Formula

Approximation to the one-step:

θ̂₁ = θ + (1/B(ψ, F)) (X′X)⁻¹ X′ (ψ(Zᵢ)) + o_p(n⁻¹/²),   where B(ψ, F) = ∫ ψ′ dF.

Observe that

Var((X′X)⁻¹ X′ (ψ(Zᵢ))) ∼ (X′X)⁻¹ A(ψ, F),   where A(ψ, F) = ∫ ψ² dF.

Hence, if Xᵢ,ⱼ ∼ N(0, 1/n),

Var(θ̂ᵢ − θᵢ) → A(ψ, F)/B(ψ, F)² = V(ψ, F).

SLIDE 9

Asymptotics for Regression, I

SLIDE 10

Asymptotics for Regression, II

PJ Huber, Annals of Statistics 1973

SLIDE 11

[image-only slide]

SLIDE 12

[image-only slide]

SLIDE 13

On robust regression with high-dimensional predictors

Noureddine El Karoui, Derek Bean, Peter J. Bickel, Chinghway Lim, and Bin Yu

Department of Statistics, University of California, Berkeley, CA 94720; and Department of Statistics and Applied Probability, Faculty of Science, National University of Singapore, 119077

Contributed by Peter J. Bickel, April 25, 2013 (sent for review March 1, 2012)

We study regression M-estimates in the setting where p, the number of covariates, and n, the number of observations, are both large, but p ≤ n. We find an exact stochastic representation for the distribution of β̂ = argmin_{β∈ℝᵖ} ∑_{i=1}^{n} ρ(Yᵢ − Xᵢ′β) at fixed p and n under various assumptions on the objective function ρ and our statistical model. A scalar random variable whose deterministic limit r_ρ(κ) can be studied when p/n → κ > 0 plays a central role in this representation. We discover a nonlinear system of two deterministic equations that characterizes r_ρ(κ). Interestingly, the system shows that r_ρ(κ) depends on ρ through proximal mappings of ρ as well as various aspects of the statistical model underlying our study. Several surprising results emerge. In particular, we show that, when p/n is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.

prox function | high-dimensional statistics | concentration of measure

(While the paper was under review, we have managed to obtain rigorous proofs for many of our assertions. They will be presented elsewhere because they are very long and technical.) We give several results for covariates that are Gaussian or derived from Gaussian but present grounds that the behavior holds much more generally, the key being concentration of certain quadratic forms involving the vectors of covariates. We also investigate the sensitivity of our results to the geometry of the design matrix. [Further results with different designs can be found in our work (5).] We find that (i) estimates of coordinates and contrasts that have coefficients independent of the observed covariates continue to be unbiased and asymptotically normal; and (ii) as in the fixed p case, this happens at scale n⁻¹/², at least when the minimal and maximal eigenvalues of the covariance of the predictors stay bounded away from 0 and ∞, respectively. These findings are obtained by (i) using leave-one-out perturbation arguments both for the data units and predictors; (ii) …

SLIDE 14

Bickel, Yu, El Karoui

SLIDE 15

High-Dimensional Asymptotics (HDA)

◮ n, pₙ → ∞.
◮ Xᵢ,ⱼ ∼ᵢᵢd N(0, 1/n).
◮ pₙ⁻¹ ‖θ₀,ₙ‖₂² → τ₀².
◮ Y = Xθ₀ + Z.
◮ n/pₙ → δ ∈ (1, ∞).
◮ Meaning of δ: “# of observations per parameter to be estimated”.

SLIDE 16

Emergent Phenomena under HDA

Classical setting (random design, p fixed, n → ∞):
var(θ̂ᵢ) → V(ψ, F), n → ∞.

HDA setting, for n/pₙ → δ ∈ (1, ∞):
◮ Effective score Ψ̃ = Ψ̃_{δ,ψ,F} (to be described…)
◮ Effective error distribution F̃ = F ⋆ N(0, τ∞²), with extra Gaussian noise τ∞ = τ∞(δ, ψ, F).
◮ Asymptotic variance under HDA: var(θ̂ᵢ) → V(Ψ̃, F̃), n, pₙ → ∞.
◮ Classical correspondence: Ψ̃_{δ,ψ,F} → ψ and F̃_{δ,ψ,F} → F as δ → ∞.

SLIDE 17

Immediate Implications

1. Classical formulas for confidence statements about M-estimates are overoptimistic under high-dimensional asymptotics (even dramatically so).

2. Maximum likelihood estimates are inefficient under high-dimensional asymptotics: the score ψ = (−log f_W)′ does not yield an efficient estimator.

3. The usual Fisher information bound is not attainable, as I(F̃) < I(F).

SLIDE 18

Our Paper: http://arxiv.org/pdf/1310.7320.pdf

Sample implication of DLD & Montanari (2013):

Corollary. For an M-estimator under HDA with errors Zᵢ ∼ᵢᵢd F:

lim_{n→∞} Aveᵢ Var(θ̂ᵢ) ≥ (1/(1 − 1/δ)) · (1/I(F))

◮ The effect of the HDA parameter δ ∈ (1, ∞) is evident.

SLIDE 19

[image-only slide]

SLIDE 20

Regularized ρ

Definition

ρ: ℝ → ℝ is smooth if it is C¹ with an absolutely continuous derivative ψ = ρ′ having a bounded a.e. derivative ψ′.

◮ Excludes ℓ₁: ρ_{ℓ₁}(z) = |z|.
◮ Allows Huber: ρ_H(z) = z²/2 for |z| ≤ λ, and λ|z| − λ²/2 for |z| > λ.

Regularized ρ:

ρ_b(z) ≡ min_{x∈ℝ} { b ρ(x) + ½(x − z)² },   (1)

the min-convolution of ρ with a squared loss.

SLIDE 21

Regularized Score Function

Regularized score function: Ψ(z; b) = ρ_b′(z).

Example: the Huber loss ρ_H(z; λ) has score function ψ(z; λ) = min(max(−λ, z), λ), and

Ψ(z; b) = b ψ( z/(1 + b); λ ).

Ψ is ‘like’ ψ, but with central slope ‖Ψ′(·; b)‖_∞ = b/(1 + b) < 1.

Effective score: for a special choice b∗, Ψ̃ = Ψ(·; b∗).
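A minimal numerical check of this closed form (NumPy/SciPy; λ = 3 and b = 0.5 are illustrative choices): differencing the min-convolution (1) of the Huber loss reproduces Ψ(z; b) = b ψ(z/(1 + b); λ).

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam, b = 3.0, 0.5  # illustrative parameters

def rho_H(x):      # Huber loss
    return np.where(np.abs(x) <= lam, x**2 / 2, lam * np.abs(x) - lam**2 / 2)

def rho_b(z):      # regularized rho: the min-convolution in (1)
    return minimize_scalar(lambda x: b * rho_H(x) + (x - z)**2 / 2).fun

psi = lambda t: np.clip(t, -lam, lam)   # Huber score

z = np.linspace(-10, 10, 201)
eps = 1e-5
Psi_numeric = np.array([(rho_b(v + eps) - rho_b(v - eps)) / (2 * eps) for v in z])
Psi_closed = b * psi(z / (1 + b))
print(np.max(np.abs(Psi_numeric - Psi_closed)))   # agreement up to ~1e-6
```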

SLIDE 22

Approximate Message Passing Algorithm

Initialization: θ⁰ = 0.

Adjusted residuals:
Rᵗ = Y − Xθᵗ + Ψ(Rᵗ⁻¹; bₜ₋₁)   (2)

Effective score: choose bₜ > 0 achieving empirical average slope p/n ∈ (0, 1):
p/n = (1/n) ∑_{i=1}^{n} Ψ′(Rᵗᵢ; bₜ)   (3)

Scoring: apply the effective score function Ψ(Rᵗ; bₜ):
θᵗ⁺¹ = θᵗ + δ XᵀΨ(Rᵗ; bₜ)   (4)
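A minimal sketch of the iteration (2)-(4) for the Huber loss (NumPy/SciPy; the sizes, λ = 3, and the contaminated-normal errors mirror the running example on the later slides, while solving (3) with a scalar root-finder is an implementation choice, not something the slides prescribe):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
n, p, lam = 1000, 200, 3.0
delta = n / p
X = rng.standard_normal((n, p)) / np.sqrt(n)   # X_ij ~ N(0, 1/n)
theta0 = 6.0 * rng.standard_normal(p)          # ||theta0||_2 ~ 6 sqrt(p)
Z = np.where(rng.random(n) < 0.05, 10.0, rng.standard_normal(n))  # CN(0.05, 10)
Y = X @ theta0 + Z

psi = lambda t: np.clip(t, -lam, lam)          # Huber score
Psi = lambda z, b: b * psi(z / (1 + b))        # regularized score
dPsi = lambda z, b: (b / (1 + b)) * (np.abs(z / (1 + b)) <= lam)

theta, R = np.zeros(p), Y.copy()               # theta^0 = 0, so R^0 = Y
for t in range(20):
    # (3): pick b_t so the empirical average slope equals p/n = 1/delta
    b = brentq(lambda s: dPsi(R, s).mean() - 1 / delta, 1e-6, 1e3)
    PsiR = Psi(R, b)
    theta = theta + delta * (X.T @ PsiR)       # (4): scoring step
    R = Y - X @ theta + PsiR                   # (2): adjusted residuals
print(np.mean((theta - theta0)**2))            # per-coordinate MSE
```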

SLIDE 23

Determining Regularization Parameter bₜ

Example: ρ = ρ_H(·; 3), F = 0.95 N(0, 1) + 0.05 δ₁₀.

[Figure, two panels. Left, “Determination of b₃”: the empirical average slope as a function of b. Right, “History of bₜ”: bₜ across AMP iterations t.]

SLIDE 24

Connection of AMP to M-estimation

Lemma

Let (θ̂∗, R∗, b∗) be a fixed point of the AMP iteration (2), (3), (4) with b∗ > 0. Then θ̂∗ is a minimizer of the problem (M). Vice versa, any minimizer θ̂∗ of the problem (M) corresponds to an AMP fixed point of the form (θ̂∗, R∗, b∗).

SLIDE 25

Example:

◮ Size: n = 1000, p = 200, so δ = 5.
◮ Truth: θ₀ random, with ‖θ₀‖₂ = 6√p.
◮ Errors: F = CN(0.05, 10), i.e. F = 0.95Φ + 0.05H₁₀, where H_x denotes a unit mass at x.
◮ Loss: ρ = ρ_H(z; λ) with λ = 3.
◮ Iterations: run AMP for 20 iterations.
◮ Comparison: use CVX to obtain the Huber estimator directly.

SLIDE 26

Convergence of AMP in Example

[Figure, two panels. Left, “Convergence of AMP θ̂ᵗ to θ₀”: RMSE(θ̂ᵗ, θ₀) against AMP iteration t, for AMP and the (M)-estimate. Right, “Convergence of AMP θ̂ᵗ to θ̂”: RMSE(θ̂ᵗ, θ̂) against t.]

SLIDE 27

Contrast AMP w/ Traditional Scoring

Traditional residuals: Zᵢ = Yᵢ − Xᵢ′θ̃.

Traditional scoring:
θ̂₁ = θ̃ + [ (1/n) ∑_{i=1}^{n} ψ′(Zᵢ) ]⁻¹ (X′X)⁻¹ X′ (ψ(Zᵢ))   (5)

AMP:
θᵗ⁺¹ = θᵗ + δ n⁻¹ · X′ Ψ(Rᵗ; bₜ)

Object        | Scoring            | AMP
Average slope | ∑ᵢ ψ′(Zᵢ)/n        | ∑ᵢ Ψ′(Rᵗᵢ; bₜ)/n ≡ δ⁻¹
Gram          | (X′X)⁻¹            | 1
Residual      | traditional (Zᵢ)   | adjusted (Rᵗᵢ)
Score         | ψ(·)               | Ψ(·; bₜ)
Basepoint     | true θ             | current iterate
Iterations    | 1                  | t ≥ 1

SLIDE 28

Basic contrasts between HDA and Classical Asymptotics

Under the high-dimensional asymptotic, δ ∈ (1, ∞):

◮ The heuristic “expand around θ” is incorrect; it understates the true uncertainty.
◮ The heuristic “errors equal the residuals” is incorrect; the residuals are noisier.
◮ The heuristic “take one step” is incorrect; one must take many steps.

SLIDE 29

Extra Variance at Initialization of AMP

◮ Initialize with θ̂⁰ = 0, R⁻¹ = 0.
◮ Initial residual: R⁰ = Y − Xθ̂⁰ = Z + X(θ₀ − θ̂⁰).
◮ The terms Z and X(θ₀ − θ̂⁰) are independent.
◮ X(θ₀ − θ̂⁰) is Gaussian with variance τ₀² = ‖θ₀ − θ̂⁰‖₂²/n = MSE(θ̂⁰, θ₀)/δ:

Var(R⁰ᵢ) = Var(Z) + Var((X(θ₀ − θ̂⁰))ᵢ) = Var(Z) + MSE(θ̂⁰, θ₀)/δ.

◮ Extra Gaussian noise: τ₀² = MSE(θ̂⁰, θ₀)/δ.
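A quick simulation check of this variance decomposition (NumPy; the sizes are illustrative): with θ̂⁰ = 0, the entries of the initial residual should carry variance Var(Z) + τ₀².

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 5000, 1000                             # delta = 5
X = rng.standard_normal((n, p)) / np.sqrt(n)  # X_ij ~ N(0, 1/n)
theta0 = rng.standard_normal(p)
Z = rng.standard_normal(n)                    # Var(Z) = 1
R0 = Z + X @ theta0                           # initial residual, theta^0 = 0

tau0_sq = np.sum(theta0**2) / n               # ||theta0 - theta^0||^2 / n
print(R0.var(), 1.0 + tau0_sq)                # empirical vs Var(Z) + tau_0^2
```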

SLIDE 30

Evolution of Extra Variance at later AMP iterations

Pretend Rᵗ ≈ Z + τₜW, with W an independent standard normal.

Define the variance map
V(τ², b; δ, F) = δ E{ Ψ(Z + τW; b)² },

and the slope parameter b = b(τ; δ, F): the smallest solution b ≥ 0 of
1/δ = E{ Ψ′(Z + τW; b) }.   (6)

Definition

The State Evolution dynamical system {τₜ²}ₜ≥₀ starts at τ₀² ∈ ℝ≥₀ and evolves by
τₜ₊₁² = V(τₜ², b(τₜ)) = V(τₜ², b(τₜ; δ, F); δ, F).   (7)
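A minimal Monte Carlo sketch of (6)-(7) (NumPy/SciPy; δ = 5, Huber λ = 3, and CN(0.05, 10) errors as in the running example; the sample size and root-finding bracket are implementation choices):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
delta, lam, m = 5.0, 3.0, 10**6
Z = np.where(rng.random(m) < 0.05, 10.0, rng.standard_normal(m))  # Z ~ F
W = rng.standard_normal(m)                                        # W ~ N(0,1)

psi = lambda t: np.clip(t, -lam, lam)
Psi = lambda z, b: b * psi(z / (1 + b))
dPsi = lambda z, b: (b / (1 + b)) * (np.abs(z / (1 + b)) <= lam)

def b_of_tau(tau):  # smallest b >= 0 solving the slope equation (6)
    return brentq(lambda s: dPsi(Z + tau * W, s).mean() - 1 / delta, 1e-8, 1e3)

tau2 = 4.0          # initial tau_0^2
for t in range(10):
    b = b_of_tau(np.sqrt(tau2))
    tau2 = delta * np.mean(Psi(Z + np.sqrt(tau2) * W, b)**2)      # map (7)
    print(t, tau2)  # decreases toward the fixed point tau_*^2
```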

SLIDE 31

Variance Map, in running example

[Figure: “Variance map V(τ²), Contaminated Normal”: output variance V(τ²) plotted against input variance τ₀², with the line y = x; the curves cross at approximately (0.472, 0.472).]

F = 0.95 N(0, 1) + 0.05 H₁₀, ρ_H with λ = 3.

SLIDE 32

State Evolution, in running example

[Figure: “Dynamics of τₜ²”: cobweb iteration of the variance map V(τ²) against the line y = x, passing through the successive values 2.056, 0.819, 0.545, 0.486.]

F = 0.95 N(0, 1) + 0.05 H₁₀, ρ_H with λ = 3.

SLIDE 33

Operating Characteristics from State Evolution

Definition

The state evolution formalism assigns predictions E{·|S} under states S = (τ, b, δ, F), where W ∼ N(0, 1) and Z ∼ F is independent of W:

◮ Functions ξ of θ̂ − θ₀. Predict p⁻¹ ∑_{i≤p} ξ(θ̂ᵢ − θ₀,ᵢ) by
E(ξ(θ̂ − ϑ)|S) ≡ E{ ξ(√δ τ W) }.

◮ Functions ξ₂ of residual and error. Predict n⁻¹ ∑_{i=1}^{n} ξ₂(Rᵢ, Zᵢ) by
E(ξ₂(R, Z)|S) ≡ E{ ξ₂(Z + τW, Z) }.

SLIDE 34

Example of State Evolution Predictions

◮ MSE at iteration t. Let Sₜ = (τₜ, b(τₜ), δ, F) denote the state of AMP at iteration t, and predict
MSE(θ̂ᵗ, θ₀) ≈ E((ϑ̂ − ϑ)²|Sₜ) = E{ (√δ τₜ W)² } = δτₜ².

◮ MSE at convergence. With τ∗ > 0 the limit of τₜ, let S∗ = (τ∗, b(τ∗), δ, F) denote the equilibrium state of AMP:
MSE(θ̂∗, θ₀) ≈ E((ϑ̂ − ϑ)²|S∗) = E{ (√δ τ∗ W)² } = δτ∗².

◮ Ordinary residuals Y − Xθ̂∗ at AMP convergence. Setting η(z; b) = z − Ψ(z; b),
Y − Xθ̂∗ ⇒_D η(Z + τ∗W; b∗).

SLIDE 35

[Figure: experimental means from 10 simulations compared with State Evolution predictions, under CN(0.05, 10) with Huber ψ, λ = 3. Upper left: τ̂ₜ = ‖θᵗ − θ₀‖₂/√n. Upper right: b̂ₜ. Lower left: MSE, mean squared error. Lower right: MAE, mean absolute error. Blue ‘+’ symbols: empirical means of AMP observables. Green curve: theoretical predictions by SE.]

SLIDE 36

Lower Bounds on State Evolution

Lemma. Suppose that F has a well-defined Fisher information I(F). Then, for any t > 0,

τₜ² ≥ 1/(δ I(F)).

Lemma (uses Barron-Madiman, 2007).

I(F ⋆ N(0, τ²)) ≤ I(F)/(1 + τ² I(F)).

Corollary. For t > k,

τₜ² ≥ (1 + 1/δ + 1/δ² + ··· + 1/δᵏ)/(δ I(F)).

Corollary. For every accumulation point τ∗ of State Evolution,

τ∗² ≥ (1/(δ − 1)) · (1/I(F)).

Corollary. For an M-estimator under HDA with errors Zᵢ ∼ᵢᵢd F:

lim_{n→∞} Var(θ̂ᵢ) ≥ (1/(1 − 1/δ)) · (1/I(F)).

SLIDE 37

Correctness of State Evolution 1

Basic assumptions:
A1 The discrepancy function ρ is convex and smooth.
A2 The matrices {X(n)}ₙ have entries ∼ᵢᵢd N(0, 1/n).
A3 θ₀ and θ̂⁰ = 0 are deterministic sequences such that AMSE(θ̂⁰, θ₀) = δτ₀².
A4 F has a finite second moment.

Terminology:
Let {τₜ²}ₜ≥₀ denote the state evolution sequence with initial condition τ₀².
Let {θ̂ᵗ, Rᵗ}ₜ≥₀ be the AMP trajectory with parameters bₜ.

Definition: A function ξ: ℝᵏ → ℝ is pseudo-Lipschitz if there exists L < ∞ such that, for all x, y ∈ ℝᵏ,
|ξ(x) − ξ(y)| ≤ L(1 + ‖x‖₂ + ‖y‖₂) ‖x − y‖₂.

SLIDE 38

Correctness of State Evolution 2

Theorem. Assume A1-A4. Let ξ: ℝ → ℝ and ξ₂: ℝ × ℝ → ℝ be pseudo-Lipschitz functions. Then, for any t > 0, we have, for W ∼ N(0, 1) independent of Z ∼ F:

lim_{n→∞} (1/p) ∑_{i=1}^{p} ξ(θ̂ᵗᵢ − θ₀,ᵢ) =_{a.s.} E{ ξ(√δ τₜ W) },   (8)

lim_{n→∞} (1/n) ∑_{i=1}^{n} ξ₂(Rᵗᵢ, Zᵢ) =_{a.s.} E{ ξ₂(Z + τₜW, Z) }.   (9)

SLIDE 39

Corollary of Correctness

Under HDA, one can define

AMSE(θ̂ᵗ, θ₀) =_{a.s.} lim_{n,p→∞} ‖θ̂ᵗ − θ₀‖₂²/p,

and then

AMSE(θ̂ᵗ, θ₀) = δτₜ²   and   lim_{t→∞} AMSE(θ̂ᵗ, θ₀) = δτ∗².

SLIDE 40

Convergence of AMP to M-Estimator, 2

Theorem. Assume A1-A4, and that ρ is strongly convex and δ > 1. Let (τ∗, b∗) be a solution of the two equations

τ² = δ E{ Ψ(Z + τW; b)² },   (10)
1/δ = E{ Ψ′(Z + τW; b) }.   (11)

Assume that AMSE(θ̂⁰, θ₀) = δτ∗². Then

lim_{t→∞} AMSE(θ̂ᵗ, θ̂) = 0.   (12)

SLIDE 41

Main Result: Asymptotic Variance Formula under High-Dimensional Asymptotics

Corollary. Assume the setting of the previous Theorem. Then

lim_{n,p→∞} Aveᵢ∈[p] Var(θ̂ᵢ) =_{a.s.} V(Ψ̃, F̃),   (13)

where the effective score is Ψ̃(·) = Ψ(·; b∗), while the effective noise distribution is F̃ = F ⋆ N(0, τ∗²). Here (τ∗, b∗) is the unique solution of equations (10)-(11).

SLIDE 42

Driving Idea of High-Dimensional Asymptotics, 1

◮ The M-estimate is asymptotically equivalent to the limit of many steps of AMP.
◮ The first step of AMP contains extra Gaussian noise.
◮ The extra Gaussian noise is caused by errors in the initial parameter estimate.
◮ The extra Gaussian noise declines with AMP iterations but does not go away completely; it declines to the level defined by the fixed-point equations of state evolution.
◮ The extra Gaussian noise in the M-estimate is characterized by the fixed-point equations of state evolution.

SLIDE 43

Driving Idea of High-Dimensional Asymptotics, 2

Classically, the M-estimate propagates the true errors through the score:

θ̂ = θ₀ + (1/B(ψ, F)) (X′X)⁻¹ X′ (ψ(Zᵢ)) + o_p(n⁻¹/²).

Under HDA, it propagates noisier effective errors through the effective score:

θ̂ = θ₀ + (1/B(Ψ̃, F̃)) · n⁻¹ · X′ (Ψ̃(Zᵢ + τ∗Wᵢ)) + o_p(n⁻¹/²),

where Wᵢ ⊥⊥ Zᵢ, both are iid, and Ψ̃ is the effective score.

SLIDE 44

How is Correctness of State Evolution Proved, 1?

Simply apply an existing paper: Bayati, Mohsen, and Andrea Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Transactions on Information Theory 57.2 (2011): 764-785.

Bayati-Montanari 2011 was designed for a seemingly different problem, the asymptotics of the Lasso in the p > n case:

min ‖Ỹ − X̃β‖₂²/2 + λ‖β‖₁

The setting was compressed sensing, where X̃ᵢ,ⱼ ∼ᵢᵢd N(0, 1/n) and n/pₙ → δ ∈ (0, 1).

The formalism of AMP and State Evolution was introduced and developed in:
DLD, Arian Maleki, and Andrea Montanari, “Message-passing algorithms for compressed sensing,” PNAS 106.45 (2009): 18914-18919.
DLD, Arian Maleki, and Andrea Montanari, “The noise-sensitivity phase transition in compressed sensing,” IEEE Transactions on Information Theory 57.10 (2011): 6920-6941.

These papers systematically understood and used the ‘extra Gaussian noise’ property of high-dimensional asymptotics.

The generality of the Bayati-Montanari treatment easily accommodated M-estimation.

SLIDE 45

Iterative Thresholding

Soft threshold nonlinearity: η(x; λ) = (|x| − λ)₊ · sgn(x).

Iterative solution βᵗ, t = 0, 1, 2, 3, …, with β⁰ = 0:

zᵗ = Ỹ − X̃βᵗ
βᵗ⁺¹ = η(βᵗ + X̃∗zᵗ; λₜ)

A heuristic to solve the LASSO: min_β ‖Ỹ − X̃β‖₂²/2 + λ‖β‖₁.
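A minimal sketch of the iteration (NumPy; the sizes, sparse truth, and fixed threshold are illustrative; a step size 1/L with L = ‖X̃‖₂², which the display above elides, is included so the iteration provably converges):

```python
import numpy as np

def eta(x, lam):  # soft threshold nonlinearity
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(5)
n, p, lam = 200, 400, 0.1                      # p > n
Xt = rng.standard_normal((n, p)) / np.sqrt(n)
beta0 = np.zeros(p); beta0[:20] = 3.0          # sparse truth
Yt = Xt @ beta0 + 0.1 * rng.standard_normal(n)

L = np.linalg.norm(Xt, 2)**2                   # Lipschitz constant of the gradient
beta = np.zeros(p)
for t in range(500):
    z = Yt - Xt @ beta                         # residual
    beta = eta(beta + Xt.T @ z / L, lam / L)   # gradient step + threshold
print(np.count_nonzero(beta))
```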

SLIDE 46

Approximate Message Passing (AMP) Iterative Thresholding

First-order Approximate Message Passing (AMP) algorithm:

zᵗ = Ỹ − X̃βᵗ + (1/δ) zᵗ⁻¹ ⟨η′ₜ₋₁(X̃∗zᵗ⁻¹ + βᵗ⁻¹)⟩
βᵗ⁺¹ = ηₜ(βᵗ + X̃∗zᵗ),   where η′ₜ(s) = (∂/∂s) ηₜ(s).

Feature: essentially the same cost as iterative soft thresholding. If η is soft thresholding,

(1/δ) ⟨η′ₜ₋₁(X̃∗zᵗ⁻¹ + βᵗ⁻¹)⟩ = ‖βᵗ‖₀/n.
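For comparison with the iterative-thresholding sketch above, a minimal AMP version (NumPy; the same illustrative data generation, and a fixed threshold λₜ ≡ λ rather than an adaptive schedule): the only change is the Onsager correction added to the residual.

```python
import numpy as np

def eta(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(6)
n, p, lam = 200, 400, 1.0
Xt = rng.standard_normal((n, p)) / np.sqrt(n)
beta0 = np.zeros(p); beta0[:20] = 3.0
Yt = Xt @ beta0 + 0.1 * rng.standard_normal(n)

beta, z = np.zeros(p), Yt.copy()
for t in range(30):
    onsager = (np.count_nonzero(beta) / n) * z   # (1/delta) <eta'> z^{t-1}
    z = Yt - Xt @ beta + onsager                 # adjusted residual
    beta = eta(beta + Xt.T @ z, lam)             # thresholding step
print(np.count_nonzero(beta))
```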

SLIDE 47

How is Correctness of State Evolution Proved, 2?

Central recursion in Bayati-Montanari 2011, with hᵗ, qᵗ ∈ ℝᴺ and zᵗ, mᵗ ∈ ℝⁿ; initial condition q⁰, m⁻¹ = 0:

hᵗ⁺¹ = A∗mᵗ − ξₜqᵗ,   mᵗ = gₜ(bᵗ, w)
bᵗ = Aqᵗ − λₜmᵗ⁻¹,   qᵗ = fₜ(hᵗ, x₀)

Reaction coefficients: ξₜ = ⟨g′ₜ(bᵗ, w)⟩; λₜ = (1/δ)⟨f′ₜ(hᵗ, x₀)⟩.

State Evolution:

τₜ² = E{ gₜ(σₜZ, W)² };   σₜ² = (1/δ) E{ fₜ(τₜ₋₁Z, X₀)² },

where W ∼ F_W and X₀ ∼ F_{X₀}.

SLIDE 48

Term-by-term correspondence between the centered recursion of DLD-Montanari 2013 and the recursion analyzed in Bayati-Montanari 2011:

ϑᵗ⁺¹ = δXᵀΨ(W + Sᵗ; bₜ) + qₜϑᵗ:   ϑᵗ⁺¹ | Xᵀ | δΨ(W + Sᵗ; bₜ) | qₜ | ϑᵗ
hᵗ⁺¹ = A∗mᵗ − ξₜqᵗ:   hᵗ⁺¹ | A∗ | mᵗ | ξₜ | −qᵗ

Sᵗ = −Xϑᵗ + Ψ(W + Sᵗ⁻¹; bₜ₋₁):   Sᵗ | X | −ϑᵗ | 1 | Ψ(W + Sᵗ⁻¹; bₜ₋₁)
bᵗ = Aqᵗ − λₜmᵗ⁻¹:   bᵗ | A | qᵗ | −λₜ | mᵗ⁻¹

Table: Correspondences between terms in the centered recursions of DLD-Montanari 2013 and the recursions analyzed in Bayati-Montanari 2011.

We get an exact correspondence between the two systems, provided we identify δΨ(W + Sᵗ; bₜ) with mᵗ = gₜ(bᵗ; w) and −δhᵗ with fₜ(hᵗ). One has, in particular, that λₜ = (1/δ)⟨f′ₜ(hᵗ)⟩ = −1, and that ξₜ = ⟨g′ₜ(bᵗ, w)⟩ = δ⟨Ψ′(W + Sᵗ; bₜ)⟩ = qₜ.

SLIDE 49

Full Circle, 1

◮ Why did work on sparse signal recovery solve a problem in robust regression? Duality of sparsity and robustness:
◮ an explicit link between solutions of the Lasso with p > n and of Huber with p < n;
◮ an identity between estimating a sparsely nonzero vector and uncovering outliers in a linear relation.

SLIDE 50

(Lasso_λ)   min_{β∈ℝ^p̃} ½‖Ỹ − X̃β‖₂² + λ ∑_{i=1}^{p̃} |βᵢ|   (14)

(Huber_λ)   min_{ϑ∈ℝᵖ} ∑_{i=1}^{n} ρ_H(Yᵢ − ⟨Xᵢ, ϑ⟩; λ)   (15)

Let X̃ be a matrix with orthonormal rows such that X̃X = 0, i.e.

null(X̃) = image(X),   (16)

finally, set Ỹ = X̃Y.

Proposition. With problem instances (Y, X) and (Ỹ, X̃) related as above, the optimal values of the Lasso problem (Lasso_λ) and the Huber problem (Huber_λ) are identical. The solutions of the two problems are in one-to-one relation; in particular,

θ̂ = (XᵀX)⁻¹Xᵀ(Y − β̂).   (17)

(Numerous references: e.g. Art Owen/IPOD & earlier.)
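A minimal numerical check of the Proposition (NumPy/SciPy; small sizes, λ = 1, and a handful of planted outliers are illustrative): solve the Huber problem directly, solve the Lasso instance built from (16), and compare through (17).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, p, lam = 40, 5, 1.0
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)
Y[:4] += 8.0                                   # plant a few outliers

# Huber problem (15), solved directly
rho_H = lambda r: np.where(np.abs(r) <= lam, r**2 / 2, lam * np.abs(r) - lam**2 / 2)
theta_huber = minimize(lambda th: rho_H(Y - X @ th).sum(), np.zeros(p)).x

# X-tilde: orthonormal rows spanning the orthocomplement of image(X), eq. (16)
Q, _ = np.linalg.qr(X, mode='complete')
Xt = Q[:, p:].T                                # (n - p) x n, with Xt @ X = 0
Yt = Xt @ Y

# Lasso problem (14) via iterative soft thresholding
# (unit step is valid here: the rows of Xt are orthonormal, so ||Xt||_2 = 1)
eta = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
beta = np.zeros(n)
for _ in range(5000):
    beta = eta(beta + Xt.T @ (Yt - Xt @ beta), lam)

theta_lasso = np.linalg.solve(X.T @ X, X.T @ (Y - beta))   # map back via (17)
print(np.max(np.abs(theta_huber - theta_lasso)))           # small difference
```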

SLIDE 51

Full Circle, 3

◮ Isometry between the performance of:
◮ Lasso in the ε-sparse regression problem, p > n;
◮ Huber (M)-estimation with ε-contaminated data, p < n.

SLIDE 52

Two Scalar Minimax problems

◮ Huber (1964) minimax problem:

v(ε) = min_λ sup_H { V(ψ_λ, F) : F = (1 − ε)Φ + εH }

Here ψ_λ is the Huber ψ capped at λ, and Φ is N(0, 1).

◮ Minimax MSE under sparse means (DLD & Johnstone, 1992):

m(ε) = inf_λ sup_H { E_F(η_λ(X + Z) − X)² : X ∼ (1 − ε)ν₀ + εH }

Here η_λ is soft thresholding at λ, and ν₀ is the point mass at zero.

SLIDE 53

Full Circle, 4 (w/ Montanari, Johnstone)

◮ HDA regression with a fraction ε of contaminated errors, δ = n/p:

V(ε, δ) ≡ min_λ max_H lim_{n→∞} Aveᵢ Var(θ̂ᵢ)

V(ε, δ) = +∞ if v(ε) > δ;   V(ε, δ) = v(ε)/(1 − v(ε)/δ) if v(ε) < δ.

◮ HDA sparse regression, δ = n/p < 1, with
Ỹ = X̃β₀ + σZ̃, Z̃ ∼ᵢᵢd N(0, 1), X̃ᵢ,ⱼ ∼ᵢᵢd N(0, n⁻¹), β₀ ε-sparse:

M(ε, δ) ≡ sup_{σ>0} min_λ max_{‖β₀‖₀/n ≤ ε} lim_{n→∞} MSE(β̂_λ, β₀)/σ²

M(ε, δ) = +∞ if m(ε) > δ;   M(ε, δ) = m(ε)/(1 − m(ε)/δ) if m(ε) < δ.

SLIDE 54

Conclusions

◮ High-dimensional asymptotics imposes extra Gaussian noise in estimation, not seen classically.
◮ Approximate Message Passing: a new algorithm to understand and analyse.
◮ State Evolution: a new type of analysis to obtain properties of estimates.
◮ New phenomena become visible, e.g. previously unknown phase transitions in (M)-estimation.

Alternate approach: N. El Karoui, arXiv:1311.2445