Fast and efficient estimation of the geometric median in high - PowerPoint PPT Presentation

Fast and efficient estimation of the geometric median in high dimension : a stochastic gradient approach Herv´ e Cardot Institut de Math´ ematiques de Bourgogne, Universit´ e de Bourgogne with Peggy C´ enac (Univ. Bourgogne) and Mohamed Chaouch (EDF) Herve.Cardot@u-bourgogne.fr Compstat - August 2010 - Paris

The median in R A ”central” notion in statistics since Laplace. For a random variable taking values in R : ”the” (may not be unique) value m such that P ( X ≤ m ) = 0 . 5 . Another characterization of the median m � E (sign( X − m )) = sign( X ( ω ) − m ) dP ( ω ) = 0 . X − m Since sign( X − m ) = | X − m | , we also have the following characterization, m = arg min z ∈ R E | X − z | . • The quantile of order α, for α ∈ ]0 , 1[ , is defined by P ( X ≤ q α ) = α. Equivalently, q α = arg min z ∈ R E [ | X − z | + (2 α − 1)( X − z )] .

The geometric median in R d (or in a separable Hilbert space H ) In the Euclidean space R d equipped with its usual norm � � , a natural generalization of the median m := arg min z ∈ H E � X − z � , called geometric or spatial median (Haldane, 1948). Property (Kemperman, 1987) If the space H is strictly convex, the geometric median m is unique, unless the support of X is within a one dimensional subspace. • Examples of strictly convex spaces : - Euclidean spaces R d , when d > 1 , - separable Hilbert spaces H , - some Banach spaces ( L p , 1 < p < ∞ ). • Support condition : ∃ ( u 1 , u 2 ) ∈ H × H , � u 1 , u 2 � = 0 , V ar( � u 1 , X � ) > 0 and V ar( � u 2 , X � ) > 0 .

Characterization of the geometric median We suppose there are no atoms ( ∀ x ∈ H , P ( X = x ) = 0). Then G : H �→ R defined by G ( x ) = E � X − x � , is strictly convex and Fr´ echet diff´ erentiable � X − x � Φ( x ) := ∇ G x = − E . � X − x � The median m is characterized by ∇ G m = 0 . G has a second order Fr´ echet derivative, at point m , Γ m : H �→ H , � � �� 1 I H − ( X − m ) ⊗ ( X − m ) Γ m := E , � X − m � � X − m � 2 where I H is the identity operator and u ⊗ v = � u , . � v , for ( u , v ) ∈ H 2 . If E � X − m � − 1 < ∞ , then Γ m is a bounded and strictly positive operator. There are constants, ∞ > E � X − m � − 1 = λ M > λ m > 0 , λ M � u � 2 ≥ � Γ m u , u � ≥ λ m � u � 2 , ∀ u ∈ H .

Robustness : the influence function Consider a distribution P 0 contaminated by a point-mass distribution at z ∈ H , P ǫ, z = (1 − ǫ ) P 0 + ǫδ z . The influence function m ( P ǫ, z ) − m ( P 0 ) IF m ( z ) = lim ǫ ǫ → 0 is a measure of the sensitivity of the median m to a small perturbation of the target distribution. Property z − m IF m ( z ) = Γ − 1 m � z − m � and the gross error sensitivity is bounded as follows sup {� IF m ( z ) � , z ∈ H } = 1 . λ m • The gross error sensitivity is not bounded for the mean.

Estimation in R d Suppose we have a sample of n independent realizations, X 1 , . . . , X n . The usual estimator of m (Gower, 1974, Vardi & Zhang, 2000, Gervini, 2008) is characterized by n � X i − � m m � = 0 , � X i − � i =1 Iterative and rather long procedures are needed (Newton-Raphson or Weiszfeld) to get a numerical solution n n � � X i − � m m e +1 = m e ) X i . m � = 0 ⇒ � p i ( � � X i − � i =1 i =1 Property (Haberman, 1989, Niemiro, 1992). If H = R d , when n → + ∞ , √ n ( � m n − m ) � N (0 , Γ − 1 m Var ( S ( X − m ))Γ − 1 m ) with S ( u ) = u / � u � , u ∈ R d .

A very simple and very fast algorithm We consider the algorithm X n +1 − m n m n +1 = m n + γ n � X n +1 − m n � and suppose the steps γ n are such that ∀ n , γ n > 0 , � � γ 2 γ n = ∞ and n < ∞ . n ≥ 1 n ≥ 1 Advantages • For a sample of n realizations in R d : O ( nd ) operations. • No need to store all the data (data streams) • Automatic update (online estimation)

A Robbins-Monro (1951) algorithm This algorithm (stochastic gradient) can also be written m n +1 = m n − γ n ( Φ( m n ) + ζ n +1 ) , � �� gradient with ζ n +1 = − X n +1 − m n � X n +1 − m n � − Φ( m n ) . • If the X n are i . i . d ., the sequence ζ n +1 is a martingale difference, E ( ζ n +1 | F n ) = 0 avec F n = σ ( X 0 , . . . , X n ) . Moreover, � � � ζ n +1 � 2 |F n ≤ 4 . E

A remark on geometric quantiles estimation This approach can be extended directly to get stochastic approximations to geometric quantiles (Chaudhuri, 1996). Consider a vector u ∈ H , such that � u � < 1 . The geometric quantile of X , say m u , corresponding to direction u , is defined, uniquely under previous assumptions, by m u = arg min Q ∈ H E ( � X − Q � + � X − Q , u � ) . It is characterized by Φ u ( m u ) = Φ( m u ) − u = 0 . The following stochastic approximation � X n +1 − � � m u n m u m u n +1 = � n + γ n n � + u . � � X n +1 − � m u

A convergence result in Hilbert spaces Result (Cardot, C´ enac, Zitt 2010) The sequence m n converges almost surely when n tends to infinity, � m n − m � → 0 , p . s . • Sketch of the proof (classical) Show that for all ǫ ∈ ]0 , 1[ , P (Ω ǫ ) = 0 , with ˘ ǫ < V n ( ω ) < ǫ − 1 ¯ Ω ǫ = ω ∈ Ω : ∃ n ǫ ( ω ) ≥ 1 , ∀ n ≥ n ǫ ( ω ) , considering that 0 1 n n X X B C γ 2 n →∞ E V n +1 = E V 0 + lim lim j + 2 γ n E � Φ( m n ) , m − m n � A ≤ C B C n →∞ @ | {z } j =1 j =1 < − λ ǫ ǫ in Ω ǫ Property For all r > 0 , the function G restricted to x ∈ B ( m , r ) is strongly convex : ∃ λ r > 0 , such that ∀ x ∈ B ( m , r ) � Φ( x ) , x − m � ≥ G ( x ) − G ( m ) ≥ λ r � x − m � 2 .

Does it really work ? A sample with Gaussian distribution, � 10 � 3 with mean (0 , 0) and variance . 3 2 4 2 X2 0 -2 -4 -5 0 5 X1

Not really, even for simple examples ! ! ! X n +1 − m n g m n +1 = m n + n 3 / 4 � X n +1 − m n � 0.10 0.5 RM, g=10 0.08 RM, g=1 0.4 AV, g=10 RM, g=10 AV, g=1 RM, g=1 AV, g=10 AV, g=1 0.06 0.3 MSE MSE 0.04 0.2 0.02 0.1 0.00 0.0 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 Iterations iteration

Averaging : a magic formula n � m n = 1 Consider now the mean of all past iterations, ¯ m j , n j =1  X n +1 − m n   m n +1 = m n + γ n � X n +1 − m n � m n + m n +1 − ¯ m n   ¯ = ¯ m n +1 n + 1 Property (in R d ) • If γ n = g / n α , 0 . 5 < α < 1 , √ n ( ¯ m n − m ) � N (0 , ∆) in distribution, where ∆ is the same variance matrix as for � m n , ∆ = Γ − 1 m Var ( S ( X − m ))Γ − 1 m with S ( u ) = u / � u � , u ∈ R d . Proof : As in Polyak & Juditsky (1992).

200 samples with size n = 2000 . Averaging 0.25 0.25 0.20 0.20 0.15 0.15 0.10 0.10 0.05 0.05 0.00 0.00 g = 0.1 g = 0.5 g = 1 g = 2 g = 5 g = 10 Mean g = 0.1 g = 0.5 g = 1 g = 2 g = 5 g = 10

Example : electricity consumption curves • EDF (Electricit´ e de France) has planned to install communicating meters (30 millions). Survey sampling techniques will be used to select 300 000 meters which will provide individual electricity consumption at fine time scales. • A first test on a population of N = 18900 giving electricity consumption every 30 minutes. 1200 1000 800 Electricity consumption 600 400 200 0 0 50 100 150

Example : median trajectory 300 250 200 Electricity consumption 150 100 50 mean curve median (pointwise) median curve 0 0 1 2 3 4 5 6 7 Days

Perspectives • Averaging in Hilbert spaces H : still no results on nonlinear algorithms in the literature. • Discretized noisy trajectories � � X n ( t n 1 ) + ǫ 1 n , . . . , X n ( t n Z n = p n ) + ǫ p n n • Covariates : conditional geometric median m ( X | Z = z ) = β 0 + β 1 z where Z is for example the mean consumption of the week before. We look for ( β 0 ,β 1 ) ∈ H × H E � X − ( β 0 + β 1 Z ) � min • (robust) Clustering with medians based on � . � (k-median).

Fast and efficient estimation of the geometric median in high - PowerPoint PPT Presentation

Fast and efficient estimation of the geometric median in high dimension : a stochastic gradient approach Herv e Cardot Institut de Math ematiques de Bourgogne, Universit e de Bourgogne with Peggy C enac (Univ. Bourgogne) and Mohamed

the nerves sensory radial median ulnar median median sensory median median ulnar radial

I - -75 Median Cable Barrier 75 Median Cable Barrier 75 Median Cable Barrier I 75 Median Cable

Linear-time Median Def: Median of elements A=a 1 , a 2 , , a n is the (n/2)-th smallest element

Spartanburg Nation Median Value of a $115,900 $184,700 Home Median Gross Rent $705 $950

African American Strategy Equitable Access to Homeownership Presentation April 16, 2018

Median Finding Test Cases What's Next 1. Median finding, part 2 2. Why we write test cases 3.

Business Statistics CONTENTS Hypotheses on the median The sign test The Wilcoxon signed ranks

Estimation of Median Incomes of Small Areas: A Bayesian Semiparametric Approach Malay Ghosh

Being a METS Startup Fast Failure; Fast Reward November 2016 Fast Failure; Fast Reward

Geometric Optimization Piotr Indyk April 26, 2005 Lecture 19: Geometric Optimization Geometric

Geometric Algebra A powerful tool for solving geometric problems in visual computing Leandro A.

When does the Tukey Median work? Banghua Zhu with Jiantao Jiao and Jacob Steinhardt Department

Motion Estimation by Affine Transforms Motion Estimation by Affine Transforms Motion Estimation

Events MedIAN Jobs Contact The Network Background About MedIAN UKs national Medical

Annual Survey of Hours and Earnings (ASHE) Key Findings Median Weekly Median weekly earnings in

W4231: Analysis of Algorithms Definition of median 9/14/1999 Let A = a 1 a n be a

CS4102 Algorithms Fall 2018 Warm up Compare + with + () When

Order Statistics Carola Wenk Slides courtesy of Charles Leiserson with additions by Carola Wenk

Mean, median & mode imputations DEALIN G W ITH MIS S IN G DATA IN P YTH ON Suraj Donthi

On Medians of (Randomized) Pairwise Means Pierre Laforgue 1 , Stephan Cl on 1 , Patrice Bertail

Saving Behavior Across the Wealth Distribution: The Importance of Capital Gains Andreas Fagereng

JUST THE MATHS SLIDES NUMBER 19.4 PROBABILITY 4 (Measures of location and dispersion) by

Quantifying Algorithmic Improvements over Time Lars Kotthofg University of Wyoming

Median Closings Report The Transportation Technical Advisory Committee reviewed the Rt 13

Fast and efficient estimation of the geometric median in high - PowerPoint PPT Presentation

Fast and efficient estimation of the geometric median in high dimension : a stochastic gradient approach Herv e Cardot Institut de Math ematiques de Bourgogne, Universit e de Bourgogne with Peggy C enac (Univ. Bourgogne) and Mohamed

the nerves sensory radial median ulnar median median sensory median median ulnar radial

I - -75 Median Cable Barrier 75 Median Cable Barrier 75 Median Cable Barrier I 75 Median Cable

Linear-time Median Def: Median of elements A=a 1 , a 2 , , a n is the (n/2)-th smallest element

Spartanburg Nation Median Value of a $115,900 $184,700 Home Median Gross Rent $705 $950

African American Strategy Equitable Access to Homeownership Presentation April 16, 2018

Median Finding Test Cases What's Next 1. Median finding, part 2 2. Why we write test cases 3.

Business Statistics CONTENTS Hypotheses on the median The sign test The Wilcoxon signed ranks

Estimation of Median Incomes of Small Areas: A Bayesian Semiparametric Approach Malay Ghosh

Being a METS Startup Fast Failure; Fast Reward November 2016 Fast Failure; Fast Reward

Geometric Optimization Piotr Indyk April 26, 2005 Lecture 19: Geometric Optimization Geometric

Geometric Algebra A powerful tool for solving geometric problems in visual computing Leandro A.

When does the Tukey Median work? Banghua Zhu with Jiantao Jiao and Jacob Steinhardt Department

Motion Estimation by Affine Transforms Motion Estimation by Affine Transforms Motion Estimation

Events MedIAN Jobs Contact The Network Background About MedIAN UKs national Medical

Annual Survey of Hours and Earnings (ASHE) Key Findings Median Weekly Median weekly earnings in

W4231: Analysis of Algorithms Definition of median 9/14/1999 Let A = a 1 a n be a

CS4102 Algorithms Fall 2018 Warm up Compare + with + () When

Order Statistics Carola Wenk Slides courtesy of Charles Leiserson with additions by Carola Wenk

Mean, median &amp; mode imputations DEALIN G W ITH MIS S IN G DATA IN P YTH ON Suraj Donthi

On Medians of (Randomized) Pairwise Means Pierre Laforgue 1 , Stephan Cl on 1 , Patrice Bertail

Saving Behavior Across the Wealth Distribution: The Importance of Capital Gains Andreas Fagereng

JUST THE MATHS SLIDES NUMBER 19.4 PROBABILITY 4 (Measures of location and dispersion) by

Quantifying Algorithmic Improvements over Time Lars Kotthofg University of Wyoming

Median Closings Report The Transportation Technical Advisory Committee reviewed the Rt 13

Mean, median & mode imputations DEALIN G W ITH MIS S IN G DATA IN P YTH ON Suraj Donthi