Slide 1

The additive model revisited

Sara van de Geer

January 8, 2013 (but first something else)

(Les Houches) Additive model January 8, 2013 1 / 30


Slide 3

Contents

- Sharp oracle inequalities
- Structured sparsity
- Compatibility (restricted eigenvalue condition)
- Semiparametric approach
- Partial linear models
- Nonparametric models

Slide 4

Sharp oracle inequalities

Let $S \in \mathcal{S}$ be some index set and $\{\mathcal{F}_S\}_{S \in \mathcal{S}}$ be a collection of models. Moreover, let $L(X, f)$ be a loss function and $R(f) := E L(X, f)$. We say that the estimator $\hat f$ satisfies a sharp oracle inequality if, with large probability,
$$R(\hat f) \le \min_{S \in \mathcal{S}} \Big\{ \min_{f \in \mathcal{F}_S} R(f) + \mathrm{Remainder}(S) \Big\}.$$

Non-sharp oracle inequalities are of the form: with large probability,
$$R(\hat f) - R(f^0) \le (1 + \delta) \min_{S \in \mathcal{S}} \Big\{ \min_{f \in \mathcal{F}_S} \big(R(f) - R(f^0)\big) + \mathrm{Remainder}_\delta(S) \Big\},$$
where $\delta > 0$ and $f^0 := \arg\min_{f \in \cup_{S \in \mathcal{S}} \mathcal{F}_S} R(f)$.

Slide 5

Sharp oracle inequalities with structured sparsity penalties

High-dimensional linear model: $Y = X\beta^0 + \epsilon$, with $Y \in \mathbb{R}^n$, $X$ an $n \times p$ matrix, and $\beta^0 \in \mathbb{R}^p$. We believe that $\beta^0$ can be well approximated by a "structured sparse" $\beta$. Let $\Omega$ be some given norm on $\mathbb{R}^p$. Norm-penalized estimator:
$$\hat\beta := \hat\beta_\Omega := \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \|Y - X\beta\|_2^2 / n + 2\lambda\, \Omega(\beta) \Big\}.$$

Aim: (sharp) sparsity oracle inequalities for $\hat\beta$.
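For the special case $\Omega = \|\cdot\|_1$ (the Lasso), this estimator can be computed by proximal gradient descent (ISTA). A minimal numpy sketch, not from the slides; the design, noise level, tuning parameter, and iteration count are illustrative choices:

```python
import numpy as np

def lasso_ista(X, Y, lam, n_iter=2000):
    """Minimize ||Y - X b||_2^2 / n + 2*lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = -2 * X.T @ (Y - X @ beta) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - 2 * lam * step, 0.0)  # soft-thresholding
    return beta

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:3] = [2.0, -1.5, 1.0]
Y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(X, Y, lam=0.05)
```

The soft-thresholding step is the proximal operator of $2\lambda\|\cdot\|_1$; for a general norm $\Omega$ one would replace it by the proximal operator of $2\lambda\Omega$.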

Slide 6

Notation: for $\beta \in \mathbb{R}^p$ and $S \subset \{1, \dots, p\}$, $\beta_{j,S} := \beta_j \, 1\{j \in S\}$.

Example

$\ell_1$-norm: $\Omega(\beta) := \|\beta\|_1 := \sum_{j=1}^p |\beta_j|$ ❀ Lasso. The $\ell_1$-norm is decomposable: $\|\beta\|_1 = \|\beta_S\|_1 + \|\beta_{S^c}\|_1$ for all $\beta$ and all $S$.
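The decomposability identity is immediate to check numerically; a small numpy sketch with an arbitrarily chosen index set:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 10
beta = rng.standard_normal(p)
S = [0, 3, 7]                    # an arbitrary index set
in_S = np.zeros(p, dtype=bool)
in_S[S] = True

beta_S = beta * in_S             # beta_{j,S} = beta_j * 1{j in S}
beta_Sc = beta * ~in_S

def l1(v):
    return np.abs(v).sum()

# decomposability: ||beta||_1 = ||beta_S||_1 + ||beta_{S^c}||_1
assert np.isclose(l1(beta), l1(beta_S) + l1(beta_Sc))
```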

Slide 7

Definition

We say that the norm $\Omega$ is weakly decomposable for $S$ if there exists a norm $\Omega^{S^c}$ on $\mathbb{R}^{p - |S|}$ such that for all $\beta \in \mathbb{R}^p$,
$$\Omega(\beta) \ge \Omega(\beta_S) + \Omega^{S^c}(\beta_{S^c}).$$

Definition

We say that $S$ is an allowed set (for $\Omega$) if $\Omega$ is weakly decomposable for $S$.

Slide 8

Example

The group Lasso norm:
$$\Omega(\beta) := \|\beta\|_{2,1} := \sum_{t=1}^T \sqrt{|G_t|}\, \|\beta_{G_t}\|_2, \quad \beta \in \mathbb{R}^p,$$
where $G_1, \dots, G_T$ is a partition of $\{1, \dots, p\}$ into disjoint groups. It is (weakly) decomposable for $S = \cup_{t \in \mathcal{T}} G_t$ with $\Omega^{S^c} = \Omega$. Thus, for any $\beta$, $S := \cup\{G_t : \|\beta_{G_t}\|_2 \ne 0\}$ is an allowed set.
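A direct numpy implementation of this norm and of the induced allowed set; the vector and grouping below are illustrative:

```python
import numpy as np

def group_lasso_norm(beta, groups):
    """Omega(beta) = sum_t sqrt(|G_t|) * ||beta_{G_t}||_2 for a partition `groups`."""
    return sum(np.sqrt(len(G)) * np.linalg.norm(beta[G]) for G in groups)

beta = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
groups = [[0, 1], [2, 3], [4]]               # partition of {0, ..., 4}
omega = group_lasso_norm(beta, groups)       # sqrt(2)*sqrt(5) + 0 + 3

# the allowed set: union of the groups whose coefficient block is nonzero
S = [j for G in groups if np.linalg.norm(beta[G]) != 0 for j in G]
```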

Slide 9

Example

From Micchelli et al. (2010). Let $\mathcal{A} \subset [0, \infty)^p$ be some convex cone. Define
$$\Omega(\beta) := \Omega(\beta; \mathcal{A}) := \min_{a \in \mathcal{A}} \frac{1}{2} \sum_{j=1}^p \Big( \frac{\beta_j^2}{a_j} + a_j \Big).$$

Let $\mathcal{A}_S := \{a_S : a \in \mathcal{A}\}$.

Definition

We call $\mathcal{A}_S$ an allowed set if $\mathcal{A}_S \subset \mathcal{A}$.

Lemma

Suppose $\mathcal{A}_S$ is an allowed set. Then $S$ is allowed, i.e. $\Omega$ is weakly decomposable for $S$.
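For the full cone $\mathcal{A} = [0, \infty)^p$ the minimization decouples over coordinates, and $\min_{a_j \ge 0} \tfrac{1}{2}(\beta_j^2 / a_j + a_j) = |\beta_j|$ (attained at $a_j = |\beta_j|$), so $\Omega(\cdot; \mathcal{A})$ recovers the $\ell_1$-norm. A crude grid-search check (grid bounds and resolution are ad hoc):

```python
import numpy as np

def omega_cone(beta, a_grid):
    """Omega(beta; A) for A = [0, inf)^p, minimized coordinatewise over a grid."""
    # value matrix: 0.5 * (beta_j^2 / a + a) for each coordinate j and grid point a
    vals = 0.5 * (beta[:, None] ** 2 / a_grid[None, :] + a_grid[None, :])
    return vals.min(axis=1).sum()

beta = np.array([0.5, -1.2, 2.0])
a_grid = np.linspace(1e-4, 5.0, 200001)      # avoids a_j = 0
omega = omega_cone(beta, a_grid)             # close to ||beta||_1 = 3.7
```

Restricting $\mathcal{A}$ to a smaller cone (e.g. forcing shared scales within groups) produces structured norms such as the group Lasso.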

Slide 10

We use the notation $\|v\|_n^2 := v^T v / n$, $v \in \mathbb{R}^n$.

Definition

Suppose $S$ is an allowed set. Let $L > 0$ be some constant. The $\Omega$-eigenvalue (for $S$) is
$$\delta_\Omega(L, S) := \min\Big\{ \|X\beta_S - X\beta_{S^c}\|_n : \Omega(\beta_S) = 1, \ \Omega^{S^c}(\beta_{S^c}) \le L \Big\}.$$

The $\Omega$-effective sparsity is
$$\Gamma_\Omega^2(L, S) := \frac{1}{\delta_\Omega^2(L, S)}.$$
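Computing $\delta_\Omega(L, S)$ exactly requires solving an optimization problem that is in general nonconvex because of the equality constraint. For $\Omega = \|\cdot\|_1$ (where $\Omega^{S^c}$ is again the $\ell_1$-norm), sampling feasible points at least gives a crude upper bound on the minimum; everything below (design, $S$, $L$, sample count) is illustrative:

```python
import numpy as np

def delta_l1_upper_bound(X, S, L, n_samples=20000, seed=0):
    """Monte Carlo upper bound on delta(L, S) for the l1-norm:
    min ||X beta_S - X beta_Sc||_n over ||beta_S||_1 = 1, ||beta_Sc||_1 <= L."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sc = np.setdiff1d(np.arange(p), S)
    best = np.inf
    for _ in range(n_samples):
        bS = rng.standard_normal(len(S))
        bS /= np.abs(bS).sum()                        # ||beta_S||_1 = 1
        bSc = rng.standard_normal(len(Sc))
        bSc *= L * rng.random() / np.abs(bSc).sum()   # ||beta_Sc||_1 <= L
        v = X[:, S] @ bS - X[:, Sc] @ bSc
        best = min(best, np.sqrt(v @ v / n))
    return best

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
ub = delta_l1_upper_bound(X, S=[0, 1], L=2.0)         # delta(L, S) <= ub
```

Since every sampled point is feasible, each evaluated value dominates $\delta(L, S)$; the bound says nothing about how tight it is.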

Slide 11

The dual norm of $\Omega$ is denoted by $\Omega^*$, that is,
$$\Omega^*(w) := \sup_{\Omega(\beta) \le 1} |w^T \beta|, \quad w \in \mathbb{R}^p.$$
We moreover let $\Omega^{S^c}_*$ be the dual norm of $\Omega^{S^c}$.
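For $\Omega = \|\cdot\|_1$ the dual norm is the sup-norm: the supremum of $|w^T\beta|$ over the $\ell_1$-ball is attained at a signed canonical basis vector. A quick numpy sanity check (the vector and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(8)

dual_exact = np.max(np.abs(w))               # Omega*(w) = ||w||_inf when Omega = ||.||_1

# no point on the boundary of the l1-ball beats the canonical-basis maximizer
B = rng.standard_normal((5000, 8))
B /= np.abs(B).sum(axis=1, keepdims=True)    # rows scaled so that ||b||_1 = 1
assert np.all(np.abs(B @ w) <= dual_exact + 1e-12)
```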

Slide 12

A sharp oracle inequality

Theorem

Let $\beta \in \mathbb{R}^p$ be arbitrary and let $S \supset \{j : \beta_j \ne 0\}$ be an allowed set. Define
$$\lambda_S := \Omega^*\big((\epsilon^T X)_S / n\big), \quad \lambda_{S^c} := \Omega^{S^c}_*\big((\epsilon^T X)_{S^c} / n\big).$$
Suppose $\lambda > \lambda_{S^c}$. Define
$$L_S := \frac{\lambda + \lambda_S}{\lambda - \lambda_{S^c}}.$$
Then
$$\|X(\hat\beta - \beta^0)\|_n^2 \le \|X(\beta - \beta^0)\|_n^2 + (\lambda + \lambda_S)^2 \, \Gamma_\Omega^2(L_S, S).$$

Related results: Bach (2010).

Slide 13

What about convergence of the $\Omega$-estimation error?

Slide 14

Theorem

Let $\beta \in \mathbb{R}^p$ be arbitrary and let $S \supset \{j : \beta_j \ne 0\}$ be an allowed set. Define
$$\lambda_S := \Omega^*\big((\epsilon^T X)_S / n\big), \quad \lambda_{S^c} := \Omega^{S^c}_*\big((\epsilon^T X)_{S^c} / n\big).$$
Suppose $\lambda > \lambda_{S^c}$. Define, for some $0 \le \delta < 1$,
$$L_S := \frac{\lambda + \lambda_S}{\lambda - \lambda_{S^c}} \cdot \frac{1 + \delta}{1 - \delta}.$$
Then
$$\|X(\hat\beta - \beta^0)\|_n^2 + \delta(\lambda - \lambda_{S^c})\, \Omega^{S^c}(\hat\beta_{S^c}) + \delta(\lambda + \lambda_S)\, \Omega(\hat\beta_S - \beta) \le \|X(\beta - \beta^0)\|_n^2 + \big((1 + \delta)(\lambda + \lambda_S)\big)^2 \, \Gamma_\Omega^2(L_S, S).$$

Slide 15

Special case where $\Omega = \|\cdot\|_1$

Theorem

(Koltchinskii et al. (2011)) Let $\lambda_0 := \|\epsilon^T X\|_\infty / n$. Define, for $\lambda > \lambda_0$,
$$L := \frac{\lambda + \lambda_0}{\lambda - \lambda_0}.$$
Then
$$\|X(\hat\beta - \beta^0)\|_n^2 \le \min_{\beta \in \mathbb{R}^p} \Big\{ \|X(\beta - \beta^0)\|_n^2 + (\lambda + \lambda_0)^2 \, \Gamma^2(L, S_\beta) \Big\},$$
where $S_\beta := \{j : \beta_j \ne 0\} \subset \{1, \dots, p\}$.

Slide 16

Compatibility (restricted eigenvalue condition)

Recall that for the $\ell_1$-norm $\Gamma^2(L, S) = 1 / \delta^2(L, S)$, with
$$\delta(L, S) := \min\Big\{ \|X\beta_S - X\beta_{S^c}\|_n : \|\beta_S\|_1 = 1, \ \|\beta_{S^c}\|_1 \le L \Big\}.$$

We have $\Gamma^2(L, S) \le |S| / \kappa^2(L, S)$, where $\kappa^2(L, S)$ is the restricted eigenvalue (Bickel et al. (2009)).

Slide 17

Consider the case $S = \{1\}$, and write $X_1 := X_S$, $X_2 := X_{S^c}$. Let $X_1 \hat P X_2$ be the projection (in $\mathbb{R}^n$) of $X_1$ on $X_2$ and let $X_1 \hat A X_2 := X_1 - X_1 \hat P X_2$ be the antiprojection. Define
$$\hat\gamma^0 := \arg\min\big\{ \|\gamma\|_1 : X_1 \hat P X_2 = X_2 \gamma \big\}.$$
Then clearly $\delta(L, \{1\}) = \|X_1 \hat A X_2\|_n$ for all $L \ge \|\hat\gamma^0\|_1$. When $n < p$ one readily sees that $\delta(L, \{1\}) = 0$ for all $L \ge \|\hat\gamma^0\|_1$.
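The antiprojection is just a least-squares residual, which makes the $n < p$ degeneracy easy to see numerically. A numpy sketch with arbitrary Gaussian designs:

```python
import numpy as np

def antiprojection(X1, X2):
    """Residual of projecting X1 onto the column span of X2 in R^n."""
    gamma, *_ = np.linalg.lstsq(X2, X1, rcond=None)
    return X1 - X2 @ gamma

rng = np.random.default_rng(3)

# n > p: the antiprojection is generically nonzero, hence delta(L, {1}) > 0
n = 50
X = rng.standard_normal((n, 10))
norm_low = np.linalg.norm(antiprojection(X[:, :1], X[:, 1:])) / np.sqrt(n)

# n < p: X2 spans R^n, the antiprojection vanishes, hence delta(L, {1}) = 0
n = 10
X = rng.standard_normal((n, 50))
norm_high = np.linalg.norm(antiprojection(X[:, :1], X[:, 1:])) / np.sqrt(n)
```

In the second case `norm_high` is zero up to floating-point error: with $p - 1 \ge n$ random columns, $X_2$ generically spans all of $\mathbb{R}^n$.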

Slide 18

Suppose now that the rows of $X$ are i.i.d. with sub-Gaussian distribution $Q$. Let $X_1 P X_2$ be the projection of $X_1$ on $X_2$ in $L_2(Q)$ and $X_1 A X_2 := X_1 - X_1 P X_2$. Let $\|\cdot\|$ be the $L_2(Q)$-norm. Define $\gamma^0 := \arg\min\{\|\gamma\|_1 : X_1 P X_2 = X_2 \gamma\}$. Then with large probability, for $L \sqrt{\log p / n}$ small,
$$\delta(L, S) \ge (1 - \epsilon) \|X_1 A X_2\| \quad \forall\ L \ge \|\gamma^0\|_1,$$
and moreover
$$(X_1 A X_2)^T (X_1 P X_2) / n \asymp \sqrt{\frac{\log p}{n}}.$$

Slide 19

Oracle inequalities for parameters of interest

High-dimensional linear model:
$$Y = X_1 \beta_1^0 + X_2 \beta_2^0 + \epsilon, \quad \beta_1^0 \in \mathbb{R}^q, \ \beta_2^0 \in \mathbb{R}^{p - q},$$
with the entries of $\epsilon$ i.i.d. sub-Gaussian. Suppose the rows of $X$ are i.i.d. with sub-Gaussian distribution $Q$. We are interested in estimating $\beta_1^0$.

Lasso estimator:
$$\hat\beta = (\hat\beta_1, \hat\beta_2) := \arg\min_{\beta_1, \beta_2} \Big\{ \|Y - X_1\beta_1 - X_2\beta_2\|_2^2 / n + \lambda \|\beta_1\|_1 + \lambda \|\beta_2\|_1 \Big\}.$$

Slide 20

Notation. Let $X_1 P X_2$ be the projection of $X_1$ on $X_2$ in $L_2(Q)$, and define $\tilde X_1 := X_1 - X_1 P X_2 = X_1 A X_2$. Let $\Sigma_1 := E \tilde X_1^T \tilde X_1 / n$, and let $\tilde\Lambda_1^2$ be its smallest eigenvalue.

Define
$$C^0 := \arg\min\Big\{ \|C\|_{1,\infty} : X_1 P X_2 = X_2 C \Big\},$$
where $\|C\|_{1,\infty} := \max_{1 \le k \le q} \|\gamma_k\|_1$ and $C := (\gamma_1, \dots, \gamma_q)$ with columns $\gamma_k \in \mathbb{R}^{p - q}$.

Slide 21

Condition 1: $1 / \tilde\Lambda_1 = O(1)$.

Condition 2: $\|\beta^0\|_1 = O(1)$ and $s_1 := \|\beta_1^0\|_0 \vee 1 = o\big(\sqrt{n / \log p}\big)$.

Slide 22

Theorem

Take $\lambda \asymp \sqrt{\log p / n}$. Then $\|\hat\beta - \beta^0\|_1 = O_P(1)$. If moreover $\|C^0\|_{1,\infty} = O(1)$ (i.e. $\ell_1$-smoothness of the projection), then
$$\|\hat\beta_1 - \beta_1^0\|_1 = O_P\Big( s_1 \sqrt{\frac{\log p}{n}} \Big) = o_P(1).$$

Special case: $q = 1$ (recall $q = \dim(\beta_1)$). Then $s_1 = 1$ and hence
$$|\hat\beta_1 - \beta_1^0| = O_P\Big( \sqrt{\frac{\log p}{n}} \Big).$$

Slide 23

The high-dimensional partial linear model

Joint work with Patric Müller.

Additive model: $Y = X\beta^0 + g^0(Z) + \epsilon$, with $\epsilon \perp (X, Z)$. We assume that the observations of $(X, Z) \in \mathbb{R}^p \times \mathcal{Z}$ are i.i.d. with distribution $Q$ and that the entries of $\epsilon$ are i.i.d. sub-Gaussian. We will assume that $g^0$ has a given "smoothness" $m > 1/2$ and that $\beta^0$ is sparse, with $X\beta^0$ "smoother" than $g^0$.

Estimator:
$$(\hat\beta, \hat g) := \arg\min_{\beta, g} \Big\{ \|Y - X\beta - g(Z)\|_2^2 / n + \lambda \|\beta\|_1 + \mu^2 J^2(g) \Big\},$$
where $J$ is some (semi-)norm on the space of functions on $\mathcal{Z}$.
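One way to compute $(\hat\beta, \hat g)$ is backfitting: alternate a Lasso step on the partial residual $Y - \hat g(Z)$ with a penalized least-squares smoothing step on $Y - X\hat\beta$. The numpy sketch below is a toy version only: $g$ is represented on a polynomial basis with a quadratic surrogate for $\mu^2 J^2(g)$, and the basis, penalty matrix, and tuning constants are all ad hoc choices, not the slides' estimator:

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def partial_linear_backfit(X, Z, Y, lam, mu, n_basis=8, n_iter=50):
    n, p = X.shape
    B = np.vander(Z, n_basis, increasing=True)          # toy basis for g (splines in practice)
    P = np.diag(np.arange(n_basis) ** 2).astype(float)  # rough roughness penalty
    beta, gamma = np.zeros(p), np.zeros(n_basis)
    step = n / (2 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        R = Y - B @ gamma                               # Lasso step on Y - g(Z)
        for _ in range(20):
            beta = soft(beta + step * 2 * X.T @ (R - X @ beta) / n, step * lam)
        R = Y - X @ beta                                # smoothing step on Y - X beta
        gamma = np.linalg.solve(B.T @ B / n + mu ** 2 * P, B.T @ R / n)
    return beta, B @ gamma

rng = np.random.default_rng(4)
n, p = 200, 10
X = rng.standard_normal((n, p))
Z = rng.uniform(-1.0, 1.0, n)
beta0 = np.zeros(p)
beta0[0] = 2.0
g0 = np.sin(2.0 * Z)
Y = X @ beta0 + g0 + 0.1 * rng.standard_normal(n)
beta_hat, g_hat = partial_linear_backfit(X, Z, Y, lam=0.1, mu=0.05)
```

Because $X \perp Z$ here, the projection of $X$ on $Z$ vanishes and backfitting separates the two components cleanly; correlated designs are exactly where the projection-smoothness conditions of the following slides matter.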

Slide 24

Notation. We write $\tilde X := X A Z := X - X P Z$, where $X P Z := E(X \mid Z)$. The smallest eigenvalue of $E \tilde X^T \tilde X / n$ is denoted by $\tilde\Lambda^2$. The largest eigenvalue of $E (X P Z)^T (X P Z) / n$ is denoted by $\Lambda_P^2$.

$\|\cdot\|$ is the $L_2(Q)$-norm.

Slide 25

Condition 1: $\max_{i,j} |X_{i,j}| = O(1)$.

Condition 2: $1 / \tilde\Lambda = O(1)$ and $\Lambda_P = O(1)$.

Condition 3: For some fixed constant $A$ it holds that
$$H\big(u, \{g : \|g\| \le 1, J(g) \le 1\}, \|\cdot\|_\infty\big) \le A u^{-1/m}, \quad u > 0.$$

Condition 4: $\sup_{\|g\| \le 1, \, J(g) \le 1} \|g\|_\infty = O(1)$.

Condition 5: $s := \|\beta^0\|_0 = o\big(n^{\frac{1}{2m+1}} / \log p\big)$ and $J(g^0) = O(1)$.

Slide 26

Theorem

Take $\lambda \asymp \sqrt{\log p / n}$ and $\mu \asymp n^{-\frac{m}{2m+1}}$. Then
$$\|X(\hat\beta - \beta^0) + (\hat g - g^0)\|^2 + \lambda \|\hat\beta - \beta^0\|_1 + \mu^2 J^2(\hat g) = O_P\big(n^{-\frac{2m}{2m+1}}\big).$$

If moreover $J(h) = O(1)$, where $h(Z) := E(X \mid Z)$ (i.e. $J$-smoothness of the projection), then
$$\|\tilde X(\hat\beta - \beta^0)\|^2 + \lambda \|\hat\beta - \beta^0\|_1 = O_P\Big( \frac{s \log p}{n} \Big) = o_P\big(n^{-\frac{2m}{2m+1}}\big).$$

Slide 27

The additive model with different smoothness per component

Joint work with Enno Mammen.

Additive model: $Y = f^0(X) + g^0(Z) + \epsilon$, with $\epsilon \perp (X, Z)$. We assume that the observations of $(X, Z) \in \mathcal{X} \times \mathcal{Z}$ are i.i.d. with distribution $Q_{X,Z}$ and that the entries of $\epsilon$ are i.i.d. sub-Gaussian. The density of $Q_{X,Z}$ with respect to some product measure is denoted by $q_{X,Z}$, with marginal densities $q_X$ and $q_Z$. We will assume that $f^0$ has a given "smoothness" $k > 1/2$ and $g^0$ a given "smoothness" $m > 1/2$, with $k > m$ (i.e., $f^0$ is "smoother" than $g^0$).

Slide 28

Notation: We define
$$r(x, z) := \frac{q_{X,Z}(x, z)}{q_X(x)\, q_Z(z)}, \quad \gamma_\infty^2 := \|r(\cdot, \cdot)\|_\infty.$$

Moreover, we let
$$\gamma^2 := \int (r - 1)^2 \, q_X q_Z.$$

We define $f_P := E(f(X) \mid Z = \cdot)$ and $f_A := f - f_P$.

Slide 29

Condition 1: For some fixed constants $A_I$ and $A_J$ it holds that
$$H_B\big(u, \{f : \|f\| \le 1, I(f) \le 1\}, \|\cdot\|\big) \le A_I u^{-1/k}, \quad u > 0,$$
and
$$H_B\big(u, \{g : \|g\| \le 1, J(g) \le 1\}, \|\cdot\|\big) \le A_J u^{-1/m}, \quad u > 0.$$

Condition 2: For all $R \le 1$ and for some fixed constants $B_I$ and $B_J$ it holds that
$$\sup_{\|f\| \le R, \, I(f) \le 1} \|f\|_\infty \le B_I R^{1 - \frac{1}{2k}}, \quad \sup_{\|g\| \le R, \, J(g) \le 1} \|g\|_\infty \le B_J R^{1 - \frac{1}{2m}}.$$

Condition 3: It holds that $\gamma < 1$.

Condition 4: $I(f^0) = O(1)$ and $J(g^0) = O(1)$.

Slide 30

Theorem

Take $\lambda \asymp n^{-\frac{k}{2k+1}}$ and $\mu \asymp n^{-\frac{m}{2m+1}}$. Then
$$\|(\hat f - f^0) + (\hat g - g^0)\|^2 + \lambda^2 I^2(\hat f) + \mu^2 J^2(\hat g) = O_P\big(n^{-\frac{2m}{2m+1}}\big).$$

If moreover for some constant $\Gamma$ and for all $f$, $J(f_P) \le \Gamma \|f\|$ (i.e. $J$-smoothness of the projection), then
$$\|\hat f - f^0\|^2 + \lambda^2 I^2(\hat f) = O_P\big(n^{-\frac{2k}{2k+1}}\big) = o_P\big(n^{-\frac{2m}{2m+1}}\big).$$

Slide 31

Conclusion

- The theory for the $\ell_1$-penalty goes through for any weakly decomposable norm.
- Sparsity oracle inequalities however require small "effective sparsity" (i.e., restricted eigenvalue or compatibility conditions).
- If one is only interested in specific components, one can relax the compatibility conditions.
- But then one "needs" to assume sparse projections on the nuisance part, or ...
- Or replace sparsity assumptions by smoothness assumptions...