SLIDE 1


Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

SIDI ZAKARI Ibrahim

LIB-MA, FSSM Cadi Ayyad University (Morocco)

COMPSTAT’2010 Paris, August 22-27, 2010

Co-authors: Mkhadri Abdallah and N’Guessan Assi

SLIDE 2


Motivations

◮ The works of Fan and Li (2001) and Zou and Li (2008).

◮ Convex penalties (e.g. quadratic penalties): they make a trade-off between bias and variance, but can create unnecessary bias when the true parameters are large and cannot produce parsimonious models.

◮ Nonconcave penalties (e.g. the SCAD penalty, Fan 1997, and the hard thresholding penalty, Antoniadis 1997).

◮ Variable selection in high dimension (correlated variables).

◮ Penalized likelihood framework.


SLIDE 3


Ideal procedure for variable selection

◮ Unbiasedness: the resulting estimator is nearly unbiased when the true unknown parameter is large, to avoid excessive estimation bias.

◮ Sparsity: small coefficients are estimated as zero, to reduce model complexity.

◮ Continuity: the resulting estimator is continuous in the data, to avoid instability in model prediction.


SLIDE 4


The Smoothly Clipped Absolute Deviation (SCAD) Penalty

The SCAD penalty, denoted $J_\lambda(\cdot)$, satisfies all three requirements (unbiasedness, sparsity, continuity). It is defined by $J_\lambda(0) = 0$ and, for $|\beta_j| > 0$, by its derivative

$$J'_\lambda(|\beta_j|) = \lambda\, I(|\beta_j| \le \lambda) + \frac{(a\lambda - |\beta_j|)_+}{a - 1}\, I(|\beta_j| > \lambda), \qquad (1)$$

where $(z)_+ = \max(z, 0)$, $a > 2$ and $\lambda > 0$. SCAD possesses the oracle properties.
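A minimal Python sketch of the derivative (1) (illustrative code, not part of the original slides; $a = 3.7$ is the value recommended by Fan and Li, 2001):

```python
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """Derivative J'_lambda(|beta_j|) of the SCAD penalty, eq. (1)."""
    b = np.abs(beta)
    # lambda on [0, lambda]; linear decay on (lambda, a*lambda]; 0 beyond a*lambda
    return lam * (b <= lam) + np.maximum(a * lam - b, 0.0) / (a - 1) * (b > lam)
```

The zero derivative beyond $a\lambda$ is what leaves large coefficients unpenalized and makes the estimator nearly unbiased.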


SLIDE 5


Generalities

Let $(x_i, y_i)$, $i = 1, \ldots, n$, be an i.i.d. sample with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. The conditional log-likelihood given $x_i$ is

$$\ell_i(\beta) = \ell_i(\beta, \phi) = \ell_i(x_i^T \beta, y_i, \phi), \qquad (2)$$

where $\phi$ is the dispersion parameter, assumed known. We want to estimate $\beta$ by maximizing

$$P\ell(\beta) = \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} J_\lambda(|\beta_j|). \qquad (3)$$
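As an illustration for the Gaussian linear model with $\phi = 1$ (an assumption; the slides keep the likelihood generic), the criterion (3) can be coded directly, using the closed form of $J_\lambda$ obtained by integrating (1):

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty J_lambda(|beta_j|), the antiderivative of eq. (1)."""
    b = np.abs(beta)
    linear = lam * b                                     # region |beta| <= lam
    quad = -(b**2 - 2*a*lam*b + lam**2) / (2*(a - 1))    # lam < |beta| <= a*lam
    const = (a + 1) * lam**2 / 2                         # region |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quad, const))

def penalized_loglik(beta, X, y, lam):
    """Penalized log-likelihood (3) for the Gaussian linear model, phi = 1."""
    n = len(y)
    loglik = -0.5 * np.sum((y - X @ beta) ** 2)  # sum of ell_i, up to constants
    return loglik - n * np.sum(scad_penalty(beta, lam))
```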


SLIDE 6


◮ The penalized likelihood is nonconcave and nondifferentiable.

◮ A difficult maximization problem.

◮ Alternative: approximation of the SCAD penalty by convex functions.

◮ Iterative algorithms.

LQA Algorithm: Fan and Li (2001)

$$\beta^{(k+1)} = \arg\max_\beta \left\{ \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} \frac{J'_\lambda(|\beta_j^{(k)}|)}{2\,|\beta_j^{(k)}|}\, \beta_j^2 \right\}. \qquad (4)$$

◮ When $|\beta_j^{(k)}| < \epsilon_0$, set $\hat\beta_j = 0$.

◮ Two drawbacks: the choice of $\epsilon_0$ and the definitive exclusion of variables.
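For the Gaussian linear model, each LQA update (4) reduces to a ridge-type solve. A minimal sketch (illustrative, reusing `scad_derivative` from the earlier block) that also makes the two drawbacks visible:

```python
import numpy as np

def lqa_step(beta_k, X, y, lam, eps0=1e-8):
    """One LQA update (4) for the Gaussian linear model: the SCAD penalty is
    replaced by a quadratic around beta_k, so the update is a ridge-type solve."""
    n = len(y)
    b = np.abs(beta_k)
    # Drawback made explicit: coefficients below the threshold eps0 are set
    # to 0 here and excluded definitively from all later iterations.
    active = b >= eps0
    w = scad_derivative(beta_k[active], lam) / b[active]  # J'(|b_j|) / |b_j|
    Xa = X[:, active]
    beta_new = np.zeros_like(beta_k)
    beta_new[active] = np.linalg.solve(Xa.T @ Xa + n * np.diag(w), Xa.T @ y)
    return beta_new
```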


SLIDE 7


LLA Algorithm: Zou and Li (2008)

$$\beta^{(k+1)} = \arg\max_\beta \left\{ \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} J'_\lambda(|\beta_j^{(k)}|)\, |\beta_j| \right\}. \qquad (5)$$

◮ The one-step LLA estimates are as good as the estimates obtained from the fully iterative LLA.

◮ The well-known LARS algorithm (Efron et al., 2004) is used to compute the solution.

◮ Therefore, as with the LASSO (Tibshirani, 1996), there is a selection problem in the case $p \gg n$: at most $n$ variables can be selected.
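For the Gaussian model, (5) is a weighted lasso, so the one-step estimator can be computed with any lasso solver; the slides use LARS, but a plain coordinate-descent sketch (our illustration, not the authors' implementation) shows the structure:

```python
import numpy as np

def weighted_lasso(X, y, w, n_iter=500, tol=1e-8):
    """Coordinate descent for 0.5*||y - X b||^2 + n * sum_j w_j |b_j|,
    the Gaussian form of the LLA surrogate (5) with w_j = J'(|beta_j^(k)|)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X**2, axis=0)  # assumes no all-zero column
    r = y.copy()                   # running residual y - X @ beta
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * beta[j]  # fit coordinate j alone
            b_new = np.sign(rho) * max(abs(rho) - n * w[j], 0.0) / col_sq[j]
            r += X[:, j] * (beta[j] - b_new)         # keep residual in sync
            beta[j] = b_new
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

# One-step LLA from the maximum likelihood (here OLS) initial estimate:
# beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
# beta_ose = weighted_lasso(X, y, scad_derivative(beta0, lam))
```

Unlike LQA, a coefficient set to zero in one sweep can re-enter the model at the next one.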


SLIDE 8


Our contribution: the MLLQA Algorithm

$$\beta^{(k+1)} = \arg\max_\beta \left\{ \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} \omega^1_j\, |\beta_j| - \frac{n}{2} \sum_{j=1}^{p} \omega^2_{j,\tau}\, \beta_j^2 \right\}, \qquad (6)$$

where $\omega^1_j$ and $\omega^2_{j,\tau}$ depend on $J'_\lambda(|\beta_j^{(0)}|)$, $|\beta_j^{(0)}|$ and possibly on $\tau > 0$.

◮ $\beta^{(0)}$ is the maximum likelihood estimator.

◮ The second term performs the selection.

◮ The third term guarantees a grouping effect, as with the elastic net (Zou and Hastie, 2005).

◮ For the convergence, we prove that MLLQA is an instance of MM algorithms (Hunter and Li, 2005).


SLIDE 9


Augmented data problem

We show that solving problem (6) is equivalent to finding

$$\hat\beta = \arg\min_\beta \left\{ \frac{1}{2}\, \|Y^* - X^*\beta\|^2 + n \sum_{j=1}^{p} \omega^1_j\, |\beta_j| \right\}, \qquad (7)$$

where $Y^* \in \mathbb{R}^{n+p}$, $X^*$ is of dimension $(n + p) \times p$, and $(Y^*, X^*)$ depend on the data $(Y, X)$.

Proposition. Solving problem (3) via the one-step MLLQA algorithm is equivalent to one-step LLA on the augmented data.
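The slides do not spell out $(Y^*, X^*)$. For the Gaussian linear model, one augmentation consistent with (6) and (7) is the elastic-net-style stacking below (our reconstruction, following Zou and Hastie, 2005): the $p$ extra rows absorb the quadratic term of (6) into the least-squares part of (7).

```python
import numpy as np

def augment_data(X, y, w2):
    """Build (X*, Y*): stacking sqrt(n * w2_j) * e_j^T under X turns the
    quadratic penalty (n/2) * sum_j w2_j * beta_j^2 into extra residuals."""
    n, p = X.shape
    X_star = np.vstack([X, np.sqrt(n * np.asarray(w2)) * np.eye(p)])  # (n+p) x p
    y_star = np.concatenate([y, np.zeros(p)])                          # R^(n+p)
    return X_star, y_star

# Problem (7) is then a weighted lasso on the augmented data, e.g. with
# weighted_lasso from the earlier sketch:
# X_star, y_star = augment_data(X, y, w2)
# beta_hat = weighted_lasso(X_star, y_star, w1)
```

Because $X^*$ has $n + p$ rows, a lasso-type solver on the augmented data can select more than $n$ variables, which is the point of the construction when $p \gg n$.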


SLIDE 10


Oracle and statistical properties of the one-step MLLQA estimator

Let $\hat\beta^{(ose)}$ be the one-step estimator $\beta^{(1)}$ and $\beta_0$ the true model parameter. Assume $\beta_0 = (\beta_{01}, \ldots, \beta_{0p})^T = (\beta_{10}^T, \beta_{20}^T)^T$ with $\beta_{20} = 0$. Under some regularity conditions we have the following theorem:

Theorem. If $\sqrt{n}\,\lambda_n \to \infty$ and $\lambda_n \to 0$, then $\hat\beta^{(ose)}$ is:

◮ Sparse: with probability tending to 1, $\hat\beta^{(ose)}_2 = 0$.

◮ Asymptotically normal: $\sqrt{n}\,(\hat\beta^{(ose)}_1 - \beta_{10}) \to N(0,\, I_1^{-1}(\beta_{10}))$.

◮ Continuity: the minimum of $|\beta| + J'_\lambda(|\beta|)$ must be attained at zero (Fan and Li, 2001). In the one-step case it suffices that $J'_\lambda(|\beta|)$ be continuous for $|\beta| > 0$ for $\hat\beta^{(ose)}$ to be continuous.


SLIDE 11


Grouping effect: the case of correlated variables

Assume that the response variable is centered and the predictors are standardized. If $|\beta_i^{(0)}| = |\beta_j^{(0)}| \neq 0$ for $i, j \in \{1, \ldots, p\}$, we then have:

1. $D_{\lambda,\tau,\beta^{(0)}}(i, j) \le \dfrac{|\beta_j^{(0)}| + \tau}{n\, J'_\lambda(|\beta_j^{(0)}|)}\, \sqrt{2(1 - \rho)}$;

2. $x_i = x_j \Rightarrow \hat\beta_i = \hat\beta_j$;

where $\rho = x_i^T x_j$ and $D_{\lambda,\tau,\beta^{(0)}}(i, j) = \dfrac{|\hat\beta_i - \hat\beta_j|}{|Y|_1}$.


SLIDE 12


Linear Model

In this example, simulation data were generated from the linear regression model $y = x^T\beta + \epsilon$, where $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0)^T$, $\epsilon \sim N(0, 1)$, and $x$ follows a multivariate normal distribution with zero mean and covariance $\rho^{|i-j|}$ between the $i$th and $j$th components, with $\rho \in \{0.5, 0.7, 0.9\}$. The sample size is set to 50 and 100. For each case the simulation was repeated 500 times.
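A minimal sketch of this data-generating process in Python (illustrative; the function name and seed handling are not from the slides):

```python
import numpy as np

def simulate(n, rho, seed=0):
    """One dataset from the linear model of the numerical example:
    y = x^T beta + eps, Cov(x_i, x_j) = rho^|i-j|, eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    beta = np.array([3, 1.5, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0], dtype=float)
    p = len(beta)
    cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta + rng.standard_normal(n)
    return X, y

# e.g. one replication of the n = 50, rho = 0.5 setting:
# X, y = simulate(50, 0.5)
```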


SLIDE 13


n = 50

                            No. of zeros      Proportion of
          Method   MRME     C       IC        Underfit   Correct fit   Overfit
ρ = 0.5   LLA      0.357    3       2.712     0.000      0.412         0.588
          MLLQA    0.331    3       2.488     0.000      0.492         0.508
ρ = 0.7   LLA      0.437    2.998   2.794     0.002      0.362         0.636
          MLLQA    0.383    2.994   2.654     0.006      0.410         0.584
ρ = 0.9   LLA      0.616    2.884   2.676     0.116      0.282         0.606
          MLLQA    0.579    2.876   2.556     0.124      0.302         0.578

(MRME: median of the relative model error.)


SLIDE 14


n = 100

                            No. of zeros      Proportion of
          Method   MRME     C       IC        Underfit   Correct fit   Overfit
ρ = 0.5   LLA      0.492    2.998   3.154     0.002      0.460         0.538
          MLLQA    0.455    2.998   3.114     0.002      0.482         0.516
ρ = 0.7   LLA      0.486    2.998   2.828     0.002      0.480         0.518
          MLLQA    0.451    2.998   2.872     0.002      0.490         0.508
ρ = 0.9   LLA      0.539    2.946   2.490     0.054      0.394         0.552
          MLLQA    0.491    2.944   2.516     0.056      0.412         0.532


SLIDE 15


Conclusion

◮ Using a convex approximation of the SCAD penalty, we transformed our initial problem into a one-step LLA on augmented data.

◮ This approach is suited to the high-dimensional setting ($p \gg n$), and so allows the selection of more than $n$ variables.

◮ We take the one-step estimator as the final estimate because it naturally has a sparse representation and enjoys the oracle properties.

◮ Our approach improves on the one-step LLA results in the case $p < n$.


SLIDE 16


References

EFRON, B., HASTIE, T., JOHNSTONE, I. and TIBSHIRANI, R. (2004): Least angle regression. The Annals of Statistics 32, 407-499.

FAN, J. and LI, R. (2001): Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348-1360.

HUNTER, D. and LI, R. (2005): Variable selection using MM algorithms. The Annals of Statistics 33, 1617-1642.

TIBSHIRANI, R. (1996): Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267-288.

ZOU, H. and HASTIE, T. (2005): Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67, 301-320.

ZOU, H. and LI, R. (2008): One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics 36(4), 1509-1533.


SLIDE 17


Thank you for your attention!
