Robust Sparse Quadratic Discrimination
Jianqing Fan (Princeton University), with Tracy Ke, Han Liu and Lucy Xia. May 2, 2014.
(PowerPoint presentation transcript)


  1. Robust Sparse Quadratic Discrimination. Jianqing Fan, Princeton University, with Tracy Ke, Han Liu and Lucy Xia. May 2, 2014. Jianqing Fan (Princeton University) Quadro

  2. Outline
  1 Introduction
  2 Rayleigh Quotient for sparse QDA
  3 Optimization Algorithm
  4 Application to Classification
  5 Theoretical Results
  6 Numerical Studies

  3. Introduction: High-Dimensional Classification

  4. High-dimensional classification pervades all facets of machine learning and Big Data:
  - Biomedicine: disease classification, predicting clinical outcomes, and biological processes using microarray or proteomics data.
  - Machine learning: document/text classification, image classification.
  - Social networks: community detection.

  5. Classification
  Training data: {X_{i1}}_{i=1}^{n_1} and {X_{i2}}_{i=1}^{n_2} for classes 1 and 2.
  Aim: classify a new data point X by I{f(X) < c} + 1.
  - Family of functions f: linear, quadratic.
  - Criterion for selecting f: logistic or hinge loss (convex surrogates).
  [Figure: scatter plot of the two classes with a new point "?" to classify.]

  6. A popular approach
  Sparse linear classifiers: minimize classification errors (Bickel & Levina, 04; Fan & Fan, 08; Shao et al., 11; Cai & Liu, 11; Fan et al., 12).
  ⋆ Works well for Gaussian data with equal covariance.
  ⋆ Powerless if the centroids are the same; no interactions considered.
  [Figure: two-class data that a linear rule cannot separate.]
  What about heteroscedastic variance? Non-Gaussian distributions?

  7. Other popular approaches
  - Plug-in quadratic discriminant: ⋆ needs Σ_1^{-1}, Σ_2^{-1}; ⋆ assumes Gaussianity.
  - Kernel SVM, logistic regression: ⋆ inadequate use of the distributions; ⋆ few theoretical results; ⋆ interactions.
  - Minimizing the classification error directly: ⋆ non-convex; not easily computable.

  8. What is new today? Find a quadratic rule that maximizes the Rayleigh quotient.
  1 Non-equal covariance matrices.
  2 Fourth cross-moments avoided using elliptical distributions.
  3 Uniform estimation of means and variances for heavy tails.

  9. Rayleigh Quotient Optimization

  10. Rayleigh Quotient
  Rq(f) = [E_1 f(X) − E_2 f(X)]² / { π var_1[f(X)] + (1 − π) var_2[f(X)] } ∝ between-class variance / within-class variance.
  - In the "classical" setting, Rq(f) is equivalent to Err(f).
  - In broader settings, it is a surrogate of the classification error.
  - Of independent scientific interest.

  11. Rayleigh quotient for quadratic loss
  Quadratic projection: Q_{Ω,δ}(X) = X⊤ΩX − 2δ⊤X.
  With π = P(Y = 1) and κ = (1 − π)/π, we have
  Rq(Q) ∝ [D(Ω,δ)]² / [V_1(Ω,δ) + κ V_2(Ω,δ)] = R(Ω,δ),
  where D(Ω,δ) = E_1 Q(X) − E_2 Q(X) and V_k(Ω,δ) = var_k(Q(X)), k = 1, 2.
  Reduces to ROAD (Fan, Feng, Tong, 12) when Q is linear.
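As a concrete illustration of the quantities on this slide, here is a minimal numpy sketch (not from the slides; the choices of Ω, δ and the class distributions are illustrative) that estimates the Rayleigh quotient of a quadratic projection from two samples:

```python
import numpy as np

def rayleigh_quotient(X1, X2, Omega, delta, pi=0.5):
    """Empirical Rayleigh quotient of Q(X) = X'Omega X - 2 delta'X."""
    def Q(X):
        return np.einsum('ij,jk,ik->i', X, Omega, X) - 2.0 * (X @ delta)
    q1, q2 = Q(X1), Q(X2)
    between = (q1.mean() - q2.mean()) ** 2           # between-class variation
    within = pi * q1.var() + (1.0 - pi) * q2.var()   # within-class variation
    return between / within

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(500, 3))   # class 1
X2 = rng.normal(1.0, 2.0, size=(500, 3))   # class 2: shifted mean, larger scale
Omega = np.eye(3)                          # illustrative quadratic part
delta = np.ones(3)                         # illustrative linear part
rq = rayleigh_quotient(X1, X2, Omega, delta)
print(rq)
```

A larger value indicates better separation of the two classes by this quadratic score.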

  12. Challenge and Solution
  Challenge: the quotient involves all fourth cross-moments.
  Solution: consider the elliptical family X ~ E(μ, Σ, g): X = μ + ξ Σ^{1/2} U, with E(ξ²) = d.
  Theorem (Variance of a Quadratic Form):
  var(Q(X)) = 2(1 + γ) tr(ΩΣΩΣ) + γ [tr(ΩΣ)]² + 4(Ωμ − δ)⊤Σ(Ωμ − δ),
  which is quadratic in (Ω, δ), where γ = E(ξ⁴)/[d(d + 2)] − 1 is the kurtosis parameter.
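The variance formula can be sanity-checked by Monte Carlo in the Gaussian case, where γ = 0; the particular μ, Σ, Ω, δ below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
mu = np.array([1.0, -0.5, 0.0, 2.0])
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)            # a generic positive-definite covariance
Omega = np.diag([1.0, 2.0, 0.5, 1.5])      # illustrative Omega
delta = np.array([0.5, 0.0, -1.0, 1.0])    # illustrative delta

# Monte Carlo variance of Q(X) = X'Omega X - 2 delta'X for Gaussian X
X = rng.multivariate_normal(mu, Sigma, size=400_000)
Q = np.einsum('ij,jk,ik->i', X, Omega, X) - 2.0 * (X @ delta)
mc_var = Q.var()

# Theorem with gamma = 0 (Gaussian):
# var(Q) = 2 tr(Omega Sigma Omega Sigma) + 4 (Omega mu - delta)' Sigma (Omega mu - delta)
OS = Omega @ Sigma
w = Omega @ mu - delta
theory = 2.0 * np.trace(OS @ OS) + 4.0 * w @ Sigma @ w
print(mc_var, theory)
```

With 400,000 draws the two numbers agree to within Monte Carlo error.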

  13. Rayleigh Quotient under the elliptical family
  Semiparametric model: the two classes are E(μ_1, Σ_1, g) and E(μ_2, Σ_2, g).
  D, V_1 and V_2 involve only μ_1, μ_2, Σ_1, Σ_2 and γ.
  Examples of γ:
  - Gaussian: γ = 0.
  - t_ν: γ = 2/(ν − 2).
  - Contaminated Gaussian (ω, τ): γ = [1 + ω(τ⁴ − 1)] / [1 + ω(τ² − 1)]² − 1.
  - Compound Gaussian U(1, 2): γ = 1/6.

  14. Sparse quadratic solution
  Simplification: using homogeneity, the maximizer of [D(Ω,δ)]² / [V_1(Ω,δ) + κ V_2(Ω,δ)] is attained (up to scaling) by
  argmin_{D(Ω,δ)=1} V_1(Ω,δ) + κ V_2(Ω,δ) =: V(Ω,δ).
  Theorem (sparsified version, Ω ∈ R^{d×d}, δ ∈ R^d):
  argmin_{(Ω,δ): D(Ω,δ)=1} V(Ω,δ) + λ_1 |Ω|_1 + λ_2 |δ|_1.
  Applicable to the linear discriminant ⇒ ROAD.

  15. Robust Estimation and Optimization Algorithm

  16. Robust Estimation of the Mean
  Problem: elliptical distributions can have heavy tails.
  Challenges:
  ⋆ The sample median ≉ the mean when the distribution is skewed (e.g., EX²).
  ⋆ Need uniform convergence for exponentially many σ²_jj.
  How can we estimate a mean with exponential concentration under heavy tails?


  18. Catoni's M-estimator
  Solve for μ̂_j:  Σ_{i=1}^n h(α_{n,d}(x_{ij} − μ̂_j)) = 0,  with α_{n,d} → 0,
  where h is strictly increasing and satisfies
  −log(1 − y + y²/2) ≤ h(y) ≤ log(1 + y + y²/2).
  Tuning: α_{n,d} = { 4 log(n∨d) / ( n [ v + 4v log(n∨d)/(n − 4 log(n∨d)) ] ) }^{1/2}, with v ≥ max_j σ²_jj.
  [Figure: Catoni's influence function h(·).]
  Result: max_j |μ̂_j − μ_j| = O_P(√(log d / n)); needs only a bounded second moment.
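A minimal sketch of Catoni's M-estimator, assuming the univariate case (d = 1), the narrowest admissible influence function, and bisection to solve the estimating equation; the t-distributed sample and the variance bound v are illustrative:

```python
import numpy as np

def catoni_h(y):
    """Catoni's influence function: strictly increasing, log-tapered tails."""
    return np.where(y >= 0,
                    np.log1p(y + 0.5 * y * y),
                    -np.log1p(-y + 0.5 * y * y))

def catoni_mean(x, alpha, iters=100):
    """Solve sum_i h(alpha * (x_i - m)) = 0 for m by bisection
    (the sum is decreasing in m, and the root is bracketed by min/max)."""
    lo, hi = float(x.min()), float(x.max())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if catoni_h(alpha * (x - mid)).sum() > 0.0:
            lo = mid        # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Heavy-tailed sample: t with 3 degrees of freedom, shifted to mean 5
rng = np.random.default_rng(2)
x = 5.0 + rng.standard_t(df=3, size=5000)
n = x.size
v = 3.0                                    # upper bound on the variance (var of t_3 is 3)
L = 4.0 * np.log(n)                        # d = 1 here, so n or d maximum is n
alpha = np.sqrt(L / (n * (v + v * L / (n - L))))
m = catoni_mean(x, alpha)
print(m)
```

The estimate stays close to the true mean 5 despite the heavy tails, which is the point of the slide's exponential-concentration result.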

  19. Robust Estimation of Σ_k
  1. η̂_j = estimate of E(X_j²): Catoni's M-estimator applied to {x²_{1j}, ..., x²_{nj}}.
  2. Variance estimation: for a small δ_0, Σ̂_jj = σ̂²_j = max{η̂_j − μ̂²_j, δ_0}.
  3. Off-diagonal elements: Σ̂_jk = σ̂_j σ̂_k sin(π τ̂_jk / 2), where τ̂_jk is Kendall's tau correlation (Liu et al., 12; Zou & Xue, 12) and sin(π τ̂_jk / 2) is the robust correlation.
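Step 3 can be sketched as follows; `robust_covariance` is a hypothetical helper name, and for brevity the per-coordinate scales are plugged in rather than estimated by Catoni's M-estimator as in steps 1-2:

```python
import numpy as np
from scipy.stats import kendalltau

def robust_covariance(X, sigma):
    """Assemble a covariance estimate from per-coordinate scales `sigma`
    and Kendall's tau, using the sine transform sin(pi * tau / 2), which
    recovers the Pearson correlation under elliptical distributions."""
    d = X.shape[1]
    S = np.diag(sigma ** 2)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            S[j, k] = S[k, j] = sigma[j] * sigma[k] * np.sin(np.pi * tau / 2.0)
    return S

# Gaussian check: the Kendall-based estimate should be near the true covariance
rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=3000)
S = robust_covariance(X, sigma=np.sqrt(np.diag(Sigma)))
print(S)
```

The rank-based tau is insensitive to heavy tails, which is why the slide pairs it with the robust variance estimates.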

  20. Projection onto nonnegative-definite matrices
  If Σ̂ is indefinite, use the sup-norm projection (a convex optimization):
  Σ̃ = argmin_{A ⪰ 0} |A − Σ̂|_∞.
  Property: |Σ̃ − Σ|_∞ ≤ 2 |Σ̂ − Σ|_∞, since the truth Σ is feasible and the triangle inequality applies.
  [Figure: estimated, projected, and true matrices.]

  21. Robust Estimation of γ
  Recall: γ = E(ξ⁴)/[d(d + 2)] − 1, where E(ξ⁴) = E{[(X − μ)⊤Σ^{-1}(X − μ)]²}.
  Intuitive estimator (also estimable for subvectors):
  γ̂ = max{ (1/[d(d + 2)]) (1/n) Σ_{i=1}^n [(X_i − μ̂)⊤Ω̂(X_i − μ̂)]² − 1, 0 },
  where μ̂ and Ω̂ are estimators of μ and Σ^{-1} (e.g., CLIME; Cai et al., 11).
  Property: |γ̂ − γ| ≤ C max{ |μ̂ − μ|_∞, |Ω̂ − Σ^{-1}|_∞ }.
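The intuitive estimator can be sketched directly; for illustration the true precision matrix is plugged in where the slide would use a CLIME estimate:

```python
import numpy as np

def estimate_gamma(X, mu_hat, Omega_hat):
    """gamma_hat = max{ mean_i [(X_i - mu)' Omega (X_i - mu)]^2 / (d(d+2)) - 1, 0 }."""
    d = X.shape[1]
    Z = X - mu_hat
    q = np.einsum('ij,jk,ik->i', Z, Omega_hat, Z)   # Mahalanobis-type quadratic forms
    return max(np.mean(q ** 2) / (d * (d + 2)) - 1.0, 0.0)

# Gaussian data: gamma should be near 0, since E[(chi^2_d)^2] = d(d + 2)
rng = np.random.default_rng(4)
d = 3
X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=20_000)
g = estimate_gamma(X, X.mean(axis=0), np.eye(d))
print(g)
```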

  22. Linearized Augmented Lagrangian
  Target: min_{D(Ω,δ)=1} V(Ω,δ) + λ_1 |Ω|_1 + λ_2 |δ|_1.
  Augmented Lagrangian: F_ρ(Ω, δ, ν) = V(Ω,δ) + ν[D(Ω,δ) − 1] + ρ[D(Ω,δ) − 1]², which is quadratic in Ω and δ.
  Iterate: Ω^(1) ⇒ δ^(1) ⇒ ν^(1) ⇒ Ω^(2) ⇒ δ^(2) ⇒ ν^(2) ⇒ ···

  23. Linearized Augmented Lagrangian: Details
  Minimize F_ρ(Ω, δ, ν) + λ_1 |Ω|_1 + λ_2 |δ|_1 by alternating updates:
  Ω^(k) = argmin_Ω F_ρ(Ω, δ^(k−1), ν^(k−1)) + λ_1 |Ω|_1   (soft-thresholding),
  δ^(k) = argmin_δ F_ρ(Ω^(k), δ, ν^(k−1)) + λ_2 |δ|_1   (LASSO),
  ν^(k) = ν^(k−1) + 2ρ[D(Ω^(k), δ^(k)) − 1].
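A toy sketch of the alternating scheme, with a generic quadratic objective and a single linear constraint standing in for D(Ω, δ) = 1; all values are illustrative, and the inner minimizations are approximated by a few proximal-gradient (soft-thresholding) steps rather than solved exactly:

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal map of t * |.|_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Toy analogue: minimize x'Vx + lam*|x|_1 subject to a'x = 1, via the
# augmented Lagrangian F(x, nu) = x'Vx + nu*(a'x - 1) + rho*(a'x - 1)^2,
# alternating inner steps in x with multiplier updates in nu.
V = np.array([[2.0, 0.3], [0.3, 1.0]])    # illustrative positive-definite quadratic
a = np.array([1.0, 1.0])                  # stands in for the constraint D(.) = 1
lam, rho, nu = 0.1, 5.0, 0.0
x = np.zeros(2)
step = 0.02                               # below 1/L for the smooth part
for _ in range(500):
    for _ in range(5):                    # inner (linearized) steps on F
        grad = 2.0 * V @ x + nu * a + 2.0 * rho * (a @ x - 1.0) * a
        x = soft_threshold(x - step * grad, step * lam)
    nu += 2.0 * rho * (a @ x - 1.0)       # multiplier update, as on the slide
feas = a @ x
print(x, feas)
```

The multiplier update drives the constraint violation a'x − 1 to zero while the soft-thresholding keeps the iterate sparse, mirroring the Ω and δ steps above.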

  24. Application to Classification

  25. Finding a Threshold
  Where to cut the quadratic score Q?

  26. Finding a Threshold
  ⋆ Classification rule: I{Z⊤ΩZ − 2δ⊤Z < c} + 1.
  ⋆ Reparametrization: c = t M_1(Ω,δ) + (1 − t) M_2(Ω,δ).
  ⋆ Minimize over t an approximation of the classification error:
  Err(t) ≡ π Φ̄( (1 − t) D(Ω,δ) / √V_1(Ω,δ) ) + (1 − π) Φ̄( t D(Ω,δ) / √V_2(Ω,δ) ),
  where Φ̄ is the standard normal survival function.
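Given estimated moments D, V_1, V_2, the threshold parameter t can be chosen by a simple grid search over the approximated error; the moment values below are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import norm

def approx_error(t, D, V1, V2, pi):
    """pi * PhiBar((1-t) D / sqrt(V1)) + (1-pi) * PhiBar(t D / sqrt(V2)),
    where PhiBar is the standard normal survival function."""
    return (pi * norm.sf((1.0 - t) * D / np.sqrt(V1))
            + (1.0 - pi) * norm.sf(t * D / np.sqrt(V2)))

# Illustrative moments; in the procedure they come from the fitted (Omega, delta)
D, V1, V2, pi = 2.0, 1.0, 4.0, 0.5
ts = np.linspace(0.0, 1.0, 1001)
errs = approx_error(ts, D, V1, V2, pi)
t_star = ts[np.argmin(errs)]
print(t_star, errs.min())
```

Because V_2 > V_1 here, the minimizing t is not 1/2: the cut shifts to account for the class with larger variance.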

  27. Overview of Our Procedure
  1. Raw data → robust M-estimation and Kendall's tau correlation estimation → μ̂_1, μ̂_2, Σ̂_1, Σ̂_2, γ̂.
  2. Rayleigh quotient optimization (a regularized convex program) → (Ω̂, δ̂).
  3. Find the threshold c(t*), where t* minimizes Err(Ω̂, δ̂, t).
  4. Quadratic classification rule: f(Ω̂, δ̂, c(t*)) = I(Z⊤Ω̂Z − 2δ̂⊤Z < c(t*)).

  28. Theoretical Results
