SLIDE 1

Sliced Inverse Regression with Interaction (SIRI) Detection for non-Gaussian BN learning

Jun S. Liu, Department of Statistics, Harvard University
Joint work with Bo Jiang

SLIDE 2

General: Regression and Classification

             Ind 1              Ind 2              …   Ind N
Responses    Y1                 Y2                 …   YN
Covariates   x11, x12, …, x1p   x21, x22, …, x2p   …   xN1, xN2, …, xNp


SLIDE 4–5

Variable Selection with Interaction

Let Y ∈ ℝ be a univariate response variable and X ∈ ℝ^p be a vector of p continuous predictor variables, e.g.

    Y = X1 × X2 + ε,  ε ~ N(0, σ²),  X ~ MVN(0, I_p)

Suppose p = 1000. How do we find X1 and X2?

One-step forward selection: ~500,000 interaction terms.

Is there any marginal relationship between Y and X1?
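A quick simulation makes the answer concrete (a hypothetical sketch, not code from the talk; the noise level σ = 0.5 and the seed are arbitrary choices):

```python
# Y = X1 * X2 + eps: Y is marginally uncorrelated with X1, so marginal
# (first-moment) screening cannot find X1 or X2.
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 1000
X = rng.standard_normal((n, p))
Y = X[:, 0] * X[:, 1] + 0.5 * rng.standard_normal(n)  # sigma = 0.5, arbitrary

print(np.corrcoef(Y, X[:, 0])[0, 1])  # ~ 0: no marginal correlation

# Yet X1 drives the conditional spread: Var(Y | X1 = x) = x**2 + 0.25,
# so observations with large |X1| show much larger variance in Y.
print(Y[np.abs(X[:, 0]) > 1].var(), Y[np.abs(X[:, 0]) < 1].var())
```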

SLIDE 6

[Y|X] ? [X|Y] ? Who is behind the bar?

SLIDE 7

General: Regression and Classification

(Same data table as Slide 2.)

    P(Y | X) = P(X | Y) P(Y) / P(X)

How to model this?

SLIDE 8

Naïve Bayes model

[Diagram: Y points to each of X1, X2, X3, …, Xm]

SLIDE 9

(Augmented) Naïve Bayes Model

  • BEAM: Bayesian Epistasis Association Mapping (Zhang and Liu 2007): discrete univariate response and discrete predictors
  • (Augmented) Naïve Bayes Classifier with Variable Selection and Interaction Detection (Yuan Yuan et al.): discrete univariate response and continuous (but discretized) predictors
  • Bayesian Partition Model for eQTL study (Zhang et al. 2010): continuous multivariate responses and discrete predictors
  • Sliced Inverse Regression with Interaction Detection (SIRI): continuous univariate response and continuous predictors

SLIDE 10

Tree-Augmented Naïve Bayes

TAN (tree-augmented naïve Bayes; Pearl 1988; Friedman 1997)

[Diagram: Y points to X1, …, X6, with additional tree edges among the X's]

SLIDE 11

Augmented Naïve Bayes

[Diagram: Y points to predictor groups — Group 0 (X01, X02), Group 1 (X11, X12, X13), Group 21 (X2.11, X2.12, X2.13), Group 22 (X2.21, X2.22)]

SLIDE 12

How about continuous covariates?

  • We may discretize Y, and discretize each X
  • Or discretize Y, assuming a joint Gaussian distribution on X?
  • Sound familiar?
SLIDE 13

An observation:

    Y = X1 × X2 + ε,  ε ~ N(0, σ²),  X ~ MVN(0, I_p)

[Scatter plot of y versus x1]

SLIDE 14

Sliced Inverse Regression (SIR, Li 1991)

SIR is a tool for dimension reduction in multivariate statistics.

Let Y ∈ ℝ be a univariate response variable and X ∈ ℝ^p be a vector of p continuous predictor variables, with

    Y = f(β_1ᵀ X, …, β_Kᵀ X, ε)

where f is an unknown function and ε is the error with finite variance. (The running example Y = X1 × X2 + ε is of this form with K = 2, β_1 = e1, β_2 = e2.)

How to identify the unknown projection vectors β_1, …, β_K?


SLIDE 31

SIR Algorithm

Let Σ_xx be the covariance matrix of X. Standardize X to

    Z = Σ_xx^{−1/2} (X − E[X]).

Divide the range of the y_i into S non-overlapping slices H_s, s ∈ {1, …, S}; n_s is the number of observations within slice s.

Compute the mean of the z_i within each slice, z̄_s = n_s^{−1} Σ_{i∈H_s} z_i, and calculate the estimate of Cov{E(X | Y)}:

    M̂ = n^{−1} Σ_{s=1}^{S} n_s z̄_s z̄_sᵀ.

Identify the K largest eigenvalues λ̂_k of M̂ and the corresponding eigenvectors η̂_k. Then

    β̂_k = Σ̂_xx^{−1/2} η̂_k,  k = 1, …, K.
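The algorithm translates almost line by line into NumPy. A minimal sketch under the slide's notation (my illustration; it assumes the predictor set is low-dimensional relative to n so that Σ̂_xx is invertible):

```python
# A line-by-line NumPy rendering of the SIR algorithm above.
import numpy as np

def sir(X, y, n_slices=10, K=2):
    """Return the top-K SIR eigenvalues and directions beta_k."""
    n, p = X.shape
    # Standardize: Z = Sigma_xx^{-1/2} (X - E[X])
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False).reshape(p, p)
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ Sigma_inv_sqrt
    # Divide the range of y into S non-overlapping slices H_s
    slices = np.array_split(np.argsort(y), n_slices)
    # M_hat = n^{-1} sum_s n_s zbar_s zbar_s^T, estimating Cov{E(X | Y)}
    M = np.zeros((p, p))
    for H in slices:
        zbar = Z[H].mean(axis=0)
        M += len(H) * np.outer(zbar, zbar)
    M /= n
    # Top-K eigenpairs of M_hat; map back by beta_k = Sigma^{-1/2} eta_k
    lam, eta = np.linalg.eigh(M)
    top = np.argsort(lam)[::-1][:K]
    return lam[top], Sigma_inv_sqrt @ eta[:, top]
```

On the running example Y = X1 × X2 + ε the returned eigenvalues are essentially zero, since E(X | Y) ≡ 0 there — exactly the failure mode the later slides address.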

SLIDE 32

SIR with Variable Selection

Only a subset of the predictors is relevant: β_1, …, β_K are sparse.

  • Shrinkage estimates of β_1, …, β_K using an L1 or L2 penalty: Regularized SIR (RSIR, Zhong et al. 2005); Sparse SIR (SSIR, Li 2007)
  • Correlation Pursuit (COP, Zhong et al. 2012): a forward-selection and backward-elimination procedure motivated by the F-test in stepwise regression,

        F_{1, n−d−1} = (n − d − 1)(R̂²_{d+1} − R̂²_d) / (1 − R̂²_{d+1})

  • Backward subset selection (Cook 2004; Li et al. 2005)

SLIDE 33–36

Correlation Pursuit (COP)

Let A be the current set of selected predictors and λ̂_k^A the k-th largest eigenvalue estimated by SIR based on the predictors in A. For the j-th predictor X_j (j ∉ A), define the statistic

    COP_k^{A+j} = n (λ̂_k^{A+j} − λ̂_k^A) / (1 − λ̂_k^{A+j}).

If j ∉ A, the COP_k^{A+j} (k = 1, …, K) are asymptotically i.i.d. χ²(1), and COP_{1:K}^{A+j} = Σ_{k=1}^{K} COP_k^{A+j} is asymptotically χ²(K).

The stepwise procedure is consistent if p = O(n^r), r < 1/2.

The dimension K and the thresholds in forward selection (backward elimination) are chosen by cross-validation.
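Given the sir() sketch above, the COP statistic is a few lines (again an assumed illustration; it presumes len(A) ≥ K so that K eigenvalues exist for both fits, and takes A as a list of column indices):

```python
# COP_{1:K}^{A+j} from two SIR fits, reusing the sir() sketch above.
import numpy as np

def cop_statistic(X, y, A, j, K=2, n_slices=10):
    n = len(y)
    lam_A, _ = sir(X[:, A], y, n_slices, K)         # eigenvalues on A
    lam_Aj, _ = sir(X[:, A + [j]], y, n_slices, K)  # eigenvalues on A + j
    cop_k = n * (lam_Aj - lam_A) / (1.0 - lam_Aj)   # COP_k^{A+j}, k = 1..K
    return cop_k.sum()  # asymptotically chi^2(K) when X_j is irrelevant
```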

SLIDE 37–44

SIR via MLE

Let A be the set of relevant predictors and C = Aᶜ, d = |A|:

    X_A | Y ∈ H_s ~ N(μ_s, Σ)
    X_C | X_A, Y ∈ H_s ~ N(X_A β, Σ_0)

Here μ_s = α + Γ γ_s, where γ_s ∈ ℝ^K and Γ is a d × K orthogonal matrix; that is, μ_s belongs to a K-dimensional affine space α + V_K (K < d).

The MLE of the span of the subspace V_K coincides with the SIR directions (Cook 2007; Szretter and Yohai 2009).

Given the current A and a predictor X_j ∉ A, we want to test

    H0: X_j is irrelevant  vs.  H1: X_j is relevant.

Since only the conditional law of X_j differs between the two models,

    P_{M1}(X | Y) / P_{M0}(X | Y) = P_{M1}(X_j | X_A, Y) / P_{M0}(X_j | X_A, Y),

and the likelihood-ratio statistic is

    LR_j = P_{M̂1}(X_j | X_A, Y) / P_{M̂0}(X_j | X_A, Y).

SLIDE 45–46

LR Test vs. COP

Given the current A, the likelihood-ratio (LR) test statistic of H0: X_j is irrelevant vs. H1: X_j is relevant satisfies

    2 LR_j = −n [ Σ_{k=1}^{K} log(1 − λ̂_k^{A+j}) − Σ_{k=1}^{K} log(1 − λ̂_k^A) ]
           = n Σ_{k=1}^{K} log( 1 + (λ̂_k^{A+j} − λ̂_k^A) / (1 − λ̂_k^{A+j}) ).

Under H0 (X_j is irrelevant),

    COP_k^{A+j} = n (λ̂_k^{A+j} − λ̂_k^A) / (1 − λ̂_k^{A+j}) → χ²(1) in distribution,  and  (λ̂_k^{A+j} − λ̂_k^A) / (1 − λ̂_k^{A+j}) →_p 0,

so

    2 LR_j →_p COP_{1:K}^{A+j} = Σ_{k=1}^{K} COP_k^{A+j} → χ²(K) in distribution.
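The last line follows from a first-order expansion of the logarithm; a short worked step (my elaboration of the slide's algebra):

```latex
% Let x_k = (\hat\lambda_k^{A+j} - \hat\lambda_k^A)/(1 - \hat\lambda_k^{A+j}),
% so n x_k = \mathrm{COP}_k^{A+j}. Under H_0, x_k \to_p 0, and
% n\,O_p(x_k^2) = \mathrm{COP}_k^{A+j}\cdot O_p(x_k) = o_p(1), hence
\begin{aligned}
2\,\mathrm{LR}_j
  &= n \sum_{k=1}^{K} \log(1 + x_k)
   = n \sum_{k=1}^{K} \bigl( x_k + O_p(x_k^2) \bigr) \\
  &= \sum_{k=1}^{K} \mathrm{COP}_k^{A+j} + o_p(1)
   \;\xrightarrow{\;d\;}\; \chi^2(K).
\end{aligned}
```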

SLIDE 47

Beyond the First-order

  • E(X1 | Y) = 0

[Scatter plot of y versus x1]
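A one-line reason why the first moment is silent here (my elaboration, assuming the running model Y = X1 X2 + ε with independent, symmetric components):

```latex
% Flipping signs, (X_1, X_2) \mapsto (-X_1, -X_2), leaves Y = X_1 X_2 + \epsilon
% unchanged while negating X_1, so (Y, X_1) and (Y, -X_1) share the same joint
% law. Hence the first inverse moment vanishes identically,
E(X_1 \mid Y) = 0,
% while the second inverse moment still carries signal,
\operatorname{Var}(X_1 \mid Y) = E(X_1^2 \mid Y) \neq \text{const},
% which is the kind of signal the augmented model on the next slide targets.
```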
SLIDE 48

An Augmented Model

SLIDE 49

Likelihood Ratio Test

SLIDE 51

Example revisited
SLIDES 53–59

[A sequence of scatter plots of x1 versus x2 illustrating the example]

SLIDE 60

Sure independence screening (SIS) when p ≫ n

SLIDE 61

SIRI: an interweaving strategy

SLIDE 62

Theoretical Properties

SLIDE 64

Consistency of the Stepwise Procedure

SLIDE 65

Implementation Issues

SLIDE 66

Simulation I (linear)

    Y = Xβ + ε,  ε ~ N(0, 1),  Cov(X_i, X_j) = 0.5^{|i−j|}

    n = 200, p = 1000, β = (3, 1.5, 1, 1, 2, 1, 0.9, 1, 1, 1, 0, …, 0)ᵀ

Method                                         FP (of 990)    FN (of 10)
SIRI-C [CV minimizing classification error]    1.86 (0.222)   1.66 (0.117)
SIRI-M [CV minimizing mean squared error]      0.76 (0.120)   1.75 (0.114)
COP                                            1.62 (0.165)   1.67 (0.118)
SIS-SCAD                                       0.10 (0.030)   0.64 (0.069)
LASSO                                          5.40 (0.188)   0.00 (0.000)
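For reference, the design is easy to reproduce (a sketch, assuming the AR-correlation reading Cov(X_i, X_j) = 0.5^{|i−j|} of the slide; the seed is arbitrary):

```python
# Simulation I design: n = 200, p = 1000, AR(0.5)-correlated predictors.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 1000
beta = np.zeros(p)
beta[:10] = [3, 1.5, 1, 1, 2, 1, 0.9, 1, 1, 1]

idx = np.arange(p)
cov = 0.5 ** np.abs(np.subtract.outer(idx, idx))  # Cov(X_i, X_j) = 0.5^{|i-j|}
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(cov).T
Y = X @ beta + rng.standard_normal(n)             # eps ~ N(0, 1)
```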

SLIDE 67

Simulation II: hierarchical interactions

SLIDE 68

Simulation III: non-hierarchical interactions

SLIDE 69

Simulation IV: non-multiplicative interactions

SLIDE 70

Simulation V (heteroscedastic, single index)

    Y = 0.2 ε (1.5 + Σ_{j=1}^{8} X_j),  X_j ~ independent normal

    n = 1000, p = 1000

Method     FP (of 992)    FN (of 8)
SIRI-C     2.00 (0.163)   0.42 (0.138)
SIRI-M     0.43 (0.079)   4.60 (0.274)
COP        1.26 (0.128)   3.32 (0.192)
SIS-SCAD   3.23 (0.356)   8.00 (0.000)
LASSO      0.64 (0.255)   8.00 (0.000)

SLIDE 71

Simulation VI (hub with linear effect)

    Y = X1 + X1 × (X2 + X3) + 0.2ε,  X_j ~ independent normal

    n = 200, p = 1000

Method       FP (of 997)    FN (of 3)
SIRI-C       0.39 (0.115)   0.12 (0.046)
SIRI-M       0.03 (0.017)   0.04 (0.020)
SIS-SCAD-2   0.00 (0.000)   0.45 (0.068)

SLIDE 72

Simulation VII (three-way interaction)

    Y = X1 × X2 × X3 + 0.2ε,  X_j ~ independent normal

    n = 500, p = 1000

SLIDE 73

Bayesian Networks

SLIDE 75

Learning BN structures

  • Global approach (score/likelihood based):
  • Posterior inference:

        P(G | Data) ∝ p(G) ∫ P(Data | θ_G, G) p(θ_G | G) dθ_G

  • Or a score-based criterion:
  • AIC = −2 log P(Data | θ̂_G, G) + 2 p_G
  • BIC = −2 log P(Data | θ̂_G, G) + log(n) p_G
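As a concrete instance of the BIC line, here is a hedged sketch that scores a candidate DAG node by node under a linear-Gaussian model (the function, its decomposition over nodes, and the `parents` encoding are my assumptions, not the talk's code):

```python
# Node-by-node BIC for a candidate DAG G under a linear-Gaussian model;
# `parents` maps each node index to a list of its parent indices.
import numpy as np

def bic_score(data, parents):
    n = data.shape[0]
    score = 0.0
    for i, pa in parents.items():
        y = data[:, i]
        Z = np.column_stack([np.ones(n)] + [data[:, j] for j in pa])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        sigma2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # Gaussian MLE
        k = Z.shape[1] + 1                      # regression coefs + variance
        score += -2.0 * loglik + np.log(n) * k  # -2 log P(Data|theta,G) + log(n) p_G
    return score
```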

SLIDE 76

Learning structures

  • Local approaches: using conditional independence statements as constraints.
  • Represented by the "Inductive Causation" (IC) algorithm due to Pearl (2000):
  • 1. First, the skeleton of the network (the undirected graph underlying the network structure) is learned by recursively testing conditional independence between nodes.
  • 2. Set the directions of all arcs that are part of a v-structure, i.e., a triplet of nodes incident on a converging connection Xj → Xi ← Xk (a sketch of this step follows below).
  • 3. Set the directions of the other arcs as needed to satisfy the acyclicity constraint.
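Step 2 is mechanical once the skeleton and separating sets are in hand; a minimal sketch (hypothetical data structures: `skeleton` as a set of frozenset edges, `sepset[(j, k)]` keyed with j < k):

```python
# IC step 2: orient unshielded triples Xj - Xi - Xk as Xj -> Xi <- Xk whenever
# Xi is absent from the set that separated Xj and Xk in step 1.
def orient_v_structures(nodes, skeleton, sepset):
    directed = set()
    for j in nodes:
        for k in nodes:
            if j >= k or frozenset((j, k)) in skeleton:
                continue  # need a non-adjacent pair (j, k)
            for i in nodes:
                if (frozenset((j, i)) in skeleton
                        and frozenset((i, k)) in skeleton
                        and i not in sepset.get((j, k), ())):
                    directed.add((j, i))  # Xj -> Xi
                    directed.add((k, i))  # Xk -> Xi
    return directed
```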

SLIDE 77

Finding the Markov blanket of each node using the Grow-Shrink (GS) algorithm

  • It is like a stepwise regression. For each node Xi, we treat it as the response variable and (a) gradually add variables that are predictive of Xi; (b) backward-remove those "redundant" Xj's obtained from the growth phase. (A sketch follows below.)
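A minimal sketch of GS for one node (an assumed illustration; `ci_test` is a hypothetical conditional-independence oracle, e.g. a partial-correlation test):

```python
# Grow-Shrink for the Markov blanket of node i; ci_test(i, j, S, data) is a
# hypothetical callback returning True iff X_i is independent of X_j given X_S.
def grow_shrink(i, variables, data, ci_test):
    mb = []
    # Grow: add any variable still dependent on X_i given the current blanket
    changed = True
    while changed:
        changed = False
        for j in variables:
            if j != i and j not in mb and not ci_test(i, j, mb, data):
                mb.append(j)
                changed = True
    # Shrink: drop variables made redundant by the rest of the blanket
    for j in list(mb):
        rest = [k for k in mb if k != j]
        if ci_test(i, j, rest, data):
            mb.remove(j)
    return mb
```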
SLIDE 78

Discussion

  • Cross-validation to select the dimension and thresholds
  • Back to a full Bayesian model with dynamic slicing
  • We want flexibility in choosing the slicing boundaries
  • Connection with the Mutual-Information Criterion (MIC)
  • Many interesting possibilities
  • Robustness to the distribution of predictors

SLIDE 79

Acknowledgments

Bo Jiang – who did all the work

Dr. Tingting Zhang, Dr. Wenxuan Zhong

Joseph K. Blitzstein