Generalized sample selection model
Małgorzata Wojtyś (1), Giampiero Marra (2)
(1) Plymouth University, (2) University College London
XLII Konferencja "Statystyka Matematyczna" (42nd Conference "Mathematical Statistics"), Będlewo, November 29, 2016

Plan
- Sample selection problem: the classical Heckman model
- Generalized model using GAM and copulae
- Estimation approach
- Real-life application example

Motivating example
Example: HIV prevalence, modelled as P(HIV positive) ∼ socio-economic and health characteristics.
Some individuals in the sample refused to say whether they are HIV positive, and they may differ in important characteristics from individuals who did answer the question. If there is a link between the decision to provide an answer and being HIV positive, and this link does not operate only through observables, then sample selection bias arises and a univariate equation model is not appropriate.

Sample selection
Regression of primary interest: $Y_i^* \sim \mathbf{x}_i^{(1)}$, $i = 1, \dots, n$, where $\mathbf{x}_i^{(1)}$ is a row vector of predictors.
But: observations on some $Y_i^*$ are missing, based on a combination of observed and unobserved characteristics.
Observables: $Y_i = Y_i^* U_i$, where $U_i \in \{0, 1\}$ is a binary selection variable.
Selection mechanism: $P(U_i = 1) \sim \mathbf{x}_i^{(2)}$, where $\mathbf{x}_i^{(2)}$ is a vector of covariates.
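To see why missingness driven by unobservables matters, the following sketch simulates this observation scheme under stated assumptions: intercept-only equations (the hypothetical parameters beta1 and beta2 stand in for the linear predictors) and bivariate normal errors with correlation rho, as in the classical Heckman setup. Only $Y_i^*$ with $U_i = 1$ is kept, and the mean of the kept values is compared with the population mean. The function name simulate_selection is hypothetical.

```python
import random
import statistics


def simulate_selection(n, beta1, beta2, sigma, rho, seed=0):
    """Simulate Y* = beta1 + eps1, U* = beta2 + eps2 and the observed Y = Y* U.

    Hypothetical illustration: intercept-only equations, eps2 ~ N(0, 1),
    eps1 ~ N(0, sigma^2) and corr(eps1, eps2) = rho.
    Returns (mean of all Y*, mean of Y* among units with U = 1, share with U = 1).
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        e2 = rng.gauss(0.0, 1.0)
        # Build eps1 with variance sigma^2 and covariance rho * sigma with eps2.
        e1 = sigma * (rho * e2 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
        y_star = beta1 + e1
        all_y.append(y_star)
        if beta2 + e2 > 0:  # U = I(U* > 0): Y* is recorded only for selected units
            observed_y.append(y_star)
    return (statistics.fmean(all_y),
            statistics.fmean(observed_y),
            len(observed_y) / n)


full, selected, share = simulate_selection(20000, beta1=1.0, beta2=0.0, sigma=1.0, rho=0.8)
print(f"population mean {full:.3f}, observed mean {selected:.3f}, share observed {share:.3f}")
```

With rho = 0.8 the mean of the observed outcomes sits clearly above the population mean, while with rho = 0 the two agree up to Monte Carlo noise, even though the share of selected units is the same in both cases.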

Classical Heckman (1979) model
For $i = 1, \dots, n$:
$$Y_i^* = \mathbf{x}_i^{(1)} \boldsymbol{\beta}^{(1)} + \varepsilon_{1i}, \qquad U_i^* = \mathbf{x}_i^{(2)} \boldsymbol{\beta}^{(2)} + \varepsilon_{2i},$$
where
$$\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right).$$
Latent variables: $Y_i^*$, $U_i^*$.
Observables: $U_i = I(U_i^* > 0)$ (⇒ probit regression), $Y_i = Y_i^* U_i$.
Modifications: e.g. bivariate t-distribution (Marchenko & Genton, 2012), Archimedean copulas (Smith, 2003).
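A standard consequence of this model is that the observed outcomes are shifted by the inverse Mills ratio: $E(Y_i \mid U_i = 1) = \mathbf{x}_i^{(1)}\boldsymbol{\beta}^{(1)} + \rho\sigma\,\lambda(\mathbf{x}_i^{(2)}\boldsymbol{\beta}^{(2)})$, with $\lambda(t) = \varphi(t)/\Phi(t)$, where $\varphi$ and $\Phi$ are the standard normal density and cdf. The sketch below (hypothetical function names, Python standard library only) evaluates this closed form and cross-checks it by numerically integrating $E(\varepsilon_{2i} \mid \varepsilon_{2i} > -\mathbf{x}_i^{(2)}\boldsymbol{\beta}^{(2)})$, using $E(\varepsilon_{1i} \mid \varepsilon_{2i}) = \rho\sigma\,\varepsilon_{2i}$ for the bivariate normal above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def inverse_mills(t):
    """Inverse Mills ratio lambda(t) = phi(t) / Phi(t)."""
    return STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)


def selected_mean_closed_form(xb1, xb2, sigma, rho):
    """E(Y | U = 1) = x^(1) beta^(1) + rho * sigma * lambda(x^(2) beta^(2))."""
    return xb1 + rho * sigma * inverse_mills(xb2)


def selected_mean_numeric(xb1, xb2, sigma, rho, steps=20000, upper=12.0):
    """Same quantity via E(eps1 | eps2) = rho * sigma * eps2 and a midpoint-rule
    integral of e * phi(e) over the selected region eps2 > -xb2."""
    a = -xb2  # U* > 0  <=>  eps2 > -x^(2) beta^(2)
    h = (upper - a) / steps
    integral = h * sum(
        (a + (k + 0.5) * h) * STD_NORMAL.pdf(a + (k + 0.5) * h) for k in range(steps)
    )
    tail_mean = integral / (1.0 - STD_NORMAL.cdf(a))  # E(eps2 | eps2 > a)
    return xb1 + rho * sigma * tail_mean
```

When rho = 0 the correction term vanishes and the observed mean equals the population regression line, which is why the bias only appears when selection and outcome share unobservables.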

Generalized sample selection model
Random component: $Y^* \sim f_1$ belongs to an exponential family of distributions,
$$f_1(y \mid \eta_1, \phi) = \exp\left\{ \frac{y\,\eta_1 - b(\eta_1)}{\phi} + c(y, \phi) \right\}$$
for some functions $b(\cdot)$ and $c(\cdot)$. It holds that $E(Y^*) = b'(\eta_1)$ and $\mathrm{Var}(Y^*) = \phi\, b''(\eta_1)$.
Selection variable: $U = I(U^* > 0)$ with $U^* \sim f_2$, where
$$f_2(u \mid \eta_2) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(u - \eta_2)^2}{2} \right\},$$
implying the probit regression model for $U$, i.e. $P(U = 1) = \Phi(\eta_2)$.
$F(y, u)$: joint cdf of $(Y^*, U^*)$; $F_1(y)$, $F_2(u)$: marginal cdfs.
$C_\theta$: the copula such that
$$F(y, u) = C_\theta(F_1(y), F_2(u)),$$
where $\theta$ is the dependence parameter of the copula.
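A minimal sketch of these ingredients, assuming a normal margin for $Y^*$ written in exponential-family form ($\eta_1 = \mu_1$, $b(\eta) = \eta^2/2$, $\phi = \sigma^2$, $c(y, \phi) = -y^2/(2\phi) - \log(2\pi\phi)/2$) and, purely for illustration, a Clayton copula $C_\theta(u, v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with $\theta > 0$ (an Archimedean family that only captures positive dependence). All function names are hypothetical. The exponential-family density should agree with the normal pdf, $\phi\, b''(\eta)$ should equal the variance $\phi$, and the copula should return the margin when its other argument is 1.

```python
import math
from statistics import NormalDist


def expfam_pdf(y, eta, phi, b, c):
    """f_1(y | eta, phi) = exp{(y * eta - b(eta)) / phi + c(y, phi)}."""
    return math.exp((y * eta - b(eta)) / phi + c(y, phi))


# Normal margin in exponential-family form: eta = mean, phi = variance.
def b_normal(eta):
    return 0.5 * eta * eta


def c_normal(y, phi):
    return -y * y / (2.0 * phi) - 0.5 * math.log(2.0 * math.pi * phi)


def prob_selected(eta2):
    """P(U = 1) = P(U* > 0) = Phi(eta2) when U* ~ N(eta2, 1) (probit)."""
    return 1.0 - NormalDist(eta2, 1.0).cdf(0.0)


def clayton(u, v, theta):
    """Clayton copula (illustrative choice), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_cdf(y, u, eta1, phi, eta2, theta):
    """F(y, u) = C_theta(F_1(y), F_2(u)) with Y* ~ N(eta1, phi), U* ~ N(eta2, 1)."""
    f1 = NormalDist(eta1, math.sqrt(phi)).cdf(y)
    f2 = NormalDist(eta2, 1.0).cdf(u)
    return clayton(f1, f2, theta)
```

Swapping in another exponential-family margin only requires a different pair (b, c); swapping the copula only changes clayton.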

Likelihood
Likelihood of an observed outcome $(Y, U)$:
$$L = \begin{cases} P(U = 0) & \text{if } U = 0, \\ f_{Y \mid U}(Y \mid U = 1)\, P(U = 1) & \text{if } U = 1. \end{cases}$$
It holds that
$$f_{Y \mid U}(y \mid U = 1) = \frac{\partial}{\partial y} P(Y \le y \mid U = 1) = \frac{\partial}{\partial y} \frac{P(Y^* \le y, U^* > 0)}{P(U = 1)} = \frac{\partial}{\partial y} \frac{F_1(y) - F(y, 0)}{P(U = 1)} = \frac{1}{P(U = 1)} \left( f_1(y) - \frac{\partial}{\partial y} F(y, 0) \right).$$
So
$$L = \begin{cases} P(U = 0) = F_2(0) & \text{if } U = 0, \\ f_1(y) - \left. \frac{\partial}{\partial y} F(y, 0) \right|_{y = Y} & \text{if } U = 1. \end{cases}$$
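A numerical sanity check of the $U = 1$ branch: $f_1(y) - \partial F(y, 0)/\partial y$ is the joint density of $Y^*$ and the event $U = 1$, so integrating it over $y$ should give $P(U = 1) = 1 - F_2(0)$. The sketch assumes a normal margin $Y^* \sim N(\eta_1, \sigma^2)$, $U^* \sim N(\eta_2, 1)$ and an illustrative Clayton copula; the derivative is taken by central differences, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def clayton(u, v, theta):
    """Clayton copula C_theta(u, v); illustrative choice, theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def lik_selected(y, eta1, sigma, eta2, theta, h=1e-5):
    """L for U = 1: f_1(y) - dF(y, 0)/dy, derivative by central differences.

    Assumes Y* ~ N(eta1, sigma^2), U* ~ N(eta2, 1) and F(y, u) = C_theta(F_1(y), F_2(u)).
    """
    y_dist = NormalDist(eta1, sigma)
    f2_at_0 = NormalDist(eta2, 1.0).cdf(0.0)  # F_2(0) = P(U = 0)

    def joint_at_0(t):  # F(t, 0) = C_theta(F_1(t), F_2(0))
        return clayton(y_dist.cdf(t), f2_at_0, theta)

    return y_dist.pdf(y) - (joint_at_0(y + h) - joint_at_0(y - h)) / (2.0 * h)


def total_selected_mass(eta1, sigma, eta2, theta, steps=8000, width=8.0):
    """Integrate the U = 1 likelihood over y (midpoint rule on eta1 +/- width * sigma)."""
    lo = eta1 - width * sigma
    step = 2.0 * width * sigma / steps
    return step * sum(
        lik_selected(lo + (k + 0.5) * step, eta1, sigma, eta2, theta) for k in range(steps)
    )
```

The integrated mass should match 1 - F_2(0) to within the accuracy of the finite-difference and quadrature steps.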

Log-likelihood
So:
$$L(Y, U) = F_2(0)^{1 - U} \times \left[ f_1(y) - \left. \frac{\partial}{\partial y} F(y, 0) \right|_{y = Y} \right]^{U}.$$
Using the copula representation, $\frac{\partial}{\partial y} F(y, 0) = f_1(y)\, z(y, \eta_1, \eta_2)$, and we obtain the log-likelihood
$$\ell = (1 - U) \log F_2(0) + U \log\big( f_1(Y)\,(1 - z(Y, \eta_1, \eta_2)) \big),$$
where
$$z(y, \eta_1, \eta_2) = \left. \frac{\partial}{\partial v} C_\theta(v, F_2(0)) \right|_{v = F_1(y)}.$$
The function $z$ can also be expressed as
$$z(y, \eta_1, \eta_2) = P(U = 0)\, f_{Y^* \mid U}(y \mid U = 0)\, \big( f_1(y \mid \eta_1) \big)^{-1}.$$
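The general model nests Heckman's. Since $\partial C_\theta(v, w)/\partial v$ is the conditional cdf of the second copula argument given the first, $z(y, \eta_1, \eta_2) = P(U = 0 \mid Y^* = y)$; for a Gaussian copula with parameter $\rho$ it equals $\Phi\big((\Phi^{-1}(F_2(0)) - \rho\,\Phi^{-1}(F_1(y)))/\sqrt{1 - \rho^2}\big)$. The sketch below assumes a normal margin $Y^* \sim N(\mu, \sigma^2)$ and $U^* \sim N(\eta_2, 1)$, evaluates $\ell$ through $z$, and compares it with the classical Heckman full-information log-likelihood contribution. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def z_gauss(y, mu, sigma, eta2, rho):
    """z(y) = dC_rho(v, F_2(0))/dv at v = F_1(y) for a Gaussian copula:
    Phi((Phi^-1(w) - rho * Phi^-1(v)) / sqrt(1 - rho^2))."""
    q_v = STD.inv_cdf(NormalDist(mu, sigma).cdf(y))    # Phi^-1(F_1(y))
    q_w = STD.inv_cdf(NormalDist(eta2, 1.0).cdf(0.0))  # Phi^-1(F_2(0))
    return STD.cdf((q_w - rho * q_v) / math.sqrt(1.0 - rho * rho))


def loglik_copula(y, u, mu, sigma, eta2, rho):
    """ell = (1 - U) log F_2(0) + U log(f_1(Y) (1 - z(Y)))."""
    if u == 0:
        return math.log(NormalDist(eta2, 1.0).cdf(0.0))
    return math.log(NormalDist(mu, sigma).pdf(y) * (1.0 - z_gauss(y, mu, sigma, eta2, rho)))


def loglik_heckman(y, u, mu, sigma, eta2, rho):
    """Classical Heckman full-information log-likelihood contribution."""
    if u == 0:
        return math.log(STD.cdf(-eta2))
    r = (y - mu) / sigma
    return (math.log(STD.pdf(r) / sigma)
            + math.log(STD.cdf((eta2 + rho * r) / math.sqrt(1.0 - rho * rho))))
```

The two functions agree up to floating-point error; replacing the Gaussian copula or the normal margin then gives the non-Gaussian generalizations.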

Since $E(Y^*) = b'(\eta_1)$, the score with respect to $\eta_1$ is
$$\frac{\partial \ell}{\partial \eta_1} = \frac{U (Y - \mu_1)}{\phi} - U \, \frac{\partial z(Y, \eta_1, \eta_2) / \partial \eta_1}{1 - z(Y, \eta_1, \eta_2)},$$
where $\mu_1 = E(Y^*)$. As $E(\partial \ell / \partial \eta_1) = 0$ and $UY = UY^*$,
$$\mathrm{Cov}(U, Y^*) = \phi \, E\left[ U \, \frac{\partial z(Y, \eta_1, \eta_2) / \partial \eta_1}{1 - z(Y, \eta_1, \eta_2)} \right],$$
which provides another interpretation of the function $z(Y, \eta_1, \eta_2)$.
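Differentiating the $U = 1$ contribution $\log f_1(Y) + \log(1 - z)$ directly gives the score $U(Y - \mu_1)/\phi - U(\partial z/\partial\eta_1)/(1 - z)$. The sketch below checks this by finite differences and checks the covariance identity $\mathrm{Cov}(U, Y^*) = \phi\,E[U(\partial z/\partial\eta_1)/(1 - z)]$ in the Heckman special case. It assumes a normal margin with mean $\eta_1$ and dispersion $\phi = \sigma^2$, $U^* \sim N(\eta_2, 1)$ and a Gaussian copula with parameter $\rho$. In that case $\mathrm{Cov}(U, Y^*) = P(U = 1)\,E(\varepsilon_1 \mid U = 1) = \rho\sigma\,\varphi(\eta_2)$, where $\varphi$ is the standard normal density. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def z_and_dz(y, eta1, sigma, eta2, rho):
    """Gaussian-copula z(y, eta1, eta2) and its derivative in eta1 (normal margin, mean eta1)."""
    s = math.sqrt(1.0 - rho * rho)
    r = (y - eta1) / sigma            # Phi^-1(F_1(y)) for a normal margin
    arg = (-eta2 - rho * r) / s       # uses Phi^-1(F_2(0)) = -eta2
    return STD.cdf(arg), STD.pdf(arg) * rho / (sigma * s)


def loglik_selected(y, eta1, sigma, eta2, rho):
    """ell for U = 1: log f_1(y) + log(1 - z)."""
    z, _ = z_and_dz(y, eta1, sigma, eta2, rho)
    return math.log(NormalDist(eta1, sigma).pdf(y)) + math.log(1.0 - z)


def score_eta1(y, eta1, sigma, eta2, rho):
    """U = 1 score: (Y - mu_1) / phi - (dz/deta1) / (1 - z), with phi = sigma^2."""
    z, dz = z_and_dz(y, eta1, sigma, eta2, rho)
    return (y - eta1) / sigma ** 2 - dz / (1.0 - z)


def cov_u_ystar_via_z(eta1, sigma, eta2, rho, steps=8000, width=10.0):
    """phi * E[U dz/deta1 / (1 - z)] = phi * integral of f_1(y) dz/deta1 dy (midpoint rule),
    because (Y*, U = 1) has density f_1(y) (1 - z(y))."""
    y_dist = NormalDist(eta1, sigma)
    lo, h = eta1 - width * sigma, 2.0 * width * sigma / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += y_dist.pdf(y) * z_and_dz(y, eta1, sigma, eta2, rho)[1]
    return sigma ** 2 * total * h
```

With rho > 0 both sides of the identity are positive, matching the intuition that selected units have larger outcomes under positive dependence.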
