

SLIDE 1

Deep Generalized Method of Moments for Instrumental Variable Analysis

Andrew Bennett, Nathan Kallus, Tobias Schnabel

SLIDE 2

Endogeneity

◮ g0(x) = max(x, x/5)
◮ Y = g0(X) − 2ε + η
◮ X = Z + 2ε, with Z, ε, η ∼ N(0, 1)

[Figure: observed data, the true g0, and g0 as estimated by a neural net]
SLIDE 3

IV Model

◮ Y = g0(X) + ε
◮ Eε = 0, Eε² < ∞
◮ E[ε | X] ≠ 0
◮ Hence, g0(X) ≠ E[Y | X]
◮ Instrument Z has
  ◮ E[ε | Z] = 0
  ◮ P(X | Z) ≠ P(X)
◮ If we have additional exogenous context L, include it in both X and Z
◮ g0 ∈ G = {g(·; θ) : θ ∈ Θ}
◮ θ0 ∈ Θ is such that g0(x) = g(x; θ0)

SLIDE 4

IV is the Workhorse of Empirical Research

Outcome Variable | Endogenous Variable | Source of Instrumental Variable(s) | Reference

1. Natural Experiments

Labor supply | Disability insurance replacement rates | Region and time variation in benefit rules | Gruber (2000)
Labor supply | Fertility | Sibling-sex composition | Angrist and Evans (1998)
Education, labor supply | Out-of-wedlock fertility | Occurrence of twin births | Bronars and Grogger (1994)
Wages | Unemployment insurance tax rate | State laws | Anderson and Meyer (2000)
Earnings | Years of schooling | Region and time variation in school construction | Duflo (2001)
Earnings | Years of schooling | Proximity to college | Card (1995)
Earnings | Years of schooling | Quarter of birth | Angrist and Krueger (1991)
Earnings | Veteran status | Cohort dummies | Imbens and van der Klaauw (1995)
Earnings | Veteran status | Draft lottery number | Angrist (1990)
Achievement test scores | Class size | Discontinuities in class size due to maximum class-size rule | Angrist and Lavy (1999)
College enrollment | Financial aid | Discontinuities in financial aid formula | van der Klaauw (1996)
Health | Heart attack surgery | Proximity to cardiac care centers | McClellan, McNeil and Newhouse (1994)
Crime | Police | Electoral cycles | Levitt (1997)
Employment and earnings | Length of prison sentence | Randomly assigned federal judges | Kling (1999)
Birth weight | Maternal smoking | State cigarette taxes | Evans and Ringel (1999)

From Angrist & Krueger 2001

SLIDE 5

Going further

◮ Standard methods like 2SLS and GMM, and more recent variants, are significantly impeded when:
  ◮ X is structured and high-dimensional (e.g., an image)?
  ◮ and/or Z is structured and high-dimensional (e.g., an image)?
  ◮ and/or g0 is complex (e.g., a neural network)?
◮ (As we'll discuss)

SLIDE 6

DeepGMM

◮ We develop a method termed DeepGMM
◮ Aims to address IV with such high-dimensional variables / complex relationships
◮ Based on a new variational interpretation of optimally-weighted GMM (inverse-covariance weighting), which we use to efficiently control very many moment conditions
◮ DeepGMM is given by the solution to a smooth zero-sum game, which we solve with iterative smooth-game-playing algorithms (à la GANs)
◮ Numerical results will show that DeepGMM matches the performance of best-tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break

SLIDE 7

This talk

1. Introduction
2. Background
3. Methodology
4. Experiments

SLIDE 8

Two-stage methods

◮ E[ε | Z] = 0 implies E[Y | Z] = E[g0(X) | Z] = ∫ g0(x) dP(X = x | Z)
◮ If g(x; θ) = θᵀφ(x), this becomes E[Y | Z] = θᵀE[φ(X) | Z]
◮ Leads to 2SLS: regress φ(X) on Z (possibly transformed) by least squares, then regress Y on Ê[φ(X) | Z]
◮ Various methods find basis expansions non-parametrically (e.g., Newey and Powell)
◮ In lieu of a basis, DeepIV instead suggests learning P(X = x | Z) as an NN-parameterized Gaussian mixture
  ◮ Doesn't work if X is rich
  ◮ Can suffer from the "forbidden regression"
    ◮ Unlike least squares, MLE doesn't guarantee orthogonality irrespective of specification
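To make the two-stage recipe concrete, here is a minimal 2SLS sketch in Python. This is not code from the talk; the inputs (feature matrix phi_X, instrument matrix Z, outcome Y) are assumed placeholders for illustration.

```python
import numpy as np

def two_stage_least_squares(phi_X, Z, Y):
    """Basic 2SLS: regress each feature phi_j(X) on Z by least squares,
    then regress Y on the fitted values E-hat[phi(X) | Z].

    phi_X: (n, p) array of basis features phi(X)
    Z:     (n, q) array of (possibly transformed) instruments
    Y:     (n,)   array of outcomes
    Returns theta-hat for the model Y ~ phi(X)' theta.
    """
    # Stage 1: project phi(X) onto the column space of Z.
    first_stage, *_ = np.linalg.lstsq(Z, phi_X, rcond=None)
    phi_hat = Z @ first_stage            # E-hat[phi(X) | Z]
    # Stage 2: regress Y on the projected features.
    theta, *_ = np.linalg.lstsq(phi_hat, Y, rcond=None)
    return theta
```

Because both stages are least squares, stage-2 residuals stay orthogonal to the projected features regardless of misspecification; this is exactly the property the MLE-based first stage of DeepIV does not guarantee.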

SLIDE 9

Moment methods

◮ E[ε | Z] = 0 implies E[f(Z)(Y − g0(X))] = 0
◮ For any f1, . . . , fm this implies the moment conditions ψ(fj; θ0) = 0, where ψ(f; θ) = E[f(Z)(Y − g(X; θ))]
◮ GMM takes ψn(f; θ) = Ên[f(Z)(Y − g(X; θ))] and sets

θ̂GMM ∈ argmin_{θ∈Θ} ‖(ψn(f1; θ), . . . , ψn(fm; θ))‖²

  ◮ Usually: ‖·‖ = ‖·‖₂. Recently, AGMM: ‖·‖ = ‖·‖∞
◮ Significant inefficiencies with many moments: wasting modeling power to make redundant moments small
◮ Hansen et al.: (with finitely many moments) the norm ‖v‖² = vᵀ Cθ̃⁻¹ v, where

[Cθ]jk = (1/n) Σi fj(Zi) fk(Zi) (Yi − g(Xi; θ))²,

gives the minimal asymptotic variance (efficiency) for any θ̃ →p θ0
◮ E.g., two-step/iterated/continuously-updated GMM. Generically: OWGMM

SLIDE 10

Failure with Many Moment Conditions

◮ When g(x; θ) is a flexible model, many – possibly infinitely many – moment conditions may be needed to identify θ0

◮ But both GMM and OWGMM will fail if we use too many moments

SLIDE 11

This talk

1. Introduction
2. Background
3. Methodology
4. Experiments

SLIDE 12

Variational Reformulation of OWGMM

◮ Let V be the vector space of real-valued functions of Z
◮ ψn(f; θ) is a linear operator on V
◮ Cθ(f, h) = (1/n) Σi f(Zi) h(Zi) (Yi − g(Xi; θ))² is a bilinear form on V
◮ Given any subset F ⊆ V, define

Ψn(θ; F, θ̃) = sup_{f∈F} [ψn(f; θ) − (1/4) Cθ̃(f, f)]

Theorem. Let F = span(f1, . . . , fm) be a subspace. For the OWGMM norm, ‖(ψn(f1; θ), . . . , ψn(fm; θ))‖² = Ψn(θ; F, θ̃). Hence θ̂OWGMM ∈ argmin_{θ∈Θ} Ψn(θ; F, θ̃).
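The theorem is easy to sanity-check numerically: for F = span(f1, . . . , fm), writing f = Σj βj fj turns the inner sup into a concave quadratic in β, maximized at β* = 2C⁻¹ψ with value ψᵀC⁻¹ψ, the squared OWGMM norm. A small numpy check, where the moment features and residuals are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 4

# Synthetic placeholders: instrument features f_j(Z_i) and residuals.
F = rng.normal(size=(n, m))                      # F[i, j] = f_j(Z_i)
resid = rng.normal(size=n)                       # Y_i - g(X_i; theta)
resid_tilde = rng.normal(size=n)                 # Y_i - g(X_i; theta-tilde)

psi = F.T @ resid / n                            # psi_n(f_j; theta)
C = (F * (resid_tilde**2)[:, None]).T @ F / n    # [C_theta-tilde]_jk

# Squared OWGMM norm: psi' C^{-1} psi.
owgmm = psi @ np.linalg.solve(C, psi)

# Variational value: closed-form maximum of psi'b - 0.25 * b'Cb over b.
beta_star = 2 * np.linalg.solve(C, psi)
variational = psi @ beta_star - 0.25 * beta_star @ C @ beta_star

assert np.isclose(owgmm, variational)            # the two sides agree
```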

SLIDE 13

DeepGMM

◮ Idea: use this reformulation and replace F with a rich set
  ◮ But not with a high-dimensional subspace (that would just be GMM)
◮ Let F = {f(z; τ) : τ ∈ T}, G = {g(x; θ) : θ ∈ Θ} be all networks of a given architecture with varying weights τ, θ
  ◮ (Think of it as the union of the spans of the penultimate-layer functions)
◮ DeepGMM is then given by the solution to the smooth zero-sum game (for any data-driven θ̃)

θ̂DeepGMM ∈ argmin_{θ∈Θ} sup_{τ∈T} Uθ̃(θ, τ),

where Uθ̃(θ, τ) = (1/n) Σi f(Zi; τ)(Yi − g(Xi; θ)) − (1/(4n)) Σi f²(Zi; τ)(Yi − g(Xi; θ̃))².
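In code, the payoff Uθ̃ is a one-liner over a batch. A minimal PyTorch sketch, not the authors' released implementation: the networks g and f, the frozen copy g_tilde, and the tensor shapes are illustrative assumptions. θ descends on this quantity while τ ascends.

```python
import torch

def game_payoff(g, f, g_tilde, X, Z, Y):
    """U_theta-tilde(theta, tau) from the slide:
    (1/n) sum_i f(Z_i; tau)(Y_i - g(X_i; theta))
      - (1/4n) sum_i f(Z_i; tau)^2 (Y_i - g(X_i; theta-tilde))^2

    g, f: torch.nn.Modules mapping (n, d) inputs to (n, 1) outputs.
    g_tilde: frozen copy of g evaluated at theta-tilde.
    """
    f_z = f(Z).squeeze(-1)
    moment = f_z * (Y - g(X).squeeze(-1))
    with torch.no_grad():                        # theta-tilde held fixed
        r_tilde = (Y - g_tilde(X).squeeze(-1)) ** 2
    return (moment - 0.25 * f_z**2 * r_tilde).mean()
```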

SLIDE 14

Consistency of DeepGMM

◮ Assumptions:
  ◮ Identification: θ0 uniquely solves ψ(f; θ) = 0 ∀f ∈ F
  ◮ Complexity: F, G have vanishing Rademacher complexities (alternatively, can use a combinatorial measure like VC)
  ◮ Absolutely star-shaped: f ∈ F, |λ| ≤ 1 ⟹ λf ∈ F
  ◮ Continuity: g(x; θ), f(z; τ) are continuous in θ, τ for all x, z
  ◮ Boundedness: Y, supθ∈Θ |g(X; θ)|, supτ∈T |f(Z; τ)| are bounded

Theorem. Let θ̃n be any data-dependent sequence with a limit in probability. Let (θ̂n, τ̂n) be any approximate equilibrium of our game, i.e.,

sup_{τ∈T} Uθ̃n(θ̂n, τ) − op(1) ≤ Uθ̃n(θ̂n, τ̂n) ≤ inf_{θ∈Θ} Uθ̃n(θ, τ̂n) + op(1).

Then θ̂n →p θ0.

SLIDE 15

Consistency of DeepGMM

◮ Specification is much more defensible when we use such a rich F
◮ Nonetheless, if we drop specification we instead get

inf_{θ: ψ(f;θ)=0 ∀f∈F} ‖θ − θ̂n‖ →p 0

SLIDE 16

Optimization

◮ Thanks to the surge of interest in GANs, there are many good algorithms for playing smooth games
◮ We use OAdam by Daskalakis et al.
  ◮ Main idea: use updates with negative momentum
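OAdam is not in core PyTorch, but the optimistic update itself is simple to sketch. Below is plain optimistic gradient descent (the Adam-style step rescaling that OAdam adds is omitted; the learning rate and the zero initialization of prev_grads are assumptions of this sketch):

```python
import torch

def optimistic_step(params, prev_grads, lr=1e-3):
    """One optimistic gradient step per parameter:
    w <- w - lr * (2 * grad_t - grad_{t-1}).
    The -grad_{t-1} term acts as negative momentum and damps the
    cycling that plain simultaneous descent/ascent exhibits on games.
    prev_grads should start as zero tensors shaped like the params.
    """
    new_prev = []
    with torch.no_grad():
        for w, g_prev in zip(params, prev_grads):
            g_t = w.grad
            w -= lr * (2.0 * g_t - g_prev)
            new_prev.append(g_t.detach().clone())
    return new_prev
```

The f-player maximizes, e.g. by backpropagating the negated payoff before calling the same step on its parameters.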

SLIDE 17

Choosing θ̃

◮ Ideally θ̃ ≈ θ0
◮ Can let it be θ̂DeepGMM computed using another θ̃
  ◮ Can repeat this
◮ To simulate this, at every step of the learning algorithm, we update θ̃ to be the last θ iterate
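In a PyTorch-style loop this amounts to refreshing the frozen copy each iteration (a sketch, reusing the hypothetical g and g_tilde modules from the payoff sketch above):

```python
# Each iteration: set theta-tilde to the current theta iterate, then play
# one descent step in theta and one ascent step in tau on game_payoff.
g_tilde.load_state_dict(g.state_dict())   # theta-tilde <- last theta
for p in g_tilde.parameters():
    p.requires_grad_(False)               # keep the copy frozen
```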

SLIDE 18

This talk

1. Introduction
2. Background
3. Methodology
4. Experiments

SLIDE 19

Overview

◮ Low-dimensional scenarios: 2-dim Z, 1-dim X
◮ High-dimensional scenarios: Z, X, or both are images
◮ Benchmarks:
  ◮ DirectNN: regress Y on X with a NN
  ◮ Vanilla2SLS: all linear
  ◮ Poly2SLS: select degree and ridge penalty by CV
  ◮ GMM+NN*: OWGMM with NN g(x; θ); solve using Adam
    * When Z is low-dim, expand with 10 RBFs around EM clustering centroids; when Z is high-dim, use the raw instrument
  ◮ AGMM: github.com/vsyrgkanis/adversarial_gmm
    ◮ One-step GMM with ‖·‖∞ + jitter update to moments
    ◮ Same moment conditions as above
  ◮ DeepIV: github.com/microsoft/EconML

SLIDE 20

Low-dimensional scenarios

Y = g0(X) + e + δ
X = 0.5 Z1 + 0.5 e + γ
Z ∼ Uniform([−3, 3]²), e ∼ N(0, 1), γ, δ ∼ N(0, 0.1)

◮ abs: g0(x) = |x|
◮ linear: g0(x) = x
◮ sin: g0(x) = sin(x)
◮ step: g0(x) = 1{x ≥ 0}
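This design is easy to simulate; a quick numpy sketch (assuming N(0, 0.1) denotes variance 0.1, i.e., standard deviation sqrt(0.1)):

```python
import numpy as np

def simulate(n, g0, seed=0):
    """Low-dimensional IV design from the slide.
    e is the confounder: it drives both X and Y, so E[e | X] != 0,
    while Z is independent of (e, gamma, delta)."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-3, 3, size=(n, 2))
    e = rng.normal(size=n)
    gamma = rng.normal(0, np.sqrt(0.1), size=n)
    delta = rng.normal(0, np.sqrt(0.1), size=n)
    X = 0.5 * Z[:, 0] + 0.5 * e + gamma
    Y = g0(X) + e + delta
    return Z, X, Y

# e.g. the "abs" scenario:
Z, X, Y = simulate(10_000, g0=np.abs)
```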

SLIDE 21

[Figure: fitted curves for the sin, step, abs, and linear scenarios]

SLIDE 22

[Figure: fitted curves for the sin, step, abs, and linear scenarios]

SLIDE 23

SLIDE 24

Method | abs | linear | sin | step
DirectNN | .21 ± .00 | .09 ± .00 | .26 ± .00 | .21 ± .00
Vanilla2SLS | .23 ± .00 | .00 ± .00 | .09 ± .00 | .03 ± .00
Poly2SLS | .04 ± .00 | .00 ± .00 | .04 ± .00 | .03 ± .00
GMM+NN | .14 ± .02 | .06 ± .01 | .08 ± .00 | .06 ± .00
AGMM | .17 ± .03 | .03 ± .00 | .11 ± .01 | .06 ± .01
DeepIV | .10 ± .00 | .04 ± .00 | .06 ± .00 | .03 ± .00
Our Method | .03 ± .01 | .01 ± .00 | .02 ± .00 | .01 ± .00

SLIDE 25

High-dimensional scenarios

◮ Use MNIST images: 28 × 28 = 784
◮ Let RandImg(d) return a random image of digit d
◮ Let π(x) = round(min(max(1.5x + 5, 0), 9))
◮ Scenarios:
  ◮ MNISTZ: X as before, Z ← RandImg(π(Z1))
  ◮ MNISTX: X ← RandImg(π(X)), Z as before
  ◮ MNISTX,Z: X ← RandImg(π(X)), Z ← RandImg(π(Z1))
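The digit map π is directly implementable; rand_img below is a hypothetical helper standing in for RandImg (it would sample an MNIST image with the given label):

```python
def pi(x):
    """Map a real value to a digit class in {0, ..., 9} via
    round(min(max(1.5x + 5, 0), 9))."""
    return int(round(min(max(1.5 * x + 5.0, 0.0), 9.0)))

# MNIST_Z scenario (sketch): keep X as before, replace the scalar
# instrument Z_1 with a random image of digit pi(Z_1).
# Z_img = rand_img(pi(Z[0]))   # rand_img: hypothetical MNIST sampler
```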

SLIDE 26

Method | MNISTz | MNISTx | MNISTx,z
DirectNN | .25 ± .02 | .28 ± .03 | .24 ± .01
Vanilla2SLS | .23 ± .00 | > 1000 | > 1000
Ridge2SLS | .23 ± .00 | .19 ± .00 | .39 ± .00
GMM+NN | .27 ± .01 | .19 ± .00 | .25 ± .01
AGMM | – | – | –
DeepIV | .11 ± .00 | – | –
Our Method | .07 ± .02 | .15 ± .02 | .14 ± .02

SLIDE 27

DeepGMM

◮ We develop a method termed DeepGMM
◮ Aims to address IV with such high-dimensional variables / complex relationships
◮ Based on a new variational interpretation of optimally-weighted GMM (inverse-covariance weighting), which we use to efficiently control very many moment conditions
◮ DeepGMM is given by the solution to a smooth zero-sum game, which we solve with iterative smooth-game-playing algorithms (à la GANs)
◮ Numerical results showed that DeepGMM matches the performance of best-tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break