Efficient Policy Learning from Surrogate-Loss Classifications Andrew Bennett (Cornell Tech) Joint work with Nathan Kallus (Cornell Tech) Andrew Bennett Efficient Policy Learning 1 / 25

This Talk
1. Introduction
2. Surrogate-Loss Reduction
3. Efficient Policy Learning Theory
4. ESPRM Algorithm
5. Experiments

Offline Policy Learning Problem
This is an important problem, since experimenting with treatments may be unethical, costly, or impossible!

Offline Policy Learning Problem
The problem is non-trivial, since the data may be confounded!

Our Contribution
Although there is work on efficient policy evaluation, we show that performing policy optimization using an efficiently estimated objective does not lead to efficient learning of the optimal policy parameters.
We present a novel algorithm for efficiently learning the optimal policy parameters, and show that it leads to improved regret.

Policy Learning Formalization
Notation:
  $X$: individual covariates
  $T$: binary treatment decision ($-1$ or $1$)
  $Y$: observed outcome given actual treatment $T$
  $Y(t)$: (unobserved) potential outcome that would have occurred under treatment $t$, for $t \in \{-1, 1\}$
Policy: $\pi(x)$ denotes the treatment decision given $X = x$.
Assume iid data $((X_1, T_1, Y_1), \ldots, (X_n, T_n, Y_n))$.
Goal: find a policy $\pi$ that maximizes $E[Y(\pi(X))]$.
Assume that $Y = Y(T)$, and that the logging policy was a function of $X$ only (no hidden confounding).

Policy Learning Objective
Let $J(\pi) = E[Y(\pi(X))] - \tfrac{1}{2} E[Y(+1) + Y(-1)]$.
Since $Y(\pi(X)) = \tfrac{1}{2}(Y(+1) + Y(-1)) + \tfrac{1}{2}\pi(X)(Y(+1) - Y(-1))$, one can easily show that $J(\pi) = \tfrac{1}{2} E[\psi \pi(X)]$ for a range of $\psi$, such as
  $\psi_{IPS} = \frac{T Y}{P(T \mid X)}$
  $\psi_{DM} = E[Y \mid X, T = 1] - E[Y \mid X, T = -1]$
  $\psi_{DR} = \psi_{DM} + \psi_{IPS} - \frac{T \, E[Y \mid X, T]}{P(T \mid X)}$
Given estimates of the nuisance functions $P(T = t \mid X = x)$ and $E[Y \mid X = x, T = t]$, we can estimate $\psi$ from the observed data for any given triplet $(X, T, Y)$.
This suggests approximating the above objective, up to the constant factor $\tfrac{1}{2}$ (which does not change the maximizer), by $\hat{J}_n(\pi) = \frac{1}{n} \sum_{i=1}^{n} \hat{\psi}_i \pi(X_i)$, where the $\hat{\psi}_i$ are estimated as above.
Reduction to Empirical Risk Minimization!
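The three weighting functions can be illustrated with a small sketch. It assumes the nuisance estimates are already available as plain numbers for each unit; the function names and the arguments `propensity`, `mu_plus` and `mu_minus` are hypothetical and not taken from the talk.

```python
def psi_ips(t, y, propensity):
    """IPS weight T * Y / P(T | X); `propensity` is the estimated
    probability of the treatment t that was actually observed."""
    return t * y / propensity


def psi_dm(mu_plus, mu_minus):
    """Direct-method weight E[Y | X, T=1] - E[Y | X, T=-1]."""
    return mu_plus - mu_minus


def psi_dr(t, y, propensity, mu_plus, mu_minus):
    """Doubly robust weight: psi_DM + psi_IPS - T * E[Y | X, T] / P(T | X)."""
    mu_observed = mu_plus if t == 1 else mu_minus
    correction = t * mu_observed / propensity
    return psi_dm(mu_plus, mu_minus) + psi_ips(t, y, propensity) - correction


def j_hat(psis, decisions):
    """Empirical objective (1/n) * sum_i psi_i * pi(X_i), decisions in {-1, 1}."""
    return sum(p * d for p, d in zip(psis, decisions)) / len(psis)


# Two illustrative units (made-up numbers, not data from the talk).
psis = [
    psi_dr(t=1, y=2.0, propensity=0.5, mu_plus=1.5, mu_minus=1.0),
    psi_dr(t=-1, y=1.0, propensity=0.5, mu_plus=1.5, mu_minus=1.0),
]
value = j_hat(psis, decisions=[1, -1])
```

The doubly robust weight reuses the other two, so it reduces to the direct-method weight whenever the outcome model predicts the observed outcome exactly.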

Surrogate-Loss Reduction
$\hat{J}_n(\pi) = \frac{1}{n} \sum_{i=1}^{n} \hat{\psi}_i \pi(X_i)$ is a non-convex objective.
As in (weighted) binary classification, we can make the problem tractable by replacing this 0/1 loss with a convex surrogate.
Consider a parametric class of functions $g_\theta$ for $\theta \in \Theta$, and let $\pi_\theta(x) = \operatorname{sign}(g_\theta(x))$ denote the parametric policy class.
The surrogate-loss objective is
  $L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} |\hat{\psi}_i| \, l(g_\theta(X_i), \operatorname{sign}(\hat{\psi}_i))$,
for some convex loss function $l$.
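As a minimal sketch of the surrogate objective, the code below evaluates $L_n(\theta)$ for a linear $g_\theta(x) = \theta^T x$ with the logistic loss $l(g, s) = \log(1 + e^{-s g})$. The linear form, this particular form of the logistic loss, and the function names are assumptions made for illustration.

```python
import math


def logistic_loss(score, label):
    """log(1 + exp(-label * score)), computed in a numerically stable way."""
    z = -label * score
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def surrogate_loss(theta, xs, psis):
    """L_n(theta) = (1/n) * sum_i |psi_i| * l(g_theta(X_i), sign(psi_i))
    for the linear score g_theta(x) = theta . x."""
    total = 0.0
    for x, psi in zip(xs, psis):
        score = sum(th * xj for th, xj in zip(theta, x))
        label = 1.0 if psi >= 0 else -1.0  # class label is the sign of psi
        total += abs(psi) * logistic_loss(score, label)
    return total / len(xs)


# Illustrative data: the first unit favours treatment +1, the second -1.
xs = [[1.0, 0.0], [0.0, 1.0]]
psis = [2.0, -1.0]
loss_at_zero = surrogate_loss([0.0, 0.0], xs, psis)
loss_aligned = surrogate_loss([1.0, -1.0], xs, psis)
```

Each unit contributes a classification example with label $\operatorname{sign}(\hat{\psi}_i)$ and weight $|\hat{\psi}_i|$, so any weighted logistic regression routine could be used in place of this hand-written loss.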

Efficient Policy Learning
There is a rich past literature that considers efficiently estimating the cost function to be optimized, based on the above approaches.
However, none of this work addresses whether the optimal policy parameters themselves are efficiently estimated!

Assumptions for Surrogate-Loss Reduction
1. Valid Weighting Function: $E[\psi \mid X] = E[Y(+1) - Y(-1) \mid X]$
   This holds for the IPS, DM, and DR $\psi$ functions.
2. Regularity: $E[|\psi|] < \infty$
3. Correct Specification: $\{ g_\theta : \theta \in \Theta \} \cap \arg\min_{g \text{ unconstrained}} E[|\psi| \, l(g(X), \operatorname{sign}(\psi))] \neq \emptyset$
   This assumption means there is a policy in our parametric class that minimizes the population surrogate loss.

Semiparametric Model Implied by Assumptions
Lemma. Assuming correct specification, and using the logistic regression loss, the implied model is given by all distributions on $(X, T, Y)$ for which there exists $\theta^* \in \Theta$ satisfying
  $\frac{E[|\psi| \, 1\{\psi > 0\} \mid X]}{E[|\psi| \mid X]} = \sigma(g_{\theta^*}(X))$ almost surely,
where $\sigma$ denotes the logistic function.
This model is in general semiparametric, because the set of distributions on $(X, T, Y)$ that satisfy this constraint is infinite-dimensional.
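A sketch of where a condition of this kind comes from, assuming the logistic loss has the standard form $l(g, s) = \log(1 + e^{-s g})$ (the talk does not spell it out) and that differentiation and conditional expectation can be exchanged. Writing $\psi^+ = \max(\psi, 0)$ and $\psi^- = \max(-\psi, 0)$, the pointwise first-order condition in $g = g(X)$ is:

```latex
\begin{align*}
\frac{\partial}{\partial g}\,
  E\big[\, |\psi| \log\big(1 + e^{-\operatorname{sign}(\psi)\, g}\big) \,\big|\, X \big]
  &= -E[\psi^+ \mid X]\, \sigma(-g) + E[\psi^- \mid X]\, \sigma(g) = 0 \\
\Longrightarrow\quad
  E[\psi^+ \mid X]\, \big(1 - \sigma(g)\big) &= E[\psi^- \mid X]\, \sigma(g) \\
\Longrightarrow\quad
  \sigma(g) &= \frac{E[\psi^+ \mid X]}{E[\psi^+ \mid X] + E[\psi^- \mid X]}
             = \frac{E[\,|\psi|\, 1\{\psi > 0\} \mid X]}{E[\,|\psi| \mid X]}.
\end{align*}
```

Because the logistic loss is convex in $g$, this stationarity condition characterizes the unconstrained minimizer wherever $E[|\psi| \mid X] > 0$.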

Efficient Learning Implies Optimal Regret
Recall that $J$ is the true objective, and $L$ is the surrogate-loss objective. Define:
  $\mathrm{Regret}_J(\theta) = \sup_{\pi \text{ unconstrained}} J(\pi) - J(\pi_\theta)$
  $\mathrm{Regret}_L(\theta) = L(\theta) - \inf_{\theta' \in \Theta} L(\theta')$,
and let $AR_L(\hat{\theta}_n)$ be the distributional limit of $n \, \mathrm{Regret}_L(\hat{\theta}_n)$.
Theorem. Under our assumptions, we have $\mathrm{Regret}_J(\theta) \le \mathrm{Regret}_L(\theta)$. Furthermore, if $\hat{\theta}_n$ is semiparametrically efficient, then $E[\phi(AR_L(\hat{\theta}_n))]$ is minimized for any non-decreasing $\phi$.

Conditional Moment Formulation
Theorem. Define $m(X; \theta) = E[|\psi| \, l'(g_\theta(X), \operatorname{sign}(\psi)) \mid X]$. Then under our assumptions the policy given by $\theta^*$ is optimal if and only if $m(X; \theta^*) = 0$ almost surely.
This is equivalent to $E[f(X) \, |\psi| \, l'(g_{\theta^*}(X), \operatorname{sign}(\psi))] = 0$ for every square-integrable function $f$.
There exists an extensive literature on solving these kinds of problems efficiently!

ESPRM Algorithm
We extend an existing algorithm that was previously designed for efficiently solving instrumental variable regression. Define:
  $u(X, \psi; \theta, f) = f(X) \, |\psi| \, l'(g_\theta(X), \operatorname{sign}(\psi))$
  $U_n(\theta, f; \tilde{\theta}) = \frac{1}{n} \sum_{i=1}^{n} u(X_i, \hat{\psi}_i; \theta, f) - \frac{1}{4n} \sum_{i=1}^{n} u(X_i, \hat{\psi}_i; \tilde{\theta}, f)^2$
Given some flexible function class $\mathcal{F}$, and a prior policy estimate $\tilde{\theta}$, our efficient surrogate policy risk minimization (ESPRM) estimator is given by:
  $\hat{\theta}_{ESPRM} = \arg\min_{\theta \in \Theta} \sup_{f \in \mathcal{F}} U_n(\theta, f; \tilde{\theta})$
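The inner objective can be sketched as follows. This is only an illustration under several assumptions: $g_\theta$ is linear, $l$ is the logistic loss $\log(1 + e^{-s g})$ with $l'$ its derivative in the score, and the flexible class $\mathcal{F}$ is replaced by a small finite list of critic functions (`critics`, a hypothetical stand-in) so that the supremum becomes a plain maximum. The outer minimization over $\theta$ is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_grad(score, label):
    """Derivative of log(1 + exp(-label * score)) with respect to score."""
    return -label * sigmoid(-label * score)


def u(x, psi, theta, f):
    """u(X, psi; theta, f) = f(X) * |psi| * l'(g_theta(X), sign(psi))."""
    score = sum(th * xj for th, xj in zip(theta, x))
    label = 1.0 if psi >= 0 else -1.0
    return f(x) * abs(psi) * loss_grad(score, label)


def U_n(theta, f, theta_prior, xs, psis):
    """Mean of u at theta minus (1/4n) * sum of squared u at the prior estimate."""
    n = len(xs)
    first = sum(u(x, p, theta, f) for x, p in zip(xs, psis)) / n
    second = sum(u(x, p, theta_prior, f) ** 2 for x, p in zip(xs, psis)) / (4 * n)
    return first - second


def esprm_objective(theta, theta_prior, xs, psis, critics):
    """Worst case of U_n over a finite list of critics (stand-in for sup over F)."""
    return max(U_n(theta, f, theta_prior, xs, psis) for f in critics)


# Tiny illustrative problem with three constant critics.
xs, psis = [[1.0]], [2.0]
critics = [lambda x: 1.0, lambda x: -1.0, lambda x: 0.0]
objective = esprm_objective([0.0], [0.0], xs, psis, critics)
```

In practice the critic would be optimized over a rich class such as a neural network rather than enumerated, and the resulting objective would then be minimized over $\theta$.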

Experimental Setup
We compare our ESPRM method against empirical risk minimization (ERM) using the logistic regression surrogate loss.
We test the following policy classes:
  LinearPolicy: $g_\theta(x) = \theta^T x$
  FlexiblePolicy: $g_\theta$ parametrized by a neural network
We test on the following kinds of synthetic scenarios:
  Linear: the optimal policy is linear
  Quadratic: the optimal policy is quadratic
In all cases we compare algorithms over a large number of randomly generated synthetic scenarios of the given kind.

Experimental Results – Linear Policy Class
$\mathrm{RMRR}(\hat{\theta}_n) = \left( 1 - \frac{E[\mathrm{Regret}_J(\hat{\theta}_n)]}{E[\mathrm{Regret}_J(\hat{\theta}_n^{ERM})]} \right) \times 100\%$
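The RMRR metric is simple to compute once per-scenario regrets are available. The sketch below assumes the expectations are approximated by sample means over scenarios; the function and argument names are hypothetical.

```python
def rmrr(regrets, regrets_erm):
    """(1 - mean regret of the method / mean regret of ERM) * 100, in percent."""
    mean_regret = sum(regrets) / len(regrets)
    mean_regret_erm = sum(regrets_erm) / len(regrets_erm)
    return (1.0 - mean_regret / mean_regret_erm) * 100.0


# Made-up regrets over two scenarios: the method halves the mean regret of ERM.
reduction = rmrr([0.5, 1.5], [2.0, 2.0])
```

A positive value means the method attains lower mean regret than ERM, and 0% means no improvement.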

Experimental Results – Flexible Policy Class
$\mathrm{RMRR}(\hat{\theta}_n) = \left( 1 - \frac{E[\mathrm{Regret}_J(\hat{\theta}_n)]}{E[\mathrm{Regret}_J(\hat{\theta}_n^{ERM})]} \right) \times 100\%$

Jobs Case Study
We consider a case study based on different programs assigned to unemployed individuals in France.
Individuals were randomly assigned to either an intensive counseling program run by a private agency, or a similar program run by a public agency.
We also have access to individual covariates and outcomes (based on whether they re-entered the workforce within six months, minus treatment cost).

Jobs Case Study
We divide the experimental data into train and test splits, and introduce selection bias by randomly dropping training units based on covariates.
We learn treatment assignment policies using ESPRM and ERM on the artificially confounded training data.
Learnt policies are evaluated on the test data using a Horvitz-Thompson estimator.
Estimated policy values:

Policy Class   ERM             ESPRM          Difference
Linear         -0.96 ± 4.32    4.42 ± 3.78    5.38 ± 5.06
Flexible       -1.75 ± 4.64    7.68 ± 3.16    9.42 ± 5.17
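The talk does not give the exact form of the evaluation estimator, so the following is only a generic sketch of a Horvitz-Thompson style policy value estimate. It assumes known assignment probabilities on the test data (as in a randomized experiment); all names are hypothetical.

```python
def horvitz_thompson_value(decisions, treatments, outcomes, propensities):
    """Estimate E[Y(pi(X))] as (1/n) * sum_i 1{pi(X_i) = T_i} * Y_i / P(T_i | X_i).

    decisions    -- policy decisions pi(X_i) in {-1, 1}
    treatments   -- treatments T_i actually received
    outcomes     -- observed outcomes Y_i
    propensities -- probability of the received treatment, P(T_i | X_i)
    """
    total = 0.0
    for d, t, y, p in zip(decisions, treatments, outcomes, propensities):
        if d == t:  # only units whose treatment matches the policy contribute
            total += y / p
    return total / len(outcomes)


# Made-up example: only the first unit received the treatment the policy picks.
estimate = horvitz_thompson_value(
    decisions=[1, -1, 1],
    treatments=[1, 1, -1],
    outcomes=[2.0, 3.0, 4.0],
    propensities=[0.5, 0.5, 0.5],
)
```

Units whose received treatment disagrees with the policy are dropped, and the remaining outcomes are re-weighted by the inverse assignment probability to compensate.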

Summary
Although there is work on efficient policy evaluation, policy learning using an efficiently estimated objective does not lead to efficient learning of the optimal policy parameters.
We presented an efficient algorithm for policy learning, based on the theory of conditional moment problems.
We showed both theoretically and empirically that efficient optimal policy estimation implies improved regret.

Thank You
Thank you for listening, and please check out our full paper “Efficient Policy Learning from Surrogate-Loss Classification Reductions”!


