Public Policy and Deep Reinforcement Learning on AWS | Emily Webber

Aug 12, 2023

SLIDE 1

Public Policy and Deep Reinforcement Learning on AWS

Emily Webber | Machine Learning Specialist at Amazon Web Services | To-be-open-sourced research project

SLIDE 2

Public Policy Has Unique Challenges

Structural Inefficiency
Lack of single goal
Synthesize Information
Leadership Turnover

SLIDE 3

What if we used machine learning to optimize public policy?

Personalized
Collaborative
Transparent
Normalized Policy Data
Decades of Economic Data
Collaborative Reinforcement Learning

SLIDE 4

Data-Driven Public Policy Analysis Is Not New

Causal Inference
Counterfactual analysis
Intuitively, what would have happened if the policy (or treatment) had not been applied?
Can we convince ourselves that the two groups were nearly identical otherwise?

                      Before   After
Treatment: Illinois   100      250
Control: New York     100      150

$Z = \gamma_0 + \gamma_1 Y_1 + \gamma_2 Y_2 + \gamma_U Y_U + \dots + \vartheta$

Did the treatment cause this difference?
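One common way to read a before/after table like the one on this slide is a difference-in-differences comparison: subtract the control group's change from the treated group's change. The slide does not name this estimator, so the sketch below is an assumption about how the numbers might be used, and the function name `diff_in_diff` is hypothetical.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Treated group's change minus control group's change."""
    treated_change = treat_after - treat_before
    control_change = control_after - control_before
    return treated_change - control_change


# Numbers from the slide: Illinois (treatment) and New York (control).
effect = diff_in_diff(100, 250, 100, 150)
print(effect)  # 100
```

The estimate is only credible if the two groups would have moved in parallel without the treatment, which is the question the slide asks.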

SLIDE 5

Machine Learning: Use Case, Model, Data
Reinforcement Learning: Use Case, Actions, Rewards
Learning Theory Fundamentals

SLIDE 6

Mathematically speaking

Bellman Equation for Reinforcement Learning

Utility per state, or value
Recursive call on utility function
Reward per state, a real number
Discount factor
Transition value
For each possible adjacent state
Current state
Available action
Adjacent state, iterable
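The equation image itself did not survive in this transcript. The labels above are consistent with the standard textbook form U(s) = R(s) + γ · max over a of Σ over s' of T(s, a, s') · U(s'), so the sketch below assumes that form. It is a minimal value-iteration loop on a made-up two-state problem; the state names, rewards, and transition table are all hypothetical.

```python
def value_iteration(states, actions, R, T, gamma, tol=1e-9, max_iters=10_000):
    """Iterate U(s) = R(s) + gamma * max_a sum_s' T(s, a, s') * U(s') to a fixed point."""
    U = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_U = {}
        for s in states:
            # Expected utility of the adjacent states, for the best available action.
            best = max(
                sum(p * U[s2] for s2, p in T[(s, a)].items())
                for a in actions[s]
            )
            new_U[s] = R[s] + gamma * best
        delta = max(abs(new_U[s] - U[s]) for s in states)
        U = new_U
        if delta < tol:
            break
    return U


# Hypothetical toy problem: two states, deterministic "stay" and "move" actions.
states = ["a", "b"]
actions = {"a": ["stay", "move"], "b": ["stay", "move"]}
R = {"a": 0.0, "b": 1.0}
T = {
    ("a", "stay"): {"a": 1.0},
    ("a", "move"): {"b": 1.0},
    ("b", "stay"): {"b": 1.0},
    ("b", "move"): {"a": 1.0},
}
U = value_iteration(states, actions, R, T, gamma=0.5)
```

With a discount factor of 0.5, the utilities converge to roughly 2 for state "b" (stay and keep collecting reward 1) and roughly 1 for state "a" (move to "b").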

SLIDE 7

Our reward function

Ask, are they similar? T-test
Use logical reasoning
Eventually, scale with another ML model using data labelled by experts

A deep learning model maps the economic variables to a policy suggestion.
The simulator picks treatment and control states and runs a regression on historical data.
We use the estimated effect of the policy as our reward signal, scaled by the validity of the experiment.

Causal Inference
Reinforcement Learning
Policy Estimation

“Pareto”
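As a rough illustration of "estimated effect scaled by validity", the sketch below compares the treated and control groups before the policy with Welch's t statistic and shrinks the reward as the groups look less alike. The mapping `1 / (1 + |t|)` and the function names are assumptions made for this example only; the slides do not say how validity is actually computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se


def validity(pre_treated, pre_control):
    """Hypothetical score in [0, 1]: 1.0 when pre-policy means match exactly."""
    return 1.0 / (1.0 + abs(welch_t(pre_treated, pre_control)))


def reward(estimated_effect, pre_treated, pre_control):
    """Estimated policy effect, scaled down when the comparison looks less valid."""
    return estimated_effect * validity(pre_treated, pre_control)


similar = reward(100.0, [98, 100, 102], [98, 100, 102])
dissimilar = reward(100.0, [98, 100, 102], [148, 150, 152])
```

Here `similar` keeps the full effect of 100.0 because the pre-policy samples are identical, while `dissimilar` is much smaller because the two groups started far apart.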

SLIDE 8

But how do we pick the right way to optimize?

SLIDE 9

Philosophical Foundations

Utilitarianism: Personal Value
Egalitarianism: Equality
Libertarianism: Freedom
Pareto Improvements: Improve at least one person, without making anyone worse off
Kantian Rights: Universal Rights
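The Pareto improvement criterion is the easiest of these to state as code. The sketch below checks it for per-person scores before and after a policy; the function name and the dictionary layout are assumptions for illustration.

```python
def is_pareto_improvement(before, after):
    """True if nobody is worse off and at least one person is strictly better off."""
    if before.keys() != after.keys():
        raise ValueError("before and after must cover the same people")
    nobody_worse = all(after[p] >= before[p] for p in before)
    someone_better = any(after[p] > before[p] for p in before)
    return nobody_worse and someone_better


before = {"ana": 3, "ben": 5}
print(is_pareto_improvement(before, {"ana": 4, "ben": 5}))  # True
print(is_pareto_improvement(before, {"ana": 9, "ben": 4}))  # False
```

The second call returns False even though the total rises, which is where this criterion parts ways with a purely utilitarian sum.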

SLIDE 10

There is no single best optimization strategy.
What we can do is use data to automatically suggest policies based on user-defined preferences.

SLIDE 11

What do you want to see in public policy?

Personal Freedom
Equality of outcomes
Less crime
Access to education
Access to social services
Less waste
Equality of opportunity
Less traffic
Better health care

What do you think impacts crime the most?
In my neighborhood, people commit crimes because there are no jobs here.
Submit

Given your views, we recommend evaluating:
Outcomes: Crime
Indicators: Employment, Income, Savings

Confirm?

SLIDE 12

These policies are impacting you today. Here’s how to engage your elected officials.
Bill 789: Reducing income (13.45)
Bill 238: Creating jobs (42.66)
Bill 121: Increasing traffic (.05)

Please correct bill 789, it is lowering my income
Email

Your policy recommendations
Bill 789: Reduce taxation
Bill 238: Continue investment
Bill 121: Build more highways

SLIDE 13

Would you like to see another point of view?

What if we could step into someone else’s shoes?

SLIDE 14

Your policy recommendations
Bill 789: Reduce taxation
Bill 238: Continue investment
Bill 121: Build more highways

Another point of view
Bill 789: Increase taxation
Bill 238: Continue investment
Bill 121: More Public Transit

Increase Personal Freedom
Increase Equality Overall

SLIDE 15

Technically speaking:

for ism in philosophical_frameworks:
    utility = define_utility(ism)
    data = update_data(utility)
    model = get_pareto(data)
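To make the loop on this slide concrete, here is a runnable sketch that keeps the slide's function names but fills them with stand-in bodies. Everything inside the functions is an assumption: the weights, the candidate policies, their outcome scores, and the choice to have `get_pareto` simply return the top-scoring policy are all made up for illustration and are not the project's implementation.

```python
# Hypothetical per-framework weights over outcome dimensions (not from the slides).
FRAMEWORK_WEIGHTS = {
    "utilitarianism": {"welfare": 1.0, "equality": 0.0, "freedom": 0.0},
    "egalitarianism": {"welfare": 0.0, "equality": 1.0, "freedom": 0.0},
    "libertarianism": {"welfare": 0.0, "equality": 0.0, "freedom": 1.0},
}

# Made-up outcome scores for two candidate policies.
POLICIES = {
    "reduce_taxation": {"welfare": 0.5, "equality": 0.2, "freedom": 0.9},
    "increase_taxation": {"welfare": 0.6, "equality": 0.8, "freedom": 0.3},
}


def define_utility(ism):
    """Return a function that scores a policy's outcomes under one framework."""
    weights = FRAMEWORK_WEIGHTS[ism]
    return lambda outcomes: sum(weights[k] * outcomes[k] for k in weights)


def update_data(utility):
    """Score every candidate policy with the given utility function."""
    return {name: utility(outcomes) for name, outcomes in POLICIES.items()}


def get_pareto(data):
    """Stand-in for the slide's model step: pick the highest-scoring policy."""
    return max(data, key=data.get)


recommendations = {}
for ism in FRAMEWORK_WEIGHTS:
    utility = define_utility(ism)
    data = update_data(utility)
    recommendations[ism] = get_pareto(data)
```

With these made-up numbers, the libertarian weighting recommends `reduce_taxation` while the utilitarian and egalitarian weightings recommend `increase_taxation`, which mirrors the "another point of view" comparison earlier in the deck.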

SLIDE 16

How should we handle air traffic delays?

SLIDE 17

Utilitarianism: Do whatever increases overall utility
Different people value timeliness differently
Need multiple ways of defining utility for diverse stakeholders
Use testing and surveys to get a numerical estimate for how different people value certain outcomes

Egalitarianism: Do what increases overall equality
Don’t prioritize airline status travelers
Don’t let people pay more for perks
Don’t do special favors
Treat each traveler, airliner, and airport the same

Kantian Rights: Uphold human rights
Uphold the human sanctity of travelers
Provide food, lodging, respectful notice
Make reasonable attempts to avoid delays

Libertarianism: Preserve Freedom
Let people pick for themselves
Don’t automatically make decisions for travelers
Let travelers switch across airliners
Ensure freedom of airliners and airports

SLIDE 18

There is fundamental overlap between the philosophical frameworks. This overlap can be scaled by reward functions

SLIDE 19

There is no single right answer.
We need a computational system that can:

Synthesize different points of view
Weight these based on criteria, like population size
Be transparent, collaborative, timely
Change with the times

To efficiently support existing governing bodies
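The "weight these based on criteria, like population size" requirement can be sketched as a population-weighted average of how much each group values a policy. The group names, scores, and the function name below are hypothetical; the slides do not specify the aggregation rule.

```python
def population_weighted_score(scores, populations):
    """Average per-group scores, weighting each group by its population."""
    total = sum(populations[g] for g in scores)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(scores[g] * populations[g] for g in scores) / total


# Hypothetical survey results: how much each group values a proposed policy.
scores = {"commuters": 0.5, "residents": 0.25}
populations = {"commuters": 1000, "residents": 3000}
combined = population_weighted_score(scores, populations)
print(combined)  # 0.3125
```

Swapping population for another criterion only changes the weights passed in, which keeps the weighting choice visible rather than buried in the model.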

SLIDE 20

Thank you!

Emily Webber | Amazon Web Services | LinkedIn effective-policies@amazon.com ← email me to collaborate!