SLIDE 1 Data Biased Robust Counter Strategies
Michael Johanson, Michael Bowling
November 14, 2012
University of Alberta Computer Poker Research Group
SLIDE 2
Introduction
Computer Poker Research Group
Created Polaris, the world’s strongest program for playing Heads-Up Limit Texas Hold’em Poker
July 2008: Went to Las Vegas, played against six poker pros, and won the 2nd Man-Machine Poker Championship
Won several events in the 2008 AAAI Computer Poker Competition
Research goals:
Solve very large extensive form games
Learn to model and exploit an opponent’s strategy
SLIDE 3
Model Uncertainty and Risk
In this talk, we present a technique for dealing with three types of model uncertainty:
The opponent / environment changes after we model it
The model is more accurate in some areas than others
The model’s prior beliefs are very inaccurate
SLIDE 4
Texas Hold’em Poker
Our domain: 2-player Limit Texas Hold’em Poker
Zero-sum extensive form game
Repeated game (hundreds or thousands of short games)
Hidden information (can’t see the opponent’s cards)
Stochastic elements (cards are dealt randomly)
Goal: win as much money as possible
RL interpretation:
POMDP (when the opponent’s strategy is static)
Some properties of the world are known:
Probability distribution at chance nodes
Don’t know exactly what state you are in (because of the opponent’s cards)
Transition probabilities at opponent choice nodes are unknown
Payoffs at terminal nodes are unknown
SLIDE 5 Types of strategies
There are lots of ways to play games like poker. Two are well known:
Nash Equilibrium
Minimizes worst-case performance
Doesn’t try to exploit the opponent’s mistakes
Best Response
Maximizes performance against a specific static opponent
Doesn’t try to minimize worst-case performance
Problem: requires the opponent’s strategy
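To make the "requires the opponent’s strategy" point concrete, here is a minimal sketch (ours, not from the talk) of computing a best response in a small zero-sum matrix game; the rock-paper-scissors payoffs are standard, and the opponent strategy is a made-up example:

```python
# Best response to a known static opponent in a zero-sum matrix game.
A = [[0, -1, 1],    # rock
     [1, 0, -1],    # paper
     [-1, 1, 0]]    # scissors (row player's payoffs)

def best_response(A, opp):
    """Return (action index, expected value) maximizing payoff vs. opp."""
    values = [sum(A[i][j] * opp[j] for j in range(len(opp)))
              for i in range(len(A))]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values[best]

opp = [0.5, 0.25, 0.25]           # hypothetical model: over-plays rock
action, value = best_response(A, opp)
# action is 1 (paper), the pure action exploiting the rock bias
```

This is trivially easy once `opp` is known; the whole difficulty in poker is that the opponent’s strategy must be estimated from observations.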
Goals:
Observe the opponent, build a model, and use it in place of the opponent’s true strategy
Bound worst-case performance:
The model could be inaccurate
The opponent could change
SLIDE 6
Types of Strategies
Performance against a static opponent, in millibets per game
[Figure: Exploitation of Opponent (mb/g) vs. Worst Case Exploitability (mb/g)]
Game Theory: Nash equilibrium. Low exploitiveness, low exploitability
Decision Theory: Best response. High exploitiveness, high exploitability
SLIDE 7
Types of Strategies
Performance against a static opponent, in millibets per game
[Figure: Exploitation of Opponent (mb/g) vs. Worst Case Exploitability (mb/g), with the Mixture curve]
Mixture: Linear tradeoff of exploitiveness and exploitability
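The linear tradeoff can be seen in a toy example (rock-paper-scissors with an assumed rock-heavy opponent model; an illustration of ours, not the talk’s poker experiment): mixing the equilibrium and the best response with weight w scales both exploitation and exploitability linearly in w.

```python
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # RPS payoffs, row player

def payoff(x, y):
    """Expected payoff of strategy x against strategy y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

def exploitability(x):
    """RPS has game value 0; exploitability is how far below 0 an
    adversarial opponent can push strategy x."""
    worst = min(sum(x[i] * A[i][j] for i in range(3)) for j in range(3))
    return -worst

eq = [1/3, 1/3, 1/3]       # Nash equilibrium
model = [0.5, 0.25, 0.25]  # assumed opponent model (over-plays rock)
br = [0.0, 1.0, 0.0]       # best response to the model: always paper

for w in (0.0, 0.5, 1.0):
    mix = [(1 - w) * eq[i] + w * br[i] for i in range(3)]
    print(w, payoff(mix, model), exploitability(mix))
```

In general a mixture’s exploitability is at most the linear interpolation between the endpoints; in this toy game it is exactly linear (exploitation 0.25w, exploitability w).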
SLIDE 8
Types of Strategies
Performance against a static opponent, in millibets per game
[Figure: Exploitation of Opponent (mb/g) vs. Worst Case Exploitability (mb/g), with the Mixture and Restricted Nash Response curves]
Restricted Nash Response: Much better than linear tradeoff
SLIDE 9 Restricted Nash Response
Restricted Nash Response
Proposed by Johanson, Zinkevich and Bowling (Computing robust counter-strategies, NIPS 2007)
Choose a value p and play an unusual game:
With probability p, the opponent is forced to play according to a static strategy
With probability 1 − p, the opponent is free to play as they like
p = 1: Best response
p = 0: Nash equilibrium
0 < p < 1: Different tradeoffs between exploiting the model and being robust to any opponent!
This provably generates the best possible counter-strategies to the model, for a given bound on exploitability
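The restricted game can be sketched in a small matrix game. The following is a hypothetical small-scale illustration of ours (the actual work solves large extensive form poker games with more sophisticated algorithms): regret matching for both players, where the opponent’s effective strategy is p times a fixed model plus (1 − p) times a freely adapting strategy.

```python
# Restricted Nash Response in rock-paper-scissors via regret matching.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoffs

def rm(regrets):
    """Regret-matching strategy: play proportionally to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [q / s for q in pos] if s > 0 else [1.0 / 3] * 3

def rnr(y_fixed, p, iters=50000):
    """Our average strategy in the restricted game: with probability p the
    opponent plays y_fixed (the model), otherwise they play freely."""
    reg_x, reg_y = [0.0] * 3, [0.0] * 3
    avg_x = [0.0] * 3
    for _ in range(iters):
        x, y_free = rm(reg_x), rm(reg_y)
        y_eff = [p * y_fixed[j] + (1 - p) * y_free[j] for j in range(3)]
        # Our action values against the restricted opponent.
        u_x = [sum(A[i][j] * y_eff[j] for j in range(3)) for i in range(3)]
        v_x = sum(x[i] * u_x[i] for i in range(3))
        # The free part of the opponent minimizes our payoff.
        u_y = [-sum(x[i] * A[i][j] for i in range(3)) for j in range(3)]
        v_y = sum(y_free[j] * u_y[j] for j in range(3))
        for k in range(3):
            reg_x[k] += u_x[k] - v_x
            reg_y[k] += u_y[k] - v_y
            avg_x[k] += x[k]
    return [a / iters for a in avg_x]

# Against an always-rock model: p = 1 recovers the best response (paper),
# while intermediate p trades exploitation against robustness.
counter = rnr([1.0, 0.0, 0.0], p=0.5)
```

With p = 0.5 the counter-strategy still leans heavily on paper, but keeps enough rock that a free opponent cannot punish it with scissors.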
SLIDE 10
Restricted Nash Response
Performance against a model of Orange:
[Figure: Exploitation of Opponent (mb/g) vs. Worst Case Exploitability (mb/g); Restricted Nash Response curve for p = 0, 0.5, 0.7, 0.8, 0.9, 0.93, 0.97, 0.99, and 1]
SLIDE 11 Goals
Goals:
Observe the opponent, build a model, and use it in place of the opponent’s true strategy
Bound worst-case performance:
The model could be inaccurate
The opponent could change
SLIDE 12
Frequentist Opponent Models
[Figure: game-tree fragment of a frequentist model, with observed action frequencies such as 4/10, 6/10, 1/4, 3/4, 0/0, and 3/3 at information sets]
Observe 100,000 to 1 million games played by the opponent
Do frequency counts on the actions taken at information sets
The model assumes the opponent takes actions with the observed frequencies
Need a default policy when there are no observations
Poker: Always-Call
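The frequency-count model with an always-call default can be sketched as follows (a minimal illustration; the class and information-set keys are ours):

```python
from collections import defaultdict

ACTIONS = ["fold", "call", "raise"]

class FrequentistModel:
    """Opponent model built from frequency counts of observed actions,
    falling back to an always-call default at unobserved information sets."""

    def __init__(self):
        self.counts = defaultdict(lambda: {a: 0 for a in ACTIONS})

    def observe(self, infoset, action):
        """Record one observed opponent action at an information set."""
        self.counts[infoset][action] += 1

    def policy(self, infoset):
        """Predicted action distribution at this information set."""
        c = self.counts[infoset]
        n = sum(c.values())
        if n == 0:
            # Default policy where we have no data: always call.
            return {a: (1.0 if a == "call" else 0.0) for a in ACTIONS}
        return {a: c[a] / n for a in ACTIONS}
```

For example, after seeing 4 calls and 6 raises at some betting sequence, the model predicts call 40% and raise 60% there, and always-call everywhere it has never observed the opponent act.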
SLIDE 13
Problems with Restricted Nash Response
Problem 1: Overfitting to the model
[Figure: Exploitation (mb/g) vs. Exploitability (mb/g) against the Orange model]
SLIDE 14 Problems with Restricted Nash Response
Problem 2: Requires a lot of training data
[Figure: Exploitation (mb/h) vs. Exploitability (mb/h) for models trained on 100, 1k, 10k, 100k, and 1m observed games]
SLIDE 15
Data Biased Response
Restricted Nash Response had two problems:
The model wasn’t accurate in states we never observed
The model was more accurate in some states than in others
We need a new approach, one that uses the model only where we have reason to trust it: make the model’s accuracy part of the restricted game
SLIDE 16
Data Biased Response
Let’s set up another restricted game. Instead of one p value for the whole tree, we’ll set one p value for each choice node, p(i)
More observations → more confidence in the model → higher p(i)
Set a maximum p(i) value, Pmax, that we vary to produce a range of strategies
SLIDE 17 Data Biased Response
Three examples:
1-Step: p(i) = 0 if 0 observations, p(i) = Pmax otherwise
10-Step: p(i) = 0 if fewer than 10 observations, p(i) = Pmax otherwise
0-10 Linear: p(i) = 0 if 0 observations, p(i) = Pmax if 10 or more, and p(i) grows linearly in between
By setting p(i) = 0 in unobserved states, our prior is that the opponent will play as strongly as possible
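The three schedules above can be written down directly; a minimal sketch (function and parameter names are ours, not the paper’s):

```python
def p_step(n_obs, p_max, threshold=1):
    """k-Step schedule: p(i) = 0 below the observation threshold,
    Pmax at or above it (threshold=1 gives 1-Step, 10 gives 10-Step)."""
    return p_max if n_obs >= threshold else 0.0

def p_linear(n_obs, p_max, full_at=10):
    """0-10 Linear schedule: p(i) grows linearly from 0 to Pmax as the
    observation count goes from 0 to full_at."""
    return p_max * min(n_obs, full_at) / full_at
```

With Pmax = 0.9, for instance, the 0-10 Linear schedule assigns p(i) = 0.45 to a node observed 5 times and the full 0.9 to any node observed 10 or more times.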
SLIDE 18 DBR doesn’t overfit to the model
RNR and several DBR curves:
[Figure: Exploitation (mb/h) vs. Exploitability (mb/h) for RNR and the 1-Step, 10-Step, and 0-10 Linear DBR variants]
SLIDE 19
DBR works with fewer observations
0-10 Linear DBR curve:
[Figure: Exploitation (mb/h) vs. Exploitability (mb/h) for models trained on 100, 1k, 10k, 100k, and 1m observed games]
SLIDE 20
Conclusion
Data Biased Response technique:
Generate a range of strategies, trading off exploitation and worst-case performance
Take advantage of observed information
Avoid overfitting to parts of the model we suspect are inaccurate
SLIDE 21
Future directions
Extend to single-player domains
Can overfitting be reduced by assuming a slightly adversarial environment in unobserved / underobserved areas?
More rigorous method for setting p from the observations