Accelerating Best Response Calculation in Large Extensive Games (PowerPoint presentation transcript)


SLIDE 1

July 21, 2011 Michael Johanson, Kevin Waugh, Michael Bowling, Martin Zinkevich

Accelerating Best Response Calculation in Large Extensive Games


University of Alberta Computer Poker Research Group

Wednesday, November 14, 2012

SLIDE 2

Game → Strategy AI (diagram)

Suppose you have a 2-player game. You can use an algorithm to learn a strategy for it.

SLIDE 3

Game → Strategy AI (diagram)

Compete against other agents?

There are several ways to evaluate a strategy. You could run a competition against other agents: the Computer Poker Competition, RoboCup, the Computer Olympiad, the Trading Agent Competition.

SLIDE 4

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

Worst-case performance is another useful metric. In 2-player games, use expectimax to find a best-response counterstrategy.

SLIDE 5

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

Optimal strategy (2-player, zero-sum game): a Nash equilibrium maximizes worst-case performance, or equivalently minimizes worst-case loss.
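A tiny illustrative sketch (not from the talk; the game and numbers are made up): in a two-player zero-sum matrix game, a strategy's worst-case value is its value against a best-responding opponent, and the equilibrium strategy maximizes that value.

```python
# Illustrative only: worst-case evaluation in a 2x2 zero-sum matrix game.
# Payoffs are for the row player; the column player picks the column that
# minimizes the row player's expected value (i.e., plays a best response).

def worst_case_value(row_strategy, payoff):
    """Value of row_strategy against a best-responding column player."""
    n_cols = len(payoff[0])
    col_values = [sum(p * payoff[r][c] for r, p in enumerate(row_strategy))
                  for c in range(n_cols)]
    return min(col_values)  # the opponent minimizes our value

# Matching Pennies: the equilibrium strategy (0.5, 0.5) guarantees 0;
# any tilt away from it can be exploited.
payoff = [[1, -1],
          [-1, 1]]
print(worst_case_value([0.5, 0.5], payoff))    # 0.0
print(worst_case_value([0.75, 0.25], payoff))  # -0.5
```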

SLIDE 6

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

2-player Limit Texas Hold'em Poker: ~10^18 game states, ~10^14 information sets (decision points). Computing an optimal strategy: 4 PB of RAM, 1400 cpu-years.

SLIDE 7

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

2-player Limit Texas Hold'em Poker: ~10^18 game states. Computing a best response was thought to be intractable, as it may require a full game-tree traversal. At 3 billion states/sec, that would take 10 years.
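The 10-year figure follows directly from the numbers above:

```python
# Back-of-the-envelope check: traversing ~10^18 states at 3 billion
# states per second.
states = 1e18
rate = 3e9  # states per second
seconds = states / rate
years = seconds / (365 * 24 * 3600)
print(round(years, 1))  # roughly 10.6 years
```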

SLIDE 8

Game → smaller, similar "abstract" game → Abstract Strategy AI (diagram)

Compete against other agents? Worst-case performance?

10^14 decision points in the real game, 10^7 in the abstraction. State-space abstraction allows us to produce strategies for the game.

SLIDE 9

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

Evaluation has relied on tournaments: the First Man-Machine Poker Championship (AAAI 2007, vs. Phil Laak) and the Annual Computer Poker Competition, 2006-2011 (at AAAI next month!).

SLIDE 10

Game → smaller, similar "abstract" game → Abstract Strategy AI (diagram)

Compete against other agents? Worst-case performance?

10^14 decision points in the real game, 10^7 in the abstraction.

Key questions: How much did abstraction hurt? How good are the agents, really? Can we make the best-response computation tractable to find out?

SLIDE 11

Accelerating Best Response Computation

Four ways to speed up best-response calculation in imperfect information games. Formerly intractable computations now run in one day, solving an 8-year-old evaluation problem: how good are state-of-the-art computer poker programs?

SLIDE 12

Expectimax Search

The Best Response Task: Given an opponent’s entire strategy, choose actions to maximize our expected value.

SLIDE 13

Our View / Opponent's View

[Animated figure, slides 13-18: two views of the same betting tree, with opponent action probabilities such as .9/.1 and .3/.7. Captions in sequence: cards are private; our choice nodes; opponent choice nodes (probabilities are known, e.g. .4/.6 and .2/.8); to determine our payoff at a leaf, we need to compute the distribution over the opponent's private states.]
SLIDE 19

Expectimax Search

Simple recursive tree walk. Pass forward: probability of the opponent being in each of their private states. Return: expected value for our private state.

SLIDE 20

Expectimax Search

Simple recursive tree walk. Pass forward: probability of the opponent being in each of their private states. Return: expected value for our private state.

Visits each state just once! But 10^18 states is still intractable.
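A minimal sketch of the recursive walk described above, on a made-up toy game (the hands, action probabilities, and payoffs are all illustrative, not from the talk). The opponent's reach probability per private hand is passed forward; the best-response value is returned.

```python
# Toy best-response expectimax: the opponent holds '2' or 'K', each with
# prior probability 0.5, and plays a known strategy SIGMA at their one
# choice node. We observe their action, then choose our own best response.

OPP_HANDS = ['2', 'K']
SIGMA = {'2': {'bet': 0.75, 'check': 0.25},   # P(action | their hand)
         'K': {'bet': 0.10, 'check': 0.90}}
# Our payoff at showdown, per (their hand, betting line) -- illustrative:
PAYOFF = {('2', 'bet'): 2, ('K', 'bet'): -2,
          ('2', 'check'): 1, ('K', 'check'): -1}

def expectimax(reach):
    """Best-response value at the root, given opponent reach probabilities."""
    value = 0.0
    for action in ['bet', 'check']:
        # Pass forward: scale each hand's reach by its action probability.
        child_reach = {h: reach[h] * SIGMA[h][action] for h in OPP_HANDS}
        showdown = sum(child_reach[h] * PAYOFF[(h, action)] for h in OPP_HANDS)
        if action == 'bet':
            # Our choice node: call (showdown) or fold (-1 per unit reach).
            fold = sum(child_reach[h] * -1 for h in OPP_HANDS)
            value += max(showdown, fold)
        else:
            value += showdown
    return value

print(expectimax({'2': 0.5, 'K': 0.5}))
```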

SLIDE 21

Accelerated Best Response

Four ways to accelerate this computation:

1) Take advantage of what the opponent doesn't know
2) Do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 22

My Tree / Your Tree: What the opponent doesn't know

[Animated figure, slides 22-24: the two trees share public information; branches the opponent cannot distinguish (our private cards, probabilities .9/.1 and .3/.7) are grouped together.]
SLIDE 25

The Public Tree

We can instead walk this much smaller tree of public information. At each node, we choose actions for all of the states our opponent cannot tell apart. More work per node, but we reuse queries to the opponent's strategy! ~110x speedup in Texas hold'em.
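A minimal sketch of this idea (toy numbers, not from the paper): carry a value per private hand we could hold through one walk, and let each hand pick its own action at our nodes. A single query to the opponent's strategy then serves every hand at once.

```python
# One public decision node, evaluated for all of our private hands at once.
# OPP_REACH and PAYOFF are made-up illustrative numbers: the opponent has
# bet, and we hold either a strong hand 'Q' or a weak hand 'J'.

MY_HANDS = ['Q', 'J']
OPP_REACH = {'2': 0.375, 'K': 0.05}   # opponent reach after their bet
PAYOFF = {('Q', '2'): 2, ('Q', 'K'): -2,   # showdown payoff if we call
          ('J', '2'): -2, ('J', 'K'): -2}

def values_at_our_node():
    """Per-hand best-response values at a single public decision node."""
    call = {m: sum(OPP_REACH[o] * PAYOFF[(m, o)] for o in OPP_REACH)
            for m in MY_HANDS}
    fold = {m: -1 * sum(OPP_REACH.values()) for m in MY_HANDS}
    # Each of our hands picks its own best action (elementwise max),
    # reusing the same opponent reach probabilities for every hand:
    return {m: max(call[m], fold[m]) for m in MY_HANDS}

print(values_at_our_node())  # 'Q' calls, 'J' folds
```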

SLIDE 26

Accelerated Best Response

The new technique has four orthogonal improvements:

1) Take advantage of what the opponent doesn't know
2) Do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 27

Fast Terminal Node Evaluation

[Figure, slides 27-30: my n states against the opponent's n states. Each of my n values is a weighted sum over the opponent's states, v = p1*u1 + p2*u2 + ... + p6*u6, so it is O(n^2) work to evaluate n hands.]

SLIDE 31

Fast Terminal Node Evaluation

Most games have structure that can be exploited. In poker, states are ranked, and the highest rank wins.

SLIDE 32

Fast Terminal Node Evaluation

To calculate one state's EV, we only need:

  • Probability of the opponent reaching weaker states
  • Probability of the opponent reaching stronger states

EV[i] = p(lose) * util(lose) + p(win) * util(win)

SLIDE 33

Fast Terminal Node Evaluation

By exploiting the game's structure, we can use two consecutive for() loops instead of two nested for() loops: O(n^2) becomes O(n). 7.7x speedup in Texas hold'em. (Some tricky details are resolved in the paper.)

SLIDE 34

Accelerated Best Response

The new technique has four orthogonal improvements:

1) Take advantage of what the opponent doesn't know
2) Do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 35

Avoid Isomorphic States

[Figure: two suit-isomorphic hands shown as equal.]

21.5x reduction in game size (only correct if the opponent's strategy also does this).
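An illustrative sketch of suit isomorphism, restricted to hold'em hole cards (my example, not the paper's computation): relabeling suits cannot change a hand's value, so the 1,326 two-card combinations collapse to 169 strategically distinct classes (13 pairs, 78 suited, 78 offsuit). The 21.5x figure above is for the full game.

```python
from itertools import combinations

RANKS = '23456789TJQKA'
SUITS = 'cdhs'
DECK = [r + s for r in RANKS for s in SUITS]

def canonical(card1, card2):
    """Canonical form of a hole-card pair: ordered ranks + suited flag."""
    (r1, s1), (r2, s2) = card1, card2
    hi = max(r1, r2, key=RANKS.index)
    lo = min(r1, r2, key=RANKS.index)
    return (hi, lo, s1 == s2)

hands = list(combinations(DECK, 2))
classes = {canonical(a, b) for a, b in hands}
print(len(hands), len(classes))  # 1326 169
```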

SLIDE 36

Accelerated Best Response

The new technique has four orthogonal improvements:

1) Take advantage of what the opponent doesn't know, to walk the much smaller public tree
2) Use a fast terminal node evaluation to do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 37

Parallel Computation

24,570 equal-sized independent subtrees, each taking 4m30s to solve. 24,570 * 4.5 minutes = 76 cpu-days.

SLIDE 38

Parallel Computation

24,570 equal-sized independent subtrees, each taking 4m30s to solve. 24,570 * 4.5 minutes = 76 cpu-days. With 72 processors on a cluster: a 1-day computation!
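A sketch of this embarrassingly parallel split (the subtree "solver" below is a placeholder, not the paper's code; the arithmetic reproduces the slide's cpu-day total). Because the subtrees are independent, their values can be computed in any order and summed.

```python
from multiprocessing import Pool

# 24,570 subtrees * 4.5 minutes each, expressed in cpu-days:
cpu_days = int(24570 * 4.5 / (60 * 24))

def solve_subtree(subtree_id):
    # Stand-in for the 4m30s best-response computation on one subtree.
    return 0.001 * subtree_id

if __name__ == '__main__':
    print(cpu_days)  # 76
    with Pool(processes=4) as pool:
        # Independent subtrees -> a trivially parallel map; values sum.
        print(sum(pool.map(solve_subtree, range(100))))
```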

SLIDE 39

Evaluating the Progress of Computer Poker Research

SLIDE 40

Evaluating Computer Poker Agents

Annual Computer Poker Competition (ACPC):

  • Started in 2006
  • Hosted at AAAI this year
  • 2-player Limit: strongest agents are competitive with the world's best human pros

The most successful approach (U of A, CMU, many others):

  • Approximate a Nash equilibrium: worst-case loss of $0 per game

For the first time, we can now tell how close we are to this goal!

SLIDE 41

Trivial Opponents

Value for a best response (units are milli-big-blinds per game):

  Always-Fold       750
  Always-Call      1163.48
  Always-Raise     3697.69
  Uniform Random   3466.32

A human professional's goal is to win 50. An optimal strategy would lose 0.

SLIDE 42

University of Alberta Agents

[Figure, slides 42-44: best response (mbb/g, axis 80-400) by year, 2006-2011, for Computer Poker Competition and Man-vs-Machine agents. Annotations: 2007 Man-Machine, narrow human win; 2008 Man-Machine, narrow computer win.]
SLIDE 45

Comparing Abstraction Techniques: evaluating the University of Alberta agents

[Figure: best response (mbb/g, 100-400) vs. abstraction size (# information sets, 10^6 to 10^10) for Percentile HS, Public PHS, and k-Means Earthmover abstractions.]

SLIDE 46

Evaluating Computer Poker Agents: 2010 Competition

                 Rockhopper  GGValuta  HyperB  PULPO  GS6   Littlerock | Best Response
Rockhopper            -          6        3      7     37      77      |     300
GGValuta             -6          -        3      1     31      77      |     237
HyperB (UofA)        -3         -3        -      2     31      70      |     135
PULPO                -7         -1       -2      -     32     125      |     399
GS6 (CMU)           -37        -31      -31    -32      -      47      |     318
Littlerock          -77        -77      -70   -125    -47       -      |     421

SLIDE 47

Conclusion

Fast best-response calculation in imperfect information games: the previously intractable computation can now be run in a day! The computer poker community is making steady progress towards robust strategies. Many additional exciting results in the paper and at the poster!

SLIDE 48

More details at our poster! Today, 4:00 - 5:20, Room 120-121

SLIDE 49

Additional Slides:

Abstraction, CFR, Tilting, Additional Graphs, Polaris, Hyperborean 2009, Expectimax, Public Tree, n^2 to n, Pathologies

SLIDE 50

Leduc Hold'em Pathologies

Abstraction               Best Response
Real Game vs Real Game
J.Q.K vs Real Game         55.2
[JQ].K vs Real Game        69.0
J.[QK] vs Real Game       126.3
[JQK] vs Real Game        219.3
[JQ].K vs [JQ].K          272.2
[JQ].K vs J.Q.K           274.1
Real Game vs J.[QK]       345.7
Real Game vs [JQ].K       348.9
J.Q.K vs J.Q.K            359.9
J.Q.K vs [JQ].K           401.3
J.[QK] vs J.[QK]          440.6
Real Game vs [JQK]        459.5
Real Game vs J.Q.K        491.0
[JQK] vs [JQK]            755.8

SLIDES 51-64: Expectimax / Conventional Best Response in one tree walk

[Animated figure: one walk per private hand we hold. We pass forward the opponent's reach probabilities for each of their private cards (starting at 2: 0.5, K: 0.5), multiply by their known action probabilities at their choice nodes (e.g. 2: 0.5*0.75, K: 0.5*0.1; deeper, 2: 0.5*0.75*0.25, K: 0.5*0.1*0.9), and return expected values up the tree. Per-frame node values omitted.]

SLIDES 65-82: Walking the Public Tree

[Animated figure: one walk of the public tree computes values for all of our private hands at once. At each public node we pass forward the opponent's reach probability per hand (e.g. 2: 0.5*0.25, K: 0.5*0.9) and return a value per hand (e.g. 2: -0.45, K: 0.13 at one leaf; 2: -0.29, K: 0.13 after combining actions). Per-frame node values omitted.]
SLIDE 83

Polaris 2008

Agent        Size   Tilt           Best Response
Pink         266m   0, 0, 0, 0     235.294
Orange       266m   7, 0, 0, 7     227.457
Peach        266m   0, 0, 0, 7     228.325
Red          115m   0, -7, 0, 0    257.231
Green        115m   0, -7, 0, -7   263.702
(Reference)  115m   0, 0, 0, 0     266.797

SLIDES 84-87: Polaris / Hyperborean

[Figure: best response (mbb/g, axis 100-500) by year, 2006-2011, for Polaris and Hyperborean. Annotations: Man-vs-Machine 2007, narrow loss; Man-vs-Machine 2008, narrow win.]

SLIDES 88-97: Additional figures

[Figures, per-point values omitted:
  • Tilting: exploitability (mb/g, 260-340) vs. percent bonus for the winner (-20 to +20).
  • Tilting at 7%: exploitability (mb/g) vs. abstraction size (millions of information sets) for untilted and tilted Perc. E[HS2] and k-Means abstractions.
  • Counterfactual Regret Minimization, abstract-game best response: exploitability (mbb/g) vs. iterations (millions), 10-bucket perfect recall, Percentile 10 E[HS^2].
  • Counterfactual Regret Minimization, real-game best response: exploitability (mbb/g) vs. iterations (millions).
  • Hyperborean 2009: best response (mbb/g) by year for Polaris and Hyperborean.
  • Abstraction: Perc HS2; k-Means; HS distributions (probability vs. expected hand strength, E[HS], for hands AsAd and 2s7c); k-Means Earthmover abstraction.]
SLIDE 98

3: Fast Terminal Node Evaluation

My values are weighted sums of his reach probabilities (0.1, 0.05, 0.02, ...), where u is the utility for the winner:

my_value_1 =  0*0.1  + u*0.05 + u*0.02 + ...
my_value_2 = -u*0.1  + 0*0.05 + u*0.02 + ...
my_value_3 = -u*0.1  - u*0.05 + 0*0.02 + ...

SLIDE 100

3: Fast Terminal Node Evaluation

The obvious O(n^2) algorithm (r[i] = his reach probs, v[i] = my values, u = utility for the winner):

for( a = each of my hands )
  for( b = each of his hands )
    if( a > b )       v[a] += u*r[b]
    else if( a < b )  v[a] -= u*r[b]

SLIDE 101

3: Fast Terminal Node Evaluation

But games are fun because they have structure in determining the payoffs, and we can take advantage of that. This vector-vs-vector evaluation can often be done in O(n) time, and not just in poker.

SLIDE 102

3: Fast Terminal Node Evaluation

Reach probs for his six hands, weakest to strongest: 0.05, 0.1 | 0.1, 0.05 | 0.1, 0.1 (three equal-strength sets). Initialize sum_win_prob = 0 and sum_lose_prob = 0.5 (the total), then:

for( s = each set of equal-strength hands )
  for( i = each tied hand in s )  sum_lose_prob -= r[i]
  for( i = each tied hand in s )  v[i] = -u*sum_lose_prob + u*sum_win_prob
  for( i = each tied hand in s )  sum_win_prob += r[i]

[Animated trace, slides 102-120: after the first set, sum_lose_prob = 0.35 and both values are -0.35u; after the second, sum_win_prob = 0.15, sum_lose_prob = 0.20, and the values are -0.05u; after the third, sum_win_prob = 0.3, sum_lose_prob = 0.0, and the values are 0.3u.]