

SLIDE 1

Counterfactual Regret Minimization and Domination in Extensive-Form Games

Richard Gibson

University of Alberta

Edmonton, Alberta, Canada

SLIDE 2

Overview

Counterfactual Regret Minimization (CFR)

SLIDE 3

Overview

Counterfactual Regret Minimization (CFR)

2-Player Zero-Sum Extensive-Form Games

Provably solves for Nash equilibrium

SLIDE 4

Overview

Counterfactual Regret Minimization (CFR)

2-Player Zero-Sum Extensive-Form Games

Provably solves for Nash equilibrium

Extensive-Form Games, Any Number of Players

Seems to work well...

SLIDE 5

Overview

Counterfactual Regret Minimization (CFR)

2-Player Zero-Sum Extensive-Form Games

Provably solves for Nash equilibrium

Extensive-Form Games, Any Number of Players

Seems to work well...

Question: Why do CFR strategies work well in extensive-form games outside of the 2-player zero-sum case?

SLIDE 6

Extensive-Form Games

[Game tree diagram: chance (C) deals QJ or QK, each with probability 0.5. Player 1, holding the Queen, checks (c) or bets (b); player 2, holding the Jack or King, then checks, bets, folds (f), or calls (c); after a check and a bet, player 1 folds or calls. Terminal payoffs to (player 1, player 2) are (±1, ∓1) after a check-down or a fold and (±2, ∓2) after a called bet.]

SLIDE 7

Extensive-Form Games

[Game tree diagram as on the previous slide, with states that are indistinguishable to a player grouped together.]

Information sets group states that are indistinguishable to the player.

SLIDE 8

Extensive-Form Games

A strategy profile σ = (σ1, σ2) assigns a probability distribution over actions at each information set. Example: Probability player 1 checks is σ1( Q?, c ) = 0.4.

[Game tree diagram annotated with each action's probability under σ; e.g. player 1 checks with probability 0.4 and bets with probability 0.6.]
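Such a profile is easy to represent directly. The sketch below (Python) stores each player's strategy as a map from information sets to action distributions; only σ1( Q?, c ) = 0.4 comes from the slide, while the information-set names and player 2's numbers are illustrative.

```python
# Hypothetical sketch of a strategy profile: each player's strategy
# maps information sets to distributions over actions.
# Only sigma1(Q?, c) = 0.4 is taken from the slide; the information-set
# names and player 2's numbers are illustrative.
sigma1 = {
    "Q": {"check": 0.4, "bet": 0.6},
    "Q-after-check-bet": {"fold": 0.2, "call": 0.8},
}
sigma2 = {
    "J-after-check": {"check": 0.7, "bet": 0.3},
    "J-after-bet":   {"fold": 0.9, "call": 0.1},
    "K-after-check": {"check": 0.2, "bet": 0.8},
    "K-after-bet":   {"fold": 0.0, "call": 1.0},
}

# Every information set must carry a valid probability distribution.
for sigma in (sigma1, sigma2):
    for infoset, dist in sigma.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, infoset
print("all information sets carry valid distributions")
```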

SLIDE 9

Extensive-Form Games

ui( σ ) is the expected utility for player i, assuming players play according to σ.

[Game tree diagram annotated with action probabilities, as on the previous slide.]

SLIDE 10

Counterfactual Regret Minimization (CFR)

  • CFR is an iterative algorithm that generates strategy profiles (σ^1, σ^2, ... , σ^T) over many iterations T.

  • Final output of CFR: σAVG = Average(σ^1, σ^2, ... , σ^T).

  • For 2-player zero-sum games, σAVG is an ϵ-Nash equilibrium, with ϵ → 0 as T → ∞:

u1( σ1^AVG, σ2^AVG ) ≥ max over σ1* of u1( σ1*, σ2^AVG ) − ϵ

u2( σ1^AVG, σ2^AVG ) ≥ max over σ2* of u2( σ1^AVG, σ2* ) − ϵ

[Zinkevich et al., NIPS 2007]
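To make the averaging concrete, here is a minimal regret-matching sketch on matching pennies, a one-shot 2-player zero-sum game (an illustrative stand-in, not the extensive-form CFR implementation from the talk): the current strategies cycle, but the average strategy approaches the (0.5, 0.5) equilibrium.

```python
# Minimal regret-matching sketch on matching pennies, a one-shot
# 2-player zero-sum game (an illustrative stand-in for CFR, which
# applies the same idea at every information set of an extensive-form
# game). The current strategies cycle, but the AVERAGE strategy
# approaches the (0.5, 0.5) Nash equilibrium.
PAYOFF = [[1, -1], [-1, 1]]  # row player's utility

def regret_to_strategy(regret):
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

regret1, regret2 = [1.0, 0.0], [0.0, 0.0]  # asymmetric start so play is non-trivial
sum1 = [0.0, 0.0]
T = 50000
for _ in range(T):
    s1 = regret_to_strategy(regret1)
    s2 = regret_to_strategy(regret2)
    for a in range(2):
        sum1[a] += s1[a]
    # expected utility of each pure action against the opponent's strategy
    u1 = [sum(PAYOFF[a][b] * s2[b] for b in range(2)) for a in range(2)]
    u2 = [sum(-PAYOFF[a][b] * s1[a] for a in range(2)) for b in range(2)]
    ev1 = sum(s1[a] * u1[a] for a in range(2))
    ev2 = sum(s2[b] * u2[b] for b in range(2))
    for a in range(2):
        regret1[a] += u1[a] - ev1
        regret2[a] += u2[a] - ev2

avg1 = [x / T for x in sum1]
print(avg1)  # close to [0.5, 0.5]
```

The no-regret guarantee bounds each player's average regret by O(1/√T), which is exactly why the average profile, not the final one, is the ϵ-Nash approximation.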

SLIDE 11

Counterfactual Regret Minimization (CFR)

  • Outside of 2-player zero-sum games, σAVG is not necessarily an approximate Nash equilibrium [Abou Risk and Szafron, AAMAS 2010].

– A player may gain by deviating from σAVG.

  • In these games, a Nash equilibrium might not be the most appropriate solution concept anyway.

  • On the other hand, σAVG performs very well in practice...

SLIDE 12

Annual Computer Poker Competition

3-Player Limit Hold'em - 2009

Agent             Instant Run-off: Round 0
Hyperborean-Eqm   319 ± 2
Hyperborean-BR    299 ± 2
akuma             151 ± 2
dpp               171 ± 2
CMURingLimit      -37 ± 2
dcu3pl            -63 ± 2
Bluechip          -548 ± 2

3-Player Limit Hold'em - 2010

Agent             Instant Run-off: Round 0
Hyperborean.iro   144 ± 32
dcu3pl.tbr        98 ± 30
LittleRock        65 ± 35
Arnold3           -135 ± 39
Bender            -172 ± 16

3-Player Limit Hold'em - 2011

Agent                     Instant Run-off: Round 0
Sartre3p                  243 ± 20
Hyperborean-3p-limit-iro  204 ± 20
LittleRock                113 ± 19
AAIMontybot               96 ± 44
dcubot3plr                77 ± 19
OwnBot                    -4 ± 30
Bnold3                    -91 ± 22
Entropy                   -108 ± 36
player.zeta.3p            -530 ± 33
SLIDE 13

Counterfactual Regret Minimization (CFR)

  • In games with more than 2 players, σAVG is a “good” strategy. Why?

  • What properties make a strategy good in games with more than 2 players?

  • We know what a bad strategy is...
SLIDE 14

Domination

[Game tree diagram from Slide 6.]

SLIDE 15

Domination

[Game tree diagram, with player 2 calling a bet with the Jack with probability 1.]

Consider any player 2 strategy σ2^{J,c} that always calls with the Jack when faced with a bet.

SLIDE 16

Domination

[Game tree diagram, as on the previous slide.]

u2( σ1, σ2^{J,c} ) = ... + 0.5 · σ1( Q?, b ) · 1 · ( −2 ) + ...

SLIDE 17

Domination

[Game tree diagram, as on the previous slide.]

u2( σ1, σ2^{J,c} ) = ... + 0.5 · σ1( Q?, b ) · 1 · ( −2 ) + ...

Now consider the same player 2 strategy, except it always folds the Jack. Call it σ2^{J,f}.

SLIDE 18

Domination

[Game tree diagram, with player 2 folding to a bet with the Jack with probability 1.]

u2( σ1, σ2^{J,c} ) = ... + 0.5 · σ1( Q?, b ) · 1 · ( −2 ) + ...

u2( σ1, σ2^{J,f} ) = ... + 0.5 · σ1( Q?, b ) · 1 · ( −1 ) + ... (all other terms identical)

SLIDE 19

Domination

[Game tree diagram, as on the previous slide.]

u2( σ1, σ2^{J,c} ) = ... + 0.5 · σ1( Q?, b ) · 1 · ( −2 ) + ...

u2( σ1, σ2^{J,f} ) = ... + 0.5 · σ1( Q?, b ) · 1 · ( −1 ) + ... (all other terms identical)

u2( σ1, σ2^{J,c} ) ≤ u2( σ1, σ2^{J,f} ) for all σ1.

u2( σ1, σ2^{J,c} ) < u2( σ1, σ2^{J,f} ) if σ1( Q?, b ) > 0.
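This comparison can be checked mechanically. In the sketch below (with a hypothetical helper name), only the bet-with-the-Queen term differs between the two strategies, so calling with the Jack can never earn more and earns strictly less whenever σ1( Q?, b ) > 0.

```python
# Hypothetical helper: the contribution to u2 from the QJ deal when
# player 1 bets the Queen (deal probability 0.5, bet probability
# sigma1(Q?, b), player 2 responds with probability 1 under a pure strategy).
def u2_term(p1_bets_queen, payoff_after_bet):
    return 0.5 * p1_bets_queen * 1.0 * payoff_after_bet

for p in (0.0, 0.3, 1.0):        # candidate values of sigma1(Q?, b)
    call = u2_term(p, -2)        # sigma2^{J,c}: call the bet, lose 2
    fold = u2_term(p, -1)        # sigma2^{J,f}: fold, lose 1
    assert call <= fold          # never better to call with the Jack
    if p > 0:
        assert call < fold       # strictly worse once player 1 ever bets
print("folding the Jack weakly dominates calling")
```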

SLIDE 20

Domination

[Game tree diagram from Slide 6.]

σ2 is a dominated strategy if there exists σ2' such that

u2( σ1, σ2, σ3, ... ) ≤ u2( σ1, σ2', σ3, ... ) for all σ1, σ3, ...

u2( σ1, σ2, σ3, ... ) < u2( σ1, σ2', σ3, ... ) for some σ1, σ3, ...
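For small games given in normal form, this definition translates directly into an enumeration check. The sketch below is illustrative (the talk works in extensive form, where enumeration over opponent profiles is impractical).

```python
# Generic (weak) dominance check for small normal-form games:
# strategy s is dominated by s2 if s2 is never worse against any
# opponent profile and strictly better against at least one.
def is_dominated(u, s, s2, opponent_profiles):
    """u(my_strategy, opp_profile) -> my utility."""
    never_worse = all(u(s, o) <= u(s2, o) for o in opponent_profiles)
    sometimes_better = any(u(s, o) < u(s2, o) for o in opponent_profiles)
    return never_worse and sometimes_better

# Toy example: column player in a 2x2 game where column action 1 is
# strictly worse than action 0 no matter what the row player does.
payoff_col = [[3, 1],   # row plays 0: column gets 3 or 1
              [2, 0]]   # row plays 1: column gets 2 or 0

u = lambda col_action, row_action: payoff_col[row_action][col_action]
print(is_dominated(u, 1, 0, opponent_profiles=[0, 1]))  # True
```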

SLIDE 21

Domination

[Game tree diagram from Slide 6.]

σ2^{J,c} is dominated by σ2^{J,f}.

σ2^{K,f} is dominated by σ2^{K,c}.

SLIDE 22

Domination

[Game tree diagram from Slide 6.]

Define a dominated action to be an action such that any strategy that always plays that action is dominated (assuming that player plays to reach that action).

SLIDE 23

Domination

[Game tree diagram with player 2's dominated actions (calling a bet with the Jack, folding to a bet with the King) removed.]

SLIDE 24

Domination

[Game tree diagram with player 2's dominated actions removed.]

Consider the player 1 strategy σ1^b that always bets.

SLIDE 25

Domination

[Game tree diagram, with σ1^b betting with probability 1.]

u1( σ1^b, σ2 ) = 0.5( 1 )( 1 )( 1 ) + 0.5( 1 )( 1 )( −2 ) = −0.5

SLIDE 26

Domination

[Game tree diagram, as on the previous slide.]

u1( σ1^b, σ2 ) = 0.5( 1 )( 1 )( 1 ) + 0.5( 1 )( 1 )( −2 ) = −0.5

Now consider the player 1 strategy σ1^{cc} that checks, then calls.

SLIDE 27

Domination

[Game tree diagram, with the relevant pure-strategy actions taken with probability 1.]

u1( σ1^b, σ2 ) = 0.5( 1 )( 1 )( 1 ) + 0.5( 1 )( 1 )( −2 ) = −0.5

u1( σ1^{cc}, σ2^{Jc,Kb} ) = 0.5( 1 )( 1 )( 1 ) + 0.5( 1 )( 1 )( 1 )( −2 ) = −0.5

SLIDE 28

Domination

[Game tree diagram, as on the previous slide.]

u1( σ1^b, σ2 ) = 0.5( 1 )( 1 )( 1 ) + 0.5( 1 )( 1 )( −2 ) = −0.5

u1( σ1^{cc}, σ2^{Jc,Kb} ) = 0.5( 1 )( 1 )( 1 ) + 0.5( 1 )( 1 )( 1 )( −2 ) = −0.5

u1( σ1^{cc}, σ2^{Jb,Kc} ) = 0.5( 1 )( 1 )( 1 )( 2 ) + 0.5( 1 )( 1 )( −1 ) = +0.5
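A quick arithmetic check of these expected utilities: each term is (deal probability) × (product of action probabilities along the branch, all 1 for these pure strategies) × (player 1's terminal payoff).

```python
# Verify the expected-utility arithmetic from the slides above.
def expected_u1(terms):
    # terms: (deal probability, product of action probabilities, payoff)
    return sum(p_deal * p_actions * payoff for p_deal, p_actions, payoff in terms)

u_bet = expected_u1([(0.5, 1.0, 1), (0.5, 1.0, -2)])          # always bet
u_cc_vs_JcKb = expected_u1([(0.5, 1.0, 1), (0.5, 1.0, -2)])   # check-call vs sigma2^{Jc,Kb}
u_cc_vs_JbKc = expected_u1([(0.5, 1.0, 2), (0.5, 1.0, -1)])   # check-call vs sigma2^{Jb,Kc}

print(u_bet, u_cc_vs_JcKb, u_cc_vs_JbKc)  # -0.5 -0.5 0.5
```

So checking then calling never does worse than always betting, and does strictly better against σ2^{Jb,Kc}.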

SLIDE 29

Domination

[Game tree diagram with player 2's dominated actions removed.]

σ1 is an iteratively dominated strategy if there exists σ1' such that

u1( σ1, σ2, σ3, ... ) ≤ u1( σ1', σ2, σ3, ... ) for all non-iteratively-dominated σ2, σ3, ...

u1( σ1, σ2, σ3, ... ) < u1( σ1', σ2, σ3, ... ) for some non-iteratively-dominated σ2, σ3, ...

SLIDE 30

Domination

[Game tree diagram with player 2's dominated actions removed.]

σ1^b is iteratively dominated by σ1^{cc}.

SLIDE 31

Domination

[Game tree diagram with player 2's dominated actions removed.]

Define an iteratively dominated action to be an action such that any strategy that always plays that action is iteratively dominated (assuming that player plays to reach that action).

SLIDE 32

Domination and CFR

  • Clearly, one should not play a dominated action.

  • If we assume our opponents are rational, then we should also not play an iteratively dominated action.

  • Theorem: If a is an iteratively strictly dominated action, and the players play to reach a “often enough,” then when running CFR, σAVG(a) → 0 as T → ∞.

  • Can also prove a weaker result regarding CFR avoiding strictly dominated strategies.
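The theorem concerns extensive-form CFR, but the effect is easy to see in miniature with regret matching on a one-shot game (an illustrative sketch, not the paper's proof): extend rock-paper-scissors with a strictly dominated "surrender" action, and its weight in the average strategy shrinks toward 0.

```python
# Illustrative sketch: regret matching in rock-paper-scissors extended
# with a strictly dominated fourth action, "surrender", which loses 2
# against every other action. Its weight in the average strategy
# shrinks toward 0 as iterations accumulate.
A = 4  # rock, paper, scissors, surrender
P = [[ 0, -1,  1,  2],   # row player's payoffs
     [ 1,  0, -1,  2],
     [-1,  1,  0,  2],
     [-2, -2, -2,  0]]

def strategy(regret):
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / A] * A

reg = [[0.0] * A for _ in range(2)]
avg = [[0.0] * A for _ in range(2)]
T = 5000
for _ in range(T):
    s0, s1 = strategy(reg[0]), strategy(reg[1])
    for a in range(A):
        avg[0][a] += s0[a] / T
        avg[1][a] += s1[a] / T
    # expected utility of each pure action against the opponent's strategy
    u0 = [sum(P[a][b] * s1[b] for b in range(A)) for a in range(A)]
    u1 = [sum(-P[a][b] * s0[a] for a in range(A)) for b in range(A)]
    ev0 = sum(s0[a] * u0[a] for a in range(A))
    ev1 = sum(s1[b] * u1[b] for b in range(A))
    for a in range(A):
        reg[0][a] += u0[a] - ev0
        reg[1][a] += u1[a] - ev1

print(avg[0][3])  # "surrender" weight in the average strategy: near 0
```

Because the dominated action's utility is always below the expected value of playing the current strategy, its cumulative regret only decreases, so regret matching stops playing it after finitely many iterations.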

SLIDE 33

Discussion

  • We can show that CFR avoids dominated actions and strategies, but how important is it to avoid such actions and strategies?

– Need to measure the correlation between playing dominated actions or strategies and performance.

– Hard to identify all dominated actions in large games, but it may be computationally possible in smaller games.

SLIDE 34

Discussion

  • Recall that CFR generates a sequence of strategy profiles (σ^1, σ^2, ... , σ^T) over many iterations T.

  • Can show that for an iteratively strictly dominated action a, after a finite number of iterations T0, the profiles generated play a with probability 0.

– If avoiding iteratively dominated actions is enough to perform well, then perhaps there is no need to use the average profile σAVG, as is needed in 2-player zero-sum games.

SLIDE 35

Conclusion

  • CFR can generate strong strategies outside of 2-player zero-sum games, but we do not have a good understanding of why this is so.

  • Iteratively dominated actions and strategies should typically be avoided in any game.

  • We have shown that the strategies produced by CFR tend to avoid playing iteratively strictly dominated actions.

– More work is required to conclude that this really does help explain the strong performance of CFR-generated strategies.

SLIDE 36

Thanks for listening!

Richard Gibson
Twitter: @RichardGGibson
Email: rggibson@cs.ualberta.ca
Website: http://cs.ualberta.ca/~rggibson
CPRG Website: http://cs.ualberta.ca/~poker