CMU-Q 15-381
Lecture 20: Game Theory I
Teacher: Gianni A. Di Caro

ICE-CREAM WARS
http://youtu.be/jILgxeNBK_8

GAME THEORY
§ Game theory is the formal study of conflict and cooperation in (rational) multi-agent systems
§ Decision-making where several players must make choices that potentially affect the interests of other players: the effects of the actions of several agents are interdependent (and the agents are aware of it)
§ Example: Auctioning
§ Psychology: Theory of social situations

ELEMENTS OF A GAME
§ The players: how many players are there? Does nature/chance play a role? Players are assumed to be rational
§ A complete description of what the players can do: the set of all possible actions

ELEMENTS OF A GAME
§ A description of the payoff / consequences for each player for every possible combination of actions chosen by all players playing the game
§ A description of all players’ preferences over payoffs: a utility function for each player

AGENT DESIGN VS. MECHANISM DESIGN
§ Agent strategy design (ASD): game theory can be used to compute the expected utility of each decision, and to use this to determine the best strategy (and its expected return) against a rational player. Strategy ≡ Policy
§ System-level mechanism design: define the rules of the game such that the collective utility of the agents is maximized when each agent’s strategy is designed to maximize its own utility according to ASD

MAKING DECISIONS: BASIC DEFINITIONS
§ Decision-making can involve one action or a sequence of actions
§ Action outcomes can be certain or subject to uncertainty
§ A set $A$ of alternative actions to choose from is given. It can be either discrete (finite or countable), e.g. $A = \{a_1, a_2, \dots, a_n\}$, or continuous (infinite), e.g. $A = \{a \mid a \in [0, 10]\}$
§ Strategy (= Policy): tells a player what to do for every possible situation (state) throughout the game (a complete algorithm for playing the game). It can be deterministic or stochastic
§ Strategy set $S$: the set of all strategies available for the players to play. $S$ can be finite or infinite
Example: sequential game, one player
[Figure: decision tree. State 1 offers $a_1$ (leading to state 2) and $a_2$ (leading to state 3); state 2 offers $b_1$, $b_2$; state 3 offers $c_1$, $c_2$]
States: $\{1, 2, 3, T\}$
$A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, $A_3 = \{c_1, c_2\}$, $A_T = \emptyset$
$S = \{a_1 b_1, a_1 b_2, a_2 c_1, a_2 c_2\}$
E.g. strategy: $s = \{a_1 b_1\}$
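The strategy set of the one-player sequential example can be enumerated mechanically. The following is a minimal sketch, assuming the tree has a root state 1 with two actions leading to states 2 and 3, each of which has two actions leading to the terminal state. The dictionary `TREE`, the function name `enumerate_strategies`, and the action labels `a1`, `b1`, `c1`, ... are placeholders chosen for this illustration, not part of the slides.

```python
# Hypothetical encoding of the one-player decision tree from the slide.
# Each state maps an action label to the state that action leads to.
# "T" is the terminal state, which has no actions.
TREE = {
    1: {"a1": 2, "a2": 3},
    2: {"b1": "T", "b2": "T"},
    3: {"c1": "T", "c2": "T"},
    "T": {},
}


def enumerate_strategies(state=1):
    """Return every action sequence that leads from `state` to the terminal state."""
    if not TREE[state]:
        # Terminal state: exactly one (empty) way to continue.
        return [()]
    sequences = []
    for action, next_state in TREE[state].items():
        for tail in enumerate_strategies(next_state):
            sequences.append((action,) + tail)
    return sequences


strategies = enumerate_strategies()
for s in strategies:
    print(" ".join(s))
```

The four sequences printed correspond to the four elements of the strategy set listed on the slide for the sequential game.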

MAKING DECISIONS: BASIC DEFINITIONS
§ One-action (static) games
[Figure: three separate states 1, 2, 3 with actions $a_1$, $a_2$; $b_1$, $b_2$; $c_1$, $c_2$ respectively]
States: $\{1, 2, 3, T\}$
$A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, $A_3 = \{c_1, c_2\}$, $A_T = \emptyset$
$S = \{(1, a_1), (1, a_2), (2, b_1), (2, b_2), (3, c_1), (3, c_2)\}$
E.g. strategy: $s = \{(1, a_1), (2, b_2), (3, c_1)\}$
§ The strategy defines the behavior of an agent
§ The observed behavior of an agent following a given strategy is the outcome of the strategy
§ Pure strategy: a strategy in which there is no randomization; one specific action from the set $A$ is selected with certainty at each state / decision node
§ The strategy set $S$ is also called the pure strategy set

PAYOFFS AND UTILITIES
§ How do we choose the strategy?
§ Rational agents: Principle of Maximum Expected Utility
§ Payoffs ~ Rewards in MDPs: what results from taking an action
§ Payoff (for a single agent): a function that associates a numerical value with every action in $A$: $\pi: A \to \mathbb{R}$
§ Payoff (for a multi-agent scenario): the payoff of the action $a$ for agent $i$ depends on the actions of the other players! $\pi: A \times A \times \cdots \times A \to \mathbb{R}$
§ Utility: it can be any convenient additive function $u$ of the payoffs
§ In the following, the payoffs will coincide with the utility of the agents (this fully makes sense for the static games that we will consider)
§ Notation: we will use $\pi_i$ and $u_i$ quite interchangeably

INFORMATION AND TYPES OF GAMES
§ Complete information game: utility functions, payoffs, strategies and “types” of players are common knowledge
§ Incomplete information game: players may not possess full information about their opponents (e.g., in auctions, each player knows its own utility but not that of the other players). “Parameters” of the game are not fully known
§ Perfect information game: each player, when making any decision, is perfectly informed of all the events that have previously occurred (e.g., chess) [Full observability]
§ Imperfect information game: not all information is accessible to the player (e.g., poker, prisoner’s dilemma) [Partial observability]

TURN-TAKING VS. SIMULTANEOUS MOVES
§ Static games
o All players take actions “simultaneously” (example: Morra)
o → Imperfect information games
o Complete information
o Single-move games
§ Dynamic games
o Turn-taking games
o Fully observable ↔ Perfect information games
o Complete information
o Repeated moves
[Figure: game tree with alternating max and min levels and leaf values 10, 10, 9, 100]

(STRATEGIC-) NORMAL-FORM GAME
§ Let’s focus on static games
§ There is a strategic interaction among players
§ Strategy profile: a set of strategies for all players which fully specifies all actions in a game. It must include one and only one strategy for every player
§ A game in normal form consists of:
o Set of players $N = \{1, \dots, n\}$
o Set of actions available to each player, which defines the strategy set $S = \{s_1, s_2, \dots, s_m\}$
o For each $i \in N$, a utility function $u_i$ defined over the set of all possible strategy profiles, $u_i : S^n \to \mathbb{R}$
§ If each player $j \in N$ plays the strategy $s_j \in S$, the utility of player $i$ is $u_i(s_1, \dots, s_n)$, which is the same as player $i$’s payoff when the strategy profile $(s_1, \dots, s_n)$ is chosen
[Figure: payoff matrix in a 2-player game]
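The three ingredients of a normal-form game (players, strategy set, one utility function per player over strategy profiles) map directly onto a small data structure. The sketch below is one possible encoding, not a standard API: the class `NormalFormGame`, the table `PAYOFFS`, and the helper `pd_utility` are names invented for this illustration. The numbers are the prisoner's dilemma payoffs that appear later in these slides.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Sequence, Tuple

Profile = Tuple[str, ...]  # one strategy per player, in player order


@dataclass(frozen=True)
class NormalFormGame:
    """Hypothetical container for a finite normal-form game."""
    players: Sequence[int]                      # N = {1, ..., n}
    strategies: Sequence[str]                   # shared strategy set S
    utility: Callable[[int, Profile], float]    # u_i(s_1, ..., s_n)

    def profiles(self):
        """All strategy profiles, i.e. the elements of S^n."""
        return list(product(self.strategies, repeat=len(self.players)))


# Prisoner's dilemma payoffs: (row player's strategy, column player's strategy)
# -> (payoff of player 1, payoff of player 2).
PAYOFFS = {
    ("Don't confess", "Don't confess"): (-1, -1),
    ("Don't confess", "Confess"): (-9, 0),
    ("Confess", "Don't confess"): (0, -9),
    ("Confess", "Confess"): (-6, -6),
}


def pd_utility(player, profile):
    """Utility of `player` (1 or 2) under the given strategy profile."""
    return PAYOFFS[profile][player - 1]


game = NormalFormGame(
    players=(1, 2),
    strategies=("Don't confess", "Confess"),
    utility=pd_utility,
)

for profile in game.profiles():
    print(profile, [game.utility(i, profile) for i in game.players])
```

Passing the utility as a function rather than a table keeps the same structure usable for games whose payoffs are computed rather than listed.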

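THE ICE CREAM WARS
§ $N = \{1, 2\}$
§ $S = [0, 1]$
§ $s_i$ is the fraction of the beach (player $i$’s position along it)
§ …
§ Utility of player $i$ when the other player $j$ plays $s_j$:
$$u_i(s_i, s_j) = \begin{cases} \dfrac{s_i + s_j}{2}, & s_i < s_j \\ 1 - \dfrac{s_i + s_j}{2}, & s_i > s_j \\ \dfrac{1}{2}, & s_i = s_j \end{cases}$$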
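The piecewise utility of the ice cream game translates directly into a function. This is a minimal sketch, assuming each strategy is a position in [0, 1] along the beach and the utility is the share of the beach a player captures (everything on its own side of the midpoint between the two positions). The function name `ice_cream_utility` is chosen for this illustration.

```python
def ice_cream_utility(s_i: float, s_j: float) -> float:
    """Share of the beach captured by the player at s_i when the rival is at s_j."""
    if not (0.0 <= s_i <= 1.0 and 0.0 <= s_j <= 1.0):
        raise ValueError("positions must lie in [0, 1]")
    midpoint = (s_i + s_j) / 2
    if s_i < s_j:
        # Player i is to the left: it serves everything up to the midpoint.
        return midpoint
    if s_i > s_j:
        # Player i is to the right: it serves everything beyond the midpoint.
        return 1 - midpoint
    # Same position: the beach is split evenly.
    return 0.5


print(ice_cream_utility(0.25, 0.75))  # symmetric positions
print(ice_cream_utility(0.5, 1.0))    # player i at the centre, rival at the far end
print(ice_cream_utility(1.0, 0.5))    # the rival's view of the same situation
```

In the second and third calls the two shares add up to 1, as they should, since the two players split the whole beach between them.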

THE PRISONER’S DILEMMA (1962)
§ Two men are charged with a crime. The police suspect that they are the authors of the crime but do not have enough evidence
§ They are taken into custody and can’t communicate with each other
§ They are told that:
o If one rats out and the other does not, the rat will be freed and the other jailed for 9 years
o If both rat out, both will be jailed for 6 years
§ They also know that if neither rats out, both will be jailed for 1 year

As a game, with $C$ = Confess and $D$ = Don’t confess:
§ $N = \{1, 2\}$
§ $S = \{\text{Confess}, \text{Don’t confess}\}$
§ Strategy profiles: $\{(C, C), (C, D), (D, C), (D, D)\}$
§ $u_1(C, C) = 6$, $u_1(C, D) = 0$, $u_1(D, C) = 9$, $u_1(D, D) = 1$ (values are years in jail, so lower is better)
§ Symmetric for $u_2$


PRISONER’S DILEMMA: PAYOFF MATRIX
Don’t confess = Don’t rat out = Cooperate with each other
Confess = Rat out = Don’t cooperate with each other, act selfishly!

                      B: Don’t confess    B: Confess
A: Don’t confess      -1, -1              -9, 0
A: Confess            0, -9               -6, -6

What would you do?

PRISONER’S DILEMMA: PAYOFF MATRIX

                      B: Don’t confess    B: Confess
A: Don’t confess      -1, -1              -9, 0
A: Confess            0, -9               -6, -6

B Don’t confess:
§ If A doesn’t confess, B gets -1
§ If A confesses, B gets -9
B Confess:
§ If A doesn’t confess, B gets 0
§ If A confesses, B gets -6
Rational agent B opts to Confess

PRISONER’S DILEMMA
§ Confess (Defection = Acting selfishly) is a dominant strategy for B: no matter what A plays, B’s best reply is always to confess
§ (Strictly) dominant strategy: yields a player a strictly higher payoff, regardless of which decision(s) the other player(s) choose
§ Weakly dominant strategy: ties in some cases
§ Because of symmetry, Confess is also a dominant strategy for A
§ A will reason as follows: B’s dominant strategy is to Confess; therefore, given that we are both rational agents, B will Confess, as I will, and we will both get 6 years
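The dominance argument can be checked by brute force on the payoff matrix. The sketch below assumes the matrix from the slides, with player 0 standing for A (rows) and player 1 for B (columns). The names `PAYOFFS`, `payoff`, and `is_dominant` are invented for this illustration, and weak dominance is taken in the usual sense of "never worse, and strictly better against at least one opposing choice".

```python
# (A's strategy, B's strategy) -> (A's payoff, B's payoff), from the payoff matrix.
PAYOFFS = {
    ("Don't confess", "Don't confess"): (-1, -1),
    ("Don't confess", "Confess"): (-9, 0),
    ("Confess", "Don't confess"): (0, -9),
    ("Confess", "Confess"): (-6, -6),
}
STRATEGIES = ("Don't confess", "Confess")


def payoff(player, own, other):
    """Payoff of `player` (0 = A, 1 = B) when it plays `own` and the rival plays `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def is_dominant(player, candidate, strict=True):
    """Check whether `candidate` dominates every other strategy of `player`."""
    for alternative in STRATEGIES:
        if alternative == candidate:
            continue
        # Gain of the candidate over the alternative, for each rival choice.
        gains = [
            payoff(player, candidate, other) - payoff(player, alternative, other)
            for other in STRATEGIES
        ]
        if strict:
            if not all(g > 0 for g in gains):
                return False
        elif not (all(g >= 0 for g in gains) and any(g > 0 for g in gains)):
            return False
    return True


for player, name in ((0, "A"), (1, "B")):
    for strategy in STRATEGIES:
        print(name, strategy, is_dominant(player, strategy))
```

Only Confess passes the strict test, and it does so for both players, which matches the symmetry argument on the slide.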

PRISONER’S DILEMMA
§ But is the dominant-strategy profile (Confess, Confess) the best outcome?

                      B: Don’t confess    B: Confess
A: Don’t confess      -1, -1              -9, 0
A: Confess            0, -9               -6, -6

PARETO OPTIMALITY VS. EQUILIBRIA
§ Being selfish is a dominant strategy, but the players can do much better by cooperating: (-1, -1), which is a Pareto-optimal outcome
§ Pareto optimality: an outcome such that there is no other outcome that makes any player better off without making at least one other player worse off → Outcome (Don’t confess, Don’t confess): (-1, -1)
§ A strategy profile forms an equilibrium if no player can benefit by switching strategies, given that every other player sticks with the same strategy, which is the case for (Confess, Confess)
§ An equilibrium is a local optimum in the space of strategies
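Both definitions can be tested exhaustively on the 2x2 payoff matrix. The following sketch assumes the prisoner's dilemma payoffs from the slides and restricts attention to pure strategies. The helper names `is_equilibrium` and `is_pareto_optimal` are chosen for this illustration.

```python
# (A's strategy, B's strategy) -> (A's payoff, B's payoff), from the payoff matrix.
PAYOFFS = {
    ("Don't confess", "Don't confess"): (-1, -1),
    ("Don't confess", "Confess"): (-9, 0),
    ("Confess", "Don't confess"): (0, -9),
    ("Confess", "Confess"): (-6, -6),
}
STRATEGIES = ("Don't confess", "Confess")


def is_equilibrium(profile):
    """True if no single player gains by deviating while the other stays put."""
    for player in (0, 1):
        for deviation in STRATEGIES:
            deviated = list(profile)
            deviated[player] = deviation
            if PAYOFFS[tuple(deviated)][player] > PAYOFFS[profile][player]:
                return False
    return True


def is_pareto_optimal(profile):
    """True if no other outcome helps some player without hurting another."""
    current = PAYOFFS[profile]
    for other in PAYOFFS.values():
        no_one_worse = all(o >= c for o, c in zip(other, current))
        someone_better = any(o > c for o, c in zip(other, current))
        if no_one_worse and someone_better:
            return False
    return True


equilibria = [p for p in PAYOFFS if is_equilibrium(p)]
pareto = [p for p in PAYOFFS if is_pareto_optimal(p)]
print("Equilibria:", equilibria)
print("Pareto-optimal:", pareto)
```

Under the definition as stated, the asymmetric outcomes (-9, 0) and (0, -9) also pass the Pareto test, since improving the jailed player's payoff necessarily hurts the freed one. The only outcome that fails is (Confess, Confess), which is also the only equilibrium.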
