Inverse Game Theory: Learning Utilities in Succinct Games
Hesam Nikpey, Pooya Shati
Social and Economical Networks, Dr. Fazli
Spring 96-97

PAPER
Inverse Game Theory: Learning Utilities in Succinct Games
Volodymyr Kuleshov and Okke Schrijvers
WINE 2015 conference

OUTLINE
Problem Introduction
Related Works
Equilibrium Concepts
Succinct Games
Rationalizing a Game
Learning Utilities

PROBLEM INTRODUCTION
Classic Game Theory
Inverse Game Theory
Succinct Games

APPLICATIONS
Economics: designing mechanisms
Machine learning: helicopter autopilots
Developing predictive techniques
Forecasting the agents' behavior

RELATED WORKS
Computer science:
  Computational complexity of rationalizing stable matchings
  Correlated equilibria
Economics:
  Inferring utilities of bidders in online ad auctions
  Rationalizing agent behavior

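NASH EQUILIBRIUM
Each player $j$ chooses a mixed strategy, a distribution over her actions $B_j$: $q_j \in E(B_j)$
And no one is interested in changing her choice:
$$\forall r_j \in E(B_j):\quad v_j(q_j, q_{-j}) \ge v_j(r_j, q_{-j})$$
(Figure: the product distribution formed by the marginals $q_1$ and $q_2$)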
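The Nash condition can be checked numerically for a small two-player game. The sketch below is only an illustration: the function names and the matching-pennies payoffs are assumptions, not part of the paper. It relies on the standard fact that comparing against pure deviations is enough, because the expected utility is linear in the deviating player's own mixed strategy.

```python
def expected_utility(v, q_row, q_col):
    """Expected utility of payoff table v under independent mixed strategies."""
    return sum(q_row[a] * q_col[b] * v[a][b]
               for a in range(len(q_row)) for b in range(len(q_col)))


def is_nash(v1, v2, q1, q2, tol=1e-9):
    """Check that neither player gains by switching to any pure action."""
    u1 = expected_utility(v1, q1, q2)
    u2 = expected_utility(v2, q1, q2)
    for a in range(len(q1)):
        pure = [1.0 if i == a else 0.0 for i in range(len(q1))]
        if expected_utility(v1, pure, q2) > u1 + tol:
            return False
    for b in range(len(q2)):
        pure = [1.0 if i == b else 0.0 for i in range(len(q2))]
        if expected_utility(v2, q1, pure) > u2 + tol:
            return False
    return True


# Hypothetical example: matching pennies (row player wants to match).
v1 = [[1, -1], [-1, 1]]
v2 = [[-1, 1], [1, -1]]
uniform_is_nash = is_nash(v1, v2, [0.5, 0.5], [0.5, 0.5])
```

Here `uniform_is_nash` is true, while a profile in which the row player always plays her first action fails the check because the column player can profitably deviate.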

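CORRELATED EQUILIBRIUM
$q$ is a joint distribution over outcomes, not necessarily a product of distributions
Equilibrium defined as, for every player $j$ and every pair of her actions $b^j_k, b^j_l$:
$$\sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_k, b_{-j}) \ge \sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_l, b_{-j})$$
(Figure: $q$ drawn as a table with entries $q_{1,1}, q_{1,2}, \dots, q_{1,|B_j|}, \dots, q_{|B_k|,1}, \dots, q_{|B_k|,|B_j|}$)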
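The correlated equilibrium inequality can be verified by direct enumeration in a small game. The following sketch is an illustration under stated assumptions: the function name, the dictionary-based representation, and the game of chicken payoffs are hypothetical choices, not taken from the paper.

```python
from itertools import product


def is_correlated_equilibrium(q, v, actions, tol=1e-9):
    """q: dict outcome -> probability; v[j]: dict outcome -> utility of player j;
    actions[j]: list of player j's actions. Outcomes are tuples of actions."""
    players = range(len(actions))
    for j in players:
        others = [actions[i] for i in players if i != j]
        for b_k in actions[j]:          # recommended action
            for b_l in actions[j]:      # candidate deviation
                gain = 0.0
                for rest in product(*others):
                    obs = rest[:j] + (b_k,) + rest[j:]
                    dev = rest[:j] + (b_l,) + rest[j:]
                    gain += q.get(obs, 0.0) * (v[j][obs] - v[j][dev])
                if gain < -tol:         # following the recommendation is worse
                    return False
    return True


# Hypothetical example: game of chicken with actions D (dare) and C (chicken).
actions = [["D", "C"], ["D", "C"]]
v = [
    {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6},
    {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6},
]
q_good = {("D", "C"): 1 / 3, ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}
q_bad = {("C", "C"): 1.0}
```

With these payoffs `q_good` passes the check, while `q_bad` fails because a player told to play C would rather dare.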

POLYNOMIAL MIXTURE OF PRODUCTS
A specific kind of correlated equilibrium
The probability distribution is a sum of products of distributions:
$$q = \sum_{l=1}^{L} r_l$$
where $L$ is polynomial in the input size and every $r_l$ is a product of distributions
Every game has an easy-to-compute PMP equilibrium

SUCCINCT GAMES
Every player's utility is determined by a limited number of observations
Interesting for the small number of parameters required to represent the utility
Covering a vast number of games

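SUCCINCT GAMES: Definition
LINEAR SUCCINCT GAMES
A set of (not necessarily disjoint) factors for every player and a utility for every factor
$$H := \left[ (B_j)_{j=1}^{o},\ (w_j)_{j=1}^{o},\ (P_j)_{j=1}^{o} \right]$$
$$P_j \in \{0,1\}^{n \times e}, \qquad w_j \in \mathbb{R}^{e}, \qquad \forall j:\ v_j = P_j w_j$$
Here $n$ is the number of outcomes and $e$ is the number of factors
$P_j$ is dimensionally large but has a compact representation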

SUCCINCT GAMES: Example 1
GRAPHICAL GAMES
Player $j$'s utility depends solely on her own and her neighbors' actions.
$$P_j(b, b_{O(j)}) = \begin{cases} 1 & \text{if } b \text{ and } b_{O(j)} \text{ agree on the actions of } O(j) \\ 0 & \text{otherwise.} \end{cases}$$

SUCCINCT GAMES: Example 2
CONGESTION GAMES
Players choose from possible subsets of the set of resources.
Each player pays the cost of its chosen resources according to the function:
$$\sum_{f \in b_j} e_f(m_f)$$
where $e_f$ is $f$'s cost function and $m_f$ is the number of players using $f$
General case of network flow games
$$P_j(b, (f, M)) = \begin{cases} 1 & \text{if } f \in b_j \text{ and } m_f(b) = M \\ 0 & \text{otherwise.} \end{cases}$$
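The cost function and the factor indicator of a congestion game can be illustrated with a few lines of Python. This is a minimal sketch: the function names, the resource labels, and the cost functions are hypothetical, and each action is represented as a set of resources.

```python
from collections import Counter


def resource_loads(profile):
    """Number of players using each resource, i.e. m_f for every f."""
    return Counter(f for chosen in profile for f in chosen)


def player_cost(profile, j, cost_fns):
    """Sum of e_f(m_f) over the resources f chosen by player j."""
    loads = resource_loads(profile)
    return sum(cost_fns[f](loads[f]) for f in profile[j])


def triggered_factors(profile, j):
    """Factors (f, M) with P_j(b, (f, M)) = 1: f is in b_j and m_f(b) = M."""
    loads = resource_loads(profile)
    return {(f, loads[f]) for f in profile[j]}


# Hypothetical example: three players, two resources.
cost_fns = {"top": lambda m: m, "bottom": lambda m: 2}
profile = [{"top"}, {"top"}, {"bottom"}]
cost_of_player_0 = player_cost(profile, 0, cost_fns)
factors_of_player_0 = triggered_factors(profile, 0)
```

Two players share `top`, so player 0 pays the cost of `top` at load 2 and triggers the single factor `("top", 2)`.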

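RATIONALIZING A GAME
First we write the correlated equilibrium as a linear constraint:
$$\sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_k, b_{-j}) \ge \sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_l, b_{-j})$$
$$\rightarrow\quad q^\top D_{jkl}\, v_j = q^\top D_{jkl}\, P_j w_j \ge 0$$
where $D_{jkl}$ is the matrix, indexed by a row outcome $b^{row}$ and a column outcome $b^{col}$, with entries
$$D_{jkl}(b^{row}, b^{col}) = \begin{cases} 1 & \text{if } b^{row}_j = b^j_k \text{ and } b^{col} = b^{row} \\ -1 & \text{if } b^{row}_j = b^j_k \text{ and } b^{col} = (b^j_l, b^{row}_{-j}) \\ 0 & \text{otherwise.} \end{cases}$$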

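RATIONALIZING A GAME: Example
For player 1 in a game with two players and two actions each, with $k = 1$ and $l = 2$:
$$q(b^1_1, b^2_1)\, v_1(b^1_1, b^2_1) + q(b^1_1, b^2_2)\, v_1(b^1_1, b^2_2) \ge q(b^1_1, b^2_1)\, v_1(b^1_2, b^2_1) + q(b^1_1, b^2_2)\, v_1(b^1_2, b^2_2)$$
In matrix form:
$$\begin{pmatrix} r_1 & r_2 & r_3 & r_4 \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} v_1(b^1_1, b^2_1) \\ v_1(b^1_1, b^2_2) \\ v_1(b^1_2, b^2_1) \\ v_1(b^1_2, b^2_2) \end{pmatrix} \ge 0$$
Where:
$r_1 = q(b^1_1, b^2_1)$
$r_2 = q(b^1_1, b^2_2)$
$r_3 = q(b^1_2, b^2_1)$
$r_4 = q(b^1_2, b^2_2)$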
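The matrix in this example can be generated mechanically. The sketch below is a hypothetical helper, not code from the paper: it places +1 on the observed outcome and -1 on the outcome where player j deviates from action b_k to b_l, and it enumerates all outcomes explicitly, so it is only meant for tiny games.

```python
from itertools import product


def deviation_matrix(actions, j, b_k, b_l):
    """Return (outcomes, D) where D is the constraint matrix for player j
    being recommended b_k and considering a deviation to b_l."""
    outcomes = list(product(*actions))
    index = {b: i for i, b in enumerate(outcomes)}
    D = [[0] * len(outcomes) for _ in outcomes]
    for b in outcomes:
        if b[j] != b_k:
            continue  # rows where j was not told to play b_k stay zero
        deviated = b[:j] + (b_l,) + b[j + 1:]
        D[index[b]][index[b]] += 1
        D[index[b]][index[deviated]] -= 1
    return outcomes, D


def constraint_value(q, D, v):
    """Evaluate q^T D v for vectors q and v ordered like the outcomes."""
    return sum(q[r] * D[r][c] * v[c]
               for r in range(len(q)) for c in range(len(v)))


# Two players with actions 1 and 2; player index 0, recommended 1, deviation 2.
outcomes, D = deviation_matrix([[1, 2], [1, 2]], 0, 1, 2)
```

For this input `D` has the rows `[1, 0, -1, 0]`, `[0, 1, 0, -1]` and two zero rows, matching the example. `constraint_value` is nonnegative exactly when the corresponding equilibrium inequality holds.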

NON-DEGENERACY CONDITION
To avoid trivial, uninteresting solutions like $w_j = 0$, we add the condition:
$$\forall j:\quad \sum_{l=1}^{e} w_{jl} = 1$$
Furthermore, by adding constraints or tweaking the objective function of the optimization problem:
We can limit the answer space
We can add conditions based on prior knowledge of valuations and their coupling
We can encourage properties like sparsity and entropy

INVERSE-UTILITY PROBLEM
FORMAL DEFINITION
A set of $M$ partially observed succinct $o$-player games:
$$H_m = \left[ (B_{jm})_{j=1}^{o},\ \cdot\ ,\ (P_{jm})_{j=1}^{o} \right] \quad \text{for } m \in \{1, 2, \dots, M\}$$
Each with an equilibrium: $(q_m)_{m=1}^{M}$
Find $(w_j)_{j=1}^{o}$
Such that
$$\forall m, j, k, l:\quad q_m^\top D_{jklm}\, P_{jm}\, w_j \ge 0$$

COMPUTABILITY PROPERTY
We need to compute $d_{jkl}^\top = q^\top D_{jkl} P_j$ efficiently
Computing the probability of each factor is feasible in games that possess this property:
The following sum can be computed in polynomial time for any factor $p$, product distribution $q$ and action $b^j_k$:
$$\sum_{b_{-j}:\ (b^j_k, b_{-j}) \in B_j(p)} q(b_{-j})$$

COMPUTABILITY PROPERTY: Example
CONGESTION GAMES
Each factor is a tuple $(f, M)$, meaning that player $j$ and $M - 1$ other players used the resource $f$
The answer for the $f \notin b^j_k$ case is trivial
Otherwise we use dynamic programming to compute the probability of the sum of Bernoulli random variables being $M - 1$
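The dynamic program mentioned here can be sketched as follows. It assumes that, under the product distribution, each other player independently uses resource f with a known probability; the function names and the list-based representation are illustrative assumptions. The table is extended one player at a time, so the running time is quadratic in the number of players.

```python
def count_distribution(use_probs):
    """dist[c] = probability that exactly c of the independent Bernoulli
    variables (one per other player) equal 1."""
    dist = [1.0]
    for p in use_probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, mass in enumerate(dist):
            nxt[c] += mass * (1.0 - p)   # this player does not use f
            nxt[c + 1] += mass * p       # this player uses f
        dist = nxt
    return dist


def factor_probability(use_probs, M):
    """Probability that exactly M - 1 of the other players use f,
    i.e. that factor (f, M) is triggered when player j uses f."""
    dist = count_distribution(use_probs)
    return dist[M - 1] if 1 <= M <= len(dist) else 0.0


# Hypothetical example: two other players, each using f with probability 0.5.
half_half = count_distribution([0.5, 0.5])
```

With two other players who each use f with probability 0.5, the counts 0, 1 and 2 have probabilities 0.25, 0.5 and 0.25.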

LEARNING UTILITIES (1): Computing $d_{jkl}^\top$
We had
$$\sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_k, b_{-j}) \ge \sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_l, b_{-j})$$
Rewriting the left-hand side:
$$\sum_{b_{-j}} q(b^j_k, b_{-j}) \sum_{p \in U_j(b^j_k, b_{-j})} w_j(p)$$
$$= \sum_{p \in P_j}\ \sum_{b_{-j}:\ (b^j_k, b_{-j}) \in B_j(p)} q(b^j_k, b_{-j})\, w_j(p)$$
$$= \sum_{p \in P_j} w_j(p) \sum_{b_{-j}:\ (b^j_k, b_{-j}) \in B_j(p)} q(b^j_k, b_{-j})$$
where $U_j(b) = \{ p : P_j(b, p) = 1 \}$ represents the set of factors triggered by $b$
Similarly, for the right-hand side we have:
$$\sum_{b_{-j}} q(b^j_k, b_{-j})\, v_j(b^j_l, b_{-j}) = \sum_{p \in P_j} w_j(p) \sum_{b_{-j}:\ (b^j_l, b_{-j}) \in B_j(p)} q(b^j_k, b_{-j})$$

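LEARNING UTILITIES (2): Computing $d_{jkl}^\top$
Subtracting the two results we have:
$$\sum_{p \in P_j} w_j(p) \left[ \sum_{b_{-j}:\ (b^j_k, b_{-j}) \in B_j(p)} q(b^j_k, b_{-j}) - \sum_{b_{-j}:\ (b^j_l, b_{-j}) \in B_j(p)} q(b^j_k, b_{-j}) \right] \ge 0$$
We can factor $q$ out, considering that it is a product of distributions: $q(b^j_k, b_{-j}) = q_j(b^j_k)\, q(b_{-j})$, and the nonnegative factor $q_j(b^j_k)$ is common to every term (if it is zero, the constraint holds trivially).
$$\sum_{p \in P_j} w_j(p) \left[ \sum_{b_{-j}:\ (b^j_k, b_{-j}) \in B_j(p)} q(b_{-j}) - \sum_{b_{-j}:\ (b^j_l, b_{-j}) \in B_j(p)} q(b_{-j}) \right] \ge 0$$
The remaining inequality is the dot product of $w_j$ and another vector (namely $d_{jkl}^\top$), which we know how to compute efficiently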
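The bracketed coefficients can be computed for a toy game by enumerating the other players' actions. The sketch below is only an illustration under stated assumptions: the function names, the `triggered` callback, and the two-resource example are hypothetical. It is exponential in the number of players, whereas in games with the computability property the enumeration would be replaced by the efficient per-factor sum.

```python
from itertools import product


def coefficient_vector(actions, marginals, j, b_k, b_l, triggered):
    """Return a dict mapping factor p to its coefficient: the probability that
    p is triggered when j plays b_k minus the probability when j plays b_l.
    marginals[i] maps each action of player i to its probability;
    triggered(j, b) returns the set of factors of player j triggered by b."""
    others = [i for i in range(len(actions)) if i != j]
    d = {}
    for rest in product(*(actions[i] for i in others)):
        prob = 1.0
        for i, b_i in zip(others, rest):
            prob *= marginals[i][b_i]   # product distribution over the others
        for action, sign in ((b_k, 1.0), (b_l, -1.0)):
            b = rest[:j] + (action,) + rest[j:]
            for p in triggered(j, b):
                d[p] = d.get(p, 0.0) + sign * prob
    return d


def congestion_factors(j, b):
    """Single-resource congestion game: factor (f, number of users of f)."""
    f = b[j]
    return {(f, sum(1 for x in b if x == f))}


# Hypothetical example: two players, each picking resource "x" or "y".
actions = [["x", "y"], ["x", "y"]]
marginals = [{"x": 0.5, "y": 0.5}, {"x": 0.5, "y": 0.5}]
d = coefficient_vector(actions, marginals, 0, "x", "y", congestion_factors)
```

A candidate valuation `w` then satisfies the constraint when the sum of `d[p] * w[p]` over all factors is nonnegative.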

LEARNING UTILITIES: Optimization Problem
Combining these linear programs for every game results in valid valuations for each player:
$$\text{Minimize} \quad \sum_{j=1}^{o} g(w_j)$$
$$\text{Subject to} \quad d_{jkl}^\top w_j \ge 0 \quad \forall j, k, l$$
$$\mathbf{1}^\top w_j = 1 \quad \forall j$$
Of course, the resulting program is not necessarily feasible
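Before handing the program to a solver, it can help to check a candidate valuation against the constraints by hand. The following is a minimal sketch of such a feasibility check, not a solver and not the paper's method; the function name and the dictionary representation of the vectors are assumptions made for illustration.

```python
def is_valid_valuation(w, constraints, tol=1e-9):
    """w: dict factor -> utility for one player.
    constraints: list of coefficient dicts d (factor -> coefficient).
    Checks the normalization 1^T w = 1 and every constraint d^T w >= 0."""
    if abs(sum(w.values()) - 1.0) > tol:
        return False
    for d in constraints:
        if sum(d.get(p, 0.0) * w[p] for p in w) < -tol:
            return False
    return True


# Hypothetical example: one constraint saying factor "a" is worth at least "b".
constraints = [{"a": 1.0, "b": -1.0}]
feasible = is_valid_valuation({"a": 0.7, "b": 0.3}, constraints)
infeasible = is_valid_valuation({"a": 0.3, "b": 0.7}, constraints)
```

If no candidate passes, the observed behavior may not be rationalizable with the assumed structure, which matches the remark that the program is not necessarily feasible.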

INVERSE-GAME PROBLEM
FORMAL DEFINITION
A set of $M$ partially observed succinct $o$-player games:
$$H_m = \left[ (B_{jm})_{j=1}^{o},\ \cdot\ ,\ \cdot\ \right] \quad \text{for } m \in \{1, 2, \dots, M\}$$
Each with an equilibrium: $(q_m)_{m=1}^{M}$
Each with a set of candidate structures: $(T_m)_{m=1}^{M}$
Find $(w_j)_{j=1}^{o}$ and choose a structure $(P_{jmh})_{j=1}^{o}$ for each game
Such that
$$\forall m, j, k, l:\quad q_m^\top D_{jklm}\, P_{jmh}\, w_j \ge 0$$

INVERSE-GAME PROBLEM: NP-HARDNESS
PROOF SKETCH
3-SAT reduction to a sequence of graphical games
For every variable, a vertex with true and false actions, plus one base player with only one action
For every clause, a game with three candidate structures, each containing a single edge between one of the literals and the base node
Nodes of positive literals play true, and nodes of negative literals play false, as pure strategies.

THANKS FOR YOUR ATTENTION Q & A
