CS 331: Artificial Intelligence Game Theory I
Prisoner’s Dilemma
You and your partner have both been caught red-handed near the scene of a burglary. Both of you have been brought to the police station, where you are interrogated.
             B: S1           B: S2           B: S3
A: S1    A = 0, B = 4    A = 4, B = 0    A = 5, B = 3
A: S2    A = 4, B = 0    A = 0, B = 4    A = 5, B = 3
A: S3    A = 3, B = 5    A = 3, B = 5    A = 6, B = 6
B won’t change his strategy of S3: payoff of 6 > 5 (S2) and 6 > 5 (S1).
A won’t change her strategy of S3: payoff of 6 > 5 (S2) and 6 > 5 (S1).
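Strategies $s_1^* \in S_1$, $s_2^* \in S_2$, …, $s_n^* \in S_n$ are a Nash equilibrium iff:
$$s_i^* = \arg\max_{s_i} u_i(s_1^*, \ldots, s_{i-1}^*, s_i, s_{i+1}^*, \ldots, s_n^*) \quad \text{for all } i$$
where $u_i$ denotes player i's payoff.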
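The definition can be checked mechanically on the 3x3 table above. The following is a minimal sketch, assuming rows are A's strategies and columns are B's; the names `PAYOFFS` and `pure_nash_equilibria` are illustrative and not from the slides. It only looks for pure-strategy equilibria.

```python
from itertools import product

# Payoffs (A, B) from the 3x3 table; rows are A's strategy, columns are B's.
# Assumption: index 0, 1, 2 stands for S1, S2, S3.
PAYOFFS = [
    [(0, 4), (4, 0), (5, 3)],
    [(4, 0), (0, 4), (5, 3)],
    [(3, 5), (3, 5), (6, 6)],
]

def pure_nash_equilibria(payoffs):
    """Return (row, col) pairs where neither player gains by deviating alone."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        a_payoff, b_payoff = payoffs[r][c]
        # A deviates by changing the row while B's column stays fixed.
        a_ok = all(payoffs[r2][c][0] <= a_payoff for r2 in range(n_rows))
        # B deviates by changing the column while A's row stays fixed.
        b_ok = all(payoffs[r][c2][1] <= b_payoff for c2 in range(n_cols))
        if a_ok and b_ok:
            equilibria.append((r, c))
    return equilibria

print(pure_nash_equilibria(PAYOFFS))  # [(2, 2)], i.e. (S3, S3)
```

The check uses `<=` rather than `<`, so a cell where a deviation merely ties still counts as an equilibrium.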
5p - 3 = -7p + 4 => 12p = 7 => p = 7/12
When p < 7/12, E plays ‘two’. When p > 7/12, E plays ‘one’.
O gets to pick p to minimize E’s expected payoff. O picks the lowest point of the higher of the two lines. This happens at the intersection of the two lines.
E’s expected payoff at p = 7/12 is 5(7/12) - 3 = 35/12 - 36/12 = -1/12.
O’s mixed strategy is (7/12 for ‘one’, 5/12 for ‘two’).
[Figure: E's expected payoff if O plays 'one' with probability p and 'two' with probability (1-p). Horizontal axis: p. Vertical axis: expected payoff to E. One line for E plays 'one', one line for E plays 'two'.]
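The intersection arithmetic for E's two payoff lines, 5p - 3 and -7p + 4, can be reproduced with exact fractions. This is a small sketch; the helper name `intersect` and the (slope, intercept) representation are assumptions made for illustration.

```python
from fractions import Fraction

def intersect(line_a, line_b):
    """Intersect two lines given as (slope, intercept); return (p, value)."""
    (a1, b1), (a2, b2) = line_a, line_b
    if a1 == a2:
        raise ValueError("parallel lines have no unique intersection")
    # a1*p + b1 = a2*p + b2  =>  p = (b2 - b1) / (a1 - a2)
    p = Fraction(b2 - b1, a1 - a2)
    return p, a1 * p + b1

# E's expected payoff lines from the text.
E_ONE = (5, -3)   # E plays 'one': 5p - 3
E_TWO = (-7, 4)   # E plays 'two': -7p + 4

p_star, value = intersect(E_ONE, E_TWO)
print(p_star, value)  # 7/12 -1/12
```

Using `Fraction` keeps 7/12 and -1/12 exact instead of introducing floating-point rounding.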
-5q + 3 = 7q - 4 => 7 = 12q => q = 7/12
When q < 7/12, O plays ‘one’. When q > 7/12, O plays ‘two’.
E gets to pick q to minimize O’s expected payoff. E picks the lowest point of the higher of the two lines. This happens at the intersection of the two lines.
O’s expected payoff at q = 7/12 is -5(7/12) + 3 = -35/12 + 36/12 = 1/12.
E’s mixed strategy is (7/12 for ‘one’, 5/12 for ‘two’).
[Figure: O's expected payoff when E plays 'one' with probability q and 'two' with probability (1-q). Horizontal axis: q. Vertical axis: O's expected payoff. One line for O plays 'one', one line for O plays 'two'.]
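If A plays S1: m11p + m21(1-p)
If A plays S2: m12p + m22(1-p)
B picks p, the probability of playing S1, to minimize max( m11p + m21(1-p), m12p + m22(1-p) ). The minimum occurs where the two lines intersect.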
looking at B’s payoffs now

            B: S1      B: S2
A: S1     A = m11    A = m21
A: S2     A = m12    A = m22
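Setting the two expressions m11p + m21(1-p) and m12p + m22(1-p) equal gives a closed form for the crossing point. The derivation below is a sketch under two assumptions: the denominator is nonzero, and the lines cross inside [0, 1] with slopes of opposite sign. Otherwise an endpoint, p = 0 or p = 1, attains the minimum of the max.

```latex
\begin{aligned}
m_{11}\,p + m_{21}(1-p) &= m_{12}\,p + m_{22}(1-p) \\
(m_{11} - m_{21} - m_{12} + m_{22})\,p &= m_{22} - m_{21} \\
p &= \frac{m_{22} - m_{21}}{m_{11} - m_{21} - m_{12} + m_{22}}
\end{aligned}
```

As a check against the E/O example, whose lines 5p - 3 and -7p + 4 correspond to m11 = 2, m21 = -3, m12 = -3, m22 = 4, this gives p = (4 + 3) / (2 + 3 + 3 + 4) = 7/12.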
Strategy S1 with probability p1
Strategy S2 with probability p2
:
Strategy SN with probability pN

Pure strategy S1: e1 = m11p1 + m21p2 + … + mN1pN
Pure strategy S2: e2 = m12p1 + m22p2 + … + mN2pN
:
Pure strategy SM: eM = m1Mp1 + m2Mp2 + … + mNMpN

Minimize max( e1, e2, …, eM ) subject to Σ pi = 1 and 0 ≤ pi ≤ 1 for all i
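The expressions e1 … eM and their maximum can be evaluated directly for any candidate mixed strategy. The sketch below only evaluates the objective and does not search for the minimizing probabilities, a search usually posed as a linear program. The names `pure_strategy_payoffs`, `worst_case`, and `M` are illustrative. The matrix entries are recovered from the E/O lines 5p - 3 and -7p + 4, on the assumption that `m[i][j]` holds the payoff when the mixing player plays S(i+1) and the other player plays S(j+1).

```python
from fractions import Fraction

def pure_strategy_payoffs(m, probs):
    """Return [e_1, ..., e_M], where e_j = m[0][j]*p_1 + ... + m[N-1][j]*p_N."""
    if abs(sum(probs) - 1) > 1e-9 or any(p < 0 or p > 1 for p in probs):
        raise ValueError("probs must be a probability distribution")
    n_mixed, n_pure = len(m), len(m[0])
    return [sum(m[i][j] * probs[i] for i in range(n_mixed)) for j in range(n_pure)]

def worst_case(m, probs):
    """Payoff the opponent gets by best-responding with a pure strategy."""
    return max(pure_strategy_payoffs(m, probs))

# Assumed matrix for the E/O game: m11 = 2, m12 = -3, m21 = -3, m22 = 4.
M = [[2, -3],
     [-3, 4]]

print(worst_case(M, [Fraction(7, 12), Fraction(5, 12)]))  # -1/12
print(worst_case(M, [Fraction(1, 2), Fraction(1, 2)]))    # 1/2
```

The 50/50 mix gives the opponent 1/2, which is worse for the mixing player than the -1/12 conceded by the (7/12, 5/12) mix.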
– Assumes opponents will play the equilibrium strategy
– What to do with multiple Nash equilibria?
– Computing Nash equilibria for complex games is nasty (perhaps even intractable)
– Players have non-stationary policies