PENG Session 2 Roland Mühlenbernd Seminar für - - PowerPoint PPT Presentation
PENG Session 2. Roland Mühlenbernd, Seminar für Sprachwissenschaft, University of Tübingen
Review
Prominent 2-player Games:
       C       D
C    3, 3    0, 5
D    5, 0    1, 1
C: Cooperate, D: Defect
       S       R
S    2, 2    0, 1
R    1, 0    1, 1
S: Stag, R: Rabbit
       B       S
B    2, 1    0, 0
S    0, 0    1, 2
B: Bach, S: Stravinsky
Signaling Game SG = ⟨{S, R}, T, Pr, M, A, U⟩
[Figure: game tree. Nature N chooses type t1 or t2 with probability .5 each; the sender S chooses message m1 or m2; the receiver R chooses action a1 or a2.]
Repeated Games: Decisions
◮ From an agent’s perspective a game is a decision problem
◮ The agent has to decide between different moves (e.g. cooperate or defect, m1 or m2, a1 or a2)
◮ An agent’s decision can be guided by
  ◮ update dynamics
  ◮ learning dynamics
  ◮ reasoning
  ◮ beliefs about participant
  ◮ best response
  ◮ imitation
  ◮ chance
Repeated Games: Update Dynamics
◮ Learning Dynamics: Collecting information from previous encounters
◮ Reasoning: Forward induction (ai expects that aj expects that ... plays defect)
◮ Best response: ai plays the move that maximizes its utility, given what it knows or believes about the opponent’s move
◮ Imitate the Best: Play the move that yielded the maximal utility in the last round among all neighbours
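The last two rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the representation of a belief as a probability table, and the `(move, utility)` pairs for neighbours are all assumptions made for the example. The payoff values are the row player's payoffs from the review tables.

```python
# Row player's payoffs, taken from the review tables.
PRISONERS_DILEMMA = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
STAG_HUNT = {("S", "S"): 2, ("S", "R"): 0, ("R", "S"): 1, ("R", "R"): 1}


def best_response(payoff, belief):
    """Return the move with the highest expected utility.

    `payoff` maps (own move, opponent move) to the own utility.
    `belief` maps each opponent move to its believed probability
    (hypothetical representation, chosen for this sketch).
    """
    own_moves = sorted({own for own, _ in payoff})

    def expected_utility(move):
        return sum(p * payoff[(move, opp)] for opp, p in belief.items())

    return max(own_moves, key=expected_utility)


def imitate_the_best(last_round):
    """Return the move of the neighbour with the highest utility last round.

    `last_round` is a list of (move, utility) pairs, one per neighbour.
    Ties go to the first neighbour listed (an arbitrary choice).
    """
    best_move, _ = max(last_round, key=lambda pair: pair[1])
    return best_move


if __name__ == "__main__":
    print(best_response(PRISONERS_DILEMMA, {"C": 0.5, "D": 0.5}))
    print(best_response(STAG_HUNT, {"S": 1.0, "R": 0.0}))
    print(imitate_the_best([("C", 3), ("D", 1), ("C", 0)]))
```

In the Prisoner's Dilemma the best response is to defect whatever the belief, whereas in the Stag Hunt it depends on how likely the opponent is to hunt stag.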
Repeated Games: The Evolution of Cooperation
Robert Axelrod’s Computer tournament (1979):
       C      D
C    3;3    0;5
D    5;0    1;1
Table: Prisoner’s Dilemma
◮ Finding the best strategy for the Iterated Prisoner’s Dilemma (IPD)
◮ Game theorists were invited to submit their favourite strategy (decision rule)
◮ All submitted strategies play against each other for 200 rounds
◮ The strategy with the highest average score wins the tournament
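The tournament setup can be sketched as a round robin. The code below is a simplified illustration under stated assumptions: the three entrants (`always_cooperate`, `always_defect`, `alternate`) are placeholder strategies invented for the example, not tournament submissions; strategies do not play against themselves; and a strategy is modelled as a function of the two move histories.

```python
from itertools import combinations

# Payoffs (player A, player B) from the Prisoner's Dilemma table.
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
ROUNDS = 200  # rounds per match, as stated on the slide


# Placeholder entrants (hypothetical, for illustration only).
def always_cooperate(own_history, opp_history):
    return "C"


def always_defect(own_history, opp_history):
    return "D"


def alternate(own_history, opp_history):
    # Cooperate in odd rounds, defect in even rounds.
    return "C" if len(own_history) % 2 == 0 else "D"


def play_match(strat_a, strat_b, rounds=ROUNDS):
    """Play one iterated match and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def tournament(strategies, rounds=ROUNDS):
    """Round robin: return each strategy's average score per match."""
    totals = {name: 0 for name in strategies}
    for name_a, name_b in combinations(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b], rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    n_opponents = len(strategies) - 1
    return {name: total / n_opponents for name, total in totals.items()}


if __name__ == "__main__":
    entrants = {
        "ALWAYS C": always_cooperate,
        "ALWAYS D": always_defect,
        "ALTERNATE": alternate,
    }
    ranking = sorted(tournament(entrants).items(), key=lambda kv: -kv[1])
    for name, avg in ranking:
        print(f"{name}: {avg:.1f}")
```

With only unresponsive entrants like these, unconditional defection scores highest; the ranking depends entirely on which strategies take part.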
Repeated Games: The Evolution of Cooperation
◮ TIT FOR TAT: Cooperate in the first round and then do what
your opponent did last round
◮ FRIEDMAN: Cooperate until the opponent defects, then
defect all the time
◮ DOWNING:
  ◮ Estimate probabilities $p_1 = P(C_O^t \mid C_I^{t-1})$ and $p_2 = P(C_O^t \mid D_I^{t-1})$
  ◮ If $p_1 \gg p_2$, the opponent is responsive: Cooperate
  ◮ Else the opponent is not responsive: Defect
◮ TRANQUILIZER:
  ◮ Cooperate for the first few moves and observe the opponent’s response
  ◮ If a pattern of mutual cooperation arises: Defect from time to time
  ◮ If the opponent continues cooperating, defections become more frequent
◮ TIT FOR 2 TATS: Play TIT FOR TAT, but respond with defection only if the opponent defected on the previous two moves
◮ JOSS: Play TIT FOR TAT, but respond with defection to 10% of the opponent’s cooperative moves
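Four of the decision rules above are simple enough to write down directly. The following is a sketch under assumptions, not the submitted tournament code: a strategy is modelled as a function of the two move histories (lists of "C"/"D"), JOSS is assumed to cooperate in the first round, and the helper `play` and the factory `make_joss` are names invented for the example. DOWNING and TRANQUILIZER are omitted because the slide does not specify their thresholds.

```python
import random


def tit_for_tat(own_history, opp_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not opp_history else opp_history[-1]


def friedman(own_history, opp_history):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in opp_history else "C"


def tit_for_two_tats(own_history, opp_history):
    # Defect only after two consecutive defections by the opponent.
    return "D" if opp_history[-2:] == ["D", "D"] else "C"


def make_joss(rng, defect_rate=0.1):
    """Build a JOSS strategy using the random generator `rng`.

    Assumption: JOSS opens with cooperation; the slide only states the
    10% defection rate after the opponent's cooperative moves.
    """
    def joss(own_history, opp_history):
        if not opp_history:
            return "C"
        if opp_history[-1] == "D":
            return "D"
        # Opponent cooperated: defect anyway with probability defect_rate.
        return "D" if rng.random() < defect_rate else "C"
    return joss


def play(strat_a, strat_b, rounds):
    """Return the move histories of both strategies after `rounds` rounds."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b


if __name__ == "__main__":
    joss = make_joss(random.Random(0))
    moves_tft, moves_joss = play(tit_for_tat, joss, 20)
    print("TIT FOR TAT:", "".join(moves_tft))
    print("JOSS:       ", "".join(moves_joss))
```

Passing the random generator in explicitly keeps JOSS reproducible when a seed is fixed.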
Repeated Games: The Evolution of Cooperation
Results:
1. The winner was TIT FOR TAT with 504 points
2. Success in such a game correlated with the following characteristics:
  ◮ Be nice: cooperate, never be the first to defect.
  ◮ Be provocable: return defection for defection, cooperation for cooperation.
  ◮ Don’t be envious: be fair with your partner.
  ◮ Don’t be too clever: or, don’t try to be tricky.