Feb 18, 2023

SLIDE 1

PENG Session 2

Roland Mühlenbernd
Seminar für Sprachwissenschaft
University of Tübingen

SLIDE 2

Review

Prominent 2-player Games:

       C       D
C    3, 3    0, 5
D    5, 0    1, 1

C: Cooperate, D: Defect

       S       R
S    2, 2    0, 1
R    1, 0    1, 1

S: Stag, R: Rabbit

       B       S
B    2, 1    0, 0
S    0, 0    1, 2

B: Bach, S: Stravinsky

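Signaling Game SG = ⟨{S, R}, T, Pr, M, A, U⟩

[Figure: game tree. Nature (N) picks type t1 or t2 with probability .5 each; the sender S then sends message m1 or m2, and the receiver R responds with action a1 or a2.]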

SLIDE 3

Repeated Games: Decisions

◮ From an agent’s perspective a game is a decision problem
◮ The agent has to decide between different moves (e.g. cooperate or defect, m1 or m2, a1 or a2)
◮ An agent’s decision can be guided by
    ◮ update dynamics
    ◮ learning dynamics
    ◮ reasoning
    ◮ beliefs about participant
    ◮ best response
    ◮ imitation
    ◮ chance

SLIDE 4

Repeated Games: Update Dynamics

◮ Learning Dynamics: Collecting information from previous encounters
◮ Reasoning: Forward induction (ai expects that aj expects that ... plays defect)
◮ Best response: ai plays the move that maximizes its utility, given what it knows or believes about the opponent’s move
◮ Imitate the Best: Play the move that yielded the highest utility among all neighbours in the last round
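The last two update rules can be sketched in a few lines of Python. This is a minimal illustration, not the seminar's own implementation: the function names, the representation of a belief as a probability dictionary, and the representation of neighbours as (move, utility) pairs are assumptions made for the example. The payoff numbers are the row player's payoffs from the Stag Hunt and Prisoner's Dilemma tables on the review slide.

```python
# Row player's payoffs, keyed by (own move, opponent's move).
# The numbers come from the review slide; the dict layout is an assumption.
STAG_HUNT = {("S", "S"): 2, ("S", "R"): 0, ("R", "S"): 1, ("R", "R"): 1}
PRISONERS_DILEMMA = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_response(payoff, belief):
    """Return the move with the highest expected utility.

    `belief` maps each opponent move to the probability the agent assigns to it.
    """
    own_moves = sorted({own for own, _ in payoff})

    def expected_utility(move):
        return sum(p * payoff[(move, opp)] for opp, p in belief.items())

    return max(own_moves, key=expected_utility)


def imitate_the_best(last_round):
    """Return the move of the neighbour with the highest utility last round.

    `last_round` is a list of (move, utility) pairs, one per neighbour.
    Whether the agent counts itself as a neighbour is left to the caller.
    """
    best_move, _ = max(last_round, key=lambda pair: pair[1])
    return best_move


if __name__ == "__main__":
    # Believing the partner hunts stag with probability 0.9 makes Stag optimal.
    print(best_response(STAG_HUNT, {"S": 0.9, "R": 0.1}))
    # In the Prisoner's Dilemma, Defect is the best response to any belief.
    print(best_response(PRISONERS_DILEMMA, {"C": 0.5, "D": 0.5}))
    print(imitate_the_best([("C", 3), ("D", 5), ("C", 0)]))
```

In the Stag Hunt the best response depends on the belief: with probability 0.9 on Stag the expected utilities are 1.8 for Stag and 1.0 for Rabbit, while with probability 0.4 on Stag they are 0.8 and 1.0.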

SLIDE 5

Repeated Games: The Evolution of Cooperation

Robert Axelrod’s computer tournament (1979):

       C       D
C    3, 3    0, 5
D    5, 0    1, 1

Table: Prisoner’s Dilemma

◮ Goal: find the best strategy for the Iterated Prisoner’s Dilemma (IPD)
◮ Game theorists were invited to submit their favourite strategy (decision rule)
◮ All submitted strategies play against each other for 200 rounds
◮ The strategy with the highest average score wins the tournament
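The tournament procedure above can be sketched as a round-robin in Python. This is a simplified illustration under stated assumptions, not Axelrod's actual setup: the function names are hypothetical, a strategy is assumed to be a function of the two move histories, each pair of distinct strategies meets exactly once, and `always_cooperate` and `always_defect` are baseline entries added only to make the example runnable.

```python
from itertools import combinations

# (move of A, move of B) -> (payoff of A, payoff of B), from the table above.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def always_cooperate(own_history, opp_history):
    return "C"  # hypothetical baseline, not a tournament entry


def always_defect(own_history, opp_history):
    return "D"  # hypothetical baseline, not a tournament entry


def tit_for_tat(own_history, opp_history):
    # Cooperate first, then copy the opponent's previous move.
    return opp_history[-1] if opp_history else "C"


def play_match(strategy_a, strategy_b, rounds=200):
    """Play one iterated game and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, rounds=200):
    """Return each strategy's average score per match (needs >= 2 entries)."""
    totals = {name: 0 for name in strategies}
    for name_a, name_b in combinations(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b], rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    matches_per_strategy = len(strategies) - 1
    return {name: total / matches_per_strategy for name, total in totals.items()}


if __name__ == "__main__":
    field = {
        "ALWAYS C": always_cooperate,
        "ALWAYS D": always_defect,
        "TIT FOR TAT": tit_for_tat,
    }
    for name, average in round_robin(field).items():
        print(name, average)
```

Mutual cooperation over 200 rounds yields 3 × 200 = 600 points per player, which is the upper bound a pair of never-defecting strategies can reach together. A field of three toy entries says little about which strategy does well in general; the ranking depends heavily on which strategies are submitted.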

SLIDE 6

Repeated Games: The Evolution of Cooperation

◮ TIT FOR TAT: Cooperate in the first round, then do what your opponent did in the previous round
◮ FRIEDMAN: Cooperate until the opponent defects, then defect all the time
◮ DOWNING:
    ◮ Estimate the probabilities $p_1 = P(C_O^t \mid C_I^{t-1})$ and $p_2 = P(C_O^t \mid D_I^{t-1})$, i.e. the probability that the opponent (O) cooperates in round t given that I cooperated or defected in round t−1
    ◮ If $p_1 \gg p_2$, the opponent is responsive: Cooperate
    ◮ Else the opponent is not responsive: Defect
◮ TRANQUILIZER:
    ◮ Cooperate for the first moves and check the opponent’s response
    ◮ If a pattern of mutual cooperation arises: Defect from time to time
    ◮ If the opponent continues cooperating, defections become more frequent
◮ TIT FOR 2 TATS: Play TIT FOR TAT, but respond with defection only if the opponent defected on the previous two moves
◮ JOSS: Play TIT FOR TAT, but respond with defection to 10% of the opponent’s cooperative moves
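Several of the decision rules above are simple enough to write down directly. The following Python sketch is one possible reading of the slide descriptions, not the original tournament code: the function names, the convention that a strategy receives its own and the opponent's move history as lists of "C"/"D", and the `defect_rate` and `rng` parameters are assumptions. For DOWNING only the probability estimate is shown, since the slide does not say how large the gap between p1 and p2 must be; TRANQUILIZER is omitted because its schedule is not specified here.

```python
import random


def tit_for_tat(own, opp):
    # Cooperate in the first round, then mirror the opponent's last move.
    return opp[-1] if opp else "C"


def friedman(own, opp):
    # Cooperate until the opponent has defected once, then defect forever.
    return "D" if "D" in opp else "C"


def tit_for_two_tats(own, opp):
    # Defect only after two consecutive defections by the opponent.
    return "D" if opp[-2:] == ["D", "D"] else "C"


def make_joss(defect_rate=0.1, rng=None):
    """Build a JOSS player; `defect_rate` and `rng` are illustrative parameters."""
    rng = rng or random.Random()

    def joss(own, opp):
        if not opp:
            return "C"
        if opp[-1] == "D":
            return "D"
        # Answer a cooperative move with defection in a fraction of cases.
        return "D" if rng.random() < defect_rate else "C"

    return joss


def downing_estimates(own, opp):
    """Estimate p1 = P(opp cooperates | I cooperated in the previous round)
    and p2 = P(opp cooperates | I defected in the previous round).

    Returns None for an estimate that has no observations yet.
    """
    # Pair my move in round t-1 with the opponent's move in round t.
    pairs = list(zip(own[:-1], opp[1:]))
    after_c = [theirs for mine, theirs in pairs if mine == "C"]
    after_d = [theirs for mine, theirs in pairs if mine == "D"]
    p1 = after_c.count("C") / len(after_c) if after_c else None
    p2 = after_d.count("C") / len(after_d) if after_d else None
    return p1, p2


if __name__ == "__main__":
    own = ["C", "D", "C", "D"]
    opp = ["C", "C", "D", "C"]
    print(tit_for_tat(own, opp), friedman(own, opp), tit_for_two_tats(own, opp))
    print(downing_estimates(own, opp))
```

Passing a seeded `random.Random` instance to `make_joss` makes a JOSS player reproducible, which helps when comparing tournament runs.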

SLIDE 7

Repeated Games: The Evolution of Cooperation

Results:

1. The winner was TIT FOR TAT with 504 points
2. Success in such a game correlated with the following characteristics:
    ◮ Be nice: cooperate, never be the first to defect.
    ◮ Be provocable: return defection for defection, cooperation for cooperation.
    ◮ Don’t be envious: be fair with your partner.
    ◮ Don’t be too clever: or, don’t try to be tricky.

SLIDE 8


Homework

◮ Model your own PD-Agent
◮ Check it out: www.pgrim.org/pragmatics