Changing Perspective Common themes throughout past papers - - PowerPoint PPT Presentation



SLIDE 1

Changing Perspective…

  • Common themes throughout past papers
      • Repeated simple games with small number of actions
      • Mostly theoretical papers
      • Known available actions
      • Game-theory perspective
  • How are these papers different?
      • Evolutionary approach
      • Empirical papers (no analytic proofs)
      • Complex games, large strategy space
      • AI perspective
SLIDE 2

Comparing Today’s Papers

  • Similarities
      • Use genetic algorithm to generate strategies
      • Evaluate generated strategies against a fixed set of opponents
      • Describe a (potentially) iterative approach
  • Differences
      • Phelps focuses on process within each iteration
      • Ficici focuses on movement from one iteration to the next
      • Different types of games (constant sum v. general sum)
SLIDE 3

A Novel Method for Automatic Strategy Acquisition in N-Player Non-zero-sum Games (Phelps et al.)

  • Double auction setting --> potentially infinitely large strategy space (intractable game)
  • Basis of iterative approach (not really discussed in paper) that improves set of heuristic strategies through search
  • Uses genetic algorithm to find best strategy for current market conditions
  • Evaluates strategy using replicator dynamics. Uses the size of the basin of attraction (“market share”) to quantify the fitness of the strategy.

SLIDE 4

Replicator Dynamics: Calculating the Payoff Matrix

  • Given a starting point, predicts trajectory of population mix of pure strategies using the equation:

$$\dot{m}_j = \left[ u(e_j, \vec{m}) - u(\vec{m}, \vec{m}) \right] \times m_j$$

  • The utility is based on a roughly calibrated heuristic payoff matrix. The payoff matrix is generated by simulating the game to get the expected payoff for each agent.
  • Payoffs are independent of type, justified by the type-dependent variations in the actual game being averaged out via sampling --> simplifies payoff matrix
  • Assumes that the game can be simulated
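
The replicator equation on this slide can be sketched numerically. This is a minimal illustration and not the paper's implementation. It assumes a two-player symmetric payoff matrix, so that u(e_j, m) is simply row j of the matrix weighted by the population mix, whereas the heuristic payoff matrix in the paper covers N-player games. The matrix `TOY_PAYOFF`, the step size `dt`, and all function names are hypothetical.

```python
def expected_payoffs(payoff, mix):
    """u(e_j, m) for every pure strategy j, given population mix m."""
    n = len(mix)
    return [sum(payoff[j][k] * mix[k] for k in range(n)) for j in range(n)]


def replicator_step(payoff, mix, dt=0.01):
    """One Euler step of dm_j/dt = [u(e_j, m) - u(m, m)] * m_j."""
    u_pure = expected_payoffs(payoff, mix)
    u_avg = sum(m_j * u_j for m_j, u_j in zip(mix, u_pure))  # u(m, m)
    new_mix = [m_j + dt * (u_j - u_avg) * m_j for m_j, u_j in zip(mix, u_pure)]
    total = sum(new_mix)  # renormalise to absorb floating-point drift
    return [m_j / total for m_j in new_mix]


def trajectory(payoff, start, steps=5000, dt=0.01):
    """Follow the population mix from a starting point for a fixed number of steps."""
    mix = list(start)
    for _ in range(steps):
        mix = replicator_step(payoff, mix, dt)
    return mix


# Hypothetical payoff matrix (assumption): each strategy only earns
# payoff when it meets itself, with strategy 0 earning the most.
TOY_PAYOFF = [
    [3.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

final_mix = trajectory(TOY_PAYOFF, [0.4, 0.3, 0.3])
```

From this starting point strategy 0 earns more than the population average, so its share grows until it takes over the population.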
SLIDE 5

Replicator Dynamics: Finding the Candidate Strategy

  • After running many trajectories, calculate the size of the basin of attraction attributable to each pure strategy
  • Run perturbation analysis (increase payoffs of candidate strategy) and replot the replicator-dynamics direction field. This answers the question of which strategy is worth concentrating improvement efforts on.
  • In the paper’s example, this algorithm was run on 3 strategies, truth-telling (TT), Roth-Erev (RE), and Gjerstad-Dickhaut (GD), and found RE to be the best strategy to focus improvement efforts on.
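
One way to read the "market share" calculation is as a Monte Carlo estimate: sample many starting mixes, follow each trajectory, and average the final population shares. The sketch below takes that reading as an assumption and is not the paper's code. It also assumes a two-player symmetric payoff matrix. `TOY_PAYOFF`, the sample counts, and the 1.5 perturbation factor are all hypothetical.

```python
import random


def replicator_step(payoff, mix, dt=0.05):
    """One Euler step of dm_j/dt = [u(e_j, m) - u(m, m)] * m_j."""
    n = len(mix)
    u_pure = [sum(payoff[j][k] * mix[k] for k in range(n)) for j in range(n)]
    u_avg = sum(mix[j] * u_pure[j] for j in range(n))
    new_mix = [mix[j] * (1.0 + dt * (u_pure[j] - u_avg)) for j in range(n)]
    total = sum(new_mix)
    return [x / total for x in new_mix]


def random_simplex_point(n, rng):
    """Uniform sample from the simplex via normalised exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def market_shares(payoff, samples=600, steps=300, seed=0):
    """Average final share of each pure strategy over random starting mixes."""
    rng = random.Random(seed)
    n = len(payoff)
    totals = [0.0] * n
    for _ in range(samples):
        mix = random_simplex_point(n, rng)
        for _ in range(steps):
            mix = replicator_step(payoff, mix)
        for j in range(n):
            totals[j] += mix[j]
    return [t / samples for t in totals]


def perturb(payoff, j, factor):
    """Scale every payoff earned by strategy j (perturbation analysis)."""
    return [[x * factor for x in row] if i == j else list(row)
            for i, row in enumerate(payoff)]


# Hypothetical payoff matrix (assumption), not taken from the paper.
TOY_PAYOFF = [
    [3.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

baseline = market_shares(TOY_PAYOFF)
boosted = market_shares(perturb(TOY_PAYOFF, 2, 1.5))
```

Comparing `baseline` with `boosted` shows how much market share a candidate strategy would gain if its payoffs improved, which is the question the perturbation analysis is meant to answer.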

SLIDE 6

The Novel Method

For each individual in a generation from GA:

Sample over types, simulate games --> Heuristic Payoff Matrix --> RD --> Fitness Value

Create next generation in GA based on fitness values (Repeat for each generation)
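
The outer loop of this pipeline can be sketched as a small genetic algorithm whose fitness function is pluggable. This is a hedged illustration only. In the method described above, the fitness of an individual would come from simulating games, building the heuristic payoff matrix, and running RD. Here a hypothetical `toy_fitness` stands in for that whole chain, and the encoding of a strategy as one real number in [0, 1] is also an assumption.

```python
import random


def evolve(fitness, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Tiny GA over a single real-valued parameter in [0, 1]."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the current generation by fitness, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, parents survive
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0               # crossover: blend two parents
            child += rng.gauss(0.0, sigma)      # mutation: Gaussian noise
            children.append(min(1.0, max(0.0, child)))
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(x):
    # Stand-in (assumption) for: sample over types, simulate games,
    # build the heuristic payoff matrix, run RD, read off market share.
    return -(x - 0.7) ** 2


best = evolve(toy_fitness)
```

Because each fitness evaluation in the real pipeline involves many simulated games, the cost of the loop is dominated by the fitness function rather than by the GA bookkeeping shown here.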

SLIDE 7

Results

  • Found the optimized strategy (OS) to be an RE strategy with stateless Q-learning. OS’s market share against the original RE, GD, and TT is 65%, greater than TT (32%), GD (3%) and the original RE (0%).
  • However, it took 1800 CPU hours to compute (over 2 months)
  • Keep in mind, this is only for one iteration. Presumably, the next step would be to substitute OS for RE in the set of strategies and run the algorithm again. But Phelps does not go into this.

SLIDE 8

Some Strengths and Weaknesses of Phelps paper

  • Strengths
      • Application to real-world setting (double auction)
      • Applies to general-sum game with (infinitely) large strategy space
  • Weaknesses
      • Very large computation time. Additionally, time increases exponentially with number of strategies (due to computation of RD)
      • Dependent on having attractors in RD (there exist games for which this does not hold true)
  • Any more? Other remarks?
SLIDE 9

Ficici and Pollack Paper

  • Phelps does not address how to iterate his approach
  • Second paper, A Game-Theoretic Memory Mechanism for Coevolution (Ficici and Pollack) uses similar approach but with focus on moving from one iteration to the next
  • In each iteration, performs search for strategies (GA)
  • Evaluates fitness of strategies by playing them against most fit opponent from last iteration (N)
  • Also keeps set of potentially useful strategies in memory (M), updates memory every iteration

SLIDE 10

Motivation / Set up for Second Paper

  • For symmetric zero-sum 2-player games (generalizes to asymmetric constant-sum 2-player games)
  • In coevolutionary algorithms, the population contains genetic diversity for effective search for new strategies AND often represents the solution to the problem --> these two objectives can be conflicting
  • Avoid “forgetting” - problematic if there is an intransitive cycle
  • This paper separates these two functions:
      • Uses memory mechanism to hold solution and previously encountered strategies.
      • This lets the population maintain genetic diversity that’s useful for performing effective search.

SLIDE 11

Nash Memory Mechanism, some definitions

  • For mixed strategy m
  • C(m) = support set of m
  • S(m) = security set of m = {all s : E(m, s) ≥ 0}
  • Nash Memory Mechanism: N & M - mutually exclusive sets
  • N: holds best mixed strategy over time
  • M: holds strategies not in N, but that may be useful later, has limited capacity c.

  • H: search heuristic (in this paper a GA was used)
  • Q: set of strategies delivered by H
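
The definitions of C(m) and S(m) can be made concrete on a small symmetric zero-sum game. The sketch below uses rock-paper-scissors as an assumed example, since the slides name no specific game. Mixed strategies are represented as dictionaries from pure-strategy name to probability, and all function names are hypothetical.

```python
# Which pure strategy each pure strategy defeats in rock-paper-scissors.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def pure_payoff(a, b):
    """Payoff to pure strategy a against pure strategy b (zero-sum)."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def expected_payoff(m, s):
    """E(m, s): expected payoff of mixed strategy m against mixed strategy s."""
    return sum(p * q * pure_payoff(a, b)
               for a, p in m.items() for b, q in s.items())


def support(m):
    """C(m): pure strategies that m plays with positive probability."""
    return {a for a, p in m.items() if p > 0}


def security_set(m, candidates, tol=1e-12):
    """S(m) restricted to `candidates`: every s with E(m, s) >= 0."""
    return {name for name, s in candidates.items()
            if expected_payoff(m, s) >= -tol}


PURE = {name: {name: 1.0} for name in BEATS}   # pure strategies as mixtures
uniform = {name: 1.0 / 3.0 for name in BEATS}  # the Nash mix of this game
rock_only = {"rock": 1.0}

secure_uniform = security_set(uniform, PURE)
secure_rock = security_set(rock_only, PURE)
```

The uniform mix is secure against every pure strategy, while pure rock is secure only against rock and scissors, which is why a memory holding only rock would lose to a newly discovered paper.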
SLIDE 12

Memory Mechanism Updating

  • For each iteration, GA evolves strategies by testing against fixed opponent N.
  • At end of iteration, tests each q ∈ Q to see whether it beats N
  • Let W = “winners from GA” = {q ∈ Q: E(q, N) > 0}
  • Use polynomial-time linear program to update N -> N’ and M -> M’ defined by the following constraints:
      • C(N’) ⊆ (W ∪ N ∪ M)
      • S(N’) ⊇ (W ∪ N ∪ M). Note: S(N’) is not necessarily ⊇ S(N)
      • C(M’) ⊆ (W ∪ N ∪ M)
      • Specifically, M’ gets unnecessary strategies from W, strategies released from N, and unneeded leftovers from M
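
A small worked example may help with the update constraints. The sketch below does not solve the linear program. It only computes W and then checks whether a hand-picked candidate (N’, M’) satisfies the constraints listed above. It assumes rock-paper-scissors as the game, reads "N" inside W ∪ N ∪ M as the support C(N), and adds a disjointness check because N and M are described as mutually exclusive. All names are hypothetical.

```python
# Row player's payoff in rock-paper-scissors; order: rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
ROCK, PAPER, SCISSORS = 0, 1, 2


def pure(i):
    """Pure strategy i written as a probability vector."""
    return [1.0 if k == i else 0.0 for k in range(3)]


def E(m, s):
    """Expected payoff of mixed strategy m against mixed strategy s."""
    return sum(m[i] * PAYOFF[i][j] * s[j] for i in range(3) for j in range(3))


def support(m):
    """C(m): indices played with positive probability."""
    return {i for i, p in enumerate(m) if p > 0}


def winners(Q, N):
    """W = {q in Q : E(q, N) > 0}."""
    return {q for q in Q if E(pure(q), N) > 0}


def satisfies_constraints(N_new, M_new, W, N_old, M_old, tol=1e-12):
    pool = W | support(N_old) | M_old
    support_ok = support(N_new) <= pool                          # C(N') within pool
    secure_ok = all(E(N_new, pure(s)) >= -tol for s in pool)     # S(N') covers pool
    memory_ok = M_new <= pool                                    # M' drawn from pool
    disjoint_ok = not (support(N_new) & M_new)                   # N' and M' disjoint
    return support_ok and secure_ok and memory_ok and disjoint_ok


N_old, M_old = pure(ROCK), set()
W = winners({PAPER, SCISSORS}, N_old)   # only paper beats rock

# Hand-picked candidate update (assumption): paper replaces rock in N,
# and rock is released into memory M.
accepted = satisfies_constraints(pure(PAPER), {ROCK}, W, N_old, M_old)
rejected = satisfies_constraints(pure(ROCK), set(), W, N_old, M_old)
```

Keeping rock as N’ fails the security constraint because paper is now in the pool, while moving to paper and storing rock in M’ passes all checks. If scissors later arrives from the search heuristic, the stored rock is what lets the next update build a mix that is secure against it.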

SLIDE 13

Memory Mechanism Updating

[Diagram: W (from search heuristic) together with N and M (pre-update state) feed into N’ and M’ (post-update state); discard some from M’ if capacity is exceeded]

SLIDE 14

Strengths / Weaknesses of Ficici Paper

  • Strengths
      • Incorporates game theoretic notion of Nash into solution
      • Avoids the problems coming from forgetting in context of games with intransitive cycles
      • Time-efficient (polynomial time for memory update)
      • Outperformed other memory mechanisms (BOG, DT) in test
  • Weaknesses
      • Allows for some memory loss in update rule
      • No analytical proofs
      • Strong results only (seem to) hold for 2-player constant sum
      • For general sum, lose efficiency, have multiple equilibria
      • Performance when intransitive cycles are not as important?
SLIDE 15

Concluding Remarks

  • How do these papers compare to what we’ve seen previously?
  • Values and contexts for different approaches?
  • Other comments?