Multi-agent learning
Gradient ascent
Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.
Saturday 16th May, 2020
Gradient ascent: idea
Author: Gerard Vreeswijk. Slides last modified on May 16th, 2020 at 11:33 Multi-agent learning: Gradient ascent, slide 2
■ Every opponent is identified with a (possibly mixed) strategy.
■ Players can observe the (possibly mixed) strategy of their opponent.
■ After observation, every player changes its strategy a tiny bit in the
■ Comparison with fictitious play.
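The idea above can be sketched for a two-player, two-action game. This is a minimal illustration under stated assumptions, not code from the slides: the payoff matrices `R` (row player) and `C` (column player), the step size `eta`, and all function names are introduced here for illustration. α and β are taken to be the probabilities that the row and column player choose their first action, and clipping to [0, 1] is just one simple way to keep the strategies valid.

```python
def expected_payoff(M, alpha, beta):
    """Expected payoff under matrix M when the first actions are played
    with probability alpha (row player) and beta (column player)."""
    return (M[0][0] * alpha * beta
            + M[0][1] * alpha * (1 - beta)
            + M[1][0] * (1 - alpha) * beta
            + M[1][1] * (1 - alpha) * (1 - beta))


def payoff_gradients(R, C, alpha, beta):
    """Return (du1/dalpha, du2/dbeta) for the bilinear expected payoffs."""
    u = (R[0][0] + R[1][1]) - (R[0][1] + R[1][0])
    u_prime = (C[0][0] + C[1][1]) - (C[0][1] + C[1][0])
    d_alpha = beta * u - (R[1][1] - R[0][1])
    d_beta = alpha * u_prime - (C[1][1] - C[1][0])
    return d_alpha, d_beta


def clip01(x):
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def gradient_ascent_step(R, C, alpha, beta, eta=0.01):
    """Both players observe (alpha, beta) and move a small step uphill."""
    d_alpha, d_beta = payoff_gradients(R, C, alpha, beta)
    return clip01(alpha + eta * d_alpha), clip01(beta + eta * d_beta)
```

Both players update simultaneously from the same observed pair (α, β); each uses only the derivative of its own payoff with respect to its own strategy.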
Dynamics
■ Convergence of IGA.
■ Convergence of IGA-WoLF + analysis of the proof of convergence.
■ There is at most one stationary
■ If a stationary point exists, it
■ If there is a stationary point
Affine differential map:
■ Because α, β ∈ [0, 1], the
■ Suppose the state (α, β) is on
■ To maintain dynamics within
■ If nonzero, the projected
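One way to read the boundary handling above: when the state lies on an edge of the unit square and the gradient points outward, the outward component is dropped, so the state cannot leave [0, 1] × [0, 1]. The sketch below assumes exactly that rule; the function name and argument order are hypothetical and not taken from the slides.

```python
def project_gradient(alpha, beta, d_alpha, d_beta):
    """Zero out gradient components that would push (alpha, beta)
    outside the unit square; leave all other components unchanged."""

    def project(x, dx):
        # On the lower edge moving down, or on the upper edge moving up:
        # the component points outward, so it is projected to zero.
        if (x <= 0.0 and dx < 0.0) or (x >= 1.0 and dx > 0.0):
            return 0.0
        return dx

    return project(alpha, d_alpha), project(beta, d_beta)
```

At an interior point the gradient is returned unchanged; at a corner both components may be projected to zero.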
■ Symmetric, but not zero sum:
■ Matrix
■ Symmetric, zero sum:
■ Matrix
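α_{t+1} = α_t + η · l^α_t · ∂u1/∂α
β_{t+1} = β_t + η · l^β_t · ∂u2/∂β
with l^α_t, l^β_t ∈ [lmin, lmax], where
l^α_t =Def lmin if player 1 is winning, lmax otherwise
l^β_t =Def lmin if player 2 is winning, lmax otherwise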
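A variable-learning-rate update of this kind (IGA-WoLF) can be sketched as follows. This is an illustration under assumptions, not the slides' own code. It assumes the winning test of Bowling and Veloso: a player is winning when its current expected payoff exceeds what its equilibrium strategy would earn against the opponent's current strategy. The names `wolf_step` and `run`, the equilibrium arguments `alpha_eq` and `beta_eq`, and the default values of `eta`, `l_min` and `l_max` are hypothetical choices made for this example.

```python
def payoff(M, alpha, beta):
    """Expected payoff under matrix M when the first actions are played
    with probability alpha (row player) and beta (column player)."""
    return (M[0][0] * alpha * beta + M[0][1] * alpha * (1 - beta)
            + M[1][0] * (1 - alpha) * beta + M[1][1] * (1 - alpha) * (1 - beta))


def clip01(x):
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def wolf_step(R, C, alpha, beta, alpha_eq, beta_eq,
              eta=0.001, l_min=1.0, l_max=4.0):
    """One simultaneous update with a small rate when winning
    and a large rate when losing."""
    u = (R[0][0] + R[1][1]) - (R[0][1] + R[1][0])
    u_prime = (C[0][0] + C[1][1]) - (C[0][1] + C[1][0])
    d_alpha = beta * u - (R[1][1] - R[0][1])          # du1/dalpha
    d_beta = alpha * u_prime - (C[1][1] - C[1][0])    # du2/dbeta

    # Assumed winning test: compare with the equilibrium strategy's payoff
    # against the opponent's current strategy.
    l_alpha = l_min if payoff(R, alpha, beta) > payoff(R, alpha_eq, beta) else l_max
    l_beta = l_min if payoff(C, alpha, beta) > payoff(C, alpha, beta_eq) else l_max

    return (clip01(alpha + eta * l_alpha * d_alpha),
            clip01(beta + eta * l_beta * d_beta))


def run(R, C, alpha, beta, alpha_eq, beta_eq, steps, **kwargs):
    """Iterate wolf_step for a fixed number of steps."""
    for _ in range(steps):
        alpha, beta = wolf_step(R, C, alpha, beta, alpha_eq, beta_eq, **kwargs)
    return alpha, beta
```

Setting `l_min == l_max` gives a constant learning rate, that is, plain gradient ascent, which makes it easy to compare the two variants on the same game.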
■ For ellipses with center (α∗, β∗) there are four possibilities, depending
■ Bowling et al. do not prove this result but refer to Singh et al., who, on
■ Theorem (Singh, Kearns and
■ Idea: use average payoffs to
■ So gradient points slightly more
■ At least works empirically
■ Original work on gradient ascent in general-sum games:
Singh, Kearns, and Mansour (2000). “Nash convergence of gradient ascent in general-sum games”. In: Proc. of the Sixteenth Conf. on Uncertainty in Artificial Intelligence, pp. 541-548.
■ Today’s presentation was mainly based on this conference
Bowling and Veloso (2001). “Convergence of gradient dynamics with a variable learning rate”. In: Proc. of the Eighteenth Int. Conf. on Machine Learning (ICML), pp. 27-34.
Bowling and Veloso (2002). “Multiagent learning using a variable learning rate”. In: Artificial Intelligence 136, pp. 215-250.
■ With fictitious play, or gradient ascent, opponents are modelled by a single mixed strategy.
■ With Bayesian play, opponents are modelled by a probability distribution