Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective (PowerPoint presentation)



SLIDE 1

Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective

Grant Schoenebeck, University of Michigan Fang-Yi Yu, Harvard University

SLIDE 2

Results

  • Analyze the convergence rate of a family of stochastic processes
  • Three related applications
– Evolutionary game theory
– Dynamics on social networks
– Stochastic gradient descent

[Venn diagram: Evolutionary Game Theory, Dynamics on Social Networks, Stochastic Gradient Descent]

SLIDE 3

Target Audience

[Venn diagram (not to scale): Evolutionary Game Theory, Dynamics on Social Networks, Stochastic Gradient Descent]

SLIDE 4

Target Audience

[Venn diagram (not to scale): Evolutionary Game Theory, Dynamics on Social Networks, Stochastic Gradient Descent]

SLIDE 5

Target Audience (still not to scale)

[Venn diagram: Evolutionary Game Theory, Stochastic Gradient Descent, Dynamics on Social Networks]

SLIDE 6

Outline

  • Escaping saddle points

[Venn diagram: Stochastic Gradient Descent, Dynamics on Social Networks, Evolutionary Game Theory]

SLIDE 7

Outline

  • Escaping saddle points
  • Case study: dynamics on social networks

[Venn diagram: Stochastic Gradient Descent, Dynamics on Social Networks, Evolutionary Game Theory]

SLIDE 8

ESCAPING SADDLE POINTS

Upper bounds and lower bounds

SLIDE 9

Reinforced random walk with F

A discrete-time stochastic process {X_t : t = 0, 1, …} in ℝ^d that admits the representation
X_{t+1} − X_t = (1/n)(F(X_t) + U_t)

[Diagram: one step from X_t to X_{t+1}, decomposed into drift (1/n)F(X_t) and noise (1/n)U_t]

SLIDE 10

Reinforced random walk with F

A discrete-time stochastic process {X_t : t = 0, 1, …} in ℝ^d that admits the representation
X_{t+1} − X_t = (1/n)(F(X_t) + U_t)

  • Expected difference (drift): F(X_t)

[Diagram: step from X_t to X_{t+1} with drift (1/n)F(X_t) and noise (1/n)U_t]

SLIDE 11

Reinforced random walk with F

A discrete-time stochastic process {X_t : t = 0, 1, …} in ℝ^d that admits the representation
X_{t+1} − X_t = (1/n)(F(X_t) + U_t)

  • Expected difference (drift): F(X_t)
  • Unbiased noise: U_t

[Diagram: step from X_t to X_{t+1} with drift (1/n)F(X_t) and noise (1/n)U_t]

SLIDE 12

Reinforced random walk with F

A discrete-time stochastic process {X_t : t = 0, 1, …} in ℝ^d that admits the representation
X_{t+1} − X_t = (1/n)(F(X_t) + U_t)

  • Expected difference (drift): F(X_t)
  • Unbiased noise: U_t
  • Step size: 1/n

[Diagram: step from X_t to X_{t+1} with drift (1/n)F(X_t) and noise (1/n)U_t]

SLIDE 13

Examples

A discrete-time Markov process {X_t : t = 0, 1, …} in ℝ^d that admits the representation
X_{t+1} − X_t = (1/n)(F(X_t) + U_t)

  • Agent-based models with n agents
– Evolutionary games
– Dynamics on social networks
  • Heuristic local search algorithms with uniform step size 1/n
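The recurrence on this slide is easy to simulate directly. Below is a minimal one-dimensional sketch; the names `reinforced_random_walk`, `F`, and `noise` are illustrative choices, not from the talk.

```python
import random

def reinforced_random_walk(F, noise, x0, n, steps):
    """Simulate X_{t+1} - X_t = (1/n) * (F(X_t) + U_t) in one dimension.

    F is the drift, noise(x) draws a zero-mean U_t, and 1/n is the
    uniform step size, matching the slide's representation.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x + (F(x) + noise(x)) / n
        trajectory.append(x)
    return trajectory

# Example: contraction toward 0 with bounded, zero-mean uniform noise.
traj = reinforced_random_walk(
    F=lambda x: -x,
    noise=lambda x: random.uniform(-1, 1),
    x0=1.0, n=100, steps=2000)
```

With drift −x the walk settles near 0 and fluctuates on the scale 1/√n, which is the picture the later slides build on: the drift dominates away from fixed points, the noise matters near them.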
SLIDE 14

Node dynamic on complete graphs [SY18]

  • Let f_ND : [0,1] → [0,1]. n agents interact on a complete graph.
  • Each agent v has an initial binary state C_0(v) ∈ {0,1}.
  • At round t:
  • Pick a node v uniformly at random
  • Compute the fraction of opinion 1: X_t = |C_t⁻¹(1)| / n
  • Update C_{t+1}(v) to 1 w.p. f_ND(X_t); to 0 otherwise

[Figure: complete graph]

SLIDE 15

Node dynamic

Includes several existing dynamics:

  • Voter model
  • Iterative majority [Mossel et al. 14]
  • Iterative 3-majority [Doerr et al. 11]

[Plot: update functions f_ND(y) on [0,1] for Voter, Majority, 3-Majority]
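These three dynamics correspond to concrete update functions f_ND; the following are sketches of their standard forms as I read them (the voter update is the identity, and 3-majority samples three opinions i.i.d. and adopts their majority), not code from the talk.

```python
def voter(y):
    # Voter model: adopt opinion 1 with probability equal to its fraction.
    return y

def majority(y):
    # Iterative majority: adopt the current majority opinion (tie at 1/2).
    return 1.0 if y > 0.5 else (0.5 if y == 0.5 else 0.0)

def three_majority(y):
    # Sample 3 opinions i.i.d. and take their majority:
    # P(at least two 1s) = 3y^2(1-y) + y^3 = 3y^2 - 2y^3.
    return 3 * y ** 2 - 2 * y ** 3
```

All three fix y = 0 and y = 1, and 3-majority pushes away from y = 1/2; that repulsion is what makes 1/2 a non-attracting fixed point of the mean-field dynamics discussed later.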

SLIDE 16

Node dynamic

Node dynamic on complete graphs (definition as on slide 14).

Reinforced random walk on ℝ:

  • Let X_t be the fraction of nodes in state 1 at time t.

SLIDE 17

Node dynamic

Node dynamic on complete graphs (definition as on slide 14).

Reinforced random walk on ℝ:

  • Let X_t be the fraction of nodes in state 1 at time t.
  • Given X_t, the expected number of nodes in state 1 after round t is
E[n X_{t+1} ∣ X_t] = n X_t + (f_ND(X_t) − X_t).

SLIDE 18

Node dynamic

Node dynamic on complete graphs (definition as on slide 14).

Reinforced random walk on ℝ:

  • Let X_t be the fraction of nodes in state 1 at time t.
  • Given X_t, the expected number of nodes in state 1 after round t is
E[n X_{t+1} ∣ X_t] = n X_t + (f_ND(X_t) − X_t).

[Figure annotation: which terms count nodes updated to 1 vs. nodes already at 1]

SLIDE 19

Node dynamic

Node dynamic on complete graphs (definition as on slide 14).

Reinforced random walk on ℝ:

  • Let X_t be the fraction of nodes in state 1 at time t.
  • E[X_{t+1} ∣ X_t] − X_t = (1/n)(f_ND(X_t) − X_t).

Drift: F(X_t) = f_ND(X_t) − X_t

SLIDE 20

Node dynamic

Node dynamic on complete graphs (definition as on slide 14).

Reinforced random walk on ℝ:

  • Let X_t be the fraction of nodes in state 1 at time t.
  • X_{t+1} − X_t = (1/n)(f_ND(X_t) − X_t + U_t).

Drift: f_ND(X_t) − X_t; noise: U_t
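The drift identity above can be checked numerically: averaging the one-step change of the fraction over many trials should recover (1/n)(f_ND(X_t) − X_t). A small sketch, with illustrative function names of my choosing:

```python
import random

def node_dynamic_round(states, f_nd):
    """One round of the node dynamic on a complete graph: pick a node
    uniformly and resample its opinion from f_ND of the current fraction."""
    n = len(states)
    y = sum(states) / n
    v = random.randrange(n)
    states[v] = 1 if random.random() < f_nd(y) else 0
    return states

def empirical_drift(y0, n, f_nd, trials=50000):
    """Average one-step change of the fraction, starting from fraction y0
    (y0 * n is assumed to be an integer count)."""
    k = round(y0 * n)
    total = 0.0
    for _ in range(trials):
        states = [1] * k + [0] * (n - k)
        node_dynamic_round(states, f_nd)
        total += sum(states) / n - k / n
    return total / trials
```

For 3-majority (f_ND(y) = 3y² − 2y³) at y = 0.6 with n = 10, the predicted drift is (f_ND(0.6) − 0.6)/10 = 0.0048, and the empirical average concentrates around that value.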

SLIDE 21

Question

Given F and U, what is the limit of X_t for sufficiently large n?
X_{t+1} − X_t = (1/n)(F(X_t) + U_t)

SLIDE 22

Mean field approximation

X_{t+1} − X_t = (1/n)(F(X_t) + U(X_t))

y′ = F(y)

SLIDE 23

Mean field approximation

If n is large enough, then for t = O(n), X_t ≈ y(t/n) [Wormald et al. 95].
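The mean-field ODE y′ = F(y) can be integrated with a simple Euler scheme to see where the deterministic flow goes; a sketch (the step size and function names are my choices, not from the talk):

```python
def mean_field_trajectory(F, y0, dt=0.1, steps=500):
    """Euler integration of the mean-field ODE y' = F(y)."""
    y = y0
    traj = [y]
    for _ in range(steps):
        y = y + dt * F(y)
        traj.append(y)
    return traj

# Example: 3-majority drift F(y) = f_ND(y) - y = 3y^2 - 2y^3 - y,
# whose fixed points are y = 0, 1/2, 1. Starting above the repelling
# point 1/2, the flow converges to the attracting point 1.
traj = mean_field_trajectory(lambda y: 3 * y ** 2 - 2 * y ** 3 - y, 0.6)
```

This is the trajectory y(t/n) that X_t tracks for t = O(n) when n is large; the interesting question, taken up next, is what happens near the fixed points where the ODE alone stalls.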

SLIDE 24

Regular point

If n is large enough, then for t = O(n), X_t ≈ y(t/n).

SLIDE 25

Fixed point, F(y*) = 0

If n is large enough, then for t = O(n), X_t ≈ y(t/n).

SLIDE 26

Escaping a non-attracting fixed point

How long does the process take to escape a non-attracting fixed point?

SLIDE 27

Escaping a non-attracting fixed point

How long does the process take to escape a non-attracting fixed point?

  • 1. Θ(n)
  • 2. Θ(n log n)
  • 3. Θ(n log⁴ n)
  • 4. Θ(n²)
SLIDE 28

Escaping a non-attracting fixed point

How long does the process take to escape a non-attracting fixed point?

  • 1. Θ(n)
  • 2. Θ(n log n)   (the answer, highlighted in the original)
  • 3. Θ(n log⁴ n)
  • 4. Θ(n²)
SLIDE 29

Lower bound

Escaping the saddle-point region takes at least Ω(n log n) steps.

[Figure: process started at X_0 = y*]

SLIDE 30

Upper bound

Escaping the saddle-point region takes at most O(n log n) steps, if:

[Figure: X_0 = y*; X_T with T = O(n log n) has left the saddle region]

SLIDE 31

Upper bound

Escaping the saddle-point region takes at most O(n log n) steps, if:

  • Noise U_t is
– a martingale difference
– bounded
– noisy (covariance matrix is large)
  • Expected difference F ∈ 𝒞²
– y* is hyperbolic

[Figure: X_0 = y*; X_T with T = O(n log n) has left the saddle region]

SLIDE 32

Gradient-like dynamics

Converges to an attracting fixed-point region in O(n log n) steps, if:

  • Noise U_t is
– a martingale difference
– bounded
– noisy
  • Expected difference F ∈ 𝒞²
– Fixed points are hyperbolic
– A potential function exists

SLIDE 33

Outline

  • Escaping saddle points

[Venn diagram: Stochastic Gradient Descent, Dynamics on Social Networks, Evolutionary Game Theory]

SLIDE 34

Outline

  • Escaping saddle points
  • Case study: dynamics on social networks

[Venn diagram: Stochastic Gradient Descent, Dynamics on Social Networks, Evolutionary Game Theory]

SLIDE 35

(DIS)AGREEMENT IN PLANTED COMMUNITY NETWORKS

Dynamics on social networks

SLIDE 36

Echo chamber

Beliefs are amplified through interactions in segregated systems

SLIDE 37

Echo chamber

Beliefs are amplified through interactions in segregated systems

SLIDE 38

Echo chamber

Beliefs are amplified through interactions in segregated systems

  • Rich-get-richer
  • Community structure

SLIDE 39

Question

What is the consensus time given a rich-get-richer opinion formation and the level of intercommunity connectivity?

SLIDE 40

Node dynamic [Schoenebeck, Yu 18]

  • Fix a graph G = (V, E) and opinion set {0,1}
  • Given an initial configuration X_0 : V → {0,1}
  • At round t:
  • A node v is picked uniformly at random
  • The update of v's opinion depends only on the fraction of opinion 1 among its neighbors, r_{X_{t−1}}(v)

[Figure: example neighborhood and its fraction of opinion-1 neighbors]

SLIDE 41

Node dynamic ND(G, f_ND, X_0)

  • Fix a (weighted) graph G = (V, E), opinion set {0,1}, and an update function f_ND
  • Given an initial configuration X_0 : V → {0,1}
  • At round t:
  • A node v is picked uniformly at random
  • X_t(v) = 1 w.p. f_ND(r_{X_{t−1}}(v)); = 0 otherwise

[Figure: example neighborhood and its opinion-1 fraction r_{X_{t−1}}(v)]
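On a general weighted graph the update reads a node's weighted neighborhood fraction. A minimal sketch of one round of ND(G, f_ND, X_0); the adjacency-matrix representation and function names are my choices, not from the talk.

```python
import random

def neighbor_fraction(W, states, v):
    """r_X(v): weighted fraction of v's neighbors holding opinion 1."""
    total = sum(W[v][u] for u in range(len(W)) if u != v)
    ones = sum(W[v][u] for u in range(len(W)) if u != v and states[u] == 1)
    return ones / total

def node_dynamic_step(W, states, f_nd, rng=random):
    """One round of ND: pick v uniformly at random, then set its opinion
    to 1 w.p. f_ND(r_X(v)) and to 0 otherwise."""
    v = rng.randrange(len(W))
    r = neighbor_fraction(W, states, v)
    states[v] = 1 if rng.random() < f_nd(r) else 0
    return states
```

With all weights equal this reduces to the complete-graph dynamic from earlier slides (up to excluding v itself from its own neighborhood).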

SLIDE 42

Planted community

  • A weighted complete graph with n nodes, K(n, p)
– Two communities of equal size
– An edge has weight p within a community and 1 − p across communities

[Figure: planted community graph; community-structure parameter 2p − 1]

SLIDE 43

Planted community

  • A weighted complete graph with n nodes, K(n, p)
– Two communities of equal size
– An edge has weight p within a community and 1 − p across communities

[Axis: ε = 2p − 1, from 0 (complete graph) to 1 (two isolated complete graphs)]
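The planted community weights are easy to build explicitly; a sketch, assuming n even so the two communities have equal size:

```python
def planted_community_weights(n, p):
    """Weight matrix of K(n, p): two equal-size communities, edge weight p
    within a community and 1 - p across. The community-structure
    parameter is eps = 2p - 1: eps = 0 gives the complete graph,
    eps = 1 gives two isolated complete graphs."""
    half = n // 2
    community = [0] * half + [1] * half
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = p if community[i] == community[j] else 1 - p
    return W
```

This matrix plugs directly into a weighted node-dynamic simulator to explore how ε controls the strength of the echo chamber.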

SLIDE 44

Question

  • What is the interaction between rich-get-richer opinion formation and the level of inter-community connectivity?

SLIDE 45

Question

  • What is the interaction between rich-get-richer opinion formation and the level of inter-community connectivity?

[Plot: update functions for Voter, Majority, 3-Majority; axis ε = 2p − 1]

SLIDE 46

Strong community structure

  • There exists an initial state from which the process cannot reach consensus quickly.

[Plot: 3-Majority update function; large ε]

SLIDE 47

Weak community structure

  • From every initial state, the process reaches consensus quickly.

[Plot: 3-Majority update function; small ε]

SLIDE 48

Our dichotomy theorem

  • Given a smooth rich-get-richer function f_ND ∈ 𝒞² and a planted community graph G = K(n, p), the maximum expected consensus time of ND(G, f_ND, X_0) has two cases:

[Diagram: axis ε = 2p − 1 from 0 (complete graph) to 1 (two isolated complete graphs); consensus time O(n log n) below the threshold ε*, exp(Ω(n)) above it]
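The dichotomy can be probed experimentally: run the node dynamic on K(n, p) from a fully polarized start and count rounds until consensus. A small self-contained sketch; the polarized initial state, step budget, and function names are my choices for illustration, not the theorem's setup.

```python
import random

def consensus_time(n, p, f_nd, max_steps=200000, seed=None):
    """Run the node dynamic on the planted community graph K(n, p) from a
    fully polarized start (community 0 all ones, community 1 all zeros)
    and return the number of rounds until consensus, or None on timeout."""
    rng = random.Random(seed)
    half = n // 2
    states = [1] * half + [0] * half
    comm = [0] * half + [1] * half
    for t in range(1, max_steps + 1):
        v = rng.randrange(n)
        ones = total = 0.0
        for u in range(n):
            if u == v:
                continue
            w = p if comm[u] == comm[v] else 1 - p  # planted weights
            total += w
            ones += w * states[u]
        # resample v's opinion from f_ND of its weighted neighbor fraction
        states[v] = 1 if rng.random() < f_nd(ones / total) else 0
        s = sum(states)
        if s == 0 or s == n:
            return t
    return None
```

Sweeping p from 1/2 upward (ε = 2p − 1 from 0 toward 1) with the 3-majority update should show the qualitative transition the theorem describes: fast consensus for small ε, and runs that exhaust any reasonable budget once ε is large.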

SLIDE 49

Node dynamic

  • A Markov chain on a 2-d grid with spacing 2/n
  • (0,0) and (1,1) are the consensus states

[Diagram: state (fraction of opinion 1 in community 1, fraction of opinion 1 in community 2) on [0,1]²]

SLIDE 50

Our dichotomy theorem

[Diagram: axis ε = 2p − 1 marking ε′, ε*, ε′′; 0 = complete graph, 1 = two isolated complete graphs]

SLIDE 51

Our dichotomy theorem

[Diagram: axis ε = 2p − 1 marking ε′, ε*, ε′′; phase portraits with attracting, saddle, and repelling points]

SLIDE 52

Our dichotomy theorem

[Diagram: axis ε = 2p − 1 marking ε′, ε*, ε′′; 0 = complete graph, 1 = two isolated complete graphs]

SLIDE 53

Our dichotomy theorem

[Diagram: phase portraits at ε′, ε*, ε′′ with attracting, saddle, and repelling points]

SLIDE 54

Fast consensus

X_{t+1} − X_t = (1/n)(F_ND(X_t) + U(X_t)) reaches an attracting fixed point in O(n log n) steps.

[Diagram: axis marking ε′, ε*, ε′′]

SLIDE 55

Questions?