Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective
Grant Schoenebeck, University of Michigan
Fang-Yi Yu, Harvard University
Results
- Analyze the convergence rate of a family of stochastic processes
- Three related applications
  - Evolutionary game theory
  - Dynamics on social networks
  - Stochastic gradient descent
Target Audience
[Venn diagram, still not-to-scale: Evolutionary Game Theory, Dynamics on Social Networks, Stochastic Gradient Descent]
Outline
- Escaping saddle points
- Case study: dynamics on social networks
ESCAPING SADDLE POINTS
Upper bounds and lower bounds
Reinforced random walk with F
A discrete-time stochastic process {X_k : k = 0, 1, …} in ℝ^d that admits the representation
X_{k+1} − X_k = (1/n) (F(X_k) + U_k)
- Expected difference (drift): F(X_k)
- Unbiased noise: U_k
- Step size: 1/n
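The recurrence above is straightforward to simulate directly. A minimal sketch (not from the talk; function and variable names are my own), with drift F(x) = −x and bounded mean-zero uniform noise standing in for U_k:

```python
import random

def reinforced_random_walk(F, noise, x0, n, steps, rng):
    """Simulate X_{k+1} = X_k + (1/n) * (F(X_k) + U_k) in one dimension.

    F     : drift function
    noise : callable returning one mean-zero bounded sample U_k
    n     : controls the step size 1/n
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x + (F(x) + noise(rng)) / n
        trajectory.append(x)
    return trajectory

# Example: the drift F(x) = -x pulls the walk toward 0, so after
# steps >> n the walk hovers near 0 up to noise of order 1/sqrt(n).
rng = random.Random(0)
traj = reinforced_random_walk(lambda x: -x, lambda r: r.uniform(-1, 1),
                              x0=1.0, n=1000, steps=5000, rng=rng)
```

With an attracting drift, the deterministic part decays like (1 − 1/n)^k, which is the mean-field behavior discussed later.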
Examples
A discrete-time Markov process {X_k : k = 0, 1, …} in ℝ^d that admits the representation
X_{k+1} − X_k = (1/n) (F(X_k) + U_k)
- Agent-based models with n agents
  - Evolutionary games
  - Dynamics on social networks
- Heuristic local search algorithms with uniform step size 1/n
Node Dynamic on complete graphs [SY18]
- Let f_ND : [0,1] → [0,1]. n agents interact on a complete graph
- Each agent v has an initial binary state C_0(v) ∈ {0,1}
- At round k:
  - Pick a node v uniformly at random
  - Compute the fraction of opinion 1, X_k = |C_k^{-1}(1)| / n
  - Update C_{k+1}(v) to 1 w.p. f_ND(X_k); to 0 otherwise
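This dynamic takes only a few lines to simulate. An illustrative sketch (names are my own), using the voter update f_ND(x) = x:

```python
import random

def node_dynamic_complete(f_nd, states, rounds, rng):
    """Node dynamic on the complete graph: each round, pick a node v
    uniformly at random and redraw its state to 1 with probability
    f_nd(X_k), where X_k is the current fraction of opinion 1."""
    states = list(states)
    n = len(states)
    for _ in range(rounds):
        v = rng.randrange(n)
        x = sum(states) / n            # fraction of opinion 1
        states[v] = 1 if rng.random() < f_nd(x) else 0
    return states

# Voter model: adopt opinion 1 with probability equal to its
# current fraction, i.e. copy a uniformly random agent's opinion.
rng = random.Random(1)
n = 50
final = node_dynamic_complete(lambda x: x, [i % 2 for i in range(n)],
                              rounds=20000, rng=rng)
```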
Node Dynamic
Includes several existing dynamics
- Voter model
- Iterative majority [Mossel et al 14]
- Iterative 3-majority [Doerr et al 11]
[Figure: update functions f_ND of the Voter, Majority, and 3-Majority dynamics]
Node Dynamic as a reinforced random walk
Node dynamic on complete graphs:
- Let f_ND : [0,1] → [0,1]. There are n agents on a complete graph
- Each agent v has an initial binary state C_0(v) ∈ {0,1}
- At round k:
  - Pick a node v uniformly at random
  - Compute the fraction of opinion 1, X_k = |C_k^{-1}(1)| / n
  - Update C_{k+1}(v) to 1 w.p. f_ND(X_k); to 0 otherwise
Reinforced random walk on ℝ:
- Let X_k be the fraction of nodes in state 1 at round k.
- Given X_k, the expected number of nodes in state 1 after round k is
  E[n X_{k+1} | X_k] = n X_k + (f_ND(X_k) − X_k)
  (the picked node is updated to 1 w.p. f_ND(X_k), and was already 1 w.p. X_k).
- Dividing by n: E[X_{k+1} | X_k] − X_k = (1/n) (f_ND(X_k) − X_k), so the drift is F(X_k) = f_ND(X_k) − X_k.
- Altogether, X_{k+1} − X_k = (1/n) (f_ND(X_k) − X_k + U_k), with drift f_ND(X_k) − X_k and noise U_k.
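The one-round drift can be verified exactly by conditioning on the picked node's old state. A sketch (my own naming), checking E[X_{k+1} − X_k | X_k = x] = (f_ND(x) − x)/n for the standard 3-majority update f_ND(x) = 3x² − 2x³ (the probability that a majority of 3 uniform samples holds opinion 1):

```python
def expected_drift(f_nd, ones, n):
    """Exact one-round drift of the fraction X_k on the complete graph.

    The picked node is uniform, so it currently holds opinion 1 w.p.
    x = ones / n; it is then redrawn to 1 w.p. f_nd(x).  The expected
    change in the count of ones is therefore f_nd(x) - x."""
    x = ones / n
    # condition on the picked node's old state (1 w.p. x, 0 w.p. 1 - x)
    d_count = x * (f_nd(x) - 1) + (1 - x) * (f_nd(x) - 0)
    return d_count / n   # drift of the fraction, not of the count

f = lambda x: 3 * x**2 - 2 * x**3   # 3-majority update function
n = 100
for ones in (10, 50, 90):
    x = ones / n
    assert abs(expected_drift(f, ones, n) - (f(x) - x) / n) < 1e-12
```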
Question
Given F and U, what is the limit of X_k for sufficiently large n?
X_{k+1} − X_k = (1/n) (F(X_k) + U_k)
Mean field approximation
X_{k+1} − X_k = (1/n) (F(X_k) + U(X_k))
Drop the noise and rescale time to obtain the mean-field ODE x' = F(x).
If n is large enough, then for k = O(n), X_k ≈ x(k/n) [Wormald et al 95].
Regular point
If n is large enough, then for k = O(n), X_k ≈ x(k/n).
Fixed point, F(x*) = 0
If n is large enough, then for k = O(n), X_k ≈ x(k/n).
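The approximation X_k ≈ x(k/n) can be seen numerically by running the count process next to an Euler discretisation of x' = F(x) with step 1/n. A sketch under my own naming, again with the 3-majority update (here x* = 1/2 is repelling, so a start at 0.8 flows to the attracting fixed point 1):

```python
import random

def simulate_fraction(f_nd, x0, n, steps, rng):
    """Track only the count of ones; given the count, the picked node's
    old state is Bernoulli(x), which preserves the law of the process."""
    ones = round(x0 * n)
    xs = [ones / n]
    for _ in range(steps):
        x = ones / n
        v_is_one = rng.random() < x        # picked node's old state
        new = rng.random() < f_nd(x)       # its redrawn state
        ones += int(new) - int(v_is_one)
        xs.append(ones / n)
    return xs

def mean_field(f_nd, x0, n, steps):
    """Euler discretisation of x' = F(x) = f_nd(x) - x with step 1/n,
    matching one round of the process per Euler step."""
    x, xs = x0, [x0]
    for _ in range(steps):
        x = x + (f_nd(x) - x) / n
        xs.append(x)
    return xs

f = lambda x: 3 * x**2 - 2 * x**3
n = 2000
rng = random.Random(0)
sim = simulate_fraction(f, 0.8, n, 4 * n, rng)
ode = mean_field(f, 0.8, n, 4 * n)
```

For k = 4n (mean-field time 4), both trajectories are close to the attracting fixed point at 1, with the simulation fluctuating around the ODE at scale roughly 1/√n.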
Escaping a non-attracting fixed point
How long does the process take to escape a non-attracting fixed point?
1. Θ(n)
2. Θ(n log n)  (the answer)
3. Θ(n (log n)^4)
4. Θ(n^2)
Lower bound
Escaping a saddle-point region takes at least Ω(n log n) steps.
[Figure: trajectory started at X_0 = x*]
Upper bound
Escaping a saddle-point region takes at most O(n log n) steps, if
- Noise U_k is
  - a martingale difference
  - bounded
  - noisy (its covariance matrix is large)
- Drift F ∈ C^2 and
  - x* is hyperbolic
[Figure: trajectory from X_0 near x* to X_T, with T = O(n log n)]
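The escape time is easy to observe empirically. A sketch (my own naming): start the 3-majority node dynamic on the complete graph exactly at the repelling fixed point x* = 1/2 and count the rounds until the fraction leaves a fixed radius around it.

```python
import random

def escape_time(f_nd, n, radius, cap, rng):
    """Rounds until the fraction leaves (1/2 - radius, 1/2 + radius),
    starting from the fixed point x* = 1/2 with F'(x*) > 0 (repelling).
    Returns None if the walk has not escaped within `cap` rounds."""
    ones = n // 2
    for k in range(cap):
        x = ones / n
        if abs(x - 0.5) >= radius:
            return k
        v_is_one = rng.random() < x        # picked node's old state
        new = rng.random() < f_nd(x)       # its redrawn state
        ones += int(new) - int(v_is_one)
    return None

# 3-majority: F(x) = f(x) - x has F'(1/2) = 1/2 > 0, so 1/2 repels.
f = lambda x: 3 * x**2 - 2 * x**3
rng = random.Random(0)
t = escape_time(f, n=500, radius=0.1, cap=200_000, rng=rng)
```

Since each round changes the count by at most one, escaping radius 0.1 needs at least 0.1·n rounds; the theorem above says the noise-driven escape in fact takes Θ(n log n) rounds.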
Gradient-like dynamics
Converges to an attracting fixed-point region in O(n log n) steps, if
- Noise U_k is
  - a martingale difference
  - bounded
  - noisy
- Drift F ∈ C^2 and
  - fixed points are hyperbolic
  - a potential function exists
Outline
- Escaping saddle points
- Case study: dynamics on social networks
(DIS)AGREEMENT IN PLANTED COMMUNITY NETWORKS
Dynamics on social networks
Echo chamber: beliefs are amplified through interactions in segregated systems
- Rich-get-richer
- Community structure
Question
What is the consensus time, given a rich-get-richer opinion formation rule and the level of intercommunity connectivity?
Node Dynamic ND(G, f_ND, X_0) [Schoenebeck, Yu 18]
- Fix a (weighted) graph G = (V, E), the opinion set {0,1}, and an update function f_ND
- Given an initial configuration X_0 : V → {0,1}
- At round t:
  - A node v is picked uniformly at random
  - X_t(v) = 1 w.p. f_ND(s_{X_{t−1}}(v)); 0 otherwise
The update of an opinion only depends on s_{X_{t−1}}(v), the (weighted) fraction of opinion 1 amongst v's neighbors.
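One round of this dynamic on a weighted graph can be sketched as follows (names are mine; the weight matrix is assumed symmetric with no isolated nodes):

```python
import random

def node_dynamic_step(weights, f_nd, config, rng):
    """One round of the node dynamic on a weighted graph.

    weights : n x n symmetric weight matrix with zero diagonal
    config  : current opinions, a list over V with values in {0, 1}
    The picked node v adopts opinion 1 w.p. f_nd(s), where s is the
    weighted fraction of opinion 1 among v's neighbors."""
    n = len(config)
    v = rng.randrange(n)
    total = sum(weights[v])
    s = sum(w for u, w in enumerate(weights[v]) if config[u] == 1) / total
    config[v] = 1 if rng.random() < f_nd(s) else 0
    return config

# Consensus states are absorbing: a unanimous configuration never changes.
rng = random.Random(0)
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
voter = lambda s: s
assert node_dynamic_step(K4, voter, [1, 1, 1, 1], rng) == [1, 1, 1, 1]
assert node_dynamic_step(K4, voter, [0, 0, 0, 0], rng) == [0, 0, 0, 0]
```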
Planted Community
- A weighted complete graph with n nodes, K(n, p)
  - Two communities of equal size
  - An edge has weight p if its endpoints are in the same community, and 1 − p otherwise
- Community structure is parameterized by ε = 2p − 1: ε = 0 is the complete graph; ε = 1 is two isolated complete graphs
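The weight matrix of K(n, p) is simple to build explicitly; a sketch with my own naming:

```python
def planted_community_weights(n, p):
    """Weighted complete graph K(n, p): nodes 0..n/2-1 form community 1,
    the rest community 2.  Intra-community edges have weight p,
    inter-community edges weight 1 - p, and the diagonal is zero."""
    half = n // 2
    same = lambda i, j: (i < half) == (j < half)
    return [[0.0 if i == j else (p if same(i, j) else 1.0 - p)
             for j in range(n)] for i in range(n)]

# p = 0.9 gives epsilon = 2p - 1 = 0.8: strong community structure.
W = planted_community_weights(6, 0.9)
```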
Question
- What is the interaction between rich-get-richer opinion formation and the level of intercommunity connectivity?
[Figure: update functions of Voter, Majority, and 3-Majority; community axis ε = 2p − 1]
Strong Community Structure (large ε)
- There exists an initial state from which the process cannot reach consensus quickly.
[Figure: 3-Majority update function]
Weak Community Structure (small ε)
- For all initial states, the process reaches consensus quickly.
[Figure: 3-Majority update function]
Our Dichotomy Theorem
- Given a smooth rich-get-richer function f_ND ∈ C^2 and a planted community graph G = K(n, p), the maximum expected consensus time of ND(G, f_ND, X_0) has two cases:
  - O(n log n) when ε is below a threshold ε*
  - exp(Ω(n)) when ε is above ε*
[Figure: axis ε = 2p − 1 from the complete graph (ε = 0) to two isolated complete graphs (ε = 1), with threshold ε*]
Node dynamic as a 2-d Markov chain
- A Markov chain on a 2-d grid with spacing 2/n, one axis per community fraction
- (0, 0) and (1, 1) are the consensus states
[Figure: 2-d grid of community fractions; axes community 1 and community 2; corners (0, 0) and (1, 1)]
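The drift field of this 2-d chain has a clean mean-field form. In the sketch below (my own derivation and naming, ignoring O(1/n) self-edge corrections), a node in community 1 sees the weighted opinion fraction s1 = p·x1 + (1 − p)·x2, and symmetrically for community 2; the consensus corners are fixed points of the drift:

```python
def drift_2d(f_nd, x1, x2, p):
    """Mean-field drift (up to the 1/n step size) of the community
    fractions (x1, x2) of opinion 1 on K(n, p).  A picked node in
    community i adopts 1 w.p. f_nd(s_i), where s_i is the weighted
    opinion fraction it sees across both communities."""
    s1 = p * x1 + (1 - p) * x2
    s2 = (1 - p) * x1 + p * x2
    return (f_nd(s1) - x1, f_nd(s2) - x2)

f = lambda x: 3 * x**2 - 2 * x**3   # 3-majority update function

# The consensus states (0,0) and (1,1) are fixed points of the drift,
# and so is the symmetric disagreement point (1/2, 1/2).
assert drift_2d(f, 0.0, 0.0, 0.9) == (0.0, 0.0)
assert drift_2d(f, 1.0, 1.0, 0.9) == (0.0, 0.0)
d1, d2 = drift_2d(f, 0.5, 0.5, 0.9)
assert abs(d1) < 1e-12 and abs(d2) < 1e-12
```

Whether the interior fixed points are saddles, attractors, or repellers depends on ε = 2p − 1, which is exactly the dichotomy pictured on the following slides.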
Our Dichotomy Theorem
[Figure: fixed-point structure along ε = 2p − 1, from the complete graph to two isolated complete graphs; phase portraits at ε′ < ε* and ε′′ > ε* show attracting, saddle, and repelling points]
Fast consensus
X_{k+1} − X_k = (1/n) (F_ND(X_k) + U(X_k)) reaches an attracting fixed point in O(n log n) steps.
[Figure: phase portraits at ε′ < ε* < ε′′]