SLIDE 25 Equilibrium Models [1] on MNIST (Proof of Concept)
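\[
\min_{\gamma = (A, B, c),\; \theta} \; \sum_{i=1}^{n} \mathrm{CE}\big(w_i(\gamma)^\top \theta,\; y_i\big),
\qquad
w_i(\gamma) = \phi_i\big(w_i(\gamma), \gamma\big) = \tanh\big(A\, w_i(\gamma) + B\, x_i + c\big)
\]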
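The equilibrium condition above defines each representation implicitly as the fixed point of a tanh layer. A minimal sketch of computing such a fixed point by plain iteration is given below, using only the Python standard library. The matrices `A`, `B`, the bias `c`, the input `x`, and the helper names are illustrative assumptions, not values from the experiment. `A` is chosen with maximum absolute row sum below 1, which (since tanh is 1-Lipschitz) makes the map a contraction in the infinity norm, so the iteration is guaranteed to converge.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def phi(w, x, A, B, c):
    """One application of the map w -> tanh(A w + B x + c)."""
    Aw = matvec(A, w)
    Bx = matvec(B, x)
    return [math.tanh(a + b + ci) for a, b, ci in zip(Aw, Bx, c)]

def solve_fixed_point(x, A, B, c, tol=1e-10, max_iter=1000):
    """Iterate w <- phi(w) from zero until successive iterates differ by < tol."""
    w = [0.0] * len(c)
    for k in range(max_iter):
        w_next = phi(w, x, A, B, c)
        if max(abs(a - b) for a, b in zip(w_next, w)) < tol:
            return w_next, k + 1
        w = w_next
    return w, max_iter

# Hypothetical toy parameters (assumptions for illustration only).
A = [[0.3, -0.2], [0.1, 0.4]]              # max abs row sum 0.5 < 1: contraction
B = [[1.0, 0.5, -0.5], [0.2, -0.3, 0.8]]
c = [0.1, -0.1]
x = [0.5, -1.0, 2.0]

w_star, n_iter = solve_fixed_point(x, A, B, c)
# Residual of the equilibrium equation at the returned point.
residual = max(abs(a - b) for a, b in zip(phi(w_star, x, A, B, c), w_star))
```

If the contraction condition on `A` is dropped, this loop has no convergence guarantee, which is the regime the non-contractive runs on this slide probe.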
[Figure: four panels comparing CG, FP, and ITD (each listed in the legend plain and with a marker). Panels: Objective vs. time (s); Test accuracy vs. iterations (×100); Hypergradient norm ||g(λ)|| vs. iterations (×100); Test accuracy vs. learning rate.]
$\phi_i(\cdot, \gamma)$ is NOT a contraction for the marked methods in the legend.
- When $\phi_i(\cdot, \gamma)$ is a contraction, all the methods perform similarly.
- ITD is the most stable when the contraction assumption is not satisfied.
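The observation that the methods agree in the contractive case can be illustrated on a scalar toy problem. The sketch below assumes that ITD denotes iterative differentiation (differentiating through the fixed-point iterations) and that FP denotes solving the implicit linear system by fixed-point iteration, as these labels are commonly used. The scalar model, the squared loss (standing in for the cross-entropy on the slide), and all numeric values are illustrative assumptions. With |a| < 1 the map is a contraction and both estimates of the derivative of the loss with respect to `a` should coincide.

```python
import math

def phi(w, a, b, c, x):
    """Scalar equilibrium map w -> tanh(a w + b x + c)."""
    return math.tanh(a * w + b * x + c)

def hypergrad_itd(a, b, c, x, y, steps):
    """ITD sketch: forward-mode derivative carried through the iterations."""
    w, dw = 0.0, 0.0                     # dw tracks d w_t / d a
    for _ in range(steps):
        w_next = phi(w, a, b, c, x)
        # Chain rule on tanh(a w + b x + c): tanh'(u) = 1 - tanh(u)^2.
        dw = (1.0 - w_next ** 2) * (w + a * dw)
        w = w_next
    return (w - y) * dw                  # loss is 0.5 * (w - y)^2

def hypergrad_fp(a, b, c, x, y, steps):
    """FP sketch: solve v = (d phi / d w) v + f'(w) by fixed-point iteration."""
    w = 0.0
    for _ in range(steps):
        w = phi(w, a, b, c, x)
    s = 1.0 - phi(w, a, b, c, x) ** 2    # tanh'(u) at the (approximate) equilibrium
    v = 0.0
    for _ in range(steps):
        v = s * a * v + (w - y)          # d phi / d w = s * a
    return s * w * v                     # (d phi / d a) * v, with d phi / d a = s * w

# Hypothetical toy values; |a| < 1 makes the map a contraction.
a, b, c, x, y = 0.5, 1.0, 0.1, 0.3, -0.2
g_itd = hypergrad_itd(a, b, c, x, y, steps=200)
g_fp = hypergrad_fp(a, b, c, x, y, steps=200)
print(abs(g_itd - g_fp))
```

Both routines use the same number of inner steps here only for simplicity; in practice the forward and backward iteration counts can be chosen separately.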
[1] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. "Deep equilibrium models". In: Advances in Neural Information Processing Systems. 2019, pp. 688–699.