SLIDE 1

Differentiable Linearized ADMM

Xingyu Xie*¹, Jianlong Wu*¹, Zhisheng Zhong¹, Zhouchen Lin¹, Guangcan Liu²

ICML | 2019

Thirty-sixth International Conference on Machine Learning

¹ Key Lab. of Machine Perception, School of EECS, Peking University
² B-DAT and CICAEET, School of Automation, Nanjing University of Information Science and Technology

SLIDE 2

Background

• Most machine learning problems are, in the end, optimization problems: SVM, K-Means, Deep Learning, …

Optimization plays a very important role in learning.

• Personal opinion: in general, what computers can do is nothing more than “computation”. Thus, to give them the ability to “learn”, it is often desirable to convert a “learning” problem into some kind of computational problem:

$$\min_{x}\; f(x,\ \mathrm{data}), \quad \mathrm{s.t.}\ x \in \Omega$$

Question: conversely, can optimization benefit from learning?

SLIDE 3

Learning-based Optimization

  • K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. ICML 2010.
  • P. Sprechmann, A. M. Bronstein, and G. Sapiro. Learning efficient sparse and low rank models. TPAMI 2015.
  • Y. Yang, J. Sun, H. Li, and Z. Xu. Deep ADMM-Net for compressive sensing MRI. NeurIPS 2016.
  • B. Amos and J. Z. Kolter. OptNet: differentiable optimization as a layer in neural networks. ICML 2017.

A traditional optimization algorithm is indeed an ultra-deep network with fixed parameters:

$$\min_{x}\; f(x,\ \mathrm{data}),\ \ \mathrm{s.t.}\ x \in \Omega \qquad\Longrightarrow\qquad x_{t+1} = h(x_{t})$$

Example (sparse coding): for

$$\min_{x}\; \|y - Ax\|_{2}^{2} + \lambda\,\|x\|_{1},$$

the (L)ISTA iteration is

$$x_{t+1} = h_{\theta}\!\left(W y + S x_{t}\right), \qquad S = I - A^{\top}A/\rho,\quad W = A^{\top}/\rho.$$

Learning-based optimization: Introduce learnable parameters and “reduce” the network depth, so as to improve computational efficiency
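To make the unrolling concrete, here is a minimal NumPy sketch, not taken from the slides or paper: the problem data and the names A, y, lam, rho are illustrative. It runs ISTA as a stack of identical fixed layers; the learning-based variant (LISTA) simply makes W, S and the threshold trainable per layer so that a few layers suffice.

```python
import numpy as np

def soft_threshold(v, theta):
    # Proximal operator of the l1 norm: the non-linearity h_theta above.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def layer(x, y, W, S, theta):
    # One unrolled iteration: x_{t+1} = h_theta(W y + S x_t).
    return soft_threshold(W @ y + S @ x, theta)

# Illustrative data for (1/2)||y - A x||_2^2 + lam ||x||_1 (the 1/2 only rescales lam).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[rng.choice(50, size=5, replace=False)] = 1.0
y = A @ x_true
lam = 0.1

# ISTA: W, S, theta are *fixed* functions of A (an "ultra-deep network").
rho = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
W, S, theta = A.T / rho, np.eye(50) - A.T @ A / rho, lam / rho

x = np.zeros(50)
for t in range(1000):                      # many identical fixed layers
    x = layer(x, y, W, S, theta)

# LISTA-style learning: make (W_k, S_k, theta_k) trainable per layer and fit
# them on data, so that roughly 10-20 layers replace the 1000 iterations above.
```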

SLIDE 4

Learning-based Optimization (Cont'd)

Limitations of existing work

  • From a theoretical point of view, it is unclear why learning can improve computational efficiency, as theoretical convergence analysis is extremely rare.

  • X. Chen, J. Liu, Z. Wang, and W. Yin. Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds. NeurIPS 2018. (specific to unconstrained problems)

SLIDE 5

D-LADMM: Differentiable Linearized ADMM

• Target constrained problem: … (f and g are known convex functions)

• LADMM (Lin et al., NeurIPS 2011): the baseline iteration with fixed parameters

• D-LADMM: the same iteration with learnable parameters, where the non-linear functions are also learnable
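The formulas on this slide did not survive extraction, so the following is only a hedged sketch. Assuming a linearly constrained problem of the form min_{z,e} f(z) + g(e) s.t. Az + Be = b with f = g = ‖·‖₁ (an assumption for illustration, not necessarily the paper's exact setting), one LADMM-style layer whose weights D-LADMM would make learnable could look like this; the names W1, W2, theta1, theta2, beta are all hypothetical.

```python
import numpy as np

def soft_threshold(v, theta):
    # l1 proximal step; in D-LADMM the non-linearity itself is learnable.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ladmm_style_layer(z, e, lam, b, A, B, W1, W2, theta1, theta2, beta):
    """One linearized-ADMM step for the *assumed* problem
        min ||z||_1 + ||e||_1   s.t.   A z + B e = b.
    Plain LADMM fixes W1 = A^T / eta1, W2 = B^T / eta2 and the thresholds;
    the unrolled, learnable version trains these quantities layer-wise."""
    r = A @ z + B @ e - b                     # constraint residual
    z = soft_threshold(z - W1 @ (lam + beta * r), theta1)
    r = A @ z + B @ e - b                     # residual with the updated z
    e = soft_threshold(e - W2 @ (lam + beta * r), theta2)
    lam = lam + beta * (A @ z + B @ e - b)    # multiplier (dual) update
    return z, e, lam
```

Stacking K such layers, each with its own parameters, gives an unrolled network in the spirit of D-LADMM; the paper's actual parameterization and its learnable non-linear functions are defined there.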

SLIDE 6

D-LADMM (Cont'd)

Questions:
• Q1: Can D-LADMM guarantee to correctly solve the optimization problem?
• Q2: What are the benefits of D-LADMM?
• Q3: How to train the D-LADMM model?

SLIDE 7

Main Assumption

Assumption 1: generalized non-emptiness of …

Assumption required by LADMM: …

Assumption required by D-LADMM: …

SLIDE 8

Theoretical Result I

Q1: Can D-LADMM guarantee to correctly solve the optimization problem? A1: Yes!

Theorem 1 and Theorem 2 [Convergence and Monotonicity] (informal).

[Figure: the distance from D-LADMM's k-th layer output to the solution set of the original problem, which decreases monotonically over layers.]

SLIDE 9

Theoretical Result II

Q2: What are the benefits of D-LADMM? A2: It converges faster!

Lemma 4.4 [Faster Convergence] (informal). Define the operators …; for any …:

General case (no EBC): …

Theorem 3 [Convergence Rate] (informal). If the original problem satisfies the Error Bound Condition (a condition on A and B), then D-LADMM attains linear convergence.

D-LADMM > LADMM

SLIDE 10

Training Approaches

Q3: How to train the model of D-LADMM?

• Unsupervised way: minimize the duality gap, where … is the dual function. The global optimum is attained whenever the objective (the duality gap) reaches zero!

• Supervised way: minimize a square loss, where the ground-truth Z* and E* are provided along with the training samples.
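The concrete loss formulas are not shown on the extracted slide, so the following is only an assumed illustration for the ℓ1/ℓ1 instance sketched earlier, with hypothetical names: a supervised square loss and a duality-gap criterion evaluated at the last-layer outputs (Z_K, E_K) and multiplier lam_K (vector-valued arguments for simplicity).

```python
import numpy as np

def supervised_loss(Z_K, E_K, Z_star, E_star):
    # Square loss against the ground-truth (Z*, E*) shipped with the training data.
    return np.sum((Z_K - Z_star) ** 2) + np.sum((E_K - E_star) ** 2)

def duality_gap(Z_K, E_K, lam_K, A, B, b):
    # Gap for the assumed problem  min ||Z||_1 + ||E||_1  s.t.  A Z + B E = b.
    # The dual function is d(lam) = -<lam, b> when ||A^T lam||_inf <= 1 and
    # ||B^T lam||_inf <= 1 (and -inf otherwise).  For a feasible (Z_K, E_K),
    # a zero gap certifies global optimality.
    primal = np.abs(Z_K).sum() + np.abs(E_K).sum()
    dual_ok = max(np.abs(A.T @ lam_K).max(), np.abs(B.T @ lam_K).max()) <= 1.0
    dual = -float(lam_K @ b) if dual_ok else -np.inf
    return primal - dual
```

Either criterion would be summed over the training set and back-propagated through the unrolled layers; a trainable version would replace the hard dual-feasibility check with a differentiable surrogate, and the paper's exact objectives are given there.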

SLIDE 11

Experiments

Target optimization problem: … A 15-layer D-LADMM achieves performance comparable to, or even slightly better than, the LADMM algorithm run for 1500 iterations!

Table 1. PSNR comparison on 12 images with noise rate 10%.

SLIDE 12

Conclusion

LADMM → D-LADMM

Theory:
• Convergence: D-LADMM converges layer-wise to the desired solution set.
• Speed: D-LADMM converges to the solution set faster than LADMM does.

Practice (training):
• Minimizing the duality gap (unsupervised).
• Minimizing the square loss (supervised).