SLIDE 1

Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models

Dilin Wang Qiang Liu

Department of Computer Science The University of Texas at Austin

Dilin Wang and Qiang Liu Nonlinear SVGD 1 / 8

SLIDE 2

Learning Mixture Models

Learning mixture models by maximum likelihood:

$$\max_{\Theta}\; F(\Theta) := \mathbb{E}_{x \sim \mathcal{D}}\left[\log \frac{1}{m}\sum_{i=1}^{m} p(x \mid \theta_i)\right], \qquad \Theta = \{\theta_i\}_{i=1}^{m}.$$

Challenges:

  • Optimization is highly non-convex.
  • Promoting diversification increases robustness [e.g., Borodin, 2009; Xie et al., 2018].

Our work:

  • A variational view with entropic regularization, optimized by generalizing Stein variational gradient descent (SVGD) [Liu and Wang, 2016].
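For concreteness, the maximum-likelihood objective F(Θ) can be evaluated stably with a log-sum-exp. Below is a minimal NumPy sketch for unit-variance Gaussian components; the choice of Gaussian components and unit variance is an illustrative assumption, not part of the slides.

```python
import numpy as np

def mixture_loglik(theta, x):
    # F(Theta) = (1/N) sum_x log( (1/m) sum_i N(x | theta_i, 1) ),
    # computed via log-sum-exp for numerical stability.
    logp = -0.5 * (x[:, None] - theta[None, :])**2 - 0.5 * np.log(2 * np.pi)  # (N, m)
    mx = logp.max(axis=1)
    lse = mx + np.log(np.exp(logp - mx[:, None]).sum(axis=1))  # log sum_i p(x | theta_i)
    return (lse - np.log(len(theta))).mean()
```

The log-sum-exp trick matters here because individual component densities underflow quickly once a point is far from all components.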

SLIDE 3

Learning Diversified Infinite Mixtures

Step 1: Relax to learning infinite mixtures:

$$\max_{\rho}\; F[\rho] := \mathbb{E}_{x \sim \mathcal{D}}\big[\log \mathbb{E}_{\theta \sim \rho}[\,p(x \mid \theta)\,]\big] \qquad \text{(infinite mixture models)}$$

This reduces to the finite case when $\rho := \frac{1}{m}\sum_{i=1}^{m} \delta_{\theta_i}$.

Step 2: Add entropy regularization to enforce diversity:

$$\max_{\rho}\; \mathcal{J}[\rho] := F[\rho] + \alpha H[\rho], \qquad \text{where the entropy is } H[\rho] = -\int \rho \log \rho.$$
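The reduction to the finite case can be made explicit by substituting the empirical particle measure into $F[\rho]$:

```latex
\mathbb{E}_{\theta \sim \rho}\big[p(x \mid \theta)\big]
  = \int p(x \mid \theta)\, \frac{1}{m}\sum_{i=1}^{m}\delta_{\theta_i}(\mathrm{d}\theta)
  = \frac{1}{m}\sum_{i=1}^{m} p(x \mid \theta_i),
```

so $F[\rho]$ coincides with the finite-mixture objective $F(\Theta)$ from slide 2.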

SLIDE 4

Learning Diversified Infinite Mixtures

Step 1: Relax to learning infinite mixtures:

$$\max_{\rho}\; F[\rho] := \mathbb{E}_{x \sim \mathcal{D}}\big[\log \mathbb{E}_{\theta \sim \rho}[\,p(x \mid \theta)\,]\big] \qquad \text{(infinite mixture models)}$$

This reduces to the finite case when $\rho := \frac{1}{m}\sum_{i=1}^{m} \delta_{\theta_i}$.

Step 2: Add entropy regularization to enforce diversity:

$$\max_{\rho}\; \mathcal{J}[\rho] = \underbrace{F[\rho]}_{\text{likelihood (nonlinear functional)}} \;+\; \underbrace{\alpha H[\rho]}_{\text{diversity (entropy)}}.$$

A difficult problem to solve, achieved by generalizing Stein variational gradient descent (SVGD) [Liu and Wang, 2016].

SLIDE 5

Nonlinear SVGD: Derivation

We want to approximately solve

$$\max_{\rho}\; \mathcal{J}[\rho] = F[\rho] + \alpha H[\rho].$$

Approximate $\rho$ with the particle measure $\rho := \frac{1}{m}\sum_{i} \delta_{\theta_i}$.

Iteratively update $\{\theta_i\}$ to yield the steepest ascent on $\mathcal{J}[\rho]$:

$$\theta_i' \leftarrow \theta_i + \epsilon\,\phi^*(\theta_i), \qquad \phi^* \approx \arg\max_{\phi \in \mathcal{F}} \big(\mathcal{J}[\rho'] - \mathcal{J}[\rho]\big),$$

where $\rho'$ is the density of the updated particles $\theta_i'$, and $\mathcal{F}$ is the unit ball of a reproducing kernel Hilbert space (RKHS) with a positive definite kernel $k(\theta_i, \theta_j)$.

SLIDE 6

Yields a Simple Algorithm

Starting from an initial $\{\theta_i\}$, repeat:

$$\theta_i \leftarrow \theta_i + \epsilon\, \hat{\mathbb{E}}_{\theta_j \sim \rho}\Big[\underbrace{\nabla_{\theta_j} F(\Theta)\, k(\theta_i, \theta_j)}_{\text{weighted sum of gradients}} \;+\; \underbrace{\alpha\, \nabla_{\theta_j} k(\theta_i, \theta_j)}_{\text{repulsive force}}\Big], \qquad \forall i,$$

where $\nabla_{\theta_j} F(\Theta)$ is the gradient of the standard log-likelihood. Return $\rho = \frac{1}{m}\sum_i \delta_{\theta_i}$.

In comparison, gradient descent of the standard log-likelihood is $\theta_i \leftarrow \theta_i + \epsilon\, \nabla_{\theta_i} F(\Theta),\; \forall i$.
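The update above can be sketched in a few lines of NumPy. Below is a minimal illustration for a 1D Gaussian mixture with unit-variance components (so each particle θ_i is a component mean); the RBF kernel, bandwidth `h`, step size `eps`, and regularization weight `alpha` are illustrative assumptions, not values from the slides.

```python
import numpy as np

def rbf_kernel_and_grad(theta, h=1.0):
    # k(theta_i, theta_j) = exp(-(theta_i - theta_j)^2 / (2 h^2)) for scalar particles
    diff = theta[:, None] - theta[None, :]      # diff[i, j] = theta_i - theta_j
    K = np.exp(-diff**2 / (2 * h**2))           # (m, m) kernel matrix
    gradK = diff / h**2 * K                     # d k(theta_i, theta_j) / d theta_j
    return K, gradK

def loglik_grad(theta, x):
    # Gradient of F(Theta) = mean_x log( (1/m) sum_i N(x | theta_i, 1) )
    # with respect to each component mean theta_j.
    d = x[:, None] - theta[None, :]                       # (N, m)
    logp = -0.5 * d**2                                    # log N(x | theta_j, 1), up to a constant
    w = np.exp(logp - logp.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # posterior responsibilities
    return (w * d).mean(axis=0)                           # (m,)

def nonlinear_svgd_step(theta, x, eps=0.2, alpha=0.1):
    # theta_i += eps * (1/m) sum_j [ gradF_j * k(theta_i, theta_j)
    #                                + alpha * d k(theta_i, theta_j) / d theta_j ]
    m = len(theta)
    K, gradK = rbf_kernel_and_grad(theta)
    gF = loglik_grad(theta, x)
    phi = (K @ gF + alpha * gradK.sum(axis=1)) / m
    return theta + eps * phi
```

Without the repulsive term, two components initialized symmetrically around a bimodal dataset receive near-zero gradients and stall together; the α∇k term pushes them apart so each is captured by a separate mode.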

SLIDE 7

Deep Embedded Clustering

Figure: 2D visualization with PCA on MNIST (panels: AE+k-means, DEPICT (Dizaji et al., 2017), Ours).

Method                         NMI     ACC
DEC (Xie et al., 2016)         0.816   0.844
JULE (Yang et al., 2016)       0.913   0.964
DEPICT (Dizaji et al., 2017)   0.917   0.965
Ours                           0.933   0.974

Table: Results on MNIST.

SLIDE 8

Deep Anomaly Detection

We applied our method to improve deep anomaly detection.

Method                          Precision   Recall   F1
DSEBM (Zhai et al., 2016)       0.7369      0.7477   0.7423
DCN (Yang et al., 2017)         0.7696      0.7829   0.7762
DAGMM-p (Zong et al., 2018)     0.7579      0.7710   0.7644
DAGMM-NVI (Zong et al., 2018)   0.9290      0.9447   0.9368
DAGMM (Zong et al., 2018)       0.9297      0.9442   0.9369
Ours                            0.9659      0.9490   0.9573

Table: Results on the KDDCUP99 dataset.

SLIDE 9

Conclusions

  1. A new method to learn diversified mixture models.
  2. Generalizes Stein variational gradient descent (SVGD).
  3. Simple and practical!

Poster #231, today 06:30 – 09:00 PM @ Pacific Ballroom. Thank you!
