Derivative-Free Methods for Machine Learning Tasks Inverse Problem - PowerPoint PPT Presentation

EnKF Kovachki & Stuart Derivative-Free Methods for Machine Learning Tasks Inverse Problem Formulations The Ensemble Kalman Filter Ensemble Kalman Filter Numerics Nikola B. Kovachki 1 Andrew M. Stuart 1 1 Computational and Mathematical Sciences California Institute of Technology Inverse Problems and Machine Learning February 9-11th, 2018

Table of Contents EnKF Kovachki & Stuart Inverse Problem Formulations 1 Inverse Problem Formulations Ensemble Kalman Filter Numerics 2 Ensemble Kalman Filter 3 Numerics

Supervised Learning EnKF • Data : { ( x j , y j ) } N j =1 with x j ∈ X , y j ∈ Y and X , Y Hilbert spaces. Kovachki & • Find : G ( u |· ) : X → Y for parameter u ∈ U consistent with the data. Stuart • Concatenate : Inverse Problem Formulations y = G( u | x) + η Ensemble where G( ·| x) : U → Y N and η is model or data error. Kalman Filter Numerics • Losses : Φ( u ; x , y) N 1 � 2 � y − G( u | x) � 2 Y N + R ( u ) or − � y j , log G ( u | , x j ) � Y + R ( u ) j =1 • Standard Solution (SGD) : u = −∇ u Φ( u ; x , y); ˙ u (0) = u 0 u ∗ = u ( T )

Example EnKF • Classification : Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics • NLP :

Online Supervised Learning EnKF • Data : As before possibly with N = ∞ . Kovachki & • Dynamic : For j = 0 , 1 , 2 , . . . Stuart Inverse Problem u j +1 = u j Formulations Ensemble y j +1 = G ( u j +1 | x j +1 ) + η j +1 Kalman Filter Numerics • Find : u j given Y j = { y k } j k =1 and update sequentially. • Loss : Φ( u ; x , y ) 1 2 � y − G ( u | x ) � 2 Y + R ( u ) or − � y , log G ( u | x ) � Y + R ( u ) • Standard Solution (OGD) : u = −∇ u Φ( u ; x j +1 , y j +1 ); ˙ u (0) = u j u j +1 = u ( T j )

Example EnKF • Model Improvement : Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics • Stream Data :

Semi-Supervised Learning (on a graph) EnKF Kovachki & Bertozzi and Flenner 2012. (MMS) Stuart Bertozzi, Luo, Stuart, Zygalakis 2017. (preprint) Inverse Problem • Data : { x j } j ∈ Z and { y j } j ∈ Z ′ with Z ′ ⊂ Z and | Z ′ | ≪ | Z | . Formulations • Find : u : Z → R m such that Ensemble Kalman Filter Numerics ∀ j ∈ Z ′ y j = S ( u ( j )) + η j S : R m → Y is pre-specified. • Loss : 1 � � y j − S ( u ( j )) � 2 Φ( u ; x , y) = Y + R ( u ; x) 2 γ 2 j ∈ Z ′ • Standard Solution : Probit (convex optimization) or MCMC (Bayesian).

Example EnKF Kovachki & Stuart • Clustering : Inverse Problem Formulations Ensemble Kalman Filter Numerics

Continous-time EnKF EnKF Kantas, Beskos, Jasra, (2014) (JUQ) Kovachki & Iglesias, Law and Stuart, 2013. (IP) Stuart • Inverse Problem : Inverse Problem η ∼ N (0 , Γ) y = G( u ) + η Formulations Ensemble u ∼ µ 0 ( u ) Kalman Filter Numerics • Sequential Monte Carlo (SMC) : µ n ( du ) ∝ exp( − nh Φ( u ; y )) µ 0 ( du ) • Approximate SMC (EnKF) : u ( j ) n +1 = u ( j ) n + C uw ( u n )( C ww ( u n ) + Γ) − 1 (y − G( u ( j ) n )) • Continous-time : Γ �→ 1 h Γ, h → 0 J u ( j ) = − 1 � � G( u ( k ) ) − ¯ G , G( u ( j ) ) − y � Γ u ( k ) ˙ J k =1

Approximate Natural Gradient Decent EnKF Amari, 1998. (NC) Kovachki & Stuart • Linear : G( · ) = A · u ( j ) = − C ( u ) ∇ u Φ( u ( j ) , y) ˙ Inverse Problem Formulations where Ensemble Kalman Filter J Numerics C ( u ) = 1 Φ( u , y) = 1 ( u ( j ) − ¯ u ) ⊗ ( u ( j ) − ¯ � 2 � y − Au � 2 u ); Γ J j =1 • Natural Gradient Decent : u = − F − 1 ( u ) ∇ u Φ( u , y) ˙ • Cram´ er-Rao : u ] � F − 1 ( u ) Cov[ˆ

Long-time Linear Behavior EnKF Schillings and Stuart 2017. (SINUM) Kovachki & Stuart Theorem Inverse Problem Suppse G( · ) = A · and that y is the image of a truth u † under A. Define Formulations r ( j ) ( t ) = u ( j ) ( t ) − u † then (under some assumptions) Ensemble Kalman Filter Numerics Ar ( j ) ( t ) = Ar ( j ) � ( t ) + Ar ( j ) ⊥ ( t ) with Ar ( j ) u (0)) } and Ar ( j ) ∈ span { A ( u ( j ) (0) − ¯ ⊥ ∈ span { A ( u ( j ) (0) − ¯ u (0)) } ⊥ . � Furthermore Ar ( j ) � ( t ) → 0 as t → ∞ Ar ( j ) ⊥ ( t ) = Ar (1) ⊥ (0) ∀ t ≥ 0 .

Arbitrary Loss EnKF • Non-linear : Kovachki & u ( j ) = − C uw ( u )Γ − 1 (G( u ( j ) ) − y ) Stuart ˙ = − C uw ( u ) ∇ z Ψ(G( u ( j ) ) , y) Inverse Problem Formulations J Ensemble = − 1 � � G( u ( k ) ) − ¯ Kalman Filter G , ∇ z Ψ(G( u ( j ) ) , y) � u ( k ) J Numerics k =1 J C uw ( u ) = 1 Ψ(z , y) = 1 ( u ( j ) − ¯ � u ) ⊗ (G( u ( j ) ) − ¯ 2 � y − z � 2 G); Γ J j =1 • Concatenate : u = [ u (1) , . . . , u ( J ) ] u = − D ( u ) u ˙ where D ( jk ) ( u ) = 1 J � G( u ( k ) ) − ¯ G , ∇ z Ψ(G( u ( j ) ) , y) �

Nesterov Momentum EnKF Kovachki & Su, Boyd, Cand´ es 2014. (NIPS) Stuart • Momentum : Inverse Problem Formulations   u + 3 u n +1 = v n − h ∇ f ( v n ) ¨ t ˙ u + ∇ f ( u ) = 0 Ensemble     Kalman Filter n ⇐ ⇒ v n +1 = u n +1 + n +3 ( u n +1 − u n ) u (0) = 0 ˙ Numerics   v 0 = u 0 u (0) = u 0   • Modified Limit : u ( j ) + 3 u ( j ) = − C uw ( u ) ∇ z Ψ(G( u ( j ) ) , y)  ¨ t ˙   u ( j ) (0) = 0 ˙ u ( j ) (0) = u ( j )   0

Discrete Scheme EnKF Kovachki & • Concatenate : u = [ u (1) , . . . , u ( J ) ] Stuart Inverse Problem u + 3 Formulations ¨ t ˙ u = − D ( u ) u Ensemble Kalman Filter where Numerics D ( jk ) ( u ) = 1 J � G( u ( k ) ) − ¯ G , ∇ z Ψ(G( u ( j ) ) , y) � • Discretize :  u n +1 = v n − h n D ( v n ) v n   n n +3 ( u n +1 − u n ) v n +1 = u n +1 + v 0 = u 0 = [ u (1) 0 , . . . , u ( J )  0 ] T 

Initialization, Noise, and Predictions EnKF • Initial Ensemble : Kovachki & u (1) 0 , . . . , u ( J ) ∼ µ 0 ( u ) Stuart 0 Inverse Problem • Noise (Supervised) : Formulations Ensemble v ( j ) n +1 = v ( j ) + ξ ( j ) ξ ( j ) Kalman Filter ˜ n +1 ∼ µ n +1 ( u ) n n +1 Numerics � Cov[ µ n +1 ] ∝ h n Cov[ µ 0 ] • Ensemble Refresh (Online) : u ( j ) u n + ξ ( j ) ξ ( j ) n +1 = ¯ n +1 ∼ µ 0 ( u ) n +1 • Predict : J u n +1 = 1 u ( j ) � ¯ n +1 J j =1

Complete Algorithm EnKF Kovachki & Stuart • Mini-batch data (at each step): Inverse Problem l } m l } m Formulations x n = { x i ( n ) y n = { y i ( n ) l =1 l =1 Ensemble Kalman Filter where { i ( n ) 1 , . . . , i ( n ) Numerics m } ⊆ { 1 , . . . , N } . • Compute : use discrete scheme as shown with x �→ x n y �→ y n • Step-size : adaptive h h n = ǫ + � D n �

Convolutional Models EnKF Ba, Kiros and Hinton 2016. (NIPS) Kovachki & Stuart Net 1 ∼ 14k Net 2 ∼ 30k Conv12x3x3 Conv12x3x3 Inverse Problem Formulations MaxPool2x2 Conv12x3x3 Ensemble MaxPool2x2 Kalman Filter Conv24x3x3 Conv24x3x3 Numerics MaxPool2x2 Conv24x3x3 MaxPool2x2 Conv32x3x3 Conv32x3x3 MaxPool2x2 Conv32x3x3 FC-100 FC-100 FC-10 FC-10 • ReLU applied after each block. • Layer Normalization applied after each convolutional layer.

MNIST Dataset EnKF LeCun and Cortes 1999. Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics

MNIST Supervised EnKF Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics Figure: Test Accuracy of Net 1 on MNIST (batched). J Loss Momentum Noise Refresh 5000 Cross Entropy � � χ

MNIST Online EnKF Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics Figure: Test Accuracy of Net 1 on MNIST (online). J Loss Momentum Noise Refresh 5000 Cross Entropy χ χ �

Fashion MNIST Dataset EnKF Xiao, Rasul and Vollgraf 2017. Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics

Fashion MNIST Supervised EnKF Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics Figure: Test Accuracy of Net 2 on Fashion MNIST (batched). J Loss Momentum Noise Refresh 5000 Cross Entropy � � χ

RNN EnKF Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics

Time Series Online EnKF Kovachki & Stuart Inverse Problem Formulations Ensemble Kalman Filter Numerics Figure: Time Series prediction with a RNN. J Loss Momentum Noise Refresh 1000 MSE χ χ χ

Derivative-Free Methods for Machine Learning Tasks Inverse Problem - PowerPoint PPT Presentation

EnKF Kovachki & Stuart Derivative-Free Methods for Machine Learning Tasks Inverse Problem Formulations The Ensemble Kalman Filter Ensemble Kalman Filter Numerics Nikola B. Kovachki 1 Andrew M. Stuart 1 1 Computational and Mathematical

Geometric Interpretation of the Derivative (Review) Geometric Interpretation of the Derivative

2. Theory of the Derivative 2.1 Tangent Lines 2.2 Definition of Derivative 2.3 Rates of Change

Derivative Function Math 132 Stewart 2.2 In Notes 2.1, we defined the derivative of a

Adjoint Derivative Computation Moritz Diehl and Carlo Savorgnan Adjoint Derivative Computation

Introduction to Machine Learning Introduction to Machine Learning Introduction to Machine

Shared Memory Programming with OpenMP Lecture 6: Tasks What are tasks? Tasks are

Scheduling Aperiodic Tasks Background Scheduling Treat aperiodic tasks as lowest-priority

Quantum Machine Learning Adam Brown, HEP-AI Quantum Computing Machine Learning Quantum

MICROSOFT AZURE MACHINE LEARNING Oscar Naim Microsoft Microsoft Azure Machine Learning What is

MACHINE LEARNING Overview 1 1 APPLIED MACHINE LEARNING 2011-2012 APPLIED MACHINE LEARNING

MACHINE LEARNING kernels 1 MACHINE LEARNING 2012 MACHINE LEARNING Kernels: Intuition How

APPLIED MACHINE LEARNING Methods for Clustering K-means, Soft K-means DBSCAN 1 MACHINE

Securities & Securities & Derivative Derivative Litigation Report Litigation Report

Securities & Securities & Derivative Derivative Litigation Repor t t Litigation Repor

Securities & Securities & Derivative Derivative Litigation Report Litigation Report

Securities Board of India Guest Lecture Convergence of Derivative and Cash Markets Andrew Sheng

Beautiful differentiation Conal Elliott LambdaPix 1 September, 2009 ICFP Conal Elliott

Numerical Solutions to Partial Differential Equations Zhiping Li LMAM and School of Mathematical

Numerical differentiation A new type of error that Piecewise & High-d Polys arises in

Rate of Change Click here to go to the lab titled "Derivatives Exploration: y = x 2 "

Quarterly D&O Claims Trends: Q3 2016 Sponsored By: About Advisen Advisen delivers: the

Pathwise derivatives, DDPG, Multigoal RL Katerina Fragkiadaki Part of the slides on path wise

Slide 4 / 213 Slide 4 (Answer) / 213 Slide 5 / 213 Derivatives Exploration Exploration into the

Listed company crime, misconduct and the SFC Mr Andrew Sheng Chairman Securities & Futures

Derivative-Free Methods for Machine Learning Tasks Inverse Problem - PowerPoint PPT Presentation

EnKF Kovachki & Stuart Derivative-Free Methods for Machine Learning Tasks Inverse Problem Formulations The Ensemble Kalman Filter Ensemble Kalman Filter Numerics Nikola B. Kovachki 1 Andrew M. Stuart 1 1 Computational and Mathematical

Geometric Interpretation of the Derivative (Review) Geometric Interpretation of the Derivative

2. Theory of the Derivative 2.1 Tangent Lines 2.2 Definition of Derivative 2.3 Rates of Change

Derivative Function Math 132 Stewart 2.2 In Notes 2.1, we defined the derivative of a

Adjoint Derivative Computation Moritz Diehl and Carlo Savorgnan Adjoint Derivative Computation

Introduction to Machine Learning Introduction to Machine Learning Introduction to Machine

Shared Memory Programming with OpenMP Lecture 6: Tasks What are tasks? Tasks are

Scheduling Aperiodic Tasks Background Scheduling Treat aperiodic tasks as lowest-priority

Quantum Machine Learning Adam Brown, HEP-AI Quantum Computing Machine Learning Quantum

MICROSOFT AZURE MACHINE LEARNING Oscar Naim Microsoft Microsoft Azure Machine Learning What is

MACHINE LEARNING Overview 1 1 APPLIED MACHINE LEARNING 2011-2012 APPLIED MACHINE LEARNING

MACHINE LEARNING kernels 1 MACHINE LEARNING 2012 MACHINE LEARNING Kernels: Intuition How

APPLIED MACHINE LEARNING Methods for Clustering K-means, Soft K-means DBSCAN 1 MACHINE

Securities &amp; Securities &amp; Derivative Derivative Litigation Report Litigation Report

Securities &amp; Securities &amp; Derivative Derivative Litigation Repor t t Litigation Repor

Securities &amp; Securities &amp; Derivative Derivative Litigation Report Litigation Report

Securities Board of India Guest Lecture Convergence of Derivative and Cash Markets Andrew Sheng

Beautiful differentiation Conal Elliott LambdaPix 1 September, 2009 ICFP Conal Elliott

Numerical Solutions to Partial Differential Equations Zhiping Li LMAM and School of Mathematical

Numerical differentiation A new type of error that Piecewise &amp; High-d Polys arises in

Rate of Change Click here to go to the lab titled &quot;Derivatives Exploration: y = x 2 &quot;

Quarterly D&amp;O Claims Trends: Q3 2016 Sponsored By: About Advisen Advisen delivers: the

Pathwise derivatives, DDPG, Multigoal RL Katerina Fragkiadaki Part of the slides on path wise

Slide 4 / 213 Slide 4 (Answer) / 213 Slide 5 / 213 Derivatives Exploration Exploration into the

Listed company crime, misconduct and the SFC Mr Andrew Sheng Chairman Securities &amp; Futures

Securities & Securities & Derivative Derivative Litigation Report Litigation Report

Securities & Securities & Derivative Derivative Litigation Repor t t Litigation Repor

Securities & Securities & Derivative Derivative Litigation Report Litigation Report

Numerical differentiation A new type of error that Piecewise & High-d Polys arises in

Rate of Change Click here to go to the lab titled "Derivatives Exploration: y = x 2 "

Quarterly D&O Claims Trends: Q3 2016 Sponsored By: About Advisen Advisen delivers: the

Listed company crime, misconduct and the SFC Mr Andrew Sheng Chairman Securities & Futures