SLIDE 1

Probabilistic Numerics – Part II – Linear Algebra and Nonlinear Optimization

Philipp Hennig MLSS 2015 20 / 07 / 2015

Emmy Noether Group on Probabilistic Numerics Department of Empirical Inference Max Planck Institute for Intelligent Systems Tübingen, Germany

SLIDE 2

Probabilistic Numerics

Recap from Saturday

On Saturday:

▸ computation is inference
▸ classic methods for integration and the solution of differential equations can be interpreted as MAP inference from Gaussian models
▸ customizing the implicit prior gives faster, tailored numerics
▸ the probabilistic formulation allows propagation of uncertainty through composite computations

SLIDE 3

Linear Algebra

Ax = b,   A ∈ R^{N×N} symmetric positive definite; the task is to find x = A⁻¹b

SLIDE 4

Why you should care about linear algebra

least-squares: a most basic machine learning task

f̂(x) = k_xX (k_XX + σ²I)⁻¹ b = k_xX A⁻¹ b,   with A ∶= k_XX + σ²I
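To make the connection concrete, here is a minimal sketch (plain NumPy, an assumed squared-exponential kernel, and illustrative helper names) of how the least-squares/GP prediction reduces to solving the linear system Az = b:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # squared-exponential kernel matrix k(X1, X2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_mean(X, b, x_query, sigma=0.1):
    # f_hat(x) = k_xX (k_XX + sigma^2 I)^{-1} b; the solve with
    # A = k_XX + sigma^2 I is the expensive linear-algebra step
    A = rbf_kernel(X, X) + sigma ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(A)                        # A is symmetric positive definite
    z = np.linalg.solve(L.T, np.linalg.solve(L, b))  # z = A^{-1} b
    return rbf_kernel(x_query, X) @ z

rng = np.random.default_rng(0)
X = rng.random((50, 1))
b = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
print(gp_mean(X, b, np.array([[0.5]])))
```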

SLIDE 5

Inference on Matrix Elements

generic Gaussian priors [Hennig, SIOPT, 2015]

▸ prior on the elements of the inverse H = A⁻¹ ∈ R^{N×N}, with covariance Σ ∈ R^{N²×N²}:

p(H) = N(H⃗; H⃗₀, Σ) = (2π)^{−N²/2} ∣Σ∣^{−1/2} exp[−½ (H⃗ − H⃗₀)⊺ Σ⁻¹ (H⃗ − H⃗₀)]

▸ can collect noise-free observations p(S, Y ∣ H) = δ(S − HY), since AS = Y ⇔ S = HY ∈ R^{N×M}

▸ the observations are a linear projection of H⃗ (using the Kronecker product):

S_km = ∑_ij δ_ki Y_jm H_ij,   i.e.   S⃗ = (I ⊗ Y⊺) H⃗ = C H⃗,   C ∈ R^{NM×N²}

▸ posterior:

p(H ∣ S, Y) = N[H⃗; H⃗₀ + ΣC⊺(CΣC⊺)⁻¹(S⃗ − CH⃗₀), Σ − ΣC⊺(CΣC⊺)⁻¹CΣ]

▸ but this requires O(N³M) operations! Need structure in Σ.
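A small numerical sketch of this generic (unstructured) update, assuming the row-major vectorization implied by S⃗ = (I ⊗ Y⊺)H⃗ and a plain identity prior covariance; it is only meant to show the shapes involved and why the dense CΣC⊺ step is the bottleneck:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 2
A = rng.standard_normal((N, N)); A = A @ A.T + N * np.eye(N)   # SPD test matrix

# search directions S and observations Y = A S  (so that S = H Y)
S_mat = rng.standard_normal((N, M))
Y = A @ S_mat

H0 = np.zeros((N, N))                      # prior mean
Sigma = np.eye(N * N)                      # prior covariance over vec(H) (illustrative)
C = np.kron(np.eye(N), Y.T)                # C ∈ R^{NM×N²}, row-major vec convention

G = C @ Sigma @ C.T                        # CΣC⊺: dense, this is the costly part
K = Sigma @ C.T @ np.linalg.inv(G)         # "gain" ΣC⊺(CΣC⊺)⁻¹
h_post = H0.ravel() + K @ (S_mat.ravel() - C @ H0.ravel())
Sigma_post = Sigma - K @ C @ Sigma

# the posterior mean exactly reproduces the noise-free observations S = H Y
print(np.abs(h_post.reshape(N, N) @ Y - S_mat).max())
```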

SLIDE 6

p(H ∣ S, Y) = N[H⃗; H⃗₀ + ΣC⊺(CΣC⊺)⁻¹(S⃗ − CH⃗₀), Σ − ΣC⊺(CΣC⊺)⁻¹CΣ]

▸ good probabilistic numerical methods must have both
  ▸ low computational cost
  ▸ meaningful prior assumptions

SLIDE 7

A factorization assumption

with support on all matrices

H = C ⋅ D⊺ + H₀

▸ cov(H_ij, H_kℓ) = V_ik W_jℓ   ⇒   p(H) = N(H; H₀, V ⊗ W)
▸ if V, W ≻ 0, this puts nonzero mass on all H ∈ R^{N×N}, with var(H_ij) = V_ii W_jj
▸ draw n columns of C i.i.d. from N(C_:i; 0, V/n)
▸ draw n columns of D i.i.d. from N(D_:i; 0, W/n)


SLIDE 8–10

A Structured Prior

computation requires trading expressivity and cost [Hennig, SIOPT, 2015]

▸ the prior p(H) = N(H⃗; H⃗₀, V ⊗ W) gives

p(H ∣ S, Y) = N[H; H₀ + (S − H₀Y)(Y⊺WY)⁻¹Y⊺W, V ⊗ (W − WY(Y⊺WY)⁻¹Y⊺W)]

[Figure: A, S, Y and the posterior mean estimate H_M compared to H_true]

▸ two problems:
  ▸ still requires an O(M³) inversion just to compute the mean
    ↝ would like diagonal Y⊺WY (conjugate observations)
  ▸ how to choose H₀, V, W to get a well-scaled prior?
    ↝ 'empirical Bayesian' choice to include H
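A minimal sketch of this structured update (illustrative function name; W is assumed symmetric so that (Y⊺WY)⁻¹ is symmetric):

```python
import numpy as np

def kronecker_posterior(H0, V, W, S, Y):
    # Posterior over H under p(H) = N(H; H0, V ⊗ W), given noise-free
    # observations Y = A S (so S = H Y). Returns the posterior mean and
    # the two Kronecker factors of the posterior covariance.
    G = Y.T @ W @ Y                        # M×M Gram matrix Y⊺WY
    K = W @ Y @ np.linalg.inv(G)           # N×M "gain" WY(Y⊺WY)⁻¹
    H_mean = H0 + (S - H0 @ Y) @ K.T       # H0 + (S − H0Y)(Y⊺WY)⁻¹Y⊺W
    W_post = W - K @ (Y.T @ W)             # W − WY(Y⊺WY)⁻¹Y⊺W
    return H_mean, V, W_post               # posterior covariance is V ⊗ W_post

rng = np.random.default_rng(1)
N, M = 6, 3
A = rng.standard_normal((N, N)); A = A @ A.T + N * np.eye(N)
S = rng.standard_normal((N, M)); Y = A @ S
H_mean, _, W_post = kronecker_posterior(np.zeros((N, N)), np.eye(N), np.eye(N), S, Y)
print(np.abs(H_mean @ Y - S).max())        # mean reproduces the observations
print(np.abs(W_post @ Y).max())            # covariance vanishes along observed directions
```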

SLIDE 11–13

A Scaled Prior

probabilistic computation needs meaningful priors [Hennig, SIOPT, 2015]

▸ using H₀ = εI with ε ≪ 1. It would be nice to have W = V = H:
  var(H_ij) = V_ii W_jj = H_ii H_jj, and for symmetric positive definite matrices H_ii > 0 and H_ij² ≤ H_ii H_jj
▸ if W = V = H, then WY = HY = S, and the posterior

p(H ∣ S, Y) = N[H; H₀ + (S − H₀Y)(Y⊺WY)⁻¹Y⊺W, V ⊗ (W − WY(Y⊺WY)⁻¹Y⊺W)]

simplifies to

p(H ∣ S, Y) = N[H; H₀ + (S − H₀Y)(Y⊺S)⁻¹S⊺, W ⊗ (W − S(Y⊺S)⁻¹S⊺)]

▸ can choose conjugate directions S⊺AS = S⊺Y = diag_i{g_i} using a Gram–Schmidt process: choose an orthogonal set {u₁, ..., u_N} and set

s_i = u_i − ∑_{j=1}^{i−1} (y_j⊺ u_i / y_j⊺ s_j) s_j,   then   E[H ∣ S, Y] = H₀ + ∑_{m=1}^{M} (s_m − H₀ y_m) s_m⊺ / (y_m⊺ s_m)
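A small sketch of this recipe with H₀ = 0 (hypothetical helper name): conjugate a given orthogonal set {u_i} against the observations by Gram–Schmidt and accumulate the posterior mean; choosing the unit vectors as the u_i gives exactly the Gaussian-elimination picture of the following slides:

```python
import numpy as np

def conjugate_directions_solver(A, b, U, H0=None):
    # U holds the orthogonal candidate directions u_1, ..., u_M as columns.
    N, M = U.shape
    H0 = np.zeros((N, N)) if H0 is None else H0
    H = H0.copy()
    S, Y = [], []
    for i in range(M):
        s = U[:, i].astype(float).copy()
        for s_j, y_j in zip(S, Y):
            s -= (y_j @ U[:, i]) / (y_j @ s_j) * s_j   # Gram–Schmidt conjugation
        y = A @ s                                      # observation y_i = A s_i
        H += np.outer(s - H0 @ y, s) / (y @ s)         # term of E[H | S, Y]
        S.append(s); Y.append(y)
    return H, H @ b                                    # estimate of A⁻¹ and of x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
H_est, x_est = conjugate_directions_solver(A, b, np.eye(2))   # u_i = unit vectors
print(x_est, np.linalg.solve(A, b))
```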

SLIDE 14–20

Active Learning of Matrix Inverses

Gaussian Elimination [C.F. Gauss, 1809]

which set of orthogonal directions should we choose?

▸ e.g. {u₁, ..., u_N} = {e₁, ..., e_N}

[Figure: prior p(H), observations ∣Y∣ and ∣S∣, and ∣A ⋅ H_M∣ compared to H_true as elimination proceeds]

Gaussian elimination of A is maximum a-posteriori estimation of H under a well-scaled Gaussian prior, if the search directions are chosen from the unit vectors.

SLIDE 21

Gaussian elimination as MAP inference:

▸ decide to use a Gaussian prior
▸ a factorization assumption (Kronecker structure) in the covariance gives a simple update
▸ implicitly choosing "W = H" gives a well-scaled prior
▸ conjugate directions for efficient bookkeeping
▸ construct projections from unit vectors

SLIDE 22–23

What about Uncertainty?

calibrating prior covariance at runtime [Hennig, SIOPT, 2015]

under "W = H",

p(H ∣ S, Y) = N[H; H₀ + (S − H₀Y)(Y⊺S)⁻¹S⊺, W ⊗ (W − S(Y⊺S)⁻¹S⊺)]

and we just need WY = S. So choose

W = S(Y⊺S)⁻¹S⊺ + (I − Y(Y⊺Y)⁻¹Y⊺) Ω (I − Y(Y⊺Y)⁻¹Y⊺)

[Figure: observed scales y_m⊺ s_m over steps m; posterior W_M for estimated W₀ vs. W_M for W₀ = H]

SLIDE 24

▸ a scaled, structured prior with exploration along unit vectors gives Gaussian elimination
▸ empirical Bayesian estimation of the covariance gives scaled posterior uncertainty, retains the classic estimate, at very low cost overhead

SLIDE 25

Can we do better than Gaussian Elimination?

encode symmetry H = H⊺ [Hennig, SIOPT, 2015]

▸ condition on symmetry through Γ, which maps H⃗ to the vectorization of ½(H − H⊺): p(symm. ∣ H) = lim_{β→0} N(0; ΓH⃗, β), giving

p(H ∣ symm.) = N(H⃗; H⃗₀, W ⊗⊖ W),   (W ⊗⊖ W)_{ij,kℓ} = ½(W_ik W_jℓ + W_iℓ W_jk)

▸ p(S, Y ∣ H) = δ(S − HY) now gives (with ∆ = S − H₀Y and G = Y⊺WY)

p(H ∣ S, Y) = N[H; H₀ + ∆G⁻¹Y⊺W + WYG⁻¹∆⊺ − WYG⁻¹∆⊺YG⁻¹Y⊺W, (W − WYG⁻¹Y⊺W) ⊗⊖ (W − WYG⁻¹Y⊺W)]

[Figure: samples H ∼ N(H₀, W ⊗⊖ W) vs. H ∼ N(H₀, W ⊗ W)]
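For reference, a tiny sketch of the symmetric Kronecker covariance entries; it only checks that H_ij and H_ji receive identical variance and full covariance, i.e. that the prior concentrates on symmetric matrices:

```python
import numpy as np

def sym_kron_cov(W, i, j, k, l):
    # (W ⊗⊖ W)_{ij,kl} = 1/2 (W_ik W_jl + W_il W_jk)
    return 0.5 * (W[i, k] * W[j, l] + W[i, l] * W[j, k])

W = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(sym_kron_cov(W, 0, 1, 0, 1))   # var(H_01)
print(sym_kron_cov(W, 0, 1, 1, 0))   # cov(H_01, H_10): identical, so H_01 = H_10 a.s.
```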

SLIDE 26

Active Learning for a Single Linear Problem

choose ‘search directions’ from gradients

Ax = b   ⇔   x = arg min_x̃ f(x̃),   f(x) = ½ x⊺Ax − x⊺b,   r(x) = ∇f(x) = Ax − b

Algorithm 1  Solve Ax = b under p(H ∣ H₀, W)
1: x₀ = H₀b,  r₀ = Ax₀ − b,  s₀ = r₀
2: for i = 1, ..., M do
3:    y_i = A s_i                                        // collect observation
4:    p(H ∣ S, Y) = N(H; H_i, W_i ⊗⊖ W_i)                // inference (see previous slide)
5:    x_i = H_i b                                        // update mean estimate for x
6:    r_i = A x_i − b                                    // new gradient; r_i ⊥ r_{j<i}
7:    s_i = r_i − ∑_{j<i} (y_j⊺ r_i / y_j⊺ s_j) s_j      // next action (conjugate direction)
8: end for
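A hedged NumPy sketch of Algorithm 1. For simplicity it tracks only the posterior mean under the non-symmetric scaled prior of the earlier slides (H₀ = 0), rather than the full W_i ⊗⊖ W_i inference of line 4, but it uses the same action rule: gradients, conjugated against past observations.

```python
import numpy as np

def probabilistic_linear_solver(A, b, max_iter=None, tol=1e-12):
    N = len(b)
    max_iter = N if max_iter is None else max_iter
    H = np.zeros((N, N))                  # posterior mean estimate of A⁻¹ (H0 = 0)
    x = H @ b
    S, Y = [], []
    s = A @ x - b                         # first action s_0 = r_0
    for _ in range(max_iter):
        y = A @ s                         # collect observation y_i = A s_i
        H += np.outer(s, s) / (y @ s)     # mean update E[H|S,Y] = Σ s_m s_m⊺ / (y_m⊺ s_m)
        S.append(s); Y.append(y)
        x = H @ b                         # current estimate of the solution
        r = A @ x - b                     # new gradient
        if np.linalg.norm(r) < tol:
            break
        s = r - sum((y_j @ r) / (y_j @ s_j) * s_j
                    for s_j, y_j in zip(S, Y))   # next conjugate direction
    return x, H

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, H = probabilistic_linear_solver(A, b)
print(np.allclose(x, np.linalg.solve(A, b)))
```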

SLIDE 27

Conjugate Gradients

[Hestenes & Stiefel, 1952; Hennig, SIOPT 2015]

Set H₀ = εI, ‘W = H’ as before. Some simplifications give:

Algorithm 2  Conjugate Gradients(A, b) [Hestenes & Stiefel, 1952]
1: r₀ ← −b,  p₀ ← −r₀,  k ← 0
2: for k = 0, ..., M do
3:    d ← A p_k
4:    α_k ← r_k⊺ r_k / p_k⊺ d
5:    x_{k+1} ← x_k + α_k p_k
6:    r_{k+1} ← r_k + α_k d
7:    β_{k+1} ← r_{k+1}⊺ r_{k+1} / r_k⊺ r_k
8:    p_{k+1} ← −r_{k+1} + β_{k+1} p_k
9: end for
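For completeness, a direct NumPy transcription of Algorithm 2, in the same sign convention (r is the gradient Ax − b, p the search direction):

```python
import numpy as np

def conjugate_gradients(A, b, max_iter=None, tol=1e-12):
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = -b                                 # r_0 = A x_0 − b with x_0 = 0
    p = -r
    for _ in range(max_iter):
        d = A @ p
        alpha = (r @ r) / (p @ d)
        x = x + alpha * p
        r_new = r + alpha * d
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradients(A, b), np.linalg.solve(A, b))
```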

SLIDE 28–30

Conjugate Gradients as Inference

[Hestenes & Stiefel, 1952; Hennig, SIOPT 2015]

[Figure: residual r_m and error over the number of matrix–vector multiplications m, comparing GJ and CG; panels show p(H), ∣Y∣, ∣S∣, ∣A ⋅ H_M∣ and H_true]

The Method of Conjugate Gradients is maximum a-posteriori inference of x = Hb under a well-scaled Gaussian prior on H, if the search directions are chosen from the sequence of residuals r_i = Ax_i − b.

SLIDE 31

Transfer Learning in Computation

“recycling Krylov sequences” [Parks et al., SISC 2006; Hennig, Osborne, Girolami, 2015]

X f_i = y_i + n_i for a sequence of related problems sharing the matrix X; eigenvectors of the inferred approximation to X⁻¹ can be re-used ('recycled') for later problems. [Figure: residual over # steps for repeated solves, with and without recycling]

SLIDE 32

Summary: Linear Algebra

▸ basic algorithms have a probabilistic interpretation as MAP inference from Gaussian priors on H
▸ Gaussian elimination: inference along unit-vector projections
▸ conjugate gradients: inference along gradients of a specific linear problem
▸ structured (factorization) assumptions are required to achieve low computational cost
▸ calibrated uncertainty can be added at low cost, from the regularity of collected numbers
▸ information can be shared between related computations through covariance models

SLIDE 33

Nonlinear Optimization

(just a quick aside)

f ∶ R^N → R,   find x* = arg min f(x), i.e. ∇f(x*) = 0   [Figure: descent from x₀ toward the minimum x*]

SLIDE 34–38

BFGS is a filter

just a marginal remark [Hennig & Kiefel, ICML/JMLR 2013]

[Figure: contour plot over (x₁, x₂) with successive local quadratic models and steps]

f(x) ≈ f(x_t) + (x − x_t)⊺∇f(x_t) + ½(x − x_t)⊺ A(x_t) (x − x_t)

x_{t+1} = x_t − α H_M ∇f(x_t) ≈ x_t − α A⁻¹∇f(x_t)
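The link to the matrix inference above runs through low-rank updates of the inverse-Hessian estimate. As a reminder (not the filtering construction of the paper itself), this is the standard BFGS inverse update that produces the H used in x_{t+1} = x_t − αH∇f(x_t):

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    # Standard BFGS update of the inverse-Hessian estimate H from a step
    # s = x_{t+1} − x_t and gradient difference y = ∇f(x_{t+1}) − ∇f(x_t).
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

print(bfgs_inverse_update(np.eye(2), s=np.array([0.1, 0.0]), y=np.array([0.4, 0.1])))
```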

SLIDE 39

Global Optimization

f ∶ R^N → R,   find the minimum x* over a domain D, with ∇f(x*) = 0   [Figure: multimodal objective f over D with global minimum x*]

SLIDE 40–41

Bayesian Optimization

using a GP surrogate [Kushner, 1964; Jones, Schonlau, Welch, 1998]

[Figure: GP surrogate posterior over f(x), successively refined by new evaluations]

SLIDE 42

Local Objectives

Expected Improvement and Probability of Improvement [Jones, Schonlau, Welch, 1998; Lizotte, 2008]

[Figure: GP surrogate over f with the acquisition function below]

▸ p(f(x) < η): Probability of Improvement [Lizotte, 2008]
▸ E_p[min(0, η − f(x))]: Expected Improvement [Jones et al., 1998]
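Both criteria are available in closed form under a Gaussian surrogate posterior N(µ(x), σ(x)²). A minimal sketch; note that the expected-improvement code below uses the common max(0, η − f) convention, whereas the slide writes min(0, η − f(x)) (the two differ only in sign):

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, eta):
    # p(f(x) < η) under a Gaussian posterior N(µ(x), σ(x)²)
    return norm.cdf((eta - mu) / sigma)

def expected_improvement(mu, sigma, eta):
    # E[max(0, η − f(x))] under the same posterior (closed form)
    z = (eta - mu) / sigma
    return (eta - mu) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.linspace(-1.0, 1.0, 5)        # posterior means on a grid of candidate x
sigma = np.full_like(mu, 0.5)         # posterior standard deviations
print(probability_of_improvement(mu, sigma, eta=0.0))
print(expected_improvement(mu, sigma, eta=0.0))
```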

SLIDE 43–45

Probabilistic Objectives

Entropy Search [Hennig & Schuler, 2012]

[Figure: GP surrogate over f with the induced belief p(x = arg min f)]

▸ p(f(x) < η): Probability of Improvement [Lizotte, 2008]
▸ E_p[min(0, η − f(x))]: Expected Improvement [Jones et al., 1998]
▸ p[x = arg min(f)] [Hennig & Schuler, 2012]
▸ E[∆H[p(x = arg min(f))]]: expected information gain about the location of the minimum
▸ e.g. combine with evaluation cost to get cost-efficient exploration [K. Swersky, J. Snoek, R. Adams, 2013]

SLIDE 46

Automated Machine Learning

[M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, F. Hutter, AutoML@ICML 2015]

[Figure: AutoML system: given {X_train, Y_train, X_test, b, L}, a Bayesian optimizer wraps an ML framework of meta-learning, data preprocessor, feature preprocessor and classifier, plus ensemble building, to produce Ŷ_test]

SLIDE 47

Bayesian Optimization is usually used as a “top-level” method, because it can be very expensive. Numerical methods must be fast. But Bayesian Optimization can still help in low-level computations!

SLIDE 48–49

Optimization with Noisy Gradients

A huge problem in ML

▸ x_{t+1} ← x_t − α_t ∇f(x_t)
▸ not invariant even under linear transformations: x ↦ Ax ↝ ∇f(x) ↦ A⁻¹∇f(x)

f(x) = 9.81 m/s² ⋅ h(x) = 4473 kJ/kg (@ 456 m),   ∇f(x) = 5 J/(kg ⋅ m)
f(x) = 32.19 ft/s² ⋅ h(x) = 30.31 Cal/oz (@ 1496 ft),   ∇f(x) = 1.03 ⋅ 10⁻⁵ Cal/(oz ⋅ ft)

SLIDE 50

Line Searches

choosing meaningful step-sizes, at very low overhead [Wolfe, SIAM Review, 1969]

[Figure: function value f(t) over distance t in the line search direction, with candidate points ➀–➏]

▸ Wolfe conditions: accept when

f(t) ≤ f(0) + c₁ t f′(0)  (W-I)   and   f′(t) ≥ c₂ f′(0)  (W-II)
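A literal check of the two (noise-free) conditions, with common default constants assumed for c₁ and c₂:

```python
def wolfe_conditions(f0, df0, ft, dft, t, c1=1e-4, c2=0.9):
    # f0, df0: value and directional derivative at t = 0;
    # ft, dft: value and directional derivative at the candidate step t
    w1 = ft <= f0 + c1 * t * df0      # W-I: sufficient decrease
    w2 = dft >= c2 * df0              # W-II: curvature condition
    return w1 and w2

print(wolfe_conditions(f0=6.5, df0=-2.0, ft=6.2, dft=-0.5, t=0.5))
```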

SLIDE 51

What about Noisy Gradients?

stochastic gradient descent

▸ mini-batching gives noisy gradients:

L(x) ∶= (1/M) ∑_{i=1}^{M} ℓ(x, y_i)  ≈  (1/m) ∑_{j=1}^{m} ℓ(x, y_j) =∶ L̂(x),   m ≪ M

▸ for i.i.d. batches, the noise is approximately Gaussian:

L̂(x) ≈ L(x) + ε,   ε ∼ N[0, O((N − m)/m)]
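A quick empirical sketch of this noise model (synthetic per-example losses, illustrative sizes): the mini-batch estimate is unbiased, and its spread matches the usual finite-population variance for sampling without replacement:

```python
import numpy as np

rng = np.random.default_rng(0)
M, m = 10_000, 100
losses = rng.standard_normal(M) + 3.0      # per-example losses ℓ(x, y_i) at a fixed x

L_full = losses.mean()                     # L(x)
L_hat = np.array([rng.choice(losses, size=m, replace=False).mean()
                  for _ in range(1_000)])  # repeated draws of the mini-batch estimate

S2 = losses.var(ddof=1)                    # per-example variance
print(L_full, L_hat.mean())                       # unbiased
print(L_hat.var(), S2 * (M - m) / (m * M))        # variance ≈ S²(M − m)/(mM)
```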

SLIDE 52

Building a Probabilistic Line Search

Step 1: robust surrogate [Mahsereci & Hennig, in review, arXiv 1502.02846]

[Figure: posterior p(f), p(∂f), and derivatives of the posterior mean along the search direction]

p(f) = GP(f(t); 0, k),   k(t, t′) = ⅓ min³(t, t′) + ½ ∣t − t′∣ min²(t, t′)

▸ robust cubic spline posterior
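A sketch of this covariance function (the once-integrated Wiener process, whose posterior mean is a cubic spline between observations):

```python
import numpy as np

def integrated_wiener_kernel(t, t_prime):
    # k(t, t') = 1/3 min(t, t')³ + 1/2 |t − t'| min(t, t')²
    m = np.minimum(t, t_prime)
    return m ** 3 / 3.0 + 0.5 * np.abs(t - t_prime) * m ** 2

ts = np.array([0.2, 0.5, 1.0, 1.5])                       # points along the search direction
K = integrated_wiener_kernel(ts[:, None], ts[None, :])    # covariance matrix
print(np.linalg.eigvalsh(K))                              # positive definite on t > 0
```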

SLIDE 53

Building a Probabilistic Line Search

Step 2: Bayesian Optimization for Exploration [Mahsereci & Hennig, in review, arXiv 1502.02846]

[Figure: surrogate over f(t) along the search direction with candidate points]

▸ analytically compute at most N local minima
▸ choose the one maximizing expected improvement

SLIDE 54

Building a Probabilistic Line Search

Step 3: Probabilistic Wolfe Termination Conditions [Mahsereci & Hennig, in review, arXiv 1502.02846]

f(t) ≤ f(0) + c₁ t f′(0)  (W-I)   and   f′(t) ≥ c₂ f′(0)  (W-II)

[a_t; b_t] = [1  c₁t  −1  0;  0  −c₂  0  1] ⋅ [f(0); f′(0); f(t); f′(t)]  ≥ 0

p(a_t, b_t) = N([a_t; b_t]; [m^a_t; m^b_t], [C^aa_t  C^ab_t; C^ba_t  C^bb_t]),   with

m^a_t = µ(0) − µ(t) + c₁ t µ′(0)
m^b_t = µ′(t) − c₂ µ′(0)

C^aa_t = k̃_00 + (c₁t)² k̃^∂∂_00 + k̃_tt + 2[c₁t (k̃^∂_00 − k̃^∂_0t) − k̃_0t]
C^bb_t = c₂² k̃^∂∂_00 − 2c₂ k̃^∂∂_0t + k̃^∂∂_tt
C^ab_t = C^ba_t = −c₂(k̃^∂_00 + c₁t k̃^∂∂_00) + (1 + c₂) k̃^∂_0t + c₁t k̃^∂∂_0t − k̃^∂_tt

(µ and k̃ denote the posterior mean and covariance of the line-search surrogate; a superscript ∂ marks differentiation with respect to the corresponding argument.)
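Given these means and covariances, the acceptance probability is a bivariate Gaussian orthant probability. A sketch using SciPy's multivariate normal CDF (illustrative function name; weak-Wolfe case only, thresholded as on the next slide):

```python
import numpy as np
from scipy.stats import multivariate_normal

def p_wolfe(ma, mb, Caa, Cbb, Cab):
    # p(a_t ≥ 0 ∧ b_t ≥ 0) for (a_t, b_t) jointly Gaussian with the
    # means and covariances defined above.
    mean = np.array([ma, mb])
    cov = np.array([[Caa, Cab], [Cab, Cbb]])
    # P(a > 0, b > 0) = P(−a < 0, −b < 0) = CDF of N(−mean, cov) at the origin
    return multivariate_normal(mean=-mean, cov=cov).cdf(np.zeros(2))

# accept the candidate step when the probability exceeds 1 − ε, e.g. ε = 0.3
print(p_wolfe(ma=0.3, mb=0.2, Caa=0.02, Cbb=0.01, Cab=0.002) > 0.7)
```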

SLIDE 55

Probabilistic Line Searches

fast univariate Bayesian optimization [Mahsereci & Hennig, in review, arXiv 1502.02846]

[Figure: marginals p_a(t), p_b(t), correlation ρ(t), and p_Wolfe(t) (weak and strong) along the line, above the surrogate for f(t)]

▸ Wolfe conditions: accept when

f(t) ≤ f(0) + c₁ t f′(0)  (W-I)   and   f′(t) ≥ c₂ f′(0)  (W-II)

▸ Probabilistic Wolfe conditions: accept when p(W-I ∧ W-II) > 1 − ε

SLIDE 56

Probabilistic Line Searches in Action

some curated snapshots [Mahsereci & Hennig, in review, arXiv 1502.02846]

[Figure: curated snapshots of the line search (constraining p_Wolfe(t), extrapolation, interpolation, immediate accept, high-noise interpolation) for various noise levels σ_f and σ_f′]

SLIDE 57

Forget about Learning Rates

probabilistic line searches automatically tune SGD [M. Mahsereci & P. Hennig, in review, arXiv 1502.02846]

[Figure: test error vs. initial learning rate and vs. epoch for two-layer neural networks on CIFAR-10 and MNIST, comparing SGD with fixed α, SGD with decaying α, and the probabilistic line search]

SLIDE 58

Probabilistic Numerics

— the big picture —

▸ Computation is Inference. Performing a computation means collecting information about the value of a latent quantity.
▸ some basic algorithms are equivalent to Gaussian MAP inference:
  ▸ Gaussian quadrature rules for integration
  ▸ Runge–Kutta solvers for ODEs
  ▸ conjugate gradients for linear algebra
  ▸ BFGS et al. for nonlinear optimization
▸ probabilistic formulations of computation offer opportunities for gains in efficiency and functionality

Do not think of numerical sub-routines as black boxes. They are active learning machines, and a primary source of efficiency gains.

SLIDE 59

Probabilistic Numerics

— applications —

▸ sampling for visualization
▸ customized numerics using structured priors to add information
▸ multi-task numerics using covariance models to share information
▸ uncertainty propagation using message passing
▸ numerical methods for noisy inputs
▸ identification of error / failure sources

ML has focussed on uncertainty from data; it is time to consider uncertainty from computation.

SLIDE 60

Probabilistic Numerics

— a young community —

Uncertainty over the result of a computation at runtime is an exciting paradigm, with a wealth of applications and many, even fundamental, open questions.

Join us at http://probabilistic-numerics.org
See you soon at a PN workshop?