SLIDE 1

Conditional Expectation as the Basis for Bayesian Updating

Hermann G. Matthies

Bojana V. Rosić, Elmar Zander, Alexander Litvinenko, Oliver Pajonk

Institute of Scientific Computing, TU Braunschweig Brunswick, Germany wire@tu-bs.de http://www.wire.tu-bs.de

SLIDE 2

Overview

  • 1. BIG DATA
  • 2. Parameter identification
  • 3. Stochastic identification — Bayes’s theorem
  • 4. Conditional probability and conditional expectation
  • 5. Updating — filtering.

SLIDE 3

Representation of knowledge

Data from measurements, sensors, observations ⇒ one form of knowledge about a system.

‘Big Data’ considers only data: looking for patterns, interpolating, etc. Mathematical / computational models of a system represent another form of knowledge about a system: ‘structural’ knowledge. These models are often generated from general physical laws (e.g. conservation laws), a very compressed form of knowledge. These two views on systems are not in competition; they are complementary. The challenge is to combine these forms of knowledge in the form of a synthesis. Knowledge may be uncertain.

SLIDE 4

Big Data 16th century

Tycho Brahe (1546 – 1601): Data
Johannes Kepler (1571 – 1630): Description
Isaac Newton (1643 – 1727): Understanding
Pierre-Simon Laplace (1749 – 1827): Perfection

  • I. Newton: The latest authors, like the most ancient, strove to subordinate the phenomena of nature to the laws of mathematics.

Figure: Kepler's 2nd law (equal areas swept in equal times).

(adapted from M. Ortiz)

SLIDE 5

BIG DATA

Mathematically speaking, big-data algorithms (feature / pattern recognition) are regression (generalised interpolation) methods, often based on deep artificial neural networks (deep ANNs) combining many inputs (= high-dimensional data). Deep networks are connected to sparse tensor decompositions (buzzword: deep learning). Although often spectacularly successful, as a knowledge representation they make it difficult to extract insight. But there is a connection between such regression and Bayesian updating.

SLIDE 6

Inference

Our uncertain knowledge about some situation is described by probabilities. Now we obtain new information. How does it change our knowledge — the probabilistic description? Answered by T. Bayes and P.-S. Laplace more than 250 years ago.

Thomas Bayes (1701 – 1761) Pierre-Simon Laplace (1749 – 1827)

SLIDE 7

Synopsis of Bayesian inference

We have some knowledge about an event A, but it cannot be observed directly. After some new information B (an observation, a measurement), our knowledge has to be made consistent with the new information, i.e. we are looking for the conditional probabilities P(A|B). The idea is to change our present model by just so much — as little as possible — so that it becomes consistent. For this we have to predict — with our present knowledge / model — the probability of all possible observations and compare with the actual observation.

SLIDE 8

Model inverse problem

Aquifer

Figure: 2D model geometry with Dirichlet b.c., an outflow boundary, no-flow (flow = 0) boundaries, and sources.

Governing model equation:

  ϱ ∂u/∂t − ∇ · (κ · ∇u) = f in G ⊂ ℝ^d.

Parameter q = log κ. The conductivity field κ, the initial condition u₀, and the state u(t) may be unknown. They have to be determined from observations Y(q; u).

SLIDE 9

A possible realisation of κ(x, ω)

SLIDE 10

Measurement patches

Figures: sensor layouts with 447, 120, 239, and 10 measurement patches (axes from −1 to 1).

SLIDE 11

Convergence plot of updates

Figure: relative error ε_a (log scale, 10⁻² to 10⁻¹) versus the number of sequential updates (1 to 4), for 447, 239, 120, 60, and 10 measurement points.

SLIDE 12

Forecast and assimilated pdfs


Forecast and assimilated probability density functions (pdfs) for κ at a point where κt = 2.

SLIDE 13

Setting for identification

General idea: We observe / measure a system, whose structure we know in principle. The system behaviour depends on some quantities (parameters), which we do not know ⇒ uncertainty. We model (uncertainty in) our knowledge in a Bayesian setting: as a probability distribution on the parameters. We start with what we know a priori, then perform a measurement. This gives new information, to update our knowledge (identification). Update in probabilistic setting works with conditional probabilities ⇒ Bayes’s theorem. Repeated measurements lead to better identification.

SLIDE 14

Mathematical formulation of model

Consider an operator equation, a physical system modelled by A:

  du + A(u; q) dt = g dt + B(u; q) dW,  u ∈ U,

with U the space of states, g a forcing, W a noise, and q ∈ Q the unknown parameters. Well-posed problem: for given q, g, and initial condition u(t₀) = u₀ there is a unique solution u(t), given by the flow or solution operator S:

  u(t; q) = S(u₀, t₀, q, g, W, t).

Set the extended state ξ = (u, q) ∈ X = U × Q and advance from ξ_{n−1} = (u_{n−1}, q_{n−1}) at time t_{n−1} to ξ_n = (u_n, q_n) at t_n:

  ξ_n = (u_n, q_n) = (S(u_{n−1}, t_{n−1}, q_n, g, W, t_n), q_n) =: f(ξ_{n−1}, w_{n−1}).

This is the model for the system observed at times t_n. It applies also to the stationary case A(u; q) = g.
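As a concrete illustration, the step map f can be sketched for a scalar state with an assumed simple operator A(u; q) = e^q · u and additive noise; the model, step size, and noise intensity here are illustrative choices, not from the talk:

```python
import numpy as np

def advance(xi, dt, g, dW):
    """One Euler-Maruyama step of the extended state xi = (u, q).

    Illustrative model: du + exp(q) * u dt = g dt + b dW, with the
    parameter component static: q_n = q_{n-1}.
    """
    u, q = xi
    b = 0.1                               # assumed constant noise intensity B(u; q)
    u_new = u + (g - np.exp(q) * u) * dt + b * dW
    return (u_new, q)                     # xi_n = f(xi_{n-1}, w_{n-1})

rng = np.random.default_rng(0)
xi = (1.0, np.log(0.5))                   # u0 = 1, parameter q = log(kappa), kappa = 0.5
dt = 0.01
for _ in range(100):                      # advance over the unit time interval
    dW = rng.normal(0.0, np.sqrt(dt))    # Wiener increment
    xi = advance(xi, dt, g=0.0, dW=dW)
```

The parameter q rides along unchanged; only the state component is advanced, mirroring ξ_n = (S(…), q_n).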

SLIDE 15

Mathematical formulation of observation

Measurement operator Y with values in Y:

  η_n = Y(u_n; q) = Y(S(u_{n−1}, t_{n−1}, q, g, W, t_n); q).

But observed at time t_n it is noisy: y_n, with noise ε_n,

  y_n = H(η_n, ε_n) = H(Y(u_n; q), ε_n) =: h(ξ_n, ε_n) = h(f(ξ_{n−1}, w_{n−1}), ε_n).

For given g, w, the measurement η = Y(u(q); q) is just a function of q. This function is usually not invertible ⇒ ill-posed problem; the measurement η does not contain enough information. The parameters q and initial state u₀ are uncertain, modelled as RVs, q ∈ Q = Q ⊗ S ⇒ u ∈ U = U ⊗ S, with e.g. S = L₂(Ω, P) a space of RVs. The Bayesian setting allows updating of the information about ξ = (u, q). The problem of updating becomes well-posed.

SLIDE 16

Mathematical formulation of filtering

We want to track the extended state ξ_n by a tracking equation for a RV x_n through observations ŷ_n.

  • Prediction / forecast state is a RV x_{n,f} = f(x_{n−1}, w_{n−1});
  • Forecast observation is a RV y_n = h(x_{n,f}, ε_n); the actual observation is ŷ_n;
  • Updated / assimilated state x_n = x_{n,f} + Ξ(x_{n,f}, y_n, ŷ_n);
  • Hopefully x_n ≈ ξ_n; the update map Ξ has to be determined.

x_{n,i} := Ξ(x_{n,f}, y_n, ŷ_n) is called the innovation. We concentrate on one step from forecast to assimilated variables:

  • forecast state x_f := x_{n,f}, forecast observation y_f := y_n,
  • actual observation ŷ and assimilated variable

  x_a := x_f + Ξ(x_f, y_f, ŷ) = x_n = x_{n,f} + Ξ(x_{n,f}, y_n, ŷ_n).

This is the filtering or update equation.
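One step of this filtering equation can be sketched with samples standing in for the RVs. The dynamics f, observation map h, and the fixed-gain update map Ξ below are hypothetical placeholders, since determining Ξ is exactly the subject of the following slides:

```python
import numpy as np

def filter_step(x_prev, w, eps, y_hat, f, h, Xi):
    """One step of the filtering equation (samples represent the RVs).

    Forecast:      x_f = f(x_prev, w)
    Forecast obs.: y_f = h(x_f, eps)
    Assimilation:  x_a = x_f + Xi(x_f, y_f, y_hat)
    """
    x_f = f(x_prev, w)
    y_f = h(x_f, eps)
    return x_f + Xi(x_f, y_f, y_hat)

# Toy instances (illustrative choices, not from the talk):
f  = lambda x, w: 0.9 * x + w              # linear dynamics with process noise
h  = lambda x, e: x + e                    # direct, noisy observation
Xi = lambda xf, yf, yh: 0.5 * (yh - yf)    # hypothetical fixed-gain innovation map

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)        # ensemble for the prior RV
for y_hat in [1.0, 1.1, 0.9]:              # actual observations
    w   = rng.normal(0.0, 0.1, size=x.size)
    eps = rng.normal(0.0, 0.5, size=x.size)
    x = filter_step(x, w, eps, y_hat, f, h, Xi)
```

After a few observations near 1, the ensemble concentrates around the observed value while its spread shrinks.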

SLIDE 17

Setting for updating

Knowledge prior to the new observation is also called the forecast: the state u_f ∈ U = U ⊗ S and the parameters q_f ∈ Q = Q ⊗ S are modelled as random variables (RVs), as are the extended state x_f = (u_f, q_f) ∈ X = X ⊗ S and the measurement y(x_f, ε) ∈ Y = Y ⊗ S. Then an observation ŷ is performed and compared to the predicted measurement y(x_f, ε). Bayes's theorem gives only the probability distribution of the posterior or assimilated extended state x_a. Here we want more: a filter x_a := x_f + Ξ(x_f, y_f, ŷ).

SLIDE 18

Using Bayes’s theorem

Classically, Bayes's theorem gives the conditional probability

  P(I_x|M_y) = P(M_y|I_x) P(I_x) / P(M_y)  for P(M_y) > 0.

Well-known special form with densities of the RVs x, y (w.r.t. some background measure µ):

  π(x|y) = π_{xy}(x, y) / π_y(y) = π(y|x) π_x(x) / Z_y,

with the marginal density

  Z_y := π_y(y) = ∫_X π_{xy}(x, y) µ(dx)

(Z from the German Zustandssumme) — only valid when π_{xy}(x, y) exists. Problems / paradoxa appear when P(M_y) = 0 (and P(M_y|I_x) = 0), e.g. the Borel-Kolmogorov paradox. The problem is the limit P(M_y) → 0, or when no joint density π_{xy}(x, y) exists.
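When the joint density does exist, the density form of Bayes's theorem can be evaluated directly on a grid. A minimal sketch with an illustrative Gaussian prior and likelihood (not from the talk), chosen so the posterior is known in closed form — prior N(0, 1), observation noise sd 0.5, ŷ = 1 gives posterior mean 0.8:

```python
import numpy as np

# Grid evaluation of pi(x|y) = pi(y|x) * pi_x(x) / Z_y  (toy Gaussian example)
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]

def gauss(t, mean, sd):
    return np.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

prior      = gauss(x, 0.0, 1.0)           # pi_x(x)
y_observed = 1.0
likelihood = gauss(y_observed, x, 0.5)    # pi(y|x): y = x + noise with sd 0.5

Z_y       = np.sum(likelihood * prior) * dx   # marginal density pi_y(y), "Zustandssumme"
posterior = likelihood * prior / Z_y          # pi(x|y)

post_mean = np.sum(x * posterior) * dx
```

The division by Z_y normalises the posterior; when P(M_y) → 0 this construction degenerates, which is exactly the singular case discussed on the next slide.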

SLIDE 19

Conditional probability

“Many quite futile arguments have raged — between otherwise competent probabilists — over which of these results is ‘correct’.” — E. T. Jaynes

“The concept of a conditional probability with regard to an isolated hypothesis whose probability equals zero is inadmissible.” — A. Kolmogorov

⇒ How to use conditioning in these typical singular cases, where Bayes's formula is not applicable? ⇐

With the posterior / conditional measure P(·|M_y) one may compute the conditional expectation

  E (ψ|M_y) = ∫_Ω ψ(ω) P(dω|M_y).

Kolmogorov turns it around and starts from the conditional expectation operator E (·|M_y), and from this defines conditional probability via

  P(I_x|M_y) := E (1_{I_x}|M_y),  with 1_{I_x}(ξ) = 1 for ξ ∈ I_x, 0 otherwise.

SLIDE 20

Conditional expectation and probability

Expectation of a RV ψ: E (ψ) = ∫_Ω ψ(ω) P(dω).

E (·) is a functional L₂(Ω, A) = S → ℝ, but also an orthogonal projection

  E : S = span{1_Ω} ⊕ {φ ∈ S | E (φ) = 0} → span{1_Ω},  (1_Ω ≡ 1).

The conditional expectation is an orthogonal projection onto subspaces L₂(Ω, B, P) =: S∞ defined by sub-σ-algebras B ⊆ A. Here B = σ(y) — generated by the measurement y — and the subspace S∞ is the space of all (measurable) functions of y:

  E (·|σ(y)) := E (·|B) : L₂(Ω, A) = S = S∞ ⊕ S∞⊥ → S∞.

Call E (·|y) := E (·|σ(y)) =: P∞ the pre-conditional expectation. E (ψ|y) ∈ S∞ is a RV, because y is. After observing ŷ one has the post-conditional expectation E (ψ|ŷ) ∈ ℝ — the new expectation after the new ŷ. The state of knowledge has changed, hence so has the expectation.

SLIDE 21

Conditional expectation

With the orthogonal direct sum S = S∞ ⊕ S∞⊥ one has the decomposition

  ψ = P∞ψ + (I − P∞)ψ = E (ψ|y) + (ψ − E (ψ|y)).

According to Pythagoras:

  ‖ψ‖²_S = ‖P∞ψ‖²_S + ‖(I − P∞)ψ‖²_S = ‖E (ψ|y)‖²_S + ‖ψ − E (ψ|y)‖²_S.

Simple cases:

  • 1. B = {Ω, ∅} ⇒ E (·|B) = E (·), the normal expectation.
  • 2. B = A ⇒ E (·|B) = I_{L₂}, the identity on L₂(Ω, A, P).
  • 3. In our case B = σ(y), the σ-algebra generated by the measurement RV y (not so simple!).

Question: how to compute P∞ = E (·|y), and how to build the filter Ξ to obtain x_a := x_f + Ξ(x_f, y_f, ŷ)?

SLIDE 22

Representing and using the conditional expectation

As P∞ = E (·|y) is an orthogonal projection, for any ψ:

  E (ψ(x)|y) := P∞(ψ(x)) = arg min_{p ∈ S∞} ‖ψ(x) − p‖²_S.

The subspace S∞ represents the available information; the conditional expectation P∞ψ minimises Φ(·) := ‖ψ(x) − (·)‖²_S over S∞. More general loss functions than the minimised mean square error (MMSE) are possible, as used in decision processes.

Taking ψ₁(x) = x, one obtains P∞x = E (x|y) and x̄|ŷ := E (x|ŷ). Taking ψ₂(x) = x ⊗ x = x^⊗2, one obtains P∞(x ⊗ x) = E (x ⊗ x|y), from which one may compute the post-conditional covariance of x:

  cov|ŷ x = E (x ⊗ x|ŷ) − x̄|ŷ ⊗ x̄|ŷ.
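The minimisation over S∞ can be approximated by restricting p to a finite-dimensional subspace, e.g. polynomials in y fitted by least squares over Monte Carlo samples. A minimal sketch, assuming a scalar toy model x ~ N(0, 1), y = x + ε with ε ~ N(0, 0.5²) — all choices illustrative, with known answers E(x|ŷ=1) = 0.8 and conditional variance 0.2:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
x = rng.normal(0.0, 1.0, N)               # samples of the prior RV x
y = x + rng.normal(0.0, 0.5, N)           # samples of the observation RV y

# E(x|y) = argmin over p(y) of E|x - p(y)|^2;
# restrict the minimisation to cubic polynomials in y (a small subspace of S_infty).
coeffs = np.polyfit(y, x, deg=3)
phi    = lambda t: np.polyval(coeffs, t)

y_hat     = 1.0                           # actual observation
cond_mean = phi(y_hat)                    # x-bar | y_hat, via psi_1(x) = x

# second conditional moment via the same regression applied to psi_2(x) = x^2
coeffs2  = np.polyfit(y, x**2, deg=3)
cond_var = np.polyval(coeffs2, y_hat) - cond_mean**2   # post-conditional covariance
```

The same regression machinery evaluates P∞ψ for any ψ; only the regressand ψ(x) changes.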

SLIDE 23

Update through conditional expectation

Reminder: we want to find a mapping / filter Ξ for the assimilated x_a:

  x_a := x_f + Ξ(x_f, y_f, ŷ);

x_a with the Bayesian posterior distribution, resp. E (ψ(x_a)|ŷ) for all ψ. As the Bayesian update is costly, several approximations are possible:

  • The conditional expectation (CE-filter) update, with correct E (x_a|ŷ).
  • Approximated by a linearised version of the CE-update — the Gauss-Markov-Kalman filter (GMKF), where Ξ is linear in ŷ − y.
  • The conditional expectation variance (CEV) update: both conditional expectation and covariance of x_a are correct.
  • Approximated by a linearised version of the CEV-update (best linear Ξ).
  • Computing an expansion (with truncation) of Ξ, resp. x_a.
  • Better approximations using conditional expectation . . .

SLIDE 24

Possibility: CE-update / filter

The space S∞ = L₂(Ω, σ(y), P) is the space of all functions of the measurement / observation y. Taking first ψ(x) = x:

  E (x|y) =: φ_x(y) = arg min { ‖x − p‖²_S : p ∈ S∞ = {p ∈ S : p = ϕ(y)} }.

With this operator (the conditional expectation) one may construct a new RV x_a with the correct posterior. First step: the “MMSE Bayesian update” x_a with correct conditional expectation x̄|ŷ (CE-filter).

As E (x|y) =: P∞x is the orthogonal projection onto S∞, one has S = S∞ ⊕ S∞⊥ ⇒

  x = P∞x + (I − P∞)x = φ_x(y) + (x − φ_x(y)).

From this:

  x_a ≈ φ_x(ŷ) + (x_f − φ_x(y_f)) = x_f + (φ_x(ŷ) − φ_x(y_f)).

Obviously E (x_a|ŷ) = E (x_f|ŷ) = φ_x(ŷ) = x̄|ŷ.

Further improvements by transforming x_a − x̄|ŷ = x_f − φ_x(y_f).

SLIDE 25

BIG DATA — Gauss-Markov-Kálmán filter

If one only wants E (x_f|ŷ) = φ_x(ŷ) = x̄|ŷ, then the function φ_x can be found through regression or machine learning / deep networks. Estimation of (x_f − φ_x(y_f)) is possible. Further simplification / approximation: if only linear (affine) functions ϕ(y) = Ay + b are allowed:

  K_x y + c = arg min { ‖x − p‖²_S : p ∈ S₁ := {p ∈ S : p = Ay + b} },

φ_x(y) ≈ K_x y + c =: P₁x, with the Kálmán gain K_x. As S₁ ⊆ S∞:

  ‖x − φ_x(y)‖²_S = ‖x − P∞x‖²_S ≤ ‖x − P₁x‖²_S = ‖x − (K_x y + c)‖²_S.

From the Kálmán gain K_x ⇒ the Gauss-Markov-Kálmán filter (GMKF):

  x_a ≈ x_f + (K_x ŷ − K_x y_f) = x_f + K_x(ŷ − y_f).

Rudolf Kálmán (1930 – 2016)
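An ensemble sketch of the GMKF, with the Kálmán gain estimated from sample covariances as K = cov(x, y)/var(y) in the scalar case. The linear-Gaussian toy model (true gain 0.8) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
x_f = rng.normal(0.0, 1.0, N)            # forecast ensemble for x
y_f = x_f + rng.normal(0.0, 0.5, N)      # forecast observation ensemble

# Kalman gain from sample (co)variances: K = cov(x, y) / var(y)
K = np.cov(x_f, y_f)[0, 1] / np.var(y_f, ddof=1)

y_hat = 1.0                              # actual observation
x_a = x_f + K * (y_hat - y_f)            # GMKF update, linear in (y_hat - y_f)
```

This is the best update linear in the observation; a nonlinear φ_x (previous slide) can only lower the residual norm, per the inequality S₁ ⊆ S∞ above.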

SLIDE 26

Numerical Remarks

  • Parametric or stochastic problems — like stochastic PDEs — lead to solutions (states) in tensor product spaces.
  • The stochastic forward solution allows identification.
  • The “curse of dimensionality” has to be controlled.
  • Reduced order models can yield sparse (or low-rank) representations, with all work carried out on the low-rank approximation.
  • After the solution has been computed, it has to be processed further.
  • If the further processing is a tensor function, it can often be computed with little effort.

SLIDE 27

Computation of conditional expectation

Minimisation to compute the conditional expectation for any RV ψ(x):

  E (ψ|y) := P∞ψ = φ_ψ(y) := arg min_{p ∈ S∞} ‖ψ(x) − p‖²_S.

Variational equation / Galerkin condition from the minimisation:

  ∀p ∈ S∞ : ⟨ψ(x) − φ_ψ(y) | p⟩_S = E ((ψ(x) − φ_ψ(y)) · p) = 0.

The GMKF was obtained by the Galerkin approximation S₁ ⊆ S∞. The minimisation may also be performed by Gauss-Newton methods; each iteration then looks similar to the Gauss-Markov-Kalman filter (GMKF). Various variations of the iteration are possible, e.g. BFGS methods instead of Gauss-Newton. In any case it is in principle possible to compute E (ψ(x)|y) for any RV ψ(x) to any desired accuracy, including a posteriori error control.

SLIDE 28

Example 1: Identification of a multi-modal distribution

Setup: scalar RV x with a non-Gaussian multi-modal “truth” p(x); wide Gaussian prior; “large” Gaussian measurement errors. Aim: identification of p(x). 10 updates of N = 10, 100, 1000 measurements. Filter: GMK-filter — the optimal linear filter — in PCE representation.

SLIDE 29

Example 2: Lorenz-84 chaotic model

Setup: non-linear, chaotic system u̇ = f(u), u = [x, y, z]. Small uncertainties in the initial conditions u₀ have a large impact. Aim: sequentially identify the state u_t. Methods: GMK-filter in PCE representation and PCE updating. Poincaré cut for x = 1.

SLIDE 30

Example 2: Lorenz-84 PCE representation

PCE: Variance reduction and shift of mean at update points. Skewed structure clearly visible, preserved by updates.

SLIDE 31

Summary

  • UQ allows stochastic inverse identification as a well-posed problem; this Bayesian update is based on conditioning.
  • Conditional probability is based on conditional expectation, the starting point for numerics; it connects to MMSE.
  • The Bayesian update may be presented as a filter; a simple approximation is the GMKF, even simpler via machine learning.
  • Works for
    – non-Gaussian distributions,
    – linear and nonlinear models and observation operators Y,
    – ODEs, PDEs, processes, fields, etc.
