> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2018 | BASEL

Probabilistic Fitting
Marcel Lüthi, University of Basel
Slides based on a presentation by Sandro Schönborn
Outline
• Bayesian inference
• Fitting using Markov Chain Monte Carlo
• Exercise: MCMC in Scalismo
• Fitting 3D landmarks
Bayesian inference
Probabilities: What are they?
Four possible interpretations:
1. Long-term frequencies
   • Relative frequency of an event over time
2. Physical tendencies (propensities)
   • Arguments about a physical situation (causes of relative frequencies)
3. Degree of belief (Bayesian probabilities)
   • Subjective beliefs about events/hypotheses/facts
4. Logic
   • Degree of logical support for a particular hypothesis
Bayesian probabilities for image analysis
• Bayesian probabilities make sense where frequentist interpretations are not applicable!
• Example: Galileo's view of Saturn through a bad telescope.
  • No amount of repetition makes the image sharp.
  • The uncertainty is not due to a random effect, but due to the bad telescope.
  • It is still possible to use Bayesian inference: uncertainty summarizes our ignorance.
Image credit: McElreath, Statistical Rethinking, Figure 1.12
Degree of belief: An example
• Dentist example: does the patient have a cavity?
  P(cavity) = 0.1
  P(cavity | toothache) = 0.8
  P(cavity | toothache, gum problems) = 0.4
• But the patient either has a cavity or does not: there is no 80% cavity!
• Having a cavity should not depend on whether the patient has a toothache or gum problems.
• All these statements do not contradict each other; they summarize the dentist's knowledge about the patient.
AIMA: Russell & Norvig, Artificial Intelligence: A Modern Approach, 3rd edition
Uncertainty: Bayesian probability
• Bayesian probabilities rely on a subjective perspective:
  • Probabilities express our current knowledge.
  • They can change when we learn or see more.
  • More data -> more certainty about our result.
• Subjectivity: there is no single, real underlying distribution. A probability distribution expresses our knowledge; it differs between situations and between observers, since they have different knowledge.
• Subjective != arbitrary: given a belief, conclusions follow by the laws of probability calculus.
Two important rules
Probabilistic model: joint distribution of points P(x1, x2)
• Marginal: distribution of certain points only
  P(x1) = Σ_{x2} P(x1, x2)
• Conditional: distribution of points conditioned on known values of others
  P(x1 | x2) = P(x1, x2) / P(x2)
• Product rule:
  P(x1, x2) = P(x1 | x2) P(x2)
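The two rules can be checked numerically on a small discrete joint distribution. A minimal Python sketch (the table values below are made up for illustration and are not from the slides):

```python
import numpy as np

# Toy joint distribution P(x1, x2) over two binary variables
# (illustrative values; rows index x1, columns index x2)
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])

# Marginal: sum out x2, i.e. P(x1) = sum_{x2} P(x1, x2)
p_x1 = joint.sum(axis=1)

# Conditional: P(x1 | x2=0) = P(x1, x2=0) / P(x2=0)
p_x2 = joint.sum(axis=0)
p_x1_given_x2_0 = joint[:, 0] / p_x2[0]

# Product rule reconstructs the joint: P(x1, x2) = P(x1 | x2) P(x2)
conditional = joint / p_x2            # each column divided by P(x2)
reconstructed = conditional * p_x2
assert np.allclose(reconstructed, joint)
```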
Marginalization
• Models contain irrelevant/hidden variables, e.g. points on the chin when the nose is queried.
• Marginalize over the hidden variables h:
  P(X) = Σ_h P(X, h)
Belief Updates
• Model (prior belief): face distribution
• Observation (more knowledge): concrete points, possibly uncertain
• Posterior (posterior belief): face distribution consistent with the observation
Certain Observation
• Observations are known values.
• Distribution of X after observing x1, …, xN: P(X | x1, …, xN)
• Conditional probability:
  P(X | x1, …, xN) = P(X, x1, …, xN) / P(x1, …, xN)
Towards Bayesian Inference
• Update the belief about X by observing x1, …, xN:
  P(X) → P(X | x1, …, xN)
• Factorize the joint distribution:
  P(X, x1, …, xN) = P(x1, …, xN | X) P(X)
• Rewrite the conditional distribution:
  P(X | x1, …, xN) = P(X, x1, …, xN) / P(x1, …, xN) = P(x1, …, xN | X) P(X) / P(x1, …, xN)
• In general, with query Q and evidence E:
  P(Q | E) = P(Q, E) / P(E) = P(E | Q) P(Q) / P(E)
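The general form P(Q | E) = P(E | Q) P(Q) / P(E) can be applied to the dentist example from an earlier slide. A minimal Python sketch; note that only P(cavity) = 0.1 appears on the slides, while the two toothache likelihoods below are assumed numbers for illustration (so the resulting posterior does not reproduce the slide's 0.8):

```python
# Bayes rule: P(cavity | toothache) from prior and likelihoods.
p_cavity = 0.1                        # prior, from the slides
p_toothache_given_cavity = 0.9        # assumed likelihood
p_toothache_given_no_cavity = 0.05    # assumed likelihood

# Marginal likelihood P(toothache): sum over both hypotheses
p_toothache = (p_toothache_given_cavity * p_cavity
               + p_toothache_given_no_cavity * (1 - p_cavity))

# Posterior via Bayes rule
p_cavity_given_toothache = p_toothache_given_cavity * p_cavity / p_toothache
```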
Uncertain Observation
• Observations with uncertainty: the model needs to describe how observations are distributed, via the joint distribution P(Q, E).
• Still a conditional probability, but the joint distribution is more complex.
• Joint distribution factorized:
  P(Q, E) = P(E | Q) P(Q)
• Likelihood: P(E | Q)
• Prior: P(Q)
Likelihood
Joint = likelihood × prior:
  P(Q, E) = P(E | Q) P(Q)
• Likelihood × prior: the factorization is more flexible than the full joint.
• Prior: distribution of the core model without observations.
• Likelihood: describes how observations are distributed.
Bayesian Inference
• The conditional/Bayes rule is a method to update beliefs:
  P(Q | E) = P(E | Q) P(Q) / P(E)
  (posterior = likelihood × prior / marginal likelihood)
• Each observation updates our belief (changes knowledge!):
  P(Q) → P(Q | E) → P(Q | E, F) → P(Q | E, F, G) → ⋯
• Bayesian inference: how beliefs evolve with observations.
• Recursive: the posterior becomes the prior of the next inference step.
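The recursive update can be sketched with a coin-bias example over a small grid of hypotheses (an illustrative Python sketch, not from the slides; the candidate biases and the observed flips are invented):

```python
# Recursive Bayesian updating: the posterior after one observation
# becomes the prior for the next observation.
thetas = [0.25, 0.5, 0.75]          # candidate coin biases (assumed)
belief = [1 / 3, 1 / 3, 1 / 3]      # uniform prior P(theta)

def update(belief, heads):
    # Likelihood of one flip: theta if heads, (1 - theta) otherwise
    post = [b * (t if heads else 1 - t) for b, t in zip(belief, thetas)]
    z = sum(post)                    # marginal likelihood of the flip
    return [p / z for p in post]     # normalized posterior

for flip in [True, True, False, True]:   # assumed observations
    belief = update(belief, flip)        # posterior -> next prior
```

After three heads and one tail, the belief has shifted towards the highest candidate bias.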
General Bayesian Inference
• Observation of additional variables:
  • Common case, e.g. image intensities, surrogate measures (size, …)
  • Coupled to the core model via the likelihood factorization
• General Bayesian inference case:
  • Distribution of data D (formerly evidence)
  • Parameters θ (formerly query)
  P(θ | D) = P(D | θ) P(θ) / P(D) = P(D | θ) P(θ) / ∫ P(D | θ) P(θ) dθ
  P(θ | D) ∝ P(D | θ) P(θ)
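For a one-dimensional parameter, the normalizing integral can be approximated on a grid. A minimal Python sketch with a Gaussian prior over θ and a Gaussian likelihood for one assumed datum D = 1.2 (all distributions and numbers here are illustrative choices, not from the slides):

```python
import numpy as np

# Grid approximation of P(theta | D) = P(D | theta) P(theta) / P(D)
thetas, dtheta = np.linspace(-5, 5, 2001, retstep=True)
prior = np.exp(-0.5 * thetas**2) / np.sqrt(2 * np.pi)      # N(0, 1)
likelihood = np.exp(-0.5 * ((1.2 - thetas) / 0.5) ** 2)    # N(theta, 0.5^2), unnormalized

unnormalized = likelihood * prior
evidence = unnormalized.sum() * dtheta    # ≈ ∫ P(D|theta) P(theta) dtheta
posterior = unnormalized / evidence       # density integrating to ≈ 1
posterior_mean = (thetas * posterior).sum() * dtheta
```

With these conjugate Gaussian choices the posterior mean has the closed form 1.2 · 1 / (1 + 0.25) = 0.96, which the grid approximation reproduces.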
Checkpoint: Bayesian Inference
• Why is the Bayesian interpretation better suited for image analysis than a frequentist approach?
• Why is it often easier to specify a prior and a likelihood function than the joint distribution?
• Bayesian inference can be applied recursively. Can you give an example (from the course) where we use the posterior again as a prior?
• Priors are subjective. Can we ever say one prior is better than another?
• Is it conceivable that two individuals assign mutually exclusive priors to the same situation? Can they ever converge to the same conclusion?
Fitting using Markov Chain Monte Carlo
Posterior distribution
Posterior distribution:
  p(θ | image) = p(θ) p(image | θ) / p(image)
MAP solution:
  θ* = arg max_θ p(θ) p(image | θ)
  • Suffers from local maxima.
Infeasible to compute:
  p(image) = ∫ p(θ) p(image | θ) dθ
We need approximate inference!
Approximate Bayesian Inference
Sampling methods:
• Numeric approximations through simulation
Variational methods:
• Function approximation q(θ):
  arg min_q KL(q(θ) || p(θ | D))
  (KL: Kullback-Leibler divergence)
Sampling Methods
• Simulate a distribution p through random samples x_i.
• Evaluate expectations:
  E[g(x)] = ∫ g(x) p(x) dx ≈ ĝ = (1/N) Σ_{i=1}^{N} g(x_i),  x_i ~ p
• Drawing the samples x_i ~ p is the difficult part!
• The estimate is "independent" of the dimensionality.
• More samples increase accuracy.
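The Monte Carlo estimate above can be sketched in a few lines of Python, here for the assumed choices x ~ N(0, 1) and g(x) = x², whose true expectation is 1:

```python
import random

# Monte Carlo estimate of E[g(x)] = (1/N) sum_i g(x_i), x_i ~ N(0, 1)
random.seed(0)
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]
estimate = sum(x * x for x in samples) / N   # true value: E[x^2] = 1
```

Here sampling is easy because the distribution is a standard Gaussian; the next slide discusses why that is the exception rather than the rule.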
Sampling from a Distribution
• Easy for standard distributions … is it?
  • Uniform: Random.nextDouble()
  • Gaussian: Random.nextGaussian()
• How to sample from more complex distributions?
  • Beta, exponential, chi-square, gamma, …
• Posteriors are very often not in a "nice" standard textbook form.
• Sadly, only very few distributions are easy to sample from.
• We need to sample from an unknown posterior with only unnormalized, expensive point-wise evaluation.
• General samplers? Yes! Rejection sampling, importance sampling, MCMC.
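A minimal Metropolis sampler illustrates the MCMC setting described above: it needs only point-wise evaluation of an unnormalized target. This is a Python sketch with an assumed target, step size, and burn-in length (Scalismo provides its own MCMC framework for the exercise):

```python
import math
import random

# Minimal Metropolis sampler for an unnormalized target density.
random.seed(0)

def target_unnorm(x):
    # Unnormalized target, here ∝ N(3, 1) (an assumed choice)
    return math.exp(-0.5 * (x - 3.0) ** 2)

def metropolis(n_samples, x0=0.0, step=1.0):
    samples, x = [], x0
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)   # symmetric random walk
        # Acceptance ratio needs only the unnormalized density:
        # the unknown normalization constant cancels.
        accept_prob = min(1.0, target_unnorm(proposal) / target_unnorm(x))
        if random.random() < accept_prob:
            x = proposal
        samples.append(x)
    return samples

chain = metropolis(20_000)
burned = chain[2_000:]   # discard burn-in from the initial point x0 = 0
```

The empirical mean of the retained samples approximates the target mean 3, even though the sampler never evaluated a normalized density.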