  1. Statistical Geometry Processing Winter Semester 2011/2012 Bayesian Statistics

  2. Bayesian Statistics Summary • Importance – The only sound tool for handling uncertainty – Manifold applications: from web search to self-driving cars • Structure – Probability: a positive, additive, normed measure – Learning is density estimation – Large dimensions are the source of (almost) all evil – No free lunch: there is no universal learning strategy

  3. Motivation

  4. Modern AI Classic artificial intelligence: • Write a complex program with enough rules to understand the world • This approach has been perceived as not very successful Modern artificial intelligence: • Machine learning • Learn structure from data – Minimal amount of “hardwired” rules – “Data-driven approach” • Mimics human development (training during early childhood)

  5. Data-Driven Computer Science Statistical data analysis is everywhere: • Cell phones (transmission, error correction) • Structural biology • Web search • Credit card fraud detection • Face recognition in point-and-shoot cameras • ...

  6. Probability Theory (a very brief summary)

  7. Probability Theory (a very brief summary) Part I: Philosophy

  8. What is Probability? Question: • What is probability? Example: • A bin with 50 red and 50 blue balls • Person A takes a ball • Question to Person B: What is the probability that the ball is red? What happened: • Person A took a blue ball • Not visible to Person B

  9. Philosophical Debate… An old philosophical debate: • What does “probability” actually mean? • Can we assign probabilities to events whose outcome is already fixed (but not known to us for sure)? “Fixed outcome” examples: • Probability of life on Mars • Probability of J. F. Kennedy having been assassinated by an intra-government conspiracy • Probability that the code you wrote is correct

  10. Two Camps Frequentists’ (traditional) view: • Well-defined experiment • Probability is the relative number of positive outcomes • Only meaningful as the average over many repetitions of the experiment Bayesian view: • Probability expresses a degree of belief • Mathematical model of uncertainty • Can be subjective

  11. Mathematical Point of View Mathematics: • Math does not tell you what is true • It only tells you the consequences if you accept other assumptions (axioms) to be true • Mathematicians don’t do philosophy. Mathematical definition of probability: • Properties of probability measures • Consistent with both views • Defines rules for computing with probabilities • Setting up probabilities is not a math problem

  12. Probability Theory (a very brief summary) Part II: Probability Measures

  13. Kolmogorov’s Axioms Discrete probability space: Ω = {ω₁, …, ωₙ} • Elementary events: outcomes ω ∈ Ω • General events: subsets A ⊆ Ω • Probability measure: Pr : P(Ω) → ℝ A valid probability measure must ensure:
      – Positive: Pr(A) ≥ 0
      – Additive: [A ∩ B = ∅] ⇒ [Pr(A ∪ B) = Pr(A) + Pr(B)]
      – Normed: Pr(Ω) = 1
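
To make the axioms concrete, here is a minimal Python sketch (not part of the original slides) checking them for a fair die; the events and tolerances are illustrative choices:

```python
# Illustrative sketch: Kolmogorov's axioms for a fair die, Omega = {1,...,6}.
Pr_elem = {w: 1 / 6 for w in range(1, 7)}  # elementary probabilities

def Pr(A):
    """Probability of an event A (a subset of Omega): sum of elementary probabilities."""
    return sum(Pr_elem[w] for w in A)

even = {2, 4, 6}
assert Pr(even) >= 0                                    # positive
assert abs(Pr(set(range(1, 7))) - 1.0) < 1e-12          # normed: Pr(Omega) = 1
# additive: {2, 4, 6} and {1} are disjoint events
assert abs(Pr(even | {1}) - (Pr(even) + Pr({1}))) < 1e-12
print(Pr(even))  # 0.5
```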

  14. Other Properties Follow Properties derived from Kolmogorov’s axioms:
      – Pr(A) ∈ [0, 1]
      – Pr(Ω \ A) = 1 − Pr(A)
      – Pr(∅) = 0
      – Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) (otherwise A ∩ B would be counted twice)

  15. In other words Mathematical probability is a non-negative, normed, additive measure: – Always ≥ 0 – Sums to 1 – Disjoint pieces add up

  16. In other words Mathematical probability is a non-negative, normed, additive measure. [Figure: an 8×8 grid of cells numbered 1–64, each cell an elementary event ωᵢ; cell ω₂₁ is more likely than cell ω₆₄] Pr(ω₂₁) > Pr(ω₆₄), and Σᵢ₌₁⁶⁴ Pr(ωᵢ) = 1 • Think of a density on some domain Ω

  17. In other words Mathematical probability is a non-negative, normed, additive measure. [Figure: the same 8×8 grid; an event A covers the block of cells {21, 22, 23, 29, 30, 31, 36, 37, 38}] Pr(A) = Σᵢ∈A Pr(ωᵢ) = Pr(ω₂₁) + Pr(ω₂₂) + Pr(ω₂₃) + Pr(ω₂₉) + Pr(ω₃₀) + Pr(ω₃₁) + Pr(ω₃₆) + Pr(ω₃₇) + Pr(ω₃₈) • Think of a density on some domain Ω

  18. In other words Mathematical probability is a non-negative, normed, additive measure: – Always ≥ 0 – Sums to 1 – Disjoint pieces add up What does this model? • You can always think of an area with a density • All pieces are positive • The density sums (integrates) to 1

  19. Discrete Models Discrete probability space: Ω = {ω₁, …, ωₙ} • Elementary events: outcomes ω ∈ Ω • General events: subsets A ⊆ Ω • Probability measure: Pr : P(Ω) → ℝ Probability of an event = sum of elementary probabilities: Pr(A) = Σ_{ωᵢ∈A} Pr(ωᵢ)

  20. Continuous Probability Measures Continuous probability space: Ω ⊆ ℝᵈ • Elementary events: points x ∈ Ω • General events: “reasonable”*) subsets A ⊆ Ω • Probability measure: Pr : σ(Ω) → ℝ assigns a probability to subsets*) of Ω *) not all subsets: Borel sigma algebra (details omitted) The same axioms:
      – Positive: Pr(A) ≥ 0
      – Additive: [A ∩ B = ∅] ⇒ [Pr(A ∪ B) = Pr(A) + Pr(B)]
      – Normed: Pr(Ω) = 1

  21. Continuous Density Density model: • No elementary probabilities • Instead: a density p : ℝᵈ → ℝ with p(x) ≥ 0 and ∫_Ω p(x) dx = 1 • For an event A: Pr(A) = ∫_A p(x) dx (see the sketch below)
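
A small numerical sketch (not from the slides) of Pr(A) = ∫_A p(x) dx, using a standard normal density and a plain Riemann sum; the grid size and the event A = [−1, 1] are arbitrary choices:

```python
# Illustrative sketch: event probability as the integral of a density.
import numpy as np

x = np.linspace(-8.0, 8.0, 200_001)          # grid covering (almost) all the mass
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density, p(x) >= 0

print(np.sum(p * dx))          # ~1.0: the density integrates to 1
A = (x >= -1.0) & (x <= 1.0)   # event A = [-1, 1]
print(np.sum(p[A] * dx))       # Pr(A) ~ 0.683
```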

  22. Random Variables Random variables: • Assign numbers or vectors from ℝᵈ to outcomes • Notation: X ~ p means the random variable X has density p, with p(x) = Pr(X = x) • Usually, lowercase x denotes a value of the random variable X; the density is defined on the domain Ω over which the variable ranges

  23. Unified View Discrete models are a special case of continuous ones: • Discrete model: elementary probabilities p(ωᵢ), ωᵢ ∈ {1, …, 9} • Continuous model: density p(x), x ∈ ℝ • Idealization: encode each elementary probability as a Dirac delta pulse, p(x) = Σᵢ p(ωᵢ) δ(x − xᵢ), with ∫_ℝ δ(x) dx = 1, δ(0) “very large”, and δ(x) = 0 everywhere else

  24. Probability Theory (a very brief summary) Part III: Statistical Dependence

  25. Conditional Probability Conditional probability: • Pr(A | B) = probability of A given B [is true] • Easy to show: Pr(A ∩ B) = Pr(A | B) · Pr(B) Statistical independence: • A and B are independent :⇔ Pr(A ∩ B) = Pr(A) · Pr(B) • Knowing the value of A does not yield information about B (and vice versa) (see the sketch below)
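
A minimal sketch (not from the slides) of the product rule and an independence check on a 2×2 joint table; the table entries are made-up numbers:

```python
# Illustrative sketch: conditional probability on a joint table Pr(A, B)
# for binary events; rows index A, columns index B.
import numpy as np

joint = np.array([[0.50, 0.10],
                  [0.10, 0.30]])            # sums to 1
Pr_B1 = joint[:, 1].sum()                   # Pr(B = 1)
Pr_A1_given_B1 = joint[1, 1] / Pr_B1        # Pr(A=1 | B=1)

print(Pr_A1_given_B1)                       # 0.75
print(joint[1, 1], Pr_A1_given_B1 * Pr_B1)  # product rule: both 0.3
print(joint[1, :].sum())                    # Pr(A=1) = 0.4 != 0.75 -> A, B dependent
```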

  26. Factorization Independence = density factorization: p(x₁, x₂) = p(x₁) · p(x₂) [Figure: a 2D joint density p(x₁, x₂) over the (x₁, x₂)-plane that equals the product of the two 1D densities p(x₁) and p(x₂)]

  27. Factorization Independence = density factorization: p(x₁, x₂) = p(x₁) · p(x₂) [Figure: the same factorization on a discretized grid with k bins per variable] Storage: a fully factorized model of d variables with k values each needs only O(d · k) numbers, while a general joint distribution needs O(kᵈ) (see the sketch below)
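
The storage gap is easy to see in numbers; a tiny sketch with assumed values k = 10, d = 20:

```python
# Illustrative sketch: cost of storing a full joint table versus a
# fully factorized (independent) model, for d variables with k values each.
k, d = 10, 20
print(f"full joint: {k**d:.1e} entries")   # O(k^d): 1.0e+20 -- hopeless to store
print(f"factorized: {d * k} entries")      # O(d*k): 200
```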

  28. Marginals Example 1: • Two random variables a, b ∈ [0, 1] • Joint distribution p(a, b) • We do not know b (it could be anything) • What is the distribution of a? The “marginal probability”: p(a) = ∫₀¹ p(a, b) db (see the sketch below)
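
A discrete analogue of this marginalization (not from the slides): sum a joint table over the unknown variable; the table values are made up:

```python
# Illustrative sketch: marginalization, the discrete analogue of
# p(a) = integral_0^1 p(a, b) db -- sum the joint over all values of b.
import numpy as np

joint = np.array([[0.20, 0.05],   # rows index a, columns index b
                  [0.10, 0.25],
                  [0.30, 0.10]])
p_a = joint.sum(axis=1)           # "b could be anything": sum it out
print(p_a)                        # [0.25 0.35 0.4]
print(p_a.sum())                  # 1.0: still a valid distribution
```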

  29. Conditional Probability Bayes’ rule: Pr(A | B) = Pr(B | A) · Pr(A) / Pr(B) Derivation: • Pr(A ∩ B) = Pr(A | B) · Pr(B) and Pr(A ∩ B) = Pr(B | A) · Pr(A) ⇒ Pr(A | B) · Pr(B) = Pr(B | A) · Pr(A)

  30. Bayesian Inference Example: statistical inference • Medical test to check for a medical condition • A: medical test positive? – 99% correct if the patient is ill – But in 1 of 100 cases, it reports illness for a healthy patient • B: patient has the disease? – We know: one in 10 000 people has it A patient is diagnosed with the disease: • How likely is it that the patient is actually sick?

  31. Bayesian Inference Apply Bayes’ rule with A: medical test positive? B: patient has disease?
      Pr(B | A) = Pr(A | B) · Pr(B) / Pr(A)
      Pr(disease | test pos.) = Pr(test pos. | disease) · Pr(disease) / [Pr(test pos. | disease) · Pr(disease) + Pr(test pos. | no disease) · Pr(no disease)]
      = (0.99 · 0.0001) / (0.99 · 0.0001 + 0.01 · 0.9999) = 0.000099 / 0.010098 ≈ 0.0098 ≈ 1/100 ⇒ the patient is most likely healthy
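
The slide’s computation restated as a short Python snippet (values taken from the slide):

```python
# The medical-test example: posterior probability of disease given a positive test.
prior = 0.0001        # Pr(disease): one in 10,000
sens = 0.99           # Pr(test positive | disease)
false_pos = 0.01      # Pr(test positive | healthy)

evidence = sens * prior + false_pos * (1 - prior)  # Pr(test positive)
posterior = sens * prior / evidence                # Bayes' rule
print(posterior)      # ~0.0098: most likely healthy despite the positive test
```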

  32. Intuition Soccer stadium with 10 000 people: about 100 people get a positive test, but only 1 person is actually sick

  33. Conclusion Bayes’ rule: Pr(A | B) = Pr(B | A) · Pr(A) / Pr(B) • Used to fuse knowledge – “Prior” knowledge (prevalence of the disease) – “Measurement”: tests, sensor data, new information – Can be applied repeatedly to add more information • Standard tool for interpreting sensor measurements (sensor fusion, reconstruction) • Examples: – Image reconstruction (noisy sensors) – Face recognition

  34. Chain Rule Incremental update: • A joint probability can be split into a chain of conditional probabilities: Pr(Xₙ, …, X₂, X₁) = Pr(Xₙ | Xₙ₋₁, Xₙ₋₂, …, X₁) ⋯ Pr(X₃ | X₂, X₁) · Pr(X₂ | X₁) · Pr(X₁) • Example application: – Xᵢ is the measurement at time i – Update the probability distribution as more data comes in • Attention: although it might look like it, this does not reduce the complexity of the joint distribution (see the sketch below)
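
A small sketch (not from the slides) verifying the chain rule on a random joint distribution of three binary variables; the factorization reproduces the joint exactly:

```python
# Illustrative sketch: chain rule Pr(X3, X2, X1) = Pr(X3|X2,X1) Pr(X2|X1) Pr(X1).
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()                    # normalize into a valid joint Pr(X1, X2, X3)

x1, x2, x3 = 1, 0, 1                    # an arbitrary outcome
Pr_x1 = joint[x1].sum()                 # Pr(X1 = x1)
Pr_x2_g_x1 = joint[x1, x2].sum() / Pr_x1
Pr_x3_g_x2_x1 = joint[x1, x2, x3] / joint[x1, x2].sum()

print(joint[x1, x2, x3])                   # the joint probability ...
print(Pr_x3_g_x2_x1 * Pr_x2_g_x1 * Pr_x1)  # ... equals the chain product
```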

  35. Probability Theory (a very brief summary) Part IV: Uniqueness – Philosophy Again...
