CS786: Lecture 1

  • May 1st
  • Basics: review of probability theory

CS 786 Lecture Slides (c) 2012 P. Poupart


Theories to deal with uncertainty

  • Dempster-Shafer theory
  • Fuzzy set theory
  • Possibility theory
  • Probability theory
      • well established
      • axioms of probability theory rediscovered by many scientists over time
      • the theory used by most scientists today

Probabilities

  • Objectivist/Frequentist viewpoint:
      • Pr(q) denotes the relative frequency with which q was observed to be true
  • Subjectivist/Bayesian viewpoint:
      • we'll quantify our beliefs using probabilities
      • Pr(q) denotes the probability that you believe q is true
      • note: statistics/data influence degrees of belief
  • Let’s formalize things…


Random Variables

  • Assume set V of random variables: X, Y, etc.
  • Each RV X has a domain of values Dom(X)
  • X can take on any value from Dom(X)
  • Assume V and Dom(X) finite
  • Examples
  • Dom(X) = {x1, x2, x3}
  • Dom(Weather) = {sunny, cloudy, rainy}
  • Dom(StudentInPascalsOffice) = {bob, georgios, veronica, tianhan, …}

  • Dom(CraigHasCoffee) = {T,F} (boolean var)

Random Variables/Possible Worlds

  • A formula is a logical combination of variable assignments:
      • X = x1; (X = x2 ∨ X = x3) ∧ Y = y2 ; (x2 ∨ x3) ∧ y2
      • chc ∧ ~cm, etc…
  • Let L denote the set of formulae (our language)
  • A possible world is an assignment of values to each RV
      • these are analogous to truth assignments (interpretations)
  • Let W be the set of worlds (enumerated in the sketch below)
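A minimal Python sketch of these definitions (Dom(Weather) is from the earlier slide; the second domain is the boolean variable above): a possible world assigns one value from Dom(X) to each variable, so W is just the Cartesian product of the domains.

    from itertools import product

    # Dom(Weather) is from the slides; CraigHasCoffee is the boolean variable above.
    domains = {
        "Weather": ["sunny", "cloudy", "rainy"],
        "CraigHasCoffee": [True, False],
    }

    # A possible world assigns a value to every random variable,
    # so the set W of worlds is the Cartesian product of the domains.
    variables = list(domains)
    worlds = [dict(zip(variables, values))
              for values in product(*(domains[v] for v in variables))]

    print(len(worlds))  # 3 * 2 = 6 worlds
    print(worlds[0])    # {'Weather': 'sunny', 'CraigHasCoffee': True}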


Probability Distributions

  • A probability distribution Pr: L → [0,1] s.t.
      • 0 ≤ Pr(α) ≤ 1
      • Pr(α) = Pr(β) if α is logically equivalent to β
      • Pr(α) = 1 if α is a tautology (always true)
      • Pr(α) = 0 if α is impossible (always false)
      • Pr(α∨β) = Pr(α) + Pr(β) − Pr(α∧β)
  • For continuous random variables, we use probability densities


Example Distribution

T – mail truck outside; M – mail waiting; C – Craig wants coffee; A – Craig is angry

    t  c  m  a : 0.162      t  c  m ~a : 0.018
    t  c ~m  a : 0.016      t  c ~m ~a : 0.004
    t ~c  m  a : 0.432      t ~c  m ~a : 0.288
    t ~c ~m  a : 0.008      t ~c ~m ~a : 0.072
    (each of the eight worlds with ~t has probability 0.0)

  • Pr(t) = 1, Pr(~t) = 0
  • Pr(c) = .2, Pr(~c) = .8
  • Pr(m) = .9
  • Pr(a) = .618
  • Pr(c ∧ m) = .18
  • Pr(c ∨ m) = .92
  • Pr(a → m) = Pr(~a ∨ m) = 1 − Pr(a ∧ ~m) = .976
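A minimal Python sketch encoding this joint table (the eight ~t worlds are omitted since they all have probability 0). Pr(α) is computed as the sum of the probabilities of the worlds satisfying α, which also lets us check the properties from the previous slide, e.g. that the distribution sums to 1.

    # Joint over (T, C, M, A), one True/False per variable; ~t worlds omitted (all 0).
    joint = {
        (True, True,  True,  True):  0.162, (True, True,  True,  False): 0.018,
        (True, True,  False, True):  0.016, (True, True,  False, False): 0.004,
        (True, False, True,  True):  0.432, (True, False, True,  False): 0.288,
        (True, False, False, True):  0.008, (True, False, False, False): 0.072,
    }

    def pr(phi):
        """Pr(phi) = sum of probabilities of the worlds satisfying phi."""
        return sum(p for w, p in joint.items() if phi(*w))

    print(sum(joint.values()))                            # 1.0: a proper distribution
    print(round(pr(lambda t, c, m, a: c), 3))             # Pr(c)      = 0.2
    print(round(pr(lambda t, c, m, a: m), 3))             # Pr(m)      = 0.9
    print(round(pr(lambda t, c, m, a: a), 3))             # Pr(a)      = 0.618
    print(round(pr(lambda t, c, m, a: c and m), 3))       # Pr(c & m)  = 0.18
    print(round(pr(lambda t, c, m, a: c or m), 3))        # Pr(c v m)  = 0.92
    print(round(pr(lambda t, c, m, a: (not a) or m), 3))  # Pr(a -> m) = 0.976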


Conditional Probability

  • Conditional probability critical in inference
  • if Pr(a) = 0, we often treat Pr(b|a)=1 by convention

Pr(b | a) = Pr(a ∧ b) / Pr(a)


Intuitive Meaning of Cond. Prob.

  • Intuitively, if you learned a, you would change your degree of belief in b from Pr(b) to Pr(b|a)
  • In our example (recomputed in the sketch below):
      • Pr(m|c) = 0.9
      • Pr(m|~c) = 0.9
      • Pr(a) = 0.618
      • Pr(a|~m) = 0.24
      • Pr(a|~m ∧ c) = 0.8
  • Notice the nonmonotonicity in the last three cases when additional evidence is added
      • contrast this with logical inference
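A small Python sketch, reusing the joint table from the example distribution, that computes conditionals as Pr(b|a) = Pr(a ∧ b) / Pr(a) and reproduces the nonmonotonic sequence above.

    # Same joint over (T, C, M, A) as in the example distribution.
    joint = {
        (True, True,  True,  True):  0.162, (True, True,  True,  False): 0.018,
        (True, True,  False, True):  0.016, (True, True,  False, False): 0.004,
        (True, False, True,  True):  0.432, (True, False, True,  False): 0.288,
        (True, False, False, True):  0.008, (True, False, False, False): 0.072,
    }
    T, C, M, A = range(4)   # index of each variable within a world tuple

    def pr(phi):
        return sum(p for w, p in joint.items() if phi(w))

    def cond(b, a):
        """Pr(b | a) = Pr(a ^ b) / Pr(a)."""
        return pr(lambda w: a(w) and b(w)) / pr(a)

    print(round(cond(lambda w: w[M], lambda w: w[C]), 3))             # Pr(m|c)    = 0.9
    print(round(cond(lambda w: w[M], lambda w: not w[C]), 3))         # Pr(m|~c)   = 0.9
    print(round(pr(lambda w: w[A]), 3))                               # Pr(a)      = 0.618
    print(round(cond(lambda w: w[A], lambda w: not w[M]), 3))         # Pr(a|~m)   = 0.24
    print(round(cond(lambda w: w[A],
                     lambda w: not w[M] and w[C]), 3))                # Pr(a|~m&c) = 0.8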


Some Important Properties

  • Product Rule: Pr(ab) = Pr(a|b)Pr(b)
  • Summing Out Rule: Pr(a) = Σ_{b ∈ Dom(B)} Pr(a|b) Pr(b)   (sketched below)
  • Chain Rule: Pr(abcd) = Pr(a|bcd)Pr(b|cd)Pr(c|d)Pr(d)
      • holds for any number of variables
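A tiny Python sketch of the summing-out rule; the distribution over B and the conditionals Pr(a|b) are invented for illustration. (The chain rule is just the product rule applied repeatedly, so it needs no separate machinery.)

    # Invented numbers, purely to illustrate Pr(a) = sum_b Pr(a|b) Pr(b).
    pr_B = {"b1": 0.5, "b2": 0.3, "b3": 0.2}          # Pr(b) for each b in Dom(B)
    pr_a_given_B = {"b1": 0.9, "b2": 0.4, "b3": 0.1}  # Pr(a | b) for each b

    pr_a = sum(pr_a_given_B[b] * pr_B[b] for b in pr_B)
    print(round(pr_a, 2))   # 0.9*0.5 + 0.4*0.3 + 0.1*0.2 = 0.59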


Bayes Rule

  • Bayes Rule: Pr(a|b) = Pr(b|a) Pr(a) / Pr(b)
  • Bayes rule follows by simple algebraic manipulation of the definition of conditional probability
  • Why is it so important? Why is it significant?
      • usually, one “direction” is easier to assess than the other


Example of Use of Bayes Rule

  • Disease ∊ {malaria, cold, flu}; Symptom = fever
  • Must compute Pr(D | fever) to prescribe treatment
  • Why not assess this quantity directly?
      • Pr(mal | fever) is not natural to assess; Pr(fever | mal) reflects the underlying “causal” mechanism
      • Pr(mal | fever) is not “stable”: a malaria epidemic changes this quantity (for example)
  • So we use Bayes rule:
      • Pr(mal | fever) = Pr(fever | mal) Pr(mal) / Pr(fever)
      • note that Pr(fever) = Pr(mal ∧ fever) + Pr(cold ∧ fever) + Pr(flu ∧ fever)
      • so if we compute the Pr of each disease given fever using Bayes rule, the normalizing constant comes for “free” (see the sketch below)
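A hedged Python sketch of this computation; the priors and likelihoods are invented numbers, and only the structure follows the slide. Summing the unnormalized products Pr(fever | d) Pr(d) over the three diseases yields Pr(fever), so the normalizing constant really is free.

    # Invented priors and likelihoods, for illustration only.
    prior = {"malaria": 0.001, "cold": 0.7, "flu": 0.299}      # Pr(d)
    likelihood = {"malaria": 0.95, "cold": 0.2, "flu": 0.8}    # Pr(fever | d)

    # Unnormalized posteriors: Pr(fever | d) * Pr(d) = Pr(d & fever)
    unnorm = {d: likelihood[d] * prior[d] for d in prior}

    # Pr(fever) = Pr(mal & fever) + Pr(cold & fever) + Pr(flu & fever)
    pr_fever = sum(unnorm.values())

    posterior = {d: unnorm[d] / pr_fever for d in unnorm}      # Pr(d | fever)
    print(pr_fever)
    print(posterior)   # sums to 1; no separate assessment of Pr(fever) was needed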


Probabilistic Inference

  • By probabilistic inference, we mean:
      • given a prior distribution Pr over variables of interest, representing degrees of belief
      • and given new evidence E=e for some variable E
      • revise your degrees of belief: posterior Pr_e
  • How do your degrees of belief change as a result of learning E=e (or, more generally, E=e for a set of variables E)?


Conditioning

  • We define Pr_e(α) = Pr(α | e)
  • That is, we produce Pr_e by conditioning the prior distribution on the observed evidence e
  • Intuitively,
      • we set Pr_e(w) = 0 for any world w falsifying e
      • we set Pr_e(w) = Pr(w) / Pr(e) for any world w consistent with e
      • the last step is known as normalization (it ensures that the new measure sums to 1; see the sketch below)
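A minimal Python sketch of exactly this recipe, on an illustrative prior over four worlds: drop the worlds falsifying e, then renormalize the survivors.

    # Illustrative prior; the second component records whether the world satisfies E=e.
    prior = {
        ("w1", True):  0.3,
        ("w2", True):  0.2,
        ("w3", False): 0.4,   # falsifies E=e
        ("w4", False): 0.1,   # falsifies E=e
    }

    # Step 1: set Pr_e(w) = 0 for worlds falsifying e (drop them).
    consistent = {w: p for w, p in prior.items() if w[1]}

    # Step 2: normalization, so the surviving probabilities sum to 1.
    pr_e = sum(consistent.values())                        # Pr(e) = 0.5
    posterior = {w: p / pr_e for w, p in consistent.items()}

    print(posterior)   # {('w1', True): 0.6, ('w2', True): 0.4}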


Semantics of Conditioning

[Diagram: the prior Pr assigns p1, p2, p3, p4 to four worlds, of which only the first two satisfy E=e; conditioning yields Pr_e, which assigns αp1 and αp2 to those two worlds (and 0 to the rest), where α = 1/(p1+p2) is the normalizing constant.]


Inference: Computational Bottleneck

  • Semantically/conceptually, the picture is clear; but several issues must be addressed
  • Issue 1: How do we specify the full joint distribution over X1, X2, …, Xn?
      • exponential number of possible worlds
      • e.g., if the Xi are boolean, then 2^n numbers (or 2^n − 1 parameters/degrees of freedom, since they sum to 1)
      • these numbers are not robust/stable
      • these numbers are not natural to assess (what is the probability that “Pascal wants coffee; it’s raining in Toronto; robot charge level is low; …”?)


Inference: Computational Bottleneck

  • Issue 2: Inference in this representation is frightfully slow
      • must sum over an exponential number of worlds to answer a query Pr(α) or to condition on evidence e to determine Pr_e(α)
  • How do we avoid these two problems?
      • no solution in general
      • but in practice there is structure we can exploit
  • We’ll use conditional independence


Independence

  • Recall that x and y are independent iff:
      • Pr(x) = Pr(x|y) iff Pr(y) = Pr(y|x) iff Pr(xy) = Pr(x)Pr(y)
      • intuitively, learning y doesn’t influence beliefs about x
  • x and y are conditionally independent given z iff:
      • Pr(x|z) = Pr(x|yz) iff Pr(y|z) = Pr(y|xz) iff Pr(xy|z) = Pr(x|z)Pr(y|z) iff …
      • intuitively, learning y doesn’t influence your beliefs about x if you already know z
      • e.g., learning someone’s mark on the 886 project can influence the probability you assign to a specific GPA; but if you already knew the 886 final grade, learning the project mark would not influence the GPA assessment (a numerical check follows below)
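A short Python sketch that builds a small joint over boolean variables X, Y, Z in which X and Y are conditionally independent given Z by construction (the parameter values are invented), then verifies Pr(xy|z) = Pr(x|z)Pr(y|z) numerically. Note that X and Y are still dependent unconditionally here; the independence holds only given Z.

    from itertools import product

    # Invented parameters; the joint is constructed so that X ⊥ Y | Z holds.
    p_z = {True: 0.4, False: 0.6}
    p_x = {True: 0.7, False: 0.3}   # Pr(x | z) for z = True / False
    p_y = {True: 0.2, False: 0.9}   # Pr(y | z) for z = True / False

    joint = {(x, y, z): p_z[z]
                        * (p_x[z] if x else 1 - p_x[z])
                        * (p_y[z] if y else 1 - p_y[z])
             for x, y, z in product([True, False], repeat=3)}

    def pr(phi):
        return sum(p for w, p in joint.items() if phi(*w))

    pz = pr(lambda x, y, z: z)
    pr_xy_given_z = pr(lambda x, y, z: x and y and z) / pz
    pr_x_given_z = pr(lambda x, y, z: x and z) / pz
    pr_y_given_z = pr(lambda x, y, z: y and z) / pz

    # Both sides equal 0.14: X and Y are conditionally independent given Z.
    print(round(pr_xy_given_z, 3), round(pr_x_given_z * pr_y_given_z, 3))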


What does independence buy us?

  • Suppose (say, boolean) variables X1, X2, …, Xn are mutually independent
      • we can specify the full joint distribution using only n parameters (linear) instead of 2^n − 1 (exponential)
  • How? Simply specify Pr(x1), …, Pr(xn)
      • from these we can recover the probability of any world or any (conjunctive) query easily
      • e.g., Pr(x1~x2x3x4) = Pr(x1) (1 − Pr(x2)) Pr(x3) Pr(x4)
      • we can condition on an observed value Xk = xk trivially by changing Pr(xk) to 1, leaving Pr(xi) untouched for i ≠ k (see the sketch below)
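A minimal Python sketch of this n-parameter representation (the parameter values are illustrative): the probability of any world is a product of per-variable terms, and conditioning on an observation just pins the corresponding parameter to 1.

    # Four independent boolean variables: 4 parameters instead of 2^4 - 1 = 15.
    p = {"x1": 0.9, "x2": 0.3, "x3": 0.5, "x4": 0.8}   # illustrative Pr(xi)

    def pr_world(assignment):
        """Pr of a full assignment {var: True/False}, under mutual independence."""
        prob = 1.0
        for var, val in assignment.items():
            prob *= p[var] if val else 1 - p[var]
        return prob

    # Pr(x1 ~x2 x3 x4) = Pr(x1) (1 - Pr(x2)) Pr(x3) Pr(x4) = 0.9*0.7*0.5*0.8
    print(round(pr_world({"x1": True, "x2": False, "x3": True, "x4": True}), 3))  # 0.252

    # Conditioning on an observed X2 = x2 is trivial: set Pr(x2) to 1.
    p["x2"] = 1.0
    print(round(pr_world({"x1": True, "x2": True, "x3": True, "x4": True}), 3))   # 0.36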


The Value of Independence

  • Complete independence reduces both representation of the joint and inference from O(2^n) to O(n): pretty significant!
  • Unfortunately, such complete mutual independence is very rare. Most realistic domains do not exhibit this property.
  • Fortunately, most domains do exhibit a fair amount of conditional independence. And we can exploit conditional independence for representation and inference as well.
  • Bayesian networks do just this