 
Uncertainty
Chapter13
20070531 Chap13

Outline
• Uncertainty
• Probability
• Syntax and Semantics
• Inference
• Independence and Bayes' Rule

Uncertainty
Let action A_t = leave for airport t minutes before flight
Will A_t get me there on time?
Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
Hence a purely logical approach either
1. risks falsehood: "A_25 will get me there on time", or
2. leads to conclusions that are too weak for decision making: "A_25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact etc."
(A_1440 might reasonably be said to get me there on time but I'd have to stay overnight in the airport …)

Methods for handling uncertainty
• Default or nonmonotonic logic:
- Assume my car does not have a flat tire
- Assume A_25 works unless contradicted by evidence
• Issues: What assumptions are reasonable? How to handle contradiction?
• Rules with fudge factors:
- A_25 ↦_0.3 get there on time
- Sprinkler ↦_0.99 WetGrass
- WetGrass ↦_0.7 Rain
• Issues: Problems with combination, e.g., Sprinkler causes Rain??
• Probability
- Model agent's degree of belief
- Given the available evidence, A_25 will get me there on time with probability 0.04

Probability
Probabilistic assertions summarize effects of
- laziness: failure to enumerate exceptions, qualifications, etc.
- ignorance: lack of relevant facts, initial conditions, etc.
Subjective probability:
• Probabilities relate propositions to agent's own state of knowledge
e.g., P(A_25 | no reported accidents) = 0.06
These are not assertions about the world.
• Probabilities of propositions change with new evidence:
e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15

Making decisions under uncertainty
Suppose I believe the following:
P(A_25 gets me there on time | …) = 0.04
P(A_90 gets me there on time | …) = 0.70
P(A_120 gets me there on time | …) = 0.95
P(A_1440 gets me there on time | …) = 0.9999
• Which action to choose?
Depends on my preferences for missing flight vs. time spent waiting, etc.
- Utility theory is used to represent and infer preferences
- Decision theory = probability theory + utility theory
• Principle of Maximum Expected Utility (MEU)
An agent is rational iff it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the actions.
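The MEU principle can be made concrete with a small sketch. The four on-time probabilities are the ones listed on the slide; the utility numbers (1000 for catching the flight, 0 for missing it, a cost of 1 per minute of lead time) are purely hypothetical and only serve to show how preferences enter the calculation.

```python
# Probabilities of arriving on time for each action A_t, as given on the slide.
p_on_time = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}

# Hypothetical utilities (NOT from the slides): chosen only for illustration.
U_CATCH = 1000.0         # utility of catching the flight
U_MISS = 0.0             # utility of missing the flight
COST_PER_MINUTE = 1.0    # utility lost per minute of lead time


def expected_utility(t):
    """Expected utility of leaving t minutes before the flight."""
    p = p_on_time[t]
    return p * U_CATCH + (1.0 - p) * U_MISS - COST_PER_MINUTE * t


# MEU: pick the action whose expected utility is highest.
best_action = max(p_on_time, key=expected_utility)

for t in sorted(p_on_time):
    print(f"A_{t}: EU = {expected_utility(t):.1f}")
print("MEU choice: A_%d" % best_action)
```

Under these assumed utilities A_120 comes out ahead: A_1440 is almost certain to succeed, but the waiting cost outweighs the small gain in probability. Different utilities would give a different choice.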
Syntax
• Basic element: random variable
• Similar to propositional logic: possible worlds defined by assignment of values to random variables.
• Boolean random variables
e.g., cavity (do I have a cavity?)
• Discrete random variables
e.g., weather is one of <sunny, rainy, cloudy, snow>
• Domain values must be exhaustive and mutually exclusive
• Elementary proposition constructed by assignment of a value to a random variable
e.g., weather = sunny, cavity = false (abbrev. as ¬cavity)
• Complex propositions formed from elementary propositions and standard logical connectives
e.g., weather = sunny ∨ cavity = false

Syntax (cont.)
• Atomic event: A complete specification of the state of the world about which the agent is uncertain.
e.g., if the world consists of only two Boolean variables cavity and toothache, then there are 4 distinct atomic events:
cavity = false ∧ toothache = false
cavity = false ∧ toothache = true
cavity = true ∧ toothache = false
cavity = true ∧ toothache = true
• Atomic events are mutually exclusive and exhaustive.

Axioms of probability
• For any propositions A, B
- 0 ≤ P(A) ≤ 1
- P(true) = 1 and P(false) = 0
- P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

Prior probability
• Prior or unconditional probabilities of propositions correspond to belief prior to arrival of any (new) evidence
e.g., P(cavity = true) = 0.1 and P(weather = sunny) = 0.72
• Probability distribution gives values for all possible assignments:
e.g., P(weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)
• Joint probability distribution for a set of random variables gives the probability of every atomic event
e.g., P(weather, cavity) = a 4 × 2 matrix of values:

weather =        sunny   rainy   cloudy  snow
cavity = true    0.144   0.02    0.016   0.02
cavity = false   0.576   0.08    0.064   0.08

• Every question about a domain can be answered by the joint distribution.
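As a rough illustration of how the joint distribution answers questions, the sketch below stores the P(weather, cavity) table from the slide as a dictionary and computes probabilities by summing atomic events. The helper name prob and the dictionary layout are illustrative choices, not part of the slides. Note that the cavity marginal implied by this table is 0.2, whereas the standalone example above quotes P(cavity = true) = 0.1; the two examples appear to use different illustrative numbers.

```python
import math

WEATHER = ["sunny", "rainy", "cloudy", "snow"]

# Joint distribution P(weather, cavity), copied from the 4 x 2 table on the slide.
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}


def prob(event):
    """P(event): sum the atomic events (weather, cavity) in which event holds."""
    return sum(p for (w, c), p in joint.items() if event(w, c))


# Marginal distribution P(weather), obtained by summing out cavity.
p_weather = [prob(lambda w, c, v=v: w == v) for v in WEATHER]

# Check the third axiom on A = (weather = sunny), B = (cavity = true).
p_sunny = prob(lambda w, c: w == "sunny")
p_cavity = prob(lambda w, c: c)
p_both = prob(lambda w, c: w == "sunny" and c)
p_either = prob(lambda w, c: w == "sunny" or c)

print("P(weather) =", [round(p, 3) for p in p_weather])
print("P(A or B) =", round(p_either, 3),
      "= P(A) + P(B) - P(A and B) =", round(p_sunny + p_cavity - p_both, 3))
```

The marginal over weather reproduces the distribution <0.72, 0.1, 0.08, 0.1> given on the slide.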
Conditional probability
• Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., the prob. of having a cavity will be 0.8 given that all we know is toothache
• If we know more, e.g., cavity is also given, then we have
P(cavity | toothache, cavity) = 1
• New evidence may be irrelevant, allowing simplification,
e.g., P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
• This kind of inference, sanctioned by domain knowledge, is crucial.

Conditional probability (cont.)
• Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
• Product rule gives an alternative formulation:
P(a ∧ b) = P(a | b) * P(b) = P(b | a) * P(a)
• A general version holds for whole distributions,
e.g., P(weather, cavity) = P(weather | cavity) * P(cavity)
(View as a set of 4 × 2 equations, not matrix mult.)
• Chain rule is derived by successive application of product rule:
P(X_1, …, X_n) = P(X_1, ..., X_{n-1}) * P(X_n | X_1, ..., X_{n-1})
= P(X_1, ..., X_{n-2}) * P(X_{n-1} | X_1, ..., X_{n-2}) * P(X_n | X_1, ..., X_{n-1})
= …
= P(X_1) * P(X_2 | X_1) * P(X_3 | X_1, X_2) * … * P(X_n | X_1, ..., X_{n-1})
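The remark that the distribution form of the product rule is "a set of 4 × 2 equations" can be spelled out in a short sketch. It reuses the P(weather, cavity) values from the prior-probability slide; the function names marginal_weather, marginal_cavity and conditional are hypothetical helpers introduced only for this illustration.

```python
import math

# Joint distribution P(weather, cavity) from the prior-probability slide.
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}


def marginal_weather(w):
    """P(weather = w), summing out cavity."""
    return sum(p for (ww, _), p in joint.items() if ww == w)


def marginal_cavity(c):
    """P(cavity = c), summing out weather."""
    return sum(p for (_, cc), p in joint.items() if cc == c)


def conditional(p_joint, p_given):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0."""
    if p_given <= 0:
        raise ValueError("conditioning event has probability 0")
    return p_joint / p_given


# One scalar equation per (weather, cavity) pair: 4 x 2 = 8 equations in total.
for (w, c), p_wc in joint.items():
    via_cavity = conditional(p_wc, marginal_cavity(c)) * marginal_cavity(c)
    via_weather = conditional(p_wc, marginal_weather(w)) * marginal_weather(w)
    # Both factorizations recover the same joint entry.
    assert math.isclose(via_cavity, p_wc) and math.isclose(via_weather, p_wc)

p_sunny_given_cavity = conditional(joint[("sunny", True)], marginal_cavity(True))
print("P(sunny | cavity) =", round(p_sunny_given_cavity, 3))
```

Each loop iteration is one of the eight entrywise equations; nothing here is a matrix product.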
Inference by enumeration
• Start with the joint probability distribution (over cavity, toothache and catch):
• For any proposition φ, sum the atomic events where it is true:
P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28

Inference by enumeration (cont.-3)
• Start with the joint probability distribution:
• Can also compute conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.08 / 0.2 = 0.4

Normalization
• Denominator can be viewed as a normalization constant α
P(cavity | toothache) = α P(cavity, toothache)
= α [P(cavity, toothache, catch) + P(cavity, toothache, ¬catch)]
= α [<0.108, 0.016> + <0.012, 0.064>]
= α <0.12, 0.08> = <0.6, 0.4>
• General idea: compute distribution on query variable by fixing evidence variables and summing over hidden variables
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

Inference by enumeration (cont.-4)
Typically, we are interested in the posterior joint distribution of the query variables X given specific values e for the evidence variables E
Let the hidden variables (the remaining unobserved variables) be Y
Then the required summation of joint entries is done by summing out the hidden variables:
P(X | E = e) = α P(X, E = e) = α Σ_y P(X, E = e, Y = y)
• The terms in the summation are joint entries because X, E and Y together exhaust the set of random variables
• Obvious problems:
1. Worst-case time complexity O(d^n) where d is the largest arity and n is the number of random variables
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for O(d^n) entries?
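The enumeration and normalization steps above can be sketched as follows. Six of the eight joint entries are the values that appear in the sums on the slides. The joint table itself is not reproduced in the text, so the two entries for ¬toothache ∧ ¬cavity (0.144 and 0.576) and the assignment of 0.072 and 0.008 to catch and ¬catch are assumptions taken from the standard textbook version of this example. The names prob and query are illustrative.

```python
import math

VARS = ("toothache", "catch", "cavity")

# Full joint distribution, keyed by (toothache, catch, cavity).
joint = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, True): 0.072,    # catch/not-catch split assumed
    (False, False, True): 0.008,   # catch/not-catch split assumed
    (False, True, False): 0.144,   # assumed: not stated on the slides
    (False, False, False): 0.576,  # assumed: not stated on the slides
}


def prob(event):
    """P(event): sum the atomic events in which the proposition holds."""
    return sum(p for world, p in joint.items()
               if event(dict(zip(VARS, world))))


def query(x, evidence):
    """P(x | evidence) for a Boolean query variable x.

    Fixes the evidence variables, sums out every hidden variable,
    then normalizes with alpha = 1 / (sum of unnormalized entries).
    """
    unnormalized = {}
    for value in (True, False):
        unnormalized[value] = prob(
            lambda w: w[x] == value
            and all(w[k] == v for k, v in evidence.items()))
    alpha = 1.0 / sum(unnormalized.values())
    return {value: alpha * p for value, p in unnormalized.items()}


p_toothache = prob(lambda w: w["toothache"])
p_cavity_or_toothache = prob(lambda w: w["cavity"] or w["toothache"])
posterior = query("cavity", {"toothache": True})

print("P(toothache) =", round(p_toothache, 3))
print("P(cavity or toothache) =", round(p_cavity_or_toothache, 3))
print("P(cavity | toothache) =", {k: round(v, 3) for k, v in posterior.items()})
```

The loop inside prob visits every atomic event, which is the O(d^n) cost listed under "Obvious problems": with n Boolean variables the table has 2^n entries.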