
CS 188: Artificial Intelligence

Bayes’ Nets Representation and Independence

Pieter Abbeel – UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, and Andrew Moore

Probability recap

§ Conditional probability: P(x | y) = P(x, y) / P(y)
§ Product rule: P(x, y) = P(x | y) P(y)
§ Chain rule: P(x1, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ⋯ = ∏i P(xi | x1, …, xi−1)
§ X, Y independent iff: ∀x, y : P(x, y) = P(x) P(y)
§ X and Y are conditionally independent given Z iff: ∀x, y, z : P(x, y | z) = P(x | z) P(y | z)
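As a quick numeric check of these identities, here is a minimal Python sketch on a made-up joint over two binary variables (the numbers are illustrative choices, not from the slides):

```python
# Toy joint distribution P(X, Y); values chosen arbitrarily for illustration.
P = {('+x', '+y'): 0.2, ('+x', '-y'): 0.3,
     ('-x', '+y'): 0.3, ('-x', '-y'): 0.2}

def P_x(x):  # marginal P(x), summing out Y
    return sum(p for (xv, _), p in P.items() if xv == x)

def P_y(y):  # marginal P(y), summing out X
    return sum(p for (_, yv), p in P.items() if yv == y)

def cond(x, y):  # conditional probability: P(x | y) = P(x, y) / P(y)
    return P[(x, y)] / P_y(y)

# Product rule: P(x, y) = P(x | y) P(y)
assert abs(P[('+x', '+y')] - cond('+x', '+y') * P_y('+y')) < 1e-12

# Independence: X, Y independent iff P(x, y) = P(x) P(y) for all x, y
print(all(abs(P[(x, y)] - P_x(x) * P_y(y)) < 1e-12 for (x, y) in P))  # False here
```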


Probabilistic Models

§ Models describe how (a portion of) the world works
§ Models are always simplifications

§ May not account for every variable
§ May not account for all interactions between variables
§ “All models are wrong; but some are useful.” – George E. P. Box

§ What do we do with probabilistic models?

§ We (or our agents) need to reason about unknown variables, given evidence
§ Example: explanation (diagnostic reasoning)
§ Example: prediction (causal reasoning)
§ Example: value of information


Bayes’ Nets: Big Picture

§ Two problems with using full joint distribution tables as our probabilistic models:

§ Unless there are only a few variables, the joint is WAY too big to represent explicitly. For n variables with domain size d, the joint table has d^n entries, exponential in n.
§ Hard to learn (estimate) anything empirically about more than a few variables at a time

§ Bayes’ nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)

§ More properly called graphical models
§ We describe how variables locally interact
§ Local interactions chain together to give global, indirect interactions


Bayes’ Nets

§ Representation

§ Informal first introduction of Bayes’ nets through causality “intuition”
§ More formal introduction of Bayes’ nets

§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes’ Nets from Data


Graphical Model Notation

§ Nodes: variables (with domains)

§ Can be assigned (observed) or unassigned (unobserved)

§ Arcs: interactions

§ Similar to CSP constraints
§ Indicate “direct influence” between variables
§ Formally: encode conditional independence (more later)

§ For now: imagine that arrows mean direct causation (in general, they don’t!)


Example: Coin Flips

[Graph: nodes X1, X2, …, Xn, no arcs]

§ N independent coin flips
§ No interactions between variables: absolute independence


Example: Traffic

§ Variables:

§ R: It rains
§ T: There is traffic

§ Model 1: independence
§ Model 2: rain causes traffic (graph: R → T)
§ Why is an agent using model 2 better?


Example: Traffic II

§ Let’s build a causal graphical model
§ Variables

§ T: Traffic
§ R: It rains
§ L: Low pressure
§ D: Roof drips
§ B: Ballgame
§ C: Cavity


Example: Alarm Network

§ Variables

§ B: Burglary
§ A: Alarm goes off
§ M: Mary calls
§ J: John calls
§ E: Earthquake!


Bayes’ Net Semantics

§ Let’s formalize the semantics of a Bayes’ net
§ A set of nodes, one per variable X
§ A directed, acyclic graph
§ A conditional distribution for each node

§ A collection of distributions over X, one for each combination of parents’ values
§ CPT: conditional probability table
§ Description of a noisy “causal” process

[Graph: parents A1, …, An pointing to X]

A Bayes net = Topology (graph) + Local Conditional Probabilities


Probabilities in BNs

§ Bayes’ nets implicitly encode joint distributions

§ As a product of local conditional distributions
§ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

P(x1, x2, …, xn) = ∏i P(xi | parents(Xi))

§ Example: for the alarm network below, P(+b, ¬e, +a, +j, +m) = P(+b) P(¬e) P(+a | +b, ¬e) P(+j | +a) P(+m | +a)

§ This lets us reconstruct any entry of the full joint
§ Not every BN can represent every joint distribution

§ The topology enforces certain conditional independencies


Example: Coin Flips

[Graph: X1, X2, …, Xn with no arcs; each CPT: h 0.5, t 0.5]

Only distributions whose variables are absolutely independent can be represented by a Bayes’ net with no arcs.


Example: Traffic

[Graph: R → T]

P(R): +r 1/4, ¬r 3/4
P(T | R): +t | +r 3/4, ¬t | +r 1/4, +t | ¬r 1/2, ¬t | ¬r 1/2


Example: Alarm Network

[Graph: Burglary → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls]

B P(B)
+b 0.001
¬b 0.999

E P(E)
+e 0.002
¬e 0.998

B E A P(A | B, E)
+b +e +a 0.95
+b +e ¬a 0.05
+b ¬e +a 0.94
+b ¬e ¬a 0.06
¬b +e +a 0.29
¬b +e ¬a 0.71
¬b ¬e +a 0.001
¬b ¬e ¬a 0.999

A J P(J | A)
+a +j 0.9
+a ¬j 0.1
¬a +j 0.05
¬a ¬j 0.95

A M P(M | A)
+a +m 0.7
+a ¬m 0.3
¬a +m 0.01
¬a ¬m 0.99
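Given these CPTs, the probability of any full assignment is just the product of the relevant conditional entries, as described above. A minimal Python sketch (the dictionary encoding and function name are my own choices):

```python
# CPTs of the alarm network, transcribed from the tables above ('-' stands for ¬).
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95, ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94, ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29, ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1,
       ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3,
       ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# P(+b, -e, +a, +j, +m) = 0.001 * 0.998 * 0.94 * 0.9 * 0.7
print(joint('+b', '-e', '+a', '+j', '+m'))  # ≈ 0.000591
```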

Example Bayes’ Net: Insurance


Example Bayes’ Net: Car


Build your own Bayes nets!

§ http://www.aispace.org/bayes/index.shtml


Size of a Bayes’ Net

§ How big is a joint distribution over N Boolean variables?

2^N

§ How big is an N-node net if nodes have up to k parents?

O(N · 2^(k+1))

§ Both give you the power to calculate any entry P(x1, x2, …, xN) of the joint
§ BNs: Huge space savings!
§ Also easier to elicit local CPTs
§ Also turns out to be faster to answer queries (coming)
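To get a feel for the savings, a quick back-of-the-envelope comparison (the particular N and k values below are arbitrary choices for illustration):

```python
# Rough size comparison: full joint table vs. Bayes' net CPTs,
# for N Boolean variables with at most k parents per node.
def joint_table_size(N):
    return 2 ** N                   # one entry per full assignment

def bayes_net_size(N, k):
    return N * 2 ** (k + 1)         # N CPTs with at most 2^(k+1) entries each

for N, k in [(10, 2), (20, 3), (30, 4)]:
    print(N, k, joint_table_size(N), bayes_net_size(N, k))
# e.g. N=30, k=4: 1,073,741,824 joint entries vs. only 960 CPT entries
```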


Bayes’ Nets

§ Representation

§ Informal first introduction of Bayes’ nets through causality “intuition”
§ More formal introduction of Bayes’ nets

§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes’ Nets from Data


Representing Joint Probability Distributions

§ Table representation: number of parameters: d^n − 1

§ Chain rule representation: number of parameters: (d−1) + d(d−1) + d^2(d−1) + … + d^(n−1)(d−1) = d^n − 1

Size of a CPT = (number of different joint instantiations of the preceding variables) × (number of values the current variable can take on, minus 1)

§ Both can represent any distribution over the n random variables. Makes sense: the same number of parameters needs to be stored.
§ The chain rule applies to all orderings of the variables, so a given distribution can be represented in n! = n(n−1)(n−2)⋯2·1 different ways with the chain rule


Chain Rule à Bayes’ net

§ Chain rule representation: applies to ALL distributions

§ Pick any ordering of the variables, rename accordingly as x1, x2, …, xn
§ Number of parameters: (d−1) + d(d−1) + d^2(d−1) + … + d^(n−1)(d−1) = d^n − 1 (exponential in n)

§ Bayes’ net representation: makes assumptions

§ Pick any ordering of the variables, rename accordingly as x1, x2, …, xn
§ Pick any directed acyclic graph consistent with the ordering
§ Assume the following conditional independencies: P(xi | x1 ⋯ xi−1) = P(xi | parents(Xi))
§ → Joint: P(x1, …, xn) = ∏i P(xi | parents(Xi))
§ → Number of parameters: at most n (d−1) d^K, where K is the maximum number of parents (linear in n)

Note: no causality assumption made anywhere.
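As a concrete check of these two counts (the values n = 5, d = 2, K = 2 below are arbitrary illustration choices):

```python
# Number of parameters: chain rule (no assumptions) vs. Bayes' net
# in which every node has at most K parents.
def chain_rule_params(n, d):
    return d ** n - 1                    # (d-1) + d(d-1) + ... + d^(n-1)(d-1) telescopes

def bayes_net_params(n, d, K):
    return n * (d - 1) * d ** K          # upper bound: n CPTs, d^K parent rows each

print(chain_rule_params(5, 2), bayes_net_params(5, 2, 2))  # 31 vs. 20
```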

Causality?

§ When Bayes’ nets reflect the true causal patterns:

§ Often simpler (nodes have fewer parents)
§ Often easier to think about
§ Often easier to elicit from experts

§ BNs need not actually be causal

§ Sometimes no causal net exists over the domain
§ E.g. consider the variables Traffic and Drips
§ End up with arrows that reflect correlation, not causation

§ What do the arrows really mean?

§ Topology may happen to encode causal structure
§ Topology only guaranteed to encode conditional independence


Example: Traffic

§ Basic traffic net
§ Let’s multiply out the joint

[Graph: R → T]

P(R): r 1/4, ¬r 3/4
P(T | R): t | r 3/4, ¬t | r 1/4, t | ¬r 1/2, ¬t | ¬r 1/2
Joint P(R, T): r t 3/16, r ¬t 1/16, ¬r t 6/16, ¬r ¬t 6/16


Example: Reverse Traffic

§ Reverse causality?

[Graph: T → R]

P(T): t 9/16, ¬t 7/16
P(R | T): r | t 1/3, ¬r | t 2/3, r | ¬t 1/7, ¬r | ¬t 6/7
Joint P(R, T): r t 3/16, r ¬t 1/16, ¬r t 6/16, ¬r ¬t 6/16 (same joint as before)
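A quick sketch confirming that multiplying out both parameterizations recovers the same joint table (the encoding and names are my own; exact fractions keep the equality check exact):

```python
from fractions import Fraction as F

# Forward net R -> T: P(R) and P(T | R), from the previous slide.
P_R = {'+r': F(1, 4), '-r': F(3, 4)}
P_T_given_R = {('+t', '+r'): F(3, 4), ('-t', '+r'): F(1, 4),
               ('+t', '-r'): F(1, 2), ('-t', '-r'): F(1, 2)}

# Reverse net T -> R: P(T) and P(R | T), from this slide.
P_T = {'+t': F(9, 16), '-t': F(7, 16)}
P_R_given_T = {('+r', '+t'): F(1, 3), ('-r', '+t'): F(2, 3),
               ('+r', '-t'): F(1, 7), ('-r', '-t'): F(6, 7)}

for r in ['+r', '-r']:
    for t in ['+t', '-t']:
        forward = P_R[r] * P_T_given_R[(t, r)]   # P(r) P(t | r)
        reverse = P_T[t] * P_R_given_T[(r, t)]   # P(t) P(r | t)
        assert forward == reverse
        print(r, t, forward)                     # 3/16, 1/16, 6/16, 6/16
```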


Example: Coins

§ Extra arcs don’t prevent representing independence, just allow non-independence

[Graph 1: X1 X2, no arc]
P(X1): h 0.5, t 0.5
P(X2): h 0.5, t 0.5

[Graph 2: X1 → X2]
P(X1): h 0.5, t 0.5
P(X2 | X1): h | h 0.5, t | h 0.5, h | t 0.5, t | t 0.5

§ Adding unneeded arcs isn’t wrong, it’s just inefficient

Bayes’ Nets

§ Representation

§ Informal first introduction of Bayes’ nets through causality “intuition”
§ More formal introduction of Bayes’ nets

§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes’ Nets from Data


Bayes Nets: Assumptions

§ To go from the chain rule to the Bayes’ net representation, we made the following assumption about the distribution:

P(xi | x1 ⋯ xi−1) = P(xi | parents(Xi))

§ Turns out that probability distributions that satisfy the above (“chain rule → Bayes’ net”) conditional independence assumptions

§ often can be guaranteed to have many more conditional independences
§ These guaranteed additional conditional independences can be read off directly from the graph

§ Important for modeling: understand assumptions made when choosing a Bayes net graph

Example

§ Conditional independence assumptions directly from simplifications in chain rule:
§ Additional implied conditional independence assumptions?

[Graph over X, Y, Z, W]

Independence in a BN

§ Given a Bayes net graph
§ Important question: Are two nodes guaranteed to be independent given certain evidence?
§ Equivalent question: Are two nodes independent given the evidence in all distributions that can be encoded with the Bayes net graph?
§ Before proceeding, consider the opposite question: Are two nodes guaranteed to be dependent given certain evidence?

§ No! For any BN graph you can choose all CPTs such that all variables are independent, by having P(X | Pa(X) = paX) not depend on the value of the parents. A simple way of doing so: pick all entries in all CPTs equal to 0.5 (assuming binary variables)
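A tiny numeric check of this claim on the two-node net X → Y (a sketch illustrating the all-0.5 trick; the encoding is my own):

```python
# Net X -> Y with every CPT entry 0.5: P(x) = 0.5 and P(y | x) = 0.5 for all x, y.
P_X = {'+x': 0.5, '-x': 0.5}
P_Y_given_X = {(y, x): 0.5 for y in ['+y', '-y'] for x in ['+x', '-x']}

# Joint via the BN product; marginal of Y by summing out X.
joint = {(x, y): P_X[x] * P_Y_given_X[(y, x)] for x in P_X for y in ['+y', '-y']}
P_Y = {y: sum(joint[(x, y)] for x in P_X) for y in ['+y', '-y']}

# X and Y come out independent: P(x, y) = P(x) P(y) for every assignment.
assert all(abs(joint[(x, y)] - P_X[x] * P_Y[y]) < 1e-12 for (x, y) in joint)
```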


Independence in a BN

§ Given a Bayes net graph

Are two nodes guaranteed to be independent given certain evidence?

§ If no, can prove with a counterexample

§ I.e., pick a distribution that can be encoded with the BN graph, i.e., pick a set of CPTs, and show that the independence assumption is violated

§ If yes,

§ For now we are able to prove using algebra (tedious in general)
§ Next we will study an efficient graph-based method to prove yes: “D-separation”

D-separation: Outline

§ Study independence properties for triples
§ Any complex example can be analyzed by considering relevant triples


Causal Chains

§ This configuration is a “causal chain”

§ Is it guaranteed that X is independent of Z? No!

§ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
§ Example: P(y | x) = 1 if y = x, 0 otherwise; P(z | y) = 1 if z = y, 0 otherwise. Then P(z | x) = 1 if z = x, 0 otherwise, hence X and Z are not independent in this example

X → Y → Z

X: Low pressure Y: Rain Z: Traffic


Causal Chains

§ This configuration is a “causal chain”

§ Is it guaranteed that X is independent of Z given Y? Yes!
§ Evidence along the chain “blocks” the influence: P(z | x, y) = P(x) P(y | x) P(z | y) / (P(x) P(y | x)) = P(z | y)

X → Y → Z

X: Low pressure Y: Rain Z: Traffic


Common Cause

§ Another basic configuration: two effects of the same cause

§ Is it guaranteed that X and Z are independent?

§ No! § Counterexample: Choose P(X|Y)=1 if x=y, 0 otherwise, Choose P(z|y) = 1 if z=y, 0 otherwise. Then P(x|z)=1 if x=z and 0 otherwise, hence X and Z are not independent in this example and hence it is not guaranteed that if a distribution can be encoded with the Bayes’ net structure on the right that X and Z are independent in that distribution

X Y Z

Y: Project due X: Piazza busy Z: Lab full


Common Cause

§ Another basic configuration: two effects of the same cause

§ Is it guaranteed that X and Z are independent given Y? Yes!
§ Observing the cause blocks influence between effects.

X ← Y → Z

Y: Project due X: Piazza busy Z: Lab full


Common Effect

§ Last configuration: two causes of one effect (v-structures)

§ Are X and Z independent?

§ Yes: the ballgame and the rain cause traffic, but they are not correlated
§ Still need to prove they must be independent (try it!)

§ Are X and Z independent given Y?

§ No: seeing traffic puts the rain and the ballgame in competition as explanations

§ This is backwards from the other cases

§ Observing an effect activates influence between possible causes.

X → Y ← Z

X: Raining Z: Ballgame Y: Traffic


Reachability (D-Separation)

§ Question: Are X and Y conditionally independent given evidence vars {Z}?

§ Yes, if X and Y are “separated” by Z
§ Consider all (undirected) paths from X to Y
§ No active paths = independence!

§ A path is active if each triple is active:

§ Causal chain A → B → C where B is unobserved (either direction)
§ Common cause A ← B → C where B is unobserved
§ Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed

§ All it takes to block a path is a single inactive segment

[Diagram: active triples vs. inactive triples]

D-Separation

§ Given query: Xi ⊥⊥ Xj | {Xk1, …, Xkn}?
§ Shade all evidence nodes
§ For all (undirected!) paths between Xi and Xj:

§ Check whether the path is active
§ If active, return: not guaranteed that Xi ⊥⊥ Xj | {Xk1, …, Xkn}

§ (If reaching this point, all paths have been checked and shown inactive)
§ Return: guaranteed that Xi ⊥⊥ Xj | {Xk1, …, Xkn}
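The procedure above translates almost line for line into code. Below is a minimal Python sketch under the slide’s own rules (enumerate undirected paths, check every triple); the function names and graph encoding are my own choices, and the brute-force path enumeration is only sensible for small graphs:

```python
def d_separated(parents, X, Y, Z):
    """Is X guaranteed independent of Y given evidence set Z, i.e. are X and Y
    d-separated in the DAG given as {node: list of parents}?"""
    nodes = set(parents)
    children = {n: [] for n in nodes}
    for n in nodes:
        for p in parents[n]:
            children[p].append(n)

    def descendants(n):                     # needed for the v-structure rule
        out, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def triple_active(a, b, c):
        if a in parents[b] and c in parents[b]:      # common effect a -> b <- c
            return b in Z or any(d in Z for d in descendants(b))
        return b not in Z                            # chain or common cause

    neighbors = {n: set(parents[n]) | set(children[n]) for n in nodes}
    stack = [[X]]                           # DFS over simple undirected paths
    while stack:
        path = stack.pop()
        if path[-1] == Y:
            # A path is active iff every triple along it is active.
            if all(triple_active(path[i], path[i + 1], path[i + 2])
                   for i in range(len(path) - 2)):
                return False                # found an active path
            continue
        stack.extend(path + [nb] for nb in neighbors[path[-1]] if nb not in path)
    return True                             # no active path: independence guaranteed

# The alarm network from earlier: B -> A <- E, A -> J, A -> M.
alarm = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
print(d_separated(alarm, 'B', 'E', set()))    # True: v-structure unobserved
print(d_separated(alarm, 'B', 'E', {'J'}))    # False: observed descendant activates it
print(d_separated(alarm, 'J', 'M', {'A'}))    # True: common cause observed
```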

Example

[Graph over R, T, B, T′; query answered on the slide: Yes]

Example

[Graph over R, T, B, D, L, T′; queries answered on the slide: Yes, Yes, Yes]

Example

§ Variables:

§ R: Raining
§ T: Traffic
§ D: Roof drips
§ S: I’m sad

§ Questions:

[Graph over R, T, D, S; query answered on the slide: Yes]


All Conditional Independences

§ Given a Bayes net structure, can run d-separation to build a complete list of conditional independences that are guaranteed to be true, all of the form Xi ⊥⊥ Xj | {Xk1, …, Xkn}

Possible to have the same full list of conditional independencies for different BN graphs?

§ Yes!
§ Examples: (graphs shown on slide)
§ If two Bayes’ net graphs have the same full list of conditional independencies, then they are able to encode the same set of distributions.

Topology Limits Distributions

§ Given some graph topology G, only certain joint distributions can be encoded
§ The graph structure guarantees certain (conditional) independences
§ (There might be more independence)
§ Adding arcs increases the set of distributions, but has several costs
§ Full conditioning can encode any distribution

[Diagrams: three-node graphs over X, Y, Z]

Bayes Nets Representation Summary

§ Bayes nets compactly encode joint distributions
§ Guaranteed independencies of distributions can be deduced from BN graph structure
§ D-separation gives precise conditional independence guarantees from graph alone
§ A Bayes’ net’s joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution


Bayes’ Nets

§ Representation
§ Conditional Independences
§ Probabilistic Inference

§ Enumeration (exact, exponential complexity)
§ Variable elimination (exact, worst-case exponential complexity, often better)
§ Probabilistic inference is NP-complete
§ Sampling (approximate)

§ Learning Bayes’ Nets from Data
