CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering
Department of Cybernetics

Bayesian networks

Petr Pošík
Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics
petr.posik@fel.cvut.cz

Significant parts of this material come from the lectures on Bayesian networks which are part of the Artificial Intelligence course by Pieter Abbeel and Dan Klein. The original lectures can be found at http://ai.berkeley.edu

© 2020 P. Pošík, Artificial Intelligence
Introduction
Uncertainty

Probabilistic reasoning is one of the frameworks that allow us to maintain our beliefs and knowledge in uncertain environments.

Usual scenario:
■ Observed variables (evidence): known things related to the state of the world; often imprecise, noisy (info from sensors, symptoms of a patient, etc.).
■ Unobserved, hidden variables: unknown but important aspects of the world; we need to reason about them (what the position of an object is, whether a disease is present, etc.).
■ Model: describes the relations among hidden and observed variables; allows us to reason.

Models (including probabilistic ones)
■ describe how (a part of) the world works;
■ are always approximations or simplifications:
  ■ They cannot account for everything (they would be as complex as the world itself).
  ■ They represent only a chosen subset of variables and interactions between them.
■ "All models are wrong; some are useful." — George E. P. Box

A probabilistic model is a joint distribution over a set of random variables.
Notation

Random variables (start with capital letters): X, Y, Weather, ...

Values of random variables (start with lower-case letters): x_1, e_i, rainy, ...

Probability distribution of a random variable: P(X) or P_X

Probability of a random event: P(X = x_1) or P_X(x_1)

Shorthand for the probability of a random event (if there is no chance of confusion): P(+r) meaning P(Rainy = true), or P(r) meaning P(Weather = rainy)
Question

Which of the following equations for the joint probability distribution over random variables X_1, ..., X_n holds in general?

A: P(X_1, X_2, ..., X_n) = P(X_1) P(X_2) P(X_3) · ... = ∏_{i=1}^{n} P(X_i)

B: P(X_1, X_2, ..., X_n) = P(X_1) P(X_2 | X_1) P(X_3 | X_2) · ... = ∏_{i=1}^{n} P(X_i | X_{i−1})

C: P(X_1, X_2, ..., X_n) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) · ... = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})

D: None of the above holds in general; all of them hold in special cases only.
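The options above can be checked numerically on any small joint distribution. Below is a sketch with three binary variables; the probability table is made up for illustration (any non-negative table summing to 1 works):

```python
import itertools

# An arbitrary joint distribution over three binary variables X1, X2, X3.
probs = (0.20, 0.10, 0.05, 0.15, 0.10, 0.05, 0.05, 0.30)
joint = dict(zip(itertools.product((0, 1), repeat=3), probs))

def P(assignment):
    """Marginal probability that the variables (by 0-based index) take the given values."""
    return sum(p for x, p in joint.items()
               if all(x[i] == v for i, v in assignment.items()))

for (x1, x2, x3), p in joint.items():
    # Option C: P(x1) * P(x2|x1) * P(x3|x1,x2) -- exact for ANY joint distribution.
    c = (P({0: x1})
         * (P({0: x1, 1: x2}) / P({0: x1}))
         * (p / P({0: x1, 1: x2})))
    assert abs(c - p) < 1e-12

# Option A (full independence) fails for this joint:
a = P({0: 0}) * P({1: 0}) * P({2: 0})
print(round(a, 4), joint[(0, 0, 0)])  # the two values differ
```

Option C holds for every joint distribution; options A and B hold only under (Markov-style) independence assumptions.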
Joint probability distribution

A joint distribution over a set of variables X_1, ..., X_n (here discrete) assigns a probability to each combination of values:

P(X_1 = x_1, ..., X_n = x_n) = P(x_1, ..., x_n)

For a proper probability distribution:

∀ x_1, ..., x_n: P(x_1, ..., x_n) ≥ 0 and ∑_{x_1, ..., x_n} P(x_1, ..., x_n) = 1

Probabilistic inference
■ Compute a desired probability from other known probabilities (e.g. marginal or conditional from joint).
■ Conditional probabilities turn out to be the most interesting ones:
  ■ They represent our or the agent's beliefs given the evidence (measured values of observable variables):
    P(bus on time | rush hour) = 0.8
  ■ Probabilities change with new evidence:
    P(bus on time) = 0.95
    P(bus on time | rush hour) = 0.8
    P(bus on time | rush hour, dry roads) = 0.85
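The inference step above (conditional from joint) can be sketched in code. The joint table below is not given in the slides; it is constructed so that the slide's numbers P(on time) = 0.95 and P(on time | rush hour) = 0.8 come out:

```python
# Hypothetical joint distribution over (OnTime, RushHour); the table is
# illustrative, chosen to reproduce the probabilities quoted in the slide.
joint = {
    (True,  True):  0.16, (False, True):  0.04,   # rush hour
    (True,  False): 0.79, (False, False): 0.01,   # no rush hour
}

def p_on_time(rush=None):
    """P(OnTime = true), optionally conditioned on RushHour = rush."""
    if rush is None:                                # marginal: sum out RushHour
        return sum(p for (t, _), p in joint.items() if t)
    evidence = sum(p for (_, r), p in joint.items() if r == rush)
    return joint[(True, rush)] / evidence           # P(T, r) / P(r)

print(round(p_on_time(), 2))      # 0.95
print(round(p_on_time(True), 2))  # 0.8
```

Conditioning on a third variable such as dry roads would require extending the joint table accordingly; the mechanics (sum over matching entries, divide by the evidence probability) stay the same.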
Probability cheatsheet

Conditional probability: P(X | Y) = P(X, Y) / P(Y)

Product rule: P(X, Y) = P(X | Y) P(Y)

Bayes rule: P(x | y) = P(y | x) P(x) / P(y) = P(y | x) P(x) / ∑_i P(y | x_i) P(x_i)

Chain rule: P(X_1, X_2, ..., X_n) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) · ... = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})

X ⊥⊥ Y (X and Y are independent) iff ∀ x, y: P(x, y) = P(x) P(y)

X ⊥⊥ Y | Z (X and Y are conditionally independent given Z) iff ∀ x, y, z: P(x, y | z) = P(x | z) P(y | z)
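A quick numeric sanity check of the product rule and Bayes rule (with the normalizing sum in the denominator); all numbers are made up for illustration:

```python
# Prior P(x) over the weather and likelihood P(y = wet ground | x).
p_x = {'sunny': 0.7, 'rainy': 0.3}
p_y_given_x = {'sunny': 0.1, 'rainy': 0.8}

# Product rule: P(x, y) = P(y | x) P(x)
p_xy = {x: p_y_given_x[x] * p_x[x] for x in p_x}

# Bayes rule: P(x | y) = P(y | x) P(x) / sum_i P(y | x_i) P(x_i)
p_y = sum(p_xy.values())
p_x_given_y = {x: p_xy[x] / p_y for x in p_x}

print({x: round(p, 3) for x, p in p_x_given_y.items()})
# The posterior sums to 1 and favors 'rainy' once wet ground is observed.
```

The same denominator ∑_i P(y | x_i) P(x_i) is what turns the unnormalized products into a proper distribution over x.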
Contents

■ What is a Bayesian network?
■ How does it encode the joint probability distribution?
■ What independence assumptions does it encode?
■ How to perform reasoning using a BN?
Bayesian networks
What's wrong with the joint distribution?

How many free parameters n_params does a probability distribution over n variables have, each variable having at least d possible values?
■ The full table has at least d^n entries; one is fixed by the sum-to-one constraint, so n_params ≥ d^n − 1.
■ For all variables binary (d = 2): n_params = 2^n − 1.
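The count above is easy to tabulate; the sketch below shows how quickly the full joint table blows up with the number of variables:

```python
# Free parameters of a full joint table over n variables with d values each:
# d**n entries, minus 1 for the sum-to-one constraint.
def n_params(n, d=2):
    return d**n - 1

for n in (2, 5, 10, 20):
    print(n, n_params(n))  # 3, 31, 1023, 1048575 -- exponential growth
```

This exponential growth is exactly the problem a Bayesian network addresses: it factorizes the joint distribution so that only small conditional tables need to be stored and estimated.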