

SLIDE 1

Lecture 5

Bayesian Networks

Marco Chiarandini

Department of Mathematics & Computer Science, University of Southern Denmark

Slides by Stuart Russell and Peter Norvig

SLIDE 2

Course Overview

✔ Introduction
    ✔ Artificial Intelligence
    ✔ Intelligent Agents

✔ Search
    ✔ Uninformed Search
    ✔ Heuristic Search

Uncertain knowledge and Reasoning
    Probability and Bayesian approach
    Bayesian Networks
    Hidden Markov Chains
    Kalman Filters

Learning
    Supervised Learning: Bayesian Networks, Neural Networks
    Unsupervised Learning: EM Algorithm
    Reinforcement Learning

Games and Adversarial Search
    Minimax search and Alpha-beta pruning
    Multiagent search

Knowledge representation and Reasoning
    Propositional logic
    First order logic
    Inference
    Planning

SLIDE 3

Outline

  • 1. Probability Basis
  • 2. Bayesian networks

SLIDE 4

Summary

  • Probability is a rigorous formalism for uncertain knowledge
  • The joint probability distribution specifies the probability of every atomic event
  • Queries can be answered by summing over atomic events
  • For nontrivial domains, we must find a way to reduce the size of the joint distribution
  • Independence and conditional independence provide the tools

SLIDE 5

Outline

  • 1. Probability Basis
  • 2. Bayesian networks

SLIDE 6

Outline

♦ Syntax
♦ Semantics
♦ Parameterized distributions

SLIDE 7

Bayesian networks

Definition: A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.

Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (link ≈ “directly influences”)
  • a conditional distribution for each node given its parents: Pr(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
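As a concrete illustration, here is a minimal Python sketch (not from the slides; the `Node` class and its fields are assumptions made here for clarity) of one way to store a Boolean node with its CPT as a mapping from parent-value tuples to P(X = true):

```python
class Node:
    def __init__(self, name, parents, cpt):
        self.name = name        # variable name, e.g. "Alarm"
        self.parents = parents  # ordered list of parent names
        self.cpt = cpt          # maps tuple of parent values -> P(X = true)

    def p(self, value, parent_values):
        """P(X = value | parents = parent_values) for a Boolean variable."""
        p_true = self.cpt[tuple(parent_values)]
        return p_true if value else 1.0 - p_true

# Example: Alarm with parents Burglary, Earthquake (CPT values from slide 10)
alarm = Node("Alarm", ["Burglary", "Earthquake"],
             {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001})
print(alarm.p(True, [True, False]))    # 0.94
print(alarm.p(False, [False, False]))  # 0.999
```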

SLIDE 8

Example

Topology of network encodes conditional independence assertions:

[Figure: Weather as an isolated node; Cavity with children Toothache and Catch]

Weather is independent of the other variables. Toothache and Catch are conditionally independent given Cavity.

SLIDE 9

Example

I’m at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects “causal” knowledge:
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

SLIDE 10

Example contd.

[Figure: burglary network: Burglary, Earthquake → Alarm; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = .001        P(E) = .002

B  E  | P(A|B,E)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A  | P(J|A)        A  | P(M|A)
T  | .90           T  | .70
F  | .05           F  | .01

SLIDE 11

Compactness

A CPT for a Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values.
Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p).
If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers.
I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
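The counting argument can be checked with a few lines of Python (illustrative only; the parent counts are those of the burglary network):

```python
# B and E have 0 parents, A has 2, J and M have 1 each.
parent_counts = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}

bn_numbers = sum(2 ** k for k in parent_counts.values())  # one number per CPT row
full_joint = 2 ** len(parent_counts) - 1                  # independent joint entries
print(bn_numbers, full_joint)  # 10 31
```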


SLIDE 12

Global semantics

“Global” semantics defines the full joint distribution as the product of the local conditional distributions:

    P(x1, . . . , xn) = ∏_{i=1}^{n} P(xi | parents(Xi))

E.g.,

    P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
                           = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
                           ≈ 0.00063

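A small Python sketch of this product, using the CPT values from slide 10 (the variable names are ad hoc, chosen here for readability):

```python
P_b = 0.001          # P(B = true)
P_e = 0.002          # P(E = true)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A = true | B, E)
P_j = {True: 0.90, False: 0.05}                     # P(J = true | A)
P_m = {True: 0.70, False: 0.01}                     # P(M = true | A)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(p)  # ≈ 0.00063
```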

SLIDE 13

Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X1, . . . , Xn
2. For i = 1 to n:
   add Xi to the network
   select parents from X1, . . . , Xi−1 such that Pr(Xi | Parents(Xi)) = Pr(Xi | X1, . . . , Xi−1)

This choice of parents guarantees the global semantics:

    Pr(X1, . . . , Xn) = ∏_{i=1}^{n} Pr(Xi | X1, . . . , Xi−1)   (chain rule)
                       = ∏_{i=1}^{n} Pr(Xi | Parents(Xi))        (by construction)
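A sketch of this loop in Python. The oracle `needs_as_parent` is hypothetical: in practice the conditional-independence judgments come from domain knowledge, as in the burglary example below.

```python
def build_network(variables, needs_as_parent):
    parents = {}
    for i, x in enumerate(variables):
        preds = variables[:i]
        # keep only those predecessors required so that
        # Pr(Xi | Parents(Xi)) = Pr(Xi | X1, ..., Xi-1)
        parents[x] = [p for p in preds if needs_as_parent(x, p, preds)]
    return parents

# E.g., the burglary net with the causal ordering B, E, A, J, M;
# here the oracle simply encodes the known dependencies.
deps = {"A": {"B", "E"}, "J": {"A"}, "M": {"A"}}
oracle = lambda x, p, preds: p in deps.get(x, set())
print(build_network(["B", "E", "A", "J", "M"], oracle))
# {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
```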

SLIDE 14

Example

Suppose we choose the ordering M, J, A, B, E

[Figure: network built with the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake: MaryCalls → JohnCalls; MaryCalls, JohnCalls → Alarm; Alarm → Burglary; Alarm, Burglary → Earthquake]

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions.
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

SLIDE 15

Example: Car insurance

[Figure: car-insurance network with variables SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]

SLIDE 16

Compact conditional distributions

CPTs grow exponentially with the number of parents, and a CPT becomes infinite with a continuous-valued parent or child.

Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f.
E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
E.g., numerical relationships among continuous variables:

    ∂Level/∂t = inflow + precipitation − outflow − evaporation
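For instance, a deterministic Boolean node needs no CPT at all; a one-line Python sketch:

```python
# Deterministic node: the child's value is fully determined by its parents.
def north_american(canadian, us, mexican):
    # NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
    return canadian or us or mexican

print(north_american(False, True, False))  # True
```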

SLIDE 17

Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:
1) Parents U1, . . . , Uk include all causes (can add a leak node)
2) Independent failure probability qi for each cause alone

    ⟹ P(X | U1 . . . Uj, ¬Uj+1 . . . ¬Uk) = 1 − ∏_{i=1}^{j} qi

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
F     F    F       | 0.0      | 1.0
F     F    T       | 0.9      | 0.1
F     T    F       | 0.8      | 0.2
F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
T     F    F       | 0.4      | 0.6
T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
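A small Python sketch of the noisy-OR rule, reproducing the fever table above (the q values are the ones implied by the table):

```python
# q values: probability that the cause alone *fails* to produce fever.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(active_causes):
    """P(Fever | exactly these causes true) = 1 - product of their q's."""
    p_no_fever = 1.0
    for cause in active_causes:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

print(p_fever([]))                          # 0.0
print(p_fever(["Cold", "Flu", "Malaria"]))  # 0.988
```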

SLIDE 18

Hybrid (discrete+continuous) networks

Discrete variables (Subsidy? and Buys?); continuous variables (Harvest and Cost).

[Figure: hybrid network: Subsidy?, Harvest → Cost; Cost → Buys?]

Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete + continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

SLIDE 19

Continuous child variables

Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents.

Most common is the linear Gaussian model, e.g.:

    P(Cost = c | Harvest = h, Subsidy? = true)
        = N(a_t h + b_t, σ_t²)(c)
        = (1 / (σ_t √(2π))) exp( −(1/2) ((c − (a_t h + b_t)) / σ_t)² )

Mean Cost varies linearly with Harvest; the variance is fixed.
Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow.
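The density can be written directly in Python (a sketch; the parameter values in the example call are illustrative assumptions, not slide values):

```python
import math

def linear_gaussian(c, h, a_t, b_t, sigma_t):
    """Density of Cost = c given Harvest = h: N(a_t*h + b_t, sigma_t^2)."""
    mean = a_t * h + b_t
    z = (c - mean) / sigma_t
    return math.exp(-0.5 * z * z) / (sigma_t * math.sqrt(2 * math.pi))

# Illustrative parameters only (not from the slides):
print(linear_gaussian(c=5.0, h=4.0, a_t=0.5, b_t=2.0, sigma_t=1.0))
```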

SLIDE 20

Continuous child variables

[Figure: density P(Cost | Harvest, Subsidy? = true) plotted against Cost and Harvest]

An all-continuous network with linear Gaussian distributions ⟹ the full joint distribution is a multivariate Gaussian.

A discrete + continuous linear Gaussian network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

SLIDE 21

Discrete variable w/ continuous parents

Probability of Buys? given Cost should be a “soft” threshold:

[Figure: standard normal cumulative distribution (µ = 0, σ = 1); x-axis: x, y-axis: cumulative probability]

The probit distribution uses the integral of the Gaussian:

    Φ(x) = ∫_{−∞}^{x} N(0, 1)(t) dt

    P(Buys? = true | Cost = c) = Φ((−c + µ)/σ)
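In Python the probit model needs only the standard library, since Φ(x) = (1 + erf(x/√2))/2; the µ and σ below are illustrative assumptions, not slide values:

```python
import math

def phi(x):
    """Standard normal CDF Φ(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_buys_probit(c, mu=10.0, sigma=2.0):
    """P(Buys? = true | Cost = c) = Φ((−c + µ)/σ)."""
    return phi((-c + mu) / sigma)

print(p_buys_probit(8.0))   # high: cost below the soft threshold
print(p_buys_probit(12.0))  # low: cost above it
```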

SLIDE 22

Why the probit?

  • 1. It’s sort of the right shape
  • 2. Can be viewed as hard threshold whose location is subject to noise

[Figure: Buys? as a hard threshold on Cost whose location is subject to noise]

SLIDE 23

Discrete variable contd.

The sigmoid (or logit) distribution is also used in neural networks:

    P(Buys? = true | Cost = c) = 1 / (1 + exp(−2 (−c + µ) / σ))

The sigmoid has a similar shape to the probit but much longer tails:
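The corresponding Python sketch, with the same illustrative µ and σ as in the probit example above:

```python
import math

def p_buys_logit(c, mu=10.0, sigma=2.0):
    """P(Buys? = true | Cost = c) = 1 / (1 + exp(−2(−c + µ)/σ))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

print(p_buys_logit(8.0), p_buys_logit(12.0))
```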

[Figure: logistic cumulative distribution (location = 0, scale = 1); x-axis: x, y-axis: cumulative probability]

SLIDE 24

Summary

  • Bayes nets provide a natural representation for (causally induced) conditional independence
  • Topology + CPTs = compact representation of the joint distribution
  • Generally easy for (non)experts to construct
  • Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
  • Continuous variables ⟹ parameterized distributions (e.g., linear Gaussian)

SLIDE 25

Bayes’ Rule and conditional independence

Pr(Cavity | toothache ∧ catch)
    = α Pr(toothache ∧ catch | Cavity) Pr(Cavity)
    = α Pr(toothache | Cavity) Pr(catch | Cavity) Pr(Cavity)

This is an example of a naive Bayes model:

    Pr(Cause, Effect1, . . . , Effectn) = Pr(Cause) ∏_i Pr(Effecti | Cause)

[Figure: Cavity with children Toothache and Catch; generically, Cause with children Effect1, . . . , Effectn]

Total number of parameters is linear in n
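A minimal Python sketch of the naive Bayes computation with the normalization α made explicit; the numbers in the example call are illustrative, not slide values:

```python
def naive_bayes_posterior(p_cause, p_effects_given):
    """P(Cause = true | all effects observed true), via α-normalization.

    p_effects_given maps True/False (the value of Cause) to the list
    of P(effect_i = true | Cause = value).
    """
    scores = {}
    for value in (True, False):
        s = p_cause if value else 1.0 - p_cause
        for p_e in p_effects_given[value]:
            s *= p_e
        scores[value] = s
    alpha = 1.0 / (scores[True] + scores[False])  # the normalization constant α
    return alpha * scores[True]

# Illustrative numbers only: Cavity prior 0.2, effects toothache and catch
print(naive_bayes_posterior(0.2, {True: [0.6, 0.9], False: [0.1, 0.2]}))  # ≈ 0.871
```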

SLIDE 26

Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

[Figure: node X with parents U1, . . . , Um, children Y1, . . . , Yn, and the children’s other parents Z1j, . . . , Znj]

Theorem: Local semantics ⇔ global semantics

SLIDE 27

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents

[Figure: the Markov blanket of X: its parents U1, . . . , Um, its children Y1, . . . , Yn, and the children’s other parents Z1j, . . . , Znj]
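The blanket can be read off the graph structure; a Python sketch using the burglary network (the `parents` dictionary encoding is an assumption made here):

```python
def markov_blanket(x, parents):
    """Parents + children + children's parents of node x."""
    children = [n for n, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for child in children:          # add the children's other parents
        blanket |= set(parents[child])
    blanket.discard(x)
    return blanket

# Burglary net: the blanket of Alarm is every other node
net = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(markov_blanket("A", net))  # {'B', 'E', 'J', 'M'} (order may vary)
```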
