

  1. 15-780: Probabilistic Graphical Models. J. Zico Kolter, February 22-24, 2016

  2. Outline: Introduction; Probability background; Probabilistic graphical models; Probabilistic inference; MAP Inference

  3. Outline (section marker: Introduction): Introduction; Probability background; Probabilistic graphical models; Probabilistic inference; MAP Inference

  4. Probabilistic reasoning. Thus far, most of the problems we have encountered in the course have been deterministic (e.g., assigning an exact set of values to variables, searching where we can deterministically transition between states, optimizing deterministic costs, etc.). Many tasks in the real world involve reasoning about and making decisions using probabilities. This was a fundamental shift in AI work that occurred during the 80s/90s.

  5. Example: topic modeling. For documents in a large collection of text, model p(Word | Topic) and p(Topic). [Figure from (Blei, 2011): top words for four topics learned automatically from reading 17,000 Science articles, labeled "Genetics" (human, genome, dna, genetic, ...), "Evolution" (evolution, evolutionary, species, organisms, ...), "Disease" (disease, host, bacteria, diseases, ...), and "Computers" (computer, models, information, data, ...), alongside a histogram of topic probabilities.]

  6. Example: image segmentation. Figure from (Nowozin and Lampert, 2012) shows an image segmentation problem, with the original image on the left, where the goal is to separate foreground from background. The middle figure shows a segmentation where each pixel is individually classified as belonging to foreground or background. The right figure shows a segmentation inferred from a probability model over all pixels jointly (encoding the probability that neighboring pixels tend to belong to the same group).

  7. Example: modeling protein networks. In cellular modeling, can we automatically determine how the presence or absence of some proteins affects other proteins? Figure from (Sachs et al., 2005) shows an automatically inferred protein probability network, which captured most of the known interactions using data-driven methods (with far less manual effort than previous approaches).

  8. Probabilistic graphical models. A common theme in the past several examples is that each relied on a probabilistic model defined over hundreds, thousands, or potentially millions of different quantities. "Traditional" joint probability models would not be able to tractably represent and reason over such distributions. A key advance in AI has been the development of probabilistic models that exploit notions of independence to compactly model and answer probability queries about such distributions.

  9. Outline (section marker: Probability background): Introduction; Probability background; Probabilistic graphical models; Probabilistic inference; MAP Inference

  10. Random variables. A random variable (informally) is a variable whose value is not initially known. Instead, the variable can take on different values (and it must take on exactly one of these values), each with an associated probability:
  Weather ∈ {sunny, rainy, cloudy, snowy}
  p(Weather = sunny) = 0.3
  p(Weather = rainy) = 0.2
  ...
  In this course we'll deal almost exclusively with discrete random variables (taking on values from some finite set).

  11. Notation for random variables. We'll use upper case letters X_i to denote random variables. Important: for a random variable X_i taking values {0, 1, 2},
  p(X_i) = (0.1, 0.4, 0.5)
  represents a tuple of the probabilities for each value that X_i can take. Conversely, p(x_i) (for x_i a specific value in {0, 1, 2}), or sometimes p(X_i = x_i), refers to a number (the corresponding entry in the p(X_i) vector).

  12. Given two random variables X_1 with values {0, 1, 2} and X_2 with values {0, 1}:
  - p(X_1, X_2) refers to the entire joint distribution, i.e., it is a tuple with 6 elements (one for each setting of the variables)
  - p(x_1, x_2) is a number indicating the probability that X_1 = x_1 and X_2 = x_2
  - p(X_1, x_2) is a tuple with 3 elements, the probabilities for all values of X_1 and the specific value x_2 (note: this is not a probability distribution, it will not sum to one)
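To make the notation concrete, here is a minimal Python sketch; the six numbers in the joint table are made up for illustration (only the shapes, a 3×2 joint over these value sets, come from the slide):

```python
# Joint distribution p(X1, X2) for X1 in {0, 1, 2}, X2 in {0, 1},
# stored as a dict keyed by (x1, x2); the six probabilities sum to one.
# (The particular numbers are illustrative, not from the slides.)
p = {
    (0, 0): 0.10, (0, 1): 0.05,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.20, (2, 1): 0.25,
}

# p(x1, x2): a single number, e.g. p(X1 = 1, X2 = 0)
print(p[(1, 0)])                                 # 0.25

# p(X1, x2): a 3-element tuple for the fixed value x2 = 0;
# note that it does not sum to one (it is not a distribution).
p_X1_x2 = tuple(p[(x1, 0)] for x1 in range(3))
print(p_X1_x2, round(sum(p_X1_x2), 2))           # (0.1, 0.25, 0.2) 0.55
```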

  13. Basic rules of probability.
  Marginalization: for any random variables X_1, X_2,
  p(X_1) = ∑_{x_2} p(X_1, x_2) = ∑_{x_2} p(X_1 | x_2) p(x_2)
  Conditional probability: the conditional probability p(X_1 | X_2) is defined as
  p(X_1 | X_2) = p(X_1, X_2) / p(X_2)
  Chain rule: for any X_1, ..., X_n,
  p(X_1, ..., X_n) = ∏_{i=1}^n p(X_i | X_1, ..., X_{i−1})
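A short sketch of these three rules on the same kind of 3×2 joint table (again with made-up illustrative numbers):

```python
# Illustrative joint p(X1, X2) with X1 in {0, 1, 2}, X2 in {0, 1}
# (same made-up numbers as in the previous sketch).
p = {
    (0, 0): 0.10, (0, 1): 0.05,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.20, (2, 1): 0.25,
}

# Marginalization: p(x1) = sum over x2 of p(x1, x2), and likewise for x2.
p_X1 = {x1: sum(p[(x1, x2)] for x2 in range(2)) for x1 in range(3)}
p_X2 = {x2: sum(p[(x1, x2)] for x1 in range(3)) for x2 in range(2)}

# Conditional probability: p(x1 | x2) = p(x1, x2) / p(x2).
p_X1_given_X2 = {(x1, x2): p[(x1, x2)] / p_X2[x2]
                 for x1 in range(3) for x2 in range(2)}

# Chain rule (two variables): p(x1, x2) = p(x2) * p(x1 | x2).
assert all(abs(p[(x1, x2)] - p_X2[x2] * p_X1_given_X2[(x1, x2)]) < 1e-12
           for x1 in range(3) for x2 in range(2))

print({x1: round(v, 2) for x1, v in p_X1.items()})   # {0: 0.15, 1: 0.4, 2: 0.45}
```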

  14. Bayes's rule: using the definition of conditional probability,
  p(X_1, X_2) = p(X_1 | X_2) p(X_2) = p(X_2 | X_1) p(X_1)
  ⇒ p(X_1 | X_2) = p(X_2 | X_1) p(X_1) / p(X_2) = p(X_2 | X_1) p(X_1) / ∑_{x_1} p(X_2 | x_1) p(x_1)
  An example: I want to know if I have come down with a rare strain of flu (occurring in only 1/10,000 people). There is an accurate test for the flu (if I have the flu, it will tell me I have it 99% of the time, and if I do not have it, it will tell me I do not have it 99% of the time). I go to the doctor and test positive. What is the probability I have this flu?
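The flu question can be answered by plugging the slide's numbers into the formula above; a minimal sketch:

```python
# Numbers exactly as stated on the slide.
p_flu = 1 / 10_000           # p(Flu = 1): prior probability of the flu
p_pos_given_flu = 0.99       # p(Test = 1 | Flu = 1)
p_pos_given_no_flu = 0.01    # p(Test = 1 | Flu = 0)

# Denominator via marginalization: p(Test = 1).
p_pos = p_pos_given_flu * p_flu + p_pos_given_no_flu * (1 - p_flu)

# Bayes's rule: p(Flu = 1 | Test = 1).
p_flu_given_pos = p_pos_given_flu * p_flu / p_pos
print(round(p_flu_given_pos, 4))   # 0.0098
```

Even with a 99% accurate test, the posterior probability of having the flu is under 1%, because the prior probability of this strain is so small.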

  15. Independence. Two random variables X_1 and X_2 are said to be (marginally, or unconditionally) independent, written X_1 ⊥ X_2, if the joint distribution is given by the product of the marginal distributions:
  p(X_1, X_2) = p(X_1) p(X_2) ⇔ p(X_1 | X_2) = p(X_1)
  Two random variables X_1, X_2 are conditionally independent given X_3, written X_1 ⊥ X_2 | X_3, if
  p(X_1, X_2 | X_3) = p(X_1 | X_3) p(X_2 | X_3) ⇔ p(X_1 | X_2, X_3) = p(X_1 | X_3)
  Marginal independence does not imply conditional independence, or vice versa.
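A minimal sketch of the product test for marginal independence; both joint tables below are made-up illustrations, one constructed to factor into its marginals and one not:

```python
def independent(p, vals1, vals2, tol=1e-12):
    """Check p(x1, x2) == p(x1) * p(x2) for every pair of values."""
    p1 = {a: sum(p[(a, b)] for b in vals2) for a in vals1}
    p2 = {b: sum(p[(a, b)] for a in vals1) for b in vals2}
    return all(abs(p[(a, b)] - p1[a] * p2[b]) < tol
               for a in vals1 for b in vals2)

# A joint built as a product of marginals (independent by construction) ...
p_indep = {(a, b): pa * pb
           for a, pa in enumerate([0.2, 0.3, 0.5])
           for b, pb in enumerate([0.6, 0.4])}
# ... and one that does not factor.
p_dep = {(0, 0): 0.10, (0, 1): 0.05, (1, 0): 0.25,
         (1, 1): 0.15, (2, 0): 0.20, (2, 1): 0.25}

print(independent(p_indep, range(3), range(2)))  # True
print(independent(p_dep, range(3), range(2)))    # False
```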

  16. Outline (section marker: Probabilistic graphical models): Introduction; Probability background; Probabilistic graphical models; Probabilistic inference; MAP Inference

  17. High dimensional distributions. Probabilistic graphical models (PGMs) are about representing probability distributions over random variables
  p(X) ≡ p(X_1, ..., X_n)
  where, for the remainder of this lecture, x_i ∈ {0, 1}. Naively, since there are 2^n possible assignments to X_1, ..., X_n, we can represent this distribution completely using 2^n − 1 numbers, but this quickly becomes intractable for large n. PGMs are methods to represent these distributions more compactly, by exploiting conditional independence.

  18. Bayesian networks. A Bayesian network is defined by:
  1. A directed acyclic graph (DAG), G = (V = {X_1, ..., X_n}, E)
  2. A set of conditional probability tables, p(X_i | Parents(X_i))
  It defines the joint probability distribution
  p(X) = ∏_{i=1}^n p(X_i | Parents(X_i))
  Equivalently, each node is conditionally independent of all non-descendants given its parents.

  19. Bayes net example. [Figure: a DAG with nodes Burglary? (X_1) and Earthquake? (X_2) as the parents of Alarm? (X_3), which in turn is the parent of MaryCalls? and JohnCalls? (X_4 and X_5).]

  20. Bayes net example (with conditional probability tables). [Figure: the same DAG, now annotated with CPTs:]
  p(X_1 = 1) = 0.001 (Burglary), p(X_2 = 1) = 0.002 (Earthquake)
  p(X_3 = 1 | X_1, X_2): (0, 0) → 0.001, (0, 1) → 0.29, (1, 0) → 0.94, (1, 1) → 0.95 (Alarm)
  p(X_4 = 1 | X_3): 0 → 0.05, 1 → 0.9
  p(X_5 = 1 | X_3): 0 → 0.01, 1 → 0.7 (the two calling nodes)

  21. Bayes net example (continued). Same network and CPTs as above. Can write the distribution as
  p(X) = p(X_1) p(X_2 | X_1) p(X_3 | X_{1:2}) p(X_4 | X_{1:3}) p(X_5 | X_{1:4})
       = p(X_1) p(X_2) p(X_3 | X_1, X_2) p(X_4 | X_3) p(X_5 | X_3)
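A minimal sketch of how this factorization turns the five small CPTs above into the full joint distribution; the variable names X1-X5 follow the slides (X4 and X5 are the two phone-call nodes), and the helper functions are just for illustration:

```python
from itertools import product

# CPTs from the slide, each entry giving p(Xi = 1 | parents).
p_x1 = 0.001                                   # p(Burglary = 1)
p_x2 = 0.002                                   # p(Earthquake = 1)
p_x3 = {(0, 0): 0.001, (0, 1): 0.29,           # p(Alarm = 1 | X1, X2)
        (1, 0): 0.94,  (1, 1): 0.95}
p_x4 = {0: 0.05, 1: 0.9}                       # p(X4 = 1 | X3)
p_x5 = {0: 0.01, 1: 0.7}                       # p(X5 = 1 | X3)

def bernoulli(p_one, x):
    """Probability of x in {0, 1} given p(X = 1) = p_one."""
    return p_one if x == 1 else 1 - p_one

def joint(x1, x2, x3, x4, x5):
    """p(X) = p(X1) p(X2) p(X3 | X1, X2) p(X4 | X3) p(X5 | X3)."""
    return (bernoulli(p_x1, x1) * bernoulli(p_x2, x2) *
            bernoulli(p_x3[(x1, x2)], x3) *
            bernoulli(p_x4[x3], x4) * bernoulli(p_x5[x3], x5))

# e.g. a burglary with no earthquake, the alarm going off, and both calls:
print(joint(1, 0, 1, 1, 1))      # 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 0.00059

# Sanity check: the 2^5 joint entries sum to one (up to rounding).
print(sum(joint(*xs) for xs in product([0, 1], repeat=5)))   # ~1.0
```

Note that the five tables contain only 1 + 1 + 4 + 2 + 2 = 10 numbers, versus the 2^5 − 1 = 31 needed for an unstructured joint distribution over five binary variables.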

  22. Independence in Bayes nets. [Figure: the same burglary network.] Which of the following hold?
  - X_4 ⊥ X_5?
  - X_4 ⊥ X_5 | X_3?
  - X_1 ⊥ X_2?
  - X_1 ⊥ X_2 | X_3?
  - X_1 ⊥ X_2 | X_5?

  23. Conditional independence in Bayesian networks is characterized by a test called d-separation. Two variables X_i and X_j are conditionally independent given a set of variables X_I if and only if, for all trails connecting X_i and X_j in the graph, at least one of the following holds:
  1. The trail contains a sequence of nodes X_u → X_v → X_w and X_v ∈ X_I
  2. The trail contains a sequence of nodes X_u ← X_v → X_w and X_v ∈ X_I
  3. The trail contains a sequence of nodes X_u → X_v ← X_w and neither X_v nor its descendants are in X_I
  For computing d-separation: (R. Shachter, "Bayes-Ball: The Rational Pastime," 1998)
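Bayes-Ball is one linear-time procedure; an equivalent and easy-to-code characterization (not the Bayes-Ball algorithm itself) tests whether the two variables are separated in the moralized ancestral graph. A minimal sketch, applied to the burglary network to answer the independence questions listed above:

```python
from collections import deque
from itertools import combinations

def d_separated(parents, x, y, given):
    """True iff x and y are d-separated by the set `given` in the DAG
    described by `parents` (node -> list of its parents)."""
    given = set(given)
    # 1. Keep only the ancestors of {x, y} and the conditioning set.
    relevant, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, []))
    # 2. Moralize: connect each node to its parents, and "marry"
    #    (connect) every pair of parents of a common child.
    adj = {n: set() for n in relevant}
    for child in relevant:
        ps = [p for p in parents.get(child, []) if p in relevant]
        for p in ps:
            adj[child].add(p); adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Remove the conditioning nodes; x and y are d-separated iff they
    #    are disconnected in the resulting undirected graph.
    frontier, seen = deque([x]), {x}
    while frontier:
        node = frontier.popleft()
        if node == y:
            return False
        for nb in adj[node]:
            if nb not in seen and nb not in given:
                seen.add(nb); frontier.append(nb)
    return True

# Burglary network from the earlier slides.
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"],
           "X4": ["X3"], "X5": ["X3"]}

print(d_separated(parents, "X4", "X5", []))       # False
print(d_separated(parents, "X4", "X5", ["X3"]))   # True
print(d_separated(parents, "X1", "X2", []))       # True
print(d_separated(parents, "X1", "X2", ["X3"]))   # False (v-structure)
print(d_separated(parents, "X1", "X2", ["X5"]))   # False (descendant of X3)
```

Reading off the answers: X_4 and X_5 are dependent marginally but independent given X_3, while X_1 and X_2 are independent marginally but become dependent once X_3 (or its descendant X_5) is observed.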
