The Elimination Algorithm

Probabilistic Graphical Models (10-708), Lecture 4, Sep 26, 2007. Eric Xing, School of Computer Science.


1. Title slide: The Elimination Algorithm. Probabilistic Graphical Models (10-708), Lecture 4, Sep 26, 2007. Eric Xing, School of Computer Science.
   [Figure: the running example network, a cell-signaling pathway: Receptor A ($X_1$), Receptor B ($X_2$), Kinase C ($X_3$), Kinase D ($X_4$), Kinase E ($X_5$), TF F ($X_6$), Gene G ($X_7$), Gene H ($X_8$).]
   Reading: J-Chap. 3; KF-Chap. 8, 9.
   (Slide 2: Questions?)

2. Probabilistic Inference
   - We now have compact representations of probability distributions: graphical models.
   - A GM $G$ describes a unique probability distribution $P$. How do we answer queries about $P$?
   - We use "inference" as the name for the process of computing answers to such queries.

   Query 1: Likelihood
   - Most of the queries one may ask involve evidence. Evidence $e$ is an assignment of values to a set $E$ of variables in the domain. Without loss of generality, $E = \{X_{k+1}, \ldots, X_n\}$.
   - The simplest query is to compute the probability of the evidence:
     $$P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, \ldots, x_k, e)$$
     This is often referred to as computing the likelihood of $e$.
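To make the likelihood query concrete, here is a minimal brute-force sketch in Python (not from the slides; the toy joint and the variable indexing are my own). The joint is stored as a dictionary from full assignments to probabilities, and $P(e)$ is the sum over all entries consistent with the evidence:

```python
from itertools import product

# A hypothetical toy joint P(X1, X2, X3) over three binary variables,
# stored as {(x1, x2, x3): probability}. Here: the uniform distribution.
joint = {xs: 1.0 / 8 for xs in product([0, 1], repeat=3)}

def likelihood(joint, evidence):
    """P(e): sum the joint over all assignments consistent with e.

    `evidence` maps a variable's index to its observed value,
    e.g. {2: 1} stands for the evidence X3 = 1.
    """
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in evidence.items()))

print(likelihood(joint, {2: 1}))  # P(X3 = 1) = 0.5 under the uniform joint
```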

3. Query 2: Conditional Probability
   - Often we are interested in the conditional probability distribution of a variable given the evidence:
     $$P(X \mid e) = \frac{P(X, e)}{P(e)} = \frac{P(X, e)}{\sum_x P(X = x, e)}$$
     This is the a posteriori belief in $X$, given evidence $e$.
   - We usually query a subset $Y$ of all domain variables, $X = \{Y, Z\}$, and "don't care" about the remaining variables $Z$:
     $$P(Y \mid e) = \sum_z P(Y, Z = z \mid e)$$
     The process of summing out the "don't care" variables $z$ is called marginalization, and the resulting $P(y \mid e)$ is called a marginal probability.

   Applications of a posteriori belief
   - Prediction: what is the probability of an outcome given the starting condition? (In a chain $A \to B \to C$, the query node is a descendant of the evidence.)
   - Diagnosis: what is the probability of a disease or fault given the symptoms? (In a chain $A \to B \to C$, the query node is an ancestor of the evidence.)
   - Learning under partial observation: fill in the unobserved values under an "EM" setting (more on this later).
   - The directionality of information flow between variables is not restricted by the directionality of the edges in a GM: probabilistic inference can combine evidence from all parts of the network.
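Continuing the brute-force sketch above (same hypothetical joint layout), the a posteriori belief is the restricted sum divided by the likelihood; the "don't care" variables are marginalized out implicitly by summing over every consistent assignment:

```python
def posterior(joint, query_var, evidence):
    """P(X_query | e), returned as {value: probability} (brute force)."""
    scores = {}
    for xs, p in joint.items():
        # Keep only assignments consistent with the evidence e ...
        if all(xs[i] == v for i, v in evidence.items()):
            # ... and accumulate P(X = x, e), summing out everything
            # that is neither queried nor observed (marginalization).
            scores[xs[query_var]] = scores.get(xs[query_var], 0.0) + p
    z = sum(scores.values())  # P(e): the likelihood from the previous sketch
    return {x: s / z for x, s in scores.items()}

print(posterior(joint, query_var=0, evidence={2: 1}))  # P(X1 | X3 = 1)
```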

4. Query 3: Most Probable Assignment
   - In this query we want to find the most probable joint assignment (MPA) for some variables of interest.
   - Such reasoning is usually performed under some given evidence $e$, and ignoring (the values of) other variables $z$:
     $$\mathrm{MPA}(Y \mid e) = \arg\max_{y \in \mathcal{Y}} P(y \mid e) = \arg\max_{y \in \mathcal{Y}} \sum_z P(y, z \mid e)$$
     This is the maximum a posteriori configuration of $y$.

   Applications of MPA
   - Classification: find the most likely label, given the evidence.
   - Explanation: what is the most likely scenario, given the evidence?
   - Cautionary note: the MPA of a variable depends on its "context", the set of variables being jointly queried. Example:

       x  y  P(x,y)
       0  0  0.35
       0  1  0.05
       1  0  0.30
       1  1  0.30

     MPA of $X$? MPA of $(X, Y)$? Marginalizing gives $P(X=1) = 0.6$, so the MPA of $X$ alone is $x = 1$; yet the most probable joint assignment is $(x, y) = (0, 0)$. The check is spelled out in the sketch below.
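The cautionary example can be verified in a few lines of Python over the table above:

```python
# The joint P(x, y) from the example table.
p = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# MPA of X alone: marginalize Y out first, then maximize.
px = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
print(max(px, key=px.get))   # -> 1, since P(X=1) = 0.6 > P(X=0) = 0.4

# MPA of (X, Y): maximize over the joint table directly.
print(max(p, key=p.get))     # -> (0, 0), with probability 0.35
```

The two queries disagree on the value of $X$, which is exactly the point of the cautionary note.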

5. Complexity of Inference
   - Theorem: computing $P(X = x \mid e)$ in a GM is NP-hard.
   - Hardness does not mean we cannot solve inference. It implies that we cannot find a general procedure that works efficiently for arbitrary GMs. For particular families of GMs, we can have provably efficient procedures.

   Approaches to inference
   - Exact inference algorithms: the elimination algorithm; message-passing algorithms (sum-product, belief propagation); the junction tree algorithm.
   - Approximate inference techniques: stochastic simulation / sampling methods; Markov chain Monte Carlo methods; variational algorithms.

6. Marginalization and Elimination
   - A signal transduction pathway: $A \to B \to C \to D \to E$. What is the likelihood that protein E is active?
   - Query: $P(e)$.
     $$P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e)$$
     A naïve summation needs to enumerate over an exponential number of terms.
   - By chain decomposition, we get
     $$P(e) = \sum_d \sum_c \sum_b \sum_a P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c)\, P(e \mid d)$$

   Elimination on Chains
   - Rearranging terms:
     $$P(e) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d) \sum_a P(a)\, P(b \mid a)$$

7. Elimination on Chains (continued)
   - Now we can perform the innermost summation:
     $$P(e) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d) \sum_a P(a)\, P(b \mid a) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d)\, p(b)$$
   - This summation "eliminates" one variable ($A$) from our summation argument, at a "local cost".
   - Rearranging and then summing again, we get
     $$P(e) = \sum_d \sum_c P(d \mid c)\, P(e \mid d) \sum_b P(c \mid b)\, p(b) = \sum_d \sum_c P(d \mid c)\, P(e \mid d)\, p(c)$$

8. Elimination in Chains (continued)
   - Eliminating nodes one by one all the way to the end, we get
     $$P(e) = \sum_d P(e \mid d)\, p(d)$$
   - Complexity: each step costs $O(|\mathrm{Val}(X_i)| \cdot |\mathrm{Val}(X_{i+1})|)$ operations, for $O(nk^2)$ in total. Compare this to naïve evaluation, which sums over the joint values of $n-1$ variables: $O(k^n)$.

   Undirected Chains
   - The same idea applies to an undirected chain $A - B - C - D - E$. Rearranging terms:
     $$P(e) = \frac{1}{Z} \sum_d \sum_c \sum_b \sum_a \phi(b, a)\, \phi(c, b)\, \phi(d, c)\, \phi(e, d) = \frac{1}{Z} \sum_d \sum_c \sum_b \phi(c, b)\, \phi(d, c)\, \phi(e, d) \sum_a \phi(b, a) = \cdots$$
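Here is a minimal sketch of chain elimination with NumPy (the CPD tables are made up; only the structure matters). Each elimination step is a length-$k$ vector times a $k \times k$ table, so the whole query costs $O(nk^2)$; the naïve check at the end enumerates all joint assignments instead:

```python
import numpy as np
from itertools import product

k = 3                                     # number of values per variable
rng = np.random.default_rng(0)

def random_cpd(k):
    """A made-up conditional table P(child | parent); each row sums to 1."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)

p_a = np.full(k, 1.0 / k)                 # P(a)
cpds = [random_cpd(k) for _ in range(4)]  # P(b|a), P(c|b), P(d|c), P(e|d)

# Eliminate A, B, C, D in turn: p(b) = sum_a P(a) P(b|a), and so on.
msg = p_a
for cpd in cpds:
    msg = msg @ cpd                       # one O(k^2) local step

print(msg)                                # P(e), one entry per value of E

# Naive evaluation for comparison: enumerate all k^4 assignments of (a,b,c,d).
naive = np.zeros(k)
for a, b, c, d in product(range(k), repeat=4):
    w = p_a[a] * cpds[0][a, b] * cpds[1][b, c] * cpds[2][c, d]
    naive += w * cpds[3][d]
print(np.allclose(naive, msg))            # True: same answer, exponential cost
```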

9. The Sum-Product Operation
   - In general, we can view the task at hand as that of computing the value of an expression of the form
     $$\sum_z \prod_{\phi \in \mathcal{F}} \phi$$
     where $\mathcal{F}$ is a set of factors. We call this task the sum-product inference task.

   Outcome of elimination
   - Let $X$ be some set of variables; let $\mathcal{F}$ be a set of factors such that $\mathrm{Scope}[\phi] \subseteq X$ for each $\phi \in \mathcal{F}$; let $Y \subset X$ be a set of query variables; and let $Z = X \setminus Y$ be the variables to be eliminated.
   - The result of eliminating the variables in $Z$ is a factor
     $$\tau(Y) = \sum_z \prod_{\phi \in \mathcal{F}} \phi$$
   - This factor does not necessarily correspond to any probability or conditional probability in the network. (Example forthcoming.)
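The sum-product operation is straightforward to phrase over an explicit factor representation. A minimal sketch (the Factor class and helper names are my own, not from the lecture): a factor is a table with one array axis per variable in its scope; multiplication aligns scopes by broadcasting, and summing an axis out produces the new factor $\tau(Y)$:

```python
import numpy as np

class Factor:
    """A table over named discrete variables; one array axis per variable."""
    def __init__(self, scope, table):
        self.scope = tuple(scope)
        self.table = np.asarray(table, dtype=float)

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    scope = f.scope + tuple(v for v in g.scope if v not in f.scope)
    def aligned(h):
        # Append length-1 axes for the variables h lacks, then reorder
        # the axes so that they match `scope`.
        t = h.table.reshape(h.table.shape + (1,) * (len(scope) - len(h.scope)))
        current = h.scope + tuple(v for v in scope if v not in h.scope)
        return t.transpose([current.index(v) for v in scope])
    return Factor(scope, aligned(f) * aligned(g))

def sum_out(f, var):
    """Eliminate `var`: sum the table over its axis (the tau of the slide)."""
    axis = f.scope.index(var)
    return Factor(f.scope[:axis] + f.scope[axis + 1:],
                  f.table.sum(axis=axis))
```

As the slide notes, the $\tau$ produced this way is just a nonnegative table; it need not be a probability or conditional probability of anything in the network.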

10. Dealing with Evidence
    - Conditioning can be expressed as a sum-product operation. The evidence potential:
      $$\delta(E_i, e_i) = \begin{cases} 1 & \text{if } E_i = e_i \\ 0 & \text{if } E_i \neq e_i \end{cases}$$
    - Total evidence potential:
      $$\delta(E, e) = \prod_{i \in I_E} \delta(E_i, e_i)$$
    - Introducing evidence yields restricted factors:
      $$\tau(Y, e) = \sum_{z, e} \prod_{\phi \in \mathcal{F}} \phi \times \delta(E, e)$$

    Inference on a general GM via variable elimination
    - General idea: write the query in the form
      $$P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid pa_i)$$
      This suggests an "elimination order" for the latent variables to be marginalized.
    - Iteratively: move all irrelevant terms outside of the innermost sum; perform the innermost sum, getting a new term; insert the new term into the product.
    - Wrap-up:
      $$P(X_1 \mid e) = \frac{\phi(X_1, e)}{\sum_{x_1} \phi(x_1, e)}$$
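The evidence potential drops straight into the same Factor representation (continuing the sketch above; `card`, the number of values of the evidence variable, is an assumed parameter):

```python
def evidence_potential(var, observed, card):
    """delta(E_i, e_i): a one-variable factor that is 1 at the observed
    value and 0 everywhere else."""
    table = np.zeros(card)
    table[observed] = 1.0
    return Factor((var,), table)
```

Multiplying this into the factor set zeroes every entry inconsistent with the evidence, which is exactly the restricted-factor sum on the slide.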

11. The Elimination Algorithm
    Procedure Elimination(
      G,  // the GM
      E,  // evidence
      Z,  // set of variables to be eliminated
      X   // query variable(s)
    )
    1. Initialize(G)
    2. Evidence(E)
    3. Sum-Product-Variable-Elimination(F, Z, ≺)
    4. Normalization(F)

    Procedure Initialize(G, Z)
    1. Let Z_1, ..., Z_k be an ordering of Z such that Z_i ≺ Z_j iff i < j
    2. Initialize F with the full set of factors

    Procedure Evidence(E)
    1. For each i ∈ I_E: F = F ∪ {δ(E_i, e_i)}

    Procedure Sum-Product-Variable-Elimination(F, Z, ≺)
    1. For i = 1, ..., k: F ← Sum-Product-Eliminate-Var(F, Z_i)
    2. φ* ← Π_{φ ∈ F} φ
    3. Return φ*

    (Sum-Product-Eliminate-Var and Normalization are given on the next slide; a runnable sketch of the whole procedure follows item 12.)

12. The Elimination Algorithm (continued)
    Procedure Normalization(φ*)
    1. P(X | E) = φ*(X) / Σ_x φ*(x)

    Procedure Sum-Product-Eliminate-Var(
      F,  // set of factors
      Z   // variable to be eliminated
    )
    1. F′ ← {φ ∈ F : Z ∈ Scope[φ]}
    2. F′′ ← F − F′
    3. ψ ← Π_{φ ∈ F′} φ
    4. τ ← Σ_Z ψ
    5. Return F′′ ∪ {τ}

    A more complex network
    - [Figure: a food web over eight variables A through H.]
    - What is the probability that hawks are leaving given that the grass condition is poor?
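Putting the pieces together: a sketch of the full procedure on top of the Factor helpers defined above. The structure mirrors the pseudocode (Evidence, then Sum-Product-Eliminate-Var once per variable in the ordering, then Normalization); the elimination ordering is passed in explicitly, and all concrete names and the toy chain are my own:

```python
def eliminate_var(factors, z):
    """Sum-Product-Eliminate-Var: F' holds the factors mentioning z;
    return F'' together with tau = sum_z prod(F')."""
    f_prime = [f for f in factors if z in f.scope]
    f_rest = [f for f in factors if z not in f.scope]
    psi = f_prime[0]
    for f in f_prime[1:]:
        psi = multiply(psi, f)             # psi = product of F'
    return f_rest + [sum_out(psi, z)]      # tau = sum_z psi

def elimination(factors, evidence, order, cards):
    """Elimination(G, E, Z, X): add evidence potentials, eliminate each
    variable in `order`, multiply what is left, and normalize."""
    for var, val in evidence.items():      # Evidence(E)
        factors = factors + [evidence_potential(var, val, cards[var])]
    for z in order:                        # Sum-Product-Variable-Elimination
        factors = eliminate_var(factors, z)
    phi = factors[0]
    for f in factors[1:]:                  # phi* = product of remaining factors
        phi = multiply(phi, f)
    return Factor(phi.scope, phi.table / phi.table.sum())  # Normalization

# Example: P(C | A = 0) on a made-up chain A -> B -> C.
p_a = Factor(('A',), [0.6, 0.4])
p_ba = Factor(('A', 'B'), [[0.7, 0.3], [0.2, 0.8]])   # P(B | A)
p_cb = Factor(('B', 'C'), [[0.9, 0.1], [0.4, 0.6]])   # P(C | B)
result = elimination([p_a, p_ba, p_cb], evidence={'A': 0},
                     order=['B'], cards={'A': 2})
# The A axis survives, but only its observed slice is nonzero:
print(result.table[0])   # P(C | A=0) = [0.75, 0.25]
```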
