The Elimination Algorithm

Probabilistic Graphical Models (10-708), Lecture 4, Sep 26, 2007. Eric Xing, School of Computer Science.


1. Title slide: The Elimination Algorithm. Probabilistic Graphical Models (10-708), Lecture 4, Sep 26, 2007. Eric Xing, School of Computer Science.
   [Figure: the running example network, a cell-signaling pathway: Receptor A ($X_1$), Receptor B ($X_2$), Kinase C ($X_3$), Kinase D ($X_4$), Kinase E ($X_5$), TF F ($X_6$), Gene G ($X_7$), Gene H ($X_8$).]
   Reading: J-Chap. 3; KF-Chap. 8, 9.
   (Slide 2: Questions?)

2. Probabilistic Inference
   - We now have compact representations of probability distributions: graphical models.
   - A GM $G$ describes a unique probability distribution $P$. How do we answer queries about $P$?
   - We use "inference" as the name for the process of computing answers to such queries.

   Query 1: Likelihood
   - Most of the queries one may ask involve evidence. Evidence $e$ is an assignment of values to a set $E$ of variables in the domain. Without loss of generality, $E = \{X_{k+1}, \ldots, X_n\}$.
   - The simplest query is to compute the probability of the evidence:
     $$P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, \ldots, x_k, e)$$
     This is often referred to as computing the likelihood of $e$.
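To make the likelihood query concrete, here is a minimal brute-force sketch in Python (not from the slides; the toy joint and the variable indexing are my own). The joint is stored as a dictionary from full assignments to probabilities, and $P(e)$ is the sum over all entries consistent with the evidence:

```python
from itertools import product

# A hypothetical toy joint P(X1, X2, X3) over three binary variables,
# stored as {(x1, x2, x3): probability}. Here: the uniform distribution.
joint = {xs: 1.0 / 8 for xs in product([0, 1], repeat=3)}

def likelihood(joint, evidence):
    """P(e): sum the joint over all assignments consistent with e.

    `evidence` maps a variable's index to its observed value,
    e.g. {2: 1} stands for the evidence X3 = 1.
    """
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in evidence.items()))

print(likelihood(joint, {2: 1}))  # P(X3 = 1) = 0.5 under the uniform joint
```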

3. Query 2: Conditional Probability
   - Often we are interested in the conditional probability distribution of a variable given the evidence:
     $$P(X \mid e) = \frac{P(X, e)}{P(e)} = \frac{P(X, e)}{\sum_x P(X = x, e)}$$
     This is the a posteriori belief in $X$, given evidence $e$.
   - We usually query a subset $Y$ of all domain variables, $X = \{Y, Z\}$, and "don't care" about the remaining variables $Z$:
     $$P(Y \mid e) = \sum_z P(Y, Z = z \mid e)$$
     The process of summing out the "don't care" variables $z$ is called marginalization, and the resulting $P(y \mid e)$ is called a marginal probability.

   Applications of a posteriori belief
   - Prediction: what is the probability of an outcome given the starting condition? (In a chain $A \to B \to C$, the query node is a descendant of the evidence.)
   - Diagnosis: what is the probability of a disease or fault given the symptoms? (In a chain $A \to B \to C$, the query node is an ancestor of the evidence.)
   - Learning under partial observation: fill in the unobserved values under an "EM" setting (more on this later).
   - The directionality of information flow between variables is not restricted by the directionality of the edges in a GM: probabilistic inference can combine evidence from all parts of the network.
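Continuing the brute-force sketch above (same hypothetical joint layout), the a posteriori belief is the restricted sum divided by the likelihood; the "don't care" variables are marginalized out implicitly by summing over every consistent assignment:

```python
def posterior(joint, query_var, evidence):
    """P(X_query | e), returned as {value: probability} (brute force)."""
    scores = {}
    for xs, p in joint.items():
        # Keep only assignments consistent with the evidence e ...
        if all(xs[i] == v for i, v in evidence.items()):
            # ... and accumulate P(X = x, e), summing out everything
            # that is neither queried nor observed (marginalization).
            scores[xs[query_var]] = scores.get(xs[query_var], 0.0) + p
    z = sum(scores.values())  # P(e): the likelihood from the previous sketch
    return {x: s / z for x, s in scores.items()}

print(posterior(joint, query_var=0, evidence={2: 1}))  # P(X1 | X3 = 1)
```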

4. Query 3: Most Probable Assignment
   - In this query we want to find the most probable joint assignment (MPA) for some variables of interest.
   - Such reasoning is usually performed under some given evidence $e$, and ignoring (the values of) other variables $z$:
     $$\mathrm{MPA}(Y \mid e) = \arg\max_{y \in \mathcal{Y}} P(y \mid e) = \arg\max_{y \in \mathcal{Y}} \sum_z P(y, z \mid e)$$
     This is the maximum a posteriori configuration of $y$.

   Applications of MPA
   - Classification: find the most likely label, given the evidence.
   - Explanation: what is the most likely scenario, given the evidence?
   - Cautionary note: the MPA of a variable depends on its "context", the set of variables being jointly queried. Example:

       x  y  P(x,y)
       0  0  0.35
       0  1  0.05
       1  0  0.30
       1  1  0.30

     MPA of $X$? MPA of $(X, Y)$? Marginalizing gives $P(X=1) = 0.6$, so the MPA of $X$ alone is $x = 1$; yet the most probable joint assignment is $(x, y) = (0, 0)$. The check is spelled out in the sketch below.
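The cautionary example can be verified in a few lines of Python over the table above:

```python
# The joint P(x, y) from the example table.
p = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# MPA of X alone: marginalize Y out first, then maximize.
px = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
print(max(px, key=px.get))   # -> 1, since P(X=1) = 0.6 > P(X=0) = 0.4

# MPA of (X, Y): maximize over the joint table directly.
print(max(p, key=p.get))     # -> (0, 0), with probability 0.35
```

The two queries disagree on the value of $X$, which is exactly the point of the cautionary note.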

5. Complexity of Inference
   - Theorem: computing $P(X = x \mid e)$ in a GM is NP-hard.
   - Hardness does not mean we cannot solve inference. It implies that we cannot find a general procedure that works efficiently for arbitrary GMs. For particular families of GMs, we can have provably efficient procedures.

   Approaches to inference
   - Exact inference algorithms: the elimination algorithm; message-passing algorithms (sum-product, belief propagation); the junction tree algorithm.
   - Approximate inference techniques: stochastic simulation / sampling methods; Markov chain Monte Carlo methods; variational algorithms.

6. Marginalization and Elimination
   - A signal transduction pathway: $A \to B \to C \to D \to E$. What is the likelihood that protein E is active?
   - Query: $P(e)$.
     $$P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e)$$
     A naïve summation needs to enumerate over an exponential number of terms.
   - By chain decomposition, we get
     $$P(e) = \sum_d \sum_c \sum_b \sum_a P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c)\, P(e \mid d)$$

   Elimination on Chains
   - Rearranging terms:
     $$P(e) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d) \sum_a P(a)\, P(b \mid a)$$

7. Elimination on Chains (continued)
   - Now we can perform the innermost summation:
     $$P(e) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d) \sum_a P(a)\, P(b \mid a) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(e \mid d)\, p(b)$$
   - This summation "eliminates" one variable ($A$) from our summation argument, at a "local cost".
   - Rearranging and then summing again, we get
     $$P(e) = \sum_d \sum_c P(d \mid c)\, P(e \mid d) \sum_b P(c \mid b)\, p(b) = \sum_d \sum_c P(d \mid c)\, P(e \mid d)\, p(c)$$

8. Elimination in Chains (continued)
   - Eliminating nodes one by one all the way to the end, we get
     $$P(e) = \sum_d P(e \mid d)\, p(d)$$
   - Complexity: each step costs $O(|\mathrm{Val}(X_i)| \cdot |\mathrm{Val}(X_{i+1})|)$ operations, for $O(nk^2)$ in total. Compare this to naïve evaluation, which sums over the joint values of $n-1$ variables: $O(k^n)$.

   Undirected Chains
   - The same idea applies to an undirected chain $A - B - C - D - E$. Rearranging terms:
     $$P(e) = \frac{1}{Z} \sum_d \sum_c \sum_b \sum_a \phi(b, a)\, \phi(c, b)\, \phi(d, c)\, \phi(e, d) = \frac{1}{Z} \sum_d \sum_c \sum_b \phi(c, b)\, \phi(d, c)\, \phi(e, d) \sum_a \phi(b, a) = \cdots$$
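Here is a minimal sketch of chain elimination with NumPy (the CPD tables are made up; only the structure matters). Each elimination step is a length-$k$ vector times a $k \times k$ table, so the whole query costs $O(nk^2)$; the naïve check at the end enumerates all joint assignments instead:

```python
import numpy as np
from itertools import product

k = 3                                     # number of values per variable
rng = np.random.default_rng(0)

def random_cpd(k):
    """A made-up conditional table P(child | parent); each row sums to 1."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)

p_a = np.full(k, 1.0 / k)                 # P(a)
cpds = [random_cpd(k) for _ in range(4)]  # P(b|a), P(c|b), P(d|c), P(e|d)

# Eliminate A, B, C, D in turn: p(b) = sum_a P(a) P(b|a), and so on.
msg = p_a
for cpd in cpds:
    msg = msg @ cpd                       # one O(k^2) local step

print(msg)                                # P(e), one entry per value of E

# Naive evaluation for comparison: enumerate all k^4 assignments of (a,b,c,d).
naive = np.zeros(k)
for a, b, c, d in product(range(k), repeat=4):
    w = p_a[a] * cpds[0][a, b] * cpds[1][b, c] * cpds[2][c, d]
    naive += w * cpds[3][d]
print(np.allclose(naive, msg))            # True: same answer, exponential cost
```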

9. The Sum-Product Operation
   - In general, we can view the task at hand as that of computing the value of an expression of the form
     $$\sum_z \prod_{\phi \in \mathcal{F}} \phi$$
     where $\mathcal{F}$ is a set of factors. We call this task the sum-product inference task.

   Outcome of elimination
   - Let $X$ be some set of variables; let $\mathcal{F}$ be a set of factors such that $\mathrm{Scope}[\phi] \subseteq X$ for each $\phi \in \mathcal{F}$; let $Y \subset X$ be a set of query variables; and let $Z = X \setminus Y$ be the variables to be eliminated.
   - The result of eliminating the variables in $Z$ is a factor
     $$\tau(Y) = \sum_z \prod_{\phi \in \mathcal{F}} \phi$$
   - This factor does not necessarily correspond to any probability or conditional probability in the network. (Example forthcoming.)
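The sum-product operation is straightforward to phrase over an explicit factor representation. A minimal sketch (the Factor class and helper names are my own, not from the lecture): a factor is a table with one array axis per variable in its scope; multiplication aligns scopes by broadcasting, and summing an axis out produces the new factor $\tau(Y)$:

```python
import numpy as np

class Factor:
    """A table over named discrete variables; one array axis per variable."""
    def __init__(self, scope, table):
        self.scope = tuple(scope)
        self.table = np.asarray(table, dtype=float)

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    scope = f.scope + tuple(v for v in g.scope if v not in f.scope)
    def aligned(h):
        # Append length-1 axes for the variables h lacks, then reorder
        # the axes so that they match `scope`.
        t = h.table.reshape(h.table.shape + (1,) * (len(scope) - len(h.scope)))
        current = h.scope + tuple(v for v in scope if v not in h.scope)
        return t.transpose([current.index(v) for v in scope])
    return Factor(scope, aligned(f) * aligned(g))

def sum_out(f, var):
    """Eliminate `var`: sum the table over its axis (the tau of the slide)."""
    axis = f.scope.index(var)
    return Factor(f.scope[:axis] + f.scope[axis + 1:],
                  f.table.sum(axis=axis))
```

As the slide notes, the $\tau$ produced this way is just a nonnegative table; it need not be a probability or conditional probability of anything in the network.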

10. Dealing with Evidence
    - Conditioning can be expressed as a sum-product operation. The evidence potential:
      $$\delta(E_i, e_i) = \begin{cases} 1 & \text{if } E_i = e_i \\ 0 & \text{if } E_i \neq e_i \end{cases}$$
    - Total evidence potential:
      $$\delta(E, e) = \prod_{i \in I_E} \delta(E_i, e_i)$$
    - Introducing evidence yields restricted factors:
      $$\tau(Y, e) = \sum_{z, e} \prod_{\phi \in \mathcal{F}} \phi \times \delta(E, e)$$

    Inference on a general GM via variable elimination
    - General idea: write the query in the form
      $$P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid pa_i)$$
      This suggests an "elimination order" for the latent variables to be marginalized.
    - Iteratively: move all irrelevant terms outside of the innermost sum; perform the innermost sum, getting a new term; insert the new term into the product.
    - Wrap-up:
      $$P(X_1 \mid e) = \frac{\phi(X_1, e)}{\sum_{x_1} \phi(x_1, e)}$$
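The evidence potential drops straight into the same Factor representation (continuing the sketch above; `card`, the number of values of the evidence variable, is an assumed parameter):

```python
def evidence_potential(var, observed, card):
    """delta(E_i, e_i): a one-variable factor that is 1 at the observed
    value and 0 everywhere else."""
    table = np.zeros(card)
    table[observed] = 1.0
    return Factor((var,), table)
```

Multiplying this into the factor set zeroes every entry inconsistent with the evidence, which is exactly the restricted-factor sum on the slide.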

11. The Elimination Algorithm
    Procedure Elimination(
      G,  // the GM
      E,  // evidence
      Z,  // set of variables to be eliminated
      X   // query variable(s)
    )
    1. Initialize(G)
    2. Evidence(E)
    3. Sum-Product-Variable-Elimination(F, Z, ≺)
    4. Normalization(F)

    Procedure Initialize(G, Z)
    1. Let Z_1, ..., Z_k be an ordering of Z such that Z_i ≺ Z_j iff i < j
    2. Initialize F with the full set of factors

    Procedure Evidence(E)
    1. For each i ∈ I_E: F = F ∪ {δ(E_i, e_i)}

    Procedure Sum-Product-Variable-Elimination(F, Z, ≺)
    1. For i = 1, ..., k: F ← Sum-Product-Eliminate-Var(F, Z_i)
    2. φ* ← Π_{φ ∈ F} φ
    3. Return φ*

    (Sum-Product-Eliminate-Var and Normalization are given on the next slide; a runnable sketch of the whole procedure follows item 12.)

12. The Elimination Algorithm (continued)
    Procedure Normalization(φ*)
    1. P(X | E) = φ*(X) / Σ_x φ*(x)

    Procedure Sum-Product-Eliminate-Var(
      F,  // set of factors
      Z   // variable to be eliminated
    )
    1. F′ ← {φ ∈ F : Z ∈ Scope[φ]}
    2. F′′ ← F − F′
    3. ψ ← Π_{φ ∈ F′} φ
    4. τ ← Σ_Z ψ
    5. Return F′′ ∪ {τ}

    A more complex network
    - [Figure: a food web over eight variables A through H.]
    - What is the probability that hawks are leaving given that the grass condition is poor?
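Putting the pieces together: a sketch of the full procedure on top of the Factor helpers defined above. The structure mirrors the pseudocode (Evidence, then Sum-Product-Eliminate-Var once per variable in the ordering, then Normalization); the elimination ordering is passed in explicitly, and all concrete names and the toy chain are my own:

```python
def eliminate_var(factors, z):
    """Sum-Product-Eliminate-Var: F' holds the factors mentioning z;
    return F'' together with tau = sum_z prod(F')."""
    f_prime = [f for f in factors if z in f.scope]
    f_rest = [f for f in factors if z not in f.scope]
    psi = f_prime[0]
    for f in f_prime[1:]:
        psi = multiply(psi, f)             # psi = product of F'
    return f_rest + [sum_out(psi, z)]      # tau = sum_z psi

def elimination(factors, evidence, order, cards):
    """Elimination(G, E, Z, X): add evidence potentials, eliminate each
    variable in `order`, multiply what is left, and normalize."""
    for var, val in evidence.items():      # Evidence(E)
        factors = factors + [evidence_potential(var, val, cards[var])]
    for z in order:                        # Sum-Product-Variable-Elimination
        factors = eliminate_var(factors, z)
    phi = factors[0]
    for f in factors[1:]:                  # phi* = product of remaining factors
        phi = multiply(phi, f)
    return Factor(phi.scope, phi.table / phi.table.sum())  # Normalization

# Example: P(C | A = 0) on a made-up chain A -> B -> C.
p_a = Factor(('A',), [0.6, 0.4])
p_ba = Factor(('A', 'B'), [[0.7, 0.3], [0.2, 0.8]])   # P(B | A)
p_cb = Factor(('B', 'C'), [[0.9, 0.1], [0.4, 0.6]])   # P(C | B)
result = elimination([p_a, p_ba, p_cb], evidence={'A': 0},
                     order=['B'], cards={'A': 2})
# The A axis survives, but only its observed slice is nonzero:
print(result.table[0])   # P(C | A=0) = [0.75, 0.25]
```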
