Probabilistic Graphical Models - Guest Lecture by Narges Razavian (Machine Learning Class, April 14, 2017)


SLIDE 1

Probabilistic Graphical Models

Guest Lecture by Narges Razavian Machine Learning Class April 14 2017

SLIDE 2

Today

  • What is a probabilistic graphical model, and why is it useful?
  • Bayesian Networks
  • Basic inference
  • Generative models
  • Fancy inference (when some variables are unobserved)
  • How to learn model parameters from data
  • Undirected graphical models
  • Inference (belief propagation)
  • New directions in PGM research & wrapping up

SLIDE 3

“What I cannot create, I do not understand.”

  • Richard Feynman
SLIDE 4

Generative models vs Discriminative models

Discriminative models learn P(Y|X). It’s easier, requires less data, but is only useful for one particular task: Given X, what is P(Y|X)?

[Example: Logistic Regression, Feed-Forward or Convolutional Neural Networks, etc.]

Generative models instead learn the full joint P(Y, X). Once they have it, they can compute everything:

P(X) = ∫y P(X, Y) dy
P(Y) = ∫x P(X, Y) dx
P(Y|X) = P(Y, X) / ∫y P(Y, X) dy

[Caveat: No Free Lunch!! You want to answer every question under the sun? You need more data!]
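To make this concrete, here is a minimal sketch in numpy, with a made-up 2x2 joint table standing in for a learned P(Y, X); the marginals and the conditional all fall out of sums over the joint:

```python
import numpy as np

# Hypothetical joint distribution P(Y, X) over binary Y (rows) and X (columns).
joint = np.array([[0.3, 0.1],   # P(Y=0, X=0), P(Y=0, X=1)
                  [0.2, 0.4]])  # P(Y=1, X=0), P(Y=1, X=1)

p_x = joint.sum(axis=0)          # P(X): marginalize out Y
p_y = joint.sum(axis=1)          # P(Y): marginalize out X
p_y_given_x = joint / p_x        # P(Y|X): normalize each column of the joint

print(p_x)           # [0.5 0.5]
print(p_y)           # [0.4 0.6]
print(p_y_given_x)   # column j holds P(Y | X=j)
```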

SLIDE 5

Probabilistic Graphical Models: the main "classic" approach to modeling P(Y, X) = P(Y1, …, YM, X1, …, XD)

SLIDE 6

Some Calculations on Space

Imagine each variable is binary: P(Y1, …, YM, X1, …, XD)

SLIDE 7

Some Calculations on Space

Imagine each variable is binary: P(Y1, …, YM, X1, …, XD)

How many parameters do we need to estimate from data to specify P(Y,X)??

SLIDE 8

Some Calculations on Space

Imagine each variable is binary: P(Y1, …, YM, X1, …, XD)

How many parameters do we need to estimate from data to specify P(Y,X)??

2^(M+D) − 1  (e.g., with just M + D = 20 binary variables, that is already 2^20 − 1 ≈ one million parameters)

SLIDE 9

Too many parameters!

What can be done?

1) Look for conditional independences
2) Use the chain rule for probabilities to break P(Y,X) into smaller pieces
3) Rewrite P(Y,X) as a product of smaller factors
   a) Maybe you have more data for a subset of variables
4) Simplify some of the modeling assumptions to cut parameters
   a) E.g., assume the data is multivariate Gaussian
   b) E.g., assume conditional independencies even if they don't really always apply

SLIDE 10

Bayesian Networks

Use the chain rule for probabilities:

P(X1, …, XD) = P(X1) P(X2 | X1) P(X3 | X1, X2) … P(XD | X1, …, XD−1)

  • This is always true (no approximations or assumptions), so there is no reduction in the number of parameters either
  • BNs add a conditional independence assumption:

○ For some of the variables, P(Xi | X1, …, Xi−1) is approximated by P(Xi | subset of (X1, …, Xi−1)) ■ This "subset of (X1, …, Xi−1)" is referred to as Parents(Xi) ■ Reduces the parameters (in the binary case, for instance) from 2^(i−1) to 2^|Parents(Xi)|

SLIDE 11

Bayesian Networks

Number of parameters in the binary case:

| Variable | Assumption | Raw chain rule | # params | BN chain rule | # params |
| X1 (Difficulty) | (none) | P(X1) | 1 | P(X1) | 1 |
| X2 (Intelligence) | P(X2|X1) = P(X2) | P(X2|X1) | 2 | P(X2) | 1 |
| X3 (Grade) | (none) | P(X3|X1,X2) | 4 | P(X3|X1,X2) | 4 |
| X4 (SAT score) | P(X4|X1,X2,X3) = P(X4|X2) | P(X4|X1,X2,X3) | 8 | P(X4|X2) | 2 |
| X5 (Letter) | P(X5|X1,X2,X3,X4) = P(X5|X3) | P(X5|X1,X2,X3,X4) | 16 | P(X5|X3) | 2 |
| Total: P(X1,X2,X3,X4,X5) | | | 1+2+4+8+16 = 31 | | 1+1+4+2+2 = 10 |

X1: Difficulty X2: Intelligence X3: Grade X4: SAT X5: Letter
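To see the 10-parameter factorization in action, here is a sketch that assembles the full joint from the five CPTs (all probability values below are hypothetical):

```python
import numpy as np

# Hypothetical CPTs; the last axis of each table is the child variable.
p_x1 = np.array([0.6, 0.4])                      # P(X1): difficulty
p_x2 = np.array([0.7, 0.3])                      # P(X2): intelligence
p_x3 = np.array([[[0.3, 0.7], [0.1, 0.9]],
                 [[0.7, 0.3], [0.4, 0.6]]])      # P(X3 | X1, X2): grade
p_x4 = np.array([[0.95, 0.05], [0.2, 0.8]])      # P(X4 | X2): SAT
p_x5 = np.array([[0.9, 0.1], [0.3, 0.7]])        # P(X5 | X3): letter

def joint(x1, x2, x3, x4, x5):
    """The BN chain rule: 1 + 1 + 4 + 2 + 2 = 10 free parameters in total."""
    return (p_x1[x1] * p_x2[x2] * p_x3[x1, x2, x3]
            * p_x4[x2, x4] * p_x5[x3, x5])

# Sanity check: the 2^5 = 32 joint entries sum to 1.
print(sum(joint(*v) for v in np.ndindex(2, 2, 2, 2, 2)))  # 1.0
```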

SLIDE 12

An Example of a BN for SNPs

SLIDE 13

Benefits of Bayesian Networks

1) Once estimated, they can answer any conditional or marginal query!
   a) This is called inference
2) Fewer parameters to estimate!
3) We can start putting prior information into the network
4) We can incorporate LATENT (hidden/unobserved) variables based on how we/domain experts think variables might be related
5) Generating samples from the distribution becomes super easy.

SLIDE 14

Inference in Bayesian Networks

X1: Difficulty X2: Intelligence X3: Grade X4: SAT X5: Letter


Query types:
1) Conditional probabilities: P(Y | X) = ?   P(Xi = a | X\i = B, Y = C) = ?
2) Maximum a posteriori (MAP) estimates: argmax_xi P(Xi | X\i) = ?   argmax_yi P(Yi | X) = ?

SLIDE 15

Key operation: marginalization P(X) = Σ_y P(X, Y)

P(X5 | X2=a) = ?
P(X5 | X2=a) = P(X5, X2=a) / P(X2=a)
P(X5, X2=a) = Σ_{X1,X3,X4} P(X1, X2=a, X3, X4, X5)
P(X2=a) = Σ_{X1,X3,X4,X5} P(X1, X2=a, X3, X4, X5)

X1: Difficulty X2: Intelligence X3: Grade X4: SAT X5: Letter

SLIDES 16-22

Marginalize from the first parents (root) to the variable...


This method is called sum-product or variable elimination

SLIDE 23

Marginalization when P(X) = Σ_y P(X, Y)

P(X5 | X2=a) = ?
P(X5 | X2=a) = P(X5, X2=a) / P(X2=a)

X1: Difficulty X2: Intelligence X3: Grade X4: SAT X5: Letter

SLIDE 24

Marginalization when P(X) = Σ_y P(X, Y)

P(X5 | X2=a) = ?
P(X5 | X2=a) = P(X5, X2=a) / P(X2=a)

X1: Difficulty X2: Intelligence X3: Grade X4: SAT X5: Letter


P(X5, X2=a) = Σ_{X1,X3,X4} P(X1, X2=a, X3, X4, X5)
 = Σ_{X1,X3,X4} P(X1) P(X2=a) P(X3 | X1, X2=a) P(X4 | X2=a) P(X5 | X3)
 = P(X2=a) Σ_{X1,X3,X4} P(X1) P(X3 | X1, X2=a) P(X4 | X2=a) P(X5 | X3)
 = P(X2=a) Σ_{X1,X3} P(X1) P(X3 | X1, X2=a) P(X5 | X3) Σ_{X4} P(X4 | X2=a)
 = P(X2=a) Σ_{X1,X3} P(X1) P(X3 | X1, X2=a) P(X5 | X3)        (the X4 sum equals 1)
 = P(X2=a) Σ_{X3} P(X5 | X3) Σ_{X1} P(X3 | X1, X2=a) P(X1)
 = P(X2=a) Σ_{X3} P(X5 | X3) f_{X2=a}(X3)
 = P(X2=a) g_{X2=a}(X5)

SLIDE 25

Marginalization when P(X) = Σ_y P(X, Y)

P(X5 | X2=a) = P(X5, X2=a) / P(X2=a) = g_{X2=a}(X5)   (using the derivation from the previous slide)

X1: Difficulty X2: Intelligence X3: Grade X4: SAT X5: Letter
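The elimination transcribes almost line by line into numpy (a sketch reusing the hypothetical CPT tables from the slide 11 example; each einsum performs one Σ):

```python
import numpy as np

p_x1 = np.array([0.6, 0.4])                      # P(X1), hypothetical
p_x3 = np.array([[[0.3, 0.7], [0.1, 0.9]],
                 [[0.7, 0.3], [0.4, 0.6]]])      # P(X3 | X1, X2), hypothetical
p_x5 = np.array([[0.9, 0.1], [0.3, 0.7]])        # P(X5 | X3), hypothetical

a = 1                                            # condition on X2 = a

# f_{X2=a}(X3) = sum_X1 P(X3 | X1, X2=a) P(X1)   -- eliminate X1
f_x3 = np.einsum('ij,i->j', p_x3[:, a, :], p_x1)
# g_{X2=a}(X5) = sum_X3 P(X5 | X3) f_{X2=a}(X3)  -- eliminate X3
g_x5 = np.einsum('ij,i->j', p_x5, f_x3)

print(g_x5, g_x5.sum())  # P(X5 | X2=a); sums to 1, and X4 dropped out entirely
```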

SLIDE 26
SLIDE 27

Estimating Parameters of a Bayesian Network

  • Maximum Likelihood Estimation
  • Also sometimes Maximum Pseudolikelihood estimation
SLIDE 28

How to estimate parameters of a Bayesian Network?

(1) You have observed all Y, X variables and the dependency structure is known

If you remember from other lectures:

Likelihood(D; Parameters) = ∏_{Dj in data} P(Dj | Parameters)
 = ∏_{Dj in data} ∏_{Xij in Dj} P(Xij | Par(Xij), Parameters_{Par(Xij) → Xij})
 = ∏_{i in variable set} ∏_{Dj in data} P(Xij | Par(Xij), Parameters_{Par(Xij) → Xij})
 = ∏_{i in variable set} (independent local terms, each a function of the observed Xij and Par(Xij))

MLE-Parameters_{Par(Xi) → Xi} = argmax (local likelihood of the observed Xij and Par(Xij) in the data!)

SLIDE 29

How to estimate parameters of a Bayesian Network?

(1) You have observed all Y, X variables and the dependency structure is known

  • If variables are discrete:

P(Xi = a | Parents(Xi) = B) = Count(Xi == a & Pa(Xi) == B) / Count(Pa(Xi) == B)
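In code, the counting estimator is a few lines (a sketch; estimate_cpt and the random data matrix are illustrative, one row per sample and one column per variable):

```python
import numpy as np

def estimate_cpt(data, child, parents, n_states=2):
    """MLE of P(child | parents) by counting co-occurrences."""
    counts = np.zeros([n_states] * len(parents) + [n_states])
    for row in data:
        counts[tuple(row[p] for p in parents) + (row[child],)] += 1
    return counts / counts.sum(axis=-1, keepdims=True)  # normalize over child

# Hypothetical dataset with columns [X1, X2, X3, X4, X5]:
data = np.random.default_rng(0).integers(0, 2, size=(1000, 5))
print(estimate_cpt(data, child=3, parents=[1]))  # estimate of P(X4 | X2)
```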
SLIDE 30

How to estimate parameters of a Bayesian Network?

(1) You have observed all Y, X variables and the dependency structure is known

  • If variables are discrete:

P(Xi = a | Parents(Xi) = B) = Count(Xi == a & Pa(Xi) == B) / Count(Pa(Xi) == B)

  • If variables are continuous:

P(Xi = a | Parents(Xi) = B) = fit Some_PDF_Function(a, B)

SLIDE 31

How to estimate parameters of a Bayesian Network?

(1) You have observed all Y, X variables and the dependency structure is known

P(Xi = a | Parents(Xi) = B) = Some_PDF_Function(a, B), e.g.:
  • A single multivariate Gaussian
  • A mixture of multivariate Gaussians
  • Non-parametric density functions
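For the continuous case, one common concrete choice is a linear-Gaussian CPD; here is a sketch on simulated data (np.polyfit stands in for any regression fit):

```python
import numpy as np

rng = np.random.default_rng(1)
parent = rng.normal(size=2000)                           # simulated parent
child = 2.0 * parent + rng.normal(scale=0.5, size=2000)  # simulated child

# Fit P(child | parent) = Normal(w * parent + b, sigma^2)
w, b = np.polyfit(parent, child, deg=1)
sigma = np.std(child - (w * parent + b))
print(w, b, sigma)  # roughly 2.0, 0.0, 0.5
```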

SLIDE 32

How to estimate parameters of a Bayesian Network?

(2) You have observed all Y, X variables, but the dependency structure is NOT known

SLIDE 33

Structure learning when all variables are observed

1) Neighborhood selection:

  • Lasso: L1-regularized regression per variable, learned using the other variables.
  • Not necessarily a tree structure

2) Tree learning via the Chow-Liu method:

  • Per variable pair, find the empirical distribution P(Xi, Xj) = Count(Xi, Xj) / M
  • Per variable pair, compute the mutual information I(Xi, Xj)
  • Use I(Xi, Xj) as edge weights in a graph and learn the maximum spanning tree (a sketch follows below).
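A compact Chow-Liu sketch for binary data (the data matrix is hypothetical; the maximum spanning tree is obtained by running scipy's minimum spanning tree on the negated mutual-information matrix):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_info(x, y, eps=1e-12):
    """Empirical mutual information between two binary columns."""
    pxy = np.histogram2d(x, y, bins=2)[0] / len(x)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return float(np.sum(pxy * np.log((pxy + eps) / (np.outer(px, py) + eps))))

data = np.random.default_rng(2).integers(0, 2, size=(500, 5))  # hypothetical
n = data.shape[1]
mi = np.zeros((n, n))                      # symmetric pairwise MI, zero diagonal
for i in range(n):
    for j in range(i + 1, n):
        mi[i, j] = mi[j, i] = mutual_info(data[:, i], data[:, j])

tree = minimum_spanning_tree(-mi)          # max spanning tree of the MI weights
print(np.transpose(tree.nonzero()))        # learned tree edges as [i, j] pairs
```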
SLIDE 34

How to estimate parameters of a Bayesian Network?

(3) You have unobserved variables, but the dependency structure is known

Most commonly used Bayesian Networks these days!

SLIDE 35

In practice, Bayes Nets are most used to inject priors and structure into the task

Modeling documents as a collection of topics where each topic is a distribution over words: Topic Modeling via Latent Dirichlet Allocation

SLIDE 36

In Practice, Bayes Nets are most used to inject priors and structure

Correcting for hidden confounders in expression data

SLIDE 37

In practice, Bayes Nets are most used to inject priors and structure

Correcting for hidden confounders in expression data

SLIDE 38

Estimation/Inference with missing values

1) Sometimes P(observed) = Σ_unobserved P(observed, unobserved) has a closed form!
   a) Combining Gaussian conditionals and priors usually leads to Gaussian marginals (closed form)
   b) If your prior distribution on the latent variables is conjugate to the conditional distribution, you get a closed form
      i) Lots of known pairs of distributions: Gaussian and Gaussian; Dirichlet and Multinomial; Gamma and Poisson; etc.
2) Expectation Maximization (EM) (see the sketch after this list)
   a) Initialize parameters randomly.
   b) Do inference (E step): MAP-estimate the most likely values of the unobserved variables
   c) Re-estimate (M step): MLE re-estimate the parameters
   d) Iterate (b) and (c) until the parameters converge
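As a toy version of this loop, here is a hard-EM sketch for a mixture of two 1-D Gaussians with known, equal variances (all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])  # data

mu = np.array([-1.0, 1.0])   # (a) initialize the two unknown means
for _ in range(50):
    # (b) E step: MAP-assign each point to its most likely component
    # (nearest mean, since variances and mixing weights are equal here)
    z = np.abs(x[:, None] - mu[None, :]).argmin(axis=1)
    # (c) M step: MLE re-estimate each mean from its assigned points
    mu = np.array([x[z == k].mean() for k in (0, 1)])
    # (d) a fixed iteration count stands in for a convergence test
print(mu)  # approximately [-2, 3]
```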

SLIDE 39

Estimation/Inference with missing values

3) Gibbs sampling or MCMC (a sketch follows below)
   a) Initialize randomly.
   b) Repeatedly sample xi ~ P(xi | everything else).
   c) Burn-in: sweep over the variables and draw thousands of samples sequentially.
   d) Eventually (provably), you will be sampling from the true distribution! Use those samples to compute anything you want. (Note that in those samples, all variables are observed.)
4) Variational inference (approximate with another model which HAS a closed form)
   a) Find a functional mapping from the probability under the original Bayesian model to the probability under a 'simpler' model (per data point)
   b) Estimation = minimize the gap between the two distributions
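A minimal Gibbs sketch for a model where every conditional is known exactly: a zero-mean bivariate Gaussian with correlation rho (rho and the sample counts are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
rho = 0.8                       # target correlation; conditionals are Gaussian
x = y = 0.0                     # arbitrary initialization
samples = []
for t in range(20000):
    # Sample each variable given everything else (here, the other variable):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # P(x | y)
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # P(y | x)
    if t >= 1000:               # discard burn-in samples
        samples.append((x, y))

print(np.corrcoef(np.array(samples).T)[0, 1])      # close to 0.8
```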

SLIDE 40

Example of EM for estimating Hidden Markov Model Parameters

[Figure: an HMM chain with hidden states Y1 … Y6, each emitting an observation X1 … X6]

P(X, Y) = P(Y1) P(X1 | Y1) ∏_{i≥2} P(Yi | Yi−1) P(Xi | Yi)
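The E step for an HMM is built on dynamic programming; for instance, the forward pass below computes P(X) under fixed parameters (the transition and emission tables are hypothetical):

```python
import numpy as np

pi = np.array([0.5, 0.5])                  # P(Y1), hypothetical
A = np.array([[0.9, 0.1], [0.2, 0.8]])     # A[i, j] = P(Y_t = j | Y_{t-1} = i)
B = np.array([[0.7, 0.3], [0.1, 0.9]])     # B[i, k] = P(X_t = k | Y_t = i)
obs = [0, 0, 1, 1, 0, 1]                   # an observed sequence X1..X6

alpha = pi * B[:, obs[0]]                  # alpha_1(i) = P(X1, Y1 = i)
for x in obs[1:]:
    alpha = (alpha @ A) * B[:, x]          # alpha_t = (alpha_{t-1} A) * emission
print(alpha.sum())                         # likelihood P(X1, ..., X6)
```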

SLIDE 41

Gibbs Sampling for all variants of models. Let your imagination go wild!

SLIDE 42

Problems with Bayesian Networks

  • The prior has to take the form of a conditional probability. What if the variables are symmetric?
  • Bayes Nets can't have loops.
  • What if the relationship can be described in an unnormalized way (i.e., as an energy)?

SLIDE 43

Undirected Graphical Models (aka Markov Random Fields)

  • They come from the world of statistical physics, where they model energies and electron spins.
  • Define the joint probability as a normalized product of factors (i.e., energies) over cliques of variables:

P(X1, …, XD) = (1/Z) ∏_{Ci = subsets of X1..XD} f(Ci),  where  Z = Σ_{x1, x2, …, xD} ∏_{Ci} f(Ci)

  • In practice, people often use pairwise and node-wise factors only.

○ Often called edge and node potentials

  • The main problem with these models: how do we estimate Z?! (A brute-force sketch follows below.)
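For a tiny model, Z can simply be brute-forced; here is a sketch of a 3-node binary chain MRF with made-up pairwise potentials:

```python
import numpy as np
from itertools import product

# Hypothetical pairwise potentials on the chain x1 - x2 - x3 (binary states).
f12 = np.array([[5.0, 1.0], [1.0, 5.0]])   # rewards x1 == x2
f23 = np.array([[5.0, 1.0], [1.0, 5.0]])   # rewards x2 == x3

Z = sum(f12[x1, x2] * f23[x2, x3]
        for x1, x2, x3 in product((0, 1), repeat=3))

def p(x1, x2, x3):
    """Normalized probability of one joint configuration."""
    return f12[x1, x2] * f23[x2, x3] / Z

print(Z, p(0, 0, 0))   # Z = 72.0, p(0,0,0) = 25/72 ≈ 0.347
```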
SLIDE 44

Conditional Independencies in Markov Random Fields

We assume one edge for every pairwise potential. By the definition of undirected graphical models: a variable Xi is conditionally independent of another variable Xj if, on every path that goes from Xi to Xj, at least one variable is observed.

SLIDE 45

Example: Gaussian Graphical Models

They are equivalent to a multivariate Gaussian distribution, P(x) ∝ exp(−½ xᵀΘx + hᵀx), with covariance Σ = Θ⁻¹ and mean μ = Θ⁻¹h, where a zero entry in the precision matrix Θ corresponds to a missing edge. They easily allow conditional independence decisions, especially during inference.

SLIDE 46

Computing Z (Normalization factor)

Note: Z is a function of the parameters, not of the samples. So:
  • Without Z, you can still compute some conditional probabilities
  • But you need Z to compute:
    ○ MAP estimates
    ○ Actual probabilities
  • Just like with Bayes Nets: you can use the sum-product method to compute Z

SLIDE 47

Factor graph representation of MRFs

P(X) = (1/Z) f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)

Z = Σ_{x1,…,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
  = Σ_{x1,x2} f1(x1,x2) Σ_{x3,x4} f2(x2,x3,x4) (Σ_{x5} f3(x3,x5)) (Σ_{x6} f4(x4,x6))
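The same distributive-law saving can be demonstrated with numpy's einsum, which eliminates variables one at a time when asked to optimize the contraction order (the factor tables below are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(5)
f1 = rng.random((2, 2))           # f1(x1, x2)
f2 = rng.random((2, 2, 2))        # f2(x2, x3, x4)
f3 = rng.random((2, 2))           # f3(x3, x5)
f4 = rng.random((2, 2))           # f4(x4, x6)

# Summing out all six variables of the factor product gives Z.
Z_naive = np.einsum('ab,bcd,ce,df->', f1, f2, f3, f4, optimize=False)
Z_smart = np.einsum('ab,bcd,ce,df->', f1, f2, f3, f4, optimize=True)
print(Z_naive, Z_smart)           # same number; the cost differs at scale
```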

SLIDE 48

Belief Propagation Algorithm

Kschischang, Frank R., Brendan J. Frey, and Hans-Andrea Loeliger. "Factor graphs and the sum-product algorithm." IEEE Transactions on Information Theory 47.2 (2001): 498-519.
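To give a minimal flavor of the message passing, here is a sketch on the 3-node chain MRF used earlier: the two messages into x2 multiply into its (unnormalized) marginal. This is not a general factor-graph implementation:

```python
import numpy as np

f12 = np.array([[5.0, 1.0], [1.0, 5.0]])   # chain x1 - x2 - x3, as before
f23 = np.array([[5.0, 1.0], [1.0, 5.0]])

# Sum-product messages into x2 (sum over the sending variable's states):
m1_to_2 = f12.sum(axis=0)        # m(x2) = sum_x1 f12(x1, x2)
m3_to_2 = f23.sum(axis=1)        # m(x2) = sum_x3 f23(x2, x3)

belief = m1_to_2 * m3_to_2       # unnormalized marginal of x2
print(belief / belief.sum())     # P(x2) = [0.5, 0.5] here, by symmetry
```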

SLIDE 49

Some notes on Belief Propagation/Inference in MRFs

  • If the structure doesn't have a loop, the results are exact.
  • If the structure is loopy, people still use loopy BP for inferring Z.

○ Keep passing messages until the messages converge ○ Some theoretical properties of the convergence exist

  • Sometimes the messages don't have a closed form.

○ Use approximations to keep them within closed form ■ E.g., if the D incoming messages are mixtures of K Gaussians, the outgoing message is a mixture of K^D Gaussians; re-approximate it with K new Gaussians ○ Variants of this method exist, e.g. expectation propagation

  • If you replace sum with max, you get MAP estimates at the same time complexity

SLIDE 50

Related Topics (No time to cover)

Generative Adversarial Networks

  • Another method to generate samples, but without factorizing the probability
  • Useful when conditional independencies are bad assumptions
  • Useful for highly correlated data like images, sounds, etc.

Deep variational inference: make the function that maps the two distributions more powerful, and optimize it via gradient descent

Probabilistic programming! http://probabilistic-programming.org/wiki/Home

Nonparametric models (Dirichlet processes) & kernel-based graphical models

Causal inference and Bayesian Networks

SLIDE 51

Back to the big picture

PGMs give you a full model of the task:

  • You can inject prior information into your model
  • You can use partial data for better estimation
  • They give you justifications for your results
  • They are easy to interpret and allow humans to form hypotheses
  • If your data changes, you can keep parts of the model and re-estimate the other parts

This comes with costs:

  • You're making independence assumptions: often wrong
  • You're multiplying a ton of factors: errors can grow exponentially
  • Inference can be slow if you need sampling