Advanced Machine Learning Introduction to Probabilistic Graphical - - PowerPoint PPT Presentation



SLIDE 1

Advanced Machine Learning Introduction to Probabilistic Graphical Models

Amit Sethi Electrical Engineering, IIT Bombay

SLIDE 2

Objectives

  • Learn about statistical dependency of variables
  • Understand how this dependency can be coded in graphs
  • Understand the basic intuition behind Bayesian Networks

SLIDE 3

Bayesian models with which you are familiar

  • Bayes' theorem:
    – p(Ck|x) = p(Ck) p(x|Ck) / p(x)
    – Posterior = prior × likelihood / evidence
  • Naïve Bayes:
    – p(Ck|x) = p(Ck|x1,…,xn) ∝ p(Ck) Пi p(xi|Ck)
    – A decision about the class can now be based on the prior and the simplified class-conditional densities of x
    – The log of the posterior probability leads to a linear discriminant for certain class conditionals from the exponential families
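As a concrete illustration of the Naïve Bayes factorization p(Ck|x) ∝ p(Ck) Пi p(xi|Ck), here is a minimal sketch for two hypothetical classes C0 and C1 with binary features; all prior and likelihood numbers are made-up assumptions, not taken from the slides:

```python
# Hypothetical two-class Naive Bayes: posterior via p(Ck|x) ∝ p(Ck) * prod_i p(xi|Ck)
priors = {"C0": 0.6, "C1": 0.4}                       # p(Ck), assumed values
# p(xi=1|Ck) for three binary features, per class (assumed values)
likelihoods = {"C0": [0.2, 0.7, 0.5], "C1": [0.8, 0.3, 0.9]}

def posterior(x, priors, likelihoods):
    """Return p(Ck|x) for a binary feature vector x."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for xi, p1 in zip(x, likelihoods[c]):
            score *= p1 if xi == 1 else (1.0 - p1)    # class-conditional p(xi|Ck)
        scores[c] = score
    z = sum(scores.values())                          # evidence p(x)
    return {c: s / z for c, s in scores.items()}

post = posterior([1, 0, 1], priors, likelihoods)
```

Dividing by the evidence z is what turns the prior-times-likelihood scores into a proper posterior that sums to one.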

SLIDE 4

Consider an inference problem

  • Trying to guess if the family is out:
    – When the wife leaves the house, she leaves the outdoor light on (but she sometimes leaves it on for a guest)
    – When the wife leaves the house, she usually puts the dog out
    – When the dog has a bowel problem, it goes to the backyard
    – If the dog is in the backyard, I will probably hear it (but it might be the neighbor's dog)
  • If the dog is barking and the light is off, is the family out?

Example source: “Bayesian Networks without Tears” by Eugene Charniak, AI Magazine, AAAI 1991

SLIDE 5

Some observations

  • A lot of the events in the world are related
  • The relations are not deterministic but probabilistic
    – Some events are causes and others are effects
    – The effect usually has a sharper conditional distribution given the cause than when the cause is unknown

SLIDE 6

Bayesian Network definition

  • A Bayesian network is a directed graph in which each node (variable) is annotated with a conditional probability distribution that encodes statistical dependency:
    – Each node corresponds to a random variable
    – If there is an arrow (edge) from node X to node Y, X is said to be a parent of Y
    – Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node
    – The graph has no directed cycles (and hence is a directed acyclic graph, or DAG)

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop
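The structural part of this definition can be sketched in a few lines: a dict-of-parents encoding of the family-out network from the earlier slide, plus a check that the directed graph is acyclic (the conditional distributions are omitted here for brevity):

```python
# Sketch: a BN's structure as a dict mapping each node to its parent list.
# The family-out structure follows the earlier slide's example.
parents = {
    "fo": [], "bp": [],            # family-out, bowel-problem (no parents)
    "lo": ["fo"],                  # light-on depends on family-out
    "do": ["fo", "bp"],            # dog-out depends on both
    "hb": ["do"],                  # hear-bark depends on dog-out
}

def is_dag(parents):
    """Kahn-style check: repeatedly remove nodes whose parents are all removed."""
    remaining = set(parents)
    while remaining:
        ready = {n for n in remaining if not set(parents[n]) & remaining}
        if not ready:
            return False           # every remaining node waits on another: a cycle
        remaining -= ready
    return True
```

The acyclicity check matters because the chain-rule factorization over P(Xi | Parents(Xi)) only yields a valid joint distribution on a DAG.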

SLIDE 7

Back to our inference problem

  • Trying to guess if the family is out, given that the light is off and the dog is barking
  • 1: Brute force (no independence assumptions):
    – P(fo | lo=1, hb=1) ∝ Σbp Σdo p(fo, bp, do, lo=1, hb=1) = Σbp Σdo p(fo) p(bp|fo) p(do|fo,bp) …
  • 2: Using the factorization property of BNs:
    – P(fo | lo=1, hb=1) ∝ Σbp Σdo p(fo) p(bp) p(lo=1|fo) p(do|fo,bp) p(hb=1|do) = p(fo) p(lo=1|fo) Σbp p(bp) Σdo p(do|fo,bp) p(hb=1|do)

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop
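A small sketch of what pushing the sums inside the product buys us, on the family-out network. All CPT values below are illustrative assumptions, not from the original example; both routes give the same joint-with-evidence, but the factored one touches far fewer terms as networks grow:

```python
# Two marginalization strategies on the family-out network (fo, bp, do, lo, hb).
# All CPT numbers are illustrative assumptions.
p_fo = {1: 0.15, 0: 0.85}                       # p(fo)
p_bp = {1: 0.01, 0: 0.99}                       # p(bp)
p_lo1 = {1: 0.60, 0: 0.05}                      # p(lo=1 | fo)
p_do1 = {(1, 1): 0.99, (1, 0): 0.90,            # p(do=1 | fo, bp)
         (0, 1): 0.97, (0, 0): 0.30}
p_hb1 = {1: 0.70, 0: 0.01}                      # p(hb=1 | do)

def p_do(do, fo, bp):
    return p_do1[(fo, bp)] if do == 1 else 1.0 - p_do1[(fo, bp)]

def brute_force(fo):
    """Sum the full joint p(fo, bp, do, lo=1, hb=1) over the hidden bp and do."""
    return sum(p_fo[fo] * p_bp[bp] * p_lo1[fo] * p_do(do, fo, bp) * p_hb1[do]
               for bp in (0, 1) for do in (0, 1))

def factored(fo):
    """Push sums inside the product: terms without do leave the inner sum."""
    inner = lambda bp: sum(p_do(do, fo, bp) * p_hb1[do] for do in (0, 1))
    return p_fo[fo] * p_lo1[fo] * sum(p_bp[bp] * inner(bp) for bp in (0, 1))

# Normalizing over fo turns the joint-with-evidence into p(fo=1 | lo=1, hb=1)
posterior = brute_force(1) / (brute_force(1) + brute_force(0))
```

On five binary variables the saving is tiny, but on larger networks the factored sum avoids the exponential blow-up of the full joint table.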

SLIDE 8

What have we gained, so far?

  • 1. We have made it easier to visualize relationships between variables
  • 2. We have simplified the joint distribution into a product of lower-dimensional conditional distributions
  • 3. We have simplified marginalization of the joint distribution by taking terms that do not depend on the variable being summed over outside the sums

SLIDE 9

Notion of D-separation

  • The influence of x “flows through” z to y
  • In which of the cases does the influence stop flowing iff z is known (i.e., z d-separates x and y, and the path becomes inactive given z)?
  • Ans: (a), (b) and (c). For (d), the path is inactive iff z is unknown

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop

[Figure: four three-node graphs, labelled (a) to (d), each showing a path between x and y through z]
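The head-to-head case (d) can be checked numerically. The sketch below uses an assumed conditional table for p(z|x,y) and verifies by enumeration that x and y are independent marginally but become dependent once z is observed:

```python
import itertools

# Numeric check of the head-to-head case x -> z <- y: marginally x and y are
# independent, but observing z activates the path. CPT values are assumptions.
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.7, 1: 0.3}
p_z1 = {(0, 0): 0.10, (0, 1): 0.80,   # p(z=1 | x, y)
        (1, 0): 0.80, (1, 1): 0.99}

def joint(x, y, z):
    pz = p_z1[(x, y)]
    return p_x[x] * p_y[y] * (pz if z == 1 else 1.0 - pz)

def marg(**fixed):
    """Probability of the fixed assignments, summing out the other variables."""
    return sum(joint(x, y, z)
               for x, y, z in itertools.product((0, 1), repeat=3)
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

# p(x=1, y=1) vs p(x=1) p(y=1): should match (path blocked while z is unknown)
marginal_gap = abs(marg(x=1, y=1) - marg(x=1) * marg(y=1))
# p(x=1, y=1 | z=1) vs p(x=1|z=1) p(y=1|z=1): should differ (path now active)
pz = marg(z=1)
conditional_gap = abs(marg(x=1, y=1, z=1) / pz
                      - (marg(x=1, z=1) / pz) * (marg(y=1, z=1) / pz))
```

This is the “explaining away” effect: two independent causes become anti-correlated once their common effect is observed.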

SLIDE 10

Statistical independence in BNs

  • Independence: x ⊥ y, iff p(x,y) = p(x) p(y)
  • Conditional independence: x ⊥ y | z, iff p(x,y|z) = p(x|z) p(y|z)
    – x is conditionally independent of y given z
  • In Bayesian Networks:
    – x ⊥ NonDescendants(x) | Parents(x)
    – x is conditionally independent of all its non-descendants given its parents

SLIDE 11

Markov Networks aka MRF

  • Definition
    – Graphical models with undirected edges
    – Variables are nodes
    – Relationships between variables are undirected edges
  • Properties
    – The notion of conditional independence is simpler
    – Joint distributions are represented by potentials over cliques (fully connected subsets of nodes)

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop

SLIDE 12

Joint distributions in MRFs

  • If there is no link between two nodes xi and xj, then conditional independence can be expressed as: p(xi, xj | x\{i,j}) = p(xi | x\{i,j}) p(xj | x\{i,j})
  • By the Hammersley-Clifford theorem, the set of distributions represented by the MRF’s conditional independence structure is the same as the set that can be represented by a product of maximal-clique potentials, i.e. the joint distribution is written as a product of potential functions ψc(xc) over the maximal cliques of the graph: p(x) = (1/Z) Пc ψc(xc)
  • Here Z = Σx Пc ψc(xc) is the partition function, a normalization constant

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop
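A minimal sketch of p(x) = (1/Z) Пc ψc(xc) for a hypothetical three-node chain MRF x1 - x2 - x3, whose maximal cliques are {x1,x2} and {x2,x3}; the agreement-rewarding potential is an assumption for illustration:

```python
import itertools
import math

# Joint distribution of a 3-node chain MRF from maximal-clique potentials.
def psi(a, b):
    # A strictly positive pairwise potential that rewards agreement (assumed form)
    return math.exp(1.0 if a == b else -1.0)

states = list(itertools.product((0, 1), repeat=3))

# Unnormalized product over the maximal cliques {x1,x2} and {x2,x3}
unnorm = {s: psi(s[0], s[1]) * psi(s[1], s[2]) for s in states}
Z = sum(unnorm.values())                     # partition function
p = {s: v / Z for s, v in unnorm.items()}    # p(x) = (1/Z) prod_c psi_c(x_c)
```

Note that Z requires a sum over all joint states, which is why computing the partition function is the expensive step for large MRFs.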

SLIDE 13

This clique potential can be represented in terms of an energy function

  • The clique potentials are strictly positive, hence each can be defined in terms of an energy function: ψc(xc) = exp(-E(xc))
  • The product of clique potentials is then equivalent to a sum of energies in the exponent: Пc ψc(xc) = exp(-Σc E(xc))
  • However, unlike the conditional distributions in a Bayesian Network, the clique potentials do not have a specific probabilistic interpretation

SLIDE 14

In general, BNs and MRFs represent overlapping but distinct sets of distributions: neither formalism can express every conditional independence structure of the other

  • There is a directed graph whose conditional independence properties cannot be expressed by any undirected graph
  • There is an undirected graph whose conditional independence properties cannot be expressed by any directed graph

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop

SLIDE 15

An example use of an MRF in image denoising or binary segmentation

  • Objective:
    – Find the underlying clean image
  • Assumptions:
    – Most of the pixels are not corrupted
    – Neighbouring pixels are likely to be the same
  • Define (with values {-1, +1}):
    – xi to be the underlying true pixels
    – yi to be the observed pixels (iid given xi)
  • Energy terms:
    – For the observation: -η xi yi
    – For spatial coherence: -β xi xj
    – For the prior: h xi
  • Total energy: E(x,y) = h Σi xi - β Σ{i,j} xi xj - η Σi xi yi

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop

SLIDE 16

Now, we minimize the energy to get the desired results

  • Energy function: E(x,y) = h Σi xi - β Σ{i,j} xi xj - η Σi xi yi
  • And p(x,y) = (1/Z) exp{-E(x,y)}
  • We initialize x with the observed y, then update the xi so that the energy is minimized. The following are results with two different energy-minimization algorithms:

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop
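The minimization step can be sketched with ICM (iterated conditional modes), one simple energy-minimization algorithm: each site in turn keeps whichever spin lowers its local energy. For brevity this works on a toy 1-D signal rather than an image; the parameter values and the signal are assumptions for illustration:

```python
# ICM for the energy E(x,y) = h*sum(xi) - beta*sum(xi*xj) - eta*sum(xi*yi),
# on a 1-D signal. Parameters chosen so smoothness (beta) dominates the data term.
h, beta, eta = 0.0, 2.0, 1.0

def local_energy(x, y, i, s):
    """Terms of E(x,y) that involve site i, with x[i] set to candidate spin s."""
    e = h * s - eta * s * y[i]
    for j in (i - 1, i + 1):                 # 1-D neighbours
        if 0 <= j < len(x):
            e -= beta * s * x[j]
    return e

def icm(y, sweeps=10):
    x = list(y)                              # initialize x with the noisy y
    for _ in range(sweeps):
        for i in range(len(x)):
            # keep whichever spin in {-1, +1} gives the lower local energy
            x[i] = min((-1, +1), key=lambda s: local_energy(x, y, i, s))
    return x

clean = [1] * 8 + [-1] * 8
noisy = list(clean)
noisy[3], noisy[12] = -1, +1                 # flip two "pixels"
denoised = icm(noisy)
```

ICM only finds a local minimum of the energy; graph-cut methods, for instance, can find the global minimum for this binary model, which is why the slide compares results from two different algorithms.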

SLIDE 17

Factor graphs are the most general form of Graphical Models

  • Factor graphs make the relationships among variables explicit by using factor nodes
  • Factorization:
    – If we can represent p(x) as a product of factors: p(x) = Пs fs(xs) = fa(x1,x2) fb(x1,x2) fc(x2,x3) fd(x3)
  • Then we can draw a bipartite (undirected) graph such that:
    – The set of nodes V represents variables
    – The set of nodes F represents functions or factors
    – No node in V is connected to another node in V
    – No node in F is connected to another node in F

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop
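The slide's example factorization p(x) = fa(x1,x2) fb(x1,x2) fc(x2,x3) fd(x3) can be evaluated directly; the factor functions below are illustrative assumptions, and normalizing their product over all states yields a valid distribution:

```python
import itertools

# Factors of the slide's example; the concrete (positive) forms are assumptions.
fa = lambda x1, x2: 1.0 + x1 * x2              # fa(x1, x2)
fb = lambda x1, x2: 2.0 - x1 * x2              # fb(x1, x2)
fc = lambda x2, x3: 1.0 + 0.5 * x2 * x3        # fc(x2, x3)
fd = lambda x3: 1.5 if x3 == 1 else 0.5        # fd(x3)

def unnorm(x1, x2, x3):
    # p(x) ∝ fa(x1,x2) fb(x1,x2) fc(x2,x3) fd(x3)
    return fa(x1, x2) * fb(x1, x2) * fc(x2, x3) * fd(x3)

states = list(itertools.product((0, 1), repeat=3))
Z = sum(unnorm(*s) for s in states)
p = {s: unnorm(*s) / Z for s in states}
```

Note that fa and fb both connect {x1, x2}: a factor graph keeps them as distinct factor nodes, whereas an MRF over the same variables would merge them into a single clique potential.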

SLIDE 18

Relation between the three PGMs

  • In a Bayesian Network, co-parents need to be moralized (married) to form edges in an MRF (because they are not independent given their children)
  • For an MRF, every clique is represented by a function node
  • Priors of parentless variables can also be incorporated in factor graphs
  • Loops can be avoided in factor graphs by combining the functions that form a loop

Source: “Pattern Recognition and Machine Learning”, Book by Christopher Bishop