SLIDE 1

Introduction to Machine Learning

Undirected Graphical Models

Barnabás Póczos

SLIDE 2

Credits

Many of these slides are taken from Ruslan Salakhutdinov, Hugo Larochelle, & Eric Xing

  • http://www.dmi.usherb.ca/~larocheh/neural_networks
  • http://www.cs.cmu.edu/~rsalakhu/10707/
  • http://www.cs.cmu.edu/~epxing/Class/10708/

Reading material:

  • http://www.cs.cmu.edu/~rsalakhu/papers/Russ_thesis.pdf
  • Section 30.1 of Information Theory, Inference, and Learning Algorithms by David MacKay

  • http://www.stat.cmu.edu/~larry/=sml/GraphicalModels.pdf

SLIDE 3

Undirected Graphical Models = Markov Random Fields

Probabilistic graphical models are a powerful framework for representing the dependency structure between random variables. A Markov network (or undirected graphical model) is a set of random variables with a dependency structure described by an undirected graph.

[Figure: semantic labeling example]

SLIDE 4

Cliques

Clique: a subset of nodes such that there exists a link between all pairs of nodes in the subset.

Maximal clique: a clique such that it is not possible to include any other node in the set without it ceasing to be a clique.

The example graph has two maximal cliques, plus several other, smaller cliques [listed in the figure].

SLIDE 5

Undirected Graphical Models = Markov Random Fields

Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are useful for expressing dependencies between random variables. The joint distribution defined by the graph is given by the product of non-negative potential functions over the maximal cliques (fully connected subsets of nodes). In this example, the joint distribution factorizes as shown below.
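The formula itself was lost in extraction; in general, with potentials ψ_C over the maximal cliques C, it has the standard form:

```latex
p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C),
\qquad
Z = \sum_{x} \prod_{C} \psi_C(x_C)
```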

SLIDE 6

Markov Random Fields (MRFs)

• Each potential function is a mapping from the joint configurations of the random variables in a maximal clique to the non-negative real numbers.
• The choice of potential functions is not restricted to those having a specific probabilistic interpretation.

A convenient choice is the exponential (Boltzmann) form, where E(x) is called an energy function.
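The equations were lost in extraction; the standard Boltzmann form they describe is:

```latex
\psi_C(x_C) = \exp\{-E(x_C)\}
\quad\Rightarrow\quad
p(x) = \frac{1}{Z}\exp\Big\{-\sum_{C} E(x_C)\Big\}
```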

SLIDE 7

Conditional Independence


Theorem:

It follows that the undirected graphical structure represents conditional independence:
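Both the theorem and the independence statement were equations in the original deck; the standard form (the global Markov property) reads:

```latex
\text{If every path between node sets } A \text{ and } B \text{ passes through } C,
\text{ then } x_A \perp\!\!\!\perp x_B \mid x_C,
\quad\text{i.e.}\quad
p(x_A, x_B \mid x_C) = p(x_A \mid x_C)\, p(x_B \mid x_C)
```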

SLIDE 8

MRFs with Hidden Variables


For many interesting problems we need to introduce hidden (latent) variables.

• Our random variables will contain both visible and hidden variables: x = (v, h).
• Computing the partition function Z is intractable.
• Computing the summation over the hidden variables is intractable.
• Parameter learning is very challenging.
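In symbols (a reconstruction, using the energy notation from the previous slide), the visible marginal requires both intractable sums:

```latex
p(v) = \sum_{h} p(v, h) = \frac{1}{Z} \sum_{h} \exp\{-E(v, h)\},
\qquad
Z = \sum_{v, h} \exp\{-E(v, h)\}
```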

SLIDE 9

Boltzmann Machines

Definition [Boltzmann machines]: MRFs with maximum clique size two [pairwise (edge) potentials] on binary-valued nodes are called Boltzmann machines. The joint probabilities are given below; the parameter θij measures the dependence of xi on xj, conditioned on the other nodes.
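The joint-probability formula was lost in extraction; the standard Boltzmann machine form (here written with bias terms θi) is:

```latex
p(x \mid \theta) = \frac{1}{Z(\theta)}
\exp\Big\{ \sum_{i < j} \theta_{ij}\, x_i x_j + \sum_i \theta_i\, x_i \Big\}
```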

SLIDE 10

Boltzmann Machines

Theorem: One can prove that in Boltzmann machines the conditional distribution of one node conditioned on the others is given by the logistic function:
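The logistic conditional did not survive extraction; for binary nodes xi ∈ {0, 1}, the standard statement is:

```latex
p(x_i = 1 \mid x_{-i})
= \sigma\Big( \sum_{j \neq i} \theta_{ij}\, x_j + \theta_i \Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```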


Proof:

SLIDE 11

Boltzmann Machines


Proof [Continued]: Q.E.D.

SLIDE 12

Example: Image Denoising


Let us look at the example of noise removal from a binary image. The image is an array of {−1, +1} pixel values.

• We take the original noise-free image (x) and randomly flip the sign of each pixel with a small probability. This process creates the noisy image (y).
• Our goal is to estimate the original image x from the noisy observations y.
• We model the joint distribution with an MRF over x and y (see the sketch below).
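The joint model was lost in extraction; a standard choice for this denoising example is an Ising-type energy with hypothetical coupling parameters β (neighboring pixels) and η (pixel-to-observation):

```latex
p(x, y) = \frac{1}{Z} \exp\{-E(x, y)\},
\qquad
E(x, y) = -\,\beta \sum_{\{i,j\}\ \text{neighbors}} x_i x_j \;-\; \eta \sum_i x_i y_i
```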

SLIDE 13

Inference: Iterated Conditional Modes (ICM)

Goal: find the most probable image x given the noisy observations y.

Solution: coordinate-wise descent on the energy, updating one pixel at a time (a sketch follows below).
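A minimal sketch of ICM for the denoising model above, in Python (NumPy), assuming a 4-neighbor grid and the hypothetical β, η parameters from the energy sketch; function and variable names are mine, not from the deck:

```python
import numpy as np

def icm_denoise(y, beta=2.0, eta=1.0, n_sweeps=10):
    """Iterated Conditional Modes for the Ising denoising model.

    y : 2-D array of observed pixels in {-1, +1}.
    Greedily decreases E(x, y) = -beta * sum_edges x_i x_j - eta * sum_i x_i y_i
    by setting one pixel at a time to its locally optimal value.
    """
    x = y.copy()                       # initialize with the noisy image
    H, W = x.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of the 4-neighborhood (missing neighbors count as 0).
                nb = 0.0
                if i > 0:     nb += x[i - 1, j]
                if i < H - 1: nb += x[i + 1, j]
                if j > 0:     nb += x[i, j - 1]
                if j < W - 1: nb += x[i, j + 1]
                # Local field; x[i, j] = sign(field) minimizes the local energy.
                field = beta * nb + eta * y[i, j]
                x[i, j] = 1 if field >= 0 else -1
    return x
```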

SLIDE 14

Gaussian MRFs

  • The information matrix is sparse, but the covariance matrix is not.
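The density was an equation on the slide; in the standard information form (J the information/precision matrix, h the potential vector):

```latex
p(x) \propto \exp\Big\{ -\tfrac{1}{2}\, x^\top J x + h^\top x \Big\},
\qquad
J_{ij} = 0 \ \text{whenever } (i, j) \text{ is not an edge},
\qquad
\Sigma = J^{-1} \ \text{(generally dense)}
```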

SLIDE 15

Restricted Boltzmann Machines

SLIDE 16

Restricted Boltzmann Machines

Restricted = no connections within the hidden layer and no connections within the visible layer (the graph is bipartite: only visible-hidden connections).

Partition function (intractable):
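The definitions were lost in extraction; the standard binary RBM energy, joint distribution, and (intractable) partition function, with weights W and biases b, c, are:

```latex
E(v, h) = -\,b^\top v - c^\top h - v^\top W h,
\qquad
p(v, h) = \frac{\exp\{-E(v, h)\}}{Z},
\qquad
Z = \sum_{v, h} \exp\{-E(v, h)\}
```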

SLIDE 17

Gaussian-Bernoulli RBM

[The energy is quadratic in v and linear in h.]
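The energy function itself did not survive extraction; the standard Gaussian-Bernoulli RBM energy (real-valued visibles vi with variances σi², binary hiddens hj), indeed quadratic in v and linear in h, is:

```latex
E(v, h) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2}
\;-\; \sum_j c_j h_j
\;-\; \sum_{i, j} \frac{v_i}{\sigma_i}\, W_{ij}\, h_j
```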

SLIDE 18

Possible Tasks with RBM

Tasks:
• Inference
• Evaluating the likelihood function
• Sampling from the RBM
• Training the RBM

SLIDE 19

Inference

SLIDE 20

Inference

Theorem: Inference in an RBM is simple: the conditional distributions are logistic functions, for p(h | v) and, similarly, for p(v | h).
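The two conditionals were lost in extraction; for the binary RBM defined above they factorize over units and read:

```latex
p(h_j = 1 \mid v) = \sigma\Big( c_j + \sum_i W_{ij} v_i \Big),
\qquad
p(v_i = 1 \mid h) = \sigma\Big( b_i + \sum_j W_{ij} h_j \Big)
```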

SLIDE 21


Proof:

SLIDE 22


Proof [Continued]: Q.E.D.

SLIDE 23

Evaluating the Likelihood

SLIDE 24

Calculating the Likelihood of an RBM

Theorem: Calculating the likelihood is simple in an RBM (apart from the partition function): the hidden units can be summed out analytically, yielding the free energy.
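The formulas were lost in extraction; summing out the binary hidden units gives the standard free-energy form:

```latex
p(v) = \frac{e^{-F(v)}}{Z},
\qquad
F(v) = -\,b^\top v \;-\; \sum_j \log\Big( 1 + \exp\Big\{ c_j + \sum_i W_{ij} v_i \Big\} \Big)
```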

SLIDE 25

Proof:


Q.E.D.

SLIDE 26

Sampling

SLIDE 27

Sampling from p(x,h) in RBM

Sampling is tricky… it is much easier in directed graphical models. Here we will use Gibbs sampling.

Goal: generate samples from the joint distribution p(x, h), alternating between the two conditionals p(h | x) and, similarly, p(x | h).

SLIDE 28

Gibbs Sampling: The Problem

Our goal is to generate samples from a joint distribution over several variables. Suppose that we can generate samples from each variable's conditional distribution given all the others.

SLIDE 29


Gibbs Sampling: Pseudo Code
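The pseudocode itself did not survive extraction; below is a minimal sketch of the systematic-scan Gibbs sampler the slide describes. The sample_conditional callback (a hypothetical name) stands for the assumed ability to sample each full conditional:

```python
import numpy as np

def gibbs_sampler(x0, sample_conditional, n_samples, burn_in=100):
    """Systematic-scan Gibbs sampling.

    x0                 : initial state (length-d array)
    sample_conditional : function (i, x) -> one draw from p(x_i | x_{-i})
    Returns n_samples (approximate) draws from the joint distribution.
    """
    x = np.array(x0, dtype=float)
    d = len(x)
    samples = []
    for t in range(burn_in + n_samples):
        # One sweep: resample every coordinate given the current others.
        for i in range(d):
            x[i] = sample_conditional(i, x)
        if t >= burn_in:              # discard the burn-in sweeps
            samples.append(x.copy())
    return np.array(samples)
```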

SLIDE 30


Gibbs Sampling

SLIDE 31


Training

SLIDE 32


RBM Training

Training is complicated…

To train an RBM, we would like to minimize the negative log-likelihood function. To solve this, we use stochastic gradient descent.

Theorem: the gradient of the log-likelihood splits into a positive phase and a negative phase (the latter is hard to compute):
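The theorem's equation was lost in extraction; for an energy-based model it reads (a standard reconstruction, with θ any parameter and (ṽ, h̃) drawn from the model):

```latex
\frac{\partial \log p(v)}{\partial \theta}
= \underbrace{-\,\mathbb{E}_{p(h \mid v)}\!\left[ \frac{\partial E(v, h)}{\partial \theta} \right]}_{\text{positive phase}}
\;+\;
\underbrace{\mathbb{E}_{p(\tilde v, \tilde h)}\!\left[ \frac{\partial E(\tilde v, \tilde h)}{\partial \theta} \right]}_{\text{negative phase (hard to compute)}}
```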

SLIDE 33


RBM Training

Proof:

SLIDE 34


RBM Training

Proof [Continued]:

The derivative splits into a first and a second term. First term: difficult to calculate the expectation (this is the negative phase).

SLIDE 35


RBM Training

Proof [Continued]:

SLIDE 36


RBM Training

Proof [Continued]:

Second term: the conditionals are independent logistic distributions, so this expectation can be computed exactly (this is the positive phase). Q.E.D.

SLIDE 37

RBM Training

[Derivation step: the equations ("Since …, we need to calculate …, where …") were lost in extraction.]

SLIDE 38

RBM Training

The second term is more tricky: approximate the expectation with a single sample.
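In symbols (a reconstruction): the model expectation is replaced by a single sample (v⁽ᵏ⁾, h⁽ᵏ⁾) obtained by k steps of Gibbs sampling started at the data:

```latex
\mathbb{E}_{p(\tilde v, \tilde h)}\!\left[ \frac{\partial E(\tilde v, \tilde h)}{\partial \theta} \right]
\;\approx\;
\frac{\partial E\big(v^{(k)}, h^{(k)}\big)}{\partial \theta},
\qquad
\big(v^{(k)}, h^{(k)}\big) \ \text{from } k \text{ Gibbs steps started at the data } v
```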

SLIDE 39

RBM Training

[Figure: one block-Gibbs step between the visible and hidden layers; both conditionals are logistic.]

SLIDE 40


CD-k (Contrastive Divergence) Pseudocode
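The pseudocode was lost in extraction; a minimal sketch of CD-k for a binary RBM, under the energy convention used above (weights W, biases b, c; function and variable names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(v_data, W, b, c, k=1, lr=0.01, rng=None):
    """One CD-k parameter update for a binary RBM with energy
    E(v, h) = -b.v - c.h - v.W.h.

    v_data : one training vector in {0, 1}^n_v
    W      : (n_v, n_h) weight matrix;  b, c : visible / hidden biases
    """
    if rng is None:
        rng = np.random.default_rng()

    # Positive phase: p(h = 1 | v_data) is exact (independent logistics).
    ph_pos = sigmoid(c + v_data @ W)

    # Negative phase: k steps of block Gibbs sampling, started at the data.
    v = v_data.copy()
    for _ in range(k):
        ph = sigmoid(c + v @ W)
        h = (rng.random(ph.shape) < ph).astype(float)    # h ~ p(h | v)
        pv = sigmoid(b + h @ W.T)
        v = (rng.random(pv.shape) < pv).astype(float)    # v ~ p(v | h)
    ph_neg = sigmoid(c + v @ W)

    # Stochastic gradient step: data statistics minus approximate model statistics.
    W += lr * (np.outer(v_data, ph_pos) - np.outer(v, ph_neg))
    b += lr * (v_data - v)
    c += lr * (ph_pos - ph_neg)
    return W, b, c
```

With k = 1 this is the classic CD-1 update; larger k gives a less biased but more expensive estimate of the negative phase.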

SLIDE 41

Results

SLIDE 42

RBM Training Results

http://deeplearning.net/tutorial/rbm.html

Samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains); 1000 steps of Gibbs sampling were taken between each of those rows.

[Figures: original images; learned filters.]

SLIDE 43

Summary

Tasks:
• Inference
• Evaluating the likelihood function
• Sampling from the RBM
• Training the RBM

SLIDE 44

Thanks for your Attention!

SLIDE 45


RBM Training Results

SLIDE 46


Gaussian-Bernoulli RBM Training Results

Each document (story) is represented with a bag of words coming from a multinomial distribution whose parameters are determined by h (h = topics). After training we can generate words from these topics.