SLIDE 1

Machine Learning for Signal Processing

Expectation Maximization Mixture Models

Bhiksha Raj 27 Oct 2016

11755/18797 1

SLIDE 2

Learning Distributions for Data

  • Problem: Given a collection of examples from some data, estimate its distribution
  • Solution: Assign a model to the distribution
    – Learn parameters of model from data
  • Models can be arbitrarily complex
    – Mixture densities, hierarchical models

SLIDE 3

A Thought Experiment

  • A person shoots a loaded die repeatedly
  • You observe the series of outcomes
  • You can form a good idea of how the die is loaded

– Figure out what the probabilities of the various numbers are for the die

  • P(number) = count(number)/count(rolls)
  • This is a maximum likelihood estimate

– Estimate that makes the observed sequence of numbers most probable


6 3 1 5 4 1 2 4 …
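The counting estimate above can be sketched in a few lines. This is a minimal illustration (not course-provided code), using the roll sequence shown on the slide:

```python
from collections import Counter

def ml_multinomial(rolls, faces=range(1, 7)):
    """Maximum-likelihood estimate: P(number) = count(number) / count(rolls)."""
    counts = Counter(rolls)
    total = len(rolls)
    return {face: counts[face] / total for face in faces}

# The observed sequence from the slide: 6 3 1 5 4 1 2 4
p = ml_multinomial([6, 3, 1, 5, 4, 1, 2, 4])
```

Each probability is just a normalized count, which is the point the later slides keep repeating: ML estimation here "is just counting".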

SLIDE 4

The Multinomial Distribution

  • A probability distribution over a discrete

collection of items is a Multinomial

  • E.g. the roll of a die
    – X : X in (1, 2, 3, 4, 5, 6)
  • Or the toss of a coin
    – X : X in (heads, tails)

P(X : X belongs to a discrete set) = P(X)

SLIDE 5

Maximum Likelihood Estimation

  • Basic principle: Assign a form to the distribution

– E.g. a multinomial – Or a Gaussian

  • Find the distribution that best fits the histogram of the data

[Figure: histogram counts n1…n6 fitted by multinomial probabilities p1…p6]

SLIDE 6

Defining “Best Fit”

  • The data are generated by draws from the distribution
    – I.e. the generating process draws from the distribution
  • Assumption: The world is a boring place
    – The data you have observed are very typical of the process
  • Consequent assumption: The distribution has a high probability of generating the observed data
    – Not necessarily true
  • Select the distribution that has the highest probability of generating the data
    – Should assign lower probability to less frequent observations and vice versa

SLIDE 7

Maximum Likelihood Estimation: Multinomial

  • Probability of generating (n1, n2, n3, n4, n5, n6)
  • Find p1, p2, p3, p4, p5, p6 so that the above is maximized
  • Alternately maximize
    – Log() is a monotonic function
    – argmax_x f(x) = argmax_x log(f(x))
  • Solving for the probabilities gives us
    – Requires constrained optimization to ensure probabilities sum to 1

P(n1, n2, n3, n4, n5, n6) = Const · Π_i p_i^(n_i)

log P(n1, n2, n3, n4, n5, n6) = log(Const) + Σ_i n_i log p_i

p_i = n_i / Σ_j n_j

EVENTUALLY IT'S JUST COUNTING!

SLIDE 8

Segue: Gaussians

  • Parameters of a Gaussian:

– Mean μ, covariance Θ

P(X) = N(X; μ, Θ) = (2π)^(−d/2) |Θ|^(−1/2) exp(−½ (X − μ)ᵀ Θ⁻¹ (X − μ))

SLIDE 9

Maximum Likelihood: Gaussian

  • Given a collection of observations (X1, X2, …), estimate mean μ and covariance Θ

  • Maximizing w.r.t. μ and Θ gives us

P(X1, X2, …) = Π_i (2π)^(−d/2) |Θ|^(−1/2) exp(−½ (X_i − μ)ᵀ Θ⁻¹ (X_i − μ))

log P(X1, X2, …) = C − ½ Σ_i ( log|Θ| + (X_i − μ)ᵀ Θ⁻¹ (X_i − μ) )

μ = (1/N) Σ_i X_i        Θ = (1/N) Σ_i (X_i − μ)(X_i − μ)ᵀ

IT'S STILL JUST COUNTING!
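The closed-form updates (sample mean, and the 1/N-normalized outer-product covariance) can be sketched as follows. A hedged illustration with NumPy, not course code:

```python
import numpy as np

def ml_gaussian(X):
    """ML estimates for a Gaussian:
    mu = sample mean; Theta = (1/N) sum_i (X_i - mu)(X_i - mu)^T
    (note the 1/N normalization, not the unbiased 1/(N-1))."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    D = X - mu
    Theta = D.T @ D / len(X)
    return mu, Theta

mu, Theta = ml_gaussian([[0.0, 0.0], [2.0, 2.0]])
```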

SLIDE 10

Laplacian

  • Parameters: median μ, scale b (b > 0)

– μ is also the mean, but is better viewed as the median

P(x) = L(x; μ, b) = (1/(2b)) exp(−|x − μ|/b)

SLIDE 11

Maximum Likelihood: Laplacian

  • Given a collection of observations (x1, x2, …), estimate location μ and scale b

  • Maximizing w.r.t. μ and b gives us

log P(x1, x2, …) = C − N log(b) − Σ_i |x_i − μ| / b

μ = median({x_i})        b = (1/N) Σ_i |x_i − μ|

Still just counting
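The median-and-absolute-deviation updates above can be sketched directly. A minimal illustration, not course code:

```python
import numpy as np

def ml_laplacian(x):
    """ML estimates for a Laplacian: location = median of the data,
    scale b = mean absolute deviation from that median."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    b = np.abs(x - mu).mean()
    return mu, b

mu, b = ml_laplacian([1.0, 2.0, 3.0, 4.0, 10.0])
```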

SLIDE 12
  • Parameters are αs
    – Determine mode and curvature

  • Defined only over probability vectors
    – X = [x1 x2 .. xK], Σ_i x_i = 1, x_i ≥ 0 for all i

K=3. Clockwise from top left: α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4)

(from wikipedia)

log of the density as we change α from α=(0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual αi's equal to each other.

Dirichlet density:

P(X) = D(X; α) = ( Γ(Σ_i α_i) / Π_i Γ(α_i) ) · Π_i x_i^(α_i − 1)

SLIDE 13

Maximum Likelihood: Dirichlet

  • Given a collection of observations (X1, X2, …), estimate α

  • No closed form solution for the αs.

– Needs gradient ascent

  • Several distributions have this property: the ML

estimate of their parameters have no closed form solution

log P(X1, X2, …) = N log Γ(Σ_i α_i) − N Σ_i log Γ(α_i) + Σ_i (α_i − 1) Σ_j log x_{j,i}

SLIDE 14

Continuing the Thought Experiment

  • Two persons shoot loaded dice repeatedly

– The dice are differently loaded for the two of them

  • We observe the series of outcomes for both persons
  • How to determine the probability distributions of the two dice?


6 3 1 5 4 1 2 4 … 4 4 1 6 3 2 1 2 …

SLIDE 15

Estimating Probabilities

  • Observation: The sequence of

numbers from the two dice

– As indicated by the colors, we know who rolled what number


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6…

SLIDE 16

Estimating Probabilities

  • Observation: The sequence of

numbers from the two dice

– As indicated by the colors, we know who rolled what number

  • Segregation: Separate the blue observations from the red


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6…

6 5 2 4 2 1 3 6 1.. 4 1 3 5 2 4 4 2 6..

Collection of “blue” numbers Collection of “red” numbers

SLIDE 17

Estimating Probabilities

  • Observation: The sequence of

numbers from the two dice – As indicated by the colors, we

know who rolled what number

  • Segregation: Separate the blue observations from the red
  • From each set compute

probabilities for each of the 6 possible outcomes

  • P(number) = no. of times number was rolled / total no. of observed rolls


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6…

6 5 2 4 2 1 3 6 1.. 4 1 3 5 2 4 4 2 6..

[Figure: estimated probability histograms over the faces 1–6 for the two dice]

SLIDE 18

A Thought Experiment

  • Now imagine that you cannot observe the dice yourself
  • Instead there is a “caller” who randomly calls out the outcomes

– 40% of the time he calls out the number from the left shooter, and 60% of the time, the one from the right (and you know this)

  • At any time, you do not know which of the two he is calling out
  • How do you determine the probability distributions for the two dice?


6 4 1 5 3 2 2 2 …

6 3 1 5 4 1 2 4 … 4 4 1 6 3 2 1 2 …

SLIDE 19

A Thought Experiment

  • How do you now determine the probability distributions

for the two sets of dice …

  • .. If you do not even know what fraction of time the blue

numbers are called, and what fraction are red?


6 4 1 5 3 2 2 2 …

6 3 1 5 4 1 2 4 … 4 4 1 6 3 2 1 2 …

SLIDE 20

A Mixture Multinomial

  • The caller will call out a number X in any given callout IF

– He selects “RED”, and the Red die rolls the number X – OR – He selects “BLUE” and the Blue die rolls the number X

  • P(X) = P(Red)P(X|Red) + P(Blue)P(X|Blue)

– E.g. P(6) = P(Red)P(6|Red) + P(Blue)P(6|Blue)

  • A distribution that combines (or mixes) multiple multinomials

is a mixture multinomial

P(X) = Σ_Z P(Z) P(X | Z)

(mixture weights P(Z); component multinomials P(X | Z))

SLIDE 21

Mixture Distributions

  • Mixture distributions mix several component distributions

– Component distributions may be of varied type

  • Mixing weights must sum to 1.0
  • Component distributions integrate to 1.0
  • Mixture distribution integrates to 1.0

P(X) = Σ_Z P(Z) P(X | Z)    (mixture weights P(Z); component distributions P(X | Z))

Mixture Gaussian:
P(X) = Σ_z P(z) N(X; μ_z, Θ_z)

Mixture of Gaussians and Laplacians:
P(X) = Σ_{z∈Gaussians} P(z) N(X; μ_z, Θ_z) + Σ_{z∈Laplacians} P(z) L(X; μ_z, b_z)

SLIDE 22

Maximum Likelihood Estimation

  • For our problem:

– Z = color of dice

  • Maximum likelihood solution: Maximize
  • No closed form solution (summation inside log)!

– In general ML estimates for mixtures do not have a closed form – USE EM!

log P(n1, n2, n3, n4, n5, n6) = log(Const) + Σ_X n_X log( Σ_Z P(Z) P(X | Z) )

P(X) = Σ_Z P(Z) P(X | Z)

P(n1, n2, n3, n4, n5, n6) = Const Π_X P(X)^(n_X) = Const Π_X ( Σ_Z P(Z) P(X | Z) )^(n_X)

SLIDE 23

Expectation Maximization

  • It is possible to estimate all parameters in this setup using the

Expectation Maximization (or EM) algorithm

  • First described in a landmark paper by Dempster, Laird and

Rubin

– Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 1977

  • Much work on the algorithm since then
  • The principles behind the algorithm existed for several years

prior to the landmark paper, however.

SLIDE 24

Expectation Maximization

  • Iterative solution
  • Get some initial estimates for all parameters

– Dice shooter example: This includes probability distributions for dice AND the probability with which the caller selects the dice

  • Two steps that are iterated:

– Expectation Step: Statistically estimate the values of the unseen variables
– Maximization Step: Using the estimated values of the unseen variables as truth, obtain estimates of the model parameters

SLIDE 25

EM: The auxiliary function

  • EM iteratively optimizes the following auxiliary

function

  • Q(θ, θ′) = Σ_Z P(Z | X, θ′) log P(Z, X | θ)

– Z are the unseen variables – Assuming Z is discrete (may not be)

  • θ′ are the parameter estimates from the previous iteration

  • θ are the estimates to be obtained in the current iteration

SLIDE 26

Expectation Maximization as counting

  • Hidden variable: Z

– Dice: The identity of the dice whose number has been called out

  • If we knew Z for every observation, we could estimate all terms

– By adding the observation to the right bin

  • Unfortunately, we do not know Z – it is hidden from us!
  • Solution: FRAGMENT THE OBSERVATION

[Figure: an observed 6 whose dice identity is unknown is fragmented, with one piece added to the collection of "blue" numbers and one to the collection of "red" numbers]

SLIDE 27

Fragmenting the Observation

  • EM is an iterative algorithm

– At each time there is a current estimate of parameters

  • The “size” of the fragments is proportional to the a

posteriori probability of the component distributions

– The a posteriori probabilities of the various values of Z are computed using Bayes’ rule:

  • Every dice gets a fragment of size P(dice | number)

P(Z | X) = P(X | Z) P(Z) / P(X) = C · P(X | Z) P(Z)

SLIDE 28

Expectation Maximization

  • Hypothetical Dice Shooter Example:
  • We obtain an initial estimate for the probability distribution of the two

sets of dice (somehow):

  • We obtain an initial estimate for the probability with which the caller

calls out the two shooters (somehow)

[Figure: initial estimates of P(X | blue) and P(X | red), with P(4 | blue) = 0.1 and P(4 | red) = 0.05; initial P(Z): 0.5, 0.5]

SLIDE 29

Expectation Maximization

  • Hypothetical Dice Shooter Example:
  • Initial estimate:

– P(blue) = P(red) = 0.5 – P(4 | blue) = 0.1, for P(4 | red) = 0.05

  • Caller has just called out 4
  • Posterior probability of colors:

P(red | X = 4) = C · P(X = 4 | Z = red) · P(Z = red) = C · 0.05 · 0.5 = 0.025C

P(blue | X = 4) = C · P(X = 4 | Z = blue) · P(Z = blue) = C · 0.1 · 0.5 = 0.05C

Normalizing (C = 1/(0.025 + 0.05)): P(red | 4) = 0.33, P(blue | 4) = 0.67
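The posterior "fragment size" computation on this slide is one line of Bayes' rule. A minimal sketch using the slide's initial estimates:

```python
def fragment_sizes(x, priors, likelihoods):
    """A posteriori fragment sizes via Bayes' rule:
    P(Z | X) = P(Z) P(X | Z) / sum over Z' of P(Z') P(X | Z')."""
    joint = {z: priors[z] * likelihoods[z][x] for z in priors}
    total = sum(joint.values())
    return {z: joint[z] / total for z in joint}

# The slide's initial estimates: P(red) = P(blue) = 0.5,
# P(4 | red) = 0.05, P(4 | blue) = 0.1
post = fragment_sizes(4,
                      priors={"red": 0.5, "blue": 0.5},
                      likelihoods={"red": {4: 0.05}, "blue": {4: 0.1}})
```

The normalizer C cancels, so only the relative joint probabilities matter.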

SLIDE 30


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6

4 (0.33) 4 (0.67)

Expectation Maximization

SLIDE 31
  • Every observed roll of the dice

contributes to both “Red” and “Blue”


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6

Expectation Maximization

SLIDE 32
  • Every observed roll of the dice

contributes to both “Red” and “Blue”


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6

6 (0.8) 6 (0.2)

Expectation Maximization

SLIDE 33
  • Every observed roll of the dice

contributes to both “Red” and “Blue”


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6

6 (0.8), 6 (0.2), 4 (0.33) 4 (0.67)

Expectation Maximization

SLIDE 34
  • Every observed roll of the dice

contributes to both “Red” and “Blue”


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6

6 (0.8), 6 (0.2), 4 (0.33), 4 (0.67), 5 (0.33), 5 (0.67),

Expectation Maximization

SLIDE 35
  • Every observed roll of the dice

contributes to both “Red” and “Blue”


6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6

6 (0.8), 4 (0.33), 5 (0.33), 1 (0.57), 2 (0.14), 3 (0.33), 4 (0.33), 5 (0.33), 2 (0.14), 2 (0.14), 1 (0.57), 4 (0.33), 3 (0.33), 4 (0.33), 6 (0.8), 2 (0.14), 1 (0.57), 6 (0.8) 6 (0.2), 4 (0.67), 5 (0.67), 1 (0.43), 2 (0.86), 3 (0.67), 4 (0.67), 5 (0.67), 2 (0.86), 2 (0.86), 1 (0.43), 4 (0.67), 3 (0.67), 4 (0.67), 6 (0.2), 2 (0.86), 1 (0.43), 6 (0.2)

Expectation Maximization

SLIDE 36
  • Every observed roll of the dice

contributes to both “Red” and “Blue”

  • Total count for "Red" is the sum of all the posterior probabilities in the red column
    – 7.31
  • Total count for "Blue" is the sum of all the posterior probabilities in the blue column
    – 10.69
    – Note: 10.69 + 7.31 = 18 = the total number of instances

Called  P(red|X)  P(blue|X)
6       .8        .2
4       .33       .67
5       .33       .67
1       .57       .43
2       .14       .86
3       .33       .67
4       .33       .67
5       .33       .67
2       .14       .86
2       .14       .86
1       .57       .43
4       .33       .67
3       .33       .67
4       .33       .67
6       .8        .2
2       .14       .86
1       .57       .43
6       .8        .2
Totals: 7.31      10.69

Expectation Maximization

SLIDE 37
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 38
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71 – Total count for 2: 0.56

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 39
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71 – Total count for 2: 0.56 – Total count for 3: 0.66

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 40
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71 – Total count for 2: 0.56 – Total count for 3: 0.66 – Total count for 4: 1.32

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 41
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71 – Total count for 2: 0.56 – Total count for 3: 0.66 – Total count for 4: 1.32 – Total count for 5: 0.66

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 42
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71 – Total count for 2: 0.56 – Total count for 3: 0.66 – Total count for 4: 1.32 – Total count for 5: 0.66 – Total count for 6: 2.4

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 43
  • Total count for “Red” : 7.31
  • Red:

– Total count for 1: 1.71 – Total count for 2: 0.56 – Total count for 3: 0.66 – Total count for 4: 1.32 – Total count for 5: 0.66 – Total count for 6: 2.4

  • Updated probability of Red dice:

– P(1 | Red) = 1.71/7.31 = 0.234 – P(2 | Red) = 0.56/7.31 = 0.077 – P(3 | Red) = 0.66/7.31 = 0.090 – P(4 | Red) = 1.32/7.31 = 0.181 – P(5 | Red) = 0.66/7.31 = 0.090 – P(6 | Red) = 2.40/7.31 = 0.328

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 44

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29

Expectation Maximization

SLIDE 45

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29 – Total count for 2: 3.44

Expectation Maximization

SLIDE 46

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29 – Total count for 2: 3.44 – Total count for 3: 1.34

Expectation Maximization

SLIDE 47

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29 – Total count for 2: 3.44 – Total count for 3: 1.34 – Total count for 4: 2.68

Expectation Maximization

SLIDE 48

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29 – Total count for 2: 3.44 – Total count for 3: 1.34 – Total count for 4: 2.68 – Total count for 5: 1.34

Expectation Maximization

SLIDE 49
  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29 – Total count for 2: 3.44 – Total count for 3: 1.34 – Total count for 4: 2.68 – Total count for 5: 1.34 – Total count for 6: 0.6

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 50
  • Total count for “Blue” : 10.69
  • Blue:

– Total count for 1: 1.29 – Total count for 2: 3.44 – Total count for 3: 1.34 – Total count for 4: 2.68 – Total count for 5: 1.34 – Total count for 6: 0.6

  • Updated probability of Blue dice:

– P(1 | Blue) = 1.29/10.69 = 0.121 – P(2 | Blue) = 3.44/10.69 = 0.322 – P(3 | Blue) = 1.34/10.69 = 0.125 – P(4 | Blue) = 2.68/10.69 = 0.251 – P(5 | Blue) = 1.34/10.69 = 0.125 – P(6 | Blue) = 0.60/10.69 = 0.056

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 51
  • Total count for “Red” : 7.31
  • Total count for “Blue” : 10.69
  • Total instances = 18

– Note 7.31+10.69 = 18

  • We also revise our estimate for the

probability that the caller calls out Red or Blue

– i.e. the fraction of times that he calls Red and the fraction of times he calls Blue

  • P(Z=Red) = 7.31/18 = 0.41
  • P(Z=Blue) = 10.69/18 = 0.59

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Expectation Maximization

SLIDE 52

The updated values

  • P(Z=Red) = 7.31/18 = 0.41
  • P(Z=Blue) = 10.69/18 = 0.59

(posterior table as on slide 36; totals: red 7.31, blue 10.69)

Probability of Blue dice:

P(1 | Blue) = 1.29/10.69 = 0.121

P(2 | Blue) = 3.44/10.69 = 0.322

P(3 | Blue) = 1.34/10.69 = 0.125

P(4 | Blue) = 2.68/10.69 = 0.251

P(5 | Blue) = 1.34/10.69 = 0.125

P(6 | Blue) = 0.60/10.69 = 0.056

Probability of Red dice:

P(1 | Red) = 1.71/7.31 = 0.234

P(2 | Red) = 0.56/7.31 = 0.077

P(3 | Red) = 0.66/7.31 = 0.090

P(4 | Red) = 1.32/7.31 = 0.181

P(5 | Red) = 0.66/7.31 = 0.090

P(6 | Red) = 2.40/7.31 = 0.328

THE UPDATED VALUES CAN BE USED TO REPEAT THE PROCESS. ESTIMATION IS AN ITERATIVE PROCESS

SLIDE 53

The Dice Shooter Example

1. Initialize P(Z), P(X | Z)
2. Estimate P(Z | X) for each Z, for each called out number
  • Associate X with each value of Z, with weight P(Z | X)
3. Re-estimate P(X | Z) for every value of X and Z
4. Re-estimate P(Z)
5. If not converged, return to 2

6 3 1 5 4 1 2 4 … 4 4 1 6 3 2 1 2 …

6 4 1 5 3 2 2 2 …

SLIDE 54

In Squiggles

  • Given a sequence of observations O1, O2, ..

– NX is the number of observations of number X

  • Initialize P(Z), P(X|Z) for dice Z and numbers X
  • Iterate:

– For each number X:

P(Z | X) = P(Z) P(X | Z) / Σ_{Z′} P(Z′) P(X | Z′)

– Update:

P(X | Z) = N_X P(Z | X) / Σ_{X′} N_{X′} P(Z | X′)

P(Z) = Σ_X N_X P(Z | X) / Σ_{Z′} Σ_X N_X P(Z′ | X)
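The whole loop above fits in a short function. A sketch, not course code: the 18 called-out numbers are the ones from the worked example, but the initial distributions below are made up for illustration (any non-degenerate start works).

```python
from collections import Counter

def em_mixture_multinomial(observations, p_z, p_x_given_z, iterations=10):
    """EM for a mixture of multinomials (the dice-shooter model).
    p_z: prior over dice; p_x_given_z: per-die number distributions."""
    counts = Counter(observations)
    n = sum(counts.values())
    for _ in range(iterations):
        # E step: posterior fragment sizes P(Z | X) for each distinct number X
        post = {}
        for x in counts:
            joint = {z: p_z[z] * p_x_given_z[z].get(x, 0.0) for z in p_z}
            total = sum(joint.values())
            post[x] = {z: joint[z] / total for z in joint}
        # M step: accumulate soft counts, then normalize (still just counting)
        mass = {z: sum(counts[x] * post[x][z] for x in counts) for z in p_z}
        p_x_given_z = {z: {x: counts[x] * post[x][z] / mass[z] for x in counts}
                       for z in p_z}
        p_z = {z: mass[z] / n for z in p_z}
    return p_z, p_x_given_z

rolls = [6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6]
p_z, p_x_z = em_mixture_multinomial(
    rolls,
    p_z={"red": 0.5, "blue": 0.5},
    p_x_given_z={"red":  {1: 0.10, 2: 0.05, 3: 0.10, 4: 0.05, 5: 0.10, 6: 0.60},
                 "blue": {1: 0.20, 2: 0.30, 3: 0.10, 4: 0.20, 5: 0.10, 6: 0.10}})
```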

SLIDE 55

Solutions may not be unique

  • The EM algorithm will give us one of many solutions, all

equally valid!

– The probability of 6 being called out: P(6) = α · P(6 | red) + (1 − α) · P(6 | blue) = α · P_r + (1 − α) · P_b

  • Assigns P_r as the probability of 6 for the red die
  • Assigns P_b as the probability of 6 for the blue die

– The following too is a valid solution: α = 1, P_r = P(6), P_b = anything

  • Assigns 1.0 as the a priori probability of the red die
  • Assigns 0.0 as the probability of the blue die
  • The solution is NOT unique

SLIDE 56

A more complex model: Gaussian mixtures

  • A Gaussian mixture can represent data

distributions far better than a simple Gaussian

  • The two panels show the histogram of an

unknown random variable

  • The first panel shows how it is modeled by

a simple Gaussian

  • The second panel models the histogram

by a mixture of two Gaussians

  • Caveat: It is hard to know the optimal

number of Gaussians in a mixture

SLIDE 57

A More Complex Model

  • Gaussian mixtures are often good models for the

distribution of multivariate data

  • Problem: Estimating the parameters, given a

collection of data

P(X) = Σ_k P(k) N(X; μ_k, Θ_k) = Σ_k P(k) (2π)^(−d/2) |Θ_k|^(−1/2) exp(−½ (X − μ_k)ᵀ Θ_k⁻¹ (X − μ_k))

SLIDE 58

Gaussian Mixtures: Generating model

  • The caller now has two Gaussians

– At each draw he randomly selects a Gaussian, by the mixture weight distribution – He then draws an observation from that Gaussian – Much like the dice problem (only the outcomes are now real numbers and can be anything)

P(X) = Σ_k P(k) N(X; μ_k, Θ_k)

6.1 1.4 5.3 1.9 4.2 2.2 4.9 0.5

SLIDE 59

Estimating GMM with complete information

  • Observation: A collection of numbers drawn from a mixture of 2 Gaussians

– As indicated by the colors, we

know which Gaussian generated what number

  • Segregation: Separate the blue observations from the red
  • From each set compute

parameters for that Gaussian

P(red) = N_red / N

6.1 1.4 5.3 1.9 4.2 2.2 4.9 0.5 …

6.1 5.3 4.2 4.9 .. 1.4 1.9 2.2 0.5 ..

μ_red = (1/N_red) Σ_{i∈red} X_i

Θ_red = (1/N_red) Σ_{i∈red} (X_i − μ_red)(X_i − μ_red)ᵀ

SLIDE 60

Gaussian Mixtures: Generating model

  • Problem: In reality we will not know which

Gaussian any observation was drawn from..

– The color information is missing

P(X) = Σ_k P(k) N(X; μ_k, Θ_k)

6.1 1.4 5.3 1.9 4.2 2.2 4.9 0.5

SLIDE 61

Fragmenting the observation

  • The identity of the Gaussian is not known!
  • Solution: Fragment the observation
  • Fragment size proportional to a posteriori probability

[Figure: the observation 4.2, whose Gaussian is unknown, is fragmented between the collection of "blue" numbers and the collection of "red" numbers]

P(k | X) = P(k) P(X | k) / Σ_{k′} P(k′) P(X | k′) = P(k) N(X; μ_k, Θ_k) / Σ_{k′} P(k′) N(X; μ_{k′}, Θ_{k′})

SLIDE 62
  • Initialize P(k), μ_k and Θ_k for both

Gaussians

– Important how we do this – Typical solution: initialize means randomly, Θ_k as the global covariance of the data, and P(k) uniformly

  • Compute fragment sizes for each

Gaussian, for each observation

Number  P(red|X)  P(blue|X)
6.1     .81       .19
1.4     .33       .67
5.3     .75       .25
1.9     .41       .59
4.2     .64       .36
2.2     .43       .57
4.9     .66       .34
0.5     .05       .95

P(k | X) = P(k) N(X; μ_k, Θ_k) / Σ_{k′} P(k′) N(X; μ_{k′}, Θ_{k′})

Expectation Maximization

SLIDE 63
  • Each observation contributes only as

much as its fragment size to each statistic

  • Mean(red) =

(6.1*0.81 + 1.4*0.33 + 5.3*0.75 + 1.9*0.41 + 4.2*0.64 + 2.2*0.43 + 4.9*0.66 + 0.5*0.05 ) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05) = 17.05 / 4.08 = 4.18

(posterior table as on slide 62; total fragment mass: red 4.08, blue 3.92)

  • Var(red) = ((6.1−4.18)²·0.81 + (1.4−4.18)²·0.33 + (5.3−4.18)²·0.75 + (1.9−4.18)²·0.41 + (4.2−4.18)²·0.64 + (2.2−4.18)²·0.43 + (4.9−4.18)²·0.66 + (0.5−4.18)²·0.05) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05)

P(red) = 4.08 / 8

Expectation Maximization

SLIDE 64

EM for Gaussian Mixtures

  • 1. Initialize P(k), μ_k and Θ_k for all Gaussians
  • 2. For each observation X compute a posteriori

probabilities for all Gaussian

  • 3. Update mixture weights, means and variances for all

Gaussians

  • 4. If not converged, return to 2

P(k | X) = P(k) N(X; μ_k, Θ_k) / Σ_{k′} P(k′) N(X; μ_{k′}, Θ_{k′})

μ_k = Σ_X P(k | X) X / Σ_X P(k | X)

Θ_k = Σ_X P(k | X) (X − μ_k)² / Σ_X P(k | X)

P(k) = (1/N) Σ_X P(k | X)

SLIDE 65

EM estimation of Gaussian Mixtures

  • An Example

[Figure: histogram of 4000 instances of randomly generated data; the individual parameters of a two-Gaussian mixture estimated by EM; and the two-Gaussian mixture estimated by EM]

SLIDE 66

Expectation Maximization

  • The same principle can be extended to mixtures of other

distributions.

  • E.g. Mixture of Laplacians: Laplacian parameters become
  • In a mixture of Gaussians and Laplacians, Gaussians use the

Gaussian update rules, Laplacians use the Laplacian rule

μ_k = median of {x}, weighted by P(k | x)

b_k = Σ_x P(k | x) |x − μ_k| / Σ_x P(k | x)

SLIDE 67

Expectation Maximization

  • The EM algorithm is used whenever proper statistical analysis of

a phenomenon requires the knowledge of a hidden or missing variable (or a set of hidden/missing variables)

– The hidden variable is often called a “latent” variable

  • Some examples:

– Estimating mixtures of distributions

  • Only data are observed. The individual distributions and mixing proportions

must both be learnt.

– Estimating the distribution of data, when some attributes are missing – Estimating the dynamics of a system, based only on observations that may be a complex function of system state

SLIDE 68

Solve this problem:

  • Problem 1:

– Caller rolls a die and flips a coin – He calls out the number rolled if the coin shows heads – Otherwise he calls the number+1 – Determine P(heads) and P(number) for the die from a collection of outputs

  • Problem 2:

– Caller rolls two dice – He calls out the sum – Determine P(dice) from a collection of outputs

SLIDE 69

The dice and the coin

  • Unknown: Whether it was head or tails

[Figure: called numbers 4, 4, 3, …; each may belong to the "Heads" count or the "Tails" count]

SLIDE 70

The dice and the coin

  • Unknown: Whether it was head or tails

[Figure: as on slide 69 — called numbers split between "Heads" and "Tails" counts]

count(N) = #N · P(heads | N) + #(N+1) · P(tails | N+1)

P(heads | N) = P(N) P(heads) / ( P(N) P(heads) + P(N−1) P(tails) )
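The posterior in this problem can be sketched directly: a called-out N came either from rolling N (heads) or from rolling N−1 and adding 1 (tails). A minimal illustration, tested with a fair die:

```python
def heads_posterior(called, p_die, p_heads):
    """P(heads | called N): the call came from rolling N (heads)
    or from rolling N-1 and calling N-1 + 1 (tails)."""
    ph = p_die.get(called, 0.0) * p_heads
    pt = p_die.get(called - 1, 0.0) * (1.0 - p_heads)
    return ph / (ph + pt)

fair = {n: 1.0 / 6.0 for n in range(1, 7)}
```

Note the boundary cases: a called-out 1 can only come from heads, and a called-out 7 can only come from tails.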

SLIDE 71

The two dice

  • Unknown: How to partition the number
  • Count_blue(3) += P(3,1 | 4)
  • Count_blue(2) += P(2,2 | 4)
  • Count_blue(1) += P(1,3 | 4)

[Figure: a called-out 4 fragments into the pairs (3,1), (2,2), (1,3)]

SLIDE 72

The two dice

  • Update rules

[Figure: a called-out 4 fragments into the pairs (3,1), (2,2), (1,3)]

count(K) = Σ_{N=2..12} #N · P(K, N−K | N)

P(K, N−K | N) = P(K) P(N−K) / Σ_{J=1..6} P(J) P(N−J)
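The per-sum fragmentation can be sketched for identically distributed dice (the i.i.d. assumption is mine; with two differently loaded dice you would keep two distributions). Tested here with fair dice and a called-out sum of 4:

```python
def pair_posterior(total, p_die):
    """P(first die = K | sum = total) for two i.i.d. dice:
    P(K) P(total-K) / sum over J of P(J) P(total-J)."""
    joint = {k: p_die.get(k, 0.0) * p_die.get(total - k, 0.0)
             for k in range(1, 7)}
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}

post = pair_posterior(4, {n: 1.0 / 6.0 for n in range(1, 7)})
```

For fair dice and a sum of 4, the pairs (1,3), (2,2) and (3,1) are equally likely, and impossible first-die values get posterior zero.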

SLIDE 73

Fragmentation can be hierarchical

  • E.g. mixture of mixtures
  • Fragments are further fragmented..

– Work this out

P(X) = Σ_k P(k) Σ_Z P(Z | k) P(X | k, Z)

[Figure: a two-level hierarchy — mixture components k1, k2 over sub-components Z1, Z2, Z3, Z4]

SLIDE 74

More later

  • Will see a couple of other instances of the use of EM
  • EM for signal representation: PCA and factor analysis
  • EM for signal separation
  • EM for parameter estimation
  • EM for homework..

SLIDE 75

Speaker Diarization

  • “Who is speaking when?”
  • Segmentation

– Determine when speaker change has occurred in the speech signal

  • Clustering

– Group together speech segments from the same speaker

Speaker B Speaker A

Which segments are from the same speaker? Where are speaker changes?

520-412/520-612 75

SLIDE 76

Speaker representation

Clustering of i-vectors

[Figure: i-vector speaker representations feeding a clustering step]

SLIDE 77

Speaker clustering


SLIDE 78

PCA Visualization


SLIDE 79