MIXTURE DENSITY NETWORKS
Charles Martin

SO FAR: RNNS THAT MODEL CATEGORICAL DATA
Remember that most RNNs (and most deep learning models) end with a softmax layer. This layer outputs a probability distribution over a set of categorical predictions, e.g.: image labels, letters, words, musical notes, robot commands, moves in chess.
Image Credit: Wikimedia
“Standard” probability distribution. Has two parameters: the mean (μ) and the standard deviation (σ). Probability density function:

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
What if the data is complicated? It’s easy to “fit” a normal model to any data: just calculate μ and σ. But this might not fit the data well.
Three groups of parameters:
means (μ): location of each component
standard deviations (σ): width of each component
weights (π): height of each curve

Probability density function:

p(x) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \sigma_i^2)
Returning to our modelling problem, let’s plot the PDF of an evenly-weighted mixture of the two sample normal models. We set:

K = 2
π = [0.5, 0.5]
μ = [−5, 5]
σ = [2, 3]

(π, μ, and σ are now vectors holding one parameter per component)

In this case, I knew the right parameters, but normally you would have to estimate, or learn, these somehow…
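For reference, here is a minimal sketch (not from the original slides) of evaluating and plotting this mixture density, assuming numpy, scipy and matplotlib are available:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

pi = [0.5, 0.5]     # mixture weights
mu = [-5.0, 5.0]    # component means
sigma = [2.0, 3.0]  # component standard deviations

x = np.linspace(-15, 15, 500)
# The mixture PDF is the weighted sum of the component normal densities.
pdf = sum(w * norm.pdf(x, loc=m, scale=s) for w, m, s in zip(pi, mu, sigma))

plt.plot(x, pdf)
plt.xlabel('x')
plt.ylabel('p(x)')
plt.show()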
Neural networks are often used to model complicated real-valued data, i.e., data that might not be very “normal”. The usual approach: use a neuron with linear activation to make predictions, trained with an MSE (mean squared error) loss. Problem! This is equivalent to fitting a single normal model to the data! (See Bishop, C. (1994) for proof and more details.)
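To see roughly why (a compressed sketch of the argument, not Bishop’s full derivation): for a fixed σ, the negative log-likelihood of a target t under a single Gaussian centred on the network output y(x) is

-\ln \mathcal{N}\big(t \mid y(x), \sigma^2\big) = \frac{\big(t - y(x)\big)^2}{2\sigma^2} + \ln\big(\sigma\sqrt{2\pi}\big)

so minimising it over the network weights is exactly minimising the squared error: the network can only learn a conditional mean, not a multi-modal distribution.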
Idea: output the parameters of a mixture model instead! Rather than MSE for training, use the PDF of the mixture model to define the loss. Now the network can model complicated distributions!
Difficult data is not hard to find! Think about modelling an inverse sine (arcsine) function. Each input value corresponds to multiple output values… This is not going to go well for a single normal model.
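For concreteness, the x_data and y_data used in the code below could be generated along these lines (a sketch under my own assumptions, not the original data-generation code):

import numpy as np

# Many different y values share the same x = sin(y), so y given x is multi-modal.
n_samples = 3000
y_data = np.random.uniform(-10, 10, n_samples)                      # targets spread over several periods
x_data = np.sin(0.75 * y_data) + 0.1 * np.random.randn(n_samples)   # noisy inputs
x_data = x_data.reshape(-1, 1)
y_data = y_data.reshape(-1, 1)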
Here’s a simple two-hidden-layer network (286 parameters), trained to produce the above result.
from keras.models import Sequential   # or tensorflow.keras, depending on your setup
from keras.layers import Dense

model = Sequential()
model.add(Dense(15, batch_input_shape=(None, 1), activation='tanh'))
model.add(Dense(15, activation='tanh'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='rmsprop')
model.fit(x=x_data, y=y_data, batch_size=128, epochs=200, validation_split=0.15)
The loss function for an MDN is the negative log of the likelihood function L. L measures the likelihood of the target t being drawn from a mixture parametrised by μ, σ, and π, which the network generates from the input x:

L = \sum_{i=1}^{K} \pi_i(x) \, \mathcal{N}\big(\mu_i(x), \sigma_i^2(x); t\big)
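Written out, the per-example training loss (call it E; this symbol is my own notation) is just the negative log of L, averaged over the batch:

E(x, t) = -\ln \sum_{i=1}^{K} \pi_i(x) \, \mathcal{N}\big(\mu_i(x), \sigma_i^2(x); t\big)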
And here’s a simple two-hidden-layer MDN (510 parameters) that achieves the above result! Much better!

import mdn   # the Keras MDN layer package
from keras.models import Sequential
from keras.layers import Dense

N_MIXES = 5

model = Sequential()
model.add(Dense(15, batch_input_shape=(None, 1), activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(mdn.MDN(1, N_MIXES))  # here's the MDN layer!
model.compile(loss=mdn.get_mixture_loss_func(1, N_MIXES), optimizer='rmsprop')
model.summary()
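Training then looks just like the earlier network; assuming the same x_data and y_data as before:

model.fit(x=x_data, y=y_data, batch_size=128, epochs=200, validation_split=0.15)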
Here’s the same network without using the MDN layer abstraction (this is with Keras’ functional API):

from keras.models import Model
from keras.layers import Input, Dense, Concatenate
from keras import backend as K

def elu_plus_one_plus_epsilon(x):
    """ELU activation with a very small addition to help prevent NaN in loss."""
    return K.elu(x) + 1 + 1e-8

N_HIDDEN = 15
N_MIXES = 5

inputs = Input(shape=(1,), name='inputs')
hidden1 = Dense(N_HIDDEN, activation='relu', name='hidden1')(inputs)
hidden2 = Dense(N_HIDDEN, activation='relu', name='hidden2')(hidden1)
mdn_mus = Dense(N_MIXES, name='mdn_mus')(hidden2)                                              # means
mdn_sigmas = Dense(N_MIXES, activation=elu_plus_one_plus_epsilon, name='mdn_sigmas')(hidden2)  # standard deviations (kept positive)
mdn_pi = Dense(N_MIXES, name='mdn_pi')(hidden2)                                                # mixture weights (as logits)
mdn_out = Concatenate(name='mdn_outputs')([mdn_mus, mdn_sigmas, mdn_pi])

model = Model(inputs=inputs, outputs=mdn_out)
model.summary()
The loss function for the MDN should be the negative log-likelihood. Here it is in full; then let’s go through it bit by bit…
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def mdn_loss(y_true, y_pred):
    # Split the inputs into parameters
    out_mu, out_sigma, out_pi = tf.split(y_pred, num_or_size_splits=[N_MIXES, N_MIXES, N_MIXES],
                                         axis=-1, name='mdn_coef_split')
    mus = tf.split(out_mu, num_or_size_splits=N_MIXES, axis=1)
    sigs = tf.split(out_sigma, num_or_size_splits=N_MIXES, axis=1)
    # Construct the mixture model
    cat = tfd.Categorical(logits=out_pi)
    coll = [tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale)
            for loc, scale in zip(mus, sigs)]
    mixture = tfd.Mixture(cat=cat, components=coll)
    # Calculate the loss function
    loss = mixture.log_prob(y_true)
    loss = tf.negative(loss)
    loss = tf.reduce_mean(loss)
    return loss

model.compile(loss=mdn_loss, optimizer='rmsprop')
First we have to extract the mixture parameters. Split up the parameters μ, σ, and π; remember that there are N_MIXES = K of each of these. μ and σ have to be split again so that we can iterate over them (you can’t iterate over a single tensor, but you can iterate over a list of tensors).

# Split the inputs into parameters
out_mu, out_sigma, out_pi = tf.split(y_pred, num_or_size_splits=[N_MIXES, N_MIXES, N_MIXES],
                                     axis=-1, name='mdn_coef_split')
mus = tf.split(out_mu, num_or_size_splits=N_MIXES, axis=1)
sigs = tf.split(out_sigma, num_or_size_splits=N_MIXES, axis=1)
Now we have to construct the mixture model’s PDF. For this, we’re using the Mixture abstraction provided in tensorflow_probability.distributions. This takes a categorical (a.k.a. softmax, a.k.a. generalized Bernoulli) distribution and a list of the component distributions. Each normal component is constructed with tfd.MultivariateNormalDiag, which for a one-dimensional target is just a normal PDF (matching the full loss function above). We could do this from first principles as well, but it’s good to use the abstractions that are available.

# Construct the mixture model
cat = tfd.Categorical(logits=out_pi)
coll = [tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale)
        for loc, scale in zip(mus, sigs)]
mixture = tfd.Mixture(cat=cat, components=coll)
Finally, we calculate the loss: mixture.log_prob(y_true) means “the log-likelihood of sampling y_true from the distribution called mixture”. We negate it (so that higher likelihood means lower loss) and take the mean over the batch.

# Calculate the loss function
loss = mixture.log_prob(y_true)
loss = tf.negative(loss)
loss = tf.reduce_mean(loss)
This “version” of a mixture model works for a mixture of 1D normal distributions. Not too hard to extend to multivariate normal distributions, which are useful for lots of problems. This is how it actually works in my Keras MDN layer; have a look at the code for more details…
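For intuition, here is a sketch of the multivariate case (my own sketch, not the exact layer code): each component’s loc and scale_diag simply gain more dimensions. OUTPUT_DIMS is an assumed constant for the target dimensionality.

# The network now outputs N_MIXES*OUTPUT_DIMS mus, N_MIXES*OUTPUT_DIMS sigmas, and N_MIXES pis.
out_mu, out_sigma, out_pi = tf.split(
    y_pred,
    num_or_size_splits=[N_MIXES * OUTPUT_DIMS, N_MIXES * OUTPUT_DIMS, N_MIXES],
    axis=-1)
mus = tf.split(out_mu, num_or_size_splits=N_MIXES, axis=1)      # each: (batch, OUTPUT_DIMS)
sigs = tf.split(out_sigma, num_or_size_splits=N_MIXES, axis=1)  # each: (batch, OUTPUT_DIMS)
cat = tfd.Categorical(logits=out_pi)
coll = [tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale)
        for loc, scale in zip(mus, sigs)]
mixture = tfd.Mixture(cat=cat, components=coll)
loss = tf.reduce_mean(-mixture.log_prob(y_true))  # y_true: (batch, OUTPUT_DIMS)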
MDNs can be handy at the end of an RNN! Imagine a robot calculating moves forward through space: it might have to choose from a number of valid positions, each of which could be modelled by a 2D normal distribution. This can be as simple as putting an MDN layer after the recurrent layers!
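As a sketch of what that might look like (using the same mdn layer package as above; OUTPUT_DIMS, SEQ_LEN and the layer sizes are my own assumed values):

import mdn
from keras.models import Sequential
from keras.layers import LSTM

OUTPUT_DIMS = 2   # e.g., an (x, y) position
N_MIXES = 5
SEQ_LEN = 30

model = Sequential()
model.add(LSTM(64, batch_input_shape=(None, SEQ_LEN, OUTPUT_DIMS), return_sequences=False))
model.add(mdn.MDN(OUTPUT_DIMS, N_MIXES))
model.compile(loss=mdn.get_mixture_loss_func(OUTPUT_DIMS, N_MIXES), optimizer='rmsprop')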
Handwriting Generation RNN (Graves, 2013). Trained on handwriting data. Predicts the next location of the pen (dx, dy, and up/down). Network takes text to write as an extra input; RNN learns to decide what character to write next.
SketchRNN Kanji (Ha, 2015); similar to handwriting generation, trained on kanji and then generates new “fake” characters.
SketchRNN VAE (Ha et al., 2017); similar again, but trained on human-sourced sketches, with the mixture density network in the decoder part.
RoboJam (Martin et al., 2018); similar to the kanji RNN, but trained on touchscreen musical performances. Extra complexity: have to model touch position (x, y) and time (dt). Implemented in my MicroJam app (have a go: microjam.info).
(Ha & Schmidhuber, 2018) World Models
(Ha & Schmidhuber, 2018) Train a VAE for visual perception an environment (e.g., VizDoom), now each frame from the environment can be represented by a vector z World Models
(Ha & Schmidhuber, 2018) Train a VAE for visual perception an environment (e.g., VizDoom), now each frame from the environment can be represented by a vector z Train MDN to predict next z, use this to help train an agent to operate in the environment. World Models
REFERENCES

Bishop, C. M. (1994). Mixture Density Networks. Technical Report NCRG/94/004, Neural Computing Research Group, Aston University.

… uncertainty estimation. Master’s thesis, Universitat Politècnica de Catalunya.

Graves, A. (2013). Generating Sequences with Recurrent Neural Networks. ArXiv e-prints (Aug. 2013). arXiv:1308.0850.

Ha, D., & Eck, D. (2017). A Neural Representation of Sketch Drawings. ArXiv e-prints (April 2017). arXiv:1704.03477.

Martin, C. P., et al. (2018). RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction. In Evolutionary and Biologically Inspired Music, Sound, Art and Design: EvoMUSART ’18, A. Liapis et al. (Eds.). DOI: 10.1007/978-3-319-77583-8_11.

Ha, D., & Schmidhuber, J. (2018). World Models. arXiv:1809.01999.