SLIDE 1

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

  • F. Schnitzler

University of Liège

24 September 2012

SLIDE 2

Density estimation

Density estimation consists of learning a joint probability density P(X) from N realisations of the problem: $\{x_i\}_{i=1}^{N}$, with $x_i \sim P(X)$.

Example: estimating P(X = "result of a dice throw").

◮ A realisation belongs to {1, 2, 3, 4, 5, 6}.
◮ 10 realisations: D = (2, 3, 1, 3, 6, 1, 4, 2, 6, 2).
◮ A possible estimate, based on these realisations:

P(X = 1) = 2/10, P(X = 2) = 3/10, P(X = 3) = 3/10, P(X = 4) = 1/10, P(X = 5) = 0/10, P(X = 6) = 1/10.
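A minimal sketch of this counting estimate, in Python:

```python
from collections import Counter

# The 10 realisations from the example above.
D = [2, 3, 1, 3, 6, 1, 4, 2, 6, 2]

# Empirical estimate: P(X = x) = (occurrences of x) / N.
counts = Counter(D)
P_hat = {x: counts[x] / len(D) for x in range(1, 7)}
print(P_hat)  # {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1, 5: 0.0, 6: 0.1}
```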

In this thesis: density estimation for high-dimensional problems:

◮ high number of discrete variables p (thousands or more),
◮ low number of samples N (hundreds).

SLIDE 3

Density estimation for electrical networks

Dimensions: tens of thousands to millions (depending on the level of detail). Example: on the order of 10 000 transmission nodes at the extra-high voltage level, 100 000 wind turbines.

Prediction: power flow into/out of countries, based on local consumption? Production of solar/wind energy, based on the weather? Power in each line, based on production?

SLIDE 4

Density estimation in bioinformatics

Dimensions: thousands to tens of thousands of genes, hundreds of thousands of proteins.

Prediction: effect of a combination of diseases and treatments on gene expression levels? Most efficient medicine to tackle a particular disease?

SLIDE 5

Problem solving with probabilistic graphical models

[Diagram: learning set → learning algorithm → probabilistic graphical model → inference algorithm; question → inference algorithm → answer]

Definitions

Learning refers to the automatic construction of a model from a set of observations, the learning set. It may be done only once.

A probabilistic graphical model encodes a probability density, e.g. a joint probability density over a set of variables: P(X).

Probabilistic inference, on a given model and for a particular question, consists of computing an answer to the query. The more general the model, the more questions can be answered.

SLIDE 6

What is so difficult about it?

Algorithmic complexity

Algorithmic complexity refers to the asymptotic complexity of an algorithm as a function of the size of the input problem. Example: an O(p) algorithm has a run-time that increases linearly with p, as p → ∞. For problems with a high number of variables p, the algorithms must have a complexity that is a very low-order polynomial in p.

Small number of samples → high variance

Having few samples leads to variability in the models constructed. Illustration: dice throw example, 2 sequences of observations:

◮ D1 = (2, 3, 1, 3, 6, 1, 4, 2, 6, 2) : P1("5") = 0/10
◮ D2 = (1, 5, 2, 6, 1, 1, 6, 1, 6, 5) : P2("5") = 2/10

Both models cannot be right: variance is a source of error.

SLIDE 8

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

Goal: density estimation for high p (number of variables) and low N (number of samples).

Algorithmic complexity → simple models (≡ Markov trees).

High variance → simple models → mixture of simple models.

[Diagram: a mixture of three Markov trees over {A, B, C, D}, with weights µ1, µ2, µ3]

$\hat{P}_T(X) = \sum_{i=1}^{m} \mu_i P_{T_i}(X)$
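A sketch of how such a mixture is evaluated, in Python (the per-tree densities are assumed to be given as callables):

```python
def mixture_density(x, weights, tree_densities):
    """Evaluate P_T(x) = sum_i mu_i * P_Ti(x) for one joint configuration x.

    weights:        mixture weights mu_i (non-negative, summing to 1)
    tree_densities: callables, tree_densities[i](x) returning P_Ti(x)
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(mu * p(x) for mu, p in zip(weights, tree_densities))
```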

SLIDE 9

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

1 Background
2 Contributions (x3)
3 Final words

Background: What is it you are doing again?
⋆ Probabilistic graphical models
⋆ Tree-structured probabilistic graphical models
⋆ Mixtures

SLIDE 11

A Bayesian network is a PGM encoding a joint probability density over a set of variables X.

A Bayesian network is composed of 2 elements. The directed acyclic graph structure G encodes (conditional independence) relationships among variables. The set of parameters θ quantifies the probability density.

[Diagram: example network with edges Burglary → Alarm, Earthquake → Alarm, Earthquake → RadioNews, Alarm → NeighborCall]

The value of each variable is either "yes" or "no".

The Bayesian network reduces the number of parameters stored:

P(B, E, A, R, N) = P(B) P(E|B) P(A|B, E) P(R|B, E, A) P(N|B, E, A, R)
= P(B) P(E) P(A|B, E) P(R|E) P(N|A)

Parameters: $2^5 - 1 = 31$ ↔ $1 + 1 + 4 + 2 + 2 = 10$.
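As an illustration of the reduced factorization, a sketch in Python (the CPT numbers are placeholders, not values from the thesis):

```python
# Placeholder conditional probability tables for the factorization
# P(B) P(E) P(A|B,E) P(R|E) P(N|A); each variable is True ("yes") or False.
P_B = 0.01                                    # P(B = yes)          1 parameter
P_E = 0.02                                    # P(E = yes)          1 parameter
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=yes|B,E)  4 parameters
P_R = {True: 0.9, False: 0.0001}              # P(R = yes | E)      2 parameters
P_N = {True: 0.7, False: 0.01}                # P(N = yes | A)      2 parameters

def joint(b, e, a, r, n):
    """P(B=b, E=e, A=a, R=r, N=n) from the 10 stored parameters."""
    def bern(p, value):              # P(value) for a Bernoulli parameter p
        return p if value else 1 - p
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_R[e], r) * bern(P_N[a], n))
```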

SLIDE 12

A Markov tree is a Bayesian network whose graphical structure is constrained to be a (connected) tree.

[Diagram: four directed graphs over X1, X2, X3, X4, classifying each as a Bayesian network or not, a Markov forest or not, and a Markov tree or not]

SLIDE 13

The class of models considered is a tradeoff between capacity of representation and computational complexity.

[Diagram: learning/inference pipeline, as on SLIDE 5]

             Bayesian networks                       Markov trees
Complexity   superexponential in p for both          tractable for both learning
             learning and inference                  (O(p² log p)) and inference (O(p))
Accuracy     any probability density                 only a restricted set of probability
             can be encoded                          densities can be encoded

Capacity of representation might be detrimental to accuracy!

SLIDE 15

Average error of a model learnt = bias + variance.

[Figure: two panels, "Overly simple class of models" and "Overly complex class of models"; each shows the target polynomial, the mean model learned, and the standard-deviation limits]

Bias is the difference between the mean model learned (over all possible sets of observations of a given size) and the best model. Variance is the average variability of a model learnt, with respect to the mean model learned.

When the complexity of the class of models increases:

◮ bias tends to decrease,
◮ variance tends to increase.

SLIDE 16

Constructing a mixture can reduce the variance.

Normal algorithm: learning set → learning algorithm. Error: bias + variance(D).

Perturb: randomize the learning algorithm. Error: bias + variance(D) + variance(algo).

Perturb & Combine: generate several models and combine them. Error: bias + variance(D) + variance(mixture algo), where, for m trees learnt with the perturbed algorithm,

$\hat{P}(X) = \frac{1}{m} \sum_{i=1}^{m} P_i(X), \qquad m \to \infty.$

SLIDE 17

The validation of the algorithms is empirical.

[Diagram: target distributions P generate learning sets and test sets; the algorithms tested produce mixtures from the learning sets, scored for efficiency and accuracy on the test sets; reference algorithms produce reference models and reference scores]

SLIDE 22

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

1 Background
2 Contributions
◮ Repeatedly learning a perturbed Markov tree
◮ Building a sequence of Markov trees
◮ Combining mixtures
3 Final words

SLIDE 23

Learning a Markov tree is a 2-step process.

[Diagram: learning set → learning algorithm; error = bias + variance(D)]

1. Learn the best structure T, given the observations.
◮ Define a score on the structures, and find the structure maximizing it.
2. Learn the parameters θ for the selected structure.
◮ This amounts to counting observations.

NB: parameter learning in this thesis uses the smoothed estimate

$P(X_i = x \mid Pa_{X_i} = a) = \dfrac{1 + N_D(a, x)}{|Val(X_i)| + \sum_{x' \in Val(X_i)} N_D(a, x')}$,

where $N_D(a, x)$ is the number of samples where $X_i = x$ and $Pa_{X_i} = a$.
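A sketch of this estimator in Python, assuming the samples for one variable are given as (parent-configuration, value) pairs:

```python
from collections import Counter

def smoothed_cpt(pairs, values):
    """P(X = x | Pa_X = a) with one pseudo-count per value of X.

    pairs:  observed (a, x) samples, a = parent configuration, x = value of X
    values: Val(X), the list of possible values of X
    """
    n = Counter(pairs)  # n[(a, x)] = N_D(a, x)
    parent_configs = {a for a, _ in pairs}
    return {
        (a, x): (1 + n[(a, x)])
                / (len(values) + sum(n[(a, x2)] for x2 in values))
        for a in parent_configs for x in values
    }
```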

SLIDE 24

The Chow-Liu algorithm learns the Markov tree structure that maximizes the likelihood of the learning set.

$T_{CL}(D) = \arg\max_{T} \sum_{(X,Y) \in E(T)} I_D(X; Y)$

[Diagram: learning set (p variables, N samples) → complete graph over {A, B, C, D} weighted by empirical mutual informations such as $I_D(A; B)$ and $I_D(C; D)$ → maximum weight spanning tree → tree oriented from an arbitrary root]

Best modeling of the learning set. Structure learning is a 3-step process:

1. Construction of a complete graph whose edge weights are empirical mutual informations (O(p²N)).
2. Computation of the maximum weight spanning tree (O(p² log p)).
3. Orientation from an arbitrary root (O(p)).

→ Complexity is O(p² log p).
→ Bottleneck: the number of candidate edges for each tree.
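A compact sketch of the three steps in Python/NumPy, for data given as an N × p array of small non-negative integers (a naive Prim loop stands in for the O(p² log p) spanning-tree step):

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I_D(X; Y) of two discrete columns."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1.0)
    joint /= len(x)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum())

def chow_liu(data):
    """Edges of the maximum weight spanning tree of the complete graph
    weighted by empirical mutual information, oriented away from root 0."""
    p = data.shape[1]
    w = np.zeros((p, p))
    for i in range(p):                       # step 1: O(p^2 N) edge weights
        for j in range(i + 1, p):
            w[i, j] = w[j, i] = mutual_information(data[:, i], data[:, j])
    in_tree, edges = {0}, []                 # step 2: spanning tree (Prim)
    while len(in_tree) < p:
        i, j = max(((a, b) for a in in_tree for b in range(p)
                    if b not in in_tree), key=lambda e: w[e])
        in_tree.add(j)
        edges.append((i, j))                 # step 3: (i, j) already points
    return edges                             #         away from the root
```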

SLIDE 33

Example of perturbation: bagging

[Diagram: m bootstrap replicates of the learning set, each passed to the Chow-Liu algorithm, yielding trees 1 ... m]

Average over m max-likelihood trees learnt from m bootstrap replicates. A bootstrap replicate D′ of a sample set D is the same size as D and is drawn with replacement from D.

→ Complexity is O(mp² log p).

In my case, the parameters θ are learned on D, not D′.

Probability Density Estimation by Perturbing and Combining Tree Structured Markov Networks, S. Ammar, P. Leray, B. Defourny and L. Wehenkel, ECSQARU 2009.
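Continuing the chow_liu sketch above, the bagged structures might be generated as follows (a sketch; as stated, the parameters θ would then be re-estimated on the original D):

```python
import numpy as np

def bagged_structures(data, m, seed=0):
    """m Markov tree structures, each learned by Chow-Liu on a
    bootstrap replicate D' of the data; the mixture weights are 1/m."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    structures = []
    for _ in range(m):
        replicate = data[rng.integers(0, n, size=n)]  # draw with replacement
        structures.append(chow_liu(replicate))        # structure from D'
    return structures  # fit each tree's parameters on D, not D'
```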
SLIDE 34

Learning the parameters on the original learning set leads to better accuracy.

The more trees in the mixture, the better the accuracy.

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of Markov trees m (for the mixtures), comparing a Chow-Liu tree, a mixture of bagged Chow-Liu trees (structure only), and a mixture of bagged Chow-Liu trees (structure and parameters)]

Averaged results on 10 randomly generated Bayesian networks (1000 binary variables) × 10 learning sets, for N = 300 samples.

SLIDE 35

To reduce algorithmic complexity, only a subset of K candidate edges is considered.

[Diagram: learning set → random subset of candidate edges over {A, B, C, D} → maximum weight spanning tree]

A simple way to reduce complexity is therefore to randomly select a subset of K edges. Reduction in complexity (for each term):

◮ construction of an incomplete graph: O(KN),
◮ computation of the maximum weight spanning tree: O(K log K).

The higher K, the closer to the Chow-Liu algorithm.
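A sketch of this randomized variant, reusing the mutual_information helper from the Chow-Liu sketch (a Kruskal pass with union-find; it may return a forest if the K sampled edges do not connect all variables):

```python
import numpy as np

def random_edge_tree(data, K, seed=0):
    """Score only K randomly chosen candidate edges, then keep a maximum
    weight spanning tree/forest among them."""
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    candidates = [(i, j) for i in range(p) for j in range(i + 1, p)]
    chosen = rng.choice(len(candidates), size=K, replace=False)
    scored = [(mutual_information(data[:, i], data[:, j]), i, j)  # O(KN)
              for i, j in (candidates[k] for k in chosen)]
    parent = list(range(p))                                       # union-find
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    forest = []
    for _, i, j in sorted(scored, reverse=True):                  # O(K log K)
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j))
    return forest
```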

SLIDE 36

Intuitively, the structure of the problem can be exploited to target more interesting edges.

In a Euclidean space, similar problems can be approximated by sub-quadratic algorithms. When 2 points B and C are both close to A, they are likely to be close to each other as well:

$d(B, C) \le d(A, B) + d(A, C)$.

Mutual information is not a Euclidean distance. However, the same reasoning can be applied: if the pairs (A, B) and (A, C) have high mutual information values, I(B; C) may be high as well:

$I(B; C) \ge I(A; B) + I(A; C) - H(A)$.
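One way to see why this lower bound holds (not necessarily the argument used in the thesis) is via the chain $I(A;C \mid B) \le H(A \mid B)$:

```latex
I(A;C) - I(B;C) = H(C \mid B) - H(C \mid A)
                \le H(C \mid B) - H(C \mid A, B)
                 = I(A;C \mid B) \le H(A \mid B),
```

and since $I(A;B) - H(A) = -H(A \mid B)$, rearranging gives $I(B;C) \ge I(A;B) + I(A;C) - H(A)$.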

SLIDE 37

I want to group related variables together and to use this structure to target interesting edges.

The algorithm:

◮ builds a clustering of the variables and relationships between clusters,
◮ exploits this structure to target interesting edges, i.e. edges with a strong associated mutual information,
◮ uses these edges to build a Markov tree.

[Diagram: learning set → interesting edges and associated mutual information values → MWST → Markov tree structure]

Classical approach to building clusters: compute a distance for each pair of objects → not suitable here (quadratic in p).

SLIDE 38

The clusters are built iteratively, one at a time:

A center (X5) is randomly chosen and compared to the 12 other variables.

SLIDE 39

The clusters are built iteratively, one at a time:

The first cluster is created: it is composed of 5 members and 1 neighbour. Variables are assigned to a cluster based on two thresholds and their empirical mutual information with the center of the cluster.

SLIDE 40

The clusters are built iteratively, one at a time:

The second cluster is built around X13. It is only compared to the 7 remaining variables.

SLIDE 41

The clusters are built iteratively, one at a time:

After 4 iterations, all variables belong to a cluster. The algorithm stops.
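A sketch of the cluster-construction loop, with hypothetical threshold names t_member and t_neighbour (the exact bookkeeping of neighbours is an assumption inferred from the example counts above); mutual_information is the helper from the Chow-Liu sketch:

```python
import numpy as np

def build_clusters(data, t_member, t_neighbour, seed=0):
    """Iteratively pick a random center among the unassigned variables and
    compare it only to the other unassigned variables: MI >= t_member makes
    a member, t_neighbour <= MI < t_member a neighbour of the cluster."""
    rng = np.random.default_rng(seed)
    unassigned = set(range(data.shape[1]))
    clusters = []
    while unassigned:
        center = int(rng.choice(sorted(unassigned)))
        members, neighbours = {center}, set()
        for v in unassigned - {center}:
            mi = mutual_information(data[:, center], data[:, v])
            if mi >= t_member:
                members.add(v)
            elif mi >= t_neighbour:
                neighbours.add(v)
        unassigned -= members | neighbours   # assumption: both leave the pool
        clusters.append((center, members, neighbours))
    return clusters
```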

SLIDE 42

The clusters are exploited to target interesting edges:

Computation of mutual information among variables belonging to the same cluster.

SLIDE 43

The clusters are exploited to target interesting edges:

Two clusters containing neighbor variables are neighbors.

SLIDE 44

The clusters are exploited to target interesting edges:

Computation of mutual information between variables belonging to neighboring clusters.

SLIDE 45

The mixtures developed can achieve better accuracy than a single Chow-Liu tree.

Targeting interesting edges is better than random subsampling.

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of Markov trees m (for the mixtures), comparing a Chow-Liu tree, a mixture of cluster-based trees, a mixture of bagged Chow-Liu trees, and a mixture of random edge-subsampling trees]

Averaged results on 10 randomly generated Bayesian networks (1000 binary variables) × 10 learning sets, for N = 300 samples.

SLIDE 46

Targeting interesting rather than random edges also leads to an improvement in computational complexity.

Table: Relative CPU times for training the Chow-Liu tree and 10×10 mixtures of size m = 100, cumulated on 100 data sets of 1000 samples and 200 variables. (1 ≈ 16.5 hours)

Random edge subsampling   1.08
Cluster-based trees       1
Bagged Chow-Liu trees     2.8
(Single Chow-Liu tree     0.03)

This improvement might be a consequence of the determinism of the cluster-based learning procedure, once a root is chosen. However, the number of edges, and therefore the run-time, of this procedure is not directly controllable.

SLIDE 47

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

1 Background
2 Contributions
◮ Repeatedly learning a perturbed Markov tree
◮ Building a sequence of Markov trees
◮ Combining mixtures
3 Final words

Previous chapter: [diagram: each Markov tree is learned independently from the learning set].

This chapter: [diagram: the Markov trees are learned as a sequence, reusing information from the first tree].

SLIDE 48

Structural knowledge obtained from the first tree learning operation can guide the other tree learning procedures.

Starting each tree learning operation from scratch throws away valuable information about the problem. Exploit the first tree to select a good subset S of candidate edges. The search is then constrained to the tree (or forest) structures spanning S.

Complexity: O(mp² log p) → O(p² log p + m|S| log |S|).

[Diagram: the first tree T1 is learned from the original learning set and defines the skeleton S by edge selection; trees T2 ... Tm are then learned from bootstrap replicates, constrained to S]
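A sketch of the overall loop; mst_over_edges (a maximum weight spanning tree/forest restricted to a given edge set, e.g. the Kruskal pass above) is an assumed helper, and the MI values of the first run are reused for the selection of S:

```python
import numpy as np

def skeleton_mixture(data, m, mi_threshold, seed=0):
    """First tree from the full graph on D; the m-1 following trees are
    spanning trees over the skeleton S only, with the edge weights of S
    recomputed on bootstrap replicates of D."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    mi = {(i, j): mutual_information(data[:, i], data[:, j])
          for i in range(p) for j in range(i + 1, p)}   # O(p^2 N), done once
    trees = [mst_over_edges(p, mi)]                     # T1: full graph
    S = [e for e, v in mi.items() if v > mi_threshold]  # skeleton
    for _ in range(m - 1):
        d = data[rng.integers(0, n, size=n)]            # bootstrap replicate
        w = {(i, j): mutual_information(d[:, i], d[:, j]) for i, j in S}
        trees.append(mst_over_edges(p, w))              # O(|S| log |S|)
    return trees
```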

SLIDE 50

The skeleton S should contain edges with a strong associated mutual information.

Edges with weak weights are

◮ not likely to be part of a tree (even if the weights are perturbed),
◮ probably not meaningful (noise, or not a direct relation).

→ We can ignore them in the search.

SLIDE 51

Edges are included in S based on an independence test.

Compare $I_D(X; Y)$ (which, suitably scaled, is χ²-distributed under independence) to a threshold depending on a postulated p-value, say α = 0.05 or smaller.

S contains the pairs of variables whose mutual information (on the original data set) is above the threshold.

Mutual information values are a by-product of the computation of the first tree.
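A sketch of the corresponding threshold, using the classical relation that $2N \cdot I_D$ is asymptotically χ²-distributed with $(|Val(X)|-1)(|Val(Y)|-1)$ degrees of freedom under independence:

```python
from scipy.stats import chi2

def mi_threshold(N, alpha=0.05, df=1):
    """Smallest empirical mutual information whose edge enters S:
    reject independence when 2*N*I_D exceeds the chi2(df) quantile."""
    return chi2.ppf(1 - alpha, df) / (2 * N)

# e.g. with binary variables (df = 1) and N = 200 samples:
# keep edge (X, Y) in S whenever I_D(X; Y) > mi_threshold(200)
```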


SLIDE 52

Regularization is another way to reduce variance.

$F_{CL}(D) = \arg\max_{F} \left[ \sum_{(X,Y) \in E(F)} I_D(X; Y) - \lambda |E(F)| \right]$

New reference algorithm based on regularization: λ is optimized to maximize the evaluation score (i.e. on the test set) → best possible regularization (optimistic score).
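Since forests form a matroid, this penalized objective is maximized by a Kruskal pass that simply stops accepting edges once their weight drops to λ (a sketch, with mi a dict of precomputed edge weights):

```python
def regularized_forest(p, mi, lam):
    """arg max over forests of sum of I_D(edge) - lam * |E(F)|:
    greedily add maximum weight spanning tree edges while the
    next weight still exceeds lam."""
    parent = list(range(p))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    forest = []
    for (i, j), w in sorted(mi.items(), key=lambda kv: -kv[1]):
        if w <= lam:
            break                # any further edge lowers the objective
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j))
    return forest
```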

[Plot: $\hat{D}_{KL}(P \| P_{F_\lambda})$ versus the number of edges in the forest F, for the best possible regularization, compared to a tree]

SLIDE 53

The skeleton method can achieve an accuracy close to that of a mixture of bagged Chow-Liu trees, but is faster.

The mixtures are better than the regularized Chow-Liu algorithm.

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of trees (for the mixtures), comparing a Chow-Liu tree, a mixture of bagged Chow-Liu trees (≡ skeleton method [α = 1]), the skeleton method (α = 0.05), and a Chow-Liu forest regularized by an oracle]

Relative run-time for mixtures of 500 trees (one max-likelihood tree: 1):

◮ mixture of bagged Chow-Liu trees: 532,
◮ skeleton-based approximations: 21.

Averaged results on 5 randomly generated Bayesian networks (200 binary variables) × 6 learning sets of 200 samples.

SLIDE 54

A closer look at the influence of α

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of trees, for α = 5E−4, 5E−3 and 5E−2]

The smaller α:

◮ the smaller the variance of the first tree → increase in accuracy (here),
◮ but the larger the bias.

The larger α:

◮ the slower the convergence, but the better the accuracy,
◮ larger diversity in the Markov trees generated → better reduction of the variance by the mixture,
◮ the bias of these Markov trees is also smaller.

Averaged results on 5 randomly generated Bayesian networks (200 binary variables) × 6 learning sets of 200 samples.

SLIDE 56

Results on realistic learning sets

On 8 more realistic models × 2 learning set sizes (200 and 500), the skeleton-based method (α = 0.05) achieves:

◮ a worse accuracy than bagging in 3/16 settings,
◮ an accuracy similar to bagging in 9/16 settings,
◮ a better accuracy than bagging in 4/16 settings.

[Plots: negative log-likelihood versus the number of trees in the mixture, on two of the data sets, comparing a mixture of bagged Chow-Liu trees, the skeleton method (α = 0.05), a Chow-Liu tree, and a Chow-Liu forest regularized by an oracle]

SLIDE 57

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

1 Background
2 Contributions
◮ Repeatedly learning a perturbed Markov tree
◮ Building a sequence of Markov trees
◮ Combining mixtures
3 Final words

SLIDE 58

Mixtures can also be built to reduce the bias with respect to a single Chow-Liu tree.

Expectation-Maximization mixture

Learning the mixture is viewed as a global optimization problem aiming at maximizing the data likelihood. There is a bias-variance trade-off associated with the number of terms. Soft partition of the learning set: each tree models a subset of observations.
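A sketch of the EM loop; weighted_chow_liu (a Chow-Liu variant accepting sample weights) and tree_density (evaluating $P_{T_k}(x)$) are assumed helpers, not code from the thesis:

```python
import numpy as np

def em_mixture(data, m, n_iter=20, seed=0):
    """E-step: responsibilities r[k, i] proportional to mu_k * P_Tk(x_i);
    M-step: mu_k = mean responsibility, and each tree T_k is refit by a
    weighted Chow-Liu run, so tree k models its soft subset D_k."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    mu = np.full(m, 1.0 / m)
    r = rng.dirichlet(np.ones(m), size=n).T        # random soft initialization
    trees = [weighted_chow_liu(data, r[k]) for k in range(m)]
    for _ in range(n_iter):
        lik = np.array([[mu[k] * tree_density(trees[k], x) for x in data]
                        for k in range(m)])        # E-step
        r = lik / lik.sum(axis=0, keepdims=True)
        mu = r.mean(axis=1)                        # M-step
        trees = [weighted_chow_liu(data, r[k]) for k in range(m)]
    return mu, trees, r
```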

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of trees m (1 to 5), EM mixture versus Chow-Liu tree; example with 200 variables and 2000 samples]

SLIDE 59

I combine the two types of mixtures.

1. Build an EM mixture and the associated soft partition $\{D_k\}_{k=1}^{m_1}$.
2. Replace each tree $T_k$ by a variance-reducing mixture learnt on $D_k$ (sketched below).
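A sketch of step 2, assuming a bagged_structures_weighted variant of the earlier bagging sketch that resamples the data according to the responsibilities r[k] of the EM run:

```python
def two_level_mixture(data, mu, r, m2):
    """Replace each term k of the EM mixture by an ensemble of m2 bagged
    Chow-Liu trees learned on D_k (the data weighted by r[k]); the result is
    P(x) = sum_k mu_k * (1/m2) * sum_j P_Tkj(x)."""
    return [(mu_k, bagged_structures_weighted(data, r_k, m2))
            for mu_k, r_k in zip(mu, r)]
```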

[Diagram: the EM algorithm creates a soft partition of the data set. E.g., for $m_1 = 2$: two Chow-Liu trees T1 and T2 over {A, B, C, D}, with weights $\mu_1 = 3/4$ and $\mu_2 = 1/4$]

SLIDE 60

I combine the two types of mixtures.

[Diagram: two-level mixture $\hat{P}_T(X)$ with first-level weights $\mu_1, \ldots, \mu_{m_1}$ and second-level weights $\lambda_{k,1}, \ldots, \lambda_{k,m_2}$: each tree of the EM mixture is replaced by $m_2$ bagged Chow-Liu trees]

SLIDE 61

The combined method achieves higher accuracy than an EM mixture when learning a mixture of 3 Markov trees.

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of samples N (10² to 10⁴, logarithmic scale), EM mixture versus a mixture of 3 ensembles of 10 bagged CL trees]

Averaged results on 1 uniformly weighted mixture of 3 randomly generated Markov trees (200 binary variables) × 1 learning set × many initializations of the EM algorithm.

SLIDE 62

The mixture of ensembles of CL trees can lead to better accuracy than both the EM mixture and an ensemble of bagged Chow-Liu trees.

[Plot: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of trees m₁ (1 to 5), comparing a Chow-Liu tree, an EM mixture, an ensemble of 10 bagged Chow-Liu trees, and a mixture of ensembles of 10 bagged CL trees]

Averaged results on 5 randomly generated Bayesian networks (200 binary variables) × 6 learning sets of 2000 samples.

SLIDE 63

The mixture of ensembles of CL trees is better than the EM mixture, no matter what the optimal m1 is.

[Plots: $\hat{D}_{KL}(P \| \hat{P}_T)$ versus the number of trees m₁ (1 to 5), for 120 samples and for 8000 samples, comparing the EM mixture, the mixture of ensembles of 10 bagged CL trees, a Chow-Liu tree, and an ensemble of 10 bagged CL trees]

The 2-level mixture does not compensate for the increase in variance when m₁ increases. The gap between the EM and 2-level mixtures increases with m₁.

SLIDE 64

Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces

1 Background
2 Contributions
◮ Repeatedly learning a perturbed Markov tree
◮ Building a sequence of Markov trees
◮ Combining mixtures
3 Final words

SLIDE 65

Main contributions

◮ Randomized Chow-Liu algorithm and corresponding mixtures
◮ Skeleton-based approximation
◮ Combining bias- and variance-reducing mixtures
◮ Extensive evaluation on synthetic and realistic probability densities
  ⋆ in particular, comparison to regularization

SLIDE 66

Short term perspectives

Compare P&C mixtures to other methods:

◮ true Bayesian approaches,
◮ other two-level mixtures.

Understand why and when mixtures are good, and when the skeleton method is as good as bagging:

◮ application related: automatically set a value for the parameters (e.g. α),
◮ tool: better measure the bias and the 2 types of variance of the randomized methods.

SLIDE 67

Long term perspectives

New methods:

◮ regularization on each term of the mixture,
◮ new models:
  ⋆ bounded tree-width models,
  ⋆ conditional random fields.

New experimental conditions:

◮ other classes of target densities,
◮ other types of variables.

Applications: electrical networks, bioinformatics.

SLIDE 68

How can perturb and combine be applied to tree-structured CRFs?

[Diagram: tree-structured CRF: tree structure over C1, C2, C3, C4 with potentials φ1, φ2, φ3, and a feature mapping to X1, X2, X3, X4]

Conditional random fields encode a conditional probability density:

$P(C|X) = \frac{1}{Z(X)}\, \phi_1(C_1, C_2 \mid X_1, X_3, X_4)\, \phi_2(C_2, C_3 \mid X_2, X_4)\, \phi_3(C_2, C_4 \mid X_3, X_4)$

No modelling of P(X)! New targets for randomization with respect to mixtures of Markov trees:

◮ the feature mapping: which X are associated with each edge (Ci, Cj), i ≠ j?
◮ parameter learning: less trivial (done by gradient descent).

SLIDE 72

More realistic data sets (by C. Aliferis, A. Statnikov, I. Tsamardinos et al.).

Name          p     |Xi|   |E(G)|   |θ|
Alarm10       370   2-4    570      5468
Child10       200   2-6    257      2323
Gene          801   3-5    977      8348
Hailfinder10  560   2-11   1017     97448
Insurance10   270   2-5    556      14613
Link          724   2-4    1125     14211
Lung Cancer   800   2-3    1476     8452
Munin         189   1-21   282      15622
Pigs          441   3-3    592      3675

Table: Distributions from the literature and their characteristics. p corresponds to the number of variables, |Xi| to the range of cardinalities of single variables, |E(G)| to the number of edges and |θ| to the number of independent parameters in the original model.

SLIDE 73

More terms might be necessary in the 2nd level ensemble when estimating more complex probability densities.

Data set        N = 200            N = 500            N = 2500
              MT-EM  +bag,bagCL  MT-EM  +bag,bagCL  MT-EM  +bag,bagCL
Child10          -       25         -       25         -       10
Pigs            21        4         -       25         -       10
Alarm10          3       22         -       25         -       10
Gene            25        -         -       25         -       10
Lung Cancer     25        -         8       17         -       10
Link             3       22         -       25         -       10
Insurance10      2       23         1       24         -       10
Munin            1       24         6       19         -       10
Hailfinder10    25        -         -       25         -       10
ALL            105      120        40      185         -       90

Table: Best methods on realistic data sets (by increasing complexity) for 5 learning sets × several initializations of the EM mixture, with m1 = 2 and m2 = 10. N is the number of learning samples.