Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces


  1. Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces. F. Schnitzler, University of Liège, 24 September 2012.

  2. Density estimation. Density estimation consists in learning a joint probability density P(X), based on N realisations of the problem: {x_i ∼ P(X)}_{i=1}^N. Example: estimating P(X = "result of a die throw").
- A realisation belongs to {1, 2, 3, 4, 5, 6}.
- 10 realisations: D = (2, 3, 1, 3, 6, 1, 4, 2, 6, 2).
- A possible estimate, based on these realisations: P(X=1) = 2/10, P(X=2) = 3/10, P(X=3) = 3/10, P(X=4) = 1/10, P(X=5) = 0/10, P(X=6) = 1/10.
In this thesis: density estimation for high-dimensional problems:
- high number of discrete variables p (thousands or more),
- low number of samples N (hundreds).
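
To make the estimate concrete, here is a minimal sketch (mine, not from the slides) that computes the relative-frequency estimate from the 10 realisations above:

```python
# Empirical density estimate for the die example: the estimate of P(X = x)
# is simply the relative frequency of outcome x in the learning set.
D = [2, 3, 1, 3, 6, 1, 4, 2, 6, 2]  # 10 realisations
P = {x: D.count(x) / len(D) for x in range(1, 7)}
print(P)  # {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1, 5: 0.0, 6: 0.1}
```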

  3. Density estimation for electrical networks.
Dimensions: tens of thousands to millions of variables (depending on the level of detail). Example: on the order of 10 000 transmission nodes at the extra-high voltage level, 100 000 wind turbines.
Prediction: power flow in/out of countries, based on local consumption? Production of solar/wind energy, based on the weather? Power in each line, based on production?

  4. Density estimation in bioinformatics.
Dimensions: thousands to tens of thousands of genes, hundreds of thousands of proteins.
Prediction: effect of a combination of diseases and treatments on gene expression levels? Most efficient medicine to tackle a particular disease?

  5. Problem solving with probabilistic graphical models.
[Diagram: learning set → learning algorithm → probabilistic graphical model → inference algorithm (given a question) → answer.]
Definitions:
- Learning refers to the automatic construction of a model from a set of observations, the learning set. It may be done only once.
- A probabilistic graphical model encodes a probability density, e.g. a joint probability density over a set of variables: P(X).
- Probabilistic inference, on a given model and for a particular question, consists in computing an answer to the query.
The more general the model, the more questions can be answered.

  6. What is so difficult about it?
Algorithmic complexity refers to the asymptotic complexity of an algorithm as a function of the size of the input problem. Example: an O(p) algorithm has a run-time that increases linearly with p, as p → ∞. For problems with a high number of variables p, the algorithms must have a complexity that is a very low-order polynomial in p.
Small number of samples → high variance: having few samples leads to variability in the models constructed. Illustration: die throw example, 2 sequences of observations:
- D_1 = (2, 3, 1, 3, 6, 1, 4, 2, 6, 2): P_1("5") = 0/10
- D_2 = (1, 5, 2, 6, 1, 1, 6, 1, 6, 5): P_2("5") = 2/10
Both models cannot be right: variance is a source of error.
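
The variance point can be checked numerically; the following is an illustration of mine (not from the thesis) that draws many samples of size 10 from a fair die and measures how the estimate of P("5") fluctuates:

```python
# Variance of the empirical estimate: repeat the size-10 experiment many
# times and look at the spread of the estimated P("5") across repetitions.
import random

rng = random.Random(0)
estimates = []
for _ in range(1000):
    sample = [rng.randint(1, 6) for _ in range(10)]
    estimates.append(sample.count(5) / 10)

mean = sum(estimates) / len(estimates)
var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
print(mean, var)  # mean is close to 1/6; the spread is the variance
```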

  8. Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces.
Goal: density estimation for high p (number of variables) and low N (number of samples).
- algorithmic complexity → simple models (≡ Markov trees)
- high variance → mixture of simple models
[Figure: three Markov trees over the variables A, B, C, D, weighted by µ_1, µ_2, µ_3 and combined into a mixture.]
P̂_T(X) = Σ_{i=1}^m µ_i P_{T_i}(X)
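
A minimal sketch of the mixture formula, assuming the component densities P_{T_i} are already available as callables (the toy components below are stand-ins, not learned trees):

```python
# Mixture density: a weighted sum of m component densities, with
# nonnegative weights mu_i that sum to 1.
def mixture(components, weights):
    """components: list of callables x -> P_{T_i}(x); weights: list of mu_i."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return lambda x: sum(mu * p(x) for mu, p in zip(weights, components))

def p1(x):  # toy component over a single binary variable
    return 0.9 if x == 1 else 0.1

def p2(x):  # second toy component
    return 0.4 if x == 1 else 0.6

p_hat = mixture([p1, p2], [0.7, 0.3])
print(p_hat(1))  # 0.7*0.9 + 0.3*0.4 = 0.75
```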

  9. Mixtures of Tree-Structured Probabilistic Graphical Models for Density Estimation in High Dimensional Spaces.
Outline: 1 Background, 2 Contributions (x3), 3 Final words.
Background: What is it you are doing again?
- Probabilistic graphical models
- Tree-structured probabilistic graphical models
- Mixtures

  11. A Bayesian network is a PGM encoding a joint probability density over a set of variables X. A Bayesian network is composed of 2 elements:
- the directed acyclic graph structure G encodes (conditional independence) relationships among variables,
- the set of parameters θ quantifies the probability density.
[Figure: network over the variables Burglary, Earthquake, Alarm, RadioNews, NeighborCall; the value of each variable is either "yes" or "no".]
The Bayesian network reduces the number of parameters stored:
P(B, E, A, R, N) = P(B) P(E|B) P(A|B, E) P(R|B, E, A) P(N|B, E, A, R)
               = P(B) P(E) P(A|B, E) P(R|E) P(N|A)
parameters: 2^5 − 1 = 31 ↔ 1 + 1 + 4 + 2 + 2 = 10
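
The factorisation can be spelled out in code; the sketch below uses made-up CPT values (the slides give none) purely to show how 10 stored parameters replace the full 31-entry joint table:

```python
# Bayesian-network factorisation of slide 11, with illustrative (made-up)
# probabilities. Variables are binary: True = "yes", False = "no".
p_b = {True: 0.01, False: 0.99}                 # P(B): 1 free parameter
p_e = {True: 0.02, False: 0.98}                 # P(E): 1 free parameter
p_a = {  # P(A=yes | B, E): 4 free parameters
    (True, True): 0.95, (True, False): 0.90,
    (False, True): 0.30, (False, False): 0.01,
}
p_r = {True: 0.80, False: 0.05}                 # P(R=yes | E): 2 free parameters
p_n = {True: 0.70, False: 0.02}                 # P(N=yes | A): 2 free parameters

def joint(b, e, a, r, n):
    """P(B, E, A, R, N) computed via the factorisation, 10 parameters in all."""
    pa = p_a[(b, e)] if a else 1.0 - p_a[(b, e)]
    pr = p_r[e] if r else 1.0 - p_r[e]
    pn = p_n[a] if n else 1.0 - p_n[a]
    return p_b[b] * p_e[e] * pa * pr * pn

# 10 parameters instead of 2**5 - 1 = 31 for the full joint table.
print(joint(True, False, True, False, True))
```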

  12. A Markov tree is a Bayesian network where the graphical structure is constrained to be a (connected) tree.
[Figure: four example graphs over X_1, X_2, X_3, X_4, illustrating the nested classes (Markov trees ⊂ Markov forests ⊂ Bayesian networks): a graph that is not a Bayesian network, a Bayesian network that is not a Markov forest, a Markov forest that is not a Markov tree, and a Markov tree.]

  13. The class of models considered is a tradeoff between capacity of representation and computational complexity.
[Diagram: the learning/inference pipeline of slide 5.]
Bayesian networks vs Markov trees:
- Complexity: superexponential in p for both learning and inference, vs tractable for both learning (O(p^2 log p)) and inference (O(p)); a sketch of such a tree-learning procedure follows below.
- Accuracy: any probability density can be encoded, vs only a restricted set of probability densities can be encoded.
Capacity of representation might be detrimental to accuracy!
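
The O(p^2 log p) learning bound for Markov trees corresponds to the classical Chow-Liu procedure (pairwise mutual informations plus a maximum-weight spanning tree); the sketch below is my illustration under that assumption, not code from the thesis:

```python
# Minimal Chow-Liu sketch: score every pair of variables by empirical
# mutual information, then keep a maximum-weight spanning tree (Kruskal).
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information between two discrete columns."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def chow_liu_edges(data):
    """Return the edges of a maximum-weight spanning tree over the p variables."""
    n, p = data.shape
    edges = sorted(((mutual_information(data[:, i], data[:, j]), i, j)
                    for i in range(p) for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))
    def find(u):  # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:  # greedily add edges that join two components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

data = np.random.randint(0, 2, size=(200, 5))  # toy learning set, N=200, p=5
print(chow_liu_edges(data))
```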

  15. Average error of a model learnt = bias + variance.
[Figure: two panels, an overly simple class of models and an overly complex class of models, each showing the target polynomial, the mean model learned and the standard deviation limits.]
Bias is the difference between the mean model learned (over all possible sets of observations of a given size) and the best model. Variance is the average variability of a model learnt, with respect to the mean model learned.
When the complexity of the class of models increases:
- bias tends to decrease,
- variance tends to increase.

  16. Constructing a mixture can reduce the variance.
Normal algorithm: learning set → learning algorithm → model. Error: bias + Variance(D).
Perturb (randomize the learning algorithm): learning set → perturbed algorithm → model. Error: bias + Variance(D) + Variance(algo).
Perturb & Combine (generate several models and combine them): learning set → perturbed algorithm, run m times → m trees, averaged. Error: bias + Variance(D) + Variance(mixture algo), where the last term vanishes as m → ∞.
P(X) = (1/m) Σ_{i=1}^m P_i(X)
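
A toy illustration of Perturb & Combine (my sketch, not the thesis implementation): perturb the frequency estimator for the die example with bootstrap resampling, then average the m resulting models; the combination averages out the variance introduced by the perturbation:

```python
# Perturb & Combine on the die example: learn m models, each on a bootstrap
# resample of the learning set, and return their uniform average.
import random

def learn(sample):
    """'Learning algorithm': empirical frequency of each face 1..6."""
    return [sample.count(f) / len(sample) for f in range(1, 7)]

def perturb_and_combine(sample, m, rng):
    """Average m models, each learned on a bootstrap resample of the data."""
    models = [learn([rng.choice(sample) for _ in sample]) for _ in range(m)]
    return [sum(p[f] for p in models) / m for f in range(6)]

rng = random.Random(0)
D = [2, 3, 1, 3, 6, 1, 4, 2, 6, 2]  # learning set from slide 2
print(perturb_and_combine(D, m=10, rng=rng))
```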

  19. The validation of the algorithms is empirical.
[Diagram: target distributions P generate learning sets; the algorithms tested learn mixtures from these learning sets; the algorithms are scored on accuracy and efficiency.]
