

Nonparametric Bayesian Models -- Learning and Reasoning in Open Possible Worlds
Eric Xing (epxing@cs.cmu.edu), Machine Learning Dept./Language Technology Inst./Computer Science Dept., Carnegie Mellon University


  1. Nonparametric Bayesian Models -- Learning and Reasoning in Open Possible Worlds. Eric Xing (epxing@cs.cmu.edu), Machine Learning Dept./Language Technology Inst./Computer Science Dept., Carnegie Mellon University. VLPR09 @ Beijing, China, 8/6/2009.

     Outline
     - Motivation and challenge
     - Dirichlet Process and infinite mixture: formulation; approximate inference algorithms; example: population clustering
     - Hierarchical Dirichlet Process and multi-task clustering: formulation; transformed DP and HDP; kernel stick-breaking process; application: joint image segmentation
     - Dynamic Dirichlet Process: hidden Markov DP; temporal DPM; application: evolutionary clustering of documents
     - Summary

  2. Clustering

     Image Segmentation
     - How to segment images?
       - Manual segmentation (very expensive)
       - Algorithmic segmentation: K-means, statistical mixture models, spectral clustering
     - Problems with most existing algorithms:
       - They ignore the spatial information
       - They perform the segmentation one image at a time
       - They need the number of segments to be specified a priori

  3. Discover Object Categories
     - Discover what objects are present in a collection of images, in an unsupervised way
     - Find those same objects in novel images
     - Determine which local image features correspond to which objects, i.e., segment the image

     Learn and Recognize Natural Scene Categories

  4. Object Recognition and Tracking
     [Figure: tracked objects whose attribute vectors, e.g. (1.9, 9.0, 2.1), evolve over frames t = 1, 2, 3]

     The Evolution of Science
     [Figure: research circles (Phy, Bio, CS), research topics, and PNAS papers, from 1900 to 2000]

  5. A Classical Approach
     - Clustering as mixture modeling
     - Then "model selection"

     Partially Observed, Open and Evolving Possible Worlds
     - Unbounded # of objects/trajectories
     - Changing attributes
     - Birth/death, merge/split
     - Relational ambiguity
     - The parametric paradigm is finite and structurally unambiguous: an event model p({φ_k}_0), a motion model p({φ_k}_{t+1} | {φ_k}_t) on the entity space Ξ, and a sensor model p(x | {φ_k}) on the observation space, all with a fixed number of components k.
     - How to open it up?

  6. Model Selection vs. Posterior Inference
     - Model selection (parsimony, Ockham's razor); a code sketch of this route follows after this item:
       - "intelligent" guess: ???
       - cross validation: data-hungry
       - information-theoretic criteria (AIC, TIC, MDL), e.g. K* = argmin_K KL( f(·) || g(· | θ̂_ML, K) )
       - Bayes factor: needs the data likelihood to be computed
     - Posterior inference: we want to handle uncertainty about model complexity explicitly:
       p(M | D) ∝ p(D | M) p(M), where M ≡ {θ, K}
       We favor a distribution that does not constrain M to a "closed" space!

     Two "Recent" Developments
     - First-order probabilistic languages (FOPLs)
       - Examples: PRM, BLOG, ...
       - Lift graphical models to an "open" world (number of random variables, relations, indices, lifespans, ...)
       - Focus on complete, consistent operating rules for instantiating possible worlds, and on a formal language for expressing such rules
       - An operational way of defining distributions over possible worlds, via sampling methods
     - Bayesian nonparametrics
       - Examples: Dirichlet processes, stick-breaking processes, ...
       - From finite mixtures, to infinite mixtures, to more complex constructions (hierarchies, spatial/temporal sequences, ...)
       - Focus on the laws and behaviors of both the generative formalisms and the resulting distributions
       - Often offer explicit expressions for the distributions and expose their structure, motivating various approximation schemes
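
The "classical" route above -- fit a finite mixture for each K and then run model selection -- can be sketched as follows. This is an illustrative example, not code from the talk: it uses scikit-learn's GaussianMixture with its built-in AIC/BIC scores on assumed synthetic data.

```python
# Illustrative sketch (not from the talk): classical model selection for a
# finite mixture -- fit GMMs with K = 1..10 and pick K by AIC (or BIC).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Assumed synthetic data: three well-separated 2-D Gaussian clusters.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [4, 0], [0, 4])])

scores = []
for k in range(1, 11):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    scores.append((k, gmm.aic(X), gmm.bic(X)))

print("K chosen by AIC:", min(scores, key=lambda s: s[1])[0])
print("K chosen by BIC:", min(scores, key=lambda s: s[2])[0])
```

In contrast, the Dirichlet-process approach developed in the following slides treats the number of components as part of the posterior rather than a quantity fixed by a separate selection step.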

  7. Outline
     - Motivation and challenge
     - Dirichlet Process and infinite mixture: formulation; approximate inference algorithms; example: population clustering
     - Hierarchical Dirichlet Process and multi-task clustering: formulation; transformed DP and HDP; kernel stick-breaking process; application: joint image segmentation
     - Dynamic Dirichlet Process: hidden Markov DP; temporal DPM; application: evolutionary clustering of documents
     - Summary

     Clustering
     - How to label them?
     - How many clusters???

  8. Random Partition of Probability Space
     [Figure: the probability space is carved into regions, each carrying a pair {φ_k, π_k}, k = 1, ..., 6, i.e. an (event, p(event)) pair; centroid := φ, image element := (x, θ)]

     Stick-breaking Process
     - G = Σ_{k=1}^∞ π_k δ_{θ_k}
     - Location: θ_k ~ G_0
     - Mass: β_k ~ Beta(1, α), π_k = β_k Π_{j=1}^{k-1} (1 - β_j), so that Σ_{k=1}^∞ π_k = 1
     [Figure: breaking a unit stick; e.g. β_1 = 0.4 gives π_1 = 0.4, β_2 = 0.5 gives π_2 = 0.6 × 0.5 = 0.3, β_3 = 0.8 gives π_3 = 0.3 × 0.8 = 0.24]
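
A draw from DP(α, G_0) can be approximated by truncating the stick-breaking construction above. The sketch below is illustrative: the standard-normal base measure and the truncation level are assumptions, not part of the slides.

```python
# Truncated stick-breaking sketch: G = sum_k pi_k * delta_{theta_k},
# beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j), theta_k ~ G0.
import numpy as np

def stick_breaking(alpha, base_sampler, truncation=1000, rng=None):
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining            # pi_k; sums to ~1 for a large truncation
    atoms = base_sampler(truncation, rng)  # theta_k ~ G0 (assumed standard normal below)
    return weights, atoms

weights, atoms = stick_breaking(alpha=2.0,
                                base_sampler=lambda n, r: r.normal(0.0, 1.0, n),
                                rng=np.random.default_rng(1))
print("mass of first 10 atoms:", weights[:10].round(3), "total:", weights.sum().round(4))
```

With β = (0.4, 0.5, 0.8, ...) this recurrence reproduces the masses 0.4, 0.3, 0.24 shown in the slide's stick-breaking picture.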

  9. DP -- a Pólya Urn Process
     - Urn example with 5 previous draws: the next draw repeats an existing value with probability 2/(5 + α) or 3/(5 + α) (proportional to how many times that value has already appeared), and takes a fresh value from G_0 with probability α/(5 + α).
     - Joint: G ~ DP(α, G_0)
     - Marginal: φ_i | φ_1, ..., φ_{i-1}, G_0, α ~ Σ_{k=1}^K (n_k / (i - 1 + α)) δ_{φ_k} + (α / (i - 1 + α)) G_0
     - Self-reinforcing property; exchangeable partition of samples
     (A simulation of this urn scheme follows after this item.)

     Clustering and DP Mixture
     [Figure: the same urn with data points 1-6 colored by the value they drew]
     - We can associate mixture components with colors in the Pólya urn model and thereby define a clustering of the data.
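
The marginal (Pólya urn) scheme can be simulated directly: each new draw reuses an existing value with probability proportional to its count, and takes a fresh value from G_0 with probability proportional to α. A minimal sketch, assuming a standard-normal G_0:

```python
# Pólya urn sketch: phi_i | phi_1..phi_{i-1} ~ sum_k n_k/(i-1+alpha) delta_{phi_k}
#                                              + alpha/(i-1+alpha) G0
import numpy as np
from collections import Counter

def polya_urn(n, alpha, base_sampler, rng=None):
    rng = rng or np.random.default_rng()
    draws = []
    for i in range(n):
        if rng.random() < alpha / (i + alpha):    # fresh value from G0
            draws.append(base_sampler(rng))
        else:                                     # reuse a previous draw uniformly,
            draws.append(draws[rng.integers(i)])  # i.e. with prob. proportional to its count
    return draws

draws = polya_urn(100, alpha=1.0,
                  base_sampler=lambda r: round(r.normal(), 3),
                  rng=np.random.default_rng(2))
print(Counter(draws).most_common(5))  # a few values dominate: the self-reinforcing clustering effect
```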

  10. Chinese Restaurant Process
     [Figure: customers θ_1, θ_2, ... seated at tables]
     - The first customer sits at the first table. Customer i then joins an existing table k with probability P(c_i = k | c_1, ..., c_{i-1}) = m_k / (i - 1 + α), where m_k is the number of customers already at table k, and opens a new table with probability α / (i - 1 + α).

     Dirichlet Process
     - A CDF G on possible worlds of random partitions (φ_1, φ_2, ..., φ_6, ...) follows a Dirichlet Process if, for any measurable finite partition (φ_1, φ_2, ..., φ_m), the random vector
       (G(φ_1), G(φ_2), ..., G(φ_m)) ~ Dirichlet(α G_0(φ_1), ..., α G_0(φ_m)),
       where G_0 is the base measure and α is the scale parameter.
     - Thus a Dirichlet Process defines a distribution over distributions.
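
The defining property on this slide can be checked empirically: draw many (truncated) stick-breaking realizations of G, bin their mass into a fixed finite partition, and compare the moments of (G(A_1), ..., G(A_m)) with those of Dirichlet(αG_0(A_1), ..., αG_0(A_m)). The Uniform(0, 1) base measure, the equal-width partition, and the truncation below are assumptions made only for this illustration.

```python
# Empirical check: (G(A_1),...,G(A_m)) ~ Dirichlet(alpha*G0(A_1),...,alpha*G0(A_m))
# for G ~ DP(alpha, G0), with G0 = Uniform(0,1) and partition A_j = [(j-1)/m, j/m).
import numpy as np

rng = np.random.default_rng(3)
alpha, m, trunc, n_rep = 2.0, 4, 2000, 5000

samples = np.empty((n_rep, m))
for r in range(n_rep):
    betas = rng.beta(1.0, alpha, size=trunc)
    weights = betas * np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    atoms = rng.random(trunc)               # theta_k ~ G0 = Uniform(0, 1)
    bins = (atoms * m).astype(int)          # index j of the cell A_j containing theta_k
    samples[r] = np.bincount(bins, weights=weights, minlength=m)

# Dirichlet(alpha/m, ..., alpha/m) moments for comparison
mean_theory = np.full(m, 1.0 / m)
var_theory = (alpha / m) * (alpha - alpha / m) / (alpha**2 * (alpha + 1.0))
print("empirical mean:", samples.mean(axis=0).round(3), " theory:", mean_theory)
print("empirical var: ", samples.var(axis=0).round(4), " theory:", round(var_theory, 4))
```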

  11. Graphical Model Representations of DP
     [Figure, left -- the Pólya urn construction: G_0, α → G; G → θ_i → x_i, for i = 1, ..., N]
     [Figure, right -- the stick-breaking construction: α → π (infinite weights), G_0 → θ (infinite atoms); y_i ~ π selects a component; θ, y_i → x_i, for i = 1, ..., N]
     (A generative sketch of the stick-breaking construction follows after this item.)

     Example: DP-haplotyper [Xing et al., 2004]
     - Clustering human populations
     [Figure: α, G_0 → G ~ DP; infinite mixture components A_k (population haplotypes); likelihood model for individual haplotypes H_n^1, H_n^2 and genotypes G_n, for n = 1, ..., N]
     - Inference: Markov chain Monte Carlo (MCMC)
       - Gibbs sampling
       - Metropolis-Hastings
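
The stick-breaking graphical model above can be read as a generative recipe: draw the weights π and atoms θ, then for each data point draw an indicator y_i ~ π and an observation x_i ~ F(θ_{y_i}). The sketch below assumes a Gaussian base measure and likelihood and a finite truncation; these choices are illustrative, not from the talk.

```python
# Sketch of the stick-breaking graphical model as a generative sampler:
#   beta_k ~ Beta(1, alpha); pi = stick-breaking(beta); theta_k ~ G0
#   y_i ~ Categorical(pi);   x_i ~ F(theta_{y_i})
import numpy as np

def dp_mixture_sample(n, alpha, trunc=200, obs_sd=0.3, rng=None):
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=trunc)
    pi = betas * np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    pi = pi / pi.sum()                        # renormalize the truncated weights
    theta = rng.normal(0.0, 3.0, size=trunc)  # component parameters theta_k ~ G0 (assumed Gaussian)
    y = rng.choice(trunc, size=n, p=pi)       # component indicators y_i
    x = rng.normal(theta[y], obs_sd)          # observations x_i (assumed Gaussian likelihood)
    return x, y, theta

x, y, theta = dp_mixture_sample(200, alpha=1.5, rng=np.random.default_rng(4))
print("distinct components used by 200 samples:", len(set(y)))
```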

  12. Inheritance and Observation Models
     - Single-locus mutation model (ancestral pool {A_k} → individual haplotypes H_i^1, H_i^2):
       P(h_t | a_t, θ) = θ if h_t = a_t, and (1 - θ)/(|B| - 1) if h_t ≠ a_t,
       i.e. h_t = a_t with probability θ.
     - Noisy observation model (haplotypes H_i^1, H_i^2 → genotype G_i):
       P(g_t | h_t^1, h_t^2): g_t = h_t^1 ⊕ h_t^2 with probability λ.

     MCMC for Haplotype Inference
     - Gibbs sampling for exploring the posterior distribution under the proposed model
     - Integrate out the parameters such as θ or λ, and sample c_{i,e}, a_k and h_{i,e}:
       p(c_{i,e} = k | c_{-[i,e]}, h, a) ∝ p(c_{i,e} = k | c_{-[i,e]}) p(h_{i,e} | a_k)
       Posterior ∝ Prior (Pólya urn) × Likelihood
     - Gibbs sampling algorithm: draw samples of each random variable to be sampled, given the values of all the remaining variables
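
The update above -- Pólya-urn prior times likelihood -- can be sketched as a single collapsed Gibbs step for one ancestor indicator, using the single-locus mutation likelihood from this slide. This is a simplified illustration, not the talk's sampler: the uniform prior over a brand-new ancestor haplotype (which makes its marginal likelihood 1/|B| per locus) and all function and variable names are assumptions.

```python
import numpy as np

def mutation_loglik(h, a, theta, n_alleles=2):
    """log P(h | a): theta per matching locus, (1 - theta)/(|B| - 1) per mismatch."""
    match = (h == a)
    return np.sum(np.where(match, np.log(theta),
                           np.log((1.0 - theta) / (n_alleles - 1))))

def gibbs_update_c(i, c, h, ancestors, alpha, theta, rng, n_alleles=2):
    """Resample the ancestor indicator c[i] given the other indicators, the
    haplotypes h, and the current ancestor haplotypes (a dict k -> a_k)."""
    others = np.delete(c, i)
    ks = list(ancestors.keys())
    logp = []
    for k in ks:                                   # existing ancestors
        n_k = np.sum(others == k)
        if n_k == 0:
            logp.append(-np.inf)                   # no Pólya-urn mass left for an empty ancestor
        else:
            logp.append(np.log(n_k) +
                        mutation_loglik(h[i], ancestors[k], theta, n_alleles))
    # brand-new ancestor: prior mass alpha; likelihood marginalized over a
    # uniform ancestor haplotype gives 1/n_alleles per locus (assumption)
    logp.append(np.log(alpha) + len(h[i]) * np.log(1.0 / n_alleles))
    logp = np.array(logp)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    choice = rng.choice(len(p), p=p)
    if choice == len(ks):                          # opened a new ancestor -> new label
        return (max(ks) + 1) if ks else 0
    return ks[choice]
```

In the full sampler (next item) one would then also resample a_k for any newly opened ancestor, the haplotypes h_{i,e}, and the DP scale parameter α.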

  13. MCMC for Haplotype Inference
     1. Sample c_{i,e}(j) from its conditional distribution
     2. Sample a_k from its conditional distribution
     3. Sample h_{i,e}(j) from its conditional distribution
     - For the DP scale parameter α: a vague inverse Gamma prior

     Convergence of Ancestral Inference
     [Figure: convergence of the ancestral inference]
