SLIDE 1

On some distributional properties of Gibbs-type priors

Igor Prünster

University of Torino & Collegio Carlo Alberto

Bayesian Nonparametrics Workshop ICERM, 21st September 2012

Joint work with: P. De Blasi, S. Favaro, A. Lijoi and R. Mena


SLIDE 2

Outline

Bayesian Nonparametric Modeling
  ◮ Discrete nonparametric priors
  ◮ Gibbs–type priors
  ◮ Weak support
  ◮ Stick–breaking representation
Distribution on the number of clusters
  ◮ Prior distribution on the number of clusters
  ◮ Posterior distribution on the number of clusters
Discovery probability in species sampling problems
  ◮ Frequentist nonparametric estimators
  ◮ BNP approach to discovery probability estimation
Frequentist Posterior Consistency
  ◮ Discrete “true” distribution
  ◮ Continuous “true” distribution


SLIDE 4

BNP Modeling Discrete nonparametric priors

The Bayesian nonparametric framework

de Finetti’s representation theorem: a sequence of X–valued observations $(X_n)_{n\ge 1}$ is exchangeable if and only if, for any $n \ge 1$,
$$X_i \mid \tilde P \overset{\text{iid}}{\sim} \tilde P, \quad i = 1, \dots, n, \qquad \tilde P \sim Q$$
⇒ Q, defined on the space of probability measures P, is the de Finetti measure of $(X_n)_{n\ge 1}$ and acts as a prior distribution for Bayesian inference, being the law of a random probability measure ˜P.

If Q is not degenerate on a subclass of P indexed by a finite–dimensional parameter, it leads to a nonparametric model ⇒ natural requirement (Ferguson, 1974): Q should have “large” support (possibly the whole of P).


SLIDE 6

BNP Modeling Discrete nonparametric priors

Discrete nonparametric priors

If Q selects (a.s.) discrete distributions, i.e. ˜P is a discrete random probability measure
$$\tilde P(\cdot) = \sum_{i\ge 1} \tilde p_i\, \delta_{Z_i}(\cdot),$$
then a sample (X_1, . . . , X_n) will exhibit ties with positive probability, i.e. feature K_n distinct observations X*_1, . . . , X*_{K_n} with frequencies N_1, . . . , N_{K_n} such that $\sum_{i=1}^{K_n} N_i = n$.

1. Species sampling: model for species distribution within a population
  • X*_i is the i–th distinct species in the sample;
  • N_i is the frequency of X*_i;
  • K_n is the total number of distinct species in the sample.
⇒ Species metaphor

2. Density estimation and clustering of latent variables: model for a latent level of a hierarchical model; many successful applications can be traced back to this idea, due to Lo (1984), where the mixture of Dirichlet process is introduced.


SLIDE 9

BNP Modeling Discrete nonparametric priors

Probability of discovering a new species

A key quantity is the probability of discovering a new species
$$P[X_{n+1} = \text{“new”} \mid X^{(n)}] \qquad (*)$$
where throughout we set X^(n) := (X_1, . . . , X_n).

Discrete ˜P can be classified in 3 categories according to (∗):
(a) P[X_{n+1} = “new” | X^(n)] = f(n, model parameters) ⇔ depends on n but not on K_n and N_n = (N_1, . . . , N_{K_n}) ⇒ Dirichlet process (Ferguson, 1973);
(b) P[X_{n+1} = “new” | X^(n)] = f(n, K_n, model parameters) ⇔ depends on n and K_n but not on N_n = (N_1, . . . , N_{K_n}) ⇔ Gibbs–type priors (Gnedin and Pitman, 2006);
(c) P[X_{n+1} = “new” | X^(n)] = f(n, K_n, N_n, model parameters) ⇔ depends on all the information conveyed by the sample, i.e. n, K_n and N_n = (N_1, . . . , N_{K_n}) ⇔ serious tractability issues.


SLIDE 11

BNP Modeling Gibbs–type priors

Complete predictive structure

˜P is a Gibbs-type random probability measure of order σ ∈ (−∞, 1) if and only if it gives rise to predictive distributions of the form
$$P\big[X_{n+1} \in A \mid X^{(n)}\big] = \frac{V_{n+1,K_n+1}}{V_{n,K_n}}\, P^*(A) + \frac{V_{n+1,K_n}}{V_{n,K_n}} \sum_{i=1}^{K_n} (N_i - \sigma)\, \delta_{X^*_i}(A), \qquad (\circ)$$
where $\{V_{n,j} : n \ge 1,\ 1 \le j \le n\}$ is a set of weights which satisfy the recursion
$$V_{n,j} = (n - j\sigma)\, V_{n+1,j} + V_{n+1,j+1}. \qquad (\diamond)$$
⇒ completely characterized by the choice of σ < 1 and of a set of weights V_{n,j}.

E.g., if
$$V_{n,j} = \frac{\prod_{i=1}^{j-1} (\theta + i\sigma)}{(\theta+1)_{n-1}}$$
with 0 ≤ σ < 1 and θ > −σ, or σ < 0 and θ = r|σ| with r ∈ ℕ, one obtains the two–parameter Poisson–Dirichlet (PD) process (Perman, Pitman & Yor, 1992), aka the Pitman–Yor process, which yields
$$P\big[X_{n+1} \in A \mid X^{(n)}\big] = \frac{\theta + K_n\sigma}{\theta + n}\, P^*(A) + \frac{1}{\theta+n} \sum_{i=1}^{K_n} (N_i - \sigma)\, \delta_{X^*_i}(A).$$
⇒ if σ = 0, the PD process reduces to the Dirichlet process and (θ + K_nσ)/(θ + n) to θ/(θ + n).

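To make the predictive mechanism concrete, here is a minimal simulation sketch of sequential sampling from the PD/Pitman–Yor predictive above. It is added for illustration and is not part of the original slides; Python with NumPy is assumed, and the function name and seed are arbitrary.

```python
import numpy as np

def sample_py_partition(n, sigma, theta, rng=None):
    """Sequentially sample cluster sizes from the PD (Pitman-Yor) predictive:
    a new species with prob. (theta + K*sigma)/(theta + i), an existing species j
    with prob. (N_j - sigma)/(theta + i).  Returns the vector of species frequencies."""
    rng = np.random.default_rng(rng)
    sizes = []                      # N_1, ..., N_K
    for i in range(n):              # i = current sample size before drawing X_{i+1}
        k = len(sizes)
        p_new = (theta + k * sigma) / (theta + i)
        probs = np.array([(nj - sigma) / (theta + i) for nj in sizes] + [p_new])
        j = rng.choice(k + 1, p=probs)
        if j == k:
            sizes.append(1)         # X_{i+1} is a new species, drawn from P*
        else:
            sizes[j] += 1           # X_{i+1} ties with the j-th observed species
    return np.array(sizes)

# average number of distinct species K_50 under PD(0.25, 12.2157)
ks = [len(sample_py_partition(50, 0.25, 12.2157, rng=s)) for s in range(200)]
print(np.mean(ks))   # close to 25, by the parameter choice used later in the talk
```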

SLIDE 13

BNP Modeling Gibbs–type priors

The Gibbs structure allows one to look at the predictive distributions as the result of two steps:
(1) X_{n+1} is a new species with probability V_{n+1,K_n+1}/V_{n,K_n}, whereas it equals one of the “old” {X*_1, . . . , X*_{K_n}} with probability
$$1 - \frac{V_{n+1,K_n+1}}{V_{n,K_n}} = (n - K_n\sigma)\, \frac{V_{n+1,K_n}}{V_{n,K_n}}$$
⇒ this step depends on n and K_n but not on the frequencies N_n = (N_1, . . . , N_{K_n}).
(2) (i) Given X_{n+1} is new, it is independently sampled from P∗.
    (ii) Given X_{n+1} is a tie, it coincides with X*_i with probability (N_i − σ)/(n − K_nσ).


SLIDE 15

BNP Modeling Gibbs–type priors

Who are the members of this class of priors?

Gnedin and Pitman (2006) also provided a characterization of Gibbs–type priors according to the value of σ:

◮ σ = 0 ⇒ Dirichlet process, or Dirichlet process mixed over its total mass parameter θ > 0;

◮ 0 < σ < 1 ⇒ random probability measures closely related to a normalized σ–stable process (Poisson–Kingman models based on the σ–stable process), characterized by σ and a probability distribution γ.
Special cases: in addition to the PD process, another noteworthy example is given by the normalized generalized gamma process (NGG), for which
$$V_{n,j} = \frac{e^{\beta}\, \sigma^{j-1}}{\Gamma(n)} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i\, \beta^{i/\sigma}\, \Gamma\!\Big(j - \frac{i}{\sigma};\, \beta\Big),$$
where β > 0, σ ∈ (0, 1) and Γ(x; a) denotes the incomplete gamma function. If σ = 1/2 it reduces to the normalized inverse Gaussian (N–IG) process.


SLIDE 18

BNP Modeling Gibbs–type priors

◮ σ < 0 ⇒ mixtures of symmetric k–variate Dirichlet distributions
$$(\tilde p_1, \dots, \tilde p_K) \sim \text{Dirichlet}(|\sigma|, \dots, |\sigma|), \qquad K \sim \pi(\cdot) \qquad (*)$$
Special cases:
  ◮ If π is degenerate on r ∈ ℕ one has a symmetric r–variate Dirichlet distribution, which corresponds to a PD process with σ < 0 and θ = r|σ|, and is aka the Wright–Fisher model.
  ◮ The model of Gnedin (2010) arises if, for r = 1, 2, . . . and γ ∈ (0, 1),
  $$\pi(r) = \gamma\, \frac{(1-\gamma)_{r-1}}{r!}$$
  ◮ Other interesting cases arise if π is a Poisson distribution (restricted to the positive integers) or a geometric distribution.

Remark.
  ◮ If σ ≥ 0 the model assumes the existence of an infinite number of species.
  ◮ If σ < 0 (and π not degenerate) the model assumes a random but finite number of species. Interestingly, in Gnedin’s model it has infinite mean!


SLIDE 20

BNP Modeling Weak support

Full weak support property of Gibbs–type priors

Henceforth focus on: Gibbs–type priors whose realizations are discrete distributions where the number of support points is not bounded ⇔ σ ≥ 0, or σ < 0 with π in (∗) having support ℕ ⇒ “genuinely nonparametric priors”.

Let Q be a Gibbs–type prior with prior guess E[˜P] := P∗ and supp(P∗) = X. Then the topological support of Q coincides with the whole space of probability measures P, that is, supp(Q) = P.
⇒ Gibbs–type priors have full weak support


SLIDE 21

BNP Modeling Stick–breaking representation

Stick–breaking representation of Gibbs–type priors with σ > 0

Recall that a Gibbs–type prior with 0 < σ < 1 is characterized by σ and a distribution γ. A Gibbs–type prior $\tilde P = \sum_{i=1}^{\infty} \tilde p_i\, \delta_{Z_i}$ with σ > 0 admits a stick–breaking representation of the form
$$\tilde p_1 = V_1, \qquad \tilde p_i = V_i \prod_{j=1}^{i-1}(1 - V_j), \quad i \ge 2,$$
with $(V_i)_{i\ge 1}$ a sequence of r.v.s such that $V_i \mid V_1, \dots, V_{i-1}$ admits the density function, for any i ≥ 1,
$$f(v_i \mid v_1, \dots, v_{i-1}) = \frac{\sigma}{\Gamma(1-\sigma)}\Big(v_i \prod_{j=1}^{i-1}(1-v_j)\Big)^{-\sigma} \times \frac{\int_0^{+\infty} t^{-i\sigma}\, f_\sigma\big(t \prod_{j=1}^{i}(1-v_j)\big)\,(f_\sigma(t))^{-1}\,\gamma(dt)}{\int_0^{+\infty} t^{-(i-1)\sigma}\, f_\sigma\big(t \prod_{j=1}^{i-1}(1-v_j)\big)\,(f_\sigma(t))^{-1}\,\gamma(dt)}\; \mathbf{1}_{(0,1)}(v_i)$$
with $f_\sigma$ denoting the density of a positive σ–stable r.v.
⇒ Stick–breaking representation with dependent weights!


SLIDE 23

BNP Modeling Stick–breaking representation

Special cases

◮ In the PD case the previous representation reduces to the well–known one with $(V_i)_{i\ge 1}$ a sequence of independent r.v.s $V_i \sim \text{Beta}(1-\sigma,\ \theta+i\sigma)$.

◮ In the N–IG case the dependent weights become completely explicit:
$$f(v_i \mid v_1, \dots, v_{i-1}) = \frac{\Big(\frac{a}{\prod_{j=1}^{i-1}(1-v_j)}\Big)^{1/4}\, v_i^{-1/2}\,(1-v_i)^{-5/4+i/4}}{\sqrt{2\pi}\; K_{-i/2}\Big(\sqrt{\tfrac{a}{\prod_{j=1}^{i-1}(1-v_j)}}\Big)}\; K_{-\frac{1}{2}-\frac{i}{2}}\Big(\sqrt{\tfrac{a}{\prod_{j=1}^{i-1}(1-v_j)\,(1-v_i)}}\Big)\, \mathbf{1}_{(0,1)}(v_i),$$
where $K_\nu$ denotes the modified Bessel function of the second kind, so that $V_i$ can also be represented as $U_i/(U_i + W_i)$ with $U_i$ a generalized inverse Gaussian r.v. (with parameters depending on $V^{(i-1)}$) and $W_i$ a positive stable r.v.

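As an illustration of the PD special case above (independent Beta(1 − σ, θ + iσ) stick–breaking weights), here is a minimal truncated sampler. It is a sketch added for illustration rather than code from the talk; Python with NumPy is assumed and the truncation level is arbitrary.

```python
import numpy as np

def pd_stick_breaking_weights(sigma, theta, trunc, rng=None):
    """Truncated stick-breaking weights for the PD (Pitman-Yor) process:
    V_i ~ Beta(1 - sigma, theta + i*sigma) independently, p_i = V_i * prod_{j<i}(1 - V_j)."""
    rng = np.random.default_rng(rng)
    i = np.arange(1, trunc + 1)
    v = rng.beta(1.0 - sigma, theta + i * sigma)
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * stick_left

w = pd_stick_breaking_weights(sigma=0.5, theta=1.0, trunc=1000, rng=0)
print(w.sum())   # close to 1 for a large enough truncation level
```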

SLIDE 25

Distribution on the number of clusters Prior distribution on the number of clusters

Induced distribution on number of clusters

An alternative definition of Gibbs–type priors is as species sampling models (i.e. discrete nonparametric priors $\sum_{i\ge 1} \tilde p_i\, \delta_{Y_i}(\cdot)$ in which the weights $\tilde p_i$ and locations $Y_i$ are independent) which induce a random partition of the form
$$\Pi^{(n)}_j(n_1, \dots, n_j) = V_{n,j} \prod_{i=1}^{j} (1 - \sigma)_{n_i - 1} \qquad (\triangle)$$
for any n ≥ 1, j ≤ n and positive integers n_1, . . . , n_j such that $\sum_{i=1}^{j} n_i = n$, where σ < 1 and the V_{n,j} satisfy the recursion (♦).

Interpretation of (△): probability of observing a specific sample X_1, . . . , X_n featuring j distinct observations with frequencies n_1, . . . , n_j ⇒ exchangeable partition probability function (EPPF), a concept introduced in Pitman (1995).

Consequently, one obtains the (prior) distribution of the number of clusters by summing over all possible partitions of a given size:
$$P(K_n = j) = \frac{V_{n,j}}{\sigma^j}\, C(n, j; \sigma)$$
with C(n, j; σ) denoting a generalized factorial coefficient.

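For the PD process the prior distribution of K_n can also be computed without evaluating the generalized factorial coefficients, by propagating the Markov chain implied by the predictive (the new-species probability depends only on n and K_n). The following sketch is added for illustration and assumes Python; the function name is arbitrary.

```python
import numpy as np

def pd_prior_Kn(n, sigma, theta):
    """Prior distribution of K_n under PD(sigma, theta), obtained by propagating
    the chain implied by the predictive: a new cluster appears at step i+1 with
    probability (theta + k*sigma)/(theta + i)."""
    p = np.zeros(n + 1)
    p[1] = 1.0                      # after the first draw, K_1 = 1
    for i in range(1, n):           # i = current sample size
        q = np.zeros(n + 1)
        for k in range(1, i + 1):
            if p[k] == 0.0:
                continue
            p_new = (theta + k * sigma) / (theta + i)
            q[k + 1] += p[k] * p_new
            q[k]     += p[k] * (1.0 - p_new)
        p = q
    return p[1:]                    # P(K_n = j), j = 1, ..., n

probs = pd_prior_Kn(50, sigma=0.25, theta=12.2157)
print(sum(j * pj for j, pj in enumerate(probs, start=1)))   # prior mean of K_50, ~25
```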

SLIDE 26

Distribution on the number of clusters Prior distribution on the number of clusters

Prior distribution of the number of clusters as σ varies

[Figure: prior distribution of the number of groups, K_50, for several values of σ.]

Prior distributions on the number of groups corresponding to an NGG process with n = 50, β = 1 and σ = 0.1, 0.2, 0.3, . . . , 0.8 (from left to right).


SLIDE 28

Distribution on the number of clusters Prior distribution on the number of clusters

In general, the dependence of the distribution of Kn on the prior parameters is as follows:

◮ σ controls the “flatness” (or variability) of the (prior) distribution of K_n.
◮ the possible second parameter (θ in the PD case and β in the NGG case) controls the location of the (prior) distribution of K_n.

Comparative example of different Gibbs–type priors (see the numerical check after this list):
◮ n = 50 and the prior expected number of clusters is 25 ⇒ fix the prior parameters s.t. E(K50) = 25.
◮ 5 different models:
  ◮ Dirichlet process with θ = 19.233;
  ◮ PD processes with (σ, θ) = (0.73001, 1) and (σ, θ) = (0.25, 12.2157);
  ◮ NGG processes with (σ, β) = (0.7353, 1) and (0.25, 48.4185).
⇒ the Dirichlet process implies a highly peaked distribution of K_n:
  • this can be circumvented by placing a prior on θ; though would such a prior (and its parameters) be the same for whatever sample size?
  • moreover, why should one add another layer to the model when this can be avoided by selecting a slightly more general process?

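A quick numerical check of the parameter choices above (added for illustration, not from the slides): since the PD new-species probability is linear in K_n, the prior mean E(K_n) obeys an exact one-step recursion, so E(K50) ≈ 25 can be verified directly for the Dirichlet and PD specifications. The NGG cases would require the more involved V_{n,j} weights and are omitted; Python assumed, function name arbitrary.

```python
def pd_expected_Kn(n, sigma, theta):
    """Exact prior mean of K_n for the PD(sigma, theta) process (sigma = 0 gives the
    Dirichlet process).  Uses E[K_{i+1}] = E[K_i] + (theta + sigma*E[K_i])/(theta + i),
    which is exact because the new-species probability is linear in K_i."""
    e = 1.0                         # E[K_1] = 1
    for i in range(1, n):
        e += (theta + sigma * e) / (theta + i)
    return e

print(pd_expected_Kn(50, 0.0, 19.233))       # Dirichlet process, ~25
print(pd_expected_Kn(50, 0.25, 12.2157))     # PD(0.25, 12.2157), ~25
print(pd_expected_Kn(50, 0.73001, 1.0))      # PD(0.73001, 1), ~25
```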

SLIDE 29

Distribution on the number of clusters Prior distribution on the number of clusters

Prior distribution of the number of clusters

[Figure: prior distributions of the number of clusters for the five specifications DP(θ=19.233), NGG(σ, β)=(0.25, 48.4185), PY(σ, θ)=(0.25, 12.2157), NGG(σ, β)=(0.7353, 1), PY(σ, θ)=(0.73001, 1).]

Prior distributions on the number of clusters corresponding to the Dirichlet, the PD and the NGG processes. The values of the parameters are set in such a way that E(K50) = 25.


SLIDE 31

Distribution on the number of clusters Posterior distribution on the number of clusters

Toy mixture example

◮ n = 50 observations are drawn from a uniform mixture of two well-separated Gaussian distributions, N(1, 0.2) and N(10, 0.2);
◮ nonparametric mixture model
$$(Y_i \mid m_i, v_i) \overset{\text{ind}}{\sim} N(m_i, v_i), \quad i = 1, \dots, n, \qquad (m_i, v_i \mid \tilde p) \overset{\text{iid}}{\sim} \tilde p, \quad i = 1, \dots, n, \qquad \tilde p \sim Q$$
with Q a Gibbs–type prior and standard specifications for P∗;
◮ as Q we consider the previous 5 priors (chosen so that E(K50) = 25), which in this case correspond to a prior opinion on K50 remarkably far from the true number of components, namely 2.

Are the models flexible enough to shift a posteriori towards the correct number of components?
⇒ the larger σ, the better the posterior estimate of K_n.


SLIDE 32

Distribution on the number of clusters Posterior distribution on the number of clusters

Posterior distribution of the number of clusters

[Figure: posterior distributions of the number of clusters for the five specifications DP(θ=19.233), NGG(σ, β)=(0.25, 48.4185), PY(σ, θ)=(0.25, 12.2157), NGG(σ, β)=(0.7353, 1), PY(σ, θ)=(0.73001, 1).]

Posterior distributions on the number of groups corresponding to various choices of Gibbs–type priors with n = 50 and E(K50) = 25.


SLIDE 34

Discovery probability

Data structure in species sampling problems

◮ X^(n) = basic sample of draws from a population containing different species (plants, genes, animals, ...). Information:
  ⋄ the sample size n and the number of distinct species in the sample K_n;
  ⋄ a collection of frequencies N = (N_1, . . . , N_{K_n}) s.t. $\sum_{i=1}^{K_n} N_i = n$;
  ⋄ the labels (names) X*_i of the distinct species, for i = 1, . . . , K_n.
◮ The information provided by N can also be coded by M := (M_1, . . . , M_n), where M_i = number of species in the sample X^(n) having frequency i. Note that $\sum_{i=1}^{n} M_{i,n} = K_n$ and $\sum_{i=1}^{n} i\, M_{i,n} = n$.
◮ Example (recoded in the sketch below): consider a basic sample such that
  ⋄ n = 10 with j = 4 and frequencies (n_1, n_2, n_3, n_4) = (2, 5, 2, 1);
  ⋄ equivalently we can code this information as (m_1, m_2, . . . , m_10) = (1, 2, 0, 0, 1, 0, . . . , 0), meaning that 1 species appears once, 2 appear twice and 1 appears five times.

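A tiny helper (illustrative, not from the slides) that performs the frequency-to-multiplicity recoding used in the example above; plain Python, function name arbitrary.

```python
from collections import Counter

def multiplicities(freqs, n):
    """Convert species frequencies (n_1, ..., n_j) into the multiplicity coding
    (m_1, ..., m_n), where m_i = number of species appearing exactly i times."""
    counts = Counter(freqs)
    return [counts.get(i, 0) for i in range(1, n + 1)]

# the example from the slide: n = 10, j = 4 species with frequencies (2, 5, 2, 1)
print(multiplicities([2, 5, 2, 1], 10))   # -> [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]
```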

SLIDE 36

Discovery probability

Prediction problems

Given the basic sample X^(n), the inferential goal consists in prediction about various features of an additional sample X^(m) := (X_{n+1}, . . . , X_{n+m}).

Discovery probability ⇒ estimation of
1. the probability of discovering at the (n+1)–th sampling step either a new species or an “old” species with frequency r;
2. the probability of discovering at the (n+m+1)–th step either a new species or an “old” species with frequency r, without observing X^(m).

Remark. These can, in turn, be used to obtain straightforward estimates of:
◮ the discovery probability for rare species, i.e. the probability of discovering a species which is either new or has frequency at most τ at the (n+m+1)–th step ⇒ rare species estimation;
◮ an optimal additional sample size: sampling is stopped once the probability of sampling new or rare species is below a certain threshold;
◮ the sample coverage, i.e. the proportion of species in the population detected in the basic sample X^(n) or in an enlarged sample X^(n+m).


SLIDE 37

Discovery probability Frequentist nonparametric estimators

Frequentist nonparametric estimators

◮ Turing estimator (Good, 1953; Mao & Lindsay, 2002): the probability of discovering a species with frequency r in X^(n) at the (n+1)–th step is estimated by
$$\frac{(r+1)\, m_{r+1}}{n} \qquad (\star)$$
and for r = 0 one obtains the discovery probability of a new species, m_1/n.
⇒ depends on m_{r+1} (the number of species with frequency r + 1): counterintuitive! It should be based on m_r. E.g., if m_{r+1} = 0, the estimated probability of detecting a species with frequency r would be 0.

◮ Good–Toulmin estimator (Good & Toulmin, 1956; Mao, 2004): estimator for the probability of discovering a new species at the (n+m+1)–th step.
⇒ unstable if the size of the additional unobserved sample m is larger than n (the estimated probability becomes either < 0 or > 1).

◮ No frequentist nonparametric estimator is available for the probability of discovering a species with frequency r at the (n+m+1)–th sampling step.

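For concreteness, a minimal implementation of the Turing estimator (⋆) on the toy multiplicities used earlier. This is an illustrative sketch, not from the slides; the 0-based list indexing is an implementation choice.

```python
def turing_estimator(m, n, r):
    """Turing estimator of the probability that the (n+1)-th draw is a species
    observed r times in the basic sample: (r + 1) * m_{r+1} / n  (r = 0 gives the
    probability of a new species, m_1 / n).  `m[i]` holds m_{i+1}."""
    m_next = m[r] if r < len(m) else 0     # m_{r+1}
    return (r + 1) * m_next / n

# multiplicities of the earlier toy sample: n = 10, (m_1, ..., m_10)
m = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]
print(turing_estimator(m, 10, 0))   # estimated probability of a new species: 1/10
print(turing_estimator(m, 10, 2))   # r = 2: based on m_3 = 0, hence 0 -- the counterintuitive case
```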

SLIDE 38

Discovery probability BNP approach to discovery probability estimation

BNP approach to discovery probability estimation

We assume the data $(X_n)_{n\ge 1}$ are exchangeable with a Gibbs–type prior as the corresponding de Finetti measure. The resulting estimators are as follows:

◮ BNP analog to the Turing estimator: the probability of discovering a species with frequency r in X^(n) at the (n+1)–th sampling step is
$$P[X_{n+1} = \text{species with frequency } r \mid X^{(n)}] = \frac{V_{n+1,k}\,(r-\sigma)}{V_{n,k}}\, m_r,$$
and the discovery probability of a new species is
$$P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{V_{n+1,k+1}}{V_{n,k}}.$$
Remark 1. The probability of sampling a species with frequency r depends, in agreement with intuition, on m_r and also on K_n = k.


SLIDE 39

Discovery probability BNP approach to discovery probability estimation

◮ BNP analog of the Good–Toulmin estimator: estimator for the probability of discovering a new species at the (n+m+1)–th step
$$P[X_{n+m+1} = \text{“new”} \mid X^{(n)}] = \sum_{j=0}^{m} \frac{V_{n+m+1,k+j+1}}{V_{n,k}}\, \frac{C(m, j; \sigma, -n+k\sigma)}{\sigma^j}$$
with
$$C(m, j; \sigma, -n+k\sigma) = \frac{1}{j!}\sum_{l=0}^{j} (-1)^l \binom{j}{l}\, (n - \sigma(l+k))_m$$
being the non–central generalized factorial coefficient.

◮ The BNP estimator for the probability of discovering a species with frequency r at the (n+m+1)–th sampling step, P[X_{n+m+1} = species with frequency r | X^(n)], is available in closed form and immediately yields an estimator of the rare species discovery probability.


SLIDE 41

Discovery probability BNP approach to discovery probability estimation

The discovery probability in the PD process case

The natural candidate for applications is the PD process, which yields completely explicit estimators.

Remark. The Dirichlet process is not appropriate, for conceptual reasons and also because it lacks the required flexibility in modeling the growth rate: it imposes a logarithmic growth of the number of new species, whereas the PD process allows for rates n^σ for σ ∈ (0, 1). See also Teh (2006).

◮ PD analog to the Turing estimator: the probability of discovering a species with frequency r in X^(n) at the (n+1)–th sampling step is given by
$$P[X_{n+1} = \text{species with frequency } r \mid X^{(n)}] = \frac{r - \sigma}{\theta + n}\, m_r,$$
and the discovery probability of a new species coincides with
$$P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{\theta + \sigma k}{\theta + n}.$$

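A small sketch (illustrative, not from the slides) computing the PD one-step discovery probabilities just displayed; parameter values are arbitrary, and the fact that the probabilities sum to one is a useful sanity check.

```python
def pd_discovery_probs(m, n, k, sigma, theta):
    """PD(sigma, theta) estimators of the discovery probabilities at step n+1:
    a new species with prob. (theta + sigma*k)/(theta + n), and a species seen r
    times with prob. (r - sigma)*m_r/(theta + n).  `m[i]` holds m_{i+1}."""
    p_new = (theta + sigma * k) / (theta + n)
    p_freq = [(r - sigma) * m[r - 1] / (theta + n) for r in range(1, n + 1)]
    return p_new, p_freq

# toy sample from before: n = 10, k = 4, multiplicities (1, 2, 0, 0, 1, 0, ..., 0)
m = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]
p_new, p_freq = pd_discovery_probs(m, n=10, k=4, sigma=0.5, theta=1.0)
print(p_new)                            # (1 + 0.5*4)/11
print(p_freq[0], p_new + sum(p_freq))   # prob. of a frequency-1 species; total mass = 1
```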

SLIDE 42

Discovery probability BNP approach to discovery probability estimation

◮ PD analog of the Good–Toulmin estimator: the estimator for the probability of discovering a new species at the (n+m+1)–th sampling step is
$$P[X_{n+m+1} = \text{“new”} \mid X^{(n)}] = \frac{\theta + k\sigma}{\theta + n}\, \frac{(\theta+n+\sigma)_m}{(\theta+n+1)_m}$$

◮ PD estimator for the probability of discovering a species with frequency r at the (n+m+1)–th step:
$$P[X_{n+m+1} = \text{species with frequency } r \mid X^{(n)}] = \sum_{i=1}^{r} m_i\,(i-\sigma)_{r+1-i}\binom{m}{r-i}\,\frac{(\theta+n-i+\sigma)_{m-r+i}}{(\theta+n)_{m+1}} + \frac{(1-\sigma)_r}{(\theta+n)_{m+1}}\Bigg[(\theta+k\sigma)(\theta+n+\sigma)_{m-r} - \prod_{i=k}^{k+m-r}(\theta+i\sigma)\Bigg]$$

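The PD analog of the Good–Toulmin estimator is simple to evaluate; the following sketch (added for illustration, not from the slides) computes the ratio of rising factorials term by term to avoid overflow for large m. Python assumed; function name and parameter values are arbitrary.

```python
from math import prod

def pd_new_species_prob_after_m(n, k, m, sigma, theta):
    """PD estimator of the probability that the (n+m+1)-th draw is a new species,
    given a basic sample of size n with k distinct species:
    (theta + k*sigma)/(theta + n) * (theta+n+sigma)_m / (theta+n+1)_m."""
    ratio = prod((theta + n + sigma + i) / (theta + n + 1 + i) for i in range(m))
    return (theta + k * sigma) / (theta + n) * ratio

# m = 0 recovers the one-step discovery probability (theta + sigma*k)/(theta + n)
print(pd_new_species_prob_after_m(n=10, k=4, m=0, sigma=0.5, theta=1.0))
print(pd_new_species_prob_after_m(n=10, k=4, m=100, sigma=0.5, theta=1.0))
```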

SLIDE 43

Discovery probability BNP approach to discovery probability estimation

Discovery probability in an additional sample of size m.

[Figure: probability of discovering a new species vs. the size of the additional sample, for the anaerobic and aerobic libraries; PY and GT estimators.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries with basic sample n ≈ 950: Good–Toulmin (GT) and PD process (PD) estimators of the probability of discovering a new gene at the (n + m + 1)–th sampling step for m = 1, . . . , 2000.


SLIDE 44

Discovery probability BNP approach to discovery probability estimation

Expected number of new genes in an additional sample of size m.

[Figure: expected number of new genes vs. the size of the additional sample, for the aerobic and anaerobic libraries; PY and GT estimators.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries with basic sample n ≈ 950: Good–Toulmin (GT) and Pitman–Yor (PY) estimators of the number of new genes to be observed in an additional sample of size m = 1, . . . , 2000.


SLIDE 46

Discovery probability BNP approach to discovery probability estimation

Some remarks on BNP models for species sampling problems

◮ BNP estimators are available for other quantities of interest in species sampling problems (completely explicit in the PD case).

◮ BNP models correspond to large probabilistic models in which all objects of potential interest are modeled jointly and coherently, thus leading to intuitive predictive structures ⇒ this avoids ad–hoc procedures and incoherencies sometimes connected with frequentist nonparametric procedures.

◮ Gibbs–type priors with σ > 0 (recall that they assume an infinite number of species) are ideally suited for populations with a large unknown number of species ⇒ the typical case in Genomics.

◮ In Ecology the “∞” assumption is often too strong ⇒ Gibbs–type priors with σ < 0 (work in progress, which yields a surprising by–product: by combining Gibbs-type priors with σ > 0 and σ < 0 it is possible to identify situations in which frequentist estimators work).


SLIDE 48

Consistency

Frequentist Posterior Consistency

“What if” or frequentist approach to consistency (Diaconis and Freedman, 1986): what happens if the data are not exchangeable but i.i.d. from a “true” P0? Does the posterior Q(· | X^(n)) accumulate around P0 as the sample size increases?

Q is weakly consistent at P0 if, for every A_ε,
$$Q(A_\varepsilon \mid X^{(n)}) \xrightarrow{\ n\to\infty\ } 1 \quad \text{a.s.–}P^{\infty}$$
with A_ε a weak neighbourhood of P0 and P^∞ the infinite product measure.

We investigate consistency for Gibbs–type priors with σ ∈ (−∞, 0). The proof strategy consists in showing that
◮ $E[\tilde P \mid X^{(n)}] \to P_0$ a.s.–P^∞ as n → ∞ ⇔ by the predictive structure (◦) of Gibbs–type priors: $P[X_{n+1} = \text{“new”} \mid X^{(n)}] = V_{n+1,k+1}/V_{n,k} \to 0$ a.s.–P^∞ as n → ∞;
◮ $\mathrm{Var}[\tilde P \mid X^{(n)}] \to 0$ a.s.–P^∞ as n → ∞, by finding a suitable bound on the variance.


SLIDE 51

Consistency Discrete “true” distribution

The case of discrete “true” data generating distribution P0

Two cases according to the type of “true” data generating distribution P0:
◮ P0 is discrete (with either finitely or infinitely many support points)
◮ P0 is diffuse (i.e. P0({x}) = 0 for every x ∈ X), termed “continuous”

Let Q be a Gibbs–type prior with σ < 0 and P0 a discrete “true” distribution. Then, under an extremely mild technical condition, Q is consistent at P0.

Remark. The technical condition serves only for pinning down the proof in general: one can comfortably speak of having “essentially always” consistency (for the instances not covered, consistency is shown case-by-case).
⇒ frequentist consistency is guaranteed when modeling data coming from a discrete distribution, as in species sampling problems:
discrete nonparametric priors are consistent for data generated by discrete distributions.


SLIDE 53

Consistency Continuous “true” distribution

The case of continuous “true” data generating distribution P0

Discrete P0 ⇒ consistency “essentially always”.
Continuous P0 ⇒ wide range of asymptotic behaviours, including erratic ones.

Remark. Since P0 is continuous, the number of distinct observations in a sample of size n, K_n, is precisely n. Also recall that Gibbs–type priors with σ < 0 are mixtures of symmetric Dirichlet distributions: (˜p_1, . . . , ˜p_K) ∼ Dirichlet(|σ|, . . . , |σ|), K ∼ π(·).

Example 1: Gibbs–type prior with σ = −1 and Poisson(λ) mixing distribution π (restricted to the positive integers). The key quantity is the probability of obtaining a new observation:
$$P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{V_{n+1,n+1}}{V_{n,n}} = \frac{\lambda n}{(2n+1)(2n)}\, \frac{{}_1F_1(n+1;\, 2n+2;\, \lambda)}{{}_1F_1(n;\, 2n;\, \lambda)} \sim \frac{\lambda}{2(2n+1)} \xrightarrow{\ n\to\infty\ } 0.$$
This, combined with some other arguments, shows that such a prior is consistent at any continuous P0. (A quick numerical check of this rate is sketched below.)

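A numerical check, added for illustration and assuming the reconstruction of the display above: using SciPy's hyp1f1, the new-observation probability in Example 1 decays like λ/(2(2n+1)).

```python
from scipy.special import hyp1f1

def new_obs_prob_poisson_mixing(n, lam):
    """P[X_{n+1} = 'new' | X^(n)] for the sigma = -1 Gibbs prior with Poisson(lam)
    mixing, when all n observations are distinct (continuous P_0)."""
    return lam * n / ((2 * n + 1) * (2 * n)) * hyp1f1(n + 1, 2 * n + 2, lam) / hyp1f1(n, 2 * n, lam)

for n in (10, 100, 1000):
    print(n, new_obs_prob_poisson_mixing(n, lam=2.0), 2.0 / (2 * (2 * n + 1)))
```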

SLIDE 55

Consistency Continuous “true” distribution

Example 2: Gnedin’s model with σ = −1 and parameter γ ∈ (0, 1). For continuous P0 we obtain:
$$P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{V_{n+1,n+1}}{V_{n,n}} = \frac{n(n-\gamma)}{n(\gamma+n)} \xrightarrow{\ n\to\infty\ } 1.$$
This, combined with some other arguments, shows that Q is inconsistent at any continuous P0. Moreover, it is not only inconsistent: it concentrates around the prior guess P∗, meaning that no learning at all takes place ⇒ “total” inconsistency.

Example 3: Gibbs–type prior with σ = −1 and geometric(η) mixing distribution π. For continuous P0 we obtain:
$$P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{V_{n+1,n+1}}{V_{n,n}} = \frac{\eta\, n(n+1)}{(2n+1)(2n)}\, \frac{{}_2F_1(n+1, n+2;\, 2n+2;\, \eta)}{{}_2F_1(n, n+1;\, 2n;\, \eta)} \xrightarrow{\ n\to\infty\ } \frac{2-\eta-2\sqrt{1-\eta}}{\eta} \in [0, 1]$$
⇒ the posterior concentrates on αP∗ + (1 − α)P0 with α = (2 − η − 2√(1 − η))/η: therefore, by tuning the parameter η, one can obtain any possible posterior behaviour, ranging from consistency (η = 0) to “total” inconsistency (η = 1). (The sketch below checks this limit numerically.)

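An analogous check for Example 3, added for illustration and assuming the reconstructed 2F1 ratio above: using SciPy's hyp2f1, the new-observation probability approaches the limit α = (2 − η − 2√(1 − η))/η.

```python
from math import sqrt
from scipy.special import hyp2f1

def new_obs_prob_geometric_mixing(n, eta):
    """P[X_{n+1} = 'new' | X^(n)] for the sigma = -1 Gibbs prior with geometric(eta)
    mixing, when all n observations are distinct."""
    return (eta * n * (n + 1)) / ((2 * n + 1) * (2 * n)) * \
        hyp2f1(n + 1, n + 2, 2 * n + 2, eta) / hyp2f1(n, n + 1, 2 * n, eta)

eta = 0.5
alpha = (2 - eta - 2 * sqrt(1 - eta)) / eta        # limiting value from the slide
for n in (10, 50, 200):
    print(n, new_obs_prob_geometric_mixing(n, eta), alpha)
```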

SLIDE 57

Consistency Continuous “true” distribution

The general consistency result for continuous P0 is then as follows: Let Q be a Gibbs–type prior with σ < 0 and P0 a continuous “true” distribution. Then Q is consistent at P0 provided that, for sufficiently large x and for some M < ∞,
$$\frac{\pi(x+1)}{\pi(x)} \le \frac{M}{x}. \qquad (\triangledown)$$
⇒ (▽) requires the tail of π to be sufficiently light, and is close to necessary.

Remark. The “extremely mild” technical condition for the case of discrete P0 corresponds to asking π to be ultimately decreasing.

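To see condition (▽) at work on the mixing distributions of Examples 1 and 3, here is a tiny check (illustrative, not from the slides): the Poisson tail ratio decays like 1/x, so (▽) holds, while a geometric tail ratio is constant, so (▽) fails, matching the consistency/inconsistency behaviour above.

```python
# Tail ratios pi(x+1)/pi(x) for two mixing distributions (illustrative parameter values).
lam, q = 2.0, 0.5

def poisson_ratio(x):        # Poisson(lam): lam/(x+1) <= M/x with M = lam, so (▽) holds
    return lam / (x + 1)

def geometric_ratio(x):      # geometric with common ratio q, pi(r) ∝ q**(r-1):
    return q                 # the ratio is constant, so no M can satisfy (▽)

for x in (10, 100, 1000):
    print(x, poisson_ratio(x), lam / x, geometric_ratio(x))
```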

SLIDE 60

Consistency Continuous “true” distribution

What does this asymptotic analysis tell us?

Practical level: neat conditions which guarantee consistency for a large class of nonparametric priors increasingly used in practice.
Foundational level: a discrete ˜P is designed to model discrete distributions and should not be used to model data from continuous distributions.

Remark. The Dirichlet process enjoys:
⋄ the full weak support property
⋄ weak consistency for continuous P0
⇒ misleading! But as the sample size n diverges:
⋄ P0 generates $(X_n)_{n\ge 1}$ containing no ties with probability 1
⋄ a discrete ˜P generates $(X_n)_{n\ge 1}$ containing no ties with probability 0
⇒ model and data generating mechanism are incompatible!
For a discrete Q it is:
⋄ irrelevant to be consistent at a continuous P0 (it is just a coincidence if it is, e.g. Dirichlet, Gibbs with Poisson mixing);
⋄ important to be consistent at a discrete P0, and it is!


SLIDE 61

Consistency Continuous “true” distribution

References

  • De Blasi, Lijoi & Prünster (2012). An asymptotic analysis of a class of discrete nonparametric priors. Tech. Report.
  • Diaconis & Freedman (1986). On the consistency of Bayes estimates. Ann. Statist. 14, 1–26.
  • Favaro, Lijoi & Prünster (2012). On the stick-breaking representation of normalized inverse Gaussian priors. Biometrika 99, 663–674.
  • Favaro, Lijoi & Prünster (2012). A new estimator of the discovery probability. Biometrics, in press.
  • Ferguson (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209–230.
  • Ferguson (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 615–629.
  • Gnedin (2010). A species sampling model with finitely many types. Elect. Comm. Probab. 15, 79–88.
  • Gnedin & Pitman (2006). Exchangeable Gibbs partitions and Stirling triangles. J. Math. Sci. (N.Y.) 138, 5674–5685.
  • Good (1953). The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264.
  • Good & Toulmin (1956). The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43, 45–63.
  • Lo (1984). On a class of Bayesian nonparametric estimates. I. Density estimates. Ann. Statist. 12, 351–357.
  • Mao (2004). Prediction of the conditional probability of discovering a new class. J. Am. Statist. Assoc. 99, 1108–1118.
  • Mao & Lindsay (2002). A Poisson model for the coverage problem with a genomic application. Biometrika 89, 669–681.
  • Perman, Pitman & Yor (1992). Size-biased sampling of Poisson point processes and excursions. Probab. Theory Related Fields 92, 21–39.
  • Teh (2006). A hierarchical Bayesian language model based on Pitman–Yor processes. Coling/ACL 2006, 985–992.
