
  1. Exchangeability. Peter Orbanz, Columbia University

  2. PARAMETERS AND PATTERNS. Parameters: $P(X \mid \theta)$ = Probability[data | pattern]. [Figure: regression example plotting output y against input x.] Inference idea: data = underlying pattern + independent noise.

  3. TERMINOLOGY. Parametric model: ◮ Number of parameters fixed (or constantly bounded) w.r.t. sample size. Nonparametric model: ◮ Number of parameters grows with sample size ◮ $\infty$-dimensional parameter space. Example: density estimation. [Figure: parametric vs. nonparametric estimates of a density p(x).]

  4. NONPARAMETRIC BAYESIAN MODEL. Definition: a nonparametric Bayesian model is a Bayesian model on an $\infty$-dimensional parameter space. Interpretation: parameter space $\mathbf{T}$ = set of possible patterns. Recall previous tutorials:
       Model               | T                | Application
       Gaussian process    | Smooth functions | Regression problems
       DP mixtures         | Smooth densities | Density estimation
       CRP, 2-param. CRP   | Partitions       | Clustering
     Solution to Bayesian problem = posterior distribution on patterns. [Sch95]

  5. DE FINETTI'S THEOREM. Infinite exchangeability: for all $\pi \in S_\infty$ (= infinite symmetric group), $P(X_1, X_2, \ldots) = P(X_{\pi(1)}, X_{\pi(2)}, \ldots)$, or $\pi(P) = P$. Theorem (de Finetti): $P$ exchangeable $\Leftrightarrow$ $P(X_1, X_2, \ldots) = \int_{M(\mathcal{X})} \prod_{n=1}^{\infty} Q(X_n)\, d\nu(Q)$. ◮ $Q$ is a random measure ◮ $\nu$ is uniquely determined by $P$

  6. FINITE EXCHANGEABILITY. Finite sequence $X_1, \ldots, X_n$: exchangeability of a finite sequence $\not\Rightarrow$ de Finetti representation. Example: two exchangeable random bits with joint distribution
                  X1 = 0   X1 = 1
       X2 = 0       0       1/2
       X2 = 1      1/2       0
     Suppose de Finetti holds; then $0 = P(X_1 = X_2 = 1) = \int_{[0,1]} p^2\, d\nu(p)$, so $\nu\{p = 0\} = 1$, and $0 = P(X_1 = X_2 = 0) = \int_{[0,1]} (1-p)^2\, d\nu(p)$, so $\nu\{p = 1\} = 1$. This is a contradiction, since $\nu$ cannot put all of its mass on both $p = 0$ and $p = 1$. Intuition: finite exchangeability does not eliminate sequential patterns. [DF80]
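A quick exact check of the two-bit example, as a minimal Python sketch: it verifies that the table is exchangeable and that an i.i.d. Bernoulli(p) pair always has $P(X_1 = X_2) = p^2 + (1-p)^2 \geq 1/2$, so no mixture over $p$ can reach the table's value $P(X_1 = X_2) = 0$. The grid over $p$ and the function names are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

# Joint table of the two exchangeable bits: TABLE[(x1, x2)] = P(X1 = x1, X2 = x2).
TABLE = {(0, 0): Fraction(0), (0, 1): Fraction(1, 2),
         (1, 0): Fraction(1, 2), (1, 1): Fraction(0)}

def is_exchangeable(table):
    """Two bits are exchangeable iff swapping X1 and X2 leaves the table unchanged."""
    return all(table[(a, b)] == table[(b, a)] for (a, b) in table)

def prob_equal_iid(p):
    """P(X1 = X2) when X1, X2 are i.i.d. Bernoulli(p)."""
    return p * p + (1 - p) * (1 - p)

# p^2 + (1 - p)^2 = 1/2 + 2 (p - 1/2)^2 >= 1/2, so any mixing measure nu over p
# also gives P(X1 = X2) >= 1/2. The grid below is only an illustrative check.
grid = [Fraction(k, 100) for k in range(101)]
min_over_grid = min(prob_equal_iid(p) for p in grid)
table_prob_equal = TABLE[(0, 0)] + TABLE[(1, 1)]

print(is_exchangeable(TABLE), min_over_grid, table_prob_equal)
```

The gap between $1/2$ and $0$ is exactly the "sequential pattern" (the second bit always flips the first) that finite exchangeability allows but a mixture of i.i.d. laws cannot produce.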

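  7. SUPPORT OF PRIORS. [Figure: the model $\mathcal{P} \subset M(\mathcal{X})$, with a true distribution $P_0 = P_{\theta_0}$ inside the model, and a $P_0$ outside the model, in which case the model is misspecified.] [Gho10, KvdV06]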

  8. SUPPORT OF NONPARAMETRIC PRIORS. Large support: ◮ The support of a nonparametric prior is larger ($\infty$-dimensional) than that of a parametric prior (finite-dimensional). ◮ However: no uniform prior (or even "neutral" improper prior) exists on $M(\mathcal{X})$. Interpretation of nonparametric prior assumptions: concentration of a nonparametric prior on a subset of $M(\mathcal{X})$ typically represents a structural prior assumption. ◮ GP regression with unknown bandwidth: ◮ any continuous function is possible ◮ the prior can express e.g. "very smooth functions are more probable" ◮ Clustering: the expected number of clusters is... ◮ ...small → CRP prior ◮ ...power law → two-parameter CRP
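To make the "small" vs. "power law" contrast concrete, here is a minimal Python sketch of the two-parameter CRP. The parameterization (concentration `alpha`, discount `sigma`, with `sigma = 0` giving the ordinary CRP) is the standard Pitman-Yor one and is an assumption here, since the slide names no parameters. The helper `expected_clusters` computes the exact expected number of clusters through a one-step recursion; it grows roughly like `alpha * log(n)` for the CRP and like a power of `n` when `sigma > 0`.

```python
import random

def expected_clusters(n, alpha, sigma=0.0):
    """Exact E[K_n], the expected number of blocks after n customers.

    A new block opens at step i+1 with probability (alpha + sigma * K_i) / (alpha + i),
    which is linear in K_i, so taking expectations gives the recursion
    E[K_{i+1}] = E[K_i] + (alpha + sigma * E[K_i]) / (alpha + i).
    """
    e = 0.0
    for i in range(n):
        e += (alpha + sigma * e) / (alpha + i)
    return e

def sample_crp(n, alpha, sigma=0.0, rng=None):
    """Sample the block sizes of a random partition of {1, ..., n}.

    Assumption: Pitman-Yor parameterization with alpha > 0 and 0 <= sigma < 1;
    sigma = 0 is the ordinary CRP.
    """
    rng = rng or random.Random(0)
    sizes = []
    for i in range(n):
        k = len(sizes)
        # Existing block j has weight (size_j - sigma), a new block has weight
        # (alpha + sigma * k); the weights sum to alpha + i.
        weights = [s - sigma for s in sizes] + [alpha + sigma * k]
        j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes

for n in (100, 1000, 10000):
    crp = expected_clusters(n, alpha=1.0)
    two_param = expected_clusters(n, alpha=1.0, sigma=0.5)
    print(f"n={n:>6}: CRP {crp:6.1f}   two-parameter CRP {two_param:7.1f}")
```

The recursion avoids Monte Carlo noise when comparing growth rates; `sample_crp` is there to draw actual partitions under the same rule.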

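  9. PARAMETERIZED MODELS. Probability model: a random variable $X: \Omega \to \mathcal{X}$ with law $P(X) = X[P]$. Parameterized model $P[X \mid \Theta]$, with parameter $\Theta: \Omega \to \mathbf{T}$. [Diagram: $\Omega \xrightarrow{X^\infty} \mathcal{X}^\infty \xrightarrow{F} M(\mathcal{X}) \supset \mathcal{P} \xrightarrow{T} \mathbf{T}$.] ◮ $\mathcal{P} = \{P[X \mid \theta] \mid \theta \in \mathbf{T}\}$ ◮ $F \equiv$ law of large numbers ◮ $T: P[\cdot \mid \Theta = \theta] \mapsto \theta$ bijection ◮ $\Theta := T \circ F \circ X^\infty$ [Sch95]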

  10. JUSTIFICATION: BY EXCHANGEABILITY. Again, de Finetti: $P(X_1, X_2, \ldots) = \int_{M(\mathcal{X})} \prod_{n=1}^{\infty} Q(X_n)\, d\nu(Q) = \int_{\mathbf{T}} \prod_{n=1}^{\infty} Q(X_n \mid \Theta = \theta)\, d\nu_{\mathbf{T}}(\theta)$. ◮ $\Theta$ is a random measure (since $\Theta(\omega) \in M(\mathcal{X})$). Convergence results: the de Finetti theorem comes with a convergence result attached: ◮ Empirical measure: $F_n \xrightarrow{\text{weakly}} \theta$ as $n \to \infty$ ◮ Posterior $\Lambda_n(\Theta \mid X_1, \ldots, X_n) = \Lambda_n(\cdot, \omega)$ in $M(\mathbf{T})$ exists ◮ Posterior convergence: $\Lambda_n(\cdot, \omega) \xrightarrow{n \to \infty} \delta_{\Theta(\omega)}$ [Kal01]
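The convergence statements can be watched in the simplest exchangeable model. This is a minimal Python sketch under stated assumptions: binary data, mixing measure $\nu$ = Uniform[0, 1] over the success probability (so the random measure $\Theta$ is Bernoulli(p) with random p), and the conjugate Beta posterior. The empirical frequency stands in for $F_n$, and the shrinking posterior standard deviation reflects $\Lambda_n(\cdot, \omega) \to \delta_{\Theta(\omega)}$. The choice of $\nu$ and all function names are illustrative.

```python
import random

def sample_exchangeable_bits(n, rng):
    """de Finetti sampling. Assumption: nu = Uniform[0, 1]; draw p ~ nu, then X_i i.i.d. Bernoulli(p)."""
    p = rng.random()
    xs = [1 if rng.random() < p else 0 for _ in range(n)]
    return p, xs

def beta_posterior(xs, a=1.0, b=1.0):
    """Mean and variance of the Beta(a + #ones, b + #zeros) posterior on p.

    With a = b = 1 the Beta(1, 1) prior is exactly nu = Uniform[0, 1].
    """
    ones = sum(xs)
    a_post, b_post = a + ones, b + len(xs) - ones
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total ** 2 * (total + 1))
    return mean, var

rng = random.Random(42)
p, xs = sample_exchangeable_bits(10_000, rng)
for n in (10, 100, 1_000, 10_000):
    freq = sum(xs[:n]) / n  # empirical measure F_n, summarized by its mass on {1}
    mean, var = beta_posterior(xs[:n])
    print(f"n={n:>6}: F_n(1)={freq:.3f}  posterior mean={mean:.3f}  "
          f"sd={var ** 0.5:.4f}  (p={p:.3f})")
```

Both the empirical frequency and the posterior mean settle near the sampled `p`, while the posterior standard deviation shrinks roughly like $1/\sqrt{n}$.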

  11. SPECIAL TYPES OF EXCHANGEABLE DATA

  12. MODIFICATIONS. Pólya urns: $P(X_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{1}{\alpha + n} \sum_{j=1}^{n} \delta_{x_j}(X_{n+1}) + \frac{\alpha}{\alpha + n} G_0(X_{n+1})$. Exchangeable: ◮ $\nu$ is $\mathrm{DP}(\alpha, G_0)$ ◮ $\prod_{n=1}^{\infty} Q(X_n \mid \theta) = \prod_{n=1}^{\infty} \theta(X_n) = \prod_{n=1}^{\infty} \sum_{j=1}^{\infty} c_j \delta_{t_j}(X_n)$. Exchangeable increment processes (H. Bühlmann): a stationary, exchangeable increment process is a mixture of Lévy processes, $P((X_t)_{t \in \mathbb{R}_+}) = \int L_{\alpha,\gamma,\mu}((X_t)_{t \in \mathbb{R}_+})\, d\nu(\alpha, \gamma, \mu)$, where $L_{\alpha,\gamma,\mu}$ = Lévy process with jump measure $\mu$. [Büh60, Kal01]
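Exchangeability of the Pólya urn can be checked exactly on a short sequence. The minimal Python sketch below assumes a base measure $G_0$ on a finite set (given as a dict), in which case the predictive rule above reduces to $(\#\{j \le n : x_j = x\} + \alpha G_0(x)) / (\alpha + n)$. It multiplies these predictive probabilities along a sequence and confirms that every reordering has the same probability. The finite $G_0$, the value of $\alpha$ and the function name are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

def urn_sequence_prob(seq, alpha, g0):
    """Exact probability of observing `seq` from the Polya urn.

    Assumption: g0 is a base measure on a finite set, given as {value: probability}.
    """
    prob = Fraction(1)
    counts = {}
    for n, x in enumerate(seq):
        # Predictive rule: (number of earlier x_j equal to x + alpha * G0(x)) / (alpha + n).
        prob *= (counts.get(x, 0) + alpha * g0[x]) / (alpha + n)
        counts[x] = counts.get(x, 0) + 1
    return prob

alpha = Fraction(2)
g0 = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
seq = ["a", "b", "a", "c", "a"]

# Every permutation of seq gets the same probability: the urn sequence is exchangeable.
probs = {urn_sequence_prob(list(perm), alpha, g0) for perm in permutations(seq)}
print(probs)

# Sanity check: the probabilities of all length-3 sequences sum to 1.
print(sum(urn_sequence_prob(list(s), alpha, g0) for s in product(g0, repeat=3)))
```

Exact `Fraction` arithmetic is used so that equality across permutations is a genuine identity rather than a floating-point coincidence.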

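  13. MODIFICATION 2: RANDOM PARTITIONS. Random partition of $\mathbb{N}$: $\Pi = \{B_1, B_2, \ldots\}$, e.g. $\{\{1, 3, 5, \ldots\}, \{2, 4\}, \{10\}, \ldots\}$. Paint-box distribution: ◮ weights $s_1, s_2, \ldots \geq 0$ with $\sum_j s_j \leq 1$ ◮ $U_1, U_2, \ldots \sim \mathrm{Uniform}[0,1]$. [Figure: $[0,1]$ divided into intervals of lengths $s_1, s_2, \ldots$ and a remainder of length $1 - \sum_j s_j$, with points $U_1, U_2, U_3$ dropped into it.] Sampling $\Pi \sim \beta[\cdot \mid s]$: $i, j \in \mathbb{N}$ in the same block $\Leftrightarrow$ $U_i, U_j$ in the same interval; $\{i\}$ is a separate block $\Leftrightarrow$ $U_i$ lies in the interval of length $1 - \sum_j s_j$. Theorem (Kingman): $\Pi$ exchangeable $\Leftrightarrow$ $P(\Pi \in \cdot) = \int \beta[\Pi \in \cdot \mid s]\, Q(ds)$ [Kin78]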
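The paint-box sampling rule translates directly into code. The minimal Python sketch below lays the intervals of lengths $s_1, s_2, \ldots$ side by side starting at 0 (an arbitrary convention that does not change the distribution of the partition), locates each $U_i$ by binary search, and gives every $U_i$ that lands in the leftover interval of length $1 - \sum_j s_j$ its own block. Function and variable names are illustrative.

```python
import bisect
import itertools
import random

def paintbox_partition(s, us):
    """Blocks of {1, ..., len(us)} under the paint-box rule with weights s."""
    assert all(w >= 0 for w in s) and sum(s) <= 1 + 1e-12
    # Convention (assumption): intervals [0, s_1), [s_1, s_1 + s_2), ... placed left to right.
    edges = list(itertools.accumulate(s))  # right endpoints of the intervals
    blocks = {}                            # interval index -> indices i whose U_i fell in it
    singletons = []
    for i, u in enumerate(us, start=1):
        j = bisect.bisect_right(edges, u)  # index of the interval containing u
        if j < len(s):
            blocks.setdefault(j, []).append(i)
        else:
            singletons.append([i])         # U_i in the leftover interval: {i} is its own block
    return list(blocks.values()) + singletons

rng = random.Random(0)
print(paintbox_partition([0.5, 0.3], [rng.random() for _ in range(10)]))
```

Because the labels $i$ only enter through i.i.d. draws $U_i$, permuting the indices permutes the $U_i$ without changing their joint law, which is the easy direction of Kingman's theorem.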

  14. ROTATION INVARIANCE. Rotatable sequence: $P_n(X_1, \ldots, X_n) = P_n(R_n(X_1, \ldots, X_n))$ for all $R_n \in O(n)$. Infinite case: $X_1, X_2, \ldots$ rotatable $\Leftrightarrow$ $X_1, \ldots, X_n$ rotatable for all $n$. Theorem (Freedman): an infinite sequence is rotatable iff $P(X_1, X_2, \ldots) = \int_{\mathbb{R}_+} \prod_{n=1}^{\infty} N_\sigma(X_n)\, d\nu_{\mathbb{R}_+}(\sigma)$, where $N_\sigma$ denotes the $(0, \sigma)$-Gaussian.
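The easy direction of Freedman's theorem, that a scale mixture of centered Gaussians is rotatable, can be checked numerically. In the minimal Python sketch below, the mixing measure $\nu_{\mathbb{R}_+}$ is assumed to be discrete and $\sigma$ is read as a standard deviation. The joint density then depends on $x$ only through $\sum_i x_i^2$, so a Givens rotation (one element of $O(n)$) leaves it unchanged. An i.i.d. Laplace product, included for contrast, is exchangeable but not rotation invariant. All names and numbers are illustrative.

```python
import math

def gaussian_scale_mixture_density(xs, sigmas, weights):
    """Density of sum_k w_k prod_i N(x_i; 0, sigma_k^2).

    Assumptions: discrete mixing measure {sigma_k: w_k}; sigma is a standard deviation.
    """
    n = len(xs)
    r2 = sum(x * x for x in xs)  # the density depends on xs only through this value
    return sum(w * (2 * math.pi * s * s) ** (-n / 2) * math.exp(-r2 / (2 * s * s))
               for s, w in zip(sigmas, weights))

def laplace_product_density(xs):
    """i.i.d. standard Laplace density: exchangeable, but not rotatable."""
    return math.prod(0.5 * math.exp(-abs(x)) for x in xs)

def givens_rotate(xs, i, j, angle):
    """Rotate coordinates i and j of xs by `angle`; the result is R xs for some R in O(n)."""
    ys = list(xs)
    c, s = math.cos(angle), math.sin(angle)
    ys[i] = c * xs[i] - s * xs[j]
    ys[j] = s * xs[i] + c * xs[j]
    return ys

xs = [0.3, -1.2, 0.7, 2.0]
ys = givens_rotate(xs, 0, 3, 0.8)
sigmas, weights = [0.5, 1.0, 3.0], [0.2, 0.5, 0.3]
print(gaussian_scale_mixture_density(xs, sigmas, weights),
      gaussian_scale_mixture_density(ys, sigmas, weights))
print(laplace_product_density(xs), laplace_product_density(ys))
```

The fact that only $\sum_i x_i^2$ enters the mixture density is the same observation that makes the Euclidean norm a sufficient statistic for rotatable sequences.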

  15. TWO INTERPRETATIONS. As a special case of de Finetti: ◮ Rotatable ⇒ exchangeable ◮ General de Finetti: parameter space $\mathbf{T} = M(\mathcal{X})$ ◮ Rotation invariance: $\mathbf{T}$ shrinks to $\{N_\sigma \mid \sigma \in \mathbb{R}_+\}$. As invariance under a different symmetry: ◮ Exchangeability = invariance of $P(X_1, X_2, \ldots)$ under a group action ◮ Freedman: different group ($O(n)$ rather than $S_\infty$) ◮ In these cases: symmetry ⇒ decomposition theorem

  16. NON-EXCHANGEABLE DATA

  17. EXCHANGEABILITY: RANDOM GRAPHS. Random graph with independent edges. Given: a symmetric function $\theta: [0,1]^2 \to [0,1]$. ◮ $U_1, U_2, \ldots \sim \mathrm{Uniform}[0,1]$ ◮ Edge $(i, j)$ present: $(i, j) \sim \mathrm{Bernoulli}(\theta(U_i, U_j))$. Call this distribution $\Gamma(G \in \cdot \mid \theta)$. [Figure: the function $\theta$ on $[0,1]^2$, where $\theta(U_1, U_2) = \Pr\{\text{edge } 1,2\}$, and a sampled graph on vertices 1 to 9.] Theorem (Aldous; Hoover): a random (dense) graph $G$ is exchangeable iff $P(G \in \cdot) = \int_{\mathbf{T}} \Gamma(G \in \cdot \mid \theta)\, Q(d\theta)$ [Ald81, Hoo79]
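Sampling from $\Gamma(G \in \cdot \mid \theta)$ needs two layers of randomness: one uniform label $U_i$ per vertex, then one independent coin flip per pair. The minimal Python sketch below returns the graph as a set of edges $(i, j)$ with $i < j$; the example function $\theta(u, v) = uv$ is an illustrative assumption, and any symmetric function with values in $[0, 1]$ could be passed instead.

```python
import itertools
import random

def sample_graph(n, theta, rng):
    """Sample a graph on vertices 1..n from Gamma(. | theta).

    theta must be symmetric, theta(u, v) == theta(v, u), with values in [0, 1].
    """
    us = {i: rng.random() for i in range(1, n + 1)}  # U_i ~ Uniform[0, 1]
    edges = set()
    for i, j in itertools.combinations(range(1, n + 1), 2):
        if rng.random() < theta(us[i], us[j]):       # edge (i, j) ~ Bernoulli(theta(U_i, U_j))
            edges.add((i, j))
    return us, edges

rng = random.Random(0)
# Illustrative choice of theta: theta(u, v) = u * v.
us, edges = sample_graph(9, lambda u, v: u * v, rng)
print(sorted(edges))
```

Vertex labels enter only through the i.i.d. $U_i$, which is why relabeling the vertices leaves the distribution of the graph unchanged.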


  19. DE FINETTI: GEOMETRY. Finite case: $P = \sum_{e_i \in E} \nu_i e_i$ ◮ $E = \{e_1, e_2, e_3\}$ ◮ $(\nu_1, \nu_2, \nu_3)$ barycentric coordinates. [Figure: triangle with vertices $e_1, e_2, e_3$ and a point $P$ inside.] Infinite/continuous case: $P(\cdot) = \int_E e(\cdot)\, d\nu(e) = \int_{\mathbf{T}} k(\theta, \cdot)\, d\nu_{\mathbf{T}}(\theta)$ ◮ $k: \mathbf{T} \to E \subset M(\mathcal{X})$ probability kernel (= conditional probability) ◮ $k$ is a random measure with values $k(\theta, \cdot) \in E$ ◮ de Finetti: $k(\theta, \cdot) = \prod_{n \in \mathbb{N}} Q(\cdot \mid \theta)$ and $\mathbf{T} = M(\mathcal{X})$
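In the finite case the barycentric coordinates are found by solving a small linear system. The minimal Python sketch below assumes three distributions $e_1, e_2, e_3$ on a three-point set, written as probability vectors, and solves $\sum_i \nu_i e_i = P$ by Cramer's rule in exact arithmetic; $P$ lies inside the triangle exactly when all $\nu_i \geq 0$. The specific vectors are illustrative.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def barycentric(p, e):
    """Solve p = nu_1 e_1 + nu_2 e_2 + nu_3 e_3 for (nu_1, nu_2, nu_3) by Cramer's rule.

    Because every e_i and p sum to 1, any solution automatically has nu_1 + nu_2 + nu_3 = 1.
    """
    a = [[e[j][r] for j in range(3)] for r in range(3)]  # column j holds e_j
    d = det3(a)
    if d == 0:
        raise ValueError("e_1, e_2, e_3 do not span a triangle")
    nu = []
    for j in range(3):
        aj = [row[:] for row in a]
        for r in range(3):
            aj[r][j] = p[r]  # replace column j by p
        nu.append(det3(aj) / d)
    return nu

# Illustrative extreme points and a point P built from nu = (1/5, 3/10, 1/2).
h, t = Fraction(1, 2), Fraction(1, 3)
e = [[h, h, 0], [0, h, h], [t, t, t]]
p = [Fraction(4, 15), Fraction(5, 12), Fraction(19, 60)]
print(barycentric(p, e))
```

In de Finetti's setting the "triangle" is replaced by the infinite-dimensional set of i.i.d. laws, and the finite weight vector by the mixing measure $\nu$.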

  20. DECOMPOSITION BY SYMMETRY. Theorem (Varadarajan): ◮ $G$ a nice group on a space $Y$ ◮ Call a measure $\mu$ ergodic if $\mu(A) \in \{0, 1\}$ for all $G$-invariant sets $A$. ◮ $E :=$ {ergodic probability measures}. Then there is a Markov kernel $k: Y \to E$ such that: $P \in M(Y)$ is $G$-invariant $\Leftrightarrow$ $P(A) = \int_{\mathbf{T}} k(\theta, A)\, d\nu(\theta)$. de Finetti: ◮ $G = S_\infty$ and $Y = \mathcal{X}^\infty$ ◮ $G$-invariant sets = exchangeable events ◮ $E$ = factorial (i.e. i.i.d. product) distributions ("Hewitt-Savage 0-1 law") [Var63]

  21. SYMMETRY AND SUFFICIENCY

  22. SUFFICIENT STATISTICS. Problem: apparently no direct connection with standard models. Sufficient statistic: functions $S_n$ of the data are sufficient if: ◮ Intuitively: $S_n(X_1, \ldots, X_n)$ contains all the information the sample provides on the parameter ◮ Formally: $P_n(X_1, \ldots, X_n \mid \Theta, S_n) = P_n(X_1, \ldots, X_n \mid S_n)$ for all $n$. Sufficiency and symmetry: ◮ $P$ exchangeable $\Leftrightarrow$ $S_n(x_1, \ldots, x_n) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$ sufficient ◮ $P$ rotatable $\Leftrightarrow$ $S_n(x_1, \ldots, x_n) = \sqrt{\sum_{i=1}^{n} x_i^2} = \|(x_1, \ldots, x_n)\|_2$ sufficient
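For binary data the empirical measure carries the same information as the count of ones, and its sufficiency can be checked exactly: conditional on $S_n$, every arrangement of the ones is equally likely, whatever the parameter. The minimal Python sketch below assumes i.i.d. Bernoulli(θ) data (an exchangeable model) and two illustrative values of θ; names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def iid_bernoulli_prob(xs, theta):
    """P(X_1..X_n = xs) for i.i.d. Bernoulli(theta) bits (assumed model)."""
    k = sum(xs)
    return theta ** k * (1 - theta) ** (len(xs) - k)

def conditional_given_count(n, k, theta):
    """P(X_1..X_n = xs | S_n = k, theta) for every binary xs with k ones."""
    seqs = [xs for xs in product((0, 1), repeat=n) if sum(xs) == k]
    z = sum(iid_bernoulli_prob(xs, theta) for xs in seqs)  # P(S_n = k | theta)
    return {xs: iid_bernoulli_prob(xs, theta) / z for xs in seqs}

for theta in (Fraction(1, 5), Fraction(2, 3)):
    cond = conditional_given_count(5, 2, theta)
    # The conditional law is uniform over the C(5, 2) arrangements, free of theta.
    print(theta, len(cond), comb(5, 2), set(cond.values()))
```

Since $\theta$ drops out of the conditional law, the count (equivalently, the empirical measure) is sufficient in the sense of the formal definition above.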

  23. DECOMPOSITION BY SUFFICIENCY. Theorem (Diaconis and Freedman; Lauritzen; several others). Given: a sufficient statistic $S_n$ for each $n$, and $k_n(\cdot, s_n)$ = conditional probability of $X_1, \ldots, X_n$ given $s_n$. Then:
       1. $k_n$ converges to a limit function: $k_n(\cdot, S_n(X_1(\omega), \ldots, X_n(\omega))) \xrightarrow{n \to \infty} k_\infty(\cdot, \omega)$
       2. $P(X_1, X_2, \ldots)$ has the decomposition $P(\cdot) = \int k_\infty(\cdot, \omega)\, d\nu(\omega)$
       3. The model $\mathcal{P} \subset M(\mathcal{X})$ is a convex set with extreme points $k_\infty(\cdot, \omega)$
       4. The measure $\nu$ is uniquely determined by $P$
     (Theorem statement omits technical conditions.)
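Item 1 can be made concrete for binary data with $S_n$ = number of ones. Given $S_n = k$, the sequence is uniform over arrangements, so $k_n$ describes drawing without replacement from an urn with $k$ ones and $n - k$ zeros. The minimal Python sketch below evaluates $k_n$ on a fixed prefix and shows it approaching the i.i.d. Bernoulli(p) probability when $k/n \to p$, the sense in which the limit $k_\infty$ is an i.i.d. law here. The binary setting and all names are illustrative assumptions.

```python
from fractions import Fraction

def k_n_prefix(prefix, n, k):
    """P(X_1..X_m = prefix | S_n = k) for binary data (assumed setting).

    Given S_n = k, the bits are drawn without replacement from k ones and n - k zeros.
    """
    if len(prefix) > n:
        raise ValueError("prefix longer than n")
    prob = Fraction(1)
    ones, zeros = k, n - k
    for x in prefix:
        remaining = ones + zeros
        if x == 1:
            prob *= Fraction(ones, remaining)
            ones -= 1
        else:
            prob *= Fraction(zeros, remaining)
            zeros -= 1
        if prob == 0:  # urn ran out of the required symbol
            return prob
    return prob

def iid_prefix(prefix, p):
    """P(X_1..X_m = prefix) under i.i.d. Bernoulli(p)."""
    k = sum(prefix)
    return p ** k * (1 - p) ** (len(prefix) - k)

prefix = [1, 0, 1]
for n in (10, 100, 10_000):
    k = round(0.3 * n)
    print(n, float(k_n_prefix(prefix, n, k)), iid_prefix(prefix, 0.3))
```

For small $n$ the without-replacement correction is visible; as $n$ grows with $k/n$ fixed, $k_n$ on any fixed prefix converges to the i.i.d. value.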
