Bayes and Lancaster at the Chinese restaurant. Statistical uses of the Fleming-Viot Process.


  1. Bayes and Lancaster at the Chinese restaurant. Statistical uses of the Fleming-Viot Process. Dario Spanò, University of Warwick. 1st Berlin-Padova Young Researchers Workshop, 23-25 October 2014.

  2. Based on joint works with Bob Griffiths (Oxford), Paul Jenkins (Warwick), Matteo Ruggiero (Torino) and Omiros Papaspiliopoulos (Barcelona).

  3. Outline: Chinese restaurant process and Bayes; computable filters; Fleming-Viot; Lancaster joins the restaurant.

  4. Dirichlet measures and the Chinese restaurant process. Infinitely many delegates participate in an important probability young researchers workshop. Day 1: dinner at the Chinese restaurant. Delegates enter the room one by one and, if $k$ tables are occupied by $n_1, \ldots, n_k$ persons ($\sum_{i=1}^k n_i = n$), then the $(n+1)$-th delegate: joins the table with $n_j$ people with probability $n_j/(n+\theta)$, $j = 1, \ldots, k$; chooses a new table with probability $\theta/(n+\theta)$. Each new table is labelled with a colour chosen at random from a space $E$ of colours, according to a probability distribution $P_0$.

  5. Dirichlet measures and the Chinese restaurant process. Infinitely many delegates participate in an important probability young researchers workshop. Day 1: dinner at the Chinese restaurant. Delegates enter the room one by one and, if $k$ tables are occupied by $n_1, \ldots, n_k$ persons ($\sum_{i=1}^k n_i = n$), then the $(n+1)$-th delegate: joins the table with $n_j$ people with probability $n_j/(n+\theta)$, $j = 1, \ldots, k$; chooses a new table with probability $\theta/(n+\theta)$. Each new table is labelled with a colour chosen at random from a space $E$ of colours, according to a probability distribution $P_0$. Let $X_n$ be the colour of the table occupied by the $n$-th delegate, $n \in \mathbb{N}$. Denote $X^{(n)} = (X_1, \ldots, X_n)$ and $\hat e_n(X^{(n)}) := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$, $n \in \mathbb{N}$.
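A minimal simulation sketch of the Chinese restaurant process described above. The colour space $E$ and base distribution $P_0$ are taken to be the unit interval with the uniform distribution; these choices, and the parameter values, are illustrative only.

```python
import random

def chinese_restaurant(n, theta, seed=0):
    """Sample table colours X_1, ..., X_n from a CRP with parameter theta.

    New-table colours are drawn from P_0, taken here to be Uniform[0, 1].
    Returns the colour sequence and the list of [colour, occupancy] tables.
    """
    rng = random.Random(seed)
    tables = []   # tables[j] = [colour, number of occupants n_j]
    colours = []
    for i in range(n):                  # the (i+1)-th delegate arrives; i people are seated
        r = rng.random() * (i + theta)
        acc = 0.0
        for t in tables:
            acc += t[1]
            if r < acc:                 # join an occupied table with prob. n_j / (i + theta)
                t[1] += 1
                colours.append(t[0])
                break
        else:                           # open a new table with prob. theta / (i + theta)
            c = rng.random()            # colour drawn from P_0
            tables.append([c, 1])
            colours.append(c)
    return colours, tables

# The empirical measure \hat e_n is the normalised table occupancy:
colours, tables = chinese_restaurant(1000, theta=2.0)
e_n = {c: nj / len(colours) for c, nj in tables}
```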

  6. Bayes at the Chinese restaurant. The sequence $(X_1, X_2, \ldots)$ is infinitely exchangeable.

  7. Bayes at the Chinese restaurant. The sequence $(X_1, X_2, \ldots)$ is infinitely exchangeable. Prior: $\hat e_n(X^{(n)}) \xrightarrow{\text{a.s.}} F$ as $n \to \infty$, where $F \sim \pi_{\theta, P_0}$ (Ferguson-Dirichlet): $(F(A_1), \ldots, F(A_d)) \sim \mathrm{Dir}(\theta P_0(A_1), \ldots, \theta P_0(A_d))$, for every $d \in \mathbb{N}$ and every partition $(A_1, \ldots, A_d)$ of $E$.

  8. Bayes at the Chinese restaurant. The sequence $(X_1, X_2, \ldots)$ is infinitely exchangeable. Prior: $\hat e_n(X^{(n)}) \xrightarrow{\text{a.s.}} F$ as $n \to \infty$, where $F \sim \pi_{\theta, P_0}$ (Ferguson-Dirichlet): $(F(A_1), \ldots, F(A_d)) \sim \mathrm{Dir}(\theta P_0(A_1), \ldots, \theta P_0(A_d))$, for every $d \in \mathbb{N}$ and every partition $(A_1, \ldots, A_d)$ of $E$. Likelihood: $\mathcal{L}(X^{(n)} \mid F = \mu) = \mu^{\otimes n}$.

  9. Bayes at the Chinese restaurant. The sequence $(X_1, X_2, \ldots)$ is infinitely exchangeable. Prior: $\hat e_n(X^{(n)}) \xrightarrow{\text{a.s.}} F$ as $n \to \infty$, where $F \sim \pi_{\theta, P_0}$ (Ferguson-Dirichlet): $(F(A_1), \ldots, F(A_d)) \sim \mathrm{Dir}(\theta P_0(A_1), \ldots, \theta P_0(A_d))$, for every $d \in \mathbb{N}$ and every partition $(A_1, \ldots, A_d)$ of $E$. Likelihood: $\mathcal{L}(X^{(n)} \mid F = \mu) = \mu^{\otimes n}$. Posterior: $\mathcal{L}(F \mid x^{(n)}) \sim \pi_{\theta+n,\ \frac{\theta}{\theta+n} P_0 + \frac{n}{\theta+n} \hat e_n(x^{(n)})}$.
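On any finite partition, the prior/likelihood/posterior calculus above reduces to Dirichlet-multinomial conjugacy. A small numpy sketch, assuming a three-cell partition and hypothetical counts (none of these numbers come from the slides):

```python
import numpy as np

theta, d = 2.0, 3
P0 = np.array([0.5, 0.3, 0.2])     # base-measure masses P_0(A_1), P_0(A_2), P_0(A_3)
counts = np.array([7, 2, 1])       # hypothetical data: how many X_i fell in each cell
n = counts.sum()

prior_params = theta * P0                      # (F(A_1),...,F(A_d)) ~ Dir(theta * P0(A_j))
posterior_params = prior_params + counts       # conjugate update

# Equivalently, the posterior is pi_{theta+n, base} with base = (theta*P0 + n*e_n)/(theta+n):
e_n = counts / n
base = (theta * P0 + n * e_n) / (theta + n)
assert np.allclose(posterior_params, (theta + n) * base)
```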

  10. How crowded is your table? Under $\pi_{\theta, P_0}$, the pdf of $(F(A_1), \ldots, F(A_d))$ is proportional to $\prod_{j=1}^{d} x_j^{\theta P_0(A_j) - 1}\, \mathbb{I}\big[(x_1, \ldots, x_d) \in [0,1]^d : |x| = 1\big]$. If $E = \{0,1\}$, then $P_0 = p_0 \in [0,1]$, so $\pi_{\theta, p_0} = \mathrm{Beta}(\theta p_0, \theta(1-p_0))$. If $E$ is any Polish space, then $F(A) \sim \mathrm{Beta}(\theta P_0(A), \theta(1 - P_0(A)))$.
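A quick Monte Carlo check of the Beta marginal claim, with a hypothetical value of $\theta$ and partition masses chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, P0 = 5.0, np.array([0.3, 0.5, 0.2])   # hypothetical theta and P_0(A_j) masses

samples = rng.dirichlet(theta * P0, size=200_000)
FA1 = samples[:, 0]                          # marginal of F(A_1)

a, b = theta * P0[0], theta * (1 - P0[0])    # claimed Beta(a, b) marginal
print(FA1.mean(), a / (a + b))                           # should agree
print(FA1.var(), a * b / ((a + b) ** 2 * (a + b + 1)))   # should agree
```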

  11. Diffusion model. The time-evolution of a genetic variant, or allele, is well approximated by a diffusion process on the interval $[0,1]$. [Figure: allele frequency in $[0,1]$ plotted against time.]

  12. Diffusion model. The time-evolution of a genetic variant, or allele, is well approximated by a diffusion process on the interval $[0,1]$. [Figure: allele frequency in $[0,1]$ plotted against time.] Wright-Fisher SDE: $dF_t = b_\theta(F_t)\, dt + \sqrt{F_t(1 - F_t)}\, dW_t$, $F_0 = \mu$, $t \ge 0$. The infinitesimal drift $b_\theta(x)$ encapsulates directional forces such as natural selection, migration, mutation, ...
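A minimal Euler-Maruyama sketch of the Wright-Fisher SDE above, using the neutral mutation drift $b_{\alpha,\beta}(x) = \frac{1}{2}[\alpha(1-x) - \beta x]$ that appears later in the deck; the step size, horizon and parameter values are illustrative assumptions:

```python
import numpy as np

def simulate_wright_fisher(alpha, beta, F0, T, dt=1e-3, seed=0):
    """Euler-Maruyama discretisation of dF = b(F) dt + sqrt(F(1-F)) dW on [0, 1]."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    F = np.empty(n_steps + 1)
    F[0] = F0
    for k in range(n_steps):
        drift = 0.5 * (alpha * (1.0 - F[k]) - beta * F[k])      # b_{alpha,beta}(F_k)
        diffusion = np.sqrt(max(F[k] * (1.0 - F[k]), 0.0))
        F[k + 1] = F[k] + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal()
        F[k + 1] = min(max(F[k + 1], 0.0), 1.0)                  # keep the path inside [0, 1]
    return F

path = simulate_wright_fisher(alpha=1.0, beta=1.0, F0=0.5, T=10.0)
```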

  13. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \ldots < t_k$. Key assumption on the likelihood: $X_1(t), \ldots, X_{n(t)}(t) \mid F_t \overset{\text{iid}}{\sim} F_t$, $t \in \{t_1, \ldots, t_k\}$. [Figure: allele frequency in $[0,1]$ plotted against time.] How to infer diffusion sample path properties given data?

  14. Optimal filter. Assume the diffusion has stationary measure $\pi$ and transition function $P_t(\mu, d\nu)$. Let $f_\mu(x)$ be the likelihood of the data given the signal, both at stationarity. Two operators. Update operator (Bayes' rule): $\phi_x(\pi)(d\mu) = \frac{f_\mu(x)\, \pi(d\mu)}{\int_{\mathcal{M}_1} f_\nu(x)\, \pi(d\nu)}$. Prediction operator (propagator): $\psi_t(\pi)(d\nu) = \int_{\mathcal{M}_1} \pi(d\mu)\, P_t(\mu, d\nu)$. Definition: the optimal filter is the solution of the recursion $\pi_0 = \phi_{x_{t_0}}(\pi)$, $\pi_n = \phi_{x_{t_n}}(\psi_{t_n - t_{n-1}}(\pi_{n-1}))$. It is called a computable filter if iterating $n$ update/propagation steps involves finite sums whose number of terms depends on $n$.
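A skeletal version of the update/predict recursion, written for the case where the filtering distribution over a discretised signal is stored as a weight vector; `likelihoods` and `kernels` are hypothetical stand-ins for $f_\mu(x_{t_n})$ evaluated on the grid and for discretised $P_{t_n - t_{n-1}}(\mu, d\nu)$, not objects defined on the slides:

```python
import numpy as np

def update(weights, likelihood):
    """Bayes' rule phi_x: reweight the current distribution by the observation likelihood."""
    w = weights * likelihood
    return w / w.sum()

def predict(weights, transition_kernel):
    """Propagator psi_t: push the current distribution through a row-stochastic P_t."""
    return weights @ transition_kernel

def optimal_filter(prior, likelihoods, kernels):
    """pi_0 = phi_{x_{t_0}}(pi); pi_n = phi_{x_{t_n}}(psi_{t_n - t_{n-1}}(pi_{n-1}))."""
    pi = update(prior, likelihoods[0])
    filters = [pi]
    for lik, P in zip(likelihoods[1:], kernels):   # kernels[m] ~ P_{t_{m+1} - t_m}
        pi = update(predict(pi, P), lik)
        filters.append(pi)
    return filters
```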

  15. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \ldots < t_k$. Key assumption on the likelihood: $X_1(t), \ldots, X_{n(t)}(t) \mid F_t \overset{\text{iid}}{\sim} F_t$, $t \in \{t_1, \ldots, t_k\}$. [Figure: allele frequency in $[0,1]$ plotted against time.] How to infer diffusion sample path properties given data?

  16. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \ldots < t_k$. Key assumption on the likelihood: $X_1(t), \ldots, X_{n(t)}(t) \mid F_t \overset{\text{iid}}{\sim} F_t$, $t \in \{t_1, \ldots, t_k\}$. [Figure: allele frequency in $[0,1]$ plotted against time.] A priori, $F_{t_1} \sim \pi$.

  17. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \ldots < t_k$. Key assumption on the likelihood: $X_1(t), \ldots, X_{n(t)}(t) \mid F_t \overset{\text{iid}}{\sim} F_t$, $t \in \{t_1, \ldots, t_k\}$. [Figure: allele frequency in $[0,1]$ plotted against time.] Update: the distribution of $F_{t_1}$ given the data at $t_1$.

  18. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \ldots < t_k$. Key assumption on the likelihood: $X_1(t), \ldots, X_{n(t)}(t) \mid F_t \overset{\text{iid}}{\sim} F_t$, $t \in \{t_1, \ldots, t_k\}$. [Figure: allele frequency in $[0,1]$ plotted against time.] Predict: $F_{t_2}$ based on the posterior update at $t_1$, via the transition function $P_{t_2 - t_1}$ applied to the law of $F_{t_1}$ given $x^{(n_1)}(t_1)$.

  19. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \ldots < t_k$. Key assumption on the likelihood: $X_1(t), \ldots, X_{n(t)}(t) \mid F_t \overset{\text{iid}}{\sim} F_t$, $t \in \{t_1, \ldots, t_k\}$. Update: the distribution of $F_{t_2}$ given the data at $t_2$. Carry on for $t_3, t_4, \ldots$.

  20. Tractability of a filter. $dF_t = b_\theta(F_t)\, dt + \sqrt{F_t(1 - F_t)}\, dW_t$, $F_0 = \mu$, $t \ge 0$. Ideally we would like to: know the stationary distribution $\pi$; know how to compute the posterior $P(F_t \mid \text{data at } t)$; know how to compute $P_t(\mu, d\nu)$. Generally all three aspects are intractable. Neutral Fleming-Viot models have them all! $b_{\alpha,\beta}(x) = \frac{1}{2}[\alpha(1-x) - \beta x]$, $\alpha, \beta > 0$.

  21. Tractability of a filter. $dF_t = b_\theta(F_t)\, dt + \sqrt{F_t(1 - F_t)}\, dW_t$, $F_0 = \mu$, $t \ge 0$. Ideally we would like to: know the stationary distribution $\pi$ (Beta$(\alpha, \beta)$ distribution); know how to compute the posterior $P(F_t \mid \text{data at } t)$ (CRP); know how to compute $P_t(\mu, d\nu)$ (Lancaster probability). Generally all three aspects are intractable. Neutral Fleming-Viot models have them all! $b_{\alpha,\beta}(x) = \frac{1}{2}[\alpha(1-x) - \beta x]$, $\alpha, \beta > 0$.
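In the two-type case $E = \{0,1\}$ the first two tractable pieces amount to Beta-binomial conjugacy. A minimal sketch with hypothetical mutation parameters and counts (values chosen only for illustration):

```python
from scipy import stats

alpha, beta_ = 1.0, 2.0    # mutation parameters; the stationary law is Beta(alpha, beta_)
n, y = 20, 13              # hypothetical sample at time t: n draws from F_t, y of type 1

prior = stats.beta(alpha, beta_)                  # stationary distribution pi
posterior = stats.beta(alpha + y, beta_ + n - y)  # P(F_t | data at t) by conjugacy

print(prior.mean(), posterior.mean())
```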

  22. What are Lancaster probabilities? Definition: let $(X, Y)$ be an exchangeable pair of random variables with (identical) marginal distribution $\pi$. The joint distribution of $(X, Y)$ is a Lancaster probability distribution if, for every $n$, $E[Y^n \mid X = x] = \rho_n x^n + \text{a polynomial in } x \text{ of degree less than } n$. The coefficients $\{\rho_n\}$ are termed canonical correlation coefficients. In the neutral Fleming-Viot model, $P_t \mu^n = e^{-\frac{1}{2} n(n+\theta-1)t} \mu^n + \ldots$, $\theta = \alpha + \beta$. Benefit in filtering: given $F_0 = \mu$, a sample of size $n$ from $\mu$ is sufficient to predict a sample of size $n$ at time $t$.
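A tiny illustration of the eigenvalue statement: the leading coefficient of $P_t$ acting on the monomial $\mu^n$ decays as $e^{-\frac{1}{2} n(n+\theta-1)t}$. The helper below, with made-up parameter values, just tabulates these canonical correlation coefficients:

```python
import math

def canonical_correlation(n, t, alpha, beta):
    """rho_n(t) = exp(-0.5 * n * (n + theta - 1) * t), with theta = alpha + beta."""
    theta = alpha + beta
    return math.exp(-0.5 * n * (n + theta - 1) * t)

# Higher-order moments of the neutral Fleming-Viot diffusion decorrelate faster:
for n in (1, 2, 5, 10):
    print(n, canonical_correlation(n, t=0.1, alpha=1.0, beta=1.0))
```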

  23. Genealogy and eigenvalues. The canonical correlation coefficients $e^{-\frac{1}{2} n(n+\theta-1)t}$ are the eigenvalues of the semigroup $P_t$. A probabilistic interpretation is in terms of the model's genealogy (dual to the diffusion). [Figure: allele frequency in $[0,1]$ plotted against time.]

  24. Neutral model, a closer look. Finite population of size $N$; discrete, non-overlapping generations. At each generation, the types of individuals $J_1, \ldots, J_N$ are labelled with points in some Polish space $E$ (e.g. $E = \{0,1\}$). At each time $k$, each individual picks her parent uniformly at random from the previous generation $k-1$. With probability $1-u$ she inherits her parent's type; with probability $u$ she mutates to a new type chosen from $E$ according to a probability distribution $P_0$ on $E$ (if $E = \{0,1\}$, then $P_0\{1\} \in [0,1]$). Let $F_N(k) := \frac{1}{N}\sum_{i=1}^N \delta_{J_i(k)}$, $k = 0, 1, \ldots$.
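A minimal simulation of this discrete neutral model for the two-type case $E = \{0,1\}$; the population size, mutation probability and $P_0\{1\}$ below are illustrative assumptions, not values from the slides:

```python
import numpy as np

def neutral_wright_fisher(N=500, u=0.01, p1=0.5, generations=200, seed=0):
    """Discrete neutral model: each child copies a uniformly chosen parent, then
    mutates with probability u to a type drawn from P_0 = Bernoulli(p1).
    Returns F_N(k), the frequency of type 1 at each generation k."""
    rng = np.random.default_rng(seed)
    types = rng.binomial(1, p1, size=N)          # generation 0
    freqs = [types.mean()]
    for _ in range(generations):
        parents = rng.integers(0, N, size=N)     # each individual picks a parent uniformly
        types = types[parents]                   # inherit the parent's type ...
        mutate = rng.random(N) < u               # ... except where a mutation occurs
        types = np.where(mutate, rng.binomial(1, p1, size=N), types)
        freqs.append(types.mean())
    return np.array(freqs)

path = neutral_wright_fisher()
```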
