
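Being Bayesian when you have too much data, preceded by an introduction to the CRIStAL mille-feuille (original title: "Être bayésien quand on a trop de données, précédé d'une introduction au mille-feuille CRIStAL")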

Rémi Bardenet (CNRS & CRIStAL, Univ. Lille, France). CRIStAL, DatIng, SigMA. Bayes and tall data.


  1. Being Bayesian when you have too much data, preceded by an introduction to the CRIStAL mille-feuille. Rémi Bardenet, CNRS & CRIStAL, Univ. Lille, France.

  2. Rémi ∈ SigMA ⊂ DatIng ⊂ CRIStAL ⊂ Univ. Lille
     ◮ CRIStAL: 222 permanent staff, including 22 CNRS and 27 Inria.
     ◮ DatIng: ∼ 40 permanent staff.
     ◮ SigMA: 13 permanent staff.

  9. Bayesian inference
     ◮ A biologist decides on
        ◮ a likelihood $p(x \mid \theta)$,
        ◮ a prior $p(\theta)$.
     ◮ Then they have implicitly decided on
        ◮ a posterior $p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{Z}$, where $Z = \int p(x \mid \theta)\, p(\theta)\, d\theta$ is the normalizing constant.
     ◮ Bayesian inference is all about computing integrals $\int h(\theta)\, p(\theta \mid x)\, d\theta$.
     ◮ MCMC samples an ergodic Markov chain $(\theta_t)_{t=1,\dots,T}$ with stationary distribution $p(\cdot \mid x)$, so that
       $$\sqrt{T} \left[ \frac{1}{T} \sum_{t=1}^{T} h(\theta_t) - \int h(\theta)\, p(\theta \mid x)\, d\theta \right] \xrightarrow[T \to \infty]{d} \mathcal{N}(0, \sigma^2).$$
     ◮ Sampling $(\theta_t)_{t=1,\dots,T}$ requires $T$ likelihood evaluations.
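To make the roles of $Z$ and of the integral $\int h(\theta)\, p(\theta \mid x)\, d\theta$ concrete, here is a minimal sketch on a one-dimensional toy problem where the answer is known in closed form. The Bernoulli model, the uniform prior, the data counts `n` and `k`, and the helper name `posterior_expectation` are all assumptions made for illustration; none of them come from the slides.

```python
import math

def posterior_expectation(h, log_lik, log_prior, grid):
    """Approximate Z and the posterior expectation of h with a midpoint rule."""
    step = grid[1] - grid[0]
    # Unnormalized posterior p(x | theta) p(theta) at each grid point.
    weights = [math.exp(log_lik(t) + log_prior(t)) for t in grid]
    z = sum(weights) * step  # normalizing constant Z
    integral = sum(h(t) * w for t, w in zip(grid, weights)) * step / z
    return z, integral

# Hypothetical data: k successes out of n Bernoulli trials, uniform prior on (0, 1).
n, k = 20, 14
log_lik = lambda t: k * math.log(t) + (n - k) * math.log(1.0 - t)
log_prior = lambda t: 0.0

m = 10_000
grid = [(j + 0.5) / m for j in range(m)]  # midpoints, so log(0) never occurs
z, post_mean = posterior_expectation(lambda t: t, log_lik, log_prior, grid)

# The posterior is Beta(k + 1, n - k + 1), whose mean is known exactly.
exact_mean = (k + 1) / (n + 2)
```

Grid integration like this only works in very low dimension, which is one reason MCMC is used for the general case.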


  14. Tall data
     ◮ Assume the data are independent conditional on $\theta$: $p(x \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$.
     ◮ Can you get the same central limit theorem while never evaluating all terms in the product?
     ◮ Yes [1], sometimes using $o(n)$ data points per iteration! [2]
     ◮ Still unanswered: what is the equivalent of stochastic gradient for integration?
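The cost issue behind the tall-data question can be sketched as follows: under conditional independence the log-likelihood is a sum of $n$ terms, so every full evaluation touches every data point. The unit-variance Gaussian model, the synthetic data, and the names `gaussian_log_pdf` and `CountingLogLik` are hypothetical choices for illustration only.

```python
import math
import random

def gaussian_log_pdf(x, mean, sd=1.0):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

class CountingLogLik:
    """log p(x | theta) = sum_i log p(x_i | theta), counting per-datum terms."""

    def __init__(self, data):
        self.data = data
        self.n_evals = 0  # number of individual terms p(x_i | theta) evaluated

    def __call__(self, theta):
        self.n_evals += len(self.data)
        return sum(gaussian_log_pdf(x, theta) for x in self.data)

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # synthetic data, n = 1000
log_lik = CountingLogLik(data)

T = 50
values = [log_lik(0.1 * t) for t in range(T)]  # T full likelihood evaluations
# log_lik.n_evals is now T * n: the per-iteration cost grows linearly with n.
```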


  16. Metropolis-Hastings
      MH($p(x \mid \theta)$, $p(\theta)$, $q(\theta' \mid \theta)$, $\theta_0$, $N_{\text{iter}}$, $X$)
        for $k \leftarrow 1$ to $N_{\text{iter}}$
          $\theta \leftarrow \theta_{k-1}$
          $\theta' \sim q(\cdot \mid \theta)$, $u \sim \mathcal{U}(0,1)$
          $\alpha = \dfrac{\prod_{i=1}^{n} p(x_i \mid \theta')\, p(\theta')\, q(\theta \mid \theta')}{\prod_{i=1}^{n} p(x_i \mid \theta)\, p(\theta)\, q(\theta' \mid \theta)}$
          if $u < \alpha$
            $\theta_k \leftarrow \theta'$    ⊲ Accept
          else $\theta_k \leftarrow \theta$    ⊲ Reject
        return $(\theta_k)_{k=1,\dots,N_{\text{iter}}}$
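The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the proposal is a symmetric Gaussian random walk (so the $q$ terms cancel in $\alpha$), the test $u < \alpha$ is carried out in the log domain, and the Gaussian model, the prior, the step size, the burn-in length, and the names `log_gauss` and `metropolis_hastings` are all hypothetical.

```python
import math
import random

def log_gauss(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def metropolis_hastings(log_lik_i, log_prior, data, theta0, n_iter, step, rng):
    """Random-walk MH; the Gaussian proposal is symmetric, so q cancels in alpha."""
    chain = []
    theta = theta0
    # Log posterior up to log Z at the current state: n likelihood terms.
    log_post = sum(log_lik_i(x, theta) for x in data) + log_prior(theta)
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)  # theta' ~ q(. | theta)
        u = 1.0 - rng.random()                     # u in (0, 1], so log(u) is finite
        log_post_prop = sum(log_lik_i(x, prop) for x in data) + log_prior(prop)
        # Accept iff u < alpha, tested as log(u) < log(alpha).
        if math.log(u) < log_post_prop - log_post:
            theta, log_post = prop, log_post_prop  # accept
        chain.append(theta)                        # on reject, theta is repeated
    return chain

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(100)]   # synthetic data
chain = metropolis_hastings(
    lambda x, th: log_gauss(x, th, 1.0),           # likelihood: x_i ~ N(theta, 1)
    lambda th: log_gauss(th, 0.0, 10.0),           # prior: theta ~ N(0, 10^2)
    data, theta0=0.0, n_iter=5000, step=0.2, rng=rng,
)
burn_in = 1000
mcmc_mean = sum(chain[burn_in:]) / (len(chain) - burn_in)
# Conjugate closed form for comparison: posterior mean of a Gaussian mean.
exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
```

Each iteration sums over all of `data` to evaluate the proposal, which is the per-iteration cost of $n$ likelihood terms that subsampling approaches try to avoid.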



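  22. Subsampling approaches
      MH($p(x \mid \theta)$, $p(\theta)$, $q(\theta' \mid \theta)$, $\theta_0$, $N_{\text{iter}}$, $X$)
        for $k \leftarrow 1$ to $N_{\text{iter}}$
          $\theta \leftarrow \theta_{k-1}$
          $\theta' \sim q(\cdot \mid \theta)$, $u \sim \mathcal{U}(0,1)$
          $\psi(u, \theta, \theta') \leftarrow \dfrac{1}{n} \log \left[ u\, \dfrac{p(\theta)\, q(\theta' \mid \theta)}{p(\theta')\, q(\theta \mid \theta')} \right]$
          $\Lambda_n(\theta, \theta') \leftarrow \dfrac{1}{n} \sum_{i=1}^{n} \log \left[ \dfrac{p(x_i \mid \theta')}{p(x_i \mid \theta)} \right]$
          if $\Lambda_n(\theta, \theta') > \psi(u, \theta, \theta')$
            $\theta_k \leftarrow \theta'$    ⊲ Accept
          else $\theta_k \leftarrow \theta$    ⊲ Reject
        return $(\theta_k)_{k=1,\dots,N_{\text{iter}}}$
     ◮ Can we use $\Lambda^*_t(\theta, \theta') = \dfrac{1}{t} \sum_{i=1}^{t} \log \left[ \dfrac{p(x^*_i \mid \theta')}{p(x^*_i \mid \theta)} \right]$?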
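The quantities $\psi$, $\Lambda_n$ and $\Lambda^*_t$ can be computed side by side in a short sketch. It does not answer the question raised on the slide: it only shows that $\Lambda_n > \psi$ is the same decision as $u < \alpha$, and that $\Lambda^*_t$ replaces the average over all $n$ points with an average over $t$ of them. The symmetric proposal (so $q$ cancels in $\psi$), the Gaussian model, the values of `theta`, `prop`, `u` and `t`, drawing the subsample without replacement, and the helper names are all assumptions for illustration.

```python
import math
import random

def log_gauss(x, mean, sd=1.0):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def psi(u, theta, prop, log_prior, n):
    """psi(u, theta, theta') for a symmetric proposal (the q terms cancel)."""
    return (math.log(u) + log_prior(theta) - log_prior(prop)) / n

def avg_log_ratio(points, theta, prop):
    """Average of log[p(x_i | theta') / p(x_i | theta)] over the given points."""
    return sum(log_gauss(x, prop) - log_gauss(x, theta) for x in points) / len(points)

rng = random.Random(1)
data = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # synthetic data
n = len(data)
log_prior = lambda th: log_gauss(th, 0.0, 10.0)
theta, prop, u = 0.9, 1.0, 0.3                     # current state, proposal, uniform draw

lam_n = avg_log_ratio(data, theta, prop)           # Lambda_n: all n terms
accept_full = lam_n > psi(u, theta, prop, log_prior, n)

t = 100
subsample = rng.sample(data, t)                    # x*_1, ..., x*_t without replacement
lam_star = avg_log_ratio(subsample, theta, prop)   # Lambda*_t: only t terms
accept_sub = lam_star > psi(u, theta, prop, log_prior, n)
```

Because `lam_star` is a noisy estimate of `lam_n`, `accept_sub` can differ from `accept_full`; naively plugging it into the test changes the chain being simulated.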
