

1. Bayesian inference & process convolution models. Dave Higdon, Statistical Sciences Group, LANL.

2. MOVING AVERAGE SPATIAL MODELS

3. Kernel basis representation for spatial processes $z(s)$. Define $m$ basis functions $k_1(s), \ldots, k_m(s)$. [Figure: basis functions plotted over $s \in (-2, 12)$.] Here $k_j(s)$ is a normal density centered at spatial location $\omega_j$:
$$k_j(s) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}(s - \omega_j)^2\right\}.$$
Set $z(s) = \sum_{j=1}^m k_j(s)\, x_j$ where $x \sim N(0, I_m)$. One can represent $z = (z(s_1), \ldots, z(s_n))^T$ as $z = Kx$, where $K_{ij} = k_j(s_i)$.
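A minimal sketch of this construction in Python/NumPy; the knot locations, grid, and function names are illustrative choices, not taken from the slides:

```python
import numpy as np

def normal_kernel(s, omega, sd=1.0):
    """Normal-density kernel centered at knot location omega."""
    return np.exp(-0.5 * ((s - omega) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

# Illustrative setup: m knots, evaluated on a grid s_1, ..., s_n.
m, n = 6, 200
omega = np.linspace(-2, 12, m)   # knot locations
s = np.linspace(-2, 12, n)       # locations where z(s) is evaluated

# K_ij = k_j(s_i): each column is one basis function on the grid.
K = normal_kernel(s[:, None], omega[None, :])

# z = K x with x ~ N(0, I_m) gives one realization of the process.
rng = np.random.default_rng(0)
x = rng.standard_normal(m)
z = K @ x
```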

4. $x$ and $k(s)$ determine the spatial process $z(s)$. [Figure: left panel shows the weighted kernels $k_j(s)\,x_j$; right panel shows the resulting $z(s)$.]
Continuous representation: $z(s) = \sum_{j=1}^m k_j(s)\, x_j$, where $x \sim N(0, I_m)$.
Discrete representation: for $z = (z(s_1), \ldots, z(s_n))^T$, $z = Kx$ where $K_{ij} = k_j(s_i)$.

5. Estimate $z(s)$ by specifying $k_j(s)$ and estimating $x$. [Figure: fitted basis functions and the resulting fit to data $y(s)$.]
Discrete representation: for $z = (z(s_1), \ldots, z(s_n))^T$, $z = Kx$ where $K_{ij} = k_j(s_i)$ $\Rightarrow$ a standard regression model: $y = Kx + \epsilon$.
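Continuing the sketch above, estimating $x$ with the basis fixed is ordinary least squares; the observation locations and data values here are hypothetical placeholders:

```python
# Hypothetical observations y at locations s_obs.
s_obs = np.array([0.5, 2.0, 4.5, 6.0, 9.0])
y = np.array([0.3, -0.1, 0.8, 0.4, -0.6])

K_obs = normal_kernel(s_obs[:, None], omega[None, :])  # n x m design matrix

# Least-squares (minimum-norm) estimate of the knot weights in y = Kx + eps.
x_hat, *_ = np.linalg.lstsq(K_obs, y, rcond=None)
z_hat = K @ x_hat                                      # fitted curve on the grid
```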

6. Formulation for the 1-d example. Data $y = (y(s_1), \ldots, y(s_n))^T$ observed at locations $s_1, \ldots, s_n$. Once knot locations $\omega_j$, $j = 1, \ldots, m$, and the kernel choice $k(s)$ are specified, the remaining model formulation is trivial.
Likelihood:
$$L(y \mid x, \lambda_y) \propto \lambda_y^{n/2} \exp\left\{-\tfrac{\lambda_y}{2}(y - Kx)^T(y - Kx)\right\}, \quad \text{where } K_{ij} = k(\omega_j - s_i).$$
Priors:
$$\pi(x \mid \lambda_x) \propto \lambda_x^{m/2} \exp\left\{-\tfrac{\lambda_x}{2} x^T x\right\}, \qquad \pi(\lambda_x) \propto \lambda_x^{a_x - 1} \exp\{-b_x \lambda_x\}, \qquad \pi(\lambda_y) \propto \lambda_y^{a_y - 1} \exp\{-b_y \lambda_y\}.$$
Posterior:
$$\pi(x, \lambda_x, \lambda_y \mid y) \propto \lambda_y^{a_y + \frac{n}{2} - 1} \exp\left\{-\lambda_y\left[b_y + \tfrac{1}{2}(y - Kx)^T(y - Kx)\right]\right\} \times \lambda_x^{a_x + \frac{m}{2} - 1} \exp\left\{-\lambda_x\left[b_x + \tfrac{1}{2} x^T x\right]\right\}.$$
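The full conditional for $x$ on the next slide follows by collecting the $x$-dependent terms of the posterior and completing the square; spelling out this step:
$$\lambda_y (y - Kx)^T(y - Kx) + \lambda_x x^T x = x^T(\lambda_y K^T K + \lambda_x I_m)\,x - 2\lambda_y x^T K^T y + \text{const},$$
so that $x \mid \cdots \sim N(V \lambda_y K^T y,\; V)$ with $V = (\lambda_y K^T K + \lambda_x I_m)^{-1}$.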

7. Posterior and full conditionals.
Posterior:
$$\pi(x, \lambda_x, \lambda_y \mid y) \propto \lambda_y^{a_y + \frac{n}{2} - 1} \exp\left\{-\lambda_y\left[b_y + \tfrac{1}{2}(y - Kx)^T(y - Kx)\right]\right\} \times \lambda_x^{a_x + \frac{m}{2} - 1} \exp\left\{-\lambda_x\left[b_x + \tfrac{1}{2} x^T x\right]\right\}$$
Full conditionals:
$$\pi(x \mid \cdots) \propto \exp\left\{-\tfrac{1}{2}\left[\lambda_y x^T K^T K x - 2\lambda_y x^T K^T y + \lambda_x x^T x\right]\right\}$$
$$\pi(\lambda_x \mid \cdots) \propto \lambda_x^{a_x + \frac{m}{2} - 1} \exp\left\{-\lambda_x\left[b_x + \tfrac{1}{2} x^T x\right]\right\}$$
$$\pi(\lambda_y \mid \cdots) \propto \lambda_y^{a_y + \frac{n}{2} - 1} \exp\left\{-\lambda_y\left[b_y + \tfrac{1}{2}(y - Kx)^T(y - Kx)\right]\right\}$$
Gibbs sampler implementation:
$$x \mid \cdots \sim N\left((\lambda_y K^T K + \lambda_x I_m)^{-1} \lambda_y K^T y,\; (\lambda_y K^T K + \lambda_x I_m)^{-1}\right)$$
$$\lambda_x \mid \cdots \sim \Gamma\left(a_x + \tfrac{m}{2},\; b_x + \tfrac{1}{2} x^T x\right)$$
$$\lambda_y \mid \cdots \sim \Gamma\left(a_y + \tfrac{n}{2},\; b_y + \tfrac{1}{2}(y - Kx)^T(y - Kx)\right)$$

8. Gibbs sampler: intuition. Gibbs sampler for a bivariate normal density
$$\pi(z) = \pi(z_1, z_2) \propto \exp\left\{-\tfrac{1}{2}\,(z_1\;\; z_2)\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}^{-1}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\right\}$$
Full conditionals of $\pi(z)$:
$$z_1 \mid z_2 \sim N(\rho z_2,\, 1 - \rho^2), \qquad z_2 \mid z_1 \sim N(\rho z_1,\, 1 - \rho^2).$$
• Initialize the chain with
$$z^0 = (z_1^0, z_2^0)^T \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$
• Draw $z_1^1 \sim N(\rho z_2^0,\, 1 - \rho^2)$; now $(z_1^1, z_2^0)^T \sim \pi(z)$. [Figure: successive Gibbs moves $z^0 \to (z_1^1, z_2^0) \to z^1$ in the $(z_1, z_2)$ plane.]

9. Gibbs sampler: intuition. The Gibbs sampler gives $z^1, z^2, \ldots, z^T$, which can be treated as dependent draws from $\pi(z)$. If $z^0$ is not a draw from $\pi(z)$, then the initial realizations will not have the correct distribution; in practice, the first 100?, 1000? realizations are discarded. The draws can be used to make inference about $\pi(z)$:
• Posterior mean of $z$ is estimated by
$$\begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \end{pmatrix} = \frac{1}{T} \sum_{k=1}^T \begin{pmatrix} z_1^k \\ z_2^k \end{pmatrix}$$
• Posterior probabilities:
$$P(z_1 > 1) = \frac{1}{T} \sum_{k=1}^T I[z_1^k > 1], \qquad P(z_1 > z_2) = \frac{1}{T} \sum_{k=1}^T I[z_1^k > z_2^k]$$
• 90% interval: $(z_1^{[5\%]},\, z_1^{[95\%]})$.
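This bivariate-normal example is easy to run end to end; a minimal sketch, where $\rho$, the chain length, and the burn-in are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, T, burn = 0.9, 10_000, 1_000

z1, z2 = 0.0, 0.0                     # arbitrary starting point
chain = np.empty((T + burn, 2))
for k in range(T + burn):
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho ** 2))   # z1 | z2
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho ** 2))   # z2 | z1
    chain[k] = z1, z2
chain = chain[burn:]                  # discard initial realizations

mu_hat = chain.mean(axis=0)                        # posterior mean
p_z1_gt_1 = (chain[:, 0] > 1).mean()               # P(z1 > 1)
p_z1_gt_z2 = (chain[:, 0] > chain[:, 1]).mean()    # P(z1 > z2)
interval_90 = np.percentile(chain[:, 0], [5, 95])  # 90% interval for z1
```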

10. 1-d example. $m = 6$ knots evenly spaced between $-.3$ and $1.2$; $n = 5$ data points at $s = .05, .25, .52, .65, .91$; $k(s)$ is $N(0, \mathrm{sd} = .3)$. $a_y = 10$, $b_y = 10 \cdot (.25^2)$ $\Rightarrow$ strong prior at $\lambda_y = 1/.25^2$; $a_x = 1$, $b_x = .001$. [Figure: basis functions and data $y(s)$; trace plots of $x_j$ and of $\lambda_x, \lambda_y$ over 1000 iterations.]
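Using the sampler sketched after slide 7 (and `normal_kernel` from the slide 3 sketch), this example's settings translate directly. The data values themselves are not given on the slide, so `y` below is a placeholder:

```python
m, n = 6, 5
omega = np.linspace(-0.3, 1.2, m)                  # knot locations
s_obs = np.array([0.05, 0.25, 0.52, 0.65, 0.91])   # data locations
y = np.zeros(n)                                    # placeholder data values

K = normal_kernel(s_obs[:, None], omega[None, :], sd=0.3)
draws = gibbs_sampler(y, K, a_x=1.0, b_x=0.001,
                      a_y=10.0, b_y=10.0 * 0.25 ** 2, n_iter=1000)
```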

11. 1-d example. From posterior realizations of the knot weights $x$, one can construct posterior realizations of the smooth fitted function $z(s) = \sum_{j=1}^m k_j(s)\, x_j$. Note the strong prior on $\lambda_y$ is required since $n$ is small. [Figure: posterior realizations of $z(s)$; posterior mean and pointwise 90% bounds.]

12. 1-d example. $m = 20$ knots evenly spaced between $-2$ and $12$; $n = 18$ data points evenly spaced between $0$ and $10$; $k(s)$ is $N(0, \mathrm{sd} = 2)$. [Figure: data, posterior mean and 80% CI for $z(s)$; posterior distributions for $\lambda_y$ and $\lambda_x$.]

13. 1-d example. $m = 20$ knots evenly spaced between $-2$ and $12$; $n = 18$ data points evenly spaced between $0$ and $10$; $k(s)$ is $N(0, \mathrm{sd} = 1)$. [Figure: data, posterior mean and 80% CI for $z(s)$; posterior distributions for $\lambda_y$ and $\lambda_x$.]

14. Basis representations for spatial processes $z(s)$. Represent $z(s)$ at spatial locations $s_1, \ldots, s_n$: $z = (z(s_1), \ldots, z(s_n))^T \sim N(0, \Sigma_z)$. Recall $z = Kx$, where $KK^T = \Sigma_z$ and $x \sim N(0, I_n)$. This gives a discrete representation of $z(s)$ at the locations $s_1, \ldots, s_n$:
$$z(s_i) = \sum_{j=1}^n K_{ij}\, x_j, \qquad \text{with continuous analogue} \qquad z(s) = \sum_{j=1}^n k_j(s)\, x_j.$$
[Figure: discrete vs. continuous representations of the basis.] The columns of $K$ give basis functions. One can use a subset of these basis functions to reduce dimensionality.
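Any factorization satisfying $KK^T = \Sigma_z$ works; a sketch comparing the Cholesky and SVD choices, using an illustrative Gaussian covariance function:

```python
import numpy as np

n = 50
s = np.linspace(0, 10, n)
# Illustrative Gaussian covariance: Sigma_ij = exp(-(s_i - s_j)^2 / 2)
Sigma = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2)
Sigma += 1e-8 * np.eye(n)            # jitter for numerical stability

K_chol = np.linalg.cholesky(Sigma)   # lower-triangular factor
U, d, _ = np.linalg.svd(Sigma)
K_svd = U * np.sqrt(d)               # columns of U scaled by sqrt(eigenvalues)

# Either factor satisfies K @ K.T ~= Sigma; its columns are basis functions.
x = np.random.default_rng(2).standard_normal(n)
z_chol, z_svd = K_chol @ x, K_svd @ x
```

Keeping only the leading columns of `K_svd` (largest singular values) is one way to realize the dimension reduction the slide mentions.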

15. [Figure: basis functions and covariance from three decompositions of $\Sigma_z$: Cholesky (with pivoting), SVD, and kernels.]

16. How many basis kernels? Define $m$ basis functions $k_1(s), \ldots, k_m(s)$. [Figure: basis functions and implied covariance for $m = 20$? $m = 10$? $m = 6$?]

17. Moving average specifications for spatial models $z(s)$. Smoothing kernel $k(s)$; white noise process $x = (x_1, \ldots, x_m)$ at spatial locations $\omega_1, \ldots, \omega_m$, with
$$x \sim N\left(0,\ \tfrac{1}{\lambda_x} I_m\right).$$
The spatial process $z(s)$ is constructed by convolving $x$ with the smoothing kernel $k(s)$:
$$z(s) = \sum_{j=1}^m x_j\, k(\omega_j - s)$$
$\Rightarrow$ $z(s)$ is a Gaussian process with mean 0 and covariance given by
$$\mathrm{Cov}(z(s), z(s')) = \frac{1}{\lambda_x} \sum_{j=1}^m k(\omega_j - s)\, k(\omega_j - s').$$
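The implied covariance can be computed directly from the kernel and knot locations; a short sketch (knot grid and kernel sd are illustrative):

```python
import numpy as np

def conv_cov(s, s_prime, omega, lam_x=1.0, sd=1.0):
    """Cov(z(s), z(s')) = (1/lam_x) * sum_j k(omega_j - s) k(omega_j - s')."""
    k = lambda d: np.exp(-0.5 * (d / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)
    return np.sum(k(omega - s) * k(omega - s_prime)) / lam_x

omega = np.linspace(-2, 12, 20)
print(conv_cov(3.0, 3.5, omega))   # covariance decays as |s - s'| grows
```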

18. [Figure-only slide; no recoverable content.]

19. Example: constructing 1-d models for $z(s)$. $m = 20$ knot locations $\omega_1, \ldots, \omega_m$ equally spaced between $-2$ and $12$; $x = (x_1, \ldots, x_m)^T \sim N(0, I_m)$;
$$z(s) = \sum_{k=1}^m k(\omega_k - s)\, x_k.$$
In the first three frames $k(s)$ is a normal density with sd $= 1$, $\tfrac{1}{2}$, and $\tfrac{1}{4}$; the 4th frame uses $k(s) = 1.4 \exp\{-1.4\,|s|\}$. [Figure: four frames showing $x$ and the resulting $z(s)$ for each kernel.]
General points (see the sketch after this list):
• smooth kernels are required
• knot spacing depends on kernel width: knot spacing $\leq$ 1 sd for normal $k(s)$
• kernel width is equivalent to the scale parameter in GP models
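A sketch reproducing this construction, one realization per frame from the same white noise (only the grid resolution is my choice; kernels and knots follow the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 20
omega = np.linspace(-2, 12, m)   # knot locations
x = rng.standard_normal(m)       # same white noise in every frame
s = np.linspace(0, 10, 400)

def normal_k(d, sd):
    return np.exp(-0.5 * (d / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

frames = {
    "sd=1":   normal_k(omega[None, :] - s[:, None], 1.0),
    "sd=1/2": normal_k(omega[None, :] - s[:, None], 0.5),
    "sd=1/4": normal_k(omega[None, :] - s[:, None], 0.25),
    "exp":    1.4 * np.exp(-1.4 * np.abs(omega[None, :] - s[:, None])),
}
# Wider, smoother kernels give smoother z(s); with sd = 1/4 the knot
# spacing exceeds 1 sd and the realization becomes visibly rough.
z = {name: Kmat @ x for name, Kmat in frames.items()}
```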

20. [Figure-only slide; no recoverable content.]
