1. Seminar “Statistics for structures”: A graphical perspective on Gauss–Markov process priors. Moritz Schauer, University of Amsterdam

2. Outline
◮ Midpoint displacement construction of a Brownian motion
◮ Corresponding Gaussian Markov random field
◮ Chordal graphs
◮ Sparse Cholesky decomposition
◮ Connection to inference of diffusion processes

3. Mid-point displacement
Lévy–Ciesielski construction of a Brownian motion $(W_t)_{t \in [0,1]}$ [1]

4. Faber–Schauder basis
Figure: Elements $\psi_{l,k}$, $1 \le l \le 3$, of the hierarchical (Faber–)Schauder basis

5. Schauder basis functions
A location and scale family based on the “hat” function
$$\Lambda(x) = 2x\,\mathbf{1}_{[0,\frac12)}(x) + 2(1-x)\,\mathbf{1}_{[\frac12,1]}(x)$$
$$\psi_{j,k}(x) = \Lambda(2^{j-1}x - k), \qquad j \ge 1,\ k = 0,\dots,2^{j-1}-1$$
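To make the family concrete, here is a minimal NumPy sketch of the hat function and the rescaled elements (the names `hat` and `psi` are illustrative choices, not from the slides):

```python
import numpy as np

def hat(x):
    """Hat function: 2x on [0, 1/2), 2(1 - x) on [1/2, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((0 <= x) & (x < 0.5), 2 * x,
                    np.where((0.5 <= x) & (x <= 1), 2 * (1 - x), 0.0))

def psi(j, k, x):
    """Faber-Schauder element: hat rescaled to [k 2^{1-j}, (k+1) 2^{1-j}]."""
    return hat(2.0 ** (j - 1) * np.asarray(x) - k)

x = np.linspace(0, 1, 9)
print(psi(1, 0, x))  # the hat itself: 0, .25, .5, .75, 1, .75, .5, .25, 0
```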

6. Mid-point displacement II
Start with a Brownian bridge $(W_t)_{t \in [0,1]}$:
$$W^J = \sum_{j=1}^{J} \sum_{k=0}^{2^{j-1}-1} Z_{j,k}\,\psi_{j,k}$$
$W^J$ – truncated Faber–Schauder expansion
$Z^J = \mathrm{vec}(Z_{j,k},\ j \le J,\ 0 \le k < 2^{j-1})$ – independent zero-mean Gaussian random variables
$$Z_{j,k} = W_{2^{-j}(2k+1)} - \tfrac12\left(W_{2^{-j+1}k} + W_{2^{-j+1}(k+1)}\right)$$
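A sketch of the construction on a dyadic grid, assuming the Brownian-bridge scaling $\mathrm{Var}\,Z_{j,k} = 2^{-j-1}$ (the conditional variance of the midpoint given the endpoints of an interval of length $2^{1-j}$):

```python
import numpy as np

rng = np.random.default_rng(1)
J = 10                                # truncation level
W = np.zeros(2 ** J + 1)              # bridge pinned at W_0 = W_1 = 0

# Level j displaces the midpoints of the level j-1 dyadic grid.
for j in range(1, J + 1):
    step = 2 ** (J - j)                        # index offset to the interval endpoints
    mid = np.arange(step, 2 ** J, 2 * step)    # indices of the 2^{j-1} new midpoints
    Z = rng.normal(0.0, 2.0 ** (-(j + 1) / 2), size=mid.size)  # sd = sqrt(2^{-j-1})
    W[mid] = 0.5 * (W[mid - step] + W[mid + step]) + Z
# W now holds W^J on the grid t = k 2^{-J}
```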

7. Mid-point displacement II
Start with a mean-zero Gauss–Markov process $(W_t)_{t \in [0,1]}$:
$$W^J = \sum_{j=1}^{J} \sum_{k=0}^{2^{j-1}-1} Z_{j,k}\,\psi_{j,k}$$
$W^J$ – truncated Faber–Schauder expansion
$Z^J = \mathrm{vec}(Z_{j,k},\ j \le J,\ 0 \le k < 2^{j-1})$ – mean-zero Gaussian vector with precision matrix $\Gamma$
$$Z_{j,k} = W_{2^{-j}(2k+1)} - \tfrac12\left(W_{2^{-j+1}k} + W_{2^{-j+1}(k+1)}\right)$$

8. Markov property
Write $\iota := (j,k)$, $\iota' := (j',k')$.
In general, $\Gamma_{\iota,\iota'} = 0$ if $Z_\iota \perp\!\!\!\perp Z_{\iota'} \mid Z_{\{\iota,\iota'\}^C}$.
By the Markov property, $\Gamma_{\iota,\iota'} = 0$ if $\psi_\iota \cdot \psi_{\iota'} \equiv 0$.

9. Gaussian Markov random field
A Gaussian vector $(Z_1,\dots,Z_n)$ together with a graph $G = (\{1,\dots,n\}, E)$ where there is no edge in $E$ between $\iota$ and $\iota'$ if $Z_\iota \perp\!\!\!\perp Z_{\iota'} \mid Z_{\{\iota,\iota'\}^C}$.

10. Chordal graph / triangulated graph
“A chordal graph is a graph in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.”

11. Interval graph
The open supports of the $\psi_{j,k}$ form an interval graph on the pairs $(j,k)$.¹ Interval graphs are chordal graphs.
Figure: in red, a cycle of four vertices with a blue chord.
¹ An interval graph is the intersection graph of a family of intervals on the real line.
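The overlap structure is easy to compute directly. A small sketch (helper names `support` and `overlap` are mine) that builds the edge set of the interval graph for $J = 3$; by slide 8, these edges are exactly where $\Gamma$ may have nonzero entries:

```python
from itertools import combinations

def support(j, k):
    """Open support of psi_{j,k}: the dyadic interval (k 2^{1-j}, (k+1) 2^{1-j})."""
    h = 2.0 ** (1 - j)
    return (k * h, (k + 1) * h)

def overlap(a, b):
    return a[0] < b[1] and b[0] < a[1]   # open intervals intersect

J = 3
nodes = [(j, k) for j in range(1, J + 1) for k in range(2 ** (j - 1))]
edges = [(u, v) for u, v in combinations(nodes, 2)
         if overlap(support(*u), support(*v))]
# e.g. (3,0)-(3,1) is not an edge: open supports (0,1/4) and (1/4,1/2) are disjoint
```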

12. Sampling from the prior
◮ Sample $J$
◮ Compute the factorization $SS' = \Gamma^J$
◮ Solve $S'Z = \mathrm{WN}$ by back-substitution, with $\mathrm{WN}$ – standard white noise (see the sketch after this list)
Hence: how to find sparse factors?
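A dense toy version of these three steps for a fixed $J$ (the name `sample_prior` is mine; in practice $S$ is sparse thanks to the elimination ordering discussed next):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_prior(Gamma, rng):
    """Draw Z ~ N(0, Gamma^{-1}) by factoring the precision matrix Gamma."""
    S = cholesky(Gamma, lower=True)                 # Gamma = S S'
    wn = rng.standard_normal(Gamma.shape[0])        # standard white noise
    return solve_triangular(S.T, wn, lower=False)   # solve S' Z = WN

rng = np.random.default_rng(0)
Gamma = np.array([[2.0, -1.0], [-1.0, 2.0]])        # toy precision matrix
Z = sample_prior(Gamma, rng)
```

Since $Z = (S')^{-1}\mathrm{WN}$, its covariance is $(S')^{-1}S^{-1} = (SS')^{-1} = (\Gamma^J)^{-1}$, as required.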

13. Perfect elimination ordering
“A perfect elimination ordering in a graph is an ordering of the vertices of the graph such that, for each vertex $v$, $v$ and the neighbors of $v$ that occur after $v$ in the order form a clique.”
Example: $(3,0)\ (3,1)\ (3,2)\ (3,3)\ (2,0)\ (2,1)\ (1,0)$
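The defining property can be checked mechanically; a short sketch (function name mine), usable with the `edges` list built in the interval-graph example above:

```python
from itertools import combinations

def is_perfect_elimination_ordering(order, edges):
    """For each vertex v, the neighbors of v occurring later in the order
    must form a clique."""
    E = {frozenset(e) for e in edges}
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in pos if frozenset((u, v)) in E and pos[u] > pos[v]]
        if any(frozenset((a, b)) not in E for a, b in combinations(later, 2)):
            return False
    return True
```

For the example ordering above (finest level first) and the interval-graph edges of slide 11, this check passes.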

14. Ordering the columns and rows of $\Gamma$ according to the perfect elimination ordering of the chordal graph: $\tilde S$ is the sparse Cholesky factor of $\tilde\Gamma$.
[Figure: sparsity patterns of $\tilde\Gamma$ and $\tilde S$, shown as arrays of $\ast$ entries.]
The Cholesky decomposition has no fill-in!

15. Exploiting hierarchical structure
Order the rows and columns of $\Gamma$ according to the location of the maxima of the $\psi_{j,k}$, i.e. $(3,0)\ (2,0)\ (3,1)\ (1,0)\ (3,2)\ (2,1)\ (3,3)$.
$\Gamma$ then has a hierarchical sparsity structure, and $\Gamma = SS'$ with $S$ sparse as well.
[Figure: sparsity patterns of $\Gamma$ and $S$, shown as arrays of $\ast$ entries.]

16. Recursive sparsity pattern
$$S_1 = (s_{11}), \qquad S_J = \begin{pmatrix} S^l_{J-1} & 0 & 0 \\ S_{cl} & s_{cc} & S_{cr} \\ 0 & 0 & S^r_{J-1} \end{pmatrix},$$
where $S^l_{J-1}$ and $S^r_{J-1}$ are blocks of size $(2^{J-1}-1) \times (2^{J-1}-1)$.

17. Hierarchical back-substitution
A hierarchical back-substitution problem of the form
$$\underbrace{\begin{pmatrix} S_l & 0 & 0 \\ S_{cl} & s_{cc} & S_{cr} \\ 0 & 0 & S_r \end{pmatrix}}_{(m+1+m) \times (m+1+m)} \begin{pmatrix} X_l \\ x_c \\ X_r \end{pmatrix} = \begin{pmatrix} B_l \\ b_c \\ B_r \end{pmatrix}$$
can be recursively solved by solving the back-substitution problems $S_l X_l = B_l$, $S_r X_r = B_r$ and setting
$$x_c = s_{cc}^{-1}\,(b_c - S_{cl} X_l - S_{cr} X_r).$$
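A dense toy implementation of this recursion (function name mine), assuming the in-order block layout of slide 15, so $n = 2^J - 1$ and the center entry sits at the middle index:

```python
import numpy as np

def hierarchical_solve(S, b):
    """Solve S x = b where S has the recursive block pattern
    [[S_l, 0, 0], [S_cl, s_cc, S_cr], [0, 0, S_r]] with blocks of size m, 1, m."""
    n = S.shape[0]
    if n == 1:
        return b / S[0, 0]
    m = (n - 1) // 2
    x = np.empty(n)
    x[:m] = hierarchical_solve(S[:m, :m], b[:m])                    # S_l X_l = B_l
    x[m + 1:] = hierarchical_solve(S[m + 1:, m + 1:], b[m + 1:])    # S_r X_r = B_r
    x[m] = (b[m] - S[m, :m] @ x[:m] - S[m, m + 1:] @ x[m + 1:]) / S[m, m]
    return x
```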

18. Factorization in quasi-linear time
$$\begin{pmatrix} A_l & A'_{cl} & 0 \\ A_{cl} & a_{cc} & A_{cr} \\ 0 & A'_{cr} & A_r \end{pmatrix} = \begin{pmatrix} S_l & 0 & 0 \\ S_{cl} & s_{cc} & S_{cr} \\ 0 & 0 & S_r \end{pmatrix} \begin{pmatrix} S'_l & S'_{cl} & 0 \\ 0 & s_{cc} & 0 \\ 0 & S'_{cr} & S'_r \end{pmatrix} = \begin{pmatrix} S_l S'_l & S_l S'_{cl} & 0 \\ S_{cl} S'_l & s_{cc}^2 + S_{cl} S'_{cl} + S_{cr} S'_{cr} & S_{cr} S'_r \\ 0 & S_r S'_{cr} & S_r S'_r \end{pmatrix}$$
Here $A_l = S_l S'_l$ and $A_r = S_r S'_r$ are two hierarchical factorization problems of level $J-1$, $A_{cl} = S_{cl} S'_l$ and $A_{cr} = S_{cr} S'_r$ are hierarchical back-substitution problems, and
$$s_{cc} = \sqrt{a_{cc} - S_{cl} S'_{cl} - S_{cr} S'_{cr}}.$$
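A dense toy version of the resulting recursion (I use `np.linalg.solve` for the inner solves to keep the sketch short; in practice these would themselves be the hierarchical back-substitutions from the previous slide):

```python
import numpy as np

def hierarchical_cholesky(A):
    """Factor A = S S' with S in the recursive pattern of slide 16 (dense toy)."""
    n = A.shape[0]
    S = np.zeros_like(A, dtype=float)
    if n == 1:
        S[0, 0] = np.sqrt(A[0, 0])
        return S
    m = (n - 1) // 2                                 # blocks of size m, 1, m
    l, r = slice(0, m), slice(m + 1, n)
    S[l, l] = hierarchical_cholesky(A[l, l])         # A_l = S_l S_l', level J-1
    S[r, r] = hierarchical_cholesky(A[r, r])         # A_r = S_r S_r', level J-1
    S[m, l] = np.linalg.solve(S[l, l], A[m, l])      # S_cl from A_cl = S_cl S_l'
    S[m, r] = np.linalg.solve(S[r, r], A[m, r])      # S_cr from A_cr = S_cr S_r'
    S[m, m] = np.sqrt(A[m, m] - S[m, l] @ S[m, l] - S[m, r] @ S[m, r])  # s_cc
    return S
```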

19. Approximate sparse inversion using nested dissection [2]

20. Application: nonparametric inference for a diffusion process
$$\mathrm{d}X_t = b_0(X_t)\,\mathrm{d}t + \mathrm{d}W_t \qquad (1)$$
Prior: $P(J \ge j) \ge C \exp(-2^j)$ and
$$b = \sum_{j=1}^{J} \sum_{k=0}^{2^{j-1}-1} Z_{j,k}\,\psi_{j,k},$$
$$M\,\Xi^J \ge_{pd} \Gamma^J \ge_{pd} m\,\Xi^J, \qquad \Xi^J = \mathrm{diagm}\left(2^{-2(j-1)\alpha},\ 1 \le j \le J,\ 0 \le k < 2^{j-1}\right),$$
where $\alpha = \tfrac12$.

21. Gaussian inverse problem
Likelihood:
$$p(X \mid b) = \exp\left(\int_0^T b(X_t)\,\mathrm{d}X_t - \frac12 \int_0^T b^2(X_t)\,\mathrm{d}t\right)$$
$$\mu^J_\iota = \int_0^T \psi_\iota(X_t)\,\mathrm{d}X_t, \qquad \iota = 1,\dots,2^J-1$$
$$G^J_{\iota,\iota'} = \int_0^T \psi_\iota(X_t)\,\psi_{\iota'}(X_t)\,\mathrm{d}t, \qquad \iota,\iota' = 1,\dots,2^J-1.$$
$\Gamma^J$ and $G^J$ have the same sparsity pattern
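From a discretely observed path, $\mu^J$ and $G^J$ can be approximated by Itô/Riemann sums; a rough sketch (the helper name `sufficient_stats` and the left-endpoint evaluation are my choices, not the authors' scheme):

```python
import numpy as np

def sufficient_stats(X, dt, psis):
    """Approximate mu_i = int psi_i(X_t) dX_t and G_ij = int psi_i(X_t) psi_j(X_t) dt
    from observations X_0, X_dt, X_2dt, ... by left-endpoint Ito/Riemann sums.
    psis is a list of callables, e.g. lambda x: psi(j, k, x)."""
    dX = np.diff(X)
    P = np.array([f(X[:-1]) for f in psis])   # basis functions at left endpoints
    mu = P @ dX
    G = (P * dt) @ P.T
    return mu, G
```

Because each $\psi_\iota$ is supported on a dyadic interval, $G^J$ inherits the interval-graph sparsity of $\Gamma^J$, as stated on the slide.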

22. Conjugate posterior
For fixed level $J$,
$$Z^J \mid J, X \sim N(\Sigma^J \mu^J, \Sigma^J), \qquad \Sigma^J = (\Gamma^J + G^J)^{-1}.$$
For $J$, a reversible jump algorithm can be used.
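A minimal sketch of the fixed-$J$ update, computing the posterior mean by factoring the posterior precision $\Gamma^J + G^J$ rather than inverting it (function name mine):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def posterior_mean(Gamma, G, mu):
    """Posterior mean Sigma mu with Sigma = (Gamma + G)^{-1}."""
    c = cho_factor(Gamma + G, lower=True)   # factor the posterior precision
    return cho_solve(c, mu)                 # solve (Gamma + G) m = mu
```

A posterior draw at fixed $J$ can reuse the sampler from slide 12 with precision $\Gamma^J + G^J$, adding this mean.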

23. Posterior contraction rates (periodic case)
Besov norm and supremum norm for $f = \sum_{j,k} z_{j,k}\,\psi_{j,k}$:
$$\|f\|_\alpha = \sup_{j \ge 1,\,k} 2^{(j-1)\alpha} |z_{j,k}|, \qquad \|f\|_\infty \le \sum_{j} \max_k |z_{j,k}|$$
Sieves:
$$B_{L,M} = \Big\{ \sum_{j=1}^{L} \sum_{k=0}^{2^{j-1}-1} z_{j,k}\,\psi_{j,k} \;:\; 2^{\alpha(j-1)} |z_{j,k}| \le M \text{ for all } j, k \Big\}$$
Rate: $T^{-\beta/(1+2\beta)} \log(T)^{\beta/(1+2\beta)}$, $\beta \ge \alpha$

24. Anderson's lemma
If $X \sim N(0, \Sigma_X)$ and $Y \sim N(0, \Sigma_Y)$ are independent with $\Sigma_X \le_{pd} \Sigma_Y$ positive definite, then for all symmetric convex sets $C$
$$P(Y \in C) \le P(X \in C).$$
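A quick Monte Carlo illustration of the lemma (my toy choice: the sup-norm unit ball as the symmetric convex set $C$):

```python
import numpy as np

rng = np.random.default_rng(0)
Sx, Sy = np.eye(2), 2 * np.eye(2)        # Sigma_X <=_pd Sigma_Y
X = rng.multivariate_normal(np.zeros(2), Sx, size=100_000)
Y = rng.multivariate_normal(np.zeros(2), Sy, size=100_000)
in_C = lambda p: np.abs(p).max(axis=1) <= 1.0   # sup-norm ball: symmetric, convex
print(in_C(Y).mean(), "<=", in_C(X).mean())     # P(Y in C) <= P(X in C)
```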

25. Summary
◮ Midpoint displacement construction of Gauss–Markov processes
◮ Corresponding Gaussian Markov random field
◮ Chordal graphs and perfect elimination orderings
◮ Sparse Cholesky decomposition
◮ Rates for randomly truncated prior

26. Image sources
[1] http://math.stackexchange.com/questions/251856/area-enclosed-by-2-dimensional-random-curve
[2] http://kartoweb.itc.nl/geometrics/reference%20surfaces/body.htm
