  1. CSE 527 Lecture 9 The Gibbs Sampler

  2. Talk Today • Zasha Weinberg, Combi, HSB K-069, 1:30: “Fast, accurate annotation of non-coding RNAs”

  3. The “Gibbs Sampler” • Lawrence et al. “Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Sequence Alignment” Science 1993

  4. The Double Helix (image credit: Los Alamos Science)

  5. Some DNA Binding Domains

  6. Some History • Geman & Geman, IEEE PAMI 1984 • Hastings, Biometrika, 1970 • Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, “Equation of State Calculations by Fast Computing Machines,” J. Chem. Phys. 1953 • Josiah Willard Gibbs, 1839–1903, American physicist, a pioneer of thermodynamics

  7. How to Average • An old problem: • $k$ random variables: $x_1, x_2, \ldots, x_k$ • Joint distribution (p.d.f.): $P(x_1, x_2, \ldots, x_k)$ • Some function: $f(x_1, x_2, \ldots, x_k)$ • Want the expected value: $E(f(x_1, x_2, \ldots, x_k))$

  8. How to Average
$$E(f(x_1, x_2, \ldots, x_k)) = \int_{x_1}\!\int_{x_2}\!\cdots\!\int_{x_k} f(x_1, x_2, \ldots, x_k) \cdot P(x_1, x_2, \ldots, x_k) \, dx_1 \, dx_2 \cdots dx_k$$
• Approach 1: direct integration (rarely solvable analytically, esp. in high dimension) • Approach 2: numerical integration (often difficult, e.g., unstable, esp. in high dimension) • Approach 3: Monte Carlo integration, i.e., sample and average: draw $\vec{x}^{(1)}, \vec{x}^{(2)}, \ldots, \vec{x}^{(n)} \sim P(\vec{x})$, then $E(f(\vec{x})) \approx \frac{1}{n} \sum_{i=1}^{n} f(\vec{x}^{(i)})$
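A minimal illustration of approach 3 (not from the slides; the target distribution, test function, and sample size are invented): estimate $E(x_1^2 + x_2^2)$ under a standard bivariate normal, whose true value is 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Test function f(x1, x2) = x1^2 + x2^2; its expectation under a
    # standard bivariate normal is exactly 2.
    return x[0]**2 + x[1]**2

n = 100_000
samples = rng.standard_normal(size=(n, 2))   # x^(1), ..., x^(n) ~ P(x)
estimate = np.mean([f(x) for x in samples])  # (1/n) * sum_i f(x^(i))
print(estimate)                              # ~2.0
```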

  9. Markov Chain Monte Carlo (MCMC) • Independent sampling is also often hard, but it is not required for estimating an expectation • MCMC: draw dependent samples from a Markov chain, $\vec{X}_{t+1} \sim P(\vec{X}_{t+1} \mid \vec{X}_t)$ • Simplest & most common variant: Gibbs sampling, which resamples one coordinate at a time from its full conditional $P(x_i \mid x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$ • Algorithm:
for $t = 1$ to $\infty$:
  for $i = 1$ to $k$:
    $x_{t+1,i} \sim P(x_i \mid x_{t+1,1}, x_{t+1,2}, \ldots, x_{t+1,i-1}, x_{t,i+1}, \ldots, x_{t,k})$
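A toy Gibbs sampler, for illustration only (the bivariate-normal target is a standard textbook example, not from the lecture): each coordinate is resampled in turn from its closed-form full conditional, $x_1 \mid x_2 \sim N(\rho x_2, 1-\rho^2)$ and symmetrically for $x_2 \mid x_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
sd = np.sqrt(1 - rho**2)

x1, x2 = 0.0, 0.0              # arbitrary initialization
samples = []
for t in range(10_000):
    x1 = rng.normal(rho * x2, sd)  # resample x1 from its full conditional
    x2 = rng.normal(rho * x1, sd)  # resample x2, using the *new* x1
    samples.append((x1, x2))

# After discarding a burn-in prefix, the sample correlation approaches rho.
print(np.corrcoef(np.array(samples[1000:]).T))
```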

  10. • Input: as before, assume $k$ sequences $s_1, \ldots, s_k$, each containing exactly one motif of length $w$ • Motif model: WMM (weight matrix model) • Parameters: where are the motifs? For $1 \le i \le k$, a start position $x_i$ with $1 \le x_i \le |s_i| - w + 1$ • “Full conditional”: to calculate $P(x_i = j \mid x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$, build a WMM from the motif instances in all sequences except $i$, then calculate the probability that the motif in the $i$th sequence occurs at position $j$ by the usual “scanning” algorithm (sketched below)
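A hedged sketch of these two steps in Python (the function names, 0-based indexing, and pseudocount of 1 are my choices, not from the paper). For simplicity the scan scores each window by its WMM probability alone; Lawrence et al.'s sampler actually weighs motif probability against a background model.

```python
import numpy as np

ALPHA = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def build_wmm(motifs, w, pseudocount=1.0):
    """Column-wise base frequencies from the current motif instances."""
    counts = np.full((w, 4), pseudocount)
    for m in motifs:
        for pos, base in enumerate(m):
            counts[pos, ALPHA[base]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def full_conditional(seq, wmm, w):
    """P(x_i = j | rest): score every length-w window, then normalize."""
    probs = np.array([
        np.prod([wmm[pos, ALPHA[seq[j + pos]]] for pos in range(w)])
        for j in range(len(seq) - w + 1)
    ])
    return probs / probs.sum()
```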

  11. Randomly initialize the $x_i$'s
for $t = 1$ to $\infty$:
  for $i = 1$ to $k$:
    discard the motif instance from $s_i$; recalculate the WMM from the rest
    for $j = 1 \ldots |s_i| - w + 1$:
      calculate the probability that the $i$th motif is at $j$: $P(x_i = j \mid x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$
    pick a new $x_i$ according to that distribution
• Similar to MEME, but MEME would average over this distribution rather than sample from it
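Putting the pieces together, a minimal sketch of the sampling loop, assuming the `build_wmm` and `full_conditional` helpers sketched above (the iteration count and seed are arbitrary; positions are 0-based here, unlike the slide):

```python
import numpy as np

def gibbs_motif_sampler(seqs, w, iters=500, seed=0):
    """Gibbs sampling for motif finding (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the x_i's.
    x = [rng.integers(0, len(s) - w + 1) for s in seqs]
    for t in range(iters):
        for i in range(len(seqs)):
            # Discard sequence i's motif instance; rebuild WMM from the rest.
            others = [seqs[m][x[m]:x[m] + w]
                      for m in range(len(seqs)) if m != i]
            wmm = build_wmm(others, w)
            # Sample a new x_i from the full conditional.
            # (MEME would instead average over this distribution.)
            dist = full_conditional(seqs[i], wmm, w)
            x[i] = rng.choice(len(dist), p=dist)
    return x
```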

  12. Issues • Burn-in: how long must we run the chain before it reaches its stationary distribution? • Mixing: how long a post-burn-in sample must we take to get a good sample of the stationary distribution? (Recall that individual samples are not independent, and may not “move” freely through the sample space.)

  13. Variants & Extensions • “Phase shift”: the sampler may settle on a suboptimal solution that overlaps only part of the true motif. Periodically try moving all motif instances a few positions left or right (see the sketch below). • Algorithmic adjustment of pattern width: periodically add/remove flanking positions to (roughly) maximize the average relative entropy per position • Multiple patterns per string
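A hypothetical sketch of the phase-shift move, reusing `build_wmm` and `ALPHA` from above. The scoring objective here (total log-probability of the shifted alignment under its own WMM) is my assumption; the slide only says to periodically try shifting all instances a few positions.

```python
import numpy as np

def phase_shift(seqs, x, w, max_shift=2):
    """Try shifting ALL motif starts by a common offset; keep the best."""
    best_x, best_score = x, -np.inf
    for d in range(-max_shift, max_shift + 1):
        shifted = [xi + d for xi in x]
        if any(xi < 0 or xi + w > len(s) for xi, s in zip(shifted, seqs)):
            continue  # this shift runs off the end of some sequence
        motifs = [s[xi:xi + w] for s, xi in zip(seqs, shifted)]
        wmm = build_wmm(motifs, w)
        score = sum(np.log(wmm[pos, ALPHA[m[pos]]])
                    for m in motifs for pos in range(w))
        if score > best_score:
            best_x, best_score = shifted, score
    return best_x
```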
