The Gibbs Sampler (CSE 527, Lecture 9)
  1. The “Gibbs Sampler”
CSE 527 Lecture 9
• Lawrence, et al., “Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Sequence Alignment,” Science 1993
[Slide figures: The Double Helix (Los Alamos Science); Some DNA Binding Domains]

  2. Some History
• Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, “Equations of State Calculations by Fast Computing Machines,” J. Chem. Phys. 1953
• Hastings, Biometrika, 1970
• Geman & Geman, IEEE PAMI 1984
• Josiah Willard Gibbs, 1839-1903, American physicist, a pioneer of thermodynamics

How to Average
An old problem:
• k random variables: x_1, x_2, ..., x_k
• Joint distribution (p.d.f.): P(x_1, x_2, ..., x_k)
• Some function: f(x_1, x_2, ..., x_k)
• Want the expected value: E(f(x_1, x_2, ..., x_k))

  3. How to Average (continued)
E(f(x_1, x_2, ..., x_k)) = ∫∫···∫ f(x_1, ..., x_k) · P(x_1, ..., x_k) dx_1 dx_2 ... dx_k
• Approach 1: direct integration (rarely solvable analytically, esp. in high dim)
• Approach 2: numerical integration (often difficult, e.g., unstable, esp. in high dim)
• Approach 3: Monte Carlo integration: sample x^(1), x^(2), ..., x^(n) ∼ P(x) and average: E(f(x)) ≈ (1/n) Σ_{i=1}^{n} f(x^(i))

Markov Chain Monte Carlo (MCMC)
• Independent sampling is also often hard, but it is not required for computing expectations
• MCMC with stationary distribution = P: X_{t+1} ∼ P(X_{t+1} | X_t)
• Simplest & most common: Gibbs sampling, which draws each variable from its full conditional P(x_i | x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_k)
• Algorithm:
  for t = 1 to ∞
    for i = 1 to k do:
      x_{t+1,i} ∼ P(x_{t+1,i} | x_{t+1,1}, x_{t+1,2}, ..., x_{t+1,i-1}, x_{t,i+1}, ..., x_{t,k})

Application: Motif Finding
• Input: again assume sequences s_1, s_2, ..., s_k with one length-w motif per sequence
• Motif model: WMM, with parameters Y_{i,j}
• Where are the motifs? For 1 ≤ i ≤ k, have 1 ≤ x_i ≤ |s_i| - w + 1
[Figure: candidate motif start positions 1, 3, 5, 7, 9, 11, ... along sequence i]
• “Full conditional”: to calculate P(x_i = j | x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_k), build a WMM from the motifs in all sequences except i, then calculate the probability that the motif in the i-th sequence occurs at position j by the usual “scanning” algorithm
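To make the generic Gibbs step above concrete, here is a minimal Python sketch for a target whose full conditionals are known in closed form: a standard bivariate normal with correlation rho (this example target, the conditional means rho·y and rho·x, and all names are illustrative assumptions, not from the lecture). Each sweep resamples each coordinate from its full conditional given the current value of the other, and a post-burnin Monte Carlo average estimates an expectation, as in Approach 3.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iters, burnin, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals are x | y ~ N(rho*y, 1 - rho^2), and symmetrically for y."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0                    # arbitrary initialization
    samples = []
    for t in range(n_iters):
        x = rng.gauss(rho * y, sd)     # draw x from P(x | y)
        y = rng.gauss(rho * x, sd)     # draw y from P(y | x)
        if t >= burnin:                # discard pre-stationarity ("burnin") draws
            samples.append((x, y))
    return samples

# Monte Carlo average of f(x, y) = x*y over the retained (dependent) samples;
# for this target, E[x*y] equals rho.
samples = gibbs_bivariate_normal(rho=0.8, n_iters=20000, burnin=2000)
est = sum(x * y for x, y in samples) / len(samples)
```

Note that the samples are not independent (successive sweeps are correlated), yet the running average still converges to the desired expectation, which is exactly the point of the MCMC approach.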

  4. Overall Algorithm
Randomly initialize the x_i’s
for t = 1 to ∞
  for i = 1 to k
    discard the motif instance from s_i; recalculate the WMM from the rest
    for j = 1 ... |s_i| - w + 1, calculate the probability that the i-th motif is at j:
      P(x_i = j | x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_k)
    pick a new x_i according to that distribution
(Similar to MEME, but MEME would average over the positions, rather than sample from that distribution.)

Issues
• Burnin - how long must we run the chain to reach stationarity?
• Mixing - how long a post-burnin sample must we take to get a good sample of the stationary distribution? (Recall that individual samples are not independent, and may not “move” freely through the sample space. Also, there may be many isolated modes.)

Variants & Extensions
• “Phase shift” - the sampler may settle on a suboptimal solution that overlaps part of the motif. Periodically try moving all motif instances a few spaces left or right.
• Algorithmic adjustment of pattern width: periodically add/remove flanking positions to maximize (roughly) the average relative entropy per position.
• Multiple patterns per string.
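The per-sequence sampling loop above can be sketched in Python. This is a toy illustration under assumptions not in the slides: pseudocount smoothing of the WMM, the helper names build_wmm and gibbs_motif_sampler, a fixed ACGT alphabet, and a fixed number of sweeps rather than a convergence test.

```python
import random

def build_wmm(motifs, w, alphabet="ACGT", pseudo=1.0):
    """Weight matrix model: per-position letter probabilities estimated
    from the given motif instances, smoothed with pseudocounts."""
    counts = [{a: pseudo for a in alphabet} for _ in range(w)]
    for m in motifs:
        for pos, ch in enumerate(m):
            counts[pos][ch] += 1
    total = len(motifs) + pseudo * len(alphabet)
    return [{a: c[a] / total for a in alphabet} for c in counts]

def gibbs_motif_sampler(seqs, w, n_iters=500, seed=0):
    """One length-w motif per sequence; resample each start x_i from its
    full conditional, given the WMM built from all other sequences."""
    rng = random.Random(seed)
    x = [rng.randrange(len(s) - w + 1) for s in seqs]   # random initialization
    for _ in range(n_iters):
        for i in range(len(seqs)):
            # discard sequence i's motif; rebuild WMM from the rest
            others = [seqs[k][x[k]:x[k] + w] for k in range(len(seqs)) if k != i]
            wmm = build_wmm(others, w)
            # "scanning": score every candidate start j in sequence i
            probs = []
            for j in range(len(seqs[i]) - w + 1):
                p = 1.0
                for pos in range(w):
                    p *= wmm[pos][seqs[i][j + pos]]
                probs.append(p)
            # pick the new x_i in proportion to those probabilities
            x[i] = rng.choices(range(len(probs)), weights=probs)[0]
    return x
```

In practice one would run many sweeps, discard a burnin prefix, and watch for the phase-shift problem described above; this sketch returns only the final motif start positions.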
