
Fast estimation of posterior change-point probabilities for CNV data



  1. Fast estimation of posterior change-point probabilities for CNV data
     The Minh Luong, Yves Rozenholc, Gregory Nuel, MAP5, Université Paris Descartes
     July 5, 2012
     Slide footer: Luong et al, MAP5. Fast estimation of posterior change-point probabilities

  2. Introduction
     - Change-point methods have applications in econometrics, engineering, network security, signal processing, music classification, and bioinformatics.
     - Example: copy number variation (CNV) analysis, to identify regions where DNA mutations are related to disease susceptibility.
     - High-resolution data: tens of thousands of clones per chromosome.
       - Array comparative genomic hybridization (aCGH)
       - Single nucleotide polymorphism (SNP) array
     - Figure: array CGH profile. Source: Redon and Carter, Methods Mol Biol. 2009; 529: 37-49.

  3. Examples of R packages for change-point analysis
     - Unsupervised hidden Markov model (HMM) approaches
       - Willenbrock and Fridlyand (2005): aCGH package
       - Marioni et al (2006): snapCGH package
     - Non-HMM segmentation approaches
       - Venkatraman and Olshen (2004): DNAcopy package
       - Hupé et al (2004): GLAD package
     - Likelihood-based approaches with penalization criteria
       - Picard et al (2005): cghseg package
     - Change-point uncertainty (MCMC)
       - Erdman et al (2008): bcp package

  4. Motivation
     - Few exact non-MCMC methods exist for assessing the uncertainty of change-point estimates.
     - Methods for finding exact posterior probabilities of change-points have $O(n^2)$ complexity:
       - frequentist: Guédon (2007)
       - Bayesian: Rigaill (2011)
     - High-resolution data in genomics technologies (> 10,000 observations per chromosome):
       - Smaller inter-segmental differences: need to characterize uncertainty
       - More data: need efficient estimates; $O(n^2)$ is not feasible
     - Next-generation sequencing: need methods adaptable to non-normal data

  5. Segmentation approach to change-point detection
     - Dataset: $X = (X_1, X_2, \ldots, X_n)$, real-valued observations.
     - Hidden state space: $S = (S_1, S_2, \ldots, S_n)$, the corresponding segment indices.
     - Distribution: $P(X_i \mid S_i = k, \theta_k) \sim g_{\theta_k}(\cdot)$ when $X_i$ belongs to segment $k$.
     - Problem of interest: find $P(S_i \mid X; \theta)$ when the segments are unknown given the data.
     - Figure: Segment-based change-point detection (K=5). Vertical axis: Y; segments S: 1, 2, 3, 4, 5.
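As a minimal sketch of the notation on this slide, the Python below builds the segment-index vector $S$ from a list of change-point positions and evaluates the log-likelihood of the data given $S$. The helper names (`segment_indices`, `gaussian_pdf`, `log_likelihood`), the unit-variance Gaussian choice for $g_{\theta_k}$, and the toy data are assumptions made for illustration; none of them come from the talk.

```python
import math


def segment_indices(n, change_points):
    """Return S = (S_1, ..., S_n), the 1-based segment index of each observation.

    change_points lists the 1-based position of the last observation of every
    segment except the final one, so K = len(change_points) + 1.
    """
    cps = set(change_points)
    s, k = [], 1
    for i in range(1, n + 1):
        s.append(k)
        if i in cps:  # observation i closes segment k
            k += 1
    return s


def gaussian_pdf(x, mean, sd=1.0):
    """Density g_theta(x) for an assumed Gaussian segment model."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(x, s, means, sd=1.0):
    """Sum of log g_{theta_k}(x_i) with k = S_i."""
    return sum(math.log(gaussian_pdf(xi, means[k - 1], sd)) for xi, k in zip(x, s))


# Toy data with K = 3 segments (made-up values).
x = [0.1, -0.2, 0.0, 2.1, 1.9, 2.2, -1.0, -0.9]
means = [0.0, 2.0, -1.0]
s = segment_indices(len(x), change_points=[3, 6])
print(s)  # [1, 1, 1, 2, 2, 2, 3, 3]
print(log_likelihood(x, s, means))
```

The posterior $P(S_i \mid X; \theta)$ asked for on the slide averages over every such vector $S$ instead of fixing one.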

  6. Constrained hidden Markov model for segmentation
     - Use HMM algorithms to estimate posterior probabilities with linear complexity.
     - $S$: Markov chain over $\{1, 2, \ldots, K, K+1\}$; $M_K$: set of possible $S$; $\{S \in M_K\}$: $K$ states in $n$ observations.
     - The constraints on the HMM correspond exactly to a segmentation change-point model.
     - Find the best partitioning $S \in M_K$ into $K$ non-overlapping intervals, with a homogeneous distribution within each segment.
     - $S_1 = 1$, $S_n = K$; junk state: $K+1$.
     - Allow only transitions of 0 or +1: $S_i - S_{i-1} \in \{0, 1\}$.
       - $P(S_i = k+1 \mid S_{i-1} = k) = \eta_k(i)$
       - $P(S_i = k \mid S_{i-1} = k) = 1 - \eta_k(i)$
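The transition constraints on this slide can be pictured as a banded matrix. The sketch below builds that matrix for a single position $i$, assuming the values $\eta_k(i)$ are supplied as a plain list and that the junk state $K+1$ is absorbing; both the function name `transition_matrix` and the absorbing-state choice are illustrative assumptions.

```python
def transition_matrix(K, eta):
    """Return the (K+1) x (K+1) one-step transition matrix of the constrained chain.

    eta[k-1] is the probability of moving from state k to state k+1.
    Row and column j (0-based) stand for state j+1; the last one is the junk state.
    """
    size = K + 1
    P = [[0.0] * size for _ in range(size)]
    for k in range(K):              # 0-based index of states 1..K
        P[k][k] = 1.0 - eta[k]      # stay in the current segment
        P[k][k + 1] = eta[k]        # move to the next segment (or to junk from K)
    P[K][K] = 1.0                   # assumption: the junk state is absorbing
    return P


# Example with K = 3 segments and a constant eta (made-up value).
for row in transition_matrix(3, [0.1, 0.1, 0.1]):
    print(row)
```

Only the diagonal and the first superdiagonal are non-zero, which is what keeps the forward and backward passes linear in $n$ for a fixed $K$.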

  7. Adapted forward-backward algorithm
     - Forward and backward quantities, for observation $i$ and state $k$:
       - $F_i(k) = P(X_{1:i} = x_{1:i}, S_i = k)$
       - $B_i(k) = P(X_{i+1:n} = x_{i+1:n}, S_n = K \mid S_i = k)$
     - Initialization:
       - $F_1(1) = g_{\theta_1}(x_1)$
       - $B_{n-1}(K-1) = \eta_{K-1}(n)\, g_{\theta_K}(x_n)$, $B_{n-1}(K) = (1 - \eta_K(n))\, g_{\theta_K}(x_n)$
     - Recursion:
       - $F_i(k) = [F_{i-1}(k)(1 - \eta_k(i)) + 1_{k>1}\, F_{i-1}(k-1)\, \eta_{k-1}(i)]\, g_{\theta_k}(x_i)$
       - $B_{i-1}(k) = (1 - \eta_k(i))\, g_{\theta_k}(x_i)\, B_i(k) + 1_{k<K}\, \eta_k(i)\, g_{\theta_{k+1}}(x_i)\, B_i(k+1)$
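A minimal Python sketch of the forward and backward recursions follows. It assumes a constant transition probability `eta` (so $\eta_k(i)$ does not depend on $k$ or $i$), unit-variance Gaussian densities with known means for $g_{\theta_k}$, and the terminal condition $B_n(K) = 1$; the function names and toy data are illustrative, not the authors' implementation.

```python
import math


def gaussian_pdf(x, mean, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def forward_backward(x, means, eta=0.5, sd=1.0):
    """Return tables F and B with F[i][k] = F_{i+1}(k+1), B[i][k] = B_{i+1}(k+1)."""
    n, K = len(x), len(means)
    g = [[gaussian_pdf(xi, m, sd) for m in means] for xi in x]
    F = [[0.0] * K for _ in range(n)]
    B = [[0.0] * K for _ in range(n)]

    F[0][0] = g[0][0]                      # S_1 = 1 with probability one
    for i in range(1, n):
        for k in range(K):
            stay = F[i - 1][k] * (1.0 - eta)
            move = F[i - 1][k - 1] * eta if k > 0 else 0.0
            F[i][k] = (stay + move) * g[i][k]

    B[n - 1][K - 1] = 1.0                  # enforce S_n = K
    for i in range(n - 1, 0, -1):
        for k in range(K):
            stay = (1.0 - eta) * g[i][k] * B[i][k]
            move = eta * g[i][k + 1] * B[i][k + 1] if k < K - 1 else 0.0
            B[i - 1][k] = stay + move
    return F, B


# Toy data with K = 2 segments (made-up values).
x = [0.0, 0.2, 1.9, 2.1, 2.0]
means = [0.0, 2.0]
F, B = forward_backward(x, means)
print(F[0][0] * B[0][0])  # P(X = x, S_n = K)
```

Each pass touches every (observation, state) pair once, so the cost is $O(nK)$. For long sequences the raw products underflow, so a practical version would rescale each step or work in log space.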

  8. Posterior probabilities from forward-backward algorithm
     - Posterior probability of state $k$ for observation $i$:
       $P(S_i = k \mid X_{1:n} = x_{1:n}) = \frac{F_i(k)\, B_i(k)}{F_1(1)\, B_1(1)}$
     - Posterior probability of observation $i$ being the $k$th change-point:
       $P(CP_k = i \mid X_{1:n} = x_{1:n}) = P(S_i = k, S_{i+1} = k+1 \mid X_{1:n} = x_{1:n}) = \frac{F_i(k)\, \eta_k(i+1)\, g_{\theta_{k+1}}(x_{i+1})\, B_{i+1}(k+1)}{F_1(1)\, B_1(1)}$
     - Posterior transition probability from the $(k-1)$th to the $k$th state:
       $P(S_i = k \mid S_{i-1} = k-1, X_{1:n} = x_{1:n}) = \frac{\eta_{k-1}(i)\, g_{\theta_k}(x_i)\, B_i(k)}{B_{i-1}(k-1)}$
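To illustrate how the posterior quantities on this slide follow from the forward and backward tables, here is a self-contained Python sketch. It takes a table of emission densities `g[i][k]` directly and assumes a constant transition probability `eta`; the function names and the numbers in `g` are made up for illustration.

```python
def forward_backward(g, eta):
    """g[i][k] is the density of observation i+1 under segment k+1 (0-based tables)."""
    n, K = len(g), len(g[0])
    F = [[0.0] * K for _ in range(n)]
    B = [[0.0] * K for _ in range(n)]
    F[0][0] = g[0][0]
    for i in range(1, n):
        for k in range(K):
            move = F[i - 1][k - 1] * eta if k > 0 else 0.0
            F[i][k] = (F[i - 1][k] * (1.0 - eta) + move) * g[i][k]
    B[n - 1][K - 1] = 1.0
    for i in range(n - 1, 0, -1):
        for k in range(K):
            move = eta * g[i][k + 1] * B[i][k + 1] if k < K - 1 else 0.0
            B[i - 1][k] = (1.0 - eta) * g[i][k] * B[i][k] + move
    return F, B


def posteriors(g, eta=0.5):
    """Return (state, cp): posterior state marginals and change-point probabilities."""
    n, K = len(g), len(g[0])
    F, B = forward_backward(g, eta)
    evidence = F[0][0] * B[0][0]           # F_1(1) B_1(1)
    # state[i][k] = P(S_{i+1} = k+1 | X)
    state = [[F[i][k] * B[i][k] / evidence for k in range(K)] for i in range(n)]
    # cp[k][i] = P(observation i+1 is the last point of segment k+1 | X)
    cp = [[F[i][k] * eta * g[i + 1][k + 1] * B[i + 1][k + 1] / evidence
           for i in range(n - 1)]
          for k in range(K - 1)]
    return state, cp


# Made-up emission densities for n = 4 observations and K = 2 segments.
g = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
state, cp = posteriors(g)
print(state)
print(cp[0])
```

Because every admissible path starts in state 1, ends in state $K$, and moves up one state at a time, each row of `state` and each row of `cp` sums to one, which is a convenient sanity check.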
