Exact posterior distributions over the segmentation space and model - PowerPoint PPT Presentation

Exact posterior distributions over the segmentation space and model selection for multiple change-point detection problems
Guillem Rigaill, Emilie Lebarbier and Stéphane Robin, August 2010
G. Rigaill, August 2010, 1 / 16

Application to DNA copy number
DNA copy number analysis
In normal cells, the copy number is 2 (chromosomes come in pairs).
In tumor cells, the copy number differs from 2 at many points of the genome.
Gains and losses of DNA affect:
◮ whole chromosomes
◮ smaller regions, down to 10 kb

Multiple change-point detection
The data
The observed signal $Y_t$ is noisy.
The true signal is affected by abrupt changes.
Segments and segmentations
$\mathcal{M}_K$: the set of all possible segmentations with $K$ segments
$m \in \mathcal{M}_K$: a specific segmentation
$r \in m$: a segment of $m$, with $n_r$ observations

A model: a simple example
Normal heteroscedastic segmentation
$Y_t \sim \mathcal{N}(\mu_r, \sigma_r^2)$ for all $t \in r$, and the $\{Y_t\}_t$ are independent.
Parameter estimation
Given the breakpoint positions, estimating the other parameters is straightforward. For example, maximum likelihood gives
$\hat{\mu}_r = \frac{1}{n_r} \sum_{t \in r} Y_t$
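Once the breakpoints are known, the per-segment maximum likelihood estimates take only a few lines of Python. This is a minimal illustration. It assumes the breakpoints are given as 0-based start indices of segments 2..K, and the function name `segment_mle` is hypothetical. The variance estimate divides by $n_r$, which is what maximum likelihood gives for the heteroscedastic model.

```python
def segment_mle(y, breakpoints):
    """ML estimates (mu_r, sigma_r^2) for each segment, given breakpoints.

    Assumption (hypothetical convention): `breakpoints` holds the 0-based
    start indices of segments 2..K, so segment r is y[start:end] (half-open).
    """
    bounds = [0] + list(breakpoints) + [len(y)]
    estimates = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = y[start:end]
        n_r = len(seg)
        mu = sum(seg) / n_r
        # maximum likelihood variance: divide by n_r, not n_r - 1
        var = sum((v - mu) ** 2 for v in seg) / n_r
        estimates.append((mu, var))
    return estimates
```

For example, `segment_mle([1.0, 1.5, 0.5, 5.0, 6.0], [3])` estimates one mean and one variance for `[1.0, 1.5, 0.5]` and another pair for `[5.0, 6.0]`.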

Estimation of breakpoint positions?
Problems
For $n$ points, there are $2^{n-1}$ possible segmentations.
Breakpoints are discrete parameters.
How can one segmentation be selected out of so many? How can the segmentation space be explored?
Some solutions
Dynamic programming (DP) recovers the optimal segmentation: $O(n^2)$ per additional segment, i.e. $O(K n^2)$ up to $K$ segments.
Various model selection criteria:
◮ the standard BIC criterion is not theoretically justified here, because breakpoints are discrete parameters
◮ [Zhang and Siegmund(2007)] proposed a modified BIC criterion
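Both points can be sketched directly. The first function counts segmentations. The second is a segment-neighbourhood dynamic program that returns the optimal segmentation for every $K$ up to `K_max` in $O(K_{max} n^2)$. This is a minimal sketch under one assumption: the segment cost is the residual sum of squares, a homoscedastic Gaussian loss rather than the heteroscedastic model above. All function names are hypothetical.

```python
import math

def count_segmentations(n, K=None):
    """Number of segmentations of n points (optionally with exactly K segments)."""
    if K is None:
        # each of the n-1 gaps between consecutive points is a breakpoint or not
        return 2 ** (n - 1)
    # choose K-1 breakpoints among the n-1 gaps
    return math.comb(n - 1, K - 1)

def best_segmentations(y, K_max):
    """Segment-neighbourhood DP; cost[k][j] = best cost of y[:j] in k segments.

    Assumption: segment cost = residual sum of squares, O(1) via cumulative sums.
    """
    n = len(y)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def rss(i, j):  # residual sum of squares of y[i:j]
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    cost = [[math.inf] * (n + 1) for _ in range(K_max + 1)]
    arg = [[0] * (n + 1) for _ in range(K_max + 1)]
    cost[0][0] = 0.0
    for k in range(1, K_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last segment is y[i:j]
                c = cost[k - 1][i] + rss(i, j)
                if c < cost[k][j]:
                    cost[k][j], arg[k][j] = c, i
    return cost, arg

def breakpoints(arg, K, n):
    """Backtrack the 0-based start indices of segments 2..K."""
    starts, j = [], n
    for k in range(K, 0, -1):
        j = arg[k][j]
        starts.append(j)
    return sorted(starts)[1:]  # drop the first segment's start (0)
```

With this cost, `best_segmentations([0, 0, 0, 5, 5, 5, 1, 1], 3)` followed by `breakpoints(arg, 3, 8)` should give `[3, 6]`.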

One example
Application to a DNA copy number profile
Algorithm
1. DP to recover the best segmentation with $K = 1$ up to $K = 30$ segments
2. Select $K$ with the modified BIC
Questions
Is the optimal segmentation far better than the others?
How reliable are the localizations of the segments and breakpoints?

Bayesian framework
Some probabilities
$P(m)$: prior distribution of segmentation $m$
$P(K)$: prior distribution of the number of segments
$P(Y \mid \theta_m, m)$: distribution of the data given $m$ and $\theta_m$
Assumption: factorisability
If the segments are independent:
$P(Y \mid m) = \prod_{r \in m} P(Y_r \mid r)$, with $P(Y_r \mid r) = \int P(Y_r \mid \theta_r)\, P(\theta_r)\, d\theta_r$, where $\theta_r$ are the parameters of segment $r$.
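A concrete $P(Y_r \mid r)$ requires a choice of prior $P(\theta_r)$, which this slide leaves open. The sketch below assumes a simpler conjugate model than the heteroscedastic one above: known variance $\sigma^2$ and a prior $\mu_r \sim \mathcal{N}(\mu_0, \tau^2)$. Under that model the integral has a closed form. `log_marginal_segment` and its parameter names are hypothetical.

```python
import math

def log_marginal_segment(y_r, sigma2, mu0=0.0, tau2=1.0):
    """Closed-form log P(Y_r | r) = log of the integral of P(Y_r | mu) P(mu) dmu.

    Assumptions (illustrative, not the slides' exact model):
      Y_t | mu ~ N(mu, sigma2) i.i.d. within the segment, sigma2 known;
      prior mu ~ N(mu0, tau2).
    Marginally, Y_r ~ N(mu0 * 1, sigma2 * I + tau2 * 1 1^T).
    """
    n = len(y_r)
    d = [v - mu0 for v in y_r]
    sd = sum(d)
    sdd = sum(v * v for v in d)
    # quadratic form d^T (sigma2 I + tau2 1 1^T)^{-1} d, via Sherman-Morrison
    quad = (sdd - tau2 * sd * sd / (sigma2 + n * tau2)) / sigma2
    # log-determinant of the covariance: n log sigma2 + log(1 + n tau2 / sigma2)
    logdet = n * math.log(sigma2) + math.log1p(n * tau2 / sigma2)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
```

With a single observation this reduces to the log-density of $\mathcal{N}(\mu_0, \sigma^2 + \tau^2)$, which is an easy way to sanity-check the formula.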

Computation
Quantities of interest
$P(m \mid Y)$: posterior probability of a segmentation $m$
$P(K \mid Y)$: posterior probability of the number of segments
$S_K(r)$: posterior probability of the segment $r$
$\mathrm{ICL}(K)$: Integrated Completed Likelihood [Biernacki et al.(2000)]
$\mathrm{ICL}(K) = -\log P(Y, K) + H(K)$ (smaller is better)
ICL favours the $K$ for which the best segmentation is far better than all the others.
$H(K)$ is the entropy: $H(K) = -\sum_{m \in \mathcal{M}_K} P(m \mid Y, K) \log P(m \mid Y, K)$
A small entropy means that the best segmentation with $K$ segments fits the data far better than any other.
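For very small $n$, all of these quantities can be checked by brute force, by enumerating $\mathcal{M}_K$ directly. This minimal sketch makes two assumptions: a user-supplied `log_seg(i, j)` returns $\log P(Y_r \mid r)$ for the 0-based half-open segment $[i, j)$, and the prior is uniform over segmentations given $K$. It is only a reference implementation, because $|\mathcal{M}_K|$ grows combinatorially. The helper names are hypothetical.

```python
import itertools
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def enumerate_segmentations(n, K):
    """All segmentations of [0, n) into K segments: K-1 breakpoints among 1..n-1."""
    for cuts in itertools.combinations(range(1, n), K - 1):
        bounds = (0,) + cuts + (n,)
        yield list(zip(bounds[:-1], bounds[1:]))

def brute_force_icl(n, K, log_seg, log_prior_K):
    """Exact P(m | Y, K), H(K) and ICL(K) by enumerating M_K (small n only).

    Assumptions: log_seg(i, j) = log P(Y_r | r) for r = [i, j);
    P(m) = P(K) / |M_K| (uniform over segmentations given K).
    """
    segs = list(enumerate_segmentations(n, K))
    log_pm = log_prior_K - math.log(len(segs))
    log_joint = [sum(log_seg(i, j) for i, j in m) + log_pm for m in segs]
    log_p_YK = logsumexp(log_joint)                         # log P(Y, K)
    post = [math.exp(lj - log_p_YK) for lj in log_joint]    # P(m | Y, K)
    H = -sum(p * math.log(p) for p in post if p > 0)        # entropy H(K)
    return post, H, -log_p_YK + H                           # ICL(K)
```

If every segmentation receives the same score, the posterior is uniform and $H(K) = \log |\mathcal{M}_K|$, the largest possible entropy.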

$P(m \mid Y)$ and $P(K \mid Y)$
$P(m \mid Y)$
$P(m \mid Y) \propto P(Y \mid m)\, P(m) = \prod_{r \in m} P(Y_r \mid r)\, P(m)$, with $P(Y_r \mid r) = \int P(Y_r \mid \theta_r)\, P(\theta_r)\, d\theta_r$, where $\theta_r$ are the parameters of segment $r$
The BIC criterion is derived from an approximation of $P(m \mid Y)$; in fact, it can be computed exactly.
$P(K \mid Y)$
$P(K \mid Y) \propto P(Y, K) = \sum_{m \in \mathcal{M}_K} P(Y, m)$
$P(K \mid Y)$ can be computed as successive matrix-vector products.
Similar computations were proposed using forward-backward-like algorithms [Fearnhead(2005), Guédon(2008)].
$P(K \mid Y)$ can be used to select the number of segments.
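The matrix-vector formulation can be sketched as follows. The sketch assumes a hypothetical `seg_lik(i, j)` that returns $P(Y_r \mid r)$ for the 0-based segment $[i, j)$, and a prior $P(m) = P(K)/|\mathcal{M}_K|$. Each product with the upper-triangular matrix of segment likelihoods adds one segment, so $K_{max}$ products cost $O(K_{max} n^2)$. Plain floats are used for readability; real profiles would need log-space or rescaling to avoid underflow.

```python
import math

def posterior_K(n, K_max, seg_lik, prior_K):
    """P(K | Y) for K = 1..K_max via successive matrix-vector products.

    Assumptions: seg_lik(i, j) = P(Y_r | r) for r = [i, j) (0-based, half-open);
    P(m) = P(K) / |M_K|. Toy sizes only (no underflow protection).
    """
    # upper-triangular matrix A[i][j] = P(Y_[i,j) | [i,j)) for i < j
    A = [[seg_lik(i, j) if i < j else 0.0 for j in range(n + 1)]
         for i in range(n + 1)]
    a = [1.0] + [0.0] * n  # a_0: zero segments, ending at position 0
    joint = []
    for K in range(1, K_max + 1):
        # a_K = a_{K-1} A (row vector times matrix): one more segment, O(n^2)
        a = [sum(a[i] * A[i][j] for i in range(n + 1)) for j in range(n + 1)]
        # a[n] = sum over m in M_K of prod_{r in m} P(Y_r | r)
        joint.append(prior_K(K) / math.comb(n - 1, K - 1) * a[n])  # P(Y, K)
    total = sum(joint)
    return [p / total for p in joint]  # normalised: P(K | Y)
```

For each $K$, entry $n$ of the vector after $K$ products sums $\prod_{r \in m} P(Y_r \mid r)$ over exactly $\mathcal{M}_K$. That is why a single pass yields $P(Y, K)$ for every $K$ at once.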

Posterior probability of a segment
$S_{K,k}([\![t_1, t_2[\![)$: segmentations having $r = [\![t_1, t_2[\![$ as their $k$-th segment.
Their probability $S_{K,k}([\![t_1, t_2[\![)$ can be computed exactly in $O(n^2)$:
$k-1$ segments before $t_1$ × 1 segment between $t_1$ and $t_2$ × $K-k$ segments after $t_2$
$\mathcal{M}_{k-1}([\![1, t_1[\![) \times \{[\![t_1, t_2[\![\} \times \mathcal{M}_{K-k}([\![t_2, n+1[\![)$
$S_K([\![t_1, t_2[\![)$: segmentations including segment $[\![t_1, t_2[\![$
$S_K([\![t_1, t_2[\![) = \sum_k S_{K,k}([\![t_1, t_2[\![)$
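The same decomposition can be sketched with forward and backward sums. The sketch assumes 0-based half-open segments $[t_1, t_2)$, a hypothetical `seg_lik(i, j)` returning $P(Y_r \mid r)$, and a prior uniform over $\mathcal{M}_K$, so the prior cancels. Forward sums cover $\mathcal{M}_{k-1}$ before $t_1$ and backward sums cover $\mathcal{M}_{K-k}$ after $t_2$; summing over $k$ gives $S_K(r)$. Plain floats are used, so the sketch is for toy sizes only.

```python
def segment_posteriors(n, K, seg_lik):
    """S_K(r) = P(r in m | Y, K) for every segment r = [t1, t2).

    Assumptions: seg_lik(i, j) = P(Y_r | r) for r = [i, j) (0-based, half-open);
    P(m) uniform over M_K, so it cancels in the ratio.
    """
    A = [[seg_lik(i, j) if i < j else 0.0 for j in range(n + 1)]
         for i in range(n + 1)]
    # fwd[k][t]: sum over segmentations of [0, t) in k segments
    fwd = [[1.0] + [0.0] * n]
    # bwd[k][t]: sum over segmentations of [t, n) in k segments
    bwd = [[0.0] * n + [1.0]]
    for _ in range(K):
        f, b = fwd[-1], bwd[-1]
        fwd.append([sum(f[i] * A[i][j] for i in range(n + 1)) for j in range(n + 1)])
        bwd.append([sum(A[i][j] * b[j] for j in range(n + 1)) for i in range(n + 1)])
    total = fwd[K][n]  # sum over all segmentations in M_K
    S = {}
    for t1 in range(n):
        for t2 in range(t1 + 1, n + 1):
            # r is the k-th segment: k-1 segments before t1, K-k after t2
            s = sum(fwd[k - 1][t1] * bwd[K - k][t2] for k in range(1, K + 1))
            S[(t1, t2)] = s * A[t1][t2] / total
    return S
```

Every segmentation in $\mathcal{M}_K$ has exactly $K$ segments, so the returned values should sum to $K$. This gives a convenient sanity check.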

Entropy
Exact computation in $O(K n^2)$, using the posterior probabilities of the segments:
$H(K) = -\sum_{m \in \mathcal{M}_K} P(m \mid Y, K) \log P(m \mid Y, K)$
$= -\sum_{m \in \mathcal{M}_K} P(m \mid Y, K) \log\big(\prod_{r \in m} P(Y_r \mid r)\, P(m)\big) + \log P(Y, K)$
$= -\sum_r S_K(r) \log P(Y_r \mid r) - \log P(m) + \log P(Y, K)$, when $P(m)$ takes the same value for every $m \in \mathcal{M}_K$
ICL
$\mathrm{ICL}(K) = -\log P(Y, K) + H(K)$
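Once the $S_K(r)$ and $\log P(Y, K)$ are available (for instance from forward-backward sums), the last line of the derivation is a single pass over the $O(n^2)$ segments. This minimal sketch assumes the constant-prior condition stated above. `entropy_from_segments` and `icl` are hypothetical helper names.

```python
def entropy_from_segments(S, log_seg, log_p_YK, log_p_m):
    """H(K) = -sum_r S_K(r) log P(Y_r | r) - log P(m) + log P(Y, K).

    Assumptions: S maps each segment r = (t1, t2) to S_K(r);
    log_seg(t1, t2) = log P(Y_r | r); P(m) = exp(log_p_m) for every m in M_K.
    """
    return (-sum(p * log_seg(t1, t2) for (t1, t2), p in S.items() if p > 0)
            - log_p_m + log_p_YK)

def icl(log_p_YK, H):
    # ICL(K) = -log P(Y, K) + H(K); smaller values are preferred
    return -log_p_YK + H
```

The identity holds because each $\log P(m \mid Y, K)$ splits into a sum over the segments of $m$. Regrouping that sum by segment turns the weights $P(m \mid Y, K)$ into $S_K(r)$.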

Simulation
Design and results
Simulated sequence of 150 observations with 6 change-points (positions 21, 29, 68, 82, 115, 135).
Do $P(m \mid Y)$, $P(K \mid Y)$ and $\mathrm{ICL}(K)$ recover the correct number of breakpoints, and how does this depend on the noise level?
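This sketch generates data in the simulation's setting. The change-point positions come from the slide. Everything else is an assumption, not the study's values: the segment means, the noise level `sigma`, and the convention that a change-point at position $t$ (1-based) starts a new segment at observation $t$. `simulate_profile` is a hypothetical name.

```python
import random

def simulate_profile(sigma, means, change_points=(21, 29, 68, 82, 115, 135),
                     n=150, seed=None):
    """Piecewise-constant Gaussian signal with the slide's change-point positions.

    Assumptions: a change-point at position t (1-based) starts a new segment
    at observation t; `means` (one per segment) and `sigma` are hypothetical.
    """
    if len(means) != len(change_points) + 1:
        raise ValueError("need one mean per segment")
    rng = random.Random(seed)
    starts = [1] + list(change_points) + [n + 1]
    y = []
    for mu, s, e in zip(means, starts[:-1], starts[1:]):
        # segment covers observations s .. e-1 (1-based)
        y.extend(rng.gauss(mu, sigma) for _ in range(e - s))
    return y
```

Varying `sigma` while keeping the means fixed gives one way to study the noise dependence the slide asks about.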

A CGH example
[Figure: CGH profiles; panels: $P(m \mid Y)$: 3 segments, $\mathrm{ICL}(K)$: 4 segments]

A CGH example
ICL favours segmentations with small entropy.
[Figure panels: $P(m \mid Y)$: 3 segments; $\mathrm{ICL}(K)$: 4 segments; segment probabilities if $K = 3$; segment probabilities if $K = 4$]

Conclusion
Exact computation in $O(K n^2)$ of:
◮ the posterior probability of a segment
◮ the entropy of the segmentation space
Model selection:
◮ exact computation of $P(m \mid Y)$
◮ exact computation of $P(K \mid Y)$
◮ exact computation of $\mathrm{ICL}(K)$ (using the entropy)

References
Zhang, N. R. and Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63, 22–32. PMID: 17447926.
Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719–725.
Fearnhead, P. (2005). Exact Bayesian curve fitting and signal segmentation. IEEE Transactions on Signal Processing, 53, 2160–2166.
Guédon, Y. (2008). Exploring the segmentation space for the assessment of multiple change-point models. Tech. Rep. 6619, INRIA.
