CHiCAGO: Statistical methodology for signal detection in Capture Hi-C data


  1. CHiCAGO: Statistical methodology for signal detection in Capture Hi-C data. Jonathan Cairns (jonathan.cairns@babraham.ac.uk, @jonathancairns), Fraser/Spivakov labs, Babraham Institute, 4th October 2016.

  2. Table of Contents: (1) Introduction, (2) The CHiCAGO model, (3) Results.
     J. Cairns (Babraham Institute) regulatorygenomicsgroup.org/chicago 2 / 20

  4. Motivation

  6. CHi-C: improved resolution at promoters compared with Hi-C (Lieberman-Aiden et al. 2009), with an approximately 12-fold increase in read coverage (Schönfelder et al. 2015; Mifsud et al. 2015; Sahlén et al. 2015).

  9. The data: align reads and filter out artefacts with HiCUP (Wingett et al. 2016), then obtain counts $X_{ij}$ for each other end $i$ and bait $j$.
     [Figure: example count matrix $X_{ij}$ over 823,000 other ends ($i$) and 22,000 baits ($j$); the entries shown are small integers, many of them zero.]
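
Turning filtered read pairs into the counts $X_{ij}$ is essentially a group-and-count step. The sketch below assumes each read end has already been assigned to a restriction-fragment ID and that the set of baited fragments is known; the function name, input format, and the bait-to-bait convention are hypothetical, not part of HiCUP or CHiCAGO.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def count_interactions(read_pairs: Iterable[Tuple[int, int]], baits: Set[int]) -> Counter:
    """Aggregate read pairs (fragment ID, fragment ID) into counts X[(i, j)].

    j is the baited end and i the other end. Pairs with no baited end are
    dropped. If both ends are baits, the second end is treated as the bait
    (an arbitrary, hypothetical convention for this sketch).
    """
    counts: Counter = Counter()
    for a, b in read_pairs:
        if b in baits:
            j, i = b, a
        elif a in baits:
            j, i = a, b
        else:
            continue  # neither end was captured, so the pair is not counted
        counts[(i, j)] += 1  # sparse: only observed (i, j) pairs are stored
    return counts
```

A sparse mapping suits this matrix: 823,000 other ends by 22,000 baits is about $1.8 \times 10^{10}$ cells, most of which would be zero in practice.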

  10. [Figure: read counts (N) against distance from viewpoint (−600 kb to +600 kb) for two baits. First panel: MIR625 (MIR625-201, 224546), labelled "no interaction". Second panel: PPP1CB (PPP1CB-004, PPP1CB-006, PPP1CB-005, PPP1CB-003, PPP1CB-001, PPP1CB-009, ...; 340147), labelled "interaction".]

  12. CHiCAGO: Capture Hi-C Analysis of Genomic Organization.

  17. Model: background comes from two sources.

      |                      | Brownian          | Technical            |
      |----------------------|-------------------|----------------------|
      | Source               | Random collisions | Sequencing artefacts |
      | Depends on distance? | Yes (decreasing)  | No                   |
      | Dominates            | Close to bait     | Far from bait        |

      Under $H_0$ (no interaction), counts are the sum of the two components: $X_{ij} = B_{ij} + T_{ij}$.
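
Since the null count is a sum of two background components, a p-value for an observed count needs the distribution of that sum. A minimal sketch, assuming (beyond what this slide states) that $B_{ij}$ and $T_{ij}$ are independent, that $B_{ij}$ is negative binomial parameterised by mean and size, and that $T_{ij}$ is Poisson; the function names and parameter values are hypothetical placeholders, not CHiCAGO's API.

```python
import math


def nb_pmf(k: int, mu: float, size: float) -> float:
    """Negative binomial pmf with mean mu and size r, so Var = mu + mu**2 / r."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             - size * math.log1p(mu / size)
             + k * (math.log(mu) - math.log(size + mu)))
    return math.exp(log_p)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson pmf with mean lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def null_pvalue(x_obs: int, mu_b: float, size_b: float, lam_t: float) -> float:
    """P(X >= x_obs) under H0, where X = B + T with B and T independent.

    ASSUMPTIONS: B ~ NB(mean mu_b, size size_b), T ~ Poisson(lam_t); mu_b, lam_t > 0.
    """
    if x_obs <= 0:
        return 1.0
    nb = [nb_pmf(k, mu_b, size_b) for k in range(x_obs)]
    po = [poisson_pmf(k, lam_t) for k in range(x_obs)]
    # P(X = k) is the convolution: sum over b of P(B = b) * P(T = k - b)
    p_below = sum(nb[b] * po[k - b] for k in range(x_obs) for b in range(k + 1))
    return max(0.0, 1.0 - p_below)  # clamp tiny negative rounding error
```

Taking 1 minus the lower sum loses precision for very small p-values; summing the upper tail directly or working on the log scale would avoid that.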

  21. Brownian background estimation: $X_{ij} = B_{ij} + T_{ij}$, where $B_{ij} \sim \mathrm{NB}$ with $E(B_{ij}) = f(d_{ij}) \times (\text{bait bias})_j \times (\text{other end bias})_i$.
      [Figure: expected count (0 to 50) against distance from bait (−500 kb to +500 kb), in five panels labelled 250, 1177, 5348, 5373 and 5382.]
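
The mean of the Brownian component is a product of a distance term and two multiplicative biases. A minimal sketch of that formula, together with the variance of a negative binomial parameterised by mean and size; the function names, the use of absolute distance, and the example `f` in the checks are assumptions for illustration.

```python
from typing import Callable


def expected_brownian(d_ij: float, bait_bias_j: float, other_end_bias_i: float,
                      f: Callable[[float], float]) -> float:
    """E(B_ij) = f(d_ij) * (bait bias)_j * (other end bias)_i.

    ASSUMPTION: f depends only on the absolute bait-to-other-end distance.
    """
    return f(abs(d_ij)) * bait_bias_j * other_end_bias_i


def nb_variance(mu: float, size: float) -> float:
    """Variance of a negative binomial with mean mu and size r: mu + mu**2 / r.

    The extra mu**2 / r term is the overdispersion relative to a Poisson count.
    """
    return mu + mu * mu / size
```

Keeping the biases multiplicative means a bait with twice the bias doubles the expected count at every distance, so the shape of $f$ is shared across baits.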

  25. Brownian background estimation, distance function $f(d)$ in $E(B_{ij}) = f(d_{ij}) \times (\text{bait bias})_j \times (\text{other end bias})_i$: $f(d)$ is estimated close to the bait ($< 1.5$ Mb) in 20 kb bins; the bin-wise estimates $f(d_b)$ are the geometric mean across baits; interpolation uses a cubic fit on the log-log scale.
      [Figure: $\log f(d)$ (−0.5 to 2.5) against $\log(\text{distance})$ (10 to 14).]
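
The three estimation steps for $f(d)$ can be sketched as follows. Assumptions not stated on the slide: input arrives as (bait, signed distance, count) triples; a bait's value in a bin is its mean count over the other ends in that bin; baits with a zero mean in a bin are skipped so the geometric mean is defined; the cubic is an ordinary least-squares fit in centred $\log d$. Function names are hypothetical, this is not CHiCAGO's own implementation, and the fit needs at least four bins.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BIN_SIZE = 20_000       # 20 kb bins, as on the slide
MAX_DIST = 1_500_000    # only bait-to-other-end distances < 1.5 Mb are used


def binwise_f(obs: Iterable[Tuple[int, float, int]]) -> Dict[float, float]:
    """Bin-wise estimates {bin midpoint d_b: f(d_b)} from (bait, distance, count) triples."""
    total = defaultdict(lambda: defaultdict(float))  # bin -> bait -> summed count
    n = defaultdict(lambda: defaultdict(int))        # bin -> bait -> no. of other ends
    for bait, d, x in obs:
        d = abs(d)
        if d >= MAX_DIST:
            continue
        b = int(d // BIN_SIZE)
        total[b][bait] += x
        n[b][bait] += 1
    f = {}
    for b, per_bait in total.items():
        # ASSUMPTION: per-bait mean count in the bin; zero means are skipped
        means = [per_bait[j] / n[b][j] for j in per_bait if per_bait[j] > 0]
        if means:  # geometric mean across baits
            f[(b + 0.5) * BIN_SIZE] = math.exp(sum(math.log(m) for m in means) / len(means))
    return f


def cubic_fit_loglog(f: Dict[float, float]) -> Tuple[float, List[float]]:
    """Least-squares cubic for log f(d) in t = log(d) - x0; returns (x0, [c0, c1, c2, c3])."""
    pts = [(math.log(d), math.log(v)) for d, v in f.items()]
    x0 = sum(x for x, _ in pts) / len(pts)  # centring keeps the normal equations well conditioned
    ts = [x - x0 for x, _ in pts]
    ys = [y for _, y in pts]
    A = [[sum(t ** (p + q) for t in ts) for q in range(4)] for p in range(4)]
    r = [sum(y * t ** p for t, y in zip(ts, ys)) for p in range(4)]
    for col in range(4):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 4), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for k in range(col + 1, 4):
            m = A[k][col] / A[col][col]
            for q in range(col, 4):
                A[k][q] -= m * A[col][q]
            r[k] -= m * r[col]
    c = [0.0] * 4
    for p in range(3, -1, -1):  # back substitution
        c[p] = (r[p] - sum(A[p][q] * c[q] for q in range(p + 1, 4))) / A[p][p]
    return x0, c


def f_interp(x0: float, c: List[float], d: float) -> float:
    """Evaluate the fitted distance function at a distance d > 0."""
    t = math.log(d) - x0
    return math.exp(sum(ck * t ** k for k, ck in enumerate(c)))
```

Centring $\log d$ before fitting is a design choice for numerical stability: raw powers of values between about 9 and 14 make the normal equations poorly conditioned.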

  26. Brownian background estimation (continued). Bait-specific bias:
