CHiCAGO: Statistical methodology for signal detection in Capture Hi-C data
Jonathan Cairns
jonathan.cairns@babraham.ac.uk
@jonathancairns
Fraser/Spivakov labs, Babraham Institute
4th October 2016

Table of Contents
1 Introduction
2 The CHiCAGO model
3 Results
J. Cairns (Babraham Institute) regulatorygenomicsgroup.org/chicago

Motivation

CHi-C: improved resolution at promoters, over Hi-C
Lieberman-Aiden et al (2009)
Approx. 12-fold increase in read coverage
Schönfelder et al (2015), Mifsud et al (2015), Sahlén et al (2015)

The data
Align reads & filter out artefacts with HiCUP, Wingett et al (2016)
Obtain counts $X_{ij}$:
[Matrix of example counts: 22,000 baits (j) as rows, 823,000 other ends (i) as columns]
  1 3 7 5 4 0 0 0 0
  1 0 2 0 4 6 5 4 0
  0 0 1 2 0 4 6 9 10 ...
  0 0 0 1 1 2 5 3 4
  0 0 0 0 0 1 1 5 7 ...
  ...

[Figure: read count N against distance from viewpoint (−6e+05 to 6e+05) for two example baits. Top panel: MIR625−201 (224546), MIR625, annotated "no interaction". Bottom panel: PPP1CB−004, PPP1CB−006, PPP1CB−005, PPP1CB−003, PPP1CB−001, PPP1CB−009, ... (340147), PPP1CB, annotated "interaction".]

CHiCAGO
CHiCAGO: Capture Hi-C Analysis of Genomic Organization.

Model
Background comes from two sources:

                        Brownian            Technical
  Source                Random collisions   Sequencing artefacts
  Depends on distance?  Yes (decreasing)    No
  Dominates             Close to bait       Far from bait

Under $H_0$ (no interaction), counts are the sum of the two components: $X_{ij} = B_{ij} + T_{ij}$
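The table above can be illustrated with a small numeric sketch. It is not the CHiCAGO implementation: the inverse-distance form of the Brownian mean, the constant technical mean, and the names `expected_background` and `dominant_component` are all assumptions made up for illustration. It only shows how a decreasing Brownian term and a flat technical term add up under $H_0$, and which one dominates at a given distance.

```python
def expected_background(distance_bp, brownian_scale=1.0e5, technical_mean=0.05):
    """Expected Brownian, technical and total counts under H0 at one distance.

    ASSUMPTIONS (not from the talk): the Brownian mean is
    brownian_scale / distance and the technical mean is a constant.
    Both default values are invented for illustration.
    """
    distance = abs(distance_bp)
    if distance == 0:
        raise ValueError("distance from bait must be non-zero")
    brownian = brownian_scale / distance  # decreases with distance
    technical = technical_mean            # does not depend on distance
    return brownian, technical, brownian + technical


def dominant_component(distance_bp):
    """Name the background component with the larger expected count."""
    brownian, technical, _ = expected_background(distance_bp)
    return "brownian" if brownian > technical else "technical"


for d in (10_000, 1_000_000, 10_000_000):
    b, t, total = expected_background(d)
    print(d, round(b, 3), t, round(total, 3), dominant_component(d))
```

With these made-up parameters the Brownian term is larger close to the bait and the technical term is larger far from it, matching the "Dominates" row.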

Brownian background estimation
$X_{ij} = B_{ij} + T_{ij}$
$B_{ij} \sim \mathrm{NB}$, with $E(B_{ij}) = f(d_{ij}) \times (\text{bait bias})_j \times (\text{other end bias})_i$
[Figure: expected count against distance from bait (−500000 to 500000), one panel per example bait: 250, 1177, 5348, 5373, 5382.]
Distance function $f(d)$:
  estimated close to bait (< 1.5 Mb) in 20 kb bins.
  bin-wise estimates $f(d_b)$ from geometric mean across baits
  interpolation: cubic fit on log-log scale
[Figure: distance function, log(f(d)) against log(distance).]
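The three steps for the distance function (20 kb bins within 1.5 Mb, geometric mean across baits, cubic fit on the log-log scale) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CHiCAGO code: the function names, the toy input, the use of bin midpoints as the distance for each bin, and the requirement that every per-bin value is strictly positive are all assumptions. Real data would need a rule for zero counts, which the slides do not describe.

```python
import math

BIN_WIDTH = 20_000        # 20 kb bins, as on the slide
MAX_DISTANCE = 1_500_000  # estimate only within 1.5 Mb of the bait


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_distance_function(counts_by_bait):
    """Return f(d) fitted from per-bait, per-bin positive mean counts.

    counts_by_bait[j][b] is the (hypothetical) mean count for bait j in
    distance bin b. Bin midpoints stand in for the bin distance d_b.
    """
    n_bins = MAX_DISTANCE // BIN_WIDTH
    midpoints = [(b + 0.5) * BIN_WIDTH for b in range(n_bins)]
    # Bin-wise estimate: geometric mean across baits.
    f_bins = [geometric_mean([bait[b] for bait in counts_by_bait])
              for b in range(n_bins)]
    xs = [math.log(d) for d in midpoints]
    ys = [math.log(f) for f in f_bins]
    centre = sum(xs) / len(xs)  # centring keeps the normal equations stable
    us = [x - centre for x in xs]
    # Least-squares cubic via the 4x4 normal equations.
    a = [[sum(u ** (i + j) for u in us) for j in range(4)] for i in range(4)]
    rhs = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(4)]
    coeffs = solve_linear(a, rhs)

    def f(distance_bp):
        u = math.log(abs(distance_bp)) - centre
        return math.exp(sum(c * u ** k for k, c in enumerate(coeffs)))

    return f


def toy_counts(biases):
    """Made-up data: bait bias times an inverse-distance decay."""
    n_bins = MAX_DISTANCE // BIN_WIDTH
    return [[bias * 1.0e5 / ((b + 0.5) * BIN_WIDTH) for b in range(n_bins)]
            for bias in biases]


f_hat = fit_distance_function(toy_counts([0.5, 1.0, 2.0]))
print(f_hat(50_000), f_hat(500_000))
```

In the toy input the three bait biases have geometric mean 1, so the geometric mean across baits removes them and the fitted curve follows the underlying decay. That is the reason a geometric rather than arithmetic mean suits a multiplicative bias model.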

Bait-specific bias:
