Simulating DNA methylation data
Peter Hickey (@PeteHaitch)

SLIDE 1

DNA methylation

ACGCGAAACGTTCTATCG

Peter Hickey (@PeteHaitch) Simulating DNA methylation data 10 July 2014 1 / 14

SLIDE 3

DNA methylation

ACGCGAAACGTTCTATCG

[Figure: three methyl groups (CH3) marked on the sequence]

SLIDE 9

Measuring DNA methylation

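Each β-value is the proportion of reads covering a CpG that are methylated at that CpG. For four consecutive CpGs:

$\beta_i = 3/3$, $\beta_{i+1} = 4/4$, $\beta_{i+2} = 2/4$, $\beta_{i+3} = 0/4$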
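
As a rough sketch of how such β-values arise, assume each read reports a methylated (1) or unmethylated (0) call at every CpG it covers. The read matrix below is made up so that it reproduces the four values on the slide, and `beta_counts` is a hypothetical helper, not part of methsim.

```python
# Illustrative reads: one row per read, one column per CpG.
# 1 = methylated, 0 = unmethylated, None = CpG not covered by that read.
reads = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [None, 1, 0, 0],
]

def beta_counts(reads):
    """Return one (methylated, covered) pair of counts per CpG."""
    n_cpgs = len(reads[0])
    counts = []
    for j in range(n_cpgs):
        # Keep only the reads that actually cover CpG j.
        calls = [read[j] for read in reads if read[j] is not None]
        counts.append((sum(calls), len(calls)))
    return counts

counts = beta_counts(reads)
for j, (m, n) in enumerate(counts):
    print(f"CpG i+{j}: beta = {m}/{n}")
```

Keeping the counts rather than only the ratio preserves the coverage, which the ratio alone discards.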

SLIDE 11

Differentially methylated regions (DMRs) [1]

[Figure: methylation β-values (y-axis, ticks at 0.2, 0.5, 0.8) against position (bp) for normals and cancers, with CpG islands (CGIs) annotated; scale bar 1 kb]

[1] Hansen, K. D. et al. Nat Genet 43, 768–775 (2011)

SLIDE 14

Why I care about simulating DNA methylation data

Methods development and validation

Do methods designed to find DMRs actually work?
Which method reigns supreme?

How to decide?

No “gold standard” data ⇒ simulate.
No simulation software ⇒ I’m writing methsim.

SLIDE 22

Simulation approaches

Simulate β-values

  Simulate independent $\beta_i \overset{d}{=} \mathrm{Beta}(\mu_i, \nu_i)$, then induce correlation via a variogram model.
  Re-sample real data in a way that tries to preserve correlation structure.
  β-values are summarised measurements.
  Correlations of β-values are spurious.

Simulate individual methylation events

  Higher resolution.
  Contains the mechanistic dependence structure.
  Difficult given current data.
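
The first approach above (independent Beta draws) can be sketched as follows. This assumes that Beta(µ, ν) denotes a mean/precision parameterisation, which the slide does not state; the function names and parameter values are made up for illustration. The variogram step that would induce correlation is not shown.

```python
import random

def beta_shape_params(mu, nu):
    """Convert a mean mu in (0, 1) and a precision nu > 0 to Beta shape parameters.

    ASSUMPTION: Beta(mu, nu) is read as a mean/precision parameterisation,
    i.e. alpha = mu * nu and beta = (1 - mu) * nu.
    """
    return mu * nu, (1.0 - mu) * nu

def simulate_independent_betas(mus, nus, rng):
    """Draw one independent beta-value per CpG."""
    draws = []
    for mu, nu in zip(mus, nus):
        a, b = beta_shape_params(mu, nu)
        draws.append(rng.betavariate(a, b))
    return draws

rng = random.Random(2014)
mus = [0.9, 0.9, 0.5, 0.1]      # made-up per-CpG means
nus = [20.0, 20.0, 5.0, 20.0]   # made-up per-CpG precisions
betas = simulate_independent_betas(mus, nus, rng)
print([round(b, 3) for b in betas])
```

Because each draw ignores its neighbours, any correlation between nearby CpGs has to be added afterwards, which is the limitation the slide points at.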

SLIDE 24

My solution

methsim: An R package for simulating whole genome DNA methylation data.

Parameter distributions estimated from input data.
Parts written in C++ (via Rcpp).
Results today from a preliminary version of methsim.

Outline of methsim

1 Segment genome into “regions of similarity” (MethylSeekR [a]).
2 Simulate “meta-haplotypes” within each region using a Markov model.
3 Simulate sequencing of reads.

[a] Burger, L., Gaidatzis, D., Schübeler, D. & Stadler, M. B. Nucleic Acids Res (2013). doi:10.1093/nar/gkt599

SLIDE 25

Simulating meta-haplotypes

(2) For each region:
  Simulate each meta-haplotype using a Markov model.
  Transition matrices depend on distance between CGs and the type of region.
  Assign haplotype i in region r frequency $q_{i,r}$.

[Figure: haplotype frequencies $q_{1,r}, \dots, q_{i,r}, \dots, q_{H,r}$ in region r and $q_{1,r+1}, \dots, q_{i,r+1}, \dots, q_{H,r+1}$ in region r+1]

(3) Simulate read positions.
  Simulate reads for region r by sampling from the ith haplotype with probability $q_{i,r}$.
  Simulate sequencing error.
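
Steps (2) and (3) can be sketched for a single region as below. This is an illustration under stated assumptions, not methsim's implementation: the transition rule (dependence on the previous CpG decaying exponentially with distance), the per-region parameters, the uniform read placement and all function names are made up here. The talk only says that the transition matrices depend on distance and region type.

```python
import math
import random

# ASSUMPTION: made-up marginal methylation level and decay length per region type.
REGION_PARAMS = {
    "CGI": {"p_meth": 0.1, "decay_bp": 400.0},
    "non-CGI": {"p_meth": 0.8, "decay_bp": 150.0},
}

def p_next_methylated(prev_state, distance_bp, p_meth, decay_bp):
    """P(next CpG is methylated | previous state), for a two-state Markov chain.

    ASSUMPTION: the dependence on the previous CpG decays as exp(-distance / decay).
    """
    rho = math.exp(-distance_bp / decay_bp)
    return p_meth * (1.0 - rho) + rho * prev_state

def simulate_haplotype(positions, region_type, rng):
    """Simulate one meta-haplotype: a 0/1 methylation state at each CpG position."""
    params = REGION_PARAMS[region_type]
    hap = [1 if rng.random() < params["p_meth"] else 0]
    for prev_pos, pos in zip(positions, positions[1:]):
        p1 = p_next_methylated(hap[-1], pos - prev_pos,
                               params["p_meth"], params["decay_bp"])
        hap.append(1 if rng.random() < p1 else 0)
    return hap

def simulate_reads(haplotypes, freqs, positions, n_reads, read_len, error_rate, rng):
    """Sample reads: pick haplotype i with probability q_i, then add sequencing error."""
    reads = []
    for _ in range(n_reads):
        hap = rng.choices(haplotypes, weights=freqs, k=1)[0]
        # ASSUMPTION: read start positions are uniform over the region.
        start = rng.randint(positions[0] - read_len + 1, positions[-1])
        calls = {}
        for pos, state in zip(positions, hap):
            if start <= pos < start + read_len:
                if rng.random() < error_rate:
                    state = 1 - state  # sequencing error flips the call
                calls[pos] = state
        if calls:  # drop reads that cover no CpG
            reads.append(calls)
    return reads

rng = random.Random(1)
positions = [100, 105, 130, 210, 400]   # made-up CpG positions in one region
H = 4                                    # number of meta-haplotypes
haplotypes = [simulate_haplotype(positions, "non-CGI", rng) for _ in range(H)]
weights = [rng.random() for _ in range(H)]
freqs = [w / sum(weights) for w in weights]   # q_{1,r}, ..., q_{H,r}
reads = simulate_reads(haplotypes, freqs, positions,
                       n_reads=20, read_len=100, error_rate=0.01, rng=rng)
print(haplotypes)
print(len(reads), "reads covering at least one CpG")
```

Each read is a dictionary from CpG position to call, so calls on the same read stay linked. That linkage is what allows within-haplotype dependence to be measured later.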

SLIDE 27

Distribution of β values

[Figure: density of β values in CGI and non-CGI panels, comparing real (ADS) data with methsim]

SLIDE 28

Within haplotype co-methylation at neighbouring CpGs

[Figure: median log odds ratio (y-axis) against distance between CpGs (bp, 50 to 200), in CGI and non-CGI panels, comparing real (ADS) data with methsim]
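
For reference, a log odds ratio for one pair of CpGs can be computed from the reads that cover both, as in this sketch. The 0.5 pseudocount is an assumption added here to avoid division by zero; the talk does not say how empty cells were handled, and `log_odds_ratio` is a hypothetical helper.

```python
import math

def log_odds_ratio(pairs, pseudocount=0.5):
    """Log odds ratio of methylation at two CpGs observed on the same reads.

    pairs: iterable of (state_at_cpg1, state_at_cpg2), each state 0 or 1.
    ASSUMPTION: a pseudocount is added to every cell of the 2x2 table.
    """
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for pair in pairs:
        table[tuple(pair)] += 1
    n00 = table[(0, 0)] + pseudocount
    n01 = table[(0, 1)] + pseudocount
    n10 = table[(1, 0)] + pseudocount
    n11 = table[(1, 1)] + pseudocount
    return math.log((n11 * n00) / (n10 * n01))

# Made-up read pairs: mostly concordant calls versus a balanced table.
concordant = [(1, 1)] * 8 + [(0, 0)] * 6 + [(1, 0)] + [(0, 1)]
balanced = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] * 4 + [(0, 1)] * 4
print(round(log_odds_ratio(concordant), 2))
print(round(log_odds_ratio(balanced), 2))
```

A positive value means the two CpGs tend to share the same state on a read; a value near zero means the calls look independent.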

SLIDE 30

Within haplotype co-methylation at neighbouring CpGs

[Figure: median log odds ratio with 80% percentile band (y-axis) against distance between CpGs (bp, 50 to 200), panels labelled "all", comparing ADS with MySim]

SLIDE 31

Correlations of pairs of β values

[Figure: Pearson correlation (y-axis) against distance between CpGs (bp, 250 to 1000), in CGI and non-CGI panels, comparing real (ADS) data with methsim]
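
One way to compute such a curve is sketched below: group CpG pairs by binned distance and take the Pearson correlation of their β-values within each bin. How pairs were actually formed and binned for the plot is not stated in the talk, so the binning, the data and the function names are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def correlation_by_distance(positions, betas, bin_width):
    """Correlate beta-values of CpG pairs, grouped by binned distance (in bp)."""
    bins = {}
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            key = (positions[b] - positions[a]) // bin_width
            xs, ys = bins.setdefault(key, ([], []))
            xs.append(betas[a])
            ys.append(betas[b])
    # Report each bin by its lower edge; bins with fewer than 3 pairs are skipped.
    return {key * bin_width: pearson(xs, ys)
            for key, (xs, ys) in sorted(bins.items()) if len(xs) >= 3}

positions = [10, 60, 110, 400, 450, 900]       # made-up CpG positions (sorted)
betas = [0.9, 0.8, 0.85, 0.2, 0.1, 0.5]        # made-up beta-values
for lower_edge, r in correlation_by_distance(positions, betas, 250).items():
    print(lower_edge, None if r is None else round(r, 2))
```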

SLIDE 34

Summary

methsim models the mechanistic dependence structure of DNA methylation data.
I will use methsim to simulate data with inserted DMRs and compare DMR-detection methods.
methsim is open source and developed on GitHub.

SLIDE 35

Thanks

For advice and supervision: Terry Speed (WEHI) and Peter Hall (University of Melbourne).
For data: Ryan Lister (UWA).
For R and C++ help: Bioconductor and Rcpp mailing lists, especially Martin Morgan.
For funding: Australian Postgraduate Award, Victorian Life Sciences Computing Initiative.
For sanity: Friends and family.

SLIDE 36

To find out more: www.peterhickey.org/ASC2014
GitHub/Twitter: @PeteHaitch
