DNA methylation
ACGCGAAACGTTCTATCG
Peter Hickey (@PeteHaitch) Simulating DNA methylation data 10 July 2014 1 / 14
[Figure: Methylation β-values (y-axis, 0.2 to 0.8) against position in bp (1 kb scale bar) for normals and cancers, with CpG islands (CGIs) annotated.]
Source: Hansen, K. D. et al. Nat Genet 43, 768–775 (2011)
Do methods designed to find differentially methylated regions (DMRs) actually work? Which method reigns supreme?
No “gold standard” data ⇒ simulate.
No simulation software ⇒ I’m writing methsim.
Simulating β-values directly:
Simulate independent $\beta_i \overset{d}{=} \mathrm{Beta}(\mu_i, \nu_i)$ + induce correlation via variogram model.
Re-sample real data in a way that tries to preserve correlation structure.
β-values are summarised measurements.
Correlations of β-values are spurious.
Simulating individual reads instead:
Higher resolution.
Contains the mechanistic dependence structure.
Difficult given current data.
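As a minimal sketch of the first β-value approach above, the snippet below draws independent β-values from Beta distributions. It assumes that µ_i is the mean and ν_i a precision parameter, so the usual shape parameters are a = µν and b = (1 − µ)ν. The slides do not state the parameterization, and the function name is hypothetical. The variogram-based correlation step is not shown.

```python
import numpy as np

def simulate_independent_betas(mu, nu, rng):
    """Draw one beta-value per CpG, independently across CpGs.

    Assumption (not from the slides): mu is the mean and nu the precision,
    giving shape parameters a = mu * nu and b = (1 - mu) * nu.
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    a = mu * nu
    b = (1.0 - mu) * nu
    return rng.beta(a, b)

rng = np.random.default_rng(2014)
mu = np.array([0.1, 0.1, 0.8, 0.8, 0.8])  # illustrative per-CpG means
nu = np.full(5, 20.0)                     # illustrative precision
betas = simulate_independent_betas(mu, nu, rng)
```

Because each draw is independent, neighbouring CpGs in `betas` are uncorrelated by construction, which is why this approach needs a separate step to induce correlation.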
methsim: An R package for simulating whole genome DNA methylation data.
Parameter distributions estimated from input data.
Parts written in C++ (via Rcpp).
Results today from a preliminary version of methsim.
Outline of methsim
1 Segment genome into “regions of similarity” (MethylSeekR¹).
2 Simulate “meta-haplotypes” within each region using a Markov model.
3 Simulate sequencing of reads.
¹ Burger, L., Gaidatzis, D., Schübeler, D. & Stadler, M. B. Nucleic Acids Res (2013). doi:10.1093/nar/gkt599
(2) For each region:
Simulate each meta-haplotype using a Markov model.
Transition matrices depend on the distance between CGs and the type of region.
Assign haplotype i in region r frequency $q_{i,r}$.
[Diagram: haplotype frequencies $q_{1,r}, \dots, q_{i,r}, \dots, q_{H,r}$ for region r and $q_{1,r+1}, \dots, q_{i,r+1}, \dots, q_{H,r+1}$ for region r+1.]
(3) Simulate read positions.
Simulate reads for region r by sampling from the ith haplotype with probability $q_{i,r}$.
Simulate sequencing error.
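Steps (2) and (3) above can be sketched for a single region as follows. This is an illustration under stated assumptions, not methsim's implementation or API: the function names, the exponential decay of dependence with distance, the Dirichlet draw for the haplotype frequencies, and the fixed number of CpGs per read are all hypothetical choices made to keep the example short.

```python
import numpy as np

def simulate_haplotype(positions, p_meth, decay, rng):
    """Simulate a 0/1 methylation pattern along CpGs with a two-state Markov chain.

    Assumed transition rule: with probability exp(-d / decay) the state is
    copied from the previous CpG (d = distance in bp); otherwise it is drawn
    afresh with probability p_meth. Region type would enter via p_meth and decay.
    """
    n = len(positions)
    states = np.empty(n, dtype=int)
    states[0] = rng.random() < p_meth
    for j in range(1, n):
        d = positions[j] - positions[j - 1]
        if rng.random() < np.exp(-d / decay):
            states[j] = states[j - 1]
        else:
            states[j] = rng.random() < p_meth
    return states

def simulate_reads(haplotypes, q, n_reads, cpgs_per_read, error_rate, rng):
    """Sample reads from haplotype i with probability q[i], then add errors."""
    n_hap, n_cpg = haplotypes.shape
    reads = []
    for _ in range(n_reads):
        i = rng.choice(n_hap, p=q)                           # pick a haplotype
        start = rng.integers(0, n_cpg - cpgs_per_read + 1)   # first CpG covered
        calls = haplotypes[i, start:start + cpgs_per_read].copy()
        flips = rng.random(cpgs_per_read) < error_rate       # sequencing error
        calls[flips] = 1 - calls[flips]
        reads.append((start, calls))
    return reads

rng = np.random.default_rng(7)
positions = np.cumsum(rng.integers(2, 200, size=50))  # illustrative CpG positions
haplotypes = np.array(
    [simulate_haplotype(positions, 0.8, 100.0, rng) for _ in range(4)]
)
q = rng.dirichlet(np.ones(4))  # haplotype frequencies for this region
reads = simulate_reads(haplotypes, q, n_reads=200, cpgs_per_read=3,
                       error_rate=0.01, rng=rng)
```

Each read carries the methylation states of several CpGs from one haplotype, so the dependence between neighbouring CpGs comes from the Markov model rather than being imposed on summarised β-values.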
[Figure: Distribution of β values. Density of β values (0 to 1) in CGI and non-CGI panels, comparing real data (ADS) with methsim.]
[Figure: Median log odds ratio in CGI and non-CGI panels (x-axis 50 to 200; axis label not recovered), comparing real data (ADS) with methsim.]
[Figure: Median log odds ratio with 80% percentile band for the "all" panels (same x-axis), comparing ADS with MySim.]
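The log odds ratio in the figures above measures how strongly the methylation states of two CpGs on the same read go together. The slides do not show how it was computed, so the following is only a sketch under assumptions: reads are represented as dictionaries mapping a CpG index to 0 or 1, counts are tabulated into a 2×2 table, and 0.5 is added to each cell to avoid division by zero. The function names are hypothetical.

```python
import math

def pair_counts(reads, j, k):
    """Tabulate reads covering both CpG j and CpG k into a 2x2 table.

    Returns counts in the order (MM, MU, UM, UU), where M = methylated (1)
    and U = unmethylated (0).
    """
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for read in reads:
        if j in read and k in read:
            counts[(read[j], read[k])] += 1
    return counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]

def log_odds_ratio(n_mm, n_mu, n_um, n_uu, correction=0.5):
    """Log odds ratio of the 2x2 table, with a continuity correction (assumed)."""
    return math.log(
        ((n_mm + correction) * (n_uu + correction))
        / ((n_mu + correction) * (n_um + correction))
    )

# Toy reads: the last one does not cover CpG 0, so it is ignored for the pair (0, 1).
reads = [{0: 1, 1: 1}, {0: 1, 1: 1}, {0: 0, 1: 0}, {0: 1, 1: 0}, {1: 1, 2: 0}]
lor = log_odds_ratio(*pair_counts(reads, 0, 1))
```

A value of zero indicates no association between the two CpGs, and positive values indicate that they tend to share the same state on a read.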
[Figure: Pearson correlation in CGI and non-CGI panels, comparing real data (ADS) with methsim.]
methsim models the mechanistic dependence structure of DNA methylation data.
I will use methsim to simulate data with inserted DMRs and compare DMR-detection methods.
methsim is open source and developed on GitHub.
For advice and supervision: Terry Speed (WEHI) and Peter Hall (University of Melbourne).
For data: Ryan Lister (UWA).
For R and C++ help: Bioconductor and Rcpp mailing lists, especially Martin Morgan.
For funding: Australian Postgraduate Award, Victorian Life Sciences Computing Initiative.
For sanity: Friends and family.