
Simulating DNA methylation data

Peter Hickey (@PeteHaitch), 10 July 2014


  4. DNA methylation: the addition of a methyl group (CH₃) to DNA. In mammals it occurs mainly at the cytosine of CpG dinucleotides. The slide marks three CH₃ groups on the example sequence ACGCGAAACGTTCTATCG.
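
The CpG context can be checked directly on the slide's example sequence. A minimal Python sketch (the function name `cpg_positions` is hypothetical, not part of methsim) that lists the 0-based position of each C immediately followed by a G:

```python
def cpg_positions(seq):
    """Return the 0-based position of the C in every CpG dinucleotide of seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Example sequence from the slide
print(cpg_positions("ACGCGAAACGTTCTATCG"))  # [1, 3, 8, 16]
```

It finds four CpGs, consistent with the four β-values $\beta_i, \dots, \beta_{i+3}$ on the measuring slide.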

  9. Measuring DNA methylation: the β-value at a CpG is the proportion of sequencing reads covering that CpG on which it is methylated. For four neighbouring CpGs: $\beta_i = 3/3$, $\beta_{i+1} = 4/4$, $\beta_{i+2} = 2/4$, $\beta_{i+3} = 0/4$.
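
The β-value calculation can be sketched in Python. The read-level calls below are hypothetical, constructed only so the counts reproduce the slide's $\beta_i = 3/3$, $\beta_{i+1} = 4/4$, $\beta_{i+2} = 2/4$ and $\beta_{i+3} = 0/4$; the function `beta_values` is illustrative, not a methsim API.

```python
# Hypothetical read-level calls: each read maps a CpG index (0 = CpG i,
# 1 = CpG i+1, ...) to 1 (methylated) or 0 (unmethylated). CpGs that a read
# does not cover are simply absent from its dict.
reads = [
    {0: 1, 1: 1, 2: 1, 3: 0},
    {0: 1, 1: 1, 2: 0, 3: 0},
    {0: 1, 1: 1, 2: 1, 3: 0},
    {1: 1, 2: 0, 3: 0},  # this read does not cover CpG i
]


def beta_values(reads, n_cpgs):
    """For each CpG return (methylated reads, covering reads, beta); beta is None if uncovered."""
    out = []
    for j in range(n_cpgs):
        calls = [r[j] for r in reads if j in r]
        m, n = sum(calls), len(calls)
        out.append((m, n, m / n if n else None))
    return out


for j, (m, n, beta) in enumerate(beta_values(reads, 4)):
    print(f"beta_(i+{j}) = {m}/{n}")
```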

  11. Differentially methylated regions (DMRs). [Figure: methylation (β-values, 0 to 1) against position (bp; 1 kb scale bar) for normals and cancers, with CpG islands (CGIs) marked. Source: Hansen, K. D. et al. Nat Genet 43, 768–775 (2011).]
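
To make the DMR idea concrete, here is a deliberately naive Python sketch using NumPy (not the method used in the cited study, and not part of methsim): flag runs of consecutive CpGs where the mean β-values of two groups differ by more than a threshold in the same direction. The function name, the threshold of 0.3, the minimum of 3 CpGs and the toy data are hypothetical.

```python
import numpy as np


def naive_dmrs(beta_a, beta_b, threshold=0.3, min_cpgs=3):
    """Return (first, last) CpG indices of runs of at least min_cpgs consecutive
    CpGs whose group-mean beta difference exceeds threshold in the same direction.

    beta_a, beta_b: arrays of shape (n_samples, n_cpgs), CpGs in genomic order.
    """
    diff = np.nanmean(beta_a, axis=0) - np.nanmean(beta_b, axis=0)
    # +1 / -1 where group A is clearly higher / lower, 0 otherwise
    direction = np.where(np.abs(diff) > threshold, np.sign(diff), 0.0)
    regions, start = [], 0
    for j in range(1, len(direction) + 1):
        # close the current run at the end or when the direction changes
        if j == len(direction) or direction[j] != direction[start]:
            if direction[start] != 0 and j - start >= min_cpgs:
                regions.append((start, j - 1))
            start = j
    return regions


# Toy data: 2 "normal" and 2 "cancer" samples at 10 CpGs; CpGs 3 to 6 lose methylation
normal = np.array([[0.8] * 10, [0.9] * 10])
cancer = normal.copy()
cancer[:, 3:7] = 0.2
print(naive_dmrs(normal, cancer))  # [(3, 6)]
```

Real DMR finders also have to cope with noisy, unevenly covered β-values, which is why benchmarking them needs data where the true DMRs are known.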

  14. Why I care about simulating DNA methylation data: methods development and validation. Do methods designed to find DMRs actually work? Which method reigns supreme? How do we decide? There is no "gold standard" data in which the true DMRs are known ⇒ simulate. There is no simulation software ⇒ I'm writing methsim.

  22. Simulation approaches. Simulate β-values: either simulate independent $\beta_i \overset{d}{=} \mathrm{Beta}(\mu_i, \nu_i)$ and induce correlation via a variogram model, or re-sample real data in a way that tries to preserve the correlation structure. Drawbacks: β-values are summarised measurements, and correlations of β-values are spurious. Simulate individual methylation events: higher resolution and contains the mechanistic dependence structure, but difficult given current data.
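
The first β-value strategy can be sketched in Python, assuming NumPy and SciPy. The talk does not specify how the variogram induces correlation; this sketch swaps in a Gaussian copula with an exponential correlation function, and parameterises the Beta distribution by mean $\mu_i$ and precision $\nu_i$. Both choices, the parameter values, the CpG positions and the function name `simulate_beta_values` are assumptions.

```python
import numpy as np
from scipy.stats import beta as beta_dist, norm


def simulate_beta_values(positions, mu, nu, phi, n_samples, rng):
    """Simulate n_samples draws of beta-values at CpGs with Beta(mu_i, nu_i)
    marginals, correlated via a Gaussian copula.

    Assumed parameterisation: mean mu_i and precision nu_i, i.e. shape
    parameters a = mu_i * nu_i and b = (1 - mu_i) * nu_i. The copula
    correlation exp(-d / phi) corresponds to an exponential variogram
    gamma(d) = 1 - exp(-d / phi) with unit sill and no nugget.
    """
    positions = np.asarray(positions, dtype=float)
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    d = np.abs(positions[:, None] - positions[None, :])
    corr = np.exp(-d / phi)
    # a tiny diagonal jitter keeps the Cholesky factorisation numerically stable
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(len(positions)))
    z = rng.standard_normal((n_samples, len(positions))) @ chol.T  # rows ~ N(0, corr)
    u = norm.cdf(z)  # correlated Uniform(0, 1) values
    return beta_dist.ppf(u, mu * nu, (1 - mu) * nu)


rng = np.random.default_rng(1)
positions = [0, 10, 20, 500, 1000]  # hypothetical CpG positions (bp)
sims = simulate_beta_values(positions, mu=[0.8] * 5, nu=[10] * 5, phi=100,
                            n_samples=2000, rng=rng)
r = np.corrcoef(sims, rowvar=False)
print(round(r[0, 1], 2), round(r[0, 4], 2))
```

With these settings, CpGs 10 bp apart should come out strongly correlated and CpGs 1000 bp apart nearly uncorrelated. The output is still a matrix of summarised β-values, which is the limitation the slide raises.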

  24. My solution: methsim, an R package for simulating whole-genome DNA methylation data. Parameter distributions are estimated from input data, and parts are written in C++ (via Rcpp). Results today are from a preliminary version of methsim. Outline of methsim: (1) segment the genome into "regions of similarity" using MethylSeekR (Burger, L., Gaidatzis, D., Schübeler, D. & Stadler, M. B. Nucleic Acids Res (2013). doi:10.1093/nar/gkt599); (2) simulate "meta-haplotypes" within each region using a Markov model; (3) simulate sequencing of reads.

  26. Simulating meta-haplotypes. (2) For each region, simulate each meta-haplotype using a Markov model whose transition matrices depend on the distance between CpGs and the type of region. Assign haplotype $i$ in region $r$ the frequency $q_{i,r}$, so that $\sum_{i=1}^{H} q_{i,r} = 1$ over the $H$ haplotypes. [Figure: haplotypes $1, \dots, H$ spanning region $r$ and region $r+1$, labelled with frequencies $q_{1,r}, \dots, q_{H,r}$ and $q_{1,r+1}, \dots, q_{H,r+1}$.] (3) Simulate read positions, then simulate the reads for region $r$ by sampling from the $i$th haplotype with probability $q_{i,r}$, and simulate sequencing error.
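
Steps (2) and (3) can be sketched in Python with NumPy. Everything beyond the slide's outline is an assumption: the two-state "copy the previous CpG or redraw" form of the distance-dependent transition matrix, the per-region-type parameter values in the hypothetical `REGION_PARAMS`, the Dirichlet draw for the frequencies $q_{i,r}$, uniform read positions, and a sequencing-error model that simply flips methylation calls. The function names are hypothetical, and methsim itself estimates its parameters from input data rather than using fixed values.

```python
import numpy as np

# Hypothetical region-type parameters, illustrative only (methsim estimates its
# parameters from input data): (mean methylation level, distance scale in bp).
REGION_PARAMS = {"CGI": (0.1, 100.0), "non-CGI": (0.8, 100.0)}


def transition_matrix(d, level, scale):
    """2x2 matrix P[prev, next] for CpGs d bp apart (0 = unmethylated, 1 = methylated).

    Assumed form: with probability w = exp(-d / scale) the next CpG copies the
    previous state; otherwise it is methylated with probability `level`.
    """
    w = np.exp(-d / scale)
    return np.array([
        [w + (1 - w) * (1 - level), (1 - w) * level],
        [(1 - w) * (1 - level), w + (1 - w) * level],
    ])


def simulate_meta_haplotypes(positions, region_type, n_haplotypes, rng):
    """Step (2): simulate H meta-haplotypes across the CpGs of one region."""
    level, scale = REGION_PARAMS[region_type]
    haps = np.empty((n_haplotypes, len(positions)), dtype=int)
    for h in range(n_haplotypes):
        haps[h, 0] = rng.random() < level
        for j in range(1, len(positions)):
            P = transition_matrix(positions[j] - positions[j - 1], level, scale)
            haps[h, j] = rng.random() < P[haps[h, j - 1], 1]
    # frequencies q_{i,r} summing to 1; drawing them from a flat Dirichlet is an assumption
    q = rng.dirichlet(np.ones(n_haplotypes))
    return haps, q


def simulate_reads(positions, haps, q, n_reads, read_len, error, rng):
    """Step (3): place reads uniformly, copy calls from haplotype i with probability q_i,
    then flip each call with probability `error` (a simplified error model)."""
    positions = np.asarray(positions)
    reads = []
    for _ in range(n_reads):
        start = rng.integers(positions[0] - read_len + 1, positions[-1] + 1)
        covered = np.flatnonzero((positions >= start) & (positions < start + read_len))
        i = rng.choice(len(q), p=q)
        calls = haps[i, covered]  # fancy indexing returns a copy
        flips = rng.random(len(covered)) < error
        calls[flips] = 1 - calls[flips]
        reads.append(dict(zip(covered.tolist(), calls.tolist())))
    return reads


rng = np.random.default_rng(42)
positions = np.array([0, 15, 40, 52, 90, 130, 300])  # hypothetical CpG positions (bp)
haps, q = simulate_meta_haplotypes(positions, "non-CGI", n_haplotypes=4, rng=rng)
reads = simulate_reads(positions, haps, q, n_reads=50, read_len=100, error=0.01, rng=rng)
print(haps)
print(q.round(2))
print(reads[:3])
```

Under this assumed chain, every CpG is methylated with marginal probability `level`, so the region type sets the overall β-values, while the copy probability `w` decays with distance, so states persist between nearby CpGs on the same haplotype.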

  27. Distribution of β-values. [Figure: density of β-values (0 to 1) in CGI and non-CGI regions, real data (ADS) vs methsim.]

  28. Within-haplotype co-methylation at neighbouring CpGs. [Figure: median log odds ratio against distance between CpGs (0 to 200 bp), CGI and non-CGI panels, real data (ADS) vs methsim.]

  30. Within-haplotype co-methylation at neighbouring CpGs. [Figure: median log odds ratio with 80% percentile band against distance between CpGs (0 to 200 bp), all CpGs, ADS vs MySim.]
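
The co-methylation statistic in these figures can be sketched in Python, assuming NumPy. For a pair of neighbouring CpGs, the reads covering both give a 2x2 table of methylated (M) / unmethylated (U) calls, and the log odds ratio is $\log\frac{n_{MM}\, n_{UU}}{n_{MU}\, n_{UM}}$; a positive value means the two CpGs tend to share a state on the same read. The pseudocount of 0.5, the 10 bp distance bins, the example reads and the function names are assumptions, not necessarily how the plotted values were computed.

```python
import math
from collections import defaultdict

import numpy as np


def pair_log_odds_ratio(reads, j, k, pseudo=0.5, min_reads=1):
    """Log odds ratio of co-methylation of CpGs j and k over reads covering both.

    reads: list of dicts mapping CpG index -> 1 (methylated) / 0 (unmethylated).
    A pseudocount is added to every cell of the 2x2 table so the ratio stays
    finite; returns None if fewer than min_reads reads cover both CpGs.
    """
    table = np.full((2, 2), pseudo)
    n = 0
    for r in reads:
        if j in r and k in r:
            table[r[j], r[k]] += 1
            n += 1
    if n < min_reads:
        return None
    return math.log(table[1, 1] * table[0, 0] / (table[1, 0] * table[0, 1]))


def lor_by_distance(reads, positions, bin_width=10, max_dist=200):
    """10th, 50th and 90th percentiles (median plus an 80% band) of the log odds
    ratio for neighbouring CpG pairs, grouped by distance bin (lower edge, bp)."""
    bins = defaultdict(list)
    for j in range(len(positions) - 1):
        d = positions[j + 1] - positions[j]
        lor = pair_log_odds_ratio(reads, j, j + 1)
        if d <= max_dist and lor is not None:
            bins[int(d // bin_width) * bin_width].append(lor)
    return {b: tuple(float(x) for x in np.percentile(v, [10, 50, 90]))
            for b, v in sorted(bins.items())}


# Hypothetical reads: CpGs 0 and 1 are always both methylated or both unmethylated
reads = [{0: 1, 1: 1}] * 5 + [{0: 0, 1: 0}] * 5
print(pair_log_odds_ratio(reads, 0, 1))  # log(5.5 * 5.5 / (0.5 * 0.5)) = log(121)
print(lor_by_distance(reads, [0, 12, 30]))
```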

  31. Correlations of pairs of β-values. [Figure: Pearson correlation against distance between CpGs (0 to 1000 bp), CGI and non-CGI panels, real data (ADS) vs methsim.]
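
The slide does not say exactly how the β-value correlations were computed. One possible construction, assumed here, takes a single sample, collects all CpG pairs whose separation falls in a distance bin, and computes the Pearson correlation between the β-values at the first and second CpG of each pair. A Python sketch with NumPy; the function name, bin width, `min_pairs` cut-off and example data are hypothetical:

```python
from collections import defaultdict

import numpy as np


def beta_pair_correlation(beta, positions, bin_width=50, max_dist=1000, min_pairs=3):
    """Pearson correlation between beta_j and beta_k over all CpG pairs (j, k)
    whose distance falls in each bin, for a single sample.

    beta: beta-values at CpGs sorted by position; positions: coordinates in bp.
    Returns {bin lower edge (bp): correlation}.
    """
    beta = np.asarray(beta, dtype=float)
    positions = np.asarray(positions)
    pairs = defaultdict(lambda: ([], []))
    for j in range(len(positions)):
        for k in range(j + 1, len(positions)):
            d = positions[k] - positions[j]
            if d > max_dist:
                break  # positions are sorted, so every later k is farther away
            x, y = pairs[int(d // bin_width) * bin_width]
            x.append(beta[j])
            y.append(beta[k])
    out = {}
    for b, (x, y) in sorted(pairs.items()):
        # skip bins with too few pairs or no variation (correlation undefined)
        if len(x) >= min_pairs and np.std(x) > 0 and np.std(y) > 0:
            out[b] = float(np.corrcoef(x, y)[0, 1])
    return out


# Hypothetical sample: four clusters of three CpGs, each cluster sharing one methylation level
positions = [0, 10, 20, 1000, 1010, 1020, 2000, 2010, 2020, 3000, 3010, 3020]
beta = [0.9] * 3 + [0.1] * 3 + [0.5] * 3 + [0.7] * 3
print(beta_pair_correlation(beta, positions))
```

Because pooled β-values mix many reads, a high correlation here does not by itself say anything about co-methylation on individual DNA molecules.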

  33. Summary: methsim models the mechanistic dependence structure of DNA methylation data. I will be using methsim to simulate data with inserted DMRs and to compare DMR-detection methods.
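
Because the DMRs inserted into simulated data are known, each detection method's calls can be scored against them. A minimal Python sketch of one such scoring scheme; the any-overlap criterion, the interval coordinates and the function names are hypothetical, and a real comparison might use stricter matching rules:

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate_dmr_calls(called, truth):
    """Compare called DMRs with the DMRs inserted into simulated data.

    A call counts as a true positive if it overlaps any true DMR; a true DMR
    counts as found if any call overlaps it. Returns (sensitivity, FDR).
    """
    tp_calls = sum(any(overlaps(c, t) for t in truth) for c in called)
    found = sum(any(overlaps(t, c) for c in called) for t in truth)
    sensitivity = found / len(truth) if truth else float("nan")
    fdr = (len(called) - tp_calls) / len(called) if called else 0.0
    return sensitivity, fdr


truth = [(1_000, 2_000), (5_000, 5_500)]   # hypothetical inserted DMRs (bp)
called = [(1_200, 1_800), (7_000, 7_400)]  # hypothetical calls from one method
print(evaluate_dmr_calls(called, truth))  # (0.5, 0.5)
```

Running every method on the same simulated datasets and comparing these two numbers is one simple way to decide which method "reigns supreme".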
