Genome Hacking. Yaniv Erlich (@erlichya), 10/26/15


  1. Genome Hacking. Yaniv Erlich (@erlichya), 10/26/15

  2. Outline: Intro.; Methodology; The Venter case; Anonymous datasets; Summary. We need to share genetic information: Hereditary Spastic Paraparesis (Erlich et al.); Joubert syndrome (Endevson et al.); Hemifacial Microsomia (Zielinski,.., & Erlich), PLoS One. Genetic Privacy, @erlichya, 10/26/15, Yaniv Erlich

  3. Vulnerability research. Labels: Me; Intercom; IT department of a major bank; Fingerprint reader.

  4. Correlation between Y-chr and surnames. www.ysearch.org. [Figure: Y chromosomes (ACACACAC…) paired with surnames; labels: Smith, Erlich.]

  5. The main idea. A systematic study: can we recover the identity of anonymous genomic datasets?

  6. Databases of interest: 1 40,000 publicly accessible surname-Ychr records. www.ysearch.org, www.smgf.org

  7. How to find surnames? Estimating the time to most recent common ancestor between the target and the i-th surname record in the db.

  8. Empirical test to determine the probability of recovering a US surname. Steps (x900): Y-chr of a real person; surname inference algorithm; querying Ysearch and SMGF; inferring surname; comparing the predicted surname to the true one. For US Caucasian males: 12% successful recoveries, 5% wrong recoveries, 83% unknown.

  9. Distribution of inferred surnames. Most of the inferred surnames are relatively rare.

  10. Triangulate individuals with metadata: age, state, surname. [Plot: Freq. (0.0% to 2.0%) against Age (0 to 90); label: Adams; series: only age+state, age+state+surname.] 100,000 rounds. The median of age+state+surname is 12 males.

  11. Putting it all together: the Venter case. We got a surname from whole genome sequencing data. lobSTR: DYS458, 17 repeats. www.ysearch.org. Try it yourself: bit.ly/find_craig

  12. Getting to Craig Venter. Searching in USSearch.com for: 1. Venter 2. California 3. Born in 1946 4. Male. Two matches, including:

  13. Can we identify anonymous genomes?

  14. 1000 Genomes cases. 10 CEU (Utah) genomes. Surname predictions: Winfield, Utah. Found an obituary that has the exact description of the pedigree. Probability of a random match < 5x10^-9. *Some of the details in this slide were modified to respect the identity of the family.

  15. Beginner’s luck? p<10^-5; p<5x10^-6; p<5x10^-6; p<10^-5; p<5x10^-9. Legend: successful surname recovery (targeted individual); person tested by genetic genealogy service (source); patrilineal line from source to target. In total: 5 successful surname recoveries, breaching the privacy of close to 50 CEU samples.

  16. Aftermath. Labels: Our study; NIH response. "..And Erlich responded in an exemplary way to his team’s findings by contacting the NIH and other genetics researchers with his findings before publishing them. This sets an important precedent for constructively dealing with newly discovered privacy loopholes, and other researchers should take note.." (Nature Editorial)

  17. The hitchhiker guide to genome hacking

  18. The hitchhiker guide to genome hacking. Genetic imputation: ApoE, Alzheimer’s disease. Analogy: "Ba___k O__ma i_ t__ Pr______t" reads as "Barack Obama is the President".

  19. The path forward. Key points: transparency, reputation system, compensation.

  20. Yaniv Erlich, @erlichya

  21. Acknowledgements. Team Genetic Privacy: Melissa Gymrek (HST – Harvard/MIT), Amy McGuire (Baylor), David Golan (Tel-Aviv University), Eran Halperin (Tel-Aviv University). Funding: Andria and Paul Heafy
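
Slides 7 and 8 describe inferring a surname by comparing a target Y chromosome against surname records. The sketch below is a simplified illustration, not the method from the talk: it swaps the time-to-most-recent-common-ancestor estimate for a plain sum of repeat-count differences over shared Y-STR markers, and it reports no surname when the best match is too distant or ambiguous. The records, marker values, thresholds, and function names are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Haplotype = Dict[str, int]  # marker name -> repeat count

# Hypothetical toy records; the values are invented, not real database entries.
TOY_RECORDS: List[Tuple[str, Haplotype]] = [
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS458": 17}),
    ("Adams", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS458": 16}),
]


def str_distance(a: Haplotype, b: Haplotype) -> Tuple[int, int]:
    """Return (sum of repeat-count differences, number of shared markers)."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared), len(shared)


def infer_surname(
    target: Haplotype,
    records: List[Tuple[str, Haplotype]],
    max_distance: int = 1,   # assumed cut-off, a crude stand-in for a TMRCA threshold
    min_markers: int = 3,    # assumed minimum overlap needed to trust a comparison
) -> Optional[str]:
    """Return the closest surname, or None for the 'unknown' outcome."""
    scored = []
    for surname, haplotype in records:
        dist, n_shared = str_distance(target, haplotype)
        if n_shared >= min_markers:
            scored.append((dist, surname))
    if not scored:
        return None
    best = min(dist for dist, _ in scored)
    if best > max_distance:
        return None
    candidates = {surname for dist, surname in scored if dist == best}
    # Different surnames tied at the best distance are treated as unknown.
    return candidates.pop() if len(candidates) == 1 else None
```

Returning `None` rather than the nearest record mirrors the three outcomes on slide 8 (successful, wrong, unknown): a stricter `max_distance` trades wrong recoveries for more unknowns.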
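
Slide 11 reports a repeat count for the Y-STR marker DYS458 obtained from whole genome sequencing data with lobSTR. As a rough illustration of what "17 repeats" means, the sketch below counts the longest uninterrupted run of a motif in a single sequence. It is not lobSTR's algorithm, which works on aligned reads and models sequencing noise. The function name is hypothetical, and the GAAA motif and flanking bases in the usage example are assumptions made for illustration.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


# Synthetic read: invented flanks around 17 copies of an assumed GAAA motif.
synthetic_read = "TTAGC" + "GAAA" * 17 + "GGAGT"
repeat_count = longest_repeat_run(synthetic_read, "GAAA")
```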
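
Slides 10 and 12 combine an inferred surname with metadata (age, state, sex) to narrow down candidates. A minimal sketch of that filtering step follows, assuming a hypothetical in-memory list of records; the `Person` type, the `triangulate` function, and every record shown are invented and do not correspond to any real people-search service or dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Person:
    surname: str
    state: str
    birth_year: int
    sex: str  # "M" or "F"


def triangulate(
    people: List[Person], surname: str, state: str, birth_year: int, sex: str = "M"
) -> List[Person]:
    """Keep only the records that match every piece of metadata."""
    return [
        p
        for p in people
        if p.surname.lower() == surname.lower()
        and p.state == state
        and p.birth_year == birth_year
        and p.sex == sex
    ]


# Invented records for illustration only.
toy_population = [
    Person("Doe", "CA", 1950, "M"),
    Person("Doe", "CA", 1950, "F"),
    Person("Doe", "NY", 1950, "M"),
    Person("Roe", "CA", 1950, "M"),
    Person("Doe", "CA", 1972, "M"),
]
matches = triangulate(toy_population, "Doe", "CA", 1950)
```

Each added field shrinks the candidate set, which is the effect slide 10 measures when it compares age+state alone with age+state+surname.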
