Genome Hacking, Yaniv Erlich (@erlichya), 10/26/15, PowerPoint presentation



SLIDE 1

Yaniv Erlich

@erlichya 10/26/15

Genome Hacking

SLIDE 2

We need to share genetic information

  • Hereditary Spastic Paraparesis (Erlich et al.)
  • Joubert syndrome (Edvardson et al.)
  • Hemifacial Microsomia (Zielinski, ..., & Erlich)

PLoS One

Outline (Genetic Privacy): Intro, Methodology, The Venter case, Anonymous datasets, Summary

SLIDE 3

Vulnerability research


Intercom Fingerprint reader

IT department of a major bank

Me

SLIDE 4

Correlation between Y-chr and surnames

www.ysearch.org:

[Figure: pedigree in which the Y chromosome and the surname are inherited together along the male line; labels: Y, Smith, Erlich]


ACACACAC…

SLIDE 5

The main idea

A systematic study: can we recover the identity of the people behind anonymous genomic datasets?

SLIDE 6

Databases of interest

  • www.smgf.org
  • www.ysearch.org

140,000 publicly accessible surname-Ychr records

SLIDE 7

How to find surnames?

Estimating the time to most recent common ancestor

[Figure labels: Target; i-th record in db; surname]
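The comparison on this slide can be sketched as follows. This is a minimal illustration under loud assumptions, not the method used in the study: each database record is taken to be a surname plus a dictionary of Y-STR repeat counts, every marker is assumed to mutate independently at one hypothetical rate `MU`, and an infinite-alleles approximation stands in for whatever mutation model the talk relies on. The function names, the toy repeat counts, and the rate are all invented for the example.

```python
import math

MU = 0.002  # assumed per-marker, per-generation mutation rate (hypothetical value)

def compare(target, record):
    """Count mismatching markers among those typed in both haplotypes."""
    shared = target.keys() & record.keys()
    mismatches = sum(target[m] != record[m] for m in shared)
    return mismatches, len(shared)

def tmrca_generations(mismatches, markers, mu=MU):
    """Point estimate of generations to the most recent common ancestor.

    Infinite-alleles approximation: a marker is still identical in both
    lineages after t generations with probability exp(-2 * mu * t).
    """
    if markers == 0 or mismatches >= markers:
        return math.inf  # no information, or too diverged to estimate
    if mismatches == 0:
        return 0.0
    return -math.log(1 - mismatches / markers) / (2 * mu)

def rank_surnames(target, database, mu=MU):
    """Return (estimated generations, surname) pairs, closest record first."""
    scored = []
    for surname, haplotype in database:
        k, n = compare(target, haplotype)
        scored.append((tmrca_generations(k, n, mu), surname))
    return sorted(scored)

# Toy data: the repeat counts are made up for illustration only.
target = {"DYS458": 17, "DYS19": 14, "DYS391": 10, "DYS390": 24}
database = [
    ("Adams", {"DYS458": 16, "DYS19": 14, "DYS391": 10, "DYS390": 24}),
    ("Smith", {"DYS458": 17, "DYS19": 14, "DYS391": 10, "DYS390": 24}),
]

for generations, surname in rank_surnames(target, database):
    print(f"{surname}: about {generations:.0f} generations")
```

A record with a short estimated time to the common ancestor is a candidate for sharing the target's surname; a real pipeline would also need a threshold for reporting "unknown" rather than guessing.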

SLIDE 8

Empirical test to determine the probability of recovering a US surname

Surname inference algorithm (x900):

  • 1. Y-chr of a real person
  • 2. Querying Ysearch and SMGF
  • 3. Inferring surname
  • 4. Comparing the predicted surname to the true one

For US Caucasian males:

  • 12% successful recoveries
  • 5% wrong recoveries
  • 83% unknown
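The three percentages can be turned into two derived figures: how often the procedure returns any surname at all, and how often a returned surname is correct. The sketch below does only that arithmetic. Reading "x900" as roughly 900 tested individuals is an assumption, and the function name is hypothetical.

```python
def summarize(success=0.12, wrong=0.05, unknown=0.83, trials=900):
    """Derive call rate and precision from the slide's three outcome rates."""
    if abs(success + wrong + unknown - 1.0) > 1e-9:
        raise ValueError("outcome rates must sum to 1")
    called = success + wrong  # fraction of trials that returned some surname
    return {
        "expected_counts": {
            "success": round(success * trials),
            "wrong": round(wrong * trials),
            "unknown": round(unknown * trials),
        },
        "call_rate": called,
        "precision": success / called,  # correct surnames among returned ones
    }

summary = summarize()
print(summary["expected_counts"])
print(f"call rate: {summary['call_rate']:.0%}, precision: {summary['precision']:.0%}")
```

Under these numbers a surname is returned in 17% of trials, and about 12 in every 17 returned surnames are correct.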

SLIDE 9

Distribution of inferred surnames

Most of the inferred surnames are relatively rare

SLIDE 10

The median number of males matching a given age+state+surname combination is 12.

Triangulate individuals with metadata

[Figure: legend "Age+state+surname" vs. "Only age+state"; inputs Age, State, Surname (e.g. Adams); axes Age (0 to 90) and Freq. (0.0% to 2.0%); 100,000 rounds]
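The triangulation experiment can be imitated on synthetic data. The sketch below is a toy, not the study's simulation: the population is randomly generated with uniform ages, ten placeholder states, and 500 placeholder surnames, so its output says nothing about real US demographics. It only shows the mechanics of drawing a random target and counting how many people share the target's metadata, with and without the surname.

```python
import random
from collections import Counter
from statistics import median

def toy_population(n=20000, seed=1):
    """Synthetic people with uniform random age, state, and surname (all hypothetical)."""
    rng = random.Random(seed)
    surnames = [f"Surname{i}" for i in range(500)]
    states = [f"State{i}" for i in range(10)]
    return [
        {"age": rng.randint(0, 90), "state": rng.choice(states), "surname": rng.choice(surnames)}
        for _ in range(n)
    ]

def simulate(population, rounds=1000, seed=0):
    """Median candidate-set size with and without the surname.

    Each round picks one random target and counts how many people in the
    population share the target's (age, state, surname) or just (age, state).
    """
    rng = random.Random(seed)
    by_full = Counter((p["age"], p["state"], p["surname"]) for p in population)
    by_meta = Counter((p["age"], p["state"]) for p in population)
    full_sizes, meta_sizes = [], []
    for _ in range(rounds):
        t = rng.choice(population)
        full_sizes.append(by_full[(t["age"], t["state"], t["surname"])])
        meta_sizes.append(by_meta[(t["age"], t["state"])])
    return median(full_sizes), median(meta_sizes)

with_surname, without_surname = simulate(toy_population())
print("median candidates with surname:", with_surname)
print("median candidates with only age+state:", without_surname)
```

The slide reports 100,000 rounds; the default of 1,000 here is only to keep the toy fast.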

SLIDE 11

Putting it all together: the Venter case

www.ysearch.org:

lobSTR

DYS458: 17 repeats

Try it yourself: bit.ly/find_craig

We got a surname from whole genome sequencing data
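A marker value such as "17 repeats" is the number of back-to-back copies of a short motif at a locus. The sketch below counts the longest run of a motif in a single error-free sequence. It is not lobSTR and does not use its interface; a real caller also has to align reads to the locus and model sequencing and amplification noise. The motif `AC` is borrowed from the "ACACACAC…" example earlier in the deck, and the flanking bases are invented.

```python
import re

def longest_repeat_run(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # The lookahead lets a run be measured from every start position,
    # so no run is missed because an earlier match consumed part of it.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Synthetic read: made-up flanks around 17 copies of the motif.
read = "TTG" + "AC" * 17 + "GGA"
print(longest_repeat_run(read, "AC"))
```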

SLIDE 12

Getting to Craig Venter

Searching USSearch.com for:

  • 1. Venter
  • 2. California
  • 3. Born in 1946
  • 4. Male


Two matches, including:

SLIDE 13

Can we identify anonymous genomes?

SLIDE 14

1000 Genomes cases

surname predictions

10 CEU (Utah) genomes

Found an obituary that has the exact description of the pedigree

Probability of a random match < 5 x 10^-9

Winfield Utah

*Some of the details in this slide were modified to respect the identity of the family

SLIDE 15

Beginner’s luck?

In total:

5 successful surname recoveries, breaching the privacy of close to 50 CEU samples.

[Figure: pedigrees of the recovered cases. Legend: successful surname recovery (targeted individual); patrilineal line from source to target; person tested by genetic genealogy service (source). Values shown: p < 5 x 10^-9, p < 10^-5, p < 5 x 10^-6, p < 5 x 10^-6, p < 10^-5]

SLIDE 16

Aftermath

Our study

NIH response

"...And Erlich responded in an exemplary way to his team’s findings by contacting the NIH and other genetics researchers with his findings before publishing them. This sets an important precedent for constructively dealing with newly discovered privacy loopholes, and other researchers should take note..." (Nature Editorial)

SLIDE 17

The hitchhiker guide to genome hacking

SLIDE 18

The hitchhiker guide to genome hacking


ApoE Alzheimer’s disease

Ba___k O__ma i_ t__ Pr______t -> Barack Obama is the President

Genetic imputation
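The fill-in-the-blanks analogy can be made concrete with a toy: observed characters play the role of typed genotypes, blanks are untyped positions, and a list of known phrases plays the role of a reference panel. This is only an analogy in code, not an imputation algorithm; real imputation works probabilistically over haplotypes. The function name and the panel entries are hypothetical.

```python
def impute(masked, reference_panel, blank="_"):
    """Return the reference phrases that agree with every observed character."""
    def consistent(reference):
        return len(reference) == len(masked) and all(
            m == blank or m == r for m, r in zip(masked, reference)
        )
    return [reference for reference in reference_panel if consistent(reference)]

# Masked phrase from the slide; the panel is made up for illustration.
masked = "Ba___k O__ma i_ t__ Pr______t"
panel = [
    "Barack Obama is the President",
    "Barack Obama is the Physicist",
    "George Adams is the President",
]
print(impute(masked, panel))
```

The more observed positions there are, the fewer reference entries remain consistent, which is why a handful of known variants can reveal an undisclosed one.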

SLIDE 19

The path forward


Key points: transparency, reputation system, compensation

SLIDE 20

Yaniv Erlich

@erlichya

SLIDE 21

Acknowledgements

Funding:

  • Andria and Paul Heafy

Team Genetic Privacy:

  • Melissa Gymrek (HST – Harvard/MIT)
  • Amy McGuire (Baylor)
  • David Golan (Tel-Aviv University)
  • Eran Halperin (Tel-Aviv University)