SLIDE 1

Pierre Baldi (1), Roberta Baronio (1), Emiliano De Cristofaro (2), Paolo Gasti (1), and Gene Tsudik (1)

(1) UC Irvine  (2) PARC, work done while at UC Irvine

* See: http://www.imdb.com/title/tt0119177/

SLIDE 2

Outline

  • Genomics Background
  • Privacy Concerns
  • Related Work and Challenges
  • Privacy-Preserving Testing of Full Human Genomes
  • Paternity Test
  • Personalized Medicine
  • Compatibility Tests
  • Conclusion

SLIDE 3

Genomics 101

  • Genome:
  • Contains all of the biological information needed to build and maintain a “living example” of an organism
  • Encoded in DNA, a polymer of nucleotides
  • A,G,C,T
  • Human Genome:
  • Approximately 3 billion nucleotides
  • Stored in 23 chromosome pairs (plus mtDNA)
  • DNA Sequencing:
  • Determining the precise sequence of nucleotides in a strand of DNA
  • Since the 1970s, a major driving force in the life sciences
  • Rise of High-Throughput Sequencing (HTS)

SLIDE 4

Full Genome Sequencing (FGS)

  • Full Sequencing of Human Genomes:
  • The “Human Genome Project”: first full genome in 2003
  • In the UK, 1000 genomes are already available
  • The “Race for $1000 genome” by 2012
  • $100 by 2017

SLIDE 5

Full Genome Sequencing (FGS)

  • Advances in FGS:
  • The “Human Genome Project”: first full genome in 2003
  • In the UK, 1000 genomes are already available
  • The “Race for $1000 genome” by 2012
  • $100 by 2017

Ubiquitous availability of FGS is in sight!

  • New Frontiers:
  • Better understanding of human genome
  • Most individuals will have access to their (full) genomes
  • Personalized Medicine
  • Testing not only in-vitro but also in-silico
  • Cheaper and more accurate genetic testing

SLIDE 6

What about privacy?

  • Sensitivity of human genome:
  • Uniquely identifies an individual (and discloses ethnicity, disease predispositions, phenotypic traits, …)

  • Once leaked, it cannot be “revoked”
  • De-identification and obfuscation are not effective
  • Legislation, e.g., Genetic Information Nondiscrimination Act (GINA)
  • Privacy challenges:
  • Available legislation often not technical enough
  • Need for a better understanding of genomics applications
  • Ubiquitous availability of low-cost FGS will amplify privacy concerns…

…It is not too early to investigate them!

SLIDE 7

Testing on Full Human Genomes

Affordable FGS makes it possible to query/test genomic information not only in vitro but also in silico, e.g.:

  • Paternity Tests
  • Commercial in-vitro testing widespread (starting at $79)
  • With the availability of full genomes, we can design algorithms (w/o the need for external companies)
  • Personalized Medicine
  • Treatment/medication tailored to patient’s genetic makeup
  • E.g., testing of the tpmt gene is advised before prescribing drugs for childhood leukemia and autoimmune diseases

SLIDE 8

Testing on Full Human Genomes (2)

  • Genetic Tests
  • Newborn/fetal screening
  • Confirmational diagnostics
  • Pre-symptomatic testing
  • E.g., Huntington’s disease
  • Compatibility tests
  • Dating web sites finding “good matches”
  • Partners assessing the possibility of passing on to their children genetic diseases with Mendelian inheritance [1]

[1] V. McKusick and S. Antonarakis. Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders. Johns Hopkins University Press, 1994.

SLIDE 9

Related Work

  • Crypto techniques with applications to DNA testing:

  • [TKC07], [BA10]: privacy-preserving error-resilient string searching
  • [GHS10], [HT10]: secure pattern matching
  • [KM10]: secure text processing and CODIS test
  • Similarity of DNA Sequences
  • [JKS08]: secure edit distance and Smith-Waterman scores
  • Other techniques
  • [WWL+09]: secure computation on genomic data at a provider
  • [BKKT08]: identity test, paternity test, and more

SLIDE 10

Challenges

  • Efficiency
  • Do available cryptographic protocols scale to full genomes?
  • Short sequences vs. a 3-billion-symbol protocol input
  • Need domain knowledge to minimize computation
  • Error Resilience
  • Can we use techniques resilient to sequencing errors?
  • More in the paper…
  • Our Goal:
  • Explore techniques viable today
  • Combine efficient cryptographic techniques with genomics domain knowledge

SLIDE 11

Outline

  • Genomics Background
  • Privacy Concerns
  • Related Work and Challenges
  • Privacy-Preserving Testing of Full Human Genomes
  • Paternity Test
  • Personalized Medicine
  • Compatibility Tests
  • Conclusion

SLIDE 12

Privacy-Preserving Genetic Paternity Test

  • A Strawman Approach for Paternity Test:
  • On average, ~99.5% of any two human genomes are identical
  • Parents and children have even more similar genomes
  • Compare candidate’s genome with that of the alleged child:
  • Test positive if percentage of matching nucleotides is > 99.5 + τ
  • First-Attempt Privacy-Preserving Protocol:
  • Use an appropriate secure two-party protocol for the comparison
  • PROs: High-accuracy and error resilience
  • CONs: Performance not promising (3 billion symbols in input)
  • In our experiments, computation takes a few days
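
The strawman comparison above can be sketched in the clear, with no privacy protection at all. This is a minimal Python sketch, assuming both genomes are already aligned strings of equal length; the function names and the example value of τ are hypothetical and do not come from the talk.

```python
def matching_percentage(genome_a: str, genome_b: str) -> float:
    """Percentage of positions at which two aligned sequences agree."""
    if len(genome_a) != len(genome_b) or not genome_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(genome_a, genome_b) if a == b)
    return 100.0 * matches / len(genome_a)


def strawman_paternity_test(candidate: str, child: str, tau: float) -> bool:
    """Positive if the match percentage exceeds 99.5 + tau.

    tau is an assumed margin; the talk does not give a concrete value.
    """
    return matching_percentage(candidate, child) > 99.5 + tau
```

A privacy-preserving version would have to evaluate the same comparison inside a secure two-party protocol over all 3 billion symbols, which is where the cost reported on this slide comes from.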

SLIDE 13

Privacy-Preserving Genetic Paternity Test (2)

  • Improved Protocol
  • ~99.5% of any two human genomes are identical
  • Why don’t we compare only the remaining 0.5%?

But… we don’t know (yet) exactly where this 0.5% occurs!

Using Private Set Intersection Cardinality for privacy-preserving comparison, it would take about 1 hour

SLIDE 14

Private Set Intersection Cardinality (PSI-CA)

Server input: $S = \{s_1, \ldots, s_w\}$
Client input: $C = \{c_1, \ldots, c_v\}$

Server output: $\perp$ (nothing)
Client output: $|S \cap C|$
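
To make the PSI-CA functionality concrete, here is a toy Python sketch in the style of Diffie-Hellman-based set intersection: the client blinds its items, the server re-blinds and shuffles them, and the client counts matches without learning which items matched. It is an illustration under stated assumptions, not necessarily the construction used in the talk. The modulus, the hash-to-group step, and all function names are assumptions chosen for readability, and the parameters are not secure.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy modulus, NOT a secure parameter choice


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero element of Z_P^* (assumed hash-to-group step)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Random exponent coprime to P - 1, so that exponentiation can be undone."""
    while True:
        e = secrets.randbelow(P - 2) + 1
        if math.gcd(e, P - 1) == 1:
            return e


def tag(value: int) -> str:
    """Second hash applied to blinded values before comparison."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def psi_ca(client_set: set, server_set: set) -> int:
    """Simulate both parties in one process and return |S ∩ C| for the client."""
    rng = secrets.SystemRandom()
    # Client: blind each item with a secret exponent a.
    a = random_exponent()
    blinded = [pow(hash_to_group(c), a, P) for c in client_set]
    # Server: raise the client's values to a secret b, then shuffle them
    # so the client cannot tell which of its items matched.
    b = random_exponent()
    double_blinded = [pow(x, b, P) for x in blinded]
    rng.shuffle(double_blinded)
    server_tags = {tag(pow(hash_to_group(s), b, P)) for s in server_set}
    # Client: remove a, leaving H(c)^b, and count hits among the server's tags.
    a_inv = pow(a, -1, P - 1)
    return sum(1 for y in double_blinded if tag(pow(y, a_inv, P)) in server_tags)
```

For example, `psi_ca({"a", "b", "c"}, {"b", "c", "d"})` returns the count of common items while the simulated server side never sees the client's items unblinded.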

SLIDE 15

Privacy-Preserving Genetic Paternity Test (3)

  • In-vitro emulation – RFLP-based paternity test
  • Restriction Fragment Length Polymorphism (RFLP) analysis: detects differences between samples of homologous DNA molecules that come from differing locations of restriction enzyme sites

  • DNA sample is cut into fragments by enzymes
  • Fragments separated according to their lengths by gel electrophoresis
  • Paternity test is positive if enough fragments have the same length
  • RFLP-based PPGPT – Reduction to PSI-CA
  • Participants: “client” (receives the result), “server” (remains oblivious)
  • Public input: threshold $\tau$, enzymes $E = \{e_1, \ldots, e_j\}$, markers $M = \{mk_1, \ldots, mk_l\}$
  • Private input: digitized genomes

SLIDE 16

Privacy-Preserving RFLP-based Paternity Test

[Diagram: Private Set Intersection Cardinality yields the Test Result (number of fragments with the same length)]
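
The in-silico emulation of RFLP can be sketched as follows. This is a simplified Python illustration, assuming enzymes are given by their recognition-site strings, markers are short substrings, and each cut is placed at the start of a site; the function names and the encoding of fragments as (enzyme, marker, length) tuples are assumptions, not details taken from the slides. The final comparison is done in the clear here, whereas the protocol would run it through PSI-CA.

```python
def digest(genome: str, site: str) -> list:
    """Cut the genome at every occurrence of an enzyme's recognition site.

    Simplification: the cut is placed at the start of each site.
    """
    cuts = []
    pos = genome.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = genome.find(site, pos + 1)
    bounds = [0] + [c for c in cuts if c > 0] + [len(genome)]
    return [genome[i:j] for i, j in zip(bounds, bounds[1:])]


def fragment_set(genome: str, enzymes: list, markers: list) -> set:
    """Set of (enzyme, marker, fragment length) for fragments containing a marker."""
    result = set()
    for site in enzymes:
        for frag in digest(genome, site):
            for marker in markers:
                if marker in frag:
                    result.add((site, marker, len(frag)))
    return result


def rflp_test(genome_a: str, genome_b: str, enzymes: list, markers: list, tau: int) -> bool:
    """Positive if at least tau marked fragments have the same length."""
    common = fragment_set(genome_a, enzymes, markers) & fragment_set(genome_b, enzymes, markers)
    return len(common) >= tau
```

Because only lengths enter the sets, a sequencing error inside a fragment that does not create or destroy a restriction site leaves the result unchanged.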

SLIDE 17

Remarks

  • Why compare fragment lengths?
  • Isn’t it more accurate to compare actual contents?
  • In reality, RFLP yields “false positives” with very low probability
  • This approach increases resilience to sequencing errors
  • Performance Evaluation
  • About 1min pre-processing to emulate enzyme digestion process
  • About 10ms computation time on Intel Core i5 with 25 fragments
  • Less than 1s on a smartphone (Nokia N900, 600MHz CPU)
  • Extending to 50 fragments doubles computation time and increases accuracy by orders of magnitude

  • Communication overhead: only a few KBs
SLIDE 18

Personalized Medicine (PM)

  • Drugs designed for patients’ genetic features
  • Associating drugs with a unique genetic fingerprint
  • Max effectiveness for patients with matching genome
  • Test drug’s “genetic fingerprint” against patient’s genome
  • Examples:
  • tpmt gene: relevant to leukemia
  • (1) A G->C mutation in pos. 238 of the gene’s c-DNA, or (2) a G->A mutation in pos. 460 and an A->G mutation in pos. 419, cause the tpmt disorder (relevant for leukemia patients)
  • hla-B gene: relevant to HIV treatment
  • One G->T mutation (known as the hla-B*5701 allelic variant) is associated with extreme sensitivity to abacavir (HIV drug)

SLIDE 19

Privacy-preserving PM Testing (P3MT)

  • Challenges:
  • Patients may refuse to unconditionally release their genomes
  • Or may be sued by their relatives…
  • DNA fingerprint corresponding to a drug may be proprietary:

✓ We need privacy-protecting fingerprint matching

  • But we also need to enable FDA approval on the drug/fingerprint

✓ We reduce P3MT to Authorized Private Set Intersection (APSI)

SLIDE 20

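Authorized Private Set Intersection (APSI)

Server input: $S = \{s_1, \ldots, s_w\}$
Client input: $C = \{(c_1, \mathrm{auth}(c_1)), \ldots, (c_v, \mathrm{auth}(c_v))\}$, where the CA authorizes the elements of $\{c_1, \ldots, c_v\}$

Plain intersection: $S \cap C \overset{\mathrm{def}}{=} \{\, s_j \in S \mid \exists c_i \in C : c_i = s_j \,\}$
Authorized intersection: $S \cap C \overset{\mathrm{def}}{=} \{\, s_j \in S \mid \exists c_i \in C : c_i = s_j \wedge \mathrm{auth}(c_i) \text{ is valid} \,\}$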

SLIDE 21

Reducing P3MT to APSI

  • Intuition:
  • FDA acts as CA, Pharmaceutical company as Client, Patient as Server
  • Patient’s private input set: $G = \{(b_i \,\|\, i) \mid b_i \in \{A, C, G, T\}\}_{i=1}^{3 \cdot 10^9}$
  • Pharmaceutical company’s input set: $fp(D) = \{(b^*_j \,\|\, j)\}$
  • Each item in $fp(D)$ needs to be authorized by FDA

[Diagram: the Company submits $fp(D) = \{(b^*_j \,\|\, j)\}$ to the CA and receives $\{((b^*_j \,\|\, j), \mathrm{auth}(b^*_j \,\|\, j))\}$; the Patient, holding $G = \{(b_i \,\|\, i)\}$, and the Company then run APSI, which outputs the Test Result]
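
The reduction can be illustrated by computing the ideal APSI output in the clear. This Python sketch only shows how the two input sets are encoded and which items count toward the result; it provides no privacy. The `"base||position"` string encoding, the function names, and the use of an HMAC as a stand-in for the CA's authorization are all assumptions made for illustration; the slides do not specify these details.

```python
import hashlib
import hmac


def encode(base: str, position: int) -> str:
    """Encode a nucleotide and its position as the set element (b || i)."""
    return f"{base}||{position}"


def genome_set(genome: str) -> set:
    """Patient's input G = {(b_i || i)}, using 1-based positions."""
    return {encode(b, i) for i, b in enumerate(genome, start=1)}


def authorize(ca_key: bytes, item: str) -> bytes:
    """Stand-in for the CA (FDA) authorizing one fingerprint item.

    An HMAC is used only to keep the sketch short and self-contained.
    """
    return hmac.new(ca_key, item.encode(), hashlib.sha256).digest()


def apsi_ideal(server_set: set, client_pairs: list, ca_key: bytes) -> set:
    """Ideal APSI output: server items matched by a validly authorized client item."""
    valid = {
        item
        for item, auth in client_pairs
        if hmac.compare_digest(auth, authorize(ca_key, item))
    }
    return server_set & valid
```

In use, the company would hold pairs such as `(encode("G", 3), authorize(key, encode("G", 3)))`, and any fingerprint item lacking a valid authorization is ignored even if it appears in the patient's genome.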

SLIDE 22

Epilogue

  • In conclusion:
  • Explored three privacy-sensitive genomic applications
  • Paternity tests, personalized medicine, genetic compatibility testing
  • Unlike prior work, we focused on full genomes

(1) Efficient constructions based on privacy-preserving operations on private sets
(2) Domain knowledge in genomics (and simulation of in-vitro techniques)

  • Lesson learned: it is good when two communities connect and attempt to solve real-world problems

  • Future Work
  • More on private testing of fully-sequenced human genomes
  • Paternity test based on other methods (e.g., STR or SNP)
  • Ancestry testing, certified forensic identification, …
  • Reducing communication overhead
  • Non-human genomes (e.g., crops, animals, …)

SLIDE 23

Thank you!

Emiliano De Cristofaro

edc@parc.com www.emilianodc.com

SLIDE 24

Bonus Slides

SLIDE 25

P3MT – Performance Evaluation

  • Pre-Computation
  • Done once, for all possible tests
  • Patient’s pre-processing of the genome: a few days
  • Optimization:
  • Patient applies reference-based compression techniques
  • Rather than full genome, input all differences with “reference” genome (0.5%)
  • In our experiments, about 3.5 hours
  • Online Computation
  • Depends (linearly) on fingerprint size, typically a few nucleotides
  • 0.82ms for hla-B*5701, 2.46ms for tpmt (in our Intel i5-560M setting)
  • Communication
  • Depends on the size of encrypted genome (about 4GB)
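
The reference-based optimization above amounts to inputting only the positions where the patient's genome differs from a reference. A minimal Python sketch, assuming aligned sequences of equal length (real reference-based compression also has to handle insertions and deletions); the function name and the tuple encoding are hypothetical.

```python
def diff_against_reference(genome: str, reference: str) -> set:
    """(position, base) pairs, 1-based, where the genome differs from the reference."""
    if len(genome) != len(reference):
        raise ValueError("sketch assumes aligned sequences of equal length")
    return {
        (i, g)
        for i, (g, r) in enumerate(zip(genome, reference), start=1)
        if g != r
    }


# Rough input size under the slide's figures: 0.5% of about 3 billion nucleotides.
GENOME_LENGTH = 3_000_000_000
expected_items = GENOME_LENGTH * 5 // 1000  # integer arithmetic for 0.5%
```

Under these figures the pre-processing input shrinks from about 3 billion items to about 15 million.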

SLIDE 26

Genetic Compatibility Testing (GCT)

  • The importance of GCT:
  • Predicting whether potential partners are at risk of conceiving a child with a recessive genetic disease
  • That is, both partners carry at least one gene affected by mutation
  • Examples:
  • Beta-Thalassemia (red cells smaller than average)
  • Due to a mutation in hbb gene
  • “Minor” when mutation occurs only in one chromosome (no severe impact)
  • “Major” results in likely premature death (both chromosomes have the mutation)
  • If both partners have the minor form, their child carries the major variant with 25% probability
  • Lynch Syndrome (high risk of colon cancer)
  • Parents have a 50% chance of passing it on to their child
  • Construction, open problems, and performance:
  • In the paper
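
The 25% and 50% figures above follow from basic Mendelian inheritance and can be checked by enumerating the four equally likely allele combinations. In this Python sketch, the single-letter allele encoding ("N" for normal, "m" for mutated) and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a: str, parent_b: str) -> dict:
    """Probability of each child genotype.

    Each parent is a two-letter genotype and passes on one of its
    two alleles with equal probability.
    """
    dist = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        genotype = "".join(sorted(allele_a + allele_b))
        dist[genotype] = dist.get(genotype, 0) + Fraction(1, 4)
    return dist


# Recessive case: two carriers of the "minor" form.
carriers = offspring_distribution("Nm", "Nm")
# One parent carrying a single mutated copy, the other carrying none.
one_carrier = offspring_distribution("Nm", "NN")
```

Here `carriers["mm"]` is 1/4, matching the 25% risk of the major variant, and `one_carrier["Nm"]` is 1/2, matching the 50% chance of passing on a single mutated copy.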
