
Genomic Analysis Hoon Cho (MIT) and David Wu (Stanford) March, 2015 - PowerPoint PPT Presentation


  1. Homomorphic Encryption for Genomic Analysis Hoon Cho (MIT) and David Wu (Stanford) March, 2015

  2. Homomorphic Encryption. Homomorphic encryption (HE): encryption schemes that support computation on ciphertexts. Consists of three functions; Enc maps a message m to a ciphertext c under the public key pk, and Dec maps c back to m under the secret key sk. Must satisfy the usual notion of semantic security.

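  3. Homomorphic Encryption. Homomorphic encryption: encryption schemes that support computation on ciphertexts. Consists of three functions (Enc, Eval, Dec). Given $c_1 = \mathrm{Enc}_{pk}(m_1)$ and $c_2 = \mathrm{Enc}_{pk}(m_2)$, $\mathrm{Eval}_f$ with evaluation key $ek$ produces a ciphertext $c_3$, and decryption under $sk$ satisfies $\mathrm{Dec}_{sk}(\mathrm{Eval}_f(ek, c_1, c_2)) = f(m_1, m_2)$.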

  4. Fully Homomorphic Encryption (FHE). Many homomorphic encryption schemes:
      • ElGamal: $f(m_0, m_1) = m_0 m_1$
      • Paillier: $f(m_0, m_1) = m_0 + m_1$
     Fully homomorphic encryption: homomorphic with respect to two operations, addition and multiplication.
      • [BGN05]: one multiplication, many additions (SWHE)
      • [Gen09]: first FHE construction from lattices
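
To make the additive property in the Paillier bullet concrete, here is a minimal sketch of textbook Paillier in Python. It is not taken from the slides: the tiny primes, the choice g = n + 1, and the function names are all assumptions made for illustration, and the parameters are far too small to be secure.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    # Toy primes (assumed for illustration); real keys use primes of 1024+ bits.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # common simplifying choice
    mu = pow(lam, -1, n)                               # inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1               # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    u = pow(c, lam, n * n)                             # equals 1 + (m * lam mod n) * n
    return ((u - 1) // n * mu) % n

def add(pk, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts modulo n.
    n, _ = pk
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    pk, sk = keygen()
    total = add(pk, encrypt(pk, 20), encrypt(pk, 22))
    print(decrypt(pk, sk, total))
```

The party calling `add` never sees 20 or 22, only the two ciphertexts; the sum is recovered only by whoever holds the secret key.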

  5. Task 1: Computing GWAS. Genotypes for different individuals at a fixed location in the genome:
     Case: AA AG AA AG GG
     Control: AG AG GA GG GG
     Minor Allele Frequency (MAF), from the allele counts: $\mathrm{MAF} = \frac{\min(n_A, n_G)}{n_A + n_G}$
     $\chi^2$-statistic: $\chi^2 = \sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$
     Observed (Obs) and expected (Exp) are functions of the different allele counts in the case and control groups.
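
As a worked illustration of the two statistics on this slide, the sketch below computes them in the clear for the example genotypes. The slide does not say how Exp is derived; this sketch assumes the standard 2x2 allelic test (rows: case/control, columns: allele A/G, expected count = row total × column total / grand total). The function names are hypothetical.

```python
from collections import Counter

def allele_counts(genotypes):
    """Count individual alleles across a list of two-letter genotypes."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)
    return counts

def minor_allele_frequency(n_a, n_g):
    return min(n_a, n_g) / (n_a + n_g)

def chi_square(case, control, alleles=("A", "G")):
    # Assumption: 2x2 allelic contingency table with the usual expected counts.
    observed = [[case[a] for a in alleles], [control[a] for a in alleles]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    case = allele_counts(["AA", "AG", "AA", "AG", "GG"])
    control = allele_counts(["AG", "AG", "GA", "GG", "GG"])
    n_a = case["A"] + control["A"]
    n_g = case["G"] + control["G"]
    print(minor_allele_frequency(n_a, n_g), chi_square(case, control))
```

Both quantities involve a division, which is why the deck later has the client compute them from decrypted counts rather than evaluating them homomorphically.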

  6. Limitations of FHE. In theory: SWHE/FHE can evaluate arbitrary functions. But many limitations in practice:
      • Computation must be expressed as an arithmetic circuit; thus, division is hard
      • Performance degrades rapidly in the multiplicative depth of the circuit

  7. Striking a Balance. Observation: allele counts are sufficient for computing MAF and $\chi^2$.
     Minor Allele Frequency: $\mathrm{MAF} = \frac{\min(n_A, n_G)}{n_A + n_G}$
     $\chi^2$-statistic: $\chi^2 = \sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$
     Solution: delegate aggregation to the cloud; the client computes the statistical quantities of interest.

  8. Practical Outsourcing. Solution: delegate aggregation to the cloud; the client computes the statistical quantities of interest. This solution enables the use of symmetric primitives (e.g., AES). Symmetric primitives plus arithmetic are faster than public-key decryption.

  9. Symmetric Encryption. Each genotype is represented as a vector of counts $(n_A, n_C, n_G, n_T)$.
     Encode: AA → $(2, 0, 0, 0)$
     Blind: $(2 + r_A,\; 0 + r_C,\; 0 + r_G,\; 0 + r_T)$
     Encrypt entries by adding independent blinding factors from $\mathbb{Z}_n$.

  10. Symmetric Encryption.
      AA: $(2 + r_A,\; 0 + r_C,\; 0 + r_G,\; 0 + r_T)$
      AG: $(1 + r'_A,\; 0 + r'_C,\; 1 + r'_G,\; 0 + r'_T)$
      Sum: $(3 + r_A + r'_A,\; 0 + r_C + r'_C,\; 1 + r_G + r'_G,\; 0 + r_T + r'_T)$
      Decryption: compute the blinding factors and subtract.

  11. Symmetric Encryption. Generate blinding factors using $\mathrm{PRF}(k, \text{tag})$, where tag = SNP id ‖ group id ‖ subject id.
      AA: $(2 + r_A,\; 0 + r_C,\; 0 + r_G,\; 0 + r_T)$

  12. Symmetric Encryption. Homomorphic operations consist of only additions. Encryption and decryption are symmetric primitives.
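
The blinding scheme on the preceding slides can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: HMAC-SHA256 stands in for the PRF, the modulus is taken to be 2^32, and the tag format (with the allele letter appended so that each vector entry gets its own factor) is hypothetical.

```python
import hashlib
import hmac

MODULUS = 2 ** 32   # assumed size of the group the blinding factors live in
ALLELES = "ACGT"

def prf(key, tag):
    # Assumption: HMAC-SHA256 as the PRF, reduced into Z_MODULUS.
    digest = hmac.new(key, tag.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % MODULUS

def blinding_factors(key, snp_id, group_id, subject_id):
    # Hypothetical tag layout: SNP id | group id | subject id | allele.
    return [prf(key, f"{snp_id}|{group_id}|{subject_id}|{a}") for a in ALLELES]

def encrypt(key, genotype, snp_id, group_id, subject_id):
    counts = [genotype.count(a) for a in ALLELES]           # e.g. "AA" -> [2, 0, 0, 0]
    blinds = blinding_factors(key, snp_id, group_id, subject_id)
    return [(c + r) % MODULUS for c, r in zip(counts, blinds)]

def aggregate(ciphertexts):
    # Done by the cloud: entrywise addition only, no key needed.
    return [sum(column) % MODULUS for column in zip(*ciphertexts)]

def decrypt(key, total, snp_id, group_id, subject_ids):
    # Done by the client: regenerate every blinding factor and subtract.
    result = list(total)
    for subject_id in subject_ids:
        blinds = blinding_factors(key, snp_id, group_id, subject_id)
        result = [(x - r) % MODULUS for x, r in zip(result, blinds)]
    return result

if __name__ == "__main__":
    key = b"demo key"
    cts = [encrypt(key, "AA", "rs1", "case", "s1"),
           encrypt(key, "AG", "rs1", "case", "s2")]
    print(decrypt(key, aggregate(cts), "rs1", "case", ["s1", "s2"]))
```

Note that `decrypt` loops over every subject, so the client's work is linear in the number of contributions.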

  13. Further Improvements. The client must do linear work to decrypt.
      • Alternative: if the data comes in batches, the client can precompute the counts per batch during encryption
      • Decryption time is then proportional to the number of batches

  14. Performance. Timing (in seconds) for computing MAF + $\chi^2$ statistics (500 subjects):

      | # SNPs  | Encryption | Aggregation | Decryption |
      |---------|------------|-------------|------------|
      | 100     | 0.17       | 0.02        | 0.15       |
      | 1,000   | 1.68       | 0.17        | 1.42       |
      | 10,000  | 17.47      | 1.59        | 15.06      |
      | 100,000 | 179.53     | 17.72       | 145.52     |

      Only a few hundred lines to implement!

  15. Task 2: Hamming Distance Computation. Each genome is a list of entries of the form (location of edit, edit):
      Genome A: chr1:101088593: (C → T); chr1:101265309: (C → T); chr1:10165300: (T → G); and so on…
      Genome B: chr1:100011666: (T → C); chr1:101265309: (C → T); chr1:10165300: (T → C); and so on…
      Goal: compute the Hamming distance between two sequences (represented as edits with respect to a reference genome).

  16. Task 2: Hamming Distance Computation.
      Genome A: chr1:101088593: (C → T); chr1:101265309: (C → T); chr1:10165300: (T → G); and so on… expands to ATGCTTAGTGGC…
      Genome B: chr1:100011666: (T → C); chr1:101265309: (C → T); chr1:10165300: (T → C); and so on… expands to ACGCTTGGTGGC…
      Naïve method: expand the sequences, then run a pairwise equality test.

  17. Task 2: Hamming Distance Computation.
      chr1:101088593: (C → T); chr1:101265309: (C → T); chr1:10165300: (T → G); and so on… expands to ATGCTTAGTGGC…
      Sequences are too long: over 3 billion base pairs in the human genome.
      Desire: a protocol with performance proportional to the number of edits.

  18. Task 2: Hamming Distance Computation.
      Genome A: chr1:101088593: (C → T); chr1:101265309: (C → T); chr1:10165300: (T → G); and so on…
      Genome B: chr1:100011666: (T → C); chr1:101265309: (C → T); chr1:10165300: (T → C); and so on…
      View genomes as sets of edits from the reference: $d_H(A, B) = |A| + |B| - 2 \cdot |A \cap B|$

  19. Task 2: Hamming Distance Computation. The problem reduces to set intersection: $d_H(A, B) = |A| + |B| - 2 \cdot |A \cap B|$
      Slight caveat: same location, different edit, e.g. chr1:10165300: (T → G) versus chr1:10165300: (T → C). The contribution to the Hamming distance should be 1.

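  20. Task 2: Hamming Distance Computation. Formulate as two set intersection problems: $d_H(A, B) = |A| + |B| - |A \cap B| - |A_{\mathrm{loc}} \cap B_{\mathrm{loc}}|$
      Here $A \cap B$ is taken over (location, edit) pairs, and $A_{\mathrm{loc}} \cap B_{\mathrm{loc}}$ over locations only.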
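
The two-intersection formula can be checked in the clear on the example edit lists from the earlier slides. The sketch below is plaintext only (no encryption) and assumes each genome has at most one edit per location; the variable and function names are hypothetical.

```python
def hamming_distance(edits_a, edits_b):
    """edits_a, edits_b: sets of (location, edit) pairs relative to a reference."""
    locs_a = {loc for loc, _ in edits_a}
    locs_b = {loc for loc, _ in edits_b}
    same_edit = len(edits_a & edits_b)      # |A ∩ B| over (location, edit) pairs
    same_location = len(locs_a & locs_b)    # |A_loc ∩ B_loc| over locations only
    return len(edits_a) + len(edits_b) - same_edit - same_location

def naive_distance(edits_a, edits_b):
    # The single-intersection formula, which double counts
    # "same location, different edit" positions.
    return len(edits_a) + len(edits_b) - 2 * len(edits_a & edits_b)

genome_a = {("chr1:101088593", "C>T"), ("chr1:101265309", "C>T"), ("chr1:10165300", "T>G")}
genome_b = {("chr1:100011666", "T>C"), ("chr1:101265309", "C>T"), ("chr1:10165300", "T>C")}

if __name__ == "__main__":
    print(hamming_distance(genome_a, genome_b), naive_distance(genome_a, genome_b))
```

On this input the two formulas differ by exactly one, the contribution of chr1:10165300, where both genomes are edited but to different bases.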

  21. Homomorphic Set Intersection.
      Genome A: chr1:101088593: (C → T); chr1:101265309: (C → T); chr1:10165300: (T → G); and so on…
      Genome B: chr1:100011666: (T → C); chr1:101265309: (C → T); chr1:10165300: (T → C); and so on…
      Equality function: $f(x, y) = \mathbf{1}\{x = y\}$
      Simple solution: sum over pairwise equality tests.

  22. Homomorphic Set Intersection. Homomorphic evaluation of the equality function: if $x, y \in \{0, 1\}$, then $f(x, y) = \mathbf{1}\{x = y\} = 1 - (x - y)^2$. Easy to generalize to $n$-bit integers, but this requires a degree-$2n$ homomorphism.
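
One way to read the generalization to n-bit integers is as a product of the single-bit test over all bit positions, which gives a polynomial of degree 2n. The plaintext sketch below uses only additions, subtractions, and multiplications, the operations an arithmetic circuit provides; it runs on ordinary integers rather than ciphertexts, and the helper names are hypothetical.

```python
def bits_of(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]

def bit_equal(x, y):
    # For x, y in {0, 1}: 1 - (x - y)^2 is 1 when x == y and 0 otherwise (degree 2).
    return 1 - (x - y) ** 2

def int_equal(x_bits, y_bits):
    # Product of n degree-2 terms: total degree 2n in the input bits.
    result = 1
    for x, y in zip(x_bits, y_bits):
        result = result * bit_equal(x, y)
    return result

if __name__ == "__main__":
    n = 4
    print(int_equal(bits_of(5, n), bits_of(5, n)))  # equal inputs
    print(int_equal(bits_of(5, n), bits_of(4, n)))  # inputs differ in one bit
```

The growth of the degree with n is the reason the number of bits per element appears as a tunable parameter later in the deck.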

  23. Homomorphic Set Intersection. Hashing to decrease the number of pairwise comparisons.
      Genome A: chr1:101088593: (C → T); chr1:101265309: (C → T); chr1:10165300: (T → G); and so on…
      Genome B: chr1:100011666: (T → C); chr1:101265309: (C → T); chr1:10165300: (T → C); and so on…
      Hash elements into buckets, then run pairwise equality tests on the hashed values within each bucket.

  24. Homomorphic Set Intersection: Tradeoffs.
      More buckets → lower collision rate, possibly more ciphertexts.
      More bits → lower collision rate, more homomorphism for the equality test.
      Larger buckets → less likely that a bucket overflows.
      Tunable parameters:
      • number of buckets
      • bits used to represent each element in a bucket
      • bucket size
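
The bucketing idea and its three tunable parameters can be sketched in the clear as follows. This is an illustration under assumptions, not the protocol from the talk: SHA-256 is an arbitrary choice of hash, the way the bucket index and the short fingerprint are derived from it is hypothetical, and the padding values are picked so that dummy slots never match anything.

```python
import hashlib

def hash_element(element, num_buckets, bits):
    # Assumption: derive a bucket index and a `bits`-bit fingerprint from SHA-256.
    value = int.from_bytes(hashlib.sha256(element.encode()).digest(), "big")
    bucket = value % num_buckets
    fingerprint = (value // num_buckets) % (1 << bits)
    return bucket, fingerprint

def build_table(elements, num_buckets, bucket_size, bits, pad):
    table = [[] for _ in range(num_buckets)]
    for element in elements:
        bucket, fingerprint = hash_element(element, num_buckets, bits)
        if len(table[bucket]) >= bucket_size:
            raise OverflowError("bucket overflow: raise bucket_size or num_buckets")
        table[bucket].append(fingerprint)
    # Pad every bucket to a fixed size so the table shape reveals nothing.
    return [slots + [pad] * (bucket_size - len(slots)) for slots in table]

def intersection_size(set_a, set_b, num_buckets=4, bucket_size=8, bits=32):
    # Pads lie outside the fingerprint range and differ per side,
    # so they never equal a real fingerprint or each other.
    table_a = build_table(set_a, num_buckets, bucket_size, bits, pad=1 << bits)
    table_b = build_table(set_b, num_buckets, bucket_size, bits, pad=(1 << bits) + 1)
    # Equality tests only within matching buckets:
    # num_buckets * bucket_size**2 comparisons instead of |A| * |B|.
    return sum(int(x == y)
               for slots_a, slots_b in zip(table_a, table_b)
               for x in slots_a for y in slots_b)

if __name__ == "__main__":
    a = {"chr1:101088593:C>T", "chr1:101265309:C>T", "chr1:10165300:T>G"}
    b = {"chr1:100011666:T>C", "chr1:101265309:C>T", "chr1:10165300:T>C"}
    print(intersection_size(a, b))
```

Fewer fingerprint bits make each equality test cheaper but raise the chance that two different edits collide and are counted as a match, which is the bits-versus-collision tradeoff listed above.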

  25. Performance. Timing (in seconds) for homomorphic set intersection using HElib:

      | Size of Sets | Key Generation | Hashing | Encryption | Computation | Encryption |
      |--------------|----------------|---------|------------|-------------|------------|
      | 1,000        | 23.80          | 0.007   | 31.97      | 104.16      | 1.78       |
      | 5,000        | 23.36          | 0.025   | 95.38      | 475.37      | 1.78       |
      | 10,000       | 27.14          | 0.093   | 176.50     | 936.64      | 1.91       |

      Primary drawback: key sizes and ciphertext sizes are very large (several hundred MB to just over 1 GB).

  26. Conclusions.
      Task 1: The most efficient solution is to compute counts; symmetric primitives suffice.
      Task 2: Hashing-based homomorphic set intersection can handle edit sets with up to ten thousand elements, but with large parameter sizes.
