
Fengjun Li (1), Yuxin Chen (1), Bo Luo (1), Dongwon Lee (2), and Peng Liu (2)
(1) EECS Department, University of Kansas; (2) College of IST, Penn State University


  1. Privacy Preserving Group Linkage

  2. Record Linkage
     • Record linkage identifies related records associated with the same entity across multiple databases.
     [Figure: two lists of card numbers, one held by Citi Bank and one by BOA. The numbers 6720 4782 7752 4571, 7856 4420 8201 8835, and 4812 6420 1330 7752 appear in both lists.]

  3. Privacy Preserving Record Linkage
     • Privacy becomes an issue when the data is sensitive.
       – I will share only the “linked records” with you.
       – I will not give you the plaintext of my primary keys.
     • Secure multi-party set intersection problem
       – Solutions based on commutative encryption
       – Solutions based on homomorphic encryption

  4. Solution based on commutative encryption
     • [Agrawal et al., SIGMOD 2003]: … Alice compares … with … to find the intersection.
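
A minimal sketch of how a commutative cipher lets two parties compare sets without exchanging plaintext, in the spirit of the commutative-encryption approach cited on this slide. The modular-exponentiation cipher, the prime `P`, and the names `keygen`, `h`, and `intersection_size` are illustrative assumptions, not taken from the slides, and the parameters are toy values with no security claim.

```python
import hashlib
import secrets
from math import gcd

# Assumption: a Mersenne prime used as a toy modulus (not a security recommendation).
P = 2**127 - 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1, so x -> x^e mod P is a bijection."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if gcd(e, P - 1) == 1:
            return e


def h(record: str) -> int:
    """Hash a record into the range 1..P-1."""
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def intersection_size(alice_records, bob_records) -> int:
    """Simulate both parties locally; in a real run each side keeps its own key."""
    key_a, key_b = keygen(), keygen()
    # Each party encrypts its own hashed records once and sends them over.
    alice_once = [pow(h(r), key_a, P) for r in alice_records]
    bob_once = [pow(h(s), key_b, P) for s in bob_records]
    # Each party encrypts the other's ciphertexts with its own key.
    alice_twice = {pow(c, key_b, P) for c in alice_once}
    bob_twice = {pow(c, key_a, P) for c in bob_once}
    # Commutativity: (x^a)^b == (x^b)^a mod P, so equal records collide.
    return len(alice_twice & bob_twice)
```

Because exponentiation commutes, a record held by both parties yields the same doubly encrypted value regardless of the order in which the two keys were applied, while neither party sees the other's hashed plaintext.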

  5. Solution based on homomorphic encryption
     • [Freedman et al., EUROCRYPT 2004], for sets $\mathbf{R} = \{r_1, r_2, \ldots, r_m\}$ and $\mathbf{S} = \{s_1, s_2, \ldots, s_n\}$:
       1. Construct the polynomial $R(x) = \prod_{i=1}^{m} (x - r_i)$.
       2. Compute the coefficients $\alpha_u$ in $R(x) = \sum_{u=0}^{m} \alpha_u x^u$, and encrypt them with the homomorphic key: $E(\alpha_0), E(\alpha_1), \ldots, E(\alpha_m)$.
       3. Reconstruct the encrypted polynomial: $E(R(x)) = \sum_{u=0}^{m} E(\alpha_u)\, x^u$.
       4. Evaluate $E(R(s_j))$ for each element $s_j$.
       5. Choose random $\gamma$ and $v$, and compute $E(\gamma \times R(s_j) + v)$. For each $s_j \in \mathbf{R} \cap \mathbf{S}$, $R(s_j) = 0$, so $E(\gamma \times R(s_j) + v) = E(v)$.
       6. Decrypt $E(\gamma \times R(s_j) + v)$; the number of results equal to $v$ is $|\mathbf{R} \cap \mathbf{S}|$.
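
The six steps can be sketched with a toy additively homomorphic cipher. This is a hedged illustration, not the cited authors' implementation: the Paillier scheme, the fixed primes, and all function names are assumptions. Set elements are small integers, and the marker `v` is assumed to be known to the decrypting party so that it can count matches. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import secrets
from math import gcd

# Toy Paillier parameters (assumption): two Mersenne primes, no security claim.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def encrypt(m: int) -> int:
    r = secrets.randbelow(N - 1) + 1  # coprime to N with overwhelming probability
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N


def poly_coeffs(roots):
    """Coefficients (lowest degree first) of prod (x - r), reduced mod N."""
    coeffs = [1]
    for r in roots:
        new = [0] * (len(coeffs) + 1)
        for u, a in enumerate(coeffs):
            new[u] = (new[u] - r * a) % N       # contribution of -r * a * x^u
            new[u + 1] = (new[u + 1] + a) % N   # contribution of a * x^(u+1)
        coeffs = new
    return coeffs


def eval_encrypted(enc_coeffs, s: int) -> int:
    """Homomorphically compute E(R(s)) from the encrypted coefficients."""
    acc, power = encrypt(0), 1
    for c in enc_coeffs:
        acc = acc * pow(c, power, N2) % N2  # adds alpha_u * s^u to the plaintext
        power = power * s % N
    return acc


def intersection_size(r_set, s_set) -> int:
    # Steps 1-2: the key holder builds R(x) and encrypts its coefficients.
    enc_coeffs = [encrypt(a) for a in poly_coeffs(r_set)]
    # Steps 3-5: the other party evaluates and masks E(gamma * R(s_j) + v).
    v = secrets.randbelow(N)
    masked = []
    for s in s_set:
        gamma = secrets.randbelow(N - 1) + 1
        masked.append(pow(eval_encrypted(enc_coeffs, s), gamma, N2) * encrypt(v) % N2)
    # Step 6: the key holder decrypts and counts the results equal to v.
    return sum(1 for c in masked if decrypt(c) == v)
```

For an element outside the intersection, the masked plaintext is `gamma * R(s_j) + v` with a fresh random `gamma`, so the decrypted value reveals nothing useful about `s_j` beyond the fact that it did not match.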

  6. Group Linkage
     • Extended from record linkage [On et al., ICDE 2007]
       – Records -> groups of records
     • Group linkage identifies related groups of records associated with the same entity across multiple databases.
     [Figure: the Citi Bank and BOA card-number lists, each divided into groups of records.]

  7. Group Linkage
     • For two sets of groups of records $\mathcal{R} = \{R_1, \ldots, R_u\}$ and $\mathcal{S} = \{S_1, \ldots, S_v\}$, GL calculates the group-level similarity $SIM(R, S)$ and determines whether $R$ and $S$ are associated with the same entity.
       – For $R = \{r_1, \ldots, r_m\}$ and $S = \{s_1, \ldots, s_n\}$, calculate the record-level similarity $sim(r, s)$.
       – $SIM(R, S)$ is a function of $sim(r, s)$.
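
The slide leaves the group-level function open. One possible instance, assumed here purely for illustration, is exact-match record similarity combined with Jaccard group similarity, which has the same k/(|R|+|S|-k) form that the deck uses later for exact matching. The names `record_sim` and `group_sim` are hypothetical.

```python
def record_sim(r, s) -> float:
    """Record-level similarity under exact matching: 1.0 if identical, else 0.0."""
    return 1.0 if r == s else 0.0


def group_sim(group_r, group_s) -> float:
    """Group-level similarity as Jaccard: matched / (|R| + |S| - matched)."""
    r_set, s_set = set(group_r), set(group_s)
    matched = sum(
        1 for r in r_set if any(record_sim(r, s) == 1.0 for s in s_set)
    )
    union = len(r_set) + len(s_set) - matched
    return matched / union if union else 0.0
```

For approximate matching, `record_sim` would return a graded score instead of 0 or 1, and `group_sim` would need a matching rule on top of it; that variant is not sketched here.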

  8. Group Linkage: Exact Matching Example
     [Figure: groups of card numbers held by BOA and Citi Bank, compared by exact matching of their records.]

  9. Group Linkage: Approximate Matching Example
     Two groups of records whose members match only approximately:
     • First group
       – modeling and representation of data and knowledge for scientific domains
       – querying and analysis of scientific data
     • Second group
       – modeling and representation of data, metadata, ontologies, and processes
       – querying of scientific data

  10. Group Membership Inference Problem
     • Two parties share two groups after they confirm that both groups are associated with the same entity.
     • Privacy?
       – They cannot share the intersecting records when the two groups are not linked.
     [Figure: Citi Bank and BOA groups of card numbers that are not linked, marked “No!”.]

  11. Privacy Preserving Group Linkage (PPGL)
     • PPRL protocols can be applied in PPGL
       – Secure set intersection size
       – The intersection size can be used to calculate group-level similarity
     • However, directly applying a PPRL protocol suffers from the group membership inference problem.

  12. Group Membership Inference Problem
     – Identities of overlapping group members can be inferred
     – An attacker can manipulate the group members to infer more
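
A small sketch of one way such manipulation could work when a protocol reveals the intersection size: the attacker submits crafted one-record groups and reads membership off the answer. The oracle and both function names are hypothetical stand-ins for a PPRL-style exchange, not part of the slides.

```python
def make_size_oracle(private_group):
    """Model a party that answers each query with the intersection size only."""
    members = set(private_group)

    def oracle(query_group) -> int:
        return len(members & set(query_group))

    return oracle


def infer_members(oracle, candidates):
    """Probe with one-record groups; a size of 1 reveals that record's membership."""
    return {c for c in candidates if oracle([c]) == 1}


# Usage: the victim's group is never sent in the clear, yet it leaks.
victim = make_size_oracle(["6720 4782 7752 4571", "7856 4420 8201 8835"])
leaked = infer_members(victim, ["6720 4782 7752 4571", "0000 0000 0000 0000"])
```

Here `leaked` ends up containing the first candidate only, which is exactly the kind of inference that answering with a bare YES/NO against a threshold is meant to limit.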

  13. – Alice and Bob negotiate a similarity threshold.
     – For each group-wise comparison, Bob answers only “YES” or “NO” instead of the calculated similarity value.

  14. Threshold Privacy Preserving Group Linkage
     • Alice and Bob preset a threshold θ and follow the protocol to match two groups $R$ and $S$. In the end, they learn only $|R|$, $|S|$, and a Boolean result that is true iff $SIM(R, S) \ge \theta$.
     • We propose three TPPGL protocols for both exact matching and approximate matching
       – K-combination approach for TPPGL-E
       – Homomorphic encryption approach for TPPGL-E
       – TPPGL-A protocol with record-level cut-off

  15. K-combination approach for TPPGL-E
     • Alice has a group of records $R = \{r_1, \ldots, r_m\}$, and Bob has a group of records $S = \{s_1, \ldots, s_n\}$. They negotiate a similarity threshold θ.
     • Calculate the minimum number $k$ of identical records in $R$ and $S$ for them to be linked: $SIM(R, S) = \frac{k}{|R| + |S| - k} \ge \theta$, so $k = \left\lceil \frac{(|R| + |S|)\,\theta}{1 + \theta} \right\rceil$.
     • We enumerate all k-combinations of Alice’s and Bob’s group elements. $R$ and $S$ are linked iff there is at least one identical k-combination.
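
Solving k/(|R|+|S|-k) ≥ θ for k gives k ≥ (|R|+|S|)θ/(1+θ), so the smallest integer that works is the ceiling of that bound. A short sketch of the calculation follows; the function name is hypothetical, and exact rational arithmetic is an implementation choice made here to avoid floating-point rounding at the ceiling.

```python
from fractions import Fraction
from math import ceil


def min_common_records(size_r: int, size_s: int, theta) -> int:
    """Smallest k with k / (size_r + size_s - k) >= theta."""
    t = Fraction(str(theta))  # e.g. 0.7 becomes exactly 7/10
    return ceil((size_r + size_s) * t / (1 + t))


print(min_common_records(4, 3, 0.7))  # prints 3
```

With |R| = 4, |S| = 3, and θ = 0.7 the bound is 4.9/1.7 ≈ 2.88, so three shared records are required.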

  16. Input: Alice’s group $R = \{r_1, \ldots, r_m\}$, Bob’s group $S = \{s_1, \ldots, s_n\}$, and a pre-negotiated similarity threshold θ.
     Protocol:
     • Alice creates her k-combinations and sorts them: $\{A_1, \ldots, A_p\}$, where $p = \binom{m}{k}$; Bob creates his k-combinations and sorts them: $\{B_1, \ldots, B_q\}$, where $q = \binom{n}{k}$.
     • Alice applies a hash function to obtain $\{h(A_1), \ldots, h(A_p)\}$; Bob applies the hash function to obtain $\{h(B_1), \ldots, h(B_q)\}$.
     • Alice encrypts her hashed k-combinations: $\{E_A(h(A_1)), \ldots, E_A(h(A_p))\}$; Bob encrypts his hashed k-combinations: $\{E_B(h(B_1)), \ldots, E_B(h(B_q))\}$.
     • Alice encrypts $\{E_B(h(B_1)), \ldots, E_B(h(B_q))\}$ with $E_A$, and compares $E_A(E_B(h(B)))$ with $E_B(E_A(h(A)))$.
     • If the intersection contains at least one element, the group similarity is at least θ.
     Output: Alice and Bob learn $|R|$, $|S|$, and whether the group similarity is at least θ.
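
The protocol can be simulated end to end in a few lines. This is a sketch under stated assumptions: modular exponentiation over a toy prime stands in for the commutative cipher, both parties run in one process, and the names `keygen`, `h`, and `groups_linked` are hypothetical.

```python
import hashlib
import secrets
from fractions import Fraction
from itertools import combinations
from math import ceil, gcd

P = 2**127 - 1  # assumption: toy Mersenne-prime modulus, no security claim


def keygen() -> int:
    """Secret exponent coprime to P - 1, so encryption is a bijection mod P."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if gcd(e, P - 1) == 1:
            return e


def h(combo) -> int:
    """Hash one sorted k-combination into 1..P-1 (unit separator joins records)."""
    data = "\x1f".join(combo).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1) + 1


def groups_linked(group_r, group_s, theta) -> bool:
    r, s = sorted(set(group_r)), sorted(set(group_s))
    t = Fraction(str(theta))
    k = ceil((len(r) + len(s)) * t / (1 + t))  # minimum number of shared records
    key_a, key_b = keygen(), keygen()
    # Each party hashes and encrypts its own sorted k-combinations.
    a_once = [pow(h(c), key_a, P) for c in combinations(r, k)]
    b_once = [pow(h(c), key_b, P) for c in combinations(s, k)]
    # Each party then encrypts the other's ciphertexts with its own key.
    a_twice = {pow(c, key_b, P) for c in a_once}
    b_twice = {pow(c, key_a, P) for c in b_once}
    # Linked iff at least one doubly encrypted k-combination coincides.
    return not a_twice.isdisjoint(b_twice)
```

Only whole k-combinations are compared, so a match says that at least k records are shared without exposing how many more overlap or which individual records they are.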

  17. K-combination approach: Example
     $k = \left\lceil \frac{(4 + 3) \times 0.7}{1 + 0.7} \right\rceil = 3$

  18. K-combination approach for TPPGL-E
     • Problem?
       – Computation!
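
To make the computation concern concrete, the number of k-combinations each party must hash and encrypt can be counted directly. The helper below is a hypothetical illustration; the group sizes in the usage line are arbitrary examples, not figures from the slides.

```python
from fractions import Fraction
from math import ceil, comb


def combination_counts(m: int, n: int, theta):
    """Return (k, C(m, k), C(n, k)) for group sizes m, n and threshold theta."""
    t = Fraction(str(theta))
    k = ceil((m + n) * t / (1 + t))
    return k, comb(m, k), comb(n, k)


print(combination_counts(4, 3, 0.7))    # prints (3, 4, 1)
print(combination_counts(20, 20, 0.5))  # prints (14, 38760, 38760)
```

Two groups of 20 records each already require tens of thousands of hash and encryption operations per party for a single pairwise comparison, and the counts grow combinatorially with group size.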
