Matthias Grossglauser, EPFL CTW 2013



  1. Matthias Grossglauser, EPFL CTW 2013

  2. Search log:

     anonymized user ID   query                                     timestamp
     4417749              care packages                             2006-03-02 09:19:32
     4417749              movies for dogs                           2006-03-02 09:24:14
     4417749              blue book                                 2006-03-03 11:48:52
     4417749              best dog for older owner                  2006-03-06 11:48:24
     4417749              best dog for older owner                  2006-03-06 11:48:24
     4417749              rescue of older dogs                      2006-03-06 11:55:25
     4417749              school supplies for the iraq children     2006-03-06 13:36:33
     4417749              school supplies for the iraq children     2006-03-06 13:36:33
     4417749              pine straw lilburn delivery               2006-03-06 18:35:02
     4417749              pine straw delivery in gwinnett county    2006-03-06 18:36:35
     4417749              landscapers in lilburn ga.                2006-03-06 18:37:26
     4417749              pne straw in lilburn ga.                  2006-03-06 18:38:19
     4417749              pine straw in lilburn ga.                 2006-03-06 18:38:27
     4417749              gwinnett county yellow pages              2006-03-06 18:42:08
     ...

  3. Searches:
     • “landscapers in Lilburn, Ga”
     • “homes sold in shadow lake subdivision gwinnett county georgia”
     • “jarrett t. arnold”, “jack t. arnold”

     4417749 = Thelma Arnold
     • 62-year-old widow and dog owner
     • home: Lilburn, GA

     AOL press release:
     • “There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information.”

     Heads had to roll…
     • AOL CTO Maureen Govern (+2 others) fired

  4. Personally identifiable information (PII):
     “information that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual” (Wikipedia)

     Name      Home   Work
     Adam      A      EPFL
     Barbara   B      EPFL
     Carlos    A      UNIL

     [Figure labels: homes A and B; workplaces UNIL and EPFL]

  5. Adversary has:
     • Anonymized network = unlabeled graph
     • Side information: subgraph; statistics on certain nodes; noisy version of whole network; …
     [Figure labels: anonymized social network; side information (Adam, Barbara, Carlos)]

  6. Other applications:
     • Find overlap in networks: social networks from different domains & time slots
     • Identify viruses by function-call patterns
     • Computer vision: matching segment graphs for different viewing angles
     • …
     [Figure labels: matching nodes (peter.muster@epfl.ch, 021-693-1233) by structure only]

  7. Fundamental feasibility w/o side information, but with ∞ time and memory


  10. Is it fundamentally hard or easy to match similar graphs by structure?
      Fundamental =
      • Information-theoretic: ignore computational & memory cost
      • Hard: in addition to second graph, no other side information
      • Demanding: want to match every vertex

  11. $G(n, p(n))$:
      • First published 1959 by Erdős & Rényi
      • Focus on existence results
      • Large $n$ asymptotics and phase transitions:
        • Connectivity
        • Existence of subgraphs
        • Giant component
        • Chromatic number
        • Automorphism group
        • …
      Threshold for asymmetry: $p = \log n / n$

  12. [Figure: asymmetric graph, $|\mathrm{Aut}(G)| = 1$; symmetric graph, $|\mathrm{Aut}(G)| = 12$]
      $|\mathrm{Aut}(G)|$ = size of automorphism group

  13. [Figure: generator $G = G(n, p)$ (“real” social ties); each edge sampled ($s$) or not sampled ($1 - s$) into phone calls and emails]
      $s$ measures similarity
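
The sampling model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: it assumes a generator graph drawn from G(n, p) whose edges are copied independently with probability s into each of two observed graphs. The function name `sample_correlated_graphs` and the edge representation (sorted vertex pairs) are hypothetical choices made here.

```python
import random
from itertools import combinations

def sample_correlated_graphs(n, p, s, seed=None):
    """Draw a generator G ~ G(n, p), then copy each of its edges
    independently with probability s into G1 and, separately, into G2.

    Edges are stored as tuples (u, v) with u < v.
    """
    rng = random.Random(seed)
    generator, g1, g2 = set(), set(), set()
    for edge in combinations(range(n), 2):
        if rng.random() < p:          # edge exists in the generator
            generator.add(edge)
            if rng.random() < s:      # sampled into the first graph
                g1.add(edge)
            if rng.random() < s:      # sampled into the second graph
                g2.add(edge)
    return generator, g1, g2

if __name__ == "__main__":
    g, g1, g2 = sample_correlated_graphs(n=200, p=0.1, s=0.8, seed=0)
    print(len(g), len(g1), len(g2), len(g1 & g2))
```

Under these assumptions an edge of the generator survives in both observed graphs with probability s squared, which is why s acts as a similarity parameter.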

  14. $n!$ possible mappings!
      [Figure labels: $\Delta_{\pi_0} = 0$; $\Delta_{\pi} = 2$]

  15. Assumption:
      • Attacker has infinite computational power
      • Can try all possible mappings $\pi$ and compute edge mismatch function $\Delta(\pi)$
      Question:
      • Are there conditions on $p$, $s$ such that $P(\pi_0 \text{ unique min of } \Delta(\pi)) \to 1$?
      • If yes: adversary would be able to match vertex sets only through the structure of the two networks!
      Note:
      • $G(n, p; s)$ model: statistically uniform, low clustering, degree distribution not skewed -> conjecture: harder than real networks
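
The "infinite computational power" attacker can be made concrete for tiny graphs. The sketch below is an illustration under assumptions, not the talk's code: it takes the edge mismatch of a mapping to be the number of vertex pairs that are an edge in exactly one of the two graphs after relabeling, and it enumerates all n! mappings. The names `edge_mismatch` and `brute_force_match` are hypothetical.

```python
from itertools import permutations

def edge_mismatch(g1, g2, perm):
    """Number of vertex pairs that are an edge in exactly one of
    perm(G1) and G2. Graphs are sets of tuples (u, v) with u < v;
    perm[u] is the image of vertex u."""
    mapped = {tuple(sorted((perm[u], perm[v]))) for u, v in g1}
    return len(mapped ^ g2)

def brute_force_match(n, g1, g2):
    """Try all n! mappings and return (list of minimizers, minimum mismatch).
    Only feasible for very small n."""
    best, best_val = [], None
    for perm in permutations(range(n)):
        val = edge_mismatch(g1, g2, perm)
        if best_val is None or val < best_val:
            best, best_val = [perm], val
        elif val == best_val:
            best.append(perm)
    return best, best_val

if __name__ == "__main__":
    # A small graph with no non-trivial automorphism, matched against itself.
    g = {(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 5)}
    minimizers, value = brute_force_match(6, g, g)
    print(minimizers, value)
```

If the list of minimizers contains only the identity, the structure alone pins down the vertex correspondence, which is the event the question on this slide asks about.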

  16. Theorem:
      • For the $G(n, p; s)$ matching problem, if
        $nps \cdot \frac{s^2}{2 - s} = 8 \log n + \omega(1)$
        then the identity permutation minimizes $\Delta(\cdot)$ a.a.s.
      • Annotations on the formula: $nps$: E[degree] of $G_{1,2}$; threshold for $|\mathrm{Aut}(G)| = 1$; penalty for difference $G_1$-$G_2$; “growing slowly”
      Interpretation: two pieces of bad/good news
      • Surprisingly weak condition: degree growing faster than ~ $\log n$ enough to break anonymity
      • Decrease with $s$ only quadratic
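
To get a feel for the condition in the theorem, one can solve it for the mean degree nps and drop the slowly growing ω(1) term. The helper below is a rough numerical sketch under that simplification; `mean_degree_threshold` is a hypothetical name, and the numbers are only the leading term, not a finite-n guarantee.

```python
import math

def mean_degree_threshold(n, s):
    """Leading term of the mean degree nps required by the condition
    nps * s^2 / (2 - s) = 8 log n + omega(1), ignoring the omega(1) slack."""
    if not 0 < s <= 1:
        raise ValueError("s must lie in (0, 1]")
    return 8 * math.log(n) * (2 - s) / s**2

if __name__ == "__main__":
    for s in (1.0, 0.75, 0.5):
        print(s, round(mean_degree_threshold(10**6, s), 1))
```

Halving s from 1 to 0.5 multiplies the leading term by (2 - 0.5) / 0.25 = 6, which illustrates the roughly quadratic penalty in s mentioned in the interpretation.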

  17. Fix a particular map $\pi$
      • $V_\pi$: set of mismatched nodes under $\pi$
      [Figure labels: $G_1$; $G_2$; π ∈ Π 11; transposition; invariant edge]

  18. $E_\pi = V \times V_\pi$: all the edges modified under $\pi$
      • $V_\pi$: $k$ nodes; the other $n - k$ nodes
      • $\Delta_0$: each edge contributes Bernoulli($2ps(1 - s)$): sampling errors
      • $\Delta_\pi$: each pair of edges contributes Bernoulli($2ps(1 - ps)$): matching errors
      [Figure labels: nodes 1, 2, 3, 4, 5, …, n; edge lists 12, 13, 14, 15, 23, 24, 25, 34, 35, 45, …, 1n, 2n]
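
The two Bernoulli parameters on this slide can be checked by hand. The short derivation below assumes the G(n, p; s) model (each generator edge exists with probability p and is sampled into G_1 and G_2 independently with probability s); the indicator notation and the names e, e' are introduced here for illustration only.

```latex
% Same vertex pair e in both graphs (pair not moved by \pi):
% a mismatch needs e in the generator and sampled into exactly one graph.
P\big(\mathbf{1}\{e \in G_1\} \neq \mathbf{1}\{e \in G_2\}\big)
  = p \cdot 2s(1 - s) = 2ps(1 - s)

% Two distinct vertex pairs e \neq e' = \pi(e): the two indicators are
% independent, each equal to 1 with probability ps.
P\big(\mathbf{1}\{e \in G_1\} \neq \mathbf{1}\{e' \in G_2\}\big)
  = 2 \cdot ps \cdot (1 - ps) = 2ps(1 - ps)
```

Since ps(1 - ps) exceeds ps(1 - s) whenever p < 1, a moved pair is more likely to produce a mismatch than an unmoved one, which is what lets the identity mapping stand out.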

  19. $G(n, p; s, t)$ matching problem

  20. Result:
      • Dependence on $n$ still the same: $nps = c(s, t) \log n + \omega(1)$
      • Dependence on $s$ and $t$ less intuitive
      Interpretation:
      • Node mismatch does not help/hurt too much either

  21. Phase transition, and an efficient & tractable matching algorithm…

  22. INPUT: Seed map of known pairs
      Propagate the map to “similar” neighbors on left and right
      [A. Narayanan, V. Shmatikov, "De-anonymizing social networks", IEEE Symp. on Security and Privacy, 2009]

  23. Similarity metric: sim(A, B) [formula in A and B not recoverable from the extracted text]

  24. Find max sim(u,v)
      Continue until done… …or blocked

  25. • How many seeds are needed?
      • Is there a phase transition?
      • How efficiently can we match?
      • Tuning parameters?
      [A. Narayanan, V. Shmatikov, "De-anonymizing social networks", IEEE Symp. on Security and Privacy, 2009]

  26. [Figure labels: $G_1$, $G_2$]

  27. If ≥ $r$ matched neighbors → match
      [Figure labels: $G_1$; $G_2$; matching error]
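
The rule on this slide (match a pair once it has at least r already-matched neighbor pairs) can be sketched as a simple propagation loop. This is one possible reading of the rule, written for illustration: the function name `percolation_graph_matching`, the credit bookkeeping, and the first-come tie handling are assumptions, not details taken from the slides.

```python
from collections import defaultdict, deque

def percolation_graph_matching(adj1, adj2, seeds, r):
    """Propagate a seed map between two graphs.

    adj1, adj2: dict mapping each node to the set of its neighbors.
    seeds: dict mapping seed nodes of graph 1 to nodes of graph 2.
    r: number of matched neighbor pairs a candidate pair needs.
    Returns the final map as a dict (seeds included).
    """
    matched12 = dict(seeds)               # graph-1 node -> graph-2 node
    matched2 = set(seeds.values())        # graph-2 nodes already used
    credits = defaultdict(int)            # candidate pair -> credits so far
    queue = deque(seeds.items())          # matched pairs not yet spread
    while queue:
        u1, u2 = queue.popleft()
        # The matched pair (u1, u2) gives one credit to every pair of
        # still-unmatched neighbors (v1, v2).
        for v1 in adj1[u1]:
            for v2 in adj2[u2]:
                if v1 in matched12 or v2 in matched2:
                    continue
                credits[(v1, v2)] += 1
                if credits[(v1, v2)] >= r:
                    matched12[v1] = v2
                    matched2.add(v2)
                    queue.append((v1, v2))
    return matched12

if __name__ == "__main__":
    adj = {0: {2, 3}, 1: {2}, 2: {0, 1, 3, 4}, 3: {0, 2, 4}, 4: {2, 3}}
    print(percolation_graph_matching(adj, adj, {0: 0, 1: 1}, r=2))
```

If the queue empties before every node is matched, the process is blocked; with too few seeds relative to r this happens immediately, which is the behavior the phase-transition question is about.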

  28. $G(n, p)$

  29. Extinction prob. of branching process (failure rate):
      • $np < 1$: consumption > production, P( ) = 1
      • $np > 1$: production > consumption, P( ) = 0

  30. Activation from $r$ neighbors
      [S. Janson, T. Łuczak, T. Turova, T. Vallier, Bootstrap Percolation on the Random Graph $G(n, p)$, Annals of Applied Prob., 22(5), 2012]
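
Bootstrap percolation with activation threshold r is easy to simulate on a given graph. The sketch below is a minimal illustration, assuming the usual rule that an inactive node becomes active once r of its neighbors are active and then stays active; `bootstrap_percolation` is a hypothetical name.

```python
def bootstrap_percolation(adj, seeds, r):
    """Run bootstrap percolation on a graph.

    adj: dict mapping each node to the set of its neighbors.
    seeds: iterable of initially active nodes (must be keys of adj).
    r: number of active neighbors needed to activate a node.
    Returns the final set of active nodes.
    """
    active = set(seeds)
    marks = {v: 0 for v in adj}       # active neighbors counted so far
    frontier = list(active)           # active nodes not yet processed
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v in active:
                continue
            marks[v] += 1             # u contributes exactly one mark to v
            if marks[v] >= r:
                active.add(v)
                frontier.append(v)
    return active

if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(bootstrap_percolation(path, [0], r=1))
    print(bootstrap_percolation(path, [0, 2], r=2))
```

Each active node is processed once and hands out one mark per inactive neighbor, so the run time is linear in the number of edges.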

  31. $np = \omega(1)$:
      • consumption > production: P( ) = 1
      • production > consumption: P( ) = 0
      [Figure labels: $t_c$, $a_c$]

  32. Theorem: phase transition in # seeds
      • For $n^{-1} \ll ps \ll s\,n^{-\frac{1}{2} - \frac{3}{2r}}$:
        • If $a / a_c \to \alpha < 1$, final map is $o(n)$ w.h.p.
        • If $a / a_c > \alpha > 1$, final map is $n - o(n)$ w.h.p.
      • Seed set size threshold:
        • $a_c = (1 - r^{-1})\, t_c$
        • $t_c = \left( \frac{(r-1)!}{n\,(ps^2)^r} \right)^{1/(r-1)}$
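
The seed threshold can be evaluated numerically. The helper below is a sketch that assumes the reading t_c = ((r-1)! / (n (p s^2)^r))^(1/(r-1)) and a_c = (1 - 1/r) t_c, which matches the bootstrap-percolation threshold of Janson et al. with edge probability p s^2; the function name `seed_threshold` is hypothetical, and the values are asymptotic leading terms rather than exact finite-n cutoffs.

```python
import math

def seed_threshold(n, p, s, r):
    """Return (a_c, t_c) for percolation graph matching, assuming
    t_c = ((r-1)! / (n * (p*s^2)^r))^(1/(r-1)) and a_c = (1 - 1/r) * t_c."""
    if r < 2:
        raise ValueError("r must be at least 2")
    t_c = (math.factorial(r - 1) / (n * (p * s**2) ** r)) ** (1.0 / (r - 1))
    a_c = (1.0 - 1.0 / r) * t_c
    return a_c, t_c

if __name__ == "__main__":
    a_c, t_c = seed_threshold(n=10**6, p=2e-5, s=0.8, r=4)
    print(round(a_c), round(t_c))
```

Because p s^2 enters with exponent r / (r - 1), small changes in the sampling probability s shift the required number of seeds substantially.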

  33. Bootstrap percolation in $G(n, p)$:
      • # credits of node $i$ at time $t$: i.i.d. Binomials
      Percolation graph matching in $G(n, p; s)$:
      • # credits of pair $(i, j)$ at time $t$: dependent, different Binomials
      • As long as no matching error so far, increments at $t$:
        • Different: $(i, i) \sim \mathrm{Ber}(ps^2)$, $(i, j) \sim \mathrm{Ber}((ps)^2)$
        • Dependent: for $i$, $i'$, $j$ all different: $P((i, j){+}{+}) = (ps)^2$, $P((i, j){+}{+} \mid (i', j){+}{+}) = ps$
      [Figure labels: $G$, $G_1$, $G_2$]

  34. Approach:
      • Focus on regime where $X$ = no bad pair $(i, j)$ gets enough credits ($r$) to be potentially matched
        • True for $ps \ll n^{-\frac{1}{2} - \frac{3}{2r}}$
        • Need to choose $r$ large enough (sparse graphs: $r \geq 4$, otherwise higher)
      • Conditional on $X$, only need to focus on good pairs $(i, i)$
        • Equivalence with bootstrap problem → does it percolate?
        • Need to have $n^{-1} \ll ps$
        • Need to have seed set size $a > a_c$ large enough


  39. How to get started in practice

  40. Question:
      • Can similar idea inform algorithm design?
      Wishlist:
      • Cold-start: how to match without seeds?
      • Sparse graphs: how to avoid blocking?
      • Error propagation: how to correct mismatches?

  41. [Figure: graph with two seeds (seed1, seed2) and four node fingerprints]
      • Fingerprint: (deg=3, dist(seed1)=3, dist(seed2)=1)
      • Fingerprint: (deg=4, dist(seed1)=1, dist(seed2)=3)
      • Fingerprint: (deg=3, dist(seed1)=1, dist(seed2)=3)
      • Fingerprint: (deg=1, dist(seed1)=4, dist(seed2)=2)
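
Fingerprints of the form (degree, distance to seed1, distance to seed2) can be computed with one breadth-first search per seed. The sketch below shows one way to do this; how the talk's algorithm actually uses the fingerprints is not stated on the slide, and the function names `bfs_distances` and `fingerprints` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def fingerprints(adj, seeds):
    """Map each node to (degree, dist to seeds[0], dist to seeds[1], ...).
    A distance is None when the seed is unreachable from the node."""
    dists = [bfs_distances(adj, s) for s in seeds]
    return {v: (len(adj[v]),) + tuple(d.get(v) for d in dists) for v in adj}

if __name__ == "__main__":
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    for node, fp in sorted(fingerprints(adj, ["a", "c"]).items()):
        print(node, fp)
```

Nodes in two graphs with equal or nearly equal fingerprints would be natural candidate pairs, under the assumption that degrees and seed distances are mostly preserved between the two networks.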
