Matthias Grossglauser, EPFL, CTW 2013



  1. Matthias Grossglauser, EPFL, CTW 2013

  2. Search log excerpt (first column: anonymized user ID):
     - 4417749 | care packages | 2006-03-02 09:19:32
     - 4417749 | movies for dogs | 2006-03-02 09:24:14
     - 4417749 | blue book | 2006-03-03 11:48:52
     - 4417749 | best dog for older owner | 2006-03-06 11:48:24
     - 4417749 | best dog for older owner | 2006-03-06 11:48:24
     - 4417749 | rescue of older dogs | 2006-03-06 11:55:25
     - 4417749 | school supplies for the iraq children | 2006-03-06 13:36:33
     - 4417749 | school supplies for the iraq children | 2006-03-06 13:36:33
     - 4417749 | pine straw lilburn delivery | 2006-03-06 18:35:02
     - 4417749 | pine straw delivery in gwinnett county | 2006-03-06 18:36:35
     - 4417749 | landscapers in lilburn ga. | 2006-03-06 18:37:26
     - 4417749 | pne straw in lilburn ga. | 2006-03-06 18:38:19
     - 4417749 | pine straw in lilburn ga. | 2006-03-06 18:38:27
     - 4417749 | gwinnett county yellow pages | 2006-03-06 18:42:08
     - ...

  3. Searches:
     - "landscapers in Lilburn, Ga"
     - "homes sold in shadow lake subdivision gwinnett county georgia"
     - "jarrett t. arnold", "jack t. arnold"

     4417749 = Thelma Arnold
     - 62 years old, widow and dog owner
     - home: Lilburn, GA

     AOL press release:
     - "There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information."

     Heads had to roll...
     - AOL CTO Maureen Govern (+2 others) fired

  4. Personally identifiable information (PII): "information that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual" (Wikipedia)

     Name    | Home | Work
     Adam    | A    | EPFL
     Barbara | B    | EPFL
     Carlos  | A    | UNIL

     [Figure: homes A and B, workplaces UNIL and EPFL]
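
The toy table on this slide can be used to illustrate the "used with other sources" part of the definition. The sketch below is only an illustration: the function name `uniquely_identified` and the column indices are assumptions, not something from the talk. It counts which people are singled out by one attribute alone versus by the combination of home and work.

```python
from collections import Counter

# Toy table from the slide: (name, home, work)
records = [
    ("Adam", "A", "EPFL"),
    ("Barbara", "B", "EPFL"),
    ("Carlos", "A", "UNIL"),
]

def uniquely_identified(rows, columns):
    """Return the names whose values in the given column indices occur exactly once."""
    def key(row):
        return tuple(row[c] for c in columns)
    counts = Counter(key(row) for row in rows)
    return [row[0] for row in rows if counts[key(row)] == 1]

home_only = uniquely_identified(records, [1])         # unique by home alone
work_only = uniquely_identified(records, [2])         # unique by workplace alone
home_and_work = uniquely_identified(records, [1, 2])  # unique by the combination
```

Neither attribute alone singles out Adam, but the pair (home, work) does, which is the sense in which harmless-looking attributes become identifying once combined.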

  5. Adversary has:
     - Anonymized network = unlabeled graph
     - Side information: subgraph; statistics on certain nodes; noisy version of whole network; ...

     [Figure: anonymized social network and side information (Adam, Barbara, Carlos)]

  6. Other applications:
     - Find overlap in networks:
     - Social networks from different domains & time slots
     - Identify viruses by function-call patterns
     - Computer vision: matching segment graphs for different viewing angles
     - ...

     [Figure: matching nodes (e.g. peter.muster@epfl.ch, 021-693-1233) by structure only]

  7. Fundamental feasibility w/o side information, but with ∞ time and memory

  8.

  9.

  10. Is it fundamentally hard or easy to match similar graphs by structure?
      - Fundamental = information-theoretic: ignore computational & memory cost
      - Hard: in addition to second graph, no other side information
      - Demanding: want to match every vertex

  11. $G(n, p(n))$:
      - First published 1959 by Erdős & Rényi
      - Focus on existence results
      - Large $n$ asymptotics and phase transitions:
        - Connectivity
        - Existence of subgraphs
        - Giant component
        - Chromatic number
        - Automorphism group
        - ...
      - Threshold for asymmetry: $p = \log n / n$

  12. Asymmetric graph: AuG = 1. Symmetric graph: AuG = 12. (AuG = size of automorphism group)
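
To make "size of the automorphism group" concrete, here is a minimal brute-force sketch. The two example graphs are assumptions chosen for illustration (a 6-cycle and a small asymmetric tree); the slide's own pictures are not recoverable, so they need not be the same graphs. The function tries every vertex permutation, so it is only usable for very small graphs.

```python
from itertools import permutations

def automorphism_count(n, edges):
    """Count vertex permutations that map the edge set onto itself (O(n!) brute force)."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == edge_set:
            count += 1
    return count

# 6-cycle: rotations and reflections give a dihedral symmetry group.
hexagon = [(i, (i + 1) % 6) for i in range(6)]

# Tree with a center (node 0) and three legs of lengths 1, 2 and 3:
# the legs are all different, so no non-trivial symmetry exists.
spider = [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5), (5, 6)]

hexagon_aut = automorphism_count(6, hexagon)
spider_aut = automorphism_count(7, spider)
```

A graph with automorphism count 1 is the only kind for which structure alone can pin down every vertex, which is why the asymmetry threshold matters for matching.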

  13. Generator $G = G(n,p)$: "real" social ties. Each edge is sampled (probability $s$) or not sampled (probability $1 - s$) to produce the observed networks (phone calls, emails). $s$ measures similarity.
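
The sampling model on this slide can be sketched as follows. This is an illustration under stated assumptions: the function name `sample_gnps` is hypothetical, and the two observed graphs are assumed to be sampled independently of each other from the same generator graph, each keeping an edge with probability `s`.

```python
import random

def sample_gnps(n, p, s, rng):
    """Sample a generator graph G(n, p) and two edge-sampled copies.

    Returns (g, g1, g2) as sets of edges (u, v) with u < v.
    Assumption: each copy keeps every generator edge independently with probability s.
    """
    g, g1, g2 = set(), set(), set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:          # edge exists in the generator graph
                g.add((u, v))
                if rng.random() < s:      # edge observed in the first network
                    g1.add((u, v))
                if rng.random() < s:      # edge observed in the second network
                    g2.add((u, v))
    return g, g1, g2

rng = random.Random(0)
g, g1, g2 = sample_gnps(200, 0.1, 0.8, rng)
overlap = len(g1 & g2)  # edges visible in both observed networks
```

With `s` close to 1 the two observed graphs are nearly identical; as `s` decreases they share fewer edges and become harder to align.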

  14. $n!$ possible mappings! [Figure example: $\Delta(\pi_0) = 0$, $\Delta(\pi) = 2$]

  15. Assumption:
      - Attacker has infinite computational power
      - Can try all possible mappings $\pi$ and compute edge mismatch function $\Delta(\pi)$

      Question:
      - Are there conditions on $p$, $s$ such that $P(\pi_0 \text{ unique min of } \Delta(\pi)) \to 1$?
      - If yes: adversary would be able to match vertex sets only through the structure of the two networks!

      Note:
      - $G(n,p;s)$ model: statistically uniform, low clustering, degree distribution not skewed -> conjecture: harder than real networks
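
The exhaustive attacker described above can be sketched for tiny graphs. This is only an illustration: the helper names are hypothetical, and the mismatch is taken to be the number of node pairs that are an edge in exactly one of the two graphs after relabeling the first graph by the mapping, which is one natural reading of the edge mismatch function.

```python
from itertools import permutations

def edge_mismatch(perm, edges1, edges2):
    """Number of edges present in exactly one of perm(G1) and G2."""
    mapped = {frozenset((perm[u], perm[v])) for u, v in edges1}
    return len(mapped ^ {frozenset(e) for e in edges2})

def best_mappings(n, edges1, edges2):
    """Try all n! mappings; return the minimum mismatch and every mapping attaining it."""
    best, argbest = None, []
    for perm in permutations(range(n)):
        d = edge_mismatch(perm, edges1, edges2)
        if best is None or d < best:
            best, argbest = d, [perm]
        elif d == best:
            argbest.append(perm)
    return best, argbest

# Swapping two nodes of a 3-node path creates two mismatched edges.
path = [(0, 1), (1, 2)]
swap_cost = edge_mismatch((1, 0, 2), path, path)

# Asymmetric tree (legs of length 1, 2, 3 around node 0) matched against itself:
# only the identity mapping reaches mismatch 0.
spider = [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5), (5, 6)]
min_cost, minimizers = best_mappings(7, spider, spider)
```

The loop over `permutations` is exactly the "infinite computational power" assumption: it is feasible here only because the example has 7 nodes.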

  16. Theorem: For the $G(n,p;s)$ matching problem, if

      $$nps \cdot \frac{s^2}{2-s} = 8 \log n + \omega(1),$$

      then the identity permutation minimizes $\Delta(\cdot)$ a.a.s.
      - $nps$: E[degree] of $G_{1,2}$
      - $\log n$: threshold for aug(G) = 1
      - $\frac{s^2}{2-s}$: penalty for difference $G_1$-$G_2$
      - $\omega(1)$: "growing slowly"

      Interpretation: two pieces of bad/good news
      - Surprisingly weak condition: degree growing faster than ~ $\log n$ enough to break anonymity
      - Decrease with $s$ only quadratic
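
As a rough feel for the numbers, the sketch below reads the condition as $nps \cdot s^2/(2-s) = 8 \log n + \omega(1)$, drops the slowly growing $\omega(1)$ term, and solves for the expected degree $nps$. Both that reading and the use of the natural logarithm are assumptions, and the theorem is asymptotic, so the values are indicative only.

```python
import math

def min_expected_degree(n, s):
    """Expected degree nps at which nps * s^2 / (2 - s) equals 8 * ln(n).

    Assumption: natural logarithm, omega(1) term ignored.
    """
    return 8.0 * math.log(n) * (2.0 - s) / s ** 2

n = 10 ** 6
identical_graphs = min_expected_degree(n, 1.0)   # s = 1: both graphs equal the generator
half_sampled = min_expected_degree(n, 0.5)       # s = 0.5: penalty factor (2 - s) / s^2 = 6
penalty_ratio = half_sampled / identical_graphs
```

Halving `s` multiplies the required expected degree by 6 in this reading, which is the "only quadratic" decrease in `s` mentioned in the interpretation.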

  17. Fix a particular map $\pi \in \Pi$
      - $V_\pi$: set of mismatched nodes under $\pi$

      [Figure: graphs $G_1$ and $G_2$ under $\pi$, showing a transposition and an invariant edge]

  18. $E_\pi = V \times V_\pi$: all the edges modified under $\pi$ ($V_\pi$: $k$ nodes; remaining: $n - k$ nodes)
      - $\Delta_0$: each edge contributes Bernoulli($2ps(1-s)$): sampling errors
      - $\Delta_\pi$: each pair of edges contributes Bernoulli($2ps(1-ps)$): matching errors

      [Figure: nodes 1, ..., n and two lists of node pairs 12, 13, 14, 15, 23, 24, 25, 34, 35, 45, ..., 1n, 2n]
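
The two Bernoulli parameters on this slide can be checked with a short calculation. The derivation below is a sketch that assumes each generator edge exists with probability $p$ and is kept in $G_1$ and in $G_2$ independently with probability $s$, and that for a node pair $e$ moved by $\pi$ the pairs $e$ and $\pi(e)$ are distinct and therefore independent.

```latex
% Identity mapping: the same node pair e is compared in G_1 and G_2.
% A mismatch needs e in the generator and e kept in exactly one copy.
P\big(\mathbf{1}\{e \in G_1\} \neq \mathbf{1}\{e \in G_2\}\big)
  = p \,\big[\, s(1-s) + (1-s)s \,\big]
  = 2ps(1-s)

% Wrong mapping: e in G_1 is compared with a different pair \pi(e) in G_2.
% Each of the two indicators is 1 with probability ps, independently.
P\big(\mathbf{1}\{e \in G_1\} \neq \mathbf{1}\{\pi(e) \in G_2\}\big)
  = ps(1-ps) + (1-ps)ps
  = 2ps(1-ps)
```

Since $1 - ps \geq 1 - s$ whenever $p \leq 1$, a moved pair is at least as likely to produce a mismatch as a correctly matched one, which is what lets the identity mapping win once there are enough edges.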

  19. $G(n, p; s, t)$ matching problem

  20. Result:
      - Dependence on $n$ still the same: $nps = c(s,t) \log n + \omega(1)$
      - Dependence on $s$ and $t$ less intuitive

      Interpretation:
      - Node mismatch does not help/hurt too much either

  21. Phase transition, and an efficient & tractable matching algorithm...

  22. INPUT: Seed map of known pairs. Propagate the map to "similar" neighbors on left and right. [A. Narayanan, V. Shmatikov, "De-anonymizing social networks", IEEE Symp. on Security and Privacy, 2009]

  23. Similarity metric: sim(A, B), computed from the sets A and B

  24. Find max sim(u,v). Continue until done... or blocked.

  25. - How many seeds are needed?
      - Is there a phase transition?
      - How efficiently can we match?
      - Tuning parameters?

      [A. Narayanan, V. Shmatikov, "De-anonymizing social networks", IEEE Symp. on Security and Privacy, 2009]

  26. [Figure: graphs $G_1$ and $G_2$]

  27. If ≥ $r$ matched neighbors: match. [Figure: $G_1$ and $G_2$, with a matching error marked]
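
The threshold rule on this slide ("match a pair once it has at least r matched neighbors") can be sketched as a credit-passing loop. This is a minimal illustration, not the talk's implementation: the function names are hypothetical, graphs are plain adjacency dictionaries, and ties are resolved simply by processing order.

```python
from collections import defaultdict, deque

def adjacency(n, edges):
    """Build an undirected adjacency dictionary on nodes 0..n-1."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def percolation_match(adj1, adj2, seeds, r):
    """Grow a node mapping G1 -> G2 from a seed map using threshold r."""
    match12 = dict(seeds)                              # G1 node -> G2 node
    match21 = {v2: v1 for v1, v2 in match12.items()}   # reverse map
    credits = defaultdict(int)                         # credits per candidate pair
    queue = deque(match12.items())                     # matched pairs still to spread
    while queue:
        u1, u2 = queue.popleft()
        # The matched pair gives one credit to every pair of unmatched neighbors.
        for v1 in adj1[u1]:
            for v2 in adj2[u2]:
                if v1 in match12 or v2 in match21:
                    continue
                credits[(v1, v2)] += 1
                if credits[(v1, v2)] >= r:
                    match12[v1] = v2
                    match21[v2] = v1
                    queue.append((v1, v2))
    return match12

edges = [(0, 3), (1, 3), (1, 4), (2, 4), (3, 5), (4, 5)]
g1 = adjacency(6, edges)
g2 = adjacency(6, edges)          # identical copy, so the true map is the identity
full_map = percolation_match(g1, g2, {0: 0, 1: 1, 2: 2}, r=2)
blocked_map = percolation_match(g1, g2, {0: 0}, r=2)
```

With three seeds the map spreads to all six nodes; with a single seed no pair ever collects two credits and the process is blocked at the seed, a small-scale picture of the seed-count question raised earlier.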

  28. $G(n, p)$

  29. Extinction probability of a branching process (failure rate):
      - $np < 1$: consumption > production
      - $np > 1$: production > consumption

      [Figure labels: P( ) = 1, P( ) = 0]
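
The extinction probability mentioned on this slide can be computed numerically. The sketch below assumes Poisson offspring with mean `lam` standing in for $np$, which is the usual large-$n$ approximation of Binomial($n$, $p$) and is an assumption here, not something stated on the slide. The extinction probability is then the smallest solution of $q = e^{\lambda (q - 1)}$, found by iterating from $q = 0$.

```python
import math

def extinction_probability(lam, iterations=500):
    """Smallest fixed point of q = exp(lam * (q - 1)), via iteration from q = 0.

    Assumption: Poisson(lam) offspring distribution.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(lam * (q - 1.0))
    return q

subcritical = extinction_probability(0.5)    # mean below 1: dies out with probability 1
supercritical = extinction_probability(2.0)  # mean above 1: strictly below 1
```

Below mean 1 each individual "consumes" more than it "produces" and the process dies out almost surely; above mean 1 there is a positive survival probability.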

  30. Activation from $r$ neighbors [S. Janson, T. Łuczak, T. Turova, T. Vallier, "Bootstrap Percolation on the Random Graph $G(n, p)$", Annals of Applied Probability, 22(5), 2012]
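
Bootstrap percolation with activation threshold r can be simulated directly. The following is a minimal sketch with hypothetical function names; the random-graph run at the end uses arbitrary illustrative parameters and makes no claim about which side of the threshold it falls on.

```python
import random
from collections import deque

def gnp_adjacency(n, p, rng):
    """Sample G(n, p) as an adjacency dictionary."""
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def bootstrap_percolation(adj, seeds, r):
    """Activate a node once at least r of its neighbors are active; return the final set."""
    active = set(seeds)
    credits = {v: 0 for v in adj}
    queue = deque(active)            # active nodes that have not yet spread credits
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in active:
                continue
            credits[v] += 1
            if credits[v] >= r:
                active.add(v)
                queue.append(v)
    return active

toy = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1, 5], 4: [1, 2, 5], 5: [3, 4]}
full = bootstrap_percolation(toy, [0, 1, 2], r=2)
stuck = bootstrap_percolation(toy, [0], r=2)

rng = random.Random(1)
final_size = len(bootstrap_percolation(gnp_adjacency(300, 0.05, rng), range(20), r=2))
```

Each active node is used exactly once to hand out credits, which is the "consumption" side of the consumption/production picture; newly activated nodes are the "production".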

  31. [Figure, for $np = \omega(1)$: regimes "consumption > production" and "production > consumption", with labels P( ) = 1, P( ) = 0, $t_c$, $a_c$]

  32. Theorem: phase transition in # seeds
      - For $n^{-1} \ll ps \ll s\,n^{-\frac{1}{2} - \frac{3}{2r}}$:
        - If $a / a_c \to \alpha < 1$, final map is $o(n)$ w.h.p.
        - If $a / a_c > \alpha > 1$, final map is $n - o(n)$ w.h.p.
      - Seed set size threshold:
        - $a_c = \left(1 - r^{-1}\right) t_c$
        - $t_c = \left( \dfrac{(r-1)!}{n\,(ps^2)^r} \right)^{1/(r-1)}$
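
The seed threshold can be evaluated numerically. The sketch below reads the slide's formulas as $t_c = \big((r-1)!/(n (ps^2)^r)\big)^{1/(r-1)}$ and $a_c = (1 - 1/r)\, t_c$; that reading of the garbled extraction is an assumption, and the function name is hypothetical. The parameters in the example are chosen only to make the arithmetic easy to follow, not to lie inside the theorem's regime.

```python
import math

def critical_seed_size(n, p, s, r):
    """Return (t_c, a_c) under the assumed reading of the threshold formulas."""
    t_c = (math.factorial(r - 1) / (n * (p * s ** 2) ** r)) ** (1.0 / (r - 1))
    a_c = (1.0 - 1.0 / r) * t_c
    return t_c, a_c

# r = 2, n = 10^4, p = 0.01, s = 1: n * (p s^2)^2 = 1, so t_c = 1 and a_c = 1/2.
t_c, a_c = critical_seed_size(10 ** 4, 0.01, 1.0, 2)

# Lower edge-sampling probability s means less overlap and more seeds needed.
_, a_c_sparser = critical_seed_size(10 ** 4, 0.01, 0.5, 2)
```

Because $s$ enters through $(ps^2)^r$, the threshold is quite sensitive to how similar the two observed graphs are.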

  33. Bootstrap percolation in $G(n, p)$:
      - # credits of node $i$ at time $t$: i.i.d. Binomials

      Percolation graph matching in $G(n, p; s)$:
      - # credits of pair $(i, j)$ at time $t$: dependent, different Binomials
      - As long as no matching error so far, increments at $t$:
        - Different: $(i, i) \sim \mathrm{Ber}(ps^2)$, $(i, j) \sim \mathrm{Ber}((ps)^2)$
        - Dependent: for $i, i', j$ all different: $P\big((i,j){+}{+}\big) = (ps)^2$, but $P\big((i,j){+}{+} \mid (i',j){+}{+}\big) = ps$

      [Figure: $G$, $G_1$, $G_2$]

  34. Approach:
      - Focus on regime where $X$ = no bad pair $(i, j)$ gets enough credits ($r$) to be potentially matched
        - True for $ps \ll n^{-\frac{1}{2} - \frac{3}{2r}}$
        - Need to choose $r$ large enough (sparse graphs: $r \geq 4$, otherwise higher)
      - Conditional on $X$, only need to focus on good pairs $(i, i)$
      - Equivalence with bootstrap problem: does it percolate?
        - Need to have $n^{-1} \ll ps$
        - Need to have seed set size $a > a_c$ large enough

  35.

  36.

  37.

  38.

  39. How to get started in practice

  40. Question:
      - Can similar idea inform algorithm design?

      Wishlist:
      - Cold-start: how to match without seeds?
      - Sparse graphs: how to avoid blocking?
      - Error propagation: how to correct mismatches?

  41. [Figure: graph with seed1 and seed2 and the fingerprints of four nodes]
      - Fingerprint: (deg=3, dist(seed1)=3, dist(seed2)=1)
      - Fingerprint: (deg=4, dist(seed1)=1, dist(seed2)=3)
      - Fingerprint: (deg=3, dist(seed1)=1, dist(seed2)=3)
      - Fingerprint: (deg=1, dist(seed1)=4, dist(seed2)=2)
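
Fingerprints of the form (degree, distance to seed1, distance to seed2) can be computed with one breadth-first search per seed. The sketch below is an illustration on a made-up five-node graph; the function names and the example graph are assumptions, and how the talk's algorithm goes on to use the fingerprints is not stated in the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def fingerprints(adj, seeds):
    """Map each node to (degree, dist to seeds[0], dist to seeds[1], ...).

    Unreachable seeds give None in the corresponding position.
    """
    dists = [bfs_distances(adj, s) for s in seeds]
    return {v: (len(adj[v]),) + tuple(d.get(v) for d in dists) for v in adj}

# Hypothetical example: seed1 - x - y - seed2, with an extra leaf z attached to y.
adj = {
    "seed1": ["x"],
    "x": ["seed1", "y"],
    "y": ["x", "seed2", "z"],
    "seed2": ["y"],
    "z": ["y"],
}
fp = fingerprints(adj, ["seed1", "seed2"])
```

In this example every node gets a distinct fingerprint, so two seeds already separate all the other nodes; in larger graphs fingerprints collide and can only narrow down the candidates.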
