Matthias Grossglauser, EPFL, CTW 2013

Columns: anonymized user ID, query, timestamp
4417749  care packages                           2006-03-02 09:19:32
4417749  movies for dogs                         2006-03-02 09:24:14
4417749  blue book                               2006-03-03 11:48:52
4417749  best dog for older owner                2006-03-06 11:48:24
4417749  best dog for older owner                2006-03-06 11:48:24
4417749  rescue of older dogs                    2006-03-06 11:55:25
4417749  school supplies for the iraq children   2006-03-06 13:36:33
4417749  school supplies for the iraq children   2006-03-06 13:36:33
4417749  pine straw lilburn delivery             2006-03-06 18:35:02
4417749  pine straw delivery in gwinnett county  2006-03-06 18:36:35
4417749  landscapers in lilburn ga.              2006-03-06 18:37:26
4417749  pne straw in lilburn ga.                2006-03-06 18:38:19
4417749  pine straw in lilburn ga.               2006-03-06 18:38:27
4417749  gwinnett county yellow pages            2006-03-06 18:42:08
...

Searches:
- "landscapers in Lilburn, Ga"
- "homes sold in shadow lake subdivision gwinnett county georgia"
- "jarrett t. arnold", "jack t. arnold"
4417749 = Thelma Arnold
- 62-year-old widow and dog owner
- Home: Lilburn, GA
AOL press release:
- "There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information."
Heads had to roll…
- AOL CTO Maureen Govern (+2 others) fired

Personally identifiable information (PII):
- "information that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual" (Wikipedia)
Name     Home  Work
Adam     A     EPFL
Barbara  B     EPFL
Carlos   A     UNIL
[Figure: graph with node labels A, B, UNIL, EPFL]

Adversary has:
- Anonymized network = unlabeled graph
- Side information: subgraph; statistics on certain nodes; noisy version of whole network; …
[Figure: anonymized social network next to side information about Adam, Barbara, Carlos]

Other applications:
- Find overlap in networks:
  - Social networks from different domains & time slots
  - Identify viruses by function-call patterns
  - Computer vision: matching segment graphs for different viewing angles
  - …
[Figure: matching nodes (peter.muster@epfl.ch, 021-693-1233) by structure only]

Fundamental feasibility w/o side information, but with ∞ time and memory

Is it fundamentally hard or easy to match similar graphs by structure?
- Fundamental = information-theoretic: ignore computational & memory cost
- Hard: in addition to second graph, no other side information
- Demanding: want to match every vertex

G(n, p(n)):
- First published in 1959 by Erdős & Rényi
- Focus on existence results
- Large-n asymptotics and phase transitions:
  - Connectivity
  - Existence of subgraphs
  - Giant component
  - Chromatic number
  - Automorphism group
  - …
- Threshold for asymmetry: p = log n / n

[Figure: an asymmetric graph (AuG = 1) and a symmetric graph (AuG = 12), where AuG = size of the automorphism group]

Generator G = G(n, p): "real" social ties. Each edge of G is sampled (probability s) or not sampled (probability 1 − s) in each observed graph (phone calls, emails). s measures similarity.
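The sampling model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `sample_pair`, the edge-set representation, and the parameter values are assumptions, and the two observed graphs are assumed to be sampled independently of each other.

```python
import random
from itertools import combinations

def sample_pair(n, p, s, seed=0):
    """Sample a generator graph G(n, p) and two edge-sampled copies G1, G2.

    Graphs are represented as sets of edges (u, v) with u < v (an assumption
    made for this sketch).
    """
    rng = random.Random(seed)
    g, g1, g2 = set(), set(), set()
    for edge in combinations(range(n), 2):
        if rng.random() < p:          # edge exists in the generator G
            g.add(edge)
            if rng.random() < s:      # kept in G1 with probability s
                g1.add(edge)
            if rng.random() < s:      # kept in G2 with probability s, independently
                g2.add(edge)
    return g, g1, g2

g, g1, g2 = sample_pair(n=200, p=0.1, s=0.8)
print(len(g), len(g1), len(g2), len(g1 & g2))
```

With s = 1 both copies equal the generator; as s decreases, the overlap between G1 and G2 shrinks.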

n! possible mappings!
[Figure: the identity mapping π_0 gives Δ(π_0) = 0; another mapping π gives Δ(π) = 2]

Assumption:
- Attacker has infinite computational power
- Can try all possible mappings π and compute edge mismatch function Δ(π)
Question:
- Are there conditions on p, s such that P(π_0 is the unique min of Δ(π)) → 1?
- If yes: adversary would be able to match vertex sets only through the structure of the two networks!
Note:
- G(n, p; s) model: statistically uniform, low clustering, degree distribution not skewed -> conjecture: harder than real networks
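For very small graphs, the exhaustive attacker can be written out directly. The sketch below is illustrative only: `mismatch` and `best_mappings` are hypothetical names, graphs are sets of edges (u, v) with u < v, and the 4-node path is a toy input, not data from the talk.

```python
from itertools import permutations

def mismatch(g1, g2, perm):
    """Delta(perm): node pairs that are an edge in exactly one of perm(G1) and G2."""
    mapped = {tuple(sorted((perm[u], perm[v]))) for u, v in g1}
    return len(mapped ^ g2)

def best_mappings(n, g1, g2):
    """Try all n! mappings; return the minimum mismatch and every mapping attaining it."""
    scores = {perm: mismatch(g1, g2, perm) for perm in permutations(range(n))}
    best = min(scores.values())
    return best, [perm for perm, d in scores.items() if d == best]

path = {(0, 1), (1, 2), (2, 3)}          # toy graph: path 0-1-2-3
identity = (0, 1, 2, 3)
swap01 = (1, 0, 2, 3)                    # transposition of nodes 0 and 1
print(mismatch(path, path, identity), mismatch(path, path, swap01))
print(best_mappings(4, path, path))
```

The path has a non-trivial automorphism (reversal), so the minimum is not unique here; the question on the slide is when the identity becomes the unique minimizer.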

Theorem:
- For the G(n, p; s) matching problem, if
    nps · s²/(2 − s) = 8 log n + ω(1)
  then the identity permutation minimizes Δ(·) a.a.s.
- Terms: nps is E[degree] of G_1, G_2; log n is the threshold for AuG = 1; s²/(2 − s) is the penalty for the difference G_1 − G_2; ω(1) is "growing slowly"
Interpretation: two pieces of bad/good news
- Surprisingly weak condition: degree growing faster than ~log n is enough to break anonymity
- Decrease with s only quadratic
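To get a feel for the numbers, the condition can be evaluated for a finite n. This is only a rough sketch: it reads the condition as nps · s²/(2 − s) = 8 log n + ω(1), drops the ω(1) term, uses the natural logarithm, and treats an asymptotic statement as if it applied at finite n. The function name `min_edge_prob` is hypothetical.

```python
import math

def min_edge_prob(n, s):
    """Smallest p with n*p*s * s**2 / (2 - s) >= 8*log(n), ignoring the omega(1) term."""
    return 8 * math.log(n) * (2 - s) / (n * s ** 3)

n = 10**6
for s in (1.0, 0.5):
    p = min_edge_prob(n, s)
    print(f"s={s}: p ~ {p:.2e}, expected degree nps ~ {n * p * s:.0f}")
```

The required expected degree scales as (2 − s)/s², so halving s from 1 to 0.5 multiplies it by 6.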

Fix a particular map π ∈ Π
- V_π: set of mismatched nodes under π
[Figure: G_1 and G_2 under π; labels: transposition, invariant edge]

E_π = V × V_π: all the edges modified under π
- V_π: k nodes; the other n − k nodes are unchanged
- Δ_0: each edge contributes Bernoulli(2ps(1 − s)): sampling errors
- Δ_π: each pair of edges contributes Bernoulli(2ps(1 − ps)): matching errors
[Figure: nodes 1, 2, 3, 4, 5, …, n and the list of node pairs 12, 13, 14, 15, 23, 24, 25, 34, 35, 45, …, 1n, 2n, shown for both graphs]
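The two Bernoulli parameters can be checked by enumerating the underlying coin flips exactly. This is a sketch under the G(n, p; s) model, with hypothetical function names: a node pair is an edge of G with probability p and survives in each observed graph independently with probability s.

```python
from itertools import product

def prob(bit, q):
    """Probability of outcome `bit` for a Bernoulli(q) variable."""
    return q if bit else 1 - q

def sampling_error_rate(p, s):
    """P(the same node pair is an edge in exactly one of G1, G2)."""
    total = 0.0
    for in_g, keep1, keep2 in product((0, 1), repeat=3):
        e1, e2 = in_g and keep1, in_g and keep2
        if e1 != e2:
            total += prob(in_g, p) * prob(keep1, s) * prob(keep2, s)
    return total

def matching_error_rate(p, s):
    """P(two distinct node pairs disagree), each an independent edge with prob. p*s."""
    q = p * s
    return sum(prob(e1, q) * prob(e2, q)
               for e1, e2 in product((0, 1), repeat=2) if e1 != e2)

p, s = 0.1, 0.8
print(sampling_error_rate(p, s), 2 * p * s * (1 - s))
print(matching_error_rate(p, s), 2 * p * s * (1 - p * s))
```

Since ps < s, a wrongly matched pair is more likely to contribute a mismatch than a correctly matched one, which is what makes the identity stand out.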

G(n, p; s, t) matching problem

Result:
- Dependence on n still the same: nps = c(s, t) log n + ω(1)
- Dependence on s and t less intuitive
Interpretation:
- Node mismatch does not help/hurt too much either

Phase transition, and an efficient & tractable matching algorithm…

INPUT: Seed map of known pairs
Propagate the map to "similar" neighbors on left and right
[A. Narayanan, V. Shmatikov, "De-anonymizing social networks", IEEE Symp. on Security and Privacy, 2009]

Similarity metric: sim(A, B), defined in terms of the sets A and B

Find max sim(u, v)
Continue until done… or blocked

- How many seeds are needed?
- Is there a phase transition?
- How efficiently can we match?
- Tuning parameters?
[A. Narayanan, V. Shmatikov, "De-anonymizing social networks", IEEE Symp. on Security and Privacy, 2009]

[Figure: G_1 and G_2]

If ≥ r matched neighbors → match
[Figure: G_1 and G_2, with a matching error]
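The matching rule on this slide can be sketched as a credit-passing loop. The code below is a minimal illustration, not the algorithm as specified in the talk: the function name, the adjacency-dict representation, the processing order, and the tie-breaking (first pair to reach r credits wins) are all assumptions.

```python
from collections import defaultdict

def percolation_match(adj1, adj2, seeds, r):
    """Grow a matching from seed pairs; match a pair once it has >= r credits."""
    matched1 = dict(seeds)                         # node in G1 -> node in G2
    matched2 = {v: u for u, v in matched1.items()}
    credits = defaultdict(int)
    queue = list(matched1.items())                 # matched pairs that have not spread yet
    while queue:
        u, v = queue.pop()
        # The matched pair (u, v) gives one credit to every unmatched neighboring pair.
        for u2 in adj1[u]:
            if u2 in matched1:
                continue
            for v2 in adj2[v]:
                if v2 in matched2:
                    continue
                credits[(u2, v2)] += 1
                if credits[(u2, v2)] >= r:
                    matched1[u2], matched2[v2] = v2, u2
                    queue.append((u2, v2))
                    break                          # u2 is matched; stop giving it credits
    return matched1

# Toy input: the same 5-node graph on both sides, seeds 0 and 1 known.
adj = {0: [2, 3], 1: [2, 4], 2: [0, 1, 3], 3: [0, 2, 4], 4: [1, 3]}
result = percolation_match(adj, adj, {0: 0, 1: 1}, r=2)
blocked = percolation_match(adj, adj, {0: 0, 1: 1}, r=3)
print(result, blocked)
```

With r = 2 the two seeds are enough to match every node of the toy graph; with r = 3 no pair ever collects enough credits and the process is blocked at the seed set.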

G(n, p)

P(  )=1 𝑜𝑞 < 1 : consumption > production Extinction prob. of branching process (failure rate) 𝑜𝑞 > 1 : production > consumption P(  )=0 29

Activation from r neighbors
[S. Janson, T. Luczak, T. Turova, T. Vallier, Bootstrap Percolation on the Random Graph G(n, p), Annals Applied Prob., 22(5), 2012]
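Bootstrap percolation with activation threshold r is easy to simulate on a sampled G(n, p). The sketch below is illustrative: the function name and parameter values are assumptions, and the first `num_seeds` node labels are used as the seed set, which by the symmetry of G(n, p) is equivalent to a uniformly random seed set.

```python
import random
from itertools import combinations

def bootstrap_percolation(n, p, r, num_seeds, seed=0):
    """Return the final number of active nodes in one sampled G(n, p)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    active = set(range(num_seeds))
    credits = [0] * n                  # number of active neighbors processed so far
    frontier = list(active)            # active nodes that have not spread yet
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in active:
                credits[v] += 1
                if credits[v] >= r:    # activated by r active neighbors
                    active.add(v)
                    frontier.append(v)
    return len(active)

for a in (2, 20):
    print(a, bootstrap_percolation(n=1000, p=0.01, r=2, num_seeds=a))
```

Running this for a range of seed set sizes and random seeds gives an empirical view of how the final active set depends on the number of seeds.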

[Figure: np = ω(1). Regions: consumption > production, P( ) = 1; production > consumption, P( ) = 0. Marked values: t_c, a_c]

Theorem: phase transition in # seeds
- For n⁻¹ ≪ ps ≪ s·n^(−1/2 − 3/(2r)):
  - If a/a_c → α < 1, the final map is o(n) w.h.p.
  - If a/a_c > α > 1, the final map is n − o(n) w.h.p.
Seed set size threshold:
- a_c = (1 − r⁻¹) t_c
- t_c = ((r − 1)! / (n (ps²)^r))^(1/(r − 1))
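The threshold can be evaluated numerically. This sketch reads the slide's formulas as t_c = ((r − 1)! / (n (ps²)^r))^(1/(r − 1)) and a_c = (1 − 1/r) t_c; the function name and the parameter values are illustrative assumptions, and the result is an asymptotic quantity, not a finite-n guarantee.

```python
import math

def critical_seed_size(n, p, s, r):
    """Return (t_c, a_c) for seed-based matching with threshold r."""
    q = p * s ** 2                      # probability that an edge survives in both graphs
    t_c = (math.factorial(r - 1) / (n * q ** r)) ** (1 / (r - 1))
    a_c = (1 - 1 / r) * t_c
    return t_c, a_c

t_c, a_c = critical_seed_size(n=1000, p=0.01, s=1.0, r=2)
print(t_c, a_c)
```

For fixed n and r, a_c falls quickly as ps² grows, so denser or more similar graphs need fewer seeds.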

Bootstrap percolation in G(n, p):
- # credits of node i at time t: i.i.d. Binomials
Percolation graph matching in G(n, p; s):
- # credits of pair (i, j) at time t: dependent, different Binomials
- As long as no matching error so far, increments at t:
  - Different: (i, i) ~ Ber(ps²), (i, j) ~ Ber((ps)²)
  - Dependent: for i, i′, j all different: P[(i, j)++] = (ps)², but P[(i, j)++ | (i′, j)++] = ps
[Figure: G, G_1, G_2]

Approach:
- Focus on regime where X = no bad pair (i, j) gets enough credits (r) to be potentially matched
  - True for ps ≪ n^(−1/2 − 3/(2r))
  - Need to choose r large enough (sparse graphs: r ≥ 4, otherwise higher)
- Conditional on X, only need to focus on good pairs (i, i)
- Equivalence with bootstrap problem → does it percolate?
  - Need to have n⁻¹ ≪ ps
  - Need to have seed set size a > a_c large enough

How to get started in practice

Question:
- Can similar idea inform algorithm design?
Wishlist:
- Cold-start: how to match without seeds?
- Sparse graphs: how to avoid blocking?
- Error propagation: how to correct mismatches?

[Figure: graph with two seeds (seed1, seed2) and four node fingerprints]
- Fingerprint: (deg=3, dist(seed1)=3, dist(seed2)=1)
- Fingerprint: (deg=4, dist(seed1)=1, dist(seed2)=3)
- Fingerprint: (deg=3, dist(seed1)=1, dist(seed2)=3)
- Fingerprint: (deg=1, dist(seed1)=4, dist(seed2)=2)
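Fingerprints of this form (degree plus hop distance to each seed) can be computed with one breadth-first search per seed. The sketch below is an illustration with hypothetical names and a toy path graph; how fingerprints are then compared across the two graphs is not covered here.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def fingerprints(adj, seeds):
    """Map each node to (degree, dist to seed 1, dist to seed 2, ...).

    Unreachable seeds yield None in the corresponding position.
    """
    dists = [bfs_distances(adj, s) for s in seeds]
    return {u: (len(adj[u]),) + tuple(d.get(u) for d in dists) for u in adj}

# Toy graph: path a-b-c-d with seeds a and d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
fp = fingerprints(adj, ["a", "d"])
print(fp)
```

Nodes whose fingerprints agree in both graphs are natural candidates for matching.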
