Community-Preserving Generalization of Social Networks


SLIDE 1

Introduction Preliminary concepts Graph Generalization Algorithm Experimental Set Up Results Conclusions

Community-Preserving Generalization of Social Networks

Jordi Casas-Roma¹ and François Rousseau²

¹Universitat Oberta de Catalunya, Barcelona, Spain

jcasasr@uoc.edu

²École Polytechnique, Palaiseau, France

rousseau@lix.polytechnique.fr

SoMeRis ’15, Paris, August 25, 2015

SLIDE 2

Overview

1. Introduction
2. Preliminary concepts
3. Graph Generalization Algorithm
4. Experimental Set Up
5. Results
6. Conclusions

SLIDE 3

Introduction

Scenario:
• Release data to third parties
• Preserve the privacy of users

SLIDE 4

Simple Anonymization

Simple anonymization does not work! User Dan can be re-identified using his structural properties.

Figure 1: Original network (vertices Amy, Tim, Bob, Lis, Ann, Dan, Tom, Eva, Joe)

Figure 2: Simple anonymization (names replaced by labels 1–9)

Figure 3: Dan's 1-neighbourhood

Figure 4: Dan is re-identified

SLIDE 5

Anonymization methods

Goals:
• Introduce noise to hinder re-identification processes: adding/removing edges, adding fake nodes, grouping nodes into clusters, …
• Preserve users' privacy vs. maximize data utility (minimize information loss).

Figure 5: Dan's 1-neighbourhood

Figure 6: Noise added

SLIDE 6

Anonymization methods

Basic approaches for anonymity on networks:

• Network modification approaches consist of modifying (adding and/or deleting) edges or vertices in a network:
  – Randomization
  – k-anonymity model
• Clustering-based approaches (also known as generalization) consist of clustering vertices and edges into groups and anonymizing each sub-network into a super-vertex, in order to publish aggregate information about structural properties.
• Differentially private approaches guarantee that individuals are protected under the definition of differential privacy, which imposes a guarantee on the data release mechanism rather than on the data itself. The goal is to provide statistical information about the data while preserving the privacy of users.

SLIDE 7

Graph degeneracy and k-shell

k-Core: Let k be an integer. A subgraph Hk = (V′, E′), induced by the subset of vertices V′ ⊆ V (and a fortiori by the subset of edges E′ ⊆ E), is called a k-core if and only if ∀ vi ∈ V′, degHk(vi) ≥ k, and Hk is the maximal subgraph with this property.

k-Shell: The k-shell is the subgraph induced by the set of vertices that belong to the k-core but not to the (k+1)-core, denoted Sk, such that Sk = {vi ∈ G : vi ∈ Hk ∧ vi ∉ Hk+1}.

Core number: The core number (or shell index) of a vertex vi is the highest order of a core that contains this vertex, denoted core(vi).
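As a plain-Python illustration of these definitions (the peeling procedure below is a standard way to compute core numbers, not code from the paper; the toy graph is illustrative):

```python
def core_numbers(adj):
    """adj: dict vertex -> set of neighbours. Returns dict vertex -> core(vi)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel every vertex whose remaining degree is <= k; peeled
        # vertices get shell index k (they are in the k-core but not
        # the (k+1)-core).
        stack = [v for v in remaining if deg[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            core[v] = k
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
                    if deg[u] <= k:
                        stack.append(u)
        k += 1
    return core

# Toy graph: a dense 4-vertex cluster (2-core) with a 2-vertex tail (1-shell).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

core = core_numbers(adj)   # core(vi) for every vertex
shells = {}                # the disjoint k-shells Sk
for v, c in core.items():
    shells.setdefault(c, set()).add(v)
```

Here the tail vertices 4 and 5 land in the 1-shell while the dense cluster {0, 1, 2, 3} forms the 2-shell, matching the peeling intuition of the definitions.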

SLIDE 8

Graph degeneracy and k-shell

Figure: graph G and its decomposition into disjoint k-shells (0-shell through 3-shell; vertices A–F).

SLIDE 9

Vertex similarity measures

Manhattan similarity measures how many neighbours the two vertices have in common, but also how many common non-neighbours they share:

SimManhattan(vi, vj) = 1 − (1/n) Σₖ₌₁ⁿ |(vi, vk) − (vj, vk)|    (1)

where (vi, vk) = 1 if (vi, vk) ∈ E and (vi, vk) = 0 otherwise.
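A minimal sketch of Eq. (1), treating each vertex as a row of a 0/1 adjacency matrix (the matrix A and the toy graph are illustrative, not from the slides):

```python
# Toy 4-vertex graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def manhattan_similarity(A, i, j):
    # Eq. (1): 1 minus the normalized Hamming distance between the
    # adjacency rows of vi and vj.
    n = len(A)
    return 1 - sum(abs(A[i][k] - A[j][k]) for k in range(n)) / n
```

For instance, manhattan_similarity(A, 0, 1) is 0.5 here: the rows of vertices 0 and 1 agree on two of the four columns and disagree only on the mutual-adjacency entries.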

SLIDE 10

Vertex similarity measures

2-path similarity measures the number of paths of length 2 between two vertices:

Sim2-path(vi, vj) = (1/n) Σₖ₌₁ⁿ (vi, vk)(vj, vk)    (2)

where (vi, vk) = 1 if (vi, vk) ∈ E and (vi, vk) = 0 otherwise.
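A matching sketch of Eq. (2) on the same kind of 0/1 adjacency matrix (again an illustrative toy graph, not the paper's data):

```python
# Toy 4-vertex graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def two_path_similarity(A, i, j):
    # Eq. (2): normalized count of common neighbours of vi and vj,
    # i.e. of paths of length 2 between them.
    n = len(A)
    return sum(A[i][k] * A[j][k] for k in range(n)) / n
```

Here two_path_similarity(A, 0, 1) is 0.25: vertices 0 and 1 have exactly one common neighbour (vertex 2) out of n = 4.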

SLIDE 11

Step 1 – Information gathering

Collect the information needed in the next step to define the partition groups:

1. Compute the k-shell decomposition of the original graph, since it preserves the graph decomposition and also the clustering structure;
2. Compute vertex similarity measures in order to define groups of vertices that share some structural properties:
• Manhattan similarity
• 2-path similarity
• Multilevel clustering algorithm
• Fastgreedy clustering algorithm

SLIDE 12

Step 2 – Super-vertex definition

Define super-vertices according to the previously collected information for each vertex:

1. For each k-shell in the graph, merge vertices belonging to the same group partition into the same super-vertex.
2. Additionally, a max fusion parameter is defined to avoid merging too many vertices into one super-vertex (an oversized super-vertex is split into two independent super-vertices).

As a result of this step, a set of super-vertices is defined and each vertex is assigned to one, and only one, super-vertex.
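The two rules above could be sketched as follows, assuming each vertex already carries a shell index and a partition-group label from Step 1 (the function name, the chunk-based splitting rule, and the max_fusion parameter handling are illustrative; the authors' exact splitting strategy may differ):

```python
def define_super_vertices(shell, group, max_fusion):
    """shell, group: dicts vertex -> label. Returns dict vertex -> super-vertex id."""
    # Rule 1: vertices sharing (k-shell, partition group) go to the same bucket.
    buckets = {}
    for v in shell:
        buckets.setdefault((shell[v], group[v]), []).append(v)
    assignment, sv_id = {}, 0
    for key in sorted(buckets):
        members = sorted(buckets[key])
        # Rule 2: buckets larger than max_fusion are split into chunks,
        # so no super-vertex absorbs too many vertices.
        for start in range(0, len(members), max_fusion):
            for v in members[start:start + max_fusion]:
                assignment[v] = sv_id
            sv_id += 1
    return assignment

shell = {v: 1 for v in range(6)}                        # all in the 1-shell
group = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
sv = define_super_vertices(shell, group, max_fusion=3)
```

With max_fusion = 3, group "a" (four vertices) is split into two super-vertices, and every vertex ends up in exactly one super-vertex, as the slide requires.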

SLIDE 13

Step 3 – Generalized graph creation

Create the new generalized graph according to the super-vertices defined in the previous step:

1. Define an empty, undirected, edge-labeled and vertex-labeled graph G̃ = (Ṽ, Ẽ).
2. The process iterates by adding each previously defined super-vertex svi ∈ Ṽ.
3. A super-edge between two super-vertices is created if there exists an edge between two vertices contained in each of the super-vertices: (svi, svj) ∈ Ẽ ↔ ∃ (vk, vp) ∈ E : vk ∈ svi ∧ vp ∈ svj.

Each super-vertex contains the number of vertices merged into it (IntraVertices) and the number of edges between the vertices contained in it (IntraEdges). Each super-edge carries a label indicating the number of edges between all vertices from its endpoints (InterEdges).
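A compact sketch of Step 3's bookkeeping, assuming vertices have already been assigned to super-vertices in Step 2 (the input names `edges` and `sv` are illustrative):

```python
from collections import Counter

def generalize(edges, sv):
    """edges: list of (u, v) pairs; sv: dict vertex -> super-vertex id."""
    intra_vertices = Counter(sv.values())   # IntraVertices per super-vertex
    intra_edges, inter_edges = Counter(), Counter()
    for a, b in edges:
        if sv[a] == sv[b]:
            intra_edges[sv[a]] += 1              # edge inside one super-vertex
        else:
            key = tuple(sorted((sv[a], sv[b])))  # undirected super-edge
            inter_edges[key] += 1                # InterEdges label
    return intra_vertices, intra_edges, inter_edges

# Toy graph: a triangle merged into super-vertex 0, an edge into super-vertex 1,
# and two edges crossing between them.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4)]
sv = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
iv, ie, se = generalize(edges, sv)
```

The result carries exactly the aggregate information the slide names: IntraVertices and IntraEdges per super-vertex, and an InterEdges count per super-edge.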

SLIDE 14

Example

Toy example of the generalization process.

(a) Original graph (3-shell and 2-shell highlighted)

(b) Generalized graph (super-vertices labeled IntraVertices-IntraEdges, e.g. 4-4, 3-3, 2-1, 1-0; super-edges labeled with their InterEdges counts)

SLIDE 15

Networks

Synthetic networks:
• ER-1000: the Erdős–Rényi model [3] is a classical random graph model. It defines a random graph as n vertices connected by m edges chosen randomly from the n(n − 1)/2 possible edges. In our experiments, n = 1,000 and m = 5,000.
• BA-1000: the Barabási–Albert model [2], also called the scale-free model, produces networks whose degree distribution follows a power law (for degree d, the probability density function is P(d) = d^(−γ)); n = 1,000 and γ = 1 in our experiments.

Real networks:
• Polblogs: political blogosphere data [1] compiling the links among US political blogs.
• URV email: the email communication network at the University Rovira i Virgili in Tarragona, Spain [4].
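For reproducibility, the two synthetic networks could be approximated with networkx (an assumption: the slides do not name the tooling; note that networkx's barabasi_albert_graph is parameterized by the number of attachment edges per new node rather than by the exponent γ, so it only approximates BA-1000):

```python
import networkx as nx

# ER-1000: n = 1,000 vertices, m = 5,000 uniformly random edges.
er = nx.gnm_random_graph(n=1000, m=5000, seed=42)

# Scale-free network with roughly 5,000 edges (5 attachment edges per node).
ba = nx.barabasi_albert_graph(n=1000, m=5, seed=42)
```

The ER graph matches the slide's n and m exactly; the BA graph has (n − m) · m = 4,975 edges, close to but not exactly 5,000.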

SLIDE 16

Generic information loss measures

Network metrics:
• Average distance (dist)
• Diameter (d)
• Harmonic mean of the shortest distance (h)
• Transitivity (T)

We compute the error on these network metrics as follows:

εm(G, G̃) = |m(G) − m(G̃)|    (3)
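Eq. (3) amounts to evaluating each metric on both graphs and taking the absolute difference; a hedged networkx example on two stand-in graphs (the path graphs are placeholders, not the paper's data):

```python
import networkx as nx

G  = nx.path_graph(5)   # stand-in for the original graph G
Gt = nx.path_graph(3)   # stand-in for the generalized graph G-tilde

def err(metric, G, Gt):
    # Eq. (3): absolute difference of a metric between the two graphs.
    return abs(metric(G) - metric(Gt))

e_dist = err(nx.average_shortest_path_length, G, Gt)  # average distance
e_diam = err(nx.diameter, G, Gt)                      # diameter
e_T    = err(nx.transitivity, G, Gt)                  # transitivity
```

The harmonic mean of shortest distances (h) has no one-line networkx call and would be computed from the pairwise shortest-path lengths in the same fashion.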

SLIDE 17

Clustering-specific information loss measures

Diagram: the original graph G is clustered with method c into the original clusters c(G); the anonymization process p produces G̃, which is clustered with the same method c into the perturbed clusters c(G̃); the precision index compares the two.

precision(G, G̃) = (1/n) Σᵢ₌₁ⁿ 𝟙[ltc(vi) = lpc(vi)]    (4)

where 𝟙[x = y] equals 1 if x = y and 0 otherwise.
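Assuming the cluster labels of c(G) and c(G̃) have already been aligned (a non-trivial matching step the slide glosses over), Eq. (4) reduces to a label-match rate; a minimal sketch with illustrative labels:

```python
def precision(true_labels, perturbed_labels):
    # Eq. (4): fraction of vertices whose perturbed cluster label
    # matches their original cluster label.
    n = len(true_labels)
    return sum(1 for v in true_labels
               if true_labels[v] == perturbed_labels[v]) / n

lt = {0: "a", 1: "a", 2: "b", 3: "b"}   # labels from c(G)
lp = {0: "a", 1: "b", 2: "b", 3: "b"}   # labels from c(G-tilde)
```

Here three of four vertices keep their original cluster label, so the precision index is 0.75.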

SLIDE 18

Clustering-specific information loss measures

Clustering algorithms:
• Multilevel (ML)
• Infomap (IM)
• Fast greedy modularity optimization (Fastgreedy or FG)
• Algorithm of Girvan and Newman (Girvan-Newman or GN)
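Two of these four algorithms have direct networkx counterparts (Fastgreedy corresponds to greedy modularity optimization and Girvan–Newman is built in; Multilevel/Louvain requires networkx ≥ 3.0 and Infomap a separate package), a sketch on an easy toy graph:

```python
import networkx as nx
from networkx.algorithms import community

# Two 5-cliques joined by a single bridge edge: an easy clustering target.
G = nx.barbell_graph(5, 0)

# Fastgreedy ~ greedy modularity optimization.
fg = community.greedy_modularity_communities(G)

# Girvan-Newman: iteratively remove the highest-betweenness edge;
# the first yielded split already separates the two cliques here.
gn = next(community.girvan_newman(G))
```

Both methods recover the two 5-cliques as communities on this graph, which is the behaviour the precision index of Eq. (4) measures on the real networks.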

SLIDE 19

Generic information loss measures

Network     Method      n      m       dist    d   h       T
ER-1000     (original)  1,000   4,969  3.263   5   3.083   0.010
            Manhattan     672   4,828  2.672   5   2.514   0.051
            2-Path        612   4,782  2.617   5   2.451   0.070
            Multilevel    135   2,772  1.741   4   1.549   0.489
            Fastgreedy    119   2,766  1.630   3   1.443   0.531
BA-1000     (original)  1,000   4,985  2.481   4   2.362   0.032
            Manhattan     483   4,383  2.144   4   2.047   0.108
            2-Path        436   2,690  1.991   3   1.957   0.070
            Multilevel    106   1,728  1.689   3   1.526   0.457
            Fastgreedy    104   1,682  1.685   2   1.522   0.451
Polblogs    (original)  1,222  16,714  2.737   8   2.519   0.225
            Manhattan     866  13,229  2.408   5   2.248   0.245
            2-Path      1,048   9,086  2.575   7   2.410   0.148
            Multilevel    171   3,071  1.944   6   1.737   0.532
            Fastgreedy    169   3,062  1.944   6   1.733   0.536
URV email   (original)  1,133   5,451  3.606   8   3.334   0.166
            Manhattan     745   5,274  2.886   6   2.683   0.149
            2-Path        944   4,444  3.334   7   3.091   0.135
            Multilevel    160   1,710  2.179   6   1.931   0.386
            Fastgreedy    157   1,763  2.187   6   1.922   0.420

SLIDE 20

Degree sequence reconstruction

Figure (degree sequence reconstruction; each panel plots the original degree sequence against the generalized one): (c) Manhattan (ER-1000), (d) Fastgreedy (ER-1000), (e) 2-Path (URV email), (f) Fastgreedy (URV email).

SLIDE 21

Clustering-specific information loss measures

Network     Method      n      m       ML      IM      FG      GN
ER-1000     (original)  1,000   4,969
            Manhattan     672   4,828  0.147   0.031   0.243   0.296
            2-Path        612   4,782  0.150   0.040   0.215   0.311
            Multilevel    135   2,772  0.394   0.036   0.229   0.189
            Fastgreedy    119   2,766  0.147   0.030   0.544   0.182
BA-1000     (original)  1,000   4,985
            Manhattan     483   4,383  0.157   1.000   0.185   0.433
            2-Path        436   2,690  0.132   1.000   0.176   0.321
            Multilevel    106   1,728  0.618   1.000   0.184   0.374
            Fastgreedy    104   1,682  0.176   1.000   0.477   0.374
Polblogs    (original)  1,222  16,714
            Manhattan     866  13,229  0.830   0.833   0.853   0.646
            2-Path      1,048   9,086  0.950   0.959   0.985   0.823
            Multilevel    171   3,071  0.993   0.517   0.967   0.767
            Fastgreedy    169   3,062  0.976   0.520   0.973   0.740
URV email   (original)  1,133   5,451
            Manhattan     745   5,274  0.420   0.463   0.533   0.352
            2-Path        944   4,444  0.586   0.682   0.555   0.517
            Multilevel    160   1,710  0.781   0.134   0.601   0.290
            Fastgreedy    157   1,763  0.468   0.147   0.862   0.217

SLIDE 22

Conclusions

• We have defined a novel approach to generalize a graph by capitalizing on the concept of graph degeneracy and on the similarity between vertices of the same k-shell.
• We have proposed four different methods to compute the similarity between vertices.
• We have conducted an empirical evaluation of these methods on several synthetic and real networks, comparing information loss based on different graph properties and also on clustering-specific processes.
• We have demonstrated that our approach preserves data privacy while simultaneously achieving better data utility through the generalization process.

SLIDE 23

References

[1] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 U.S. election", in LinkKDD '05. ACM, 2005, pp. 36–43.
[2] A.-L. Barabási and R. Albert, "Emergence of Scaling in Random Networks". Science, vol. 286, no. 5439, pp. 509–512, 1999.
[3] P. Erdős and A. Rényi, "On Random Graphs I". Publicationes Mathematicae, vol. 6, pp. 290–297, 1959.
[4] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions". Physical Review E, vol. 68:065103, pp. 1–4, 2003.
SLIDE 24

The End

Thanks for your attention!

Jordi Casas-Roma (UOC), jcasasr@uoc.edu
François Rousseau (LIX), rousseau@lix.polytechnique.fr