Learning to de-anonymize social networks A machine learning approach - PowerPoint PPT Presentation

Learning to de-anonymize social networks A machine learning approach to social graph de-anonymization Kumar Sharad October 28, 2016 Royal Holloway, University of London ACM Workshop on Artificial Intelligence and Security, Vienna, Austria (AISec 2016)

This talk 1. True Friends Let You Down: Benchmarking Social Graph Anonymization Schemes 2. Change of Guard: The Next Generation of Social Graph De-anonymization Attacks 2

Overview 1. Introduction 2. Preliminaries 3. Benchmarking social graph anonymization schemes 4. The next generation of social graph de-anonymization attacks 5. Conclusions 3

Introduction

The art of data anonymization • Goal : process data without jeopardizing privacy. • Popular : randomize identifiers and/or perturb data. • Pros : cheap, preserves utility, provides legal immunity. • Cons : lack of privacy guarantees. 4

Privacy challenges in anonymized social graphs • Social graphs tend to be particularly notorious to anonymize. • How can we compare various anonymization schemes? • Can we measure privacy leak purely based on graph topology? • Could this lead to end-to-end graph de-anonymization? de-anonymization function. 5 • Intuition : Train a machine learning model to learn the

Preliminaries

Node features • Graph nodes represent individuals and the edges represent relationship among them. • Feature vector purely based on topology (no edge weights or directionality). • Too generic: high false positives. • Too specific: low true positives. • Quantize neighborhood degree distribution. 6

The 2-hop neighborhood of a node Ego 1-hop 1-hop 1-hop 2-hop 2-hop 2-hop 2-hop 7

Node feature vector Feature vector of a node with neighbors of degrees – size = 15 70 bins … … … … 8 [ 1 , 1 , 3 , 3 , 5 , 6 , 7 , 13 , 16 , 20 , 21 , 30 , 65 , 69 , 72 , 1030 , 1100 ] . c 0 = 8 c 1 = 4 c 2 = 0 c 4 = 3 c 69 = 2

Node feature vector 1-hop nodes 2-hop nodes … … 8 c 0 = 8 c 1 = 4 c 2 = 0 c 137 = 3 c 138 = 1 c 139 = 46

The learning task • Distinguish whether a pair of graph node feature vectors represent the same individual. classify node pairs. 9 • Given a node pair classify them as identical or non-identical. • We use random forest which is a collection of decision trees to • Prediction: aggregate the decision of all trees.

Benchmarking social graph anonymization schemes

Motivation • A large number of social graph anonymization schemes have been proposed with varied goals. • None of them provide any privacy guarantees. • Preserving privacy vs. preserving utility. • Ad-hoc development of schemes has created a skewed ecosystem. • Research gap: how to compare social graph anonymization schemes? 10

Approach • Compare social graph anonymization schemes based on anonymity provided vs. utility preserved. • Use a machine learning framework to benchmark perturbation-based social graph anonymization schemes. • Automates evaluation and levels the playing field. 11

• A sanitized social network is released. • Adversary uses graph topology to predict the true correspondences. 1A. Narayanan and V. Shmatikov. De-anonymizing social networks. IEEE S&P 2009. 12 The adversarial model 1 (1/2) • Adversary obtains an auxiliary social network with some overlap. • Sample nodes at random from original graph G to generate two graphs G 1 and G 2 with an overlap.

The adversarial model (2/2) Coefficient. Jaccard Coefficient Jaccard Coefficient between sets X and Y at least one of which is non-empty is defined as: 13 • Perturb G 1 and G 2 to produce G aux and G san . • Overlap between G 1 and G 2 is measured using Jaccard JC ( X , Y ) = | X ∩ Y | | X ∪ Y |

G G 1 G 2 Sample G aux G san Perturb Perturb Generating perturbed graphs. 14

Schemes analyzed 1. Random Sparsification (RSP) 2. Random Edge Perturbation (REP) 3. k -Degree Anonymous (KDA) 4. 1-hop k -Anonymous (1HKA) 5. Random Add/Delete (RAD) 6. Random Switch (RSW) 15

Measuring utility 1. Degree distribution (DD) 2. Joint degree distribution (JDD) 3. Average degree connectivity 4. Degree centrality 5. Eigenvector centrality 16

Measuring anonymity • Measured by the de-anonymization success achieved as depicted by ROC curves with varying perturbation. • A higher AUC implies weaker anonymity. • An increase in perturbation should produce a commensurate decrease in de-anonymization success while minimizing damage to utility. 17

Training without ground truth • What is the best way to train a model given the adversary only data. • The datasets do not need to be further anonymized. • Identical and non-identical node pairs are extremely different. 18 has access to G aux and G san ? • Ideal: generate G aux and G san from G . • Practical: split G aux and G san individually and merge the sampled

Split Split G aux G san G aux G san G san G aux Training without ground truth by splitting the original graphs. 19

Split G aux G san G aux G san G san G aux Training without ground truth by splitting the original graphs. 19 Split ′ ′

G aux G san Training without ground truth by splitting the original graphs. aux G san G san G aux G 19 Split Split ′ ′ ′′ ′′

Evaluation and results Publicly available datasets used • Flickr (80 513 nodes, 5 899 882 edges). • Facebook New Orleans dataset (63 731 nodes, 817 090 edges). 20

Degree distribution 21 Facebook: Random Sparsification Facebook: Random Edge Perturbation 10 4 10 4 Original Original µ = 10 − 4 α E = 0 . 75 µ = 10 − 3 α E = 0 . 50 10 3 10 3 µ = 10 − 2 α E = 0 . 25 Frequency (log) Frequency (log) 10 2 10 2 10 1 10 1 10 0 10 0 10 0 10 1 10 2 10 3 10 4 10 0 10 1 10 2 10 3 10 4 Degree (log) Degree (log) Facebook: k -Degree Anonymity 10 4 Original k = 10 k = 50 10 3 k = 100 Frequency (log) 10 2 10 1 10 0 10 0 10 1 10 2 10 3 10 4 Degree (log)

Joint degree distribution: RSP 22 FB: RSP ( α E = 0 . 75) FB: No Anonymization 100 100 480 540 420 480 80 80 420 360 360 60 60 300 300 240 240 40 40 180 180 120 20 20 120 60 60 0 0 0 0 0 20 40 60 80 100 0 20 40 60 80 100 FB: RSP ( α E = 0 . 50) FB: RSP ( α E = 0 . 25) 100 100 800 540 480 700 80 80 420 600 60 360 60 500 300 400 40 240 40 300 180 200 20 120 20 100 60 0 0 0 0 0 20 40 60 80 100 0 20 40 60 80 100

Joint degree distribution: REP 23 FB: REP ( µ = 10 − 4 ) FB: No Anonymization 100 100 1050 480 900 420 80 80 360 750 60 60 300 600 240 450 40 40 180 300 120 20 20 150 60 0 0 0 0 0 20 40 60 80 100 0 20 40 60 80 100 FB: REP ( µ = 10 − 3 ) FB: REP ( µ = 10 − 2 ) 100 500 2250 4500 4000 2000 80 400 3500 1750 1500 3000 60 300 2500 1250 1000 2000 40 200 750 1500 1000 20 500 100 250 500 0 0 0 0 0 20 40 60 80 100 0 100 200 300 400 500

Learning to de-anonymize social networks A machine learning approach - PowerPoint PPT Presentation

Learning to de-anonymize social networks A machine learning approach to social graph de-anonymization Kumar Sharad October 28, 2016 Royal Holloway, University of London ACM Workshop on Artificial Intelligence and Security, Vienna, Austria

Key parse TCP assembly Offline Online capture anonymize Anon. One-Way Interface Key (anon.

A Practical Attack to De-Anonymize Social Network Users Gilbert Wondracek (Vienna University of

A Method to Compress and Anonymize Packet Traces Markus Peuhkuri 2001-11-02 Abstract Data

Bitcoin and Anonymity Anonymity Basics How to de-anonymize Bitcoin Mixing

Data Anonymization Graham Cormode Graham Cormode graham@research.att.com 1 Why Anonymize?

Introduction Social and Economic Networks MohammadAmin Fazli Social and Economic Networks 1

Submodular Maximization applied to Marketing Over Social Networks Vahab Mirrokni Google

SOCIAL NETWORKS OF ELDERLY PEOPLE Hayden Manseau 1 1. THE PROBLEM 2 THE IMPACT OF SOCIAL

Types of networks (social networks, computer networks, entity- relationship networks, )

Querying Geo-social Data by Bridging Spatial Networks and Social Networks Yerach Ben Yaron

Social Networks What are they, really? What we will learn today What is a social network?

P2P Networks as Content P2P Networks as Content Delivery Networks Delivery Networks FINAL

Current Network Structure for Pediatrics Hospital Networks Country, state, regional, Academic

CHAPTER VII VII CHAPTER Learning in Recurrent Networks Learning in Recurrent Networks CHAPTER

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Learning in Social Networks E. Viennet Laboratoire de Traitement et Transport de lInformation

Specifying appropriate null models with longitudinal SEMs Sven O. Spie German Stata User Group

WWW.TOTW.ORG By Kenneth M Hoeck Finally, brethren, whatsoever things are true, whatsoever things

Lycurgus Note to user: Replace this image with your own. Lycurgus as one of the Seven Wise Men

Matthew Series Lesson #153 February 5, 2017 Dean Bible Ministries www.deanbibleministries.org Dr.

Data Anonymization - Generalization Algorithms Li Xiong, Slawek Goryczka CS573 Data Privacy and

K-Anonymity & Social Networks CompSci 590.03 Instructor: Ashwin Machanavajjhala (Some slides

De-anonymizing Data CompSci 590.03 Instructor: Ashwin

Taming the Devil: Techniques for Evaluating Anonymized Network Data Scott Coull 1 , Charles Wright

Learning to de-anonymize social networks A machine learning approach - PowerPoint PPT Presentation

Learning to de-anonymize social networks A machine learning approach to social graph de-anonymization Kumar Sharad October 28, 2016 Royal Holloway, University of London ACM Workshop on Artificial Intelligence and Security, Vienna, Austria

Key parse TCP assembly Offline Online capture anonymize Anon. One-Way Interface Key (anon.

A Practical Attack to De-Anonymize Social Network Users Gilbert Wondracek (Vienna University of

A Method to Compress and Anonymize Packet Traces Markus Peuhkuri 2001-11-02 Abstract Data

Bitcoin and Anonymity Anonymity Basics How to de-anonymize Bitcoin Mixing

Data Anonymization Graham Cormode Graham Cormode graham@research.att.com 1 Why Anonymize?

Introduction Social and Economic Networks MohammadAmin Fazli Social and Economic Networks 1

Submodular Maximization applied to Marketing Over Social Networks Vahab Mirrokni Google

SOCIAL NETWORKS OF ELDERLY PEOPLE Hayden Manseau 1 1. THE PROBLEM 2 THE IMPACT OF SOCIAL

Types of networks (social networks, computer networks, entity- relationship networks, )

Querying Geo-social Data by Bridging Spatial Networks and Social Networks Yerach Ben Yaron

Social Networks What are they, really? What we will learn today What is a social network?

P2P Networks as Content P2P Networks as Content Delivery Networks Delivery Networks FINAL

Current Network Structure for Pediatrics Hospital Networks Country, state, regional, Academic

CHAPTER VII VII CHAPTER Learning in Recurrent Networks Learning in Recurrent Networks CHAPTER

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Learning in Social Networks E. Viennet Laboratoire de Traitement et Transport de lInformation

Specifying appropriate null models with longitudinal SEMs Sven O. Spie German Stata User Group

WWW.TOTW.ORG By Kenneth M Hoeck Finally, brethren, whatsoever things are true, whatsoever things

Lycurgus Note to user: Replace this image with your own. Lycurgus as one of the Seven Wise Men

Matthew Series Lesson #153 February 5, 2017 Dean Bible Ministries www.deanbibleministries.org Dr.

Data Anonymization - Generalization Algorithms Li Xiong, Slawek Goryczka CS573 Data Privacy and

K-Anonymity &amp; Social Networks CompSci 590.03 Instructor: Ashwin Machanavajjhala (Some slides

De-anonymizing Data CompSci 590.03 Instructor: Ashwin

Taming the Devil: Techniques for Evaluating Anonymized Network Data Scott Coull 1 , Charles Wright

K-Anonymity & Social Networks CompSci 590.03 Instructor: Ashwin Machanavajjhala (Some slides