Classifying Online Social Network Users Through the Social Graph

SLIDE 1

Classifying Online Social Network Users Through the Social Graph

Cristina Pérez-Solà and Jordi Herrera-Joancomartí

Departament d'Enginyeria de la Informació i les Comunicacions

Universitat Autònoma de Barcelona

October 25th, 2012

SLIDE 2

Introduction Classifier proposal The experiments Conclusions and further work

1. Introduction
2. Classifier proposal
3. The experiments
4. Conclusions and further work

SLIDE 3

About the title

Classifying...

Definition

Classification is the problem of identifying to which of a set of categories a new observation belongs. The decision is made on the basis of a training set of data containing observations whose category membership is already known.

SLIDE 4

About the title

... Online Social Network Users...

SLIDE 5

About the title

...Through the Social Graph

Definition

A social graph is a graph where nodes represent users in a social network and edges represent relationships between these users.
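As a minimal illustration of this definition, a social graph can be held as a mapping from each user to the set of users they link to. The sketch assumes a directed graph, and the user names and the `follows` variable are made up for the example.

```python
# Hypothetical toy social graph: an edge u -> v means "u follows v".
follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": {"alice"},
    "dave": {"carol"},
}

# Nodes: every user that appears as the source or the target of an edge.
nodes = set(follows) | {v for targets in follows.values() for v in targets}

# Edges: the (source, target) pairs, i.e. the relationships between users.
edges = {(u, v) for u, targets in follows.items() for v in targets}
```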

SLIDE 6

What do we want to do?

Goals

  • Design a user (node) classifier that uses the graph structure alone (no semantic information is needed).
  • Apply the previously designed classifier to label OSN users.
  • Demonstrate that OSN user classification is possible with naively anonymized graphs.
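Naive anonymization is usually understood as replacing user identifiers with meaningless pseudonyms while leaving the edges untouched. The sketch below illustrates that idea only; the function name `naive_anonymize` and the toy edge list are assumptions for the example, not something taken from the presentation.

```python
import random

def naive_anonymize(edges, seed=0):
    """Replace node identifiers with random integer pseudonyms; keep the structure."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    pseudonyms = list(range(len(nodes)))
    rng.shuffle(pseudonyms)
    mapping = dict(zip(nodes, pseudonyms))
    # Every edge survives; only the endpoint names change.
    return [(mapping[u], mapping[v]) for u, v in edges], mapping

# Hypothetical toy edge list (u, v) meaning "u links to v".
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "carol")]
anon_edges, mapping = naive_anonymize(edges)
```

Because degrees and neighborhoods are preserved exactly, any classifier that relies on structure alone sees the same input before and after this transformation.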

SLIDE 7

Why is it interesting?

Motivation

User classification as a privacy attack

  • User classification allows an attacker to infer (private) attributes of the user.
  • Attributes may be sensitive in themselves.
  • Attribute disclosure may have undesirable consequences for the user.
  • In any case, users can no longer control the disclosure of information about themselves.

SLIDE 8

1. Introduction
2. Classifier proposal
     • Architecture overview
     • Classifier modules
     • Specific design details
3. The experiments
4. Conclusions and further work

SLIDE 9

Architecture overview

Classifier Architecture

The proposed classifier is implemented with a five-module architecture that includes two different classifiers: an initial classifier and a relational classifier.

[Figure: architecture diagram. Modules: data preprocessing (two instances), initial classifier, neighborhood analysis, relational classifier. Data labels: clustering coefficient & degrees, class labels, new class labels.]

SLIDE 10

Classifier modules

Initial classifier

The initial classifier analyzes the graph structure and maps each node to a 2-dimensional sample: degree and clustering coefficient. The output is an initial assignment of nodes to categories.
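The 2-dimensional sample can be sketched as follows. This is a simplified illustration on an undirected toy graph, using the usual local clustering coefficient (the fraction of pairs of neighbors that are themselves linked); the function names and the graph are assumptions made for the example.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pairs of neighbors to compare
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def structural_sample(adj, node):
    """Map a node to the 2-dimensional sample (degree, clustering coefficient)."""
    return (len(adj[node]), clustering_coefficient(adj, node))

# Hypothetical undirected toy graph stored as symmetric adjacency sets.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
samples = {node: structural_sample(adj, node) for node in adj}
```

Each entry of `samples` is the kind of point a classifier would then be trained on.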

SLIDE 11

Classifier modules

Neighborhood analysis

The neighborhood analysis module reports which kinds of nodes each node is connected to, using the labels assigned by the initial classifier.
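One plausible reading of this module is a per-category count of each node's neighbors. The slides do not say whether counts or proportions are reported, so the use of raw counts, the function name `neighborhood_profile`, and the toy labels below are all assumptions.

```python
from collections import Counter

def neighborhood_profile(adj, labels, node, categories):
    """Count how many neighbors of `node` carry each category label."""
    counts = Counter(labels[n] for n in adj[node])
    return tuple(counts.get(c, 0) for c in categories)

# Hypothetical toy graph and labels, as if produced by the initial classifier.
adj = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
labels = {"a": "company", "b": "individual", "c": "individual", "d": "company"}
categories = ("individual", "company")

profile_a = neighborhood_profile(adj, labels, "a", categories)
profile_b = neighborhood_profile(adj, labels, "b", categories)
```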

SLIDE 12

Classifier modules

Relational classifier

The relational classifier maps users to n-dimensional samples, combining degree and clustering coefficient with the neighborhood information to classify users. The output is a new assignment of nodes to categories, which can differ from the initial classification.
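A rough sketch of how such n-dimensional samples could be assembled and re-labeled over several rounds is given below. The hand-written `toy_rule` is only a stand-in for a trained classifier, and every name, the toy graph, and the number of rounds are assumptions made for illustration.

```python
def relational_sample(structural, profile):
    """Concatenate structural features with the neighborhood profile."""
    return tuple(structural) + tuple(profile)

def neighbor_counts(adj, labels, node, categories):
    """Count the neighbors of `node` that currently carry each category."""
    return tuple(sum(1 for n in adj[node] if labels[n] == c) for c in categories)

def iterate_relational(adj, structural, labels, categories, classify, rounds=3):
    """Re-label all nodes `rounds` times from structure plus current neighbor labels."""
    labels = dict(labels)  # work on a copy
    for _ in range(rounds):
        samples = {
            node: relational_sample(
                structural[node], neighbor_counts(adj, labels, node, categories)
            )
            for node in adj
        }
        labels = {node: classify(samples[node]) for node in adj}
    return labels

def toy_rule(sample):
    """Stand-in for a trained classifier (hypothetical decision rule)."""
    degree, _clustering, _n_individual, n_company = sample
    return "company" if degree + n_company >= 3 else "individual"

adj = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
structural = {"a": (3, 0.0), "b": (1, 0.0), "c": (1, 0.0), "d": (1, 0.0)}
initial = {node: "individual" for node in adj}
final = iterate_relational(
    adj, structural, initial, ("individual", "company"), toy_rule
)
```

In each round all samples are built from the previous round's labels before any node is re-labeled, so the result does not depend on the order in which nodes are visited.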

SLIDE 13

Specific design details

Some details about the classifier

  • The graph is directed, so we distinguish between indegree and outdegree (instead of having just degree).
  • This distinction increases the number of dimensions in the neighborhood analysis by 2.
  • We can have as many categories as we want: we just have to add more dimensions!
  • Classifiers are instantiated with Support Vector Machines with soft margins.
  • The relational classifier is applied iteratively.
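The indegree/outdegree distinction for a directed graph can be sketched as below. The edge list and the function name `in_out_degree` are hypothetical and only illustrate the bookkeeping.

```python
def in_out_degree(edges):
    """Return (indegree, outdegree) dictionaries for a directed edge list."""
    indeg, outdeg = {}, {}
    for u, v in edges:
        outdeg[u] = outdeg.get(u, 0) + 1
        indeg[v] = indeg.get(v, 0) + 1
        # Make sure every node appears in both dictionaries.
        indeg.setdefault(u, 0)
        outdeg.setdefault(v, 0)
    return indeg, outdeg

# Hypothetical toy edges: (u, v) means "u follows v".
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "carol")]
indeg, outdeg = in_out_degree(edges)
```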

SLIDE 14

1. Introduction
2. Classifier proposal
3. The experiments
     • Experiment design
     • Experiment results
4. Conclusions and further work

SLIDE 15

Experiment design

The main goal

Research question

Is an attacker able to recover attributes of OSN users knowing just the social graph structure and the attributes of a small subset of the nodes in the graph?

We are facing a within-network classification problem, where nodes whose labels are unknown are linked to nodes whose labels are known.

SLIDE 16

Experiment design

Data used in the experiments

  • We collected data from 936,423 Twitter users, who were all the neighbors of a subset of 300 nodes.
  • We constructed two disjoint graphs G1 = (V1, E1) and G2 = (V2, E2) with users and their relationships.
  • We labeled the nodes of the graphs to obtain the ground truth:
      • Binary classification: individual or company.
      • Multiclass classification: normal user, blogger, celebrity, media and organization.

SLIDE 17

Experiment design

An experiment

Each of the experiments consisted of:

  1. Randomly selecting a subset of nodes (Vtrain) to be used as training samples: 65%, 50%, 35% and 20% of nodes.
  2. Training the classifiers with those samples.
  3. Classifying the rest of the nodes (Vtest = V \ Vtrain).
  4. Evaluating the overall performance using the ground truth.

We performed 100 experiments for each of the training set sizes and for both classification problems.
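The split-and-evaluate part of one experiment can be sketched as follows. The labels are synthetic, the prediction is a trivial placeholder instead of a trained classifier, and the helper names `split_nodes` and `correct_rate` are assumptions made for the example.

```python
import random

def split_nodes(nodes, train_fraction, rng):
    """Randomly split nodes into disjoint training and test sets."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return set(shuffled[:cut]), set(shuffled[cut:])

def correct_rate(predicted, truth, test_nodes):
    """Fraction of test nodes whose predicted label matches the ground truth."""
    hits = sum(1 for n in test_nodes if predicted[n] == truth[n])
    return hits / len(test_nodes)

rng = random.Random(42)

# Synthetic ground truth for 20 hypothetical nodes.
truth = {i: ("company" if i % 4 == 0 else "individual") for i in range(20)}

v_train, v_test = split_nodes(truth, 0.65, rng)

# Placeholder prediction: a real run would train on v_train and predict v_test.
predicted = {n: "individual" for n in truth}
rate = correct_rate(predicted, truth, v_test)
```

Repeating this with fresh random splits and averaging the resulting rates would mirror the repeated-experiment design described above.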

SLIDE 18

Experiment results

Binary Classification Results

[Figure "Correct rates": correct rate (y-axis, 0.5 to 0.75) against iteration (x-axis, 1 to 10). Series: D1 and D2, each with 65%, 50%, 35% and 20% training sets.]

SLIDE 19

Experiment results

Multiclass Classification Results

[Figure "Correct rates": correct rate (y-axis, 0.3 to 0.6) against iteration (x-axis, 1 to 10). Series: Cata with 65%, 50%, 35% and 20% training sets.]

SLIDE 20

1. Introduction
2. Classifier proposal
3. The experiments
4. Conclusions and further work

SLIDE 21

Conclusions

  • Information found in the social graph is enough to perform classification.
  • It is possible to classify OSN users using a naively anonymized copy of a social graph.
  • Naive anonymization does not protect OSN users from attribute disclosure.
  • Success rate varies depending on the training set size.

SLIDE 22

Further work

  • Integrate both structural and semantic information to improve classification.
  • Study the impact of different graph anonymization techniques (other than naive anonymization) on the classification.
  • Analyze the performance of other classification techniques for relational data.

SLIDE 24

Linear SVM
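For reference, the textbook soft-margin formulation of a linear SVM is reproduced below. The notation is the standard one and is an assumption here, since the slide text gives no formula: training samples x_i with labels y_i in {-1, +1}, weight vector w, bias b, slack variables ξ_i, and penalty parameter C.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
  y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,
  \quad \xi_i \ge 0,
  \quad i = 1, \dots, n.
```

A larger C penalizes margin violations more heavily, while a smaller C tolerates more misclassified training samples in exchange for a wider margin.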

SLIDE 25

Non-linear SVM
