Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography

Feb 03, 2024

SLIDE 1

Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography

SLIDE 2

A social network occurs anywhere there is social interaction between people.

Examples include email, instant messaging, Facebook, blog trackbacks, and coauthor networks.

SLIDE 3

SLIDE 4

The structure of social networks can be interesting.

How are friendships usually structured? Are there hubs, such as Heather, who connect separate networks? How many degrees of Kevin Bacon? We can investigate these questions if we have the data to mine.

SLIDE 5

For our examples, we will use a network of emails sent between users.

How do we protect users’ privacy while still releasing the data for research?

[Figure: two vertices, John and Mary, joined by a directed edge]

SLIDE 6

Remove any identifiable information, such as names and other attributes.

Randomly rename the vertices.

[Figure: vertices R3579X and R73313]

SLIDE 7

Convert directed edges to undirected edges. With edge direction gone, an attack becomes more complex to carry out.

[Figure: vertices R3579X and R73313 joined by an undirected edge]
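The three anonymization steps (strip attributes, rename vertices at random, drop edge direction) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `anonymize`, the `R`-plus-digits label format, and the sample email list are all hypothetical and are not taken from the paper.

```python
import random

def anonymize(directed_edges, seed=None):
    """Naively anonymize a list of (sender, receiver) pairs.

    Returns the released undirected edge set and the secret name mapping.
    """
    rng = random.Random(seed)
    names = sorted({v for edge in directed_edges for v in edge})
    # Draw distinct random numbers so the new labels carry no information.
    ids = rng.sample(range(10**6), len(names))
    mapping = {name: f"R{i:06d}" for name, i in zip(names, ids)}
    # A frozenset has no order, so a->b and b->a collapse into one edge.
    released = {
        frozenset((mapping[a], mapping[b]))
        for a, b in directed_edges
        if a != b
    }
    return released, mapping

# Hypothetical email log: John and Mary wrote to each other, Mary wrote to Tom.
emails = [("John", "Mary"), ("Mary", "John"), ("Mary", "Tom")]
released, secret_mapping = anonymize(emails, seed=0)
```

Only `released` would be published; `secret_mapping` stays with the data owner. The attacks on the following slides work without ever seeing that mapping.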

SLIDE 8

Let’s say you are an attacker and want to know whether two particular users are connected in the released graph.

All the identifying info has been removed, so how do you do it?

SLIDE 9

An active attack involves the adversary creating vertices in the graph before the graph is released.

The adversary creates edges between these vertices in a pattern that it can recognize later, once the graph is released.

SLIDE 10

We create k new vertices, with k around 2 log n, where n is the total number of vertices.

We connect each new vertex to between d0 and d1 of the other vertices in the graph.

Then, we create edges among the new vertices at random, each with independent probability 1/2.
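The construction on this slide can be sketched as follows. It is a minimal sketch that follows the slide text only, so it may leave out details of the paper's construction. Assumptions: the logarithm is base 2, the graph is a dict mapping each vertex to a set of neighbours, d1 is at most n, and the names `plant_subgraph` and `x0, x1, ...` are hypothetical.

```python
import math
import random

def plant_subgraph(adj, d0, d1, seed=None):
    """Add about 2*log2(n) new vertices to adj (vertex -> neighbour set), in place.

    Returns the list of new vertex names in creation order.
    """
    rng = random.Random(seed)
    existing = list(adj)
    n = len(existing)
    k = math.ceil(2 * math.log2(n))  # assumption: base-2 logarithm
    new = [f"x{i}" for i in range(k)]
    for x in new:
        adj[x] = set()
    # External edges: each new vertex links to between d0 and d1 existing vertices.
    for x in new:
        for v in rng.sample(existing, rng.randint(d0, d1)):
            adj[x].add(v)
            adj[v].add(x)
    # Internal edges: each pair of new vertices is linked with probability 1/2.
    for i in range(k):
        for j in range(i + 1, k):
            if rng.random() < 0.5:
                adj[new[i]].add(new[j])
                adj[new[j]].add(new[i])
    return new

# Hypothetical background graph: a ring of 16 users.
graph = {i: {(i - 1) % 16, (i + 1) % 16} for i in range(16)}
planted = plant_subgraph(graph, d0=2, d1=4, seed=1)
```

The attacker would record the degrees and internal edges of the planted vertices, since that record is what gets searched for after the graph is released.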

SLIDE 11

Given the released graph, how do we find the subgraph that we created?

Build a search tree, pruning it based on the properties of our subgraph, such as the degrees of our new vertices.
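One way to picture the pruned search is a plain backtracking routine that extends a partial match one vertex at a time and abandons any branch whose degree or edge pattern disagrees with what was planted. This is a generic sketch and not necessarily the search order the paper uses. The function `find_planted`, its parameters, and the small released graph with labels such as `R41` are all made up for illustration.

```python
def find_planted(adj, degrees, internal):
    """Return every vertex tuple in adj that matches the planted pattern.

    adj:      released graph, vertex -> set of neighbours
    degrees:  degrees[i] is the total degree of the i-th planted vertex
    internal: set of frozenset({i, j}) index pairs that were linked
    """
    k = len(degrees)
    matches = []

    def extend(partial):
        i = len(partial)
        if i == k:
            matches.append(tuple(partial))
            return
        for v in adj:
            # Prune: vertex already used, or its degree does not fit.
            if v in partial or len(adj[v]) != degrees[i]:
                continue
            # Prune: edges to earlier picks must match the planted pattern.
            if all(
                (partial[j] in adj[v]) == (frozenset((j, i)) in internal)
                for j in range(i)
            ):
                partial.append(v)
                extend(partial)
                partial.pop()

    extend([])
    return matches

# Hypothetical released graph; the attacker does not know which labels are theirs.
edges = [
    ("R41", "R07"), ("R07", "R93"), ("R41", "R12"), ("R41", "R58"),
    ("R41", "R66"), ("R41", "R20"), ("R07", "R12"), ("R07", "R58"),
    ("R07", "R66"), ("R07", "R20"), ("R93", "R35"), ("R12", "R58"),
    ("R58", "R66"), ("R66", "R20"), ("R20", "R35"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# What the attacker remembers: three planted vertices with degrees 5, 6, 2,
# linked as a chain 0-1-2 with no edge between 0 and 2.
found = find_planted(adj, [5, 6, 2], {frozenset((0, 1)), frozenset((1, 2))})
```

When `found` contains exactly one tuple, the attacker has located the planted vertices and can read off which anonymized vertices they are linked to.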

SLIDE 12

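[Figure: graph with vertices Tom, John, Mary, Mike, Zoe]

SLIDE 13

[Figure: graph with vertices Tom, John, Mary, Mike, Zoe and new vertices k1, k2, k3, k4, k5]

SLIDE 14

[Figure: graph with vertices Tom, John, Mary, Mike, Zoe and new vertices k1, k2, k3, k4, k5]

SLIDE 15

[Figure: graph with vertices Tom, John, Mary, Mike, Zoe and new vertices k1, k2, k3, k4, k5]

SLIDE 16

[Figure: anonymized graph with vertices JKL, ZXCV, QWER, DFG, WER, UYT, ASD, HGF, ASDF, BNM]

SLIDE 17

[Figure: graph with vertices JKL, ZXCV, QWER, k1, k2, k3, k4, k5, ASDF, BNM]

SLIDE 18

[Figure: graph with vertices JKL, John, Mary, k1, k2, k3, k4, k5, ASDF, BNM]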

SLIDE 19

The paper proves that the search tree does not grow too large, so the recovery algorithm performs well.

It also proves that the subgraph is unique with high probability, so that we don’t identify the wrong subgraph.
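A rough counting argument gives some intuition for why k around 2 log n is enough for uniqueness. It is a simplification and not the paper's proof: it only considers candidate vertex sets that do not overlap the planted vertices, it assumes every internal edge is an independent fair coin flip (as described on the construction slide), and it takes logarithms base 2. The symbol δ (a small positive constant) is introduced here for illustration. A fixed ordered set of k existing vertices reproduces the planted edge pattern with probability 2^(-C(k,2)), and there are fewer than n^k such sets, so the expected number of accidental copies is bounded as follows.

```latex
\mathbb{E}[\text{accidental copies}]
  \;\le\; n^{k} \cdot 2^{-\binom{k}{2}}
  \;=\; 2^{\,k \log_2 n \;-\; k(k-1)/2}

% With k = (2+\delta)\log_2 n the exponent becomes
k\left(\log_2 n - \frac{k-1}{2}\right)
  \;=\; k\left(\frac{1}{2} - \frac{\delta}{2}\log_2 n\right)
  \;\longrightarrow\; -\infty \quad (n \to \infty)
```

So the expected number of accidental copies tends to zero as n grows. Candidate sets that share vertices with the planted subgraph need a more careful argument, which this sketch does not attempt.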

SLIDE 20

They simulate an attack on LiveJournal friendship links. They create the accounts on the website, make the connections, and then crawl the site and anonymize the data.

The network has 4.4 million nodes and 77 million edges.

SLIDE 21

SLIDE 22

A second active attack (the cut-based attack in the paper) needs only on the order of sqrt(log n) new nodes to attack the graph.

However, it’s much more computationally intensive and less practical in the real world, even though it takes fewer nodes.
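To get a feel for the two node counts, the snippet below plugs in the LiveJournal network size quoted earlier. It is back-of-the-envelope arithmetic only. The logarithm base (2) is an assumption, the variable names are hypothetical, and sqrt(log n) is an asymptotic order of growth, so any constant factor is ignored.

```python
import math

n = 4_400_000  # LiveJournal node count quoted in the slides

log_n = math.log2(n)  # assumption: base-2 logarithm
first_attack_nodes = 2 * log_n          # the "2 log n" construction
second_attack_nodes = math.sqrt(log_n)  # the "sqrt(log n)" construction

print(f"2 * log2(n)   = {first_attack_nodes:.1f}")
print(f"sqrt(log2(n)) = {second_attack_nodes:.1f}")
```

Under these assumptions the first figure comes out at roughly 44 and the second at roughly 5, which illustrates the trade-off on this slide: far fewer accounts to create, at the price of a more expensive search.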

SLIDE 23

SLIDE 24

A passive attack is a lot like an active attack, except you don’t create new nodes. Instead, you collaborate with your friends and find yourselves in the graph.

However, because you did not specifically target certain people, you may not be able to identify other people once you find yourselves.

We cannot rely on anonymization alone to ensure privacy in social networks.

Possible improvement: add noise to the data by adding or removing random edges.
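The noise idea can be sketched as a random edge perturbation applied before release. This is only an illustration of the suggestion above: the function `perturb`, its parameters, and the five-vertex example are hypothetical, and the sketch says nothing about how much noise would actually be needed to defeat an attack.

```python
import random
from itertools import combinations

def perturb(edges, vertices, n_remove, n_add, seed=None):
    """Return a noisy copy of an undirected edge set.

    Removes n_remove random existing edges and adds n_add random new ones.
    Enumerates all vertex pairs, so it is only suitable for small graphs.
    """
    rng = random.Random(seed)
    edges = {frozenset(e) for e in edges}
    # Sort first so the result is reproducible for a given seed.
    removed = set(rng.sample(sorted(edges, key=sorted), n_remove))
    non_edges = [
        frozenset(pair)
        for pair in combinations(sorted(vertices), 2)
        if frozenset(pair) not in edges
    ]
    added = set(rng.sample(non_edges, n_add))
    return (edges - removed) | added

# Hypothetical five-vertex path graph.
vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
noisy = perturb(edges, vertices, n_remove=1, n_add=1, seed=0)
```

Because the planted subgraph is recognized by its exact degrees and edges, even a few flipped edges near it could break the match, though the same noise also distorts the data for legitimate research.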