Generating Useful Network-based Features for Analyzing Social Networks



SLIDE 1

Generating Useful Network-based Features for Analyzing Social Networks

Jun Karamon, Yutaka Matsuo and Mitsuru Ishizuka

University of Tokyo
Published in Proc. of AAAI 2008
Presented by: Congyi Liu

SLIDE 2

OUTLINE

  • Introduction
  • Related Works
  • Methodology
  • Experiment Result
  • Discussion and Conclusion

SLIDE 3

Social Network

  • Interaction among users creates a social network among them. Many efforts are underway to analyze user interactions by analyzing these social networks.
  • Link-based classification: classifying samples using the relations and links that are present among them.
  • Link prediction: predicting whether there will be a link between a pair of nodes in the future, given the previously observed links.

SLIDE 4

Motivation

Motivation: Greater potential exists for new features that use the network structure.

Problems:
  • Numerous methods exist to aggregate features for link-based classification and link prediction;
  • The network structure among users influences each user differently;
  • It is difficult to determine in advance which feature aggregation is useful.

SLIDE 5

Contribution

Propose an algorithm that systematically identifies important network-based features from a given social network, in order to analyze user behavior efficiently.

  • Define general operators that are applicable to the social network;
  • Combinations of the operators provide different features;
  • On the @cosme and Hatena Bookmark datasets, the performance of link-based classification and link prediction improves compared to existing approaches.

SLIDE 6

Features used in Social Network Analysis

  • Density: the number of edges in a (sub-)graph, expressed as a proportion of the maximum possible number of edges.
  • Centrality measures: measure the structural importance of a node, e.g. the power of individual actors.
  • Characteristic path length: the average distance between any two nodes in the network (or a component of it).
  • Clustering coefficient: the ratio of edges between the nodes within a node's neighborhood to the number of edges that can possibly exist between them.
  • Structural equivalence, structural holes…
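
To make three of these definitions concrete, here is a minimal sketch in plain Python. The toy graph `GRAPH` and all function names are assumptions made for illustration; they do not come from the paper or the slides.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy graph (undirected adjacency sets), for illustration only.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def density(graph):
    """Edges present divided by the maximum possible number of edges."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return edges / (n * (n - 1) / 2)

def clustering_coefficient(graph, node):
    """Share of possible edges among a node's neighbors that actually exist."""
    nbrs = graph[node]
    if len(nbrs) < 2:
        return 0.0
    possible = len(nbrs) * (len(nbrs) - 1) / 2
    present = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return present / possible

def distances_from(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(graph):
    """Average shortest-path distance over all connected pairs of distinct nodes."""
    total, pairs = 0, 0
    for u in graph:
        for v, d in distances_from(graph, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs
```

On this toy graph, 4 of the 6 possible edges exist, so `density(GRAPH)` is 4/6, and node `"a"` has a clustering coefficient of 1.0 because its two neighbors are themselves connected.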

SLIDE 7

Other Features used in Related Works

  • Features used in link-based classification
  • Features used in link prediction

SLIDE 8

Feature Generation: Intuition

Traditional studies in social science have demonstrated the usefulness of several indices, so we can assume that generating features toward those indices is also useful.

SLIDE 9

Feature Generation

  • Step 1: Defining a Node Set
      Based on the network structure: e.g., $C_x^{(k)}$ is the set of nodes within distance $k$ from $x$.
      Based on the category of a node: e.g., $N_{A=a}$ is the node set for which the categorical value $A$ is $a$.
  • Step 2: Operation on a Node Set
      Define operators with respect to two nodes, then expand them to a node set.
      $s^{(k)}(x, y)$ returns 1 if nodes $x$ and $y$ are within distance $k$, and 0 otherwise.
      $u_x(y, z)$ returns 1 if the shortest path between $y$ and $z$ includes node $x$, and 0 otherwise.
      $u_x \circ N$ returns a set of values, one for each pair $y, z \in N$.
  • Step 3: Aggregation of Values
      Given a list of values, several standard operations can be applied to the list, e.g., summation (Sum), average (Avg), maximum (Max), and minimum (Min).
  • Step 4: Optionally, we can take the average, difference, or product of two values obtained in Step 3.
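
The four steps can be sketched in plain Python as follows. This is an illustrative reading of the slide, not the authors' implementation. The toy graph and every function name are assumptions. The sketch also assumes that the node set excludes $x$ itself, that "the shortest path" means any shortest path, and that an empty value list aggregates to 0.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (undirected adjacency sets), for illustration only.
GRAPH = {
    "x": {"a", "b"},
    "a": {"x", "b"},
    "b": {"x", "a", "c"},
    "c": {"b"},
}

def bfs_distances(graph, source):
    """Shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def node_set_within(graph, x, k):
    """Step 1: nodes within distance k of x (x itself excluded, by assumption)."""
    return {v for v, d in bfs_distances(graph, x).items() if 0 < d <= k}

def s(graph, k, y, z):
    """Step 2: 1 if y and z are within distance k of each other, else 0."""
    return 1 if bfs_distances(graph, y).get(z, float("inf")) <= k else 0

def u(graph, x, y, z):
    """Step 2: 1 if some shortest path between y and z passes through x, else 0."""
    dy, dx = bfs_distances(graph, y), bfs_distances(graph, x)
    if z not in dy or x not in dy or z not in dx:
        return 0
    return 1 if dy[x] + dx[z] == dy[z] else 0

def over_pairs(op, nodes):
    """Step 2: expand a two-node operator to every unordered pair in a node set."""
    return [op(y, z) for y, z in combinations(sorted(nodes), 2)]

# Step 3: standard aggregations; empty lists map to 0 (an assumption).
AGGREGATORS = {
    "Sum": sum,
    "Avg": lambda vals: sum(vals) / len(vals) if vals else 0.0,
    "Max": lambda vals: max(vals, default=0),
    "Min": lambda vals: min(vals, default=0),
}

def feature(graph, x, k, agg):
    """Steps 1-3 composed: aggregate s^(1) over all pairs in the node set of x."""
    nodes = node_set_within(graph, x, k)
    values = over_pairs(lambda y, z: s(graph, 1, y, z), nodes)
    return AGGREGATORS[agg](values)

# Step 4 (optional): combine two aggregated values, here by difference.
difference = feature(GRAPH, "x", 2, "Sum") - feature(GRAPH, "x", 1, "Sum")
```

Taking `Avg` over $s^{(1)}$ for all pairs in a node's neighborhood gives the share of neighbor pairs that are directly linked, which is one way a composition of operators can line up with a familiar index.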

SLIDE 10

For Link Prediction: Relational Features

  • Generate network-based features that represent a score (i.e., connection weight) on two nodes x and y.
      e.g., calculate preferential attachment (|Γ(x)| · |Γ(y)|) by counting the links of nodes x and y separately, then taking the product of the two values.
  • Define a node set that is relevant to both node x and node y.
      e.g., common neighbors (|Γ(x)∩Γ(y)|) is the number of nodes that are adjacent to both x and y.
  • Several operators must be added or modified for link prediction, beyond those used for link-based classification, to cover more features.
      e.g., operator $u_x$ is modified to $u_{xy}(z, w)$, which returns 1 if the shortest path between z and w includes $l_{xy}$, and 0 otherwise.
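
The two pair scores named on this slide can be written directly from their formulas. The sketch below is a minimal illustration. The toy graph and the function names are assumptions, and Γ(x) is represented as the adjacency set of x.

```python
# Hypothetical toy network (undirected adjacency sets), for illustration only.
GRAPH = {
    "x": {"a", "b"},
    "y": {"a", "b", "c"},
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"y"},
}

def preferential_attachment(graph, x, y):
    """|Γ(x)| · |Γ(y)|: product of the two nodes' neighbor counts."""
    return len(graph[x]) * len(graph[y])

def common_neighbors(graph, x, y):
    """|Γ(x) ∩ Γ(y)|: number of nodes adjacent to both x and y."""
    return len(graph[x] & graph[y])
```

Preferential attachment aggregates each node's set separately and then multiplies, while common neighbors first builds a single node set that depends on both x and y.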

SLIDE 11

Operator List

SLIDE 12

Constraints

  • 64 features are generated for link-based classification. For link prediction, 126 features are generated in Method 1 and 160 features in Method 2.
  • Some of the resulting features correspond to well-known indices, e.g., the network density can be expressed as a composition of the operators.
  • For link prediction, several features that are often used in the related literature can also be generated, e.g., common neighbors can be realized as a composition of the operators.

SLIDE 13

Datasets

@cosme dataset

Data selection for link-based classification:
  ① Choose a community as the target;
  ② Select users in the community as positive examples;
  ③ As negative examples, select users who are not in the community but who have friends in the target community.

Data selection for link prediction:
  ① The positive examples are picked randomly from links created between time T and T' (T < T' < T'');
  ② The negative examples are those created between time T' and T''.

Hatena Bookmark dataset

  • First define similarity between users.
  • Create training and test data in the same way as for the @cosme dataset.
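
The example selection for link-based classification can be sketched as a small set computation. This is an illustrative reading of steps ① to ③ only. The friendship map `FRIENDS`, the user IDs, and the function name `select_examples` are hypothetical and are not taken from the @cosme data.

```python
# Hypothetical friendship network (undirected adjacency sets), for illustration only.
FRIENDS = {
    "u1": {"u2", "u3"},
    "u2": {"u1"},
    "u3": {"u1", "u4"},
    "u4": {"u3", "u5"},
    "u5": {"u4"},
}

def select_examples(friends, community):
    """Return (positives, negatives) for one target community.

    Positives are the community members. Negatives are non-members
    who have at least one friend inside the community.
    """
    positives = set(community)
    negatives = {
        user
        for user, user_friends in friends.items()
        if user not in positives and user_friends & positives
    }
    return positives, negatives

positives, negatives = select_examples(FRIENDS, {"u1", "u2"})
```

Here `u3` becomes a negative example because it is a friend of member `u1`, while `u4` and `u5` are left out because none of their friends belong to the target community.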

SLIDE 14

Results: Link-based Classification

SLIDE 15

Results: Link-based Classification

SLIDE 16

Results: Link Prediction

SLIDE 17

Results: Link Prediction

SLIDE 18

Discussion

  • There is a tradeoff between keeping the operators simple and covering various indices.
  • Some other features cannot be composed in the current setting.
  • The authors do not argue that the operators defined are optimal or better than any other set of operators.
  • The number of features becomes huge as more operators are added.

SLIDE 19

Conclusion

  • The method can generate features that are well studied in social network analysis, along with some useful new features, in a systematic fashion.
  • The proposed method was applied to two datasets for link-based classification and link prediction tasks, demonstrating that some features are useful for predicting user interactions.

SLIDE 20

Thank You!