Link Prediction in Online Social Networks Using Group Information (PowerPoint presentation)

SLIDE 1

Introduction Proposal Experimental Evaluation Conclusion

Link Prediction in Online Social Networks Using Group Information

Jorge Valverde-Rebaza and Alneu de Andrade Lopes

Laboratory of Computational Intelligence (LABIC) University of São Paulo (USP) Brazil

July 2014

SLIDE 2

Outline

1. Introduction
2. Proposal
3. Experimental Evaluation
4. Conclusion

Jorge Valverde-Rebaza and Alneu de Andrade Lopes Link Prediction in OSN using group information 2

SLIDE 3

Outline: 1. Introduction

SLIDE 4

Social Networks

A social network is a structure made up of a set of actors (individuals or organizations) and the social relations between them.

Social network analysis is an active research field in graph and complex network theory, data mining, machine learning, and other areas.

Rise of online social networks

SLIDE 5

Groups Detection

Real networks are characterized by a high concentration of links within particular groups of vertices and a low concentration of links between these groups.

Online social networks offer a wide variety of possible groups: families, work and friendship circles, artistic or academic preferences, towns, nations, etc.

SLIDE 6

Link Prediction Process

SLIDES 7-10

Link Prediction Measures

Based on global information
Higher accuracy.
Very time-consuming computation; usually infeasible for large-scale networks.
E.g.: Katz index, hitting time index, SimRank, etc. [Lü and Zhou, 2011]

Based on local information
Lower accuracy than measures based on global information.
Faster computation.
E.g.: Common Neighbors (CN), Adamic-Adar (AA), Jaccard (Jac), Resource Allocation (RA), Preferential Attachment (PA), etc. [Lü and Zhou, 2011]

Hybrid strategy based on community information
As the community structure becomes stronger, the accuracy of these measures improves drastically.
They perform better than most measures based on local information.
E.g.: PFF [Zheleva et al., 2010], CN1, RA1 [Soundarajan and Hopcroft, 2012], WIC, W-measures [Valverde-Rebaza and Lopes, 2012], etc.

Limitation: these hybrid measures assume that a node belongs to just one group.
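The local measures listed on these slides have standard closed forms [Lü and Zhou, 2011]: CN = |Λx,y|, Jac = |Λx,y| / |Γ(x) ∪ Γ(y)|, AA = Σ 1/log k_z and RA = Σ 1/k_z over the common neighbors z, and PA = k_x · k_y, where k is the degree. A minimal Python sketch, assuming an undirected graph stored as a dict of neighbor sets; the helper names and the toy graph are hypothetical, for illustration only:

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as adjacency sets: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def gamma(adj, x):
    """Gamma(x): the neighbors of x (empty for unknown nodes)."""
    return adj.get(x, set())


def common(adj, x, y):
    """Lambda_{x,y} = Gamma(x) & Gamma(y): the common neighbors of x and y."""
    return gamma(adj, x) & gamma(adj, y)


def cn(adj, x, y):
    return len(common(adj, x, y))


def jaccard(adj, x, y):
    union = gamma(adj, x) | gamma(adj, y)
    return len(common(adj, x, y)) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    # For x != y every common neighbor has degree >= 2, so log(k_z) > 0.
    return sum(1.0 / math.log(len(gamma(adj, z))) for z in common(adj, x, y))


def resource_allocation(adj, x, y):
    return sum(1.0 / len(gamma(adj, z)) for z in common(adj, x, y))


def preferential_attachment(adj, x, y):
    return len(gamma(adj, x)) * len(gamma(adj, y))


# Toy graph (hypothetical): b and d are not linked but share neighbors a and c.
adj = build_graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")])
for name, measure in [("CN", cn), ("Jac", jaccard), ("AA", adamic_adar),
                      ("RA", resource_allocation), ("PA", preferential_attachment)]:
    print(name, round(measure(adj, "b", "d"), 3))
```

For the pair (b, d) both nodes have neighbors {a, c}, so CN = 2 and Jac = 1.0, while AA and RA down-weight each shared neighbor by its degree (3 for both a and c).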

SLIDE 11

Outline: 2. Proposal

SLIDE 12

Preliminary

We consider that each node participates in multiple groups.
In the network $G(V, E)$ there exist $M > 1$ groups, identified by distinct group labels $g_1, g_2, \ldots, g_M$.
Each node $x$ belongs to a set of groups $G = \{g_a, g_b, \ldots, g_p\}$ of size $P$, with $0 < P \le M$.
The set of neighbors of a vertex $x$ is $\Gamma(x) = \{y \mid (x, y) \in E\}$.
The set of all common neighbors (CN) of a vertex pair $(x, y)$ is $\Lambda_{x,y} = \Gamma(x) \cap \Gamma(y)$.

SLIDE 13

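CN Within and Outside of Common Groups (WOCG)

Let $G_\alpha$ and $G_\beta$ be the group sets of $x$ and $y$, let $z^{G_\gamma}$ denote a common neighbor $z$ with group set $G_\gamma$, and let $G_{\alpha,\beta} = G_\alpha \cap G_\beta$. We redefine the set of CN as $\Lambda_{x,y} = \Lambda^{WCG}_{x,y} \cup \Lambda^{OCG}_{x,y}$, where:

$\Lambda^{WCG}_{x,y} = \{z^{G_\gamma} \in \Lambda_{x,y} \mid G_{\alpha,\beta} \cap G_\gamma \neq \emptyset\}$ is the set of common neighbors within the common groups (WCG);
$\Lambda^{OCG}_{x,y} = \Lambda_{x,y} - \Lambda^{WCG}_{x,y}$ is the set of common neighbors outside the common groups (OCG).

Our final score, called the common neighbors within and outside of common groups (WOCG) measure, is defined as:

$$s^{WOCG}_{x,y} = \frac{|\Lambda^{WCG}_{x,y}|}{|\Lambda^{OCG}_{x,y}|} \qquad (1)$$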
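A minimal sketch of Eq. (1), assuming group memberships are stored as a dict from each node to a set of group labels. It reads the WCG condition as $G_{\alpha,\beta} \cap G_\gamma \neq \emptyset$ (a common neighbor that belongs to at least one group shared by x and y). The slides do not say how an empty OCG set is handled, so the fallback denominator of 1 is our assumption; `wocg_score` and the toy data are hypothetical:

```python
def wocg_score(adj, groups, x, y):
    """s^WOCG_{x,y} = |Lambda^WCG_{x,y}| / |Lambda^OCG_{x,y}| (Eq. 1).

    adj:    node -> set of neighbors (undirected)
    groups: node -> set of group labels (hypothetical representation)
    """
    common = adj.get(x, set()) & adj.get(y, set())        # Lambda_{x,y}
    shared = groups.get(x, set()) & groups.get(y, set())  # G_{alpha,beta}
    # WCG: common neighbors that belong to at least one group shared by x and y.
    wcg = {z for z in common if shared & groups.get(z, set())}
    ocg = common - wcg                                     # OCG = Lambda - WCG
    # Assumption: if OCG is empty the denominator falls back to 1 (score = |WCG|).
    return len(wcg) / max(len(ocg), 1)


# Toy example (hypothetical): x and y share group g2 and four common neighbors.
groups = {"x": {"g1", "g2"}, "y": {"g2", "g3"},
          "z1": {"g2"}, "z2": {"g1"}, "z3": {"g4"}, "z4": {"g1", "g3"}}
adj = {"x": {"z1", "z2", "z3", "z4"}, "y": {"z1", "z2", "z3", "z4"}}
adj.update({z: {"x", "y"} for z in ("z1", "z2", "z3", "z4")})

print(wocg_score(adj, groups, "x", "y"))  # WCG = {z1}, OCG = {z2, z3, z4}
```

Only z1 is in the group g2 common to x and y, so the score is 1/3; note that z4 counts as "outside" even though it shares a group with each endpoint separately.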

SLIDE 14

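Common Neighbors of Groups (CNG)

We define the set of common neighbors of groups, i.e., the common neighbors that share at least one group with $x$ or with $y$, as
$\Lambda^{G}_{x,y} = \{z^{G_\gamma} \in \Lambda_{x,y} \mid G_\alpha \cap G_\gamma \neq \emptyset \lor G_\beta \cap G_\gamma \neq \emptyset\}$

Our final score, called the common neighbors of groups (CNG) measure, is defined as:

$$s^{CNG}_{x,y} = |\Lambda^{G}_{x,y}| \qquad (2)$$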
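Eq. (2) can be sketched the same way. This assumes the same hypothetical dict-of-sets representation and reads the set conditions as "≠ ∅" (the common neighbor shares at least one group with x or with y); `cng_score` and the toy data are illustrative only:

```python
def cng_score(adj, groups, x, y):
    """s^CNG_{x,y} = |Lambda^G_{x,y}| (Eq. 2)."""
    common = adj.get(x, set()) & adj.get(y, set())         # Lambda_{x,y}
    g_x, g_y = groups.get(x, set()), groups.get(y, set())  # G_alpha, G_beta
    # Lambda^G: common neighbors sharing at least one group with x OR with y.
    lam_g = {z for z in common
             if (g_x & groups.get(z, set())) or (g_y & groups.get(z, set()))}
    return len(lam_g)


# Toy example (hypothetical): four common neighbors with different group overlaps.
groups = {"x": {"g1", "g2"}, "y": {"g2", "g3"},
          "z1": {"g2"}, "z2": {"g1"}, "z3": {"g4"}, "z4": {"g1", "g3"}}
adj = {"x": {"z1", "z2", "z3", "z4"}, "y": {"z1", "z2", "z3", "z4"}}
adj.update({z: {"x", "y"} for z in ("z1", "z2", "z3", "z4")})

print(cng_score(adj, groups, "x", "y"))  # z1, z2 and z4 qualify; z3 does not
```

Unlike plain CN (which would give 4 here), CNG ignores z3, whose only group g4 is unrelated to both endpoints.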

SLIDE 15

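CN with Total and Partial Overlapping of Groups (TPOG)

We redefine the set of CNG as $\Lambda^{G}_{x,y} = \Lambda^{TOG}_{x,y} \cup \Lambda^{POG}_{x,y}$, where:

$\Lambda^{TOG}_{x,y} = \{z^{G_\gamma} \in \Lambda^{G}_{x,y} \mid G_\alpha \cap G_\gamma \neq \emptyset \land G_\beta \cap G_\gamma \neq \emptyset\}$ is the set of CN with total overlapping of groups (TOG), i.e., sharing groups with both $x$ and $y$;
$\Lambda^{POG}_{x,y} = \Lambda^{G}_{x,y} - \Lambda^{TOG}_{x,y}$ is the set of CN with partial overlapping of groups (POG), i.e., sharing groups with only one of them.

Our final score, called the common neighbors with total and partial overlapping of groups (TPOG) measure, is defined as:

$$s^{TPOG}_{x,y} = \frac{|\Lambda^{TOG}_{x,y}|}{|\Lambda^{POG}_{x,y}|} \qquad (3)$$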
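A minimal sketch of Eq. (3) under the same assumptions as the previous measures: group sets as a dict of sets, set conditions read as "≠ ∅", and a fallback denominator of 1 when POG is empty (our choice; the slides do not address it). `tpog_score` and the toy data are hypothetical:

```python
def tpog_score(adj, groups, x, y):
    """s^TPOG_{x,y} = |Lambda^TOG_{x,y}| / |Lambda^POG_{x,y}| (Eq. 3)."""
    common = adj.get(x, set()) & adj.get(y, set())         # Lambda_{x,y}
    g_x, g_y = groups.get(x, set()), groups.get(y, set())  # G_alpha, G_beta

    def shares(z, g):
        """True if node z belongs to at least one group in g."""
        return bool(g & groups.get(z, set()))

    lam_g = {z for z in common if shares(z, g_x) or shares(z, g_y)}  # Lambda^G
    tog = {z for z in lam_g if shares(z, g_x) and shares(z, g_y)}     # total overlapping
    pog = lam_g - tog                                                  # partial overlapping
    # Assumption: if POG is empty the denominator falls back to 1 (score = |TOG|).
    return len(tog) / max(len(pog), 1)


# Toy example (hypothetical), same groups as in the WOCG and CNG sketches.
groups = {"x": {"g1", "g2"}, "y": {"g2", "g3"},
          "z1": {"g2"}, "z2": {"g1"}, "z3": {"g4"}, "z4": {"g1", "g3"}}
adj = {"x": {"z1", "z2", "z3", "z4"}, "y": {"z1", "z2", "z3", "z4"}}
adj.update({z: {"x", "y"} for z in ("z1", "z2", "z3", "z4")})

print(tpog_score(adj, groups, "x", "y"))  # TOG = {z1, z4}, POG = {z2}
```

Here z4 counts as total overlapping (it shares g1 with x and g3 with y) even though it is outside the common group g2, which is exactly where TPOG and WOCG diverge.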

SLIDE 16

Outline: 3. Experimental Evaluation

SLIDE 17

Datasets

Table: High-level topological features of our four social networks [Mislove et al., 2007]

Feature                                  Flickr        LiveJournal   Orkut          Youtube
Number of nodes                          1,846,198     5,284,457     3,072,441      1,157,827
Number of links                          22,613,981    77,402,652    223,534,301    4,945,382
Average degree per node                  12.24         16.97         106.1          4.29
Fraction of links symmetric              62.0%         73.5%         100.0%         79.1%
Average path length                      5.67          5.88          4.25           5.10
Diameter                                 27            20            9              21
Average clustering coefficient           0.313         0.330         0.171          0.136
Average assortativity coefficient        0.202         0.179         0.072          −0.033
Number of node groups                    103,648       7,489,073     8,730,859      30,087
Average group memberships per node       4.62          21.25         106.44         0.25
Average group size                       82            15            37             10
Average group clustering coefficient     0.47          0.81          0.52           0.34

SLIDE 18

Experimental setup

For a network $G(V, E)$, the set $E$ is divided into a training set $E^T$ and a test set $E^P$.
$E^P$ is formed by randomly selecting 2/3 of the links formed by nodes whose degree is more than twice the average degree; the remaining links constitute $E^T$.
This procedure is performed 10 times for each network.
We evaluate the traditional local measures CN, AA, Jac, RA and PA, and our proposals WOCG, CNG and TPOG.
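The split described above can be sketched as follows. This is a hypothetical reading: we assume a link is eligible for $E^P$ only when both of its endpoints have degree greater than twice the average degree (the slide does not say whether one or both endpoints must qualify). `split_edges` and the toy hub graph are illustrative only:

```python
import random
from collections import Counter


def split_edges(edges, test_fraction=2 / 3, degree_factor=2.0, seed=None):
    """Return (E^T, E^P) for an undirected edge list (hypothetical helper).

    Assumption: a link is eligible for E^P only if BOTH endpoints have degree
    greater than degree_factor times the average degree.
    """
    if not edges:
        return [], []
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    threshold = degree_factor * sum(degree.values()) / len(degree)
    eligible = [e for e in edges if degree[e[0]] > threshold and degree[e[1]] > threshold]
    rng = random.Random(seed)
    test = rng.sample(eligible, round(test_fraction * len(eligible)))  # E^P
    test_set = set(test)
    train = [e for e in edges if e not in test_set]                    # E^T
    return train, test


# Toy graph (hypothetical): three interlinked hubs, each with ten leaves.
hubs = ["h1", "h2", "h3"]
edges = [("h1", "h2"), ("h1", "h3"), ("h2", "h3")]
edges += [(h, f"{h}_leaf{i}") for h in hubs for i in range(10)]

# The slide repeats the split 10 times per network; here one seed per run.
splits = [split_edges(edges, seed=run) for run in range(10)]
```

In the toy graph the average degree is 2, so only the hubs (degree 12) pass the threshold of 4; each run moves 2 of the 3 hub-to-hub links into $E^P$ and keeps the other 31 links in $E^T$.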

SLIDE 19

Experimental setup

For each network, five types of characteristic (feature) vectors were considered: VLocal (all the local measures), VGroup (all our proposals), VTop (the three best local measures, CN, AA and RA, plus our two best proposals, CNG and TPOG), VTop2 (the five best measures overall: TPOG, CNG, AA, WOCG and CN) and VTotal (all measures evaluated).

Table: Number of instances by class for all networks

Network       Existent     Non-existent   Total
Flickr        500,001      500,001        1,000,002
LiveJournal   300,001      300,001        600,002
Orkut         1,500,001    1,500,001      3,000,002
Youtube       20,001       20,001         40,002
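The class table above is balanced: each network has as many non-existent pairs as existent links. A minimal sketch of how such instances might be assembled, assuming positives are held-out links, negatives are node pairs sampled uniformly among non-links, and each feature vector holds one score per measure (for example CN, AA, Jac, RA and PA for VLocal). The sampling scheme and all names (`sample_non_existent`, `build_instances`) are our assumptions, not the authors' code:

```python
import random


def sample_non_existent(nodes, edges, n, seed=None):
    """Draw n distinct undirected node pairs that are not links (hypothetical helper)."""
    existing = {frozenset(e) for e in edges}
    nodes = sorted(nodes)
    max_pairs = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    if n > max_pairs:
        raise ValueError("not enough non-existent pairs to sample")
    rng = random.Random(seed)
    sampled = set()
    while len(sampled) < n:
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in existing:
            sampled.add(pair)
    return [tuple(sorted(p)) for p in sampled]


def build_instances(positives, negatives, measures):
    """One row per pair: (feature vector of measure scores, class label 1/0)."""
    rows = [([m(u, v) for m in measures], 1) for u, v in positives]
    rows += [([m(u, v) for m in measures], 0) for u, v in negatives]
    return rows


# Toy data (hypothetical): scores are computed on the training graph only.
nodes = ["a", "b", "c", "d", "e"]
train_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
test_links = [("a", "d"), ("b", "d")]  # hidden links playing the role of E^P

train_adj = {n: set() for n in nodes}
for u, v in train_edges:
    train_adj[u].add(v)
    train_adj[v].add(u)


def cn(u, v):
    return len(train_adj[u] & train_adj[v])


def pa(u, v):
    return len(train_adj[u]) * len(train_adj[v])


negatives = sample_non_existent(nodes, train_edges + test_links, n=len(test_links), seed=0)
rows = build_instances(test_links, negatives, [cn, pa])
```

Drawing exactly as many negatives as positives reproduces the 1:1 class ratio in the table; the rows can then be handed to any classifier.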

SLIDES 20-21

Unsupervised results measured by AUC

Table: Prediction results measured by AUC (rank on each network in parentheses; the last row gives the average rank, with the overall rank in parentheses)

Network       WOCG         CNG          TPOG         CN           AA           Jac          RA           PA
Flickr        0.637 (5.0)  0.728 (1.0)  0.728 (2.0)  0.674 (3.0)  0.656 (4.0)  0.431 (8.0)  0.616 (6.0)  0.566 (7.0)
LiveJournal   0.596 (4.0)  0.611 (3.0)  0.665 (1.0)  0.582 (5.0)  0.580 (6.0)  0.624 (2.0)  0.565 (7.0)  0.542 (8.0)
Orkut         0.649 (2.0)  0.621 (3.0)  0.651 (1.0)  0.572 (7.0)  0.620 (4.0)  0.575 (6.0)  0.566 (8.0)  0.602 (5.0)
Youtube       0.434 (7.0)  0.723 (5.0)  0.555 (6.0)  0.834 (4.0)  0.928 (1.0)  0.217 (8.0)  0.892 (3.0)  0.917 (2.0)
Average rank  4.50 (4.0)   3.00 (2.0)   2.50 (1.0)   4.75 (5.0)   3.75 (3.0)   6.00 (7.5)   6.00 (7.5)   5.50 (6.0)

Figure: Post-hoc test for results with CD = 5.25 (critical difference diagram over average ranks 1 to 8, ordering the measures as TPOG, CNG, AA, WOCG, CN, PA, Jac, RA)
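The average ranks in the table and the reported CD can be reproduced from the per-network ranks. We assume the post-hoc test is the Nemenyi test at α = 0.05, whose critical difference is CD = q_α · √(k(k+1)/(6N)) with k = 8 measures, N = 4 networks and q_0.05 = 3.031; the slides only report CD = 5.25, which this formula matches. Variable names are hypothetical:

```python
import math
from itertools import combinations

# Per-network ranks copied from the AUC table above.
RANKS = {
    "Flickr":      {"WOCG": 5, "CNG": 1, "TPOG": 2, "CN": 3, "AA": 4, "Jac": 8, "RA": 6, "PA": 7},
    "LiveJournal": {"WOCG": 4, "CNG": 3, "TPOG": 1, "CN": 5, "AA": 6, "Jac": 2, "RA": 7, "PA": 8},
    "Orkut":       {"WOCG": 2, "CNG": 3, "TPOG": 1, "CN": 7, "AA": 4, "Jac": 6, "RA": 8, "PA": 5},
    "Youtube":     {"WOCG": 7, "CNG": 5, "TPOG": 6, "CN": 4, "AA": 1, "Jac": 8, "RA": 3, "PA": 2},
}

Q_ALPHA_005_K8 = 3.031  # Nemenyi critical value for k = 8 at alpha = 0.05 (assumed test)


def average_ranks(ranks):
    """Mean rank of each measure over all networks."""
    measures = next(iter(ranks.values())).keys()
    n = len(ranks)
    return {m: sum(r[m] for r in ranks.values()) / n for m in measures}


def nemenyi_cd(q_alpha, k, n):
    """Critical difference for k methods compared on n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


avg = average_ranks(RANKS)
cd = nemenyi_cd(Q_ALPHA_005_K8, k=len(avg), n=len(RANKS))
# Pairs of measures whose average ranks differ by more than CD.
significant = [(a, b) for a, b in combinations(avg, 2) if abs(avg[a] - avg[b]) > cd]
print(avg)
print(round(cd, 2))
print(significant)
```

The largest gap between average ranks (6.00 - 2.50 = 3.50) is below CD, so `significant` is empty, in line with the conclusion that there is no statistically significant winner.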

SLIDE 22

Unsupervised results measured by precision

Figure: Precision results on the four social networks evaluated: (a) Flickr, (b) LiveJournal, (c) Orkut, (d) Youtube. Each panel plots precision (y-axis, 0.2 to 1) against L, the number of top-ranked predicted links (x-axis, 1,000 to 5,000), for WOCG, CNG, TPOG, CN, AA, Jac, RA and PA.
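The two unsupervised metrics can be sketched as follows, using the usual definitions from the survey by Lü and Zhou (2011): AUC compares the scores of missing (test) links against those of non-existent links, counting ties as one half, and precision is the fraction of test links among the L top-ranked candidate pairs. Function names and toy inputs are hypothetical, and tie-breaking in the ranking is our own simplifying choice:

```python
import random


def auc(test_scores, nonexistent_scores, n_samples=None, seed=None):
    """Probability that a missing (test) link outscores a non-existent link; ties count 0.5.

    With n_samples=None every (test, non-existent) pair is compared; otherwise
    n_samples random pairs are drawn, as in the sampling procedure of Lü and Zhou.
    """
    if n_samples is None:
        pairs = [(a, b) for a in test_scores for b in nonexistent_scores]
    else:
        rng = random.Random(seed)
        pairs = [(rng.choice(test_scores), rng.choice(nonexistent_scores))
                 for _ in range(n_samples)]
    higher = sum(1 for a, b in pairs if a > b)
    ties = sum(1 for a, b in pairs if a == b)
    return (higher + 0.5 * ties) / len(pairs)


def precision_at_l(scored_pairs, test_links, L):
    """Fraction of the L highest-scored candidate pairs that are test links (Lr / L)."""
    # sorted() is stable, so tied scores keep their input order (a simplifying choice).
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    top = [pair for pair, _ in ranked[:L]]
    return sum(1 for pair in top if pair in test_links) / L


print(auc([3, 2], [1, 2]))  # 3 wins + 1 tie over 4 comparisons
scored = [(("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "c"), 0.1)]
print(precision_at_l(scored, {("a", "c")}, L=2))
```

Pairs must use a canonical node order (here sorted tuples) so that membership tests against the test set are reliable; with L = 1,000, 2,500 and 5,000 this yields the three points per curve in the figure.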

SLIDE 23

Supervised results measured by F-measure

Table: Average F-measure on the four social networks

Network       Vector   J48     NB      MLP     SMO
Flickr        VLocal   0.777   0.507   0.713   0.651
Flickr        VGroup   0.706   0.583   0.699   0.668
Flickr        VTop     0.724   0.525   0.711   0.676
Flickr        VTop2    0.722   0.558   0.709   0.669
Flickr        VTotal   0.777   0.548   0.712   0.680
LiveJournal   VLocal   0.797   0.687   0.788   0.774
LiveJournal   VGroup   0.768   0.698   0.768   0.750
LiveJournal   VTop     0.791   0.700   0.787   0.772
LiveJournal   VTop2    0.79    0.691   0.781   0.772
LiveJournal   VTotal   0.797   0.702   0.786   0.774
Orkut         VLocal   0.825   0.702   0.800   0.764
Orkut         VGroup   0.781   0.676   0.773   0.737
Orkut         VTop     0.799   0.720   0.77    0.759
Orkut         VTop2    0.793   0.722   0.773   0.758
Orkut         VTotal   0.826   0.731   0.801   0.771
Youtube       VLocal   0.823   0.531   0.73    0.565
Youtube       VGroup   0.658   0.563   0.655   0.567
Youtube       VTop     0.789   0.543   0.724   0.617
Youtube       VTop2    0.780   0.600   0.717   0.613
Youtube       VTotal   0.826   0.577   0.723   0.623
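The F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). As an illustrative check of the conclusion's claim that combining our proposals with local measures "may improve" classifiers, the sketch below also counts, from the table values, how often VTotal (local measures plus our proposals) scores above VLocal for the same network and classifier. The helper names are hypothetical:

```python
# Average F-measure per (network, vector), copied from the table above.
# Tuple order: (J48, NB, MLP, SMO).
F = {
    ("Flickr", "VLocal"): (0.777, 0.507, 0.713, 0.651),
    ("Flickr", "VTotal"): (0.777, 0.548, 0.712, 0.680),
    ("LiveJournal", "VLocal"): (0.797, 0.687, 0.788, 0.774),
    ("LiveJournal", "VTotal"): (0.797, 0.702, 0.786, 0.774),
    ("Orkut", "VLocal"): (0.825, 0.702, 0.800, 0.764),
    ("Orkut", "VTotal"): (0.826, 0.731, 0.801, 0.771),
    ("Youtube", "VLocal"): (0.823, 0.531, 0.73, 0.565),
    ("Youtube", "VTotal"): (0.826, 0.577, 0.723, 0.623),
}


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def compare(table, base="VLocal", other="VTotal"):
    """Count (network, classifier) cells where `other` beats, ties or trails `base`."""
    better = equal = worse = 0
    for network in sorted({net for net, _ in table}):
        for b, o in zip(table[(network, base)], table[(network, other)]):
            if o > b:
                better += 1
            elif o == b:
                equal += 1
            else:
                worse += 1
    return better, equal, worse


print(compare(F))
```

On these values VTotal is higher in 10 of the 16 cells, equal in 3 and lower in 3, and all three lower cells are MLP.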

SLIDE 24

Outline: 4. Conclusion

SLIDE 25

Conclusion

Our proposals consider that a node can belong to more than one group, as usually occurs in real networks.
In an unsupervised strategy, our proposals (notably TPOG and CNG, which obtain the best average AUC ranks) outperform the local measures, but there is no statistically significant winner.
In a supervised strategy, combining our proposals with local measures may improve the performance of classifiers.
In general, our proposals improve the performance of the link prediction task mainly by exploiting information about the common groups to which users belong.

SLIDE 26

References

Lü, L. and Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170.

Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., and Bhattacharjee, B. (2007). Measurement and analysis of online social networks. In ACM SIGCOMM IMC '07, pages 29–42.

Soundarajan, S. and Hopcroft, J. (2012). Using community information to improve the precision of link prediction methods. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, pages 607–608.

Valverde-Rebaza, J. and Lopes, A. (2012). Link Prediction in Complex Networks Based on Cluster Information. In Advances in Artificial Intelligence - SBIA 2012, Lecture Notes in Computer Science, pages 92–101. Springer Berlin Heidelberg.

Zheleva, E., Getoor, L., Golbeck, J., and Kuter, U. (2010). Using friendship ties and family circles for link prediction. In Proceedings of the Second International Conference on Advances in Social Network Mining and Analysis, SNAKDD'08, pages 97–113, Berlin, Heidelberg.

SLIDE 27

Thank you

Jorge Carlos Valverde-Rebaza

jvalverr@icmc.usp.br