Quick detection of popular entities in large on-line networks
Nelly Litvak
University of Twente, Stochastic Operations Research group
Joint work with K. Avrachenkov (INRIA), L. Ostroumova (Yandex)
Luchon, 24-06-2014

Finding largest nodes in large complex networks
◮ Complex networks: Internet, World Wide Web, social networks, protein-protein interactions, citation networks.
◮ Many networks are very large.
  ◮ Facebook has more than 1 billion users. With an average user having 190 friends, the number of social links in Facebook is 190 billion.
  ◮ The static part of the web graph has more than 10 billion pages. With an average number of 38 hyper-links per page, the total number of hyper-links is 380 billion.

Finding top-k largest degree nodes
◮ Goal: Find top-k network nodes with largest degrees
◮ Some applications:
  ◮ Routing via large degree nodes
  ◮ Proxy for various centrality measures
  ◮ Node clustering and classification
  ◮ Epidemic processes on networks
  ◮ Finding most popular entities (e.g. interest groups)
◮ It is simply interesting!

Top-k largest degree nodes
If the adjacency list of the network is known, the top-k list of nodes can be found by HeapSort with complexity O(N + k log N), where N is the total number of nodes. Even this modest complexity can be demanding for large networks.
Questions:
◮ How to do this faster?
◮ How to do it when the network structure is not known (cannot be crawled without restrictions or stored in memory)?
Answer: Randomized algorithms.
Idea: Find a 'good enough' answer in a short time.
References: Avrachenkov, Litvak, Sokol, Towsley (2012); Cooper, Radzik, Siantos (2012); Borgs, Brautbar, Chayes, Khanna, Lucier (2012); Brautbar and Kearns (2010); Kumar, Lang, Marlow, Tomkins (2008)
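As a minimal sketch of the heap-based baseline mentioned above, the following Python snippet builds a heap over all N nodes and pops the k largest, which gives the O(N + k log N) cost. The function name `top_k_by_degree` and the dictionary input are illustrative assumptions, not part of the slides.

```python
import heapq


def top_k_by_degree(degrees, k):
    """Return the k (node, degree) pairs with the largest degrees.

    `degrees` maps each node to its degree (a hypothetical input format).
    Building the heap costs O(N); each of the k pops costs O(log N).
    """
    # heapq implements a min-heap, so negate degrees to pop the largest first.
    heap = [(-deg, node) for node, deg in degrees.items()]
    heapq.heapify(heap)  # O(N)

    top = []
    for _ in range(min(k, len(heap))):
        neg_deg, node = heapq.heappop(heap)  # O(log N)
        top.append((node, -neg_deg))
    return top
```

This approach needs the degree of every node up front, which is exactly what is unavailable when the network can only be queried through a rate-limited API.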

Finding most popular entities in directed on-line social networks
◮ Social networks are large
◮ The complete graph structure is only available to the owners
◮ Many companies maintain network statistics (twittercounter.com, followerwonk.com, twitaholic.com, www.insidefacebook.com, yavkontakte.ru)
◮ The network can be accessed only via API, with limited access
◮ Twitter API allows one access per minute. We need 950 years to crawl the current Twitter graph!
Goal: Find top-k most popular entities in social (directed) networks (nodes with highest in/out-degrees, largest interest groups, largest user categories), using the minimal number of API requests.
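The 950-year figure can be reproduced with a rough calculation. The sketch below assumes about 500 million users (the size quoted for Twitter elsewhere in these slides) and, as a simplification, exactly one request per user; the function name is illustrative.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def crawl_time_years(num_users, requests_per_minute=1.0):
    """Years needed to issue one request per user at the given rate.

    Assumes a single request per user is enough, which is a simplification.
    """
    return num_users / requests_per_minute / MINUTES_PER_YEAR


# Roughly 500 million users at one request per minute.
print(f"{crawl_time_years(500_000_000):.0f} years")
```

Under these assumptions the result is about 951 years, in line with the slide's estimate.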

Problem formulation
◮ Consider a bipartite graph (V, W, E)
◮ V and W are sets of entities, |V| = M, |W| = N.
◮ A directed edge (v, w) ∈ E represents a relation between v ∈ V and w ∈ W.
◮ Goal: Quickly find entities in W with highest degrees.
Example. V = W is a set of Twitter users, (v, w) means that v follows w.
Example. V is a set of users, W is a set of interest groups, (v, w) means that user v is a member of an interest group w.

Algorithm for finding top-k most popular entities
1. Choose a set A ⊂ V of n1 nodes sampled from V at random.
2. For each v ∈ A retrieve the IDs of nodes in W that have an edge from v.
3. Compute S_w, the number of edges of w ∈ W from A.
4. Retrieve the actual degrees for the n2 nodes w with the largest values of S_w.
5. Return the identified top-k list of most popular entities in W.
In total, we use n = n1 + n2 requests to API (Step 2 and Step 4).
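The five steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `get_out_neighbors` and `get_in_degree` are hypothetical callables standing in for the two kinds of API request, and sampling is done uniformly without replacement, which the slides do not specify.

```python
import random
from collections import Counter


def find_top_k(V, get_out_neighbors, get_in_degree, n1, n2, k, rng=None):
    """Two-stage randomized search for the k highest-degree nodes in W.

    get_out_neighbors(v) -> iterable of w with an edge (v, w)  [hypothetical API call]
    get_in_degree(w)     -> actual degree of w                 [hypothetical API call]
    Uses n1 + n2 requests in total.
    """
    rng = rng or random.Random()

    # Step 1: sample n1 nodes from V (assumed uniform, without replacement).
    A = rng.sample(list(V), n1)

    # Steps 2-3: one request per sampled node; S[w] counts edges from A to w.
    S = Counter()
    for v in A:
        S.update(set(get_out_neighbors(v)))

    # Step 4: one request per candidate, for the n2 nodes with the largest S[w].
    candidates = [w for w, _ in S.most_common(n2)]
    degree = {w: get_in_degree(w) for w in candidates}

    # Step 5: rank the candidates by their actual degrees and keep the top k.
    return sorted(degree, key=degree.get, reverse=True)[:k]
```

The sample counts S_w only select the candidates; the final ranking uses the actual degrees retrieved in Step 4, so a node that is over- or under-represented in the sample is still ordered correctly once it makes the candidate list.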

Finding most followed users on Twitter
◮ Huge network (more than 500M users)
◮ Network accessed only through Twitter API
◮ The rate of requests is limited
◮ One request:
  ◮ IDs of at most 5000 followees of a node, or
  ◮ the number of followers of a node
◮ In a randomly chosen set of n1 Twitter users only a few users follow more than 5000 people. Thus, we retrieve at most 5000 followees of each node. This does not affect the results.
