Inference in OSNs via Lightweight Partial Crawls Jithin K. - PowerPoint PPT Presentation

Estimator
Key property of tours: [equation image not preserved]. Annotated terms on the slide:
- Degree of the super-node
- Length of the k-th tour
- Samples in the k-th tour
- True value of the contracted graph
f(u, v) := g(u, v), except when u or v is S_n
Jithin K. Sreedharan (jithin.sreedharan@inria.fr)
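The slide's formula did not survive extraction, so the following is only a minimal sketch of a tour-based estimator under stated assumptions, not the authors' exact expression. It assumes an undirected simple graph whose contracted version stays connected, a simple random walk that starts at the super-node and stops on its first return, and an edge function that is symmetric in its arguments. The names `make_demo_graph`, `estimate_edge_sum` and `edge_value` are hypothetical. The scaling by (super-node degree) / (2 x number of tours) follows from the renewal-reward argument for random walks on undirected graphs.

```python
import random


def make_demo_graph(n=12, skip=5):
    """Hypothetical test graph: a ring of n nodes plus one chord (i, i + skip) per node."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in ((i + 1) % n, (i + skip) % n):
            adj[i].add(j)
            adj[j].add(i)
    # Sorted neighbour lists keep the walk reproducible for a fixed seed.
    return {u: sorted(vs) for u, vs in adj.items()}


def estimate_edge_sum(adj, super_node, edge_value, num_tours, seed=0):
    """Estimate the sum of edge_value(u, v) over all edges using random-walk tours."""
    rng = random.Random(seed)
    # Edges leaving the super-node. Repeated endpoints are kept on purpose so
    # that multi-edges of the contracted graph are sampled proportionally.
    boundary = [v for s in sorted(super_node) for v in adj[s] if v not in super_node]
    d_s = len(boundary)  # degree of the super-node in the contracted graph

    total = 0.0
    for _ in range(num_tours):
        cur = rng.choice(boundary)         # first step: leave the super-node
        while True:
            nxt = rng.choice(adj[cur])
            if nxt in super_node:          # the tour ends on return to the super-node
                break
            total += edge_value(cur, nxt)  # edge with both ends outside the super-node
            cur = nxt

    # Tour average, rescaled to the sum over edges outside the super-node.
    outside = d_s / (2.0 * num_tours) * total
    # Edges touching the super-node are known exactly once it has been crawled.
    known = sum(
        edge_value(s, v)
        for s in super_node
        for v in adj[s]
        if v not in super_node or s < v
    )
    return outside + known
```

With `edge_value = lambda u, v: 1.0` the function estimates the number of edges; on the demo graph with super-node `{0, 1}` the result should land close to the true count of 24 once a few thousand tours are used.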

Estimator
- Unbiased (unlike the asymptotically unbiased estimator in [Ribeiro and Towsley '10])
- Strongly consistent
- Confidence interval (from the sampled variance)
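Because tours that start and end at the same super-node are independent and identically distributed, the per-tour contributions can be treated as an i.i.d. sample. The sketch below shows one common way to turn that into a confidence interval, assuming a normal approximation; the slide does not state which interval the authors use, and `tour_confidence_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean, stdev


def tour_confidence_interval(per_tour_values, level=0.95):
    """Normal-approximation confidence interval for the mean of per-tour values."""
    m = len(per_tour_values)
    center = mean(per_tour_values)
    s = stdev(per_tour_values)  # sample standard deviation (m - 1 in the denominator)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    half_width = z * s / math.sqrt(m)
    return center, center - half_width, center + half_width
```

The interval narrows at rate 1/sqrt(m) in the number of tours m, so halving its width takes roughly four times as many tours.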

Bayesian formulation
Find a posterior probability distribution with a suitable prior distribution

Bayesian formulation (contd.)
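The transcript does not say which prior is meant. As an illustration only, the sketch below assumes the per-tour (or per-batch) estimates are approximately Gaussian with unknown mean and variance and uses the conjugate normal-inverse-gamma prior; this choice and the name `nig_posterior` are assumptions, not taken from the slides.

```python
def nig_posterior(samples, mu0, nu0, alpha0, beta0):
    """Conjugate normal-inverse-gamma update for Gaussian data with unknown mean and variance.

    Prior: variance ~ InvGamma(alpha0, beta0), mean | variance ~ Normal(mu0, variance / nu0).
    Returns the posterior hyperparameters (mu_n, nu_n, alpha_n, beta_n).
    """
    n = len(samples)
    xbar = sum(samples) / n
    ss = sum((x - xbar) ** 2 for x in samples)  # sum of squared deviations
    nu_n = nu0 + n
    mu_n = (nu0 * mu0 + n * xbar) / nu_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + (n * nu0 / nu_n) * (xbar - mu0) ** 2 / 2.0
    return mu_n, nu_n, alpha_n, beta_n
```

Under this prior the marginal posterior of the mean is a Student-t distribution with 2 * alpha_n degrees of freedom, location mu_n and squared scale beta_n / (alpha_n * nu_n), which gives credible intervals directly.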

Simulations on real-world networks
Dogster network: an online social network for dogs

Simulations on real-world networks: Dogster network
415K nodes, 8.27M edges
Percentage of graph covered: 2.72% (edges), 14.86% (nodes)
[Plot: estimated value]

Simulations on real-world networks: Friendster network
64K nodes, 1.25M edges
Percentage of graph covered: 7.43% (edges), 18.52% (nodes)
[Two plots, each marking the estimated value]

Simulations on real-world networks: ADD Health data
A friendship network among high school students in the USA
1545 nodes, 4003 edges
Percentage of graph covered: 10.87% (edges), 19.76% (nodes)
[Two plots, each marking the estimated value]

What if the super-node is not that “super”?
Adaptive crawler: the super-node gets bigger as crawling progresses
How to add nodes to the super-node:
- Via any method, as long as it is independent of the already observed tours
- Emulates retrospectively adding the new node i into the super-node from the start
- Checks previous tours and breaks them wherever i is found
  [Figure: an original tour (sample 1, sample 2, ..., sample k = S_n) broken into Tours 1 to 4 at the occurrences of node i]
- Starts k new tours from the newly added node i; k ~ negative binomial distribution (a function of the degrees of i and the number of tours)
  “Correction” tours from i: start at i, end at i or at the super-node
Theorem: Dynamic and static super-node sample paths are equivalent in distribution
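The tour-breaking step can be sketched as follows. This assumes each stored tour is the list of nodes visited strictly between leaving and re-entering the super-node, and that pieces left empty after the split are simply dropped; both are assumptions of this sketch, and `split_tours` is a hypothetical name. The correction tours and their negative binomial count are not covered here, since the slide does not give the distribution's parameters.

```python
def split_tours(tours, new_node):
    """Break stored tours at every visit to new_node, as if it had always been in the super-node."""
    result = []
    for tour in tours:
        segment = []
        for node in tour:
            if node == new_node:
                # A visit to new_node now counts as a return to the super-node,
                # so the current piece is closed and a new one begins.
                if segment:
                    result.append(segment)
                segment = []
            else:
                segment.append(node)
        if segment:
            result.append(segment)
    return result
```

For example, a stored tour `["a", "b", "i", "c", "i", "d"]` with `new_node="i"` becomes three shorter tours, while tours that never visit `"i"` are kept unchanged.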

From the metric μ(G), does the network look random?

Estimation and hypothesis testing in Chung-Lu or configuration model
Assumption: edge labels can be written as a function of node labels
- Does the true value of the given graph belong to the class of values obtained when the edges are formed purely at random?
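To make the null model concrete: in the Chung-Lu model an edge between u and v appears independently with probability d_u d_v / (2|E|), where d_u is the degree of u. The sketch below computes only the expected value of an edge-sum metric under that model when the edge label is a function of the two node labels. It ignores self-loops and caps probabilities at 1, both assumptions of this sketch; the slides do not spell out the actual test statistic, and `chung_lu_expected_value` is a hypothetical name.

```python
from itertools import combinations


def chung_lu_expected_value(degrees, labels, edge_value):
    """Expected sum of edge_value over edges when edges follow the Chung-Lu model."""
    two_m = sum(degrees.values())  # twice the number of edges
    total = 0.0
    for u, v in combinations(sorted(degrees), 2):
        # Probability that the pair (u, v) is connected, capped at 1.
        p_uv = min(1.0, degrees[u] * degrees[v] / two_m)
        total += p_uv * edge_value(labels[u], labels[v])
    return total
```

An estimate from the crawl that falls far from this expectation, relative to its confidence interval, would suggest that the edges were not formed purely at random given the degrees.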


Inference in OSNs via Lightweight Partial Crawls
Jithin K. Sreedharan (Inria, France), Konstantin Avrachenkov (Inria, France), Bruno Ribeiro (Purdue University, USA)
Sigmetrics 2016, June 16
Motivation: Estimation and inference in Online Social
