Accelerating Influence Spread Estimation on Social Networks in the Continuous-time Domain - PowerPoint PPT Presentation
Accelerating Influence Spread Estimation on Social Networks in the Continuous-time Domain
Koushik Pal, Zissis Poulos
Viral Marketing
- Online social networks enable large scale word-of-mouth marketing
Word-of-Mouth Marketing Strategy
- The Big Question: which individuals should we target initially, such that the expected number of follow-ups is maximized?
- Identify influencers
- Convince them to adopt the idea/product
- These customers then endorse the product among their friends
Influence Maximization in the Continuous-time Domain
- Model: Continuous-time Independent Cascade [Nan Du et al, NIPS 2013]
- Infection: a node adopts the opinion/product
- Pairwise conditional transmission density between nodes:
  fji(tj | ti) = fji(tj - ti) over time, i.e. "the time it takes for node i to infect node j"
[Figure: example network with nodes 1-4; per-edge transmission densities f(tj - ti) shown for pairs such as Eddie to Jane and Jane to Mike; one sample draws edge weights 0.3, 1.2, 0.1, 0.6, 0.2]
Sampling is required to generate weights
Slide Credit: Nan Du et al, NIPS 2013
Influence Maximization in the Continuous-time Domain
[Example graph repeated: nodes 1-4 with sampled edge weights 0.3, 1.2, 0.1, 0.6, 0.2]
Shortest Path Property
- For a given sample, node 1 infects node 4 after time D14 = length of the shortest path between nodes 1 and 4; here D14 = 0.6
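The shortest-path property can be checked directly on one sampled graph. A minimal Python sketch; the topology and the specific weight-to-edge assignment below are a hypothetical reconstruction of the slide's 4-node example:

```python
import heapq

def dijkstra(adj, src):
    """Single-source shortest paths on one weight sample (adjacency lists)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Hypothetical sample of the 4-node graph: topology assumed, weights chosen
# so the shortest 1 -> 4 path has length 0.6 as on the slide.
adj = {1: [(2, 0.1), (4, 1.2)], 2: [(3, 0.2), (4, 0.6)], 3: [(4, 0.3)]}
D = dijkstra(adj, 1)
print(round(D[4], 2))  # D14 = 0.6: node 1 infects node 4 after 0.6 time units
```

Each Monte Carlo sample fixes one set of edge weights, so "who infects whom, and when" reduces to shortest-path computations on that sample.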
Influence Maximization in the Continuous-time Domain
- In reality a campaign has a strict deadline T
- Role of T in spread
- Expected spread σ of a node (or set of nodes) = expected # of nodes it infects by the deadline T
[Figure: previous example extended with node 5 and edge weight 0.5; with deadline T = 1, D14 = 0.6 ≤ T so node 4 is infected, while D15 = 1.1 > T so node 5 is not infected]
Infected nodes may change per sample
Problem Statement
- “Find set S of k nodes that maximizes expected spread σ(S)”
- NP-hard... but σ is monotone and submodular, so greedy selection achieves a (1 - 1/e) ≈ 63% approximation (Kempe et al., 2003)
Greedy selection:
1: initialize S = Ø
2: for i = 1 to k do
3:   select u ← argmax over w ∈ V\S of [σ(S ∪ {w}) - σ(S)]
4:   S ← S ∪ {u}
5: end for
6: return S

Spread estimation per candidate (naïve):
1: for j = 1 to N do              // N samples, ≈ 100,000
2:   for all nodes not in S do    // # nodes, |V|
3:     enumerate shortest paths ≤ T
4: return u ← node with max # of such paths on avg

- Computing the exact spread σ(S) is #P-complete, hence the sampling
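The greedy loop above can be sketched in a few lines of Python. Here sigma is a stand-in spread oracle (a toy additive spread just to make the sketch runnable; in the slides it would be the Monte Carlo or Cohen estimate, which is submodular rather than additive):

```python
def greedy_seed_selection(nodes, sigma, k):
    """Greedy (1 - 1/e)-approximation: repeatedly add the node with the
    largest marginal spread gain sigma(S + {w}) - sigma(S)."""
    S = set()
    for _ in range(k):
        best = max((w for w in nodes if w not in S),
                   key=lambda w: sigma(S | {w}) - sigma(S))
        S.add(best)
    return S

# Toy additive spread oracle (an assumption for illustration only).
values = {"a": 5.0, "b": 3.0, "c": 1.0}
sigma = lambda S: sum(values[v] for v in S)
print(greedy_seed_selection(values.keys(), sigma, 2))  # picks {"a", "b"}
```

Note that each greedy step re-evaluates σ for every remaining candidate, which is exactly why fast spread estimation is the bottleneck.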
Solution 1: Naïve Sampling
- Follows exactly the previous pseudo-code
[Diagram: a weight generator draws samples 1 ... N; each sample's weights w feed a spread evaluation σ(w), and the estimate is the average Σσ(w) / N; samples are completely independent]
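Naïve sampling can be sketched end to end: draw one set of edge delays per sample, run a deadline-bounded shortest-path search from the seed set, and average the infected counts. Exponential transmission delays are an assumption here (the continuous-time model allows any pairwise density):

```python
import heapq, random

def sample_spread(adj, seeds, T, rng):
    """One sample: draw edge delays, count nodes reachable from seeds within T."""
    sample = {u: [(v, rng.expovariate(rate)) for v, rate in nbrs]
              for u, nbrs in adj.items()}
    dist = {s: 0.0 for s in seeds}
    pq = [(0.0, s) for s in seeds]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in sample.get(u, []):
            nd = d + w
            if nd <= T and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return len(dist)  # seeds count as infected

def naive_spread(adj, seeds, T, N=1000, seed=0):
    """Monte Carlo spread estimate: average over N independent samples."""
    rng = random.Random(seed)
    return sum(sample_spread(adj, seeds, T, rng) for _ in range(N)) / N

# Toy graph with per-edge transmission rates (hypothetical values).
toy = {1: [(2, 2.0)], 2: [(3, 2.0)], 3: []}
print(naive_spread(toy, {1}, T=1.0, N=500))
```

The complete independence across samples is visible in the structure: each loop iteration touches only its own drawn weights, which is what makes this variant embarrassingly parallel.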
Solution 2: Cohen’s Estimator
- Proposed by Nan Du et al, NIPS 2013 (ConTinEst framework)
- Replace all-pairs shortest paths with Cohen’s randomized algorithm
- Estimates neighborhood size (spread) per node, per sample
- Faster by a O(|V|/log|V|) factor – fewer samples – speed vs. accuracy trade-off
Naïve sampling:
1: for j = 1 to N do              // N samples, ≈ 100,000
2:   for all nodes not in S do    // # nodes, |V|
3:     enumerate shortest paths ≤ T
4: return u ← node with max # of such paths

Cohen's estimator:
1: for j = 1 to N do              // N samples, ≈ 10,000
2:   for all nodes not in S do    // # nodes, |V|
3:     estimate d-neighborhood with d ≤ T
4: return u ← node with largest neighborhood on avg
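The statistical core of Cohen's estimator can be illustrated in isolation: assign each node an i.i.d. Exp(1) label, record the least label within distance T of the query node, repeat m times, and estimate the neighborhood size as (m - 1) / (sum of least labels). The sketch below finds the T-ball with a plain Dijkstra purely for clarity; Cohen's actual algorithm computes least labels for all nodes at once without per-node searches, which is where the speedup comes from:

```python
import heapq, random

def cohen_neighborhood_size(adj, v, T, m=400, seed=1):
    """Estimate |{u : d(v, u) <= T}| via Cohen's exponential-label estimator."""
    rng = random.Random(seed)
    # Nodes within distance T of v (one Dijkstra here just to expose the set;
    # the real algorithm never materializes it).
    dist = {v: 0.0}
    pq = [(0.0, v)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for nxt, w in adj.get(u, []):
            nd = d + w
            if nd <= T and nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(pq, (nd, nxt))
    total = 0.0
    for _ in range(m):
        labels = {u: rng.expovariate(1.0) for u in adj}
        total += min(labels[u] for u in dist)  # least label in the T-ball
    return (m - 1) / total

# Star graph (hypothetical): 4 neighbors within T, true size 5 including v.
star = {0: [(i, 0.1) for i in range(1, 5)], 1: [], 2: [], 3: [], 4: []}
print(cohen_neighborhood_size(star, 0, T=1.0))  # estimate near the true size 5
```

The minimum of n Exp(1) labels is Exp(n)-distributed, so the sum of m such minima concentrates around m/n, making (m - 1)/total an unbiased size estimate; accuracy grows with m, which is the speed vs. accuracy trade-off the slide mentions.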
Parallelization
- Naïve Sampling:
  - embarrassingly parallel
  - complete independence across samples
  - 100,000 - 1,000,000 samples for convergence, motivates acceleration
- Cohen's Estimator:
  - fewer samples (10,000 - 50,000)
  - core randomized algorithm exhibits heavy sequential dependence
- Concerns: space vs. speed trade-offs
  - need to pre-generate weights (on host vs. on device)
  - balance data loads/unloads between host and device
  - batch sampling?
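The "embarrassingly parallel" structure of naïve sampling can be shown with a host-side sketch: independent batches of samples are farmed out to workers and their spread totals averaged. This is a structural illustration only (a thread pool with a placeholder per-batch evaluator, not the GPU kernel itself):

```python
from concurrent.futures import ThreadPoolExecutor
import random

def batch_spread(batch_size, seed):
    """Placeholder per-batch spread total: each batch seeds its own RNG and
    stands in for per-sample shortest-path evaluation."""
    rng = random.Random(seed)
    return sum(1.0 + rng.random() for _ in range(batch_size))

def parallel_estimate(n_samples, batch_size, workers=4):
    """Split N samples into independent batches; order of evaluation is
    irrelevant because samples are completely independent."""
    batches = [(batch_size, b) for b in range(n_samples // batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        totals = pool.map(lambda a: batch_spread(*a), batches)
    return sum(totals) / n_samples
```

Because every batch is seeded independently, the parallel result is identical to the sequential one, which is exactly the property Cohen's estimator lacks: its core randomized algorithm carries state across steps.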
Data Allocation – Host Side
- G(V,E): adjacency list representation O(|V|+|E|)
- Edge weights: pre-generated and stored for all samples O(N*|E|)
- Memory intensive (2GB for small 200-node network, 1M samples)
- Implement batch sampling/allocation
- fix batch size to constant B such that N/B batches are passed to device
[Diagram: N per-sample weight arrays of size |E| (sample 1, sample 2, ... sample N) pre-generated and stored on the host]
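The sizing that motivates batching is simple arithmetic: N samples times |E| edges times the bytes per weight. A small sketch (float32, i.e. 4 bytes per weight, is an assumption; the slides do not state the precision):

```python
def weight_bytes(n_samples, n_edges, bytes_per_weight=4):
    """Host memory for pre-generated edge weights: one weight per edge per sample."""
    return n_samples * n_edges * bytes_per_weight

def num_batches(n_samples, batch_size):
    """Number of N/B batches passed to the device (last batch may be partial)."""
    return -(-n_samples // batch_size)  # ceiling division

# Twitter_small (2479 edges) at 100,000 samples:
print(weight_bytes(100_000, 2479) / 2**30)  # ~0.92 GiB of weights on the host
```

At the upper end of the sampling range the weight store dominates host memory, which is why only B samples' worth of weights are resident on the device at a time.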
Batch Sampling with Batch Size B
[Diagram: samples grouped into N/B batches of B weight arrays (size |E| each); each batch is copied to the device, a spread is computed per sample, then the next batch is loaded]
Global Memory
[Diagram: graph topology held in global memory as compressed adjacency arrays (offsets plus neighbor lists) alongside per-sample edge weights and the deadline T = 0.5; computed spreads are returned via a device-to-host copy]
Latency Improvements for GPU
- Inherent semi-randomness causes poor memory coalescing
- Adjacent threads may need to access edge weights that are far apart in memory
- Improvement #1: rearrange edge-weight order in device memory
- Improvement #2: use 1D texture memory for read-only data (weights, topology, etc.)
- Improvement #3: disable L1 cache (fewer wasteful fetches)
[Diagram: edge-weight layout indexed by edge (edge k ... all edges) and by sample (sample i ... all samples)]
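Improvement #1 can be illustrated on the host with plain arrays. Under the assumption that thread t handles sample t and all threads read the same edge k's weight in the same step, an edge-major layout puts those N reads in consecutive memory, which is what global-memory coalescing requires:

```python
import random

rng = random.Random(0)
N, E = 8, 5  # samples, edges (toy sizes)

# Sample-major: weights[sample][edge]. Edge k's weights across samples are
# strided E apart, so adjacent threads hit scattered addresses.
sample_major = [[rng.random() for _ in range(E)] for _ in range(N)]

# Edge-major: weights[edge][sample]. Edge k's weights across all samples are
# now contiguous, so adjacent threads read adjacent addresses.
edge_major = [list(col) for col in zip(*sample_major)]

# Same data, different stride.
assert edge_major[3] == [sample_major[s][3] for s in range(N)]
```

The transpose costs one pass over the batch on upload but pays off on every subsequent read inside the kernel.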
Experimental Setup
- System:
- AWS GRID K520
- 3072 CUDA Cores
- 8 GB GDDR5
- Compute Capability 3.0
- CPU: Intel Xeon E5 (Sandy Bridge)
- Social Graphs:
- Twitter_small | 236 nodes| 2479 edges
- Google_medium | 638 nodes | 16043 edges
- Twitter_big | 1049 nodes | 54555 edges
- Sampling range: 100 – 100,000 samples
Results – Naïve Sampling
Performance gains from routing read-only data through the texture pipeline / read-only data cache
Results – Naïve Sampling
[Chart: GPU vs. CPU runtime for naïve sampling; roughly a 3.5x speedup, with the largest CPU runs taking about 9 hrs]
Results – Cohen’s Estimator
- Smaller gains than for naïve sampling (the core randomized algorithm is heavily sequential)
- Space complexity concerns