Milad Eftekhar , Yashar Ganjali, Nick Koudas Introduction - - PowerPoint PPT Presentation



SLIDE 1

Milad Eftekhar, Yashar Ganjali, Nick Koudas

SLIDE 2

Introduction

  • Identifying the k most influential individuals is a well-studied problem.
  • We generalize this problem to identify the l most influential groups.

  • Application:
  • Companies often target groups of people
  • E.g. by billboards, TV commercials, newspaper ads, etc.

SLIDE 3

Group targeting

  • Groups
  • Advantages
  • Improved performance
  • Natural targets for advertising
  • An economical choice

SLIDE 4

Fine-Grained Diffusion (FGD)

  • Determine how advertising to a group translates into individual adopters.

  • Run individual diffusion process on these adopters.

SLIDE 5

FGD Modeling

  • Graph G′: add a node for each group, and add edges between the node corresponding to a group g_i and its members, with weight w_i.
  • The weight w_i depends on the advertising budget, the size of the group, the escalation factor, and the budget needed to convince an individual.

SLIDE 6

FGD Modeling (Cont'd)

  • Escalation factor β: how many times more initial adopters we get by group targeting than by individual targeting.

Example, with an advertising budget of $1000 in both cases:

  • Individual targeting: convincing an individual costs $100, so the budget yields 10 initial adopters.
  • Group targeting: a billboard costs $1000, reaches an audience of 10000 individuals, and yields 20 initial adopters.
  • β = 20 / 10 = 2

SLIDE 7

FGD Modeling (Cont'd)

  • Escalation factor β
  • Based on the problem structure, the size and shape of the network, the initial advertising method, etc.
  • Individual advertising: β = 1
  • Billboard advertising: β = 200
  • Online advertising: β = 400
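
A minimal Python sketch of the FGD graph construction described above. The slides do not give the weight formula, so `group_edge_weight` is an assumption: it takes the weight to be the expected fraction of a group's members who become initial adopters (escalation factor times budget, divided by the per-individual cost and by the group size, capped at 1). The function names, the edge-dictionary layout, and the `("group", name)` node labels are hypothetical.

```python
def group_edge_weight(budget, group_size, beta, cost_per_individual):
    """Assumed weight: expected fraction of the group that adopts initially."""
    expected_adopters = beta * budget / cost_per_individual
    return min(1.0, expected_adopters / group_size)


def build_augmented_graph(individual_edges, groups, budget, beta, cost_per_individual):
    """Return the individual edges plus one weighted edge per (group, member).

    individual_edges: dict mapping (u, v) -> influence probability
    groups: dict mapping group name -> list of member nodes
    """
    edges = dict(individual_edges)
    for name, members in groups.items():
        w = group_edge_weight(budget, len(members), beta, cost_per_individual)
        for member in members:
            # Hypothetical labelling: a group node is the tuple ("group", name).
            edges[(("group", name), member)] = w
    return edges


# Billboard example from the slides: $1000 budget, $100 per individual,
# audience of 10000, escalation factor 2, so 20 expected initial adopters.
w = group_edge_weight(budget=1000, group_size=10000, beta=2, cost_per_individual=100)
print(w)  # 0.002
```

Under this assumed formula, individual targeting is the special case with escalation factor 1 and a group of size 1.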

SLIDE 8

Problem statement

  • Goal: Find the l most influential groups (blue group nodes)
  • NP-hard under the FGD model

[Figure: top-2 influential groups]

SLIDE 9

topfgd algorithm

  • Diffusion in FGD is monotone and submodular
  • topfgd: a greedy algorithm that provides a (1 - 1/e) approximation factor.
  • In each iteration, add the group that yields the maximum marginal increase in the final influence.
  • Time: O(l × m × |E_int| × R)
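
The greedy selection above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes an independent-cascade process on the graph that contains the group nodes, estimates influence by Monte Carlo simulation, and uses hypothetical names throughout (`simulate_cascade`, `estimate_influence`, `greedy_top_groups`, the adjacency-dictionary layout).

```python
import random


def simulate_cascade(edges, seeds, rng):
    """One independent-cascade run; returns the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in edges.get(u, []):
                # Each newly active node gets one chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_influence(edges, group_nodes, chosen, runs, rng):
    """Average number of activated individuals (group nodes are not counted)."""
    total = 0
    for _ in range(runs):
        total += len(simulate_cascade(edges, chosen, rng) - group_nodes)
    return total / runs


def greedy_top_groups(edges, group_nodes, num_groups, runs=200, seed=0):
    """Repeatedly add the group with the largest estimated marginal gain."""
    rng = random.Random(seed)
    chosen, current = [], 0.0
    for _ in range(num_groups):
        best, best_value = None, current
        for g in sorted(group_nodes - set(chosen)):
            value = estimate_influence(edges, group_nodes, chosen + [g], runs, rng)
            if best is None or value > best_value:
                best, best_value = g, value
        if best is None:  # no groups left to add
            break
        chosen.append(best)
        current = best_value
    return chosen
```

The cost of this sketch is roughly the number of selected groups, times the number of candidate groups, times the number of simulation runs, times the edges visited per run, which is why simulation-based greedy selection becomes expensive on large graphs.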

SLIDE 10

Coarse-Grained Diffusion (CGD)

  • FGD is not practical for large social networks
  • Idea: incorporate information about individuals without explicitly running the diffusion at the level of individuals

  • A graph to model inter-group influences

SLIDE 11

CGD Modeling

  • Differences from “Individual Diffusion” models
  • No binary decisions
  • Progress fraction for each group
  • Two types of diffusion
  • Inter-group diffusion
  • Intra-group diffusion
  • Submodularity?

[Figure: a group with progress fraction = 0.6]

SLIDE 12

CGD Diffusion Model

  • Each newly activated fraction of a group can activate its neighboring groups
  • As a result of an activation attempt from A to B, some activation attempts also occur between members of B
  • Continue for several iterations until convergence
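
A toy Python sketch of the iteration described above. The slides do not state the update rules, so everything numeric here is an assumption made for illustration: the inter-group gain is taken as edge weight times the newly activated fraction of the source group times the inactive fraction of the target, and the intra-group follow-on is modeled as a fixed hypothetical multiplier `intra_boost` on that gain. The function name and parameters are likewise hypothetical.

```python
def cgd_spread(weights, seeds, intra_boost=0.25, max_iters=10, eps=1e-6):
    """Toy coarse-grained diffusion over a group graph.

    weights: dict mapping (A, B) -> inter-group influence weight in [0, 1]
    seeds: dict mapping group -> initially activated fraction
    Returns a dict mapping group -> final progress fraction.
    """
    groups = {g for edge in weights for g in edge} | set(seeds)
    progress = {g: 0.0 for g in groups}
    progress.update(seeds)
    newly = dict(seeds)  # fraction of each group activated in the last round
    for _ in range(max_iters):
        gained = {g: 0.0 for g in groups}
        for (a, b), w in weights.items():
            delta = newly.get(a, 0.0)
            if delta <= eps:
                continue
            inter = w * delta * (1.0 - progress[b])  # assumed inter-group rule
            intra = intra_boost * inter              # assumed intra-group rule
            gained[b] += inter + intra
        newly = {}
        for g, gain in gained.items():
            gain = min(gain, 1.0 - progress[g])      # progress cannot exceed 1
            if gain > eps:
                progress[g] += gain
                newly[g] = gain
        if not newly:  # nothing changed, so the process has converged
            break
    return progress


result = cgd_spread({("A", "B"): 0.2}, {"A": 0.2})
print(result)
```

Only the newly activated fraction of a group makes activation attempts in each round, so the gains shrink from round to round and the loop stops after a few iterations.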

[Figure: two-step example. An edge of weight 0.2 links a group whose progress goes 0 → 0.2 to a neighboring group whose progress goes 0 → 0.04 and then 0.04 → 0.05.]

SLIDE 13

topcgd algorithm

  • Goal: Find the l most influential groups
  • NP-hard under CGD model
  • Diffusion in CGD is monotone and submodular
  • topcgd: a greedy algorithm that provides a (1 - 1/e) approximation factor.
  • Time: O(|E_int| + ml(mt + n))
  • t is the number of iterations to converge (~10)

SLIDE 14

Experimental setup

  • Datasets:
  • DBLP: 800K nodes, 6.3M edges, 3200 groups
  • Comparison
  • Spend the same advertising budget on all algorithms
  • Measure the final influence (the number of convinced individuals)
  • Run the Individual Diffusion process on the initially convinced individuals

SLIDE 15

Results

  • DBLP-1980: 8000 nodes, 69 groups
  • Compare topid vs. topfgd vs. topcgd
  • Final influence: topfgd and topcgd outperform topid for β > 3
  • Time: topid (30 days), topfgd (an hour), topcgd (0.2 sec)

[Figure: final influence vs. β for topfgd, topcgd, and topid]

SLIDE 16

Results (Cont'd)

  • DBLP: topcgd vs. Baselines
  • rnd, small, big, degree
  • Time of topcgd: 100 minutes
  • topfgd and topid not practical

[Figure: final influence vs. β for topcgd, degree, big, rnd, and small]

SLIDE 17

Conclusion and Future Work

  • Focus on groups rather than individuals
  • Wider diffusion
  • Improved performance
  • Many less-influential individuals vs. fewer more-influential individuals
  • Although CGD aggregates the information about individuals (hence improved performance), it results in a final influence comparable to FGD.

  • We are interested in a generalized model where
  • Groups are allowed to receive different budgets
  • The cost of advertising to each group is predetermined

SLIDE 18

Thanks! (Questions?)
