Data science for networked data Po-Ling Loh University of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Data science for networked data

Po-Ling Loh

University of Wisconsin-Madison
Department of Statistics

AISTATS, Okinawa, Japan
April 16, 2019

Joint work with: Justin Khim (UPenn), Varun Jog (UW-Madison), Ashley Hou (UW-Madison), Wen Yan (Southeast University), and Muni Pydi (UW-Madison)

Po-Ling Loh (UW-Madison) Data science for networked data Apr 16, 2019 1 / 45

slide-2
SLIDE 2

Key problems in network modeling

1. Given data from a network, how do we estimate the network?
2. How do we model dynamic processes over a network?
3. How do we perform efficient search over a network?

slide-12
SLIDE 12

Prelude: Network estimation

slide-14
SLIDE 14

Graphical models

Method for constructing connectivity network from matrix of data

gene expression (mRNA) data → E. coli network

slide-15
SLIDE 15

Graphical models

Method for constructing connectivity network from matrix of data

fMRI/EEG readings → “functional connectivity” network

slide-16
SLIDE 16

Graphical models

Mathematical analysis derived for Gaussian data
In practice, transform data to Gaussian before applying algorithm

slide-19
SLIDE 19

Graphical models

But not all data are transformable!

We have developed new methods for estimating graphical models for discrete (count) data

However, life is more than network estimation...

slide-20
SLIDE 20

Outline

1. Statistical inference
  • Confidence sets for source estimation
  • Graph hypothesis testing
2. Resource allocation
  • Influence maximization
  • Budget allocation
  • Network immunization
3. Local algorithms

slide-21
SLIDE 21

Statistical inference

Justin Khim (UPenn)


slide-22
SLIDE 22

Source estimation


slide-29
SLIDE 29

Confidence sets

Instead: Find a confidence set that includes root node with probability at least $1 - \epsilon$

slide-30
SLIDE 30

Confidence sets

Question: How does size of confidence set grow with number of infected nodes n?

slide-32
SLIDE 32

Confidence sets

It doesn’t!

Rough interpretation: No “information loss” about source as disease spreads

slide-33
SLIDE 33

Inference algorithm

Select nodes that are most “central” to network of infected individuals

slide-37
SLIDE 37

Inference algorithm

For each node, compute “min-max subtree size”

[Figure: example tree with node values 8, 14, 14, 11, 17, 17, 17, 17, 15, 16]

slide-38
SLIDE 38

Inference algorithm

Select $K(\epsilon)$ nodes with smallest values

[Figure: example tree with node values 8, 14, 14, 11, 17, 17, 17, 17, 15, 16]
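The selection step can be sketched in Python. The slides do not define the “min-max subtree size”, so the definition used here is an assumption: for a node u of the infected tree, root the tree at u and take the size of the largest subtree hanging off u. The sketch then returns the K nodes with the smallest such values. The function names `max_subtree_sizes` and `confidence_set` are hypothetical, and K is passed in directly rather than derived from ε.

```python
from collections import defaultdict


def max_subtree_sizes(edges):
    """For each node u of a tree, return the size of the largest subtree
    hanging off u when the tree is rooted at u (assumed definition)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)
    root = next(iter(adj))

    # Iterative depth-first search to get a parent map and a visiting order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)

    # Subtree sizes with respect to the arbitrary root, children before parents.
    size = {u: 1 for u in adj}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    psi = {}
    for u in adj:
        # Branches through children of u, plus the branch through u's parent.
        branches = [size[w] for w in adj[u] if parent.get(w) == u]
        if parent[u] is not None:
            branches.append(n - size[u])
        psi[u] = max(branches, default=0)
    return psi


def confidence_set(edges, K):
    """Return the K nodes with the smallest values, ties broken by node label."""
    psi = max_subtree_sizes(edges)
    return sorted(psi, key=lambda u: (psi[u], u))[:K]


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(max_subtree_sizes(path))
print(confidence_set(path, 3))
```

On a path, the middle node has the smallest value, which matches the intuition that the estimator prefers “central” nodes.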

slide-40
SLIDE 40

Theory for confidence sets

Theorem

Suppose $d \ge 3$. Then the min-max subtree estimator with $K_\psi(\epsilon) = C(d)/\epsilon$ yields a $1 - \epsilon$ confidence set for the root.

Note: Cannot construct finite confidence set for $d = 2$; need set of size $K = \Theta(\sqrt{n})$

slide-42
SLIDE 42

Extensions and open directions

Similar result holds for broader class of “regular” trees
Robustness: Confidence set eventually settles down after finitely many steps
Open directions:
  • What if underlying graph is not a tree?
  • What if network is asymmetric?
  • What if nodes can heal?

slide-44
SLIDE 44

Graph testing

Question: Can we use epidemic data to infer network structure?

[Figure: candidate graphs shown side by side (graph vs. graph vs. graph)]

slide-45
SLIDE 45

Graph testing

Observations: Infection status of n nodes in graph
  • k infected nodes (1)
  • c censored (nonreporting) nodes (⋆)
  • n − k − c uninfected nodes (0)

slide-47
SLIDE 47

Graph testing

Compute test statistic T = # edges between infected nodes

[Figure: the same infection pattern on three candidate graphs, with T = 10 under H0, T = 0 under H1, T = 3 under H2]

Need to construct proper rejection rule based on T, derive validity of hypothesis test

slide-49
SLIDE 49

Infection model

Parameters $\lambda$, $\eta$
  • For each node $v$, generate $T_v \sim \mathrm{Exp}(\lambda)$
  • For each edge $(u, v)$, generate $T_{uv} \sim \mathrm{Exp}(\eta)$

Infection time of any vertex $v$ is $t_v = \min_{u \in N(v)} \{t_u + T_{uv}\} \wedge T_v$
Observation vector corresponds to infection states at a certain time
Subset of censored nodes chosen uniformly at random
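A minimal Python sketch of this infection model. It rests on two assumptions not stated on the slide: Exp(λ) and Exp(η) are parameterized by rate, and the recursion for $t_v$ is solved as a shortest-path problem in which each node starts at its spontaneous time $T_v$ and edges carry delays $T_{uv}$. The function names `simulate_infection_times` and `observe` are hypothetical.

```python
import heapq
import random


def simulate_infection_times(nodes, edges, lam, eta, seed=0):
    """Draw T_v ~ Exp(lam), T_uv ~ Exp(eta) (rates, an assumption) and solve
    t_v = min(T_v, min_u t_u + T_uv) with a Dijkstra-style sweep."""
    rng = random.Random(seed)
    T_node = {v: rng.expovariate(lam) for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v in edges:
        delay = rng.expovariate(eta)
        adj[u].append((v, delay))
        adj[v].append((u, delay))

    t = dict(T_node)  # every node can always become infected spontaneously
    heap = [(t[v], v) for v in nodes]
    heapq.heapify(heap)
    done = set()
    while heap:
        t_u, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, delay in adj[u]:
            if v not in done and t_u + delay < t[v]:
                t[v] = t_u + delay
                heapq.heappush(heap, (t[v], v))
    return t, T_node


def observe(t, time, c, seed=0):
    """Infection states at a fixed time: 1 infected, 0 uninfected, and '*'
    for c censored nodes chosen uniformly at random."""
    rng = random.Random(seed)
    censored = set(rng.sample(sorted(t), c))
    return {v: "*" if v in censored else int(t[v] <= time) for v in t}


nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 3)]
t, T_node = simulate_infection_times(nodes, edges, lam=0.2, eta=2.0)
print(observe(t, time=1.0, c=2))
```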

slide-52
SLIDE 52

Permutation test

Goal: For $\alpha \in (0, 1)$, construct rejection rule such that $P(\text{reject} \mid H_0 \text{ is true}) \le \alpha$
Use permutation test that computes T for $\binom{n}{k,\, c,\, n-k-c}$ reassignments of infected/nonreporting/uninfected nodes

[Figure: example reassignments on the graph for H1, with T = 4, T = 4, T = 4, T = 0]

Based on (randomly chosen) permutations, compute p-value/rejection region and reject $H_0$ if (p-value of T) $\le \alpha$

slide-54
SLIDE 54

Permutation test

[Figure: distribution of $T(I)$, split into “reject H0” and “do not reject H0” regions at level $\alpha$]

In practice, sufficient to compute empirical distribution from large number of random permutations
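A Monte Carlo version of the permutation test can be sketched in Python as follows. The statistic T counts edges between infected nodes, as on the slides. Two choices here are assumptions, not taken from the slides: the upper-tail rejection direction, and the add-one correction in the p-value. The names `edges_between_infected` and `permutation_p_value` are hypothetical.

```python
import random


def edges_between_infected(edges, labels):
    """Test statistic T: number of edges whose endpoints are both infected (label 1)."""
    return sum(1 for u, v in edges if labels[u] == 1 and labels[v] == 1)


def permutation_p_value(edges, labels, num_perm=2000, seed=0):
    """Estimate a p-value by randomly reassigning the observed labels
    (1 infected, 0 uninfected, '*' censored) to the nodes.
    Upper tail and add-one correction are assumptions of this sketch."""
    rng = random.Random(seed)
    nodes = sorted(labels)
    values = [labels[v] for v in nodes]
    t_obs = edges_between_infected(edges, labels)
    hits = 0
    for _ in range(num_perm):
        rng.shuffle(values)
        permuted = dict(zip(nodes, values))
        if edges_between_infected(edges, permuted) >= t_obs:
            hits += 1
    return (1 + hits) / (1 + num_perm)


# Path graph on 10 nodes with a contiguous block of infected nodes.
edges = [(i, i + 1) for i in range(9)]
labels = {i: 1 if i < 5 else 0 for i in range(10)}
labels[9] = "*"
print(edges_between_infected(edges, labels), permutation_p_value(edges, labels))
```

Keeping the multiset of labels fixed and shuffling only their positions means every sampled reassignment has the same counts k, c, and n − k − c as the observation.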

slide-57
SLIDE 57

Theory for permutation test

Success depends on symmetries of underlying networks rather than parameters $\lambda$, $\eta$
Consider $\Pi_0 = \mathrm{Aut}(G_0)$ and $\Pi_1 = \mathrm{Aut}(G_1)$, subsets of $S_n$

[Figure: example automorphism $\pi \in \mathrm{Aut}(G)$ on a graph with vertices 1 to 8]

Theorem

Let $\pi$ be drawn uniformly from $S_n$. If $\Pi_1 \Pi_0 = S_n$, the permutation test controls Type I error at level $\alpha$.

slide-59
SLIDE 59

Extensions and open directions

Characterization of condition $\Pi_1 \Pi_0 = S_n$ for various graph families
Bounds on Type II error for specific graphs
Conditioning on identity of censored nodes
Open directions:
  • How to identify which graphs to use as null/alternative hypotheses?
  • Inhomogeneous $\lambda$ and $\eta$?
  • Confidence sets for underlying network?

slide-60
SLIDE 60

Resource allocation

Justin Khim (UPenn), Varun Jog (UW-Madison), Ashley Hou (UW-Madison), Wen Yan (Southeast University)

slide-65
SLIDE 65

Influence maximization (with Justin Khim and Varun Jog)

New goal: Seed a network to “infect” as many nodes as possible
Useful for information dissemination, marketing, etc.

[Figure: spread of the infection at t = 0, t = 1, t = 2]

Questions

1. If k nodes may be infected initially, which nodes should be selected to maximize infection spread?
2. How to determine maximal set efficiently?

slide-69
SLIDE 69

Model: Linear threshold model (broadly, triggering models)

Edges have weights $(b_{ij})$, satisfying $\sum_j b_{ji} \le 1$
Nodes choose thresholds $\theta_i \in [0, 1]$ i.i.d., uniformly at random

[Figure: example weighted network with node thresholds, shown at t = 0, t = 1, t = 2]

On each round, uninfected nodes compute total weight of infected neighbors and become infected if $\sum_{j \text{ is infected}} b_{ji} > \theta_i$
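The update rule of the linear threshold model can be sketched in Python as below. The data layout (a dictionary `weights[(j, i)]` holding $b_{ji}$) and the names `simulate_linear_threshold` and `estimate_influence` are assumptions made for illustration. The second function estimates the expected number of infected nodes by plain Monte Carlo averaging.

```python
import random


def simulate_linear_threshold(weights, seeds, rng):
    """Run one realization of the linear threshold model.
    weights[(j, i)] is b_ji, the weight of infected neighbor j on node i."""
    nodes = {v for edge in weights for v in edge} | set(seeds)
    theta = {i: rng.random() for i in nodes}  # i.i.d. uniform thresholds
    infected = set(seeds)
    while True:
        newly = set()
        for i in nodes - infected:
            total = sum(b for (j, k), b in weights.items()
                        if k == i and j in infected)
            if total > theta[i]:
                newly.add(i)
        if not newly:  # no change in a full round: the process has stopped
            return infected
        infected |= newly


def estimate_influence(weights, seeds, num_sims=1000, seed=0):
    """Monte Carlo estimate of the expected final number of infected nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_linear_threshold(weights, seeds, rng))
                for _ in range(num_sims))
    return total / num_sims


weights = {(0, 1): 0.5, (1, 2): 0.6, (0, 2): 0.4}
print(estimate_influence(weights, seeds=[0]))
```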

slide-72
SLIDE 72

Previous work

Monotonicity, submodularity of influence function in triggering models (Kempe et al. ’03)
⟹ Greedy algorithm yields $\left(1 - \frac{1}{e}\right)$-approximation to $\max_{A \subseteq V : |A| \le k} I(A)$

However, method involves approximating $I$ at each iteration of greedy algorithm via simulations
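The greedy scheme referred to here can be sketched generically in Python: at each step, add the node with the largest marginal gain in the set function. The function name `greedy_seed_selection` is hypothetical, and the example uses a simple deterministic coverage function as a stand-in for the influence function $I$, purely so the sketch runs without simulations. In the setting of the slides, `influence` would instead be a simulation-based estimate or a computable bound.

```python
def greedy_seed_selection(nodes, influence, k):
    """Pick up to k seeds, each time adding the node with the largest
    marginal gain influence(A + [v]) - influence(A)."""
    chosen = []
    for _ in range(k):
        base = influence(chosen)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = influence(chosen + [v]) - base
            if gain > best_gain:  # strict: ties go to the earlier node
                best, best_gain = v, gain
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
    return chosen


# Stand-in set function (an assumption for illustration): number of
# distinct items covered, which is monotone and submodular.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(seed_set):
    covered = set()
    for v in seed_set:
        covered |= reach[v]
    return len(covered)


print(greedy_seed_selection(list(reach), coverage, k=2))
```

Each greedy step calls `influence` once per candidate node, which is why replacing a simulation-based estimate of $I$ with something cheaper to evaluate matters for running time.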

slide-75
SLIDE 75

Key contributions

1. Computable upper and lower bounds for influence function in general triggering models
2. Characterization of gap between bounds
3. Proof of monotonicity, submodularity for family of lower bounds
   ⟹ $\left(1 - \frac{1}{e}\right)$-approximation for sequential greedy algorithm

Leads to significant speed-ups:

                          LB1    LB2    UB      Simulation
Erdős-Rényi               1.00   2.36   27.43   710.58
Preferential attachment   1.00   2.56   28.49   759.83
2D-grid                   1.00   2.43   47.08   1301.73

slide-78
SLIDE 78

Budget allocation (with Ashley Hou)

Problem: Given fixed budget to distribute amongst influencers, how to optimally allocate resources?

[Figure: source nodes S and target nodes T, with example allocations y(1) = 2, y(4) = 3]

Mathematical formulation: If resources $\{y(s)\}_{s \in S}$ are allocated among source nodes $S$, probability of influencing customer $t$ is

$$I_t(y) = 1 - \prod_{(s,t) \in E} (1 - p_{st})^{y(s)}$$

so we solve

$$\max_{y} \sum_{t \in T} I_t(y) \quad \text{s.t.} \quad \sum_{s \in S} y(s) \le B$$
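The objective can be evaluated directly from the formula for $I_t(y)$. The Python sketch below does so and then allocates an integer budget one unit at a time to the source with the largest marginal gain. This unit-increment greedy rule is a simple heuristic chosen for illustration, not a method taken from the slides, and the names `influence_probability`, `total_influence`, and `greedy_allocation` are hypothetical. Edge probabilities are stored as `p[(s, t)]`.

```python
def influence_probability(t, y, p):
    """I_t(y) = 1 - prod over edges (s, t) of (1 - p_st) ** y(s)."""
    prob_not_influenced = 1.0
    for (s, target), p_st in p.items():
        if target == t:
            prob_not_influenced *= (1.0 - p_st) ** y.get(s, 0)
    return 1.0 - prob_not_influenced


def total_influence(y, p, targets):
    """Objective: sum of I_t(y) over all target nodes t."""
    return sum(influence_probability(t, y, p) for t in targets)


def greedy_allocation(sources, targets, p, budget):
    """Spend an integer budget one unit at a time on the source whose
    extra unit increases the objective most (illustrative heuristic)."""
    y = {s: 0 for s in sources}
    for _ in range(budget):
        base = total_influence(y, p, targets)
        gains = {}
        for s in sources:
            y[s] += 1
            gains[s] = total_influence(y, p, targets) - base
            y[s] -= 1
        best = max(sources, key=lambda s: gains[s])
        y[best] += 1
    return y


p = {("s1", "t1"): 0.5, ("s2", "t1"): 0.1, ("s2", "t2"): 0.3}
print(greedy_allocation(["s1", "s2"], ["t1", "t2"], p, budget=3))
```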

slide-82
SLIDE 82

Robust variant

In practice, might not know edge parameters $p = \{p_{st}\}$, or even edge structure
Robust optimization framework:

$$\max_{y:\ \sum_{s \in S} y(s) \le B} \; \min_{p \in \Sigma} \; \sum_{t \in T} I_t^p(y)$$

Goal: Develop efficient algorithms for robust budget allocation with provable approximation guarantees
Ingredients: Maximization of min of submodular functions, extensions to integer lattices and budget constraints

slide-84
SLIDE 84

Network immunization (with Wen Yan)

Goal: Given a budget of interventions at nodes/edges of a graph, how to optimally distribute resources to retard an epidemic?
Interested in fractional immunization, which only decreases infectiveness of nodes/edges

[Figure: example network with edge values 0.4, 0.5, 0.2]

slide-86
SLIDE 86

Network immunization

Formulation as influence maximization problem:

$$\min_{\sum \theta_{ij} \le B} \; \max_{A \subseteq V : |A| \le k} I\left(A; \{b_{ij}\} - \{\theta_{ij}\}\right)$$

Challenges:

1. Bilevel optimization problem involving discrete and continuous variables
2. No computable closed-form expression for $I$ or $\nabla I$

slide-87
SLIDE 87

Local algorithms

Muni Pydi (UW-Madison), Varun Jog (UW-Madison)

slide-90
SLIDE 90

Maximizing graph functions

Given function $f$ defined on nodes of a graph
Examples: Degree, age of node, power/population level, etc.

[Figure: example graph with node values 2, 2, 3, 1, 1, 1, 2, 2, 2, 2, 2, 6, 4]

Goal: Maximize $f$ by “walking” along edges and querying values
Could use “vanilla random walk” with transition probabilities $P_{ij} = \frac{w_{ij}}{d_i}$, but can we leverage smoothness/structure of graph function?

slide-94
SLIDE 94

Metropolis-Hastings algorithm

MH algorithm specified by target density $p_f$ and proposal distribution $Q$ (stochastic matrix)
Transition matrix:

$$P_{ij} = \begin{cases} Q_{ij} \min\left\{1, \dfrac{p_f(j) Q_{ji}}{p_f(i) Q_{ij}}\right\}, & j \ne i, \\[1ex] 1 - \sum_{j \ne i} P_{ij}, & j = i \end{cases}$$

Known convergence of MH algorithm to $p_f$
Idea: Build a density $p_f$ maximized wherever $f$ is maximized, hope that MH algorithm finds maximizers quickly
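The transition matrix can be built directly from this definition. The Python sketch below assumes a finite state space, a strictly positive target `p_f` (given as a list), and a proposal `Q` given as a list of rows; the function name `mh_transition_matrix` is hypothetical. The example then checks the detailed balance condition $p_f(i) P_{ij} = p_f(j) P_{ji}$, which is the standard reason $p_f$ is stationary for the chain.

```python
def mh_transition_matrix(p_f, Q):
    """Metropolis-Hastings transition matrix for target p_f and proposal Q.
    Assumes p_f[i] > 0 for all i and that each row of Q sums to 1."""
    n = len(p_f)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i and Q[i][j] > 0:
                ratio = (p_f[j] * Q[j][i]) / (p_f[i] * Q[i][j])
                P[i][j] = Q[i][j] * min(1.0, ratio)
        # Remaining mass stays at i (rejected or self proposals).
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P


p_f = [0.2, 0.3, 0.5]
Q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
P = mh_transition_matrix(p_f, Q)
for i in range(3):
    for j in range(3):
        # Detailed balance: p_f(i) P_ij == p_f(j) P_ji up to rounding.
        assert abs(p_f[i] * P[i][j] - p_f[j] * P[j][i]) < 1e-12
print(P)
```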

slide-99
SLIDE 99

Local algorithm

1. Initialize at random vertex $i_0$
2. Take $T$ steps of MH algorithm according to transition matrix $P$
3. Output maximum among $\{f(i_0), \dots, f(i_T)\}$

Exponential walk: $p_f(i) \propto \exp\left(\gamma f(i)\right)$ and $Q = D^{-1} W$
Laplacian walk: $p_f(i) \propto f^2(i)$ and $Q$ defined with respect to eigenvectors of graph Laplacian $L = D - W$
Theoretical results: Rates of convergence in TV distance, hitting time bounds for both algorithms in terms of graph/function characteristics
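The three steps, specialized to the exponential walk, can be sketched in Python. The sketch assumes an unweighted graph given as adjacency lists, so that $Q = D^{-1} W$ reduces to proposing a uniformly random neighbor and the acceptance ratio becomes $\exp(\gamma (f(j) - f(i))) \, d_i / d_j$. The function name `exponential_walk_max` and the parameter values in the example are hypothetical.

```python
import math
import random


def exponential_walk_max(adj, f, gamma, T, seed=0):
    """Local search for a maximizer of f using Metropolis-Hastings with
    target p_f(i) proportional to exp(gamma * f(i)) and uniform-neighbor proposals."""
    rng = random.Random(seed)
    i = rng.choice(sorted(adj))  # step 1: random starting vertex
    best = i
    for _ in range(T):           # step 2: T Metropolis-Hastings steps
        j = rng.choice(adj[i])   # proposal Q_ij = 1 / d_i for neighbors j
        ratio = math.exp(gamma * (f[j] - f[i])) * len(adj[i]) / len(adj[j])
        if rng.random() < min(1.0, ratio):
            i = j
        if f[i] > f[best]:
            best = i
    return best, f[best]         # step 3: best value seen along the walk


# Path graph on 6 nodes with f increasing along the path.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
f = {i: i for i in adj}
print(exponential_walk_max(adj, f, gamma=2.0, T=500))
```

Only values of $f$ at the current vertex and the proposed neighbor are queried at each step, which is what makes the procedure local.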

slide-104
SLIDE 104

Summary

  • Many interesting data analysis problems involving network-structured data
  • Problems span statistics, optimization, algorithmic design
  • Need for new methods, theory, and validation on real-world datasets
  • Modern-day algorithms should be scalable to large data sets

Thank you!