
CS224W: Analysis of Networks, Outbreak detection (http://cs224w.stanford.edu)



  1. HW2 Q1.1 parts (b) and (c) cancelled. HW3 released. It is long. Start early. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. - (1) New problem: Outbreak detection
     - (2) Develop an approximation algorithm
       - It is a submodular optimization problem!
     - (3) Speed up greedy hill-climbing
       - Valid for optimizing general submodular functions (i.e., also works for influence maximization)
     - (4) Prove a new "data-dependent" bound on the solution quality
       - Valid for optimizing any submodular function (i.e., also works for influence maximization)

     10/26/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

  3. - Given a real city water distribution network
     - And data on how contaminants spread in the network
     - Detect the contaminant as quickly as possible
     - Problem posed by the US Environmental Protection Agency

  4. Which blogs should one read to detect cascades as effectively as possible?
     [Figure: blogs and their posts linked by time-ordered hyperlinks, forming an information cascade.]

  5. Want to read things before others do.
     - One selection detects the blue and yellow stories soon but misses the red story.
     - Another selection detects all stories, but late.

  6. - Both of these are instances of the same underlying problem!
     - Given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively
     - Many other applications:
       - Epidemics
       - Influence propagation
       - Network security

  7. - Utility of placing sensors:
       - Water flow dynamics, demands of households, …
     - For each subset $S \subseteq V$ compute utility $f(S)$
     [Figure: the set $V$ of all network junctions with sensors $S_1, \dots, S_4$ and contamination sources causing high-, medium-, and low-impact outbreaks. A sensor reduces impact through early detection! One placement has low sensing "quality" (e.g., $f(S) = 0.01$), another has high sensing "quality" (e.g., $f(S) = 0.9$).]

  8. Given:
     - Graph $G(V, E)$
     - Data on how outbreaks spread over $G$:
       - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$
     [Figure: the water distribution network (physical pipes and junctions) and a simulator of water consumption and flow (built by Mech. Eng. people).]
     We simulate the contamination spread for every possible location.

  9. Given:
     - Graph $G(V, E)$
     - Data on how outbreaks spread over $G$:
       - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$
     [Figure: the network of the blogosphere, with traces of the information flow between blogs a, b, and c used to identify influence sets.]
     Collect lots of blog posts and trace hyperlinks to obtain data about information flow from a given blog.

  10. Given:
      - Graph $G(V, E)$
      - Data on how outbreaks spread over $G$:
        - For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$
      - Goal: Select a subset of nodes $S$ that maximizes the expected reward:
        $$\max_{S \subseteq V} f(S) = \sum_i P(i)\, f_i(S) \quad \text{subject to: } \operatorname{cost}(S) < B$$
        - $P(i)$ … probability of outbreak $i$ occurring
        - $f_i(S)$ … reward for detecting outbreak $i$ using sensors $S$
        - $P(i)\, f_i(S)$ … expected reward for detecting outbreak $i$
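As a rough illustration of the objective above, the following sketch evaluates $f(S) = \sum_i P(i)\, f_i(S)$ on a tiny instance. The toy data, the helper names (`detection_time`, `expected_reward`, `detected`), and the choice of a 0/1 "was it detected" reward are assumptions made for the example, not part of the slides.

```python
INF = float("inf")

def detection_time(T, S, i):
    """T(S, i): earliest time any sensor in S is reached by outbreak i (inf if none)."""
    return min((T[i].get(s, INF) for s in S), default=INF)

def expected_reward(T, P, S, reward):
    """f(S) = sum_i P(i) * f_i(S), where f_i(S) = reward(T(S, i))."""
    return sum(P[i] * reward(detection_time(T, S, i)) for i in P)

def detected(t):
    """Illustrative per-outbreak reward: 1 if the outbreak is detected at all, else 0."""
    return 1.0 if t < INF else 0.0

# Hypothetical toy data: T[i][u] is the time outbreak i contaminates node u.
T = {"o1": {"a": 1, "b": 3}, "o2": {"b": 2}}
P = {"o1": 0.5, "o2": 0.5}

print(expected_reward(T, P, {"a"}, detected))  # sensor "a" only sees outbreak o1
print(expected_reward(T, P, {"b"}, detected))  # sensor "b" sees both outbreaks
```

Here a node missing from `T[i]` is treated as never contaminated by outbreak `i`, which is one plausible way to encode "no detection".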

  11. - Reward (one of the following three):
        - (1) Minimize time to detection
        - (2) Maximize number of detected propagations
        - (3) Minimize number of infected people
      - Cost (context dependent):
        - Reading big blogs is more time consuming
        - Placing a sensor in a remote location is expensive
      [Figure: outbreak $i$ spreading over an example network with numbered nodes and the resulting $f(S)$. Monitoring the blue node saves more people than monitoring the green node.]

  12. - Objective functions:
        - (1) Time to detection (DT)
          - How long does it take to detect a contamination?
          - Penalty for detecting at time $t$: $\pi_i(t) = t$
        - (2) Detection likelihood (DL)
          - How many contaminations do we detect?
          - Penalty for detecting at time $t$: $\pi_i(t) = 0$, $\pi_i(\infty) = 1$
          - Note that this is a binary outcome: we either detect or not
        - (3) Population affected (PA)
          - How many people drank contaminated water?
          - Penalty for detecting at time $t$: $\pi_i(t) = \{\#\text{ of infected nodes in outbreak } i \text{ by time } t\}$
      - Observation: In all cases detecting sooner does not hurt!
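The three penalties can be written down directly. The sketch below is one possible encoding under stated assumptions: the function names are invented for the example, the DT penalty is capped at a hypothetical horizon `t_max` so that a never-detected outbreak gets a finite penalty, and "infected by time $t$" is read as infection time $\le t$.

```python
import math

def penalty_dt(t, t_max):
    """Time to detection: penalty is the detection time (capped at t_max, an assumption)."""
    return min(t, t_max)

def penalty_dl(t):
    """Detection likelihood: 0 if detected at any finite time, 1 if never detected."""
    return 0.0 if math.isfinite(t) else 1.0

def penalty_pa(t, infection_times):
    """Population affected: number of nodes infected by time t in this outbreak."""
    return sum(1 for u_time in infection_times.values() if u_time <= t)

# Hypothetical outbreak: node -> time at which it gets infected.
infection_times = {"a": 1, "b": 3, "c": 5}

for t in (1, 3, math.inf):
    print(t, penalty_dt(t, 10), penalty_dl(t), penalty_pa(t, infection_times))
```

On this toy outbreak each penalty is non-decreasing in the detection time, which is the "detecting sooner does not hurt" observation.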

  13. We define $f_i(S)$ as the penalty reduction: $f_i(S) = \pi_i(\infty) - \pi_i(T(S, i))$, where $\pi_i(\infty)$ is the penalty with no sensors (the outbreak is never detected)
      - Observation: Diminishing returns
      [Figure: a new sensor $s'$ is added to two placements. For placement $S = \{s_1, s_2\}$, adding $s'$ helps a lot; for placement $S' = \{s_1, s_2, s_3, s_4\}$, adding $s'$ helps very little.]

  14. - Claim: For all $A \subseteq B \subseteq V$ and sensors $s \in V \setminus B$:
        $$f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$$
      - Proof: All our objectives are submodular
        - Fix cascade/outbreak $i$
        - Show $f_i(A) = \pi_i(\infty) - \pi_i(T(A, i))$ is submodular
        - Consider $A \subseteq B \subseteq V$ and sensor $s \in V \setminus B$
        - When does node $s$ detect cascade $i$?
          - We analyze 3 cases based on when $s$ detects outbreak $i$
          - (1) $T(s, i) \ge T(A, i)$: $s$ detects late, nobody benefits: $f_i(A \cup \{s\}) = f_i(A)$, also $f_i(B \cup \{s\}) = f_i(B)$ (since $T(B, i) \le T(A, i)$), and so $f_i(A \cup \{s\}) - f_i(A) = 0 = f_i(B \cup \{s\}) - f_i(B)$

  15. Remember $A \subseteq B$
      - Proof (contd.):
        - (2) $T(B, i) \le T(s, i) < T(A, i)$: $s$ detects after $B$ but before $A$
          - $s$ detects sooner than any node in $A$ but no sooner than the earliest node in $B$. So $s$ only helps improve the solution $A$ (but not $B$):
            $$f_i(A \cup \{s\}) - f_i(A) \ge 0 = f_i(B \cup \{s\}) - f_i(B)$$
        - (3) $T(s, i) < T(B, i)$: $s$ detects early
          $$f_i(A \cup \{s\}) - f_i(A) = [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(A) \ge [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(B) = f_i(B \cup \{s\}) - f_i(B)$$
          - The inequality is due to the non-decreasingness of $f_i(\cdot)$, i.e., $f_i(A) \le f_i(B)$
        - So, $f_i(\cdot)$ is submodular!
      - So, $f(\cdot)$ is also submodular, as a non-negative weighted sum of submodular functions: $f(S) = \sum_i P(i)\, f_i(S)$
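The claim can be sanity-checked by brute force on a small instance. The sketch below enumerates all $A \subseteq B \subseteq V$ and $s \in V \setminus B$ and tests the diminishing-returns inequality for a single outbreak. The toy infection times, the capped time-to-detection penalty, and the names `f_i` and `is_submodular` are assumptions for illustration; a numeric check on one instance is not a substitute for the proof.

```python
from itertools import combinations

INF = float("inf")

def f_i(S, times, penalty):
    """Penalty reduction f_i(S) = pi_i(inf) - pi_i(T(S, i)) for one outbreak."""
    t = min((times.get(s, INF) for s in S), default=INF)
    return penalty(INF) - penalty(t)

def is_submodular(f, V):
    """Check f(A + s) - f(A) >= f(B + s) - f(B) for all A <= B <= V, s not in B."""
    V = list(V)
    subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for s in V:
                if s in B:
                    continue
                gain_A = f(A | {s}) - f(A)
                gain_B = f(B | {s}) - f(B)
                if gain_A < gain_B - 1e-12:
                    return False
    return True

# Hypothetical outbreak: node "d" is never contaminated.
times = {"a": 2, "b": 5, "c": 7}
capped_dt = lambda t: min(t, 10)  # DT penalty with an assumed horizon of 10

print(is_submodular(lambda S: f_i(S, times, capped_dt), ["a", "b", "c", "d"]))
# A function with increasing returns fails the same check:
print(is_submodular(lambda S: len(S) ** 2, ["a", "b", "c"]))
```

The enumeration is exponential in $|V|$, so this is only practical for very small ground sets.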

  16. - What do we know about optimizing submodular functions?
        - Hill-climbing (i.e., greedy) is near optimal: it achieves at least $(1 - \frac{1}{e}) \cdot OPT$
      - But:
        - (1) This only works for the unit cost case! (each sensor costs the same)
          - For us each sensor $s$ has cost $c(s)$
        - (2) The hill-climbing algorithm is slow
          - At each iteration we need to re-evaluate marginal gains of all nodes
          - Runtime $O(|V| \cdot K)$ for placing $K$ sensors
      [Figure: hill-climbing reward for candidate sensors a, b, c, d, e; at each step, add the sensor with the highest marginal gain.]
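A minimal sketch of unit-cost hill-climbing, assuming the objective is available as a Python callable `f` on sets. The function name `greedy_hill_climbing` and the coverage-style toy objective are illustrative choices, not taken from the slides. Each of the $K$ rounds re-evaluates the marginal gain of every remaining node, which is where the $O(|V| \cdot K)$ evaluation count comes from.

```python
def greedy_hill_climbing(f, V, K):
    """Pick K elements, each time adding the one with the highest marginal gain."""
    selected = []
    for _ in range(K):
        candidates = [s for s in V if s not in selected]
        if not candidates:
            break
        current = f(set(selected))
        # Re-evaluate the marginal gain of every remaining candidate.
        best = max(candidates, key=lambda s: f(set(selected) | {s}) - current)
        selected.append(best)
    return selected

# Hypothetical submodular objective: number of distinct outbreaks covered.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(S):
    return len(set().union(*(covers[s] for s in S))) if S else 0

picked = greedy_hill_climbing(coverage, ["a", "b", "c", "d"], K=2)
print(picked, coverage(set(picked)))
```

In this toy run the first pick is the node with the largest coverage and the second pick is the node that adds the most new outbreaks given the first.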

  17. - Consider the following algorithm to solve the outbreak detection problem: hill-climbing that ignores cost
        - Ignore sensor cost $c(s)$
        - Repeatedly select the sensor with the highest marginal gain
        - Do this until the budget is exhausted
      - Q: How well does this work?
      - A: It can fail arbitrarily badly!
        - Next we come up with an example where the hill-climbing solution is arbitrarily far from OPT

  18. - Bad example when we ignore cost:
        - $n$ sensors, budget $B$
        - $s_1$: reward $r$, cost $B$
        - $s_2, \dots, s_n$: reward $r - \varepsilon$, all with the same cost $c(s_i) = 1$
        - Hill-climbing always prefers the more expensive sensor $s_1$ with reward $r$ (and exhausts the budget). It never selects the cheaper sensors with reward $r - \varepsilon$ → For variable cost it can fail arbitrarily badly!
      - Idea: What if we optimize the benefit-cost ratio? Greedily pick the sensor $s_i$ that maximizes the benefit-to-cost ratio:
        $$s_i = \arg\max_{s \in V} \frac{f(A_{i-1} \cup \{s\}) - f(A_{i-1})}{c(s)}$$
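The bad example can be reproduced with a few lines. This sketch assumes an additive reward, concrete numbers ($n = 6$, $B = 5$, $r = 10$, $\varepsilon = 1$), and a budget check of the form "total cost $\le B$" so that $s_1$ with cost $B$ is affordable, as the example implies. The function `budgeted_greedy` and its `use_ratio` flag are hypothetical names for the two selection rules.

```python
def budgeted_greedy(f, cost, V, B, use_ratio):
    """Greedy selection under a budget, scoring by gain or by gain / cost."""
    selected, spent = [], 0
    while True:
        # Only consider sensors that still fit in the remaining budget.
        affordable = [s for s in V if s not in selected and spent + cost[s] <= B]
        if not affordable:
            break
        base = f(selected)

        def score(s):
            gain = f(selected + [s]) - base
            return gain / cost[s] if use_ratio else gain

        best = max(affordable, key=score)
        selected.append(best)
        spent += cost[best]
    return selected

# Assumed instance of the bad example: n = 6, B = 5, r = 10, eps = 1.
B = 5
reward = {"s1": 10, "s2": 9, "s3": 9, "s4": 9, "s5": 9, "s6": 9}
cost = {"s1": B, "s2": 1, "s3": 1, "s4": 1, "s5": 1, "s6": 1}
V = list(reward)

def total_reward(S):
    return sum(reward[s] for s in S)  # additive (hence submodular) toy objective

ignore_cost = budgeted_greedy(total_reward, cost, V, B, use_ratio=False)
by_ratio = budgeted_greedy(total_reward, cost, V, B, use_ratio=True)
print(ignore_cost, total_reward(ignore_cost))
print(by_ratio, total_reward(by_ratio))
```

On this instance the cost-ignoring rule spends the whole budget on $s_1$, while the ratio rule buys five cheap sensors. The gap grows with $B$, which is the sense in which ignoring cost can fail arbitrarily badly; the sketch makes no claim about how the ratio rule behaves on other instances.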
