  1. CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. (1) New problem: Outbreak detection
     (2) Develop an approximation algorithm
         - It is a submodular optimization problem!
     (3) Speed up greedy hill-climbing
         - Valid for optimizing general submodular functions (i.e., also works for influence maximization)
     (4) Prove a new “data-dependent” bound on the solution quality
         - Valid for optimizing general submodular functions (i.e., also works for influence maximization)
     10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

  3. [Leskovec et al., KDD ’07]
     - Given a real city water distribution network
     - And data on how contaminants spread in the network
     - Detect the contaminant as quickly as possible
     - Problem posed by the US Environmental Protection Agency

  4. [Leskovec et al., KDD ’07]
     [Figure: blogs and their posts, connected by time-ordered hyperlinks that form an information cascade]
     Which blogs should one read to detect cascades as effectively as possible?

  5. [Leskovec et al., KDD ’07]
     Want to read things before others do.
     [Figure: two reading strategies. One detects the blue and yellow cascades soon but misses red; the other detects all stories, but late.]

  6. Both of these are instances of the same underlying problem!
     - Given a dynamic process spreading over a network,
     - we want to select a set of nodes to detect the process effectively.
     Many other applications:
     - Epidemics
     - Influence propagation
     - Network security

  7. Utility of placing sensors:
     - Water flow dynamics, demands of households, …
     - For each subset S ⊆ V, compute utility f(S)
     - A sensor reduces impact through early detection!
     [Figure: set V of all network junctions with contamination outbreaks of high, medium, and low impact; a placement with low sensing quality, f(S) = 0.01, versus one with high sensing quality, f(S) = 0.9]

  8. [Leskovec et al., KDD ’07] Given:
     - Graph $G(V, E)$
     - Data on how outbreaks spread over $G$: for each outbreak $i$ we know the time $T(i, u)$ when outbreak $i$ contaminates node $u$
     [Figure: water distribution network (physical pipes and junctions, built by mechanical engineers) and a simulator of water consumption & flow]
     We simulate the contamination spread for every possible location.
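The data $T(i, u)$ can be mimicked with a toy stand-in for the simulator. This is a minimal sketch under a loudly hypothetical assumption: contamination travels along each pipe with a fixed delay, so $T(i, u)$ is the shortest-path delay from the outbreak's source node. A real water network needs a hydraulic simulator instead; `contamination_times` and `simulate_all_outbreaks` are illustrative names, not part of the original work.

```python
import heapq
import math

def contamination_times(graph, source):
    """Earliest time each node is reached from `source` (Dijkstra).

    Hypothetical model: each edge has a fixed travel delay.
    graph: dict node -> list of (neighbor, delay) pairs.
    Returns dict node -> T(i, u); unreached nodes are simply absent.
    """
    times = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > times.get(u, math.inf):
            continue  # stale heap entry, a faster path was already found
        for v, delay in graph.get(u, []):
            nt = t + delay
            if nt < times.get(v, math.inf):
                times[v] = nt
                heapq.heappush(heap, (nt, v))
    return times

def simulate_all_outbreaks(graph):
    # One outbreak per possible source location (every key of `graph`).
    return {i: contamination_times(graph, i) for i in graph}

graph = {"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}
T = simulate_all_outbreaks(graph)
print(T["a"])  # {'a': 0.0, 'b': 2.0, 'c': 3.0}
```

Nodes the outbreak never reaches are left out of the inner dict, which later code can read as $T(i, u) = \infty$.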

  9. [Leskovec et al., KDD ’07] Given:
     - Graph $G(V, E)$
     - Data on how outbreaks spread over $G$: for each outbreak $i$ we know the time $T(i, u)$ when outbreak $i$ contaminates node $u$
     [Figure: the network of the blogosphere (blogs a, b, c) and traces of the information flow]
     Collect lots of blog posts and trace hyperlinks to obtain data about information flow from a given blog.

  10. [Leskovec et al., KDD ’07] Given:
      - Graph $G(V, E)$
      - Data on how outbreaks spread over $G$: for each outbreak $i$ we know the time $T(i, u)$ when outbreak $i$ contaminates node $u$
      Goal: Select a subset of nodes $S$ that maximizes the expected reward:
        $\max_{S \subseteq V} f(S) = \sum_i P(i)\, f_i(S)$ subject to $\mathrm{cost}(S) \le B$
      where $P(i)$ is the probability of outbreak $i$ and $f_i(S)$ is the reward for detecting outbreak $i$.
      $f_i(S)$ is the penalty reduction: $f_i(S) = \pi_i(\infty) - \pi_i(T(S, i))$, where $T(S, i) = \min_{s \in S} T(i, s)$ is the time at which the first sensor in $S$ detects outbreak $i$.
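The objective can be evaluated directly from the detection-time data. A minimal sketch, assuming `T` maps each outbreak to a dict of contamination times (missing node means never contaminated) and `P` holds outbreak probabilities; the toy data and the DL-style `dl_penalty` are hypothetical, and the budget constraint $\mathrm{cost}(S) \le B$ is left to the caller.

```python
import math

def detection_time(T_i, S):
    """T(S, i): earliest time any sensor in S detects outbreak i (inf if none)."""
    return min((T_i.get(s, math.inf) for s in S), default=math.inf)

def expected_reward(S, T, P, penalty):
    """f(S) = sum_i P(i) * f_i(S), with f_i(S) = pi_i(inf) - pi_i(T(S, i)).

    T: dict outbreak i -> dict node u -> contamination time T(i, u)
    P: dict outbreak i -> probability P(i)
    penalty: function (i, t) -> pi_i(t)
    """
    return sum(P[i] * (penalty(i, math.inf) - penalty(i, detection_time(T[i], S)))
               for i in T)

# Hypothetical toy data: two equally likely outbreaks x and y.
T = {"x": {"a": 1.0, "b": 4.0}, "y": {"b": 2.0}}
P = {"x": 0.5, "y": 0.5}

def dl_penalty(i, t):
    # Detection-likelihood penalty: 1 if never detected, else 0.
    return 1.0 if math.isinf(t) else 0.0

print(expected_reward({"a"}, T, P, dl_penalty))       # 0.5
print(expected_reward({"a", "b"}, T, P, dl_penalty))  # 1.0
```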

  11. Reward:
      (1) Minimize time to detection
      (2) Maximize number of detected propagations
      (3) Minimize number of infected people
      Cost (node/location dependent):
      - Reading big blogs is more time-consuming
      - Placing a sensor in a remote location is expensive
      [Figure: outbreak i and reward f(S); monitoring the blue node saves more people than monitoring the green node]

  12. Objective functions:
      1) Time to detection (DT)
         - How long does it take to detect a contamination?
         - Penalty: $\pi_i(t) = \min\{t, T_{\max}\}$
      2) Detection likelihood (DL)
         - How many contaminations do we detect?
         - We incur a penalty only if we don't detect: $\pi_i(t) = 0$ for finite $t$, $\pi_i(\infty) = 1$
      3) Population affected (PA)
         - How many people drank contaminated water?
         - $\pi_i(t)$ = {# of blogs in cascade $i$ at time $t$}
      Observation: In all cases, detecting sooner does not hurt! Each penalty $\pi_i(t)$ is non-decreasing in $t$.
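The three penalties are simple functions of the detection time $t$. A minimal sketch; the cap `t_max=10.0` and the PA input format (a list of infection times for one cascade) are hypothetical choices for illustration. Each function is non-decreasing in `t`, which is the slide's observation that detecting sooner never hurts.

```python
import math

def pi_dt(t, t_max=10.0):
    """Time to detection (DT): pi(t) = min(t, T_max); t_max is a made-up cap."""
    return min(t, t_max)

def pi_dl(t):
    """Detection likelihood (DL): penalty 1 only if never detected (t = inf)."""
    return 1.0 if math.isinf(t) else 0.0

def make_pi_pa(infection_times):
    """Population affected (PA): pi(t) = number of nodes infected by time t.

    infection_times: times at which nodes of one cascade were infected
    (hypothetical input format).
    """
    def pi_pa(t):
        return sum(1 for time in infection_times if time <= t)
    return pi_pa

pi_pa = make_pi_pa([0.0, 2.0, 3.0])
print(pi_dt(3.0), pi_dt(math.inf))  # 3.0 10.0
print(pi_dl(5.0), pi_dl(math.inf))  # 0.0 1.0
print(pi_pa(2.0), pi_pa(math.inf))  # 2 3
```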

  13. [Leskovec et al., KDD ’07]
      Observation: Diminishing returns
      [Figure: placement S = {s_1, s_2} versus placement S' = {s_1, s_2, s_3, s_4}; adding a new sensor s' to S helps a lot, while adding s' to S' helps very little]

  14. Claim: For all $A \subseteq B \subseteq V$ and sensors $s \in V \setminus B$:
        $f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$
      Proof:
      - Fix cascade $i$
      - Show $f_i(A) = \pi_i(\infty) - \pi_i(T(A, i))$ is submodular
      - Consider $A \subseteq B \subseteq V$ and sensor $s \in V \setminus B$. Since $A \subseteq B$, we have $T(B, i) \le T(A, i)$.
      - When does node $s$ detect cascade $i$? 3 cases:
        (1) $T(s, i) \ge T(A, i)$ (hence also $T(s, i) \ge T(B, i)$): then $f_i(A \cup \{s\}) = f_i(A)$ and $f_i(B \cup \{s\}) = f_i(B)$, so
            $f_i(A \cup \{s\}) - f_i(A) = 0 = f_i(B \cup \{s\}) - f_i(B)$
            Since $s$ detects too late, nobody benefits.

  15. Proof (contd.): 3 cases:
      (2) $T(B, i) \le T(s, i) < T(A, i)$: then
          $f_i(A \cup \{s\}) - f_i(A) \ge 0 = f_i(B \cup \{s\}) - f_i(B)$
          $s$ detects sooner than any node in $A$, but no sooner than the first detection in $B$. So $s$ only helps improve the solution $A$.
      (3) $T(s, i) < T(B, i)$: then
          $f_i(A \cup \{s\}) - f_i(A) = \pi_i(\infty) - \pi_i(T(s, i)) - f_i(A) \ge \pi_i(\infty) - \pi_i(T(s, i)) - f_i(B) = f_i(B \cup \{s\}) - f_i(B)$
          The inequality is due to the non-decreasingness of $f_i(\cdot)$, i.e., $f_i(A) \le f_i(B)$.
      So, $f_i(\cdot)$ is submodular!
      So, $f(\cdot)$ is also submodular, since $f(S) = \sum_i P(i)\, f_i(S)$ is a nonnegative combination of submodular functions.
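The proof can be sanity-checked numerically. A minimal sketch, assuming the DT penalty and a small random instance (all data hypothetical): it enumerates every pair $A \subseteq B$ and every $s \notin B$ and tests the claimed inequality. A brute-force check on one instance is evidence, not a proof.

```python
import itertools
import math
import random

def f(S, T, P, t_max):
    """f(S) = sum_i P(i) * (pi(inf) - pi(T(S, i))) with DT penalty pi(t) = min(t, t_max)."""
    total = 0.0
    for i, times in T.items():
        t_detect = min((times.get(s, math.inf) for s in S), default=math.inf)
        total += P[i] * (t_max - min(t_detect, t_max))  # pi(inf) = t_max
    return total

def has_diminishing_returns(T, P, nodes, t_max, tol=1e-9):
    """Check f(A | {s}) - f(A) >= f(B | {s}) - f(B) for all A <= B, s not in B."""
    subsets = [frozenset(c) for r in range(len(nodes) + 1)
               for c in itertools.combinations(nodes, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in nodes:
                if s in B:
                    continue
                gain_A = f(A | {s}, T, P, t_max) - f(A, T, P, t_max)
                gain_B = f(B | {s}, T, P, t_max) - f(B, T, P, t_max)
                if gain_A < gain_B - tol:  # tolerance for float rounding
                    return False
    return True

random.seed(0)
nodes = ["a", "b", "c", "d", "e"]
# Hypothetical instance: 20 equally likely outbreaks; each reaches a given
# node with probability 0.7 at a uniform random time in [0, 10].
T = {i: {u: random.uniform(0, 10) for u in nodes if random.random() < 0.7}
     for i in range(20)}
P = {i: 1 / len(T) for i in T}
print(has_diminishing_returns(T, P, nodes, t_max=8.0))  # True
```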

  16. What do we know about optimizing submodular functions?
      - Hill-climbing (i.e., greedy) is near-optimal: $(1 - 1/e) \cdot OPT$
      - But:
        (1) This only works for the unit-cost case! (each sensor costs the same)
            - For us, each sensor $s$ has cost $c(s)$
        (2) The hill-climbing algorithm is slow
            - At each iteration we need to re-evaluate the marginal gains of all nodes
            - Runtime $O(|V| \cdot K)$ for placing $K$ sensors
      [Figure: hill-climbing on reward; at each step, add the sensor (a, b, c, d, e) with the highest marginal gain]
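Unit-cost hill-climbing can be sketched in a few lines. A minimal sketch, assuming `f` is any set function on frozensets (monotone submodular for the guarantee to apply) and `k` plays the role of $K$; the coverage instance, where each node detects a fixed set of outbreaks, is hypothetical. Every iteration re-evaluates the gain of every remaining node, which is the $O(|V| \cdot K)$ cost the slide points out.

```python
def greedy_hill_climbing(f, V, k):
    """Unit-cost greedy: k times, add the node with the largest marginal gain.

    f: set function on frozensets (assumed monotone submodular)
    V: candidate sensor locations, in tie-breaking order
    """
    A = frozenset()
    for _ in range(k):
        remaining = [s for s in V if s not in A]
        if not remaining:
            break
        f_A = f(A)
        best = max(remaining, key=lambda s: f(A | {s}) - f_A)
        A = A | {best}
    return A

# Hypothetical coverage instance: f(S) counts outbreaks detected by S.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1}}

def detected(S):
    return len(set().union(*(covers[s] for s in S)))

chosen = greedy_hill_climbing(detected, list(covers), k=2)
print(sorted(chosen), detected(chosen))  # ['a', 'c'] 6
```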

  17. [Leskovec et al., KDD ’07]
      Consider: Hill-climbing that ignores cost
      - Ignore sensor cost
      - Repeatedly select the sensor with the highest marginal gain
      - Do this until the budget is exhausted
      How well does this work?
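The cost-ignoring rule can be sketched as follows. A minimal sketch under one reading of "until the budget is exhausted" (an assumption): sensors that no longer fit in the remaining budget are skipped, and the loop stops when none fits. Costs only gate feasibility; they never enter the ranking. The additive-reward instance is hypothetical.

```python
def greedy_ignore_cost(f, cost, budget):
    """Hill-climbing that ignores cost when ranking sensors.

    f: set function on frozensets; cost: dict sensor -> c(s).
    """
    A, spent = frozenset(), 0.0
    while True:
        fits = [s for s in cost if s not in A and spent + cost[s] <= budget]
        if not fits:
            return A
        f_A = f(A)
        best = max(fits, key=lambda s: f(A | {s}) - f_A)  # cost not used here
        A, spent = A | {best}, spent + cost[best]

# Hypothetical instance with additive rewards.
reward = {"a": 3.0, "b": 2.0, "c": 1.0}
cost = {"a": 2, "b": 1, "c": 1}

def total_reward(S):
    return sum(reward[s] for s in S)

chosen = greedy_ignore_cost(total_reward, cost, budget=3)
print(sorted(chosen), total_reward(chosen))  # ['a', 'b'] 5.0
```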

  18. [Leskovec et al., KDD ’07]
      Bad example:
      - $n$ sensors, budget $B$
      - $s_1$: reward $r$, cost $B$
      - $s_2, \dots, s_n$: reward $r - \varepsilon$, cost 1
      - Hill-climbing always prefers the more expensive sensor $s_1$ with reward $r$ (and exhausts the budget). It never selects the cheaper sensors with reward $r - \varepsilon$ (e.g., if rewards add up and $B \ge n - 1$, the cheap sensors together would earn $(n - 1)(r - \varepsilon)$ versus $r$ for $s_1$).
      → For variable cost it can fail arbitrarily badly!
      Idea: What if we optimize the benefit-cost ratio?
      Greedily pick the sensor that maximizes the benefit-to-cost ratio:
        $s_i = \arg\max_{s \in V} \frac{f(A_{i-1} \cup \{s\}) - f(A_{i-1})}{c(s)}$
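A minimal sketch of the benefit-cost rule on a hypothetical instance of the bad example, assuming $n = 5$, $B = 4$, $r = 1.0$, $\varepsilon = 0.25$ and additive rewards (all illustrative choices). Ignoring cost would pick $s_1$ alone for reward 1.0; the ratio rule below picks the four cheap sensors for 3.0. The sketch covers only the ratio rule as stated on the slide and is not claimed to carry an approximation guarantee on its own.

```python
def greedy_benefit_cost(f, cost, budget):
    """Greedy on marginal gain per unit cost among sensors that still fit:
    s_i = argmax_s (f(A_{i-1} | {s}) - f(A_{i-1})) / c(s)
    """
    A, spent = frozenset(), 0.0
    while True:
        fits = [s for s in cost if s not in A and spent + cost[s] <= budget]
        if not fits:
            return A
        f_A = f(A)
        best = max(fits, key=lambda s: (f(A | {s}) - f_A) / cost[s])
        A, spent = A | {best}, spent + cost[best]

# Hypothetical instance of the bad example (additive rewards assumed).
r, eps, B = 1.0, 0.25, 4
reward = {"s1": r, **{f"s{j}": r - eps for j in range(2, 6)}}
cost = {"s1": B, **{f"s{j}": 1 for j in range(2, 6)}}

def objective(S):
    return sum(reward[s] for s in S)

picked = greedy_benefit_cost(objective, cost, B)
print(sorted(picked), objective(picked))  # ['s2', 's3', 's4', 's5'] 3.0
```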
