CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

• (1) New problem: Outbreak detection
• (2) Develop an approximation algorithm
    • It is a submodular optimization problem!
• (3) Speed up greedy hill-climbing
    • Valid for optimizing general submodular functions (i.e., also works for influence maximization)
• (4) Prove a new “data-dependent” bound on the solution quality
    • Valid for optimizing general submodular functions (i.e., also works for influence maximization)
10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

[Leskovec et al., KDD ’07]  Given a real city water distribution network  And data on how contaminants spread in the network  Detect the contaminant as quickly as possible S S  Problem posed by the US Environmental Protection Agency 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 3

[Leskovec et al., KDD ’07]
[Figure: blogs and their posts, linked by time-ordered hyperlinks that form information cascades]
Which blogs should one read to detect cascades as effectively as possible?

[Leskovec et al., KDD ’07]
Want to read things before others do.
[Figure: two choices of blogs to read. One detects the blue and yellow stories soon but misses the red one. The other detects all stories, but late.]

• Both of these are instances of the same underlying problem!
• Given a dynamic process spreading over a network
• We want to select a set of nodes to detect the process effectively
• Many other applications:
    • Epidemics
    • Influence propagation
    • Network security

• Utility of placing sensors:
    • Water flow dynamics, demands of households, …
• For each subset $S \subseteq V$ compute utility $f(S)$
[Figure: the set $V$ of all network junctions with sensors $S_1, \dots, S_4$ and contamination outbreaks of high, medium and low impact. A sensor reduces impact through early detection. One placement has low sensing quality, $f(S) = 0.01$; another has high sensing quality, $f(S) = 0.9$.]

[Leskovec et al., KDD ’07] Given:  Graph 𝐻 ( 𝑊 , 𝐹 )  Data on how outbreaks spread over the 𝑯 :  For each outbreak 𝑗 we know the time 𝑈 ( 𝑗 , 𝑣 ) when outbreak 𝑗 contaminates node 𝑣 Simulator of water consumption&flow Water distribution network (built by Mech Eng. people) (physical pipes and junctions) We simulate the contamination spread for every possible location. 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 8

[Leskovec et al., KDD ’07] Given:  Graph 𝐻 ( 𝑊 , 𝐹 )  Data on how outbreaks spread over the 𝑯 :  For each outbreak 𝑗 we know the time 𝑈 ( 𝑗 , 𝑣 ) when outbreak 𝑗 contaminates node 𝑣 c a b a c b Traces of the information flow The network of Collect lots of blogs posts and trace the blogosphere hyperlinks to obtain data about information flow from a given blog. 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9

[Leskovec et al., KDD ’07] Given:  Graph 𝐻 ( 𝑊 , 𝐹 )  Data on how outbreaks spread over the 𝑯 :  For each outbreak 𝑗 we know the time 𝑈 ( 𝑗 , 𝑣 ) when outbreak 𝑗 contaminates node 𝑣  Goal: Select a subset of nodes S that maximize the expected reward : max 𝑇⊆𝑊 𝑔 𝑇 = � 𝑄 𝑗 𝑔 𝑗 𝑇 𝑗 Expected reward for detecting outbreak i subject to: cost(S) < B 𝒈 𝒋 𝑻 is penalty reduction: 𝑔 𝑗 𝑇 = 𝜌 𝑗 ∅ − 𝜌 𝑗 ( 𝑇 ) 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 10
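The objective can be made concrete with a small sketch. It assumes the detection times are stored in a hypothetical nested dict `T[i][u]`, the outbreak probabilities in `P[i]`, and that an undetected outbreak has detection time infinity. The function names and the toy data are illustrative and do not come from the slides.

```python
import math

def detection_time(T, i, S):
    """Earliest time at which some node in S is reached by outbreak i.

    Returns infinity when no node of S is ever contaminated (assumption).
    """
    return min((T[i].get(u, math.inf) for u in S), default=math.inf)

def expected_reward(T, P, penalty, S):
    """f(S) = sum_i P(i) * (penalty(inf) - penalty(detection time of S))."""
    return sum(
        P[i] * (penalty(math.inf) - penalty(detection_time(T, i, S)))
        for i in T
    )

# Toy data, made up for illustration: two outbreaks over nodes a, b, c.
T = {0: {"a": 1.0, "b": 3.0}, 1: {"b": 2.0, "c": 5.0}}
P = {0: 0.5, 1: 0.5}
T_MAX = 10.0  # assumed time horizon

def time_penalty(t):
    """Time-to-detection style penalty, capped at the horizon."""
    return min(t, T_MAX)

for S in (set(), {"a"}, {"a", "b"}):
    print(sorted(S), expected_reward(T, P, time_penalty, S))
```

An empty placement earns no reward, and each added sensor can only move the detection time of an outbreak earlier.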

• Reward:
    • (1) Minimize time to detection
    • (2) Maximize number of detected propagations
    • (3) Minimize number of infected people
• Cost (node/location dependent):
    • Reading big blogs is more time consuming
    • Placing a sensor in a remote location is expensive
[Figure: outbreak $i$ and reward $f(S)$. Monitoring the blue node saves more people than monitoring the green node.]

• Objective functions:
    • 1) Time to detection (DT)
        • How long does it take to detect a contamination?
        • Penalty: $\pi_i(t) = \min\{t, T_{\max}\}$
    • 2) Detection likelihood (DL)
        • How many contaminations do we detect?
        • We incur a penalty if we don’t detect: $\pi_i(t) = 0$, $\pi_i(\infty) = 1$
    • 3) Population affected (PA)
        • How many people drank contaminated water?
        • $\pi_i(t) = \{\#\text{ of blogs in cascade } i \text{ at time } t\}$
• Observation: In all cases detecting sooner does not hurt!
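The three penalties can be written as small functions of the detection time. This is a sketch under assumptions: `T_MAX` is an arbitrary horizon, and the population-affected penalty counts nodes whose infection time is at most `t`, which is one reading of "at time t". All names are illustrative.

```python
import math

T_MAX = 10.0  # assumed time horizon

def penalty_dt(t):
    """Time to detection: elapsed time, capped at T_MAX."""
    return min(t, T_MAX)

def penalty_dl(t):
    """Detection likelihood: 0 if detected at any finite time, 1 if never."""
    return 1.0 if math.isinf(t) else 0.0

def penalty_pa(t, infection_times):
    """Population affected: nodes of the cascade infected by time t (assumed <=)."""
    return sum(1 for x in infection_times if x <= t)

# Each penalty is non-decreasing in t, so detecting sooner never hurts.
cascade = [1.0, 2.0, 4.0]  # made-up infection times for one cascade
for t in (1.0, 3.0, math.inf):
    print(t, penalty_dt(t), penalty_dl(t), penalty_pa(t, cascade))
```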

[Leskovec et al., KDD ’07]  Observation: Diminishing returns New sensor: S 1 S 1 S’ s’ S 2 S 3 S 2 S 4 Placement S={s 1 , s 2 } Placement S’={s 1 , s 2 , s 3 , s 4 } Adding s’ helps Adding s’ helps a lot very little 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 13

• Claim: For all $A \subseteq B \subseteq V$ and sensors $s \in V \setminus B$:
    $f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$
• Proof:
    • Fix cascade $i$
    • Show that $f_i(A) = \pi_i(\infty) - \pi_i(T(A, i))$ is submodular, where $T(A, i) = \min_{s \in A} T(s, i)$ is the time at which placement $A$ first detects cascade $i$
    • Consider $A \subseteq B \subseteq V$ and sensor $s \in V \setminus B$. Since $A \subseteq B$, we have $T(B, i) \le T(A, i)$
    • When does node $s$ detect cascade $i$? 3 cases:
    • (1) $T(s, i) \ge T(A, i)$: then $f_i(A \cup \{s\}) = f_i(A)$ and $f_i(B \cup \{s\}) = f_i(B)$, and so
        $f_i(A \cup \{s\}) - f_i(A) = 0 = f_i(B \cup \{s\}) - f_i(B)$
        • Since $s$ detects too late, nobody benefits

• Proof (contd.):
    • (2) $T(B, i) \le T(s, i) < T(A, i)$: then
        $f_i(A \cup \{s\}) - f_i(A) \ge 0 = f_i(B \cup \{s\}) - f_i(B)$
        • $s$ detects sooner than any node in $A$ but no sooner than the earliest node in $B$, so $s$ only helps improve the solution $A$
    • (3) $T(s, i) < T(B, i)$: then
        $f_i(A \cup \{s\}) - f_i(A) = [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(A) \ge [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(B) = f_i(B \cup \{s\}) - f_i(B)$
        • The inequality is due to the non-decreasingness of $f_i(\cdot)$, i.e., $f_i(A) \le f_i(B)$
    • So, $f_i(\cdot)$ is submodular!
    • So, $f(\cdot)$ is also submodular, because a non-negative weighted sum of submodular functions is submodular: $f(S) = \sum_i P(i) \, f_i(S)$
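The claim can be sanity-checked by brute force on a tiny instance. The sketch below enumerates all pairs $A \subseteq B$ and tests the diminishing-returns inequality directly. The toy detection times, the probabilities and the capped time-to-detection penalty are assumptions made for illustration. The check is exponential in $|V|$, so it is only a test aid, not part of any algorithm in the slides.

```python
import math
from itertools import combinations

# Toy instance, made up for illustration: T[i][u] is the detection time.
T = {0: {"a": 1.0, "b": 3.0, "c": 4.0}, 1: {"b": 2.0, "c": 5.0, "d": 1.0}}
P = {0: 0.4, 1: 0.6}
V = {"a", "b", "c", "d"}
T_MAX = 10.0  # assumed time horizon

def f(S):
    """Expected penalty reduction with penalty pi(t) = min(t, T_MAX)."""
    total = 0.0
    for i, times in T.items():
        t = min((times.get(u, math.inf) for u in S), default=math.inf)
        total += P[i] * (T_MAX - min(t, T_MAX))
    return total

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def is_submodular(func, ground, tol=1e-9):
    """Check func(A+s) - func(A) >= func(B+s) - func(B) for all A <= B, s not in B."""
    for B in subsets(ground):
        for A in subsets(B):
            for s in ground - B:
                if func(A | {s}) - func(A) < func(B | {s}) - func(B) - tol:
                    return False
    return True

print(is_submodular(f, V))
```

As a contrast, a function such as `lambda S: len(S) ** 2` has increasing marginal gains and fails the same check.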

• What do we know about optimizing submodular functions?
    • Hill-climbing (i.e., greedy) is near-optimal: it achieves at least $(1 - 1/e) \cdot OPT$
• But:
    • (1) This only works for the unit-cost case! (each sensor costs the same)
        • For us, each sensor $s$ has cost $c(s)$
    • (2) The hill-climbing algorithm is slow
        • At each iteration we need to re-evaluate the marginal gains of all nodes
        • Runtime: $O(|V| \cdot K)$ function evaluations for placing $K$ sensors
[Figure: hill-climbing reward for candidate sensors a, b, c, d, e. At each step, add the sensor with the highest marginal gain.]
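A minimal sketch of unit-cost hill-climbing, written for any set function `f` passed in by the caller. The coverage function used as the example is an assumption chosen because it is a standard submodular function; it is not the sensor-placement objective from the slides.

```python
def greedy_hill_climbing(f, V, K):
    """Pick up to K elements, each time adding the one with the highest marginal gain."""
    A = set()
    for _ in range(K):
        base = f(A)
        best, best_gain = None, 0.0
        # Every remaining node is re-evaluated in every round,
        # which gives the O(|V| * K) evaluations of f.
        for s in sorted(V - A):
            gain = f(A | {s}) - base
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no element improves the objective
            break
        A.add(best)
    return A

# Illustrative coverage objective: each sensor covers a set of items.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
V = set(covers)

def coverage(S):
    """Number of distinct items covered by the sensors in S."""
    return len(set().union(*(covers[s] for s in S)))

chosen = greedy_hill_climbing(coverage, V, 2)
print(sorted(chosen), coverage(chosen))
```

Ties are broken by sorted order here only to make the result deterministic.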

[Leskovec et al., KDD ’07]  Consider: Hill-climbing that ignores cost  Ignore sensor cost  Repeatedly select sensor with highest marginal gain  Do this until the budget is exhausted  How well does this work? 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 17

[Leskovec et al., KDD ’07]  Bad example:  𝑜 sensors, budget 𝐶  𝑡 1 : reward 𝑠 , cost 𝐶  𝑡 2 … 𝑡 𝑜 : reward 𝑠 − 𝜁 , cost 1  Hill-climbing always prefers more expensive sensor 𝑡 1 with reward 𝑠 (and exhausts the budget) It never selects cheaper sensors with reward 𝑠 − 𝜁 → For variable cost it can fail arbitrarily badly!  Idea: What if we optimize benefit-cost ratio ? 𝑔 𝐵 𝑗−1 ∪ { 𝑡 } − 𝑔 ( 𝐵 𝑗−1 ) Greedily pick sensor 𝑡 𝑗 = arg max 𝑡 𝑗 that maximizes 𝑑 𝑡 𝑡∈𝑊 benefit to cost ratio. 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 18
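The bad example can be reproduced with a short sketch that runs both selection rules on the same instance. Several things here are assumptions: rewards are taken to be additive, the budget check is `spent + cost <= budget` so that $s_1$ with cost $B$ is affordable, unaffordable sensors are skipped, and the concrete numbers ($n = 6$, $B = 5$, $r = 1$, $\varepsilon = 0.1$) are made up.

```python
def budgeted_greedy(f, cost, V, budget, use_ratio):
    """Greedy selection under a budget.

    use_ratio=False ranks candidates by marginal gain alone (cost-ignoring rule);
    use_ratio=True ranks them by marginal gain divided by cost.
    """
    A, spent = set(), 0.0
    while True:
        base = f(A)
        best, best_score = None, 0.0
        for s in sorted(V - A):
            if spent + cost[s] > budget:  # assumed: skip sensors we cannot afford
                continue
            gain = f(A | {s}) - base
            score = gain / cost[s] if use_ratio else gain
            if score > best_score:
                best, best_score = s, score
        if best is None:
            return A
        A.add(best)
        spent += cost[best]

# Instance from the bad example with made-up numbers.
B, r, eps = 5.0, 1.0, 0.1
reward = {"s1": r, **{f"s{k}": r - eps for k in range(2, 7)}}
cost = {"s1": B, **{f"s{k}": 1.0 for k in range(2, 7)}}
V = set(reward)

def total_reward(S):
    """Additive reward (an assumption made for this illustration)."""
    return sum(reward[s] for s in S)

by_gain = budgeted_greedy(total_reward, cost, V, B, use_ratio=False)
by_ratio = budgeted_greedy(total_reward, cost, V, B, use_ratio=True)
print(sorted(by_gain), total_reward(by_gain))
print(sorted(by_ratio), total_reward(by_ratio))
```

On this instance the cost-ignoring rule spends the whole budget on $s_1$ for reward $r$, while the ratio rule buys five cheap sensors for a total of about $5(r - \varepsilon)$.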
