

SLIDE 1

CS224W: Analysis of Networks Jure Leskovec, Stanford University

http://cs224w.stanford.edu

HW2 Q1.1 parts (b) and (c) cancelled. HW3 released. It is long. Start early.

SLIDE 2

- (1) New problem: Outbreak detection
- (2) Develop an approximation algorithm
  - It is a submodular optimization problem!
- (3) Speed up greedy hill-climbing
  - Valid for optimizing general submodular functions (i.e., also works for influence maximization)
- (4) Prove a new "data dependent" bound on the solution quality
  - Valid for optimizing any submodular function (i.e., also works for influence maximization)

10/26/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

SLIDE 3

- Given a real city water distribution network
- And data on how contaminants spread in the network
- Detect the contaminant as quickly as possible
- Problem posed by the US Environmental Protection Agency

SLIDE 4

Which blogs should one read to detect cascades as effectively as possible?

[Figure: blog posts ordered in time, with hyperlinks forming an information cascade]

SLIDE 5

- Detect all stories, but late.
- Want to read things before others do.
- Detect the blue & yellow stories soon, but miss the red story.

SLIDE 6

- Both of these are instances of the same underlying problem!
- Given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively
- Many other applications:
  - Epidemics
  - Influence propagation
  - Network security

SLIDE 7

- Utility of placing sensors:
  - Water flow dynamics, demands of households, …
- For each subset S ⊆ V compute the utility f(S)

[Figure: contamination spreading over the set V of all network junctions. A placement S1…S4 with high sensing "quality" (e.g., f(S) = 0.9) catches high-impact outbreaks early; a placement with low sensing "quality" (e.g., f(S) = 0.01) does not. The sensor reduces impact through early detection!]

SLIDE 8

Given:
- Graph G(V, E)
- Data on how outbreaks spread over G:
  - For each outbreak i we know the time T(u, i) when outbreak i contaminates node u

Example: a water distribution network (physical pipes and junctions) with a simulator of water consumption & flow (built by mechanical engineers). We simulate the contamination spread for every possible starting location.

SLIDE 9

Given:
- Graph G(V, E)
- Data on how outbreaks spread over G:
  - For each outbreak i we know the time T(u, i) when outbreak i contaminates node u

Example: the network of the blogosphere, with traces of the information flow that identify influence sets. Collect lots of blog posts and trace hyperlinks to obtain data about the information flow from a given blog.

SLIDE 10

Given:
- Graph G(V, E)
- Data on how outbreaks spread over G:
  - For each outbreak i we know the time T(u, i) when outbreak i contaminates node u
- Goal: Select a subset of nodes S that maximizes the expected reward:

  max_{S ⊆ V} f(S) = Σ_i P(i) · f_i(S)

  subject to: cost(S) < B

  where P(i) is the probability of outbreak i occurring, and f_i(S) is the expected reward for detecting outbreak i using the sensors S.
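The slides give no code, but the objective above can be sketched on toy data. The detection times and outbreak probabilities below are made up for illustration (the real setting uses a water-network simulator); f_i is the penalty reduction achieved by the earliest-detecting sensor in S.

```python
# Detection times T[i][u]: time at which outbreak i reaches node u
# (float('inf') means outbreak i never reaches node u). Toy numbers.
INF = float('inf')
T = {
    0: {'a': 1.0, 'b': 3.0, 'c': INF},
    1: {'a': INF, 'b': 2.0, 'c': 4.0},
}
P = {0: 0.5, 1: 0.5}  # outbreak probabilities

def detection_time(S, i):
    """Earliest time at which any sensor in S detects outbreak i."""
    return min((T[i][u] for u in S), default=INF)

def f(S, penalty):
    """Expected penalty reduction f(S) = sum_i P(i) * (pi_i(inf) - pi_i(T(S,i)))."""
    return sum(P[i] * (penalty(INF) - penalty(detection_time(S, i)))
               for i in T)

# Detection-likelihood penalty: pay 1 if never detected, 0 otherwise.
dl = lambda t: 1.0 if t == INF else 0.0

print(f({'b'}, dl))  # node b detects both outbreaks -> 1.0
```
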

SLIDE 11

- Reward (one of the following three):
  - (1) Minimize time to detection
  - (2) Maximize number of detected propagations
  - (3) Minimize number of infected people
- Cost (context dependent):
  - Reading big blogs is more time consuming
  - Placing a sensor in a remote location is expensive

[Figure: outbreak i spreading over nodes; monitoring the blue node saves more people than monitoring the green node]

SLIDE 12

- Objective functions:
  - 1) Time to detection (DT)
    - How long does it take to detect a contamination?
    - Penalty for detecting at time t: π_i(t) = t
  - 2) Detection likelihood (DL)
    - How many contaminations do we detect?
    - Penalty for detecting at time t: π_i(t) = 0, π_i(∞) = 1
    - Note, this is a binary outcome: we either detect or we don't
  - 3) Population affected (PA)
    - How many people drank contaminated water?
    - Penalty for detecting at time t: π_i(t) = {# of infected nodes in outbreak i by time t}
- Observation: In all cases detecting sooner does not hurt!
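The three penalties can be written as plain functions of the detection time t. A sketch: the finite horizon T_MAX and the `num_infected_by` lookup are assumptions added for illustration (capping DT at a horizon keeps π_i(∞) finite; they are not stated on the slide).

```python
INF = float('inf')
T_MAX = 100.0  # assumed finite horizon so the DT penalty at infinity is finite

def pi_dt(t):
    # 1) Time to detection: penalty equals the (capped) detection time.
    return min(t, T_MAX)

def pi_dl(t):
    # 2) Detection likelihood: binary; pay 1 only if never detected.
    return 1.0 if t == INF else 0.0

def pi_pa(num_infected_by, t):
    # 3) Population affected: # of nodes infected by time t.
    #    `num_infected_by` is a hypothetical per-outbreak lookup.
    return num_infected_by(min(t, T_MAX))

# Penalty reduction (the reward) for detecting at time t:
reduction = lambda pi, t: pi(INF) - pi(t)
print(reduction(pi_dl, 3.0), reduction(pi_dt, 3.0))  # 1.0 97.0
```
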

SLIDE 13

- Observation: Diminishing returns

[Figure: placement S = {s1, s2}; adding a new sensor s' helps a lot. Placement S' = {s1, s2, s3, s4}; adding s' helps very little.]

We define f_i(S) as the penalty reduction: f_i(S) = π_i(∞) − π_i(T(S, i))

SLIDE 14

- Claim: For all A ⊆ B ⊆ V and sensors x ∈ V\B:
  f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B)
- Proof: All our objectives are submodular
  - Fix cascade/outbreak i
  - Show that f_i(A) = π_i(∞) − π_i(T(A, i)) is submodular
  - Consider A ⊆ B ⊆ V and a sensor x ∈ V\B
  - When does node x detect cascade i? We analyze 3 cases based on when x detects outbreak i
  - (1) T(x, i) ≥ T(A, i): x detects late, nobody benefits:
    f_i(A ∪ {x}) = f_i(A), and also f_i(B ∪ {x}) = f_i(B), so
    f_i(A ∪ {x}) − f_i(A) = 0 = f_i(B ∪ {x}) − f_i(B)

SLIDE 15

- Proof (contd.):
  - (2) T(B, i) ≤ T(x, i) < T(A, i): x detects after B but before A.
    x detects sooner than any node in A, but after all of B. So x only helps improve the solution A (but not B):
    f_i(A ∪ {x}) − f_i(A) ≥ 0 = f_i(B ∪ {x}) − f_i(B)
  - (3) T(x, i) < T(B, i): x detects early:
    f_i(A ∪ {x}) − f_i(A) = [π_i(∞) − π_i(T(x, i))] − f_i(A)
    ≥ [π_i(∞) − π_i(T(x, i))] − f_i(B) = f_i(B ∪ {x}) − f_i(B)
    The inequality is due to the non-decreasingness of f_i(⋅), i.e., f_i(A) ≤ f_i(B) (remember A ⊆ B)
  - So, f_i(⋅) is submodular!
- So, f(S) = Σ_i P(i) · f_i(S) is also submodular
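As a sanity check, submodularity of such a detection objective can also be verified by brute force on a small instance. A sketch with hypothetical data, using the detection-likelihood reward (a weighted coverage function):

```python
from itertools import combinations

INF = float('inf')
# Toy detection times T[i][u] (made up for illustration).
T = {0: {'a': 1, 'b': 3, 'c': INF}, 1: {'a': INF, 'b': 2, 'c': 4}}
P = {0: 0.5, 1: 0.5}
V = ['a', 'b', 'c']

def f(S):
    # Detection-likelihood reward: expected fraction of outbreaks detected.
    return sum(P[i] for i in T if any(T[i][u] < INF for u in S))

def submodular(f, V):
    # Check f(A|{x}) - f(A) >= f(B|{x}) - f(B) for all A ⊆ B ⊆ V and x ∉ B.
    subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for x in set(V) - B:
                if f(A | {x}) - f(A) < f(B | {x}) - f(B) - 1e-12:
                    return False
    return True

print(submodular(f, V))  # True
```
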

SLIDE 16

- What do we know about optimizing submodular functions?
  - Hill-climbing (i.e., greedy) is near optimal: (1 − 1/e) · OPT
- But:
  - (1) This only works for the unit-cost case! (each sensor costs the same)
    - For us, each sensor x has cost c(x)
  - (2) The hill-climbing algorithm is slow
    - At each iteration we need to re-evaluate the marginal gains of all nodes
    - Runtime O(|V| · K) for placing K sensors

[Figure: hill-climbing repeatedly adds the sensor with the highest marginal gain]

SLIDE 17

SLIDE 18

- Consider the following algorithm to solve the outbreak detection problem: hill-climbing that ignores cost
  - Ignore the sensor cost c(x)
  - Repeatedly select the sensor with the highest marginal gain
  - Do this until the budget is exhausted
- Q: How well does this work?
- A: It can fail arbitrarily badly!
  - Next we come up with an example where the hill-climbing solution is arbitrarily far away from OPT

SLIDE 19

- Bad example when we ignore cost:
  - n sensors, budget B
  - s1: reward r, cost B
  - s2 … sn: reward r − ε; these sensors all have the same cost c(s_i) = 1
  - Hill-climbing always prefers the more expensive sensor s1 with reward r (and exhausts the budget). It never selects the cheaper sensors with reward r − ε → for variable costs it can fail arbitrarily badly!
- Idea: What if we optimize the benefit-cost ratio?

  s_i = arg max_{u ∈ V} [f(S_{i−1} ∪ {u}) − f(S_{i−1})] / c(u)

  Greedily pick the sensor s_i that maximizes the benefit-to-cost ratio.

SLIDE 20

- The benefit-cost ratio can also fail arbitrarily badly!
- Consider: budget B, 2 sensors s1 and s2:
  - Costs: c(s1) = ε, c(s2) = B
  - Only 1 cascade: f(s1) = 2ε, f(s2) = B
  - Then the benefit-cost ratios are: f(s1)/c(s1) = 2 and f(s2)/c(s2) = 1
  - So, we first select s1 and then cannot afford s2 → we get reward 2ε instead of B! Now send ε → 0 and we get an arbitrarily bad solution!
- The ratio rule incentivizes choosing nodes with very low cost, even when slightly more expensive ones can lead to much better global results.
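The failure mode can be reproduced in a few lines. A sketch: the instance mirrors the two-sensor example on this slide (with concrete values B = 1, ε = 0.01 chosen for illustration), and rewards are treated as additive for simplicity.

```python
EPS = 0.01
B = 1.0  # budget

# Toy instance: two sensors, rewards treated as additive.
cost = {'s1': EPS, 's2': B}
reward = {'s1': 2 * EPS, 's2': B}

def greedy(score):
    """Repeatedly pick the affordable sensor with the highest score."""
    chosen, spent = [], 0.0
    while True:
        feasible = [s for s in cost if s not in chosen and spent + cost[s] <= B]
        if not feasible:
            return chosen
        best = max(feasible, key=score)
        chosen.append(best)
        spent += cost[best]

# Benefit-cost greedy picks s1 (ratio 2 beats ratio 1), then cannot afford s2.
ratio_pick = greedy(lambda s: reward[s] / cost[s])
print(ratio_pick, sum(reward[s] for s in ratio_pick))  # reward 2*EPS instead of B
```
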

SLIDE 21

- CELF (Cost-Effective Lazy Forward-selection): a two-pass greedy algorithm:
  - Solution S′: use benefit-cost greedy
  - Solution S″: use unit-cost greedy
  - Final solution: S = arg max(f(S′), f(S″))
- How far is CELF from the (unknown) optimal solution?
- Theorem: CELF is near optimal [Krause & Guestrin, '05]
  - CELF achieves a ½(1 − 1/e) factor approximation!
- This is surprising: we have two clearly suboptimal solutions, but taking the best of the two is guaranteed to give a near-optimal solution.
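A minimal sketch of CELF's best-of-two rule on the same toy two-sensor instance (hypothetical numbers; the lazy-evaluation part of CELF is covered on the later slides). Here the benefit-cost pass fails but the unit-cost pass recovers the good solution:

```python
EPS = 0.01
B = 1.0
cost = {'s1': EPS, 's2': B}
reward = {'s1': 2 * EPS, 's2': B}  # additive rewards for simplicity

def f(S):
    return sum(reward[s] for s in S)

def greedy(score):
    """Budgeted greedy: repeatedly pick the affordable sensor with top score."""
    chosen, spent = [], 0.0
    while True:
        feasible = [s for s in cost if s not in chosen and spent + cost[s] <= B]
        if not feasible:
            return chosen
        best = max(feasible, key=score)
        chosen.append(best)
        spent += cost[best]

benefit_cost = greedy(lambda s: reward[s] / cost[s])  # fails here: picks s1
unit_cost = greedy(lambda s: reward[s])               # picks s2
celf = max([benefit_cost, unit_cost], key=f)          # keep the better of the two
print(celf, f(celf))
```
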

SLIDE 22

SLIDE 23

- What do we know about optimizing submodular functions?
  - Hill-climbing (i.e., greedy) is near optimal (that is, (1 − 1/e) · OPT)
- But:
  - (2) The hill-climbing algorithm is slow!
    - At each iteration we need to re-evaluate the marginal gains of all nodes
    - Runtime O(|V| · K) for placing K sensors

[Figure: hill-climbing repeatedly adds the sensor with the highest marginal gain]

SLIDE 24

- In round i + 1: so far we have picked S_i = {s_1, …, s_i}
  - Now pick s_{i+1} = arg max_u [f(S_i ∪ {u}) − f(S_i)]
  - This is our old friend, the greedy hill-climbing algorithm. It maximizes the "marginal gain" δ_i(u) = f(S_i ∪ {u}) − f(S_i)
- By the submodularity property:
  f(S_i ∪ {u}) − f(S_i) ≥ f(S_j ∪ {u}) − f(S_j) for i < j
- Observation: By submodularity, for every u:
  δ_i(u) ≥ δ_j(u) for i < j, since S_i ⊆ S_j
- Marginal benefits δ_i(u) only shrink as i grows! Activating node u in step i helps more than activating it at step j (j > i).

SLIDE 25

- Idea: Use δ_i as an upper bound on δ_j (j > i)
- Lazy hill-climbing:
  - Keep an ordered list of the marginal benefits δ_i from the previous iteration
  - Re-evaluate δ_i only for the top node
  - Re-sort and prune

[Figure: ordered marginal gains of nodes a–e; f(S ∪ {u}) − f(S) ≥ f(T ∪ {u}) − f(T) for S ⊆ T; S1 = {a}]
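Lazy hill-climbing is commonly implemented with a priority queue of possibly-stale marginal gains: pop the top node, and if its gain was computed in the current round it is safe to take (by submodularity it can only have shrunk). A sketch on a toy coverage objective; the `reach` data is hypothetical, not the water network.

```python
import heapq

# Toy objective: each node "covers" a set of outbreaks; f(S) counts coverage.
reach = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {4, 5}, 'd': {5}}

def f(S):
    covered = set()
    for u in S:
        covered |= reach[u]
    return len(covered)

def lazy_greedy(V, k):
    S = []
    # Max-heap via negated gains; `it` records the round the gain was computed.
    heap = [(-f([u]), u, 0) for u in V]
    heapq.heapify(heap)
    while len(S) < k and heap:
        neg_gain, u, it = heapq.heappop(heap)
        if it == len(S):
            S.append(u)  # gain is fresh: safe to take (gains only shrink)
        else:
            # Stale: recompute the marginal gain and push it back.
            heapq.heappush(heap, (-(f(S + [u]) - f(S)), u, len(S)))
    return S

print(lazy_greedy(list(reach), 2))
```
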

SLIDE 26

(Same idea, next animation frame: only the top node's marginal gain δ_i is re-evaluated, and the list is re-sorted; S1 = {a}.)

SLIDE 27

(Next step of the same example: after picking b, S2 = {a, b}; the pruned list of marginal gains is re-sorted again.)

SLIDE 28

- CELF (using lazy evaluation) runs 700 times faster than the greedy hill-climbing algorithm

SLIDE 29

SLIDE 30

- Back to the solution quality!
- The (1 − 1/e) bound for submodular functions is the worst-case bound (worst over all possible inputs)
- Data-dependent bound:
  - The value of the bound depends on the input data
  - On "easy" data, hill-climbing may do better than 63%
  - Can we say something about the solution quality when we know the input data?

SLIDE 31

- Suppose S is some solution to max f(S) s.t. |S| ≤ k
  - f(S) is monotone & submodular
- Let OPT = {t_1, …, t_k} be the optimal solution
- For each u let δ(u) = f(S ∪ {u}) − f(S)
- Order the δ(u) so that δ(1) ≥ δ(2) ≥ ⋯
- Then: f(OPT) ≤ f(S) + Σ_{i=1}^{k} δ(i)
- Note:
  - This is a data-dependent bound (δ(i) depends on the input data)
  - The bound holds for any algorithm; it makes no assumption about how S was computed
  - For some inputs it can be very "loose" (worse than 63%)
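The bound is cheap to compute: evaluate the marginal gain of every remaining node with respect to S, and add the k largest gains to f(S). A sketch on toy coverage data (hypothetical); the brute-force OPT is only there to verify the bound on this tiny instance.

```python
from itertools import combinations

# Toy coverage objective (hypothetical data).
reach = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {4, 5}, 'd': {6}}

def f(S):
    return len(set().union(*(reach[u] for u in S)) if S else set())

k = 2
S = ['a', 'b']  # some solution, however it was computed

# Data-dependent bound: f(OPT) <= f(S) + sum of the k largest marginal gains.
gains = sorted((f(S + [u]) - f(S) for u in reach if u not in S), reverse=True)
bound = f(S) + sum(gains[:k])

# Brute-force OPT, feasible only on this tiny instance.
opt_value = max(f(list(c)) for c in combinations(reach, k))
print(f(S), bound, opt_value)  # the bound upper-bounds the optimum
```
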

SLIDE 32

- Claim:
  - For each u let δ(u) = f(S ∪ {u}) − f(S)
  - Order the δ(u) so that δ(1) ≥ δ(2) ≥ ⋯
  - Then: f(OPT) ≤ f(S) + Σ_{i=1}^{k} δ(i)
- Proof:
  - f(OPT) ≤ f(OPT ∪ S)
  - = f(S) + [f(OPT ∪ S) − f(S)]
  - ≤ f(S) + Σ_{i=1}^{k} [f(S ∪ {t_i}) − f(S)]   (by submodularity; we proved this last time)
  - = f(S) + Σ_{i=1}^{k} δ(t_i)
  - ≤ f(S) + Σ_{i=1}^{k} δ(i)
  ⇒ f(OPT) ≤ f(S) + Σ_{i=1}^{k} δ(i)
- In the last step, instead of taking t_i ∈ OPT (of benefit δ(t_i)), we take the best possible elements (of benefit δ(i)).

SLIDE 33

SLIDE 34

- Real metropolitan-area water network:
  - V = 21,000 nodes
  - E = 25,000 pipes
- Use a cluster of 50 machines for a month
- Simulate 3.6 million epidemic scenarios (random locations, random days, random times of the day)

SLIDE 35

The data-dependent bound is much tighter (it gives a more accurate estimate of algorithm performance).

[Figure: solution quality f(S) (higher is better) vs. number of sensors placed, comparing the "offline" (1 − 1/e) bound, the data-dependent bound, and hill-climbing]

SLIDE 36

- Placement heuristics perform much worse

Battle of Water Sensor Networks competition [w/ Ostfeld et al., J. of Water Resource Planning]:

Author: Score
CELF: 26
Sandia: 21
U Exeter: 20
Bentley Systems: 19
Technion (1): 14
Bordeaux: 12
U Cyprus: 11
U Guelph: 7
U Michigan: 4
Michigan Tech U: 3
Malcolm: 2
Proteo: 2
Technion (2): 1

SLIDE 37

- Different objective functions give different sensor placements

[Figure: placements optimized for population affected vs. detection likelihood]

SLIDE 38

Here CELF is many times faster than greedy hill-climbing! (But there might be datasets/inputs where CELF has the same running time as greedy hill-climbing.)

SLIDE 39

- I have 10 minutes. Which blogs should I read to be most up to date?
- Who are the most influential bloggers?

SLIDE 40

- Detect all stories, but late.
- Want to read things before others do.
- Detect the blue & yellow stories soon, but miss the red story.

SLIDE 41

- Crawled 45,000 blogs for 1 year
- Obtained 10 million posts
- Identified 350,000 cascades
- Cost of a blog is the number of posts it has

SLIDE 42

- The online (data-dependent) bound turns out to be much tighter!
  - Based on the plot: 87% instead of 32.5%

[Figure: old (1 − 1/e) bound vs. our data-dependent bound vs. CELF]

SLIDE 43

- Heuristics perform much worse!
- One really needs to perform the optimization

SLIDE 44

- CELF has 2 sub-algorithms. Which wins?
- Unit cost: CELF picks large popular blogs
- Cost-benefit: cost proportional to the number of posts
- We can do much better when considering costs

SLIDE 45

- Problem: then CELF picks lots of small blogs that participate in few cascades
- We pick the best solution that interpolates between the costs
- We can get good solutions with few blogs and few posts

[Figure: each curve represents a set of solutions S with the same final reward, e.g. f(S) = 0.2, 0.3, 0.4]

SLIDE 46

- We want to generalize well to future (unknown) cascades
- Limiting the selection to bigger blogs improves generalization!

SLIDE 47

- CELF runs 700 times faster than the simple hill-climbing algorithm

[Leskovec et al., KDD '07]