Submodular Optimization in Computational Sustainability: Andreas Krause, Master Class at CompSust 2012



slide-1
SLIDE 1

Submodular Optimization in Computational Sustainability

Andreas Krause

Master Class at CompSust 2012

slide-2
SLIDE 2

Combinatorial optimization in computational sustainability

Many applications in computational sustainability require solving large discrete optimization problems:

Given a finite set V, we wish to select a subset A (subject to some constraints) maximizing utility F(A)

These problems are the focus of this tutorial.

slide-3
SLIDE 3

Wind farm Deployment

[Changshui et al, Renewable Energy, 2011]

How should we deploy wind farms to maximize efficiency?

[Figure: wind farm with turbines labeled 1, 2, 3]

slide-4
SLIDE 4

Conservation Planning

[with Golovin, Converse, Gardner, Morey, AAAI ’11 Outstanding Paper]

Which patches of land should we recommend?

slide-5
SLIDE 5

Robotic monitoring of rivers and lakes

[with Singh, Guestrin, Kaiser, Journal of AI Research ’09]

Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …

Use robotic sensors to cover large areas (NIMS, Kaiser et al., UCLA)

Can only make a limited number of measurements!

Where should we sense to get the most accurate predictions?

[Figure: actual temperature vs. predicted temperature; axes: location across lake, depth]

slide-6
SLIDE 6

Monitoring water networks

[with Leskovec, Guestrin, VanBriesen, Faloutsos, J Wat Res Mgt ’08]

Contamination of drinking water could affect millions of people

Place sensors to detect contaminations

Where should we place sensors to quickly detect contamination?

[Figure: water network with a contamination event and sensors (~$14K)]

slide-7
SLIDE 7

Quantifying utility of sensor placements

Set V of all network junctions

Model predicts impact of contaminations (high-, medium-, and low-impact locations)

For each subset A of V compute sensing quality F(A)

Sensor reduces impact through early detection!

[Figure: two placements of sensors S1 to S4 on the same network. One has high sensing quality, F(A) = 0.9; the other has low sensing quality, F(A) = 0.01]

slide-8
SLIDE 8

Sensor placement

Given: finite set V of locations, sensing quality F

Want: $A^* \subseteq V$ such that $A^* = \arg\max_{|A| \le k} F(A)$

NP-hard!

Greedy algorithm:

Start with A = {}
For i = 1 to k
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}

How well can this simple heuristic do?
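The greedy heuristic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: the `COVERS` table and the `coverage` utility are hypothetical stand-ins for the sensing quality F, which in the water-network application comes from a simulator.

```python
def greedy(ground_set, F, k):
    """Pick up to k elements, each time adding the one that maximizes F(A U {s})."""
    A = []
    for _ in range(k):
        candidates = [s for s in ground_set if s not in A]
        if not candidates:
            break
        s_star = max(candidates, key=lambda s: F(frozenset(A) | {s}))
        A.append(s_star)
    return A


# Hypothetical toy instance: each sensor location covers a set of junctions.
COVERS = {
    "S1": {1, 2, 3},
    "S2": {3, 4},
    "S3": {4, 5, 6, 7},
    "S4": {1, 7},
}


def coverage(A):
    """Toy utility: number of junctions covered by at least one sensor in A."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)


placement = greedy(list(COVERS), coverage, k=2)
print(placement, coverage(placement))
```

Each iteration evaluates F once per remaining candidate, so k rounds on n locations cost roughly n·k evaluations of F.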

slide-9
SLIDE 9

Performance of greedy algorithm

Greedy score empirically close to optimal. Why?

[Figure: population protected (higher is better) vs. number of sensors placed, greedy vs. optimal, on a small subset of the water networks data]

slide-10
SLIDE 10

Key property 1: Monotonicity

Placement A = {S1, S2}

Placement B = {S1, S2, S3, S4}

F is monotonic: $A \subseteq B \Rightarrow F(A) \le F(B)$

Adding sensors can only help

slide-11
SLIDE 11

Key property 2: Diminishing returns

Placement A = {S1, S2}: adding new sensor S’ will help a lot! (large improvement)

Placement B = {S1, S2, S3, S4}: adding new sensor S’ doesn’t help much (small improvement)

Submodularity: for $A \subseteq B$ and $s' \notin B$,

$F(A \cup \{s'\}) - F(A) \ge F(B \cup \{s'\}) - F(B)$
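For small ground sets, monotonicity and diminishing returns can be checked by brute force. The sketch below assumes a hypothetical coverage utility (`COVERS`, `coverage`) and compares it with a function that is monotone but not submodular; it enumerates all subsets, so it is only meant for toy instances.

```python
from itertools import combinations

# Hypothetical toy instance: each sensor covers a set of junctions.
COVERS = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5, 6, 7}, "S4": {1, 7}}


def coverage(A):
    """Toy utility: number of junctions covered by the sensors in A."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(V, F):
    """Adding an element never decreases F."""
    return all(F(A | {s}) >= F(A) for A in subsets(V) for s in V)


def is_submodular(V, F):
    """For all A subset of B and s outside B, the gain at A is at least the gain at B."""
    for B in subsets(V):
        for A in subsets(B):
            for s in set(V) - B:
                if F(A | {s}) - F(A) < F(B | {s}) - F(B):
                    return False
    return True


V = list(COVERS)
print(is_monotone(V, coverage), is_submodular(V, coverage))
print(is_submodular(V, lambda A: len(A) ** 2))  # squared cardinality has increasing returns
```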

slide-12
SLIDE 12

One reason submodularity is useful

Theorem [Nemhauser et al ’78]: Suppose F is monotonic and submodular. Then the greedy algorithm gives a constant-factor approximation:

$F(A_{\text{greedy}}) \ge (1 - 1/e) \max_{|A| \le k} F(A)$, where $1 - 1/e \approx 63\%$

Greedy algorithm gives near-optimal solution!

In general, this guarantee is the best possible unless P = NP!
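The roughly 63% guarantee can be checked on a small instance where greedy is not optimal. The coverage instance below is a made-up example chosen so that greedy's first pick blocks the best pair; the brute-force search only considers sets of exactly k elements, which is enough here because the toy F is monotone.

```python
import math
from itertools import combinations

# Hypothetical instance: "a" looks best alone, but {"b", "c"} is the best pair.
COVERS = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}


def F(A):
    """Toy monotone submodular utility: number of covered items."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)


def greedy(V, F, k):
    """Repeatedly add the element that maximizes F(A U {s})."""
    A = []
    for _ in range(k):
        rest = [s for s in V if s not in A]
        A.append(max(rest, key=lambda s: F(frozenset(A) | {s})))
    return A


def brute_force(V, F, k):
    """Best subset of size k by exhaustive search (fine for tiny V only)."""
    return max((frozenset(c) for c in combinations(V, k)), key=F)


V = list(COVERS)
greedy_value = F(greedy(V, F, 2))
opt_value = F(brute_force(V, F, 2))
bound = (1 - 1 / math.e) * opt_value
print(greedy_value, opt_value, round(bound, 3))
```

On this instance greedy covers 5 items while the optimum covers 6, which is still well above the worst-case bound.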

slide-13
SLIDE 13

Battle of the Water Sensor Networks Competition

[with Leskovec, Guestrin, VanBriesen, Faloutsos, J Wat Res Mgt 2008]

Real metropolitan area network (12,527 nodes)

Water flow simulator provided by EPA

3.6 million contamination events

Multiple objectives: detection time, affected population, …

Place sensors that detect well “on average”

slide-14
SLIDE 14

Reward function is submodular

Claim: Reward function is monotonic submodular

Consider event i:

$R_i(u_k)$ = benefit from sensor $u_k$ in event i

$R_i(A) = \max_{u_k \in A} R_i(u_k)$

$R_i$ is submodular

Overall objective:

$F(A) = \sum_i \mathrm{Prob}(i)\, R_i(A)$

Submodular??

[Figure: event i with sensors u1 and u2 and their benefits R_i(u1), R_i(u2)]

slide-15
SLIDE 15

Closedness properties

$F_1, \ldots, F_m$ submodular functions on V and $\lambda_1, \ldots, \lambda_m \ge 0$

Then: $F(A) = \sum_i \lambda_i F_i(A)$ is submodular!

Submodularity is closed under nonnegative linear combinations! Extremely useful fact!!

$F_\theta(A)$ submodular $\Rightarrow \sum_\theta P(\theta)\, F_\theta(A)$ submodular!

Multicriterion optimization: $F_1, \ldots, F_m$ submodular, $\lambda_i > 0 \Rightarrow \sum_i \lambda_i F_i(A)$ submodular

slide-16
SLIDE 16

Reward function is submodular

Claim: Reward function is monotonic submodular

Consider event i:

$R_i(u_k)$ = benefit from sensor $u_k$ in event i

$R_i(A) = \max_{u_k \in A} R_i(u_k)$

$R_i$ is submodular

Overall objective:

$F(A) = \sum_i \mathrm{Prob}(i)\, R_i(A)$

⇒ F is submodular!

⇒ Can use the greedy algorithm to maximize F!

[Figure: event i with sensors u1 and u2 and their benefits R_i(u1), R_i(u2)]
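The reward function on this slide, a probability-weighted sum of per-event maxima, is easy to write down directly. The sketch below uses made-up event probabilities and benefits (`PROB`, `REWARD`), assumes benefits are nonnegative, and takes the maximum over an empty placement to be 0; none of these numbers come from the competition data.

```python
# Hypothetical event probabilities and per-event sensor benefits R_i(u).
PROB = {"e1": 0.5, "e2": 0.3, "e3": 0.2}
REWARD = {
    "e1": {"u1": 1.0, "u2": 0.4},
    "e2": {"u2": 0.9},
    "e3": {"u1": 0.2, "u3": 1.0},
}


def event_reward(i, A):
    """R_i(A): best benefit any sensor in A achieves for event i (0 if none)."""
    return max((REWARD[i].get(u, 0.0) for u in A), default=0.0)


def F(A):
    """Overall objective: expected reward over events."""
    return sum(p * event_reward(i, A) for i, p in PROB.items())


f_u1 = F({"u1"})
f_u1_u2 = F({"u1", "u2"})
gain_u2_alone = F({"u2"}) - F(set())
gain_u2_after_u1 = f_u1_u2 - f_u1
print(f_u1, f_u1_u2, gain_u2_alone, gain_u2_after_u1)
```

The last two numbers illustrate diminishing returns on this instance: sensor u2 is worth more on its own than after u1 has already been placed.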

slide-17
SLIDE 17

BWSN Competition results

13 participants

Performance measured in 30 different criteria

[Figure: bar chart of total score (higher is better) per entry, labeled by method. G: Genetic algorithm, H: Other heuristic, D: Domain knowledge, E: “Exact” method (MIP)]

24% better performance than runner-up!

slide-18
SLIDE 18

Simulated all 3.6M contaminations on 2 weeks / 40 processors

152 GB data on disk, 16 GB in main memory (compressed)

⇒ Very accurate computation of F(A)

Very slow evaluation of F(A)

30 hours/20 sensors

6 weeks for all 30 settings

[Figure: running time in minutes (lower is better) vs. number of sensors selected; exhaustive search (all subsets) vs. naive greedy]

What was the trick?

slide-19
SLIDE 19

“Lazy” greedy algorithm [Minoux ’78]

Lazy greedy algorithm:

  • First iteration as usual
  • Keep an ordered list of marginal benefits $\Delta_i$ from previous iteration
  • Re-evaluate $\Delta_i$ only for top element
  • If $\Delta_i$ stays on top, use it, otherwise re-sort

[Figure: elements a to e ordered by benefit $\Delta(s \mid A)$ and re-sorted after re-evaluation]

Note: Very easy to compute online bounds, use in other algorithms, etc. [Leskovec, Krause et al. ’07]
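The lazy evaluation idea on this slide can be sketched with a priority queue. This is an illustrative version, not the implementation from the cited papers: the heap stores stale marginal benefits, which by submodularity are upper bounds on the current ones, and an element is accepted only when its benefit was computed against the current set. The `COVERS` instance is hypothetical.

```python
import heapq


def lazy_greedy(ground_set, F, k):
    """Greedy selection that re-evaluates only the top of the priority queue."""
    A = []
    f_A = F(frozenset())
    n_evals = 0
    heap = []
    # First iteration as usual: evaluate every element once.
    for idx, s in enumerate(ground_set):
        gain = F(frozenset({s})) - f_A
        n_evals += 1
        # Entry: (negated gain, tie-break index, element, |A| when gain was computed)
        heap.append((-gain, idx, s, 0))
    heapq.heapify(heap)
    while heap and len(A) < k:
        neg_gain, idx, s, stamp = heapq.heappop(heap)
        if stamp == len(A):
            # Gain is up to date and still on top: take it.
            A.append(s)
            f_A -= neg_gain
        else:
            # Stale gain: re-evaluate against the current A and re-insert.
            gain = F(frozenset(A) | {s}) - f_A
            n_evals += 1
            heapq.heappush(heap, (-gain, idx, s, len(A)))
    return A, n_evals


# Hypothetical toy instance: each sensor covers a set of junctions.
COVERS = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5, 6, 7}, "S4": {1, 7}}


def coverage(A):
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)


selected, n_evals = lazy_greedy(list(COVERS), coverage, k=2)
print(selected, n_evals)
```

On this toy instance the lazy variant needs 5 marginal-benefit evaluations, where the plain greedy loop would need 4 + 3 = 7 for the same two picks.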

slide-20
SLIDE 20

Submodularity to the rescue: result of lazy evaluation

Simulated all 3.6M contaminations on 2 weeks / 40 processors

152 GB data on disk, 16 GB in main memory (compressed)

⇒ Very accurate computation of F(A)

Very slow evaluation of F(A)

30 hours/20 sensors

6 weeks for all 30 settings

Using “lazy evaluations”: 1 hour/20 sensors

Done after 2 days!

[Figure: running time in minutes (lower is better) vs. number of sensors selected; exhaustive search (all subsets), naive greedy, and fast greedy]

slide-21
SLIDE 21

Example: Windfarm deployment

[Changshui et al, Renewable Energy, 2011]

[Figure: wind direction and turbines 1, 2, 3]

Contribution of turbine 2 reduced due to wake effects

Total power F(A) is monotonic submodular!

slide-22
SLIDE 22

Example: Windfarm deployment

[Changshui et al, Renewable Energy, 2011]

              Greedy     LazyGreedy   Genetic Algo
Power (kW)    79,585     79,585       78,850
Runtime       2.5 min    10 sec       1.6 hours

slide-23
SLIDE 23

Other interesting directions

Many sensing problems involve maximization of monotonic submodular functions

Can use greedy algorithm to get near-optimal solutions!

Lazy evaluations provide dramatic speedup

How can we handle more complex settings:

Complex constraints / complex cost functions?

Sequential decisions?

slide-24
SLIDE 24

Non-constant cost functions

For each $s \in V$, let $c(s) > 0$ be its cost (e.g., conservation cost, hardware cost, …)

Cost of a set: $C(A) = \sum_{s \in A} c(s)$

Want to solve $A^* = \arg\max F(A)$ s.t. $C(A) \le B$

Cost-benefit greedy algorithm:

Start with A := {}
While there is an s ∈ V\A s.t. C(A ∪ {s}) ≤ B
    s* := argmax over such s of [F(A ∪ {s}) − F(A)] / c(s)
    A := A ∪ {s*}

slide-25
SLIDE 25

Performance of cost-benefit greedy

Set A    F(A)    C(A)
{a}      2ε      ε
{b}      1       1

Want $\max_A F(A)$ s.t. $C(A) \le 1$

Cost-benefit greedy picks a. Then cannot afford b!

⇒ Cost-benefit greedy performs arbitrarily badly!

slide-26
SLIDE 26

Cost-benefit optimization

[Wolsey ’82, Sviridenko ’04, Krause et al ’05]

Theorem [Krause and Guestrin ’05]

$A_{CB}$: cost-benefit greedy solution and $A_{UC}$: unit-cost greedy solution (i.e., ignore costs)

Then $\max\{F(A_{CB}), F(A_{UC})\} \ge (1 - 1/\sqrt{e})\,\mathrm{OPT}$, where $1 - 1/\sqrt{e} \approx 39\%$

Can still compute online bounds and speed up using lazy evaluations

Note: Can also get

(1-1/e) approximation in time $O(n^4)$ [Sviridenko ’04]
(1-1/e) approximation for multiple linear constraints [Kulik ’09]
0.38/k approximation for k matroid and m linear constraints [Chekuri et al ’11]
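The "run both greedy variants and keep the better one" idea from the theorem can be sketched as follows. The instance reproduces the two-item example from the cost-benefit slide with a small cost for item a; the value `EPS = 0.01`, the additive form of F, and the function names are assumptions made for illustration.

```python
def budgeted_greedy(V, F, cost, budget, use_cost):
    """Greedy under a budget; ranks by gain/cost if use_cost, else by gain alone."""
    A = []
    spent = 0.0
    while True:
        f_A = F(frozenset(A))
        affordable = [s for s in V if s not in A and spent + cost[s] <= budget]
        if not affordable:
            return A

        def score(s):
            gain = F(frozenset(A) | {s}) - f_A
            return gain / cost[s] if use_cost else gain

        s_star = max(affordable, key=score)
        A.append(s_star)
        spent += cost[s_star]


# Hypothetical numbers: item "a" is cheap with a tiny benefit, item "b" uses the whole budget.
EPS = 0.01
COST = {"a": EPS, "b": 1.0}
VALUE = {"a": 2 * EPS, "b": 1.0}


def F(A):
    """Additive toy utility (assumed; the slide only specifies F on {a} and {b})."""
    return sum(VALUE[s] for s in A)


V = list(COST)
a_cb = budgeted_greedy(V, F, COST, budget=1.0, use_cost=True)   # cost-benefit greedy
a_uc = budgeted_greedy(V, F, COST, budget=1.0, use_cost=False)  # unit-cost greedy
best = max([a_cb, a_uc], key=F)
print(a_cb, a_uc, best)
```

Cost-benefit greedy alone picks a and then cannot afford b, while the unit-cost run picks b; taking the better of the two avoids the bad case.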

slide-27
SLIDE 27

Application: Conservation Planning

[with Golovin, Converse, Gardner, Morey, AAAI ’11 Outstanding Paper]

How should we select land for conservation to protect rare & endangered species?

Case Study: Planned Reserve in Washington State

Species: Mazama pocket gopher, streaked horned lark, Taylor’s checkerspot

slide-28
SLIDE 28

Problem Ingredients

Land parcel details

Geography: roads, rivers, etc.

Model of species’ population dynamics: reproduction, colonization, predation, disease, famine, harsh weather, …

slide-29
SLIDE 29
Population Dynamics

Modeled using a Dynamic Bayesian Network

[Figure: dynamic Bayesian network over time t and time t+1. Nodes: environmental conditions $\eta_t$, $\eta_{t+1}$ (Markovian); our choices R (protected parcels); variables $Z^{(i)}_{1,t}, Z^{(i)}_{2,t}, \ldots, Z^{(i)}_{5,t}$ and $Z^{(i)}_{1,t+1}, Z^{(i)}_{2,t+1}, \ldots, Z^{(i)}_{5,t+1}$]


slide-31
SLIDE 31

Model Parameters

From the ecology literature, or elicited from panels of domain experts

[Figure: annual patch survival probability (0.0 to 1.0) vs. patch size in acres (100 to 500)]

slide-32
SLIDE 32
From Parcels to Patches

Most parcels are too small to sustain a gopher family

So we group parcels into larger patches.

We assume no colonization between patches, and model only colonization within patches.

We optimize over (sets of) patches.

[Figure: parcels grouped into Patch 1 and Patch 2]

slide-33
SLIDE 33

The Objective Function

Choose R to maximize species persistence

In practice, use sample average approximation

[Figure: dynamic Bayesian network as on the Population Dynamics slide, with selected patches R. Pr[alive after 50yrs] for three species: 0.8, 0.7, 0.5, so f(R) = 2.0 (expected # alive)]
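The objective on this slide can be written out as follows. This is a sketch under assumed notation: the species index i, the indicator notation, the sample count N, and the simulated trajectories $\omega_1, \ldots, \omega_N$ are not named in the original slides.

```latex
% Expected number of species alive after 50 years, by linearity of expectation:
f(R) = \mathbb{E}\Big[\sum_{i} \mathbf{1}\{\text{species } i \text{ alive after 50 yrs} \mid R\}\Big]
     = \sum_{i} \Pr[\text{species } i \text{ alive after 50 yrs} \mid R]

% Slide example with three species:
f(R) = 0.8 + 0.7 + 0.5 = 2.0

% Sample average approximation over N simulated trajectories \omega_1, \dots, \omega_N:
\hat{f}(R) = \frac{1}{N} \sum_{n=1}^{N} \sum_{i}
             \mathbf{1}\{\text{species } i \text{ alive after 50 yrs in } \omega_n \mid R\}
```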

slide-34
SLIDE 34

“Static” Conservation Planning

Select a reserve of maximum utility, subject to budget constraint

NP-hard

But f is submodular

⇒ Can find a near-optimal solution!

⇒ Even in an “incentive-compatible” manner [Singer ’10]

slide-35
SLIDE 35

Solution


slide-36
SLIDE 36

Results: “Static” Planning

Can get large gain through optimization

[Figure: expected number of surviving species vs. budget (km2); curves: optimized, random, by area]

slide-37
SLIDE 37

Other interesting directions

Many sensing problems involve maximization of monotonic submodular functions

Can use greedy algorithm to get near-optimal solutions!

Lazy evaluations provide dramatic speedup

How can we handle more complex settings:

Complex constraints / complex cost functions?

Sequential decisions?

slide-38
SLIDE 38

Dynamic Conservation Planning

Build up reserve over time

At each time step t, the budget $B_t$ and the set $V_t$ of available parcels may change

May learn from information we gain after selecting patches

[Figure: reserve at time t and at time t+1]

slide-39
SLIDE 39

Benefit of adaptivity

“A priori” decisions:

Find near-optimal set A of sensor locations

Must commit to all actions in advance (no “observations”)

Sequential decisions:

Want near-optimal policy π for allocating resources based on observations

Is there a notion of submodularity for policies??

[Figure: sensor placement at time t and at time t+1]
slide-40
SLIDE 40

Problem Statement

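Given:

Items (patches, tests, …) V = {1, …, n}

Associated with random variables $X_1, \ldots, X_n$ taking values in O

Objective: $f : 2^V \times O^V \to \mathbb{R}$

Policy π maps observation $x_A$ to next item

Value of policy π: $F(\pi) = \sum_{x_V} P(x_V)\, f(\pi(x_V), x_V)$, where $\pi(x_V)$ is the set of patches picked by π if the world is in state $x_V$

Want $\pi^* \in \arg\max_{|\pi| \le k} F(\pi)$

NP-hard (also hard to approximate!)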

slide-41
SLIDE 41

Adaptive greedy algorithm

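Suppose we’ve seen $X_A = x_A$.

Conditional expected benefit of adding item s:

$\Delta(s \mid x_A) = \mathbb{E}\big[ f(A \cup \{s\}, x_V) - f(A, x_V) \mid x_A \big]$

(benefit if the world is in state $x_V$, conditional on observations $x_A$)

Adaptive greedy algorithm:

Start with A = {}
For i = 1:k
    Pick $s^* \in \arg\max_s \Delta(s \mid x_A)$
    Observe $X_{s^*} = x_{s^*}$
    Set A := A ∪ {s*}

When does this adaptive greedy algorithm work??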
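A minimal sketch of the adaptive greedy loop, assuming a toy model that is not from the slides: each sensor independently works or fails with a known probability (`P_WORK`), a working sensor covers its region (`COVERS`), and the conditional expected benefit of an item is computed by enumerating all world states consistent with the observations so far.

```python
from itertools import product

# Hypothetical model: probability each sensor works, and what it covers if it does.
P_WORK = {"S1": 0.9, "S2": 0.9, "S3": 0.5}
COVERS = {"S1": {1, 2, 3, 4}, "S2": {3, 4, 5}, "S3": {5, 6, 7, 8}}
ITEMS = list(P_WORK)


def all_worlds():
    """Enumerate (probability, state) pairs; state maps item -> 1 (works) or 0 (fails)."""
    worlds = []
    for outcome in product([0, 1], repeat=len(ITEMS)):
        state = dict(zip(ITEMS, outcome))
        p = 1.0
        for s, o in state.items():
            p *= P_WORK[s] if o == 1 else 1 - P_WORK[s]
        worlds.append((p, state))
    return worlds


def f(A, state):
    """Utility of picking A in a given world: area covered by working sensors."""
    covered = set()
    for s in A:
        if state[s] == 1:
            covered |= COVERS[s]
    return len(covered)


def expected_gain(s, observed, worlds):
    """Conditional expected benefit of adding s, given the observations so far."""
    A = frozenset(observed)
    consistent = [(p, w) for p, w in worlds
                  if all(w[i] == o for i, o in observed.items())]
    total = sum(p for p, _ in consistent)
    return sum(p * (f(A | {s}, w) - f(A, w)) for p, w in consistent) / total


def adaptive_greedy(k, true_state):
    """Pick k items one at a time, observing each outcome before the next pick."""
    worlds = all_worlds()
    observed = {}
    for _ in range(k):
        rest = [s for s in ITEMS if s not in observed]
        s_star = max(rest, key=lambda s: expected_gain(s, observed, worlds))
        observed[s_star] = true_state[s_star]  # observe the outcome of the chosen item
    return list(observed)


picks_if_s1_works = adaptive_greedy(2, {"S1": 1, "S2": 1, "S3": 1})
picks_if_s1_fails = adaptive_greedy(2, {"S1": 0, "S2": 1, "S3": 1})
print(picks_if_s1_works, picks_if_s1_fails)
```

The second pick depends on what was observed for S1: if S1 works, S2 adds little and S3 is chosen; if S1 fails, S2 becomes the better choice. Enumerating all world states is exponential in the number of items, so this only illustrates the definition.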
slide-42
SLIDE 42

Adaptive submodularity

[Golovin & Krause, JAIR 2011]

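Adaptive monotonicity: $\Delta(s \mid x_A) \ge 0$, where $\Delta(s \mid x_A)$ is the conditional expected benefit of adding item s after observing $x_A$

Adaptive submodularity: $\Delta(s \mid x_A) \ge \Delta(s \mid x_B)$ whenever $x_A \preceq x_B$ ($x_B$ observes more than $x_A$)

Theorem: If f is adaptive submodular and adaptive monotone w.r.t. the distribution P, then

$F(\pi_{\text{greedy}}) \ge (1 - 1/e)\, F(\pi_{\text{opt}})$

Many other results about submodular set functions can also be “lifted” to the adaptive setting!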

slide-43
SLIDE 43

Dynamic Conservation Planning

Build up reserve over time

At each time step t, the budget $B_t$ and the set $V_t$ of available parcels may change

May learn from information we gain after selecting patches

⇒ f is adaptive submodular in this setting!

[Figure: reserve at time t and at time t+1]

slide-44
SLIDE 44

Opportunistic Allocation for Dynamic Conservation

In each time step:

Available parcels and budget appear

Opportunistically choose near-optimal allocation

Theorem: We get at least 38.7% of the value of the best clairvoyant algorithm*

* Even under adversarial selection of available parcels & budgets!

[Figure: allocations at time t=1 and time t=2]

slide-45
SLIDE 45

Results

Adaptive optimization outperforms existing approaches

[Figure: expected number of persistent species vs. time; legend: adaptive optimization, adaptive by area, random, a priori optimization]
slide-46
SLIDE 46

Decision-Support Tool

[with Bogunovic, Converse]

Near real-time, interactive solver, see talk tomorrow!

slide-47
SLIDE 47

Related Work

Existing software

Marxan [Ball, Possingham & Watts ’09]

Zonation [Moilanen and Kujala ’08]

General purpose software

No population dynamics modeling, no guarantees

Sheldon et al. ’10

Models non-submodular population dynamics

Only considers static problem

Relies on mixed integer programming

slide-48
SLIDE 48

Other applications of Adaptive Submodularity

Stochastic set cover

Active learning

Bayesian experimental design / value of information

Influence maximization in social networks

...

Submodular surrogates?

slide-49
SLIDE 49

Submodularity in ML / AI

Fast inference for high-order submodular MAP problems [NIPS ’10]

Submodular dictionary selection for sparse representation [ICML ’10]

Submodular compressive sensing [AISTATS ’12]

First regret bounds for GP optimization [NIPS ’11]

MATLAB Toolbox for optimizing submodular functions (JMLR ’10)

Series of NIPS Workshops on Discrete Optimization in ML, videos on videolectures.net

slide-50
SLIDE 50

Conclusions

Many applications in computational sustainability need large-scale discrete optimization under uncertainty

Fortunately, some of those have structure: submodularity

Submodularity can be exploited to develop efficient, scalable algorithms with strong guarantees

Can handle complex constraints

Adaptive submodularity makes it possible to address sequential decision problems

Thanks: