SLIDE 1 Topics in Computational Sustainability
CS 325
Spring 2016
Making Choices: Sequential Decision Making
Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
SLIDE 2 Stochastic programming
[Figure: decision tree with a decision node choosing among actions a, b, c, feeding a probabilistic model with outcomes wet/dry. A stochastic program {a(pa), b(pb), c(pc)} maximizes expected utility / minimizes expected cost.]
SLIDE 3 Problem Setup
Given limited budget, what parcels should I conserve to maximize the expected number of occupied territories in 50 years?
! "# $%# "
Conserved parcels Available parcels Current territories Potential territories
SLIDE 4 Metapopulation = Cascade
- Metapopulation model can be viewed as a cascade in the layered graph representing territories over time
- Target nodes: territories at the final time step
[Figure: patches i, j, k, l, m repeated across the layers of the time-layered graph.]
SLIDE 5 Management Actions
- Conserving parcels adds nodes to the network to create new pathways for the cascade
[Figure: initial network, with candidate Parcel 1 and Parcel 2 that can be added.]
SLIDE 8 Cascade Optimization Problem
Given:
– Initially occupied territories
– Colonization and extinction probabilities
– Already-conserved parcels
– List of available parcels and their costs
Find a set of parcels with total cost at most B that maximizes the expected number of occupied territories at time T.
Can we make our decision adaptively?
SLIDE 9
Sequential decision making
- We have a system that changes state over time
- We can (partially) control the system's state transitions by taking actions
- The problem gives an objective that specifies which states (or state sequences) are more/less preferred
- Problem: at each time step, select an action to optimize the overall (long-term) objective
– Produce the most preferred sequences of "states"
SLIDE 10 Discounted Rewards/Costs
An assistant professor gets paid, say, 20K per year. How much, in total, will the A.P. earn in their life?
20 + 20 + 20 + 20 + 20 + … = Infinity
What's wrong with this argument?
SLIDE 11 Discounted Rewards
“A reward (payment) in the future is not worth quite as much as a reward now.”
– Because of chance of obliteration – Because of inflation
Example:
Being promised $10,000 next year is worth only 90% as much as receiving $10,000 right now.
Assuming a payment n years in the future is worth only (0.9)ⁿ of a payment now, what is the A.P.'s future discounted sum of rewards?
SLIDE 12
Infinite Sum
Assuming a discount rate of 0.9, how much does the assistant professor get in total?
x = 20 + 0.9·20 + 0.9²·20 + 0.9³·20 + …
  = 20 + 0.9 (20 + 0.9·20 + 0.9²·20 + …)
x = 20 + 0.9x
x = 20/0.1 = 200
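A quick numeric check of that closed form (a minimal sketch; only the 20 and the 0.9 come from the slide):

```python
# Sum the discounted reward stream 20 + 0.9*20 + 0.9^2*20 + ...
reward, gamma = 20.0, 0.9

total = 0.0
for n in range(200):          # 200 terms is far past convergence here
    total += (gamma ** n) * reward

print(total)                  # ~= 200.0, i.e. 20 / (1 - 0.9)
```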
SLIDE 13
Discount Factors
People in economics and probabilistic decision-making do this all the time. The "discounted sum of future rewards" using discount factor γ is:
(reward now)
+ γ (reward in 1 time step)
+ γ² (reward in 2 time steps)
+ γ³ (reward in 3 time steps)
+ …
(an infinite sum)
SLIDE 14 Markov System: the Academic Life
Define:
JA = Expected discounted future rewards starting in state A
JB = Expected discounted future rewards starting in state B
JT = Expected discounted future rewards starting in state T
JS = Expected discounted future rewards starting in state S
JD = Expected discounted future rewards starting in state D
How do we compute JA, JB, JT, JS, JD?
[Figure: Markov chain with states A (Assistant Prof, reward 20), B (Assoc. Prof, reward 60), T (Tenured Prof, reward 400), S (On the Street, reward 10), and D (Dead); edges carry transition probabilities 0.2, 0.3, 0.6, and 0.7.]
SLIDE 15 Working Backwards
[Figure: the chain unrolled for working backwards, with rewards Asst. Prof 20, Assoc. Prof 60, Tenured Prof 100, On the Street 10, Dead absorbing with probability 1.0, and discount factor 0.9. Working backwards gives the values 151 (Asst. Prof), 247 (Assoc. Prof), 270 (Tenured Prof), and 27 (Street).]
SLIDE 16 Reincarnation?
[Figure: same chain, except Dead now returns to Asst. Prof with probability 0.5 instead of absorbing (reincarnation); discount factor 0.9.]
SLIDE 17
System of Equations
L(A) = 20 + .9 (.6 L(A) + .2 L(B) + .2 L(S))
L(B) = 60 + .9 (.6 L(B) + .2 L(S) + .2 L(T))
L(S) = 10 + .9 (.7 L(S) + .3 L(D))
L(T) = 100 + .9 (.7 L(T) + .3 L(D))
L(D) = 0 + .9 (.5 L(D) + .5 L(A))
SLIDE 18 Solving a Markov System with Matrix Inversion
- Upside: You get an exact answer
- Downside: If you have 100,000 states, you're solving a 100,000 × 100,000 system
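Concretely, the slide-17 system is J = r + γPJ, i.e. (I − γP)J = r. A minimal sketch of the matrix solution (state ordering and variable names are mine; the rewards and probabilities are the ones in the equations above):

```python
import numpy as np

gamma = 0.9
# State order A, B, S, T, D; rewards and transition rows copied from
# the system of equations on slide 17.
r = np.array([20.0, 60.0, 10.0, 100.0, 0.0])
P = np.array([
    [0.6, 0.2, 0.2, 0.0, 0.0],   # A
    [0.0, 0.6, 0.2, 0.2, 0.0],   # B
    [0.0, 0.0, 0.7, 0.0, 0.3],   # S
    [0.0, 0.0, 0.0, 0.7, 0.3],   # T
    [0.5, 0.0, 0.0, 0.0, 0.5],   # D (reincarnation variant)
])

# (I - gamma * P) J = r
J = np.linalg.solve(np.eye(5) - gamma * P, r)
for name, value in zip("ABSTD", J):
    print(f"L({name}) = {value:.1f}")
```

np.linalg.solve factors the system rather than forming an explicit inverse, but the cost is still roughly O(N³), which is exactly the downside noted above.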
SLIDE 20 Value Iteration: another way to solve a Markov System
Define:
J1(Si) = Expected discounted sum of rewards over the next 1 time step
J2(Si) = Expected discounted sum of rewards during the next 2 steps
J3(Si) = Expected discounted sum of rewards during the next 3 steps
:
Jk(Si) = Expected discounted sum of rewards during the next k steps
Then, with N = number of states:
$J_1(S_i) = r_i$
$J_2(S_i) = r_i + \gamma \sum_{j=1}^{N} p_{ij} J_1(S_j)$
:
$J_{k+1}(S_i) = r_i + \gamma \sum_{j=1}^{N} p_{ij} J_k(S_j)$
SLIDE 21 Let’s do Value Iteration
k | Jk(SUN) | Jk(WIND) | Jk(HAIL)
1 |         |          |
2 |         |          |
3 |         |          |
4 |         |          |
5 |         |          |
[Figure: three-state Markov system. SUN has reward +4, WIND reward 0, HAIL reward −8. SUN stays or moves to WIND with probability 1/2 each; WIND moves to SUN or HAIL with probability 1/2 each; HAIL stays or moves to WIND with probability 1/2 each.]
γ = 0.5
SLIDE 22 Let’s do Value Iteration
k | Jk(SUN) | Jk(WIND) | Jk(HAIL)
1 | 4       | 0        | −8
2 | 5       | −1       | −10
3 | 5       | −1.25    | −10.75
4 | 4.94    | −1.44    | −11
5 | 4.88    | −1.52    | −11.11
[Figure: same SUN/WIND/HAIL system as the previous slide.]
γ = 0.5
SLIDE 23 Value Iteration for solving Markov Systems
- Compute J1(Si) for each i
- Compute J2(Si) for each i
:
- Compute Jk(Si) for each i
As k→∞, Jk(Si) → J*(Si).
When to stop? When $\max_i \left| J_{k+1}(S_i) - J_k(S_i) \right| < \xi$
This is faster than matrix inversion (which is O(N³)) if the transition matrix is sparse.
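A runnable sketch of this loop on the SUN/WIND/HAIL example (the stopping threshold is illustrative; the rewards and transitions are the ones read off the figure):

```python
import numpy as np

gamma = 0.5
# SUN, WIND, HAIL rewards and the 1/2-1/2 transitions from the figure
r = np.array([4.0, 0.0, -8.0])
P = np.array([
    [0.5, 0.5, 0.0],   # SUN:  stay, or drift to WIND
    [0.5, 0.0, 0.5],   # WIND: clear to SUN, or worsen to HAIL
    [0.0, 0.5, 0.5],   # HAIL: ease to WIND, or stay
])

J = r.copy()                                 # J1(Si) = ri
for k in range(2, 1000):
    J_next = r + gamma * P @ J               # Jk+1 = r + gamma * P Jk
    if np.max(np.abs(J_next - J)) < 1e-6:    # the stopping rule above
        break
    J = J_next

print(k, J)   # approaches J* = (4.8, -1.6, -11.2)
```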
What if we have a way to interact with the Markov system?
SLIDE 24 A Markov Decision Process
γ = 0.9
You run a startup company. In every state you must choose between saving money (S) and advertising (A).
[Figure: four-state MDP with states Poor & Unknown (+0), Poor & Famous (+0), Rich & Unknown (+10), and Rich & Famous (+10); edges are labeled S or A and carry probabilities 1 or 1/2.]
SLIDE 25 Markov Decision Processes
An MDP has…
- A set of states {S1 ··· SN}
- A set of actions {a1 ··· aM}
- A set of rewards {r1 ··· rN} (one for each state)
- A transition probability function: $P^{k}_{ij} = \mathrm{Prob}(\text{Next} = S_j \mid \text{This} = S_i \text{ and I use action } a_k)$
On each step:
- 0. Call the current state Si
- 1. Receive reward ri
- 2. Choose an action from {a1 ··· aM}
- 3. If you choose action ak, you'll move to state Sj with probability $P^{k}_{ij}$
- 4. All future rewards are discounted by γ
What’s a solution to an MDP? A sequence of actions?
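Before answering, it helps to pin the model down as data. A hypothetical encoding of the startup MDP (the transition probabilities are reconstructed so that value iteration reproduces the table on slide 32; the Rich & Famous advertise row in particular is an assumption, as the figure leaves it ambiguous):

```python
# rewards[state] = per-state reward ri
rewards = {"PU": 0.0, "PF": 0.0, "RU": 10.0, "RF": 10.0}

# transitions[state][action] = list of (probability, next_state)
transitions = {
    "PU": {"S": [(1.0, "PU")],
           "A": [(0.5, "PU"), (0.5, "PF")]},
    "PF": {"S": [(0.5, "PU"), (0.5, "RF")],
           "A": [(1.0, "PF")]},
    "RU": {"S": [(0.5, "PU"), (0.5, "RU")],
           "A": [(0.5, "PU"), (0.5, "PF")]},
    "RF": {"S": [(0.5, "RU"), (0.5, "RF")],
           "A": [(0.5, "PU"), (0.5, "PF")]},   # assumed advertise row
}
gamma = 0.9
```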
SLIDE 26 A Policy
A policy is a mapping from states to actions. Examples:

Policy Number 1:
STATE → ACTION
PU → S
PF → A
RU → S
RF → A

Policy Number 2:
STATE → ACTION
PU → A
PF → A
RU → A
RF → A

- How many possible policies are there in our example?
- Which of the above two policies is best?
- How do you compute the optimal policy?

[Figure: both policies drawn on the four-state MDP (PU, PF, RU +10, RF +10), with the chosen action and transition probabilities on each edge.]
SLIDE 27
Interesting Fact
For every MDP there exists an optimal policy. It's a policy such that for every possible start state there is no better option than to follow the policy.
SLIDE 28
Computing the Optimal Policy
Idea One: Run through all possible policies. Select the best. What's the problem?
SLIDE 29 Optimal Value Function
Define J*(Si) = Expected Discounted Future Rewards, starting from state Si, assuming we use the optimal policy
[Figure: three-state MDP with states S1 (reward +0), S2 (reward +3), S3 (reward +2) and an action B; edge probabilities include 1/2, 1/3, and 1.]
Question: What is an optimal policy for this MDP (assume γ = 0.9)?
What is J*(S1)? What is J*(S2)? What is J*(S3)?
SLIDE 30
Computing the Optimal Value Function with Value Iteration
Define Jk(Si) = Maximum possible expected sum of discounted rewards I can get if I start at state Si and I live for k time steps. Note that J1(Si) = ri
SLIDE 31
Let’s compute Jk(Si) for our example
k | Jk(PU) | Jk(PF) | Jk(RU) | Jk(RF)
1 |        |        |        |
2 |        |        |        |
3 |        |        |        |
4 |        |        |        |
5 |        |        |        |
6 |        |        |        |
SLIDE 32
k | Jk(PU) | Jk(PF) | Jk(RU) | Jk(RF)
1 | 0      | 0      | 10     | 10
2 | 0      | 4.5    | 14.5   | 19
3 | 2.03   | 8.55   | 16.52  | 25.08
SLIDE 33 Bellman’s Equation
$J_{n+1}(S_i) = \max_a \left[ r_i + \gamma \sum_{j=1}^{N} P^{a}_{ij} J_n(S_j) \right]$

Value Iteration for solving MDPs:
- Compute J1(Si) for all i
- Compute J2(Si) for all i
- :
- Compute Jn(Si) for all i
…until converged: $\max_i \left| J_{n+1}(S_i) - J_n(S_i) \right| < \xi$
…Also known as Dynamic Programming
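A runnable sketch of this Bellman backup on the startup MDP (same reconstructed transitions as the earlier encoding, now as one matrix per action; the RF advertise row remains an assumption, which does not change the maxima since saving dominates in that state). Up to rounding, the printed iterates match the table on slide 32:

```python
import numpy as np

gamma = 0.9
states = ["PU", "PF", "RU", "RF"]
r = np.array([0.0, 0.0, 10.0, 10.0])

# One transition matrix per action: rows = current state, cols = next state.
P_save = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0, 0.5],
                   [0.5, 0.0, 0.5, 0.0],
                   [0.0, 0.0, 0.5, 0.5]])
P_adv  = np.array([[0.5, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.5, 0.5, 0.0, 0.0],
                   [0.5, 0.5, 0.0, 0.0]])   # RF row assumed

J = r.copy()                                 # J1(Si) = ri
print(1, J)
for n in range(2, 100):
    # Bellman backup: Jn(Si) = max_a [ ri + gamma * sum_j P^a_ij Jn-1(Sj) ]
    J_next = np.maximum(r + gamma * P_save @ J, r + gamma * P_adv @ J)
    if np.max(np.abs(J_next - J)) < 1e-4:    # converged
        break
    J = J_next
    print(n, np.round(J, 2))
```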