
Markov Decision Processes

CSE 573

Logistics

  • Reading: AIMA Ch 21 (Reinforcement Learning)
  • Project 1 due today
    • 2 printouts of report
    • Email Miao with source code and the document in .doc or .pdf
  • Project 2 description on web
  • New teams
    • By Monday 11/15 - email Miao w/ team + direction
    • Feel free to consider other ideas

Idea 1: Spam Filter

  • Decision Tree Learner?
  • Ensemble of… ?
  • Naïve Bayes?
  • Bag of Words representation enhancement?
  • Augment data set?

Idea 2: Localization

  • Placelab data
  • Learn “places” (K-means clustering)
  • Predict movements between places (Markov model, or ….)
  • ???????

Proto-idea 3: Captchas

  • The problem of software robots
  • Turing test is big business
  • Break or create
    • Non-vision based?

Proto-idea 4: Openmind.org

  • Repository of Knowledge in NLP
  • What the heck can we do with it????

Openmind Animals

Proto-idea 4: Wordnet

www.cogsci.princeton.edu/~wn/

  • Giant graph of concepts
    • Centrally controlled semantics
  • What to do?
    • Integrate with FAQ lists, Openmind, ???

573 Topics

  • Agency
  • Problem Spaces
  • Search
  • Knowledge Representation & Inference
  • Planning
  • Supervised Learning
  • Logic-Based
  • Probabilistic
  • Reinforcement Learning

Where are We?

  • Uncertainty
  • Bayesian Networks
  • Sequential Stochastic Processes
    • (Hidden) Markov Models
    • Dynamic Bayesian Networks (DBNs)
    • Probabilistic STRIPS Representation
  • Markov Decision Processes (MDPs)
  • Reinforcement Learning

An Example Bayes Net

[Figure: Bayes net with nodes Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]

Pr(B=t) = 0.05, Pr(B=f) = 0.95

Pr(A|E,B):  e,b: 0.9 (0.1)   e,¬b: 0.2 (0.8)   ¬e,b: 0.85 (0.15)   ¬e,¬b: 0.01 (0.99)

Planning

[Figure: agent-environment loop (Percepts in, Actions out, "What action next?"); environment properties highlighted: Static, Fully Observable, Stochastic, Instantaneous, Full, Perfect. Planning under uncertainty.]


Models of Planning

                         Uncertainty
Observation     Deterministic   Disjunctive   Probabilistic
Complete        Classical       Contingent    MDP
Partial         ???             Contingent    POMDP
None            ???             Conformant    POMDP

Recap: Markov Models

  • Q: set of states
  • π: initial probability distribution
  • A: transition probability distribution (ONE per ACTION)
  • Markov assumption
  • Stationary model assumption

A Factored domain

  • Variables: has_user_coffee (huc), has_robot_coffee (hrc), robot_is_wet (w), has_robot_umbrella (u), raining (r), robot_in_office (o)
  • Actions: buy_coffee, deliver_coffee, get_umbrella, move

What is the number of states? Can we succinctly represent transition probabilities in this case?
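(With six Boolean variables there are 2^6 = 64 states, so a flat transition table for a single action needs 64 × 64 = 4096 entries; the DBN slides below show how to do better.)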

Probabilistic “STRIPS”?

[Figure: probabilistic “STRIPS” operator Move: office → cafe; the outcome branches on Raining and hasUmbrella, each branch deleting inOffice, with some branches also adding Wet (P < .1 in one branch)]

Dynamic Bayesian Nets

[Figure: two-slice DBN over huc, hrc, w, u, r, o; the per-variable CPT sizes are 2, 2, 4, 16, 4, 8]

Total values required to represent the transition probability table = 36, vs. 4096 for a flat table.
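A quick back-of-the-envelope check of the 36-vs-4096 comparison, as a minimal Python sketch (the per-variable CPT sizes are taken from the slide; how they split across variables depends on the parent sets, which are not spelled out here):

    # Count transition-table entries for the 6-variable coffee-robot domain.
    num_vars = 6                          # huc, hrc, w, u, r, o
    num_states = 2 ** num_vars            # 64 states
    flat_table = num_states * num_states  # one entry per (state, next state): 4096

    # In the DBN, each next-slice variable has its own small CPT whose size is
    # 2 ** (number of parents); the sizes listed on the slide sum to 36.
    cpt_sizes = [2, 2, 4, 16, 4, 8]
    dbn_table = sum(cpt_sizes)            # 36

    print(num_states, flat_table, dbn_table)   # 64 4096 36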

Dynamic Bayesian Net for Move

[Figure: two-slice DBN for the move action, with huc, hrc, w, u, r at time T and huc’, hrc’, w’, u’, r’ at time T+1]

Pr(r’|r):  r: 0.95   ¬r: 0.5

Pr(w’|u,w):  u,w: 1.0 (0)   u,¬w: 0.1 (0.9)   ¬u,w: 1.0 (0)   ¬u,¬w: 1.0 (0)

Actually table should have 16 entries!


Actions in DBN

[Figure: DBN with an explicit action node a between time slices T and T+1, over huc, hrc, w, u, r]

Last Time: Actions in DBN, Unrolling. Don’t need them Today.

Observability

  • Full Observability
  • Partial Observability
  • No Observability

Reward/cost

  • Each action has an associated cost.
  • Agent may accrue rewards at different stages. A reward may depend on
    • The current state
    • The (current state, action) pair
    • The (current state, action, next state) triplet
  • Additivity assumption: costs and rewards are additive.
  • Reward accumulated = R(s0) + R(s1) + R(s2) + …

Horizon

  • Finite: plan till t stages.
    Reward = R(s0) + R(s1) + R(s2) + … + R(st)
  • Infinite: the agent never dies.
    The reward R(s0) + R(s1) + R(s2) + … could be unbounded.
    • Discounted reward: R(s0) + γR(s1) + γ²R(s2) + …
    • Average reward: lim n→∞ (1/n) [Σi R(si)]
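(Why discounting bounds the sum: if every reward satisfies |R(s)| ≤ Rmax and 0 ≤ γ < 1, then R(s0) + γR(s1) + γ²R(s2) + … ≤ Rmax(1 + γ + γ² + …) = Rmax/(1−γ).)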


Goal for an MDP

  • Find a policy which:
    • maximizes expected discounted reward
    • over an infinite horizon
    • for a fully observable Markov decision process.

Why shouldn’t the planner find a plan?? What is a policy??

Optimal value of a state

  • Define V*(s), the `value of a state’, as the maximum expected discounted reward achievable from this state.
  • Value of the state if we force it to do action “a” right now, but let it act optimally later:
    Q*(a,s) = R(s) + c(a) + γ Σs’∈S Pr(s’|a,s) V*(s’)
  • V* should satisfy the following equation:
    V*(s) = maxa∈A {Q*(a,s)} = R(s) + maxa∈A {c(a) + γ Σs’∈S Pr(s’|a,s) V*(s’)}


Value iteration

  • Start with an arbitrary assignment of values to each state (or use an admissible heuristic).
  • Iterate over the set of states and in each iteration improve the value function as follows:
    Vt+1(s) = R(s) + maxa∈A {c(a) + γ Σs’∈S Pr(s’|a,s) Vt(s’)}     (`Bellman Backup’)
  • Stop the iteration appropriately. Vt approaches V* as t increases.

Bellman Backup

[Figure: backup at state s; each action a1, a2, a3 combines its successor values Vn into Qn+1(s,a), and Vn+1(s) is the max over the actions]

Stopping Condition

  • ε-convergence: a value function is ε-optimal if the error (residue) at every state is less than ε.
    Residue(s) = |Vt+1(s) − Vt(s)|
    Stop when maxs∈S Residue(s) < ε (a runnable sketch follows).
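A minimal value-iteration sketch of the backup and stopping rule above, on a tiny hypothetical two-state MDP (the states, rewards, costs, transition probabilities, and γ below are illustrative assumptions, not from the slides):

    GAMMA = 0.9       # discount factor (illustrative)
    EPSILON = 1e-4    # convergence threshold

    states = ["office", "cafe"]
    actions = ["move", "stay"]
    R = {"office": 0.0, "cafe": 1.0}          # reward R(s)
    c = {"move": -0.1, "stay": 0.0}           # action cost c(a)
    # Pr[s][a] = list of (probability, next state)
    Pr = {
        "office": {"move": [(0.9, "cafe"), (0.1, "office")],
                   "stay": [(1.0, "office")]},
        "cafe":   {"move": [(0.9, "office"), (0.1, "cafe")],
                   "stay": [(1.0, "cafe")]},
    }

    V = {s: 0.0 for s in states}              # arbitrary initial values
    while True:
        new_V, residue = {}, 0.0
        for s in states:
            # Bellman backup: V_{t+1}(s) = R(s) + max_a {c(a) + gamma * sum_s' Pr(s'|a,s) V_t(s')}
            q = [c[a] + GAMMA * sum(p * V[s2] for p, s2 in Pr[s][a]) for a in actions]
            new_V[s] = R[s] + max(q)
            residue = max(residue, abs(new_V[s] - V[s]))
        V = new_V
        if residue < EPSILON:                 # stop when max_s Residue(s) < epsilon
            break

    print(V)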

Complexity of value iteration

  • One iteration takes O(|S|²|A|) time.
  • Number of iterations required: poly(|S|, |A|, 1/(1−γ))
  • Overall, the algorithm is polynomial in the size of the state space!
  • Thus exponential in the number of state variables.

Computation of optimal policy

  • Given the value function V*(s), do a Bellman backup at each state; the action which maximises the inner product term is the optimal action (see the sketch below).
  • Optimal policy is stationary (time-independent) – intuitive for the infinite-horizon case.
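A small hypothetical helper showing that extraction step; it assumes the same MDP structures (actions, c, Pr, γ) as the value-iteration sketch above and a converged value function V:

    def greedy_policy(V, states, actions, c, Pr, gamma):
        """Recover the optimal stationary policy from V* by one backup per state."""
        policy = {}
        for s in states:
            # the action with the largest Q-value; R(s) is common to all actions,
            # so it does not affect the argmax
            policy[s] = max(actions,
                            key=lambda a: c[a] + gamma * sum(p * V[s2] for p, s2 in Pr[s][a]))
        return policy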

Policy evaluation

  • Given a policy Π : S → A, find the value of each state under this policy.
  • VΠ(s) = R(s) + c(Π(s)) + γ Σs’∈S Pr(s’|Π(s),s) VΠ(s’)
  • This is a system of linear equations involving |S| variables (a solver sketch follows).
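A minimal sketch of exact policy evaluation by solving that linear system with numpy; the two-state MDP and the fixed policy below are illustrative assumptions:

    import numpy as np

    gamma = 0.9
    states = ["office", "cafe"]
    R = np.array([0.0, 1.0])                    # R(s)
    cost = {"move": -0.1, "stay": 0.0}          # c(a)
    pi = {"office": "move", "cafe": "stay"}     # fixed policy Pi
    # P[i, j] = Pr(s_j | Pi(s_i), s_i)
    P = np.array([[0.1, 0.9],                   # office under "move"
                  [0.0, 1.0]])                  # cafe under "stay"
    b = R + np.array([cost[pi[s]] for s in states])

    # V = R + c_Pi + gamma * P V   =>   (I - gamma * P) V = R + c_Pi
    V = np.linalg.solve(np.eye(len(states)) - gamma * P, b)
    print(dict(zip(states, V)))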


Bellman’s principle of optimality

  • A policy Π is optimal if VΠ(s) ≥ VΠ’(s) for all policies Π’ and all states s ∈ S.
  • Rather than finding the optimal value function, we can try to find the optimal policy directly, by doing a policy-space search.

Policy iteration

  • Start with any policy (Π0).
  • Iterate:
    • Policy evaluation: for each state find VΠi(s).
    • Policy improvement: for each state s, find the action a* that maximises QΠi(a,s).
      If QΠi(a*,s) > VΠi(s) let Πi+1(s) = a*, else let Πi+1(s) = Πi(s).
  • Stop when Πi+1 = Πi.
  • Converges faster than value iteration, but the policy evaluation step is more expensive. (A sketch follows.)
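A compact policy-iteration sketch of the loop above; it reuses the illustrative two-state MDP from the earlier sketches (all names and numbers are assumptions) and solves the evaluation step exactly with numpy:

    import numpy as np

    GAMMA = 0.9
    states = ["office", "cafe"]
    actions = ["move", "stay"]
    R = {"office": 0.0, "cafe": 1.0}
    c = {"move": -0.1, "stay": 0.0}
    Pr = {
        "office": {"move": [(0.9, "cafe"), (0.1, "office")], "stay": [(1.0, "office")]},
        "cafe":   {"move": [(0.9, "office"), (0.1, "cafe")], "stay": [(1.0, "cafe")]},
    }

    def evaluate(pi):
        # exact policy evaluation: solve (I - gamma * P_pi) V = R + c_pi
        n = len(states)
        P = np.zeros((n, n))
        for i, s in enumerate(states):
            for p, s2 in Pr[s][pi[s]]:
                P[i, states.index(s2)] += p
        b = np.array([R[s] + c[pi[s]] for s in states])
        v = np.linalg.solve(np.eye(n) - GAMMA * P, b)
        return dict(zip(states, v))

    def q(s, a, V):
        # Q without the common R(s) term, which does not affect the argmax
        return c[a] + GAMMA * sum(p * V[s2] for p, s2 in Pr[s][a])

    pi = {s: "stay" for s in states}            # Pi_0: any initial policy
    while True:
        V = evaluate(pi)                        # policy evaluation
        new_pi = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
        if new_pi == pi:                        # stop when Pi_{i+1} = Pi_i
            break
        pi = new_pi                             # policy improvement
    print(pi)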

Modified Policy iteration

  • Rather than computing the actual value of the policy by solving the system of equations, approximate it by running value iteration with the policy held fixed (see the sketch below).
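A hedged sketch of that approximation: replace the exact solve with a fixed number of fixed-policy Bellman backups (it assumes the same states, R, c, Pr, and γ as the earlier sketches):

    def evaluate_approx(pi, states, R, c, Pr, gamma, k=20):
        """Approximate V^pi with k Bellman backups under the fixed policy pi."""
        V = {s: 0.0 for s in states}
        for _ in range(k):
            V = {s: R[s] + c[pi[s]] + gamma * sum(p * V[s2] for p, s2 in Pr[s][pi[s]])
                 for s in states}
        return V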

RTDP iteration

  • Start with the initial belief and initialize the value of each belief to its heuristic value.
  • For the current belief:
    • Save the action that minimises the current value in the current policy.
    • Update the value of the belief through a Bellman backup.
  • Apply the minimising action and then randomly pick an observation.
  • Go to the next belief assuming that observation.
  • Repeat until the goal is achieved. (A state-space sketch of the trial loop follows.)
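A minimal sketch of repeated RTDP trials written over states rather than beliefs (the loop structure is the same); the tiny goal-directed MDP, its costs, and the trial count are all illustrative assumptions:

    import random

    states = ["s0", "s1", "goal"]
    actions = ["a", "b"]
    cost = {"a": 1.0, "b": 2.0}
    Pr = {
        "s0": {"a": [(0.5, "s0"), (0.5, "s1")], "b": [(1.0, "s1")]},
        "s1": {"a": [(0.8, "goal"), (0.2, "s1")], "b": [(1.0, "goal")]},
    }
    GOAL = "goal"
    V = {s: 0.0 for s in states}                      # heuristic initial values
    policy = {}

    def q(s, a):
        return cost[a] + sum(p * V[s2] for p, s2 in Pr[s][a])

    def rtdp_trial(start):
        s = start
        while s != GOAL:
            a = min(actions, key=lambda act: q(s, act))   # greedy (min-cost) action
            policy[s] = a                                 # save it in the current policy
            V[s] = q(s, a)                                # Bellman backup at s
            # sample an outcome of the chosen action and move there
            r, acc = random.random(), 0.0
            for p, s2 in Pr[s][a]:
                acc += p
                if r <= acc:
                    s = s2
                    break

    for _ in range(50):                                   # repeat trials from the start state
        rtdp_trial("s0")
    print(V, policy)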

Fast RTDP convergence

  • What are the advantages of RTDP?
  • What are the disadvantages of RTDP?

How to speed up RTDP?

Other speedups

  • Heuristics
  • Aggregations
  • Reachability Analysis

Going beyond full observability

  • In the execution phase, we are uncertain where we are,
  • but we have some idea of where we can be.
  • A belief state = ?

Models of Planning

                         Uncertainty
Observation     Deterministic   Disjunctive   Probabilistic
Complete        Classical       Contingent    MDP
Partial         ???             Contingent    POMDP
None            ???             Conformant    POMDP

Speedups

  • Reachability Analysis
  • More informed heuristic

Mathematical modelling

  • Search space: finite/infinite state/belief space.
    Belief state = some idea of where we are.
  • Initial state/belief.
  • Actions
  • Action transitions (state to state / belief to belief)
  • Action costs
  • Feedback: Zero/Partial/Total

Algorithms for search

  • A* : works for sequential solutions.
  • AO* : works for acyclic solutions.
  • LAO* : works for cyclic solutions.
  • RTDP : works for cyclic solutions.

Full Observability

  • Modelled as MDPs (also called fully observable MDPs).
  • Output: Policy (State → Action)
  • Bellman Equation:
    V*(s) = maxa∈A(s) [c(a) + Σs’∈S V*(s’) P(s’|s,a)]


Partial Observability

  • Modelled as POMDPs (partially observable MDPs). Also called Probabilistic Contingent Planning.
  • Belief = probability distribution over states.
  • What is the size of the belief space?
  • Output: Policy (Discretized Belief → Action)
  • Bellman Equation:
    V*(b) = maxa∈A(b) [c(a) + Σo∈O P(b,a,o) V*(bao)]
    where bao is the belief reached from b by doing action a and receiving observation o.

No observability

  • Deterministic search in the belief space.
  • Output ?