Decision making under uncertainty

Course overview

Christos Dimitrakakis

October 29, 2013

Christos Dimitrakakis () Decision making under uncertainty October 29, 2013 1 / 8

The problem of decision making under uncertainty

- Modelling our uncertainty about the world ⇒ learning
- Optimising our decisions given our knowledge ⇒ planning

Applications and related problems

- Optimisation: robust decisions, efficient search, planning.
- AI: modelling, learning from interaction and/or demonstration.
- Economics: mechanism design, behavioural modelling.
- Security: cryptography, biometrics, intrusion detection and response.
- Biology and medicine: automatic experiment design, clinical trials, cognitive science.

Planning, learning and the exploration-exploitation trade-off

The exploration-exploitation trade-off

Example (Selecting a restaurant)

You usually go to Les Epinards. You heard that King’s Arm is really good! It’s Friday. Do you:

- Go to Les Epinards?
- Call King’s Arm to reserve?
- Check the menu of King’s Arm and then decide?

The exploration-exploitation trade-off

- Exploit knowledge about the world to gain a known reward.
- Explore the world to learn, potentially getting less or more reward.
- Arises when data collection is interactive.
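The trade-off can be made concrete with a small simulation. The sketch below is not from the course material: it assumes each restaurant is an "arm" that yields a good meal (reward 1) with a fixed but unknown probability, and it uses the epsilon-greedy rule as one simple way to mix exploring and exploiting. The function name `epsilon_greedy` and the probabilities in the usage note are hypothetical.

```python
import random


def epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    true_means[a] is the assumed probability that arm a gives reward 1.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was tried
    estimates = [0.0] * n_arms   # running mean reward of each arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random to learn about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward
```

For instance, `epsilon_greedy([0.5, 0.9], epsilon=0.0, steps=50)` never visits the second arm: a purely exploiting diner keeps going to the usual restaurant and never learns that the other one may be better. Setting `epsilon` above zero trades some immediate reward for that information.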

Formalising decision problems

- How do our decisions depend on what we want?
- How do we weigh alternatives?
- Is there a good concept of rationality?

Beliefs, learning and planning

- How can we express belief and how does belief change?
- How might we make decisions according to our beliefs?
- What if our decisions can affect our beliefs?
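One common way to express a belief and let it change with data is a Beta distribution over an unknown success probability, updated by Bayes' rule after each binary outcome. The sketch below is only an illustration of that idea, not the course's notation: the uniform Beta(1, 1) prior, the function names and the sequence of outcomes are all assumptions.

```python
def update_belief(alpha, beta, outcome):
    """Return the Beta posterior parameters after one binary outcome.

    A Beta(alpha, beta) prior with a Bernoulli likelihood gives a
    Beta(alpha + 1, beta) posterior after a success and a
    Beta(alpha, beta + 1) posterior after a failure.
    """
    if outcome:
        return alpha + 1, beta
    return alpha, beta + 1


def belief_mean(alpha, beta):
    """Expected success probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Hypothetical data: was each meal at a new restaurant good?
alpha, beta = 1, 1  # uniform prior: no initial opinion (an assumption)
for good_meal in [True, True, False, True]:
    alpha, beta = update_belief(alpha, beta, good_meal)
```

The belief only changes when we observe an outcome, and we only observe outcomes for the options we actually choose. That is one sense in which decisions can affect beliefs.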

Why decision theory?

- Formalising trade-offs makes problems well-posed.
- Better overall solutions could be found.
- We may ignore non-essential aspects.

The reinforcement learning problem

Learning to act in an unknown world, by interaction

The interaction with the world

- The agent takes actions.
- The world generates observations.
- The agent receives rewards.

Goal

Maximise total reward during the agent’s lifetime.

- Fundamental problem in artificial intelligence.
- Connections to animal learning.
- Linked to experiment design, optimisation, game theory.
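The interaction described above can be sketched as a simple loop. Everything in this example is hypothetical and chosen only for illustration: `CoinWorld` is a toy world whose observations are biased coin flips, `CountingAgent` guesses the outcome it has seen most often, and `run` accumulates the total reward over a finite lifetime. In this toy world the action does not influence the observation, which is a simplification.

```python
import random


class CoinWorld:
    """Hypothetical toy world: emits a biased coin flip, pays 1 for a correct guess."""

    def __init__(self, bias=0.7, seed=0):
        self.bias = bias
        self.rng = random.Random(seed)

    def step(self, action):
        outcome = 1 if self.rng.random() < self.bias else 0
        reward = 1.0 if action == outcome else 0.0
        return outcome, reward


class CountingAgent:
    """Hypothetical agent: predicts the outcome it has observed most often."""

    def __init__(self):
        self.counts = [0, 0]

    def act(self):
        return 1 if self.counts[1] > self.counts[0] else 0

    def observe(self, observation, reward):
        # This simple agent learns from observations only and ignores the reward.
        self.counts[observation] += 1


def run(agent, world, horizon):
    """Let the agent interact with the world for `horizon` steps."""
    total_reward = 0.0
    for _ in range(horizon):
        action = agent.act()                      # the agent takes an action
        observation, reward = world.step(action)  # the world generates an observation
        agent.observe(observation, reward)        # the agent receives the reward
        total_reward += reward
    return total_reward
```

A call such as `run(CountingAgent(), CoinWorld(bias=0.7), 100)` returns the total reward collected during the agent's lifetime, which is the quantity the goal above asks us to maximise.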

Outline

* Probability refresher.
1. Subjective probability and utility.
2. Decision problems.
3. Estimation.
* Hypothesis testing.
4. Sequential sampling and optimal stopping.
5. Automatic experiment design and bandit problems.
6. Reinforcement learning I: Markov decision processes and fundamental algorithms.
7. Reinforcement learning II: Stochastic and approximation algorithms.
8. Reinforcement learning III: Generalised problems.
9. Project meeting.
10. Reinforcement learning IV: Bayesian algorithms.
11. Reinforcement learning V: Bandit algorithms and regret.
12. Project meeting.
13. Learning with expert advice.
14. Learning by demonstration; preference elicitation.

Assessment

Exercises and feedback: 40%

- Exercises after every unit.
- Exercise sets include feedback form.
- Necessary for a good project!

Participation: 10%

- Active participation in the course.
- Corrections on course notes.

Project: 50%

- Competition, presentation and report.
- Team competition using the rl-glue socket API.
- Each team codes:
  - An environment (test-bed).
  - An agent.
- Agents are evaluated on all environments.

Themes

- Models for representing belief and preferences.
- Algorithms for decision making.
- Fast optimisation.
- Applications in finance.
- Decision making in animals.
- Inferring preferences and beliefs.
- Automatic design of experiments.

