Reinforcement Learning
A (almost) quick (and very incomplete) introduction
Slides from David Silver, Dan Klein, Mausam, Dan Weld
At each time step t, the agent executes an action A_t and the environment emits a reward R_t.
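The per-step interaction between agent and environment can be sketched as a simple loop. This is only an illustration: the `CoinFlipEnv` class, its `step` method, and the fixed policy are hypothetical names made up for this example, not part of the slides.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip, reward 1.0 if correct."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        outcome = self.rng.randint(0, 1)
        return 1.0 if action == outcome else 0.0


def run_episode(env, policy, num_steps):
    """At each time step t the agent acts and the environment returns a reward."""
    rewards = []
    for t in range(num_steps):
        action = policy(t)         # agent executes action A_t
        reward = env.step(action)  # environment emits reward R_t
        rewards.append(reward)
    return rewards


# A trivial (assumed) policy that always guesses 0.
rewards = run_episode(CoinFlipEnv(seed=0), policy=lambda t: 0, num_steps=10)
total_reward = sum(rewards)
```

The agent's goal in this picture is to choose actions that make `total_reward` as large as possible.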
An RL agent may include one or more of these components:
Model-based vs. Model-free
models (i.e., transition probabilities).
learn what action to do when (without necessarily finding out the exact model of the action)
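As one possible illustration of the model-free idea, here is a minimal sketch of tabular Q-learning on a hypothetical four-state chain. Q-learning is chosen here as a common example and is not named in the slides; the chain, the `step` simulator, and all hyperparameters are assumptions. The agent only samples transitions from `step` and never reads the transition probabilities themselves.

```python
import random

N_STATES = 4      # states 0..3; state 3 is terminal (hypothetical chain)
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Simulator the agent samples from but never inspects."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection from the current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Learn from the sampled transition only; no model is built.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy` holds the learned action for each non-terminal state, obtained without ever estimating the transition model.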
On Policy vs. Off Policy
policy, and improves it based on estimates.
another (or re-using experience from old policy).
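One common way to see the on-policy versus off-policy distinction is to compare the bootstrapping targets of SARSA and Q-learning. Both algorithms are brought in here as standard examples and are not named in the slides; the function names and numbers below are illustrative assumptions.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy: bootstrap from the action the current policy actually took."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    """Off-policy: bootstrap from the greedy action, whatever was actually taken."""
    return reward + gamma * max(q_next)


# Assumed action-value estimates for the next state, indexed by action.
q_next = [0.2, 0.8]

# The behaviour policy happened to take action 0 in the next state.
on_policy = sarsa_target(1.0, 0.9, q_next, next_action=0)
off_policy = q_learning_target(1.0, 0.9, q_next)
```

The two targets differ whenever the action actually taken is not the greedy one, which is exactly the case where the policy being followed and the policy being learned about diverge.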
MDP = <S, A, T, R, γ>
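The tuple can be written down directly as a data structure. The sketch below assumes T maps a (state, action) pair to a list of (probability, next state) pairs and R maps a (state, action) pair to an expected immediate reward; the slides do not fix these signatures. When the model is known, as in the model-based setting, value iteration can plan on it. The two-state `toy` MDP is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: list      # S
    actions: list     # A
    transition: dict  # T[(s, a)] -> list of (probability, next_state)
    reward: dict      # R[(s, a)] -> expected immediate reward
    gamma: float      # discount factor


def value_iteration(mdp, tol=1e-8):
    """Repeat Bellman optimality backups until values change by less than tol."""
    values = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            best = max(
                mdp.reward[(s, a)]
                + mdp.gamma * sum(p * values[s2] for p, s2 in mdp.transition[(s, a)])
                for a in mdp.actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical two-state MDP: staying in s1 pays 1.0, everything else pays 0.
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    transition={
        ("s0", "stay"): [(1.0, "s0")],
        ("s0", "go"): [(1.0, "s1")],
        ("s1", "stay"): [(1.0, "s1")],
        ("s1", "go"): [(1.0, "s0")],
    },
    reward={
        ("s0", "stay"): 0.0,
        ("s0", "go"): 0.0,
        ("s1", "stay"): 1.0,
        ("s1", "go"): 0.0,
    },
    gamma=0.9,
)
values = value_iteration(toy)
```

For this toy problem the fixed point can be checked by hand: V(s1) = 1 / (1 - 0.9) = 10 and V(s0) = 0.9 · V(s1) = 9.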
policy μ(a|s): {s_1, a_1, r_2, ..., s_T} ∼ μ
Karthik Narasimhan, Adam Yala, Regina Barzilay CSAIL, MIT
Slides from Karthik Narasimhan
Why try to reason, when someone else can do it for you
Error.
tail of noisy, irrelevant documents" is unclear. [Yash]
be completely fair. [Anshul]
* most mean questions