Provably Efficient RL via Latent State Decoding
Simon S. Du Akshay Krishnamurthy Nan Jiang Alekh Agarwal Miro Dudík John Langford
RL theory vs practice

Theory: simple tabular environments; no generalization
Practice: complex rich-observation environments; generalization via function approximation
Can we design provably sample-efficient RL algorithms for rich observation environments?
A structured model for rich observation RL

Each hidden state s emits a context x; the agent observes only the context and takes an action a, which leads to the next hidden state. This interaction repeats for H steps.
Idea: Find a function that decodes hidden states from contexts.

f(context) = state

This reduces the problem to a tabular one.

Main challenge: there are no labels, because the hidden states are never observed.
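To make the reduction concrete, here is a minimal sketch (not the paper's algorithm) of why a decoder suffices: once some function maps each rich context to one of a few latent state ids, standard tabular Q-learning runs on the decoded states. The `env` and `decoder` interfaces below are illustrative assumptions.

```python
# Sketch: tabular Q-learning on decoded states. The decoder and the
# env interface (reset() -> context, step(a) -> (context, reward, done))
# are assumed for illustration; they are not the paper's API.
import random
from collections import defaultdict

def tabular_q_learning(env, decoder, num_actions,
                       episodes=200, alpha=0.1, gamma=0.99, eps=0.1):
    """Q-learning keyed on decoder(context) instead of the raw context."""
    Q = defaultdict(float)  # keyed by (decoded state, action)
    for _ in range(episodes):
        x = env.reset()
        done = False
        while not done:
            s = decoder(x)  # collapse the rich context to a latent state id
            if random.random() < eps:
                a = random.randrange(num_actions)  # explore
            else:
                a = max(range(num_actions), key=lambda b: Q[(s, b)])  # exploit
            x2, r, done = env.step(a)
            s2 = decoder(x2)
            future = 0.0 if done else gamma * max(Q[(s2, b)] for b in range(num_actions))
            Q[(s, a)] += alpha * (r + future - Q[(s, a)])
            x = x2
    return Q
```

With only M latent states and K actions, the Q-table has M·K entries regardless of how rich the contexts are, which is the point of decoding.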
Our approach: Learn a function that predicts, from a context, the conditional probability of the (previous state, action) pair that generated it. (We assume access to a regression oracle to learn this function.)

f(context) = distribution over (previous state, action) pairs

Example: if the states at level h are s1, s2 and the actions are a1, a2, then f outputs a distribution over {(s1, a1), (s1, a2), (s2, a1), (s2, a2)}.

Different conditional probabilities correspond to different states, so clustering contexts by these predictions classifies the hidden states at level h+1 (s3 and s4).
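The clustering step above can be sketched as follows. This is an illustrative simplification, not the paper's procedure: the oracle's predicted (previous state, action) distributions `preds` are assumed given, and the distance threshold is a hypothetical tuning parameter.

```python
# Sketch: assign contexts to latent states by grouping their predicted
# (previous state, action) distributions. `preds` is a list of probability
# vectors, one per context; `threshold` is an illustrative parameter.
def cluster_by_prediction(preds, threshold=0.2):
    """Greedily group probability vectors by L1 distance; return state ids."""
    centers = []  # one representative prediction per discovered state
    labels = []
    for p in preds:
        for sid, c in enumerate(centers):
            if sum(abs(pi - ci) for pi, ci in zip(p, c)) <= threshold:
                labels.append(sid)  # close to an existing state's signature
                break
        else:
            centers.append(p)  # new latent state discovered
            labels.append(len(centers) - 1)
    return labels
```

Contexts emitted by the same hidden state share (nearly) the same conditional distribution over their predecessors, so they collapse to one cluster; contexts from different states separate.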
Theorem: Our algorithm finds a near-optimal decoder with poly(M, K, H) samples in polynomial time, using H calls to the supervised learning black box.

M = number of hidden states, K = number of actions, H = time horizon

Statistical efficiency · Computational efficiency · Rich observations · Assumptions