Information-Theoretic Considerations in Batch RL
Jinglin Chen, Nan Jiang
University of Illinois at Urbana-Champaign
What we study: theory of batch RL (approximate dynamic programming, ADP)—backbone of “deep RL”.
Setting: learn a good policy from batch data {(s, a, r, s’)} plus a value-function class F (modeling Q*).
Central question: when is sample-efficient learning, i.e., poly(log|F|, H) sample complexity, guaranteed?
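The ADP algorithms this theory targets are variants of Fitted Q-Iteration (FQI), which repeatedly regress Bellman backups onto F. Below is a minimal sketch in Python (a discounted-return variant for brevity), with scikit-learn’s RandomForestRegressor standing in for the class F; the function and variable names are illustrative, not the paper’s code.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor  # stand-in for the class F

    def fitted_q_iteration(batch, num_actions, gamma=0.99, n_iters=50):
        """Minimal FQI sketch: each iteration approximates one application of
        the Bellman optimality operator T via regression onto F, so the output
        is only as good as (i) how exploratory the batch is and (ii) how close
        F is to being closed under T.

        batch: list of (s, a, r, s_next) with s, s_next as 1-D feature vectors.
        """
        S = np.array([s for s, _, _, _ in batch])
        A = np.array([a for _, a, _, _ in batch])
        R = np.array([r for _, _, r, _ in batch])
        S2 = np.array([s2 for _, _, _, s2 in batch])
        X = np.column_stack([S, A])  # regress on (state, action) features

        f = None
        for _ in range(n_iters):
            if f is None:
                y = R  # first iteration: no bootstrap target yet
            else:
                # bootstrap target: r + gamma * max_a f(s', a) under the previous iterate
                q_next = np.column_stack([
                    f.predict(np.column_stack([S2, np.full(len(S2), a)]))
                    for a in range(num_actions)
                ])
                y = R + gamma * q_next.max(axis=1)
            f = RandomForestRegressor(n_estimators=20).fit(X, y)
        return f  # greedy policy: pi(s) = argmax_a f(s, a)

The two assumptions below are exactly what make each regression step of this loop meaningful.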
Prior analyses make two kinds of assumptions [Munos’03; Munos & Szepesvári ’05]. Are they necessary (hardness results)? Do they hold in interesting scenarios?

Assumption on data: the data distribution μ(s, a) should cover S × A well.
[Figure: the space S × A, the data distribution μ(s, a), and the state-action distribution induced by an arbitrary policy π]
- Intuition: data should be exploratory, i.e., μ should dominate the distribution induced by any policy π
- We show: the assumption is also about the MDP dynamics!
- Unrestricted dynamics cause an exponential lower bound even under the most exploratory data distribution
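One simplified way to formalize “exploratory” (the concentrability coefficients of Munos ’03 are more refined, per-step versions of this) is to require that the state-action occupancy ν_π of every policy π is dominated by the data distribution μ; in LaTeX:

    % Concentrability: no policy's occupancy strays too far from the data distribution
    C \;:=\; \sup_{\pi}\, \Bigl\| \frac{d\nu_{\pi}}{d\mu} \Bigr\|_{\infty} \;<\; \infty

Sample-complexity upper bounds for ADP scale polynomially with C, and C is a joint property of μ and the MDP: with unrestricted dynamics, even the most exploratory μ cannot keep C small, which is the point of the hardness result above.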
Assumption on F: for every f ∈ F, the Bellman backup T f is well approximated within F, i.e., ‖Π_F(T f) − T f‖ is small (F is approximately closed under the Bellman update T).
- Similar in spirit to the assumptions of Jiang et al. [2017]
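In symbols, this is the standard low inherent Bellman error condition (with ‖·‖_μ a norm weighted by the data distribution and Π_F the best ‖·‖_μ-approximation within F):

    % Inherent Bellman error: every Bellman backup of F lands (almost) back in F
    \mathrm{IBE}(\mathcal{F}) \;:=\; \sup_{f \in \mathcal{F}} \, \inf_{g \in \mathcal{F}} \, \| g - \mathcal{T} f \|_{\mu}
    \;=\; \sup_{f \in \mathcal{F}} \, \| \Pi_{\mathcal{F}}(\mathcal{T} f) - \mathcal{T} f \|_{\mu}

Realizability (Q* ∈ F) is the much weaker demand that only the fixed point of T lies in F; whether it suffices on its own is the conjecture below.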
- Conjecture: realizability alone (Q* ∈ F) is insufficient
- Algorithm-specific lower bounds have existed for decades; is there an information-theoretic one?
- Negative results: two general styles of lower-bound proofs are excluded
  - e.g., constructing an exponentially large family of MDPs fails!
- F piecewise constant + F closed under the Bellman update ⇔ the underlying state partition is a bisimulation [Givan et al. ’03]
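For concreteness, a sketch of the bisimulation condition in the sense of Givan et al. ’03: an abstraction φ over states is a bisimulation when states mapped together agree on rewards and on transition probabilities into every abstract class, i.e., φ(s_1) = φ(s_2) implies, for all actions a and abstract states x,

    R(s_1, a) = R(s_2, a), \qquad
    \sum_{s' \in \phi^{-1}(x)} P(s' \mid s_1, a) \;=\; \sum_{s' \in \phi^{-1}(x)} P(s' \mid s_2, a)

The equivalence above says that demanding Bellman closure from a piecewise-constant F is exactly demanding that its pieces form such a partition.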
The big picture: RL without structural assumptions is intractable, tabular RL is tractable, and RL with function approximation sits in between. Known sufficient conditions, by setting:

                 Batch                                    Online (exploration)
  value-based    nice dynamics & exploratory data         nice dynamics (low Bellman rank;
                 + realizability + ???  [gap?]            Jiang et al. ’17) + realizability  [gap?]
  model-based    nice dynamics & exploratory data         nice dynamics (low witness rank;
                 + realizability                          Sun et al. ’18) + realizability  [gap confirmed]