Approximate information state for partially observed systems
Jayakumar Subramanian and Aditya Mahajan (McGill University)
Thanks to Amit Sinha and Raihan Seraj for simulation results.
IEEE Conference on Decision and Control, 11 December 2019
Many successes of RL in recent years: AlphaGo, arcade games, robotics.
These algorithms are based on a comprehensive theory, but one restricted almost exclusively to systems with perfect state observations.
Yet many applications have a partially observed state: healthcare, autonomous driving, finance (portfolio management), retail and marketing.
Goal: develop a comprehensive theory of approximate DP and RL for partially observed systems, built around a notion of information state for such systems.
Notion of state in partially observed stochastic dynamical systems

Stochastic system: controlled input Ut, stochastic input Wt, output Yt, with
Yt = ft(U1:t, W1:t).
WHEN THE STOCHASTIC INPUT IS NOT OBSERVED: let Ht = (Y1:t−1, U1:t−1) denote the history of inputs and outputs until time t.
TRADITIONAL SOLUTION: BELIEF STATES
Step 1: Identify a state {St}t≥0 for predicting the output, assuming that the stochastic inputs are observed.
Step 2: Define a belief state Bt ∈ Δ(𝒮): Bt(s) = ℙ(St = s | Ht = ht), s ∈ 𝒮.

Astrom, “Optimal control of Markov decision processes with incomplete state information,” 1965. Striebel, “Sufficient statistics in the optimal control of stochastic systems,” 1965. Baum and Petrie, “Statistical inference for probabilistic functions of finite state Markov chains,” 1966. Stratonovich, “Conditional Markov processes,” 1960.
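To make Steps 1 and 2 concrete, here is a minimal sketch (not from the talk) of the standard Bayes-filter belief update for a finite POMDP; the transition matrices P and observation matrices O are hypothetical placeholders.

```python
import numpy as np

def belief_update(b, u, y, P, O):
    """One step of the Bayes filter: B_{t+1}(s') ∝ Σ_s B_t(s) P[u][s, s'] O[u][s', y]."""
    b_next = (b @ P[u]) * O[u][:, y]   # predict with the transition, correct with the observation
    return b_next / b_next.sum()       # normalize back to a distribution in Δ(S)

# Toy 2-state, 2-action, 2-observation POMDP (made-up numbers).
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # P[u][s, s'] = P(s' | s, u)
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # O[u][s', y] = P(y | s', u)
     1: np.array([[0.8, 0.2], [0.3, 0.7]])}

b = np.array([0.5, 0.5])                      # initial belief B_1
b = belief_update(b, u=0, y=1, P=P, O=O)
print(b)
```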
Partially observed Markov decision processes (POMDPs): pros and cons of the belief-state representation

Pro: the value function is piecewise linear and convex, a structure exploited by various efficient algorithms.
Con: when the state-space model is not known analytically (as is the case for black-box models and simulators, as well as some real-world applications such as healthcare), belief states are difficult to construct and difficult to approximate from data.

Smallwood and Sondik, “The optimal control of partially observable Markov processes over a finite horizon,” 1973. Chen, “Algorithms for partially observable Markov decision processes,” 1988. Kaelbling, Littman, Cassandra, “Planning and acting in partially observable stochastic domains,” 1998. Pineau, Gordon, Thrun, “Point-based value iteration: an anytime algorithm for POMDPs,” 2003.

Is there another way to model partially observed systems that is more amenable to approximation? Let's go back to first principles.
Notion of state in partially observed stochastic dynamical systems

Stochastic system: controlled input Ut, stochastic input Wt, output Yt, with Yt = ft(U1:t, W1:t).
WHEN THE STOCHASTIC INPUT IS NOT OBSERVED: let Ht = (Y1:t−1, U1:t−1) denote the history of inputs and outputs until time t.
PREDICTING OUTPUTS ALMOST SURELY: Ht(1) ∼ Ht(2) if, for all future inputs (Ut:T, Wt:T), Yt:T(1) = Yt:T(2) a.s. Too restrictive...
FORECASTING OUTPUTS IN DISTRIBUTION: Ht(1) ∼ Ht(2) if, for all future control inputs Ut:T, ℙ(Yt:T(1) | Ht(1), Ut:T) = ℙ(Yt:T(2) | Ht(2), Ut:T).

Grassberger, “Complexity and forecasting in dynamical systems,” 1988. Crutchfield and Young, “Inferring statistical complexity,” 1989.
Now let's construct the state space

FORECASTING OUTPUTS IN DISTRIBUTION: Ht(1) ∼ Ht(2) if, for all future control inputs Ut:T, ℙ(Yt:T(1) | Ht(1), Ut:T) = ℙ(Yt:T(2) | Ht(2), Ut:T).
PROPERTIES OF INFORMATION STATE: the info state Zt at time t is a “compression” of past inputs that satisfies the following:
Sufficient to predict itself: ℙ(Zt+1 | Ht, Ut) = ℙ(Zt+1 | Zt, Ut).
Sufficient to predict output: ℙ(Yt | Ht, Ut) = ℙ(Yt | Zt, Ut).
Identifying such a Zt has the same complexity as identifying the state sufficient for forecasting outputs under perfect observations (which was Step 1 of the belief-state formulation).
KEY QUESTIONS: Can this be used for dynamic programming? What is the right notion of approximation in this framework?
An information state for dynamic programming
Predicting output vs optimizing expected rewards over time

Stochastic system: controlled input Ut, stochastic input Wt, output Yt, reward Rt, with
Yt = ft(U1:t, W1:t), Rt = rt(U1:t, W1:t).
Choose Ut = gt(Y1:t−1, U1:t−1) to maximize 𝔼[∑t=1…T Rt].
PROPERTIES OF INFORMATION STATE (SUFFICIENT FOR DYNAMIC PROGRAMMING): the info state Zt at time t is a “compression” of past inputs that satisfies the following:
Sufficient to predict itself: ℙ(Zt+1 | Ht, Ut) = ℙ(Zt+1 | Zt, Ut).
Sufficient to estimate expected reward: 𝔼[Rt | Ht, Ut] = 𝔼[Rt | Zt, Ut].
Dynamic programming using information state

PROPERTIES OF INFORMATION STATE (SUFFICIENT FOR DYNAMIC PROGRAMMING): the info state Zt at time t is a “compression” of past inputs that satisfies the following:
Sufficient to predict itself: ℙ(Zt+1 | Ht, Ut) = ℙ(Zt+1 | Zt, Ut).
Sufficient to estimate expected reward: 𝔼[Rt | Ht, Ut] = 𝔼[Rt | Zt, Ut].
PRELIMINARY THEOREM: let {Zt}t≥1 be any information state process. Then:
There is no loss of optimality in restricting attention to policies of the form Ut = g̃t(Zt).
Let {Vt}t=1…T+1 denote the solution of the following dynamic program: VT+1(zT+1) = 0 and, for t ∈ {T, …, 1},
Qt(zt, ut) = 𝔼[Rt + Vt+1(Zt+1) | Zt = zt, Ut = ut],   Vt(zt) = max over ut ∈ 𝒰 of Qt(zt, ut).
A policy {g̃t}t=1…T, g̃t: 𝒵t → 𝒰, is optimal if it satisfies g̃t(zt) ∈ arg max over ut ∈ 𝒰 of Qt(zt, ut).

Bohlin (1970); Davis and Varaiya (1972); Kumar and Varaiya (1984).
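A minimal sketch of this backward induction for a finite information-state space; the reward tables r and transition kernels p below are randomly generated placeholders rather than anything from the talk.

```python
import numpy as np

def info_state_dp(r, p, T):
    """Backward induction over the info state:
    Q_t(z,u) = r_t(z,u) + Σ_{z'} p_t(z'|z,u) V_{t+1}(z'),  V_t(z) = max_u Q_t(z,u).
    r[t] has shape (nZ, nU); p[t][u] has shape (nZ, nZ) with rows summing to 1."""
    nZ, nU = r[0].shape
    V = np.zeros(nZ)                      # V_{T+1} ≡ 0
    policy = []
    for t in reversed(range(T)):
        Q = r[t] + np.stack([p[t][u] @ V for u in range(nU)], axis=1)
        policy.append(Q.argmax(axis=1))   # g̃_t(z) ∈ arg max_u Q_t(z, u)
        V = Q.max(axis=1)
    return V, policy[::-1]

# Hypothetical 3-info-state, 2-action problem over horizon T = 4.
rng = np.random.default_rng(0)
T, nZ, nU = 4, 3, 2
r = [rng.random((nZ, nU)) for _ in range(T)]
p = [[rng.dirichlet(np.ones(nZ), size=nZ) for _ in range(nU)] for _ in range(T)]
V1, g = info_state_dp(r, p, T)
print(V1, g[0])   # optimal value at t = 1 and the first-stage policy g̃_1
```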
What about approximations?
Preliminary: a family of pseudometrics on probability distributions

INTEGRAL PROBABILITY METRIC (IPM): let 𝒬 denote the set of probability measures on a measurable space (𝒴, ℱ). Given a class 𝔊 of real-valued bounded measurable functions on (𝒴, ℱ), the integral probability metric (IPM) between two probability distributions μ, ν ∈ 𝒬 is given by:
d𝔊(μ, ν) = sup over f ∈ 𝔊 of | ∫𝒴 f dμ − ∫𝒴 f dν |.
EXAMPLES:
If 𝔊 = {f : ‖f‖∞ ≤ 1}, then d𝔊 is the total variation distance.
If 𝔊 = {f : |f|L ≤ 1}, then d𝔊 is the Wasserstein distance.
If 𝔊 = {f : ‖f‖∞ + |f|L ≤ 1}, then d𝔊 is the Dudley metric.
We say a function f has 𝔊-constant K if f/K ∈ 𝔊.

Müller, “Integral probability metrics and their generating classes of functions,” 1997.
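For intuition, the first two examples can be evaluated directly for distributions with finite support: with 𝔊 the unit ‖⋅‖∞ ball the IPM reduces to the L1 distance between the pmfs (the total variation norm), and with 𝔊 the unit Lipschitz ball it is the 1-Wasserstein distance. A small sketch with made-up numbers:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two pmfs on the common support {0, 1, 2, 3} (hypothetical numbers).
support = np.array([0.0, 1.0, 2.0, 3.0])
mu = np.array([0.4, 0.3, 0.2, 0.1])
nu = np.array([0.1, 0.2, 0.3, 0.4])

# 𝔊 = {f : ||f||_inf <= 1}: the IPM is the L1 distance between the pmfs.
d_tv = np.abs(mu - nu).sum()

# 𝔊 = {f : Lipschitz constant <= 1}: the IPM is the 1-Wasserstein distance (1-D case).
d_w = wasserstein_distance(support, support, u_weights=mu, v_weights=nu)

print(d_tv, d_w)
```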
Approximate information state

(ε, δ)-APPROXIMATE INFORMATION STATE (AIS): given a function class 𝔊, a compression {Zt}t≥1 of the history (i.e., Zt = φt(Ht)) is called an {(εt, δt)}t≥1 AIS if there exist a function R̃t(Zt, Ut) and a stochastic kernel νt(Zt+1 | Zt, Ut) such that:
|𝔼[Rt | Ht = ht, Ut = ut] − R̃t(φt(ht), ut)| ≤ εt;
for any Borel subset A of 𝒵t+1, defining μt(A) = ℙ(Zt+1 ∈ A | Ht = ht, Ut = ut), we have d𝔊(μt, νt(⋅ | φt(ht), ut)) ≤ δt.
Approximate dynamic programming using AIS

MAIN THEOREM: given a function class 𝔊, let {Zt}t≥1, where Zt = φt(Ht), be an {(εt, δt)}t≥1 AIS. Recursively define the following functions: V̂T+1(zT+1) = 0 and, for t ∈ {T, …, 1},
V̂t(zt) = max over ut ∈ 𝒰 of { R̃t(zt, ut) + ∫ V̂t+1(zt+1) νt(dzt+1 | zt, ut) }.
Let π = (π1, …, πT) denote the corresponding policy. If the value function V̂t has 𝔊-constant Kt, then:
for any history ht, |Vt(ht) − V̂t(φt(ht))| ≤ εT + ∑s=t…T (εs + Ksδs);
for any history ht, |Vt(ht) − Vπt(ht)| ≤ 2[εT + ∑s=t…T (εs + Ksδs)].
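As a worked instance of these bounds (with purely hypothetical constants):

```python
# Suppose ε_s = 0.1, δ_s = 0.05, and K_s = 2 for all s, with horizon T = 10 and t = 1.
T, t = 10, 1
eps, delta, K = [0.1] * (T + 1), [0.05] * (T + 1), [2.0] * (T + 1)   # indexed 1..T

value_gap = eps[T] + sum(eps[s] + K[s] * delta[s] for s in range(t, T + 1))
policy_gap = 2 * value_gap
print(value_gap, policy_gap)   # 0.1 + 10 * (0.1 + 2 * 0.05) = 2.1, and twice that: 4.2
```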
AIS: some remarks

In the definition of AIS, we can replace the condition d𝔊(μt, νt(⋅ | Zt = φt(ht), Ut = ut)) ≤ δt by: Zt+1 = function(Zt, Yt+1, Ut) and d𝔊(ℙ(Yt | Ht = ht, Ut = ut), ℙ(Yt | Zt = φt(ht), Ut = ut)) ≤ δt.
The AIS process {Zt}t≥1 need not be Markov!
Two ways to interpret the results: given the information state space 𝒵, find the best compression φt: ℋt → 𝒵; or, given any compression function φt: ℋt → 𝒵t, find the approximation error.
The results extend naturally to the infinite-horizon setting.
Some examples
Example 1: Error bounds for state aggregation

Consider an MDP with state space 𝒳 and per-step reward Rt = r(Xt, Ut). Suppose 𝒳 is quantized to a discrete set 𝒵 using φ: 𝒳 → 𝒵. Let z = φ(x) denote the label of x; then φ−1(z) denotes all states with label z.
{Zt}t≥1 IS AN (ε, δ) AIS, where
ε = sup over (x, u) ∈ 𝒳 × 𝒰 of |r(x, u) − r̃(φ(x), u)| (r̃ being the reward assigned to each aggregate state), or, equivalently, r(⋅, u) has a 𝔊-constant Kr;
δ = sup over (x, u) ∈ 𝒳 × 𝒰 of d𝔊(ℙ(X+ | X = x, U = u), ℙ(X+ | X ∈ φ−1(φ(x)), U = u)), or, equivalently, ℙ(X+ | X = ⋅, U = u) has a 𝔊-constant Kd.

Bertsekas, “Convergence of discretization procedures in dynamic programming,” 1975.
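A minimal numerical sketch of computing ε and δ for such a quantization, using the total-variation IPM for d𝔊 and assuming a uniform reference measure over each cell φ−1(z); the MDP itself is randomly generated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nX, nU, nZ = 12, 2, 4                     # hypothetical MDP and quantization sizes
phi = np.arange(nX) // (nX // nZ)         # aggregation map φ: X → Z (uniform blocks)
r = rng.random((nX, nU))                  # per-step reward r(x, u)
P = rng.dirichlet(np.ones(nX), size=(nU, nX))   # P[u][x, x'] = P(x' | x, u)

# Aggregate reward r̃(z, u) and kernel P̄(·|z, u): uniform averages over each cell φ⁻¹(z).
r_agg = np.array([[r[phi == z, u].mean() for u in range(nU)] for z in range(nZ)])
P_agg = np.array([[P[u][phi == z].mean(axis=0) for u in range(nU)] for z in range(nZ)])

# ε = sup_{x,u} |r(x, u) − r̃(φ(x), u)|
eps = np.abs(r - r_agg[phi]).max()
# δ = sup_{x,u} d_𝔊(P(·|x, u), P̄(·|φ(x), u)) with d_𝔊 the total-variation IPM (L1 distance)
delta = max(np.abs(P[u][x] - P_agg[phi[x], u]).sum()
            for x in range(nX) for u in range(nU))
print(eps, delta)
```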
Example 2: Approximation bounds for using quantized observations

Ha and Schmidhuber, “World Models,” 2018. [Figure: pipeline in which a video observation Yt is passed through a vision module to produce a quantized observation Ŷt, a memory module compresses the result into Zt, and an RL agent acts on Zt.]
Proposed as a heuristic algorithm, with no performance bounds. In the AIS framework:
{Zt}t≥1 IS AN (ε, δ) AIS, where
ε = sup over (ht, ut) of |𝔼[Rt | ht, ut] − R̃t(φt(ht), ut)|;
δ = sup over (ht, ut) of d𝔊(ℙ(Ŷt+1 | ht, ut), ℙ(Ŷt+1 | φt(ht), ut)).
Example 3: Approximation bounds for mean-field teams

n agents: state Xt(i), control Ut(i).
Dynamics: ℙ(Xt+1 | Xt, Ut) = ∏i=1…n P(Xt+1(i) | Xt(i), Ut(i), Mt).
Per-step reward: R(Xt, Ut) = (1/n) ∑i=1…n r(Xt(i), Ut(i), Mt).
Empirical mean-field: Mt(x) = (1/n) ∑i=1…n δXt(i)(x). Statistical mean-field: m̄t(x) = ℙ(Xt(i) = x).
Info structure: It(i) = {Xt(i)}. Expanded info structure: Ĩt(i) = {Xt(i), Mt}.
Let J∗, J̃∗, and J̄∗ denote the optimal performance under the original info structure, the expanded info structure, and the statistical mean-field approximation, respectively. Then J∗ ≤ J̃∗ and J̃∗ − J̄∗ ≤ K/√n, so J̄∗ ≤ J∗ ≤ J̄∗ + K/√n.
(A) r(x, u, m) and P(y | x, u, m) are Lipschitz in m. Under (A), {m̄t}t≥1 is an (ε, δ) AIS for the expanded info structure, where ε, δ ∈ 𝒪(1/√n).
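To see why ε, δ ∈ 𝒪(1/√n) is plausible, here is a tiny simulation (my own illustration, with a made-up m̄) of how fast the empirical mean-field Mt concentrates around the statistical mean-field m̄t when agent states are i.i.d. draws from m̄t:

```python
import numpy as np

rng = np.random.default_rng(2)
m_bar = np.array([0.5, 0.3, 0.2])        # statistical mean-field m̄_t (hypothetical)

for n in [10, 100, 1000, 10000]:
    X = rng.choice(3, size=n, p=m_bar)    # agent states X_t(i) ~ m̄_t, i.i.d.
    M = np.bincount(X, minlength=3) / n   # empirical mean-field M_t
    print(n, np.abs(M - m_bar).sum())     # L1 error shrinks roughly like 1/√n
```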
Now to reinforcement learning for partially observed systems.
Reinforcement learning setup

[Figure: architecture. An AIS encoder maps (Yt, Ut−1) to Zt; an AIS decoder outputs R̃t and νt; a critic (value approximator) and an actor (policy approximator) operate on Zt.]
State aggregator: loss ℒAIS = αt|R̃t − Rt| + (1 − αt) d𝔊(νt, μt); ξ denotes the parameters of the aggregator, updated using SGD with learning rate ak.
Value approximator (critic): φ denotes the parameters of the Q(z, u) approximator, updated using TD(0) or TD(λ) with learning rate bk.
Policy approximator (actor): θ denotes the parameters of π(u | z), updated using policy gradient with learning rate ck.
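A minimal PyTorch-style sketch of the state-aggregator loss ℒAIS; here the total-variation IPM (the L1 distance between pmfs over a discrete AIS space) stands in for d𝔊, and μt is approximated by a one-hot sample of the realized next AIS. All tensor shapes and constants are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ais_loss(r_pred, r_obs, nu_pred, mu_emp, alpha):
    """Surrogate for ℒ_AIS = α|R̃_t − R_t| + (1 − α) d_𝔊(ν_t, μ_t).

    r_pred: predicted rewards R̃_t(Z_t, U_t); r_obs: observed rewards R_t.
    nu_pred: predicted next-AIS distributions ν_t(· | Z_t, U_t) (rows on the simplex).
    mu_emp: empirical targets μ_t (here, one-hot samples of the realized next AIS).
    The L1 distance between pmfs is the total-variation IPM; other 𝔊 give other d_𝔊.
    """
    reward_term = (r_pred - r_obs).abs().mean()
    ipm_term = (nu_pred - mu_emp).abs().sum(dim=-1).mean()
    return alpha * reward_term + (1 - alpha) * ipm_term

# Toy usage: a batch of 8 transitions with 4 discrete AIS "bins" (hypothetical).
r_pred, r_obs = torch.randn(8), torch.randn(8)
nu_pred = torch.softmax(torch.randn(8, 4), dim=-1)
mu_emp = F.one_hot(torch.randint(0, 4, (8,)), num_classes=4).float()
print(ais_loss(r_pred, r_obs, nu_pred, mu_emp, alpha=0.5).item())
```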
CONVERGENCE RESULT: suppose the learning rates satisfy the conditions for three time-scale stochastic approximation, the compatibility condition
∂Q(z, u)/∂φ = (1/π(u | z)) ∂π(u | z)/∂θ
holds, and additional mild technical conditions are met. Then:
the state aggregator converges (with some approximation error);
the critic converges to the best approximator within the specified family;
the actor converges to a local maximizer within the family of policy approximators.
Numerical results: 5 × 5 grid environment
[Plot: performance vs number of samples (up to 1.4 × 10^6), comparing the planning solution, RPG, and the AIS-based learner.]
Numerical results: tiger environment
[Plot: performance vs number of samples (up to 1.4 × 10^6), comparing the planning solution, RPG, and the AIS-based learner.]
Numerical results: cheese maze environment
[Plot: performance vs number of samples (up to 1.4 × 10^6), comparing the planning solution, RPG, and the AIS-based learner.]
Summary
An information state is a compression of the history that is sufficient to predict itself and the output (or the expected reward), and it admits an exact dynamic program.
An (ε, δ)-approximate information state (AIS) relaxes these conditions using an IPM d𝔊 and yields an approximate dynamic program with explicit value and policy error bounds.
The framework recovers error bounds for state aggregation, gives approximation bounds for quantized observations (world models), and gives 𝒪(1/√n) bounds for mean-field teams.
An AIS can be learned from data, leading to an actor-critic RL algorithm with convergence guarantees and numerical results on the grid, tiger, and cheese maze environments.
AIS provides a conceptually clean framework for approximate DP and online RL in partially observed systems.