Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

By Keith Bush and Joelle Pineau, NIPS 2009. Presented by Chenghui Cai, Duke University, ECE, February 19, 2010.

Outline

◮ Introduction to RL
  ◮ Theory and Philosophy Behind RL
  ◮ Markov Models and RL
  ◮ Moving from Theory Study to Real Applications
◮ Background of This Paper
◮ Methods
◮ Experiments
◮ Conclusions and Discussions

RL

◮ Simple philosophy: the agent is rewarded for good behaviors and punished for bad ones.

◮ General training data format: state (situation), action (decision), reward (label).

◮ Purpose of RL: learn a behavior policy.

◮ Foundation: the optimal Bellman equation,

    V*(s_t) = max_{a_t} [ E(r(s_{t+1}) | s_t, a_t) + γ E(V*(s_{t+1}) | s_t, a_t) ]

◮ The most general learning framework, on the spectrum of label explicitness:

    Supervised Learning (most: explicit labels) · RL (rewards) · Unsupervised Learning (least: no labels)
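As a concrete illustration (not from the paper), the optimal Bellman equation can be solved by value iteration on a small tabular MDP; the transition array P and reward array R below are randomly generated toys.

    import numpy as np

    # Toy tabular MDP (hypothetical): P[a, s, s'] transition probabilities,
    # R[a, s, s'] rewards, discount factor gamma.
    n_states, n_actions, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)          # normalize to valid distributions
    R = rng.random((n_actions, n_states, n_states))

    V = np.zeros(n_states)
    for _ in range(1000):
        # Q[a, s] = E[r(s') | s, a] + gamma * E[V(s') | s, a]
        Q = (P * (R + gamma * V)).sum(axis=2)
        V_new = Q.max(axis=0)                  # maximize over actions a
        if np.max(np.abs(V_new - V)) < 1e-8:   # stop at convergence
            break
        V = V_new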

Markov Models and RL

◮ Markov chains, HMMs, MDPs, and POMDPs (adapted from pomdp.org):

    Do we have control over      Are the states completely observable?
    the state transitions?       YES                              NO
    NO                           Markov Chain                     Hidden Markov Model (HMM)
    YES                          Markov Decision Process (MDP)    Partially Observable MDP (POMDP)

◮ Two ways to learn the behavior policy:
  ◮ Model-based: learn the dynamics, then solve the Markov model;
  ◮ Model-free: learn the policy directly, e.g., RPR or Q-learning.

◮ Applications: robotics, decision making under uncertainty.

Background of This Paper

◮ Barriers to application: (1) the goal (reward) is not well defined; (2) exploration is expensive; (3) the data do not preserve the Markov property.

◮ Solution 1: for many domains, particularly those governed by differential equations, leverage the induced locality (nearest neighborhood, e.g., s(t+1) and s(t)) during function approximation to satisfy the Markov property.

◮ Solution 2: reconstruct the state-spaces of partially observable systems, converting a high-order Markov property into a first-order one while preserving locality.

◮ Example: use manifold embeddings to reconstruct locally Euclidean state-spaces of forced, partially observable systems; the embedding can be found non-parametrically.

Summary of the Method

An offline RL procedure in two phases:

◮ Part 1, modeling phase: identify the appropriate embedding and define the local model.

◮ Part 2, learning phase: leverage the resulting locality and perform RL.

Modeling: Manifold Embeddings for RL 1/2

Purpose: use nonlinear dynamical systems theory to reconstruct complete state observability from incomplete observations via delay embeddings.

◮ Assume a real-valued vector space R^M, actions a, a state dynamics function f, and a deterministic policy a(t) = π(s(t)), where s(t) is the state. Then

    s(t+1) = f(s(t), a(t)) = f(s(t), π(s(t))) = φ(s(t))    (1)

◮ Suppose the system is observed only via a function y, such that

    s̃(t) = y(s(t))    (2)

Modeling: Manifold Embeddings for RL 2/2

◮ Construct a vector s_E(t) such that s_E lies on a subset of R^E that is an embedding of s:

    s_E(t) = [s̃(t), s̃(t−1), ..., s̃(t−(E−1))],  E > 2M    (3)

◮ Because embeddings preserve the connectivity of the original vector space R^M, in the context of RL the mapping ψ with

    s_E(t+1) = ψ(s_E(t))    (4)

may be substituted for f, and the vectors s_E(t) may be substituted for the corresponding vectors s(t).
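A minimal sketch of the delay-embedding construction in Eqs. (3) and (5), assuming NumPy and a 1-D array obs of scalar observations s̃(t); delay_embed and its arguments are my naming, not the paper's.

    import numpy as np

    def delay_embed(obs, E, tau=1):
        """Rows are s_E(t) = [s(t), s(t-tau), ..., s(t-(E-1)tau)] for each valid t."""
        obs = np.asarray(obs)
        start = (E - 1) * tau                          # first index with a full history
        rows = [obs[start - k * tau : len(obs) - k * tau] for k in range(E)]
        return np.stack(rows, axis=1)                  # shape: (len(obs) - start, E)

    # Example: embed a scalar observation series with E = 3 and lag tau = 5.
    obs = np.sin(np.linspace(0, 20, 500))
    S_E = delay_embed(obs, E=3, tau=5)                 # S_E.shape == (490, 3)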

Modeling: Nonparametric Identification of Manifold Embeddings 1/2

Remaining problem: how to choose the embedding dimension E. Solution: a singular value decomposition (SVD) algorithm (a code sketch follows the steps below).

◮ Given a sequence of state observations s̃ of length S̃, choose a sufficiently large fixed embedding dimension Ê.

◮ For each embedding window size T̂_min ∈ {Ê, ..., S̃}:

  1. Define a matrix Ŝ_E of row vectors ŝ_E(t), t ∈ {T̂_min, ..., S̃}, by the rule

       ŝ_E(t) = [s̃(t), s̃(t−τ), ..., s̃(t−(Ê−1)τ)]    (5)

  2. Compute the SVD of the matrix Ŝ_E: Ŝ_E = U Σ W*.

  3. Record the vector of singular values σ(T̂_min) from Σ.
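A sketch of this sweep, reusing delay_embed from above; the rule tying the lag τ to the window T̂_min and dimension Ê is my assumption (the slides leave it implicit), and Ê ≥ 2 is assumed.

    def embedding_spectrum(obs, E_hat, window_sizes):
        """Map each candidate window size T to the singular values of its embedding matrix."""
        spectrum = {}
        for T in window_sizes:
            tau = max(1, T // (E_hat - 1))             # assumed: spread E_hat samples over T
            S_hat = delay_embed(obs, E_hat, tau)
            spectrum[T] = np.linalg.svd(S_hat, compute_uv=False)
        return spectrum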

Modeling: Nonparametric Identification of Manifold Embeddings 2/2

◮ Estimate the embedding parameters, T_min and E, of s by analysis of the second singular value σ_2(T̂_min):

  1. The approximate window size, T_min, of s is the T̂_min value at the first local maximum of the sequence of all σ_2(T̂_min), for T̂_min ∈ {Ê, ..., S̃}.

  2. The approximate embedding dimension, E, is the number of non-trivial singular values of σ(T_min).

◮ Embedding s via the parameters T_min and E yields the matrix S_E of row vectors s_E(t), t ∈ {T_min, ..., S̃}.
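Continuing the sketch, the selection step might look as follows; the "non-trivial" threshold trivial_frac is my assumption, since the paper reads E off the spectrum plot.

    def select_embedding_params(spectrum, trivial_frac=0.05):
        """Pick T_min at the first local maximum of sigma_2, then count non-trivial singular values."""
        Ts = sorted(spectrum)
        sigma2 = np.array([spectrum[T][1] for T in Ts])    # second singular value per window
        peaks = [i for i in range(1, len(Ts) - 1)
                 if sigma2[i - 1] < sigma2[i] >= sigma2[i + 1]]
        T_min = Ts[peaks[0]] if peaks else Ts[int(np.argmax(sigma2))]
        sv = spectrum[T_min]
        E = int(np.sum(sv > trivial_frac * sv[0]))         # assumed threshold for "non-trivial"
        return T_min, E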

Modeling: Generative Local Models from Embeddings 1/2

Purpose: generate a local model that can simulate trajectories of the underlying system, and prepare the observed “state” for RL.

◮ Consider a dataset D of temporally aligned sequences of observations s̃(t), actions a(t), and rewards r(t), t ∈ {1, ..., S̃}.

◮ Applying the spectral embedding method above to D yields a sequence of vectors s_E(t), t ∈ {T_min, ..., S̃}.

◮ A local model M of D is the set of 3-tuples m(t) = {s_E(t), a(t), r(t)}, t ∈ {T_min, ..., S̃}.

◮ Define operations on these tuples (a data-structure sketch follows below): A(m(t)) = a(t), S(m(t)) = s_E(t), and Z(m(t)) = s_z(t), where s_z(t) = [s_E(t), a(t)]; also U(M, a) = M_a, where M_a is the subset of tuples in M containing action a.
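A minimal data-structure sketch of M and the four operations; the class and argument names are mine, and S_E, actions, and rewards are assumed to be pre-aligned on t ∈ {T_min, ..., S̃}.

    import numpy as np

    class LocalModel:
        """The local model M: a list of 3-tuples m(t) = (s_E(t), a(t), r(t))."""
        def __init__(self, S_E, actions, rewards):
            self.tuples = list(zip(np.asarray(S_E), actions, rewards))
        def A(self, m): return m[1]                    # action a(t)
        def S(self, m): return m[0]                    # embedded state s_E(t)
        def Z(self, m): return np.append(m[0], m[1])   # s_z(t) = [s_E(t), a(t)]
        def U(self, a):                                # M_a: tuples containing action a
            return [m for m in self.tuples if m[1] == a]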

Modeling: Generative Local Models from Embeddings 2/2

Consider a state vector x(i) ∈ R^E indexed by simulation time i, and compute its locality, i.e., its nearest neighbor in the model.

◮ The model's nearest neighbor of x(i) when taking action a(i), defined for a discrete action set and for the continuous case, respectively:

    m(t_x(i)) = argmin_{m(t) ∈ U(M, a(i))} ‖S(m(t)) − x(i)‖,  a ∈ A    (6)

    m(t_x(i)) = argmin_{m(t) ∈ M} ‖Z(m(t)) − [x(i), ω a(i)]‖,  a ∈ A    (7)

where ω is a scaling parameter.

◮ The model gradient and numerical integration are defined as

    ∇x(i) = S(m(t_x(i) + 1)) − S(m(t_x(i)))    (8)

    x(i+1) = x(i) + Δi (∇x(i) + η)    (9)

where η is a vector of noise and Δi is the integration step size.
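Continuing the LocalModel sketch above, Eq. (6) for discrete actions and the integration step of Eqs. (8)-(9) might look as follows (function names are mine).

    def nearest_index(model, x, a):
        """Eq. (6): index of the tuple nearest to x among tuples with action a."""
        best_i, best_d = None, float("inf")
        for i, m in enumerate(model.tuples):
            if m[1] == a:
                d = np.linalg.norm(model.S(m) - x)
                if d < best_d:
                    best_i, best_d = i, d
        return best_i

    def simulate_step(model, x, a, dt=1.0, noise_scale=0.0):
        """Eqs. (8)-(9): step x along the local gradient of the stored trajectory."""
        t = nearest_index(model, x, a)
        t_next = min(t + 1, len(model.tuples) - 1)     # successor tuple m(t_x + 1)
        grad = model.S(model.tuples[t_next]) - model.S(model.tuples[t])
        eta = noise_scale * np.random.randn(x.size)    # noise vector η
        return x + dt * (grad + eta)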

Q-learning

Consider the obtained sequences x(i) ∈ R^E, a(i), and r(i). Learn the optimal policy π* that maximizes the expected sum of future rewards, via the optimal action-value function (Q-function) Q*, such that

    Q*(x(i), a(i)) = r(i+1) + γ max_a Q*(x(i+1), a)    (10)

Iteratively construct an approximation Q of Q*:

    δ(i) = r(i+1) + γ max_a Q(x(i+1), a) − Q(x(i), a(i))    (11)

where δ(i) is the temporal-difference error used to improve the approximation,

    Q(x(i), a(i)) ← Q(x(i), a(i)) + α δ(i)    (12)

where α is the learning rate. (A minimal sketch of this update follows below.)
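A minimal sketch of the TD update in Eqs. (11)-(12) over a recorded trajectory; the grid discretization of the continuous embedded state is my simplification, and the paper's actual function approximator may differ.

    from collections import defaultdict
    import numpy as np

    def q_learning(trajectory, actions, gamma=0.95, alpha=0.1, bin_width=0.1):
        """trajectory: list of (x, a, r_next) with x in R^E; returns a sparse Q table."""
        Q = defaultdict(float)
        key = lambda x: tuple(np.round(np.asarray(x) / bin_width).astype(int))
        for (x, a, r_next), (x_next, _, _) in zip(trajectory, trajectory[1:]):
            td_target = r_next + gamma * max(Q[(key(x_next), b)] for b in actions)
            delta = td_target - Q[(key(x), a)]         # TD error δ(i), Eq. (11)
            Q[(key(x), a)] += alpha * delta            # update, Eq. (12)
        return Q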

Mountain Car: parking on the hill 1/2

A second-order nonlinear dynamical system. [Figure of the Mountain Car domain omitted; adapted from library.rl-community.org]
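For reference, a sketch of the classic Mountain Car dynamics in the RL-community formulation; the paper's "parking" variant and its reward are not reproduced here. Under partial observability only the position x is observed, so the velocity must be recovered by the delay embedding.

    import math

    def mountain_car_step(x, v, a):
        """One step of the classic dynamics; action a ∈ {-1, 0, +1}."""
        v = max(-0.07, min(0.07, v + 0.001 * a - 0.0025 * math.cos(3 * x)))
        x = x + v
        if x < -1.2:                                   # inelastic wall on the left
            x, v = -1.2, 0.0
        return min(x, 0.6), v

    def observe(x, v):
        """Partial observability: the agent sees only the position."""
        return x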

Mountain Car: parking on the hill 2/2

[Figure omitted: panels plot singular values vs. T_min (sec), embedded trajectories x(t) vs. x(t−τ), and path-to-goal length vs. training samples.]

Figure 1: Learning experiments on Mountain Car under partial observability. (a) Embedding spectrum and accompanying trajectory (E = 3, T_min = 0.70 sec.) under the random policy. (b) Learning performance as a function of embedding parameters and quantity of training data. (c) Embedding spectrum and accompanying trajectory (E = 3, T_min = 0.70 sec.) for the learned policy.

(adapted from the presented NIPS paper)

Neurostimulation Treatment of Epilepsy 1/2

[Figure omitted: (a) example field potentials under control and 0.5/1/2 Hz stimulation (scale: 200 sec, 1 mV); (b) the neurostimulation embedding spectrum, singular values vs. T_min (s); (c) the neurostimulation model plotted in its first three principal components.]

Figure 2: Graphical summary of the modeling phase of our adaptive neurostimulation study. (a) Sample observations from the fixed-frequency stimulation dataset. Seizures are labeled with horizontal lines. (b) The embedding spectrum of the fixed-frequency stimulation dataset. The large maximum of σ_2 at approximately 100 sec. is an artifact of the periodicity of seizures in the dataset. *Detail of the embedding spectrum for T_min = [0.05, 2.0], depicting a maximum of σ_2 at the time-scale of individual stimulation events. (c) The resultant neurostimulation model constructed by embedding the dataset with parameters (E = 3, T_min = 1.05 sec.). Note: the model has been desampled 5× in the plot.

(adapted from the presented NIPS paper)

Neurostimulation Treatment of Epilepsy 2/2

[Figure omitted: field potential traces across four phases, (a) Control Phase, (b) Policy Phase 1, (c) Recovery Phase, (d) Policy Phase 2, with stimulation events marked (scale: 60 sec, 2 mV).]

Figure 3: Field potential trace of a real seizure suppression experiment using a policy learned from simulation. Seizures are labeled as horizontal lines above the traces. Stimulation events are marked by vertical bars below the traces. (a) A control phase used to determine baseline seizure activity. (b) The initial application of the learned policy. (c) A recovery phase to ensure slice viability after stimulation and to recompute baseline seizure activity. (d) The second application of the learned policy. *10 minutes of trace are omitted while the algorithm was reset.

(adapted from the presented NIPS paper)

Conclusions

◮ An integration of existing methods with a new application.

◮ Many potentially interesting real-world applications ahead.
