Reinforcement learning with raw image pixels as input state

  1. Reinforcement learning with raw image pixels as input state. Damien Ernst†, Raphaël Marée, Louis Wehenkel. Department of Electrical Engineering and Computer Science, University of Liège, Belgium. †Postdoctoral Researcher FNRS. IWICPAS, August 2006.

  2. What is reinforcement learning? Reinforcement learning = learning what to do, i.e. how to map states to actions, from information acquired through interaction with a system. Classical setting for reinforcement learning:
  ◮ the reinforcement learning agent wants to minimize a long-term cost signal
  ◮ the information the agent has is a set of samples
  ◮ a sample = (state, action taken while being in this state, instantaneous cost, successor state)
  Reinforcement learning is a promising approach for designing autonomous robots able to fulfill specific tasks (helping disabled persons, cleaning a house, playing soccer, ...).
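As a small illustration of this setting, a sample and the set of samples could be represented as follows; the names and the `system_step` interface are hypothetical, not from the presentation:

```python
from collections import namedtuple

# A sample = (state, action taken in this state, instantaneous cost, successor state).
Sample = namedtuple("Sample", ["state", "action", "cost", "next_state"])

def collect_samples(system_step, initial_states, actions):
    """Build the set of samples F from one-step interactions with the system.

    `system_step(x, u)` is assumed to return (cost, next_state); it stands in
    for the unknown dynamics and cost function the agent interacts with.
    """
    F = []
    for x, u in zip(initial_states, actions):
        c, x_next = system_step(x, u)
        F.append(Sample(x, u, c, x_next))
    return F
```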

  3. Reinforcement learning and visual input. In many practical problems the input state is made of visual percepts. A visual percept is composed of hundreds if not thousands of elements ⇒ may be problematic if used as such as input space. Up to now, it was believed that the components describing the image could not be used as such in a reinforcement learning algorithm ⇒ feature extraction techniques. BUT, two new elements:
  ◮ Recent advances in image classification: it is possible to work directly with image pixels by relying on state-of-the-art supervised learning methods [Marée et al., CVPR 2005].
  ◮ Recent advances in reinforcement learning: the newly introduced fitted Q iteration family of algorithms can exploit the generalization capabilities of any supervised learning method [Ernst et al., JMLR 2005].

  4. Question. If using image pixels directly works in image classification, and since we now have reinforcement learning algorithms that can exploit the generalization capabilities of any supervised learning method, then why not use image pixels directly in reinforcement learning?

  5. Learning from a set of samples: problem formulation (deterministic version).
  Discrete-time dynamics: $x_{t+1} = f(x_t, u_t)$, $t = 0, 1, \ldots$ where $x_t \in X$ and $u_t \in U$.
  Cost function: $c(x, u) : X \times U \to \mathbb{R}$, with $c(x, u)$ bounded by $B_c$. Instantaneous cost: $c_t = c(x_t, u_t)$.
  Discounted infinite-horizon cost associated with a stationary policy $\mu : X \to U$: $J^{\mu}(x) = \lim_{N \to \infty} \sum_{t=0}^{N-1} \gamma^t c(x_t, \mu(x_t))$, where $\gamma \in [0, 1[$.
  Optimal stationary policy $\mu^*$: the policy that minimizes $J^{\mu}$ for all $x$.
  Objective: find an optimal policy $\mu^*$.
  We do not know: the discrete-time dynamics and the cost function. We know instead a set of system transitions: $F = \{(x_t^l, u_t^l, c_t^l, x_{t+1}^l)\}_{l=1}^{\#F}$.

  6. Some dynamic programming results.
  The sequence of state-action value functions $Q_N : X \times U \to \mathbb{R}$,
  $Q_N(x, u) = c(x, u) + \gamma \min_{u' \in U} Q_{N-1}(f(x, u), u') \quad \forall N > 1$,
  with $Q_1(x, u) \equiv c(x, u)$, converges to the $Q$-function, unique solution of the Bellman equation:
  $Q(x, u) = c(x, u) + \gamma \min_{u' \in U} Q(f(x, u), u')$.
  Necessary and sufficient optimality condition: $\mu^*(x) \in \arg\min_{u \in U} Q(x, u)$.
  Suboptimal stationary policy $\mu_N^*$: $\mu_N^*(x) \in \arg\min_{u \in U} Q_N(x, u)$.
  Bound on $\mu_N^*$: $J^{\mu_N^*} - J^{\mu^*} \le \frac{2 \gamma^N B_c}{(1 - \gamma)^2}$.
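To make the recursion concrete, here is a minimal sketch of the exact $Q_N$ iteration on a tiny deterministic toy problem; the chain dynamics, cost function and discount factor are invented for illustration and do not come from the slides:

```python
# Toy deterministic chain: states 0..4, state 4 is the goal.
states = list(range(5))
actions = [-1, +1]                 # move left or right
gamma = 0.9                        # assumed discount factor

def f(x, u):                       # deterministic dynamics x_{t+1} = f(x_t, u_t)
    return min(max(x + u, 0), 4)

def c(x, u):                       # cost: zero only when the goal is reached
    return 0.0 if f(x, u) == 4 else 1.0

# Q_1(x, u) = c(x, u);  Q_N(x, u) = c(x, u) + gamma * min_{u'} Q_{N-1}(f(x, u), u')
Q = {(x, u): c(x, u) for x in states for u in actions}
for _ in range(2, 50):             # iterating long enough approaches the Bellman fixed point
    Q = {(x, u): c(x, u) + gamma * min(Q[(f(x, u), up)] for up in actions)
         for x in states for u in actions}

# Greedy policy mu*_N extracted from Q_N
policy = {x: min(actions, key=lambda u: Q[(x, u)]) for x in states}
print(policy)                      # every state picks +1, i.e. moves toward the goal
```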

  7. Fitted Q iteration.
  Fitted Q iteration computes from $F$ the functions $\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_N$, approximations of $Q_1, Q_2, \ldots, Q_N$. The computation is done iteratively by solving a sequence of standard supervised learning problems. The training sample for the $k$th ($k \ge 1$) problem is
  $\left\{ \left( (x_t^l, u_t^l),\; c_t^l + \gamma \min_{u \in U} \hat{Q}_{k-1}(x_{t+1}^l, u) \right) \right\}_{l=1}^{\#F}$
  with $\hat{Q}_0(x, u) \equiv 0$. From the $k$th training sample, the supervised learning algorithm outputs $\hat{Q}_k$. Finally, $\hat{\mu}_N^*(x) \in \arg\min_{u \in U} \hat{Q}_N(x, u)$ is taken as an approximation of $\mu^*(x)$.
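A hedged sketch of this loop is given below, assuming the transitions of $F$ are stored as NumPy arrays and using scikit-learn's ExtraTreesRegressor to stand in for the Extra-Trees learner used in the slides; the array layout, helper names and hyper-parameters are assumptions:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(X, U, C, X_next, action_set, gamma=0.95, n_iterations=10):
    """X, U, C, X_next: one row (or entry) per transition l in F.

    Returns Q_hat_N, the approximation of the N-step state-action value function.
    """
    n = len(X)
    q_hat = None                                    # Q_hat_0(x, u) = 0
    inputs = np.hstack([X, U.reshape(n, 1)])        # regression inputs (x_t, u_t)
    for _ in range(n_iterations):
        if q_hat is None:
            targets = C.copy()                      # first iteration: targets are c_t
        else:
            # Targets: c_t + gamma * min_u Q_hat_{k-1}(x_{t+1}, u)
            q_next = np.column_stack([
                q_hat.predict(np.hstack([X_next, np.full((n, 1), u)]))
                for u in action_set
            ])
            targets = C + gamma * q_next.min(axis=1)
        q_hat = ExtraTreesRegressor(n_estimators=50).fit(inputs, targets)
    return q_hat

def greedy_action(q_hat, x, action_set):
    """mu_hat*_N(x): the action minimizing Q_hat_N(x, u)."""
    values = [q_hat.predict(np.hstack([x, [u]]).reshape(1, -1))[0] for u in action_set]
    return action_set[int(np.argmin(values))]
```

Refitting a fresh regressor at each iteration, rather than updating parameters incrementally, is what allows any supervised learning method to be plugged in.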

  8. Fitted Q iteration: some remarks. The performance of the algorithm depends on the supervised learning method chosen. Excellent performance has been observed when it is combined with supervised learning methods based on ensembles of regression trees. The algorithm also works for stochastic systems. Consistency can be ensured under appropriate assumptions on the supervised learning method, the sampling process, the system dynamics and the cost function.

  9. Our experimental protocol: test problem. A navigation task on a square where the position $p = (p(0), p(1))$ ranges over $[0, 100] \times [0, 100]$. Each action moves the agent by 25 units; for example, for the action "go up", $p_{t+1}(1) = \min(p_t(1) + 25, 100)$. The cost $c(p, u)$ takes the values $0$, $-1$ or $-2$ depending on the region of the square reached. [Figure: the navigation square with its cost regions and, for a position $p_t$, the observation image seen by the agent at that position.] The agent does not receive $p_t$ directly: its state input is $pixels(p_t)$, the $30 \times 30$-element vector such that the grey level of the pixel located at the $i$th line and $j$th column of the observation image is the $(30 * i + j)$th element of this vector, with grey levels in $\{0, 1, \cdots, 255\}$.
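As an illustration of this encoding, the sketch below renders a position $p$ into a 30 × 30 grey-level image and flattens it so that the pixel at line $i$, column $j$ becomes element $30 * i + j$; the actual image content (background and agent-marker grey levels) is a placeholder, not taken from the slide figure:

```python
import numpy as np

IMG_SIZE = 30               # the observation image is 30 x 30 pixels

def pixels(p, world_size=100.0):
    """Flatten the observation image seen at position p = (p(0), p(1)) into a
    900-element vector, element 30*i + j holding the grey level of pixel (i, j).

    The image content below (uniform background plus a darker square around
    the agent) is only a stand-in for the navigation images of the slides.
    """
    img = np.full((IMG_SIZE, IMG_SIZE), 186, dtype=np.uint8)   # placeholder background
    i = int(p[1] / world_size * (IMG_SIZE - 1))                # image line for p(1)
    j = int(p[0] / world_size * (IMG_SIZE - 1))                # image column for p(0)
    img[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2] = 103        # darker patch at the agent
    return img.flatten()    # row-major flattening gives element 30*i + j
```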

  10. Framework parameters.
  Four-tuples generation ($\#F = n$):
  ◮ We repeat $n$ times the sequence of instructions: 1. draw $p_0$ at random in $P$ and $u_0$ at random in $U$; 2. observe $c_0$ and $p_1$; 3. add $(pixels(p_0), u_0, c_0, pixels(p_1))$ to $F$.
  Fitted Q iteration algorithm:
  ◮ $\hat{Q}_k$ computed with Extra-Trees [Geurts et al., Machine Learning 2006]
  ◮ Number of iterations $N = 10$
  ◮ Approximation of the optimal policy: $\hat{\mu}_{10}^*(x) = \arg\min_{u} \hat{Q}_{10}(x, u)$
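Putting the protocol together, a minimal sketch of the four-tuple generation loop follows; `environment_step`, the action encoding and the sampling ranges are assumptions, and `pixels` and `fitted_q_iteration` refer to the earlier sketches:

```python
import numpy as np

ACTIONS = [0, 1, 2, 3]            # e.g. go up / down / left / right (assumed encoding)

def generate_four_tuples(n, environment_step, rng=np.random.default_rng(0)):
    """Repeat n times: draw p0 and u0 at random, observe c0 and p1,
    then store (pixels(p0), u0, c0, pixels(p1))."""
    X, U, C, X_next = [], [], [], []
    for _ in range(n):
        p0 = rng.uniform(0.0, 100.0, size=2)          # random position in P
        u0 = rng.choice(ACTIONS)                      # random action in U
        c0, p1 = environment_step(p0, u0)             # unknown dynamics + cost
        X.append(pixels(p0)); U.append(u0)
        C.append(c0); X_next.append(pixels(p1))
    return np.array(X), np.array(U), np.array(C), np.array(X_next)

# Example run with N = 10 iterations, as in the slides (gamma is an assumption):
# X, U, C, Xn = generate_four_tuples(2000, environment_step)
# q10 = fitted_q_iteration(X, U, C, Xn, ACTIONS, gamma=0.95, n_iterations=10)
```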

  11. Results. [Figure: (a) $\hat{\mu}_{10}^*$ computed from 500 system transitions; (b) $\hat{\mu}_{10}^*$ computed from 2000 system transitions; (c) $\hat{\mu}_{10}^*$ computed from 8000 system transitions; (d) score $J^{\hat{\mu}_{10}^*}$ versus the number of system transitions $\#F$ (1000 to 9000), with $pixels(p)$ used as state input compared to $p$ used as state input and to the optimal score $J^{\mu^*}$.]

  12. Influence of the navigation image characteristics. [Figure: evolution of the score $J^{\hat{\mu}_{10}^*}$ with the size of the constant grey-level tiles ($1 \times 1$, $5 \times 5$, $10 \times 10$, $20 \times 20$, $50 \times 50$, $100 \times 100$), compared with the optimal score $J^{\mu^*}$; 2000 system samples. For the largest tiles the system is annotated as partially observable.]

  13. Conclusions. We have applied a new reinforcement learning algorithm known as fitted Q iteration to the problem of navigation from visual percepts, with the raw pixels as state inputs. Good results were obtained even though, in such conditions, the information is spread over a large number of low-level input variables ⇒ this questions the need for still going through a feature extraction phase. The characteristics of the images the agent gets as input states have a strong influence on the quality of learning.

  14. References
  ◮ "Tree-based batch mode reinforcement learning". D. Ernst, P. Geurts and L. Wehenkel. Journal of Machine Learning Research, Volume 6, pages 503-556, April 2005.
  ◮ "Random subwindows for robust image classification". R. Marée, P. Geurts, J. Piater and L. Wehenkel. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Volume 1, pages 34-40, June 2005.
  ◮ "Extremely randomized trees". P. Geurts, D. Ernst and L. Wehenkel. Machine Learning, Volume 63, Number 1, pages 3-42, 2006.
  ◮ "Reinforcement learning with raw pixels as state input". D. Ernst, R. Marée and L. Wehenkel. International Workshop on Intelligent Computing in Pattern Analysis/Synthesis (IWICPAS), Lecture Notes in Computer Science, Volume 4153, pages 446-454, August 2006.
