Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm

Hung Ngo, Matthew Luciw, Alexander Forster, and Juergen Schmidhuber

IDSIA-SUPSI-USI, Lugano, Switzerland {hung, matthew, alexander, juergen}@idsia.ch

IEEE International Joint Conference on Neural Networks (June 15, 2012)


Outline

Introduction
Progress-Based Artificial Curiosity
System Architecture
Experiments and Results
Conclusion

Introduction

Learning from Play

Developmental robotics: lessons from children.
Intrinsically motivated playing: no external rewards!!!
Constructive play with manipulation skills.

(image from www.safekidscanada.ca)

Artificial Curiosity? A Theory: Compression Progress


Juergen Schmidhuber (1990-now): A creative agent needs two learning components: a Reinforcement Learner R and a Predictor P.
The learning progress, or expected improvement, of P becomes an intrinsic reward for R.
Hence, to achieve high intrinsic reward, R is motivated to create new experiences such that P makes quick progress.

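To make the loop concrete, here is a minimal, self-contained Python sketch of progress-based intrinsic reward: the predictor P is updated on each experience, and the drop in its prediction error is the only reward handed to the reinforcement learner R. The toy table-based predictor and all names are illustrative assumptions, not the paper's implementation.

    import numpy as np

    class Predictor:
        """Toy predictor P: running estimate of the outcome probability
        for each (state, action) pair."""
        def __init__(self, n_states, n_actions):
            self.p = np.full((n_states, n_actions), 0.5)  # initial guess
            self.n = np.zeros((n_states, n_actions))      # visit counts

        def error(self, s, a, outcome):
            return (outcome - self.p[s, a]) ** 2

        def update(self, s, a, outcome):
            self.n[s, a] += 1
            self.p[s, a] += (outcome - self.p[s, a]) / self.n[s, a]

    def intrinsic_reward(P, s, a, outcome):
        """Learning progress of P on one experience: error before the
        update minus error after it. This is R's only reward signal."""
        before = P.error(s, a, outcome)
        P.update(s, a, outcome)
        after = P.error(s, a, outcome)
        return max(0.0, before - after)

    P = Predictor(n_states=3, n_actions=6)
    r = intrinsic_reward(P, s=1, a=3, outcome=1.0)  # large while P is still learning

The reward fades as P masters a situation, so R is pushed on toward experiences it can still learn from.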


Details of the System

Top-down image of the workspace

Innate Knowledge and Skills

Pick a selected block.
Place it at a selected location (X, Y coordinates & height).
Outcome concepts “Stable/Unstable”.
But, what to pick and where to place?



Top-down image of the workspace – Cropped


Workspace – Boundaries Extraction


Receptive Field: Observation Feature Extraction


A receptive field is centered on each potential placement location, and a binary feature vector F is filled in cell by cell:

F=(0,.,.,.,.) → F=(0,0,.,.,.) → F=(0,0,1,.,.) → F=(0,0,1,0,.) → F=(0,0,1,0,0)

s=1: height 1; a=1: F has 1 bit set.

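As a rough illustration of this step, the sketch below builds F from a grid height map. The 5-cell layout and the occupancy test are assumptions for illustration; the paper's exact receptive-field geometry is not reproduced here.

    import numpy as np

    # Assumed 5-cell receptive field: the candidate cell plus 4 neighbours.
    OFFSETS = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))

    def receptive_field_features(height_map, x, y, h):
        """Binary feature vector F for placing a block at (x, y) at height h:
        each bit says whether the corresponding cell already holds a block
        at height h or above (semantics assumed for illustration)."""
        F = []
        for dx, dy in OFFSETS:
            nx, ny = x + dx, y + dy
            inside = (0 <= nx < height_map.shape[0]
                      and 0 <= ny < height_map.shape[1])
            F.append(1 if inside and height_map[nx, ny] >= h else 0)
        s = h                    # state label: candidate height
        a = int(np.sum(F))       # action label: number of set bits
        return np.array(F), s, a

For the slide's example, a single occupied neighbour at height 1 yields F=(0,0,1,0,0) with s=1 and a=1.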

Receptive Field: Fovea-like Subimage Observations

Example fovea-like subimage observations with their state-action labels: (s0,a0), (s1,a5), (s2,a1), (s2,a5).


Predictors Pi: Learning how the world works


Each Pi predicts a basic physical concept: whether the placed block will stay there after being released.
Self-generated labels (Stable ≡ +1, Unstable ≡ −1).
Implemented as RLS¹-based online linear classifiers [4].
Extended to also give a confidence interval for each prediction.
The learning progress, calculated as confidence improvement, is used as the intrinsic reward.

¹Regularized Least Squares
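A minimal sketch of one such predictor, assuming a standard recursive least-squares update (Sherman-Morrison) and an ellipsoidal confidence width; the actual estimator follows [4] and may differ in detail.

    import numpy as np

    class RLSClassifier:
        """Online RLS-based linear classifier with a simple confidence
        term; a sketch in the spirit of [4], not the exact implementation."""
        def __init__(self, dim, reg=1.0):
            self.w = np.zeros(dim)          # weight vector
            self.A_inv = np.eye(dim) / reg  # inverse of regularized Gram matrix

        def update(self, x, y):
            """Rank-one update for one example x with label y in {-1, +1}."""
            Ax = self.A_inv @ x
            k = Ax / (1.0 + x @ Ax)         # gain vector (Sherman-Morrison)
            self.w += k * (y - self.w @ x)
            self.A_inv -= np.outer(k, Ax)

        def predict(self, x):
            """Return the score (sign = Stable/Unstable) and a confidence
            width that shrinks as similar inputs accumulate."""
            return float(self.w @ x), float(np.sqrt(x @ self.A_inv @ x))

The decrease of the confidence width after an update is one natural reading of the "confidence improvement" used as intrinsic reward.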


Reinforcement Learner R: Planning Exploration


R creates policies on the learned MDP, using the approximated transition probabilities as the model P and the current learning progress associated with each state-action pair as the expected reward R.
The policy is updated through least-squares policy iteration (LSPI).
This curiosity-driven exploration policy tries to improve the agent's knowledge, i.e., to improve the predictors' performance as quickly as possible.
As a byproduct, the agent also improves its skills.
Again: no external rewards!!!

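The paper plans with LSPI; as a simpler stand-in for that planning step, here is a sketch of value iteration on the learned MDP, with smoothed transition counts as the model and per-(s, a) learning progress as the reward. The count-based model and all names are assumptions.

    import numpy as np

    def plan_exploration_policy(T_counts, progress, gamma=0.9, iters=100):
        """Greedy curiosity-driven policy on the learned MDP.
        T_counts[s, a, s']: observed transition counts.
        progress[s, a]: current learning-progress estimate, used as reward.
        Note: the paper uses LSPI here; plain value iteration is shown
        as a simpler stand-in for the planning step."""
        T = T_counts + 1.0                    # smoothed counts
        T = T / T.sum(axis=2, keepdims=True)  # transition probabilities
        V = np.zeros(T.shape[0])
        for _ in range(iters):
            Q = progress + gamma * (T @ V)    # Q[s, a] under current V
            V = Q.max(axis=1)
        return Q, Q.argmax(axis=1)            # Q-values and greedy policy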


Where to Place: Most Informative Action Selection

(s*, a*) = argmax Q(s, a), subject to (s, a) being contained in the world model.

Example: the search over candidate placements settles on (s1, a3).
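A sketch of this constrained selection, assuming the Q-values from the planner above and a list of the state-action pairs currently realizable in the world model:

    def most_informative_placement(Q, available):
        """(s*, a*) = argmax Q(s, a) over the placements that are actually
        realizable in the current block configuration."""
        return max(available, key=lambda sa: Q[sa])

    # e.g. most_informative_placement(Q, [(1, 0), (1, 1), (1, 2), (1, 3)])
    # could return (1, 3), the slide's (s1, a3).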

Where to Place: Action Selection Example

The fovea scans the candidate placements in state s1, evaluating (s1,a0), (s1,a1), (s1,a2), ... while searching for (s1,a3). Once it reaches (s1,a3): best placement location found!


Experiments and Results

Simulated Block World

[Figure: deviation from the true model (squared error) vs. interactions with the environment (up to 2000); max height = 8; methods: Random Actions, Try New Things (Optimistic), Curiosity.]

We compare the exploration efficiency of our method to random action selection and “optimistic initialization” [5].


Real World Experiments


“High impact” demo video.


[Figure: approximate probability of stability (“Apx. Prob. Stability”) over ~40 interactions, one panel per action (Actions 1-6), with separate curves for Height 1 and Height 2.]

Predictive knowledge gained through playing experiences.

(Initially the estimates change rapidly as new data comes in, but most seem to converge to a sensible result.)

[Figure: Katana robot experience; height placed upon (1 or 2) vs. interactions with the environment (up to 60).]

Developmental stages.
Tower building as emergent behavior.

Conclusion

Learning from Play


Progress-based exploration may lead to efficient knowledge acquisition (without external rewards).
Skills can be accumulated as a by-product.
The agent progresses toward increasingly complex knowledge and skills.

Main References

[1] J. Schmidhuber. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2):173–187, 2006.
[2] J. Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE TAMD, 2(3):230–247, 2010.
[3] J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement driven information acquisition in non-deterministic environments. In ICANN, volume 2, pages 159–164, 1995.
[4] V. Vovk. Competitive on-line statistics. International Statistical Review, 69(2):213–248, 2001.
[5] R.I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR, 3:213–231, 2003.


Thank you!!!

More questions: hung@idsia.ch


Backup Slides
