Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm
Hung Ngo, Matthew Luciw, Alexander Forster, and Juergen Schmidhuber
IDSIA-SUPSI-USI, Lugano, Switzerland
{hung, matthew, alexander, juergen}@idsia.ch
IEEE International Joint Conference on Neural Networks (June 15, 2012)
Hung Ngo et al (IDSIA) Learning Skills from Play IJCNN 2012 1 / 22
Outline
- Introduction
- Progress-Based Artificial Curiosity
- System Architecture
- Experiments and Results
- Conclusion
Introduction: Learning from Play
Learning from Play (Introduction)
- Developmental robotics: lessons from children
- Intrinsically motivated play: no external rewards
- Constructive play with manipulation skills (image from www.safekidscanada.ca)
Artificial Curiosity? A Theory: Compression Progress (Introduction)
- Juergen Schmidhuber (1990-now): a creative agent needs two learning components, a Reinforcement Learner R and a Predictor P.
- The learning progress, or expected improvement, of P becomes an intrinsic reward for R.
- Hence, to achieve high intrinsic reward, R is motivated to create new experiences on which P makes quick progress.
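The reward mechanism above can be illustrated with a toy Python sketch. It is not the system from these slides: the predictor P here is a hypothetical running-mean estimator per context (`RunningMeanPredictor`), and learning progress is assumed to be the drop in P's squared error on a new experience, measured just before and just after P learns from it.

```python
from collections import defaultdict


class RunningMeanPredictor:
    """Hypothetical predictor P: predicts the mean outcome seen so far per context."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def predict(self, context):
        n = self.counts[context]
        return self.sums[context] / n if n else 0.0

    def update(self, context, outcome):
        self.sums[context] += outcome
        self.counts[context] += 1


def intrinsic_reward(predictor, context, outcome):
    """Assumed reward for R: P's squared error before learning minus after learning."""
    error_before = (predictor.predict(context) - outcome) ** 2
    predictor.update(context, outcome)
    error_after = (predictor.predict(context) - outcome) ** 2
    return error_before - error_after


p = RunningMeanPredictor()
print(intrinsic_reward(p, "tower", 1.0))  # 1.0: P learned something new
print(intrinsic_reward(p, "tower", 1.0))  # 0.0: already predictable, nothing to learn
```

Repeating an experience that P already predicts perfectly yields zero reward, which is what pushes R toward experiences where P can still improve.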
Details of the System: Top-down image of the workspace
Innate Knowledge and Skills (Details of the System)
- Pick a selected block
- Place it at a selected location (X, Y coordinates and height)
- Outcome concepts: “Stable” / “Unstable”
But what should the robot pick, and where should it place it?
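One way to make the innate action and outcome concepts concrete is the hypothetical data model below. The names `PlaceAction`, `Outcome`, and `label_after_release` are illustrative assumptions, as are the field types; the slides do not specify coordinate units or how a block is identified.

```python
from dataclasses import dataclass
from enum import IntEnum


class Outcome(IntEnum):
    """The two outcome concepts, encoded as the +1/-1 labels used by the predictors."""
    STABLE = 1
    UNSTABLE = -1


@dataclass(frozen=True)
class PlaceAction:
    """Hypothetical parameters of one innate pick-and-place action."""
    block_id: int   # which block to pick (identifier scheme is an assumption)
    x: float        # placement X coordinate (units not given in the slides)
    y: float        # placement Y coordinate
    height: int     # placement height, assumed here to be in block levels


def label_after_release(stayed_in_place: bool) -> Outcome:
    """Self-generated label: stable if the released block stayed where it was placed."""
    return Outcome.STABLE if stayed_in_place else Outcome.UNSTABLE


action = PlaceAction(block_id=0, x=0.25, y=0.40, height=1)
print(action, int(label_after_release(False)))
```

Freezing the dataclass makes each action hashable, so actions can serve directly as dictionary keys when logging which placements were tried.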
Details of the System: Top-down image of the workspace (cropped)
Details of the System: Workspace boundary extraction
Receptive Field: Observation Feature Extraction (Details of the System)
[Figure: receptive field around a potential placement location; the feature vector F is filled in one element at a time: F=(0,.,.,.,.), F=(0,0,.,.,.), F=(0,0,1,.,.), F=(0,0,1,0,.), F=(0,0,1,0,0)]
- s=1: the location is at height 1
- a=1: F has 1 bit set
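A minimal sketch of the feature extraction, under stated assumptions: the workspace is a grid height map (integer block heights, 0 = bare table), the receptive field is five sample points in a row around the candidate location, and bit i of F is assumed to be 1 when sample i is a block surface at the same height s as the location. The slides only fix the meaning of s (height) and a (number of set bits in F); `extract_observation` and `OFFSETS` are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical receptive field: five sample points along one grid row,
# with the candidate location itself at index 2 (matching the 5-element F).
OFFSETS: Tuple[Tuple[int, int], ...] = ((0, -2), (0, -1), (0, 0), (0, 1), (0, 2))


def extract_observation(height_map: Sequence[Sequence[int]], row: int, col: int,
                        offsets: Sequence[Tuple[int, int]] = OFFSETS):
    """Return (F, s, a) for the candidate location at (row, col).

    s: height of the candidate location (0 = bare table).
    F: one bit per sample point; assumed to be 1 when that point is a block
       surface at the same height s (so at table level no bit is ever set).
    a: number of bits set in F.
    """
    rows, cols = len(height_map), len(height_map[0])
    s = height_map[row][col]
    bits = []
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        # Points outside the grid are treated as bare table (height 0).
        h = height_map[r][c] if 0 <= r < rows and 0 <= c < cols else 0
        bits.append(1 if s > 0 and h == s else 0)
    return tuple(bits), s, sum(bits)


# One block (height 1) in the middle of an otherwise empty row.
row_of_cells = [[0, 0, 1, 0, 0]]
print(extract_observation(row_of_cells, 0, 2))  # ((0, 0, 1, 0, 0), 1, 1)
```

With a single block at height 1 in the middle of the row, this reproduces the slide's example: F=(0,0,1,0,0), s=1, a=1.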
Receptive Field: Fovea-like Subimage Observations (Details of the System)
[Figure: five candidate placement locations, each labeled with the observation (s0, a0)]
Receptive Field: Fovea-like Subimage Observations (Details of the System, continued)
[Figure: candidate placement locations labeled (s1, a5), (s2, a1), and (s2, a5)]
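The (s, a) labels give a compact way to talk about “where to place”. The sketch below is an assumption-laden illustration, not the authors' learner: it groups candidate locations by label and chooses a label epsilon-greedily by a running estimate of learning progress, treating untried labels as maximally interesting. `choose_placement`, `update_progress`, `group_by_label`, and the grid-cell locations are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Label = Tuple[int, int]     # (s, a) observation label, e.g. (2, 1) for "s2,a1"
Location = Tuple[int, int]  # hypothetical grid cell of a candidate placement


def group_by_label(labels: Dict[Location, Label]) -> Dict[Label, List[Location]]:
    """Collect all candidate locations that share the same (s, a) observation."""
    groups: Dict[Label, List[Location]] = defaultdict(list)
    for location, label in labels.items():
        groups[label].append(location)
    return dict(groups)


def choose_placement(labels: Dict[Location, Label],
                     progress: Dict[Label, float],
                     epsilon: float = 0.1,
                     rng: Optional[random.Random] = None) -> Tuple[Label, Location]:
    """Epsilon-greedy choice by estimated learning progress (an assumed policy)."""
    rng = rng or random.Random()
    groups = group_by_label(labels)
    available = sorted(groups)
    if rng.random() < epsilon:
        label = rng.choice(available)  # explore: random observation type
    else:
        # Exploit: untried labels count as maximally interesting (assumption).
        label = max(available, key=lambda lab: progress.get(lab, float("inf")))
    return label, rng.choice(groups[label])


def update_progress(progress: Dict[Label, float], label: Label,
                    reward: float, beta: float = 0.3) -> None:
    """Running average of the intrinsic reward observed for each label."""
    if label not in progress:
        progress[label] = reward
    else:
        progress[label] += beta * (reward - progress[label])


labels = {(0, 0): (0, 0), (0, 1): (0, 0), (3, 4): (2, 1), (5, 5): (2, 5), (6, 6): (1, 5)}
progress = {(0, 0): 0.0, (2, 1): 0.5, (2, 5): 0.1, (1, 5): 0.2}
print(choose_placement(labels, progress, epsilon=0.0))  # ((2, 1), (3, 4))
```

The bare-table label (s0, a0) has zero estimated progress here, so a purely greedy choice skips it in favor of the label whose outcomes are still being learned.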
Predictors P_i: Learning how the world works (Details of the System)
- Each P_i predicts a basic physical concept: whether the placed block will stay where it is after being released.
- Self-generated labels (Stable ≡ +1 / Unstable ≡ −1)
- Implemented as online linear classifiers based on RLS (Regularized Least Squares) [4].
- Extended to also give a confidence interval for each prediction.
- The learning progress, calculated as confidence improvement, is used as intrinsic reward.
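A minimal sketch of such a predictor, assuming a standard ridge-regression (regularized least squares) formulation updated online with the Sherman-Morrison formula. The confidence half-width alpha·sqrt(xᵀA⁻¹x) and the definition of progress as the shrinkage of that half-width at the input just learned are assumptions; the exact construction in [4] may differ. `OnlineRLSClassifier` and its parameters `lam` and `alpha` are hypothetical names.

```python
import math
from typing import List, Sequence


class OnlineRLSClassifier:
    """Hypothetical online RLS linear classifier with a confidence half-width.

    Labels are +1 (stable) / -1 (unstable). A_inv tracks (lam*I + sum x x^T)^-1,
    updated with the Sherman-Morrison formula; b tracks sum y*x.
    """

    def __init__(self, dim: int, lam: float = 1.0, alpha: float = 1.0):
        self.A_inv = [[(1.0 / lam) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim
        self.alpha = alpha  # assumed width multiplier for the confidence interval

    def _A_inv_times(self, x: Sequence[float]) -> List[float]:
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def predict(self, x: Sequence[float]):
        """Return (margin, half_width); sign(margin) is the predicted label."""
        w = self._A_inv_times(self.b)  # ridge solution w = A^-1 b
        margin = sum(wi * xi for wi, xi in zip(w, x))
        v = self._A_inv_times(x)
        half_width = self.alpha * math.sqrt(sum(vi * xi for vi, xi in zip(v, x)))
        return margin, half_width

    def update(self, x: Sequence[float], y: float) -> None:
        v = self._A_inv_times(x)  # A^-1 x (A^-1 is symmetric)
        denom = 1.0 + sum(vi * xi for vi, xi in zip(v, x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        for i in range(n):
            self.b[i] += y * x[i]

    def learn_with_progress(self, x: Sequence[float], y: float) -> float:
        """Update on (x, y) and return the confidence improvement at x."""
        _, width_before = self.predict(x)
        self.update(x, y)
        _, width_after = self.predict(x)
        return width_before - width_after


clf = OnlineRLSClassifier(dim=2)
print(clf.learn_with_progress([1.0, 0.0], +1))
print(clf.learn_with_progress([1.0, 0.0], +1))
```

Repeated experiences at the same input shrink the interval by ever smaller amounts, so the intrinsic reward for that situation decays as P becomes confident.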