
Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm



  1. Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm
     - Authors: Hung Ngo, Matthew Luciw, Alexander Forster, and Juergen Schmidhuber
     - Affiliation: IDSIA-SUPSI-USI, Lugano, Switzerland
     - Contact: {hung, matthew, alexander, juergen}@idsia.ch
     - Venue: IEEE International Joint Conference on Neural Networks (June 15, 2012)
     - Slide footer: Hung Ngo et al (IDSIA), Learning Skills from Play, IJCNN 2012

  2. Outline
     - Introduction
     - Progress-Based Artificial Curiosity
     - System Architecture
     - Experiments and Results
     - Conclusion

  3. Learning from Play (Introduction)

  5. Learning from Play (Introduction)
     - Developmental robotics: lessons from children
     - Intrinsically motivated playing: no external rewards!
     - Constructive play with manipulation skills (image from www.safekidscanada.ca)

  8. Artificial Curiosity? A Theory: Compression Progress (Introduction)
     - Juergen Schmidhuber (1990-now): a creative agent needs two learning components, a Reinforcement Learner R and a Predictor P.
     - The learning progress, or expected improvement, of P becomes an intrinsic reward for R.
     - Hence, to achieve high intrinsic reward, R is motivated to create new experiences such that P makes quick progress.
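The interplay between R and P described on this slide can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the system from the talk: the class names, the tabular predictor, the epsilon-greedy learner, and the use of per-sample error reduction as "learning progress" are all hypothetical choices made for illustration.

```python
import random


class Predictor:
    """P: keeps a running estimate of each action's outcome (toy assumption)."""

    def __init__(self, n_actions, lr=0.5):
        self.est = [0.0] * n_actions
        self.lr = lr

    def update(self, action, outcome):
        """Learn from one sample and return the learning progress,
        taken here as error before the update minus error after it."""
        err_before = abs(outcome - self.est[action])
        self.est[action] += self.lr * (outcome - self.est[action])
        err_after = abs(outcome - self.est[action])
        return err_before - err_after


class ReinforcementLearner:
    """R: epsilon-greedy over a running average of intrinsic reward."""

    def __init__(self, n_actions, eps=0.1, lr=0.3, seed=0):
        self.q = [1.0] * n_actions  # optimistic start so every action is tried
        self.eps = eps
        self.lr = lr
        self.rng = random.Random(seed)

    def act(self):
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def learn(self, action, reward):
        self.q[action] += self.lr * (reward - self.q[action])


def play(outcomes, steps=50):
    """Run the curiosity loop; the only reward R sees is P's progress."""
    p = Predictor(len(outcomes))
    r = ReinforcementLearner(len(outcomes))
    rewards = []
    for _ in range(steps):
        a = r.act()
        progress = p.update(a, outcomes[a])  # intrinsic reward
        r.learn(a, progress)
        rewards.append(progress)
    return p, rewards


if __name__ == "__main__":
    predictor, rewards = play([1.0, -1.0])
    print(predictor.est, sum(rewards[:10]), sum(rewards[-10:]))
```

In this toy setting the intrinsic reward shrinks as P learns each action's outcome, so an experience stops being rewarding once it has become predictable.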

  11. Top-down image of the workspace (Details of the System)

  12. Innate Knowledge and Skills (Details of the System)
     - Pick a selected block
     - Place at a selected location (X, Y coordinates and height)
     - Outcome concepts: "Stable/Unstable"
     - But what to pick, and where to place?

  17. Top-down image of the workspace, cropped (Details of the System)

  18. Workspace: Boundaries Extraction (Details of the System)

  19. Receptive Field: Observation Features Extraction (Details of the System)
     - Figure labels: "potential placement location", "Receptive Field"
     - The feature vector F is shown being filled in one entry at a time: F=(0,.,.,.,.), F=(0,0,.,.,.), F=(0,0,1,.,.), F=(0,0,1,0,.), F=(0,0,1,0,0)
     - Resulting observation for F=(0,0,1,0,0): s=1 (height 1), a=1 (F has 1 bit set)
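The mapping from a receptive-field vector to an observation can be sketched in a few lines. This assumes, based only on the labels "s=1 : height 1" and "a=1 : F has 1 bit set", that s is the height at the placement location and a is the number of set bits in F; the function name and argument layout are hypothetical.

```python
def observation(features, height):
    """Map a binary receptive-field vector F and a height to an (s, a) pair.

    Assumption: s is the height and a is the count of set bits in F.
    """
    if any(bit not in (0, 1) for bit in features):
        raise ValueError("features must contain only 0 or 1")
    s = height
    a = sum(features)
    return s, a


if __name__ == "__main__":
    # The example from the slide: F=(0,0,1,0,0) at height 1.
    print(observation((0, 0, 1, 0, 0), 1))
```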

  25. Receptive Field: Fovea-like Subimage Observations (Details of the System)
     - Figure labels: s0,a0 (shown five times)

  30. Receptive Field: Fovea-like Subimage Observations (Details of the System)
     - Figure labels: s2,a1; s2,a5; s1,a5

  33. Predictors P_i: Learning how the world works (Details of the System)
     - Each P_i predicts a basic physical concept: whether the placed block will stay in place after it is released.
     - Labels are self-generated (Stable ≡ +1, Unstable ≡ −1).
     - Implemented as online linear classifiers based on RLS (Regularized Least Squares) [4].
     - Extended to also give a confidence interval for each prediction.
     - The learning progress, calculated as the confidence improvement, is used as the intrinsic reward.
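One way to picture such a predictor is an online regularized least-squares classifier whose uncertainty shrinks with each update. The slides do not say how the confidence interval is computed, so this sketch makes an assumption: it uses the width sqrt(x^T A^-1 x) as the confidence measure and the drop in that width after an update as the learning progress. The class name, the `reg` parameter, and the method names are hypothetical.

```python
import math


class OnlineRLSClassifier:
    """Online regularized least-squares classifier with labels +1 / -1."""

    def __init__(self, dim, reg=1.0):
        # A_inv starts as the inverse of reg * I; weights start at zero.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim
        self.w = [0.0] * dim

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def predict(self, x):
        margin = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if margin >= 0 else -1

    def width(self, x):
        # Assumed confidence measure: smaller width means more confident.
        Ax = self._matvec(self.A_inv, x)
        return math.sqrt(sum(xi * ai for xi, ai in zip(x, Ax)))

    def update(self, x, y):
        """Rank-one (Sherman-Morrison) update; returns the confidence
        improvement on x, used here as the intrinsic reward."""
        before = self.width(x)
        Ax = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, Ax))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]
        self.w = self._matvec(self.A_inv, self.b)
        return before - self.width(x)


if __name__ == "__main__":
    p = OnlineRLSClassifier(dim=3)
    x = [1.0, 0.0, 0.0]
    print(p.update(x, +1), p.update(x, +1), p.predict(x))
```

Repeating the same placement yields a smaller reward each time under this measure, which matches the idea that familiar situations stop being interesting.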
