Developmental approaches to sensorimotor and linguistic learning in robotics

Pierre-Yves Oudeyer, INRIA-ENSTA FLOWERS project team, http://flowers.inria.fr


  1. Developmental approaches to sensorimotor and linguistic learning in robotics
     Pierre-Yves Oudeyer, INRIA-ENSTA FLOWERS project team
     http://flowers.inria.fr
     http://www.pyoudeyer.com

  2. Behavioural and Cognitive Development in Human Infants
     • Multi-skill acquisition
     • Autonomous
     • Open-Ended Development

  3. An innate cerebral and morphological equipment ...
     • Innate motivational system that fosters spontaneous BUT organized exploration (intrinsic motivation/curiosity-driven exploration)
     • Motor primitives that constrain the space of motor commands and gestures: e.g. muscles are not controlled individually and independently, oscillators, ...
     • Sensori detectors and trackers that allow the baby to bootstrap its attentional and emotional systems: e.g. movement, high pitch, faces, ...
     • Sensorimotor reflexes: e.g. eye tracking of moving objects, closing hands when objects touched, ...
     • Morphological properties that facilitate the control of the body, ...

  4. ... built within a maturational program ...
     e.g. myelination/myelinogenesis progressively building brain regions, connecting them together and to muscles, increasing progressively resolution of senses and motor control, ...

  5. ... in a structured physical and social environment
     then continuously extended thanks to a generic learning and developmental system

  6. Object of study: The Architecture of Sensorimotor and Social Development
     Developmental Psychology and Biology ↔ Developmental Social Robotics
     • Functional Inspiration: Study how to build developmental machines
     • Functional Modelling: Understand human development better
     (Weng et al., 2001, Science) (Lungarella et al., 2006, Conn. Sc.) (Oudeyer, 2011, Encycl. Lear. Sc.)
     → Learning algorithms are only a component

  7. Learning models for robot motor skill acquisition
     Models of the self/body
     Movements <-> Effects

  8. Learning models for robot motor skill acquisition
     Models of physical interaction with objects
     Movements <-> Effects

  9. Learning models for robot motor skill acquisition
     Models of tool use
     Movements <-> Effects

  10. Learning models for robot motor skill acquisition
      • High-dimensions
      • High-volume
      • Stochasticity
      • Redundancy
      [Figure: a Forward Model maps the Space of Controllers $\Pi$ (points $x_1, x_2, \ldots, x_i$) to the Task Space = Space of Effects $Y$ (points $y_1, y_2, \ldots, y_i$); $Y_r$ is the Reachable Space of Effect; an Inverse Model maps effects back to controllers.]
      $x_i = (C_i, \pi_i)$
      $y_i(C_i, (s_1, a_1, \ldots, s_n, a_n)_{\pi_i}) \in \mathbb{R}^n$
      $\pi_i : S \in \mathbb{R}^n \to A \in \mathbb{R}^l$
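The forward model / inverse model pair on this slide can be illustrated with a small sketch. It is not the method used in the talk: it assumes a hypothetical 2-joint planar arm (`toy_effect`) as the robot, and a plain nearest-neighbour memory (`NearestNeighbourModel`) as the learner. Both names are invented for illustration.

```python
import math
import random


def toy_effect(x):
    """Hypothetical robot: joint angles (a, b) -> hand position of a 2-link arm."""
    a, b = x
    return (math.cos(a) + math.cos(a + b), math.sin(a) + math.sin(a + b))


def sq_dist(u, v):
    """Squared Euclidean distance between two tuples."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


class NearestNeighbourModel:
    """Memory of (controller x, effect y) pairs queried in both directions."""

    def __init__(self):
        self.samples = []

    def add(self, x, y):
        self.samples.append((x, y))

    def forward(self, x):
        """Forward model: predict the effect y of controller parameters x."""
        return min(self.samples, key=lambda s: sq_dist(s[0], x))[1]

    def inverse(self, y_goal):
        """Inverse model: return stored parameters whose effect is closest to y_goal."""
        return min(self.samples, key=lambda s: sq_dist(s[1], y_goal))[0]


# Random motor babbling (an assumption: uniform sampling of the controller space).
rng = random.Random(0)
model = NearestNeighbourModel()
for _ in range(2000):
    x = (rng.uniform(0.0, math.pi), rng.uniform(0.0, math.pi))
    model.add(x, toy_effect(x))

goal = (0.0, 1.4)  # a point inside the reachable space of this toy arm
x_star = model.inverse(goal)
reach_error = math.sqrt(sq_dist(toy_effect(x_star), goal))
```

Redundancy shows up here as several joint configurations landing near the same effect. The inverse query simply returns one of them, which is one reason inverse models are harder to learn than forward models.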

  11. Motor synergies/primitives
      Robots:
      • DMP Formalism
      • Recurrent Neural Nets
      • GMR
      • Splines + vector fields
      (Ijspeert et al., 2005)
      Humans: muscular synergies, CPGs (Rossignol, 1996)
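To show how a motor primitive shrinks the space of motor commands, here is a minimal sketch assuming the damped-spring point attractor commonly placed at the core of the DMP formalism, with the learned forcing term left out. The function name and the gain values are illustrative assumptions, not taken from the slides.

```python
def point_attractor_primitive(y0, goal, alpha=25.0, tau=1.0, dt=0.001, duration=2.0):
    """Integrate tau*z' = alpha*(beta*(goal - y) - z), tau*y' = z with explicit Euler.

    The learner only chooses `goal` (and optionally `tau`); the full
    trajectory of motor commands follows from the primitive.
    """
    beta = alpha / 4.0  # critical damping: converges without oscillating
    y, z = y0, 0.0
    trajectory = [y]
    for _ in range(int(duration / dt)):
        z_dot = alpha * (beta * (goal - y) - z) / tau
        y_dot = z / tau
        z += z_dot * dt
        y += y_dot * dt
        trajectory.append(y)
    return trajectory


trajectory = point_attractor_primitive(y0=0.0, goal=1.0)
```

With this parameterization a whole joint trajectory is described by one or two numbers, which is the sense in which primitives constrain the space to explore.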

  12. Exploring and Learning multiple models and skills in a developmental robot
      • $\Pi_1$ Bashing param. primitive
      • $\Pi_2$ Biting param. primitive
      • $\Pi_3$ Head turn param. primitive
      • $\Pi_4$ Vocalizing param. primitive
      • $Y_1$ Mov. sensori. primitive
      • $Y_2$ Visual patt. sensori. primitive
      • $Y_3$ Mouth touch sensori. primitive
      • $Y_4$ Leg touch sensori. primitive
      • $Y_5$ Sound pitch sensori. primitive
      The Playground Experiment (Oudeyer et al., 2007, IEEE Trans. Ev. Comp.)

  13. Innate equipment + (Social) learning
      • Multiple Families of Motor Primitives
      • Multiple Families of Sensori Primitives
      • Multiple Controller Spaces
      • Multiple Task Spaces
      + Operators for projecting/combining motor primitives
      + Operators for projecting/combining sensori primitives (include dimensionality reduction or increase)
      Mechanisms for self-generation of problems = models to be learnt
      [Figure: families of motor primitives $\Pi_i$ with elements $\pi_{i,1}, \pi_{i,2}, \ldots, \pi_{i,j}$ and families of sensori primitives $Y_l$ with elements $y_{l,1}, y_{l,2}, \ldots, y_{l,i}$ and reachable part $Y_{l,r}$; a grid of candidate models M1, ..., M8, ..., Mi, each a mapping between a controller space $\Pi$ and a task space $Y$]
      → Explore and learn

  14. Active Exploration and Learning
      What models to generate, explore and learn, and in what order, given:
      • High inhomogeneities in the mathematical properties of the mappings
      • Diversity of complexity/dimensionality/volume, learnability, and level of noise
      • Some are trivial, some others unlearnable
      • Some may be non-stationary
      • Lifetime severely limited: the set of learnable models cannot be learnt entirely during a lifetime
      → The goal is that learnt models can be reused to efficiently solve (predictive or control) problems unknown to the learner initially and drawn, e.g., uniformly from a space of problems relevant in the environment in which the robot exists
      [Figure: candidate models M1, ..., M8, ..., Mi, each a mapping between a controller space $\Pi$ and a task space $Y$]

  15. Technical challenges
      → Problem generation: Fixed or adaptive set of problems? Adaptive boundaries for a given problem? How to control the growth of complexity (inside and across problems)?
      → Problem selection: What problems to focus on? How to build a useful learning curriculum?
      → Which measure of interestingness? Standard approaches to active learning will fail (most often do worse than random), i.e. approaches based on sampling where uncertainty is high, density approaches, or approaches based on analytic hypotheses about the learning algorithm or the data (e.g. as when using GPs) (Whitehead, 1991; Linden and Weber, 1993; Thrun, 1995; Sutton, 1990; Cohn et al., 1996; Brafman and Tennenholtz, 2002; Strehl and Littman, 2006; Szita and Lorincz, 2008)
      → In particular, it is very difficult to evaluate the information gain analytically; it needs to be evaluated empirically, but then how?
      → If self-generated problems interact, then there is a need for sequential decision optimization → Intrinsically Motivated Reinforcement Learning (IMRL, Barto et al., 2004; Schmidhuber, 1991)
      [Figure: candidate models M1, ..., M8, ..., Mi, each a mapping between a controller space $\Pi$ and a task space $Y$]

  16. The search for intermediate complexity
      Child development: intrinsic motivation and mechanisms of spontaneous exploration
      • Developmental psychology: White (1959), Berlyne (1960), Csikszentmihalyi (1996)
      • Neurosciences: Dayan and Balleine (2002), Kakade and Dayan (2002), Horvitz (2000)
      • Activities of intermediate complexity, as evaluated empirically, are intrinsically rewarding
      In robots:
      • Models IAC, RIAC, SAGG-RIAC, McSAGG (Oudeyer et al., 2005; Oudeyer et al., 2007; Baranes and Kaplan, 2009; Baranes and Kaplan, 2010a,b)
      • Algorithmic aspects and qualitative modelling of sensorimotor development
      → Mechanisms for regulating the growth of complexity: the importance of starting small
      Intermediate complexity: maximal learning progress as evaluated empirically

  17. Interestingness = Empirical measure of learning progress
      IAC (2007), R-IAC (2009), SAGG-RIAC (2010)
      • Parameterized space of problems/models
      • Stochastic choice of problem according to a probability proportional to learning progress
      • Recursive splitting of problem space optimizing difference in learning progress
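The stochastic choice of a problem with probability proportional to learning progress can be sketched as follows. This is a minimal illustration rather than the published IAC/R-IAC code: the region names, the clipping of negative progress to zero, and the uniform fallback when nothing is improving are all assumptions.

```python
import random


def choose_region(progress_by_region, rng):
    """Pick a region name with probability proportional to its learning progress."""
    names = list(progress_by_region)
    # Assumption: negative progress (errors increasing) is clipped to zero.
    weights = [max(progress_by_region[name], 0.0) for name in names]
    if sum(weights) == 0.0:
        return rng.choice(names)  # nothing is improving yet: explore uniformly
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)
# Hypothetical progress values for three regions of the problem space.
progress = {"trivial": 0.0, "learnable": 0.3, "slow": 0.1}
counts = {name: 0 for name in progress}
for _ in range(10000):
    counts[choose_region(progress, rng)] += 1
```

A region that is already mastered (zero progress) stops attracting samples, while regions where errors are still dropping are visited in proportion to how fast they drop.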

  18. Active regulation of the growth of complexity in exploration
      Optimizing learning progress, i.e. the decrease of prediction errors (derivative)
      The IAC/R-IAC (Intelligent Adaptive Curiosity) architecture(s)
      Makes no assumption on the regression algorithm used as “Predictor” (e.g. can be SVE, GP, or non-parametric)
      IAC: Oudeyer, P-Y., Kaplan, F. and Hafner, V. (2007); R-IAC: Baranes and Oudeyer (2009)
      Related Work: Schmidhuber (1991, 2006)
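Learning progress as the decrease of prediction errors can be estimated empirically from the error history alone, which is why no assumption about the predictor is needed. The sketch below assumes a simple two-window estimate (older mean error minus recent mean error); the window size and the function name are illustrative choices, not the exact published formula.

```python
def learning_progress(errors, window=5):
    """Mean error over the older window minus mean error over the most recent window.

    Positive values mean the predictor is improving; values near zero mean
    the region is either already mastered or not learnable.
    """
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate a trend
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


improving = [1.0 - 0.1 * i for i in range(10)]  # errors falling from 1.0 to 0.1
flat = [0.4] * 10                               # errors that never change
lp_improving = learning_progress(improving)
lp_flat = learning_progress(flat)
```

Note that a region with high but constant error, such as pure noise, gets zero progress here. That is the difference from sampling wherever uncertainty or error is highest.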

  19. http://playground.csl.sony.fr (Oudeyer, Kaplan, Hafner, 2007, IEEE Trans. Evol. Comp.) Here a classic non-parametric regressor is used (Schaal and Atkeson, 1994)

  20. Self-organization of developmental patterns with universals and diversity and discovery of communication
      • Infant and Child Dev. 2008
      • Frontiers in Neuroscience, 2007
      • Connection Science, 2006
