Developmental approaches to sensorimotor and linguistic learning in robotics

Pierre-Yves Oudeyer
INRIA-ENSTA FLOWERS project team

http://flowers.inria.fr
http://www.pyoudeyer.com
Behavioural and Cognitive Development in Human Infants

• Multi-skill acquisition
• Autonomous
• Open-ended development
An innate cerebral and morphological equipment …

• Innate motivational system that fosters spontaneous BUT organized exploration (intrinsic motivation / curiosity-driven exploration)
• Motor primitives that constrain the space of motor commands and gestures: e.g. muscles are not controlled individually and independently, oscillators, …
• Sensory detectors and trackers that allow the baby to bootstrap its attentional and emotional systems: e.g. movement, high pitch, faces, …
• Sensorimotor reflexes: e.g. eye tracking of moving objects, closing hands when objects are touched, …
• Morphological properties that facilitate the control of the body, …

… built within a maturational program …

e.g. myelination/myelinogenesis progressively building brain regions, connecting them together and to muscles, progressively increasing the resolution of senses and motor control, …

… in a structured physical and social environment

then continuously extended thanks to a generic learning and developmental system
[Diagram: Developmental Psychology and Biology <-> Developmental Social Robotics]

• Functional inspiration: study how to build developmental machines
• Functional modelling: understand human development better

(Weng et al., 2001, Science) (Lungarella et al., 2006, Conn. Sc.) (Oudeyer, 2011, Encycl. Lear. Sc.)

Object of study: the architecture of sensorimotor and social development
→ Learning algorithms are only a component
Learning models for robot motor skill acquisition

• Models of the self/body
• Models of physical interaction with objects
• Models of tool use

In each case: Movements <-> Effects
Learning models for robot motor skill acquisition

• High dimensions
• High volume
• Stochasticity
• Redundancy

[Figure: a forward model maps the space of controllers Π (points x_1, x_2, …, x_i) to the task space Y = space of effects (points y_1, y_2, …, y_i), in which Y_r is the reachable space of effects; an inverse model maps effects back to controllers.]

$x_i = (C_i, \pi_i)$
$y_i\big(C_i, (s_1, a_1, \ldots, s_n, a_n)_{\pi_i}\big) \in \mathbb{R}^n$
$\pi_i : S \subseteq \mathbb{R}^n \to A \subseteq \mathbb{R}^l$
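The forward/inverse model pair on this slide can be illustrated with a minimal memory-based sketch. Everything here is an assumption made for illustration: the toy two-joint planar arm (`true_forward`), the class name `NearestNeighbourModel`, and the use of nearest-neighbour lookup are not taken from the slides. Because of redundancy, the inverse model returns only one of the possibly many commands that produce a given effect.

```python
import math
import random

def true_forward(x):
    """Hypothetical toy system: joint angles of a 2-joint planar arm -> hand position."""
    a1, a2 = x
    return (math.cos(a1) + math.cos(a1 + a2), math.sin(a1) + math.sin(a1 + a2))

class NearestNeighbourModel:
    """Memory-based model that stores the (command, effect) pairs observed so far."""

    def __init__(self):
        self.samples = []

    def add(self, x, y):
        self.samples.append((x, y))

    def forward(self, x):
        # Predict the effect of x as the effect of the closest stored command.
        return min(self.samples, key=lambda s: math.dist(s[0], x))[1]

    def inverse(self, y_goal):
        # Return a stored command whose observed effect is closest to the goal.
        return min(self.samples, key=lambda s: math.dist(s[1], y_goal))[0]

# Random motor babbling: sample commands, observe effects, store both.
rng = random.Random(0)
model = NearestNeighbourModel()
for _ in range(2000):
    x = (rng.uniform(0, math.pi), rng.uniform(0, math.pi))
    model.add(x, true_forward(x))

goal = (0.0, 1.5)                       # a point inside the reachable space
x_star = model.inverse(goal)            # command proposed by the inverse model
reach_error = math.dist(true_forward(x_star), goal)
```

Random babbling is used here only to fill the memory; the later slides argue that the choice of what to sample should itself be guided by learning progress.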
Motor synergies/primitives

Robots:
• DMP formalism
• Recurrent neural nets
• GMR
• CPGs
• Splines + vector fields
(Ijspeert et al., 2005)

Humans: muscular synergies (Rossignol, 1996)
Exploring and learning multiple models and skills in a developmental robot

The Playground Experiment (Oudeyer et al., 2007, IEEE Trans. Ev. Comp.)

Parameterized motor primitives:
• Π_1: bashing
• Π_2: biting
• Π_3: head turn
• Π_4: vocalizing

Sensory primitives:
• Y_1: movement
• Y_2: visual pattern
• Y_3: mouth touch
• Y_4: leg touch
• Y_5: sound pitch
Innate equipment + (social) learning

• Multiple families of motor primitives (π_{i,1}, π_{i,2}, …, π_{i,j} in Π_i), i.e. multiple controller spaces
  + operators for projecting/combining motor primitives
• Multiple families of sensory primitives (y_{l,1}, y_{l,2}, …, y_{l,i} in Y_l, with reachable subspace Y_{l,r}), i.e. multiple task spaces
  + operators for projecting/combining sensory primitives (include dimensionality reduction or increase)

Mechanisms for self-generation of problems = models to be learnt

[Figure: a collection of models M1, M2, …, M8, …, Mi, each a mapping between a controller space Π and a task space Y, which the robot has to explore and learn.]
Active Exploration and Learning

What models to generate, explore and learn, and in what order, given:

• High inhomogeneities in the mathematical properties of the mappings
• Diversity of complexity/dimensionality/volume, learnability, and level of noise
• Some are trivial, some others unlearnable
• Some may be non-stationary
• Lifetime severely limited: the set of learnable models cannot be learnt entirely during a lifetime

→ The goal is that learnt models can be reused to efficiently solve (predictive or control) problems that are initially unknown to the learner and are drawn, e.g. uniformly, from a space of problems relevant in the environment in which the robot exists

[Figure: the collection of models M1, …, M8, …, Mi between controller spaces Π and task spaces Y.]
Technical challenges

→ Problem generation: Fixed or adaptive set of problems? Adaptive boundaries for a given problem? How to control the growth of complexity (inside and across problems)?

→ Problem selection: What problems to focus on? How to build a useful learning curriculum?

→ Which measure of interestingness?
Standard approaches to active learning will fail (most often do worse than random), i.e. approaches based on sampling where uncertainty is high, density approaches, or approaches based on analytic hypotheses about the learning algorithm or the data (e.g. as when using GPs)
(Whitehead, 1991; Linden and Weber, 1993; Thrun, 1995; Sutton, 1990; Cohn et al., 1996; Brafman and Tennenholtz, 2002; Strehl and Littman, 2006; Szita and Lorincz, 2008)

→ In particular, it is very difficult to evaluate the information gain analytically; it rather needs to be evaluated empirically, but then how?

→ If self-generated problems interact, then there is a need for sequential decision optimization ⇒ Intrinsically Motivated Reinforcement Learning (IMRL; Barto et al., 2004; Schmidhuber, 1991).
The search for intermediate complexity

Child development: intrinsic motivation and mechanisms of spontaneous exploration
• Developmental psychology: White (1959), Berlyne (1960), Csikszentmihalyi (1996)
• Neurosciences: Dayan and Balleine (2002), Kakade and Dayan (2002), Horvitz (2000)
→ Activities of intermediate complexity, as evaluated empirically, are intrinsically rewarding

In robots:
• Models: IAC, RIAC, SAGG-RIAC, McSAGG (Oudeyer et al., 2005; Oudeyer et al., 2007; Baranes and Kaplan, 2009; Baranes and Kaplan, 2010a,b)
• Algorithmic aspects and qualitative modelling of sensorimotor development
→ Mechanisms for regulating the growth of complexity: the importance of starting small

Intermediate complexity ⇔ maximal learning progress, as evaluated empirically
Interestingness = empirical measure of learning progress
IAC (2007), R-IAC (2009), SAGG-RIAC (2010)

• Parameterized space of problems/models
• Stochastic choice of problem according to a probability proportional to learning progress
• Recursive splitting of the problem space, optimizing the difference in learning progress

[Figure: the problem space recursively split into regions, each containing models between a controller space Π and a task space Y.]
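The stochastic choice of a problem with probability proportional to learning progress can be sketched as follows. This is a minimal illustration, not the published IAC/R-IAC code: the function name `choose_region`, the clipping of negative progress to zero, the `epsilon` share of uniformly random choices, and the numeric progress values are all assumptions introduced here.

```python
import random

def choose_region(learning_progress, rng, epsilon=0.1):
    """Pick a region index with probability proportional to its learning progress.

    Assumptions of this sketch: negative progress is clipped to zero, and a
    share `epsilon` of choices is uniformly random so that regions with no
    measured progress are still revisited from time to time.
    """
    n = len(learning_progress)
    weights = [max(lp, 0.0) for lp in learning_progress]
    if sum(weights) == 0 or rng.random() < epsilon:
        return rng.randrange(n)
    return rng.choices(range(n), weights=weights, k=1)[0]

rng = random.Random(1)
progress = [0.0, 0.05, 0.40, 0.01]   # made-up progress values for four regions
counts = [0] * len(progress)
for _ in range(10000):
    counts[choose_region(progress, rng)] += 1
```

With these made-up values most of the sampling budget goes to the third region, while the region with zero progress is only visited through the random share.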
Active regulation of the growth of complexity in exploration

Optimizing learning progress, i.e. the decrease of prediction errors (derivative)

The IAC/R-IAC (Intelligent Adaptive Curiosity) architecture(s)
• Makes no assumption on the regression algorithm used as "Predictor" (e.g. can be SVM, GP, or non-parametric)

IAC: Oudeyer, P.-Y., Kaplan, F. and Hafner, V. (2007); R-IAC: Baranes and Oudeyer (2009)
Related work: Schmidhuber (1991, 2006)
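Learning progress as the decrease of prediction errors can be estimated empirically from the error history of a predictor, whatever regression algorithm produced those errors. The sketch below is one simple way to do it and is an assumption of this note: the function name `learning_progress`, the two-window comparison, the window size, and the three synthetic error histories are illustrative only.

```python
def learning_progress(errors, window=20):
    """Mean error over the older window minus mean error over the recent window.

    Positive values mean that prediction errors are decreasing. Returns 0.0
    while fewer than 2 * window errors have been recorded (an assumption of
    this sketch).
    """
    if len(errors) < 2 * window:
        return 0.0
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window

# Synthetic error histories standing in for three kinds of problems.
learnable = [1.0 / (1 + t) for t in range(40)]   # errors decrease with practice
unlearnable = [0.5] * 40                         # errors stay high
trivial = [0.0] * 40                             # errors are already zero

lp_learnable = learning_progress(learnable)
lp_unlearnable = learning_progress(unlearnable)
lp_trivial = learning_progress(trivial)
```

Only the learnable problem yields positive progress; both the trivial and the unlearnable one yield zero, which is why a progress-based measure steers exploration towards intermediate complexity rather than towards high error or high uncertainty.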
http://playground.csl.sony.fr
(Oudeyer, Kaplan, Hafner, 2007, IEEE Trans. Evol. Comp.)

Here a classic non-parametric regressor is used (Schaal and Atkeson, 1994)
Self-organization of developmental patterns with universals and diversity, and discovery of communication

Infant and Child Dev., 2008
Frontiers in Neuroscience, 2007
Connection Science, 2006