Safe Exploration for Interactive Machine Learning
Matteo Turchetta, Felix Berkenkamp, Andreas Krause
Interactive Machine Learning
- Agent can query noisy values of an unknown function
- Use data to make informed queries
- Available queries may depend on previous ones: this dependency is modeled with a directed graph
- Includes: Bayesian optimization, active learning and exploration of deterministic Markov decision processes
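A minimal sketch of the directed-graph view of query availability, assuming a small hand-written graph. The node names, `GRAPH` and `available_queries` are hypothetical and only illustrate the idea that evaluating one query can make others available.

```python
from typing import Dict, Set

# Hypothetical example graph: an edge u -> v means that evaluating u
# makes v available as a next query (names are illustrative only).
GRAPH: Dict[str, Set[str]] = {
    "a": {"a", "b"},
    "b": {"a", "c"},
    "c": {"b"},
}


def available_queries(graph: Dict[str, Set[str]], visited: Set[str]) -> Set[str]:
    """Return every node that is visited or is a direct successor of a visited node."""
    reachable = set(visited)
    for node in visited:
        reachable |= graph.get(node, set())
    return reachable
```

Starting from `{"a"}`, only `a` and `b` can be queried; `c` becomes available once `b` has been evaluated.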
[Figure: the agent queries x and observes f(x) + w]
Safety constrained interactive machine learning
- Unknown safety constraint q(x) ≥ 0 that must be satisfied at all times
- Encompasses many problems: therapy design [Sui et al. 2015], [Sui et al. 2018]; Mars exploration [Turchetta et al. 2016], [Wachi et al. 2018]; model-free RL [Berkenkamp et al. 2016]
[Figure: the agent queries x and observes f(x) + w; decisions with q(x) ≥ 0 are safe, decisions with q(x) < 0 are unsafe]
Existing approaches
- Build a conservative estimate of the decisions that are safe to evaluate
- Uniformly reduce uncertainty on the boundary of this region
- Treating the expansion of the safe set as a proxy objective can be wasteful
Example: 1D optimization task
[Figure: StageOPT [Sui et al. 2018] on a 1D task with f(s) ≡ q(s) and safety constraint q(s) ≥ 0, showing the sets S^p_t and G_t over the domain D and the number of evaluations]
Many unnecessary samples when optimum has already been found
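A minimal sketch of how a conservative safe-set estimate can be grown on a 1D grid. It assumes a known Lipschitz constant for q and precomputed lower confidence bounds, which is one common construction and not necessarily the exact rule used by the cited methods. The function name and all parameters are hypothetical.

```python
import numpy as np


def pessimistic_safe_set(grid, lower, safe_mask, lipschitz):
    """One expansion step of a conservative safe-set estimate on a 1D grid.

    grid:      (n,) candidate decisions
    lower:     (n,) lower confidence bounds on q at each grid point
    safe_mask: (n,) booleans, True for points already known to be safe
    lipschitz: assumed Lipschitz constant of q (an assumption of this sketch)
    """
    grid = np.asarray(grid, dtype=float)
    lower = np.asarray(lower, dtype=float)
    safe_mask = np.asarray(safe_mask, dtype=bool)
    # dist[i, j] = |grid[i] - grid[j]|
    dist = np.abs(grid[:, None] - grid[None, :])
    # worst_case[i, j]: value of q at grid[j] guaranteed by the bound at grid[i]
    worst_case = lower[:, None] - lipschitz * dist
    # A point is certified if some already-safe point guarantees q >= 0 there.
    certified = (worst_case[safe_mask] >= 0.0).any(axis=0)
    return safe_mask | certified
```

Only bounds at points that are already safe are used, so the estimate can grow only from its own boundary. This is the behavior the slide describes as potentially wasteful when the optimum is already known.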
Goal Oriented Safe Exploration separates IML task and safety
Idea: Let existing IML algorithms solve the task and build an add-on module to deal with safety
Consider the set of optimistically safe points S^o_t
[Figure: the domain D partitioned into the pessimistic safe set S^p_t (safe with high probability), the rest of the optimistic safe set S^o_t (could be safe) and the remainder of D (unsafe with high probability)]
[Flowchart: the original IML algorithm proposes a decision x ∈ S^o_t and the safety filter asks whether x is safe. If yes, f(x) + ω is observed. If not sure, a safe z such that q(z) is informative about q(x) is selected, q(z) + ν is observed and S^o_t, S^p_t are updated. If no, the IML algorithm receives the new decision set S^o_t.]
- Exploit existing IML algorithms
- Learn about safety only when necessary
- IML algorithm considers only plausibly safe decisions
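A minimal sketch of one pass through the safety filter, assuming the pessimistic set (safe with high probability) and the optimistic set (could be safe) are available as plain Python sets. The function name, the returned action labels and the `pick_informative` callback are hypothetical; how the sets are estimated is outside this sketch.

```python
from typing import Callable, Hashable, Set, Tuple


def safety_filter_step(
    x: Hashable,
    pessimistic: Set[Hashable],
    optimistic: Set[Hashable],
    pick_informative: Callable[[Hashable, Set[Hashable]], Hashable],
) -> Tuple[str, Hashable]:
    """Decide what to evaluate next for a decision x proposed by the IML algorithm.

    Returns ("evaluate_f", x) if x is safe with high probability,
    ("evaluate_q", z) if x could be safe and a safe z should be queried first,
    ("repropose", x) if x is no longer plausibly safe.
    """
    if x in pessimistic:
        return ("evaluate_f", x)
    if x in optimistic:
        # Learn about safety only when necessary: query a safe point
        # that is expected to be informative about q(x).
        z = pick_informative(x, pessimistic)
        return ("evaluate_q", z)
    return ("repropose", x)
```

For example, with `pick_informative` returning the safe point closest to x, a proposal outside the pessimistic set but inside the optimistic set triggers a constraint query at the nearest safe point instead of an unsafe evaluation.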
Heuristic-based expansion of the safe set
- Define a heuristic h_t : D → R to measure how informative q(z) is about q(x)
- Order uncertain points by heuristic value (cross size)
- Find the point x with the highest heuristic value
- Explore the safe points that could add it to the safe set (blue shaded region)
[Figure: the domain D with the pessimistic safe set S^p_t, the optimistic safe set S^o_t and a candidate point x]
Previous methods
- Breadth-first search like
- Reason about uncertainty inside the safe set
GoOSE
- A* like
- Reason about uncertainty outside the safe set
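A minimal sketch of the ordering step, assuming the heuristic is an arbitrary user-supplied function. `order_by_heuristic` and the example heuristic are hypothetical; the sketch shows only the priority-queue mechanics behind an A*-like expansion, not the heuristic used in the talk.

```python
import heapq


def order_by_heuristic(points, heuristic):
    """Return uncertain points sorted from highest to lowest heuristic value."""
    # heapq is a min-heap, so priorities are negated; the index breaks ties
    # and avoids comparing the points themselves.
    heap = [(-heuristic(p), i, p) for i, p in enumerate(points)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, point = heapq.heappop(heap)
        ordered.append(point)
    return ordered
```

A breadth-first strategy would instead treat all boundary points alike, which is the contrast drawn between previous methods and GoOSE.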
Guarantees
- Sampling inside S^p_t guarantees safety with high probability
- If necessary for the IML algorithm, the optimistic and pessimistic estimates of the safe set converge to a natural notion of largest safe reachable set up to a tolerance in a finite number of time steps
- Thus, except for a finite number of iterations dedicated to the expansion of the safe set, the IML algorithm performs as if it had knowledge of the largest safe reachable set from the beginning (e.g. it retains no-regret properties)
[Figure: four panels at iterations 0, 10, 40 and 100 showing the pessimistic estimate S^p and the optimistic estimate S^o inside the domain D. At iteration 0, D ≡ S^o; at iteration 100, S^p ≡ S^o.]
Qualitative comparison for a 1D optimization task
[Figure: number of evaluations over the domain D for StageOPT [Sui et al. 2018] and GoOSE (ours), with f(s) ≡ q(s) and safety constraint q(s) ≥ 0]
Quantitative comparison for optimization task
Algorithms: SafeOPT [Sui et al. 2015], StageOPT [Sui et al. 2018], GoOSE (ours)
Safe average regret: defined with respect to A(S_0), the largest safe set reachable from S_0
[Figure: two panels (1D, 2D) plotting cumulative regret against iterations for SafeOpt, StageOpt and GoOSE]
Safe shortest path in deterministic MDPs
[Figure: grid world with legend: fixed goal, unsafe transition, safe shortest path]
Assumptions:
- Known, deterministic model
- Unsafe transitions unknown a priori
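A minimal sketch of the planning part of this setting, assuming unit transition costs. The known deterministic model is represented by a hypothetical `neighbors` function, and `is_safe_transition` stands in for whatever safety classification is currently available; both names are assumptions of this sketch.

```python
from collections import deque


def safe_shortest_path(start, goal, neighbors, is_safe_transition):
    """Breadth-first search that only follows transitions classified as safe."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk back through the parent links to recover the path.
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parents and is_safe_transition(state, nxt):
                parents[nxt] = state
                queue.append(nxt)
    return None  # no path through transitions classified as safe
```

Because the unsafe transitions are unknown a priori, the classification passed in changes as more is learned, and a path that was unavailable earlier can appear later.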
Comparison for safe shortest path in deterministic MDPs
Algorithms: SMDP [Turchetta et al. 2016], SEO [Wachi et al. 2018] (optimizes exploration cost), GoOSE (ours, optimizes sample efficiency)
Setting: 100 random synthetic square maps for each size 20, 30, …, 90 = 800 synthetic maps
Plot: geometric mean of ratio with respect to uninformed baseline (SMDP)
[Figure: three panels (cost of exploration, number of samples to first path, computation time) plotting the ratio to SMDP against world size for GoOSE, SMDP and SEO]
Setting: 4 start-goal pairs on 16 maps of different areas on Mars = 64 scenarios
Table: geometric mean of ratio with respect to SMDP

        GoOSE    SEO
Sample  30.0 %   38.4 %
Cost    12.7 %   0.7 %
Time    37.8 %   518 %
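A minimal sketch of the summary statistic used in the plot and table, assuming per-scenario measurements for a method and for the SMDP baseline are available as two equally long lists of positive numbers. The function name is hypothetical and the numbers in the usage note are illustrative, not taken from the experiments.

```python
import math


def geometric_mean_ratio(values, baseline):
    """Geometric mean of values[i] / baseline[i] over all scenarios."""
    if len(values) != len(baseline) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    # Averaging in log space is equivalent to taking the n-th root of the product.
    log_sum = sum(math.log(v / b) for v, b in zip(values, baseline))
    return math.exp(log_sum / len(values))
```

For instance, a method that needs half the baseline's samples on one scenario and twice as many on another has a geometric mean ratio of 1, i.e. 100 %.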
Conclusions
We introduced GoOSE, an add-on module for general IML algorithms that:
- Provides high-probability safety guarantees
- Preserves properties of the IML algorithm over the largest safe reachable set
- Is applicable to a wide range of problems, including safe Bayesian optimization, safe active learning and safe exploration in deterministic Markov decision processes
- Greatly impro