DAC: The Double Actor-Critic Architecture for Learning Options


slide-1
SLIDE 1

DAC: The Double Actor-Critic Architecture for Learning Options

NeurIPS 2019

Shangtong Zhang, Shimon Whiteson

Presenter: Ehsan Mehralian March 17, 2020

slide-2
SLIDE 2

Outline

  • Problem statement
  • Option Critic
  • Double Actor Critic
slide-3
SLIDE 3

Problem statement

  • Temporal abstraction is a key component in RL:
      • Better exploration
      • Faster learning
      • Better generalization
      • Transfer learning
  • MDP + temporally abstract actions = SMDP
  • SMDP algorithms are data-inefficient, which motivates the options framework (Sutton et al., 1999)
  • The options framework raises two problems:
      • Learning options
      • Learning a master policy

slide-4
SLIDE 4

Previous works

  • Based on finding subgoals:
      • Difficult to scale up
      • Finding subgoals can be as expensive as solving the entire task
  • Using value-based methods:
      • Can't cope with large action spaces
      • Policy-based methods have better convergence properties with function approximation

slide-5
SLIDE 5

The Option Critic framework (Bacon et al., 2017)

  • Blurs the line between discovering options and learning options
  • The first scalable end-to-end approach
  • No slowdown within a single task
  • Faster convergence in transfer learning

slide-6
SLIDE 6

Background

  • MDP:

    $M \equiv \{S, A, R(s, a), P(s' \mid s, a), P_0(s), \gamma\}$

  • Goal:

    $\pi^* = \arg\max_\theta \rho(\pi_\theta) = \arg\max_\theta \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\sum_{t=1}^{\infty} \gamma^{t-1} r_t \,\middle|\, s_0, \pi_\theta\right]$

  • Policy gradient:

    $\frac{\partial \rho}{\partial \theta} = \sum_s d^\pi(s) \sum_a \frac{\partial \pi(s, a)}{\partial \theta} Q^\pi(s, a)$

    where $d^\pi(s) = \sum_{t=0}^{\infty} \gamma^t \Pr(s_t = s \mid s_0, \pi)$
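
The quantities on this slide can be checked numerically on a toy problem. The sketch below is illustrative only: the 2-state, 2-action MDP, the fixed policy `PI`, and all function names are assumptions made up for the example, not taken from the paper. It computes the discounted return of a reward sequence and the discounted state distribution d^π, then evaluates ρ(π) two ways: by weighting expected rewards with d^π, and by policy evaluation from the initial state distribution.

```python
# Toy 2-state, 2-action MDP; every number here is an illustrative assumption.
GAMMA = 0.9
N_S, N_A = 2, 2
P0 = [1.0, 0.0]                        # P_0(s): initial state distribution
P = [[[0.8, 0.2], [0.1, 0.9]],         # P[s][a][s2] = P(s2 | s, a)
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[0.0, 1.0], [2.0, 0.0]]           # R[s][a] = R(s, a)
PI = [[0.3, 0.7], [0.6, 0.4]]          # PI[s][a] = pi(a | s), a fixed policy


def discounted_return(rewards, gamma=GAMMA):
    """sum_{t=1}^{T} gamma^(t-1) r_t; the first reward is undiscounted."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def state_distribution(horizon=2000):
    """d^pi(s) = sum_t gamma^t Pr(s_t = s | s_0, pi), truncated at `horizon`."""
    prob = list(P0)                    # Pr(s_t = s), starting at t = 0
    d = [0.0] * N_S
    for t in range(horizon):
        for s in range(N_S):
            d[s] += GAMMA ** t * prob[s]
        # Push the state distribution one step forward under pi.
        prob = [sum(prob[s] * PI[s][a] * P[s][a][s2]
                    for s in range(N_S) for a in range(N_A))
                for s2 in range(N_S)]
    return d


def rho_from_distribution():
    """rho(pi) as the d^pi-weighted sum of expected one-step rewards."""
    d = state_distribution()
    return sum(d[s] * PI[s][a] * R[s][a]
               for s in range(N_S) for a in range(N_A))


def rho_from_values(iterations=2000):
    """rho(pi) via iterative policy evaluation, averaged over P_0."""
    v = [0.0] * N_S
    for _ in range(iterations):
        v = [sum(PI[s][a] * (R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2]
                                                   for s2 in range(N_S)))
                 for a in range(N_A))
             for s in range(N_S)]
    return sum(P0[s] * v[s] for s in range(N_S))
```

Both routes to ρ(π) should agree up to truncation error, and the entries of d^π should sum to roughly 1 / (1 − γ), since d^π is a discounted visitation count rather than a normalized distribution.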

slide-7
SLIDE 7

The Options Framework

  • A Markovian option $\omega \in \Omega$ is a triple $(I_\omega, \pi_\omega, \beta_\omega)$:
      • Initiation set $I_\omega \subseteq S$
      • Intra-option policy $\pi_\omega$
      • Termination function $\beta_\omega : S \to [0, 1]$
  • Let $\pi_{\omega,\theta}$ denote the intra-option policy of option $\omega$ parameterized by $\theta$, and $\beta_{\omega,\vartheta}$ the termination function of $\omega$ parameterized by $\vartheta$.
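
The option triple maps naturally onto a small data structure. The following is a minimal sketch, assuming the usual call-and-return execution model (follow the intra-option policy until the termination function fires). The `Option` class, `run_option`, and the 1-D chain environment are hypothetical names introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

State = int
Action = int


@dataclass(frozen=True)
class Option:
    """A Markovian option (I_omega, pi_omega, beta_omega)."""
    initiation_set: FrozenSet[State]                  # I_omega
    policy: Callable[[State], Dict[Action, float]]    # pi_omega(. | s)
    termination: Callable[[State], float]             # beta_omega(s) in [0, 1]

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set


def sample(dist: Dict[Action, float], rng: random.Random) -> Action:
    """Draw one action from a {action: probability} dictionary."""
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def run_option(option, s, step, rng, max_steps=100):
    """Follow the intra-option policy until the termination function fires."""
    assert option.can_start(s), "option is not available in this state"
    trajectory = [s]
    for _ in range(max_steps):
        a = sample(option.policy(s), rng)
        s = step(s, a)
        trajectory.append(s)
        # Terminate in the new state with probability beta_omega(s).
        if rng.random() < option.termination(s):
            break
    return trajectory


def chain_step(s: State, a: Action) -> State:
    """Hypothetical deterministic chain with states 0..5."""
    return max(0, min(5, s + a))


# An option that walks right and terminates with certainty at state 5.
go_right = Option(
    initiation_set=frozenset(range(5)),
    policy=lambda s: {+1: 1.0},
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

trajectory = run_option(go_right, 2, chain_step, random.Random(0))
```

Because this particular policy and termination function are deterministic, the trajectory from state 2 is 2, 3, 4, 5 regardless of the seed. With learned, parameterized π and β, both the path and its length would be random.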
slide-11
SLIDE 11

The Option Critic framework

  • Discounted return:

    $\rho(\Omega, \theta, \vartheta, s_0, \omega_0) = \mathbb{E}_{\Omega,\theta,\omega}\left[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \,\middle|\, s_0, \omega_0\right]$

  • Option-value function:

    $Q_\Omega(s, \omega) = \sum_a \pi_{\omega,\theta}(a \mid s)\, Q_U(s, \omega, a)$

  • Value of executing an action in the context of a state-option pair:

    $Q_U(s, \omega, a) = r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, U(\omega, s')$

  • Option-value function upon arrival:

    $U(\omega, s') = (1 - \beta_{\omega,\vartheta}(s'))\, Q_\Omega(s', \omega) + \beta_{\omega,\vartheta}(s')\, V_\Omega(s')$
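
The three value functions are mutually recursive, so on a small tabular problem they can be solved by repeated backups. The sketch below is illustrative only: the MDP, the option parameters, and the master policy `MASTER` are made-up numbers, and it assumes V_Ω(s) is the master-policy-weighted average of Q_Ω(s, ω), which the slide does not spell out.

```python
# Toy tabular problem: 2 states, 2 actions, 2 options (all numbers assumed).
GAMMA = 0.9
N_S, N_A, N_O = 2, 2, 2
P = [[[0.8, 0.2], [0.1, 0.9]],         # P[s][a][s2] = P(s2 | s, a)
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[0.0, 1.0], [2.0, 0.0]]           # R[s][a] = r(s, a)
PI = [[[0.9, 0.1], [0.2, 0.8]],        # PI[o][s][a] = pi_{omega,theta}(a | s)
      [[0.5, 0.5], [0.7, 0.3]]]
BETA = [[0.1, 0.6], [0.5, 0.2]]        # BETA[o][s] = beta_{omega,vartheta}(s)
MASTER = [[0.5, 0.5], [0.3, 0.7]]      # MASTER[s][o] = pi_Omega(omega | s)


def v_omega(q, s):
    """Assumed definition: V_Omega(s) = sum_omega pi_Omega(omega | s) Q_Omega(s, omega)."""
    return sum(MASTER[s][o] * q[s][o] for o in range(N_O))


def u(q, o, s2):
    """U(omega, s'): continue with omega, or terminate and fall back to V_Omega."""
    return (1 - BETA[o][s2]) * q[s2][o] + BETA[o][s2] * v_omega(q, s2)


def q_u(q, s, o, a):
    """Q_U(s, omega, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) U(omega, s')."""
    return R[s][a] + GAMMA * sum(P[s][a][s2] * u(q, o, s2) for s2 in range(N_S))


def q_omega_backup(q, s, o):
    """Q_Omega(s, omega) = sum_a pi_{omega,theta}(a | s) Q_U(s, omega, a)."""
    return sum(PI[o][s][a] * q_u(q, s, o, a) for a in range(N_A))


def solve(iterations=1000):
    """Iterate the backup from zero; it is a gamma-contraction, so it converges."""
    q = [[0.0] * N_O for _ in range(N_S)]
    for _ in range(iterations):
        q = [[q_omega_backup(q, s, o) for o in range(N_O)] for s in range(N_S)]
    return q


Q = solve()   # Q[s][o] approximates Q_Omega(s, omega)
```

At the fixed point, applying `q_omega_backup` to `Q` leaves it unchanged, which is a quick sanity check that the three equations are wired together consistently.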

slide-12
SLIDE 12

The Option Critic framework (cont.)

  • State-option pairs $(s, \omega)$ lead to an augmented state space
  • One-step transition probability on the augmented space:

$$P(s_{t+1}, \omega_{t+1} \mid s_t, \omega_t) = \sum_a \pi_{\omega_t,\theta}(a \mid s_t)\, P(s_{t+1} \mid s_t, a) \Big( (1 - \beta_{\omega_t,\vartheta}(s_{t+1}))\, \mathbf{1}_{\omega_t = \omega_{t+1}} + \beta_{\omega_t,\vartheta}(s_{t+1})\, \pi_\Omega(\omega_{t+1} \mid s_{t+1}) \Big)$$

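
The one-step transition over state-option pairs can be checked numerically on a small tabular problem. The sketch below is only an illustration: the array layouts (`P[s][a][s2]`, `pi_intra[w][s][a]`, `beta[w][s]`, `pi_master[s][w]`), the helper `random_dist`, and the random test problem are assumptions made here, not part of the slides.

```python
import itertools
import random


def augmented_transition(P, pi_intra, beta, pi_master):
    """Return T[(s, w)][(s2, w2)] = P(s2, w2 | s, w).

    P[s][a][s2]       : environment transition probabilities
    pi_intra[w][s][a] : intra-option policy of option w
    beta[w][s]        : termination probability of option w in state s
    pi_master[s][w]   : policy over options
    """
    n_s, n_a, n_w = len(P), len(P[0]), len(beta)
    T = {}
    for s, w in itertools.product(range(n_s), range(n_w)):
        row = {}
        for s2, w2 in itertools.product(range(n_s), range(n_w)):
            # Probability of reaching s2 while acting with option w.
            reach = sum(pi_intra[w][s][a] * P[s][a][s2] for a in range(n_a))
            # Option w continues (only possible if w2 == w) ...
            keep = (1.0 - beta[w][s2]) * (1.0 if w2 == w else 0.0)
            # ... or terminates in s2 and the master policy picks w2.
            switch = beta[w][s2] * pi_master[s2][w2]
            row[(s2, w2)] = reach * (keep + switch)
        T[(s, w)] = row
    return T


def random_dist(n, rng):
    """Hypothetical helper: a random probability vector of length n."""
    xs = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]


rng = random.Random(0)
n_s, n_a, n_w = 3, 2, 2
P = [[random_dist(n_s, rng) for _ in range(n_a)] for _ in range(n_s)]
pi_intra = [[random_dist(n_a, rng) for _ in range(n_s)] for _ in range(n_w)]
beta = [[rng.random() for _ in range(n_s)] for _ in range(n_w)]
pi_master = [random_dist(n_w, rng) for _ in range(n_s)]

T = augmented_transition(P, pi_intra, beta, pi_master)
```

Each row of `T` should sum to one, which is a quick sanity check that the continue and switch branches together cover all outcomes.
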
slide-13
SLIDE 13

The Option Critic framework (cont.)

  • Theorem 1 (Intra-Option Policy Gradient Theorem)

$$\frac{\partial \rho}{\partial \theta} = \sum_{s,\omega} \mu_\Omega(s, \omega \mid s_0, \omega_0) \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a)$$

$$\mu_\Omega(s, \omega \mid s_0, \omega_0) = \sum_{t=0}^{\infty} \gamma^t P(s_t = s, \omega_t = \omega \mid s_0, \omega_0)$$

  • Theorem 2 (Termination Gradient Theorem)

$$\frac{\partial \rho}{\partial \vartheta} = - \sum_{s',\omega} \mu_\Omega(s', \omega \mid s_1, \omega_0)\, \frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s', \omega)$$

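
Both theorems weight their gradients by the discounted occupancy $\mu_\Omega$ over state-option pairs. A minimal sketch of that quantity, assuming the augmented states are flattened to integer indices and the infinite sum is truncated at a hypothetical `horizon`; the 4-by-4 matrix `T` is made-up example data:

```python
def discounted_occupancy(T, start, gamma, horizon=2000):
    """mu[x] = sum_{t>=0} gamma^t P(x_t = x | x_0 = start), truncated.

    T[x][y] is a row-stochastic matrix over augmented states x = (s, w),
    flattened to integer indices for brevity.
    """
    n = len(T)
    dist = [1.0 if x == start else 0.0 for x in range(n)]  # P(x_t = . | x_0)
    mu = [0.0] * n
    discount = 1.0
    for _ in range(horizon):
        for x in range(n):
            mu[x] += discount * dist[x]
        # Push the distribution over augmented states one step forward.
        dist = [sum(dist[x] * T[x][y] for x in range(n)) for y in range(n)]
        discount *= gamma
    return mu


# Made-up transition matrix over 4 augmented states.
T = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.0, 0.3],
    [0.0, 0.3, 0.5, 0.2],
    [0.25, 0.25, 0.25, 0.25],
]
gamma = 0.9
mu = discounted_occupancy(T, start=0, gamma=gamma)
```

Because every step keeps total probability one, the entries of `mu` should add up to roughly $1/(1-\gamma)$.
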
slide-14
SLIDE 14

Results

  • Works great!
  • But:
  • Cannot directly leverage recent advances in gradient-based policy optimization from MDPs
  • Solution:
  • DAC (The Double Actor-Critic Architecture)

slide-15
SLIDE 15

The Double Actor-Critic Architecture for Learning Options (DAC)

  • Reformulate the option framework as two parallel augmented MDPs
  • A policy-based method
  • All policy optimization algorithms can be used off the shelf (advantage over Option Critic)
  • Apply an actor-critic algorithm on each augmented MDP (double actor-critic)
  • Show that one critic is enough

slide-16
SLIDE 16

Two Augmented MDPs

  • The high-MDP $M^H$:
  • The agent makes high-level decisions (i.e., option selection) in $M^H$ according to $\pi, \{\beta_\omega\}$ and thus optimizes $\pi, \{\beta_\omega\}$
  • The low-MDP $M^L$:
  • The agent makes low-level decisions (i.e., action selection) in $M^L$ according to $\{\pi_\omega\}$ and thus optimizes $\{\pi_\omega\}$

slide-17
SLIDE 17

High MDP

  • Define a dummy option $\#$ and $\Omega^+ = \Omega \cup \{\#\}$
  • Interpret a state-option pair as a new state and an option as a new action.
  • Define a Markov policy
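
One way to read the Markov policy on the high-MDP: the state is the pair (previous option, current environment state) and the action is the next option, so the policy mixes "keep the previous option" with "terminate and let the master policy choose". The sketch below follows that reading. It assumes the dummy option always terminates, so the master policy alone picks the first option; the names `high_policy`, `beta`, and `pi_master` and the numbers are illustrative, not taken from the slides.

```python
DUMMY = "#"  # dummy previous option, used only at t = 0


def high_policy(prev_option, state, beta, pi_master):
    """Distribution over the next option in high-MDP state (prev_option, state).

    beta[w][s]      : termination probability of option w in state s
    pi_master[s][w] : master policy over options
    Assumption: the dummy option terminates with probability one.
    """
    n_w = len(pi_master[state])
    b = 1.0 if prev_option == DUMMY else beta[prev_option][state]
    probs = []
    for w in range(n_w):
        # Continuing is only possible for the option that was already running.
        stay = (1.0 - b) if w == prev_option else 0.0
        probs.append(stay + b * pi_master[state][w])
    return probs


beta = [[0.2, 0.9], [0.5, 0.1]]       # 2 options x 2 states (made up)
pi_master = [[0.3, 0.7], [0.6, 0.4]]  # 2 states x 2 options (made up)

first = high_policy(DUMMY, 0, beta, pi_master)  # no option running yet
later = high_policy(1, 0, beta, pi_master)      # option 1 was running
```

With this layout, both the master policy and the termination functions appear only inside `high_policy`, which matches the slide's point that the high-MDP is where they get optimized.
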

slide-18
SLIDE 18

Low MDP

  • Interpret a state-option pair as a state and leave actions unchanged.
  • Define a Markov policy:
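
In the low-MDP the action is still a primitive action, so the option switch has to live in the dynamics rather than in the policy. A hedged sketch of that idea, assuming the low-MDP kernel factorizes into the environment step followed by the option switch, and that the low-level policy is simply the intra-option policy of the current option; all names and numbers are illustrative:

```python
def low_transition(s, o, a, P, beta, pi_master):
    """Return {(s2, o2): prob} for low-MDP state (s, o) and primitive action a."""
    out = {}
    for s2, p_s2 in enumerate(P[s][a]):
        b = beta[o][s2]
        for o2, p_master in enumerate(pi_master[s2]):
            # The option switch p(o2 | s2, o) is folded into the dynamics.
            p_o2 = (1.0 - b) * (1.0 if o2 == o else 0.0) + b * p_master
            out[(s2, o2)] = p_s2 * p_o2
    return out


def low_policy(s, o, pi_intra):
    """Low-level policy: the intra-option policy of the current option."""
    return pi_intra[o][s]


P = [[[0.9, 0.1], [0.2, 0.8]],          # P[s][a][s2] (made up)
     [[0.5, 0.5], [0.0, 1.0]]]
beta = [[0.2, 0.9], [0.5, 0.1]]         # beta[o][s]
pi_master = [[0.3, 0.7], [0.6, 0.4]]    # pi_master[s][o]
pi_intra = [[[1.0, 0.0], [0.5, 0.5]],   # pi_intra[o][s][a]
            [[0.2, 0.8], [0.0, 1.0]]]

kernel = low_transition(0, 1, 0, P, beta, pi_master)
action_probs = low_policy(0, 1, pi_intra)
```

Under these assumptions only `pi_intra` is a policy parameter of the low-MDP; `beta` and `pi_master` act as part of the environment.
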

slide-19
SLIDE 19

How to sample from these MDPs?

  • Consider trajectories with nonzero probability:

$$\Omega = \{\tau \mid p(\tau \mid \pi, O, M) > 0\}, \quad \Omega^H = \{\tau^H \mid p(\tau^H \mid \pi^H, M^H) > 0\}, \quad \Omega^L = \{\tau^L \mid p(\tau^L \mid \pi^L, M^L) > 0\}$$

  • Where $\tau = \{S_0, O_0, S_1, O_1, \ldots, S_T\}$
  • Define functions:

$$f^H : \Omega \to \Omega^H, \qquad f^L : \Omega \to \Omega^L$$

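
The two maps can be pictured as simple re-labellings of a sampled trajectory. The sketch below is an assumption-laden illustration: it takes the high-MDP state at step t to be (previous option, current state) with the dummy option at t = 0, and it assumes primitive actions were recorded for the low-MDP view, even though the slide's $\tau$ lists only states and options. The function names are hypothetical.

```python
DUMMY = "#"  # dummy option standing in for O_{-1}


def f_high(states, options):
    """Map tau = (S_0, O_0, ..., S_T) to a high-MDP trajectory.

    states  : [S_0, ..., S_T]
    options : [O_0, ..., O_{T-1}]
    Returns (high_states, high_actions) where the state at step t is
    (O_{t-1}, S_t) and the action at step t is O_t.
    """
    prev = [DUMMY] + list(options)
    return list(zip(prev, states)), list(options)


def f_high_inverse(high_states, high_actions):
    """Recover (states, options), showing the re-labelling loses nothing."""
    return [s for _, s in high_states], list(high_actions)


def f_low(states, options, actions):
    """Map per-step lists of equal length to a low-MDP trajectory.

    The state at step t is (S_t, O_t) and the action is A_t.
    """
    return list(zip(states, options)), list(actions)


states = ["s0", "s1", "s2"]
options = [0, 0]
high_states, high_actions = f_high(states, options)
low_states, low_actions = f_low(["s0", "s1"], [0, 0], [1, 0])
```

Since `f_high_inverse(f_high(...))` returns the original lists, the sketch gives an informal feel for why such a map can be a bijection between trajectory sets.
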
slide-20
SLIDE 20

How to sample from these MDPs?

20

  • Lemma 1: is a bijection and
  • Lemma 2: is a bijection and
  • So, Sampling from is equivalent to sampling

from and

p(τ|π, O, M) = p(τ L|πL, M L), r(τ) = r(τ L)

<latexit sha1_base64="7FPq/Lr84Vi81yaBiSJLYsFNhtc=">AKCXiclVZLj+NEPYuryS8ZuHIgRKjKDaTjWweAiEF7YKQgpTRzDCb2ZXixOp4nKRZv3C3J4wcX7nwV7hwACGu/ANu/Buqu+PYyZiZxVKcnV9V97odnsU8ZN81/7t1/6eVXn2t0Wy9/sab7198OCdCxalieuN3MiPkmczwjyfht6IU+57z+LEI8HM957On8txp9eQmjUfiEX8feJCLkM6pSzi6nAeN9vHYHs/pPQK7Oy8+7j7nc6xOjCqc46a2meOqbODOjaCxIEBOy81bZjOv0Q+kCShR2QH53M5kuPk9xOlhHoOLr1GFATBvjEl7NZ9k0ufCQFm9EA4jJAF14jH9sDZyM9618mtk0nPrXNGYZvyhlScOXzPH7EK15AT5zRPiZnZMEk6JD4JWXncFOmDTM/gEpPHNMcupQN73MPHVKpyM8dZgcTRVrvMU+RG6mZJHTbcOZwmOnN4n23ZI9qOAm+BtWkI9okwW239W+xKuvOix+3TDOtvn40+4rFLXSG7G374f4WElSlLrRxTkqu+gH2ISjmuaDgn6D/yMolexXumCj7WZFMCLaikCq1Is6uU7WzDhzRhajAgNlQdKF2ZJpL5wBMUlMxYJy9n6UgvEKyDWXYeMYNuwcNdyaCiDdLtGEaVfUfQP7oDcVEFYFEko3QpNClE4kU/Dq8XBSMqmDUNpVEC9ydHah4w7CcbOVwLNSnvz2NgqYrNpJ9trQCUy/ufCyo3m9tBWp0ayrsze/575dVPnJql6GQjOe/kUnyBivVrtFyiYoViDCtfnkAoDdewk+pOYbaCl9p06sXplFytCtf9/LdOTKgv/Xhvgst9/Ia86K0mr+bGocJutY+n2VAdBRiS7W5H8ozIqhuW9Mji0yOhwYna69w0FuDclgMqjNhDbGuDFVDbFku8bMT3P2ODfjShErSwRaHZoEUpjiqBgiYDkqISCrCz8WOeCJu5yg2Wnjr9Xrdc+eJDJxPB19sqSZ0seQkSaIVFDVb7YIh1FLsFzyQS5DhCG4B8esYk65eQGp+gabdzxwkvAwkAZEJjR7z1XHNWSjLgNb2M0bL0oWFBaKgIDWsJDQ3n4NDsmfKCm4a1MQ61zXqHPxtX0ZuGnghd3C2NgyYz7JxGx0fS9v2SnzYuI+JwtvjGZIAo9NMvklk0MbPdh9lOAv5C9VURGAsaugxlGis7Y/phw1o2NUz7/fJLRME65F7q0Dz1gUcgPovgkiaos3+NBnETilzBXRJchw/nlogrXf8k3j4qOe9XHv07NPDh9tZGjob2nfaDpmqV9pj3SBtqpNtLcxk+NXxq/NX5v/tz8tflH808Vev/eBvOutnM1/oX3hxdCw=</latexit>

{π, O, M}

<latexit sha1_base64="gStaIG9qCdPpdl5xdr2Z/GvFe8k=">AKtHiclVZb9s2Fa7W6xdm6PezlYFhaXENqd8MAD+2GAd6gIMlSJwWiWKNl2dEqyZpIxQsU/cE97m3/ZoekKcmOmnQCLB8enst3Ph6SmqZRSJl/fvg4Tvf+Bzsd/cOPv7k0e7jT0/pMs/8YOwvo2X2akpoEIVJMGYhi4JXaRaQeBoFZ9PXP/H5s6sgo+Eyecmu0+AiJosknIc+YajyHu/83T0AN/gzD6/ALU76L/q/GbRPzD4cGbR3I8QjzKoCX13QeKYgFvqXTcNJ1/CEi2cGPyl1e47DJgpHSzyUYOFtpTGgxAxyxy+m0+LnkOpKDS8MY0trA4FqzPHdpHnsFG9rlpHDZM6uSwljUrAndpl57IZ6Vh+aKS8Q3zwjfuGmJGMhiYDKhvDdZIhiPAUZhg8DUusUiwxi3/NBSs3I5xrDxVu/WcVRshG7V0GNncFRZlCPDWmFHr3dZRwsMHeYgHvIRb1r/IJVCXWpaqxGU8xfjc0h+mOVhvTsr/Hh/xUClqJItfIsAa65ANsuMuR5SwEKf4b6fbsU6KW5ZyHtxyoYJ2RVkUCa0FWe0iA31Dz2xmiqZjiHUgO1CqNkgl/YB9WCAhntlXWXjg3lQXsYZWOIEQwbnmxSBg1uEG7PNJvoexz+/j0ep0HTIpgJC+KE0USU/V4rJ0UtGiwglZVKOHN/e6tQNqbpu0VK49hmpXUlHeXodwkmnU5xVYBMpD5PzdWoVa2dO82RpSu9E9b957Y3TshW9Yiz6TmzFt8jYvkfrLcp3KNrQevG4h+TwBjZC3UtMRXjNTa+dnF6N1W5g3Y5/Z2NCe+oXWw0uzvFb9K2GbwZG6eUrHcPJoUjrwI0KTaPI3FHFM0DS2hE8sk+5+BQnV+nLnvXqaT8o74QZSQwoyBz+yfBIVh3j6HZjwgwWNoKPKD0XlyUV+VY3QYTKqXhQbn7CT8RD/jpBslHC12Aw6J94L4XhfDL6voKahYtLRrJsuQKVU+8qhNAKcahwFKNSmHCBY8E/vGIzecuJA07CNbt4iUzwMQUiAwDf8IfH5VCzD85dyFyHl7RI5C5EhETisiB5vkV85VmLBq9o3LglmitzagTO/vR5iOWawVm6aO5W5I80dNHcqc0cXnbWVGdfL292zBpZ4LZgr4U9bf0cebv/uLOln8dBwvyIUHpuWym7KPjG8aOg1N2cBinxX5NFcI5iQuKAXhTio6uELmpwoZYZ/hC70DY9ChJTeh1P0ZLjpNtzXNk2d56z+XcXRZikOQsSXya5xGwJfAvOJiFGbZEdI0C8bMQsYJ/SfB0YPidpyMJ9nbJt4XTpwP72eDr46/2nv+4pmNH+1z7QjM0W/tWe6NtCNtrPkdu3PW+b1D9G90V/f1QJo+fLD2+UzbePTkP5jkI=</latexit>

{πH, M H}

<latexit sha1_base64="nfXPgStgWXAGefGWOGSvZ3SiCl8=">AK1niclVZb+NEFPYut9ZctguPvBxRbFpNrK5CIQUtAtCshVW7rpVqoba+I4qVnbMZ5xS+WaBxDild/G/+BH8GZGY/tpN7uYinOmTPn8p1vzsx4lkYhZb1z737r73+xptvbW3rb7/z7nsPdh6+f0JXeYHE38VrbLTGaFBFCbBhIUsCk7TLCDxLAqezZ5/y+efXQYZDVfJU3adBucxWSbhIvQJQ5X3cOvf3j64wc95eAlucTx4MvjRoANiDuDQoP0bIR56lkFNGLhLEscE3FLvuWk4/RhGQLKlG5NfvMJlFwEjpZtdrMDA2VpjQocZ4IhdzGbFdyXkRxcGsaQNgYG15rlmUvz2CvYyC6nhRsmC3ZdShjTgj2y8xjN9SzBtBOeY74FhnxCzclGQtJBxW2RpWSUYgwlOY/A0LFKocAaN/zTULByO8aR8sRZvdfEUbERutVAhwo7g8PMoB4b0Ro9erurOFhi7jAB94CLes/4HqsS6lLVWI9mL8emyP0xyoN6Tmo8OH/JQKWokh15VkCXHsBNl1kyLOAhT+DPV7dinQS3PQtqPVDBOyFVNAmlDV3lKg9xQ8iboKma4RxKDTQqjJIJfmEPVAsKZLRfNl06MZQH7WOUtSFGMGx4tE4ZtLhBuH3TbKPvc/h7L/E4aTtgUgQjeVGcKJKYqsdj3aSgRYsVtKpDCW/u9IKpL1p2l5x5TFMcyU15d1lKDeJpiqn2ChABjL/58Yq1MqWbpy3W0Nq17rnxTuvu3E6tqJXTETfia34Chm792izRfkORvaLB73kBzewFqolxJTE95w0+8mp9gtVtYN+Pf2ZjQnfrJRoOLc/wWvahtB2/Hxikl6739aeHIqwBNivXjSNwRfvAEhqRfLrHOTiQZ52fp9x5t5nmk/JOuIHUkILMwY8sn0TFAZ5+yZ8bUEr6Lj2Q1F5cpFfVWN0mI4bFx6Umx/zE/GAv46RbJTwNRwOB8feU2G4mI6/qFm4fKCkSxbXYHKqfcUQuiEOFI4inEpTLjAseAfXrGZvOXEASfhmj08ZI5YGIKRCYhT8FPr+qBRj+cu5C5Lw6IkchciQipxORg03yA+cqTFg9+8JlwQRzRW7jwJnfXA+xHOolOvmTm3uSHMHzZ3a3KlayM1XzChr1Yb6h5Ef1Q4qPB2dq2hJR64LdiVsKtVz6G387c7X/l5HCTMjwilZ7aVsvOCby0/CjBhToOU+M/JMjhDMSFxQM8L8VlWQg81uJSrDH9YndC2PQoSU3odz9CSF0I357iya+4sZ4svz4swSXMWJL5MtMgjYCvg3gwDzNsmugaBeJnIWIF/4Lg+cHwS1BHEuzNkm8LJ58M7U+Hnx9tv4m4qOLe1D7SPN0GztC+2xNtYOtYnmbx9vX2/tv27fqr/qv+h/ylN79+rfD7Q1h79r/8Axaed0A=</latexit>

{π^L, M^L}

p(τ | π, O, M) = p(τ^H | π^H, M^H),   r(τ) = r(τ^H)

f^H : Ω → Ω^H

f^L : Ω → Ω^L
slide-21
SLIDE 21
  • Proposition:
  • Optimizing π, O in M is equivalent to optimizing π^H in M^H and optimizing π^L in M^L:

J = ∫ r(τ) p(τ | π, O, M) dτ = ∫ r(τ^H) p(τ^H | π^H, M^H) dτ^H = ∫ r(τ^L) p(τ^L | π^L, M^L) dτ^L

How to optimize π^H and π^L in these MDPs?
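The proposition can be sanity-checked numerically on a small tabular problem. The sketch below is a consistency check, not a proof. It assumes the usual augmented-MDP construction: the high MDP has state (previous option, state) and picks the option, the low MDP has state (state, option) and picks the action, and the first option is drawn directly from the master policy. All sizes, array names, and random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nO, nA, gamma = 3, 2, 2, 0.9   # made-up sizes and discount

def random_dist(*shape):
    """Random probability table, normalised over the last axis."""
    x = rng.random(shape)
    return x / x.sum(axis=-1, keepdims=True)

P = random_dist(nS, nA, nS)        # P[s, a, s']    environment dynamics
R = rng.random((nS, nA))           # R[s, a]        reward
p0 = random_dist(nS)               # p0[s]          initial state distribution
pi = random_dist(nS, nO)           # pi[s, o]       master policy
beta = rng.random((nO, nS))        # beta[o, s]     termination probabilities
intra = random_dist(nO, nS, nA)    # intra[o, s, a] intra-option policies

# pi_H[o_prev, s, o]: keep o_prev with prob. 1 - beta, else resample from pi.
pi_H = beta[:, :, None] * pi[None, :, :]
pi_H += (1.0 - beta)[:, :, None] * np.eye(nO)[:, None, :]

# Option-level reward and dynamics (intra-option policies averaged out).
r_opt = np.einsum('osa,sa->so', intra, R)      # r_opt[s, o]
P_opt = np.einsum('osa,sat->sot', intra, P)    # P_opt[s, o, s']

# High MDP: evaluate pi_H over augmented states (o_prev, s).
b_H = np.einsum('pso,so->ps', pi_H, r_opt).reshape(-1)
T_H = np.einsum('pso,sot->psot', pi_H, P_opt).reshape(nO * nS, nO * nS)
v_H = np.linalg.solve(np.eye(nO * nS) - gamma * T_H, b_H).reshape(nO, nS)
q_first = r_opt + gamma * np.einsum('sot,ot->so', P_opt, v_H)
J_H = np.einsum('s,so,so->', p0, pi, q_first)

# Low MDP: evaluate pi_L over augmented states (s, o); option switches
# are folded into the transition kernel.
T_L = np.einsum('sot,otq->sotq', P_opt, pi_H).reshape(nS * nO, nS * nO)
v_L = np.linalg.solve(np.eye(nS * nO) - gamma * T_L,
                      r_opt.reshape(-1)).reshape(nS, nO)
J_L = np.einsum('s,so,so->', p0, pi, v_L)

print(J_H, J_L)   # the two objectives should agree up to rounding
```

Both evaluations describe the same Markov chain over (state, option) pairs, which is why the two linear solves return the same objective.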
slide-22
SLIDE 22

How to optimize π^H and π^L in these MDPs?

  • Observation 1: M^H depends on {π_o}, while π^H depends on π and {β_o}
  • When we keep the intra-option policies {π_o} fixed and optimize π^H, we are implicitly optimizing π and {β_o}
  • Observation 2: M^L depends on π and {β_o}, while π^L depends on {π_o}
  • When we keep the master policy π and the termination conditions {β_o} fixed and optimize π^L, we are implicitly optimizing {π_o}

πL

<latexit sha1_base64="KzYpcZOi6pC8bBQcl5J9XcsECs=">ALaHiclVZb9s2Fa7WyPvku6CYdjLQPD0uIY0i7YUMBDu2GANyhIMtdpAcsWaFm2tVqWJtLJAkX7kXvbD9jLfsUOSVGSHTfpBFg+PDyX73w8JDVJliFlvX3vftvPnW2+82NMb73/gf7Dz8p/E69YOBHy/j9MWE0GAZroIBC9kyeJGkAYkmy+D5OWPfP75RZDSMF49Y1dJMIrIfBXOQp8wVHkP9/5sHoMb/L4OL8DN+u2n7V8N2iZmG04N2roW4qlnGdSEtjsnUTAzfWm4TjL6ALJ27EfnDy1y2CBjJ3XQRg4GzpcaEHWaAI7aYTLKfcq4ja3BpGEFSGRhca+ZDl64jL2NdOx9nbriasatcwhn7MjOU49dU89qQz3lCPHNUuJnbkJSFpIlcFh5bVgk6YIT2GKwZMwxyqFAmvc8k9CwcrNGfKE2f1ZhVHxUboVgUdCuwMTlODeqxLS/To7cZRMfc4QrcEy7qTeNnrEqoc1VjOZpg/nJsdtEfqzSkZ7vAh/8XCFiKItWlZwlw9QXYdpEhzsKUPhT1B/auUAvzT0LaT9TwTghlyUJpA5d5ckNck3NM2+ApmqGcyg1UKkwSir4hUNQLSiQ0VZedenAUB60hVE2hjBsOFokzKocYNwW6ZR9/i8A/v8DivO2BSBCN5UZwokpiqx2O7SUGLGitoVYS3tzvzgqkvWnaXnbpMUxzKTX57WUoN4mKCfbKkAGMv/nxsrUyuZutK63htRudM+rd97uxtmxFb1sIPpObMXyLh7j1ZblO9QtKHV4nEPyeE1bIS6k5iScAxpNp3NzutCqxdA7ud4NbOhN25n251uDjIb/CL2nrwemycUrLePB5njrwL0CTbPI/EJZHVTyhEcnHh5zHE3nY+euEOx9U03xSXgrXkBhSkDn4meWTZXaCx9+xCd9bUAvaK/1QVJ5c5HdVDx3GvcqFB+XmfX4knvBXH8lGCV+dTqfd954Jw9m497iEmobzBSNpGl+Cyqk3FULYCbGrcGS9XJhwgWPBP7xjU3nNiRNOwjWbeOStpoCJIaRAYBL+Fvj8rhZg+Mu5DZHz+ogchciRiJydiBxskl84V+GKlbOvXBZMFXkVg6c+e31EMsxhUK5ae6U5o40d9DcKc2dorW2UkPZc8Vy1xQOVzjlpwtGqC5W3I0TGjDob/V9/hj6GxQPrbY9qnvi/VweDqbaCBUkvUjlZvIYi/MCUbF5Yx6+wdWxIP3BTsQjQiufU2/Lncb+OgpWzF8Soe2lbBRxve6vw5oGCfFfknkwRHFoCOMvGhmEMTNdhbcYo/pFto6x4ZiSi9iZoycug23NcuWtuGaz70ZuErWLFj5MtFsvQWA/qhGmYhcvr1AgfhoiVvAXBA80ht+mOpJgb5d8Uzj/smN/1fnm7OuDJz8UdDzQPtceaYZma9qT7SedqoNH/vH72hf6x/ov/b2G982vhMmt6/V/h8pG08jUf/ATmrzCw=</latexit>

{πo}

<latexit sha1_base64="I0gK3nS9MIWLIbmTr5hbjFpdxw=">ALWXiclVZtb+NEHYP7mhdXnrcR76MqKLYNI1sXgQ6KegOhBSQq7bkcndSnFgbx0nNxbHxbloq13+SD0iIv8IHZne9tpP62iNSkvHsvDz7Oysp8kypMy/t58N7Dx9sLun73/40cefHDz+9CWN16kfDP14Gaevp4QGy3AVDFnIlsHrJA1INF0Gr6ZvfuTry6DlIbx6gW7ToJxRBarcB76hKHKe7wXt07ADX5fh5fgZoPO86vBu0QswNnBm3fCPHMswxqQsdkCgi4OZ6y03CyRfQA5Iu3Ij84WUuwgYyd30IgYDV0uNCQ1mgE/sYjrNfsq5jqzBpWESWVgcK2Zj1y6jryM9ex8krnhas6ucwljkrFjO089dkM9qwP1lGPEN0+Jn7kJSVlIlsBh5bXHIkPRHgKMwyehDlWKRY45Z/EgpWbsc4V564qreqOCo2Qrcq6FBgZ3CWGtRjPVqiR283joIF5g5X4J5yUW8ZP2NVQp2rGsunKeYvn80e+mOVhvTsFPjw/xIBS1GkuvIsAa6+AdsuMuSoQCFP0X9kZ0L9NLcs5D2cxWME3JVkDq0FWe3CA31Dz3hmiqVjiHUgOVCqOkgl84AtWCAhlt51WXDg3lQdsYZeMRIxg2HG9SBjVuEG7bNOvo2xz+0T0eL+sOmBTBSF4UJ4okpurxWDMpaFjBa3KUMKb+91bgbQ3TdvLrjyGa6kJr+7DOUm0RTlZFsFyEDm/zxYmdrZ3I3W9daQ2o3uefvJa26chqPoZUPRd+IovkPG5jNaHVF+QtGVpvHPSHN7AR6l5iSsIxLFq32Z2hVYuwZ2O8GdnQnNuZ9vdbgY5Lf4RW09eD02LilZb51MkfeBWiSbc4jcUlk9YklNCL5IjzeCqHnb9OuPNhtcwX5aVwA4khBZmDzyfLNTH8nJnxvQS1ov/RDUXlykd9VfXSY9CsXHpSbD/hIPOU/AyQbJfzpdrudgfdCGM4n/acl1DRcXDCSpvEVqJx6SyGERog9hSPr58KECxwL/uEdm8prTkw4Cds4chbzQATQ0iBwDT8LfD5XS3A8B/nLkTOuyNyFCJHInIaETnYJL9wrsIVK1fui2YKbIrRw489v7IbZjBoVy09wpzR1p7qC5U5o7RWtpYay54rtrikcrnDKVxeMUF2seBqnNGAw2Or7/CkMNigeWR17XPfE+7kcDqY6CBUklcvN5BiL8f7PvYNDq2uJD9wW7EI41IrPmXfwpzuL/XUrJi/JSObCth4wfbn8Z5Lq7pkFC/DdkEYxQXJEoONMvBnm0EINlOc4hf5Fdq6R0YiSq+jKVpy3HR7jSub1kZrNv9unIWrZM2ClS8TzdLYDHw10yYhSm27fIaBeKnIWIF/4LgBGP4MqojCfZ2ybeFl1927a+635x/fjsh4KOXe0z7XPN0GztW+2Z1tfOtKHm7/2196/+UH+k/7O/s7+7r0vTBzuFzxNt47P/5D+TGsew</latexit>

π

<latexit sha1_base64="qNrtmE0e43JQ5WPSosHXcFhfzM0=">ALXniclVZtb+NEHYPOK4Ox/XgCxJfRlRbJpGNi8CnR0B0IKyFVbcumdFCfWxnESc3FivOuWyvWf5BviCz+F2d2s7aS+9oiUZDw7L8OzvrSbwMKbOsv/cevPf+Bw8/fLSvNz56/PGTg6efXNB1mvjBwF8v18nrCaHBMlwFAxayZfA6TgISTZbBq8mbn/j6q8sgoeF69ZJdx8EoIvNVOAt9wlDlPd1nzRNwgz/S8BLcrN9+0f7NoG1ituHMoK0bIZ5lkFNaLtzEkUE3FxvunE4/hK6QJK5G5E/vcxli4CR3E0WazBwtdCYUGMG+MQWk0n2c851JAWXhHEpYHBtWY+dGkaeRnr2vk4c8PVjF3nEsY4Y8d2njshnpWG6opR4hvlhA/c2OSsJAsgcPK4+bJF0Q4SlMXgc5lilUGCNO/5xKFi5HeNceKq3izjqNgI3SqhwY7g7PEoB7r0gI9ervrKJhj7nAF7ikX9abxC1Yl1LmqsXiaYP7i2eyiP1ZpSM/2Bh/+XyJgKYpUV54lwFU3YNdFhzWFKDwJ6g/snOBXp7FtJ+roJxQq4KEkgVusqTG+SGmufeAE3VCudQaqBUYZRE8AtHoFpQIKOtvOzSgaE8aAujbD1iBMOG423KoMINwm2ZhV9i8M/usfjouqASRGM5EVxokhiqh6P1ZOCFhVW0KoIJby5370VSHvTtL3symOY5kpq8rvLUG4SzacbKcAGcj8nwcrUzubu1FabQ2p3eqet5+8+sapOYpeNhB9J47iO2SsP6PlEeUnFG1ouXncQ3J4A1uh7iWmIBxDHKv2rWenVYK1K2B3E9zZmVCf+8VOh4tBfotf1FaDV2PjkpL15sk4c+RdgCbZ9jwSl0RWnVhCI5KPjziPp3LY+WnMnQ/LZb4oL4UbiA0pyBx8ZvlkmZ3i+Dsx4QcLKkF7hR+KypOL/K7qocO4V7rwoNy8z0fiKf/pI9ko4U+n02n3vZfCcDbuPSugJuF8wUiSrK9A5dSbCiHUQuwqHFkvFyZc4FjwD+/YRF5zYsJuGYTR95qCpgYQgoEJuHvgc/vagG/zh3IXLeHZGjEDkSkVOLyMEm+ZVzFa5YsfrWbcEU0Vu6cCZ390PsR1T2Ci3zZ3C3JHmDpo7hbmza2d1FD03Ga7KwqHK5zi1QUjlBcrnsYJDRj0d/o+fwb9LYqHVtseVT3xfi6Gg6kOQgmpyOVmco6tc46AiweHVscSH7gt2BvhUNt8zryDv9zp2k+jYMX8JaF0aFsxG2X8hPvLAGOmNIiJ/4bMgyGKxIFdJSJ18McmqjBjlon+EWShbqkZGI0utogpYcPN1d48q6tWHKZt+PsnAVpyxY+TLRLF0CWwN/14RpmGDvLq9RIH4SIlbwFwTHGM3Uh1JsHdLvi1cfNWxv+58e/7N4fMfN3Q80j7XvtAMzda+05rPe1MG2j+/j/6nq7rDf3fxsPG48YTafpgb+Pzqb1aXz2H5i7yEc=</latexit>

πL

<latexit sha1_base64="KzYpcZOi6pC8bBQcl5J9XcsECs=">ALaHiclVZb9s2Fa7WyPvku6CYdjLQPD0uIY0i7YUMBDu2GANyhIMtdpAcsWaFm2tVqWJtLJAkX7kXvbD9jLfsUOSVGSHTfpBFg+PDyX73w8JDVJliFlvX3vftvPnW2+82NMb73/gf7Dz8p/E69YOBHy/j9MWE0GAZroIBC9kyeJGkAYkmy+D5OWPfP75RZDSMF49Y1dJMIrIfBXOQp8wVHkP9/5sHoMb/L4OL8DN+u2n7V8N2iZmG04N2roW4qlnGdSEtjsnUTAzfWm4TjL6ALJ27EfnDy1y2CBjJ3XQRg4GzpcaEHWaAI7aYTLKfcq4ja3BpGEFSGRhca+ZDl64jL2NdOx9nbriasatcwhn7MjOU49dU89qQz3lCPHNUuJnbkJSFpIlcFh5bVgk6YIT2GKwZMwxyqFAmvc8k9CwcrNGfKE2f1ZhVHxUboVgUdCuwMTlODeqxLS/To7cZRMfc4QrcEy7qTeNnrEqoc1VjOZpg/nJsdtEfqzSkZ7vAh/8XCFiKItWlZwlw9QXYdpEhzsKUPhT1B/auUAvzT0LaT9TwTghlyUJpA5d5ckNck3NM2+ApmqGcyg1UKkwSir4hUNQLSiQ0VZedenAUB60hVE2hjBsOFokzKocYNwW6ZR9/i8A/v8DivO2BSBCN5UZwokpiqx2O7SUGLGitoVYS3tzvzgqkvWnaXnbpMUxzKTX57WUoN4mKCfbKkAGMv/nxsrUyuZutK63htRudM+rd97uxtmxFb1sIPpObMXyLh7j1ZblO9QtKHV4nEPyeE1bIS6k5iScAxpNp3NzutCqxdA7ud4NbOhN25n251uDjIb/CL2nrwemycUrLePB5njrwL0CTbPI/EJZHVTyhEcnHh5zHE3nY+euEOx9U03xSXgrXkBhSkDn4meWTZXaCx9+xCd9bUAvaK/1QVJ5c5HdVDx3GvcqFB+XmfX4knvBXH8lGCV+dTqfd954Jw9m497iEmobzBSNpGl+Cyqk3FULYCbGrcGS9XJhwgWPBP7xjU3nNiRNOwjWbeOStpoCJIaRAYBL+Fvj8rhZg+Mu5DZHz+ogchciRiJydiBxskl84V+GKlbOvXBZMFXkVg6c+e31EMsxhUK5ae6U5o40d9DcKc2dorW2UkPZc8Vy1xQOVzjlpwtGqC5W3I0TGjDob/V9/hj6GxQPrbY9qnvi/VweDqbaCBUkvUjlZvIYi/MCUbF5Yx6+wdWxIP3BTsQjQiufU2/Lncb+OgpWzF8Soe2lbBRxve6vw5oGCfFfknkwRHFoCOMvGhmEMTNdhbcYo/pFto6x4ZiSi9iZoycug23NcuWtuGaz70ZuErWLFj5MtFsvQWA/qhGmYhcvr1AgfhoiVvAXBA80ht+mOpJgb5d8Uzj/smN/1fnm7OuDJz8UdDzQPtceaYZma9qT7SedqoNH/vH72hf6x/ov/b2G982vhMmt6/V/h8pG08jUf/ATmrzCw=</latexit>

{βo}

<latexit sha1_base64="2h2Rs+cjLdKMJN1vx0oiQRUsXfQ=">ALaHiclVZb9s2Fa7W2Pvku6CYdjLQPD0uIY0i7YUMBDu2GANyhIMtdpAcsWaFm2tVqWJlLJAkX7kXvbD9jLfsUOSVGSHTfpBFg+PDyX73w8JDWNVwFlpvn3vftvPnW2+82Gs03v/Q/2H354TqM08fyhF62i5MWUH8VrP0hC9jKfxEnPgmnK/59OWPfP75hZ/QIFo/Y1exPw7JYh3MA48wVLkP9/5sHYPj/54GF+Bkg87Tzq867RCjA6c6bV8L8dQ1dWpAx1mQMCTg5I2WEweTL6AHJFk4IfnDzRy29BnJnWQZgY6zpcaAHWaAI7acTrOfcq4jKTg0CGuDHSuNfKRQ9PQzVjPyieZE6zn7CqXMCYZO7LyxGX1DU7UE85RnzhHiZE5OEBWQFHFZeGxZJeiDCU5h8DjIsUqhwBq3/ONAsHIzxpnyxNlGq4qjYiN0s4IOBXYGp4lOXdajJXr0dqLQX2DuYA3OCRcbLf1nrEqoc1VjOZpi/nJs9NAfq9SlZ6fAh/8XCFiKItWlawpw9QXYdpEhRzsKUPgT1B9auUAvzV0TaT9TwTghlyUJpA5d5cl1ck2NM3eIpmqGcyg1UKkwSiL4hUNQLSiQ0XZedelQVx60jVE2hBt+BokzKocYNw24ZR9/m8A/v8DivO2BSBCN5UZwokpiqx2W7SUGLGitoVYS3tzvzgqkvWFYbnbpMkxzKTX57WUoN4mKCfbKkAGMv7nxsrUyuZOmNZbQ2o3ufVO2934+zYim42FH0ntuJrZNy9R6stynco2tBq8biH5PAaNkLdSUxJOIY4Uu27m512Bdaqgd1OcGtnwu7cT7c6XBzkN/hFbT14PTZOKbnROp5ktrwL0CTbPI/EJZHVTyhEcknh5zHE3nYeWnMnQ+qaT4pL4VriHUpyBz8zPLIKjvB4+/YgO9NqAXtl34oKk8u8ruqjw6TfuXCg3LzAT8ST/hrgGSjhK9ut9sZuM+E4XzSf1xCTYLFkpEkiS5B5Wy0FELYCbGncGT9XJhwgWPBP7xjE3nNiRNOwjVaeOStZ4CJIaBAYBr85nv8rhZg+Mu+DZH9+ohshciWiOydiGxskl84V8GalbOvXBZMFPkVg6c+e31EMsxg0K5aW6X5rY0t9HcLs3torW2UkPZc8Vy1xQ2V9jlpwtGqC5W3I1T6jMYbPV9/hgGxSPzI41rnvi/VweDobaCBWkMpeTyXMsyjkCPhJ5Ihy5+wdm1xQP3BSsQjQiufU3f/LmUVeGvpr5q0IpSPLjNk43vdW/kYPaV+TLyXZOGPUFyT0KfjTHwo5tBCDfZWlOAP6RbaukdGQkqvwila8jLo9hxX7pobpWz+3TgL1nHK/LUnE83TFbAI+FcnzIEu3h1hQLxkgCxgrckeKAx/DZtIAnWdsk3hfMvu9ZX3W/Ovj548kNBxwPtc+2RpmuW9q32ROtrp9pQ8/b+aTQbHzc+afzb3G9+2vxMmt6/V/h8pG08zUf/ATWxzCw=</latexit>

{πo}

<latexit sha1_base64="I0gK3nS9MIWLIbmTr5hbjFpdxw=">ALWXiclVZtb+NEHYP7mhdXnrcR76MqKLYNI1sXgQ6KegOhBSQq7bkcndSnFgbx0nNxbHxbloq13+SD0iIv8IHZne9tpP62iNSkvHsvDz7Oysp8kypMy/t58N7Dx9sLun73/40cefHDz+9CWN16kfDP14Gaevp4QGy3AVDFnIlsHrJA1INF0Gr6ZvfuTry6DlIbx6gW7ToJxRBarcB76hKHKe7wXt07ADX5fh5fgZoPO86vBu0QswNnBm3fCPHMswxqQsdkCgi4OZ6y03CyRfQA5Iu3Ij84WUuwgYyd30IgYDV0uNCQ1mgE/sYjrNfsq5jqzBpWESWVgcK2Zj1y6jryM9ex8krnhas6ucwljkrFjO089dkM9qwP1lGPEN0+Jn7kJSVlIlsBh5bXHIkPRHgKMwyehDlWKRY45Z/EgpWbsc4V564qreqOCo2Qrcq6FBgZ3CWGtRjPVqiR283joIF5g5X4J5yUW8ZP2NVQp2rGsunKeYvn80e+mOVhvTsFPjw/xIBS1GkuvIsAa6+AdsuMuSoQCFP0X9kZ0L9NLcs5D2cxWME3JVkDq0FWe3CA31Dz3hmiqVjiHUgOVCqOkgl84AtWCAhlt51WXDg3lQdsYZeMRIxg2HG9SBjVuEG7bNOvo2xz+0T0eL+sOmBTBSF4UJ4okpurxWDMpaFjBa3KUMKb+91bgbQ3TdvLrjyGa6kJr+7DOUm0RTlZFsFyEDm/zxYmdrZ3I3W9daQ2o3uefvJa26chqPoZUPRd+IovkPG5jNaHVF+QtGVpvHPSHN7AR6l5iSsIxLFq32Z2hVYuwZ2O8GdnQnNuZ9vdbgY5Lf4RW09eD02LilZb51MkfeBWiSbc4jcUlk9YklNCL5IjzeCqHnb9OuPNhtcwX5aVwA4khBZmDzyfLNTH8nJnxvQS1ov/RDUXlykd9VfXSY9CsXHpSbD/hIPOU/AyQbJfzpdrudgfdCGM4n/acl1DRcXDCSpvEVqJx6SyGERog9hSPr58KECxwL/uEdm8prTkw4Cds4chbzQATQ0iBwDT8LfD5XS3A8B/nLkTOuyNyFCJHInIaETnYJL9wrsIVK1fui2YKbIrRw489v7IbZjBoVy09wpzR1p7qC5U5o7RWtpYay54rtrikcrnDKVxeMUF2seBqnNGAw2Or7/CkMNigeWR17XPfE+7kcDqY6CBUklcvN5BiL8f7PvYNDq2uJD9wW7EI41IrPmXfwpzuL/XUrJi/JSObCth4wfbn8Z5Lq7pkFC/DdkEYxQXJEoONMvBnm0EINlOc4hf5Fdq6R0YiSq+jKVpy3HR7jSub1kZrNv9unIWrZM2ClS8TzdLYDHw10yYhSm27fIaBeKnIWIF/4LgBGP4MqojCfZ2ybeFl1927a+635x/fjsh4KOXe0z7XPN0GztW+2Z1tfOtKHm7/2196/+UH+k/7O/s7+7r0vTBzuFzxNt47P/5D+TGsew</latexit>
slide-23
SLIDE 23

Do we need two critics?

  • Proposition: When state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary
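
The proposition can be made concrete with a small numerical sketch. It assumes the usual call-and-return option execution: the previous option o continues with probability 1 − βo(s'), otherwise the master policy π picks a new option. Under that assumption the high-level value at (o, s') is the expectation of the low-level values at (s', o') over the next option o'. All function and variable names here are illustrative and do not come from the slides.

```python
import numpy as np

def option_transition_probs(beta, pi_master):
    """Return P[o, o'] = p(o' | s', o) at one fixed state s'.

    beta[o]       : termination probability of option o at s' (assumed given)
    pi_master[o'] : master-policy probability of picking option o' at s'
    """
    beta = np.asarray(beta, dtype=float)
    pi_master = np.asarray(pi_master, dtype=float)
    # Terminate (prob. beta[o]) and resample from the master policy ...
    probs = beta[:, None] * pi_master[None, :]
    # ... or keep running the previous option (prob. 1 - beta[o]).
    probs += np.diag(1.0 - beta)
    return probs

def high_value_from_low(beta, pi_master, v_low):
    """High-level values at (o, s') as an expectation of low-level values at (s', o')."""
    return option_transition_probs(beta, pi_master) @ np.asarray(v_low, dtype=float)

# Two options at a single state s': option 0 never terminates, option 1 always does.
beta = [0.0, 1.0]
pi_master = [0.25, 0.75]
v_low = [1.0, 3.0]
v_high = high_value_from_low(beta, pi_master, v_low)
```

In this toy setting only `v_low` would need to be learned; `v_high` follows from it, which is the sense in which a single critic suffices.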


slide-24
SLIDE 24

Experiments setup

  • We want to compare:
  • DAC vs. existing gradient-based option learning methods
  • DAC vs. hierarchy-free methods
  • DAC can use any policy optimization algorithm (AC, SAC, NAC, PPO, …).

  • Here we focus on DAC + PPO.
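
As a rough illustration of what "any policy optimization algorithm" means in practice, the sketch below writes the standard PPO clipped surrogate as a plain function of log-probabilities and advantages. Because it does not care where the data comes from, the same function could be evaluated once on the master-policy data and once on the intra-option-policy data. This is a minimal sketch and not the presenters' implementation; the name `ppo_clip_objective` and the default `eps=0.2` are assumptions.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    new_logp, old_logp : log-probabilities of the taken actions under the
                         current and the data-collecting policy
    advantages         : advantage estimates for those actions
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(new_logp - old_logp)              # importance ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)   # keep the ratio near 1
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

# Toy batch: ratios are 1.5 and 0.5, advantages are +1 and -1.
old_logp = np.log([0.5, 0.5])
new_logp = np.log([0.75, 0.25])
objective = ppo_clip_objective(new_logp, old_logp, [1.0, -1.0])
```

Swapping this function for another policy-gradient loss would leave the rest of such a training loop unchanged, which is the flexibility the slide refers to.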


slide-25
SLIDE 25

Results

  • Single task
  • DAC + A2C performs similarly to OC
  • DAC + PPO performs similarly to PPO


slide-26
SLIDE 26

Results

  • Transfer Learning

  • CartPole = (balance, balance_sparse)
  • Reacher = (easy, hard)
  • Cheetah = (run, backward)
  • Fish = (upright, downleft)
  • Walker1 = (squat, stand)
  • Walker2 = (walk, backward)


slide-27
SLIDE 27

Recap

  • Problem: an end-to-end, policy-based method to learn options and the policy over them
  • Solution: Option Critic
  • Limitation: can’t use other policy optimization algorithms off-the-shelf
  • Solution: reformulate the SMDP of the option framework as two augmented MDPs (Double Actor Critic)
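
The reformulation into two augmented MDPs can be sketched as a data transformation. The sketch assumes the high-level MDP uses (previous option, state) as its state and the chosen option as its action, while the low-level MDP uses (state, current option) as its state and the primitive action as its action, with both receiving the environment reward. The types `Step` and `Transition`, the function `split_trajectory`, and the `initial_option` placeholder are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple

class Step(NamedTuple):
    state: int     # environment state s_t
    option: int    # option o_t active at time t
    action: int    # primitive action a_t
    reward: float  # environment reward

class Transition(NamedTuple):
    state: Tuple[int, int]  # augmented state
    action: int
    reward: float

def split_trajectory(
    steps: List[Step], initial_option: int
) -> Tuple[List[Transition], List[Transition]]:
    """Turn one option-framework trajectory into high- and low-MDP samples."""
    high, low = [], []
    prev_option = initial_option  # placeholder "previous option" before t = 0
    for step in steps:
        # High MDP: state = (o_{t-1}, s_t), action = o_t
        high.append(Transition((prev_option, step.state), step.option, step.reward))
        # Low MDP: state = (s_t, o_t), action = a_t
        low.append(Transition((step.state, step.option), step.action, step.reward))
        prev_option = step.option
    return high, low

steps = [Step(0, 1, 5, 1.0), Step(2, 1, 6, 0.5)]
high, low = split_trajectory(steps, initial_option=0)
```

Each returned list looks like ordinary MDP experience, so it could be handed to a standard actor-critic update without option-specific changes.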


slide-28
SLIDE 28

Thank you