CS 4803 / 7643: Deep Learning
Ashwin Kalyan Georgia Tech
Topics:
– Policy Gradients – Actor Critic
The optimal Q-value function at state s and action a is the expected cumulative reward from taking action a in state s and acting optimally thereafter.
Optimal policy: pick the action with the best Q-value,
$$\arg\max_{a \in A} Q(s, a)$$
[Equation not recovered; its two terms were labeled "Target Q-Value" and "Predicted Q-Value".]
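As a small illustration of picking the action with the best Q-value, the sketch below reads a greedy policy off a tabular Q-function. The table `Q`, its state and action names, and the helper functions are hypothetical and chosen only for this example; they are not taken from the slides.

```python
# Hypothetical Q-table: Q[state][action] -> estimated Q-value (made-up numbers).
Q = {
    "s0": {"left": 0.2, "right": 1.5},
    "s1": {"left": 0.7, "right": 0.1},
}


def greedy_action(q_table, state):
    """Return argmax over actions a of Q(state, a)."""
    action_values = q_table[state]
    return max(action_values, key=action_values.get)


def greedy_policy(q_table):
    """Pick the best-valued action in every state of the table."""
    return {state: greedy_action(q_table, state) for state in q_table}


policy = greedy_policy(Q)
print(policy)
```

If the Q-values were the optimal ones, this greedy policy would be an optimal policy; with estimated Q-values it is only as good as the estimates.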
Transition function $T$ and reward function $R$:
– Known: use value / policy iteration to obtain the “optimal” policy.
– Unknown: estimate Q-values from data (previous class: Q-learning).
– Unknown: estimate $T$ and $R$ from data, then use value / policy iteration (homework!).
– This class: the topics listed above (policy gradients, actor-critic).
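For the case where the transition function and reward function are known, a minimal value-iteration sketch might look as follows. The two-state MDP, the action names, and the discount factor `GAMMA` are assumptions made up for this example (the slides do not specify a discount), so this is an illustration rather than the course's reference implementation.

```python
GAMMA = 0.9  # assumed discount factor, not given in the slides

# Hypothetical known model: T[s][a] is a list of (probability, next_state),
# R[s][a] is the immediate reward for taking action a in state s.
T = {
    0: {"stay": [(1.0, 0)], "go": [(1.0, 1)]},
    1: {"stay": [(1.0, 1)], "go": [(1.0, 0)]},
}
R = {
    0: {"stay": 0.0, "go": 0.0},
    1: {"stay": 1.0, "go": 0.0},
}


def q_value(V, s, a):
    """One-step lookahead: reward plus discounted expected next-state value."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in T[s][a])


def value_iteration(tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in T}
    for _ in range(max_iters):
        new_V = {s: max(q_value(V, s, a) for a in T[s]) for s in T}
        delta = max(abs(new_V[s] - V[s]) for s in T)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(T[s], key=lambda a: q_value(V, s, a)) for s in T}
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

In this toy model the reward is only available by staying in state 1, so the greedy policy moves from state 0 to state 1 and then stays there.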
$$J(\pi) = \mathbb{E}\left[\sum_{t=1}^{T} R(s_t, a_t)\right]$$
$$\pi^* = \arg\max_{\pi : S \to A} \mathbb{E}\left[\sum_{t=1}^{T} R(s_t, a_t)\right]$$
$$\theta^* = \arg\max_{\theta} \mathbb{E}\left[\sum_{t=1}^{T} R(s_t, a_t)\right]$$
$$J(\theta) = \mathbb{E}\left[\sum_{t=1}^{T} R(s_t, a_t)\right] \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} R(s_t^i, a_t^i)$$
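The expected return $J(\theta)$ can be approximated by averaging the summed rewards of $N$ sampled trajectories. The sketch below does this for a made-up problem in which a one-parameter Bernoulli policy picks action 1 with probability `sigmoid(theta)` and receives reward 1 for it, so the exact value is known and can be compared against. The environment, the policy form, and all names are assumptions for illustration only.

```python
import math
import random

HORIZON = 5  # T: number of steps per trajectory (assumed)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_return(theta, rng):
    """Roll out one trajectory and return the sum over t of R(s_t, a_t)."""
    p_one = sigmoid(theta)
    total = 0.0
    for _ in range(HORIZON):
        action = 1 if rng.random() < p_one else 0
        total += float(action)  # toy reward: 1 for action 1, 0 for action 0
    return total


def estimate_J(theta, num_trajectories, seed=0):
    """Monte Carlo estimate: average return over N sampled trajectories."""
    rng = random.Random(seed)
    returns = [sample_return(theta, rng) for _ in range(num_trajectories)]
    return sum(returns) / num_trajectories


def exact_J(theta):
    """Closed form for this toy problem: T * P(action = 1)."""
    return HORIZON * sigmoid(theta)


theta = 0.3
j_hat = estimate_J(theta, num_trajectories=20_000)
j_true = exact_J(theta)
print(j_hat, j_true)
```

The estimate approaches the exact value as the number of trajectories grows; in realistic problems no closed form is available and only the sampled estimate can be computed.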
Policy gradient loop (slide credit: Sergey Levine):
– Run the policy and sample trajectories: $\tau^i = \{s_1, a_1, \ldots, s_T, a_T\}^i$
– Compute the policy gradient:
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left[ \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i) \cdot \sum_{t=1}^{T} R(s_t^i, a_t^i) \right]$$
– Update the policy: $\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$
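The three steps (sample trajectories, compute the policy gradient, update the policy) can be sketched on a toy problem. Everything specific here is an assumption for illustration: a one-parameter Bernoulli policy with `P(a = 1) = sigmoid(theta)`, a reward of 1 for action 1, and the constants `ALPHA`, `NUM_TRAJ`, and `NUM_UPDATES`. For this policy the score is `grad log pi(a) = a - sigmoid(theta)`, and the gradient is averaged over the sampled trajectories.

```python
import math
import random

HORIZON = 5        # T, steps per trajectory (assumed)
ALPHA = 0.05       # step size alpha (assumed)
NUM_TRAJ = 200     # N, trajectories per update (assumed)
NUM_UPDATES = 200  # number of gradient-ascent steps (assumed)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_trajectory(theta, rng):
    """Run the policy for HORIZON steps; return (actions, rewards)."""
    p_one = sigmoid(theta)
    actions = [1 if rng.random() < p_one else 0 for _ in range(HORIZON)]
    rewards = [float(a) for a in actions]  # toy reward: 1 for action 1
    return actions, rewards


def policy_gradient(theta, trajectories):
    """Average over trajectories of (sum_t grad log pi) * (sum_t R)."""
    p_one = sigmoid(theta)
    total = 0.0
    for actions, rewards in trajectories:
        score = sum(a - p_one for a in actions)  # sum_t d/dtheta log pi(a_t)
        total += score * sum(rewards)
    return total / len(trajectories)


def train(theta=0.0, seed=0):
    rng = random.Random(seed)
    for _ in range(NUM_UPDATES):
        trajectories = [sample_trajectory(theta, rng) for _ in range(NUM_TRAJ)]
        theta += ALPHA * policy_gradient(theta, trajectories)  # ascent step
    return theta


theta_final = train()
print(theta_final, sigmoid(theta_final))
```

Because action 1 is the only rewarded action, the update pushes `theta` upward and the policy ends up choosing action 1 most of the time. The estimate is noisy for small `NUM_TRAJ`, which is one reason this basic estimator is usually refined in practice.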
= rθ Z πθ(τ)R(τ)dτ
<latexit sha1_base64="4gyGx/NjBUPBtNX1H/STwXdkv6o=">AD/nicjVNLixNBEJ6d8bGOj82qNy+NIZBoCDOroJfAogjiYVlX5CeD2dTtJsz4PuGtkwDvhXvHhQxKu/w5v/xu7MxM1DxIZqr/6quqrojvKBFfgeb+2bOfK1WvXt2+4N2/dvrPT2L17otJcUnZMU5HKs4goJnjCjoGDYGeZCSOBDuNzl+a+Ol7JhVPkyOYZSyIySThY04JaCjcte+3UB/hmMA0iopXZViQELDiMc54G9NRCh9UCJ0uUmEBj/3ShFB2GelqeqfEgo1hgFUea1bfK4dHVUlKRPGubC94WPLJFAKEsdtCb9oYpgxIZ60/BpJXcK0DbInx5LZef4oqauiEmWyfQC4bEktPDL4qCsJPG+Xw4PFvL8TXkF70LZRaRyOm6r6jx8ZLQROdHsC6PMgOWl2NWxTd1/TG0EZryeyRwSEglSn1e3sRL5n91sriVwNwvxBJYk1OtbTxyZf9hoej1vbmjT8WunadV2GDZ+4lFK85glQAVRauB7GQFkcCpYKWLc8UyQs/JhA20m5CYqaCYX98StTQyQuNU6k9LnKPLGQWJlZrFkWYatWo9ZsC/xQY5jJ8HBU+yHFhCq0bjXCBIkXkLaMQloyBm2iFUcq0V0SnRlwf0i3H1Evz1kTedk72e/6S39/Zpc/9FvY5t64H10GpbvXM2rdeW4fWsUXtwv5kf7G/Oh+dz84353tFtbfqnHvWijk/fgNGClAV</latexit>rθJ(θ) = rθEτ∼pθ(τ)[R(τ)]
<latexit sha1_base64="Ak6iFp2ndaPkRSt0gfcdrY+Q7cQ=">AEBXicjVNb9NAEN3afBTzlcKxlxVRJAeiyi5IcIlUgZAQh6qgpq2UTaz1ZpOs6i/tjlEjsxcu/BUuHECIK/+BG/+Gdey0TYJQR7I0fvP2zZvRbphFQoHn/dmw7GvXb9zcvOXcvnP3v3G1oMjleaS8R5Lo1SehFTxSCS8BwIifpJTuMw4sfh6auyfvyBSyXS5BmGR/EdJKIsWAUDBRsWdst3MUkpjANw+K1DgoaAFEiJplwCRul8FEF0O5gFRTwxNdlCWcXlY6htzWJ+Bj6ROWxYXU9PTysJBmNivfaXfCIFJMpDAhTgu/dQlMOdD2Sn8CNK+6BXBLZHzHpdk5/hC0ygSmUyPcNkLCkrfF3s68qS6Pp6uL+w56/bK0QHdAfTKmk7rarz8HpjcqJYZ+VzkpQX5hdHrvU/c/UpcFM1DM5JKFhROu/5V0sVa6ymfWlM2uNVzjrum0g4aTW/HmwdeT/w6aI6DoLGbzJKWR7zBFhEler7XgaDgkoQLOLaIbniGWndML7Jk1ozNWgmN9ijVsGeFxKs2XAJ6jl08UNFZqFoeGWTpVq7US/Fetn8P4xaAQSZYDT1jVaJxHGFJcPgk8EpIziGYmoUwK4xWzKTV3CMzDcwS/NWR15Oj3R3/6c7u2fNvZf1OjbRNnqEXOSj52gPvUEHqIeY9cn6Yn2zvtuf7a/2D/tnRbU26jMP0VLYv/4CxJTew=</latexit>Expand expectation
20
= rθ Z πθ(τ)R(τ)dτ
<latexit sha1_base64="4gyGx/NjBUPBtNX1H/STwXdkv6o=">AD/nicjVNLixNBEJ6d8bGOj82qNy+NIZBoCDOroJfAogjiYVlX5CeD2dTtJsz4PuGtkwDvhXvHhQxKu/w5v/xu7MxM1DxIZqr/6quqrojvKBFfgeb+2bOfK1WvXt2+4N2/dvrPT2L17otJcUnZMU5HKs4goJnjCjoGDYGeZCSOBDuNzl+a+Ol7JhVPkyOYZSyIySThY04JaCjcte+3UB/hmMA0iopXZViQELDiMc54G9NRCh9UCJ0uUmEBj/3ShFB2GelqeqfEgo1hgFUea1bfK4dHVUlKRPGubC94WPLJFAKEsdtCb9oYpgxIZ60/BpJXcK0DbInx5LZef4oqauiEmWyfQC4bEktPDL4qCsJPG+Xw4PFvL8TXkF70LZRaRyOm6r6jx8ZLQROdHsC6PMgOWl2NWxTd1/TG0EZryeyRwSEglSn1e3sRL5n91sriVwNwvxBJYk1OtbTxyZf9hoej1vbmjT8WunadV2GDZ+4lFK85glQAVRauB7GQFkcCpYKWLc8UyQs/JhA20m5CYqaCYX98StTQyQuNU6k9LnKPLGQWJlZrFkWYatWo9ZsC/xQY5jJ8HBU+yHFhCq0bjXCBIkXkLaMQloyBm2iFUcq0V0SnRlwf0i3H1Evz1kTedk72e/6S39/Zpc/9FvY5t64H10GpbvXM2rdeW4fWsUXtwv5kf7G/Oh+dz84353tFtbfqnHvWijk/fgNGClAV</latexit>rθJ(θ) = rθEτ∼pθ(τ)[R(τ)]
<latexit sha1_base64="Ak6iFp2ndaPkRSt0gfcdrY+Q7cQ=">AEBXicjVNb9NAEN3afBTzlcKxlxVRJAeiyi5IcIlUgZAQh6qgpq2UTaz1ZpOs6i/tjlEjsxcu/BUuHECIK/+BG/+Gdey0TYJQR7I0fvP2zZvRbphFQoHn/dmw7GvXb9zcvOXcvnP3v3G1oMjleaS8R5Lo1SehFTxSCS8BwIifpJTuMw4sfh6auyfvyBSyXS5BmGR/EdJKIsWAUDBRsWdst3MUkpjANw+K1DgoaAFEiJplwCRul8FEF0O5gFRTwxNdlCWcXlY6htzWJ+Bj6ROWxYXU9PTysJBmNivfaXfCIFJMpDAhTgu/dQlMOdD2Sn8CNK+6BXBLZHzHpdk5/hC0ygSmUyPcNkLCkrfF3s68qS6Pp6uL+w56/bK0QHdAfTKmk7rarz8HpjcqJYZ+VzkpQX5hdHrvU/c/UpcFM1DM5JKFhROu/5V0sVa6ymfWlM2uNVzjrum0g4aTW/HmwdeT/w6aI6DoLGbzJKWR7zBFhEler7XgaDgkoQLOLaIbniGWndML7Jk1ozNWgmN9ijVsGeFxKs2XAJ6jl08UNFZqFoeGWTpVq7US/Fetn8P4xaAQSZYDT1jVaJxHGFJcPgk8EpIziGYmoUwK4xWzKTV3CMzDcwS/NWR15Oj3R3/6c7u2fNvZf1OjbRNnqEXOSj52gPvUEHqIeY9cn6Yn2zvtuf7a/2D/tnRbU26jMP0VLYv/4CxJTew=</latexit>= Z rθπθ(τ)R(τ)dτ
<latexit sha1_base64="NIR6vyQDXONwIZJVMqpd8Xs+Xk=">AD/nicjVNdixMxFJ2d8WMdv7rqmy/BUmi1lJlV0JfCogjiw7LKdnehmQ6ZNG3DzheTO7IlBvwrvigiK/+Dt/8Nyad6e62FTEw825J+e0miPOYCPO/3lu1cuXrt+vYN9+at23fuNnbuHYmsLCgb0CzOipOICBbzlA2AQ8xO8oKRJIrZcXT6yuSP7BC8Cw9hHnOgoRMUz7hlICGwh37Qv1EU4IzKJIvlahJCFgwROc8zam4w+ihA6XSRCU98ZVIov8h0Nb2jcMwmMSiTDSr76nRYSVJSzfq/aShws+nUGAMHZb6G0bw4wB6azVx0DKqkpYEdoGOa9xSXaBLzW1IiZ5XmRnCE8KQqWv5L6qLPG+r0b7S3v+pj3Ju6C6iFRBx21VlUePjTdSTDX7zDgzoLowu9q20f1H18ZgzuezCYlUzq/eo0VjL/M5vNsQSuEeIprKudW6jHt35wbP5ho+n1vMVCm4FfB02rXgdh4xceZ7RMWAo0JkIMfS+HQJICOI2ZcnEpWE7oKZmyoQ5TkjARyMX1VailkTGaZIX+tOEFevmEJIkQ8yTSTONWrOcM+LfcsITJi0DyNC+BpbQqNCljBkybwGNecEoxHMdEFpw7RXRGdGXB/SLcfUQ/PWN4Oj3Z7/tLf7lz72U9jm3rofXIalu+9dzas95YB9bAora0P9tf7W/OJ+eL8935UVHtrfrMfWtlOT/AER1UBU=</latexit>Expand expectation Exchange integration and expectation
21
= rθ Z πθ(τ)R(τ)dτ
<latexit sha1_base64="4gyGx/NjBUPBtNX1H/STwXdkv6o=">AD/nicjVNLixNBEJ6d8bGOj82qNy+NIZBoCDOroJfAogjiYVlX5CeD2dTtJsz4PuGtkwDvhXvHhQxKu/w5v/xu7MxM1DxIZqr/6quqrojvKBFfgeb+2bOfK1WvXt2+4N2/dvrPT2L17otJcUnZMU5HKs4goJnjCjoGDYGeZCSOBDuNzl+a+Ol7JhVPkyOYZSyIySThY04JaCjcte+3UB/hmMA0iopXZViQELDiMc54G9NRCh9UCJ0uUmEBj/3ShFB2GelqeqfEgo1hgFUea1bfK4dHVUlKRPGubC94WPLJFAKEsdtCb9oYpgxIZ60/BpJXcK0DbInx5LZef4oqauiEmWyfQC4bEktPDL4qCsJPG+Xw4PFvL8TXkF70LZRaRyOm6r6jx8ZLQROdHsC6PMgOWl2NWxTd1/TG0EZryeyRwSEglSn1e3sRL5n91sriVwNwvxBJYk1OtbTxyZf9hoej1vbmjT8WunadV2GDZ+4lFK85glQAVRauB7GQFkcCpYKWLc8UyQs/JhA20m5CYqaCYX98StTQyQuNU6k9LnKPLGQWJlZrFkWYatWo9ZsC/xQY5jJ8HBU+yHFhCq0bjXCBIkXkLaMQloyBm2iFUcq0V0SnRlwf0i3H1Evz1kTedk72e/6S39/Zpc/9FvY5t64H10GpbvXM2rdeW4fWsUXtwv5kf7G/Oh+dz84353tFtbfqnHvWijk/fgNGClAV</latexit>rθJ(θ) = rθEτ∼pθ(τ)[R(τ)]
<latexit sha1_base64="Ak6iFp2ndaPkRSt0gfcdrY+Q7cQ=">AEBXicjVNb9NAEN3afBTzlcKxlxVRJAeiyi5IcIlUgZAQh6qgpq2UTaz1ZpOs6i/tjlEjsxcu/BUuHECIK/+BG/+Gdey0TYJQR7I0fvP2zZvRbphFQoHn/dmw7GvXb9zcvOXcvnP3v3G1oMjleaS8R5Lo1SehFTxSCS8BwIifpJTuMw4sfh6auyfvyBSyXS5BmGR/EdJKIsWAUDBRsWdst3MUkpjANw+K1DgoaAFEiJplwCRul8FEF0O5gFRTwxNdlCWcXlY6htzWJ+Bj6ROWxYXU9PTysJBmNivfaXfCIFJMpDAhTgu/dQlMOdD2Sn8CNK+6BXBLZHzHpdk5/hC0ygSmUyPcNkLCkrfF3s68qS6Pp6uL+w56/bK0QHdAfTKmk7rarz8HpjcqJYZ+VzkpQX5hdHrvU/c/UpcFM1DM5JKFhROu/5V0sVa6ymfWlM2uNVzjrum0g4aTW/HmwdeT/w6aI6DoLGbzJKWR7zBFhEler7XgaDgkoQLOLaIbniGWndML7Jk1ozNWgmN9ijVsGeFxKs2XAJ6jl08UNFZqFoeGWTpVq7US/Fetn8P4xaAQSZYDT1jVaJxHGFJcPgk8EpIziGYmoUwK4xWzKTV3CMzDcwS/NWR15Oj3R3/6c7u2fNvZf1OjbRNnqEXOSj52gPvUEHqIeY9cn6Yn2zvtuf7a/2D/tnRbU26jMP0VLYv/4CxJTew=</latexit>= Z rθπθ(τ)R(τ)dτ
<latexit sha1_base64="NIR6vyQDXONwIZJVMqpd8Xs+Xk=">AD/nicjVNdixMxFJ2d8WMdv7rqmy/BUmi1lJlV0JfCogjiw7LKdnehmQ6ZNG3DzheTO7IlBvwrvigiK/+Dt/8Nyad6e62FTEw825J+e0miPOYCPO/3lu1cuXrt+vYN9+at23fuNnbuHYmsLCgb0CzOipOICBbzlA2AQ8xO8oKRJIrZcXT6yuSP7BC8Cw9hHnOgoRMUz7hlICGwh37Qv1EU4IzKJIvlahJCFgwROc8zam4w+ihA6XSRCU98ZVIov8h0Nb2jcMwmMSiTDSr76nRYSVJSzfq/aShws+nUGAMHZb6G0bw4wB6azVx0DKqkpYEdoGOa9xSXaBLzW1IiZ5XmRnCE8KQqWv5L6qLPG+r0b7S3v+pj3Ju6C6iFRBx21VlUePjTdSTDX7zDgzoLowu9q20f1H18ZgzuezCYlUzq/eo0VjL/M5vNsQSuEeIprKudW6jHt35wbP5ho+n1vMVCm4FfB02rXgdh4xceZ7RMWAo0JkIMfS+HQJICOI2ZcnEpWE7oKZmyoQ5TkjARyMX1VailkTGaZIX+tOEFevmEJIkQ8yTSTONWrOcM+LfcsITJi0DyNC+BpbQqNCljBkybwGNecEoxHMdEFpw7RXRGdGXB/SLcfUQ/PWN4Oj3Z7/tLf7lz72U9jm3rofXIalu+9dzas95YB9bAora0P9tf7W/OJ+eL8935UVHtrfrMfWtlOT/AER1UBU=</latexit>= Z rθπθ(τ) · πθ(τ) πθ(τ) · R(τ)dτ
<latexit sha1_base64="6Cf4ltUuB2RixiA9bfabPClb2TA=">AEMXicjVPLbtNAFHVtHsU8msKSzYgoUgJRZRck2ESqQAjEoiqoaStlHGs8GSej+iXPNWpk5pfY8CeITRcgxJafYCZ2sZBiJFs3Tn3zL3nHnuCLOICHOd8w7SuXb9xc/OWfvO3Xtbre37RyItcsqGNI3S/CQgkU8YUPgELGTLGckDiJ2HJy+0vnjywXPE0OYZ4xLybThIecElCQv2+6aABwjGBWRCUr6VfEh+w4DHOeBfTSQqfhA+9PhJ+CU9cqVMou8z0Fb0ncRCGFRxIo1cOT4sCpJSVR+kN0lD+d8OgMPYWx30LsuhkD0mv0x0CKqotfEboauehxpewCX9ZUFTHJsjw9QzjMCS1dWe7LShIfuHK8v5TnrsreR9kH5Eq6NmdqvP4sdZG8qlin2lGpSXYlfH1nX/MbUWmPF6Jr1JSBCRer/qxkrmf7xZt8WzdSGeQLPahYTaPv0hK8OaKbmOVOxmr4l+62s+MsFloP3DpoG/U68Ftf8SlRcwSoBERYuQ6GXglyYHTiEkbF4JlhJ6SKRupMCExE165+OMl6ihkgsI0V4+acYFePVGSWIh5HCimViuaOQ3+LTcqIHzhlTzJCmAJrRqFRYQgRfr6oAnPGYVorgJCc60Ijojyj5Ql8xWJrjNkdeDo90d9+nO7vtn7b2XtR2bxkPjkdE1XO5sWe8NQ6MoUHNz+Y387v5w/pinVs/rV8V1dyozwVpb1+w8M52YB</latexit>Expand expectation Exchange integration and expectation
22
= rθ Z πθ(τ)R(τ)dτ
<latexit sha1_base64="4gyGx/NjBUPBtNX1H/STwXdkv6o=">AD/nicjVNLixNBEJ6d8bGOj82qNy+NIZBoCDOroJfAogjiYVlX5CeD2dTtJsz4PuGtkwDvhXvHhQxKu/w5v/xu7MxM1DxIZqr/6quqrojvKBFfgeb+2bOfK1WvXt2+4N2/dvrPT2L17otJcUnZMU5HKs4goJnjCjoGDYGeZCSOBDuNzl+a+Ol7JhVPkyOYZSyIySThY04JaCjcte+3UB/hmMA0iopXZViQELDiMc54G9NRCh9UCJ0uUmEBj/3ShFB2GelqeqfEgo1hgFUea1bfK4dHVUlKRPGubC94WPLJFAKEsdtCb9oYpgxIZ60/BpJXcK0DbInx5LZef4oqauiEmWyfQC4bEktPDL4qCsJPG+Xw4PFvL8TXkF70LZRaRyOm6r6jx8ZLQROdHsC6PMgOWl2NWxTd1/TG0EZryeyRwSEglSn1e3sRL5n91sriVwNwvxBJYk1OtbTxyZf9hoej1vbmjT8WunadV2GDZ+4lFK85glQAVRauB7GQFkcCpYKWLc8UyQs/JhA20m5CYqaCYX98StTQyQuNU6k9LnKPLGQWJlZrFkWYatWo9ZsC/xQY5jJ8HBU+yHFhCq0bjXCBIkXkLaMQloyBm2iFUcq0V0SnRlwf0i3H1Evz1kTedk72e/6S39/Zpc/9FvY5t64H10GpbvXM2rdeW4fWsUXtwv5kf7G/Oh+dz84353tFtbfqnHvWijk/fgNGClAV</latexit>rθJ(θ) = rθEτ∼pθ(τ)[R(τ)]
<latexit sha1_base64="Ak6iFp2ndaPkRSt0gfcdrY+Q7cQ=">AEBXicjVNb9NAEN3afBTzlcKxlxVRJAeiyi5IcIlUgZAQh6qgpq2UTaz1ZpOs6i/tjlEjsxcu/BUuHECIK/+BG/+Gdey0TYJQR7I0fvP2zZvRbphFQoHn/dmw7GvXb9zcvOXcvnP3v3G1oMjleaS8R5Lo1SehFTxSCS8BwIifpJTuMw4sfh6auyfvyBSyXS5BmGR/EdJKIsWAUDBRsWdst3MUkpjANw+K1DgoaAFEiJplwCRul8FEF0O5gFRTwxNdlCWcXlY6htzWJ+Bj6ROWxYXU9PTysJBmNivfaXfCIFJMpDAhTgu/dQlMOdD2Sn8CNK+6BXBLZHzHpdk5/hC0ygSmUyPcNkLCkrfF3s68qS6Pp6uL+w56/bK0QHdAfTKmk7rarz8HpjcqJYZ+VzkpQX5hdHrvU/c/UpcFM1DM5JKFhROu/5V0sVa6ymfWlM2uNVzjrum0g4aTW/HmwdeT/w6aI6DoLGbzJKWR7zBFhEler7XgaDgkoQLOLaIbniGWndML7Jk1ozNWgmN9ijVsGeFxKs2XAJ6jl08UNFZqFoeGWTpVq7US/Fetn8P4xaAQSZYDT1jVaJxHGFJcPgk8EpIziGYmoUwK4xWzKTV3CMzDcwS/NWR15Oj3R3/6c7u2fNvZf1OjbRNnqEXOSj52gPvUEHqIeY9cn6Yn2zvtuf7a/2D/tnRbU26jMP0VLYv/4CxJTew=</latexit>= Z rθπθ(τ)R(τ)dτ
<latexit sha1_base64="NIR6vyQDXONwIZJVMqpd8Xs+Xk=">AD/nicjVNdixMxFJ2d8WMdv7rqmy/BUmi1lJlV0JfCogjiw7LKdnehmQ6ZNG3DzheTO7IlBvwrvigiK/+Dt/8Nyad6e62FTEw825J+e0miPOYCPO/3lu1cuXrt+vYN9+at23fuNnbuHYmsLCgb0CzOipOICBbzlA2AQ8xO8oKRJIrZcXT6yuSP7BC8Cw9hHnOgoRMUz7hlICGwh37Qv1EU4IzKJIvlahJCFgwROc8zam4w+ihA6XSRCU98ZVIov8h0Nb2jcMwmMSiTDSr76nRYSVJSzfq/aShws+nUGAMHZb6G0bw4wB6azVx0DKqkpYEdoGOa9xSXaBLzW1IiZ5XmRnCE8KQqWv5L6qLPG+r0b7S3v+pj3Ju6C6iFRBx21VlUePjTdSTDX7zDgzoLowu9q20f1H18ZgzuezCYlUzq/eo0VjL/M5vNsQSuEeIprKudW6jHt35wbP5ho+n1vMVCm4FfB02rXgdh4xceZ7RMWAo0JkIMfS+HQJICOI2ZcnEpWE7oKZmyoQ5TkjARyMX1VailkTGaZIX+tOEFevmEJIkQ8yTSTONWrOcM+LfcsITJi0DyNC+BpbQqNCljBkybwGNecEoxHMdEFpw7RXRGdGXB/SLcfUQ/PWN4Oj3Z7/tLf7lz72U9jm3rofXIalu+9dzas95YB9bAora0P9tf7W/OJ+eL8935UVHtrfrMfWtlOT/AER1UBU=</latexit>= Z rθπθ(τ) · πθ(τ) πθ(τ) · R(τ)dτ
<latexit sha1_base64="6Cf4ltUuB2RixiA9bfabPClb2TA=">AEMXicjVPLbtNAFHVtHsU8msKSzYgoUgJRZRck2ESqQAjEoiqoaStlHGs8GSej+iXPNWpk5pfY8CeITRcgxJafYCZ2sZBiJFs3Tn3zL3nHnuCLOICHOd8w7SuXb9xc/OWfvO3Xtbre37RyItcsqGNI3S/CQgkU8YUPgELGTLGckDiJ2HJy+0vnjywXPE0OYZ4xLybThIecElCQv2+6aABwjGBWRCUr6VfEh+w4DHOeBfTSQqfhA+9PhJ+CU9cqVMou8z0Fb0ncRCGFRxIo1cOT4sCpJSVR+kN0lD+d8OgMPYWx30LsuhkD0mv0x0CKqotfEboauehxpewCX9ZUFTHJsjw9QzjMCS1dWe7LShIfuHK8v5TnrsreR9kH5Eq6NmdqvP4sdZG8qlin2lGpSXYlfH1nX/MbUWmPF6Jr1JSBCRer/qxkrmf7xZt8WzdSGeQLPahYTaPv0hK8OaKbmOVOxmr4l+62s+MsFloP3DpoG/U68Ftf8SlRcwSoBERYuQ6GXglyYHTiEkbF4JlhJ6SKRupMCExE165+OMl6ihkgsI0V4+acYFePVGSWIh5HCimViuaOQ3+LTcqIHzhlTzJCmAJrRqFRYQgRfr6oAnPGYVorgJCc60Ijojyj5Ql8xWJrjNkdeDo90d9+nO7vtn7b2XtR2bxkPjkdE1XO5sWe8NQ6MoUHNz+Y387v5w/pinVs/rV8V1dyozwVpb1+w8M52YB</latexit>Expand expectation Exchange integration and expectation
rθ log π(τ) = rθπ(τ) π(τ)
<latexit sha1_base64="PFBv8gP4ATPO1SJnvIETFRlOHmU=">AFOnicjVTLbtNAFHWbAMW8WliyGVFSiCq4oIEm0gVCAmxqALqS8qk1ngySUYdPzRzjVKZ+S42fAU7FmxYgBbPoAZ20Tu6o6kq3rc+/c+7x2EiuIJe7/vKaqN54+atdvunbv37j9Y3h4oOJUrZPYxHLo4AoJnjE9oGDYEeJZCQMBDsMTt7Y/OEnJhWPoz04TdgoJNOITzglYCB/ozFoT7CIYFZEGRvtZ8RH7DiIU54G9NxDJ+VD50uUn4GzxtUyi5yHRNeUdjwSYwxCoNTVW/p4/3ipaUiOyjbp/VYcmnMxghjN0Wet/GMGNAOhV+DCQtWPyioG2Rc46Ftjl+1tN0xCRJZDxHeCIJzTyd7epCEu97+nj3TJ5Xl5fxLuguIkXQcVsF8/FTq43IqameW2UW1Bdil8e2fa+Y2gpMeDmTfYhIEj5vOzGUuY63tRtMXSmEY+g2u1cQmfZGFYdWUriNFdZVrbO9ufo5ywsquqgIRT2sqrmh5rfEXGa5FMHIxsDnkH1Am2dgMe5nKcoJ+eaZqXi4VYbaX9/sbfXyheqBVwabTrkG/vo3PI5pGrIqCBKDb1eAqOMSOBUMO3iVLGE0BMyZUMTRiRkapTlyjVqGWSMJrE0l/E+Rxd3ZCRU6jQMTKW1QFVzFrwsN0xh8mqU8ShJgUW0IJqkAkGM7H8EjblkFMSpCQiV3GhFdEaMSWD+Nq4xwauOXA8Otre851vbH15s7rwu7VhzHjtPnLbjOS+dHedM3D2Hdr40vjR+NX43fza/Nn80/xblK6ulHseOUur+e8/uA7IlQ=</latexit>23
= rθ Z πθ(τ)R(τ)dτ
<latexit sha1_base64="4gyGx/NjBUPBtNX1H/STwXdkv6o=">AD/nicjVNLixNBEJ6d8bGOj82qNy+NIZBoCDOroJfAogjiYVlX5CeD2dTtJsz4PuGtkwDvhXvHhQxKu/w5v/xu7MxM1DxIZqr/6quqrojvKBFfgeb+2bOfK1WvXt2+4N2/dvrPT2L17otJcUnZMU5HKs4goJnjCjoGDYGeZCSOBDuNzl+a+Ol7JhVPkyOYZSyIySThY04JaCjcte+3UB/hmMA0iopXZViQELDiMc54G9NRCh9UCJ0uUmEBj/3ShFB2GelqeqfEgo1hgFUea1bfK4dHVUlKRPGubC94WPLJFAKEsdtCb9oYpgxIZ60/BpJXcK0DbInx5LZef4oqauiEmWyfQC4bEktPDL4qCsJPG+Xw4PFvL8TXkF70LZRaRyOm6r6jx8ZLQROdHsC6PMgOWl2NWxTd1/TG0EZryeyRwSEglSn1e3sRL5n91sriVwNwvxBJYk1OtbTxyZf9hoej1vbmjT8WunadV2GDZ+4lFK85glQAVRauB7GQFkcCpYKWLc8UyQs/JhA20m5CYqaCYX98StTQyQuNU6k9LnKPLGQWJlZrFkWYatWo9ZsC/xQY5jJ8HBU+yHFhCq0bjXCBIkXkLaMQloyBm2iFUcq0V0SnRlwf0i3H1Evz1kTedk72e/6S39/Zpc/9FvY5t64H10GpbvXM2rdeW4fWsUXtwv5kf7G/Oh+dz84353tFtbfqnHvWijk/fgNGClAV</latexit>rθJ(θ) = rθEτ∼pθ(τ)[R(τ)]
<latexit sha1_base64="Ak6iFp2ndaPkRSt0gfcdrY+Q7cQ=">AEBXicjVNb9NAEN3afBTzlcKxlxVRJAeiyi5IcIlUgZAQh6qgpq2UTaz1ZpOs6i/tjlEjsxcu/BUuHECIK/+BG/+Gdey0TYJQR7I0fvP2zZvRbphFQoHn/dmw7GvXb9zcvOXcvnP3v3G1oMjleaS8R5Lo1SehFTxSCS8BwIifpJTuMw4sfh6auyfvyBSyXS5BmGR/EdJKIsWAUDBRsWdst3MUkpjANw+K1DgoaAFEiJplwCRul8FEF0O5gFRTwxNdlCWcXlY6htzWJ+Bj6ROWxYXU9PTysJBmNivfaXfCIFJMpDAhTgu/dQlMOdD2Sn8CNK+6BXBLZHzHpdk5/hC0ygSmUyPcNkLCkrfF3s68qS6Pp6uL+w56/bK0QHdAfTKmk7rarz8HpjcqJYZ+VzkpQX5hdHrvU/c/UpcFM1DM5JKFhROu/5V0sVa6ymfWlM2uNVzjrum0g4aTW/HmwdeT/w6aI6DoLGbzJKWR7zBFhEler7XgaDgkoQLOLaIbniGWndML7Jk1ozNWgmN9ijVsGeFxKs2XAJ6jl08UNFZqFoeGWTpVq7US/Fetn8P4xaAQSZYDT1jVaJxHGFJcPgk8EpIziGYmoUwK4xWzKTV3CMzDcwS/NWR15Oj3R3/6c7u2fNvZf1OjbRNnqEXOSj52gPvUEHqIeY9cn6Yn2zvtuf7a/2D/tnRbU26jMP0VLYv/4CxJTew=</latexit>= Z rθπθ(τ)R(τ)dτ
<latexit sha1_base64="NIR6vyQDXONwIZJVMqpd8Xs+Xk=">AD/nicjVNdixMxFJ2d8WMdv7rqmy/BUmi1lJlV0JfCogjiw7LKdnehmQ6ZNG3DzheTO7IlBvwrvigiK/+Dt/8Nyad6e62FTEw825J+e0miPOYCPO/3lu1cuXrt+vYN9+at23fuNnbuHYmsLCgb0CzOipOICBbzlA2AQ8xO8oKRJIrZcXT6yuSP7BC8Cw9hHnOgoRMUz7hlICGwh37Qv1EU4IzKJIvlahJCFgwROc8zam4w+ihA6XSRCU98ZVIov8h0Nb2jcMwmMSiTDSr76nRYSVJSzfq/aShws+nUGAMHZb6G0bw4wB6azVx0DKqkpYEdoGOa9xSXaBLzW1IiZ5XmRnCE8KQqWv5L6qLPG+r0b7S3v+pj3Ju6C6iFRBx21VlUePjTdSTDX7zDgzoLowu9q20f1H18ZgzuezCYlUzq/eo0VjL/M5vNsQSuEeIprKudW6jHt35wbP5ho+n1vMVCm4FfB02rXgdh4xceZ7RMWAo0JkIMfS+HQJICOI2ZcnEpWE7oKZmyoQ5TkjARyMX1VailkTGaZIX+tOEFevmEJIkQ8yTSTONWrOcM+LfcsITJi0DyNC+BpbQqNCljBkybwGNecEoxHMdEFpw7RXRGdGXB/SLcfUQ/PWN4Oj3Z7/tLf7lz72U9jm3rofXIalu+9dzas95YB9bAora0P9tf7W/OJ+eL8935UVHtrfrMfWtlOT/AER1UBU=</latexit>= Z rθπθ(τ) · πθ(τ) πθ(τ) · R(τ)dτ
<latexit sha1_base64="6Cf4ltUuB2RixiA9bfabPClb2TA=">AEMXicjVPLbtNAFHVtHsU8msKSzYgoUgJRZRck2ESqQAjEoiqoaStlHGs8GSej+iXPNWpk5pfY8CeITRcgxJafYCZ2sZBiJFs3Tn3zL3nHnuCLOICHOd8w7SuXb9xc/OWfvO3Xtbre37RyItcsqGNI3S/CQgkU8YUPgELGTLGckDiJ2HJy+0vnjywXPE0OYZ4xLybThIecElCQv2+6aABwjGBWRCUr6VfEh+w4DHOeBfTSQqfhA+9PhJ+CU9cqVMou8z0Fb0ncRCGFRxIo1cOT4sCpJSVR+kN0lD+d8OgMPYWx30LsuhkD0mv0x0CKqotfEboauehxpewCX9ZUFTHJsjw9QzjMCS1dWe7LShIfuHK8v5TnrsreR9kH5Eq6NmdqvP4sdZG8qlin2lGpSXYlfH1nX/MbUWmPF6Jr1JSBCRer/qxkrmf7xZt8WzdSGeQLPahYTaPv0hK8OaKbmOVOxmr4l+62s+MsFloP3DpoG/U68Ftf8SlRcwSoBERYuQ6GXglyYHTiEkbF4JlhJ6SKRupMCExE165+OMl6ihkgsI0V4+acYFePVGSWIh5HCimViuaOQ3+LTcqIHzhlTzJCmAJrRqFRYQgRfr6oAnPGYVorgJCc60Ijojyj5Ql8xWJrjNkdeDo90d9+nO7vtn7b2XtR2bxkPjkdE1XO5sWe8NQ6MoUHNz+Y387v5w/pinVs/rV8V1dyozwVpb1+w8M52YB</latexit>= Z πθ(τ)rθ log πθ(τ)R(τ)dτ
<latexit sha1_base64="81H4Sz9KP1lTiTaRTFADwQ1hG8=">AEgnicjVNdb9MwFM3aACN8dfDIi0VqR1Vl2xI8EClCYSEeJgGWrdJdRc5rtNacz4U36BVwf+D38UbvwbsJt3WdEJYSnR97vE9517LQSq4BNf9vdVo2vfuP9h+6Dx6/OTps9bO81OZ5BlI5qIJDsPiGSCx2wEHAQ7TzNGokCws+Dyo8mfWeZ5El8AouUTSIyi3nIKQEN+TuNnx0RDgiMA+C4pPyC+IDljzCKe9iOk3gh/Sh10fSL+C1p0wKpTeZvqb3FBYshDGWeaRZQ1dnJQlKRHFN9Vd8XDGZ3OYIydDvrSxTBnQHo1fQwkL1X8ktA1yLXGrbJLfFVTV8QkTbPkCuEwI7TwVHGkSkt86KmLo5U9b9Newfug+oiUQc/plMoXu8YbyWafWcGVDdmF1v29T9R9fGYMqrnswmJoEg1X59GmuZ/5nN5li0nC7EY6hXu7ZQjc9cZDmwekptIiW7rjU1f2clVztT1xfJbMPDnQX9VtsduMuFNgOvCtpWtY791i8TWgesRioIFKOPTeFSUEy4FQw5eBcspTQSzJjYx3GJGJyUiyfkEIdjUxRmGT6010s0dsnChJuYgCzTRuZT1nwLty4xzCd5OCx2kOLKalUJgLBAky7xFNecYoiIUOCM249oronOj7AP1qHT0Er97yZnC6P/AOBvtf37QP1Tj2LZeWq+sruVZb61D67N1bI0s2vjT7DQHzT3btndtz4oqY2t6swLa23Z7/8CAmB+6g=</latexit>Expand expectation Exchange integration and expectation
rθ log π(τ) = rθπ(τ) π(τ)
<latexit sha1_base64="PFBv8gP4ATPO1SJnvIETFRlOHmU=">AFOnicjVTLbtNAFHWbAMW8WliyGVFSiCq4oIEm0gVCAmxqALqS8qk1ngySUYdPzRzjVKZ+S42fAU7FmxYgBbPoAZ20Tu6o6kq3rc+/c+7x2EiuIJe7/vKaqN54+atdvunbv37j9Y3h4oOJUrZPYxHLo4AoJnjE9oGDYEeJZCQMBDsMTt7Y/OEnJhWPoz04TdgoJNOITzglYCB/ozFoT7CIYFZEGRvtZ8RH7DiIU54G9NxDJ+VD50uUn4GzxtUyi5yHRNeUdjwSYwxCoNTVW/p4/3ipaUiOyjbp/VYcmnMxghjN0Wet/GMGNAOhV+DCQtWPyioG2Rc46Ftjl+1tN0xCRJZDxHeCIJzTyd7epCEu97+nj3TJ5Xl5fxLuguIkXQcVsF8/FTq43IqameW2UW1Bdil8e2fa+Y2gpMeDmTfYhIEj5vOzGUuY63tRtMXSmEY+g2u1cQmfZGFYdWUriNFdZVrbO9ufo5ywsquqgIRT2sqrmh5rfEXGa5FMHIxsDnkH1Am2dgMe5nKcoJ+eaZqXi4VYbaX9/sbfXyheqBVwabTrkG/vo3PI5pGrIqCBKDb1eAqOMSOBUMO3iVLGE0BMyZUMTRiRkapTlyjVqGWSMJrE0l/E+Rxd3ZCRU6jQMTKW1QFVzFrwsN0xh8mqU8ShJgUW0IJqkAkGM7H8EjblkFMSpCQiV3GhFdEaMSWD+Nq4xwauOXA8Otre851vbH15s7rwu7VhzHjtPnLbjOS+dHedM3D2Hdr40vjR+NX43fza/Nn80/xblK6ulHseOUur+e8/uA7IlQ=</latexit>24
\begin{align*}
\nabla_\theta J(\theta) &= \nabla_\theta \mathbb{E}_{\tau \sim p_\theta(\tau)}[R(\tau)] \\
&= \nabla_\theta \int \pi_\theta(\tau)\, R(\tau)\, d\tau && \text{(expand expectation)} \\
&= \int \nabla_\theta \pi_\theta(\tau)\, R(\tau)\, d\tau && \text{(exchange gradient and integration)} \\
&= \int \nabla_\theta \pi_\theta(\tau) \cdot \frac{\pi_\theta(\tau)}{\pi_\theta(\tau)} \cdot R(\tau)\, d\tau \\
&= \int \pi_\theta(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\, d\tau \\
&= \mathbb{E}_{\tau \sim p_\theta(\tau)}[\nabla_\theta \log \pi_\theta(\tau)\, R(\tau)]
\end{align*}
The second-to-last step uses the identity
\[ \nabla_\theta \log \pi(\tau) = \frac{\nabla_\theta \pi(\tau)}{\pi(\tau)} \]
25
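The derivation can be sanity-checked numerically. The sketch below is an illustration, not part of the lecture: it assumes a finite set of three "trajectories", a softmax distribution over them with parameters `theta`, and made-up returns. It compares the score-function form of the gradient with a finite-difference estimate of the gradient of the expected return.

```python
import math

def softmax(theta):
    """Turn unnormalised scores into a probability distribution."""
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]

def expected_return(theta, returns):
    """J(theta) = sum_k pi_theta(k) * R(k) over a finite set of trajectories."""
    return sum(p * r for p, r in zip(softmax(theta), returns))

def score_function_gradient(theta, returns):
    """E_{k ~ pi_theta}[ grad_theta log pi_theta(k) * R(k) ], computed exactly."""
    pi = softmax(theta)
    n = len(theta)
    grad = [0.0] * n
    for k in range(n):
        for j in range(n):
            # For a softmax, d/d theta_j log pi(k) = 1[k == j] - pi(j).
            grad_log = (1.0 if k == j else 0.0) - pi[j]
            grad[j] += pi[k] * grad_log * returns[k]
    return grad

def finite_difference_gradient(theta, returns, eps=1e-6):
    """Central-difference approximation of grad_theta J(theta)."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        diff = expected_return(up, returns) - expected_return(down, returns)
        grad.append(diff / (2 * eps))
    return grad

theta = [0.2, -0.5, 1.0]      # hypothetical parameters
returns = [1.0, 0.0, 3.0]     # hypothetical R(tau) for each trajectory
g_score = score_function_gradient(theta, returns)
g_fd = finite_difference_gradient(theta, returns)
```

The two gradients should agree up to finite-difference error, which is what the last line of the derivation claims.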
\[ p_\theta(\tau) = p_\theta(s_0, a_0, \ldots, s_T, a_T) = \prod_{t=0}^{T} p_\theta(a_t \mid s_t) \cdot p(s_{t+1} \mid s_t, a_t) \]
\[ \nabla_\theta \left[ \log p(s_0) + \sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t) + \sum_{t=1}^{T} \log p(s_{t+1} \mid s_t, a_t) \right] \]
26
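To make the factorization concrete, here is a minimal sketch that computes the probability of one trajectory in a tabular MDP and splits its log into initial-state, policy, and dynamics terms. The MDP, its numbers, and the function names are all hypothetical. The sketch also assumes the initial-state factor p(s_0) is included explicitly and that time is indexed from 0.

```python
import math

# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
p0 = [0.6, 0.4]                                  # p(s_0)
policy = [[0.7, 0.3], [0.2, 0.8]]                # pi_theta(a | s), indexed [s][a]
dynamics = [[[0.9, 0.1], [0.5, 0.5]],            # p(s' | s, a), indexed [s][a][s']
            [[0.3, 0.7], [0.1, 0.9]]]

def trajectory_probability(states, actions):
    """p(s_0) * prod_t pi(a_t | s_t) * p(s_{t+1} | s_t, a_t)."""
    prob = p0[states[0]]
    for t, a in enumerate(actions):
        prob *= policy[states[t]][a] * dynamics[states[t]][a][states[t + 1]]
    return prob

def log_probability_terms(states, actions):
    """Return the three groups of log terms: initial state, policy, dynamics."""
    log_init = math.log(p0[states[0]])
    log_policy = sum(math.log(policy[states[t]][a])
                     for t, a in enumerate(actions))
    log_dynamics = sum(math.log(dynamics[states[t]][a][states[t + 1]])
                       for t, a in enumerate(actions))
    return log_init, log_policy, log_dynamics

states = [0, 1, 1, 0]    # s_0 .. s_3
actions = [1, 0, 0]      # a_0 .. a_2
prob = trajectory_probability(states, actions)
log_init, log_policy, log_dynamics = log_probability_terms(states, actions)
```

Only `log_policy` involves the policy table, so it is the only group that can contribute to a gradient with respect to the policy parameters.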
\[ \nabla_\theta \left[ \log p(s_0) + \sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t) + \sum_{t=1}^{T} \log p(s_{t+1} \mid s_t, a_t) \right] \]
Doesn't depend on transition probabilities!
27
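The reason is that only the policy terms depend on θ. Spelled out as a worked step, using the same indexing as the bracketed expression:

```latex
\begin{align*}
\nabla_\theta \log p(s_0) &= 0, \qquad
\nabla_\theta \log p(s_{t+1} \mid s_t, a_t) = 0, \\
\nabla_\theta \Big[ \log p(s_0)
  + \sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t)
  + \sum_{t=1}^{T} \log p(s_{t+1} \mid s_t, a_t) \Big]
  &= \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t).
\end{align*}
```

So the gradient can be estimated from sampled trajectories without a model of the dynamics.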
28
29
Run the policy and sample trajectories:
\[ \tau^i = \{s_1, a_1, \ldots, s_T, a_T\}^i \]
Compute policy gradient:
\[ \nabla_\theta J(\theta) \approx \sum_i \left[ \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i) \cdot \sum_{t=1}^{T} R(s_t^i, a_t^i) \right] \]
Update policy:
\[ \theta \leftarrow \theta + \alpha \nabla_\theta J(\theta) \]
Slide credit: Sergey Levine
30
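The three steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the lecture's implementation: the task has a single state (so s_t is dropped), two actions, a made-up reward of 1 for action 1, a softmax policy, and the gradient is averaged over trajectories (a 1/N factor the slide's sum omits). All names and hyperparameters are hypothetical.

```python
import math
import random

T = 5            # steps per trajectory (hypothetical toy setting)
N = 20           # trajectories sampled per update
ALPHA = 0.1      # learning rate

def softmax(theta):
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]

def reward(action):
    """Toy reward: action 1 pays 1, action 0 pays 0 (made up for this sketch)."""
    return 1.0 if action == 1 else 0.0

def sample_trajectory(theta, rng):
    """Step 1: run the policy. Single state, so only the actions are stored."""
    pi = softmax(theta)
    return [0 if rng.random() < pi[0] else 1 for _ in range(T)]

def policy_gradient(theta, trajectories):
    """Step 2: average over i of (sum_t grad log pi(a_t)) * (sum_t R(a_t))."""
    pi = softmax(theta)
    grad = [0.0, 0.0]
    for actions in trajectories:
        total_reward = sum(reward(a) for a in actions)
        for j in range(2):
            # For a softmax policy, d/d theta_j log pi(a) = 1[a == j] - pi(j).
            sum_grad_log = sum((1.0 if a == j else 0.0) - pi[j] for a in actions)
            grad[j] += sum_grad_log * total_reward / len(trajectories)
    return grad

def reinforce(iterations=200, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        trajectories = [sample_trajectory(theta, rng) for _ in range(N)]
        grad = policy_gradient(theta, trajectories)
        # Step 3: gradient ascent, theta <- theta + alpha * grad J(theta).
        theta = [th + ALPHA * g for th, g in zip(theta, grad)]
    return theta

final_theta = reinforce()
final_policy = softmax(final_theta)
```

After training, `final_policy` should put most of its probability on the rewarded action.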
Image Credit: http://karpathy.github.io/2016/05/31/rl/
31
Image Credit: http://karpathy.github.io/2016/05/31/rl/
32
Image Credit: http://karpathy.github.io/2016/05/31/rl/
(C) Dhruv Batra
33
34
\[ \nabla_\theta \left[ \log p(s_0) + \sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t) + \sum_{t=1}^{T} \log p(s_{t+1} \mid s_t, a_t) \right] \]
Formalizes notion of “trial and error”:
35
36
\[ \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \cdot \left( \sum_{t=1}^{T} R(s_t, a_t) - b \right) \right] \]
37
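A small exact computation illustrates what subtracting a baseline b does. The sketch assumes one-step trajectories, a softmax policy over three actions, made-up rewards, and b set to the expected reward, which is one common choice rather than anything the slide prescribes. It enumerates all actions, so the mean and variance of the gradient estimator are exact, not sampled.

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_samples(theta, rewards, baseline):
    """For each action a, return (pi(a), grad log pi(a) * (R(a) - b))."""
    pi = softmax(theta)
    n = len(theta)
    samples = []
    for a in range(n):
        # For a softmax policy, d/d theta_j log pi(a) = 1[a == j] - pi(j).
        grad_log = [(1.0 if a == j else 0.0) - pi[j] for j in range(n)]
        samples.append((pi[a], [g * (rewards[a] - baseline) for g in grad_log]))
    return samples

def mean_and_variance(samples):
    """Exact mean vector and total variance (summed over coordinates)."""
    n = len(samples[0][1])
    mean = [sum(p * g[j] for p, g in samples) for j in range(n)]
    var = sum(p * sum((g[j] - mean[j]) ** 2 for j in range(n))
              for p, g in samples)
    return mean, var

theta = [0.0, 0.5, -0.3]        # hypothetical policy parameters
rewards = [10.0, 11.0, 12.0]    # hypothetical rewards, all far from zero
pi = softmax(theta)
b = sum(p * r for p, r in zip(pi, rewards))   # baseline: expected reward

mean_plain, var_plain = mean_and_variance(gradient_samples(theta, rewards, 0.0))
mean_base, var_base = mean_and_variance(gradient_samples(theta, rewards, b))
```

Here `mean_plain` and `mean_base` coincide, because the expectation of grad log pi times a constant is zero, while `var_base` is much smaller than `var_plain`.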
\[ \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \cdot \left( \sum_{t=1}^{T} R(s_t, a_t) - b \right) \right] \]
Homework!
38
\[ \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \cdot \sum_{t=1}^{T} R(s_t, a_t) \right] \]
Policy evaluation (recall policy iteration)
The total reward is very noisy!
39
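As a reminder of what policy evaluation computes, here is a minimal tabular sketch that iterates the Bellman expectation backup for Q under a fixed policy. The MDP, its numbers, and the discount factor are assumptions made for illustration. The slide does not specify how the resulting estimate is plugged into the gradient.

```python
GAMMA = 0.9   # discount factor, an assumption of this sketch

# Hypothetical 2-state, 2-action MDP (numbers made up for illustration).
policy = [[0.7, 0.3], [0.2, 0.8]]              # pi(a | s), indexed [s][a]
dynamics = [[[0.9, 0.1], [0.5, 0.5]],
            [[0.3, 0.7], [0.1, 0.9]]]          # p(s' | s, a), indexed [s][a][s']
rewards = [[0.0, 1.0], [2.0, 0.0]]             # R(s, a), indexed [s][a]

def bellman_backup(q):
    """One sweep of
    Q(s,a) <- R(s,a) + gamma * sum_s' p(s'|s,a) * sum_a' pi(a'|s') * Q(s',a')."""
    n_states, n_actions = len(q), len(q[0])
    # State values under the policy: V(s) = sum_a pi(a|s) * Q(s,a).
    values = [sum(policy[s][a] * q[s][a] for a in range(n_actions))
              for s in range(n_states)]
    return [[rewards[s][a]
             + GAMMA * sum(dynamics[s][a][s2] * values[s2]
                           for s2 in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]

def evaluate_policy(sweeps=500):
    """Repeat the backup from Q = 0 until it has (numerically) converged."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        q = bellman_backup(q)
    return q

q_pi = evaluate_policy()
```

The converged table `q_pi` is an expected return per state-action pair, so it carries less noise than the total reward of a single sampled trajectory.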
40
41
42
[Figure: diagram built up over slides 43 to 47; recoverable labels: \( \theta \), \( \beta \)]
43
44
[Label added to the diagram: \( s' \sim p(s' \mid s, a) \)]
45
[Label added to the diagram: \( Q^\beta(s, a) \)]
46
47
θ
s' ∼ p(s' | s, a)
Q_β(s, a)
β
β
a ← a', s ← s'
49
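The symbols on this slide (actor parameters θ, critic parameters β, the critic Q_β(s, a), the transition s' ∼ p(s' | s, a), and the hand-off a ← a', s ← s') fit a one-step Q actor-critic loop. Below is a minimal sketch of such a loop, not necessarily the exact algorithm from the lecture. The two-state toy environment `env_step`, the tabular parameterization, the discount `GAMMA`, and the step sizes are all assumptions made for illustration.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.5  # assumed discount factor


def softmax_policy(theta, s):
    """Return pi_theta(. | s) for tabular logits theta[s][a]."""
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(theta, s, rng):
    """Draw a ~ pi_theta(. | s)."""
    probs = softmax_policy(theta, s)
    return rng.choices(range(N_ACTIONS), weights=probs)[0]


def env_step(s, a, rng):
    """Hypothetical toy dynamics: action 1 pays +1, action 0 pays -1,
    and the next state is uniformly random."""
    r = 1.0 if a == 1 else -1.0
    s_next = rng.randrange(N_STATES)
    return r, s_next


def actor_critic_step(theta, beta, s, a, rng, lr_actor=0.1, lr_critic=0.1):
    """One update of the actor (theta) and the critic (beta)."""
    r, s_next = env_step(s, a, rng)             # s' ~ p(s' | s, a)
    a_next = sample_action(theta, s_next, rng)  # a' ~ pi_theta(. | s')

    # Actor: theta += lr * grad_theta log pi_theta(a | s) * Q_beta(s, a).
    # For a softmax over logits, d log pi(a|s) / d theta[s][b] = 1[a == b] - pi(b|s).
    probs = softmax_policy(theta, s)
    q_sa = beta[s][a]
    for b in range(N_ACTIONS):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * grad_log * q_sa

    # Critic: SARSA-style TD(0) update of the tabular Q_beta(s, a).
    td_error = r + GAMMA * beta[s_next][a_next] - q_sa
    beta[s][a] += lr_critic * td_error

    return s_next, a_next                       # a <- a', s <- s'


def train(n_steps=2000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    beta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    a = sample_action(theta, s, rng)
    for _ in range(n_steps):
        s, a = actor_critic_step(theta, beta, s, a, rng)
    return theta, beta


if __name__ == "__main__":
    theta, beta = train()
    for s in range(N_STATES):
        print(s, softmax_policy(theta, s), beta[s])
```

In this toy setup the critic's estimate for action 1 stays non-negative and its estimate for action 0 stays non-positive, so every actor update moves probability toward action 1. Real problems have no such guarantee, which is one motivation for subtracting a baseline from Q_β(s, a).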
50
A^{π_θ}(s, a) = Q^{π_θ}(s, a) − V^{π_θ}(s)
"How much better is an action than expected?"
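As a small worked example of the advantage, the sketch below computes Q − V for a single state, using the identity V^{π}(s) = Σ_a π(a | s) Q^{π}(s, a). The function name `advantages` and the numbers are hypothetical and chosen only for illustration.

```python
def advantages(pi_s, q_s):
    """Return A(s, a) = Q(s, a) - V(s) for every action a in one state.

    pi_s: action probabilities pi(a | s), summing to 1.
    q_s:  action values Q(s, a), in the same order as pi_s.
    """
    # V(s) is the policy-weighted average of the action values.
    v_s = sum(p * q for p, q in zip(pi_s, q_s))
    return [q - v_s for q in q_s]


if __name__ == "__main__":
    pi_s = [0.25, 0.75]  # assumed policy in some state s
    q_s = [1.0, 3.0]     # assumed action values in that state
    print(advantages(pi_s, q_s))
```

With these numbers V(s) = 0.25 · 1.0 + 0.75 · 3.0 = 2.5, so the advantages are −1.5 and 0.5: the first action is worse than expected, the second is better. Weighted by the policy, the advantages average to zero, which is why subtracting V(s) recenters the learning signal without changing which actions look best.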
51