Multi-Agent Adversarial Inverse Reinforcement Learning
Contact: lantaoyu@cs.stanford.edu
Lantao Yu, Jiaming Song, Stefano Ermon
Department of Computer Science, Stanford University
$\sum_{t=1}^{T} \gamma^{t} r(s_t, a_t)$
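As a small illustration of the discounted return above, the sketch below sums $\gamma^t r_t$ over a finite list of rewards. The function name `discounted_return` and the finite-horizon input are assumptions made for the example, not part of the slides.

```python
def discounted_return(rewards, gamma):
    """Sum gamma**t * r_t for t = 1..T, given a list of per-step rewards."""
    total = 0.0
    # The sum on the slide starts at t = 1, so the first reward is discounted once.
    for t, r in enumerate(rewards, start=1):
        total += (gamma ** t) * r
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 0.5 + 0.25 + 0.125 = 0.875
```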
Computer games, dialogue, multi-agent systems
$\max_{\pi \in \Pi} \mathbb{E}_{\pi_E}[\log \pi(a \mid s)]$
IRL, RL
Matching $p(s, a)$ with GAN
$r^* = (\text{object pos} - \text{goal pos})^2$
vs.
$\pi^* : \mathcal{S} \to \mathcal{P}(\mathcal{A})$
$$p_\omega(\tau) \propto \left[ \eta(s_1) \prod_{t=1}^{T} P(s_{t+1} \mid s_t, a_t) \right] \exp\left( \sum_{t=1}^{T} r_\omega(s_t, a_t) \right)$$
$$\max_\omega \; \mathbb{E}_{\pi_E}[\log p_\omega(\tau)] = \mathbb{E}_{\tau \sim \pi_E}\left[ \sum_{t=1}^{T} r_\omega(s_t, a_t) \right] - \log Z_\omega$$
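A brute-force sketch of the maximum-likelihood objective above on a toy problem. It assumes deterministic dynamics and a fixed start state, so the bracketed dynamics term is the same for every feasible trajectory and $Z_\omega$ can be computed by enumerating all action sequences. The toy environment, the reward family `make_reward(w)`, and all function names are hypothetical.

```python
import itertools
import math


def trajectory_return(action_seq, reward_fn, step_fn, s0):
    """Sum of r(s_t, a_t) along the trajectory generated by action_seq from s0."""
    s, total = s0, 0.0
    for a in action_seq:
        total += reward_fn(s, a)
        s = step_fn(s, a)
    return total


def log_partition(reward_fn, step_fn, s0, actions, horizon):
    """log Z: log-sum-exp of returns over every action sequence of the given length."""
    returns = [trajectory_return(seq, reward_fn, step_fn, s0)
               for seq in itertools.product(actions, repeat=horizon)]
    m = max(returns)  # subtract the max before exponentiating, for stability
    return m + math.log(sum(math.exp(r - m) for r in returns))


def avg_log_likelihood(demos, reward_fn, step_fn, s0, actions):
    """Mean demonstrated return minus log Z (dynamics terms are constant here)."""
    horizon = len(demos[0])
    log_z = log_partition(reward_fn, step_fn, s0, actions, horizon)
    mean_return = sum(trajectory_return(d, reward_fn, step_fn, s0)
                      for d in demos) / len(demos)
    return mean_return - log_z


def step(s, a):
    """Hypothetical deterministic dynamics on the integers."""
    return s + a


def make_reward(w):
    """Hypothetical one-parameter reward: w for action +1, 0 otherwise."""
    return lambda s, a: w * (1.0 if a == 1 else 0.0)


demos = [(1, 1, 1), (1, 1, -1)]  # demonstrations given as action sequences
for w in (0.0, 1.0):
    print(w, avg_log_likelihood(demos, make_reward(w), step, 0, (-1, 1)))
```

Because the demonstrations mostly take action +1, the objective is higher at `w = 1.0` than at `w = 0.0`. Enumeration is only feasible for tiny problems, which is what motivates the adversarial approximation on the following slides.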
$$D_{\omega,\phi}(s, a, s') = \frac{\exp(f_{\omega,\phi}(s, a, s'))}{\exp(f_{\omega,\phi}(s, a, s')) + \pi(a \mid s)}$$
$$f_{\omega,\phi}(s, a, s') = r_\omega(s, a) + \gamma h_\phi(s') - h_\phi(s)$$
$\log D - \log(1 - D)$
$r_\omega(s, a)$
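A minimal sketch of the discriminator $D_{\omega,\phi}$ for a single transition. It assumes the reward $r_\omega$, the shaping term $h_\phi$, and the policy probability $\pi(a \mid s)$ have already been evaluated to plain floats; the function names are illustrative. The second form, a sigmoid of $f - \log \pi(a \mid s)$, is algebraically the same quantity and avoids overflow in `exp` for large $f$.

```python
import math


def shaped_reward(r_sa, h_s, h_next, gamma):
    """f(s, a, s') = r(s, a) + gamma * h(s') - h(s)."""
    return r_sa + gamma * h_next - h_s


def discriminator(f_value, pi_a_given_s):
    """D = exp(f) / (exp(f) + pi(a|s)), written exactly as in the formula."""
    e = math.exp(f_value)
    return e / (e + pi_a_given_s)


def discriminator_stable(f_value, pi_a_given_s):
    """Same D, computed as sigmoid(f - log pi(a|s)) without overflowing exp."""
    x = f_value - math.log(pi_a_given_s)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


f = shaped_reward(r_sa=1.0, h_s=0.2, h_next=0.5, gamma=0.9)
print(discriminator(f, 0.3), discriminator_stable(f, 0.3))
```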
[Heckerman et al., 2000] and best response dynamics [Nisan et al., 2011].
$\mathcal{A}_1 \times \cdots \times \mathcal{A}_N$
$k$
$z^{(k)} = (z_1, \cdots, z_N)^{(k)}$
[Figure labels: $a_2$, $a_3$, $a_{N-1}$, $a_N$; $z_1^{(k)}$, $z_2^{(k)}$, $z_N^{(k)}$]
$$z_1^{(k+1)} \sim P(a_1 \mid a_{-1} = z_{-1}^{(k)}) = \frac{\exp(\lambda r_1(a_1, z_{-1}^{(k)}))}{\sum_{a_1'} \exp(\lambda r_1(a_1', z_{-1}^{(k)}))}$$
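The conditional above is a softmax over one agent's actions with the other agents' actions held fixed. The sketch below computes it and runs one resampling pass over all agents. It assumes a small discrete action set shared by all agents and that agents are resampled one at a time in index order, each seeing the others' most recent actions; the update order is an assumption of this sketch, and the coordination-game reward is a made-up example.

```python
import math
import random


def logistic_best_response(reward_fn, others, actions, lam):
    """P(a_i | a_{-i} = others), proportional to exp(lam * r_i(a_i, others))."""
    scores = [lam * reward_fn(a, others) for a in actions]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sweep(z, reward_fns, actions, lam, rng):
    """One pass over agents; each resamples its action given the others' current ones."""
    z = list(z)
    for i, r_i in enumerate(reward_fns):
        others = tuple(z[:i] + z[i + 1:])
        probs = logistic_best_response(r_i, others, actions, lam)
        z[i] = rng.choices(actions, weights=probs, k=1)[0]
    return tuple(z)


def coordination_reward(a_i, others):
    """Hypothetical reward: 1 if agent i matches every other agent, else 0."""
    return 1.0 if all(a_i == a for a in others) else 0.0


rng = random.Random(0)
z = (0, 1)
for _ in range(5):
    z = sweep(z, [coordination_reward, coordination_reward], (0, 1), lam=2.0, rng=rng)
print(z)
```

With `lam = 0` every action is equally likely; as `lam` grows the conditional concentrates on the best response to the other agents' actions.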
$T$
$(\mathcal{A}_1 \times \cdots \times \mathcal{A}_N)^{|\mathcal{S}|}$
$t$-th
$k$
$\{ z_i^{t,(k)} : \mathcal{S} \to \mathcal{A}_i \}_{i=1}^{N}$
$t \in [T, \ldots, 1]$
$t$-th
$\omega$
Let $\tau_1, \ldots, \tau_M$ be i.i.d. samples from the LSBRE induced by some unknown reward function. Suppose that $\pi_i^t(a_i^t \mid a_{-i}^t, s^t; \omega_i)$ is differentiable w.r.t. $\omega_i$. Then as $M \to \infty$, with probability tending to 1, the equation […] in $\omega$ has a root that tends to the maximizer of the joint likelihood.
$$\max_\omega \; \mathbb{E}_{\pi_E}\left[ \sum_{i=1}^{N} \log \frac{\exp(f_{\omega_i}(s, a))}{\exp(f_{\omega_i}(s, a)) + q_{\theta_i}(a_i \mid s)} \right] + \mathbb{E}_{q_\theta}\left[ \sum_{i=1}^{N} \log \frac{q_{\theta_i}(a_i \mid s)}{\exp(f_{\omega_i}(s, a)) + q_{\theta_i}(a_i \mid s)} \right]$$
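A Monte Carlo sketch of the discriminator objective above, written in plain Python. It assumes each sample has already been reduced to per-agent pairs $(f_{\omega_i}(s, a),\, q_{\theta_i}(a_i \mid s))$; how those values are produced (networks, rollouts) is outside the sketch, and the function names are illustrative. The logarithms are evaluated through `softplus` for numerical stability.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_d(f, q):
    """log( exp(f) / (exp(f) + q) ) = -softplus(log q - f)."""
    return -softplus(math.log(q) - f)


def log_one_minus_d(f, q):
    """log( q / (exp(f) + q) ) = -softplus(f - log q)."""
    return -softplus(f - math.log(q))


def discriminator_objective(expert_batch, policy_batch):
    """Batch estimate of the objective to be maximized over omega.

    Each batch is a list of samples; each sample is a list of (f_i, q_i)
    pairs, one pair per agent.
    """
    expert_term = sum(sum(log_d(f, q) for f, q in sample)
                      for sample in expert_batch) / len(expert_batch)
    policy_term = sum(sum(log_one_minus_d(f, q) for f, q in sample)
                      for sample in policy_batch) / len(policy_batch)
    return expert_term + policy_term


# Two agents, one expert sample and one policy sample (made-up numbers).
expert = [[(1.5, 0.4), (0.7, 0.6)]]
policy = [[(-0.8, 0.5), (-1.2, 0.3)]]
print(discriminator_objective(expert, policy))
```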
$\theta$
$$\max_\theta \; \mathbb{E}_{q_\theta}\left[ \sum_{i=1}^{N} \log(D_{\omega_i}(s, a)) - \log(1 - D_{\omega_i}(s, a)) \right] = \mathbb{E}_{q_\theta}\left[ \sum_{i=1}^{N} f_{\omega_i}(s, a) - \log(q_{\theta_i}(a_i \mid s)) \right]$$
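The equality in the generator objective follows term by term for each agent $i$. A short derivation, assuming each discriminator has the form used in the discriminator objective, $D_{\omega_i}(s, a) = \exp(f_{\omega_i}(s, a)) / (\exp(f_{\omega_i}(s, a)) + q_{\theta_i}(a_i \mid s))$:

```latex
\begin{align*}
\log D_{\omega_i}(s, a) - \log\bigl(1 - D_{\omega_i}(s, a)\bigr)
&= \log \frac{\exp(f_{\omega_i}(s, a))}{\exp(f_{\omega_i}(s, a)) + q_{\theta_i}(a_i \mid s)}
 - \log \frac{q_{\theta_i}(a_i \mid s)}{\exp(f_{\omega_i}(s, a)) + q_{\theta_i}(a_i \mid s)} \\
&= \log \exp(f_{\omega_i}(s, a)) - \log q_{\theta_i}(a_i \mid s) \\
&= f_{\omega_i}(s, a) - \log q_{\theta_i}(a_i \mid s).
\end{align*}
```

The shared denominator cancels, so each agent's policy is trained on its learned reward term $f_{\omega_i}$ plus an entropy-style bonus $-\log q_{\theta_i}$.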
Lantao Yu, Jiaming Song, Stefano Ermon. Multi-Agent Adversarial Inverse Reinforcement Learning. ICML 2019.
Poster: 06:30 -- 09:00 PM @ Pacific Ballroom #36