
Efficient Exploration by Novelty Pursuit, by Ziniu Li


  1. Efficient Exploration by Novelty Pursuit Ziniu Li ziniuli@link.cuhk.edu.cn The Chinese University of Hong Kong, Shenzhen & Polixir Joint work with Xiong-Hui Chen, Nanjing University International Conference on Distributed Artificial Intelligence (DAI), 2020 October 12, 2020 Ziniu Li (CUHKSZ & Polixir) Efficient Exploration by Novelty Pursuit October 12, 2020 1 / 30

  2. Overview: Introduction, Proposed Method, Experiment, Conclusion

  3. Outline: Introduction, Proposed Method, Experiment, Conclusion

  4. Reinforcement Learning
     ◮ RL is a learning paradigm in which an agent interacts with an unknown environment to find optimal decisions.
     [Figure: agent-environment loop with action, reward, and observation.]
     ◮ RL learns directly from stochastic feedback (s, a, r, s′).
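The interaction loop on this slide can be sketched in a few lines of Python. This is only an illustration: the `TwoStateEnv` class, its 0.8 transition probability, and the `collect_transitions` helper are hypothetical and do not come from the talk. The sketch shows how stochastic feedback arrives as (s, a, r, s′) tuples while a uniformly random agent acts.

```python
import random


class TwoStateEnv:
    """Hypothetical toy environment with states {0, 1} and actions {0, 1}."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Assumed dynamics: with probability 0.8 the chosen action becomes
        # the next state; otherwise the opposite state is reached.
        next_state = action if self.rng.random() < 0.8 else 1 - action
        # Assumed reward: 1 for landing in state 1, else 0.
        reward = 1.0 if next_state == 1 else 0.0
        self.state = next_state
        return next_state, reward


def collect_transitions(env, num_steps, seed=0):
    """Run a uniformly random agent and record (s, a, r, s') tuples."""
    rng = random.Random(seed)
    transitions = []
    s = env.reset()
    for _ in range(num_steps):
        a = rng.choice([0, 1])
        s_next, r = env.step(a)
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions


if __name__ == "__main__":
    for transition in collect_transitions(TwoStateEnv(), num_steps=5):
        print(transition)
```

A learning algorithm would consume these tuples to improve its policy; here the agent never learns, which keeps the focus on the data that the environment provides.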

  5. Exploration vs. Exploitation
     ◮ In an unknown environment, the agent is uncertain about the possible outcomes.
     ◮ Exploration: try unfamiliar actions, which may bring large returns or unexpected losses.
     ◮ Exploitation: take well-known but possibly sub-optimal actions.

  6. Towards Efficient Exploration
     ◮ Simple exploration strategies based on “dithering” methods are inefficient.
       • ε-greedy and the Boltzmann strategy require almost O(2^N) samples to make progress on deep-sea [Osband et al., 2019].
     Figure 2: Deep-sea. Figure from [Osband et al., 2019]. There are only two actions, and the reward is given only at the bottom-right corner.
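To see where the exponential sample count comes from, here is a minimal sketch of a simplified deep-sea chain. It rests on assumptions not stated on the slide: N is taken to be the depth of the grid, the reward is reached only by moving right at every one of the N steps, and `p_right` is the per-step probability that a dithering policy picks "right" before it has seen any reward (0.5 for a uniformly random policy). The benchmark in [Osband et al., 2019] may differ in its details, and all function names here are hypothetical.

```python
import random


def run_episode(depth, p_right, rng):
    """Simulate one episode in a simplified deep-sea grid.

    The agent starts in the top-left cell and descends one row per step.
    "Right" moves one column right; "left" moves one column left (clipped
    at column 0). The reward is 1 only in the bottom-right cell, which is
    reachable only by moving right at every step.
    """
    column = 0
    for _ in range(depth):
        if rng.random() < p_right:
            column += 1
        else:
            column = max(column - 1, 0)
    return 1.0 if column == depth else 0.0


def success_probability(depth, p_right):
    """Probability that a fixed dithering policy reaches the reward."""
    return p_right ** depth


def expected_episodes_to_first_reward(depth, p_right):
    """Mean of the geometric distribution over episodes until success."""
    return 1.0 / success_probability(depth, p_right)


if __name__ == "__main__":
    for depth in (5, 10, 20):
        print(depth, expected_episodes_to_first_reward(depth, 0.5))
```

Under these assumptions a uniformly random policy needs 2^N episodes on average before it sees the reward once: 1024 episodes at depth 10 and 1048576 at depth 20.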

  7. Towards Efficient Exploration
     ◮ Theoretically, efficient exploration requires “writing off” actions that are known to be inferior.
     ◮ Two general principles are borrowed from bandits: optimism in the face of uncertainty (OFU) and Thompson sampling (TS).
       • OFU: add a “reward bonus” by constructing upper confidence intervals [Stadie et al., 2015, Pathak et al., 2017, Burda et al., 2019b].
       • TS: sample plausible actions from the iteratively updated posterior distribution [Osband et al., 2016a,b, O’Donoghue et al., 2018].
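Both principles are easiest to see in the bandit setting they are borrowed from. The sketch below is an illustration on a Bernoulli bandit, not the method of the talk or of the cited deep RL papers: it uses the classical UCB1 bonus for OFU and a Beta-Bernoulli posterior for TS. The function names, the bonus constant `c`, and the arm probabilities are assumptions made for the example.

```python
import math
import random


def ucb_action(counts, means, t, c=2.0):
    """OFU: pick the arm with the highest empirical mean plus bonus."""
    # Try every arm once before computing confidence bonuses.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_action(successes, failures, rng):
    """TS: sample each arm's mean from its Beta posterior, act greedily."""
    # Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(arm_probs, steps, strategy, seed=0):
    """Play a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    successes = [0] * k
    for t in range(1, steps + 1):
        if strategy == "ofu":
            means = [successes[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
            arm = ucb_action(counts, means, t)
        else:
            failures = [counts[i] - successes[i] for i in range(k)]
            arm = thompson_action(successes, failures, rng)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        successes[arm] += reward
    return counts


if __name__ == "__main__":
    print("OFU pulls:", run_bandit([0.1, 0.9], 2000, "ofu"))
    print("TS pulls:", run_bandit([0.1, 0.9], 2000, "ts"))
```

In both cases the inferior arm is pulled less and less as evidence accumulates, which is the “writing off” behaviour described above. OFU gets there deterministically through a shrinking bonus, while TS gets there through a posterior that concentrates.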
