probing emergent semantics in predictive agents via
play

Probing Emergent Semantics in Predictive Agents via Question - PowerPoint PPT Presentation

Probing Emergent Semantics in Predictive Agents via Question Answering Link to slides with playable videos: bit.ly/3iKYJd3 Abhishek Fede Hamza Laura Rosalia Das* 1 Carnevale* Merzic Rimell Schneider Josh Alden Arjun Stephen Greg


  1. Probing Emergent Semantics in Predictive Agents via Question Answering Link to slides with playable videos: bit.ly/3iKYJd3 Abhishek Fede Hamza Laura Rosalia Das* 1 Carnevale* Merzic Rimell Schneider Josh Alden Arjun Stephen Greg Felix Abramson Hung Ahuja Clark Wayne Hill * Denotes equal contribution. 1 Now at Facebook AI Research. Work done during an internship at DeepMind.

  2. Self-supervised representation learning Reinforcement Language Vision Learning http://jalammar.github.io/illustrated-bert Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv:1810.04805 (2018). Pathak, Deepak, et al. "Context encoders: Feature learning by inpainting." CVPR 2016. Pathak, Deepak, et al. "Learning features by watching objects move." CVPR 2017. Wayne, Greg, et al. "Unsupervised predictive memory in a goal-directed agent." arXiv:1803.10760 (2018). Ha, David, and Jürgen Schmidhuber. "World models." arXiv:1803.10122 (2018).

  3. How much objective knowledge about the external world can be learned through egocentric prediction?

  4. Question-answering (in English) as an evaluation tool for investigating how much environment knowledge is encoded in an agent’s internal representation Intuitive: simply ask an agent what it knows about its world ● and get an answer back Open-ended: pose arbitrarily complex questions to an agent ●

  5. Environment 5

  6. Environment Environment Unity-based; runs at 30 fps ● 96 x 72 RGB first-person view ● 50 objects types ● 10 colors 3 sizes Agent First person view ● 8-D action space: ● Move-{forward, backward, left, right} Look-{up, down, left, right} 6

  7. Training task: Exploration +1 reward for unvisited object 0 reward for visited object rewards refresh once all visited 7 Top-down view shown for illustration purposes. The agent only has access to first-person observations.

  8. Training task: Exploration +1 reward for unvisited object 0 reward for visited object rewards refresh once all visited 8 Top-down view shown for illustration purposes. The agent only has access to first-person observations.

  9. Evaluation probe: Question-answering What is the color of the bed? How many wardrobes are there? What is the object near the bed? Is there a basketball in the room? ... 9 Top-down view shown for illustration purposes. The agent only has access to first-person observations.

  10. Evaluation probe: Question-answering Questions are programmatically generated in a manner similar to CLEVR (Johnson et al., 2017) 10

  11. Evaluation probe: Question-answering Gradients from question-answering are not backpropagated into the agent. 11 Top-down view shown for illustration purposes. The agent only has access to first-person observations.

  12. Setup How many wardrobes are there? What is the color of the bed? What is the object near the bed? Is there a basketball in the room? ... (i) (ii) During training, the agent explores During evaluation, we probe the agent’s and learns to build representations internal representations on a from egocentric observations question-answering task 12 Top-down view shown for illustration purposes. The agent only has access to first-person observations.

  13. Agent architecture 13

  14. Agent architecture 14

  15. Agent architecture 15

  16. Agent architecture 16

  17. Agent architecture 17

  18. Agent architecture 18

  19. Agent architecture 19

  20. Agent architecture 20

  21. Agent architecture Action-conditioned forward prediction ● Multiple steps into the future ● Self-supervised ● 21

  22. Agent architecture 22

  23. Agent architecture Gradients from the question-answering decoder not backpropagated into the agent 23

  24. Agent architecture 24

  25. Baselines and Oracle Baselines 25

  26. Baselines and Oracle Baselines Question-only: no vision ● 26

  27. Baselines and Oracle Baselines X Question-only: no vision ● LSTM: no auxiliary predictive loss ● 27

  28. Baselines and Oracle Baselines Question-only: no vision ● LSTM: no auxiliary predictive loss ● Predictive losses CPC|A (Guo et al., 2018) ● SimCore (Gregor et al., 2019) ● 28

  29. Baselines and Oracle Baselines Question-only: no vision ● LSTM: no auxiliary predictive loss ● X Predictive losses CPC|A (Guo et al., 2018) ● SimCore (Gregor et al., 2019) ● Oracle No SG: QA decoder without stop gradient ● similar to Embodied / Interactive Question Answering (Das et al., 2018, Gordon et al., 2018) 29

  30. Results: shape questions Oracle SimCore CPC|A LSTM Question-only Training steps 30

  31. Results: overall 63% 60% Top-1 QA Accuracy 32% 31% 29% 31

  32. Results 32

  33. Q: What is the aquamarine object? A: Grinder Top-3 answer Answer predictions probabilities P(“Grinder”) 33

  34. Q: How many blue objects are there? A: One Top-3 answer predictions P(“One”) P(“Three”)

  35. Q: How many yellow objects are there? A: Four Top-3 answer predictions P(“two”) P(“four”) P(“Three”)

  36. Compositional generalization Train-Test split Question e n l p e e d r SimCore e u u r e l b p g r . . . ball grinder Answer chair … bed Seen: What shape is the blue object? Bed Seen: What shape is the green object? Ball Unseen: What shape is the green object? Bed 36

  37. Top-down map prediction 37

  38. Top-down map prediction Top-down Map 38

  39. Top-down map prediction 39

  40. Conclusions Question-answering to probe ● internal representations, enabling evaluation of agents using natural linguistic interactions. Self-supervised predictive agents, ● such as SimCore, capture decodable knowledge about the environment, while non-predictive agents and CPC|A don’t. Generalization of the decoder ● suggests some degree of compositionality in internal representations. arxiv.org/abs/2006.01016 ●

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend