What Action Causes This? Towards Naive Physical Action-Effect Prediction


  1. What Action Causes This? Towards Naive Physical Action-Effect Prediction
     Qiaozi Gao (1), Shaohua Yang (1), Joyce Y. Chai (1), Lucy Vanderwende (2)
     (1) Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824
     (2) Microsoft Research, Redmond, WA 98052

  2. Motivation • What action causes this?

  3. Motivation • What is the result state of “open box”?

  4. Understanding Cause-Effect • From: cde.ca.gov (California Department of Education)

  5. Naïve Physical Action-Effect Prediction • Action to Effect: Action (squeeze-bottle), four candidate effect images, each marked “?” (images not preserved)

  6. Naïve Physical Action-Effect Prediction • Action to Effect: Action (squeeze-bottle), candidate effect images marked ✓, ✗, ✗, ✓ (images not preserved)

  7. Naïve Physical Action-Effect Prediction • Effect to Action: candidate actions (peel-carrot), (juice-carrot), (grate-carrot), (chop-carrot), each marked “?” (effect image not preserved)

  8. Naïve Physical Action-Effect Prediction • Effect to Action: (peel-carrot) ✗, (juice-carrot) ✗, (grate-carrot) ✗, (chop-carrot) ✓ (effect image not preserved)

  9. Related Work
     • The NLP community
       – Most existing studies focus on causal relations between high-level events, e.g., “the collapse of the housing bubble” causes “stock prices to fall” (Yang and Mao, 2014; Sharp et al., 2016).
       – This paper studies basic cause-effect knowledge about concrete actions and their effects on the world.
     • Recent advances in Computer Vision and Robotics
       – Object physical state prediction (Zhou and Berg, 2016; Wu et al., 2017)
       – Action recognition through detection of state changes (Yang et al., 2013)
       – Robots following natural language commands (She et al., 2014; Misra et al., 2015)

  10. This Work
      • Introduce a new task, physical action-effect prediction, and create a dataset for it.
        – Data collection and analysis
      • Propose an approach that harnesses the large amount of image data available on the web with minimal supervision.
        – Web image acquisition
        – Bootstrapping strategy
      • Automatically predict effect knowledge for novel actions.

  11. Action-Effect Data
      • Actions (Verb-Noun Pairs)
        – 140 verb-noun pairs
        – 62 unique verbs (e.g., bend, boil, chop, crack, fold, grind, ignite, kick, peel, soak, trim)
        – 39 unique nouns (e.g., apple, baseball, book, car, chair, cup, flower, orange, shoe)
      • Effects
        – Effects described in language
        – Effects depicted by images

  12. Effects Described in Language
      • The effect of an action is often presupposed in communication and not explicitly stated.
      • Crowd-sourced data collection
        – Workers were shown a verb-noun pair and asked to describe what changes might occur to the object as a result of the action.
        – 1400 effect descriptions (10 for each verb-noun pair)
        – Examples: (example table not preserved)

  13. Effects Depicted by Images
      • Human-labeled image set: 4163 images (data available on the project webpage)
        – Positive images capture the resulting world state of the action.
        – Negative images capture some state of the related noun, but not the resulting state of the corresponding action.
      • Example: Action: Fry-Egg, with positive and negative images (images not preserved)

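  14. Web Search Images
      • Search keywords: phrases extracted from the language effect descriptions
        – Phrases were extracted using syntactic patterns.
        – Example phrases shown: “book”, “book is set aflame”, “book is on fire” (the marks next to each phrase were lost in extraction; a “BAD” label appears on the slide)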

  15. Bootstrapping Approach
      • Diagram (not preserved): web search images and seeding images are fed to a ResNet, which outputs a prediction over Action 1, Action 2, …, Action C.
        – Web search images: trained with the bootstrapping cross-entropy loss (Reed et al., 2014)
        – Seeding images: trained with the cross-entropy loss
      • The equations for the cross-entropy loss and the bootstrapping cross-entropy loss were not preserved.
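
The two losses named on slide 15 can be sketched as follows. The slide's own equations were lost, so this is a minimal sketch that assumes the soft and hard bootstrapping variants described in Reed et al. (2014); the slides do not say which variant was used. The `beta` defaults, the `EPS` guard, and all function names are illustrative assumptions, not taken from the presentation.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail, not from the slides


def cross_entropy(probs, target):
    """Plain cross-entropy for one example: -sum_k t_k * log(q_k)."""
    return -sum(t * math.log(max(q, EPS)) for q, t in zip(probs, target))


def soft_bootstrap_loss(probs, target, beta=0.95):
    """Soft variant: the target is a blend of the (noisy) label and the prediction."""
    blended = [beta * t + (1.0 - beta) * q for q, t in zip(probs, target)]
    return cross_entropy(probs, blended)


def hard_bootstrap_loss(probs, target, beta=0.8):
    """Hard variant: the label is blended with a one-hot of the model's top class."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    blended = [beta * t + (1.0 - beta) * (1.0 if k == best else 0.0)
               for k, t in enumerate(target)]
    return cross_entropy(probs, blended)


# Toy case: the web-search label says class 0, but the model prefers class 2.
probs = [0.1, 0.2, 0.7]
noisy_label = [1.0, 0.0, 0.0]
plain = cross_entropy(probs, noisy_label)
soft = soft_bootstrap_loss(probs, noisy_label)
hard = hard_bootstrap_loss(probs, noisy_label)
```

In this toy case both bootstrapped losses come out smaller than the plain cross-entropy, which is the intended behavior: a confident model is penalized less for disagreeing with a possibly wrong web label.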

  16. Evaluations
      • Human-annotated image data: 10% used as seeding images (training), 30% for development, and 60% for test.
        – On average, each verb-noun pair has only 3 seeding images.
      • Web search images: over 60,000 images were downloaded using around 2,000 effect phrases as search keywords.
      • Methods for comparison
        – Seed
        – Seed+Act+Eff
        – BS+Seed+Act+Eff
      • BS: bootstrapping approach; Seed: seed images; Act: web images downloaded using the verb-noun pair as keywords; Eff: web images downloaded using effect phrases as keywords.
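
The 10% / 30% / 60% split on slide 16 can be illustrated with a short sketch. It assumes a single random split over all 4163 human-labeled images; the slides do not say whether the split was done globally or per verb-noun pair, and `split_indices` and its parameters are hypothetical names.

```python
import random


def split_indices(n, seed_frac=0.1, dev_frac=0.3, rng_seed=0):
    """Shuffle indices 0..n-1 and cut them into seed / dev / test parts."""
    idx = list(range(n))
    random.Random(rng_seed).shuffle(idx)
    n_seed = round(n * seed_frac)
    n_dev = round(n * dev_frac)
    return idx[:n_seed], idx[n_seed:n_seed + n_dev], idx[n_seed + n_dev:]


seed_idx, dev_idx, test_idx = split_indices(4163)
seeds_per_pair = len(seed_idx) / 140  # 140 verb-noun pairs in the dataset
```

Under this assumption the seed set holds 416 images, or about 3 per verb-noun pair, which agrees with the figure quoted on the slide.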

  17. Evaluation Results (bar charts comparing Seed, Seed+Act+Eff, and BS+Seed+Act+Eff; the bar-to-method assignment is not recoverable from the extracted text)
      • Action to Effect
        – MAP: 0.182, 0.247, 0.29
        – Top 5 Accuracy: 0.629, 0.679, 0.75
      • Effect to Action
        – Micro F1 Score and Macro F1 Score: bar values 0.062, 0.093, 0.189, 0.207, 0.212, 0.272 (panel assignment not recoverable)
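
The slides report MAP, top-5 accuracy, and micro/macro F1 without defining them. A minimal sketch of the standard textbook definitions is given below; whether the authors computed them exactly this way is an assumption, and all function names are illustrative.

```python
def average_precision(ranked_relevance):
    """AP for one ranked list of booleans (True = relevant item at that rank)."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_lists):
    """MAP: the mean of AP over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def top_k_accuracy(ranked_predictions, gold, k=5):
    """Fraction of examples whose gold label appears in the top-k predictions."""
    correct = sum(1 for preds, g in zip(ranked_predictions, gold) if g in preds[:k])
    return correct / len(gold)


def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts maps each class to (tp, fp, fn); returns (micro F1, macro F1)."""
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    return f1(*totals), macro


ap = average_precision([True, False, True])
map_score = mean_average_precision([[True, False, True], [False, True]])
top2 = top_k_accuracy([["a", "b"], ["c", "d"]], ["b", "x"], k=2)
micro, macro = micro_macro_f1({"a": (2, 0, 0), "b": (0, 1, 1)})
```

Micro F1 pools the counts across classes before computing F1, so frequent classes dominate; macro F1 averages per-class F1, so every verb-noun pair counts equally.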

  18. Examples: top action predictions for six test images (images not preserved)
      – Example 1: bite apple, background, cut apple, peel apple
      – Example 2: fry egg, background, crack egg, mix eggs
      – Example 3: background, chop carrot, grate carrot, peel carrot
      – Example 4: background, insert key, close drawer, fasten door
      – Example 5: background, cut potato, fry potato, mash potato
      – Example 6: pile books, background, wrap book, roll paper

  19. Examples: top action and top effect predictions for six test images (images not preserved)
      – Example 1
        Top action predictions: bite apple, background, cut apple, peel apple
        Top effect predictions: apple is eaten, apple is being cut, apple is chewed, apple in tiny pieces
      – Example 2
        Top action predictions: fry egg, background, crack egg, mix eggs
        Top effect predictions: egg into a harder substance, cup into smaller pieces, egg edible
      – Example 3
        Top action predictions: background, chop carrot, grate carrot, peel carrot
        Top effect predictions: carrot into tiny pieces, carrot is being cut, carrot into many smaller pieces
      – Example 4
        Top action predictions: background, insert key, close drawer, fasten door
        Top effect predictions: key in the keyhole, drawer without a key, door is locked, door is being bolted
      – Example 5
        Top action predictions: background, cut potato, fry potato, mash potato
        Top effect predictions: potato into a pot, potato is being sliced, potato for, potato edible
      – Example 6
        Top action predictions: pile books, background, wrap book, roll paper
        Top effect predictions: books in a stack, book on, books in a large stack, books in a pile

  20. Examples (example images for “beat eggs” and “bite apple” not preserved)
      Action         AP
      beat eggs      0.783
      pile boxes     0.766
      bite apple     0.484
      slice onion    0.470

  21. Examples (example images for “beat eggs”, “bite apple”, “stain shirt”, and “close window” not preserved)
      Action         AP
      beat eggs      0.783
      pile boxes     0.766
      bite apple     0.484
      slice onion    0.470

      Action         AP
      crack glass    0.047
      lock drawer    0.037
      stain shirt    0.023
      close window   0.087

  22. Handling Unseen Verb-Noun Pairs
      • Generalize effect knowledge to new verb-noun pairs through an embedding model.
      • Diagram (not preserved): Action-Effect Embedding trained from seed knowledge

  23. Handling Unseen Verb-Noun Pairs
      • Generalize effect knowledge to new verb-noun pairs through an embedding model.
      • Diagram (not preserved): a new action (ignite-paper) is passed to the Action-Effect Embedding trained from seed knowledge, which yields effect phrases; these feed the web search images branch of the bootstrapping approach (ResNet, prediction over Action 1 … Action C, bootstrapping cross-entropy loss for web search images, cross-entropy loss for seeding images).
      • Effect phrases for ignite-paper: paper is being charred, paper is being burned, paper is set, paper is being destroyed, paper is lit
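
The slides do not describe how the action-effect embedding is trained or queried. As a rough illustration of how effect phrases could be retrieved for an unseen action, the sketch below ranks candidate phrases by cosine similarity in a shared vector space. The retrieval rule, the toy 3-dimensional vectors, the phrase “paper is folded”, and the names `cosine` and `rank_effects` are all assumptions for illustration, not the authors' model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_effects(action_vec, effect_vecs, top_k=3):
    """Return the top_k effect phrases closest to the action vector."""
    ranked = sorted(effect_vecs,
                    key=lambda phrase: cosine(action_vec, effect_vecs[phrase]),
                    reverse=True)
    return ranked[:top_k]


# Toy vectors, made up for this example; a real model would learn them.
ignite_paper = [0.9, 0.1, 0.0]
effects = {
    "paper is being burned": [1.0, 0.0, 0.0],
    "paper is lit": [0.8, 0.3, 0.0],
    "paper is folded": [0.0, 0.1, 1.0],  # hypothetical distractor phrase
}
top_phrases = rank_effects(ignite_paper, effects, top_k=2)
```

The retrieved phrases would then play the role of the predicted effect phrases used as web search keywords.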

  24. Evaluation Results (bar charts comparing Seed, BS+Seed+Act+pEff, and BS+Seed+Act+Eff; p<0.05; the bar-to-method assignment is not recoverable from the extracted text)
      • Action to Effect
        – MAP: 0.369, 0.507, 0.529
        – Top 5 Accuracy: 0.786, 0.893, 0.928
      • Effect to Action
        – Micro F1 Score and Macro F1 Score: bar values 0.179, 0.278, 0.378, 0.392, 0.472, 0.516 (panel assignment not recoverable)
      • pEff: web images downloaded using the predicted effect phrases.

  25. Action-Effect Embedding Space (figure not preserved; three panels: GloVe Verb, GloVe Verb + Noun, Action-Effect)

  26. Action-Effect Embedding Space (same three panels; highlighted verbs: bind, coil, bend, twist, knot)

  27. Action-Effect Embedding Space (same three panels; highlighted verbs added: grate, grind)

  28. Action-Effect Embedding Space (same three panels; highlighted verbs added: lock, fasten)

  29. Action-Effect Embedding Space (same three panels; highlighted verbs added: tear, crack, break)

  30. Action-Effect Embedding Space (same three panels; highlighted verbs added: crop, trim)
