Esta es una naranja atractiva: Adventures in Adapting an English Language Grounding System to Non-English Data

4/22/19

Esta es una naranja atractiva: Adventures in Adapting an English Language Grounding System to Non-English Data
By: Caroline Kery
Committee: Dr. Cynthia Matuszek, Dr. Frank Ferraro, Dr. Timothy Oates

Overview
• Research Goal
• Introduction to Grounded Language
• Prior Works
• Methods
• Results
• Conclusion and Future Work

Research Goal
Take a grounded language acquisition system and adapt it to non-English data

What is Grounded Language Acquisition?
• Tying language to the real world
What does “cat” mean?
Image sources:
http://www.petmd.com/sites/default/files/what-does-it-mean-when-cat-wags-tail.jpg
https://images.immediate.co.uk/volatile/sites/4/2018/08/iStock_000044061370_Medium-fa5f8aa.jpg?quality=45&crop=5px,17px,929px,400px&resize=960,413
http://www.royalcanin.ca/~/media/Royal-Canin-Canada/Product-Categories/cat-adult-landing-hero.ashx
https://www.akc.org/wp-content/themes/akc/component-library/assets/img/welcome.jpg

Why is it important?
• Robots can learn from users
• Adaptable to new situations

The English-centric Problem
• A common problem in Natural Language Processing (NLP): systems are often designed with English in mind
• Lots of materials available for English systems, not as much for others

The English-Centric Problem
• Robotic assistants should be accessible to non-English-speakers!

Related works
• Grounded language
  – Grounding actions (e.g. Kollar et al.), directions (e.g. Matuszek et al. 2012), sometimes multilingual (e.g. Chen et al. 2010)
• Computer Vision
  – Object recognition (e.g. Bo et al. 2011), image captioning (e.g. Gella et al. 2017)
• Multilingual Natural Language Processing
  – Machine translation (e.g. Wu et al. 2016), system adaptations (e.g. Poesio et al. 2010)

The Grounded Language System
(Pillai et al. RSS 2016)

My Research Goal
Take a grounded language acquisition system and adapt it to non-English data

Methods
• Analysis with Spanish and Hindi
  – Spanish: a Romance language
  – Hindi: an Indo-Iranian language
Map from the Washington Post Website: https://www.washingtonpost.com/pbox.php?url=http://www.washingtonpost.com/blogs/worldviews/files/2015/04/Screen-Shot-2015-04-23-at-9.04.22-AM.png&w=1484&op=resize&opt=1&filter=antialias&t=20170517

Methods
• Analysis with Spanish and Hindi
  – Started with Google Translate data
    • Identified adaptations (primarily preprocessing)
  – Collected new crowd-sourced descriptions
    • Analyzed differences across languages with real data

Google Translated Data
• Checked translation accuracy: back-translation

Google Translated Data
• Overall scores comparable

Adjective-Noun Agreement
[Figure: examples for Hindi and Spanish]

Necessary modification: Stemming
• Lemmatizer:
  baked -> bake
  baking -> bake
  runs -> run
  running -> run
• (Simple) Stemmer:
  baked -> bak
  baking -> bak
  runs -> run
  running -> runn
Lemmatizers are hard to find outside of English

Impact of Stemming on GT Data
[Figure]

Real Data Collection
• Google Translate data is an approximation
• Doesn’t necessarily reflect real language data
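The "(Simple) Stemmer" column on the stemming slide can be reproduced with plain suffix stripping. The sketch below is only an illustration of that idea, not the stemmer used in the talk: the suffix list, the `min_stem_len` guard, and the function name `simple_stem` are all assumptions chosen so that the four slide examples come out the same way.

```python
# Hypothetical suffix list; longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "s")


def simple_stem(token, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


# The four words from the slide.
stems = {word: simple_stem(word) for word in ["baked", "baking", "runs", "running"]}
```

Unlike a lemmatizer, this needs no dictionary, which is why stems such as "bak" and "runn" are not real words. What matters for grounding is that inflected forms of the same word mostly collapse to one token.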

Real Data Collection: Amazon Mechanical Turk
• “Give 1 to 2 sentences describing the object”
• No sample descriptions.

Data Collection: Results
• Around 6,000 descriptions were collected for each language

Results: lots of overlap but also some variety!

Data Collection: Results
• Final counts for Spanish and Hindi were smaller due to problematic workers

Analysis: Some properties that could impact scores
• Token count
• Stop words
• Negative/Positive Examples

Overall Scores
[Figure]

Token Count
• More tokens used in more specific contexts can raise the overall scores
[Figure: Scores when problematic workers who used lots of unrelated terms were not removed from the Hindi dataset]

Stop words
• Generic and low IDF (Inverse Document Frequency)
[Figure legend: Both; General stop word only; Low IDF stop word only]

Stop words
[Figure legend: General stop word only; Low IDF stop word only; Both]

Stop words: Scores
[Figure]

Particular Tokens and positive/negative Examples

English stemmed  Count  F1 Score   Spanish stemmed  Count  F1 Score
cabbag           237    0.9297     col              28     0.8352
cabbag           -      -          repoll           113    0.8294

Particular Tokens and positive/negative Examples

English stemmed  Count  F1 Score   Spanish stemmed  Count  F1 Score
yellow           562    0.8449     amarill          648    0.933
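The "low IDF" stop words mentioned on the stop-word slides can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: it uses the textbook definition idf(t) = log(N / df(t)), toy descriptions invented for the example, and an arbitrary threshold. The formula variant and cutoff used in the talk are not given in the slides.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each token to log(N / df), where df counts the documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each token at most once per document
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}


def low_idf_tokens(documents, threshold):
    """Return tokens whose IDF is at or below the (hypothetical) threshold."""
    idf = inverse_document_frequency(documents)
    return {tok for tok, value in idf.items() if value <= threshold}


# Toy object descriptions, invented for illustration only.
descriptions = [
    "this is a yellow banana".split(),
    "this is a red apple".split(),
    "this is a green cabbage".split(),
    "a yellow lemon".split(),
]
idf = inverse_document_frequency(descriptions)
stop_candidates = low_idf_tokens(descriptions, threshold=0.3)
```

With these toy descriptions, "a", "this", and "is" fall under the threshold while "yellow" does not. A list built this way comes from the data itself, so it can be produced for a language that has no ready-made stop-word list.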
