
Leveraging distributed representations and lexico-syntactic fixedness



  1. Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb–noun combinations
     Milton King and Paul Cook
     University of New Brunswick, Fredericton, Canada

  2. Multiword Expressions
     • Expressions of multiple words that can exhibit an idiomatic meaning
       – Ivory tower
       – Hit up
       – Take a walk
     • Verb–noun combinations
       – See stars
       – Kick the bucket

  3. Idiomatic vs. Literal
     • Pull plug
       – (I) They pulled the plug on the Department of Health funding
       – (L) Unfortunately someone pulled the sink plug
     • See stars
       – (I) It caught him on the head and he went down seeing little sparkling stars
       – (L) It’s still dark enough to see the brightest stars

  4. Idiom Token Classification
     • Determine whether an MWE instance is idiomatic
       – They pulled the plug on the project [Idiomatic/Literal]
     • Applications
       – Machine translation
         • Kick the bucket [mourir "to die" / frapper avec le pied "to strike with the foot"]
       – Sentence completion
         • Keegan is ready to pull the plug on [a deal / the TV]

  5. Overview of Approach
     • Supervised approach
     • VNC (verb–noun combination) token instances are represented using an embedding model
     • Embedding models
       – Skip-thoughts
       – Word2vec
       – Siamese CBOW
     • SVM classifier
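The slides name the components but not how they are wired together. A minimal sketch of the averaged-word2vec variant, under stated assumptions: `TOY_VECTORS` holds hypothetical 2-d stand-ins for trained word2vec vectors; a token is represented by averaging every in-vocabulary word of its sentence (the exact context the authors averaged over is an assumption); and the classifier is a dependency-free linear SVM trained with Pegasos-style subgradient descent, swapped in because the slides do not say which SVM implementation, kernel, or hyperparameters were used. Labels are +1 for idiomatic and -1 for literal; all helper names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 2-d vectors standing in for a trained word2vec model.
TOY_VECTORS: Dict[str, Vector] = {
    "pulled": [0.5, 1.0],
    "plug": [-0.5, -1.0],
    "funding": [3.0, 0.0],
    "sink": [-3.0, 0.0],
}


def average_embedding(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no in-vocabulary tokens to average")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def train_linear_svm(xs: List[Vector], ys: List[int],
                     lam: float = 0.1, steps: int = 1000) -> Vector:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    A constant 1.0 feature is appended so a bias is learned; that bias is
    regularized along with the other weights.
    """
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    for t in range(1, steps + 1):
        i = (t - 1) % len(data)  # cycle through examples deterministically
        x, y = data[i], ys[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink step
        if margin < 1.0:  # hinge loss active: move w toward y * x
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Vector, x: Vector) -> int:
    """+1 (idiomatic) if the score is non-negative, else -1 (literal)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    idiom = average_embedding("they pulled the plug on the funding".split(), TOY_VECTORS)
    literal = average_embedding("someone pulled the sink plug".split(), TOY_VECTORS)
    w = train_linear_svm([idiom, literal], [1, -1])
    print(idiom, literal)
    print(predict(w, idiom), predict(w, literal))
```

With real data, the toy dictionary would be replaced by a trained embedding model and the hand-written trainer by a library SVM; the hand-written version only keeps the sketch self-contained.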

  6. Lexico-Syntactic Fixedness
     • The idiomatic meaning of an expression is typically restricted to a small number of lexico-syntactic patterns
     • See star (idiomatic)
       – Active voice, no determiner, plural noun
         • See stars
     • See star (literal)
       – Active voice, determiner, singular noun
         • See a star
       – Passive voice, plural noun
         • Stars were seen
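One way to make these distinctions concrete is to map each parsed VNC instance to a pattern label built from the three properties the slide varies: voice, determiner, and noun number. This is a simplified, hypothetical encoding (the `VNCInstance` fields and the label format are ours), not the exact pattern inventory of Fazly et al. (2009), and it assumes a parser has already extracted voice, determiner, and number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VNCInstance:
    """A verb-noun token with properties assumed to come from a parser."""
    verb: str
    noun: str  # lemma of the noun
    passive: bool
    determiner: Optional[str]  # None when the noun has no determiner
    plural: bool


def pattern_label(inst: VNCInstance) -> str:
    """Hypothetical pattern label combining voice, determiner, and number."""
    voice = "pass" if inst.passive else "act"
    det = inst.determiner.lower() if inst.determiner else "NULL"
    number = "pl" if inst.plural else "sg"
    return f"v_{voice} det:{det} n_{number}"


if __name__ == "__main__":
    examples = {
        "see stars": VNCInstance("see", "star", False, None, True),
        "see a star": VNCInstance("see", "star", False, "a", False),
        "stars were seen": VNCInstance("see", "star", True, None, True),
    }
    for text, inst in examples.items():
        print(f"{text!r:20} -> {pattern_label(inst)}")
```

Under this encoding, the idiomatic "see stars" and the literal "see a star" / "stars were seen" land in different patterns, which is what lets pattern membership act as evidence about idiomaticity.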

  7. Patterns
     Afsaneh Fazly et al. 2009

  8. Canonical Form
     • Lexico-syntactic patterns in which idiomatic usages tend to occur
     Afsaneh Fazly et al. 2009

  9. Integrating Canonical Forms
     • The unsupervised method of Fazly et al. is used to identify canonical forms
     • A one-dimensional binary vector indicates whether the expression instance occurs in a canonical form
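As we read Fazly et al. (2009), the unsupervised step treats a pattern as canonical for an expression when the pattern's frequency z-score, computed over all patterns observed for that expression, exceeds a threshold. The sketch below follows that reading and then turns the result into the one-dimensional binary feature appended to a token's embedding. The pattern counts, the threshold of 1.0, the use of the sample standard deviation, and the helper names are illustrative assumptions; pattern labels are treated as arbitrary strings.

```python
import statistics
from typing import Dict, List, Set


def canonical_forms(pattern_counts: Dict[str, int], threshold: float = 1.0) -> Set[str]:
    """Patterns whose frequency z-score exceeds `threshold` (assumed value)."""
    counts = list(pattern_counts.values())
    if len(counts) < 2:
        return set()
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)  # sample standard deviation (an assumption)
    if sd == 0:
        return set()
    return {p for p, f in pattern_counts.items() if (f - mean) / sd > threshold}


def with_canonical_feature(embedding: List[float], token_pattern: str,
                           canonical: Set[str]) -> List[float]:
    """Append 1.0 if the token occurs in a canonical form, else 0.0."""
    return list(embedding) + [1.0 if token_pattern in canonical else 0.0]


if __name__ == "__main__":
    # Hypothetical corpus counts of each pattern for the expression "see star".
    counts = {
        "v_act det:NULL n_pl": 90,
        "v_act det:a n_sg": 5,
        "v_act det:the n_sg": 3,
        "v_pass det:NULL n_pl": 2,
    }
    canon = canonical_forms(counts)
    print(canon)
    print(with_canonical_feature([0.2, -0.1], "v_act det:NULL n_pl", canon))
```

Because the canonical forms are found from type-level counts alone, the feature needs no labelled tokens; the supervised classifier then decides how much weight to give it.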

  10. VNC-Tokens Dataset (Cook et al. 2008)
                        Dev   Test
       MWEs              14     14
       Training
         Idiom          270    298
         Literal        179    172
       Testing
         Idiom           92     90
         Literal         53     53
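When reading the accuracy figures, it can help to know how skewed each split is. The slides do not report a majority-class baseline; the sketch below only computes one from the counts above (always predicting the more frequent class), and the dictionary layout and function name are ours.

```python
from typing import Dict

# Token counts from the VNC-Tokens slide (Cook et al. 2008).
COUNTS: Dict[str, Dict[str, Dict[str, int]]] = {
    "dev": {
        "training": {"idiom": 270, "literal": 179},
        "testing": {"idiom": 92, "literal": 53},
    },
    "test": {
        "training": {"idiom": 298, "literal": 172},
        "testing": {"idiom": 90, "literal": 53},
    },
}


def majority_baseline(split: Dict[str, int]) -> float:
    """Accuracy of always predicting the most frequent class in `split`."""
    return max(split.values()) / sum(split.values())


if __name__ == "__main__":
    for name, parts in COUNTS.items():
        for part, counts in parts.items():
            total = sum(counts.values())
            print(f"{name:4} {part:8} n={total:3}  majority={majority_baseline(counts):.3f}")
```

On these counts, always predicting "idiom" gives about 0.634 accuracy on the Dev testing tokens (92/145) and about 0.629 on the Test testing tokens (90/143).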

  11. Accuracy

  12. Results per class
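The charts on the accuracy and per-class slides did not survive extraction, and the slides do not say which per-class measures were plotted. As a sketch under that uncertainty, accuracy and per-class precision, recall, and F1 can be computed from gold and predicted labels as follows; the label names, the function name, and the toy predictions are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("idiomatic", "literal")) -> Dict[str, object]:
    """Accuracy plus per-class precision, recall, and F1."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    per_class = {}
    for c in labels:
        tp = sum(g == c and p == c for g, p in pairs)  # correctly labelled c
        fp = sum(g != c and p == c for g, p in pairs)  # wrongly labelled c
        fn = sum(g == c and p != c for g, p in pairs)  # c tokens that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return {"accuracy": accuracy, "per_class": per_class}


if __name__ == "__main__":
    gold = ["idiomatic", "idiomatic", "idiomatic", "literal", "literal"]
    pred = ["idiomatic", "idiomatic", "literal", "literal", "idiomatic"]
    print(evaluate(gold, pred))
```

Reporting per-class scores alongside accuracy matters here because the classes are imbalanced: a classifier can score well on accuracy while doing poorly on the smaller literal class.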

  13. Conclusion
     • Averaging word2vec embeddings outperforms all other models used
     • Canonical form feature improves results
     • Future work
       – Unseen MWEs
       – Other embedding models

  14. Thank you
     This work was financially supported by NSERC, NBIF, and the University of New Brunswick

  15. Results per class
