Leveraging distributed representa0ons and lexico-syntac0c fixedness - - PowerPoint PPT Presentation

leveraging distributed representa0ons and lexico syntac0c
SMART_READER_LITE
LIVE PREVIEW

Leveraging distributed representa0ons and lexico-syntac0c fixedness - - PowerPoint PPT Presentation

Leveraging distributed representa0ons and lexico-syntac0c fixedness for token-level predic0on of the idioma0city of English verbnoun combina0ons Milton King and Paul Cook University of New Brunswick Fredericton, Canada 1 Mul0word


slide-1
SLIDE 1

Leveraging distributed representa0ons and lexico-syntac0c fixedness for token-level predic0on

  • f the idioma0city of English

verb–noun combina0ons

Milton King and Paul Cook University of New Brunswick Fredericton, Canada

1

slide-2
SLIDE 2

Mul0word Expressions

  • Expressions of mul0ple words that can exhibit

an idioma0c meaning

– Ivory tower – Hit up – Take a walk

  • Verb noun combina0ons

– See stars – Kick the bucket

2

slide-3
SLIDE 3

Idioma0c vs Literal

  • Pull plug

– (I) They pulled the plug on the Department of Health funding – (L) Unfortunately someone pulled the sink plug

  • See stars

– (I) It caught him on the head and he went down seeing liAle sparkling stars – (L) It’s sDll dark enough to see the brightest stars

3

slide-4
SLIDE 4

Idiom Token Classifica0on

  • Determine if an MWE instance is idioma0c

– They pulled the plug on the project [IdiomaDc/Literal]

  • Applica0ons

– Machine transla0on

  • Kick the bucket [mourir/frapper avec le pied]

– Sentence comple0on

  • Keegan is ready to pull the plug on [a deal / the tv]

4

slide-5
SLIDE 5

Overview of Approach

  • Supervised approach
  • VNC token instances are represented via use
  • f an embedding model
  • Embedding models

– Skip-thoughts – Word2vec – Siamese CBOW

  • SVM classifier

5

slide-6
SLIDE 6

Lexico-Syntac0c Fixedness

  • The idioma0c meaning of an expression is

typically restricted to a small number of lexico-syntac0c paVerns

  • See star (Idioma0c)

– Ac0ve voice, no determiner, plural noun

  • See stars
  • See star (Literal)

– Ac0ve voice, determiner, singular noun

  • See a star

– Passive voice, plural noun

  • Stars were seen

6

slide-7
SLIDE 7

PaVerns

7

Afsaneh Fazly et al. 2009

slide-8
SLIDE 8

Canonical Form

8

Afsaneh Fazly et al. 2009

  • Lexico-syntac0c paVerns that idioma0c usages tend

to occur in

slide-9
SLIDE 9

Integra0ng Canonical Forms

  • Unsupervised method used in Fazly et al. to

iden0fy canonical forms

  • One-dimensional binary vector represen0ng if

the expression is in the canonical form

9

slide-10
SLIDE 10

VNC-Tokens Dataset

Cook et al. 2008

  • Dev

– 14 MWEs – Training

  • 270 Idiom
  • 179 Literal

– Tes0ng

  • 92 Idiom
  • 53 Literal

10

  • Test

– 14 MWEs – Training

  • 298 Idiom
  • 172 Literal

– Tes0ng

  • 90 Idiom
  • 53 Literal
slide-11
SLIDE 11

Accuracy

11

slide-12
SLIDE 12

Results per class

12

slide-13
SLIDE 13

Conclusion

  • Averaging word2vec embeddings outperforms

all other models used

  • Canonical form feature improves results
  • Future work

– Unseen MWEs – Other embedding models

13

slide-14
SLIDE 14

Thank you

This work was financially supported by NSERC, NBIF, and University of New Brunswick

14

slide-15
SLIDE 15

Results per class

15