Character Eyes: Seeing Language through Character-Level Taggers

Yuval Pinter (@yuvalpi), Marc Marone (@ruyimarone), Jacob Eisenstein (@jacobeisenstein). Blackbox NLP 2019. https://github.com/ruyimarone/character-eyes


  1. Character Eyes: Seeing Language through Character-Level Taggers Yuval Pinter Marc Marone Jacob Eisenstein @yuvalpi @ruyimarone @jacobeisenstein Blackbox NLP 2019 https://github.com/ruyimarone/character-eyes

  2. Taggers. Example: The → DET, cat → N sg, walked → V past, fast → RB

  3. Neural Taggers. Example: The → DET, cat → N sg, walked → V past, fast → RB

  4. Character-level Neural Taggers. Example: The → DET, cat → N sg, walked → V past, fast → RB. Input characters: T h e | c a t | w a l k e d | f a s t

  5. Character-level Recurrent Neural Taggers. Example: The → DET, cat → N sg, walked → V past, fast → RB. Input characters: T h e | c a t | w a l k e d | f a s t

  6. Recurrent Taggers – Good at Finding Morphemes? Example: The → DET, cat → N sg, walked → V past, fast → RB. Input characters: T h e | c a t | w a l k e d | f a s t

  7. Recurrent Taggers – Good at Finding Morphemes? (Agglutination) Example: The → DET, cat → N sg, walked → V past, fast → RB. Input characters: T h e | c a t | w a l k e d | f a s t

  8. Recurrent Taggers – Good at Prefixes and Suffixes? Example: thecat → N sg;def, walked → V past, fast → RB. Input characters: t h e c a t | w a l k e d | f a s t

  9. Recurrent Taggers – Good at Prefixes and Suffixes? Prefixing morphology (e.g. Coptic). Example: thecat → N sg;def, walked → V past, fast → RB. Input characters: t h e c a t | w a l k e d | f a s t

  10. Recurrent Taggers – Can They Handle diSCoNtinUiTY? Example: The → DET, cat → N sg, waeldk → V past, fast → RB. Input characters: T h e | c a t | w a e l d k | f a s t

  11. Recurrent Taggers – Can They Handle diSCoNtinUiTY? Introflexive morphology (Hebrew, Arabic). Example: The → DET, cat → N sg, waeldk → V past, fast → RB. Input characters: T h e | c a t | w a e l d k | f a s t

  12. Main Idea(s): Model and Language. Example words: walked, thecat, waeldk

  13. Main Idea(s): Model and Language. Measure how models encode different linguistic patterns. Example words: walked, thecat, waeldk

  14. Main Idea(s): Model and Language. Characterize languages based on model analysis; help engineer language-aware systems. Example words: walked, thecat, waeldk

  15. Analysis Primitive – Unit Decomposition. Example: The cat walked fast, tagged DET, N sg, V past, RB, read character by character

  16. Analysis Primitive – Unit Decomposition (figure: hidden unit #n tracked over the characters of "The cat walked fast")
      ● Assumption: units are “in charge” of tracking morphemes that help predict POS

  17. Analysis Primitive – Unit Decomposition (figure: hidden unit #n tracked over the characters of "The cat walked fast")
      ● Assumption: units are “in charge” of tracking morphemes that help predict POS
      ● Hypothesis: easy for agglutinations, difficult for introflexions

  18. Analysis Primitive – Unit Decomposition (figure: hidden units #n and #m tracked over the characters of "The cat walked fast")
      ● Assumption: units are “in charge” of tracking morphemes that help predict POS
      ● Hypothesis: easy for agglutinations, difficult for introflexions
      ● Hypothesis: unit’s direction affects ease of tracking suffixes vs. prefixes

  19. Evidence?
      ● Turkish is an agglutinative language
        ○ ev ‘house’; evler ‘houses’; evleriniz ‘your houses’; evlerinizden ‘from your houses’

  20. Evidence?
      ● Turkish is an agglutinative language
        ○ ev ‘house’; evler ‘houses’; evleriniz ‘your houses’; evlerinizden ‘from your houses’
      Figure: activations of Unit 3 (→)

  21. Evidence?
      ● Turkish is an agglutinative language
        ○ ev ‘house’; evler ‘houses’; evleriniz ‘your houses’; evlerinizden ‘from your houses’
      Figure: activations of Unit 3 (→) and Unit 124 (←)

  22. Model & Data. Example: The cat walked fast, tagged DET, N sg, V past, RB

  23. Model & Data
      ● Universal Dependencies (n=24)
        ○ POS tags + Morphosyntactic Descriptions

  24. Model & Data
      ● Universal Dependencies (n=24)
        ○ POS tags + Morphosyntactic Descriptions
      ● Linguistic diversity – morph. synthesis:
        ○ 5 agglutinative languages
        ○ 2 introflexive languages
        ○ 3 isolating, 14 fusional
      Source for language classes: WALS

  25. Model & Data
      ● Universal Dependencies (n=24)
        ○ POS tags + Morphosyntactic Descriptions
      ● Linguistic diversity – morph. synthesis:
        ○ 5 agglutinative languages
        ○ 2 introflexive languages
        ○ 3 isolating, 14 fusional
      ● Linguistic diversity – affixation:
        ○ (All) 1 prefixing language
        ○ 2 non-affixing
        ○ 2 equally pre- and suffixing
        ○ 19 suffixing
      Source for language classes: WALS

  26. Model & Data
      ● Universal Dependencies (n=24)
        ○ POS tags + Morphosyntactic Descriptions
        ○ Linguistic diversity (synthesis + affixation)
      ● Word → Tag: Bidirectional LSTM + MLP
        ○ (Not analyzed)
        ○ No word embeddings

  27. Model & Data
      ● Universal Dependencies (n=24)
        ○ POS tags + Morphosyntactic Descriptions
        ○ Linguistic diversity (synthesis + affixation)
      ● Word → Tag: Bidirectional LSTM + MLP
        ○ (Not analyzed)
        ○ No word embeddings
      ● Char → Word: Bidirectional LSTM
        ○ Char embedding size: 256

  28. Analysis Metrics

  29. Analysis Metrics
      ● Run model on training data words

  30. Analysis Metrics
      ● Run model on training data words
      ● Collect activation levels for each unit

  31. Analysis Metrics
      ● Run model on training data words
      ● Collect activation levels for each unit
      ● Aggregate to single measure (e.g. average absolute or max-delta); example value: 0.42
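The aggregation step can be sketched as follows. This is a minimal illustration, not the authors' code: the slides name "average absolute" and "max-delta" without defining them, so the definition of `max_delta` (largest absolute change between consecutive characters) is an assumption, and the activation values are hypothetical.

```python
from typing import Sequence


def average_absolute(activations: Sequence[float]) -> float:
    """Mean of |activation| over the characters of one word."""
    return sum(abs(a) for a in activations) / len(activations)


def max_delta(activations: Sequence[float]) -> float:
    """Largest absolute change between consecutive characters.

    Assumed definition; the slides do not spell it out.
    """
    if len(activations) < 2:
        return 0.0
    return max(abs(b - a) for a, b in zip(activations, activations[1:]))


# Hypothetical activations of one hidden unit over the characters of "walked".
walked = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8]
print(average_absolute(walked))
print(max_delta(walked))
```

Either function reduces a per-character activation trace to one number per (word, unit) pair, which is what the later binning step needs.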

  32. Analysis Metrics
      ● Run model on training data words
      ● Collect activation levels for each unit
      ● Aggregate to single measure (e.g. average absolute or max-delta); example value: 0.42
      ● Bin per unit over parts of speech

      Unit 42   [0.0,0.1)   [0.1,0.2)   …   [0.9,1.0)
      NOUN          8           2       …      40
      VERB         20           0       …       4
      …             …           …       …       …
      ADJ          10          10       …      10

  33. Analysis Metrics
      ● Run model on training data words
      ● Collect activation levels for each unit
      ● Aggregate to single measure (e.g. average absolute or max-delta); example value: 0.42
      ● Bin per unit over parts of speech
      ● Mutual Information metric – POS Discrimination Index, or PDI
        ○ (Higher PDI = better discriminator)

      Unit 42   [0.0,0.1)   [0.1,0.2)   …   [0.9,1.0)
      NOUN          8           2       …      40
      VERB         20           0       …       4
      …             …           …       …       …
      ADJ          10          10       …      10
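As a rough sketch of the binning and mutual-information steps, the function below computes the mutual information between POS tag and activation bin for a single unit. The names `bin_index` and `pdi`, the ten equal-width bins over [0, 1], and the use of base-2 logarithms are assumptions made for illustration; the slides only state that PDI is a mutual-information metric.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def bin_index(value: float, n_bins: int = 10) -> int:
    """Map a value in [0, 1] to one of n_bins equal-width bins."""
    return min(int(value * n_bins), n_bins - 1)


def pdi(samples: Iterable[Tuple[str, float]], n_bins: int = 10) -> float:
    """Mutual information (in bits) between POS tag and activation bin."""
    # Joint counts correspond to the cells of the per-unit table.
    joint = Counter((tag, bin_index(v, n_bins)) for tag, v in samples)
    total = sum(joint.values())
    tag_totals: Counter = Counter()
    bin_totals: Counter = Counter()
    for (tag, b), count in joint.items():
        tag_totals[tag] += count
        bin_totals[b] += count
    mi = 0.0
    for (tag, b), count in joint.items():
        p_joint = count / total
        # p(tag, bin) / (p(tag) * p(bin)), written with raw counts.
        ratio = count * total / (tag_totals[tag] * bin_totals[b])
        mi += p_joint * math.log2(ratio)
    return mi


# Hypothetical unit that separates nouns from verbs perfectly.
separating = [("NOUN", 0.95)] * 4 + [("VERB", 0.05)] * 4
print(pdi(separating))
```

A unit whose activation bin is independent of the POS tag scores 0, which matches the slide's reading that a higher PDI means a better discriminator.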

  34. Analysis Metrics
      ● Run model on training data words
      ● Collect activation levels for each unit
      ● Aggregate to single measure (e.g. average absolute or max-delta)
      ● Bin per unit over parts of speech
      ● Mutual Information metric – POS Discrimination Index, or PDI
        ○ (Higher PDI = better discriminator)
      ● Aggregate across units by

      One table per unit (Unit 40, Unit 41, Unit 42, …), each of the form:

      Unit 42   [0.0,0.1)   [0.1,0.2)   …   [0.9,1.0)
      NOUN          8           2       …      40
      VERB         20           0       …       4
      …             …           …       …       …
      ADJ          10          10       …      10

  35. Analysis Metrics
      ● Run model on training data words
      ● Collect activation levels for each unit
      ● Aggregate to single measure (e.g. average absolute or max-delta)
      ● Bin per unit over parts of speech
      ● Mutual Information metric – POS Discrimination Index, or PDI
        ○ (Higher PDI = better discriminator)
      ● Aggregate across units by
        ○ Summing total mass
        ○ Reporting % of forward units before mass median

      One table per unit (Unit 40, Unit 41, Unit 42, …), each of the form:

      Unit 42   [0.0,0.1)   [0.1,0.2)   …   [0.9,1.0)
      NOUN          8           2       …      40
      VERB         20           0       …       4
      …             …           …       …       …
      ADJ          10          10       …      10
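The two language-level aggregates might be computed along these lines. This sketch rests on one reading of the slide, which is an assumption: units are ranked by PDI from highest to lowest, the "mass median" is the point where the running sum first reaches half of the total mass, and the reported figure is the share of forward-direction units up to that point. The function names and input values are hypothetical.

```python
from typing import List, Tuple


def total_mass(pdis: List[float]) -> float:
    """Sum of PDI values over all hidden units."""
    return sum(pdis)


def forward_share_before_median(units: List[Tuple[float, bool]]) -> float:
    """Fraction of forward units among those ranked before the mass median.

    Each unit is a (pdi, is_forward) pair.
    """
    if not units:
        raise ValueError("need at least one unit")
    ranked = sorted(units, key=lambda u: u[0], reverse=True)
    half = sum(p for p, _ in ranked) / 2
    running = 0.0
    head: List[bool] = []
    for p, is_forward in ranked:
        head.append(is_forward)
        running += p
        if running >= half:
            break
    return sum(head) / len(head)


# Hypothetical PDI values: True marks a forward unit, False a backward unit.
units = [(3.0, True), (3.0, False), (1.0, True), (1.0, True)]
print(total_mass([p for p, _ in units]))
print(forward_share_before_median(units))
```

Under this reading, a high share indicates that the most discriminative units sit mostly in the forward LSTM, and a low share indicates a backward-heavy model.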

  36. Findings (Cherry Pick)
      ● English: fusional, suffixing
        ○ Small mass (hard to capture POS)
        ○ Backward-heavy (80%)
      ● Coptic: agglutinative, prefixing
        ○ Large mass (easy to distinguish POS based on char sequence)
        ○ Forward-heavy (71%)

  37. Findings (General Trends). Figure: total PDI mass per language

  38. Findings (General Trends). Figure: total PDI mass per language
      ● 4 of the 5 agglutinative languages hold 4 of the top 6 total-mass positions

  39. Findings (General Trends). Figure: total PDI mass per language
      ● 4 of the 5 agglutinative languages hold 4 of the top 6 total-mass positions
      ● Both introflexive languages fall in the bottom 4 spots (only Persian and Hindi rank below them, both fusional with non-Latin character sets)

  40. Direction Balance Study

  41. Direction Balance Study
      ● Some languages might not need two equal LSTM directions

  42. Direction Balance Study
      ● Some languages might not need two equal LSTM directions
      ● What if… they don’t need one of them at all?

  43. Direction Balance Study
      ● Some languages might not need two equal LSTM directions
      ● What if they need them in a different balance? Somewhere in the middle?
      ● What if… they don’t need one of them at all?
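One way to picture the configurations such a study could compare is to split a fixed budget of hidden units between the two directions. The sketch below is purely illustrative: the helper `direction_splits`, the budget of 128 units, and the number of steps are all hypothetical and are not taken from the slides.

```python
from typing import List, Tuple


def direction_splits(total_units: int, steps: int) -> List[Tuple[int, int]]:
    """Evenly spaced (forward, backward) splits of a fixed unit budget.

    Runs from all-backward to all-forward, passing through the balanced case
    when steps is even.
    """
    if steps < 1 or total_units % steps != 0:
        raise ValueError("total_units must be divisible by steps")
    stride = total_units // steps
    return [(f, total_units - f) for f in range(0, total_units + 1, stride)]


# Hypothetical budget of 128 units, explored in four equal steps.
for forward, backward in direction_splits(total_units=128, steps=4):
    print(f"forward={forward:3d} backward={backward:3d}")
```

The two extremes correspond to dropping one direction entirely, and the intermediate entries to "somewhere in the middle".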

  44. Balance Study – Results
