MORSE: Semantic-ally Drive-n MORpheme SEgment-er

SLIDE 1

MORSE:

Semantic-ally Drive-n MORpheme SEgment-er

Tarek Sakakini Suma Bhat Pramod Viswanath

Samuel MORSE minimized the number of on-off clicks for non-verbal communication. This MORSE minimizes the vocabulary size for Natural Language Processing systems.

SLIDE 2

Morpheme Segmentation 1

SLIDE 3

Morpheme Segmentation

Hopefully

SLIDE 4

Not a trivial task

Words: Player, Playing, Beijing, Butterflies

Suffixes: +er, +ing, +s

SLIDE 5

Applications

Machine Translation

With morpheme segmentation
Model: Quick → Rapide, ly → ment, Sad → Triste
Test: Sadly → Tristement

Without morpheme segmentation
Model: Quickly → Rapidement, Sad → Triste
Test: Sadly → ???
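
The contrast on this slide can be sketched as two toy lookups. The lexicon entries are the ones shown on the slide; the function names and the concatenation logic are assumptions made for illustration, not part of any real translation system.

```python
# Toy illustration only: entries come from the slide, the lookup logic is assumed.
WORD_LEXICON = {"quickly": "rapidement", "sad": "triste"}
MORPHEME_LEXICON = {"quick": "rapide", "ly": "ment", "sad": "triste"}


def translate_word_level(word):
    """Look up the whole word; a word never seen in training returns None."""
    return WORD_LEXICON.get(word.lower())


def translate_morpheme_level(morphemes):
    """Translate each morpheme and concatenate; None if any morpheme is unknown."""
    pieces = [MORPHEME_LEXICON.get(m.lower()) for m in morphemes]
    if any(piece is None for piece in pieces):
        return None
    return "".join(pieces)


print(translate_word_level("sadly"))            # None
print(translate_morpheme_level(["sad", "ly"]))  # tristement
```

The word-level model has no entry for "sadly", while the morpheme-level model composes it from pieces it has already seen.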

SLIDE 6

Applications

Information Retrieval

Here at Toyota World, we have the cheapest cars in town. We are proudly called the first and last stop. … …

SLIDE 7

Previous Work

2

SLIDE 8

Letter Successor Variety (Harris, 1970)

H e l p l e s s l y
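
A minimal sketch of the letter successor variety idea: for each prefix of a word, count how many distinct letters follow that prefix anywhere in the vocabulary; a peak suggests a morpheme boundary. The toy vocabulary is invented for illustration, and this variant does not count end-of-word as a successor, which some formulations do.

```python
from collections import defaultdict


def successor_variety(word, vocabulary):
    """Return, for each proper prefix of `word`, the number of distinct
    letters that follow that prefix in the vocabulary."""
    successors = defaultdict(set)
    for w in vocabulary:
        for i in range(1, len(w)):
            successors[w[:i]].add(w[i])
    return [len(successors[word[:i]]) for i in range(1, len(word))]


# Hypothetical vocabulary, chosen so that "help" has many continuations.
vocab = ["help", "helps", "helped", "helper", "helpless",
         "helplessly", "helping", "held", "hello"]

print(successor_variety("helplessly", vocab))
# [1, 1, 3, 4, 1, 1, 1, 1, 1]
```

The count peaks after "help" (4 distinct successors), which is where the first cut would go. With this small vocabulary the later boundary before "ly" does not stand out; a larger corpus would be needed for that.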

SLIDE 9

Morfessor (Creutz and Lagus, 2005)

Help: 2387, Helping: 1586, Helper: 498, Helps: 2437
Jump: 1847, Jumping: 1664, Jumper: 1290, Jumps: 2987

SLIDE 10

Downsides

Freshman
Butterfl + ies
Butterfly + ies

SLIDE 11

Locally Semantic

Cosine similarity: (car, caring) vs. (car, cars)

(Schone and Jurafsky, 2000) (Narasimhan et al., 2015) (Luo et al., 2017)
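
The local check named on this slide can be sketched as a cosine similarity between the embeddings of the two words in a pair. The 3-dimensional vectors below are made up for illustration; real word embeddings would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical embeddings, invented so that "car" and "cars" are close
# while "caring" lies in a different direction.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "cars": [0.8, 0.2, 0.1],
    "caring": [0.0, 0.2, 0.9],
}

sim_cars = cosine(toy_embeddings["car"], toy_embeddings["cars"])
sim_caring = cosine(toy_embeddings["car"], toy_embeddings["caring"])
print(sim_cars > sim_caring)  # True
```

Under these assumed vectors, (car, cars) passes a local semantic check and (car, caring) does not, even though both pairs look plausible orthographically.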

SLIDE 12

Distinguishing criteria

(car, cars)

(car, cars) (player, players) (runner, runners) (goal, goals) (play, plays) (fine, fines) (wheel, wheels) (hand, hands) (laptop, laptops) (lab, labs)

SLIDE 13

MORSE

3

◉ Unsupervised
◉ Morphology Learning Input: Word Embeddings
◉ Segmentation: Optimization Problem
◉ 4 hyperparameters: Small tuning dataset

SLIDE 14

Learning Morphology

Step 1

SLIDE 15

Collecting candidate morphological rules

Vocabulary: jump, play, buy, jumping, playing, buying, jumper, player, buyer, ….., and, stand

(suf, ∅, ing): (jump, jumping) (play, playing) (buy, buying)
(suf, ∅, er): (jump, jumper) (play, player) (buy, buyer)
(pre, ∅, st): (and, stand) (ore, store) (one, stone)

(Soricut and Och, 2015)
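
Candidate collection can be sketched as follows: for every word, strip a short suffix or prefix and check whether the remainder is itself in the vocabulary. This is a simplified sketch that only covers rules of the form (suf, ∅, x) and (pre, ∅, x), with the empty string standing in for ∅; the function name and the `max_affix_len` parameter are assumptions for illustration.

```python
from collections import defaultdict


def collect_candidate_rules(vocabulary, max_affix_len=3):
    """Group (base, derived) word pairs by the affix-adding rule that links them."""
    vocab = set(vocabulary)
    rules = defaultdict(list)
    for word in vocab:
        # Never strip the whole word: leave at least one character as the base.
        for k in range(1, min(max_affix_len, len(word) - 1) + 1):
            stem, suffix = word[:-k], word[-k:]
            if stem in vocab:
                rules[("suf", "", suffix)].append((stem, word))
            prefix, rest = word[:k], word[k:]
            if rest in vocab:
                rules[("pre", "", prefix)].append((rest, word))
    return {rule: sorted(pairs) for rule, pairs in rules.items()}


vocab = ["jump", "play", "buy", "jumping", "playing", "buying",
         "jumper", "player", "buyer", "and", "stand"]
for rule, pairs in sorted(collect_candidate_rules(vocab).items()):
    print(rule, pairs)
```

On the slide's vocabulary this yields the three rules shown above, including the spurious (pre, ∅, st) supported by (and, stand). Nothing at this stage filters out bad rules; that is what the signals on the following slides are for.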

SLIDE 16

Signals

Orthographic, Semantic

Word Embeddings

(quick, quickly) (wrong, wrongly) (beautiful, beautifully) (confident, confidently)

SLIDE 17

What makes a good rule? Signal 1: Orthography

Rule = (suf, ∅, ly), Size = 8723: (beautiful, beautifully) (quick, quickly) (confident, confidently) … (wrong, wrongly)

Rule = (pre, ∅, st), Size = 16: (ore, store) … (amp, stamp)

SLIDE 18

What makes a good rule? Signal 2: Semantics

(ore, store) (amp, stamp) (and, stand) (one, stone)

(quick, quickly) (wrong, wrongly) (beautiful, beautifully) (confident, confidently)

SLIDE 19

What makes a good member of a rule?

Scope: Vocabulary-Wide

(quick, quickly) (wrong, wrongly) (beautiful, beautifully) (confident, confidently) (on, only)

SLIDE 20

What makes a good member of a rule?

Scope: Local

(confident, confidently) (on, only)

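
The two scopes can be sketched with word embeddings. The slides do not give the scoring formulas, so everything here is an assumption: the local score is taken to be the cosine similarity of the two words, and the vocabulary-wide score is taken to be the fraction of other pairs under the same rule whose difference vector points in a similar direction. The 2-dimensional embeddings and the `threshold` value are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def difference(emb, pair):
    """Vector that takes the base word of the pair to the derived word."""
    base, derived = pair
    return [d - b for b, d in zip(emb[base], emb[derived])]


def local_score(emb, pair):
    """Assumed local signal: similarity of the two words themselves."""
    return cosine(emb[pair[0]], emb[pair[1]])


def vocabulary_wide_score(emb, pair, support, threshold=0.5):
    """Assumed vocabulary-wide signal: share of the rule's other pairs whose
    difference vector is similar to this pair's difference vector."""
    others = [p for p in support if p != pair]
    if not others:
        return 0.0
    d = difference(emb, pair)
    hits = sum(1 for p in others if cosine(d, difference(emb, p)) >= threshold)
    return hits / len(others)


# Hypothetical embeddings: the true -ly pairs share the offset (0, 1).
TOY = {
    "quick": [1.0, 0.0], "quickly": [1.0, 1.0],
    "wrong": [0.5, 0.2], "wrongly": [0.5, 1.2],
    "confident": [0.8, 0.1], "confidently": [0.8, 1.1],
    "on": [0.0, 1.0], "only": [1.0, 0.0],
}
SUPPORT = [("quick", "quickly"), ("wrong", "wrongly"),
           ("confident", "confidently"), ("on", "only")]

for pair in SUPPORT:
    print(pair, local_score(TOY, pair), vocabulary_wide_score(TOY, pair, SUPPORT))
```

With these assumed vectors, (quick, quickly) agrees with two of the three other pairs, while (on, only) agrees with none, so it looks like a poor member of (suf, ∅, ly) under both scopes.
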
SLIDE 21

Segmenting

Step 2

SLIDE 22

Linear Optimization Problem

Word: uncaring

Candidates: (caring, uncaring) (ring, uncaring) (uncare, uncaring)

t1, t2, t3, t4

SLIDE 23

Iterate

Word: caring

Candidates: (care, caring) (car, caring) (carol, caring)

t1, t2, t3, t4

un + caring

SLIDE 24

Iterate

Word: care

Candidates: (car, care) (ca, care) (re, care)

t1, t2, t3, t4

un + care + ing
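
The iteration across these three slides can be sketched as a loop that repeatedly picks a parent word and peels off the affix. This sketch replaces the linear optimization problem with a plain greedy choice over a table of hypothetical scores and a single threshold; the scores, the threshold, and all function names are invented for illustration and are not the values or criteria MORSE uses.

```python
# Hypothetical scores for candidate (parent, word) pairs from the slides.
CANDIDATE_SCORES = {
    ("caring", "uncaring"): 0.9,
    ("ring", "uncaring"): 0.1,
    ("uncare", "uncaring"): 0.2,
    ("care", "caring"): 0.8,
    ("car", "caring"): 0.3,
    ("carol", "caring"): 0.05,
    ("car", "care"): 0.2,
    ("ca", "care"): 0.05,
    ("re", "care"): 0.05,
}


def common_prefix_len(a, b):
    """Length of the longest shared prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def best_parent(word, scores, threshold):
    """Highest-scoring parent of `word` at or above the threshold, else None."""
    candidates = [(s, parent) for (parent, w), s in scores.items()
                  if w == word and s >= threshold]
    return max(candidates)[1] if candidates else None


def segment(word, scores, threshold=0.5):
    """Greedily peel affixes until no candidate parent clears the threshold."""
    prefixes, suffixes = [], []
    while True:
        parent = best_parent(word, scores, threshold)
        if parent is None:
            break
        if word.endswith(parent):
            # Prefix rule, e.g. (caring, uncaring) gives "un".
            prefixes.append(word[: len(word) - len(parent)])
        else:
            # Suffix rule, possibly with a stem change, e.g. (care, caring) gives "ing".
            suffixes.insert(0, word[common_prefix_len(parent, word):])
        word = parent
    return prefixes + [word] + suffixes


print(" + ".join(segment("uncaring", CANDIDATE_SCORES)))  # un + care + ing
```

The loop stops at "care" because none of its candidates, (car, care), (ca, care) or (re, care), clears the assumed threshold.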

SLIDE 25

Experiments

4

SLIDE 26

Experimental Setup

Training Languages Gold Datasets Morpho Challenge

jumping: jump ing
playing: play ing
jumps: jump s
calls: call s
rooms: room s

SLIDE 27

Experiments

[Bar chart: Morfessor vs. MORSE on English, Turkish, Finnish]

            English   Turkish   Finnish
Morfessor   64.35     31.01     34.06
MORSE       70.32     38.07     14.98

SLIDE 28

Morpho Challenge downsides

Examples: Business, Turning-point, Player’s, Turning

◉ Non-compositional
◉ Human error
◉ Trivial instances

SLIDE 29

Experiments

New Dataset: SD17

◉ 2000 words
◉ Compositional
◉ 91% inter-annotator agreement
◉ In canonical (butterfly + ies) and non-canonical (butterfl + ies) versions

SLIDE 30

Results on SD17

F-score
Morfessor: 57.31
MORSE (tuned on MC): 81.01
MORSE (tuned on SD17): 83.96
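
For readers unfamiliar with the metric, an F-score for segmentation can be sketched as the harmonic mean of precision and recall over predicted morpheme boundaries. This boundary-based variant is an assumption; the slides do not say how the reported scores were computed, and Morpho Challenge uses its own evaluation procedure. The gold entries are taken from the Experimental Setup slide, and the predictions are invented.

```python
def boundaries(segments):
    """Character offsets at which one segment ends and the next begins."""
    cuts, position = set(), 0
    for segment in segments[:-1]:
        position += len(segment)
        cuts.add(position)
    return cuts


def boundary_prf(gold, predicted):
    """Precision, recall and F-score over segment boundaries.
    Words missing from `predicted` are treated as unsegmented."""
    tp = fp = fn = 0
    for word, gold_segments in gold.items():
        g = boundaries(gold_segments)
        p = boundaries(predicted.get(word, [word]))
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


gold = {"jumping": ["jump", "ing"], "rooms": ["room", "s"], "calls": ["call", "s"]}
# Hypothetical system output: one correct, one missed cut, one extra cut.
predicted = {"jumping": ["jump", "ing"], "rooms": ["rooms"], "calls": ["ca", "ll", "s"]}

print(boundary_prf(gold, predicted))
```

Here two of three predicted boundaries are correct and two of three gold boundaries are found, so precision, recall and F-score all come out at two thirds.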

SLIDE 31

Against state-of-the-art

F-scores
MORSE: 83.96
MorphoChain: 79.9
Morfessor S + W: 67.4
Morfessor S + W + L: 67.14

SLIDE 32

Negative Dataset

◉ 100 words like: honeymoon, passport, outdoors
◉ Checks for robustness

#Segmentations
Morfessor: 43
MORSE: 7

SLIDE 33

Looking forward

◉ Robustness to highly agglutinative languages
◉ Extending to other languages (non-concatenative)

k a t a b a i A

SLIDE 34

Looking forward

◉ Morphological mappings across languages

(suf, ∅, ly) (suf, ∅, ment) (suf, ∅, s)

English French

(suf, ∅, es) (suf, ∅, s)

SLIDE 35

Links

https://morse.mybluemix.net
https://github.com/yoonlee95/morse_segmentation

SLIDE 36

Thank you

Questions?

SLIDE 37

Effect of Hyperparameters

Precision Recall

SLIDE 38

Prerequisite

Morpho-syntactic regularities in word vectors

(suf, ∅, ing): (play, playing) (jump, jumping) (scream, screaming) (s, sing)
Valid rule with an invalid instance: (s, sing)

(pre, ∅, s): (tore, store) (cream, scream) (mile, smile) (lay, slay)
Invalid rule

SLIDE 39

Demo

4

morse.mybluemix.net