Lecture 5: Morphology. Kai-Wei Chang, CS @ University of Virginia (PowerPoint presentation).



Slide 1

Lecture 5: Morphology

Kai-Wei Chang, CS @ University of Virginia, kw@kwchang.net. Course webpage: http://kwchang.net/teaching/NLP16

6501 Natural Language Processing

Slide 2

This lecture

- What is the structure of words?
- Can we build an analyzer to model the structure of words?
  - Finite-state automata and regular expressions


Slide 3

Words

- Finite-state methods are particularly useful for dealing with a lexicon
- Compact representations of words
- Agenda
  - Some facts about words
  - Computational methods


Slide 4

A Turkish word

- How about English?


Example from Julia Hockenmaier, Intro to NLP

Slide 5

Longest word in English

- Longest word in Shakespeare's works:
  Honorificabilitudinitatibus (27 letters)
- Longest non-technical word:
  Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary:
  Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature:
  Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein


Slide 6

What is Morphology?

- The ways that words are built up from smaller meaningful units (morphemes)
- Two classes of morphemes:
  - Stems: the core meaning-bearing units
  - Affixes: adhere to stems to change their meanings and grammatical functions
  - e.g., dis-grace-ful-ly

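To make the stem/affix split concrete, here is a toy greedy affix-stripping sketch. The prefix and suffix inventories are hypothetical, and real analyzers use finite-state lexicons rather than naive stripping:

```python
# Toy greedy affix stripping (illustrative only; the affix lists are
# hypothetical, not the lecture's method).
PREFIXES = ["dis", "un", "mis"]
SUFFIXES = ["ly", "ful", "ation", "able", "er", "s"]

def split_morphemes(word):
    prefixes = []
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            # length guard keeps a short residue from being stripped away
            if word.startswith(p) and len(word) > len(p) + 2:
                prefixes.append(p + "-")
                word = word[len(p):]
                changed = True
    suffixes = []
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s) + 2:
                suffixes.insert(0, "-" + s)
                word = word[: -len(s)]
                changed = True
    return prefixes + [word] + suffixes

print(split_morphemes("disgracefully"))  # ['dis-', 'grace', '-ful', '-ly']
```

Greedy stripping like this over-applies easily (it would happily split words that are not morphologically complex), which is part of the motivation for the finite-state machinery in the rest of the lecture.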

Slide 7

Inflectional Morphology

Creates different forms of the same word.

- Examples:
  - Verbs: walk, walked, walks
  - Nouns: book, books, book's
  - Personal pronouns: he, she, her, them, us
- Serves a grammatical/semantic purpose that is different from the original but transparently related to it


Slide 8

Derivational Morphology

Creates different words from the same lemma.

- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - un-: undo, unseen, …
  - mis-: mistake, misunderstand, …
- Adjectivization:
  - V + -able: doable
  - N + -al: national


Slide 9

What else?

- Compounding combines words into a new word:
  - cream, ice cream, ice cream cone, ice cream cone bakery
- Word formation is productive
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, …


Slide 10

Morphological parsing and generation

- Morphological parsing: analyze a surface form into its morphemes
- Morphological generation:
  - What words can be generated from grace? grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully


Slide 11

Finite State Automata

- FSAs and regular expressions have the same expressive power
- The FSA on this slide accepts the strings matching /baa+!/


Slide 12

Finite State Automata

- Terminology:
  - It has 5 states
  - Alphabet: {b, a, !}
  - Start state: q0
  - Accept state: q4
  - 5 transitions
- Are there other machines that correspond to the same language /baa+!/?
  - Yes


Alphabet just means a finite set of symbols in the input. A machine can have many accept states.

Slide 13

Formal definition

- You can specify an FSA by enumerating the following things:
  - The set of states: Q
  - A finite alphabet: Σ
  - A start state
  - A set of accept/final states
  - A transition function that maps Q × Σ to Q

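As a sketch, the five components above can be written down directly and run. The state numbering below follows the textbook sheeptalk machine and is an assumption, not something fixed by the slide:

```python
# Minimal DFA for the sheeptalk language /baa+!/, specified by the five
# components listed on the slide. State numbering is an assumed layout.
Q = {0, 1, 2, 3, 4}          # states
SIGMA = {"b", "a", "!"}      # finite alphabet
START = 0                    # start state
ACCEPT = {4}                 # accept/final states
DELTA = {                    # transition function Q x Sigma -> Q
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,             # self-loop for the extra a's in a+
    (3, "!"): 4,
}

def dfa_accepts(s):
    state = START
    for ch in s:
        if (state, ch) not in DELTA:  # no transition defined: reject
            return False
        state = DELTA[(state, ch)]
    return state in ACCEPT

print(dfa_accepts("baaaa!"))  # True
print(dfa_accepts("ba!"))     # False
```

The dictionary `DELTA` plays exactly the role of the transition function: any (state, symbol) pair it does not list is an implicit dead end.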

Slide 14

Example: dollars and cents


Slide 15

Yet another view – table representation


State | b | a   | ! | ε
  0   | 1 |     |   |
  1   |   | 2   |   |
  2   |   | 2,3 |   |
  3   |   |     | 4 |
  4   |   |     |   |

If you're in state 1 and you're looking at an a, go to state 2.

Slide 16

Non-Deterministic FSA

v 𝜗- transition v More than one possible next states v Equivalent to deterministic FSA

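One way to see the equivalence is to simulate the non-deterministic machine directly by tracking the *set* of states it could be in, taking a closure over ε-transitions at each step. The transition table below is a hypothetical NFA for the same /baa+!/ language:

```python
# Simulating an NFA by tracking the set of reachable states. The table
# is a hypothetical NFA for /baa+!/; "" stands in for the epsilon label.
EPS = ""
NFA = {                       # state -> list of (label, next_state)
    0: [("b", 1)],
    1: [("a", 2)],
    2: [("a", 2), ("a", 3)],  # non-deterministic choice on 'a'
    3: [("!", 4)],
    4: [],
}
START, ACCEPT = 0, {4}

def eps_closure(states):
    # follow epsilon arcs until no new state is reachable
    stack, seen = list(states), set(states)
    while stack:
        for label, t in NFA.get(stack.pop(), []):
            if label == EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(s):
    current = eps_closure({START})
    for ch in s:
        nxt = {t for st in current
                 for label, t in NFA.get(st, []) if label == ch}
        current = eps_closure(nxt)
        if not current:       # no live paths left
            return False
    return bool(current & ACCEPT)

print(nfa_accepts("baa!"))  # True
print(nfa_accepts("ba!"))   # False
```

The set of states tracked at each step is exactly the state of the equivalent deterministic machine, which is the idea behind the subset construction.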

Slide 17

Regular expression

- Equivalent to FSAs
- Matching strings with regular expressions (e.g., perl, python, grep) works by:
  - translating the regular expression into a machine (a table), and
  - passing the table and the string to an interpreter

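In Python this translate-then-interpret pipeline is what `re.compile` plus matching does: compiling builds the internal machine once, and each match runs it over a string. A quick check against the sheeptalk pattern:

```python
import re

# re.compile translates the pattern into an internal machine; fullmatch
# then acts as the interpreter, running it over the whole string.
sheeptalk = re.compile(r"baa+!")

print(bool(sheeptalk.fullmatch("baaaa!")))  # True
print(bool(sheeptalk.fullmatch("ba!")))     # False
```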

Slide 18

Model morphology with FSA

- Regular singular nouns are OK as is
- Regular plural nouns have an -s on the end
- Irregulars are OK as is


Slide 19

Now plug in the words


Slide 20

Derivational Rules


Slide 21

From recognition to parsing

- Now we can use these machines to recognize strings
- Can we use the machines to assign a structure to a string? (parsing)
- Example:
  - From "cats" to "cat +N +p"

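A minimal sketch of such a parser, assuming a tiny hand-written lexicon and reusing the slide's tag style (+N, +V, +p, +z); a real system would use the FST machinery developed on the next slides:

```python
# Toy morphological parser: surface form -> stem plus tags.
# The lexicon is an assumed sample; tags follow the slide's notation.
NOUNS = {"cat", "fox", "book"}
VERBS = {"walk", "book"}

def parse(word):
    analyses = []
    if word in NOUNS:
        analyses.append(word + " +N")
    if word in VERBS:
        analyses.append(word + " +V")
    if word.endswith("s"):
        stem = word[:-1]
        if stem in NOUNS:
            analyses.append(stem + " +N +p")  # plural noun
        if stem in VERBS:
            analyses.append(stem + " +V +z")  # 3rd-person singular verb
    return analyses

print(parse("cats"))   # ['cat +N +p']
print(parse("books"))  # ['book +N +p', 'book +V +z']
```

Note that "books" comes back with two analyses, which is exactly the ambiguity taken up on slide 23.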

Slide 22

Transitions

- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N


(Transducer arc labels from the figure: c:c, a:a, t:t, ε:+N, s:+p)

Slide 23

Challenge: Ambiguity

- books: book +N +p or book +V +z (3rd person)
- Non-deterministic FSA: allows multiple paths through the machine to lead to an accept state
- Bias the search (or learn) so that only a few likely paths are explored


Slide 24

Challenge: Spelling rules

- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat + s = cats
  - fox + s = foxes
  - make + ing = making
- How can we model this?

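One such rule can be sketched as a rewrite on the intermediate form introduced on the following slides, with ^ as the morpheme boundary and # as the word boundary. The regex is an illustrative stand-in for the rule transducer, and other rules (such as e-deletion for make+ing) would be separate machines:

```python
import re

# e-insertion sketch: insert an e before plural -s after x, s, or z,
# operating on the intermediate form (^ = morpheme boundary, # = word
# boundary). A stand-in for the spelling-rule FST, not the real one.
def surface(intermediate):
    s = re.sub(r"([xsz])\^s#", r"\1es#", intermediate)  # e-insertion
    return s.replace("^", "").replace("#", "")          # erase boundaries

print(surface("fox^s#"))  # foxes
print(surface("cat^s#"))  # cats
```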

Slide 25

Intermediate representation


Slide 26

Overall Scheme

- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms
- A large set of machines that capture spelling rules
  - Intermediate forms to surface forms


Slide 27

Lexical to intermediate level


Slide 28

Intermediate level to surface

- The "add an e" rule for -s (e-insertion)
  - Example: fox^s# ↔ foxes#


Slide 29

Other applications of FSTs

- ELIZA: https://en.wikipedia.org/wiki/ELIZA
- Implemented using pattern matching (an FST)


Slide 30

ELIZA as an FST cascade

Human: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU

A simple rule:

1. Replace you with I and me with you:

   I don't argue with you.

2. Replace <...> with Why do you think <...>:

   Why do you think I don't argue with you.

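The two rules above can be sketched as a cascade of regex substitutions. The placeholder trick keeps rule 1's two swaps from feeding each other, and casing is simplified relative to the slide's all-caps output:

```python
import re

# The slide's two ELIZA rules as a cascade of substitutions. The <I>
# placeholder keeps the you->I swap from being re-rewritten by me->you.
def eliza(utterance):
    s = utterance.rstrip(".!").lower()
    s = re.sub(r"\byou\b", "<I>", s)   # rule 1: you -> I (staged)
    s = re.sub(r"\bme\b", "you", s)    # rule 1: me -> you
    s = s.replace("<I>", "I")
    return "Why do you think " + s + "."  # rule 2: wrap the result

print(eliza("You don't argue with me."))
# Why do you think I don't argue with you.
```

Each substitution pass is one simple transducer; composing the passes gives the cascade.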

Slide 31

What about compounds?

- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery), not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)), not (computer ((science graduate) student))
- We need context-free grammars to capture this underlying structure
