  1. Lecture 5: Morphology
     Kai-Wei Chang, CS @ University of Virginia
     kw@kwchang.net
     Course webpage: http://kwchang.net/teaching/NLP16
     6501 Natural Language Processing

  2. This lecture
     - What is the structure of words?
     - Can we build an analyzer to model the structure of words?
     - Finite-state automata and regular expressions

  3. Words
     - Finite-state methods are particularly useful in dealing with a lexicon
     - Compact representations of words
     - Agenda:
       - some facts about words
       - computational methods

  4. A Turkish word
     - How about English?
     (Example from Julia Hockenmaier, Intro to NLP)

  5. Longest word in English
     - Longest word in Shakespeare: Honorificabilitudinitatibus (27 letters)
     - Longest non-technical word: Antidisestablishmentarianism (28 letters)
     - Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
     - Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
     - Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein

  6. What is Morphology?
     - The ways that words are built up from smaller meaningful units (morphemes)
     - Two classes of morphemes:
       - Stems: the core meaning-bearing units
       - Affixes: adhere to stems to change their meanings and grammatical functions
     - e.g., dis-grace-ful-ly

  7. Inflectional Morphology
     Creates different forms of the same word:
     - Examples:
       - Verbs: walk, walked, walks
       - Nouns: book, books, book’s
       - Personal pronouns: he, she, her, them, us
     - Serves a grammatical/semantic purpose that is different from the original but is transparently related to the original

  8. Derivational Morphology
     Creates different words from the same lemma:
     - Nominalization:
       - V + -ation: e.g., computerization
       - V + -er: killer
     - Negation:
       - un-: undo, unseen, ...
       - mis-: mistake, misunderstand, ...
     - Adjectivization:
       - V + -able: doable
       - N + -al: national

  9. What else?
     - Compounding combines words into a new word:
       - cream, ice cream, ice cream cone, ice cream cone bakery
     - Word formation is productive:
       - Google, Googler, to google, to misgoogle, to googlefy, googlification
       - Google Map, Google Book, ...

  10. Morphological parsing and generation
      - Morphological parsing
      - Morphological generation
      - What words can be generated from grace? grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully

  11. Finite State Automata
      - FSAs and regular expressions have the same expressive power
      - The above FSA accepts the language /baa+!/

  12. Finite State Automata
      - Terminology:
        - It has 5 states
        - Alphabet: {b, a, !} (an alphabet is just a finite set of symbols in the input)
        - Start state: q0
        - Accept state: q4 (a machine can have many accept states)
        - 5 transitions
      - Are there other machines that correspond to the same language /baa+!/?
      - Yes

  13. Formal definition
      - You can specify an FSA by enumerating the following:
        - The set of states: Q
        - A finite alphabet: Σ
        - A start state
        - A set of accept/final states
        - A transition function that maps Q × Σ to Q
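The 5-tuple on this slide can be written down directly in code. Below is a minimal sketch (names like STATES and DELTA are illustrative, not from the lecture) of a deterministic FSA for the sheep language /baa+!/, with a small interpreter that runs the transition function over an input string:

```python
# A sketch of an FSA as the 5-tuple (Q, sigma, start, accept, delta),
# instantiated for the sheep language /baa+!/.

STATES = {0, 1, 2, 3, 4}      # Q
ALPHABET = {"b", "a", "!"}    # sigma
START = 0                     # start state
ACCEPT = {4}                  # set of accept/final states
DELTA = {                     # transition function: (state, symbol) -> state
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,              # self-loop allows one or more extra a's
    (3, "!"): 4,
}

def accepts(s: str) -> bool:
    """Run the machine over s; reject on any missing transition."""
    state = START
    for ch in s:
        if (state, ch) not in DELTA:
            return False
        state = DELTA[(state, ch)]
    return state in ACCEPT

print(accepts("baa!"))    # True
print(accepts("baaaa!"))  # True
print(accepts("ba!"))     # False
```

Representing the transition function as a dictionary also makes the table view of the machine (next slide) explicit: each key-value pair is one cell of the table.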

  14. Example: dollars and cents

  15. Yet another view: table representation

      State   b     a     !     ε
      0       1     -     -     -
      1       -     2     -     -
      2       -     2,3   -     -
      3       -     -     4     -
      4       -     -     -     -

      If you're in state 1 and you're looking at an a, go to state 2.

  16. Non-Deterministic FSA
      - ε-transitions
      - More than one possible next state
      - Equivalent to a deterministic FSA
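A standard way to simulate nondeterminism is to track the *set* of states the machine could be in, expanding it through ε-transitions at each step. This sketch (my own illustration, not from the slides) uses a hypothetical NFA for the same language /baa+!/ in which state 2 can move to either 2 or 3 on an a:

```python
# Simulating an NFA by tracking the set of reachable states,
# including states reachable through epsilon-transitions.

NFA = {                 # (state, symbol) -> set of possible next states
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # nondeterministic choice
    (3, "!"): {4},
}
EPS = {}                # epsilon-transitions: state -> set of states (none here)
START, ACCEPT = 0, {4}

def eps_closure(states):
    """All states reachable from `states` via epsilon-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(s: str) -> bool:
    current = eps_closure({START})
    for ch in s:
        nxt = set()
        for state in current:
            nxt |= NFA.get((state, ch), set())
        current = eps_closure(nxt)
    return bool(current & ACCEPT)

print(nfa_accepts("baaa!"))  # True
```

The subset-of-states idea is also the core of the classic construction that converts any NFA into an equivalent deterministic FSA, which is why the two have the same expressive power.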

  17. Regular expressions
      - Equivalent to FSAs
      - Matching strings with regular expressions (e.g., perl, python, grep) works by:
        - translating the regular expression into a machine (a table), and
        - passing the table and the string to an interpreter
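In Python, `re.compile` performs exactly this split: the pattern is translated into an internal machine once, and matching then runs that machine over each string. A short sketch for the sheep language:

```python
import re

# Compile once (pattern -> internal machine), then match many strings.
sheep = re.compile(r"^baa+!$")

print(bool(sheep.match("baaa!")))  # True
print(bool(sheep.match("ba!")))    # False
```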

  18. Model morphology with an FSA
      - Regular singular nouns are ok
      - Regular plural nouns have an -s on the end
      - Irregulars are ok as is

  19. Now plug in the words

  20. Derivational Rules

  21. From recognition to parsing
      - Now we can use these machines to recognize strings
      - Can we use the machines to assign a structure to a string? (parsing)
      - Example: from "cats" to "cat +N +PL"

  22. Transitions
      c:c   a:a   t:t   ε:+N   s:+PL
      - c:c reads a c and writes a c
      - ε:+N reads nothing and writes +N
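The transducer path for "cats" can be sketched as a list of input:output arc labels, with "" playing the role of ε. This is an illustrative toy (the plural tag is written here as +PL), not the lecture's implementation:

```python
# A single FST path for cats -> cat +N +PL, as (input, output) arc labels.
# "" on the input side is an epsilon arc: it consumes nothing.

ARCS = [("c", "c"), ("a", "a"), ("t", "t"), ("", " +N"), ("s", " +PL")]

def transduce(word):
    """Follow this one path; return the output string, or None on mismatch."""
    out, i = [], 0
    for inp, outp in ARCS:
        if inp:                                   # arc consumes a character
            if i >= len(word) or word[i] != inp:
                return None                       # path does not match
            i += 1
        out.append(outp)                          # arc writes its output
    return "".join(out) if i == len(word) else None

print(transduce("cats"))  # cat +N +PL
```

A real morphological analyzer is a whole network of such paths (one per lexicon entry), and parsing means finding the path(s) whose input side matches the surface string.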

  23. Challenge: Ambiguity
      - books: book +N +PL or book +V +z (3rd person)
      - Non-deterministic FSA: allows multiple paths through the machine to lead to the same accept state
      - Bias the search (or learn) so that only a few likely paths are explored

  24. Challenge: Spelling rules
      - The underlying morphemes (e.g., the plural -s) can have different surface realizations (-s, -es)
        - cat + s = cats
        - fox + s = foxes
        - make + ing = making
      - How can we model this?

  25. Intermediate representation

  26. Overall Scheme
      - One FST that has explicit information about the lexicon
        - lexical level to intermediate forms
      - A large set of machines that capture spelling rules
        - intermediate forms to surface forms

  27. Lexical to intermediate level

  28. Intermediate level to surface
      - The "add an e" rule for -s
      - Example: fox^s# ↔ foxes#
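The e-insertion rule can be approximated with a single rewrite over the intermediate representation, using the slide's notation (^ for a morpheme boundary, # for a word boundary). This is a deliberately tiny sketch with one rule of my own phrasing (e-insertion after x, s, z, ch, sh), not a full spelling-rule transducer:

```python
import re

def to_surface(intermediate: str) -> str:
    """Map an intermediate form like fox^s# to a surface form like foxes."""
    # e-insertion: fox^s# -> foxes#
    s = re.sub(r"(x|s|z|ch|sh)\^s#", r"\1es#", intermediate)
    # default: erase remaining morpheme boundaries, then the word boundary
    s = s.replace("^", "")
    return s.rstrip("#")

print(to_surface("fox^s#"))  # foxes
print(to_surface("cat^s#"))  # cats
```

In the full Koskenniemi-style setup each such rule is itself a transducer, and the rules apply in parallel rather than as ordered string rewrites; the regex above just shows the input/output behavior of one rule.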

  29. Other applications of FSTs
      - ELIZA: https://en.wikipedia.org/wiki/ELIZA
      - Implemented using pattern matching (FSTs)

  30. ELIZA as an FST cascade
      Human: You don't argue with me.
      Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU
      A simple rule:
      1. Replace "you" with "I" and "me" with "you": I don't argue with you.
      2. Replace <...> with "Why do you think <...>": Why do you think I don't argue with you.
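The two-rule cascade on this slide can be sketched as regex substitutions applied in sequence, one per stage of the cascade (the uppercasing mimics ELIZA's output style; the function name and details are my own illustration):

```python
import re

def eliza(utterance: str) -> str:
    """Apply the slide's two rewrite rules in sequence."""
    s = utterance.rstrip(".")
    # Rule 1: swap pronouns (you -> I, me -> you)
    s = re.sub(r"\byou\b", "I", s, flags=re.IGNORECASE)
    s = re.sub(r"\bme\b", "you", s, flags=re.IGNORECASE)
    # Rule 2: wrap the result in "Why do you think <...>"
    return "WHY DO YOU THINK " + s.upper()

print(eliza("You don't argue with me."))
# WHY DO YOU THINK I DON'T ARGUE WITH YOU
```

Note the ordering matters: "you" is rewritten to "I" before "me" is rewritten to "you", so the second rule cannot re-fire on the first rule's output, which is exactly the behavior of composing the two transducers in a fixed order.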

  31. What about compounds?
      - Compounds have hierarchical structure:
        - (((ice cream) cone) bakery), not (ice ((cream cone) bakery))
        - ((computer science) (graduate student)), not (computer ((science graduate) student))
      - We need context-free grammars to capture this underlying structure
