  1. Morphological Analysis
     Daniel Zeman, March 4, 2020
     NPFL124 Natural Language Processing
     Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

  2. Morphological Annotation
     ID  FORM   LEMMA  POS    FEATS
     1   They   they   PRON   Case=Nom|Number=Plur
     2   buy    buy    VERB   Mood=Ind|Number=Plur|Person=3|Tense=Pres
     3   and    and    CCONJ  _
     4   sell   sell   VERB   Mood=Ind|Number=Plur|Person=3|Tense=Pres
     5   books  book   NOUN   Number=Plur
     6   .      .      PUNCT  _

  3. Morphological Annotation
     Czech for “They buy and sell books.”
     PDT positional tags (XPOS):
     ID  FORM       LEMMA     XPOS
     1   Kupují     kupovat   VB-P---3P-AA---
     2   a          a         J^-------------
     3   prodávají  prodávat  VB-P---3P-AA---
     4   knihy      kniha     NNFP4-----A----
     5   .          .         Z:-------------
     Universal Dependencies (POS and FEATS):
     ID  FORM       LEMMA     POS    FEATS
     1   Kupují     kupovat   VERB   Mood=Ind|Number=Plur|Person=3|Tense=Pres
     2   a          a         CCONJ  _
     3   prodávají  prodávat  VERB   Mood=Ind|Number=Plur|Person=3|Tense=Pres
     4   knihy      kniha     NOUN   Case=Acc|Gender=Fem|Number=Plur
     5   .          .         PUNCT  _
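
The FEATS column packs several Name=Value pairs into one string, separated by "|", with "_" standing for "no features". A minimal Python sketch of reading and writing that column follows. The helpers parse_feats and format_feats are hypothetical names, the example row keeps only the five columns shown on the slide (real CoNLL-U files are tab-separated with ten columns), and the sorting rule in format_feats assumes the CoNLL-U convention of case-insensitive alphabetical order of feature names.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Turn 'Case=Acc|Gender=Fem|Number=Plur' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    result: dict[str, str] = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        result[name] = value
    return result


def format_feats(feats: dict[str, str]) -> str:
    """Inverse of parse_feats; names sorted case-insensitively (assumed CoNLL-U convention)."""
    if not feats:
        return "_"
    return "|".join(f"{name}={feats[name]}" for name in sorted(feats, key=str.lower))


# One row of the Czech example, reduced to the slide's five columns (hypothetical input).
row = "4\tknihy\tkniha\tNOUN\tCase=Acc|Gender=Fem|Number=Plur"
word_id, form, lemma, upos, feats = row.split("\t")
print(form, upos, parse_feats(feats))
# knihy NOUN {'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Plur'}
```

Keeping the features as a dict makes it easy to compare annotations feature by feature; format_feats restores the canonical string when writing back.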

  5. Tagsets
     • Tag as a set of feature (category) values: $(k_1, k_2, \ldots, k_n)$
     • Simple list of tags: $T = \{t_1, t_2, \ldots, t_m\}$
     • 1-1 mapping between tags and the feature-value space: $T \leftrightarrow (K_1, K_2, \ldots, K_n)$
     • English
       • Penn Treebank (45 tags), Brown Corpus (87), CLAWS C5 (62), London-Lund (197)
     • Czech
       • Prague Dependency Treebank (4294; positional), Multext-East (1458; Orwell's 1984 parallel corpus), Majka / Desam (MU Brno), Prague Spoken Corpus (over 10000!)
     • Universal Dependencies (UD)
       • 17 universal POS tags (UPOS)
       • 24 universal features, each with 1–34 possible values
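
The 1-1 mapping between tags and feature values can be stated precisely, as a sketch in our own notation: if a tagset distinguishes n categories with value sets K_1, ..., K_n, each tag t is identified with exactly one tuple of values, so the tagset embeds into the Cartesian product and its size is bounded by the product of the value-set sizes.

```latex
t \;\longmapsto\; (k_1, k_2, \ldots, k_n) \in K_1 \times K_2 \times \cdots \times K_n,
\qquad \text{hence} \qquad
|T| \;\le\; \prod_{j=1}^{n} |K_j|
```

In practice only a fraction of the combinations occurs, because many categories do not apply to a given part of speech (a preposition has no tense, a conjunction no case).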

  6. Czech Positional Tags of PDT
     • Example tag: AGFS3----1A----
     • 15 positions: 1 part of speech (A), 2 detailed part of speech (G), 3 gender (F), 4 number (S), 5 case (3), 6 possessor's gender, 7 possessor's number, 8 person, 9 tense, 10 degree of comparison (1), 11 polarity (A), 12 voice, 13 reserve, 14 reserve, 15 style/variant
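
Because every position of a PDT tag has a fixed meaning, decoding a tag is a matter of indexing. A minimal Python sketch, assuming the 15-position PDT layout and that "-" marks a position that does not apply to the word; the English position names in PDT_POSITIONS are our own labels, and decode_pdt_tag is a hypothetical helper.

```python
# Our own English labels for the 15 positions of a PDT tag, in order.
PDT_POSITIONS = (
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Degree",
    "Polarity", "Voice", "Reserve1", "Reserve2", "Variant",
)


def decode_pdt_tag(tag: str) -> dict[str, str]:
    """Split a 15-character PDT tag into named positions, skipping '-' (not applicable)."""
    if len(tag) != len(PDT_POSITIONS):
        raise ValueError(f"expected {len(PDT_POSITIONS)} characters, got {len(tag)}: {tag!r}")
    return {name: value for name, value in zip(PDT_POSITIONS, tag) if value != "-"}


print(decode_pdt_tag("AGFS3----1A----"))
# {'POS': 'A', 'SubPOS': 'G', 'Gender': 'F', 'Number': 'S', 'Case': '3', 'Degree': '1', 'Polarity': 'A'}
```

Dropping the "-" positions keeps the result small; a tag of the wrong length is rejected rather than silently misaligned.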

  7. Parts of Speech in PDT
     • N noun (podstatné jméno)
     • A adjective (přídavné jméno)
     • P pronoun (zájmeno)
     • C numeral (číslovka)
     • V verb (sloveso)
     • D adverb (příslovce)
     • R preposition (předložka)
     • J conjunction (spojka)
     • T particle (částice)
     • I interjection (citoslovce)
     • Z special, e.g. punctuation (zvláštní, např. interpunkce)
     • X unknown word (neznámé slovo)

  8. Gender in PDT
     • M masculine animate (mužský životný)
     • I masculine inanimate (mužský neživotný)
     • F feminine (ženský)
     • N neuter (střední)
     • X unknown (neznámý)
     • Y M or I
     • T I or F
     • W I or N
     • H, Q F or N
     • Z M, I or N
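
Codes such as Y or H leave the gender underspecified, so checking agreement (say, between an adjective and a noun) amounts to asking whether two codes can denote a common concrete gender. A minimal Python sketch that follows the slide's table literally and ignores any interaction with number; genders_compatible is a hypothetical helper, and treating X ("unknown") as compatible with every gender is our assumption.

```python
# Each PDT gender code mapped to the concrete genders it may stand for (per the slide).
GENDER_EXPANSION: dict[str, set[str]] = {
    "M": {"M"}, "I": {"I"}, "F": {"F"}, "N": {"N"},
    "Y": {"M", "I"},
    "T": {"I", "F"},
    "W": {"I", "N"},
    "H": {"F", "N"},
    "Q": {"F", "N"},
    "Z": {"M", "I", "N"},
    # Assumption: X ("unknown") is compatible with every concrete gender.
    "X": {"M", "I", "F", "N"},
}


def genders_compatible(a: str, b: str) -> bool:
    """True if the two codes share at least one concrete gender."""
    return bool(GENDER_EXPANSION[a] & GENDER_EXPANSION[b])


print(genders_compatible("H", "N"), genders_compatible("Y", "F"))
# True False
```

Representing each code as a set turns agreement into set intersection, which also works when both sides are underspecified (Z and T share I).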

  9. Number in PDT
     • S singular (jednotné)
     • D dual (dvojné)
     • P plural (množné)
     • X unknown (neznámé)

  10. Case in PDT
     • 1 nominative (první pád)
     • 2 genitive (druhý pád)
     • 3 dative (třetí pád)
     • 4 accusative (čtvrtý pád)
     • 5 vocative (pátý pád)
     • 6 locative (šestý pád)
     • 7 instrumental (sedmý pád)
     • X unknown (neznámý)
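
Slide 3 shows the same word annotated both ways (knihy: NNFP4-----A---- vs. Case=Acc|Gender=Fem|Number=Plur). For gender, number and case the correspondence is a plain lookup. A minimal Python sketch of our own partial conversion, not the converter used for the UD Czech treebanks: it reads only positions 3 to 5, silently drops underspecified codes such as Y or X, and splits PDT's M/I into UD Gender=Masc plus Animacy. The function name pdt_to_ud_nominal_feats is hypothetical.

```python
PDT_CASE_TO_UD = {"1": "Nom", "2": "Gen", "3": "Dat", "4": "Acc",
                  "5": "Voc", "6": "Loc", "7": "Ins"}
PDT_NUMBER_TO_UD = {"S": "Sing", "D": "Dual", "P": "Plur"}
PDT_GENDER_TO_UD = {"M": {"Gender": "Masc", "Animacy": "Anim"},
                    "I": {"Gender": "Masc", "Animacy": "Inan"},
                    "F": {"Gender": "Fem"},
                    "N": {"Gender": "Neut"}}


def pdt_to_ud_nominal_feats(tag: str) -> str:
    """Convert positions 3-5 (gender, number, case) of a PDT tag into a UD FEATS string."""
    gender, number, case = tag[2], tag[3], tag[4]
    feats: dict[str, str] = {}
    feats.update(PDT_GENDER_TO_UD.get(gender, {}))  # underspecified codes are skipped
    if number in PDT_NUMBER_TO_UD:
        feats["Number"] = PDT_NUMBER_TO_UD[number]
    if case in PDT_CASE_TO_UD:
        feats["Case"] = PDT_CASE_TO_UD[case]
    if not feats:
        return "_"
    # Features sorted by name, case-insensitively, as in the slide's FEATS column.
    return "|".join(f"{k}={feats[k]}" for k in sorted(feats, key=str.lower))


print(pdt_to_ud_nominal_feats("NNFP4-----A----"))
# Case=Acc|Gender=Fem|Number=Plur
```

The other positions (person, tense, polarity and so on) would need their own tables, and the verb features on slide 3 such as Mood=Ind are not encoded in these three positions at all.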

  11. Degree, Polarity, and Person
     • Degree of comparison of adjectives and adverbs:
       • 1 (positive), 2 (comparative), 3 (superlative)
     • Polarity of verbs, adjectives, adverbs, and nouns:
       • A (affirmative), N (negative)
     • Person of pronouns and verbs:
       • 1, 2, 3

  12. Mood, Tense, and Voice
     • Change the relevance of other categories (such as person and number) ⇒ in a sense, these are subparts of speech
     • Tense:
       • P (present – přítomný), M (past – minulý), F (future – budoucí)
     • Voice:
       • A (active – činný), P (passive – trpný)
     • Mood:
       • N (indicative – oznamovací), R (imperative – rozkazovací), C (conditional – podmiňovací)

  13. Style and/or Variant
     • 1 other variant, less frequent
     • 2 other variant, very rare, archaic or literary
     • 3 very archaic or colloquial variant
     • 5 colloquial, tolerated both in spoken and in written discourse
     • 6 colloquial, inappropriate in written discourse
     • 7 colloquial like 6 but less preferred by speakers
     • 9 special usage (e.g. after some prepositions)

  14. The Penn Treebank Tagset
     1 CC coordinating conjunction
     2 CD cardinal number
     3 DT determiner
     4 EX existential there
     5 FW foreign word
     6 IN preposition or subordinating conjunction
     7 JJ adjective
     8 JJR adjective, comparative
     9 JJS adjective, superlative
     10 LS list item marker
     11 MD modal
     12 NN noun, singular/mass
     13 NNS noun, plural
     14 NNP proper noun, singular
     15 NNPS proper noun, plural
     16 PDT predeterminer
     17 POS possessive ending
     18 PRP personal pronoun
     19 PRP$ possessive pronoun

  15. The Penn Treebank Tagset
     20 RB adverb
     21 RBR adverb, comparative
     22 RBS adverb, superlative
     23 RP particle
     24 SYM symbol
     25 TO to
     26 UH interjection
     27 VB verb, base (do)
     28 VBD verb, past (did)
     29 VBG verb, gerund or present participle (doing)
     30 VBN verb, past participle (done)
     31 VBP verb, non-3rd person singular present (do)
     32 VBZ verb, 3rd person singular present (does)
     33 WDT wh-determiner (which)
     34 WP wh-pronoun (who)
     35 WP$ possessive wh-pronoun (whose)
     36 WRB wh-adverb (where)
     37 . period…
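
Moving between tagsets is a common practical task. Below is a rough, context-free Python sketch that maps the Penn Treebank tags listed above to the UD UPOS tags of the next slide. The table is our own approximation, not an official conversion: IN can be a preposition (ADP) or a subordinating conjunction (SCONJ), TO can be ADP or PART, and forms of be/have/do tagged VB* are often AUX in UD, so a real converter needs context. The name ptb_to_upos is hypothetical.

```python
# Rough, context-free Penn Treebank -> UD UPOS table (our own approximation).
PTB_TO_UPOS = {
    "CC": "CCONJ", "CD": "NUM", "DT": "DET", "EX": "PRON", "FW": "X",
    "IN": "ADP",      # SCONJ when it introduces a clause ("because", "if")
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "LS": "X", "MD": "AUX",
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN", "NNPS": "PROPN",
    "PDT": "DET", "POS": "PART", "PRP": "PRON", "PRP$": "PRON",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "RP": "ADP",      # verb particles such as "up" in "give up"
    "SYM": "SYM",
    "TO": "PART",     # ADP when "to" is a preposition
    "UH": "INTJ",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",  # AUX for auxiliary be/have/do
    "WDT": "DET", "WP": "PRON", "WP$": "PRON", "WRB": "ADV",
    ".": "PUNCT",
}


def ptb_to_upos(tag: str) -> str:
    """Look up a PTB tag; anything not in the table falls back to X."""
    return PTB_TO_UPOS.get(tag, "X")


# "They buy and sell books ." tagged with Penn Treebank tags:
print([ptb_to_upos(t) for t in ["PRP", "VBP", "CC", "VBP", "NNS", "."]])
# ['PRON', 'VERB', 'CCONJ', 'VERB', 'NOUN', 'PUNCT']
```

The many-to-one direction (e.g. all six VB* tags collapse to VERB) is why the finer Penn distinctions of tense and form reappear in UD as features such as Tense and VerbForm rather than as separate POS tags.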

  16. Universal POS Tags
     http://universaldependencies.org/u/pos/index.html
     • NOUN
     • PROPN (proper noun)
     • VERB
     • ADJ (adjective)
     • ADV (adverb)
     • INTJ (interjection)
     • PRON (pronoun)
     • DET (determiner)
     • AUX (auxiliary)
     • NUM (numeral)
     • ADP (adposition)
     • SCONJ (subordinating conjunction)
     • CCONJ (coordinating conjunction)
     • PART (particle)
     • PUNCT (punctuation)
     • SYM (symbol)
     • X (unknown)

  17. Universal Features
     http://universaldependencies.org/u/feat/index.html
     • PronType (druh zájmena)
     • NumType (druh číslovky)
     • Poss (přivlastňovací)
     • Reflex (zvratné)
     • Foreign (cizí slovo)
     • Abbr (zkratka)
     • Typo (překlep)
     • Gender (rod)
     • Animacy (životnost)
     • NounClass (jmenná třída)
     • Number (číslo)
     • Case (pád)
     • Definite(ness) (určitost)
     • Degree (stupeň)
     • VerbForm (slovesný tvar)
     • Mood (způsob)
     • Tense (čas)
     • Aspect (vid)
     • Voice (slovesný rod)
     • Evident(iality) (zjevnost)
     • Polarity (zápor)
     • Person (osoba)
     • Polite(ness) (zdvořilost)
     • Clusivity (kluzivita)
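
The two inventories above are closed lists, so a simple sanity check of an annotated token is possible. A minimal Python sketch, assuming features arrive as a name-to-value dict; check_token is a hypothetical helper. Because UD also permits language-specific features, a name outside the universal list is flagged rather than treated as an error.

```python
# The 17 UPOS tags and 24 universal feature names from the two slides.
UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}
UNIVERSAL_FEATURES = {
    "PronType", "NumType", "Poss", "Reflex", "Foreign", "Abbr", "Typo",
    "Gender", "Animacy", "NounClass", "Number", "Case", "Definite", "Degree",
    "VerbForm", "Mood", "Tense", "Aspect", "Voice", "Evident", "Polarity",
    "Person", "Polite", "Clusivity",
}


def check_token(upos: str, feats: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the token looks fine."""
    problems = []
    if upos not in UPOS:
        problems.append(f"unknown UPOS tag {upos!r}")
    for name in feats:
        if name not in UNIVERSAL_FEATURES:
            problems.append(f"feature {name!r} is not universal (may be language-specific)")
    return problems


print(check_token("NOUN", {"Case": "Acc", "Gender": "Fem", "Number": "Plur"}))
# []
```

A typical catch is a language-specific tag such as the Penn "NN" left in the UPOS column.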

  18. Part of Speech
     • Vague definitions, criteria of mixed nature
     • Looong tradition… (difficult to change)
     • Traditional linguistics:
       • Classification differs cross-linguistically!
       • Even among established classes, not just endemic minor parts of speech.
     • Computational linguistics:
       • Dozens of classes and subclasses
       • Significant differences even within one language

  19. History
     • 4th century BC: Sanskrit
     • European tradition (prevailing in modern linguistics): Ancient Greek
       • Plato (4th century BC): a sentence consists of nouns and verbs
       • Aristotle added “conjunctions” (which included conjunctions, pronouns and articles)
       • End of 2nd century BC: classification stabilized at 8 categories (Διονύσιος ὁ Θρᾷξ: Τέχνη Γραμματική / Dionysios o Thrax: Art of Grammar)

  20. Ancient Greek Word Classes
     • Noun (ὄνομα onoma)
       • inflected for case, signifying a concrete or abstract entity
     • Verb (ῥῆμα rēma)
       • without case inflection, but inflected for tense, person and number, signifying an activity or process performed or undergone
     • Participle (μετοχή metochē)
       • sharing the features of the verb and the noun
     • Article (ἄρθρον arthron)
       • declinable, placed before or after nouns
     • Pronoun (ἀντωνυμία antōnymia)
       • substitutable for a noun and marked for person
     • Preposition (πρόθεσις prothesis)
       • placed before other words in composition and in syntax
     • Adverb (ἐπίρρημα epirrēma)
       • without inflection, in modification of or in addition to a verb
     • Conjunction (σύνδεσμος syndesmos)
       • binding together the discourse and filling gaps in its interpretation

  21. Where Are Adjectives?
     • The best matching Ancient Greek definition is that of nouns, and perhaps participles.
     • Adjectives are a relatively new (1767) invention from France:
       • Nicolas Beauzée: Grammaire générale, ou exposition raisonnée des éléments nécessaires du langage. Paris, 1767
