Finite-State Transducers: Applications in Natural Language Processing
Heli Uibo
Institute of Computer Science University of Tartu Heli.Uibo@ut.ee
FSA and FST: operations, properties
Natural languages vs. Chomsky’s hierarchy
FST-s: application areas in NLP
Finite-state computational morphology
Author’s contribution: Estonian finite-state
Different morphology-based applications
Conclusion
epsilon-free, deterministic, minimized
“English is not a finite state language.” (Chomsky
Chomsky’s hierarchy:
Chomsky’s claim was about syntax
Proved by (theoretically unbounded) recursive
Syntax: phrase structure grammars (PSG) and
Morphology: context-sensitive rewrite rules (not-
Generative phonology by Chomsky & Halle (1968)
General form of rules: x → y / z _ w,
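Read in the usual way, a rule of this form rewrites x as y when x stands between the left context z and the right context w. The following is only a minimal sketch of that reading, assuming all four parts are non-empty literal strings; the function name `rewrite` and the sample strings are hypothetical and do not come from the slides.

```python
import re

def rewrite(x, y, z, w):
    """Compile the rule x -> y / z _ w into a function on strings.

    Assumption: x, y, z and w are non-empty literal strings, not the
    feature bundles or regular expressions a real grammar would use.
    """
    pattern = re.compile(
        "(?<=" + re.escape(z) + ")"    # left context z (not consumed)
        + re.escape(x)                 # the segment to rewrite
        + "(?=" + re.escape(w) + ")"   # right context w (not consumed)
    )
    return lambda s: pattern.sub(y, s)

# Hypothetical rule: a -> b / x _ y
rule = rewrite("a", "b", "x", "y")
result = rule("xay xaz")  # only the first "a" stands between x and y
```

Because the contexts are matched with lookaround assertions, they constrain the match without being rewritten themselves.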
BUT: Writing large scale, practically usable
Finite-state devices have been "rediscovered" and
Finite-state methods have been especially successful for
The usability of FSA-s and FST-s in computational
Schützenberger, 1961: If we apply two FST-s sequentially,
Generalization to n FST-s: we manage without
1980 – the result was rediscovered by R. Kaplan and
Rule1, Rule2, ……….., Rulen
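The idea of replacing a cascade of rule transducers by one transducer can be illustrated with a product construction. The sketch below assumes epsilon-free, letter-to-letter transducers; the class `FST`, the helpers `compose` and `relabel`, and the toy alphabet are hypothetical names chosen for illustration, not part of any toolkit mentioned in the slides.

```python
from itertools import product

class FST:
    """Epsilon-free finite-state transducer (one output symbol per input symbol)."""
    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = set(finals)
        # arcs: list of (source, input_symbol, output_symbol, target)
        self.arcs = list(arcs)

    def apply(self, word):
        """Return the set of output strings the transducer gives for word."""
        configs = {(self.start, "")}
        for ch in word:
            configs = {
                (dst, out + o)
                for (state, out) in configs
                for (src, i, o, dst) in self.arcs
                if src == state and i == ch
            }
        return {out for (state, out) in configs if state in self.finals}

def compose(a, b):
    """Build one FST that behaves like applying a first and then b."""
    arcs = [
        ((p, q), i, o2, (p2, q2))
        for (p, i, o1, p2) in a.arcs
        for (q, i2, o2, q2) in b.arcs
        if o1 == i2  # output of a must match input of b
    ]
    return FST((a.start, b.start), set(product(a.finals, b.finals)), arcs)

def relabel(mapping, alphabet):
    """One-state FST that rewrites symbols per mapping and copies the rest."""
    return FST(0, {0}, [(0, s, mapping.get(s, s), 0) for s in alphabet])

rule1 = relabel({"a": "b"}, "abc")  # toy rule: a -> b
rule2 = relabel({"b": "c"}, "abc")  # toy rule: b -> c
cascade = compose(rule1, rule2)     # single transducer, no intermediate string
```

Applying `cascade` to a word gives the same outputs as running `rule1` and then `rule2` on each intermediate result, but the intermediate string is never built. Composing n rules is then a repeated call to `compose`.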
Lexicon (word list) as FSA – compression of data!
Bilingual dictionary as lexical transducer
Morphological transducer (may be combined with
Each path from the initial state to a final state
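A word list stored as an automaton can be sketched as follows. This is a simplified illustration that only shares common prefixes (a trie); a minimized FSA would also share common suffixes and compress further. The function names and the three sample words are assumptions made for the example.

```python
def build_lexicon(words):
    """Build a trie-shaped FSA.

    States are integers, state 0 is initial, and arcs[state][symbol]
    gives the next state.
    """
    arcs, finals = [{}], set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in arcs[state]:
                arcs.append({})                   # create a new state
                arcs[state][ch] = len(arcs) - 1
            state = arcs[state][ch]
        finals.add(state)                         # end of a listed word
    return arcs, finals

def accepts(fsa, word):
    """True if word labels a path from the initial state to a final state."""
    arcs, finals = fsa
    state = 0
    for ch in word:
        if ch not in arcs[state]:
            return False
        state = arcs[state][ch]
    return state in finals

words = ["lõika", "lõiga", "lõigatud"]            # sample entries only
lexicon = build_lexicon(words)
n_states = len(lexicon[0])
n_symbols = sum(len(w) for w in words)
```

Here the three words contain 18 symbols in total but need only 11 states, because the shared prefixes are stored once.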
stem flexion
suffixes (e.g. plural features and case endings)
productive derivation, using suffixes
compounding, using concatenation
Two-level model by K. Koskenniemi
LexiconFST .o. RuleFST
Three types of two-level rules: <=>, <=, => (formally
e.g. two-level rule a:b => L _ R is equivalent to
Linguists are used to rules of type
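The three rule operators can be illustrated with a small checker over an aligned string of lexical:surface pairs. In the standard reading, `=>` says the pair a:b may occur only in the context L _ R, `<=` says that lexical a in that context must be realized as b, and `<=>` requires both. The sketch below assumes each context is a single symbol pair, whereas real two-level contexts are regular expressions over pairs; the function name and the sample pairs are hypothetical.

```python
def check_rule(pairs, target, left, right, op):
    """Check an aligned lexical:surface string against one two-level rule.

    pairs       -- list of (lexical, surface) symbol pairs
    target      -- the pair a:b that the rule is about
    left, right -- single-pair contexts L and R (a simplification)
    op          -- "=>", "<=" or "<=>"
    """
    for k, pair in enumerate(pairs):
        in_context = (
            k > 0 and pairs[k - 1] == left
            and k + 1 < len(pairs) and pairs[k + 1] == right
        )
        # "=>": a:b is allowed only between L and R
        if op in ("=>", "<=>") and pair == target and not in_context:
            return False
        # "<=": between L and R, lexical a must surface as b
        if (op in ("<=", "<=>") and in_context
                and pair[0] == target[0] and pair[1] != target[1]):
            return False
    return True

a_b, L, R = ("a", "b"), ("x", "x"), ("y", "y")
good = [L, a_b, R]            # a:b inside its context
stray = [a_b]                 # a:b outside its context
missed = [L, ("a", "a"), R]   # a left unchanged inside the context
```

With these samples, `good` satisfies all three operators, `stray` violates `=>` but not `<=`, and `missed` violates `<=` but not `=>`.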
Phenomena handled by lexicons:
noun declension
verb conjugation
comparison of adjectives
derivation
compounding
stem end alternations ne-se, 0-da, 0-me etc.
Handled by rules:
stem flexion
phonotactics
morphophonological distribution
LEXICON Verb
lõika:lõiKa V2;
………..
LEXICON Verb-Deriv
lõiga VD0;
………..
LEXICON VD0
tud+A:tud #;
tu+S:tu S1;
nud+A:nud #;
nu+S:nu S1;
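How continuation lexicons chain together can be sketched by walking them recursively. The entries for Verb-Deriv and VD0 below are copied from the fragment above, with each entry read as lexical:surface followed by its continuation class and `#` marking the end of a word. The contents of S1 are not shown in the source, so the single empty entry used here is a hypothetical placeholder, and the function name `expand` is likewise an assumption.

```python
# Each lexicon maps to a list of (lexical, surface, continuation) entries.
LEXICONS = {
    "Verb-Deriv": [("lõiga", "lõiga", "VD0")],
    "VD0": [
        ("tud+A", "tud", "#"),
        ("tu+S", "tu", "S1"),
        ("nud+A", "nud", "#"),
        ("nu+S", "nu", "S1"),
    ],
    # Hypothetical placeholder: the source does not show what S1 contains.
    "S1": [("", "", "#")],
}

def expand(lexicon, lexical="", surface=""):
    """Yield a (lexical, surface) pair for every path that reaches "#"."""
    if lexicon == "#":
        yield lexical, surface
        return
    for lex, surf, nxt in LEXICONS[lexicon]:
        yield from expand(nxt, lexical + lex, surface + surf)

pairs = list(expand("Verb-Deriv"))
```

Each resulting pair corresponds to one path through the lexicon network, for example the lexical string `lõigatud+A` paired with the surface string `lõigatud`.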
The experimental two-level morphology for
45 two-level rules
The root lexicons include ≈2000 word roots.
Over 200 small lexicons describe the stem end
avoid overgeneration of compound words
guess the analysis of unknown words (words not in