
Finite-State Registered Automata and their uses in Natural Languages



  1. Finite-State Registered Automata and their uses in Natural Languages. Yael Cohen-Sygal and Shuly Wintner, Department of Computer Science, University of Haifa. September 1, 2005. Slide sections: Introduction; Regular expression language; Dedicated operators; FSRT; Implementation and evaluation; Future plans.

  2. Talk outline: finite-state registered automata (FSRA), previous work; a regular expression language for FSRAs; dedicated regular expression operators for linguistic applications; finite-state registered transducers (FSRT); implementation and evaluation; future plans.

  3. Finite-state registered automata (FSRA). Finite-state registered automata are a new computational model that augments FSAs with finite memory (registers) in a restricted way that saves space but does not add expressivity. The number of registers is finite and usually small. Registers let the automaton ‘remember’ a finite number of symbols, which eliminates the need to duplicate paths. In addition to being associated with an alphabet symbol, each arc is associated with a series of actions on the registers, where each action is one of two kinds: read (R), which allows traversing an arc only if a designated register contains a specific symbol; and write (W), which writes a specific symbol into a designated register while traversing the arc.
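The read and write actions described on this slide can be sketched as a small simulator. This is a minimal illustration, not the authors' implementation: the tuple layout of arcs, the `EMPTY` marker for an unwritten register, the function names, and the example automaton `AB_BA` (one register, accepting ab and ba) are all assumptions made for the sketch.

```python
from typing import List, Optional, Set, Tuple

Action = Tuple[str, int, str]             # ("R" or "W", register number, symbol)
Arc = Tuple[int, str, List[Action], int]  # (source, input symbol, actions, target)

EMPTY = "#"  # hypothetical marker for a register that has not been written yet


def apply_actions(registers: Tuple[str, ...],
                  actions: List[Action]) -> Optional[Tuple[str, ...]]:
    """Apply the actions in order; return None if a read check fails."""
    regs = list(registers)
    for kind, number, symbol in actions:
        if kind == "R":
            if regs[number - 1] != symbol:  # registers are numbered from 1
                return None
        elif kind == "W":
            regs[number - 1] = symbol
        else:
            raise ValueError(f"unknown action {kind!r}")
    return tuple(regs)


def accepts(arcs: List[Arc], start: int, finals: Set[int],
            n_registers: int, word: str) -> bool:
    """Search over configurations (state, input position, register contents)."""
    initial = (start, 0, (EMPTY,) * n_registers)
    stack = [initial]
    seen = {initial}
    while stack:
        state, pos, regs = stack.pop()
        if pos == len(word) and state in finals:
            return True
        for source, symbol, actions, target in arcs:
            if source != state:
                continue
            if symbol == "":  # epsilon arc: consumes no input
                next_pos = pos
            elif pos < len(word) and word[pos] == symbol:
                next_pos = pos + 1
            else:
                continue
            new_regs = apply_actions(regs, actions)
            if new_regs is None:
                continue
            config = (target, next_pos, new_regs)
            if config not in seen:
                seen.add(config)
                stack.append(config)
    return False


# Illustrative automaton: remember the first symbol, then require the other one.
AB_BA: List[Arc] = [
    (0, "a", [("W", 1, "a")], 1),
    (0, "b", [("W", 1, "b")], 1),
    (1, "b", [("R", 1, "a")], 2),
    (1, "a", [("R", 1, "b")], 2),
]
```

Calling `accepts(AB_BA, 0, {2}, 1, "ab")` explores the write arc for ‘a’ and then the arc whose read check on register 1 succeeds. The set of configurations is finite because the register contents range over a finite alphabet, which is consistent with the claim that registers do not add expressivity.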

  4. FSRA example. [Figure: example automata for L = {ab, ba} and L = {ab, ba, aa, bb}]

  5. FSRA example. Arabic nouns: kitaab (book), $ams (sun) and daftar (notebook). The definite article in Arabic is the prefix ’al. It is realized as ’al when preceding most consonants, but the ‘l’ of the prefix assimilates to the first consonant of the noun when the latter is ‘d’, ‘$’, etc. Arabic distinguishes between definite and indefinite case markers: nominative case is realized as the suffix ‘u’ on definite nouns and ‘un’ on indefinite nouns.

     word      nominative definite   nominative indefinite
     kitaab    ’alkitaabu            kitaabun
     $ams      ’a$$amsu              $amsun
     daftar    ’addaftaru            daftarun

  6. FSRA example. [Figure: FSA for Arabic nominative definite and indefinite nouns]

  7. FSRA example. [Figure: FSRA for Arabic nominative definite and indefinite nouns]

  8. FSRA summary. Previous work (Cohen-Sygal & Wintner, FSMNLP 2003): equivalence to regular languages; closure properties (direct implementation). Contribution of this work: regular expression language; dedicated operators; FSRT; implementation and evaluation.

  9. Regular expression language: definition. Assume the regular expression syntax of XFST. Let a⃗ be a series of register operations and let R be a regular expression. Then the following are also regular expressions: a⃗ ⊲ R, a⃗ ⊲⊲ R, a⃗ ⊳ R, a⃗ ⊳⊳ R.

  10. Regular expression language: denotation. [Figure: FSRA denotations. R denotes an FSRA A with initial state q0^A. a⃗ ⊲ R denotes A preceded by a new initial state q0 with an arc labeled ε,a⃗ into q0^A. a⃗ ⊲⊲ R denotes the same construction over shift(A).]

  11. Regular expression language: denotation (continued). [Figure: FSRA denotations. R denotes an FSRA A. a⃗ ⊳ R denotes A followed by an arc labeled ε,a⃗. a⃗ ⊳⊳ R denotes the same construction over shift(A).]
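One possible reading of the ⊲ and ⊳ constructions, as far as they can be recovered from the two denotation slides, is sketched below. Everything here is an assumption for illustration: the `FSRA` record, the function names, and in particular the choice to route every final state of A into a single new final state for ⊳. The shift(A) variants (⊲⊲ and ⊳⊳) are not modeled, since the slides do not define shift.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Action = Tuple[str, int, str]             # ("R" or "W", register number, symbol)
Arc = Tuple[int, str, List[Action], int]  # (source, symbol, actions, target); "" is epsilon


@dataclass
class FSRA:
    states: Set[int]
    start: int
    finals: Set[int]
    arcs: List[Arc]


def prepend_actions(actions: List[Action], a: FSRA) -> FSRA:
    """Sketch of `actions ⊲ R`: a new initial state with an epsilon arc into A."""
    q0 = max(a.states) + 1
    new_arc: Arc = (q0, "", list(actions), a.start)
    return FSRA(a.states | {q0}, q0, set(a.finals), a.arcs + [new_arc])


def append_actions(actions: List[Action], a: FSRA) -> FSRA:
    """Sketch of `actions ⊳ R`: epsilon arcs from A's final states to a new final state."""
    qf = max(a.states) + 1
    new_arcs: List[Arc] = [(f, "", list(actions), qf) for f in sorted(a.finals)]
    return FSRA(a.states | {qf}, a.start, {qf}, a.arcs + new_arcs)


# Illustrative base automaton accepting the single symbol "u".
BASE = FSRA(states={0, 1}, start=0, finals={1}, arcs=[(0, "u", [], 1)])
CHECKED = prepend_actions([("R", 1, "def")], BASE)  # read register 1 before "u"
MARKED = append_actions([("W", 1, "i")], BASE)      # write register 1 after "u"
```

Both functions return a new automaton and leave their argument untouched, so the same base can be reused under several different action sequences.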

  12. Regular expression language: example. [Figure: Arabic nominative definite and indefinite nouns]

  13. Regular expression language: example.

      ! Regular expression for Arabic nominative definite and indefinite nouns
      define Prefix [<(W,1,indef)> ⊳ 0] |
                    [<(W,1,def),(W,2,l)> ⊳ ’al] |
                    [<(W,1,def),(W,2,$)> ⊳ ’a$] |
                    [<(W,1,def),(W,2,d)> ⊳ ’ad];
      define Base  [ [<(R,2,l)> ⊳ 0] | [<(R,1,indef)> ⊳ 0] ] [ [k i t a a b] | [q a m a r] ];
      define $Base [ [<(R,2,$)> ⊳ 0] | [<(R,1,indef)> ⊳ 0] ] [$ a m s];
      define dBase [ [<(R,2,d)> ⊳ 0] | [<(R,1,indef)> ⊳ 0] ] [d a f t a r];
      define Suffix [<(R,1,def)> ⊲ u] | [<(R,1,indef)> ⊲ un];
      define ArabicExample Prefix [Base | $Base | dBase] Suffix;
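To see which words the Arabic expression admits, the register checks can be traced by hand in a few lines of Python. This is only a sketch of the register semantics, not of XFST or of the authors' compiler: the step encoding (a string emits text, a tuple is a register action) and the names `run`, `bases` and `WORDS` are assumptions.

```python
from itertools import product


def run(steps, n_registers=2):
    """Thread the registers through a path; return its word or None if a read fails."""
    registers = [None] * (n_registers + 1)  # index 0 unused, registers numbered from 1
    output = []
    for step in steps:
        if isinstance(step, str):  # plain text on the path
            output.append(step)
            continue
        kind, number, symbol = step
        if kind == "W":
            registers[number] = symbol
        elif registers[number] != symbol:  # any other action is treated as a read
            return None
    return "".join(output)


# Alternatives of Prefix: the text comes first, the writes follow it.
PREFIX = [
    [("W", 1, "indef")],
    ["’al", ("W", 1, "def"), ("W", 2, "l")],
    ["’a$", ("W", 1, "def"), ("W", 2, "$")],
    ["’ad", ("W", 1, "def"), ("W", 2, "d")],
]


def bases(consonant, stems):
    """Each stem is guarded by a check on register 2 or on register 1."""
    checks = [("R", 2, consonant), ("R", 1, "indef")]
    return [[check, stem] for check in checks for stem in stems]


BASE = bases("l", ["kitaab", "qamar"]) + bases("$", ["$ams"]) + bases("d", ["daftar"])
SUFFIX = [[("R", 1, "def"), "u"], [("R", 1, "indef"), "un"]]

WORDS = set()
for prefix, base, suffix in product(PREFIX, BASE, SUFFIX):
    word = run(prefix + base + suffix)
    if word is not None:
        WORDS.add(word)
```

Only paths whose prefix, base guard and suffix agree on the register contents survive, so each noun appears once with the assimilated definite prefix and ‘u’, and once bare with ‘un’.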

  14. Regular expression language: example. Vowel harmony in Warlpiri: the vowel of suffixes agrees in certain aspects with the vowel of the stem to which it is attached. A simplified account of the phenomenon: suffixes come in two varieties, one with ‘i’ vowels and one with ‘u’ vowels. Stems whose last vowel is ‘i’ take suffixes of the first variety, whereas stems whose last vowel is ‘u’ or ‘a’ take the other variety.

  15. Regular expression language: example. Vowel harmony in Warlpiri, examples:

      maliki+kiḷi+ḷi+lki+ji+li (dog+PROP+ERG+then+me+they)
      kuḍu+kuḷu+ḷu+lku+ju+lu (child+PROP+ERG+then+me+they)
      minija+kuḷu+ḷu+lku+ju+lu (cat+PROP+ERG+then+me+they)

  16. Regular expression language: example.

      ! Regular expression for vowel harmony in Warlpiri
      define LexI [m a l i k i];   % words ending in ‘i’
      define LexU [k u d u];       % words ending in ‘u’
      define LexA [m i n i j a];   % words ending in ‘a’
      ! Join all the lexicons and write to register 1
      ! ‘u’ or ‘i’ according to the stem's last vowel.
      define Stem [<(W,1,i)> ⊳ LexI] | [<(W,1,u)> ⊳ [LexU | LexA]];
      ! Traverse the arc only if the scanned symbol is
      ! the content of register 1.
      define V [<(R,1,i)> ⊲ i] | [<(R,1,u)> ⊲ u];
      define PROP [+ k V l V];     % PROP suffix
      define ERG [+ l V];          % ERG suffix
      define Then [+ l k V];       % suffix indicating ‘then’
      define Me [+ j V];           % suffix indicating ‘me’
      define They [+ l V];         % suffix indicating ‘they’
      ! define the whole network
      define WarlpiriExample Stem PROP ERG Then Me They;
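The effect of the `Stem` and `V` definitions can be mimicked directly: the stem writes a vowel into a register, and every `V` position emits whichever vowel passes the read check. The following is a hedged sketch rather than a translation of the network; the function name `harmonize`, the `"V"` placeholder inside suffix templates, and the use of Python lists for the lexicons are assumptions.

```python
LEX_I = ["maliki"]  # stems whose last vowel is 'i'
LEX_U = ["kudu"]    # stems whose last vowel is 'u'
LEX_A = ["minija"]  # stems whose last vowel is 'a'

# PROP, ERG, Then, Me, They, with "V" marking the harmonizing vowel.
SUFFIXES = ["+kVlV", "+lV", "+lkV", "+jV", "+lV"]


def harmonize(stem: str) -> str:
    """Attach all five suffixes to a stem, resolving V from register 1."""
    # Write step: corresponds to <(W,1,i)> or <(W,1,u)> after the stem.
    if stem in LEX_I:
        register = "i"
    elif stem in LEX_U or stem in LEX_A:
        register = "u"
    else:
        raise KeyError(f"stem not in lexicon: {stem}")

    def vowel() -> str:
        # Read step: only the alternative whose check matches the register survives.
        for candidate in ("i", "u"):
            if register == candidate:
                return candidate
        raise ValueError("register holds no harmonizing vowel")

    word = stem
    for suffix in SUFFIXES:
        word += "".join(vowel() if ch == "V" else ch for ch in suffix)
    return word
```

For instance, `harmonize("maliki")` builds the word with ‘i’ in every suffix, while `harmonize("minija")` uses ‘u’ throughout. The suffix definitions are written once and shared by both varieties, which is the path duplication the register is meant to avoid.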

  17. Dedicated regular expression operators: circumfixes; interdigitation.

  18. Dedicated operators: circumfixes, definition. R ⊗ {⟨β1, γ1⟩⟨β2, γ2⟩ ... ⟨βm, γm⟩}, where m ∈ N is the number of circumfixes, R is a regular expression denoting the set of bases, and βi, γi are regular expressions over Σ denoting the prefix and suffix of the i-th circumfix, respectively. Note that R, βi and γi may denote infinite sets.
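For finite sets, the intended effect of the circumfix operator can be sketched as follows, assuming it denotes every word of the form βi·w·γi with w a base and the prefix and suffix taken from the same pair. That reading, the function name `circumfix`, and the placeholder strings are assumptions; the slide itself allows R, βi and γi to be infinite, which this sketch does not cover.

```python
from typing import Iterable, List, Set, Tuple


def circumfix(bases: Iterable[str], pairs: List[Tuple[str, str]]) -> Set[str]:
    """Combine each base with matching prefix/suffix pairs, register style."""
    base_list = list(bases)
    words: Set[str] = set()
    for i, (prefix, _) in enumerate(pairs):
        register = i  # "write": remember which prefix was taken
        for base in base_list:
            for j, (_, suffix) in enumerate(pairs):
                if register == j:  # "read": only the matching suffix may follow
                    words.add(prefix + base + suffix)
    return words


# Placeholder symbols, not real morphemes.
EXAMPLE = circumfix({"x", "y"}, [("a", "b"), ("c", "d")])
```

The base set is traversed once per circumfix here, but in an automaton with a register it would only need to be represented once, since the register carries the choice of prefix across the base to the suffix.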
