Finite-State Registered Automata and their uses in Natural Languages

SLIDE 1

Introduction Regular expression language Dedicated operators FSRT Implementation and evaluation Future plans

Finite-State Registered Automata and their uses in Natural Languages

Yael Cohen-Sygal and Shuly Wintner
Department of Computer Science
University of Haifa
September 1, 2005

Yael Cohen-Sygal and Shuly Wintner Finite-State Registered Automata

SLIDE 2

Talk outline

  • Finite-state registered automata (FSRA) – previous work
  • A regular expression language for FSRAs
  • Dedicated regular expression operators for linguistic applications
  • Finite-state registered transducers (FSRT)
  • Implementation and evaluation
  • Future plans

SLIDE 3

Finite-state registered automata (FSRA)

Finite-state registered automata (FSRAs) are a new computational model that augments FSAs with finite memory (registers) in a restricted way that saves space but does not add expressivity. The number of registers is finite and usually small. Registers eliminate the need to duplicate paths, because they enable the FSA to ‘remember’ a finite number of symbols. In addition to being associated with an alphabet symbol, each arc is also associated with a series of actions on the registers, where each action is one of the following two:

  • read (R) – allows traversing an arc only if a designated register contains a specific symbol.
  • write (W) – writes a specific symbol into a designated register while traversing an arc.
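
The read and write actions described above can be sketched as a small simulator. This is a minimal illustration, not the authors' implementation: the names `Arc`, `run_actions`, `accepts`, `EMPTY` and the toy automaton `TOY_ARCS` are all assumptions made for the example, and registers are indexed from 0 here.

```python
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    src: int
    symbol: str      # input symbol; "" stands for an epsilon arc
    actions: tuple   # series of (op, register, value), op is "R" or "W"
    dst: int


EMPTY = "#"  # hypothetical marker for a register that holds nothing yet


def run_actions(registers: tuple, actions: tuple) -> Optional[tuple]:
    """Apply the actions in order; return None if a read action fails."""
    regs = list(registers)
    for op, reg, value in actions:
        if op == "R":
            if regs[reg] != value:
                return None          # arc may not be traversed
        elif op == "W":
            regs[reg] = value
        else:
            raise ValueError(f"unknown register action: {op!r}")
    return tuple(regs)


def accepts(arcs, start, finals, n_registers, word) -> bool:
    """Search over configurations (state, register contents, input position)."""
    todo = [(start, (EMPTY,) * n_registers, 0)]
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, regs, pos = config
        if pos == len(word) and state in finals:
            return True
        for arc in arcs:
            if arc.src != state:
                continue
            if arc.symbol == "":
                next_pos = pos
            elif pos < len(word) and word[pos] == arc.symbol:
                next_pos = pos + 1
            else:
                continue
            new_regs = run_actions(regs, arc.actions)
            if new_regs is not None:
                todo.append((arc.dst, new_regs, next_pos))
    return False


# Toy automaton (an assumption for this sketch): accepts two-symbol words
# over {a, b} whose second symbol repeats the first one.
TOY_ARCS = [
    Arc(0, "a", (("W", 0, "a"),), 1),
    Arc(0, "b", (("W", 0, "b"),), 1),
    Arc(1, "a", (("R", 0, "a"),), 2),
    Arc(1, "b", (("R", 0, "b"),), 2),
]
```

In the toy automaton a single path through states 0, 1, 2 serves both first symbols; the register carries the information that a plain FSA would have to encode by duplicating the middle state.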

SLIDE 4

FSRA example

[Figures: automata for the languages L = {ab, ba} and L = {ab, ba, aa, bb}]

SLIDE 5

FSRA example

Arabic nouns: kitaab (book), $ams (sun) and daftar (notebook).

The definite article in Arabic is the prefix ‘’al’:

  • Realized as ‘’al’ when preceding most consonants.
  • The ‘l’ of the prefix assimilates to the first consonant of the noun when the latter is ‘d’, ‘$’, etc.

Arabic distinguishes between definite and indefinite case markers. Nominative case is realized as the suffix ‘u’ on definite nouns, ‘un’ on indefinite nouns.

word      nominative definite   nominative indefinite
kitaab    ’alkitaabu            kitaabun
$ams      ’a$$amsu              $amsun
daftar    ’addaftaru            daftarun
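
The rules above can be written down directly as a small function. This is only a sketch of the simplified account on this slide: the function name `nominative` and the set `ASSIMILATING` are assumptions, and the set lists only the two consonants named here (the slide's "etc." is not filled in).

```python
# Consonants to which the 'l' of the article assimilates.
# Only the two named on the slide are listed; the real set is larger.
ASSIMILATING = {"d", "$"}


def nominative(noun: str, definite: bool) -> str:
    """Build the nominative form of a noun under the simplified rules."""
    if not definite:
        return noun + "un"                 # indefinite case marker
    first = noun[0]
    if first in ASSIMILATING:
        article = "’a" + first             # 'l' assimilates to the first consonant
    else:
        article = "’al"
    return article + noun + "u"            # definite case marker
```

Calling `nominative("$ams", True)` yields ’a$$amsu and `nominative("daftar", False)` yields daftarun, matching the table.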

SLIDE 6

FSRA example

FSA for Arabic nominative definite and indefinite nouns:

SLIDE 7

FSRA example

FSRA for Arabic nominative definite and indefinite nouns:

SLIDE 8

FSRA summary

Previous work (Cohen-Sygal & Wintner, FSMNLP 2003):

  • Equivalence to regular languages
  • Closure properties – direct implementation

Contribution of this work:

  • Regular expression language
  • Dedicated operators
  • FSRT
  • Implementation and evaluation

SLIDE 9

Regular expression language – definition

Assume the regular expression syntax of XFST. Let a be a series of register operations and let R be a regular expression. Then the following are also regular expressions:

  • a ⊲ R
  • a ⊲⊲ R
  • a ⊳ R
  • a ⊳⊳ R

SLIDE 10

Regular expression language – denotation

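[Figure: table pairing each regular expression with its FSRA denotation. R denotes an FSRA A. The denotation of a ⊲ R is drawn as A with an additional arc labeled (ε, a); the denotation of a ⊲⊲ R is drawn the same way over shift(A). The drawings label the states qA and q0.]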

SLIDE 11

Regular expression language – denotation

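[Figure: table pairing each regular expression with its FSRA denotation. R denotes an FSRA A. The denotation of a ⊳ R is drawn as A with an additional arc labeled (ε, a); the denotation of a ⊳⊳ R is drawn the same way over shift(A).]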

SLIDE 12

Regular expression language – example

Arabic nominative definite and indefinite nouns:

SLIDE 13

Regular expression language – example

! Regular expression for Arabic nominative definite and indefinite nouns

define Prefix [<(W,1,indef)> ⊳ 0] |
              [<(W,1,def),(W,2,l)> ⊳ ’al] |
              [<(W,1,def),(W,2,$)> ⊳ ’a$] |
              [<(W,1,def),(W,2,d)> ⊳ ’ad];
define Base   [ [<(R,2,l)> ⊳ 0] | [<(R,1,indef)> ⊳ 0] ] [ [k i t a a b] | [q a m a r] ];
define $Base  [ [<(R,2,$)> ⊳ 0] | [<(R,1,indef)> ⊳ 0] ] [$ a m s];
define dBase  [ [<(R,2,d)> ⊳ 0] | [<(R,1,indef)> ⊳ 0] ] [d a f t a r];
define Suffix [<(R,1,def)> ⊲ u] | [<(R,1,indef)> ⊲ un];
define ArabicExample Prefix [Base | $Base | dBase] Suffix;
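
To see how the two registers tie the prefix to the base and the suffix, the expression above can be enumerated in plain Python. This is a hedged sketch, not the XFST-style tool itself: the names `PREFIX`, `BASE`, `SUFFIX`, `guarded`, `run` and `language` are assumptions, each alternative's register operations are applied as one unit, and the difference between the ⊲ and ⊳ operators is not modelled.

```python
from itertools import product

# Each alternative is (register actions, surface string).
PREFIX = [
    ((("W", 1, "indef"),), ""),
    ((("W", 1, "def"), ("W", 2, "l")), "’al"),
    ((("W", 1, "def"), ("W", 2, "$")), "’a$"),
    ((("W", 1, "def"), ("W", 2, "d")), "’ad"),
]


def guarded(letter, stems):
    """Stems reachable either after the matching article or when indefinite."""
    guards = [(("R", 2, letter),), (("R", 1, "indef"),)]
    return [(guard, stem) for guard in guards for stem in stems]


BASE = (guarded("l", ["kitaab", "qamar"])
        + guarded("$", ["$ams"])
        + guarded("d", ["daftar"]))

SUFFIX = [
    ((("R", 1, "def"),), "u"),
    ((("R", 1, "indef"),), "un"),
]


def run(actions, registers):
    """Apply writes and check reads; return False when a read fails."""
    for op, reg, value in actions:
        if op == "W":
            registers[reg] = value
        elif registers.get(reg) != value:
            return False
    return True


def language():
    """Collect every word whose register reads all succeed."""
    words = set()
    for parts in product(PREFIX, BASE, SUFFIX):
        registers = {}
        if all(run(actions, registers) for actions, _ in parts):
            words.add("".join(text for _, text in parts))
    return words
```

Under these assumptions `language()` contains eight words, for example ’a$$amsu and kitaabun, and excludes mismatched combinations such as ’adkitaabu.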

SLIDE 14

Regular expression language – example

Vowel harmony in Warlpiri: The vowel of suffixes agrees in certain aspects with the vowel of the stem to which it is attached.

A simplified account of the phenomenon:

Suffixes come in two varieties, one with ‘i’ vowels and one with ‘u’ vowels. Stems whose last vowel is ‘i’ take suffixes of the first variety, whereas stems whose last vowel is ‘u’ or ‘a’ take the other variety.

SLIDE 15

Regular expression language – example

Vowel harmony in Warlpiri – examples:

maliki+kiḷi+ḷi+lki+ji+li (dog+PROP+ERG+then+me+they)
kuḍu+kuḷu+ḷu+lku+ju+lu (child+PROP+ERG+then+me+they)
minija+kuḷu+ḷu+lku+ju+lu (cat+PROP+ERG+then+me+they)

SLIDE 16

Regular expression language – example

! Regular expression for vowel harmony in Warlpiri

define LexI [m a l i k i];   % words ending in ‘i’
define LexU [k u d u];       % words ending in ‘u’
define LexA [m i n i j a];   % words ending in ‘a’

! Join all the lexicons and write to register 1
! ‘u’ or ‘i’ according to the stem's last vowel.
define Stem [<(W,1,i)> ⊳ LexI] | [<(W,1,u)> ⊳ [LexU | LexA]];

! Traverse the arc only if the scanned symbol is
! the content of register 1.
define V [<(R,1,i)> ⊲ i] | [<(R,1,u)> ⊲ u];

define PROP [+ k V l V];   % PROP suffix
define ERG  [+ l V];       % ERG suffix
define Then [+ l k V];     % suffix indicating ‘then’
define Me   [+ j V];       % suffix indicating ‘me’
define They [+ l V];       % suffix indicating ‘they’

! define the whole network
define WarlpiriExample Stem PROP ERG Then Me They;
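
The effect of register 1 in the expression above can be mimicked with an ordinary variable. This is a rough sketch under the simplified account given earlier: the names `harmonic_vowel`, `SUFFIXES` and `inflect` are assumptions, the register value is computed from the stem's last vowel instead of being looked up in three lexicons, and plain l and d are used as in the expression.

```python
# Suffix templates in the order PROP, ERG, then, me, they.
# "V" marks the vowel slot that the expression fills from register 1.
SUFFIXES = ["kVlV", "lV", "lkV", "jV", "lV"]


def harmonic_vowel(stem: str) -> str:
    """Value written to register 1: 'i' if the last vowel is 'i', else 'u'."""
    for ch in reversed(stem):
        if ch in "aiu":
            return "i" if ch == "i" else "u"
    raise ValueError("stem contains no vowel")


def inflect(stem: str) -> str:
    """Attach all five suffixes, filling each vowel slot from the register."""
    vowel = harmonic_vowel(stem)
    return "+".join([stem] + [s.replace("V", vowel) for s in SUFFIXES])
```

For instance, `inflect("minija")` gives minija+kulu+lu+lku+ju+lu, since a stem whose last vowel is ‘a’ takes the ‘u’ variety.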

SLIDE 17

Dedicated regular expression operators

  • Circumfixes
  • Interdigitation

SLIDE 18

Dedicated operators: circumfixes – definition

R ⊗ {⟨β1 γ1⟩ ⟨β2 γ2⟩ ... ⟨βm γm⟩}

  • m ∈ N is the number of circumfixes.
  • R is a regular expression denoting the set of bases.
  • βi, γi are regular expressions over Σ denoting the prefix and suffix of the i-th circumfix, respectively.

Notice: R, βi, γi may denote infinite sets.

SLIDE 19

Dedicated operators: circumfixes – example

Participle-forming combinations in German:

A simplified account of the phenomenon: German verbs in their present form take an ‘n’ suffix, but in participle form they take the circumfix ge-t.

Examples:

säuseln ‘rustle’    gesäuselt ‘rustled’
brüsten ‘brag’      gebrüstet ‘bragged’
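
What circumfixation computes on this example can be shown with finite sets. This is a minimal sketch: the function name `circumfix` is an assumption, and the operator itself is defined over regular expressions that may denote infinite sets, which a set comprehension cannot represent.

```python
def circumfix(bases, circumfixes):
    """Wrap every base in every (prefix, suffix) pair."""
    return {prefix + base + suffix
            for base in bases
            for prefix, suffix in circumfixes}


# The two verb bases from the examples, with the empty-prefix 'n' pair
# for the present form and the ge-t pair for the participle.
GERMAN = circumfix({"säusel", "brüste"}, [("", "n"), ("ge", "t")])
```

The point of the FSRA construction is that the base network is shared by all circumfixes, with a register recording which prefix was taken so that only the matching suffix can follow.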

SLIDE 20

Dedicated operators: circumfixes – example

Participle-forming combinations in German:

Regular expression: [s ä u s e l | b r ü s t e] ⊗ {⟨ε n⟩ ⟨g e t⟩}

Corresponding FSRA:

SLIDE 21

Dedicated operators: interdigitation

{⟨α11, α12, ..., α1n⟩, ..., ⟨αm1, αm2, ..., αmn⟩} ⊕ {⟨β11 β12 ... β1n β1,n+1⟩, ..., ⟨βk1 βk2 ... βkn βk,n+1⟩}

  • n ∈ N is the number of slots (represented by ‘’).
  • m ∈ N is the number of roots.
  • k ∈ N is the number of patterns.
  • αi is a regular expression representing the i-th root.
  • βi is a regular expression representing the i-th pattern.
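
The interleaving that the operator performs can be sketched on plain strings. This is an illustration only: the names `interdigitate` and `interdigitate_all` are assumptions, roots and patterns are finite tuples of strings instead of regular expressions, and the Arabic root and patterns below are examples chosen for this sketch, not taken from the talk.

```python
def interdigitate(root, pattern):
    """Fill the n slots of a pattern with the n parts of a root.

    `pattern` holds the n + 1 segments that surround the n slots.
    """
    if len(pattern) != len(root) + 1:
        raise ValueError("a pattern needs one more segment than the root has parts")
    pieces = [pattern[0]]
    for part, segment in zip(root, pattern[1:]):
        pieces.append(part)      # root part goes into the slot
        pieces.append(segment)   # followed by the next pattern segment
    return "".join(pieces)


def interdigitate_all(roots, patterns):
    """Combine every root with every pattern."""
    return {interdigitate(r, p) for r in roots for p in patterns}


ROOTS = [("k", "t", "b")]                                 # illustrative root
PATTERNS = [("", "i", "aa", ""), ("ma", "", "a", "")]     # illustrative patterns
```

With these inputs, `interdigitate_all(ROOTS, PATTERNS)` produces kitaab and maktab.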

SLIDE 22

Finite state registered transducers (FSRT)

  • Denote relations over two finite alphabets.
  • Add to each transition an output symbol.
  • Equivalent to ordinary transducers.
  • Direct implementation of closure properties is done in the same way as in FSRAs.

SLIDE 23

FSRT – example

N-bit incrementor:

  • The goal: construct a transducer over Σ = {0, 1} whose input is a number in n-bit binary representation and whose output is the result of adding 1 to the input.
  • The naïve solution: a transducer with only 5 states and 12 arcs, but this transducer is neither sequential nor sequentiable.
  • A sequential transducer for an n-bit binary incrementor would require 2^n states and a similar number of transitions.
  • Using the FSRT model, a more efficient n-bit transducer can be constructed.
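
The relation the transducer has to compute can be stated as an ordinary function on bit strings. This sketch is not the FSRT construction; the function name `increment` is an assumption, the most significant bit is assumed to come first, and wrapping an all-ones input around to all zeros is an assumption as well, since the slide does not say how overflow is handled.

```python
def increment(bits: str) -> str:
    """Add 1 to an n-bit binary string, keeping the width n."""
    out = list(bits)
    i = len(out) - 1
    # Trailing 1s become 0s while the carry propagates leftwards.
    while i >= 0 and out[i] == "1":
        out[i] = "0"
        i -= 1
    # The first 0 from the right absorbs the carry (if there is one).
    if i >= 0:
        out[i] = "1"
    return "".join(out)
```

A bit changes exactly when every bit to its right is 1, so a device that emits output while reading from the most significant bit cannot decide the first output symbol until it has seen the rest of the input. That dependency is what makes a deterministic left-to-right solution expensive.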

SLIDE 24

FSRT – example

3-bit FSRT incrementor:

In the general case: 3n + 1 states, 6n arcs.

SLIDE 25

Space comparison between FSAs and FSRAs

Operation                     Type      States   Arcs     Registers  Size
Circumfixation                FSA       811      3824     –          47kb
(4 circumfixes, 1043 roots)   FSRA      356      360      1          16kb
Interdigitation               FSA       12,527   31,077   –          451kb
(20 patterns, 1043 roots)     FSRA      58       3259     2          67kb
64-bit inc.                   Seq. FST  47,779   49,858   –          1.24Mb
                              FSRT      193      384      64         11kb

SLIDE 26

Time comparison between FSAs and FSRAs

Operation                     Type      200 words  1000 words  5000 words
Circumfixation                FSA       0.01s      0.02s       0.08s
(4 circumfixes, 1043 roots)   FSRA      0.01s      0.02s       0.09s
Interdigitation               FSA       0.01s      0.02s       1s
(20 patterns, 1043 roots)     FSRA      0.35s      1.42s       10.11s
64-bit inc.                   Seq. FST  0.04s      0.17s       0.85s
                              FSRT      0.14s      0.5s        2.3s

SLIDE 27

Future plans

  • More dedicated operators.
  • Full-scale implementation; implementation of a large-scale non-trivial grammar (e.g., Hebrew morphology).