
SLIDE 1

Comparing nondeterministic and quasideterministic finite-state transducers built from morphological dictionaries∗

Alicia Garrido-Alenda and Mikel L. Forcada, Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain

SEPLN 2002, Valladolid

∗Funded by Caja de Ahorros del Mediterráneo, Universitat d'Alacant and CICyT (project TIC2000-1599-C02-02).

1

SLIDE 2

Index

  • Lexical transformations in NLP systems
  • Aligned and unaligned dictionaries
  • Transducers: quasideterministic and nondeterministic
  • Building transducers from dictionaries
  • Comparing quasi- and non-deterministic transducers
  • Closing comments

2


SLIDE 9

Lexical transformations

Lexical transformations in NLP systems:

  • Morphological analysis: surface form → lexical form(s) [lemma + PoS + inflection info.]
  • Morphological generation: lexical form → surface form.
  • Lexical transfer (in MT): source lexical form → target lexical form.

Transformations are usually specified in terms of (morphological, bilingual) dictionaries.

3


SLIDE 15

Aligned and unaligned dictionaries

Unaligned dictionary: simple list of (input string, output string) pairs.

(recordáis, recordar<vblex><pri><2><pl>)
(recuerdo, recordar<vblex><pri><1><sg>)
(recuerdo, recuerdo<n><m><sg>)

Aligned dictionary: list of sequences of (input substring, output substring) pairs expressing linguistic regularities.

(re, re)(c, c)(o, o)(rd, rd)(áis, ar<vblex><2><pl>)
(re, re)(c, c)(ue, o)(rd, rd)(o, ar<vblex><1><sg>)
(re, re)(c, c)(ue, ue)(rd, rd)(o, o<n><m><sg>)

4
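The two dictionary styles on this slide can be written down as plain data. A minimal Python sketch (names are hypothetical; tags are copied from the slide, so the aligned entry omits <pri> exactly as the slide does):

```python
# Hypothetical representation: an unaligned entry is an
# (input string, output string) pair; an aligned entry is a
# list of (input substring, output substring) pairs.
unaligned = [
    ("recordáis", "recordar<vblex><pri><2><pl>"),
    ("recuerdo",  "recordar<vblex><pri><1><sg>"),
    ("recuerdo",  "recuerdo<n><m><sg>"),
]

aligned_entry = [("re", "re"), ("c", "c"), ("o", "o"),
                 ("rd", "rd"), ("áis", "ar<vblex><2><pl>")]

def flatten(entry):
    """Concatenating the substrings of an aligned entry recovers an
    (input string, output string) pair, dropping the alignment."""
    return ("".join(i for i, _ in entry), "".join(o for _, o in entry))
```

Flattening an aligned entry shows that alignment only adds information: the underlying string pair is unchanged.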


SLIDE 24

Transducers: quasi- and non-deterministic/1

Many lexical transformations in Indo-European languages may be performed sequentially using transducers:

  • reading the input left to right;
  • incrementally building:
    – a prefix of the output (deterministic transducers), or
    – a set of candidate prefixes of the output (nondeterministic transducers).

Sequential processing is possible because inputs sharing a prefix correspond to outputs sharing a nontrivial prefix.

5


SLIDE 31

Transducers: quasi- and non-deterministic/2

Deterministic, incremental processing: deliver the longest common output prefix corresponding to all inputs sharing the current input prefix.

In deterministic ("earliest p-subsequential") transducers:

  • states represent sets of prefixes sharing a common output behavior;
  • a single state is reached for each state and input symbol;
  • output is associated with state-to-state transitions: the longest common output prefix is built incrementally.

Dictionary alignments are ignored: "deterministic alignment".

6


SLIDE 38

Transducers: quasi- and non-deterministic/3

Full determinism is impossible (hence the name quasideterministic) due to one-to-many (many ≤ p) correspondences:

  • only the longest common output prefix of all outputs (a proper prefix) can be output at the end of the input:
    τ(recuerdo) = {recordar<vblex>…, recuerdo<n>…}
    LCP(τ(recuerdo)) = rec
  • (at most p) output suffixes have to be appended at acceptance states:
    (rec)⁻¹τ(recuerdo) = {ordar<vblex>…, uerdo<n>…}

7
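The two operations on this slide, longest common prefix and residual suffixes, are simple to state concretely. A sketch using the full output forms from the dictionary slide (the function and variable names are illustrative):

```python
import os

def lcp(strings):
    """Longest common prefix of a collection of strings."""
    return os.path.commonprefix(list(strings))

# τ(recuerdo): the two analyses of "recuerdo".
outputs = {"recordar<vblex><pri><1><sg>", "recuerdo<n><m><sg>"}

# The part a quasideterministic transducer can emit while reading...
prefix = lcp(outputs)
# ...and the suffixes that must be appended at acceptance states.
suffixes = {s[len(prefix):] for s in outputs}
```

Here `prefix` is "rec", matching LCP(τ(recuerdo)) on the slide.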


SLIDE 44

Transducers: quasi- and non-deterministic/4

Disadvantages of quasideterministic transducers:

  • Any linguistic knowledge encoded in dictionary alignments is thrown away.
  • For large dictionaries, irregularities may lead to very short longest common output prefixes and very long output suffixes.
  • Adding a new dictionary entry may force a complete reconstruction (longest common output prefixes may change).

8


SLIDE 49

Transducers: quasi- and non-deterministic/5

Nondeterministic transducers avoid this by maintaining several output prefix candidates for each input:

  • more than one state may be reached for each state and input symbol;
  • output is associated with state-to-state transitions, so that a set of output prefix candidates is built incrementally by maintaining a set of alive (state, output) pairs during processing;
  • output suffixes are no longer necessary.

9
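The alive-pair processing described above can be sketched directly. Everything here, the transition-table shape and the toy transducer, is an illustrative assumption, not the authors' implementation:

```python
def run_nondet(transitions, finals, word, start=0):
    """Run a nondeterministic transducer by maintaining a set of alive
    (state, output-prefix) pairs, one per candidate analysis.
    `transitions` maps (state, input symbol) to a list of
    (next state, output substring) pairs."""
    alive = {(start, "")}
    for sym in word:
        alive = {(q2, out + w)
                 for (q, out) in alive
                 for (q2, w) in transitions.get((q, sym), [])}
    # No output suffixes are needed: each surviving pair already
    # carries a complete candidate output.
    return {out for (q, out) in alive if q in finals}

# Toy ambiguous transducer: "ab" maps to both "xy" and "xz".
toy = {(0, "a"): [(1, "x")], (1, "b"): [(2, "y"), (3, "z")]}
```

For example, `run_nondet(toy, {2, 3}, "ab")` returns `{"xy", "xz"}`: both candidates stay alive to the end, and both outputs are delivered.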


SLIDE 54

Transducers: quasi- and non-deterministic/6

Advantages of nondeterministic transducers:

  • They may be very compact (when linguists are good at finding regularities to align inputs and outputs; see later).
  • When expressed as finite-state letter transducers (with transitions reading or writing at most one symbol), they may be determinized and minimized similarly to finite automata.
  • New entries may be added and removed without realignment while maintaining minimality (Garrido et al., TMI-2002).

10


SLIDE 59

Building transducers from dictionaries/1

Building quasideterministic transducers from unaligned dictionaries:

  1. Build a trie for the input strings of the dictionary (each prefix in the input vocabulary is a state).
  2. Using the output strings, compute the longest common output prefix (LCOP) for each prefix.
  3. Associate as output of each transition the suffix necessary to get the arrival state's LCOP from the departure state's LCOP.
  4. Compute the remaining output suffixes necessary to complete the output at each acceptance state from the LCOP of that state.
  5. Minimize the resulting transducer.

11
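Steps 1–4 above can be sketched in a few lines of Python (step 5, minimization, is omitted; the function name and its return shape are assumptions made for this sketch, not the authors' code):

```python
import os
from collections import defaultdict

def build_quasi(pairs):
    """Steps 1-4: trie states are input prefixes; each state carries
    the longest common output prefix (LCOP) of the entries through it.
    `pairs` is an unaligned dictionary of (input, output) pairs."""
    # 1. For every input prefix (= trie state), gather the outputs of
    #    all dictionary entries reachable through that prefix.
    outs = defaultdict(set)
    for inp, out in pairs:
        for i in range(len(inp) + 1):
            outs[inp[:i]].add(out)
    # 2. LCOP of each state.
    lcop = {pre: os.path.commonprefix(list(o)) for pre, o in outs.items()}
    # 3. Transition output: what must be appended to the departure
    #    state's LCOP to obtain the arrival state's LCOP.
    trans = {(pre[:-1], pre[-1]): lcop[pre][len(lcop[pre[:-1]]):]
             for pre in outs if pre}
    # 4. Output suffixes still pending at each acceptance state.
    finals = defaultdict(set)
    for inp, out in pairs:
        finals[inp].add(out[len(lcop[inp]):])
    # lcop[""] is the output emitted before reading any input.
    return lcop[""], trans, finals
```

On the three-entry dictionary from the earlier slide, the acceptance state for recuerdo ends up holding the two suffixes ordar<vblex><pri><1><sg> and uerdo<n><m><sg>, as slide /3 showed.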


SLIDE 65

Building transducers from dictionaries/2

Building nondeterministic transducers from aligned dictionaries [Details]

  • 1. Build a state path from the start state to an acceptance state for each aligned pair in the dictionary (with transitions reading or writing zero or one characters)
  • 2. Determinize as a finite automaton using the input–output pairs as the alphabet
  • 3. Minimize in the same way
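Step 1 can be sketched as follows (a minimal sketch; the state numbering and the `build_paths` name are mine). Each aligned entry becomes one path of transitions labelled by (input, output) pairs, each side being one character or the empty symbol:

```python
# Sketch: one state path per aligned dictionary entry. The empty
# symbol theta is represented by "".
def build_paths(aligned_entries):
    """aligned_entries: list of label sequences [(a1, b1), ..., (aN, bN)].
    Returns (transitions, accepting): transitions[state][label] = set of states."""
    transitions, accepting, fresh = {0: {}}, set(), 1
    for labels in aligned_entries:
        state = 0                              # all paths start at qI = 0
        for label in labels:
            nxt, fresh = fresh, fresh + 1
            transitions.setdefault(state, {}).setdefault(label, set()).add(nxt)
            state = nxt
        transitions.setdefault(state, {})
        accepting.add(state)                   # path ends in an acceptance state
    return transitions, accepting

haces = [("h", "h"), ("a", "a"), ("c", "z"),
         ("", "<n>"), ("", "<m>"), ("e", ""), ("s", "<pl>")]
trans, acc = build_paths([haces])
# Steps 2-3: determinize and minimize this as an ordinary finite
# automaton over the alphabet of (input, output) labels.
```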

12


slide-69
SLIDE 69

Comparing quasi- and non-deterministic transducers/1 [Details]

  • Build both kinds of transducers from a set of representative dictionaries
  • Convert quasideterministic transducers also into finite-state letter transducers
    – unfolding transitions with outputs longer than 1
    – creating letter-by-letter state paths for output suffixes at acceptance states
  • Determinize and minimize the resulting letter transducers
  • Compare (unfair without conversion: LTs are more “rudimentary”)

13


slide-76
SLIDE 76

Comparing quasi- and non-deterministic transducers/2

Results:

  • Without conversion, both kinds of transducers have roughly the same number of states (a comparison unfair to LTs)
  • After conversion, nondeterministic transducers are consistently 2.5 times more compact than quasideterministic transducers
  • Observed nondeterminism (average number of ASOPs) is of the order of the corpus-computed ambiguity in the dictionaries: quasidet., 1.3; nondet., 1.5–1.9 (slightly worse)

14


slide-80
SLIDE 80

Concluding remarks

For lexical transformations, nondeterministic transducers are a viable alternative to quasideterministic transducers:

  • they are compact
  • their nondeterminism is limited
  • they are easily maintained

Nondeterministic letter transducers are in use in www.interNOSTRUM.com (a Spanish–Catalan MT system)

15


slide-86
SLIDE 86

T H A N K   Y O U

16

slide-87
SLIDE 87

Finite-state letter transducers/1

A (nondeterministic) finite-state letter transducer is T = (Q, L, δ, qI, F), where:

  • Q: finite set of states
  • L = (Σ ∪ {θ}) × (Γ ∪ {θ}): label alphabet (Σ: input alphabet, Γ: output alphabet, θ: “empty symbol”)
  • δ : Q × L → 2^Q: transition function
  • qI ∈ Q: initial state
  • F ⊆ Q: acceptance states

17


slide-94
SLIDE 94

Finite-state letter transducers/2

State-to-state arrows have input–output labels (σ, γ):

  • Input σ can be an input symbol from Σ or nothing (θ)
  • Output γ can be an output symbol from Γ or nothing (θ)

Clearly, (θ, θ) arrows do nothing and may be avoided.

18


slide-99
SLIDE 99

Finite-state letter transducers/3

Using an FSLT: keep a set of alive state–output pairs (SASOP), updated after reading each input symbol of w = σ[1]σ[2] . . . σ[|w|]:

  • t = 0 (initial SASOP): V[0] = {(q, z) : q ∈ δ*(qI, (ε, z))}, where δ* is the extension of δ to input–output string pairs
  • t → t + 1 (after reading σ[t]): V[t] = {(q, zγ) : q ∈ δ*(q′, (σ[t], γ)) ∧ (q′, z) ∈ V[t−1]}
  • t = |w| (at the end of w): τ(w) = {z : (q, z) ∈ V[|w|] ∧ q ∈ F}
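This update can be simulated directly; the sketch below uses my own encoding (the empty symbol θ as "", δ as a nested dict), not the authors' code:

```python
# Sketch: run a nondeterministic letter transducer by maintaining the
# set of alive state-output pairs V[t]. THETA ("") is the empty symbol.
THETA = ""

def closure(pairs, delta):
    """Extend alive pairs along transitions that consume no input."""
    stack, seen = list(pairs), set(pairs)
    while stack:
        q, z = stack.pop()
        for (sigma, gamma), targets in delta.get(q, {}).items():
            if sigma == THETA:
                for q2 in targets:
                    if (q2, z + gamma) not in seen:
                        seen.add((q2, z + gamma))
                        stack.append((q2, z + gamma))
    return seen

def transduce(w, delta, q_init, accepting):
    V = closure({(q_init, "")}, delta)               # t = 0
    for sigma in w:                                  # t -> t + 1
        step = {(q2, z + gamma)
                for q, z in V
                for (s, gamma), targets in delta.get(q, {}).items()
                if s == sigma
                for q2 in targets}
        V = closure(step, delta)
    return {z for q, z in V if q in accepting}       # t = |w|
```

For the aligned entry (haces, haz&lt;n&gt;&lt;m&gt;&lt;pl&gt;) built as a seven-transition path from state 0 to acceptance state 7, `transduce("haces", delta, 0, {7})` yields the singleton set {"haz&lt;n&gt;&lt;m&gt;&lt;pl&gt;"}.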

19


slide-104
SLIDE 104

Longest common output prefix

The longest common output prefix for input w is LCOP(w) = LCP(τ(w(w⁻¹E))), where

  • E ⊂ Σ* is the vocabulary of inputs,
  • τ : E → 2^(Γ*) is the transformation function, and
  • w(w⁻¹E) = {x ∈ E : w ∈ Pr(x)}.


20

slide-105
SLIDE 105

Building quasideterministic transducers: details/1

Build a p-subsequential transducer T = (Q, Σ, Γ, δ, λ, qI, ψ):

  • With a trie structure: Q = Pr(E) ∪ {⊥} (⊥ is the absorption state), qI = ε, and δ(x, σ) = xσ if x, xσ ∈ Pr(E), and ⊥ otherwise
  • With transition outputs λ(x, σ) = (LCOP(x))⁻¹LCOP(xσ) for x, xσ ∈ Pr(E), and undefined otherwise
  • With output suffix sets ψ(w) = (LCOP(w))⁻¹τ(w)
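With an LCOP table in hand, the transition outputs λ and suffix sets ψ reduce to left quotients, i.e. prefix stripping; `strip_prefix` and the dict encodings of LCOP and τ below are illustrative assumptions:

```python
# Sketch: lambda and psi as prefix stripping, given lcop (a dict
# prefix -> LCOP) and tau (a dict input string -> set of outputs).
def strip_prefix(u, v):
    """Left quotient u^{-1} v: remove the prefix u from v."""
    assert v.startswith(u), "u must be a prefix of v"
    return v[len(u):]

def transition_output(lcop, x, sigma):
    """lambda(x, sigma) = (LCOP(x))^{-1} LCOP(x sigma)."""
    return strip_prefix(lcop[x], lcop[x + sigma])

def suffix_set(lcop, tau, w):
    """psi(w) = (LCOP(w))^{-1} tau(w), for w in the vocabulary."""
    return {strip_prefix(lcop[w], out) for out in tau[w]}
```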


21


slide-111
SLIDE 111

Building quasideterministic transducers: details/2

The resulting transducer is minimized using the equivalence-class algorithm (which iteratively refines a partition of Q). Two different states q and r are not equivalent if

  • ψ(q) ≠ ψ(r),
  • for some σ, δ(q, σ) is not in the same class as δ(r, σ), or
  • for some σ, λ(q, σ) ≠ λ(r, σ).
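A compact sketch of this refinement (not the paper's implementation; the encodings of δ, λ and ψ as dicts are mine). Classes are split until the three tests above distinguish no further states:

```python
# Sketch: equivalence-class minimization by iterative partition
# refinement over the three distinguishing tests.
def minimize_classes(states, sigmas, delta, lam, psi):
    """delta, lam: dicts keyed by (state, sigma); psi: dict state ->
    hashable suffix set. Returns a map state -> class label."""
    cls = dict(psi)                       # initial split: psi(q) != psi(r)
    while True:
        # signature: own class, plus (class of delta(q, s), lam(q, s))
        sig = {q: (cls[q],
                   tuple((cls.get(delta.get((q, s))),
                          lam.get((q, s))) for s in sigmas))
               for q in states}
        if len(set(sig.values())) == len(set(cls.values())):
            return cls                    # no class was split: stable
        cls = sig
```

Since signatures include each state's current class, every round only splits classes, so the loop terminates when the number of distinct signatures stops growing.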


22

slide-112
SLIDE 112

Building nondeterministic transducers: details/1

For each dictionary entry (a1, b1)(a2, b2) . . . (aN, bN), build a path

  qI →(a1,b1) s1 →(a2,b2) s2 . . . →(aN,bN) qF

from the initial state qI to an acceptance state qF. For example, (haces, haz<n><m><pl>) may be aligned as (h, h)(a, a)(c, z)(θ, <n>)(θ, <m>)(e, θ)(s, <pl>).

23


slide-118
SLIDE 118

Building nondeterministic transducers: details/2

[Diagram: the path for (haces, haz<n><m><pl>) from qI to qF, with transitions h:h, a:a, c:z, θ:<n>, θ:<m>, e:θ, s:<pl>]

The resulting transducer is determinized and minimized.

24


slide-120
SLIDE 120

Converting into letter transducers: details

  • Transitions q →(σ, γ1γ2...γn) q′ with n > 1 are unfolded into state paths q →(σ,γ1) s1 →(θ,γ2) s2 . . . →(θ,γn) q′
  • For each state q and for each tail γ1γ2...γn ∈ ψ(q), build an inputless state path q →(θ,γ1) s1 →(θ,γ2) s2 . . . →(θ,γn) qF (the only source of input nondeterminism)
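The first unfolding rule can be sketched as below (the `unfold` helper and the `fresh` state-name generator are my own illustration; output tags like &lt;n&gt; are treated as single symbols):

```python
# Sketch: unfold q --(sigma, gamma_1...gamma_n)--> q2 into a
# letter-by-letter state path; only the first arc reads input.
import itertools

THETA = ""  # empty symbol

def unfold(q, sigma, gammas, q_prime, fresh):
    """Return the list of letter transitions replacing the transition
    from q to q_prime with output string gammas (one symbol per item)."""
    arcs, src = [], q
    for i, g in enumerate(gammas):
        dst = q_prime if i == len(gammas) - 1 else next(fresh)
        arcs.append((src, (sigma if i == 0 else THETA, g), dst))
        src = dst
    return arcs

fresh = (f"s{i}" for i in itertools.count(1))
arcs = unfold("q", "c", ["z", "<n>", "<m>"], "q2", fresh)
# arcs == [("q", ("c", "z"), "s1"),
#          ("s1", ("", "<n>"), "s2"),
#          ("s2", ("", "<m>"), "q2")]
```

The second rule (inputless suffix paths at acceptance states) is the same construction with sigma = THETA.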

25
