Applications in finite state automata: The lexc language (Kurt Eberle)

SLIDE 1

lexc files | Continuation classes | Defining lexical transducers | Include definitions in lexc | Useful strategies | Interaction and testing

Applications in finite state automata

The lexc language

Kurt Eberle

kurt.eberle@uni-tuebingen.de

(includes material from Karttunen, Beesley, Butt and others)

November 29, 2016

SLIDE 2

Outline

◮ lexc files
◮ Continuation classes
◮ Defining lexical transducers
◮ Include definitions in lexc
◮ Useful strategies
◮ Interaction and testing

SLIDE 3

Topics of this session

Understand lexc

◮ lexc files
◮ Continuation classes
◮ Defining lexical transducers
◮ Include definitions in lexc
◮ Useful strategies
◮ Integration and testing

SLIDE 5

lexc files

◮ for the definition of one or several lexica
◮ structure
    ◮ Multichar symbols declaration
    ◮ Declarations (definitions)
    ◮ Named lexica, with one (start) lexicon named Root

SLIDE 6

lexc file

structure
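
A minimal sketch of how the three parts could be laid out in one file. All tag names, lexicon names and entries are hypothetical and chosen only for illustration; the Definitions syntax assumes the Xerox lexc conventions.

```lexc
! 1. Multichar symbols declaration (illustrative tags)
Multichar_Symbols +N +Sg +Pl

! 2. Definitions (optional): named regular expressions
Definitions
  Vowel = [ a | e | i | o | u ] ;

! 3. Named lexica; compilation starts from Root
LEXICON Root
cat  Noun ;
dog  Noun ;

LEXICON Noun
+N+Sg:0  # ;
+N+Pl:s  # ;
```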

SLIDE 7

lexc file

The start lexicon
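
As a sketch of the idea: every word path begins in the lexicon named Root, which often contains nothing but pointers to other lexica. The sublexicon names and stems below are assumptions made for illustration.

```lexc
LEXICON Root
Nouns ;        ! entry without a form: continue directly in Nouns
Verbs ;        ! ... or in Verbs

LEXICON Nouns
cat   # ;

LEXICON Verbs
walk  # ;
```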

SLIDE 8

lexc file

The start lexicon: notational variants

SLIDE 9

lexc file

end of lexc files
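
A small sketch, assuming the Xerox lexc convention that the optional keyword END closes the file and that text following it is ignored by the compiler. The entries are hypothetical.

```lexc
LEXICON Root
cat  # ;
dog  # ;

END

Text after END is not compiled, so notes or entries that are
temporarily switched off can be kept here.
```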

SLIDE 10

lexc file

special symbols
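
The following sketch shows the characters that have a special meaning in lexc entries and how a literal use can be escaped. The entries themselves are hypothetical.

```lexc
! '!' starts a comment that runs to the end of the line
Multichar_Symbols +Sg +Pl

LEXICON Root
New% York  # ;        ! % escapes the next character (here a space)
wow%!      # ;        ! ... here a literal exclamation mark
cat        Number ;   ! ; terminates every entry

LEXICON Number
+Sg:0      # ;        ! : separates upper and lower side, 0 is the empty string
+Pl:s      # ;        ! # is the end-of-word continuation
```

Regular expressions are enclosed in angle brackets, so `<` and `>` are special as well.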

SLIDE 11

lexc files

compile file

◮ read lexc < ex1-lex.txt

→ network: 1 stack element

◮ save stack ex1-lex.fst

SLIDE 12

lexc file

compiled lexicon

SLIDE 13

lexc file

a kind of abbreviation

SLIDE 15

Continuation classes

stems and continuations

◮ analyze lexical entries into
    ◮ stem (root)
    ◮ ending(s) (continuations)

SLIDE 16

Continuation classes

example
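
An illustrative sketch with English verbs (stems, tags and the class name V are assumptions): each stem names the class of endings that may follow it, and each ending names `#` to finish the word.

```lexc
Multichar_Symbols +V +Inf +3Sg +Past

LEXICON Root
walk  V ;          ! stem, continuing in class V
talk  V ;

LEXICON V
+V+Inf:0    # ;    ! walk
+V+3Sg:s    # ;    ! walks
+V+Past:ed  # ;    ! walked
```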

SLIDE 17

Continuation classes

Syntax

SLIDE 18

Continuation classes

continuation classes and morphotactics

◮ continuation classes are inherited from Koskenniemi’s two-level morphology
◮ more difficult to model:
    ◮ separated dependencies
    ◮ interdigitation
    ◮ infixation
    ◮ reduplication

SLIDE 19

Continuation classes

Multiple classes of words

◮ style: stem + affix

SLIDE 20

Continuation classes

Multiple classes of words

◮ PoS classified stems
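
A sketch of stems grouped by part of speech, each group with its own inflection class. All names and entries are hypothetical.

```lexc
Multichar_Symbols +N +V +Sg +Pl +Inf +Past

LEXICON Root
Nouns ;
Verbs ;

LEXICON Nouns
cat   NounInfl ;
dog   NounInfl ;

LEXICON Verbs
walk  VerbInfl ;
talk  VerbInfl ;

LEXICON NounInfl
+N+Sg:0     # ;
+N+Pl:s     # ;

LEXICON VerbInfl
+V+Inf:0    # ;
+V+Past:ed  # ;
```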

SLIDE 21

Continuation classes

Optionality

◮ style: aspect morpheme as an option

SLIDE 22

Continuation classes

Optionality

◮ variant: aspect morpheme as an option
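
One way to make a morpheme optional, sketched for an invented language (stem, morphemes and tags are all hypothetical): the Aspect lexicon contains an empty entry that passes straight on to Tense.

```lexc
Multichar_Symbols +Prog +Pres +Past

LEXICON Root
kam       Aspect ;   ! hypothetical verb stem

LEXICON Aspect
+Prog:li  Tense ;    ! take the aspect morpheme ...
          Tense ;    ! ... or skip it (empty entry)

LEXICON Tense
+Pres:a   # ;
+Past:u   # ;
```

An alternative is to give each stem two entries, one continuing in Aspect and one continuing in Tense. That yields the same words but repeats every stem.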

SLIDE 23

Continuation classes

Loops

◮ several instances of a class in a string
◮ (N.B. ’no procedural reading’)
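
A sketch of a loop, using hypothetical English kinship terms: a lexicon that names itself as continuation allows any number of repetitions. The file only describes the set of admissible strings; nothing is executed step by step.

```lexc
LEXICON Root
Great ;

LEXICON Great
great%-      Great ;   ! continues in the same lexicon: the loop
             Kin ;     ! leave the loop

LEXICON Kin
grandmother  # ;
grandfather  # ;
```

The resulting network is cyclic, so it encodes infinitely many words (grandmother, great-grandmother, great-great-grandmother, and so on).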

SLIDE 25

Defining lexical transducers

Transitions in lexc
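
As a sketch of how two-sided entries turn into transitions: lexc pairs the upper and lower symbols from left to right and pads the shorter side with 0 at the end. The entries are hypothetical.

```lexc
Multichar_Symbols +V +Past +N +Pl

LEXICON Root
go+V+Past:went  # ;   ! arcs: g:w  o:e  +V:n  +Past:t
cat+N+Pl:cats   # ;   ! arcs: c    a    t     +N:s   +Pl:0
```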

SLIDE 26

Lexical transducers

regular expressions
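
An entry can also be a regular expression in angle brackets, written in xfst regular-expression notation with symbols separated by spaces. A minimal, hypothetical sketch:

```lexc
LEXICON Root
< [h a]+ >  # ;   ! ha, haha, hahaha, ...
```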

SLIDE 27

Lexical transducers

Multichars

◮ Introductory section!
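
A short sketch of the declaration (tags are illustrative). Without it, `+Noun` would be read as the five single symbols `+ N o u n` rather than as one tag.

```lexc
Multichar_Symbols +Noun +Pl

LEXICON Root
cat+Noun+Pl:cats  # ;   ! upper side: c a t +Noun +Pl
```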

SLIDE 29

Include definitions in lexc

Definitions section

◮ Defined variables
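
A sketch of defined variables being reused inside an entry, assuming the Xerox lexc Definitions syntax. The names C and V and the symbol sets are hypothetical.

```lexc
Definitions
  C = [ b | d | k | t ] ;
  V = [ a | i | u ] ;

LEXICON Root
< C V C >  # ;   ! any consonant-vowel-consonant string, e.g. kat, bid
```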

SLIDE 31

Useful strategies

◮ visualize the morphotactic templates before coding (informal sketch)
◮ a lexicon should subsume a generally coherent collection of morphemes
◮ a lexicon is a potential target for continuations from other morphemes
◮ prefer the most general solutions if possible (large classes)
◮ avoid multiple copies of a morpheme if possible
◮ separated dependencies are difficult to model with continuation classes: use filters instead, if possible
◮ read error messages carefully; be aware that loops and filters can increase complexity heavily

SLIDE 32

Useful strategies

Visualization

SLIDE 33

Useful strategies

Organization into coherent lexicons

SLIDE 34

Useful strategies

Using sublexicons

SLIDE 35

Useful strategies

Classes and subclasses . . .

SLIDE 36

Useful strategies

Classes and subclasses: add irregularities 1

SLIDE 37

Useful strategies

Classes and subclasses: add irregularities 2 (better solution)
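
One possible arrangement, sketched with hypothetical English data and not necessarily the solution the slide has in mind: the regular class N reuses a smaller class NSg, so an irregular noun can share the regular singular while its plural is listed as a whole.

```lexc
Multichar_Symbols +N +Sg +Pl

LEXICON Root
Nouns ;

LEXICON Nouns
cat   N ;             ! regular: singular and plural via N
dog   N ;
ox    NSg ;           ! irregular: only the singular is regular
ox+N+Pl:oxen  # ;     ! the plural is stated directly

LEXICON N
NSg ;                 ! N includes the singular subclass ...
+N+Pl:s   # ;         ! ... and adds the regular plural

LEXICON NSg
+N+Sg:0   # ;
```

The singular ending is written once and shared by both classes, in line with the advice to avoid multiple copies of a morpheme.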

SLIDE 38

Useful strategies

Tag-names

◮ prime directive: appropriate and known tags (if possible)
◮ secondary directive: same name for same phenomenon
◮ tertiary directive: take up the naming used in grammars of related languages if possible

SLIDE 39

Useful strategies

Separated dependencies

◮ example: (pseudo) Arabic

SLIDE 40

Useful strategies

Separated dependencies

SLIDE 41

Separated dependencies

Bad strategy

SLIDE 42

Separated dependencies

Overgeneration . . .

SLIDE 43

Separated dependencies

Overgeneration examples

SLIDE 44

Separated dependencies

Filter to avoid overgeneration
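
A simplified pseudo-Arabic sketch of the idea, with hypothetical tags and lexicon names. The lexicon keeps prefix and suffix independent and therefore overgenerates; the comment at the end indicates how a filter could be composed on the upper side in xfst. The file name lex.fst and the exact command line are assumptions.

```lexc
Multichar_Symbols Def+ +Nom +Indef

LEXICON Root
Def+:al   Stems ;    ! definite prefix (optional)
          Stems ;

LEXICON Stems
kitaab    Case ;

LEXICON Case
+Nom:u    Indef ;

LEXICON Indef
+Indef:n  # ;        ! indefinite suffix (optional)
          # ;

! Wanted:   alkitaabu, kitaabun, kitaabu
! Unwanted: alkitaabun (definite prefix together with indefinite suffix)
! Possible filter in xfst, after saving the compiled lexicon as lex.fst:
!   read regex ~$[ "Def+" ?* "+Indef" ] .o. @"lex.fst" ;
```

The filter states the dependency once, instead of duplicating the stem and case lexica for the definite and the indefinite path.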

SLIDE 46

Interaction and testing

◮ get a feeling for how the components of a larger system interact
◮ thinking about languages
◮ thinking about transducers
◮ thinking about rules
◮ thinking about composition

SLIDE 47

Interaction and testing

languages

◮ avoid overgeneration
◮ avoid undergeneration
◮ model information via different modules

SLIDE 48

Interaction and testing

languages

SLIDE 49

Interaction and testing

transducers: Xerox ’world’

◮ upper language: description level
◮ lower language: string level (inflected form)
◮ analyze string (look up)
◮ generate string (look down)
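
A small sketch relating the two languages to the two directions of use. The lexicon is hypothetical; the xfst commands named in the comments are `apply up` and `apply down`.

```lexc
Multichar_Symbols +N +Sg +Pl

LEXICON Root
cat  N ;

LEXICON N
+N+Sg:0  # ;
+N+Pl:s  # ;

! upper language: cat+N+Sg   cat+N+Pl
! lower language: cat        cats
! look up   cats      ->  cat+N+Pl   (xfst: apply up cats)
! look down cat+N+Pl  ->  cats       (xfst: apply down cat+N+Pl)
```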

SLIDE 50

Interaction and testing

Analyze string!

SLIDE 51

Interaction and testing

Generate string!

SLIDE 52

Interaction and testing

Be aware of setting

SLIDE 53

Interaction and testing

Be aware of setting

SLIDE 54

Interaction and testing

rules

◮ ’principle based’
◮ easy to extend
◮ irregular (beau/belle): treat in lexc per transducer statement
◮ regular: use (postprocessing) rules
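
A sketch of the irregular case for beau/belle, with hypothetical tags: the pair is stated directly as an upper:lower entry, while a regular adjective goes through a general class. Regular alternations that go beyond adding a suffix would be left to rules applied after the lexicon.

```lexc
Multichar_Symbols +Adj +Masc +Fem

LEXICON Root
petit  Adj ;                ! regular: handled by the general class
beau+Adj+Masc:beau   # ;    ! irregular: both forms stated directly
beau+Adj+Fem:belle   # ;

LEXICON Adj
+Adj+Masc:0  # ;            ! petit
+Adj+Fem:e   # ;            ! petite
```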

SLIDE 55

Interaction and testing

composition

SLIDE 56

Interaction and testing

composition
