Applications in finite state automata: Organisation and Introduction

SLIDE 1

Organisational matters Introduction Plan of the Course Literature

Applications in finite state automata

Organisation and Introduction Kurt Eberle

kurt.eberle@uni-tuebingen.de

(includes material from Karttunen, Beesley, Butt and others)

October 25, 2016

SLIDE 2

Outline

Organisational matters Introduction Plan of the Course Literature

SLIDE 3

Goals of this session

◮ Criteria for a certificate, times
◮ Finite State Automata? Why?
◮ Plan of the seminar

SLIDE 4

Outline

Organisational matters Introduction Plan of the Course Literature

SLIDE 5

Hours and place of the course

◮ Tuesday, 16:15–17:45

Place: VG Wilhelmstraße / 0.01

◮ Thursday, 16:15–17:00

Place: VG Wilhelmstraße / 1.13

SLIDE 6

Preconditions

◮ Zwischenprüfung (intermediate examination; courses of the 1st to 4th semester)

◮ besides this: none

SLIDE 7

Criteria for successful participation I

Formal Criteria

◮ Presentation of an application
◮ (written exam)
◮ (term paper)

Informal Criteria

SLIDE 8

Criteria for successful participation II

◮ Class Participation!

Examination regulations of the Neuphilologische Fakultät require that students attend courses regularly. If students do not attend a course meeting on more than two occasions in one semester without proper excuse (e.g. doctor’s note), the course instructor has to give them a failing grade. Please do not put me in a position to have to fail you for this reason. If you cannot come to class, please email me ahead of time, if at all possible. You are expected to come on time. Being late without good reasons will count as not having attended a course meeting. If you own a mobile phone and carry it with you, please turn it off before class.

SLIDE 9

Criteria for successful participation III

Helpful

◮ Please take part actively!
◮ Ask questions, and tell me when you do not understand a topic!
◮ If there are problems: give feedback, email, office hours (Sprechstunde)
◮ When you try out things and work on the material, take into account Hofstadter’s Law:
    ◮ It always takes longer than you expect, even when you take into account Hofstadter’s Law.

SLIDE 10

Platform: Homepage of the course

Homepage

◮ URL: www.sfs.uni-tuebingen.de/~keberle/
◮ Key: ....

SLIDE 11

Main Objectives

◮ Get acquainted with using FSAs for representation of languages and mapping between languages
◮ Study of Karttunen/Beesley’s Finite State Morphology
◮ Xerox’s FSA programming machinery
◮ Focus: implementation of morphological problems/tasks
◮ Others: tokenization, shallow syntax, . . .

SLIDE 12

Outline

Organisational matters Introduction Plan of the Course Literature

SLIDE 13

Motivation

Finite State Automata

◮ What is a finite state automaton (FSA)?
◮ Example . . . → Karttunen/Beesley’s Cola Machine

SLIDE 15

Cola-FSA represents . . .

SLIDE 16

FSA formally . . .

A deterministic finite automaton M is a 5-tuple (Q, Σ, δ, q_0, F), consisting of

◮ a finite set of states (Q)
◮ a finite set of input symbols, the alphabet (Σ)
◮ a transition function (δ : Q × Σ → Q)
◮ an initial or start state (q_0 ∈ Q)
◮ a set of accept states (F ⊆ Q)

Let w = a_1 a_2 . . . a_n be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states r_0, r_1, . . . , r_n exists in Q with the following conditions:

◮ r_0 = q_0
◮ r_{i+1} = δ(r_i, a_{i+1}), for i = 0, . . . , n − 1
◮ r_n ∈ F.
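
The acceptance condition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transition function is stored as a dictionary, a missing entry is treated as an implicit rejecting sink state, and the example automaton (states are the cents inserted so far, with n = 5, d = 10, q = 25 and 25 as the only accept state) is a hypothetical encoding of a cola-machine-like FSA, not the figure from the slides.

```python
def accepts(delta, start, finals, word):
    """Return True if the DFA (delta, start, finals) accepts `word`.

    `delta` maps (state, symbol) -> state. A missing entry is treated
    as a transition into an implicit rejecting sink (an assumption made
    here to keep the table small).
    """
    state = start                          # r_0 = q_0
    for symbol in word:
        if (state, symbol) not in delta:
            return False                   # fell into the implicit sink
        state = delta[(state, symbol)]     # r_{i+1} = delta(r_i, a_{i+1})
    return state in finals                 # r_n must be an accept state


# Hypothetical cola-style automaton: a state is the amount inserted so far.
COINS = {"n": 5, "d": 10, "q": 25}
DELTA = {
    (amount, coin): amount + value
    for amount in range(0, 25, 5)
    for coin, value in COINS.items()
    if amount + value <= 25
}

print(accepts(DELTA, 0, {25}, "ddn"))  # True
print(accepts(DELTA, 0, {25}, "dd"))   # False
```

The loop visits each input symbol exactly once and only remembers the current state, which is why recognition with a DFA needs time linear in the input length and constant extra space.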

SLIDE 17

Language of an FSA

SLIDE 18

Formally . . . Language and Grammar

◮ Formal language: L ⊆ Σ*
◮ Word: w ∈ L
◮ Grammar: G = (V, Σ, P, S)
    V: finite set of non-terminal symbols
    Σ: finite set of terminal symbols (V ∩ Σ = ∅)
    P: finite set of productions (grammar rules)
    S: start symbol of the grammar (S ∈ V)
◮ Productions p ∈ P have the form α → β, where α ∈ (V ∪ Σ)* V (V ∪ Σ)* and β ∈ (V ∪ Σ)*
◮ Notational convention: a, a_i ∈ Σ; A, B, C ∈ V; w, r ∈ Σ*; α, β ∈ (V ∪ Σ)*

SLIDE 19

Formally . . .

the cola language (CL)

Grammar of the language CL: G = (V, Σ, P, S) with P:

S → Q
Q → q
Q → D D N
Q → D N D
Q → N D D
D → d
D → N N
N → n

G generates CL: {q, ddn, dnd, ndd, nndn, . . . nnnnn}

Question: dd ∈ CL?
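
Because this grammar has no recursive rules, its language is finite and can simply be enumerated. The following sketch does that by expanding the leftmost non-terminal of each sentential form. It assumes the productions read S → Q; Q → q | D D N | D N D | N D D; D → d | N N; N → n, which is one reading of the rule list above; the function and variable names are illustrative.

```python
from collections import deque

# Productions under the assumed reading; keys are the non-terminals.
PRODUCTIONS = {
    "S": [["Q"]],
    "Q": [["q"], ["D", "D", "N"], ["D", "N", "D"], ["N", "D", "D"]],
    "D": [["d"], ["N", "N"]],
    "N": [["n"]],
}


def generate(start="S"):
    """Enumerate every terminal string derivable from `start`.

    Terminates only because the grammar is non-recursive.
    """
    words, seen = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        idx = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if idx is None:
            words.add("".join(form))       # only terminals left: a word
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words


cola_language = generate()
print(sorted(cola_language))
print("dd" in cola_language)  # False
```

Enumerating the language answers the word problem for this small example by lookup; for infinite regular languages the same question is answered by running an automaton instead.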

SLIDE 20

The word problem

Task

◮ Decide whether a string is a sentence/word of a language or not!

SLIDE 21

Motivation

Finite State Automata

◮ Finite state automata correspond to regular expressions
◮ can recognize regular languages!
◮ There are other languages . . .
◮ → the Chomsky hierarchy of languages

SLIDE 22

Formal languages

Chomsky Hierarchy

◮ Type-0 grammars: α → β (unrestricted)
◮ Type-1 grammars (context sensitive): α → β with |α| ≤ |β| (exception: S → ε)
◮ Type-2 grammars (context free): A → α
◮ Type-3 grammars (regular):
    A → wB or A → w (right linear), or
    A → Bw or A → w (left linear), where w ∈ Σ*

⇒ General phrase structure, context sensitive, context free and regular languages

SLIDE 23

Formal languages

Cola Language is regular

S → q
S → 2D n
S → 3N d
2D → D n n
2D → D d
D → d
D → n n
3N → D n
3N → n d

SLIDE 24

Chomsky Hierarchy - Examples

Context-free Languages

L = {a^n b a^n | n ≥ 1} is not regular!

G, Σ, S, R
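
As a small illustration, a recognizer for this language can follow a context-free grammar rule by rule. The sketch below assumes the grammar S → a S a | a b a, which is one grammar for the language and is not taken from the slide; the function name is hypothetical.

```python
def derives(word):
    """Return True if `word` is derivable from S with S -> a S a | a b a."""
    if word == "aba":                      # base rule S -> a b a
        return True
    # Recursive rule S -> a S a: strip one 'a' from each end and recurse.
    return (
        len(word) > 3
        and word[0] == "a"
        and word[-1] == "a"
        and derives(word[1:-1])
    )


print(derives("aabaa"))  # True
print(derives("aaba"))   # False
```

The recursion depth grows with n, which reflects the unbounded memory needed to match the two blocks of a's; a finite automaton has only a fixed number of states and cannot keep that count.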

SLIDE 25

Chomsky Hierarchy - Examples

Context-sensitive Languages

L = {a^n b^n c^n | n ≥ 1} is not context-free!

G, Σ, S, R with

SLIDE 26

Motivation

Finite State Automata

◮ What can be done with FSAs?
◮ → recognize regular languages:

Is (a+(c*d)) a correct arithmetic expression? (a+(c*d)) ∈ L(A)

SLIDE 27

Motivation

Why Finite State Automata/Regular languages?

◮ Nice properties!

← closure
← decidability
← complexity

SLIDE 28

Properties

Closure

If K and L are regular, then so are:

◮ K ∪ L (union)
◮ K ∩ L (intersection)
◮ −L (complement)
◮ K − L (difference)
◮ K L (concatenation)
◮ L* (Kleene star)
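
Closure under intersection can be made concrete with the standard product construction: the combined automaton runs both machines in lockstep and accepts only if both accept. The sketch below is a minimal illustration; the tuple layout (delta, start, finals) and the two example automata ("even number of a's" and "ends in b") are assumptions chosen for the demonstration.

```python
def accepts(dfa, word):
    """Run a DFA given as (delta, start, finals); missing arcs reject."""
    delta, state, finals = dfa
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in finals


def intersect(dfa1, dfa2):
    """Product construction: states are pairs, both components must accept."""
    delta1, start1, finals1 = dfa1
    delta2, start2, finals2 = dfa2
    delta = {}
    for (p, a), p_next in delta1.items():
        for (q, b), q_next in delta2.items():
            if a == b:                     # both machines read the same symbol
                delta[((p, q), a)] = (p_next, q_next)
    finals = {(p, q) for p in finals1 for q in finals2}
    return delta, (start1, start2), finals


# K: even number of a's.  L: word ends in b.  (Both over {a, b}.)
EVEN_A = ({("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"}, "e", {"e"})
ENDS_B = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

BOTH = intersect(EVEN_A, ENDS_B)
print(accepts(BOTH, "aab"))  # True
print(accepts(BOTH, "ab"))   # False
```

Union works with the same state pairs and a different accept set (at least one component accepts), and for a DFA with a total transition function the complement is obtained by swapping accepting and non-accepting states.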

SLIDE 29

Properties

Decidability

If K and L are regular, then the following are decidable:

◮ w ∈ L ?
◮ L ⊆ K ?
◮ L ∩ K = {} ?
◮ L = {} ?
◮ L = Σ* ?
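
The emptiness question shows why these problems are decidable: the language of an automaton is empty exactly when no accept state can be reached from the start state, and reachability in a finite graph can always be checked. The following is a minimal sketch; the dictionary encoding of the transition function and the small example automaton are assumptions made for illustration.

```python
def is_empty(delta, start, finals):
    """Return True if no accept state is reachable from `start`.

    `delta` maps (state, symbol) -> state.
    """
    reachable, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for (source, _symbol), target in delta.items():
            if source == state and target not in reachable:
                reachable.add(target)
                stack.append(target)
    return reachable.isdisjoint(finals)


# Hypothetical automaton: state 2 exists but cannot be reached from state 0.
DELTA = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 2}

print(is_empty(DELTA, 0, {1}))  # False
print(is_empty(DELTA, 0, {2}))  # True
```

Combined with the closure properties, the other questions reduce to this one: L ∩ K = {} is emptiness of the intersection automaton, L ⊆ K holds exactly when L − K is empty, and L = Σ* holds exactly when the complement of L is empty.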

SLIDE 30

Properties

Complexity

If L is regular:

◮ space(L) = O(1): constant space, independent of the input size
◮ time(L) = O(n): linear time in the length n of the input

SLIDE 31

Applications

SLIDE 32

Another nice FSA property

Bidirectional use

◮ FSA → transducer: edges are labeled by ⟨symbol, symbol⟩ relations

Example: houses ↔ house+Noun+Pl
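
The bidirectional use can be illustrated with a hand-written toy transducer for this single example. The sketch below is not the Xerox xfst/lexc machinery: the arc list, the state numbers and the function name `apply` are assumptions made for the illustration. Each arc carries an upper (lexical) and a lower (surface) symbol, and the same arcs are read in either direction.

```python
EPS = ""  # empty symbol: consumes no input on that side

# (source state, upper symbol, lower symbol, target state)
ARCS = [
    (0, "h", "h", 1), (1, "o", "o", 2), (2, "u", "u", 3),
    (3, "s", "s", 4), (4, "e", "e", 5),
    (5, "+Noun", EPS, 6),   # the tag has no surface realisation
    (6, "+Pl", "s", 7),     # plural tag is realised as "s"
]
FINAL = {7}


def apply(symbols, side):
    """Map a symbol sequence through the transducer.

    side=0 reads the upper side and emits the lower side (generation);
    side=1 reads the lower side and emits the upper side (analysis).
    """
    results = []

    def walk(state, pos, out):
        if state in FINAL and pos == len(symbols):
            results.append("".join(out))
        for source, upper, lower, target in ARCS:
            if source != state:
                continue
            read, emit = (upper, lower) if side == 0 else (lower, upper)
            if read == EPS:
                walk(target, pos, out + [emit])
            elif pos < len(symbols) and symbols[pos] == read:
                walk(target, pos + 1, out + [emit])

    walk(0, 0, [])
    return results


print(apply(list("house") + ["+Noun", "+Pl"], side=0))  # ['houses']
print(apply(list("houses"), side=1))                    # ['house+Noun+Pl']
```

Nothing in the arc list is specific to one direction, which is the point of the slide: one network serves both for analysis and for generation.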

SLIDE 33

Morphology

Two levels

SLIDE 34

Morphology

Two levels: transducer

SLIDE 35

Xerox and FS Morphology

Two levels: transducer

◮ Lauri Karttunen
◮ Kimmo Koskenniemi
◮ Martin Kay
◮ Ron Kaplan

SLIDE 36

Xerox and FS Morphology

Xerox System

◮ Xerox finite state software
◮ Karttunen/Beesley: Finite State Morphology
◮ http://web.stanford.edu/~laurik/fsmbook/home.html
◮ Components
    ◮ xfst (compiler for regular expressions)
    ◮ lexc (compiler for lexicon representations)
    ◮ . . .

SLIDE 37

Morphology

Two levels: transducer

SLIDE 38

Outline

Organisational matters Introduction Plan of the Course Literature

SLIDE 39

Our program I

Plan

SLIDE 40

Our program II

Plan

SLIDE 41

Outline

Organisational matters Introduction Plan of the Course Literature

SLIDE 42

Literature I

Karttunen, L., Beesley, K.: Finite State Morphology. CSLI Publications, Stanford 2003. http://web.stanford.edu/~laurik/fsmbook/home.html

van Noord, Gertjan: Fsa Utilities User Manual, Version 6, 1998. https://www.let.rug.nl/~vannoord/Fsa/Manual/

The FSA Utilities toolbox. https://www.let.rug.nl/~vannoord/Fsa/ https://www.let.rug.nl/~vannoord/papers/

SLIDE 43

Literature II

Beesley, K.: Xerox Finite State Programming Course. http://web.stanford.edu/~laurik/fsmbook/lecture-notes/Beesley2004/index.html

Butt, M., Bögel, T.: Finite State Morphology Tutorial. http://ling.uni-konstanz.de/pages/home/boegel/Dateien/CLT09_tutorial.pdf

Karttunen, L.: Finite-State Methods in Natural Language Processing. http://web.stanford.edu/~laurik/fsmbook/LSA-207/index.html
