Finite state morphology and phonology
Mans Hulden
Dept. of Linguistics
Natural Language Processing, LING/CSCI 5832
mans.hulden@colorado.edu
Jan 20 2014

FSMs for practical NLP tasks (1)
How FSMs are used in modeling sound systems (phonology)
[Figure: ten-state automaton over the alphabet {a, b, c}]
Expression   Definition
ε            The empty string
∅            The empty language
a            A single symbol
A*           Kleene star of a language
AB           Concatenation of two languages
A | B        Union of two languages
[FSM construction column: one diagram per row, not recoverable here]
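The diagrams in the FSM construction column are lost, so here is a minimal sketch of one common way to build an automaton for each row of the table: a Thompson-style construction over epsilon-NFAs. The class and function names (`NFA`, `symbol`, `concat`, `union`, `star`, `accepts`) are illustrative assumptions, not names from the slides, and the slides may have used a different construction.

```python
from itertools import count

_ids = count()  # source of fresh state numbers


class NFA:
    """Epsilon-NFA fragment with exactly one start and one final state."""

    def __init__(self, start, final, arcs):
        self.start = start
        self.final = final
        self.arcs = arcs  # list of (src, label, dst); label None means epsilon


def _new():
    return next(_ids)


def epsilon():  # the empty string
    s, f = _new(), _new()
    return NFA(s, f, [(s, None, f)])


def empty():  # the empty language: no path from start to final
    return NFA(_new(), _new(), [])


def symbol(a):  # a single symbol
    s, f = _new(), _new()
    return NFA(s, f, [(s, a, f)])


def concat(A, B):  # AB: link A's final state to B's start state
    return NFA(A.start, B.final, A.arcs + B.arcs + [(A.final, None, B.start)])


def union(A, B):  # A | B: new start branches into both machines
    s, f = _new(), _new()
    extra = [(s, None, A.start), (s, None, B.start),
             (A.final, None, f), (B.final, None, f)]
    return NFA(s, f, A.arcs + B.arcs + extra)


def star(A):  # A*: allow skipping A and looping back through it
    s, f = _new(), _new()
    extra = [(s, None, A.start), (A.final, None, f),
             (s, None, f), (A.final, None, A.start)]
    return NFA(s, f, A.arcs + extra)


def _closure(nfa, states):
    """All states reachable from `states` using epsilon arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for src, label, dst in nfa.arcs:
            if src == p and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, word):
    current = _closure(nfa, {nfa.start})
    for ch in word:
        step = {dst for src, label, dst in nfa.arcs
                if src in current and label == ch}
        current = _closure(nfa, step)
    return nfa.final in current


# (a|b)*c, built from the operators in the table.
# Each fragment should be used only once; build a fresh one to reuse a pattern.
pattern = concat(star(union(symbol("a"), symbol("b"))), symbol("c"))
print(accepts(pattern, "abac"), accepts(pattern, "ab"))
```

Every fragment keeps a single start and a single final state, which is what lets the operators compose freely; the price is extra epsilon arcs that a later determinization and minimization step would remove.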
[Figure: two-state automaton over the alphabet {a, b, c}]
(b|c|aa*c)*aa*b(aa*b|(b|aa*c)(b|c|aa*c)*aa*b)*|(b|c)* a((a|ba)|(c|bb)(b|c)*a)*|(b|c|a(a|ba)*(c|bb))*
"The common data structures that our programs manipulate are clearly states, transitions, labels, and label pairs—the building blocks of finite automata and transducers. But many of our initial mistakes and failures arose from attempting also to think in terms of these objects. The automata required to implement even the simplest examples are large and involve considerable subtlety for their construction. To view them from the perspective of states and transitions is much like predicting weather patterns by studying the movements of atoms and molecules or inverting a matrix with a Turing machine. The only hope of success in this domain lies in developing an appropriate set of high-level algebraic operators for reasoning about languages and relations and for justifying a corresponding set of operators and automata for computation." (Kaplan and Kay, 1994, p. 376)
From “Regular models of phonological rule systems”
[Figure: product construction built up step by step from the start pair (0,0); state pairs (1,1), (1,2), (2,2); arc labels a, b, c]
Algorithm 3.2: PRODUCTCONSTRUCTION
Input: FSM1 = (Q1, Σ, δ1, s0, F1), FSM2 = (Q2, Σ, δ2, t0, F2), OP ∈ {∪, ∩, −}
Output: FSM3 = (Q3, Σ, δ3, u0, F3)
begin
    Agenda ← (s0, t0)
    Q3 ← (s0, t0)
    u0 ← (s0, t0)
    Index (s0, t0)
    while Agenda ≠ ∅ do
        Choose a state pair (p, q) from Agenda
        foreach pair of transitions δ1(p, x, p′), δ2(q, x, q′) do
            Add δ3((p, q), x, (p′, q′))
            if (p′, q′) is not indexed then
                Index (p′, q′) and add to Agenda and Q3
            end
        end
    end
    foreach state s = (p, q) in Q3 do
        Add s to F3 iff p ∈ F1 OP q ∈ F2
    end
end
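The product construction above can be sketched in Python as follows. The dictionary layout (`start`, `finals`, `delta`) and the names `product` and `accepts` are assumptions made for this illustration. The sketch also assumes both input machines are deterministic and complete (every state has a transition on every symbol); without completeness, union and difference would miss strings on which only one machine has a path.

```python
from collections import deque


def product(fsm1, fsm2, op):
    """Product of two complete DFAs.

    Each FSM is a dict with keys 'start', 'finals' (a set) and
    'delta' (a dict mapping (state, symbol) to the next state).
    `op` is 'union', 'intersection' or 'difference'.
    """
    combine = {
        "union": lambda a, b: a or b,
        "intersection": lambda a, b: a and b,
        "difference": lambda a, b: a and not b,
    }[op]

    start = (fsm1["start"], fsm2["start"])
    agenda = deque([start])
    states = {start}          # plays the role of the index and of Q3
    delta = {}

    while agenda:
        p, q = agenda.popleft()
        # Follow every symbol on which both machines can move.
        for (src, x), p2 in fsm1["delta"].items():
            if src != p or (q, x) not in fsm2["delta"]:
                continue
            q2 = fsm2["delta"][(q, x)]
            delta[((p, q), x)] = (p2, q2)
            if (p2, q2) not in states:
                states.add((p2, q2))
                agenda.append((p2, q2))

    finals = {(p, q) for (p, q) in states
              if combine(p in fsm1["finals"], q in fsm2["finals"])}
    return {"start": start, "finals": finals, "delta": delta}


def accepts(fsm, word):
    state = fsm["start"]
    for x in word:
        if (state, x) not in fsm["delta"]:
            return False
        state = fsm["delta"][(state, x)]
    return state in fsm["finals"]


# Two complete DFAs over {a, b}: an even number of a's, and strings ending in b.
even_a = {"start": 0, "finals": {0},
          "delta": {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}}
ends_b = {"start": 0, "finals": {1},
          "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}}

both = product(even_a, ends_b, "intersection")
print(accepts(both, "aab"), accepts(both, "ab"))
```

Only state pairs reachable from the start pair are ever created, which is the point of the agenda: the result can be much smaller than the full cross product Q1 × Q2.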
[Figure: three-state transducer with arc labels a, b, c, d and the symbol pair <a:b>]
[Figure: two-state transducer with the symbol pairs <a:c>, <a:d>, <b:0>, <c:0>]
Regular languages
[Figure: transducer built from state pairs (0,0), (1,0), (2,0) with arc labels a:x and c:d]
[Figure: two-state transducer over a, e, i, o, u, b, c, d, f, p, t, k with the vowel-deleting pairs <a:0>, <e:0>, <i:0>, <o:0>, <u:0>]
(13)
UNDERLYING REPRESENTATION
    ↓ Lexical rules
LEXICAL REPRESENTATION
    ↓ Postlexical rules
SURFACE REPRESENTATION
Put morphemes together.
Phonemes and morphemes change when they are conjoined; these changes are modeled by phonological rules.
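As a rough illustration of the two steps just described, the sketch below joins morphemes with a `+` boundary and then applies an ordered cascade of rules. It uses plain regular-expression substitution instead of compiled transducers, and both the rule (n becomes m before a boundary followed by p) and the example words are assumptions chosen for the illustration, not material taken from the slides.

```python
import re


def nasal_assimilation(form):
    """Rewrite n as m when it stands before a morpheme boundary and p."""
    return re.sub(r"n(?=\+p)", "m", form)


def delete_boundary(form):
    """Remove the morpheme boundary symbol from the surface form."""
    return form.replace("+", "")


# Rules apply in order; each one sees the output of the previous one.
RULES = (nasal_assimilation, delete_boundary)


def generate(morphemes):
    form = "+".join(morphemes)  # step 1: put morphemes together
    for rule in RULES:          # step 2: apply the phonological rules
        form = rule(form)
    return form


print(generate(["in", "possible"]))   # impossible
print(generate(["in", "tolerable"]))  # intolerable
```

Rule order matters here: if the boundary were deleted first, the assimilation rule would no longer find its `+p` context. In a finite-state implementation each rule would be a transducer and the cascade would be their composition.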
[Figure: one-state automaton with arc labels @ and a]
[Figure: four-state transducer with arc labels @, +, m, n, p and the symbol pair <n:m>]
[Figure: eleven-state transducer over @, +, b, e, i, l, t, y with the symbol pairs <l:i>, <e:l>, <+:i>, <i:t>, <t:y>, <y:0>, <+:0>]
[Figure: ten-state automaton with arc labels a, d, e, i, m, s, t, u and the symbol pair <+:0>]