Regular Expressions and Automata
Berlin Chen 2003
References:
- 1. Speech and Language Processing, chapter 2
Introduction: Regular Expressions (REs), Finite-State Automata (FSAs), Formal
RE                  Example Patterns Matched
/woodchucks/        “interesting links to woodchucks and lemurs”
/a/                 “Mary Ann stopped by Mona’s”
/Claire says,/      “Dagmar, my gift please, Claire says,”
/song/              “All our pretty songs”
/!/                 “You’ve left the burglar behind again!” said Nori
baa! baaa! baaaa! baaaaa! baaaaaa! ...
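The strings listed here (a b, two or more a's, then !) can be described by a single regular expression. A minimal sketch in Python, assuming the pattern /baa+!/ and a hypothetical helper name `is_sheep_talk`:

```python
import re

# /baa+!/ : one 'b', at least two 'a's, then '!'
SHEEP = re.compile(r"baa+!")

def is_sheep_talk(s: str) -> bool:
    """Return True if the whole string belongs to the sheep language."""
    return SHEEP.fullmatch(s) is not None

for s in ["baa!", "baaaa!", "ba!", "baa", "abaa!"]:
    print(s, is_sheep_talk(s))
```

`fullmatch` is used so that the pattern has to cover the entire string rather than just a substring of it.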
Does not mean end-of-line here.
Matches a word boundary.
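The expressions these two notes annotate are not preserved in the text. As a sketch only, assuming they refer to a dollar sign used as a literal character in a price pattern and to the `\b` anchor, the behavior can be tried in Python (where the literal dollar sign has to be written `\$`):

```python
import re

# Assumption: a price is '$', digits, an optional two-digit cents part,
# and then a word boundary.
PRICE = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")

# \b99\b matches "99" only when it is not part of a longer number or word.
NINETY_NINE = re.compile(r"\b99\b")

text = "It costs $199.99, not $25x or 99 dollars."
prices = [m.group(0) for m in PRICE.finditer(text)]
print(prices)
print(NINETY_NINE.search("There are 299 bottles"))
print(NINETY_NINE.search("only $99 today"))
```

In this sketch `$25x` is rejected because no word boundary falls between the digits and the letter, while `$99` is accepted by `\b99\b` because `$` is not a word character.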
The memory feature “register”
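A register stores what a parenthesized group matched so that the same text can be required again later in the pattern. A small sketch in Python, where `\1` refers back to the first group; the example sentence is an illustrative assumption:

```python
import re

# \1 must repeat exactly what the first parenthesized group matched.
PATTERN = re.compile(r"the (.*)er they were, the \1er they will be")

same = PATTERN.search("the bigger they were, the bigger they will be")
different = PATTERN.search("the bigger they were, the faster they will be")

print(same.group(1))   # contents of register 1
print(different)
```

The second sentence does not match because the text after the comma would have to repeat the contents of the first register.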
A tape with cells. The state-transition table. An FSA.
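The three pieces named here (tape, state-transition table, FSA) fit together in a table-driven recognizer. The following is a sketch, assuming a deterministic FSA for the sheep language with states 0 to 4; the table layout and the name `d_recognize` are illustrative choices:

```python
# State-transition table: TABLE[state][symbol] -> next state.
# A missing entry means there is no legal transition.
TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
ACCEPT = {4}

def d_recognize(tape: str) -> bool:
    """Read the tape one cell at a time, following the table."""
    state = 0
    for symbol in tape:
        nxt = TABLE[state].get(symbol)
        if nxt is None:      # no transition for this symbol: reject
            return False
        state = nxt
    return state in ACCEPT   # accept only if we end in a final state

print(d_recognize("baaa!"), d_recognize("ba!"))
```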
Accounts for the numbers from 1 to 99.
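The automaton this note describes is not preserved in the text. As a hedged sketch, one way to cover the English number words from 1 to 99 is a regular expression assembled from three word lists; the space-separated form ("forty two") and all names below are assumptions:

```python
import re

ONES = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def alt(words):
    """Build a non-capturing alternation over a word list."""
    return "(?:" + "|".join(words) + ")"

# 10-19, or a tens word optionally followed by a ones word, or 1-9 alone.
NUMBER = re.compile(f"(?:{alt(TEENS)}|{alt(TENS)}(?: {alt(ONES)})?|{alt(ONES)})")

def is_number_word(s: str) -> bool:
    return NUMBER.fullmatch(s) is not None

print(is_number_word("forty two"), is_number_word("twenty ten"))
```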
Agenda: (s1, pos u), (s5, pos v), ...
Discussed later
A search-state
A machine state/node
Add new search states to the agenda (node, tape pos)
Generate alternatives
Depends on the search algorithm adopted
Infinite loop?
Time-synchronous: Viterbi/Breadth-first search
Time-asynchronous: Best-first search
Tape positions: 0 1 2 3 4 5
Agenda: (q0, pos 0)
Agenda: (q1, pos 1)
Agenda: (q2, pos 2)
Agenda: (q2, pos 3), (q3, pos 3)
Agenda: (q2, pos 3)
Agenda: (q2, pos 4), (q3, pos 4)
Agenda: (q2, pos 4), (q4, pos 5)
Tape positions: 0 1 2 3 4 5
Agenda: (q0, pos 0)
Agenda: (q1, pos 1)
Agenda: (q2, pos 2)
Agenda: (q2, pos 3), (q3, pos 3)
Agenda: (q2, pos 4), (q3, pos 4)
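The agenda traces above can be reproduced with a short recognizer for a non-deterministic FSA. This is a sketch under stated assumptions: the machine is the sheep-language automaton in which state q2 may, on reading a, either stay in q2 or move to q3, and the agenda is a double-ended queue used as a stack (depth-first) or as a queue (breadth-first). The names `DELTA` and `nd_recognize` are illustrative.

```python
from collections import deque

# (state, symbol) -> list of possible next states.
DELTA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],   # the non-deterministic choice
    (3, "!"): [4],
}
ACCEPT = {4}

def nd_recognize(tape: str, breadth_first: bool = False) -> bool:
    # Each search-state pairs a machine state with a tape position.
    agenda = deque([(0, 0)])
    while agenda:
        state, pos = agenda.popleft() if breadth_first else agenda.pop()
        if pos == len(tape):
            if state in ACCEPT:
                return True
            continue
        # Generate the alternatives and add them to the agenda.
        for nxt in DELTA.get((state, tape[pos]), []):
            agenda.append((nxt, pos + 1))
    return False

print(nd_recognize("baaa!"), nd_recognize("baaa!", breadth_first=True))
```

Because this machine has no epsilon transitions, every step advances the tape position, so the search cannot loop forever; a machine with epsilon cycles would need an extra check for already-visited search-states.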
States: q0; q1; q2; q2,3; q4. Arc labels: b, a, a, a, !
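A combined state such as q2,3 is what the subset construction produces when a non-deterministic FSA is converted into a deterministic one. The sketch below assumes the non-deterministic sheep-language machine (q2 loops on a and also moves to q3 on a) and no epsilon transitions; the function name `determinize` is hypothetical.

```python
# Non-deterministic machine: (state, symbol) -> list of next states.
DELTA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],
    (3, "!"): [4],
}

def determinize(delta, start, accept, alphabet):
    """Subset construction: each DFA state is a set of NFA states."""
    start_set = frozenset([start])
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            target = frozenset(
                n for q in current for n in delta.get((q, symbol), [])
            )
            if target:                      # skip the empty (dead) set
                table[current][symbol] = target
                todo.append(target)
    finals = {s for s in table if s & accept}
    return table, start_set, finals

table, start_set, finals = determinize(DELTA, 0, {4}, "ba!")
for state, arcs in table.items():
    print(sorted(state), {sym: sorted(t) for sym, t in arcs.items()})
```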
Concatenation: $L_1 \cdot L_2 = \{xy \mid x \in L_1, y \in L_2\}$
Union: $L_1 \cup L_2$
Kleene closure: $L_1^*$
FSA, RE, RL
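The three operations can be tried out on small finite languages represented as Python sets of strings. This is only an illustration: the example languages are assumptions, and because the Kleene closure is infinite the sketch enumerates it up to a bounded number of repetitions.

```python
def concat(l1, l2):
    """Every string xy with x in l1 and y in l2."""
    return {x + y for x in l1 for y in l2}

def union(l1, l2):
    return l1 | l2

def star(l1, max_repeats):
    """Strings of the Kleene closure built from at most max_repeats pieces."""
    result, layer = {""}, {""}
    for _ in range(max_repeats):
        layer = concat(layer, l1)
        result |= layer
    return result

L1, L2 = {"a", "ab"}, {"b", ""}
print(sorted(concat(L1, L2)))
print(sorted(union(L1, L2)))
print(sorted(star({"a"}, 3)))
```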
Intersection: $L_1 \cap L_2$
Difference: $L_1 - L_2$
Complementation: $\Sigma^* - L_1$
Reversal: $L_1^R$
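For languages given as complete deterministic FSAs, intersection, difference, and complementation can be built directly on the machines. The sketch below uses the standard product construction; the `DFA` class and the two example machines (an even number of a's, strings ending in b) are assumptions made for illustration. Reversal is left out because reversing a deterministic machine generally yields a non-deterministic one.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class DFA:
    states: set
    alphabet: str
    delta: dict      # (state, symbol) -> state; must be defined everywhere
    start: object
    accept: set

    def accepts(self, s: str) -> bool:
        q = self.start
        for ch in s:
            q = self.delta[(q, ch)]
        return q in self.accept

def complement(m):
    # Same machine with accepting and non-accepting states swapped.
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.accept)

def intersect(m1, m2):
    # Product construction: run both machines in lockstep.
    states = set(product(m1.states, m2.states))
    delta = {((p, q), a): (m1.delta[(p, a)], m2.delta[(q, a)])
             for (p, q) in states for a in m1.alphabet}
    accept = {(p, q) for (p, q) in states
              if p in m1.accept and q in m2.accept}
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), accept)

def difference(m1, m2):
    return intersect(m1, complement(m2))

EVEN_A = DFA({0, 1}, "ab",
             {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = DFA({0, 1}, "ab",
             {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

print(intersect(EVEN_A, ENDS_B).accepts("aab"))
print(difference(EVEN_A, ENDS_B).accepts("aa"))
```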
Inputs      Morphological Parsed Outputs (word stems and morphological features)
cats        cat +N +PL
cat         cat +N +SG
cities      city +N +PL
caught      (catch +V +PAST-PART) or (catch +V +PAST)
gooses      goose +V +3SG
goose       (goose +N +SG) or (goose +V)
geese       goose +N +PL
merging     merge +V +PRES-PART
big cool red clear happy real
mapping
Generating a string
Parsing a string (more complicated)
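Both directions can be run on the same finite-state transducer by choosing which side of each arc is read and which is written. The toy machine below is an assumption made for illustration: it covers only the stem cat with the features +N, +SG, and +PL, and the names `ARCS`, `transduce`, `generate`, and `parse` are hypothetical.

```python
# Arcs: (source state, lexical symbol, surface symbol, target state).
# The empty string "" stands for epsilon (nothing read or written).
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", "", 4),
    (4, "+SG", "", 5),
    (4, "+PL", "s", 5),
]
FINAL = {5}

def transduce(symbols, read, write):
    """Return all outputs; read/write index the arc tuple (1=lexical, 2=surface)."""
    results = []
    agenda = [(0, 0, "")]            # (state, input position, output so far)
    while agenda:
        state, pos, out = agenda.pop()
        if pos == len(symbols) and state in FINAL:
            results.append(out)
        for arc in ARCS:
            if arc[0] != state:
                continue
            if arc[read] == "":      # epsilon on the input side: consume nothing
                agenda.append((arc[3], pos, out + arc[write]))
            elif pos < len(symbols) and symbols[pos] == arc[read]:
                agenda.append((arc[3], pos + 1, out + arc[write]))
    return results

def generate(lexical_symbols):
    return transduce(lexical_symbols, read=1, write=2)

def parse(surface):
    return transduce(list(surface), read=2, write=1)

print(generate(["c", "a", "t", "+N", "+PL"]))
print(parse("cats"))
```

In this sketch generation follows one path because every lexical symbol is explicit, whereas parsing has to search: the surface string does not show where the epsilon arcs for +N and +SG apply, which is one reason parsing is the harder direction.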
Antworth 1990