finite state transducers

Finite State Transducers: Data structures and algorithms for Computational Linguistics III


  1. Finite State Transducers Data structures and algorithms for Computational Linguistics III Çağrı Çöltekin ccoltekin@sfs.uni-tuebingen.de University of Tübingen Seminar für Sprachwissenschaft Winter Semester 2019–2020

  2. A quick introduction: Finite state transducers
     • A finite state transducer (FST) is a finite state machine where transitions are conditioned on a pair of symbols
     • The machine moves between the states based on input symbol, while it outputs the corresponding output symbol
     • An FST encodes a relation, a mapping from a set to another
     • The relation defined by an FST is called a regular (or rational) relation
     [Figure: FST with states 0, 1, 2 and edge labels a:a, a:a, a:b, a:b, b:b, b:b]
     Example mappings: babba → babbb, aba → bbb, aba → abb

  4. Formal definition
     A finite state transducer is a tuple $(\Sigma_i, \Sigma_o, Q, q_0, F, \Delta)$
     • $\Sigma_i$ is the input alphabet
     • $\Sigma_o$ is the output alphabet
     • $Q$ is a finite set of states
     • $q_0$ is the start state, $q_0 \in Q$
     • $F$ is the set of accepting states, $F \subseteq Q$
     • $\Delta$ is a relation ($\Delta : Q \times \Sigma_i \to Q \times \Sigma_o$)
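The tuple above maps fairly directly onto a small data structure. The following is a minimal sketch, not the lecture's own code: the names `FST`, `transduce` and `EXAMPLE` are hypothetical, the example machine is invented for illustration, and ϵ-transitions are not handled. Because Δ is a relation rather than a function, the transitions are stored as a set of (source, input, output, target) tuples, and `transduce` tracks every reachable configuration.

```python
from typing import NamedTuple


class FST(NamedTuple):
    """A finite state transducer (Sigma_i, Sigma_o, Q, q0, F, Delta)."""
    input_alphabet: frozenset   # Sigma_i
    output_alphabet: frozenset  # Sigma_o
    states: frozenset           # Q
    start: int                  # q0, a member of Q
    finals: frozenset           # F, a subset of Q
    arcs: frozenset             # Delta as (source, input, output, target) tuples


def transduce(fst: FST, word: str) -> set:
    """Return every output string that the FST relates to `word`."""
    # A configuration is (current state, output produced so far).
    configs = {(fst.start, "")}
    for symbol in word:
        configs = {
            (target, produced + out)
            for (state, produced) in configs
            for (source, inp, out, target) in fst.arcs
            if source == state and inp == symbol
        }
    return {produced for (state, produced) in configs if state in fst.finals}


# Hypothetical machine: rewrites the first and the last 'a' as 'b'.
EXAMPLE = FST(
    input_alphabet=frozenset("ab"),
    output_alphabet=frozenset("ab"),
    states=frozenset({0, 1, 2}),
    start=0,
    finals=frozenset({2}),
    arcs=frozenset({
        (0, "a", "b", 1),
        (1, "a", "a", 1),
        (1, "b", "b", 1),
        (1, "a", "b", 2),
    }),
)

print(transduce(EXAMPLE, "aaaa"))  # {'baab'}
print(transduce(EXAMPLE, "bb"))    # set(): no transition on 'b' from state 0
```

Returning a set rather than a single string reflects that one input may be related to zero, one or several outputs.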

  5. Where do we use FSTs? Uses in NLP/CL
     • Morphological analysis
     • Spelling correction
     • Transliteration
     • Speech recognition
     • Grapheme-to-phoneme mapping
     • Normalization
     • Tokenization
     • POS tagging (not typical, but done)
     • Partial parsing / chunking
     • …

  6. Where do we use FSTs? Example 1: morphological analysis
     [Figure: automaton with states 0 to 6 and edge labels c, a, t, d, o, g, s:⟨PL⟩]
     In this lecture, we treat an FSA as a simple FST that outputs its input: edge label ‘a’ is a shorthand for ‘a:a’.

  7. Where do we use FSTs? Example 2: POS tagging / shallow parsing
     [Figure: tagging automaton with states 0 to 5 and edge labels time:N, flies:N, flies:V, like:V, like:ADP, an:D, arrow:N]
     [Figure: chunking automaton with states 0, 1, 2 and edge labels DET:ϵ, ADJ:ϵ, N:NP, PROPN:NP]
     Note: (1) It is important to express the ambiguity. (2) This gets interesting if we can ‘compose’ these automata.

  8. Closure properties of FSTs
     Like FSA, FSTs are closed under some operations. The slide lists:
     • Concatenation
     • Kleene star
     • Complement
     • Reversal
     • Union
     • Intersection
     • Inversion
     • Composition
     Note that, unlike FSAs, FSTs (regular relations) are not closed under complement and intersection in general; the flattened slide text does not show how these two entries were marked.

  9. FST inversion
     • Since an FST encodes a relation, it can be inverted
     • The inverse of an FST swaps the input symbols with the output symbols
     • We indicate the inverse of an FST $M$ with $M^{-1}$
     [Figure: $M$ with states 0, 1, 2 and edge labels a:b, a, b, a:b, a, b; $M^{-1}$ with the same states and edge labels b:a, a, b, b:a, a, b]
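Inversion amounts to swapping the two labels on every transition. A minimal sketch follows, assuming transitions are stored as (source, input, output, target) tuples; the machine `M_ARCS` and the helpers `invert` and `apply` are hypothetical names for illustration, not the machine drawn on the slide.

```python
def invert(arcs):
    """Swap input and output labels on every (source, input, output, target) arc."""
    return {(source, out, inp, target) for (source, inp, out, target) in arcs}


def apply(arcs, start, finals, word):
    """Return all outputs related to `word` (no epsilon handling in this sketch)."""
    configs = {(start, "")}
    for symbol in word:
        configs = {
            (target, produced + out)
            for (state, produced) in configs
            for (source, inp, out, target) in arcs
            if source == state and inp == symbol
        }
    return {produced for (state, produced) in configs if state in finals}


# Hypothetical machine M: rewrites the first and the last 'a' as 'b'.
M_ARCS = {(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)}
M_INV_ARCS = invert(M_ARCS)

print(apply(M_ARCS, 0, {2}, "aaaa"))      # {'baab'}
print(apply(M_INV_ARCS, 0, {2}, "baab"))  # {'aaaa'}
```

States, start state and accepting states are unchanged by inversion, and inverting twice gives back the original transitions.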

  10. FST composition: sequential application
     [Figure: $M_1$ (states 0, 1, 2; edge labels a:b, a, b, a:b) and $M_2$ (states 0, 1; edge labels a, a, b, c, c, b:c)]
     Inputs to $M_1$: aa, bb, aaaa, abaa
     • Can we compose without running the FSTs sequentially?

  11. FST composition: sequential application
     Outputs of $M_1$, which become the inputs of $M_2$:
     aa → bb
     bb → ∅
     aaaa → baab
     abaa → bbab
     • Can we compose without running the FSTs sequentially?

  12. FST composition: sequential application
     Input → output of $M_1$ → output of $M_2$ (the result of $M_1 \circ M_2$):
     aa → bb → bb
     bb → ∅ → ∅
     aaaa → baab → baac
     abaa → bbab → bbac
     • Can we compose without running the FSTs sequentially?
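Sequential application can be sketched as feeding every output of the first machine into the second. The transition sets below are an assumption: they are a reconstruction chosen to be consistent with the edge labels and the four example mappings on the slide, not a copy of the original figure. The names `apply` and `apply_in_sequence` are hypothetical.

```python
def apply(machine, word):
    """Return all outputs that `machine` relates to `word` (no epsilon handling)."""
    start, finals, arcs = machine
    configs = {(start, "")}
    for symbol in word:
        configs = {
            (target, produced + out)
            for (state, produced) in configs
            for (source, inp, out, target) in arcs
            if source == state and inp == symbol
        }
    return {produced for (state, produced) in configs if state in finals}


def apply_in_sequence(first, second, word):
    """Run `first`, then run `second` on each intermediate string."""
    return {
        final
        for intermediate in apply(first, word)
        for final in apply(second, intermediate)
    }


# Assumed reconstruction: M1 rewrites the first and the last 'a' as 'b'.
M1 = (0, {2}, {(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)})
# Assumed reconstruction: M2 rewrites a 'b' that directly follows an 'a' as 'c'.
M2 = (0, {0, 1}, {(0, "b", "b", 0), (0, "c", "c", 0), (0, "a", "a", 1),
                  (1, "a", "a", 1), (1, "b", "c", 0), (1, "c", "c", 0)})

for word in ["aa", "bb", "aaaa", "abaa"]:
    print(word, apply(M1, word), apply_in_sequence(M1, M2, word))
```

An input that the first machine rejects, such as bb here, yields the empty set at both stages.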

  13. FST composition
     [Figure: $M_1$ (states 0, 1, 2; edge labels a:b, a, b, a:b) and $M_2$ (states 0, 1; edge labels a, a, b, c, c, b:c)]

  14. FST composition
     [Figure: $M_1$ and $M_2$ as before; the composed machine starts with state 00]

  15. FST composition
     [Figure: $M_1$ and $M_2$ as before; composed machine with states 00, 01 and edge label a:b]

  16. FST composition
     [Figure: $M_1$ and $M_2$ as before; composed machine with states 00, 01, 11, 20 and edge labels a:b, a, b, a:b, a]

  17. FST composition
     [Figure: $M_1$ and $M_2$ as before; composed machine with states 00, 01, 11, 20 and edge labels a:b, a, b, a:b, a, b:c, a:c]

  18. FST composition
     [Figure: $M_1$, $M_2$ and the completed $M_1 \circ M_2$ with states 00, 01, 11, 20 and edge labels a:b, a, b, a:b, a, b:c, a:c]
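The construction built up over these slides pairs states of the two machines and matches the output label of an M1 transition with the input label of an M2 transition. Below is a minimal sketch of that product construction without ϵ handling. The machines are an assumed reconstruction consistent with the slides' edge labels and example mappings, and `compose` and `apply` are hypothetical names.

```python
from collections import deque


def compose(m1, m2):
    """Build m1 ∘ m2 over pairs of states, exploring only reachable pairs."""
    start1, finals1, arcs1 = m1
    start2, finals2, arcs2 = m2
    start = (start1, start2)
    arcs, finals, seen = set(), set(), {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        q1, q2 = state
        if q1 in finals1 and q2 in finals2:
            finals.add(state)
        for (s1, inp, mid, t1) in arcs1:
            if s1 != q1:
                continue
            for (s2, mid2, out, t2) in arcs2:
                # The output of the m1 arc must be the input of the m2 arc.
                if s2 == q2 and mid2 == mid:
                    target = (t1, t2)
                    arcs.add((state, inp, out, target))
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
    return start, finals, arcs


def apply(machine, word):
    """Return all outputs that `machine` relates to `word`."""
    start, finals, arcs = machine
    configs = {(start, "")}
    for symbol in word:
        configs = {(t, produced + out)
                   for (state, produced) in configs
                   for (s, inp, out, t) in arcs
                   if s == state and inp == symbol}
    return {produced for (state, produced) in configs if state in finals}


# Assumed reconstructions of the two machines (see the lead-in).
M1 = (0, {2}, {(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)})
M2 = (0, {0, 1}, {(0, "b", "b", 0), (0, "c", "c", 0), (0, "a", "a", 1),
                  (1, "a", "a", 1), (1, "b", "c", 0), (1, "c", "c", 0)})

M12 = compose(M1, M2)
print(sorted(M12[2]))      # the seven arcs of the composed machine
print(apply(M12, "aaaa"))  # {'baac'}
```

Under these assumptions the sketch reaches four composite states and seven transitions, which agrees in number and edge labels with the composed machine on the slides. The tuples here are (M1 state, M2 state), so the second state comes out as (1, 0), whereas the flattened slide text shows it as 01.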

  19. Projection
     • Projection turns an FST into an FSA, accepting either the input language or the output language
     [Figure: an FST $M$ with states 0, 1, 2 and the FSA project($M$) with the same states]
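Projection can be sketched as dropping one of the two labels on every transition. This is a minimal illustration: the `project` function, its `side` parameter and the machine `M_ARCS` are hypothetical, and the resulting FSA transitions are written as (source, symbol, target) tuples.

```python
def project(arcs, side="input"):
    """Turn FST arcs (source, input, output, target) into FSA arcs (source, symbol, target)."""
    if side not in ("input", "output"):
        raise ValueError("side must be 'input' or 'output'")
    return {
        (source, inp if side == "input" else out, target)
        for (source, inp, out, target) in arcs
    }


# Hypothetical machine: rewrites the first and the last 'a' as 'b'.
M_ARCS = {(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)}

print(sorted(project(M_ARCS, "input")))
# [(0, 'a', 1), (1, 'a', 1), (1, 'a', 2), (1, 'b', 1)]
print(sorted(project(M_ARCS, "output")))
# [(0, 'b', 1), (1, 'a', 1), (1, 'b', 1), (1, 'b', 2)]
```

The states, start state and accepting states carry over unchanged; only the labels are reduced to a single symbol.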

  20. FST determinization
     • A deterministic FST has unambiguous transitions from every state on any input symbol
     • We can extend the subset construction to FSTs
     • Determinization often means converting to a subsequential FST
     • However, not all FSTs can be determinized
     [Figure: "Is this FST deterministic?" An FST with states 0 to 3 and edge labels a, a, a, b, b, a:b]
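The first bullet can be checked mechanically: no state may have two transitions on the same input symbol. The sketch below tests only that property, assuming no ϵ input labels; it does not decide whether a given FST can be determinized. The function name and both example machines are hypothetical.

```python
from collections import Counter


def is_input_deterministic(arcs):
    """True if no state has more than one outgoing arc for the same input symbol."""
    counts = Counter((source, inp) for (source, inp, _out, _target) in arcs)
    return all(n == 1 for n in counts.values())


# Hypothetical machine with two arcs on 'a' leaving state 1: not deterministic.
AMBIGUOUS = {(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)}
# Hypothetical machine with exactly one arc per (state, input symbol) pair.
UNAMBIGUOUS = {(0, "b", "b", 0), (0, "c", "c", 0), (0, "a", "a", 1),
               (1, "a", "a", 1), (1, "b", "c", 0), (1, "c", "c", 0)}

print(is_input_deterministic(AMBIGUOUS))    # False
print(is_input_deterministic(UNAMBIGUOUS))  # True
```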

  22. Sequential FSTs
     • A sequential FST has a single transition from each state on every input symbol
     • Output symbols can be strings, as well as ϵ
     • The recognition is linear in the length of input
     • However, sequential FSTs do not allow ambiguity
     [Figure: sequential FST with states 0 to 3 and edge labels including a, b, b:a, a:ab, a:ϵ]

  23. Subsequential FSTs
     [Figure: 2-subsequential FST with states 0, 1, 2 and edge labels including a, b, a:b, b:ϵ]
     • A k-subsequential FST is a sequential FST which can output up to k strings at an accepting state
     • Subsequential transducers allow limited ambiguity
     • Recognition time is still linear
     • The 2-subsequential FST above maps every string it accepts to two strings, e.g.,
       – baa → bba
       – baa → bbbb
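A subsequential transducer can be run with one dictionary lookup per input symbol, followed by appending each of the final output strings attached to the accepting state it ends in. The sketch below is hypothetical: the machine is not the one drawn on the slide, only a small 2-subsequential machine constructed to reproduce the two example mappings for baa, and the names `run_subsequential`, `DELTA` and `FINAL_OUTPUTS` are made up for illustration.

```python
def run_subsequential(delta, start, final_outputs, word):
    """Run a subsequential FST.

    delta maps (state, input symbol) to (next state, output string);
    final_outputs maps each accepting state to its set of final output strings.
    """
    state, produced = start, ""
    for symbol in word:
        if (state, symbol) not in delta:
            return set()  # no transition: the input is rejected
        state, emitted = delta[(state, symbol)]
        produced += emitted
    # Append each final output of the state we ended in (none if not accepting).
    return {produced + tail for tail in final_outputs.get(state, ())}


# Hypothetical 2-subsequential machine: one transition per (state, symbol) pair,
# outputs may be strings or empty (epsilon), two final outputs at state 2.
DELTA = {
    (0, "b"): (1, "b"),
    (1, "a"): (2, "b"),
    (2, "a"): (2, ""),
}
FINAL_OUTPUTS = {2: {"a", "bb"}}

print(run_subsequential(DELTA, 0, FINAL_OUTPUTS, "baa"))  # {'bba', 'bbbb'}
print(run_subsequential(DELTA, 0, FINAL_OUTPUTS, "ab"))   # set()
```

The loop does a constant amount of work per symbol, so the running time stays linear in the input length; the ambiguity is confined to the finite set of final outputs.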

  24. An exercise
     Convert the following FST to a subsequential FST
     [Figure: the FST to convert (states 0, 1, 2; edge labels a, b, a:a, a:b, b) and a second machine with states 0 to 5 and labels a, b, aa, ab, ba, bb]

  25. An exercise
     Convert the following FST to a subsequential FST
     [Figure: as on the previous slide, with additional edge labels a:ϵ, a:ϵ, b:ϵ, b]
