finite state transducers

Finite State Transducers - PDF document



  1. Finite state transducers
Data structures and algorithms for Computational Linguistics III
Çağrı Çöltekin, ccoltekin@sfs.uni-tuebingen.de
University of Tübingen, Seminar für Sprachwissenschaft
Winter Semester 2019–2020
Slide footer: Ç. Çöltekin, SfS / University of Tübingen, WS 19–20
Outline: Introduction, Operations on FSTs, Determinizing FSTs, Summary

Finite state transducers: a quick introduction
• A finite state transducer (FST) is a finite state machine where transitions are conditioned on a pair of symbols
• The machine moves between the states based on the input symbol, while it outputs the corresponding output symbol
• An FST encodes a relation, a mapping from one set to another
• The relation defined by an FST is called a regular (or rational) relation
[Figure: FST with states 0, 1, 2 and edges labeled a:a, a:b, b:b]
Example mappings: babba → babbb, aba → bbb, aba → abb

Formal definition
A finite state transducer is a tuple (Σ_i, Σ_o, Q, q_0, F, ∆)
• Σ_i is the input alphabet
• Σ_o is the output alphabet
• Q is a finite set of states
• q_0 is the start state, q_0 ∈ Q
• F is the set of accepting states, F ⊆ Q
• ∆ is a relation (∆ : Q × Σ_i → Q × Σ_o)

Where do we use FSTs? Uses in NLP/CL
• Morphological analysis
• Spelling correction
• Transliteration
• Speech recognition
• Grapheme-to-phoneme mapping
• Normalization
• Tokenization
• POS tagging (not typical, but done)
• Partial parsing / chunking
• …

Where do we use FSTs? Example 1: morphological analysis
[Figure: transducer with edges labeled c, a, t, d, o, g and a final edge s:⟨PL⟩]
In this lecture, we treat an FSA as a simple FST that outputs its input: edge label ‘a’ is a shorthand for ‘a:a’.

Where do we use FSTs? Example 2: POS tagging / shallow parsing
[Figure: tagging FST with states 0 to 5 and edges time:N, flies:N, flies:V, like:ADP, like:V, an:D, arrow:N]
[Figure: chunking FST with states 0, 1, 2 and edges ADJ:ϵ, DET:ϵ, N:NP, PROPN:NP]
Note: (1) It is important to express the ambiguity. (2) This gets interesting if we can ‘compose’ these automata.

Closure properties of FSTs
Like FSA, FSTs are closed under some operations.
• Concatenation
• Kleene star
• Complement (FSTs are not closed under complement)
• Reversal
• Union
• Intersection (FSTs are not closed under intersection)
• Inversion
• Composition

FST inversion
• Since an FST encodes a relation, it can be reversed
• The inverse of an FST swaps the input symbols with the output symbols
• We indicate the inverse of an FST M with M⁻¹
[Figure: M with edges a:b, a, b; M⁻¹ with the same states and edges b:a, a, b]
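The tuple definition and the inversion operation above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the class name `FST`, its methods, and the example machine `m` are all assumptions introduced here, and the sketch only handles single-symbol labels (no ϵ transitions). The relation ∆ is stored as a mapping from (state, input symbol) to a list of (output symbol, next state) pairs, so nondeterminism is allowed.

```python
from collections import defaultdict


class FST:
    """Toy finite state transducer (hypothetical helper, single-symbol labels only)."""

    def __init__(self, start, finals, arcs):
        # arcs: iterable of (source, input_symbol, output_symbol, target)
        self.start = start
        self.finals = set(finals)
        self.delta = defaultdict(list)
        for src, i, o, tgt in arcs:
            self.delta[(src, i)].append((o, tgt))

    def transduce(self, s):
        """Return the set of all output strings the FST relates to input s."""
        # Track every reachable (state, output so far) pair in parallel.
        configs = {(self.start, "")}
        for ch in s:
            configs = {
                (tgt, out + o)
                for state, out in configs
                for o, tgt in self.delta.get((state, ch), [])
            }
        return {out for state, out in configs if state in self.finals}

    def invert(self):
        """Swap input and output symbols on every arc."""
        arcs = [
            (src, o, i, tgt)
            for (src, i), outs in self.delta.items()
            for o, tgt in outs
        ]
        return FST(self.start, self.finals, arcs)


# Hypothetical example machine (not the one on the slide):
# it rewrites exactly one 'a' as 'b' and copies everything else.
m = FST(
    start=0,
    finals={1},
    arcs=[
        (0, "a", "a", 0), (0, "b", "b", 0),  # copy before the rewrite
        (0, "a", "b", 1),                    # the single a:b rewrite
        (1, "a", "a", 1), (1, "b", "b", 1),  # copy after the rewrite
    ],
)

print(sorted(m.transduce("aba")))           # one input, several outputs
print(sorted(m.invert().transduce("abb")))  # the inverse maps outputs back
```

Because the machine encodes a relation rather than a function, `transduce` returns a set; an input with no accepting path yields the empty set.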

  2. FST composition: sequential application
[Figure: M1 with states 0, 1, 2 and edges a:b, a, b, a:b; M2 with states 0, 1 and edges a, a, b, c, c, b:c]
Running M1 and then M2 on its output gives the composition M1 ◦ M2:
• aa →(M1) bb →(M2) bb
• bb →(M1) ∅ →(M2) ∅
• aaaa →(M1) baab →(M2) baac
• abaa →(M1) bbab →(M2) bbac
• Can we compose without running the FSTs sequentially?

FST composition
[Figure: M1 ◦ M2 built step by step over the paired states 00, 01, 11, 20, with edges including a:b, b:c and a:c]

Projection
• Projection turns an FST into an FSA, accepting either the input language or the output language
[Figure: M with edges a:b, a, b, a:b, and project(M)]

FST determinization
• A deterministic FST has unambiguous transitions from every state on any input symbol
• We can extend the subset construction to FSTs
• Determinization often means converting to a subsequential FST
• However, not all FSTs can be determinized
[Figure: "Is this FST deterministic?" FST with states 0, 1, 2, 3 and edges including a, b, a:b]
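The question of composing without running the machines one after the other can be illustrated with a product construction: a composed arc exists whenever the output symbol of an M1 arc matches the input symbol of an M2 arc. The sketch below is a hedged illustration under several assumptions: the tuple representation `(start, finals, arcs)`, the function names, and the machines `M1` and `M2` are hypothetical. `M1` and `M2` were chosen to reproduce the input/output pairs listed above, and they are not necessarily the transducers drawn on the slide. ϵ transitions are not handled, and the product builds arcs for all state pairs rather than only the reachable ones.

```python
def transduce(fst, s):
    """Return all outputs of fst = (start, finals, arcs) for input string s."""
    start, finals, arcs = fst
    configs = {(start, "")}
    for ch in s:
        configs = {
            (tgt, out + o)
            for state, out in configs
            for src, i, o, tgt in arcs
            if src == state and i == ch
        }
    return {out for state, out in configs if state in finals}


def compose(m1, m2):
    """Product construction: pair up arcs where m1's output equals m2's input."""
    s1, f1, a1 = m1
    s2, f2, a2 = m2
    arcs = [
        ((p, q), a, c, (p2, q2))
        for p, a, b, p2 in a1
        for q, b2, c, q2 in a2
        if b == b2
    ]
    finals = {(p, q) for p in f1 for q in f2}
    return ((s1, s2), finals, arcs)


def project(fst, side="input"):
    """Turn an FST into an FSA, written as an identity FST ('a' means 'a:a')."""
    start, finals, arcs = fst
    pick = (lambda i, o: i) if side == "input" else (lambda i, o: o)
    return (start, finals, [(s, pick(i, o), pick(i, o), t) for s, i, o, t in arcs])


# Hypothetical machines consistent with the examples above (an assumption).
M1 = (0, {2}, [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)])
M2 = (0, {0, 1}, [(0, "b", "b", 0), (0, "c", "c", 0), (0, "a", "a", 1),
                  (1, "a", "a", 1), (1, "b", "c", 0), (1, "c", "c", 0)])

M12 = compose(M1, M2)
for word in ["aa", "bb", "aaaa", "abaa"]:
    # Sequential application: feed every M1 output through M2.
    sequential = {z for y in transduce(M1, word) for z in transduce(M2, y)}
    print(word, sorted(transduce(M12, word)), sorted(sequential))
```

For each test word the composed machine and the sequential pipeline print the same set, which is the point of the construction: the intermediate string is never materialized.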

  3. Sequential FSTs
• A sequential FST has a single transition from each state on every input symbol
• Output symbols can be strings, as well as ϵ
• The recognition is linear in the length of input
• However, sequential FSTs do not allow ambiguity
[Figure: sequential FST with states 0, 1, 2, 3 and edges including a:ab, b:a, a, b]

Subsequential FSTs
• A k-subsequential FST is a sequential FST which can output up to k strings at an accepting state
• Subsequential transducers allow limited ambiguity
• Recognition time is still linear
[Figure: 2-subsequential FST with states 0, 1, 2, edges including a, b, a:b, and final output strings a and bb]
• The 2-subsequential FST above maps every string it accepts to two strings, e.g.,
  – baa → bba
  – baa → bbbb

An exercise
Convert the following FST to a subsequential FST
[Figure: FST with states 0, 1, 2 and edges a:a, a:b, a, b; determinized version with states 0, 1, 2, 3]

Another example
Can you convert the following FST to a subsequential FST?
[Figure: FST with states 0 to 5, edges including a:ϵ, b:ϵ, a, b, and output strings aa, ba, ab, bb]
Note that we cannot ‘determine’ the output on the first input until reaching the final input.

FSA vs FST
• FSA are acceptors, FSTs are transducers
• FSA accept or reject their input, FSTs produce output(s) for the inputs they accept
• FSA define sets, FSTs define relations between sets
• FSTs share many properties of FSAs. However,
  – FSTs are not closed under intersection and complement
  – We can compose (and invert) the FSTs
  – Determinizing FSTs is not always possible
• Both FSA and FSTs can be weighted (not covered in this course)

Next
• Practical applications of finite-state machines
  – String search (FSA)
  – Finite-state morphology (FST)
• Dependency grammars and dependency parsing
• Constituency (context-free) parsing

References / additional reading material
• Jurafsky and Martin (2009, Ch. 3)
• Additional references include:
  – Roche and Schabes (1996) and Roche and Schabes (1997): FSTs and their use in NLP
  – Mohri (2009): weighted FSTs

References
Jurafsky, Daniel and James H. Martin (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd ed. Pearson Prentice Hall. isbn: 978-0-13-504196-3.
Mohri, Mehryar (2009). “Weighted automata algorithms”. In: Handbook of Weighted Automata. Monographs in Theoretical Computer Science. Springer, pp. 213–254.
Roche, Emmanuel and Yves Schabes (1996). Introduction to Finite-State Devices in Natural Language Processing. Tech. rep. TR96-13. Mitsubishi Electric Research Laboratories. url: http://www.merl.com/publications/docs/TR96-13.pdf.
Roche, Emmanuel and Yves Schabes (1997). Finite-state Language Processing. A Bradford book. MIT Press. isbn: 9780262181822.
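Returning to the sequential and subsequential FSTs described earlier, the following minimal sketch shows why recognition stays linear: each input symbol triggers exactly one dictionary lookup, and the limited ambiguity only appears as a choice among final output strings. The function name `apply_subsequential`, the dictionaries `DELTA` and `FINAL_OUT`, and the machine itself are hypothetical; the machine was made up so that it reproduces the mapping baa → bba, baa → bbbb, and it is not claimed to be the one drawn on the slide.

```python
def apply_subsequential(delta, final_out, start, s):
    """Run a k-subsequential FST on s and return its set of outputs.

    delta:     {(state, symbol): (output_string, next_state)}, at most one
               entry per (state, symbol), so the machine is sequential
    final_out: {accepting_state: list of up to k final output strings}
    """
    state, pieces = start, []
    for ch in s:
        step = delta.get((state, ch))
        if step is None:  # no transition: the input is rejected
            return set()
        piece, state = step
        pieces.append(piece)  # output labels may be strings or "" (epsilon)
    prefix = "".join(pieces)
    # Non-accepting states have no final outputs, so the result is empty.
    return {prefix + tail for tail in final_out.get(state, ())}


# Hypothetical 2-subsequential machine (an assumption for illustration).
DELTA = {
    (0, "b"): ("b", 1),
    (1, "a"): ("b", 2),
    (2, "a"): ("", 2),  # epsilon output
}
FINAL_OUT = {2: ["a", "bb"]}  # two final strings at the accepting state

print(sorted(apply_subsequential(DELTA, FINAL_OUT, 0, "baa")))
print(sorted(apply_subsequential(DELTA, FINAL_OUT, 0, "ab")))
```

With a single final string per accepting state the same function behaves as a plain sequential transducer with a final output.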
