SLIDE 1

Finite State Transducers

Data structures and algorithms for Computational Linguistics III

Çağrı Çöltekin
ccoltekin@sfs.uni-tuebingen.de

University of Tübingen Seminar für Sprachwissenschaft

Winter Semester 2019–2020

SLIDE 2

Introduction Operations on FSTs Determinizing FSTs Summary

Finite state transducers

A quick introduction

  • A finite state transducer (FST) is a finite state machine whose transitions are conditioned on a pair of symbols
  • The machine moves between states based on the input symbol, while it outputs the corresponding output symbol
  • An FST encodes a relation, a mapping from one set of strings to another
  • The relation defined by an FST is called a regular (or rational) relation

[Figure: FST; state labels 1, 2; edge labels a:a, a:b, b:b, a:a, b:b, a:b]

Example mappings:

babba → babbb
aba → bbb
aba → abb

Ç. Çöltekin, SfS / University of Tübingen WS 19–20 1 / 17

SLIDE 4

Formal definition

A finite state transducer is a tuple (Σᵢ, Σₒ, Q, q₀, F, ∆)

  • Σᵢ is the input alphabet
  • Σₒ is the output alphabet
  • Q is a finite set of states
  • q₀ is the start state, q₀ ∈ Q
  • F is the set of accepting states, F ⊆ Q
  • ∆ is the transition relation (∆ : Q × Σᵢ → Q × Σₒ); because it is a relation, a state and an input symbol may lead to more than one (next state, output symbol) pair
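The tuple above maps fairly directly onto a small data structure. The following Python sketch is one possible encoding, not the course's implementation: the class name `FST`, the dictionary layout of `delta`, and the toy machine are assumptions made for illustration. The alphabets and Q are left implicit in `delta`.

```python
from dataclasses import dataclass


@dataclass
class FST:
    """Illustrative encoding of (Σi, Σo, Q, q0, F, ∆); all names are assumptions."""

    start: object      # q0
    accepting: set     # F
    delta: dict        # ∆: (state, input symbol) -> set of (next state, output symbol)

    def transduce(self, word):
        """Return the set of output strings related to `word` (empty if rejected)."""
        # A configuration is (current state, output produced so far).
        configs = {(self.start, "")}
        for symbol in word:
            configs = {
                (nxt, out + o)
                for state, out in configs
                for nxt, o in self.delta.get((state, symbol), ())
            }
        return {out for state, out in configs if state in self.accepting}


# Hypothetical one-state machine: every 'a' may stay 'a' or become 'b'.
toy = FST(
    start=0,
    accepting={0},
    delta={(0, "a"): {(0, "a"), (0, "b")}, (0, "b"): {(0, "b")}},
)
print(sorted(toy.transduce("ab")))  # ['ab', 'bb']
```

Keeping a set of configurations rather than a single state is what lets the same loop handle ambiguous transitions, at the cost of possibly tracking many partial outputs.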

SLIDE 5

Where do we use FSTs?

Uses in NLP/CL

  • Morphological analysis
  • Spelling correction
  • Transliteration
  • Speech recognition
  • Grapheme-to-phoneme mapping
  • Normalization
  • Tokenization
  • POS tagging (not typical, but done)
  • Partial parsing / chunking

SLIDE 6

Where do we use FSTs?

Example 1: morphological analysis

[Figure: FST; state labels 1 to 6; edge labels c, a, t, d, o, g and s:⟨PL⟩ (the words ‘cat’ and ‘dog’, with a final s mapped to ⟨PL⟩)]

In this lecture, we treat an FSA as a simple FST that outputs its input: the edge label ‘a’ is shorthand for ‘a:a’.
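A toy analyzer in the spirit of this example can be sketched in a few lines of Python. The function names `build_analyzer` and `analyze`, the state numbering, and the shared `"PL"` state are assumptions for illustration; the sketch also assumes that no stem in the lexicon is another stem followed by ‘s’.

```python
def build_analyzer(stems):
    """Build a deterministic FST: letters map to themselves, a final 's' to ⟨PL⟩."""
    delta = {}            # (state, input symbol) -> (output string, next state)
    accepting = {"PL"}    # shared state reached after the plural suffix
    fresh = 1             # next unused state number; 0 is the start state
    for stem in stems:
        state = 0
        for ch in stem:
            if (state, ch) not in delta:      # share arcs between common prefixes
                delta[(state, ch)] = (ch, fresh)
                fresh += 1
            state = delta[(state, ch)][1]
        accepting.add(state)                  # the bare stem is a word
        delta[(state, "s")] = ("⟨PL⟩", "PL")  # plural arc s:⟨PL⟩
    return delta, accepting


def analyze(word, delta, accepting):
    """Return the analysis of `word`, or None if the FST rejects it."""
    state, out = 0, []
    for ch in word:
        if (state, ch) not in delta:
            return None
        piece, state = delta[(state, ch)]
        out.append(piece)
    return "".join(out) if state in accepting else None


delta, accepting = build_analyzer(["cat", "dog"])
print(analyze("cats", delta, accepting))  # cat⟨PL⟩
print(analyze("dog", delta, accepting))   # dog
```

Inverting such a machine would give a generator that maps an analysis such as cat⟨PL⟩ back to the surface form.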

SLIDE 7

Where do we use FSTs?

Example 2: POS tagging / shallow parsing

[Figure: FST; state labels 1 to 5; edge labels time:N, flies:N, flies:V, like:ADP, like:V, an:D, arrow:N]

[Figure: FST; state labels 1, 2; edge labels DET:ϵ, ADJ:ϵ, N:NP, PROPN:NP]

Note: (1) It is important to express the ambiguity. (2) This gets interesting if we can ‘compose’ these automata.
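The ambiguity in the first automaton can be made concrete with a short Python sketch. It assumes the automaton is a simple chain with one position per word, so every combination of per-word arcs is a path; the word:tag arcs are the ones listed on the slide, while the names `ARCS` and `tag_sequences` are hypothetical.

```python
from itertools import product

# word -> possible output tags, taken from the edge labels on the slide
ARCS = {
    "time": ["N"],
    "flies": ["N", "V"],
    "like": ["ADP", "V"],
    "an": ["D"],
    "arrow": ["N"],
}


def tag_sequences(words):
    """Enumerate every tag sequence the chain-shaped FST relates to `words`."""
    options = [ARCS.get(word, []) for word in words]
    # An unknown word has no arc, so the product is empty and nothing is returned.
    return [list(tags) for tags in product(*options)]


for tags in tag_sequences("time flies like an arrow".split()):
    print(" ".join(tags))
```

The sentence has two ambiguous words with two tags each, so the transducer relates it to four tag sequences; a later component has to choose among them.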

SLIDE 8

Closure properties of FSTs

Like FSA, FSTs are closed under some operations.

  • Concatenation
  • Kleene star
  • Complement (not closed, unlike FSA)
  • Reversal
  • Union
  • Intersection (not closed, unlike FSA)
  • Inversion
  • Composition

SLIDE 9

FST inversion

  • Since an FST encodes a relation, it can be inverted
  • The inverse of an FST swaps the input symbols with the output symbols
  • We denote the inverse of an FST M by M⁻¹

[Figure: M; state labels 1, 2; edge labels a:b, a, b, a, b, a:b]

[Figure: M⁻¹; state labels 1, 2; edge labels b:a, a, b, a, b, b:a]
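Because inversion only swaps the two labels on each arc, it is a one-line operation on an arc list. The sketch below assumes arcs are stored as `(source, input, output, target)` tuples; the machine `M` is a hypothetical example, not the one drawn on the slide, and `outputs` is a helper added only to show the effect.

```python
def invert(arcs):
    """Swap the input and output label of every arc (src, inp, out, dst)."""
    return [(src, out, inp, dst) for src, inp, out, dst in arcs]


def outputs(arcs, start, accepting, word):
    """All output strings for `word`, following every matching arc."""
    configs = {(start, "")}
    for sym in word:
        configs = {
            (dst, out + o)
            for state, out in configs
            for src, i, o, dst in arcs
            if src == state and i == sym
        }
    return {out for state, out in configs if state in accepting}


# Hypothetical machine: rewrites a leading 'a' to 'b' and copies the rest.
M = [(1, "a", "b", 2), (2, "a", "a", 2), (2, "b", "b", 2)]
M_inv = invert(M)

print(outputs(M, 1, {2}, "ab"))      # {'bb'}
print(outputs(M_inv, 1, {2}, "bb"))  # {'ab'}
```

States, start state and accepting states are unchanged, and inverting twice gives back the original arcs.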

SLIDE 12

FST composition

Sequential application

[Figure: M1; state labels 1, 2; edge labels a:b, a, b, a:b]

[Figure: M2; state label 1; edge labels b, c, a, a, c, b:c]

Input → output of M1 → output of M2 (the result of M1 ◦ M2):

aa → bb → bb
bb → ∅ → ∅
aaaa → baab → baac
abaa → bbab → bbac

  • Can we compose without running the FSTs sequentially?

SLIDE 18

FST composition

[Figure: M1; state labels 1, 2; edge labels a:b, a, b, a:b]

[Figure: M2; state label 1; edge labels b, c, a, a, c, b:c]

[Figure: M1 ◦ M2, where each state pairs an M1 state with an M2 state; state labels 00, 01, 11, 20; edge labels a:b, a, a:b, b, a, a:c, b:c]
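The construction can be sketched as a product of the two machines: a composed state is a pair of states, and an arc x:z exists whenever M1 has x:y and M2 has y:z for some shared middle symbol y. The Python below is a minimal sketch for ϵ-free FSTs only. The arc lists for `M1` and `M2` are a reconstruction from the example mappings, not a copy of the diagrams; the start state 0 and the accepting state of `M2` in particular are assumptions.

```python
from collections import deque


def outputs(machine, word):
    """All output strings an epsilon-free FST (start, accepting, arcs) gives for `word`."""
    start, accepting, arcs = machine
    configs = {(start, "")}
    for sym in word:
        configs = {
            (dst, out + o)
            for state, out in configs
            for src, i, o, dst in arcs
            if src == state and i == sym
        }
    return {out for state, out in configs if state in accepting}


def compose(m1, m2):
    """Product construction for epsilon-free FSTs; arcs are (src, in, out, dst)."""
    (s1, f1, arcs1), (s2, f2, arcs2) = m1, m2
    start = (s1, s2)
    arcs, accepting, seen = [], set(), {start}
    queue = deque([start])
    while queue:
        q1, q2 = queue.popleft()
        if q1 in f1 and q2 in f2:
            accepting.add((q1, q2))
        for src1, x, y, dst1 in arcs1:
            if src1 != q1:
                continue
            for src2, y2, z, dst2 in arcs2:
                # Pair arcs whose middle symbol agrees: x:y in M1 with y:z in M2.
                if src2 == q2 and y2 == y:
                    target = (dst1, dst2)
                    arcs.append(((q1, q2), x, z, target))
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
    return start, accepting, arcs


# Assumed reconstruction of the two machines (start state 0 is a guess).
M1 = (0, {2}, [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)])
M2 = (0, {0}, [(0, "b", "b", 0), (0, "c", "c", 0), (0, "a", "a", 1),
               (1, "a", "a", 1), (1, "c", "c", 1), (1, "b", "c", 0)])
M12 = compose(M1, M2)

for w in ["aa", "bb", "aaaa", "abaa"]:
    print(w, sorted(outputs(M12, w)))
```

Under these assumptions the composed machine has four reachable states and reproduces the mappings from the sequential application (aa → bb, aaaa → baac, abaa → bbac, and no output for bb) in a single pass over the input.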

SLIDE 19

Projection

  • Projection turns an FST into an FSA that accepts either the input language or the output language

[Figure: M; state labels 1, 2; edge labels a:b, a, b, a:b]

[Figure: project(M); state labels 1, 2; edge labels a, a, a]
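On an arc-list representation, projection amounts to dropping one of the two labels. The sketch below assumes arcs are `(source, input, output, target)` tuples; the function name `project`, its `side` parameter, and the example arcs are hypothetical.

```python
def project(arcs, side="input"):
    """Turn FST arcs (src, inp, out, dst) into FSA arcs (src, symbol, dst)."""
    if side not in ("input", "output"):
        raise ValueError("side must be 'input' or 'output'")
    return [
        (src, inp if side == "input" else out, dst)
        for src, inp, out, dst in arcs
    ]


# Hypothetical FST arcs; states, start state and accepting states stay unchanged.
M = [(1, "a", "b", 2), (2, "a", "a", 2), (2, "b", "b", 2)]
print(project(M, "input"))   # [(1, 'a', 2), (2, 'a', 2), (2, 'b', 2)]
print(project(M, "output"))  # [(1, 'b', 2), (2, 'a', 2), (2, 'b', 2)]
```

The resulting FSA may be nondeterministic even when the FST was deterministic on its input, because two arcs from one state can share an output symbol.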

SLIDE 20

FST determinization

  • A deterministic FST has unambiguous transitions (at most one) from every state on any input symbol
  • We can extend the subset construction to FSTs
  • Determinization often means converting to a subsequential FST
  • However, not all FSTs can be determinized

Is this FST deterministic?

[Figure: FST; state labels 1, 2, 3; edge labels a:b, a, a, b, b, a]
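The condition in the first bullet can be checked mechanically. The following sketch assumes an ϵ-free FST stored as `(source, input, output, target)` arcs; the function name and the example arcs are hypothetical and do not reproduce the diagram above.

```python
from collections import Counter


def is_deterministic(arcs):
    """True if no state has two arcs (src, inp, out, dst) sharing an input symbol."""
    per_state_input = Counter((src, inp) for src, inp, _out, _dst in arcs)
    return all(count == 1 for count in per_state_input.values())


# Hypothetical arcs: state 1 has both a:b and a:a, so reading 'a' is ambiguous.
ambiguous = [(1, "a", "b", 2), (1, "a", "a", 3), (2, "b", "b", 2)]
print(is_deterministic(ambiguous))      # False
print(is_deterministic(ambiguous[1:]))  # True
```

Only the input side matters here: two arcs with the same input but different outputs already make the machine nondeterministic.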

SLIDE 22

Sequential FSTs

  • A sequential FST has a single transition from each state on every input symbol
  • The output on a transition can be a string of symbols, including ϵ
  • Recognition is linear in the length of the input
  • However, sequential FSTs do not allow ambiguity

[Figure: sequential FST; state labels 1, 2, 3; edge labels a:ab, b:a, a, a:ϵ, b, b, a, b]

SLIDE 23

Subsequential FSTs

  • A k-subsequential FST is a sequential FST which can output up to k strings at an accepting state
  • Subsequential transducers allow limited ambiguity
  • Recognition time is still linear

[Figure: 2-subsequential FST; state labels 1, 2; edge labels b:ϵ, a:b, b, a:b, a, b; final output strings a, bb]

  • The 2-subsequential FST above maps every string it accepts to two strings, e.g.,
    – baa → bba
    – baa → bbbb
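A subsequential transducer can be run with a single dictionary lookup per input symbol, with the extra output strings appended once at the end. The Python below is a sketch under assumptions: the transition table is a hypothetical machine chosen to reproduce the baa example, not necessarily the diagram on the slide, and the names `run_subsequential`, `delta` and `final_out` are made up for illustration.

```python
def run_subsequential(delta, final_out, start, word):
    """Run a subsequential FST; returns the set of outputs (empty if rejected)."""
    state, out = start, []
    for sym in word:                      # one lookup per symbol: linear time
        if (state, sym) not in delta:
            return set()
        piece, state = delta[(state, sym)]
        out.append(piece)                 # piece may be "" (ϵ) or a longer string
    prefix = "".join(out)
    # Accepting states are the keys of final_out; each final string is one output.
    return {prefix + tail for tail in final_out.get(state, ())}


# Hypothetical 2-subsequential machine.
delta = {
    (1, "b"): ("", 2),    # b:ϵ
    (2, "a"): ("b", 2),   # a:b
}
final_out = {2: ["a", "bb"]}  # up to k = 2 strings at the accepting state

print(sorted(run_subsequential(delta, final_out, 1, "baa")))  # ['bba', 'bbbb']
```

All ambiguity is confined to the final output strings, which is why the run itself never has to branch.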

SLIDE 25

An exercise

Convert the following FST to a subsequential FST

[Figure: FST; state labels 1, 2; edge labels a:a, a:b, b, a, b]

[Figure: the resulting subsequential FST; state labels 1 to 5; final output strings aa, ba, bb, ab; edge labels b, a, b, a:ϵ, b:ϵ, a:ϵ]

SLIDE 26

Determinizing FSTs

Another example

Can you convert the following FST to a subsequential FST?

[Figure: FST; state labels 1, 2, 3; edge labels a:b, a, a, a, b, a]

Note that we cannot ‘determine’ the output for the first input symbol until we reach the final input symbol.

SLIDE 28

FSA vs FST

  • FSA are acceptors, FSTs are transducers
  • FSA accept or reject their input, FSTs produce output(s) for the inputs they accept
  • FSA define sets, FSTs define relations between sets
  • FSTs share many properties of FSA. However,
    – FSTs are not closed under intersection and complement
    – We can compose (and invert) FSTs
    – Determinizing FSTs is not always possible
  • Both FSA and FSTs can be weighted (not covered in this course)

SLIDE 29

Next

  • Practical applications of finite-state machines
    – String search (FSA)
    – Finite-state morphology (FST)
  • Dependency grammars and dependency parsing
  • Constituency (context-free) parsing

SLIDE 30

References

References / additional reading material

  • Jurafsky and Martin (2009, Ch. 3)
  • Additional references include:
    – Roche and Schabes (1996) and Roche and Schabes (1997): FSTs and their use in NLP
    – Mohri (2009): weighted FSTs

SLIDE 31

References / additional reading material (cont.)

Jurafsky, Daniel and James H. Martin (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd ed. Pearson Prentice Hall. ISBN: 978-0-13-504196-3.

Mohri, Mehryar (2009). “Weighted automata algorithms”. In: Handbook of Weighted Automata. Monographs in Theoretical Computer Science. Springer, pp. 213–254.

Roche, Emmanuel and Yves Schabes (1996). Introduction to Finite-State Devices in Natural Language Processing. Tech. rep. TR96-13. Mitsubishi Electric Research Laboratories. URL: http://www.merl.com/publications/docs/TR96-13.pdf.

Roche, Emmanuel and Yves Schabes (1997). Finite-state Language Processing. A Bradford book. MIT Press. ISBN: 9780262181822.