Appeared in Proceedings of the Joint Meeting of the Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pp. 64-71, 2003.

Simpler and More General Minimization for Weighted Finite-State Automata

Jason Eisner
Department of Computer Science
Johns Hopkins University
Baltimore, MD, USA 21218-2691
jason@cs.jhu.edu

Abstract

Previous work on minimizing weighted finite-state automata (including transducers) is limited to particular types of weights. We present efficient new minimization algorithms that apply much more generally, while being simpler and about as fast. We also point out theoretical limits on minimization algorithms. We characterize the kind of "well-behaved" weight semirings where our methods work. Outside these semirings, minimization is not well-defined (in the sense of producing a unique minimal automaton), and even finding the minimum number of states is in general NP-complete and inapproximable.

1 Introduction

It is well known how to efficiently minimize a deterministic finite-state automaton (DFA), in the sense of constructing another DFA that recognizes the same language as the original but with as few states as possible (Aho et al., 1974). This DFA also has as few arcs as possible.

Minimization is useful for saving memory, as when building very large automata or deploying NLP systems on small hand-held devices. When automata are built up through complex regular expressions, the savings from minimization can be considerable, especially when it is applied at intermediate stages of the construction, since (for example) smaller automata can be intersected faster.

Recently the computational linguistics community has turned its attention to weighted automata that compute interesting functions of their input strings. A traditional automaton only returns a boolean from the set K = {true, false}, which indicates whether it has accepted the input. But a probabilistic automaton returns a probability in K = [0, 1], or equivalently, a negated log-probability in K = [0, ∞]. A transducer returns an output string from K = Δ* (for some alphabet Δ).

Celebrated algorithms by Mohri (1997; 2000) have recently made it possible to minimize deterministic automata whose weights (outputs) are log-probabilities or strings. These cases are of central interest in language and speech processing.

However, automata with other kinds of weights can also be defined. The general formulation of weighted automata (Berstel and Reutenauer, 1988) permits any weight set K, provided that appropriate operations ⊕ and ⊗ are supplied for combining weights from the different arcs of the automaton. The triple (K, ⊕, ⊗) is called a weight semiring and will be explained below. K-valued functions that can be computed by finite-state automata are called rational functions.
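To make the (K, ⊕, ⊗) formulation concrete, here is a minimal sketch in Python of the three weight semirings named above. The Semiring interface and the instance names are ours for illustration; they are not part of the paper.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

K = TypeVar("K")

@dataclass(frozen=True)
class Semiring(Generic[K]):
    """A weight semiring (K, plus, times): 'plus' combines the weights of
    alternative paths, 'times' combines the weights along a single path.
    (Illustrative sketch only, not the paper's code.)"""
    plus: Callable[[K, K], K]
    times: Callable[[K, K], K]
    zero: K   # identity for plus
    one: K    # identity for times

# The boolean semiring: a traditional (unweighted) automaton.
BOOLEAN = Semiring(plus=lambda a, b: a or b,
                   times=lambda a, b: a and b,
                   zero=False, one=True)

# The probability semiring: weights in [0, 1].
REAL = Semiring(plus=lambda a, b: a + b,
                times=lambda a, b: a * b,
                zero=0.0, one=1.0)

# The tropical semiring over negated log-probabilities in [0, inf].
TROPICAL = Semiring(plus=min,
                    times=lambda a, b: a + b,
                    zero=float("inf"), one=0.0)

assert TROPICAL.plus(3.0, 5.0) == 3.0   # best (lowest-cost) of two paths
```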
How does minimization generalize to arbitrary weight semirings? The question is of practical as well as theoretical interest. Some NLP automata use the real semiring (R, +, ×), or its log equivalent, to compute unnormalized probabilities or other scores outside the range [0, 1] (Lafferty et al., 2001; Cortes et al., 2002). Expectation semirings (Eisner, 2002) are used to handle bookkeeping when training the parameters of a probabilistic transducer. A byproduct of this paper is a minimization algorithm that works fully with those semirings, a new result permitting more efficient automaton processing in those situations.

Surprisingly, we will see that minimization is not even well-defined for all weight semirings! We will then (nearly) characterize the semirings where it is well-defined, and give a recipe for constructing minimization algorithms similar to Mohri's in such semirings.

Finally, we follow this recipe to obtain a specific, simple and practical algorithm that works for all division semirings. All the cases above either fall within this framework or can be forced into it by adding multiplicative inverses to the semiring. The new algorithm provides arguably simpler minimization for the cases that Mohri has already treated, and also handles additional cases.

2 Weights and Minimization

We introduce weighted automata by example. The transducer below describes a partial function from strings to strings. It maps aab ↦ xyz and bab ↦ wwyz. Why? Since the transducer is deterministic, each input (such as aab) is accepted along at most one path; the corresponding output (such as xyz) is found by concatenating the output strings found along the path. ε denotes the empty string.

[Figure: a deterministic transducer with states 0-5, whose arcs carry input:output labels including a:x, a:y, a:wwy, b:z, b:zz, b:ε, and b:wwzzz.]

δ and σ standardly denote the automaton's transition and output functions: δ(3, a) = 2 is the state reached by the a arc from state 3, and σ(3, a) = wwy is that arc's output.
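The following sketch (ours, not the paper's) shows how such a deterministic transducer computes its output: dictionaries encode δ and σ, and evaluation simply follows the unique path for the input while concatenating arc outputs. Only the two accepting paths described above are encoded; the figure's remaining arcs, and the choice of state 5 as the final state, are assumptions for illustration.

```python
from typing import Optional

# delta maps (state, input symbol) to the next state;
# sigma maps (state, input symbol) to that arc's output string.
delta = {(0, "a"): 1, (1, "a"): 2, (2, "b"): 5,
         (0, "b"): 3, (3, "a"): 2}
sigma = {(0, "a"): "x", (1, "a"): "y", (2, "b"): "z",
         (0, "b"): "",  (3, "a"): "wwy"}   # sigma(3, a) = wwy, as in the text
final = {5}   # assumed final state

def transduce(s: str, start: int = 0) -> Optional[str]:
    """Follow the unique path for s from the start state, concatenating
    the outputs of the arcs taken; return None if s is not accepted."""
    state, out = start, ""
    for c in s:
        if (state, c) not in delta:
            return None             # no outgoing arc: input rejected
        out += sigma[(state, c)]
        state = delta[(state, c)]
    return out if state in final else None

assert transduce("aab") == "xyz"    # x . y . z
assert transduce("bab") == "wwyz"   # "" . wwy . z
```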
