Rigorous Approximated Determinization of Weighted Automata



slide-1
SLIDE 1

Rigorous Approximated Determinization of Weighted Automata

Benjamin Aminof (Hebrew University)
Orna Kupferman (Hebrew University)
Robby Lampert (Weizmann Institute)
Israel

slide-2
SLIDE 2

Outline

• Weighted automata
• Determinizability of weighted automata
• Mohri’s determinization algorithm
• Approximated-determinization algorithm
• Correctness and termination
• Summary
• Future work

slide-3
SLIDE 3

Weighted Automata (WFA)

[Figure: WFA A with states q0, q1/0, q2/0, q3/0 and edge labels a,1 c,1 b,2 a,1 b,1 d,1]

Weight functions:
• c: transitions → ℝ≥0
• f: accepting states → ℝ≥0

Example costs:
• w=abc: cost(w)=(1+2+1)+0=4
• w=abbd: cost(w)=(1+1+1+1)+0=4
• w=abb: cost(w)=min{5,3}=3

slide-4
SLIDE 4

Weighted Automata – language

• A run of A on a word w = w_1…w_n is a sequence r = r_0 r_1 r_2 … r_n over Q such that r_0 ∈ Q_0 and, for all 1 ≤ i ≤ n, there is a transition from r_{i-1} to r_i labeled w_i.
• A run r is accepting iff r_n is accepting (the standard finite-word acceptance condition).
• L(A) = {w : A has an accepting run on w}

slide-5
SLIDE 5

Weighted Automata – costs

• The cost of a run r = r_0 r_1 r_2 … r_n is
  cost(r) = Σ_{i=1}^{n} c(r_{i-1}, w_i, r_i) + f(r_n)
  (defined only for accepting runs).
• The cost of a word w = w_1…w_n is
  cost(w) = min { cost(r) : r is an accepting run of A on w }.
• If w ∉ L(A) then cost(w) = ∞.
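The two definitions can be traced on the example automaton A from the earlier slide. The sketch below is only an illustration and not part of the talk: the transition table is an assumed encoding, reconstructed from the edge labels and the three worked costs (abc, abbd, abb), and the names TRANSITIONS, INITIAL, FINAL, and cost are hypothetical.

```python
import math

# Assumed encoding of A: (state, letter) -> list of (next state, weight).
TRANSITIONS = {
    ("q0", "a"): [("q1", 1), ("q2", 1)],
    ("q1", "b"): [("q1", 2)],
    ("q1", "c"): [("q3", 1)],
    ("q2", "b"): [("q2", 1)],
    ("q2", "d"): [("q3", 1)],
}
INITIAL = {"q0"}
FINAL = {"q1": 0, "q2": 0, "q3": 0}  # f: accepting state -> final weight


def cost(word):
    """Minimum over accepting runs of (sum of transition weights + final weight)."""
    best = {q: 0 for q in INITIAL}  # cheapest way to reach each state so far
    for letter in word:
        nxt = {}
        for q, c in best.items():
            for q2, w in TRANSITIONS.get((q, letter), []):
                nxt[q2] = min(nxt.get(q2, math.inf), c + w)
        best = nxt
    # No accepting run means the word is outside L(A), so the cost is infinity.
    return min((c + FINAL[q] for q, c in best.items() if q in FINAL),
               default=math.inf)


for w in ["abc", "abbd", "abb", "b"]:
    print(w, cost(w))
```

Tracking only the cheapest cost per state is enough here because cost(w) is a minimum over runs, so more expensive ways of reaching the same state can never win later.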

slide-6
SLIDE 6

Weighted Automata – more

• A WFA A is trim if each of its states is reachable from some initial state and can reach some accepting state.
• A WFA A is unambiguous (single-run) if it has at most one accepting run on every word.

slide-7
SLIDE 7

Applications of WFA

• Formal verification of quantitative properties
• Automatic speech recognition
• Image compression
• Pattern matching (widely used in computational biology)
• …

slide-8
SLIDE 8

A1 is non-determinizable

[Figure: WFA A1 with states q0, q1, q2, q3/0 and edge labels a,1 c,1 b,2 a,1 b,1 d,1]

• cost(ab^k c) = 2k+2, cost(ab^k d) = k+2
• After reading the word ab^k, the difference between the costs of reading c and d is k.
• For i ≠ j, a deterministic WFA must be in different states after reading ab^i and ab^j.
• A deterministic WFA must therefore have infinitely many (∞) states.

slide-9
SLIDE 9

Determinizability

• Weighted automata are not necessarily determinizable.
• Deciding whether a given weighted automaton is determinizable is an open problem.
• A sufficient condition for determinizability, together with an algorithm, is given in [Mohri ’97].

slide-10
SLIDE 10

A sufficient condition [Mohri ’97]

• The twins property: for every two states q, q’ ∈ Q and every two words u, v ∈ Σ*, if q, q’ ∈ δ(Q_0, u), q ∈ δ(q, v), and q’ ∈ δ(q’, v), then cost(q, v, q) = cost(q’, v, q’).
• In case the automaton is trim (no empty states) and unambiguous (single-run), the twins property is a characterization of determinizability.

[Figure: q and q’ are both reached from q0 on u, and each has a cycle on v]
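A single pair of words (u, v) can witness a violation of the twins property. The sketch below checks one such pair on the automaton A1 from the non-determinizability slide. It is a witness checker, not a decision procedure. The encoding of A1 is an assumption reconstructed from the edge labels and the stated costs, the function names are hypothetical, and cost(q, v, q) is simplified to the cheapest v-labeled cycle on q.

```python
import math

# Assumed encoding of A1: (state, letter) -> list of (next state, weight).
A1 = {
    ("q0", "a"): [("q1", 1), ("q2", 1)],
    ("q1", "b"): [("q1", 2)],
    ("q1", "c"): [("q3", 1)],
    ("q2", "b"): [("q2", 1)],
    ("q2", "d"): [("q3", 1)],
}
INITIAL = {"q0"}


def path_costs(start, word):
    """Map each state reachable from `start` on `word` to its cheapest path cost."""
    best = {start: 0}
    for letter in word:
        nxt = {}
        for q, c in best.items():
            for q2, w in A1.get((q, letter), []):
                nxt[q2] = min(nxt.get(q2, math.inf), c + w)
        best = nxt
    return best


def twins_violations(u, v):
    """Pairs of states reached on u that both cycle on v with different cycle costs."""
    reached = set()
    for q0 in INITIAL:
        reached.update(path_costs(q0, u))
    loops = {}
    for q in reached:
        back = path_costs(q, v)
        if q in back:  # q has a cycle labeled v
            loops[q] = back[q]
    return sorted((q, p) for q in loops for p in loops
                  if q < p and loops[q] != loops[p])


print(twins_violations("a", "b"))
```

With u = a and v = b, both q1 and q2 are reached and both cycle on b, but with costs 2 and 1, so the pair (q1, q2) is reported.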

slide-11
SLIDE 11

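Determinization algorithm [Mohri ’97] - example

[Figure: WFA A2 with states q0, q1, q2, q3/0 and edge labels a,3 c,5 a,4 d,4 b,2 b,3]

word / cost
ac / 8
bc / 7
ad / 8
bd / 7

[Figure: A2’, built step by step, with edge weights first shown as "?" and then filled in:
{(q0,0)} --a, min{3,4}=3--> {(q1,0), (q2,1)}   (residual of q2: 4-3)
{(q0,0)} --b, min{2,3}=2--> {(q1,0), (q2,1)}   (residual of q2: 3-2)
{(q1,0), (q2,1)} --c, 0+5=5--> {(q3,0)} /0
{(q1,0), (q2,1)} --d, 1+4=5--> {(q3,0)} /0
final weight of {(q3,0)}: 0+0]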
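The example can be reproduced with a weighted subset construction over (min, +). The following is a minimal sketch in the spirit of Mohri's algorithm, not a faithful reimplementation: the encoding of A2 is an assumption reconstructed from the edge labels and the word/cost table, the names are hypothetical, and the max_states cap is added here because the construction need not terminate in general.

```python
from collections import deque

# Assumed encoding of A2: (state, letter) -> list of (next state, weight).
A2 = {
    ("q0", "a"): [("q1", 3), ("q2", 4)],
    ("q0", "b"): [("q1", 2), ("q2", 3)],
    ("q1", "c"): [("q3", 5)],
    ("q2", "d"): [("q3", 4)],
}
INITIAL = {"q0": 0}
FINAL = {"q3": 0}


def determinize(trans, initial, final, alphabet, max_states=50):
    """Subset construction whose states are sets of (state, residual) pairs."""
    start = frozenset(initial.items())
    delta, fin = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        subset = queue.popleft()
        accepting = [r + final[q] for q, r in subset if q in final]
        if accepting:
            fin[subset] = min(accepting)
        for a in alphabet:
            reach = {}
            for q, r in subset:
                for q2, w in trans.get((q, a), []):
                    reach[q2] = min(reach.get(q2, float("inf")), r + w)
            if not reach:
                continue
            m = min(reach.values())  # weight emitted on the new edge
            target = frozenset((q, c - m) for q, c in reach.items())  # residuals
            delta[(subset, a)] = (target, m)
            if target not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state cap reached; may not terminate")
                seen.add(target)
                queue.append(target)
    return start, delta, fin


start, delta, fin = determinize(A2, INITIAL, FINAL, "abcd")
for (subset, a), (target, m) in delta.items():
    print(sorted(subset), a, m, sorted(target))
```

On this input the construction stops with three subset states and the edge weights a,3 b,2 c,5 d,5, matching the values filled in on the slide.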

slide-12
SLIDE 12

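Determinization algorithm - another example

[Figure: WFA A3 with states q0, q1, q2/1, q3/0 and edge labels a,1 d,2 a,3 d,1 b,4 b,1 c,2 c,2]

word / cost
ac^i / 3+2i+1
bc^i / 2+2i
ac^i d / 3+2i
bc^i d / 2+2i

[Figure: A3’:
{(q0,0)} --a,1--> {(q1,0), (q2,2)} /3
{(q0,0)} --b,1--> {(q1,3), (q2,0)} /1
{(q1,0), (q2,2)} /3 --c,2--> {(q1,0), (q2,2)} /3
{(q1,3), (q2,0)} /1 --c,2--> {(q1,3), (q2,0)} /1
{(q1,0), (q2,2)} /3 --d,2--> {(q3,0)} /0
{(q1,3), (q2,0)} /1 --d,1--> {(q3,0)} /0]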

slide-13
SLIDE 13

Determinization algorithm - non-determinizable example

[Figure: WFA A1 with states q0, q1, q2, q3/0 and edge labels a,1 c,1 b,2 a,1 b,1 d,1]

word / cost
ab^i c / 2+2i
ab^i d / 2+i

[Figure: A1’:
{(q0,0)} --a,1--> {(q1,0), (q2,0)} --b,1--> {(q1,1), (q2,0)} --b,1--> {(q1,2), (q2,0)} --b,1--> {(q1,3), (q2,0)}]
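The growth of the residuals can be reproduced by iterating a single step of the subset construction on A1. This is a sketch under assumptions: the encoding of A1 is reconstructed from the edge labels and the word/cost table, and step is a hypothetical helper that expects the letter to be enabled from the current subset.

```python
# Assumed encoding of A1: (state, letter) -> list of (next state, weight).
A1 = {
    ("q0", "a"): [("q1", 1), ("q2", 1)],
    ("q1", "b"): [("q1", 2)],
    ("q1", "c"): [("q3", 1)],
    ("q2", "b"): [("q2", 1)],
    ("q2", "d"): [("q3", 1)],
}


def step(subset, letter):
    """Return (edge weight, successor subset with normalized residuals)."""
    reach = {}
    for q, r in subset.items():
        for q2, w in A1.get((q, letter), []):
            reach[q2] = min(reach.get(q2, float("inf")), r + w)
    m = min(reach.values())  # assumes at least one transition on `letter`
    return m, {q: c - m for q, c in reach.items()}


weight, subset = step({"q0": 0}, "a")
history = [(weight, subset)]
for _ in range(3):
    weight, subset = step(subset, "b")
    history.append((weight, subset))

for weight, subset in history:
    print(weight, subset)
```

Each b-step emits weight 1 and raises the residual of q1 by one, so every step produces a subset not seen before and the construction never closes.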

slide-14
SLIDE 14

Determinization algorithm - a bad determinizable example

[Figure: WFA A1 with states q0, q1, q2, q3/0 and edge labels a,1 c,1 b,2 a,1 b,1 d,1, plus an extra label d]

word / cost
ab^i c / 2+2i
ab^i d / 2+i

[Figure: A1’:
{(q0,0)} --a,1--> {(q1,0), (q2,0)} --b,1--> {(q1,1), (q2,0)} --b,1--> {(q1,2), (q2,0)} --b,1--> {(q1,3), (q2,0)}]

slide-15
SLIDE 15

Mohri’s algorithm - remarks

• Mohri’s algorithm terminates iff the original automaton has the twins property.
• For trim and unambiguous WFAs, there is a polynomial-time algorithm for testing the twins property.
• There are determinizable WFAs that do not satisfy the twins property.

slide-16
SLIDE 16

Approximated determinization

Given a WFA A and an approximation factor t≥1, construct a deterministic WFA A’ such that for every word w we have cost(A,w) ≤ cost(A’,w) ≤ t ∙ cost(A,w).

Useful in two cases:
• When exact determinization is impossible.
• When the result of exact determinization is too large.
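The requirement can be tested by brute force on short words. The sketch below compares A1 with a hand-built deterministic automaton D whose edges all carry weight 2. D is an assumption of this sketch (its weights agree with the labels a,2 b,2 c,2 d,2 on the 2-determinization slide, but its shape is not taken from the talk), and the encoding of A1 and all function names are hypothetical as well.

```python
import math
from itertools import product

# Assumed encoding of A1 (nondeterministic) and of a deterministic candidate D.
A1 = {
    ("q0", "a"): [("q1", 1), ("q2", 1)],
    ("q1", "b"): [("q1", 2)],
    ("q1", "c"): [("q3", 1)],
    ("q2", "b"): [("q2", 1)],
    ("q2", "d"): [("q3", 1)],
}
D = {
    ("s0", "a"): [("s1", 2)],
    ("s1", "b"): [("s1", 2)],
    ("s1", "c"): [("s2", 2)],
    ("s1", "d"): [("s2", 2)],
}


def cost(trans, start, final, word):
    """Cheapest accepting run on `word`, or infinity if there is none."""
    best = {start: 0}
    for letter in word:
        nxt = {}
        for q, c in best.items():
            for q2, w in trans.get((q, letter), []):
                nxt[q2] = min(nxt.get(q2, math.inf), c + w)
        best = nxt
    return min((c + final[q] for q, c in best.items() if q in final),
               default=math.inf)


def is_t_approximation(t, max_len):
    """Check cost(A1,w) <= cost(D,w) <= t * cost(A1,w) for all w up to max_len."""
    for n in range(max_len + 1):
        for letters in product("abcd", repeat=n):
            w = "".join(letters)
            exact = cost(A1, "q0", {"q3": 0}, w)
            approx = cost(D, "s0", {"s2": 0}, w)
            if not (exact <= approx <= t * exact):
                return False
    return True


print(is_t_approximation(2, 5), is_t_approximation(1.5, 3))
```

A bounded check like this can refute a candidate (as it does for t = 1.5 on the word ac) but cannot prove the bound for all words; for D and t = 2 the bound follows from 2k+2 ≤ 2k+4 ≤ 4k+4 and k+2 ≤ 2k+4 ≤ 2k+4.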

slide-17
SLIDE 17

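Succinctness

[Figure: WFA A4 with edge labels Σ,t Σ,0 a,1 Σ,0 Σ,0 … Σ,0, where the chain after the a-edge has n-1 edges labeled Σ,0]

L_n = Σ* · a · Σ^{n-1}
L(A4) = Σ+

cost(w) = ∞ if w = ε
cost(w) = 1 if w ∈ L_n
cost(w) = t if w ∈ Σ+ \ L_n

• A deterministic equivalent requires 2^n states.
• A t-approximate deterministic automaton? 2 states.

[Figure: the 2-state automaton, carrying the label /t]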
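The 2-state claim can be sanity-checked without building A4, working directly from its cost function. In the sketch below the alphabet {a, b} and the behavior of the 2-state automaton (every nonempty word costs t) are assumptions, since the slide only states that 2 states suffice, and all function names are hypothetical.

```python
import math
from itertools import product


def cost_a4(word, n, t):
    """Cost function of A4 as stated on the slide, over the assumed alphabet {a, b}."""
    if not word:
        return math.inf                              # the empty word is rejected
    in_ln = len(word) >= n and word[-n] == "a"       # n-th letter from the end is a
    return 1 if in_ln else t


def cost_two_state(word, t):
    """Assumed 2-state deterministic automaton: every nonempty word costs t."""
    return math.inf if not word else t


def check(n, t, max_len):
    """Check the t-approximation sandwich on all words up to length max_len."""
    return all(
        cost_a4(w, n, t) <= cost_two_state(w, t) <= t * cost_a4(w, n, t)
        for k in range(max_len + 1)
        for w in map("".join, product("ab", repeat=k))
    )


print(check(n=3, t=2, max_len=8))
```

The sandwich holds because words in L_n give 1 ≤ t ≤ t and the remaining nonempty words give t ≤ t ≤ t·t, both of which only need t ≥ 1.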

slide-18
SLIDE 18
Approx. determinization algorithm [Buchsbaum-Giancarlo-Westbrook ’01]

• Based on Mohri’s algorithm.
• Relaxes the condition for unification of states: rather than requiring the residuals of corresponding states to be identical, it requires them to be close (within 1+ε of the smaller one).
• No guarantees about the new costs.
• No sufficient condition for termination.

slide-19
SLIDE 19

Our algorithm: t-determinization

• Determinization up to a factor t: the new cost of any accepted word w is between cost(w) and t·cost(w).
• How it differs from Mohri’s algorithm:
  - Weights are multiplied by t.
  - For each state in a subset we maintain a range of residues rather than a single one.
• The criterion for unification of states is relaxed (they may be non-identical).

slide-20
SLIDE 20

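2-determinization of A1

[Figure: WFA A1 with states q0, q1, q2, q3/0 and edge labels a,1 c,1 b,2 a,1 b,1 d,1]

[Figure: A1’, built step by step, with edge weights first shown as "?" and then filled in.
Subset states: {(q0,0,0)}, {(q1,-1,0), (q2,-1,0)}, {(q1,-1,1), (q2,-2,0)}, {(q3,-2,0)} /0.
Edge labels: a,2 b,2 c,2 d,2.
Annotations: "lower bound cost(w)", "upper bound t·cost(w)", and "residual ranges contain those of" (relating one subset state to another).]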
slide-21
SLIDE 21

2-determinization of A2

[Figure: a WFA labeled A5 with states q0/0, q1/0, q2/0, and its 2-determinization A5’.
Subset states of A5’: {(q0,0,0), (q1,0,0)} /0, {(q0,-1,0)} /0, {(q0,-1,0), (q2,0,2)} /0, {(q1,-2,0)} /0, {(q0,-2,0)} /0, {(q0,-2,0), (q1,0,4)} /0, {(q2,-4,0)} /0, {(q1,-4,0)} /0, {(q1,-6,0)} /0.
Edge labels appearing in the figure: a,1 a,2 a,4 b,1 b,2 b,4 c,2 c,4.]

slide-22
SLIDE 22

Correctness of the algorithm

• Thm: If the algorithm terminates on a given WFA A with result A’, then for every word w we have cost(A,w) ≤ cost(A’,w) ≤ t ∙ cost(A,w).

slide-23
SLIDE 23

Termination of the algorithm

• Thm: If a WFA has the t-twins property, then the algorithm terminates on it.
  - Assumption: the weights and the factor t are rational.
• Thm: A trim unambiguous WFA is t-determinizable iff it has the t-twins property.
• Thm: Deciding the t-twins property for trim unambiguous WFAs can be done in polynomial time.

slide-24
SLIDE 24

Summary

• Why approximate determinization?
  - Non-determinizable WFA
  - Equivalent deterministic WFA is large
• t-determinization algorithm
  - Weights multiplied by t
  - Use ranges rather than single residues
  - Collapse into an existing state whose ranges are contained in those of the new state
• A sufficient condition
  - The t-twins property
  - For unambiguous WFAs: characterizes t-determinizability
  - Decidable in polynomial time

slide-25
SLIDE 25

Future work

• Generalize the termination proof to the case where the weights and the factor t are real numbers (ℝ≥0).
• An algorithm to decide whether a WFA is determinizable. Alternatively, prove that it is undecidable.
