SLIDE 1

Two-Way Automata in Coq

Christian Doczkal and Gert Smolka

Interactive Theorem Proving, Nancy, France, August 24, 2016

computer science

saarland

university

SLIDE 2

Motivation

Myhill-Nerode in Isabelle/HOL based on regular expressions (Wu, Zhang, Urban ’11)

Various automata formalizations in dependent type theory:

◮ Myhill-Nerode based on automata in Nuprl (Constable ’00)
◮ Coq tactic for deciding RE equivalence (Coquand Siles ’11)
◮ Coq tactic for deciding Kleene algebras (Braibant Pous ’12)

Student Project: Elegant formalization of automata/Myhill-Nerode based on Ssreflect’s finite types.

Equivalence of DFAs, NFAs, and REs and constructive variant of Myhill-Nerode in Coq (Doczkal Kaiser Smolka ’13)

Today: Reduction from two-way automata to DFAs, based on the constructive Myhill-Nerode theorem and formalized constructively in Coq.

Christian Doczkal & Gert Smolka Two-Way Automata in Coq ITP 2016 2 / 22

SLIDE 3

Two-Way Automata

Another representation of regular languages
Essentially: “Read-only Turing machines without memory”
Introduced together with one-way automata (Rabin Scott ’59)
Reductions to DFAs in (Rabin Scott ’59) and (Shepherdson ’59)
Reduction from 2NFAs to NFAs for complement language (Vardi ’89)
Recent survey paper (Pighizzini ’13): “Two-Way Automata: Old and Recent Results”

SLIDE 4

Two-Way Automata

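2NFA $M = (Q, s, F, \delta)$ where

$Q$ is a finite type of states
$s : Q$ is the starting state
$F : 2^Q$ is the set of final states
$\delta : Q \to \Sigma \uplus \{\lhd, \rhd\} \to 2^{Q \times \{L,R\}}$ is the transition function.

[Figure: tape ⊲ b b a a a b ⊳ with the head in state s, driven by δ]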

SLIDE 7

Language of a Two-Way Automaton

Configurations on word $x$: $C_x := Q \times \{0, \dots, |x|+1\}$.

Step relation on $x$: $(p, i) \to_x (q, j) : C_x \to C_x \to \mathbb{B}$.

$$L(M) := \{\, x \in \Sigma^* \mid \exists q \in F.\ (s, 1) \to_x^* (q, |x|+1) \,\}$$

[Figure: tape ⊲ b b a a a b ⊳ with the head in state q, driven by δ]

1. Language membership is (obviously) decidable
2. Main Result: 2DFAs and 2NFAs accept exactly the regular languages

($M$ is deterministic (a 2DFA) if $\to_x$ is functional for all $x$.)
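
The decidability claim can be illustrated by a reachability search over the finitely many configurations. The following is a minimal Python sketch, not the Coq formalization: it assumes a dictionary encoding of δ and the ASCII characters `<` and `>` for the end markers, and the example automaton (words ending in `a`) and all names are hypothetical.

```python
# Hypothetical example 2NFA (an assumption, not from the slides): accepts words ending in 'a'.
# '<' and '>' stand in for the end markers; 'L'/'R' are the head directions.
START, FINAL = 0, {2}
DELTA = {
    (0, 'a'): {(0, 'R')}, (0, 'b'): {(0, 'R')},  # state 0: scan to the right marker
    (0, '>'): {(1, 'L')},                        # turn around at the right marker
    (1, 'a'): {(2, 'R')},                        # last letter is 'a': move back onto '>'
}

def steps(x, conf):
    """Successors of configuration (p, i) under the step relation on x."""
    tape = '<' + x + '>'           # positions 0 .. |x|+1
    p, i = conf
    for q, d in DELTA.get((p, tape[i]), set()):
        j = i - 1 if d == 'L' else i + 1
        if 0 <= j < len(tape):     # moves off the tape are not steps
            yield (q, j)

def accepts(x):
    """x in L(M) iff some (q, |x|+1) with q final is reachable from (s, 1)."""
    seen, todo = {(START, 1)}, [(START, 1)]
    while todo:
        for nxt in steps(x, todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return any((q, len(x) + 1) in seen for q in FINAL)
```

The search visits at most |Q| · (|x| + 2) configurations, which is why membership is decidable even though individual runs may loop.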

SLIDE 8

Two-Way vs. One-Way

$I_n := (a + b)^* a (a + b)^{n-1}$

automata model    size of minimal automaton
DFA               $O(2^n)$
NFA               $O(n)$
2DFA              $O(n)$

Cost (in the number of states) of simulating 2DFAs with DFAs is at least exponential.

Conjecture (Sakoda & Sipser ’78): The cost of simulating NFAs using 2DFAs is exponential.
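
The exponential gap for $I_n$ can be observed experimentally. The Python sketch below determinizes an (n+1)-state NFA for $I_n$ by the subset construction and counts the reachable subsets. The NFA encoding is an assumption made for illustration, and counting reachable subsets only measures the DFA obtained this way; it does not by itself prove minimality.

```python
def nfa_step(n, state, letter):
    """NFA for (a+b)* a (a+b)^(n-1): state 0 guesses the marked 'a', states 1..n count."""
    if state == 0:
        return {0, 1} if letter == 'a' else {0}
    return {state + 1} if state < n else set()   # state n is final and has no successors

def reachable_subsets(n):
    """Number of subsets reachable from {0} in the subset construction."""
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for letter in 'ab':
            nxt = frozenset(q for p in current for q in nfa_step(n, p, letter))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)
```

Each reachable subset records which of the last n letters were `a`, so the count doubles with every increment of n.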

SLIDE 9

Main Results

1. Vardi ’89: an $n$-state 2NFA for $L$ yields an NFA for $\overline{L}$ with at most $2^{2n}$ states
2. Shepherdson ’59: an $n$-state 2DFA for $L$ yields a DFA for $L$ with at most $(n+1)^{(n+1)}$ states
3. Shepherdson ’59 ∪ folklore ∪ Vardi ’89: an $n$-state 2NFA for $L$ yields a DFA for $L$ with at most $2^{n^2+n}$ states

SLIDE 11

Vardi Construction

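Input: 2NFA $M = (Q, s, F, \delta)$
Output: NFA accepting $\overline{L(M)}$

[Figure: (extended) word ⊲ b b a a a b ⊳ in $\overline{L(M)}$ with sets C0, ..., C7, one per tape position, forming a negative certificate]

Definition (Negative Certificate for $x$)

Sets $C_0, \dots, C_{|x|+1} \subseteq Q$ such that

1. $s \in C_1$
2. If $p \in C_i$ and $(p, i) \to_x (q, j)$, then $q \in C_j$
3. $F \cap C_{|x|+1} = \emptyset$.

Construct NFA $N$ that accepts words that have negative certificates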
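
Conditions 1 to 3 can be checked directly for a candidate family of sets. The Python sketch below does that and also computes the pointwise least family satisfying conditions 1 and 2, which is a negative certificate exactly when the word is rejected. The automaton (words ending in `a`), the `<`/`>` encoding of the end markers, and all function names are assumptions for illustration.

```python
# Hypothetical 2NFA accepting words that end in 'a'; '<' and '>' encode the end markers.
START, FINAL = 0, {2}
DELTA = {
    (0, 'a'): {(0, 'R')}, (0, 'b'): {(0, 'R')},
    (0, '>'): {(1, 'L')},
    (1, 'a'): {(2, 'R')},
}

def steps(x, p, i):
    """All (q, j) with (p, i) -> (q, j) on the word x."""
    tape = '<' + x + '>'
    for q, d in DELTA.get((p, tape[i]), set()):
        j = i - 1 if d == 'L' else i + 1
        if 0 <= j < len(tape):
            yield q, j

def is_negative_certificate(x, cert):
    """cert is a list of sets C_0 .. C_{|x|+1}; checks conditions 1-3."""
    if len(cert) != len(x) + 2 or START not in cert[1]:
        return False
    closed = all(q in cert[j]
                 for i, C in enumerate(cert) for p in C
                 for q, j in steps(x, p, i))
    return closed and not (FINAL & cert[-1])

def least_closed_family(x):
    """Smallest sets satisfying conditions 1 and 2: the states reachable at each position."""
    cert = [set() for _ in range(len(x) + 2)]
    cert[1].add(START)
    todo = [(START, 1)]
    while todo:
        p, i = todo.pop()
        for q, j in steps(x, p, i):
            if q not in cert[j]:
                cert[j].add(q)
                todo.append((q, j))
    return cert
```

For the rejected word `b`, `least_closed_family("b")` is itself a negative certificate; for the accepted word `a`, it violates condition 3, and so does every larger closed family.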

SLIDE 14

Vardi Construction

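[Figure: a run of N pairs up neighbouring sets of a certificate, (C0, C1), (C1, C2), (C2, C3), (C3, C4), reading b, b, a; a single transition from (C, D) to (D′, E) on a]

A transition from $(C, D)$ to $(D', E)$ on $a$ requires:

1. $D = D'$
2. If $p \in D$ and $(q, L) \in \delta\, p\, a$, then $q \in C$
3. If $p \in D$ and $(q, R) \in \delta\, p\, a$, then $q \in E$

$N := (Q', S', F', \delta')$ where

$Q' := 2^Q \times 2^Q$
$S' := \{\, (C_0, C_1) \mid s \in C_1,\ \forall p\, q.\ p \in C_0 \land (q, R) \in \delta\, p\, \lhd \to q \in C_1 \,\}$
$F' := \{\, (C, D) \mid F \cap D = \emptyset,\ \forall p\, q.\ p \in D \land (q, L) \in \delta\, p\, \rhd \to q \in C \,\}$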
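
The construction of $N$ can be tried out on a small automaton. The Python sketch below simulates $N$ on the fly (tracking the set of reachable pairs) and compares it against a direct membership test for $M$. It is an illustration under assumptions, not the Coq development: the example 2NFA (words ending in `a`), the `<`/`>` encoding of the end markers, and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical 2NFA accepting words that end in 'a'; '<' and '>' encode the end markers.
Q = (0, 1, 2)
START, FINAL = 0, frozenset({2})
DELTA = {
    (0, 'a'): {(0, 'R')}, (0, 'b'): {(0, 'R')},
    (0, '>'): {(1, 'L')},
    (1, 'a'): {(2, 'R')},
}
SUBSETS = [frozenset(c) for r in range(len(Q) + 1) for c in combinations(Q, r)]

def moves_ok(D, sym, C, E):
    """Left moves from D on sym land in C and right moves land in E.
    Passing None for C or E skips that direction (such moves leave the tape)."""
    for p in D:
        for q, d in DELTA.get((p, sym), set()):
            target = C if d == 'L' else E
            if target is not None and q not in target:
                return False
    return True

def vardi_accepts(x):
    """Run N on x, tracking the set of reachable pairs (C, D) in 2^Q x 2^Q."""
    # S': s in C1, and right moves on '<' from C0 land in C1.
    current = {(C0, C1) for C0 in SUBSETS for C1 in SUBSETS
               if START in C1 and moves_ok(C0, '<', None, C1)}
    for a in x:
        # (C, D) --a--> (D, E) whenever the moves from D on a respect C and E.
        current = {(D, E) for (C, D) in current for E in SUBSETS
                   if moves_ok(D, a, C, E)}
    # F': no final state in D, and left moves on '>' from D land in C.
    return any(not (FINAL & D) and moves_ok(D, '>', C, None)
               for (C, D) in current)

def accepts_2nfa(x):
    """Reference membership test for M by reachability over configurations."""
    tape = '<' + x + '>'
    seen, todo = {(START, 1)}, [(START, 1)]
    while todo:
        p, i = todo.pop()
        for q, d in DELTA.get((p, tape[i]), set()):
            j = i - 1 if d == 'L' else i + 1
            if 0 <= j < len(tape) and (q, j) not in seen:
                seen.add((q, j))
                todo.append((q, j))
    return any((q, len(x) + 1) in seen for q in FINAL)
```

On every word, `vardi_accepts(x)` should be the negation of `accepts_2nfa(x)`, since $N$ guesses a negative certificate one pair of neighbouring sets at a time.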

SLIDE 15

The Need for Runs

Lemma

$x \in L(N)$ iff there exists a negative certificate for $x$.

Usually: generalize to arbitrary states of $N$ and use induction on $x$
A direct inductive proof would require a nontrivial generalization

[Figure: certificate C0, ..., C, ..., Cn over ⊲ x ⊳; over the extended word ⊲ a x ⊳ the corresponding sets are marked unknown (?)]

Formalized proof employs an explicit notion of run (≈ 1/3 of proof)
Straightforward but tedious calculation

SLIDE 16

Vardi Result

Theorem

For every $n$-state 2NFA $M$ one can construct an NFA accepting $\overline{L(M)}$ that has at most $2^{2n}$ states.

(recall: $Q' := 2^Q \times 2^Q$ and $|2^Q| = 2^{|Q|}$ due to extensional representation)

Corollary

For every $n$-state 2NFA $M$ there exists a DFA accepting $L(M)$ that has at most $2^{2^{2n}}$ states.

SLIDE 17

Shepherdson Construction

Input: 2NFA $M = (Q, s, F, \delta)$
Output: DFA accepting $L(M)$ having at most $2^{|Q|^2+|Q|}$ states.

How to collect all the information $M$ can gather about its input in a single sweep?

SLIDE 18

Tables

[Figure: composite words ⊲ x z ⊳ and ⊲ y z ⊳ with states p, q, q′ marked at the boundary between the prefix and z]

$T : \Sigma^* \to 2^Q \times 2^{Q \times Q}$

First component ($2^Q$): possible states when first entering $z$
Second component ($2^{Q \times Q}$): enter/return relation when crossing from the right
The codomain $2^Q \times 2^{Q \times Q}$ is a finite type

“$T\,x$ abstracts away the first part of the composite word xy.”

If $T\,x = T\,y$, then $xz \in L(M)$ iff $yz \in L(M)$
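
A table can be computed by restricted reachability on the prefix alone. The Python sketch below is one possible reading, not the definition from the formalization: the first component collects the states in which $M$ can first reach position $|x|+1$ from $(s, 1)$; the second component is assumed to relate $p$ to $q$ when $M$, placed in state $p$ on position $|x|$, can come back to position $|x|+1$ in state $q$. The example automaton (words ending in `a`), the `<`/`>` encoding, and all names are hypothetical.

```python
# Hypothetical 2NFA accepting words that end in 'a'; '<' and '>' encode the end markers.
Q = (0, 1, 2)
START, FINAL = 0, {2}
DELTA = {
    (0, 'a'): {(0, 'R')}, (0, 'b'): {(0, 'R')},
    (0, '>'): {(1, 'L')},
    (1, 'a'): {(2, 'R')},
}

def reach(x, start, k=None):
    """States q with start ->* (q, |x|+1) on x, using no step that leaves position k."""
    tape = '<' + x + '>'
    seen, todo = {start}, [start]
    while todo:
        p, i = todo.pop()
        if i == k:
            continue                      # restricted run: it stops at position k
        for q, d in DELTA.get((p, tape[i]), set()):
            j = i - 1 if d == 'L' else i + 1
            if 0 <= j < len(tape) and (q, j) not in seen:
                seen.add((q, j))
                todo.append((q, j))
    return frozenset(q for q, i in seen if i == len(x) + 1)

def table(x):
    """T x as a pair (first-entry states, enter/return relation); an assumed reading."""
    k = len(x) + 1
    first = reach(x, (START, 1), k)
    second = frozenset((p, q) for p in Q for q in reach(x, (p, len(x)), k))
    return first, second

def accepts(x):
    """Unrestricted membership test for comparison."""
    return bool(FINAL & reach(x, (START, 1)))
```

Because runs stop at position $|x|+1$, the symbol stored there never matters, so `table(x)` depends only on `x`. For the example automaton, `table("ab") == table("b")`, and accordingly `abz` and `bz` are accepted for the same suffixes `z`.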

SLIDE 24

One-Way Sweep

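$T : \Sigma^* \to 2^Q \times 2^{Q \times Q}$

[Figure: one-way sweep over ⊲ a b b a a b a ⊳, computing the tables of successive prefixes up to T abbaaba]

$Q' := 2^Q \times 2^{Q \times Q}$
$s' := T\,\varepsilon$
$\delta'\,(T\,x)\,a := T(xa)$
surjectivity? right congruence?
$F' := \{\, T\,x \mid x \in L(M) \,\}$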

SLIDE 28

Reduction to Myhill-Nerode

Theorem (Myhill-Nerode)

Let $L$ be a decidable language, $X$ a finite type, and $T : \Sigma^* \to X$ such that

1. $T\,x = T\,y \to T(xa) = T(ya)$    ($T$ right congruent)
2. $T\,x = T\,y \to (x \in L \leftrightarrow y \in L)$    ($T$ refines $L$)

Then one can construct a DFA accepting $L$ that has at most $|X|$ states.

Proof: construct DFA $A = (Q', s', F', \delta')$ where

$Q' := T(\Sigma^*)$
$s' := T\,\varepsilon$
$\delta'\,q\,a := T((T^{-1}\,q)\,a)$
$F' := \{\, q \mid T^{-1}\,q \in L \,\}$

where $T^{-1} : Q' \to \Sigma^*$ satisfies $T(T^{-1}\,q) = q$
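
The proof construction can be sketched in Python: explore the image $T(\Sigma^*)$ breadth-first and keep one representative word per class to play the role of the inverse. The classifier `T`, the language `in_L`, and the alphabet below are hypothetical examples chosen for illustration, and the search terminates only because `T` has finite range.

```python
from collections import deque

ALPHABET = 'ab'

def in_L(x):
    """Example language (an assumption): even number of 'a's and last letter 'b'."""
    return x.count('a') % 2 == 0 and x.endswith('b')

def T(x):
    """Example classifier: right congruent and refines in_L."""
    return (x.count('a') % 2, x[-1:] == 'b')

def build_dfa(T, in_L, alphabet):
    """DFA with states T(Sigma*); rep[q] is a word with T(rep[q]) == q (the inverse)."""
    start = T('')
    rep = {start: ''}
    delta = {}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            nxt = T(rep[q] + a)            # delta' q a := T((inverse q) a)
            delta[(q, a)] = nxt
            if nxt not in rep:
                rep[nxt] = rep[q] + a
                queue.append(nxt)
    final = {q for q, word in rep.items() if in_L(word)}   # F' := { q | inverse q in L }
    return start, delta, final

def dfa_accepts(dfa, x):
    """Run the constructed DFA on x."""
    start, delta, final = dfa
    q = start
    for a in x:
        q = delta[(q, a)]
    return q in final
```

Right congruence is what makes `delta` independent of the chosen representative, and refinement is what makes `final` independent of it.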

SLIDE 29

Size Bounds

$T : \Sigma^* \to 2^Q \times 2^{Q \times Q}$

Lemma

$T$ is right congruent and refines $L(M)$.

Theorem (Vardi ’89)

For every $n$-state 2NFA $M$ one can construct a DFA accepting $L(M)$ that has at most $2^{n^2+n}$ states.

If $M$ is deterministic, $T$ can be replaced by $T' : \Sigma^* \to (Q + 1) \times (Q \Rightarrow_{\mathrm{fin}} Q + 1)$

Theorem (Shepherdson ’59)

For every $n$-state 2DFA $M$ one can construct a DFA accepting $L(M)$ that has at most $(n+1)^{(n+1)}$ states.
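
To get a feel for the bounds, the following Python sketch tabulates them for small $n$. It only evaluates the formulas stated on the slides; the function names are illustrative. The two table-based bounds are the sizes of the codomains: $|2^Q \times 2^{Q \times Q}| = 2^n \cdot 2^{n^2}$ and $|(Q+1) \times (Q \Rightarrow_{\mathrm{fin}} Q+1)| = (n+1) \cdot (n+1)^n$.

```python
def vardi_dfa_bound(n):
    """DFA obtained by determinizing the 2^(2n)-state NFA: 2^(2^(2n)) states."""
    return 2 ** (2 ** (2 * n))

def shepherdson_2nfa_bound(n):
    """|2^Q x 2^(Q x Q)| = 2^n * 2^(n*n) = 2^(n^2 + n)."""
    return 2 ** (n * n + n)

def shepherdson_2dfa_bound(n):
    """|(Q+1) x (Q -> Q+1)| = (n+1) * (n+1)^n = (n+1)^(n+1)."""
    return (n + 1) ** (n + 1)

# Columns: n, 2DFA bound, 2NFA bound via tables, 2NFA bound via the Vardi NFA.
for n in range(1, 5):
    print(n, shepherdson_2dfa_bound(n), shepherdson_2nfa_bound(n), vardi_dfa_bound(n))
```

Already for $n = 2$ the table-based bound for 2NFAs is 64, while determinizing the Vardi NFA gives the bound 65536.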

SLIDE 30

A Glimpse of Technicalities

$(p, i) \xrightarrow{k}_x (q, j) := (p, i) \to_x (q, j) \land i \neq k$

$T\,x := (\{\, q \mid (s, 1) \xrightarrow{|x|+1}{}_x^{*} (q, |x|+1) \,\}, \dots)$

Lemma

If $T\,x = T\,y$ then every run on $xz$ that starts and ends on $z$ has a corresponding run on $yz$.

Precisely: if $T\,x = T\,y$, $i \le |z|+1$, $1 \le j \le |z|+1$, and $1 \le k$, then
$(p, |x|+i) \xrightarrow{|x|+k}{}_{xz}^{*} (q, |x|+j)$ iff $(p, |y|+i) \xrightarrow{|y|+k}{}_{yz}^{*} (q, |y|+j)$.

Lemma

$T$ is right congruent and refines $L(M)$.

SLIDE 34

Taming Dependent Types

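$C_x := Q \times \{\, i : \mathbb{N} \mid i < |x|+2 \,\}$

$\_ \to_{\_} \_ : \forall x.\ C_x \to C_x \to \mathbb{B}$

Intended statement: $(p, i) \to_x (q, i+1)$

With explicit bound proofs: $(p, i, H_1) \to_x (q, i+1, H_2)$ where $H_1 : i < |x|+2$ and $H_2 : i+1 < |x|+2$

With inord: $(p, \mathrm{inord}\ i) \to_x (q, \mathrm{inord}\ (i+1))$ where $\mathrm{inord} : \mathbb{N} \to \{\, i : \mathbb{N} \mid i < |x|+2 \,\}$

Maintains the separation between stating properties and proving the bounds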

SLIDE 36

Formalization

Formalization:

Part                                          LoC
1FAs and Myhill-Nerode                        600
2FAs and simulation of DFAs                   160
Vardi construction (incl. runs)               150
Shepherdson construction (2NFAs and 2DFAs)    290
Total                                        1200

Prerequisites:

Coq         dependent types, types as first class objects
Ssreflect   discrete types, finite types, finite sets, quotients, . . .
Theory      DFAs, NFAs, (Myhill-Nerode) (Doczkal Kaiser Smolka ’13)

SLIDE 38

Conclusion & Future Work

Conclusion:

Translations from 2FAs to 1FAs defined and verified constructively
Translation employs constructive variant of Myhill-Nerode
Correctness proofs are relatively short but technical
https://www.ps.uni-saarland.de/extras/itp16-2FA

Future Work:

Translation from 2NFAs to NFAs (Kapoutsis ’05)
Other automata models: infinite words, infinite trees, alternating, . . .

Thank You! Questions?
