

slide-1
SLIDE 1

CALF: Categorical Automata Learning Framework

Gerco van Heerdt Matteo Sammartino Alexandra Silva May 23, 2017

1 / 61

slide-2
SLIDE 2

Active automata learning

◮ Active automata learning algorithms learn an automaton describing the behaviour of a system by providing inputs and observing outputs
◮ Enables verification methods that work on an automaton
◮ Allows comparison of different implementations of e.g. a network protocol

Capturing systems more precisely requires more complex types of automata and more complicated learning algorithms

Idea: understanding the main concepts on an abstract level helps in developing and reasoning about new algorithms

2 / 61

slide-3
SLIDE 3

CALF

Our Categorical Automata Learning Framework

◮ Gives an abstract view on the ingredients and constructions of learning algorithms, leading to new adaptations
◮ Also covers minimisation and equivalence testing
◮ Allows transferring optimisations among these areas

[Diagram: Automata Learning, Minimisation, and Testing, with numbered arrows (1, 2, 3) labelled "other automata types" and "optimisations"]

3 / 61

slide-4
SLIDE 4

Active learning of DFAs: the basic setting

◮ Finite alphabet A
◮ Target regular language L: A∗ → 2 = {0, 1}
◮ Oracle that can tell whether a given word is in L (membership queries)

The aim is to learn a DFA accepting L, in particular the minimal one

A simple data structure used to conjecture a DFA is the observation table

4 / 61

slide-5
SLIDE 5

Observation table

Given S, E ⊆ A∗, define
row_t : S → 2^E, row_t(s)(e) = L(se)
row_b : S · A → 2^E, row_b(sa)(e) = L(sae)

[Example table: columns indexed by E = {ε, a, aa}; upper part with row ε (S), lower part with rows a and b (S · A)]

S and E evolve throughout runs of learning algorithms
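The two row functions can be sketched in a few lines of Python. This is only an illustration: the function name `make_table`, the representation of words as strings, and the example target language (an even number of a's) are assumptions, not part of the slides.

```python
def make_table(member, S, E, alphabet):
    """Return (row_t, row_b) for a membership oracle `member`.

    Rows are tuples of booleans listed in the order of E.
    """
    # row_t(s)(e) = L(se)
    row_t = {s: tuple(member(s + e) for e in E) for s in S}
    # row_b(sa)(e) = L(sae)
    row_b = {(s, a): tuple(member(s + a + e) for e in E)
             for s in S for a in alphabet}
    return row_t, row_b


# Hypothetical target language: words with an even number of a's.
def member(word):
    return word.count("a") % 2 == 0


row_t, row_b = make_table(member, S=[""], E=["", "a"], alphabet="ab")
print(row_t[""])         # row of ε in the upper part
print(row_b[("", "a")])  # row of a in the lower part
```

Keeping the oracle as a plain function makes it easy to swap in a real system under learning later.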

5 / 61

slide-6
SLIDE 6

Hypothesis

Given an observation table defined by S, E ⊆ A∗, the hypothesis DFA is given by
H = {row_t(s) | s ∈ S} ⊆ 2^E
init ∈ H, init = row_t(ε)
δ: H × A → H, δ(row_t(s), a) = row_b(sa)
out: H → 2, out(row_t(s)) = row_t(s)(ε)

provided that ε ∈ S ∩ E and two properties hold

6 / 61

slide-7
SLIDE 7

Closedness and consistency

◮ Closedness states that each transition leads to a state of the hypothesis. The table is closed if for all t ∈ S · A there is s ∈ S such that row_t(s) = row_b(t)
◮ Consistency states that there is no ambiguity in determining transitions. The table is consistent if for all s1, s2 ∈ S with row_t(s1) = row_t(s2) we have, for any a ∈ A, row_b(s1a) = row_b(s2a)
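Both properties are straightforward to check by brute force. The sketch below assumes words are Python strings and uses a hypothetical target language (number of a's divisible by 3); the helper names `find_unclosed` and `find_inconsistency` are made up for illustration.

```python
def row(member, word, E):
    """Row of `word`: its membership results for every suffix in E."""
    return tuple(member(word + e) for e in E)


def find_unclosed(member, S, E, alphabet):
    """Return some t in S·A whose row matches no row of S, or None if closed."""
    top = {row(member, s, E) for s in S}
    for s in S:
        for a in alphabet:
            if row(member, s + a, E) not in top:
                return s + a
    return None


def find_inconsistency(member, S, E, alphabet):
    """Return (s1, s2, a) witnessing inconsistency, or None if consistent."""
    for s1 in S:
        for s2 in S:
            if row(member, s1, E) == row(member, s2, E):
                for a in alphabet:
                    if row(member, s1 + a, E) != row(member, s2 + a, E):
                        return (s1, s2, a)
    return None


# Hypothetical target language: number of a's divisible by 3.
def member(word):
    return word.count("a") % 3 == 0


unclosed = find_unclosed(member, [""], [""], "ab")
inconsistent = find_inconsistency(member, ["", "a", "aa"], [""], "ab")
```

Returning a witness instead of a boolean is a design choice: the witness is exactly what a learner needs in order to repair the table.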

7 / 61

slide-8
SLIDE 8

Closedness

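The table is closed if for all t ∈ S · A there is s ∈ S such that row_t(s) = row_b(t)

[Example table: column ε; upper row ε (entry 1); lower row a]

If no such s exists, add the word t to S

[Example table after adding a to S: column ε; upper rows ε (entry 1) and a; lower row aa (entry 1)]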

8 / 61

slide-9
SLIDE 9

Consistency

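The table is consistent if for all s1, s2 ∈ S with row_t(s1) = row_t(s2) we have, for any a ∈ A, row_b(s1a) = row_b(s2a)

[Example table: column ε; upper rows ε and a; lower row aa]

If row_b(s1a)(e) ≠ row_b(s2a)(e), add ae to E to distinguish row_t(s1) and row_t(s2)

[Example table after adding a to E: columns ε and a; upper rows ε and a; lower row aa]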

9 / 61

slide-10
SLIDE 10

Hypothesis construction

◮ State space: distinct top rows (image of row_t)
◮ Initial state: ε row
◮ Output: taken from ε column
◮ Transitions: appending symbols to row labels

[Example: observation table with columns ε, a and rows ε, a, aa, shown next to the resulting hypothesis DFA with a-transitions]

10 / 61
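The four ingredients listed on this slide translate directly into code. The following is a minimal sketch, assuming the table is already closed and consistent, that the empty string (ε) is in both S and E, and a hypothetical target language of words with an even number of a's.

```python
def hypothesis(member, S, E, alphabet):
    """Build the hypothesis DFA from a closed and consistent table."""
    def row(word):
        return tuple(member(word + e) for e in E)

    states = {row(s) for s in S}                  # distinct top rows
    init = row("")                                # the ε row
    delta = {(row(s), a): row(s + a)              # append a symbol to the label
             for s in S for a in alphabet}
    out = {row(s): member(s) for s in S}          # entry in the ε column
    return states, init, delta, out


def accepts(dfa, word):
    """Run the DFA on `word` and return the output of the state reached."""
    _, init, delta, out = dfa
    q = init
    for a in word:
        q = delta[(q, a)]
    return out[q]


# Hypothetical target language: words with an even number of a's.
def member(word):
    return word.count("a") % 2 == 0


dfa = hypothesis(member, S=["", "a"], E=[""], alphabet="ab")
```

States are represented by their rows, so two words in S with equal rows automatically denote the same state.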
slide-11
SLIDE 11

ID algorithm

Assume a given set S ⊆ A∗ such that for every state of the minimal DFA accepting L there is a word in S reaching that state

Closedness will automatically hold

1. Initialise E = {ε}
2. Enforce consistency
3. Construct the hypothesis

The hypothesis will be isomorphic to the minimal DFA
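The three steps can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the name `id_learn`, the string encoding of words, and the example language (number of a's divisible by 3) are assumptions, and S is assumed to contain the empty word.

```python
def id_learn(member, S, alphabet):
    """ID-style learning: S is assumed to reach every state of the minimal DFA."""
    E = [""]                                      # step 1: E = {ε}

    def row(word):
        return tuple(member(word + e) for e in E)

    def distinguisher():
        # Look for equal rows whose a-successors differ on some e in E.
        for s1 in S:
            for s2 in S:
                if row(s1) == row(s2):
                    for a in alphabet:
                        for e in E:
                            if member(s1 + a + e) != member(s2 + a + e):
                                return a + e
        return None

    d = distinguisher()                           # step 2: enforce consistency
    while d is not None:
        E.append(d)
        d = distinguisher()

    init = row("")                                # step 3: build the hypothesis
    delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
    out = {row(s): member(s) for s in S}
    return init, delta, out, E


def run(init, delta, out, word):
    q = init
    for a in word:
        q = delta[(q, a)]
    return out[q]


# Hypothetical target language: number of a's divisible by 3.
def member(word):
    return word.count("a") % 3 == 0


init, delta, out, E = id_learn(member, ["", "a", "aa"], "ab")
```

Each added suffix splits at least one group of equal rows, so the loop stops after at most |S| additions.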

11 / 61

slide-12
SLIDE 12

L⋆ algorithm

Assume an oracle that can tell whether a hypothesis accepts the right language, and if not provides a counterexample word (equivalence queries)

1. Enforce closedness and consistency
2. Construct the hypothesis
3. Ask the oracle if the hypothesis is correct
4. If not, add all prefixes of the counterexample to S and restart

The hypothesis will be correct after finitely many iterations, and it will be isomorphic to the minimal DFA
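A compact sketch of this loop is given below. Several things are assumptions made for illustration: the names `lstar` and `bounded_equivalence`, the string encoding of words, the example language, and in particular the equivalence oracle, which is approximated here by testing all words up to a fixed length rather than being a true oracle.

```python
from itertools import product


def run(init, delta, out, word):
    q = init
    for a in word:
        q = delta[(q, a)]
    return out[q]


def bounded_equivalence(member, alphabet, max_len):
    """Stand-in equivalence oracle: compare on all words up to max_len."""
    def oracle(init, delta, out):
        for n in range(max_len + 1):
            for letters in product(alphabet, repeat=n):
                w = "".join(letters)
                if run(init, delta, out, w) != member(w):
                    return w                      # counterexample
        return None
    return oracle


def lstar(member, equivalent, alphabet):
    S, E = [""], [""]

    def row(word):
        return tuple(member(word + e) for e in E)

    while True:
        while True:                               # step 1: closed and consistent
            top = {row(s) for s in S}
            unclosed = next((s + a for s in S for a in alphabet
                             if row(s + a) not in top), None)
            if unclosed is not None:
                S.append(unclosed)
                continue
            witness = next((a + e
                            for s1 in S for s2 in S if row(s1) == row(s2)
                            for a in alphabet for e in E
                            if member(s1 + a + e) != member(s2 + a + e)), None)
            if witness is not None:
                E.append(witness)
                continue
            break
        init = row("")                            # step 2: hypothesis
        delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
        out = {row(s): member(s) for s in S}
        cex = equivalent(init, delta, out)        # step 3: equivalence query
        if cex is None:
            return init, delta, out
        for i in range(1, len(cex) + 1):          # step 4: add all prefixes
            if cex[:i] not in S:
                S.append(cex[:i])


# Hypothetical target language: number of a's divisible by 3.
def member(word):
    return word.count("a") % 3 == 0


init, delta, out = lstar(member, bounded_equivalence(member, "ab", 6), "ab")
```

No membership results are cached here; a practical implementation would memoise `member` to avoid repeated queries.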

12 / 61

slide-13
SLIDE 13

DA of words

Given the language L: A∗ → 2, we have a DA accepting L:

◮ State space: A∗
◮ Initial state: ε ∈ A∗
◮ Output: L: A∗ → 2
◮ Transitions: c : A∗ × A → A∗, c(u, a) = ua

13 / 61

slide-14
SLIDE 14

Reachability map

If Q is a DA accepting L, there is a unique DA homomorphism r : A∗ → Q given by
r(ε) = init_Q
r(ua) = δ_Q(r(u), a)
called the reachability map, which assigns to each word the state it reaches in Q

Q is reachable if r is surjective: every state is reached by a word
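For a finite automaton the reachability map is a left fold over the word. The sketch below uses a made-up three-state example over the alphabet {a}, with transitions stored in a dictionary; the surjectivity check relies on the standard fact that every reachable state of an n-state automaton is reached by a word of length below n.

```python
from itertools import product


def r(init, delta, word):
    """Reachability map: r(ε) = init, r(ua) = δ(r(u), a)."""
    q = init
    for a in word:
        q = delta[(q, a)]
    return q


def is_reachable(states, init, delta, alphabet):
    """Check surjectivity of r using all words shorter than the number of states."""
    words = ("".join(p) for n in range(len(states))
             for p in product(alphabet, repeat=n))
    return {r(init, delta, w) for w in words} == set(states)


# Hypothetical example: state 2 has no incoming path from the initial state 0.
states = [0, 1, 2]
delta = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 0}
```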

14 / 61

slide-15
SLIDE 15

DA of languages

Given the language L: A∗ → 2, we have a DA accepting L:

◮ State space: 2^{A∗}
◮ Initial state: L ∈ 2^{A∗}
◮ Output: ε?: 2^{A∗} → 2, ε?(l) = l(ε)
◮ Transitions: ∂ : 2^{A∗} × A → 2^{A∗}, ∂(l, a)(v) = l(av)

e.g. ∂({a, ba, abb}, a) = {ε, bb}
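For finite languages represented as Python sets of strings, the output and transition maps can be written down directly. This is a small sketch under that representation assumption; the function names are illustrative.

```python
def eps(language):
    """Output map ε?: does the language contain the empty word?"""
    return "" in language


def derivative(language, a):
    """Transition map ∂(l, a): the words v such that av is in l."""
    return {w[1:] for w in language if w.startswith(a)}


def accepts(language, word):
    """Run the automaton of languages from the state `language` on `word`."""
    for a in word:
        language = derivative(language, a)
    return eps(language)


l = {"a", "ba", "abb"}
print(derivative(l, "a"))
```

The call reproduces the example on the slide, ∂({a, ba, abb}, a) = {ε, bb}, with ε encoded as the empty string.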

15 / 61

slide-16
SLIDE 16

Observability map

If Q is a DA accepting L, there is a unique DA homomorphism o : Q → 2^{A∗} given by
o(q)(ε) = out_Q(q)
o(q)(av) = o(δ_Q(q, a))(v)
called the observability map, which assigns to each state the language it accepts

The DA Q is observable if o is injective: different states accept different languages

A DA is minimal if it is both reachable and observable
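The recursive definition of o can be transcribed literally. In the sketch below, injectivity is tested by comparing states on all words shorter than the number of states, which is enough to separate any two inequivalent states of a finite DFA; the four-state example is hypothetical.

```python
from itertools import product


def o(delta, out, q, word):
    """Observability map: o(q)(ε) = out(q), o(q)(av) = o(δ(q, a))(v)."""
    if word == "":
        return out[q]
    return o(delta, out, delta[(q, word[0])], word[1:])


def is_observable(states, delta, out, alphabet):
    """Check injectivity of o on words shorter than the number of states."""
    words = ["".join(p) for n in range(len(states))
             for p in product(alphabet, repeat=n)]
    signatures = [tuple(o(delta, out, q, w) for w in words) for q in states]
    return len(set(signatures)) == len(signatures)


# Hypothetical example over {a}: states 1 and 3 accept the same language.
states = [0, 1, 2, 3]
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 2}
out = {0: False, 1: False, 2: True, 3: False}
```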

16 / 61

slide-17
SLIDE 17

Total response

The language L: A∗ → 2 induces DAs A∗ and 2^{A∗} accepting L

The reachability map of 2^{A∗} coincides with the observability map of A∗ in the DA homomorphism called the total response of L:
t_L : A∗ → 2^{A∗}, t_L(u)(v) = L(uv)

If Q is any DA accepting L, then t_L = o ◦ r, i.e. t_L = (A∗ −r→ Q −o→ 2^{A∗})

17 / 61

slide-18
SLIDE 18

Function factorisation

Every function f : B → C can be written as a surjection followed by an injection, f = m ◦ e, with
e : B → im(f), e(b) = f(b)
m : im(f) → C, m(c) = c
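On finite sets the image factorisation is easy to compute. The helper name `factorise` and the example function below are assumptions chosen for illustration.

```python
def factorise(f, domain):
    """Split f into a surjection e onto im(f) followed by the inclusion m."""
    image = {f(b) for b in domain}

    def e(b):
        return f(b)      # same values as f, but with codomain im(f)

    def m(c):
        return c         # inclusion of im(f) into the codomain

    return e, image, m


def f(x):
    return x % 3


domain = range(10)
e, image, m = factorise(f, domain)
```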

18 / 61

slide-19
SLIDE 19

Factorisation uniqueness

In a commutative square of functions
i : U → V, g : U → W, h : V → X, j : W → X, with h ◦ i = j ◦ g,
where i is surjective and j injective, there is a unique diagonal d : V → W making the triangles commute: d(i(u)) = g(u)

19 / 61

slide-20
SLIDE 20

DA homomorphism factorisation

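In an image factorisation f = m ◦ e of f : B → C, with e : B → im(f) and m : im(f) → C, if f is a DA homomorphism, then so are e and m, given this DA structure on im(f):

◮ Initial state: initial state of C
◮ Output: output of C
◮ Transitions: the unique diagonal δ_im(f) : im(f) × A → im(f) in the square formed by
e × id_A : B × A → im(f) × A (surjective)
e ◦ δ_B : B × A → im(f)
δ_C ◦ (m × id_A) : im(f) × A → C
m : im(f) → C (injective)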

20 / 61

slide-21
SLIDE 21

Minimal DA

The minimal DA accepting L: A∗ → 2 can be obtained in theory by factorising the total response t_L : A∗ → 2^{A∗} as t_L = m ◦ e, with e : A∗ → M surjective and m : M → 2^{A∗} injective

Since e and m are DA homomorphisms, we must have e = r_M and m = o_M by the uniqueness properties, so t_L = o_M ◦ r_M

21 / 61
slide-22
SLIDE 22

Minimisation

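Similarly, the reachable part R of a DA Q is obtained by factorising its reachability map r : A∗ → Q as a surjection A∗ → R followed by an injection R → Q

Equivalent states are merged by factorising the observability map o : Q → 2^{A∗} as a surjection Q → O followed by an injection O → 2^{A∗}

22 / 61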
slide-23
SLIDE 23

The hypothesis approximates the minimal DA

t_L = (A∗ −r→ M −o→ 2^{A∗})

Concretely, the minimal DA is given by
M = {t_L(u) | u ∈ A∗}
init ∈ M, init = t_L(ε)
δ: M × A → M, δ(t_L(u), a) = t_L(ua)
out: M → 2, out(t_L(u)) = t_L(u)(ε)

This is equivalent to the hypothesis for S = E = A∗

23 / 61

slide-24
SLIDE 24

Abstract automaton

Given a category C, objects I and O in C, and a functor F : C → C, an automaton is an object Q in C with three morphisms:
δ : FQ → Q
init : I → Q
out : Q → O

24 / 61

slide-25
SLIDE 25

DAs as automata

For DAs:

◮ A singleton 1 serves as the initial state selector
◮ The set 2 = {0, 1} captures rejection (0) and acceptance (1)
◮ The functor (−) × A provides the transition domain

The general shape δ : FQ → Q, init : I → Q, out : Q → O becomes δ : Q × A → Q, init : 1 → Q, out : Q → 2

25 / 61

slide-26
SLIDE 26

Reachability and observability maps

Assume an initial object @ among automata without output and a final object Ω among automata without initial state. For an automaton Q these give a reachability map r : @ → Q and an observability map o : Q → Ω:

[Diagram: F@ −Fr→ FQ −Fo→ FΩ above @ −r→ Q −o→ Ω, connected by the transition morphisms, together with init : I → Q and out : Q → O]

Languages can be defined as morphisms I → Ω or @ → O, which correspond bijectively to each other through the total response

The total response may be defined as the reachability map of Ω or as the observability map of @

26 / 61

slide-27
SLIDE 27

Factorisation system

We assume two classes of C-morphisms:

◮ “surjective” morphisms E and
◮ “injective” morphisms M

such that

◮ every C-morphism f : A → B can be factored as f = m ◦ e, with e ∈ E and m ∈ M;
◮ E and M are closed under composition and contain all isos;
◮ everything in E is an epi, and everything in M is a mono; and
◮ we have the unique diagonal property that does not fit on this slide but is the same as before

Lifts to the category of automata if F preserves E

27 / 61

slide-28
SLIDE 28

28 / 61

slide-29
SLIDE 29

Approximating an object

A wrapper for an object T is a pair of morphisms w = (S −σ→ T, T −π→ P)

◮ T is called the target of w
◮ σ selects from T
◮ π classifies T

The (unstructured) hypothesis H is the image of ξ = π ◦ σ, that is, ξ = m ◦ e with e : S → H surjective and m : H → P injective

29 / 61
slide-30
SLIDE 30

Observation table wrapper

For S, E ⊆ A∗, we have a wrapper for the minimal DA M for L:
(S −α→ A∗ −r→ M, M −o→ 2^{A∗} −ω→ 2^E), where

◮ α is the inclusion and
◮ ω restricts to E

Recall o ◦ r = t_L and note that ξ = ω ◦ t_L ◦ α = row_t

The image of row_t is precisely the state space of the hypothesis in learning

30 / 61

slide-31
SLIDE 31

Approximating algebraic structure

Consider a wrapper w = (S −σ→ T, T −π→ P)

Given a functor F and an F-algebra f : FT → T, we have the approximation
ξ_f = π ◦ f ◦ Fσ, i.e. ξ_f = (FS −Fσ→ FT −f→ T −π→ P)

31 / 61

slide-32
SLIDE 32

Approximating the minimal DFA transition function

For the observation table wrapper (S −α→ A∗ −r→ M, M −o→ 2^{A∗} −ω→ 2^E) and transition function δ: M × A → M, we have ξ_δ = row_b (up to S × A ≅ S · A):

[Diagram: S × A −α×id_A→ A∗ × A −r×id_A→ M × A −δ→ M −o→ 2^{A∗} −ω→ 2^E, where the square formed by c : A∗ × A → A∗ and r : A∗ → M commutes and o ◦ r = t_L]

row_b(sa)(e) = L(sae)

32 / 61

slide-33
SLIDE 33

Closedness and consistency

A wrapper (S −σ→ T, T −π→ P) is f-closed, for f : FT → T, if a morphism close : FS → H exists making the left triangle commute, i.e. m ◦ close = ξ_f

It is f-consistent if a morphism cons : FH → P exists making the right triangle commute, i.e. cons ◦ Fe = ξ_f

[Diagram: square with Fe : FS → FH on top, m : H → P at the bottom, diagonal ξ_f : FS → P, and the triangles formed by close and cons; for the observation table wrapper, FS = S × A, FH = H × A, P = 2^E and ξ_f = ξ_δ]

For the observation table wrapper, δ-closedness and δ-consistency are the classical notions of closedness and consistency

33 / 61

slide-34
SLIDE 34

Structured hypothesis

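If (S −σ→ T, T −π→ P) is f-closed and f-consistent, for f : FT → T, we have an algebra θ : FH → H, the unique diagonal of the square formed by Fe : FS → FH, close : FS → H, cons : FH → P and m : H → P

[Diagram: the same square for the observation table wrapper, with e × id_A : S × A → H × A, close : S × A → H, cons : H × A → 2^E and m : H → 2^E]

For an observation table wrapper and f = δ_M, θ = δ_H

(We only consider F that preserve “surjective” morphisms)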

34 / 61

slide-35
SLIDE 35

Initial state

The initial state of M can be seen as an algebra init: 1 → M for 1 = {∗} an arbitrary singleton

This gives a closedness property init-closedness (init-consistency is trivial) stating that there must be s ∈ S s.t. ξ(s) = ξ_init(∗), where
ξ_init : 1 → 2^E, ξ_init(∗)(e) = L(e)
is the row of the empty word

Thus, this property is weaker than requiring ε ∈ S

35 / 61

slide-36
SLIDE 36

Output

The set of accepting states can be seen as a coalgebra out: M → 2 for 2 = {0, 1}

This gives a consistency property (technically coclosedness) stating that for all s1, s2 ∈ S s.t. ξ(s1) = ξ(s2) we must have ξ_out(s1) = ξ_out(s2), where
ξ_out : S → 2, ξ_out(s) = L(s)
is the column of the empty word

Again, this property is weaker than requiring ε ∈ E

36 / 61

slide-37
SLIDE 37

Simple correctness conditions

Consider a wrapper (S −σ→ T, T −π→ P)

If σ is surjective, the square π ◦ σ = m ◦ e has a diagonal φ : T → H with φ ◦ σ = e and m ◦ φ = π

If π is injective, the square m ◦ e = π ◦ σ has a diagonal ψ : H → T with ψ ◦ e = σ and π ◦ ψ = m

If both of these hold, then φ and ψ are inverse to each other

37 / 61

slide-38
SLIDE 38

Results

Consider a wrapper (S −σ→ T, T −π→ P)

If σ is surjective, we have a diagonal φ : T → H with φ ◦ σ = e and m ◦ φ = π

◮ For any f : FT → T, the wrapper is f-closed
◮ If the wrapper is f-consistent, φ is an F-algebra homomorphism

38 / 61

slide-39
SLIDE 39

Results

Consider a wrapper (S −σ→ T, T −π→ P)

If π is injective, we have a diagonal ψ : H → T with ψ ◦ e = σ and π ◦ ψ = m

◮ For any f : FT → T, the wrapper is f-consistent
◮ If the wrapper is f-closed, ψ is an F-algebra homomorphism

39 / 61

slide-40
SLIDE 40

Results

Consider a wrapper (S −σ→ T, T −π→ P)

If σ is surjective and π injective, we have diagonals φ : T → H (with φ ◦ σ = e, m ◦ φ = π) and ψ : H → T (with ψ ◦ e = σ, π ◦ ψ = m)

◮ φ and ψ are inverse to each other
◮ For any f : FT → T, the wrapper is f-closed and f-consistent
◮ φ and ψ are F-algebra homomorphisms (and thus isos)

40 / 61

slide-41
SLIDE 41

Simple correctness conditions for observation tables

For an observation table wrapper (S −α→ A∗ −r→ M, M −o→ 2^{A∗} −ω→ 2^E),

◮ r ◦ α is surjective if and only if for each state of M there is a word in S reaching that state
◮ ω ◦ o is injective if and only if for each pair of distinct states of M there is a word in E on which they behave differently

41 / 61

slide-42
SLIDE 42

Less simple correctness conditions (1)

Let (S −σ→ Q, Q −π→ P) be a wrapper for an automaton Q

If

◮ σ is surjective;
◮ Q is observable;
◮ the wrapper is out-consistent; and
◮ the wrapper is δ-consistent

then H is an automaton isomorphic to Q

42 / 61

slide-43
SLIDE 43

Less simple correctness conditions (2)

Let (S −σ→ Q, Q −π→ P) be a wrapper for an automaton Q

If

◮ π is injective;
◮ Q is reachable;
◮ the wrapper is init-closed; and
◮ the wrapper is δ-closed

then H is an automaton isomorphic to Q

43 / 61

slide-44
SLIDE 44

ID correctness

If

◮ σ is surjective;
◮ Q is observable;
◮ the wrapper is out-consistent; and
◮ the wrapper is δ-consistent

then H is an automaton isomorphic to Q

◮ ID assumes a set S such that σ = (S −α→ A∗ −r→ M) is surjective
◮ Q = M is observable by definition
◮ out-consistency holds because ε ∈ E
◮ δ-consistency is what the algorithm enforces

44 / 61

slide-45
SLIDE 45

L⋆ correctness

If

◮ σ is surjective;
◮ Q is observable;
◮ the wrapper is out-consistent; and
◮ the wrapper is δ-consistent

then H is an automaton isomorphic to Q

◮ Adding all prefixes of a counterexample to S increases the image of σ = (S −α→ A∗ −r→ M)
◮ Q = M is observable by definition
◮ out-consistency holds because ε ∈ E
◮ δ-consistency is enforced before constructing the hypothesis

45 / 61

slide-46
SLIDE 46

Reachability analysis

To find the reachable part R of a known DFA Q, we can use a wrapper of inclusions (S → R, R → Q), where S ⊆ R

◮ init-closedness: init_R ∈ S
◮ δ-closedness: for each s ∈ S and a ∈ A, δ_R(s, a) ∈ S

Since init_R = init_Q and δ_R(s, a) = δ_Q(s, a), this leads to the usual algorithm:

◮ initialise S = {init_Q}
◮ while there are s ∈ S and a ∈ A with δ_Q(s, a) ∉ S, add δ_Q(s, a) to S
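The loop above is the familiar worklist exploration. A minimal sketch, assuming the transition function is stored as a dictionary keyed by (state, symbol) and using a made-up three-state example:

```python
def reachable_part(init, delta, alphabet):
    """Grow S from {init} until it is closed under all transitions."""
    S = {init}                       # init-closedness
    frontier = [init]
    while frontier:
        s = frontier.pop()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in S:           # δ-closedness is violated: add the successor
                S.add(t)
                frontier.append(t)
    return S


# Hypothetical example over {a}: state 2 is not reachable from state 0.
delta = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 0}
R = reachable_part(0, delta, "a")
```

The frontier list only avoids rescanning states whose successors were already checked; the resulting set is the same as for the naive loop.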

46 / 61

slide-47
SLIDE 47

Reachability analysis correctness

(S −σ→ R, R −π→ Q)

If

◮ π is injective;
◮ R is reachable;
◮ the wrapper is init-closed; and
◮ the wrapper is δ-closed

then H is an automaton isomorphic to R

47 / 61

slide-48
SLIDE 48

State merging

To merge equivalent states of a DFA Q, we could use a wrapper (Q −σ→ O, O −o→ 2^{A∗} −ω→ 2^E) where O is the automaton of languages accepted by Q and σ classifies states according to their language

out-consistency says that states of Q equivalent under ξ must have the same output (accept/reject)

δ-consistency says that equivalent states of Q must have equivalent successors for each a ∈ A

These can be satisfied as in learning
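One way to read "as in learning" is to classify states by their behaviour on a growing set E and add suffixes until δ-consistency holds. The sketch below follows that reading; it is an assumption-laden illustration (the name `merge_states`, the dictionary encoding, and the four-state example are made up), not the authors' algorithm.

```python
def merge_states(states, delta, out, alphabet):
    """Merge states that agree on E, after enforcing δ-consistency."""
    E = [""]                                   # ε ∈ E gives out-consistency

    def accepts_from(q, word):
        for a in word:
            q = delta[(q, a)]
        return out[q]

    def row(q):                                # ξ(q): behaviour of q on E
        return tuple(accepts_from(q, e) for e in E)

    def witness():
        # Equal rows whose a-successors differ on some e in E.
        for q1 in states:
            for q2 in states:
                if row(q1) == row(q2):
                    for a in alphabet:
                        for e in E:
                            if (accepts_from(delta[(q1, a)], e)
                                    != accepts_from(delta[(q2, a)], e)):
                                return a + e
        return None

    w = witness()
    while w is not None:
        E.append(w)
        w = witness()

    merged_delta = {(row(q), a): row(delta[(q, a)])
                    for q in states for a in alphabet}
    merged_out = {row(q): out[q] for q in states}
    return merged_delta, merged_out, E


# Hypothetical example over {a}: states 1 and 3 accept the same language.
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 2}
out = {0: False, 1: False, 2: True, 3: False}
merged_delta, merged_out, E = merge_states([0, 1, 2, 3], delta, out, "a")
```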

48 / 61

slide-49
SLIDE 49

State merging correctness

(Q −σ→ O, O −o→ 2^{A∗} −ω→ 2^E)

If

◮ σ is surjective;
◮ O is observable;
◮ the wrapper is out-consistent; and
◮ the wrapper is δ-consistent

then H is an automaton isomorphic to O

49 / 61

slide-50
SLIDE 50

General equivalence testing theorem

For U and V DFAs and S, E ⊆ A∗ we have wrappers
w_U = (σ_U, π_U) = (S −α→ A∗ −r→ U, U −o→ 2^{A∗} −ω→ 2^E)
w_V = (σ_V, π_V) = (S −α→ A∗ −r→ V, V −o→ 2^{A∗} −ω→ 2^E)

Suppose

◮ σ_U is surjective;
◮ π_U is injective; and
◮ either σ_V is surjective and V observable or π_V is injective and V reachable

Then U ≅ V if and only if all of the below hold
ξ^{w_U} = ξ^{w_V}
ξ^{w_U}_δ = ξ^{w_V}_δ
ξ^{w_U}_init = ξ^{w_V}_init
ξ^{w_U}_out = ξ^{w_V}_out

50 / 61

slide-51
SLIDE 51

W-method

Let U be a known minimal DFA and V an unknown one

Using minimisation-like algorithms inspired by learning, we can find

◮ S ⊆ A∗ such that σ_U : S → U is surjective and
◮ E ⊆ A∗ such that π_U : U → 2^E is injective

These are the first two conditions for the theorem

They also ensure that the hypothesis of w_U is isomorphic to U

51 / 61

slide-52
SLIDE 52

W-method

Assume that at this point the equalities hold:
ξ^{w_U} = ξ^{w_V}, ξ^{w_U}_δ = ξ^{w_V}_δ, ξ^{w_U}_init = ξ^{w_V}_init, ξ^{w_U}_out = ξ^{w_V}_out

Then the two hypotheses coincide and are isomorphic to U

Assume a given upper bound n on |V|

Updating S to S · A^{≤(n−|U|)} ensures that (assuming ε ∈ S)

◮ σ_V is surjective (and we know that V is observable)

which triggers the theorem: U ≅ V if and only if
ξ^{w_U} = ξ^{w_V}, ξ^{w_U}_δ = ξ^{w_V}_δ, ξ^{w_U}_init = ξ^{w_V}_init, ξ^{w_U}_out = ξ^{w_V}_out

52 / 61

slide-53
SLIDE 53

W-method

Determining the equalities
ξ^{w_U} = ξ^{w_V}, ξ^{w_U}_δ = ξ^{w_V}_δ, ξ^{w_U}_init = ξ^{w_V}_init, ξ^{w_U}_out = ξ^{w_V}_out
consists in testing whether U and V agree on a set of words:
ξ^{w_U}(s)(e) = L_U(se)
ξ^{w_U}_δ(s, a)(e) = L_U(sae)
etc., and analogously for V
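The resulting test suite can be generated mechanically. The sketch below is a simplified illustration: it assumes the empty word is in both S and E (so that the init and out equalities are covered by the same words), takes both automata as membership functions, and uses made-up names (`w_method_suite`, `agree`) and example languages.

```python
from itertools import product


def w_method_suite(S, E, alphabet, extra):
    """Words se and sae for s in S · A^{≤extra}, a in A, e in E."""
    middles = ["".join(p) for n in range(extra + 1)
               for p in product(alphabet, repeat=n)]
    S_ext = {s + m for s in S for m in middles}        # S · A^{≤extra}
    suite = {s + e for s in S_ext for e in E}          # words for ξ
    suite |= {s + a + e for s in S_ext                 # words for ξ_δ
              for a in alphabet for e in E}
    return suite


def agree(member_u, member_v, suite):
    """Do U and V give the same answer on every word of the suite?"""
    return all(member_u(w) == member_v(w) for w in suite)


# Hypothetical U: even number of a's (2 states); S and E chosen for U.
def member_u(word):
    return word.count("a") % 2 == 0


# Hypothetical V with at most 4 states: number of a's divisible by 4.
def member_v(word):
    return word.count("a") % 4 == 0


suite = w_method_suite(S=["", "a"], E=[""], alphabet="ab", extra=4 - 2)
```

Here `extra` plays the role of n − |U| from the previous slide.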

53 / 61

slide-54
SLIDE 54

Optimising learning

We distinguish rows by adding a word to E, but this requires a query for every row in the table

More efficient is to handle the classification using a classification tree

54 / 61

slide-55
SLIDE 55

Classification tree

[Example classification tree with experiment nodes a and bb]

◮ Internal nodes represent experiments, the result of which determines the next subtree
◮ Classification is into the set L of labels making up the leaves
◮ A tree τ classifies languages, 2^{A∗} −ω_τ→ L, and states of an automaton using the composition Q −o→ 2^{A∗} −ω_τ→ L

55 / 61

slide-56
SLIDE 56

Sifting

Given the target language L, a tree also classifies words: given a word u, at a node labelled v we move to the subtree for L(uv)

This is called sifting

The closedness and consistency that follow from our general definitions are conveniently described using this classification:

◮ Closedness states that all words in S · A must sift into a leaf into which a word from S sifts
◮ Consistency states that for s1, s2 ∈ S sifting into the same leaf, s1a and s2a for each a ∈ A must also sift into the same leaf
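Sifting is a short loop over the tree. In this sketch the tree encoding is an assumption: an internal node is a tuple (experiment, subtree on reject, subtree on accept) and a leaf is a string label; the example language and tree are hypothetical.

```python
def sift(member, tree, word):
    """Follow the experiments from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        experiment, if_reject, if_accept = tree
        # At a node labelled v, move to the subtree for L(uv).
        tree = if_accept if member(word + experiment) else if_reject
    return tree


# Hypothetical target language: number of a's divisible by 3.
def member(word):
    return word.count("a") % 3 == 0


# Root asks about ε; rejected words are then asked about a.
tree = ("", ("a", "q1", "q2"), "q0")

print(sift(member, tree, "aa"))
```

Only the experiments along one root-to-leaf path are queried for each word, which is where the saving over a full table row comes from.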

56 / 61

slide-57
SLIDE 57

Optimised algorithms

◮ L⋆: Kearns and Vazirani’s algorithm
◮ State merging: splitting tree algorithm
◮ Conformance testing: HSI-method

57 / 61

slide-58
SLIDE 58

Other instances

◮ Nondeterministic automata (more generally JSL automata)
◮ Weighted automata over a field
◮ Nominal automata
◮ Automata with a state space that is an algebra for a monad preserving finite sets (naive general algorithm: CONCUR submission)

58 / 61

slide-59
SLIDE 59

Future work: more instances

◮ Register automata
◮ Tree automata
◮ Büchi-style automata
◮ Alternating automata
◮ (Subclasses of) probabilistic automata

59 / 61

slide-60
SLIDE 60

More future work

◮ Optimised algorithms for automata with structure
◮ Implementation of the CONCUR algorithm
◮ Describing iterative algorithms abstractly
◮ Finding other (possibly even non-automaton?) applications of the general wrapper theory

60 / 61

slide-61
SLIDE 61

Reading material

◮ Master thesis: An Abstract Automata Learning Framework, Gerco van Heerdt
◮ CSL submission: CALF: Categorical Automata Learning Framework, Gerco van Heerdt, Matteo Sammartino, Alexandra Silva
◮ CONCUR submission: Learning Automata with Side-Effects, Gerco van Heerdt, Matteo Sammartino, Alexandra Silva

These and others can be found on our website: http://calf-project.org

61 / 61