In the Maze of Data Languages - Loris D'Antoni - PowerPoint PPT Presentation



slide-1
SLIDE 1

In the Maze of Data Languages

Loris D'Antoni
WPE II
05/08/2012

slide-2
SLIDE 2

05/08/12 In the Maze of Data Languages 2

Data Languages

  • Motivation (current section)
  • Data Model
  • Data Strings

– Automata and Logics
– Regularity

  • Data Trees
  • Conclusion
slide-3
SLIDE 3

Introduction

  • Most analysis techniques over programs and data consider the domain to be finite in order to achieve decidability
  • Often this restriction is too strong

– XML documents and languages over XML use data comparison
– Interesting properties about programs compare values of variables at different points in the program
slide-4
SLIDE 4

Motivation 1: XML Processing

[Figure: XML tree with root Messages and two Note children, each with children Id, From, To, Body; the leaves carry the data values 501, 502, Mary, Tom, "Prepare dinner!", "Will do!"]

  • An XML document can be seen as an unranked tree in which

– Inner nodes correspond to elements (tags)
– Leaves correspond to data (attributes, text content)

slide-5
SLIDE 5

Motivation 1: XML Processing

[Figure: the same XML tree (Messages, two Note nodes, each with Id, From, To, Body) without the data values at the leaves]

  • For many useful tasks data values can be ignored

– we can consider the tree to be over a finite alphabet
– good for navigation, validation, transformation...

WHAT ABOUT TASKS IN WHICH WE WANT TO SPECIFY CONSTRAINTS OVER DATA?

slide-6
SLIDE 6

Motivation 1: XML Processing

  • A concrete example: XPath query optimization
  • SCHEMA: can define an XML language and can also specify constraints on data
  • XPATH: query language for XML that also allows data comparison

– Q1: select all notes someone sent to himself
– Q2: select people who sent more than 3 notes

  • QUERY OPTIMIZATION: given two XPath queries q1, q2 and a schema S, decide whether, for each valid document x in S, q1(x) ⊆ q2(x)

slide-7
SLIDE 7

Motivation 2: Verification

  • Model Checking: checking properties about programs that can have possibly infinite reachable states

– Represent system as a finite structure
– Define a transition relation
– Use algorithm for reachability of some particular state

  • Several ad-hoc solutions for particular cases of infinite alphabets and infinite states

– Timed Automata [Alur90]
– Regular model checking [Bouajjani00]

slide-8
SLIDE 8

Motivation 2: Verification

  • No model considers inter-state properties such as "the same resource is never granted twice" (with infinitely many resources)
  • A run of the transition system can be seen as a string/list of the form

q0 r1 q1 r4 q3 r1 qf r1 ….

where the states are from a finite alphabet and the resources are from an infinite domain

  • Now we can ask: is there a list with the same resource appearing twice?
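The question on this slide can be sketched in a few lines of Python. This is only an illustration: the run is the one shown on the slide, while the function name `granted_twice` and the encoding of a run as (state, resource) pairs are assumptions, not part of the talk.

```python
from typing import Hashable, Iterable, Optional, Tuple


def granted_twice(run: Iterable[Tuple[str, Hashable]]) -> Optional[Hashable]:
    """Return the first resource that appears a second time in the run, or None.

    Resources are only ever compared for equality (here through set membership),
    which is the single operation the data model allows on data values.
    """
    seen = set()
    for _state, resource in run:
        if resource in seen:
            return resource
        seen.add(resource)
    return None


# Run from the slide: q0 r1 q1 r4 q3 r1 qf r1
run = [("q0", "r1"), ("q1", "r4"), ("q3", "r1"), ("qf", "r1")]
print(granted_twice(run))
```

The set `seen` can grow without bound, which is exactly what a finite-state device has to work around when the resources come from an infinite domain.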

slide-9
SLIDE 9

Some Models for Infinite Alphabets

  • Several models have been proposed to work with infinite alphabets:

– LTL with Freeze Quantifiers (LTL with storing registers)
– Timed Automata (can reason about time)
– Symbolic Automata and Transducers (theory over input)

  • Most of these models are quite domain specific even though they come with nice properties
  • We want a general theory for structures over infinite alphabets

slide-10
SLIDE 10

Data Languages

  • Motivation
  • Data Model (current section)
  • Data Strings

– Automata and Logics
– Regularity

  • Data Trees
  • Conclusion
slide-11
SLIDE 11

Data Model: Design Principles

  • We need a simple model with some decidable features
  • The model should be useful
  • Possibly it should be guided by some practical applications

DATA STRINGS and DATA TREES

slide-12
SLIDE 12

Data Languages

  • We take languages of words and trees over finite alphabets
  • Then, one data element from an infinite domain is allowed for every position/node
  • The only operation that can be performed over data is checking for equality
  • It is a bit restrictive but easy to study and useful in practice
  • Moreover, most extensions immediately lead to undecidability

slide-13
SLIDE 13

Data Strings

  • In a data string each position carries

– a label from a finite alphabet and
– a data value from an infinite alphabet

r w w r s r r s r w
1 4 1 1 4 34 4 5 5 4

  • For example in the data string above

– the finite alphabet is {r,w,s}
– the infinite alphabet is the natural numbers
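One possible encoding of this data string, given here as a minimal sketch: a list of (label, value) pairs. The helper `classes`, which groups positions carrying the same data value, is a hypothetical name introduced for illustration; positions are 0-based.

```python
from collections import defaultdict

# The data string from the slide: labels on top, data values below.
labels = "r w w r s r r s r w".split()
values = [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]
data_string = list(zip(labels, values))


def classes(ds):
    """Group positions by data value, using equality on values only."""
    groups = defaultdict(list)
    for pos, (_label, value) in enumerate(ds):
        groups[value].append(pos)
    return dict(groups)


print(classes(data_string))
```

Each group of positions is what the later slides call a class of the data string.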

slide-14
SLIDE 14

Data Trees

  • Similarly, in a data tree each NODE carries

– a label from a finite alphabet and
– a data value from an infinite alphabet

[Figure: data tree with root Messages and two Note children, each with children Id, From, To, Body; the leaves carry the data values 501, 502, Mary, Tom, "Prepare dinner!", "Will do!"]
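A data tree like the one in the figure could be encoded as follows. This is a sketch under assumptions: the class `Node`, the helpers `note` and `sent_to_self`, the assignment of values to the two notes, and the third note (503) are all illustrative and not taken from the slides. The query mirrors Q1 from the XPath slide (notes someone sent to himself) and only compares data values for equality.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                       # from the finite alphabet
    value: Optional[object] = None   # data value (None where the figure shows none)
    children: List["Node"] = field(default_factory=list)


def note(id_, sender, receiver, body):
    """Build one Note subtree with Id, From, To, Body children."""
    return Node("Note", None, [
        Node("Id", id_), Node("From", sender),
        Node("To", receiver), Node("Body", body),
    ])


def sent_to_self(root):
    """Q1: notes whose From and To children carry the same data value."""
    result = []
    for n in root.children:
        by_label = {c.label: c.value for c in n.children}
        if n.label == "Note" and "From" in by_label and "To" in by_label:
            if by_label["From"] == by_label["To"]:
                result.append(n)
    return result


messages = Node("Messages", None, [
    note(501, "Mary", "Tom", "Prepare dinner!"),  # assumed reading of the figure
    note(502, "Tom", "Mary", "Will do!"),         # assumed reading of the figure
    note(503, "Tom", "Tom", "Buy milk"),          # hypothetical extra note
])
print([n.children[0].value for n in sent_to_self(messages)])
```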

slide-15
SLIDE 15

And now?

  • Data languages seem to nicely extend regular languages
  • The framework is set but now:

– How do we define data languages?
– What is the best/right model for data string languages?
– What is the best/right model for data tree languages?
– What is a regular data language?

slide-16
SLIDE 16

Regularity

  • Ideally we are looking for a model for data languages with all the nice properties of regular string languages

– Good tradeoff between expressiveness and decidability
– Efficiency of the membership problem
– Good closure properties
– Robustness: clear counterpart in logic and several characterizations

DOES A MODEL LIKE THAT EVEN EXIST?

slide-17
SLIDE 17

Data Languages

  • Motivation
  • Data Model
  • Data Strings (current section)

– Automata and Logics (current subsection)
– Regularity

  • Data Trees
  • Conclusion
slide-18
SLIDE 18

Models for Data Strings

  • Several models have been proposed for data strings and they are mainly of two kinds:

– Automata based models
– Logic based models

  • Usually an automata model is good when it has an equivalent logic model
  • Here we present the models that are considered more relevant in the `treasure hunt' for regular data string languages

slide-19
SLIDE 19

Register Automata 1/4

  • Finite state automaton + finite set of registers that can store data values and test for equality

STATE q, R1=4, R2=1
(q,R1,r) steps to (q',L)
STATE q', R1=4, R2=1

r w w r s r r s r w
1 4 1 1 4 34 4 5 5 4

slide-20
SLIDE 20

Register Automata 2/4

STATE q, R1=4, R2=1
if no R contains the current value, (q,r) steps to (q',R2,R)
STATE q', R1=4, R2=5

r w w r s r r s r w
1 4 1 1 4 34 4 5 5 4

slide-21
SLIDE 21

Register Automata 3/4

  • Language of data strings where two adjacent positions contain the same data value

– (q0,{r,w,s}) → (q1,R1,R)
– (q1,R1,{r,w,s}) → (qf,S)
– (q1,{r,w,s}) → (q1,R1,R)

STATE q0, R1=

r w w r s r r s r w
1 4 1 1 4 34 4 5 5 4

slide-22
SLIDE 22

Register Automata 3/4 (run continued)

Applying (q0,{r,w,s}) → (q1,R1,R)

STATE q0, R1=

slide-23
SLIDE 23

Register Automata 3/4 (run continued)

STATE q1, R1=1

slide-24
SLIDE 24

Register Automata 3/4 (run continued)

Applying (q1,{r,w,s}) → (q1,R1,R)

STATE q1, R1=1

slide-25
SLIDE 25

Register Automata 3/4 (run continued)

STATE q1, R1=4

slide-26
SLIDE 26

Register Automata 3/4 (run continued)

Applying (q1,{r,w,s}) → (q1,R1,R)

STATE q1, R1=4

slide-27
SLIDE 27

Register Automata 3/4 (run continued)

STATE q1, R1=1

slide-28
SLIDE 28

Register Automata 3/4 (run continued)

Applying (q1,R1,{r,w,s}) → (qf,S)

STATE q1, R1=1

slide-29
SLIDE 29

Register Automata 3/4 (run continued)

STATE qf, R1=1

DATA STRING ACCEPTED
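The run animated on the preceding slides can be replayed with a small simulation. This is a sketch, not the formal definition of register automata: it hard-codes the three transitions of the example, and it assumes that in (q1,R1,R) the register R1 is overwritten with the current value before moving right, and that (qf,S) means "go to qf and stay". The function name `run_register_automaton` is hypothetical.

```python
def run_register_automaton(data_string):
    """Simulate the one-register automaton for 'two adjacent equal data values'.

    Returns (accepted, trace), where trace lists the (state, R1) configurations.
    """
    state, r1 = "q0", None
    trace = [(state, r1)]
    for _label, value in data_string:
        if state == "q1" and value == r1:
            # (q1,R1,{r,w,s}) -> (qf,S): current value equals the register
            state = "qf"
            trace.append((state, r1))
            return True, trace
        # (q0,{r,w,s}) -> (q1,R1,R) and (q1,{r,w,s}) -> (q1,R1,R):
        # store the current value in R1 and move right
        state, r1 = "q1", value
        trace.append((state, r1))
    return False, trace


labels = "r w w r s r r s r w".split()
values = [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]
accepted, trace = run_register_automaton(list(zip(labels, values)))
print(accepted, trace)
```

On the example string the trace goes through R1=1, R1=4, R1=1 and then reaches qf at the fourth position, matching the states shown on the slides.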

slide-30
SLIDE 30

Register Automata 4/4

  • The Register Automaton (RA) in the previous example is called one-way deterministic
  • Based on the restrictions we impose RAs can be

– Deterministic, Nondeterministic or Alternating
– One way or Two way

  • Each of the above choices affects the expressiveness and the decidability of the model

slide-31
SLIDE 31

What do we have so far?

                             Emptiness     Universality   Inclusion     Equivalence
1Way–ND Register Automata    Decidable     Undecidable    Undecidable   Undecidable
2Way–D Register Automata     Undecidable   Undecidable    Undecidable   Undecidable

Maybe we should slow down a little bit... Let's put PAs aside and try to improve 1Way-ND RAs

slide-32
SLIDE 32

Generalizing a Little Bit...

  • 1Way-ND-RAs are limited in expressiveness: they can't even represent the language of strings where all the data values are different!!
  • We need a slightly more expressive model... but not too expressive, otherwise we hit undecidability
  • Weakness of RAs: they can only talk about global properties but not about local properties:

– GLOBAL: property of the whole string
– LOCAL: all labels of positions with the same data value have some property

slide-33
SLIDE 33

Class Memory Automata 1/3

  • More expressive than Register Automata :)
  • Single pass and one way!! :(
  • Non-deterministic :(
  • At every point transitions depend on class history
  • Let's see an example...
slide-34
SLIDE 34

Class Memory Automata 2/3

  • Following transitions to implement the same language as before where all data values are different. For every q,

– (q,{r,w,s},-) → q1
– (q,{r,w,s},q1) → q2
– (q,{r,w,s},q2) → q2

  • Global acceptance {q0,q1,q2}, local acceptance {q1}

r w w r s r r s r w
1 4 1 1 4 34 4 5 5 4

Class memory: 1: -, 4: -, 34: -, 5: -
Current state: q0
slide-35
SLIDE 35

Class Memory Automata 2/3 (run continued)

Applying (q,{r,w,s},-) → q1

Class memory: 1: -, 4: -, 34: -, 5: -
Current state: q0
slide-36
SLIDE 36

Class Memory Automata 2/3 (run continued)

Class memory: 1: q1, 4: -, 34: -, 5: -
Current state: q1

slide-37
SLIDE 37

Class Memory Automata 2/3 (run continued)

Applying (q,{r,w,s},-) → q1

Class memory: 1: q1, 4: -, 34: -, 5: -
Current state: q1
slide-38
SLIDE 38

Class Memory Automata 2/3 (run continued)

Class memory: 1: q1, 4: q1, 34: -, 5: -
Current state: q1

slide-39
SLIDE 39

Class Memory Automata 2/3 (run continued)

Applying (q,{r,w,s},q1) → q2

Class memory: 1: q1, 4: q1, 34: -, 5: -
Current state: q1
slide-40
SLIDE 40

Class Memory Automata 2/3 (run continued)

Class memory: 1: q2, 4: q1, 34: -, 5: -
Current state: q2

slide-41
SLIDE 41

Class Memory Automata 2/3 (run continued)

Class memory: 1: q2, 4: q2, 34: q1, 5: q2
Current state: q2

DATA STRING REJECTED
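The class memory run above can be replayed with a short simulation. It is a sketch that hard-codes the three transitions of the example (which happen to be deterministic, although class memory automata are nondeterministic in general). The function name `run_cma` and the use of a dictionary for the class memory are assumptions made for illustration.

```python
GLOBAL_ACCEPT = {"q0", "q1", "q2"}
LOCAL_ACCEPT = {"q1"}


def run_cma(data_string):
    """Simulate the example automaton for 'all data values are different'.

    memory maps each data value to the state reached when its class was last
    visited; a missing entry plays the role of '-' in the transitions.
    """
    state = "q0"
    memory = {}
    for _label, value in data_string:
        if value not in memory:
            state = "q1"   # (q,{r,w,s},-)  -> q1: first visit of this class
        else:
            state = "q2"   # (q,{r,w,s},q1) -> q2 and (q,{r,w,s},q2) -> q2
        memory[value] = state
    accepted = state in GLOBAL_ACCEPT and all(
        s in LOCAL_ACCEPT for s in memory.values()
    )
    return accepted, state, memory


labels = "r w w r s r r s r w".split()
values = [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]
print(run_cma(list(zip(labels, values))))
```

The example string is rejected because classes 1, 4 and 5 end in q2, which is not in the local acceptance set; a string with pairwise different values leaves every class in q1 and is accepted.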

slide-42
SLIDE 42

Class Memory Automata 3/3

  • This example can't be computed by a 1WAY-ND-RA and in general

– 1WAY-ND-RAs are strictly included in CMAs as far as expressiveness is concerned

  • Emptiness is still decidable for CMAs!!!
  • Membership was easy, now it is NP-Complete :(

slide-43
SLIDE 43

Why is this model right?

  • Class memory automata are usable :)
  • More expressive than 1Way-ND-Register Automata :)
  • Still decidable emptiness :)
  • But who tells us there is no better model?? :(
  • Regular string languages are equivalent to MSO

– Models equivalent to a logic are more robust,
– They have closure properties for free

  • We need a declarative model equivalent to CMA...

slide-44
SLIDE 44

Data Languages

  • Motivation
  • Data Model
  • Data Strings (current section)

– Automata and Logics (current subsection: Logics)
– Regularity

  • Data Trees
  • Conclusion
slide-45
SLIDE 45

First-Order Logic for Data Strings

  • Variables range over positions of the data string
  • Formulae of the form

Φ ::= a(x) | x=y | ∃x.Φ | Φ ∨ Φ | ¬Φ | x=y+1 | x<y | x~y

  • We call this logic FO(+1,<,~)
  • Here is a simple formula for the language of data strings with all different data values

¬∃x.∃y.¬(x=y) ∧ x~y
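To make the semantics of the formula concrete, here is a brute-force evaluation over a given data string. It is only a sketch: it reads x~y as "positions x and y carry the same data value" and quantifies over all pairs of positions. The function name `all_values_different` is hypothetical.

```python
from itertools import product


def all_values_different(data_string):
    """Evaluate the formula: there are no x, y with x != y and x ~ y."""
    n = len(data_string)

    def same_value(x, y):
        # x ~ y: equality test on the data values only
        return data_string[x][1] == data_string[y][1]

    return not any(
        x != y and same_value(x, y)
        for x, y in product(range(n), repeat=2)
    )


labels = "r w w r s r r s r w".split()
values = [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]
print(all_values_different(list(zip(labels, values))))
```

This checks whether one given string is a model of the formula. The emptiness problem discussed on the following slides asks whether any model exists at all, which is a much harder question.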

slide-46
SLIDE 46

Monadic Second-Order Logic for Data Strings

  • Variables range over positions of the data string
  • Second-order variables range over sets of positions
  • Formulae of the form

Φ ::= a(x) | x=y | ∃x.Φ | Φ ∨ Φ | ¬Φ | x<y | x=y+1 | x~y | ∃X.Φ | x ∈ X

  • We call this logic MSO(+1,<,~)
  • Here is a formula for the language of data strings with at least one position with label a

∃X. ∃x. x ∈ X ∧ a(x)

slide-47
SLIDE 47

Bad news...

  • Emptiness: given a formula Φ in FO(+1,<,~) or MSO(+1,<,~), checking whether there exists a data string s that is a model of Φ is undecidable
  • Too much expressiveness, we need something weaker
  • Maybe we can consider a logic where the number of variables that can be used is limited...

slide-48
SLIDE 48

Two-Variable Logics 1/3

  • We call FO2 first-order logic with only two variables x,y
  • In general adding variables adds expressiveness

– FOK(+1,<,~) is less expressive than FO(K+1)(+1,<,~)

  • And...

– Emptiness for a formula in FO2(...) is decidable
– Emptiness for a formula in FO3(...) is undecidable

  • This is a nice cut-off!!
slide-49
SLIDE 49

Two-Variable Logics 2/3

  • We can actually do a bit better
  • We call EMSO2(+1,<,~) monadic second order logic where all the set variables X are existentially quantified and appear at the beginning of a formula

– ∃X1...∃Xn.Φ where Φ only contains FO2(...)

  • And..

– Emptiness for a formula in EMSO2(...) is decidable
– EMSO2(...) is strictly more expressive than FO2(...)

slide-50
SLIDE 50

Two-Variable Logics 3/3

  • We can push the decidability even a bit further
  • Consider the operator ⊕1 such that:

– x=y⊕1 iff y is the next position after x with the same data value as x

  • We call EMSO2(+1,<,~,⊕1) the logic EMSO2(+1,<,~) enriched with the ⊕1 operator
  • Emptiness for a formula in EMSO2(+1,<,~,⊕1) is decidable
  • EMSO2(+1,<,~,⊕1) is strictly more expressive than EMSO2(+1,<,~)
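The relation behind ⊕1 links each position to the next position with the same data value. A minimal sketch that computes it for a concrete data string is given below; the function name `class_successor` and the 0-based positions are assumptions made for illustration.

```python
def class_successor(data_string):
    """For each position, return the next position with the same data value.

    The result is a list where entry i is that position, or None if position i
    is the last one of its class.
    """
    succ = [None] * len(data_string)
    last_seen = {}  # data value -> most recent position carrying it
    for pos, (_label, value) in enumerate(data_string):
        if value in last_seen:
            succ[last_seen[value]] = pos
        last_seen[value] = pos
    return succ


labels = "r w w r s r r s r w".split()
values = [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]
print(class_successor(list(zip(labels, values))))
```

For the example string, position 0 (value 1) is linked to position 2, and position 1 (value 4) to position 4, even though neither pair is adjacent in the string order.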

slide-51
SLIDE 51

Grand Finale

  • Incredible but true

CLASS MEMORY AUTOMATA
EMSO2(+1,<,~,⊕1)
all have the same expressiveness

slide-52
SLIDE 52

Data Languages

  • Motivation
  • Data Model
  • Data Strings (current section)

– Automata and Logics
– Regularity (current subsection)

  • Data Trees
  • Conclusion
slide-53
SLIDE 53

Regularity 1/2

  • It seems that class memory automata are a good model but it is still not regular in the standard sense

                                          Regular String Languages   Class Memory Automata
Equivalence                               Decidable                  Undecidable
Emptiness                                 Decidable                  Decidable
Deterministic                             Yes                        No
Closure under intersection, union, star   Yes                        Yes
Closure under complement                  Yes                        No
Logical counterpart                       Yes                        Yes
Membership                                Linear                     NP-Complete

slide-54
SLIDE 54

Regularity 2/2

  • Unfortunately this is the best we have so far... so:

– We can look for a less expressive model, maybe equivalent to some form of FO, so that we have closure under complement
– We accept that Data Languages are hard to work with and we give up on some properties

  • In any case, what defines a regular string data language is still an open problem
slide-55
SLIDE 55

Data Languages

  • Motivation
  • Data Model
  • Data Strings

– Automata and Logics
– Regularity

  • Data Trees (current section)
  • Conclusion
slide-56
SLIDE 56

Data Trees

  • Similarly, in a data tree each NODE carries

– a label from a finite alphabet and
– a data value from an infinite alphabet

[Figure: data tree with root Messages and two children labeled (Note, 34) and (Note, 54), each with children Id, From, To, Body; the leaves carry the data values 501, 502, Mary, Tom, "Prepare dinner!", "Will do!"]

slide-57
SLIDE 57

Models for Data Trees

  • The state of the art for automata and logics over data trees is embryonic
  • Very few studied models with decidable properties

– Some extensions of register automata, but without intuitive semantics and with limited expressiveness
– Extensions of FO2(+1,<,~), EMSO2(+1,<,~) to data trees

  • Logic is more relevant for regularity, so let's concentrate on that
slide-58
SLIDE 58

Two-Variable Logics 1/3

  • In a tree the predicates +1 and < don't have a clear interpretation...
  • We replace them with two new predicates

– x = y ↓ 1 parent relation and

[Figure: example tree with nodes a b g c d e f h i j k]

slide-59
SLIDE 59

Two-Variable Logics 1/3

  • In a tree the predicates +1 and < don't have a clear interpretation...
  • We replace them with two new predicates

– x = y ↓ 1 parent relation and
– x = y → 1 next sibling relation

[Figure: example tree with nodes a b g c d e f h i j k]

slide-60
SLIDE 60

Two-Variable Logics 2/3

  • Now the +1 predicate corresponds to ↓ and →
  • The < predicate corresponds to the transitive closures of ↓ and →, ↓* and →*
  • We now have an interpretation for

– FO2(+1,<,~) and EMSO2(+1,<,~) over data trees

  • But we can also consider simpler logics where we drop the < operator

– FO2(+1,~) and EMSO2(+1,~) over data trees

  • But why would we do that??
slide-61
SLIDE 61

Two-Variable Logics 3/3

  • Emptiness for

– FO2(+1,~) and EMSO2(+1,~) over data trees is decidable
– We will see a proof sketch

  • Emptiness for

– FO2(+1,<,~) and EMSO2(+1,<,~) over data trees is a very hard open problem
– Emptiness for vector addition tree automata reduces to it...

slide-62
SLIDE 62

Vector Addition Tree Automata

  • Bottom up tree automata over binary trees in which transitions have three vectors

– Every leaf is assigned a vector of values over Nat
– Transitions of the form q1,q2,a,b,c,l → q

  • Is there a run in which the root is labeled with the vector 0
  • Very hard open problem, reduces to emptiness for FO2(+1,<,~)

[Figure: a node labeled l whose children are annotated (q1,x) and (q2,y) is annotated (q,z)]

If x-a>0 and y-b>0, z=(x-a)+(y-b)+c
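A single bottom-up step of such an automaton can be sketched as follows. The sketch assumes that vectors are tuples of integers and that the subtraction, the comparison with 0, and the addition are all taken componentwise, which the slide does not spell out. The `strict` flag follows the slide's condition (> 0) by default; some formulations use ≥ 0 instead. The function name `combine` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vector = Tuple[int, ...]


def combine(x: Sequence[int], y: Sequence[int],
            a: Sequence[int], b: Sequence[int], c: Sequence[int],
            strict: bool = True) -> Optional[Vector]:
    """Apply one transition q1,q2,a,b,c,l -> q to the child vectors x and y.

    Returns z = (x-a) + (y-b) + c if the guard on x-a and y-b holds in every
    component, otherwise None (the transition is not enabled).
    """
    dx = [xi - ai for xi, ai in zip(x, a)]
    dy = [yi - bi for yi, bi in zip(y, b)]
    if strict:
        enabled = all(d > 0 for d in dx + dy)
    else:
        enabled = all(d >= 0 for d in dx + dy)
    if not enabled:
        return None
    return tuple(p + q + ci for p, q, ci in zip(dx, dy, c))


print(combine((3, 2), (2, 2), (1, 1), (1, 1), (0, 1)))
```

The states q1, q2, q and the node label l are left out here because they only select which transition applies; the vector arithmetic is the part that makes the emptiness question hard.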

slide-63
SLIDE 63

Emptiness for FO(+1,~) is Decidable 1/4

  • Proof outline, given a formula F in FO(+1,~)

– Compute a ``puzzle'' P that has solutions iff F is satisfiable
– If a ``puzzle'' P has a solution, then there also exists a ``small'' solution
– Find if there exists a ``small'' solution of P using some extended tree automata

slide-64
SLIDE 64

Emptiness for FO(+1,~) is Decidable 2/4

  • Compute a ``puzzle'' P that has solutions iff F is satisfiable

– Reduce F to a normal form F'
– For every formula F' we can compute a puzzle

  • A puzzle over Σ is a pair (L,F)

– L is a tree automaton over an extension of Σ
– F is a set of accepting pairs (D,S) where D,S are disjoint subsets of Σ
– Solution: a data tree ``in'' L where for each class there exists a pair (D,S) such that all labels are from D ∪ S and each label in D appears at most once

Class: maximal set of connected nodes with the same data value
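The per-class condition in the definition of a solution can be sketched as a small check. This covers only the (D,S) condition for one class, given the list of labels of its nodes; it does not check membership of the tree in L or compute the classes themselves. The function name `class_ok` and the encoding of accepting pairs as pairs of Python sets are assumptions.

```python
from collections import Counter


def class_ok(class_labels, accepting_pairs):
    """Check one class against a set of accepting pairs (D, S).

    The class is fine if, for some pair, every label is in D or S and every
    label in D occurs at most once in the class.
    """
    counts = Counter(class_labels)
    for d_set, s_set in accepting_pairs:
        labels_allowed = set(counts) <= (d_set | s_set)
        d_at_most_once = all(counts[d] <= 1 for d in d_set)
        if labels_allowed and d_at_most_once:
            return True
    return False


# Hypothetical puzzle fragment: 'a' may occur at most once, 'b' any number of times.
pairs = [({"a"}, {"b"})]
print(class_ok(["a", "b", "b"], pairs), class_ok(["a", "a", "b"], pairs))
```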

slide-65
SLIDE 65

Emptiness for FO(+1,~) is Decidable 3/4

  • If a ``puzzle'' P has a solution, then there also exists a ``small'' solution

– Let's assume there is a solution S

  • An (M,N)-reduced solution is a solution where:

– At most M classes are of size greater than N
– There are at most M siblinghoods with more than N classes

  • Given a solution we can compute an (M,N)-reduced solution and

– M and N are numbers effectively computable from the size of the puzzle

Siblinghood: set of children of a node
slide-66
SLIDE 66

Emptiness for FO(+1,~) is Decidable 4/4

  • Find if there exists a ``small'' solution of P using some extended tree automata
  • Given a tree automaton A and a tree t, a run of A can be seen as a labeling of t with the states Q of A

– Linear constraint tree automata (LCTA) are equipped with a linear constraint K over Q
– A run accepts if K is satisfied when instantiated with the number of occurrences of each state in the run
– Emptiness of LCTA is decidable

  • We can compute an LCTA that is empty iff the puzzle P does not have (M,N)-reduced solutions

QED
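The acceptance condition of an LCTA can be illustrated by counting how often each state occurs in a run and evaluating a constraint on those counts. The sketch below simplifies K to a single inequality of the form Σ coeffs[q]·#q ≤ bound, which is an assumption made for brevity; it also takes the run as a flat list of states and does not check that the run follows the transitions. The function name `satisfies` and the state names are hypothetical.

```python
from collections import Counter


def satisfies(run_states, coeffs, bound):
    """Check sum(coeffs[q] * number of nodes labeled q) <= bound for a run."""
    counts = Counter(run_states)
    return sum(c * counts[q] for q, c in coeffs.items()) <= bound


# Hypothetical run labeling four nodes, and the constraint #p - #q <= 1.
run = ["p", "q", "p", "r"]
print(satisfies(run, {"p": 1, "q": -1}, 1))
```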

slide-67
SLIDE 67

Regularity

  • Talking about regularity for data tree languages doesn't quite make sense
  • Only one very limited logic with decidable emptiness
  • No equivalent automata model
  • No proof of undecidability for more expressive logics
  • Very little research so far

slide-68
SLIDE 68

Conclusion

  • Motivation
  • Data Model
  • Data Strings

– Automata and Logics
– Regularity

  • Data Trees
  • Conclusion (current section)

slide-69
SLIDE 69

We have seen...

  • First proposals for automata models for string data languages (RA)
  • Improved automata models with equivalent logical counterpart (CMA)
  • Most relevant logics for data languages (FO2,EMSO2)
  • Basic (only) result on decidability of FO2 for data trees
  • Some discussion on what regularity might be for data languages (no hope for full package)

slide-70
SLIDE 70

Open Problems

  • Is there a variant of RA closed under complementation?
  • Is there a variant of FO2 equivalent to 1WAY-ND-RA?
  • Is there a logic closed under complementation with decidable equivalence?
  • Can we extend current data string models to data trees preserving good properties?
  • Is FO2(+1,<,~) decidable for data trees?

slide-71
SLIDE 71

References

  • Frank Neven... Finite state machines for strings over infinite alphabets. 2004
  • Mikolaj Bojanczyk... Two-variable logic on words with data. 2006
  • Mikolaj Bojanczyk... Two-variable logic on data trees and XML reasoning. 2006
  • Henrik Björklund... On notions of regularity for data languages. 2007
  • …..
slide-72
SLIDE 72

Thank you...

Questions?