In the Maze of Data Languages
Loris D'Antoni, WPE II, 05/08/2012
05/08/12 In the Maze of Data Languages 2
Motivation
– Automata and Logics
– Regularity
consider the domain to be finite in order to achieve decidability
– XML documents and languages over XML use data comparison
– Interesting properties about programs compare values
[Figure: XML tree rooted at Messages with two Note children; each Note has leaves Id, From, To, Body. Data values shown: 501, 502, Mary, Tom, "Prepare dinner!", "Will do!"]
– Inner nodes correspond to elements (tags)
– Leaves correspond to data (attributes, text content)
[Figure: tree with root Messages and two Note children, each with Id, From, To, Body; no data values]
– we can consider the tree to be over a finite alphabet
– good for navigation, validation, transformation...
WHAT ABOUT TASKS IN WHICH WE WANT TO SPECIFY CONSTRAINTS OVER DATA?
XPath query optimization
SCHEMA: can define XML language and can also specify constraints on data
XPATH: query language for XML that also allows data comparison
– Q1: select all notes someone sent to himself
– Q2: select people who sent more than 3 notes
QUERY OPTIMIZATION: given two XPath queries q1, q2 and a Schema S, decide whether, for each valid document x in S, q1(x) ⊆ q2(x)
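To make the role of data comparison concrete, here is a minimal sketch of Q1 written in Python rather than XPath. The element names follow the Messages/Note example from the earlier slides; the document content, the encoding of Id, From, To and Body as child elements, and the helper name `notes_sent_to_self` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names follow the slides' Messages/Note
# example, but the concrete values are made up for illustration.
DOC = """
<Messages>
  <Note><Id>1</Id><From>Mary</From><To>Tom</To><Body>Prepare dinner!</Body></Note>
  <Note><Id>2</Id><From>Tom</From><To>Tom</To><Body>Buy milk</Body></Note>
  <Note><Id>3</Id><From>Mary</From><To>Mary</To><Body>Call Tom</Body></Note>
</Messages>
"""

def notes_sent_to_self(root):
    """Q1: ids of all notes whose From and To carry the same data value."""
    return [note.findtext("Id")
            for note in root.findall("Note")
            if note.findtext("From") == note.findtext("To")]

root = ET.fromstring(DOC.strip())
print(notes_sent_to_self(root))
```

The query cannot be answered from the tags alone: it needs an equality test between two data values, which is exactly what a finite-alphabet view of the tree drops.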
Model Checking: checking properties about programs that can have possibly infinite reachable states
– Represent system as a finite structure
– Define a transition relation
– Use algorithm for reachability of some particular state
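The three steps above can be sketched for the finite case as a plain breadth-first search. The state names and the transition table are hypothetical, and this is only one of many possible reachability algorithms.

```python
from collections import deque

def reachable(initial, transitions, target):
    """Breadth-first search: is `target` reachable from `initial`?"""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical small system; "error" is a state we would like to rule out.
TRANSITIONS = {"idle": ["busy"], "busy": ["idle", "done"], "done": []}
print(reachable("idle", TRANSITIONS, "done"))
print(reachable("idle", TRANSITIONS, "error"))
```

The search terminates because the `seen` set is bounded by the finite state space, which is the assumption that fails once states carry values from an infinite domain.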
ad-hoc solutions for particular cases of infinite alphabets and infinite states
– Timed Automata [Alur90]
– Regular model checking [Bouajjani00]
the same resource is never granted twice (with infinitely many resources)
where the states are from a finite alphabet and the resources are from an infinite domain and
Is there a list with the same resource appearing twice?
q0 r1 q1 r4 q3 r1 qf r1 ...
infinite alphabets:
– LTL with Freeze Quantifiers (LTL with storing registers)
– Timed Automata (can reason about Time)
– Symbolic Automata and Transducers (theory over input)
though they come with nice properties
general theory for structures over infinite alphabets
Data Model
– Automata and Logics
– Regularity
applications
DATA STRINGS and DATA TREES
alphabets
infinite domain is allowed for every position/node
checking for equality
easy to study and useful in practice
undecidability
– a label from a finite alphabet and
– a data value from an infinite alphabet
– the finite alphabet is {r,w,s}
– the infinite alphabet is the natural numbers
Label: r  w  w  r  s  r   r  s  r  w
Data:  1  4  1  1  4  34  4  5  5  4
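One simple way to hold such a data string in code is a list of (label, data value) pairs, one per position. The sketch below uses the example string from this slide; the helper `positions_by_value` is a hypothetical name introduced for illustration.

```python
# A data string as a list of (label, data value) pairs, one per position.
LABELS = "r w w r s r r s r w".split()
VALUES = [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]
DATA_STRING = list(zip(LABELS, VALUES))

def positions_by_value(word):
    """Group the 0-based positions of a data string by their data value."""
    groups = {}
    for pos, (_, value) in enumerate(word):
        groups.setdefault(value, []).append(pos)
    return groups

print(DATA_STRING[:3])
print(positions_by_value(DATA_STRING))
```

Only equality between data values is used here, matching the restriction that values can be compared but not otherwise inspected.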
NODE carries
– a label from a finite alphabet and
– a data value from an infinite alphabet
[Figure: tree rooted at Messages with two Note children, each with Id, From, To, Body. Data values shown: 501, 502, Mary, Tom, "Will do!", "Prepare dinner!"]
– How do we define data languages?
– What is the best/right model for data string languages?
– What is the best/right model for data tree languages?
– What is a regular data language?
all the nice properties of regular string languages
– Good tradeoff between expressiveness and decidability
– Efficiency of the membership problem
– Good closure properties
– Robustness: clear counterpart in logic and several characterizations
DOES A MODEL LIKE THAT EVEN EXIST?
Data Strings
– Automata and Logics
– Regularity
they are mainly of two kinds:
– Automata based models
– Logic based models
equivalent logic model
relevant in the `treasure hunt' for regular data string languages
finite set of registers that can store data values and test for equality
STATE q: R1=4, R2=1
(q,R1,r) steps to (q',L)
STATE q': R1=4, R2=1
Label: r  w  w  r  s  r   r  s  r  w
Data:  1  4  1  1  4  34  4  5  5  4
STATE q: R1=4, R2=1
if no R contains current value
(q,r) steps to (q',R2,R)
STATE q': R1=4, R2=5
Label: r  w  w  r  s  r   r  s  r  w
Data:  1  4  1  1  4  34  4  5  5  4
contain the same data value
Transitions:
– (q0,{r,w,s}) → (q1,R1,R)
– (q1,R1,{r,w,s}) → (qf,S)
– (q1,{r,w,s}) → (q1,R1,R)
Label: r  w  w  r  s  r   r  s  r  w
Data:  1  4  1  1  4  34  4  5  5  4
Run (one line per animation step):
STATE q0, R1 empty
(q0,{r,w,s}) → (q1,R1,R) fires: STATE q1, R1=1
(q1,{r,w,s}) → (q1,R1,R) fires: STATE q1, R1=4
(q1,{r,w,s}) → (q1,R1,R) fires: STATE q1, R1=1
(q1,R1,{r,w,s}) → (qf,S) fires: STATE qf, R1=1
DATA STRING ACCEPTED
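The run above can be replayed with a small simulator. This is a sketch under stated assumptions rather than a definitive formalization: a comparison transition (q,Ri,a) is taken to fire when the current data value equals register Ri, a storing transition (q,a) to fire when no register holds the current value, and reaching qf is treated as immediate acceptance. Registers are indexed from 0, so R1 is index 0, and all function and variable names are hypothetical.

```python
def run_register_automaton(word, start, accepting, compare, store, n_registers=1):
    """Simulate a one-way deterministic register automaton (sketch).

    compare maps (state, register index, label) -> next state and fires when
    the current data value equals that register.
    store maps (state, label) -> (next state, register index) and fires when
    no register holds the current data value, which is then stored.
    """
    state = start
    registers = [None] * n_registers
    for label, value in word:
        if value in registers:
            key = (state, registers.index(value), label)
            if key not in compare:
                return False  # no applicable transition: reject
            state = compare[key]
        else:
            key = (state, label)
            if key not in store:
                return False  # no applicable transition: reject
            state, index = store[key]
            registers[index] = value
        if state in accepting:
            return True  # assumption: the run stops as soon as qf is reached
    return False

ALPHABET = ("r", "w", "s")
# (q1,R1,{r,w,s}) -> (qf,S)
COMPARE = {("q1", 0, a): "qf" for a in ALPHABET}
# (q0,{r,w,s}) -> (q1,R1,R) and (q1,{r,w,s}) -> (q1,R1,R)
STORE = {("q0", a): ("q1", 0) for a in ALPHABET}
STORE.update({("q1", a): ("q1", 0) for a in ALPHABET})

WORD = list(zip("r w w r s r r s r w".split(), [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]))
print(run_register_automaton(WORD, "q0", {"qf"}, COMPARE, STORE))
```

On the slide's data string the register takes the values 1, 4, 1 and the fourth position triggers the comparison transition, matching the animation.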
The Register Automaton (RA) in the previous example is called one-way deterministic
– Deterministic, Nondeterministic or Alternating
– One way or Two way
the decidability of the model
                           Emptiness    Universality  Inclusion    Equivalence
1Way-ND Register Automata  Decidable    Undecidable   Undecidable  Undecidable
2Way-D Register Automata   Undecidable  Undecidable   Undecidable  Undecidable
Maybe we should slow down a little bit... Let's put PAs aside and try to improve 1Way-ND RAs
1Way-ND-RAs are limited in expressiveness: they can't even represent the language of strings where all the data values are different!!
more expressive model...but not too expressive, otherwise we hit undecidability
Weakness of RAs: they can only talk about global properties but not about local properties:
– GLOBAL: property of the whole string
– LOCAL: all labels of positions with the same data value have some property
More expressive than Register Automata :)
Single pass and one way
Non-deterministic :(
before where all data values are different. For every q,
– (q,{r,w,s},-) → q1
– (q,{r,w,s},q1) → q2
– (q,{r,w,s},q2) → q2
Label: r  w  w  r  s  r   r  s  r  w
Data:  1  4  1  1  4  34  4  5  5  4
Class memory (data value → state) across the animation steps; a value listed without a state has none recorded in that frame:
1
1 → q1, 4
1 → q1, 4 → q1, 34
1 → q1, 4 → q2, 34
1 → q2, 4 → q2, 34 → q1, 5 → q2, current state q2
DATA STRING REJECTED
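The class memory idea can be sketched as a dictionary from data values to the state assigned at their latest position. The code below handles only a deterministic run, whereas the model itself is nondeterministic. The start state q0 and the acceptance condition (final state in a global accepting set, and every remembered state in a local accepting set) are assumptions not stated on the slides, and all names are hypothetical.

```python
def run_class_memory_automaton(word, start, delta, local_accepting, global_accepting):
    """Deterministic sketch of a class memory automaton run.

    delta maps (state, label, remembered) -> next state, where `remembered`
    is the state assigned at the previous position with the same data value,
    or None (written '-' on the slides) if the value is fresh.
    """
    state = start
    memory = {}  # data value -> state assigned at its latest position
    for label, value in word:
        key = (state, label, memory.get(value))
        if key not in delta:
            return False, memory
        state = delta[key]
        memory[value] = state
    # Assumed acceptance condition: global check on the final state,
    # local check on the state remembered for every data value.
    accepted = (state in global_accepting
                and all(q in local_accepting for q in memory.values()))
    return accepted, memory

STATES = ("q0", "q1", "q2")  # q0 is a hypothetical start state
ALPHABET = ("r", "w", "s")
DELTA = {}
for q in STATES:
    for a in ALPHABET:
        DELTA[(q, a, None)] = "q1"   # (q,{r,w,s},-)  -> q1
        DELTA[(q, a, "q1")] = "q2"   # (q,{r,w,s},q1) -> q2
        DELTA[(q, a, "q2")] = "q2"   # (q,{r,w,s},q2) -> q2

WORD = list(zip("r w w r s r r s r w".split(), [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]))
accepted, memory = run_class_memory_automaton(WORD, "q0", DELTA, {"q1"}, {"q1"})
print(accepted, memory)
```

On the slide's data string the final memory is 1 → q2, 4 → q2, 34 → q1, 5 → q2 with current state q2, which agrees with the last animation frame and with the rejection.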
in general
– 1WAY-ND-RAs are strictly included in CMAs as far as expressiveness is concerned
Emptiness is still decidable for CMAs!!!
Membership was easy, now it is NP-Complete :(
usable :)
expressive than 1Way-ND-Register Automata :)
decidable emptiness :)
– Models equivalent to a logic are more robust,
– They have closure properties for free
declarative model equivalent to CMA...
Data Strings
– Automata and Logics
– Regularity
Variables range over positions of the data string
Formulae of the form Φ ::= a(x) | x=y | ∃x.Φ | Φ ∨ Φ | ¬Φ | x=y+1 | x<y | x~y
FO(+1,<,~)
strings with all different data values: ¬∃x.∃y.(¬(x=y) ∧ x~y)
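To see what the example formula says on a concrete data string, it can be evaluated by brute force over all pairs of positions. This sketch assumes a data string is a list of (label, data value) pairs and reads x~y as "positions x and y carry the same data value"; the function name is hypothetical.

```python
from itertools import product

def all_data_values_different(word):
    """Evaluate the formula 'not exists x, y: x != y and x ~ y' by brute force."""
    positions = range(len(word))
    exists_witness = any(
        x != y and word[x][1] == word[y][1]
        for x, y in product(positions, repeat=2)
    )
    return not exists_witness

WORD = list(zip("r w w r s r r s r w".split(), [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]))
print(all_data_values_different(WORD))
print(all_data_values_different([("r", 1), ("w", 2), ("s", 3)]))
```

Checking one given string is easy; the hard question on these slides is emptiness, that is, whether any data string at all satisfies a given formula.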
Variables range over positions of the data string
Second-order variables range over sets of positions
Formulae of the form Φ ::= a(x) | x=y | ∃x.Φ | Φ ∨ Φ | ¬Φ | x<y | x=y+1 | x~y | ∃X.Φ | x ∈ X
MSO(+1,<,~)
with at least one position with label a: ∃X. ∃x. (x ∈ X ∧ a(x))
Emptiness: Given a formula Φ in FO(+1,<,~) or MSO(+1,<,~), checking whether there exists a data string s that is a model of Φ is undecidable
weaker
number of variables that can be used is limited...
FO2: first-order logic with only two variables x,y
– FOK(+1,<,~) is less expressive than FO(K+1)(+1,<,~)
– Emptiness for a formula in FO2(...) is decidable
– Emptiness for a formula in FO3(...) is undecidable
EMSO2(+1,<,~): monadic second-order logic where all the set variables X are existentially quantified and appear at the beginning of a formula
– ∃X1...∃Xn.Φ and Φ only contains FO2(...)
– Emptiness for a formula in EMSO2(...) is decidable
– EMSO2(...) is strictly more expressive than FO2(...)
⊕1 such that:
– x=y⊕1 iff x is the next position after y with the same data value as y
EMSO2(+1,<,~,⊕1): EMSO2(+1,<,~) enriched with the ⊕1 operator
Emptiness for a formula in EMSO2(+1,<,~,⊕1) is decidable
EMSO2(+1,<,~,⊕1) is strictly more expressive than EMSO2(+1,<,~)
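The relation behind ⊕1 can be computed in one left-to-right pass. The sketch below assumes a data string is a list of (label, data value) pairs and returns, for every 0-based position, the next later position carrying the same data value, or None if there is none; the function name is hypothetical.

```python
def class_successor(word):
    """For each position, the next later position with the same data value."""
    successor = [None] * len(word)
    last_seen = {}  # data value -> latest position carrying it
    for pos, (_, value) in enumerate(word):
        if value in last_seen:
            successor[last_seen[value]] = pos
        last_seen[value] = pos
    return successor

WORD = list(zip("r w w r s r r s r w".split(), [1, 4, 1, 1, 4, 34, 4, 5, 5, 4]))
print(class_successor(WORD))
```

For the example string the positions holding value 4 are chained 1, 4, 6, 9, so ⊕1 lets a formula step through one data value's occurrences while skipping everything in between.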
CLASS MEMORY AUTOMATA
EMSO2(+1,<,~,⊕1)
all have the same expressiveness
Data Strings
– Automata and Logics
– Regularity
but it is still not regular in the standard sense
                                         Regular String Languages  Class Memory Automata
Equivalence                              Decidable                 Undecidable
Emptiness                                Decidable                 Decidable
Deterministic                            Yes                       No
Closure under intersection, union, star  Yes                       Yes
Closure under complement                 Yes                       No
Logical counterpart                      Yes                       Yes
Membership                               Linear                    NP-Complete
– We can look for a less expressive model, maybe equivalent to some form of FO, so that we have closure under complement
– We accept that Data Languages are hard to work with and we give up on some properties
regular string data language is still an open problem
– Automata and Logics
– Regularity
Data Trees
NODE carries
– a label from a finite alphabet and
– a data value from an infinite alphabet
[Figure: data tree with root Messages and children (Note, 34) and (Note, 54), each with Id, From, To, Body. Leaf values shown: 501, 502, Mary, Tom, "Will do!", "Prepare dinner!"]
state of the art for automata and logics over data trees is embryonic
– Some extensions of register automata, but without intuitive semantics and limited expressiveness
– Extensions of FO2(+1,<,~), EMSO2(+1,<,~) to data trees
interpretation...
two new predicates
– x = y↓1 parent relation and
– x = y→1 next sibling relation
[Figure: example tree with nodes a, b, g, c, d, e, f, h, i, j, k]
+1 predicate corresponds to ↓ and →
< predicate corresponds to the transitive closures of ↓ and →, ↓* and →*
– FO2(+1,<,~) and EMSO2(+1,<,~) over data trees
drop the < operator
– FO2(+1,~) and EMSO2(+1,~) over data trees
Emptiness for
– FO2(+1,~) and EMSO2(+1,~) over data trees is decidable
– We will see a proof sketch
Emptiness for
– FO2(+1,<,~) and EMSO2(+1,<,~) over data trees is a very hard open problem
– Emptiness for vector addition tree automata reduces to it...
Bottom up tree automata over binary trees in which transitions have three vectors
– Every leaf is assigned a vector of values over Nat
– Transitions of the form q1,q2,a,b,c,l → q
root is labeled with the vector 0
Very hard open problem reduces to emptiness for FO2(+1,<,~)
[Figure: node labeled l with state q and vector z, above two children carrying (q1,x) and (q2,y)]
If x-a>0 and y-b>0, z=(x-a)+(y-b)+c
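The vector update of a single transition can be sketched as follows. Vectors are plain Python lists of naturals, and the guard is read componentwise with the strict inequality exactly as written on the slide; that componentwise reading and the function name `combine` are assumptions made for illustration.

```python
def combine(x, y, a, b, c):
    """Apply the vector part of a transition q1,q2,a,b,c,l -> q.

    Returns z = (x-a) + (y-b) + c, or None if the guard fails. The guard is
    the strict one from the slide (x-a > 0 and y-b > 0), read componentwise;
    that reading is an assumption.
    """
    xa = [xi - ai for xi, ai in zip(x, a)]
    yb = [yi - bi for yi, bi in zip(y, b)]
    if not all(v > 0 for v in xa + yb):
        return None
    return [p + q + ci for p, q, ci in zip(xa, yb, c)]

print(combine([3, 2], [2, 2], [1, 1], [1, 1], [0, 5]))
print(combine([1, 1], [2, 2], [1, 0], [0, 0], [0, 0]))
```

The first call passes the guard and adds the two decremented child vectors to c; the second fails because one component of x-a is not positive.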
F in FO2(+1,~)
– Compute a ``puzzle'' P that has solutions iff F is satisfiable
– If a ``puzzle'' P has a solution, then there also exists a ``small'' solution
– Find if there exists a ``small'' solution of P using some extended tree automata
``puzzle'' P that has solutions iff F is satisfiable
– Reduce F to a normal form F'
– For every formula F' we can compute a puzzle
puzzle over Σ is a pair (L,F)
– L is a tree automaton over an extension of Σ
– F is a set of accepting pairs (D,S) where D,S are disjoint subsets of Σ
– Solution: a data tree ``in'' L where for each class there exists a pair (D,S) such that all labels are from D ∪ S and each label in D appears at most
Class: Maximal set of connected nodes with the same data value
``puzzle'' P has a solution, then there also exists a ``small'' solution
– Let's assume there is a solution S
(M,N)-reduced solution is a solution where:
– At most M classes are of size greater than N
– There are at most M siblinghoods with more than N classes
compute an (M,N)-reduced solution and
– M and N are effectively computable numbers from the size of the puzzle
Siblinghood: Set of children
Find if there exists a ``small'' solution of P using some extended tree automata
A and a tree t, a run of A can be seen as a labeling of t with the states Q of A
– Linear constraint tree automata (LCTA) are equipped with a linear constraint K over Q
– A run accepts if K is satisfied when instantiated with the number of occurrences of each state in the run
– Emptiness of LCTA is decidable
LCTA that is empty iff the puzzle P does not have (M,N)-reduced solutions
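The counting part of LCTA acceptance can be sketched as follows. The code checks only the linear constraint, not that the labeling is a valid run of the underlying tree automaton. The single-inequality form of K, the state names and the function name are assumptions made for illustration.

```python
from collections import Counter

def satisfies_constraint(run_labeling, coefficients, bound):
    """Check a linear constraint of the form sum(coeff[q] * count(q)) <= bound.

    run_labeling is the list of states that a run assigns to the nodes of t.
    The single-inequality shape of K is an assumption made for illustration.
    """
    counts = Counter(run_labeling)
    total = sum(coeff * counts[q] for q, coeff in coefficients.items())
    return total <= bound

# Hypothetical constraint: state "p" occurs at most as often as state "q".
K = {"p": 1, "q": -1}
print(satisfies_constraint(["p", "q", "q"], K, 0))
print(satisfies_constraint(["p", "p", "q"], K, 0))
```

Only the number of occurrences of each state matters, not where in the tree the states appear.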
regularity for data tree languages doesn't quite make sense
Only one very limited logic with decidable emptiness
No equivalent automata model
Very little research so far
– Automata and Logics
– Regularity
Conclusion
languages (RA)
counterpart (CMA)
(FO2,EMSO2)
FO2 for data trees
regularity might be for data languages (no hope for full package)
RA closed under complementation?
FO2 equivalent to 1WAY-ND-RA?
logic closed under complementation with decidable equivalence?
extend current data string models to data trees preserving good properties?
FO2(+1,<,~) decidable for data trees?
xml reasoning. 2006