
slide-1
SLIDE 1

Efficient Static Analysis of XML Paths and Types

Pierre Genevès – EPFL, Switzerland
Joint work with Nabil Layaïda and Alan Schmitt – INRIA, France
PLDI’07, San Diego, June 2007

slide-2
SLIDE 2

Introduction

More and more XML data
Objective: ensuring safety and efficiency of programs that manipulate XML

Two ways of processing XML:
1. General-purpose languages extended with libraries
2. DSLs, e.g. XSLT and XQuery (W3C standards), which rely on XPath

In both cases, static analysis of programs is very hard (detecting errors at compile time is very complex).
This paper: we solve important XML static analysis tasks by reduction to satisfiability of a new tree logic.

P . Genevès, EPFL Efficient Static Analysis of XML Paths and Types

slide-3
SLIDE 3

Safety and Efficiency of Programs

Programs that manipulate XML trees
Analysis:
- tree types (XML Schemas, DTDs)
- queries (XPath)

[Figure: an XML tree with nodes labelled a, b and c, constrained by a type T]

q = /descendant::b/parent::a/child::c

q ⊕T =? ∅
q ⊕T ≡? /child::a/child::c (= qoptimised)

Before: complexity too high, implementations out of scope...
This paper: optimal complexity + efficient implementation
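
To make the equivalence question concrete, here is a minimal sketch that evaluates the two queries from the slide on hand-built trees. The `Node` class, the helper names and both sample documents are assumptions made for illustration; the type T from the slide is not reproduced. It only shows that the two queries agree on some documents and differ on others, which is why the comparison has to be made relative to a type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality: two nodes are equal only if they are the same object
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def descendants(node):
    """Yield all descendants of node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def q(doc):
    """/descendant::b/parent::a/child::c, evaluated from the document node."""
    result = []
    for b in descendants(doc):
        a = b.parent
        if b.label == "b" and a is not None and a.label == "a":
            for c in a.children:
                if c.label == "c" and c not in result:
                    result.append(c)
    return result


def q_optimised(doc):
    """/child::a/child::c, evaluated from the document node."""
    return [c for a in doc.children if a.label == "a"
            for c in a.children if c.label == "c"]


# Hypothetical document 1: the root element is an a with children b and c.
doc1 = Node("/", [Node("a", [Node("b"), Node("c")])])
# Hypothetical document 2: the same a element, nested one level deeper.
doc2 = Node("/", [Node("x", [Node("a", [Node("b"), Node("c")])])])

same_on_doc1 = q(doc1) == q_optimised(doc1)   # both select the single c node
same_on_doc2 = q(doc2) == q_optimised(doc2)   # q selects c, q_optimised selects nothing
```

On `doc1` both queries select the same node, while on `doc2` only `q` does. A type that rules out documents shaped like `doc2` would therefore be needed for the rewriting to be safe.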

slide-15
SLIDE 15

Safety and Efficiency of Programs

Programs that manipulate XML trees
Analysis:
- tree types (XML Schemas, DTDs)
- queries (XPath)

[Figure: an XML tree with nodes labelled a, b and c, constrained by a type T]

q = /descendant::b/parent::a/child::c

q ⊕T ≡? /child::a/child::c (= qoptimised)
q ∩ qforbidden =? ∅

In a host program:
  for x in (q) do { ... }
  let n = q; ...
Optimisation: qoptimised in place of q
Error detection: ! forbidden access!

Before: complexity too high, implementations out of scope...
This paper: optimal complexity + efficient implementation

slide-18
SLIDE 18

XPath Static Analysis Tasks

Basic Tasks
1. XPath typing
2. XPath query comparisons: query containment, emptiness, overlap, equivalence

Main Applications
- Static analysis of host languages: error detection, optimization (static type-checkers, optimizing compilers)
- Checking integrity constraints in XML databases

slide-19
SLIDE 19

Challenges

Query comparisons and typing are undecidable for the complete XPath language

Open Questions
- What are the largest XPath fragments with decidable static analysis?
- Which fragments can be effectively decided in a compiler?
- Is there a generic algorithm able to solve all related XPath decision problems?

Difficulties
- The XPath operators considered and their combination (e.g., multidirectional navigation, recursion)
- Checking properties on a possibly infinite set of XML documents
- Very high computational complexity

slide-22
SLIDE 22

The Logical Approach: Overview

- Find an appropriate logic for reasoning on XML trees
- Formulate the problem in the logic and test satisfiability

[Diagram: queries q1 and q2 (XPath fragment) and a schema S are translated into formulas of the logic (S becomes ϕS); the satisfiability-testing algorithm is run on a negated implication ¬(ϕ ⇒ ϕ′) between the two query translations and answers Yes/No, with a counter-example]

Critical Aspects
1. The logic must be expressive enough
2. The algorithm must be effective in practice for XML translations
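
The query comparisons listed earlier can each be phrased as a satisfiability question. The block below is a sketch of the standard reductions, ignoring type constraints; the notation $\varphi_q$ for the translation of a query $q$ is an assumption introduced here, not the slide's.

```latex
\begin{align*}
q = \emptyset
  &\iff \varphi_{q} \text{ is unsatisfiable} \\
q_1 \subseteq q_2
  &\iff \neg(\varphi_{q_1} \Rightarrow \varphi_{q_2}) \text{ is unsatisfiable} \\
q_1 \cap q_2 \neq \emptyset
  &\iff \varphi_{q_1} \wedge \varphi_{q_2} \text{ is satisfiable} \\
q_1 \equiv q_2
  &\iff q_1 \subseteq q_2 \text{ and } q_2 \subseteq q_1
\end{align*}
```

When a containment fails, any tree satisfying $\neg(\varphi_{q_1} \Rightarrow \varphi_{q_2})$ is a counter-example: a document on which $q_1$ selects a node that $q_2$ does not.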

slide-24
SLIDE 24

Models for XML Documents

Finite ordered binary trees, one label per node
Bijective encoding of unranked trees as binary trees
[Figure: an unranked tree over nodes 1, 2, 3 and its binary-tree encoding]
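
A minimal sketch of such an encoding, assuming the usual first-child / next-sibling scheme (the slide does not spell out which bijection is used). The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unranked:
    label: str
    children: List["Unranked"] = field(default_factory=list)


@dataclass
class Binary:
    label: str
    first: Optional["Binary"] = None  # successor 1: first child
    next: Optional["Binary"] = None   # successor 2: next sibling


def encode_forest(nodes: List[Unranked]) -> Optional[Binary]:
    """Encode an ordered list of sibling trees as one binary tree."""
    if not nodes:
        return None
    head, *rest = nodes
    return Binary(head.label, encode_forest(head.children), encode_forest(rest))


def decode_forest(b: Optional[Binary]) -> List[Unranked]:
    """Inverse of encode_forest."""
    if b is None:
        return []
    return [Unranked(b.label, decode_forest(b.first))] + decode_forest(b.next)


tree = Unranked("r", [Unranked("1"), Unranked("2"), Unranked("3")])
encoded = encode_forest([tree])
round_trip = decode_forest(encoded)
```

In the encoded tree, node 1 is the first child of r, node 2 is the next sibling of 1, and node 3 is the next sibling of 2. Decoding restores the original tree, which is what makes the encoding a bijection.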

slide-25
SLIDE 25

Formulas of the Lµ Logic

Programs α ∈ {1, 2, 1̄, 2̄} for navigating binary trees (the converse of ᾱ is α)

Lµ ∋ ϕ, ψ ::=              formula
      ⊤                    true
    | σ | ¬σ               atomic proposition (negated)
    | ⓢ | ¬ⓢ               starting context (negated)
    | ϕ ∨ ψ                disjunction
    | ϕ ∧ ψ                conjunction
    | ⟨α⟩ϕ | ¬⟨α⟩⊤         existential (negated)
    | X                    variable
    | µX.ϕ                 unary fixpoint
    | µXi.ϕi in ψ          n-ary fixpoint

Closed formulas
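
The grammar above can be mirrored by a small abstract syntax. The sketch below is only an illustration: all class names are hypothetical, programs are written as the strings "1", "2", "-1", "-2" (the last two standing for the converse programs), and the n-ary fixpoint is left out for brevity. `is_closed` checks that every variable is bound by an enclosing fixpoint.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Converse of each program; taking the converse twice gives back the program.
CONVERSE = {"1": "-1", "2": "-2", "-1": "1", "-2": "2"}


@dataclass(frozen=True)
class Top:
    pass


@dataclass(frozen=True)
class Prop:          # atomic proposition, possibly negated
    name: str
    negated: bool = False


@dataclass(frozen=True)
class Start:         # starting context mark, possibly negated
    negated: bool = False


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Diamond:       # existential modality <program> body
    program: str
    body: "Formula"


@dataclass(frozen=True)
class NoMove:        # negated existential: no successor along program
    program: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Mu:            # unary least fixpoint
    var: str
    body: "Formula"


Formula = Union[Top, Prop, Start, Or, And, Diamond, NoMove, Var, Mu]


def free_vars(f: Formula) -> FrozenSet[str]:
    """Variables of f that are not bound by an enclosing Mu."""
    if isinstance(f, Var):
        return frozenset({f.name})
    if isinstance(f, (Or, And)):
        return free_vars(f.left) | free_vars(f.right)
    if isinstance(f, Diamond):
        return free_vars(f.body)
    if isinstance(f, Mu):
        return free_vars(f.body) - {f.var}
    return frozenset()


def is_closed(f: Formula) -> bool:
    return not free_vars(f)
```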

slide-34
SLIDE 34

Semantics of Lµ

The set of models of a formula ϕ is the set of finite binary trees for which ϕ is satisfied on some node

Translating into Lµ:

following-sibling::a
    a ∧ µZ.(⟨2̄⟩ⓢ ∨ ⟨2̄⟩Z)

following-sibling::a/preceding-sibling::b
    b ∧ [µY.⟨2⟩(a ∧ µZ.(⟨2̄⟩ⓢ ∨ ⟨2̄⟩Z)) ∨ ⟨2⟩Y]

[Figure: a row of sibling nodes labelled a, c, a, b illustrating the two navigation steps]

µZ.ϕ : finite recursion
{1̄, 2̄} required for forward axes!
{1, 2} required for reverse axes!
Converse programs are crucial
Almost full XPath can be translated (only variable counting constraints and data value comparisons left)
Schemas can also be captured!
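
To see how the two translated formulas behave, the sketch below evaluates them on one fixed binary tree by naive fixpoint iteration. This is model checking on a single tree, not the satisfiability algorithm of the talk. The tuple encoding of formulas, the function names and the sample tree are all assumptions made for illustration, and the negated forms are left out.

```python
def build_moves(first_child, next_sibling):
    """Map (node, program) -> node for programs "1", "2" and converses "-1", "-2"."""
    moves = {}
    for parent, child in first_child.items():
        moves[(parent, "1")] = child
        moves[(child, "-1")] = parent
    for left, right in next_sibling.items():
        moves[(left, "2")] = right
        moves[(right, "-2")] = left
    return moves


def evaluate(f, labels, moves, start, env=None):
    """Return the set of nodes of the tree at which formula f holds."""
    env = env or {}
    kind = f[0]
    if kind == "top":
        return set(labels)
    if kind == "prop":
        return {n for n in labels if labels[n] == f[1]}
    if kind == "start":
        return {start}
    if kind in ("or", "and"):
        left = evaluate(f[1], labels, moves, start, env)
        right = evaluate(f[2], labels, moves, start, env)
        return left | right if kind == "or" else left & right
    if kind == "dia":
        target = evaluate(f[2], labels, moves, start, env)
        return {n for n in labels if moves.get((n, f[1])) in target}
    if kind == "var":
        return env[f[1]]
    if kind == "mu":
        current = set()  # least fixpoint: iterate upwards from the empty set
        while True:
            updated = evaluate(f[2], labels, moves, start, {**env, f[1]: current})
            if updated == current:
                return current
            current = updated
    raise ValueError(f"unknown formula: {f!r}")


# Hypothetical tree: root r (node 0) with children a, b, a, c, a (nodes 1..5).
labels = {0: "r", 1: "a", 2: "b", 3: "a", 4: "c", 5: "a"}
moves = build_moves(first_child={0: 1}, next_sibling={1: 2, 2: 3, 3: 4, 4: 5})

# a AND mu Z. (<-2> start OR <-2> Z)
following_a = ("and", ("prop", "a"),
               ("mu", "Z", ("or", ("dia", "-2", ("start",)),
                                  ("dia", "-2", ("var", "Z")))))
# b AND mu Y. (<2> following_a OR <2> Y)
preceding_b = ("and", ("prop", "b"),
               ("mu", "Y", ("or", ("dia", "2", following_a),
                                  ("dia", "2", ("var", "Y")))))

selected_a = evaluate(following_a, labels, moves, start=1)
selected_b = evaluate(preceding_b, labels, moves, start=1)
```

With node 1 as the starting context, `selected_a` contains the two later a siblings (nodes 3 and 5) and `selected_b` contains the b sibling (node 2). The formula for the forward axis uses the converse program "-2", since it is checked at the selected node and has to look back towards the context.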

slide-38
SLIDE 38

Satisfiability-Testing Algorithm: Principles

Search for a Tree that Satisfies ψ
- The truth status of ψ can be determined from a few of its subformulas
- A node is a ψ-type (a conjunction of formulas)

Bottom-up Construction of a Tree of ψ-types
A set T of ψ-types is repeatedly updated (least fixpoint computation):
- Initially: ∅
- Step 1: all possible leaves are added
- Step i: all possible parent nodes of current nodes are added

Termination
- If ψ is present in some node, then ψ is satisfiable
- Otherwise, the algorithm terminates when no more nodes can be added
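
The shape of this loop can be sketched on a much simpler object. The code below runs the same kind of bottom-up saturation on a toy bottom-up tree automaton, with states in place of ψ-types. It is an analogy under that substitution, not the algorithm of the talk, and the rule format and names are assumptions.

```python
def saturate(rules, accepting):
    """Bottom-up saturation with early exit.

    rules: iterable of (label, left, right, state); left/right are the states
    required of the two children, or None when that child is absent.
    Returns (known_states, found) where found tells whether an accepting
    state became reachable.
    """
    known = set()  # initially: empty set
    while True:
        # Step 1 adds all leaves (no children required); later steps add
        # every node whose required children are already known.
        added = {
            state
            for (_label, left, right, state) in rules
            if state not in known
            and (left is None or left in known)
            and (right is None or right in known)
        }
        if not added:
            return known, False  # fixpoint reached without success
        known |= added
        if known & accepting:
            return known, True   # a witness tree exists


# Hypothetical rules: a leaf b yields state B; an a node over a B child yields A.
rules = [("b", None, None, "B"), ("a", "B", None, "A")]
reachable, found = saturate(rules, accepting={"A"})

# State A needs a child in state X, which no rule ever produces.
stuck, found_stuck = saturate([("a", "X", None, "A")], accepting={"A"})
```

The two exit points mirror the termination conditions on the slide: stop with success as soon as the target appears, or stop with failure once an iteration adds nothing new.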

slide-39
SLIDE 39

Correctness & Complexity

Theorem
The satisfiability problem for a formula ψ ∈ Lµ is decidable in time 2^O(n), where n is the size of ψ.
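
One way to read a bound of this shape is sketched below. It rests on assumptions that are not stated on the slide: each ψ-type is a subset of a set of at most $c \cdot n$ subformulas of ψ for some constant $c$, every iteration of the bottom-up construction adds at least one new type or stops, and one iteration costs at most $2^{O(n)}$.

```latex
\begin{align*}
\#\text{types} &\le 2^{c n} \\
\#\text{iterations} &\le \#\text{types} \le 2^{c n} \\
\text{total time} &\le \underbrace{2^{c n}}_{\text{iterations}}
  \cdot \underbrace{2^{O(n)}}_{\text{cost per iteration}} = 2^{O(n)}
\end{align*}
```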

slide-40
SLIDE 40

Experimental Results

The First Implementation
- Able to handle such a large XPath fragment
- Able to handle schemas (regular tree types)

What Can Now Be Done

Time (s)   Solved Problems
< 0.5      Comparisons of XPath queries (XPathmark) without tree types
< 1        Medium tree types involved (≈ 30 symbols, ≈ 20 variables). Example: W3C SMIL
< 3        Large tree types involved (≈ 100 symbols, ≈ 400 variables). Example: W3C XHTML

slide-41
SLIDE 41

Summary and Perspectives

A New Tree Logic
- Best balance known between expressiveness and complexity
- Translation of main XML concepts: linear
- Implementation already fairly efficient for static analysis

Future Work
- Extensions of the logic: decidable data-value comparisons, decidable counting constraints
- Type inference for XSLT/XQuery without output type annotations
- More applications in program analysis?

Lµ is as expressive as MSO, and the solver is orders of magnitude faster than MONA...

slide-42
SLIDE 42

Thank you!

pierre.geneves@epfl.ch
