SLIDE 1

An Incremental Correction Algorithm for XML Documents and Single Type Tree Grammars

Martin Svoboda, Irena Mlýnková

XML and Web Engineering Research Group Charles University in Prague The Czech Republic 24 April 2012 NDT 2012 Dubai, United Arab Emirates

SLIDE 2

An Incremental Correction Algorithm for XML Documents 2 NDT 2012, Dubai, UAE 24 April 2012

Outline

  • Introduction
  • Motivation
  • Objectives
  • Approach
  • Corrections
  • Algorithms
  • Experiments
  • Conclusion

SLIDE 3

Introduction

  • Motivation
  • Incorrect XML documents

‒ Well-formedness
‒ Schema validity
‒ Data consistency
‒ …

  • Strategies

‒ Adjusting algorithms
‒ Correcting data

SLIDE 4

Introduction

  • Problem
  • Input

‒ One XML document

  • Well-formed but (potentially) invalid

‒ DTD or XML Schema

  • Output

‒ All minimal repairs

  • Structural corrections of elements

SLIDE 5

Definitions

  • Document
  • Trees

‒ Nodes for elements and texts
‒ Prefix numbering of nodes

  • Example

<a> <x><d/></x> <d><d/><d/></d> </a>

Position  Label
ε         a
0         x
0.0       d
1         d
1.0       d
1.1       d
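The prefix numbering in this example can be reproduced with a short sketch. It uses Python's standard `xml.etree.ElementTree`; the function name `prefix_numbering` is made up for illustration, and text nodes are ignored here even though the slides number them too.

```python
import xml.etree.ElementTree as ET

def prefix_numbering(xml_text):
    """Return (position, label) pairs in document order; the root sits at ε."""
    root = ET.fromstring(xml_text)
    out = []

    def visit(node, pos):
        # A position is the path of child indexes from the root, e.g. 1.0
        out.append((".".join(map(str, pos)) or "ε", node.tag))
        for i, child in enumerate(node):
            visit(child, pos + (i,))

    visit(root, ())
    return out

nodes = prefix_numbering("<a> <x><d/></x> <d><d/><d/></d> </a>")
for position, label in nodes:
    print(position, label)
```

The printed pairs follow document order, so a parent always appears before its children.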

SLIDE 6

Definitions

  • Schema
  • Grammars

‒ Terminal symbols for element names
‒ Nonterminal symbols for types
‒ Production rules based on regular expressions

  • Classes

‒ Regular tree grammars
‒ Single type tree grammars (XML Schema)
‒ Local tree grammars (DTD)
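As a rough illustration of these definitions, the sketch below encodes the grammar of the slides' running example (types A, B, C, D) and checks a tree against it top-down. It makes several assumptions that are not in the slides: trees are `(label, children)` pairs, content models are Python regular expressions over one-letter type symbols, and both A and B may type the root. It is not the authors' implementation.

```python
import re

# type -> (element name, content model as a regex over type symbols)
GRAMMAR = {
    "A": ("a", "CD*"),
    "B": ("b", "D*"),
    "C": ("c", ""),
    "D": ("d", "D*"),
}

def child_type(parent_type, name):
    """Single type property: inside one content model a name selects at most one type."""
    model = GRAMMAR[parent_type][1]
    candidates = [t for t in set(model) if t in GRAMMAR and GRAMMAR[t][0] == name]
    return candidates[0] if len(candidates) == 1 else None

def valid(tree, node_type):
    label, children = tree
    name, model = GRAMMAR[node_type]
    if label != name:
        return False
    types = [child_type(node_type, child[0]) for child in children]
    if None in types or not re.fullmatch(model, "".join(types)):
        return False
    return all(valid(child, t) for child, t in zip(children, types))

original = ("a", [("x", [("d", [])]), ("d", [("d", []), ("d", [])])])
corrected = ("a", [("c", []), ("d", [("d", []), ("d", [])])])
print(valid(original, "A"), valid(corrected, "A"))
```

Because each child name determines its type within the parent's content model, validation needs no backtracking. This is what separates single type tree grammars from general regular tree grammars.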

SLIDE 7

Model

  • Edit operations
  • ADD leaf, REMOVE leaf, RENAME label
  • Update operations
  • Sequences of edit operations
  • INSERT, DELETE, REPAIR, RENAME
  • Cost function
  • Unit costs of edit operations
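The relation between edit operations, update operations and cost can be sketched as follows. The `Edit` record, the helper `delete_subtree` and the `(label, children)` tree shape are illustrative assumptions. The slides only state that update operations are sequences of edit operations with unit costs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    kind: str        # "ADD", "REMOVE" or "RENAME"
    position: str    # prefix number of the affected node
    label: str = ""  # new label, used by ADD and RENAME

def delete_subtree(tree, position):
    """Expand a DELETE update into REMOVE-leaf edits, children before their parent."""
    label, children = tree
    edits = []
    # Go right to left so the positions of the remaining siblings do not shift.
    for i in reversed(range(len(children))):
        edits += delete_subtree(children[i], f"{position}.{i}")
    edits.append(Edit("REMOVE", position))
    return edits

def cost(edits):
    return len(edits)  # unit cost per edit operation

edits = delete_subtree(("x", [("d", [])]), "0")
print(edits, cost(edits))
```

Under unit costs, deleting a subtree costs as much as the number of nodes it contains, since every node must become a leaf before it can be removed.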

SLIDE 8

Model

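[Figure: the example document tree and three corrected trees, written here as label(children)]

  • Original tree: a(x(d), d(d, d))
  • Corrected tree: a(c, d(d), d(d, d))
  • Corrected tree: a(c, d(d, d))
  • Corrected tree: b(d(d), d(d, d))

Type  Name  Model
A     a     C.D*
B     b     D*
C     c     empty
D     d     D*

<a> <x><d/></x> <d> <d/><d/> </d> </a>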

SLIDE 9

Algorithm

  • Naive algorithm
  • Task

‒ At each level of top-down tree processing, find repairs for a sequence of sibling nodes

  • Steps

‒ Construct a repairing multigraph
‒ Recursively repair subtrees
‒ Compose a repairing structure
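The steps above can be sketched in simplified form for the slides' example grammar. The sketch rests on assumptions not stated in the slides: content models are hand-written automata over type symbols, `MIN_TREE` gives an assumed size for the smallest valid tree of each type, and trees are `(label, children)` pairs. It computes only the minimal repair cost, whereas the presented algorithm returns all minimal repairs in a compact structure.

```python
import heapq

# type -> (element name, transitions {state: [(child type, next state)]}, accepting states)
GRAMMAR = {
    "A": ("a", {0: [("C", 1)], 1: [("D", 2)], 2: [("D", 2)]}, {1, 2}),
    "B": ("b", {0: [("D", 0)]}, {0}),
    "C": ("c", {}, {0}),
    "D": ("d", {0: [("D", 0)]}, {0}),
}
MIN_TREE = {"C": 1, "D": 1}  # assumed size of the smallest valid tree per type

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

def repair_cost(tree, t):
    """Minimal cost of making `tree` valid as type t (one RENAME if the label differs)."""
    label, children = tree
    return int(label != GRAMMAR[t][0]) + sequence_cost(children, t)

def sequence_cost(children, t):
    """Dijkstra over multigraph vertices (siblings consumed, automaton state)."""
    _, delta, accepting = GRAMMAR[t]
    dist = {(0, 0): 0}
    queue = [(0, 0, 0)]  # (cost so far, siblings consumed, state)
    while queue:
        d, i, q = heapq.heappop(queue)
        if d > dist[(i, q)]:
            continue  # stale queue entry
        if i == len(children) and q in accepting:
            return d
        edges = []
        if i < len(children):
            edges.append((size(children[i]), i + 1, q))      # DELETE the subtree
        for child_type, q2 in delta.get(q, []):
            edges.append((MIN_TREE[child_type], i, q2))      # INSERT a minimal subtree
            if i < len(children):
                # REPAIR (same label) or RENAME (other label), plus the nested repair
                edges.append((repair_cost(children[i], child_type), i + 1, q2))
        for w, i2, q2 in edges:
            if d + w < dist.get((i2, q2), float("inf")):
                dist[(i2, q2)] = d + w
                heapq.heappush(queue, (d + w, i2, q2))
    return float("inf")

doc = ("a", [("x", [("d", [])]), ("d", [("d", []), ("d", [])])])
print(repair_cost(doc, "A"), repair_cost(doc, "B"))
```

For the example document both root types give a minimal cost of 2. Under type A, the automaton has three states and the root has two children, giving a 3 × 3 grid of vertices.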

SLIDE 10

Algorithm

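[Figure: repairing multigraph for the child sequence x, d of the root a in the example tree a(x(d), d(d, d)). Vertices: 00, 10, 20, 01, 11, 21, 02, 12, 22. Edge types: DELETE, RENAME, REPAIR, INSERT.]

Type  Name  Model
A     a     C.D*
B     b     D*
C     c     empty
D     d     D*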

SLIDE 11

Algorithm

[Figure: the repairing multigraph (vertices 00, 10, 20, 01, 11, 21, 02, 12, 22; edge types RENAME, REPAIR, INSERT) together with two corrected trees, written as label(children): a(c, d(d), d(d, d)) and a(c, d(d, d))]

SLIDE 12

Algorithms

  • Naive
  • Dynamic
  • Directly follows Dijkstra’s algorithm and, thus, only required multigraph parts are explored

  • Caching
  • Avoids repeated recursive computations by detecting and caching identical repairs

  • Incremental
  • Evaluates repairing multigraphs step by step
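The caching idea can be illustrated with a small memoisation sketch. The class `IntentCache`, the key shape `(type, sibling subtrees)` and the stand-in solver are assumptions made for illustration. The slides only say that identical repairs are detected and cached.

```python
class IntentCache:
    """Memoise results per correction intent: (type, sequence of sibling subtrees)."""

    def __init__(self, solve):
        self.solve = solve        # function(children, type_name) -> result
        self.results = {}
        self.evaluations = 0      # how many times the solver actually ran

    def repair(self, children, type_name):
        key = (type_name, children)  # children must be hashable (nested tuples)
        if key not in self.results:
            self.evaluations += 1
            self.results[key] = self.solve(children, type_name)
        return self.results[key]

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

# Stand-in solver (assumption): the cost of deleting every sibling subtree.
cache = IntentCache(lambda children, t: sum(size(child) for child in children))

leaf = ("d", ())
siblings = (leaf, leaf)
first = cache.repair(siblings, "D")
second = cache.repair(siblings, "D")  # identical intent, served from the cache
other = cache.repair(siblings, "C")   # different type, evaluated again
print(first, second, other, cache.evaluations)
```

A real solver would be the multigraph evaluation itself. The point here is only that two requests with the same type and the same sibling subtrees share one evaluation.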

SLIDE 13

Algorithms

  • Incremental
  • Task

‒ Structure encapsulating multigraph evaluation

  • Multigraph structure
  • Dijkstra’s variables
  • Scheduler

‒ Processing of an activated task:

  • Request further refinement of promising edges
  • Activate corresponding tasks for nested problems

SLIDE 14

Experiments

  • Data
  • Single type tree grammar

‒ 7 nonterminal symbols
‒ 6 terminal symbols
‒ Recursion, iteration

  • XML data trees

‒ Maximal depth 5, fan-out 8
‒ Elements from 100 to 1,000
‒ 20 files for each particular size
‒ Average values from 20 repeats

SLIDE 15

Experiments

  • Execution time in milliseconds

[Plot: execution time (axis ticks 10 to 40) against the number of elements (axis ticks 200 to 1000) for the Caching and Incremental algorithms]

SLIDE 16

Experiments

  • Number of correction intents
  • Equal to the number of distinct multigraphs

[Plot: number of correction intents (axis ticks 1000 to 4000) against the number of elements (axis ticks 200 to 1000) for the Caching and Incremental algorithms]

SLIDE 17

Conclusion

  • Contributions
  • Single type tree grammars
  • Always all minimal repairs
  • New incremental algorithm
  • Advantages
  • Compact repair structure
  • Prototype implementation
SLIDE 18

Thank you for your attention…

XML and Web Engineering Research Group

Charles University in Prague