SLIDE 1

XML and Web Engineering Research Group

Charles University in Prague

Refinement Correction Strategy for Invalid XML Documents and Regular Tree Grammars

Martin Svoboda and Irena Holubova (Mlynkova)
svoboda@ksi.mff.cuni.cz

DEXA 2014, Munich, Germany
September 2, 2014

SLIDE 2

Refinement Correction Strategy for XML Documents 2 DEXA 2014, Munich September 2, 2014

Outline

  • Introduction
  • Corrections
  • Algorithms
  • Experiments
  • Conclusion

SLIDE 3

Introduction

  • Motivation
      • Incorrect XML data
          ‒ Well-formedness, schema validity, data consistency
  • Input
      • One XML document
          ‒ Well-formed but (potentially) invalid
      • DTD or XSD schema
  • Goal
      • Structural corrections of elements

SLIDE 4

Sample Correction

  • Document

<a>
  <x><c/></x>
  <d><c/></d>
  <d><c/><a/></d>
</a>

  • Grammar

[a, C.DA* → A]
[b, DB* → B]
[c, ε → C]
[d, C* → DA]
[d, A|B|C → DB]
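
A minimal sketch of checking a document against the grammar above. It assumes each rule reads [element label, content model → nonterminal], that the content model of c is empty, and that A is the start nonterminal (the slide does not name one). The one-letter codes X and Y for DA and DB, and the helper names, are hypothetical conveniences.

```python
import re
from itertools import product
import xml.etree.ElementTree as ET

# (element label, content model as a regex over one-letter codes, nonterminal)
# Codes: A, B, C stand for themselves; X stands for DA, Y for DB (assumption).
RULES = [
    ("a", "CX*", "A"),
    ("b", "Y*", "B"),
    ("c", "", "C"),
    ("d", "C*", "X"),
    ("d", "[ABC]", "Y"),
]

def nonterminals(node):
    """Return the set of nonterminals that can derive the subtree at node."""
    child_sets = [nonterminals(child) for child in node]
    result = set()
    for label, content, nonterminal in RULES:
        if label != node.tag:
            continue
        # Try every assignment of nonterminals to the children.
        # Exponential in the worst case, acceptable for a toy example.
        for choice in product(*child_sets):
            if re.fullmatch(content, "".join(choice)):
                result.add(nonterminal)
                break
    return result

def is_valid(xml_text, start="A"):
    """A document is valid if its root can be derived from the start symbol."""
    return start in nonterminals(ET.fromstring(xml_text))
```

With these assumptions the sample document is rejected, because no rule has the label x.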

SLIDE 5

Edit Operations

  • Edit operations
      • Add leaf node
      • Remove leaf node
      • Rename node
  • Edit sequences
      • Insert new subtree
      • Delete existing subtree
      • Repair existing subtree
          ‒ With an option of node renaming

SLIDE 6

Edit Operations

  • Example

renameNode(0,c), removeLeaf(0.0), renameNode(2.1,c)

  • Cost
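
The example sequence can be replayed on the sample document from the Sample Correction slide. This is a sketch only: it assumes node addresses are dot-separated child indexes counted from the root, and it charges one unit per operation because the cost function did not survive on this slide. The function names are hypothetical, and the add-leaf operation is left out because the example does not use it.

```python
import xml.etree.ElementTree as ET

def locate(root, address):
    """Follow a dotted child-index path such as '2.1' down from the root."""
    node = root
    for index in address.split("."):
        node = node[int(index)]
    return node

def rename_node(root, address, label):
    locate(root, address).tag = label

def remove_leaf(root, address):
    parent_address, _, index = address.rpartition(".")
    parent = locate(root, parent_address) if parent_address else root
    child = parent[int(index)]
    if len(child) != 0:
        raise ValueError("removeLeaf applies only to leaf nodes")
    parent.remove(child)

def apply_edits(root, edits):
    """Apply the edits in order and return their total cost (unit costs assumed)."""
    for name, address, *args in edits:
        if name == "renameNode":
            rename_node(root, address, args[0])
        elif name == "removeLeaf":
            remove_leaf(root, address)
        else:
            raise ValueError(f"unknown operation: {name}")
    return len(edits)

doc = ET.fromstring("<a><x><c/></x><d><c/></d><d><c/><a/></d></a>")
edits = [("renameNode", "0", "c"), ("removeLeaf", "0.0"), ("renameNode", "2.1", "c")]
cost = apply_edits(doc, edits)
```

After the three operations the root has the children c, d, d, the first c is a leaf, and the last d contains two c elements.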

SLIDE 7

Algorithm Idea

  • Recursive processing
      • From the root node towards leaf nodes…
      • … and at each particular data tree node…
      • … correct a sequence of its child nodes
  • Example

C.DA*

SLIDE 8

Horizontal Correction

  • Automaton traversal
      • Start
          ‒ Before the entire node sequence
          ‒ At the initial automaton state
      • Step
          ‒ Before some particular node (if any)
          ‒ At some particular automaton state
      • End
          ‒ After the entire node sequence
          ‒ At one of the accepting states

SLIDE 9

Correction Multigraphs

  • Structure
  • Vertices
  • Edges

SLIDE 10

Shortest Paths

  • Paths
  • Source
  • Targets
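
A simplified sketch of the shortest-path idea, not the authors' algorithm. It assumes vertices are pairs (position in the child sequence, automaton state), the source is (0, initial state), and the targets are (sequence length, accepting state). It works on flat labels with unit costs for insert, delete and rename, whereas the talk repairs whole subtrees recursively. It returns only the minimal cost, not all minimal corrections. The function and parameter names are hypothetical.

```python
import heapq

def correction_cost(sequence, delta, start, accepting):
    """Minimal number of unit-cost edits that make sequence accepted.

    delta maps a state to {symbol: next_state}; states must be orderable.
    """
    n = len(sequence)
    dist = {(0, start): 0}
    queue = [(0, 0, start)]
    while queue:
        cost, i, q = heapq.heappop(queue)
        if cost > dist.get((i, q), float("inf")):
            continue  # stale queue entry
        if i == n and q in accepting:
            return cost  # first target popped is a closest one
        edges = []
        for symbol, nxt in delta.get(q, {}).items():
            edges.append((i, nxt, 1))  # insert symbol before position i
            if i < n:
                # keep the node if it matches, otherwise rename it
                edges.append((i + 1, nxt, 0 if symbol == sequence[i] else 1))
        if i < n:
            edges.append((i + 1, q, 1))  # delete the node at position i
        for j, r, weight in edges:
            if cost + weight < dist.get((j, r), float("inf")):
                dist[(j, r)] = cost + weight
                heapq.heappush(queue, (cost + weight, j, r))
    return None  # no accepting state is reachable

# Automaton for a content model shaped like C.DA*, on flat labels c and d
delta = {0: {"c": 1}, 1: {"d": 1}}
```

For the child labels x, d, d of the sample root, one rename of x to c is enough, so the cost is 1.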

SLIDE 11

Intent Repair

  • Structure

SLIDE 12

Intent Signatures

  • Observation
      • Different intents may lead to identical repairs
          ‒ We do not need to evaluate them repeatedly
  • Solution
      • Intent signatures
      • Repairs caching
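
The caching idea can be sketched as plain memoization keyed by a signature. The slide does not say what a signature contains, so the fields below (the nodes to repair and the content model to repair them against) are an assumption, as are all class and method names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    """Hypothetical cache key: what is repaired, and against which model."""
    node_ids: tuple
    content_model: str

class RepairCache:
    """Evaluate each distinct signature once and reuse the stored repair."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # function computing a repair from a signature
        self._repairs = {}
        self.evaluations = 0       # counts real evaluations, for illustration

    def repair(self, signature):
        if signature not in self._repairs:
            self.evaluations += 1
            self._repairs[signature] = self._evaluate(signature)
        return self._repairs[signature]

# Stand-in evaluation that just reports how many nodes were involved
cache = RepairCache(lambda signature: len(signature.node_ids))
first = cache.repair(Signature((1, 2), "C.DA*"))
second = cache.repair(Signature((1, 2), "C.DA*"))  # served from the cache
```

Two intents that produce equal signatures trigger a single evaluation; the second call only reads the stored result.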

SLIDE 13

Correction Strategies

  • Strategies
  • Default
  • Exploring
  • Refinement

SLIDE 14

Refinement Strategy

  • Observation
      • Until now we always worked with…
          ‒ … fully evaluated nested intents
          ‒ … and therefore their final costs
  • Idea
      • Refinement exploration based on estimations

SLIDE 28

Refinement Strategy

  • Exploration loop
      • Complete vertex
          ‒ Explore outgoing edges
          ‒ Obtain first cost estimations
          ‒ Update current distances
      • Incomplete vertex
          ‒ Request refinement of open perspective ingoing edges
      • Assign a quota to limit the allowed refinement progress
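
One possible reading of this loop, as a loose sketch rather than the authors' algorithm: a shortest-path search in which an edge cost starts as a lower-bound estimate and is refined only when the search is about to rely on it. It assumes estimates never decrease, that the last bound is exact, that every edge owns its own cost object, and that the quota is the number of refinement steps allowed per request. LazyCost and lazy_shortest_path are hypothetical names.

```python
import heapq
from itertools import count

class LazyCost:
    """Edge cost known as a non-decreasing lower bound until fully refined."""

    def __init__(self, bounds):
        self._bounds = list(bounds)  # successive bounds; the last one is exact
        self._step = 0

    @property
    def estimate(self):
        return self._bounds[self._step]

    @property
    def final(self):
        return self._step == len(self._bounds) - 1

    def refine(self, quota=1):
        """Advance by at most quota refinement steps."""
        self._step = min(self._step + quota, len(self._bounds) - 1)

def lazy_shortest_path(graph, source, targets, quota=1):
    """graph maps a vertex to a list of (target vertex, LazyCost) pairs."""
    tie = count()  # tie-breaker so the heap never compares LazyCost objects
    queue = [(0, next(tie), source, 0, None)]
    done = {}
    while queue:
        dist, _, vertex, base, edge = heapq.heappop(queue)
        if vertex in done:
            continue
        if edge is not None and not edge.final:
            # Reached through an estimate only: refine that edge and re-queue.
            edge.refine(quota)
            heapq.heappush(queue, (base + edge.estimate, next(tie), vertex, base, edge))
            continue
        done[vertex] = dist  # reached through a final cost
        if vertex in targets:
            return dist
        for target, cost in graph.get(vertex, []):
            if target not in done:
                heapq.heappush(queue, (dist + cost.estimate, next(tie), target, dist, cost))
    return None

direct = LazyCost([1, 5])   # looks cheap at first, exact cost is 5
far = LazyCost([10, 50])    # never promising enough to be refined
graph = {
    "s": [("t", direct), ("m", LazyCost([2])), ("u", far)],
    "m": [("t", LazyCost([2]))],
    "u": [("t", LazyCost([1]))],
}
best = lazy_shortest_path(graph, "s", {"t"})
```

In this example the direct edge is refined once and turns out to be expensive, the path through m wins with cost 4, and the edge to u is never refined at all.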

SLIDE 29

Execution times

  • Refinement strategy

[Figure: execution time of the refinement strategy; x-axis: number of nodes (10k to 100k); y-axis: time in seconds (1 to 4)]

SLIDE 30

Conclusion

  • Features
      • Regular tree grammars
      • Compact repair structure
      • All minimal corrections
      • No parameters required
      • Nearly linear algorithms

SLIDE 31

Thank you for your attention…

Faculty of Mathematics and Physics

Charles University in Prague