XML and Web Engineering Research Group
Refinement Correction Strategy for Invalid XML Documents and Regular Tree Grammars
Martin Svoboda and Irena Holubova (Mlynkova)
svoboda@ksi.mff.cuni.cz
DEXA 2014, Munich, Germany, September 2, 2014
Refinement Correction Strategy for XML Documents 2 DEXA 2014, Munich September 2, 2014
Outline
- Introduction
- Corrections
- Algorithms
- Experiments
- Conclusion
Introduction
- Motivation
- Incorrect XML data
‒ Well-formedness, schema validity, data consistency
- Input
- One XML document
‒ Well-formed but (potentially) invalid
- DTD or XSD schema
- Goal
- Structural corrections of elements
Sample Correction
- Document
<a> <x><c/></x> <d><c/></d> <d><c/><a/></d> </a>
- Grammar
[a, C.DA* A] [b, DB* B] [c, C] [d, C* DA] [d, A|B|C DB]
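As an illustration of what "valid" means for the sample, the sketch below checks the document against the grammar. It assumes each rule reads [label, content model, nonterminal], that [c, C] means an empty content model, and that A is the start nonterminal; the dictionary `GRAMMAR` and the helper names are hypothetical, and content models are written by hand as small automata over nonterminals rather than parsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the sample grammar: nonterminal -> (label, NFA).
# Each NFA is (start, transitions, accepting), with transitions mapping
# (state, nonterminal) -> set of next states.
GRAMMAR = {
    "A": ("a", (0, {(0, "C"): {1}, (1, "DA"): {1}}, {1})),                 # C.DA*
    "B": ("b", (0, {(0, "DB"): {0}}, {0})),                                # DB*
    "C": ("c", (0, {}, {0})),                                              # empty (assumed)
    "DA": ("d", (0, {(0, "C"): {0}}, {0})),                                # C*
    "DB": ("d", (0, {(0, "A"): {1}, (0, "B"): {1}, (0, "C"): {1}}, {1})),  # A|B|C
}

def sequence_accepted(nfa, child_sets):
    """Can one nonterminal per child be chosen so that the NFA accepts?"""
    start, delta, accepting = nfa
    states = {start}
    for options in child_sets:
        states = {q2 for q in states for n in options for q2 in delta.get((q, n), ())}
    return bool(states & accepting)

def derivable(element):
    """Set of nonterminals from which this element's subtree can be derived."""
    child_sets = [derivable(child) for child in element]
    return {
        nt for nt, (label, nfa) in GRAMMAR.items()
        if label == element.tag and sequence_accepted(nfa, child_sets)
    }

def is_valid(xml_text, start="A"):
    return start in derivable(ET.fromstring(xml_text))

print(is_valid("<a> <x><c/></x> <d><c/></d> <d><c/><a/></d> </a>"))  # False
print(is_valid("<a> <c/> <d><c/></d> <d><c/><c/></d> </a>"))         # True
```

Because the label d belongs to two nonterminals (DA and DB), the check tracks a set of candidate nonterminals per node instead of a single one.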
Edit Operations
- Edit operations
- Add leaf node
- Remove leaf node
- Rename node
- Edit sequences
- Insert new subtree
- Delete existing subtree
- Repair existing subtree
‒ With an option of node renaming
Edit Operations
- Example
renameNode(0,c), removeLeaf(0.0), renameNode(2.1,c)
- Cost
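The example edit sequence can be replayed on the sample document with a small sketch. Assumptions: a node address such as 2.1 lists child indices from the root (the root itself has the empty address), trees are encoded as hypothetical [label, children] lists, and each operation has unit cost, so the cost of a sequence is its length.

```python
import copy

# Hypothetical tree encoding: [label, [children...]].
def locate(tree, address):
    node = tree
    for index in (int(i) for i in address.split(".") if i):
        node = node[1][index]
    return node

def rename_node(tree, address, label):
    locate(tree, address)[0] = label

def remove_leaf(tree, address):
    *parent_path, last = address.split(".")
    parent = locate(tree, ".".join(parent_path))
    if parent[1][int(last)][1]:
        raise ValueError("only leaf nodes can be removed")
    del parent[1][int(last)]

def add_leaf(tree, address, label):
    *parent_path, last = address.split(".")
    locate(tree, ".".join(parent_path))[1].insert(int(last), [label, []])

def serialize(node):
    label, children = node
    if not children:
        return f"<{label}/>"
    return f"<{label}>" + "".join(serialize(c) for c in children) + f"</{label}>"

# <a> <x><c/></x> <d><c/></d> <d><c/><a/></d> </a>
document = ["a", [["x", [["c", []]]], ["d", [["c", []]]], ["d", [["c", []], ["a", []]]]]]
sequence = [(rename_node, "0", "c"), (remove_leaf, "0.0"), (rename_node, "2.1", "c")]

corrected = copy.deepcopy(document)   # keep the original tree untouched
for operation, *args in sequence:
    operation(corrected, *args)

print(serialize(corrected))   # <a><c/><d><c/></d><d><c/><c/></d></a>
print("cost:", len(sequence)) # 3 under the unit-cost assumption
```

The order matters: node 0 is renamed first, and only then is its child 0.0 removed, which keeps every addressed node in place while the sequence runs.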
Algorithm Idea
- Recursive processing
- From the root node towards leaf nodes…
- … and at each particular data tree node…
- … correct a sequence of its child nodes
- Example: content model C.DA* of the root element a
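The top-down decomposition can be sketched as a generator that visits nodes from the root towards the leaves and emits, for each node, the sequence of its child labels, i.e. one horizontal correction subproblem per node. The function name `child_sequences` is hypothetical; it only enumerates the subproblems and does not correct them.

```python
import xml.etree.ElementTree as ET

def child_sequences(element, address=""):
    """Yield (address, child labels) in top-down (pre-order) fashion."""
    yield address, [child.tag for child in element]
    for index, child in enumerate(element):
        child_address = f"{address}.{index}" if address else str(index)
        yield from child_sequences(child, child_address)

root = ET.fromstring("<a> <x><c/></x> <d><c/></d> <d><c/><a/></d> </a>")
for address, labels in child_sequences(root):
    print(repr(address), labels)
```

The first subproblem is the root's sequence x, d, d, which has to be brought into agreement with C.DA*; the remaining ones belong to the nested nodes.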
Horizontal Correction
- Automaton traversal
- Start
‒ Before the entire node sequence
‒ At the initial automaton state
- Step
‒ Before some particular node (if any)
‒ At some particular automaton state
- End
‒ After the entire node sequence
‒ At one of the accepting states
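The start, step, and end conditions above can be illustrated with a plain automaton traversal that records each (position, state) configuration. The DFA for C.DA* below is a hand-written assumption (state 0 initial, state 1 accepting), and the sketch only validates a sequence of nonterminals; it does not yet explore corrections.

```python
# Hypothetical DFA for the content model C.DA*.
DFA_START = 0
DFA_ACCEPTING = {1}
DFA_DELTA = {(0, "C"): 1, (1, "DA"): 1}

def traverse(symbols):
    """Return the visited (position, state) pairs, or None if stuck."""
    state = DFA_START
    visited = [(0, state)]               # start: before the whole sequence
    for position, symbol in enumerate(symbols):
        if (state, symbol) not in DFA_DELTA:
            return None                  # no transition: a correction is needed
        state = DFA_DELTA[(state, symbol)]
        visited.append((position + 1, state))  # step: before the next node
    return visited

def accepts(symbols):
    visited = traverse(symbols)
    # end: after the whole sequence, in an accepting state
    return visited is not None and visited[-1][1] in DFA_ACCEPTING

print(traverse(["C", "DA", "DA"]))  # [(0, 0), (1, 1), (2, 1), (3, 1)]
print(accepts(["DA", "DA"]))        # False
```

When the traversal gets stuck, every alternative way of continuing (inserting, deleting, or repairing a node) has to be considered, which is what motivates a graph of all such configurations.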
Correction Multigraphs
- Structure
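- Vertices: combinations of an automaton state and a position in the node sequence
- Edges: edit sequences (insert, delete, or repair a subtree) moving between vertices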
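One plausible way to build such a multigraph for the root sequence x, d, d is sketched below. Assumptions: vertices are (state, position) pairs, insert edges change the state at a fixed position, delete edges skip a node at cost equal to its subtree size, and repair edges change the state while moving past a node. `Edge`, `repair_cost`, and `build_multigraph` are hypothetical names, and the repair cost is a placeholder where a real implementation would recurse into the node's children.

```python
from collections import namedtuple

# Hypothetical edge between (state, position) vertices.
Edge = namedtuple("Edge", "source target operation nonterminal cost")

# Assumed inputs: DFA for C.DA*, labels of nonterminals, unit-cost placeholders.
DELTA = {(0, "C"): 1, (1, "DA"): 1}
STATES = (0, 1)
LABEL = {"C": "c", "DA": "d"}
INSERT_COST = {"C": 1, "DA": 1}  # assumed: smallest new subtree is one node

def repair_cost(node, nonterminal):
    # Placeholder only: 0 if the label fits, otherwise 1 for a rename.
    return 0 if node["label"] == LABEL[nonterminal] else 1

def build_multigraph(nodes):
    edges = []
    for i in range(len(nodes) + 1):
        # Insert a new subtree: change state, stay at position i.
        for (q, nt), q2 in DELTA.items():
            edges.append(Edge((q, i), (q2, i), "insert", nt, INSERT_COST[nt]))
        if i == len(nodes):
            continue
        # Delete node i: keep the state, move past it; cost = subtree size.
        for q in STATES:
            edges.append(Edge((q, i), (q, i + 1), "delete", None, nodes[i]["size"]))
        # Repair node i against a nonterminal: change state and move past it.
        for (q, nt), q2 in DELTA.items():
            edges.append(Edge((q, i), (q2, i + 1), "repair", nt, repair_cost(nodes[i], nt)))
    return edges

# Root children of the sample document: x (2 nodes), d (2 nodes), d (3 nodes).
nodes = [{"label": "x", "size": 2}, {"label": "d", "size": 2}, {"label": "d", "size": 3}]
edges = build_multigraph(nodes)
print(len(edges))  # 20
for e in edges:
    if e.source == (1, 1) and e.target == (1, 2):
        print(e)   # two parallel edges: a delete and a repair
```

Parallel edges between the same pair of vertices (here a delete and a repair) are why a multigraph rather than a simple graph is needed.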
Shortest Paths
- Paths
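- Source: the vertex at the initial automaton state, before the entire node sequence
- Targets: the vertices at accepting states, after the entire node sequence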
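Finding minimal corrections then amounts to a shortest-path search from the source to the targets. The sketch uses Dijkstra's algorithm and keeps every ingoing edge that lies on some shortest path, so all minimal paths can be enumerated. The costs are hypothetical: unit-cost operations, and `REPAIR[i][nt]` stands for what a recursive correction of child i against nonterminal nt would return (assuming c has empty content). Under these assumptions there are two minimal corrections of cost 3: repair x as C, or insert a new c and repair x as d.

```python
import heapq
from collections import defaultdict

# Hypothetical inputs for the root sequence x, d, d and content model C.DA*.
DELTA = {(0, "C"): 1, (1, "DA"): 1}
STATES, ACCEPTING = (0, 1), {1}
INSERT = {"C": 1, "DA": 1}
SIZES = [2, 2, 3]
REPAIR = [{"C": 2, "DA": 1}, {"C": 2, "DA": 0}, {"C": 3, "DA": 1}]

def multigraph_edges(n):
    edges = []
    for i in range(n + 1):
        for (q, nt), q2 in DELTA.items():
            edges.append(((q, i), (q2, i), f"insert {nt}", INSERT[nt]))
            if i < n:
                edges.append(((q, i), (q2, i + 1), f"repair {nt}", REPAIR[i][nt]))
        if i < n:
            for q in STATES:
                edges.append(((q, i), (q, i + 1), "delete", SIZES[i]))
    return edges

def shortest_paths(edges, source, targets):
    """Dijkstra that records all ingoing edges on some shortest path."""
    outgoing = defaultdict(list)
    for u, v, label, cost in edges:
        outgoing[u].append((v, label, cost))
    dist, best_in, done = {source: 0}, defaultdict(list), set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, label, cost in outgoing[u]:
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v], best_in[v] = nd, [(u, label)]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                best_in[v].append((u, label))
    best = min((dist[t] for t in targets if t in dist), default=None)
    return best, dist, best_in

def all_paths(best_in, source, vertex):
    if vertex == source:
        return [[]]
    return [path + [label] for previous, label in best_in[vertex]
            for path in all_paths(best_in, source, previous)]

n, source = len(SIZES), (0, 0)
targets = [(q, n) for q in ACCEPTING]
best, dist, best_in = shortest_paths(multigraph_edges(n), source, targets)
print(best)  # 3
minimal = [p for t in targets if dist.get(t) == best for p in all_paths(best_in, source, t)]
for path in minimal:
    print(path)
```

The first printed path corresponds to the edit sequence of the earlier example (rename and empty x, then fix the last d).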
Intent Repair
- Structure
Intent Signatures
- Observation
- Different intents may lead to identical repairs
‒ We do not need to evaluate them repeatedly
- Solution
- Intent signatures
- Caching of repairs
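A minimal sketch of signature-based caching follows, assuming (hypothetically) that an intent's signature is the node address together with the target label and content model, so two nonterminals that agree on both yield identical repairs. The nonterminal `DA2` and the functions below are invented for illustration; `functools.lru_cache` serves as the repair cache.

```python
from functools import lru_cache

# Hypothetical grammar fragment: DA and DA2 share a label and content model,
# so repairing the same node against either one gives the same result.
NONTERMINALS = {
    "DA": ("d", "C*"),
    "DA2": ("d", "C*"),
    "DB": ("d", "A|B|C"),
}

@lru_cache(maxsize=None)
def repair_by_signature(node_address, label, content_model):
    # Placeholder for the expensive recursive repair; the arguments form the
    # assumed intent signature, which is also the cache key.
    return f"repair of node {node_address} as <{label}> with content {content_model}"

def repair(node_address, nonterminal):
    label, content_model = NONTERMINALS[nonterminal]
    return repair_by_signature(node_address, label, content_model)

for nonterminal in ["DA", "DA2", "DB", "DA"]:
    repair("2", nonterminal)

info = repair_by_signature.cache_info()
print(info.misses, info.hits)  # 2 2
```

Four intents trigger only two evaluations: DA2 and the second DA are served from the cache.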
Correction Strategies
- Strategies
- Default
- Exploring
- Refinement
Refinement Strategy
- Observation
- Until now we always worked with…
‒ … fully evaluated nested intents
‒ … and therefore their final costs
- Idea
- Explore using cost estimations of nested intents and refine them only when needed
Refinement Strategy
- Exploration loop
- Complete vertex
‒ Explore outgoing edges
‒ Obtain first cost estimations
‒ Update current distances
- Incomplete vertex
‒ Request refinement of open perspective ingoing edges
- Assign a quota to limit the allowed refinement progress
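The exploration loop can be approximated by a lazy variant of Dijkstra's algorithm in which edge costs start as lower-bound estimates. This is a sketch under stated assumptions, not the authors' procedure: `LazyEdge`, `refinement_search`, and the quota semantics (each refinement raises an estimate by at most `quota`) are hypothetical. A vertex reached over an open (not fully evaluated) edge is incomplete, so that edge is refined and requeued; a vertex reached over a closed edge is complete, its distance is final, and its outgoing edges are explored with their first estimates.

```python
import heapq
import itertools

class LazyEdge:
    """Hypothetical edge whose cost is a lower bound until fully refined."""
    def __init__(self, source, target, exact, initial_estimate=0):
        self.source, self.target, self.exact = source, target, exact
        self.estimate = min(initial_estimate, exact)

    @property
    def is_open(self):
        return self.estimate < self.exact

    def refine(self, quota):
        # Allowed refinement progress per request is limited by the quota.
        self.estimate = min(self.exact, self.estimate + quota)

def refinement_search(edges, source, targets, quota=1):
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge.source, []).append(edge)
    done, dist = set(), {}
    tie = itertools.count()  # tie-breaker so edges are never compared
    heap = [(0, next(tie), source, None)]
    while heap:
        key, _, vertex, edge = heapq.heappop(heap)
        if vertex in done:
            continue
        if edge is not None and edge.is_open:
            # Incomplete vertex: refine the ingoing edge and requeue it.
            edge.refine(quota)
            heapq.heappush(heap, (dist[edge.source] + edge.estimate, next(tie), vertex, edge))
            continue
        # Complete vertex: its distance is final.
        done.add(vertex)
        dist[vertex] = key
        if vertex in targets:
            return key
        for out in outgoing.get(vertex, []):
            if out.target not in done:
                heapq.heappush(heap, (key + out.estimate, next(tie), out.target, out))
    return None

expensive = LazyEdge("s", "t", exact=5)
cheap_1, cheap_2 = LazyEdge("s", "m", exact=1), LazyEdge("m", "t", exact=1)
result = refinement_search([expensive, cheap_1, cheap_2], "s", {"t"})
print(result)                               # 2
print(expensive.estimate, expensive.exact)  # 3 5
```

Because estimates never exceed the true costs, a vertex completed with the smallest key cannot be reached more cheaply later; meanwhile the expensive edge is never fully evaluated.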
Execution times
- Refinement strategy
Figure: execution time in seconds versus number of nodes (10k to 100k)
Conclusion
- Features
- Regular tree grammars
- Compact repair structure
- All minimal corrections
- No parameters required
- Nearly linear algorithms