XPT 2006 XML APIs: SAX

2. XML Processor APIs

• How can (Java) applications manipulate structured (XML) documents?
  - An overview of XML processor interfaces
    2.1 SAX: an event-based interface
    2.2 DOM: an object-based interface
    2.3 JAXP: Java API for XML Processing

Document Parser Interfaces

(See, e.g., Leventhal, Lewis & Fuchs: Designing XML Internet Applications, Chapter 10, and D. Megginson: Events vs. Trees [online])

• Every XML application contains some kind of a parser
  - editors, browsers
  - transformation/style engines, DB loaders, ...
• XML parsers have become standard tools of application development frameworks
  - JDK 1.4 contains JAXP, with its default parser (Apache Crimson)

Tasks of a Parser

• Document instance decomposition
  - elements, attributes, text, processing instructions, entities, ...
• Verification
  - well-formedness checking
    » syntactical correctness of XML markup
  - validation (against a DTD or Schema; optional)
• Access to contents of the DTD (if supported)
  - SAX 2.0 Extensions provide info of declarations: element type names and their content model expressions

Document Parser Interfaces

I: Event-based interfaces
  - Command line and ESIS interfaces
    » Element Structure Information Set, traditional interface to stand-alone SGML parsers
  - Event call-back interfaces: SAX
II: Tree-based (object model) interfaces
  - W3C DOM Recommendation
  - Java-specific object models: JAXB, JDOM, dom4J

Command-line ESIS interface

[Figure: the Application issues a command line call to an SGML/XML Parser and reads back an ESIS stream. For the input <E i="1"> Hi! </E> the stream shown is:
  Ai CDATA 1
  (E
  -Hi!
  )E ]

Event Call-Back Interfaces

• Application implements a set of call-back methods for handling parse events
  - parser notifies the application by method calls
  - qualified further by parameters:
    » element type name
    » names and values of attributes
    » values of content strings, ...
• Idea behind "SAX" (Simple API for XML)
  - an industry standard API for XML parsers
  - could also be read as "Serial Access XML"

An event call-back application

[Figure: the Application Main Routine calls Parse(); while reading the document
  <?xml version='1.0'?>
  <A i="1"> Hi! </A>
the parser invokes the application's Callback Routines in order:
  startDocument()
  startElement()   with "A", [i="1"]
  characters()     with "Hi!"
  endElement()     with "A" ]

Object Model Interfaces

• The parser builds ...
  - a document object consisting of sub-objects such as document, elements, attributes, text, ...
• Abstraction level higher than in event-based interfaces; more powerful access
  - to descendants, following siblings, ...
• Drawback: Higher memory consumption
  - used mainly in client applications (to implement document manipulation by user)
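
The two interface styles above can be contrasted on the slide's own sample document, <A i="1"> Hi! </A>. The following is a minimal sketch, assuming a standard JDK with JAXP available; the class and method names (ParserInterfacesSketch, EventLog, parseWithSax, parseWithDom) are hypothetical and chosen only for illustration. The SAX half records the call-backs as the parser issues them; the DOM half asks the finished tree for the same information. Note that a SAX parser may deliver character data in several characters() calls, so the sketch accumulates the text rather than assuming one call.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ParserInterfacesSketch {
    // The sample document from the slides.
    static final String XML = "<?xml version='1.0'?><A i=\"1\"> Hi! </A>";

    // Event-based style: the application supplies call-back methods,
    // and the parser invokes them while it reads the document.
    static class EventLog extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void startElement(String uri, String localName,
                                           String qName, Attributes atts) {
            StringBuilder sb = new StringBuilder("startElement " + qName);
            for (int k = 0; k < atts.getLength(); k++) {
                sb.append(' ').append(atts.getQName(k))
                  .append('=').append(atts.getValue(k));
            }
            events.add(sb.toString());
        }

        // May be called more than once for one run of text.
        @Override public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + qName);
        }

        @Override public void endDocument() { events.add("endDocument"); }
    }

    static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    static EventLog parseWithSax() {
        try {
            EventLog log = new EventLog();
            SAXParserFactory.newInstance().newSAXParser().parse(input(), log);
            return log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Tree-based style: the parser builds the whole document object first,
    // and the application navigates it afterwards.
    static Document parseWithDom() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(input());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EventLog log = parseWithSax();
        log.events.forEach(System.out::println);
        System.out.println("SAX text: " + log.text.toString().trim());

        Element root = parseWithDom().getDocumentElement();
        System.out.println("DOM root: " + root.getTagName()
                + ", i=" + root.getAttribute("i")
                + ", text=" + root.getTextContent().trim());
    }
}
```

The SAX handler keeps only what it chooses to record, which is why event-based parsing can stay small in memory; the DOM version holds the full tree, which is what makes later access to descendants and siblings possible.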