XML Parsers Asst. Prof. Dr. Kanda Runapongsa Saikaew - PowerPoint PPT Presentation

  1. XML Parsers
      Asst. Prof. Dr. Kanda Runapongsa Saikaew (krunapon@kku.ac.th)
      Dept. of Computer Engineering, Khon Kaen University

  2. Overview
      - What are XML Parsers?
      - Programming Interfaces of XML Parsers
      - DOM: Document Object Model
      - SAX: Simple API for XML
      - StAX: Streaming API for XML

  3. What are XML Parsers? (1/2)
      - The most common XML processing task is parsing an XML document
      - Parsing involves reading an XML document to determine its structure and contents
      - It is essential for the automatic processing of XML documents

  4. What are XML Parsers? (2/2)
      - Parsers also check whether documents conform to the XML standard and have a correct structure
      - There are two types of XML parsers
        - Validating: check documents against a DTD or an XML schema
        - Non-validating: do not check documents against a DTD or an XML schema
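
The structure check described above can be sketched in Java with the JDK's `javax.xml.parsers` package. This is a minimal sketch, not a definitive implementation: the class name `WellFormedCheck` and the method `isWellFormed` are made up for illustration. It uses a non-validating parser, so it only reports whether the text is well-formed XML; calling `setValidating(true)` on the factory would additionally request DTD validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {
    // Returns true if the text parses as well-formed XML.
    // No DTD or schema validation is performed here.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // non-validating: structure only
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }
}
```

For example, `WellFormedCheck.isWellFormed("<a><b></a>")` returns false because the `<b>` tag is never closed.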

  5. Available Java XML Parser APIs
      - Sun
        - Integrated in JDK 1.4 and later
        - Package javax.xml.parsers
      - Apache Xerces: XML parsers in Java, C++, and Perl
        - http://xerces.apache.org/
      - SAX
        - http://www.saxproject.org/
      - XP: an XML parser in Java
        - http://www.jclark.com/xml/xp/index.html

  6. Programming Interfaces (1/2)
      - PHP and Java
        - Document Object Model (DOM)
          - Models a document as a tree
      - Java
        - Simple API for XML (SAX)
          - The user needs to create the model
        - Streaming API for XML (StAX)
          - Uses a pull model for event processing
          - Provides user-friendly APIs for read-in and write-out

  7. Programming Interfaces (2/2)
      - PHP
        - SimpleXML extension
          - Provides a very simple and easily usable toolset to convert XML to an object
        - XMLReader extension
          - The reader acts as a cursor going forward on the document stream and stopping at each node
        - XMLWriter extension
          - The writer provides a non-cached, forward-only means of generating streams or files containing XML data

  8. How to Use a Parser
      - In general, here’s how you use a parser:
        - Create a parser object
        - Point the parser object at your XML document
        - Process the results
      - The common XML parsing tools can make the task much simpler
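
The three steps above can be sketched with the JDK's DOM parser. This is only one possible reading of the steps: the class `ParserSteps`, the method `rootName`, and the sample element names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class ParserSteps {
    // Returns the tag name of the document's root element.
    static String rootName(String xml) {
        try {
            // Step 1: create a parser object.
            DocumentBuilder parser =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Step 2: point the parser at the XML document (here, an in-memory string).
            Document doc = parser.parse(new InputSource(new StringReader(xml)));
            // Step 3: process the results.
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Calling `ParserSteps.rootName("<library><book/></library>")` returns `library`.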

  9. What is DOM? (1/2)
      - DOM is an official recommendation of the W3C
      - It defines an interface that enables programs to access and update the structure of XML documents
      - When an XML parser claims to support the DOM, that means it implements the interfaces defined in the standard

  10. What is DOM? (2/2)
      - When you parse an XML document with a DOM parser, you get back a tree of nodes that represent the structure and contents of the XML document
      - You can access your information by interacting with this tree of nodes

  11. DOM Data Modeling
      - Each element node contains a list of other nodes as its children
      - These children might contain text values or other nodes
      - DOM preserves the sequence of the elements that it reads from XML documents
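
The child list and its ordering can be observed directly through the DOM interfaces. The sketch below is illustrative only: `DomChildren` and `childElementNames` are hypothetical names. It walks the root element's children, skips text nodes, and collects element names in the order the parser read them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomChildren {
    // Lists the names of the root element's child elements, in document order.
    static List<String> childElementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Children may be text nodes or element nodes; keep only elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

For the input `<r><b/>text<a/><c/></r>` the method returns the names `b`, `a`, `c`, matching the source order rather than alphabetical order.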

  12. DOM Processing Model (1/2)
      - The DOM Processing Model consists of reading the entire XML document into memory and building a tree representation of the structured data
      - This process can require a substantial amount of memory when the XML document is large

  13. DOM Processing Model (2/2)
      - By having the data in memory, DOM introduces the capability of manipulating the XML data by
        - Inserting, editing, or deleting tree elements
      - It supports random access to any node in the tree
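
Inserting, editing, and deleting can each be tried on a small in-memory tree. The following is a sketch under assumptions: the class `DomEditSketch`, the method `editTitles`, and the `library`/`book` element names are invented for illustration, and the input is assumed to contain at least two `book` children directly under the root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomEditSketch {
    // Inserts a book, edits the first book, deletes the second book,
    // then returns the remaining titles joined by '|'.
    static String editTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // Insert: append a new <book> element at the end.
            Element added = doc.createElement("book");
            added.setTextContent("New title");
            root.appendChild(added);

            // Edit: random access to the first <book>, then change its text.
            root.getElementsByTagName("book").item(0).setTextContent("Edited title");

            // Delete: remove the second <book> (assumed to be a direct child of root).
            Node second = root.getElementsByTagName("book").item(1);
            root.removeChild(second);

            StringBuilder titles = new StringBuilder();
            NodeList books = root.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                if (i > 0) titles.append('|');
                titles.append(books.item(i).getTextContent());
            }
            return titles.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not edit document", e);
        }
    }
}
```

Given `<library><book>A</book><book>B</book></library>`, the result is `Edited title|New title`.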

  14. What is SAX? (1/2)
      - SAX is an alternative way of working with the information in your XML document
      - It was designed to have a smaller memory footprint, but it puts more of the work on the programmer
      - SAX does not create a default object model on top of your XML document
      - SAX was originally developed by David Megginson

  15. What is SAX? (2/2)
      - When you parse an XML document with a SAX parser, the parser generates a series of events as it reads the document
      - These events are pushed to event handlers
      - You need to decide what to do with the events when you parse an XML document

  16. Sample SAX Events
      - The startDocument event
      - For each element, a startElement event at the start of the element, and an endElement event at the end of the element
      - If an element contains content, there will be events such as characters for additional text
      - The endDocument event
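
The events listed above can be watched by logging each callback. This is a minimal sketch: the class `SaxEventLog` and its `parse` helper are hypothetical, while `DefaultHandler` and the callback names come from the standard `org.xml.sax` API. Note that a parser is allowed to deliver one run of text in several `characters` calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every event it receives.
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks; whitespace-only chunks are skipped here.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("characters:" + text);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override public void endDocument() { events.add("endDocument"); }

    // Parses the string and returns the recorded events in order.
    static List<String> parse(String xml) {
        try {
            SaxEventLog handler = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

For `<a><b>hi</b></a>` the log starts with `startDocument`, ends with `endDocument`, and contains matching start and end events for `a` and `b` around the text.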

  17. What is StAX?
      - StAX is a streaming XML parsing API, bundled with the JDK since Java SE 6
      - Like SAX, it uses an event-driven model
      - However, instead of using SAX’s push model, StAX uses a pull model for event processing
      - Instead of using a callback mechanism, a StAX parser returns events as requested by the application
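
The pull model can be sketched with `javax.xml.stream.XMLStreamReader`: the application owns the loop and asks for the next event when it is ready. The class `StaxPull` and method `countElements` are hypothetical names used only for this sketch.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class StaxPull {
    // Counts the start tags whose local name equals the given name.
    static int countElements(String xml, String name) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            try {
                // The application pulls each event; no callbacks are involved.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(name)) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read document", e);
        }
    }
}
```

Because the loop belongs to the caller, it can stop early, skip ahead, or interleave reads from several readers, which is awkward with SAX callbacks.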

  18. SAX vs. StAX
      - SAX returns different types of events to the ContentHandler
      - StAX returns its events to the application and can even provide the events as objects
      - StAX includes factories for creating the StAX reader and writer
      - Applications can use the StAX interfaces without reference to the details of a particular implementation
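
The factory-based design also covers output. As a rough sketch, `XMLOutputFactory` hands back an `XMLStreamWriter` without the application naming a concrete implementation. The class `StaxWrite`, the method `writeLibrary`, and the element and attribute names are assumptions for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, for illustration only.
class StaxWrite {
    // Writes a tiny document and returns it as a string.
    static String writeLibrary() {
        try {
            StringWriter out = new StringWriter();
            // The factory picks the implementation; the code sees only the interface.
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("library");
            writer.writeStartElement("book");
            writer.writeAttribute("id", "1");
            writer.writeCharacters("XML & Java"); // the writer escapes the ampersand
            writer.writeEndElement(); // closes book
            writer.writeEndElement(); // closes library
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write document", e);
        }
    }
}
```

The writer is forward-only: once an element has been written it cannot be revisited, which mirrors the reader side.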

  19. StAX vs. DOM and SAX
      - StAX specifies two parsing models
        - The cursor model
        - The iterator model
      - Like SAX, the cursor model simply returns events
      - The iterator model returns events as objects
        - Provides a more natural interface but has the additional overhead of object creation
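
The iterator model corresponds to `javax.xml.stream.XMLEventReader`, where each event is an `XMLEvent` object that can be stored or passed around. The sketch below uses hypothetical names (`StaxIterator`, `elementNames`) and simply collects start-tag names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
class StaxIterator {
    // Returns the local names of all start tags, in document order.
    static List<String> elementNames(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                // Each call allocates an event object, unlike the cursor model.
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
            reader.close();
            return names;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read document", e);
        }
    }
}
```

For `<a><b/><c/></a>` the method returns `a`, `b`, `c`.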

  20. DOM vs. SAX (1/3)
      - In the case of DOM, the parser does almost everything
        - Read the XML document in
        - Create an object model on top of it
        - Give you a reference to this object model (a document object) so that you can manipulate it
      - SAX does not expect the parser to do much

  21. DOM vs. SAX (2/3)
      - For SAX, the parser should
        - Read in the XML document
        - Fire a bunch of events depending on what tags it encounters in the XML document
      - Then, the programmer needs to make sense of all the tag events and create objects in their own object model
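
Building your own object model from tag events might look like the following sketch. Everything named here is an assumption for illustration: the handler class `TitleHandler`, the `title` element, and the choice of a plain list of strings as the "object model".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that turns SAX events into a list of titles.
class TitleHandler extends DefaultHandler {
    final List<String> titles = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <title> element

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        if ("title".equals(qName)) text = new StringBuilder();
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Accumulate, because text can arrive in more than one chunk.
        if (text != null) text.append(ch, start, length);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    // Parses the string and returns the collected titles.
    static List<String> titlesOf(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Only the titles are kept in memory; the rest of the document is discarded as soon as its events have been delivered.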

  22. DOM vs. SAX (3/3)
      - SAX can be really fast at runtime if your object model is simple
      - SAX is generally faster than DOM because it bypasses the creation of a tree-based object model of your information
      - On the other hand, you have to write a SAX document handler to interpret all the SAX events

  23. Drawbacks of DOM
      - Partial parsing is not possible
      - Loading the whole document and building the entire tree structure in memory can be expensive
      - The DOM tree is often several times, and sometimes an order of magnitude, larger than the document
      - The generic DOM node type is an interoperability advantage but may not be the best when you do object type binding

  24. When to Use DOM
      - When the development needs to be done quickly
        - DOM is quite easy to implement
      - When you need to have random access to the XML document
        - Example: An XSL Processor
      - When you need to modify an XML document
        - Example: An XML Editor

  25. Drawbacks of SAX
      - You have to implement the event handlers to handle all incoming events
        - Must maintain event states in your code
        - Must keep track of where the parser is in the document
      - It does not have built-in document navigation support
      - No random access support
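
Keeping track of where the parser is usually means maintaining a stack of open elements by hand. The sketch below shows one way to do that. The class `PathTracker` and the slash-separated path format are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the path of every element it enters.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements
    final List<String> paths = new ArrayList<>();

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        open.addLast(qName); // entering an element
        paths.add("/" + String.join("/", open));
    }

    @Override public void endElement(String uri, String localName, String qName) {
        open.removeLast(); // leaving the innermost open element
    }

    // Parses the string and returns the path of each element, in document order.
    static List<String> pathsOf(String xml) {
        try {
            PathTracker handler = new PathTracker();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.paths;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

For `<a><b><c/></b><d/></a>` the recorded paths are `/a`, `/a/b`, `/a/b/c`, `/a/d`. A DOM parser provides this context through parent and child links, whereas here the handler has to maintain it.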

  26. When to Use SAX
      - When you have a small amount of memory
        - SAX requires little memory because it does not construct an internal representation of the XML data
      - When you only need to read the content in a single pass
        - Example: Many B2B and EAI applications use XML just as an encapsulation format in which the receiving end simply retrieves all the data

  27. Drawbacks of StAX
      - It does not have built-in document navigation support
      - No random access support
      - Document modification is still quite difficult if you want to do anything beyond simple one-pass transformations

  28. When to Use StAX
      - When applications need to take advantage of the streaming model for performance while maintaining full support of namespaces
      - For an application that can easily request events from multiple StAX parsers and put them into a single context
        - Example: Web services

  29. Summary of Java Parser APIs
      - XML parsers are programs to read, manipulate, and create XML documents
      - To automate XML processing, developers need to use XML parsers
      - XML parser APIs
        - DOM
          + Easy for developers to develop
          + Random access
          - Requires lots of memory
        - SAX, StAX
          + Fast processing
          - Developers need to create their own data model

  30. Streaming APIs in PHP
      - ext/xmlreader and ext/xmlwriter
        - Allow XML to be read from or written to PHP streams
        - Result in very low memory usage
        - But provide very focused, unidirectional XML support (each can only read or only write)
      - To manipulate an XML data tree
        - Use DOM or SimpleXML

  31. PHP DOM vs. SimpleXML (1/2)  DOM allows a developer to access and manipulate XML in any way needed, but it comes at a price  DOM is a large and complex API, requiring a developer to really understand all details  SimpleXML aims to break through all the XML complexities and provide an intuitive and simple 31

  32. PHP DOM vs. SimpleXML (2/2)
      - The vast majority of people working with XML are really only concerned with elements having simple content
      - DOM models an XML document as a tree
      - SimpleXML takes an easier approach and views a document as an object
        - Elements are represented as properties and attributes as accessors
