XML Parsers Asst. Prof. Dr. Kanda Runapongsa Saikaew - - PowerPoint PPT Presentation



SLIDE 1

XML Parsers

  • Asst. Prof. Dr. Kanda Runapongsa Saikaew (krunapon@kku.ac.th)
  • Dept. of Computer Engineering, Khon Kaen University

SLIDE 2

Overview

  • What are XML Parsers?
  • Programming Interfaces of XML Parsers
      • DOM: Document Object Model
      • SAX: Simple API for XML
      • StAX: Streaming API for XML

SLIDE 3

What are XML Parsers? (1/2)

  • The most common XML processing task is parsing an XML document
  • Parsing involves reading an XML document to determine its structure and contents
  • It is essential for the automatic processing of XML documents

SLIDE 4

What are XML Parsers? (2/2)

  • Parsers also check whether documents conform to the XML standard (are well formed) and have a correct structure
  • There are two types of XML parsers
      • Validating: check documents against a DTD or an XML schema
      • Non-validating: check well-formedness only; they do not check documents against a DTD or an XML schema
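
The difference between the two parser types can be sketched in Java with the JAXP `DocumentBuilderFactory`, whose `setValidating` flag switches DTD validation on or off. This is only an illustrative sketch: the class name `ValidationDemo`, the method `parsesCleanly`, and the tiny `note` DTD are assumptions made for the example, not part of the original slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration only.
class ValidationDemo {

    // Returns true if the document parses without validity or well-formedness errors.
    static boolean parsesCleanly(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating); // true = check against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();

            final boolean[] ok = {true};
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity errors (DTD violations) are reported here.
                public void error(SAXParseException e) { ok[0] = false; }
                // Well-formedness errors are fatal for every parser.
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });

            builder.parse(new InputSource(new StringReader(xml)));
            return ok[0];
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The DTD allows only <to> inside <note>, so <from> is invalid but well formed.
        String dtd = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";
        String invalid = dtd + "<note><from>B</from></note>";
        System.out.println("non-validating: " + parsesCleanly(invalid, false));
        System.out.println("validating:     " + parsesCleanly(invalid, true));
    }
}
```

A document that is not well formed (for example, one with a missing end tag) is rejected in both modes; only the DTD check depends on the flag.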

SLIDE 5

Available Java XML Parser APIs

  • Sun
      • Integrated in JDK 1.4 and later
      • Package javax.xml.parsers
  • Apache Xerces: XML parsers in Java, C++, and Perl
      • http://xerces.apache.org/
  • SAX
      • http://www.saxproject.org/
  • XP: an XML parser in Java
      • http://www.jclark.com/xml/xp/index.html

SLIDE 6

Programming Interfaces (1/2)

  • PHP and Java
      • Document Object Model (DOM)
          • Models a document as a tree
  • Java
      • Simple API for XML (SAX)
          • The user needs to create the model
      • Streaming API for XML (StAX)
          • Uses a pull model for event processing
          • Provides user-friendly APIs for reading and writing

SLIDE 7

Programming Interfaces (2/2)

  • PHP
      • SimpleXML extension
          • Provides a very simple and easily usable toolset to convert XML to an object
      • XMLReader extension
          • The reader acts as a cursor going forward on the document stream and stopping at each node
      • XMLWriter extension
          • The writer provides a non-cached, forward-only means of generating streams or files containing XML data

SLIDE 8

How to Use a Parser

  • In general, here’s how you use a parser:
      • Create a parser object
      • Point the parser object at your XML document
      • Process the results
  • The common XML parsing tools can make the task much simpler
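
The three steps above might look like this in Java, using the JAXP DOM parser from `javax.xml.parsers`. It is a minimal sketch; the class name `ParserSteps`, the method `rootElementName`, and the sample `course` document are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only.
class ParserSteps {

    static String rootElementName(String xml) {
        try {
            // Step 1: create a parser object.
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Step 2: point the parser object at the XML document
            // (here an in-memory string; a File or URL would work as well).
            Document doc = parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Step 3: process the results.
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<course><title>XML Parsers</title></course>"));
    }
}
```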

SLIDE 9

What is DOM? (1/2)

  • DOM is an official recommendation of the W3C
  • It defines an interface that enables programs to access and update the structure of XML documents
  • When an XML parser claims to support the DOM, that means it implements the interfaces defined in the standard

SLIDE 10

What is DOM? (2/2)

  • When you parse an XML document with a DOM parser, you get back a tree of nodes that represent the structure and contents of the XML document
  • You can access your information by interacting with this tree of nodes

SLIDE 11

DOM Data Modeling

  • Each element node contains a list of other nodes as its children
  • These children might contain text values or other nodes
  • DOM preserves the sequence of the elements that it reads from XML documents
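
To make the data model concrete, the sketch below walks the children of a root element in Java and records each one in document order. The class `DomChildren` and the method `describeChildren` are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class DomChildren {

    // Lists the direct children of the root element in document order.
    static List<String> describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();

            List<String> out = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    out.add("element:" + child.getNodeName());
                } else if (child.getNodeType() == Node.TEXT_NODE) {
                    out.add("text:" + child.getNodeValue());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        // Mixed content: a text node, an element node, then another text node.
        System.out.println(describeChildren("<p>Hello <b>XML</b> world</p>"));
    }
}
```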

SLIDE 12

DOM Processing Model (1/2)

  • The DOM Processing Model consists of reading the entire XML document into memory and building a tree representation of the structured data
  • This process can require a substantial amount of memory when the XML document is large

SLIDE 13

DOM Processing Model (2/2)

  • By having the data in memory, DOM introduces the capability of manipulating the XML data by
      • Inserting, editing, or deleting tree elements
  • It supports random access to any node in the tree
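
Inserting, editing, and deleting tree elements can be sketched in Java with the standard `org.w3c.dom` interfaces. The class `DomEdit`, the method `applyEdits`, and the `deck`/`slide`/`draft` element names are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class DomEdit {

    // Applies one insert, one edit, and one delete, then summarizes the root's child elements.
    static String applyEdits(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // Insert: append a new <slide> element at the end.
            Element added = doc.createElement("slide");
            added.setTextContent("Summary");
            root.appendChild(added);

            // Edit: jump straight to the first <slide> (random access) and change its text.
            Node first = root.getElementsByTagName("slide").item(0);
            first.setTextContent("Overview");

            // Delete: remove the first <draft> element, if there is one.
            Node draft = root.getElementsByTagName("draft").item(0);
            if (draft != null) {
                draft.getParentNode().removeChild(draft);
            }

            List<String> parts = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    parts.add(child.getNodeName() + "=" + child.getTextContent());
                }
            }
            return String.join(";", parts);
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyEdits("<deck><slide>Intro</slide><draft/></deck>"));
    }
}
```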

SLIDE 14

What is SAX? (1/2)

  • SAX is an alternative way of working with the information in your XML document
  • It was designed to have a smaller memory footprint, but it puts more of the work on the programmer
  • SAX does not create a default object model on top of your XML document
  • SAX was originally developed by David Megginson

SLIDE 15

What is SAX? (2/2)

  • When you parse an XML document with a SAX parser, the parser generates a series of events as it reads the document
  • These events are pushed to event handlers
  • You need to decide what to do with the events when you parse an XML document

SLIDE 16

Sample SAX Events

  • The startDocument event
  • For each element, a startElement event at the start of the element, and an endElement event at the end of the element
  • If an element contains content, there will be events such as characters for additional text
  • The endDocument event
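
The event sequence listed above can be observed with a small Java handler that extends `org.xml.sax.helpers.DefaultHandler` and records each callback. The class name `SaxEventLog` and the method `record` are hypothetical; whitespace-only text is skipped here to keep the log short, which is a choice made for this sketch rather than SAX behavior.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only.
class SaxEventLog extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several calls; this sketch logs each call.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses the XML and returns the events in the order the parser pushed them.
    static List<String> record(String xml) {
        try {
            SaxEventLog log = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
            return log.events;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        for (String event : record("<a><b>hi</b></a>")) {
            System.out.println(event);
        }
    }
}
```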

SLIDE 17

What is StAX?

  • StAX is an exciting new parsing technique
  • Like SAX, it uses an event-driven model
  • However, instead of using SAX’s push model, StAX uses a pull model for event processing
  • Instead of using a callback mechanism, a StAX parser returns events as requested by the application
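
In Java the pull model might be sketched with `javax.xml.stream.XMLStreamReader`: the application runs the loop and asks for the next event, rather than registering callbacks. The class `StaxPull` and the method `elementNames` are hypothetical names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only.
class StaxPull {

    // Collects element names in document order by pulling events one at a time.
    static List<String> elementNames(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> names = new ArrayList<>();

            // The application, not the parser, decides when to advance.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<deck><slide/><slide/></deck>"));
    }
}
```

Because the application owns the loop, it can also stop pulling as soon as it has what it needs.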

SLIDE 18

SAX vs. StAX

  • SAX returns different types of events to the ContentHandler
  • StAX returns its events to the application and can even provide the events as objects
  • StAX includes factories for creating the StAX reader and writer
  • Applications can use the StAX interfaces without reference to the details of a particular implementation

SLIDE 19

StAX vs. DOM and SAX

  • StAX specifies two parsing models
      • The cursor model
      • The iterator model
  • Like SAX, the cursor model simply returns events
  • The iterator model returns events as objects
      • Provides a more natural interface but has the additional overhead of object creation
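
The iterator model corresponds to `javax.xml.stream.XMLEventReader` in Java, where every event is an `XMLEvent` object that can be stored and inspected later. The sketch below keeps all events in a list before describing them; the class `StaxIterator` and the method `describeEvents` are hypothetical names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for illustration only.
class StaxIterator {

    static List<String> describeEvents(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));

            // Each event is an object, so it can be kept after the reader moves on.
            List<XMLEvent> kept = new ArrayList<>();
            while (reader.hasNext()) {
                kept.add(reader.nextEvent());
            }
            reader.close();

            List<String> out = new ArrayList<>();
            for (XMLEvent event : kept) {
                if (event.isStartElement()) {
                    out.add("START:" + event.asStartElement().getName().getLocalPart());
                } else if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                    out.add("TEXT:" + event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    out.add("END:" + event.asEndElement().getName().getLocalPart());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeEvents("<a>hi</a>"));
    }
}
```

The cursor model (`XMLStreamReader`) avoids allocating these event objects, which is the overhead the slide refers to.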

SLIDE 20

DOM vs. SAX (1/3)

  • In the case of DOM, the parser does almost everything
      • Reads the XML document in
      • Creates an object model on top of it
      • Gives you a reference to this object model (a document object) so that you can manipulate it
  • SAX does not expect the parser to do much

SLIDE 21

DOM vs. SAX (2/3)

  • For SAX, the parser should
      • Read in the XML document
      • Fire a bunch of events depending on what tags it encounters in the XML document
  • Then, the programmer needs to make sense of all the tag events and create objects in their own object model
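
As an illustration of the programmer's side of this division of labor, the Java sketch below turns SAX events into a simple application-specific model: a list of slide titles. The handler must remember whether it is currently inside a `title` element. The class `TitleCollector`, the method `collect`, and the `deck`/`slide`/`title` vocabulary are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only.
class TitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    // Non-null only while the parser is inside a <title> element.
    private StringBuilder current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several pieces, so it is accumulated rather than assigned.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && current != null) {
            titles.add(current.toString().trim());
            current = null;
        }
    }

    // Parses the XML and returns the collected titles in document order.
    static List<String> collect(String xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(
                "<deck><slide><title>What is DOM?</title></slide>"
                + "<slide><title>What is SAX?</title></slide></deck>"));
    }
}
```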

SLIDE 22

DOM vs. SAX (3/3)

  • SAX can be really fast at runtime if your object model is simple
  • SAX is faster than DOM because it bypasses the creation of a tree-based object model of your information
  • On the other hand, you have to write a SAX document handler to interpret all the SAX events

SLIDE 23

Drawbacks of DOM

  • Partial parsing is not possible
  • Loading the whole document and building the entire tree structure in memory can be expensive
  • The DOM tree can be an order of magnitude larger than the document
  • The generic DOM node type is an interoperability advantage but may not be the best when you do object type binding

SLIDE 24

When to Use DOM

  • When the development needs to be done quickly
      • DOM-based processing is quite easy to implement
  • When you need to have random access to the XML document
      • Example: An XSL processor
  • When you need to modify an XML document
      • Example: An XML editor

SLIDE 25

Drawbacks of SAX

  • You have to implement the event handlers to handle all incoming events
      • Must maintain event states in your code
      • Must keep track of where the parser is in the document
  • It does not have built-in document navigation support
      • No random access support

SLIDE 26

When to Use SAX

  • When you have a small amount of memory
      • SAX requires little memory because it does not construct an internal representation of the XML data
  • When you only need to read the content in a single pass
      • Example: Many B2B and EAI applications use XML just as an encapsulation format in which the receiving end simply retrieves all the data

SLIDE 27

Drawbacks of StAX

  • It does not have built-in document navigation support
      • No random access support
  • Document modification is still quite difficult if you want to do anything beyond simple one-pass transformations

SLIDE 28

When to Use StAX

  • When applications need to take advantage of the streaming model for performance while maintaining full support of namespaces
  • When an application needs to request events from multiple StAX parsers and put them into a single context
      • Example: Web services

SLIDE 29

Summary of Java Parser APIs

  • XML parsers are programs to read, manipulate, and create XML documents
  • To automate XML processing, XML developers need to develop XML parsers
  • XML parser APIs
      • DOM
          • + Easy for developers to develop with
          • + Random access
          • - Requires lots of memory
      • SAX, StAX
          • + Fast processing
          • - Developers need to create their own data model

SLIDE 30

Streaming APIs in PHP

  • ext/xmlreader and ext/xmlwriter
      • Allow XML to be read from or written to PHP streams
      • Result in very low memory usage
      • But provide very focused, unidirectional XML support (each can only read or only write)
  • To manipulate an XML data tree
      • Use DOM or SimpleXML

SLIDE 31

PHP DOM vs. SimpleXML (1/2)

  • DOM allows a developer to access and manipulate XML in any way needed, but it comes at a price
  • DOM is a large and complex API, requiring a developer to really understand all details
  • SimpleXML aims to break through all the XML complexities and provide an intuitive and simple

SLIDE 32

PHP DOM vs. SimpleXML (2/2)

  • The vast majority of people working with XML are really only concerned with elements having simple content
  • DOM models an XML document as a tree
  • SimpleXML takes an easier approach and views a document as an object
      • Elements are represented as properties and attributes as accessors

SLIDE 33

References

  • Sang Shin, “XML Course Page”, http://www.javapassion.com/xml/
  • Oracle, “Parsing XML Efficiently”, http://www.oracle.com/technology/oramag/oracle/03-sep/o53devxml.html
  • Zend Technologies, “XML and PHP5”, http://devzone.zend.com/article/2387