1 Document Instances and Grammars

Fundamentals of hierarchical document structures, or a Computer Scientist's view of XML

1.1 XML and XML documents
1.2 Basics of document grammars
1.3 Basics of XML DTDs
1.4 XML Namespaces

XPT 2006 XML Instances and Grammars

1.1 XML and XML documents

• XML, Extensible Markup Language, W3C Recommendation, February 1998
  – not an official (de jure) standard, but a stable industry standard
  – 2nd Ed. 10/2000, 3rd Ed. 2/2004
    » editorial revisions, not new versions of XML 1.0
• a simplified subset of SGML, Standard Generalized Markup Language, ISO 8879:1987
  – what is said later about valid XML documents applies to SGML documents, too

What is XML?

• Extensible Markup Language is not a markup language!
  – it does not fix a tag set or its semantics (as markup languages such as HTML do)
• XML is
  – a way to use markup to represent information
  – a metalanguage
    » supports the definition of specific markup languages through XML DTDs (Document Type Definitions) or Schemas
    » e.g., XHTML, a reformulation of HTML using XML

What is XML (2)?

• XML documents have no inherent (processing or presentation) semantics
  – implementing those semantics is the topic of this course!
• Often "XML" ≈ XML + XML technology
  – that is, the processing models and languages we're studying (and many others ...)

Essential Features of XML

• Overview of XML essentials
  – many details skipped
  – learn to consult original sources (specifications, documentation, etc.) for details!
    » the XML specification is easy to browse
• First of all, XML is a textual, or character-based, way to represent data

How does it look?

<?xml version='1.0' encoding="iso-8859-1" ?>
<invoice num="1234">
  <client clNum="00-01">
    <name>Pekka Kilpeläinen</name>
    <email>kilpelai@cs.uku.fi</email>
  </client>
  <item price="60" unit="EUR">XML Handbook</item>
  <item price="350" unit="FIM">XSLT Programmer's Ref</item>
</invoice>
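Because an XML document has no inherent semantics, some program has to assign them. The following is a minimal sketch, assuming Python 3 and its standard-library `xml.etree.ElementTree` module, that parses the invoice above and interprets it. The helper `totals_by_unit` is hypothetical; it sums prices separately per unit, since adding EUR and FIM amounts together would be meaningless.

```python
import xml.etree.ElementTree as ET

# The invoice from the slide, encoded as ISO-8859-1 bytes so that the
# bytes agree with the document's own encoding declaration.
INVOICE = """<?xml version='1.0' encoding="iso-8859-1" ?>
<invoice num="1234">
  <client clNum="00-01">
    <name>Pekka Kilpeläinen</name>
    <email>kilpelai@cs.uku.fi</email>
  </client>
  <item price="60" unit="EUR">XML Handbook</item>
  <item price="350" unit="FIM">XSLT Programmer's Ref</item>
</invoice>""".encode("iso-8859-1")


def totals_by_unit(root):
    """Hypothetical helper: sum item prices separately for each currency unit."""
    totals = {}
    for item in root.iter("item"):
        unit = item.get("unit")
        totals[unit] = totals.get(unit, 0) + int(item.get("price"))
    return totals


root = ET.fromstring(INVOICE)  # parse the bytes into an element tree
print(root.tag, root.get("num"))           # invoice 1234
print(root.find("client").get("clNum"))    # 00-01
print(root.find("client/email").text)      # kilpelai@cs.uku.fi
for item in root.iter("item"):
    # The element content is the title; the attributes hold price and unit.
    print(item.text, item.get("price"), item.get("unit"))
print(totals_by_unit(root))                # {'EUR': 60, 'FIM': 350}
```

Another program could read the very same document differently, for example render it as an HTML page; the markup itself does not decide what it means.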
XML Document Characters

• XML documents are made of ISO-10646 (32-bit) characters; in practice, of their 16-bit Unicode subset (used, e.g., in Java)
  – Unicode 2.0 defines almost 39,000 distinct characters
• Characters have three different aspects:
  – their identification as numeric code points
  – their representation by bytes
  – their visual presentation

External Aspects of Characters

• Documents are stored and transmitted as a sequence of bytes (of 8 bits). An encoding determines how characters are represented by bytes.
  – UTF-8 (which coincides with 7-bit ASCII for ASCII characters) is the XML default encoding
  – encoding="KOI8-R" should be OK for Cyrillic texts
    » (but I cannot comment on parser support)
• A font (a collection of character images called glyphs) determines the visual presentation of characters
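Two of the three aspects of characters can be shown in code: the numeric code point, and its byte representation under a given encoding. A minimal sketch in Python 3 using only built-in string and bytes methods; the sample characters are arbitrary choices, `encode_hex` is a hypothetical helper, and `koi8_r` is Python's codec name for KOI8-R.

```python
SAMPLES = ["A", "\u00e4", "\u0416"]  # 'A', a-umlaut, Cyrillic capital Zhe
ENCODINGS = ["utf-8", "iso-8859-1", "koi8_r"]


def encode_hex(ch, encoding):
    """Return the bytes of ch in the given encoding as hex, or None if unrepresentable."""
    try:
        data = ch.encode(encoding)
    except UnicodeEncodeError:
        return None  # this encoding has no byte sequence for ch
    return " ".join(f"{b:02X}" for b in data)


for ch in SAMPLES:
    # Aspect 1: the code point, a number independent of any encoding.
    code_point = f"U+{ord(ch):04X}"
    # Aspect 2: the byte representation, which depends on the encoding.
    row = {enc: encode_hex(ch, enc) for enc in ENCODINGS}
    print(code_point, row)

# Conversely, one and the same byte denotes different characters under
# different encodings, so a parser must know which encoding was used.
latin1_char = b"\xe4".decode("iso-8859-1")  # a-umlaut, U+00E4
koi8_char = b"\xe4".decode("koi8_r")        # Cyrillic capital De, U+0414
print(f"0xE4 as ISO-8859-1: U+{ord(latin1_char):04X}; as KOI8-R: U+{ord(koi8_char):04X}")
```

The third aspect, the glyph, lies outside the document: the font in use decides how, say, U+0416 is drawn.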