 
Standardized Data Formats for Quantum Chemistry Based on XML/CML: A Literature and Web Resources Research
Sarah Gerster
17.01.2007
Overview ● Motivation ● A flavour of XML ● Efforts to develop an XML standard for computations in quantum chemistry ● Standard?
Motivation ● Facilitate: – Data exchange – Automated workflows – Metadata – Data storage and retrieval (Figure: Workflow used by Andreas Elsener in his Diploma Thesis)
What is XML? ● Markup language – combines character data and markup ● Extensible – users can define whatever tags they need ● Structured documents – querying XML files is reasonably easy
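To illustrate the "extensible" point: XML prescribes no vocabulary, so a program can invent the tags it needs and any XML parser can still read them back. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the tag names (`calculation`, `molecule`, `atom`, `energy`) and the energy value are hypothetical placeholders, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: XML itself does not define any of these tags.
calc = ET.Element("calculation", attrib={"program": "example"})
mol = ET.SubElement(calc, "molecule", attrib={"name": "water"})
for symbol in ("O", "H", "H"):
    ET.SubElement(mol, "atom", attrib={"element": symbol})

energy = ET.SubElement(calc, "energy", attrib={"unit": "au"})
energy.text = "0.0"  # placeholder value, not a computed result

# Serialize to text and parse it back: a generic parser reads the
# custom tags without any prior knowledge of them.
xml_text = ET.tostring(calc, encoding="unicode")
parsed = ET.fromstring(xml_text)

print(xml_text)
print([child.tag for child in parsed])  # ['molecule', 'energy']
```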
An XML Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
XML example given under http://www.w3schools.com/xml/
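The book example also shows why querying structured XML is easy. A minimal sketch with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

# Encode to bytes so the parser honours the declared ISO-8859-1 encoding.
root = ET.fromstring(BOOK_XML.encode("iso-8859-1"))

# A chapter title is the text that precedes its first <para> child.
titles = [chapter.text.strip() for chapter in root.findall("chapter")]

# XPath-style path: every <para> directly below a <chapter>.
paragraphs = [para.text for para in root.findall("./chapter/para")]

# Paragraphs belonging to the chapter titled "XML Syntax".
syntax_rules = [
    para.text
    for chapter in root.findall("chapter")
    if chapter.text.strip() == "XML Syntax"
    for para in chapter.findall("para")
]

print(titles)           # ['Introduction to XML', 'XML Syntax']
print(len(paragraphs))  # 4
print(syntax_rules)
```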
Some Issues... ● A database is needed to hold the XML files ● XML Schema or Document Type Definition (DTD) – outlines the structure of an XML file – provides a set of rules – well-formed vs. valid documents – a new standard requires a new XML Schema
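The difference between well-formed and valid documents can be sketched as follows. Python's standard library checks well-formedness (matching, properly nested tags) but does not validate against an XML Schema or DTD, so `check_book_rules` below is a hand-written, hypothetical stand-in for the kind of rules a schema would express; a real setup would use a schema-aware validator instead.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Well-formed = syntactically correct XML, independent of any schema."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def check_book_rules(root):
    """Hypothetical stand-in for schema validation: the root must be <book>
    and every <chapter> must contain at least one <para>."""
    errors = []
    if root.tag != "book":
        errors.append(f"root element is <{root.tag}>, expected <book>")
    for i, chapter in enumerate(root.findall("chapter"), start=1):
        if chapter.find("para") is None:
            errors.append(f"chapter {i} has no <para>")
    return errors

good = "<book><chapter>Intro<para>Text</para></chapter></book>"
badly_nested = "<book><chapter><para>Text</chapter></para></book>"
well_formed_but_invalid = "<book><chapter>Empty chapter</chapter></book>"

print(is_well_formed(good), is_well_formed(badly_nested))        # True False
print(check_book_rules(ET.fromstring(good)))                     # []
print(check_book_rules(ET.fromstring(well_formed_but_invalid)))  # ['chapter 1 has no <para>']
```

The last document parses without error, yet breaks the (hypothetical) rules: it is well-formed but not valid.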
An XML Schema Example ● Structure of an energy entry (diagram; the legend distinguishes mandatory and optional elements): – Energy, with children Value (format?) and Unit (au, eV, ...) ● Which energies have to be specified? – Total: Value, Unit – Electronic: Value, Unit – Nuclear: Value, Unit
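The open questions on this slide (value format, allowed units, which energies are required) are exactly what a schema must pin down. The sketch below makes one set of choices explicit in code; all of them are assumptions, not the slide's answer: the tag names `energies`, `total`, `electronic`, `nuclear`, `value`, `unit` are hypothetical, `<value>` is treated as mandatory and must parse as a float, `<unit>` is treated as optional with "au" as default, and only "au" and "eV" are accepted. The numbers in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

REQUIRED_ENERGIES = ("total", "electronic", "nuclear")
ALLOWED_UNITS = {"au", "eV"}  # assumption: the slide leaves the list open ("au, eV, ...")
DEFAULT_UNIT = "au"           # assumption: <unit> is optional and defaults to atomic units

def read_energies(xml_text):
    """Return {name: (value, unit)}; raise ValueError if a rule is violated."""
    root = ET.fromstring(xml_text)
    result = {}
    for name in REQUIRED_ENERGIES:
        entry = root.find(name)
        if entry is None:
            raise ValueError(f"missing mandatory energy <{name}>")
        value_el = entry.find("value")
        if value_el is None or value_el.text is None:
            raise ValueError(f"<{name}> has no <value>")
        value = float(value_el.text)  # raises ValueError on a malformed number
        unit_el = entry.find("unit")
        unit = unit_el.text.strip() if unit_el is not None and unit_el.text else DEFAULT_UNIT
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"<{name}> uses unsupported unit {unit!r}")
        result[name] = (value, unit)
    return result

# Placeholder numbers, not results of a calculation.
SAMPLE = """<energies>
  <total><value>-7.5</value><unit>au</unit></total>
  <electronic><value>-10.0</value></electronic>
  <nuclear><value>2.5</value><unit>au</unit></nuclear>
</energies>"""

print(read_energies(SAMPLE))
# {'total': (-7.5, 'au'), 'electronic': (-10.0, 'au'), 'nuclear': (2.5, 'au')}
```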
Strengths of XML ● Platform-independent ● Based on international standards ● Web standard ● Files are both human- and machine-readable ● Plenty of software is available to handle XML ● Based on SGML, which has existed since 1986
Weaknesses of XML ● Redundant and verbose syntax ● Hierarchical model for data representation ● Parsers have to check for improperly formatted data ● Parsers must be able to recurse through arbitrarily nested data
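Two of these weaknesses can be made concrete with a small sketch: measuring how much of a document is markup rather than data, and walking arbitrarily nested elements. The energy entry used here is a hypothetical toy example; on it, markup makes up more than 90% of the characters, though the share depends heavily on the document.

```python
import xml.etree.ElementTree as ET

def max_depth(root):
    """Depth of the deepest element, found with an explicit stack
    so that deeply nested input cannot exhaust Python's recursion limit."""
    deepest = 0
    stack = [(root, 1)]
    while stack:
        element, depth = stack.pop()
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in element)
    return deepest

def character_data_share(xml_text):
    """Fraction of the serialized text that is character data rather than markup."""
    root = ET.fromstring(xml_text)
    data_chars = sum(len(piece) for piece in root.itertext())
    return data_chars / len(xml_text)

doc = "<energies><total><value>-7.5</value><unit>au</unit></total></energies>"
print(max_depth(ET.fromstring(doc)))        # 3
print(round(character_data_share(doc), 2))  # 0.09
```

The explicit stack is a design choice: a naive recursive walk would work for typical files but fail on pathologically deep nesting.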
Chemical Markup Language (CML) ● An XML application for chemistry ● DTD/Schema covering chemistry in general – substances – quantities – structure – metadata – properties – ... ● Extensions for specific domains, e.g. computational chemistry
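A minimal sketch of reading a molecular structure expressed in CML. The fragment is hand-written for illustration and follows the common CML pattern of a `<molecule>` holding an `<atomArray>` of `<atom>` elements with `elementType` and 3D coordinates `x3`, `y3`, `z3` in the CML namespace; the coordinate values are placeholders, and real CML files may carry many more elements and attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written illustrative fragment; the coordinates are placeholders.
WATER_CML = f"""<molecule xmlns="{CML_NS}" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>"""

ns = {"cml": CML_NS}  # prefix map for namespace-qualified paths
root = ET.fromstring(WATER_CML)

atoms = [
    (atom.get("id"), atom.get("elementType"),
     tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3")))
    for atom in root.findall("./cml:atomArray/cml:atom", ns)
]
formula = Counter(element for _, element, _ in atoms)

print(root.tag)       # {http://www.xml-cml.org/schema}molecule
print(dict(formula))  # {'O': 1, 'H': 2}
```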
Drawbacks of CML ● Primarily designed for molecular structures and chemical reactions, not for QC – Validation: the additional tags are not defined through CML's own extensibility mechanism – Efficiency: not all fields that are mandatory in CML are needed for QC computations ● The array format is not consistent with the standard XML Schema => data are hard to access from different platforms
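One reading of the array point: CML can pack a whole array into a single text node (for example `<array dataType="xsd:double" size="3">0.1 0.2 0.3</array>`), so generic XML tooling sees only a string, and every consumer has to split and convert it itself. The sketch below assumes whitespace-delimited values and an optional `delimiter` attribute (tolerating leading or trailing delimiters); namespaces are omitted, and the `<values>`/`<v>` layout used for comparison is hypothetical.

```python
import xml.etree.ElementTree as ET

def read_delimited_array(element):
    """Turn a CML-style array (values packed into one text node) into floats."""
    delimiter = element.get("delimiter")  # absent -> values separated by whitespace
    text = element.text or ""
    if delimiter:
        tokens = [t for t in text.split(delimiter) if t.strip()]
    else:
        tokens = text.split()
    values = [float(t) for t in tokens]
    size = element.get("size")
    if size is not None and int(size) != len(values):
        raise ValueError(f"declared size {size}, found {len(values)} values")
    return values

packed = ET.fromstring('<array dataType="xsd:double" size="3">0.1 0.2 0.3</array>')
piped = ET.fromstring('<array delimiter="|" size="3">|0.1|0.2|0.3|</array>')

# For comparison: one element per value, addressable by generic XML tools.
per_element = ET.fromstring("<values><v>0.1</v><v>0.2</v><v>0.3</v></values>")

print(read_delimited_array(packed))                    # [0.1, 0.2, 0.3]
print(read_delimited_array(piped))                     # [0.1, 0.2, 0.3]
print([float(v.text) for v in per_element.iter("v")])  # [0.1, 0.2, 0.3]
```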
Other projects to create an XML standard for QC ● Quantum Monte Carlo – ALPS – Zori ● QMWISE ● GAMESS – structured data output ● QCML – wrappers (Figure: example from the QCML working group)
Standard? ● XML is becoming the de facto standard for transferring data in QC ● CML is a “patchwork” ● Much experience and knowledge in CML ● No other widespread XML schema for QC ● Put efforts together for a new standard?
Thank you for your attention! For further information and references, please refer to: http://www.echinops.ch/downloads/