Standardized Data Formats for Quantum Chemistry Based on XML/CML?
A Literature and Web Resources Research
Sarah Gerster
17.01.2007
Overview
- Motivation
- A flavour of XML
- Efforts to develop an XML standard for computations in quantum chemistry
- Standard?
Motivation
- Facilitate:
– Data exchange
– Automated workflows
– Metadata
– Data storage and retrieval
Workflow used by Andreas Elsener in his Diploma Thesis
What is XML?
- Markup Language
– combines character data and markup
- Extensible
– can create any needed tag
- Structured documents
– querying the XML files is reasonably easy
An XML Example
<?xml version="1.0" encoding="ISO-8859-1"?>

<book>

  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>

  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>

</book>
XML example given under http://www.w3schools.com/xml/
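To illustrate the earlier point that querying structured XML is reasonably easy, the book example can be read with Python's standard-library `xml.etree.ElementTree`. This is only a minimal sketch: the function names are illustrative, and the XML declaration is left out of the string for brevity.

```python
import xml.etree.ElementTree as ET

# The book example from the slide, without the XML declaration.
BOOK_XML = """<book>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def chapter_titles(xml_text):
    """Return the leading text of every <chapter> directly under the root."""
    root = ET.fromstring(xml_text)
    return [(chapter.text or "").strip() for chapter in root.findall("chapter")]


def all_paragraphs(xml_text):
    """Return the text of every <para>, at any nesting depth."""
    root = ET.fromstring(xml_text)
    return [para.text for para in root.iter("para")]


print(chapter_titles(BOOK_XML))
print(all_paragraphs(BOOK_XML))
```

The chapter title is the text that precedes the first child element, which is why it is read from `chapter.text` and stripped of surrounding whitespace.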
Some Issues...
- Database to hold the XML files
- XML Schema or Document Type Definition
– outline the structure of an XML file
– provide a set of rules
– well-formed / valid documents
– new standard = new XML Schema
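The difference between well-formed and valid can be made concrete with a small sketch. A well-formed document only obeys the XML syntax rules; a valid document additionally obeys a DTD or XML Schema. The helper `is_well_formed` below is a hypothetical name, and it checks syntax only: Python's standard-library parser does not validate against a schema, so a schema-aware tool would be needed for that second step.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
good = "<chapter><para>Elements must be properly nested</para></chapter>"
# Closing tags in the wrong order: not well-formed.
bad = "<chapter><para>Elements must be properly nested</chapter></para>"

print(is_well_formed(good))
print(is_well_formed(bad))
```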
An XML Schema Example
- Structure of an energy entry
- Which energies have to be specified?
[Diagram: an Energy entry with Total, Nuclear and Electronic parts, each holding a Value and a Unit. Open questions: which unit format (au, eV, ?), and which parts are mandatory and which optional.]
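The kind of rules a schema for such an energy entry would express can be sketched as a hand-written check. Everything specific here is an assumption made for illustration, not part of any existing standard: the element names (`energy`, `total`, `nuclear`, `electronic`), the `unit` attribute, the allowed units, and the choice that only the total energy is mandatory. The numeric values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, chosen only to illustrate the questions on the slide.
ALLOWED_UNITS = {"au", "eV"}
MANDATORY = {"total"}
OPTIONAL = {"nuclear", "electronic"}


def check_energy(xml_text):
    """Return a list of rule violations for a hypothetical <energy> entry."""
    root = ET.fromstring(xml_text)
    if root.tag != "energy":
        return ["root element must be <energy>"]
    problems = []
    seen = {child.tag for child in root}
    for name in sorted(MANDATORY - seen):
        problems.append(f"missing mandatory <{name}>")
    for child in root:
        if child.tag not in MANDATORY | OPTIONAL:
            problems.append(f"unexpected <{child.tag}>")
            continue
        if child.get("unit") not in ALLOWED_UNITS:
            problems.append(f"<{child.tag}> has no recognised unit")
        try:
            float(child.text)
        except (TypeError, ValueError):
            problems.append(f"<{child.tag}> value is not a number")
    return problems


# Placeholder values, not results of any calculation.
valid_entry = '<energy><total unit="au">-1.5</total></energy>'
invalid_entry = '<energy><nuclear unit="kcal">9.19</nuclear></energy>'

print(check_energy(valid_entry))
print(check_energy(invalid_entry))
```

In practice these rules would live in the XML Schema itself rather than in application code, so that every program reading the format enforces the same constraints.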
Strengths of XML
- Platform-independent
- Based on international standards
- Web standard
- Files are human-readable as well as machine-readable
- A lot of software around to handle XML
- Based on SGML, which has existed since 1986
Weaknesses of XML
- Redundant and verbose syntax
- Hierarchical model for representation
- Parsers have to check for improperly formatted data
- Parsers must be able to recurse through arbitrarily nested data
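The recursion requirement can be sketched in a few lines. Because elements may nest to any depth, code that walks a document typically calls itself for each child. The function names below are illustrative only.

```python
import xml.etree.ElementTree as ET


def max_depth(element):
    """Return the nesting depth below and including element (a leaf is 1)."""
    return 1 + max((max_depth(child) for child in element), default=0)


def paths(element, prefix=""):
    """Yield the slash-separated path of every element in the tree."""
    here = f"{prefix}/{element.tag}"
    yield here
    for child in element:
        yield from paths(child, here)


doc = ET.fromstring("<a><b><c/></b><d/></a>")
print(max_depth(doc))
print(list(paths(doc)))
```

Very deeply nested input can exhaust Python's default recursion limit, which is one practical face of this weakness.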
Chemical Markup Language (CML)
- Implementation of an XML for chemistry
- DTD/Schema covering chemistry in general
– substances
– quantities
– structure
– metadata
– properties
– ...
- Extensions for specific domains, for example for computational chemistry
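As a rough illustration of what structure data looks like in this style, the sketch below reads a CML-style molecule with Python's standard library. The `molecule`, `atomArray` and `atom` elements and the `elementType`, `x3`, `y3`, `z3` attributes follow CML conventions, but the fragment is simplified (no namespace declaration), the coordinates are placeholder values, and `read_atoms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified CML-style fragment; coordinates are placeholders.
CML_LIKE = """<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="H" x3="0.757" y3="0.586" z3="0.000"/>
    <atom id="a3" elementType="H" x3="-0.757" y3="0.586" z3="0.000"/>
  </atomArray>
</molecule>"""


def read_atoms(xml_text):
    """Return a list of (element symbol, (x, y, z)) tuples."""
    root = ET.fromstring(xml_text)
    atoms = []
    for atom in root.iter("atom"):
        xyz = tuple(float(atom.get(key)) for key in ("x3", "y3", "z3"))
        atoms.append((atom.get("elementType"), xyz))
    return atoms


for symbol, xyz in read_atoms(CML_LIKE):
    print(symbol, xyz)
```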
Drawbacks of CML
- Primarily designed for molecular structures and chemical reactions, not for QC
– Validation: Don't use the extensibility of CML to define the additional tags
– Efficiency: Not all mandatory fields of CML are required for QC computations
- The array format is not consistent with the standard XML schema => accessibility problems from different platforms
Other projects to create an XML standard for QC
- Quantum Monte Carlo
– ALPS
– Zori
- QMWISE
- GAMESS
– structured data output
- QCML
– wrappers
Example from the QCML working group
Standard?
- XML is becoming the de facto standard for transferring data in QC
- CML is a “patchwork”
- Much experience and knowledge in CML
- No other widespread XML schema for QC
- Put efforts together for a new standard?