Standardized Data Formats for Quantum Chemistry Based on XML/CML?


SLIDE 1

Standardized Data Formats for Quantum Chemistry Based on XML/CML?

A Literature and Web Resources Research
Sarah Gerster
17.01.2007

SLIDE 2

Overview

  • Motivation
  • A flavour of XML
  • Efforts to develop an XML standard for computations in quantum chemistry

  • Standard?
SLIDE 3

Motivation

  • Facilitate:

– Data exchange
– Automated workflows
– Metadata
– Data storage and retrieval

Workflow used by Andreas Elsener in his Diploma Thesis

SLIDE 4

What is XML?

  • Markup Language

– combines character data and markup

  • Extensible

– can create any needed tag

  • Structured documents

– querying the XML files is reasonably easy

SLIDE 5

An XML Example

    <?xml version="1.0" encoding="ISO-8859-1"?>

    <book>

      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>

      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>

    </book>

XML example taken from http://www.w3schools.com/xml/
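To illustrate the earlier point that querying XML files is reasonably easy, here is a minimal sketch that reads the book example with Python's standard xml.etree.ElementTree module. The function name chapter_outline and the choice of Python are assumptions made for this illustration; they are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The book example from this slide, passed as bytes so that the
# encoding declaration in the XML prolog is honoured by the parser.
BOOK_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
"""


def chapter_outline(xml_bytes):
    """Return {chapter title: [paragraph texts]} for a <book> document.

    Hypothetical helper: the chapter title is taken to be the character
    data that precedes the first <para> inside each <chapter>.
    """
    root = ET.fromstring(xml_bytes)
    outline = {}
    for chapter in root.findall("chapter"):
        title = (chapter.text or "").strip()
        outline[title] = [para.text for para in chapter.findall("para")]
    return outline


outline = chapter_outline(BOOK_XML)
for title, paras in outline.items():
    print(title, "->", paras)
```

The same findall calls work unchanged if further chapter or para elements are added, which is the practical benefit of a structured document over free-form text output.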

SLIDE 6

Some Issues...

  • Database to hold the XML files
  • XML Schema or Document Type Definition

– outline the structure of an XML file
– provide a set of rules
– well-formed / valid documents
– new standard = new XML Schema
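The distinction between well-formed and valid can be made concrete with a short sketch. A document is well-formed if it obeys the XML syntax rules (closing tags, proper nesting); it is valid only if it additionally satisfies a DTD or XML Schema. The hypothetical helper below checks well-formedness only, using Python's standard library, which does not validate against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise.

    This checks syntax only. Checking validity against a DTD or
    XML Schema would need a separate validating tool.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<book><chapter><para>ok</para></chapter></book>"
bad_nesting = "<book><chapter><para>oops</chapter></para></book>"
missing_close = "<book><chapter>no closing tag</book>"

for label, text in [("good", good),
                    ("bad_nesting", bad_nesting),
                    ("missing_close", missing_close)]:
    print(label, is_well_formed(text))
```

Note that a well-formed document can still be invalid: for example, a book containing an element that the schema does not allow would pass this check.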

SLIDE 7

An XML Schema Example

  • Structure of an energy entry
  • Which energies have to be specified?

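[Diagram: an Energy entry with the children Total, Nuclear and Electronic, each carrying a Value and a Unit. Open questions annotated on the diagram: value format? units (au, eV, ?). Legend: mandatory / optional.]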
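The kind of rules a schema for the energy entry on this slide would encode can be sketched as a hand-written check. Everything specific here is an assumption made for illustration: the element names, the allowed units (au, eV), the choice of total as mandatory and nuclear/electronic as optional, and the numeric values, which are made up. A real XML Schema would state the same rules declaratively.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, NOT taken from any published schema.
ALLOWED_UNITS = {"au", "eV"}
MANDATORY = ("total",)
OPTIONAL = ("nuclear", "electronic")

# Made-up numbers, for illustration only.
ENERGY_XML = """
<energy>
  <total><value>-1.5</value><unit>au</unit></total>
  <nuclear><value>0.5</value><unit>au</unit></nuclear>
</energy>
"""


def check_energy(xml_text):
    """Return a list of rule violations (an empty list means the entry passes)."""
    root = ET.fromstring(xml_text)
    if root.tag != "energy":
        return ["root element must be <energy>"]
    problems = []
    # Rule 1: only known child elements are allowed.
    for child in root:
        if child.tag not in MANDATORY + OPTIONAL:
            problems.append(f"unexpected element <{child.tag}>")
    # Rule 2: mandatory children must be present.
    for name in MANDATORY:
        if root.find(name) is None:
            problems.append(f"missing mandatory <{name}>")
    # Rule 3: every present child needs a numeric value and an allowed unit.
    for name in MANDATORY + OPTIONAL:
        entry = root.find(name)
        if entry is None:
            continue
        value, unit = entry.findtext("value"), entry.findtext("unit")
        try:
            float(value)
        except (TypeError, ValueError):
            problems.append(f"<{name}>: value is not a number")
        if unit not in ALLOWED_UNITS:
            problems.append(f"<{name}>: unit {unit!r} not allowed")
    return problems


print(check_energy(ENERGY_XML))
```

Each design question on the slide (which energies, which units, which value format) corresponds to one of the constants or rules above, so changing the standard means changing the schema.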
SLIDE 8

Strengths of XML

  • Platform-independent
  • Based on international standards
  • Web standard
  • Files are both human- and machine-readable
  • Plenty of software is available to handle XML
  • Based on SGML, which has existed since 1986
SLIDE 9

Weaknesses of XML

  • Redundant and verbose syntax
  • Hierarchical model for representation
  • Parsers have to check for improperly formatted data
  • Parsers should be able to recurse through arbitrarily nested data
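The point about arbitrarily nested data can be illustrated with a small recursive traversal. This is a sketch using Python's standard xml.etree.ElementTree; the helper names max_depth and walk and the toy document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def max_depth(element):
    """Return the nesting depth of the tree rooted at element (a leaf has depth 1)."""
    return 1 + max((max_depth(child) for child in element), default=0)


def walk(element, depth=0):
    """Yield (depth, tag) for element and all of its descendants, in document order."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


doc = ET.fromstring("<a><b><c/><c><d/></c></b><e/></a>")

print(max_depth(doc))
for depth, tag in walk(doc):
    print("  " * depth + tag)
```

Because the recursion follows the document, its depth is bounded only by the input, so a very deeply nested file can exhaust the recursion limit of a naive consumer.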

SLIDE 10

Chemical Markup Language (CML)

  • An XML-based markup language for chemistry
  • DTD/Schema covering chemistry in general

– substances
– quantities
– structure
– metadata
– properties
– ...

  • Extensions for specific domains, for example for computational chemistry
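As a rough illustration of what a CML structure record looks like and how it can be read, the sketch below parses a small water molecule. It assumes the commonly used CML element names (molecule, atomArray, atom, bondArray, bond) and the namespace http://www.xml-cml.org/schema; it has not been checked against a specific CML schema version, and the coordinates are approximate values chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed CML namespace; older CML versions may use a different URI.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Illustrative record; coordinates are approximate, in angstrom.
WATER_CML = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="H" x3="0.757" y3="0.586" z3="0.000"/>
    <atom id="a3" elementType="H" x3="-0.757" y3="0.586" z3="0.000"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def read_atoms(root):
    """Return a list of (atom id, element symbol, (x, y, z)) tuples."""
    atoms = []
    for atom in root.findall("cml:atomArray/cml:atom", CML_NS):
        xyz = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        atoms.append((atom.get("id"), atom.get("elementType"), xyz))
    return atoms


def read_bonds(root):
    """Return a list of (atom id, atom id) pairs, one per bond."""
    return [tuple(bond.get("atomRefs2").split())
            for bond in root.findall("cml:bondArray/cml:bond", CML_NS)]


molecule = ET.fromstring(WATER_CML)
atoms = read_atoms(molecule)
bonds = read_bonds(molecule)
print([symbol for _, symbol, _ in atoms])
print(bonds)
```

Quantum chemistry results such as energies or basis sets have no natural place in this structure-centred record, which is where domain-specific extensions come in.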

SLIDE 11

Drawbacks of CML

  • Primarily designed for molecular structures and chemical reactions, not for QC

– Validation: Don't use the extensibility of CML to define the additional tags
– Efficiency: Not all mandatory fields of CML are required for QC computations

  • The array format is not consistent with the standard XML schema => accessibility problems from different platforms

SLIDE 12

Other projects to create an XML standard for QC

  • Quantum Monte Carlo

– ALPS
– Zori

  • QMWISE
  • GAMESS

– structured data output

  • QCML

– wrappers

Example from the QCML working group

SLIDE 13

Standard?

  • XML is becoming the de facto standard for transferring data in QC

  • CML is a “patchwork”
  • Much experience and knowledge in CML
  • No other widespread XML schema for QC
  • Join efforts to create a new standard?
SLIDE 14

Thank you for your attention!

For further information and references, please refer to: http://www.echinops.ch/downloads/