
XML GUS Data Loading

The Genomics Unified Schema User’s and Developer’s Workshop
July 7, 2005

Josef Jurek
Daphne Preuss Laboratory
Molecular Genetics and Cell Biology
The University of Chicago
jurek@cs.uchicago.edu

Terry Clark, Josef Jurek, Gregory Kettler, and Daphne Preuss, A Structured Interface to the Object-Oriented Genomics Unified Schema for XML Formatted Data, Applied Bioinformatics, in press, Spring 2005.

Goals

• Formulate an XML interface that includes relational database key constraint definitions.
• Create an XML for GUS generalized enough to input data into any table or group of tables.
• Regularize the traversal through that XML (syntax checking).
• Allow for user/site-specific processing of data.

What the User Requires

• The XMLGUS plugin, available at http://amrit.ittc.ku.edu/flora.
    XML::YYLex (for XML processing)
    XML::DOM processor (provides the lexical analysis for the parser)
    Berkeley YACC compiler generator
    Perl-byacc
• A user-designed XML scheme for marking up data.
• A context-free grammar, or CFG. (Don’t be alarmed.) Some CFGs are also available at http://flora.uchicago.edu/grammars.
• Optional user-defined functions for additional processing of data.

An Example of User Designed XML Tags for XMLGUS

  <gus>
  <dots_nasequence depth="0">
      <dots_sequencetype fkobj="dots::sequencetype" depth="1">
          <name>DNA</name>
      </dots_sequencetype>
      <sequencetypeid pkobj="dots::sequencetype" key="sequence_type_id"/>
      <sres_taxonname fkobj="sres::taxonname" depth="1">
          <name>Olimarabidopsis pumila</name>
      </sres_taxonname>
      <taxonid pkobj="sres::taxonname" key="taxon_id"/>
      <description>OPM18B21 Contig10</description>
      <sequence>
  ATCGGAGTCAGGCTGGAAGACAACTCCTCTGCGAAGTCGCGGTGAGTTTTAGT
  GCATCGATGAATTTACGGATGACAACACTGTTTGTACTCTCTAAAACAACCAG
  CCACCTAGCACAACAACTTTACCCCGAATATCTTATCACATATCTTTTAAAGT
      </sequence>
  </dots_nasequence>
  </gus>

Deriving Foreign Keys from Candidate Keys

      <dots_sequencetype fkobj="dots::sequencetype" depth="1">
          <name>DNA</name>
      </dots_sequencetype>
      <sequencetypeid pkobj="dots::sequencetype" key="sequence_type_id"/>

DoTS::NASequence (view on GUS::Model::DoTS::NASequenceImp)

  column             null?   type           parent table
  na_sequence_id     no      number(10)
  sequence_version   no      number(3)
  subclass_view      no      varchar2(30)
  sequence_type_id   no      number(4)      DoTS::SequenceType
  taxon_id                   number(12)     SRes::Taxon
  sequence                   clob(4000)
  length                     number(12)
  ...                ...     ...            ...
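The lookup this slide describes can be sketched as follows. This is a minimal illustration in Python, not the plugin's Perl code: it assumes that an element carrying fkobj names a parent row by a candidate key (here name = DNA), and that the empty element carrying pkobj and key copies that row's primary key into the record being built. The SQLite table, the TABLES mapping, and the ids 7 and 8 are hypothetical stand-ins for the GUS schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

RECORD = """
<dots_nasequence depth="0">
  <dots_sequencetype fkobj="dots::sequencetype" depth="1">
    <name>DNA</name>
  </dots_sequencetype>
  <sequencetypeid pkobj="dots::sequencetype" key="sequence_type_id"/>
  <description>OPM18B21 Contig10</description>
</dots_nasequence>
"""

# Hypothetical mapping from fkobj/pkobj names to tables in this toy schema.
TABLES = {"dots::sequencetype": "sequencetype"}


def lookup(conn, table, candidate):
    """Fetch the single row of `table` matching the candidate-key columns."""
    where = " AND ".join(f"{column} = ?" for column in candidate)
    cur = conn.execute(f"SELECT * FROM {table} WHERE {where}",
                       list(candidate.values()))
    rows = cur.fetchall()
    if len(rows) != 1:
        raise LookupError(f"{len(rows)} rows in {table} match {candidate}")
    return dict(zip((d[0] for d in cur.description), rows[0]))


def resolve_keys(conn, record):
    """Build {column: value} for one record, resolving pkobj references."""
    found = {}  # object name -> parent row located through its candidate key
    row = {}
    for child in record:
        if child.get("fkobj") is not None:
            # Nested element: its children are the candidate-key columns.
            candidate = {c.tag: c.text.strip() for c in child}
            name = child.get("fkobj")
            found[name] = lookup(conn, TABLES[name], candidate)
        elif child.get("pkobj") is not None:
            # Empty element: copy the parent's primary key into this row.
            key = child.get("key")
            row[key] = found[child.get("pkobj")][key]
        else:
            row[child.tag] = child.text.strip()
    return row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequencetype ("
             "sequence_type_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
# Made-up ids, for illustration only.
conn.executemany("INSERT INTO sequencetype VALUES (?, ?)",
                 [(7, "DNA"), (8, "RNA")])

row = resolve_keys(conn, ET.fromstring(RECORD))
print(row)
```

The user never writes a numeric key in the XML; the candidate key (the name) is enough, and the loader is responsible for failing when it matches zero or several rows.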

Example of a user designed XML for XMLGUS (Again)

  <gus>
  <dots_nasequence depth="0">
      <dots_sequencetype fkobj="dots::sequencetype" depth="1">
          <name>DNA</name>
      </dots_sequencetype>
      <sequencetypeid pkobj="dots::sequencetype" key="sequence_type_id"/>
      <sres_taxonname fkobj="sres::taxonname" depth="1">
          <name>Olimarabidopsis pumila</name>
      </sres_taxonname>
      <taxonid pkobj="sres::taxonname" key="taxon_id"/>
      <description>OPM18B21 Contig10</description>
      <sequence>
  ATCGGAGTCAGGCTGGAAGACAACTCCTCTGCGAAGTCGCGGTGAGTTTTAGT
  GCATCGATGAATTTACGGATGACAACACTGTTTGTACTCTCTAAAACAACCAG
  CCACCTAGCACAACAACTTTACCCCGAATATCTTATCACATATCTTTTAAAGT
      </sequence>
  </dots_nasequence>
  </gus>

Another XML Example: inserting rows into child tables

  <gus>
  <dots_nafeature depth="0">
      <dots_externalnasequence depth="1" fkobj="dots::genefeature">
          <name>Arabidopsis thaliana</name>
          <sres_externaldatabaserelease depth="2" fkobj="dots::externalnasequence">
              <sres_externaldatabase depth="3" fkobj="sres::externaldatabaserelease">
                  <lowercase_name>ncbi</lowercase_name>
              </sres_externaldatabase>
              <external_database_id pkobj="sres::externaldatabase" key="external_database_id"/>
              <version>NC_003070.5</version>
          </sres_externaldatabaserelease>
          <external_database_release_id pkobj="sres::externaldatabaserelease" key="external_database_release_id"/>
      </dots_externalnasequence>
      <na_sequence_id pkobj="dots::externalnasequence" key="na_sequence_id"/>
      <name>misc_feature</name>
      <dots_nalocation depth="1">
          <start_min>1</start_min>
          <end_max>444</end_max>
          <is_reversed>0</is_reversed>
      </dots_nalocation>
      <dots_nafeaturecomment depth="1">
          <comment_string>
          nucleotide sequence in this region was derived from BAC clone TEL1N.
          </comment_string>
      </dots_nafeaturecomment>
  </dots_nafeature>
  </gus>
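The child-table part of this example (the location and comment rows nested inside the feature) can be sketched as below. This is a rough Python illustration under several assumptions: elements with a depth attribute are treated as tables and all others as columns, each child row receives the parent's new primary key in a column named na_feature_id, and the three SQLite tables are simplified stand-ins for the GUS tables. A real loader would validate tag names against the schema before building SQL from them.

```python
import sqlite3
import xml.etree.ElementTree as ET

RECORD = """
<dots_nafeature depth="0">
  <name>misc_feature</name>
  <dots_nalocation depth="1">
    <start_min>1</start_min>
    <end_max>444</end_max>
    <is_reversed>0</is_reversed>
  </dots_nalocation>
  <dots_nafeaturecomment depth="1">
    <comment_string>
      nucleotide sequence in this region was derived from BAC clone TEL1N.
    </comment_string>
  </dots_nafeaturecomment>
</dots_nafeature>
"""

# Simplified, hypothetical schema: one parent table and two child tables.
SCHEMA = """
CREATE TABLE dots_nafeature (
    na_feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dots_nalocation (
    na_location_id INTEGER PRIMARY KEY,
    na_feature_id INTEGER REFERENCES dots_nafeature,
    start_min INTEGER, end_max INTEGER, is_reversed INTEGER);
CREATE TABLE dots_nafeaturecomment (
    na_feature_comment_id INTEGER PRIMARY KEY,
    na_feature_id INTEGER REFERENCES dots_nafeature,
    comment_string TEXT);
"""


def insert_row(conn, table, values):
    """Insert one row and return the primary key SQLite assigned to it."""
    columns = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cur = conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({marks})",
                       list(values.values()))
    return cur.lastrowid


def load_feature(conn, elem):
    """Insert the parent row first, then each child row pointing back at it."""
    columns = {c.tag: c.text.strip() for c in elem if c.get("depth") is None}
    feature_id = insert_row(conn, elem.tag, columns)
    for child in elem:
        if child.get("depth") is not None:
            values = {c.tag: c.text.strip() for c in child}
            values["na_feature_id"] = feature_id  # foreign key to the parent
            insert_row(conn, child.tag, values)
    return feature_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
feature_id = load_feature(conn, ET.fromstring(RECORD))
locations = conn.execute(
    "SELECT na_feature_id, start_min, end_max, is_reversed "
    "FROM dots_nalocation").fetchall()
print(feature_id, locations)
```

The ordering matters: the parent must be inserted before its children so that its generated key exists when the child rows are written.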

Another Example of Deriving Foreign Keys from Candidate Keys

DoTS::ExternalNASequence is a parent of
    SRes::ExternalDatabaseRelease is a parent of
        SRes::ExternalDatabase

  <dots_externalnasequence depth="1" fkobj="dots::genefeature">
      <name>Arabidopsis thaliana</name>
      <sres_externaldatabaserelease depth="2" fkobj="dots::externalnasequence">
          <sres_externaldatabase depth="3" fkobj="sres::externaldatabaserelease">
              <lowercase_name>ncbi</lowercase_name>
          </sres_externaldatabase>
          <external_database_id pkobj="sres::externaldatabase" key="external_database_id"/>
          <version>NC_003070.5</version>
      </sres_externaldatabaserelease>
      <external_database_release_id pkobj="sres::externaldatabaserelease" key="external_database_release_id"/>
  </dots_externalnasequence>
  <na_sequence_id pkobj="dots::externalnasequence" key="na_sequence_id"/>

Resolving Foreign Keys from Candidate Keys Once per File

  <gus>
  <sres_externaldatabaserelease depth="0" fkobj="dots::externalnasequence">
      <sres_externaldatabase depth="1" fkobj="sres::externaldatabaserelease">
          <lowercase_name>ncbi</lowercase_name>
      </sres_externaldatabase>
      <external_database_id pkobj="sres::externaldatabase" key="external_database_id"/>
      <version>NC_003070.5</version>
  </sres_externaldatabaserelease>
  <dots_externalnasequence depth="0" fkobj="dots::genefeature">
      <external_database_release_id pkobj="sres::externaldatabaserelease" key="external_database_release_id"/>
      <name>Arabidopsis thaliana</name>
  </dots_externalnasequence>
  <dots_nafeature depth="0">
      <na_sequence_id pkobj="dots::externalnasequence" key="na_sequence_id"/>
      <name>misc_feature</name>
      <dots_nalocation depth="1">
          <start_min>1</start_min>
          <end_max>444</end_max>
          <is_reversed>0</is_reversed>
      </dots_nalocation>
  </dots_nafeature>
  <dots_nafeature depth="0">
      [...]
  </dots_nafeature>
  <dots_nafeature depth="0">
      [...]
  </dots_nafeature>
  </gus>
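One way to picture the once-per-file layout: the shared parent is written as a top-level record a single time, and every later record refers back to it through pkobj. The Python sketch below assumes exactly that and nothing more. The object_name rule (first underscore becomes ::), the PRIMARY_KEY table, the fake_submit stub, and the ids starting at 100 are all hypothetical; they stand in for whatever the plugin and the database actually do.

```python
import itertools
import xml.etree.ElementTree as ET

FILE = """
<gus>
  <dots_externalnasequence depth="0">
    <name>Arabidopsis thaliana</name>
  </dots_externalnasequence>
  <dots_nafeature depth="0">
    <na_sequence_id pkobj="dots::externalnasequence" key="na_sequence_id"/>
    <name>misc_feature</name>
  </dots_nafeature>
  <dots_nafeature depth="0">
    <na_sequence_id pkobj="dots::externalnasequence" key="na_sequence_id"/>
    <name>misc_feature</name>
  </dots_nafeature>
</gus>
"""

# Hypothetical primary-key column for each object in this toy example.
PRIMARY_KEY = {
    "dots::externalnasequence": "na_sequence_id",
    "dots::nafeature": "na_feature_id",
}


def object_name(tag):
    """Assumed naming rule: 'dots_nafeature' -> 'dots::nafeature'."""
    schema, table = tag.split("_", 1)
    return f"{schema}::{table}"


def load_file(root, submit):
    """Walk top-level records in order, remembering each object's keys."""
    resolved = {}  # object name -> key columns of its most recent row
    rows = []
    for record in root:
        row = {}
        for child in record:
            pkobj = child.get("pkobj")
            if pkobj is not None:
                # Reuse the key resolved earlier in the file; no new lookup.
                row[child.get("key")] = resolved[pkobj][child.get("key")]
            else:
                row[child.tag] = child.text.strip()
        obj = object_name(record.tag)
        resolved[obj] = submit(obj, row)
        rows.append((obj, row))
    return rows


calls = []
ids = itertools.count(100)  # made-up key values


def fake_submit(obj, row):
    """Stand-in for the database: record the call, hand back a new key."""
    calls.append(obj)
    return {PRIMARY_KEY[obj]: next(ids)}


rows = load_file(ET.fromstring(FILE), fake_submit)
print(rows)
print(calls.count("dots::externalnasequence"))
```

In the nested form of the previous slides the parent would be resolved again inside every feature; here it is touched once, however many features follow.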

The XMLGUS Context Free Grammars (CFG)

• Written in YACC, compiled by Perl-byacc into Perl.
• Consist principally of variables and terminals associated with GUSXML elements (table names, table attribute names).
• Some pre-written XMLGUS grammars are available from the University of Chicago at http://flora.uchicago.edu/grammars.

Production/Rule for Table

  P1_DOTS_NASEQUENCE: dots_nasequence P1_DOTS_NASEQUENCE_SET dots_nasequence
  {
      GUS::Common::Plugin::XMLGUS::process_xml_rule(
          undef, undef,
          "DoTS::NASequence",
          $2->getNodeValue,
          $1->getAttribute("pkobj"),
          $1->getAttribute("fkobj"),
          $1->getAttribute("key"),
          $1->getAttribute("depth")
      );
  }
  ;

  P1_DOTS_NASEQUENCE_SET: P1_DOTS_NASEQUENCE_ATT |
          P1_DOTS_NASEQUENCE_SET P1_DOTS_NASEQUENCE_ATT;

Production/Rule for Table Attributes

  P1_DOTS_NASEQUENCE_ATT: P2_DOTS_NASEQUENCE_DESCRIPTION |
      P2_DOTS_NASEQUENCE_LENGTH |
      P2_DOTS_NASEQUENCE_SEQUENCE |
      P2_DOTS_NASEQUENCE_A_COUNT |
      P2_DOTS_NASEQUENCE_C_COUNT |
      P2_DOTS_NASEQUENCE_G_COUNT |
      P2_DOTS_NASEQUENCE_T_COUNT |
      P2_DOTS_NASEQUENCE_OTHER_COUNT |
      F1_DOTS_SEQUENCETYPE |
      P2_DOTS_NASEQUENCE_SEQUENCE_TYPE_ID |
      F2_SRES_TAXONNAME |
      P2_DOTS_NASEQUENCE_TAXON_ID |
      N1_DOTS_NASEQUENCEKEYWORD |
          N1_F3_DOTS_KEYWORD;

  P2_DOTS_NASEQUENCE_DESCRIPTION: description TEXT description
  {
      GUS::Common::Plugin::XMLGUS::process_xml_rule(
          undef, undef,
          "DoTS::NASequence::description",
          $2->getNodeValue,
          $1->getAttribute("pkobj"),
          $1->getAttribute("fkobj"),
          $1->getAttribute("key"),
          $1->getAttribute("depth")
      );
  }
  ;
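What these two productions accept can be restated outside YACC. The Python sketch below is not the generated parser; it only mirrors the shape of the rules: a table element must contain one or more children (the left-recursive SET rule), and each child must be one of the listed alternatives (the ATT rule). The ALLOWED set is an assumption drawn from the tag names in the earlier XML examples, not from the grammar file itself.

```python
import xml.etree.ElementTree as ET

# Assumed child tags for <dots_nasequence>, taken from the XML examples.
ALLOWED = {
    "dots_nasequence": {
        "description", "sequence",
        "dots_sequencetype", "sequencetypeid",
        "sres_taxonname", "taxonid",
    },
}


def check(elem, allowed=ALLOWED):
    """Return a list of syntax errors for one table element."""
    errors = []
    children = list(elem)
    if not children:
        # The SET rule has no empty alternative: at least one ATT is needed.
        errors.append(f"<{elem.tag}> needs at least one attribute element")
    for child in children:
        # The ATT rule is a fixed list of alternatives.
        if child.tag not in allowed.get(elem.tag, set()):
            errors.append(f"<{child.tag}> is not allowed in <{elem.tag}>")
    return errors


good = ET.fromstring(
    "<dots_nasequence><description>OPM18B21 Contig10</description>"
    "</dots_nasequence>")
bad = ET.fromstring(
    "<dots_nasequence><descripton>typo in tag</descripton></dots_nasequence>")
empty = ET.fromstring("<dots_nasequence/>")

print(check(good))
print(check(bad))
print(check(empty))
```

The grammar does the same job during parsing, so a misspelled column tag is rejected before any row reaches the database.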
