
Management of Quantified Semantic Taxonomies for Biothreat Response


  1. Management of Quantified Semantic Taxonomies for Biothreat Response
     Cliff Joslyn
     Computer and Computational Sciences, Los Alamos National Laboratory
     Modeling, Algorithms, and Informatics (CCS-3)
     DIMACS Tutorial and Working Group on Order-Theoretic Aspects of Epidemiology, March 2005
     Los Alamos Unlimited Release 04-8407, 05-0340, 05-0640, 05-0907, 05-1621

  2. OUTLINE
     • Knowledge integration for biothreat response
     • Bio-ontologies
     • Order theoretical representations and approaches: POSet Ontologies (POSOs)
     • Categorization and annotation problems
     • Quantified POSOs
     • Interoperability problem: towards a mathematical definition
     Cliff Joslyn, joslyn@lanl.gov dimacs05f, p. 1, 3/8/2005

  3. KNOWLEDGE INTEGRATION FOR BIOTHREAT RESPONSE
     • Rapid response to a novel biothreat
     • Past experiences: flu, resistant TB, SARS, Ebola, anthrax
     • Natural or engineered
     • Mucho funding: NIH, NSF, DHS, DOD, DARPA, DOE
     • New Los Alamos effort in computational and theoretical pathomics
     • Integration of knowledge bases within a biothreat response workflow
     [Figure: biothreat response workflow. Labels: Presentation, Alert, Diagnostic, Genomic/Proteomic, Agent Identification, Lethality, Virulence, Agent Characterization, Immunological, Pathogenesis, Pathways, Transmissibility, Disease Characterization, Containment, Therapeutic, Response, Attribution]
     KM Verspoor, CA Joslyn, JA Ambrosiano, A Bäcker, O Bodenreider, L Hirschman, P Karp, H Kelly, S Loranger, M Musen, R Sriram, C Wroe: (2005) “Knowledge Integration for Biothreat Response”, Los Alamos Technical Report 05-0907

  4. BIO-ONTOLOGIES
     • Domain-specific concepts and their semantic relations
     • At least: taxonomic, semantic hierarchies of typed objects and relations
     • In addition: inference engines over these data objects
     • Genomic revolution: large collections of hierarchically organized categorizations of biological objects such as genes and proteins
     • IT revolution generally: anatomy, clinical, epidemiological
     • Computational biology primary success story for ontology development
     • Rapid proliferation: many more, more coming, other fields
     Gene Ontology: http://www.geneontology.org
     Foundational Model of Anatomy: http://sig.biostr.washington.edu/projects/fm/AboutFM.html
     Unified Medical Language System: http://www.nlm.nih.gov/research/umls
     Open Biology Ontologies: http://obo.sourceforge.net
     Medical Subject Headings: http://www.nlm.nih.gov/mesh/meshhome.html
     Enzyme Structures Database: http://www.biochem.ucl.ac.uk/bsm/enzymes

  5. GENE ONTOLOGY (GO): DNA METABOLISM PORTION
     • Taxonomic controlled vocabulary
     • ∼16K nodes $P_{GO}$ populated by genes, proteins
     • Two orders on $P_{GO}$: $\leq_{isa}$, $\leq_{has}$
     • Major community effort: assuming primary position in general bioinformatics
     • Tremendous computational resource: large, semantically rich, validated, middle ontology, first (?) in major use
     Gene Ontology Consortium (2000): “Gene Ontology: Tool For the Unification of Biology”, Nature Genetics, 25:25-29

  6. GO CA. 2001
     [Figure: the Gene Ontology circa 2001.] Courtesy of Robert Kueffner, NCGR, 2001

  7. CATEGORIZATION TASK: “CLUSTER” GENES IN ONTOLOGY SPACE
     • Develop functional hypotheses about genes identified through expression experiments
     • Given the Gene Ontology (GO) . . .
     • And a list of hundreds of genes of interest . . .
     • “Splatter” them over the GO . . .
     • Where do they end up?
       – Concentrated?
       – Dispersed?
       – Clustered?
       – High or low?
       – Overlapping or distinct?
     Joslyn, Cliff; Mniszewski, Susan; Fulmer, Andy; and Heaton, Gary: (2004) “The Gene Ontology Categorizer”, Bioinformatics, v. 20:s1, pp. 169-177

  8. ANNOTATION TASK
     • Mappings among regions of sequence, structure, keyword spaces
     • Mappings into regions of biological function space: taxonomic bio-ontologies of molecular function
     • Characterize formal structure of bio-ontologies:
       – Order theoretical approaches
       – Combinatoric algorithms
     [Figure: mappings among four spaces: Sequences, Structures, Functions, Keywords/Literature]
     KM Verspoor, JD Cohn, SM Mniszewski, and CA Joslyn: (2004) “Nearest Neighbor Categorization for Function Prediction”, in: Proc. 5th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP 05), in press

  9. INTEROPERABILITY TASKS: MERGING AND MATCHING
     Matching: Measure similarity between two regions of a single ontology
     Comparing: Twist one ontology on a given term set into another ordering
     Merging: Given two completely distinct ontologies:
     • Identify structurally similar regions: intersection
     • Create encompassing meta-ontologies: product or union?
     [Figure: two example ontology fragments with annotated nodes, labeled GO and EC]

  10. ORDER THEORETICAL KNOWLEDGE DISCOVERY
     • Cast databases as (collections of) ordered data objects:
       Native: Constructed explicitly (e.g. ontologies)
       Induced: From other relational data (e.g. concept lattices)
     • With inherent semantics: node, link types; metadata; text
     • Equipped with measures:
       Combinatorial: Distance, rank
       Statistical: Various scores, entropy measures . . .
     • Tasks: Induction, navigation, visualization, link analysis, search, classification, retrieval, anomaly detection, merger, linkage
     • Motivated now by appearance of databases and methods
     • Substantial progress and value from novel applications of elementary concepts
     • Need help: algorithms, mathematics, applications, funding, concepts, organization?
     Joslyn, Cliff; Oliveira, Joseph; and Scherrer, Chad: (2004) “Order Theoretical Knowledge Discovery: A White Paper”, Los Alamos Technical Report 04-5812, ftp://ftp.c3.lanl.gov/pub/users/joslyn/white.pdf

  11. SEMANTIC HIERARCHIES AS PARTIALLY ORDERED SETS
     • Partial Order: Set $P$; relation $\leq \subseteq P^2$: reflexive, anti-symmetric, transitive
     • Poset: $\mathcal{P} = \langle P, \leq \rangle$
     • Simplest mathematical structures which admit to descriptions in terms of “levels” and “hierarchies”
     • More specific than graphs or networks: no cycles, equivalent to Directed Acyclic Graphs (DAGs)
     • More general than trees, lattices: single nodes, pairs of nodes can have multiple parents
     • Ubiquitous in knowledge systems: constructed, induced, empirical
     [Figure: hierarchy of structure classes: Directed Graph; Partial Order = Poset = DAG; Lattice; Tree; Antichain; Chain]
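
The three axioms on this slide can be checked mechanically for a small relation. The following is a minimal sketch, assuming the relation is given as an explicit set of pairs `(a, b)` meaning a ≤ b; the function name `is_partial_order` and the divisibility example are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Check whether `leq`, a set of (a, b) pairs meaning a <= b,
    is reflexive, anti-symmetric, and transitive on `elements`."""
    reflexive = all((a, a) in leq for a in elements)
    antisymmetric = all(a == b for (a, b) in leq if (b, a) in leq)
    transitive = all((a, d) in leq
                     for (a, b) in leq
                     for (c, d) in leq
                     if b == c)
    return reflexive and antisymmetric and transitive

# Hypothetical example: divisibility on {1, 2, 3, 6}.
# Node 6 has two parents' worth of lower neighbours (2 and 3),
# so this poset is not a tree.
elements = {1, 2, 3, 6}
divides = {(a, b) for a, b in product(elements, repeat=2) if b % a == 0}

# A relation with a 2-cycle violates anti-symmetry.
cyclic = {(1, 1), (2, 2), (1, 2), (2, 1)}
```

Here `is_partial_order(elements, divides)` holds, while the cyclic relation is rejected, which matches the "no cycles" remark on the slide.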

  12. BASIC POSET CONCEPTS
     Comparable Nodes: $a \sim b := a \leq b$ or $b \leq a$
     Chain: Collection of comparable nodes: $a_1 \leq a_2 \leq \ldots \leq a_n$
     Chains: $a \leq b \rightarrow \mathcal{C}(a, b) := \{ C_1(a, b), \ldots, C_j(a, b), \ldots, C_M(a, b) \} \subseteq 2^{2^P}$, and use $C_j$, $1 \leq j \leq M$.
     Height: Size of maximal chain: $H(P)$
     Noncomparable Nodes: $a \not\sim b$
     Antichain: Collection of noncomparable nodes: $a_1 \not\sim a_2 \not\sim \ldots \not\sim a_n$
     Width: Size of maximal antichain: $W(P)$
     Interval: $[a, b] := \{ c \in P : a \leq c \leq b \}$ is a bounded sub-poset of $P$
     [Figure: example poset with top node 1, interior nodes A through K, and annotations on some nodes]
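
To make height, width, and interval concrete, here is a small brute-force sketch. It assumes the order is given as a full set of pairs `(a, b)` meaning a ≤ b, and it searches all subsets, so it is only suitable for toy examples; the helper names and the divisibility poset are hypothetical and not from the slides.

```python
from itertools import combinations, product

def comparable(leq, a, b):
    """a ~ b: a <= b or b <= a."""
    return (a, b) in leq or (b, a) in leq

def largest_subset(elements, pair_ok):
    """Largest subset whose pairs all satisfy pair_ok (exhaustive search)."""
    elems = list(elements)
    for k in range(len(elems), 0, -1):
        for subset in combinations(elems, k):
            if all(pair_ok(a, b) for a, b in combinations(subset, 2)):
                return set(subset)
    return set()

def height(elements, leq):
    """H(P): size (node count) of a maximal chain."""
    return len(largest_subset(elements, lambda a, b: comparable(leq, a, b)))

def width(elements, leq):
    """W(P): size of a maximal antichain."""
    return len(largest_subset(elements, lambda a, b: not comparable(leq, a, b)))

def interval(elements, leq, a, b):
    """[a, b] := {c in P : a <= c <= b}."""
    return {c for c in elements if (a, c) in leq and (c, b) in leq}

# Hypothetical example: divisibility on {1, 2, 3, 6}.
elements = {1, 2, 3, 6}
divides = {(a, b) for a, b in product(elements, repeat=2) if b % a == 0}
```

In this example a longest chain is 1 ≤ 2 ≤ 6 and a largest antichain is {2, 3}. For posets of realistic size the exhaustive search would need to be replaced by a longest-path computation for height and a matching-based method for width.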

  13. SOME GO POSET STATISTICS

            Nodes    Leaves   Interior   Edges    H    W
     MF     7.0K     5.6K     1.3K       8.1K     13   ≥ 3.5K
     BP     7.7K     4.1K     3.6K       11.8K    15   ≥ 2.9K
     CC     1.3K     0.9K     0.4K       1.7K     13   ≥ 0.4K
     GO     16.0K    10.6K    5.4K       21.5K    16   ≥ 5.9K

     • GO for September, 2003
     • Model as $\mathcal{P}_{GO} = \langle P_{GO}, \leq_{isa} \cup \leq_{has} \rangle$

  14. DAGS, POSETS, AND COVERS
     Graphical DAG: $\Gamma := \{ \gamma_1, \gamma_2, \ldots, \gamma_i, \ldots, \gamma_n \}$
     Directed Edge: $\gamma_i = \langle a, b \rangle \in P^2$, $a, b \in P$. Also use $\gamma(a, b)$.
     Relational DAG: $D(\Gamma) := \langle P, \Leftarrow \rangle$, where $\Leftarrow \subseteq P^2$, $\forall a, b \in P, a \Leftarrow b \leftrightarrow \langle a, b \rangle \in \Gamma$.
     Cover: $V(D) := \langle P, \lessdot \rangle$, transitive reduction of $\Leftarrow$
     Poset: $\mathcal{P}(D) := \langle P, \leq \rangle$, transitive and reflexive closure of $\Leftarrow$.
     Ideal, Filter: $\downarrow(a) := \{ b \in P : b \leq a \}$, $\uparrow(a) := \{ b \in P : a \leq b \}$
     Children, Parents: $\dot{\downarrow}(a) := \{ b \in P : b \lessdot a \}$, $\dot{\uparrow}(a) := \{ b \in P : a \lessdot b \}$
     [Figure: example DAG with top node 1, bottom node 0, and interior nodes A through K]
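
The constructions on this slide can be sketched directly from an edge list. The code below is a minimal illustration, assuming each edge ⟨a, b⟩ is read as "a lies below b" (so that the closure gives a ≤ b); the node names, the edge set, and the function names are hypothetical and do not reproduce the slide's figure.

```python
def closure(nodes, edges):
    """Poset order: reflexive and transitive closure of the edge relation
    (Warshall's algorithm)."""
    leq = set(edges) | {(a, a) for a in nodes}
    for k in nodes:
        for i in nodes:
            if (i, k) in leq:
                for j in nodes:
                    if (k, j) in leq:
                        leq.add((i, j))
    return leq

def cover(nodes, leq):
    """Cover relation: a <. b iff a < b with no c strictly between them.
    For a DAG this is the transitive reduction."""
    lt = {(a, b) for (a, b) in leq if a != b}
    return {(a, b) for (a, b) in lt
            if not any((a, c) in lt and (c, b) in lt for c in nodes)}

def ideal(nodes, leq, a):
    return {b for b in nodes if (b, a) in leq}

def filter_(nodes, leq, a):
    return {b for b in nodes if (a, b) in leq}

def children(nodes, cov, a):
    return {b for b in nodes if (b, a) in cov}

def parents(nodes, cov, a):
    return {b for b in nodes if (a, b) in cov}

# Hypothetical DAG; the edge ("0", "C") is redundant and is dropped by cover().
nodes = ["0", "A", "B", "C", "1"]
edges = {("0", "A"), ("0", "B"), ("A", "C"), ("B", "C"), ("C", "1"), ("0", "C")}
leq = closure(nodes, edges)
cov = cover(nodes, leq)
```

With these definitions the ideal of `"C"` is everything at or below it, while its children are only the nodes it covers directly, which is the distinction the slide draws between ↓ and its dotted variant.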

  15. CHAIN DECOMPOSITION OF INTERVALS
     Assume $a \leq b \in P$
     Chain Decomposition: $[a, b] = \bigcup_{j=1}^{M} C_j$
     Dilworth: $M \geq W([a, b])$
     Chain Length: $h_j := |C_j| - 1$, $\bar{h}_j := h_j / (H - 1)$
     Vectors of Chain Lengths: $\vec{h}(a, b) := \langle h_1, h_2, \ldots, h_j, \ldots, h_M \rangle$, $\vec{\bar{h}}(a, b) := \vec{h} / (H - 1)$
     Extremes:
       $h_*(a, b) = \min_{h_j \in \vec{h}(a,b)} h_j$, $\bar{h}_*(a, b) = \min_{\bar{h}_j \in \vec{\bar{h}}(a,b)} \bar{h}_j$
       $h^*(a, b) = \max_{h_j \in \vec{h}(a,b)} h_j$, $\bar{h}^*(a, b) = \max_{\bar{h}_j \in \vec{\bar{h}}(a,b)} \bar{h}_j$
     Chains: $C_j = \{ \gamma(a, c_1), \ldots, \gamma(c_{h_j-3}, c_{h_j-2}), \gamma(c_{h_j-2}, b) \}$ for some collection of nodes $\{ c_1, c_2, \ldots, c_i, \ldots, c_{h_j-2} \} \subseteq P$, $1 \leq i \leq h_j - 2$.
     $C_j = a \lessdot c_1 \lessdot \ldots \lessdot c_{h_j-3} \lessdot c_{h_j-2} \lessdot b$, $\gamma_i \in C_j$, $1 \leq i \leq h_j$
     [Figure: example DAG with top node 1, bottom node 0, and interior nodes A through K]
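
As an illustration of the quantities on this slide, the sketch below enumerates the chains between two nodes by walking cover edges upward, then computes the chain lengths, their normalized values, and the extremes. It assumes each chain is represented as a list of nodes, so that the length is the node count minus one, and that H is the node count of a longest chain in the whole poset; the cover relation and all names are hypothetical.

```python
from functools import lru_cache

# Hypothetical cover relation: (x, y) means x <. y.
COVER = {("0", "A"), ("0", "B"), ("A", "C"), ("B", "C"),
         ("C", "1"), ("0", "D"), ("D", "1")}
NODES = {n for edge in COVER for n in edge}
# Parents of each node, sorted so the enumeration order is deterministic.
UP = {n: sorted(y for (x, y) in COVER if x == n) for n in NODES}

def maximal_chains(a, b):
    """All chains a <. c_1 <. ... <. b as node lists, following cover edges."""
    if a == b:
        return [[b]]
    return [[a] + rest for up in UP[a] for rest in maximal_chains(up, b)]

@lru_cache(maxsize=None)
def longest_from(n):
    """Node count of a longest chain starting at n and going upward."""
    return 1 + max((longest_from(p) for p in UP[n]), default=0)

H = max(longest_from(n) for n in NODES)      # height of the whole poset

chains = maximal_chains("0", "1")            # C_1, ..., C_M
h = [len(c) - 1 for c in chains]             # h_j := |C_j| - 1
h_bar = [hj / (H - 1) for hj in h]           # normalized chain lengths
h_min, h_max = min(h), max(h)                # extremes
covered = set().union(*chains)               # union of the chains
```

In this toy example there are three chains between `"0"` and `"1"`, two of length 3 and one of length 2, and their union recovers every node of the interval, as the decomposition on the slide requires.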
