The NIEHS Data Commons




  1. The NIEHS Data Commons
     Deep Patel, Mike Conway
     Office of Data Science, National Institute of Environmental Health Sciences
     National Institutes of Health • U.S. Department of Health and Human Services

  2. The NIEHS Office of Data Science: Who are we?
     “The mission of the Office of Data Science is to accelerate scientific discovery, foster collaborative research, and ultimately improve public health through the application of scientific data and knowledge management in the environmental health sciences.”

  3. Commons objectives: Develop a standards-based commons
     • Beginning with internal researchers, managing data originating from core laboratories, including next-gen sequencing data
     • Define organizational policies to handle data life-cycle
     • Track provenance and relationship of data sets to source data and analysis

  4. Commons objectives: Manage metadata for discoverability and long-term usability
     • FAIR Data
     • Develop standard metadata, including controlled vocabularies and ontologies
     • Automatic metadata from instruments, pipelines, and computer-actionable policies
     • Support multiple indexes and search technologies for data discovery and re-use
     • Allow publication to reference collections, such as NCBI GEO
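The "standard metadata, including controlled vocabularies" objective can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `MetadataTemplate` class, the field names, and the vocabulary terms are hypothetical and do not come from the NIEHS Commons itself. The idea is only that a template names the required fields and restricts some of them to an agreed list of terms, so that records can be checked before they are indexed.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataTemplate:
    """Hypothetical template: required fields plus controlled vocabularies."""
    required: set[str]
    vocabularies: dict[str, set[str]] = field(default_factory=dict)

    def validate(self, record: dict[str, str]) -> list[str]:
        """Return a list of readable problems; an empty list means valid."""
        problems = []
        # Report every required field that the record does not supply.
        for name in sorted(self.required - set(record)):
            problems.append(f"missing required field: {name}")
        # Report values that fall outside a field's controlled vocabulary.
        for name, allowed in self.vocabularies.items():
            value = record.get(name)
            if value is not None and value not in allowed:
                problems.append(
                    f"{name}={value!r} is not in the controlled vocabulary"
                )
        return problems


# Illustrative template; field names and terms are invented for this sketch.
sequencing_template = MetadataTemplate(
    required={"project", "instrument", "assay_type"},
    vocabularies={"assay_type": {"RNA-Seq", "ChIP-Seq", "WGS"}},
)

good = {"project": "P001", "instrument": "seq-01", "assay_type": "RNA-Seq"}
bad = {"project": "P001", "assay_type": "rnaseq"}

good_problems = sequencing_template.validate(good)
bad_problems = sequencing_template.validate(bad)
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than raising on the first one lets an ingest step report everything wrong with a record at once, which tends to suit metadata that is filled in automatically by instruments or pipelines.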

  5. Commons objectives: Support integration and use of data in computation and analysis
     • Ease discovery and access through common tools and platforms
     • Securely share data with collaborators
     • Allow audit and enforcement of access and data usage agreements
     • Track provenance and authenticity
     • Ensure reproducibility

  6. The NIEHS Data Commons
     [Diagram. Labels as extracted: NIH Data Commons; Data/Tools Enrollment; Secure Collaboration, Data/Tools Analysis, and Discovery; Workflow Execution; Data Commons APIs]

  7. Data Commons serving a full data life-cycle
     Current ‘commons’ efforts (e.g. NIH Commons) focus on the mature part of the research data lifecycle and say less about where the data comes from!
     NIEHS Concerns:
     • Metadata quality
     • Delivery to PI
     • Appropriate sharing within project
     • Retention, compliance
     • Ingest pipelines
     NIH Concerns:
     • FAIR
     • Publishing
     • Data sharing/licensing
     • Discoverability
     • Analytics and derived data
     Moore, Reagan W., et al. "White Paper: National Data Infrastructure for Earth System Science."

  8. Commons ‘Patterns’
     • Let’s look at the NIEHS Commons and see where patterns come into play.
       – How do we as a community develop frameworks around iRODS capabilities and the philosophy of policy-based data management that ease development?
       – How do we develop a pattern language and architectural discipline and talk with each other about systems that support FAIR and Big Data?
       – The Consortium is already developing a pattern catalog, and this is a Good Thing.

  9. Extracting Patterns…
     • Shout out to the Consortium folks, this may be the ‘next thing’.
     • How would a good catalog of patterns translate into frameworks and capabilities in iRODS?
     Patterns from https://irods.org/documentation/

  10. Core Labs Ingest and Pipelines
      [Diagram. Labels recovered from the slide: Clarity or other LIMS; Instruments; NextGen Sequencing; Landing Zone; Tiering; NAS Storage Resource; File Scanner; Data-to-compute; Compute-to-data; CWL, NextFlow; Procedure/Provenance metadata; Synch, tiering, staging, replication; DDN; commonsProdZone]

  11. Metadata Support
      [Diagram. Labels recovered from the slide: *Virtual Collections; ????; NIEHS Central Ontology/CV Service; *Metadata Templates; Instruments; Vocabs; Index/Search Platforms; *Indexing Framework; commonsProdZone]

  12. New Challenges
      • Managing immutable archives (e.g. BDBag) and persistent identifiers
      • Managing federated authn/authz
      • Integrating the Data Commons into the workflows and daily routines of researchers in non-disruptive ways that make their work easier, not more difficult
      • Keeping the focus on science, not cyberinfrastructure
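One way to picture what "immutable archive" means in practice is a fixity manifest: a checksum recorded for every file at archive time and re-checked later. The sketch below is a simplified illustration using only the Python standard library, in the spirit of the checksum manifests used by bag-style packaging formats. It is not the BDBag tool or its API, and the function names (`sha256_of`, `build_manifest`, `verify`) and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return paths that are missing, changed, or unexpectedly added."""
    current = build_manifest(root)
    return sorted(
        path
        for path in manifest.keys() | current.keys()
        if manifest.get(path) != current.get(path)
    )


# Demonstration in a throwaway directory with an invented sample file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reads.fastq").write_text("@r1\nACGT\n+\nIIII\n")
    manifest = build_manifest(root)
    unchanged = verify(root, manifest)      # nothing differs yet
    (root / "reads.fastq").write_text("@r1\nACGA\n+\nIIII\n")
    after_edit = verify(root, manifest)     # the edited file is reported

print(unchanged)
print(after_edit)
```

Pairing a manifest like this with a persistent identifier for the archive as a whole gives a citation something stable to point at, since any later change to the contents becomes detectable.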

  13. Big Data is Big Preservation
      • Let’s not forget our roots, and how this applies now more than ever.
      • OAIS and related concepts, including trusted digital preservation, provide lots of useful language and a good conceptual framework to add to the ‘cloud’, ‘FAIR’, and NSF ‘Cyberinfrastructure for the 21st Century’ frame.
      • FAIR does not matter if the data turns out to be lost!

  14. Acknowledgements
      NIEHS: Beth Bowden, John Bucher, Allen Dearry, Leesa Deterding, Michael Devito, Christopher Duncan, Matthew Edin, Thomas Van'T Erve, John Grovenstein, Guang Hu, Mary Jacobson, Jeffrey Kuhn, Beth Lauderdale, Jian-Liang Li, Alex Merrick, Geoffrey Mueller, Suzanne Osborne, Scott Redman, Andy Shapiro, Troy Simpson, Chris Stone, Cheryl Thompson, Paul Wade, Deborah Wales, Jason Williams, Rick Woychik
      Renaissance Computing Institute (RENCI)
      iRODS Consortium
