CONSIDERATIONS FOR CANCER RESEARCH DATA IN THE CLOUD
Christopher R. Kinsinger, National Cancer Institute
Outline
▪ Introduction to Cancer Research Data Commons
▪ Considerations for shifting the proteomics research community to the cloud
Clark, DJ et al. Cell, 2019, 179, 964-983
Integrating across data types
The Beau Biden Cancer Moonshot℠
Overarching goals
- Accelerate progress in cancer, including prevention & screening
- From cutting edge basic research to wider uptake of standard of care
- Encourage greater cooperation and collaboration
- Within and between academia, government, and private sector
- Enhance data sharing
Blue Ribbon Panel – October, 2016
Recommendations include:
- Build a National Cancer Data Ecosystem
- Enhanced cloud-computing platforms
- Services that link disparate information, including clinical, image, and molecular data
- Essential underlying data science infrastructure, standards, methods, and portals for the Cancer Data Ecosystem
Data Sources:
- The Cancer Imaging Archive (TCIA)*
- Clinical Proteomic Tumor Analysis Consortium*
- Canine Immuno-oncology studies
Components:
- Data Nodes
- Data Commons Framework
- Data Aggregators
- Cloud Resources
- APIs
- Elastic compute resources
- Portals
- Workspaces
- Analytic Tools
- Tool repositories
Data Scientists
Fence: Centralized Authentication & Authorization
IndexD: Centralized Indexing
- Reusable, expandable framework for a Data Commons
- Core principles and structures for a Data Commons
- Set of modular components that can be leveraged across Data Commons
- NCI is developing the Framework and will use it to stand up several example Data Commons that the community can leverage or use as a model to build their own commons.
https://dcf.gen3.org/
Data Commons Framework powered by Gen3 (U of Chicago)
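To make the idea of centralized indexing concrete, here is a minimal sketch in Python. It is not the Gen3 or IndexD API: the class names, fields, and bucket URLs below are hypothetical, chosen only to illustrate how a single index can map a persistent identifier to the storage locations of a data file, so that tools refer to the identifier rather than to a particular copy.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexRecord:
    """One indexed data object (hypothetical schema, for illustration only)."""
    guid: str
    size: int
    md5: str
    urls: List[str] = field(default_factory=list)


class ToyIndex:
    """In-memory stand-in for a centralized indexing service."""

    def __init__(self) -> None:
        self._records: Dict[str, IndexRecord] = {}

    def register(self, size: int, md5: str, urls: List[str]) -> str:
        """Create a record and return its newly minted identifier."""
        guid = str(uuid.uuid4())
        self._records[guid] = IndexRecord(guid, size, md5, list(urls))
        return guid

    def add_location(self, guid: str, url: str) -> None:
        """Attach another storage location to an existing record."""
        self._records[guid].urls.append(url)

    def resolve(self, guid: str) -> List[str]:
        """Return a copy of all known locations for an identifier."""
        return list(self._records[guid].urls)


if __name__ == "__main__":
    index = ToyIndex()
    # Hypothetical file: the checksum and bucket names are made up.
    guid = index.register(
        size=1024,
        md5="0123456789abcdef0123456789abcdef",
        urls=["s3://example-bucket/sample.raw"],
    )
    index.add_location(guid, "gs://example-bucket/sample.raw")
    print(guid, index.resolve(guid))
```

In this sketch, an analysis pipeline would store only the identifier; the same file can then be moved or mirrored across clouds without changing any downstream reference.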
The Cloud Resources provide:
- Access to large cancer data sets without need to download
- Access to workspaces, analysis tools, and pipelines
- Ability for researchers to bring their own data and tools
http://firecloud.terra.bio/#
http://cgc.sbgenomics.com
http://isb-cgc.org
NCI Cloud Resources
- Launched in 2016 with over 4 PB of data.
- Joint project with OICR.
- Used by 1,000 to 2,000+ users per day.
- Based upon an open source software stack that can be used to build other data commons.
*See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
NCI Genomic Data Commons (GDC)
PDC - https://pdc.esacinc.com/pdc/pdc
▪ Research has been focused on data production
▪ Standard formats
▪ Data shared through a small number of interconnected repositories
▪ Varied analyses
▪ Difficult to reproduce an analysis
▪ Stable research community
▪ Slowly permeating biomedical research space
State of proteomics data
▪ Data shared through downloading
▪ Analyses performed on servers at local institutions
▪ Popular workflows not yet available in cloud
▪ CLOUD Act (outside US)
▪ Disparity of felt cost
Cultural challenges
Carrots and sticks
Carrots
- Supported analysis pipelines
- Reproducible analyses
- Cost-effective data solution
- Data security
- Easy collaboration
- Recognition from data science community
Potential Sticks
- Reduce NIH support for local computation
- Journals require data deposition in cloud
- Datasets not on cloud become irrelevant
NCI
- Tony Kerlavage
- Vivian Ota Wang
- Juli Klemm
- Tanja Davidsen
- Craig Hayn
- Ian Fore
- Elizabeth Hsu
- Sherri De Coronado
- Sima Pandya
- Todd Pihl
- Jaime Guidry Auvil
- Freddie Pruitt
- Zhining Wang
- Eve Shalley
- John Otridge
- Allen Dearry
- Kanakadurga Addepalli
- Erika Kim
NCI
- Lyubov Remennik
- David Patton
- Nina Ghanem
- Matthew Byers
- Melissa Cook
- Mark Jensen
- Eric Scott
- Dale Lamb
- Keyvan Farahani
- Sylvia Gale
- Johanna Goderre Jones
- Cathy Rowe
- Smita Hastak
- Denise Warzel
- Anna Mencarelli
- Barbara Vann
- Resham Kulkarni
NCI
- Henry Rodriguez
- Emily Boja
- Tara Hiltke
- Mehdi Mesri
- Ana Robles
- Anna Roberts-Pilgrim
- Annette Marrero
- Dawn Hayward
PDC
- Anand Basu
- Ratna Thangudu
- Michael Holck
- Michael MacCoss
- Paul Rudnick