Capacity building in the cloud for Data Intensive Cancer Genomics
Bruce Press, Ci4CC Fall meeting October 1, 2018
The rate of data generation is accelerating rapidly.
Doubling time of medical knowledge: 50 years (1950) vs. a projected 73 days (2020).
Densen, P. Trans Am Clin Climatol Assoc 2011
Scalable & Secure Environments | Data Harmonization & Organization | Data Sharing & Collaboration | Data Analysis Fluency (spanning Technology to Social)
purchase and install hardware.
researchers at institutions without large compute infrastructure investments can access powerful data and compute resources.
reduces need for backup copies.
researchers can access data without needing to physically copy it.
Old model: send data to compute. New model: send compute to data.
security/compliance ‘out of the box’.
methods, across multiple underlying cloud infrastructures.
not managing computational resources.
and the Cancer Research Data Commons) paved the way for secure access and analysis of high value datasets in the cloud.
mechanisms enable approved researchers to access Controlled data initially from TCGA and TARGET and now an expanding set of data resources.
reducing replication of data and speed research by avoiding download times.
https://cbiit.cancer.gov/ncip/cancer-research-data-commons
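To give a feel for the download times that in-place cloud analysis avoids, here is a rough back-of-the-envelope sketch. The dataset size and link speed are illustrative assumptions, not figures from the talk, and the function name `transfer_days` is hypothetical.

```python
def transfer_days(size_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to copy a dataset over a network link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits per second).
    `efficiency` is the assumed fraction of nominal bandwidth actually achieved.
    """
    if size_terabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    bits = size_terabytes * 1e12 * 8                  # dataset size in bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # time at the effective rate
    return seconds / 86_400                           # seconds per day


if __name__ == "__main__":
    # Assumed example: a 1,000 TB (1 PB) collection over a dedicated 1 Gbps link.
    print(f"{transfer_days(1000, 1):.1f} days")
```

Under these assumed numbers the copy alone takes on the order of three months, before any analysis starts. Sending compute to the data removes that step.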
Gil Press, Forbes 2016
advanced search allows finding data of interest from enormous repositories.
properties most interesting for a particular research question tend to be unique
highly important for birth defect research but not a typical variable for adult cancer research.
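One way to picture harmonization that still respects study-specific variables is to map shared fields onto a common core schema and carry everything else along untouched. The sketch below is only an illustration: the field names, the alias table, and the example variable are hypothetical and are not taken from the talk or from any particular data dictionary.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from study-specific field names to a shared core schema.
CORE_FIELD_ALIASES: Dict[str, str] = {
    "sex": "gender",
    "gender": "gender",
    "age_at_dx": "age_at_diagnosis",
    "age_at_diagnosis": "age_at_diagnosis",
    "primary_dx": "diagnosis",
    "diagnosis": "diagnosis",
}


def harmonize(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a record into (core fields under shared names, study-specific extras)."""
    core: Dict[str, Any] = {}
    extras: Dict[str, Any] = {}
    for key, value in record.items():
        normalized = key.strip().lower()
        if normalized in CORE_FIELD_ALIASES:
            core[CORE_FIELD_ALIASES[normalized]] = value
        else:
            extras[normalized] = value  # kept rather than discarded
    return core, extras


if __name__ == "__main__":
    # "gestational_age_weeks" is an illustrative study-specific field chosen for this sketch.
    core, extras = harmonize({"Sex": "female", "age_at_dx": 4, "gestational_age_weeks": 38})
    print(core)
    print(extras)
```

Keeping the extras alongside the core fields means cross-study search can use the shared names while the variables unique to one research question remain available.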
Common Workflow Language and packaging tools in Docker containers, the exact routine used for large harmonization efforts can be applied to novel data.
WES allows the same analysis to be performed on multiple platforms.
workflow run on GTEx files.
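The portability idea above can be sketched as follows, assuming "WES" here refers to the GA4GH Workflow Execution Service API. The sketch describes a tool in CWL with a Docker image, then prepares the same run request for several platforms. The tool, the image, and the platform URLs are placeholders, and nothing is sent over the network. Real deployments also need authentication and may expose a different base path.

```python
import json
from typing import Any, Dict, List

# A minimal CWL CommandLineTool written as a Python dict (CWL documents may be JSON).
# The command and Docker image are placeholders, not the harmonization workflow itself.
CWL_TOOL: Dict[str, Any] = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["wc", "-l"],
    "hints": {"DockerRequirement": {"dockerPull": "ubuntu:20.04"}},
    "inputs": {"input_file": {"type": "File", "inputBinding": {"position": 1}}},
    "outputs": {"line_count": {"type": "stdout"}},
    "stdout": "line_count.txt",
}


def build_wes_run_request(workflow_url: str, params: Dict[str, Any]) -> Dict[str, str]:
    """Form fields for a WES 'run workflow' request (sent as multipart/form-data)."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": "CWL",
        "workflow_type_version": "v1.0",
        "workflow_params": json.dumps(params),  # WES expects the parameters as a JSON string
    }


def plan_submissions(base_urls: List[str], workflow_url: str,
                     params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Prepare the identical request for each platform without sending it."""
    form = build_wes_run_request(workflow_url, params)
    return [
        {"method": "POST", "url": f"{base.rstrip('/')}/ga4gh/wes/v1/runs", "form": form}
        for base in base_urls
    ]


if __name__ == "__main__":
    # Hypothetical platform endpoints and workflow location.
    plans = plan_submissions(
        ["https://wes.platform-a.example", "https://wes.platform-b.example/"],
        "https://example.org/workflows/line_count.cwl",
        {"input_file": {"class": "File", "path": "sample.txt"}},
    )
    for plan in plans:
        print(plan["method"], plan["url"])
    print(json.dumps(CWL_TOOL, indent=2))
```

Only the base URL differs between platforms. The workflow description and its parameters stay the same, which is what makes the analysis repeatable across infrastructures.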
to facilitate reproducibility and extension
levels of data and analysis access.
researchers to discuss analyses and results in situ.
facilitate data sharing since there’s no additional cost for more researchers to access and analyze data.
broadly available without embargo while ensuring compliance with patient consents - CHOP has led this charge via the CAVATICA platform.
data, methods, and results in a Findable, Accessible, Interoperable and Reusable (FAIR data principles) way.
provides powerful teaching resource.
training sessions are important to build expertise across individuals with diverse backgrounds.
allow automation and optimization by advanced users while visual interfaces support a broad user base.
Work presented was funded in whole or in part by: HHSN261201400008C, HHSN261200800001E, U2C HL138346-01, OT3 HL142478, OT3 OD02546 and U24CA224067