CONSIDERATIONS FOR CANCER RESEARCH DATA IN THE CLOUD
Christopher R. Kinsinger, National Cancer Institute

Outline: Introduction to Cancer Research Data Commons; Considerations for shifting the proteomics research community to the cloud


SLIDE 1

Christopher R. Kinsinger, National Cancer Institute

CONSIDERATIONS FOR CANCER RESEARCH DATA IN THE CLOUD

SLIDE 2

Outline

▪ Introduction to Cancer Research Data Commons
▪ Considerations for shifting the proteomics research community to the cloud

SLIDE 3

Integrating across data types

Clark, DJ et al. Cell, 2019, 179, 964-983

SLIDE 4


The Beau Biden Cancer Moonshot℠

Overarching goals

  • Accelerate progress in cancer, including prevention & screening
  • From cutting-edge basic research to wider uptake of standard of care
  • Encourage greater cooperation and collaboration
  • Within and between academia, government, and private sector
  • Enhance data sharing

Blue Ribbon Panel – October, 2016

Recommendations include:

  • Build a National Cancer Data Ecosystem
  • Enhanced cloud-computing platforms
  • Services that link disparate information, including clinical, image, and molecular data
  • Essential underlying data science infrastructure, standards, methods, and portals for the Cancer Data Ecosystem


SLIDE 5

Data Sources:

  • The Cancer Imaging Archive* (TCIA)
  • Clinical Proteomic Tumor Analysis Consortium*
  • Canine Immuno-oncology studies

Components:

  • Data Nodes
  • Data Commons Framework
  • Data Aggregators
  • Cloud Resources
  • APIs
  • Elastic compute resources
  • Portals
  • Workspaces
  • Analytic Tools
  • Tool repositories

Data Scientists

SLIDE 6

Data Commons Framework powered by Gen3 (U of Chicago)

  • Reusable, expandable framework for a Data Commons
  • Core principles and structures for a Data Commons
  • Set of modular components that can be leveraged across Data Commons
  • NCI is developing the Framework and will use it to stand up several example Data Commons that the community can leverage or use as a model to build their own commons.

  • Fence: Centralized Authentication & Authorization
  • IndexD: Centralized Indexing

https://dcf.gen3.org/

SLIDE 7


NCI Cloud Resources

The Cloud Resources provide:

  • Access to large cancer data sets without need to download
  • Access to workspaces, analysis tools, and pipelines
  • Ability for researchers to bring their own data and tools

http://firecloud.terra.bio/#
http://cgc.sbgenomics.com
http://isb-cgc.org

SLIDE 8

NCI Genomic Data Commons (GDC)

  • Launched in 2016 with over 4 PB of data.
  • Joint project with OICR.
  • Used by 1000-2000+ users per day.
  • Based upon an open source software stack that can be used to build other data commons.

*See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.

SLIDE 9

PDC - https://pdc.esacinc.com/pdc/pdc

SLIDE 10

State of proteomics data

▪ Research has been focused on data production
▪ Standard formats
▪ Data shared through a small number of interconnected repositories
▪ Varied analyses
▪ Difficult to reproduce an analysis
▪ Stable research community
▪ Slowly permeating biomedical research space

SLIDE 11

Cultural challenges

▪ Data shared through downloading
▪ Analyses performed on servers at local institutions
▪ Popular workflows not yet available in cloud
▪ CLOUD Act (outside US)
▪ Disparity of felt cost

SLIDE 12

Carrots and sticks

Carrots

  • Supported analysis pipelines
  • Reproducible analyses
  • Cost-effective data solution
  • Data security
  • Easy collaboration
  • Recognition from data science community

Potential Sticks

  • Reduce NIH support for local computation
  • Journals require data deposition in cloud
  • Datasets not on cloud become irrelevant

SLIDE 13

Acknowledgments

NCI

  • Tony Kerlavage
  • Vivian Ota Wang
  • Juli Klemm
  • Tanja Davidsen
  • Craig Hayn
  • Ian Fore
  • Elizabeth Hsu
  • Sherri De Coronado
  • Sima Pandya
  • Todd Pihl
  • Jaime Guidry Auvil
  • Freddie Pruitt
  • Zhining Wang
  • Eve Shalley
  • John Otridge
  • Allen Dearry
  • Kanakadurga Addepalli
  • Erika Kim

NCI

  • Lyubov Remennik
  • David Patton
  • Nina Ghanem
  • Matthew Byers
  • Melissa Cook
  • Mark Jensen
  • Eric Scott
  • Dale Lamb
  • Melissa Cook
  • Keyvan Farahani
  • Sylvia Gale
  • Johanna Goderre Jones
  • Cathy Rowe
  • Smita Hastak
  • Denise Warzel
  • Anna Mencarelli
  • Barbara Vann
  • Resham Kulkarni

NCI

  • Henry Rodriguez
  • Emily Boja
  • Tara Hiltke
  • Mehdi Mesri
  • Ana Robles
  • Anna Roberts-Pilgrim
  • Annette Marrero
  • Dawn Hayward

PDC

  • Anand Basu
  • Ratna Thangudu
  • Michael Holck
  • Michael MacCoss
  • Paul Rudnick