CanDIG: Distributed national analyses of locally-controlled genomic data



SLIDE 1

genomicsandhealth.org

CanDIG

Distributed national analyses of locally-controlled genomic data
http://distributedgenomics.ca

SLIDE 2

Canadian Distributed Infrastructure for Genomics (CanDIG)

New (start date: this spring) 4-year funded Canadian project to enable batch and interactive analysis over national cohorts with provincially controlled private genomic data: send analyses to data.

SLIDE 3

CanDIG:

  • Over coming months:
      • Support paediatric cancer project (PROFYLE)
      • Provide data directory, dashboard, coordinate processing
      • Expand to directly supporting analyses
      • Support for basket-type cancer clinical trial project (CaMPACT)
      • Distributed data platform
      • Support clinician decision-making by interfacing with cBioPortal
  • By year 4:
      • Large-scale data directory
      • Analysis interface to large amount of research & clinical genomics data
      • “App store” of available analyses - interactive and batch
      • Privacy layer
      • Programmatic access for development of new distributed analysis methods

SLIDE 4

Platform Goals - Fully Distributed:

  • Participating sites: provide access to data, source of user requests
  • Distributed synchronization of apps available, project membership, etc.
  • Sites authenticate their users
  • Local sites control access to their data

SLIDE 5

Platform Goals - API access:

  • Want all data access to be through APIs: logging, auditability; no processes dropped in a directory of files.
  • Maybe no files: opaque back-end to different data stores (files, variant databases, etc.)
  • WES (Cloud) and Reads/Variants servers communicating internally via htsget (Large-Scale Genomics)
  • Metadata/clinical data standards (Clinical & Pheno Data Capture)
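To make the htsget bullet concrete, here is a minimal sketch of how one service might ask another for a slice of reads. It assumes the standard htsget request shape (a `reads` or `variants` endpoint with `format`, `referenceName`, `start`, and `end` query parameters) and the JSON "ticket" response that lists block URLs. The host name, record ID, and sample ticket are hypothetical, and nothing here is taken from the CanDIG code base.

```python
import json
from typing import List
from urllib.parse import urlencode


def htsget_request_url(base_url: str, kind: str, record_id: str,
                       reference_name: str, start: int, end: int,
                       fmt: str) -> str:
    """Build an htsget request URL for a genomic region.

    In htsget, `start` is 0-based inclusive and `end` is exclusive.
    """
    if kind not in ("reads", "variants"):
        raise ValueError("kind must be 'reads' or 'variants'")
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/{kind}/{record_id}?{query}"


def block_urls(ticket_json: str) -> List[str]:
    """Extract the data-block URLs from an htsget ticket, in order."""
    ticket = json.loads(ticket_json)["htsget"]
    return [part["url"] for part in ticket["urls"]]


# Hypothetical server and record, for illustration only.
url = htsget_request_url("https://site.example/", "reads", "sample1",
                         "chr1", 1000, 2000, "BAM")

# Hypothetical ticket of the shape an htsget server returns.
sample_ticket = json.dumps({
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "https://site.example/blocks/1", "class": "header"},
            {"url": "https://site.example/blocks/2", "class": "body"},
        ],
    }
})
urls = block_urls(sample_ticket)
```

The client fetches the listed blocks in order and concatenates them to obtain the requested slice, so the back-end storage stays opaque to the caller.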

[Figure: diagram with components labelled Variants and Workflows]
SLIDE 6

Platform Goals - AAI:

  • Authentication: Federated OpenID Connect
  • Local site authorizes based on remote ID and distributed role information
  • Verified tokens used internally amongst services
  • Build with eye towards future interoperability with DURI
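The split between remote authentication and local authorization could be sketched as follows. This is an illustrative assumption, not CanDIG's implementation: the issuer URLs, users, roles, dataset names, and the `authorize` function are all hypothetical, and the sketch assumes the OpenID Connect token has already been verified before the claims reach this code.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class VerifiedIdentity:
    """Claims taken from an ID token whose signature was already verified."""
    issuer: str   # the site that authenticated the user
    subject: str  # the user's identifier at that site


# Hypothetical: identity providers of participating sites that this site trusts.
TRUSTED_ISSUERS: Set[str] = {
    "https://idp.site-a.example",
    "https://idp.site-b.example",
}

# Hypothetical: project roles, synchronized between sites.
DISTRIBUTED_ROLES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("https://idp.site-a.example", "alice"): {"profyle": "researcher"},
    ("https://idp.site-b.example", "bob"): {"campact": "clinician"},
}

# Hypothetical: policy owned by the local site for its own datasets.
LOCAL_POLICY: Dict[str, Dict[str, object]] = {
    "profyle_variants": {"project": "profyle", "allowed_roles": {"researcher"}},
}


def authorize(identity: VerifiedIdentity, dataset: str) -> bool:
    """Decide locally whether a remotely authenticated user may read a dataset."""
    if identity.issuer not in TRUSTED_ISSUERS:
        return False
    policy = LOCAL_POLICY.get(dataset)
    if policy is None:
        return False  # unknown datasets are denied by default
    roles = DISTRIBUTED_ROLES.get((identity.issuer, identity.subject), {})
    return roles.get(policy["project"]) in policy["allowed_roles"]


alice = VerifiedIdentity("https://idp.site-a.example", "alice")
bob = VerifiedIdentity("https://idp.site-b.example", "bob")
```

The remote site vouches only for who the user is; the final decision stays with the site that holds the data.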

SLIDE 7

Work so far - interactive analysis

  • Less obvious it would work nicely in our federated context
  • E.g., re-creating some classic Thousand Genomes figures across federated datasets - small regions for interactivity

SLIDE 11

Work so far - interactive analysis

  • Needed to greatly enhance R & V server performance
  • Serialization
  • “Column-oriented” approach to (e.g.) FORMAT fields
  • Contributed back
  • J. Foong, HSC
  • Gives good indication of where aggregation, filtering queries will be needed
  • Federated queries in a CanDIG layer
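One way to picture an aggregation query in a federated layer: each site returns per-variant counts for a region instead of individual genotypes, and the federating layer sums them. The sketch below is an assumption for illustration only; the response shape, variant keys, and function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical per-site response: variant key -> (alt allele count, allele number).
Counts = Dict[str, Tuple[int, int]]


def merge_site_counts(responses: Iterable[Counts]) -> Counts:
    """Sum alt-allele counts and allele numbers across site responses."""
    merged: Counts = {}
    for response in responses:
        for variant, (alt, total) in response.items():
            prev_alt, prev_total = merged.get(variant, (0, 0))
            merged[variant] = (prev_alt + alt, prev_total + total)
    return merged


def allele_frequencies(merged: Counts) -> Dict[str, float]:
    """Compute the federated alt-allele frequency for each variant."""
    return {v: alt / total for v, (alt, total) in merged.items() if total > 0}


# Hypothetical responses from two sites for the same small region.
site_a = {"chr1:1000:A>G": (3, 200), "chr1:1050:C>T": (1, 200)}
site_b = {"chr1:1000:A>G": (5, 300)}

merged = merge_site_counts([site_a, site_b])
freqs = allele_frequencies(merged)
```

This simplification only counts alleles at sites that reported a variant. A fuller version would also need each site's allele number for variants it did not observe, otherwise frequencies are overestimated.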

SLIDE 12

Work so far - differential privacy

  • With counting queries, raises possibility for introducing (e.g.) differential privacy
  • Make it easier for sites to make available data they might not otherwise
  • Federated classifier training with differential privacy over R&V API:
      • What approach works best, with real privacy model?
      • What happens when different sites have different privacy requirements?
  • N. Memon, BCGSC
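As an illustration of differential privacy applied to a counting query, here is a minimal sketch of the Laplace mechanism, with a separate privacy budget per site. It is a textbook technique shown under stated assumptions, not necessarily the approach this work used; the site names, counts, and epsilon values are hypothetical.

```python
import random
from typing import Dict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical per-site counts and per-site privacy budgets.
rng = random.Random(0)
site_counts: Dict[str, int] = {"site_a": 120, "site_b": 45, "site_c": 230}
site_epsilon: Dict[str, float] = {"site_a": 1.0, "site_b": 0.5, "site_c": 2.0}

# Each site adds its own noise before anything leaves the site.
noisy_counts = {
    site: private_count(count, site_epsilon[site], rng)
    for site, count in site_counts.items()
}
federated_total = sum(noisy_counts.values())
```

A smaller epsilon means more noise and stronger privacy, so a site with stricter requirements contributes a noisier count to the federated total.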

SLIDE 13

Work so far - authentication

  • Robust, standards-based OIDC authentication for R&V server
  • R. deBorja and others, UHN

SLIDE 14

Current work - PROFYLE

  • National paediatric precision oncology project
  • Data catalog/dashboard for project
  • Extend to analyses, data access
  • Existing work w/ IGV.html, simple analyses (joint variant calling at locus)
  • Extended support for metadata access
  • Schemas for experiments / analyses will need continued work

SLIDE 15

Current work - CaMPACT

  • Oncology basket trial
  • cBioPortal for clinician data exploration
  • Remote data access, ingest into cBioPortal
  • Extend to remote data API?

SLIDE 16

Coming months

  • Begin building on work of Cloud team for batch processing/analysis:
      • TES (Funnel), WES; DOS?
  • Continue building on work of LSG team:
      • Incorporate htsget for internal transfers
  • Building AAI API gateway
  • Building on, contributing to metadata standards, EHR ingest (Clinical & Pheno capture)

SLIDE 17

Longer-term work

  • Reads API: search by content of reads (string), quality, and not just mapped location
  • Work towards interoperability with DURI for Researcher ID and data use/authorization
  • Interoperability between LSG & Cloud team genomic data access models
  • Discovery APIs atop our platform
