CanDIG: Distributed national analyses of locally-controlled genomic data - PowerPoint PPT Presentation

  1. CanDIG: Distributed national analyses of locally-controlled genomic data
     http://distributedgenomics.ca
     genomicsandhealth.org

  2. Canadian Distributed Infrastructure for Genomics (CanDIG)
     New (start date: this spring) 4-year funded Canadian project to enable batch and interactive analysis over national cohorts with provincially controlled private genomic data - send analyses to data.

  3. CanDIG:
     ● Over coming months:
       ● Support paediatric cancer project (PROFYLE)
       ● Provide data directory, dashboard, coordinate processing
       ● Expand to directly supporting analyses
       ● Support for basket-type cancer clinical trial project (CaMPACT)
       ● Distributed data platform
       ● Support clinician decision-making by interfacing with cBioPortal
     ● By year 4:
       ● Large scale data directory
       ● Analysis interface to large amount of research & clinical genomics data
       ● “App store” of available analyses - interactive and batch
       ● Privacy layer
       ● Programmatic access for development of new distributed analysis methods

  4. Platform Goals - Fully Distributed:
     ● Participating sites: provide access to data, source of user requests
     ● Distributed synchronization of apps available, project membership, etc.
     ● Sites authenticate their users
     ● Local sites control access to their data

  5. Platform Goals - API access:
     (Diagram labels: Variants, Workflows)
     ● Want all data access to be through APIs: logging, auditability; no processes dropped in directory of files.
     ● Maybe no files: opaque back-end to different data stores (files, variant databases, etc.)
     ● WES (Cloud) and Reads/Variants servers communicating internally via htsget (Large-Scale Genomics)
     ● Metadata/clinical data standards (Clinical & Pheno Data Capture)
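As an illustration of the htsget bullet above, here is a minimal sketch of how a client inside the platform might form an htsget reads request and unpack the JSON ticket it gets back. The host names, the sample ID, and the helper names `htsget_reads_url` and `ticket_blocks` are assumptions made for this example, not part of CanDIG; no network call is made.

```python
import json
from urllib.parse import urlencode


def htsget_reads_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build an htsget reads request URL for one genomic region."""
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,  # 0-based, inclusive
        "end": end,      # 0-based, exclusive
    })
    return f"{base_url.rstrip('/')}/reads/{read_id}?{query}"


def ticket_blocks(ticket_json):
    """Return (url, headers) pairs from an htsget JSON ticket, in order."""
    ticket = json.loads(ticket_json)["htsget"]
    return [(block["url"], block.get("headers", {})) for block in ticket["urls"]]


# Hypothetical ticket, shaped like an htsget response body.
example_ticket = json.dumps({
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "https://site-a.example/data/sample1/header"},
            {"url": "https://site-a.example/data/sample1/chr1",
             "headers": {"Authorization": "Bearer <token>"}},
        ],
    }
})

url = htsget_reads_url("https://site-a.example/htsget", "sample1", "chr1", 100000, 101000)
blocks = ticket_blocks(example_ticket)
```

The client would fetch each block URL in order, sending any listed headers, and concatenate the results to obtain the requested slice.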

  6. Platform Goals - AAI:
     ● Authentication: Federated OpenID Connect
     ● Local site authorizes based on remote ID and distributed role information
     ● Verified tokens used internally amongst services
     ● Build with eye towards future interoperability with DURI
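The split between remote authentication and local authorization can be sketched as follows. This assumes the OpenID Connect token has already been verified and reduced to its claims (`iss` and `sub` are standard OIDC claim names); the issuer URLs, the role table, the policy table, and `is_authorized` are all hypothetical and only illustrate the idea.

```python
# Identity providers this site is willing to trust (hypothetical URLs).
TRUSTED_ISSUERS = {"https://idp.site-a.example", "https://idp.site-b.example"}

# Distributed role information, synchronized between sites (hypothetical):
# (issuer, subject) -> {project: role}
ROLES = {
    ("https://idp.site-b.example", "alice"): {"PROFYLE": "researcher"},
}

# Policy owned by the local site: what each project role may do with local data.
LOCAL_POLICY = {
    ("PROFYLE", "researcher"): {"count_variants", "read_metadata"},
}


def is_authorized(claims, project, action):
    """Decide locally whether a remotely authenticated user may perform an action."""
    issuer, subject = claims.get("iss"), claims.get("sub")
    if issuer not in TRUSTED_ISSUERS:
        return False  # the remote ID comes from an identity provider we do not trust
    role = ROLES.get((issuer, subject), {}).get(project)
    return action in LOCAL_POLICY.get((project, role), set())


alice = {"iss": "https://idp.site-b.example", "sub": "alice"}
allowed = is_authorized(alice, "PROFYLE", "count_variants")
```

The remote site only vouches for who the user is; the final decision stays with the site that holds the data.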

  7. Work so far - interactive analysis
     ● Less obvious it would work nicely in our federated context
     ● E.g., re-creating some classic thousand genomes figures across federated datasets - small regions for interactivity
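One way to picture an interactive federated analysis over a small region: each site reduces its own genotypes to counts, and only the counts are pooled. The sketch below assumes biallelic genotypes encoded as tuples of 0/1 alleles; the function names and data are made up for illustration and do not come from the slides.

```python
def site_counts(genotypes):
    """Run at each site: reduce local genotypes to (alt allele count, allele number)."""
    alt = sum(allele for gt in genotypes for allele in gt)
    total = sum(len(gt) for gt in genotypes)
    return alt, total


def federated_frequency(per_site_counts):
    """Run at the coordinator: pool per-site counts into one allele frequency."""
    alt = sum(a for a, _ in per_site_counts)
    total = sum(n for _, n in per_site_counts)
    return alt / total if total else None


# Hypothetical genotypes at one variant, held at two different sites.
site_a = [(0, 1), (1, 1)]
site_b = [(0, 0), (0, 1), (0, 0)]

counts = [site_counts(site_a), site_counts(site_b)]
frequency = federated_frequency(counts)
```

Individual genotypes never leave their site; the coordinator sees only two integers per site.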

  11. Work so far - interactive analysis
      ● Needed to greatly enhance R&V server performance
      ● Serialization
      ● “Column-oriented” approach to (e.g.) FORMAT fields
      ● Contributed back
      ● J. Foong, HSC
      ● Gives good indication on where aggregation, filtering queries will be needed
      ● Federated queries in a CanDIG layer
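To make the "column-oriented" idea concrete, here is a small sketch that turns per-sample VCF FORMAT strings into one list per field, so a filter touches a single column. This is only an illustration of the general layout, not the server's actual implementation; `to_columns` and the sample values are assumptions.

```python
def to_columns(format_keys, sample_values):
    """Convert row-oriented FORMAT strings into one list per FORMAT field."""
    keys = format_keys.split(":")
    columns = {key: [] for key in keys}
    for value in sample_values:
        fields = value.split(":")
        # VCF allows trailing fields to be dropped; pad them with the missing marker.
        fields += ["."] * (len(keys) - len(fields))
        for key, field in zip(keys, fields):
            columns[key].append(field)
    return columns


# One variant, three samples (hypothetical values).
columns = to_columns("GT:DP:GQ", ["0/1:35:99", "0/0:12:40", "1/1:28"])

# A depth filter now scans only the DP column.
deep = [i for i, dp in enumerate(columns["DP"]) if dp != "." and int(dp) >= 20]
```

A filter or aggregation on one field reads a single contiguous list instead of parsing every sample string.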

  12. Work so far - differential privacy
      ● With counting queries, raises possibility for introducing (e.g.) differential privacy
      ● Make it easier for sites to make available data they might not otherwise
      ● Federated classifier training with differential privacy over R&V API:
      ● What approach works best, with real privacy model?
      ● What happens when different sites have different privacy requirements?
      ● N. Memon, BCGSC
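For counting queries, the textbook differential privacy tool is the Laplace mechanism, sketched below with a separate privacy budget (epsilon) per site. It is a generic illustration under stated assumptions, not the approach the project settled on; the function names, counts, and epsilon values are invented for the example.

```python
import random


def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    scale = 1.0 / epsilon  # noise scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def total_noise_variance(epsilons):
    """Variance of the summed noise when each site uses its own epsilon."""
    # A Laplace variable with scale b has variance 2 * b**2, and b = 1 / epsilon here.
    return sum(2.0 / (eps ** 2) for eps in epsilons)


rng = random.Random(0)
# Hypothetical per-site (true count, epsilon); a stricter site uses a smaller epsilon.
sites = [(120, 0.5), (80, 1.0)]
noisy_total = sum(private_count(count, eps, rng) for count, eps in sites)
variance = total_noise_variance([eps for _, eps in sites])
```

When sites have different privacy requirements, the strictest site (smallest epsilon) dominates the noise in the pooled answer, which is one reason the question raised on the slide is not trivial.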

  13. Work so far - authentication
      ● Robust, standards-based OIDC authentication for R&V server
      ● R. deBorja and others, UHN

  14. Current work - PROFYLE
      ● National paediatric precision oncology project
      ● Data catalog/dashboard for project
      ● Extend to analyses, data access
      ● Existing work w/ IGV.html, simple analyses (joint variant calling at locus)
      ● Extended support for metadata access
      ● Schemas for experiments / analyses will need continued work

  15. Current work - CaMPACT
      ● Oncology basket trial
      ● cBioPortal for clinician data exploration
      ● Remote data access, ingest into cBioPortal
      ● Extend to remote data API?

  16. Coming months
      ● Begin building on work of Cloud team for batch processing/analysis:
        ● TES (Funnel), WES; DOS?
      ● Continue building on work of LSG team:
        ● Incorporate htsget for internal transfers
      ● Building AAI API gateway
      ● Building on, contributing to metadata standards, EHR ingest (Clinical & Pheno capture)
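For the batch side, "sending analyses to data" via TES could look roughly like the task message below, which a coordinator would submit to a TES server (such as Funnel) running at the site that holds the data. The field names follow the GA4GH TES task schema; the container image, command, paths, and `make_tes_task` helper are hypothetical.

```python
import json


def make_tes_task(name, image, command, input_url, input_path):
    """Build a minimal TES task: one input file, one containerized command."""
    return {
        "name": name,
        "inputs": [{"url": input_url, "path": input_path}],
        "executors": [{"image": image, "command": command}],
    }


task = make_tes_task(
    name="count-variants-chr1",
    image="example/variant-counter:latest",        # hypothetical container image
    command=["count-variants", "/data/cohort.vcf.gz", "--region", "chr1"],
    input_url="file:///site-local/cohort.vcf.gz",  # stays inside the site
    input_path="/data/cohort.vcf.gz",
)
payload = json.dumps(task, indent=2)
```

Only the task description crosses the site boundary; the input URL is resolved by the site's own TES server.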

  17. Longer-term work
      ● Reads API: search by content of reads (string), quality, and not just mapped location
      ● Work towards interoperability with DURI for Researcher ID and data use/authorization
      ● Interoperability between LSG & Cloud team genomic data access models
      ● Discovery APIs atop our platform
