International sharing of genomic and clinical data: challenges and opportunities
A/Prof Zornitza Stark and Dr Alejandro Metke
First human genome (2003): >10 years, USD$3 billion
Genome sequencing: cost and time
19.5 hours
Genomic testing in healthcare: next 5 years
60,000,000 patients
Rare disease, cancer, infectious disease, drug response, common disease, population screening
Percentage of whole genomes and exomes that are funded by healthcare systems: 2012 ~1%, 2018 ~20%, 2022 >80%
Areas of clinical uptake: infectious disease, cancer, rare disease, common/chronic
The world is changing
The GA4GH Ecosystem
Global Alliance members include:
- Universities and research institutes (22%)
- Academic medical centers and health systems (10%)
- Disease advocacy organizations and patient groups (4%)
- Consortia and professional societies (13%)
- Funders and agencies (5%)
- Life science and information technology companies (46%)
3000+ Subscribers 70+ Countries 550+ Organizational Members
Global Participation
70+ Countries represented
Afghanistan; Argentina; Australia; Austria; Belgium; Botswana; Brazil; Cameroon; Canada; China; Colombia; Congo; Costa Rica; Croatia; Czech Republic; Denmark; Egypt; Estonia; Finland; France; Georgia; Germany; Ghana; Greece; Hong Kong; India; Ireland; Israel; Italy; Japan; Kenya; Luxembourg; Malawi; Malaysia; Mali; Mauritius; Mexico; Morocco; Nepal; Netherlands; New Zealand; Nicaragua; Niger; Nigeria; Norway; Peru; Philippines; Portugal; Qatar; Russian Federation; Sierra Leone; Singapore; Slovenia; South Africa; South Korea; Spain; Sri Lanka; Sudan; Sweden; Switzerland; Taiwan; Tanzania; Tunisia; Turkey; Uganda; Ukraine; United Kingdom; United States; Uruguay; Venezuela; Virgin Islands, U.S.
Roadmap & Leadership Roles
Driver Project Champions Work Stream Leads Work Stream Contributors
- Give high-level input on GA4GH activities and Roadmap tools
- ALL DPCs: attend bi-annual in-person SC meetings
- 1 representative DPC from each project: join bi-annual SC calls
- Act as ‘team leads’ (e.g. appointing Contributors on Work Streams)
- Support GA4GH tool implementation within the Driver Projects
- “Community-minded” leaders with bandwidth to ensure delivery of tools at the expected rate
- Ensure balanced input from multiple DPs on tool development
- Chair WS calls
- Participate in quarterly SC meetings
- Actively contribute to development of deliverables
- Represent needs of Driver Projects on WS calls
- Liaise between WS activities & DPCs
Direct Engagement Indirect Engagement
Data Sharing Challenges
- Legal and ethical
- Genomic data
- Clinical data
A new paradigm
FROM Data Copying:
- Open research data
- Aggregate data globally
- Download, analyse locally
- Continues for basic research
TO Data Visiting (Federation):
- Healthcare data with research use
- Analyse data locally (via VMs)
- Collate analyses
- New approach for both research and healthcare
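The "data visiting" pattern (analyse data locally, then collate analyses) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the site names, the genotype encoding (alternate-allele count per individual), and the functions `local_summary` and `collate` are hypothetical and do not come from any GA4GH tool.

```python
# Minimal sketch of federated ("data visiting") analysis.
# Assumption: each site holds per-individual genotypes for one variant,
# encoded as the number of alternate alleles (0, 1 or 2).

def local_summary(genotypes):
    """Runs inside a site; returns only aggregate counts, never individual rows."""
    return {"alt": sum(genotypes), "total": 2 * len(genotypes)}

def collate(summaries):
    """Runs at the coordinator; combines per-site aggregates into one frequency."""
    alt = sum(s["alt"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return alt / total if total else 0.0

# Hypothetical site data; in a real federation this never leaves the site.
site_a = [0, 1, 0, 2]
site_b = [0, 0, 1]

summaries = [local_summary(site_a), local_summary(site_b)]
allele_frequency = collate(summaries)
print(f"Pooled alternate allele frequency: {allele_frequency:.3f}")
```

Only the small summary dictionaries cross site boundaries, which is the contrast with the download-and-analyse-locally model.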
Core Principles of Data Sharing
- Enable international data sharing
- Promote sharing across the translational continuum (discovery research, clinical trials, healthcare system, diagnostic labs, industry)
- Encourage technology-enabled federated approaches (bring analysis to the data)
Promote interoperability:
- Scientific: Standards adoption; transparent documentation
- Technical: Standardized file formats, variant calling protocols, variant & gene annotation
- Ethical: Consent policies to ensure data can be shared internationally
Global Learning for Health
Interoperable APIs, standards & frameworks to support global data sharing
Research Healthcare
Genomic Knowledge Exchanges
Real World Driver Projects: Develop and test standards, tools and frameworks for data sharing
Clinical Sequencing Laboratories
Data Quality Assessment: quality reports on BAM + VCF data produced by qprofiler software
Australian Genomics Study Database
Genomic Data: VCF, BAM, FASTQ
Gen-Phen Database ‘Variant Atlas’
Genotypes and Phenotypes available for interactive summary-level queries and visualisations
Genomic Data Repository
Phase 1: genomic data store. Phase 2: comprehensive genomic data catalogue
Data Access and Approvals System
Approvals issued automatically (low-sensitivity) or via data access review (high-sensitivity)
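The two approval routes can be sketched as a small routing function. This is an illustrative sketch only: the `AccessRequest` fields, the sensitivity labels, and the status strings are hypothetical and are not taken from the Australian Genomics system.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    requester: str
    dataset: str
    sensitivity: str  # assumed labels: "low" or "high"

def route_request(request: AccessRequest) -> str:
    """Approve low-sensitivity requests automatically; queue the rest for review."""
    if request.sensitivity == "low":
        return "approved"
    if request.sensitivity == "high":
        return "pending_review"  # would go to the data access review step
    raise ValueError(f"Unknown sensitivity: {request.sensitivity!r}")

# Hypothetical requests for illustration.
print(route_request(AccessRequest("researcher-1", "summary-level", "low")))
print(route_request(AccessRequest("researcher-2", "individual-level", "high")))
```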
Standardised Clinical Phenotypes: patient phenotype data coded in SNOMED / HPO terms and represented in FHIR format
‘Shariant’ Platform
Classified variants and curation evidence shared across laboratories
Classified Variants (Reported & Unreported), Curation Evidence
Metadata: laboratory, sequencer, library preparation etc
Program Two Data Management Work Flow and Capabilities
Consent + CTRL Dynamic Consent Platform
Access to individual-level data Access to summary-level data
Genotypes Phenotypes
Flagship Clinical Phenotypes Patient phenotype data Data Access Agreements Data Governance Policies Data Access Committee Upload Genomic Data + Associated Metadata
Secondary use of data
Genomics England: 100,000 Genomes Project
UK National Genomics Informatics Service (NGIS)
An evolution of the 100,000 Genomes Project platform
National Genomic Informatics Service
Primary Care NHS Trusts
WGS Sequencing Service
Value Extraction Community
Research Support Service Data Management Service National BioInformatics Service
NHS England’s Genomics Unit Genomic Lab Hubs
Secondary & Longitudinal Clinical Data
GMCs
MDTs Diagnostic returns
National Genomic Data Store
Decision Support Service National Genomic Test Ordering Service Authentication Service NPEx Service
Identifiable Data De-Identified Data
eConsent Service Panel Assigner Pedigree Tool
National Genomic Test Directory Service Elucidata NPEx Illumina Optum GeL Illumina Congenica Fabric
Delivery Partners
Zone Digital
Sample Tracking Service
Clinical data sharing: harmonizing data capture and exchange
Clinical data Analysis pipelines Variant-level interpretation Case-level interpretation Normal variation Disease cohorts Matchmaking Diagnostics Research
Relevant, high quality, accurate, machine-readable, interoperable
Time, effort, skill, scalability
Risk: poor quality, incomplete data = suboptimal interpretation
The problem
Mapping, harmonization, exchange
Clinical data model NGIS
Consensus clinical data
Common with other tests:
- Name, surname
- DOB
- Gender: phenotypic/karyotypic
- Contact details
- Identifiers: study/hospital
- Referring clinician/centre/contact details
Consensus clinical data
Unique to genetics/genomics:
- Consent status +/- additional findings, data sharing, research
- Pedigree/consanguinity
- Additional family members to be tested and affected/unaffected status/consent status
- Suspected clinical diagnosis/gene
- Gene panels for prioritized analysis
Consensus phenotypic data
- Phenotype using standard terminology: HPO
- (Relevant prior tests: genetic and non-genetic)
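A phenotype recorded with an HPO term can be carried in a FHIR-style resource, as in this minimal sketch. Several choices here are assumptions rather than something stated in the slides: the use of an `Observation` resource with `valueBoolean`, the code system URI in `HPO_SYSTEM`, the patient identifier, and the helper name `hpo_observation`. The HPO identifier should be checked against the current HPO release before use.

```python
import json

# Assumed code system URI for HPO; confirm against your terminology server.
HPO_SYSTEM = "http://purl.obolibrary.org/obo/hp.owl"

def hpo_observation(patient_id, hpo_id, label, present=True):
    """Build a FHIR-style Observation (as a plain dict) for one HPO-coded phenotype."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{"system": HPO_SYSTEM, "code": hpo_id, "display": label}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        # True = phenotype observed, False = explicitly not observed
        "valueBoolean": present,
    }

# Hypothetical patient and phenotype for illustration.
observation = hpo_observation("example-1", "HP:0001363", "Craniosynostosis")
print(json.dumps(observation, indent=2))
```

Coding the term rather than storing free text is what makes the record machine-readable and comparable across sites.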
Phenotype capture
Acute Care: HPO terms in REDCap
(via Ontoserver and REDCap Ontology Module)
Ethnicity
Understanding rare variation
- Clinical diagnostics: Is this variant actually rare/absent? Errors in interpretation: more VOUS, false positives, false negatives
- Research: Responsibility to build diverse and representative datasets
Capturing ethnicity: problems
- Ascertainment heterogeneity and ambiguity
- Different levels of granularity
- Population identifiers: Geographical? Racial? Cultural? Political?
- Multicultural societies and mixed ancestries
- Lack of standards/ontologies. Are census codes fit for purpose?
Capturing ethnicity: current status
12 REA categories:
- European (non-Finnish)
- European (Finnish)
- Sub-Saharan Africa
- Asian
- North African/Middle Eastern
- Other Oceanian
- People of the Americas
- Maori/Pacific Islander
- Aboriginal/Torres Strait Islander
- Australian/New Zealander
- Ashkenazi Jewish
- Sephardic Jewish
16 REA categories:
- Chinese
- White: British
- White: Irish
- White: any other
- Asian or British Asian: Pakistani
- Asian or British Asian: Bangladeshi
- Asian or British Asian: Indian
- Asian or British Asian: any other
- Black or British Black: Caribbean
- Black or British Black: African
- Black or British Black: any other
- Mixed: x 4
Capturing ethnicity: a way forward?
Human Ancestry Ontology
Pedigree
Consanguinity? Affected 1st degree relatives. Suspected mode of inheritance
Building tools to support implementation
GA4GH
Clinical & Phenotypic Data Capture Work Stream 2019 Roadmap
- Deliverable #1: Definition of phenotype models for different clinical domains with driver projects
- Deliverable #2: Phenopackets on FHIR
- Deliverable #3: Pedigree representation
Phenopackets can help us do better
- Craniosynostosis
- Brachydactyly
- Proptosis
- Broad thumb…
How severe are these? Are some more severe than others?
When were they first observed?
Were they NOT observed?
How are these linked to a patient? To genomic info? To samples? To parents and siblings?
Phenopackets
Diseases, e.g. Abetalipoproteinemia Variants Biosamples
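A phenopacket-style record can capture severity, onset, explicit absence, and links to a patient and samples in one structure. The sketch below is loosely modelled on field names from the GA4GH Phenopacket schema, but it is a plain dictionary, not a validated phenopacket. The identifiers, severity and onset values are illustrative assumptions, the HPO IDs should be verified against the current HPO release, and `observed_features` is a hypothetical helper.

```python
import json

# Illustrative record only; all values are assumptions, not data from the talk.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "patient-1"},
    "phenotypicFeatures": [
        {
            "type": {"id": "HP:0001363", "label": "Craniosynostosis"},
            "severity": {"id": "HP:0012828", "label": "Severe"},
            "onset": {"ontologyClass": {"id": "HP:0003577", "label": "Congenital onset"}},
        },
        {
            # Explicitly recorded as NOT observed
            "type": {"id": "HP:0000520", "label": "Proptosis"},
            "excluded": True,
        },
    ],
    "biosamples": [{"id": "sample-1", "individualId": "patient-1"}],
}

def observed_features(packet):
    """Return labels of features that were observed (not marked as excluded)."""
    return [
        feature["type"]["label"]
        for feature in packet["phenotypicFeatures"]
        if not feature.get("excluded", False)
    ]

print(observed_features(phenopacket))
print(json.dumps(phenopacket, indent=2)[:80])
```

Keeping excluded features in the record, rather than omitting them, preserves the difference between "not present" and "not assessed".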
Phenopackets on FHIR
Thank you!
A/Prof Zornitza Stark
Clinical Geneticist, Victorian Clinical Genetics Services Murdoch Children’s Research Institute
Dr Alejandro Metke
Senior Research Scientist, Health Data Interoperability Team Leader Australian e-Health Research Centre, CSIRO