SLIDE 1

International sharing of genomic and clinical data: challenges and opportunities

A/Prof Zornitza Stark and Dr Alejandro Metke

SLIDE 2

First human genome, 2003: >10 years, USD$3 billion

SLIDE 3

Genome sequencing: cost and time

19.5 hours

SLIDE 4

Genomic testing in healthcare: next 5 years

60,000,000 patients

SLIDE 5

  • Rare disease
  • Cancer
  • Infectious disease
  • Drug response
  • Common disease
  • Population screening

SLIDE 6

Percentage of whole genomes and exomes that are funded by healthcare systems:

  • 2012: ~1%
  • 2018: ~20%
  • 2022: >80%

Areas of clinical uptake: infectious disease, cancer, rare disease, common/chronic

The world is changing

SLIDE 7

The GA4GH Ecosystem

Global Alliance members include:

  • Universities and research institutes (22%)
  • Academic medical centers and health systems (10%)
  • Disease advocacy organizations and patient groups (4%)
  • Consortia and professional societies (13%)
  • Funders and agencies (5%)
  • Life science and information technology companies (46%)

3000+ subscribers, 70+ countries, 550+ organizational members

SLIDE 8

Global Participation

70+ Countries represented

Afghanistan, Argentina, Australia, Austria, Belgium, Botswana, Brazil, Cameroon, Canada, China, Colombia, Congo, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Georgia, Germany, Ghana, Greece, Hong Kong, India, Ireland, Israel, Italy, Japan, Kenya, Luxembourg, Malawi, Malaysia, Mali, Mauritius, Mexico, Morocco, Nepal, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Peru, Philippines, Portugal, Qatar, Russian Federation, Sierra Leone, Singapore, Slovenia, South Africa, South Korea, Spain, Sri Lanka, Sudan, Sweden, Switzerland, Taiwan, Tanzania, Tunisia, Turkey, Uganda, Ukraine, United Kingdom, United States, Uruguay, Venezuela, Virgin Islands (U.S.)

SLIDE 9

Roadmap & Leadership Roles

Driver Project Champions (DPCs):

  • Give high-level input on GA4GH activities and Roadmap tools
  • ALL DPCs: attend bi-annual in-person SC meetings
  • 1 representative DPC from each project: join bi-annual SC calls
  • Act as ‘team leads’ (e.g. appointing Contributors on Work Streams)
  • Support GA4GH tool implementation within the Driver Projects

Work Stream Leads:

  • “Community-minded” leaders with bandwidth to ensure delivery of tools at the expected rate
  • Ensure balanced input from multiple DPs on tool development
  • Chair WS calls
  • Participate in quarterly SC meetings

Work Stream Contributors:

  • Actively contribute to development of deliverables
  • Represent needs of Driver Projects on WS calls
  • Liaise between WS activities & DPCs

SLIDE 10

Direct Engagement Indirect Engagement

SLIDE 11

Data Sharing Challenges

  • Legal and ethical
  • Genomic data
  • Clinical data

SLIDE 12

A new paradigm

From data copying to data visiting

SLIDE 13

Federation

Open research data:

  • Aggregate data globally
  • Download, analyse locally
  • Continues for basic research

Healthcare data with research use:

  • Analyse data locally (via VMs)
  • Collate analyses
  • New approach for both research and healthcare
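To make "analyse data locally, collate analyses" concrete, here is a minimal Python sketch of a federated allele-frequency query. It assumes each site holds per-individual genotype dosages for one variant and returns only aggregate counts; the site names and data layout are hypothetical, and real federated platforms add authentication, access control and disclosure checks that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate result returned by one site; no individual-level rows leave it."""
    site: str
    allele_count: int   # alternate alleles observed at the site
    allele_number: int  # total alleles genotyped (2 per diploid individual)


def summarise_locally(site: str, genotypes: list[int]) -> SiteSummary:
    """Runs inside the data holder's environment (e.g. a VM next to the data).

    `genotypes` holds one alternate-allele dosage (0, 1 or 2) per individual.
    """
    return SiteSummary(site, sum(genotypes), 2 * len(genotypes))


def collate(summaries: list[SiteSummary]) -> float:
    """Pool site-level counts into a single alternate allele frequency."""
    allele_count = sum(s.allele_count for s in summaries)
    allele_number = sum(s.allele_number for s in summaries)
    return allele_count / allele_number if allele_number else 0.0


# Each site summarises its own data; only the summaries are shared and collated.
site_a = summarise_locally("site_a", [0, 1, 0, 2])
site_b = summarise_locally("site_b", [0, 0, 1])
print(f"pooled allele frequency: {collate([site_a, site_b]):.3f}")
```

The design choice is that the query travels to the data and only counts travel back, which is the "data visiting" model rather than copying genotype files to a central store.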

SLIDE 14

Core Principles of Data Sharing

  • Enable international data sharing
  • Promote sharing across the translational continuum (discovery research, clinical trials, healthcare system, diagnostic labs, industry)
  • Encourage technology-enabled federated approaches (bring analysis to the data)
  • Promote interoperability:
      • Scientific: standards adoption; transparent documentation
      • Technical: standardized file formats, variant calling protocols, variant & gene annotation
      • Ethical: consent policies to ensure data can be shared internationally
SLIDE 15

Global Learning for Health

Interoperable APIs, standards & frameworks to support global data sharing

Research Healthcare

Genomic Knowledge Exchanges

SLIDE 16

Real World Driver Projects: Develop and test standards, tools and frameworks for data sharing

SLIDE 17

SLIDE 18

SLIDE 19

Program Two Data Management Work Flow and Capabilities

  • Clinical Sequencing Laboratories: upload genomic data + associated metadata
  • Data Quality Assessment: quality reports on BAM + VCF data produced by qprofiler software
  • Australian Genomics Study Database: genomic data (VCF, BAM, FASTQ)
  • Gen-Phen Database ‘Variant Atlas’: genotypes and phenotypes available for interactive summary-level queries and visualisations
  • Genomic Data Repository: Phase 1 genomic data store; Phase 2 comprehensive genomic data catalogue
  • Data Access and Approvals System: approvals issued automatically (low-sensitivity) or via data access review (high-sensitivity)
  • Standardised Clinical Phenotypes: patient phenotype data coded in SNOMED / HPO terms and represented in FHIR format
  • ‘Shariant’ Platform: classified variants and curation evidence shared across laboratories (classified variants, reported & unreported; curation evidence; metadata: laboratory, sequencer, library preparation, etc.)
  • Consent + CTRL Dynamic Consent Platform
  • Flagship clinical phenotypes: patient phenotype data
  • Data Access Agreements, Data Governance Policies, Data Access Committee
  • Access to individual-level data and to summary-level data (genotypes, phenotypes)
  • Secondary use of data
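The two approval routes of the Data Access and Approvals System can be sketched as a simple routing rule. This is a hypothetical illustration, assuming that individual-level requests are high-sensitivity and summary-level requests are low-sensitivity; the real classification is set by the Data Governance Policies and Data Access Committee, not by code like this.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "approved automatically"
    DATA_ACCESS_REVIEW = "referred for data access review"


@dataclass
class AccessRequest:
    requester: str
    individual_level: bool  # True: individual-level data; False: summary-level only


def is_high_sensitivity(request: AccessRequest) -> bool:
    # Assumption for illustration only: individual-level access is high-sensitivity,
    # summary-level access is low-sensitivity. Real classification is a policy decision.
    return request.individual_level


def route(request: AccessRequest) -> Route:
    """Low-sensitivity requests are approved automatically; high-sensitivity ones go to review."""
    if is_high_sensitivity(request):
        return Route.DATA_ACCESS_REVIEW
    return Route.AUTO_APPROVE


for req in (AccessRequest("researcher_1", individual_level=False),
            AccessRequest("researcher_2", individual_level=True)):
    print(req.requester, "->", route(req).value)
```

Keeping the sensitivity classification in its own function means a policy change only touches one place, while the routing itself stays trivial.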

SLIDE 20

Genomics England: 100,000 Genomes Project

SLIDE 21

UK National Genomics Informatics Service (NGIS)

An evolution of the 100,000 Genomes Project platform

  • Stakeholders: Primary Care, NHS Trusts, GMCs, MDTs, NHS England’s Genomics Unit, Genomic Lab Hubs, Value Extraction Community
  • Services: WGS Sequencing Service, Research Support Service, Data Management Service, National BioInformatics Service, Decision Support Service, National Genomic Test Ordering Service, National Genomic Test Directory Service, Authentication Service, NPEx Service, eConsent Service, Panel Assigner, Pedigree Tool, Sample Tracking Service
  • Data: National Genomic Data Store; Secondary & Longitudinal Clinical Data; Diagnostic returns; Identifiable Data and De-Identified Data
  • Delivery Partners: Elucidata, NPEx, Illumina, Optum, GeL, Congenica, Fabric, Zone Digital
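The NGIS diagram separates identifiable from de-identified data. A common way to link records across such a boundary without exposing identifiers is keyed-hash pseudonymisation. The sketch below uses HMAC-SHA256 from the Python standard library. It is an illustration under stated assumptions, not NGIS's actual mechanism: the field names (including `nhs_number`) and the key handling are hypothetical, and a real deployment would keep the key only in the identifiable zone. Pseudonymised data can still be re-identifiable in combination with other fields.

```python
import hashlib
import hmac

# Hypothetical secret held only by the service managing identifiable data.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Illustrative list of direct identifiers to strip; not an NGIS field list.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "nhs_number", "address"}


def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable token: the same patient_id and key always give the same pseudonym."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    clean = {k: v for k, v in record.items()
             if k not in IDENTIFYING_FIELDS and k != "patient_id"}
    clean["pseudonym"] = pseudonym(record["patient_id"], key)
    return clean


record = {
    "patient_id": "P0001",
    "name": "Jane Doe",
    "date_of_birth": "2015-03-02",
    "hpo_terms": ["HP:0001363"],
}
print(deidentify(record))
```

Because the pseudonym is deterministic for a given key, later diagnostic returns or longitudinal data for the same patient can be joined on the de-identified side without the identifiers themselves crossing over.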

SLIDE 22

Clinical data sharing: harmonizing data capture and exchange

SLIDE 23

  • Clinical data
  • Analysis pipelines
  • Variant-level interpretation
  • Case-level interpretation
  • Normal variation
  • Disease cohorts
  • Matchmaking
  • Diagnostics
  • Research

SLIDE 24

The problem

Requirements: relevant, high quality, accurate, machine-readable, interoperable
Constraints: time, effort, skill, scalability
Risk: poor-quality, incomplete data = suboptimal interpretation

SLIDE 25

Mapping, harmonization, exchange

Clinical data model NGIS

SLIDE 26

Consensus clinical data

Common with other tests:

  • Name, surname
  • DOB
  • Gender: phenotypic/karyotypic
  • Contact details
  • Identifiers: study/hospital
  • Referring clinician/centre/contact details
SLIDE 27

Consensus clinical data

Unique to genetics/genomics:

  • Consent status (+/- additional findings, data sharing, research)
  • Pedigree/consanguinity
  • Additional family members to be tested, with affected/unaffected status and consent status

  • Suspected clinical diagnosis/gene
  • Gene panels for prioritized analysis
SLIDE 28

Consensus phenotypic data

  • Phenotype using standard terminology: HPO
  • (Relevant prior tests: genetic and non-genetic)
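The consensus fields on the last three slides map naturally onto a structured record. The Python dataclasses below are a hypothetical sketch, assuming consent flags stored as booleans and HPO IDs stored as strings. They are not the NGIS clinical data model or any published schema, and all class and field names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    AFFECTED = "affected"
    UNAFFECTED = "unaffected"
    UNKNOWN = "unknown"


@dataclass
class Consent:
    """Consent status +/- additional findings, data sharing, research."""
    testing: bool
    additional_findings: bool = False
    data_sharing: bool = False
    research: bool = False


@dataclass
class FamilyMember:
    """An additional family member to be tested."""
    relationship: str  # e.g. "mother", "father", "sibling"
    status: Status
    consent: Consent


@dataclass
class GenomicTestRequest:
    # Common with other tests
    given_name: str
    surname: str
    date_of_birth: str              # ISO 8601 date, e.g. "2015-03-02"
    phenotypic_sex: str             # slide: "Gender: phenotypic/karyotypic"
    karyotypic_sex: Optional[str]   # may be unknown at referral
    contact_details: str
    identifiers: dict[str, str]     # e.g. {"hospital": "...", "study": "..."}
    referrer: str                   # referring clinician/centre/contact details
    # Unique to genetics/genomics
    consent: Consent
    consanguinity: Optional[bool]   # None when not recorded
    family_members: list[FamilyMember] = field(default_factory=list)
    suspected_diagnosis: Optional[str] = None  # suspected clinical diagnosis or gene
    gene_panels: list[str] = field(default_factory=list)
    # Consensus phenotypic data
    hpo_terms: list[str] = field(default_factory=list)    # HPO IDs
    prior_tests: list[str] = field(default_factory=list)  # genetic and non-genetic


request = GenomicTestRequest(
    given_name="Jane", surname="Doe", date_of_birth="2015-03-02",
    phenotypic_sex="female", karyotypic_sex=None, contact_details="(not shown)",
    identifiers={"hospital": "H-0001"}, referrer="Dr A, Clinic B",
    consent=Consent(testing=True, data_sharing=True),
    consanguinity=False,
    family_members=[FamilyMember("mother", Status.UNAFFECTED, Consent(testing=True))],
    hpo_terms=["HP:0001363"],
)
print(request.consent, len(request.family_members))
```

Keeping consent and family members as nested records, rather than free text, is what makes them checkable before data sharing or trio analysis.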
SLIDE 29

Phenotype capture

SLIDE 30

Phenotype capture

Acute Care: HPO terms in REDCap

(via Ontoserver and REDCap Ontology Module)
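Ontoserver is a FHIR terminology server, so type-ahead HPO lookup of the kind the REDCap module offers can be expressed as a standard FHIR `ValueSet/$expand` request with a `filter` parameter. The sketch below only builds the request URL. The server base URL and the HPO value set URL are placeholders (assumptions) to be replaced with whatever your Ontoserver instance actually publishes.

```python
from urllib.parse import urlencode

# Placeholders: substitute the real endpoint and value set canonical URL.
FHIR_BASE = "https://ontoserver.example.org/fhir"
HPO_VALUESET_URL = "http://example.org/fhir/ValueSet/hpo-all"  # hypothetical


def hpo_search_url(text: str, max_results: int = 10,
                   base: str = FHIR_BASE, valueset: str = HPO_VALUESET_URL) -> str:
    """Build a FHIR ValueSet/$expand query for type-ahead HPO term search.

    `url`, `filter` and `count` are standard $expand parameters in FHIR R4.
    """
    params = {"url": valueset, "filter": text, "count": max_results}
    return f"{base}/ValueSet/$expand?{urlencode(params)}"


print(hpo_search_url("craniosyn"))
```

The server answers with a ValueSet whose `expansion.contains` entries carry `system`, `code` and `display`. The selected code, rather than the free text typed by the clinician, is what would be stored.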

SLIDE 31

Ethnicity

  • Understanding rare variation
  • Clinical diagnostics: Is this variant actually rare/absent? Errors in interpretation: more VOUS, false positives, false negatives
  • Research: responsibility to build diverse and representative datasets

SLIDE 32

Capturing ethnicity: problems

  • Ascertainment heterogeneity and ambiguity
  • Different levels of granularity
  • Population identifiers: Geographical? Racial? Cultural? Political?
  • Multicultural societies and mixed ancestries
  • Lack of standards/ontologies. Are census codes fit for purpose?

SLIDE 33

SLIDE 34

Capturing ethnicity: current status

12 REA categories:

  • European (non-Finnish)
  • European (Finnish)
  • Sub-Saharan Africa
  • Asian
  • North African/Middle Eastern
  • Other Oceanian
  • People of the Americas
  • Maori/Pacific Islander
  • Aboriginal/Torres Strait Islander
  • Australian/New Zealander
  • Ashkenazi Jewish
  • Sephardic Jewish

16 REA categories:

  • Chinese
  • White: British
  • White: Irish
  • White: any other
  • Asian or British Asian: Pakistani
  • Asian or British Asian: Bangladeshi
  • Asian or British Asian: Indian
  • Asian or British Asian: any other
  • Black or British Black: Caribbean
  • Black or British Black: African
  • Black or British Black: any other
  • Mixed: 4 categories
SLIDE 35

Capturing ethnicity: a way forward?

SLIDE 36

Human Ancestry Ontology

SLIDE 37

Human Ancestry Ontology
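An ontology such as the Human Ancestry Ontology can help with the "different levels of granularity" problem: fine-grained terms can be rolled up along is-a links to a coarser shared ancestor before comparison. The sketch below uses a tiny, made-up hierarchy built from category labels on the previous slides. It does not reproduce HANCESTRO's actual terms, identifiers or structure, and the function names are hypothetical.

```python
from typing import Optional

# Toy child -> parent links for illustration; NOT HANCESTRO terms or structure.
PARENT = {
    "Finnish": "European",
    "British": "European",
    "Irish": "European",
    "Chinese": "East Asian",
    "Pakistani": "South Asian",
    "Indian": "South Asian",
    "East Asian": "Asian",
    "South Asian": "Asian",
    "European": "ancestry category",
    "Asian": "ancestry category",
}


def lineage(term: str) -> list[str]:
    """The term followed by its ancestors, most specific first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def roll_up(term: str, level: set[str]) -> Optional[str]:
    """Map a term to the first of itself/its ancestors found in a coarser scheme."""
    for candidate in lineage(term):
        if candidate in level:
            return candidate
    return None


def common_ancestor(a: str, b: str) -> Optional[str]:
    """Most specific term shared by both lineages, if any."""
    b_lineage = set(lineage(b))
    return next((t for t in lineage(a) if t in b_lineage), None)


coarse = {"European", "Asian"}
print(roll_up("Finnish", coarse), roll_up("Pakistani", coarse))
print(common_ancestor("British", "Irish"), common_ancestor("Finnish", "Chinese"))
```

Two datasets coded at different granularity (for example a Finnish/non-Finnish split versus a single "White: any other" bucket) can only be compared at the most specific level both schemes share, which is what `roll_up` and `common_ancestor` compute.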

SLIDE 38

Pedigree

  • Consanguinity?
  • Affected 1st degree relatives
  • Suspected mode of inheritance
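The first two pedigree questions can be answered mechanically once a pedigree is stored in structured form. The sketch below assumes a minimal PED-like representation (individual, father, mother, affected status) with hypothetical IDs. Consanguinity is detected only as "the parents share a recorded ancestor", which depends entirely on how many generations were captured; suspected mode of inheritance is left to the clinician.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    id: str
    father: Optional[str] = None
    mother: Optional[str] = None
    affected: bool = False


Pedigree = dict[str, Person]


def ancestors(ped: Pedigree, pid: str) -> set[str]:
    """All recorded ancestors of `pid` (parents, grandparents, ...)."""
    found: set[str] = set()
    stack = [pid]
    while stack:
        person = ped.get(stack.pop())
        if person is None:
            continue
        for parent in (person.father, person.mother):
            if parent and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def first_degree_relatives(ped: Pedigree, pid: str) -> set[str]:
    """Parents, children and full siblings of `pid`."""
    me = ped[pid]
    relatives = {p for p in (me.father, me.mother) if p}
    for other in ped.values():
        if other.id == pid:
            continue
        if pid in (other.father, other.mother):
            relatives.add(other.id)  # child
        elif me.father and me.mother and (other.father, other.mother) == (me.father, me.mother):
            relatives.add(other.id)  # full sibling
    return relatives


def affected_first_degree(ped: Pedigree, pid: str) -> set[str]:
    return {r for r in first_degree_relatives(ped, pid) if r in ped and ped[r].affected}


def parents_consanguineous(ped: Pedigree, pid: str) -> bool:
    """True if the parents of `pid` share a recorded ancestor (or one is the other's ancestor)."""
    me = ped[pid]
    if not (me.father and me.mother):
        return False
    paternal = {me.father} | ancestors(ped, me.father)
    maternal = {me.mother} | ancestors(ped, me.mother)
    return bool(paternal & maternal)


# Hypothetical first-cousin union: F and M are children of siblings A and B.
ped = {p.id: p for p in [
    Person("GF"), Person("GM"),
    Person("A", "GF", "GM"), Person("B", "GF", "GM"),
    Person("X"), Person("Y"),
    Person("F", "A", "X"), Person("M", "Y", "B"),
    Person("P", "F", "M", affected=True),  # proband
    Person("S", "F", "M", affected=True),  # affected sibling
]}
print(parents_consanguineous(ped, "P"), affected_first_degree(ped, "P"))
```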
SLIDE 39

Building tools to support implementation

SLIDE 40

GA4GH

Clinical & Phenotypic Data Capture Work Stream 2019 Roadmap

  • Deliverable #1: Definition of phenotype models for different clinical domains with driver projects
  • Deliverable #2: Phenopackets on FHIR
  • Deliverable #3: Pedigree representation

SLIDE 41

Phenopackets can help us do better

  • Craniosynostosis
  • Brachydactyly
  • Proptosis
  • Broad thumb…

How severe are these? Are some more severe than others?

When were they first observed?

Were they NOT observed?

How are these linked to a patient? To genomic info? To samples? To parents and siblings?
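As a rough illustration of how a phenopacket answers these questions, the Python dict below follows Phenopacket schema v2 field names as understood here: `phenotypicFeatures` with `excluded`, `severity` and `onset`, plus `subject`, `biosamples` and `metaData`. The HPO IDs should be checked against the current HPO release. The patient, sample, identifiers and dates are invented. Variants (under `interpretations`) and family links (the separate `Family` message) are omitted, and nothing here is validated against the schema.

```python
import json


def term(term_id: str, label: str) -> dict:
    """OntologyClass-style {id, label} pair."""
    return {"id": term_id, "label": label}


SEVERE = term("HP:0012828", "Severe")
CONGENITAL = {"ontologyClass": term("HP:0003577", "Congenital onset")}

phenopacket = {
    "id": "example-phenopacket-1",  # invented identifier
    "subject": {"id": "proband-1", "sex": "MALE"},
    "phenotypicFeatures": [
        # Severity and onset answer "how severe?" and "when first observed?".
        {"type": term("HP:0001363", "Craniosynostosis"), "severity": SEVERE, "onset": CONGENITAL},
        {"type": term("HP:0001156", "Brachydactyly"), "onset": CONGENITAL},
        {"type": term("HP:0011304", "Broad thumb"), "onset": CONGENITAL},
        # Onset given as an age (ISO 8601 duration) instead of an ontology class.
        {"type": term("HP:0000520", "Proptosis"), "onset": {"age": {"iso8601duration": "P6M"}}},
        # excluded=True records that the feature was looked for and NOT observed.
        {"type": term("HP:0000238", "Hydrocephalus"), "excluded": True},
    ],
    # individualId links the sample back to the subject.
    "biosamples": [{"id": "sample-1", "individualId": "proband-1"}],
    "metaData": {
        "created": "2019-01-01T00:00:00Z",
        "createdBy": "example",
        "phenopacketSchemaVersion": "2.0",
        "resources": [],  # should list the HPO release actually used
    },
}

print(json.dumps(phenopacket, indent=2))
```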

SLIDE 42

Phenopackets

  • Diseases, e.g. Abetalipoproteinemia
  • Variants
  • Biosamples

SLIDE 43

Phenopackets on FHIR
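Carrying a phenotypic feature into FHIR means expressing it as a FHIR resource. The function below maps one phenopacket-style feature dict to a plain FHIR R4 `Observation`. It uses the POS/NEG codes from the HL7 v3 ObservationInterpretation code system to distinguish observed from excluded findings. This is a generic sketch under those assumptions and does not reproduce the profiles of the Phenopackets-on-FHIR implementation guide. The HPO system URI is an assumption to check against that guide.

```python
import json

HPO_SYSTEM = "http://purl.obolibrary.org/obo/hp.owl"  # assumption; confirm against the IG
V3_INTERPRETATION = "http://terminology.hl7.org/CodeSystem/v3-ObservationInterpretation"


def feature_to_observation(feature: dict, patient_id: str) -> dict:
    """Map {"type": {"id", "label"}, "excluded"?} to a minimal FHIR R4 Observation."""
    excluded = feature.get("excluded", False)
    code, display = ("NEG", "Negative") if excluded else ("POS", "Positive")
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": HPO_SYSTEM,
                             "code": feature["type"]["id"],
                             "display": feature["type"]["label"]}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "interpretation": [{"coding": [{"system": V3_INTERPRETATION,
                                        "code": code, "display": display}]}],
    }


feature = {"type": {"id": "HP:0000238", "label": "Hydrocephalus"}, "excluded": True}
print(json.dumps(feature_to_observation(feature, "proband-1"), indent=2))
```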

SLIDE 44

Thank you!

A/Prof Zornitza Stark

Clinical Geneticist, Victorian Clinical Genetics Services; Murdoch Children’s Research Institute

Dr Alejandro Metke

Senior Research Scientist and Health Data Interoperability Team Leader, Australian e-Health Research Centre, CSIRO