CPTR RDST Data Platform Concept
September 22, 2014
Outline
– C-Path overview and examples of data projects
– Knowledge sharing concept
– RDST approach
– Examples of RDST data types
– Database architecture
– Next steps timeline
CPTR-RDST Data Platform 2014 Workshop Slides
C-Path consortia and focus areas:
– Coalition Against Major Diseases (CAMD): understanding diseases of the brain
– Critical Path to TB Drug Regimens (CPTR): testing drug combinations
– Multiple Sclerosis Outcome Assessments Consortium (MSOAC): drug effectiveness in MS
– Polycystic Kidney Disease Consortium (PKD): new imaging biomarkers
– Patient-Reported Outcome Consortium: drug effectiveness
– Electronic Patient-Reported Outcome Consortium: drug effectiveness
– Predictive Safety Testing Consortium: drug safety
Seven global consortia developing novel drug development tools: biomarkers, clinical outcome assessment instruments, clinical trial simulation tools, in vitro tools, and data standards.
Current C-Path examples:
– CAMD: AD clinical trial simulation tool
– PKD: biomarker qualification project
– MSOAC: new outcome assessment instrument for MS
CDC TB study data now available
Consortium | Therapeutic area | # of studies | Total number | # of data contributors
Coalition Against Major Diseases | Alzheimer's disease | 27 | 7340 | 11
Coalition Against Major Diseases | Parkinson's disease | 7 | 2597 | 2
Critical Path to TB Drug Regimens | Tuberculosis | 10 | 2495 | 5
MS Outcome Assessments Consortium | Multiple sclerosis | 6 | 4700 | 4
Polycystic Kidney Disease | Polycystic kidney disease | 5 | 2941 | 4
Predictive Safety Testing Consortium | Normal healthy volunteer-kidney | 1 | 172 | 1
Predictive Safety Testing Consortium | Skeletal-muscular (non-clinical) | 38 | 1766 | 6
Predictive Safety Testing Consortium | Hepato-toxicity (non-clinical) | 43 | 2340 | 7
Predictive Safety Testing Consortium | Nephro-toxicity (non-clinical) | 14 | 941 | 8
CAMD Alzheimer's disease example:
– Nine member companies agreed to share data from 24 Alzheimer's disease (AD) trials.
– The data were not in a common format.
– The data were remapped to the CDISC AD standard and pooled.
– A new clinical trial simulation tool was created; it became the first model endorsed by the FDA and EMA.
– Researchers are using the database to advance research.
Start point: 24 studies, >6,500 patients.
Result: model endorsed by FDA and EMA; access to AD data available to qualified researchers.
Future TB model
CPTR TB Drug Resistance DB
Data Platform to Inform Assay Development
How do we build this system?
Figure: Linking global TB sequence researchers. Components: a sequence repository (anonymized sequence data, clinical annotation, phenotypic methods, user-friendly cloud interface); generated analysis files; an investigational DB (consolidated DBs, approved members); expert panel review; and a validated DR biomarkers DB for regulatory submission. Goal: move investigational biomarkers to validated status.
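The flow above promotes drug-resistance (DR) biomarkers from investigational to validated status only after expert panel review. A minimal Python sketch of that gate, assuming a simple two-state status; the `Biomarker` record, `expert_panel_review` and `regulatory_export` are hypothetical names for illustration, not the platform's actual design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    INVESTIGATIONAL = "investigational"
    VALIDATED = "validated"


@dataclass
class Biomarker:
    # Hypothetical record: a gene and a mutation within it
    gene: str
    mutation: str
    status: Status = Status.INVESTIGATIONAL
    review_log: List[str] = field(default_factory=list)


def expert_panel_review(marker: Biomarker, approved: bool, note: str) -> Biomarker:
    """Record a panel decision; only an approval promotes the marker.

    Assumption: promotion is one-way, so this function never demotes
    a marker that is already validated.
    """
    decision = "APPROVED" if approved else "NOT APPROVED"
    marker.review_log.append(f"{decision}: {note}")
    if approved and marker.status is Status.INVESTIGATIONAL:
        marker.status = Status.VALIDATED
    return marker


def regulatory_export(markers: List[Biomarker]) -> List[Biomarker]:
    """Select only validated markers for the regulatory-submission DB."""
    return [m for m in markers if m.status is Status.VALIDATED]


if __name__ == "__main__":
    a = expert_panel_review(Biomarker("geneA", "X1Y"), True, "evidence sufficient")
    b = expert_panel_review(Biomarker("geneB", "X2Z"), False, "more isolates needed")
    print([(m.gene, m.status.value) for m in regulatory_export([a, b])])
```

Keeping the review log on each record is one way to preserve an audit trail for every promotion decision; whether a regulatory submission needs exactly this is an assumption.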
How do we accomplish this?
– Improved research resource to enable development of new rapid diagnostics for TB
– With sustainability funding: resource for clinicians
– Apply technology product development discipline
– Design to handle a wide range of data types
– Quality criteria and defined process for incoming data
– Lean, efficient and well-managed implementation
– Expandable / adaptable / flexible
– Great usability
Related efforts: TBDReaMDB (http://www.tbdreamdb.com/index.html)
Example of future objective: Stanford HIV database (http://hivdb.stanford.edu/)
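The Stanford HIV database pairs a curated mutation repository with a genotypic resistance interpretation program. A TB counterpart would need a comparable step that turns observed mutations into per-drug calls. A minimal sketch, assuming a hypothetical catalogue keyed by (gene, mutation); `interpret`, `summarize`, and all gene, mutation and drug names in it are placeholders, and real entries would come from validated, expert-reviewed criteria.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical catalogue type: (gene, mutation) -> drug the mutation is graded against.
Catalogue = Dict[Tuple[str, str], str]


def interpret(mutations: Iterable[Tuple[str, str]],
              catalogue: Catalogue,
              drugs: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per drug, the catalogued resistance mutations found in one sample."""
    calls: Dict[str, List[str]] = {drug: [] for drug in drugs}
    for gene, change in mutations:
        drug = catalogue.get((gene, change))
        if drug in calls:  # uncatalogued mutations give None and are skipped
            calls[drug].append(f"{gene} {change}")
    return calls


def summarize(calls: Dict[str, List[str]]) -> Dict[str, str]:
    """Collapse per-drug evidence into a reporting label."""
    return {
        drug: "resistance mutation detected" if hits else "no catalogued mutation detected"
        for drug, hits in calls.items()
    }


if __name__ == "__main__":
    # Placeholder names only; not real TB resistance markers.
    catalogue = {("geneA", "S10L"): "drug1", ("geneB", "D20G"): "drug2"}
    sample = [("geneA", "S10L"), ("geneC", "A5V")]
    print(summarize(interpret(sample, catalogue, ["drug1", "drug2"])))
```

An empty list only means that no catalogued mutation was found; the sketch deliberately does not report that as susceptibility.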
Development approach: Requirements → Prototype → Design → Build → Test → Deploy → Support and enhance
– Test: verification of all features and functions; usability; performance; scalability
– Deploy to: RDST members; qualified external researchers
RDST Data: multiple data types
The platform needs to incorporate multiple types of data, which must be analyzed to find and validate correlations.
RDST Data: genotypic data example
@M00347:61:000000000-A9B8J:1:1101:15324:1677 1:N:0:1
TCTTGATCGCGAGTTCGCGGCCCGGGGTGAGCACCCAGGTGAGCGGGAAATGCGTGGTGTCGTGGTAGCTGACGTCGACGATGCCGTGGCG
+
11>>1@BF1>>11AEF00000AA////A//A1AB/?/GAEC1GBE??///FFG/?E?EFHF/F?A?EG1BDBC/FCGGCC<FACHCCG/>CC-.<>10<<-<
@M00347:61:000000000-A9B8J:1:1101:15765:1689 1:N:0:1
TGCGATTGCAGCGCGTCGGCGTCGGTGGTGTAACCGGTCTTGGTCTTCTTGGTCTTCGGCATCTCCAGCTCGTCGAACAGAACGGCCTGCA
+
11111>A1D31B1A0EE00A/EA//E//EAFGHHFGCEEGHDBHHHHHHHBDHHHHFG/E?EHHHGFFGHHGEFG?FHGEEHHEGGGGGHGGH
@M00347:61:000000000-A9B8J:1:1101:15578:1705 1:N:0:1
CTCGACGTCGGCAAGGGTCAGGTCGTGGTGGTGCTCGGCCCCTCGGGCTCGGGCAAATCGACGTTGTGCCGCACGATCAATCGCCTCGAG
+
1>11>ADDA?1000000BFFFFHF0E?/AFEECGHH//AEEEGGEC?GGH?/@@GGFHHE0EEEFHGGHHGGGEGGHHGHFHGGGGGG/CHGG
@M00347:61:000000000-A9B8J:1:1101:13636:1714 1:N:0:1
CGATTCGACGGCCTGTTCATCGCCGACGTGCTCGGTACCTACGACGTGTACGGCGGCAGCGACGAGGCCGCGATCCGTCACGCCGCGCAG
+
111>AFABA1@AAEFGGGFGFFCFA?E/EFHGHGGGHGHHHHE?FGGGHHHHGGGGGEEECEGGGG?CGGGGGGGGHGHGGCFGCCGGG
@M00347:61:000000000-A9B8J:1:1101:15489:1729 1:N:0:1
TATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACAAACCAATAAACAA
TB FASTQ file – 650 MB uncompressed
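The reads above follow the four-line FASTQ layout (header, bases, separator, qualities). A minimal streaming parser sketch, assuming Phred+33 quality encoding (usual for current Illumina output, but an assumption here) and unwrapped sequence lines; `parse_fastq` and `mean_quality` are illustrative names. Reading one record at a time keeps memory flat for files of hundreds of megabytes.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    read_id: str           # header line without the leading '@'
    sequence: str          # base calls
    qualities: List[int]   # per-base Phred quality scores


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record.

    Assumes Phred+33 quality encoding and unwrapped sequence lines.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of stream
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of one read (0.0 for an empty read)."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)


if __name__ == "__main__":
    demo = "@read1\nACGT\n+\nII#5\n"  # synthetic record, not from the TB file
    for rec in parse_fastq(io.StringIO(demo)):
        print(rec.read_id, rec.qualities, mean_quality(rec))
```

For gzip-compressed uploads, the same parser can read from the text stream returned by `gzip.open(path, "rt")`.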
RDST Data: SNP report example
RDST Data: phenotypic data example (https://tbdr.org/cgi/tbdr)
RDST Data: clinical data example (hypothetical data)

Demographics (DM):
STUDYID | DOMAIN | USUBJID | AGE | SEX | RACE | ARM
19 | DM | 10001 | 27 | F | WHITE | Ethambutol 5 Times Per Week
19 | DM | 10002 | 63 | M | WHITE | Moxifloxacin 3 Times Per Week
19 | DM | 10003 | 42 | M | BLACK OR AFRICAN AMERICAN | Moxifloxacin 5 Times Per Week
19 | DM | 10004 | 30 | F | ASIAN | Moxifloxacin 5 Times Per Week
19 | DM | 10005 | 29 | M | BLACK OR AFRICAN AMERICAN | Moxifloxacin 3 Times Per Week
19 | DM | 10006 | 35 | M | BLACK OR AFRICAN AMERICAN | Ethambutol 3 Times Per Week
19 | DM | 10007 | 46 | F | UNKNOWN | Ethambutol 3 Times Per Week
19 | DM | 10008 | 34 | F | BLACK OR AFRICAN AMERICAN | Moxifloxacin 5 Times Per Week
19 | DM | 10009 | 55 | M | BLACK OR AFRICAN AMERICAN | Ethambutol 3 Times Per Week
19 | DM | 10010 | 42 | M | ASIAN | Moxifloxacin 5 Times Per Week
19 | DM | 10011 | 23 | F | BLACK OR AFRICAN AMERICAN | Ethambutol 3 Times Per Week
19 | DM | 10012 | 47 | F | WHITE | Ethambutol 3 Times Per Week
19 | DM | 10013 | 25 | F | BLACK OR AFRICAN AMERICAN | Moxifloxacin 5 Times Per Week
19 | DM | 10014 | 21 | M | WHITE | Ethambutol 3 Times Per Week
19 | DM | 10015 | 79 | M | WHITE | Moxifloxacin 3 Times Per Week
19 | DM | 10016 | 27 | F | ASIAN | Moxifloxacin 3 Times Per Week
19 | DM | 10017 | 37 | M | BLACK OR AFRICAN AMERICAN | Ethambutol 3 Times Per Week
19 | DM | 10018 | 28 | M | BLACK OR AFRICAN AMERICAN | Moxifloxacin 3 Times Per Week

Microbiology Specimen (MB):
STUDYID | DOMAIN | USUBJID | MBTESTCD | MBTEST | MBORRES | MBSPEC | VISIT
13 | MB | 10001 | AFB | Acid Fast Bacilli | NEGATIVE | SPONT SPUTUM | WEEK 8
13 | MB | 10001 | ORGANISM | Organism Present | NEGATIVE FOR TUBERCULOSIS | SPONT SPUTUM | WEEK 4
13 | MB | 10001 | MTBINH | M.tuberculosis INH Resistant | POSITIVE | NON-OVERNIGHT SP | SCREENING
15 | MB | 10001 | ORGANISM | Organism Present | NEGATIVE FOR TUBERCULOSIS | SPONT SPUTUM | WEEK 4
13 | MB | 10002 | AFB | Acid Fast Bacilli | NEGATIVE | SPONT SPUTUM | WEEK 8
13 | MB | 10002 | ORGANISM | Organism Present | POSITIVE FOR M. TUBERCULOSIS COMPLEX | SPONT SPUTUM | SCREENING
15 | MB | 10002 | ORGANISM | Organism Present | POSITIVE FOR M. TUBERCULOSIS COMPLEX | SPONT SPUTUM | SCREENING
13 | MB | 10003 | ORGANISM | Organism Present | NEGATIVE FOR TUBERCULOSIS | INDUCED SPUTUM | WEEK 4
15 | MB | 10004 | ORGANISM | Organism Present | NEGATIVE FOR TUBERCULOSIS | SPONT SPUTUM | WEEK 4

Morphology (MO):
STUDYID | DOMAIN | USUBJID | MOTESTCD | MOTEST | MOORRES | MOSTRESC | MOLOC | VISIT | MODY
17 | MO | 10001 | CAVIT | Cavitation | Y | Y | LUNG, LEFT | SCREENING |
17 | MO | 10002 | CAVIT | Cavitation | Y | Y | LUNG, LEFT | SCREENING |
17 | MO | 10002 | PLEURALD | Pleural Disease | N | N | LUNG, LEFT | SCREENING | 1
17 | MO | 10004 | PLEURALD | Pleural Disease | N | N | LUNG, LEFT | SCREENING |
17 | MO | 10005 | CAVIT | Cavitation | N | N | LUNG, LEFT | SCREENING |
17 | MO | 10005 | CAVIT | Cavitation | N | N | LUNG, LEFT | SCREENING |
17 | MO | 10006 | CAVIT | Cavitation | Y | Y | LUNG, LEFT | SCREENING | 1
17 | MO | 10006 | PLEURALD | Pleural Disease | N | N | LUNG, LEFT | SCREENING |
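SDTM findings domains such as MB are stored in long form (one row per test per visit), while analyses such as tracking culture results over time usually need them reshaped per subject. A minimal sketch using a few MB rows from the hypothetical example above; `MB_ROWS` and `pivot_by_visit` are illustrative names, not part of any CDISC tooling.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Subset of the hypothetical MB rows above:
# (STUDYID, USUBJID, MBTESTCD, MBORRES, VISIT)
MB_ROWS: List[Tuple[str, str, str, str, str]] = [
    ("13", "10001", "AFB", "NEGATIVE", "WEEK 8"),
    ("13", "10001", "ORGANISM", "NEGATIVE FOR TUBERCULOSIS", "WEEK 4"),
    ("15", "10001", "ORGANISM", "NEGATIVE FOR TUBERCULOSIS", "WEEK 4"),
    ("13", "10002", "ORGANISM", "POSITIVE FOR M. TUBERCULOSIS COMPLEX", "SCREENING"),
    ("13", "10003", "ORGANISM", "NEGATIVE FOR TUBERCULOSIS", "WEEK 4"),
]


def pivot_by_visit(rows: List[Tuple[str, str, str, str, str]],
                   testcd: str) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Reshape long-format findings into {(STUDYID, USUBJID): {VISIT: result}}.

    STUDYID stays in the key because the same USUBJID appears in
    more than one study in this example.
    """
    table: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
    for studyid, usubjid, code, result, visit in rows:
        if code == testcd:
            table[(studyid, usubjid)][visit] = result
    return dict(table)


if __name__ == "__main__":
    for key, visits in sorted(pivot_by_visit(MB_ROWS, "ORGANISM").items()):
        print(key, visits)
```

If a subject had the same test twice at one visit, the later row would overwrite the earlier one; a production version would need an explicit rule for that case.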
RDST Data: TB strain summary table (http://www.ncbi.nlm.nih.gov/genome/genomes/166)
RDST data platform: design to handle multiple data types
Figure: a single research database combining subject-level clinical trial data (variables and observations per subject over time), surveillance data, genotypic data (strain sequence data) and phenotypic data, with CDISC data standards applied, feeding data analysis.
Key success factors for incoming data
– Survey and prioritize
– Proactive engagement
– Recognition and incentives for contributions
– Develop and vet during initial data survey and prioritization
– Unified pipeline for incoming sequence data
– Ability to apply CDISC standards to create an efficient database (rather than a large number of small data buckets)
Quality criteria and defined process for incoming sequence data
RDST unified pipeline for sequence data: incoming FASTQ files plus the associated SNP report → new SNP report generated with the RDST unified pipeline.
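One plausible use of having both the contributor's SNP report and the regenerated one is a concordance check before data are loaded. A minimal sketch, assuming each report has already been reduced to a set of (position, reference base, alternate base) calls; `compare_calls`, `passes_qc`, the tuple layout and the 0.95 Jaccard threshold are hypothetical, not an RDST quality criterion.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Call = Tuple[int, str, str]  # (genome position, reference base, alternate base)


@dataclass
class Concordance:
    shared: int
    only_submitted: List[Call]
    only_regenerated: List[Call]
    jaccard: float


def compare_calls(submitted: Set[Call], regenerated: Set[Call]) -> Concordance:
    """Compare a contributor's SNP calls with the unified pipeline's calls."""
    shared = submitted & regenerated
    union = submitted | regenerated
    return Concordance(
        shared=len(shared),
        only_submitted=sorted(submitted - regenerated),
        only_regenerated=sorted(regenerated - submitted),
        # Jaccard index of the two call sets; defined as 1.0 when both are empty
        jaccard=len(shared) / len(union) if union else 1.0,
    )


def passes_qc(result: Concordance, min_jaccard: float = 0.95) -> bool:
    """Hypothetical acceptance rule: below the threshold, flag for manual review."""
    return result.jaccard >= min_jaccard


if __name__ == "__main__":
    # Hypothetical calls for illustration only
    submitted = {(100, "C", "T"), (200, "G", "A"), (300, "A", "G")}
    regenerated = {(100, "C", "T"), (200, "G", "A"), (400, "T", "C")}
    result = compare_calls(submitted, regenerated)
    print(result, passes_qc(result))
```

Listing the discordant calls, not just the score, gives a reviewer something concrete to check against the raw reads.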
TB clinical data mapping to CDISC (we do this today for CPTR)
Steps: map each data element to CDISC domains, assign CDISC variables, apply controlled terminology.
– Data element: Phase of TB treatment → Exposure (EX): USUBJID, EXTRT, EXDOS, EXDOSU
– Data element: TB symptoms → Clinical Events (CE): USUBJID, CETERM, CEPRESP, CEOCCUR
– Data element: Tuberculin skin test result → Skin Response (SR): USUBJID, SRTESTCD, SRTEST, SRORRES, SRORRESU
  Definition: the diameter, in millimeters, of the induration (raised hardening) at the tuberculin skin test site.
  Permissible value set: mm
  Example row: USUBJID 12345 | SRTESTCD INDURDIA | SRTEST Induration Diameter | SRORRES 16 | SRORRESU mm
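The tuberculin skin test mapping can be expressed as a small conversion function. A minimal sketch, assuming raw readings arrive as a subject ID, a number and a unit string; `SRRecord` and `map_tst_result` are hypothetical names, and the field names simply mirror the SR variables listed above. Readings outside the permissible value set are rejected rather than converted.

```python
from dataclasses import dataclass

# Permissible value set from the data element definition above.
PERMISSIBLE_UNITS = {"mm"}


@dataclass(frozen=True)
class SRRecord:
    # Field names mirror the SDTM SR variables, hence the upper case.
    USUBJID: str
    SRTESTCD: str
    SRTEST: str
    SRORRES: str
    SRORRESU: str


def map_tst_result(subject_id: str, induration: float, unit: str) -> SRRecord:
    """Map one raw tuberculin skin test reading to an SR row.

    Raises ValueError for units outside the permissible value set
    or for a negative diameter.
    """
    unit = unit.strip()
    if unit not in PERMISSIBLE_UNITS:
        raise ValueError(f"unit {unit!r} not in permissible set {sorted(PERMISSIBLE_UNITS)}")
    if induration < 0:
        raise ValueError("induration diameter cannot be negative")
    # SRORRES holds the result as collected, as text; ':g' drops a trailing '.0'.
    return SRRecord(subject_id, "INDURDIA", "Induration Diameter", f"{induration:g}", unit)


if __name__ == "__main__":
    print(map_tst_result("12345", 16, "mm"))
```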
Rapid DST Data Platform
CDISC Study Data Tabulation Model (SDTM) domains for classification of data elements:
– Special Purpose: Demographics, Comments, Subject Elements, Subject Visits
– Interventions: Con Meds, Exposure, Substance Use
– Events: Adverse Events, Disposition, Medical History, Deviations, Clinical Events
– Findings: ECG, Incl/Excl Exceptions, Drug Accountability, Labs, Microbiology Specimen, Microbiology Susceptibility, PK Concentrations, PK Parameters, Physical Exam, Questionnaire, Subject Characteristics, Vital Signs
– Findings About
– Trial Design: Trial Elements, Trial Arms, Trial Visits, Trial Incl/Excl, Trial Summary
Data Mapping: SNP report example
PFORREF – reference result (nucleotide or amino acid, depending on the value in PFTEST)
PFORRES – experimental result (nucleotide or amino acid, depending on the value in PFTEST)
PFRESCAT – category of result (e.g., nonsense, missense or frameshift mutation)
PFGENTYP – type of feature examined (gene, sector, protein, etc.)
PFGENRI – region of interest (the specific gene or locus examined)
PFSTRESC – standard result of the analysis, usually in HGVS nomenclature
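A single amino-acid change from a SNP report could populate the PF variables listed above roughly as follows. A minimal sketch, assuming the change is already given as reference amino acid, codon number and observed amino acid; `PFRecord`, `snp_to_pf`, the `"GENE"` feature type and the category strings (`MISSENSE`, `NONSENSE`, `SYNONYMOUS`) are illustrative, not CDISC controlled terminology. Frameshifts and stop-loss changes need HGVS forms the sketch does not produce, so stop-loss input is rejected.

```python
from dataclasses import dataclass

# One-letter to three-letter amino-acid codes used in HGVS protein
# notation; "*" (translation stop) is written "Ter".
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


@dataclass(frozen=True)
class PFRecord:
    PFGENRI: str   # region of interest: gene or locus
    PFGENTYP: str  # type of feature (hypothetical value "GENE" below)
    PFORREF: str   # reference result (an amino acid here)
    PFORRES: str   # experimental result (an amino acid here)
    PFRESCAT: str  # category of result (illustrative labels)
    PFSTRESC: str  # standard result in HGVS protein notation


def classify(ref_aa: str, alt_aa: str) -> str:
    """Categorize a single amino-acid substitution."""
    if ref_aa == alt_aa:
        return "SYNONYMOUS"
    if alt_aa == "*":
        return "NONSENSE"
    return "MISSENSE"


def snp_to_pf(gene: str, ref_aa: str, codon: int, alt_aa: str) -> PFRecord:
    """Build a PF-style row for one amino-acid change at `codon` in `gene`."""
    if ref_aa == "*" and alt_aa != "*":
        raise ValueError("stop-loss changes need HGVS extension notation")
    ref3, alt3 = THREE_LETTER[ref_aa], THREE_LETTER[alt_aa]
    hgvs = f"p.{ref3}{codon}=" if ref_aa == alt_aa else f"p.{ref3}{codon}{alt3}"
    return PFRecord(gene, "GENE", ref_aa, alt_aa, classify(ref_aa, alt_aa), hgvs)


if __name__ == "__main__":
    print(snp_to_pf("geneA", "S", 315, "T"))  # placeholder gene name
```

HGVS recommends parentheses around protein consequences predicted from DNA sequence, e.g. p.(Ser315Thr); the sketch omits them for brevity.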
Three primary categories of data
– Master copy
– Full complement of data for RDST consortium use
– Authorized subset for external researchers (as broad as possible within the sharing terms and conditions imposed by each data contributor)
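The three categories could be served as filtered views of one master copy, matching the access levels (data team, RDST, external) named for the investigational DB. A minimal sketch, assuming each dataset carries a `released` flag (passed intake and loaded for consortium use) and an `external_sharing_allowed` flag derived from the contributor's terms; both flags, `Tier` and `visible_datasets` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    DATA_TEAM = "data team"  # works on the master copy
    RDST = "RDST"            # full complement for consortium use
    EXTERNAL = "external"    # authorized subset for external researchers


@dataclass(frozen=True)
class Dataset:
    name: str
    released: bool                  # hypothetical: passed intake and loaded for consortium use
    external_sharing_allowed: bool  # hypothetical: derived from the contributor's sharing terms


def visible_datasets(master: List[Dataset], tier: Tier) -> List[Dataset]:
    """Return the slice of the master copy that a user at `tier` may see."""
    if tier is Tier.DATA_TEAM:
        return list(master)
    released = [d for d in master if d.released]
    if tier is Tier.RDST:
        return released
    return [d for d in released if d.external_sharing_allowed]


if __name__ == "__main__":
    master = [Dataset("a", True, True), Dataset("b", True, False), Dataset("c", False, True)]
    for tier in Tier:
        print(tier.value, [d.name for d in visible_datasets(master, tier)])
```

Filtering one master copy by flags, rather than maintaining three physical copies, keeps the views consistent when a contributor's sharing terms change; whether the platform is built this way is not stated in the slides.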
+ Lean, efficient and well-managed implementation
+ Expandable / adaptable / flexible
+ Great usability
+ Strong alignment with anticipated analysis use cases
RDST Data Sharing Platform Timeline v4 (Gantt chart, September 2014 to October 2017)
Phases: Phase 1, build and deploy; Phase 2, expanded access; Phase 3, sustainability.
C-Path milestones: Data Platform available for consortium members; Data Platform available for external researchers; sustainability funding secured.
FIND milestones: 1.1.2 Early data packages available for inclusion in Data Platform; 1.2.2 Defined criteria for determination of resistance mutations; 1.2.3 Published algorithm for interpretation of genotypic data; 1.3.1 WHO report on guidelines/criteria for validation of assays to detect and interpret resistance mutations.
C-Path tasks:
– 1.1 Governance model
– 1.2 Value proposition, DUA updates and communication plan
– 1.3 Requirements, architecture and design
– 1.4 Request early data (request early data again later)
– 1.5 Development and test, phase 1
– 1.6 Load early data
– 1.6/2.2/3.3 Prepare and load contributed data in Data Platform as it becomes available
– Development, phase 2; 1.7 Test, phase 2
– Development, phase 3; 1.8 Test, phase 3
– 1.9 Prepare for production
– 2.1/3.2 Monitor performance and usage
– 2.3/3.4 Review and approve access requests as they are submitted
– 2.4 Perform phase 1 program assessment
– 2.5/3.7 Expand capacity
– 2.6 Enable for external researchers
– 2.7 Beta test
– 2.8/3.1 Pursue sustainability funding
– 3.5 Perform phase 2 program assessment
– 3.6/3.8 Release 2 development/test
– 3.9 Pursue funding to support clinical use
FIND tasks:
– 1.1.1 Inventory of available DBs
– 1.1.2 Prepare data packages for inclusion in Data Platform
– 1.1.3 Input to C-Path on design of Data Platform
– 1.2.1 Form Expert Panel and support Data Platform development and use
– 1.2.2 Develop criteria for determination of resistance mutations
– 1.2.3 Develop algorithms for interpretation of genotypic data
– 1.3.1 Develop guidelines/criteria for clinical validation of assays to detect/interpret resistance mutations
– 1.4.1 Support for development of access models and tools for broad access
– 1.5.1/1.5.2 Support for sustainable business model and review process
– Assist with development of value proposition and communications plan
www.c-path.org
Figure: platform architecture. Components: incoming data storage (FASTQ data files); an investigational DB with user access levels (data team, RDST, external); a user-friendly cloud interface; internal and external access paths.