DRN OC Updates
October 5, 2015
Agenda
Discussion of revised CDM Implementation FAQs: Shelley Rusincovitch
Phase 2 data characterization update: Laura Qualls
Defining analysis-ready: Lesley Curtis
Revised CDM implementation FAQs
Shelley Rusincovitch
Relational Database Management Systems (RDBMSs) and SAS
In Phase II, different PCORnet activities will run within the RDBMS and SAS instances at each DataMart
- For example, menu-driven querying within the RDBMS; data characterization within SAS
Question: To support both SAS queries and menu-driven SQL queries, will we need SAS datasets, a relational database management system (RDBMS) database, or both?
Response: Both the RDBMS and SAS instances need to be present, so the question is really about the data stores.
Data Stores for RDBMS-SAS
Each site has 2 basic options:
1. Most straightforward configuration: the site stores its data in 2 parallel instances, an RDBMS schema and a SAS dataset collection
2. Option for advanced technical teams: the site configures its SAS instance to run distributed SAS programs against 1 data store in its RDBMS tables
It is essential for each site to work with its institution’s SAS technical team to determine the optimal SAS configuration at the site
Read the complete SAS FAQs at https://pcornet.centraldesktop.com/p/aQAAAAACXx_y
Both configurations need the SAS platform
[Figure: a PCORnet distributed activity (distributed SQL query and distributed SAS module) reaches the site's RDBMS instance (such as Oracle, SQL Server, etc.) and its SAS instance on the SAS platform, all inside the institution's firewall.
Configuration A: Parallel Data Stores. CDM data stored in RDBMS tables and in a SAS dataset collection.
Configuration B: Stand-alone RDBMS Data Store. CDM data stored in RDBMS tables only.]
Considerations for RDBMS-SAS configurations
Running distributed SAS programs against RDBMS tables is an advanced technical setup
- This configuration may result in suboptimal performance, system resource use, and response time:
- As SAS reads the data, the RDBMS optimizer may not be able to compensate for these scans, so performance optimization must be considered on the SAS side as well
- The PCORnet SAS programs must still run without modification (except to change the libname to point to the correct data source)
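The "change only the libname" rule can be illustrated outside SAS. The Python sketch below is an analogy, not PCORnet code: a query function receives its data source as a parameter, so the same program runs unchanged against either data store. The function names, the in-memory SQLite stores, and the `DEMOGRAPHIC`/`PATID` layout are illustrative assumptions.

```python
import sqlite3

# Hypothetical query "program": it never hard-codes a data source and only
# receives a connection, much as a PCORnet SAS program only references a libname.
def count_patients(conn: sqlite3.Connection) -> int:
    """Count distinct patients in a DEMOGRAPHIC table (illustrative query)."""
    (n,) = conn.execute("SELECT COUNT(DISTINCT PATID) FROM DEMOGRAPHIC").fetchone()
    return n

def make_store(patids: list) -> sqlite3.Connection:
    """Build a throwaway in-memory store standing in for one copy of the CDM data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DEMOGRAPHIC (PATID TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO DEMOGRAPHIC VALUES (?)", [(p,) for p in patids])
    return conn

# The only per-site setting is which store the program points at
# (the analogue of editing the libname); the program body never changes.
data_sources = {
    "configuration_a_sas_copy": make_store(["p1", "p2", "p3"]),
    "configuration_b_rdbms_tables": make_store(["p1", "p2", "p3"]),
}

results = {name: count_patients(conn) for name, conn in data_sources.items()}
print(results)  # {'configuration_a_sas_copy': 3, 'configuration_b_rdbms_tables': 3}
```

Keeping the data source out of the program body is what lets the same distributed code serve both configurations.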
Data Store Backups
Sites should be able to quickly (within 1-2 weeks) revert to and use the prior analysis-ready data store
- Important in situations where, for example, the current DataMart refresh has issues or is found to be unusable
- The approach will depend on the site’s configuration:
- For parallel RDBMS-SAS data stores, sites may choose to archive the SAS dataset collection
- For a stand-alone RDBMS data store, the RDBMS tables would be archived
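One way to make the revert expectation concrete is to archive each refresh under its own identifier and restore the newest earlier one on demand. The Python sketch below assumes a file-based data store (for example, a directory holding a SAS dataset collection) and hypothetical refresh identifiers in ISO date form; it is not a PCORnet-prescribed procedure, and an RDBMS data store would more likely rely on the database's own backup tools.

```python
import shutil
import tempfile
from pathlib import Path

def archive_refresh(store: Path, archive_root: Path, refresh_id: str) -> Path:
    """Copy the live data store to archive_root/refresh_id (hypothetical layout)."""
    dest = archive_root / refresh_id
    shutil.copytree(store, dest)  # raises if this refresh was already archived
    return dest

def revert_to_prior(store: Path, archive_root: Path, bad_refresh_id: str) -> str:
    """Replace the live store with the newest archived refresh older than bad_refresh_id."""
    # Refresh IDs are assumed to be ISO dates, so string order matches time order.
    earlier = sorted(p.name for p in archive_root.iterdir() if p.name < bad_refresh_id)
    if not earlier:
        raise RuntimeError("no earlier analysis-ready refresh has been archived")
    shutil.rmtree(store)
    shutil.copytree(archive_root / earlier[-1], store)
    return earlier[-1]

# Usage: archive two refreshes, then roll back after the newer one proves unusable.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store, archive = root / "datamart", root / "archive"
    store.mkdir()
    archive.mkdir()
    table = store / "DEMOGRAPHIC.txt"  # placeholder file standing in for a dataset
    table.write_text("refresh 2015-07-01")
    archive_refresh(store, archive, "2015-07-01")
    table.write_text("refresh 2015-10-01")  # the new, problematic refresh
    archive_refresh(store, archive, "2015-10-01")
    restored = revert_to_prior(store, archive, "2015-10-01")
    contents = table.read_text()

print(restored, contents)  # 2015-07-01 refresh 2015-07-01
```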
Next Areas
Data characterization and analysis-ready classification
Role of the HARVEST table
DataMart refresh expectations
- Participation in specific study activities
Phase 2 Data Characterization
Laura Qualls
Phase 2 data characterization
Foundation for “analysis-ready”
Approach
- Built upon the foundation of Phase 1 data characterization
- Iterative changes to the query package to characterize additional tables
- Enhanced analytic tools to expedite the characterization process and facilitate comparisons between DataMarts and between DataMart refreshes
Query distribution specifics
- DRN Query Tool (PopMedNet) file distribution
- SAS programs that expect SAS version 9.3 or later and SAS data types (especially for dates and times)
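Because the query package relies on SAS data types for dates and times, it helps to recall how SAS stores them: a SAS date is a count of days since January 1, 1960, a SAS datetime is a count of seconds since midnight on that day, and a SAS time is a count of seconds since midnight. The Python helpers below are a minimal sketch for checking such values outside SAS (for instance, when comparing an RDBMS copy with the SAS dataset collection); the function names are hypothetical, and SAS missing values are not handled.

```python
from datetime import date, datetime, time, timedelta

SAS_EPOCH_DATE = date(1960, 1, 1)
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)

def sas_date_to_date(days: int) -> date:
    """SAS date value (days since 1960-01-01) -> datetime.date."""
    return SAS_EPOCH_DATE + timedelta(days=days)

def date_to_sas_date(d: date) -> int:
    """datetime.date -> SAS date value."""
    return (d - SAS_EPOCH_DATE).days

def sas_datetime_to_datetime(seconds: float) -> datetime:
    """SAS datetime value (seconds since 1960-01-01 00:00:00) -> datetime.datetime."""
    return SAS_EPOCH_DATETIME + timedelta(seconds=seconds)

def sas_time_to_time(seconds: int) -> time:
    """SAS time value (whole seconds since midnight) -> datetime.time."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return time(hours, minutes, secs)

print(sas_date_to_date(20366))             # 2015-10-05
print(date_to_sas_date(date(2000, 1, 1)))  # 14610
print(sas_time_to_time(34200))             # 09:30:00
```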
Phase 2 data characterization, continued
Query package v3.0
- Similar to Phase 1 data characterization package
- Characterizes the 7 expected tables (Demographic, Enrollment, Encounter, Diagnosis, Procedures, Vital, and Harvest)
- Includes approximately 15 new analytic queries, including some cross-table queries (e.g., patients with at least 1 diagnosis code and 1 vital measurement in the past year)
- Includes DataMart metadata (HARVEST table; SAS version and installed components; operating system; etc.)
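The example cross-table query can be expressed in SQL. The sketch below runs it with Python's built-in `sqlite3` module; the column names (`PATID`, `DX`, `ADMIT_DATE` in DIAGNOSIS; `PATID`, `MEASURE_DATE` in VITAL), ISO-format date strings, and the reading of "the past year" as the year ending on a fixed reference date are assumptions for illustration, not the query package's actual logic.

```python
import sqlite3

# Distinct patients with >= 1 diagnosis code AND >= 1 vital measurement
# in the year ending on :ref (dates stored as ISO 'YYYY-MM-DD' strings).
CROSS_TABLE_QUERY = """
SELECT COUNT(*) FROM (
    SELECT PATID FROM DIAGNOSIS
    WHERE DX IS NOT NULL
      AND ADMIT_DATE BETWEEN date(:ref, '-1 year') AND :ref
    INTERSECT
    SELECT PATID FROM VITAL
    WHERE MEASURE_DATE BETWEEN date(:ref, '-1 year') AND :ref
)
"""

def count_dx_and_vital(conn: sqlite3.Connection, ref_date: str) -> int:
    """INTERSECT keeps each qualifying PATID once, so this counts patients, not rows."""
    (n,) = conn.execute(CROSS_TABLE_QUERY, {"ref": ref_date}).fetchone()
    return n

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIAGNOSIS (PATID TEXT, ADMIT_DATE TEXT, DX TEXT);
    CREATE TABLE VITAL (PATID TEXT, MEASURE_DATE TEXT);
""")
conn.executemany("INSERT INTO DIAGNOSIS VALUES (?, ?, ?)", [
    ("p1", "2015-03-01", "401.9"),   # in window
    ("p1", "2015-04-01", "250.00"),  # second diagnosis, same patient
    ("p2", "2013-01-01", "250.00"),  # diagnosis too old
    ("p3", "2015-06-01", None),      # no diagnosis code
])
conn.executemany("INSERT INTO VITAL VALUES (?, ?)", [
    ("p1", "2015-05-01"), ("p2", "2015-05-01"), ("p3", "2015-05-01"),
])
print(count_dx_and_vital(conn, "2015-10-05"))  # 1
```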
Phase 2 data characterization timeline
October 2015
- Release code package for beta-testing query execution and response (not the full data characterization process)
November 2015
- Refine code based on beta-testing results; finalize query package
December 2015-January 2016
- Develop data characterization tools and reports
February 2016
- Phase 2 data characterization onboarding begins
- Locked/static DataMart required
- Schedule TBD; DataMarts participating in demonstration projects will be prioritized
Estimated time to onboard each DataMart
- Approximately 8-12 weeks
- Timeline will vary depending on query response time and number of issues identified
Defining Analysis-Ready Data for CDRNs
DRN Operations Center Lesley Curtis
What do we mean by analysis-ready?
Data that support feasibility assessments, prep-to-research queries, and dashboard metrics
Data that support interventional and observational CER studies with minimal additional curation (research queries)
Key assumption: Our definition of analysis-ready will evolve as demonstration projects get underway and use of the data increases!
Analysis-ready according to the PFA
Full range of quality-checked data for a population of 1 million
Transformed into the current version of the PCORnet CDM
Able to execute SAS queries against CDRN data without modification
Analysis-ready according to the PFA* and FAQs
Full range of quality-checked data for a population of 1 million*
Transformed into the current version of the PCORnet CDM*
Able to execute SAS queries against CDRN data without modification*
Shifting dates is not recommended
Locked, static instance (RDBMS and SAS dataset collection)
Proposed requirements for feasibility and PTR query, dashboard metric readiness
DRN onboarding complete (PopMedNet)
DataMarts in PCORnet CDM v3.0
- SAS and RDBMS
Static DataMart queryable with SAS
Unselected population
Initial data characterization complete with no significant issues
- No primary key violations
- PCORnet CDM tables populated for DEMOGRAPHIC, ENCOUNTER, DIAGNOSIS, PROCEDURES, HARVEST, ENROLLMENT, VITAL
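Two of these requirements, populated CDM tables and no primary key violations, lend themselves to automated checks. The Python sketch below uses `sqlite3` as a stand-in for a DataMart; the `foundational_issues` function is hypothetical, and the primary-key columns listed for DEMOGRAPHIC and ENCOUNTER are assumptions to be confirmed against the CDM specification (keys for the other tables are omitted rather than guessed).

```python
import sqlite3

REQUIRED_TABLES = ["DEMOGRAPHIC", "ENCOUNTER", "DIAGNOSIS", "PROCEDURES",
                   "HARVEST", "ENROLLMENT", "VITAL"]

# Assumed primary-key columns (illustrative subset; verify against the CDM spec).
PRIMARY_KEYS = {"DEMOGRAPHIC": ["PATID"], "ENCOUNTER": ["ENCOUNTERID"]}

def foundational_issues(conn: sqlite3.Connection) -> list:
    """Return human-readable problems; an empty list means both checks passed."""
    issues = []
    existing = {row[0] for row in
                conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    # Check 1: every required table exists and has at least one row.
    for table in REQUIRED_TABLES:
        if table not in existing:
            issues.append(f"{table}: missing")
        elif conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0] == 0:
            issues.append(f"{table}: not populated")
    # Check 2: primary-key values are never NULL and never duplicated.
    for table, cols in PRIMARY_KEYS.items():
        if table not in existing:
            continue
        key = ", ".join(cols)
        any_null = " OR ".join(f"{c} IS NULL" for c in cols)
        nulls = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {any_null}").fetchone()[0]
        dupes = conn.execute(
            f"SELECT COUNT(*) FROM (SELECT {key} FROM {table} "
            f"GROUP BY {key} HAVING COUNT(*) > 1)").fetchone()[0]
        if nulls:
            issues.append(f"{table}: {nulls} row(s) with a NULL primary key")
        if dupes:
            issues.append(f"{table}: {dupes} duplicated primary key value(s)")
    return issues

# Usage: VITAL is left empty and one PATID is repeated in DEMOGRAPHIC.
conn = sqlite3.connect(":memory:")
for t in REQUIRED_TABLES:
    conn.execute(f"CREATE TABLE {t} ({', '.join(PRIMARY_KEYS.get(t, ['ROW_ID']))})")
conn.executemany("INSERT INTO DEMOGRAPHIC VALUES (?)", [("p1",), ("p1",), ("p2",)])
conn.execute("INSERT INTO ENCOUNTER VALUES ('e1')")
for t in ["DIAGNOSIS", "PROCEDURES", "HARVEST", "ENROLLMENT"]:
    conn.execute(f"INSERT INTO {t} VALUES ('r1')")
print(foundational_issues(conn))
# ['VITAL: not populated', 'DEMOGRAPHIC: 1 duplicated primary key value(s)']
```

RDBMS tables with declared primary key constraints reject duplicate keys at load time, so a check like this matters most for stores without enforced constraints, such as a SAS dataset collection.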
Requirements for ‘research-ready’
Meets all requirements for feasibility and PTR query readiness
No date obfuscation
Ability to link with claims or actual linked datasets
No other significant findings in data characterization
- High level of completeness for most data fields
- No significant errors in data mapping
[Figure with two labels: "Research" and "Feasibility, PTR, and dashboard"]
Data stoplight for feasibility and PTR queries, dashboard metrics
Not approved
- Foundational requirements not met
- Data characterization incomplete
Proceed with caution
- Foundational requirements met
- Data characterization complete
- Data issues identified for resolution with next refresh
Approved
- Foundational requirements met
- Data characterization complete
- No significant data issues identified
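The stoplight reads as a simple decision rule. The Python sketch below encodes one interpretation of it, assuming that failing either the foundational requirements or data characterization means "Not approved" and that open data issues downgrade an otherwise complete DataMart to "Proceed with caution"; the class, function, and parameter names are hypothetical, and the objective metrics behind each input are still to be defined.

```python
from enum import Enum

class Stoplight(Enum):
    NOT_APPROVED = "Not approved"
    PROCEED_WITH_CAUTION = "Proceed with caution"
    APPROVED = "Approved"

def feasibility_stoplight(foundational_met: bool,
                          characterization_complete: bool,
                          open_data_issues: bool) -> Stoplight:
    """Classify a DataMart for feasibility/PTR queries and dashboard metrics."""
    if not (foundational_met and characterization_complete):
        return Stoplight.NOT_APPROVED
    if open_data_issues:
        # Issues identified for resolution with the next refresh.
        return Stoplight.PROCEED_WITH_CAUTION
    return Stoplight.APPROVED

print(feasibility_stoplight(True, True, True).value)  # Proceed with caution
```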
Data stoplight for research queries
Not approved Proceed with caution Approved Approved for feasibility and PTR queries, dashboard metrics Linkage to claims possible for specific studies Date obfuscation Incomplete data at the field level Linked population is ready for analysis No date obfuscation High level of completeness for most fields
Data stoplight for research queries
Establish a uniform, objective standard
Determination of whether a given site/DataMart is ready to participate in a given study will depend on the requirements of the protocol
- ‘Proceed with caution’ may be sufficient for some studies, while ‘approved’ may be insufficient for others.
Next steps
Develop objective metrics for each requirement
- A few are straightforward (e.g., no date obfuscation); most are not
- Review and refine with EC
Clarify implications of ‘not approved’
Present to CDRN PIs and DataMart technical teams
Implement with the SAS-based data characterization process
Consider upstream ways to reduce the number of analysis-ready assessments