DRN OC Updates: October 5, 2015



slide-1
SLIDE 1

DRN OC Updates

October 5, 2015

slide-2
SLIDE 2

Agenda

Discussion of revised CDM Implementation FAQs: Shelley Rusincovitch
Phase 2 data characterization update: Laura Qualls
Defining analysis-ready: Lesley Curtis

slide-3
SLIDE 3

Revised CDM implementation FAQs

Shelley Rusincovitch

slide-4
SLIDE 4

Relational Database Management Systems (RDBMS’s) and SAS

In Phase II, different PCORnet activities will be based within the RDBMS and SAS instances at each datamart

  • For example, menu-driven querying within the RDBMS; data characterization within SAS

slide-5
SLIDE 5
slide-6
SLIDE 6

Question: In order to support both SAS queries and menu-driven SQL queries, will we need SAS datasets, a relational database management system (RDBMS) database, or both?

Response: Both the RDBMS and SAS instances need to be present. Therefore, this question is actually about the data stores.

slide-7
SLIDE 7

Data Stores for RDBMS-SAS

Each site has 2 basic options:

  • 1. Most straightforward configuration: the site stores their data in 2 parallel instances: an RDBMS schema and a SAS dataset collection
  • 2. Option for advanced technical teams: the site configures their SAS instance to run distributed SAS programs against 1 data store in their RDBMS tables

Essential for each site to work with their institution’s SAS technical team to determine the optimal SAS configuration at the site

Read the complete SAS FAQs at https://pcornet.centraldesktop.com/p/aQAAAAACXx_y

slide-8
SLIDE 8

Both configurations need the SAS platform

[Diagram: two configurations, each inside the institution’s firewall. In both, PCORnet distributed activity arrives as a distributed SQL query and a distributed SAS module.]

  • Configuration A: Parallel Data Stores. RDBMS instance (such as Oracle, SQL Server, etc.) with CDM data stored in RDBMS tables; SAS instance with CDM data stored in a SAS dataset collection
  • Configuration B: Stand-alone RDBMS Data Store. RDBMS instance (such as Oracle, SQL Server, etc.) with CDM data stored in RDBMS tables; SAS instance on the SAS platform

slide-9
SLIDE 9

Considerations for RDBMS-SAS configurations

Running distributed SAS programs against RDBMS tables is an advanced technical setup

  • This configuration may result in suboptimal performance, system resource use, and response time:
  • As SAS reads the data, the RDBMS optimizer may not be able to compensate for these scans; performance optimization needs to be considered on the SAS side as well
  • The PCORnet SAS programs still need to run without modification (except to change the libname to point it to the correct data source)
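The requirement that programs run unmodified apart from the data source pointer can be illustrated outside SAS. The following is a minimal sketch in Python, not SAS, with `sqlite3` standing in for a site's data store; `open_data_source` and `characterize` are hypothetical names used only for illustration and are not part of any PCORnet package.

```python
import sqlite3

def open_data_source(location):
    """Site-specific step (the analogue of changing the libname).
    Assumption: sqlite3 stands in for whatever store the site uses."""
    return sqlite3.connect(location)

def characterize(conn, tables):
    """Shared step, identical at every site: row count per table."""
    return {
        t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        for t in tables
    }

# Demo with an in-memory store and one tiny hypothetical table.
conn = open_data_source(":memory:")
conn.execute("CREATE TABLE DEMOGRAPHIC (PATID TEXT)")
conn.executemany("INSERT INTO DEMOGRAPHIC VALUES (?)", [("P1",), ("P2",)])
counts = characterize(conn, ["DEMOGRAPHIC"])
print(counts)
```

Only the argument to `open_data_source` would differ between a parallel data store and a stand-alone RDBMS store; the shared routine is untouched, which is the property the slide asks of the distributed SAS programs.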

slide-10
SLIDE 10

Data Store Backups

Sites should be able to quickly (within 1-2 weeks) revert to and use the prior analysis-ready data store

  • Important when the current datamart refresh has issues, is found to be unusable, etc.

The approach will depend on the site’s configuration

  • For parallel RDBMS-SAS data stores, sites may choose to archive the SAS dataset collection
  • For a stand-alone RDBMS data store, the RDBMS tables would be archived

slide-11
SLIDE 11

Next Areas

Data characterization and analysis-ready classification
Role of the HARVEST table
Datamart refresh expectations

  • Participation in specific study activities
slide-12
SLIDE 12

Phase 2 Data Characterization

Laura Qualls

slide-13
SLIDE 13

Phase 2 data characterization

Foundation for “analysis-ready”

Approach

  • Built upon the foundation of Phase 1 data characterization
  • Iterative changes to the query package to characterize additional tables
  • Enhanced analytic tools to expedite the characterization process and facilitate comparisons between DataMarts and between DataMart refreshes

Query distribution specifics

  • DRN Query Tool (PopMedNet) file distribution
  • SAS programs that expect SAS version 9.3+ and SAS data types (esp. for dates and times)

slide-14
SLIDE 14

Phase 2 data characterization, continued

Query package v3.0

  • Similar to Phase 1 data characterization package
  • Characterizes the 7 expected tables (Demographic, Enrollment, Encounter, Diagnosis, Procedures, Vital, and Harvest)
  • Includes approximately 15 new analytic queries, including some cross-table queries (e.g. patients with at least 1 diagnosis code and 1 vital measurement in the past year)
  • Includes DataMart metadata (HARVEST table; SAS version & installed components; operating system; etc.)
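A cross-table query like the example on this slide could look roughly as follows. This is a sketch in Python with an in-memory SQLite database, not the SAS code in the query package. The column names (`PATID`, `ADMIT_DATE`, `MEASURE_DATE`), the sample rows, and the reading that the one-year window applies to both the diagnosis and the vital measurement are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DIAGNOSIS (PATID TEXT, DX TEXT, ADMIT_DATE TEXT);
CREATE TABLE VITAL (PATID TEXT, MEASURE_DATE TEXT);
-- Hypothetical sample rows; dates are ISO strings so they compare correctly.
INSERT INTO DIAGNOSIS VALUES
  ('P1', 'DX1', '2015-06-01'),
  ('P2', 'DX2', '2013-01-15'),
  ('P3', 'DX3', '2015-03-10');
INSERT INTO VITAL VALUES
  ('P1', '2015-07-20'),
  ('P2', '2015-08-01');
""")

def patients_with_dx_and_vital(conn, as_of):
    """Count distinct patients with >=1 diagnosis and >=1 vital
    in the year ending on `as_of` (YYYY-MM-DD)."""
    sql = """
    SELECT COUNT(*) FROM (
        SELECT PATID FROM DIAGNOSIS
        WHERE ADMIT_DATE > date(?, '-1 year') AND ADMIT_DATE <= ?
        INTERSECT
        SELECT PATID FROM VITAL
        WHERE MEASURE_DATE > date(?, '-1 year') AND MEASURE_DATE <= ?
    )
    """
    return conn.execute(sql, (as_of, as_of, as_of, as_of)).fetchone()[0]

n = patients_with_dx_and_vital(conn, "2015-10-05")
print(n)
```

In the sample data only P1 has both a diagnosis and a vital measurement inside the window: P2's diagnosis is too old and P3 has no vital record.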

slide-15
SLIDE 15

Phase 2 data characterization timeline

October 2015

  • Release code package for beta-testing query execution and response (not the full data characterization process)

November 2015

  • Refine code based on beta-testing results; finalize query package

December 2015-January 2016

  • Develop data characterization tools and reports

February 2016

  • Phase 2 data characterization onboarding begins
  • Locked/static DataMart required
  • Schedule TBD; DataMarts participating in demonstration projects will be prioritized

Estimated time to onboard each DataMart

  • Approximately 8-12 weeks
  • Timeline will vary depending on query response time and number of issues identified

slide-16
SLIDE 16

Defining Analysis-Ready Data for CDRNs

DRN Operations Center

Lesley Curtis

slide-17
SLIDE 17

What do we mean by analysis-ready?

Data that support feasibility assessments, prep-to-research queries, and dashboard metrics
Data that support interventional and observational CER studies with minimal additional curation (research queries)

Key assumption: Our definition of analysis-ready will evolve as demonstration projects get underway and use of the data increases!

slide-18
SLIDE 18

Analysis-ready according to the PFA

Full range of quality-checked data for a population of 1m
Transformed into the current version of the PCORnet CDM
Able to execute SAS queries against CDRN data without modification

slide-19
SLIDE 19

Analysis-ready according to the PFA* and FAQs

Full range of quality-checked data for a population of 1m*
Transformed into the current version of the PCORnet CDM*
Able to execute SAS queries against CDRN data without modification*
Shifting dates is not recommended
Locked, static instance (RDBMS and SAS dataset collection)

slide-20
SLIDE 20

Proposed requirements for feasibility and PTR query, dashboard metric readiness

DRN onboarding complete (PopMedNet)
DataMarts in PCORnet CDM v3.0

  • SAS and RDBMS

Static DataMart queryable with SAS
Unselected population
Initial data characterization complete with no significant issues

  • No primary key violations
  • PCORnet CDM tables populated for DEMOGRAPHIC, ENCOUNTER, DIAGNOSIS, PROCEDURES, HARVEST, ENROLLMENT, VITAL
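Two of these requirements, populated tables and no primary key violations, lend themselves to a mechanical check. The sketch below uses Python with an in-memory SQLite database as a stand-in for a DataMart. The key columns in `EXPECTED_KEYS`, the `check_table` helper, and the rule that a missing key counts as a violation are illustrative assumptions, not the actual data characterization logic.

```python
import sqlite3

# Assumed key column per table (illustrative subset only).
EXPECTED_KEYS = {"DEMOGRAPHIC": "PATID", "ENCOUNTER": "ENCOUNTERID"}

def check_table(conn, table, key):
    """Report whether a table has rows and how many key problems it has."""
    rows = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # Key values that appear more than once.
    dup_keys = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key} FROM {table} "
        f"WHERE {key} IS NOT NULL GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    # Rows with no key value at all.
    null_keys = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()[0]
    return {"populated": rows > 0, "pk_violations": dup_keys + null_keys}

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- No PRIMARY KEY constraints, so bad rows can be loaded for the demo.
CREATE TABLE DEMOGRAPHIC (PATID TEXT);
CREATE TABLE ENCOUNTER (ENCOUNTERID TEXT, PATID TEXT);
INSERT INTO DEMOGRAPHIC VALUES ('P1'), ('P2');
INSERT INTO ENCOUNTER VALUES ('E1', 'P1'), ('E1', 'P2'), (NULL, 'P2');
""")

results = {t: check_table(conn, t, k) for t, k in EXPECTED_KEYS.items()}
print(results)
```

Here DEMOGRAPHIC passes both checks, while ENCOUNTER is populated but has one duplicated key value and one missing key.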

slide-21
SLIDE 21

Requirements for ‘research-ready’

Meets all requirements for feasibility and PTR query readiness
No date obfuscation
Ability to link with claims or actual linked datasets
No other significant findings in data characterization

  • High level of completeness for most data fields
  • No significant errors in data mapping

slide-22
SLIDE 22

[Diagram: two readiness tiers, labeled “Research” and “Feasibility, PTR, and dashboard”]

slide-23
SLIDE 23

Data stoplight for feasibility and PTR queries, dashboard metrics

Not approved

  • Foundational requirements not met
  • Data characterization incomplete

Proceed with caution

  • Foundational requirements met
  • Data characterization complete
  • Data issues identified for resolution with next refresh

Approved

  • Foundational requirements met
  • Data characterization complete
  • No significant data issues identified
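The three stoplight categories on this slide can be read as a simple decision rule. The Python sketch below is one possible reading, not an official definition: it assumes that failing either the foundational requirements or data characterization is enough for "Not approved", and the names `Stoplight` and `feasibility_stoplight` are hypothetical.

```python
from enum import Enum

class Stoplight(Enum):
    NOT_APPROVED = "Not approved"
    PROCEED_WITH_CAUTION = "Proceed with caution"
    APPROVED = "Approved"

def feasibility_stoplight(foundational_met, characterization_complete,
                          open_data_issues):
    """Classify a DataMart for feasibility/PTR queries and dashboard metrics."""
    # Assumption: either unmet condition alone blocks approval.
    if not (foundational_met and characterization_complete):
        return Stoplight.NOT_APPROVED
    # Issues queued for resolution with the next refresh.
    if open_data_issues:
        return Stoplight.PROCEED_WITH_CAUTION
    return Stoplight.APPROVED

status = feasibility_stoplight(True, True, open_data_issues=True)
print(status.value)
```

A DataMart that meets the foundational requirements and has completed characterization, but still has issues awaiting the next refresh, lands in "Proceed with caution".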

slide-24
SLIDE 24

Data stoplight for research queries

Not approved / Proceed with caution / Approved

  • Approved for feasibility and PTR queries, dashboard metrics
  • Linkage to claims possible for specific studies
  • Date obfuscation
  • Incomplete data at the field level
  • Linked population is ready for analysis
  • No date obfuscation
  • High level of completeness for most fields

slide-25
SLIDE 25

Data stoplight for research queries

Establish a uniform, objective standard
Determination of whether a given site/datamart is ready to participate in a given study will depend on the requirements of the protocol

  • ‘Proceed with caution’ may be sufficient for some studies while ‘approved’ may be insufficient for others.

slide-26
SLIDE 26

Next steps

Develop objective metrics for each requirement

  • A few are straightforward (no date obfuscation), most are not
  • Review, refine with EC

Clarify implications of ‘not approved’
Present to CDRN PIs, datamart technical teams
Implement with SAS-based data characterization process
Consider upstream ways to streamline the number of analysis-ready assessments.