Publicly Available Large Data Sets for Health Outcomes Research: Pearls, Pitfalls, Prices & More




slide-1
SLIDE 1

Publicly Available Large Data Sets for Health Outcomes Research: Pearls, Pitfalls, Prices & More

October, 2018

Lakshika Tennakoon, MD, MSc, MPhil, DTM&H
Research Scientist
Trauma, Acute Care and Critical Care Surgery
Stanford University

slide-2
SLIDE 2

Aims

  • To encourage use of public data for research
  • To characterize existing large clinical databases

slide-3
SLIDE 3

Best Currently Available Databases

Database                                               Dates                                     Source
Nationwide Inpatient Sample (NIS)                      1988-2016                                 HCUP
Nationwide Emergency Department Sample (NEDS)          2006-2016                                 HCUP
Nationwide Readmissions Database (NRD)                 2010-2016                                 HCUP
Kids' Inpatient Database (KID)                         1997, 2000, 2003, 2006, 2009, 2012, 2016  HCUP
National Trauma Data Bank (NTDB)                       2002-2016                                 ACS
National Surgical Quality Improvement Program (NSQIP)  2005-2016                                 ACS
National Ambulatory Medical Care Survey (NAMCS)        1993-2015                                 CDC

slide-4
SLIDE 4

Databases (continued)

Database                                                   Dates      Source
National Health and Nutrition Examination Survey (NHANES)  1999-2015  CDC
National Hospital Ambulatory Medical Care Survey (NHAMCS)  1992-2015  CDC
Medicare/SEER                                              1991-2015  Government
MarketScan                                                 2002-2011  Private
Hospital-based registry data                                          Hospital Based

slide-5
SLIDE 5
Nationwide Inpatient Sample (NIS)

  • The largest publicly available all-payer inpatient care database in the United States
  • Includes all discharges from a 20% stratified sample of US hospitals
  • NIS data can be weighted to generate national estimates
  • Years available: 1988 to 2016
  • About 8 million hospital stays per year
  • The NIS_2015_CORE data file has 7,153,989 records
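To illustrate what "weighted to generate national estimates" means in practice, here is a minimal Python sketch that sums a per-discharge weight over sampled records. The column name DISCWT is used here for the discharge weight; the records and weight values are invented for illustration and are not real NIS data.

```python
# Hypothetical sampled discharges; DISCWT is the per-record discharge weight.
# All values below are made up for illustration.
sample = [
    {"AGE": 67, "DIED": 0, "DISCWT": 5.0},
    {"AGE": 45, "DIED": 1, "DISCWT": 5.0},
    {"AGE": 81, "DIED": 0, "DISCWT": 4.5},
    {"AGE": 30, "DIED": 0, "DISCWT": 5.5},
]


def weighted_total(records, flag=None):
    """Sum weights over records; if flag is given, include only records where it equals 1."""
    return sum(r["DISCWT"] for r in records if flag is None or r[flag] == 1)


# Each sampled record "stands for" DISCWT discharges nationally.
national_discharges = weighted_total(sample)
national_deaths = weighted_total(sample, "DIED")
mortality_rate = national_deaths / national_discharges

print(national_discharges, national_deaths, mortality_rate)
```

This produces point estimates only. Standard errors and confidence intervals need survey-design-aware methods, which this sketch leaves out.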

slide-6
SLIDE 6

Cost & Data Load Software

  • Cost of 2016 NIS: $625
  • Original data comes as CSV or ASCII files
  • Load programs are available in: Stata, SAS, SPSS
  • Data storage: large databases need a server or Box
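For readers working outside Stata, SAS, or SPSS, a file too large for memory can still be summarized by streaming it row by row. The sketch below assumes a CSV file with a header row; the column names and values are illustrative stand-ins, and an in-memory string takes the place of a real file on disk. It does not cover the fixed-width ASCII format, which needs the column positions from the official load programs.

```python
import csv
import io

# Stand-in for a large CSV file; a real file would be opened with
# open(path, newline="") instead. Values are invented for illustration.
fake_file = io.StringIO(
    "AGE,LOS,TOTCHG\n"
    "67,3,12000\n"
    "45,1,4000\n"
    "81,7,30500\n"
)


def summarize(handle):
    """Stream rows one at a time, keeping only running totals in memory."""
    n = 0
    total_los = 0
    for row in csv.DictReader(handle):
        n += 1
        total_los += int(row["LOS"])
    return n, total_los / n


n_records, mean_los = summarize(fake_file)
print(n_records, mean_los)
```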

slide-7
SLIDE 7

NIS-Requirements

slide-8
SLIDE 8

Citing HCUP Databases

  • Citing HCUP databases in abstracts and manuscripts: as specified in the HCUP DUAs, include the database name, HCUP, and AHRQ as demonstrated below for each HCUP database:
  • HCUP Nationwide Inpatient Sample (NIS). Healthcare Cost and Utilization Project (HCUP). 2007-2009. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/nisoverview.jsp

slide-9
SLIDE 9

What Data Elements Are in the NIS?

Data Files
▪ Core Data
▪ Hospital Data
▪ Illness Severity Data
▪ Cost-to-Charge Ratio Data
▪ Diagnosis & Procedure Groups Data
▪ https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp

slide-10
SLIDE 10
Core Data File

  • Age at admission
  • Gender of patient
  • Race of patient
  • Location of patient’s residence
  • Median household income for patient's ZIP code
  • ICD-9-CM diagnoses: primary and secondary diagnoses, number of diagnoses, diagnosis coding system
  • External causes of injury and poisoning: ECODE 1-4, number of external causes of injury
  • ICD-9-CM procedures: primary and secondary procedures, number of procedures, procedure coding system, duration of primary and secondary procedures
  • Total charges
  • Disposition
  • Length of stay
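A common first step with the diagnosis fields is to select records by ICD-9-CM code, either as the primary diagnosis or in any position. The Python sketch below assumes diagnoses sit in columns named DX1, DX2, DX3 and are stored without the decimal point; the record key, the number of columns, and the sample rows are illustrative assumptions. It searches for codes beginning with 820 (fracture of neck of femur).

```python
# Hypothetical core records; codes are stored without the decimal point
# (for example 820.21 appears as "82021"). Rows are invented for illustration.
records = [
    {"KEY": 1, "DX1": "82021", "DX2": "4019", "DX3": ""},
    {"KEY": 2, "DX1": "486", "DX2": "82009", "DX3": "25000"},
    {"KEY": 3, "DX1": "41401", "DX2": "", "DX3": ""},
]
DX_COLUMNS = ["DX1", "DX2", "DX3"]  # assumed column names


def has_diagnosis(record, prefix, columns=DX_COLUMNS):
    """True if any diagnosis field starts with the given code prefix."""
    return any(record[c].startswith(prefix) for c in columns)


def is_primary(record, prefix):
    """True if the primary diagnosis (DX1) starts with the given code prefix."""
    return record["DX1"].startswith(prefix)


any_hip = [r["KEY"] for r in records if has_diagnosis(r, "820")]
primary_hip = [r["KEY"] for r in records if is_primary(r, "820")]

print(any_hip, primary_hip)
```

Whether to match on the primary diagnosis only or on any position changes the cohort, so the choice is worth stating in the methods.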

slide-11
SLIDE 11
Hospital Data File

  • Hospital bed size
  • Type of hospital: government or private; government, nonfederal, public; private, non-profit; private, investor-owned
  • Hospital location: rural or urban
  • Location/teaching status of hospital: rural, urban non-teaching, urban teaching
  • Region of hospital: Northeast, Midwest, South, West
  • Hospital weights: weight to hospitals in AHA universe, weight to hospitals in the State

slide-12
SLIDE 12

Severity of Illness Data

  • Severity of Illness Subclass
  • Risk of Mortality Subclass
  • 29 comorbid conditions: Alcohol Abuse, Depression, Drug Abuse, Liver Disease, Renal Failure, Obesity, etc.
  • Defined by the Elixhauser comorbidity scale
  • https://www.hcup-us.ahrq.gov/toolssoftware/comorbidityicd10/comorbidity_icd10.jsp
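Comorbidity indicators of this kind are typically stored as one 0/1 flag per condition, which makes a simple per-patient comorbidity count easy to derive. The Python sketch below assumes flag columns prefixed with CM_; the specific column names, the record key, and the rows are illustrative assumptions. The result is a plain count of flagged conditions, not a validated weighted index.

```python
# Hypothetical severity records: 1 = condition present, 0 = absent.
# Column names and values are assumptions made for illustration.
severity = [
    {"KEY": 1, "CM_ALCOHOL": 0, "CM_DEPRESS": 1, "CM_OBESE": 1},
    {"KEY": 2, "CM_ALCOHOL": 0, "CM_DEPRESS": 0, "CM_OBESE": 0},
    {"KEY": 3, "CM_ALCOHOL": 1, "CM_DEPRESS": 1, "CM_OBESE": 1},
]


def comorbidity_count(record):
    """Count the comorbidity flags set to 1 in a single record."""
    return sum(value for name, value in record.items() if name.startswith("CM_"))


counts = {r["KEY"]: comorbidity_count(r) for r in severity}
print(counts)
```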

slide-13
SLIDE 13

Cost-to-Charge Ratio Data

  • Year
  • Hospital unique identifier
  • Wage index
  • CCR_NIS (the cost-to-charge ratio variable, NIS 2012 to current)
  • Calculate “Total Cost” from the data above and the “Total Charges” (TOTCHG) variable, which is available in the NIS core data file
  • Formula (Stata): gen Total_COSTS = TOTCHG*CCR_NIS
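The same calculation can be sketched in Python for readers who do not use Stata. This example assumes the cost-to-charge ratios have already been loaded into a lookup keyed by a hospital identifier; the key name HOSP_ID, the ratios, and the charges are hypothetical. Records with a missing charge or no matching ratio get no cost rather than a guessed one.

```python
# Hypothetical core records and hospital-level cost-to-charge ratios.
# HOSP_ID is an illustrative key name; all values are invented.
core = [
    {"HOSP_ID": "A", "TOTCHG": 20000.0},
    {"HOSP_ID": "B", "TOTCHG": 10000.0},
    {"HOSP_ID": "A", "TOTCHG": None},   # missing charge
    {"HOSP_ID": "C", "TOTCHG": 8000.0}, # hospital with no ratio on file
]
ccr = {"A": 0.5, "B": 0.25}

for rec in core:
    ratio = ccr.get(rec["HOSP_ID"])
    if rec["TOTCHG"] is None or ratio is None:
        rec["TOTAL_COST"] = None  # leave missing instead of imputing
    else:
        rec["TOTAL_COST"] = rec["TOTCHG"] * ratio  # cost = charge * ratio

total_costs = [rec["TOTAL_COST"] for rec in core]
print(total_costs)
```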
slide-14
SLIDE 14
slide-15
SLIDE 15
Nationwide Emergency Department Sample (NEDS)

  • NEDS is the largest all-payer ED database in the United States
  • A 20% stratified sample of US hospital-based emergency departments
  • Years available: 2006-2016
  • Number of ED visits: between 25 and 30 million (unweighted) ED visit records from 950 hospitals
  • Cost of 2016 NEDS: $1,000

slide-16
SLIDE 16

What Data Elements Are in the NEDS?

Four data files per year
▪ Core data
▪ Emergency department data
▪ Inpatient data
▪ Hospital weights data

slide-17
SLIDE 17

Examples of NEDS-Based Research

slide-18
SLIDE 18

Nationwide Readmissions Database (NRD)

  • Used to calculate national readmission rates for all payers and the uninsured
  • Provides nationally representative information on hospital readmissions for all ages
  • Contains unweighted data from approximately 12 million discharges each year
  • Has core data, hospital data, illness severity, and cost-to-charge ratio data
  • Years available: 2010-2016
  • Cost of 2016 NRD data: $1,000
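A readmission rate is computed by linking each patient's stays within a year and measuring the gap between one discharge and the next admission. The Python sketch below shows a simplified 30-day version. The field names LINK (patient linkage), DAYS_TO_EVENT (a relative admission day), and LOS are hypothetical stand-ins, the rows are invented, and every discharge is treated as an index stay. Published analyses usually apply further exclusions (for example in-hospital deaths and discharges too late in the year to observe 30 days of follow-up), which are omitted here.

```python
from collections import defaultdict

# Hypothetical linked stays; field names and values are illustrative only.
stays = [
    {"LINK": "P1", "DAYS_TO_EVENT": 100, "LOS": 5},
    {"LINK": "P1", "DAYS_TO_EVENT": 120, "LOS": 2},
    {"LINK": "P1", "DAYS_TO_EVENT": 200, "LOS": 1},
    {"LINK": "P2", "DAYS_TO_EVENT": 50, "LOS": 3},
    {"LINK": "P3", "DAYS_TO_EVENT": 10, "LOS": 2},
    {"LINK": "P3", "DAYS_TO_EVENT": 60, "LOS": 1},
]


def thirty_day_readmissions(stays, window=30):
    """Return (index stays, index stays followed by a readmission within `window` days)."""
    by_patient = defaultdict(list)
    for stay in stays:
        by_patient[stay["LINK"]].append(stay)

    index_count = 0
    readmit_count = 0
    for visits in by_patient.values():
        visits.sort(key=lambda s: s["DAYS_TO_EVENT"])
        # Pair each stay with the patient's next stay (None for the last one).
        for current, following in zip(visits, visits[1:] + [None]):
            index_count += 1
            if following is None:
                continue
            discharge_day = current["DAYS_TO_EVENT"] + current["LOS"]
            gap = following["DAYS_TO_EVENT"] - discharge_day
            if 0 <= gap <= window:
                readmit_count += 1
    return index_count, readmit_count


index_stays, readmissions = thirty_day_readmissions(stays)
readmission_rate = readmissions / index_stays
print(index_stays, readmissions, readmission_rate)
```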
slide-19
SLIDE 19
slide-20
SLIDE 20

Kids’ Inpatient Database (KID)

  • The only all-payer pediatric inpatient care database in the USA
  • Contains 2-3 million hospital stays
  • Helps to develop national and regional estimates on diseases
  • Data available for demographics, injury characteristics, diagnosis, hospital characteristics, outcomes, and healthcare cost
  • Need to sign a DUA
  • Cost of 2016 KID data: $500
slide-21
SLIDE 21
slide-22
SLIDE 22
National Trauma Data Bank (NTDB)

  • The largest registry of trauma patients admitted to trauma centers in the United States
  • Data are not weighted
  • No DUA (Data Use Agreement)
  • Samples are obtained from trauma center registries
▪ In 2011, 747 trauma centers were included
  • Years available: 2002-2016
  • Data files are in CSV format
  • Cost of 2016 NTDB data: $300

slide-23
SLIDE 23

NTDB Data

  • Demographic data
  • Injury severity data
  • Emergency department data
  • Mechanisms of Injury data
  • ICD9 and ICD10 Procedure data
  • ICD9 and ICD10 Diagnosis data
  • Discharge disposition data
  • Facility data
  • Vital signs data
  • Protective devices & transportation data
  • Comorbid and complications data
slide-24
SLIDE 24
slide-25
SLIDE 25

National Surgical Quality Improvement Program (NSQIP)

  • A nationally validated, risk-adjusted, and outcomes-based program
  • NSQIP has prospective and outcomes data
  • Years available: 2005-2011
  • NSQIP aims to measure and improve the quality of surgical care across surgical specialties
  • 680 hospitals participated in NSQIP in 2017
slide-26
SLIDE 26

What Data Elements Are in the NSQIP?

  • Preoperative risk factors
  • Intraoperative variables
  • 30-day postoperative mortality and morbidity outcomes
  • Demographic data
  • Current Procedural Terminology (CPT) data
  • Health and behavior data
  • Physical examination data
slide-27
SLIDE 27
  • Data are free for NSQIP participating hospitals
  • Data request process: www.facs.org/quality-programs/acs-nsqip
  • Need to sign a DUA (Data Use Agreement)
  • Download the data: www.facs.org/quality-programs/acs-nsqip
  • Data files are available in 3 formats: Text, SPSS, SAS

slide-28
SLIDE 28
slide-29
SLIDE 29
MarketScan Data

  • Private database
  • MarketScan is broadly representative of the commercially insured population of the United States
  • High-quality, longitudinal, patient-level data
  • Low percentage of missing data
  • Years available: 2002-2011
  • Need to sign a DUA (Data Use Agreement)
  • Cost: around $50,000/year

slide-30
SLIDE 30
What Data Elements Are in MarketScan?

  • Patient socio-demographic data
  • Admission date and type
  • Diagnosis code (principal and secondary)
  • Discharge status
  • Procedure code (principal and secondary)
  • Length of stay
  • Place of service
  • Provider ID
  • Data on drugs/medications

slide-31
SLIDE 31
slide-32
SLIDE 32

SEER-Medicare Data

  • SEER-Medicare Linked Database
  • Medicare beneficiaries with cancer
  • Data derived from Surveillance, Epidemiology and End Results
  • Diagnosis & procedure codes: ICD9, ICD10, CPT, HCPCS (Healthcare Common Procedure Coding System)
  • Patient Demographic and Socioeconomic Characteristics
  • Comorbidity
  • Breast, Colorectal, and Prostate Cancer Screening
  • Radiation Therapy (includes codes to identify radiation therapy)
  • Chemotherapy Use (includes codes to identify chemotherapy)
  • Complications of Cancer Treatment
  • Surveillance After Cancer Treatment
  • Data sets available from 1991-2015
  • Need to sign a DUA (Data Use Agreement)
slide-33
SLIDE 33
slide-34
SLIDE 34
  • Physician Characteristics
  • Hospital Characteristics
  • Health Care Costs Related to Cancer Treatment
slide-35
SLIDE 35

National Health and Nutrition Examination Survey (NHANES)

  • Cross-sectional, high-quality survey data on adults and children in the United States
  • Data available on a nationally representative sample of about 5,000 persons each year
  • Years available:
▪ 1971-75: NHANES I
▪ 1976-80: NHANES II
▪ 1982-84: Hispanic Health and Nutrition Examination Survey (HHANES)
▪ 1988-94: NHANES III
▪ 1999-present: National Health and Nutrition Examination Survey (Continuous NHANES)
  • Free to download the data from the CDC website
slide-36
SLIDE 36
What Data Elements Are in the NHANES?

  • Socio-demographic data
  • Dietary data
  • Clinical examination data (medical, dental, and physiological measurements)
  • Laboratory data
  • Questionnaire data
  • Genetic data
  • Mortality data
  • NHANES Medicare Utilization and Expenditure Linked Files (restricted data)
  • NHANES Linked Mortality Files
  • NHANES Linked Social Security Administration Files (restricted data)

slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39
National Ambulatory Medical Care Survey (NAMCS)

  • National survey of ambulatory medical care services in the United States
  • Data represent a sample of visits to non-federally employed, office-based physicians who are primarily engaged in direct patient care
  • NAMCS has high-quality cross-sectional data
  • Years available: 1973-current
  • Free to download the data from the CDC website

slide-40
SLIDE 40
What Data Elements Are in the NAMCS?

  • Socio-demographic data
  • Source of payment and number of past visits
  • Patient’s primary care physician information
  • Diagnosis
  • Chronic disease checklist and disease management programs
  • Screening and diagnostic services
  • Treatments and drugs prescribed
  • Physician specialty
  • EMR use and practice parameters
  • Sources of revenue
  • Providers seen and duration of care under those providers

slide-41
SLIDE 41

Evidence Based Research-NAMCS

slide-42
SLIDE 42

National Hospital Ambulatory Medical Care Survey (NHAMCS)

  • National probability sample survey of visits to emergency and outpatient departments in nonfederal, general, and short-stay hospitals in the United States
  • Records-based survey data, producing annual estimates of the number and attributes of visits to hospital emergency departments (EDs) in the U.S.
  • The survey is visit-based, so prevalence and incidence rates cannot be calculated
  • Years available: 1992-current
  • Free to download the data from the CDC website
slide-43
SLIDE 43
slide-44
SLIDE 44

Thank you!
Lakshika
Email: lakshika@stanford.edu