Managing Data Quality at Scale: The National Health Services Directory

Managing Data Quality at Scale: The National Health Services Directory. Health Data Analytics 2019, Sydney, 16-17 October. Allen Nugent, Senior Data Analyst, Healthdirect Australia. 17/10/2019.


SLIDE 1

Classification: Unclassified

Managing Data Quality at Scale

The National Health Services Directory

17/10/2019

Allen Nugent Senior Data Analyst Healthdirect Australia

Health Data Analytics 2019

Sydney, 16-17 October

SLIDE 2

Who is Healthdirect Australia?

  • Patients & Carers
  • GPs & Specialists
  • Hospitals & Clinics
  • Pharmacies
  • Allied Health Services

Trusted health information and advice

  • national
  • government-owned
  • not-for-profit

SLIDE 3

[Diagram labels: Trusted data sources; Service process; Access & integration tooling; NHSD Use Cases]

The National Health Services Directory

NHSD Use Cases

  • Consumer search
  • Provider search / referral
  • Health service provision gap analysis
  • Policy development, planning, decision making
  • Geospatial applications
  • Secure electronic messaging, referrals, discharge summaries, event summaries, etc.

SLIDE 4

Data Value Augmentation

The Goal

Validated synthesis of rich, high-level entities from multiple, specialised data sources

The Challenge

Primary Sources

  • AHPRA
  • Medicare
  • ABR

Secondary Sources

  • Telstra Health
  • RCH
  • Argus
  • HotDoc

Issues across sources

  • inconsistent address formats & content
  • ambiguous names
  • non-standard ontology & coding
  • abbreviations & misspellings
  • omissions
  • …

SLIDE 5

Data Value Augmentation

The Solution

Entity resolution / record matching

  • edit distance metrics
  • decision trees
  • clustering algorithms
  • generative models
  • machine learning
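
As an illustration of the first technique in the list above, here is a minimal sketch of an edit distance metric used to flag candidate record matches. It is not the method used for the NHSD; the function names, the normalisation, and the 0.85 threshold are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; case and outer whitespace are ignored."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_candidate_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag a pair for further review; the threshold is an assumed value."""
    return similarity(a, b) >= threshold
```

In practice a score like this would typically be only one signal among several (address, identifiers, service type) feeding the other techniques listed.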

https://www.dropbox.com/sh/f4v4wfls4w4q887/AAC3e3KmdDcj3tFMCIIEUHcqa?dl=0

SLIDE 6

Example of Value-Added Data

A composite entity for the Find a Service use-case

SLIDE 7

Data Quality

A practical definition:

The extent to which tolerance(1) is consistent with intended use(2)

_____________________________________________

(1) tolerance:

  • maximum permissible error

(2) intended end use:

  • reasonable presumption of inference on the part of the end user
  • effectively communicated usage guidelines

analytics / data engineering; UX design / governance

SLIDE 8

Why is data quality important?

User expectations

Accurate location, opening hours, service offerings

Value to general public

Convenient, reliable discovery of services

Accurate and timely data for patient results, transfers, referrals

Value to healthcare infrastructure

Verification of professional accreditation & Medicare processes

Reduced risk of clerical errors & ambiguity

SLIDE 9

Problem Statement

How to develop & implement data quality metrics that …

  • are meaningful
  • drive action
  • serve identified objectives(1)
  • are transparent
  • are reproducible
  • can be productionised

business performance KPIs; governance guidelines

______________________

(1) defined by stakeholders

SLIDE 10

Data Quality

Working definition(s):

Timeliness:

proportion of updates captured within specified time lag

Coverage:

proportion of known universe represented

Consistency:

proportion of data that agrees with the relevant Systems of Record

Completeness:

proportion of records with no missing essential fields

Veracity:

proportion of data that meets our criteria for conformity, uniqueness, and accuracy
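
Three of the working definitions above are simple proportions, which can be sketched as follows. This is an illustration under stated assumptions, not the Healthdirect implementation: the record layout, the `ESSENTIAL_FIELDS` tuple, the function names, and the choice to return 0.0 for empty input are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical set of essential fields; the real list is not given in the slides.
ESSENTIAL_FIELDS = ("name", "address", "phone")


def completeness(records, essential_fields=ESSENTIAL_FIELDS):
    """Proportion of records with no missing essential fields."""
    if not records:
        return 0.0  # assumed convention for empty input
    complete = sum(
        all(rec.get(field) not in (None, "") for field in essential_fields)
        for rec in records
    )
    return complete / len(records)


def timeliness(updates, max_lag):
    """Proportion of updates captured within max_lag.

    updates is a list of (changed_at, captured_at) datetime pairs.
    """
    if not updates:
        return 0.0
    on_time = sum((captured - changed) <= max_lag for changed, captured in updates)
    return on_time / len(updates)


def coverage(known_ids, directory_ids):
    """Proportion of the known universe that is represented in the directory."""
    known = set(known_ids)
    if not known:
        return 0.0
    return len(known & set(directory_ids)) / len(known)


# Made-up sample data for illustration only.
records = [
    {"name": "A Clinic", "address": "1 High St", "phone": "02 5550 0001"},
    {"name": "B Pharmacy", "address": "", "phone": "02 5550 0002"},
    {"name": "C Dental", "address": "3 Low Rd", "phone": None},
    {"name": "D Physio", "address": "4 Mid Ave", "phone": "02 5550 0004"},
]
base = datetime(2019, 10, 1)
updates = [(base, base + timedelta(days=d)) for d in (1, 10, 3, 7)]

print(completeness(records))                    # 2 of 4 records are complete
print(timeliness(updates, timedelta(days=7)))   # 3 of 4 updates are within 7 days
print(coverage({"a", "b", "c", "d"}, {"a", "b", "x"}))
```

Consistency and veracity need a reference System of Record and rule definitions, so they are left out of this sketch.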

A composite measure of data quality, driven by stakeholder feedback

SLIDE 11

Technical Challenges

Quis custodiet ipsos custodes? Who watches the watchers?

How to create a paradigm for data quality auditing that is itself governable?

How to implement data quality audits in a manner that does not involve complex code embedded in invisible software modules?

Quis regunt et duces? Who governs the governors?

  • 1 M documents
  • streaming data with infinite versioning
  • extensible data model

applications; widgets; extracts, reports; FHIR

SLIDE 12

Definition of an Audit Template

An audit should be …

  • based on a use-case or a data cleaning requirement
  • composed of logical tests implemented using uncomplicated code
  • self-describing
  • self-qualifying
  • self-normalising
  • hierarchical

confer context; mitigate concealed errors

SLIDE 13

Output: Audit Table

Columns of the audit table:

  • explicit statement of rule
  • for filtering, grouping
  • logical name for labeling charts, etc.
  • count of matches
  • count of test rows
  • count needed for perfection
  • pass rate
  • overall result

Column groups: Context, Outcome, Criteria
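
The audit-table columns described above can be sketched as one row produced per test. This is a hypothetical illustration, not the code shown in the talk: the `AuditRow` field names, the `run_audit` helper, the 0.95 pass threshold, and the sample postcode rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class AuditRow:
    # Context
    group: str        # for filtering, grouping
    name: str         # logical name for labeling charts, etc.
    rule: str         # explicit statement of rule
    # Outcome
    matches: int      # count of matches
    test_rows: int    # count of test rows
    # Criteria
    target: int       # count needed for perfection
    pass_rate: float
    result: str       # overall result


def run_audit(rows: Iterable[dict], *, group: str, name: str, rule: str,
              applies: Callable[[dict], bool], passes: Callable[[dict], bool],
              threshold: float = 0.95) -> AuditRow:
    """Run one logical test and return a self-describing audit row."""
    tested = [r for r in rows if applies(r)]
    matches = sum(1 for r in tested if passes(r))
    target = len(tested)  # assumed: perfection means every tested row matches
    if target == 0:
        # Nothing to test, so the row says so instead of reporting a pass.
        return AuditRow(group, name, rule, 0, 0, 0, 0.0, "NOT TESTED")
    pass_rate = matches / target  # normalised, so audits of any size compare
    result = "PASS" if pass_rate >= threshold else "FAIL"
    return AuditRow(group, name, rule, matches, target, target, pass_rate, result)


# Made-up sample data for illustration only.
services = [
    {"type": "pharmacy", "postcode": "2000"},
    {"type": "pharmacy", "postcode": "20O0"},  # letter O instead of zero
    {"type": "gp", "postcode": ""},
    {"type": "pharmacy", "postcode": "3000"},
    {"type": "pharmacy", "postcode": "800"},
]

row = run_audit(
    services,
    group="address",
    name="pharmacy_postcode_format",
    rule="pharmacy postcode consists of exactly 4 digits",
    applies=lambda r: r["type"] == "pharmacy",
    passes=lambda r: len(r["postcode"]) == 4 and r["postcode"].isdigit(),
)
print(row)
```

Keeping the rule text, the counts, and the normalised pass rate in the same row means the table can be charted or rolled up by group without consulting the test code.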

SLIDE 14

Code Sample: An Audit Test

SLIDE 15

Mid-Level Audit Results

SLIDE 16

Mid-Level Audit Results

Total: 95,335 organisations

SLIDE 17

Conclusions

  • 1. The ultimate success of entity-matching and other data cleaning techniques depends on data quality audits.
  • 2. A highly visible, self-explanatory auditing system can be implemented in-house.
  • 3. This paradigm can scale with the (Big Data) platform of the data itself.

SLIDE 18

Conclusions

If we get all this right …

Use-Cases:

  • Find a Practitioner: GP
  • Discharge Summary: GP
  • Find a Service: Nurse Triage
  • Referral to Service Provider

SLIDE 19

… we won’t have to worry about this:

[Image from upload.wikimedia.org]

SLIDE 20

Questions

allen.nugent@healthdirect.org.au
