Data Management Hints & Tips
By Clark Lawson, Nationwide Building Society (@thesasgeek)
Agenda
- Introduction
- Data Management
- Data Quality in SAS
Introduction
- SAS data warehouse manager at Nationwide Building Society.
- The data warehouse supplies data to over 200 SAS users.
- A SAS user since 2005.
- Recent focus is migrating from Base SAS to SAS Data Integration Studio.
Data Management
- With everyone moving into the digital age, data is recognised as a vital enterprise asset.
- Embedding data management principles into what we do, whether as Data Scientists or Analysts, will help us make more informed and effective decisions.
- This means that our role needs to include minimum standards of governance and control.
- This enables the business analyst to focus on insight rather than doing data management at the start of their project.
- What can these look like?
Data Management Standards
- Data Governance
- Data Modelling
- Master Data Management
- Data Quality
- Data Management
- SAS Standards
- Reconciliation and Controls
- Process Models
- Information Management
Data Management Applied in SAS
SAS Standards
Across Data Domains
- Consistent Variable Names
- Consistent Variable Values
- Consistent Formats
SAS Processing
- Uniform processing
- Using SAS Data Integration Studio and its built-in transformations
Documentation
- Interface contracts, both input & output
- Service level agreements
Data Quality
Data Quality
- Identify key fields and track data quality
Data Validation
- Check data against interface contracts
- Continually evaluate at every step
Business Validation Rules (BVRs)
- Defined data rules constantly checked
Data Quality in SAS
- Here we will discuss how to apply data quality standards in SAS.
- At this point assume that we have:
  - Interface Contracts
  - BVRs
Data Quality Starting Point
| Component | Description |
| --- | --- |
| Name | The name of the variable in the dataset |
| Description | The agreed definition of the relevant variable |
| Metadata | Information of the type, length & format of the variable |
| Nullable | Are missing values / nulls allowed? |
| Acceptable Values | A list of agreed acceptable values for both continuous and discrete variables |
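An interface contract of this shape can itself be held as a SAS dataset and compared with the metadata of the table that actually arrives. The following is a minimal sketch, not the presenter's implementation: the table WORK.CUSTOMER, the contract dataset WORK.CONTRACT and its column names (exp_type, exp_length, nullable) are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical delivered table; stands in for a real feed. */
data work.customer;
    length customer_id 8 region $2 balance 8;
    input customer_id region $ balance;
    datalines;
1 NE 100.50
2 SW 2500.00
3 NE .
;
run;

/* Hypothetical interface contract: one row per expected variable. */
data work.contract;
    length name $32 exp_type $4 exp_length 8 nullable $1;
    input name $ exp_type $ exp_length nullable $;
    datalines;
customer_id num 8 N
region char 2 N
balance num 8 Y
;
run;

/* Compare the contract with the actual metadata held in       */
/* DICTIONARY.COLUMNS. Rows are kept only where a variable is  */
/* missing on either side, or its type or length disagrees.    */
proc sql;
    create table work.contract_breaks as
    select c.name       as expected_name,
           d.name       as actual_name,
           c.exp_type,
           d.type       as actual_type,
           c.exp_length,
           d.length     as actual_length
    from work.contract as c
         full join
         (select name, type, length
            from dictionary.columns
           where libname = 'WORK' and memname = 'CUSTOMER') as d
         on upcase(c.name) = upcase(d.name)
    where c.name is missing
       or d.name is missing
       or c.exp_type ne d.type
       or c.exp_length ne d.length;
quit;
```

If the delivered structure matches the contract, WORK.CONTRACT_BREAKS should contain no rows. The Nullable and Acceptable Values components need a pass over the data rather than the metadata, so this sketch does not cover them.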
Data Quality Steps in SAS
Data Profiling
- Continuous Variables: Check range & outliers using PROC MEANS & PROC GCHART
- Discrete Variables: Check values using PROC FREQ
- Duplicate Values Check: Check for duplicate values using PROC SQL
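The three profiling checks can be sketched as below. This is an illustration under assumptions: the dataset WORK.ACCOUNTS and its variables (account_id, product, balance) are made up for the example, and the choice of statistics is one reasonable option rather than a prescribed standard. PROC GCHART needs SAS/GRAPH to be licensed.

```sas
/* Hypothetical sample data with an outlier, a missing value */
/* and a duplicated key, so each check has something to find. */
data work.accounts;
    input account_id product $ balance;
    datalines;
101 ISA 1200
102 LOAN 560
103 ISA 99999
103 ISA 99999
104 CARD .
;
run;

/* Continuous variable: count, missing count, range and percentiles. */
proc means data=work.accounts n nmiss min p1 median p99 max;
    var balance;
run;

/* Continuous variable: histogram-style bar chart (requires SAS/GRAPH). */
proc gchart data=work.accounts;
    vbar balance / levels=10;
run;
quit;

/* Discrete variable: frequency of each value, missing included. */
proc freq data=work.accounts;
    tables product / missing;
run;

/* Duplicates: key values that appear on more than one row. */
proc sql;
    create table work.dup_keys as
    select account_id, count(*) as n_rows
    from work.accounts
    group by account_id
    having count(*) > 1;
quit;
```

Writing the duplicate keys to a table rather than only printing them means a later step can test whether WORK.DUP_KEYS is empty and stop the job if it is not.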
Data Validation
- Key Variable Integrity: Validate schemas with primary & foreign keys using PROC DATASETS.
- Business Validation Rules: Utilise lookup tables and CALL EXECUTE to loop around rules using SQL.
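Both validation steps can be sketched in a few lines. Everything named here is an assumption made for illustration: the tables WORK.PRODUCT and WORK.ACCOUNT, the rule lookup table WORK.BVR with its columns (rule_id, table_name, fail_condition), and the results table WORK.BVR_RESULTS. The slides do not show the actual rule table layout, so treat this as one possible design.

```sas
/* Hypothetical parent and child tables. */
data work.product;
    length product_cd $4;
    input product_cd $;
    datalines;
ISA
LOAN
;
run;

data work.account;
    length product_cd $4;
    input account_id product_cd $ balance;
    datalines;
1 ISA 100
2 LOAN -50
3 ISA 250
;
run;

/* Key integrity: creating a constraint fails with a log error */
/* if the existing data already violates it.                   */
proc datasets library=work nolist;
    modify product;
        ic create pk_product = primary key (product_cd);
    run;
    modify account;
        ic create pk_account = primary key (account_id);
        ic create fk_product = foreign key (product_cd)
           references product on delete restrict on update restrict;
    run;
quit;

/* Hypothetical rule lookup table: each rule is a condition */
/* that selects the rows FAILING the rule.                   */
data work.bvr;
    length rule_id $8 table_name $32 fail_condition $100;
    infile datalines dlm='|';
    input rule_id $ table_name $ fail_condition $;
    datalines;
BVR001|work.account|balance < 0
BVR002|work.account|missing(product_cd)
;
run;

/* Empty results table: one row per rule will be inserted. */
proc sql;
    create table work.bvr_results (rule_id char(8), n_fail num);
quit;

/* Loop over the rules: CALL EXECUTE queues one PROC SQL step */
/* per rule, and the steps run after this DATA step finishes. */
data _null_;
    set work.bvr;
    length code $500;
    code = catx(' ',
        'proc sql;',
        'insert into work.bvr_results',
        'select', quote(trim(rule_id)), ', count(*)',
        'from', table_name,
        'where', fail_condition, ';',
        'quit;');
    call execute(code);
run;
```

Holding the rules as data means a new rule is a new row in the lookup table rather than a code change. Note that once the foreign key exists, WORK.PRODUCT cannot simply be overwritten by re-running the first DATA step; the constraints would need to be dropped with IC DELETE first.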