Data Integrity Verification IIA Orange County Chapter November 13, - - PowerPoint PPT Presentation

data integrity verification
SMART_READER_LITE
LIVE PREVIEW

Data Integrity Verification IIA Orange County Chapter November 13, - - PowerPoint PPT Presentation

Data Integrity Verification IIA Orange County Chapter November 13, 2015 1 Data Integrity Verification Excel Transformed My Data! BEFORE AFTER


slide-1
SLIDE 1

Data Integrity Verification

  • 1

Data Integrity Verification

IIA Orange County Chapter November 13, 2015

slide-2
SLIDE 2

Excel Transformed My Data!

Data Integrity Verification 2

BEFORE AFTER

slide-3
SLIDE 3

3

Michael Kano

Senior Manager, Data Analytics Sunera LLC

Michael is a Senior Manager with Sunera’s national data analytics practice. Michael has 20 years of experience in data analytics and internal audit with organizations in the USA, Canada, and Kuwait. He has 20 years of experience with ACL software, including 8 years as the leader of ACL Services Ltd.’s global training team. During his tenure at ACL Services, Michael helped drive the training business to new levels of revenues and profits by actively supporting the Sales team in pre-sales discussions. Michael’s most recent experience consists of four years with eBay, Inc.’s internal audit team as Manager, Audit Analysis. He was tasked with integrating data analytics into the audit workflow on strategic and tactical levels. This included developing quality and documentation standards, training users, and providing analytics support on numerous audits in the IT, PayPal, and eBay marketplaces business areas. He also provided support to non-IA teams such as the Business Ethics Office and Enterprise Risk Management teams. During his years at eBay, Michael supported audits throughout the organization in the IT, compliance,

  • perations, vendor management, revenue assurance, T&E, and human resources areas.

Michael also has 7 years of experience with Arbutus Software, and has managed the transition to Arbutus from other data analysis tools. He is a proficient user of Tableau, Microsoft Access, and Teradata SQL Assistant.

slide-4
SLIDE 4

AGENDA

  • Defining data integrity verification (DIV)
  • Sources of integrity erosion
  • File-level testing
  • Field-level testing

Data Integrity Verification 4

slide-5
SLIDE 5

Defining Data Integrity Verification

5 Data Integrity Verification

slide-6
SLIDE 6

Data Integrity Verification (DIV)

The process by which the data analyst tests the data to determine whether it is acceptable for analysis Tests should be carried out at both the file level and the field level before conducting any analytics.

Data Integrity Verification 6

slide-7
SLIDE 7

The Risks of Integrity Erosion

Lost time Incorrect conclusions Revenue/cost Security Professional standing

7 Data Integrity Verification

slide-8
SLIDE 8

Evidence of data integrity erosion

Missing records Excess records Duplicates Shifted fields Skewed records

Data Integrity Verification 8

Blank/invalid entries in key fields Incorrect/invalid formatting Invalid characters in data

slide-9
SLIDE 9

Shifted Fields

Data Integrity Verification 9

slide-10
SLIDE 10

Skewed Records

Data Integrity Verification 10

slide-11
SLIDE 11

Sources of Integrity Erosion

11 Data Integrity Verification

slide-12
SLIDE 12

Processing…

Data Integrity Verification 12

slide-13
SLIDE 13

The Process

13 Data Integrity Verification

slide-14
SLIDE 14

Sources of data integrity errors

Miscommunication of requirements Extraction Conversion Transmission Import Manual edits Data definition

Data Integrity Verification 14

slide-15
SLIDE 15

Miscommunication

"All AP transactions between April and June, including all important fields." "All AP payments and reversals between 4/1/2015 and 6/30/2015 (inclusive) including the following fields: <field list>. The output should be in a tab-delimited text file, and at no point should it pass through a spreadsheet

  • r be opened in a spreadsheet application."

15 Data Integrity Verification

slide-16
SLIDE 16

Conversion

Dropping leading zeros (ID numbers) Converting date to numeric Removing alphas from alphanumeric field Use of delimiter that is included within a text field Insertion of blank lines in Excel

Data Integrity Verification 16

slide-17
SLIDE 17

Date Conversion

Data Integrity Verification 17

slide-18
SLIDE 18

Manual Edits

Inadvertent/deliberate editing How does that happen?

Sorting Formatting Copy/pasting

18 Data Integrity Verification

slide-19
SLIDE 19

Data Definition

Record length Field position Formatting (date fields)

Data Integrity Verification 19

slide-20
SLIDE 20

File-Level Testing

20 Data Integrity Verification

slide-21
SLIDE 21

File-Level Testing

Structure Content

Data Integrity Verification 21

slide-22
SLIDE 22

Structure

Review metadata Send table layout to a table in Arbutus/ACL Compare field type/length/format to metadata

22 Data Integrity Verification

slide-23
SLIDE 23

Content

Completeness

Run COUNT to document number of records Run TOTAL on numeric fields for control totals

Uniqueness: Run DUPLICATES command selecting all fields to identify duplicate records Validity: Run VERIFY against numeric and date fields

Data Integrity Verification 23

slide-24
SLIDE 24

Field-Level Testing: Numerics

24 Data Integrity Verification

slide-25
SLIDE 25

Numeric Fields: What to look for

Data Integrity Verification 25

Field total Lowest value Highest value Average Second-highest value Range Ratio of 2nd highest to highest Absolute value Median Number of zeros Number of positives Number of negatives Number of corrupt entries

slide-26
SLIDE 26

Testing Numeric Fields

Run STATISTICS against all numeric fields

Look for zeros, negatives, bounds, highest/second-highest

Recalculate computed value with computed fields (e.g, Total_Amount = Price * Quantity)

Data Integrity Verification 26

slide-27
SLIDE 27

Scripted Solution

Data Integrity Verification 27

Shows table/field names, and test date-time in a table Provides comprehensive, standard test results Faster and less error-prone than manual execution 2 million records, 4 numeric fields in ~45 seconds Also saves table layout for file with _TL suffix

slide-28
SLIDE 28

Script Results: Numerics

28 Data Integrity Verification

slide-29
SLIDE 29

Field-Level Testing: Dates

29 Data Integrity Verification

slide-30
SLIDE 30

Date Fields: What to look for

Data Integrity Verification 30

Oldest Weekends Most recent Blanks Span of valid dates Invalid non-blank dates

slide-31
SLIDE 31

Testing Date Fields

Run STATISTICS against all date fields

Blanks/invalids/weekends Bounds

Test related fields, e.g., PO_Date <= Invoice_Date Test for completeness (24/7 data) with GAPS command

Data Integrity Verification 31

slide-32
SLIDE 32

Blank Dates & Formatting

Entire date column is blank = Incorrect format in field definition. Edit >> Table Layout to review and correct format

Data Integrity Verification 32

slide-33
SLIDE 33

Formatting Date Fields

Data Integrity Verification 33

slide-34
SLIDE 34

Dates: Scripted Solution

34 Data Integrity Verification

slide-35
SLIDE 35

Field-Level Testing: Characters

35 Data Integrity Verification

slide-36
SLIDE 36

Character Fields: What to look for

Data Integrity Verification 36

Item Functionality Blanks ISBLANK(<key>) Invalid entries CLASSIFY ON <key> CLASSIFY ON FORMAT(<key>) Duplicates DUPLICATES ON <key>

slide-37
SLIDE 37

Character Fields: Formats

Verify that format is valid May need to scrub PO numbers, customer IDs, phone numbers, zip codes Use FORMAT() function in CLASSIFY to display list of unique formats

CLASSIFY ON FORMAT(<field name>) TO "<output file>" OPEN

Data Integrity Verification 37

slide-38
SLIDE 38

Output of CLASSIFY + Format()

Data Integrity Verification 38

1 record per format Shows frequency x= lower-case alpha X = upper-case alpha 9 = numeric Blanks/special characters

slide-39
SLIDE 39

Mitigating Integrity Risk

39 Data Integrity Verification

slide-40
SLIDE 40

Key Items

Know your data Obtain data independently (SQL?) Short chain from extraction to analysis Automated DIV

40 Data Integrity Verification

slide-41
SLIDE 41

The Process

41 Data Integrity Verification

slide-42
SLIDE 42

The New Process

42 Data Integrity Verification

slide-43
SLIDE 43

Benefits

Independence Confidence Shorter time Comprehensive DIV

Data Integrity Verification 43

slide-44
SLIDE 44

Any questions?

Michael Kano, ACDA mkano@sunera.com

Data Integrity Verification 44