Data Quality Services, SQL Server 2012
Ash Tewari, @ashtewari, ashtewari.com
Data Quality Services // Ash Tewari // @ashtewari
What is Data Quality?
The degree to which the data is fit for its intended use.
Why is Data Quality Important?
Bad data is expensive.
Management decision-making.
Regulatory compliance.
Good data is good for business.
Data Quality Issues
Completeness
Conformity
Consistency
Accuracy
Validity
Duplication
Source : Data Quality Services FAQ
Completeness
Is all the required information available?
Example: if an email field has values in only 50,000 of 75,000 records, the email field is about 66.7% complete.
Source : Data Quality Services FAQ
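The completeness figure is a simple ratio of populated values to total records. A minimal Python sketch of that arithmetic follows; the function name `completeness` and the sample email value are illustrative assumptions, not part of DQS.

```python
def completeness(values):
    """Return the percentage of values that are present (not None or blank)."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and str(v).strip() != "")
    return 100.0 * present / len(values)


# The slide's example: 50,000 populated emails out of 75,000 records.
# The address itself is a placeholder.
emails = ["user@example.com"] * 50_000 + [None] * 25_000
pct = completeness(emails)
print(f"{pct:.1f}% complete")
```

Treating blank strings as missing is a design choice made here; a real profile may define "missing" differently per field.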
Conformity
Are there expectations that data values conform to specified formats? Example: The Gender codes in two different systems are represented differently; in one system the codes are defined as ‘M’, ‘F’ and ‘U’ whereas in the second system they appear as 0, 1, and 2.
Source : Data Quality Services FAQ
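One way to picture a conformity fix is to translate the second system's codes into the first system's format. The Python sketch below assumes a mapping of 0, 1, 2 to M, F, U; the slide does not say which number corresponds to which letter, so that mapping and the name `to_letter_code` are hypothetical.

```python
LETTER_CODES = {"M", "F", "U"}

# Hypothetical mapping: the source does not state which number means which letter.
NUMERIC_TO_LETTER = {0: "M", 1: "F", 2: "U"}


def to_letter_code(value):
    """Convert a gender code from either system into the 'M'/'F'/'U' format."""
    if isinstance(value, str):
        text = value.strip().upper()
        if text in LETTER_CODES:
            return text
        if text.isdecimal():
            value = int(text)  # numeric code stored as text, e.g. "1"
    if isinstance(value, int) and not isinstance(value, bool) and value in NUMERIC_TO_LETTER:
        return NUMERIC_TO_LETTER[value]
    raise ValueError(f"unrecognized gender code: {value!r}")


print([to_letter_code(v) for v in ["m", 1, "2", "F"]])
```

Raising on unknown codes, rather than silently defaulting to 'U', keeps nonconforming values visible for review.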
Consistency
Do values represent the same meaning?
Example: Is a city name used consistently? New York, NY, NYC, and The Big Apple all refer to the same city.
Source : Data Quality Services FAQ
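A consistency fix can be sketched as a lookup from known synonyms to one leading value. The Python below is an illustration using the city names from the slide; the dictionary and the function name `canonical_city` are assumptions, not the DQS API.

```python
# Synonyms taken from the slide's example; keys are lowercased for lookup.
CITY_SYNONYMS = {
    "new york": "New York",
    "ny": "New York",
    "nyc": "New York",
    "the big apple": "New York",
}


def canonical_city(name):
    """Return the leading value for a known synonym, or the input unchanged."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    return CITY_SYNONYMS.get(key, name)


print([canonical_city(c) for c in ["NYC", "The Big Apple", "ny", "Boston"]])
```

Unknown names pass through unchanged so that new values can be reviewed rather than guessed.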
Accuracy
Do data objects accurately represent the “real-world” values they are expected to model?
Example: A customer’s address is a valid USPS address. However, the ZIP code is incorrect and the customer name contains a spelling mistake.
Source : Data Quality Services FAQ
Validity
Do data values fall within acceptable ranges? Example: Salary values should be between 60,000 and 120,000 for position levels 51 and 52.
Source : Data Quality Services FAQ
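A validity rule like the salary example can be expressed as a range check keyed by position level. The Python sketch below assumes the bounds are inclusive, which the slide does not specify; the names `SALARY_RANGES` and `is_valid_salary` are illustrative.

```python
# Range from the slide; treating both bounds as inclusive is an assumption.
SALARY_RANGES = {
    51: (60_000, 120_000),
    52: (60_000, 120_000),
}


def is_valid_salary(level, salary):
    """Return True/False for levels with a rule, or None when no rule applies."""
    bounds = SALARY_RANGES.get(level)
    if bounds is None:
        return None
    low, high = bounds
    return low <= salary <= high


print(is_valid_salary(51, 75_000), is_valid_salary(52, 130_000), is_valid_salary(40, 50_000))
```

Returning None for levels without a rule separates "invalid" from "not checked".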
Duplication
Are there multiple, unnecessary representations of the same data objects within your data set?
Name           | Address                      | Postal Code | City      | State
Mag. Smith     | 545 S Valley View D. # 136   | 34563       | <Anytown> | New York
Margaret smith | 545 Valley View ave unit 136 | 34563-2341  | <Anytown> | New-York
Maggie Smith   | 545 S Valley View Dr         |             | <Anytown> | NY.
Source : Data Quality Services FAQ
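To illustrate how such rows might be flagged as likely duplicates, the Python sketch below groups records by a crude match key (last name plus house number). This is a deliberately simple stand-in, not how DQS matching works; the key choice and the names `match_key` and `group_candidates` are assumptions.

```python
from collections import defaultdict

# Name and address values from the slide's example rows.
records = [
    {"name": "Mag. Smith", "address": "545 S Valley View D. # 136"},
    {"name": "Margaret smith", "address": "545 Valley View ave unit 136"},
    {"name": "Maggie Smith", "address": "545 S Valley View Dr"},
]


def match_key(record):
    """Build a crude key: lowercased last name plus the leading house number."""
    last_name = record["name"].split()[-1].strip(".").lower()
    house_number = record["address"].split()[0]
    return (last_name, house_number)


def group_candidates(rows):
    """Group rows that share a match key; each group is a duplicate candidate set."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row)
    return dict(groups)


candidate_groups = group_candidates(records)
for key, rows in candidate_groups.items():
    print(key, len(rows))
```

A key this coarse would also group unrelated Smiths at any house number 545, so groups are only candidates for closer comparison.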
DQS Mechanisms
Cleaning
Matching
Profiling
Monitoring
DQS – Why?
Knowledge-driven
Semantics
Knowledge feedback loop
Extensible
DQS Installation
New in SQL Server 2012
SQL Server 2012 – Enterprise or BI Edition
Installed from SQL Server 2012 Installer
DQS Installer Bug
SQL Server 2012 Installer
DQSInstaller.exe
DQS Demo
DQS Client
Knowledge Base Management
Cleansing Project
Matching Project
DQS + SSIS
DQS Cleansing Component
DQS + MDS
MDS Excel Add-In
Resources
Data Quality Services (MSDN): http://msdn.microsoft.com/en-us/library/ff877925.aspx
Data Quality Services Blog: http://blogs.msdn.com/b/dqs/
Data Quality Services Resources: http://technet.microsoft.com/en-us/sqlserver/hh780961
Data Quality Services FAQ: http://social.technet.microsoft.com/wiki/contents/articles/3919.data-quality-services-dqs-faq.aspx