Data Quality Initiative At the Botanic Garden and Botanical Museum - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Data Quality Initiative

At the Botanic Garden and Botanical Museum Berlin-Dahlem

David Fichtmueller 2013-10-29

slide-2
SLIDE 2

Match the Country Names

Country Name

ISO 3166-1 alpha 2 Code

slide-3
SLIDE 3

Match the Country Names

Италия Estados Unidos Siraaliyoon アイスランド

Country Name

ISO 3166-1 alpha 2 Code

US IS IT SL

slide-4
SLIDE 4

Match the Country Names

Италия Estados Unidos Siraaliyoon アイスランド

United States - Spanish

Country Name

ISO 3166-1 alpha 2 Code

US IS IT SL

slide-5
SLIDE 5

Match the Country Names

Италия Estados Unidos Siraaliyoon アイスランド

Sierra Leone - Somali United States - Spanish

Country Name

ISO 3166-1 alpha 2 Code

US IS IT SL

slide-6
SLIDE 6

Match the Country Names

Италия Estados Unidos Siraaliyoon アイスランド

Sierra Leone - Somali Italy - Russian United States - Spanish

Country Name

ISO 3166-1 alpha 2 Code

US IS IT SL

slide-7
SLIDE 7

Match the Country Names

Италия Estados Unidos Siraaliyoon アイスランド

Iceland - Japanese Sierra Leone - Somali Italy - Russian United States - Spanish

Country Name

ISO 3166-1 alpha 2 Code

US IS IT SL
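The matching exercise above can be sketched as a lookup table. This is a minimal illustration, not a DQI tool: the class name `CountryNameMatcher` and its methods are hypothetical, and the table holds only the four names from the slide. A real tool would load a full multilingual dataset of country names.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of any DQI library.
class CountryNameMatcher {
    private final Map<String, String> codeByName = new HashMap<>();

    CountryNameMatcher() {
        // Only the four names shown on the slide.
        add("Италия", "IT");          // Italy, Russian
        add("Estados Unidos", "US");  // United States, Spanish
        add("Siraaliyoon", "SL");     // Sierra Leone, Somali
        add("アイスランド", "IS");      // Iceland, Japanese
    }

    private void add(String name, String isoCode) {
        codeByName.put(normalize(name), isoCode);
    }

    // Trim, collapse whitespace, apply Unicode NFKC, and lower-case,
    // so that trivial spelling variants map to the same key.
    static String normalize(String name) {
        String collapsed = name.trim().replaceAll("\\s+", " ");
        return Normalizer.normalize(collapsed, Normalizer.Form.NFKC)
                .toLowerCase(Locale.ROOT);
    }

    // Returns the ISO 3166-1 alpha-2 code, or empty if the name is unknown.
    Optional<String> toIsoCode(String name) {
        return Optional.ofNullable(codeByName.get(normalize(name)));
    }
}
```

Calling `new CountryNameMatcher().toIsoCode("  estados   unidos ")` yields the code for the United States, because the input is normalized before the lookup. Names missing from the table give an empty result instead of a guess.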

slide-8
SLIDE 8

Data Quality Initiative (DQI)

4 Projects at the Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM) about DQ

slide-9
SLIDE 9

Goal

  • Avoid Duplicate Work
  • Create Better Tools
  • Share Knowledge
  • Make Tools/Knowledge public

– Open Source Software License

slide-10
SLIDE 10

What are Data Quality Tools?

  • Any Software that helps improve Data Quality

– Detect Errors and/or
– Correct Errors

  • Automated!

– Don't bring the data to the tools, but bring the tools to the data!

slide-11
SLIDE 11

How Data Quality Tools should work
slide-12
SLIDE 12

How Data Quality Tools should work

  • Data Quality Software: the software that accesses the research data to be checked

slide-13
SLIDE 13

How Data Quality Tools should work

  • Data Quality Software: the software that accesses the research data to be checked
  • Web Service: makes program logic accessible via the web (example: REST API); reached over HTTP

slide-14
SLIDE 14

How Data Quality Tools should work

  • Library: contains program logic and an API; depends on the programming language (example: Jar file for a Java library); used in the software
  • Data Quality Software: the software that accesses the research data to be checked
  • Web Service: makes program logic accessible via the web (example: REST API); reached over HTTP

slide-15
SLIDE 15

How Data Quality Tools should work

  • Data: independent of programming language; in a particular format (XML, JSON, CSV, …); example: dataset of country names
  • Library: contains program logic and an API; depends on the programming language (example: Jar file for a Java library); used in the software
  • Data Quality Software: the software that accesses the research data to be checked
  • Web Service: makes program logic accessible via the web (example: REST API); reached over HTTP

slide-16
SLIDE 16

How Data Quality Tools should work

  • Data: independent of programming language; in a particular format (XML, JSON, CSV, …); example: dataset of country names
  • Library: contains program logic and an API; depends on the programming language (example: Jar file for a Java library); used in the software
  • Data Quality Software: the software that accesses the research data to be checked
  • Web Service: makes program logic accessible via the web (example: REST API); reached over HTTP

Focus of the DQI
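The layering on this slide (data, library, web service) could look roughly like the following sketch. All class names, the `/country` path, and the `name` query parameter are assumptions made for illustration. The data is inlined as a two-entry map rather than loaded from an XML, JSON, or CSV file, and the HTTP part uses the JDK's built-in `com.sun.net.httpserver` package.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// "Library" layer: program logic behind a plain Java API.
// The "Data" layer is inlined here; a real library would read it from a file.
class CountryCodeLibrary {
    private static final Map<String, String> CODE_BY_NAME = Map.of(
            "estados unidos", "US",
            "siraaliyoon", "SL");

    // Returns the ISO 3166-1 alpha-2 code, or null if the name is unknown.
    static String lookup(String countryName) {
        if (countryName == null) return null;
        return CODE_BY_NAME.get(countryName.trim().toLowerCase(Locale.ROOT));
    }
}

// "Web Service" layer: exposes the library logic over HTTP (hypothetical API).
class CountryCodeService {
    static String toJson(String isoCode) {
        return isoCode == null
                ? "{\"isoCode\":null}"
                : "{\"isoCode\":\"" + isoCode + "\"}";
    }

    // Extracts the value of a query of the form "name=<value>".
    static String nameFromQuery(String query) {
        return (query != null && query.startsWith("name=")) ? query.substring(5) : null;
    }

    // Starts a server answering GET /country?name=<country name>.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/country", exchange -> {
            String name = nameFromQuery(exchange.getRequestURI().getQuery());
            byte[] body = toJson(CountryCodeLibrary.lookup(name))
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Data quality software written in Java can call `CountryCodeLibrary.lookup` directly. Software in any other language can use the same logic through the HTTP endpoint, which is the point of offering both a library and a web service.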

slide-17
SLIDE 17

Current Focus

  • Occurrence and Collection Data
  • Correction on individual values or combination of values of one individual
  • No group validation

– Outlier Detection
– Duplicate Detection

  • Programming Languages: Java and JavaScript
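A check on individual values and on a combination of values of one record might look like the sketch below. The class `OccurrenceRecordCheck`, its parameters, and the message texts are hypothetical. The set of valid country codes comes from the JDK's `Locale.getISOCountries()`. In line with the focus above, the check looks at one record only and does no outlier or duplicate detection across records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical single-record check for illustration; not a DQI tool.
class OccurrenceRecordCheck {
    // Two-letter ISO 3166 country codes known to the JDK.
    private static final Set<String> ISO_CODES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    // Returns a list of detected problems; an empty list means no error was found.
    static List<String> check(String countryCode, Double latitude, Double longitude) {
        List<String> problems = new ArrayList<>();

        // Individual value: the country code must be a known alpha-2 code.
        if (countryCode == null || !ISO_CODES.contains(countryCode)) {
            problems.add("unknown country code");
        }
        // Combination of values: coordinates only make sense as a pair.
        if ((latitude == null) != (longitude == null)) {
            problems.add("coordinate pair incomplete");
        }
        // Individual values: coordinates must lie in their valid ranges.
        if (latitude != null && Math.abs(latitude) > 90.0) {
            problems.add("latitude out of range");
        }
        if (longitude != null && Math.abs(longitude) > 180.0) {
            problems.add("longitude out of range");
        }
        return problems;
    }
}
```

This sketch only detects errors. Correcting them, for example by swapping a latitude and longitude that were entered in the wrong order, would be a separate step.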

slide-18
SLIDE 18

What can the DQI do for you?

Public Wiki: http://biowikifarm.net/dataquality

slide-19
SLIDE 19

What can you do for the DQI?

  • Let us know about good data sets / libraries / web services
  • Spread the word, join the discussion
  • Bundle your tools in a library
  • Improve existing tools
  • Turn a library into a web service
  • Suggest new tools
  • Port a library to a different language
slide-20
SLIDE 20

Future of the Data Quality Initiative

  • More and better tools
  • Fill the Wiki
  • Code Hosting and Bug Tracking
  • One DQ-Library to rule them all
  • Hosting for Web Services?
  • <Insert your idea here>
slide-21
SLIDE 21

Funding

slide-22
SLIDE 22

Thank You!

Questions?

Wiki: http://biowikifarm.net/dataquality

E-Mail: d.fichtmueller@bgbm.org