SLIDE 1

AN ONTOLOGY-BASED APPROACH TO ADMINISTRATIVE DATA SOURCES’ DOCUMENTATION AND QUALITY EVALUATION

G. D’Angiolini, P. De Salvo, A. Passacantilli, E. Patruno, T. Saccoccio

Istituto Nazionale di Statistica - ITALY

New Techniques and Technologies for Statistics – Brussels, 10-12 March 2015

SLIDE 2

Istat’s strategy for supporting the statistical usage of administrative data

Istat is launching a strategy to provide every actual or potential user of administrative data sources with NEW SERVICES:

  • SUPPLYING DOCUMENTATION about the available administrative data sources, in particular about
      • the INFORMATION CONTENT
      • the QUALITY
  • MAKING the available administrative data sources MORE USABLE for statistical purposes by modifying their content, when possible

An ontology-based approach to administrative data sources’ documentation and quality evaluation – NTTS 2015

SLIDE 3

Istat’s strategy: ACTIVITIES and TOOLS

ACTIVITIES
  • Administrative data sources’ INVESTIGATIONS
  • Administrative data sources’ SURVEYS
  • SUPERVISION ON CHANGES AND INNOVATION PROJECTS concerning administrative data sources and forms

TOOLS
  • DARCAP (Documenting Public Administration ARchives) system
      • Disseminates the collected information about the administrative data sources’ CONTENT and QUALITY
      • Supports the CHANGE NOTIFICATION activities
  • Modular Quality Assessment Framework for Administrative Data Sources
      • Organizes the collected information about QUALITY
      • Steers the quality evaluator in producing SPECIFIC RELIABILITY ESTIMATES

SLIDE 4

Supporting the statistical usage of administrative data: the need for STANDARD and MODULAR documentation

  • Today Istat, a non-Nordic NSI, aims at a massive usage of the available administrative data sources, which vary in information content and quality features
  • Hence the need for STANDARD and MODULAR DOCUMENTATION about the administrative data sources’ INFORMATION CONTENT and QUALITY
  • In all countries, NEW TRENDS imply that the NSIs assume a role of methodological regulation: non-statistical organizations implement their own Decision Support Systems, produce statistical information, use administrative data and exchange data

SLIDE 5

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ INFORMATION CONTENT: the goal

  • The statistical users need to answer both GENERAL and SPECIFIC questions concerning the INFORMATION CONTENT of the available administrative data sources
  • Answering SPECIFIC questions is crucial to effectively support the evaluation and usage of each source, as well as the COMPARISON and INTEGRATION of sources
  • To answer SPECIFIC questions, the statistical users need STANDARD and MODULAR documentation of the INFORMATION CONTENT

SLIDE 6

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ INFORMATION CONTENT: the goal

The statistical users need to answer both GENERAL and SPECIFIC questions concerning the INFORMATION CONTENT

  • a GENERAL QUESTION about the INFORMATION CONTENT: what phenomena does the administrative data source observe?
  • a SPECIFIC QUESTION about the INFORMATION CONTENT: does the source observe the set of events “Students’ enrollments”? What is the administrative definition of “Students’ enrollments”? Which characteristics are observed for “Students’ enrollments”?

The answers require the specification of the ADMINISTRATIVE DATA SOURCE’S ONTOLOGY according to a CONCEPTUAL MODEL

SLIDE 7

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ INFORMATION CONTENT: our approach

A CONCEPTUAL MODEL for documenting the ADMINISTRATIVE DATA SOURCE’S ONTOLOGY, based on a logical background

  • For each administrative data source, by means of a dedicated INVESTIGATION, we document the ADMINISTRATIVE DATA SOURCE’S ONTOLOGY by singling out all the collectives in the source, with their owned relationships and their owned characteristics
  • This STANDARD DOCUMENTATION of the INFORMATION CONTENT of the analyzed administrative data sources is available to any user in the DARCAP system

SLIDE 8

OUR CONCEPTUAL MODEL

Comparison of OUR CONCEPTS with COMMON LITERATURE CONCEPTS and SURVEY CONCEPTS

OUR CONCEPTS: COLLECTIVES
  • POPULATIONS or SETS OF EVENTS
  • many linked collectives in an ADS (administrative data source)
  • set of single items, elements
COMMON LITERATURE CONCEPTS: OBJECT SETS
  • often population and event are not separate concepts
  • many linked collectives in an ADS
  • their single items are called OBJECTS
SURVEY CONCEPTS: POPULATION
  • often population and event are not separate concepts
  • one main population, some linked ones
  • their single items are called STATISTICAL UNITS

OUR CONCEPTS: CHARACTERISTICS
  • quantitative or qualitative
  • have observation domains or classifications
  • set of pairs: collective element + classification (or domain) item
COMMON LITERATURE CONCEPTS: VARIABLES
  • quantitative or qualitative
  • often variable and classification are not separate concepts
SURVEY CONCEPTS: VARIABLES
  • quantitative or qualitative
  • often variable and classification are not separate concepts

OUR CONCEPTS: RELATIONSHIPS
  • set of pairs: collective element + linked element of another collective
COMMON LITERATURE CONCEPTS
  • often relationships are not explicitly taken into account
SURVEY CONCEPTS
  • often relationships are not explicitly taken into account

SLIDE 9

OUR CONCEPTUAL MODEL: Examples

For each of OUR CONCEPTS: definition, examples, and examples in logic form

COLLECTIVES
  • Definition: POPULATIONS or SETS OF EVENTS; set of single items, elements
  • Examples
      • POPULATIONS: Students
      • SETS OF EVENTS: Exams, Degree_events
  • Examples in logic form
      • POPULATIONS: Students (Rossi)
      • SETS OF EVENTS: Exams (Exam_i), Degree_events (Degree_event_i)

OWNED CHARACTERISTICS
  • Definition: set of pairs: collective element + classification (or domain) item
  • Examples: Has residence in + Town list; Assigned credits + [6,12]
  • Examples in logic form: Has residence in (Rossi, Rome); Assigned credits (Exam_i, 12)

OWNED RELATIONSHIPS
  • Definition: set of pairs: collective element + linked element of another collective
  • Examples: Passed_by; Concerns
  • Examples in logic form: Passed_by (Exam_i, Rossi); Concerns (Degree_event_i, Rossi)
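The logic-form examples on this slide can be read as plain sets of elements and sets of pairs. The following is only a minimal sketch of that reading, not the DARCAP data model: the dictionary layout, the underscore spelling of the predicate names and the helper `holds` are assumptions made for illustration, and `Exam_i` stands for the i-th exam.

```python
# Hypothetical encoding of the slide's logic-form examples; all names are illustrative.

# Collectives: each collective is a set of single items (elements).
collectives = {
    "Students": {"Rossi"},                # a population
    "Exams": {"Exam_i"},                  # a set of events
    "Degree_events": {"Degree_event_i"},  # a set of events
}

# Owned characteristics: sets of (collective element, domain item) pairs.
characteristics = {
    "Has_residence_in": {("Rossi", "Rome")},
    "Assigned_credits": {("Exam_i", 12)},
}

# Owned relationships: sets of (collective element, linked element) pairs.
relationships = {
    "Passed_by": {("Exam_i", "Rossi")},
    "Concerns": {("Degree_event_i", "Rossi")},
}


def holds(predicate, *args):
    """Return True if the logic-form statement predicate(args) is recorded."""
    if predicate in collectives and len(args) == 1:
        return args[0] in collectives[predicate]
    for table in (characteristics, relationships):
        if predicate in table:
            return tuple(args) in table[predicate]
    return False
```

With this encoding, a SPECIFIC question such as "is Rossi recorded as a student?" becomes the membership test `holds("Students", "Rossi")`.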

SLIDE 10

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: the goal

The statistical users need to answer both GENERAL and SPECIFIC questions concerning the QUALITY

  • a GENERAL QUESTION about the QUALITY: does the administrative data source have an acceptable overall quality?
  • a SPECIFIC QUESTION about the QUALITY: does the administrative source collect reliable information about the set of events “Students’ enrollments”? And about the characteristic “Owned degree”?

The answers require a MODULAR QUALITY ASSESSMENT CHECKLIST

IT IS A DIFFICULT TASK: surveys and administrative data sources are very different observation processes

SLIDE 11

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: the goal

IT IS A DIFFICULT TASK: surveys and administrative data sources are very different observation processes

  • Different DATA COLLECTION PROCEDURES: surveys are designed as snapshots of the observed collectives at specified moments, whereas administrative data sources collect new information continuously, at any moment; in particular, they observe sets of events that occur over time
  • Different QUALITY DETERMINANTS: the administrative sources’ data are often affected by systematic errors, and only the administrative source’s experts can provide the data users with proper information about the effects of such errors

SLIDE 12

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: our approach

A MODULAR QUALITY ASSESSMENT FRAMEWORK based on a MODULAR QUALITY ASSESSMENT CHECKLIST

  • For each administrative data source, by means of a dedicated INVESTIGATION, we interview the data source’s expert about the existing sources of errors and, when possible, we perform in-depth quality analyses according to a MODULAR QUALITY ASSESSMENT CHECKLIST
  • We do so because we want to assign specific quality indicators to each collective, owned relationship and owned characteristic in the administrative data source’s ontology

SLIDE 13

DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: our approach

We are now building the MODULAR QUALITY ASSESSMENT CHECKLIST

  • Collectives, owned relationships and owned characteristics in the administrative data source are associated with different kinds of minimal information units, which are affected by different kinds of minimal errors
  • We are now singling out all such minimal errors and combining them in order to define all the possible errors which may affect each component of the administrative data source’s ontology: each collective, owned relationship and owned characteristic

SLIDE 14

BUILDING the MODULAR QUALITY ASSESSMENT CHECKLIST

KINDS OF MINIMAL INFORMATION UNITS in any administrative data source

  • Collective belonging statements
      • Rossi belongs to the collective Students: Students (Rossi)
      • Exam_i belongs to the collective Exams: Exams (Exam_i)
      • Degree_event_i belongs to the collective Degree_events: Degree_events (Degree_event_i)
  • Owned characteristic statements
      • Rossi Has residence in Rome: Has residence in (Rossi, Rome)
      • Exam_i Assigned credits 12: Assigned credits (Exam_i, 12)
  • Owned relationship statements
      • Exam_i Passed_by Rossi: Passed_by (Exam_i, Rossi)
      • Degree_event_i Concerns Rossi: Concerns (Degree_event_i, Rossi)
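As an illustration of the three kinds of minimal information units, one administrative record can be split into one belonging statement, one owned characteristic statement and one owned relationship statement. This is a sketch under assumptions: the record layout (`exam_id`, `credits`, `student_id`), the `Statement` class and the function name are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One minimal information unit in logic form."""
    kind: str        # "belonging", "characteristic" or "relationship"
    predicate: str   # e.g. "Exams", "Assigned credits", "Passed_by"
    args: tuple      # the elements the predicate applies to

    def __str__(self):
        return f"{self.predicate} ({', '.join(map(str, self.args))})"


def units_from_exam_record(record):
    """Split one hypothetical exam record into its minimal information units."""
    exam = record["exam_id"]
    return [
        Statement("belonging", "Exams", (exam,)),
        Statement("characteristic", "Assigned credits", (exam, record["credits"])),
        Statement("relationship", "Passed_by", (exam, record["student_id"])),
    ]


units = units_from_exam_record(
    {"exam_id": "Exam_i", "credits": 12, "student_id": "Rossi"}
)
for unit in units:
    print(unit.kind, "->", unit)
```

Keeping each unit separate is what makes the assessment modular: an error can then be attached to a single statement rather than to the whole record.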

SLIDE 15

BUILDING the MODULAR QUALITY ASSESSMENT CHECKLIST

KINDS of MINIMAL ERRORS for MINIMAL INFORMATION UNITS

Example: Belonging statements, such as Students (Rossi), Exams (Exam_i), Degree_events (Degree_event_i)

ACCEPTANCE ERRORS
  • Exclusion errors
      • Example: Rossi belongs to Students, but there is no record in the administrative data source
  • Inclusion errors
      • Example: Exam_i does not exist, but there is a record in the administrative data source

IDENTIFICATION ERRORS
  • Syntax errors (wrongly built identification code)
  • Semantic errors (missing identification code, duplicate identification code, etc.)

LINK CODE ERRORS: defined only for Owned relationship statements
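The acceptance and identification errors listed on this slide can be illustrated with simple set and string checks. The sketch below rests on two loud assumptions that are not in the presentation: that a trusted reference set of the elements that really exist is available for comparison, and that a valid identification code is the letter "S" followed by four digits. Function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a valid identification code is "S" followed by four digits.
CODE_PATTERN = re.compile(r"S\d{4}")


def acceptance_errors(recorded, reference):
    """Compare recorded elements with a trusted reference set (an assumption)."""
    return {
        "exclusion": reference - recorded,  # exist, but have no record
        "inclusion": recorded - reference,  # have a record, but do not exist
    }


def identification_errors(codes):
    """Classify identification codes; None stands for a missing code."""
    present = [c for c in codes if c is not None]
    counts = Counter(present)
    return {
        # Syntax error: the code is not built according to the pattern.
        "syntax": sorted({c for c in present if not CODE_PATTERN.fullmatch(c)}),
        # Semantic errors: missing codes and codes used more than once.
        "missing": sum(1 for c in codes if c is None),
        "duplicate": sorted(c for c, n in counts.items() if n > 1),
    }
```

In practice the true collective is rarely known, which is why the slides rely on interviews with the data source's experts; the reference set here only makes the two definitions concrete.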

SLIDE 16

The advantages of our approach and the future developments

We are developing a single, standard method for

  • Documenting the information content of any administrative data source in a comparable way
  • Assigning a standard, specific quality indicator to each component of such information content, in a modular way

This is a mandatory step for defining a FORMAL quality assessment framework for administrative data sources, based on assigning probability estimates to the possible errors which may affect each component of the administrative data source’s ontology: each collective, owned relationship and owned characteristic
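One possible reading of "assigning probability estimates to the possible errors" is a table of per-component, per-error estimates that can be combined into a single indicator. The sketch below is purely illustrative: the numbers are invented placeholders, not Istat figures, and the combination rule assumes that the errors are independent, which the presentation does not state.

```python
from math import prod

# Placeholder estimates, invented purely for illustration (not Istat figures).
error_estimates = {
    "Students": {"exclusion": 0.02, "inclusion": 0.01},      # a collective
    "Passed_by": {"exclusion": 0.05, "link_code": 0.10},     # an owned relationship
}


def error_free_probability(estimates):
    """Probability that no listed error occurs, ASSUMING independent errors."""
    return prod(1.0 - p for p in estimates.values())


# One specific indicator per component of the ontology.
reliability = {
    name: error_free_probability(estimates)
    for name, estimates in error_estimates.items()
}
```

Because each component carries its own estimates, a user interested only in `Passed_by` can read its indicator without evaluating the whole source.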