SLIDE 1

USE OF GEOSPATIAL AND WEB DATA FOR OECD STATISTICS

CCSA SPECIAL SESSION ON SHOWCASING BIG DATA, 1 OCTOBER 2015

Paul Schreyer, Deputy-Director, Statistics Directorate, OECD

SLIDE 2

OECD APPROACH

SLIDE 3
  • OECD:
    – Facilitator of discussion on new data sources for national statistical offices (NSOs)
    – OECD’s own use of new data sources

  • From Big Data to Smart Data:
    – Not every new data source is big
    – Not every big data source is new

SLIDE 4

Business value analysis: why are we working on this?

  • More granularity or coverage of existing data (e.g., spatial disaggregation)
  • New output (e.g., measuring trust, inequalities)
  • Greater timeliness – nowcasting
  • Increased impact – analysis supporting the OECD mission, possibility of linking areas
  • Increased responsiveness – capacity to address new topics quickly and to respond to what-if questions

SLIDE 5

Business process analysis: necessary capabilities

  – Capacity to identify, evaluate and access new data sources
  – Command of methodology
  – Proven quality and metadata frameworks
  – Suitable IT infrastructures
  – Established legal and ethical frameworks
  – Skills and training capacity

SLIDE 6

4 types of new sources and examples of use cases

Types of new sources: web crawling and web scraping; content analysis; mobility studies; sensor and geospatial data

Examples of use cases:
  • Online real estate prices (OECD GOV)
  • Measuring trade restrictiveness by scraping and analysing trade laws (OECD TAD)
  • African Economic Outlook (AEO): civil tensions and political governance indicators (OECD DEV)
  • Big Data Measures of Human Well-Being – Evidence from US Google Index (OECD STD)
  • Measuring transport reliability from geolocalisation logs (ITF)
  • Air quality and land cover data (OECD GOV)
  • Enriching the metropolitan database using geospatial data (OECD GOV)
  • PIAAC log file data (OECD EDU)

SLIDE 7

EXAMPLE 1: ENVIRONMENTAL INDICATORS

Using geospatial data (satellite data)

SLIDE 8

Average population exposure to air pollution (PM2.5)

Key messages that the indicator should communicate:
  – Where air pollution is above recommended levels
  – Where improvements in air quality have happened
  – Linking air pollution to health

SLIDE 9

Source: raster (satellite observations)

Ground-based stations
  Advantages
  • Direct measures
  • Offer regular measurements of air pollution levels over time
  • More pollutants are available
  Disadvantages
  • Low coverage in developing countries
  • Uneven coverage within and across countries
  • PM2.5 concentration rarely monitored
  • Site selection, measurement techniques and reporting methods differ across regions and countries

Satellite observations
  Advantages
  • Global coverage
  • Consistent method to compute air pollution in cities, regions and countries
  • Consistent time-series data, spanning more than a decade
  Disadvantages
  • Modelled data
  • Less precise for bright surfaces (snow or desert)
  • Current data are multi-year averages, so evaluation of short-term events is often unavailable

Satellite data used
  • Raster: van Donkelaar et al. (2014)
  • Resolution: ~10 km²
  • Years: 1998–2012
SLIDE 10
Basic methodology

  1. The satellite-based values of air pollution are multiplied by the population living in the area (using a 1 km² resolution grid).
  2. The exposure to air pollution in a region is given by the sum of the population-weighted values of PM2.5 in the 1 km² grid cells falling within the boundaries of the region.
  3. Finally, dividing this aggregated value by the total population of the region gives the average exposure to PM2.5 concentration in that region.
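The three steps amount to a population-weighted mean: for a region r, average exposure = Σ_i (PM2.5_i × pop_i) / Σ_i pop_i, summed over the 1 km² grid cells i inside r. A minimal Python sketch follows, assuming the satellite PM2.5 values have already been assigned to each 1 km² population cell and that every cell belongs to exactly one region (cells straddling a boundary are not split). The function name and the (region, pm25, population) input layout are hypothetical, not part of the OECD workflow.

```python
from collections import defaultdict


def regional_pm25_exposure(cells):
    """Population-weighted average PM2.5 exposure per region.

    `cells` is an iterable of (region_id, pm25, population) tuples, one per
    1 km^2 grid cell (hypothetical layout). Each cell is assumed to lie in
    exactly one region and to carry the satellite PM2.5 value of its location.
    """
    weighted = defaultdict(float)    # steps 1-2: sum of pm25 * population
    population = defaultdict(float)  # total population per region
    for region, pm25, pop in cells:
        weighted[region] += pm25 * pop
        population[region] += pop
    # Step 3: divide by the region's total population; skip uninhabited regions.
    return {
        region: weighted[region] / population[region]
        for region in weighted
        if population[region] > 0
    }


cells = [("A", 10.0, 100), ("A", 30.0, 300), ("B", 20.0, 50)]
print(regional_pm25_exposure(cells))  # {'A': 25.0, 'B': 20.0}
```

Region A shows why the weighting matters: its two cells average 20 unweighted, but (1000 + 9000) / 400 = 25 because most residents live in the more polluted cell.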

SLIDE 11
Levels and trends in OECD cities

  • 68% of the urban population in OECD countries (376 million people) is exposed to pollution above the WHO’s recommended levels.
  • OECD estimates show wide variation in PM2.5 exposure levels across cities within countries, the largest being in Mexico, Italy, Japan and Korea.

[Figure: PM2.5 exposure in OECD metropolitan areas by country (number of cities in brackets), showing the metropolitan minimum, country average and metropolitan maximum; countries range from Mexico (33), Italy (11), Japan (36) and Korea (10) to countries with a single metropolitan area such as Norway (1) and Ireland (1).]

Source: Brezzi and Sanchez-Serra (2014)
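Both headline figures on this slide can be derived from city-level exposures. The sketch below is hypothetical: it assumes each city is represented by its total population and its population-weighted PM2.5 exposure, and it uses the WHO annual-mean guideline of 10 µg/m³ (the 2005 value, still in force in 2015) as the default threshold; the slide does not state which threshold or data layout the OECD used. The second function gives the gap between the most and least exposed metropolitan area in each country, one simple way to express the within-country variation described above.

```python
WHO_PM25_ANNUAL_GUIDELINE = 10.0  # µg/m³, WHO 2005 annual-mean guideline (assumed threshold)


def share_above_threshold(cities, threshold=WHO_PM25_ANNUAL_GUIDELINE):
    """Share of the total population living in cities whose exposure exceeds `threshold`.

    `cities` is an iterable of (population, pm25_exposure) pairs (hypothetical layout).
    """
    cities = list(cities)
    total = sum(pop for pop, _ in cities)
    exposed = sum(pop for pop, pm25 in cities if pm25 > threshold)
    return exposed / total if total else 0.0


def within_country_gap(exposure_by_country):
    """Metropolitan maximum minus metropolitan minimum exposure, per country."""
    return {
        country: max(values) - min(values)
        for country, values in exposure_by_country.items()
        if values
    }


print(share_above_threshold([(300, 12.0), (100, 8.0)]))  # 0.75
print(within_country_gap({"X": [5.0, 20.0, 12.0], "Y": [9.0]}))  # {'X': 15.0, 'Y': 0.0}
```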

SLIDE 12

Other example: raster sources used for land cover

  • Europe: Corine Land Cover; resolution 25 metres; years 2000–06
  • USA: National Land Cover Dataset (NLCD); resolution 30 metres; years 2001–06
  • Japan: Japan National Land Service Information data; resolution 100 metres; years 1997–2006
  • World: MODIS 500 Map of Global Urban Extent; resolution 500 metres; year 2008

Classification of urban land: 4; 4 land urban classes; 21 land cover classes; 11 land cover classes; 17 land cover classes; water

SLIDE 13

…feeds into the OECD Regional Well-Being Database

Links: Regional Well-Being database; Regional Well-Being web tool

SLIDE 14

EXAMPLE 2: TRADE POLICY ANALYSIS

Using qualitative data from government websites

SLIDE 15

Basic idea

Traditionally:

  • Policy questionnaires to countries
  • ‘Manual’ screening of government websites

New:

  • Machine-based monitoring of government websites
  • Automatic checks for changes to, or additions of, rules and regulations

Test case: qualitative information for the OECD’s trade restrictiveness information and index
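One simple way to automate the check for changed or added rules is to store a fingerprint of each monitored page and compare snapshots between crawls. This is a sketch under stated assumptions: pages are already available as plain text keyed by URL, differences in whitespace alone are ignored, and SHA-256 hashing from Python's standard `hashlib` stands in for whatever change-detection method a production system would use. The function names and example URLs are hypothetical.

```python
import hashlib


def fingerprint(text):
    """SHA-256 of the page text with runs of whitespace collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def snapshot(pages):
    """Map each URL to the fingerprint of its text; `pages` is {url: text}."""
    return {url: fingerprint(text) for url, text in pages.items()}


def compare_snapshots(previous, current):
    """List URLs that are new, changed or no longer present between two snapshots."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "changed": sorted(
            url for url in current.keys() & previous.keys()
            if current[url] != previous[url]
        ),
        "removed": sorted(previous.keys() - current.keys()),
    }


before = snapshot({"/decree-1": "Art. 1  Duty: 5%", "/decree-2": "Art. 1 Quota"})
after = snapshot({
    "/decree-1": "Art. 1 Duty: 5%",           # whitespace only: not a change
    "/decree-2": "Art. 1 Quota (amended)",
    "/decree-3": "Art. 1 New licensing rule",
})
print(compare_snapshots(before, after))
# {'added': ['/decree-3'], 'changed': ['/decree-2'], 'removed': []}
```

A hash only says that a page changed, not where; locating the changed paragraphs is the job of the text comparison step.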

SLIDE 16

How?

Text comparison – initial discovery
  • Run a text comparison between the original document and the updated document
  • Detect and flag specific paragraphs changed or updated inside long documents

Text comparison – advanced discovery
  • Changes in rules and regulations can also happen through new pages
  • Use ‘big data’ techniques to compare in-house structured information with the universe of laws and regulations in a given country
  • Work on text definitions similar to the original ones to help identify potentially relevant documents
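The initial-discovery step (flagging changed paragraphs inside a long document) can be sketched with Python's standard `difflib.SequenceMatcher`, matching whole paragraphs rather than characters. This is an illustration only, not necessarily the tool used in the OECD pilot; it assumes paragraphs are separated by blank lines, and `changed_paragraphs` is a hypothetical name.

```python
import difflib


def changed_paragraphs(original, updated):
    """Flag paragraphs replaced, deleted or inserted between two versions of a text.

    Paragraphs are assumed to be separated by blank lines. Returns a list of
    (tag, old_text, new_text) with tag in {"replace", "delete", "insert"}.
    """
    old = [p.strip() for p in original.split("\n\n") if p.strip()]
    new = [p.strip() for p in updated.split("\n\n") if p.strip()]
    # autojunk=False disables difflib's "popular element" heuristic on long documents.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    flags = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flags.append((tag, "\n\n".join(old[i1:i2]), "\n\n".join(new[j1:j2])))
    return flags


original = "Art. 1 The duty is 5%.\n\nArt. 2 A quota applies."
updated = "Art. 1 The duty is 8%.\n\nArt. 2 A quota applies.\n\nArt. 3 A licence is required."
for tag, before, after in changed_paragraphs(original, updated):
    print(tag, "|", before, "|", after)
# replace | Art. 1 The duty is 5%. | Art. 1 The duty is 8%.
# insert |  | Art. 3 A licence is required.
```

Comparing at paragraph level keeps the output readable for long legal texts: an analyst sees which article changed rather than a character-level diff.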

SLIDE 17

IT tools

  • Web crawling: scripts to systematically scan government websites where regulations can be found (federal, provincial, regional, etc.)
  • Web scraping: scripts to extract the relevant information from documents, possibly article by article and paragraph by paragraph (text analysis)
  • Document conversion: most laws and regulations are in PDF, but some may be in other formats; all need to be converted to text documents before text analysis can be run
  • Text comparison: tools and dictionaries to compare the text of updated documents with the original text and to calculate similarity coefficients with other documents, in a variety of languages, with the option of also using the proximity of similar words
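A minimal sketch of the web-crawling step using only the Python standard library: a breadth-first crawl that stays on the starting host and returns the HTML of each page it reaches. The page-fetching function is passed in (hypothetical here) so that HTTP handling, robots.txt checks and rate limiting, which any real crawl of government sites would need, stay outside the sketch; `max_pages` is an illustrative cap, and the example URLs use the reserved `.example` domain.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to the host of `start_url`.

    `fetch(url)` returns the page HTML as a string, or None on failure.
    Returns {url: html} for every page fetched, at most `max_pages` pages.
    """
    host = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]  # drop in-page anchors
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Offline stand-in for a government site (hypothetical URLs).
site = {
    "https://gov.example/": '<a href="/laws">Laws</a> <a href="https://other.example/">Other</a>',
    "https://gov.example/laws": '<a href="/laws/decree-1#art2">Decree 1</a> <a href="/">Home</a>',
    "https://gov.example/laws/decree-1": "<p>Article 1 ...</p>",
}
print(sorted(crawl("https://gov.example/", site.get)))
# ['https://gov.example/', 'https://gov.example/laws', 'https://gov.example/laws/decree-1']
```

Injecting `fetch` also makes the crawler testable offline, as above, where a plain dictionary plays the role of the website.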

SLIDE 18

Web scraping / Text analysis

Promising results on French legal texts (Legifrance)
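The similarity coefficients listed among the text-comparison tools can be illustrated with cosine similarity between word-count vectors, a common bag-of-words measure; it is a stand-in, not necessarily the measure used in the Legifrance tests. It ignores word order, stemming and synonyms, so the "proximity of similar words" option would need dictionaries or word embeddings on top. The function names, example documents and the 0.3 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-cased word counts; the word pattern is Unicode-aware, so accented letters are kept."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts (0 to 1)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(reference, documents, threshold=0.3):
    """Return (doc_id, score) pairs scoring at least `threshold`, most similar first."""
    scored = [(doc_id, cosine_similarity(reference, text)) for doc_id, text in documents.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


documents = {
    "decree-12": "An import licence is required for steel products.",
    "decree-13": "A tax holiday is granted to new investors.",
}
for doc_id, score in rank_candidates("Import licence for steel", documents):
    print(doc_id, round(score, 2))  # decree-12 0.71
```

Ranking against a threshold mirrors the advanced-discovery idea: documents whose wording resembles an existing in-house definition are surfaced for a human to review.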

SLIDE 19
Summary

  • Significant potential
  • Use cases and pilots provide important reality checks
  • Smart data and multiple sources, not necessarily big data
  • Initiatives have sprung up in many parts of the OECD
  • They need to be accompanied by the overall strategy being developed at the OECD

SLIDE 20

Thank you!