SLIDE 1
Public Data: Enhancing Data Discovery and Exploration
Benjamin Yolken (yolken@google.com)
September 2011
SLIDE 2 Overview
Disseminating public statistics
Google tools
- Public Data Explorer
- Fusion Tables
- Refine
Conclusion
SLIDE 3
Disseminating Statistics
SLIDE 4
Objective
Make public statistics accessible, useful, and well-organized.
SLIDE 5
SLIDE 6
SLIDE 7
Public statistics (2)
SLIDE 8
Public statistics (4)
SLIDE 9 Accessible...
(1) Access: Data need to be online and findable
- Provider web sites
- Third-party aggregators
- Search engines
(2) Understanding: Statisticians aren't the only users
- Lay users: Teachers, students, journalists, policy makers
- Computers: Search engines
- If not accessible to non-experts, data can go unused or, worse, be misused
SLIDE 10
Useful...
- There are a lot of distractions today: tables and simple plots are not enough
- Need to engage not just users' eyes, but also their brains
SLIDE 11 Well-organized...
Go beyond flat lists of data...
- Topics
- Time periods
- Geographic regions
- Formats
- Languages, etc...
Ultimately, all of this depends on having good metadata
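To make the point about metadata concrete, here is a minimal Python sketch, assuming a hypothetical catalog in which each dataset carries topic, region, time-period, language, and format fields. The `DatasetRecord` class, the helper functions, and the sample entries are all invented for illustration; they only show how such metadata lets a flat list be filtered along any combination of facets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; fields mirror the facets listed above."""
    title: str
    topic: str
    region: str
    start_year: int
    end_year: int
    language: str
    fmt: str

def filter_catalog(records, **facets):
    """Return records whose fields equal every given facet value (e.g. topic="labor")."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in facets.items())]

def covering_year(records, year):
    """Return records whose time period includes the given year."""
    return [r for r in records if r.start_year <= year <= r.end_year]

# Invented sample entries, for illustration only.
catalog = [
    DatasetRecord("Unemployment rate", "labor", "EU", 1990, 2010, "en", "csv"),
    DatasetRecord("Population by age", "demographics", "EU", 2000, 2011, "fr", "csv"),
    DatasetRecord("Minimum wage", "labor", "US", 1980, 2011, "en", "xls"),
]

labor_en = filter_catalog(catalog, topic="labor", language="en")
print([r.title for r in labor_en])                      # ['Unemployment rate', 'Minimum wage']
print([r.title for r in covering_year(catalog, 1985)])  # ['Minimum wage']
```

Without those fields, the only option would be scanning titles; with them, topic, period, and language become independent ways into the same list.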
SLIDE 12
Google Tools
SLIDE 13 Public Data Explorer (PDE)
What it is:
- Stand-alone product for interactively exploring and visualizing rich datasets
- Visualizations can be shared or embedded on third-party sites
What it's good for:
- Reaching out to non-expert users
- Getting traffic to your site
- Categorical, aggregated, time-series data
Caveats:
- Datasets must be in Dataset Publishing Language (DSPL) format
  ○ Some tools are available to help
  ○ Converters from other formats, such as SDMX, are in development
SLIDE 14 PDE: Demo
Demo link
SLIDE 15 PDE: Embed
Demo link
SLIDE 16 Fusion Tables
What it is:
- Product for creating, editing, and sharing tabular data
What it's good for:
- Table edits and transformations: joining, filtering, aggregating, etc.
- Static visualizations, particularly maps
- Exposing data to users via APIs
Caveats:
- Not connected to PDE (yet)
- Not as useful for time series exploration
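The table operations listed for Fusion Tables (joining, filtering, aggregating) can be sketched in plain Python to show what each one does. This is not the Fusion Tables API; the tables, column names, and values below are hypothetical, and rows are modeled simply as dictionaries.

```python
from collections import defaultdict

# Hypothetical tables; values are illustrative only.
population = [
    {"country": "FR", "population": 65},
    {"country": "DE", "population": 82},
    {"country": "IT", "population": 60},
]
regions = [
    {"country": "FR", "region": "West"},
    {"country": "DE", "region": "West"},
    {"country": "IT", "region": "South"},
]

def join(left, right, key):
    """Inner join on `key`; assumes each key appears at most once in `right`."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def filter_rows(rows, predicate):
    """Keep only the rows for which `predicate(row)` is true."""
    return [row for row in rows if predicate(row)]

def aggregate_sum(rows, group_col, value_col):
    """Sum `value_col` within each distinct value of `group_col`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)

joined = join(population, regions, "country")
large = filter_rows(joined, lambda row: row["population"] > 61)
print([row["country"] for row in large])              # ['FR', 'DE']
print(aggregate_sum(joined, "region", "population"))  # {'West': 147, 'South': 60}
```

The join builds a lookup index on the right-hand table, so it assumes the key is unique there; a many-to-many join would need a list of matching rows per key.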
SLIDE 17 Fusion Tables: Demo
Demo link
SLIDE 18 Google Refine
What it is:
- Desktop-based tool for cleaning up and transforming tabular data
What it's good for:
- Bulk data transformations
- Faceted data browsing
- Outlier detection and cleanup
Caveats:
- No collaboration features (yet)
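The ideas behind faceted browsing and cleanup can be sketched in a few lines of Python. This is not how Refine is implemented: the text facet below just counts distinct values, the cleanup key only trims and lowercases (so variant spellings collapse into one facet entry), and the outlier rule is the common 1.5 × IQR fence, chosen here as a simple stand-in. The sample values are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical messy inputs, as they might arrive from a spreadsheet export.
countries = ["France", "france ", "FRANCE", "Germany", "germany", "Italy"]
values = [12.0, 11.5, 12.3, 11.9, 12.1, 98.0, 12.2]

def facet(column):
    """Text facet: count how often each distinct value appears."""
    return Counter(column)

def normalize(value):
    """Simplified cleanup key: trim surrounding whitespace and lowercase."""
    return value.strip().lower()

def iqr_outliers(column, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(column, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in column if v < low or v > high]

print(len(facet(countries)))                         # 6 distinct raw spellings
print(dict(facet(normalize(c) for c in countries)))  # {'france': 3, 'germany': 2, 'italy': 1}
print(iqr_outliers(values))                          # [98.0]
```

Faceting before and after normalization shows at a glance which variants were merged, which is the kind of feedback that makes bulk cleanup safer.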
SLIDE 19
Google Refine
SLIDE 20
Conclusion
- Need to make statistics accessible, useful, and well-organized
- Google has tools that can help
- Key advice: think about the users and their needs
- A really exciting area; we have only scratched the surface of what's possible
SLIDE 21
Thank you! Questions?
SLIDE 22
Appendix
SLIDE 23
PDE Intro Video
SLIDE 24 PDE: Metadata
Dataset Publishing Language (DSPL)
- Designed for interactive exploration and visualization
- Open source, released under the BSD license
- Combines data tables (CSV) with metadata (XML)
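The split between data and metadata can be illustrated with a small Python sketch: a CSV table that holds only values, plus a separate XML document describing its columns and pointing at the file. The element and attribute names here (`dataset`, `table`, `column`, `file`) and the helper functions are simplified, hypothetical placeholders, not the actual DSPL schema; consult the DSPL specification for the real structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data table: the CSV carries only the values.
rows = [("country", "year", "population"),
        ("FR", "2010", "65"),
        ("DE", "2010", "82")]

def write_csv(rows):
    """Serialize rows to CSV text (in memory, for the example)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def build_metadata(table_id, filename, columns):
    """Describe the CSV's columns in XML; tag names are illustrative, not DSPL's."""
    root = ET.Element("dataset")
    table = ET.SubElement(root, "table", id=table_id)
    for col_id, col_type in columns:
        ET.SubElement(table, "column", id=col_id, type=col_type)
    ET.SubElement(table, "file", format="csv").text = filename
    return ET.tostring(root, encoding="unicode")

def columns_match(csv_text, xml_text):
    """Check that the metadata lists exactly the CSV header's columns, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    described = [c.get("id") for c in ET.fromstring(xml_text).iter("column")]
    return header == described

csv_text = write_csv(rows)
xml_text = build_metadata("population_by_country", "population.csv",
                          [("country", "string"), ("year", "integer"),
                           ("population", "integer")])
print(columns_match(csv_text, xml_text))  # True
```

Keeping the values in plain CSV lets them be produced by any spreadsheet or database export, while the XML layer carries the types and descriptions a visualization tool needs; the consistency check guards against the two drifting apart.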
SLIDE 25
PDE: Dataset Creation and Upload