Public Data: Enhancing Data Discovery and Exploration, Benjamin Yolken




SLIDE 1

Public Data

Enhancing Data Discovery and Exploration Benjamin Yolken (yolken@google.com)

September 2011

SLIDE 2

Overview

Disseminating public statistics

Google tools

  • Public Data Explorer
  • Fusion Tables
  • Refine

Conclusion

SLIDE 3

Disseminating Statistics

SLIDE 4

Objective

Make public statistics accessible, useful, and well-organized.

SLIDE 5
SLIDE 6
SLIDE 7

Public statistics (2)

SLIDE 8

Public statistics (4)

SLIDE 9

Accessible...

(1) Access: Data need to be online and findable

  • Provider web sites
  • Third-party aggregators
  • Search engines

(2) Understanding: Statisticians aren't the only users

  • Lay users: Teachers, students, journalists, policy makers
  • Computers: Search engines
  • If not accessible to non-experts, data can become unused or, worse, misused

SLIDE 10

Useful...

There are a lot of distractions today: tables and simple plots are not enough.

Need to engage not just with users' eyes, but also their brains.

SLIDE 11

Well-organized...

Go beyond flat lists of data...

  • Topics
  • Time periods
  • Geographic regions
  • Formats
  • Languages, etc...

Ultimately, depends on having good metadata

SLIDE 12

Google Tools

SLIDE 13

Public Data Explorer (PDE) [Link]

What it is:

  • Stand-alone product for interactively exploring and visualizing rich datasets

  • Visualizations can be shared or embedded on 3rd party sites

What it's good for:

  • Reaching out to non-expert users
  • Getting traffic to your site
  • Categorical, aggregated, time-series data

Caveats:

  • Datasets must be in Dataset Publishing Language (DSPL) format

      ○ Have some tools to help
      ○ Working on converters from other formats like SDMX

SLIDE 14

PDE: Demo

Demo link

SLIDE 15

PDE: Embed

Demo link

SLIDE 16

Fusion Tables [Link]

What it is:

  • Product for creating, editing, and sharing tabular data

What it's good for:

  • Table edits and transformations: joining, filtering, aggregating, etc.
  • Static visualizations, particularly maps
  • Exposing data to users via APIs

Caveats:

  • Not connected to PDE (yet)
  • Not as useful for time series exploration
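
The table edits listed above (joining, filtering, aggregating) can be sketched in plain Python. This is only an illustration of the operations themselves, not of the Fusion Tables interface or API; the table contents, column names, and the helper functions `join` and `aggregate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample tables; the figures are invented for illustration.
population = [
    {"country": "A", "year": 2010, "population": 100},
    {"country": "B", "year": 2010, "population": 250},
    {"country": "C", "year": 2010, "population": 50},
]
regions = [
    {"country": "A", "region": "North"},
    {"country": "B", "region": "North"},
    {"country": "C", "region": "South"},
]


def join(left, right, key):
    """Inner-join two lists of row dicts on a shared key column.

    Assumes the key is unique in the right-hand table.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def aggregate(rows, group_by, value):
    """Sum one column for each distinct value of another column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value]
    return dict(totals)


joined = join(population, regions, "country")                 # join
large = [row for row in joined if row["population"] >= 100]   # filter
by_region = aggregate(joined, "region", "population")         # aggregate
print(by_region)
```

Merging two tables on a shared key column and then summarizing the result is the pattern the slide refers to; a hosted tool adds sharing and visualization on top of it.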
SLIDE 17

Fusion Tables: Demo

Demo link

SLIDE 18

Google Refine

What it is:

  • Desktop-based tool for cleaning up and transforming tabular data

What it's good for:

  • Bulk data transformations
  • Faceted data browsing
  • Outlier-detection and cleanup

Caveats:

  • No collaboration features (yet)
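
To make the three uses above concrete, here is a minimal Python sketch of a text facet, a bulk transformation, and a simple outlier check. It does not reproduce how Google Refine works internally; the sample rows, the function names, and the choice of a median-absolute-deviation test with a 3.5 threshold are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical messy rows; the values are invented for illustration.
rows = [
    {"country": "France", "gdp_growth": 1.7},
    {"country": "france ", "gdp_growth": 1.9},
    {"country": "Germany", "gdp_growth": 3.0},
    {"country": "Germany", "gdp_growth": 2.8},
    {"country": "Italy", "gdp_growth": 170.0},  # suspicious: possibly a misplaced decimal
]


def text_facet(rows, column):
    """Faceted browsing: count how often each distinct value appears."""
    return Counter(row[column] for row in rows)


def normalize(value):
    """Bulk transformation: trim whitespace and unify capitalization."""
    return value.strip().title()


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


before = text_facet(rows, "country")
cleaned = [{**row, "country": normalize(row["country"])} for row in rows]
after = text_facet(cleaned, "country")
suspects = mad_outliers([row["gdp_growth"] for row in rows])
print(before, after, suspects)
```

The facet counts expose the inconsistent spellings of one country, the transformation merges them in bulk, and the outlier check singles out the one value worth a manual look.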
SLIDE 19

Google Refine

SLIDE 20

Conclusion

Need to make statistics accessible, useful, organized

Google has tools that can help

Key advice: Think about the users, their needs

Really exciting area, only scratched the surface in terms of what's possible

SLIDE 21

Thank you! Questions?

SLIDE 22

Appendix

SLIDE 23

PDE Intro Video

SLIDE 24

PDE: Metadata

Dataset Publishing Language (DSPL)

  • Designed for interactive exploration and visualization
  • Released under the BSD open-source license
  • Combines data tables (CSV) with metadata (XML)
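
The pairing of a CSV data table with XML metadata can be sketched as follows. This is a simplified illustration of the idea only: the element and attribute names, the file name `population.csv`, and the sample figures are assumptions, and the output is not validated against the actual DSPL schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data table; the figures are invented for illustration.
rows = [
    ("country", "year", "population"),
    ("A", "2010", "100"),
    ("B", "2010", "250"),
]


def build_csv(rows):
    """Serialize the rows as CSV text (the data half of the dataset)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def build_metadata(csv_name, columns):
    """Describe the CSV's columns in XML (the metadata half of the dataset).

    Element and attribute names are illustrative assumptions, not the
    validated DSPL schema.
    """
    root = ET.Element("dspl")
    tables = ET.SubElement(root, "tables")
    table = ET.SubElement(tables, "table", id="population_table")
    for col_id, col_type in columns:
        ET.SubElement(table, "column", id=col_id, type=col_type)
    data = ET.SubElement(table, "data")
    ET.SubElement(data, "file", format="csv", encoding="utf-8").text = csv_name
    return ET.tostring(root, encoding="unicode")


csv_text = build_csv(rows)
xml_text = build_metadata(
    "population.csv",
    [("country", "string"), ("year", "date"), ("population", "integer")],
)
print(csv_text)
print(xml_text)
```

Keeping the numbers in CSV and the column descriptions in XML lets the same data file be reused while the metadata tells a visualization tool how to interpret each column.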
SLIDE 25

PDE: Dataset Creation and Upload