The Office for National Statistics Big Data Project, Nigel Swier




SLIDE 1

The Office for National Statistics Big Data Project

Nigel Swier NTTS Conference, 10-12 March 2015, Brussels

SLIDE 2

Background

  • Set up as a 12-month project, Jan-Dec 2014
  • Aims:
      • Investigate the potential of big data for official statistics and understand the challenges
      • Establish an ONS policy and longer-term strategy
      • Recommend practical next steps
  • 3-month extension to prepare the business case for the next phase

SLIDE 3

Work packages

  • Management and Strategy
  • Stakeholder Engagement
  • Communication
  • Analysis and infrastructure:
      • Pilots: Prices, Twitter, Smart meters, Mobile phones
      • Technology

SLIDE 4

Stakeholder Engagement

Government International Privacy Groups Academia Commercial Sector

SLIDE 5

Pilot 1: Smart meters

Potential of data from electricity smart-type meters to identify unoccupied households

  • More efficient response chasing
  • Data from smart meter trials in Great Britain and the Republic of Ireland

  • A range of potential methods identified
  • Privacy and ethics
SLIDE 6

Smart-type Meter Energy Use Profiles

[Figure: smart-type meter energy use profiles. Panels: occupied profile, unoccupied profile, anomaly]
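The contrast between occupied and unoccupied profiles can be illustrated with a very simple rule. The slides do not say which methods the pilot identified, so the following is only a sketch of one plausible heuristic: flag a day as possibly unoccupied when consumption is both low and flat. The half-hourly input format, the function name `looks_unoccupied`, both thresholds and the sample profiles are assumptions made for illustration.

```python
from statistics import mean, pstdev


def looks_unoccupied(readings_kwh, max_cv=0.25, max_mean_kwh=0.1):
    """Flag a day as possibly unoccupied when usage is both low and flat.

    readings_kwh: half-hourly consumption values for one day (assumed format).
    max_cv, max_mean_kwh: illustrative thresholds, not values from the pilot.
    """
    avg = mean(readings_kwh)
    if avg == 0:
        return True  # no consumption at all
    cv = pstdev(readings_kwh) / avg  # coefficient of variation
    return avg <= max_mean_kwh and cv <= max_cv


# Synthetic profiles (48 half-hour readings each), invented for illustration.
occupied = [0.05] * 14 + [0.40] * 4 + [0.10] * 18 + [0.60] * 8 + [0.08] * 4
unoccupied = [0.03, 0.04] * 24  # steady baseline load only

print(looks_unoccupied(occupied), looks_unoccupied(unoccupied))
```

A rule this crude would misclassify low-usage occupied households and anomalous profiles, which is presumably why a range of methods was considered.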

SLIDE 7

Pilot 2: Mobile Phones

Mobile phone data to model population flows, e.g. commuting statistics

  • Building relationships with mobile network operators and other parts of UK Government
  • No data yet. Seeking better coordinated data access for Government

  • Privacy and ethics (again)
SLIDE 8

Pilot 3: Prices

Use of web-scraped price data in price statistics

  • ONS prices collection is manual
  • Web scraping promises more detailed, more frequent and cheaper data
  • Prototype web scrapers:
      • 35 CPI/RPI item categories
      • 3 supermarkets
      • Daily collection (around 6,500 a day)
  • Data ‘wrangling’ is a big challenge
SLIDE 9

Daily Price Index (Whiskey)
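The slides do not state which index formula was used for the daily series. As a sketch of how scraped prices could be turned into a daily index, the example below uses an unweighted Jevons index (the geometric mean of price relatives), a common choice for elementary aggregates. The product ids and prices are invented, and matching products by id across days is an assumption.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Unweighted Jevons index (base = 100) over products priced in both periods."""
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no products in common")
    log_relatives = [log(current_prices[p] / base_prices[p]) for p in matched]
    # Geometric mean of the price relatives, computed via logs.
    return 100 * exp(sum(log_relatives) / len(log_relatives))


# Invented scraped prices (GBP), keyed by a hypothetical product id.
day0 = {"whisky_a": 20.00, "whisky_b": 25.00, "whisky_c": 18.00}
day1 = {"whisky_a": 22.00, "whisky_b": 25.00, "whisky_d": 30.00}

# Only whisky_a and whisky_b appear on both days, so only they contribute.
print(round(jevons_index(day0, day1), 2))
```

Products that appear or disappear between days (here `whisky_c` and `whisky_d`) are simply dropped from the matched set, which is one small instance of the data wrangling problem the pilot describes.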

SLIDE 10

Pilot 4: Twitter

Potential of geo-located Twitter data to gain new insights into mobility and migration

  • 7 months of geo-located tweets within Great Britain (about 100 million data points)
  • Methodology to infer place of usual residence:
      • Identify user ‘anchor points’ by clustering tweets using a DBSCAN algorithm
      • Identify residential anchor points using AddressBase and nearest neighbour analysis
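The two methodology steps can be sketched for a single user as follows. This is a minimal illustration, not the project's implementation: it uses a small pure-Python DBSCAN, assumes tweet locations are already in planar coordinates in metres, and uses made-up `eps` and `min_pts` values. The `addresses` list is a hypothetical stand-in for address records with a classification; real AddressBase fields are not modelled.

```python
from math import hypot

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


def anchor_points(points, labels):
    """Centroid of each cluster, keyed by cluster label."""
    groups = {}
    for point, label in zip(points, labels):
        if label != NOISE:
            groups.setdefault(label, []).append(point)
    return {label: (sum(x for x, _ in pts) / len(pts),
                    sum(y for _, y in pts) / len(pts))
            for label, pts in groups.items()}


def nearest_address_type(anchor, addresses):
    """Classify an anchor by the type of its nearest address record."""
    ax, ay = anchor
    return min(addresses, key=lambda a: hypot(ax - a[0], ay - a[1]))[2]


# Invented tweet locations for one user (metres) and hypothetical address records.
tweets = [(0, 0), (10, 0), (0, 10), (10, 10),
          (1000, 1000), (1010, 1000), (1000, 1010), (5000, 5000)]
addresses = [(8, 4, "residential"), (1005, 1000, "commercial")]

labels = dbscan(tweets, eps=20, min_pts=3)
anchors = anchor_points(tweets, labels)
types = {label: nearest_address_type(a, addresses) for label, a in anchors.items()}
print(labels, types)
```

At the scale quoted in the slides (about 100 million points) the quadratic neighbour search here would be far too slow; a spatial index or a library implementation would be needed in practice.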

SLIDE 11

Use case: Student mobility

SLIDE 12

Conclusion

  • A range of potential benefits (not just about replacing existing outputs)
  • Challenges can be overcome:
      • Technical/Skills => Innovation labs
      • Legal/ethical => Policy and guidance
      • Statistical (bias) => Benchmarking survey
      • Affordable access => ???
  • Long-term investment is required

The project recommends a further 3-year project to deliver both tangible benefits and a broader capability to support big data projects