The Impact of the Data Revolution on Official Statistics



SLIDE 1

The Impact of the Data Revolution on Official Statistics: Opportunities, Challenges and Risks

  • Prof. Rob Kitchin, NIRSA, Maynooth University

SLIDE 2

Background

  • All-Island Research Observatory (AIRO; www.airo.ie)
  • Dublin Dashboard (www.dublindashboard.ie)
  • Digital Repository of Ireland (DRI; www.dri.ie)
  • The Programmable City
SLIDE 3

The Data Revolution book

  • A synoptic overview of big data, open data and data infrastructures
  • An introduction to thinking conceptually about data, data infrastructures, data analytics and data markets
  • A critical discussion of the technical issues and the social, political and ethical consequences of the data revolution
  • An analysis of the implications of the data revolution for academic, business and government practices

SLIDE 4

The data revolution

  • Data infrastructures
  • Open and linked data
  • Big data
  • Data analytics
  • Data markets
  • Conceptualisation of data
  • Disruptive innovations that offer opportunities, challenges and risks for government, business and the academy

SLIDE 5

Data infrastructures

  • Actively planned, curated and managed
  • Enables storing, scaling, combining, sharing and consuming data across networked archives and repositories
  • Produces ‘data amplification’
  • NSIs have long operated, loosely, as such (trusted) infrastructures, but are now organising into more coordinated platforms with:
  • dedicated and integrated hardware and networked technologies; interoperable software and middleware services and tools; shared standards, protocols and metadata; shared services (relating to data management and processing), analysis tools and policies (concerning access, use, IPR, etc.)
  • Such infrastructures are being federated into larger pan-national infrastructures (Eurostat, ESPON, UN, etc.)
  • Many other institutions are catching up
SLIDE 6

Open and linked data

  • Opening PSI (and other) data for re-use: driven by transparency, participation, collaboration and economic arguments
  • Linking data/metadata using non-proprietary formats, URIs and RDF so that data can be referenced and conjoined
  • NSIs already very active in this space; other government data providers much further behind
  • More to be done, especially retro opening and linking historical records; producing APIs; upgrading extent of openness (licensing re. re-use, reworking, redistribution, reselling); using non-proprietary formats; opening data about the organizations themselves
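As a rough illustration of the linking idea on this slide, the following minimal sketch shows how two separately published datasets can be conjoined once they refer to the same thing by the same URI. It uses only the Python standard library. The example.org URIs, the `to_ntriples` and `conjoin` helpers, and the population value are all hypothetical and chosen for illustration; they are not real NSI identifiers or statistics, and the serialiser skips literal escaping that a real RDF library would handle.

```python
# Minimal linked-data sketch using only the standard library.
# All example.org URIs are hypothetical, not real NSI identifiers.

AREA = "http://example.org/id/area/dublin"            # hypothetical URI for one area
POPULATION = "http://example.org/def/population"      # hypothetical property URI
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # standard RDFS label property
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Two datasets published separately, each as (subject, predicate, object) triples.
census_triples = [(AREA, POPULATION, 1000)]    # illustrative value, not a real statistic
gazetteer_triples = [(AREA, LABEL, "Dublin")]


def to_ntriples(triples):
    """Serialise triples as N-Triples lines.

    Simplification: string literals are quoted without escaping, so this is
    only safe for plain text such as the examples above.
    """
    lines = []
    for s, p, o in triples:
        if isinstance(o, int):
            obj = f'"{o}"^^<{XSD_INTEGER}>'   # typed integer literal
        elif o.startswith("http"):
            obj = f"<{o}>"                     # object is itself a URI
        else:
            obj = f'"{o}"'                     # plain string literal
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


def conjoin(*datasets):
    """Merge datasets by subject URI; a shared URI is what makes them joinable.

    Simplification: keeps one value per (subject, predicate) pair.
    """
    merged = {}
    for triples in datasets:
        for s, p, o in triples:
            merged.setdefault(s, {})[p] = o
    return merged


linked = conjoin(census_triples, gazetteer_triples)
ntriples = to_ntriples(census_triples + gazetteer_triples)
```

Here `linked[AREA]` holds both the population figure and the label, even though neither source dataset contained both; the shared URI is what allows the records to be referenced and joined.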

SLIDE 7

Big data

Characteristic               Small data                       Big data
Volume                       Limited to large                 Very large
Exhaustivity                 Samples                          Entire populations
Resolution and indexicality  Coarse & weak to tight & strong  Tight & strong
Relationality                Weak to strong                   Strong
Velocity                     Slow, freeze-framed              Fast
Variety                      Limited to wide                  Wide
Flexible and scalable        Low to middling                  High

SLIDE 8

Big data and official statistics (source ESSC 2014)

SLIDE 9

Data analytics

  • Challenge of making sense of big data is coping with its abundance and exhaustivity, timeliness and dynamism, messiness and uncertainty, semi-structured or unstructured nature
  • Solution has been machine learning made possible by advances in computation and computational techniques

  • Four broad classes of analytics:
  • data mining and pattern recognition
  • statistical analysis
  • prediction, simulation, and optimization
  • data visualization and visual analytics
SLIDE 10
SLIDE 11

Conceptualising data

  • Technically and methodologically: data generation, handling, processing, storing, analyzing, sharing, etc.
  • Philosophically: ontology, epistemology, ideology
  • what can we know about the world, how can we know it, what should we do with such knowledge
  • Critical data studies
  • rather than understanding data as objective, neutral, pre-analytic & commonsensical, data are understood as being framed socially, politically, ethically and philosophically in terms of their form, selection, analysis and deployment
  • data do not exist independently of the ideas, instruments, practices, contexts, knowledges and systems used to generate, process and analyze them
  • data express a normative notion about what should be measured, for what reasons, and what they should tell us; they have normative effects; they do not simply reflect the world but actively produce it
  • data are framed by and situated within data assemblages – NSIs constitute such assemblages

SLIDE 12

Data assemblage

Attributes                       Elements
Systems of thought               Modes of thinking, philosophies, theories, models, ideologies, rationalities, etc.
Forms of knowledge               Research texts, manuals, magazines, websites, experience, word of mouth, chat forums, etc.
Finance                          Business models, investment, venture capital, grants, philanthropy, profit, etc.
Political economy                Policy, tax regimes, public and political opinion, ethical considerations, etc.
Governmentalities / Legalities   Data standards, file formats, system requirements, protocols, regulations, laws, licensing, intellectual property regimes, etc.
Materialities & infrastructures  Paper/pens, computers, digital devices, sensors, scanners, databases, networks, servers, etc.
Practices                        Techniques, ways of doing, learned behaviours, scientific conventions, etc.
Organisations & institutions     Archives, corporations, consultants, manufacturers, retailers, government agencies, universities, conferences, clubs and societies, committees and boards, communities of practice, etc.
Subjectivities & communities     Of data producers, curators, managers, analysts, scientists, politicians, users, citizens, etc.
Places                           Labs, offices, field sites, data centres, server farms, business parks, etc., and their agglomerations
Marketplace                      For data, its derivatives (e.g., text, tables, graphs, maps), analysts, analytic software, interpretations, etc.

SLIDE 13

Implications and uses of data

  • Scaled, open, linked, big data and associated analytics produce knowledge that enhances governing of people, managing organisations, leveraging value and producing capital, creating better places, improving health and well-being, tackling social and ecological issues, fostering civic participation, etc.
  • They improve insight and wisdom, productivity, competitiveness, efficiency, effectiveness, utility, sustainability, safety & security, transparency ...
  • Challenge established epistemologies in the academy
  • “Revolutions in science have often been preceded by revolutions in measurement” (Sinan Aral)
  • new empiricism, data-driven science, computational social sciences, digital humanities
  • transforming how we frame, ask and answer questions
SLIDE 14

Opportunities for OS/NSIs

  • New sources of dynamic and linked data and more timely outputs
  • Complement/replace/improve/add to existing data/approaches
  • New forms of data analytics can provide greater insights from existing and new datasets
  • Optimize working practices, gain efficiencies, redeploy staff
  • Stronger links/partnerships with computational social science, data science (esp. viz), and data industries
  • Drive creation of data-driven institutions and evidence-informed governance

  • Greater visibility and use of products
SLIDE 15

Challenges for OS/NSIs

  • Sourcing data from third parties and associated partnering, legal and financial issues, including opening OSs derived from private data
  • Experimenting and trialing to determine:
  • suitability for official statistics, esp. when data are repurposed, are not representatively sampled, are flexible (thus potentially altering continuity), and have undefined data quality (re. veracity (accuracy, fidelity), uncertainty, error, bias, reliability, calibration)
  • technological feasibility re. transferring, storing, cleaning, checking, and linking big data
  • methodological feasibility re. augmenting/producing OSs
SLIDE 16

Challenges for OS/NSIs

  • Building and maintaining new IT infrastructure; retro work on older data (opening, linking); ensuring security/data protection; deploying new data analytics
  • Sourcing additional resourcing (financial and staffing) for dealing with new data streams and opening/linking data
  • Developing new technical and methodological skills and sourcing/retaining trained/skilled staff
  • Establishing standards, standardization, interoperability across jurisdictions

SLIDE 17

Risks for OS/NSIs

  • Undermining of reputation and trust
  • quantity and utility of data opened (moving beyond low-hanging fruit)
  • quality of data (big data often messy & dirty) and losing control of generation/sampling/processing
  • established statistical products become undermined or discontinued before alternatives fully established/verified
  • partnering with third parties (tarnished by their reputation)
  • public perception and resistance to use of big data
  • Privacy and security
  • Access and continuity (will private sources of data be available over the long term; will flexibility alter/break time-series); resistance from third parties to sharing data (gratis)

  • Fragmented landscape across jurisdictions
  • Pressure to reduce staff/budget rather than redeploy
  • Competition and privatisation (data brokers)
SLIDE 18

Solutions

  • Need:
  • conceptual, practical and strategic thought re. challenges and risks of building data infrastructures, opening data, using big data
  • planning of change management from short to long-term
  • coordinated response re. experimentation, processes, trialing, standards, IPR, legislation, software, building infrastructure, etc. to establish best practice and ensure continuity across jurisdictions
  • coordinated political lobbying re. resourcing
  • Alliances and sharing information with similar organisations (e.g., RDA, WDS)
  • Some of this already happening. More needed in a fast-moving space.

SLIDE 19

Conclusion

  • A data revolution is underway
  • a fundamental shift in data openness and sharing,
  • volume, exhaustiveness, timeliness, granularity, relationality, variety, analytics, technical infrastructures, etc.
  • conceptual thought relating to data
  • Creating a set of disruptive innovations that is producing opportunities, challenges and risks for NSIs and others
  • It is important for NSIs to get ahead of the curve with respect to challenges and risks, becoming proactive not reactive and setting the agenda for new innovations
  • This requires conceptual, practical and strategic thought and a coordinated approach across institutions

SLIDE 20

Rob.Kitchin@nuim.ie
@robkitchin

Kitchin, R., Lauriault, T. and McArdle, G. (2015) Knowing and governing cities through urban indicators, city benchmarking and real-time dashboards. Regional Studies, Regional Science 2: 1-28

Kitchin, R. and Lauriault, T. (2014) Small data in the era of big data. GeoJournal online first

Kitchin, R. (2014) Big data, new epistemologies and paradigm shifts. Big Data and Society 1 (April-June): 1-12

Kitchin, R. and Lauriault, T. (2014) Towards critical data studies: Charting and unpacking data assemblages and their work. The Programmable City Working Paper 2, SSRN

Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262–267

http://www.nuim.ie/progcity
@progcity