Trusted Smart Statistics: What it is Why it comes Where it brings - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Trusted Smart Statistics: What it is Why it comes Where it brings us

Fabio Ricciato, fabio.ricciato@ec.europa.eu
EUROSTAT - Big Data Task Force
Smart Statistics 4 Smart Cities, Kalamata, Greece, 6.10.2018

slide-2
SLIDE 2

The new datafied world

  • The cyber world is natively digital, and the physical world is being increasingly digitized (IoT, Smart Devices…)
  • “Anything that goes digital, gets logged” (somewhere, by somebody): the 1st fundamental law of datafication

digitalization → datafication

(Figure: my mobile phone operator)

slide-3
SLIDE 3

The new datafied world

  • Individuals, organizations, places … become “data fountains”
  • More and more business companies become “data buckets”

(Figure: my mobile phone operator; my energy provider; my app provider; me and my smart devices)

slide-4
SLIDE 4

data and new data

“micro-data”:

  • Name. Gender. Birth date. Marital status. Residence address.
  • Occupation. Household composition…
  • Monthly income. Monthly expenditures per good category. Number of touristic trips in a year. …

  • Features about the individual
  • changing slowly or rarely
  • recorded at coarse temporal aggregation (months, years).

slide-5
SLIDE 5

data and new data

“micro-data”:

  • Name. Gender. Birth date. Marital status. Residence address.
  • Occupation. Household composition…
  • Monthly income. Monthly expenditures per good category. Number of touristic trips in a year.

  • Features about the individual
  • changing slowly or rarely
  • recorded at coarse temporal aggregation (months, years).

“nano-data”:

  • Your exact location, every second.
  • Every single heart-beat, blood pressure…
  • Every single transaction, purchase, encounter, event involving you…
  • Your current opinion on any single fact… …

  • Features about single events, transactions → highly pervasive, sub-individual level
  • changing continuously
  • recorded at fine temporal aggregation (minutes, seconds)

slide-6
SLIDE 6

data and new data

  • “Shallow data” (“micro-data”): Name. Gender. Birth date. Marital status. Residence address. Occupation. Household composition… Monthly income. Monthly expenditures per good category. Number of touristic trips in a year …
  • “Deep data” (“nano-data”): Your exact location, every second. Every single heart-beat, blood pressure… Every single transaction, purchase, encounter, event involving you… Your current opinion on any single fact… …

slide-7
SLIDE 7

Official Statistics.

  • The ultimate goal of Official Statistics is to produce macro-data (statistics) from input micro-data
  • Collection of micro-data is an ancillary task

(Figure: micro-data (about individual); macro-data (statistics))

slide-8
SLIDE 8

Official Statistics. Augmented

  • Availability of new (deep, nano) data sources as an opportunity to extend & empower Official Statistics

(Figure: nano-data (sub-individual); micro-data (about individual); macro-data (statistics))

  • Additional Input Data Sources
  • Additional micro-data, possibly derived from nano-data
  • Additional processes
  • Additional statistical products: more dimensions, better timeliness, finer spatio/temporal granularity, …
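
The chain from nano-data to micro-data to macro-data can be illustrated with a minimal Python sketch. The record layout (person, month, amount) and the function names are assumptions made for illustration only; they do not come from the slides and do not describe any Eurostat process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical nano-data: one record per single purchase event
# (sub-individual level). The field layout is an assumption.
nano_data = [
    ("p1", "2018-09", 12.50),
    ("p1", "2018-09", 30.00),
    ("p2", "2018-09", 7.25),
    ("p2", "2018-09", 2.75),
    ("p2", "2018-09", 40.00),
]


def derive_micro_data(events):
    """Collapse event-level records into one monthly total per individual."""
    totals = defaultdict(float)
    for person_id, month, amount in events:
        totals[(person_id, month)] += amount
    return dict(totals)


def compute_macro_data(micro):
    """Aggregate individual-level records into a single statistic."""
    return mean(micro.values())


micro_data = derive_micro_data(nano_data)    # micro-data derived from nano-data
macro_data = compute_macro_data(micro_data)  # macro-data (a statistic)
print(micro_data)
print(macro_data)
```

Here the monthly expenditure per person plays the role of micro-data derived from nano-data, and the mean across persons plays the role of the published statistic.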

slide-9
SLIDE 9

Where can the data be accessed?

(Figure: smart car, smart home, smartphone, smartwatch; carmaker, energy company, online platforms; Statistical Office)

  • B2G channel: Business(Bucket?)-to-Government. Access to privately-held data, private-public partnerships …
  • C2G channel: Citizens-to-Government. Crowdsourcing, Smart Surveys. Citizen Statistics!

slide-10
SLIDE 10

Official Statistics based on survey data

(Figure: society, economy; collection; processing; SO; Public sector; policy, media, research)

SO: Statistical Office

slide-11
SLIDE 11

Official Statistics based on survey data and administrative data

(Figure: society, economy; collection; processing; SO; Public sector; policy, media, research)

SO: Statistical Office

slide-12
SLIDE 12

And now Big Data come into play

(Figure: society, economy; collection; processing; SO; Public sector; Private sector (business and citizens); policy, media, research)

slide-13
SLIDE 13

Handling the new in old ways: pull data in

(Figure, crossed out: society, economy; Private sector; Public sector; SO; collection; many processing steps; policy, media, research)

This is not feasible: technical scalability, organisational, legal (risk concentration), …

slide-14
SLIDE 14

Handling the new in old ways: pull data in

(Figure: as in the previous slide, with the added labels “Deep data”, “Shallow data”, micro-data, nano-data)

slide-15
SLIDE 15

Handle the new in new ways: push computation out (partially)

(Figure: society, economy; Private sector; Public sector; SO; collection; processing; policy, media, research)

slide-16
SLIDE 16

Handle the new in new ways: push computation out (partially)

(Figure: society, economy; Public sector; SO; collection; many processing nodes within the Private sector; policy, media, research; Trusted Smart Statistics)
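
A minimal Python sketch of what "pushing computation out (partially)" could look like: the initial part of the processing runs at each data holder, and only aggregates reach the Statistical Office. The class `DataHolder` and the functions `local_summary` and `combine_at_statistical_office` are hypothetical names introduced here for illustration, not part of any Eurostat system.

```python
from dataclasses import dataclass, field


@dataclass
class DataHolder:
    """A private data holder; `records` stay on its own premises."""
    name: str
    records: list = field(repr=False)

    def run_local(self, computation):
        # The agreed computation runs where the data live;
        # only its (aggregate) result is returned.
        return computation(self.records)


def local_summary(records):
    """Initial part of the processing, executed at the source."""
    return len(records), sum(records)


def combine_at_statistical_office(partials):
    """Final part of the processing, executed by the SO on aggregates only."""
    count = sum(n for n, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


holders = [DataHolder("DH-1", [10, 20, 30]), DataHolder("DH-2", [40, 60])]
partials = [h.run_local(local_summary) for h in holders]
overall_mean = combine_at_statistical_office(partials)
print(overall_mean)
```

Note that this sketch covers only the "smart" part (where the processing runs). Partial aggregates can still reveal information about a single holder, which is one reason the "trusted" part of the design is needed.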

slide-17
SLIDE 17

Trusted Smart Statistics

Smart: externalization towards data sources of the (initial) part of processing execution. Leveraging the “smart” features of the data sources (often Smart Systems, Smart Objects) and other “smart technologies” (e.g., Smart Contracts).

Trusted: ensure an articulated set of trust guarantees to all players (SO as “taker” and “giver” of trust guarantees).

Smart Statistics as an opportunity to deliver more advanced statistical products, more timely (nowcasting), more targeted to specific user groups, through novel reporting and presentation ways …

(Figure: Private sector (business and citizens) with many processing nodes; SO; Trusted Smart Statistics)

Guarantee that data are processed for the agreed purpose, by the agreed method, with respect for user privacy & business confidentiality, compliance with legal provisions …

slide-18
SLIDE 18

Towards a Reference Architecture for Trusted Smart Statistics

(Figure: Design Principles; Reference Architecture; Specifications; Implementation)

Work in progress at Eurostat, in coordination with the ESS (European Statistical System) and in dialogue with other stakeholders:

  • Private Data Holders
  • Researchers, Academic communities
  • Data Protection Authorities
  • Other arms of the European Commission
  • National and Local authorities

slide-19
SLIDE 19

Some design principles

  • 1. Processing method (algorithm) transparent to all involved parties
      • co-designed or at least agreed-upon (consensus-based design)
  • 2. Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input!
      • Adopt Secure Private Computing technologies, e.g., Secure Multi-Party Computation
  • 3. Engage and partner with the input parties
      • Incentives might involve “giving back” computation output to them
  • 4. Agreement for data usage bound to computation instance.
      • Technological means guarantee that data cannot be used for any query/purpose other than the agreed one(s)
  • 5. Purpose and algorithms open for public scrutiny
      • public transparency → public trust
slide-20
SLIDE 20

(Figure: step 1, consensus: source code approved by all parties; SO, DH-1, DH-2, CA)

SO: Statistical Office. DH: Data Holders. CA: Certification Authority?
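
One way to picture "source code approved by all parties" is a registry keyed by a cryptographic fingerprint of the code, so that any later change invalidates the approval. The Python sketch below is an illustration under stated assumptions: `ApprovalRegistry`, `fingerprint` and the party list are hypothetical, and a real deployment would rely on digital signatures (and possibly a Certification Authority) rather than an in-memory set.

```python
import hashlib

# Assumed set of parties whose approval is required (illustrative only).
REQUIRED_PARTIES = frozenset({"SO", "DH-1", "DH-2"})


def fingerprint(source_code: str) -> str:
    """SHA-256 digest identifying one exact version of the source code."""
    return hashlib.sha256(source_code.encode("utf-8")).hexdigest()


class ApprovalRegistry:
    """Records which parties approved which exact version of the code."""

    def __init__(self):
        self._approvals = {}  # fingerprint -> set of approving parties

    def approve(self, party: str, source_code: str) -> None:
        if party not in REQUIRED_PARTIES:
            raise ValueError(f"unknown party: {party}")
        self._approvals.setdefault(fingerprint(source_code), set()).add(party)

    def is_approved_by_all(self, source_code: str) -> bool:
        approvers = self._approvals.get(fingerprint(source_code), set())
        return approvers == REQUIRED_PARTIES


agreed_code = "def statistic(records):\n    return sum(records) / len(records)\n"
registry = ApprovalRegistry()
for party in sorted(REQUIRED_PARTIES):
    registry.approve(party, agreed_code)

# Any modification changes the fingerprint, so the approval no longer applies.
tampered_code = agreed_code.replace("sum(records)", "records")
print(registry.is_approved_by_all(agreed_code))
print(registry.is_approved_by_all(tampered_code))
```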

slide-22
SLIDE 22

(Figure: 1. consensus: source code approved by all parties (SO, DH-1, DH-2, CA); 2. confidential input data from DH-1 and DH-2; secret shares; authenticated binary code executed in secure hardware; non-personal intermediate data exported to SO; official statistics […])

slide-23
SLIDE 23

Secure Multi-Party Computation (SMPC) infrastructure

(Figure: confidential input data; secret shares; SMPC computation; computation output (non-personal))

An infrastructure (technology + organizational provisions) to let the output information be extracted without exchanging the input data
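
The idea of extracting the output without exchanging the input can be illustrated with additive secret sharing, one of the basic building blocks used in SMPC. The Python sketch below is a toy example under stated assumptions: the modulus, the number of computing nodes and the input values are arbitrary, and it computes only a sum. It is not the protocol of any particular SMPC platform.

```python
import secrets

MODULUS = 2**61 - 1  # arbitrary large modulus, chosen for illustration


def share(value, n_parties):
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; a single share on its own reveals nothing."""
    return sum(shares) % MODULUS


# Confidential inputs of two data holders (hypothetical values).
inputs = {"DH-1": 1200, "DH-2": 3400}
n_nodes = 3

# Each data holder sends one share of its input to each computing node.
per_node = [[] for _ in range(n_nodes)]
for value in inputs.values():
    for node_shares, s in zip(per_node, share(value, n_nodes)):
        node_shares.append(s)

# Each node adds the shares it holds, never seeing any input in the clear.
node_sums = [sum(node_shares) % MODULUS for node_shares in per_node]

# Only the combination of the per-node results reveals the output.
total = reconstruct(node_sums)
print(total)
```

Only the final sum is revealed; each node sees values that are uniformly random modulo `MODULUS`.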

slide-24
SLIDE 24

B2G scenario with multiple DHs

(Figure: confidential input data from DH-1 and DH-2; secret shares; authenticated binary code executed in secure hardware; SO; official statistics […])

slide-25
SLIDE 25

BG2G scenario: SO providing input data

(Figure: confidential input data from DH-1, DH-2 and the SO; secret shares; authenticated binary code executed in secure hardware; SO; official statistics […])

slide-26
SLIDE 26

B2G2B scenario: giving back to the private sector! B&G Partnership model?

(Figure: confidential input data from DH-1, DH-2 and the SO; secret shares; authenticated binary code executed in secure hardware; official statistics: non-personal data exported to SO […]; commercial analytics: non-personal data exported for commercial purpose to private company […])

Returning some output analytics product to the private sector for legitimate business purposes (with certification) might facilitate partnership models between Statistical Offices and private Data Holders

slide-27
SLIDE 27

Reusing the infrastructure for B2B analytics?

(Figure: confidential input data from DH-1 and DH-2; secret shares; authenticated binary code executed in secure hardware; commercial analytics: non-personal data exported for commercial purpose to private company […])

slide-30
SLIDE 30

Sharing input data vs. using input data on per-purpose basis

(Figure, sharing input data: Statistical Office; agreement; data import; query #1, query #2, query #3)

(Figure, using input data on per-purpose basis: Statistical Office; agreement #1, query #1; agreement #2, query #2)
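
The difference between a one-off data import and per-purpose usage can be sketched as a gateway at the data holder that only runs queries covered by an explicit agreement. The Python code below is a hypothetical illustration: `DataHolderGateway` and its methods are invented names, and the slides call for technological enforcement (e.g., SMPC or trusted hardware) rather than a simple in-process check like this one.

```python
class DataHolderGateway:
    """Keeps the records at the data holder and runs only agreed queries."""

    def __init__(self, records):
        self._records = list(records)
        self._agreements = {}  # query name -> function agreed for that purpose

    def sign_agreement(self, query_name, query_fn):
        # One agreement is bound to one specific computation.
        self._agreements[query_name] = query_fn

    def run(self, query_name):
        if query_name not in self._agreements:
            raise PermissionError(f"no agreement covers query {query_name!r}")
        # Only the output leaves the gateway, never the records themselves.
        return self._agreements[query_name](self._records)


gateway = DataHolderGateway([3, 5, 8])
gateway.sign_agreement("query #1", len)  # agreement #1 covers query #1 only

result = gateway.run("query #1")
try:
    gateway.run("query #2")  # no agreement #2 has been signed
    refused = False
except PermissionError:
    refused = True

print(result, refused)
```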

slide-31
SLIDE 31

Some design principles


  • 5. Purpose and algorithms open for public scrutiny
      • more public transparency → more public trust

(Figure: more data pervasiveness, more public transparency)

slide-33
SLIDE 33

Some slogans to shout loud

  • Let the information flow, not the data!
  • Don’t show your data to me, but let me use it!
  • Share/distribute the computation, share/distribute the control, don’t share/distribute the data!
  • Close the data, open the algorithms!
  • Using more pervasive data calls for
      • → more public transparency (open-source)
      • → more checks and balances (distributed control, consensus, certification authorities?)
      • → stronger engagement of sources (fountains and buckets)
slide-34
SLIDE 34

Take home message

  • Trusted Smart Statistics = the future of Official Statistics
  • New sources of “big” data as input: more pervasive, timely, heterogeneous... and often privately held!
  • Exploiting such data for Official Statistics requires a new architecture to build “trust” among all stakeholders → ongoing work in Eurostat
  • Key ingredients: SMPC and/or Trusted Hardware, open algorithms, source-code certification (?), …
  • Once deployed, the same platform can be reused for other public interest purposes (and perhaps even for B2B applications)

slide-35
SLIDE 35

Thanks for your attention

For follow-up contact

fabio.ricciato@ec.europa.eu