A Blockchain based Data Production Traceability System



SLIDE 1

A Blockchain based Data Production Traceability System

Research Project 2

Sandino Moeniralam Wednesday February 28, 2018

University of Amsterdam

SLIDE 2

Introduction

  • Need for data lineage
  • Copernicus EO Sentinel-2 mission
  • Blockchain based traceability

SLIDE 3

Problem statement

  • Reproducibility crisis
  • Ideal situation
  • Copernicus EO missions largest in history
  • Version Control System insufficient

SLIDE 4

Reproducibility crisis

Figure 1: 1,500 scientists lift the lid on reproducibility
Source: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970

SLIDE 5

Related Work

Technologies

  1. BigchainDB
  2. Ethereum

Implementations

  1. Provenance
  2. Quality Assurance for Essential Climate Variables (QA4ECV)
  3. VCS-Blockchain

SLIDE 6

Research questions

What requirements should a Blockchain based production traceability system for satellite data adhere to?

  • What does the data production process of Copernicus’s Sentinel-2 Earth Observation data look like?

  • What types of data are to be distinguished?
  • How does one capture all the steps of the data production process?

SLIDE 7

Data Lineage and Data Provenance

  • Difference between data lineage and data provenance
  • Several layers of abstraction
  • Different views
  • Open source provenance capture applications

SLIDE 8

Copernicus Sentinel-2 EO missions

  • World’s largest single earth observation program
  • Sentinel 1-7 planned
  • 30 satellites in total
  • Different companies involved including Airbus, EUMETSAT, SpaceX

SLIDE 9

Types of data

  • The datasets themselves
  • The production environment
      • Entire OS with applications
      • Python virtual environment
  • The production process
      • Human view: comments, explanation
      • Machine view: automatic scripts
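
As an illustration of the "Python virtual environment" item, the sketch below shows one possible way to turn a Python environment into a text description that can be hashed. It is not taken from the presentation. The function names `describe_python_environment` and `environment_hash` are hypothetical, and the sketch covers only the interpreter version and installed packages, not an entire OS with applications.

```python
import hashlib
import sys
from importlib import metadata


def describe_python_environment() -> str:
    """Return a sorted text listing of the interpreter and installed packages."""
    # Collect "name==version" for every installed distribution.
    # A set removes duplicate entries; entries without a name are skipped.
    packages = {
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    }
    version = sys.version_info
    lines = [f"python=={version.major}.{version.minor}.{version.micro}"]
    # Sorting makes the description independent of discovery order.
    lines.extend(sorted(packages))
    return "\n".join(lines)


def environment_hash(description: str) -> str:
    """Return the SHA-256 hex digest of an environment description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


description = describe_python_environment()
print(description.splitlines()[0])
print(environment_hash(description))
```

Two machines with the same interpreter version and the same package versions would produce the same digest under this scheme, which is the property a hash of the production environment needs.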

SLIDE 10

Satellite Data Processing Levels

  • No strict definitions
  • Level 0, 1A, 1B, 1C, 2A, 2B, 3A, 3B and 4
  • Published from level 1C onwards

SLIDE 11

Blockchain

Advantages

  • Immutable
  • Distributed
  • Secure
  • Open

Disadvantages

  • Scalability issues
  • Computationally expensive

Figure 2: Distributed ledger
Source: https://elearningindustry.com/bitcoin-blockchain-impacting-elearning-industry

SLIDE 12

Bitcoin, Ethereum, BigchainDB

Figure 3: Abstract overview of a Blockchain
Source: https://medium.com/@lhartikk/a-blockchain-in-200-lines-of-code-963cc1cc0e54

Data

  • Bitcoin: transactions
  • Ethereum: scripts
  • BigchainDB: storage

SLIDE 13

Quality Assurance For Essential Climate Variables project

Figure 4: Provenance Traceability Chain
Source: http://www.qa4ecv.eu

SLIDE 14

Proposed design

Blockchain data

  • Cryptographic hash of the previous block
  • Timestamp
  • Proof-of-work

Data

  • Hash(dataset)
  • Pointer to dataset
  • Hash(production environment)
  • Pointer to production environment
  • Hash(production process)
  • Pointer to the production process
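
The block layout listed above could be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the author's implementation: SHA-256 as the hash function, JSON as the serialisation, the `Block` class and its field names, and the `example.org` pointers are all hypothetical. The `nonce` field is only a stand-in for the proof-of-work entry; no mining is performed here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Block:
    # Blockchain data
    previous_hash: str
    timestamp: float
    nonce: int  # placeholder for the proof-of-work field (assumption)
    # Traceability data: one hash and one pointer per item
    dataset_hash: str
    dataset_pointer: str
    environment_hash: str
    environment_pointer: str
    process_hash: str
    process_pointer: str

    def block_hash(self) -> str:
        # Sorted keys give a stable serialisation, so equal blocks hash equally.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return sha256_hex(payload)


# Placeholder inputs; real datasets, environments and scripts would be read from storage.
genesis = Block(
    previous_hash="0" * 64,
    timestamp=1519776000.0,
    nonce=0,
    dataset_hash=sha256_hex(b"placeholder dataset bytes"),
    dataset_pointer="https://example.org/dataset-v1",
    environment_hash=sha256_hex(b"placeholder environment image"),
    environment_pointer="https://example.org/environment-1",
    process_hash=sha256_hex(b"placeholder processing script"),
    process_pointer="https://example.org/process-1",
)
print(genesis.block_hash())

# A changed dataset yields a different block hash.
revised = replace(genesis, dataset_hash=sha256_hex(b"revised dataset bytes"))
print(revised.block_hash() != genesis.block_hash())
```

Storing only hashes and pointers keeps each block small; the data itself stays wherever the pointer leads.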

SLIDE 15

Schematic sketch

Table 1: A schematic sketch

Block 0                 Block 1                 Block 2
hash(0)                 hash(Block 0)           hash(Block 1)
timestamp               timestamp               timestamp
proof-of-work           proof-of-work           proof-of-work
hash(dataset V1)        hash(dataset V2)        hash(dataset V3)
pointer to dataset V1   pointer to dataset V2   pointer to dataset V3
hash(PE #1)             hash(PE #2)             hash(PE #3)
pointer to PE #1        pointer to PE #2        pointer to PE #3
hash(PP #1)             hash(PP #2)             hash(PP #3)
pointer to the PP #1    pointer to the PP #2    pointer to the PP #3
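
The chaining in Table 1 might be exercised with a toy three-block chain like the one below. Everything here is an assumption made for illustration: the dictionary layout, the helper names `hash_block`, `mine`, `build_chain` and `is_valid`, the all-zero value standing in for hash(0), and a proof-of-work rule that requires a fixed number of leading zero hex digits. Real systems such as Ethereum or BigchainDB work differently.

```python
import hashlib
import json
import time

DIFFICULTY = 3  # assumed: number of leading zero hex digits required
GENESIS_PREVIOUS_HASH = "0" * 64  # assumed stand-in for hash(0) in Block 0


def hash_block(block: dict) -> str:
    """Hash a block via a stable JSON serialisation."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine(block: dict) -> dict:
    """Try nonces until the block hash meets the difficulty target."""
    candidate = dict(block, nonce=0)
    while not hash_block(candidate).startswith("0" * DIFFICULTY):
        candidate["nonce"] += 1
    return candidate


def build_chain(records: list) -> list:
    """Link one mined block per record to the hash of its predecessor."""
    chain = []
    previous_hash = GENESIS_PREVIOUS_HASH
    for record in records:
        block = mine({
            "previous_hash": previous_hash,
            "timestamp": time.time(),
            "record": record,
        })
        chain.append(block)
        previous_hash = hash_block(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check every link and every proof-of-work in the chain."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in chain:
        current_hash = hash_block(block)
        if block["previous_hash"] != previous_hash:
            return False
        if not current_hash.startswith("0" * DIFFICULTY):
            return False
        previous_hash = current_hash
    return True


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Placeholder records mirroring the rows of Table 1 (PE = production
# environment, PP = production process); the pointers are hypothetical.
records = [
    {
        "dataset_hash": digest(f"dataset V{n}"),
        "dataset_pointer": f"https://example.org/dataset-v{n}",
        "pe_hash": digest(f"PE #{n}"),
        "pe_pointer": f"https://example.org/pe-{n}",
        "pp_hash": digest(f"PP #{n}"),
        "pp_pointer": f"https://example.org/pp-{n}",
    }
    for n in (1, 2, 3)
]

chain = build_chain(records)
print(is_valid(chain))

# Altering Block 1 afterwards breaks the link stored in Block 2.
chain[1]["record"]["dataset_hash"] = "tampered"
print(is_valid(chain))
```

Because each block stores the hash of its predecessor, changing any earlier entry invalidates every later block unless all of them are mined again.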

SLIDE 16

Discussion

  • Volatile nature of digital data
  • Production Environment large size
  • Production Process complex
  • Blockchain based
  • Actual storage of the data unresolved

SLIDE 17

Conclusion

What requirements should a Blockchain based production traceability system for satellite data adhere to?

Every block should include the datasets, production environment and the production process for humans and machines.

SLIDE 18

Future Work

  • More technical analysis into different Production Environments
  • Ethereum Virtual Machine compatible
  • Scalability issue

SLIDE 19

Questions?

Sandino Moeniralam

sandino.moeniralam@os3.nl

“It’s not broken, it’s a feature...”
