CASD-TeraLab: Secure Remote Access to Confidential Big Data



SLIDE 1

CASD-TeraLab

Secure Remote Access to Confidential Big Data

Alexandre Marty [ alexandre.marty@casd.eu ]

SLIDE 2

Outline

CASD-TeraLab Use Cases Live Demo

SLIDE 3

The Secure Data Access Centre

Data insertions and extractions are controlled. Users do not have Internet access from their workspace.

Hermetic Bubble: a group of tightly sealed, secured servers

• A Hadoop cluster is available for handling Big Data.
• SD-Boxes are the only means of access to the Bubble.
• Access occurs via the Internet through encrypted channels.
• User applications and processing are executed strictly within the Bubble.
• Sensitive data is hosted only within the Bubble.

[Diagram labels: Insertions, Extractions, Servers & Applications, Sensitive Data]

SLIDE 4

TeraLab

• Publicly funded Big Data & Data Science platform
• Open to:
  • R&D and teaching projects, proofs of concept
  • Public and private sectors
• Everything for Big Data:
  • Powerful and scalable infrastructure
  • Hadoop-based, with all Hadoop tools
  • Extensive tools for scientists (R, SAS, machine learning…)
• Turnkey solution with full support and maintenance

SLIDE 5

Use Cases

• Electricity transmission network data with RTE
  • Impressive variety of data sources
  • Development of innovative apps
• Health data
  • Requires high confidentiality
  • About 250 TB generated each year
• Mobile telecommunications data for tourism statistics
• European data
  • Involvement in European projects: DwB, Eurostat Big Data Task Force

SLIDE 6

Scanner Data Project

• Work in collaboration with the Consumer Price Index (CPI) team
  • One goal is to improve the CPI calculation
  • Find new opportunities to use the data and develop new methodologies
• Daily sales data from 4 major French distribution companies
  • Very detailed data: products, stores…
  • 5.7 billion rows, 1 TB
• Randomly generated dataset used for this demonstration
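As a rough illustration of how a randomly generated scanner dataset could feed a price index, the sketch below builds toy sales rows and computes a Jevons index (geometric mean of price relatives) from per-product unit values. Everything in it is an assumption made for illustration: the slides do not state which index formula the CPI team uses, and the row layout, product and store names, prices, and the 2% price increase are hypothetical.

```python
import math
import random
from collections import defaultdict


def generate_sales(n_rows, seed=0):
    """Generate toy scanner rows: (period, store, product, quantity, revenue).

    All names and prices are hypothetical; period 0 is the base period,
    period 1 the current period, with a uniform 2% price increase.
    """
    rng = random.Random(seed)
    base_price = {"P1": 1.0, "P2": 2.5, "P3": 4.0}
    stores = ["S1", "S2"]
    rows = []
    for _ in range(n_rows):
        period = rng.randint(0, 1)
        product = rng.choice(sorted(base_price))
        store = rng.choice(stores)
        price = base_price[product] * (1.02 if period == 1 else 1.0)
        quantity = rng.randint(1, 5)
        rows.append((period, store, product, quantity, price * quantity))
    return rows


def unit_values(rows):
    """Average price paid per unit, for each (period, product) pair."""
    revenue = defaultdict(float)
    quantity = defaultdict(int)
    for period, _store, product, qty, rev in rows:
        revenue[(period, product)] += rev
        quantity[(period, product)] += qty
    return {key: revenue[key] / quantity[key] for key in revenue}


def jevons_index(rows, base=0, current=1):
    """Geometric mean of unit-value ratios over products sold in both periods."""
    uv = unit_values(rows)
    matched = [p for (t, p) in uv if t == base and (current, p) in uv]
    log_sum = sum(math.log(uv[(current, p)] / uv[(base, p)]) for p in matched)
    return math.exp(log_sum / len(matched))


if __name__ == "__main__":
    # By construction every price rises by 2%, so the index is close to 1.02.
    print(jevons_index(generate_sales(1000)))
```

At the scale quoted on the slide (billions of rows), the same aggregation would normally run on the Hadoop cluster rather than in a single Python process; the in-memory version here only shows the arithmetic.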

SLIDE 7

Live Demo

SLIDE 8

For More Information

• www.teralab-datascience.fr
• casd.eu
• alexandre.marty@casd.eu
