Secure Data Sharing and Distribution Platform for Integrated Big Data Utilization


Secure Data Sharing and Distribution Platform for Integrated Big Data Utilization (Oct.2015-Mar.2021, funded by Japan Science and Technology Agency)


SLIDE 1

Secure Data Sharing and Distribution Platform for Integrated Big Data Utilization

- Handling all data with encryption -

Waseda University: Hayato YAMANA

Group Members

  • Institute of Information Security: Atsuhiro GOTO
  • Ochanomizu University: Masato OGUCHI
  • Kogakuin University: Saneyasu YAMAGUCHI
  • The University of Electro-Communications: Takahiko SHINTANI
  • Meiji Pharmaceutical University: Tamotsu NOGUCHI

Oct.2015-Mar.2021, funded by Japan Science and Technology Agency

SLIDE 2

SD2 Platform for Integrated Big Data Utilization

Brief Introduction of our Project

  • 1. Research Background
  • 2. Objective
  • 3. Research Goal
  • 4. Research Strategy
  • 5. Experiment
  • 6. Schedule
  • 7. Progress in 2015FY


SLIDE 3

  • 1. Research Background

"At least 40% of it requires some level of security, from privacy protection to full-encryption ‘lockdown.’ … Also unfortunately, the amount needing protection will grow …" (*)

e.g. How should we manage private genome data?

(*) http://www.emc.com/leadership/digital-universe/2014iview/

SLIDE 4

  • 1. Research Background

Anonymization

  • Attribute Linkage Model: k-anonymity, l-diversity, t-closeness
  • Probabilistic Model: differential privacy

Limitation of Anonymization: Link Attack

"William Weld was governor of Massachusetts and his medical records were in the GIC data. Governor Weld lived in Cambridge. According to the Cambridge voter list, six people had his particular birth date; only three of them were men; and he was the only one in his 5-digit ZIP code."

[Figure: GIC data]
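The link attack on this slide can be illustrated with a minimal sketch. All records, names, and the choice of quasi-identifiers (birth date, sex, ZIP code) below are hypothetical toy data, not taken from the GIC data or the voter list; the code only shows how a dataset with k = 1 on its quasi-identifiers can be re-identified by joining it with a public list.

```python
from collections import Counter

# Hypothetical toy records: names are removed, quasi-identifiers are kept.
MEDICAL = [
    {"birth": "1950-01-01", "sex": "M", "zip": "11111", "diagnosis": "A"},
    {"birth": "1950-01-01", "sex": "F", "zip": "11111", "diagnosis": "B"},
    {"birth": "1950-01-01", "sex": "M", "zip": "22222", "diagnosis": "C"},
    {"birth": "1962-05-09", "sex": "F", "zip": "22222", "diagnosis": "D"},
]
# Hypothetical public list that still carries names.
VOTERS = [
    {"name": "Voter P", "birth": "1950-01-01", "sex": "M", "zip": "11111"},
    {"name": "Voter Q", "birth": "1950-01-01", "sex": "F", "zip": "11111"},
    {"name": "Voter R", "birth": "1950-01-01", "sex": "M", "zip": "22222"},
]
QUASI_IDENTIFIERS = ("birth", "sex", "zip")


def qi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[a] for a in QUASI_IDENTIFIERS)


def k_of(records):
    """Smallest equivalence-class size over the quasi-identifiers (the k in k-anonymity)."""
    return min(Counter(qi_key(r) for r in records).values())


def link(medical, voters):
    """Return {name: diagnosis} where a voter and a medical record match one-to-one."""
    by_key = {}
    for r in medical:
        by_key.setdefault(qi_key(r), []).append(r)
    voter_counts = Counter(qi_key(v) for v in voters)
    linked = {}
    for v in voters:
        matches = by_key.get(qi_key(v), [])
        if len(matches) == 1 and voter_counts[qi_key(v)] == 1:
            linked[v["name"]] = matches[0]["diagnosis"]
    return linked
```

With these toy tables, `k_of(MEDICAL)` is 1, so every voter whose quasi-identifiers are unique in both tables can be linked to a diagnosis.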

SLIDE 5

  • 2. Objective

Our approach is to handle all data with encryption throughout the data life cycle.

Our approach is not anonymization, although anonymization can be adopted in addition.

SLIDE 6

  • 3. Research Goal

Handling all data with encryption throughout the data life cycle

[Figure: raw data d1, d2, d3, …, dn are kept in storage on cloud servers A; analysis runs on cloud servers B; the resulting knowledge goes to User 1, User 2, …, User i.]

Protecting data from leaking

SLIDE 7

  • 3. Research Goal

Handling all data with encryption throughout the data life cycle

  • 1. Confidentiality guarantee for kinds of contents, using Fully Homomorphic Encryption (FHE)
  • 2. Assurance of content source and provenance, using proof of storage
  • 3. Flexible and assured access control, using attribute-based encryption

[Figure: raw data d1, d2, d3, …, dn are kept in storage on cloud servers A; analysis runs on cloud servers B; the resulting knowledge goes to User 1, User 2, …, User i.]
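The idea of analyzing data while it stays encrypted can be illustrated with a small sketch. The code below is not FHE and is not the project's scheme: it swaps in the textbook Paillier cryptosystem, which is only additively homomorphic, with toy-sized primes chosen for illustration (insecure). All function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator g = n + 1 is used
    return n, (n, lam, mu)  # public key, secret key


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no modular exponentiation is needed for g^m.
    return (1 + m * n) * pow(r, n, n2) % n2


def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)


def mul_const(n, c, k):
    """Homomorphic multiplication of a plaintext by a public constant k."""
    return pow(c, k, n * n)


def decrypt(sk, c):
    """Recover the plaintext with the secret key."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n
```

A server holding only `encrypt(n, 12)` and `encrypt(n, 30)` can compute `add(n, ...)` without the secret key, and the key holder decrypts the sum. FHE extends this to both addition and multiplication of ciphertexts, at the costs discussed on the following slides.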

SLIDE 8

1. Confidentiality Guarantee for kinds of Contents

PK: Public Key, SK: Secret Key

Steps: ① Create PK & SK, ② Encrypt, ③ Execute, ④ Bootstrap

[Figure: raw data d1, d2, d3, …, dn are encrypted into Enc(d1), Enc(d2), Enc(d3), …, Enc(dn) and kept in storage on cloud servers A; analysis on cloud servers B produces Enc(x1), Enc(x2), Enc(x3), …, Enc(xm).]

Current costs

  • Key generation (FHE/ideal lattices): 2.5 s/key
  • Key size: 17MB~2GB
  • Encryption: 0.2 s/data (7,500 hour / GB)
  • Does not fit in the CPU's cache memory
  • 10^3~10^10 slower than normal operations
  • No usable library to execute data mining and machine learning
  • ~10^10 slower than without encryption
  • Bootstrapping (needed because the noise part invades the indispensable part): 6 s/data (2,200 hour / MB)
  • Bootstrapping is 30 times slower than encryption
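The per-item costs above can be turned into totals with a short calculation. The slide does not state how large one "data" item is; the sketch below assumes 8 bytes (one 64-bit value) per item, which is purely an assumption. Under it, encryption comes out near the slide's 7,500 hour / GB, while bootstrapping comes out at roughly 218 hours per MB, so the slide's 2,200 hour / MB figure presumably rests on a different item size or unit.

```python
BYTES_PER_ITEM = 8  # assumption: one "data" item is a 64-bit value (not stated on the slide)
GIB = 2**30
MIB = 2**20


def hours_to_process(num_bytes, seconds_per_item, bytes_per_item=BYTES_PER_ITEM):
    """Total hours needed to process num_bytes at a fixed per-item cost."""
    items = num_bytes / bytes_per_item
    return items * seconds_per_item / 3600


encryption_hours_per_gb = hours_to_process(GIB, 0.2)     # slide: 0.2 s/data
bootstrapping_hours_per_mb = hours_to_process(MIB, 6.0)  # slide: 6 s/data
bootstrap_vs_encrypt = 6.0 / 0.2                         # slide: 30 times slower
```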

SLIDE 9

  • 2. Assurance of content source and provenance

Using proof of storage

PK: Public Key, SK: Secret Key

Steps: ① Create PK & SK, ② Encrypt, 2.5 Create signature σi, ③ Execute, ④ Bootstrap

[Figure: raw data d1, d2, d3, …, dn are encrypted into Enc(d1), …, Enc(dn); each ciphertext is stored with its signature σ1, σ2, σ3, …, σn on cloud servers A; analysis on cloud servers B produces Enc(x1), …, Enc(xm) with signatures σ'1, σ'2, σ'3, …, σ'm, from which knowledge is obtained.]

Signature size: twice the original data (2TB for 1TB original data)

  • Requires a large storage space
  • Re-tag creation is required depending on the kind of calculation
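The tag-per-block idea behind proof of storage can be sketched as follows. This is a deliberately simplified stand-in, not the scheme used in the project: it uses a plain HMAC tag per block and a random spot-check, whereas practical proof-of-storage schemes use homomorphic authenticators so that the response stays small. The block size, function names, and key handling are all hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4  # tiny block size in bytes, for illustration only


def split(data, size=BLOCK_SIZE):
    """Cut the (already encrypted) payload into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def tag(key, index, block):
    """Tag a block together with its position so blocks cannot be swapped."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()


def make_tags(key, blocks):
    """Data owner: create one tag (the slide's σi) per block before upload."""
    return [tag(key, i, b) for i, b in enumerate(blocks)]


def challenge(num_blocks, sample_size):
    """Verifier: pick random block indices to audit."""
    return secrets.SystemRandom().sample(range(num_blocks), sample_size)


def respond(blocks, tags, indices):
    """Storage server: return the requested blocks with their tags."""
    return [(i, blocks[i], tags[i]) for i in indices]


def verify(key, response):
    """Verifier: recompute each tag and compare in constant time."""
    return all(hmac.compare_digest(tag(key, i, b), t) for i, b, t in response)
```

With 4-byte blocks and 32-byte SHA-256 tags the tags outweigh the data, which gives a feel for the storage overhead the slide points out.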

SLIDE 10

  • 3. Flexible and assured access control

Using attribute-based encryption

PK: Public Key, SK: Secret Key

Steps: ① Create PK & SK, ② Encrypt, 2.5 Create signature σi, ③ Execute, ④ Bootstrap, ⑤ Flexible access control

[Figure: raw data d1, d2, d3, …, dn are encrypted into Enc(d1), …, Enc(dn) and stored with signatures σ1, σ2, σ3, …, σn on cloud servers A; analysis on cloud servers produces Enc(x1), …, Enc(xm) with signatures σ'1, σ'2, σ'3, …, σ'm; the resulting knowledge is accessed by User 1 @JST, User 2 @JST, User 3 @JST, and User X @JaT.]

  • 10^2~10^3 speedup is indispensable
  • Handling numeric values as numbers, not as characters
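The access rule that attribute-based encryption enforces can be sketched as a policy over user attributes. The code below models only the policy logic; it performs no cryptography, whereas in an actual attribute-based encryption scheme a user whose attributes do not satisfy the policy is mathematically unable to decrypt. The attribute strings and the example policy are hypothetical.

```python
def satisfies(policy, attributes):
    """Evaluate a nested policy against a set of attribute strings.

    A policy is either an attribute string or a tuple
    ("AND" | "OR", sub_policy, sub_policy, ...).
    """
    if isinstance(policy, str):
        return policy in attributes
    operator, *children = policy
    results = (satisfies(child, attributes) for child in children)
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical policy: members of JST who are analysts or auditors.
POLICY = ("AND", "org:JST", ("OR", "role:analyst", "role:auditor"))
```

Here `satisfies(POLICY, {"org:JST", "role:analyst"})` is true, while a user from another organization with the same role is rejected.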

SLIDE 11

  • 3. Research Goal

1,000 times faster than current encryption methods

Baseline: current FHE, proof of storage, and attribute-based encryption

To show the effectiveness of our platform with experimental demonstration

SLIDE 12

  • 4. Research Strategy

  • Parallelization
  • Avoid bootstrapping as much as possible

(1) For FHE, adopt "Ideal Lattice," whose basic operation is "matrix calculations," to parallelize
(2) If somewhat homomorphic encryption (SWHE) is applicable at some execution, use it
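Whether SWHE is enough for a given computation depends on how many multiplications are chained, because each multiplication grows the ciphertext noise. A minimal sketch of that check follows. The expression format and the `depth_budget` parameter are hypothetical; real schemes derive the supported depth from their encryption parameters.

```python
def mult_depth(expr):
    """Multiplicative depth of an expression tree.

    An expression is a variable name (string) or a tuple
    ("add" | "mul", left, right).
    """
    if isinstance(expr, str):
        return 0
    operator, left, right = expr
    depth = max(mult_depth(left), mult_depth(right))
    return depth + 1 if operator == "mul" else depth


def needs_bootstrapping(expr, depth_budget):
    """True if the circuit is deeper than the SWHE parameters can absorb."""
    return mult_depth(expr) > depth_budget


# a1*b1 + a2*b2: additions are cheap, only one level of multiplication.
INNER_PRODUCT = ("add", ("mul", "a1", "b1"), ("mul", "a2", "b2"))
# x^4 computed as (x*x)*(x*x): two levels of multiplication.
FOURTH_POWER = ("mul", ("mul", "x", "x"), ("mul", "x", "x"))
```

With a budget of one level, the inner product runs without bootstrapping while the fourth power does not, which is the kind of case-by-case decision item (2) describes.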

SLIDE 13

  • 4. Research Strategy

  • Off-load engine / stream processing / migration
  • I/O tuning / optimization
  • Tuning of cache-unfriendly workloads
  • Data mining library based on FHE

Effective use of "memory hierarchy"

              Latency (clock)   Bandwidth
  Registers   1
  L1 cache    4+                330GB/s
  L2 cache    11+               220GB/s
  L3 cache    24+               110GB/s
  DRAM        200-400           10-50GB/s
  SSD         350,000           200MB/s
  HDD         35,000,000+       600MB/s

Gap across the hierarchy: 10^7

  • Parallelization and adoption of FPGA
  • Stream processing on the Queue Linker platform
  • Inter-cloud migration
  • Adopting a mechanism to bridge the gap: use a Memory Appliance to bridge the gap between SSD and HDD

(Items on the slide are marked as NEW CHALLENGE or OUR ORIGINAL.)
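The size of the gaps in the memory hierarchy can be read off the latency column with a short calculation. The sketch below uses the lower bound of each latency range from the slide; the dictionary and function names are hypothetical.

```python
# Latency in clock cycles, taking the lower bound of each range on the slide.
LATENCY_CLOCKS = {
    "Registers": 1,
    "L1 cache": 4,
    "L2 cache": 11,
    "L3 cache": 24,
    "DRAM": 200,
    "SSD": 350_000,
    "HDD": 35_000_000,
}


def latency_gap(slow, fast="Registers"):
    """How many times slower the `slow` level is than the `fast` level."""
    return LATENCY_CLOCKS[slow] / LATENCY_CLOCKS[fast]


hdd_vs_registers = latency_gap("HDD")     # the order-of-10^7 gap
ssd_vs_dram = latency_gap("SSD", "DRAM")  # the largest single step in the table
hdd_vs_ssd = latency_gap("HDD", "SSD")
```

The largest single step is between DRAM and SSD, which is one reason keys and ciphertexts that spill out of memory are so costly.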

SLIDE 14

  • 5. Experiment

Experimental demonstration → show the effectiveness of our platform

Life Log Analysis (sensor data)

  • Gathering data from hundreds of thousands of users (1TB of raw data)
  • Analyzing characteristics of human behavior

Drug Adverse Analysis (text data)

  • Gathering drug adverse data from over 2 million users and data on 26 thousand medicinal drugs
  • In cooperation with pharmacies, estimating users' adverse drug events

Techniques listed on the slide

  • Proof of storage; verifiable delegation of computation
  • Proof of storage; secure multiparty computation with fully homomorphic encryption; verifiable delegation of computation; attribute-based encryption

SLIDE 15

  • 6. Schedule

[Figure: schedule chart covering 2015, 2016, 2017, 2018, 2019, 2020, and Outlook, with a mid-term evaluation and a final evaluation. Labels recoverable from the chart:]

  • Rows: Legal Study; Encryption (attribute-based encryption, fully homomorphic encryption, proof of storage); Hierarchy of Storage; Parallel & Distributed Computing; Experimental
  • Legal: legal coordination, guideline
  • Life log: over 1,000 logs, then over 10,000 logs
  • Drug adverse: several drugstores, then 20 drugstores
  • Activities: parallelizing by using "ideal lattice" based encryption; computer-architecture-friendly algorithm; improvement based on LS; platform const. at Waseda & Ochanomizu; pre-fetch; I/O optimization; FPGA; cont. of inter-cloud; use of platform; improvement
  • Speedup labels: "20 times faster" and "over 30 times faster," combined (×, =) into "over 10^3 faster (practical use level)"

SLIDE 16

  • 7. Progress in 2015FY

Legal Study

  • Studied possible data transfer and analysis under the provisions of the 2015 Japanese amendment of the Act on the Protection of Personal Information.

Encryption Algorithm

  • Proposed a theory of FHE for real numbers called FHE4FX.
  • It enables Homomorphic Greater-Than-bit computation.

Implementation

  • Implemented the "Apriori algorithm," 10 times faster than the state-of-the-art method, by adopting packing with HElib.

Platform

  • Analyzed I/O performance where data are on the outer/inner zone of the platter under large-scale data access.
  • Prepared our Cloud Platform between Waseda Univ. and Ochanomizu Univ.
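For readers unfamiliar with the Apriori algorithm mentioned under Implementation, a minimal plaintext sketch follows. It is not the project's implementation: it runs on unencrypted data and uses neither HElib nor packing, and the function name and input format are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support counting: the step that dominates the cost on large data.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        keys = list(current)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
    return frequent
```

In an encrypted setting the support counts would be computed over ciphertexts, which is where packing several values into one ciphertext can reduce the number of homomorphic operations.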

SLIDE 17


THANK YOU