SGX BigMatrix: A Practical Encrypted Data Analytic Framework with Trusted Processors

SLIDE 1

UT DALLAS

Erik Jonsson School of Engineering & Computer Science

FEARLESS engineering

SGX BigMatrix

A Practical Encrypted Data Analytic Framework with Trusted Processors

Fahad Shaon, Murat Kantarcioglu, Zhiqiang Lin, Latifur Khan

The University of Texas at Dallas

FEARLESS engineering 1 / 49

SLIDE 2

Problem - Secure Data Analytics on Cloud

[Figure: cloud analytics diagram; labels: Code & Data, Result]

◮ We want to use the cloud environment for data analytics
◮ The service provider can observe the data
◮ This is problematic for sensitive data (e.g., medical or financial data)

SLIDE 3

Problem - Secure Data Analytics on Cloud

[Figure: cloud analytics diagram; labels: Encrypted Code & Data, Encrypted Result]

◮ We outsource encrypted sensitive data
◮ However, encrypted data is difficult to analyze

SLIDE 4

Problem - Secure Data Analytics - Approaches

Homomorphic Encryption

◮ Theoretically robust and provides the highest level of security
◮ High computational cost
◮ Impractical for large data processing

Trusted Hardware

◮ Cost effective
◮ Provides reasonable security
◮ Intel SGX is available in all new processors
◮ Needs careful consideration of side channel attacks

SLIDE 5

Objective of the work

Create a data analytics platform using trusted processors that is secure, practical, general purpose, and scalable.

SLIDE 6

State of the Art

ObliVM (Liu et al., 2015)

◮ Provides a language and converts the logic into circuits
◮ Difficult to perform analysis on large data sets

Oblivious Multi-party ML (Ohrimenko et al., 2016)

◮ Performs important machine learning algorithms using SGX
◮ Specific to a set of algorithms

Opaque (Zheng et al., 2017)

◮ Oblivious and encrypted distributed analytics platform using Apache Spark and Intel SGX (mainly focused on supporting SQL)

SLIDE 7

Background - Intel SGX

◮ SGX stands for Software Guard Extensions
◮ SGX is a new Intel instruction set
◮ It allows us to create a secure compartment inside the processor, called an enclave
◮ Privileged software, such as the OS and the hypervisor, cannot directly observe data and computation inside the enclave

SLIDE 9

Background - Intel SGX - Attack Surface

◮ SGX essentially reduces the attack surface to the processor and the enclave code

[Figure: Attack surface of traditional computation system; labels: App, OS, VMM, Hardware]

[Figure: Attack surface with SGX; labels: App, OS, VMM, Hardware]

SLIDE 10

Background - Intel SGX Application

[Figure: SGX application diagram; labels: Untrusted Part of App, Trusted Part of App]

◮ We only trust the processor and the code inside the enclave (Intel, 2015)

SLIDE 11

Background - Intel SGX Impact

[Figure: outsourcing to an SGX Server; labels: Encrypted Code & Data, Encrypted Result]

◮ We can outsource computation securely
◮ No need to trust the cloud provider (i.e., hypervisor, OS, cloud administrators)

SLIDE 12

Threat Model

[Figure: threat model diagram; labels: Server, Memory, Processor, Enclave, Disk, Code & Data, Result]

◮ The adversary can control the OS (i.e., memory, disk, networking)
◮ The adversary cannot tamper with the enclave code
◮ The adversary cannot observe CPU register content

SLIDE 14

Challenges - Obliviousness

Challenge: Access Pattern Leakage

◮ SGX uses system memory, which is controlled by the adversary
◮ The adversary can observe memory accesses
◮ Memory access patterns reveal a lot about the data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)

Solution

◮ To reduce information leakage, we ensure data obliviousness

SLIDE 16

Data Obliviousness - Example

◮ The program executes the same path for all inputs of the same size

Example: Non-oblivious swap method of bitonic sort

    if (dir == (arr[i] > arr[j])) {
        int h = arr[i];
        arr[i] = arr[j];
        arr[j] = h;
    }

SLIDE 17

Data Obliviousness - Example (Cont.)

Example: Oblivious swap method of bitonic sort

    int x = arr[i];
    int y = arr[j];
    _asm {
        ...
        mov eax, x
        mov ebx, y
        mov ecx, dir
        cmp ebx, eax
        setg dl
        xor edx, ecx
        mov eax, x
        mov ecx, y
        mov ebx, y
        mov edx, x
        cmovz eax, ecx
        cmovz ebx, edx
        mov [x], eax
        mov [y], ebx
    }
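The idea behind the oblivious swap can also be read at a higher level. The following is a minimal Python sketch, not the authors' implementation: the name `oblivious_compare_swap` is hypothetical, and Python gives no constant-time or fixed-access guarantees, so it only illustrates the selection logic in which both elements are always read and always written, whatever the comparison result is.

```python
def oblivious_compare_swap(arr, i, j, direction):
    """Compare-and-swap without a data-dependent branch (illustrative only).

    Mirrors the condition of the non-oblivious version:
    swap when direction == (arr[i] > arr[j]).
    """
    x, y = arr[i], arr[j]
    # swap is 1 when the elements must be exchanged, otherwise 0
    swap = int(direction == (x > y))
    # Arithmetic selection: both positions are written on every call
    arr[i] = swap * y + (1 - swap) * x
    arr[j] = swap * x + (1 - swap) * y


values = [3, 1]
oblivious_compare_swap(values, 0, 1, True)
print(values)  # [1, 3]
```

In a real enclave the same selection would be done with conditional-move instructions, as in the assembly listing, because a high-level language may still compile the comparison into a branch.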

SLIDE 19

Data Obliviousness - Challenges

Challenge

◮ Building a data-oblivious solution is non-trivial
◮ It requires a lot of time and effort

Solution

◮ We provide our own Python-inspired (NumPy, Pandas) language that ensures data obliviousness

SLIDE 20

Data Oblivious - Vectorization

◮ We removed the if statement and emphasize vectorization

Example: Compute average income of people with age >= 50

    sum = 0, count = 0
    for i = 0 to Person.length:
        if Person[i].age >= 50:
            count++
            sum += Person[i].income
    print sum / count

SLIDE 21

Data Oblivious - Example

Example: Compute average income of people with age >= 50

    S = where(Person, "Person['age'] >= 50")
    print sum(S .* Person['income']) / sum(S)
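To make the vectorized form concrete, here is a minimal pure-Python sketch of the same computation. It is an illustration under assumptions, not the BigMatrix API: the helpers `where` and `masked_average` and the sample records are hypothetical. The point is that every row is visited and multiplied by a 0/1 mask, so the sequence of operations does not depend on which rows match.

```python
def where(rows, predicate):
    """Return a 0/1 mask with one entry per row; every row is evaluated."""
    return [int(predicate(row)) for row in rows]


def masked_average(rows, mask, column):
    """Average of `column` over the rows whose mask entry is 1."""
    total = sum(m * row[column] for m, row in zip(mask, rows))
    count = sum(mask)
    return total / count  # raises ZeroDivisionError if no row matches


people = [
    {"age": 62, "income": 50000},
    {"age": 35, "income": 90000},
    {"age": 51, "income": 70000},
]
mask = where(people, lambda p: p["age"] >= 50)
print(masked_average(people, mask, "income"))  # 60000.0
```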

SLIDE 23

Challenge - Memory constraint

Challenge

◮ The current version of SGX (v1) allows only 90MB of memory allocation

Solution

◮ We build a flexible data blocking mechanism with efficient and secure caching
◮ We build a matrix manipulation library that supports blocking, and we call the abstraction BigMatrix
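Blocking can be illustrated with a small pure-Python sketch of blocked matrix multiplication. This is an assumption-laden illustration rather than the BigMatrix implementation: the function `blocked_matmul` is hypothetical, and the sketch omits the encryption and caching that the real system applies when blocks move in and out of the enclave. It only shows that each step touches three block-sized tiles, so the working set is bounded by the block size instead of the matrix size.

```python
def blocked_matmul(a, b, block):
    """Multiply matrices (lists of lists) one block-sized tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Only one tile of a, one of b, and one of c are used here
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1))
# [[19, 22], [43, 50]]
```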

SLIDE 24

Security Properties - Summary

◮ Individual operations in our system are data oblivious
◮ A combination of oblivious operations is also oblivious
◮ The compiler warns the user about potential leakage
◮ We perform optimization based on publicly known information, e.g., data size

SLIDE 25

System Overview - SGX BigMatrix

[Figure: SGX BigMatrix system overview; labels: Client, Server, Untrusted, Trusted, Compiler, Block Size Optimizer, BMRT, Service Manager, BigMatrix Library, Intel SGX SDK, Execution Engine, Block Cache, OCalls, ECalls]

SLIDE 26

BigMatrix Library

[Figure: SGX BigMatrix - BigMatrix Library (system overview diagram)]

SLIDE 27

BigMatrix Library

Operations in BigMatrix Library

◮ Data access operations - load, publish, get row, etc.
◮ Matrix operations - inverse, multiply, element-wise, transpose, etc.
◮ Relational algebra operations - where, sort, join, etc.
◮ Data generation operations - rand, zeros, etc.
◮ Statistical operations - norm, var

SLIDE 28

BigMatrix Library - Security Properties

◮ All the operations are data oblivious
◮ All the operations support blocking
◮ We prove that a combination of data-oblivious operations is also data oblivious (Section 4)
◮ Data-oblivious and blocking-aware implementation details are in Appendix A

SLIDE 30

BigMatrix Library - Trace

◮ Each operation has a fixed trace
◮ The trace is the information disclosed to the adversary during execution
◮ For example: operation type, input and output data size

Example: Trace of Matrix Multiplication C = A ∗ B

◮ Instruction type (i.e., multiplication)
◮ Input matrix sizes (i.e., A.rows, A.cols, B.rows, B.cols)
◮ Output matrix size (i.e., C.rows, C.cols)
◮ Block size
◮ Oblivious memory read and write sequences, which do not depend on data content
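One way to see why such a trace leaks nothing about the data is to compute it from public metadata alone. The sketch below is a hypothetical illustration, not the system's trace format: the function `matmul_trace` and the shape of its result are assumptions. The function receives only matrix shapes and a block size, never matrix contents, so two inputs of equal size necessarily produce the same trace.

```python
def matmul_trace(a_shape, b_shape, block):
    """Build an illustrative trace of C = A * B from public sizes only."""
    (a_rows, a_cols), (b_rows, b_cols) = a_shape, b_shape
    if a_cols != b_rows:
        raise ValueError("inner dimensions must match")

    def n_blocks(n):
        return -(-n // block)  # ceiling division

    accesses = []
    for i in range(n_blocks(a_rows)):
        for j in range(n_blocks(b_cols)):
            for k in range(n_blocks(a_cols)):
                accesses.append(("read", "A", i, k))
                accesses.append(("read", "B", k, j))
            accesses.append(("write", "C", i, j))
    return {
        "op": "multiply",
        "inputs": [a_shape, b_shape],
        "output": (a_rows, b_cols),
        "block": block,
        "accesses": accesses,
    }


trace = matmul_trace((4, 4), (4, 4), block=2)
print(trace["output"], len(trace["accesses"]))  # (4, 4) 20
```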

SLIDE 31

Exec. Engine & Block Cache

[Figure: SGX BigMatrix - Execution Engine and Block Cache (system overview diagram)]

SLIDE 32

Exec. Engine & Block Cache

Execution Engine

◮ Executes BigMatrix library operations
◮ Parses instructions of the form Var ASSIGN Operation (Var, Var, ...)
◮ Processes sequences of instructions
◮ Maintains the intermediate state required to execute complex programs, such as variable-to-BigMatrix assignments

Block Cache

◮ Helps decide when to remove a block from memory, based on the upcoming sequence of instructions
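The instruction form above is simple enough to parse with a single regular expression. The following Python sketch is hypothetical and only covers the assignment form: it assumes ASSIGN is written as `=`, that names are plain identifiers, and that arguments are separated by commas. The name `parse_instruction` is not from the original system.

```python
import re

# target = operation(arg, arg, ...)
INSTRUCTION = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\((.*)\)\s*$")


def parse_instruction(line):
    """Split 'Var = Operation(Var, Var, ...)' into (target, op, args)."""
    match = INSTRUCTION.match(line)
    if match is None:
        raise ValueError(f"not an assignment instruction: {line!r}")
    target, op, raw_args = match.groups()
    args = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
    return target, op, args


print(parse_instruction("t1 = multiply(xt, x)"))
# ('t1', 'multiply', ['xt', 'x'])
```

Statements without a target, such as a bare call, would need a second pattern; they are left out to keep the sketch short.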

SLIDE 33

Exec. Engine & Block Cache - Security Properties

◮ The Execution Engine and Block Cache are also data oblivious, given that the input program is data oblivious
◮ The compiler warns about potential data leakage
◮ The adversary cannot infer anything more about the data, apart from the trace of all the operations

SLIDE 34

Compiler

[Figure: SGX BigMatrix - Compiler (system overview diagram)]

SLIDE 35

Compiler

◮ Compiles our Python-inspired language into basic commands
◮ It ensures data obliviousness by removing support for if
◮ We emphasize operation vectorization

Input: Linear Regression

    x = load('path/to/X_Matrix')
    y = load('path/to/Y_Matrix')
    xt = transpose(x)
    theta = inverse(xt * x) * xt * y
    publish(theta)

SLIDE 36

Compiler - Output

Output: Linear Regression

    x = load(X_Matrix_ID)
    y = load(Y_Matrix_ID)
    xt = transpose(x)
    t1 = multiply(xt, x)
    unset(x)
    t2 = inverse(t1)
    unset(t1)
    t3 = multiply(t2, xt)
    unset(xt)
    unset(t2)
    theta = multiply(t3, y)
    unset(y)
    unset(t3)
    publish(theta)
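The placement of the unset commands in this output is consistent with freeing each variable right after its last use. The sketch below shows one way such a pass could work. It is a guess at the technique, not the authors' compiler: `insert_unsets`, the tuple encoding of instructions, and the choice to release arguments in reverse order are all assumptions made for illustration.

```python
def insert_unsets(program):
    """Emit each instruction, then unset variables that are never used again.

    `program` is a list of (target, op, args); target is None for bare calls.
    """
    defined = {target for target, _, _ in program if target}
    last_use = {}
    for index, (_, _, args) in enumerate(program):
        for name in args:
            if name in defined:  # ignore literals such as matrix IDs
                last_use[name] = index

    lines = []
    for index, (target, op, args) in enumerate(program):
        call = f"{op}({', '.join(args)})"
        lines.append(f"{target} = {call}" if target else call)
        if op == "publish":
            continue  # the published result is the program output
        for name in dict.fromkeys(reversed(args)):
            if last_use.get(name) == index:
                lines.append(f"unset({name})")
    return lines


PROGRAM = [
    ("x", "load", ["X_Matrix_ID"]),
    ("y", "load", ["Y_Matrix_ID"]),
    ("xt", "transpose", ["x"]),
    ("t1", "multiply", ["xt", "x"]),
    ("t2", "inverse", ["t1"]),
    ("t3", "multiply", ["t2", "xt"]),
    ("theta", "multiply", ["t3", "y"]),
    (None, "publish", ["theta"]),
]
print("\n".join(insert_unsets(PROGRAM)))
```

Because the pass only looks at variable names and instruction order, it depends on the program text and not on any matrix content.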

SLIDE 37

Compiler - Track data leakage

◮ We warn about accidental data leakage through the trace
◮ We check whether any sensitive data is used in the trace of any operation
◮ In our system, sensitive data means the content of any BigMatrix and the content of intermediate variables

Example

    X = load('path/to/X_Matrix')
    s = count(where(X[1] >= 0))
    Y = zeros(s, 1)
    publish(Y)

We report that the zeros operation reveals the sensitive value s
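A check of this kind can be sketched as a simple taint analysis: values derived from loaded data are marked sensitive, and a warning is raised when a sensitive value lands in an argument position that becomes part of an operation's trace, such as a size. The code below is a hypothetical illustration, not the authors' compiler; `find_leaks`, the `TRACE_ARGS` table, and the flattened instruction tuples are assumptions.

```python
# Hypothetical table: argument positions that show up in the trace (sizes)
TRACE_ARGS = {"zeros": (0, 1), "rand": (0, 1)}


def find_leaks(program):
    """Return warnings for sensitive values used as trace-visible arguments."""
    sensitive = set()
    warnings = []
    for target, op, args in program:
        for position in TRACE_ARGS.get(op, ()):
            if position < len(args) and args[position] in sensitive:
                warnings.append(f"{op} reveals sensitive value {args[position]}")
        # Loaded data and anything computed from sensitive inputs is sensitive
        if target and (op == "load" or any(a in sensitive for a in args)):
            sensitive.add(target)
    return warnings


program = [
    ("X", "load", ["path/to/X_Matrix"]),
    ("m", "where", ["X"]),      # the nested call is flattened into two steps
    ("s", "count", ["m"]),
    ("Y", "zeros", ["s", "1"]),
    (None, "publish", ["Y"]),
]
print(find_leaks(program))  # ['zeros reveals sensitive value s']
```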

SLIDE 38

SQL Support

◮ We also support basic SQL

Input

    I = sql('SELECT *
             FROM person p
             JOIN person_income pi (1)
             ON p.id = pi.id
             WHERE p.age > 50
             AND pi.income > 100000')

SLIDE 39

SQL Support (Cont.)

Output

    t1 = where(person, 'C:3;V:50;O:=')             # person.age is in column 3
    t2 = zeros(person.rows, 2)
    t3 = get_column(person, 0)                     # person.id is in column 0
    set_column(t2, 0, t3)
    set_column(t2, 1, t1)
    t4 = where(person_income, 'C:1;V:100000;O:=')
    t5 = zeros(person_income.rows, 2)
    t6 = get_column(person_income, 0)              # person_income.id is in column 0
    set_column(t5, 0, t6)
    set_column(t5, 1, t4)
    A = join(t3, t5, 'c:t1.0;c:t2.0;O:=', 1)
    ...

SLIDE 40

Block Size Optimizer

[Figure: SGX BigMatrix - Block Size Optimizer (system overview diagram)]

SLIDE 41

Block Size Optimizer - Intro & Design Decisions

◮ We observed that the input block size affects the performance of the system
◮ The adversary does not gain any knowledge about the data from the block size
◮ So, we find the optimum block size for each instruction before executing a program
◮ We explicitly do not perform the optimization inside the enclave because
    ◮ Optimization libraries are large and complex, which can introduce unintended security flaws
    ◮ Any efficient optimization algorithm will reveal information about the data
◮ So we only perform optimization on trace data, nothing else

SLIDE 42

Block Size Optimizer - Overview

◮ We generate a DAG of the execution graph
    ◮ Internal nodes represent operations
    ◮ Edges represent block conversions
◮ We know the cost of each operation for different matrix and block sizes
◮ Given the input matrix sizes, we can find the optimized block size
◮ We can convert one block configuration to another, and we know the cost of the conversion

SLIDE 43

Block Size Optimizer - Example - Linear Regression

◮ Execution graph (DAG) of $\Theta = (X^T X)^{-1} X^T Y$ in the linear regression training phase

SLIDE 44

Block Size Optimizer - Example - LR Cost Function

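$$
\begin{aligned}
\mathrm{Cost} ={} & \mathrm{Convert}(X, (br_X, bc_X), (x_0, x_1)) \\
 & + \mathrm{OP\_Cost}(\text{'Transpose'}, X, (x_0, x_1)) \\
 & + \mathrm{Convert}(X^T, (x_1, x_0), (x_2, x_3)) \\
 & + \mathrm{Convert}(X, (br_X, bc_X), (x_4, x_5)) \\
 & + \mathrm{OP\_Cost}(\text{'Multiply'}, [X^T, X], [(x_2, x_3), (x_4, x_5)]) \\
 & + \dots
\end{aligned}
$$

We convert this into an integer program and solve it for all the $x_n$ variables.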
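For intuition, the first two terms of such a cost function can be minimized by exhaustive search over a small set of candidate block sizes. The sketch below deliberately swaps in brute force for the integer programming formulation, and its cost models are invented placeholders: `convert_cost`, `op_cost`, and the constants in them are hypothetical, whereas the real system uses measured operation and conversion costs.

```python
from itertools import product


def ceil_div(a, b):
    return -(-a // b)


def convert_cost(shape, src_block, dst_block):
    """Hypothetical: free if the layout is unchanged, else one pass over the data."""
    rows, cols = shape
    return 0 if src_block == dst_block else rows * cols


def op_cost(shape, block):
    """Hypothetical: base work, a per-block overhead, and a block-size penalty."""
    rows, cols = shape
    block_rows, block_cols = block
    n_blocks = ceil_div(rows, block_rows) * ceil_div(cols, block_cols)
    return rows * cols + 1000 * n_blocks + block_rows * block_cols


def total_cost(shape, stored_block, block):
    """Convert(X, stored_block, block) + OP_Cost(op, X, block)."""
    return convert_cost(shape, stored_block, block) + op_cost(shape, block)


def best_block(shape, stored_block, sizes):
    """Try every (rows, cols) block from `sizes` and keep the cheapest."""
    candidates = product(sizes, repeat=2)
    return min(candidates, key=lambda block: total_cost(shape, stored_block, block))


SIZES = [50, 100, 250, 500]
print(best_block((1000, 1000), (100, 100), SIZES))
```

All inputs to the search are public (matrix shape, stored block size, candidate sizes), which matches the constraint that optimization uses trace data only.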

SLIDE 45

Experimental Evaluations

We implemented a prototype using the Intel SGX SDK and observed the performance of different operations

Setup

◮ Processor: Intel Core i7 6700
◮ Memory: 64GB
◮ OS: Windows 7
◮ SGX SDK Version: 1.0
◮ Number of machines: 1

SLIDE 46

Performance Impact - Matrix Size

[Figure: Matrix Multiplication (e.g. C = A ∗ B); x-axis: Matrix Elements; y-axis: Matrix Multiplication Time (ms); series: Unencrypted, Encrypted]

[Figure: Oblivious Join; x-axis: Matrix Elements; y-axis: Join time (ms); series: Unencrypted, Encrypted]

SLIDE 47

Performance Impact - Matrix Size - Summary

◮ We observe similar trends for all matrix operations
◮ We observe minimal overhead for encrypted computation
◮ However, the overhead depends on the operation type
◮ More experimental evaluations are in Section 5

SLIDE 48

Performance Impact - Block Size

[Figure: Scalar Multiplication; Execution Time plot; axis: Scalar Operation Time (ms)]

[Figure: Matrix Multiplication; Execution Time plot; axis: Matrix Multiplication Time (ms)]

SLIDE 49

Performance Impact - Block Size - Summary

◮ We observe that execution time increases with block size
◮ Also, a very small block size increases execution time, due to blocking overhead
◮ As a result, we perform block size optimization

SLIDE 50

Comparison with ObliVM

◮ We compare the performance of SGX-BigMatrix with ObliVM for two-party matrix multiplication
◮ We observe that SGX-BigMatrix is orders of magnitude faster because we use hardware and do not require expensive over-the-network communication

Matrix Dimension    ObliVM               BigMatrix SGX Enc.    BigMatrix SGX Unenc.
100                 28s 660ms            10ms                  10ms
250                 7m 0s 90ms           93ms                  88ms
500                 53m 48s 910ms        706.66ms              675.66ms
750                 2h 59m 40s 990ms     2s 310ms              2s 260ms
1,000               6h 34m 17s 900ms     10s 450ms             10s 330ms

Table: Two-party matrix multiplication time in ObliVM vs BigMatrix

SLIDE 51

Case Studies - Page Rank

◮ Performed Page Rank on three popular datasets
◮ Each dataset contains a directed graph

Data Set         Nodes     BigMatrix Encrypted
Wiki-Vote        7,115     97s 560ms
Astro-Physics    18,772    6m 41s 200ms
Enron Email      36,692    23m 19s 700ms

Table: Page Rank on real datasets

SLIDE 52

Conclusion

◮ We propose a practical data analytics framework with SGX
◮ We present the BigMatrix abstraction to handle large matrices in a constrained environment
◮ We propose a programming abstraction for secure data analytics

◮ We applied our system to solve real world problems

SLIDE 53

Thank You

Questions / Comments

◮ Fahad Shaon - fahad.shaon@utdallas.edu
◮ Murat Kantarcioglu - muratk@utdallas.edu
◮ Zhiqiang Lin - zhiqiang.lin@utdallas.edu
◮ Latifur Khan - lkhan@utdallas.edu

SLIDE 54

References I

Intel (2015). Presentation for Intel SGX: ISCA 2015. url: https://software.intel.com/sites/default/files/332680-002.pdf.

Islam, Mohammad Saiful, Mehmet Kuzu, and Murat Kantarcioglu (2012). “Access Pattern disclosure on Searchable Encryption: Ramification, Attack and Mitigation”. In: NDSS. Vol. 20, p. 12.

Liu, Chang et al. (2015). “ObliVM: A programming framework for secure computation”. In: Security and Privacy (SP), 2015 IEEE Symposium on. IEEE, pp. 359–376.

Naveed, Muhammad, Seny Kamara, and Charles V Wright (2015). “Inference attacks on property-preserving encrypted databases”. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, pp. 644–655.

SLIDE 55

References II

Ohrimenko, Olga et al. (2016). “Oblivious Multi-Party Machine Learning on Trusted Processors”. In: 25th USENIX Security Symposium (USENIX Security 16). Austin, TX: USENIX Association, pp. 619–636. isbn: 978-1-931971-32-4. url: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/ohrimenko.

Zheng, Wenting et al. (2017). “Opaque: A Data Analytics Platform with Strong Security”. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). Boston, MA: USENIX Association. url: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zheng.
