SGX IR: Secure Information Retrieval with Trusted Processors

SLIDE 1

UT DALLAS

Erik Jonsson School of Engineering & Computer Science

FEARLESS engineering

SGX IR

Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu

The University of Texas at Dallas

FEARLESS engineering 1 / 29

SLIDE 2

Problem - Secure Cloud based Information Retrieval

[Figure labels: Encrypted Data, Encrypted Search Query, Encrypted Result]

Build a secure information retrieval system
◮ The user stores encrypted files on a cloud server
◮ The user performs selective retrieval over them

SLIDE 3

Building Block - Intel SGX

◮ We use Intel SGX (Software Guard Extensions)
◮ SGX is a new Intel instruction set extension
◮ It allows us to create a secure compartment inside the processor, called an enclave
◮ Privileged software, such as the OS or hypervisor, cannot directly observe data and computation inside the enclave

SLIDE 4

Threat Model - Intel SGX

[Figure labels: Server (Disk, Memory, CPU, Enclave); Encrypted Data & Code; Encrypted Result]

The adversary can control the hypervisor, OS, memory, and disk of the server.

SLIDE 5

State of The Art

◮ Relevant search or indexing systems that use SGX: HardIDX (Fuhry et al., 2017), Rearguard (Sun et al., 2018), Oblix (Mishra et al., 2018), Hardware-supported ORAM (Hoang et al., 2019)
◮ These works mainly focus on building efficient data structures for searching using SGX
◮ They assume the inverted index is already built and/or build the index on the client
◮ They do not look into ranked retrieval

SLIDE 7

Challenges - Access Pattern Leakage

Challenge
◮ The adversary can observe memory accesses in SGX
◮ Memory access patterns reveal information about the encrypted data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)

Solution
◮ Data obliviousness: we build custom data-oblivious indexing algorithms

SLIDE 10

Data Obliviousness - Oblivious Select

◮ Data obliviousness: the program executes the same path for all inputs of the same size
◮ Example: x == y ? a : b

obliviousSelect(a, b, x, y):

    ...
    mov %[x], %%eax
    mov %[y], %%ebx
    xor %%eax, %%ebx
    ...
    mov %[a], %%ecx
    mov %[b], %%edx
    cmovz %%ecx, %%edx
    ...
    mov %%edx, %[out]
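
The same select can be sketched without any branch by building a bit mask from the comparison. This is a minimal Python illustration of the logic only, assuming integer inputs smaller than 2^63 in magnitude; the name `oblivious_select` and the masking approach are illustrative assumptions, and Python itself gives no constant-time guarantee, so an enclave would rely on native code such as the `cmovz` sequence on the slide.

```python
def oblivious_select(a: int, b: int, x: int, y: int) -> int:
    """Return a if x == y else b, without branching on the comparison.

    Illustrative sketch: assumes all inputs are integers below 2**63 in magnitude.
    """
    diff = x ^ y                              # zero exactly when x == y
    nonzero = ((diff | -diff) >> 63) & 1      # 1 if diff != 0, else 0
    mask = nonzero - 1                        # all ones (-1) when equal, 0 otherwise
    return (a & mask) | (b & ~mask)           # keeps a under the mask, b otherwise
```

Both `a` and `b` are always read and combined, so the sequence of operations is the same whichever value is returned.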

SLIDE 12

Challenge - Memory Constraint

Challenge
◮ SGX (v1) provides only 90 MB of enclave memory

Solution
◮ Blocking: break large data into small blocks
◮ We utilize SGX-BigMatrix (Shaon et al., 2017) primitives
◮ BigMatrix handles the complexity of data blocking

SLIDE 13

Objectives - Summary

[Figure labels: Pre-Processing, Encrypted Intermediate Data, Final Processing, Encrypted Search Query, Encrypted Result]

◮ Very low client-side processing
◮ Build the index securely in the cloud using SGX
◮ Build data-oblivious algorithms
◮ Support ranked retrieval

SLIDE 14

SGX IR - Document and Query Types

◮ Text Data
  ◮ Ranked document retrieval using TF-IDF (Term Frequency and Inverse Document Frequency)
◮ Image Data
  ◮ Face recognition using Eigenface

SLIDE 15

Text Pre-Processing - Client

Pipeline: Tokenization → Stemming → TokenID Generation → Encrypted BigMatrix Generation

Example
  Input:     Cryptography is the practice and study of techniques for secure communication ...
  Stems:     cryptographi practic studi techniqu secur commun
  Token IDs: cryptographi = 1, practic = 2, studi = 3, techniqu = 4, secur = 5

tok-id  doc-id  freq
1       1       2
2       1       10
3       1       6
...     ...     ...

◮ We tokenize and stem the input text files
◮ We build a matrix I with token id, document id, and frequency columns
◮ Finally, we encrypt I and upload it
◮ Only a single round of reading and writing is required
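
The client-side steps can be sketched as follows. This is only an illustration under stated assumptions: `toy_stem`, `STOP_WORDS`, and `build_index_matrix` are hypothetical names, the suffix stripping is a stand-in for a real stemmer, token ids are assigned incrementally, and the final encryption and upload of I are omitted.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list; the slides do not specify one.
STOP_WORDS = {"is", "the", "and", "of", "for", "a", "an"}

def toy_stem(token):
    """Toy stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index_matrix(docs):
    """Return rows (tok_id, doc_id, freq) and the token -> id mapping."""
    token_ids = {}
    rows = []
    for doc_id, text in enumerate(docs, start=1):
        words = re.findall(r"[a-z]+", text.lower())              # tokenization
        stems = [toy_stem(w) for w in words if w not in STOP_WORDS]
        for token, freq in Counter(stems).items():
            # Incremental token id: next unused integer for unseen tokens
            tok_id = token_ids.setdefault(token, len(token_ids) + 1)
            rows.append((tok_id, doc_id, freq))
    return rows, token_ids
```

Each document is read once and each row of I is written once, which matches the single round of reading and writing noted above.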

SLIDE 16

Text Indexing - Server

[Figure: "Indexing" takes the input matrix (doc-id, tok-id, freq) and produces a summary matrix (count, tok-id, sum) and a blocked matrix (doc-id, tok-id, freq) whose token ids carry a block number, e.g. 1,0 and 1,1; # marks dummy entries]

◮ Given input I, we output two matrices
  ◮ U′, containing the total frequencies of the tokens, for IDF calculation
  ◮ T, containing equal-length blocks of the token-to-document frequency mapping, for TF calculation

SLIDE 17

Text Indexing - IDF - Server

[Figure: the (doc-id, tok-id, freq) matrix passes through "Sort", then "Count & Sum", which yields a (count, tok-id, sum) matrix containing # dummy rows, then "Sort and Adjust", which moves the dummy rows to the bottom]

◮ I′ ← obliviously sort I on the token id column
◮ We generate U to keep the count and sum of frequencies
  ◮ c ← (I′[i].token_id = I′[i − 1].token_id)
  ◮ U[i].sum ← obliviousSelect(sum, #, 1, c)
  ◮ sum ← obliviousSelect(sum, 0, 1, c) + I′[i].frequency
◮ Finally, we sort this matrix so that the dummy entries go to the bottom
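
A rough sketch of the count-and-sum pass is given below. It is not the authors' implementation: Python's `sorted` stands in for the oblivious sort, `DUMMY = -1` stands in for the `#` marker, and the sketch emits each total on the last row of a token's run by looking one row ahead, which is an assumption about how the slide's recurrence is meant to be read. The helper `select` mirrors obliviousSelect(a, b, x, y).

```python
DUMMY = -1  # stands in for the '#' dummy marker (assumption; token ids are positive)

def select(a, b, x, y):
    """Branch-free 'a if x == y else b' for integers below 2**63 in magnitude."""
    diff = x ^ y
    mask = (((diff | -diff) >> 63) & 1) - 1   # -1 when equal, 0 otherwise
    return (a & mask) | (b & ~mask)

def count_and_sum(rows):
    """rows: (doc_id, tok_id, freq). Returns (count, tok_id, sum) rows, dummies last."""
    s = sorted(rows, key=lambda r: r[1])      # stand-in for the oblivious sort on token id
    out = []
    count = total = 0
    for i, (_, tok, freq) in enumerate(s):
        # Index checks depend only on i, never on the data values
        prev_tok = s[i - 1][1] if i > 0 else DUMMY
        next_tok = s[i + 1][1] if i + 1 < len(s) else DUMMY
        same = select(1, 0, tok, prev_tok)    # 1 if this row continues the previous token
        count = select(count, 0, same, 1) + 1
        total = select(total, 0, same, 1) + freq
        last = select(0, 1, tok, next_tok)    # 1 on the final row of each token run
        out.append((select(count, DUMMY, last, 1),
                    select(tok, DUMMY, last, 1),
                    select(total, DUMMY, last, 1)))
    # Second sort: real rows first (by token id), dummy rows at the bottom
    out.sort(key=lambda r: (r[1] == DUMMY, r[1]))
    return out
```

Every input row produces exactly one output row, real or dummy, so the length of the output does not reveal how many distinct tokens exist until the final sorted matrix is trimmed.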

SLIDE 18

Text Indexing - TF - Block Size Optimization

◮ We can read the document frequency of tokens from matrix I′
◮ Doing so directly would reveal the number of documents containing a specific token
◮ So, we split I′ into equal-length blocks
◮ We optimize the block size b from the count column of U′ using the technique outlined in (Shaon and Kantarcioglu, 2016)
  ◮ We assume the frequency follows a Pareto distribution
  ◮ We then find mathematically the value of b that minimizes the padding
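
To make the role of b concrete, the sketch below computes how many dummy rows a given block size requires and picks the best size from a list of candidates by exhaustive search. This is not the Pareto-based derivation from (Shaon and Kantarcioglu, 2016); the function names and the candidate list are hypothetical, and the candidates must be supplied by the caller because padding alone is trivially zero at b = 1.

```python
def padding_for(counts, b):
    """Dummy rows needed when each token's row count is padded up to a multiple of b."""
    # (-c) % b is the distance from c up to the next multiple of b
    return sum((-c) % b for c in counts)

def best_block_size(counts, candidates):
    """Return the candidate block size with the least total padding (ties: smallest b)."""
    return min(candidates, key=lambda b: (padding_for(counts, b), b))
```

For example, with per-token counts 8, 4, 7, and 5, a block size of 4 needs 0 + 0 + 1 + 3 = 4 dummy rows, while a block size of 5 needs 2 + 1 + 3 + 0 = 6.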

SLIDE 19

Text Indexing - TF - Padding Generation

We regenerate the token id with a bucket number function σ.

[Figure: "Regenerate TokenId" rewrites each tok-id in the sorted (doc-id, tok-id, freq) matrix as a token,block pair such as 1,0, 1,1, 2,0]

We generate padding.

[Figure: "Generate Padding Rows" uses the (count, tok-id, sum) matrix to produce # dummy rows for partially filled blocks such as 1,1, 2,1, 3,1]

Finally, we merge and sort X and J to get the output matrix T.
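
The blocking and padding steps can be illustrated with a small non-oblivious sketch. The slides do not define σ, so the bucket rule used here (row position within a token's run, integer-divided by b) is an assumption, as are the names `build_T` and `DUMMY`; a real implementation would perform the same steps with oblivious primitives.

```python
DUMMY = -1  # stands in for the '#' marker (assumption)

def build_T(rows, b):
    """rows: (doc_id, tok_id, freq). Returns rows ((tok_id, block_no), doc_id, freq)
    in which every (tok_id, block_no) block holds exactly b rows."""
    rows = sorted(rows, key=lambda r: r[1])          # sorted on token id, like I'
    seen = {}                                        # rows emitted so far per token
    regenerated = []
    for doc, tok, freq in rows:
        j = seen.get(tok, 0)
        seen[tok] = j + 1
        regenerated.append(((tok, j // b), doc, freq))   # assumed bucket rule
    padding = []
    for tok, c in seen.items():
        last_block = (c - 1) // b
        for _ in range((-c) % b):                    # fill the last block up to b rows
            padding.append(((tok, last_block), DUMMY, DUMMY))
    # Merge real and padding rows, then sort; dummies go last within each block
    return sorted(regenerated + padding, key=lambda r: (r[0], r[1] == DUMMY))
```

With b = 2, a token that appears in three documents fills block 0 completely and block 1 with one real row plus one dummy row.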

SLIDE 20

TF - IDF Calculation

◮ On T we run term frequency functions, e.g. log normalization: $1 + \log(\mathrm{tf}_{t,d})$
◮ On U′ we run document frequency functions, such as IDF: $\log \frac{N}{\mathrm{df}_t}$
◮ To answer a query, we use T for TF and U′ for IDF
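
As a worked illustration of the two weighting functions, the sketch below scores one document against a query. The function names, the dictionary inputs, and the use of the natural logarithm are assumptions made for the example; in the system described here the term frequencies would come from T and the document frequencies from U′.

```python
import math

def tf_weight(tf):
    """Log-normalized term frequency: 1 + log(tf), or 0 when the term is absent."""
    return 1 + math.log(tf) if tf > 0 else 0.0

def idf_weight(n_docs, df):
    """Inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)

def score(query_tokens, doc_tf, df, n_docs):
    """Sum of TF-IDF weights over query tokens.

    doc_tf: token id -> frequency in this document
    df:     token id -> number of documents containing the token
    """
    return sum(
        tf_weight(doc_tf.get(t, 0)) * idf_weight(n_docs, df[t])
        for t in query_tokens
        if t in df
    )
```

A token that occurs in every document gets an IDF of log(1) = 0 and therefore contributes nothing to the ranking.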

SLIDE 21

Bitonic Sorting of Arbitrary Input Size

◮ Sorting is one of the most frequently used operations
◮ We use the arbitrary-length version of bitonic sort (Lang, 1998)
◮ However, the existing definition is recursive
◮ Recursion is not suitable for memory-constrained environments like SGX
◮ So, we propose a non-recursive algorithm that does not use a stack

SLIDE 22

Bitonic Sort Non Recursive Algorithm - Concept

◮ We can express a number as $N = 2^{x_m} + \dots + 2^{x_3} + 2^{x_2} + 2^{x_1}$
◮ A merge network can sort a descending block followed by an ascending block into a single ascending block
◮ We sort, then merge, from the smallest block to the biggest

[Figure labels: Sort, Merge]
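
The decomposition of N into power-of-two blocks can be shown with a short sketch. The function name is hypothetical; the start offsets follow the bit expression `(−1 << (d + 1)) & N` used in the algorithm, which places the largest block at index 0 and the smallest at the end of the array.

```python
def power_of_two_blocks(n):
    """Return (start, size) for each set bit of n, smallest block first."""
    blocks = []
    for d in range(n.bit_length()):
        if (n >> d) & 1:                      # n contains a block of size 2**d
            start = (-1 << (d + 1)) & n       # total size of all larger blocks
            blocks.append((start, 1 << d))
    return blocks
```

For N = 13 = 8 + 4 + 1 this yields the blocks (12, 1), (8, 4), and (0, 8), which together cover indices 0 through 12 exactly once.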

SLIDE 23

Bitonic Sort Non Recursive Algorithm

for d = 0 to ⌈log2(N)⌉ do
    if ((N >> d) & 1) ≠ 0 then
        start ← (−1 << (d + 1)) & N
        size ← 1 << d
        dir ← (size & N & −N) ≠ 0
        bitonicSort2K(start, size, dir)
        if !dir then
            bitonicMerge(start, N − start, 1)
        end if
    end if
end for
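
A runnable sketch of this loop is given below, under a few stated assumptions. Both comparisons in the pseudocode are read as "≠ 0", so that only blocks actually present in N are processed and only the smallest block is sorted ascending; this reading is what makes the steps agree with the sort-then-merge concept. The helpers `bitonic_sort_2k` and `bitonic_merge` are written here as plain iterative networks and are not the authors' code; the merge treats the array as virtually padded to the next power of two and skips comparators that would touch the padding.

```python
def compare_exchange(a, i, j, ascending):
    """Put a[i], a[j] in the requested order; the same two slots are always written."""
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = (lo, hi) if ascending else (hi, lo)

def bitonic_sort_2k(a, start, size, ascending):
    """Iterative bitonic sort of a[start:start+size]; size must be a power of two."""
    k = 2
    while k <= size:
        j = k // 2
        while j >= 1:
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    up = ((i & k) == 0) == ascending
                    compare_exchange(a, start + i, start + partner, up)
            j //= 2
        k *= 2

def bitonic_merge(a, start, n):
    """Merge a descending run followed by an ascending run into ascending order."""
    j = 1
    while j < n:
        j <<= 1                # smallest power of two >= n
    j >>= 1
    while j >= 1:
        for i in range(n):
            if (i & j) == 0 and (i | j) < n:   # skip comparators into the virtual padding
                compare_exchange(a, start + i, start + (i | j), True)
        j >>= 1

def bitonic_sort(a):
    """Sort list a in place, ascending, for any length, with no recursion."""
    n = len(a)
    for d in range(n.bit_length()):
        if (n >> d) & 1:
            start = (-1 << (d + 1)) & n
            size = 1 << d
            ascending = (size & n & -n) != 0   # true only for the smallest block
            bitonic_sort_2k(a, start, size, ascending)
            if not ascending:
                bitonic_merge(a, start, n - start)
```

The sequence of compared index pairs depends only on the length of the input, never on its values, which is the property that makes the sort usable as an oblivious primitive.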

SLIDE 24

Face recognition indexing

◮ We adopt Eigenface
◮ Pre-processing and face matching are simple matrix operations
◮ The core problem to solve obliviously is eigenvector calculation
◮ We adopt the Jacobi method of eigenvector calculation

SLIDE 25

Eigenvector calculation - Jacobi method

[Figure labels: Oblivious Value Extract, Oblivious Column Extract, Rotate, Oblivious Column Assign, Oblivious Row Assign, Calculate & Reassign]

We find the maximum off-diagonal element $A_{k,l}$, then rotate columns k and l. We repeat until A becomes diagonal. The diagonal values are then the eigenvalues.
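
For reference, a plain (non-oblivious) version of the classical Jacobi iteration for a symmetric matrix is sketched below. It illustrates only the numerical method; the function name, tolerance, and rotation limit are assumptions, and the oblivious extract and assign steps named in the figure are replaced by ordinary indexing.

```python
import math

def jacobi_eigen(A, tol=1e-12, max_rotations=10_000):
    """Return (eigenvalues, V) for a symmetric matrix A; columns of V are eigenvectors."""
    n = len(A)
    A = [row[:] for row in A]                            # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_rotations):
        # Find the largest off-diagonal element A[k][l]
        k, l, biggest = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(A[i][j]) > biggest:
                    k, l, biggest = i, j, abs(A[i][j])
        if biggest < tol:
            break                                        # A is numerically diagonal
        # Angle that zeroes A[k][l]: tan(2*theta) = 2*A[k][l] / (A[l][l] - A[k][k])
        theta = 0.5 * math.atan2(2 * A[k][l], A[l][l] - A[k][k])
        c, s = math.cos(theta), math.sin(theta)
        # Rotate columns k and l of A and of V
        for M in (A, V):
            for i in range(n):
                mk, ml = M[i][k], M[i][l]
                M[i][k] = c * mk - s * ml
                M[i][l] = s * mk + c * ml
        # Rotate rows k and l of A
        for j in range(n):
            ak, al = A[k][j], A[l][j]
            A[k][j] = c * ak - s * al
            A[l][j] = s * ak + c * al
    return [A[i][i] for i in range(n)], V
```

For the matrix [[2, 1], [1, 2]] a single rotation by π/4 produces the diagonal entries 1 and 3.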

SLIDE 26

Experimental Evaluations

We implemented a prototype using Intel SGX SDK 2.6 for Linux.

Setup
◮ Processor: Intel Xeon E3-1270
◮ Memory: 64 GB
◮ OS: Ubuntu 18.04
◮ SGX SDK: version 2.6 for Linux

SLIDE 27

Experimental Results - Bitonic Sort and Text Indexing

[Figure: Bitonic sort. Sorting time (min) vs. number of rows; series: Sorting time, Sorting time next 2^k]

[Figure: Client end processing cost on Enron Dataset. Index preparation (sec) vs. data set size (MB); series: Encryption only, Incremental id, MD5 hash, SHA256 hash, Murmur hash]

SLIDE 28

Experimental Results - Text Indexing

[Figure: SGX index processing on Enron Dataset. Indexing time (min) vs. data set size (MB); series: Oblivious index building, Non-oblivious index building]

[Figure: NDCG results compared to Apache Lucene on Enron Dataset. NDCG score vs. data set size (MB)]

SLIDE 29

Experimental Result - Eigenvector Calculation

[Figure: Pre-processing overhead. Scaling and project time (sec) vs. number of faces]

[Figure: Eigenvector calculation time. Calculation time (hours) vs. matrix elements; series: Oblivious Eigenface calculation, Non-oblivious Eigenface calculation]

SLIDE 30

Thank you

Questions / Comments

◮ Fahad Shaon - fahad.shaon@utdallas.edu
◮ Murat Kantarcioglu - muratk@utdallas.edu

Acknowledgments: This research is supported in part by NIH award 1R01HG006844, NSF awards CICI-1547324, IIS-1633331, CNS-1837627, OAC-1828467 and ARO award W911NF-17-1-0356.

SLIDE 31

References I

Fuhry, Benny et al. (2017). “HardIDX: Practical and secure index with SGX”. In: IFIP Annual Conference on Data and Applications Security and Privacy. Springer, pp. 386–408.

Hoang, Thang et al. (2019). “Hardware-supported ORAM in effect: Practical oblivious search and update on very large dataset”. In: Proceedings on Privacy Enhancing Technologies 2019.1, pp. 172–191.

Islam, Mohammad Saiful, Mehmet Kuzu, and Murat Kantarcioglu (2012). “Access Pattern disclosure on Searchable Encryption: Ramification, Attack and Mitigation”. In: NDSS. Vol. 20, p. 12.

Lang, H.W. (1998). Bitonic sorting network for n not a power of 2. Accessed 05/31/2020. url: http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/oddn.htm.

SLIDE 32

References II

Mishra, Pratyush et al. (2018). “Oblix: An efficient oblivious search index”. In: 2018 IEEE Symposium on Security and Privacy (SP). IEEE, pp. 279–296.

Naveed, Muhammad, Seny Kamara, and Charles V Wright (2015). “Inference attacks on property-preserving encrypted databases”. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, pp. 644–655.

Shaon, Fahad and Murat Kantarcioglu (2016). “A practical framework for executing complex queries over encrypted multimedia data”. In: IFIP Annual Conference on Data and Applications Security and Privacy. Springer, pp. 179–195.

SLIDE 33

References III

Shaon, Fahad et al. (2017). “SGX-BigMatrix: A Practical Encrypted Data Analytic Framework With Trusted Processors”. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. CCS '17. Dallas, Texas, USA: Association for Computing Machinery, pp. 1211–1228. isbn: 9781450349468. doi: 10.1145/3133956.3134095.

Sun, Wenhai et al. (2018). “REARGUARD: Secure keyword search using trusted hardware”. In: IEEE INFORM.
