SGX IR: Secure Information Retrieval with Trusted Processors


  1. SGX IR: Secure Information Retrieval with Trusted Processors
     Fahad Shaon, Murat Kantarcioglu
     The University of Texas at Dallas (UT Dallas), Erik Jonsson School of Engineering & Computer Science
     FEARLESS engineering

  2. Problem - Secure Cloud-Based Information Retrieval
     [Figure labels: Encrypted Data, Encrypted Search Query, Encrypted Result]
     Build a secure information retrieval system
     ◮ The user stores encrypted files on a cloud server
     ◮ The user performs selective retrieval over those files

  3. Building Block - Intel SGX
     ◮ We use Intel SGX (Software Guard Extensions)
     ◮ SGX is a new set of Intel instructions
     ◮ It allows us to create a secure compartment inside the processor, called an enclave
     ◮ Privileged software, such as the OS or hypervisor, cannot directly observe data and computation inside the enclave

  4. Threat Model - Intel SGX
     [Figure labels: Server, Disk, Memory, CPU, Enclave, Encrypted Data & Code, Encrypted Result]
     The adversary can control the hypervisor, OS, memory, and disk of the server

  5. State of the Art
     ◮ Relevant search or indexing systems that use SGX: HardIDX (Fuhry et al., 2017), Rearguard (Sun et al., 2018), Oblix (Mishra et al., 2018), Hardware-supported ORAM (Hoang et al., 2019)
     ◮ These works mainly focus on building efficient data structures for searching using SGX
     ◮ They assume the inverted index is already built and/or build the index on the client
     ◮ They did not look into ranked retrieval

  7. Challenge - Access Pattern Leakage
     ◮ The adversary can observe memory accesses in SGX
     ◮ Memory access patterns reveal information about the encrypted data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)
     Solution
     ◮ Data obliviousness - we build custom data-oblivious indexing algorithms

  10. Data Obliviousness - Oblivious Select
     ◮ Data obliviousness: the program executes the same path for all inputs of the same size
     ◮ Example: x == y ? a : b
       obliviousSelect(a, b, x, y):
         ...
         mov %[x], %%eax
         mov %[y], %%ebx
         xor %%eax, %%ebx
         ...
         mov %[a], %%ecx
         mov %[b], %%edx
         cmovz %%ecx, %%edx
         ...
         mov %%edx, %[out]
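
The select logic on this slide can be mirrored in a few lines of Python. This is only a sketch of the logic: the name `oblivious_select` and the mask trick are illustrative assumptions, and Python gives none of the instruction-level guarantees that the `cmovz`-based assembly is meant to provide.

```python
def oblivious_select(a, b, x, y):
    """Return a if x == y else b, without an if/else on the comparison.

    Illustrative only: real obliviousness needs branch-free machine code
    (e.g. cmovz), which the Python interpreter does not guarantee.
    """
    mask = -int(x == y)            # -1 (all ones) when equal, 0 otherwise
    return (a & mask) | (b & ~mask)


print(oblivious_select(10, 20, 3, 3))  # x == y, so the first value is chosen
print(oblivious_select(10, 20, 3, 4))  # x != y, so the second value is chosen
```

Both calls execute the same statements; only the mask value differs.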

  12. Challenge - Memory Constraint
     ◮ SGX (v1) provides only 90 MB of enclave memory
     Solution
     ◮ Blocking - break large data into small blocks
     ◮ We utilize SGX BigMatrix (Shaon et al., 2017) primitives
     ◮ BigMatrix handles the complexity of data blocking
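
The blocking idea can be sketched as processing a large column one fixed-size block at a time. The names `iter_blocks`, `blockwise_sum`, and the `load_block` callback are hypothetical and are not the BigMatrix API; they only illustrate keeping a single block in memory.

```python
def iter_blocks(n_rows, block_rows):
    """Yield (start, end) row ranges covering n_rows in fixed-size blocks."""
    for start in range(0, n_rows, block_rows):
        yield start, min(start + block_rows, n_rows)


def blockwise_sum(load_block, n_rows, block_rows):
    """Sum a large column while holding only one block at a time.

    load_block(start, end) is a hypothetical callback returning rows
    [start, end); in an enclave it would also decrypt the block.
    """
    total = 0
    for start, end in iter_blocks(n_rows, block_rows):
        total += sum(load_block(start, end))
    return total


data = list(range(10))
print(list(iter_blocks(len(data), 4)))
print(blockwise_sum(lambda s, e: data[s:e], len(data), 4))
```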

  13. Objectives - Summary
     [Figure labels: Pre-Processing, Encrypted Intermediate Data, Encrypted Search Query, Encrypted Result, Final Processing]
     ◮ Very low client-side processing
     ◮ Build the index securely in the cloud using SGX
     ◮ Build data-oblivious algorithms
     ◮ Support ranked retrieval

  14. SGX IR - Document and Query Types
     ◮ Text data
       ◮ Ranked document retrieval using TF-IDF (Term Frequency and Inverse Document Frequency)
     ◮ Image data
       ◮ Face recognition using Eigenface

  15. Text Pre-Processing - Client
     [Figure: the example text "Cryptography is the practice and study of techniques for secure communication ..." passes through Tokenization, Stemming, TokenID Generation, and BigMatrix Generation, giving stems such as cryptographi, practic, studi, techniqu, secur, commun and an encrypted matrix with columns tok-id, doc-id, freq]
     ◮ We tokenize and stem the input text files
     ◮ We build a matrix I with token id, document id, and frequency columns
     ◮ Finally, we encrypt I and upload it
     ◮ Only a single round of reads and writes is required
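
The client-side steps above can be sketched as follows. This is an illustration under stated assumptions: `toy_stem` is a hypothetical stand-in for a real stemmer, token ids are assigned in order of first appearance, and the encryption and upload steps are omitted.

```python
import re
from collections import Counter


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_matrix(docs):
    """docs: {doc_id: text}. Returns (rows, token_ids).

    rows is the matrix I as (token_id, doc_id, frequency) tuples.
    """
    token_ids = {}
    rows = []
    for doc_id in sorted(docs):
        words = re.findall(r"[a-z]+", docs[doc_id].lower())   # tokenize
        counts = Counter(toy_stem(w) for w in words)          # stem + count
        for stem, freq in counts.items():
            tok_id = token_ids.setdefault(stem, len(token_ids) + 1)
            rows.append((tok_id, doc_id, freq))
    return rows, token_ids


rows, ids = build_matrix({1: "Secure search, secure searching",
                          2: "search engines"})
print(ids)
print(rows)
```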

  16. Text Indexing - Server
     [Figure: the Indexing step takes I (columns tok-id, doc-id, freq) and produces U′ (columns tok-id, count, sum) and T (columns tok-id with bucket number, doc-id, freq); # marks dummy entries]
     ◮ Given input I, we output two matrices
       ◮ U′, containing the total frequencies of the tokens, for IDF calculation
       ◮ T, containing equal-length blocks of the token-to-document frequency mapping, for TF calculation

  17. Text Indexing - IDF - Server
     [Figure: I (tok-id, doc-id, freq) goes through Sort, then Count & Sum, which yields U (tok-id, count, sum) with # dummy rows, then Sort and Adjust, which yields U′]
     ◮ I′ ← obliviously sort I on the token id column
     ◮ We generate U to keep the count and sum of frequencies
       ◮ c ← I′[i].token_id ≠ I′[i − 1].token_id
       ◮ U[i].sum ← obliviousSelect(sum, #, 1, c)
       ◮ sum ← obliviousSelect(sum, 0, 1, c) + I[i].frequency
     ◮ Finally, we sort this matrix so that the dummy entries go to the bottom
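
The count-and-sum pass can be sketched in Python as below. Several things here are assumptions rather than the authors' algorithm: `DUMMY = -1` stands in for `#`, Python's `sorted` stands in for the oblivious sort, and the sketch emits a row at the last entry of each token run (resetting the running totals there) instead of at the first entry of the next run.

```python
DUMMY = -1  # stands in for the '#' dummy marker


def oblivious_select(a, b, x, y):
    """Return a if x == y else b, using a mask instead of a branch."""
    mask = -int(x == y)
    return (a & mask) | (b & ~mask)


def count_and_sum(rows):
    """rows: (token_id, doc_id, freq). Returns U' as (token_id, count, sum)
    rows with dummy rows at the bottom; output length equals input length."""
    srt = sorted(rows, key=lambda r: r[0])   # stand-in for the oblivious sort
    n = len(srt)
    out, count, total = [], 0, 0
    for i in range(n):
        tok, _, freq = srt[i]
        count += 1
        total += freq
        nxt = srt[i + 1][0] if i + 1 < n else DUMMY   # depends on index only
        is_last = int(nxt != tok)                     # 1 at the end of a run
        out.append((oblivious_select(tok, DUMMY, 1, is_last),
                    oblivious_select(count, DUMMY, 1, is_last),
                    oblivious_select(total, DUMMY, 1, is_last)))
        count = oblivious_select(0, count, 1, is_last)
        total = oblivious_select(0, total, 1, is_last)
    # second sort pushes the dummy rows to the bottom
    return sorted(out, key=lambda r: (r[0] == DUMMY, r[0]))


print(count_and_sum([(2, 1, 3), (1, 1, 2), (1, 2, 5), (2, 5, 10), (1, 4, 1)]))
```

Every input row produces exactly one output row, real or dummy, so the number of rows written does not depend on how many distinct tokens exist.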

  18. Text Indexing - TF - Block Size Optimization
     ◮ We could read the per-document frequencies of a token directly from matrix I′
     ◮ However, this would reveal the number of documents containing a specific token
     ◮ So, we split I′ into equal-length blocks
     ◮ We optimize the block size b from the count column of U′ using the technique outlined in (Shaon and Kantarcioglu, 2016)
       ◮ We assume the frequencies follow a Pareto distribution
       ◮ We mathematically find the value that minimizes the padding

  19. Text Indexing - TF - Padding Generation
     We regenerate the token id with the bucket number function σ
     [Figure: Regenerate TokenId maps rows (tok-id, doc-id, freq) to rows whose token id carries a bucket number, e.g. 1,0 / 1,1 / 2,0 / 2,1 / 3,0]
     We generate padding
     [Figure: Generate Padding Rows uses U′ (tok-id, count, sum) to produce dummy rows such as (1,1 # #), (2,1 # #), (3,1 # #)]
     Finally, we merge and sort X and J to get the output matrix T.
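
The layout of T can be illustrated with a plain, non-oblivious sketch. It assumes that X holds the real rows with regenerated `(token, bucket)` ids, that J holds the padding rows, and that each token's last bucket is padded up to the block size `b`; the function name `bucketize` is hypothetical.

```python
DUMMY = "#"


def bucketize(sorted_rows, b):
    """sorted_rows: (token_id, doc_id, freq) sorted by token_id.

    Returns T, where every (token, bucket) id owns exactly b rows.
    Plain Python for illustration; it is not data oblivious.
    """
    seen = {}
    x = []                                    # real rows with new ids
    for tok, doc, freq in sorted_rows:
        k = seen.get(tok, 0)
        x.append(((tok, k // b), doc, freq))  # bucket number = position // b
        seen[tok] = k + 1
    j = []                                    # padding rows
    for tok, count in seen.items():
        pad = (-count) % b                    # rows missing in the last bucket
        j.extend([((tok, (count - 1) // b), DUMMY, DUMMY)] * pad)
    # merge and sort; dummies go after real rows inside each bucket
    return sorted(x + j, key=lambda r: (r[0], r[1] == DUMMY))


t = bucketize([(1, 1, 2), (1, 2, 5), (1, 4, 1), (2, 1, 3)], b=2)
for row in t:
    print(row)
```

With `b=2`, token 1 (three rows) fills buckets `(1, 0)` and `(1, 1)` with one dummy, and token 2 (one row) fills bucket `(2, 0)` with one dummy.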

  20. TF-IDF Calculation
     ◮ On T we run term frequency functions, e.g. log normalization: $1 + \log(tf_{t,d})$
     ◮ On U′ we run document frequency functions, such as IDF: $\log(N / df_t)$
     ◮ To answer a query, we use T for TF and U′ for IDF
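
The two weighting functions combine into a ranking score as sketched below. The assumptions are that logarithms are natural, that `df` is taken from the count column of U′, and that plain dictionaries stand in for T and U′; the function names are illustrative and the scoring loop is not data oblivious.

```python
import math


def tf_weight(tf):
    """Log-normalized term frequency: 1 + log(tf), or 0 when tf is 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def idf_weight(n_docs, df):
    """Inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def rank(query_tokens, tf_index, df_index, n_docs):
    """tf_index: {token: {doc: tf}} (from T); df_index: {token: df} (from U').

    Returns (doc, score) pairs sorted by descending TF-IDF score.
    """
    scores = {}
    for tok in query_tokens:
        if tok not in df_index:
            continue
        idf = idf_weight(n_docs, df_index[tok])
        for doc, tf in tf_index.get(tok, {}).items():
            scores[doc] = scores.get(doc, 0.0) + tf_weight(tf) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tf_index = {1: {1: 2, 2: 1}, 2: {2: 3}}
df_index = {1: 2, 2: 1}
print(rank([1, 2], tf_index, df_index, n_docs=4))
```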

  21. Bitonic Sorting of Arbitrary Input Size
     ◮ Sorting is one of the most frequently used operations
     ◮ We use the arbitrary-length version of bitonic sort (Lang, 1998)
     ◮ However, the existing definition is recursive
     ◮ It is not suitable for memory-constrained environments like SGX
     ◮ So, we propose a non-recursive algorithm that does not use a stack

  22. Bitonic Sort Non-Recursive Algorithm - Concept
     ◮ We can express a number as $N = 2^{x_m} + \dots + 2^{x_3} + 2^{x_2} + 2^{x_1}$
     ◮ A merge network can combine a descending block and an ascending block into one ascending block
     ◮ We sort each block, then merge from the smallest block to the largest
     [Figure labels: Sort, Merge]

  23. Bitonic Sort Non-Recursive Algorithm
       for d = 0 to ⌈log2(N)⌉ do
         if ((N >> d) & 1) ≠ 0 then
           start ← (−1 << (d + 1)) & N
           size ← 1 << d
           dir ← (size & N & −N) ≠ 0
           bitonicSort2K(start, size, dir)
           if !dir then
             bitonicMerge(start, N − start, 1)
           end if
         end if
       end for
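
A runnable Python sketch of this loop follows. The slide does not show `bitonicSort2K` or `bitonicMerge`, so both helpers here are assumptions: the first is the standard iterative bitonic sort for power-of-two blocks, and the second merges an arbitrary length by skipping partners past the end, which follows the padding argument used for arbitrary-length bitonic networks. A direction of 1 is read as ascending.

```python
def compare_exchange(a, i, k, ascending):
    """Order a[i], a[k] in the given direction using min/max."""
    lo, hi = min(a[i], a[k]), max(a[i], a[k])
    a[i], a[k] = (lo, hi) if ascending else (hi, lo)


def bitonic_sort_2k(a, start, size, ascending):
    """Iterative bitonic sort of a[start:start+size]; size is a power of two."""
    k = 2
    while k <= size:
        j = k >> 1
        while j >= 1:
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    up = (i & k) == 0
                    compare_exchange(a, start + i, start + partner,
                                     up == ascending)
            j >>= 1
        k <<= 1


def bitonic_merge(a, start, n, ascending=True):
    """Merge a descending block followed by an ascending tail (n arbitrary)."""
    p = 1
    while p < n:
        p <<= 1
    j = p >> 1
    while j >= 1:
        for i in range(n):
            if (i & j) == 0 and i + j < n:   # skip partners in the padding
                compare_exchange(a, start + i, start + i + j, ascending)
        j >>= 1


def bitonic_sort(a):
    """Non-recursive bitonic sort for any length, following the slide's loop."""
    n = len(a)
    for d in range(n.bit_length()):
        if (n >> d) & 1:
            start = (-1 << (d + 1)) & n
            size = 1 << d
            is_smallest = (size & n & -n) != 0
            bitonic_sort_2k(a, start, size, ascending=is_smallest)
            if not is_smallest:
                bitonic_merge(a, start, n - start, ascending=True)
    return a


print(bitonic_sort([9, 4, 7, 2, 5, 8, 1]))
```

All loop bounds and compared index pairs depend only on the input length, not on the values, which is the property that makes the network suitable for oblivious sorting.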

  24. Face Recognition Indexing
     ◮ We adopt Eigenface
     ◮ Pre-processing and face matching are simple matrix operations
     ◮ The core problem to solve obliviously is eigenvector calculation
     ◮ We adopt the Jacobi method of eigenvector calculation

  25. Eigenvector Calculation - Jacobi Method
     [Figure labels: Oblivious Column Extract, Oblivious Column Assign, Oblivious Value Extract, Rotate, Calculate & Reassign, Oblivious Row Assign]
     We find the maximum off-diagonal element $A_{k,l}$, then rotate columns k and l. We repeat until A becomes diagonal. The diagonal values are the eigenvalues.
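
For reference, a plain (non-oblivious) version of the classical Jacobi iteration is sketched below for a symmetric matrix. It is a textbook formulation, not the authors' oblivious variant: the search for the largest off-diagonal element and the indexed updates here would leak access patterns, which is what the oblivious extract and assign steps are meant to hide. The function name and tolerance are illustrative.

```python
import math


def jacobi_eigen(a, tol=1e-12, max_rotations=1000):
    """Return (eigenvalues, V) for a symmetric matrix given as nested lists.

    Columns of V are the eigenvectors. Plain textbook Jacobi, not oblivious.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_rotations):
        # locate the largest off-diagonal element A[k][l]
        k, l, big = 0, 0, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > big:
                    big, k, l = abs(a[i][j]), i, j
        if big < tol:
            break
        theta = 0.5 * math.atan2(2.0 * a[k][l], a[l][l] - a[k][k])
        c, s = math.cos(theta), math.sin(theta)
        for i in range(n):                      # rotate columns k and l
            aik, ail = a[i][k], a[i][l]
            a[i][k], a[i][l] = c * aik - s * ail, s * aik + c * ail
        for j in range(n):                      # rotate rows k and l
            akj, alj = a[k][j], a[l][j]
            a[k][j], a[l][j] = c * akj - s * alj, s * akj + c * alj
        for i in range(n):                      # accumulate eigenvectors
            vik, vil = v[i][k], v[i][l]
            v[i][k], v[i][l] = c * vik - s * vil, s * vik + c * vil
    return [a[i][i] for i in range(n)], v


values, vectors = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
print(sorted(values))
```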

  26. Experimental Evaluations
     We implemented a prototype using Intel SGX SDK 2.6 for Linux
     Setup
     ◮ Processor: Intel Xeon E3-1270
     ◮ Memory: 64 GB
     ◮ OS: Ubuntu 18.04
     ◮ SGX SDK: version 2.6 for Linux
