UT DALLAS
Erik Jonsson School of Engineering & Computer Science
FEARLESS engineering
SGX IR
Secure Information Retrieval with Trusted Processors
Fahad Shaon, Murat Kantarcioglu
The University of Texas at Dallas
FEARLESS engineering 1 / 29
[Figure: Encrypted Data, Encrypted Result, Encrypted Search Query]
[Figure: Server (Disk, Memory, CPU, Enclave); Encrypted Data & Code; Encrypted Result]
[Figure: Encrypted Intermediate Data, Encrypted Result, Pre-Processing, Encrypted Search Query, Final Processing]
◮ Ranked document retrieval using TF-IDF (Token Frequency and Inverse Document Frequency)
◮ Face recognition using Eigenface
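The ranked-retrieval use case can be illustrated with a plain, non-oblivious TF-IDF ranking. This is a minimal sketch, not the authors' implementation: the function name `tf_idf_rank`, the logarithmic IDF variant, and the sample documents are assumptions made for illustration, and nothing here runs inside an enclave or hides access patterns.

```python
import math
from collections import Counter

def tf_idf_rank(docs, query):
    """Rank document indices by the summed TF-IDF weight of the query tokens.

    docs  : list of token lists (already tokenized and stemmed)
    query : list of tokens
    Assumption: idf = ln(N / df); other variants exist.
    """
    n_docs = len(docs)
    tf = [Counter(doc) for doc in docs]                    # token frequency per document
    df = Counter(tok for counts in tf for tok in counts)   # document frequency per token
    scores = []
    for doc_id, counts in enumerate(tf):
        score = 0.0
        for tok in query:
            if tok in counts:
                score += counts[tok] * math.log(n_docs / df[tok])
        scores.append((score, doc_id))
    # Highest score first; ties broken by document index
    return [doc_id for _, doc_id in sorted(scores, key=lambda s: (-s[0], s[1]))]

# Hypothetical sample data
docs = [["secur", "commun", "secur"], ["face", "recognit"], ["secur", "face"]]
ranking = tf_idf_rank(docs, ["secur"])
```

Document 0 contains the query token twice and document 2 once, so they rank ahead of document 1, which does not contain it.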
Tokenization → Stemming → TokenID Generation → Encrypted BigMatrix Generation

Input text:     Cryptography is the practice and study of techniques for secure communication ...
Stemmed tokens: cryptographi practic studi techniqu secur commun
Token IDs:      cryptographi = 1, practic = 2, studi = 3, techniqu = 4, secur = 5

tok-id   1    2    3    ...
doc-id   1    1    1    ...
freq     2    10   6    ...
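The pre-processing pipeline above can be sketched in a few lines. This is an illustration under stated assumptions: the stop-word list, the pluggable `stem` callable (identity by default, standing in for a real stemmer), and all function names are hypothetical. The incremental and hash-based id generators mirror two of the id schemes named in the evaluation plots, not necessarily the authors' exact construction.

```python
import hashlib
import re
from collections import Counter

STOP_WORDS = {"is", "the", "and", "of", "for"}  # assumed, tiny stop-word list

def tokenize(text, stem=lambda t: t):
    """Lower-case, split on non-letters, drop stop words, apply the stemmer."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def incremental_ids(tokens, vocab):
    """Assign 1, 2, 3, ... to tokens in order of first appearance."""
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
        ids.append(vocab[tok])
    return ids

def hashed_id(token, bits=32):
    """Alternative id: truncated SHA-256 digest of the token (collisions possible)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")

def rows_for_document(doc_id, text, vocab):
    """Return (tok_id, doc_id, freq) rows for one document."""
    counts = Counter(incremental_ids(tokenize(text), vocab))
    return [(tok_id, doc_id, freq) for tok_id, freq in sorted(counts.items())]

vocab = {}
rows = rows_for_document(1, "cryptographi practic studi techniqu secur commun secur", vocab)
```

In the deck the resulting rows are then encrypted into a BigMatrix; that step is omitted here.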
Indexing
[Figure: indexing overview with three tables: input rows (doc-id, tok-id, freq), a per-token summary (count, tok-id, sum), and output rows (doc-id, tok-id, freq) whose ids carry a block suffix such as 1,0 and 1,1]
doc-id  tok-id  freq
1       1       2
2       1       3
...     ...     ...
8       2       1
1       2       5
...     ...     ...
17      3       8
1       4       1
...     ...     ...

Sort

doc-id  tok-id  freq
1       1       2
1       2       5
...     ...     ...
1       4       1
...     ...     ...
2       1       3
2       5       10
3       6       4
...     ...     ...

Count & Sum

[Intermediate table (count, tok-id, sum): per-token rows such as 2 8 20 and 3 4 9, separated by dummy rows # # #]

Sort and Adjust

count   tok-id  sum
1       8       20
2       4       9
3       7       15
4       5       3
#       #       #
...     ...     ...
...     ...     ...
5       1       2
6       1       1
◮ c ← I′[i].token_id = I′[i − 1].token_id
◮ U[i].sum ← obliviousSelect(sum, #, 1, c)
◮ sum ← obliviousSelect(sum, 0, 1, c) + I[i].frequency
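The count-and-sum pass can be sketched as a single scan that writes one output row per input row, so the write pattern does not depend on where token groups begin or end. This is one reading of the bullets above, not the authors' code: the three-argument `oblivious_select(a, b, cond)`, the integer `DUMMY` standing in for `#`, the sentinel row, and the choice to emit a group's totals at the row where the token changes are all assumptions. Python itself gives no constant-time guarantee; the sketch only shows the branch-free structure.

```python
DUMMY = -1  # assumed integer stand-in for the '#' dummy marker

def oblivious_select(a, b, cond):
    """Return a if cond == 1 else b, using masking instead of a branch."""
    mask = -cond                     # 0 stays 0, 1 becomes all ones
    return b ^ ((a ^ b) & mask)

def count_and_sum(sorted_rows):
    """sorted_rows: list of (token_id, frequency), sorted by token_id.

    Returns one (token_id, count, sum) row per input row: real values at the
    end of each token group, dummy rows everywhere else.
    """
    rows = list(sorted_rows) + [(DUMMY, 0)]   # sentinel flushes the last group
    out = []
    count, total = 1, rows[0][1]
    for i in range(1, len(rows)):
        same = int(rows[i][0] == rows[i - 1][0])   # 'c' in the bullets
        boundary = 1 - same
        out.append((
            oblivious_select(rows[i - 1][0], DUMMY, boundary),
            oblivious_select(count, DUMMY, boundary),
            oblivious_select(total, DUMMY, boundary),
        ))
        # Keep accumulating inside a group, restart when the token changes
        count = oblivious_select(count, 0, same) + 1
        total = oblivious_select(total, 0, same) + rows[i][1]
    return out

# Hypothetical sample data
summary = count_and_sum([(1, 2), (1, 5), (1, 1), (2, 3), (2, 10)])
```

A later oblivious sort can then move the dummy rows to the end, which matches the "Sort and Adjust" step named in the deck.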
◮ We assume the frequencies follow a Pareto distribution
◮ We mathematically find the value that minimizes the padding
doc-id  tok-id  freq
1       1       2
1       2       5
...     ...     ...
1       4       1
...     ...     ...
2       1       3
2       5       10
3       6       4
...     ...     ...

Regenerate TokenId

[Table (doc-id, tok-id, freq): ids are rewritten with a block suffix: 1,0  1,1  2,0  2,1  3,0]

Generate Padding Rows

count   tok-id  sum
1       8       20
2       4       9
3       7       15
4       5       3
#       #       #
...     ...     ...
...     ...     ...
5       1       2
6       1       1

[Table (doc-id, tok-id, freq): padding rows filled with # for blocks such as 1,1  2,1  3,1]
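The two steps above (block-suffixed ids and padding rows) can be illustrated with a plain sketch that splits each token's postings into fixed-size blocks and fills the last block with dummy rows. The fixed `block_size` parameter, the `(token, block_number)` id tuple, and the function name are assumptions. The deck generates the padding rows from the count table and merges them obliviously, whereas this sketch branches on list lengths and is not oblivious.

```python
DUMMY = "#"  # padding marker, as drawn in the slide tables

def block_and_pad(postings, block_size):
    """postings: dict mapping token_id -> list of (doc_id, freq).

    Returns rows ((token_id, block_no), doc_id, freq) in which every block
    holds exactly block_size rows.
    """
    rows = []
    for tok in sorted(postings):
        entries = postings[tok]
        n_blocks = max(1, -(-len(entries) // block_size))   # ceiling division
        for b in range(n_blocks):
            chunk = entries[b * block_size:(b + 1) * block_size]
            chunk = chunk + [(DUMMY, DUMMY)] * (block_size - len(chunk))
            for doc_id, freq in chunk:
                rows.append(((tok, b), doc_id, freq))
    return rows

# Hypothetical sample data
padded = block_and_pad({1: [(1, 2), (2, 5), (4, 1)], 2: [(1, 3)]}, block_size=2)
```

With a block size of 2, token 1 occupies blocks (1, 0) and (1, 1), and token 2 occupies block (2, 0); the second row of blocks (1, 1) and (2, 0) is padding.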
Sort Merge
for d = 0 to ⌈log2(N)⌉ do
    ...
end for
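The body of the loop above did not survive in this copy, so the following does not reconstruct it. As a generic illustration of data-independent sorting, which is the kind of sort oblivious processing relies on, here is a bitonic sorting network. The choice of bitonic sort and the power-of-two length restriction are assumptions of this sketch.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    The sequence of compared index pairs depends only on len(values),
    never on the data.
    """
    a = list(values)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two (pad with dummies)"
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # compare-exchange distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                    # Both cells are always rewritten, whatever the values
                    a[i], a[partner] = (lo, hi) if ascending else (hi, lo)
            j //= 2
        k *= 2
    return a

sorted_rows = bitonic_sort([5, 3, 8, 1, 9, 2, 7, 4])
```

Inputs of other lengths would need padding with dummy rows up to the next power of two.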
◮ Oblivious Value Extract
◮ Oblivious Column Extract
◮ Rotate
◮ Oblivious Column Assign
◮ Oblivious Row Assign
◮ Calculate & Reassign
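Two of the operations listed above can be sketched as full scans: every cell is touched regardless of which index is wanted, so the memory access pattern does not reveal the index. The function names and the arithmetic selection are assumptions; the source does not show how the authors implement these operations, and Python offers no constant-time guarantee.

```python
def oblivious_value_extract(values, index):
    """Return values[index] by scanning every cell."""
    result = 0.0
    for i, v in enumerate(values):
        hit = int(i == index)                      # 1 only at the wanted position
        result = hit * v + (1 - hit) * result      # arithmetic select, no branch
    return result

def oblivious_column_extract(matrix, col):
    """Return column `col` of a row-major matrix, scanning every cell of every row."""
    return [oblivious_value_extract(row, col) for row in matrix]

# Hypothetical sample data
value = oblivious_value_extract([10.0, 20.0, 30.0], 1)
column = oblivious_column_extract([[1.0, 2.0], [3.0, 4.0]], 1)
```

The cost is linear in the row length per extracted value, which is the usual price for hiding the index.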
[Figure: Sorting time (min) vs. number of rows; series: Sorting time, Sorting time next 2k]
[Figure: Index preparation (sec) vs. data set size (MB); series: Encryption only, Incremental id, MD5 hash, SHA256 hash, Murmur hash]
[Figure: Indexing time (min) vs. data set size (MB); series: Oblivious index building, Non-Oblivious index building]
[Figure: NDCG score vs. data set size (MB); series: NDCG score compared to Lucene]
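For reference, the NDCG score plotted above is conventionally computed as discounted cumulative gain divided by the gain of the ideal ordering. The sketch below uses the common log2 discount. How relevance grades were derived from the Lucene results is not stated in this copy, so the sample grades are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance grades given in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the best possible ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1])    # already in ideal order
reversed_ = ndcg([1, 2, 3])  # worst ordering of the same grades
```

A score of 1 means the ranking agrees with the reference ordering; lower scores mean relevant documents appear further down the list.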
[Figure: Scaling and project time (sec) vs. number of faces]
[Figure: Calculation time (hours) vs. matrix elements; series: Oblivious Eigenface calculation, Non-oblivious Eigenface calculation]