Encrypted Search
Seny Kamara Brown University
Q: Why is this happening?
Big Data
► Industry and governments want more data
  ► National security
  ► Machine learning
  ► Business analytics
  ► NLP
  ► Location-based services
  ► …
► More intrusive & sensitive
► Photos, medical records
► Location data, email, browsing history, voicemails
► Greater need for security
► Harder to secure
► NSA Bluffdale holds 2 EB! (2K PB)
► Facebook holds 300 PB of photos/videos
► Vs. nation states, intelligence agencies, …
► Impossible to work with
► Lose search, DBs, IR
► Find your photo among 300 PB?
► Rank results?
► End-to-end (e2e) encryption!
► Reduces attack surface
► Secure small key instead of Big Data
Cryptography, Data Structures, Information Retrieval, Graph Theory, Databases, Combinatorial Optimization, Statistics
► Startups
  ► CipherCloud ($30M+$50M)
  ► Navajo (Salesforce)
  ► SkyHigh, Vaultive, Inpher
  ► Bitglass, Private Machines, …
► Major Corporations
  ► Microsoft, IBM,
  ► Google, Yahoo
  ► Hitachi, Fujitsu
► Funding agencies
  ► IARPA
  ► DARPA
  ► NSF
“There are a lot of advancements in things like encrypted search...but in general it is a difficult problem”
[Figure labels: DB, EDB, tk]
Storage leakage, Query leakage, Size of EDB, Search time, Size of tk
► Stream ciphers [SWP01]
► Bucketing [HILM02]
► Structured and searchable encryption (StE/SSE) [SWP01,CGKO06,CK10]
► Oblivious RAM (ORAM) [GO96]
► Functional encryption (e.g., PEKS) [BCOP06]
► Multi-party computation (MPC) [Yao82,GMW87]
► Property-preserving encryption (PPE) [AKSX04,BBO06,BCLO09]
► Fully-homomorphic encryption [G09]
[Figure: Efficiency vs. Leakage; labels: STE/SSE-based, PPE-based, FHE-based, ORAM-based, skFE-based, pkFE-based]
[Figure: Efficiency vs. Functionality (SQL, NoSQL); labels: SK-FE-based, STE/SSE-based, PPE-based, FHE-based, ORAM-based, PK-FE-based]
► Theoretical Cryptography [Goldwasser-Micali82,…]
► A great success story
► Helps us reason about confidentiality, integrity, …
► Focused on leakage-free cryptography
► Real-world systems security relies on tradeoffs
► No cryptographic foundations for tradeoffs
► Can we leak X but not Y?
► How do we model leakage?
[Curtmola-Garay-K.-Ostrovsky06, Chase-K.10, Islam-Kuzu-Kantarcioglu12, K.15]
► Leakage analysis: what is being leaked?
► Proof: prove that the solution leaks no more
► Cryptanalysis: can we exploit the leakage?
Leakage analysis Proof of security Leakage cryptanalysis
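As a rough illustration of the cryptanalysis step, the sketch below assumes a scheme whose search tokens are deterministic (the same keyword always yields the same token) and an adversary who already knows how often each keyword is queried. It is a simplified frequency-matching heuristic written for this example, not the attack from any of the cited works, and the function name `frequency_attack` and the sample data are hypothetical.

```python
from collections import Counter


def frequency_attack(observed_tokens, known_query_freq):
    """Guess a keyword for each token by aligning frequency ranks.

    observed_tokens: sequence of opaque tokens seen by the server.
    known_query_freq: dict keyword -> assumed query probability (auxiliary knowledge).
    """
    # Tokens ordered from most to least frequently observed.
    token_rank = [t for t, _ in Counter(observed_tokens).most_common()]
    # Keywords ordered from most to least likely to be queried.
    word_rank = sorted(known_query_freq, key=known_query_freq.get, reverse=True)
    # Pair the i-th most frequent token with the i-th most likely keyword.
    return dict(zip(token_rank, word_rank))


observed = ["t1", "t2", "t1", "t3", "t1", "t2"]       # hypothetical token log
prior = {"mail": 0.5, "tax": 0.3, "visa": 0.2}        # hypothetical query prior
print(frequency_attack(observed, prior))
# {'t1': 'mail', 't2': 'tax', 't3': 'visa'}
```

The point of the sketch is only that leakage which looks harmless in isolation (here, token repetition) can become informative once combined with auxiliary knowledge, which is why the leakage has to be analyzed and attacked as well as proven.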
► Desktop search
  ► Windows search, Apple Spotlight
► Personal cloud storage
  ► Dropbox, OneDrive, iCloud, …
► Webmail
  ► Gmail, Yahoo! Mail, Outlook.com, …
► Standard DBs
  ► DB encrypted in memory
► Cloud DBs
  ► DB encrypted in cloud
► To & from numbers, time of call, duration for all US-to-US, US-to-Foreign and Foreign-to-US calls
► NSA DB can only be queried by individual phone number (seed)
► Analyst queries must be approved by a small number of NSA officials
► CS2 (C++)
  ► Microsoft Research, 2012
  ► Queries: single keyword search
  ► 16MB email collection in 53ms
► BlindSeer (C++) [IARPA]
  ► Columbia & Bell Labs, 2014
  ► Queries: Boolean
  ► Synthetic dataset
  ► Search time
    ► For (w1 and w2): 250ms
    ► w1 in 1 doc
    ► w2 in 10K docs
► IBM-UCI (C++) [IARPA]
  ► IBM Research & UC Irvine, 2013
  ► Queries: conjunctive
  ► 1.3GB email collection
  ► Search time
    ► For (w1 and w2): 5ms
    ► w1 in 15 docs
    ► w2 in 1M docs
► Clusion (Java)
  ► Brown & Colorado St., 2016
  ► Queries: Boolean
  ► 1.3GB email collection
  ► Search time
    ► For (w1 or w2) and (w3 or w4): 1.5ms
    ► (w1 or w2) in 10 docs
    ► (w3 or w4) in 1M docs
► GRECS
  ► Microsoft Research, Boston U., Harvard & Ben Gurion, 2015
  ► Queries: (approximate) shortest distance on graphs
  ► 1.6M nodes & 11M edges
  ► Query time: 10ms
► Exciting and active area of research
► Big potential impact in practice
► Lots of new research directions in theory and systems
► Potential for collaboration between many areas of CS
  ► Algorithms and data structures
  ► Databases
  ► Information retrieval
  ► Combinatorial optimization
  ► Statistics