Practical Solutions for Format-Preserving Encryption
Mor Weiss
Joint work with Boris Rozenberg and Muhammad Barham
Research conducted while all authors were at IBM Research Labs, Haifa
Why Format-Preserving Encryption?
Problem (1): encrypted entry incompatible with database entry structure
Non-solution (1): generate new tables
Problem (2): encrypted entry incompatible with applications using data
Non-solution (2): rewrite applications
general formats
– Message space ℳ
– Randomized KeyGen: ℕ → 𝒦
– Deterministic Enc: 𝒦 × ℳ → 𝒞
– Deterministic Dec: 𝒦 × 𝒞 → ℳ
plaintext
garbage”, possibly causing
– Applications using data to crash
– Tables designed to store data unsuitable for storing encrypted data
– Enc_k is a permutation over plaintext space ℳ
– Ciphertexts have same format as plaintexts!
Dec_k(Enc_k(m)) = m
– For secret and random k ∈ 𝒦
– Hierarchy of security notions [BRRS`09]
– Strongest: random k ⇒ Enc_k close to pseudorandom permutation
– Guaranteed security against (improbable) attacks incurs expensive
– Weakest: Message Recovery
– Even given advantageous distribution over ℳ
– Social Security Numbers (SSNs) [Hoo`11]
– Credit Card Numbers (CCNs)
– Dates [LJLC`10]
– …
– Designed for specific formats (different scheme for every format)
– New encryption techniques, little (if any) security analysis
– Methods described as early as 1981
– FFX [BRS`10], BPS [BPS`10] submitted to NIST for consideration
Useful for general-format FPE
FPE from integer-FPE
– Order ℳ arbitrarily: rank: ℳ → {1, …, N}
– To encrypt plaintext m:
– rank not meant to, and does not, add security
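A minimal Python sketch of rank-then-encipher over an 8-element message space. The helper names (`make_ranker`, `toy_int_fpe`, `make_fpe`) and the sample messages are illustrative assumptions, ranks are 0-based here rather than 1, …, N, and the shuffled table is only a stand-in for a real integer-FPE: it is not secure.

```python
import random

def make_ranker(messages):
    """Fix an arbitrary order on the message space; return (rank, unrank)."""
    ordered = sorted(messages)                      # any fixed order works
    index = {m: i for i, m in enumerate(ordered)}
    return (lambda m: index[m]), (lambda r: ordered[r])

def toy_int_fpe(key, n):
    """Toy keyed permutation of {0, ..., n-1}. NOT secure: a placeholder
    for an off-the-shelf integer-FPE scheme."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)                # key-dependent permutation
    inv = [0] * n
    for i, p in enumerate(perm):
        inv[p] = i
    return (lambda r: perm[r]), (lambda c: inv[c])

def make_fpe(key, messages):
    """Rank the plaintext, encipher the rank, unrank the result."""
    rank, unrank = make_ranker(messages)
    enc_int, dec_int = toy_int_fpe(key, len(messages))
    encrypt = lambda m: unrank(enc_int(rank(m)))
    decrypt = lambda c: unrank(dec_int(rank(c)))
    return encrypt, decrypt

messages = {"Al", "Tal", "Mor", "Dan", "Boris", "El Al", "John Doe", "Jane Roe"}
encrypt, decrypt = make_fpe("secret key", messages)
for m in sorted(messages):
    print(m, "->", encrypt(m))
```

Every ciphertext is itself a member of the message space, and all security comes from the integer permutation, not from the ranking.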
– “Meta” ranking technique for regular languages [BRRS`09]
– Plaintexts interpreted as strings
– ℳ divided into subsets ℳ_1, …, ℳ_k defined by
– Ranking computed using generalized lexicographic ordering
ℱ_name: format of valid names
Name: 1-4 space-separated words
Word: upper case letter followed by 1-15 lower case letters
Subsets:
ℳ_1 contains Al
ℳ_2 contains Tal
…
ℳ_15 contains Muthuramakrishna
ℳ_16 contains El Al
properties
– Reason: each sub-format ℳ_i encrypted separately
– “John Doe” can encrypt “Jane Roe” but not “Johnnie Dee”
– If only one of them is possible, adversary knows plaintext for sure
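A short Python sketch of why separate sub-formats leak. It assumes, consistently with the examples Al, Tal, Muthuramakrishna and El Al, that a name's sub-format is determined by its number of words and the length of each word; `subformat_signature` and `can_encrypt_to` are hypothetical helper names.

```python
def subformat_signature(name):
    """Number of lower-case letters in each word of the name.
    Assumption: this profile determines the sub-format."""
    return tuple(len(word) - 1 for word in name.split(" "))

def can_encrypt_to(plaintext, ciphertext):
    """A ciphertext is possible only inside the plaintext's own sub-format."""
    return subformat_signature(plaintext) == subformat_signature(ciphertext)

print(can_encrypt_to("John Doe", "Jane Roe"))      # same word lengths
print(can_encrypt_to("John Doe", "Johnnie Dee"))   # different word lengths
```

Under this assumption, the ciphertext always reveals the word-length profile of the plaintext.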
[WRB`15]
– MR (message recovery) is the weakest notion
– Implies insecurity according to other FPE security notions
used to recover message
Election Commission (FEC) reports of 2008-2012
– Regulates campaign finance legislation in the US – Report lists all donors over $200:
– Data stored at remote server
– Attacker has access to all or part of database
– No access to secret encryption key
– May have prior knowledge
[Pie chart: slices 2%, 5%, 93%; legend 100 < N < 21,930, N ≤ 10, 10 < N ≤ 100]
When 𝒜 recovers name and town columns
[Pie chart: slices 7%, 9%, 28%, 56%; legend 100 < N ≤ 3334, 10 < N ≤ 100, 1 ≤ N ≤ 2, 2 < N ≤ 10]
When 𝒜 recovers entire database
[Pie chart: slices 14%, 15%, 68%, 3%; legend N = 2, N = 1, 10 < N ≤ 250, 2 < N ≤ 10]
FPE “Wish List”
– Simple method of representing formats
– Efficient rank, unrank procedures
– Hide all plaintext-specific properties
The Scheme:
– Support integer-FPEs for integral and almost integral domains
– Scheme is user-oriented
– “Basic” building-blocks (primitives)
– Operations used to construct complex formats
– ℱ_upper = {A,B,…,Z}
– ℱ_lower = length-k lower-case letter strings, 1 ≤ k ≤ 15
– ℱ_ssn = social-security numbers (SSNs)
– Concatenation: ℱ = ℱ_1 ⋅ d_1 ⋅ ℱ_2 ⋯ d_{n−1} ⋅ ℱ_n (d_1, …, d_{n−1} are delimiters)
– Words: ℱ_word = ℱ_upper ⋅ ℱ_lower
– Range: ℱ = (ℱ_1 ⋅ d)^k, min ≤ k ≤ max
ℱ_name = (ℱ_word ⋅ space)^k for 1 ≤ k ≤ 4
– Union: ℱ = ℱ_1 ∪ ⋯ ∪ ℱ_k
ℱ_name ∪ ℱ_ssn
name house # street city zip
ℱ_name = (ℱ_word ⋅ space)^k for 1 ≤ k ≤ 4 (range)
ℱ_num = {1, …, 100} (integral domain)
ℱ_add = ℱ_name ⋅ ℱ_num ⋅ ℱ_name ⋅ ℱ_name ⋅ ℱ_zip
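A Python sketch of how primitives and the three operations could be represented so that each format knows its size. The class names are illustrative, not the scheme's actual API. Delimiters are fixed strings and so do not change the count. The slides do not define ℱ_zip, so five-digit ZIP codes are assumed, and unions are assumed to be over disjoint formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Primitive:
    count: int                       # number of strings in the basic format
    def size(self):
        return self.count

@dataclass(frozen=True)
class Concat:
    parts: tuple                     # F = F_1 . d_1 . ... . F_n
    def size(self):
        total = 1
        for part in self.parts:
            total *= part.size()
        return total

@dataclass(frozen=True)
class Range:
    part: object                     # F = (F_1 . d)^k for lo <= k <= hi
    lo: int
    hi: int
    def size(self):
        return sum(self.part.size() ** k for k in range(self.lo, self.hi + 1))

@dataclass(frozen=True)
class Union:
    parts: tuple                     # assumed pairwise disjoint
    def size(self):
        return sum(part.size() for part in self.parts)

upper = Primitive(26)                                    # {A, ..., Z}
lower = Primitive(sum(26 ** k for k in range(1, 16)))    # 1-15 lower-case letters
word = Concat((upper, lower))
name = Range(word, 1, 4)                                 # 1-4 words
num = Primitive(100)                                     # {1, ..., 100}
zip_code = Primitive(10 ** 5)                            # assumption: 5-digit ZIP
address = Concat((name, num, name, name, zip_code))
print("address format size:", address.size().bit_length(), "bits")
```

The printed bit length gives a feel for how large a composed format becomes, which is why domain size matters for the underlying integer-FPE.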
– Use “off-the-shelf” integer-FPE schemes
– Inherit security of underlying integer-FPE
– Parse string to components
– Delegate substring ranking to format components
– “Glue” ranks together using ranking for operations
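As one possible illustration of the “glue” step, here is a Python sketch for a union of formats. The offset rule (a string in the i-th format gets its own rank plus the sizes of all earlier formats) is an assumption, not taken from the slides; formats are modelled as ordered lists, ranks are 0-based, and the sample SSNs are made up.

```python
def rank_union(m, formats):
    """Rank m in F_1 u ... u F_k, with formats given as disjoint ordered lists."""
    offset = 0
    for fmt in formats:
        if m in fmt:
            return offset + fmt.index(m)   # delegate to the component, then shift
        offset += len(fmt)
    raise ValueError(f"{m!r} is not in any component format")

def unrank_union(r, formats):
    """Inverse of rank_union."""
    for fmt in formats:
        if r < len(fmt):
            return fmt[r]
        r -= len(fmt)
    raise ValueError("rank out of range")

names = ["Al", "Tal"]
ssns = ["123-45-6789", "987-65-4321"]      # made-up sample values
print(rank_union("123-45-6789", [names, ssns]))
```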
ℱ = ℱ_1 ⋅ d ⋅ ℱ_2
m = m_1 ⋅ d ⋅ m_2, where m_1 has rank r_1 in ℱ_1 and m_2 has rank r_2 in ℱ_2
r = r_1 + r_2 ⋅ ℱ_1.size()
Scale by size of sub-formats
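The concatenation rule r = r_1 + r_2 ⋅ ℱ_1.size() can be sketched in Python as follows. Sub-formats are modelled as small ordered lists, ranks are 0-based, and the delimiter is assumed not to occur inside strings of ℱ_1; the function names and sample words are illustrative.

```python
def rank_concat(m, delim, f1, f2):
    """Rank m = m1 . d . m2 by ranking each part and scaling r2 by |F_1|."""
    m1, m2 = m.split(delim, 1)            # parse the string into components
    r1, r2 = f1.index(m1), f2.index(m2)   # delegate ranking to the components
    return r1 + r2 * len(f1)              # glue the ranks together

def unrank_concat(r, delim, f1, f2):
    """Inverse of rank_concat: recover (r1, r2), then unrank each part."""
    r2, r1 = divmod(r, len(f1))
    return f1[r1] + delim + f2[r2]

first = ["Al", "Mor", "Tal"]
last = ["Doe", "Roe"]
r = rank_concat("Mor Roe", " ", first, last)
print(r, unrank_concat(r, " ", first, last))
```

Because r_1 < ℱ_1.size(), the pair (r_1, r_2) is recovered uniquely by integer division, so the combined ranks cover 0, …, ℱ_1.size() ⋅ ℱ_2.size() − 1 exactly once.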
– Only provably secure schemes
– Require factoring domain size
– Divide large formats, encrypt each sub-format separately
– Minimize security loss by “hiding” plaintext-specific properties:
– maxSize determined by user-defined performance constraints
Main challenge!
name house # street city zip
ℱ_add = ℱ_name ⋅ ℱ_num ⋅ ℱ_name ⋅ ℱ_name ⋅ ℱ_zip
– E.g., ℱ_name divided according to number of words in name
name house # street city zip
unranking
– Generalize rank, unrank to lists of ranks
– Usually hides all plaintext-specific properties
– Small maxSize ⇒ may preserve some properties in huge formats
plaintext properties
– Maximizes sub-format size
security loss
based FPE
– On 1M records of the Federal Election Commission (FEC) reports
matches at most 250 records
– 99% encrypted records match > 1000 records
– 94% encrypted records match > 10,000 records
– 67% encrypted records match > 100,000 records
– …
encryption of general formats
– Also based on Rank-then-Encipher
– Formats represented using Regular Expressions
– Ranking uses automatons (deterministic or non-deterministic)
– Defining new formats
– Choosing “right” scheme to use
– libFTE “worst case” can be much worse
– Show security vulnerabilities
– Inefficiencies also exist
– Based on Rank-then-Encipher
– Simple and efficient methodology of representing and ranking formats
– Flexible scheme: