  1. Locally decodable codes: from computational complexity to cloud computing Sergey Yekhanin Microsoft Research

  2. Error-correcting codes: paradigm. A message X ∈ F_2^k is mapped by an encoder to a codeword E(X) ∈ F_2^n; the channel corrupts up to e coordinates; the decoder recovers X. Example: 0110001 → (Encoder) → 011000100101 → (Channel) → 01*00*10010* → (Decoder) → 0110001. The paradigm dates back to the 1940s (Shannon / Hamming).

  3. Local decoding: paradigm. A message X ∈ F_2^k is encoded into E(X) ∈ F_2^n; the channel corrupts up to e coordinates; the local decoder reads up to r coordinates of the corrupted word and outputs a single message symbol X_i. The local decoder runs in time much smaller than the message length!
• First account: Reed's decoder for Muller's codes (1954)
• Implicit use (1950s–1990s)
• Formal definition and systematic study (late 1990s) [Levin '95, STV '98, KT '00]
• Original applications in computational complexity theory
• Cryptography
• Most recently used in practice to provide reliability in distributed storage

  4. Local decoding: example. E(X) = (X_1, X_2, X_3, X_1⊕X_2, X_1⊕X_3, X_2⊕X_3, X_1⊕X_2⊕X_3). Message length: k = 3. Codeword length: n = 7. Corrupted locations: e = 3. Locality: r = 2.


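The [7,3] example above can be exercised in a few lines. This sketch is my own illustration, not code from the talk: it encodes a 3-bit message as on the slide and recovers any message bit by reading at most 2 codeword positions, choosing a random recovery set each time.

```python
import random

def encode(x):
    # E(X) = (X1, X2, X3, X1^X2, X1^X3, X2^X3, X1^X2^X3), as on the slide.
    x1, x2, x3 = x
    return [x1, x2, x3, x1 ^ x2, x1 ^ x3, x2 ^ x3, x1 ^ x2 ^ x3]

# For each message bit i, the recovery sets: groups of at most 2 codeword
# positions whose XOR equals X_i (a 1-element set reads X_i directly).
PAIRS = {
    0: [(0,), (1, 3), (2, 4), (5, 6)],
    1: [(1,), (0, 3), (2, 5), (4, 6)],
    2: [(2,), (0, 4), (1, 5), (3, 6)],
}

def local_decode(word, i, rng=random):
    # Pick a random recovery set and XOR the positions it names.
    s = rng.choice(PAIRS[i])
    val = 0
    for pos in s:
        val ^= word[pos]
    return val
```

Because the recovery sets for each bit are pairwise disjoint, a few corrupted positions can spoil only a few of the sets, which is what makes the random choice robust.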
  6. Locally decodable codes. Definition: A code E: F_2^k → F_2^n is r-locally decodable if, for every message X, each X_i can be recovered by reading some r symbols of E(X), even after up to e coordinates of E(X) are corrupted.
• (Erasures.) The decoder is aware of the erased locations. The output is always correct.
• (Errors.) The decoder is randomized. The output is correct with probability 99%.
[Figure: a k-symbol message is encoded into an n-symbol codeword; after noise, the decoder reads only r symbols.]

  7. Locally decodable codes. Goal: Understand the true shape of the tradeoff between redundancy n − k and locality r, for different settings of e (e.g., e = δn, n^ε, O(1)).
[Figure: taxonomy of known families of LDCs, plotting locality r (O(1), (log k)^c, k^ε) against the number of corruptions e (from O(1) up to δn): matching vector codes, Reed-Muller codes, multiplicity codes, local reconstruction codes, projective geometry codes.]

  8. Plan
• Part I: (Computational complexity)
  • Average case hardness
  • An avg. case hard language in EXP (unless EXP ⊆ BPP)
  • Construction of LDCs
  • Open questions
• Part II: (Distributed data storage)
  • Erasure coding for data storage
  • LDCs for data storage
  • Constructions and limitations
  • Open questions

  9. Part I: Computational complexity

  10. Average case complexity. A problem is hard-on-average if every efficient algorithm errs on 10% of the inputs.
• Establishing hardness-on-average for a problem in NP is a major open problem.
• Below we establish hardness-on-average for a problem in EXP, assuming EXP ⊈ BPP.
• Construction [STV]: L is EXP-complete. Level k of L (its truth table on inputs of length k) is a string X of length 2^k. Encode X with an LDC E: F_2^{2^k} → F_2^n with n = poly(2^k), r = (log 2^k)^c, e = n/10; the language L′ whose truth table is E(X) is in EXP.
Theorem: If there is an efficient algorithm that errs on < 10% of L′, then EXP ⊆ BPP.

  11. Average case complexity. Theorem: If there is an efficient algorithm that errs on < 10% of L′, then EXP ⊆ BPP.
Proof: We obtain a BPP algorithm for L. Let A be the algorithm that errs on < 10% of L′.
• A gives us access to a corrupted copy of the encoding E(X). To decide X_i, invoke the local decoder for E(X), answering its queries with A.
• Time complexity is (log 2^k)^c · poly(k) = poly(k).
• Output is correct with probability 99%.
(Parameters as before: n = poly(2^k), r = (log 2^k)^c, e = n/10; L′ is in EXP, L is EXP-complete.)

  12. Reed Muller codes
• Parameters: q, m, d = (1 − 4δ)q.
• Codewords: evaluations of degree-d polynomials in m variables over F_q.
• A polynomial f ∈ F_q[z_1, …, z_m] with deg f ≤ d yields the codeword (f(x))_{x ∈ F_q^m}.
• Parameters: n = q^m, k = C(m + d, m), r = q − 1, e = δn.

  13. Reed Muller codes: local decoding
• Key observation: the restriction of a codeword to an affine line in F_q^m yields the evaluation of a univariate polynomial f|_L of degree at most d.
• To recover the value at x:
  – Pick a random affine line through x.
  – Do noisy polynomial interpolation.
• Locally decodable code: the decoder reads q − 1 random locations.
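The line-restriction decoder can be made concrete. The snippet below is an illustrative toy with parameter choices and helper names of my own; to keep it short it interpolates from d + 1 clean samples on the line rather than doing full noisy interpolation. It evaluates an m-variate polynomial over a prime field F_q and recovers its value at a point x from a random affine line through x.

```python
import random

q, m, d = 11, 2, 3   # field size (prime), number of variables, degree -- illustrative

def evaluate(coeffs, point):
    # coeffs: dict mapping exponent tuples to coefficients in F_q.
    total = 0
    for exps, c in coeffs.items():
        term = c
        for e, x in zip(exps, point):
            term = term * pow(x, e, q) % q
        total = (total + term) % q
    return total

def codeword(coeffs):
    # Evaluate the polynomial at all q**m points of F_q^m.
    pts = [(a, b) for a in range(q) for b in range(q)]
    return pts, {p: evaluate(coeffs, p) for p in pts}

def local_decode(table, x):
    # Pick a random affine line t -> x + t*v through x and read d + 1 of the
    # q - 1 points with t != 0; the restriction of a degree-<=d polynomial to
    # a line has degree <= d, so interpolating at t = 0 recovers f(x).
    v = (0, 0)
    while v == (0, 0):
        v = (random.randrange(q), random.randrange(q))
    samples = []
    for t in range(1, d + 2):
        p = tuple((xi + t * vi) % q for xi, vi in zip(x, v))
        samples.append((t, table[p]))
    # Lagrange interpolation at t = 0 (pow(a, -1, q) is the modular inverse).
    val = 0
    for i, (ti, yi) in enumerate(samples):
        li = 1
        for j, (tj, _) in enumerate(samples):
            if i != j:
                li = li * (0 - tj) % q * pow(ti - tj, -1, q) % q
        val = (val + yi * li) % q
    return val
```

With noise, one would instead read all q − 1 points on the line and run a univariate decoder (noisy interpolation) in place of the plain Lagrange step.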

  14. Reed Muller codes: parameters. n = q^m, k = C(m + d, m), d = (1 − 4δ)q, r = q − 1, e = δn.
Setting parameters:
• q = O(1), m → ∞: r = O(1), n = exp(k^{1/(r−1)}).
• q = m^2: r = (log k)^2, n = poly(k).
(Better codes are known in these two regimes.)
• q → ∞, m = O(1): r = k^ε, n = O(k).
Reducing the codeword length is a major open question.
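Plugging concrete values into these formulas shows how the regimes behave. A small helper (function name and the δ default are my own illustrative choices):

```python
from math import comb

def rm_params(q, m, delta=1 / 8):
    # Reed-Muller parameters from the slide: d = (1 - 4*delta)*q, n = q^m,
    # k = C(m + d, m), r = q - 1.
    d = int((1 - 4 * delta) * q)
    n = q ** m
    k = comb(m + d, m)
    r = q - 1
    return n, k, r

# q = O(1): n grows exponentially in k; q large, m = O(1): n = O(k) but r = k^eps.
print(rm_params(2, 3))
print(rm_params(11, 2))
```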

  15. Part II: Distributed storage

  16. Data storage β€’ Store data reliably β€’ Keep it readily available for users

  17. Data storage: Replication β€’ Store data reliably β€’ Keep it readily available for users β€’ Very large overhead β€’ Moderate reliability β€’ Local recovery: Lose one machine, access one

  18. Data storage: Erasure coding
• Store data reliably
• Keep it readily available for users
• Low overhead
• High reliability
• No local recovery: lose one machine, access k
[Figure: k data chunks and n − k parity chunks.]
Need: erasure codes with local decoding.

  19. Codes for data storage
[Figure: data symbols X_1, X_2, …, X_k and parities P_1, …, P_{n−k}.]
• Goals:
  • (Cost) minimize the number of parities.
  • (Reliability) tolerate any pattern of h + 1 simultaneous failures.
  • (Availability) recover any data symbol by accessing at most r other symbols.
  • (Computational efficiency) use a small finite field to define the parities.

  20. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • corrects any pattern of h + 1 simultaneous failures;
  • recovers any single erased data symbol by accessing at most r other symbols.

  21. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • corrects any pattern of h + 1 simultaneous failures;
  • recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.

  22. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • corrects any pattern of h + 1 simultaneous failures;
  • recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.
• Theorem [GHSY]: If r | k and h < r + 1, then any (r,h)-LRC has the following topology: the k data symbols X_1, …, X_k are partitioned into local groups of size r, each with a light parity L_1, …, L_g; in addition there are h heavy parities H_1, …, H_h, each depending on all data symbols.

  23. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • corrects any pattern of h + 1 simultaneous failures;
  • recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.
• Theorem [GHSY]: If r | k and h < r + 1, then any (r,h)-LRC has the following topology: the k data symbols X_1, …, X_k are partitioned into local groups of size r, each with a light parity L_1, …, L_g; in addition there are h heavy parities H_1, …, H_h, each depending on all data symbols.
• Fact: There exist (r,h)-LRCs with optimal redundancy over a field of size k + h.
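A toy instantiation of the topology above, with my own (non-Azure) choice of parities over a prime field F_p, p ≥ k + h: light parities are group sums, heavy parities are Vandermonde combinations, and a single erased data symbol is rebuilt from the r other symbols of its local group.

```python
# Sketch of an (r,h)-LRC-style layout with k = 8, r = 4, h = 3.  The parity
# definitions here are a toy choice of mine for illustration, not the exact
# construction from the talk; only the local-recovery property is exercised.
p = 13                 # prime field size, p >= k + h = 11
k, r, h = 8, 4, 3

def encode(data):
    assert len(data) == k
    groups = [data[i:i + r] for i in range(0, k, r)]
    light = [sum(g) % p for g in groups]              # L_1 .. L_{k/r}
    heavy = [sum(x * pow(i + 1, j, p) for i, x in enumerate(data)) % p
             for j in range(1, h + 1)]                # H_1 .. H_h
    return data + light + heavy        # n = k + k//r + h symbols

def recover_data_symbol(word, i):
    # Local recovery: rebuild erased X_i from the r other symbols of its
    # group (the r - 1 surviving data symbols plus the group's light parity).
    g = i // r
    light = word[k + g]
    others = [word[j] for j in range(g * r, g * r + r) if j != i]
    return (light - sum(others)) % p
```

Note that n − k = k/r + h = 5 here, so this layout meets the [GHSY] redundancy bound with equality.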

  24. Reliability. Set k = 8, r = 4, and h = 3.
[Figure: local groups {X_1, …, X_4, L_1} and {X_5, …, X_8, L_2}, plus heavy parities H_1, H_2, H_3.]

  25. Reliability. Set k = 8, r = 4, and h = 3.
[Figure as above.]
• All 4-failure patterns are correctable.

  26. Reliability. Set k = 8, r = 4, and h = 3.
[Figure as above.]
• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.

  27. Reliability. Set k = 8, r = 4, and h = 3.
[Figure as above.]
• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.
• Other 5-failure patterns might be correctable.


  29. Combinatorics of correctable failure patterns. Def: A regular failure pattern for an (r,h)-LRC is a pattern that can be obtained by failing one symbol in each local group and h extra symbols.
[Figure: two example regular failure patterns for the k = 8, r = 4, h = 3 code.]
• Theorem: Every failure pattern that is not dominated by a regular failure pattern is not correctable by any LRC.
• Theorem: There exist LRCs that correct all regular failure patterns.
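For the k = 8, r = 4, h = 3 code, domination by a regular pattern is easy to check by brute force. This sketch (my own illustration) enumerates all regular patterns and tests containment.

```python
from itertools import combinations, product

# Symbols and local groups of the k = 8, r = 4, h = 3 example.
symbols = [f"X{i}" for i in range(1, 9)] + ["L1", "L2", "H1", "H2", "H3"]
groups = [[f"X{i}" for i in range(1, 5)] + ["L1"],
          [f"X{i}" for i in range(5, 9)] + ["L2"]]
h = 3

def regular_patterns():
    # One failed symbol per local group, plus h extra failures anywhere else.
    for picks in product(*groups):
        base = set(picks)
        rest = [s for s in symbols if s not in base]
        for extra in combinations(rest, h):
            yield base | set(extra)

def dominated_by_regular(pattern):
    pattern = set(pattern)
    return any(pattern <= reg for reg in regular_patterns())
```

For example, the 5-failure pattern {X1, X2, X3, X4, L1} (an entire local group) is not dominated by any regular pattern, matching the slide's claim that some 5-failure patterns are not correctable.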
