
Store, Forget & Check: Using Algebraic Signatures to Check Remotely Administered Storage



  1. Store, Forget & Check: Using Algebraic Signatures to Check Remotely Administered Storage • Ethan L. Miller & Thomas J. E. Schwarz • Storage Systems Research Center, University of California, Santa Cruz

  2. What’s the problem? • Systems store data on remote nodes • Remote nodes may not be trustworthy • Data owner must check to ensure that data is really stored • Two current approaches: • Read data from multiple sites and check for consistency • Generate checksum remotely and compare to checksum of local data • We developed an efficient algorithm that does not require keeping a local copy of the data

  3. Internet storage: backup • Participants in the scheme offer limited storage on their machine in exchange for storing their own data • Data protected using parity or redundancy • Extra blocks calculated using m/n redundancy codes • Generate n blocks • Require any m of the blocks to rebuild the data • Many known mechanisms for m/n codes • Linear interpolation • XOR and Galois field-based • Participants need to be able to verify that other nodes are doing their part...
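
The simplest m/n code is single XOR parity, where n = m + 1 and any m of the n blocks rebuild the data. The following is a minimal sketch of that special case only; the helper name `xor_blocks` and the sample blocks are illustrative assumptions, not taken from the slides.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equal-length data blocks (m = 3) plus one parity block (n = 4).
data = [b"Store, F", b"orget & ", b"Check!!!"]
parity = xor_blocks(data)

# Lose data[1]; the XOR of the three surviving blocks rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Codes that tolerate more than one lost block need the Galois field arithmetic introduced later in the deck.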

  4. Storage Service Providers • Storage utility provides remotely managed storage • Client sends data to the SSP • Client retrieves data as needed • Trust issue: how can client tell if SSP is doing its job? • Read data, check (public key-based) signature • Read data, decrypt, check secure hash and object ID • SafeStore does something like this • Other approaches that don’t use network bandwidth?

  5. Peer-to-peer file systems • Farsite: uses free space on workstations within an organization • Free Haven: anonymity of storer • OceanStore • “Billions of users” • Byzantine fault tolerance, k-availability through erasure codes • PAST • Users can store files up to their quota • Provides k-availability through replication • CFS, Intermemory, Ivy, Starfish, …

  6. Common challenges • Storage nodes cannot be trusted • Storage nodes might lack high uplink bandwidth • Storage nodes might have low availability • Free rider problem • Node pretends to store data • In reality, uses replicas (or the mechanism that protects against unavailability) to fetch requested file from elsewhere • Gains the benefits of participation without providing storage

  7. Terribly naïve algorithm • Maintain local copy of data • Periodically request blocks of data and compare to the local copy • Problems • Very bandwidth-intensive • Can’t check much data • Need to keep the original!

  10. Verification: existing algorithm • Periodically, verify random blocks • Compute function across the blocks (m/n coding) • Alternative: verify keyed hash stored with the block • Problems: • Need to transfer entire block • Taxes network with diagnostic data • Peers often have asymmetric Internet connections • Leaks information heavily

  14. Verification using algebraic signatures • Solution: use checksums? • Cryptographic checksums (like SHA-1) won’t work for randomly selected ranges • Requires original data for comparison • Our scheme • Uses small challenges and responses • Allows unpredictable tests • Free rider can’t just store the answers to all possible challenges (doing so would save no storage) • Verifies that all remote chunks are consistent with each other • Requires that parity is calculated with an XOR code, a linear m/n code, or a convolutional code • Examples: X-code, EvenOdd, row-diagonal parity, linear codes over a Galois field

  15. What is a Galois field? • Simple answer: • Calculations on a set of symbols • A field called GF(2^n) uses n-bit symbols • Two kinds of operations • Addition (done by XOR) • Multiplication (more complex, done by tables) • Complex answer: • Galois fields are math done using the coefficients of polynomial equations • Often, coefficients are represented in base-2 • A Galois field using polynomials of degree less than n with base-2 coefficients is called GF(2^n) • This answer explains how the addition and multiplication tables are generated
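
The "addition by XOR, multiplication by tables" description can be sketched for GF(2^8) as follows. This is an illustration under stated assumptions: the field size, the reduction polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1, under which the element 2 is primitive), and the names `build_tables`, `gf_add`, and `gf_mul` are choices made here, not details from the slides.

```python
POLY = 0x11D  # assumed reduction polynomial for GF(2^8)

def build_tables():
    """Build exponent and logarithm tables from powers of the element 2."""
    exp = [0] * 510          # doubled length so log sums need no modulo
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1              # multiply by the polynomial "x"
        if x & 0x100:        # degree reached 8: reduce modulo POLY
            x ^= POLY
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log

EXP, LOG = build_tables()

def gf_add(a, b):
    """Addition (and subtraction) in GF(2^n) is bitwise XOR."""
    return a ^ b

def gf_mul(a, b):
    """Multiply by adding logarithms and looking up the result."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

Because every non-zero element is a power of the primitive element, two table lookups and one integer addition replace the polynomial multiplication and reduction.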

  16. What is an algebraic signature? • Digital hash with algebraic properties • Important properties: • Small changes in data result in complete change of signature • Signature of parity is parity of signatures • For data blocks D_1, D_2, D_3, …, D_m and parity blocks P_1, P_2, P_3, …, P_k: (sig(D_1), sig(D_2), sig(D_3), …, sig(D_m), sig(P_1), sig(P_2), sig(P_3), …, sig(P_k)) is a codeword!

  17. Algebraic signatures • Defined over same Galois field as the linear m/n code • Use “primitive” element α • All non-zero elements are powers of α • Consists of n coordinates • Additional properties if α_i = α^i • Coordinate signature defined by [formula shown on slide]
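
A minimal sketch of a coordinate signature, assuming the definition commonly given in the algebraic-signature literature: sig_α(X) = x_0 + x_1·α + x_2·α^2 + …, with the n-coordinate signature formed from α, α^2, …, α^n. The field GF(2^8), the polynomial 0x11D, the choice α = 2, and all function names are assumptions made for illustration.

```python
def gf_mul(a, b, poly=0x11D):
    """Shift-and-add multiplication in GF(2^8) (assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf_pow(a, e):
    """Raise a to the non-negative integer power e."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def coordinate_sig(data, alpha):
    """Sum of data[i] * alpha^i over the field, one symbol per byte."""
    sig, power = 0, 1
    for symbol in data:
        sig ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return sig

def algebraic_sig(data, n=4, alpha=2):
    """n-coordinate signature using alpha^1 .. alpha^n."""
    return tuple(coordinate_sig(data, gf_pow(alpha, j)) for j in range(1, n + 1))

# A change to a single symbol always changes the signature.
assert algebraic_sig(b"hello world") != algebraic_sig(b"hello worle")
```

With 8-bit symbols and n = 4, the signature is four bytes long regardless of how much data it covers.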

  18. Algebraic signatures • Algebraic properties • Assume that X and Y are large data objects: • sig(X ⊕ Y) = sig(X) ⊕ sig(Y) • sig(β ⋅ X) = β ⋅ sig(X) • Multiplication is in the Galois field of the signature calculation • Signatures and parity formation commute • Signatures can be updated from the old signature and the signature of the delta (XOR) between old and new data • Signature calculation is fast! • Hundreds of megabytes per second on a modern CPU • Speed limited by disk bandwidth
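
The two linearity properties and the delta update can be checked numerically. The sketch below assumes a one-coordinate signature sig(X) = Σ x_i·α^i over GF(2^8) with polynomial 0x11D and α = 2; these parameters, the helper names, and the sample data are illustrative assumptions.

```python
def gf_mul(a, b, poly=0x11D):
    """Shift-and-add multiplication in GF(2^8) (assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def sig(data, alpha=2):
    """One-coordinate signature: sum of data[i] * alpha^i."""
    s, power = 0, 1
    for symbol in data:
        s ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return s

def xor_bytes(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def scale(beta, x):
    """Multiply every symbol of x by the field element beta."""
    return bytes(gf_mul(beta, symbol) for symbol in x)

x = bytes(range(64))
y = bytes((7 * i + 3) % 256 for i in range(64))

# sig(X xor Y) = sig(X) xor sig(Y)
assert sig(xor_bytes(x, y)) == sig(x) ^ sig(y)

# sig(beta * X) = beta * sig(X)
assert sig(scale(0x53, x)) == gf_mul(0x53, sig(x))

# Update: the new signature follows from the old one and the delta's signature.
new_x = b"\xff" * 8 + x[8:]
delta = xor_bytes(x, new_x)
assert sig(new_x) == sig(x) ^ sig(delta)
```

The delta update is a direct consequence of the first property, since new data equals old data XOR delta.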

  19. Our algorithm • Store data across distributed system (data blocks D_1, D_2, D_3 and parity block P) • Challenge sites to prove that they hold the data • Example challenge: calculate signature of 32-byte ranges at 4 + i × 71, i = 5, …, 20 • Sites respond with the signatures of requested data (sig_1, sig_2, sig_3, sig_P) • Sites reveal tiny amount of information: size of signature • Challenger verifies that the signatures are consistent: sig_1 ⊕ sig_2 ⊕ sig_3 ≟ sig_P
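
The challenge-response exchange can be sketched end to end for three data blocks and one XOR parity block. This is an illustration under stated assumptions, not the authors' implementation: the field GF(2^8) with polynomial 0x11D, a four-coordinate signature with α = 2, the 2048-byte block size, and the names `respond` and `consistent` are all chosen here. Only the challenge shape (32-byte ranges at 4 + i × 71, i = 5, …, 20) comes from the slide.

```python
import random

def gf_mul(a, b, poly=0x11D):
    """Shift-and-add multiplication in GF(2^8) (assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def sig(data, n=4, alpha=2):
    """n-coordinate signature; coordinate j uses alpha^j."""
    coords, a_j = [], 1
    for _ in range(n):
        a_j = gf_mul(a_j, alpha)
        s, power = 0, 1
        for symbol in data:
            s ^= gf_mul(symbol, power)
            power = gf_mul(power, a_j)
        coords.append(s)
    return tuple(coords)

def respond(block, offsets, length):
    """Site side: sign only the challenged ranges of the stored block."""
    return sig(b"".join(block[o:o + length] for o in offsets))

def consistent(data_sigs, parity_sig):
    """Challenger side: XOR of data signatures must equal parity signature."""
    combined = [0] * len(parity_sig)
    for s in data_sigs:
        combined = [c ^ v for c, v in zip(combined, s)]
    return tuple(combined) == parity_sig

rng = random.Random(0)
blocks = [bytes(rng.randrange(256) for _ in range(2048)) for _ in range(3)]
parity = bytes(a ^ b ^ c for a, b, c in zip(*blocks))

offsets = [4 + i * 71 for i in range(5, 21)]      # challenge from the slide
honest = [respond(b, offsets, 32) for b in blocks]
parity_sig = respond(parity, offsets, 32)
assert consistent(honest, parity_sig)

# A site whose copy differs inside a challenged range fails the check.
corrupted = bytearray(blocks[1])
corrupted[offsets[0]] ^= 0xFF
cheat = [honest[0], respond(bytes(corrupted), offsets, 32), honest[2]]
assert not consistent(cheat, parity_sig)
```

Each site sends back four bytes here, and the challenger needs no copy of the data: it only compares the returned signatures against each other.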
