SLIDE 1
Proofs of Retrievability via Fountain Code
Sumanta Sarkar and Reihaneh Safavi-Naini
Department of Computer Science, University of Calgary, Canada
Foundations and Practice of Security October 25, 2012
SLIDE 2
Outsourcing Data into Cloud Storage
◮ Suppose a user generates lots of electronic data: videos, photos, emails, text documents.
◮ He also has many devices: desktop, laptop, tablet, smartphone. But none of them is capable of storing such a large amount of data.
◮ Cloud storage offers a solution:
  ◮ Outsource the data into the cloud.
  ◮ Access all the data from every device and from anywhere.
  ◮ The cloud keeps all of the data intact for as long as the client wants.
SLIDE 3
Risk of Outsourcing Data into Cloud Storage
◮ The client relies completely on the cloud for the integrity of the data.
◮ The client has no control over the cloud's infrastructure.
◮ Device failure may erase some portions of the data.
◮ A dishonest cloud may erase some portions of the data to reduce its own storage cost.
SLIDE 4
Checking the Integrity of the Data
◮ Store a MAC of the data locally.
◮ To check integrity, download the whole file, recompute the MAC and compare it with the stored one.
◮ This is not practical when the data is large.
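The naive MAC-based check can be sketched as follows. This is a minimal illustration that assumes HMAC-SHA256 as the MAC (the slide does not fix one); `mac_file` and `naive_check` are hypothetical names. It shows that verification must touch every byte of the file.

```python
import hashlib
import hmac
import os


def mac_file(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 over the whole file; computed once and kept on the device."""
    return hmac.new(key, data, hashlib.sha256).digest()


def naive_check(key: bytes, stored_tag: bytes, downloaded: bytes) -> bool:
    """Recompute the MAC over the entire downloaded file and compare.

    Bandwidth and computation are both linear in the file size, which is
    what makes this approach impractical for large files.
    """
    return hmac.compare_digest(mac_file(key, downloaded), stored_tag)


key = os.urandom(32)
original = os.urandom(1 << 20)   # 1 MiB stand-in for the outsourced data
tag = mac_file(key, original)    # only these 32 bytes stay local

corrupted = bytearray(original)
corrupted[12345] ^= 0x01         # the cloud loses a single bit
print(naive_check(key, tag, original))          # True
print(naive_check(key, tag, bytes(corrupted)))  # False
```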
SLIDE 5
Proofs of Retrievability (PoR)
◮ Juels and Kaliski 2007 introduced the Proofs of Retrievability (PoR) protocol, which verifies the integrity of the data through an audit protocol.
◮ The client applies an erasure code to the file M and stores the encoded file M′ in the cloud.
◮ M can be decoded from a fraction, say ρ, of M′.
◮ Along with M′, the client also stores some extra information ∆(M), which will be used in the audit.
◮ An audit is a challenge-response protocol. In the audit, the client (verifier) challenges the cloud on some random locations of the file, and a correct response from the cloud (prover) proves that the file blocks at those locations are intact.
◮ The security of a PoR scheme is formalized by showing the existence of an extractor that retrieves the file with very high probability from an erasing adversary that passes the audit protocol with some reasonable probability.
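Why a spot-check audit is informative but not conclusive on its own (and hence why the file is erasure-encoded first) can be seen from a simple calculation: the probability that a challenge on w distinct random blocks touches at least one erased block. This is an illustrative sketch that assumes the erasures sit at fixed positions and the challenged indices are uniform; `detection_probability` is a hypothetical helper, not part of any PoR scheme.

```python
from math import comb


def detection_probability(n: int, corrupted: int, w: int) -> float:
    """Probability that w distinct, uniformly random block indices out of n
    hit at least one of the `corrupted` blocks: 1 - C(n - d, w) / C(n, w)."""
    if not 0 <= corrupted <= n or not 0 <= w <= n:
        raise ValueError("need 0 <= corrupted <= n and 0 <= w <= n")
    return 1.0 - comb(n - corrupted, w) / comb(n, w)


# A file of 10,000 blocks of which 1% (100 blocks) have been erased.
for w in (1, 10, 100, 460):
    print(w, round(detection_probability(10_000, 100, w), 3))
```

Small erasures can slip past any single audit; under the scheme's assumptions this is tolerable because the erasure code recovers M from a ρ fraction of M′, and repeated audits let the extractor argue about the remaining data.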
SLIDE 9
Efficiency of PoR System
◮ The computational cost of preparing a file for storage in the cloud and of calculating the response;
◮ The communication cost of an audit; and
◮ The extra storage (overhead) needed for storing the file M.
◮ So a smaller challenge reduces the communication cost of the protocol, and also the computation cost of the prover, since fewer blocks are involved in computing the response.
SLIDE 11
Bounded/Unbounded-use PoR and Private/Public Verifiability
◮ A PoR that allows an “unlimited” number of challenge-response interactions is unbounded-use; otherwise, it is bounded-use.
◮ A privately verifiable PoR allows only the owner of the file, who stored it, to run the challenge-response protocol, whereas in a publicly verifiable PoR anyone who knows the appropriate public key can perform the verification.
SLIDE 12
Main Contribution
◮ We present an unbounded-use, privately verifiable PoR scheme that reduces the cost of computing responses and of communicating challenges in the average case.
◮ Our construction closely follows that of Shacham and Waters 2008 and uses a Fountain code.
SLIDE 13
Related Work on PoR
◮ PoR was introduced by Juels and Kaliski 2007 and has subsequently been extended and improved by Shacham and Waters 2008; Bowers, Juels and Oprea 2009; and Dodis, Vadhan and Wichs 2009.
◮ The JK07 scheme has response communication complexity that is quadratic in the security parameter.
◮ This was improved to linear complexity in SW08 by using homomorphic linear authenticators.
◮ Dodis et al. viewed the set of all correct responses corresponding to the file M′ = Enc(M) stored in the cloud as a codeword C, which is a challenge-response encoding of M.
◮ The set of all responses from the prover for the same file M′ forms a word C′, which may differ from C. The extractor decodes M from C′.
SLIDE 14
Background on PoR
We follow SW08.
◮ Kg(): This randomized algorithm generates a secret key sk and a public key pk.
◮ St(sk, M): This randomized algorithm takes the secret key sk and the client file M ∈ {0, 1}∗. It processes M and outputs M∗, which is stored in the cloud, together with a file tag t.
◮ P, V: The randomized algorithms corresponding to the prover and the verifier. At the end of the prover-verifier interaction, the verifier outputs a bit: $\{0, 1\} \xleftarrow{R} (\mathcal{V}(pk, sk, t) \rightleftharpoons \mathcal{P}(pk, t, M^*))$.
SLIDE 15
PoR properties: Correctness and Soundness
◮ Correctness means that if the prover is honest, then $(\mathcal{V}(pk, sk, t) \rightleftharpoons \mathcal{P}(pk, t, M^*)) = 1$.
◮ A PoR is sound if any prover that convinces the verifier actually holds the file.
SLIDE 16
ε-adversary and the Extractor
◮ The adversary is assumed to erase some portion of the file, with probability bounded by a fixed value.
◮ A prover is ε-admissible if it convincingly answers an ε fraction of the verification queries.
◮ A PoR scheme is ε-sound if there exists an extraction algorithm (Extractor) which, by interacting (challenge-response) with the ε-admissible adversary, can recover the file except with negligible probability.
SLIDE 17
Fountain Codes
◮ In Fountain codes, the sender generates a potentially limitless stream of encoded symbols. The receiver can recover the message from sufficiently many encoded symbols.
◮ Examples: LT codes [Luby 2002] and Raptor codes [Shokrollahi 2006] are two well-known Fountain codes.
SLIDE 18
Raptor Code: Encoding
Precoding
◮ The message is (x1, . . . , xk), where each xi is of ℓ bits.
◮ First, (x1, . . . , xk) is encoded to (y1, . . . , yn) by an erasure code Cn which can recover (x1, . . . , xk) from any ρn symbols of (y1, . . . , yn).
LT coding
To generate Raptor encoding symbols, the LT code is applied to (y1, . . . , yn). For that, a degree distribution defined by a polynomial $w(x) = \sum_{i=1}^{n} w_i x^i$ is chosen, where $w_i$ is the probability of choosing degree i, i ∈ {1, . . . , n}.
◮ Randomly choose a degree, say j, using w(x).
◮ Choose j symbols uniformly at random from the set {y1, . . . , yn} and XOR them to produce the encoded symbol (output symbol) $r_i = y_{i_1} \oplus \cdots \oplus y_{i_j}$.
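One LT encoding step can be sketched as follows. This is a minimal illustration, assuming ℓ-bit symbols are held in Python integers and that the j chosen symbols are distinct; `lt_encode_symbol` and the toy distribution `w` are hypothetical and are not the Raptor distribution w_D defined later.

```python
import random
from functools import reduce
from operator import xor
from typing import List, Sequence, Tuple


def lt_encode_symbol(y: Sequence[int], w: Sequence[float],
                     rng: random.Random) -> Tuple[List[int], int]:
    """Produce one LT output symbol from the input symbols y.

    w[i - 1] is the probability w_i of degree i. Returns the chosen 0-based
    neighbour indices and r = XOR of the corresponding input symbols.
    """
    if len(w) > len(y):
        raise ValueError("degree distribution exceeds the number of input symbols")
    j = rng.choices(range(1, len(w) + 1), weights=w, k=1)[0]  # degree j ~ w(x)
    neighbours = rng.sample(range(len(y)), j)  # j distinct symbols, uniformly
    return neighbours, reduce(xor, (y[i] for i in neighbours), 0)


rng = random.Random(2012)
y = [rng.getrandbits(16) for _ in range(8)]  # toy precoded symbols, l = 16
w = [0.1, 0.5, 0.2, 0.2]                     # hypothetical w_1, ..., w_4
neighbours, r = lt_encode_symbol(y, w, rng)
```

The receiver must also learn which neighbours each output symbol used; in practice this is usually derived from a shared seed rather than transmitted explicitly.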
SLIDE 19
Raptor Code structure
SLIDE 20
Raptor Code: Decoding
◮ After collecting slightly more than k output symbols ri, the receiver applies belief propagation (BP) decoding to obtain a ρ fraction of {y1, . . . , yn}, and then recovers (x1, . . . , xk) by decoding Cn.
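BP decoding over the XOR relations can be illustrated with a "peeling" decoder: an output symbol with a single unknown neighbour reveals that input, which is then substituted into every other symbol. This is a simplified sketch, not Shokrollahi's exact procedure; `peel_decode` is a hypothetical name and symbols are Python integers.

```python
from typing import List, Optional, Sequence, Tuple


def peel_decode(n: int,
                symbols: Sequence[Tuple[Sequence[int], int]]) -> List[Optional[int]]:
    """Belief-propagation ("peeling") decoding of LT output symbols.

    Each output symbol is (0-based neighbour indices, XOR of those inputs).
    Inputs that are never released stay None; in a Raptor code the
    precode C_n is what recovers them.
    """
    eqs = [[set(idx), val] for idx, val in symbols]
    recovered: List[Optional[int]] = [None] * n
    changed = True
    while changed:
        changed = False
        for eq in eqs:
            idx = eq[0]
            for i in [i for i in idx if recovered[i] is not None]:
                idx.discard(i)         # remove a known input ...
                eq[1] ^= recovered[i]  # ... and cancel it from the XOR
            if len(idx) == 1:          # degree-one symbol: release its input
                recovered[idx.pop()] = eq[1]
                changed = True
    return recovered


y = [5, 9, 12]
print(peel_decode(3, [([0], y[0]), ([0, 1], y[0] ^ y[1]), ([1, 2], y[1] ^ y[2])]))  # [5, 9, 12]
print(peel_decode(3, [([0, 1], y[0] ^ y[1])]))                                   # [None, None, None]
```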
SLIDE 21
Raptor Code parameters
◮ The following are from the Raptor code construction given in
[Shokrollahi 2006].
◮ Let α > 0 be a real number, set D = ⌈4(1 + α)/α⌉ and define
$$w_D(x) = \frac{1}{\mu + 1}\left(\mu x + \sum_{i=2}^{D} \frac{x^i}{(i-1)i} + \frac{x^{D+1}}{D}\right), \qquad (1)$$
where µ = (α/2) + (α/2)².
◮ The average degree of w_D is
$$\ln(1/\alpha) + \beta + O(\alpha), \qquad (2)$$
where 1 < β < 1 + γ + ln(9) and γ is Euler's constant.
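Equations (1) and (2) can be checked numerically. The sketch below, assuming exact rational arithmetic via `fractions.Fraction` to avoid rounding in ⌈4(1 + α)/α⌉, lists the coefficients of w_D and its mean degree; the function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List


def raptor_degree_distribution(alpha: Fraction) -> List[Fraction]:
    """Coefficients [w_1, ..., w_{D+1}] of w_D(x) in equation (1), computed exactly."""
    D = math.ceil(4 * (1 + alpha) / alpha)
    mu = alpha / 2 + (alpha / 2) ** 2
    raw = ([mu]                                                   # x^1
           + [Fraction(1, (i - 1) * i) for i in range(2, D + 1)]  # x^2 .. x^D
           + [Fraction(1, D)])                                    # x^(D+1)
    return [c / (mu + 1) for c in raw]


def mean_degree(w: List[Fraction]) -> Fraction:
    """Average degree sum_i i * w_i."""
    return sum((i * wi for i, wi in enumerate(w, start=1)), Fraction(0))


EULER_GAMMA = 0.5772156649015329
alpha = Fraction(1, 100)
w = raptor_degree_distribution(alpha)
beta = float(mean_degree(w)) - math.log(float(1 / alpha))
print(len(w), sum(w) == 1, round(beta, 3))
```

The coefficients sum to exactly 1 because the telescoping sum of 1/((i − 1)i) for i = 2..D equals 1 − 1/D.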
SLIDE 22
Results on decoding Raptor Code
Lemma (Shokrollahi 2006)
There exists a positive real number c (depending on α) such that, with an error probability of at most $e^{-cn}$, any set of $(1 + \alpha/2)n + 1$ output symbols of the LT code with distribution $w_D$ and n input symbols y1, . . . , yn is sufficient to recover at least ρn input symbols from {y1, . . . , yn} via belief propagation decoding, where $\rho = 1 - \frac{\alpha/4}{1+\alpha}$.
Theorem (Shokrollahi 2006)
Let α > 0 be a real number, k an integer, D = ⌈4(1 + α)/α⌉, R = (1 + α/2)/(1 + α), n = ⌈k/R⌉. Let Cn be an erasure code which can decode (1 − R)/2 erasures. Then the Raptor code with precode Cn and the LT-code with the distribution wD(x) which encodes k symbols, can decode from (1 + α)k output symbols.
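The parameters in the Lemma and Theorem can be computed for concrete values. This sketch (hypothetical function name, exact `Fraction` arithmetic) makes one relation visible: the fraction of the yi left unrecovered by BP, 1 − ρ = (α/4)/(1 + α), equals (1 − R)/2, the erasure fraction the precode Cn is required to decode, and (1 + α/2)n + 1 matches (1 + α)k up to rounding.

```python
import math
from fractions import Fraction
from typing import Dict


def raptor_parameters(alpha: Fraction, k: int) -> Dict[str, Fraction]:
    """Parameters of the Theorem (Shokrollahi 2006) for a given alpha and k."""
    D = math.ceil(4 * (1 + alpha) / alpha)
    R = (1 + alpha / 2) / (1 + alpha)        # rate of the precode C_n
    n = math.ceil(k / R)                     # number of precoded symbols
    rho = 1 - (alpha / 4) / (1 + alpha)      # fraction recovered by BP (Lemma)
    return {
        "D": Fraction(D),
        "R": R,
        "n": Fraction(n),
        "rho": rho,
        "precode_erasures": (1 - R) / 2,     # erasure fraction C_n must handle
        "lemma_symbols": (1 + alpha / 2) * n + 1,
        "theorem_symbols": (1 + alpha) * k,
    }


p = raptor_parameters(Fraction(1, 10), k=1000)
print({name: float(value) for name, value in p.items()})
```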
SLIDE 23
PoR of SW08
◮ Suppose F′ = (m1, . . . , mn) is the erasure-encoded file of the client file F, where each mi ∈ Zp.
◮ Choose θ ∈ Zp randomly and create authenticators σi = PRF(i) + θmi.
◮ Challenge: Q = {(i1, v1), . . . , (iw, vw)}, where each ij is chosen randomly from {1, . . . , n} and each vj is chosen randomly from Zp.
◮ Response: $r = \sum_{(i, v_i) \in Q} v_i m_i$ and $\sigma = \sum_{(i, v_i) \in Q} v_i \sigma_i$.
◮ Verify: $\sigma \stackrel{?}{=} \sum_{(i, v_i) \in Q} v_i \, \mathrm{PRF}(i) + \theta r$.
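A toy version of this private-verification scheme helps show why the check works: σ = Σ v_i(PRF(i) + θ m_i) = Σ v_i PRF(i) + θ r over Zp. The sketch assumes p = 2^127 − 1 (a Mersenne prime), HMAC-SHA256 reduced mod p as a keyed PRF (the slide writes PRF(i) without the key), and distinct challenge indices; all names are hypothetical and it is not a production implementation.

```python
import hashlib
import hmac
import secrets
from typing import List, Sequence, Tuple

P = 2**127 - 1  # a Mersenne prime standing in for the field size p


def prf(key: bytes, i: int) -> int:
    """Hypothetical PRF_key(i): HMAC-SHA256 of the index, reduced mod p."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def store(key: bytes, theta: int, m: Sequence[int]) -> List[int]:
    """sigma_i = PRF(i) + theta * m_i (mod p), with 1-based block index i."""
    return [(prf(key, i) + theta * mi) % P for i, mi in enumerate(m, start=1)]


def challenge(n: int, w: int) -> List[Tuple[int, int]]:
    """Q = {(i_j, v_j)}: w distinct indices in 1..n with random coefficients in Z_p."""
    indices = secrets.SystemRandom().sample(range(1, n + 1), w)
    return [(i, secrets.randbelow(P)) for i in indices]


def respond(Q, m, sigma) -> Tuple[int, int]:
    """Prover: r = sum v_i m_i and sigma = sum v_i sigma_i (mod p)."""
    r = sum(v * m[i - 1] for i, v in Q) % P
    s = sum(v * sigma[i - 1] for i, v in Q) % P
    return r, s


def verify(key: bytes, theta: int, Q, r: int, s: int) -> bool:
    """Verifier: accept iff sigma == sum v_i PRF(i) + theta * r (mod p)."""
    return s == (sum(v * prf(key, i) for i, v in Q) + theta * r) % P


key, theta = secrets.token_bytes(32), secrets.randbelow(P)
m = [secrets.randbelow(P) for _ in range(50)]   # toy encoded file F'
sigma = store(key, theta, m)
Q = challenge(len(m), 5)
r, s = respond(Q, m, sigma)
print(verify(key, theta, Q, r, s))              # True

bad = list(m)
bad[Q[0][0] - 1] = (bad[Q[0][0] - 1] + 1) % P   # alter one challenged block
print(verify(key, theta, Q, *respond(Q, bad, sigma)))  # False, except with probability about 2/p
```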
SLIDE 24
RAPTOR-PoR: Choosing Key
Kg(): A random symmetric encryption key $k_{enc} \xleftarrow{R} \mathcal{K}_{enc}$ and a random MAC key $k_{mac} \xleftarrow{R} \mathcal{K}_{mac}$ are chosen. The secret key is $sk = (k_{enc}, k_{mac})$. Since this is private verification, there is no public key pk.
SLIDE 25
RAPTOR-PoR: Preparing File for Storing
◮ First, M = (x1, . . . , xk), where each xi has ℓ bits, is encoded by an erasure code Cn to obtain M′ = (y1, . . . , yn), where Cn is such that any ρn symbols of (y1, . . . , yn) suffice to reconstruct M.
◮ Choose a PRF key $k_{prf} \xleftarrow{R} \mathcal{K}_{prf}$ and a random binary ℓ × ℓ matrix $A = [A_1, \ldots, A_\ell]^T$, where each $A_i$ is an ℓ-bit row vector.
◮ Let $t_0 = n \,\|\, \mathrm{Enc}_{k_{enc}}(k_{prf} \| A_1 \| \cdots \| A_\ell)$ and let $t = t_0 \,\|\, \mathrm{MAC}_{k_{mac}}(t_0)$ be the file tag.
◮ Create authenticators σ1, . . . , σn as $\sigma_i = \mathrm{PRF}_{k_{prf}}(i) \oplus y_i A$ for 1 ≤ i ≤ n, where $y_i A$ is the product of the row vector $y_i$ with A over GF(2). Each σi is also an ℓ-bit symbol.
◮ Then M∗ = (y1, . . . , yn, σ1, . . . , σn) is the processed file. Send M∗ and t to the cloud.
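The authenticator computation can be sketched as below. Assumptions (not from the slides): ℓ = 128, each ℓ-bit symbol and each row of A is packed into a Python int with bit j as coordinate j, and HMAC-SHA256 truncated to ℓ bits stands in for PRF_kprf. The names `vec_mat`, `prf` and `authenticators` are hypothetical; the tag t is omitted.

```python
import hashlib
import hmac
import secrets
from typing import List, Sequence

ELL = 128  # symbol length l in bits (illustrative choice)


def vec_mat(y: int, A: Sequence[int]) -> int:
    """Row vector y times the binary l x l matrix A over GF(2).

    Bit j of y is coordinate j and A[j] is row j packed into an int,
    so yA is the XOR of the rows A[j] whose bit j in y is set.
    """
    out = 0
    for j, row in enumerate(A):
        if (y >> j) & 1:
            out ^= row
    return out


def prf(k_prf: bytes, i: int) -> int:
    """Hypothetical PRF_{k_prf}(i): HMAC-SHA256 of i truncated to l bits."""
    digest = hmac.new(k_prf, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - ELL)


def authenticators(k_prf: bytes, A: Sequence[int], y: Sequence[int]) -> List[int]:
    """sigma_i = PRF_{k_prf}(i) XOR y_i A for i = 1..n."""
    return [prf(k_prf, i) ^ vec_mat(yi, A) for i, yi in enumerate(y, start=1)]


k_prf = secrets.token_bytes(32)
A = [secrets.randbits(ELL) for _ in range(ELL)]   # random binary matrix, one int per row
y = [secrets.randbits(ELL) for _ in range(20)]    # stand-in for the precoded M'
sigma = authenticators(k_prf, A, y)
M_star = y + sigma                                # M* = (y_1..y_n, sigma_1..sigma_n)
```

Because y ↦ yA is linear over GF(2), XORing several authenticators yields the XOR of their PRF values plus (XOR of the yi)A; the audit check relies on exactly this property.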
SLIDE 29
RAPTOR-PoR: Audit (1)
V.Tagcheck(sk, t):
◮ Obtain kmac and kenc from the secret key sk.
◮ Receive the tag $t = t_0 \,\|\, \mathrm{MAC}_{k_{mac}}(t_0)$, where $t_0 = n \,\|\, \mathrm{Enc}_{k_{enc}}(k_{prf} \| A_1 \| \cdots \| A_\ell)$, from the prover and verify it using kmac. If the MAC does not match, quit the audit. Otherwise, use the symmetric key kenc to decrypt $\mathrm{Enc}_{k_{enc}}(k_{prf} \| A_1 \| \cdots \| A_\ell)$, and recover n, kprf and the matrix A.
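The MAC part of the tag check can be sketched as follows. This assumes HMAC-SHA256 as the MAC and encodes n as 8 big-endian bytes; the ciphertext Enc_kenc(kprf‖A1‖…‖Aℓ) is treated as opaque bytes because the slides do not fix a cipher (decryption is out of scope here). `make_tag` and `tag_check` are hypothetical names.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

MAC_LEN = 32  # bytes of HMAC-SHA256 output


def make_tag(k_mac: bytes, n: int, ciphertext: bytes) -> bytes:
    """t = t0 || MAC_{k_mac}(t0) with t0 = n || ciphertext (n as 8 big-endian bytes)."""
    t0 = n.to_bytes(8, "big") + ciphertext
    return t0 + hmac.new(k_mac, t0, hashlib.sha256).digest()


def tag_check(k_mac: bytes, t: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (n, ciphertext) if the MAC verifies; None means: quit the audit."""
    if len(t) < 8 + MAC_LEN:
        return None
    t0, mac = t[:-MAC_LEN], t[-MAC_LEN:]
    expected = hmac.new(k_mac, t0, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, mac):
        return None
    # The ciphertext would next be decrypted with k_enc to obtain k_prf and A.
    return int.from_bytes(t0[:8], "big"), t0[8:]


k_mac = secrets.token_bytes(32)
ciphertext = secrets.token_bytes(64)  # placeholder for Enc_kenc(k_prf || A_1 || ... || A_l)
t = make_tag(k_mac, 1000, ciphertext)
print(tag_check(k_mac, t) == (1000, ciphertext))  # True
forged = bytearray(t)
forged[0] ^= 1                                     # the prover alters n
print(tag_check(k_mac, bytes(forged)))            # None
```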
V.Chal(n):
◮ Choose an integer w using the degree distribution with generator polynomial $w_D(x) = \sum_{i=1}^{n} w_i x^i$. Then choose w indices, say {i1, . . . , iw}, uniformly at random from {1, . . . , n}, and choose one index, say c, uniformly at random from {i1, . . . , iw}. Send Q = ({i1, . . . , iw}, {c}) to the prover.
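Challenge sampling can be sketched as below, using w_D from equation (1) with α = 1/ℓ. Assumptions: the w indices are distinct, n is at least the maximum degree 4ℓ + 5, and a seeded `random.Random` is used only to make the example reproducible (a real verifier needs unpredictable randomness, e.g. `random.SystemRandom`). `degree_weights` and `v_chal` are hypothetical names.

```python
import math
import random
from fractions import Fraction
from typing import List, Tuple


def degree_weights(ell: int) -> List[float]:
    """w_D of equation (1) with alpha = 1/l: probabilities of degrees 1..4l+5."""
    alpha = Fraction(1, ell)
    D = math.ceil(4 * (1 + alpha) / alpha)  # = 4(l + 1)
    mu = alpha / 2 + (alpha / 2) ** 2
    raw = [mu] + [Fraction(1, (i - 1) * i) for i in range(2, D + 1)] + [Fraction(1, D)]
    return [float(c / (mu + 1)) for c in raw]


def v_chal(n: int, weights: List[float], rng: random.Random) -> Tuple[List[int], int]:
    """Sample Q = ({i_1, ..., i_w}, {c}) as on the slide (1-based indices)."""
    if n < len(weights):
        raise ValueError("n must be at least the maximum degree 4l + 5")
    w = rng.choices(range(1, len(weights) + 1), weights=weights, k=1)[0]
    indices = rng.sample(range(1, n + 1), w)  # w distinct indices, uniformly
    c = rng.choice(indices)                   # the extra spot-checked index
    return indices, c


rng = random.Random(7)
weights = degree_weights(128)
sizes = [len(v_chal(10_000, weights, rng)[0]) for _ in range(2_000)]
print(max(sizes), sum(sizes) / len(sizes))
```

Most challenges are small (the mean is O(log ℓ)), while the rare large ones are bounded by 4ℓ + 5.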
SLIDE 31
RAPTOR-PoR: Audit (2)
◮ P(Q, M∗): In response to the challenge Q, compute
$$r = y_{i_1} \oplus \cdots \oplus y_{i_w}, \qquad \sigma = \sigma_{i_1} \oplus \cdots \oplus \sigma_{i_w}. \qquad (3)$$
Send resp = (r, σ, yc, σc) to the verifier.
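◮ V.Ver(A, kprf, resp): After receiving the prover's response, check whether
$$\sigma \stackrel{?}{=} rA \oplus \bigoplus_{i \in \{i_1, \ldots, i_w\}} \mathrm{PRF}_{k_{prf}}(i), \qquad \sigma_c \stackrel{?}{=} \mathrm{PRF}_{k_{prf}}(c) \oplus y_c A.$$
The first check holds for an honest prover because y ↦ yA is linear over GF(2): $\bigoplus_j (\mathrm{PRF}_{k_{prf}}(i_j) \oplus y_{i_j} A) = \bigoplus_j \mathrm{PRF}_{k_{prf}}(i_j) \oplus rA$. Output 1 if both equalities hold; otherwise, output 0.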
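The prover and verifier steps can be checked end to end with the toy sketch below. Assumptions: ℓ = 128, symbols and rows of A packed into Python ints (bit j as coordinate j), HMAC-SHA256 truncated to ℓ bits standing in for PRF_kprf, and tag handling omitted. `vec_mat`, `prf`, `prove` and `v_ver` are hypothetical names, not the authors' code.

```python
import hashlib
import hmac
import secrets
from functools import reduce
from operator import xor
from typing import Sequence, Tuple

ELL = 128  # symbol length l in bits (illustrative choice)


def vec_mat(y: int, A: Sequence[int]) -> int:
    """yA over GF(2): XOR of the rows A[j] for which bit j of y is set."""
    out = 0
    for j, row in enumerate(A):
        if (y >> j) & 1:
            out ^= row
    return out


def prf(k_prf: bytes, i: int) -> int:
    """Hypothetical PRF_{k_prf}(i): HMAC-SHA256 of i truncated to l bits."""
    digest = hmac.new(k_prf, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - ELL)


def prove(indices: Sequence[int], c: int, y: Sequence[int],
          sigma: Sequence[int]) -> Tuple[int, int, int, int]:
    """P(Q, M*): resp = (r, sigma, y_c, sigma_c) as in equation (3)."""
    r = reduce(xor, (y[i - 1] for i in indices), 0)
    s = reduce(xor, (sigma[i - 1] for i in indices), 0)
    return r, s, y[c - 1], sigma[c - 1]


def v_ver(A: Sequence[int], k_prf: bytes, indices: Sequence[int], c: int,
          resp: Tuple[int, int, int, int]) -> bool:
    """V.Ver: sigma == rA XOR (XOR of PRF(i_j)) and sigma_c == PRF(c) XOR y_c A."""
    r, s, y_c, s_c = resp
    prf_sum = reduce(xor, (prf(k_prf, i) for i in indices), 0)
    return s == (vec_mat(r, A) ^ prf_sum) and s_c == (prf(k_prf, c) ^ vec_mat(y_c, A))


# Setup (St): random PRF key, matrix and precoded symbols, then the authenticators.
k_prf = secrets.token_bytes(32)
A = [secrets.randbits(ELL) for _ in range(ELL)]
y = [secrets.randbits(ELL) for _ in range(20)]
sigma = [prf(k_prf, i) ^ vec_mat(yi, A) for i, yi in enumerate(y, start=1)]

indices, c = [3, 7, 11], 7                                       # a fixed example challenge Q
print(v_ver(A, k_prf, indices, c, prove(indices, c, y, sigma)))  # True

corrupted = list(y)
corrupted[3 - 1] ^= 1                                            # one bit of y_3 is lost
print(v_ver(A, k_prf, indices, c, prove(indices, c, corrupted, sigma)))  # False unless row A[0] is all zeros
```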
SLIDE 33
Parameters for RAPTOR-PoR
◮ Referring to the Raptor code parameters, we take α = 1/ℓ, where ℓ is the security parameter.
◮ The rate of the precode Cn is $R = \frac{2\ell+1}{2\ell+2}$. Then n = poly(ℓ) if k = poly(ℓ).
◮ The erasure probability that Cn can handle is $1 - \rho = \frac{1}{4(\ell+1)}$.
◮ D = 4(ℓ + 1), $\mu = \frac{1}{2\ell} + \frac{1}{4\ell^2}$, and the degree distribution is
$$w_D(x) = \frac{2\ell+1}{4\ell^2+2\ell+1}\,x + \frac{4\ell^2}{4\ell^2+2\ell+1}\left(\frac{x^2}{1 \cdot 2} + \frac{x^3}{2 \cdot 3} + \cdots + \frac{x^{4(\ell+1)}}{(4\ell+3)(4\ell+4)} + \frac{x^{4\ell+5}}{4\ell+4}\right).$$
The mean of this distribution is ln(ℓ) + β + O(1/ℓ) = O(log ℓ), where 1 < β < 1 + γ + ln(9) and γ is Euler's constant.
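The closed forms on this slide follow from the general Raptor formulas by substituting α = 1/ℓ. The sketch below checks them with exact rational arithmetic for a few values of ℓ; `closed_forms_hold` is a hypothetical helper.

```python
import math
from fractions import Fraction


def closed_forms_hold(ell: int) -> bool:
    """Check the slide's closed forms against the general formulas with alpha = 1/l."""
    alpha = Fraction(1, ell)
    D = math.ceil(4 * (1 + alpha) / alpha)
    mu = alpha / 2 + (alpha / 2) ** 2
    R = (1 + alpha / 2) / (1 + alpha)
    erasures = (alpha / 4) / (1 + alpha)              # 1 - rho
    w = [c / (mu + 1) for c in
         [mu] + [Fraction(1, (i - 1) * i) for i in range(2, D + 1)] + [Fraction(1, D)]]
    denom = 4 * ell**2 + 2 * ell + 1
    return (D == 4 * (ell + 1)
            and mu == Fraction(1, 2 * ell) + Fraction(1, 4 * ell**2)
            and R == Fraction(2 * ell + 1, 2 * ell + 2)
            and erasures == Fraction(1, 4 * (ell + 1))
            and len(w) == 4 * ell + 5                               # degrees 1 .. 4l+5
            and w[0] == Fraction(2 * ell + 1, denom)                # coefficient of x
            and w[1] == Fraction(4 * ell**2, denom) / 2             # coefficient of x^2
            and w[-1] == Fraction(4 * ell**2, denom) / (4 * ell + 4))  # coefficient of x^(4l+5)


print(all(closed_forms_hold(ell) for ell in (1, 2, 16, 80, 128)))  # True
```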
SLIDE 34
RAPTOR-PoR: Result on Extractor
Theorem
If the prover is ε-admissible, then by running the Audit protocol for $\frac{(1 + 1/\ell)k}{\epsilon}$ iterations, the extractor is able to retrieve the file with error probability $e^{-\mathrm{poly}(\ell)}$.
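One way to read this bound (an interpretation, not a statement from the slides): each accepted response r = yi1 ⊕ · · · ⊕ yiw is an LT output symbol over (y1, . . . , yn) whose degree follows w_D, so the extractor needs about (1 + α)k = (1 + 1/ℓ)k of them for Raptor decoding, and an ε-admissible prover answers about an ε fraction of audits. The arithmetic is sketched below; `audits_needed` is a hypothetical helper.

```python
import math
from fractions import Fraction


def audits_needed(k: int, ell: int, eps: Fraction) -> int:
    """Audit runs (1 + 1/l) k / eps from the extractor theorem, rounded up."""
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    return math.ceil((1 + Fraction(1, ell)) * k / eps)


print(audits_needed(k=1000, ell=100, eps=Fraction(1, 2)))  # 2020
```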
SLIDE 35
RAPTOR-PoR: Comparison with Other PoR Schemes
◮ All previous schemes challenge a fixed number of blocks of
◮ In our scheme, the size of the challenge set is chosen from the
interval [1, 4ℓ + 5] according to the probability distribution wD(x).
◮ So in the worst case, it is O(ℓ). However, in the average case
it is ln(ℓ) + β + O(1/ℓ), where 1 < β < 1 + γ + ln(9), i.e., O(log ℓ).
◮ This also means that, in the average case, the cloud has to consider only O(log ℓ) blocks when computing a response.
◮ In RAPTOR-PoR, the response is formed just by XORing w ℓ-bit elements, whereas to form a response to a challenge on w elements in the SW 2008 scheme, one has to compute w multiplications and (w − 1) additions over Zp.
SLIDE 36
Conclusion
◮ We have proposed a PoR construction based on the SW 2008 PoR that improves the cost of response computation.
◮ Notably, we use challenges of variable length, chosen probabilistically.
◮ The next task is an efficient implementation of our scheme, which requires additional measures. For instance, applying erasure encoding to a big file is not practical, so the file should be divided into stripes and erasure encoding applied to each stripe. This requires a completely new analysis of the scheme.
SLIDE 37
THANK YOU