New Bound for Batch Codes with Restricted Query Size, Vitaly Skachek


New Bound for Batch Codes with Restricted Query Size Vitaly Skachek Joint work with Hui Zhang Estonian CS Theory Days 29 January 2016 Supported by the research grants PUT405 and IUT2-1 from the Estonian Research Council and by the COST Action


SLIDE 1

New Bound for Batch Codes with Restricted Query Size

Vitaly Skachek

Joint work with Hui Zhang Estonian CS Theory Days 29 January 2016

Supported by the research grants PUT405 and IUT2-1 from the Estonian Research Council and by the COST Action IC1104 on random network coding and designs over Fq.

  • V. Skachek

Bounds for batch codes

SLIDE 2

Distributed storage systems

Enormous amounts of data are stored on a large number of servers. Occasionally, servers fail. A failed server is replaced, and its data has to be copied to the new server.


SLIDE 11

Locally repairable codes

Consideration: minimize the amount of data transferred during repair. Proposed in [Dimakis, Godfrey, Wu, Wainwright, Ramchandran 2008].

These are erasure-correcting codes with an additional property: an erased symbol can be recovered from a small number of other symbols (locality).

[Figure: four stored symbols equal to 1 and one erased symbol marked "?"]
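As a rough illustration of locality (a toy sketch under assumptions, not a construction from the talk), the code below splits the data into groups of r symbols, stores one XOR parity per group, and repairs an erased symbol from the r other symbols of its group. The names `encode_with_local_parities` and `repair`, and the choice r = 2, are hypothetical.

```python
from functools import reduce
from operator import xor


def encode_with_local_parities(data, r):
    """Split `data` into groups of r symbols and append one XOR parity per group.

    Hypothetical toy scheme: every symbol of a group (data or parity) can be
    rebuilt from the other symbols of the same group.
    """
    groups = []
    for start in range(0, len(data), r):
        group = list(data[start:start + r])
        group.append(reduce(xor, group, 0))  # local parity symbol
        groups.append(group)
    return groups


def repair(group, erased_pos):
    """Recover the symbol at `erased_pos` by XOR-ing the remaining symbols."""
    others = [s for pos, s in enumerate(group) if pos != erased_pos]
    return reduce(xor, others, 0)


groups = encode_with_local_parities([1, 1, 1, 1, 0, 1], r=2)
# Erase the first symbol of the first group and rebuild it from 2 other symbols.
recovered = repair(groups[0], erased_pos=0)
```

Only the symbols of one small group are read during repair, which is the point of locality: the amount of transferred data does not grow with the total number of servers.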

SLIDE 13

Batch codes

Proposed in [Ishai, Kushilevitz, Ostrovsky, Sahai 2004]. Can be used in:

  • Load balancing.
  • Private information retrieval.
  • Distributed storage systems.

Constructions [Ishai et al. 2004]: algebraic, expander graphs, subsets, Reed-Muller (RM) codes, locally decodable codes.

SLIDE 16

Prior art

Design-based constructions and bounds:
  • [Stinson, Wei, Paterson 2009]
  • [Brualdi, Kiernan, Meyer, Schroeder 2010]
  • [Bujtas, Tuza 2011]
  • [Bhattacharya, Ruj, Roy 2012]
  • [Silberstein, Gal 2013]

Application to distributed storage:
  • [Rawat, Papailiopoulos, Dimakis, Vishwanath 2014]
  • [Silberstein 2014]

Graph-based constructions:
  • [Dimakis, Gal, Rawat, Song 2014]

SLIDE 19

Batch codes

Definition [Ishai et al. 2004]. C is a (k, N, t, n, ν)_Σ batch code over Σ if it encodes any string x = (x_1, x_2, …, x_k) ∈ Σ^k into n strings (buckets) of total length N over Σ, namely y_1, y_2, …, y_n, such that for each t-tuple (batch) of (not necessarily distinct) indices i_1, i_2, …, i_t ∈ [k], the symbols x_{i_1}, x_{i_2}, …, x_{i_t} can be retrieved by t users, respectively, by reading ≤ ν symbols from each bucket, such that x_{i_ℓ} is recovered from the symbols read by the ℓ-th user alone.

Definition. If ν = 1, we use the notation (k, N, t, n)_Σ: only one symbol is read from each bucket.

Definition. A (k, N, t, n, ν)_q batch code is linear if every symbol in every bucket is a linear combination of the original symbols.

SLIDE 21

Small buckets

In what follows, we consider linear codes with ν = 1 and N = n: each encoded bucket contains exactly one symbol of F_q.

SLIDE 23

Linear batch codes

For simplicity, we refer to a linear (k, N = n, t, n)_q batch code as an [n, k, t]_q batch code.

Let x = (x_1, x_2, …, x_k) be an information string, and let y = (y_1, y_2, …, y_n) be an encoding of x. Each encoded symbol y_i, i ∈ [n], is written as

\[ y_i = \sum_{j=1}^{k} g_{j,i} \, x_j . \]

Form the k × n matrix

\[ G = \big( g_{j,i} \big)_{j \in [k],\, i \in [n]} ; \]

the encoding is y = xG.

SLIDE 25

Retrieval

Theorem. Let C be an [n, k, t]_q batch code. It is possible to retrieve x_{i_1}, x_{i_2}, …, x_{i_t} simultaneously if and only if there exist t non-intersecting sets T_1, T_2, …, T_t of indices of columns of G such that, for every r ∈ [t], some linear combination of the columns of G indexed by T_r equals the column vector e_{i_r}^T.

Example [Ishai et al. 2004]. Consider the following linear binary batch code C whose 4 × 9 generator matrix is given by

\[ G = \begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1
\end{pmatrix} . \]

SLIDE 26

Retrieval (cont.)

Example. Let x = (x_1, x_2, x_3, x_4) and y = xG. Assume that we want to retrieve the values of (x_1, x_1, x_2, x_2). We can retrieve them from the following set of equations:

\[ \begin{cases}
x_1 = y_1 \\
x_1 = y_2 + y_3 \\
x_2 = y_5 + y_8 \\
x_2 = y_4 + y_6 + y_7 + y_9
\end{cases} \]

It is straightforward to verify that any 4-tuple (x_{i_1}, x_{i_2}, x_{i_3}, x_{i_4}), where i_1, i_2, i_3, i_4 ∈ [4], can be retrieved by using columns indexed by some four non-intersecting sets of indices in [9]. Therefore, the code C is a [9, 4, 4]_2 batch code.
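The retrieval equations and the "straightforward to verify" claim can be checked by exhaustive search. The sketch below assumes the generator matrix y = (x1, x2, x1+x2, x3, x4, x3+x4, x1+x3, x2+x4, x1+x2+x3+x4), which is consistent with the four equations above. The helper names (`encode`, `recovery_sets`, `can_serve`, `is_batch_code`) are illustrative and not part of the talk.

```python
from itertools import combinations, combinations_with_replacement

# Assumed generator matrix of the [9, 4, 4]_2 example (rows correspond to x1..x4).
G = [
    [1, 0, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0, 1, 1],
]


def encode(x, G):
    """Compute y = xG over F_2."""
    n = len(G[0])
    return [sum(x[j] * G[j][i] for j in range(len(G))) % 2 for i in range(n)]


def recovery_sets(G, i, max_size=None):
    """All column sets whose sum over F_2 equals the unit vector e_i."""
    k, n = len(G), len(G[0])
    max_size = n if max_size is None else max_size
    target = tuple(int(j == i) for j in range(k))
    found = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(n), size):
            total = tuple(sum(G[j][c] for c in cols) % 2 for j in range(k))
            if total == target:
                found.append(frozenset(cols))
    return found


def can_serve(request, sets_by_symbol, used=frozenset()):
    """Backtracking search for pairwise disjoint recovery sets, one per request."""
    if not request:
        return True
    head, rest = request[0], request[1:]
    return any(
        can_serve(rest, sets_by_symbol, used | s)
        for s in sets_by_symbol[head]
        if not (s & used)
    )


def is_batch_code(G, t, max_size=None):
    """Check every multiset of t requested indices by exhaustive search."""
    k = len(G)
    sets_by_symbol = [recovery_sets(G, i, max_size) for i in range(k)]
    return all(
        can_serve(request, sets_by_symbol)
        for request in combinations_with_replacement(range(k), t)
    )


x = [1, 0, 1, 1]
y = encode(x, G)
# The four retrieval equations for the request (x1, x1, x2, x2), 0-based indices.
assert x[0] == y[0] == (y[1] + y[2]) % 2
assert x[1] == (y[4] + y[7]) % 2 == (y[3] + y[5] + y[6] + y[8]) % 2
print(is_batch_code(G, t=4))
```

Over F_2 a linear combination of columns is a subset sum, so enumerating subsets is enough here; the search is exponential in n and is only meant for toy sizes like this one.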

SLIDE 28

Properties of linear batch codes

Theorem. Let C be an [n, k, t]_2 batch code over F_2. Then, G is a generator matrix of a classical error-correcting [n, k, ≥ t]_2 code.

Example. The converse is not true. For example, take G to be a generator matrix of the classical [4, 3, 2]_2 ECC as follows:

\[ G = \begin{pmatrix}
1 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1
\end{pmatrix} . \]

Let x = (x_1, x_2, x_3), y = (y_1, y_2, y_3, y_4) = xG. It is impossible to retrieve (x_2, x_3): the only ways to recover these symbols are x_2 = y_1 + y_2 = y_3 + y_4 and x_3 = y_1 + y_3 = y_2 + y_4, and every recovery set for x_2 shares an index with every recovery set for x_3.
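Both the theorem and the counterexample can be checked numerically on small codes. The sketch below computes the minimum distance by enumerating codewords and lists the recovery sets of x2 and x3. The two matrices are assumptions chosen to be consistent with the equations stated in the examples, and all function names are illustrative.

```python
from itertools import combinations, product


def codewords(G):
    """All codewords xG over F_2."""
    k, n = len(G), len(G[0])
    return [
        tuple(sum(x[j] * G[j][i] for j in range(k)) % 2 for i in range(n))
        for x in product((0, 1), repeat=k)
    ]


def minimum_distance(G):
    """Minimum Hamming weight of a nonzero codeword (valid for linear codes)."""
    return min(sum(c) for c in codewords(G) if any(c))


def recovery_sets(G, i):
    """Column subsets whose sum over F_2 is the unit vector e_i."""
    k, n = len(G), len(G[0])
    target = tuple(int(j == i) for j in range(k))
    return [
        set(cols)
        for size in range(1, n + 1)
        for cols in combinations(range(n), size)
        if tuple(sum(G[j][c] for c in cols) % 2 for j in range(k)) == target
    ]


# Assumed [4, 3, 2]_2 matrix: y = (x1, x1+x2, x1+x3, x1+x2+x3).
G_ecc = [
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
# Assumed [9, 4, 4]_2 batch-code matrix from the earlier example.
G_batch = [
    [1, 0, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0, 1, 1],
]

sets_x2 = recovery_sets(G_ecc, 1)  # recovery sets for x2 (0-based column indices)
sets_x3 = recovery_sets(G_ecc, 2)  # recovery sets for x3
# Every recovery set for x2 shares a column with every recovery set for x3,
# so the request (x2, x3) cannot be served from disjoint sets.
always_collide = all(a & b for a in sets_x2 for b in sets_x3)
```

Here `minimum_distance(G_ecc)` equals 2, yet the code fails as a batch code already for t = 2, which is exactly the failure of the converse.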

SLIDE 30

Bounds on the parameters

Various well-studied properties of linear ECCs, such as the MacWilliams identities, apply also to linear batch codes (for ν = 1, n = N and q = 2). A variety of bounds on the parameters of ECCs, such as the sphere-packing bound, the Plotkin bound, the Griesmer bound, the Elias-Bassalygo bound and the McEliece-Rodemich-Rumsey-Welch bound, apply to the parameters of [n, k, t]_2 batch codes.

SLIDE 31

Restricted Query Size

Definition. A primitive (k, n, r, t) batch code C with restricted query size over an alphabet Σ encodes a string x ∈ Σ^k into a string y = C(x) ∈ Σ^n, such that for all multisets of indices {i_1, i_2, …, i_t}, where all i_j ∈ [k], each of the entries x_{i_1}, x_{i_2}, …, x_{i_t} can be retrieved independently of each other by reading at most r symbols of y.

SLIDE 32

Related Works

[Gopalan, Huang, Simitci, Yekhanin 2012] [Forbes, Yekhanin 2014] [Rawat, Papailiopoulos, Dimakis, Vishwanath 2010] [Rawat, Mazumdar, Vishwanath 2014] [Tamo, Barg 2014]

SLIDE 34

Main Theorem

Lemma. Let C be a linear (k, n, r, t) batch code over F, x ∈ F^k, y = C(x). Let S_1, S_2, …, S_t ⊆ [n] be t disjoint recovery sets for the coordinate x_i. Then, there exist indices ℓ_2 ∈ S_2, ℓ_3 ∈ S_3, …, ℓ_t ∈ S_t such that, if we fix the values of all coordinates of y indexed by the sets S_1, S_2\{ℓ_2}, S_3\{ℓ_3}, …, S_t\{ℓ_t}, then the values of the coordinates of y indexed by {ℓ_2, ℓ_3, …, ℓ_t} are uniquely determined.

Theorem. Let C be a linear (k, n, r, t) batch code over F with the minimum distance d. Then,

\[ d \le n - k - (t-1)\left(\left\lceil \frac{k}{rt - t + 1} \right\rceil - 1\right) + 1 . \]
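Solving the inequality of the theorem for n gives an equivalent lower bound on the code length. As a sanity check (an observation added here, not stated on the slide), setting t = 1 removes the extra term and leaves the classical Singleton bound:

```latex
\[
n \;\ge\; k + d - 1 + (t-1)\left(\left\lceil \frac{k}{rt - t + 1} \right\rceil - 1\right),
\qquad\text{and for } t = 1:\quad n \;\ge\; k + d - 1 .
\]
```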

SLIDE 35

Algorithm

Input: linear (k, n, r, t) batch code C
C_0 = C
j = 0
while |C_j| > 1 do
    j = j + 1
    Choose the multiset {i_j^1, i_j^2, …, i_j^t} ⊆ [k] and disjoint subsets S_j^1, …, S_j^t ⊆ [n], where S_j^ℓ is a recovery set for the information bit i_j^ℓ, such that there exist at least two codewords in C_{j−1} that differ in (at least) one coordinate
    Let σ_j ∈ Σ^{|S_j|} be the most frequent element in the multiset {x|_{S_j} : x ∈ C_{j−1}}, where S_j = S_j^1 ∪ ⋯ ∪ S_j^t
    Define C_j ≜ {x : x ∈ C_{j−1}, x|_{S_j} = σ_j}
end while
Output: C_{j−1}
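The loop above can be sketched in a few lines, with one simplification: the choice of the recovery sets S_j is delegated to a caller-supplied function, since finding disjoint recovery sets is a separate problem. The names `restrict_code` and `choose_sets`, and the toy [4, 3, 2]_2 code used to run it, are assumptions for illustration only.

```python
from collections import Counter
from itertools import product


def restrict_code(codewords, choose_sets):
    """Iteratively shrink a code by fixing coordinates, following the loop above.

    `choose_sets(current, j)` is a hypothetical caller-supplied rule returning
    the coordinate set S_j for step j. The function returns C_{j-1}, the last
    subcode that still contains more than one codeword.
    """
    current = list(codewords)
    previous = current
    j = 0
    while len(current) > 1:
        j += 1
        S = sorted(choose_sets(current, j))
        restrictions = Counter(tuple(c[i] for i in S) for c in current)
        if len(restrictions) == 1:
            # The step requires two codewords that differ on the chosen set.
            raise ValueError("chosen set does not separate any two codewords")
        sigma = restrictions.most_common(1)[0][0]  # most frequent pattern on S_j
        previous = current
        current = [c for c in current if tuple(c[i] for i in S) == sigma]
    return previous


# Toy run on an assumed [4, 3, 2]_2 code with a fixed schedule of sets.
G = [[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
code = [
    tuple(sum(x[j] * G[j][i] for j in range(3)) % 2 for i in range(4))
    for x in product((0, 1), repeat=3)
]
schedule = [{0, 1}, {2, 3}]
survivors = restrict_code(code, lambda current, j: schedule[j - 1])
```

Picking the most frequent pattern keeps at least a 1/|Σ|^{|S_j|} fraction of the codewords at every step, which is what drives the counting argument behind the bound.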

SLIDE 37

Extensions of the Main Theorem

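Corollary. Let C be a linear (k, n, r, t) batch code over F with the minimum distance d. Then,

\[ n \ge \max_{1 \le \beta \le t,\ \beta \in \mathbb{N}} \left\{ (\beta - 1)\left(\left\lceil \frac{k}{r\beta - \beta + 1} \right\rceil - 1\right) + k + d - 1 \right\} . \]

Corollary. Let C be a linear systematic (k, n, r, t) batch code over F with the minimum distance d. Then,

\[ n \ge \max_{2 \le \beta \le t,\ \beta \in \mathbb{N}} \left\{ (\beta - 1)\left(\left\lceil \frac{k}{r\beta - \beta - r + 2} \right\rceil - 1\right) + k + d - 1 \right\} . \]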
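The two corollaries are easy to evaluate numerically. The following sketch computes both right-hand sides; the function names are illustrative, and the parameter values in the usage lines are the ones used in the examples of this talk.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def length_bound(k, r, t, d):
    """Right-hand side of the corollary for general linear (k, n, r, t) batch codes."""
    return max(
        (beta - 1) * (ceil_div(k, r * beta - beta + 1) - 1) + k + d - 1
        for beta in range(1, t + 1)
    )


def length_bound_systematic(k, r, t, d):
    """Right-hand side of the corollary for systematic codes (requires t >= 2)."""
    return max(
        (beta - 1) * (ceil_div(k, r * beta - beta - r + 2) - 1) + k + d - 1
        for beta in range(2, t + 1)
    )


general = length_bound(k=12, r=2, t=3, d=3)
systematic = length_bound_systematic(k=3, r=2, t=4, d=4)
```

Integer arithmetic is used for the ceiling to avoid floating-point rounding for large k.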

SLIDE 38

Example

Consider the batch code obtained from the [7, 3, 4] simplex code. It was shown in [Wang, Kiah, Cassuto 2015] that the linear code formed by the generator matrix

\[ G = \begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 1
\end{pmatrix} \]

is a (3, 7, 2, 4) batch code with the minimum distance d = 4. Here r = 2 and t = 4. Pick β = 2. The right-hand side of the bound for systematic codes can be rewritten as

\[ (2 - 1)\left(\left\lceil \frac{3}{2 \cdot 2 - 2 - 2 + 2} \right\rceil - 1\right) + 3 + 4 - 1 = 7 , \]

and therefore the bound is attained with equality for β = 2.
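The (3, 7, 2, 4) property of the simplex code can be confirmed by brute force. The sketch below represents each column of the generator matrix as a 3-bit integer (all nonzero vectors of F_2^3, in an assumed systematic order) and searches for disjoint recovery sets of size at most r. All names are illustrative.

```python
from itertools import combinations, combinations_with_replacement

# Columns of a [7, 3, 4] simplex generator matrix as 3-bit integers
# (bit j set means that x_{j+1} appears in that encoded symbol).
COLUMNS = [0b001, 0b010, 0b100, 0b011, 0b101, 0b110, 0b111]


def small_recovery_sets(columns, symbol, r):
    """Column sets of size at most r whose XOR is the unit vector of `symbol`."""
    target = 1 << symbol
    sets = []
    for size in range(1, r + 1):
        for cols in combinations(range(len(columns)), size):
            acc = 0
            for c in cols:
                acc ^= columns[c]
            if acc == target:
                sets.append(frozenset(cols))
    return sets


def serves(request, sets, used=frozenset()):
    """Backtracking search for disjoint recovery sets, one per requested symbol."""
    if not request:
        return True
    return any(
        serves(request[1:], sets, used | s)
        for s in sets[request[0]]
        if not (s & used)
    )


def is_restricted_batch_code(columns, k, r, t):
    """Check all multisets of t requests using recovery sets of size <= r."""
    sets = [small_recovery_sets(columns, i, r) for i in range(k)]
    return all(
        serves(request, sets)
        for request in combinations_with_replacement(range(k), t)
    )


ok = is_restricted_batch_code(COLUMNS, k=3, r=2, t=4)
```

Each symbol has one recovery set of size 1 and three of size 2, so four requests for the same symbol use all seven columns, which matches n = 7 meeting the bound with equality.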

SLIDE 39

Further Improvements

Assume that µ_j = 1 for all 1 ≤ j ≤ τ (i.e., in each step j of the algorithm, the set S_j recovers multiple copies of one symbol). Additionally, assume that k ≥ 2(rt − t + 1) + 1. Let ε and λ be positive integers, and define the following quantities.

SLIDE 40

Further Improvements (cont.)

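\[ A = A(k, r, d, \beta, \epsilon) \triangleq (\beta - 1)\left(\left\lceil \frac{k + \epsilon}{r\beta - \beta + 1} \right\rceil - 1\right) + k + d - 1 , \]

\[ B = B(k, r, d, \beta, \lambda) \triangleq (\beta - 1)\left(\left\lceil \frac{k + \lambda}{r\beta - \beta + 1} \right\rceil - 1\right) + k + d - 1 , \]

\[ C = C(k, r, \beta, \lambda, \epsilon) \triangleq (r\beta - \lambda + 1)\,k - \binom{k}{2}(\epsilon - 1) . \]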

SLIDE 41

Improved Bound

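Theorem. Let C be a linear (k, n, r, t) batch code with the minimum distance d. Then,

\[ n \ge \max_{\beta \in \mathbb{N} \cap \left[1,\ \min\left\{t,\ \left\lfloor \frac{k-3}{2(r-1)} \right\rfloor\right\}\right]} \left\{ \max_{\epsilon, \lambda \in \mathbb{N} \cap [1,\ r\beta - \beta]} \big\{ \min\{A, B, C\} \big\} \right\} . \]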

SLIDE 42

Example

Take k = 12, r = 2 and t = 3. The maximum of the right-hand side of the corollary to the Main Theorem is obtained when β = 3. For that selection of parameters, we have n ≥ 15 + d ≥ 18 (using d ≥ t = 3). At the same time, by taking β = 3, λ = 1 and ε = 1 in the improved bound, we obtain A = B = 17 + d and C = 6 · 12 − 0 = 72, and so n ≥ min{17 + d, 72} ≥ 20.
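The numbers in this example can be reproduced with a short script. The sketch below evaluates A, B and C and maximizes min{A, B, C} over the ranges of β, ε and λ given in the improved bound. It assumes that the stacked "k over 2" in the definition of C is the binomial coefficient, that r ≥ 2, and that k ≥ 2(rt − t + 1) + 1 holds. The function names are illustrative.

```python
from math import comb


def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def abc(k, r, d, beta, eps, lam):
    """Quantities A, B, C of the improved bound (C uses binomial(k, 2), an assumption)."""
    denom = r * beta - beta + 1
    A = (beta - 1) * (ceil_div(k + eps, denom) - 1) + k + d - 1
    B = (beta - 1) * (ceil_div(k + lam, denom) - 1) + k + d - 1
    C = (r * beta - lam + 1) * k - comb(k, 2) * (eps - 1)
    return A, B, C


def improved_bound(k, r, t, d):
    """Maximize min{A, B, C} over beta, eps and lam in the stated ranges (r >= 2)."""
    beta_max = min(t, (k - 3) // (2 * (r - 1)))
    best = None
    for beta in range(1, beta_max + 1):
        for eps in range(1, r * beta - beta + 1):
            for lam in range(1, r * beta - beta + 1):
                value = min(abc(k, r, d, beta, eps, lam))
                best = value if best is None else max(best, value)
    return best


A, B, C = abc(k=12, r=2, d=3, beta=3, eps=1, lam=1)
bound = improved_bound(k=12, r=2, t=3, d=3)
```

With d = 3 this gives A = B = 20 and C = 72 at β = 3, λ = ε = 1, in line with the values 17 + d and 72 quoted above.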

SLIDE 43

Thank you!

Questions?
