SLIDE 1

Sparser Johnson-Lindenstrauss Transforms

Jelani Nelson

Princeton

February 16, 2012

joint work with Daniel Kane (Stanford)

SLIDES 2-5

Random Projections

  • x ∈ R^d, d huge
  • store y = Sx, where S is a k × d matrix (compression)
  • compressed sensing (recover x from y when x is (near-)sparse)
  • group testing (as above, but Sx is Boolean multiplication)
  • recover properties of x (entropy, heavy hitters, . . .)
  • approximate norm preservation (want ‖y‖₂ ≈ ‖x‖₂)
  • motif discovery (slightly different; randomly project discrete x onto a subset of its coordinates) [Buhler-Tompa]
  • In many of these applications, a random S is either required or obtains better parameters than deterministic constructions.

SLIDES 6-7

Metric Johnson-Lindenstrauss lemma

Metric JL (MJL) Lemma, 1984

Every set of n points in Euclidean space can be embedded into O(ε⁻² log n)-dimensional Euclidean space so that all pairwise distances are preserved up to a 1 ± ε factor. (A worked example follows the list below.)

Uses:

  • Speed up geometric algorithms by first reducing the dimension of the input [Indyk-Motwani, 1998], [Indyk, 2001]
  • Low-memory streaming algorithms for linear algebra problems [Sarlós, 2006], [LWMRT, 2007], [Clarkson-Woodruff, 2009]
  • Essentially equivalent to RIP matrices from compressed sensing [Baraniuk et al., 2008], [Krahmer-Ward, 2011] (used for recovery of sparse signals)
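
A worked example of the target dimension, assuming the k ≈ 4ε⁻² log(1/δ) constant quoted in the sparsity table later in this deck, with δ = 1/n² as in the DJL-based proof below:

```latex
n = 10^6,\quad \varepsilon = 0.1,\quad \delta = 1/n^2 = 10^{-12}
\;\Rightarrow\;
k \approx 4\varepsilon^{-2}\ln(1/\delta) = 400\cdot\ln(10^{12})
\approx 400 \cdot 27.6 \approx 1.1\times 10^{4}
```

independent of the original dimension d.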

SLIDES 8-10

How to prove the JL lemma

Distributional JL (DJL) Lemma

For any 0 < ε, δ < 1/2 there exists a distribution D_{ε,δ} on R^{k×d} matrices for k = O(ε⁻² log(1/δ)) so that for any x of unit norm,

    Pr_{S∼D_{ε,δ}} [ |‖Sx‖₂² − 1| > ε ] < δ.

Proof of MJL: Set δ = 1/n² in DJL and take x to be the (normalized) difference vector of some pair of points. Union bound over the (n choose 2) pairs. (Arithmetic spelled out below.)

Theorem (Alon, 2003)

For every n, there exists a set of n points requiring target dimension k = Ω((ε⁻²/log(1/ε)) log n).

Theorem (Jayram-Woodruff, 2011; Kane-Meka-N., 2011)

For DJL, k = Θ(ε⁻² log(1/δ)) is optimal.
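
Spelling out the union-bound arithmetic behind the proof of MJL:

```latex
\Pr[\text{some pair is distorted}]
\;\le\; \binom{n}{2}\cdot\delta
\;=\; \frac{n(n-1)}{2}\cdot\frac{1}{n^2}
\;<\; \frac{1}{2}
```

so a random S drawn from D_{ε,1/n²} preserves all pairwise distances simultaneously with probability greater than 1/2; in particular, a suitable embedding exists.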

SLIDES 11-12

Proving the JL lemma

Older proofs

  • [Johnson-Lindenstrauss, 1984], [Frankl-Maehara, 1988]: Random rotation, then projection onto the first k coordinates.
  • [Indyk-Motwani, 1998], [Dasgupta-Gupta, 2003]: Random matrix with independent Gaussian entries.
  • [Achlioptas, 2001]: Independent ±1 entries.
  • [Clarkson-Woodruff, 2009]: O(log(1/δ))-wise independent ±1 entries.
  • [Arriaga-Vempala, 1999], [Matousek, 2008]: Independent entries having mean 0, variance 1/k, and subgaussian tails.

Downside: Performing the embedding is a dense matrix-vector multiplication, taking O(k · ‖x‖₀) time (sketched below).
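
A minimal numpy sketch of the dense approach (Achlioptas-style independent ±1/√k entries; the function and parameter names are illustrative, not from the talk):

```python
import numpy as np

def dense_jl(x, k, rng):
    """Dense JL embedding: a k x d matrix of independent +-1/sqrt(k) entries.
    Applying it is a dense matrix-vector product: O(k * ||x||_0) time."""
    d = len(x)
    S = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)
    return S @ x

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = dense_jl(x, k=400, rng=rng)
print(np.linalg.norm(y) / np.linalg.norm(x))  # ratio concentrates near 1
```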

SLIDES 13-14

Fast JL Transforms

  • [Ailon-Chazelle, 2006]: x → PHDx, O(d log d + k³) time. P is a random sparse matrix, H is Hadamard, D has random ±1 on the diagonal.
  • [Ailon-Liberty, 2008]: O(d log k + k²) time, also based on the fast Hadamard transform.
  • [Ailon-Liberty, 2011] and [Krahmer-Ward, 2011]: O(d log d) for MJL, but with suboptimal k = O(ε⁻² log n log⁴ d).

Downside: Slow to embed sparse vectors: running time is Ω(min{k · ‖x‖₀, d log d}) (see the sketch below).
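
A minimal sketch of the x → PHDx pattern. To keep it short, P here is uniform row sampling (the subsampled randomized Hadamard transform), a simplification of the sparse Gaussian P in [Ailon-Chazelle]; names are illustrative:

```python
import numpy as np

def fwht(v):
    """In-place unnormalized fast Walsh-Hadamard transform, O(d log d).
    len(v) must be a power of 2."""
    d = len(v)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def phd_embed(x, k, rng):
    d = len(x)                                   # assume d is a power of 2
    z = rng.choice([-1.0, 1.0], size=d) * x      # D: random +-1 diagonal
    z = fwht(z) / np.sqrt(d)                     # H: orthonormal Hadamard
    rows = rng.choice(d, size=k, replace=False)  # P: sample k rows uniformly
    return np.sqrt(d / k) * z[rows]              # rescale so norms are unbiased

rng = np.random.default_rng(0)
x = rng.normal(size=2**13)
print(np.linalg.norm(phd_embed(x, k=512, rng=rng)) / np.linalg.norm(x))
```

Note the total work is O(d log d) regardless of ‖x‖₀, which is exactly the downside named above.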

SLIDE 15

Where Do Sparse Vectors Show Up?

  • Document as bag of words: x_i = number of occurrences of word i. Compare documents using cosine similarity. d = lexicon size; most documents aren't dictionaries.
  • Network traffic: x_{i,j} = #bytes sent from i to j. d = 2⁶⁴ (2²⁵⁶ in IPv6); most servers don't talk to each other.
  • User ratings: x_i is a user's score for movie i on Netflix. d = #movies; most people haven't rated all movies.
  • Streaming: x receives a stream of updates of the form "add v to x_i". Maintaining Sx requires calculating v · Se_i.
  • . . .

SLIDES 16-17

Sparse JL transforms

One way to embed sparse vectors faster: use sparse matrices.

Let s = #non-zero entries per column in the embedding matrix (so embedding time is s · ‖x‖₀).

    reference                       value of s             type
    [JL84], [FM88], [IM98], . . .   k ≈ 4ε⁻² log(1/δ)      dense
    [Achlioptas01]                  k/3                    sparse Bernoulli
    [WDALS09]                       (no proof)             hashing
    [DKS10]                         Õ(ε⁻¹ log³(1/δ))       hashing
    [KN10a], [BOR10]                Õ(ε⁻¹ log²(1/δ))       hashing
    [KN12]                          O(ε⁻¹ log(1/δ))        hashing (random codes)

SLIDES 18-19

Other related work

  • The CountSketch of [Charikar-Chen-FarachColton] gives s = O(log(1/δ)) (see [Thorup-Zhang])
  • Can recover (1 ± ε)‖x‖₂ from Sx, but not as ‖Sx‖₂ (not an embedding into ℓ₂)
  • Not applicable in certain situations, e.g. in some nearest neighbor data structures, and when learning classifiers over projected vectors via stochastic gradient descent

SLIDES 20-22

Sparse JL Constructions

  • [DKS, 2010]: s = Θ̃(ε⁻¹ log²(1/δ))
  • [this work]: s = Θ(ε⁻¹ log(1/δ)), in two variants (the block variant uses blocks of k/s rows, pictured next)

SLIDE 23

Sparse JL Constructions (in matrix form)

[figure: two k × d matrices: one with s nonzero cells scattered within each column, and one with each column split into s blocks of k/s rows and one nonzero cell per block]

Each black cell is ±1/√s at random.

SLIDE 24

Sparse JL Constructions (nicknames)

[figure: the same two matrices, labeled the "Graph" construction and the "Block" construction (blocks of k/s rows)]

SLIDES 25-27

Sparse JL notation (block construction)

  • Let h(j, r), σ(j, r) be the random hash location and random sign for the copy of x_j in the r-th block (sketched in code below).
  • (Sx)_i = (1/√s) · Σ_{(j,r): h(j,r)=i} x_j · σ(j, r)
  • ‖Sx‖₂² = ‖x‖₂² + (1/s) · Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i,r) σ(j,r) · 1[h(i,r) = h(j,r)]
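
A minimal numpy sketch of this block construction, with h and σ drawn fully at random (the variant analyzed at the end of the talk); `block_sparse_jl` and its parameters are illustrative names:

```python
import numpy as np

def block_sparse_jl(x, k, s, rng):
    """Block construction: the k rows are split into s blocks of size k/s;
    coordinate j contributes one +-1/sqrt(s) entry per block, at h(j, r)."""
    d = len(x)
    q = k // s                                    # block size k/s
    y = np.zeros(k)
    h = rng.integers(0, q, size=(d, s))           # h(j, r): location in block r
    sigma = rng.choice([-1.0, 1.0], size=(d, s))  # sigma(j, r): random sign
    for j in np.nonzero(x)[0]:                    # s * ||x||_0 updates total
        for r in range(s):
            y[r * q + h[j, r]] += x[j] * sigma[j, r]
    return y / np.sqrt(s)

# Quick check: ||Sx||_2^2 concentrates around ||x||_2^2 = 1 for a sparse unit x.
rng = np.random.default_rng(1)
x = np.zeros(4096); x[:64] = rng.normal(size=64); x /= np.linalg.norm(x)
print(np.linalg.norm(block_sparse_jl(x, k=512, s=8, rng=rng))**2)
```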

SLIDES 28-29

Sparse JL via Codes

View each column's nonzero pattern as a codeword.

  • Graph construction: constant-weight binary code of weight s (sketched below).
  • Block construction: code over a q-ary alphabet, q = k/s.
  • Thm: Just need distance s − O(s²/k) for the block construction (2s − O(s²/k) for the graph construction), i.e. any two codewords agree on at most O(s²/k) coordinates.
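
For contrast with the block sketch above, a minimal sketch of the graph construction, where each column's support is a uniformly random weight-s codeword (illustrative names again):

```python
import numpy as np

def graph_sparse_jl(x, k, s, rng):
    """Graph construction: each column has exactly s nonzeros, +-1/sqrt(s),
    in s distinct rows chosen at random (a random weight-s binary codeword)."""
    y = np.zeros(k)
    for j in np.nonzero(x)[0]:                       # s * ||x||_0 updates total
        rows = rng.choice(k, size=s, replace=False)  # support of column j
        y[rows] += x[j] * rng.choice([-1.0, 1.0], size=s)
    return y / np.sqrt(s)
```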

SLIDES 30-31

Analysis (block construction)

  • η_{i,j,r} indicates whether i and j collide in the r-th block.
  • ‖Sx‖₂² = ‖x‖₂² + Z, where

        Z = (1/s) Σ_r Z_r,    Z_r = Σ_{i≠j} x_i x_j σ(i,r) σ(j,r) η_{i,j,r}

  • Z is a quadratic form in σ, so apply known moment bounds for quadratic forms.

SLIDE 32

Analysis

Theorem (Hanson-Wright, 1971)

Let σ₁, . . ., σ_n be independent ±1 and B ∈ R^{n×n} symmetric. For λ > 0,

    Pr[ |σᵀBσ − trace(B)| > λ ] < e^{−C·min{λ²/‖B‖_F², λ/‖B‖₂}}

Reminder:

  • ‖B‖_F = (Σ_{i,j} B_{i,j}²)^{1/2}
  • ‖B‖₂ is the largest magnitude of an eigenvalue of B

SLIDES 33-34

Analysis

Z = (1/s) · Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i,r) σ(j,r) η_{i,j,r} = σᵀBσ,

where B = (1/s) · diag(B₁, B₂, . . ., B_s) is block diagonal with

    (B_r)_{i,j} = x_i x_j η_{i,j,r} for i ≠ j (and zero diagonal, so trace(B) = 0).

SLIDES 35-37

Frobenius norm bound

B = (1/s) · diag(B₁, . . ., B_s),    (B_r)_{i,j} = x_i x_j η_{i,j,r}

‖B‖_F² = (1/s²) Σ_{i≠j} x_i² x_j² · (#times i, j collide)

With a good code, each pair i, j collides in at most O(s²/k) of the s blocks, so

‖B‖_F² ≤ (1/s²) · O(s²/k) · (Σ_i x_i²)² = O(1/k) · ‖x‖₂⁴ = O(1/k)    (good code!)

SLIDES 38-42

Operator norm bound

Write B_r = S_r − D, where (S_r)_{i,i} = x_i² and (S_r)_{i,j} = x_i x_j η_{i,j,r} for i ≠ j, and D = diag(x₁², . . ., x_d²).

  • ‖D‖₂ = ‖x‖_∞² ≤ 1
  • S_r = Σ_{i=1}^{k/s} u_{r,i} u_{r,i}ᵀ, where u_{r,i} is the projection of x onto the coordinates hashing to location i in the r-th block; the (normalized) u_{r,i} are the eigenvectors of S_r, so ‖S_r‖₂ = max_i ‖u_{r,i}‖₂² ≤ ‖x‖₂² = 1
  • S_r and D are both positive semidefinite, so the eigenvalues of B_r = S_r − D lie in [−‖D‖₂, ‖S_r‖₂], giving ‖B_r‖₂ ≤ max{‖S_r‖₂, ‖D‖₂} ≤ 1
  • Since B is block diagonal, ‖B‖₂ = (1/s) · max_r ‖B_r‖₂ ≤ 1/s

SLIDES 43-44

Wrapping up analysis

  • ‖B‖_F² = O(1/k)
  • ‖B‖₂ ≤ 1/s

Apply Hanson-Wright (recall trace(B) = 0; the substitution is written out below):

    Pr[|Z| > ε] < e^{−C′·min{ε²k, εs}}

Taking k = Ω(ε⁻² log(1/δ)) and s = Ω(ε⁻¹ log(1/δ)) makes this < δ. QED
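
The substitution into Hanson-Wright, written out (constants absorbed into C′):

```latex
\min\left\{\frac{\varepsilon^2}{\|B\|_F^2},\;\frac{\varepsilon}{\|B\|_2}\right\}
\;\ge\;\min\left\{\Omega(\varepsilon^2 k),\;\varepsilon s\right\}
\;\ge\;\Omega(\log(1/\delta))
\quad\text{once}\quad
k=\Omega(\varepsilon^{-2}\log(1/\delta)),\;\; s=\Omega(\varepsilon^{-1}\log(1/\delta))
```

so the failure probability e^{−C′·min{ε²k, εs}} is at most δ.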

SLIDES 45-48

Code-based Construction: Caveat

Need a sufficiently good code.

  • Each pair of codewords should agree on O(s²/k) coordinates.
  • Can get this with a random code by Chernoff + union bound over pairs, but then we need s²/k ≥ log(d/δ), i.e.

        s ≥ √(k log(d/δ)) = Ω(ε⁻¹ √(log(d/δ) · log(1/δ))).

  • Slightly better: can assume d = O(ε⁻²/δ) by first embedding into this dimension with s = 1 (analysis: Chebyshev's inequality) ⇒ can get away with s = O(ε⁻¹ √(log(1/(εδ)) · log(1/δ))).

Can we avoid the loss incurred by this union bound?

SLIDES 49-52

Improving the Construction

  • Pick h at random.
  • Analysis: Directly bound the ℓ = log(1/δ)-th moment of the error term Z, then apply Markov's inequality: Pr[|Z| > ε] < ε⁻ℓ · E[Z^ℓ]. (Conditioning on the event "h gives a code" is too demanding.)
  • Z = (1/s) · Σ_{r=1}^{s} Z_r, with Z_r = Σ_{i≠j} x_i x_j σ(i,r) σ(j,r) η_{i,j,r}

Expanding the power (terms where some t_i = 1 vanish since E[Z_r] = 0):

    E_{h,σ}[Z^ℓ] = (1/s^ℓ) · Σ_{n ≥ 1} Σ_{r₁<...<r_n} Σ_{t₁,...,t_n>1, Σ_i t_i = ℓ} (ℓ choose t₁, . . ., t_n) · Π_{i=1}^{n} E_{h,σ}[Z_{r_i}^{t_i}]

Bound the t-th moment of any Z_r, then get the ℓ-th moment bound for Z by plugging into the above (see the Monte Carlo sketch below for an empirical sanity check).
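
An illustrative Monte Carlo check of this tail bound, reusing `block_sparse_jl` from the earlier sketch (for unit x, |Z| = |‖Sx‖₂² − 1|):

```python
import numpy as np  # assumes block_sparse_jl from the earlier sketch is defined

rng = np.random.default_rng(2)
d, k, s, eps = 2048, 512, 8, 0.25
x = rng.normal(size=d); x /= np.linalg.norm(x)
trials = 2000
fails = sum(abs(np.linalg.norm(block_sparse_jl(x, k, s, rng))**2 - 1) > eps
            for _ in range(trials))
print(fails / trials)  # empirical Pr[|Z| > eps]: should be small
```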

SLIDES 53-55

Bounding E[Z_r^t]

  • Z_r = Σ_{i≠j} x_i x_j σ(i,r) σ(j,r) η_{i,j,r}
  • Monomials appearing in the expansion of Z_r^t are in correspondence with directed multigraphs, e.g.

        (x₁x₂) · (x₃x₄) · (x₃x₈) · (x₄x₈) · (x₂x₁₀) → [figure: the corresponding multigraph, one vertex per distinct index, one edge per factor]

  • A monomial contributes to the expectation iff all degrees are even.
  • Analysis: Group the monomials appearing in Z_r^t according to graph isomorphism class, then do some combinatorics.
  • Our analysis is tight up to a constant factor.

SLIDES 56-60

Bounding E[Z_r^t]

Let v = #vertices, m = #connected components, and d_u = degree of vertex u in a multigraph G; G_t is the set of isomorphism classes, and f maps an index sequence to its multigraph. Then

    E_{h,σ}[Z_r^t]
      = Σ_{G∈G_t} Σ_{i₁≠j₁,...,i_t≠j_t : f((i_u,j_u)_{u=1}^{t}) = G} E[Π_{u=1}^{t} η_{i_u,j_u,r}] · Π_{u=1}^{t} x_{i_u} x_{j_u}

      = Σ_{G∈G_t} (s/k)^{v−m} · ( Σ_{i₁≠j₁,...,i_t≠j_t : f((i_u,j_u)_{u=1}^{t}) = G} Π_{u=1}^{t} x_{i_u} x_{j_u} )

        (a connected component on v_c vertices fully collides with probability (s/k)^{v_c−1}; multiplying over components gives (s/k)^{v−m})

      ≤ Σ_{G∈G_t} (s/k)^{v−m} · v! / (t choose d₁/2, . . ., d_v/2)

      ≤ 2^{O(t)} · Σ_{v,m} t^{−t} v^v (s/k)^{v−m} · Σ_{G} Π_u d_u^{d_u/2}

SLIDE 61

Bounding E[Z_r^t]

  • In the end, can show

        E[Z_r^t] ≤ 2^{O(t)} · (s/k)               if t < log(k/s)
        E[Z_r^t] ≤ 2^{O(t)} · (t/log(k/s))^t      otherwise

  • Plug this into the expression for E[Z^ℓ], QED.

SLIDES 62-63

Open Problems

  • OPEN: Devise a distribution which can be sampled using few random bits. Current record: O(log d + log(1/ε) log(1/δ) + log(1/δ) log log(1/δ)) [Kane-Meka-N., 2011]. Existential: O(log d + log(1/δ)).
  • OPEN: Prove a tight lower bound on the achievable sparsity in a JL distribution.
  • OPEN: Can we have a JL matrix such that we can multiply by any k × k submatrix in k · polylog(d) time? (ultimate goal)
  • OPEN: Embed any vector in Õ(d) time into the optimal k.