Faster Johnson-Lindenstrauss style reductions. Aditya Menon, August 23, 2007.



SLIDE 1

Faster Johnson-Lindenstrauss style reductions

Aditya Menon, August 23, 2007

SLIDE 2

Outline

1. Introduction: Dimensionality reduction; The Johnson-Lindenstrauss Lemma; Speeding up computation

2. The Fast Johnson-Lindenstrauss Transform: Sparser projections; Trouble with sparse vectors?; Summary

3. Ailon and Liberty’s improvement: Bounding the mapping; The Walsh-Hadamard transform; Error-correcting codes; Putting it together

4. References

SLIDE 3 (Introduction / Dimensionality reduction)

Distances

For high-dimensional vector data, it is of interest to have a notion of distance between two vectors. Recall that the ℓp norm of a vector x is

||x||p = ( Σ_i |x_i|^p )^(1/p)

The ℓ2 norm corresponds to the standard Euclidean norm of a vector. The ℓ∞ norm is the maximal absolute value of any component:

||x||∞ = max_i |x_i|
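As a concrete illustration (a minimal numpy sketch; the example vector is ours, not from the slides), these norms can be computed directly:

```python
import numpy as np

def lp_norm(x, p):
    """The l_p norm: (sum_i |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 0.0])

l2 = lp_norm(x, 2)          # Euclidean norm: sqrt(9 + 16) = 5
linf = np.max(np.abs(x))    # l_inf norm: largest absolute component = 4
```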

SLIDE 4 (Introduction / Dimensionality reduction)

Dimensionality reduction

Suppose we’re given an input vector x ∈ R^d. We want to reduce the dimensionality of x to some k < d, while preserving the ℓp norm.

Can think of this as a metric embedding problem: can we embed ℓp^d into ℓp^k?

Formally, we have the following problem.

Problem. Suppose we are given an x ∈ R^d, and some parameters p, ε. Can we find a y ∈ R^k for some k = f(ε) so that

(1 − ε)||x||p ≤ ||y||p ≤ (1 + ε)||x||p?

SLIDE 5 (Introduction / The Johnson-Lindenstrauss Lemma)

The Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss Lemma [5] is the archetypal result for ℓ2 dimensionality reduction. It tells us that for n points, there is an ε-embedding of ℓ2^d → ℓ2^(O(log n/ε²)).

Theorem. Suppose {u_i}_{i=1...n} ⊂ R^d. Then, for ε > 0 and k = O(log n/ε²), there is a mapping f : R^d → R^k so that

(∀ i, j) (1 − ε)||u_i − u_j||2 ≤ ||f(u_i) − f(u_j)||2 ≤ (1 + ε)||u_i − u_j||2

SLIDE 6 (Introduction / The Johnson-Lindenstrauss Lemma)

Johnson-Lindenstrauss in practice

The proof of the Johnson-Lindenstrauss lemma is non-constructive (unfortunately!). In practice, we use the probabilistic method to do a Johnson-Lindenstrauss style reduction: insert randomness at the cost of an exact guarantee.

Now the guarantee becomes probabilistic.

SLIDE 7 (Introduction / The Johnson-Lindenstrauss Lemma)

Johnson-Lindenstrauss in practice

Standard version:

Theorem. Suppose {u_i}_{i=1...n} ⊂ R^d. Then, for ε > 0 and k = O(β log n/ε²), the mapping

f(u_i) = (1/√k) u_i R

where R is a d × k matrix of i.i.d. Gaussian variables, satisfies with probability at least 1 − 1/n^β:

(∀ i, j) (1 − ε)||u_i − u_j||2 ≤ ||f(u_i) − f(u_j)||2 ≤ (1 + ε)||u_i − u_j||2
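A minimal numpy sketch of this construction (the dimensions, the constant inside k, and the random seed are illustrative choices of ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, eps = 50, 1000, 0.25
k = int(4 * np.log(n) / eps**2)   # k = O(log n / eps^2); the constant 4 is illustrative

U = rng.normal(size=(n, d))       # n points in R^d
R = rng.normal(size=(d, k))       # d x k matrix of i.i.d. Gaussians
F = U @ R / np.sqrt(k)            # f(u_i) = (1/sqrt(k)) u_i R

# Check the distortion of one pairwise distance
orig = np.linalg.norm(U[0] - U[1])
proj = np.linalg.norm(F[0] - F[1])
distortion = proj / orig          # close to 1 with high probability
```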

SLIDE 8 (Introduction / Speeding up computation)

Achlioptas’ improvement

Achlioptas [1] gave an even simpler matrix construction:

R_ij = √3 × { +1 with probability 1/6; 0 with probability 2/3; −1 with probability 1/6 }

The matrix is 2/3rds sparse, and simpler to construct than a Gaussian matrix.

With no loss in accuracy!
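A numpy sketch of Achlioptas’ construction (dimensions and seed are ours; note E[R_ij²] = 3·(1/3) = 1, matching the Gaussian case):

```python
import numpy as np

rng = np.random.default_rng(1)

d, k = 2000, 400

# Entries are sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}
R = np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])

x = rng.normal(size=d)
y = (x @ R) / np.sqrt(k)          # reduced vector, as in the JL mapping

sparsity = np.mean(R == 0)        # about 2/3 of the entries are zero
ratio = np.linalg.norm(y) / np.linalg.norm(x)
```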

SLIDE 9 (Introduction / Speeding up computation)

A question

2/3rds sparsity is a good speedup in practice. But the density is still O(dk): computing the mapping is still an O(dk) operation asymptotically.

Let A = {A : for all unit x ∈ R^d, with v.h.p., (1 − ε) ≤ ||Ax||2 ≤ (1 + ε)}.

Question: For which A ∈ A can Ax be computed quicker than O(dk)?

SLIDE 10 (Introduction / Speeding up computation)

The answer?

We look at two approaches that allow for quicker computation. First is the Fast Johnson-Lindenstrauss transform, based on a Fourier transform. Next is the Ailon-Liberty transform, based on a Fourier transform and error-correcting codes!

SLIDE 11 (The Fast Johnson-Lindenstrauss Transform)

The Fast Johnson-Lindenstrauss Transform

Ailon and Chazelle [2] proposed the Fast Johnson-Lindenstrauss transform. It can speed up the ℓ2 reduction from O(dk) to (roughly) O(d log d). How? Make the projection matrix even sparser, then use some “tricks” to solve the problems associated with this.

Let’s reverse engineer the construction...

SLIDE 12 (The Fast Johnson-Lindenstrauss Transform / Sparser projections)

Sparser projection matrix

Use the projection matrix P with entries

P_ij ~ { N(0, 1/q) with probability q; 0 with probability 1 − q }, where q = min{ Θ(log² n / d), 1 }

The density of the matrix is O( (1/ε²) min{ log³ n, d log n } ). In practice, this is typically significantly sparser than Achlioptas’ matrix.
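A numpy sketch of this sparser projection matrix (the constant inside Θ(·) is taken as 1, and all dimensions are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)

d, k, n = 4096, 64, 100
q = min(np.log(n)**2 / d, 1.0)    # q = Theta(log^2 n / d); constant taken as 1

# Each entry is N(0, 1/q) with probability q, and 0 otherwise,
# so E[P_ij^2] = q * (1/q) = 1, matching the dense Gaussian case.
mask = rng.random(size=(d, k)) < q
P = np.where(mask, rng.normal(scale=np.sqrt(1.0 / q), size=(d, k)), 0.0)

density = np.mean(P != 0)         # on the order of q: far sparser than 1/3
```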

SLIDE 13-14 (The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?)

What do we lose?

Can follow standard concentration-proof methods, but we end up needing to assume that ||x||∞ is bounded, i.e. that the information in x is spread out. We fail on vectors like x = (1, 0, . . . , 0): sparse data and a sparse projection don’t mix well.

So are we forced to choose between generality and usefulness? Not if we try to insert randomness...

SLIDE 15-16 (The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?)

A clever idea

Can we randomly transform x so that ||Φ(x)||2 = ||x||2 and ||Φ(x)||∞ is bounded with v.h.p.?

Answer: Yes! Use a Fourier transform Φ = F. It is distance preserving, and it has an “uncertainty principle”: a “signal” and its Fourier transform cannot both be concentrated.

Use the FFT to give an O(d log d) random mapping. Details on the specifics in the next section...

SLIDE 17 (The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?)

Applying a Fourier transform

The Fourier transform will guarantee that ||x||∞ = ω(1) ⇐⇒ ||x̂||∞ = o(1). But now we will be in trouble if the input is uniformly distributed! To deal with this, first do a random sign change:

x̃ = Dx

where D is a random diagonal ±1 matrix. Now we get a guarantee of spread with high probability, so the “random” Fourier transform gives us back generality.
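A small numpy demonstration of the uncertainty principle at work (our own illustration): the worst-case input for a sparse projection, all mass on one coordinate, becomes perfectly spread after the sign change and a unitary FFT.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 1024
x = np.zeros(d)
x[0] = 1.0                                # maximally concentrated unit vector

D = rng.choice([-1.0, 1.0], size=d)       # diagonal of the random sign matrix
phi_x = np.fft.fft(D * x, norm="ortho")   # unitary FFT preserves the l2 norm

l2 = np.linalg.norm(phi_x)                # still 1
linf = np.max(np.abs(phi_x))              # 1/sqrt(d): mass is now fully spread
```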

SLIDE 18 (The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?)

Random sign change

The sign change mapping x → Dx gives us

Dx = (d_1 x_1, d_2 x_2, . . . , d_d x_d)ᵀ = (±x_1, ±x_2, . . . , ±x_d)ᵀ

where each ± is attained with equal probability. Clearly norm preserving.

SLIDE 19 (The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?)

Putting it together

So, we compute the mapping f : x → P F(Dx). The runtime will be

O( d log d + min{ d log n/ε², log³ n/ε² } )

Under some loose conditions, the runtime is O( max{ d log d, k³ } ). If k ∈ [Ω(log d), O(√d)], this is quicker than the O(dk) simple mapping.

In practice, the upper bound is reasonable; the lower bound might not be, though.
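The full pipeline f : x → P F(Dx) can be sketched in a few lines of numpy (dimensions, constants, and the use of the FFT in place of a Hadamard transform are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(4)

d, k, n = 1024, 64, 100
q = min(np.log(n)**2 / d, 1.0)

# D: random signs; F: unitary FFT (the O(d log d) step); P: sparse projection.
# The 1/(q*k) variance makes E[||P x_hat||^2] = ||x_hat||^2.
D = rng.choice([-1.0, 1.0], size=d)
mask = rng.random(size=(k, d)) < q
P = np.where(mask, rng.normal(scale=np.sqrt(1.0 / (q * k)), size=(k, d)), 0.0)

def fjlt(x):
    """f : x -> P F(D x)."""
    return P @ np.fft.fft(D * x, norm="ortho")

x = np.zeros(d)
x[0] = 1.0                  # a sparse input is fine after the randomized Fourier step
y = fjlt(x)
```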

SLIDE 20 (The Fast Johnson-Lindenstrauss Transform / Summary)

Summary

Tried increasing sparsity with disregard for generality. Used randomization to get back generality (probabilistically). The key ingredient was a Fourier transform, with a randomization step first.

SLIDE 21 (Ailon and Liberty’s improvement)

Ailon and Liberty’s improvement

Ailon and Liberty [3] improved the runtime from O(d log d) to O(d log k), for k = O(d^(1/2−δ)), δ > 0.

Idea: Sparsity isn’t the only way to speed up the computation. We can also speed up the runtime when the projection matrix has a special structure. So find a matrix with a convenient structure which will satisfy the JL property.

SLIDE 22 (Ailon and Liberty’s improvement)

Operator norm

We need something called the operator norm in our analysis. The operator norm of a transformation matrix A is

||A||p→q = sup over ||x||p = 1 of ||Ax||q

i.e. the maximal q-norm of the transformation of unit ℓp-norm points. A fact we will need to employ:

||A||p1→p2 = ||Aᵀ||q2→q1, where 1/p1 + 1/q1 = 1 and 1/p2 + 1/q2 = 1
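The duality fact is easiest to sanity-check in the special case p1 = p2 = 2 (so q1 = q2 = 2), where both sides equal the largest singular value; a minimal numpy sketch (the matrix is an arbitrary example of ours):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 5))

# ||A||_{2->2} is the largest singular value of A, and singular
# values of A and A^T coincide, so the two operator norms agree.
norm_A = np.linalg.svd(A, compute_uv=False)[0]
norm_At = np.linalg.svd(A.T, compute_uv=False)[0]
```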

SLIDE 23 (Ailon and Liberty’s improvement)

Reverse engineering

Let’s say the mapping is a matrix multiplication. In particular, say we have a mapping of the form f : x → BDx, where B is some k × d matrix with unit columns, and D is a diagonal matrix whose entries are randomly ±1 (doing a random sign change again).

Now we just need to see what properties B must satisfy in order for ||BDx||2 ≈ ||x||2.

SLIDE 24 (Ailon and Liberty’s improvement / Bounding the mapping)

Bounding the mapping

It is easy to see that

BDx = ( B_11 d_1 x_1 + . . . + B_1d d_d x_d, . . . , B_k1 d_1 x_1 + . . . + B_kd d_d x_d )ᵀ

Write this as BDx = Mz, where the i-th column of M is M(i) = x_i B(i), and z = (d_1, . . . , d_d)ᵀ.

There is a special name for a vector like Mz...
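This rewriting BDx = Mz is easy to check numerically; a minimal numpy sketch (dimensions and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(6)

k, d = 4, 8
B = rng.normal(size=(k, d))
B /= np.linalg.norm(B, axis=0)        # unit columns, as in the construction

x = rng.normal(size=d)
z = rng.choice([-1.0, 1.0], size=d)   # z = (d_1, ..., d_d): the diagonal of D
D = np.diag(z)

M = B * x                             # column i of M is x_i B(i) (broadcast over columns)
lhs = B @ D @ x
rhs = M @ z                           # the same vector, written as a Rademacher sum
```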

SLIDE 25 (Ailon and Liberty’s improvement / Bounding the mapping)

Rademacher series

Definition. If M is an arbitrary k × d real matrix, and z ∈ R^d is such that

z_i = { +1 with probability 1/2; −1 with probability 1/2 }

then Mz is called a Rademacher random variable. This is a vector whose entries are arbitrary sums/differences of the entries in the rows of M. Such a variable is interesting because of a powerful theorem...

SLIDE 26 (Ailon and Liberty’s improvement / Bounding the mapping)

Talagrand’s theorem

Theorem. Suppose M, z are as above. Let Z = ||Mz||p, and let σ = ||M||2→p and µ = median(Z). Then

Pr[ |Z − µ| > t ] ≤ 4 e^(−t²/(8σ²))

(see [6]). σ (the “deviation”) is the maximal p-norm of the image of the unit ℓ2 sphere. The theorem says that the norm of a Rademacher variable is sharply concentrated about its median.

SLIDE 27 (Ailon and Liberty’s improvement / Bounding the mapping)

Implications for us

Our mapping, BDx, has given us a Rademacher random variable. We know that we can apply Talagrand’s theorem to get a concentration result. So, all we need to do is find out what the median and deviation are...

SLIDE 28 (Ailon and Liberty’s improvement / Bounding the mapping)

Deviation

Let Y = ||BDx||2 = ||Mz||2. The deviation is

σ = sup over ||y||2 = 1 of ||yᵀM||2
  = sup over ||y||2 = 1 of ( Σ_{i=1..d} x_i² (yᵀB(i))² )^(1/2)
  ≤ ||x||4 · sup over ||y||2 = 1 of ( Σ_{i=1..d} (yᵀB(i))⁴ )^(1/4)   (by Cauchy-Schwarz)
  = ||x||4 ||Bᵀ||2→4

SLIDE 29-30 (Ailon and Liberty’s improvement / Bounding the mapping)

What do we need?

So, σ ≤ ||x||4 ||Bᵀ||2→4. Fact: |1 − µ| ≤ √32 σ. We can combine these to get

Pr[ |Y − 1| > t ] ≤ c_0 e^(−c_1 t² / (||x||4² ||Bᵀ||2→4²))

Result: We need to control both ||x||4 and ||Bᵀ||2→4, i.e. we want them both to be small. If we manage this, we’ve got our concentration bound.

SLIDE 31-33 (Ailon and Liberty’s improvement / Bounding the mapping)

The two ingredients

To get the concentration bound, we need to ensure that ||x||4 and ||Bᵀ||2→4 are sufficiently small.

How to control ||x||4? Use repeated Fourier/Walsh-Hadamard transforms.

How to control ||Bᵀ||2→4? Use error-correcting codes.

SLIDE 34-37 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Controlling ||x||4

Problem: The input x is “adversarial”, so how do we make ||x||4 small?

Solution: Use an isometric mapping Φ, with a guarantee that ||Φx||4 is small with very high probability.

Problem: What is such a Φ?

(Final!) Solution: Back to the Fourier transform!

SLIDE 38 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

The Discrete Fourier transform

The discrete Fourier transform on {a_0, a_1, . . . , a_{N−1}} is

a_k → Σ_{n=0..N−1} a_n e^(−2πikn/N) = Σ_{n=0..N−1} a_n (e^(−2πik/N))^n

We can think of it as a polynomial evaluation: if P(x) = a_0 + a_1 x + a_2 x² + . . . + a_{N−1} x^(N−1), then we have a_k → P(e^(−2πik/N)).
SLIDE 39 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

The finite-field Fourier transform

Notice that ω_k = e^(−2πik/N) satisfies (ω_k)^N = 1: ω_k is a root of unity. The transform is a_k → P(ωᵏ) for any primitive root ω.

SLIDE 40 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

The multi-dimensional Fourier transform

We can also consider the transform of multi-dimensional data.

1-D case: a_k → Σ_{n=0..N−1} a_n ω^(kn)

υ-D case: if n = (n_1, . . . , n_υ) and k = (k_1, . . . , k_υ), then a_k → Σ_{n_1,...,n_υ=0..N−1} a_n ω^(k·n)

SLIDE 41 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

The Walsh-Hadamard transform

Consider the case N = 2, ω = −1 [7]:

a_{k1,k2} → Σ_{n1,n2=0..1} a_{n1,n2} (−1)^(k1 n1 + k2 n2)

This is called the Walsh-Hadamard transform. Intuition: instead of using sinusoidal basis functions, use square-wave functions; the square waves are called Walsh functions.

Why not the standard discrete FT? We use a technical property of the Walsh-Hadamard transform matrix...

SLIDE 42 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Fourier transform on the binary hyper-cube

Suppose we work with F_2 = {0, 1}. We can encode the Fourier transform with the Walsh-Hadamard matrix H_d, where

H_d(i, j) = (1/√d) (−1)^⟨i−1, j−1⟩

and ⟨i, j⟩ is the dot product of i and j expressed in binary.

Fact: H_d = (1/√2) [ H_{d/2}  H_{d/2} ; H_{d/2}  −H_{d/2} ]

Corollary: We can compute H_d x in O(d log d) time.
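The recursive block structure above is exactly what makes a fast Walsh-Hadamard transform possible; a minimal numpy sketch (our own implementation, assuming the dimension is a power of two):

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform, O(d log d) for d a power of two."""
    x = np.asarray(x, dtype=float).copy()
    d = len(x)
    h = 1
    while h < d:
        # Butterfly step: combine blocks of size h, as in the recursive Fact
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(d)

y = fwht([1.0, 0.0, 0.0, 0.0])    # first column of H_4: (1/2)(1, 1, 1, 1)
```

Because the transform is orthogonal, it preserves the ℓ2 norm, which is what the construction relies on.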
SLIDE 43 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Example of Hadamard matrix

When d = 4, we get

H_4 = (1/2) [ 1 1 1 1 ; 1 −1 1 −1 ; 1 1 −1 −1 ; 1 −1 −1 1 ]

Note the entries are always ±1 (up to normalization).

SLIDE 44-45 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Fourier again?

Let Φ : x → H_d D_0 x, with D_0, as before, a random diagonal ±1 matrix. We already know that it will preserve the ℓ2 norm. But is ||Φ(x)||4 small?

Answer: Yes, by another application of Talagrand’s theorem!

SLIDE 46 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Towards Talagrand

We need σ, µ for Talagrand’s theorem. Write Φ(x) = Mz as before, where M(i) = x_i H(i). Estimate the deviation:

σ = ||M||2→4 = ||Mᵀ||4/3→2   (from the earlier duality fact)
  ≤ ( Σ_i x_i⁴ )^(1/4) · sup over ||y||4/3 = 1 of ( Σ_i (yᵀH(i))⁴ )^(1/4)
  = ||x||4 ||H||4/3→4

SLIDE 47 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Some magic

We now employ the following theorem [4].

Theorem (Hausdorff-Young). For any p ∈ [1, 2], if H is the Hadamard matrix and 1/p + 1/q = 1, then

||H||p→q ≤ √d · d^(−1/p)

As a result, for p = 4/3,

σ ≤ ||x||4 d^(−1/4)

Further, we have the following fact (see [3] for the proof!)...

Fact: µ = O(d^(−1/4))

SLIDE 48 (Ailon and Liberty’s improvement / The Walsh-Hadamard transform)

Getting the desired result

With the above σ, µ, an application of Talagrand, along with the assumption k = O(d^(1/2−δ)), reveals

||H D_0 x||4 ≤ c_0 d^(−1/4) + c_1 d^(−δ/2) ||x||4

If we compose the mapping,

||H D_1 (H D_0 x)||4 ≤ c_0 d^(−1/4) + c_0 c_1 d^(−1/4−δ/2) + c_1² d^(−δ) ||x||4

If we repeat this r = 1/(2δ) times,

||H D_{r−1} H D_{r−2} . . . H D_0 x||4 = O(d^(−1/4))
slide-49
SLIDE 49

Faster Johnson-Lindenstrauss style reductions Ailon and Liberty’s improvement The Walsh-Hadamard transform

Our resultant transform

To control ||x||4, use the composed transform Φ(r) : x → H D_{r−1} H D_{r−2} . . . H D_0 x. We manage to preserve ||x||2, and contract ||Φ(r)x||4. The runtime is O(d log d / δ).
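A numpy sketch of the composed transform on the worst-case sparse input (r and d are illustrative choices of ours; the fwht helper assumes d is a power of two):

```python
import numpy as np

rng = np.random.default_rng(7)

def fwht(x):
    """Normalized fast Walsh-Hadamard transform (d a power of two)."""
    x = np.asarray(x, dtype=float).copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(d)

d, r = 1024, 3                       # r plays the role of 1/(2*delta); 3 is illustrative
x = np.zeros(d)
x[0] = 1.0                           # worst case: ||x||_4 = 1

y = x
for _ in range(r):
    D = rng.choice([-1.0, 1.0], size=d)
    y = fwht(D * y)                  # one H D_t step

l2 = np.linalg.norm(y)               # preserved: still 1
l4 = np.sum(y**4) ** 0.25            # contracted toward the optimal d^(-1/4)
```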

SLIDE 50 (Ailon and Liberty’s improvement / Error-correcting codes)

Error-correcting codes

The Hadamard matrix also has a connection to error-correcting codes. Such codes represent a message in such a way that it can be decoded correctly even if some errors occur during transmission. Suppose we want to send out a message to a decoder which allows for at most d errors, i.e. we can recover from d or fewer errors in the transmission.

Fact: By choosing our “code-words” from the matrix [ H_2d ; −H_2d ] (with −1 → 0), we can correct up to d errors.
SLIDE 51 (Ailon and Liberty’s improvement / Error-correcting codes)

Code matrix

An m × d matrix A is called a code matrix if

A = √(d/m) [ H_d(i_1, :) ; H_d(i_2, :) ; . . . ; H_d(i_m, :) ]

i.e. we pick out only m of the d rows of the Hadamard matrix, rescaled.
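A numpy sketch of such a code matrix, built by sampling rows of the (Sylvester) Hadamard matrix; the row choice here is random for illustration, whereas the actual construction picks specific rows (e.g. via BCH codes):

```python
import numpy as np

rng = np.random.default_rng(8)

def hadamard(d):
    """Sylvester construction of the d x d (+/-1) Hadamard matrix, d a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d, m = 256, 16
H = hadamard(d)                               # unnormalized +/-1 entries
rows = rng.choice(d, size=m, replace=False)   # illustrative random row choice
A = np.sqrt(d / m) * (H[rows] / np.sqrt(d))   # m rows of (1/sqrt(d)) H, rescaled by sqrt(d/m)

# The rescaling makes every column of A a unit vector, as the
# projection matrix B requires: each entry is +/- 1/sqrt(m).
col_norms = np.linalg.norm(A, axis=0)
```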

SLIDE 52 (Ailon and Liberty’s improvement / Error-correcting codes)

Independence in codes

A code matrix is called a-wise independent if, for any choice of a rows, exactly d/2^a of the columns agree in those a places. Independence is very useful for us:

Theorem. Suppose B is a k × d, 4-wise independent code matrix. Then ||Bᵀ||2→4 = O(d^(1/4)/√k).
SLIDE 53 (Ailon and Liberty’s improvement / Error-correcting codes)

Proof of theorem

Recall that we need to bound ||Bᵀ||2→4 = sup over ||y||2 = 1 of ||yᵀB||4. Consider:

||yᵀB||4⁴ = d · E[ (yᵀB(j))⁴ ]   (j a uniformly random column)
         = (d/k²) Σ_{i1,i2,i3,i4} E[ y_{i1} y_{i2} y_{i3} y_{i4} b_{i1} b_{i2} b_{i3} b_{i4} ]
         = (d/k²) ( 3||y||2⁴ − 2||y||4⁴ )
         ≤ 3d/k²

Consequently, ||Bᵀ||2→4 ≤ (3d)^(1/4)/√k.

SLIDE 54-56 (Ailon and Liberty’s improvement / Error-correcting codes)

Making our matrix

We’re set if we can get a k × d, 4-wise independent code matrix. Problem: How do we make such a matrix?

Fact: There exists a 4-wise independent code matrix of size k × BCH(k) = Θ(k²), called the BCH code matrix.

Which is good, because...

Fact: By padding and “copy-pasting”, we retain independence. In particular, we can construct a k × d matrix from a k × BCH(k) matrix:

B = [ B_BCH B_BCH . . . B_BCH ]   (d/BCH(k) copies)

SLIDE 57 (Ailon and Liberty’s improvement / Error-correcting codes)

Time to make matrix

Time to compute the mapping x → Bx? We have to do d/BCH(k) mappings B_BCH x_BCH. Each such mapping can be done via a Walsh-Hadamard transform, by construction of the BCH codes, taking time O(BCH(k) · log BCH(k)).

Total runtime is therefore O(d log k).

SLIDE 58 (Ailon and Liberty’s improvement / Putting it together)

Merging results

Use the randomized Fourier transform to keep ||x||4 small: O(d log d) time. Use the error-correcting code matrix to keep ||Bᵀ||2→4 small: O(d log k) time.

Result: We get the concentration bound!

SLIDE 59-60 (Ailon and Liberty’s improvement / Putting it together)

Runtime

The runtime is still going to be O(d log d). Question: Can we speed up the computation of Φ(r)?

Answer: Yes, use the same “block” idea as with the error-correcting codes. Some rather technical calculation reveals this will still work.

SLIDE 61 (Ailon and Liberty’s improvement / Putting it together)

Blocked transform

Choose β = BCH(k) · k^δ = Θ(k^(2+δ)). Let

H′ = diag( H_1, H_2, . . . , H_{d/β} )

where each H_i is a β × β Walsh-Hadamard block. Fact: the above blocked mapping can replace Φ(r). The mapping H′D′x can be computed in time O(d log k), so our total runtime is O(d log k).
SLIDE 62 (Ailon and Liberty’s improvement / Putting it together)

A tabular comparison

Runtimes of the three approaches (standard JL, Fast JLT, and Ailon-Liberty), ordered fastest to slowest in each regime of k (from [3]):

k = o(log d):                                AL, JL
k ∈ [ω(log d), O(poly(d))]:                  AL, FJLT, JL
k ∈ [Ω(poly(d)), O((d log d)^(1/3))]:        AL and FJLT (tied), JL
k ∈ [ω((d log d)^(1/3)), O(d^(1/2−δ))]:      AL, FJLT, JL

SLIDE 63 (Ailon and Liberty’s improvement / Putting it together)

Conclusion

ℓ2 dimensionality reduction is based on the Johnson-Lindenstrauss lemma. The standard approach takes O(dk) time to perform the reduction. By sparsifying, and compensating with a randomized Fourier transform, we can reduce the runtime to roughly O(d log d) via the Fast Johnson-Lindenstrauss transform [2]. By using error-correcting codes and a randomized Fourier transform, we can reduce the runtime to roughly O(d log k) via Ailon and Liberty’s transform [3].

Open questions: Can one extend this to k = O(d^(1−δ))? To k = Ω(d)?

SLIDE 64 (References)

[1] Achlioptas, D. Database-friendly random projections. In PODS ’01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA, 2001), ACM Press, pp. 274–281.

[2] Ailon, N., and Chazelle, B. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In STOC ’06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2006), ACM Press, pp. 557–563.

[3] Ailon, N., and Liberty, E. Fast dimension reduction using Rademacher series on dual BCH codes. Tech. Rep. TR07-070, Electronic Colloquium on Computational Complexity, 2007.

[4] Bergh, J., and Löfström, J. Interpolation Spaces. Springer-Verlag, 1976.

SLIDE 65 (References)

[5] Johnson, W., and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability (Providence, RI, USA, 1984), American Mathematical Society, pp. 189–206.

[6] Ledoux, M., and Talagrand, M. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 2006.

[7] Massey, J. L. Design and analysis of block ciphers. http://www.win.tue.nl/math/eidma/courses/minicourses/massey/dabcmay2000f3.pdf, May 2000. Presentation.