
SLIDE 1

compsci 514: algorithms for data science

Cameron Musco University of Massachusetts Amherst. Fall 2019. Lecture 22

SLIDE 2

logistics

  • Problem Set 4 released last night. Due Sunday 12/15 at 8pm.
  • Final Exam Thursday 12/19 at 10:30am in Thompson 104.
  • Exam prep materials (list of topics, practice problems) coming in the next couple of days.

SLIDE 3

summary

Before Break:

  • Finished discussion of SGD.
  • Gradient descent and SGD as applied to least squares regression.

This Class:

  • A quick tour of the counterintuitive properties of high-dimensional space.
  • Many connections to concentration inequalities.
  • Implications for working with high-dimensional data (curse of dimensionality).

SLIDE 5

high-dimensional data

Modern data analysis often involves very high-dimensional data points.

  • Websites record (tens of) thousands of measurements per user: who they follow, when they visit the site, timestamps for specific interactions, etc.
  • A 3 minute, 500 × 500 pixel video clip at 15 FPS has ≥ 2 billion pixel values.
  • The human genome has 3 billion+ base pairs.

Typically when discussing algorithm design we imagine data in much lower (usually 3) dimensional space.

SLIDE 7

low-dimensional intuition

This can be a bit dangerous, as in reality high-dimensional space is very different from low-dimensional space.

SLIDE 9

orthogonal vectors

What is the largest set of mutually orthogonal unit vectors in d-dimensional space? Answer: d.

SLIDE 11

nearly orthogonal vectors

What is the largest set of unit vectors in d-dimensional space that have all pairwise dot products |⟨x, y⟩| ≤ ϵ? (think ϵ = .01)

  • 1. d
  • 2. Θ(d)
  • 3. Θ(d²)
  • 4. 2^Θ(d)

In fact, an exponentially large set of random vectors will be nearly pairwise orthogonal with high probability! Proof: Let x₁, . . . , xₜ each have independent random entries set to ±1/√d.

  • xᵢ is always a unit vector.
  • E[⟨xᵢ, xⱼ⟩] = 0.
  • By a Chernoff bound, Pr[|⟨xᵢ, xⱼ⟩| ≥ ϵ] ≤ 2e^(−ϵ²d/3).
  • If we choose t = (1/2)·e^(ϵ²d/6), then using a union bound over all ≤ t² = (1/4)·e^(ϵ²d/3) possible pairs, with probability ≥ 1/2 all will be nearly orthogonal.
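The proof above is easy to check empirically. A minimal sketch (assuming NumPy; the sizes d and t below are arbitrary choices, not values from the slides): sample t random ±1/√d vectors and inspect the largest pairwise dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
d, t = 2000, 500  # dimension and number of random vectors (arbitrary choices)

# Each entry is +1/sqrt(d) or -1/sqrt(d), so every row is exactly a unit vector.
X = rng.choice([-1.0, 1.0], size=(t, d)) / np.sqrt(d)

# Gram matrix: diagonal = squared norms, off-diagonal = pairwise dot products.
G = X @ X.T
off_diag = np.abs(G[~np.eye(t, dtype=bool)])

print(np.allclose(np.diag(G), 1.0))  # True: all rows are unit vectors
print(off_diag.max())                # small: all 500 vectors are nearly orthogonal
```

With these sizes the maximum pairwise dot product comes out on the order of 0.1, even though 500 exactly-orthogonal vectors could never fit if d were, say, 100.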

SLIDE 21

curse of dimensionality

Upshot: In d-dimensional space, a set of 2^(Θ(ϵ²d)) random unit vectors have all pairwise dot products at most ϵ (think ϵ = .01):

∥xᵢ − xⱼ∥₂² = ∥xᵢ∥₂² + ∥xⱼ∥₂² − 2xᵢᵀxⱼ ≥ 1.98.

Even with an exponential number of samples, we don’t see any nearby vectors.

  • Can make methods like k-nearest neighbor classification or kernel regression useless.

Curse of dimensionality for sampling/learning functions in high-dimensional space – samples are very ‘sparse’ unless we have a huge amount of data.

  • Only hope is if we have strong low-dimensional structure.
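The “no nearby vectors” phenomenon can be seen concretely (a sketch assuming NumPy; the sizes are arbitrary choices): for random unit vectors, a point’s nearest and farthest neighbors sit at almost the same distance, both near √2, which is exactly what makes k-NN distances uninformative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 2000  # arbitrary choices

# Random unit vectors: normalized Gaussian samples.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Distances from the first point to every other point.
dists = np.linalg.norm(X[1:] - X[0], axis=1)

# Nearest and farthest neighbors are almost equally far: both near sqrt(2).
print(dists.min(), dists.max())
```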

SLIDE 29

bizarre shape of high-dimensional balls

Let Bd be the unit ball in d dimensions: Bd = {x ∈ ℝᵈ : ∥x∥₂ ≤ 1}. What percentage of the volume of Bd falls within ϵ distance of its surface? Answer: all but a (1 − ϵ)ᵈ ≤ e^(−ϵd) fraction. Exponentially small in the dimension d! Volume of a radius R ball is (π^(d/2)/(d/2)!) · Rᵈ.
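Since ball volume scales as Rᵈ, the fraction of Bd’s volume NOT within ϵ of the surface is exactly (1 − ϵ)ᵈ. A quick numerical check of the bound (d and ϵ below are arbitrary choices):

```python
import math

d, eps = 500, 0.05  # arbitrary choices

# Fraction of the unit ball's volume in the inner ball of radius (1 - eps):
inner_fraction = (1 - eps) ** d
bound = math.exp(-eps * d)  # the e^{-eps*d} upper bound from the slide

print(inner_fraction)           # astronomically small
print(inner_fraction <= bound)  # True: (1 - eps)^d <= e^{-eps*d}
```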

SLIDE 33

bizarre shape of high-dimensional balls

All but an e^(−ϵd) fraction of a unit ball’s volume is within ϵ of its surface.

  • Isoperimetric inequality: the ball has the minimum surface area/volume ratio of any shape.
  • If we randomly sample points from any high-dimensional shape, nearly all will fall near its surface.
  • ‘All points are outliers.’

SLIDE 37

bizarre shape of high-dimensional balls

What percentage of the volume of Bd falls within ϵ distance of its equator? Answer: all but a 2^(−Θ(ϵ²d)) fraction. Formally: the volume of the set S = {x ∈ Bd : |x(1)| ≤ ϵ}. By symmetry, all but a 2^(−Θ(ϵ²d)) fraction of the volume falls within ϵ of any equator! That is, of S = {x ∈ Bd : |⟨x, t⟩| ≤ ϵ} for any unit vector t.

SLIDE 40

bizarre shape of high-dimensional balls

Claim 1: All but a 2^(−Θ(ϵ²d)) fraction of the volume of a ball falls within ϵ of any equator. Claim 2: All but a 2^(−Θ(ϵd)) fraction falls within ϵ of its surface. How is this possible? High-dimensional space looks nothing like this picture!

SLIDE 46

concentration of volume at equator

Claim: All but a 2^(−Θ(ϵ²d)) fraction of the volume of a ball falls within ϵ of its equator, i.e., in S = {x ∈ Bd : |x(1)| ≤ ϵ}.

Proof Sketch:

  • Let x have entries set to independent Gaussians N(0, 1) and let x̄ = x/∥x∥₂. x̄ is selected uniformly at random from the surface of the ball.
  • Suffices to show that Pr[|x̄(1)| > ϵ] ≤ 2^(−Θ(ϵ²d)). Why?
  • x̄(1) = x(1)/∥x∥₂. E[∥x∥₂²] = ∑ᵢ₌₁ᵈ E[x(i)²] = d, and Pr[∥x∥₂² ≤ d/2] ≤ 2^(−Θ(d)).
  • Conditioning on ∥x∥₂² ≥ d/2, since x(1) is normally distributed, Pr[|x̄(1)| > ϵ] = Pr[|x(1)| > ϵ·∥x∥₂] ≤ Pr[|x(1)| > ϵ·√(d/2)] = 2^(−Θ((ϵ·√(d/2))²)) = 2^(−Θ(ϵ²d)).
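The proof sketch above can be simulated directly (assuming NumPy; d, n, and ϵ below are arbitrary choices): normalize Gaussian vectors to get uniform points on the sphere, then measure how many land within ϵ of the equator.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, eps = 1000, 100_000, 0.1  # arbitrary choices

# Uniform points on the surface of B_d: independent Gaussians, normalized.
x = rng.standard_normal((n, d))
xbar = x / np.linalg.norm(x, axis=1, keepdims=True)

# Fraction of samples with |xbar(1)| <= eps, i.e., within eps of the equator.
frac = np.mean(np.abs(xbar[:, 0]) <= eps)
print(frac)  # close to 1: nearly all of the sphere lies near the equator
```

Here x̄(1) behaves like a Gaussian with standard deviation about 1/√d ≈ 0.032, so ϵ = 0.1 is roughly three standard deviations and almost all samples fall inside the band.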

SLIDE 55

high dimensional cubes

Let Cd be the d-dimensional cube: Cd = {x ∈ ℝᵈ : |x(i)| ≤ 1 ∀ i}. In low dimensions, the cube is not that different from the ball. But the volume of Cd is 2ᵈ while the volume of Bd is π^(d/2)/(d/2)! = 1/d^(Θ(d)). A huge gap! So something is very different about these shapes...

SLIDE 59

high dimensional cubes

Data generated from the ball Bd will behave very differently than data generated from the cube Cd.

  • x ∼ Bd has ∥x∥₂² ≤ 1.
  • x ∼ Cd has E[∥x∥₂²] = d/3, and Pr[∥x∥₂² ≤ d/6] ≤ 2^(−Θ(d)).
  • Almost all the volume of the unit cube falls in its corners, and these corners lie far outside the unit ball.
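The ball/cube contrast is easy to verify (a sketch assuming NumPy; d and n below are arbitrary choices): each cube coordinate is uniform on [−1, 1], so E[x(i)²] = 1/3 and E[∥x∥₂²] = d/3, far outside the unit ball.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 300, 50_000  # arbitrary choices

# x ~ C_d: each coordinate uniform on [-1, 1].
x = rng.uniform(-1.0, 1.0, size=(n, d))
sq_norms = np.sum(x**2, axis=1)

print(sq_norms.mean())            # ≈ d/3 = 100, versus <= 1 for the ball
print(np.mean(sq_norms <= d / 6)) # ≈ 0: squared norms concentrate around d/3
```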

SLIDE 65

connection to dimensionality reduction

If high-dimensional geometry is so different from low-dimensional geometry, how is dimensionality reduction (e.g., the Johnson-Lindenstrauss lemma) possible? Recall: The Johnson-Lindenstrauss lemma states that if Π ∈ ℝ^(m×d) is a random matrix (linear map) with m = O(log n / ϵ²), then for x₁, . . . , xₙ ∈ ℝᵈ, with high probability, for all i, j:

(1 − ϵ)∥xᵢ − xⱼ∥₂ ≤ ∥Πxᵢ − Πxⱼ∥₂ ≤ (1 + ϵ)∥xᵢ − xⱼ∥₂.

If x₁, . . . , xₙ are random unit vectors in d dimensions, one can show that Πx₁, . . . , Πxₙ are essentially random unit vectors in m dimensions. But these different dimensional spaces have very different geometries, so how is this possible?
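The lemma itself is simple to demonstrate with a random Gaussian map (a sketch assuming NumPy; the concrete m = 250 and the 1/√m scaling are assumptions standing in for m = O(log n / ϵ²)):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, m = 5000, 100, 250  # m stands in for O(log n / eps^2)

# Random Gaussian JL map, scaled so squared lengths are preserved in expectation.
Pi = rng.standard_normal((m, d)) / np.sqrt(m)

X = rng.standard_normal((n, d))
Y = X @ Pi.T  # projected points

def pairwise_sq(A):
    # Squared pairwise distances via ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2<xi, xj>.
    sq = (A**2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

mask = ~np.eye(n, dtype=bool)
ratio = np.sqrt(pairwise_sq(Y)[mask] / pairwise_sq(X)[mask])

print(ratio.min(), ratio.max())  # all pairwise distances preserved up to small distortion
```

Every ratio of projected to original distance lands close to 1, even though the projection throws away 95% of the coordinates.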

SLIDE 69

connection to dimensionality reduction

x₁, . . . , xₙ are sampled from the surface of Bd and Πx₁, . . . , Πxₙ are (approximately) sampled from the surface of Bm.

  • In d dimensions, 2^(Θ(ϵ²d)) random unit vectors will have all pairwise dot products at most ϵ with high probability.
  • After JL projection, Πx₁, . . . , Πxₙ will still have pairwise dot products at most O(ϵ) with high probability.
  • In m = O(log n / ϵ²) dimensions, 2^(Θ(ϵ²m)) = 2^(O(log n)) ≫ n random unit vectors will have all pairwise dot products at most ϵ with high probability.
  • m is chosen just large enough so that the odd geometry of d-dimensional space will still hold on the n points in question.

SLIDE 74

Questions?