COMPSCI 514: Algorithms for Data Science

  1. COMPSCI 514: Algorithms for Data Science. Cameron Musco, University of Massachusetts Amherst. Spring 2020. Lecture 12.

  2. Logistics
  • Problem Set 2 is due this upcoming Sunday 3/8 at 8pm.
  • Midterm is next Thursday, 3/12. See the webpage for a study guide/practice questions.
  • I will hold office hours after class today.
  • Next week, office hours will be at the usual time after class Tuesday, and also before class at 10:00am.

  3. Summary
  Last Class: Finished Up the Johnson-Lindenstrauss Lemma
  • Completed the proof of the Distributional JL Lemma.
  • Showed two applications of random projection: faster support vector machines and k-means clustering.
  This Class: High-Dimensional Geometry
  • Started discussion of high-dimensional geometry.
  • Bizarre phenomena in high-dimensional space.
  • Connections to the JL Lemma and random projection.

  4. Orthogonal Vectors
  What is the largest set of mutually orthogonal unit vectors in d-dimensional space? Answer: d.
  What is the largest set of unit vectors in d-dimensional space that have all pairwise dot products |⟨x, y⟩| ≤ ε? (think ε = 0.01) Answer: 2^{Θ(ε²d)}.
  In fact, an exponentially large set of random vectors will be nearly pairwise orthogonal with high probability!

  5. Claim: 2^{Θ(ε²d)} random d-dimensional unit vectors will have all pairwise dot products |⟨x_i, x_j⟩| ≤ ε (be nearly orthogonal).
  Proof: Let x_1, ..., x_t each have independent random entries set to ±1/√d.
  • What is ∥x_i∥₂? Every x_i is always a unit vector.
  • What is E[⟨x_i, x_j⟩]? E[⟨x_i, x_j⟩] = 0.
  • By a Chernoff bound, Pr[|⟨x_i, x_j⟩| ≥ ε] ≤ 2e^{−ε²d/6}.
  • If we choose t = (1/2)·e^{ε²d/12}, then using a union bound over all (t choose 2) ≤ (1/8)·e^{ε²d/6} possible pairs, with probability ≥ 3/4 all pairs will be nearly orthogonal.
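A minimal numerical sketch of this claim (assuming NumPy; the values of d, t, and ε below are illustrative choices, not from the slides): sample t random ±1/√d sign vectors and look at the largest pairwise dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, eps = 5000, 500, 0.1  # illustrative values

# t unit vectors with independent +-1/sqrt(d) entries (each has norm exactly 1).
X = rng.choice([-1.0, 1.0], size=(t, d)) / np.sqrt(d)

# All pairwise dot products; the diagonal entries are the squared norms (= 1).
G = X @ X.T
off_diag = np.abs(G[~np.eye(t, dtype=bool)])
print("largest |<x_i, x_j>| over all pairs:", off_diag.max())  # typically below eps
```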

  6. Curse of Dimensionality
  Upshot: In d-dimensional space, a set of 2^{Θ(ε²d)} random unit vectors have all pairwise dot products at most ε (think ε = 0.01):
  ∥x_i − x_j∥₂² = ∥x_i∥₂² + ∥x_j∥₂² − 2x_iᵀx_j ≥ 1.98.
  Even with an exponential number of random vector samples, we don't see any nearby vectors.
  This is the curse of dimensionality for sampling/learning functions in high-dimensional space: samples are very 'sparse' unless we have a huge amount of data.
  • Can make methods like nearest neighbor classification or clustering useless.
  • Only hope is if we have lots of structure (which we typically do...).
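A short check of the distance identity above, under the same illustrative assumptions as the previous sketch (NumPy, arbitrary d and t): for nearly orthogonal unit vectors, every pairwise squared distance is close to 2, so no sample has a nearby neighbor.

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 20000, 500  # illustrative values

X = rng.choice([-1.0, 1.0], size=(t, d)) / np.sqrt(d)

# Squared distances via ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j = 2 - 2<x_i, x_j>.
G = X @ X.T
sq_dists = 2.0 - 2.0 * G[~np.eye(t, dtype=bool)]
print("smallest squared distance between any two samples:", sq_dists.min())  # close to 2
```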

  7. Curse of Dimensionality
  Another interpretation: tells us that random data can be a very bad model for actual input data.
  [Figures: histograms of pairwise distances for random images vs. for MNIST digits, with sample 28 × 28 images from each.]
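A rough way to reproduce the "random images" half of this comparison (assuming NumPy; the MNIST half would require downloading the dataset, e.g. via scikit-learn, so it is omitted here): pairwise distances between random 28 × 28 images concentrate tightly around a single value, unlike distances between real digit images.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 28 * 28  # 200 random "images" with uniform pixel values in [0, 1]

X = rng.random((n, d))

# All pairwise Euclidean distances.
sq_norms = (X ** 2).sum(axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
dists = np.sqrt(np.maximum(sq_dists[~np.eye(n, dtype=bool)], 0))

# For random images the distances cluster tightly around their mean (small relative spread).
print("mean distance:", dists.mean(), "std/mean:", dists.std() / dists.mean())
```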

  8. Connection to Dimensionality Reduction
  Recall: The Johnson-Lindenstrauss Lemma states that if Π ∈ R^{m×d} is a random matrix (linear map) with m = O(log n / ε²), then for x_1, ..., x_n ∈ R^d, with high probability, for all i, j:
  (1 − ε)∥x_i − x_j∥₂² ≤ ∥Πx_i − Πx_j∥₂² ≤ (1 + ε)∥x_i − x_j∥₂².
  Implies: If x_1, ..., x_n are nearly orthogonal unit vectors in d dimensions (with pairwise dot products bounded by ε/8), then Πx_1/∥Πx_1∥₂, ..., Πx_n/∥Πx_n∥₂ are nearly orthogonal unit vectors in m dimensions (with pairwise dot products bounded by ε).
  • Similar to the SVM analysis. The algebra is a bit messy, but a good exercise to partially work through.
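A minimal sketch of this implication (assuming NumPy; the dimensions, ε, and the constant 8 in m are illustrative guesses, and the dense Gaussian Π below is one standard JL construction, not necessarily the one used in lecture): project nearly orthogonal unit vectors, renormalize, and compare pairwise dot products before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 5000, 100, 0.2                     # illustrative values
m = int(np.ceil(8 * np.log(n) / eps**2))       # m = O(log n / eps^2); constant 8 is a guess

# Nearly orthogonal unit vectors: random +-1/sqrt(d) sign vectors.
X = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(d)

# Random projection with Gaussian entries, scaled so norms are preserved in expectation.
Pi = rng.standard_normal((m, d)) / np.sqrt(m)
Y = X @ Pi.T                                   # projected vectors in R^m
Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # renormalize to unit length

max_off = lambda G: np.abs(G[~np.eye(n, dtype=bool)]).max()
print("max |dot product| in", d, "dims:", max_off(X @ X.T))
print("max |dot product| after projecting to", m, "dims:", max_off(Y @ Y.T))
```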

  9. Connection to Dimensionality Reduction
  Claim 1: n nearly orthogonal unit vectors can be projected to m = O(log n / ε²) dimensions and still be nearly orthogonal.
  Claim 2: In m dimensions, there are at most 2^{O(ε²m)} nearly orthogonal vectors.
  • For both of these to hold, it must be that n ≤ 2^{O(ε²m)}.
  • 2^{O(ε²m)} = 2^{O(log n)} ≥ n. Tells us that the JL Lemma is optimal up to constants.
  • m is chosen just large enough so that the odd geometry of d-dimensional space still holds on the n points in question after projection to a much lower-dimensional space.

  10. Bizarre Shape of High-Dimensional Balls
  Let B_d be the unit ball in d dimensions: B_d = {x ∈ R^d : ∥x∥₂ ≤ 1}.
  What percentage of the volume of B_d falls within ε distance of its surface? Answer: all but a (1 − ε)^d ≤ e^{−εd} fraction. Exponentially small in the dimension d!
  (Volume of a radius-R ball is π^{d/2}/(d/2)! · R^d.)
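A quick worked check of this fraction (plain Python; the values of ε and d are illustrative): the inner ball of radius 1 − ε has volume (1 − ε)^d times the whole ball, so the shell near the surface holds a 1 − (1 − ε)^d fraction.

```python
eps = 0.01
for d in [10, 100, 1000, 10000]:
    # Fraction of the unit ball's volume within eps of the surface.
    frac = 1 - (1 - eps) ** d
    print(f"d = {d:6d}: fraction within {eps} of the surface = {frac:.4f}")
```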

  11. Bizarre Shape of High-Dimensional Balls
  All but an e^{−εd} fraction of a unit ball's volume is within ε of its surface. If we randomly sample points with ∥x∥₂ ≤ 1, nearly all will have ∥x∥₂ ≥ 1 − ε.
  • Isoperimetric inequality: the ball has the minimum surface area/volume ratio of any shape.
  • So if we randomly sample points from any high-dimensional shape, nearly all will fall near its surface.
  • 'All points are outliers.'

  12. Bizarre Shape of High-Dimensional Balls
  What fraction of the small cubes are visible on the surface of a 10 × 10 × 10 cube made of 1000 unit cubes? (10³ − 8³)/10³ = (1000 − 512)/1000 = 0.488.
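The same counting generalizes to higher dimensions. A small sketch (plain Python, illustrative grid size): in an n × n × ... × n grid of cells, the interior forms an (n − 2)^d block, so almost every cell touches the surface once d is large.

```python
n = 10  # side length of the grid; d = 3 reproduces the 0.488 above
for d in [2, 3, 10, 50, 100]:
    # Fraction of cells on the surface = 1 - ((n - 2)/n)^d.
    surface_frac = 1 - ((n - 2) / n) ** d
    print(f"d = {d:3d}: fraction of cells on the surface = {surface_frac:.4f}")
```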

  13. Bizarre Shape of High-Dimensional Balls
  What percentage of the volume of B_d falls within ε distance of its equator? Answer: all but a 2^{−Θ(ε²d)} fraction.
  Formally: the volume of the set S = {x ∈ B_d : |x(1)| ≤ ε}.
  By symmetry, all but a 2^{−Θ(ε²d)} fraction of the volume falls within ε of any equator: S = {x ∈ B_d : |⟨x, t⟩| ≤ ε} for any unit vector t.
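A Monte Carlo check of this claim (assuming NumPy; d, ε, and the sample count are illustrative, and drawing a normalized Gaussian direction times a U^{1/d} radius is a standard way to sample the ball uniformly, not necessarily the lecture's method):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 500, 20_000, 0.1  # illustrative values

# Uniform samples from the unit ball: random direction times radius U^(1/d).
g = rng.standard_normal((n, d))
dirs = g / np.linalg.norm(g, axis=1, keepdims=True)
radii = rng.random(n) ** (1.0 / d)
X = dirs * radii[:, None]

# Fraction of the ball's volume within eps of the equator {x : |x(1)| <= eps}.
print("fraction with |x(1)| <= eps:", np.mean(np.abs(X[:, 0]) <= eps))  # close to 1
```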

  14. Bizarre Shape of High-Dimensional Balls
  Claim 1: All but a 2^{−Θ(ε²d)} fraction of the volume of a ball falls within ε of any equator.
  Claim 2: All but a 2^{−Θ(εd)} fraction falls within ε of its surface.
  How is this possible? High-dimensional space looks nothing like this picture!

  15. Concentration of Volume at Equator
  Claim: All but a 2^{−Θ(ε²d)} fraction of the volume of a ball falls within ε of its equator, i.e., in S = {x ∈ B_d : |x(1)| ≤ ε}.
  Proof Sketch:
  • Let x have independent Gaussian N(0, 1) entries and let x̄ = x/∥x∥₂. Then x̄ is selected uniformly at random from the surface of the ball.
  • Suffices to show that Pr[|x̄(1)| > ε] ≤ 2^{−Θ(ε²d)}. Why?
  • x̄(1) = x(1)/∥x∥₂. What is E[∥x∥₂²]? E[∥x∥₂²] = Σ_{i=1}^d E[x(i)²] = d, and Pr[∥x∥₂² ≤ d/2] ≤ 2^{−Θ(d)}.
  • Conditioning on ∥x∥₂² ≥ d/2, since x(1) is normally distributed,
    Pr[|x̄(1)| > ε] = Pr[|x(1)| > ε·∥x∥₂] ≤ Pr[|x(1)| > ε·√(d/2)] = 2^{−Θ((ε√(d/2))²)} = 2^{−Θ(ε²d)}.
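A small numerical check of the two facts used in the sketch (assuming NumPy; d, ε, and the sample count are illustrative): the squared norm of a Gaussian vector concentrates around d, and the first coordinate of the normalized vector is rarely larger than ε.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 1000, 10_000, 0.1  # illustrative values

x = rng.standard_normal((n, d))          # n Gaussian vectors with independent N(0, 1) entries
sq_norms = (x ** 2).sum(axis=1)

print("mean of ||x||_2^2 (should be about d):", sq_norms.mean())
print("Pr[||x||_2^2 <= d/2] (should be tiny):", np.mean(sq_norms <= d / 2))

x_bar_1 = x[:, 0] / np.sqrt(sq_norms)    # first coordinate of the normalized vector x_bar
print("Pr[|x_bar(1)| > eps] (should be small):", np.mean(np.abs(x_bar_1) > eps))
```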

  16. High-Dimensional Cubes
  In low dimensions, the cube is not that different from the ball.
  Let C_d be the d-dimensional cube: C_d = {x ∈ R^d : |x(i)| ≤ 1 ∀i}.
  But the volume of C_d is 2^d, while the volume of B_d is π^{d/2}/(d/2)! = 1/d^{Θ(d)}. A huge gap! So something is very different about these shapes...
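A quick worked comparison of the two volumes (plain Python; (d/2)! is computed as Γ(d/2 + 1), and the dimensions are chosen for illustration). Working in logs avoids overflow and makes the gap easy to read off.

```python
import math

for d in [2, 10, 50, 100, 200]:
    log_vol_cube = d * math.log(2.0)                                  # vol(C_d) = 2^d
    log_vol_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)  # vol(B_d)
    print(f"d = {d:3d}: ln vol(C_d) = {log_vol_cube:8.1f},  ln vol(B_d) = {log_vol_ball:9.1f}")
```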

  17. High-Dimensional Cubes
  Corners of the cube are √d times further away from the origin than the surface of the ball.

  18. High-Dimensional Cubes
  Data generated from the ball B_d will behave very differently than data generated from the cube C_d.
  • x ∼ B_d has ∥x∥₂² ≤ 1.
  • x ∼ C_d has E[∥x∥₂²] = ? Answer: d/3, and Pr[∥x∥₂² ≤ d/6] ≤ 2^{−Θ(d)}.
  • Almost all the volume of the unit cube falls in its corners, and these corners lie far outside the unit ball.
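A minimal sampling check (assuming NumPy; d and the sample count are illustrative): points drawn uniformly from C_d have squared norm concentrating around d/3, and essentially none of them land inside the unit ball.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 100_000  # illustrative values

X = rng.uniform(-1.0, 1.0, size=(n, d))   # uniform samples from the cube C_d
sq_norms = (X ** 2).sum(axis=1)

print("mean ||x||_2^2 (should be about d/3 =", d / 3, "):", sq_norms.mean())
print("fraction with ||x||_2^2 <= 1 (inside the unit ball):", np.mean(sq_norms <= 1.0))
```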

  19. Takeaways
  • High-dimensional space behaves very differently from low-dimensional space.
  • Random projection (i.e., the JL Lemma) reduces to a much lower-dimensional space that is still large enough to capture this behavior on a subset of n points.
  • Need to be careful when using low-dimensional intuition for high-dimensional vectors.
  • Need to be careful when modeling data as random vectors in high dimensions.
