COMPSCI 514: Algorithms for Data Science


1. COMPSCI 514: Algorithms for Data Science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 10.

2. Logistics.
• Problem Set 2 is due next Friday 10/11, although we will allow submissions until Sunday 10/13 at midnight with no penalty.
• Midterm on Thursday 10/17. Will cover material through today.

3. Summary.
Last Class: Dimensionality reduction.
• Applications and examples of dimensionality reduction in data science.
• Low-distortion embeddings (MinHash as an example).
• Low-distortion embeddings for Euclidean space and the Johnson-Lindenstrauss Lemma.
This Class: Finish the JL Lemma.
• Prove the Johnson-Lindenstrauss Lemma.
• Discuss algorithmic considerations, connections to other methods, etc.

4. Embeddings for Euclidean Space.
Low-Distortion Embedding for Euclidean Space: Given x_1, …, x_n ∈ R^d and error parameter ϵ ≥ 0, find x̃_1, …, x̃_n ∈ R^{d′} (where d′ ≪ d) such that for all i, j ∈ [n]:
(1 − ϵ) ∥x_i − x_j∥_2 ≤ ∥x̃_i − x̃_j∥_2 ≤ (1 + ϵ) ∥x_i − x_j∥_2.
If x_1, …, x_n lie in a k-dimensional subspace of R^d, we can project to d′ = k dimensions with no distortion. If they lie close to a k-dimensional subspace, we can project to k dimensions without much distortion (the idea behind PCA).
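The zero-distortion case is easy to verify numerically. Here is a minimal sketch (not from the lecture; all sizes and variable names are illustrative): points are generated inside a random k-dimensional subspace of R^d, projected onto an orthonormal basis for that subspace obtained via QR, and every pairwise distance is preserved exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 5, 50

# n points that lie exactly in a random k-dimensional subspace of R^d.
B = rng.standard_normal((d, k))        # columns span the subspace
X = rng.standard_normal((n, k)) @ B.T  # each row is a point in span(B)

# Orthonormalize the basis; the embedding is x_tilde = Q^T x (d' = k dims).
Q, _ = np.linalg.qr(B)                 # Q: d x k with orthonormal columns
X_tilde = X @ Q                        # rows are the compressed points

def pairwise_dists(A):
    """All pairwise Euclidean distances between rows of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Zero distortion, up to floating-point error.
print(np.allclose(pairwise_dists(X), pairwise_dists(X_tilde)))  # True
```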

5. The Johnson-Lindenstrauss Lemma.
Johnson-Lindenstrauss Lemma: Let Π ∈ R^{d′×d} have each entry chosen i.i.d. as (1/√d′) · N(0, 1). For any set of points x_1, …, x_n ∈ R^d and ϵ, δ > 0, if d′ = O(log(n/δ)/ϵ²), then letting x̃_i = Πx_i, with probability ≥ 1 − δ we have, for all i, j:
(1 − ϵ) ∥x_i − x_j∥_2 ≤ ∥x̃_i − x̃_j∥_2 ≤ (1 + ϵ) ∥x_i − x_j∥_2.
Surprising and powerful result.
• The construction of Π is simple, random, and data oblivious.
Notation: x_1, …, x_n: original data points (d dimensions); x̃_1, …, x̃_n: compressed data points (d′ < d dimensions); Π ∈ R^{d′×d}: random projection matrix (embedding function); ϵ: error of embedding; δ: failure probability.
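To make the statement concrete, a hedged sketch of the construction (the sizes, seed, and the choice d′ = 400 standing in for O(log(n/δ)/ϵ²) are assumptions, not values from the lecture): draw Π with i.i.d. N(0, 1/d′) entries, compress, and measure the worst multiplicative distortion over all pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 100, 10_000, 0.2
d_prime = 400  # stand-in for O(log(n/delta) / eps^2); the constant is assumed

X = rng.standard_normal((n, d))                            # rows are the x_i
Pi = rng.standard_normal((d_prime, d)) / np.sqrt(d_prime)  # entries i.i.d. N(0, 1/d')
X_tilde = X @ Pi.T                                         # x_tilde_i = Pi x_i

# Worst-case multiplicative distortion over all pairs.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.linalg.norm(X[i] - X[j])
        comp = np.linalg.norm(X_tilde[i] - X_tilde[j])
        worst = max(worst, abs(comp / orig - 1))
print(f"max distortion: {worst:.3f}")  # typically well below eps = 0.2
```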

6. Random Projection.
Π ∈ R^{d′×d} is a random matrix, i.e., a random function mapping length-d vectors to length-d′ vectors.
Notation: x_1, …, x_n: original points (d dims.); x̃_1, …, x̃_n: compressed points (d′ < d dims.); Π: random projection (embedding function); ϵ: error of embedding.

7. Connection to SimHash.
The compression operation is x̃_i = Πx_i, so for any j,
x̃_i(j) = ⟨Π(j), x_i⟩ = Σ_{k=1}^{d} Π(j, k) · x_i(k),
where Π(j) is a vector with independent random Gaussian entries. Points with high cosine similarity have similar random projections. Computing a length-d′ SimHash signature SH_1(x_i), …, SH_{d′}(x_i) is identical to computing x̃_i = Πx_i and then taking sign(x̃_i).
Notation: x_1, …, x_n: original points (d dims.); x̃_1, …, x̃_n: compressed points (d′ < d dims.); Π ∈ R^{d′×d}: random projection (embedding function).

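The equivalence in the last sentence is easy to see in code. A small sketch (the vectors and dimensions are made up for illustration): the signature sign(Πx) agrees between two vectors on roughly a 1 − θ/π fraction of its d′ entries, where θ is the angle between them, which is exactly the SimHash guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime = 1_000, 2_000  # many hashes so the agreement rate is visible

x = rng.standard_normal(d)
y = x + 0.5 * rng.standard_normal(d)  # a correlated second vector

Pi = rng.standard_normal((d_prime, d)) / np.sqrt(d_prime)
sig_x, sig_y = np.sign(Pi @ x), np.sign(Pi @ y)  # SimHash signatures

agree = np.mean(sig_x == sig_y)  # fraction of matching hash bits
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(f"empirical agreement: {agree:.3f}, predicted 1 - theta/pi: {1 - theta/np.pi:.3f}")
```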

8. Distributional JL.
The Johnson-Lindenstrauss Lemma is a direct consequence of a closely related lemma:
Distributional JL Lemma: Let Π ∈ R^{m×d} have each entry chosen i.i.d. as (1/√m) · N(0, 1). If we set m = O(log(1/δ)/ϵ²), then for any y ∈ R^d, with probability ≥ 1 − δ,
(1 − ϵ) ∥y∥_2 ≤ ∥Πy∥_2 ≤ (1 + ϵ) ∥y∥_2.
Applying a random matrix Π to any vector y preserves y's norm with high probability.
• Like a low-distortion embedding, but for the length of a single compressed vector rather than distances between vectors.
• Can be proven from first principles. We will see this next.
Notation: Π ∈ R^{m×d}: random projection matrix; d: original dimension; m: compressed dimension (analogous to d′); ϵ: embedding error; δ: embedding failure probability.
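A quick empirical check of the lemma, as a sketch (m = 200 and the other sizes are assumptions): fix one vector y, draw many independent copies of Π, and count how often ∥Πy∥_2 leaves the interval (1 ± ϵ)∥y∥_2.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, eps, trials = 2_000, 200, 0.2, 500

y = rng.standard_normal(d)
norm_y = np.linalg.norm(y)

failures = 0
for _ in range(trials):
    Pi = rng.standard_normal((m, d)) / np.sqrt(m)  # fresh projection each trial
    ratio = np.linalg.norm(Pi @ y) / norm_y
    failures += not (1 - eps <= ratio <= 1 + eps)

print(f"empirical failure rate: {failures / trials:.4f}")  # near 0 for these sizes
```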

9. Distributional JL ⇒ JL.
We show that the Distributional JL Lemma implies the JL Lemma. Distributional JL says that a random projection Π preserves the norm of any single vector y; the main JL Lemma says that Π preserves distances between vectors. Since Π is linear, these are the same thing!
Proof: Given x_1, …, x_n, define the (n choose 2) difference vectors y_ij = x_i − x_j.
• If we choose Π with m = O(log(1/δ′)/ϵ²), then for each y_ij, with probability ≥ 1 − δ′ we have:
(1 − ϵ) ∥y_ij∥_2 ≤ ∥Πy_ij∥_2 ≤ (1 + ϵ) ∥y_ij∥_2.
• Since Π is linear, Πy_ij = Πx_i − Πx_j = x̃_i − x̃_j, so this is exactly:
(1 − ϵ) ∥x_i − x_j∥_2 ≤ ∥x̃_i − x̃_j∥_2 ≤ (1 + ϵ) ∥x_i − x_j∥_2.
• Setting δ′ = δ/n² and union bounding over all (n choose 2) < n² pairs gives failure probability < δ overall, with m = O(log(n²/δ)/ϵ²) = O(log(n/δ)/ϵ²), exactly as in the JL Lemma. A numerical check of the linearity step appears below.
Notation: x_1, …, x_n: original points; x̃_1, …, x̃_n: compressed points; Π ∈ R^{m×d}: random projection matrix; d: original dimension; m: compressed dimension (analogous to d′); ϵ: embedding error; δ: embedding failure probability.
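The linearity step can be checked directly. A minimal sketch (all sizes are illustrative): applying Π to each difference x_i − x_j gives exactly the difference of the compressed points, so preserving the norms of the y_ij's is the same as preserving pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 500, 100

X = rng.standard_normal((n, d))
Pi = rng.standard_normal((m, d)) / np.sqrt(m)
X_tilde = X @ Pi.T  # compressed points x_tilde_i = Pi x_i

# Linearity: Pi (x_i - x_j) == Pi x_i - Pi x_j for every pair (i, j).
ok = all(
    np.allclose(Pi @ (X[i] - X[j]), X_tilde[i] - X_tilde[j])
    for i in range(n) for j in range(i + 1, n)
)
print(ok)  # True: the two views of the embedding coincide exactly
```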
