SLIDE 1
Sparser Johnson-Lindenstrauss Transforms
Jelani Nelson
Princeton
February 16, 2012
joint work with Daniel Kane (Stanford)
SLIDES 2-5 Random Projections
- x ∈ ℝ^d, d huge
- store y = Sx, where S is a k × d matrix (compression)
- compressed sensing (recover x from y when x is (near-)sparse)
- group-testing (as above, but Sx is Boolean multiplication)
- recover properties of x (entropy, heavy hitters, ...)
- approximate norm preservation (want ‖y‖₂ ≈ ‖x‖₂)
- motif discovery (slightly different; randomly project discrete x onto a subset of its coordinates) [Buhler-Tompa]
- In many of these applications, random S is either required or obtains better parameters than deterministic constructions.
SLIDES 6-7 Metric Johnson-Lindenstrauss lemma
Metric JL (MJL) Lemma, 1984
Every set of n points in Euclidean space can be embedded into O(ε^{-2} log n)-dimensional Euclidean space so that all pairwise distances are preserved up to a 1 ± ε factor.
Uses:
- Speed up geometric algorithms by first reducing the dimension of the input [Indyk-Motwani, 1998], [Indyk, 2001]
- Low-memory streaming algorithms for linear algebra problems [Sarlós, 2006], [LWMRT, 2007], [Clarkson-Woodruff, 2009]
- Essentially equivalent to RIP matrices from compressed sensing [Baraniuk et al., 2008], [Krahmer-Ward, 2011] (used for recovery of sparse signals)
SLIDES 8-10 How to prove the JL lemma
Distributional JL (DJL) Lemma
For any 0 < ε, δ < 1/2 there exists a distribution D_{ε,δ} on ℝ^{k×d} for k = O(ε^{-2} log(1/δ)) so that for any x of unit norm,
  Pr_{S∼D_{ε,δ}} [ | ‖Sx‖₂² − 1 | > ε ] < δ.
Proof of MJL: Set δ = 1/n² in DJL and let x be the difference vector of some pair of points. Union bound over the (n choose 2) pairs.
Theorem (Alon, 2003)
For every n, there exists a set of n points requiring target dimension k = Ω((ε^{-2}/log(1/ε)) log n).
Theorem (Jayram-Woodruff, 2011; Kane-Meka-N., 2011)
For DJL, k = Θ(ε^{-2} log(1/δ)) is optimal.
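Since the slide does not fix a particular D_{ε,δ}, the following is a minimal numeric sketch of the DJL ⇒ MJL argument using the Gaussian instantiation (i.i.d. N(0, 1/k) entries); the constant 8 in k is an illustrative assumption, not from the talk.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 30, 1000, 0.5
delta = 1.0 / n**2                        # per-pair failure probability for DJL
k = int(8 / eps**2 * np.log(1 / delta))   # k = O(eps^-2 log(1/delta)); constant illustrative

X = rng.normal(size=(n, d))               # n arbitrary points in R^d
S = rng.normal(scale=1 / np.sqrt(k), size=(k, d))  # Gaussian instantiation of D_{eps,delta}

# Union bound over the (n choose 2) difference vectors:
ok = all(
    abs(np.linalg.norm(S @ (X[a] - X[b]))**2 / np.linalg.norm(X[a] - X[b])**2 - 1) <= eps
    for a, b in itertools.combinations(range(n), 2)
)
print(ok)  # True except with probability <= (n choose 2) * delta < 1/2
```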
SLIDES 11-12 Proving the JL lemma
Older proofs
- [Johnson-Lindenstrauss, 1984], [Frankl-Maehara, 1988]: Random rotation, then projection onto first k coordinates.
- [Indyk-Motwani, 1998], [Dasgupta-Gupta, 2003]: Random matrix with independent Gaussian entries.
- [Achlioptas, 2001]: Independent ±1 entries.
- [Clarkson-Woodruff, 2009]: O(log(1/δ))-wise independent ±1 entries.
- [Arriaga-Vempala, 1999], [Matousek, 2008]: Independent entries having mean 0, variance 1/k, and subgaussian tails.
Downside: Performing the embedding is dense matrix-vector multiplication, O(k · ‖x‖₀) time.
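To make the downside concrete, here is a small sketch with the dense ±1 instantiation (sizes illustrative): even when x is sparse, computing Sx costs k multiplications per nonzero of x.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20_000, 300
S = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)  # dense +-1 construction

x = np.zeros(d)
nz = rng.choice(d, size=20, replace=False)   # ||x||_0 = 20
x[nz] = rng.normal(size=nz.size)

y = S[:, nz] @ x[nz]          # best case: k * ||x||_0 multiplications
print(np.allclose(y, S @ x))  # True: same embedding, but still k ops per nonzero
```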
SLIDES 13-14 Fast JL Transforms
- [Ailon-Chazelle, 2006]: x → PHDx, O(d log d + k³) time. P is a random sparse matrix, H is Hadamard, D has random ±1 on the diagonal.
- [Ailon-Liberty, 2008]: O(d log k + k²) time, also based on the fast Hadamard transform.
- [Ailon-Liberty, 2011] and [Krahmer-Ward, 2011]: O(d log d) for MJL, but with suboptimal k = O(ε^{-2} log n log⁴ d).
Downside: Slow to embed sparse vectors: running time is Ω(min{k · ‖x‖₀, d log d}).
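A minimal sketch of the x → PHDx recipe, with the simplifying assumption that P is plain row sampling rather than the sparse P of [Ailon-Chazelle, 2006]; it illustrates the O(d log d) Walsh-Hadamard step, not the exact distributions from these papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform, O(d log d); len(a) must be a power of 2."""
    a = a.copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            u, v = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = u + v, u - v
        h *= 2
    return a

d, k = 2**12, 256
x = rng.normal(size=d)
x /= np.linalg.norm(x)
D = rng.choice([-1.0, 1.0], size=d)           # random signs on the diagonal
rows = rng.choice(d, size=k, replace=False)   # simplified "P": sample k rows of H
y = np.sqrt(d / k) * (fwht(D * x) / np.sqrt(d))[rows]
print(np.linalg.norm(y)**2)                   # close to 1 = ||x||_2^2
```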
SLIDE 15 Where Do Sparse Vectors Show Up?
- Document as bag of words: x_i = number of occurrences of word i. Compare documents using cosine similarity. d = lexicon size; most documents aren't dictionaries.
- Network traffic: x_{i,j} = #bytes sent from i to j. d = 2^64 (2^256 in IPv6); most servers don't talk to each other.
- User ratings: x_i is a user's score for movie i on Netflix. d = #movies; most people haven't rated all movies.
- Streaming: x receives a stream of updates of the form "add v to x_i". Maintaining Sx requires calculating v · Se_i.
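A minimal sketch of the streaming point, using a sparse S of the block form defined later in the talk (sizes illustrative; in practice h and σ would be hash functions rather than stored arrays): each update "add v to x_i" touches only the s nonzeros of S e_i.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, q = 10**5, 8, 128                        # q = k/s, so k = 1024 rows
h = rng.integers(0, q, size=(s, d))            # hash location per (block, coordinate)
sigma = rng.choice([-1.0, 1.0], size=(s, d))   # sign per (block, coordinate)
y = np.zeros(s * q)                            # maintained sketch y = Sx

def update(i, v):
    """Process "add v to x_i" by adding v * S e_i to y: O(s) work."""
    for r in range(s):
        y[r * q + h[r, i]] += v * sigma[r, i] / np.sqrt(s)

for i, v in [(3, 1.0), (17, -2.5), (3, 0.5)]:  # a toy update stream
    update(i, v)
print(np.linalg.norm(y)**2)                    # approximates ||x||_2^2 = 1.5^2 + 2.5^2
```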
SLIDE 17
Sparse JL transforms
One way to embed sparse vectors faster: use sparse matrices. s = #non-zero entries per column in embedding matrix (so embedding time is s · x0) reference value of s type [JL84], [FM88], [IM98], . . . k ≈ 4ε−2 log(1/δ) dense [Achlioptas01] k/3 sparse Bernoulli [WDALS09] no proof hashing [DKS10] ˜ O(ε−1 log3(1/δ)) hashing [KN10a], [BOR10] ˜ O(ε−1 log2(1/δ)) ” [KN12] O(ε−1 log(1/δ)) hashing (random codes)
SLIDES 18-19 Other related work
- CountSketch of [Charikar-Chen-FarachColton] gives s = O(log(1/δ)) (see [Thorup-Zhang])
- Can recover (1 ± ε)‖x‖₂² from Sx, but not as ‖Sx‖₂² (not an embedding into ℓ₂)
- Not applicable in certain situations, e.g. in some nearest neighbor data structures, and when learning classifiers over projected vectors via stochastic gradient descent
SLIDES 20-22 Sparse JL Constructions
[DKS, 2010]: s = Θ̃(ε^{-1} log²(1/δ))
[this work, graph construction]: s = Θ(ε^{-1} log(1/δ))
[this work, block construction with blocks of size k/s]: s = Θ(ε^{-1} log(1/δ))
SLIDE 23 Sparse JL Constructions (in matrix form)
[Figure: the two constructions drawn as k × d matrices; in the second, the k rows are partitioned into blocks of height k/s.]
Each black cell is ±1/√s at random.
SLIDE 24 Sparse JL Constructions (nicknames)
"Graph" construction: s nonzero cells per column, anywhere among the k rows.
"Block" construction: one nonzero cell per column in each of the s blocks of height k/s.
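A minimal NumPy sketch of the two constructions as k × d matrices (sizes illustrative): the "graph" version places s signed cells per column anywhere among the k rows; the "block" version places one signed cell per column in each of the s blocks of height k/s.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, s = 50, 24, 4                            # k/s = 6

def graph_construction():
    S = np.zeros((k, d))
    for j in range(d):
        rows = rng.choice(k, size=s, replace=False)            # s distinct rows
        S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

def block_construction():
    S = np.zeros((k, d))
    for j in range(d):
        for r in range(s):                                     # one cell per block
            row = r * (k // s) + rng.integers(k // s)
            S[row, j] = rng.choice([-1.0, 1.0]) / np.sqrt(s)
    return S

for S in (graph_construction(), block_construction()):
    print((S != 0).sum(axis=0))   # every column has exactly s nonzeros
```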
SLIDES 25-27 Sparse JL notation (block construction)
- Let h(j, r), σ(j, r) be the random hash location and random sign for the copy of x_j in the rth block.
- The ith entry of the rth block of Sx is (1/√s) · Σ_{j : h(j,r)=i} x_j · σ(j, r).
- ‖Sx‖₂² = ‖x‖₂² + (1/s) · Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) · 1_{h(i,r)=h(j,r)}
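A small numeric check of the expansion above, materializing S from h and σ and comparing ‖Sx‖₂² against ‖x‖₂² plus the cross terms (sizes illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, q = 40, 5, 8                     # q = k/s, k = 40
h = rng.integers(0, q, size=(s, d))
sigma = rng.choice([-1.0, 1.0], size=(s, d))
x = rng.normal(size=d)

S = np.zeros((s * q, d))               # materialize S from h and sigma
for r in range(s):
    S[r * q + h[r], np.arange(d)] = sigma[r] / np.sqrt(s)
lhs = np.linalg.norm(S @ x)**2

cross = sum(x[i] * x[j] * sigma[r, i] * sigma[r, j]
            for r in range(s) for i in range(d) for j in range(d)
            if i != j and h[r, i] == h[r, j])
print(np.isclose(lhs, x @ x + cross / s))  # True
```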
SLIDES 28-29 Sparse JL via Codes
[Figure: the two constructions as k × d matrices, as before.]
- Graph construction: constant-weight binary code of weight s (each column is a codeword indicating its s nonzero rows).
- Block construction: code over a q-ary alphabet, q = k/s (each column lists its hash location in each of the s blocks).
- Thm: Just need distance s − O(s²/k) (block construction); 2s − O(s²/k) for the graph construction.
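A sketch of the code view for the block construction: each column is a length-s codeword over the alphabet [k/s], and "good code" means every pair of codewords agrees on only O(s²/k) coordinates. Drawing the code at random, as below, is exactly what the Chernoff + union bound on a later slide analyzes (parameters illustrative).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
d, s, q = 200, 16, 64                     # q = k/s; expected agreements s/q = s^2/k
code = rng.integers(0, q, size=(d, s))    # one random codeword (column) per coordinate

worst = max((code[a] == code[b]).sum() for a, b in combinations(range(d), 2))
print(worst, "worst pairwise agreements; s^2/k =", s**2 / (s * q))
```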
SLIDES 30-31 Analysis (block construction)
- η_{i,j,r} indicates whether i and j collide in the rth block.
- ‖Sx‖₂² = ‖x‖₂² + Z, where
  Z = (1/s) Σ_r Z_r,   Z_r = Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r}
- Z is a quadratic form in σ, so apply known moment bounds for quadratic forms.
SLIDE 32 Analysis
Theorem (Hanson-Wright, 1971)
σ₁, ..., σ_n independent ±1, B ∈ ℝ^{n×n} symmetric. For λ > 0,
  Pr[ | σᵀBσ − trace(B) | > λ ] < e^{−C·min{λ²/‖B‖²_F, λ/‖B‖₂}}
Reminder: ‖B‖²_F = Σ_{i,j} B²_{i,j}; ‖B‖₂ is the largest magnitude of an eigenvalue of B.
SLIDES 33-34 Analysis
  Z = (1/s) · Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r} = σᵀBσ,
where B is block diagonal:
  B = (1/s) · diag(B₁, B₂, ..., B_s)
SLIDES 35-37 Frobenius norm bound
  B = (1/s) · diag(B₁, ..., B_s)
  ‖B‖²_F = (1/s²) · Σ_{i≠j} x_i² x_j² · (#times i, j collide) ≤ O(1/k) · ‖x‖₂⁴ = O(1/k)   (good code!)
SLIDES 38-42 Operator norm bound

  B_r = (1/s) · [ x₁²              x₁x₂η_{1,2,r}   ⋯   x₁x_dη_{1,d,r}
                  x₂x₁η_{2,1,r}    x₂²             ⋯   x₂x_dη_{2,d,r}
                  ⋮                 ⋮               ⋱   ⋮
                  x_dx₁η_{d,1,r}   ⋯   x_dx_{d−1}η_{d,d−1,r}   x_d² ]
        − (1/s) · diag(x₁², x₂², ..., x_d²)
      = (1/s) · (S_r − D)

- ‖D‖₂ = ‖x‖²_∞ ≤ 1
- S_r = Σ_{i=1}^{k/s} u_{r,i} u_{r,i}ᵀ, where u_{r,i} is the projection of x onto the coordinates hashing to i in the rth block; the u_{r,i} are the eigenvectors of S_r, so ‖S_r‖₂ = max_i ‖u_{r,i}‖₂² ≤ ‖x‖₂² = 1
- ‖B_r‖₂ ≤ (1/s) · max{‖S_r‖₂, ‖D‖₂} ≤ 1/s
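A numeric check of both norm bounds on the block-diagonal B, with a random h rather than a code, so the Frobenius bound holds on average (sizes illustrative; note the zeroed diagonal also gives trace(B) = 0, as needed for Hanson-Wright).

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, q = 30, 6, 10                        # k = s*q = 60
h = rng.integers(0, q, size=(s, d))
sigma = rng.choice([-1.0, 1.0], size=(s, d))
x = rng.normal(size=d)
x /= np.linalg.norm(x)

B = np.zeros((s * d, s * d))               # B = (1/s) diag(B_1, ..., B_s)
for r in range(s):
    eta = (h[r][:, None] == h[r][None, :]).astype(float)   # collision indicators
    Br = np.outer(x, x) * eta
    np.fill_diagonal(Br, 0.0)              # only i != j terms appear in Z
    B[r * d:(r + 1) * d, r * d:(r + 1) * d] = Br / s

print((B**2).sum(), "~ O(1/k) =", 1 / (s * q))            # ||B||_F^2
print(np.abs(np.linalg.eigvalsh(B)).max(), "<=", 1 / s)   # ||B||_2 <= 1/s
```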
SLIDES 43-44 Wrapping up analysis
- ‖B‖₂ ≤ 1/s, ‖B‖²_F = O(1/k)
- Apply Hanson-Wright: Pr[|Z| > ε] < e^{−C′·min{ε²k, εs}}
- Take k = Ω(log(1/δ)/ε²) and s = Ω(log(1/δ)/ε). QED
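A Monte Carlo sanity check of the final guarantee, with illustrative constants in k and s: the empirical failure probability Pr[|‖Sx‖₂² − 1| > ε] should be small, on the order of δ (only a sanity check, since the constant C′ is unspecified).

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, delta = 500, 0.25, 0.01
s = int(np.ceil(2 * np.log(1 / delta) / eps))           # s = Theta(eps^-1 log(1/delta))
q = int(np.ceil(2 * np.log(1 / delta) / (eps**2 * s)))  # k = s*q = Theta(eps^-2 log(1/delta))
x = rng.normal(size=d)
x /= np.linalg.norm(x)

trials, fails = 2000, 0
for _ in range(trials):
    h = rng.integers(0, q, size=(s, d))
    sigma = rng.choice([-1.0, 1.0], size=(s, d))
    norm2 = 0.0
    for r in range(s):                    # ||Sx||^2, block by block, without forming S
        z = np.bincount(h[r], weights=sigma[r] * x, minlength=q)
        norm2 += z @ z
    fails += abs(norm2 / s - 1) > eps
print(fails / trials, "vs delta =", delta)
```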
SLIDES 45-48 Code-based Construction: Caveat
Need a sufficiently good code.
- Each pair of codewords should agree on O(s²/k) coordinates.
- Can get this with a random code by Chernoff + union bound over pairs, but then we need s²/k ≥ log(d/δ), i.e. s ≥ ε^{-1}·√(log(d/δ) · log(1/δ)).
- Slightly better: can assume d = O(ε^{-2}/δ) by first embedding into this dimension with s = 1 (analysis: Chebyshev's inequality) ⇒ can get away with s = O(ε^{-1}·√(log(1/(εδ)) · log(1/δ))).
Can we avoid the loss incurred by this union bound?
SLIDES 49-52 Improving the Construction
- Pick h at random.
- Analysis: directly bound the ℓ = log(1/δ)th moment of the error term Z, then apply Markov's inequality: Pr[|Z| > ε] < ε^{−ℓ} · E[Z^ℓ]. (Conditioning on the event "h gives a code" is too demanding.)
- Z_r = Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r}   (Z = (1/s) Σ_{r=1}^{s} Z_r)
- E_{h,σ}[Z^ℓ] = (1/s^ℓ) · Σ_{r₁<⋯<r_n} Σ_{t₁+⋯+t_n=ℓ, each tᵢ>1} (ℓ choose t₁, ..., t_n) · Π_{i=1}^{n} E_{h,σ}[Z_{rᵢ}^{tᵢ}]
- Bound the tth moment of any Z_r, then get the ℓth moment bound for Z by plugging into the above.
SLIDES 53-55 Bounding E[Z_r^t]
  Z_r = Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r}
- Monomials appearing in the expansion of Z_r^t are in correspondence with directed multigraphs, e.g.
  (x₁x₂) · (x₃x₄) · (x₃x₈) · (x₄x₈) · (x₂x₁₀) → [figure: the corresponding directed multigraph on 5 vertices]
- A monomial contributes to the expectation iff all degrees are even.
- Analysis: group the monomials appearing in Z_r^t according to graph isomorphism class, then do some combinatorics.
- Our analysis is tight up to a constant factor.
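A tiny brute-force check of the "all degrees even" rule: averaging over all 2^d sign patterns, a monomial of σ's survives iff every index appears an even number of times (d = 4 here for speed).

```python
import itertools

d = 4
def expected_sign_product(indices):
    """E_sigma[prod_u sigma_{i_u}] over uniform sigma in {-1,+1}^d."""
    total = 0
    for signs in itertools.product([-1, 1], repeat=d):
        p = 1
        for i in indices:
            p *= signs[i]
        total += p
    return total / 2**d

print(expected_sign_product([0, 1, 0, 1]))  # 1.0: every index has even degree
print(expected_sign_product([0, 1, 2, 1]))  # 0.0: indices 0 and 2 have odd degree
```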
SLIDES 56-60 Bounding E[Z_r^t]
m = #connected components, v = #vertices, d_u = degree of u.

  E_{h,σ}[Z_r^t] = Σ_G  Σ_{f((i_u,j_u)_{u=1}^t) = G}  E[ Π_{u=1}^{t} η_{i_u,j_u,r} ] · Π_{u=1}^{t} x_{i_u} x_{j_u}
               = Σ_G (s/k)^{v−m} · Σ_{f((i_u,j_u)_{u=1}^t) = G} Π_{u=1}^{t} x_{i_u} x_{j_u}
               ≤ Σ_G (s/k)^{v−m} · v! · (t choose d₁/2, ..., d_v/2)^{−1}
               ≤ 2^{O(t)} · Σ_{v,m} t^{−t} v^v (s/k)^{v−m} · Σ_G Π_u d_u^{d_u/2}
SLIDE 61 Bounding E[Z_r^t]
  E[Z_r^t] ≤ 2^{O(t)} · { 1                     if t < log(k/s)
                          (t / log(k/s))^t      otherwise }
- Plug this into the expression for E[Z^ℓ]. QED
SLIDES 62-63 Open Problems
- OPEN: Devise a distribution which can be sampled using few random bits.
  Current record: O(log d + log(1/ε) log(1/δ) + log(1/δ) log log(1/δ)) [Kane-Meka-N., 2011]
  Existential: O(log d + log(1/δ))
- OPEN: Prove a tight lower bound on the achievable sparsity in a JL distribution.
- OPEN: Can we have a JL matrix such that we can multiply by any k × k submatrix in k · polylog(d) time? (ultimate goal)
- OPEN: Embed any vector in Õ(d) time into the optimal k.