sketching as a tool for algorithmic design



  1. Sketching as a tool for Algorithmic Design Alex Andoni (Columbia University)

  2. Find similar pairs

  3. Methodology: Sketching, for small-space algorithms and fast algorithms
     • Sketching (dimension reduction) is:
       - compression
       - lossy
       - good for a specific task
     [Figure: lists of 6-bit strings and their shorter, approximately equal sketches]
     • Dimension reduction [Johnson-Lindenstrauss'84]: a linear map $S: \mathbb{R}^n \to \mathbb{R}^k$ s.t. for any points $p, q \in \mathbb{R}^n$:
       $\Pr_S\left[ \frac{\|S(p) - S(q)\|}{\|p - q\|} \in 1 \pm \varepsilon \right] \ge 1 - \delta$, with $k = O\left(\frac{1}{\varepsilon^2} \log \frac{1}{\delta}\right)$
       - $S$ = Gaussian matrix
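The Johnson-Lindenstrauss statement above can be tried out numerically. The following is a minimal sketch, assuming a dense Gaussian matrix with i.i.d. $N(0, 1/k)$ entries; the helper names (`gaussian_sketch`, `apply_sketch`) and the sizes `n`, `k` are illustrative choices, not taken from the slides.

```python
import math
import random

def gaussian_sketch(n, k, rng):
    """Return a k x n matrix (list of rows) with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]

def apply_sketch(S, x):
    """Matrix-vector product S x."""
    return [sum(s * v for s, v in zip(row, x)) for row in S]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

rng = random.Random(0)
n, k = 1000, 200          # assumed sizes for illustration
S = gaussian_sketch(n, k, rng)
p = [rng.uniform(-1, 1) for _ in range(n)]
q = [rng.uniform(-1, 1) for _ in range(n)]

# S is linear, so S(p) - S(q) = S(p - q).
diff = [a - b for a, b in zip(p, q)]
ratio = norm(apply_sketch(S, diff)) / norm(diff)
print(f"||S(p) - S(q)|| / ||p - q|| = {ratio:.3f}")
```

With these sizes the printed ratio should be close to 1; shrinking `k` makes the distortion visibly larger.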

  4. Plan: a sketch of sketching applications…
     • Numerical Linear Algebra
     • Nearest Neighbor Search
     • Min-cost matching in plane

  5. Plan
     • Numerical Linear Algebra
       - the power of linear sketches
     • Nearest Neighbor Search
     • Min-cost matching in plane

  6. Numerical Linear Algebra
     • Problem: Least Squares Regression
       - $x^* = \mathrm{argmin}_x \|Ax - b\|$
       - where $A$ is an $n \times d$ matrix
       - $n \gg d$
       - $1 + \varepsilon$ approximation
     • Idea: Sketch-And-Solve
       - solve $x' = \mathrm{argmin}_x \|S \cdot (Ax - b)\| = \mathrm{argmin}_x \|SAx - Sb\|$
       - where $S: \mathbb{R}^n \to \mathbb{R}^k$ is a dimension-reducing matrix
       - reduces to a much smaller $k \times d$ problem
     • Hope: $\|Ax' - b\| \le (1 + \varepsilon) \|Ax^* - b\|$
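The Sketch-And-Solve idea can be illustrated end to end on a small instance. This is a minimal sketch under stated assumptions: $S$ is a dense Gaussian matrix, the least-squares problems are solved through the normal equations (adequate only for small, well-conditioned inputs), and all sizes and helper names are hypothetical.

```python
import math
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def solve_least_squares(A, b):
    """argmin_x ||Ax - b||_2 via the normal equations A^T A x = A^T b."""
    d = len(A[0])
    At = list(zip(*A))
    # Augmented system [A^T A | A^T b]
    M = [[sum(u * v for u, v in zip(At[i], At[j])) for j in range(d)]
         + [sum(u * v for u, v in zip(At[i], b))] for i in range(d)]
    for col in range(d):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(d):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]

def residual(A, x, b):
    """||Ax - b||_2"""
    return math.sqrt(sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
                         for row, bi in zip(A, b)))

rng = random.Random(1)
n, d, k = 400, 3, 60      # assumed sizes: n >> d, and k between d and n
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
x_true = [1.0, -2.0, 0.5]
b = [sum(a * t for a, t in zip(row, x_true)) + rng.gauss(0, 0.5) for row in A]

# Dense Gaussian sketch S (k x n); then form the small k x d problem (SA, Sb).
S = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(n)] for _ in range(k)]
SA = matmul(S, A)
Sb = [row[0] for row in matmul(S, [[bi] for bi in b])]

x_star = solve_least_squares(A, b)        # exact solution of the n x d problem
x_sketch = solve_least_squares(SA, Sb)    # solution of the sketched problem
cost_star = residual(A, x_star, b)
cost_sketch = residual(A, x_sketch, b)
print(f"optimal cost {cost_star:.4f}, sketched-solution cost {cost_sketch:.4f}")
```

The sketched solution is evaluated on the original problem, so its cost can never fall below the optimum; the hope is that it exceeds it only slightly.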

  7. Sketch-And-Solve [S'06, CW'13, NN'13, MM'13, C'16]
     • Oblivious Subspace Embedding: a linear map $S: \mathbb{R}^n \to \mathbb{R}^k$, with $k \sim d$, s.t. for any linear subspace $P \subset \mathbb{R}^n$ of dimension $d$:
       $\Pr_S\left[ \forall p \in P: \frac{\|S(p)\|}{\|p\|} \in 1 \pm \varepsilon \right] \ge 1 - \delta$
     • Issue: time to compute the sketch
       - When $S$ = Gaussian ([JL]) ⇒ computing $SA$ takes $O(n \cdot d^2)$ time: slower than the original problem!
     • Idea: structured $S$ s.t. $SA$ can be computed faster
       - +structured $S$: $O\left(\mathrm{nnz}(A) + (d/\varepsilon)^{O(1)}\right)$ time
       - +preconditioner: $O\left(\left(\mathrm{nnz}(A) + d^{O(1)}\right) \cdot \log \frac{1}{\varepsilon}\right)$
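To make "structured $S$" concrete, here is a minimal sketch of one commonly used sparse construction, a CountSketch-style matrix in which every column of $S$ has a single random $\pm 1$ entry. The slide does not say which structured matrix is meant, so this choice, the triple format for the nonzeros of $A$, and the function name are assumptions. The point of the example is that $SA$ is computed in one pass over the nonzeros of $A$.

```python
import random

def countsketch_apply(entries, n, d, k, rng):
    """Compute S A for a CountSketch-style S (k x n): column i of S has one
    nonzero, sign[i], placed in row bucket[i]. `entries` lists the nonzeros
    of A as (row, col, value) triples, so the loop costs O(nnz(A))."""
    bucket = [rng.randrange(k) for _ in range(n)]
    sign = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    SA = [[0.0] * d for _ in range(k)]
    for i, j, v in entries:
        SA[bucket[i]][j] += sign[i] * v
    return SA, bucket, sign

rng = random.Random(2)
n, d, k = 50, 4, 8        # assumed sizes for illustration
# A sparse A: roughly 10% of the entries are nonzero.
entries = [(i, j, rng.gauss(0, 1))
           for i in range(n) for j in range(d) if rng.random() < 0.1]
SA, bucket, sign = countsketch_apply(entries, n, d, k, rng)

# Cross-check against the dense product S A built explicitly.
A = [[0.0] * d for _ in range(n)]
for i, j, v in entries:
    A[i][j] = v
S = [[sign[i] if bucket[i] == r else 0.0 for i in range(n)] for r in range(k)]
dense = [[sum(S[r][i] * A[i][j] for i in range(n)) for j in range(d)]
         for r in range(k)]
max_err = max(abs(SA[r][j] - dense[r][j]) for r in range(k) for j in range(d))
print(f"max difference from the dense product: {max_err:.2e}")
```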

  8. $\ell_1$ regression
     • No similar dimension reduction in $\ell_1$ [BC'04, JN'09]
     • Weak DR [I'00]: a linear map $S: \mathbb{R}^n \to \mathbb{R}^k$ s.t. for any $p \in \mathbb{R}^n$:
       $\Pr_S\left[ 1 \le \frac{\|S(p)\|_1}{\|p\|_1} \le \frac{1}{\delta} \right] \ge 1 - O(\delta)$
       - $S_{ij} \sim$ Cauchy distribution, or 1/Exponential
     • Weak(er) OSE [SW'11, MM'13, WZ'13, WW'18]: a linear map $S: \mathbb{R}^n \to \mathbb{R}^k$, with $k = O(d \cdot \log d)$, s.t. for any linear subspace $P \subset \mathbb{R}^n$ of dimension $d$:
       $\Pr_S\left[ \forall p \in P: 1 \le \frac{\|S(p)\|_1}{\|p\|_1} \le d^{O(1)} \right] \ge 0.9$
       - +structured $S$, +preconditioner: $O\left(\mathrm{nnz}(A) \cdot \log n + (d/\varepsilon)^{O(1)}\right)$
     • More: other norms ($\ell_p$, M-estimator, Orlicz norms), low-rank approximation & optimization, matrix multiplication, see [Woodruff, FnTTCS'14,…]
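The role of the Cauchy distribution can be seen in a small experiment. Because the Cauchy distribution is 1-stable, each coordinate of $S(p)$ is distributed as $\|p\|_1$ times a standard Cauchy variable. The sketch below uses that fact with a median estimator, which is a different readout from the $\|S(p)\|_1$ ratio on the slide and is chosen here only because it concentrates well; the sizes and helper names are assumptions.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy variate via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def cauchy_matrix(n, k, rng):
    """k x n matrix with i.i.d. standard Cauchy entries."""
    return [[cauchy(rng) for _ in range(n)] for _ in range(k)]

def apply_sketch(S, p):
    return [sum(s * x for s, x in zip(row, p)) for row in S]

def estimate_l1(sketch):
    """Median of |(Sp)_i|. Each coordinate is ||p||_1 times a standard Cauchy,
    and the median of the absolute value of a standard Cauchy is 1."""
    return statistics.median(abs(v) for v in sketch)

rng = random.Random(3)
n, k = 40, 2001           # assumed sizes; k is large to make the demo stable
S = cauchy_matrix(n, k, rng)
p = [rng.uniform(-1, 1) for _ in range(n)]
true_l1 = sum(abs(x) for x in p)
est = estimate_l1(apply_sketch(S, p))
print(f"true ||p||_1 = {true_l1:.3f}, median estimate = {est:.3f}")
```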

  9. Plan
     • Numerical Linear Algebra
     • Nearest Neighbor Search
       - ultra-small sketches
     • Min-cost matching in plane

  10. Approximate Near Neighbor Search
     • Preprocess: a set $P$ of $N$ points
       - approximation $c > 1$
     • Query: given a query point $q$, report a point $p^* \in P$ with the smallest distance to $q$
       - up to factor $c$
     • Near neighbor: threshold $r$
     • Parameters: space & query time
     [Figure: query $q$ with radii $r$ and $cr$, and points $p^*$ and $p'$]

  11. Ultra-small sketches
     • $(c, \delta, k)$-Distance Estimation Sketch (DE sketch): for approximation $c$ and all thresholds $r$, a map $S: \mathbb{R}^d \to \{0,1\}^k$ and an estimator $E(\cdot,\cdot)$ s.t. for any $p, q \in \mathbb{R}^d$:
       - if $\|p - q\| \le r$, then $\Pr_S\left[ E(S(p), S(q)) = \text{"close"} \right] \ge 1 - \delta$
       - if $\|p - q\| > cr$, then $\Pr_S\left[ E(S(p), S(q)) = \text{"close"} \right] \le \delta$
     • [KOR'98, IM'98]: $\ell_2$, $\ell_1$ have $\left(1 + \varepsilon, 0.1, O\left(\frac{1}{\varepsilon^2}\right)\right)$-DE sketches: const # of bits!
     • Via: bit sampling (Hamming),
     • or discretizing dimension reduction
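A small experiment shows how a sketch of this kind can behave for Hamming distance. The construction below is an illustrative variant of bit sampling in which each sketch bit is the XOR of a random subset of coordinates, each coordinate included with probability $1/(2r)$; it is written in the spirit of the cited results rather than as a faithful copy of them. The threshold rule, the "far" label, and all parameter values are assumptions made for the demo, and `k` is set large so that the outcome is stable.

```python
import random

def make_parity_sketch(d, r, k, rng):
    """k random subsets of {0..d-1}; each coordinate joins a subset w.p. 1/(2r)."""
    prob = 1.0 / (2 * r)
    return [[i for i in range(d) if rng.random() < prob] for _ in range(k)]

def sketch(subsets, x):
    """k-bit sketch: bit j is the parity (XOR) of x restricted to subset j."""
    return [sum(x[i] for i in s) % 2 for s in subsets]

def estimator(sk_p, sk_q, r, c):
    """Say "close" when few sketch bits differ. At Hamming distance D a sketch
    bit differs w.p. (1 - (1 - 1/r)^D) / 2, so the threshold is set halfway
    between the values at D = r and D = c*r."""
    k = len(sk_p)
    frac = lambda D: (1 - (1 - 1 / r) ** D) / 2
    threshold = k * (frac(r) + frac(c * r)) / 2
    differing = sum(a != b for a, b in zip(sk_p, sk_q))
    return "close" if differing <= threshold else "far"

def flip(x, coords):
    y = list(x)
    for i in coords:
        y[i] ^= 1
    return y

rng = random.Random(4)
d, r, c, k = 300, 20, 2, 1000     # assumed parameters for illustration
subsets = make_parity_sketch(d, r, k, rng)
p = [rng.randrange(2) for _ in range(d)]
q_close = flip(p, rng.sample(range(d), 10))   # distance 10 <= r
q_far = flip(p, rng.sample(range(d), 80))     # distance 80 > c*r

sp = sketch(subsets, p)
print(estimator(sp, sketch(subsets, q_close), r, c))
print(estimator(sp, sketch(subsets, q_far), r, c))
```

The sketch length needed depends only on the gap between the two collision fractions, not on the dimension `d`, which is the sense in which such sketches are "ultra-small".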

  12. DE Sketch => NNS
     • [KOR'98, IM'98]: $(c, 1/3, k)$-DES imply $c$-approx NNS with space $N^{O(k)}$ and 1 memory probe per query
     • Proof: construct a sketch with failure probability $1/N$
       - by concatenating $O(\log N)$ i.i.d. copies of the sketch, and taking a majority vote
       - Data structure: a look-up table for all possible sketches of a query: $2^{O(k \cdot \log N)} = N^{O(k)}$ possibilities only
     • Const size DES => NNS with polynomial space!
     • Query time: computing the sketch, typically $\sim O(kd \log N)$ [see also AC'06]
     • [AK+ANNRW'18]: $(c, 0.1, k)$-DES implies NNS with $O(ck)$-approximation and $O(N^{1.1})$ space, $O(N^{0.1})$ memory probes per query
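The amplification step in the proof can be checked with exact arithmetic. The sketch below computes the probability that a majority vote over `t` independent copies is wrong when each copy errs with probability 1/3, and finds how many copies bring this below $1/N$. The function names are illustrative; the comparison bound $18 \ln N + 2$ comes from Hoeffding's inequality and is one convenient constant, not the one used in the cited papers.

```python
import math

def majority_failure(delta, t):
    """Probability that at least half of t independent copies fail,
    when each copy fails independently with probability delta."""
    return sum(math.comb(t, j) * delta ** j * (1 - delta) ** (t - j)
               for j in range(math.ceil(t / 2), t + 1))

def copies_needed(delta, target):
    """Smallest odd t whose majority vote fails with probability <= target."""
    t = 1
    while majority_failure(delta, t) > target:
        t += 2  # odd t avoids ties
    return t

for N in (10 ** 3, 10 ** 6):
    t = copies_needed(1 / 3, 1 / N)
    print(f"N = {N}: {t} copies suffice (18 ln N + 2 = {18 * math.log(N) + 2:.0f})")
```

Since the number of copies grows like $\log N$, the concatenated sketch has $O(k \log N)$ bits, which is where the $2^{O(k \cdot \log N)} = N^{O(k)}$ table size comes from.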

  13. Beyond $\ell_1$ and $\ell_2$
     • $\alpha$-embedding of a metric $X$ into $\ell_1$, for distortion $D$ and power $\alpha \ge 1$: a map $f: X \to \ell_1$ s.t. for any $p, q \in X$:
       $\|f(p) - f(q)\|^{\alpha} \le \mathrm{dist}_X(p, q) \le D \cdot \|f(p) - f(q)\|^{\alpha}$
     • Embedding with $D = c$ ⇒ $(O(c), 0.1, O(1))$-DES ⇒ NNS
     • [AKR'15]: when $X$ is a norm: $(O(c), 0.1, k)$-DES ⇒ embedding with $D = O(ck)$
       - OPEN: if $\alpha = 1$ achievable
       - Not true for general $X$ [KN]

  14. NNS with smaller space?
     • Space closer to linear in $N$?
     • $(c, \rho, k)$-LSH sketch: for approximation $c$ and $\forall$ thresholds $r$, a map $S: \mathbb{R}^d \to \{0,1\}^k$ and an estimator $E(\cdot,\cdot)$ s.t. for any $p, q \in \mathbb{R}^d$:
       - if $\|p - q\| \le r$, then $\Pr_S\left[ E(S(p), S(q)) = \text{"close"} \right] \ge 2^{-\rho k}$
       - if $\|p - q\| > cr$, then $\Pr_S\left[ E(S(p), S(q)) = \text{"close"} \right] \le 2^{-k+1}$
       - $E(\sigma, \tau) = \text{"close"}$ iff $\sigma = \tau$
     • [IM'98]: $(c, \rho, k)$-LSH imply $c$-approx NNS with $O(N^{1+\rho})$ space and $O(N^{\rho})$ memory probes per query
     • [IM'98]: $\rho = 1/c$ for $\ell_1$
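The way an LSH sketch turns into a data structure can be sketched with bit sampling over binary vectors: each table keys points by `k` sampled coordinates, and two points are declared "close" in a table exactly when their keys are equal. The class below is a minimal illustration; the class name, the number of tables, and the parameter values are assumptions, not values derived from the $N^{\rho}$ analysis.

```python
import random
from collections import defaultdict

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

class BitSamplingLSH:
    """Hash tables keyed by k sampled coordinates; equal keys mean "close"."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        self.coords = [[rng.randrange(dim) for _ in range(k)]
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, t, x):
        return tuple(x[i] for i in self.coords[t])

    def insert(self, x):
        idx = len(self.points)
        self.points.append(x)
        for t, table in enumerate(self.tables):
            table[self._key(t, x)].append(idx)

    def query(self, q):
        """Index of the closest colliding point, or None if nothing collides."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: hamming(self.points[i], q))

rng = random.Random(5)
dim = 64
index = BitSamplingLSH(dim, k=12, num_tables=20)   # assumed parameters
data = [[rng.randrange(2) for _ in range(dim)] for _ in range(200)]
for x in data:
    index.insert(x)

q = list(data[17])
q[3] ^= 1
q[40] ^= 1      # query at Hamming distance 2 from data[17]
found = index.query(q)
print(found, None if found is None else hamming(data[found], q))
```

Each query probes one bucket per table, so the number of tables plays the role of the memory-probe count in the statement above.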

  15. Plan
     • Numerical Linear Algebra
     • Nearest Neighbor Search
     • Min-cost matching in plane
       - specialized sketches
     • Exploit sketches for:
       - input
       - internal state / partial computations
     [Diagram label: Computation]

  16. LP for Geometric Matching
     • Problem:
       - Given two sets $A, B$ of points in $\mathbb{R}^2$,
       - Find min-cost matching ($1 + \varepsilon$ approx.)
       - a.k.a. Earth-Mover Distance, optimal transport, Wasserstein metric, etc.
     • Classically: LP with $n^2$ variables:
       $\min_{\pi \in \mathbb{R}_+^{n^2}} \sum_{ij} \|p_i - q_j\| \cdot \pi_{ij}$ s.t. $\pi \mathbf{1} = \frac{1}{n} \mathbf{1}$ and $\pi^t \mathbf{1} = \frac{1}{n} \mathbf{1}$
       - General: $\tilde{O}(n^2/\varepsilon^4)$ time [AWR'17]
       - In 2D: hope for $\approx n$ time [SA'12]
     • [ANOY'14]: Solve-And-Sketch framework
       - Solves in $n^{1+o(1)}$ time (for fixed $\varepsilon$)
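For tiny inputs the LP above can be evaluated by brute force, which is handy as a reference when testing faster methods. With two point sets of equal size and uniform marginals, an optimal $\pi$ can be taken to be a permutation matrix scaled by $1/n$ (Birkhoff-von Neumann), so it suffices to enumerate permutations. The function name is hypothetical, and the factorial running time makes this a check for very small $n$ only, not an implementation of any cited algorithm.

```python
import itertools
import math

def min_cost_matching(A, B):
    """Brute-force min-cost perfect matching between equal-size point sets.
    Returns (LP objective value, perm), where A[i] is matched to B[perm[i]]."""
    n = len(A)
    assert n == len(B)
    best_cost, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(A[i], B[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Each matched pair carries mass pi_ij = 1/n in the LP.
    return best_cost / n, best_perm

A = [(0.0, 0.0), (2.0, 0.0)]
B = [(2.0, 1.0), (0.0, 1.0)]
value, perm = min_cost_matching(A, B)
print(value, perm)
```

Here each point of `A` is matched to the point of `B` directly above it, at distance 1, so the objective is $\frac{1}{2}(1 + 1) = 1$.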

  17. Solve-And-Sketch (=Divide & Conquer)
     • Partition the space hierarchically in a “nice way”
     • In each part:
       - Compute a “solution” for the local view
       - Sketch the solution using small space
     • Combine local sketches into (more) global solution

  18. Solve-And-Sketch for 2D Matching
     • Partition the space hierarchically in a “nice way”: quad-tree
     • In each part:
       - Compute a “solution” for the local view: replaced by all potential local solutions, since we cannot precompute any “local solution”
       - Sketch the solution using small space
     • Combine local sketches into (more) global solution
     • After committing to a wrong alternation, cannot get <2 approximation!
     • Sketch of all potential local solutions: a small-space sketch of the “solution” function $F: \mathbb{R}^k \to \mathbb{R}_+$
       - input $x \in \mathbb{R}^k$ defines the flow (matching) at the “interface” to the rest
       - $F(x)$ is the min-cost matching assuming flow $x$ at the interface
       - Exists with polylog($n$) space
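Only the first step, the hierarchical quad-tree partition, is simple enough to show in a few lines. The sketch below builds a plain quad-tree over points in the unit square; the leaf capacity, the depth cap, and the dictionary layout are assumptions for illustration, and the sketching of the solution function $F$ in each cell is not attempted here.

```python
import random

def build_quadtree(points, x0, y0, size, capacity=4, depth=0, max_depth=16):
    """Recursively split the square [x0, x0+size) x [y0, y0+size) into four
    equal cells until a cell holds at most `capacity` points (or max_depth)."""
    if len(points) <= capacity or depth == max_depth:
        return {"box": (x0, y0, size), "points": points, "children": []}
    half = size / 2
    quadrants = [[], [], [], []]
    for (x, y) in points:
        idx = (1 if x >= x0 + half else 0) + (2 if y >= y0 + half else 0)
        quadrants[idx].append((x, y))
    children = [
        build_quadtree(quadrants[i], x0 + (i % 2) * half, y0 + (i // 2) * half,
                       half, capacity, depth + 1, max_depth)
        for i in range(4)
    ]
    return {"box": (x0, y0, size), "points": points, "children": children}

def leaves(node):
    """All leaf cells below a node, left to right."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]

rng = random.Random(6)
pts = [(rng.random(), rng.random()) for _ in range(200)]
root = build_quadtree(pts, 0.0, 0.0, 1.0)
cells = leaves(root)
print(len(cells), "leaf cells,", sum(len(c["points"]) for c in cells), "points")
```

In a divide-and-conquer scheme of the kind described above, each internal cell would combine summaries received from its four children.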

  19. A sketch of the rest
     [Diagram labels: Sketching, Fast algorithms]
     • Numerical Linear Algebra
       - linear sketching
     • Nearest Neighbor Search
       - ultra-small sketches
     • Min-cost matching in plane
       - specialized sketching
     • Graph sketching
       - Linear sketch for graph => data structures for dynamic connectivity [AGM'12, KKM'13]
     • Characterization of DE-sketch size for metrics:
       - For symmetric norms [BBCKY'17]
     • Adaptive sketching: when we know we sketch set $A \subset \mathbb{R}^d$
       - Then $S(\cdot)$ may depend (weakly) on $A$
       - Non-oblivious subspace embeddings [DMM'06,…, Woodruff'14]
       - Data-dependent LSH [AINR'14, AR'15]

  20. Bibliography 1
     • Sarlos'06
     • Clarkson-Woodruff'13
     • Nguyen-Nelson'13
     • Mahoney-Meng'13
     • Cohen'16
     • Indyk'00
     • Sohler-Woodruff'11
     • Woodruff-Zhang'13
     • Wang-Woodruff'18 (arXiv)

  21. Bibliography 2
     • Kushilevitz-Ostrovsky-Rabani'98
     • Indyk-Motwani'98
     • Ailon-Chazelle'06
     • Khot-Naor (unpublished)
     • A-Krauthgamer (unpublished)
     • A-Naor-Nikolov-Razenshteyn-Weingarten'18
     • Altschuler-Weed-Rigolet'17
     • Sharathkumar-Agarwal'12
     • A-Nikolov-Onak-Yaroslavtsev'14
     • Ahn-Guha-McGregor'12
     • Kapron-King-Mountjoy'13
     • Blasiok-Braverman-Chestnut-Krauthgamer-Yang'17
     • Drineas-Mahoney-Muthukrishnan'06
     • A-Indyk-Nguyen-Razenshteyn'14
     • A-Razenshteyn'15
