SLIDE 1


Rehashing Kernel Evaluation in High Dimensions

Paris Siminelakis*

Ph.D. Candidate

Kexin Rong*, Peter Bailis, Moses Charikar, Phillip Levis (Stanford University)

ICML @ Long Beach, California

* equal contribution.

June 11, 2019

SLIDES 2–5

Kernel Density Function

Setup: a point set P = {x_1, ..., x_n} ⊂ R^d, a kernel k : R^d × R^d → R_+, weights u_i ≥ 0, and a query point q. The weighted kernel density function is

    KDF_P^u(q) = Σ_{i=1}^n u_i · k(x_i, q),

a sum over the n points of P, each contributing through the kernel k. With uniform weights u_i = 1/n this is the familiar

    KDF_P(q) = (1/n) · Σ_{i=1}^n k(x_i, q).
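To make the definition concrete, here is a minimal NumPy sketch of exact KDF evaluation (the Gaussian kernel and the `bandwidth` parameter are illustrative choices, not part of the slides):

```python
import numpy as np

def gaussian_kernel(x, q, bandwidth=1.0):
    """Illustrative kernel: k(x, q) = exp(-||x - q||^2 / bandwidth^2)."""
    return np.exp(-np.sum((x - q) ** 2) / bandwidth ** 2)

def kdf(P, q, u=None, kernel=gaussian_kernel):
    """Exact weighted density KDF_P^u(q) = sum_i u_i * k(x_i, q).

    Defaults to uniform weights u_i = 1/n. Note that this touches
    every point: O(n) kernel evaluations per query.
    """
    n = len(P)
    if u is None:
        u = np.full(n, 1.0 / n)
    return float(sum(u[i] * kernel(P[i], q) for i in range(n)))
```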

SLIDES 6–11

Kernel Density Evaluation

Same setup: P = {x_1, ..., x_n} ⊂ R^d, kernel k : R^d × R^d → R_+, weights u_i ≥ 0, query point q, and KDF_P^u(q) = Σ_{i=1}^n u_i · k(x_i, q).

Where is it used?

1. Non-parametric density estimation: KDF_P(q).
2. Kernel methods: f(x) = Σ_i α_i · φ(x − x_i).
3. Comparing point sets (distributions) with the "kernel distance".

Evaluating the KDF exactly at a single point requires O(n) time.

How fast can we approximate the KDF?

SLIDES 12–17

Methods for Fast Kernel Evaluation

Goal: given P ⊂ R^d and ε > 0, compute a (1 ± ε)-approximation to µ := KDF_P(q) for any q ∈ R^d. Three families of methods, with their query complexities:

- Space partitions: log(1/µε)^O(d). FMM [Greengard, Rokhlin '87], Dual-Tree [Lee, Gray, Moore '06], FIGTree [Morariu et al., NeurIPS '09]. Slow in high dimensions.
- Random Sampling: 1/(µε²). Linear in 1/µ.
- Hashing: O(1/(√µ · ε²)). Hashing-Based-Estimators [Charikar, S. '17]; a similar idea appears in Locality Sensitive Samplers [Spring, Shrivastava '17]. Importance sampling via randomized space partitions; sub-linear in 1/µ.
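For intuition, a minimal sketch of the Random Sampling baseline (constants and the failure-probability boost are omitted; `mu_lower` is an assumed lower bound on the true density):

```python
import numpy as np

def rs_estimate(P, q, kernel, eps, mu_lower, rng=np.random.default_rng()):
    """Random Sampling: average the kernel over m uniformly chosen points.

    A Chebyshev-style argument gives a (1 +/- eps)-approximation with
    constant probability once m ~ 1/(mu * eps^2) -- linear in 1/mu.
    """
    m = int(np.ceil(1.0 / (mu_lower * eps ** 2)))
    idx = rng.integers(0, len(P), size=m)
    return float(np.mean([kernel(P[i], q) for i in idx]))
```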

SLIDES 18–19

Randomized Space Partitions

A randomized space partition is a distribution H over hash functions h : R^d → [M]; each sampled h partitions R^d into at most M cells. (The slide illustrates six independent draws h_1, ..., h_6 of the same family.)

SLIDE 20

Locality Sensitive Hashing

An LSH family is a distribution H over partitions such that

    P_{h∼H}[h(x) = h(y)] = p(x − y),

e.g. Euclidean LSH [Datar, Immorlica, Indyk, Mirrokni '04]. Concatenating k independent hashes yields collision probability p(x − y)^k.
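A minimal sketch of Euclidean LSH with concatenation (the bucket width `r` and concatenation depth `k` are illustrative tuning knobs):

```python
import numpy as np

class EuclideanLSH:
    """Euclidean LSH [Datar et al. '04]: h(x) = floor((a . x + b) / r).

    Concatenating k independent hashes turns the collision
    probability p(x - y) into p(x - y)^k.
    """
    def __init__(self, d, k=3, r=1.0, rng=np.random.default_rng()):
        self.A = rng.normal(size=(k, d))    # one Gaussian projection per hash
        self.b = rng.uniform(0, r, size=k)  # random offset in [0, r)
        self.r = r

    def hash(self, x):
        return tuple(np.floor((self.A @ x + self.b) / self.r).astype(int))
```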

SLIDES 21–25

Hashing-Based-Estimators [Charikar, S. FOCS'17]

Preprocess: sample h_1, ..., h_m ∼ H and evaluate each on P, building m hash tables.
Query: let H_t(q) denote the hash bucket of q in table t.
Estimator: sample a random point X_t from H_t(q) in each table and return

    Z_m = (1/m) · Σ_{t=1}^m (1/n) · k(X_t, q) / ( p(X_t, q) / |H_t(q)| ).

How many samples m? Which LSH family?
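A minimal sketch of the scheme, assuming the `EuclideanLSH` class above and a helper `collision_prob(x, q)` that returns the family's collision probability p(x, q) (both are assumptions for illustration, not the paper's implementation):

```python
import numpy as np
from collections import defaultdict

def build_tables(P, hashes):
    """Preprocess: evaluate each sampled hash function on P."""
    tables = []
    for h in hashes:
        buckets = defaultdict(list)
        for i, x in enumerate(P):
            buckets[h.hash(x)].append(i)
        tables.append(buckets)
    return tables

def hbe_estimate(P, q, hashes, tables, kernel, collision_prob,
                 rng=np.random.default_rng()):
    """One importance-weighted sample per table, averaged over tables.

    Dividing by p(X_t, q) / |H_t(q)| -- the chance of drawing X_t via
    its bucket -- is what makes each term an unbiased estimate of
    KDF_P(q); an empty bucket contributes 0.
    """
    n, total = len(P), 0.0
    for h, buckets in zip(hashes, tables):
        bucket = buckets.get(h.hash(q), [])
        if not bucket:
            continue
        i = bucket[rng.integers(len(bucket))]
        total += kernel(P[i], q) * len(bucket) / (n * collision_prob(P[i], q))
    return total / len(hashes)
```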

SLIDES 26–30

Hashing-Based-Estimators have Practical Limitations

Theorem [Charikar, S. FOCS'17]: For certain kernels, HBE solves the kernel evaluation problem for µ ≥ τ using O(1/(√µ · ε²)) samples and O(n/(√τ · ε²)) space.

Kernel              LSH                               Overhead
exp(−‖x−y‖²)        Ball Carving [Andoni, Indyk '06]  e^{Õ(log^{2/3} n)}
exp(−‖x−y‖)         Euclidean [Datar et al. '04]      √e
1/(1 + ‖x−y‖^t)     Euclidean [Datar et al. '04]      3t/2

Practical limitations:

1. Super-linear space ⇒ not practical for massive datasets.
2. An adaptive procedure estimates the number of samples ⇒ large constants + stringent requirements on the hash functions.
3. For the Gaussian kernel, Ball-Carving LSH is very slow: e^{Õ(log^{2/3}(n))} overhead.

Q: Can HBE be made practical while preserving its theoretical guarantees?

SLIDES 31–35

Overcoming the Practical Limitations of HBE

Each limitation of [Charikar, S. FOCS'17] is resolved in [this work, ICML'19]:

1. Super-linear space! → Sketching (sub-linear space).
2. The adaptive procedure has a large constant overhead. → Improved adaptive procedure + new analysis.
3. Ball-Carving LSH for the Gaussian kernel is slow. → Practical HBE for the Gaussian kernel via Euclidean LSH.

[S.*, Rong*, Bailis, Charikar, Levis, ICML'19]: the first practical and provably accurate algorithm for the Gaussian kernel in high dimensions.

SLIDES 36–42

Going back a step

Q1: Can HBE be made practical while preserving its theoretical guarantees?
Yes: via sketching, an improved adaptive procedure, and Euclidean LSH.

Q2: Is HBE always the better method to use?
No: worst-case bounds do not always reflect reality, and are insufficient to predict performance on a given dataset. Random Sampling, for instance, can need as few as O(1) samples on easy instances and as many as O(1/µ) in the worst case.

[This work, ICML'19]: diagnostic tools that estimate dataset-specific performance without even running HBE.

SLIDE 43

Outline of the rest of the talk

1. Sketching
2. Diagnostic tools
3. Experimental evaluation

SLIDE 44


Sketching

SLIDES 45–50

How to sketch the KDF?

Recall: HBE samples a single point from each hash table.
Goal: "simulate" HBE on the full point set by applying it to a small "sketch".

Two natural approaches, each with a flaw:

1. Uniformly random points ⇒ some buckets might have 0 points in the sketch.
2. One point from each bucket ⇒ might need a large number of points.

Idea: interpolate between uniform over points and uniform over buckets!
Solution: hashing + non-uniform sampling.

SLIDES 51–55

Sketching the Kernel Density Function

Hashing-Based-Sketch (HBS): hashing + non-uniform sampling. Sample a single hash function h_0, evaluate it on P, then:

    S ← ∅
    for j = 1, ..., SketchSize:
        sample a bucket i with probability ∝ n_i^γ   (n_i = size of bucket i)
        sample a random point J from bucket i; S ← S ∪ {J}
        weight it so that E_J[ŵ_J · k(q, x_J)] ∝ KDF_P(q)
    return (ŵ, S)
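A minimal sketch of HBS under the scheme above (the exponent γ = 0.5 and the exact importance weights are illustrative; the paper's tuned choices may differ):

```python
import numpy as np
from collections import defaultdict

def hbs(P, h0, sketch_size, gamma=0.5, rng=np.random.default_rng()):
    """Hashing-Based-Sketch: hash once, sample buckets non-uniformly.

    gamma interpolates between uniform over buckets (gamma = 0) and
    roughly uniform over points (gamma = 1). The importance weights
    make sum_j w_j * k(q, x_j) over the sketch unbiased for n * KDF_P(q).
    """
    buckets = defaultdict(list)
    for i, x in enumerate(P):
        buckets[h0.hash(x)].append(i)
    keys = list(buckets)
    sizes = np.array([len(buckets[key]) for key in keys])
    probs = sizes.astype(float) ** gamma
    probs /= probs.sum()

    S, w = [], []
    for _ in range(sketch_size):
        b = rng.choice(len(keys), p=probs)            # bucket with prob ~ n_i^gamma
        j = buckets[keys[b]][rng.integers(sizes[b])]  # uniform within bucket
        S.append(j)
        # weight = 1 / (Pr[pick this point in one draw] * number of draws)
        w.append(sizes[b] / (probs[b] * sketch_size))
    return np.array(w), S
```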

SLIDES 56–60

Sketching the Kernel Density Function

Theorem: O(1/τ) points suffice to:

- approximate any density µ ≥ τ;
- reduce the space from O(n/√τ) to O(1/τ^{3/2});
- contain a point from any bucket with ≥ n · τ points.

Sub-linear space: e.g., for τ = 1/√n the space drops from n^{5/4} to n^{3/4}.
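To see where the exponents come from, a quick worked check (suppressing ε and constants; each of the O(1/√τ) hash tables now stores the O(1/τ)-point sketch instead of all n points):

```latex
\tau = n^{-1/2}:\qquad
\frac{n}{\sqrt{\tau}} = n \cdot n^{1/4} = n^{5/4}
\quad\longrightarrow\quad
\frac{1}{\sqrt{\tau^{3}}} = \tau^{-3/2} = \bigl(n^{-1/2}\bigr)^{-3/2} = n^{3/4}.
```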

SLIDE 61


Diagnostic tools

SLIDE 62

Variance of Unbiased Estimators

Both Random Sampling and HBE are unbiased estimators. The metric of interest is the average relative variance,

    E_{q∼P} [ V[Z(q)] / E[Z(q)]² ]  ∝  "sample complexity".

Diagnostic procedure:

1. Sample a number T of random queries from P.
2. For each query, upper-bound the relative variance.
3. Average over the T queries, for each method of interest.

In short: estimate the mean and bound the variance.

SLIDES 63–67

Bounding the variance

The variance is a "quadratic polynomial" in the kernel values w_i = k(q, x_i):

    V[Z] ≤ (1/n²) · Σ_{i,j=1}^n w_i² · V_ij.

Random Sampling: V[Z] ≤ E[k²(q, X)] = (1/n²) · Σ_{i,j=1}^n w_i² · V_ij with V_ij = 1.

HBE with collision probability p(x, y): V[Z] ≤ E[Z²] ≤ (1/n²) · Σ_{i,j=1}^n w_i² · V_ij with

    V_ij = min{p(q, x_i), p(q, x_j)} / p²(q, x_i).

Evaluating these bounds naively requires O(n) or O(n²) per query.
Q: Is there an efficient alternative?
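A minimal sketch of the naive bounds, which the slide-62 diagnostic would average over T random queries (`collision_prob` is an assumed helper for the LSH family in use):

```python
import numpy as np

def relative_variance_bounds(P, q, kernel, collision_prob):
    """Upper-bound V[Z] / E[Z]^2 at q for Random Sampling and HBE.

    Naive evaluation: O(n) for the RS bound, O(n^2) for the HBE bound.
    """
    w = np.array([kernel(x, q) for x in P])          # w_i = k(q, x_i)
    p = np.array([collision_prob(x, q) for x in P])  # p(q, x_i)
    n = len(P)
    mu = w.mean()                                    # E[Z] = KDF_P(q)

    rs = np.mean(w ** 2)                             # V_ij = 1 case
    V = np.minimum.outer(p, p) / (p ** 2)[:, None]   # HBE's V_ij matrix
    hbe = np.sum((w ** 2)[:, None] * V) / n ** 2
    return rs / mu ** 2, hbe / mu ** 2
```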

SLIDE 68

Data-dependent Variance Bounds

The variance is a "quadratic polynomial" in w_i = k(q, x_i):

    V[Z] ≤ (1/n²) · Σ_{i,j=1}^n w_i² · V_ij.

Decompose the points into 4 sets. For any two sets S_ℓ, S_ℓ′, with µ_ℓ := (1/n) · Σ_{i∈S_ℓ} w_i:

    (1/n²) · Σ_{i∈S_ℓ, j∈S_ℓ′} w_i² · V_ij ≤ sup_{i∈S_ℓ, j∈S_ℓ′} { (w_i / w_j) · V_ij } · µ_ℓ · µ_ℓ′    (Hölder)

Diagnostic:

1. Bound each of the 4² terms.
2. Evaluate them on a subsample S_0,
3. produced by Random Sampling and the adaptive algorithm.
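As an illustration, a sketch of the 4²-term bound given some 4-way labeling of the points (how the sets are chosen is not specified on the slide; in practice this runs on the subsample S_0 rather than all of P):

```python
import numpy as np

def holder_bound(w, V, labels, n_sets=4):
    """Bound (1/n^2) sum_ij w_i^2 V_ij by 4^2 = 16 Hölder terms.

    labels[i] in {0, ..., n_sets - 1} assigns point i to a set S_l;
    mu[l] = (1/n) * sum of w_i over S_l. Intended for a small
    subsample, where the pairwise sup is affordable.
    """
    n = len(w)
    mu = np.array([w[labels == l].sum() / n for l in range(n_sets)])
    bound = 0.0
    for l in range(n_sets):
        for m in range(n_sets):
            I, J = np.flatnonzero(labels == l), np.flatnonzero(labels == m)
            if len(I) == 0 or len(J) == 0:
                continue
            sup = max(w[i] / w[j] * V[i, j] for i in I for j in J)
            bound += sup * mu[l] * mu[m]
    return bound
```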

SLIDE 69


Evaluation

SLIDE 70

Algorithms for Kernel Evaluation

- Random Sampling (RS): sensitive to the range of kernel values (distances).
- Hashing-Based-Estimators (HBE): sensitive to "correlations" (dense distant clusters). [Charikar, S. FOCS'17] [This work, ICML'19]
- Fast Improved Gauss Transform (FIGTree): sensitive to the number of "clusters" (directions) at a certain distance. [Morariu, Srinivasan, Raykar, Duraiswami, Davis, NeurIPS'09]
- Approximate Skeletonization via Treecodes (ASKIT): sensitive to medium distance scales / cluster sizes. [March, Xiao, Biros, SIAM JSC'15]

We compare their performance on real-world datasets.

SLIDES 71–76

Comparison on Real-world Datasets

- HBE is consistently the best or second-best method.
- The diagnostic correctly chooses between RS and HBE in 21 of 22 cases.

SLIDE 77

Benchmark Instances

Synthetic benchmarks:

1. Worst-case: no single geometric aspect can be exploited!
2. D-clusters: gauge the impact of different geometric aspects.

SLIDE 78

Worst-case Instances

Union of highly-clustered and uncorrelated points (fixed µ = 10⁻³, dimension d ∈ [10, 500], 100K queries).

[Figure: average query time (s) vs. number of dimensions (10–500) for FIGTree, ASKIT, RS, and HBE; an example instance is shown for d = 2.]

On these "worst-case" datasets HBE is best and ASKIT second best.

SLIDE 79

Instances with D clusters

Fix N = n · D = 500K and vary D ∈ [1, 10⁵].

[Figure: average query time (s) vs. number of clusters D for FIGTree, ASKIT, RS, and HBE.]

On D-structured datasets:
- D ≪ √N: space partitions win.
- D ∼ N^{1−δ}: Random Sampling wins.
- 1 ≪ D ≪ N: HBE wins.

SLIDE 80

Conclusion

Rehashing Kernel Evaluation in High Dimensions. Hashing-Based-Estimators:

1. Made practical + often state-of-the-art + worst-case guarantees.
2. Data-dependent diagnostics: when to use them & how to tune them.

The "Rehashing" methodology: Sketch → Diagnostics → Visualization → Config file (deployment).
Open-source implementation and experiments: https://github.com/kexinrong/rehashing

Thank you! psimin@stanford.edu