Analysis of hierarchical metric-tree indexing schemes for similarity search in high-dimensional datasets
Vladimir Pestov
vpest283@uottawa.ca
http://aix1.uottawa.ca/ vpest283
Department of Mathematics and Statistics, University of Ottawa


  3. Pruning
     • If $B_\varepsilon(\omega) \cap B = \emptyset$, the subtree descending from the node $B$ can be pruned, that is, if it can be certified that $\omega \notin B_\varepsilon = \{x \in \Omega : d(x, B) < \varepsilon\}$.
     • Otherwise the search branches out.
     [Figure: the sets $A$ and $B$ with their neighbourhoods $A_\varepsilon$ and $B_\varepsilon$, and the query ball of radius $\varepsilon$ around $\omega$.]
     How to “certify” that $B_\varepsilon(\omega) \cap B = \emptyset$?
     Metric tree indexing schemes for similarity search. Vladimir Pestov, University of Ottawa. AofA 2008, Maresias, SP, Brazil.

  10. Decision functions
      Let $f : \Omega \to \mathbb{R}$ be a 1-Lipschitz function, $|f(x) - f(y)| \le d(x, y)$ for all $x, y \in \Omega$, such that $f|_B \le 0$. Then $f|_{B_\varepsilon} < \varepsilon$; that is, $f(\omega) \ge \varepsilon$ is a certificate that $B_\varepsilon(\omega) \cap B = \emptyset$.
      [Figure: graph of $f$ over $\Omega$, with $f \le 0$ on $B$, the level $\varepsilon$ marked on the vertical axis, and points $x$, $y$ on the horizontal axis.]
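As a concrete illustration of such a certificate, here is a minimal sketch. It assumes, purely for the example, that $B$ is the closed ball of radius `radius` around a pivot point and that the metric is Euclidean; the function $f(x) = d(x, \text{pivot}) - \text{radius}$ is then 1-Lipschitz by the triangle inequality and non-positive on $B$. The names `ball_decision_function` and `can_prune` are hypothetical and do not come from the slides.

```python
import math


def ball_decision_function(pivot, radius):
    """Return f(x) = d(x, pivot) - radius.

    f is 1-Lipschitz (triangle inequality) and f <= 0 on the ball B
    of the given radius around the pivot (an assumed choice of B).
    """
    def f(x):
        return math.dist(x, pivot) - radius
    return f


def can_prune(f, omega, eps):
    """f(omega) >= eps certifies that the query ball B_eps(omega) misses B."""
    return f(omega) >= eps


f = ball_decision_function(pivot=(0.0, 0.0), radius=1.0)
print(can_prune(f, (3.0, 0.0), 1.5))  # True: f(omega) = 2.0 >= 1.5
print(can_prune(f, (2.0, 0.0), 1.5))  # False: f(omega) = 1.0 < 1.5
```

Any other 1-Lipschitz function that is non-positive on $B$ would serve equally well; the distance-to-pivot form is only one common choice.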

  11. Metric trees
      A metric tree for a metric similarity workload $(\Omega, \rho, X)$ consists of:
      • a binary rooted tree $T$,
      • a collection of partially defined 1-Lipschitz functions $f_t : B_t \to \mathbb{R}$ for every inner node $t$ (decision functions),
      • a collection of bins $B_t \subseteq \Omega$ for every leaf node $t$, containing pointers to elements $X \cap B_t$,
      such that
      • $B_{\mathrm{root}(T)} = \Omega$,
      • for every inner node $t$ with child nodes $t_-$, $t_+$: $B_t \subseteq B_{t_-} \cup B_{t_+}$.
      When processing a range query $B_\varepsilon(\omega)$, the child $t_-$ [$t_+$] is accessed $\iff$ $f_t(\omega) < \varepsilon$ [resp. $f_t(\omega) > -\varepsilon$].
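The access rule can be sketched in code. The example below is one possible instance of the definition, not the construction used in the talk: it assumes Euclidean points, decision functions of the form $f_t(x) = d(x, p_t) - r_t$ (a vantage-point style split at the median distance), and leaf bins scanned linearly. The names `Node`, `build` and `range_query` are hypothetical.

```python
import math
import random


class Node:
    """Inner node: decision function f and children; leaf: a bin of points."""
    def __init__(self, f=None, minus=None, plus=None, bucket=None):
        self.f, self.minus, self.plus, self.bucket = f, minus, plus, bucket


def build(points, leaf_size=4):
    if len(points) <= leaf_size:
        return Node(bucket=points)
    pivot = points[0]
    dists = sorted(math.dist(pivot, x) for x in points)
    radius = dists[len(dists) // 2]  # median distance to the pivot
    minus = [x for x in points if math.dist(pivot, x) <= radius]  # f <= 0
    plus = [x for x in points if math.dist(pivot, x) > radius]    # f > 0
    if not plus:  # degenerate split (ties): keep everything in one bin
        return Node(bucket=points)
    f = lambda x, p=pivot, r=radius: math.dist(x, p) - r  # 1-Lipschitz
    return Node(f=f, minus=build(minus, leaf_size), plus=build(plus, leaf_size))


def range_query(node, omega, eps):
    """Return all stored points at distance < eps from omega."""
    if node.bucket is not None:
        return [x for x in node.bucket if math.dist(x, omega) < eps]
    value = node.f(omega)
    hits = []
    if value < eps:    # child t- is accessed
        hits += range_query(node.minus, omega, eps)
    if value > -eps:   # child t+ is accessed
        hits += range_query(node.plus, omega, eps)
    return hits


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(data)
query = (0.5, 0.5)
found = range_query(tree, query, 0.1)
brute = [x for x in data if math.dist(x, query) < 0.1]
print(len(found) == len(brute))
```

When both conditions hold at a node, both children are visited; this is the "branching" whose frequency the rest of the talk analyses.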

  15. What happens in practice?
      The best indexing schemes for exact similarity search in high-dimensional outer datasets are often (not always!) outperformed by linear scan.
      ∗ ∗ ∗
      The emphasis has shifted towards approximate similarity search: given $\varepsilon > 0$ and $\omega \in \Omega$, return a point that is [with high probability] at a distance $< (1 + \varepsilon)\, d_{NN}(\omega)$ from $\omega$.
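The approximate-search condition can be checked directly against a linear scan. The sketch below assumes Euclidean distance and uses the hypothetical helper names `nn_distance` and `is_approx_nn`; it tests whether a candidate answer is within the factor $(1 + \varepsilon)$ of the true nearest-neighbour distance.

```python
import math


def nn_distance(omega, X):
    """Exact nearest-neighbour distance d_NN(omega), by linear scan."""
    return min(math.dist(omega, x) for x in X)


def is_approx_nn(candidate, omega, X, eps):
    """True if d(omega, candidate) < (1 + eps) * d_NN(omega)."""
    return math.dist(omega, candidate) < (1 + eps) * nn_distance(omega, X)


X = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
omega = (0.4, 0.0)  # true nearest neighbour is (0, 0), at distance 0.4
print(is_approx_nn((1.0, 0.0), omega, X, eps=0.6))  # True: 0.6 < 1.6 * 0.4
print(is_approx_nn((1.0, 0.0), omega, X, eps=0.2))  # False: 0.6 >= 1.2 * 0.4
```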

  23. The curse of dimensionality conjecture
      Conjecture. Let $X \subseteq \{0, 1\}^d$ be a dataset with $n$ points, where the Hamming cube is equipped with the Hamming ($\ell_1$) distance: $d(x, y) = \sharp\{i : x_i \ne y_i\}$. Suppose $d = n^{o(1)}$, but $d = \omega(\log n)$. Any data structure for exact nearest neighbour search in $X$, with $d^{O(1)}$ query time, must use $n^{\omega(1)}$ space.
      ∗ ∗ ∗
      The cell probe model: $\Omega(d / \log n)$ lower bound (Barkol–Rabani, 2000).
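For reference, the baseline that the conjecture is measured against is the linear scan, which uses no extra space and answers a query with $n$ distance computations of cost $O(d)$ each. A minimal sketch, with hypothetical function names, for points of $\{0, 1\}^d$ stored as tuples:

```python
def hamming(x, y):
    """Hamming distance: the number of coordinates where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def nearest_neighbour_linear_scan(X, omega):
    """Exact nearest neighbour of omega in X: n distance evaluations, O(d) each."""
    return min(X, key=lambda x: hamming(x, omega))


X = [(0, 0, 0, 0, 0), (1, 1, 0, 0, 0), (1, 1, 1, 1, 0)]
omega = (1, 1, 0, 0, 1)
print(nearest_neighbour_linear_scan(X, omega))  # (1, 1, 0, 0, 0), at distance 1
```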

  27. Concentration of measure
      The phenomenon of concentration of measure on high-dimensional structures (“Geometric LLN”): for a typical “high-dimensional” structure $\Omega$, if $A$ is a subset containing at least half of all points, then the measure of the $\varepsilon$-neighbourhood $A_\varepsilon$ of $A$ is overwhelmingly close to 1 already for small $\varepsilon > 0$.
      [Figure: $\Omega$ with a subset $A$ containing at least half of all points, its neighbourhood $A_\varepsilon$, and the remainder $\Omega \setminus A_\varepsilon$; $\alpha(\Omega, \varepsilon)$ bounds $\mu(\Omega \setminus A_\varepsilon)$ from above.]

  32. Concentration function
      Let $\Omega = (\Omega, d, \mu)$ be a metric space with measure. The concentration function of $\Omega$:
      $$\alpha(\varepsilon) = \begin{cases} \frac{1}{2}, & \text{if } \varepsilon = 0, \\ 1 - \min\left\{ \mu_\sharp(A_\varepsilon) : A \subseteq \Omega,\ \mu_\sharp(A) \ge \frac{1}{2} \right\}, & \text{if } \varepsilon > 0. \end{cases}$$
      For $\Omega = \Sigma^n$, the Hamming cube (normalized distance + uniform measure): $\alpha_{\Sigma^n}(\varepsilon) \le e^{-2\varepsilon^2 n}$.
      Gaussian estimates are typical (Euclidean spheres $\mathbb{S}^n$, cubes $\mathbb{I}^n$, ...).

  33. Example: the Hamming cube
      [Figure: concentration function $\alpha(\Sigma^{101}, \varepsilon)$ versus the Chernoff bound, $n = 101$; horizontal axis $\varepsilon$ from 0 to 0.2, vertical axis from 0 to 1.]
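The comparison on this slide can be reproduced numerically. The sketch below rests on an assumption not stated on the slide: by Harper's isoperimetric theorem, Hamming balls have the smallest neighbourhoods among sets of equal size, so for odd $n$ the extremal half-measure set is the ball of points with at most $(n-1)/2$ ones. Neighbourhoods are taken to be open, $A_\varepsilon = \{x : d(x, A) < \varepsilon\}$, with distance normalized by $n$. The function names are hypothetical.

```python
import math


def hamming_cube_concentration(n, eps):
    """alpha(Sigma^n, eps) for odd n, normalized Hamming distance, uniform measure.

    Assumes the half-cube Hamming ball is extremal (Harper's theorem).
    """
    if n % 2 == 0:
        raise ValueError("this sketch only handles odd n")
    if eps == 0:
        return 0.5
    half = (n - 1) // 2          # {weight <= half} has measure exactly 1/2
    k = math.ceil(eps * n) - 1   # largest integer k with k / n < eps
    tail = sum(math.comb(n, j) for j in range(half + k + 1, n + 1))
    return tail / 2 ** n         # measure of the complement of A_eps


def chernoff_bound(n, eps):
    """The Gaussian-type upper bound exp(-2 eps^2 n)."""
    return math.exp(-2 * eps * eps * n)


for eps in (0.05, 0.1, 0.15, 0.2):
    print(eps, hamming_cube_concentration(101, eps), chernoff_bound(101, eps))
```

Floating-point rounding in `eps * n` can shift `k` by one when $\varepsilon n$ is an exact integer; for plotting purposes this is harmless, but exact rational arithmetic would be needed for a rigorous table.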

  36. Effects of concentration on branching
      [Figure: a node $C$ split into $A$ and $B$, with neighbourhoods $A_\varepsilon$ and $B_\varepsilon$ and a query point $\omega$ with its $\varepsilon$-ball; two regions are each labelled "$< \alpha(C, \varepsilon)$".]
      For all query points $\omega \in C$ except a set of measure $\le 2\alpha(C, \varepsilon)$, the search algorithm branches out at the node $C$.

  37. Search radius
      $\varepsilon_{NN}(\omega)$ is a 1-Lipschitz function, so it concentrates near the median value, $\varepsilon_M$; $\varepsilon_M \to \mathbb{E}_{\mu \otimes \mu}\, d(x, y) = O(1)$.
      Example: 1000 pts $\sim [0, 1]^{10}$, the $\ell_2$-$\varepsilon_{NN}$: $\mathbb{E}\, d(x, y) = 1.2765$, $\varepsilon_M = 0.69419$.
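An experiment of this kind is easy to rerun. The following sketch assumes points and queries drawn uniformly from $[0, 1]^{10}$ with the Euclidean distance; the sampling scheme, seed and function name are illustrative choices, so the numbers it prints will be close to, but not identical with, those quoted on the slide.

```python
import math
import random
import statistics


def nn_radius_experiment(n=1000, dim=10, n_queries=200, n_pairs=20000, seed=0):
    """Estimate E d(x, y) and the median nearest-neighbour distance eps_M."""
    rng = random.Random(seed)

    def point():
        return [rng.random() for _ in range(dim)]

    data = [point() for _ in range(n)]
    # Mean distance between independent uniform points (Monte Carlo estimate).
    pair_dists = [math.dist(point(), point()) for _ in range(n_pairs)]
    # Nearest-neighbour distance from fresh query points to the dataset.
    nn_dists = [min(math.dist(q, x) for x in data)
                for q in (point() for _ in range(n_queries))]
    return statistics.fmean(pair_dists), statistics.median(nn_dists)


mean_dist, eps_m = nn_radius_experiment()
print(f"mean distance ~ {mean_dist:.4f}, median NN distance ~ {eps_m:.4f}")
```

The point of the experiment is the ratio: the median nearest-neighbour distance is already more than half of the typical distance between two random points, so a range query of radius $\varepsilon_M$ is far from "local".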

  45. A naive average $O(n)$ lower bound
      Suppose datapoints are distributed according to $\mu \in P(\Omega)$, as well as query points.
      A balanced metric tree of depth $O(\log n)$, with $O(n)$ bins of roughly equal size ($\mu$-measure).
      In 1/2 the cases, $\varepsilon_{NN} \ge \varepsilon_M = O(1)$, the median NN distance.
      For every element $A$ of the level $t$ partition,
      $$\alpha(A, \varepsilon_M) \le 2\mu(A)^{-1} \alpha(\Omega, \varepsilon_M / 2) = O(2^t)\, e^{-O(1) \varepsilon_M^2 d}.$$
      Hence branching at every node occurs for all $\omega$ except a set of measure at most
      $$\sharp(\text{nodes}) \times 2 \sup_A \alpha(A, \varepsilon) = O(n^2)\, e^{-O(1) d} = o(1),$$
      because $d = \omega(\log n)$, so $e^{-O(1) d}$ is superpolynomially small in $n$.
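The final estimate can be spelled out as follows. This is a sketch in which $c > 0$ and $C > 0$ stand for the constants hidden in the $O(1)$ exponent and in $O(n^2)$; both symbols are introduced here for illustration and do not appear on the slide.

```latex
% Exceptional measure, with explicit (assumed) constants c, C > 0:
\[
  C n^{2} e^{-c d} \;=\; C \exp\bigl( 2 \log n - c\, d \bigr).
\]
% Since d = \omega(\log n), for every K > 0 we have d \ge K \log n
% for all sufficiently large n, hence
\[
  C n^{2} e^{-c d} \;\le\; C \exp\bigl( (2 - cK) \log n \bigr) \;=\; C\, n^{\,2 - cK}.
\]
% Choosing K > 2/c makes the exponent negative, so the bound tends to 0.
% As K is arbitrary, e^{-c d} is smaller than every fixed negative power of n.
```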

  50. What’s wrong?
      A dataset $X$ is modeled by a sequence of i.i.d. random variables $X_i \sim \mu$.
      Implicit assumption: empirical measure $\mu_n(A) = |A| / n \approx \mu(A)$.
      But the scheme is chosen after seeing an instance $X$!
      [Figure: plot with both axes running from 0 to 1.]
      How much can be said of concentration in $(\Omega, \mu_n)$?

  54. VC dimension
      Let $\mathcal{A}$ be a family of subsets of $\Omega$ (a concept class). $B \subseteq \Omega$ is shattered by $\mathcal{A}$ if for each $C \subseteq B$ there is $A \in \mathcal{A}$ such that $A \cap B = C$.
      [Figure: sets $A$, $B$, $C$ inside $\Omega$.]
      The Vapnik–Chervonenkis dimension $\mathrm{VC\text{-}dim}(\mathcal{A})$ of $\mathcal{A}$ is the largest cardinality of a set $B \subseteq \Omega$ shattered by $\mathcal{A}$.
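On a small finite domain the definition can be checked by brute force. The sketch below uses hypothetical helper names and, as an assumed example, the concept class of all integer intervals inside $\{0, \dots, 5\}$ (including the empty set), whose VC dimension is 2: any two points are shattered, but for three points $a < b < c$ no interval picks out $\{a, c\}$ without $b$.

```python
from itertools import combinations


def is_shattered(B, concepts):
    """B is shattered if the traces A & B realize every subset of B."""
    B = frozenset(B)
    traces = {frozenset(A) & B for A in concepts}
    return len(traces) == 2 ** len(B)


def vc_dimension(domain, concepts):
    """Largest size of a shattered subset of a finite domain (brute force)."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(B, concepts) for B in combinations(domain, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best


domain = range(6)
# All intervals {i, ..., j-1} with 0 <= i <= j <= 6 (empty set included).
intervals = [set(range(i, j)) for i in range(7) for j in range(i, 7)]
print(vc_dimension(domain, intervals))  # 2
```

The search is exponential in the domain size, so it is only meant to make the definition tangible on toy examples.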

  59. Statistical learning bounds
      Let $\mathcal{A} \subseteq 2^\Omega$ be a concept class of finite VC dimension, $d$. Then for all $\epsilon, \delta > 0$ and every probability measure $\mu$ on $\Omega$, if $n$ datapoints in $X$ are drawn randomly and independently according to $\mu$, then with confidence $1 - \delta$
      $$\forall A \in \mathcal{A}, \quad \left| \mu(A) - \frac{|X \cap A|}{n} \right| < \epsilon,$$
      provided $n$ is large enough:
      $$n \ge \frac{128}{\epsilon^2} \left( d \log\left( \frac{2e^2}{\epsilon} \log \frac{2e}{\epsilon} \right) + \log \frac{8}{\delta} \right).$$
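To get a feel for the magnitudes involved, the sample-size condition can be evaluated numerically. The sketch below assumes the bound reads $n \ge \frac{128}{\epsilon^2}\bigl(d \log(\frac{2e^2}{\epsilon}\log\frac{2e}{\epsilon}) + \log\frac{8}{\delta}\bigr)$ with natural logarithms; both the reading of the formula and the function name `vc_sample_size` are assumptions made for this illustration.

```python
import math


def vc_sample_size(vc_dim, eps, delta):
    """Smallest integer n satisfying the (assumed) sample-size condition.

    n >= 128 / eps^2 * (d * log((2 e^2 / eps) * log(2 e / eps)) + log(8 / delta))
    """
    inner = (2 * math.e ** 2 / eps) * math.log(2 * math.e / eps)
    bound = 128 / eps ** 2 * (vc_dim * math.log(inner) + math.log(8 / delta))
    return math.ceil(bound)


for eps in (0.2, 0.1, 0.05):
    print(eps, vc_sample_size(vc_dim=10, eps=eps, delta=0.05))
```

The dependence on $\epsilon$ is roughly quadratic (up to logarithmic factors), while the dependence on the VC dimension is linear.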

  63. Bin access lemma
      Let $\delta > 0$, and let $\gamma$ be a collection of subsets $A \subseteq \Omega$ of measure $\mu(A) \le \alpha(\delta) \le \frac{1}{4}$ each, satisfying $\mu(\cup\gamma) \ge 1/2$. Then the $2\delta$-neighbourhood of every point $\omega \in \Omega$, apart from a set of measure at most $\frac{1}{2} \alpha(\delta)^{1/2}$, meets at least $\left\lceil \frac{1}{2} \alpha(\delta)^{-1/2} \right\rceil$ elements of $\gamma$.
      ∗ ∗ ∗
      If we can now guarantee that the bins are not too large, we get a lower bound on the number of bin accesses.

  68. Bin complexity estimates
      Let $\mathcal{F}$ be a class of 1-Lipschitz functions used for constructing a metric tree of a particular type. Let $\mathcal{A}$ be the concept class of all solution sets to inequalities $f \gtrless a$, $f \in \mathcal{F}$, $a \in \mathbb{R}$. Suppose $p = \mathrm{VC\text{-}dim}(\mathcal{A}) < \infty$ (pseudodimension of $\mathcal{F}$ in the sense of Vapnik).
      Denote by $\mathcal{B}$ the class of all bins of all possible metric trees of depth $\le h$ built using $\mathcal{F}$. Then $\mathrm{VC\text{-}dim}(\mathcal{B}) \le 2hp \log(hp) = \tilde{O}(hp)$.

  69. Rigorous lower bounds
