Faster Cover Trees: Mike Izbicki and Christian R. Shelton (UC Riverside)



SLIDE 1

Faster Cover Trees

Mike Izbicki and Christian R. Shelton UC Riverside

Izbicki and Shelton (UC Riverside) Faster Cover Trees July 7, 2015 1 / 21

SLIDE 2

Outline

Why care about faster cover trees?
Making cover trees faster:
◮ Experimental setup
◮ Simpler definition reduces the number of nodes
◮ The nearest ancestor invariant
◮ Better cache performance
◮ Constructing and querying the tree in parallel

SLIDE 3

Methods for fast nearest neighbor queries:

method        provable speedup   arbitrary metric   high dimensions
quadtree      yes                no                 no
kd-tree       yes                no                 somewhat
hashing       yes                no                 yes
ball tree     no                 yes                somewhat
cover tree    yes                yes                yes

(Beygelzimer, Kakade, and Langford, 2006)

SLIDE 4

Other uses of cover trees

Any learning algorithm that depends on distances between points can be made faster using cover trees. Examples:
◮ k-nearest neighbor
◮ Support vector machines (Segata and Blanzieri, 2010)
◮ Dimensionality reduction (Lisitsyn et al., 2010)
◮ Reinforcement learning (Tziortziotis et al., 2014)

SLIDE 5

Outline

Why care about faster cover trees?
Making cover trees faster:
◮ Experimental setup
◮ Simpler definition reduces the number of nodes
◮ The nearest ancestor invariant
◮ Better cache performance
◮ Constructing and querying the tree in parallel

SLIDE 6

Experimental setup

Three data sources:
◮ MLPack benchmarks with Euclidean distance
◮ Protein dataset with the random walk graph distance
◮ Yahoo! dataset of 1.5 million Creative Commons images with the earth mover's distance

Benchmarking procedure:
◮ Construct a cover tree on the dataset
◮ For each data point in the dataset, find the nearest neighbor
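The benchmarking procedure can be sketched as a small harness. This is only an illustration: `CountingMetric`, `brute_force_nn`, and `benchmark` are hypothetical names, and a brute-force search stands in for the cover tree query. Counting distance evaluations, rather than wall-clock time, mirrors the quantity reported in the comparison plots.

```python
import math


class CountingMetric:
    """Wraps a distance function and counts how often it is called."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)


def euclidean(a, b):
    return math.dist(a, b)


def brute_force_nn(data, query_index, metric):
    """Stand-in for a cover tree query: index of the nearest neighbor of
    data[query_index], excluding the query point itself."""
    best_index, best_dist = None, math.inf
    for i, x in enumerate(data):
        if i == query_index:
            continue
        d = metric(data[query_index], x)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


def benchmark(data, metric):
    """Find the nearest neighbor of every data point and report the
    neighbors together with the total number of distance evaluations."""
    neighbors = [brute_force_nn(data, i, metric) for i in range(len(data))]
    return neighbors, metric.calls


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.5)]
neighbors, calls = benchmark(data, CountingMetric(euclidean))
```

Brute force always costs n(n − 1) evaluations here; a tree-based query would be plugged in at `brute_force_nn` and compared on the same counter.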

SLIDE 7

The simplified cover tree

[Figure: example cover tree on the points 10, 8, 7, 9, 12, drawn across level 3, level 2, and level 1]

The covering invariant. For every node p, define the function covdist(p) = 2^{level(p)}. For each child q of p:

    d(p, q) ≤ covdist(p)

The separating invariant. For every node p, define the function sepdist(p) = 2^{level(p)-1}. For all distinct children q1 and q2 of p:

    d(q1, q2) ≥ sepdist(p)
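A minimal sketch of checking the two invariants on a small tree. The `Node` class, the one-dimensional metric, and the particular arrangement of the points 10, 8, 7, 9, 12 are assumptions made for illustration; the slide's figure may arrange them differently.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    point: float
    level: int
    children: list = field(default_factory=list)


def dist(a, b):
    # One-dimensional points, so the metric is the absolute difference.
    return abs(a - b)


def covdist(p):
    return 2.0 ** p.level


def sepdist(p):
    return 2.0 ** (p.level - 1)


def satisfies_invariants(p):
    """Check the covering and separating invariants at p and below."""
    covering = all(dist(p.point, q.point) <= covdist(p) for q in p.children)
    separating = all(
        dist(q1.point, q2.point) >= sepdist(p)
        for q1, q2 in combinations(p.children, 2)
    )
    return covering and separating and all(
        satisfies_invariants(q) for q in p.children
    )


# Hypothetical layout: 10 at level 3, 8 and 12 at level 2, 7 and 9 at level 1.
tree = Node(10, 3, [Node(8, 2, [Node(7, 1), Node(9, 1)]), Node(12, 2)])
good = satisfies_invariants(tree)

# Adding 8.5 next to 9 breaks separation: d(9, 8.5) = 0.5 < sepdist(8) = 2.
tree.children[0].children.append(Node(8.5, 1))
bad = satisfies_invariants(tree)
```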

SLIDE 8

The simplified cover tree

[Figure: example cover tree on the points 10, 8, 7, 9, 12, drawn across level 3, level 2, and level 1]

Advantages of the simplified cover tree:
◮ Maintains all runtime guarantees of the original cover tree.
◮ Significantly easier to understand and implement. The original cover tree was described in terms of an infinitely large tree, only a subset of which actually gets implemented.
◮ Requires exactly n nodes instead of O(n) nodes.
◮ Fewer nodes means a smaller constant factor for all algorithms.

SLIDE 9

The simplified cover tree

[Bar chart: fraction of nodes in the original cover tree required for the simplified cover tree (axis from 0.1 to 1). Datasets: yearpredict, twitter, tinyImages, mnist, corel, covtype, artificial40, faces.]

SLIDE 10

The nearest ancestor cover tree

[Figure: two cover trees on the points 7 through 13, drawn across level 3, level 2, and level 1; node labels read 10 8 7 11 12 9 13 and 10 8 7 9 12 11 13]

A nearest ancestor cover tree is a simplified cover tree where every point p satisfies the additional invariant that if q1 is an ancestor of p and q2 is a sibling of q1, then

    d(p, q1) ≤ d(p, q2)
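The nearest ancestor invariant can be checked directly from its definition. The sketch below is illustrative only: the `Node` class, the `violations` helper, and the two tree layouts are assumptions that loosely follow the node labels on the slide, not a reproduction of the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    point: float
    children: list = field(default_factory=list)


def dist(a, b):
    return abs(a - b)


def violations(root):
    """Return (p, q1, q2) triples where q1 is an ancestor of p, q2 is a
    sibling of q1, and d(p, q1) > d(p, q2)."""
    found = []

    def visit(node, siblings, ancestors):
        # ancestors holds (ancestor, siblings of that ancestor) pairs.
        for q1, q1_siblings in ancestors:
            for q2 in q1_siblings:
                if dist(node.point, q1.point) > dist(node.point, q2.point):
                    found.append((node.point, q1.point, q2.point))
        below = ancestors + [(node, siblings)]
        for child in node.children:
            others = [c for c in node.children if c is not child]
            visit(child, others, below)

    visit(root, [], [])
    return found


# Hypothetical layout A: 11 sits under 8 although 12 is closer.
tree_a = Node(10, [Node(8, [Node(7), Node(11)]), Node(12, [Node(9), Node(13)])])
# Hypothetical layout B: every point sits under its nearest ancestor.
tree_b = Node(10, [Node(8, [Node(7), Node(9)]), Node(12, [Node(11), Node(13)])])

bad = violations(tree_a)
good = violations(tree_b)
```

In layout A the points 11 and 9 are each closer to a sibling of their parent than to the parent itself, so both are reported; layout B reports nothing.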

SLIDE 11

The nearest ancestor cover tree

[Figure: two cover trees on the points 7 through 13, drawn across level 3, level 2, and level 1; node labels read 10 8 7 11 12 9 13 and 10 8 7 9 12 11 13]

◮ Insertions require rebalancing.
◮ No runtime guarantees on the rebalance step.
◮ In practice, queries are much faster and construction is only slightly slower.

SLIDE 12

Comparing cover trees on construction time

[Bar chart: number of distance comparisons in tree construction only, normalized by the original cover tree (axis from 1 to 5, with one bar labeled 19.1). Datasets: yearpredict, twitter, tinyImages, mnist, corel, covtype, artificial40, faces. Series: Original cover tree, Simplified cover tree, Nearest ancestor cover tree.]

SLIDE 13

Comparing cover trees on construction and query time

[Bar chart: number of distance comparisons in tree construction and query, normalized by n^2 (axis from 0.2 to 1.2). Datasets: yearpredict, twitter, tinyImages, mnist, corel, covtype, artificial40, faces. Series: Original cover tree, Simplified cover tree, Nearest ancestor cover tree.]

SLIDE 14

All of the cover trees scale similarly

This experiment uses the protein data and the random walk graph kernel.

[Plot: total distance comparisons in construction and query (millions) versus number of data points (thousands). Series: Original cover tree, Simplified cover tree, Nearest ancestor cover tree.]

SLIDE 15

Cache oblivious cover tree

Fast data structures on modern hardware need to take cache accesses into account.

image from: http://1024cores.net

SLIDE 16

Cache oblivious cover tree

Arrange nodes in memory according to a preorder traversal of the tree (van Emde Boas et al., 1966; Demaine, 2002)

image from: Wikipedia
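A minimal sketch of laying nodes out in preorder. The `Node` class, the `preorder_layout` helper, and the example tree are hypothetical. Python lists do not control physical memory placement, so the sketch shows only the ordering; an array of structs in a lower-level language would be needed to realize the cache benefit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    point: float
    children: list = field(default_factory=list)


def preorder_layout(root):
    """Flatten a tree into parallel arrays in preorder.

    Returns (points, child_indices): points[i] is the i-th node visited and
    child_indices[i] lists the array positions of its children.
    """
    points, child_indices = [], []

    def visit(node):
        index = len(points)
        points.append(node.point)
        child_indices.append([])
        for child in node.children:
            child_indices[index].append(visit(child))
        return index

    visit(root)
    return points, child_indices


tree = Node(10, [Node(8, [Node(7), Node(9)]), Node(12, [Node(11), Node(13)])])
points, child_indices = preorder_layout(tree)
```

With this ordering every subtree occupies a contiguous range of array positions, so a descent into one branch touches neighboring entries.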

SLIDE 17

The cache efficiency of three cover tree implementations

[Bar chart: cache miss rate (cache misses / cache accesses; axis from 0.2 to 1). Datasets: yearpredict, twitter, tinyImages, mnist, corel, covtype, artificial40, faces. Series: Without van Emde Boas, With van Emde Boas.]

Measured using Linux’s perf stat utility on an Amazon AWS instance

SLIDE 18

Merging cover trees

Merging cover trees gives us a parallel tree construction algorithm.

Sometimes, merging cover trees is easy:

[Figure: cover trees on the points 10, 8, 7, 9, 12, 11, 13, drawn across level 3, level 2, and level 1]

No runtime bound on the merge operation, but it is fast in practice.

SLIDE 19

Merging cover trees

Merging cover trees gives us a parallel tree construction algorithm.

Sometimes, merging cover trees is hard:

[Figure: cover trees on the points 10, 8, 7, 9, 11.5, 11, 13, drawn across level 3, level 2, and level 1]

No runtime bound on the merge operation, but it is fast in practice.
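The slides do not spell out the merge algorithm, so the sketch below shows only the surrounding pattern: split the data, build one structure per chunk concurrently, and fold the partial results together with a merge. The names `parallel_build`, `build_sorted`, and `merge_two` are hypothetical, and a sorted list stands in for the cover tree. Threads keep the example self-contained; CPU-bound construction in Python would need processes to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from heapq import merge as merge_sorted


def parallel_build(points, build, merge, workers):
    """Split points into up to `workers` chunks, build a structure on each
    chunk concurrently, then fold the partial results with `merge`.
    Assumes `points` is non-empty."""
    chunks = [points[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]  # drop empty chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(build, chunks))
    return reduce(merge, parts)


# Stand-ins: a sorted list plays the role of the tree.
def build_sorted(chunk):
    return sorted(chunk)


def merge_two(a, b):
    return list(merge_sorted(a, b))


result = parallel_build(
    [10, 8, 7, 9, 12, 11, 13], build_sorted, merge_two, workers=2
)
```

Any pair of `build` and `merge` functions fits this skeleton as long as merging two partial structures gives the same result as building on the combined data.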

SLIDE 20

The effect of parallel tree construction on small datasets

[Plot: normalized tree construction time (log scale from 2^{-4} to 2^{1}) versus number of processors (1, 2, 4, 8, 16) for our cover tree. Datasets: yearpredict (77 sec), twitter (107 sec), tinyImages (65 sec), mnist (12 sec).]

Experiments run on an Amazon AWS instance with 16 true cores

SLIDE 21

Parallel tree construction really matters on larger data sets

On large datasets with an expensive metric, parallelism is more useful.

Yahoo! Flickr dataset with 1.5 million images and the earth mover's distance:

            simplified tree         nearest ancestor tree
num cores   time        speedup     time         speedup
1           70.7 min    1.0         210.9 min    1.0
2           36.6 min    1.9         94.2 min     2.2
4           18.5 min    3.8         48.5 min     4.3
8           10.2 min    6.9         25.3 min     8.3
16          6.7 min     10.5        12.0 min     17.6
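The speedup columns follow from the timings as single-core time divided by p-core time. A short sketch that recomputes them, and also derives parallel efficiency (speedup divided by core count, a quantity not shown on the slide); the function names are hypothetical:

```python
# (cores, simplified tree minutes, nearest ancestor tree minutes), from the table.
timings = [
    (1, 70.7, 210.9),
    (2, 36.6, 94.2),
    (4, 18.5, 48.5),
    (8, 10.2, 25.3),
    (16, 6.7, 12.0),
]


def speedups(times):
    """Speedup of each run relative to the first (single-core) run."""
    base = times[0]
    return [base / t for t in times]


def efficiencies(cores, times):
    """Parallel efficiency: speedup divided by the number of cores."""
    return [s / c for s, c in zip(speedups(times), cores)]


cores = [row[0] for row in timings]
simplified = speedups([row[1] for row in timings])
nearest_ancestor = speedups([row[2] for row in timings])
na_efficiency = efficiencies(cores, [row[2] for row in timings])
```

The recomputed values agree with the reported speedups to within rounding. For the nearest ancestor tree the efficiency at 16 cores comes out slightly above 1, meaning the reported speedup exceeds the core count.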

SLIDE 22

The effect of parallel tree construction and query

[Plot: normalized total runtime, both construction and query (log scale from 2^{-4} to 2^{1}), with processor counts 1, 2, 4, 8, 16. Datasets: yearpredict (277 min), twitter (51 min), tinyImages (34 min), mnist (30 min). Series: Reference cover tree, MLPack's cover tree, Our cover tree.]

Experiments run on an Amazon AWS instance with 16 true cores

SLIDE 23

Summary

◮ You should use cover trees.
◮ We made them easier to implement and faster.
◮ All the code is released under the BSD3 license and is available at: http://github.com/mikeizbicki/hlearn
