Dual-tree Algorithms in Statistics Ryan Riegel - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Dual-tree Algorithms in Statistics

Ryan Riegel

rriegel@cc.gatech.edu

Computational Science and Engineering College of Computing Georgia Institute of Technology

Dual-tree Algorithms in Statistics – p.1/77

slide-2
SLIDE 2

Outline

(Relevant citations at top of slide)

  1. Recap of yesterday: single-tree algorithms
  2. Motivation and intuition for dual-tree algorithms
  3. Several examples, including demo of All-NN
  4. Case study #1: quasar identification
  5. Formal algebraic foundations
  6. The general algorithm and its parameters
  7. Case study #2: affinity propagation

Dual-tree Algorithms in Statistics – p.2/77

slide-3
SLIDE 3

Recap

Yesterday, we considered a problem best solved by a single-tree algorithm: given one query and a set of references, determine the sum of forces acting on the query.

Dual-tree Algorithms in Statistics – p.3/77

slide-4
SLIDE 4

Recap

Barnes-Hut solution approach:

  • Form a spatial tree (e.g. an oct-tree) on the references
  • For each query, process nodes:
    • If R/W > thresh, approximate with the center of mass
    • Else, recurse on the node and sum up child results

Reasoning about the potential function (1/r²) permits bounded error via choice of threshold.

Dual-tree Algorithms in Statistics – p.4/77
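The Barnes-Hut recap above can be made concrete with a minimal 1-D sketch. The tree layout, unit masses, and the names `Node` and `force` are illustrative assumptions; only the R/W > thresh test and the 1/r² law come from the slide.

```python
# Minimal single-tree (Barnes-Hut-style) sketch in 1-D.
# Each node stores its mass, center of mass, and width; far nodes are
# approximated by their center of mass, near nodes are recursed into.

class Node:
    def __init__(self, points):
        self.points = sorted(points)
        self.mass = len(points)                       # unit masses
        self.com = sum(points) / len(points)          # center of mass
        self.width = self.points[-1] - self.points[0]
        if len(points) > 1:
            mid = len(points) // 2
            self.left = Node(self.points[:mid])
            self.right = Node(self.points[mid:])
        else:
            self.left = self.right = None

def force(q, node, thresh=2.0):
    """Sum of 1/r^2 attractions on query q from all points in node."""
    r = abs(q - node.com)
    # Node far relative to its width: approximate with center of mass.
    if node.width == 0 or r / node.width > thresh:
        return node.mass / r**2 if r > 0 else 0.0
    # Else recurse on the node and sum up child results.
    return force(q, node.left, thresh) + force(q, node.right, thresh)

root = Node([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
approx = force(-5.0, root)
exact = sum(1 / (p + 5.0)**2 for p in root.points)
```

With `thresh = 2.0` the approximation stays within a few percent of the exact sum on this toy data; raising the threshold tightens the error at the cost of more recursion.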

slide-6
SLIDE 6

Recap

Fast Multipole Method is similar:

  • Annotate the spatial tree with order-expansion statistics (fast bottom-up computation)
  • For each query, process nodes:
    • If R/W > thresh, approximate with the order expansion
    • Else, recurse on the node and sum up child results

The added accuracy of the order expansion permits more aggressive pruning while still bounding error.

Dual-tree Algorithms in Statistics – p.5/77

slide-8
SLIDE 8

Motivation

Complexity analysis:

  • Tree-building is O(N log N): O(N) work at each level, O(log N) levels (in a balanced tree)
  • Work is O(log N) per query; O(M log N) overall

Dual-tree Algorithms in Statistics – p.6/77

slide-9
SLIDE 9

Motivation

Consider M ∈ O(N):

  • Theorist’s response: “What’s the problem?”; overall computation is already O(N log N) from tree-building
  • Maybe the tree already exists
  • Tree-building tends to be very fast

Dual-tree Algorithms in Statistics – p.7/77

slide-13
SLIDE 13

Motivation

Gray and Moore, NIPS 2000

Dual-tree algorithms (a.k.a. generalized N-body methods):

  • The most logical extension of single-tree algorithms: form trees for both references and queries
  • After tree-building, time improves from O(N log N) to O(N); much better than the traditional O(N²) for nested loops
  • Yield exact results or have bounded approximation error (absolute or relative)
  • Track record: fastest, most accurate methods to date

Dual-tree Algorithms in Statistics – p.8/77

slide-18
SLIDE 18

Hype

Gray and Moore, NIPS 2000, and many other papers

Applications include:

  • Nonparametric methods in machine learning:
    • The n-point correlation and range count
    • All-k-nearest-neighbors (All-NN)
    • Kernel density estimation (KDE)
    • Kernel discriminant analysis (KDA)
    • Local linear regression and others
  • More...

Dual-tree Algorithms in Statistics – p.9/77

slide-25
SLIDE 25

Hype

Gray and Moore, NIPS 2000, and many other papers

Applications include:

  • Nonparametric methods in machine learning
  • Manifold methods via All-NN and others
  • Astronomy: quasar identification via KDA
  • Physics: multi-body potentials, fitting wave functions
  • Biology: protein folding, solvent-accessible surfaces
  • (I conjecture) products of sparse matrices and other linear algebra

Dual-tree Algorithms in Statistics – p.10/77

slide-30
SLIDE 30

Intuition

Gray and Moore, NIPS 2000

General algorithmic sketch:

  • Form spatial trees for both queries and references
  • For pairs of tree nodes:
    • If “bounds” suggest a result for the pair, use it
    • Else, recurse on all pairs of child nodes

“Bounds” are often based on min/max distances between nodes, e.g. the range of a kernel applied to the distances.

Dual-tree Algorithms in Statistics – p.11/77

slide-32
SLIDE 32

Monochromatic all-nearest-neighbors:

map_{q ∈ X} argmin_{r ∈ X − q} d(q, r)

(Slides 32–72 step through an animated demo of the dual-tree All-NN traversal; the figures are not preserved in this transcript.)

Dual-tree Algorithms in Statistics – pp.12–52/77

slide-73
SLIDE 73

Ex: Two-point Correlation

Gray and Moore, NIPS 2000

Σ_{x1 ∈ X} Σ_{x2 ∈ X} I(d(x1, x2) ≤ h)

function tpc(X1, X2)
  if dl(X1, X2) > h, return 0
  if du(X1, X2) ≤ h, return |X1| · |X2|
  return tpc(X1^L, X2^L) + tpc(X1^L, X2^R)
       + tpc(X1^R, X2^L) + tpc(X1^R, X2^R)

Dual-tree Algorithms in Statistics – p.53/77
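The tpc recursion above can be written as a short runnable sketch. The 1-D tree and the interval-based dl/du distance bounds are illustrative assumptions, not the talk's actual implementation; the prune conditions and four-way recursion follow the slide.

```python
# Runnable 1-D sketch of the dual-tree two-point correlation count.
# dl/du are lower/upper bounds on the distance between any point of one
# node and any point of the other, computed from sorted 1-D intervals.

def build(points):
    pts = sorted(points)
    node = {"pts": pts, "lo": pts[0], "hi": pts[-1]}
    if len(pts) > 1:
        mid = len(pts) // 2
        node["children"] = (build(pts[:mid]), build(pts[mid:]))
    return node

def dl(a, b):  # lower bound on d(x1, x2), x1 in a, x2 in b
    return max(a["lo"] - b["hi"], b["lo"] - a["hi"], 0.0)

def du(a, b):  # upper bound
    return max(a["hi"] - b["lo"], b["hi"] - a["lo"])

def tpc(X1, X2, h):
    if dl(X1, X2) > h:
        return 0                                # exclusion prune
    if du(X1, X2) <= h:
        return len(X1["pts"]) * len(X2["pts"])  # inclusion prune
    # Recurse on all pairs of children (a leaf stands in for itself).
    A = X1.get("children", (X1,))
    B = X2.get("children", (X2,))
    return sum(tpc(a, b, h) for a in A for b in B)

tree = build([0.0, 1.0, 2.0, 5.0])
count = tpc(tree, tree, 1.5)
```

Leaves are singletons, so at a leaf-leaf pair dl equals du and one of the two prunes always fires; the recursion therefore terminates without a separate base case.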

slide-74
SLIDE 74

Ex: Range Count

Gray and Moore, NIPS 2000

map_{q ∈ Q} Σ_{r ∈ R} I(d(q, r) ≤ h)

init ∀q ∈ Qroot, a(q) = 0
function rng(Q, R)
  if dl(Q, R) > h, return
  if du(Q, R) ≤ h, ∀q ∈ Q, a(q) += |R|; return
  rng(Q^L, R^L); rng(Q^L, R^R)
  rng(Q^R, R^L); rng(Q^R, R^R)

Dual-tree Algorithms in Statistics – p.54/77
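The rng pseudocode above differs from two-point correlation only in keeping a per-query accumulator a(q). A runnable 1-D sketch, again with an illustrative tree and interval bounds:

```python
# Runnable 1-D sketch of dual-tree range count: for each query q,
# a[q] accumulates the number of references within distance h.

def build(points):
    pts = sorted(points)
    node = {"pts": pts, "lo": pts[0], "hi": pts[-1]}
    if len(pts) > 1:
        mid = len(pts) // 2
        node["children"] = (build(pts[:mid]), build(pts[mid:]))
    return node

def dl(a, b):
    return max(a["lo"] - b["hi"], b["lo"] - a["hi"], 0.0)

def du(a, b):
    return max(a["hi"] - b["lo"], b["hi"] - a["lo"])

def rng(Q, R, h, a):
    if dl(Q, R) > h:
        return                          # no reference can be in range
    if du(Q, R) <= h:
        for q in Q["pts"]:
            a[q] += len(R["pts"])       # every reference is in range
        return
    for qc in Q.get("children", (Q,)):
        for rc in R.get("children", (R,)):
            rng(qc, rc, h, a)

queries, refs = [0.0, 4.0], [0.0, 1.0, 3.0, 8.0]
a = {q: 0 for q in queries}
rng(build(queries), build(refs), 1.5, a)
```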

slide-75
SLIDE 75

Ex: All-nearest-neighbors

Gray and Moore, NIPS 2000

map_{q ∈ Q} argmin_{r ∈ R} d(q, r)

init ∀q ∈ Qroot, a(q) = ∞
function allnn(Q, R)
  if au(Q) ≤ dl(Q, R), return
  if (Q, R) = ({q}, {r}), a(q) = min{a(q), d(q, r)}; return
  prioritize {R1, R2} = {R^L, R^R} by dl(Q^L, ·)
  allnn(Q^L, R1); allnn(Q^L, R2)
  prioritize {R1, R2} = {R^L, R^R} by dl(Q^R, ·)
  allnn(Q^R, R1); allnn(Q^R, R2)

Dual-tree Algorithms in Statistics – p.55/77
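A runnable 1-D sketch of the allnn recursion above, in its bichromatic form (separate query and reference sets). au(Q) is taken to be the worst current best distance over queries in Q; the tree and bound helpers are illustrative assumptions.

```python
# Runnable 1-D sketch of dual-tree all-nearest-neighbors. a[q] holds the
# current best (smallest) distance seen for query q; the prioritize step
# visits the reference child closest to each query child first.

def build(points):
    pts = sorted(points)
    node = {"pts": pts, "lo": pts[0], "hi": pts[-1]}
    if len(pts) > 1:
        mid = len(pts) // 2
        node["children"] = (build(pts[:mid]), build(pts[mid:]))
    return node

def dl(a, b):
    return max(a["lo"] - b["hi"], b["lo"] - a["hi"], 0.0)

def allnn(Q, R, a):
    if max(a[q] for q in Q["pts"]) <= dl(Q, R):
        return                                      # R cannot improve any q in Q
    if len(Q["pts"]) == 1 and len(R["pts"]) == 1:   # leaf pair: base case
        q, r = Q["pts"][0], R["pts"][0]
        a[q] = min(a[q], abs(q - r))
        return
    for qc in Q.get("children", (Q,)):
        # Prioritize the closer reference child, so good bounds form early.
        for rc in sorted(R.get("children", (R,)), key=lambda n: dl(qc, n)):
            allnn(qc, rc, a)

queries, refs = [0.0, 4.0, 9.0], [1.0, 5.0, 6.0]
a = {q: float("inf") for q in queries}
allnn(build(queries), build(refs), a)
```

Visiting the nearer child first is what makes the au(Q) ≤ dl(Q, R) prune effective: tight best-so-far distances are established before the far branches are considered.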

slide-76
SLIDE 76

Ex: Kernel Density Estimation

Lee et al., NIPS 2005; Lee and Gray, UAI 2006

map_{q ∈ Q} Σ_{r ∈ R} K_h(q, r)

init ∀q ∈ Qroot, a(q) = 0; b = 0
function kde(Q, R, b)
  if K_h^u(Q, R) − K_h^l(Q, R) < (a^l(Q) + b) · |R| · ε / |Rroot|,
    ∀q ∈ Q, a(q) += K_h^l(Q, R); return
  prioritize {R1, R2} = {R^L, R^R} by dl(Q^L, ·)
  kde(Q^L, R1, b + K_h^l(Q^L, R2)); kde(Q^L, R2, b)
  prioritize {R1, R2} = {R^L, R^R} by dl(Q^R, ·)
  kde(Q^R, R1, b + K_h^l(Q^R, R2)); kde(Q^R, R2, b)

Dual-tree Algorithms in Statistics – p.56/77

slide-77
SLIDE 77

Ex: Kernel Discriminant Analysis

Gray and Riegel, COMPSTAT 2006; Riegel et al., SIAM Data Mining 2008

map_{q ∈ Q} argmax_{C ∈ {C1, C2}} (P(C) / |R_C|) Σ_{r ∈ R_C} K_{h_C}(q, r)

init ∀q ∈ Qroot, a(q) = δ(Qroot, Rroot)
enqueue(Qroot, Rroot)
while dequeue(Q, R)    // main loop of kda
  if a^l(Q) > 0 or a^u(Q) < 0, return
  ∀q ∈ Q, a(q) −= δ(Q, R)
  ∀q ∈ Q^L, a(q) += δ(Q^L, R^L) + δ(Q^L, R^R)
  ∀q ∈ Q^R, a(q) += δ(Q^R, R^L) + δ(Q^R, R^R)
  enqueue(Q^L, R^L); enqueue(Q^L, R^R)
  enqueue(Q^R, R^L); enqueue(Q^R, R^R)

Dual-tree Algorithms in Statistics – p.57/77

slide-78
SLIDE 78

Case Study: Quasar Identification

Riegel et al., SIAM Data Mining 2008 (Submitted); Richards et al., AAS 2008

Mining for quasars in the Sloan Digital Sky Survey:

  • Brightest objects in the universe
  • Thus, the farthest/oldest we can see
  • Believed to be active galactic nuclei: giant black holes
  • Implications for dark matter, dark energy, etc.
  • Peplow, Nature 2005 uses one of our catalogs to verify the cosmic magnification effect predicted by relativity

Dual-tree Algorithms in Statistics – p.58/77

slide-84
SLIDE 84

Case Study: Quasar Identification

Riegel et al., SIAM Data Mining 2008 (Submitted); Richards et al., AAS 2008

Trained a KDA classifier on 4D spectral data from about 80k known quasars and 400k non-quasars. Identified about 1m quasars from 40m unknown objects.

  • Took 640 seconds in serial; half of that was tree-building.
  • The naïve approach takes 380 hours, excluding bandwidth learning.
  • Algorithmic parameters are key to performance:
    • Hybrid breadth-depth-first expansion
    • Epanechnikov kernel (choice of f) to maximize pruning
    • Multi-bandwidth algorithm for faster bandwidth fitting

Dual-tree Algorithms in Statistics – p.59/77

slide-88
SLIDE 88

Case Study: Quasar Identification

[Log-log plot: running time vs. data set size for LOO CV on the 4D quasar data, comparing Naive, Heap, Heap + Epanechnikov, Hybrid, and Hybrid + Epanechnikov.]

Dual-tree Algorithms in Statistics – p.60/77

slide-89
SLIDE 89

GNPs, Formally Speaking

Boyer, Riegel, and Gray’s THOR Project; (Planned) Riegel et al., NIPS 2008 or JMLR 2008

Higher-order reduce problem Ψ = g ∘ ψ, with

ψ(X1, . . . , Xn) = ⊗^1_{x1 ∈ X1} · · · ⊗^n_{xn ∈ Xn} f(x1, . . . , xn)

subject to the decomposability requirement

ψ(. . . , Xi, . . .) = ψ(. . . , Xi^L, . . .) ⊗_i ψ(. . . , Xi^R, . . .)

for all 1 ≤ i ≤ n and partitions Xi^L ∪ Xi^R = Xi.

We’ll also need some means of bounding the results of ψ.

Dual-tree Algorithms in Statistics – p.61/77

slide-92
SLIDE 92

Decomposability

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Decomposability is restrictive; it always holds for problems formed by combinations of map and some one other ⊗. It is equivalent to

⊗^1_{x1 ∈ X1} · · · ⊗^n_{xn ∈ Xn} f(x1, . . . , xn) = ⊗^{p1}_{x_{p1} ∈ X_{p1}} · · · ⊗^{pn}_{x_{pn} ∈ X_{pn}} f(x1, . . . , xn)

for all permutations p of the set {1, . . . , n}, and to

(ψ(Xi^L, Xj^L) ⊗_i ψ(Xi^R, Xj^L)) ⊗_j (ψ(Xi^L, Xj^R) ⊗_i ψ(Xi^R, Xj^R))
  = (ψ(Xi^L, Xj^L) ⊗_j ψ(Xi^L, Xj^R)) ⊗_i (ψ(Xi^R, Xj^L) ⊗_j ψ(Xi^R, Xj^R))

Dual-tree Algorithms in Statistics – p.62/77

slide-95
SLIDE 95

Decomposability

ψ(X, Y) = ⊙_{x ∈ X} ⊗_{y ∈ Y} f(x, y):

   ( f(x1, y1) ⊗ f(x1, y2) ⊗ · · · ⊗ f(x1, yM) )
 ⊙ ( f(x2, y1) ⊗ f(x2, y2) ⊗ · · · ⊗ f(x2, yM) )
 ⊙ . . .
 ⊙ ( f(xN, y1) ⊗ f(xN, y2) ⊗ · · · ⊗ f(xN, yM) )

Dual-tree Algorithms in Statistics – p.63/77

slide-96
SLIDE 96

Decomposability

ψ(X, Y) = ψ(X, Y^L) ⊗ ψ(X, Y^R):

   (   f(x1, y1)              (   ( f(x1, y2) ⊗ · · · ⊗ f(x1, yM) )
     ⊙ f(x2, y1)          ⊗       ⊙ ( f(x2, y2) ⊗ · · · ⊗ f(x2, yM) )
     ⊙ . . .                      ⊙ . . .
     ⊙ f(xN, y1) )                ⊙ ( f(xN, y2) ⊗ · · · ⊗ f(xN, yM) ) )

Here Y^L = {y1} and Y^R = {y2, . . . , yM}.

Dual-tree Algorithms in Statistics – p.64/77
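A tiny numeric check of the two decomposability slides above, taking the operator over queries to be map (one result per query) and the operator over references to be +, as in range count or KDE. All names are illustrative.

```python
# Decomposability check: psi maps over queries X and sums over
# references Y. Splitting Y combines the two partial results with +
# (the inner operator), elementwise per query; splitting X combines
# them by concatenation (the map operator). Both give psi(X, Y).

def psi(X, Y, f):
    return [sum(f(x, y) for y in Y) for x in X]  # map over X, sum over Y

f = lambda x, y: (x - y) ** 2
X, Y = [1.0, 2.0, 3.0], [0.5, 1.5, 4.0]
XL, XR = X[:1], X[1:]
YL, YR = Y[:2], Y[2:]

whole = psi(X, Y, f)
# Split the reference set: combine per query with +.
via_y = [l + r for l, r in zip(psi(X, YL, f), psi(X, YR, f))]
# Split the query set: combine by concatenating the per-query results.
via_x = psi(XL, Y, f) + psi(XR, Y, f)
```

The same check fails for a non-decomposable pairing such as min over queries of a sum over references, which is exactly why the formalism insists on map for the query side.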

slide-97
SLIDE 97

Transforming Problems into GNPs

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

(“Serial” GNPs.) Decomposable or not,

g1( ⊗^1_{x1 ∈ X1} g2( ⊗^2_{x2 ∈ X2} · · · gn( ⊗^n_{xn ∈ Xn} f(x1, . . . , xn) ) · · · ) )

may be transformed into nested GNPs by replacing every other operator with map and factoring the intermediate gi out.

(“Parallel” GNPs.) Also GNP-able are problems such as:

map_i ( Σ_j wij K(xi, xj) ) / ( Σ_j K(xi, xj) )

(“Multi” GNPs.) Wrap the problem with map to vary a parameter.

Dual-tree Algorithms in Statistics – p.65/77

slide-100
SLIDE 100

The Algorithm

Boyer, Riegel, and Gray’s THOR Project; (Planned) Riegel et al., ICML 2008 or JMLR 2008

“One algorithm to solve them all”:

ψ(X1, . . . , Xn) ←
  a                                                    if bounds prove it is safe to prune to a,
  f(x1, . . . , xn)                                    if each Xi = {xi}, i.e. is a leaf,
  ψ(. . . , Xi^L, . . .) ⊗_i ψ(. . . , Xi^R, . . .)    otherwise

Regarding speed, pruning is everything.

Dual-tree Algorithms in Statistics – p.66/77
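The three-way case split above can be written as one generic recursion with pluggable prune, base-case, and combine steps. This sketch instantiates it for a two-point-correlation-style pair count; the helper names and the 1-D tree are illustrative assumptions.

```python
# One generic recursion implementing the case split: prune to a
# bound-derived value, hit the leaf base case, or expand one node and
# combine the two child results with that argument's operator.

def build(points):
    pts = sorted(points)
    node = {"pts": pts, "lo": pts[0], "hi": pts[-1]}
    if len(pts) > 1:
        mid = len(pts) // 2
        node["children"] = (build(pts[:mid]), build(pts[mid:]))
    return node

def solve(nodes, prune, base, combine):
    ok, value = prune(nodes)
    if ok:
        return value                       # bounds prove a safe prune
    if all("children" not in n for n in nodes):
        return base(nodes)                 # every X_i is a leaf {x_i}
    i = next(k for k, n in enumerate(nodes) if "children" in n)
    left, right = nodes[i]["children"]     # expand X_i and combine
    return combine(
        solve(nodes[:i] + (left,) + nodes[i + 1:], prune, base, combine),
        solve(nodes[:i] + (right,) + nodes[i + 1:], prune, base, combine))

def pair_count_within(h):                  # two-point-correlation instance
    def prune(nodes):
        a, b = nodes
        if max(a["lo"] - b["hi"], b["lo"] - a["hi"], 0.0) > h:
            return True, 0                                 # exclusion
        if max(a["hi"] - b["lo"], b["hi"] - a["lo"]) <= h:
            return True, len(a["pts"]) * len(b["pts"])     # inclusion
        return False, None
    base = lambda nodes: int(abs(nodes[0]["pts"][0] - nodes[1]["pts"][0]) <= h)
    return prune, base

X = build([0.0, 1.0, 2.0, 5.0])
prune, base = pair_count_within(1.5)
count = solve((X, X), prune, base, lambda u, v: u + v)
```

Swapping in a different prune test, base case, and combine operator yields the other examples (range count, All-NN, KDE) from the same skeleton.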

slide-102
SLIDE 102

Pruning

Boyer, Riegel, and Gray’s THOR Project; (Planned) Riegel et al., ICML 2008 or JMLR 2008

Roughly, three kinds:

  • Intrinsic pruning depends only on bounds of ψ(X1, . . . , Xn) (ex: n-point correlation)
  • Extrinsic pruning depends on bounds of ψ(X1, . . . , Xn) and past work (ex: all-nearest-neighbors)
  • Termination pruning depends only on past work (ex: kernel discriminant analysis)

Approximation is a form of extrinsic pruning. The kind of pruning is determined by the problem specification; the ease of pruning is influenced by the algorithmic parameters.

Dual-tree Algorithms in Statistics – p.67/77

slide-108
SLIDE 108

Algorithmic Parameters

An implementation must answer these questions:

  • How to partition the data? E.g. what kind of trees to use? Non-binary? No tree (ex: Baeza-Yates)?
  • What expansion pattern? Depth-first? Breadth-first? Something else? Which branches first, heuristically?
  • (Higher-level.) What scale of data structures to use? Does the problem fit in RAM? Need to be parallel?

Dual-tree Algorithms in Statistics – p.68/77
slide-112
SLIDE 112

Trees

Gray and Lee’s Proximity Project, 2005 Many options: kd-trees, ball trees, cover trees, sorted lists.

Dual-tree Algorithms in Statistics – p.69/77

slide-113
SLIDE 113

Trees

Gray and Lee’s Proximity Project, 2005 Many options: kd-trees, ball trees, cover trees, sorted lists. Aside: tree building constitutes graph partitioning and may (attempt to) minimize some loss function.

Dual-tree Algorithms in Statistics – p.69/77

slide-114
SLIDE 114

Expansion Pattern

Boyer, Riegel, and Gray’s THOR Project (Planned) Riegel et al., ICML 2008 or JMLR 2008 Describes the order we replace

ψ(. . . , Xi, . . .) ← ψ(. . . , XL

i , . . .) ⊗

i ψ(. . . , XR

i , . . .)

Dual-tree Algorithms in Statistics – p.70/77

slide-115
SLIDE 115

Expansion Pattern

Boyer, Riegel, and Gray’s THOR Project (Planned) Riegel et al., ICML 2008 or JMLR 2008 Describes the order we replace

ψ(. . . , Xi, . . .) ← ψ(. . . , XL

i , . . .) ⊗

i ψ(. . . , XR

i , . . .)

DFS has least overhead, sensitive to heuristic

Dual-tree Algorithms in Statistics – p.70/77

slide-116
SLIDE 116

Expansion Pattern

Boyer, Riegel, and Gray’s THOR Project (Planned) Riegel et al., ICML 2008 or JMLR 2008 Describes the order we replace

ψ(. . . , Xi, . . .) ← ψ(. . . , XL

i , . . .) ⊗

i ψ(. . . , XR

i , . . .)

DFS has least overhead, sensitive to heuristic BFS has more overhead, less senitive to heuristic

Dual-tree Algorithms in Statistics – p.70/77

slide-117
SLIDE 117

Expansion Pattern

Boyer, Riegel, and Gray’s THOR Project; (Planned) Riegel et al., ICML 2008 or JMLR 2008

Describes the order we replace

ψ(…, X_i, …) ← ψ(…, X_i^L, …) ⊗_i ψ(…, X_i^R, …)

DFS has least overhead, sensitive to heuristic
BFS has more overhead, less sensitive to heuristic
Priority queue has highest overhead but makes the most of its heuristic; adds the need for operators to have inverses (i.e. to form groups)

Dual-tree Algorithms in Statistics – p.70/77

slide-118
SLIDE 118

Expansion Pattern

Boyer, Riegel, and Gray’s THOR Project; (Planned) Riegel et al., ICML 2008 or JMLR 2008

Describes the order we replace

ψ(…, X_i, …) ← ψ(…, X_i^L, …) ⊗_i ψ(…, X_i^R, …)

DFS has least overhead, sensitive to heuristic
BFS has more overhead, less sensitive to heuristic
Priority queue has highest overhead but makes the most of its heuristic; adds the need for operators to have inverses (i.e. to form groups)
Hybrid breadth-depth-first pattern: achieves breadth-first behavior in O(N) space for query-reference problems.

Dual-tree Algorithms in Statistics – p.70/77
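The trade-offs above can be sketched as a priority-queue (best-first) expansion over query-reference node pairs. This is an illustrative skeleton, not the THOR interface: `Node`, `heuristic`, `base_case`, and `can_prune` are hypothetical stand-ins.

```python
import heapq
from itertools import count

class Node:
    """Minimal binary tree node for illustration."""
    def __init__(self, left=None, right=None, data=None):
        self.left, self.right, self.data = left, right, data

    @property
    def is_leaf(self):
        return self.left is None and self.right is None

    def children(self):
        # A leaf keeps pairing as itself against the other tree's children.
        return [self] if self.is_leaf else [self.left, self.right]

def best_first_expand(q_root, r_root, heuristic, base_case, can_prune):
    """Pop the most promising (query, reference) pair first.

    Highest bookkeeping overhead of the expansion patterns, but makes
    the most of its heuristic; DFS would instead recurse on children
    sorted by the same heuristic.
    """
    tie = count()  # tiebreaker so the heap never compares Node objects
    heap = [(heuristic(q_root, r_root), next(tie), q_root, r_root)]
    while heap:
        _, _, q, r = heapq.heappop(heap)
        if can_prune(q, r):
            continue  # bound test succeeded; skip this pair entirely
        if q.is_leaf and r.is_leaf:
            base_case(q, r)  # exhaustive point-point computation
        else:
            for cq in q.children():
                for cr in r.children():
                    heapq.heappush(heap, (heuristic(cq, cr), next(tie), cq, cr))
```

In a real instance, `heuristic` would be a distance bound between bounding regions and `can_prune` the error-tolerance test.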

slide-119
SLIDE 119

Problem Scale

Boyer, Riegel, and Gray’s THOR Project

Simple in-memory data structures, memory-mapped files, or parallelized/distributed data management.

Some observations:

Dual-tree Algorithms in Statistics – p.71/77

slide-120
SLIDE 120

Problem Scale

Boyer, Riegel, and Gray’s THOR Project

Simple in-memory data structures, memory-mapped files, or parallelized/distributed data management.

Some observations:
All GNPs are parallelizable, some more so than others

Dual-tree Algorithms in Statistics – p.71/77

slide-121
SLIDE 121

Problem Scale

Boyer, Riegel, and Gray’s THOR Project

Simple in-memory data structures, memory-mapped files, or parallelized/distributed data management.

Some observations:
All GNPs are parallelizable, some more so than others
All GNPs can benefit greatly from multicore processors

Dual-tree Algorithms in Statistics – p.71/77

slide-122
SLIDE 122

Problem Scale

Boyer, Riegel, and Gray’s THOR Project

Simple in-memory data structures, memory-mapped files, or parallelized/distributed data management.

Some observations:
All GNPs are parallelizable, some more so than others
All GNPs can benefit greatly from multicore processors
Opportunity to use cache-oblivious trees (vEB, etc.)

Dual-tree Algorithms in Statistics – p.71/77

slide-123
SLIDE 123

THOR Coding Framework

Boyer, Riegel, and Gray’s THOR Project

Speed-oriented C++ framework for problems of forms

map_{q∈Q} g_{r∈R} f(q, r)   and   g_{x1∈X1} g_{x2∈X2} f(x1, x2)

Dual-tree Algorithms in Statistics – p.72/77
slide-124
SLIDE 124

THOR Coding Framework

Boyer, Riegel, and Gray’s THOR Project

Speed-oriented C++ framework for problems of forms

map_{q∈Q} g_{r∈R} f(q, r)   and   g_{x1∈X1} g_{x2∈X2} f(x1, x2)

Coding entails filling a few dozen function stubs

Dual-tree Algorithms in Statistics – p.72/77

slide-125
SLIDE 125

THOR Coding Framework

Boyer, Riegel, and Gray’s THOR Project

Speed-oriented C++ framework for problems of forms

map_{q∈Q} g_{r∈R} f(q, r)   and   g_{x1∈X1} g_{x2∈X2} f(x1, x2)

Coding entails filling a few dozen function stubs

Easy variation of tree type, expansion pattern, etc.

Dual-tree Algorithms in Statistics – p.72/77

slide-126
SLIDE 126

THOR Coding Framework

Boyer, Riegel, and Gray’s THOR Project

Speed-oriented C++ framework for problems of forms

map_{q∈Q} g_{r∈R} f(q, r)   and   g_{x1∈X1} g_{x2∈X2} f(x1, x2)

Coding entails filling a few dozen function stubs
Easy variation of tree type, expansion pattern, etc.
Automatic parallelization (multicore and distributed)

Dual-tree Algorithms in Statistics – p.72/77
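The two problem forms above can be evaluated naively (no trees, no pruning) to make the map/reduce structure concrete. This is a hedged sketch: the function names are illustrative and are not THOR's C++ API.

```python
from functools import reduce

def gnp_map_reduce(Q, R, f, g):
    """First GNP form: map_{q in Q} g_{r in R} f(q, r).

    Produces one g-reduced value per query point; this is the
    query-reference shape (e.g. nearest neighbors, kernel sums).
    """
    return [reduce(g, (f(q, r) for r in R)) for q in Q]

def gnp_double_reduce(X1, X2, f, g):
    """Second GNP form: g_{x1 in X1} g_{x2 in X2} f(x1, x2).

    Reduces over all pairs down to a single value (e.g. a total count
    or sum over pairs).
    """
    return reduce(g, (f(x1, x2) for x1 in X1 for x2 in X2))
```

A dual-tree implementation computes the same values but prunes or approximates whole blocks of the (q, r) grid at once; the naive loops above are the correctness reference.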

slide-127
SLIDE 127

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008 Recent clustering method:

Dual-tree Algorithms in Statistics – p.73/77

slide-128
SLIDE 128

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008 Recent clustering method: Frey and Dueck, Science 2007

Dual-tree Algorithms in Statistics – p.73/77

slide-129
SLIDE 129

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Recent clustering method: Frey and Dueck, Science 2007
Finds exemplars in a data set in an attempt to minimize squared reconstruction error

Dual-tree Algorithms in Statistics – p.73/77

slide-130
SLIDE 130

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Recent clustering method: Frey and Dueck, Science 2007
Finds exemplars in a data set in an attempt to minimize squared reconstruction error
Number of clusters to find is unspecified, but influenced by a “preference” parameter

Dual-tree Algorithms in Statistics – p.73/77

slide-131
SLIDE 131

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Recent clustering method: Frey and Dueck, Science 2007
Finds exemplars in a data set in an attempt to minimize squared reconstruction error
Number of clusters to find is unspecified, but influenced by a “preference” parameter
Presented as a fast alternative to zillions of random restarts of the k-centers algorithm

Dual-tree Algorithms in Statistics – p.73/77

slide-132
SLIDE 132

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

For similarity matrix S (pref along diag), update R and A

r_ij ← s_ij − max_{j′≠j} (a_{ij′} + s_{ij′})

a_ij ← κ⁻_ij Σ_{i′≠i} κ⁺_{i′j}(r_{i′j})

Dual-tree Algorithms in Statistics – p.74/77
slide-133
SLIDE 133

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

For similarity matrix S (pref along diag), update R and A

r_ij ← s_ij − max_{j′≠j} (a_{ij′} + s_{ij′})

a_ij ← κ⁻_ij Σ_{i′≠i} κ⁺_{i′j}(r_{i′j})

Damping of R and A helps convergence.

Dual-tree Algorithms in Statistics – p.74/77
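Assuming the κ operators are the clamps of the standard Frey-Dueck formulation (κ⁺ = max(0, ·) and κ⁻ = min(0, ·), applied only to off-diagonal entries), the two updates above can be sketched naively in O(N²) per iteration:

```python
def update_responsibilities(S, A):
    """r_ij <- s_ij - max_{j' != j} (a_ij' + s_ij')."""
    N = len(S)
    R = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            R[i][j] = S[i][j] - max(A[i][jp] + S[i][jp]
                                    for jp in range(N) if jp != j)
    return R

def update_availabilities(R):
    """a_ij <- kappa-_ij sum_{i' != i} kappa+_{i'j}(r_{i'j}).

    The clamps act off the diagonal only: the self-responsibility r_jj
    enters unclamped, and self-availabilities a_jj are not clamped by
    kappa- (the standard Frey-Dueck rules).
    """
    N = len(R)
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = sum(R[ip][j] if ip == j else max(0.0, R[ip][j])
                        for ip in range(N) if ip != i)
            A[i][j] = total if i == j else min(0.0, total)
    return A
```

These dense loops are exactly what the dual-tree rearrangement later in the talk accelerates; they serve as the correctness baseline.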

slide-134
SLIDE 134

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Naïvely O(N²); can be improved if S is made sparse, but this introduces uncontrolled error.

Dual-tree Algorithms in Statistics – p.75/77

slide-135
SLIDE 135

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Naïvely O(N²); can be improved if S is made sparse, but this introduces uncontrolled error.
Alternately, if there is no damping, we can rearrange into GNPs

α_i ← argmax2_j (κ⁺_ij(κ⁺_ij(s_ij + α_i[j]) − ρ_j) − s_ij)

ρ_j ← Σ_i κ⁺_ij(s_ij + α_i[j])

Dual-tree Algorithms in Statistics – p.75/77

slide-136
SLIDE 136

Case Study: Affinity Propagation

(Planned) Riegel et al., NIPS 2008 or JMLR 2008

Naïvely O(N²); can be improved if S is made sparse, but this introduces uncontrolled error.
Alternately, if there is no damping, we can rearrange into GNPs

α_i ← argmax2_j (κ⁺_ij(κ⁺_ij(s_ij + α_i[j]) − ρ_j) − s_ij)

ρ_j ← Σ_i κ⁺_ij(s_ij + α_i[j])

Can pull other tricks to get things to converge.

Dual-tree Algorithms in Statistics – p.75/77

slide-137
SLIDE 137

Case Study: Affinity Propagation

[Figure: Affinity Propagation Runtime — mean time per iteration (sec, log scale) vs. number of points (10³ to 10⁶), comparing Frey-Dueck (extrapolated) with Dual-Tree.]

Dual-tree Algorithms in Statistics – p.76/77

slide-138
SLIDE 138

fin.

Dual-tree Algorithms in Statistics – p.77/77