Scalable Kernel Density Classification via Threshold-Based Pruning
Edward Gan & Peter Bailis
MacroBase: Analytics on Fast Streams
Increasing Streaming Data: Manufacturing, Sensors, Mobile
Multi-dimensional + Latent anomalies
“Fuel Flow” “Flight Speed” [UCI Repository]

Speed  Flow  Status
28     27    Fpv Close
34     43    High
52     30    Rad Flow
28     40    Rad Flow
…      …
Data Histogram Gaussian Model
Data Histogram Mixture of Gaussians
Data Histogram Kernel Density Estimate
Galaxy Mass Distribution
[Sloan Digital Sky Survey]
Distribution of Bowhead Whales
[L.T. Quackenbush et al., Arctic 2010]
Training Data Kernels Final Estimate
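The three labels above (training data, one kernel per training point, and the summed final estimate) can be sketched in a few lines. This is a minimal one-dimensional illustration, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for the example.

```python
import math

def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the (assumed) kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x: float, training: list[float], bandwidth: float) -> float:
    """Kernel density estimate at x: the average of one scaled kernel per training point."""
    n = len(training)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in training)
    return total / (n * bandwidth)
```

Each query touches every training point, so evaluating the estimate at many queries costs time proportional to (number of queries) x (number of training points).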
x
Training Data
[Wand, J. of Computational and Graphical Statistics 1994]
Kernel Density Estimation, Threshold Filter
$\begin{cases} \text{High} & \text{if } f(x) \ge t \\ \text{Low} & \text{if } f(x) < t \end{cases}$
Training Data Densities Classification
Training Data Classification
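The classification step can be sketched as a density estimate followed by a comparison against a threshold. This is a naive baseline that computes every density exactly; the one-dimensional Gaussian kernel, the names `kde` and `classify`, and the parameters `bandwidth` and `t` are assumptions for illustration only.

```python
import math

def kde(x: float, training: list[float], bandwidth: float) -> float:
    """Exact Gaussian kernel density estimate at x (1-D, assumed kernel)."""
    n = len(training)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in training)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))

def classify(queries: list[float], training: list[float],
             bandwidth: float, t: float) -> list[str]:
    """Label each query High when its density is at least t, otherwise Low."""
    return ["High" if kde(q, training, bandwidth) >= t else "Low" for q in queries]
```

Only the label is returned, so the exact density value is more than the classifier needs; any bound that settles the comparison with `t` would give the same answer.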
16
17
18
Upper Bound, Lower Bound, True Density f(x)
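One way to see why bounds are enough: once the lower bound reaches the threshold, or the upper bound falls below it, the label is decided without knowing the true density. The sketch below uses a deliberately simple bound (unseen points contribute between 0 and the kernel's peak value) rather than the tree-based bounds from the talk; the function name, the 1-D Gaussian kernel, and the parameters are assumptions.

```python
import math

def classify_with_pruning(x: float, training: list[float],
                          bandwidth: float, t: float) -> tuple[str, int]:
    """Return (label, points_examined), stopping as soon as the bounds decide."""
    n = len(training)
    if n == 0:
        raise ValueError("training data must be non-empty")
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)  # shared normalization
    k_max = 1.0  # the unnormalized Gaussian kernel peaks at exp(0) = 1
    partial = 0.0
    for seen, xi in enumerate(training, start=1):
        partial += math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        lower = partial / norm                          # unseen points add >= 0
        upper = (partial + (n - seen) * k_max) / norm   # unseen points add <= k_max each
        if lower >= t:
            return "High", seen
        if upper < t:
            return "Low", seen
    # After the last point lower == upper, so this line is a fallback only.
    return ("High" if partial / norm >= t else "Low"), n
```

With 20 training points, half at 0.0 and half at 5.0, a bandwidth of 1.0, and t = 0.05, a query at 0.0 is labeled High after examining a few points. These bounds ignore where the unseen points lie, so they tighten slowly for Low queries; spatial bounds over groups of points are much tighter.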
[Gray & Moore, ICDM 2003]
Total Contribution, Maximum Contribution, Minimum Contribution, f(x)
k-d tree: root node, split, split, split; f(x)
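A k-d tree node covers a bounding box holding some number of training points, and the box alone bounds what those points can contribute to the density at a query. The sketch below computes such bounds for one node; it assumes an unnormalized Gaussian kernel of Euclidean distance, and the function names and the `count` and `bandwidth` parameters are hypothetical, not taken from the tKDC code.

```python
import math

def kernel(dist: float, bandwidth: float) -> float:
    """Unnormalized Gaussian kernel as a function of Euclidean distance."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)

def box_distance_bounds(x, lo, hi):
    """Smallest and largest Euclidean distance from point x to the box [lo, hi]."""
    d_min_sq = 0.0
    d_max_sq = 0.0
    for xi, l, h in zip(x, lo, hi):
        nearest = max(l - xi, 0.0, xi - h)        # 0 when xi lies inside [l, h]
        farthest = max(abs(xi - l), abs(xi - h))  # distance to the farther face
        d_min_sq += nearest ** 2
        d_max_sq += farthest ** 2
    return math.sqrt(d_min_sq), math.sqrt(d_max_sq)

def node_contribution_bounds(x, lo, hi, count, bandwidth):
    """(minimum, maximum) total kernel contribution of `count` points inside the box."""
    d_min, d_max = box_distance_bounds(x, lo, hi)
    # The kernel decreases with distance: farthest gives the minimum, nearest the maximum.
    return count * kernel(d_max, bandwidth), count * kernel(d_min, bandwidth)
```

Splitting a node into its children shrinks the boxes and so narrows the gap between the minimum and maximum contribution; a traversal can stop splitting once the summed bounds fall on one side of the threshold.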
Kernel Classification Better Threshold Threshold Estimate
[Figure: plot labels "100M"; remaining labels not recoverable]
[Figure: comparisons against radial and kdtree; annotated speedups 5000x and 1000x]
Asymptotic Speedup
Training Data KDE Model Classification
https://github.com/stanford-futuredata/tKDC