Hierarchical Decompositions of Kernel Matrices
Bill March UT Austin
Dec. 12, 2015
Joint work with Bo Xiao, Chenhan Yu, Sameer Tharakan, and George Biros
On the job market!
Kernel Matrix Approximation

Inputs:
- points x_i ∈ R^d, i = 1, ..., N
- kernel function K : R^d × R^d → R
- weights w ∈ R^N

Output: u = Kw, where K_ij = K(x_i, x_j)

Exact evaluation: O(N²). Fast approximations: O(N log N) or O(N), but hard for d > 3.
Sampling / Nystrom methods: ≈ O(N r²) work, but assume the matrix is low rank rather than full rank.
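To make the baseline concrete, here is a minimal NumPy sketch of the exact O(N²) evaluation u = Kw. The Gaussian kernel and the bandwidth h are my choices for illustration, not fixed by the slides.

```python
import numpy as np

def gaussian_kernel(X, Y, h=1.0):
    """Pairwise Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2))

rng = np.random.default_rng(0)
N, d = 2000, 8                       # small enough to form K densely
X = rng.standard_normal((N, d))
w = rng.standard_normal(N)

K = gaussian_kernel(X, X)            # O(N^2) storage and work
u = K @ w                            # exact evaluation u = K w
```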
Bayes Classifier with Gaussian KDE (h = kernel bandwidth):

         COVTYPE         SUSY            MNIST2M
         h       ε_c     h       ε_c    h      ε_c
         0.35    71.6    0.50    65.7   4      95.0
         0.22    74.0    0.15    72.1   2      97.4
         0.14    79.8    0.09    75.0   1      100
         0.02    95.4    0.05    76.7   0.1    99.5
         0.001    6.4    0.01    64.3   0.05   13.6
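The table shows how strongly this application depends on the bandwidth h. As a rough illustration of the classifier itself (function names and the reduction below are mine, not from the talk): with priors π_c = N_c/N, the Bayes rule argmax_c π_c f_c(x) with per-class Gaussian KDEs f_c reduces to comparing per-class kernel sums, i.e. one kernel matvec per class with unit weights, which is exactly the operation the rest of the talk accelerates.

```python
import numpy as np

def summed_kernel(Xtrain, Xtest, h):
    """Sum of Gaussian kernel values from each test point to all train points.
    With prior pi_c = N_c/N, argmax_c pi_c * f_c(x) reduces to comparing
    these per-class sums (all normalizations cancel)."""
    sq = (np.sum(Xtest**2, axis=1)[:, None]
          + np.sum(Xtrain**2, axis=1)[None, :] - 2.0 * Xtest @ Xtrain.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2)).sum(axis=1)

def kde_bayes_classify(Xtr, ytr, Xte, h):
    """Predict the class with the largest summed kernel value."""
    classes = np.unique(ytr)
    scores = np.stack([summed_kernel(Xtr[ytr == c], Xte, h) for c in classes],
                      axis=1)
    return classes[np.argmax(scores, axis=1)]
```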
[Figure: K ≈ block matrix; near-field blocks exact, far-field blocks approximated as low rank.]
- How do we know how to partition the matrix?
- How do we approximate the low-rank blocks?
Existing fast methods:
- Treecodes / FMM: high accuracy, but kernel specific and limited to d = 3.
- Sampling / Nystrom methods: easily parallelized, but require the entire matrix to be low rank (see the sketch after this list).
- H²-matrices for kernels.

Our approach (ASKIT):
- scales with N and d
- exploits local structure
- low-rank approximations driven by nearest neighbors
- open source library LIBASKIT
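For contrast, a minimal Nystrom sketch (uniform landmark sampling; function names are mine): K ≈ C W⁺ Cᵀ built from r sampled columns. Setup is O(N r² + r³) and each apply is O(N r), but accuracy collapses when K is not globally low rank, e.g. at the small bandwidths in the KDE table above.

```python
import numpy as np

def gaussian(X, Y, h=1.0):
    """Gaussian kernel block K(X, Y)."""
    sq = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2))

def nystrom_matvec(X, w, r, rng):
    """Nystrom sketch: K ~= C W^+ C^T with r uniformly sampled landmarks.
    Only accurate when the whole matrix is (numerically) low rank."""
    idx = rng.choice(len(X), size=r, replace=False)   # landmark points
    C = gaussian(X, X[idx])                           # N x r
    W = C[idx, :]                                     # r x r
    y, *_ = np.linalg.lstsq(W, C.T @ w, rcond=None)   # y = W^+ (C^T w)
    return C @ y
```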
The same two questions, revisited:
- How do we know how to partition the matrix?
- How do we approximate the low-rank blocks?
Outgoing representation: the interaction of a tree node (m points) with all other N − m points.
Skeletonization: an interpolative decomposition (ID) of the (N − m) × m off-diagonal block selects s skeleton points that represent the whole node.
But this requires O(N m²) work!
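SciPy exposes an interpolative decomposition, so the exact (expensive) skeletonization step can be sketched directly. Treating the ID of the full off-diagonal block as the node's compression is the idea; the surrounding bookkeeping is simplified here.

```python
import numpy as np
import scipy.linalg.interpolative as sli

def skeletonize_exact(K_off, s):
    """Column ID of the off-diagonal block K_off = K(rest, node),
    shape (N - m) x m:  K_off ~= K_off[:, skel] @ P.
    The s retained columns are the node's skeleton points.
    Touching all N - m rows is what costs O(N m^2) per node."""
    idx, proj = sli.interp_decomp(np.asarray(K_off, dtype=np.float64), s,
                                  rand=False)
    skel = idx[:s]                                 # skeleton column indices
    P = sli.reconstruct_interp_matrix(idx, proj)   # s x m interpolation matrix
    return skel, P

# Fast apply for this block:  K_off @ w  ~=  K_off[:, skel] @ (P @ w)
```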
Instead, approximate the ID factor from a sample of the rows, using nearest neighbors to capture the important rows.
[Figure: a sampled set of rows of the (N − m) × m block suffices to pick the s skeleton points.]
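A hedged sketch of the sampled variant: the exact sampling rule (how many uniform rows, how neighbor lists are pooled) is my simplification, but the structure (neighbor rows plus a few uniform rows, then an ID of the small block) follows the slide.

```python
import numpy as np
import scipy.linalg.interpolative as sli

def skeletonize_sampled(X, node_ids, nn_ids, s, kernel, rng, n_uniform=64):
    """Approximate the ID of K(rest, node) from a few sampled rows:
    nearest-neighbor rows (the important ones) plus uniform rows.
    Cost per node drops from O(N m^2) to O(len(rows) m^2)."""
    rest = np.setdiff1d(np.arange(len(X)), node_ids)
    nn_rows = np.intersect1d(nn_ids, rest)             # neighbors outside node
    uniform = rng.choice(rest, size=n_uniform, replace=False)
    rows = np.union1d(nn_rows, uniform)
    K_small = kernel(X[rows], X[node_ids])             # len(rows) x m, not N x m
    idx, proj = sli.interp_decomp(K_small, s, rand=False)
    return node_ids[idx[:s]], sli.reconstruct_interp_matrix(idx, proj)
```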
Which nodes can we replace by the approximation? Use nearest-neighbor information: any node containing a nearest neighbor of the target must be evaluated exactly.
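A sketch of that pruning rule inside a treecode traversal. The Node class and the skeleton_potential callable are minimal stand-ins I introduce for illustration; only the pruning test itself comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:                  # minimal stand-in for a space-partitioning tree node
    points: List[int]                              # point indices in this subtree
    children: List["Node"] = field(default_factory=list)
    skeleton_potential: Optional[Callable] = None  # compressed far-field apply

    @property
    def is_leaf(self) -> bool:
        return not self.children

def evaluate(i, node, nn_of_i, w, kernel, X):
    """Traversal sketch: a node may be approximated for target i only if it
    contains none of i's nearest neighbors; otherwise descend, and evaluate
    leaves exactly."""
    if not nn_of_i.intersection(node.points):      # no neighbor inside: prune
        return node.skeleton_potential(i, w)
    if node.is_leaf:                               # neighbor inside a leaf: exact
        return sum(kernel(X[i], X[j]) * w[j] for j in node.points)
    return sum(evaluate(i, c, nn_of_i, w, kernel, X) for c in node.children)
```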
Upward pass: skeletonize each node with randomized linear algebra.
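The upward pass can be sketched as a post-order traversal in which an interior node skeletonizes only the union of its children's skeletons, which is what keeps per-node cost bounded. Here skeletonize_fn (e.g. a randomized ID, per the slide) is an assumed callable, and nodes are duck-typed as in the sketch above.

```python
def upward_pass(node, skeletonize_fn):
    """Post-order (upward) pass: a leaf skeletonizes its own points; an
    interior node skeletonizes only its children's merged skeletons.
    `skeletonize_fn(point_ids)` is assumed to return
    (skeleton_ids, interpolation_matrix)."""
    if node.is_leaf:
        candidates = list(node.points)
    else:
        candidates = []
        for child in node.children:
            upward_pass(child, skeletonize_fn)
            candidates += list(child.skeleton)   # merge child skeletons
    node.skeleton, node.proj = skeletonize_fn(candidates)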
For the FMM variant: build and merge node-to-node interaction lists.
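One plausible shape for the list construction is a dual-tree traversal. The separated predicate is a placeholder for the talk's neighbor-based admissibility test, and the node interface (.is_leaf, .children, .size) is assumed rather than taken from LIBASKIT.

```python
def build_interaction_lists(src, trg, near, far, separated):
    """Dual-tree sketch: collect node-to-node pairs.  `separated(src, trg)`
    stands in for the admissibility test (e.g., the nodes share no
    nearest-neighbor points)."""
    if separated(src, trg):
        far.append((trg, src))        # skeleton-to-skeleton interaction
    elif src.is_leaf and trg.is_leaf:
        near.append((trg, src))       # exact block
    elif src.is_leaf or (not trg.is_leaf and trg.size >= src.size):
        for child in trg.children:    # split the larger (or non-leaf) node
            build_interaction_lists(src, child, near, far, separated)
    else:
        for child in src.children:
            build_interaction_lists(child, trg, near, far, separated)
```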
Complexity (factorization and evaluation) with p processes:
- Factorization: O(s² N/p + s³ log p)
- Evaluation: O((N/p) κ s log(N/s))
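As a purely illustrative check of these bounds (constants omitted, κ arbitrarily set to 1, parameters borrowed from the HIGGS run below):

```python
import math

# Illustrative arithmetic only: evaluate the two asymptotic terms for
# N ~ 10.5e6 and s = 2048, with kappa set to 1 as a placeholder.
N, s, kappa = 10.5e6, 2048.0, 1.0
for p in (512, 2048, 4096, 8192, 16384):
    fact = s * s * N / p + s**3 * math.log2(p)       # factorization term
    evl = (N / p) * kappa * s * math.log2(N / s)     # evaluation term
    print(f"p = {p:6d}   fact ~ {fact:.2e}   eval ~ {evl:.2e}")
```

Note that the s³ log p factorization term stops shrinking as p grows, consistent with the declining parallel efficiency in the strong-scaling table below.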
Relative errors and fraction of kernel evaluations:

Data      N       d     ε_2     %K
Uniform   1M      64    5E-3    1.6%
Covtype   500K    54    8E-2    2.7%
SUSY      4.5M    18    5E-3    0.4%
HIGGS     10.5M   28    1E-1    11%
BRAIN     10.5M   246   5E-3    0.9%
Strong scaling on HIGGS data (11M points, 28d), k = 1024, s = 2048; estimated L2 error = 0.09:

#cores   512     2,048   4,096   8,192   16,384
Fact.    2,297   778     544     438     363
Eval.    157     67      42      28      23
Eff.     1.00    0.72    0.50    0.32    0.20
#cores   Mat-vec   Fact.   Inv.   Total   Eff.
1,024    1.5       95.1    0.9    96      1.00
2,048    0.8       51.4    0.5    52      0.92
4,096    0.4       29.0    0.3    30      0.80

Normal data, 16M points, 64d (6 intrinsic), k = 128, s = 256; inverse error = 4E-6.
Approximates (λI + K)⁻¹ directly, even in high dimension; also useful as a preconditioner.
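A sketch of the preconditioner use with SciPy's conjugate gradient solver: K_apply and approx_inverse stand in for the fast kernel matvec and the approximate inverse apply, which this sketch assumes rather than constructs.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_regularized(K_apply, approx_inverse, b, lam):
    """Sketch: solve (lam I + K) x = b with CG, using an approximate
    inverse (e.g. from the hierarchical factorization) as preconditioner.
    `K_apply` and `approx_inverse` are assumed fast matvec callables."""
    N = b.shape[0]
    A = LinearOperator((N, N), matvec=lambda x: lam * x + K_apply(x))
    M = LinearOperator((N, N), matvec=approx_inverse)
    x, info = cg(A, b, M=M)
    return x, info   # info == 0 on convergence
```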
For code and papers:
www.ices.utexas.edu/~march
padas.ices.utexas.edu/libaskit