

SLIDE 1

Hierarchical Decompositions of Kernel Matrices

Bill March, UT Austin (on the job market!)
Dec. 12, 2015

Joint work with Bo Xiao, Chenhan Yu, Sameer Tharakan, and George Biros

SLIDE 2

Kernel Matrix Approximation

Inputs:

  • points x_i ∈ R^d, i = 1, . . . , N
  • kernel function K : R^d × R^d → R
  • weights w ∈ R^N

Kernel matrix: K_ij = K(x_i, x_j)

Output: u = Kw

Exact evaluation: O(N^2). Fast approximations: O(N log N) or O(N), with the hard case being d > 3.
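A minimal baseline sketch of the exact O(N^2) evaluation in NumPy. The slide leaves K generic, so the Gaussian kernel and bandwidth h below are illustrative assumptions:

    # Exact u = Kw with a Gaussian kernel (assumption for illustration).
    import numpy as np

    def exact_matvec(X, w, h=1.0):
        """u = K w with K_ij = exp(-||x_i - x_j||^2 / (2 h^2))."""
        sq = np.sum(X**2, axis=1)
        # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
        D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        K = np.exp(-D2 / (2.0 * h**2))   # dense N x N: the cost ASKIT avoids
        return K @ w

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 8))   # N = 2000 points in d = 8
    u = exact_matvec(X, rng.standard_normal(2000), h=0.5)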

SLIDE 3

Low Rank Approximation

(Figure: K ≈ a low-rank factorization vs. a full-rank matrix.)

  • ≈ O(Nr^2) work with sampling / Nyström methods
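A hedged sketch of the Nyström idea the bullet refers to: sample r landmark points, invert the small r × r block, and apply the rank-r factorization. Function names and the Gaussian kernel are assumptions:

    # Nystrom sketch: K ≈ C W⁺ Cᵀ with C = K[:, S], W = K[S, S]
    # for r sampled landmark indices S. Kernel choice is an assumption.
    import numpy as np

    def gauss(A, B, h=0.5):
        D2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-D2 / (2 * h * h))

    def nystrom_matvec(X, w, r=100, h=0.5, seed=0):
        S = np.random.default_rng(seed).choice(len(X), size=r, replace=False)
        C = gauss(X, X[S], h)                      # N x r landmark columns
        W_pinv = np.linalg.pinv(gauss(X[S], X[S], h))
        return C @ (W_pinv @ (C.T @ w))            # O(Nr) apply after O(Nr^2) setup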

Bayes classifier with Gaussian KDE: bandwidth h vs. classification accuracy εc (%):

  COVTYPE           SUSY              MNIST2M
  h      εc         h      εc         h      εc
  0.35   71.6       0.50   65.7       4      95.0
  0.22   74.0       0.15   72.1       2      97.4
  0.14   79.8       0.09   75.0       1      100
  0.02   95.4       0.05   76.7       0.1    99.5
  0.001   6.4       0.01   64.3       0.05   13.6

SLIDE 4

Hierarchical Approximations

(Figure: matrix partitioned into exact and approximated blocks.)

  • How do we know how to partition the matrix?
  • How do we approximate the low-rank blocks?
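A toy answer to both questions, offered only as a hedged illustration: split the index set in half recursively, keep diagonal blocks exact, and compress each off-diagonal block with a truncated SVD. ASKIT's actual answers (trees plus nearest neighbors, and skeletonization) come on the following slides; here we also form K densely, which a real method never does:

    # Toy hierarchical (HODLR-style) matvec: exact diagonal blocks,
    # rank-r SVD compression of off-diagonal blocks. Illustration only;
    # building K and its SVDs densely is itself O(N^2)-or-worse work.
    import numpy as np

    def hmatvec(K, w, r=8, leaf=64):
        n = K.shape[0]
        if n <= leaf:
            return K @ w                           # small diagonal block: exact
        m = n // 2
        u = np.concatenate([hmatvec(K[:m, :m], w[:m], r, leaf),
                            hmatvec(K[m:, m:], w[m:], r, leaf)])
        for rows, cols in ((slice(0, m), slice(m, n)), (slice(m, n), slice(0, m))):
            U, s, Vt = np.linalg.svd(K[rows, cols], full_matrices=False)
            u[rows] += U[:, :r] @ (s[:r] * (Vt[:r] @ w[cols]))  # rank-r apply
        return u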

SLIDE 5

Related Work

  • Nyström methods [Williams & Seeger, '01; Drineas & Mahoney, '05]: scalable, can be parallelized, require the entire matrix to be low rank
  • FMMs [Greengard, '85; Lashuk et al., '12]: N > 10^12, high accuracy, kernel-specific, d = 3
  • FGTs [Griebel et al., '12]: 200K points, synthetic 20D, real 6D, low-order accuracy, sequential
  • Other hierarchical kernel matrix factorizations & applications:
  • [Kondor et al., '14]: wavelet basis
  • [Si et al., '14]: block Nyström factoring
  • [Zhong et al., '12]: collaborative filtering
  • [Ambikasaran & O'Neill, '15]: Gaussian processes
  • [Ballani & Kressner, '14]: QUIC, sparse covariance inverses
  • [Börm & Garcke, '07]: H^2 matrices for kernels
  • [Wang et al., '15]: block basis factorization
  • [Gray & Moore, '00]: general kernel summation treecode
  • [Lee et al., '12]: kernel-independent, parallel treecode, works in modestly high dimensions
SLIDE 6

ASKIT — Approximate Skeletonization Kernel-Independent Treecode

  • ASKIT is a kernel-independent algorithm that scales with N and d
  • Uses nearest-neighbor information to capture local structure
  • Randomized linear algebra to compute approximations
  • Scalable, parallel implementation and open-source library LIBASKIT

SLIDE 7

Hierarchical Approximations

(Recap. Figure: matrix partitioned into exact and approximated blocks.)

  • How do we know how to partition the matrix?
  • How do we approximate the low-rank blocks?

SLIDES 8–15: figure-only slides (no text extracted).
SLIDE 16

Keys to ASKIT: Skeletonization

  • Approximate the interaction of a node with all other points
  • Use a basis of s columns chosen from the node's m points (the skeleton), with s ≪ m ≪ N
  • But this requires O(Nm^2) work!
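A minimal sketch of skeletonization via a column-pivoted QR (an interpolative decomposition); the helper name and the use of SciPy's pivoted QR are assumptions, not LIBASKIT's internals. With G the (N − m) × m off-diagonal block of the node, the QR is the O(Nm^2) step the slide mentions:

    # Choose s skeleton columns of G with a column-pivoted QR so that
    # G ≈ G[:, skel] @ P. Assumes rank(G) >= s so R[:s, :s] is invertible.
    import numpy as np
    from scipy.linalg import qr

    def skeletonize(G, s):
        Q, R, piv = qr(G, mode='economic', pivoting=True)  # G[:, piv] = Q R
        T = np.linalg.solve(R[:s, :s], R[:s, s:])          # interpolation coeffs
        P = np.zeros((s, G.shape[1]))
        P[:, piv[:s]] = np.eye(s)
        P[:, piv[s:]] = T
        return piv[:s], P     # skeleton indices and G ≈ G[:, piv[:s]] @ P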

SLIDE 17

Keys to ASKIT: Randomized Factorization

  • Subsample 𝓂 rows and factor
  • Construct a sampling distribution using nearest neighbors to capture important rows
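A hedged sketch of the row-sampling step: keep the rows indexed by the node's nearest neighbors with certainty, fill out the 𝓂-row sample uniformly, and skeletonize only that small submatrix (cf. the skeletonize sketch above). The sampling scheme here is a simplified stand-in for ASKIT's importance-sampling distribution:

    # Nearest-neighbor rows are always kept; the rest of the sample is
    # uniform over the remaining rows. Simplified assumption, not ASKIT's
    # exact distribution.
    import numpy as np

    def sample_rows(n_rows, nn_rows, n_extra, seed=0):
        rng = np.random.default_rng(seed)
        pool = np.setdiff1d(np.arange(n_rows), nn_rows)  # rows not already kept
        extra = rng.choice(pool, size=n_extra, replace=False)
        return np.concatenate([np.asarray(nn_rows), extra])

    # Usage: rows = sample_rows(G.shape[0], nn_rows, 64)
    #        skel, P = skeletonize(G[rows], s)   # factor the small sample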

SLIDE 18

Keys to ASKIT: Combinatorial Pruning Rule

  • When can we safely use the approximation?
  • Use nearest-neighbor information: any node containing a nearest neighbor of the target must be evaluated exactly
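A one-line version of the test, as a hedged sketch with assumed data structures: a target may use a node's approximation only if the node contains none of the target's nearest neighbors.

    # node_points and target_nns are assumed to be Python sets of point indices.
    def can_prune(node_points, target_nns):
        """True iff the node holds none of the target's nearest neighbors,
        so its skeleton approximation may be used; otherwise evaluate exactly."""
        return node_points.isdisjoint(target_nns)

    # Example: can_prune({4, 7, 9}, {1, 2, 7}) -> False: evaluate this node exactly.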

SLIDES 19–20: animation steps of the pruning rule (text identical to Slide 18).

SLIDE 21

Overall ASKIT Algorithm

  • Inputs: coordinates, nearest-neighbor (NN) info
  • Construct a space-partitioning tree
  • Compute approximate factorizations using randomized linear algebra (upward pass)
  • Construct interaction lists using neighbor information, and merge lists for FMM node-to-node lists
  • Evaluate approximate potentials (downward pass)

A toy end-to-end sketch of the evaluate phase follows.
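This is a runnable one-level toy, offered only as a hedged sketch: split the points into two "nodes", evaluate within-node (near-field) interactions exactly, and compress each cross-node block with skeleton columns from a pivoted QR, as in the skeletonization sketch above. A real ASKIT run recurses over a deep tree, samples rows, and prunes with neighbor lists; here we even form the blocks densely, which forfeits the complexity gain but shows the algebra:

    # One-level toy of the ASKIT passes (Gaussian kernel and all names
    # are assumptions). Near field exact, far field via skeleton columns.
    import numpy as np
    from scipy.linalg import qr

    def gauss(A, B, h=0.5):
        D2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-D2 / (2 * h * h))

    def one_level_matvec(X, w, s=32, h=0.5):
        n = len(X); m = n // 2
        halves = (slice(0, m), slice(m, n))
        u = np.zeros(n)
        for t in halves:                          # near field: exact
            u[t] += gauss(X[t], X[t], h) @ w[t]
        for t, src in ((halves[0], halves[1]), (halves[1], halves[0])):
            G = gauss(X[t], X[src], h)            # far-field block
            _, R, piv = qr(G, mode='economic', pivoting=True)
            T = np.linalg.solve(R[:s, :s], R[:s, s:])
            w_src = w[src]
            w_eff = w_src[piv[:s]] + T @ w_src[piv[s:]]  # merge weights onto skeleton
            u[t] += G[:, piv[:s]] @ w_eff
        return u

    # Usage: X = np.random.default_rng(1).standard_normal((4000, 6))
    #        u = one_level_matvec(X, np.ones(4000))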
SLIDE 22

Theoretical Bounds

  • Error: bounded by C log(N) σ_{s+1}(G), where σ_{s+1}(G) is the (s+1)-st singular value of the off-diagonal block G
  • Complexity (p processes, skeleton size s, κ nearest neighbors):
      Storage: O(s^2 N/p + s^3 log p)
      Factorization: O((κ + s^2) Nd/p)
      Evaluation: O((N/p) κ s log(N/s))

SLIDE 23

Accuracy and Work

  Data      N       d     ε2     %K
  Uniform   1M      64    5E-3   1.6%
  Covtype   500K    54    8E-2   2.7%
  SUSY      4.5M    18    5E-3   0.4%
  HIGGS     10.5M   28    1E-1   11%
  BRAIN     10.5M   246   5E-3   0.9%

Relative errors (ε2) and fraction of kernel evaluations (%K).

SLIDE 24

Strong Scaling

  #cores   512     2,048   4,096   8,192   16,384
  Fact.    2,297   778     544     438     363
  Eval.    157     67      42      28      23
  Eff.     1.00    0.72    0.50    0.32    0.20

HIGGS data, 11M points, 28d; k = 1024, s = 2048; estimated L2 error = 0.09.

SLIDE 25

INV-ASKIT

  #cores   Mat-vec   Fact.   Inv.   Total   Eff.
  1,024    1.5       95.1    0.9    96      1.00
  2,048    0.8       51.4    0.5    52      0.92
  4,096    0.4       29.0    0.3    30      0.80

Normal data, 16M points, 64d (6 intrinsic); k = 128, s = 256; inverse error = 4E-6.

Approximates (λI + K)^{-1}.
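A hedged sketch of how such an approximate inverse gets used: directly as a solver for (λI + K)w = u, or, as below, as a preconditioner for CG on the exact operator. Everything here is a dense stand-in with assumed names; INV-ASKIT replaces both operators with fast hierarchical factorizations:

    # A deliberately perturbed factorization of A = lam*I + K plays the
    # role of the approximate inverse, preconditioning CG on the exact A.
    import numpy as np
    from scipy.linalg import lu_factor, lu_solve
    from scipy.sparse.linalg import cg, LinearOperator

    rng = np.random.default_rng(0)
    N, lam, h = 500, 1e-2, 0.5
    X = rng.standard_normal((N, 3))
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = lam * np.eye(N) + np.exp(-D2 / (2 * h * h))   # A = lam*I + K, SPD

    lu = lu_factor(A + 1e-3 * np.eye(N))              # "approximate inverse"
    M = LinearOperator((N, N), matvec=lambda v: lu_solve(lu, v))
    u = rng.standard_normal(N)
    w, info = cg(A, u, M=M)                           # preconditioned CG solve
    assert info == 0                                  # converged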

SLIDE 26

Summary

  • ASKIT is a kernel-independent FMM that scales with dimension
  • Efficient and scalable, but requires geometric information
  • INV-ASKIT can efficiently compute approximate inverses, also useful as a preconditioner
  • Open-source, parallel library available: LIBASKIT

For code and papers: www.ices.utexas.edu/~march and padas.ices.utexas.edu/libaskit