SLIDE 1

Block-Quantized Kernel Matrix for Fast Spectral Embedding

Kai Zhang and James T. Kwok
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology, Hong Kong

SLIDE 2

Outline

1. Introduction: Eigendecomposition of Kernel Matrix; Scale-Up Methods
2. The Proposed Method: Gram Matrix of Special Forms; Basic Idea; Matrix Approximation; Matrix Quantization; Density-Weighted Nyström Extension
3. Experiments: Kernel Principal Component Analysis; Image Segmentation
4. Conclusion


SLIDE 4


Eigen-decomposition of Kernel Matrix

When do we need to eigen-decompose the kernel matrix?

- Kernel Principal Component Analysis: a powerful tool to extract nonlinear structure in the high-dimensional feature space (Schölkopf et al., 1998).
- Spectral Clustering: a global, pairwise clustering method based on graph-partitioning theories (Shi & Malik, 2000).
- Manifold Learning and Dimensionality Reduction: Laplacian Eigenmap, ISOMAP, Locally Linear Embedding, ...

SLIDE 5


Scale-Up Methods

Low-rank approximation of the form $L = GG'$, where $L \in \mathbb{R}^{N \times N}$, $G \in \mathbb{R}^{N \times m}$, and $m \ll N$ is the rank:

- Incomplete Cholesky decomposition (Bach & Jordan, 2002; Fine & Scheinberg, 2001)
- Sparse greedy kernel methods (Smola & Bartlett, 2000)

Sampling-based methods:

- Nyström: randomly selects columns of the kernel matrix (Williams & Seeger, 2001; Lawrence & Herbrich, 2005)
- Drineas & Mahoney (2005): chooses the columns based on a data-dependent probability
- Ouimet and Bengio (2005): uses a greedy sampling scheme based on the feature-space geometry

SLIDE 6

Outline (recap; the talk now turns to Section 2, The Proposed Method)

SLIDE 7


Block Quantized Matrices

$$W = \begin{pmatrix} a & a & b & b & b \\ a & a & b & b & b \\ c & c & d & d & d \\ c & c & d & d & d \\ c & c & d & d & d \end{pmatrix}$$

Definition

1. The block-quantized matrix $W$ contains $m^2$ constant blocks.
2. The block at the $i$th row and $j$th column, $C_{ij}$, has dimension $n_i \times n_j$, with entry value $\beta_{ij}$.
3. E.g., $n_1 = 2$, $n_2 = 3$, $\beta_{11} = a$, $\beta_{12} = b$, $\beta_{21} = c$, $\beta_{22} = d$.

Note: block quantization can be performed by (a small sketch follows this list):

1. partitioning the data set into $m$ clusters;
2. setting $\beta_{ij} = K(t_i, t_j)$ $(i, j = 1, 2, \dots, m)$, where $t_i$ is the representative of the $i$th cluster.
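As a concrete illustration, here is a minimal NumPy sketch of this construction (the Gaussian kernel choice, the `labels`/`reps` bookkeeping, and all function names are assumptions for illustration, not the paper's code). Note that in the actual method the $N \times N$ matrix is never formed explicitly; only the $m \times m$ block values are needed:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Pairwise Gaussian kernel K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def block_quantized_kernel(labels, reps, sigma):
    """Block-quantized approximation with beta_ij = K(t_i, t_j).

    labels : (N,) cluster index of each sample, in {0, ..., m-1}
    reps   : (m, d) cluster representatives t_i
    Returns the dense N x N block-quantized matrix, for illustration only.
    """
    beta = gaussian_kernel(reps, reps, sigma)  # m x m block values
    return beta[np.ix_(labels, labels)]        # expand each block to n_i x n_j
```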

SLIDE 8


Properties of Block Quantized Matrices

Eigensystem of $W$: $W\phi = \lambda\phi$, i.e.,

$$\begin{pmatrix} a & a & b & b & b \\ a & a & b & b & b \\ c & c & d & d & d \\ c & c & d & d & d \\ c & c & d & d & d \end{pmatrix}\begin{pmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \phi_4 \\ \phi_5 \end{pmatrix} = \lambda \begin{pmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \phi_4 \\ \phi_5 \end{pmatrix}$$

The first $n_1$ equations are identical, so are the next $n_2$ equations, and so on. The system is therefore equivalent to the $m \times m$ system $\sum_j \bar{W}_{ij}\,\bar{\phi}_j = \lambda\,\bar{\phi}_i$, where $\bar{W}_{ij} = \beta_{ij} n_j$.

How do we recover the eigensystem of $W$ from that of $\bar{W}$?

- Eigenvalues: $W$ and $\bar{W}$ have the same eigenvalues ($W$ has $N - m$ additional zero eigenvalues).
- Eigenvectors: repeat the $k$th entry of $\bar{\phi}$ $n_k$ times to obtain $\phi$ (i.e., $\phi$ is piecewise constant).
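A minimal NumPy sketch of this recovery (illustrative names; it assumes the block values and cluster sizes are given). Since $\bar{W} = \beta\,\mathrm{diag}(n)$ is not symmetric, it is convenient to diagonalize the similar symmetric matrix $\mathrm{diag}(\sqrt{n})\,\beta\,\mathrm{diag}(\sqrt{n})$ instead:

```python
import numpy as np

def blockwise_eigensystem(beta, n):
    """Eigen-pairs of the block-quantized N x N matrix via the m x m system.

    beta : (m, m) symmetric block values beta_ij
    n    : (m,)   integer cluster sizes n_i
    W_bar = beta @ diag(n) is similar to S = diag(sqrt(n)) beta diag(sqrt(n)),
    so we diagonalize the symmetric S instead.
    """
    s = np.sqrt(n)
    lam, U = np.linalg.eigh(beta * np.outer(s, s))  # eigen-pairs of S
    phi_bar = U / s[:, None]                        # eigenvectors of W_bar
    phi = np.repeat(phi_bar, n, axis=0)             # piecewise-constant expansion
    phi /= np.linalg.norm(phi, axis=0)              # unit-normalize columns
    return lam, phi                                 # nonzero eigenvalues match W's
```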

SLIDE 9


Basic Idea

Idea: utilize the blockwise structure of the kernel matrix to compute the eigendecomposition more efficiently.

Procedure

1. Find a blockwise-constant matrix $\hat{W}$ to approximate $W$, using the Frobenius norm $\|W - \hat{W}\|_F$ as the approximation criterion.
2. The eigensystem of the $N \times N$ matrix $\hat{W}$ can be fully recovered from that of the $m \times m$ matrix $\bar{W}$; use it as an approximate solution to the eigendecomposition of $W$.

SLIDE 10


Approximation of Eigenvalues

Matrix perturbation theory [Bhatia, 1992]: the difference between two matrices bounds the difference between their singular-value spectra. If $A, E \in \mathbb{R}^{m \times n}$ and $\sigma_k(A)$ denotes the $k$th singular value of $A$, then

$$\max_{1 \le t \le n} |\sigma_t(A + E) - \sigma_t(A)| \le \|E\|_2, \qquad \sum_{k=1}^{n} \big(\sigma_k(A + E) - \sigma_k(A)\big)^2 \le \|E\|_F^2.$$
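These are Weyl's and Mirsky's inequalities; a quick numerical sanity check on random matrices (an illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
E = 0.1 * rng.standard_normal((8, 6))

sA = np.linalg.svd(A, compute_uv=False)
sAE = np.linalg.svd(A + E, compute_uv=False)

# max_t |sigma_t(A+E) - sigma_t(A)| <= ||E||_2  (spectral norm)
assert np.max(np.abs(sAE - sA)) <= np.linalg.norm(E, 2) + 1e-12
# sum_k (sigma_k(A+E) - sigma_k(A))^2 <= ||E||_F^2  (Frobenius norm)
assert np.sum((sAE - sA) ** 2) <= np.linalg.norm(E, "fro") ** 2 + 1e-12
```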

SLIDE 11


Approximation of Eigenvectors

Our Analysis: in some cases the eigenvectors are of greater importance, such as in manifold embedding and spectral clustering. Let $W$ and $\hat{W}$ be the original and block-quantized matrices, with eigenvalue/eigenvector pairs $(\alpha, \mu)$ and $(\beta, \nu)$ respectively, and let $E = W - \hat{W}$. Then

$$\|\mu - \nu\| \le \begin{cases} \left(\dfrac{1}{\alpha} + \dfrac{1}{\beta}\right)\|W\|_2 + \dfrac{1}{\beta}\|E\|_2, & \alpha \le \beta, \\[6pt] \left(\dfrac{3}{\beta} - \dfrac{1}{\alpha}\right)\|W\|_2 + \dfrac{1}{\beta}\|E\|_2, & \alpha > \beta. \end{cases}$$

Since $\|E\|_2 \le \|E\|_F$, minimizing $\|E\|_F$ also bounds the approximation error of the eigenvectors.

SLIDE 12


Minimization of the Matrix Approximation Error

The objective $E = \|W - \hat{W}\|_F^2$ can be written as

$$E = \sum_{i,j=1}^{N} \big(W_{ij} - \hat{W}_{ij}\big)^2 = \sum_{i,j=1}^{m}\; \sum_{x_p \in S_i,\, x_q \in S_j} \big(W_{pq} - \beta_{ij}\big)^2.$$

It can be minimized by setting $\partial E / \partial \beta_{ij} = 0$, which gives

$$\beta_{ij} = \frac{1}{n_i n_j} \sum_{x_p \in S_i,\, x_q \in S_j} K(x_p, x_q).$$

Computing the $\beta_{ij}$'s this way takes $O(N^2)$ time.
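A minimal sketch of this block-averaging step (illustrative names; it assumes the full kernel matrix and cluster labels are available, which is exactly why it costs $O(N^2)$):

```python
import numpy as np

def optimal_betas(K, labels, m):
    """Optimal block values: beta_ij = mean of K over block (S_i, S_j).

    K      : (N, N) precomputed kernel matrix
    labels : (N,) cluster index of each sample, in {0, ..., m-1}
    """
    Z = np.eye(m)[labels]                # (N, m) indicator: Z[p, i] = 1 iff x_p in S_i
    n = Z.sum(axis=0)                    # cluster sizes n_i
    block_sums = Z.T @ K @ Z             # sum of K over each (S_i, S_j) block
    return block_sums / np.outer(n, n)   # divide by n_i * n_j
```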

SLIDE 13


Data Partitioning

Assumption

- The data set is partitioned into clusters in the input space.
- Each local cluster $S_i$ has a minimum enclosing ball (MEB) with radius $r_i$.
- The cluster representative $t_i$ should fall inside this MEB.

Question: how does the partitioning influence the matrix-approximation quality?

SLIDE 14


Approximation error vs. Data Partitioning

Upper Bound: the approximation error $E$ is bounded by

$$E \le \frac{64\, N^2 \xi^2 R^2}{\sigma^4}\left(\overline{D^2} + 4R^2 + 4\bar{D}R\right),$$

where

- $\sigma$ is the width of the (stationary) kernel $K(x, y) = k\!\left(\frac{\|x - y\|}{\sigma}\right)$;
- $\xi = \max |k'(x)|$;
- $R = \max_{i=1,2,\dots,m} r_i$ is the maximum MEB radius;
- $\bar{D} = \frac{1}{N^2}\sum_{ij} n_i n_j D_{ij}$ is the average pairwise distance;
- $\overline{D^2} = \frac{1}{N^2}\sum_{ij} n_i n_j D_{ij}^2$ is the average pairwise squared distance.

SLIDE 15


Sequential Sampling

Objective: partition the data set into compact local clusters, such that every point is close to its cluster center.

Procedure (a code sketch follows this list)

1. Randomly select a sample to initialize the cluster-center set $C = \{t_1\}$. For $i = 1, 2, \dots, N$, do the following.
2. Compute $l_{ij} = \|x_i - t_j\|$ for $t_j \in C$. Once $l_{ij} \le r$, assign $x_i$ to $S_j$, let $i = i + 1$, and move on to the next sample.
3. If $\|x_i - t_j\| > r$ for all $t_j \in C$, add $x_i$ to $C$ as a new center. Let $i = i + 1$ and move on to the next sample.
4. On termination, count the number of samples $n_j$ in each $S_j$, and update each $t_j \in C$ as $t_j = \frac{1}{n_j} \sum_{x_i \in S_j} x_i$.
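A minimal NumPy sketch of this procedure (function and variable names are illustrative, not from the paper). Each point is assigned to the first center within distance r, as in step 2; this naive pass is O(Nm), whereas the hierarchical implementation mentioned on the next slide achieves O(N log m):

```python
import numpy as np

def sequential_sampling(X, r, seed=0):
    """Greedy one-pass partitioning with distance threshold r.

    Returns (centers, labels): cluster centers t_j (cluster means, step 4)
    and the cluster index of every sample in X.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # step 1: random starting sample
    centers = [X[order[0]]]
    labels = np.empty(len(X), dtype=int)
    labels[order[0]] = 0
    for i in order[1:]:
        d = np.linalg.norm(np.asarray(centers) - X[i], axis=1)
        j = int(np.argmax(d <= r))           # index of first center within r, if any
        if d[j] <= r:
            labels[i] = j                    # step 2: assign x_i to S_j
        else:
            centers.append(X[i])             # step 3: x_i becomes a new center
            labels[i] = len(centers) - 1
    # step 4: update each center to the mean of its cluster
    centers = np.stack([X[labels == j].mean(axis=0)
                        for j in range(len(centers))])
    return centers, labels
```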
SLIDE 16


Example: Sequential Sampling

[Figure: example partitions. Data (left); small threshold r (middle); large threshold (right).]

Property: the local clusters are bounded by a hypercube of side length 2r, where r is the partitioning parameter. The complexity is O(N log m) using a hierarchical implementation.

SLIDE 17


Gradient Optimization

The approximation error $E = \|W - \hat{W}\|_F^2$ can be written as a function of the cluster representatives $t_i$:

$$E = \sum_{i,j=1}^{m}\; \sum_{p \in S_i,\, q \in S_j} \big(K(x_p, x_q) - K(t_i, t_j)\big)^2,$$

which can be optimized by gradient descent; setting the gradient to zero yields the fixed-point update

$$t_k = \frac{\sum_{j \ne k} t_j \left[ B_{kj}\, K\!\big(\|t_k - t_j\|^2/\sigma^2\big) - A_{kj}\, K^2\!\big(\|t_k - t_j\|^2/\sigma^2\big) \right]}{\sum_{j \ne k} \left[ B_{kj}\, K\!\big(\|t_k - t_j\|^2/\sigma^2\big) - A_{kj}\, K^2\!\big(\|t_k - t_j\|^2/\sigma^2\big) \right]}.$$

Here $A_{ij} = n_i n_j$ and $B_{ij} = \sum_{p \in S_i,\, q \in S_j} K(x_p, x_q)$. The iteration can fine-tune the cluster representatives, especially when $m$ is small.
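A sketch of one way to run this update for a Gaussian kernel (a reading of the reconstructed formula above; the A and B matrices are assumed precomputed as just defined, and the names are illustrative):

```python
import numpy as np

def refine_representatives(T, A, B, sigma, n_iter=20):
    """Fixed-point refinement of cluster representatives t_k.

    T : (m, d) representatives
    A : (m, m) with A[i, j] = n_i * n_j
    B : (m, m) with B[i, j] = sum of K(x_p, x_q) over block (S_i, S_j)
    """
    for _ in range(n_iter):
        d2 = ((T[:, None, :] - T[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / sigma**2)          # K(||t_k - t_j||^2 / sigma^2)
        w = B * K - A * K**2                # per-pair weights w_kj
        np.fill_diagonal(w, 0.0)            # the sums run over j != k
        T = (w @ T) / w.sum(axis=1, keepdims=True)
    return T
```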

SLIDE 18


Refining the piecewise-constant eigenvector φ

We refine $\phi$ through the Nyström extension, incorporating the "cluster" information:

$$\phi_k(x) = \frac{1}{N \lambda_k} \sum_{i=1}^{m} n_i\, \phi_k(x_i)\, W(x, t_i).$$

It is believed to be difficult to use density information directly in high-dimensional problems. However, the $n_i$'s can be viewed as the coefficients of a multidimensional histogram, and this weighting greatly improves the convergence behavior of the Nyström extension.
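A minimal sketch of this density-weighted extension for a Gaussian kernel (illustrative names; it assumes the per-cluster eigenvector entries are taken at the representatives $t_i$ and that $W(x, t_i) = K(x, t_i)$):

```python
import numpy as np

def density_weighted_nystrom(Xnew, reps, n, phi_bar, lam, sigma):
    """Extend eigenvectors to out-of-sample points with density weights n_i.

    Xnew    : (q, d) new points x
    reps    : (m, d) cluster representatives t_i
    n       : (m,)   cluster sizes n_i (the density weights)
    phi_bar : (m, K) per-cluster eigenvector entries
    lam     : (K,)   eigenvalues lambda_k
    """
    N = n.sum()
    d2 = ((Xnew[:, None, :] - reps[None, :, :]) ** 2).sum(-1)
    Kx = np.exp(-d2 / sigma**2)            # W(x, t_i)
    return (Kx * n) @ phi_bar / (N * lam)  # phi_k(x), shape (q, K)
```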
SLIDE 19

Outline (recap; the talk now turns to Section 3, Experiments)

SLIDE 20


Kernel Principal Component Analysis

- MNIST digit image data set (digits 0 and 1)
- Image size: 28 × 28 (dimension 784)
- 2000 training samples and 2000 testing samples
- Gaussian kernel with bandwidth σ = 30
- Our algorithm is compared with the Nyström method using different sampling schemes: (1) random subset; (2) sequential sampling; (3) vector quantization.
- Embedding results are aligned with the standard KPCA embedding and the mean squared error is computed.

SLIDE 21


Kernel Principal Component Analysis

[Figure: embeddings on the 3 leading eigen-directions of KPCA (left); gradient method using 3 representatives (middle); sequential sampling using 10 representatives (right).]

Our embedding results are faithful even though the number of representatives is quite small.

SLIDE 22


Embedding Error vs #representatives Used

[Figure: in-sample (left) and out-of-sample (right) embedding error (log) versus number of chosen samples/centers, for Nyström (vector quantization), Nyström (sequential sampling), Nyström (random sampling), and our method.]

In-sample and out-of-sample embedding errors are measured on the 3 leading principal directions, with reference to the standard KPCA embedding. In both cases, our method is superior to the other algorithms.

SLIDE 23


Approximation Errors of Leading Eigenvectors

[Figure: eigenvector approximation error (log) versus number of chosen samples/centers for the same four methods, one panel per eigenvector.]

Approximation errors of the top (left), second (middle), and third (right) principal eigenvectors of the different methods: (1) the larger the eigenvalue, the easier the approximation; (2) our method is superior to the other algorithms.

SLIDE 24


Total time (in secs) and #Representatives (m)

Machine: 2.26 GHz Pentium PC. Time consumption (secs) and number of representatives (m) under different partitioning thresholds (r):

$r^2$      140    110    90     70     60     52     44     36
m          3      10     19     47     87     150    226    382
time (s)   0.04   0.09   0.17   0.48   0.95   1.82   3.57   8.55

Standard KPCA takes about 87 secs. For our method, the approximation quality is satisfactory when $r^2 < 90$; the corresponding time consumption is one order of magnitude smaller than that of KPCA.
SLIDE 25


Block Quantized Matrices

[Figure: embedding error versus the number of representatives, for Nyström and our method; our method has the lowest errors.]

Setting: forest data set; training data size 4,000; dimension 54; both numerical and symbolic features; Gaussian kernel, with kernel width chosen as the average pairwise distance; embedding onto the first 3 principal directions.

SLIDE 26


Experimental Setting

- Berkeley image segmentation benchmark data set
- Image size: 481 × 321
- Normalized cut
- Similarity measures: pixel color (RGB) and position (XY), both normalized to the domain [0, 255]
- Gaussian kernel with bandwidth σ ∈ [20, 40]

SLIDE 27


Segmentation Results (1)

[Two segmented images: m = 114 representatives, 0.45 s; m = 162 representatives, 0.91 s.]

SLIDE 28


Segmentation Results (2)

[Two segmented images: m = 175 representatives, 0.66 s; m = 89 representatives, 0.31 s.]

SLIDE 29


Comparison of Segmentation Results

[Figure panels: (1) original image; (2a) our segmentation; (2b) boundary; (3a) Nyström (random sampling); (3b) boundary; (4a) Nyström (sequential sampling); (4b) boundary; (5a) Nyström (VQ); (5b) boundary.]
SLIDE 30

Outline (recap; the talk now turns to Section 4, Conclusion)

SLIDE 31


Conclusions

Summary

- We proposed an efficient approach for the eigendecomposition of kernel matrices; its O(mN) complexity is lower than that of most existing approaches.
- By incorporating density information, our method greatly reduces the number of data representatives needed and improves the convergence behavior of the Nyström algorithm.