SLIDE 1

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel k-means Clustering

Manuel Fernández V, David P. Woodruff, Taisuke Yasuda

SLIDE 2
Kernel Method

  • Many machine learning tasks can be expressed as a function of the inner product matrix of the data points (rather than the design matrix)
  • Such tasks easily adapt to the data under a feature map through the use of a kernel (illustrated below)
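To ground the setup, here is a minimal Python sketch (not from the talk; the RBF kernel and toy data are illustrative assumptions) of how an algorithm that only consumes inner products runs on feature-mapped data by querying entries of the kernel matrix K:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # design matrix: 100 points in R^5

def rbf_kernel_entry(xi, xj, gamma=0.5):
    # K[i, j] = <phi(x_i), phi(x_j)> for the RBF feature map phi
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

# A "kernel query" reads one entry K[i, j]; materializing the full matrix,
# as done naively here, costs n^2 queries.
K = np.array([[rbf_kernel_entry(xi, xj) for xj in X] for xi in X])
```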

SLIDE 3

Kernel Query Complexity

  • In this work, we study kernel query complexity: the number of entries of the kernel matrix read by an algorithm

SLIDE 4

Kernel Ridge Regression (KRR)

  • Kernel method applied to ridge regression
  • For large data sets, computing the exact solution is prohibitively expensive (see the sketch below)
  • Goal: a (1+ε)-approximation guarantee
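For reference, a minimal sketch of exact KRR in its standard dual form α = (K + λI)^{-1} y, using a plain dot-product kernel; it makes the cost concrete: materializing K already takes n² kernel queries, and the solve is O(n³).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = rng.normal(size=n)
lam = 1.0                                  # ridge regularization parameter

K = X @ X.T                                # dot-product kernel: n^2 kernel queries

# Exact KRR dual solution alpha = (K + lam*I)^{-1} y; the O(n^3) solve is
# prohibitive for large n, motivating query-efficient approximations.
alpha = np.linalg.solve(K + lam * np.eye(n), y)
```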
SLIDE 5

Query-Efficient Algorithms

  • State-of-the-art approximation algorithms have sublinear and data-dependent runtime and query complexity (Musco and Musco, NeurIPS 2017; El Alaoui and Mahoney, NeurIPS 2015)
  • Key quantity: the effective statistical dimension d_eff(λ) (computed in the sketch below)
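A small helper showing the standard definition of this quantity, d_eff(λ) = tr(K(K + λI)^{-1}); computing it exactly from the spectrum as below is for intuition only, not query efficiency.

```python
import numpy as np

def effective_dimension(K, lam):
    # d_eff(lam) = tr(K (K + lam*I)^{-1}) = sum_i mu_i / (mu_i + lam),
    # where mu_i are the eigenvalues of the PSD kernel matrix K. It measures
    # the "degrees of freedom" of KRR at regularization level lam.
    mu = np.linalg.eigvalsh(K)
    return np.sum(mu / (mu + lam))
```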
SLIDE 6

Query-Efficient Algorithms

Figure from Cameron Musco’s slides

SLIDE 7

Query-Efficient Algorithms

Theorem (informal)

There is a randomized algorithm that, with probability at least 2/3, computes a (1+ε)-approximate KRR solution and makes at most Õ(n·d_eff(λ)/ε) kernel queries.
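To convey the flavor of these sublinear-query algorithms, here is a hedged Nyström-style sketch with uniform column sampling and a subset-of-regressors fit; the cited algorithms instead sample columns by ridge leverage scores, which is what yields the d_eff(λ)-dependent bound. The function name and details are illustrative assumptions, not the papers' exact procedure.

```python
import numpy as np

def nystrom_krr(X, y, lam, s, rng):
    """Approximate KRR from s sampled landmark columns of K (uniform here;
    ridge leverage score sampling is used by the cited algorithms)."""
    n = len(X)
    idx = rng.choice(n, size=s, replace=False)
    C = X @ X[idx].T                       # n x s sampled columns: n*s kernel queries
    W = C[idx]                             # s x s landmark-landmark block
    # Subset-of-regressors fit in the span of the landmarks:
    # minimize ||C b - y||^2 + lam * b^T W b  =>  (C^T C + lam*W) b = C^T y.
    A = C.T @ C + lam * W
    beta = np.linalg.solve(A + 1e-10 * np.eye(s), C.T @ y)
    return idx, beta                       # predictions: (X @ X[idx].T) @ beta
```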

SLIDE 8

Is this tight?

SLIDE 9

Contribution 1: Tight Lower Bounds for KRR

Theorem (informal)

Any randomized algorithm that computes a (1+ε)-approximate KRR solution with probability at least 2/3 must make at least Ω(n·d_eff(λ)/ε) kernel queries.

  • Effective against randomized and adaptive (data-dependent) algorithms
  • Tight up to logarithmic factors
  • Settles an open question (El Alaoui and Mahoney NeurIPS 2015)
SLIDE 10

Contribution 1: Tight Lower Bounds for KRR

Proof (sketch)

  • Our hard input distribution: the all-ones vector as the target vector, a fixed regularization λ, and a distribution over binary matrices with prescribed effective statistical dimension and rank

SLIDE 11

Contribution 1: Tight Lower Bounds for KRR

  • Data distribution for the kernel matrix: block-structured binary matrices with blocks of varying sizes
SLIDE 12

Contribution 1: Tight Lower Bounds for KRR

Lemma

Any randomized algorithm labeling the block size of a constant fraction of the rows of a kernel matrix drawn from this distribution must read Ω(n·d_eff(λ)/ε) kernel entries.

  • Proven using standard techniques
SLIDE 13

Contribution 1: Tight Lower Bounds for KRR

Reduction

Main Idea: one can just read off the labels of all the rows from the optimal KRR solution, and one can do this for a constant fraction of the rows from an approximate KRR solution.

SLIDE 14

Contribution 1: Tight Lower Bounds for KRR

Optimal KRR solution: in dual form, α* = (K + λI)^{-1} y

SLIDE 15

Contribution 1: Tight Lower Bounds for KRR

Optimal KRR solution

The entries of the optimal solution corresponding to different block sizes are separated by a multiplicative factor.
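A toy numerical check of this separation, assuming a simplified stand-in for the hard instance: a block-diagonal kernel of all-ones blocks with an all-ones target. Within a block of size s, K·1 = s·1, so the optimal dual solution has entries exactly 1/(s + λ), and different block sizes yield multiplicatively separated entries.

```python
import numpy as np
from scipy.linalg import block_diag

lam = 1.0
sizes = [2, 4, 2, 8]                       # block sizes to be "labeled"
K = block_diag(*[np.ones((s, s)) for s in sizes])
y = np.ones(K.shape[0])                    # all-ones target vector

alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
# Within a block of size s, alpha_i = 1 / (s + lam): the optimal solution
# reveals each row's block size, with distinct sizes multiplicatively apart.
print(np.round(alpha, 3))                  # 1/3 for s=2, 1/5 for s=4, 1/9 for s=8
```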

SLIDE 16

Contribution 1: Tight Lower Bounds for KRR

Approximate KRR solution

  • By averaging the approximation guarantee over the coordinates, we can still distinguish the block sizes for a constant fraction of the coordinates

SLIDE 17

Kernel k-means Clustering (KKMC)

  • Kernel method applied to k-means clustering
  • Objective: a partition of the data set into k clusters
  • Minimize the cost: the sum of squared distances to the nearest centroid (computable from kernel entries, as sketched below)
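For concreteness, the KKMC cost of a given partition can be evaluated from kernel entries alone via the standard centroid identity; this helper is an illustrative sketch, not code from the paper.

```python
import numpy as np

def kkmc_cost(K, labels):
    # For each cluster C, the sum of squared feature-space distances to the
    # centroid is: sum_{i in C} K[i,i] - (1/|C|) * sum_{i,j in C} K[i,j].
    cost = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        cost += K[idx, idx].sum() - K[np.ix_(idx, idx)].sum() / len(idx)
    return cost
```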
SLIDE 18

Contribution 2: Tight Lower Bounds for KKMC

Theorem (informal)

Any randomized algorithm that computes a (1+ε)-approximate KKMC solution with probability at least 2/3 must make at least Ω(nk/ε) kernel queries.

  • Effective against randomized and adaptive (data-dependent) algorithms
  • Tight up to logarithmic factors
SLIDE 19

Contribution 2: Tight Lower Bounds for KKMC

  • Similar techniques: show that a KKMC algorithm must find the nonzero entries of a sparse kernel matrix
  • Hard distribution: sums of standard basis vectors in ℝ^d (see the toy instance below)
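A hypothetical toy instance in the spirit of this construction (the paper's exact hard distribution differs): each point is a sum of two standard basis vectors, one shared with its cluster-mates and one unique to the point, so the dot-product kernel matrix has only a sparse set of nonzero off-diagonal entries that a clustering algorithm must locate.

```python
import numpy as np

n, k = 12, 3
labels = np.arange(n) % k
X = np.zeros((n, k + n))
X[np.arange(n), labels] = 1                # coordinate shared within a cluster
X[np.arange(n), k + np.arange(n)] = 1      # coordinate unique to each point

# K[i,j] = 1 iff i and j are cluster-mates (2 on the diagonal), so the
# off-diagonal nonzeros are sparse among the ~n^2 candidate entries.
K = X @ X.T
```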
SLIDE 20

Kernel k-means Clustering of Mixtures of Gaussians

  • For input distributions encountered in practice, the previous lower bound may be pessimistic

  • We show that for a mixture of isotropic Gaussians with the dot product kernel, we can solve KKMC with only Õ(n) kernel queries

SLIDE 21

Contribution 3: Query-Efficient Algorithm for Mixtures of Gaussians

Theorem (informal)

Given a mixture of k isotropic Gaussians with sufficient mean separation, there is a randomized algorithm which, with probability at least 2/3, returns a (1+ε)-approximate k-means clustering solution using only Õ(n) kernel queries.

SLIDE 22

Contribution 3: Query-Efficient Algorithm for Mixtures of Gaussians

Main Idea: Johnson-Lindenstrauss Lemma

  • Dimension reduction by multiplying the data set by a matrix of zero-mean Gaussians
  • Implemented with few kernel queries, since the needed inner products are already available as entries of the kernel matrix (see the sketch below)
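A minimal sketch of the idea, under the assumption that each projection direction is a zero-mean Gaussian combination of the data points themselves, so every inner product needed is an entry of K; the helper kernel_jl_project is hypothetical and less query-frugal than the talk's algorithm.

```python
import numpy as np

def kernel_jl_project(K, m, rng):
    """Project the feature-mapped points onto m random directions, each a
    zero-mean Gaussian combination g_j = sum_l G[l, j] * x_l of the data
    points, so <x_i, g_j> = sum_l G[l, j] * K[i, l] uses kernel entries only.
    Simplified sketch: this dense version reads all of K."""
    n = K.shape[0]
    G = rng.normal(size=(n, m)) / np.sqrt(m)   # Gaussian coefficient matrix
    return K @ G                               # row i = (<x_i, g_1>, ..., <x_i, g_m>)

# After projecting to m = O(log n) dimensions, ordinary (non-kernel) k-means
# can be run on the low-dimensional points with no further kernel queries.
```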