SLIDE 1

WEIGHTED SUMS OF RANDOM KITCHEN SINKS

Replacing minimization with randomization in learning

SLIDE 2

The model

  • Given a set of training data $(x_1, y_1), \dots, (x_m, y_m)$ drawn i.i.d. from a distribution $P$ on a domain $\mathcal{X} \times \{-1, +1\}$
  • Fit a function $f: \mathcal{X} \to \mathbb{R}$ to minimize risk
  • Empirical risk (computed in the sketch below): $\hat{R}[f] = \frac{1}{m} \sum_{i=1}^{m} c(f(x_i), y_i)$
  • Risk: $R[f] = \mathbb{E}_{(x,y) \sim P}\left[ c(f(x), y) \right]$
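A minimal sketch of the empirical risk in code, assuming NumPy and the hinge loss purely for illustration (the names here are not from the slides):

```python
import numpy as np

def empirical_risk(f, X, y, loss):
    """Empirical risk: (1/m) * sum_i c(f(x_i), y_i)."""
    predictions = np.array([f(x) for x in X])
    return float(np.mean(loss(predictions, y)))

# Four training points in R^2 with labels in {-1, +1}.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 1, -1, -1])
f = lambda x: x[0] - x[1]                        # a candidate function
hinge = lambda p, t: np.maximum(0.0, 1 - t * p)  # one choice of loss c
print(empirical_risk(f, X, y, hinge))
```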
SLIDE 3

Loss Function

  • Hinge loss (SVM): $c(y', y) = \max(0, 1 - y y')$
  • Exponential loss (AdaBoost): $c(y', y) = e^{-y y'}$
  • Quadratic loss: $c(y', y) = (y' - y)^2$ (all three are sketched in code below)
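The three losses as a runnable sketch (NumPy and the function names are assumptions for illustration):

```python
import numpy as np

# The three losses from the slide, as functions of the prediction y_pred
# and the true label y in {-1, +1}.
def hinge_loss(y_pred, y):        # SVM
    return np.maximum(0.0, 1.0 - y * y_pred)

def exponential_loss(y_pred, y):  # AdaBoost
    return np.exp(-y * y_pred)

def quadratic_loss(y_pred, y):
    return (y_pred - y) ** 2

y_pred, y = np.array([0.5, -2.0]), np.array([1, 1])
for loss in (hinge_loss, exponential_loss, quadratic_loss):
    print(loss.__name__, loss(y_pred, y))
```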
SLIDE 4

Form of solution function

  • Consider solutions of the form $f(x) = \sum_{k=1}^{K} \alpha_k \, \phi(x; w_k)$, with weights $\alpha_k$ and feature functions $\phi(\cdot\,; w_k)$ (a code sketch follows below)
  • Feature functions:
  • Eigenfunctions (kernel SVM)
  • Decision trees/stumps (AdaBoost)
  • More feature functions give better classification
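A minimal sketch of this solution form, assuming a cosine feature function purely for illustration:

```python
import numpy as np

def f(x, alphas, ws, phi):
    """The solution form: f(x) = sum_k alpha_k * phi(x; w_k)."""
    return sum(a * phi(x, w) for a, w in zip(alphas, ws))

# Illustrative feature function (an assumption): phi(x; w) = cos(w . x).
phi = lambda x, w: np.cos(w @ x)
rng = np.random.default_rng(0)
K, d = 5, 3
ws = [rng.standard_normal(d) for _ in range(K)]
alphas = rng.standard_normal(K)
print(f(np.ones(d), alphas, ws, phi))
```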

SLIDE 5

Solving f

  • Approximate $\min_{w_1, \dots, w_K, \alpha} \hat{R}\left[ \sum_{k=1}^{K} \alpha_k \, \phi(\cdot\,; w_k) \right]$, minimizing jointly over the feature parameters $w_k$ and the weights $\alpha$
  • This is hard! (the objective is non-convex in the $w_k$)
  • New approach:
  • Randomly choose the $w_k$ and minimize over $\alpha$ only (the remaining problem is convex; see below)
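Why randomization helps, spelled out as a sketch (the convexity observation is implicit in the slides):

```latex
% With w_1, ..., w_K drawn at random and held fixed, let
%   z_i = (phi(x_i; w_1), ..., phi(x_i; w_K)).
% The remaining problem
\min_{\alpha} \; \frac{1}{m} \sum_{i=1}^{m} c\!\left( \alpha^{\top} z_i,\; y_i \right)
% is convex in alpha whenever c is convex in its first argument
% (hinge, exponential, and quadratic losses all qualify), so it can be
% solved efficiently and to global optimality.
```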

SLIDE 6

Randomized approach

  • Training data $(x_i, y_i)$, $i = 1, \dots, m$
  • Feature function $\phi(x; w)$, bounded so that $|\phi(x; w)| \le 1$
  • Number of features $K$
  • Parameter distribution $p(w)$
  • Scaling factor $C$
  • Algorithm (implemented in the sketch below):
  • Draw feature parameters $w_1, \dots, w_K$ i.i.d. from $p(w)$
  • Let $z_i = \bigl( \phi(x_i; w_1), \dots, \phi(x_i; w_K) \bigr)$
  • Minimize empirical risk: $\hat{\alpha} = \arg\min_{\alpha} \frac{1}{m} \sum_{i=1}^{m} c(\alpha^\top z_i, y_i)$ subject to $\|\alpha\|_\infty \le C/K$
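A minimal, self-contained sketch of the fitting procedure, assuming cosine feature functions, a Gaussian parameter distribution $p(w)$, and the quadratic loss so that the inner minimization reduces to least squares; the box constraint $\|\alpha\|_\infty \le C/K$ is approximated here by clipping, an illustrative shortcut rather than the exact constrained solve, and all names are assumptions:

```python
import numpy as np

def fit_random_kitchen_sinks(X, y, K=200, C=10.0, seed=0):
    """Draw random feature parameters, featurize, then fit the weights alpha."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    # Step 1: draw feature parameters w_1..w_K i.i.d. from p(w) (here: Gaussian).
    W = rng.standard_normal((K, d))
    # Step 2: featurize, z_i = (phi(x_i; w_1), ..., phi(x_i; w_K)).
    Z = np.cos(X @ W.T)
    # Step 3: minimize empirical risk over alpha (quadratic loss -> least squares),
    # then clip into the box ||alpha||_inf <= C/K (illustrative projection).
    alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)
    alpha = np.clip(alpha, -C / K, C / K)
    return W, alpha

def predict(X, W, alpha):
    return np.sign(np.cos(X @ W.T) @ alpha)

# Toy usage: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
W, alpha = fit_random_kitchen_sinks(X, y)
print("training accuracy:", np.mean(predict(X, W, alpha) == y))
```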
SLIDE 7

Experimental Results vs AdaBoost

  • Three datasets:
  • adult
  • activity
  • KDDCUP99
  • Feature function $\phi(x; w, b) = \cos(w^\top x + b)$
  • phase $b$ sampled uniformly at random
  • weights $w$ sampled from a Gaussian (see the sampling sketch below)
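The slide names only the two sampling distributions; the sketch below assumes the random Fourier feature form $\cos(w^\top x + b)$, which fits those two distributions, with all function names illustrative:

```python
import numpy as np

def sample_fourier_features(d, K, sigma=1.0, seed=0):
    """Sample parameters for K features phi(x; w, b) = cos(w.x + b)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma, size=(K, d))   # w ~ Gaussian
    b = rng.uniform(0.0, 2 * np.pi, size=K)   # b ~ Uniform[0, 2*pi]
    return W, b

def featurize(X, W, b):
    return np.cos(X @ W.T + b)

W, b = sample_fourier_features(d=5, K=8)
print(featurize(np.ones((3, 5)), W, b).shape)  # (3, 8)
```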
SLIDE 8

Experimental Results vs AdaBoost

SLIDE 9

Pros and Cons

  • Pros
  • Much faster
  • Allows simple and efficient experimentation with feature functions
  • Cons
  • Some loss in quality
  • Need to choose the parameter distribution $p(w)$ (though in practice little tuning is needed)
SLIDE 10

Concentration of Risk

  • The randomized algorithm returns a function $\hat{f}$ such that

$R[\hat{f}] - \min_{f \in \mathcal{F}_p} R[f] \le O\left( \left( \frac{1}{\sqrt{m}} + \frac{1}{\sqrt{K}} \right) L C \sqrt{\log \frac{1}{\delta}} \right)$

with probability at least $1 - 2\delta$

  • $m$: number of training points
  • $K$: number of feature functions
  • $L$: Lipschitz constant of the loss function
  • Bound the approximation error
  • The lowest risk over all solution functions versus the lowest risk over the functions the algorithm can return is not large
  • Bound the estimation error
  • The true risk of every function the algorithm can return is close to its empirical risk (the bound's scaling is illustrated below)
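To see how the bound trades off $m$ against $K$, here is an illustrative evaluation with the constant hidden in the $O(\cdot)$ set to 1 (an assumption, not from the slides):

```python
import numpy as np

# Evaluate (1/sqrt(m) + 1/sqrt(K)) * L * C * sqrt(log(1/delta)),
# taking the O(.) constant as 1 purely for illustration.
def risk_gap_bound(m, K, L, C, delta):
    return (1 / np.sqrt(m) + 1 / np.sqrt(K)) * L * C * np.sqrt(np.log(1 / delta))

# Past K ~ m the 1/sqrt(m) estimation term dominates, so adding
# more random features yields diminishing returns.
for K in (10, 100, 1000, 10000):
    print(K, risk_gap_bound(m=1000, K=K, L=1.0, C=1.0, delta=0.05))
```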
SLIDE 11

Proof

  • $f^*$: minimizer of risk $R$ over all solution functions $\mathcal{F}_p$
  • $\hat{f}^*$: minimizer of risk $R$ over the functions the algorithm can return
  • $\hat{f}$: minimizer of empirical risk $\hat{R}$ over the functions the algorithm can return
  • Then

$R[\hat{f}] - R[f^*] = \underbrace{\left( R[\hat{f}] - R[\hat{f}^*] \right)}_{\text{estimation error}} + \underbrace{\left( R[\hat{f}^*] - R[f^*] \right)}_{\text{approximation error}} \le O\left( \left( \frac{1}{\sqrt{m}} + \frac{1}{\sqrt{K}} \right) L C \sqrt{\log \frac{1}{\delta}} \right)$

with probability at least $1 - 2\delta$, by a union bound over the two error events (the estimation term is expanded below)

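A step the slides leave implicit: the estimation error is controlled by comparing true and empirical risks, using that $\hat{f}$ minimizes the empirical risk; a sketch of the standard argument:

```latex
R[\hat{f}] - R[\hat{f}^*]
= \bigl( R[\hat{f}] - \hat{R}[\hat{f}] \bigr)
+ \underbrace{\bigl( \hat{R}[\hat{f}] - \hat{R}[\hat{f}^*] \bigr)}_{\le\, 0 \text{ since } \hat{f} \text{ minimizes } \hat{R}}
+ \bigl( \hat{R}[\hat{f}^*] - R[\hat{f}^*] \bigr)
\;\le\; 2 \sup_{f} \bigl| R[f] - \hat{R}[f] \bigr|
% The supremum is over the functions the algorithm can return, and is
% bounded via a uniform convergence argument at rate O(LC / sqrt(m)).
```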
SLIDE 12

Bounding approximation error

  • Lemma 1. Let $X_1, \dots, X_K$ be i.i.d. random variables in a ball of radius $M$ centered about the origin in a Hilbert space. Then

$\left\| \frac{1}{K} \sum_{k=1}^{K} X_k - \mathbb{E}[X] \right\| \le \frac{M}{\sqrt{K}} \left( 1 + \sqrt{2 \log \frac{1}{\delta}} \right)$

with probability at least $1 - \delta$

  • Construct the functions $f_k = \frac{\alpha(w_k)}{p(w_k)} \, \phi(\cdot\,; w_k)$, so that $\mathbb{E}[f_k] = f^*$ and $\| f_k \| \le C$
  • Then there exists $\hat{f} = \frac{1}{K} \sum_{k=1}^{K} f_k$, a function of the form the algorithm can return
  • So that $\| \hat{f} - f^* \| \le \frac{C}{\sqrt{K}} \left( 1 + \sqrt{2 \log \frac{1}{\delta}} \right)$ with probability at least $1 - \delta$ (see the numerical check below)
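A quick Monte Carlo sanity check of Lemma 1 in $\mathbb{R}^{10}$ (the sampling scheme and all names here are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, M, trials, delta = 10, 100, 1.0, 1000, 0.05

# i.i.d. samples uniform in a ball of radius M: uniform direction on the
# sphere, radius distributed as M * U^(1/d).
def sample_in_ball(n):
    V = rng.standard_normal((n, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    radii = M * rng.uniform(0, 1, (n, 1)) ** (1 / d)
    return V * radii

EX = np.zeros(d)  # this sampling distribution has mean zero
bound = (M / np.sqrt(K)) * (1 + np.sqrt(2 * np.log(1 / delta)))
deviations = np.array([
    np.linalg.norm(sample_in_ball(K).mean(axis=0) - EX) for _ in range(trials)
])
# Lemma 1 says the deviation exceeds the bound with probability at most delta.
print(f"bound = {bound:.3f}, empirical exceedance rate = {np.mean(deviations > bound):.3f}")
```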
SLIDE 13

Bounding approximation error

  • If the loss function has Lipschitz constant $L$, then for any two functions $f$ and $g$

$\left| R[f] - R[g] \right| \le L \, \mathbb{E}_x \left| f(x) - g(x) \right| \le L \, \| f - g \|$

  • Then, combining with Lemma 1, the approximation error satisfies

$R[\hat{f}^*] - R[f^*] \le \frac{L C}{\sqrt{K}} \left( 1 + \sqrt{2 \log \frac{1}{\delta}} \right)$

with probability at least $1 - \delta$