Additive Logistic Regression: a Statistical View of Boosting
Jerome Friedman, Trevor Hastie, Rob Tibshirani (Stanford University)

SLIDE 1. Additive Logistic Regression: a Statistical View of Boosting

Jerome Friedman, Trevor Hastie, Rob Tibshirani
Stanford University

Thanks to Bogdan Popescu for helpful and very lively discussions on the history of boosting, and for help in preparing that part of this talk.

  • Email: trevor@stat.stanford.edu
  • Ftp: stat.stanford.edu, pub/hastie
  • WWW: http://www-stat.stanford.edu/~trevor

These transparencies are available via ftp: ftp.stat.stanford.edu, pub/hastie/boost.ps.

Classification Problem

[Figure: scatterplot of the two-class training sample, points labeled 0 and 1.]

Data: (X, Y), X ∈ R^p, Y ∈ {−1, 1}.
  • X is the predictor (feature); Y is the class label (response).
  • (X, Y) have a joint probability distribution D.

Goal: Based on N training pairs (X_i, Y_i) drawn from D, produce a classifier Ĉ(X) ∈ {−1, 1}.

Goal: choose Ĉ to have low generalization error

    R(Ĉ) = P_D(Ĉ(X) ≠ Y) = E_D[ 1(Ĉ(X) ≠ Y) ].
SLIDE 2. Deterministic Concepts

[Figure: two-class training sample for a deterministic concept, points labeled 0 and 1.]

  • X ∈ R^p has distribution D.
  • C(X) is a deterministic function, a member of a concept class.

Goal: Based on N training pairs (X_i, Y_i = C(X_i)) drawn from D, produce a classifier Ĉ(X) ∈ {−1, 1}.

Goal: choose Ĉ to have low generalization error

    R(Ĉ) = P_D(Ĉ(X) ≠ C(X)) = E_D[ 1(Ĉ(X) ≠ C(X)) ].

Classification Trees

[Figure: a CART classification tree with splits on x.1 and x.2 (root split x.2 < −1.06711; misclassification counts such as 94/200 shown at each node; leaves labeled 0 or 1).]

SLIDE 3. Decision Boundary: Tree

[Figure: decision boundary of a single classification tree on the nested-spheres data.]

When the nested spheres are in R^10, CART produces a rather noisy and inaccurate rule Ĉ(X), with quite high error rates.

Bagging and Boosting

Classification trees can be simple, but often produce noisy (bushy) or weak (stunted) classifiers.
  • Bagging (Breiman, 1996): Fit many large trees to bootstrap-resampled versions of the training data, and classify by majority vote. (A code sketch follows at the end of this slide.)
  • Boosting (Freund & Schapire, 1996): Fit many large or small trees to reweighted versions of the training data. Classify by weighted majority vote.

In general: Boosting > Bagging > Single Tree.

"AdaBoost ... best off-the-shelf classifier in the world" (Leo Breiman, NIPS workshop).
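
Below is a minimal sketch (not from the talk) of the bagging procedure just described, using scikit-learn decision trees; names such as bagged_trees and n_trees are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=50, rng=None):
    """Fit many large trees to bootstrap resamples of the training data."""
    rng = np.random.default_rng(rng)
    N = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, N, size=N)       # bootstrap resample (with replacement)
        tree = DecisionTreeClassifier()        # large (unpruned) tree
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_majority(trees, X):
    """Classify by unweighted majority vote; labels assumed in {-1, +1}."""
    votes = np.stack([t.predict(X) for t in trees])
    return np.sign(votes.sum(axis=0))
```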
SLIDE 4. Boosting (schematic)

[Figure: training sample → weighted sample → weighted sample → ...; weak classifiers f1(x), f2(x), f3(x), ..., fB(x); final classifier sign[Σ_b α_b f_b(x)].]

The weighting in boosting can be achieved by weighted importance sampling.

Bagging and Boosting: points from nested spheres in R^10

  • The Bayes error rate is zero (the concept is deterministic).
  • Trees are grown best-first, without pruning.
  • The leftmost iteration is a single tree.
SLIDE 5. Decision Boundary: Boosting

[Figure: decision boundary produced by boosting on the nested-spheres data.]

Bagging and Boosting average many trees, and produce smoother decision boundaries.

AdaBoost (Freund & Schapire)

1. Start with weights w_i = 1/N, i = 1, ..., N; y_i ∈ {−1, 1}.
2. Repeat for m = 1, 2, ..., M:
   (a) Estimate the weak learner f_m(x) ∈ {−1, 1} from the training data with weights w_i.
   (b) Compute e_m = E_w[ 1(y ≠ f_m(x)) ] and c_m = log[(1 − e_m)/e_m].
   (c) Set w_i ← w_i exp[ c_m · 1(y_i ≠ f_m(x_i)) ], i = 1, ..., N, and renormalize so that Σ_i w_i = 1.
3. Output the weighted majority classifier C(x) = sign[ Σ_{m=1}^M c_m f_m(x) ].
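
A minimal sketch (not from the talk) of the algorithm above, using decision stumps as the weak learner; it assumes labels in {-1, +1} and 0 < e_m < 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def discrete_adaboost(X, y, M=100):
    """Discrete AdaBoost with stumps; y must take values in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                      # 1. uniform starting weights
    learners, cs = [], []
    for _ in range(M):                           # 2. boosting iterations
        f = DecisionTreeClassifier(max_depth=1)  # (a) weak learner (stump)
        f.fit(X, y, sample_weight=w)
        miss = (f.predict(X) != y)
        e = np.sum(w * miss)                     # (b) weighted error e_m ...
        c = np.log((1.0 - e) / e)                #     ... and vote weight c_m
        w = w * np.exp(c * miss)                 # (c) up-weight the mistakes
        w /= w.sum()                             #     renormalize
        learners.append(f)
        cs.append(c)
    return learners, np.array(cs)

def predict(learners, cs, X):
    """3. Weighted majority vote."""
    F = sum(c * f.predict(X) for f, c in zip(learners, cs))
    return np.sign(F)
```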
SLIDE 6. History of Boosting

[History diagram, approximately:]

  • L.G. Valiant (1984): the PAC learning model.
  • R.E. Schapire (1990), Y. Freund (1995): genesis of boosting; the concept of boosting a weak learner appears.
  • Schapire & Freund (1995, 1997): AdaBoost is born.
  • Freund & Schapire (1996), Breiman (1996, 1997), R. Quinlan (1996): experiments with AdaBoost.
  • L. Breiman (1996): bagging.
  • Schapire, Freund, P. Bartlett, Lee (1997); Schapire, Y. Singer (1998): attempts to explain why AdaBoost works.
  • Friedman, Hastie, Tibshirani (1998); Schapire, Singer, Freund, Iyer (1998): improvements.

PAC Learning Model

  • X, D: instance space X, with distribution D.
  • C: X → {−1, 1}: concept, a member of a concept class C.
  • h: X → {−1, 1}: hypothesis, a member of a hypothesis space H.
  • error(h) = P_D(C(X) ≠ h(X)).

Definition: Consider a concept class C defined over a set X of instances of length N. L is a learning algorithm using hypothesis space H. C is PAC learnable by L using H if, for all C ∈ C, all distributions D over X, and all ε, δ > 0, learner L will with Pr ≥ 1 − δ output an h ∈ H such that error_D(h) ≤ ε, in time polynomial in 1/ε, 1/δ, N, and size(C).

Such an L is called a strong learner.
SLIDE 7. Boosting a Weak Learner

A weak learner L produces an h with error rate β < 1/2, with Pr ≥ 1 − δ, for any D.

L has access to a continuous stream of training data and to a class oracle.

1. L learns h_1 on the first N training points.
2. L randomly filters the next batch of training points, extracting N/2 points correctly classified by h_1 and N/2 incorrectly classified, and produces h_2.
3. L builds a third training set of N points for which h_1 and h_2 disagree, and produces h_3.
4. L outputs h = Majority Vote(h_1, h_2, h_3).

THEOREM (Schapire, "The Strength of Weak Learnability"): error_D(h) ≤ 3β² − 2β³. (A quick plausibility check follows below.)
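
As a heuristic check (not Schapire's actual proof, where the three hypotheses are not independent): if h_1, h_2, h_3 erred independently with probability β, the majority vote would err exactly when at least two of them err,

```latex
\Pr[\text{majority errs}]
  = \binom{3}{2}\beta^2(1-\beta) + \beta^3
  = 3\beta^2 - 2\beta^3 \;<\; \beta
  \quad\text{for } 0 < \beta < \tfrac{1}{2},
```

so the combined classifier is strictly better than each weak component.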
Boosting: Training Error

Nested spheres in R^10; the Bayes error is zero.

Boosting drives the training error to zero. Further iterations continue to improve the test error in many examples.
SLIDE 8. Boosting Noisy Problems

Nested Gaussians in R^10; the Bayes error is nonzero.

Here the test error does increase, but quite slowly.
Bagging and Boosting: Smaller Trees

Points in R^10; the Bayes error rate is zero. Each tree is grown best-first to a small fixed number of terminal nodes. The Bagging/Boosting gap is wider.
SLIDE 9. Bagging and Boosting: Stumps

Points in R^10; the Bayes error rate is zero. Each tree is a stump (two terminal nodes), grown best-first. Bagging fails; boosting does its best ever.
Prediction Games

Results of Freund & Schapire, and of Breiman.

Idea: Start with fixed learners f_1(x), f_2(x), ..., f_M(x). Play a two-person game:
  • Player 1 picks observation weights w_i.
  • Player 2 picks learner weights c_m, i.e. he will use learner f_m(x) with probability c_m.

Player 1 tries to make the prediction problem as hard as possible, while Player 2 does the best he can on the weighted problem. We judge difficulty by the approximate margin, a smooth version of misclassification loss.
SLIDE 10. Prediction Games (continued)

Result: this is a zero-sum game, and the minimax theorem gives the best strategy for each player. Furthermore, AdaBoost converges to this optimal strategy.

However, the link with the statistical properties of actual AdaBoost is tenuous:
  • why should minimizing the hardest weighted problem make the test error smaller?
  • actual AdaBoost does not use a random choice of learner;
  • actual AdaBoost finds the learners f_m(x).
Stagewise Additive Modeling

Boosting builds an additive model F(x) = Σ_{m=1}^M f_m(x), and then C(x) = sign[F(x)]. We do things like that in statistics:

  • GAMs: F(x) = Σ_j f_j(x_j)
  • Basis expansions: F(x) = Σ_{m=1}^M β_m h_m(x)

Traditionally, each of the terms f_m(x) is different in nature, and they are fit jointly (i.e. by least squares or maximum likelihood). With boosting, each term is equivalent in nature, and they are fit in a stagewise fashion.

Simple example: stagewise least squares. Fix the past M − 1 functions, and update the Mth using a tree (see the sketch below):

    min_{f_M ∈ Tree(x)}  E[ Y − Σ_{m=1}^{M−1} f_m(x) − f_M(x) ]².
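
A minimal sketch (not from the talk) of stagewise least-squares boosting with small regression trees: each new tree is fit to the current residual, and the past trees are never refit. Names like n_stages and max_leaf_nodes are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stagewise_least_squares(X, y, n_stages=100, max_leaf_nodes=4):
    """At stage M, fit f_M to the residual y - sum of the fixed past trees."""
    residual = y.astype(float).copy()
    trees = []
    for _ in range(n_stages):
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(X, residual)            # least-squares fit to current residual
        residual -= tree.predict(X)      # past stages stay fixed
        trees.append(tree)
    return trees

def predict_F(trees, X):
    """F(x) = sum_m f_m(x)."""
    return sum(t.predict(X) for t in trees)
```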
SLIDE 11. Boosting and Additive Models
Discrete AdaBoost builds an additive model F(x) = Σ_{m=1}^M c_m f_m(x) by stagewise optimization of

    J(F) = E[ e^{−yF(x)} ].

Given an imperfect F_{M−1}(x), the updates in Discrete AdaBoost correspond to a Newton step towards minimizing

    J(F_{M−1}(x) + c_M f_M(x)) = E[ e^{−y(F_{M−1}(x) + c_M f_M(x))} ]

over f_M(x) ∈ {−1, 1}, with step length c_M.

E[ e^{−yF(x)} ] is minimized at

    F(x) = (1/2) log [ P(y = 1 | x) / P(y = −1 | x) ].

Hence AdaBoost is fitting an additive logistic regression model.
Details

  • f_M: f_M(x) = arg min_{g(x) ∈ {−1,1}} E_w[ 1(y ≠ g(x)) ], with weights w(x, y) = e^{−yF_{M−1}(x)}.
  • c_M: c_M = arg min_c E_w[ e^{−c y f_M(x)} ] = (1/2) log [(1 − e)/e], with e = E_w[ 1(y ≠ f_M(x)) ].

Empirical version: at each stage, f_M(x) is estimated by the classification at the terminal nodes of a tree grown to appropriately weighted versions of the training data.

At the Mth stage of the Discrete AdaBoost iterations, the weights are such that f_{M−1} has weighted training error 1/2.
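
For completeness, here is the short calculation behind the minimizer claim (standard, though not spelled out on the slide): minimize the conditional criterion pointwise in F(x).

```latex
E\!\left[e^{-yF(x)} \mid x\right]
  = P(y{=}1\mid x)\,e^{-F(x)} + P(y{=}{-}1\mid x)\,e^{F(x)},
\qquad
\frac{\partial}{\partial F(x)}\,E\!\left[e^{-yF(x)} \mid x\right] = 0
\;\Longrightarrow\;
F(x) = \frac{1}{2}\,\log\frac{P(y{=}1\mid x)}{P(y{=}{-}1\mid x)}.
```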
SLIDE 12. Real AdaBoost

1. Start with weights w_i = 1/N, i = 1, ..., N; y_i ∈ {−1, 1}.
2. Repeat for m = 1, 2, ..., M:
   (a) Fit the class probability estimate p_m(x) = P_w(y = 1 | x) using weights w_i on the training data.
   (b) Set f_m(x) = (1/2) log [ p_m(x) / (1 − p_m(x)) ] ∈ R.
   (c) Set w_i ← w_i exp[ −y_i f_m(x_i) ], i = 1, ..., N, and renormalize so that Σ_i w_i = 1.
3. Output the classifier sign[ Σ_{m=1}^M f_m(x) ]. (A code sketch follows below.)
Real AdaBoost (continued)

Real AdaBoost also builds an additive logistic regression model,

    (1/2) log [ P(y = 1 | x) / P(y = −1 | x) ] = Σ_{m=1}^M f_m(x),

by stagewise optimization of J(F) = E[ e^{−yF(x)} ].

Given an imperfect F_{M−1}(x), Real AdaBoost minimizes

    J(F_{M−1}(x) + f_M(x)) = E[ e^{−y(F_{M−1}(x) + f_M(x))} ]

over f_M(x) ∈ R, with solution

    f_M(x) = (1/2) log [ P_w(y = 1 | x) / P_w(y = −1 | x) ],

where the weights are w(x, y) = e^{−yF_{M−1}(x)}.

Empirical version: at each stage, P_w(· | x) is estimated by averages at the terminal nodes of a tree grown to appropriately weighted versions of the training data.
SLIDE 13. Why J(F) = E[ e^{−yF(x)} ]?

  • e^{−yF(x)} is a monotone, smooth upper bound on the misclassification loss at x.
  • J(F) is an expected loss with the same minimizer as the binomial log-likelihood, and equivalent to it to second order.
  • Stagewise binomial maximum-likelihood estimation of additive models based on trees works at least as well.
Stagewise Maximum Likelihood

Consider the model

    F_M(x) = (1/2) log [ P(y = 1 | x) / P(y = −1 | x) ] = Σ_{m=1}^M f_m(x),

or

    P(y = 1 | x) = p(x) = e^{F(x)} / (e^{F(x)} + e^{−F(x)}).

The binomial log-likelihood is

    ℓ(F) = E[ y* log p(x) + (1 − y*) log(1 − p(x)) ]
         = E[ 2 y* F(x) − log(1 + e^{2F(x)}) ],    where y* = (y + 1)/2 ∈ {0, 1}.

Stagewise maximum likelihood: given an imperfect F_{M−1}(x), maximize ℓ(F_{M−1}(x) + f_M(x)) over f_M(x) ∈ R.

The LogitBoost algorithm takes a single Newton step at each stage; the step is sketched below.
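
The Newton step, reconstructed from the log-likelihood above (a standard computation, with the algebra not shown on the slide): with ℓ(F) = E[ 2y*F − log(1 + e^{2F}) ] and p = e^F/(e^F + e^{−F}),

```latex
\frac{\partial \ell}{\partial F} = 2\,E[\,y^* - p\,],
\qquad
\frac{\partial^2 \ell}{\partial F^2} = -4\,E[\,p(1-p)\,],
\qquad\Longrightarrow\qquad
F \;\leftarrow\; F + \tfrac{1}{2}\,\frac{E[\,y^* - p\,]}{E[\,p(1-p)\,]}
  \;=\; F + \tfrac{1}{2}\,E_w\!\left[\frac{y^* - p}{p(1-p)}\right],
```

with weights w = p(1 − p); the last term is the weighted average of the working response z = (y* − p)/(p(1 − p)), which explains steps (a)-(c) of the algorithm that follows.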
SLIDE 14. LogitBoost

1. Start with weights w_i = 1/N, i = 1, ..., N; F(x) = 0; and probability estimates p(x_i) = 1/2.
2. Repeat for m = 1, 2, ..., M:
   (a) Compute the working response and weights:
         z_i = (y*_i − p(x_i)) / [ p(x_i)(1 − p(x_i)) ],
         w_i = p(x_i)(1 − p(x_i)).
   (b) Fit the function f_m(x) by a weighted least-squares regression of z_i on x_i using weights w_i (i.e. a weighted tree).
   (c) Update F(x) ← F(x) + (1/2) f_m(x), and update p(x).
3. Output the classifier sign[F(x)] = sign[ Σ_{m=1}^M f_m(x) ]. (A code sketch follows below.)

We also have a natural generalization of LogitBoost for multiple classes.
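
A minimal LogitBoost sketch (not from the talk) following these steps, with regression stumps; the clip on z is an illustrative safeguard for the division by p(1 − p), not part of the slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logitboost(X, y, M=100, z_max=4.0):
    """LogitBoost for y in {-1, +1}; uses y* = (y + 1)/2 in {0, 1}."""
    N = len(y)
    ystar = (y + 1) / 2.0
    F = np.zeros(N)
    p = np.full(N, 0.5)                               # 1. start at p = 1/2
    trees = []
    for _ in range(M):                                # 2. iterations
        w = p * (1 - p)                               # (a) weights ...
        z = np.clip((ystar - p) / w, -z_max, z_max)   #     ... and working response
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, z, sample_weight=w)               # (b) weighted least squares
        F += 0.5 * tree.predict(X)                    # (c) update F(x) ...
        p = 1.0 / (1.0 + np.exp(-2.0 * F))            #     ... and p(x)
        trees.append(tree)
    return trees

def predict(trees, X):
    """3. sign[F(x)]."""
    F = sum(0.5 * t.predict(X) for t in trees)
    return np.sign(F)
```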
Additive Logistic Trees

Best-first tree growing allows us to limit the size of each tree, and hence the interaction order. By collecting terms we get

    F(x) = Σ_j f_j(x_j) + Σ_{j,k} f_{jk}(x_j, x_k) + Σ_{j,k,l} f_{jkl}(x_j, x_k, x_l) + ...

Coordinate functions for the Additive Stumps model.

Boosting uses stagewise optimization, as opposed to joint optimization (full least squares / backfitting).
SLIDE 15

[Figure: coordinate functions for the additive stumps model.]
SLIDE 16. Large Real Example: Satimage

[Table: test error on the Satimage data (4,435 training and 2,000 test points, 36 features, 6 classes): CART versus LogitBoost, Real AdaBoost, Gentle AdaBoost and Discrete AdaBoost, each at several settings of terminal nodes and iterations.]

Large Real Example: Letter

[Table: test error on the Letter data (16,000 training and 4,000 test points, 16 features, 26 classes): CART versus LogitBoost, Real AdaBoost, Gentle AdaBoost and Discrete AdaBoost, each at several settings of terminal nodes and iterations, with a column for the fraction of training points used.]
SLIDE 17. Weight Trimming
  • At each iteration, observations with w_i < t(β) are not used for training; t(β) is the βth quantile of the weight distribution. (A small illustration follows below.)
  • Works better for LogitBoost.
  • LogitBoost has weights w_i = p_i(1 − p_i), which are large near the decision boundary.
  • AdaBoost has weights w_i = e^{−y_i F_M(x_i)} (recall y_i ∈ {−1, 1}): large for misclassified points.
  • For multiple-class procedures, if the class-k logit F_{mk} becomes very negative (on the order of −log N), training stops for that class.
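
A tiny illustration (not from the talk) of the trimming step inside a boosting loop, assuming a weight vector w and a trim fraction beta (both names illustrative):

```python
import numpy as np

def trimmed_indices(w, beta=0.1):
    """Keep observations whose weight is at or above the beta-th quantile t(beta)."""
    t = np.quantile(w, beta)       # threshold t(beta)
    return np.where(w >= t)[0]     # indices used for training this iteration

# usage inside an iteration: fit the weak learner only on the kept points
# idx = trimmed_indices(w)
# learner.fit(X[idx], y[idx], sample_weight=w[idx])
```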
Summary and Closing Comments

  • The introduction of boosting by Schapire, Freund and colleagues has brought us an exciting and important set of new ideas.
  • Boosting fits additive logistic models, where each component (base learner) is simple. The complexity needed for the base learner depends on the target function.
  • There is little connection between weighted boosting and bagging: boosting is primarily a bias-reduction procedure, while the goal of bagging is variance reduction. The distinction becomes blurred when weighting is achieved in boosting by importance sampling.
SLIDE 18. Margins

[Figure: training points with their margins relative to the decision boundary.]

margin(X) = M(X): the weighted vote for the correct class minus that for the best wrong class, normalized.

(Freund & Schapire): Boosting generalizes because it pushes the training margins well above zero, while keeping the VC dimension under control (also Vapnik). With Pr ≥ 1 − δ:

    P_Test( M(X) ≤ 0 ) ≤ P_Train( M(X) ≤ θ ) + O( sqrt( log N · log|H| / (N θ²) + log(1/δ) / N ) ).
How does Boosting avoid overfitting?

  • As iterations proceed, the impact of each change is localized.
  • Parameters are not jointly optimized: stagewise estimation slows down the learning process.
  • Classifiers are hurt less by overfitting (Cover and Hart).
  • Margin theory of Schapire and Freund, and of Vapnik; disputed by Breiman.

The jury is still out.