SLIDE 1

Regularized Multi-Class Semi-Supervised Boosting

Amir Saffari, Christian Leistner, Horst Bischof

Institute for Computer Graphics and Vision, Graz University of Technology, Austria

CVPR 2009, June 22, 2009

SLIDE 2

Supervised Learning

SLIDE 3

Semi-Supervised Learning (SSL)

SLIDE 4

Large-Scale Applications and Semi-Supervised Learning

SLIDE 5-7

Conclusions: Beta version 0.1

We propose a semi-supervised boosting algorithm which solves multi-class problems without decomposing them into binary tasks. Additionally, our algorithm scales very well with respect to the number of both labeled and unlabeled samples.

SLIDE 8

Outline

SLIDE 9

SSL Methods

Semi-Supervised Learning

Semi-supervised learning is a class of machine learning techniques that makes use of both labeled and unlabeled data for training. There exist many SSL methods; see:

  • X. Zhu, "Semi-Supervised Learning Literature Survey", 2008, and
  • O. Chapelle, B. Schölkopf, A. Zien, "Semi-Supervised Learning", MIT Press, Cambridge, MA, 2006.

SLIDE 10-11

Motivations

Many successful SSL methods do not scale very well w.r.t. the number of unlabeled samples, or are very sensitive to the choice of hyper-parameters (G. Mann, A. McCallum, ICML 2007). Expect to see O(n³) many times. Usually, multi-class problems are solved via 1-vs-all and occasionally with 1-vs-1 decompositions.

SLIDE 12-15

What is wrong with 1-vs-all?

  • Do you want to repeat a slow method a few more times?
  • Calibration problems (B. Schölkopf, A. Smola, 2002).
  • Artificial, unbalanced binary problems.
  • There exist slow multi-class SSL methods; see the details in the paper.

SLIDE 16-18

Multi-Class Semi-Supervised Boosting

Multi-class classifier: $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}), \cdots, f_K(\mathbf{x})]^T$.

Overall Loss

$$\mathcal{L}(\mathbf{f}(\mathbf{x}), \mathcal{X}) = \underbrace{\sum_{(\mathbf{x},y)\in\mathcal{X}_l} \ell(\mathbf{f}(\mathbf{x}))}_{\text{Labeled}} \;+\; \underbrace{\alpha \sum_{\mathbf{x}\in\mathcal{X}_u} \ell_c(\mathbf{f}(\mathbf{x})) + \beta \sum_{\mathbf{x}\in\mathcal{X}_u} \ell_m(\mathbf{f}(\mathbf{x}))}_{\text{Unlabeled}} \tag{1}$$

Boosting Model

$$\mathbf{f}(\mathbf{x}) = \nu \sum_{t=1}^{T} \mathbf{g}_t(\mathbf{x}) \tag{2}$$
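To make the additive model in Eq. (2) concrete, here is a minimal NumPy sketch (not the authors' implementation): `base_learners` is a hypothetical list of fitted multi-class base learners, each exposing a `predict(X)` that returns per-class scores of shape (n, K).

```python
import numpy as np

def boosted_scores(X, base_learners, nu):
    """Evaluate f(x) = nu * sum_t g_t(x), Eq. (2), for a batch X of shape (n, d)."""
    # Sum the (n, K) score matrices of all base learners, then shrink by nu.
    return nu * sum(g.predict(X) for g in base_learners)

def classify(F):
    # Predicted class: the index of the maximal per-class response.
    return np.argmax(F, axis=1)
```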

SLIDE 19

Fisher-Consistent Loss Functions

[Photo: Vladimir Vapnik (picture courtesy of Yann LeCun)]

SLIDE 20-22

Fisher-Consistent Loss Functions

Margin Vector

$\mathbf{f}(\mathbf{x})$ is a universal margin vector if $\forall \mathbf{x}: \sum_{i=1}^{K} f_i(\mathbf{x}) = 0$.

Fisher-Consistent Loss

ℓ(·) is Fisher-consistent if the minimization of the expected risk

$$\hat{\mathbf{f}}(\mathbf{x}) = \arg\min_{\mathbf{f}(\mathbf{x})} \int \ell(f_y(\mathbf{x}))\, p(y, \mathbf{x})\, d(\mathbf{x}, y) \tag{3}$$

has a unique solution, and

$$C(\mathbf{x}) = \arg\max_i \hat{f}_i(\mathbf{x}) = \arg\max_i p(y = i \mid \mathbf{x}). \tag{4}$$

Example: the exponential loss gives the empirical risk

$$L(\mathbf{f}(\mathbf{x}), \mathcal{X}_l) = \sum_{(\mathbf{x},y)\in\mathcal{X}_l} e^{-f_y(\mathbf{x})}$$

Zou et al., Annals of Applied Statistics, 2008
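A small numerical sketch of the exponential empirical risk above (an assumption here: per-class scores are stored as an (n, K) NumPy array `F` whose rows sum to zero, matching the margin-vector condition, and labels `y` are integers in 0..K-1):

```python
import numpy as np

def exp_empirical_risk(F, y):
    """L(f, X_l) = sum_i exp(-F[i, y_i]) over the labeled set."""
    assert np.allclose(F.sum(axis=1), 0.0), "rows of F must sum to 0 (margin vector)"
    return float(np.exp(-F[np.arange(len(y)), y]).sum())
```

These per-sample terms e^{-f_y(x)} reappear later as the labeled-sample weights w_l in the RMSBoost theorem.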

SLIDE 23-25

Margin Assumption

Put the decision boundary over low-density regions of the feature space. This is equivalent to maximizing the margin of the unlabeled samples.

Example

Transductive Support Vector Machines (TSVM, T. Joachims, ICML 1999) use this loss function for the binary SVM classifier h(x):

$$\ell_u(h(\mathbf{x})) = \max(0, 1 - |h(\mathbf{x})|) \tag{5}$$

Multi-Class Unlabeled Margin

We propose to maximize the multi-class margin of the unlabeled samples by using

$$\ell_m(\mathbf{f}(\mathbf{x})) = \max\big(0, M - \max_i f_i(\mathbf{x})\big). \tag{6}$$
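A one-line sketch of Eq. (6) in NumPy (the default margin target `M = 1.0` and the batch layout are illustrative assumptions; the slide does not fix them):

```python
import numpy as np

def unlabeled_margin_loss(F, M=1.0):
    """Eq. (6): max(0, M - max_i f_i(x)) per row of the (n, K) score matrix F."""
    return np.maximum(0.0, M - F.max(axis=1))
```

The hinge is zero once the strongest class response exceeds M, so the loss only pushes uncertain unlabeled samples away from the decision boundary.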

SLIDE 26-27

Manifold Assumption

Enforce the classifier to predict similar labels for similar unlabeled samples.

Example

Graph-based methods, such as Laplacian SVM (Belkin et al., JMLR 2006), use this loss function for the binary SVM classifier h(x):

$$\ell_u(h(\mathbf{x})) = \sum_{\mathbf{x}' \in \mathcal{X}_u,\, \mathbf{x}' \neq \mathbf{x}} s(\mathbf{x}, \mathbf{x}') \big(h(\mathbf{x}) - h(\mathbf{x}')\big)^2. \tag{7}$$

Cluster Prior

We enforce the multi-class classifier to have consistent probabilistic estimates over regions of the feature space formed by similar samples, i.e., clusters.
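A sketch of Eq. (7) for a single sample (assumed layout: `h_x` is the scalar classifier output at x, `h_others` the outputs at the remaining unlabeled samples, and `s` the corresponding similarities):

```python
import numpy as np

def laplacian_loss(h_x, h_others, s):
    """Eq. (7): sum_j s_j * (h(x) - h(x'_j))^2 over the other unlabeled samples."""
    return float(np.sum(s * (h_x - h_others) ** 2))
```

Note the pairwise sum: evaluated over all unlabeled samples this is O(n²), which is one reason graph-based methods scale poorly with the number of unlabeled samples.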

SLIDE 28-33

Cluster Priors

Cluster Prior

A prior is given for every unlabeled sample: $\forall \mathbf{x} \in \mathcal{X}_u, \forall i \in \{1, \cdots, K\}: p_p(y = i \mid \mathbf{x})$. We use the Kullback-Leibler (KL) divergence:

$$\ell_c(\mathbf{f}(\mathbf{x})) = -\mathbf{p}_p^T \mathbf{f}(\mathbf{x}) + \log \sum_{j=1}^{K} e^{f_j(\mathbf{x})}. \tag{8}$$

  • Use any clustering method which suits your application.
  • Use similarity functions if it helps clustering to recover the manifolds.
  • Use any other source of information in the form of priors: label priors, knowledge transfer, human prior knowledge.
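Eq. (8) is the prior-weighted negative log-likelihood of the softmax model; a small NumPy sketch (the max-shift is a standard numerical precaution for the log-sum-exp, not something the slide specifies):

```python
import numpy as np

def cluster_prior_loss(f, p):
    """Eq. (8): -p^T f + log sum_j exp(f_j) for one unlabeled sample.

    f: (K,) score vector; p: (K,) cluster prior, non-negative and summing to 1.
    """
    m = f.max()  # shift scores for a numerically stable log-sum-exp
    return float(-p @ f + m + np.log(np.exp(f - m).sum()))
```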

SLIDE 34

Learning with Functional Gradient Descent

$$F^*(\mathbf{x}) = F_0(\mathbf{x}) - \nu \sum_{t=1}^{T} \left. \frac{\partial L}{\partial F} \right|_{F_{t-1}(\mathbf{x})}$$

Friedman, Annals of Statistics, 2001
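The update above is standard gradient boosting: each stage fits a base learner to the negative functional gradient. A generic sketch under assumed interfaces (`grad_fn(F)` returns ∂L/∂F at the current scores, and `fit_base_learner(X, target)` returns a fitted regressor with a `predict` method; neither name comes from the paper):

```python
import numpy as np

def functional_gradient_descent(X, grad_fn, fit_base_learner, T=100, nu=0.1, K=3):
    """F_t(x) = F_{t-1}(x) - nu * dL/dF|_{F_{t-1}(x)}, approximated stage-wise."""
    F = np.zeros((X.shape[0], K))        # F_0(x) = 0
    learners = []
    for _ in range(T):
        G = grad_fn(F)                   # (n, K) functional gradient at F_{t-1}
        g = fit_base_learner(X, -G)      # base learner approximates the -gradient
        F += nu * g.predict(X)           # damped functional step
        learners.append(g)
    return F, learners
```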

SLIDE 35-36

RMSBoost

The learning task for the t-th boosting stage becomes

$$\mathbf{g}_t(\mathbf{x}) = \arg\max_{\mathbf{g}(\mathbf{x})} \sum_{(\mathbf{x},y)\in\mathcal{X}_l} e^{-f_y(\mathbf{x})}\, \mathbf{y}^T \mathbf{g}(\mathbf{x}) \;+\; \sum_{\mathbf{x}\in\mathcal{X}_u} (\alpha \Delta\mathbf{p} + \beta \mathbf{m})^T \mathbf{g}(\mathbf{x}). \tag{9}$$

Theorem

The solution using a multi-class classifier $C(\mathbf{x}) \in \{1, \cdots, K\}$ is

$$C_t(\mathbf{x}) = \arg\min_{C(\mathbf{x})} \sum_{(\mathbf{x},y)\in\mathcal{X}_l} w_l\, \mathbb{I}(C(\mathbf{x}) \neq y) \;+\; \sum_{\mathbf{x}\in\mathcal{X}_u} w_u\, \mathbb{I}(C(\mathbf{x}) \neq z) \tag{10}$$

where $w_l = e^{-f_y(\mathbf{x})}$ is the weight for a labeled sample, and $z = \arg\max_i (\alpha \Delta p_i + \beta m_i)$ and $w_u = \alpha \Delta p_z + \beta m_z$ are the pseudo-label and weight for an unlabeled sample, respectively.
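The theorem turns each boosting stage into a weighted multi-class classification problem. A sketch of how the pseudo-labels and weights could be computed in NumPy (the array names `dP` and `m` for the per-class terms Δp_i and m_i are assumptions for illustration):

```python
import numpy as np

def unlabeled_pseudo_labels(dP, m, alpha, beta):
    """z = argmax_i (alpha*dP_i + beta*m_i) and w_u = alpha*dP_z + beta*m_z.

    dP, m: (n_u, K) arrays of the cluster-prior and margin gradient terms.
    """
    s = alpha * dP + beta * m
    z = s.argmax(axis=1)                 # pseudo-label per unlabeled sample
    w_u = s[np.arange(len(z)), z]        # its weight
    return z, w_u

def labeled_weights(F, y):
    """w_l = exp(-f_y(x)) per labeled sample, from an (n_l, K) score matrix F."""
    return np.exp(-F[np.arange(len(y)), y])
```

The base learner of each stage is then trained on the union of (x, y, w_l) and (x, z, w_u), which is why an off-the-shelf weighted multi-class learner (here, a random forest) can be plugged in directly.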

SLIDE 37-41

Experimental Settings

RMSBoost is compared with:

  • AdaBoost.ML (Zou et al., Annals of Applied Statistics, 2008)
  • Kernel SVM
  • Multi-Switch TSVM (Sindhwani and Keerthi, SIGIR 2006)
  • SERBoost (Saffari et al., ECCV 2008)
  • RMBoost

Settings:

  • Base learners are tiny extremely randomized forests, each consisting of 10 trees.
  • Boosting iterations are set to 10000.
  • Parameters are selected via 10-fold cross-validation.
  • Results of hierarchical k-means are averaged over 10 runs to estimate the cluster priors.
  • For binary classification methods, we used a 1-vs-all strategy.
  • All reported results are averages of 10 independent runs.
  • All boosting and RF methods are implemented in C++ and use ATLAS subroutines.

SLIDE 42

Machine Learning Datasets

5% of the training data is chosen randomly to form the labeled set; the remaining 95% is used as the unlabeled set.

Dataset        # Train   # Test   # Class   # Feat.
Letter          15000     5000        26        16
SensIt (com)    78823    19705         3       100

Table: Data sets for the machine learning experiments.

Method   AML    SVM    TSVM   SER    RMB    RMSB
Letter   72.3   70.3   65.9   76.5   74.4   79.9
SensIt   79.5   80.2   79.9   81.9   79.0   83.7

Table: Classification accuracy (in %).

SLIDE 43

PASCAL 2006 Object Categorization Dataset

Standard bag-of-words using quantized SIFT on a regular grid at multiple scales. Images are represented by L1-normalized 2-level spatial pyramids. For the SVM, a pyramid χ² kernel is used.

SLIDE 44

PASCAL 2006 Object Categorization Dataset

[Plot: classification accuracy vs. labeled sample ratio r (0.0 to 0.5) on VOC2006, comparing RMSB, SER, AML, SVM, and TSVM.]

SLIDE 45

PASCAL 2006 Object Categorization Dataset

[Plot: classification accuracy and labeled/unlabeled gradient magnitudes vs. boosting iterations T on VOC2006, r = 0.5.]

SLIDE 46-47

PASCAL 2006 Object Categorization Dataset

[Plot: computation time (sec) of RMSB, SER, and TSVM for r = 0.1 and r = 0.5.]

With our current GPU implementation of random forests, one can get a 10 to 20 times speed-up here. An additional 5 times speed-up can be achieved by reducing the number of iterations to 2000.

SLIDE 48

Conclusions: Release version 1.0

We proposed a multi-class semi-supervised boosting method based on margin-maximizing and cluster-prior regularization. By directly addressing the multi-class problem and using efficient base learners, such as random forests, we showed that our algorithm not only outperforms other supervised and semi-supervised methods, but also achieves a high level of computational efficiency. Additionally, our method provides a means to incorporate other knowledge sources, such as label priors, knowledge transfer priors, or human knowledge.

SLIDE 49

DAS-Forests

Semi-Supervised Random Forests, ICCV 2009. Hope to see many of you in Kyoto.

SLIDE 50

Learning with Functional Gradient Descent

$$X^* = X_0 - \nu \sum_{t=1}^{T} L'(X_{t-1})$$

Friedman et al., Annals of Statistics, 2000

SLIDE 52

PASCAL 2006 Object Categorization Dataset

[Plots: classification accuracy vs. α (0.0 to 1.0) and vs. ν (0.00 to 0.10) on VOC2006, r = 0.5.]


SLIDE 60

PASCAL 2006 Object Categorization Dataset

[Plots: classification accuracy and labeled/unlabeled gradient magnitudes vs. T, and weights of correct vs. outlier samples vs. T, on VOC2006, r = 0.5.]

SLIDE 61

Exponential Loss

Example

The exponential loss $\ell(f(\mathbf{x})) = e^{-f(\mathbf{x})}$ is a Fisher-consistent loss; its estimated conditional probabilities can be written as

$$\hat{p}(y = i \mid \mathbf{x}) = \frac{e^{f_i(\mathbf{x})}}{\sum_{j=1}^{K} e^{f_j(\mathbf{x})}}, \tag{11}$$

which is a symmetric multiple logistic transformation. The empirical risk is

$$L(\mathbf{f}(\mathbf{x}), \mathcal{X}_l) = \sum_{(\mathbf{x},y)\in\mathcal{X}_l} e^{-f_y(\mathbf{x})}. \tag{12}$$

Zou et al., Annals of Applied Statistics, 2008

SLIDE 62

Cluster Priors

Cluster Prior

A prior is given for every unlabeled sample: $\forall \mathbf{x} \in \mathcal{X}_u, \forall i \in \{1, \cdots, K\}: p_p(y = i \mid \mathbf{x})$. We use the Kullback-Leibler (KL) divergence to measure the deviation of the model w.r.t. the cluster prior:

$$\ell_c(\mathbf{f}(\mathbf{x})) = D(\mathbf{p}_p \,\|\, \hat{\mathbf{p}}) = -H(\mathbf{p}_p) + H(\mathbf{p}_p, \hat{\mathbf{p}}). \tag{13}$$

Using the symmetric multiple logistic transformation as the probabilistic estimates of the model:

$$\ell_c(\mathbf{f}(\mathbf{x})) = -\mathbf{p}_p^T \mathbf{f}(\mathbf{x}) + \log \sum_{j=1}^{K} e^{f_j(\mathbf{x})}. \tag{14}$$
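A short derivation (not on the slide) of how Eq. (14) follows from Eq. (13): substitute the softmax estimates $\hat{p}_j = e^{f_j(\mathbf{x})} / \sum_k e^{f_k(\mathbf{x})}$ from Eq. (11) into the cross-entropy term and use $\sum_j p_{p,j} = 1$; the entropy term $-H(\mathbf{p}_p)$ is constant in $\mathbf{f}$ and can be dropped from the minimization.

```latex
\begin{aligned}
D(\mathbf{p}_p \,\|\, \hat{\mathbf{p}})
  &= \sum_{j=1}^{K} p_{p,j} \log p_{p,j} \;-\; \sum_{j=1}^{K} p_{p,j} \log \hat{p}_j \\
  &= -H(\mathbf{p}_p) \;-\; \sum_{j=1}^{K} p_{p,j}
     \Big( f_j(\mathbf{x}) - \log \sum_{k=1}^{K} e^{f_k(\mathbf{x})} \Big) \\
  &= -H(\mathbf{p}_p) \;-\; \mathbf{p}_p^T \mathbf{f}(\mathbf{x})
     \;+\; \log \sum_{k=1}^{K} e^{f_k(\mathbf{x})},
\end{aligned}
```

so minimizing Eq. (13) over $\mathbf{f}$ is equivalent to minimizing Eq. (14).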