SLIDE 1

Aggressive Double Sampling for Reducing Multi-class Classification to Binary Classification

Bikash Joshi (PhD Student) AMA team, LIG

Supervised By:

  • Prof. Massih-Reza Amini and Dr. Franck Iutzeler

March 20, 2017

Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 1 / 27

SLIDE 2

Outline

  1. Introduction
  2. Multiclass to Binary Reduction
  3. Double-Sampled Multiclass to Binary Reduction
  4. Experimental Results
  5. Conclusion

SLIDE 4

Multiclass Classification: Introduction

[Figures: digit classification, image classification, text classification]

  • Finite set of categories (K > 2)
  • Popular applications: image and text classification

SLIDE 5

Multiclass classification: Related Work

1. Combined approaches based on binary classification:
  ◮ One-Vs-Rest
    ⋆ One binary problem for each class
    ⋆ K binary problems
    ⋆ O(K × d)
  ◮ One-Vs-One
    ⋆ One binary problem for each pair of classes
    ⋆ O(K² × d)
2. Uncombined approaches
  ◮ For example: multiclass SVM, MLP
  ◮ One scoring function per class
3. Logarithmic-time algorithms
  ◮ For example: logTree, Recall-Tree
  ◮ Each leaf node represents a class
  ◮ O(log K)

SLIDE 6

Multiclass classification : Challenges

The number of classes, K, in emerging multiclass problems, for example in text and image classification, may reach 10⁵ to 10⁶ categories. For example:

  ◮ 4 × 10⁶ sites
  ◮ 10⁶ categories
  ◮ 10⁵ editors
  ◮ Imbalanced nature of hierarchies

SLIDE 7

Multiclass classification : Challenges

  • Class imbalance problem
  • The majority of classes have few representative examples
  • Long-tailed distribution

[Figure: histogram of # Classes versus # Documents per class (bins 2-5, 6-10, 11-30, 31-100, 101-200, >200), DMOZ-7500]

SLIDE 8

Text Classification:

Task: automatic classification of an example text into one of a fixed set of categories.

Feature representation:

Bag of words:

  ◮ Extract a vocabulary from the training corpus.
  ◮ Represent each term as 0 or 1.
  ◮ Highly sparse.

Document-class joint feature representation:

  ◮ Inspired by learning to rank.
  ◮ Similarity features between an example and a class of examples.
  ◮ For example: $\sum_{t \in y \cap x} 1$, where x is one document and y is a class of documents.
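As a rough illustration of the example feature above, the sketch below counts the distinct terms shared by a document x and a class y. The function name `common_term_count`, the token-list data layout and the toy documents are assumptions made for illustration; they do not come from the slides.

```python
def common_term_count(document_tokens, class_documents):
    """Sum over terms t in both x and y of 1, i.e. the number of shared distinct terms."""
    doc_terms = set(document_tokens)
    class_terms = set()
    for tokens in class_documents:
        class_terms.update(tokens)  # a class y is treated as the union of its documents
    return len(doc_terms & class_terms)


# Hypothetical toy data: one document and two classes of documents.
x = ["cheap", "flight", "paris"]
travel = [["flight", "hotel", "paris"], ["train", "paris"]]
sports = [["football", "match"], ["tennis", "match", "paris"]]

print(common_term_count(x, travel))  # 2
print(common_term_count(x, sports))  # 1
```

The same document gets a different feature value for each candidate class, which is what lets a single low-dimensional model score every (document, class) pair.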

SLIDE 10

Motivation of our work

  • Baselines: model complexity increases with the number of classes (K) and the feature dimension (d).
  • An algorithm that scales well to large-scale data
  • Does not suffer from the class imbalance problem
  • Less complex model
  • Competitive with state-of-the-art approaches

SLIDE 11

Framework

  • $X \subseteq \mathbb{R}^d$: input space
  • $Y = \{1, \dots, K\}$: output space
  • $S = (x_i^{y_i})_{i=1}^{m}$: training set of i.i.d. pairs
  • $G = \{g : X \times Y \to \mathbb{R}\}$: class of predictors

Instantaneous Loss

\[
e(g, x^y) = \frac{1}{K-1} \sum_{y' \in Y \setminus \{y\}} \mathbb{1}_{g(x^y) \le g(x^{y'})} \tag{1}
\]

  • $\mathbb{1}_{\pi}$ is the indicator function (its value is 0 or 1).
  • Average number of classes, normalized by K − 1, that g scores at least as high as the true class.
  • Ranking loss used in multiclass SVM (Weston et al., 1998).
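To make Eq. (1) concrete, here is a minimal sketch that evaluates the instantaneous loss for one example. It assumes the scores g(x^k) for all classes are already available in a list and that classes are indexed from 0; the name `instantaneous_loss` is illustrative only.

```python
def instantaneous_loss(scores, true_class):
    """Eq. (1): fraction of wrong classes scored at least as high as the true class.

    scores[k] stands for g(x^k); classes are 0-indexed in this sketch.
    """
    num_classes = len(scores)
    if num_classes < 2:
        raise ValueError("need at least two classes")
    true_score = scores[true_class]
    errors = sum(
        1
        for k, score in enumerate(scores)
        if k != true_class and true_score <= score
    )
    return errors / (num_classes - 1)


# Hypothetical scores for K = 4 classes.
scores = [0.2, 0.9, 0.5, 0.7]
print(instantaneous_loss(scores, true_class=1))  # true class ranked first
print(instantaneous_loss(scores, true_class=3))  # one class ranked above the true class
```

A loss of 0 means the true class is ranked strictly first; ties count as errors because the comparison in Eq. (1) is non-strict.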

SLIDE 12

Framework

Empirical Loss

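The empirical error of $g \in G$ over $S$ is:

\[
L_m(g, S) = \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in Y \setminus \{y_i\}} \mathbb{1}_{g(x_i^{y_i}) \le g(x_i^{y'})} \tag{2}
\]

\[
= \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in Y \setminus \{y_i\}} \mathbb{1}_{h(x_i^{y_i}, x_i^{y'}) \le 0}, \quad \text{where } h(x_i^{y_i}, x_i^{y'}) = g(x_i^{y_i}) - g(x_i^{y'}) \tag{3}
\]

  • This resembles a risk based on a binary classification loss.
  • Selecting a hypothesis in G that minimizes the risk over S is equivalent to searching for a hypothesis in H that minimizes the risk over T(S), a set of size m × (K − 1).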
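The equality between Eq. (2) and Eq. (3) can be checked numerically. The sketch below computes the multiclass ranking loss and the 0/1 error of h over all m × (K − 1) pairs on hypothetical scores; the function names and the toy score matrix are assumptions for illustration.

```python
def empirical_loss(score_rows, labels):
    """Eq. (2): average over examples and wrong classes of 1[g(x^y) <= g(x^y')]."""
    m, num_classes = len(score_rows), len(score_rows[0])
    total = 0
    for scores, y in zip(score_rows, labels):
        total += sum(
            1 for k in range(num_classes) if k != y and scores[y] <= scores[k]
        )
    return total / (m * (num_classes - 1))


def pairwise_binary_error(score_rows, labels):
    """Eq. (3): 0/1 error of h = g(x^y) - g(x^y') over all (example, wrong class) pairs."""
    margins = [
        scores[y] - scores[k]
        for scores, y in zip(score_rows, labels)
        for k in range(len(scores))
        if k != y
    ]
    return sum(1 for h in margins if h <= 0) / len(margins)


# Hypothetical scores g(x_i^k) for m = 3 examples and K = 3 classes (0-indexed).
score_rows = [[0.9, 0.1, 0.3], [0.2, 0.4, 0.6], [0.5, 0.5, 0.1]]
labels = [0, 1, 0]

print(empirical_loss(score_rows, labels) == pairwise_binary_error(score_rows, labels))  # True
```

Both functions count the same two mistakes out of six pairs here, which is the sense in which the multiclass problem reduces to a binary one over pairs.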

SLIDE 13

Multiclass to binary reduction example

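We consider the following transformation:

\[
T(S) = \left( \begin{cases}
\big(z_j = (x_i^{k}, x_i^{y_i}),\ \tilde{y}_j = -1\big) & \text{if } k < y_i \\
\big(z_j = (x_i^{y_i}, x_i^{k}),\ \tilde{y}_j = +1\big) & \text{elsewhere}
\end{cases} \right)_{j = (i-1)(K-1)+k}, \qquad |T(S)| = m \times (K - 1)
\]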
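A small sketch of this transformation follows. It only builds the index structure of T(S), using the tuple (i, c) as a stand-in for the joint representation x_i^c. One assumption is made explicit: the loop visits every class k different from y_i, which is how the size m × (K − 1) is obtained here. The function name `transform` is hypothetical.

```python
def transform(labels, num_classes):
    """Build T(S) as a list of (left, right, binary_label) triples.

    (i, c) stands for the joint representation x_i^c; examples and classes
    are numbered from 1 to match the notation of the slides.
    """
    pairs = []
    for i, y in enumerate(labels, start=1):
        for k in range(1, num_classes + 1):
            if k == y:
                continue  # only adversarial classes produce a pair
            if k < y:
                pairs.append(((i, k), (i, y), -1))  # true class on the right
            else:
                pairs.append(((i, y), (i, k), +1))  # true class on the left
    return pairs


# Hypothetical training labels for m = 3 examples and K = 3 classes.
pairs = transform([2, 1, 3], num_classes=3)
print(len(pairs))  # 6, i.e. m * (K - 1)
print(pairs[0])    # ((1, 1), (1, 2), -1)
print(pairs[1])    # ((1, 2), (1, 3), 1)
```

Placing the true class on the left or right depending on the order of k and y_i yields both positive and negative binary labels, so the resulting binary problem is not trivially one-sided.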

SLIDE 14

Multiclass to binary reduction algorithm

[Bikash et al. 2015]

SLIDE 15

Improvements and New challenges

Improvements:

  ◮ One parameter vector for all classes.
  ◮ Low-dimensional feature space.
  ◮ Overcomes class imbalance.

New challenges:

  ◮ The number of transformed examples becomes huge for large K.
  ◮ Large computational overhead.
  ◮ Large memory requirement.

SLIDE 17

Aggressive double sampling

1. Draw uniformly µ examples per class to form a practical set $S_\mu$:
  ◮ Reduces redundancy in examples
  ◮ Emphasizes rare classes
2. For each example $x^y$ in $S_\mu$, draw uniformly κ adversarial classes in $Y \setminus \{y\}$:
  ◮ Reduces time complexity
  ◮ Lowers the memory requirement
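The two sampling steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: classes with fewer than µ examples contribute all of their examples, adversarial classes are drawn without replacement, classes are 0-indexed, and the name `double_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def double_sample(labels, num_classes, mu, kappa, seed=0):
    """Return (example_index, true_class, adversarial_class) triples.

    Step 1: draw up to mu examples uniformly from each class to form S_mu.
    Step 2: for each drawn example, draw kappa adversarial classes uniformly
            from the classes other than its true class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, y in enumerate(labels):
        by_class[y].append(index)

    triples = []
    for y, indices in by_class.items():
        chosen = rng.sample(indices, min(mu, len(indices)))
        others = [k for k in range(num_classes) if k != y]
        for index in chosen:
            for k in rng.sample(others, min(kappa, len(others))):
                triples.append((index, y, k))
    return triples


# Hypothetical imbalanced labels: 50 examples of class 0, 3 of class 1, 1 of class 2.
labels = [0] * 50 + [1] * 3 + [2] * 1
triples = double_sample(labels, num_classes=3, mu=2, kappa=1)
print(len(triples))  # 5 pairs instead of 54 * 2 = 108 for the full reduction
```

Because at most µ examples are kept per class, the frequent class no longer dominates the sampled set, and the number of generated pairs is bounded by µ × κ per class rather than growing with m × (K − 1).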

SLIDE 18

Double Sampled Multi to Binary Reduction

SLIDE 20

Experimental Setup

Datasets:

  ◮ Application: text classification
  ◮ DMOZ and Wikipedia datasets (LSHTC challenge)
  ◮ Pre-processed with stop-word removal and stemming
  ◮ Random samples of 1000, 2000, 3000, 4000, 5000, 7500, 10000 and 20000 classes

Comparison:

  ◮ DS-m2b: proposed double-sampled multiclass-to-binary algorithm
  ◮ OVA: One-Vs-All algorithm
  ◮ M-SVM: Crammer-Singer implementation of multiclass SVM
  ◮ Recall Tree: hierarchical One-Vs-Some algorithm

SLIDE 21

Feature representation Φ(xy)

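Features

  1. $\sum_{t \in y \cap x} \ln(1 + y_t)$
  2. $\sum_{t \in y \cap x} \ln\left(1 + \frac{l_S}{S_t}\right)$
  3. $\sum_{t \in y \cap x} I_t$
  4. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|}\right)$
  5. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot I_t\right)$
  6. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot \frac{l_S}{S_t}\right)$
  7. $\sum_{t \in y \cap x} 1$
  8. $\sum_{t \in y \cap x} \frac{y_t}{|y|} \cdot I_t$
  9. BM25
  10. $d(x^y, \mathrm{centroid}(y))$

Notation:

  • $x_t$: number of occurrences of term t in document x
  • $V$: set of distinct terms in S
  • $y_t = \sum_{x \in y} x_t$, $|y| = \sum_{t \in V} y_t$, $S_t = \sum_{x \in S} x_t$, $l_S = \sum_{t \in V} S_t$
  • $I_t$: idf of the term t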
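As a rough sketch of how some of these features could be computed, the code below evaluates features 1, 2, 4, 6 and 7, which depend only on term counts. It assumes documents are lists of tokens and that the documents of class y are part of the corpus S, so that S_t is positive for every shared term. The function name `joint_features` and the toy corpus are hypothetical; the idf-based features, BM25 and the centroid distance are left out.

```python
import math
from collections import Counter


def joint_features(document, class_documents, corpus):
    """Compute features 1, 2, 4, 6 and 7 for a (document x, class y) pair.

    document: tokens of x; class_documents: token lists of the documents in y;
    corpus: token lists of all documents in S (assumed to contain those of y).
    """
    y_t = Counter(t for doc in class_documents for t in doc)  # y_t
    s_t = Counter(t for doc in corpus for t in doc)           # S_t
    y_len = sum(y_t.values())                                 # |y|
    l_s = sum(s_t.values())                                   # l_S
    shared = set(document) & set(y_t)                         # terms in both x and y
    return {
        1: sum(math.log(1 + y_t[t]) for t in shared),
        2: sum(math.log(1 + l_s / s_t[t]) for t in shared),
        4: sum(math.log(1 + y_t[t] / y_len) for t in shared),
        6: sum(math.log(1 + (y_t[t] / y_len) * (l_s / s_t[t])) for t in shared),
        7: len(shared),
    }


# Hypothetical corpus of three documents; class y holds the first two.
corpus = [["a", "b"], ["a", "c"], ["d"]]
features = joint_features(["a", "d"], corpus[:2], corpus)
print(features[7])  # 1 shared term ("a")
```

Whatever the vocabulary size, each (document, class) pair is mapped to a short fixed-length vector, which is what keeps the feature space low-dimensional.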

SLIDE 22

Results: Runtime Comparison

[Figure: total runtime in seconds (log scale, 10¹ to 10⁶) versus # of classes (1000, 3000, 5000, 7500, 10000, 20000) for OVA, MSVM, Recall Tree and DS-m2b]

SLIDE 23

Results: Memory Comparison

[Figure: total memory usage in GB (10 to 60) versus # of classes (1000, 3000, 5000, 7500, 10000, 20000), with 16GB and 32GB limit lines, for OVA, MSVM, Recall Tree and DS-m2b]

SLIDE 24

Results: Prediction Performance Comparison

[Figure: MAF (0.0 to 0.6) versus # of classes (1000, 3000, 5000, 7500, 10000, 20000) for OVA, MSVM, Recall Tree and DS-m2b]

SLIDE 26

Conclusion:

  • Multiclass-to-binary reduction handles the large-class scenario and overcomes the class imbalance problem.
  • Double sampling further reduces computational complexity and memory usage.

SLIDE 27

Questions?
