

SLIDE 1

Introduction QCLR HOMER HOMER+QCLR Conclusions

On the Combination of two Decompositive Multi-Label Classification Methods

Grigorios Tsoumakas¹, Eneldo Loza Mencía², Ioannis Katakis¹, Sang-Hyeun Park², and Johannes Fürnkranz²

¹Aristotle University of Thessaloniki, Greece
²Technische Universität Darmstadt, Germany

11 September 2009

Tsoumakas, Loza Mencía, Katakis, Park & Fürnkranz, ECML PKDD 2009 Workshop on Preference Learning (PL-09)

SLIDE 2

Outline

Introduction
Background
QCLR
HOMER
Evaluation
Conclusions


SLIDE 3

Multi-Label Classification

In multi-label classification, each object is assigned a set of labels simultaneously (application domains: text, biology, music, etc.)



SLIDE 5

Methods

  • A. Problem Adaptation: extend algorithms so that they handle multi-label data directly (e.g. MLkNN, BPMLL)
  • B. Problem Transformation: transform the learning task into one or more single-label classification tasks (e.g. Label Powerset (LP), Binary Relevance (BR))
  • Decompositive approaches focus on problems with a large number of labels (e.g. HOMER, QCLR)

Main idea of this work: combine two state-of-the-art decompositive methods (HOMER + QCLR) in order to handle problems with a large number of labels more effectively and efficiently.

SLIDE 6

QWeighted Calibrated Label Ranking (1/4)

Based on Ranking by Pairwise Comparison (RPC) [Hüllermeier et al., AIJ08].

RPC transformation: learn one binary model for each pair of labels.

Ex#  Labelset   1vs2   1vs3   1vs4   2vs3   2vs4   3vs4
1    {2,3,4}    false  false  false  -      -      -
2    {1}        true   true   true   -      -      -
3    {3,4}      -      false  false  false  false  -
4    {1,4}      true   true   -      -      false  false

(In model "i vs j" the class is true when label i is the relevant one and false when label j is; examples where both or neither of the two labels are relevant are omitted, marked "-".)
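As an illustration, the RPC transformation can be sketched in a few lines of Python. This is a hypothetical sketch using the toy labelsets from the slide (function name and data layout are illustrative, not the authors' implementation):

```python
from itertools import combinations

def rpc_transform(examples, labels):
    """Build one binary training set per label pair (i, j).
    An example is used for model (i, j) only if exactly one of the two
    labels is relevant; its class is True when label i is the relevant one."""
    datasets = {pair: [] for pair in combinations(labels, 2)}
    for x, labelset in examples:
        for i, j in datasets:
            if (i in labelset) != (j in labelset):  # exactly one is relevant
                datasets[(i, j)].append((x, i in labelset))
    return datasets

# The four toy examples from the table above (feature vectors elided):
examples = [("x1", {2, 3, 4}), ("x2", {1}), ("x3", {3, 4}), ("x4", {1, 4})]
datasets = rpc_transform(examples, [1, 2, 3, 4])
# model 1vs2 is trained on x1 (False), x2 (True), x4 (True); x3 is skipped
```

Note how the number of models grows quadratically in the number of labels, which motivates the QWeighted speed-up discussed later in the deck.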


SLIDE 8

QCLR (2/4)

RPC classification: a new instance x is submitted to all pairwise models (1vs2, 1vs3, 1vs4, 2vs3, 2vs4, 3vs4); each model votes for one of its two labels, and the labels are ranked by total votes, e.g. the ranking 3, 1, 2, 4.

How do we obtain a bipartition? Introduce a virtual label λV that separates the positive from the negative labels (Calibrated Label Ranking) [Fürnkranz et al., MLJ08].

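The calibrated voting step can be sketched as follows. This is a minimal, hypothetical sketch in which each pairwise model is just a function returning True when its first label wins; it is not the paper's code:

```python
def clr_predict(x, models, labels, virtual="V"):
    """Rank labels by pairwise votes; the labels that collect more votes
    than the virtual label form the predicted (positive) set."""
    votes = {lab: 0 for lab in list(labels) + [virtual]}
    for (i, j), model in models.items():
        votes[i if model(x) else j] += 1   # model(x) is True if i wins
    ranking = sorted(votes, key=votes.get, reverse=True)
    positives = {lab for lab in labels if votes[lab] > votes[virtual]}
    return ranking, positives

# Toy models for labels 1 and 2: label 1 beats both label 2 and the virtual label
models = {(1, 2): lambda x: True, (1, "V"): lambda x: True, (2, "V"): lambda x: False}
ranking, positives = clr_predict(None, models, [1, 2])
# votes: 1 -> 2, "V" -> 1, 2 -> 0, so only label 1 is predicted positive
```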

SLIDE 9

QCLR (3/4)

CLR transformation: additional pairwise models, one per label against the virtual label, are necessary.

Ex#  Labelset   1vs2   1vs3   1vs4   2vs3   2vs4   3vs4   1vsV   2vsV   3vsV   4vsV
1    {2,3,4}    false  false  false  -      -      -      false  true   true   true
2    {1}        true   true   true   -      -      -      true   false  false  false
3    {3,4}      -      false  false  false  false  -      false  false  true   true
4    {1,4}      true   true   -      -      false  false  true   false  false  true

(The pairwise columns are as before; in model "i vs V" the class is true exactly when label i is relevant, so every example is used.)


SLIDE 10

QCLR (4/4)

CLR classification: the new instance x is submitted to all pairwise models, now including the calibrated models 1vsV, 2vsV, 3vsV, 4vsV; the votes yield the ranking 1, V, 2, 4, 3, and the labels ranked above the virtual label V form the predicted set.

Limitation: a quadratic number of models has to be queried.
Solution: Quick Weighted Voting (QWeighted) [Loza Mencía et al., ESANN09].
Its complexity is n + dn log(n) model evaluations, where n is the number of labels and d is the average number of relevant labels per example (cardinality).
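The early-stopping idea behind QWeighted can be sketched as follows. This is a deliberately simplified binary-vote variant under assumed names (the actual algorithm accumulates real-valued weighted votes from the classifiers):

```python
def qweighted_top(x, models, labels):
    """Find the top-ranked label while evaluating as few pairwise models
    as possible: always extend the label with the smallest accumulated
    loss; once it has played all its pairings it cannot be overtaken."""
    loss = {lab: 0 for lab in labels}
    played = {lab: set() for lab in labels}
    evaluations = 0
    while True:
        best = min(labels, key=lambda lab: loss[lab])
        todo = [lab for lab in labels if lab != best and lab not in played[best]]
        if not todo:
            return best, evaluations      # best has survived all comparisons
        opp = todo[0]
        pair = (best, opp) if (best, opp) in models else (opp, best)
        winner, loser = (pair if models[pair](x) else pair[::-1])
        loss[loser] += 1
        played[best].add(opp)
        played[opp].add(best)
        evaluations += 1

# Toy setting: label 1 wins every comparison, so model 2vs3 is never queried
models = {(1, 2): lambda x: True, (1, 3): lambda x: True, (2, 3): lambda x: True}
top, evaluations = qweighted_top(None, models, [1, 2, 3])
```

The saving comes from never evaluating models between two labels that are both already losing heavily; repeating the procedure until the virtual label is ranked yields the bipartition.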



SLIDE 12

HOMER - Hierarchy Of MultiLabel ClassifiERs (1/2)

Main idea [Tsoumakas et al., ECMLPKDD08w]: transform a multi-label problem with a large number of labels into many hierarchically structured, simpler sub-problems.

Step 1. Hierarchical Organization of Labels

[Figure: the label set {λ1, …, λ8} is split into k = 3 groups {λ1, λ6, λ8}, {λ4, λ5, λ2}, {λ7, λ3}, represented by meta-labels µ1, µ2, µ3.]

k: the branching factor. Meta-label µn: represents the union of the labels of its node.



SLIDE 14

HOMER - Hierarchy Of MultiLabel ClassifiERs (2/2)

Step 2. Assign a multi-label classifier to each internal node

[Figure: classifiers h0 (root), h1, h2, h3 are attached to the internal nodes of the label hierarchy; a test instance x is routed down the tree to produce the prediction.]

Advantages
1. Classification time: only a few classifiers of the hierarchy are invoked.
2. Prediction performance: balanced examples for each classifier.
3. Training time: smaller datasets at each node.
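The top-down prediction behind advantage 1 can be sketched as follows. This is a hypothetical toy structure (node classifiers are plain functions returning the indices of the child meta-labels they predict as relevant), not the Mulan implementation:

```python
class Node:
    def __init__(self, label=None, children=(), classifier=None):
        self.label = label            # set only for leaf nodes
        self.children = list(children)
        self.classifier = classifier  # x -> indices of relevant children

def homer_predict(node, x):
    """Descend only into children whose meta-label is predicted as
    relevant; the leaves reached form the predicted label set."""
    if not node.children:
        return {node.label}
    predicted = set()
    for i in node.classifier(x):
        predicted |= homer_predict(node.children[i], x)
    return predicted

# Toy hierarchy over 4 labels with branching factor k = 2
left = Node(children=[Node(label="l1"), Node(label="l2")],
            classifier=lambda x: [0, 1])   # predicts both children
right = Node(children=[Node(label="l3"), Node(label="l4")],
             classifier=lambda x: [1])
root = Node(children=[left, right], classifier=lambda x: [0])  # only left subtree
# homer_predict(root, x) invokes only the root and left classifiers
```

Because the right subtree's meta-label is not predicted, its classifier is never invoked, which is exactly where the classification-time saving comes from.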



SLIDE 16

Label Distribution (1/2)

Open issue: how should we distribute the labels into the k child nodes (groups)?

Criteria
1. The labels of a group should co-occur as much as possible: fewer predicted meta-labels ⇒ fewer activated classifiers ⇒ shorter classification times.
2. Groups should be of equal size: a balanced distribution of examples for each meta-label ⇒ improved predictive performance; a balanced tree can also improve classification times.


SLIDE 17

Label Distribution (2/2)

Balanced k-Means: an extension of k-means that produces equal-sized clusters. Each cluster maintains a list of its labels, ordered by similarity (Hamming distance) to the cluster centroid; when a cluster overflows, its most distant label is moved to the next most similar cluster.
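A greedy stand-in for the balanced assignment step can be sketched as follows. This is a simplified variant of what the slide describes (it processes label/cluster pairs by increasing Hamming distance instead of handling overflow by eviction, and all names are illustrative):

```python
def balanced_assign(label_vectors, centroids, capacity):
    """Greedy balanced assignment: visit (label, cluster) pairs in order
    of increasing Hamming distance; a label joins the closest cluster
    that still has room, so no cluster exceeds `capacity`."""
    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))
    pairs = sorted((hamming(label_vectors[i], centroids[j]), i, j)
                   for i in range(len(label_vectors))
                   for j in range(len(centroids)))
    clusters = [[] for _ in centroids]
    assigned = set()
    for _, i, j in pairs:
        if i not in assigned and len(clusters[j]) < capacity:
            clusters[j].append(i)
            assigned.add(i)
    return clusters

# Four labels described by their binary occurrence vectors over four examples
labels = [(1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (0, 1, 1, 1)]
centroids = [(1, 1, 0, 0), (0, 0, 1, 1)]
clusters = balanced_assign(labels, centroids, capacity=2)
# labels 0 and 1 join the first centroid, labels 2 and 3 the second
```

In the full algorithm this assignment would alternate with centroid updates, as in ordinary k-means.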


SLIDE 18

Motivation of Combination

Why combine HOMER with QCLR?

1. HOMER+QCLR will require less memory, less training time, and less classification time than QCLR alone.

2. HOMER+QCLR will have higher predictive performance (e.g. compared to using binary relevance at each node).



SLIDE 20

Evaluation Goals

Primary Questions

1. Can HOMER improve QCLR in terms of predictive performance, training time, and classification time?

2. Can HOMER+QCLR outperform HOMER+BR in terms of predictive performance, and at what extra cost in training and classification times?

Secondary Questions

1. What is the effect of the label-distribution method in HOMER: clustering, balanced clustering, or random distribution?

2. What is the effect of the branching factor k?

SLIDE 21

Experimental Setup

Methods

Base single-label classifier: C4.5
Base multi-label classifiers: BR, QCLR
HOMER variants: H+BR, H+QCLR
Partitioning: Balanced k-Means (B), EM Clustering (C), Random (R)
Number of partitions k ranging from 3 to 10

Datasets

name       train  test   features  labels  cardinality  density  labelsets
HiFind     16452  16519  98        632     37.304       0.059    32734
eccv2002   42379  4686   36        374     3.525        0.009    3175
jmlr2003   48859  16503  46        153     3.071        0.020    3115
mediamill  30993  12914  120       101     4.376        0.043    6555

Software

Mulan - http://sourceforge.net/projects/mulan/


SLIDE 22

The Clustering Factor - Training Time

[Figure: training time vs. branching factor k (3 to 10) for CLR and BR under Random (R), Balanced k-Means (B), and EM Clustering (C) partitioning; panels: (a) mediamill, (b) jmlr2003, (c) eccv2002, (d) HiFind.]


SLIDE 23

The Clustering Factor - Classification Time

[Figure: classification time vs. branching factor k (3 to 10) for CLR and BR under Random (R), Balanced k-Means (B), and EM Clustering (C) partitioning; panels: (e) mediamill, (f) jmlr2003, (g) eccv2002, (h) HiFind.]


SLIDE 24

The Clustering Factor - Recall

[Figure: recall vs. branching factor k (3 to 10) for CLR and BR under Random (R), Balanced k-Means (B), and EM Clustering (C) partitioning; panels: (i) mediamill, (j) jmlr2003, (k) eccv2002, (l) HiFind.]


SLIDE 25

The Clustering Factor - Precision

[Figure: precision vs. branching factor k (3 to 10) for CLR and BR under Random (R), Balanced k-Means (B), and EM Clustering (C) partitioning; panels: (m) mediamill, (n) jmlr2003, (o) eccv2002, (p) HiFind.]


SLIDE 26

The Clustering Factor - micro F

[Figure: micro F vs. branching factor k (3 to 10) for CLR and BR under Random (R), Balanced k-Means (B), and EM Clustering (C) partitioning; panels: (q) mediamill, (r) jmlr2003, (s) eccv2002, (t) HiFind.]


SLIDE 27

The Clustering Factor - Observations

Increasing k leads to:
  • shorter classification times (a shallower tree of classifiers)
  • better precision
  • worse recall
Compared to random partitioning, balanced clustering takes advantage of label similarity and can lead to lower overall training and classification times, especially for dense datasets.


SLIDE 28

micro F1

Method   mediamill  jmlr2003  eccv2002  HiFind
BR       50.55 %    15.09 %   12.34 %   51.65 %
QCLR     55.04 %    8.45 %    7.21 %    -
H+BR     50.23 %    15.36 %   18.14 %   51.76 %
H+QCLR   53.13 %    15.55 %   19.70 %   54.65 %

HOMER improves the predictive performance of BR and QCLR, especially on datasets with a large number of labels.

HOMER+QCLR shows better predictive performance than HOMER+BR.


SLIDE 29

Training Time

Method   mediamill  jmlr2003  eccv2002  HiFind
BR       2413.40    2801.17   2701.32   4179.66
QCLR     7423.19    6542.51   7460.14   -
H+BR     1065.21    1101.61   1144.47   2345.39
H+QCLR   1667.29    1871.00   1836.34   3801.53

HOMER reduces training time for both BR and QCLR.


SLIDE 30

Testing Time

Method   mediamill  jmlr2003  eccv2002  HiFind
BR       3.84       6.67      5.47      50.47
QCLR     103.59     119.28    154.65    -
H+BR     4.35       7.70      4.48      48.77
H+QCLR   4.90       9.26      5.62      60.02

HOMER significantly reduces testing time for QCLR.


SLIDE 31

Conclusions & Future Work

Conclusions
A combination of two decompositive methods (HOMER and QCLR) that builds fewer models than QCLR alone:
  • faster training
  • faster testing
  • lower memory requirements
It achieves better predictive performance than QCLR, and better predictive performance than HOMER+BR at a small expense in training and classification time.

Future Work
  • In-depth analysis of when and why HOMER+QCLR works
  • More datasets
  • More base classifiers


SLIDE 32

End of presentation

Thank you for your attention!
