


Work on Multi-label Classification

Jesse Read Supervised by Bernhard Pfahringer

jmr30@cs.waikato.ac.nz

Machine Learning Group University of Waikato Hamilton New Zealand

Work on Multi-label Classification – p. 1/1


Outline

Multi-label Classification
Multi-label Applications
Problem Transformation
Binary Method
Combination Method
PS: Pruned Sets Method
Results I
Results II
On-line Applications
Experiments II
Summary

Work on Multi-label Classification – p. 2/1


Multi-label Classification

Single-label (Multi-class) Classification:

  • Set of instances D; set of labels (classes) L.
  • For each d ∈ D, select a label (class) l ∈ L.
  • Single-label representation: (d, l)

Multi-label Classification:

  • Set of instances D; set of labels L.
  • For each d ∈ D, select a label subset S ⊆ L.
  • Multi-label representation: (d, S)

Work on Multi-label Classification – p. 3/1
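The two representations can be made concrete in a few lines of plain Python (toy data; the document names are illustrative, not from any real dataset):

```python
# Single-label (multi-class): each instance pairs with exactly one class l from L.
single_label = [("doc0", "A"), ("doc1", "C"), ("doc2", "A")]

# Multi-label: each instance pairs with a label subset S of L.
L = {"A", "B", "C", "D"}
multi_label = [
    ("doc0", {"A", "D"}),
    ("doc1", {"C", "D"}),
    ("doc2", {"A"}),
    ("doc3", {"B", "C"}),
]

# Single-label classification is just the special case |S| = 1;
# every multi-label subset must be drawn from L.
assert all(S <= L for _, S in multi_label)
assert all({l} <= L for _, l in single_label)
```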


Applications

Any applications where data can be classified with more than one category/subject/label/tag:

  • News articles, encyclopedia articles, . . .
  • Web pages (bookmarks, web directories)
  • Academic papers
  • Emails, Newsgroups, Internet forum posts, RSS, . . .
  • Medical text classification
  • Images, video, music, . . .
  • Biological applications (genes, . . . )

Work on Multi-label Classification – p. 4/1


Problem Transformation

  • 1. Problem Transformation

Transform multi-label data into a single-label representation.
Use one or more single-label classifiers.
Transform the classifications into multi-label representations.
e.g.: Binary Method, Combination Method, Ranking Method (next slides . . . )

  • 2. Algorithm Transformation

Transform a single-label algorithm so it can make multi-label classifications.
Uses some form of problem transformation internally.
e.g.: Modifications to AdaBoost, SVM, Naive Bayes, Decision Trees . . .

Work on Multi-label Classification – p. 5/1

PT. Binary Method

One binary (single-label) classifier for each label. A label is relevant, or ¬relevant (1/0).

Multi-label data, L = {A, B, C, D}:

  D    S ⊆ L
  d0   {A, D}
  d1   {C, D}
  d2   {A}
  d3   {B, C}
  dt   {C, D}

Binary transformation, one dataset per label with LA = {A, ¬A}, LB = {B, ¬B}, LC = {C, ¬C}, LD = {D, ¬D} (DA shown; DB, DC, DD are built the same way):

  DA   l ∈ LA
  d0   A
  d1   ¬A
  d2   A
  d3   ¬A

Assumes that all labels are independent. Can be unbalanced by many negative examples.

Work on Multi-label Classification – p. 6/1
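A minimal sketch of the binary method in plain Python. The `MajorityClassifier` is a toy stand-in for any real single-label learner (Naive Bayes, SVM, . . . ), and all function names here are illustrative, not from the slides:

```python
from collections import Counter

def binary_transform(data, L):
    """Build one binary dataset per label: (instance, relevant?) pairs."""
    return {l: [(x, l in S) for x, S in data] for l in L}

class MajorityClassifier:
    """Toy single-label learner: always predicts the majority class.
    Any binary classifier could be plugged in here instead."""
    def fit(self, examples):
        self.label = Counter(y for _, y in examples).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

def binary_method(data, L, learner=MajorityClassifier):
    models = {l: learner().fit(ds)
              for l, ds in binary_transform(data, L).items()}
    # A label is predicted relevant iff its own binary model says so;
    # the per-label decisions are combined back into a label subset.
    return lambda x: {l for l, m in models.items() if m.predict(x)}

data = [("d0", {"A", "D"}), ("d1", {"C", "D"}),
        ("d2", {"A"}), ("d3", {"B", "C"})]
classify = binary_method(data, {"A", "B", "C", "D"})
```

Note how the model for a rare label (B above) sees mostly negative examples, which is exactly the imbalance issue the slide points out.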

PT. Combination Method

One decision involves multiple labels. Each label subset becomes one atomic label.

e.g. L = {A, B, C, D}, L′ = {AD, CD, A, BC}:

  D    S ⊆ L     l ∈ L′
  d0   {A, D}    AD
  d1   {C, D}    CD
  d2   {A}       A
  d3   {B, C}    BC

May generate many single labels (classes) from few examples. Can only predict combinations seen in the training set.

Work on Multi-label Classification – p. 7/1
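The combination transformation itself is a one-liner; the toy learner below just predicts the most frequent combination and is only a placeholder for a real feature-based classifier (names are mine, not from the slides):

```python
from collections import Counter

def to_atomic(data):
    """Each label subset S becomes one atomic class, so any ordinary
    single-label learner can be trained on the result."""
    return [(x, frozenset(S)) for x, S in data]

def combination_method(data):
    atomic = to_atomic(data)
    # Placeholder learner: always predict the most frequent combination.
    top = Counter(y for _, y in atomic).most_common(1)[0][0]
    # Decoding is trivial: the atomic class *is* the label subset.
    return lambda x: set(top)

data = [("d0", {"A", "D"}), ("d1", {"C", "D"}),
        ("d2", {"A"}), ("d3", {"B", "C"})]
classify = combination_method(data)
```

Whatever the learner, only the four combinations seen in training ({A, D}, {C, D}, {A}, {B, C}) can ever be predicted, which is the limitation noted above.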


Ensembles of Pruned Sets (E.PS)

Reduces the number of combinations to only the core label combinations. e.g. 10 examples, 6 combinations:

  Doc.  Labels (S ⊆ L)
  d1    {Sports, Science}
  d2    {Environment, Science, Politics}
  d3    {Sports}
  d4    {Environment, Science}
  d5    {Science}
  d6    {Sports}
  d7    {Environment, Science}
  d8    {Politics}
  d9    {Politics}
  d10   {Science}

Pruning the infrequent combinations removes d1 and d2:

  Doc.  Labels (S ⊆ L)
  d1    {Sports, Science}
  d2    {Environment, Science, Politics}

Lost 20% of data. Can we save any of that data?

  • Yes. By splitting up S into more frequent subsets:

  d1  {Sports, Science}                 →  d1 {Sports},  d1 {Science}
  d2  {Environment, Science, Politics}  →  d2 {Environment, Science},  d2 {Politics}

Reintroducing the split examples gives 12 examples, 4 combinations:

  Doc.  Labels (S ⊆ L)
  d1    {Sports}
  d1    {Science}
  d2    {Environment, Science}
  d2    {Politics}
  d3    {Sports}
  d4    {Environment, Science}
  d5    {Science}
  d6    {Sports}
  d7    {Environment, Science}
  d8    {Politics}
  d9    {Politics}
  d10   {Science}

Work on Multi-label Classification – p. 8/1
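The pruning-and-splitting step can be sketched as follows. The greedy cover used in `split` is one simple guess at the strategy; the exact PS procedure and its parameters are in the ICDM 2008 paper cited on the Results slide:

```python
from collections import Counter

def pruned_sets(data, p=2):
    """PS transformation sketch: keep examples whose label set occurs
    at least p times; split the remaining label sets into frequent
    subsets so those examples are not thrown away entirely."""
    counts = Counter(frozenset(S) for _, S in data)
    frequent = {S for S, c in counts.items() if c >= p}

    def split(S):
        # Greedily cover S with the largest frequent subsets available.
        parts, remaining = [], set(S)
        while remaining:
            cands = [f for f in frequent if f <= remaining]
            if not cands:
                break  # leftover labels are pruned for good
            best = max(cands, key=len)
            parts.append(best)
            remaining -= best
        return parts

    out = []
    for x, S in data:
        S = frozenset(S)
        out.extend((x, T) for T in ([S] if S in frequent else split(S)))
    return out

data = [("d1", {"Sports", "Science"}),
        ("d2", {"Environment", "Science", "Politics"}),
        ("d3", {"Sports"}), ("d4", {"Environment", "Science"}),
        ("d5", {"Science"}), ("d6", {"Sports"}),
        ("d7", {"Environment", "Science"}), ("d8", {"Politics"}),
        ("d9", {"Politics"}), ("d10", {"Science"})]
out = pruned_sets(data, p=2)  # 12 examples, 4 distinct combinations
```

On the slide's toy data this reproduces the example: d1 splits into {Sports} and {Science}, d2 into {Environment, Science} and {Politics}.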


Results

F1 measure; 5 × 2 CV; paired t-test for significance (*):

  D        |D|    |L|   LC    BM      [CM]    PS      E.PS    RAK.
  Scene    2407     6   1.1   0.671-  0.729   0.730   0.752*  0.735
  Medicl    978    45   1.3   0.791*  0.767   0.766   0.764   0.784
  Yeast    2417    14   4.2   0.630   0.633   0.643   0.655*  0.665
  Enron    1702    53   3.4   0.504   0.502   0.520   0.543*  0.543
  Reut.    6000   103   1.5   0.421-  0.482   0.496   0.499*  0.418

Build times for Reuters (seconds): CM 1379, BM 123.

  p        5      4      3       2       1
  PS       41     58     80      135     246
  E.PS     194    277    408     719     1,553

  p        2      25     50      61*     102
  RAK.     10     350    3,627   22,337  DNF

  • J. Read, B. Pfahringer, G. Holmes. Multi-label Classification using Ensembles of Pruned Sets. To appear in ICDM 2008.

Work on Multi-label Classification – p. 9/1
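For reference, one common multi-label F1 variant is example-based averaging (the slide does not specify which variant the experiments use, so this is only an illustration):

```python
def example_based_f1(preds, trues):
    """Per-instance F1 = 2|P ∩ T| / (|P| + |T|), averaged over all
    instances. An instance with no overlap (including the case where
    both sets are empty) scores 0 here."""
    scores = []
    for P, T in zip(preds, trues):
        inter = len(P & T)
        scores.append(0.0 if inter == 0 else 2.0 * inter / (len(P) + len(T)))
    return sum(scores) / len(scores)
```

For example, predicting {A, D} when the true set is {A} gives precision 1/2 and recall 1, hence F1 = 2/3 for that instance.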


On-line Multi-label Classification

Many multi-label data are in fact on-line:

  • New instances incoming
  • Data can be time ordered
  • Concept drift
  • Possibly large collections

Therefore, to work with on-line data, a multi-label algorithm would ideally be:

  • Adaptive
  • Efficient

Note: Not necessarily ‘streaming’.

Work on Multi-label Classification – p. 10/1


Measuring Multi-label Concept Drift

Measuring concept drift with . . .

  • individual labels?
  • combinations?
  • the PS transformation:

[Figure: drift per dataset]
  20NG, NEWS, Enron: on-line; slow, medium, rapid concept drift
  YEAST: randomised
  MEDICAL: ???
  SCENE: ordered train/test

Work on Multi-label Classification – p. 11/1


Results II.

‘Pseudo on-line’ comparison of the BM and E.PS methods:

  • Model built on 100 instances, updated one instance at a time
  • Threshold updated every instance
  • Model rebuilt every 25 instances

  Enron 1−500      Acc.    F1     sec.
  BM               37.27   0.486  35
  E.PS             29.28   0.386  89

  Enron 500−1000   Acc.    F1     sec.
  BM               29.16   0.444  124
  E.PS             34.00   0.451  15

E.PS doesn’t outperform BM under concept drift!

Work on Multi-label Classification – p. 12/1
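The protocol above can be sketched as a generic evaluation loop. `build` and `update` are hypothetical hooks standing in for whichever method (BM, E.PS, . . . ) is being tested, and exact-match accuracy stands in for the slide's measures:

```python
def pseudo_online(data, build, update, init=100, rebuild_every=25):
    """Build on the first `init` instances, then for each later instance:
    predict, score, update the model, and rebuild it from scratch every
    `rebuild_every` instances (a sketch of the slide's protocol)."""
    model = build(data[:init])
    correct = 0
    for i, (x, S) in enumerate(data[init:], start=init):
        correct += (model.predict(x) == S)   # exact-match accuracy
        model = update(model, (x, S))
        if (i - init + 1) % rebuild_every == 0:
            model = build(data[:i + 1])
    return correct / max(1, len(data) - init)

# Tiny demo with a constant-prediction model (pure illustration):
class Constant:
    def __init__(self, data):
        self.S = data[-1][1]
    def predict(self, x):
        return self.S

acc = pseudo_online([(i, {"A"}) for i in range(20)],
                    build=Constant, update=lambda m, ex: m,
                    init=5, rebuild_every=5)
```

Per the slide, thresholding (cutting ranked label scores into a predicted set) would live inside the `update` step.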


Conclusions

Multi-label classification
Problem Transformation (e.g. BM, CM)
PS, E.PS: working with core label combinations
Multi-label classification in an on-line context

Work on Multi-label Classification – p. 13/1


Thank you for listening. Any questions?

Work on Multi-label Classification – p. 14/1