On-line Hierarchical Multi-label Classification: last 6 months



slide-1
SLIDE 1

On-line Hierarchical Multi-label Classification

last 6 months

Jesse Read

jesse.read@gmail.com

University of Waikato

On-line Hierarchical Multi-label Classification – p. 1/3

slide-2
SLIDE 2

Outline

  • Multi-label classification (review)
  • Problem Transformation (review)
  • Multi-labeled data (a closer look)
  • PPT: A new Problem Transformation method
  • Experiments I
  • PPT: An extension
  • Experiments II
  • Experiments III
  • PPT: Some related applications
  • Summary, current and planned work

slide-13
SLIDE 13

Single-label (Multi-class) Classification

Set of documents D. Set of labels L.
For each d ∈ D, select a label l ∈ L.
Single-label representation: (d, l)
e.g. L = {Sport, Environment, Science, Politics}:

Document (d)                                               Label (l ∈ L)
“NZ scientists help discover solar system in our galaxy...”  Science
“Antarctic food chain in danger...”                          Science
“Top sports stars fuelling success...”                       Sport
“Steeled for ironman...”                                     Sport
“Greens claim report doctored...”                            Politics
“Revealed: Polluting impact of humans on the oceans...”      Environment
“Union muzzled while awaiting poll watchdog’s ruling...”     Politics
“Technology pushes sporting boundaries...”                   Science

slide-24
SLIDE 24

Multi-label Classification

Set of documents D. Set of labels L.
For each d ∈ D, select a label subset S ⊆ L.
Multi-label representation: (d, S)
e.g. L = {Sport, Environment, Science, Politics}:

Document (d)                                               Label (S ⊆ L)
“NZ scientists help discover solar system in our galaxy...”  {Science}
“Antarctic food chain in danger...”                          {Science, Environment}
“Top sports stars fuelling success...”                       {Sport}
“Steeled for ironman...”                                     {Sport}
“Greens claim report doctored...”                            {Politics, Environment}
“Revealed: Polluting impact of humans on the oceans...”      {Environment, Science}
“Union muzzled while awaiting poll watchdog’s ruling...”     {Politics}
“Technology pushes sporting boundaries...”                   {Sport, Science}

slide-25
SLIDE 25

Applications of ML Classification

Learn to automatically label (categorise / tag):

  • Articles (news, encyclopedia, academic, ...)
  • Emails
  • Web pages (e.g. the Yahoo! directory)
  • RSS feeds
  • Web bookmarks
  • Medical text
  • Biological applications

slide-26
SLIDE 26

Problem Transformation (PT)

A multi-label problem can be transformed into a single-label (multi-class) problem using one of three transformation techniques:

  1. PT1 (BC) Binary Classifiers Method
  2. PT2 (RT) Ranking with Threshold Method
  3. PT3 (DS) Distinct Subsets Method

We can then use a single-label classifier and transform its classification back to multi-label. These three methods form the base of practically all existing work, even so-called “algorithm adaptation” methods.

slide-27
SLIDE 27

PT1 (Binary Classifiers)

Makes a series of binary classifiers/classifications, one per label.
Label set: L = {Sport, Environment, Science, Politics}

Multi-label Dtrain; (d, S ⊆ L)

d1, {Sport, Politics}
d2, {Science, Politics}
d3, {Sport}
d4, {Environment, Science}

Single-label Dtrain; (d, l ∈ L)

C0  (d1, Sport),    (d2, ¬Sport),   (d3, Sport),     (d4, ¬Sport)
C1  (d1, ¬Enviro.), (d2, ¬Enviro.), (d3, ¬Enviro.),  (d4, Enviro.)
C2  (d1, ¬Science), (d2, Science),  (d3, ¬Science),  (d4, Science)
C3  (d1, Politics), (d2, Politics), (d3, ¬Politics), (d4, ¬Politics)

dx = “Revealed: Polluting Impact of Humans on the Oceans...”

Single-label Test; (d, l ∈ {0, 1})

C0  (dx, ¬Sport)
C1  (dx, Enviro.)
C2  (dx, Science)
C3  (dx, ¬Politics)

Multi-label Test; (d, S ⊆ L)

dx, {Environment, Science}

Issues:

  • Slow, needs |L| classifiers
  • Assumes that all labels are independent
  • Can get overwhelmed by negative examples
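The PT1 transformation above can be sketched in a few lines of Python. This is only an illustration of the data transformation, not the implementation used in the talk: the function names `pt1_transform` and `pt1_inverse` are hypothetical, and the per-classifier answers for dx are copied from the slide instead of coming from a trained model.

```python
LABELS = ["Sport", "Environment", "Science", "Politics"]

# Multi-label training set from the slide: (document id, label subset S).
D_TRAIN = [
    ("d1", {"Sport", "Politics"}),
    ("d2", {"Science", "Politics"}),
    ("d3", {"Sport"}),
    ("d4", {"Environment", "Science"}),
]


def pt1_transform(data, labels):
    """Build one binary training set per label: True means 'label present'."""
    return {label: [(d, label in S) for d, S in data] for label in labels}


def pt1_inverse(binary_predictions):
    """Collect the labels whose binary classifier answered 'present'."""
    return {label for label, present in binary_predictions.items() if present}


binary_sets = pt1_transform(D_TRAIN, LABELS)

# Answers of C0..C3 for dx, taken from the slide (no classifier is trained here).
dx_answers = {"Sport": False, "Environment": True, "Science": True, "Politics": False}
dx_labels = pt1_inverse(dx_answers)
```

Each of the |L| binary training sets contains every document, which is where the slowness and the large share of negative examples come from.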

slide-34
SLIDE 34

PT2 (Ranking with Threshold)

Label subset from a label ranking (with threshold t).

L = {Sports, Environment, Science, Politics}

ML Dtrain; (d, S ⊆ L)

d1, {Sports, Politics}
d2, {Science, Politics}
d3, {Sports}
d4, {Environment, Science}

SL Dtrain; (d, l ∈ L)

d1, Sports
d1, Politics
d2, Science
d2, Politics
d3, Sports
d4, Science
d4, Environment

dx = “Revealed: Polluting Impact of Humans on the Oceans...”

SL Test; (d, l ∈ L)

                 P(l|d)
dx, Sports       0.13
dx, Environment  0.93
dx, Science      0.99
dx, Politics     0.21

Labels kept, where t = 0.5:

                 P(l|d)
dx, Environment  0.93
dx, Science      0.99

ML Test; (d, S ⊆ L)

dx, {Environment, Science}

Issues:

  • Threshold selection / classifier selection
  • Assumes that all labels are independent
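A minimal sketch of the PT2 steps above, under stated assumptions: the names `pt2_transform` and `pt2_inverse` are hypothetical, a label is kept when its score is strictly greater than t (the slide does not say whether the comparison is strict), and the score 0.21 is assumed here to belong to Politics. The scores themselves are copied from the slide, not produced by a classifier.

```python
# Multi-label training set from the slide: (document id, label subset S).
D_TRAIN = [
    ("d1", {"Sports", "Politics"}),
    ("d2", {"Science", "Politics"}),
    ("d3", {"Sports"}),
    ("d4", {"Environment", "Science"}),
]


def pt2_transform(data):
    """Copy each document once per label: (d, S) becomes (d, l) for every l in S."""
    return [(d, label) for d, S in data for label in sorted(S)]


def pt2_inverse(scores, t=0.5):
    """Keep every label whose score P(l|d) is above the threshold t (strict, by assumption)."""
    return {label for label, score in scores.items() if score > t}


single_label_train = pt2_transform(D_TRAIN)

# P(l|d) for dx as shown on the slide; assigning 0.21 to Politics is an assumption.
scores_dx = {"Sports": 0.13, "Environment": 0.93, "Science": 0.99, "Politics": 0.21}
dx_labels = pt2_inverse(scores_dx, t=0.5)
```

The result depends directly on t: with t = 0.95 only Science would survive, which illustrates the threshold-selection issue listed on the slide.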

slide-41
SLIDE 41

PT3 (Label Subsets Method)

Each label subset becomes an atomic label.

L = {Sports, Environment, Science, Politics}

Multi-label Dtrain; (d, S ⊆ L)

d1, {Sports, Politics}
d2, {Science, Politics}
d3, {Sports}
d4, {Environment, Science}

Single-label Dtrain; (d, l ∈ distinct(l ∈ SLDtrain))

d1, Sports_Politics
d2, Science_Politics
d3, Sports
d4, Environment_Science

dx = “Revealed: Polluting Impact of Humans on the Oceans...”

Single-label Test; (d, l ∈ distinct(l ∈ SLDtrain))

dx, Environment_Science

Multi-label Test; (d, S ⊆ L)

dx, {Environment, Science}

Issues:

  • May generate many classes for few instances (very slow with SMO)
  • Can only predict combinations seen in the training set
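The PT3 encoding above can be illustrated as follows. This is a sketch only: `encode` and `decode` are hypothetical helper names, and joining the labels with an underscore in the order of L is an assumption chosen to reproduce the class names shown on the slide.

```python
LABELS = ["Sports", "Environment", "Science", "Politics"]

# Multi-label training set from the slide: (document id, label subset S).
D_TRAIN = [
    ("d1", {"Sports", "Politics"}),
    ("d2", {"Science", "Politics"}),
    ("d3", {"Sports"}),
    ("d4", {"Environment", "Science"}),
]


def encode(S):
    """Turn a label subset into one atomic class name, e.g. 'Sports_Politics'."""
    return "_".join(label for label in LABELS if label in S)


def decode(class_name):
    """Turn an atomic class name back into a label subset."""
    return set(class_name.split("_"))


single_label_train = [(d, encode(S)) for d, S in D_TRAIN]

# The single-label classifier can only ever answer with one of these classes.
known_classes = {c for _, c in single_label_train}

# The slide's prediction for dx, mapped back to a label subset.
dx_labels = decode("Environment_Science")
```

Because `known_classes` is built from the training data alone, a combination such as {Sports, Science} can never be predicted in this example.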

slide-48
SLIDE 48

Initial Discoveries

ML data has extra complexity:

  • A variable number of labels per instance
  • Many distinct label combinations (up to min(|D|, 2^|L|))
  • Label imbalances / variations / concept shifts tend to be exaggerated
  • Relationships between labels are important:
      • label X may only ever occur by itself
      • labels X and Y may occur together often
      • labels X and Y may never occur together
      • labels X and Y may occur in the presence of Z

slide-49
SLIDE 49

Initial Conclusions

Only PT3 inherently takes label relationships into account. This is crucial to performance: PT3 usually performs best, but:

  • it is affected negatively by the extra complexity
  • it is much slower (especially with SMO)
  • it is liable to do terribly with some datasets

Why?

slide-50
SLIDE 50

The “Long Tail”

Some label combinations occur a lot, most occur infrequently


slide-51
SLIDE 51

A Multi-label Confusion Matrix

Cmbs/Lbls   L0 L1 L2 L3 . L16 L17 L18 L19      FP    TP    Err
L14         25 23 4 28 . 39 2 30               342   486   0.70
L14+L19     12 4 13 . 18 6 55                  375   4     93.75
L0          20 6 3 2 . 17 1                    169   446   0.38
L13         7 12 6 13 . 22 2 2                 181   204   0.89
L4          34 23 3 26 . 28 1 6                411   151   2.72
L19         10 23 10 4 . 12 1 3                197   600   0.33
. . .
L1+L18      1 3 . 1                            28    36    0.78
L15         1 25 4 . 14 10 1                   258   432   0.60
L16         1 21 14 29 . 2 9                   385   399   0.96
L5          49 11 21 . 37 5 35 1               494   484   1.02
L3          1 49 13 . 37 2 20 3                411   590   0.70

Some label combinations can cause most of the trouble

slide-52
SLIDE 52

A Graph View

Figure 1: The Medical data, each edge represents ≥ 1 co-occurrence (100% of 978 instances)

Figure 2: The Medical data, each edge represents ≥ 2 co-occurrences (97% of 978 instances)

Figure 3: The Medical data, each edge represents ≥ 3 co-occurrences (92% of 978 instances)

slide-55
SLIDE 55

Pruned Problem Transformation (PPT)

Prune/ignore all label subsets which occur at most p times (i.e. ≤ p times) in the training data. E.g. where p = 1, the subsets marked (*) occur only once and are pruned:

Doc.  Labels (S ⊆ L)
1     {Sports,Politics} (*)
2     {Science,Politics} (*)
3     {Sports}
4     {Environment,Science}
5     {Science}
6     {Sports}
7     {Environment,Science}
8     {Sports}
9     {Science}
10    {Science}

After pruning:

Doc.  Labels (S ⊆ L)
3     {Sports}
4     {Environment,Science}
5     {Science}
6     {Sports}
7     {Environment,Science}
8     {Sports}
9     {Science}
10    {Science}

Lost 20% of data. Can we save any of that data? By splitting up S into sub-subsets which occur > p times.
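The pruning step in the example above can be sketched as follows, assuming the ten documents shown on the slide. The function name `prune` is hypothetical; the rule itself (discard every example whose label subset occurs ≤ p times) is the one stated on the slide.

```python
from collections import Counter

# The ten training examples from the slide: (document number, label subset S).
DATA = [
    (1, {"Sports", "Politics"}),
    (2, {"Science", "Politics"}),
    (3, {"Sports"}),
    (4, {"Environment", "Science"}),
    (5, {"Science"}),
    (6, {"Sports"}),
    (7, {"Environment", "Science"}),
    (8, {"Sports"}),
    (9, {"Science"}),
    (10, {"Science"}),
]


def prune(data, p):
    """Keep only the examples whose label subset occurs more than p times."""
    counts = Counter(frozenset(S) for _, S in data)
    return [(d, S) for d, S in data if counts[frozenset(S)] > p]


kept = prune(DATA, p=1)
lost_fraction = 1 - len(kept) / len(DATA)
```

With p = 1, documents 1 and 2 are dropped, which is the 20% loss mentioned on the slide.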

slide-60
SLIDE 60

PPT -N A

Method A: Before deleting (d, S), greedily divide S into disjoint subsets S1, S2, ···, Sn (S1 ∩ S2 ∩ ··· ∩ Sn = ∅) which do occur > p times. E.g.:

S   {l2, l5, l7}
S1  {l2, l5}
S2  {l7}

Assume that L = {l1, l2, ···, l7}, and that S does not occur > p times, whereas S1, S2, etc. do occur > p times. (d, S) will be discarded, while (d, S1), (d, S2), etc. will be added to the training set.
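One possible reading of Method A is sketched below. The slide does not say in which order the greedy division proceeds, so the ordering used here (largest subsets first, ties broken by frequency) is an assumption, as are the function name `split_disjoint` and the subset counts in `COUNTS`, which are invented purely to reproduce the slide's example.

```python
# Hypothetical occurrence counts of label subsets in the training data.
COUNTS = {
    frozenset({"l2", "l5"}): 5,
    frozenset({"l5", "l7"}): 3,
    frozenset({"l7"}): 4,
    frozenset({"l2", "l5", "l7"}): 1,  # S itself is infrequent
}


def split_disjoint(S, counts, p):
    """Greedily pick disjoint proper subsets of S that occur more than p times.

    Assumed greedy order: larger subsets first, then more frequent ones.
    """
    frequent = [c for c, k in counts.items() if k > p and c < S]
    frequent.sort(key=lambda c: (-len(c), -counts[c], sorted(c)))
    remaining, chosen = set(S), []
    for candidate in frequent:
        if candidate <= remaining:  # still disjoint from everything chosen so far
            chosen.append(candidate)
            remaining -= candidate
    return chosen


S = frozenset({"l2", "l5", "l7"})
parts = split_disjoint(S, COUNTS, p=2)
```

Under these assumptions (d, S) would be replaced by (d, {l2, l5}) and (d, {l7}), matching S1 and S2 on the slide; {l5, l7} is skipped because l5 is already used.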

slide-61
SLIDE 61

PPT -N B

Method B: Choose the biggest n subsets Si ⊂ S which do occur > p times (n is a new parameter). E.g. B3:

S   {l2, l5, l7}
S1  {l2, l5}
S2  {l5, l7}
S3  {l7}

Assume that L = {l1, l2, ···, l7}, and that S does not occur > p times, whereas S1, S2, etc. do occur > p times. (d, S) will be discarded, while (d, S1), (d, S2), etc. will be added to the training set.
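Method B can be sketched in the same style. Again this is an illustration under assumptions: `biggest_subsets` is a hypothetical name, the counts in `COUNTS` are invented to match the slide's example, and breaking ties between equally sized subsets by frequency is a choice the slide does not specify.

```python
# Hypothetical occurrence counts of label subsets in the training data.
COUNTS = {
    frozenset({"l2", "l5"}): 5,
    frozenset({"l5", "l7"}): 3,
    frozenset({"l7"}): 4,
    frozenset({"l2", "l5", "l7"}): 1,  # S itself is infrequent
}


def biggest_subsets(S, counts, p, n):
    """Return the n largest proper subsets of S that occur more than p times.

    Unlike Method A, the chosen subsets may overlap.
    """
    frequent = [c for c, k in counts.items() if k > p and c < S]
    frequent.sort(key=lambda c: (-len(c), -counts[c], sorted(c)))
    return frequent[:n]


S = frozenset({"l2", "l5", "l7"})
b3 = biggest_subsets(S, COUNTS, p=2, n=3)
b1 = biggest_subsets(S, COUNTS, p=2, n=1)
```

With n = 3 the document d is copied into three training examples, one per returned subset, so larger n keeps more label information at the cost of a larger training set.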

slide-62
SLIDE 62

PPT -N C

Method C: Choose all subsets Si ⊂ S, where |Si| > n, and where S1, S2, etc. do occur > p times. E.g. C1:

S   {l2, l5, l7}
S1  {l2, l5}
S2  {l5, l7}

Assume that L = {l1, l2, ···, l7}, and that S does not occur > p times, whereas S1, S2, etc. do occur > p times. (d, S) will be discarded, while (d, S1), (d, S2), etc. will be added to the training set.
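A short sketch of Method C, assuming the frequency condition is "occurs more than p times" as in the other methods. The name `subsets_larger_than` and the counts in `COUNTS` are hypothetical and exist only to reproduce the C1 example.

```python
# Hypothetical occurrence counts of label subsets in the training data.
COUNTS = {
    frozenset({"l2", "l5"}): 5,
    frozenset({"l5", "l7"}): 3,
    frozenset({"l7"}): 4,
    frozenset({"l2", "l5", "l7"}): 1,  # S itself is infrequent
}


def subsets_larger_than(S, counts, p, n):
    """Return every proper subset of S with more than n labels occurring more than p times."""
    selected = [c for c, k in counts.items() if c < S and len(c) > n and k > p]
    return sorted(selected, key=lambda c: sorted(c))  # deterministic order


S = frozenset({"l2", "l5", "l7"})
c1 = subsets_larger_than(S, COUNTS, p=2, n=1)
```

Here n acts as a minimum size filter: with n = 1 the singleton {l7} is excluded even though it is frequent.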

slide-63
SLIDE 63

Multilabel Datasets

          |D|    |L|   LCard(D)   PDist(D)
Medical   978    45    1.25       0.096
Scene     2407   6     1.07       0.006
Yeast     2417   14    4.24       0.082
Enron     1702   53    3.38       0.442

LCard(D) = average size of S; ‘multi-label-ness’
PDist(D) = the proportion of S which are distinct
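The two statistics can be computed directly from the label subsets. The sketch below uses a small toy dataset (the ten-document example from the PPT slides), not any of the four datasets in the table; the function names are hypothetical.

```python
# Toy multi-label dataset: (document number, label subset S).
DATA = [
    (1, {"Sports", "Politics"}),
    (2, {"Science", "Politics"}),
    (3, {"Sports"}),
    (4, {"Environment", "Science"}),
    (5, {"Science"}),
    (6, {"Sports"}),
    (7, {"Environment", "Science"}),
    (8, {"Sports"}),
    (9, {"Science"}),
    (10, {"Science"}),
]


def label_cardinality(data):
    """LCard(D): the average number of labels per example."""
    return sum(len(S) for _, S in data) / len(data)


def distinct_proportion(data):
    """PDist(D): number of distinct label subsets divided by |D|."""
    return len({frozenset(S) for _, S in data}) / len(data)


lcard = label_cardinality(DATA)
pdist = distinct_proportion(DATA)
```

For this toy data there are 14 labels over 10 documents and 5 distinct subsets, so LCard is 1.4 and PDist is 0.5.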

slide-64
SLIDE 64

Multilabel Evaluation

Accuracy: $\frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|S_i \cap Y_i|}{|S_i \cup Y_i|}$

Hamming loss: $\frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \Delta S_i|}{|L|}$ ($\Delta$ = symmetric difference)

F1: $\frac{1}{|D|} \sum_{i=1}^{|D|} \frac{2 \cdot p \cdot r}{p + r}$ (p, r = precision, recall of $Y_i$ from $S_i$)

E.g.:

Y = 0100100010 (predicted)
S = 0100101000 (actual)

Accuracy       2/4                            0.50   (best = 1.00)
Hamming loss   2/10                           0.20   (best = 0.00)
F1             2 ∗ 2/3 ∗ 2/3 / (2/3 + 2/3)    0.67   (best = 1.00)
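The worked example can be checked with a small sketch. The functions below score a single example; the dataset-level measures are the mean of these values over all |D| examples. The function names and the handling of empty sets are assumptions, not taken from the slides.

```python
def bits_to_set(bits):
    """Turn a 0/1 string such as '0100100010' into a set of label indices."""
    return {i for i, b in enumerate(bits) if b == "1"}


def accuracy(S, Y):
    """|S ∩ Y| / |S ∪ Y| for one example (1.0 if both sets are empty)."""
    union = S | Y
    return len(S & Y) / len(union) if union else 1.0


def hamming_loss(S, Y, num_labels):
    """Size of the symmetric difference divided by |L|."""
    return len(S ^ Y) / num_labels


def f1(S, Y):
    """Harmonic mean of precision and recall of Y with respect to S."""
    true_pos = len(S & Y)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(Y)
    recall = true_pos / len(S)
    return 2 * precision * recall / (precision + recall)


Y_pred = bits_to_set("0100100010")  # predicted
S_true = bits_to_set("0100101000")  # actual
print(accuracy(S_true, Y_pred))          # 0.5
print(hamming_loss(S_true, Y_pred, 10))  # 0.2
print(round(f1(S_true, Y_pred), 2))      # 0.67
```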


slide-65
SLIDE 65

Experiments I

PPT vs PT1, PT2, PT3.
For values of PPT -N {−, A, B, C}param
In terms of accuracy and build time (y axis)
For values of pruning value p 1 − 15 (x axis)
with SMO as the internal classifier


slide-66
SLIDE 66

Experiments I

[Plot: accuracy (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Medical data, -W SMO, – Accuracy. For p = 1 · · · 15.


slide-67
SLIDE 67

Experiments I

[Plot: build time (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Medical data, -W SMO, – Build Time. For p = 1 · · · 15.


slide-68
SLIDE 68

Experiments I

[Plot: accuracy (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Scene data, -W SMO, – Accuracy. For p = 1 · · · 15.


slide-69
SLIDE 69

Experiments I

[Plot: build time (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Scene data, -W SMO, – Build Time. For p = 1 · · · 15.


slide-70
SLIDE 70

Experiments I

[Plot: accuracy (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Yeast data, -W SMO, – Accuracy. For p = 1 · · · 15.


slide-71
SLIDE 71

Experiments I

[Plot: build time (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Yeast data, -W SMO, – Build Time. For p = 1 · · · 15.


slide-72
SLIDE 72

Experiments I

[Plot: accuracy (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Enron data, -W SMO, – Accuracy. For p = 1 · · · 15.


slide-73
SLIDE 73

Experiments I

[Plot: build time (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -N -, PPT -N B1/B2/B3/B4, PPT -N C1/C2/C3/C4.]

Enron data, -W SMO, – Build Time. For p = 1 · · · 15.


slide-74
SLIDE 74

PPT: So far

Fewer classes; much faster (especially with SMO)
For some pruning values p and for some methods for saving information −N, accuracy is superior to the standard problem transformation methods
. . . but doesn’t do well on e.g. Enron, where. . .
  . . . labelling is very irregular
  . . . too much data is lost when pruning
PT1 is the nearest competitor in this case; not PT3
PT1 combines single labels to create multi-labels
Can we combine multi-labels to create multi-labels?


slide-75
SLIDE 75

PPT −J

Yes. Consider: each lj ∈ L × each Sk ∈ {distinct S in pruned(Dtrain)}. e.g.:

Sk   l0   l1   l2   l3   l4   l5   l6   l7    λk
0    0    0    1    0    0    1    0    0     0.9
1    0    0    1    0    0    0    0    1     0.8
2    1    0    0    0    0    1    0    1     0.4
3    (one label set; column not recoverable)   0.0
4    (two labels set; columns not recoverable) 0.0
P    0.4  0.0  1.7  0.0  0.0  1.3  0.0  1.2   t = 0.5
Y    0    0    1    0    0    1    0    1

Thus $l_{jk}$ is the jth label of the kth distinct subset, and $Y_j = 1$ if $P_j > t$, else 0; where $P_j = \sum_{k} l_{jk} \times \lambda_k$, summed over the distinct subsets $S_k$; t = 0.5; |L| = 8; |distinct S| = 5. Hence this e.g. is classified Y = {l2, l5, l7}.
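A minimal sketch of the −J combination step, assuming the classifier supplies one confidence λk per distinct label subset. With labels numbered l0 to l7, the first three subsets below are the memberships consistent with the P row of the example; the two subsets with λ = 0.0 are arbitrary placeholders, since they do not affect the result. The function name is illustrative.

```python
def combine_votes(subsets, confidences, num_labels, t=0.5):
    """Sum the confidence of every subset containing each label, then threshold."""
    P = [0.0] * num_labels
    for subset, lam in zip(subsets, confidences):
        for j in subset:
            P[j] += lam
    Y = {j for j, score in enumerate(P) if score > t}
    return P, Y


subsets = [
    {2, 5},     # S0
    {2, 7},     # S1
    {0, 5, 7},  # S2
    {1},        # S3: placeholder membership (assumption)
    {3, 4},     # S4: placeholder membership (assumption)
]
confidences = [0.9, 0.8, 0.4, 0.0, 0.0]

P, Y = combine_votes(subsets, confidences, num_labels=8)
print([round(x, 1) for x in P])  # [0.4, 0.0, 1.7, 0.0, 0.0, 1.3, 0.0, 1.2]
print(sorted(Y))                 # [2, 5, 7]
```

The predicted set {l2, l5, l7} need not be one of the subsets seen in training, which is the point of combining multi-labels.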


slide-76
SLIDE 76

Experiments II

[Plot: accuracy (y axis) against p value (x axis). Series: PT3, PT4, PT5, PPT -J -N -, PPTjm -N B1/B2/B3/B4, PPTjm -N C1/C2/C3/C4.]

Enron data, -W SMO, −J. Accuracy (no change to build time!)


slide-77
SLIDE 77

State of the Art – k-Labelsubsets

“This paper proposes an ensemble method for multilabel classification. The RAndom k-labELsets (RAKEL) algorithm constructs each member of the ensemble by considering a small random subset of labels and learning a single-label classifier for the prediction of each element in the powerset of this subset. In this way, the proposed algorithm aims to take into account label correlations using single-label classifiers that are applied on subtasks with manageable number of labels and adequate number of examples per label. . . . ”

Tsoumakas G., Vlahavas I. European Conference on Machine Learning (2007), pp. 406-417 (My emphasis).

i.e. m random subsets of PT3 (each with k labels) with an ensemble method and a threshold t to stitch them together.
Params: -m <models> -k <set sizes> -t <thres.>
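The ensemble idea can be sketched without a real classifier: draw m random k-labelsets, let each model predict a subset of its own labelset, and keep a label when the share of positive votes among the models that cover it exceeds t. This is a simplified illustration, not the authors' implementation; the function names are hypothetical, the per-model predictions are supplied by hand, and the sampler may draw the same labelset twice.

```python
import random


def random_labelsets(num_labels, k, m, seed=0):
    """Draw m random labelsets of size k (duplicates are not excluded here)."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(num_labels), k)) for _ in range(m)]


def rakel_vote(labelsets, predictions, num_labels, t=0.5):
    """Combine per-model predictions by thresholded average vote per label."""
    votes = [0] * num_labels
    covered = [0] * num_labels
    for labelset, predicted in zip(labelsets, predictions):
        for j in labelset:
            covered[j] += 1          # this model could have voted for label j
            if j in predicted:
                votes[j] += 1        # and it did
    return {j for j in range(num_labels)
            if covered[j] and votes[j] / covered[j] > t}


# Hand-written example: three models over three labels, k = 2.
labelsets = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})]
predictions = [{0, 1}, {1}, {2}]
combined = rakel_vote(labelsets, predictions, num_labels=3, t=0.5)
print(sorted(combined))  # [1]
```

Each of the m models is a full single-label learner over up to 2^k classes, so the ensemble's build time grows with m and k.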


slide-78
SLIDE 78

Experiments III

RAKEL parameters:
  best parameter combination according to authors on their datasets;
  best parameter combination according to iterative testing m ∗ k ∗ 0.5 on my datasets.
PPT parameters:
  best parameter combination according to me on all datasets (iterative testing p ∗ N ∗ −J0.5)
Calculate thresholds (where necessary) based on 5x CV of training data.
All experiments done within a WEKA-based framework, using default SMO as the internal single-label classifier


slide-79
SLIDE 79

PPT Vs RAKEL

Yeast Data
RAKEL Parameters: 10x: -k 6 -m 20 -t
PPT Parameters: 1x: -p 4 -N C3 -J -t <>

             RAKEL    PPT
ACCURACY     55.604   55.013
F1           0.671    0.662
HAM. LOSS    0.207    0.199
BUILD TIME   2080     487

        |D|    |L|   LCard(D)   PDist(D)
Yeast   2417   14    4.24       0.082


slide-80
SLIDE 80

PPT Vs RAKEL

Scene Data
RAKEL Parameters: 10x: -k 4 -m 15 -t <>
PPT Parameters: 1x: -p 12 -N −

             RAKEL    PPT
ACCURACY     70.161   71.391
F1           0.725    0.725
HAM. LOSS    0.101    0.097
BUILD TIME   608      7

        |D|    |L|   LCard(D)   PDist(D)
Scene   2407   6     1.07       0.006


slide-81
SLIDE 81

PPT Vs RAKEL

Medical Data
RAKEL Parameters: 10x: -k 27 -m 20 -t <>
PPT Parameters: 1x: -p 1 -N B2

             RAKEL    PPT
ACCURACY     78.454   78.571
F1           0.811    0.802
HAM. LOSS    0.011    0.011
BUILD TIME   7011     59

          |D|   |L|   LCard(D)   PDist(D)
Medical   978   45    1.25       0.096


slide-82
SLIDE 82

PPT Vs RAKEL

Enron Data
RAKEL Parameters: 10x: -k 12 -m 30 -t <>
PPT Parameters: 10x: -p 5 -N B1 -J -t <>

             RAKEL    PPT
ACCURACY     33.366   35.075
F1           0.471    0.468
HAM. LOSS    0.067    0.066
BUILD TIME   32323    115

        |D|    |L|   LCard(D)   PDist(D)
Enron   1702   53    3.38       0.442


slide-83
SLIDE 83

PPT Vs RAKEL

Conclusions:
PPT: Can get slightly better accuracy/hamming loss
RAKEL: Can get slightly better F1 (a)
Arguably no significant difference statistically (RAKEL may even have been able to match Enron, but experiments taking too long); but. . .
PPT: by far the most efficient
RAKEL: only viable when |L| is small
RAKEL: fiddly parameter configuration

(a) This is because PPT prunes rare label combinations, which usually have more labels than average, hence always resulting in higher precision than recall


slide-84
SLIDE 84

PFS: Pruned ML Feature Selection

Multi-label Feature Selection:
StringToWordVector
Binary Multi-Label Feature Selection – “BMLFS” – Rank features for each label (class), sum the rankings, then take the top K features with the highest sum
Pruned Multi-label Feature Selection – “PFS” – Create a single-label (multi-class) set via a PPT transformation, then just run ordinary multi-class feature selection
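The BMLFS ranking step might be sketched as below, assuming a per-label relevance score for every feature is already available (for instance from information gain or chi-squared computed on each binary problem). The function name and the scores are hypothetical; rank 0 is given to the least relevant feature so that a higher rank sum means a better feature, as the slide's "highest sum" suggests.

```python
def bmlfs_top_k(scores_per_label, k):
    """Rank features per label, sum the ranks, return the k best feature indices."""
    num_features = len(scores_per_label[0])
    rank_sum = [0] * num_features
    for scores in scores_per_label:
        # Worst feature first, so its rank is 0 and the best gets the top rank.
        order = sorted(range(num_features), key=lambda f: scores[f])
        for rank, feature in enumerate(order):
            rank_sum[feature] += rank
    best_first = sorted(range(num_features),
                        key=lambda f: rank_sum[f], reverse=True)
    return best_first[:k]


# Hypothetical relevance scores: 2 labels x 3 features.
scores_per_label = [
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.7],
]
top_features = bmlfs_top_k(scores_per_label, k=2)
print(top_features)  # [1, 2]
```

PFS avoids the per-label loop altogether: after a PPT transformation each label set is one class, so a single ordinary multi-class ranking is enough.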


slide-85
SLIDE 85

PFS: Pruned ML Feature Selection

[Plot; series: PFS-x0-IG, PFS-x0-X2, PFS-x2-IG, PFS-x2-X2, PFS-x5-IG, PFS-x5-X2, BPFS-IG, BPFS-X2, STWV-L-S.]

The Medical Data - various FS methods. Results from PPT −p2 −W NaiveBayes


slide-86
SLIDE 86

PFS: Pruned ML Feature Selection

[Plot; series: PFS-x0-IG, PFS-x0-X2, PFS-x2-IG, PFS-x2-X2, PFS-x5-IG, PFS-x5-X2, BPFS-IG, BPFS-X2, STWV-L-S.]

The Enron Data - various FS methods. Results from PPT −p5 −W SMO


slide-87
SLIDE 87

PPT-H: PPT Hierarchical

20 Newsgroups data – a “local” hierarchical view (as defined in dataset, a PT method with classifier in each non-leaf node).
Disadvantages:
  Each classifier node ignorant of other nodes
  Wrong classifications get propagated


slide-88
SLIDE 88

PPT-H: PPT Hierarchical

20 Newsgroups data – a pruned linked hierarchical view, based on PPT; pruned with −p 3.

[Tree diagram; nodes: root, politics, rec, electronics, autos, misc_forsale, comp, graphics, religion, sci, pmisc, rmisc, space, mideast, guns, motorcycles, sport, baseball, hockey, ibm_pc_hardware, med, sys, os_ms_windows_misc, windows_x, mac_hardware, atheism, christian, crypt]

So far, in practice, no improvement* :-(
Update*: it improves on datasets with a deep hierarchy, e.g. Reuters (103 leaf nodes, up to 4 levels deep)


slide-89
SLIDE 89

Summary / Current / Future Work

Summary:
  Multi-label Problem Transformation methods: Advantages vs Disadvantages
  A closer look at Multi-labeled data
  A new Problem Transformation method: PPT / −N / −J
  Experiments
  Some related applications: PFS; PPT-H
Future Work:
  More on PPT-H Hierarchical methods
Next Major Focus: On-line / Adaptive methods


slide-90
SLIDE 90

The End

Questions? / Comments?


slide-91
SLIDE 91

Appendix 1.
