On-line Hierarchical Multi-label Text Classification, Jesse Read - PowerPoint PPT Presentation



SLIDE 1

On-line Hierarchical Multi-label Text Classification

Jesse Read Supervised by Bernhard (and Eibe and Geoff)

On-line Hierarchical Multi-label Text Classification 1

SLIDE 2

Multi-label Classification

Multi-class (“single-label”) classification, e.g.: class set C = {Sports, Environment, Science, Politics}; for a text document d, select one class c ∈ C.

Multi-label classification, e.g.: label set L = {Sports, Environment, Science, Politics}; for a text document d, select a label subset S ⊆ L:

  Doc.  Labels (S ⊆ L)
  1     {Sports, Politics}
  2     {Science, Politics}
  3     {Sports}
  4     {Environment, Science}

...how to do multi-label classification?

SLIDE 3

Problem Transformation Methods (PT)

Transforming a multi-label problem into a multi-class problem without losing information:

  • 1. (LC) Label Combination Method
  • 2. (BC) Binary Classifiers Method
  • 3. (RT) Ranking Threshold Method

Our toy multi-label problem: label set L = {Sports, Environment, Science, Politics}

  Doc.  Labels (S ⊆ L)
  1     {Sports, Politics}
  2     {Science, Politics}
  3     {Sports}
  4     {Environment, Science}

SLIDE 4
  • 1. Label Combination Method (LC)

Train:
  Doc.  Class
  1     Sports+Politics
  2     Science+Politics
  3     Sports
  4     Science+Environment

Test:
  Doc.  Class
  X     ?

  • May generate many classes for few documents
  • Possibly inflexible for time-ordered data
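As a minimal sketch (toy data from the slides; the multi-class learner that would then be trained on the combined classes is out of scope here), the LC transformation just maps each label subset to one canonical combined class:

```python
# Label Combination (LC): every distinct label subset becomes a single
# multi-class label, so any multi-class learner can be used unchanged.
docs = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    3: {"Sports"},
    4: {"Environment", "Science"},
}

def lc_class(label_set):
    """Canonical combined-class name, e.g. 'Politics+Sports'."""
    return "+".join(sorted(label_set))

lc_train = {d: lc_class(s) for d, s in docs.items()}
# lc_train[1] == "Politics+Sports"

def lc_decode(cls):
    """Map a predicted combined class back to a label subset."""
    return set(cls.split("+"))
```

Note that LC can only ever predict label subsets it has seen during training, which is one source of the class-explosion problem mentioned above.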

SLIDE 5
  • 2. Binary Classifiers Method (BC)

Train (one binary problem per label; 1 = label applies):

  B_Sports       B_Environment   B_Science       B_Politics
  Doc.  Class    Doc.  Class     Doc.  Class     Doc.  Class
  1     1        1     0         1     0         1     1
  2     0        2     0         2     1         2     1
  3     1        3     0         3     0         3     0
  4     0        4     1         4     1         4     0

Test:
  Doc.  B_Sports  B_Environment  B_Science  B_Politics
  X     ?         ?              ?          ?

  • Slow, need |L| classifiers.
  • Assumes that all labels are independent
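A minimal sketch of the BC transformation (toy data from the slides; the |L| binary learners themselves are omitted):

```python
# Binary Classifiers (BC, a.k.a. binary relevance): one independent
# 0/1 training problem per label.
docs = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    3: {"Sports"},
    4: {"Environment", "Science"},
}
LABELS = ["Sports", "Environment", "Science", "Politics"]

def bc_targets(docs, label):
    """Binary targets for one label: 1 iff the label applies to the doc."""
    return {d: int(label in s) for d, s in docs.items()}

binary_problems = {lab: bc_targets(docs, lab) for lab in LABELS}
# binary_problems["Sports"] == {1: 1, 2: 0, 3: 1, 4: 0}
```

At test time each of the |L| classifiers votes independently, which is exactly where the label-independence assumption above comes from.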

SLIDE 6
  • 3. Ranking Threshold Method (RT)

Train (each document duplicated once per label):
  Doc.  Class
  1     Sports
  1     Politics
  2     Science
  2     Politics
  3     Sports
  4     Science
  4     Environment

Test:
  Doc.  Certainty distribution
  X     (Yw, Yx, Yy, Yz) = (?, ?, ?, ?)

  • Difficulty in selecting a threshold
  • Assumes that all labels are independent
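The thresholding step can be sketched as follows (the certainty scores here are made up purely for illustration; in practice they would come from the multi-class learner's posterior distribution):

```python
# Ranking + Threshold (RT): the learner outputs a certainty per label;
# keep every label whose certainty exceeds a threshold t.
def rt_predict(certainties, t=0.25):
    """Return the label subset scoring above threshold t."""
    return {label for label, score in certainties.items() if score > t}

scores = {"Sports": 0.45, "Environment": 0.05,
          "Science": 0.10, "Politics": 0.40}
rt_predict(scores)          # -> {"Sports", "Politics"}
rt_predict(scores, t=0.42)  # -> {"Sports"}
```

The example also shows the difficulty noted above: the predicted label set depends entirely on the choice of t.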

SLIDE 7

Algorithm Adaptation Methods

We have seen the 3 main “Problem Transformation” methods. There are also Algorithm Adaptation methods, for example:

  • Modifying the entropy of J48
  • Multiple actions for Association Rules
  • AdaBoost.MH, AdaBoost.MR
  • Modifications to SMO, kNN, . . .

Most algorithm adaptation methods use a problem transformation method internally, e.g. Association Rules use LC, AdaBoost.MH an “AdaBoost Transformation” (AT), and AdaBoost.MR uses RT. ...what about hierarchy?

SLIDE 8

Hierarchical Classification

Hierarchical classification includes some method to recognise relationships between labels. For text data, we use a tree-structured topic hierarchy, known as a taxonomy. There are two approaches to hierarchical classification:

  • Global Hierarchical (a.k.a. the “big bang” approach)
  • Local Hierarchical (a.k.a. the “top down” approach)

SLIDE 9

Global Hierarchical

[Taxonomy figure: root with flattened leaf classes Americas.US, Americas.Canada, MidEast.Iraq, MidEast.Iran, Sports.Soccer, Sports.Rugby, Sci/Tech]

+ Improvements in accuracy
− Difficult to maintain; can get very computationally complex

E.g.:

  • Stacking (e.g. on BC)
  • EM (e.g. on LC)
  • Boosting (e.g. with AT)
  • Association Rules
  • Predictive Clustering Trees (multi-label tree learners)

SLIDE 10

Local Hierarchical

[Taxonomy figure: root → {Americas → {US, Canada}, Mid.East → {Iraq, Iran}, Sports → {Soccer, Rugby}, Sci/Tech}]

+ Divides up the problem: easy to maintain; intuitive
− Error propagation; accuracy similar to flat PT

E.g.:

  • Pachinko Machine, e.g. Fuzzy Relational Thesauri (FRT)
  • Probabilistic
  • Hybrid: ECOC, error recovery; can return to higher nodes
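The top-down routing itself can be sketched as below (the taxonomy matches the figure; `toy_pick` is a stand-in for a trained per-node classifier, and the keyword-matching rule is purely illustrative):

```python
# Local hierarchical ("pachinko machine") routing: a classifier at each
# internal node picks one child; descend until a leaf is reached.
TAXONOMY = {
    "root": ["Americas", "Mid.East", "Sports", "Sci/Tech"],
    "Americas": ["US", "Canada"],
    "Mid.East": ["Iraq", "Iran"],
    "Sports": ["Soccer", "Rugby"],
}

def classify_top_down(doc, pick_child):
    node = "root"
    while node in TAXONOMY:                  # internal node: keep descending
        node = pick_child(node, TAXONOMY[node], doc)
    return node                              # leaf label

def toy_pick(node, children, doc):
    """Toy routing rule standing in for a trained node classifier."""
    return next((c for c in children if c.lower() in doc.lower()), children[0])

classify_top_down("sports news: rugby world cup", toy_pick)  # -> "Rugby"
```

The sketch also makes the error-propagation weakness concrete: a wrong choice at the root can never be corrected further down the tree (unless a hybrid scheme allows returning to higher nodes).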

SLIDE 11

Multi-label Datasets

  Key    |D|     |L|  UC(D,L)  LC(D,L)  Hier.  Seq.  Text
  YEAST   2,417   14      198     4.24      N     N     N
  MEDC      978   45       94     1.25      N     N     Y
  20NG   19,300   20       55     1.03      Y     Y     Y
  ENRN    1,702   53      753     3.38      Y     Y     Y
  MARX    3,617  101      208     1.13      Y     Y     Y
  REUT    6,000  103      811     1.46      Y     N     Y

  • |D| = number of documents
  • |L| = number of possible labels
  • UC(D,L) = |{S : S ⊆ L, ∃d ∈ D : L(d) = S}| (number of distinct label combinations)
  • LC(D,L) = (1/|D|) Σ_{i=1..|D|} |S_i|, for (d_i, S_i) with S_i ⊆ L (average labels per document)
  • Hier. = hierarchical structure defined within dataset
  • Seq. = time-ordered data
  • Text = text dataset
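Both dataset statistics are straightforward to compute; a minimal sketch using the toy label sets from the earlier slides:

```python
# UC(D,L): number of distinct label subsets occurring in the data.
# LC(D,L): label cardinality, the average number of labels per document.
def uc(label_sets):
    return len({frozenset(s) for s in label_sets})

def lc(label_sets):
    return sum(len(s) for s in label_sets) / len(label_sets)

toy = [{"Sports", "Politics"}, {"Science", "Politics"},
       {"Sports"}, {"Environment", "Science"}]
uc(toy)  # -> 4
lc(toy)  # -> 1.75
```

UC bounds the number of classes an LC transformation produces, and LC indicates how "multi-label" a dataset really is (LC = 1 would be an ordinary multi-class problem).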

SLIDE 12

Multi-label Evaluation

  • Percentage of correctly classified instances? – Too harsh
  • Percentage of correctly classified labels? – Too easy

Let C be a multi-label classifier, S_i ⊆ L the true label set and Y_i = C(x_i) the labels predicted by C for document x_i:

  Accuracy(C, D) = (1/|D|) Σ_{i=1..|D|} |S_i ∩ Y_i| / |S_i ∪ Y_i|   (1)

Hierarchical evaluation:

  • Should we give partial credit?
  • If so, how?
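The set-based accuracy of Eq. (1) is a mean Jaccard overlap between true and predicted label sets; a minimal sketch (the empty-set convention of scoring 1.0 when both sets are empty is an assumption, since the slide does not specify it):

```python
# Multi-label accuracy, Eq. (1): mean Jaccard overlap of true set S_i
# and predicted set Y_i over all documents.
def ml_accuracy(true_sets, pred_sets):
    total = 0.0
    for S, Y in zip(true_sets, pred_sets):
        union = S | Y
        total += len(S & Y) / len(union) if union else 1.0
    return total / len(true_sets)

S = [{"Sports", "Politics"}, {"Science"}]
Y = [{"Sports"}, {"Science", "Politics"}]
ml_accuracy(S, Y)  # -> 0.5
```

This sits between the two extremes above: a half-right label set earns half credit, rather than zero (exact match) or near-full credit (per-label accuracy).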

SLIDE 13

Algorithms

Multi-class algorithms commonly used in prior multi-label work:

  Key   Type      Description
  NB    Bayes     Naïve Bayes
  BAG.  Meta      Bagging (with J48)
  SMO   Function  Support Vector Machines
  J48   Tree      J48 (C4.5 decision tree)
  IBk   kNN       k-Nearest Neighbour
  NN    Neural    Neural Networks

Pilot experiments showed that:

  • Default NN too slow
  • IBk does not perform well with sparse data

SLIDE 14

Experiments — Tables

Flat vs Global Hierarchical vs Local Hierarchical

1. Problem Transformation (accuracy %; "-" = no result):

                LC                            BC                            RT
  Key    NB     BAG    SMO    JH48          NB     BAG    SMO    J48       NB     BAG    SMO    J48
  MEDI   68.05  71.77* 71.10* 72.13*        55.82  75.58* 73.59* 65.83     67.81  64.20  65.72  60.22
  20NG   57.47* 57.58* 57.35* 52.74         32.33  -      47.67  41.09     56.05* 47.19  54.61* 50.55
  ENRN   32.72* 25.42  -      22.96         21.82  31.35* 30.56* 26.26     15.16  30.25* 24.09  27.82
  MARX   48.15* 48.93* 43.26  44.79         32.6   31.69  38.64  33.95     48.44* 36.07  40.46  38.71
  REUT   43.76  51.47  -      41.68         18.21  44.09  56.23* 43.83     37.13  45.9   58.65* 45.31

2. Global Hierarchical:

                LC-EM                         BC-Stack(RT-NB)               AT
  Key    NB     BAG    SMO    J48           NB     BAG    SMO    J48       BAG    J48
  MEDI   67.45  74.71* 70.75  72.31         56.09  70.76  73.65* 65.85     67.06  67.82
  20NG   57.48* 57.58* 57.45* 53.39         29.8   -      49.06  40.88     -      -
  ENRN   34.6*  25.46  -      23.31         20.66  31.79  27.01  25.35     -      -
  MARX   48.18  50.64* 43.29  44.82         39.09  32.08  38.87  34.25     -      -
  REUT   43.77  51.49* -      41.69         19.78  43.83  57.32* 43.68     -      -

3. Local Hierarchical:

                LC                            BC                            RT
  Key    NB     BAG    SMO    J48           NB     BAG    SMO    J48       NB     BAG    SMO    J48
  20NG   56.49  58.31* 58.83* 53.48         43.68  -      52.44  42.03     54.87  40.58  53.37  49.26
  ENRN   25.96  29.38  27.73  25.23         15.3   34.99* -      26.26     4.67   25.51  23.59  27.63
  MARX   48.49  54.57* 42.4   46.84         41.69  38.67  40.34  38.65     46.44  33.59  38.32  41.23

SLIDE 15

Experiments — 20NG — Accuracy

[Plot: accuracy (%) vs. labeled (training) examples, 10 to 100,000 (log scale), for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48 on 20NG]

SLIDE 16

Experiments — 20NG — Build Time

[Plot: build time vs. labeled (training) examples, 10 to 100,000 (log scale), for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48 on 20NG]

SLIDE 17

Experiments — ENRN — Accuracy

[Plot: accuracy (%) vs. labeled (training) examples, 10 to 10,000 (log scale), for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48 on ENRN]

SLIDE 18

Experiments — ENRN — Build Time

[Plot: build time vs. labeled (training) examples, 10 to 10,000 (log scale), for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48 on ENRN]

SLIDE 19

Experiments — MARX — Accuracy

[Plot: accuracy (%) vs. labeled (training) examples, 10 to 10,000 (log scale), for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48 on MARX]

SLIDE 20

Experiments — MARX — Build Time

[Plot: build time vs. labeled (training) examples, 10 to 10,000 (log scale), for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48 on MARX]

SLIDE 21

Conclusions

Problem Transformation methods:

  • No problem transformation method is best on all datasets
  • BC and RT might do better with a better selected |S|
  • Complexity determined by D, L, LC(D, L) and UC(D, L)

Multi-class algorithms:

  • J48 not that great
  • BC doesn’t go well with Naïve Bayes; RT does, and LC works equally well with either

Hierarchical:

  • Global PT-extensions improve on flat
  • In practice there is overhead involved in building local hierarchical classifiers, but in theory they are more flexible

SLIDE 22

Applications

  • Email
  • Bookmarks (Web browser, del.icio.us, Google Bookmarks)
  • Folksonomies (Wikipedia, CiteULike)
  • Websites
  • Other (Medical Text Classification, etc.)

These all tend to be (or could be):

  • Multi-label
  • Hierarchical
  • Supervised
  • On-line
  • Time-evolving

SLIDE 23

Future Work

Only just the beginning. Other things to look into:

  • Modelling multi-label hierarchical on-line data
  • Topic/“burst” detection (N.B. - not topic creation)
  • Minimising human interaction
  • Active learning?
  • Incremental algorithms
  • Adaptive learning

SLIDE 24

Questions? Comments?
