SLIDE 1

On-line Hierarchical Multi-label Text Classification

Jesse Read September 7, 2007

On-line Hierarchical Multi-label Text Classification 1

SLIDE 2

The Problem

Learning to automatically classify text documents, e.g.:

  • Emails
  • News Articles, Current Events (websites, RSS feeds)
  • “Folksonomies” (Wikipedia, CiteULike)
  • Bookmarks (Web browser, del.icio.us, Google Bookmarks)
  • Other (e.g. File System, Medical Text Classification)

Each of these examples is (or could be):

  • Text
  • Multi-label
  • Organised in a Hierarchy
  • On-line / Streamed (not Batch Learning)
  • Affected by Human Interaction

SLIDE 3

Multi-label Classification

Given a label set L = {Sports, Environment, Science, Politics}:

“Single-label” (multi-class) classification: for a text document d, the task is to select a single label l ∈ L.

Multi-label classification: for a text document d, select a label subset S ⊆ L. E.g.:

  Example      Labels (S ⊆ L)
  Document 1   {Sports, Politics}
  Document 2   {Science, Politics}
  Document 3   {Sports}
  Document 4   {Environment, Science}
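A minimal Python sketch of the idea: each label subset S ⊆ L can be represented as a 0/1 indicator vector over the label set L. The helper names and document contents here are illustrative, not from the slides.

```python
# Represent multi-label assignments as binary indicator vectors over L.
LABELS = ["Sports", "Environment", "Science", "Politics"]

def to_indicator(label_subset):
    """Map a label subset S <= L to a 0/1 vector over the label set L."""
    return [1 if label in label_subset else 0 for label in LABELS]

examples = {
    "Document 1": {"Sports", "Politics"},
    "Document 2": {"Science", "Politics"},
    "Document 3": {"Sports"},
    "Document 4": {"Environment", "Science"},
}

indicators = {doc: to_indicator(s) for doc, s in examples.items()}
```

In the single-label (multi-class) case each vector would contain exactly one 1; in the multi-label case any number of 1s may appear.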

SLIDE 4

Multi-label Classification

Multi-label classification is typically done by transforming the multi-label problem into a single-label problem, i.e. with a Problem Transformation method:

  • 1. (LC) Label Combination Method
  • 2. (BC) Binary Classifiers Method
  • 3. (RT) Ranking Threshold Method

Then a standard single-label algorithm (e.g. Naive Bayes, C4.5, Bagging with C4.5, Support Vector Machines, k-Nearest Neighbour, Neural Networks, AdaBoostM1) is employed on the resulting data, and the result is transformed back to a multi-label representation.

SLIDE 5
1. Label Combination Method (LC)

Each combination of labels becomes a single label, and a single-label classifier C learns to classify the resulting combinations: one decision per document. E.g., classifier C decides that Document X belongs to one of:

  • Sports+Politics
  • Science+Politics
  • Sports
  • Science+Environment

Drawbacks:

  • May generate many unique combinations for few documents
  • What if a document about Sports and Science turns up?
  • Can run very slowly if the no. of unique combinations grows large
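The LC transformation can be sketched as follows; the dataset contents are illustrative, not from the slides.

```python
# Sketch of the Label Combination (LC) transformation: each distinct
# label subset becomes one atomic class for a single-label learner.
def lc_transform(dataset):
    """dataset: list of (document, label_subset) pairs.
    Returns (transformed, combinations): each document paired with a
    combination id, plus the mapping from id back to label subset."""
    combinations = []            # id -> frozenset of labels
    transformed = []
    for doc, labels in dataset:
        combo = frozenset(labels)
        if combo not in combinations:
            combinations.append(combo)
        transformed.append((doc, combinations.index(combo)))
    return transformed, combinations

dataset = [("d1", {"Sports", "Politics"}),
           ("d2", {"Science", "Politics"}),
           ("d3", {"Sports"}),
           ("d4", {"Sports", "Politics"})]
transformed, combinations = lc_transform(dataset)
# A subset never seen in training (e.g. Sports+Science) has no class id,
# which is exactly the weakness noted in the bullets above.
```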

SLIDE 6
2. Binary Classifiers Method (BC)

Single-label (binary) classifiers are created for each possible label: multiple decisions per document. E.g., four classifiers C1 · · · C4, one for each label. For Document X:

  (C1) belongs to Sports? YES/NO
  (C2) belongs to Environment? YES/NO
  (C3) belongs to Science? YES/NO
  (C4) belongs to Politics? YES/NO

  • Slow: needs as many classifiers as there are labels
  • Assumes that all labels are independent
  • Often way too many labels are selected
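The BC prediction step can be sketched as below. The lambda "classifiers" are toy keyword rules standing in for trained binary models; the keywords are illustrative assumptions.

```python
# Sketch of the Binary Classifiers (BC) method: one yes/no decision per
# label, and the predicted label set is every label whose classifier
# answers YES.
def bc_predict(document, classifiers):
    """classifiers: dict label -> callable(document) -> bool."""
    return {label for label, clf in classifiers.items() if clf(document)}

# Toy stand-ins for trained binary classifiers (illustrative only):
classifiers = {
    "Sports":      lambda d: "match" in d,
    "Environment": lambda d: "climate" in d,
    "Science":     lambda d: "experiment" in d,
    "Politics":    lambda d: "election" in d,
}
```

Because each classifier decides independently, nothing stops many labels from firing at once, the "way too many labels" problem noted above.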

SLIDE 7
3. Ranking Threshold Method (RT)

A single-label classifier C outputs a ranking of its confidence for each label. E.g., for Document X:

  (C) is 95.5% likely to belong to Science
  (C) is 81.2% likely to belong to Environment
  (C) is 60.9% likely to belong to Sports
  (C) is 21.3% likely to belong to Politics

With a threshold of 80.0%, Science and Environment would be selected.

  • Not all single-label classifiers can output their “confidence”
  • Assumes that all labels are independent
  • Difficulty in selecting a good threshold
  • Often the threshold encloses way too many labels
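The RT selection step is a simple cut over the ranked confidences; the confidence values below are the ones from the slide.

```python
# Sketch of the Ranking Threshold (RT) method: a single classifier outputs
# a confidence per label, and every label at or above the threshold is
# selected as part of the predicted label subset.
def rt_select(confidences, threshold):
    """confidences: dict label -> confidence in [0, 1]."""
    return {label for label, p in confidences.items() if p >= threshold}

confidences = {"Science": 0.955, "Environment": 0.812,
               "Sports": 0.609, "Politics": 0.213}
```

Note how sensitive the result is to the threshold: lowering it from 0.80 to 0.50 also pulls in Sports, which is the threshold-selection difficulty noted above.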

SLIDE 8

Hierarchical Classification (Option 1 - Global)

Uses one Problem Transformation method and one single-label classifier for the whole hierarchy; information about the hierarchy is incorporated into the process.

[Hierarchy diagram: root → Americas (US, Canada), Mid.East (Iraq, Iran), Sports (Soccer, Rugby), Sci/Tech]

+ Higher accuracy
− Can run very slowly and use up a lot of memory
− Difficult to maintain; inflexible

SLIDE 9

Hierarchical Classification (Option 2 - Local)

Each internal node has its own Problem Transformation method and classifier.

[Hierarchy diagram: root → Americas (US, Canada), Mid.East (Iraq, Iran), Sports (Soccer, Rugby), Sci/Tech]

+ Divides up the problem: easy to maintain; efficient; intuitive
− Error propagation; unimpressive accuracy
− Overhead involved in setting up the hierarchical structure
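The local scheme can be sketched as a recursive routing over the hierarchy above. The per-node keyword rules are toy stand-ins for trained classifiers, illustrative only.

```python
# Sketch of local ("one classifier per internal node") hierarchical
# classification: each internal node routes a document to zero or more
# children; nodes with no classifier are leaves, emitted as labels.
def classify_local(document, node, node_classifiers):
    """node_classifiers: dict node -> callable(document) -> list of child
    nodes to descend into. Nodes absent from the dict are leaves."""
    if node not in node_classifiers:        # leaf: emit it as a label
        return {node}
    labels = set()
    for child in node_classifiers[node](document):
        labels |= classify_local(document, child, node_classifiers)
    return labels

# Toy stand-ins for per-node classifiers over a fragment of the hierarchy:
node_classifiers = {
    "root":   lambda d: ["Sports"] if ("rugby" in d or "soccer" in d) else [],
    "Sports": lambda d: [c for c, kw in [("Rugby", "rugby"),
                                         ("Soccer", "soccer")] if kw in d],
}
```

A wrong decision at the root cuts off whole subtrees, which is the error propagation noted above.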

SLIDE 10

Experiments — 20Newsgroups — Accuracy

[Plot: accuracy vs. no. of labeled (training) examples, log scale 10 to 100,000; curves: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]

SLIDE 11

Experiments — 20Newsgroups — Build Time

[Plot: build time vs. no. of labeled (training) examples, log scale 10 to 100,000; curves: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]

SLIDE 12

Experiments — Enron — Accuracy

[Plot: accuracy vs. no. of labeled (training) examples, log scale 10 to 10,000; curves: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]

SLIDE 13

Experiments — Enron — Build Time

[Plot: build time vs. no. of labeled (training) examples, log scale 10 to 10,000; curves: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]

SLIDE 14

Experiments — NewsArticles — Accuracy

[Plot: accuracy vs. no. of labeled (training) examples, log scale 10 to 10,000; curves: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]

SLIDE 15

Experiments — NewsArticles — Build Time

[Plot: build time vs. no. of labeled (training) examples, log scale 10 to 10,000; curves: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]

SLIDE 16

Initial Conclusions

Performance is poor:

  • All Problem Transformation methods have significant disadvantages
  • Multi-label data is more complex than single-label data
  • Multi-label text datasets can be very different; no method is best for all
  • On-line data is invariably susceptible to “concept drift”
  • . . . but it is very costly to build / rebuild classifiers

SLIDE 17

Current Work

  • Analysis and modelling of on-line hierarchical multi-label text data
  • Analysing the performance and flaws of Problem Transformation methods
  • Investigating adaptive and incremental learning methods
SLIDE 18

“Multi-label-ness”: Documents per Label

  • 80/20 rule: typically, most labels are not used very often.

SLIDE 19

“Multi-label-ness”: Labels per Document

  • Most documents have only a few labels.

SLIDE 20

On-line data: Creation of Labels Over Time

  • Most labels are used for the first time (created) very early on.

SLIDE 21

On-line data: Label Combinations Over Time

  • New label combinations continue to appear for some time.

SLIDE 22

On-line data∗: Label Activity Over Time

  • Labels occur and reoccur in “bursts”
  • → Topic/“burst” detection∗

SLIDE 23

On-line data∗: Label Activity Over Time

  • Labels often co-occur in bursts
  • Labels may be unused for periods of time

SLIDE 24

Other Things I Found

  • Some labels are particularly troublesome
  • Some label combinations are particularly troublesome
  • Some Problem Transformation methods do better or worse depending on variations of:
      – The length and type of text documents
      – The no. of training examples seen
      – The no. of possible labels to choose from
      – The no. of unique combinations of those labels
      – Etc.

SLIDE 25

Future Work

  • Continue analysis
  • Improve Problem Transformation methods
  • Design a novel hierarchical multi-label classification framework for on-line text data streams, able to adapt to and learn from human interference (manual labelling).

SLIDE 26

. . . Questions? . . . Comments?
