SLIDE 1
On-line Hierarchical Multi-label Text Classification
Jesse Read
September 7, 2007
On-line Hierarchical Multi-label Text Classification 1
SLIDE 2 The Problem
Learning to automatically classify text documents, e.g.:
- Emails
- News Articles, Current Events (websites, RSS feeds)
- “Folksonomies” (Wikipedia, CiteULike)
- Bookmarks (Web browser, del.icio.us, Google Bookmarks)
- Other (e.g. File System, Medical Text Classification)
Each of these examples is (or could be):
- Text
- Multi-label
- Organised in a Hierarchy
- On-line / Streamed (not Batch Learning)
- Affected by Human Interaction
SLIDE 3
Multi-label Classification
Given a label set L = {Sports, Environment, Science, Politics}:

“Single-label” (Multi-class) Classification: for a text document d, the task is to select a single label l ∈ L.

Multi-label Classification: for a text document d, select a label subset S ⊆ L. E.g.:

Example      Labels (S ⊆ L)
Document 1   {Sports, Politics}
Document 2   {Science, Politics}
Document 3   {Sports}
Document 4   {Environment, Science}
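The distinction above can be made concrete with a small sketch (plain Python; the 0/1 indicator encoding is a common convention for multi-label data, not something from the slides):

```python
# Label set L and Document 1's assignment from the example table.
L = ["Sports", "Environment", "Science", "Politics"]
S = {"Sports", "Politics"}  # a multi-label assignment, S ⊆ L

# Single-label (multi-class): exactly one l ∈ L per document.
single = "Sports"
assert single in L

# Multi-label: any subset of L, often encoded as a 0/1 indicator vector.
y = [1 if label in S else 0 for label in L]
assert y == [1, 0, 0, 1]
```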
SLIDE 4 Multi-label Classification
Done by transforming a multi-label problem into a single-label problem, i.e. with a Problem Transformation method:
- 1. (LC) Label Combination Method
- 2. (BC) Binary Classifiers Method
- 3. (RT) Ranking Threshold Method
Then employ a standard single-label algorithm on the resulting data, e.g. Naive Bayes, C4.5, Bagging with C4.5, Support Vector Machines, k-Nearest Neighbour, Neural Networks, AdaBoostM1. Finally, transform the result back to a multi-label representation.
SLIDE 5
- 1. Label Combination Method (LC)
Each combination of labels becomes a single label. A single-label classifier C learns to classify from the resulting combinations: one decision per document. E.g.:
(C) Document X belongs to either Sports+Politics
    or Science+Politics
    or Sports
    or Science+Environment
- May generate many unique combinations from only a few documents
- What if a document about Sports and Science turns up?
- Can run very slowly if the no. of unique combinations grows large
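A minimal sketch of the LC transformation (plain Python; the single-label classifier itself is stubbed out, only the label handling is shown):

```python
# Training label sets, as in the earlier example slide.
label_sets = [
    {"Sports", "Politics"},
    {"Science", "Politics"},
    {"Sports"},
    {"Environment", "Science"},
]

def combine(label_set):
    # Forward transform: one atomic class name per unique label set.
    return "+".join(sorted(label_set))

def split(class_name):
    # Inverse transform: decode a predicted class back into a label set.
    return set(class_name.split("+"))

classes = [combine(s) for s in label_sets]
# A single-label classifier C would now be trained on `classes`,
# and its prediction decoded with split():
assert split("Politics+Sports") == {"Sports", "Politics"}

# The drawback from the bullets: {"Sports", "Science"} never appeared
# as a combined class above, so C could never predict it.
```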
SLIDE 6
- 2. Binary Classifiers Method (BC)
Single-label (binary) classifiers are created for each possible label: multiple decisions per document. E.g. four classifiers C1 · · · C4, one for each label. Document X:
(C1) belongs to Sports? YES/NO
(C2) belongs to Environment? YES/NO
(C3) belongs to Science? YES/NO
(C4) belongs to Politics? YES/NO
- Slow: needs as many classifiers as there are labels
- Assumes that all labels are independent
- Often far too many labels are selected
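The BC transformation can be sketched as follows (plain Python; the documents and keyword "classifiers" are toy stand-ins, only the data transformation and recombination of YES/NO decisions are the point):

```python
label_space = ["Sports", "Environment", "Science", "Politics"]

# Toy training data: (document, label set) pairs.
train = [
    ("match goal referee", {"Sports"}),
    ("election vote match", {"Sports", "Politics"}),
]

# One binary training set per label: (document, label present?).
binary_sets = {
    label: [(doc, label in s) for doc, s in train]
    for label in label_space
}
assert binary_sets["Politics"] == [("match goal referee", False),
                                   ("election vote match", True)]

# At prediction time all |L| classifiers answer YES/NO independently,
# and the YES answers form the multi-label prediction.
def predict(doc, classifiers):
    return {label for label, clf in classifiers.items() if clf(doc)}

# Toy "classifiers": keyword matching stands in for trained models.
keywords = {"Sports": "match", "Environment": "forest",
            "Science": "lab", "Politics": "vote"}
toy = {label: (lambda doc, k=kw: k in doc)
       for label, kw in keywords.items()}
assert predict("vote on the match", toy) == {"Sports", "Politics"}
```

Note that each binary decision is made independently, which is exactly the label-independence assumption criticised in the bullets above.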
SLIDE 7
- 3. Ranking Threshold Method (RT)
A single-label classifier C outputs a ranking of its confidence for each label. E.g., with threshold = 80.0%, for Document X:
(C) is 95.5% likely to belong to Science
(C) is 81.2% likely to belong to Environment
(C) is 60.9% likely to belong to Sports
(C) is 21.3% likely to belong to Politics
- Not all single-label classifiers can output their “confidence”
- Assumes that all labels are independent
- Difficulty in selecting a good threshold
- Often the threshold admits far too many labels
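The RT decision rule is simply a cut on the ranked confidences (a sketch using the numbers from this slide; the confidence values would come from a real classifier):

```python
# Confidence ranking output by C for Document X (values from the slide).
confidence = {
    "Science":     0.955,
    "Environment": 0.812,
    "Sports":      0.609,
    "Politics":    0.213,
}

THRESHOLD = 0.80  # choosing this value well is itself a problem

# Keep every label whose confidence clears the threshold.
predicted = {label for label, p in confidence.items() if p >= THRESHOLD}
assert predicted == {"Science", "Environment"}
```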
SLIDE 8
Hierarchical Classification (Option 1 - Global)
Uses one Problem Transformation method and one single-label classifier. Information about the hierarchy is incorporated into the process.
[Tree diagram: root → Americas (US, Canada), Mid.East (Iraq, Iran), Sports (Soccer, Rugby), Sci/Tech]
+ Higher accuracy
− Can run very slowly and use up a lot of memory
− Difficult to maintain; inflexible
SLIDE 9
Hierarchical Classification (Option 2 - Local)
Each internal node has its own Problem Transformation method and classifier.
[Tree diagram: root → Americas (US, Canada), Mid.East (Iraq, Iran), Sports (Soccer, Rugby), Sci/Tech]
+ Divides up the problem: easy to maintain; efficient; intuitive
− Error propagation; accuracy unimpressive
− Overhead involved in setting up the hierarchical structure
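The local scheme can be sketched as recursive top-down routing from the root; the node-local classifier is stubbed out here with keyword matching (any Problem Transformation method could sit behind `route`):

```python
# Hierarchy from the slide: internal node -> children.
tree = {
    "root":     ["Americas", "Mid.East", "Sports", "Sci/Tech"],
    "Americas": ["US", "Canada"],
    "Mid.East": ["Iraq", "Iran"],
    "Sports":   ["Soccer", "Rugby"],
}

def classify(doc, node, route):
    """Descend from `node`, asking the node-local classifier `route`
    which children the document belongs under; leaves are returned.
    Note how a wrong decision at an internal node propagates to all
    leaves beneath it (the error-propagation drawback above)."""
    if node not in tree:          # leaf node
        return {node}
    labels = set()
    for child in route(doc, node):
        labels |= classify(doc, child, route)
    return labels

# Toy node-local "classifier": route by keyword match on the child name.
route = lambda doc, node: [c for c in tree[node] if c.lower() in doc]
assert classify("sports rugby news", "root", route) == {"Rugby"}
```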
SLIDE 10
Experiments — 20Newsgroups — Accuracy
[Plot: accuracy vs. labeled (training) examples; series: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]
SLIDE 11
Experiments — 20Newsgroups — Build Time
[Plot: build time vs. labeled (training) examples; series: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]
SLIDE 12
Experiments — Enron — Accuracy
[Plot: accuracy vs. labeled (training) examples; series: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]
SLIDE 13
Experiments — Enron — Build Time
[Plot: build time vs. labeled (training) examples; series: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]
SLIDE 14
Experiments — NewsArticles — Accuracy
[Plot: accuracy vs. labeled (training) examples; series: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]
SLIDE 15
Experiments — NewsArticles — Build Time
[Plot: build time vs. labeled (training) examples; series: LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, GH.AT_J48]
SLIDE 16 Initial Conclusions
Performance is poor.
- All Problem Transformation methods have significant disadvantages
- Multi-label data is more complex than single-label data
- Multi-label text datasets can be very different; no method is best for all
- On-line data is invariably susceptible to “Concept Drift”
- . . . but it is very costly to build/rebuild classifiers
SLIDE 17 Current Work
- Analysis and modelling of on-line hierarchical multi-label text data
- Analysing the performance/flaws of Problem Transformation methods
- Investigating adaptive and incremental learning methods
SLIDE 18 “Multi-label-ness”: Documents per Label
- 80/20 rule: typically, most labels are not used very often.
SLIDE 19 “Multi-label-ness”: Labels per Document
- Most documents have only a few labels.
SLIDE 20 On-line data: Creation of Labels Over Time
- Most labels are used for the first time (created) very early on.
SLIDE 21 On-line data: Label Combinations Over Time
- New label combinations continue to appear for some time.
SLIDE 22 On-line data∗: Label Activity Over Time
- Labels occur and reoccur in “bursts”
- → Topic/“burst” detection∗
SLIDE 23 On-line data∗: Label Activity Over Time
- Labels often co-occur in bursts
- Labels may go unused for long periods of time
SLIDE 24 Other Things I Found
- Some labels are particularly troublesome
- Some label combinations are particularly troublesome
- Some Problem Transformation methods do better or worse depending on variations in:
  – The length and type of text documents
  – The no. of training examples seen
  – The no. of possible labels they can choose from
  – The no. of unique combinations of those labels
  – Etc.
SLIDE 25 Future Work
- Continue analysis
- Improve Problem Transformation methods
- Design a novel hierarchical multi-label classification framework for on-line text data streams, able to adapt to and learn from human interaction (manual labelling)
SLIDE 26
. . . Questions? . . . Comments?