On-line Hierarchical Multi-label Text Classification, by Jesse Read (PowerPoint presentation)
  1. On-line Hierarchical Multi-label Text Classification
     Jesse Read
     Supervised by Bernhard (and Eibe and Geoff)

  2. Multi-label Classification
Multi-class ("single-label") classification, e.g.:
  Class set C = { Sports, Environment, Science, Politics }
  For a text document d, select one class c ∈ C.
Multi-label classification, e.g.:
  Label set L = { Sports, Environment, Science, Politics }
  For a text document d, select a label subset S ⊆ L.

  Doc.  Labels (S ⊆ L)
  1     { Sports, Politics }
  2     { Science, Politics }
  3     { Sports }
  4     { Environment, Science }

...how do we do multi-label classification?
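The subset-of-labels representation above can be pictured as a 0/1 indicator vector per document, one column per label. A minimal sketch (the variable names are illustrative, not from the slides):

```python
# Representing the toy multi-label data as binary indicator vectors.
LABELS = ["Sports", "Environment", "Science", "Politics"]

docs = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    3: {"Sports"},
    4: {"Environment", "Science"},
}

def to_indicator(label_set):
    """Map a label subset S of L to a 0/1 vector over L."""
    return [1 if label in label_set else 0 for label in LABELS]

for doc_id, s in sorted(docs.items()):
    print(doc_id, to_indicator(s))  # e.g. doc 1 -> [1, 0, 0, 1]
```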

  3. Problem Transformation Methods (PT)
Transforming a multi-label problem into a multi-class problem without losing information:
  1. (LC) Label Combination method
  2. (BC) Binary Classifiers method
  3. (RT) Ranking Threshold method
Our toy multi-label problem: label set L = { Sports, Environment, Science, Politics }
  Doc.  Labels (S ⊆ L)
  1     { Sports, Politics }
  2     { Science, Politics }
  3     { Sports }
  4     { Environment, Science }

  4. 1. Label Combination Method (LC)
Train:
  Doc.  Class
  1     Sports+Politics
  2     Science+Politics
  3     Sports
  4     Science+Environment
Test:
  Doc.  Class
  X     ?
• May generate many classes for few documents
• Possibly inflexible for time-ordered data
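The LC transformation in the table above amounts to treating each distinct label subset as one atomic class. A sketch of that mapping (function names are mine, not from the slides):

```python
# Label Combination (LC): each distinct label subset becomes one class
# of an ordinary multi-class problem; predictions decode back to sets.
docs = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    3: {"Sports"},
    4: {"Environment", "Science"},
}

def combine(label_set):
    # Sort to get a canonical class name, e.g. "Politics+Sports".
    return "+".join(sorted(label_set))

def decode(cls):
    # Invert the combination: split the class name back into labels.
    return set(cls.split("+"))

train = {d: combine(s) for d, s in docs.items()}
# A multi-class learner now predicts one of len(set(train.values()))
# combined classes; its output is decoded with decode().
```

Note the drawback from the slide: the number of classes grows with the number of distinct subsets seen in training, and a subset never seen cannot be predicted.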

  5. 2. Binary Classifiers Method (BC)
Train (one binary classifier B_l per label l):
  Doc.  B_Sports  B_Environment  B_Science  B_Politics
  1     1         0              0          1
  2     0         0              1          1
  3     1         0              0          0
  4     0         1              1          0
Test:
  Doc.  B_Sports  B_Environment  B_Science  B_Politics
  X     ?         ?              ?          ?
• Slow: needs |L| classifiers
• Assumes that all labels are independent
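The four binary columns above can be generated mechanically from the label sets: each label gets its own in/out target. A sketch, with illustrative names:

```python
# Binary Classifiers (BC): one 0/1 training target per label; at test
# time every classifier votes its label in or out.
LABELS = ["Sports", "Environment", "Science", "Politics"]
docs = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    3: {"Sports"},
    4: {"Environment", "Science"},
}

def binary_targets(label):
    """0/1 target per document for the classifier B_label."""
    return {d: int(label in s) for d, s in docs.items()}

def predict(doc_features, classifiers):
    """Run all |L| classifiers; keep the labels predicted 1."""
    return {l for l in LABELS if classifiers[l](doc_features) == 1}
```

This makes the slide's two drawbacks concrete: |L| separate models must be trained, and each decides about its label with no knowledge of the others.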

  6. 3. Ranking Threshold Method (RT)
Train (one row per document–label pair):
  Doc.  Class
  1     Sports
  1     Politics
  2     Science
  2     Politics
  3     Sports
  4     Science
  4     Environment
Test:
  Doc.  Certainty distribution
  X     (Y_w, Y_x, Y_y, Y_z) = (?, ?, ?, ?)
• Difficulty in selecting a threshold
• Assumes that all labels are independent
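The certainty distribution in the test row is turned into a label set by keeping every label whose score clears a threshold. A sketch; the threshold value and scores below are made-up illustrations, which is exactly the "difficulty in selecting a threshold" the slide flags:

```python
# Ranking Threshold (RT): the multi-class learner outputs a certainty
# per label; labels scoring at least t form the predicted set.
def rank_threshold(certainty, t=0.25):
    """certainty: dict mapping label -> score in [0, 1]."""
    return {label for label, p in certainty.items() if p >= t}

scores = {"Sports": 0.45, "Environment": 0.05,
          "Science": 0.10, "Politics": 0.40}
predicted = rank_threshold(scores)  # keeps Sports and Politics
```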

  7. Algorithm Adaptation Methods
We have seen the three main Problem Transformation methods. There are also Algorithm Adaptation methods, for example:
• Modifying the entropy of J48
• Multiple actions for Association Rules
• AdaBoost.MH, AdaBoost.MR
• Modifications to SMO, kNN, ...
Most algorithm adaptation methods use a problem transformation method internally, e.g. Association Rules → LC; AdaBoost.MH → "AdaBoost Transformation" (AT); AdaBoost.MR → RT.
...what about hierarchy?

  8. Hierarchical Classification
Hierarchical classification includes some method to recognise relationships between labels. For text data, we recognise a tree-structured topic hierarchy, known as a taxonomy. There are two approaches to hierarchical classification:
• Global hierarchical (a.k.a. the "big bang" approach)
• Local hierarchical (a.k.a. the "top down" approach)

  9. Global Hierarchical
[Taxonomy figure: root → Americas (US, Canada), Mid.East (Iraq, Iran), Sports (Soccer, Rugby), Sci/Tech]
+ Improvements in accuracy
− Difficult to maintain; can get very computationally complex
E.g.:
• Stacking (e.g. on BC)
• EM (e.g. on LC)
• Boosting (e.g. with AT)
• Association Rules
• Predictive Clustering Trees (multi-label tree learners)

  10. Local Hierarchical
[Taxonomy figure: root → Americas (US, Canada), Mid.East (Iraq, Iran), Sports (Soccer, Rugby), Sci/Tech]
+ Divides up the problem: easy to maintain; intuitive
− Error propagation; accuracy similar to flat PT
E.g.:
• Pachinko Machine, e.g. Fuzzy Relational Thesauri (FRT)
• Probabilistic
• Hybrid: ECOC, error recovery, can return to higher nodes
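The "top down" routing through the taxonomy can be sketched as a loop that consults a per-node classifier until it reaches a leaf. The node names follow the slide's taxonomy figure; the classifier interface is an assumption for illustration:

```python
# Local ("top down") hierarchical classification: a classifier at each
# internal node routes the document to one child; repeat until a leaf.
TAXONOMY = {
    "root": ["Americas", "Mid.East", "Sci/Tech", "Sports"],
    "Americas": ["US", "Canada"],
    "Mid.East": ["Iraq", "Iran"],
    "Sports": ["Soccer", "Rugby"],
}

def classify_top_down(doc, node_classifiers, node="root"):
    """node_classifiers[node](doc) must return one child of `node`."""
    while node in TAXONOMY:   # descend until we hit a leaf topic
        node = node_classifiers[node](doc)
    return node
```

The loop also makes the slide's minus point visible: a wrong choice at "root" can never be repaired lower down (error propagation), unless a hybrid scheme allows returning to higher nodes.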

  11. Multi-label Datasets
  Key    |D|     |L|  UC(D,L)  LC(D,L)  Hier.  Seq.  Text
  YEAST  2,417   14   198      4.24     N      N     N
  MEDC   978     45   94       1.25     N      N     Y
  20NG   19,300  20   55       1.03     Y      Y     Y
  ENRN   1,702   53   753      3.38     Y      Y     Y
  MARX   3,617   101  208      1.13     Y      Y     Y
  REUT   6,000   103  811      1.46     Y      N     Y
|D| = number of documents
|L| = number of possible labels
UC(D,L) = |{ S ⊆ L : ∃ d ∈ D, L(d) = S }| (number of distinct label combinations)
LC(D,L) = (1/|D|) Σ_{i=1}^{|D|} |S_i|, for examples (d_i, S_i) with S_i ⊆ L (average labels per document)
Hier. = hierarchical structure defined within dataset
Seq. = time-ordered data
Text = text dataset
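The two dataset statistics are straightforward to compute from the label sets; a sketch on the toy data from the earlier slides (function names are mine):

```python
# UC: number of distinct label combinations occurring in the data.
# LC: average number of labels per document (label cardinality).
docs = [
    {"Sports", "Politics"},
    {"Science", "Politics"},
    {"Sports"},
    {"Environment", "Science"},
]

def unique_combinations(dataset):
    return len({frozenset(s) for s in dataset})

def label_cardinality(dataset):
    return sum(len(s) for s in dataset) / len(dataset)

print(unique_combinations(docs), label_cardinality(docs))  # -> 4 1.75
```

On the toy problem every document has a different subset (UC = 4) and 1.75 labels on average; a dataset like YEAST (LC = 4.24) is far more densely labelled than 20NG (LC = 1.03).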

  12. Multi-label Evaluation
• Percentage of correctly classified instances? – Too harsh
• Percentage of correctly classified labels? – Too easy
Let C be a multi-label classifier, S_i ⊆ L the true labels, and Y_i = C(x_i) the labels predicted by C for document x_i:

  Accuracy(C, D) = (1/|D|) Σ_{i=1}^{|D|} |S_i ∩ Y_i| / |S_i ∪ Y_i|    (1)

Hierarchical evaluation:
• Should we give partial credit?
• If so, how?
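Equation (1) is the mean Jaccard overlap between true and predicted label sets, which sits between the two rejected extremes: exact-match is too harsh, per-label accuracy too easy. A direct sketch:

```python
# Multi-label accuracy from Eq. (1): mean |S ∩ Y| / |S ∪ Y| over all
# documents. Both sets empty is treated as a perfect (vacuous) match.
def ml_accuracy(true_sets, pred_sets):
    total = 0.0
    for s, y in zip(true_sets, pred_sets):
        union = s | y
        total += len(s & y) / len(union) if union else 1.0
    return total / len(true_sets)

S = [{"Sports", "Politics"}, {"Science"}]
Y = [{"Sports"}, {"Science", "Environment"}]
print(ml_accuracy(S, Y))  # (1/2 + 1/2) / 2 -> 0.5
```

A half-right prediction scores 0.5 rather than 0 (exact match) or ~0.75+ (per-label counting), which is the partial credit the slide is after; how to extend that credit along the taxonomy is the open question it poses.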

  13. Algorithms
Multi-class algorithms commonly used in prior multi-label work:
  Key   Type      Description
  NB    Bayes     Naïve Bayes
  BAG   Meta      Bagging (with J48)
  SMO   Function  Support Vector Machines
  J48   Tree      J48 decision trees
  IBk   kNN       k Nearest Neighbour
  NN    Neural    Neural Networks
Pilot experiments showed that:
• Default NN is too slow
• IBk does not perform well with sparse data

  14. Experiments — Tables
Flat vs Global Hierarchical vs Local Hierarchical (accuracy %, * = best results)

1. Problem Transformation (flat)
        LC                           BC                           RT
        NB     BAG    SMO    J48     NB     BAG    SMO    J48     NB     BAG    SMO    J48
  MEDI  68.05  71.77* 71.10* 72.13*  55.82  75.58* 73.59* 65.83   67.81  64.20  65.72  60.22
  20NG  57.47* 57.58* 57.35* 52.74   32.33  -      47.67  41.09   56.05* 47.19  54.61* 50.55
  ENRN  32.72* 25.42  -      22.96   21.82  31.35* 30.56* 26.26   15.16  30.25* 24.09  27.82
  MARX  48.15* 48.93* 43.26  44.79   32.6   31.69  38.64  33.95   48.44* 36.07  40.46  38.71
  REUT  43.76  51.47  -      41.68   18.21  44.09  56.23* 43.83   37.13  45.9   58.65* 45.31

2. Global Hierarchical
        LC-EM                        BC-Stack(RT-NB)              AT
        NB     BAG    SMO    J48     NB     BAG    SMO    J48     BAG    J48
  MEDI  67.45  74.71* 70.75  72.31   56.09  70.76  73.65* 65.85   67.06  67.82
  20NG  57.48* 57.58* 57.45* 53.39   29.8   -      49.06  40.88   -      -
  ENRN  34.6*  25.46  -      23.31   20.66  31.79  27.01  25.35   -      -
  MARX  48.18  50.64* 43.29  44.82   39.09  32.08  38.87  34.25   -      -
  REUT  43.77  51.49* -      41.69   19.78  43.83  57.32* 43.68   -      -

3. Local Hierarchical
        LC                           BC                           RT
        NB     BAG    SMO    J48     NB     BAG    SMO    J48     NB     BAG    SMO    J48
  20NG  56.49  58.31* 58.83* 53.48   43.68  -      52.44  42.03   54.87  40.58  53.37  49.26
  ENRN  25.96  29.38  27.73  25.23   15.3   34.99* -      26.26   4.67   25.51  23.59  27.63
  MARX  48.49  54.57* 42.4   46.84   41.69  38.67  40.34  38.65   46.44  33.59  38.32  41.23

  15. Experiments — 20NG — Accuracy
[Plot: accuracy (0–100) vs. labeled (training) examples (10 to 100,000, log scale) for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48]

  16. Experiments — 20NG — Build Time
[Plot: build time (0–12,000) vs. labeled (training) examples (10 to 100,000, log scale) for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48]

  17. Experiments — ENRN — Accuracy
[Plot: accuracy (0–100) vs. labeled (training) examples (10 to 10,000, log scale) for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48]

  18. Experiments — ENRN — Build Time
[Plot: build time (0–4,500) vs. labeled (training) examples (10 to 10,000, log scale) for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48]

  19. Experiments — MARX — Accuracy
[Plot: accuracy (0–100) vs. labeled (training) examples (10 to 10,000, log scale) for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48]

  20. Experiments — MARX — Build Time
[Plot: build time (0–1,400) vs. labeled (training) examples (10 to 10,000, log scale) for LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO and GH.AT_J48]

  21. Conclusions
Problem Transformation methods:
• No problem transformation method is best on all datasets
• BC and RT might do better with a better-selected |S|
• Complexity is determined by D, L, LC(D,L) and UC(D,L)
Multi-class algorithms:
• J48 is not that great
• BC doesn't go well with Naïve Bayes, RT does, and LC works equally well with either
Hierarchical:
• Global PT-extensions improve on flat classification
• Local hierarchical classifiers carry build overhead in practice, but are in theory more flexible
