

SLIDE 1

Classifier Chains for Multi-label Classification

Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank

University of Waikato New Zealand

ECML PKDD 2009, September 9, 2009. Bled, Slovenia

  • J. Read, B. Pfahringer, G. Holmes, E. Frank (UoW)

Classifier Chains ECML PKDD 2009 1 / 10


SLIDE 4

Introduction

Multi-label Classification

Each instance may be associated with multiple labels:

  • set of instances X = {x1, · · · , xm}
  • set of predefined labels L = {l1, · · · , ln}
  • dataset (x1, S1), (x2, S2), · · · where each Si ⊆ L
  • for example, a film can be labeled {romance, comedy}

Applications

  • scene and video classification
  • text classification
  • medical classification
  • biology, genomics

Multi-label Issues

  • label correlations: consider {romance, comedy} vs {romance, horror}
  • computational complexity



SLIDE 9

Prior Work

Binary relevance method (BR): a binary problem for each label

  • simple, efficient
  • does not take label correlations into account

BR-based and related approaches:

  • nearest-neighbour approaches based on BR, e.g. MLkNN
  • stacking approaches, e.g. meta-level stacking (MS)
  • pairwise approaches, e.g. calibrated label ranking

Label powerset method: label sets are treated as single labels

  • takes label correlations into account
  • computationally complex
  • RAKEL: ensembles of subsets; EPS: ensembles of pruned sets

Many other methods:

  • take label correlations into account
  • complex, prone to overfitting

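The label powerset transformation can be sketched in a few lines. This is a minimal illustration; the helper name `to_powerset` and the toy data are not from the talk:

```python
# Minimal sketch of the label powerset (LP) transformation:
# every distinct label set in the training data becomes one class
# of a single multi-class problem. Helper names are illustrative.

def to_powerset(Y):
    """Map each label set S_i to a class index; return the class ids
    and the list of distinct label sets (the 'powerset' classes)."""
    classes = sorted({frozenset(S) for S in Y}, key=sorted)
    index = {c: k for k, c in enumerate(classes)}
    return [index[frozenset(S)] for S in Y], classes

# Toy dataset: three distinct label sets -> a 3-class problem.
Y = [{"romance", "comedy"}, {"romance"}, {"romance", "comedy"}, {"horror"}]
y, classes = to_powerset(Y)
print(y)             # class index per instance
print(len(classes))  # number of distinct label sets
```

With n labels there are up to 2^n distinct classes, which is the computational-complexity drawback noted above; RAKEL and EPS tame it by ensembling over label subsets or pruning rare sets.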


SLIDE 11

Binary Relevance (BR)

L = {romance, horror, comedy, drama, action, western} (|L| = 6)

Classifier                       Classification
C1 : x → {romance, !romance}     romance
C2 : x → {horror, !horror}       !horror
C3 : x → {comedy, !comedy}       comedy
C4 : x → {drama, !drama}         !drama
C5 : x → {action, !action}       !action
C6 : x → {western, !western}     !western

Y ⊆ L: {romance, comedy}

  • simple, intuitive
  • efficient
  • useful for incremental contexts
  • doesn’t account for label correlations

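As a concrete illustration, BR trains one independent binary model per label. The toy base learner (`OneFeatureStump`) and the tiny dataset below are hypothetical stand-ins for the SVM setup used in the talk:

```python
# Sketch of the binary relevance (BR) method: |L| independent binary
# problems, one per label. The base learner is a toy stand-in.

class OneFeatureStump:
    """Toy base classifier: memorises the single binary feature that
    best matches the target and predicts that feature's value."""
    def fit(self, X, y):
        self.j = max(range(len(X[0])),
                     key=lambda j: sum(x[j] == t for x, t in zip(X, y)))
        return self

    def predict(self, x):
        return x[self.j]

class BinaryRelevance:
    def __init__(self, labels, base=OneFeatureStump):
        self.labels, self.base = labels, base

    def fit(self, X, Y):
        # Y[i] is the label set S_i; build a 0/1 target per label.
        self.models = {l: self.base().fit(X, [int(l in S) for S in Y])
                       for l in self.labels}
        return self

    def predict(self, x):
        # Each binary model votes independently: no label correlations.
        return {l for l, m in self.models.items() if m.predict(x) == 1}

labels = ["romance", "horror", "comedy"]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Y = [{"romance", "comedy"}, {"romance"}, {"horror"}, {"comedy"}]
br = BinaryRelevance(labels).fit(X, Y)
print(br.predict([1, 0, 1]))
```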


SLIDE 13

Classifier Chains (CC)

L = {romance, horror, comedy, drama, action, western} (|L| = 6)

Classifier                                                                    Classification
C1 : x → {romance, !romance}                                                  romance
C2 : x ∪ romance → {horror, !horror}                                          !horror
C3 : x ∪ romance ∪ !horror → {comedy, !comedy}                                comedy
C4 : x ∪ romance ∪ !horror ∪ comedy → {drama, !drama}                         !drama
C5 : x ∪ romance ∪ !horror ∪ comedy ∪ !drama → {action, !action}              !action
C6 : x ∪ romance ∪ !horror ∪ comedy ∪ !drama ∪ !action → {western, !western}  !western

Y ⊆ L: {romance, comedy}

  • similar advantages to the binary relevance method
  • time complexity similar in practice
  • takes label correlations into account
  • open question: how to order the chain?

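The chaining step above can be sketched as follows: during training, classifier Ck sees the original features plus the true values of the labels earlier in the chain; at prediction time it sees the previous classifiers' predictions instead. The base learner and data are toy stand-ins, not the paper's SVM setup:

```python
# Sketch of a classifier chain (CC): the feature vector is extended
# with the binary outputs of the labels earlier in the chain.

class OneFeatureStump:
    """Toy base classifier: predicts the value of the single binary
    feature that best matches the target on the training data."""
    def fit(self, X, y):
        self.j = max(range(len(X[0])),
                     key=lambda j: sum(x[j] == t for x, t in zip(X, y)))
        return self

    def predict(self, x):
        return x[self.j]

class ClassifierChain:
    def __init__(self, labels, base=OneFeatureStump):
        self.labels, self.base = labels, base

    def fit(self, X, Y):
        self.models = []
        Xe = [list(x) for x in X]        # growing feature vectors
        for l in self.labels:
            y = [int(l in S) for S in Y]
            self.models.append(self.base().fit(Xe, y))
            for row, t in zip(Xe, y):    # append the TRUE label values
                row.append(t)
        return self

    def predict(self, x):
        xe, out = list(x), set()
        for l, m in zip(self.labels, self.models):
            p = m.predict(xe)            # uses predicted labels so far
            if p:
                out.add(l)
            xe.append(p)
        return out

labels = ["romance", "horror", "comedy"]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Y = [{"romance", "comedy"}, {"romance"}, {"horror"}, {"comedy"}]
cc = ClassifierChain(labels).fit(X, Y)
print(cc.predict([1, 0, 1]))
```

Training on true label values while predicting with estimated ones is what lets the chain pass correlation information down the line at BR-like cost.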


SLIDE 16

Ensembles of Classifier Chains (ECC)

Ensembles are known for improving accuracy:

  • more label correlations are learnt, without overfitting
  • solves the chain-order issue: each chain has a random order

For i ∈ 1 · · · m iterations:

  • L′ ← shuffled label set L
  • D′ ← subset of the training set D
  • train a model CCi given L′ and D′

Generic vote/score/threshold method for classification:

  • collect votes from the models
  • assign a score to each label
  • apply a threshold to determine the relevant labels

The same ensemble scheme can also be applied to the binary relevance method, i.e. EBR.

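The loop above can be sketched as follows: each of the m chains is trained on a random sample of the data with its own shuffled label order, and the chains' votes are scored and thresholded into a final label set. All class and variable names are illustrative, and the toy stump base learner stands in for the talk's SVMs:

```python
import random

class OneFeatureStump:
    """Toy base classifier: predicts the single binary feature that
    best matches the target on the training data."""
    def fit(self, X, y):
        self.j = max(range(len(X[0])),
                     key=lambda j: sum(x[j] == t for x, t in zip(X, y)))
        return self

    def predict(self, x):
        return x[self.j]

class ClassifierChain:
    def __init__(self, labels):
        self.labels = labels

    def fit(self, X, Y):
        self.models, Xe = [], [list(x) for x in X]
        for l in self.labels:
            y = [int(l in S) for S in Y]
            self.models.append(OneFeatureStump().fit(Xe, y))
            for row, t in zip(Xe, y):
                row.append(t)
        return self

    def predict(self, x):
        xe, out = list(x), set()
        for l, m in zip(self.labels, self.models):
            p = m.predict(xe)
            if p:
                out.add(l)
            xe.append(p)
        return out

class EnsembleOfClassifierChains:
    def __init__(self, labels, m=10, threshold=0.5, seed=42):
        self.labels, self.m = labels, m
        self.threshold, self.rng = threshold, random.Random(seed)

    def fit(self, X, Y):
        self.chains = []
        for _ in range(self.m):
            order = list(self.labels)
            self.rng.shuffle(order)                        # L' <- shuffled L
            idx = [self.rng.randrange(len(X)) for _ in X]  # D' <- sample of D
            Xs, Ys = [X[i] for i in idx], [Y[i] for i in idx]
            self.chains.append(ClassifierChain(order).fit(Xs, Ys))
        return self

    def predict(self, x):
        # Generic vote / score / threshold step.
        votes = {l: 0 for l in self.labels}
        for chain in self.chains:
            for l in chain.predict(x):
                votes[l] += 1
        return {l for l, v in votes.items() if v / self.m >= self.threshold}

labels = ["romance", "horror", "comedy"]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Y = [{"romance", "comedy"}, {"romance"}, {"horror"}, {"comedy"}]
ecc = EnsembleOfClassifierChains(labels).fit(X, Y)
print(ecc.predict([1, 0, 1]))
```

Because every chain uses a different random label order, no single ordering has to be chosen; the ensemble averages over them.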

SLIDE 17

Experiments

WEKA-based framework; Support Vector Machines as base classifiers.

Multi-label datasets:

              Labels |L|   Instances |D|
6 Standard    6 · · · 103    2407 · · · 6000
6 Large       22 · · · 983   7395 · · · 95424

Multi-label evaluation metrics:

  • accuracy, macro F-measure (label-set evaluation)
  • log loss, AU(PRC) (per-label evaluation)
  • build times, test times

Method parameters preset to optimise predictive performance (ECC requires no additional parameters).

Experiments:

  1. Compare Classifier Chains (CC) to the Binary Relevance method (BR) and related BR-based methods.

  2. Compare ECC to EBR and modern methods of proven success: RAKEL, EPS, and MLkNN.

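The label-set metrics listed above can be made concrete. In the multi-label literature, "accuracy" commonly means the mean Jaccard similarity between predicted and true label sets; the definitions below are plausible readings of the slide, not taken from it:

```python
# Hedged sketches of two label-set evaluation metrics; the exact
# definitions used in the talk may differ in detail.

def accuracy(true_sets, pred_sets):
    """Mean Jaccard similarity |Y ∩ Z| / |Y ∪ Z| over all instances;
    an empty union counts as a perfect match."""
    total = 0.0
    for Y, Z in zip(true_sets, pred_sets):
        union = Y | Z
        total += len(Y & Z) / len(union) if union else 1.0
    return total / len(true_sets)

def macro_f1(true_sets, pred_sets, labels):
    """Macro-averaged F-measure: compute F1 per label, then average,
    so rare labels weigh as much as frequent ones."""
    f1s = []
    for l in labels:
        tp = sum(l in Y and l in Z for Y, Z in zip(true_sets, pred_sets))
        fp = sum(l not in Y and l in Z for Y, Z in zip(true_sets, pred_sets))
        fn = sum(l in Y and l not in Z for Y, Z in zip(true_sets, pred_sets))
        f1s.append(2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0)
    return sum(f1s) / len(f1s)

true_sets = [{"romance", "comedy"}, {"horror"}]
pred_sets = [{"romance"}, {"horror"}]
print(accuracy(true_sets, pred_sets))  # (1/2 + 1/1) / 2 = 0.75
```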

SLIDE 18

Results 1

Comparing CC to BR and the related methods SM¹ and MS².

Table: Standard Datasets: wins for each evaluation measure.

CC takes 19 of the 24 wins (Accuracy 5, Macro F1 5, Micro F1 3, Exact Match 6); BR takes 1, and SM and MS take 2 each.

  • CC’s chaining technique is justified over default BR
  • CC outperforms other similar methods

¹Subset Mapping: maps the output of BR to the nearest (by Hamming distance) known label subset.
²Meta Stacking: stacks the output of BR with meta-classifiers.


SLIDE 19

Results 1

Comparing build times for CC, BR, SM, and MS.

Figure: Standard Datasets: build times (seconds) for CC, BR, SM, and MS on Scene, Yeast, Slashdot, Medical, Enron, and Reuters.

  • CC’s complexity is comparable to BR’s
  • except for special cases like Medical (relatively large label set)


SLIDE 20

Results 2

Comparing ECC to EBR and methods of proven success: RAKEL³, EPS⁴, and MLkNN⁵.

Table: Standard Datasets: wins for each evaluation measure.

ECC takes the most wins: 9 in total (Accuracy 2, Macro F1 1, Log Loss 3, AU(PRC) 3); EBR, RAKEL, EPS, and MLkNN share the remaining 15.

  • ECC is best at per-label prediction (as a binary method)
  • other methods can sometimes predict better label sets
  • ECC is rewarded for conservative prediction (log loss)

³Tsoumakas and Vlahavas, 2007.
⁴Read, Pfahringer, Holmes, 2008.
⁵Zhang and Zhou, 2005.


SLIDE 21

Results 2

Comparing ECC to EBR, RAKEL, EPS, and MLkNN.

Table: Large Datasets: wins for each evaluation measure.

ECC takes the most wins: 12 in total (Accuracy 4, Macro F1 3, Log Loss 1, AU(PRC) 4); MLkNN takes 8, EPS 2, and EBR and RAKEL 1 each.

Note: 2 DNF (did not finish) for RAKEL and 1 DNF for EPS.

  • binary methods are the best choice for large datasets
  • ECC is best overall


SLIDE 22

Results 2

Comparing build and test times between ECC, RAKEL, and EPS.

Table: All Datasets: method with the fastest build / test time†.

Dataset     Build  Test
Scene       EPS    RAK
Yeast       ECC    ECC
Slashdot    RAK    RAK
Medical     RAK    RAK
Enron       EPS    ECC
Reuters     ECC    ECC
OHSUMED     ECC    ECC
TMC2007     EPS    ECC
Bibtex      ECC    ECC
MediaMill   ECC    ECC
IMDB        RAK    ECC
Delicious   EPS    EPS

†EBR and MLkNN not included.

  • ECC’s efficiency is most noticeable on the larger datasets
  • RAKEL is most efficient on the smaller datasets
  • EPS can make large gains by pruning, but occasionally prunes too much



SLIDE 24

Conclusion

Ensembles of Classifier Chains

  • classifier chains improve on the binary relevance method
  • take label correlations into account without overfitting
  • flexible, efficient
  • perform well, especially on large data sets

Thank you. Any questions?
