Multi-label Learning: Trees, Embeddings, and much more! (PowerPoint PPT Presentation)
Purushottam Kar, SIGML (Special Interest Group in Machine Learning), Department of CSE, IIT Kanpur


SLIDE 1

Multi-label Learning

Trees, Embeddings, and much more!

Purushottam Kar

Department of CSE IIT Kanpur

SIGML

Special Interest Group in Machine Learning

SLIDE 2

Classification Paradigms

Binary (pick one): Label 1, Label 2
Multi-class (pick one): Label 1, Label 2, Label 3, Label 4, …, Label L
Multi-label (pick all applicable): Label 1, Label 2, Label 3, Label 4, …, Label L

SLIDE 3

Classification Paradigms

Binary (pick one), Multi-class (pick one), Multi-label (pick all applicable)

SLIDE 4

Examples

SLIDE 5

eXtreme Multi-label Classification

Which items would this user buy? (Figure: bipartite graph linking Users to Items)

SLIDE 6

eXtreme Multi-label Classification

Who is present in this selfie?

SLIDE 7

eXtreme Multi-label Classification

Dances by name, Indian culture, Performing arts in India, South India, Tamil culture

SLIDE 8

Challenges and Opportunities in Multi-label Learning

  • Exploit label correlations
  • Problem not as large as it seems
  • Missing labels in training and test set
  • Appropriate training and evaluation?
  • Novelty and Diversity in predicted set of labels?
  • Useful in recommendation and tagging tasks
SLIDE 9

Evaluation Techniques

An Invitation to Optimization Connoisseurs

SLIDE 10

Classification Metrics

(Figure: Venn diagram of the truth set y and the predicted set ŷ)

SLIDE 11

Hamming Loss

  • (|y| + |ŷ| − 2|y ∩ ŷ|)/L = |y Δ ŷ|/L = 3/13 ≈ 0.23

  • Symmetric difference
  • What if |y| >> |ŷ| ?

(Figure: Venn diagram of the truth set y and the predicted set ŷ)
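The computation above can be checked in a few lines of Python. The concrete label sets below are assumptions chosen only to match the slide's counts (|y| = 4, |ŷ| = 3, |y ∩ ŷ| = 2, L = 13):

```python
# Hamming loss: size of the symmetric difference y Δ ŷ, normalized by L.
# The sets are assumed; only their sizes are taken from the slide.
L = 13
y = {2, 5, 10, 11}     # true ("on") labels
y_hat = {2, 5, 7}      # predicted labels

hamming = len(y ^ y_hat) / L   # |y Δ ŷ| / L = 3/13
print(round(hamming, 2))       # 0.23
```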

SLIDE 12

Precision

  • |y ∩ ŷ|/|ŷ| = 2/3 ≈ 0.67

(Figure: Venn diagram of the truth set y and the predicted set ŷ)

SLIDE 13

Recall

  • |y ∩ ŷ|/|y| = 2/4 = 0.5

  • What if |y| >> |ŷ| ?

(Figure: Venn diagram of the truth set y and the predicted set ŷ)

SLIDE 14

F-measure

  • Harmonic mean of precision and recall
  • 2|y ∩ ŷ|/(|y| + |ŷ|) = 4/7 ≈ 0.57

  • What if |y| >> |ŷ| ?

(Figure: Venn diagram of the truth set y and the predicted set ŷ)

SLIDE 15

Jaccard Index

  • |y ∩ ŷ|/|y ∪ ŷ|

= 2/5 = 0.4 (this is the Jaccard similarity; the Jaccard distance is 1 minus it)

  • What if |y| >> |ŷ| ?

(Figure: Venn diagram of the truth set y and the predicted set ŷ)
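All four set-based metrics above can be reproduced on one example. As before, the concrete sets are assumptions matching the slides' counts (|y| = 4, |ŷ| = 3, |y ∩ ŷ| = 2):

```python
# Precision, recall, F-measure, and Jaccard index for one (y, y_hat) pair.
y = {2, 5, 10, 11}
y_hat = {2, 5, 7}

inter = len(y & y_hat)                           # |y ∩ y_hat| = 2
precision = inter / len(y_hat)                   # 2/3
recall = inter / len(y)                          # 2/4
f_measure = 2 * inter / (len(y) + len(y_hat))    # 4/7 (harmonic mean)
jaccard = inter / len(y | y_hat)                 # 2/5
print(round(precision, 2), round(recall, 2),
      round(f_measure, 2), round(jaccard, 2))    # 0.67 0.5 0.57 0.4
```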

SLIDE 16

Classification Metrics

(Figure: Venn diagram of the truth set y and the predicted set ŷ)

  • Of these, only precision seems to be (mildly) appropriate for cases with
  • an eXtremely large number of labels
  • smaller prediction budgets
  • missing labels in truth
SLIDE 17

Ranking Metrics

(Figure: labels 1–13; predicted ranking: 2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12; true labels: 2, 5, 10, 11)

SLIDE 18

Precision@k

(Figure: labels 1–13; predicted ranking: 2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12; true labels: 2, 5, 10, 11)

  • Precision@1 = 100%
  • Precision@2 = 50%
  • Precision@3 = 67%
  • Very appropriate for budget-constrained prediction settings
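A small sketch of the computation: the predicted order is taken from the slide, and the true label set {2, 5, 10, 11} is inferred from the precisions the slide reports.

```python
# Precision@k: fraction of the top-k ranked labels that are truly "on".
def precision_at_k(ranking, truth, k):
    return sum(1 for label in ranking[:k] if label in truth) / k

ranking = [2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12]
truth = {2, 5, 10, 11}

print(precision_at_k(ranking, truth, 1))  # 1.0
print(precision_at_k(ranking, truth, 2))  # 0.5
```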

SLIDE 19

Mean Average Precision

(Figure: labels 1–13; predicted ranking: 2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12; true labels: 2, 5, 10, 11)

  • Precision@1 = 100%
  • Precision@2 = 50%
  • Precision@13 = 4/13 ≈ 30.8%
  • MAP (average of Precision@k over k = 1, …, 13) = 46.56%
  • Usefulness for large L??
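The slide's 46.56% is reproduced by averaging Precision@k over all cutoffs k = 1..L, which appears to be the definition in use here (an inference from the numbers shown; the ranking and true set are the same assumptions as before):

```python
# MAP as used on this slide: the average of Precision@k over k = 1..L.
ranking = [2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12]
truth = {2, 5, 10, 11}

def p_at_k(k):
    return sum(1 for label in ranking[:k] if label in truth) / k

map_score = sum(p_at_k(k) for k in range(1, len(ranking) + 1)) / len(ranking)
print(round(100 * map_score, 2))  # 46.56
```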
SLIDE 20

Area under the ROC curve

(Figure: labels 1–13; predicted ranking: 2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12; true labels: 2, 5, 10, 11)

  • Count mis-orderings
  • For 2: none
  • For 5: 1
  • For 11: 4
  • For 10: 5
  • Total violations: 10
  • AUC = 1 − 10/(4 × 9) = 26/36 ≈ 0.72
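The violation count can be verified directly: an (on, off) pair is mis-ordered when the "off" label is ranked above the "on" label, and there are 4 × 9 = 36 such pairs in total (same assumed ranking and true set as on the previous slides):

```python
# AUC = 1 minus the fraction of mis-ordered (on, off) label pairs.
ranking = [2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12]
truth = {2, 5, 10, 11}
pos = {label: rank for rank, label in enumerate(ranking)}

violations = sum(1 for on in truth
                 for off in ranking
                 if off not in truth and pos[off] < pos[on])
n_pairs = len(truth) * (len(ranking) - len(truth))   # 4 * 9 = 36
auc = 1 - violations / n_pairs
print(violations, round(auc, 2))  # 10 0.72
```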

SLIDE 21

Mean Reciprocal Rank

(Figure: labels 1–13; predicted ranking: 2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12; true labels: 2, 5, 10, 11)

  • Penalize rankings that rank “on” labels low

  • Rank of 2 = 1
  • Rank of 5 = 3
  • Rank of 11 = 7
  • Rank of 10 = 9
  • MRR = ¼ × (1/1 + 1/3 + 1/7 + 1/9) ≈ 0.397 ≈ 1/2.52
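The same average in code. Note that the variant on this slide averages the reciprocal ranks of all "on" labels; textbook MRR keeps only the first relevant one (the ranking and true set are the same assumptions as before):

```python
# Mean reciprocal rank over ALL "on" labels (ranks 1, 3, 7, 9 here).
ranking = [2, 4, 5, 13, 8, 6, 11, 3, 10, 1, 7, 9, 12]
truth = {2, 5, 10, 11}

mrr = sum(1 / (ranking.index(label) + 1) for label in truth) / len(truth)
print(round(mrr, 3))  # 0.397
```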

SLIDE 22

Solution Strategies

a.k.a. how to compress a decade’s worth of literature into an hour-long talk

SLIDE 23

Notation and Formulation

  • Abstract problem: we have “documents”, each to be assigned a subset of L labels
  • Representation
  • Documents: vectors in D dimensions
  • Labels: vectors in L dimensions (Boolean hypercube)
  • Training set
  • (x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ)
  • xᵢ ∈ ℝᴰ, yᵢ ∈ {0, 1}ᴸ
SLIDE 24

The Three Pillars of Multi-label Learning

  • 1-vs-All or Binary Relevance Methods
  • Embedding or Dimensionality Reduction Methods
  • Tree or Ensemble Methods
SLIDE 25

1-vs-All Methods

  • Predict scores for each label separately
  • Threshold or rank scores to make predictions

(Figure: a Wiki page is tested against four separate binary classifiers, one per label: Dance, Sport, Tech, Math)

SLIDE 26

1-vs-All Methods

Questions

  • Are the L classifiers trained separately or jointly?
  • If jointly, then what “joins” the classifiers?

Considerations

  • Training time
  • Test time
  • Model size

Benefits

  • Extremely flexible model
  • In-depth theoretical analysis possible

SLIDE 27

1-vs-All Methods

  • Binary Relevance methods
  • Treat each label as a separate classification problem
  • Formulation (on board)
  • Also includes so-called plug-in methods, submodular methods
  • Margin methods
  • Ensure scores of “on” labels are (much) larger than those of “off” labels
  • Formulation (on board)
  • Structured Loss minimization methods
  • Formulation (sketch on board)

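A minimal sketch of the Binary Relevance idea: one independent binary scorer per label. The per-label "learner" below is a toy centroid-difference rule and the data are made up; any off-the-shelf binary classifier would slot in instead.

```python
# Binary Relevance: train L independent binary classifiers, one per label.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centroid(points, dim):
    if not points:
        return [0.0] * dim
    return [sum(coords) / len(points) for coords in zip(*points)]

def train_binary_relevance(X, Y, n_labels):
    """One linear scoring direction per label: w_l = mean(pos) - mean(neg)."""
    dim = len(X[0])
    models = []
    for l in range(n_labels):
        pos = [x for x, y in zip(X, Y) if y[l] == 1]
        neg = [x for x, y in zip(X, Y) if y[l] == 0]
        models.append([p - n for p, n in zip(centroid(pos, dim), centroid(neg, dim))])
    return models

def predict(models, x):
    # Threshold each label's score at zero (ranking the scores also works).
    return [1 if dot(w, x) > 0 else 0 for w in models]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy documents
Y = [[1, 0], [0, 1], [1, 1]]               # toy label vectors
models = train_binary_relevance(X, Y, n_labels=2)
print(predict(models, [1.0, 0.0]))  # [1, 0]
```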

SLIDE 28

Embedding Methods

  • Since L ≫ 1 and the label space has redundancies, reduce the label dimension
  • Dimensionality reduction!!
  • Nice theory and results, but prediction and training are expensive
  • Questions
  • How to embed labels (linear/non-linear)
  • How to predict in the embedding space
  • How to “pull back” to the label space
  • Single/multiple embeddings
  • CS, BCS, PLST, CPLST, LEML, SLEEC
SLIDE 29

Embedding Methods

  • How to embed labels
  • RP (CS), CCA, PCA, low local-distortion projections, learnt projections
  • How to pull back
  • Sparse recovery, nearest neighbour, learnt projections

  • Considerations
  • Training time 
  • Test time 
  • Model size 

x yRL zRl

Test
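A toy sketch in the spirit of the RP (CS) entry above: embed label vectors with a random projection and pull back by nearest neighbour among training label vectors. Everything here (dimensions, data, and using a training embedding as a stand-in for a learned regressor's output) is an assumption for illustration, not any specific published method.

```python
import random

random.seed(0)
L, l = 8, 3   # label dimension and embedding dimension (l << L)
P = [[random.gauss(0, 1) for _ in range(L)] for _ in range(l)]  # l x L projection

def embed(y):
    """z = P y: compress an L-dim label vector to l dims."""
    return [sum(p * yi for p, yi in zip(row, y)) for row in P]

def pull_back(z, candidate_labels):
    """Nearest candidate label vector, measured in the embedded space."""
    return min(candidate_labels,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(embed(y), z)))

labels = [[1, 0, 0, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0]]
z = embed(labels[0])      # stand-in for what a trained regressor would output
print(pull_back(z, labels) == labels[0])  # True
```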

SLIDE 30

Tree Methods

(Figure: a label tree over Wiki pages: All of Wiki splits into Arts and Tech; Arts into Dance and Music; Tech into EE/HW and IT/SW)

SLIDE 31

Tree Methods

  • Partition the space of documents into several bins
  • To ease life, perform hierarchical partitioning as a tree
  • At each leaf perform some classification task to predict
  • To increase efficiency, use several trees (forest)
  • Questions
  • Partitioning criterion (clustering, ranking, classification)
  • Leaf action (constant labeling, use of another multi-labeler)
  • Ensemble size and aggregation method (single, multiple)
  • LPSR, MLRF, FAST-XML
  • Consideration: good accuracy, fast prediction, huge models
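The recipe above can be caricatured in a few lines: partition documents hierarchically (here, a median split on the widest-range feature) and let each leaf predict its most frequent labels. All choices and data below are toy assumptions; LPSR, MLRF, and FastXML use far more careful partitioning criteria and leaf actions.

```python
from collections import Counter

def build_tree(X, Y, depth=0, max_depth=2, leaf_k=2):
    """X: feature vectors; Y: sets of 'on' label ids."""
    if depth >= max_depth or len(X) <= 1:
        # Leaf action (toy): constant labeling with the most frequent labels.
        counts = Counter(label for y in Y for label in y)
        return {"leaf": [label for label, _ in counts.most_common(leaf_k)]}
    # Partitioning criterion (toy): median split on the widest-range feature.
    f = max(range(len(X[0])),
            key=lambda j: max(x[j] for x in X) - min(x[j] for x in X))
    t = sorted(x[f] for x in X)[len(X) // 2]
    left = [(x, y) for x, y in zip(X, Y) if x[f] < t]
    right = [(x, y) for x, y in zip(X, Y) if x[f] >= t]
    if not left or not right:                 # degenerate split: make a leaf
        return build_tree(X, Y, max_depth, max_depth, leaf_k)
    return {"f": f, "t": t,
            "lo": build_tree(*zip(*left), depth + 1, max_depth, leaf_k),
            "hi": build_tree(*zip(*right), depth + 1, max_depth, leaf_k)}

def predict(node, x):
    while "leaf" not in node:
        node = node["lo"] if x[node["f"]] < node["t"] else node["hi"]
    return node["leaf"]

X = [[0.0], [0.1], [1.0], [1.1]]              # toy 1-D documents
Y = [{1}, {1}, {2}, {2}]                      # their label sets
tree = build_tree(X, Y, max_depth=1)
print(predict(tree, [0.05]), predict(tree, [1.05]))  # [1] [2]
```

A forest, as the slide suggests, would train several such trees on perturbed data and aggregate their leaf predictions.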
SLIDE 32

The Three Pillars of Multi-label Learning

Name      | “Accuracy” | Scalability | Prediction Cost     | Model Size                   | Well Understood?
1-vs-All  | Meh!       | Yikes!      | Are you kidding me! | Did I not make myself clear? | Now we are talking! Excellent
Embedding | Good/Best  | Good/Best   | Good                | Good                         | Good
Tree      | Good/Best  | Good/Best   | Best                | Large                        | Meh!