1. Multilabel Classification with Meta-Level Features. SIGIR '10, Siddharth Gopal & Yiming Yang

2. Outline
   - Introduction
   - Motivation
   - Proposed approach
     - Ranking
     - Thresholding
   - Experiments

3. Classifying a webpage, image, or news article:
   - Binary classification (e.g.): ad vs. not-an-ad; spam vs. genuine
   - Multiclass classification (e.g.): which country is it about? Switzerland, France, Italy, United States, ...
   - Multilabel classification: what topics is it related to? Politics, terrorism, health, sports, ...

4. Our goal: learn $F : x \to y$, where $x \in \mathbb{R}^d$ is an instance (a webpage, an image, etc.) and $y \subseteq \{1, 2, \ldots, m\}$ is the subset of relevant categories.
   Given:
   - a set of training examples $\{x_i \mid x_i \in \mathbb{R}^d\}$;
   - for each training instance, the set of relevant categories $\{y_i \mid y_i \subseteq \{1, 2, 3, \ldots, m\}\}$.
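As a concrete picture of this setup, here is a minimal sketch in Python; the shapes, seed, and toy labels are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 20, 5                       # instances, feature dim, #categories

X = rng.random((n, d))                     # training examples x_i in R^d
Y = [set(rng.choice(np.arange(1, m + 1),   # relevant categories y_i:
                    size=rng.integers(1, m + 1),
                    replace=False))        # a nonempty subset of {1, ..., m}
     for _ in range(n)]
```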

5. Related work:
   - Binary relevance learning: split the problem into several independent binary classification problems (one-vs-rest, pairwise).
   - Instance-based multilabel classifiers:
     - Standard kNN (Yang, SIGIR 1994)
     - Bayesian-style ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
     - Logistic-regression style IBLR-ML using kNN features (Cheng and Hüllermeier, Machine Learning 2009)
   - Model-based method: Rank-SVM for MLC, a maximum-margin method enforcing partial-order constraints (Elisseeff and Weston, NIPS 2002)

6. Rank-SVM for MLC:
   - Has a global optimization criterion: it does not break down into multiple independent binary problems.
   - But it has a large number of parameters ($m \cdot D$, for $m$ categories and $D$ features).
   - It is different from Rank-SVM for IR (and from other learning-to-rank IR methods).
   - It follows a two-step procedure: (a) rank the categories for a given instance; (b) select an instance-specific threshold.
   - Our approach: leverage recent learning-to-rank methods from IR to solve (a).

7. The typical learning-to-rank framework:
   [Figure: a query q and corpus documents d_1, d_2, d_3, ... are paired into combined feature vectors (q, d_i), scored by a model, and returned as a ranked list.]
   Documents are represented using a combined feature representation between query and document (TF, cosine similarity, BM25, Okapi, etc.); a sketch of such a pairing follows below.
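To make the combined representation concrete, here is a hedged sketch that pairs a query and a document into a small feature vector. It uses only two illustrative features (shared-term TF mass and cosine similarity) rather than the full TF/BM25/Okapi set the slide mentions; the function name and interface are assumptions.

```python
import numpy as np

def qd_features(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Combined feature vector for a (query, document) pair.
    q, d: term-count vectors over the same vocabulary."""
    tf_overlap = np.minimum(q, d).sum()            # TF mass on shared terms
    cosine = q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-12)
    return np.array([tf_overlap, cosine])          # one row per (q, d_i) pair
```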

8. Given a new instance, rank the categories:
   [Figure: a document d is paired with each of the m categories via vec(d, 1), vec(d, 2), ..., vec(d, m); the model scores each pair and outputs a ranked list of categories.]
   How do we define a combined feature representation for an (instance, category) pair?

9. Define the feature representation of the pair (instance, category) as:
   $\mathrm{vec}(x_i, c) = [\,\mathrm{Dist}(x_i, D_c^{1NN}),\ \mathrm{Dist}(x_i, D_c^{2NN}),\ \ldots,\ \mathrm{Dist}(x_i, D_c^{kNN})\,]$
   where $D_c$ is the set of training instances that belong to category $c$ and $D_c^{jNN}$ is the $j$-th nearest neighbor of $x_i$ within $D_c$.
   - The distance to the category centroid is also appended.
   - L1, L2, and cosine-similarity distances are concatenated.
   (A sketch follows below.)
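A minimal sketch of this construction, using only the L2 distance; the full version concatenates the L1, L2, and cosine variants, and the function name and interface here are assumptions for illustration.

```python
import numpy as np

def meta_features(x: np.ndarray, X_c: np.ndarray, k: int) -> np.ndarray:
    """vec(x, c) with L2 only: distances from x to the k nearest training
    instances of category c, plus the distance to the category centroid.
    X_c: (n_c, d) array of the training instances labeled with c (n_c >= k)."""
    dists = np.linalg.norm(X_c - x, axis=1)     # Dist(x, D_c) for every member
    knn = np.sort(dists)[:k]                    # k nearest neighbors within c
    centroid = np.linalg.norm(X_c.mean(axis=0) - x)
    return np.concatenate([knn, [centroid]])    # fixed length k + 1 for any m
```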

10. Pictorially (using only the L2 links):
    [Figure: thicker lines denote links to the category centroid; thinner lines denote links to the category neighborhood.]

11. In short:
    - Represent the relation between each instance and category using $\mathrm{vec}(x_i, c)$.
    - Substantially fewer model parameters than Rank-SVM for MLC.
    - Allows any learning-to-rank algorithm from IR to be used to rank the categories (see the sketch below).
    - In our experiments, we used SVM-MAP as the learning-to-rank method.
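At test time the pipeline is then: build vec(x, c) for every category, score each pair, and sort. The sketch below reuses the meta_features helper above and assumes a hypothetical linear scorer w; the paper trains the scorer with SVM-MAP, but any learning-to-rank method fits this interface.

```python
import numpy as np

def rank_categories(x, per_category_train, w, k):
    """Rank all categories for instance x by w . vec(x, c), best first.
    per_category_train: dict {category id -> (n_c, d) array of its members}."""
    scores = {c: float(w @ meta_features(x, X_c, k))
              for c, X_c in per_category_train.items()}
    return sorted(scores, key=scores.get, reverse=True)
```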

12. Outline
    - Introduction
    - Motivation
    - Proposed approach
      - Ranking
      - Thresholding
    - Experiments

13. Supervised learning of an instance-specific threshold (Elisseeff and Weston, NIPS 2002):
    1) $x_i \xrightarrow{\text{LETOR}} [s_i^1, s_i^2, \ldots, s_i^m]$, the ranklist of category scores, for $i = 1, \ldots, n$.
    2) Pair each training ranklist with a target threshold: $([s_i^1, s_i^2, \ldots, s_i^m],\ t_i)$, where $t_i$ is the threshold that minimizes the sum of false positives and false negatives for that ranklist.
    3) Learn: $w^T [s_i^1, s_i^2, \ldots, s_i^m] \to t_i$.
    4) Predict: $t_{\text{test}} = w^T [s_{\text{test}}^1, s_{\text{test}}^2, \ldots, s_{\text{test}}^m]$.
    (A sketch follows below.)
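A sketch of steps 2 to 4, assuming numpy score matrices and sklearn's ordinary least squares as the regressor. Searching candidate thresholds over midpoints between consecutive sorted scores is a standard way to realize "minimize FP + FN", not necessarily the authors' exact implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def best_threshold(scores: np.ndarray, relevant: set) -> float:
    """Step 2: the threshold on one ranklist that minimizes FP + FN,
    searched over midpoints between consecutive sorted scores."""
    s = np.sort(scores)
    candidates = np.concatenate([[s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]])
    def cost(t):
        predicted = set(np.flatnonzero(scores > t) + 1)   # 1-based category ids
        return len(predicted - relevant) + len(relevant - predicted)
    return float(min(candidates, key=cost))

def fit_threshold_model(S: np.ndarray, Y: list) -> LinearRegression:
    """Steps 3-4: regress target thresholds on the score vectors, so that
    at test time t_test = w . s_test. S: (n, m) score lists; Y: label sets."""
    t = np.array([best_threshold(S[i], Y[i]) for i in range(len(S))])
    return LinearRegression().fit(S, t)
```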

14. Outline
    - Introduction
    - Motivation
    - Proposed approach
      - Ranking
      - Thresholding
    - Experiments

15. Datasets:

    Dataset        #Training  #Testing  #Categories  Avg labels/instance  #Features
    Emotions             391       202            6                 1.87         72
    Scene               1211      1196            6                 1.07        294
    Yeast               1500       917           14                 4.24        103
    Citeseer            5303      1326           17                 1.26      14601
    Reuters-21578       7770      3019           90                 1.23      18637

16. Methods compared:
    - SVM-MAP-MLC: our proposed approach
    - ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
    - IBLR-ML (Cheng and Hüllermeier, Machine Learning 2009)
    - Rank-SVM (Elisseeff and Weston, NIPS 2002)
    - Standard one-vs-rest SVM

17. Evaluation metrics (sketches of all four follow below):
    - Average Precision: the standard metric in IR; for a ranklist, it measures the precision at the rank of each relevant category and averages them.
    - RankingLoss: measures the average number of inversions between relevant and irrelevant categories in the ranklist.
    - Micro-F1 and Macro-F1: F1 is the harmonic mean of precision and recall; micro-averaging gives equal importance to each document, while macro-averaging gives equal importance to each category.
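Hedged sketches of the four metrics, following the standard definitions rather than any one paper's exact code; category ids are assumed 1-based to match the earlier slides, and the relevant/irrelevant sets are assumed nonempty.

```python
import numpy as np

def average_precision(scores: np.ndarray, relevant: set) -> float:
    """Precision at the rank of each relevant category, averaged."""
    order = np.argsort(-scores) + 1                # category ids, best first
    hits, ap = 0, 0.0
    for rank, c in enumerate(order, start=1):
        if c in relevant:
            hits += 1
            ap += hits / rank
    return ap / len(relevant)

def ranking_loss(scores: np.ndarray, relevant: set) -> float:
    """Fraction of (relevant, irrelevant) category pairs ranked wrongly."""
    rel = [s for c, s in enumerate(scores, start=1) if c in relevant]
    irr = [s for c, s in enumerate(scores, start=1) if c not in relevant]
    return sum(r <= i for r in rel for i in irr) / (len(rel) * len(irr))

def micro_macro_f1(pred: list, true: list, m: int) -> tuple:
    """Micro-F1 pools counts over all decisions; macro-F1 averages
    the per-category F1 scores. pred, true: lists of label sets."""
    tp, fp, fn = np.zeros(m), np.zeros(m), np.zeros(m)
    for P, T in zip(pred, true):
        for c in range(1, m + 1):
            tp[c - 1] += c in P and c in T
            fp[c - 1] += c in P and c not in T
            fn[c - 1] += c in T and c not in P
    per_cat = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    return micro, per_cat.mean()
```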

18. [Bar charts: MAP performance (y-axis 0.7 to 1.0) and 1-RankingLoss performance (y-axis 0.8 to 1.0) across datasets, comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR.]

19. [Bar charts: Micro-F1 performance (y-axis 0.4 to 0.9) and Macro-F1 performance (y-axis 0.2 to 0.8) across datasets, comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR.]

20. Conclusions:
    - Meta-level features represent the relationship between instances and categories.
    - They merge learning to rank and multilabel classification.
    - The approach improves the state of the art for multilabel classification.

21. Future work:
    - Different kinds of meta-level features.
    - Different learning-to-rank methods.
    - Optimizing metrics other than MAP.

22. THANKS!

23. A typical scenario in text categorization:
    [Diagram: document text ("Wall Street", "Market", "Crime", ...) -> bag-of-words representation -> classifier.]
    Support vector machines, logistic regression, or boosting learn m weight vectors, each of length |vocabulary|: a total of m * |vocabulary| parameters. Is this good or bad? On Reuters-21578, for example, that is 90 * 18,637, roughly 1.7 million parameters.

24. Words are fairly discriminative, and current methods build a predictor by weighting individual words.
    Disadvantages:
    - Too many words, hence too many parameters.
    - Does not give us direct control over how each instance is related to a particular category.

25. Effect of different feature sets:
    [Bar chart (y-axis 0 to 1): performance on Emotions, Yeast, Scene, Citeseer, and Reuters-21578 using ALL features vs. L2-only, L1-only, and cosine-only distances.]

26. [Comparison: Rank-SVM for IR vs. Rank-SVM for MLC.]

27. [Bar chart (y-axis 0 to 1): per-dataset results comparing SVM-MAP, ML-kNN, Rank-SVM-MLC, SVM, and IBLR-ML.]

28. [Bar chart (y-axis 0 to 1): per-dataset results comparing SVM-MAP, ML-kNN, Rank-SVM-MLC, SVM, and IBLR-ML.]
