1. Multilabel Classification with Meta-Level Features. SIGIR '10, Siddharth Gopal & Yiming Yang

2. Outline
   - Introduction
   - Motivation
   - Proposed approach
     - Ranking
     - Thresholding
   - Experiments

3. Classifying a webpage, image, or news article:
   - Binary classification (e.g.): ad vs. not-an-ad; spam vs. genuine
   - Multiclass classification (e.g.): which country is it about? Switzerland, France, Italy, United States, ...
   - Multilabel classification: what topics is it related to? Politics, terrorism, health, sports, ...

4. Our goal: learn $F : x \to y$, where $x \in \mathbb{R}^d$ is an instance (a webpage, an image, etc.) and $y \subseteq \{1, 2, \ldots, m\}$ is the subset of relevant categories.
   Given:
   - a set of training examples $\{x_i \mid x_i \in \mathbb{R}^d\}$;
   - for each training instance, the set of relevant categories $\{y_i \mid y_i \subseteq \{1, 2, 3, \ldots, m\}\}$.
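As a concrete picture of this setup, here is a minimal sketch in Python; the shapes, seed, and toy labels are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 20, 5                       # instances, feature dim, #categories

X = rng.random((n, d))                     # training examples x_i in R^d
Y = [set(rng.choice(np.arange(1, m + 1),   # relevant categories y_i:
                    size=rng.integers(1, m + 1),
                    replace=False))        # a nonempty subset of {1, ..., m}
     for _ in range(n)]
```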

5. Related work:
   - Binary relevance learning: split the problem into several independent binary classification problems (one-vs-rest, pairwise).
   - Instance-based multilabel classifiers:
     - Standard kNN (Yang, SIGIR 1994)
     - Bayesian-style ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
     - Logistic-regression style IBLR-ML using kNN features (Cheng and Hüllermeier, Machine Learning 2009)
   - Model-based method: Rank-SVM for MLC, a maximum-margin method enforcing partial-order constraints (Elisseeff and Weston, NIPS 2002)

6. Rank-SVM for MLC:
   - Has a global optimization criterion: it does not break down into multiple independent binary problems.
   - But it has a large number of parameters ($m \cdot D$, for $m$ categories and $D$ features).
   - It is different from Rank-SVM for IR (and from other learning-to-rank IR methods).
   - It follows a two-step procedure: (a) rank the categories for a given instance; (b) select an instance-specific threshold.
   - Our approach: leverage recent learning-to-rank methods from IR to solve (a).

7. The typical learning-to-rank framework:
   [Figure: a query q and corpus documents d_1, d_2, d_3, ... are paired into combined feature vectors (q, d_i), scored by a model, and returned as a ranked list.]
   Documents are represented using a combined feature representation between query and document (TF, cosine similarity, BM25, Okapi, etc.); a sketch of such a pairing follows below.
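To make the combined representation concrete, here is a hedged sketch that pairs a query and a document into a small feature vector. It uses only two illustrative features (shared-term TF mass and cosine similarity) rather than the full TF/BM25/Okapi set the slide mentions; the function name and interface are assumptions.

```python
import numpy as np

def qd_features(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Combined feature vector for a (query, document) pair.
    q, d: term-count vectors over the same vocabulary."""
    tf_overlap = np.minimum(q, d).sum()            # TF mass on shared terms
    cosine = q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-12)
    return np.array([tf_overlap, cosine])          # one row per (q, d_i) pair
```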

8. Given a new instance, rank the categories:
   [Figure: a document d is paired with each of the m categories via vec(d, 1), vec(d, 2), ..., vec(d, m); the model scores each pair and outputs a ranked list of categories.]
   How do we define a combined feature representation for an (instance, category) pair?

9. Define the feature representation of the pair (instance, category) as:
   $\mathrm{vec}(x_i, c) = [\,\mathrm{Dist}(x_i, D_c^{1NN}),\ \mathrm{Dist}(x_i, D_c^{2NN}),\ \ldots,\ \mathrm{Dist}(x_i, D_c^{kNN})\,]$
   where $D_c$ is the set of training instances that belong to category $c$ and $D_c^{jNN}$ is the $j$-th nearest neighbor of $x_i$ within $D_c$.
   - The distance to the category centroid is also appended.
   - L1, L2, and cosine-similarity distances are concatenated.
   (A sketch follows below.)
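A minimal sketch of this construction, using only the L2 distance; the full version concatenates the L1, L2, and cosine variants, and the function name and interface here are assumptions for illustration.

```python
import numpy as np

def meta_features(x: np.ndarray, X_c: np.ndarray, k: int) -> np.ndarray:
    """vec(x, c) with L2 only: distances from x to the k nearest training
    instances of category c, plus the distance to the category centroid.
    X_c: (n_c, d) array of the training instances labeled with c (n_c >= k)."""
    dists = np.linalg.norm(X_c - x, axis=1)     # Dist(x, D_c) for every member
    knn = np.sort(dists)[:k]                    # k nearest neighbors within c
    centroid = np.linalg.norm(X_c.mean(axis=0) - x)
    return np.concatenate([knn, [centroid]])    # fixed length k + 1 for any m
```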

10. Pictorially (using only the L2 links):
    [Figure: thicker lines denote links to the category centroid; thinner lines denote links to the category neighborhood.]

11. In short:
    - Represent the relation between each instance and category using $\mathrm{vec}(x_i, c)$.
    - Substantially fewer model parameters than Rank-SVM for MLC.
    - Allows any learning-to-rank algorithm from IR to be used to rank the categories (see the sketch below).
    - In our experiments, we used SVM-MAP as the learning-to-rank method.
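At test time the pipeline is then: build vec(x, c) for every category, score each pair, and sort. The sketch below reuses the meta_features helper above and assumes a hypothetical linear scorer w; the paper trains the scorer with SVM-MAP, but any learning-to-rank method fits this interface.

```python
import numpy as np

def rank_categories(x, per_category_train, w, k):
    """Rank all categories for instance x by w . vec(x, c), best first.
    per_category_train: dict {category id -> (n_c, d) array of its members}."""
    scores = {c: float(w @ meta_features(x, X_c, k))
              for c, X_c in per_category_train.items()}
    return sorted(scores, key=scores.get, reverse=True)
```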

12. Outline
    - Introduction
    - Motivation
    - Proposed approach
      - Ranking
      - Thresholding
    - Experiments

13. Supervised learning of an instance-specific threshold (Elisseeff and Weston, NIPS 2002):
    1) $x_i \xrightarrow{\text{LETOR}} [s_i^1, s_i^2, \ldots, s_i^m]$, the ranklist of category scores, for $i = 1, \ldots, n$.
    2) Pair each training ranklist with a target threshold: $([s_i^1, s_i^2, \ldots, s_i^m],\ t_i)$, where $t_i$ is the threshold that minimizes the sum of false positives and false negatives for that ranklist.
    3) Learn: $w^T [s_i^1, s_i^2, \ldots, s_i^m] \to t_i$.
    4) Predict: $t_{\text{test}} = w^T [s_{\text{test}}^1, s_{\text{test}}^2, \ldots, s_{\text{test}}^m]$.
    (A sketch follows below.)
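A sketch of steps 2 to 4, assuming numpy score matrices and sklearn's ordinary least squares as the regressor. Searching candidate thresholds over midpoints between consecutive sorted scores is a standard way to realize "minimize FP + FN", not necessarily the authors' exact implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def best_threshold(scores: np.ndarray, relevant: set) -> float:
    """Step 2: the threshold on one ranklist that minimizes FP + FN,
    searched over midpoints between consecutive sorted scores."""
    s = np.sort(scores)
    candidates = np.concatenate([[s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]])
    def cost(t):
        predicted = set(np.flatnonzero(scores > t) + 1)   # 1-based category ids
        return len(predicted - relevant) + len(relevant - predicted)
    return float(min(candidates, key=cost))

def fit_threshold_model(S: np.ndarray, Y: list) -> LinearRegression:
    """Steps 3-4: regress target thresholds on the score vectors, so that
    at test time t_test = w . s_test. S: (n, m) score lists; Y: label sets."""
    t = np.array([best_threshold(S[i], Y[i]) for i in range(len(S))])
    return LinearRegression().fit(S, t)
```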

14. Outline
    - Introduction
    - Motivation
    - Proposed approach
      - Ranking
      - Thresholding
    - Experiments

15. Datasets:

    Dataset        #Training  #Testing  #Categories  Avg labels/instance  #Features
    Emotions             391       202            6                 1.87         72
    Scene               1211      1196            6                 1.07        294
    Yeast               1500       917           14                 4.24        103
    Citeseer            5303      1326           17                 1.26      14601
    Reuters-21578       7770      3019           90                 1.23      18637

16. Methods compared:
    - SVM-MAP-MLC: our proposed approach
    - ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
    - IBLR-ML (Cheng and Hüllermeier, Machine Learning 2009)
    - Rank-SVM (Elisseeff and Weston, NIPS 2002)
    - Standard one-vs-rest SVM

17. Evaluation metrics (sketches of all four follow below):
    - Average Precision: the standard metric in IR; for a ranklist, it measures the precision at the rank of each relevant category and averages them.
    - RankingLoss: measures the average number of inversions between relevant and irrelevant categories in the ranklist.
    - Micro-F1 and Macro-F1: F1 is the harmonic mean of precision and recall; micro-averaging gives equal importance to each document, while macro-averaging gives equal importance to each category.
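Hedged sketches of the four metrics, following the standard definitions rather than any one paper's exact code; category ids are assumed 1-based to match the earlier slides, and the relevant/irrelevant sets are assumed nonempty.

```python
import numpy as np

def average_precision(scores: np.ndarray, relevant: set) -> float:
    """Precision at the rank of each relevant category, averaged."""
    order = np.argsort(-scores) + 1                # category ids, best first
    hits, ap = 0, 0.0
    for rank, c in enumerate(order, start=1):
        if c in relevant:
            hits += 1
            ap += hits / rank
    return ap / len(relevant)

def ranking_loss(scores: np.ndarray, relevant: set) -> float:
    """Fraction of (relevant, irrelevant) category pairs ranked wrongly."""
    rel = [s for c, s in enumerate(scores, start=1) if c in relevant]
    irr = [s for c, s in enumerate(scores, start=1) if c not in relevant]
    return sum(r <= i for r in rel for i in irr) / (len(rel) * len(irr))

def micro_macro_f1(pred: list, true: list, m: int) -> tuple:
    """Micro-F1 pools counts over all decisions; macro-F1 averages
    the per-category F1 scores. pred, true: lists of label sets."""
    tp, fp, fn = np.zeros(m), np.zeros(m), np.zeros(m)
    for P, T in zip(pred, true):
        for c in range(1, m + 1):
            tp[c - 1] += c in P and c in T
            fp[c - 1] += c in P and c not in T
            fn[c - 1] += c in T and c not in P
    per_cat = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    return micro, per_cat.mean()
```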

18. [Bar charts: MAP performance (y-axis 0.7 to 1.0) and 1-RankingLoss performance (y-axis 0.8 to 1.0) across datasets, comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR.]

19. [Bar charts: Micro-F1 performance (y-axis 0.4 to 0.9) and Macro-F1 performance (y-axis 0.2 to 0.8) across datasets, comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR.]

20. Conclusions:
    - Meta-level features represent the relationship between instances and categories.
    - They merge learning to rank and multilabel classification.
    - The approach improves the state of the art for multilabel classification.

21. Future work:
    - Different kinds of meta-level features.
    - Different learning-to-rank methods.
    - Optimizing metrics other than MAP.

22. THANKS!

23. A typical scenario in text categorization:
    [Diagram: document text ("Wall Street", "Market", "Crime", ...) -> bag-of-words representation -> classifier.]
    Support vector machines, logistic regression, or boosting learn m weight vectors, each of length |vocabulary|: a total of m * |vocabulary| parameters. Is this good or bad? On Reuters-21578, for example, that is 90 * 18,637, roughly 1.7 million parameters.

24. Words are fairly discriminative, and current methods build a predictor by weighting individual words.
    Disadvantages:
    - Too many words, hence too many parameters.
    - Does not give us direct control over how each instance is related to a particular category.

25. Effect of different feature sets:
    [Bar chart (y-axis 0 to 1): performance on Emotions, Yeast, Scene, Citeseer, and Reuters-21578 using ALL features vs. L2-only, L1-only, and cosine-only distances.]

26. [Comparison: Rank-SVM for IR vs. Rank-SVM for MLC.]

27. [Bar chart (y-axis 0 to 1): per-dataset results comparing SVM-MAP, ML-kNN, Rank-SVM-MLC, SVM, and IBLR-ML.]

28. [Bar chart (y-axis 0 to 1): per-dataset results comparing SVM-MAP, ML-kNN, Rank-SVM-MLC, SVM, and IBLR-ML.]
