A meta-learning system for multi-instance classification
Gitte Vanwinckelen and Hendrik Blockeel
KU Leuven, Belgium
Motivation
- Performed extensive evaluation of multi-instance (MI) learners on datasets from different domains
- Performance of MI algorithms is very sensitive to the application domain
- Can we formalize this knowledge by learning a meta-model?
Outline
1) Motivation
2) What is multi-instance learning?
3) Design principles of the meta-model
4) Performance evaluation of MI learners
5) Meta-learning results
6) Conclusion
MI learning
Relationship between instances and bags (see the sketch below)
- Traditional MI learning
– At least one positive instance in a bag
– Learn a concept that describes all positive instances (or bags)
- Generalized MI learning
– All instances in a bag contribute to its label
– Learn a concept that identifies the positive bags
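To make the two relationships concrete, here is a minimal Python sketch (illustrative function names, not from the paper) of how a bag label can be derived from its instances under each view.

```python
# Minimal sketch of the two instance-bag relationships (illustrative, not from the paper).

def standard_mi_bag_label(instance_labels):
    """Standard MI assumption: a bag is positive if at least one instance is positive."""
    return int(any(instance_labels))

def collective_bag_label(instance_scores, threshold=0.5):
    """Collective assumption: all instances contribute equally to the bag label,
    e.g. by thresholding the average instance score."""
    return int(sum(instance_scores) / len(instance_scores) > threshold)

print(standard_mi_bag_label([0, 0, 1]))        # 1: one positive instance suffices
print(collective_bag_label([0.2, 0.3, 0.9]))   # 0: average 0.47 stays below the threshold
```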
Standard multi-instance learning
Drug activity prediction: identifying musky molecule configurations
[Dietterich, Artificial Intelligence 1997]
Generalized multi-instance learning
[J. Amores, Artificial Intelligence '13]
Which bags describe a beach?
Meta-learning
- Which learner performs best on which MI dataset?
- Construct meta-features from original learning tasks
- Learn a model on meta-dataset (decision tree)
– Number of attributes, size of the training sets, correlation with the output, ...
- Landmarkers: Fast algorithms [Pfahringer '00]
– Their performance indicates the performance of more expensive algorithms (the overall idea is sketched below)
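A rough sketch of the meta-learning idea, assuming made-up meta-features and targets (this is not the authors' pipeline): each row of the meta-dataset describes one MI dataset, and a decision tree predicts which of two learners wins on it.

```python
# Illustrative meta-dataset: one row per MI dataset, made-up values.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical meta-features: [number of attributes, training-set size,
# landmarker AUC (standard assumption), landmarker AUC (collective assumption)]
meta_X = np.array([
    [30, 500, 0.71, 0.65],
    [120, 200, 0.58, 0.80],
    [15, 1000, 0.83, 0.62],
])
# Target: 0 = learner A achieved the higher AUC on that dataset, 1 = learner B did
meta_y = np.array([0, 1, 0])

meta_model = DecisionTreeClassifier(max_depth=2, random_state=0)
meta_model.fit(meta_X, meta_y)

# Recommend a learner for a new, unseen MI dataset
print(meta_model.predict([[45, 300, 0.75, 0.66]]))
```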
Meta-learning with landmarking
- Reduce MI datasets to single-instance datasets based on different MI assumptions (both reductions are sketched below)
- Standard MI assumption
– Label each instance with its bag label
– Yields a one-sided noisy dataset
- Collective assumption
– All instances contribute equally to the bag label
– Average feature values over all instances in a bag
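A minimal sketch of both reductions, assuming bags are given as lists of NumPy arrays (the function names are illustrative, not the authors' code).

```python
import numpy as np

def reduce_standard(bags, bag_labels):
    """Standard MI assumption: copy the bag label to every instance.
    Positive bags may contain negative instances, so the result is one-sided noisy."""
    X = np.vstack([inst for bag in bags for inst in bag])
    y = np.concatenate([[label] * len(bag) for bag, label in zip(bags, bag_labels)])
    return X, y

def reduce_collective(bags, bag_labels):
    """Collective assumption: average the feature values over all instances in a bag,
    yielding one single-instance example per bag."""
    X = np.vstack([bag.mean(axis=0) for bag in bags])
    return X, np.asarray(bag_labels)

# Toy example: two bags with two 3-dimensional instances each
bags = [np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
        np.array([[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]])]
labels = [1, 0]
print(reduce_standard(bags, labels)[0].shape)    # (4, 3): one row per instance
print(reduce_collective(bags, labels)[0].shape)  # (2, 3): one row per bag
```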
MI experiments: Datasets
- SIVAL image classification, CBIR (25 datasets)
- Synthetic newsgroups, text classification (20 datasets)
- Binary classification UCI datasets (27)
– adult, tictactoe, diabetes, transfusion, spam
– Instances i.i.d. sampled to create bags (see the sketch below)
– Bag configurations: ½, ⅓, ¼, ...
- Evaluation: Area Under ROC curve (AUC)
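A rough sketch of how a single-instance UCI dataset can be turned into bags by i.i.d. sampling. The exact bag-construction procedure and the meaning of the bag configurations (read here as the fraction of positive instances in a positive bag) are assumptions, not taken from the paper.

```python
import numpy as np

def make_bags(X, y, bag_size=8, n_bags=100, pos_fraction=0.5, seed=0):
    """Sample instances i.i.d. into bags; label a bag positive if it contains
    at least one positive instance (standard MI assumption). pos_fraction is an
    assumed reading of the 'bag configuration' (share of positives in positive bags)."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    bags, labels = [], []
    for _ in range(n_bags):
        if rng.random() < 0.5:  # positive bag: mix positives and negatives
            n_pos = max(1, round(pos_fraction * bag_size))
            idx = np.concatenate([rng.choice(pos_idx, n_pos),
                                  rng.choice(neg_idx, bag_size - n_pos)])
            labels.append(1)
        else:                   # negative bag: negative instances only
            idx = rng.choice(neg_idx, bag_size)
            labels.append(0)
        bags.append(X[idx])
    return bags, np.array(labels)

# Toy usage on random single-instance data
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
bags, bag_labels = make_bags(X, y, bag_size=6, n_bags=50, pos_fraction=1/3)
print(len(bags), bags[0].shape, bag_labels[:10])
```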
MI experiments: Algorithms
- Decision trees: SimpleMI-J48, MIWrapper-J48, AdaBoost-MITI
- Rule inducer: MIRI
- Nearest neighbors: CitationKNN
- OptimalBall
- Diverse Density: MDD, EM-DD, MIDD
- TLD
- Support Vector Machines: mi-SVM, MISMO (NSK)
- Logistic regression: MILR, MILR-C
Performance overview of MI algorithms
- Comparison of classifiers over multiple datasets [Demsar '06]
- Are performance differences statistically significant?
- Friedman test with post-hoc Nemenyi test
– Rank the algorithms on each dataset
– Average the ranks over the datasets from the same domain
– Friedman test: null hypothesis that all algorithms perform equally well
– Nemenyi test: identifies statistically equivalent groups of classifiers (the procedure is sketched below)
- Critical difference diagram
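As an illustration of this evaluation protocol (not the authors' code), the sketch below runs a Friedman test followed by a Nemenyi post-hoc test on a small table of made-up AUC scores; it assumes the scipy and scikit-posthocs packages are available.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Made-up AUC scores: rows = datasets, columns = algorithms
auc = np.array([
    [0.82, 0.75, 0.70],
    [0.79, 0.77, 0.68],
    [0.85, 0.74, 0.72],
    [0.80, 0.78, 0.69],
])

# Friedman test: do the algorithms' average ranks differ significantly?
stat, p = friedmanchisquare(*[auc[:, j] for j in range(auc.shape[1])])
print(f"Friedman statistic = {stat:.2f}, p = {p:.3f}")

# Nemenyi post-hoc test: pairwise p-values used to group statistically
# equivalent classifiers, as visualized in a critical difference diagram
print(sp.posthoc_nemenyi_friedman(auc))
```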
Critical difference diagrams (AUC)
(Figure: one critical difference diagram per domain: Text, UCI, CBIR)
Meta-learning setup
- 14 learners → one binary classification meta-task per pair of learners (one vs. one); see the sketch after this list
- Leave-one-out cross-validation
- Three dataset domains (CBIR, text, UCI datasets)
- Landmarkers (standard and collective assumption):
– Naive Bayes
– 1-nearest neighbor
– Logistic regression
– Decision stump
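A rough sketch of the pairwise setup with leave-one-out cross-validation (illustrative names and random data; auc[d, l] stands for the AUC of learner l on dataset d).

```python
from itertools import combinations
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

def pairwise_meta_accuracy(meta_X, auc, learner_a, learner_b):
    """For one pair of learners, predict on each held-out dataset which learner wins."""
    y = (auc[:, learner_a] > auc[:, learner_b]).astype(int)  # 1 = learner_a wins
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(meta_X):
        model = DecisionTreeClassifier(max_depth=2, random_state=0)
        model.fit(meta_X[train_idx], y[train_idx])
        correct += int(model.predict(meta_X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)

# Illustrative data: 10 MI datasets, 4 landmarker meta-features, 14 learners
rng = np.random.default_rng(0)
meta_X = rng.random((10, 4))
auc = rng.random((10, 14))
for a, b in list(combinations(range(14), 2))[:3]:  # a few of the 91 learner pairs
    print(a, b, pairwise_meta_accuracy(meta_X, auc, a, b))
```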
UCI meta-model based on number of features and noise level
(Figure; legend: majority classifier wins / meta-model wins)
UCI meta-model: landmarker approach
(Figure: meta-models from standard MI landmarkers and from collective MI landmarkers, using Dstump, NB, 1NN, LR; legend: majority classifier wins / meta-model wins)
CBIR meta-model: landmarker approach
(Figure: meta-models from standard MI landmarkers and from collective MI landmarkers; legend: majority classifier wins / meta-model wins)
Relationship between landmarkers: logistic regression
(Figure: one panel per domain: CBIR, UCI, Text)
Conclusions and future work
- Demonstrated large differences in MI learner performance across application domains
- It is not sufficient to evaluate on multiple datasets from the same domain
- Larger meta-dataset needed
- Define alternative MI assumptions and translate them to SI datasets
– e.g. the meta-data assumption (NSK)