Classify as a Whole? Multiple Instance Learning


SLIDE 1

Classify as a Whole?

SLIDE 2

MULTIPLE INSTANCE LEARNING

Set Learning? Multi-Set Learning?

Marco Loog, Pattern Recognition Laboratory, Delft University of Technology

heavily inspired by slides from Veronika Cheplygina and David MJ Tax

SLIDE 3

Outline

  • Representation and the idea of MIL
  • Some example problems
  • Two related goals
  • A naïve approach to MIL?
  • Concept-based classifiers
  • Bag-based classifiers [with intermezzo]
  • Discussion and conclusions

SLIDE 4

Representation is Key!

Representing every object by a single feature vector can be rather limiting. One other possibility is to represent an object by a collection or [multi-]set of feature vectors, i.e., by means of multiple instances.

We still assume vectors to be in the same feature space. Set sizes do not have to be, and typically are not, the same.
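For concreteness, such data might be held as a list of variable-size instance matrices plus one label per bag; a minimal sketch in Python [the variable names are mine, and the later sketches reuse them]:

```python
import numpy as np

# Three bags in the same 2D feature space, with different numbers of
# instances per bag; a single label per bag, not per instance.
bags = [
    np.array([[0.1, 0.4], [0.3, 0.2]]),              # 2 instances
    np.array([[0.9, 0.8], [0.7, 0.6], [0.5, 0.5]]),  # 3 instances
    np.array([[0.2, 0.1]]),                          # 1 instance
]
bag_labels = np.array([0, 1, 0])
```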

SLIDE 5

So…

The setting is that we have sets, or so-called bags, of instances [feature vectors] to represent every object, but we still have a single label per object. When would we consider this? How do we build classifiers for this?

SLIDE 6

Graphically Speaking

SLIDE 7

Use Local Information?

SLIDE 8

Cat?

SLIDE 9

Detecting Activity of Action Units

SLIDE 10

Does This Smell Musky?

SLIDE 11

Original Goal = Twofold

MIL aims to classify new and previously unseen bags as accurately as possible. But also : MIL tries to discover a “concept” that determines the positive class. Concepts are feature vectors that uniquely identify the positive class.

  • An image patch with a cat
  • A single musky molecule
  • The action unit that is active?

SLIDE 12

Original Goal = Twofold

MIL aims to classify new and previously unseen bags as accurately as possible. But also : MIL tries to discover a “concept” that determines the positive class. Concepts are feature vectors that uniquely identify the positive class.

The first goal is easier than the second. The latter is often not considered in much of the MIL literature.

SLIDE 13

“Naive” Approach

Or the classifier combining approach…
  • Copy bag labels to instances
  • Train a regular classifier
  • Combine all outcomes by a simple combiner

E.g. max rule, averaging, majority vote, quantile / percentile

I say “naive” because this might actually work pretty well in particular settings
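A minimal sketch of this recipe, assuming the bags / bag_labels representation from before [function names and the choice of logistic regression are mine]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def naive_mil_fit(bags, bag_labels):
    """Copy every bag's label to its instances and train a regular classifier."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), l) for b, l in zip(bags, bag_labels)])
    return LogisticRegression(max_iter=1000).fit(X, y)

def naive_mil_predict(clf, bags, combiner="max"):
    """Combine the instance posteriors of each bag with a simple fixed rule."""
    scores = []
    for b in bags:
        p = clf.predict_proba(b)[:, 1]       # P(positive) per instance
        if combiner == "max":                # closest to the "strict" MIL rule
            scores.append(p.max())
        elif combiner == "mean":             # averaging
            scores.append(p.mean())
        else:                                # e.g. a high quantile
            scores.append(np.quantile(p, 0.9))
    return (np.array(scores) > 0.5).astype(int)
```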

SLIDE 14

Concept-based MIL Classifiers

Really try to identify the part of feature space where the concept class resides. Relies on a “strict” interpretation of MIL : a bag is negative iff none of its instances is a concept. The original approach by Dietterich relies on exactly this assumption.

SLIDE 15

Graphically Speaking

[Figure: the concept we look for?]

SLIDE 16

The General Setting?

Large background distribution… Small foreground effect?

SLIDE 17

Discovering a Concept : What If?

This would potentially enable us to solve detailed tasks using only coarse-level annotations / labelings. E.g., train on brain images for which you only know that there is a tumor, but get a classifier that can actually localize those tumors…

SLIDE 18

Axis-Parallel Rectangles

SLIDE 19

Diverse Density

  • Model the concept by a compact density
  • All instances in negative bags should have a high probability of non-concept
  • All positive bags should have a high probability of containing at least one concept instance
  • Leads to a complicated optimization [sketched below]
  • But it can work pretty OK…
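For reference, a sketch of the diverse density objective in its noisy-OR form [Maron, Lozano-Pérez, 1998], where t is a candidate concept point and B_{ij} is instance j of bag i; the concept estimate maximizes DD(t), typically by gradient ascent from many starting points [the per-feature scalings that the original paper optimizes jointly with t are omitted here]:

```latex
\mathrm{DD}(t) \;=\;
\prod_{i\,:\,\text{positive}} \Bigl( 1 - \prod_{j} \bigl( 1 - e^{-\lVert B_{ij} - t \rVert^{2}} \bigr) \Bigr)
\;\prod_{i\,:\,\text{negative}} \; \prod_{j} \bigl( 1 - e^{-\lVert B_{ij} - t \rVert^{2}} \bigr)
```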

SLIDE 20

MI-SVM

Adaptation of standard SVM

First iteration :
  • Copy the bag label to the instance labels
  • Train a standard SVM
Subsequent iterations :
  • Choose a single instance in every positive bag [the one currently scored most positively] so that at least one instance per positive bag is on the correct side of the decision boundary
  • Retrain

As so often : the idea is illustrated using an SVM, but there is no need to stick to this choice of classifier
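A rough sketch of the iterative scheme [a bag-level MI-SVM variant: negative bags keep all their instances, each positive bag is represented by its current highest-scoring "witness"; names and the RBF kernel choice are mine]:

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, bag_labels, n_iter=10):
    # First iteration: copy bag labels to all instances, train a standard SVM.
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), l) for b, l in zip(bags, bag_labels)])
    clf = SVC(kernel="rbf").fit(X, y)
    # Subsequent iterations: re-select witnesses and retrain.
    for _ in range(n_iter):
        Xs, ys = [], []
        for b, l in zip(bags, bag_labels):
            if l == 1:
                w = b[np.argmax(clf.decision_function(b))]  # current witness
                Xs.append(w[None, :]); ys.append(np.ones(1))
            else:
                Xs.append(b); ys.append(np.zeros(len(b)))   # negatives stay negative
        clf = SVC(kernel="rbf").fit(np.vstack(Xs), np.concatenate(ys))
    return clf
```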

SLIDE 21

“Naive” Approach [Continued]

Or the classifier combining approach…
  • Copy bag labels to instances
  • Train a regular classifier
  • Combine all outcomes by a simple combiner

E.g. max rule, averaging, majority vote, quantile / percentile

The max rule comes close to the “strict” MIL setting…

SLIDE 22

Bag-based Classifiers

  • Forget about the concept
  • Model the bag as a whole
  • Extract global or local statistics for every bag
  • Define bag distances / [dis]similarities and use a dissimilarity approach [or any similar kind of technique]
  • MILES

SLIDE 23

Global Bag Statistics

Consider means, variances, minima, maxima, covariances to describe the content of every bag. This turns every bag into a regular feature vector… … and so we can apply our favorite supervised learning tools to it. Just considering minima and maxima works surprisingly well in some cases.

N.B. Not a rotation-invariant representation [But do we care?]
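A minimal sketch of this reduction, assuming the bags / bag_labels representation from before [the statistics chosen and the random forest are illustrative]:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bag_statistics(bag):
    """Summarize one bag [n_instances x n_features] by global statistics."""
    return np.concatenate([bag.mean(axis=0), bag.var(axis=0),
                           bag.min(axis=0), bag.max(axis=0)])

def fit_on_statistics(bags, bag_labels):
    # Every bag becomes one fixed-length vector, so any supervised learner applies.
    X = np.array([bag_statistics(b) for b in bags])
    return RandomForestClassifier().fit(X, bag_labels)
```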

SLIDE 24

Local Bag Statistics : BoWs

The basic bag-of-“words” approach within MIL tries to model every MIL bag by means of a histogram representation. Histogram binning is typically data-adaptive.

E.g. based on clustering of all training data

For every bag, we simply count how many instances end up in which bin. These counts [or normalized counts] are then collected in a single feature vector.
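A sketch of this bag-of-words construction, assuming the same representation [KMeans for the data-adaptive binning and a linear SVM are my choices]:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def fit_bow(bags, bag_labels, n_words=10):
    # Data-adaptive binning: cluster all training instances into "words".
    km = KMeans(n_clusters=n_words).fit(np.vstack(bags))
    # One histogram per bag: how many instances end up in which bin.
    X = np.array([np.bincount(km.predict(b), minlength=n_words) for b in bags])
    X = X / X.sum(axis=1, keepdims=True)   # normalized counts
    return km, LinearSVC().fit(X, bag_labels)
```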

SLIDE 25

Local Bag Statistics : BoWs

SLIDE 26

The Dissimilarity Approach

Time for an intermezzo?

SLIDE 27

Dissimilarity-based MIL

What kind of distances / dissimilarities / similarities can we define between bags?

Think clustering, kernels,… Other options?

Approach is simple, relatively fast at train time, and competitive with many other approaches

In fact, on currently available MIL data sets it performs among the best, on par with the best MI learners…
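One possible instantiation, using the mean minimal instance distance as the bag dissimilarity and the training bags themselves as prototypes [one of many reasonable choices]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bag_dissimilarity(a, b):
    """Mean minimal instance distance from bag a to bag b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return d.min(axis=1).mean()

def fit_dissimilarity(bags, bag_labels):
    # Represent each bag by its dissimilarities to all training bags,
    # then train any standard classifier in that dissimilarity space.
    X = np.array([[bag_dissimilarity(a, b) for b in bags] for a in bags])
    return LogisticRegression(max_iter=1000).fit(X, bag_labels)
```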

SLIDE 28

MILES

Instead of using bags as prototypes, we might as well take the individual instances

The original paper uses an RBF similarity and takes maximum similarity to a bag as feature value

Result : enormous feature vector dimensionality. “Solution” : employ a sparse classifier like the lasso or LIKNON. If the sparse regularization is tuned properly, results are often very good.
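A MILES-style sketch: embed every bag by its maximum RBF similarity to each training instance, then fit a sparse classifier [L1-regularized logistic regression stands in here for the 1-norm SVM of the original paper]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def miles_embedding(bags, prototypes, sigma=1.0):
    """Feature k of a bag = max RBF similarity of its instances to prototype k."""
    X = np.empty((len(bags), len(prototypes)))
    for i, b in enumerate(bags):
        d = np.linalg.norm(b[:, None, :] - prototypes[None, :, :], axis=-1)
        X[i] = np.exp(-(d ** 2) / sigma ** 2).max(axis=0)
    return X

def fit_miles(bags, bag_labels, sigma=1.0):
    prototypes = np.vstack(bags)          # every training instance is a prototype
    X = miles_embedding(bags, prototypes, sigma)
    # Sparse classifier to cope with the enormous dimensionality.
    return LogisticRegression(penalty="l1", solver="liblinear").fit(X, bag_labels)
```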

SLIDE 29

Remarks and Discussion

  • We are dealing with two different sample sizes : #bags and #instances
    Which is the important one?
  • Strict MIL is asymmetric in its labels
    How to extend to multiclass?
  • What if we want more structure? E.g.,
    instances do not have an arbitrary position w.r.t. each other, or
    we have a time series
  • What if we don’t have a single concept?
    Or we need a nontrivial combination [and, or, xor, etc.]?
  • How to incorporate partial labeling?

SLIDE 30

Remarks and Conclusions?

For many real-world classification problems, complex / compound objects have to be represented

MIL representation might be a viable option in this setting

Many procedures have been proposed… … and it is not very clear when to choose which one

  • Diverse density is very good, but very slow
  • Straightforward methods are sometimes surprisingly good
  • Representing bags with feature vectors works well
  • Current recommendation : at least check MILES and a dissimilarity-based approach

SLIDE 31

SLIDE 32

References

  • Amores, "Multiple instance classification: Review, taxonomy and comparative study", AI, 2013
  • Andrews, Hofmann, Tsochantaridis, "Multiple instance learning with generalized support vector machines", AAAI, 2002
  • Brossi, Bradley, "A comparison of multiple instance and group based learning", DICTA, 2012
  • Chen, Bi, Wang, "MILES: Multiple-instance learning via embedded instance selection", IEEE TPAMI, 2006
  • Cheplygina, Tax, Loog, "Multiple instance learning with bag dissimilarities", PR, 2015
  • Cheplygina, Tax, Loog, "On classification with bags, groups and sets", arXiv, 2015
  • Cheplygina, Tax, Loog, "Does one rotten apple spoil the whole barrel?", ICPR, 2012
  • Dietterich, Lathrop, Lozano-Pérez, "Solving the multiple instance problem with axis-parallel rectangles", AI, 1997
  • Duin, Pekalska, "The dissimilarity space: Bridging structural and statistical pattern recognition", PRL, 2012
  • Foulds, Frank, "A review of multi-instance learning assumptions", KER, 2010
  • Gärtner, Flach, Kowalczyk, Smola, "Multi-instance kernels", ICML, 2002
  • Li, Tax, Duin, Loog, "Multiple-instance learning as a classifier combining problem", PR, 2013
  • Loog, van Ginneken, "Static posterior probability fusion for signal detection", ICPR, 2004
  • Maron, Lozano-Pérez, "A framework for multiple-instance learning", NIPS, 1998
  • Pekalska, Duin, "The dissimilarity representation for pattern recognition", World Scientific, 2005
  • Tax, Loog, Duin, Cheplygina, Lee, "Bag dissimilarities for multiple instance learning", SIMBAD, 2011
  • Wang, Zucker, "Solving multiple-instance problem: A lazy learning approach", ICML, 2000
  • Zhang, Goldman, "EM-DD: An improved multiple-instance learning technique", NIPS, 2001