Classify as a Whole? Multiple Instance Learning


SLIDE 1

Classify as a Whole?

SLIDE 2

MULTIPLE INSTANCE LEARNING

Set Learning? Multi-Set Learning?

Marco Loog, Pattern Recognition Laboratory, Delft University of Technology

heavily inspired by slides from Veronika Cheplygina and David MJ Tax

SLIDE 3

Outline

  • Representation and the idea of MIL
  • Some example problems
  • Two related goals
  • A naïve approach to MIL?
  • Concept-based classifiers
  • Bag-based classifiers [with intermezzo]
  • Discussion and conclusions

SLIDE 4

Representation is Key!

Representing every object by a single feature vector can be rather limiting. One other possibility is to represent an object by a collection or [multi-]set of feature vectors, i.e., by means of multiple instances.

We still assume vectors to be in the same feature space. Set sizes do not have to be, and typically are not, the same.
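For concreteness, such data might be held as a list of variable-size instance matrices plus one label per bag; a minimal sketch in Python [the variable names are mine, and the later sketches reuse them]:

```python
import numpy as np

# Three bags in the same 2D feature space, with different numbers of
# instances per bag; a single label per bag, not per instance.
bags = [
    np.array([[0.1, 0.4], [0.3, 0.2]]),              # 2 instances
    np.array([[0.9, 0.8], [0.7, 0.6], [0.5, 0.5]]),  # 3 instances
    np.array([[0.2, 0.1]]),                          # 1 instance
]
bag_labels = np.array([0, 1, 0])
```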

SLIDE 5

So…

The setting is that we have sets, or so-called bags, of instances [feature vectors] to represent every object, but we still have a single label per object. When would we consider this? How do we build classifiers for this?

SLIDE 6

Graphically Speaking

SLIDE 7

Use Local Information?

SLIDE 8

Cat?

SLIDE 9

Detecting Activity of Action Units

SLIDE 10

Does This Smell Musky?

SLIDE 11

Original Goal = Twofold

MIL aims to classify new and previously unseen bags as accurately as possible. But also : MIL tries to discover a “concept” that determines the positive class. Concepts are feature vectors that uniquely identify the positive class.

  • An image patch with a cat
  • A single musky molecule
  • The action unit that is active?

SLIDE 12

Original Goal = Twofold

MIL aims to classify new and previously unseen bags as accurately as possible. But also : MIL tries to discover a “concept” that determines the positive class. Concepts are feature vectors that uniquely identify the positive class.

The first goal is easier than the second. The latter is often not considered in much of the MIL literature.

SLIDE 13

“Naive” Approach

Or the classifier combining approach…
  • Copy bag labels to instances
  • Train a regular classifier
  • Combine all outcomes by a simple combiner

E.g. max rule, averaging, majority vote, quantile / percentile

I say “naive” because this might actually work pretty well in particular settings
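A minimal sketch of this recipe, assuming the bags / bag_labels representation from before [function names and the choice of logistic regression are mine]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def naive_mil_fit(bags, bag_labels):
    """Copy every bag's label to its instances and train a regular classifier."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), l) for b, l in zip(bags, bag_labels)])
    return LogisticRegression(max_iter=1000).fit(X, y)

def naive_mil_predict(clf, bags, combiner="max"):
    """Combine the instance posteriors of each bag with a simple fixed rule."""
    scores = []
    for b in bags:
        p = clf.predict_proba(b)[:, 1]       # P(positive) per instance
        if combiner == "max":                # closest to the "strict" MIL rule
            scores.append(p.max())
        elif combiner == "mean":             # averaging
            scores.append(p.mean())
        else:                                # e.g. a high quantile
            scores.append(np.quantile(p, 0.9))
    return (np.array(scores) > 0.5).astype(int)
```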

SLIDE 14

Concept-based MIL Classifiers

Really try to identify the part of feature space where the concept class resides. Relies on a “strict” interpretation of MIL : a bag is negative iff none of its instances is a concept. The original approach by Dietterich relies on exactly this assumption.

SLIDE 15

Graphically Speaking

[Figure: the concept we look for?]

SLIDE 16

The General Setting?

Large background distribution… Small foreground effect?

SLIDE 17

Discovering a Concept : What If?

This would potentially enable us to solve detailed tasks using only coarse-level annotations / labelings. E.g., train on brain images for which you only know that there is a tumor, but get a classifier that can actually localize those tumors…

SLIDE 18

Axis-Parallel Rectangles

SLIDE 19

Diverse Density

  • Model the concept by a compact density
  • All instances in negative bags should have a high probability of non-concept
  • All positive bags should have a high probability of containing at least one concept instance
  • Leads to a complicated optimization [sketched below]
  • But it can work pretty OK…
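For reference, a sketch of the diverse density objective in its noisy-OR form [Maron, Lozano-Pérez, 1998], where t is a candidate concept point and B_{ij} is instance j of bag i; the concept estimate maximizes DD(t), typically by gradient ascent from many starting points [the per-feature scalings that the original paper optimizes jointly with t are omitted here]:

```latex
\mathrm{DD}(t) \;=\;
\prod_{i\,:\,\text{positive}} \Bigl( 1 - \prod_{j} \bigl( 1 - e^{-\lVert B_{ij} - t \rVert^{2}} \bigr) \Bigr)
\;\prod_{i\,:\,\text{negative}} \; \prod_{j} \bigl( 1 - e^{-\lVert B_{ij} - t \rVert^{2}} \bigr)
```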

SLIDE 20

MI-SVM

Adaptation of standard SVM

First iteration :
  • Copy the bag label to the instance labels
  • Train a standard SVM
Subsequent iterations :
  • Choose a single instance in every positive bag [the one currently scored most positively] so that at least one instance per positive bag is on the correct side of the decision boundary
  • Retrain

As so often : the idea is illustrated using an SVM, but there is no need to stick to this choice of classifier
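A rough sketch of the iterative scheme [a bag-level MI-SVM variant: negative bags keep all their instances, each positive bag is represented by its current highest-scoring "witness"; names and the RBF kernel choice are mine]:

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, bag_labels, n_iter=10):
    # First iteration: copy bag labels to all instances, train a standard SVM.
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), l) for b, l in zip(bags, bag_labels)])
    clf = SVC(kernel="rbf").fit(X, y)
    # Subsequent iterations: re-select witnesses and retrain.
    for _ in range(n_iter):
        Xs, ys = [], []
        for b, l in zip(bags, bag_labels):
            if l == 1:
                w = b[np.argmax(clf.decision_function(b))]  # current witness
                Xs.append(w[None, :]); ys.append(np.ones(1))
            else:
                Xs.append(b); ys.append(np.zeros(len(b)))   # negatives stay negative
        clf = SVC(kernel="rbf").fit(np.vstack(Xs), np.concatenate(ys))
    return clf
```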

SLIDE 21

“Naive” Approach [Continued]

Or the classifier combining approach…
  • Copy bag labels to instances
  • Train a regular classifier
  • Combine all outcomes by a simple combiner

E.g. max rule, averaging, majority vote, quantile / percentile

The max rule comes close to the “strict” MIL setting…

SLIDE 22

Bag-based Classifiers

  • Forget about the concept
  • Model the bag as a whole
  • Extract global or local statistics for every bag
  • Define bag distances / [dis]similarities and use a dissimilarity approach [or any similar kind of technique]
  • MILES

SLIDE 23

Global Bag Statistics

Consider means, variances, minima, maxima, covariances to describe the content of every bag. This turns every bag into a regular feature vector… … and so we can apply our favorite supervised learning tools to it. Just considering minima and maxima works surprisingly well in some cases.

N.B. Not a rotation-invariant representation [But do we care?]
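A minimal sketch of this reduction, assuming the bags / bag_labels representation from before [the statistics chosen and the random forest are illustrative]:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bag_statistics(bag):
    """Summarize one bag [n_instances x n_features] by global statistics."""
    return np.concatenate([bag.mean(axis=0), bag.var(axis=0),
                           bag.min(axis=0), bag.max(axis=0)])

def fit_on_statistics(bags, bag_labels):
    # Every bag becomes one fixed-length vector, so any supervised learner applies.
    X = np.array([bag_statistics(b) for b in bags])
    return RandomForestClassifier().fit(X, bag_labels)
```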

SLIDE 24

Local Bag Statistics : BoWs

The basic bag-of-“words” approach within MIL tries to model every MIL bag by means of a histogram representation. Histogram binning is typically data-adaptive.

E.g. based on clustering of all training data

For every bag, we simply count how many instances end up in which bin. These counts [or normalized counts] are then collected in a single feature vector.
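A sketch of this bag-of-words construction, assuming the same representation [KMeans for the data-adaptive binning and a linear SVM are my choices]:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def fit_bow(bags, bag_labels, n_words=10):
    # Data-adaptive binning: cluster all training instances into "words".
    km = KMeans(n_clusters=n_words).fit(np.vstack(bags))
    # One histogram per bag: how many instances end up in which bin.
    X = np.array([np.bincount(km.predict(b), minlength=n_words) for b in bags])
    X = X / X.sum(axis=1, keepdims=True)   # normalized counts
    return km, LinearSVC().fit(X, bag_labels)
```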

SLIDE 25

Local Bag Statistics : BoWs

SLIDE 26

The Dissimilarity Approach

Time for an intermezzo?

SLIDE 27

Dissimilarity-based MIL

What kind of distances / dissimilarities / similarities can we define between bags?

Think clustering, kernels,… Other options?

Approach is simple, relatively fast at train time, and competitive with many other approaches

In fact, on currently available MIL data sets it performs among the best, on par with the best MI learners…
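One possible instantiation, using the mean minimal instance distance as the bag dissimilarity and the training bags themselves as prototypes [one of many reasonable choices]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bag_dissimilarity(a, b):
    """Mean minimal instance distance from bag a to bag b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return d.min(axis=1).mean()

def fit_dissimilarity(bags, bag_labels):
    # Represent each bag by its dissimilarities to all training bags,
    # then train any standard classifier in that dissimilarity space.
    X = np.array([[bag_dissimilarity(a, b) for b in bags] for a in bags])
    return LogisticRegression(max_iter=1000).fit(X, bag_labels)
```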

SLIDE 28

MILES

Instead of using bags as prototypes, we might as well take the individual instances

The original paper uses an RBF similarity and takes maximum similarity to a bag as feature value

Result : enormous feature vector dimensionality. “Solution” : employ a sparse classifier like the lasso or LIKNON. If the sparse regularization is tuned properly, results are often very good.
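A MILES-style sketch: embed every bag by its maximum RBF similarity to each training instance, then fit a sparse classifier [L1-regularized logistic regression stands in here for the 1-norm SVM of the original paper]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def miles_embedding(bags, prototypes, sigma=1.0):
    """Feature k of a bag = max RBF similarity of its instances to prototype k."""
    X = np.empty((len(bags), len(prototypes)))
    for i, b in enumerate(bags):
        d = np.linalg.norm(b[:, None, :] - prototypes[None, :, :], axis=-1)
        X[i] = np.exp(-(d ** 2) / sigma ** 2).max(axis=0)
    return X

def fit_miles(bags, bag_labels, sigma=1.0):
    prototypes = np.vstack(bags)          # every training instance is a prototype
    X = miles_embedding(bags, prototypes, sigma)
    # Sparse classifier to cope with the enormous dimensionality.
    return LogisticRegression(penalty="l1", solver="liblinear").fit(X, bag_labels)
```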

SLIDE 29

Remarks and Discussion

  • We are dealing with two different sample sizes : #bags and #instances
    Which is the important one?
  • Strict MIL is asymmetric in its labels
    How to extend to multiclass?
  • What if we want more structure? E.g.,
    instances do not have an arbitrary position w.r.t. each other, or
    we have a time series
  • What if we don’t have a single concept?
    Or we need a nontrivial combination [and, or, xor, etc.]?
  • How to incorporate partial labeling?

SLIDE 30

Remarks and Conclusions?

For many real-world classification problems, complex / compound objects have to be represented

MIL representation might be a viable option in this setting

Many procedures have been proposed… … and it is not very clear when to choose which one

  • Diverse density is very good, but very slow
  • Straightforward methods are sometimes surprisingly good
  • Representing bags with feature vectors works well
  • Current recommendation : at least check MILES and a dissimilarity-based approach

SLIDE 31

SLIDE 32

References

  • Amores, "Multiple instance classification: Review, taxonomy and comparative study", AI, 2013
  • Andrews, Hofmann, Tsochantaridis, "Multiple instance learning with generalized support vector machines", AAAI, 2002
  • Brossi, Bradley, "A comparison of multiple instance and group based learning", DICTA, 2012
  • Chen, Bi, Wang, "MILES: Multiple-instance learning via embedded instance selection", IEEE TPAMI, 2006
  • Cheplygina, Tax, Loog, "Multiple instance learning with bag dissimilarities", PR, 2015
  • Cheplygina, Tax, Loog, "On classification with bags, groups and sets", arXiv, 2015
  • Cheplygina, Tax, Loog, "Does one rotten apple spoil the whole barrel?", ICPR, 2012
  • Dietterich, Lathrop, Lozano-Pérez, "Solving the multiple instance problem with axis-parallel rectangles", AI, 1997
  • Duin, Pekalska, "The dissimilarity space: Bridging structural and statistical pattern recognition", PRL, 2012
  • Foulds, Frank, "A review of multi-instance learning assumptions", KER, 2010
  • Gärtner, Flach, Kowalczyk, Smola, "Multi-instance kernels", ICML, 2002
  • Li, Tax, Duin, Loog, "Multiple-instance learning as a classifier combining problem", PR, 2013
  • Loog, van Ginneken, "Static posterior probability fusion for signal detection", ICPR, 2004
  • Maron, Lozano-Pérez, "A framework for multiple-instance learning", NIPS, 1998
  • Pekalska, Duin, "The dissimilarity representation for pattern recognition", World Scientific, 2005
  • Tax, Loog, Duin, Cheplygina, Lee, "Bag dissimilarities for multiple instance learning", SIMBAD, 2011
  • Wang, Zucker, "Solving multiple-instance problem: A lazy learning approach", ICML, 2000
  • Zhang, Goldman, "EM-DD: An improved multiple-instance learning technique", NIPS, 2001