Aggressive Double Sampling for Reducing Multi-class Classification to Binary Classification Bikash Joshi (PhD Student) AMA team, LIG Supervised By: Prof. Massih-Reza Amini and Dr. Franck Iutzeler March 20, 2017 Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 1 / 27

Outline
1 Introduction
2 Multiclass to Binary Reduction
3 Double-Sampled Multiclass to Binary Reduction
4 Experimental Results
5 Conclusion

Multiclass Classification: Introduction
[Figures: digit classification, image classification, text classification]
Finite set of categories (K > 2)
Popular applications: image and text classification.

Multiclass classification: Related Work
1 Combined approaches based on binary classification:
  ◮ One-Vs-Rest
    ⋆ One binary problem for each class
    ⋆ K binary problems
    ⋆ $O(K \times d)$
  ◮ One-Vs-One
    ⋆ One binary problem for each pair of classes
    ⋆ $O(K^2 \times d)$
2 Uncombined approaches
  ◮ For example: multiclass SVM, MLP
  ◮ One scoring function per class
3 Logarithmic time algorithms
  ◮ For example: logTree, Recall-Tree
  ◮ Each leaf node represents a class
  ◮ $O(\log K)$
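The counts behind the One-Vs-Rest and One-Vs-One complexities can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes each binary problem is a linear model with one d-dimensional weight vector.

```python
def one_vs_rest_problems(num_classes: int) -> int:
    """One binary problem per class: K problems."""
    return num_classes


def one_vs_one_problems(num_classes: int) -> int:
    """One binary problem per unordered pair of classes: K(K-1)/2, i.e. O(K^2)."""
    return num_classes * (num_classes - 1) // 2


def parameter_count(num_problems: int, dim: int) -> int:
    """Assumption: each binary problem is a linear model with `dim` weights."""
    return num_problems * dim


if __name__ == "__main__":
    K, d = 1000, 50
    print(parameter_count(one_vs_rest_problems(K), d))  # grows as K * d
    print(parameter_count(one_vs_one_problems(K), d))   # grows as K^2 * d
```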

Multiclass classification: Challenges
The number of classes, K, in new emerging multiclass problems, for example in text and image classification, may reach $10^5$ to $10^6$ categories.
For example:
  ◮ $4 \times 10^6$ sites
  ◮ $10^6$ categories
  ◮ $10^5$ editors
  ◮ Imbalanced nature of hierarchies

Multiclass classification: Challenges
Class imbalance problem
Majority of classes have few representative examples
Long-tailed distribution
[Figure: DMOZ-7500, number of classes (y-axis) per number of documents in the class (x-axis bins: 2-5, 6-10, 11-30, 31-100, 101-200, >200)]

Text Classification
Task: automatic classification of an example text into one of a fixed set of categories.
Feature representation:
Bag of words:
  ◮ Extract the vocabulary from the training corpus.
  ◮ Represent each term as 0 or 1
  ◮ Highly sparse
Document-class joint feature representation:
  ◮ Inspired by learning to rank
  ◮ Similarity features between an example and a class of examples
  ◮ For example: $\sum_{t \in y \cap x} 1$, where x is one document and y is a class of documents
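The two representations can be contrasted with a small Python sketch. It assumes whitespace tokenization and documents given as token lists. The function names are hypothetical and are not taken from the slides.

```python
def binary_bag_of_words(text, vocabulary):
    """0/1 vector over the vocabulary: 1 if the term occurs in the text."""
    tokens = set(text.lower().split())  # assumption: whitespace tokenization
    return [1 if term in tokens else 0 for term in vocabulary]


def shared_term_count(document_tokens, class_documents):
    """Joint feature: number of distinct terms t in both the class y and the document x."""
    class_terms = set().union(*class_documents)  # all terms seen in the class
    return len(set(document_tokens) & class_terms)


if __name__ == "__main__":
    vocab = ["apple", "banana", "cherry"]
    print(binary_bag_of_words("Banana apple apple", vocab))
    print(shared_term_count(["a", "b", "c"], [["a", "x"], ["c", "c"]]))
```

The bag-of-words vector grows with the vocabulary, whereas the joint feature is a single number per (document, class) pair, which is what keeps the joint representation low-dimensional.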

Motivation of our work
Baselines: model complexity increases with the number of classes (K) and the feature dimension (d).
Algorithm that scales well for large-scale data
Does not suffer from the class imbalance problem
Less complex model
Competitive with the state-of-the-art approaches

Framework
$\mathcal{X} \subseteq \mathbb{R}^d$: input space
$\mathcal{Y} = \{1, \dots, K\}$: output space
$S = (x_i^{y_i})_{i=1}^{m}$: training set of i.i.d. pairs
$\mathcal{G} = \{g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}\}$: class of predictors
Instantaneous loss:
$$e(g, x^y) = \frac{1}{K-1} \sum_{y' \in \mathcal{Y} \setminus \{y\}} \mathbb{1}_{g(x^y) \le g(x^{y'})} \qquad (1)$$
$\mathbb{1}_{\pi}$ is the indicator function (its value is 0 or 1)
Fraction of the other classes that g scores at least as high as the true class
Ranking loss used in multiclass SVM (Weston et al., 1998)
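As a minimal sketch of the instantaneous loss in equation (1), assume the scores g(x^{y'}) for all K classes are already available as a list with 0-indexed classes. The name `instantaneous_loss` is hypothetical.

```python
def instantaneous_loss(scores, true_class):
    """Fraction of the K-1 other classes whose score is >= the true class score.

    scores[k] plays the role of g(x^k); classes are 0-indexed here
    (an assumption of this sketch, the slides index classes from 1).
    """
    num_classes = len(scores)
    true_score = scores[true_class]
    errors = sum(
        1
        for k, score in enumerate(scores)
        if k != true_class and true_score <= score
    )
    return errors / (num_classes - 1)


if __name__ == "__main__":
    # Only class 1 (score 0.9) outranks the true class 2 (score 0.5).
    print(instantaneous_loss([0.2, 0.9, 0.5, 0.1], true_class=2))
```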

Framework: Empirical Loss
The empirical error of $g \in \mathcal{G}$ over S is:
$$L_m(g, S) = \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in \mathcal{Y} \setminus \{y_i\}} \mathbb{1}_{g(x_i^{y_i}) \le g(x_i^{y'})} \qquad (2)$$
$$= \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in \mathcal{Y} \setminus \{y_i\}} \mathbb{1}_{h(x_i^{y_i}, x_i^{y'}) \le 0} \qquad (3)$$
where $h(x_i^{y_i}, x_i^{y'}) = g(x_i^{y_i}) - g(x_i^{y'})$.
Resembles a risk based on the binary classification loss
Selecting a hypothesis in $\mathcal{G}$ that minimizes the risk over S is equivalent to searching for a hypothesis in $\mathcal{H}$ that minimizes the risk over T(S), a set of size $m \times (K - 1)$
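The equality between the multiclass form (2) and the pairwise form (3) can be checked numerically. The sketch below assumes precomputed score rows (one list of K scores per example, 0-indexed classes). The function names are hypothetical.

```python
def empirical_loss(score_rows, labels):
    """Equation (2): count classes scored at least as high as the true class."""
    m, num_classes = len(score_rows), len(score_rows[0])
    errors = sum(
        1
        for scores, y in zip(score_rows, labels)
        for k, score in enumerate(scores)
        if k != y and scores[y] <= score
    )
    return errors / (m * (num_classes - 1))


def pairwise_loss(score_rows, labels):
    """Equation (3): same count, written on h = g(x^y) - g(x^y') <= 0."""
    m, num_classes = len(score_rows), len(score_rows[0])
    errors = 0
    for scores, y in zip(score_rows, labels):
        for k, score in enumerate(scores):
            if k == y:
                continue
            h = scores[y] - score  # pairwise margin between true and other class
            if h <= 0:
                errors += 1
    return errors / (m * (num_classes - 1))


if __name__ == "__main__":
    rows = [[0.2, 0.9, 0.5], [0.7, 0.1, 0.3]]
    labels = [2, 0]
    print(empirical_loss(rows, labels), pairwise_loss(rows, labels))
```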

Multiclass to binary reduction example
We consider the following transformation:
$$T(S) = \left( \begin{cases} \left(z_j = (x_i^{k}, x_i^{y_i}),\ \tilde{y}_j = -1\right) & \text{if } k < y_i \\ \left(z_j = (x_i^{y_i}, x_i^{k}),\ \tilde{y}_j = +1\right) & \text{otherwise} \end{cases} \right)_{j = (i-1)(K-1)+k}$$
$|T(S)| = m \times (K - 1)$
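The transformation can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: `joint_features(x, k)` is a hypothetical callable standing in for the joint representation x^k, classes are numbered 1..K, and pairs are simply appended in order rather than indexed by j.

```python
def reduce_to_binary(examples, num_classes, joint_features):
    """Build T(S): one labeled pair per (example, other class).

    examples: list of (x, y) with y in 1..num_classes.
    joint_features: hypothetical callable (x, k) -> representation of x^k.
    """
    pairs = []
    for x, y in examples:
        for k in range(1, num_classes + 1):
            if k == y:
                continue  # only classes other than the true one
            if k < y:
                # true class in second position, binary label -1
                pairs.append(((joint_features(x, k), joint_features(x, y)), -1))
            else:
                # true class in first position, binary label +1
                pairs.append(((joint_features(x, y), joint_features(x, k)), +1))
    return pairs


if __name__ == "__main__":
    toy = [("doc_a", 2), ("doc_b", 1)]
    for pair, label in reduce_to_binary(toy, 3, lambda x, k: (x, k)):
        print(pair, label)
```

With m examples and K classes the result always has m × (K − 1) pairs, which is the size that becomes impractical for large K.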

Multiclass to binary reduction algorithm [Bikash et al. 2015]

Improvements and New Challenges
Improvements:
  One parameter vector for all classes.
  Low-dimensional feature space.
  Overcomes class imbalance.
New challenges:
  Number of transformed examples is huge for larger K
  Large computational overhead
  Large memory requirement

Aggressive double sampling
1 Draw uniformly µ examples per class, in order to form a practical set $S_\mu$:
  ◮ Reduces redundancy in examples
  ◮ Emphasizes rare classes
2 For each example $x^y$ in $S_\mu$, draw uniformly κ adversarial classes in $\mathcal{Y} \setminus \{y\}$:
  ◮ Reduces time complexity
  ◮ Low memory requirement
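The two sampling steps can be sketched as below. Several details are assumptions of this sketch rather than statements from the slides: sampling is done without replacement, a class with fewer than µ examples contributes all of them, classes are numbered 1..K, and the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def double_sample(examples, num_classes, mu, kappa, rng):
    """Return (x, y, adversarial_classes) triples after the two sampling steps."""
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)

    sampled = []
    for y, xs in by_class.items():
        # Step 1: at most mu examples per class (assumption: without replacement).
        for x in rng.sample(xs, min(mu, len(xs))):
            # Step 2: kappa adversarial classes drawn uniformly from Y \ {y}.
            others = [k for k in range(1, num_classes + 1) if k != y]
            adversaries = rng.sample(others, min(kappa, len(others)))
            sampled.append((x, y, adversaries))
    return sampled


if __name__ == "__main__":
    data = [("d1", 1), ("d2", 1), ("d3", 1), ("d4", 2)]
    for triple in double_sample(data, num_classes=4, mu=2, kappa=2, rng=random.Random(0)):
        print(triple)
```

Each triple then yields κ binary pairs instead of K − 1, so the reduced training set has at most (number of classes) × µ × κ pairs.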

Double Sampled Multi to Binary Reduction

Experimental Setup
Datasets:
  Application: text classification
  DMOZ and Wikipedia datasets (LSHTC challenge)
  Pre-processed with stop word removal and stemming
  Random samples of 1000, 2000, 3000, 4000, 5000, 7500, 10000, 20000
Comparison:
  DS-m2b: proposed double-sampled multiclass to binary algorithm
  OVA: One-Vs-All algorithm
  M-SVM: Crammer-Singer implementation of multiclass SVM
  Recall Tree: hierarchical One-Vs-Some algorithm

Feature representation $\Phi(x^y)$
Features:
1. $\sum_{t \in y \cap x} \ln(1 + y_t)$
2. $\sum_{t \in y \cap x} \ln\left(1 + \frac{l_S}{S_t}\right)$
3. $\sum_{t \in y \cap x} I_t$
4. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|}\right)$
5. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot I_t\right)$
6. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot \frac{l_S}{S_t}\right)$
7. $\sum_{t \in y \cap x} 1$
8. $\sum_{t \in y \cap x} \frac{y_t}{|y|} \cdot I_t$
9. BM25
10. $d(x^y, \mathrm{centroid}(y))$
Notation:
$x_t$: number of occurrences of term t in document x
$\mathcal{V}$: set of distinct terms in S
$y_t = \sum_{x \in y} x_t$, $|y| = \sum_{t \in \mathcal{V}} y_t$, $S_t = \sum_{x \in S} x_t$, $l_S = \sum_{t \in \mathcal{V}} S_t$
$I_t$: idf of the term t
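Three of the listed features (1, 4 and 7) depend only on the class term counts and can be sketched directly. The sketch assumes documents are lists of tokens and that y ∩ x means the distinct terms occurring in both the document and the class. The function names are hypothetical, and the corpus-level features (idf, BM25, centroid distance) are left out.

```python
import math
from collections import Counter


def class_term_counts(class_documents):
    """y_t: total occurrences of each term t over the documents of class y."""
    counts = Counter()
    for tokens in class_documents:
        counts.update(tokens)
    return counts


def joint_features(document_tokens, class_documents):
    """Return [feature 1, feature 4, feature 7] for one (document, class) pair."""
    y_t = class_term_counts(class_documents)
    class_length = sum(y_t.values())           # |y| = sum over terms of y_t
    shared = set(document_tokens) & set(y_t)   # terms t in y ∩ x
    feature_1 = sum(math.log(1 + y_t[t]) for t in shared)
    feature_4 = sum(math.log(1 + y_t[t] / class_length) for t in shared)
    feature_7 = len(shared)
    return [feature_1, feature_4, feature_7]


if __name__ == "__main__":
    print(joint_features(["a", "b", "z"], [["a", "a", "c"], ["b"]]))
```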

Results: Runtime Comparison
[Figure: total runtime in seconds (log scale, $10^1$ to $10^6$) versus number of classes (1000 to 20000) for OVA, MSVM, Recall Tree and DS-m2b]

Results: Memory Comparison
[Figure: total memory usage in GB (0 to 60, with 16GB and 32GB limits marked) versus number of classes (1000 to 20000) for OVA, MSVM, Recall Tree and DS-m2b]

Results: Prediction Performance Comparison
[Figure: MAF (0.0 to 0.6) versus number of classes (1000 to 20000) for OVA, MSVM, Recall Tree and DS-m2b]

Conclusion
Multiclass to binary reduction to handle the large-class scenario and overcome the class imbalance problem.
Use of double sampling to further improve computational complexity and memory usage.

Questions?
