For the Open-Set Data Classification - Zhuoyi Wang, Bo Dong, Yu Lin - PowerPoint PPT Presentation




SLIDE 1

UT DALLAS

Erik Jonsson School of Engineering & Computer Science

FEARLESS engineering

Co-Representation Learning Framework For the Open-Set Data Classification

Zhuoyi Wang, Bo Dong, Yu Lin, Yigong Wang, Latifur Khan Department of Computer Science The University of Texas at Dallas, Richardson TX, USA

This material is based upon work supported by

SLIDE 2

⮚ Open Data Problem:

– Continuous arrival of instances from various source domains.
– Newly emerged categories that may not be linearly separable from the existing classes.
– Limited past data with which to build a model.


Examples: scene images with new objects in autonomous systems; news streams with newly emerged topics in social network systems.

Why Open Data Classification?

SLIDE 3

Challenge: Novel Class



Left: a traditional lower-dimensional dataset, where the boundary between the novel and existing classes is clear; previous work includes ECSMiner [1], SAND [2], and ECHO [3]. Right: the non-linearly separable FASHION-MNIST dataset, where novel categories may share some attributes with seen classes.

[1] Al-Khateeb, T., Masud, M. M., Khan, L., Aggarwal, C., Han, J., and Thuraisingham, B. "Stream classification with recurring and novel class detection using class-based ensemble." In ICDM 2012.
[2] Haque, A., Khan, L., and Baron, M. "SAND: Semi-supervised adaptive novel class detection and classification over data stream." In AAAI 2016.
[3] Haque, A., et al. "Efficient handling of concept drift and concept evolution over stream data." In ICDE 2016.

SLIDE 4

Proposed Approach: RLCN


✓ Applying deep metric learning to:

– extract more powerful features from a deep network,
– learn a more suitable latent space that improves intra-class compactness and inter-class separateness, and
– detect any significant change(s) through confidence scores.

✓ A Weighted Pairwise-Constraint (WPC) is used for classification:

– The distance metric for existing and novel classes is learned on different samples for the corresponding classification.
– Training is semi-supervised.

✓ Temperature scaling (a distillation-style process for output calibration).
✓ Novel class detection is integrated.

SLIDE 5

Co-Representation Learning Framework


SLIDE 6

Weighted Pairwise-Constraint (WPC)


We define the weighted pairwise-constraint (WPC), which aims to transform the input x into a suitable latent feature space 𝜚(x). Compared with the original space in Figure 1 (middle), after the WPC mapping the distance between similar pairs becomes smaller so that they cluster together, while inter-class separation is achieved with a margin of τ + γ.

Figure 1
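The mapping described above can be sketched as a margin-based pairwise loss. The paper's objective also includes the per-pair weighting, so the snippet below is only a simplified illustration, assuming Euclidean distance in the latent space and the two margins τ (intra-class) and τ + γ (inter-class):

```python
import numpy as np

def pairwise_constraint_loss(z1, z2, same_class, tau=0.5, gamma=0.5):
    """Simplified margin-based pairwise loss in the latent space.

    Similar pairs are pulled within distance tau; dissimilar pairs are
    pushed beyond tau + gamma, leaving an inter-class margin of gamma.
    """
    d = np.linalg.norm(z1 - z2)
    if same_class:
        return max(0.0, d - tau) ** 2           # pull similar pairs together
    return max(0.0, (tau + gamma) - d) ** 2     # push dissimilar pairs apart
```

With this form, a pair only contributes to the loss when it violates its margin, which is what drives the clustering effect shown in Figure 1.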

SLIDE 7

➢ Objective function (minimization problem)


Weighted Pairwise-Constraint (WPC)

➢ Weighted Constraint (Discussion)

The weight for the pair (x_j, x_k) in the WPL can be derived by differentiating L_pair with respect to S(x_j, x_k) as:

SLIDE 8


Weighted Pairwise-Constraint (WPC)

We find that a negative pair with higher similarity is assigned a larger weight: the negative pair (lower figure) is more informative than the dissimilar inter-class pair (upper figure), so two similar samples from different classes can be distinguished by our WPC. Upper pair: smaller contribution to L_pair. Lower pair: larger contribution to L_pair.
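The intuition that harder (more similar) negative pairs receive larger weights can be illustrated with a hypothetical monotone weighting function; the actual weight in the paper is the derivative of the pairwise loss with respect to the pair similarity, so both the exponential form and the parameter beta below are assumptions for illustration only:

```python
import math

def negative_pair_weight(similarity, beta=2.0):
    """Hypothetical weighting: a negative pair with higher similarity
    (i.e., harder to separate) receives a larger weight.

    beta controls how sharply the weight grows with similarity; it is
    an illustrative parameter, not one from the paper.
    """
    return math.exp(beta * similarity)
```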

SLIDE 9

Open-World Classifier


The open-world classifier aims to compute a threshold T_novel to reject instances from novel classes and to classify examples of the existing categories. Let ẑ_t be the estimated class label for instance x_t. Unlike a plain softmax output, we apply temperature scaling to calibrate the prediction confidence of the output. Suppose the training set D contains instances from L existing classes. For x_t, we compute the prediction probability that ẑ_t corresponds to class Y_i, i ∈ {1, 2, …, L}:


The threshold T_novel for class Y_i can then be generated through:
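The calibration step can be sketched as follows. The temperature-scaled softmax is the standard formulation; the percentile rule used to derive T_novel is only an assumption here, standing in for the equation on the slide:

```python
import numpy as np

def temperature_softmax(logits, T=2.0):
    """Temperature-scaled softmax: T > 1 softens the probabilities,
    which calibrates overconfident predictions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                         # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def novelty_threshold(train_confidences, q=5):
    """Per-class threshold T_novel, taken here as a low percentile of the
    calibrated confidences on correctly labeled training instances.
    (The percentile rule is an assumption; the paper gives its own formula.)"""
    return np.percentile(train_confidences, q)
```

At test time, an instance whose maximum calibrated probability falls below T_novel for every existing class would be rejected as belonging to a novel class.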

SLIDE 10

Experiments


➢ We use the following performance metrics:

– ERR: total misclassification error (percent), i.e., (FP + FN + Fe) × 100 / N
– Mnew: % of novel-class instances misclassified as an existing class, i.e., FN × 100 / Nc
– Fnew: % of existing-class instances falsely identified as novel, i.e., FP × 100 / (N − Nc)

Name of Data Set    | Instances | Classes | Features
FASHION-MNIST [1]   | 70,000    | 10      | 784
MNIST [2]           | 70,000    | 10      | 784
CIFAR10 [3]         | 70,000    | 10      | 4096
New-York Times      | 100,000   | 20      | 300
Guardian            | 130,000   | 20      | 300
SynRBF@0.003 [4]    | 130,000   | 7       | 70
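The three metrics above can be computed directly from the confusion counts; a minimal helper, using the slide's own formulas:

```python
def stream_metrics(FP, FN, Fe, N, Nc):
    """Open-set stream classification metrics.

    FP: existing-class instances falsely flagged as novel
    FN: novel-class instances misclassified as existing
    Fe: existing-class instances assigned the wrong existing class
    N:  total number of instances; Nc: number of novel-class instances
    """
    ERR  = (FP + FN + Fe) * 100.0 / N        # total misclassification error
    Mnew = FN * 100.0 / Nc                   # missed novel instances
    Fnew = FP * 100.0 / (N - Nc)             # false novel alarms
    return ERR, Mnew, Fnew
```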

SLIDE 11

Visualization of the latent space


t-SNE visualization of embeddings in the original feature space (left) and the transformed latent feature space (right) on the FASHION-MNIST dataset.

SLIDE 12

Classification Error and Execution Time


Comparison of classification performance against competing methods under open-set settings.

Analysis of the training time of the different models: RLCN is more efficient than the other methods across datasets of different dimensionality.

SLIDE 13

Performance: Novel Class Detection


Novel class detection performance in the open-set data environment; "-" denotes failure of novel class detection. Parameter analysis of τ for novel class detection.

SLIDE 14

Conclusion


➢ RLCN:

– is a co-representation learning framework that handles the open-set classification problem, using metric learning to learn a suitable latent space for filtering out novel classes.
– adjusts the threshold with temperature scaling to find the most probable novel class instances.
– does not require much labeled data to maintain classification accuracy on novel class instances.
– outperforms state-of-the-art approaches in both classification accuracy and novel class detection.