Margin-based Semi-supervised Learning Using Apollonius circle
MONA EMADI AND JAFAR TANHA
TTCS 2020
Training data:
➢ Supervised learning: all labeled data → model
➢ Semi-supervised learning: some labeled data + lots of unlabeled data → model
➢ Unsupervised learning: all unlabeled data → model
Self-training:
Step 1: initialize the labeled training data L = L_init
Step 2: f = learn classifier(L)
Step 3: apply f to the unlabeled data U
Step 4: augment the training data: L ← L ∪ L_self, where L_self is the k examples with the most confident predictions; remove these examples from the unlabeled pool
Step 5: repeat from Step 2
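The generic self-training loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a simple nearest-centroid classifier stands in for the SVM base learner, and the names (`self_training`, `k`) are ours.

```python
# Sketch of the self-training loop (Steps 1-5).  A nearest-centroid
# classifier stands in for the SVM base learner; `k` is the number of
# confident examples moved from U to L per round.
import numpy as np

class NearestCentroid:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_with_confidence(self, X):
        # Confidence = negative distance to the closest class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)], -d.min(axis=1)

def self_training(X_lab, y_lab, X_unlab, k=5, max_iter=20):
    X_l, y_l, X_u = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = NearestCentroid()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)                               # Step 2: f = learn classifier(L)
        y_pred, conf = clf.predict_with_confidence(X_u) # Step 3: apply f to U
        top = np.argsort(conf)[-k:]                     # k most confident predictions
        X_l = np.vstack([X_l, X_u[top]])                # Step 4: L <- L ∪ L_self
        y_l = np.concatenate([y_l, y_pred[top]])
        X_u = np.delete(X_u, top, axis=0)               # remove them from the pool
    return clf.fit(X_l, y_l)
```

The stopping rule here (fixed iteration budget or empty pool) is one common choice; the paper's selection criterion for L_self is the geometric construction described in the following slides.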
➢ Evaluating the use of data points close to the decision boundary to improve classification performance.
➢ Proposing a geometry-based selection metric to find informative unlabeled data points.
➢ Defining a new metric that measures the similarity between labeled and unlabeled data points based on the proposed geometrical structure.
➢ Proposing an agreement-based approach for selecting among the newly-labeled data, based on the classifier predictions and the proposed neighborhood construction algorithm.
[Figure: an Apollonius circle through a point M with fixed points A and B; the circle meets the line AB at C and D, with distances d1 = d(A, M) and d2 = d(M, B).]
Apollonius circle: the locus of points M in the Euclidean plane whose ratio of distances to two fixed points A and B is a constant k, i.e., d(A, M) / d(M, B) = k.
[Figure: Apollonius circles of the fixed points A and B for k < 1, k = 1 (a straight line), and k > 1.]
Depending on k, the region $D_{AB}$ bounded by the Apollonius circle contains A ($k < 1$) or B ($k > 1$); for $k = 1$ the circle degenerates to a straight line:

$$D_{AB} = \begin{cases} D_A & \text{if } k < 1 \\ D_B & \text{if } k > 1 \\ D_{\inf} & \text{if } k = 1 \end{cases}$$
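For $k \neq 1$ the Apollonius circle has a closed form: squaring $\|M - A\| = k\|M - B\|$ and completing the square gives center $(A - k^2 B)/(1 - k^2)$ and radius $k\|A - B\| / |1 - k^2|$. A minimal sketch (the function name is ours):

```python
import numpy as np

def apollonius_circle(A, B, k):
    """Center and radius of the locus {M : d(A, M) / d(M, B) = k}, k != 1.
    Derived by squaring |M - A|^2 = k^2 |M - B|^2 and completing the square."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    if np.isclose(k, 1.0):
        raise ValueError("k = 1 degenerates to the perpendicular bisector of AB")
    center = (A - k**2 * B) / (1 - k**2)
    radius = k * np.linalg.norm(A - B) / abs(1 - k**2)
    return center, radius
```

For example, with A = (0, 0), B = (2, 0), and k = 0.5, every point on the returned circle is half as far from A as from B.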
➢ Local density $\rho_i$ is defined as:

$$\rho_i = \exp\left(-\frac{1}{r} \sum_{M_k \in N(M_i)} d(M_i, M_k)^2\right), \qquad d(M_i, M_k) = \|M_i - M_k\|, \qquad r = p \times n$$

where $N(M_i)$ is the neighborhood of $M_i$, $n$ is the number of data points, and $p$ is a percentage parameter.

➢ $\delta_i$ is the minimum distance between $M_i$ and any other sample with density higher than $\rho_i$:

$$\delta_i = \begin{cases} \min_{k:\, \rho_i < \rho_k} d(M_i, M_k) & \text{if } \exists k \text{ s.t. } \rho_i < \rho_k \\ \max_k d(M_i, M_k) & \text{otherwise} \end{cases}$$

➢ Peaks (high-density points) are obtained using the score function $\mathrm{score}(M_i) = \rho_i \times \delta_i$.
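The quantities $\rho_i$, $\delta_i$, and the score can be sketched directly. This is an illustrative reading, assuming the neighborhood $N(M_i)$ is the $r$ nearest neighbors with $r = p \times n$:

```python
import numpy as np

def density_peaks_scores(X, p=0.02):
    """Compute rho_i (local density over the r nearest neighbors, r = p * n),
    delta_i (distance to the nearest denser point, or the farthest point if
    none is denser), and score_i = rho_i * delta_i for each data point."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise d(M_i, M_k)
    r = max(1, int(p * n))                                     # r = p * n
    nearest = np.sort(D, axis=1)[:, 1:r + 1]                   # skip self-distance 0
    rho = np.exp(-(nearest**2).sum(axis=1) / r)
    delta = np.empty(n)
    for i in range(n):
        denser = D[i, rho > rho[i]]
        delta[i] = denser.min() if denser.size else D[i].max()
    return rho, delta, rho * delta
```

On data with two well-separated clusters, the two highest-scoring points land one in each cluster, which is exactly how the peaks are used in the next step.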
Neighborhood groups with the Apollonius circle
Peak points: $P = (P_1, P_2, \dots, P_m)$, $t \in \{1, 2, \dots, m-1\}$
Non-peak data points: $M = \{M_i \mid i \in \{1, 2, \dots, n-m\}\}$, $M_i \notin P$

Farthest data points are defined as:

$$Gd_{P_t} = \max\left\{ d(P_t, M_i) \;\middle|\; M_i \in M \text{ and } d(P_t, M_i) < d(P_t, P_{t+1}) \text{ and } d(P_t, M_i) < \min_{l=1,\, l \neq t}^{m} d(P_l, M_i) \right\}$$

$$GP_{P_t} = \{ M_i \mid d(P_t, M_i) = Gd_{P_t} \}$$
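The $Gd$/$GP$ construction can be sketched as follows. The function name and the peak-ordering convention (peaks given as a sequence so that $P_{t+1}$ is well defined) are our reading of the formula, not code from the paper:

```python
import numpy as np

def farthest_points(peaks, points):
    """For each peak P_t (except the last), find Gd_{P_t}: the largest
    distance to a non-peak point that is (a) smaller than d(P_t, P_{t+1})
    and (b) attained by a point closer to P_t than to every other peak.
    GP_{P_t} is the point(s) attaining that distance."""
    peaks = np.asarray(peaks, float)
    points = np.asarray(points, float)
    result = {}
    for t in range(len(peaks) - 1):
        d_t = np.linalg.norm(points - peaks[t], axis=1)
        d_others = np.array([np.linalg.norm(points - peaks[l], axis=1)
                             for l in range(len(peaks)) if l != t])
        ok = (d_t < np.linalg.norm(peaks[t] - peaks[t + 1])) & (d_t < d_others.min(axis=0))
        if ok.any():
            Gd = d_t[ok].max()
            GP = points[ok][d_t[ok] == Gd]
            result[t] = (Gd, GP)
    return result
```

For instance, with peaks at (0, 0) and (5, 0) and non-peak points at (1, 0), (2, 0), (4, 0), (6, 0), the point (4, 0) is excluded because it lies closer to the second peak, so the farthest admissible point for the first peak is (2, 0).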
Example dataset for finding farthest points and grouping
With peaks $\{2, 5, 8\}$ and remaining points $\{1, 3, 4, 6, 7, 9, 10\}$:

$$Gd_2 = \max\{ d(2, M_i) \mid M_i \in \{1, 3, 4, 6, 7, 9, 10\} \text{ and } d(2, M_i) < d(5, M_i) \text{ and } d(2, M_i) < d(8, M_i) \text{ and } d(2, M_i) < d(2, 8) \} \Rightarrow Gd_2 = d(2, 3) \Rightarrow GP_2 = 3$$
Example for making neighborhood groups with the Apollonius circle
[Figure: three panels showing (1) Class 1, Class 2, and the unlabeled data; (2) the same data with Peak 1 and Peak 2 marked; (3) the peaks together with their farthest points.]
Name      #Examples  #Attributes (D)  #Classes
Iris      150        4                3
Wine      178        13               3
Seeds     210        7                3
Thyroid   215        5                3
Glass     214        9                6
Banknote  1372       4                2
Liver     345        6                2
Blood     748        4                2
Experimental results: accuracy comparison of the algorithms with 10% labeled data
Dataset   Supervised SVM  Self-training SVM  STC-DPC  Our algorithm
Iris      92.50           87.00              91.00    95.76
Wine      88.30           90.81              86.96    91.40
Seeds     84.16           74.40              81.19    92.35
Thyroid   88.95           87.21              89.65    91.72
Glass     47.44           51.15              51.15    51.93
Banknote  98.39           98.77              98.12    96.62
Liver     58.04           57.31              55.29    61.90
Blood     72.42           72.58              72.01    74.98
Accuracy of our algorithm using all unlabeled data versus only the unlabeled data near the decision boundary
Dataset      All unlabeled data  Selected unlabeled data
Banknote     96.58               96.62
Liver        59.85               61.90
Blood        75.45               74.98
Heart        74.78               78.25
Hypothyroid  78.78               78.25
Diabetes     62.72               63.47
Parkinson    80.62               80.62
[Figure: results on the Banknote dataset.]
[Figure: results on the Iris, Seeds, and Wine datasets.]
➢ We proposed a semi-supervised self-training method based on the Apollonius circle.
➢ First, candidate data points are selected from the unlabeled data to be labeled during the self-training process. Then the peak points are found using density peak clustering. The Apollonius circle corresponding to each peak point is formed, and the label of the peak point is assigned to the unlabeled data points inside that circle. The base classifier is SVM, which is a margin-based algorithm.
➢ A series of experiments was performed on several datasets, and the performance of the proposed algorithm was compared with that of existing methods.
➢ The impact of selecting data close to the decision boundary was investigated. Data points close to the decision boundary affect the optimal adjustment of the decision boundary more than the farthest ones, and they also improve the classification performance.
[1] Wu, D., Shang, M., Luo, X., Xu, J., Yan, H., Deng, W., Wang, G., 2018. Self-training semi-supervised classification based on density peaks of data. Neurocomputing 275, 180-191.
[2] Pourbahrami, S., Khanli, L.M., Azimpour, S., 2019. A novel and efficient data point neighborhood construction algorithm based on Apollonius circle. Expert Systems with Applications 115, 57-67.
[3] Rodriguez, A., Laio, A., 2014. Clustering by fast search and find of density peaks. Science 344(6191), 1492-1496.
[4] Tanha, J., 2019. A multiclass boosting algorithm to labeled and unlabeled data. International Journal of Machine Learning and Cybernetics.
[5] Tanha, J., van Someren, M., Afsarmanesh, H., 2017. Semi-supervised self-training for decision tree classifiers. International Journal of Machine Learning and Cybernetics 8, 355-370.
[6] Tanha, J., van Someren, M., Afsarmanesh, H., 2014. Boosting for multiclass semi-supervised learning. Pattern Recognition Letters 37, 63-77.
[7] Zhou, Y., Kantarcioglu, M., Thuraisingham, B., 2012. Self-training with selection-by-rejection. In: ICDM '12: Proceedings of the 2012 IEEE 12th International Conference on Data Mining.
emadi.mona@pnu.ac.ir