11/02/2016

Meta Learning on Small Biomedical Datasets

ÇUKUROVA UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

Turgay Ibrikci (Presenter), Esra Mahsereci Karabulut, Jean Dieu Uwisengeyimana, from Çukurova University, Turkey

The 7th International Conference on Information Science and Application (ICISA2016), Feb 15-18, 2016, Ho Chi Minh City, Vietnam

Outline

Background
  Biomedical Informatics
  Big Data / Small Data
  Machine Learning Algorithms

Problem
  How to classify small medical data with machine learning

Material & Methods
  Datasets & Feature
  Meta Learning Algorithms
  WEKA

Results & Discussions
  The ROC area Results
  The F-measurement Results

Conclusions
  Methods
  Datasets

Background -> Biomedical informatics

Biomedical informatics is the field of science in which all kinds of medical data, computer science, and information technology merge to form a single discipline.

Related fields shown on the slide: statistics, biology, mathematics, genetics, algorithms, proteomics, medical care, computer science, medicine, data science, machine learning, informatics, clinical data, pharmacogenomics.

Background -> Small Data / Big Data

Small Data

Small data is data in a form that is accessible, informative, and actionable.
Small data typically answers a specific question or addresses a specific problem.

Big Data

Big data can be described by the following properties of information assets:
high volume,
high velocity,
high variety,
high veracity,
high variability.

Background -> Machine Learning

Machine Learning :
Machine learning is a subfield of computer science that plays a growing role in a wide range of critical applications, such as
data mining,
pattern recognition,
expert systems,
a vastly improved understanding of the human genome.
Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.

Background -> Machine Learning

Types of Machine Learning:

There are many different machine learning algorithms that give computers the ability to learn without being explicitly programmed. They are mostly supervised or unsupervised.

Supervised Learning

The system learns from examples, each with an input and a desired output, on a predefined set of data examples; the goal is to learn a general rule that maps inputs to outputs.
It provides powerful tools for prediction and classification.

Unsupervised Learning

No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Clustering, anomaly detection, and dimension reduction are key techniques for unsupervised learning.


Problem -> How to classify small medical data with machine learning

Medical data is a collection of hospital or clinical records used by an expert, who could be a medical doctor and/or a technical person, machine, or algorithm, to help make decisions.

Meta learning, a term set out by Donald B. Maudsley, refers to learning algorithms that are applied to data to understand the interaction between the mechanism of learning and the concrete contexts.

Meta learning provides a methodology that allows systems to become more effective through experience.

Meta learning differs from base learning in the scope of the level of adaptation.

http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

Material

Datasets*
Arrhythmia
Heart disease(Cleveland)
Vertebral column (2C)
CTG
Diabetes (Pima Indians)
Mammographic mass
Parkinson
Wisconsin breast cancer
WEKA


Material -> Datasets

Dataset*                        Instances  Attributes  Classes
Arrhythmia [6]                        452         279        2
Heart disease (Cleveland) [7]         303          13        5
Vertebral column (2C) [8]             310           6        2
CTG [9]                              2126          21        3
Diabetes (Pima Indians) [10]          768           8        2
Mammographic mass [11]                961           5        2
Parkinson [12]                        194          22        2
Wisconsin breast cancer [13]          699           9        2

*All datasets are taken from the UCI Machine Learning Repository.

Material -> Weka

Waikato Environment for Knowledge Analysis (WEKA)

It is a data mining and machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.

100+ algorithms for classification
75 for data preprocessing
25 to assist with feature selection
20 for clustering, finding association rules, etc.

The Explorer: data preprocessing, classification, clustering, association rules, attribute selection, data visualization

Methods

Bagging
Dagging
Decorate
Rotation Forest
Filtered Classification


Bagging: a bootstrap method that improves the accuracy of a model by training on multiple randomly resampled copies of the training set [14].

The main idea is that averaging the misclassification errors over different subsets of the data gives a better estimate of the predictive ability of a learning method. Bagging thus aims to reduce the error rate by reducing the variance of the base classifier.

Dagging: similar to Bagging, but as input to each member of the ensemble it uses disjoint stratified folds of the training data instead of bootstrap samples [15].

Decorate: Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples directly builds ensembles of diverse classifiers by using specially constructed artificial training examples.

It is a simple and general meta-learner that can use any strong learner as a base classifier to build diverse ensembles [16].

Rotation Forest: a method for generating classifier ensembles based on feature extraction.
The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble simultaneously.

Diversity is promoted through feature extraction for each base classifier. Decision trees are most often chosen because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is sought by keeping all principal components and by using the whole dataset to train each base classifier [17].

Filtered Classification: runs a classifier on data that has been passed through a filter. The filter is built from the training data and then applied to the test data.

Test instances are processed by the filter without changing their structure [18].
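The difference between Bagging and Dagging lies in how each ensemble member gets its training data. The following is a minimal Python sketch of that difference, not the WEKA implementation: the function names and the toy labels are hypothetical, and the base classifier is left out so that only the sampling and the voting step are shown.

```python
import random
from collections import Counter, defaultdict


def bootstrap_sample(indices, rng):
    """Bagging: draw len(indices) indices with replacement."""
    return [rng.choice(indices) for _ in indices]


def disjoint_stratified_folds(labels, n_folds, rng):
    """Dagging: split indices into disjoint folds that keep class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal each class's members round-robin so every fold gets its share.
        for position, idx in enumerate(members):
            folds[position % n_folds].append(idx)
    return folds


def majority_vote(predictions):
    """Combine the ensemble members' predicted labels for one instance."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
labels = ["sick"] * 6 + ["healthy"] * 12   # toy labels, not from the study
indices = list(range(len(labels)))

bag = bootstrap_sample(indices, rng)                             # may repeat instances
folds = disjoint_stratified_folds(labels, n_folds=3, rng=rng)    # never repeats instances
```

A bootstrap sample can contain the same instance several times and miss others, whereas the stratified folds partition the training set, so each instance is seen by exactly one ensemble member.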

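The Filtered Classification idea, a filter fitted on the training data and then reused unchanged on the test data, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the standardizing filter, the nearest-centroid base classifier, and the toy data are hypothetical and are not taken from the study or from WEKA.

```python
from statistics import mean, pstdev


class StandardizeFilter:
    """Hypothetical filter: rescales each feature using training statistics."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        # Fall back to 1.0 so constant columns do not divide by zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [
            [(value - m) / s for value, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


class NearestCentroid:
    """Hypothetical base classifier: predicts the class with the closest mean."""

    def fit(self, rows, labels):
        self.centroids = {}
        for label in set(labels):
            members = [r for r, lab in zip(rows, labels) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, rows):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(row, self.centroids[c]))
                for row in rows]


# Toy data, invented for this example.
train_x = [[1.0, 200.0], [2.0, 220.0], [8.0, 900.0], [9.0, 950.0]]
train_y = ["low", "low", "high", "high"]
test_x = [[1.5, 210.0], [8.5, 940.0]]

filt = StandardizeFilter().fit(train_x)            # statistics come from training data only
clf = NearestCentroid().fit(filt.transform(train_x), train_y)
predictions = clf.predict(filt.transform(test_x))  # same filter; test rows keep their shape
```

Fitting the filter on the training data alone keeps information about the test instances from leaking into the model.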


Results & Discussions

tp -> true positive
fp -> false positive
fn -> false negative

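\[ \text{Recall (sensitivity)} = \frac{tp}{tp + fn} \qquad \text{Precision (positive predictive value)} = \frac{tp}{tp + fp} \]

\[ F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]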
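As a quick check of these definitions, the following Python sketch computes recall, precision, and the F-measure from tp, fp, and fn. The function name and the counts are hypothetical; the zero-division guards are an added convention, not part of the slide.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 40 true positives, 10 false positives, 10 false negatives.
precision, recall, f_measure = precision_recall_f(tp=40, fp=10, fn=10)
```

With these counts, precision and recall are both 40/50, so the F-measure, their harmonic mean, is the same value.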

We evaluate 5 meta-learning algorithms on 8 datasets using 10-fold cross-validation.
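For readers unfamiliar with the procedure, here is a minimal Python sketch of how 10-fold cross-validation partitions a dataset. It is an illustration under assumptions rather than the setup used in the study: the function name and seed are hypothetical, and the split is a plain shuffled one, with no stratification by class.

```python
import random


def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, giving near-equal sizes.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# 310 is the number of instances in the Vertebral column (2C) dataset.
splits = list(k_fold_splits(n_instances=310, k=10))
```

Each instance is used for testing exactly once and for training in the other nine rounds; the reported score is then averaged over the ten test folds.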


Results I -> ROC Area for the datasets

[Figure: bar chart of ROC area on a 0.0 to 1.0 scale for Bagging, Dagging, Decorate, RotationForest, and FilteredClassifier on the Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic mass, Parkinson, and Wisconsin datasets, plus the average.]
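The ROC area for a two-class problem equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The Python sketch below computes it that way; the function name, scores, and labels are hypothetical, and the multi-class case (such as the 5-class Cleveland data) is not covered.

```python
def roc_area(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (higher means "more likely positive").
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
area = roc_area(scores, labels)
```

An area of 1.0 means every positive is ranked above every negative, and 0.5 corresponds to random ranking.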

Results I -> ROC Area: the algorithms on the datasets

[Figure: bar chart of ROC area on a 0.0 to 1.0 scale for the Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic mass, Parkinson, and Wisconsin datasets and their average, with Bagging, Dagging, Decorate, RotationForest, and FilteredClassifier.]

Results II -> F-Measurement for the algorithms

[Figure: bar chart of F-measure on a 0.1 to 1 scale for Bagging, Dagging, Decorate, RotationForest, and FilteredClassifier on the Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic mass, Parkinson, and Wisconsin datasets, plus the average.]

Results II -> F-Measurement for the datasets

[Figure: bar chart of F-measure on a 0.1 to 1 scale for the Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic mass, Parkinson, and Wisconsin datasets and their average, with Bagging, Dagging, Decorate, RotationForest, and FilteredClassifier.]

Conclusions

In this study, we focused on classifying small biomedical datasets using meta learning.
The best results are
an ROC area of 0.992 on the Wisconsin Breast Cancer dataset with the Decorate algorithm (number of classes: 2),
an F-measure of 0.972 on the Wisconsin Breast Cancer dataset with the Rotation Forest algorithm.
The worst results are
an ROC area of 0.68 on the Cleveland dataset with the Filtered Classifier algorithm (number of classes: 5),
an F-measure of 0.479 on the Cleveland dataset with the Filtered Classifier algorithm.
Meta-learning can serve as a useful approach for classifying medical datasets through the exploitation of knowledge.
It has a strong potential impact in medical informatics applications.


Future Works

We will focus on classifying big biomedical data with different algorithms:
Support Vector Machine
Deep learning
Different features can be added to the datasets.
New datasets

References (Selected)

[1] Maudsley, D.B.: A Theory of Meta-Learning and Principles of Facilitation: An Organismic Perspective. University of Toronto (1979) 40, 8, 4354-4355-A.

[2] Muresan, S.: Pre-processing flow for enhancing learning from medical data. Int. Computer Comm. and Processing (ICCP) (2015) 27-34.

[3] Randa El-Bialy, Mostafa A. S., Omar H. K., M. Essam K.: Feature Analysis of Coronary Artery Heart Disease Data Sets. Procedia Computer Science 65 (2015) 459-468.

[4] Arredondo, T., Ormazabal, W.: Meta-learning framework applied in bioinformatics inference system design. Int J Data Min Bioinform. (2015) 11(2):139-66.

[5] Lichman, M.: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (2013).

[6] H. Altay Guvenir, Burak Acar, Gulsen Demiroz, Ayhan Cekin: A Supervised Machine Learning Algorithm for Arrhythmia Analysis. Proceedings of the Comp. in Cardiology Conference (1997) 24:433-436.

[7] Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., Froelicher, V.: International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology (1989) 64:304-310.

[8] Rocha Neto, A. R., Barreto, G. A.: On the Application of Ensembles of Classifiers to the Diagnosis of Pathologies of the Vertebral Column: A Comparative Analysis. IEEE Latin America Transactions (2009) 7(4):487-496.

Questions?
