11/02/2016

Meta Learning on Small Biomedical Datasets

ÇUKUROVA UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

Turgay Ibrikci (Presenter), Esra Mahsereci Karabulut, Jean Dieu Uwisengeyimana, from Çukurova University, Turkey

The 7th International Conference on Information Science and Application (ICISA2016), Feb 15-18, 2016, Ho Chi Minh City, Vietnam

Outline

Background

  • Biomedical Informatics
  • Big Data / Small Data
  • Machine Learning Algorithms

Problem

  • How to classify small medical data with machine learning

Material & Methods

  • Datasets & Features
  • Meta Learning Algorithms
  • WEKA

Results & Discussions

  • The ROC Area Results
  • The F-measure Results

Conclusions

  • Methods
  • Datasets

Background -> Biomedical informatics

Biomedical informatics is the field of science in which all kinds of medical data, computer science, and information technology merge to form a single discipline.

Figure: fields that converge in biomedical informatics (statistics, biology, mathematics, genetics, algorithms, proteomics, medical care, computer science, medicine, data science, machine learning, informatics, clinical data, pharmacogenomics).


Background -> Small Data / Big Data

Figure: Small Data vs. Big Data.


Small data is data that is

  • accessible,
  • informative,
  • actionable.

Small data typically answers a specific question or addresses a specific problem.

Big data can be described as information assets of

  • high volume,
  • high velocity,
  • high variety,
  • high veracity,
  • high variability.

Background -> Machine Learning

  • Machine learning is a subfield of computer science that plays a growing role in a wide range of critical applications, such as
  • data mining,
  • pattern recognition,
  • expert systems,
  • a vastly improved understanding of the human genome.
  • Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.


Background -> Machine Learning

  • Types of Machine Learning:

There are many different machine learning algorithms that give computers the ability to learn without being explicitly programmed. They are mostly supervised or unsupervised.

  • Supervised Learning

The system learns from examples: it is given a predefined set of data examples with their inputs and desired outputs, and the goal is to learn a general rule that maps inputs to outputs. It provides powerful tools for prediction and classification.

  • Unsupervised Learning

No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Clustering, anomaly detection, and dimensionality reduction are key techniques for unsupervised learning.
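The two learning settings above can be contrasted in a minimal Python sketch (our own illustration, not code from this study): a nearest-centroid classifier uses the labels to learn a rule that maps inputs to outputs, while k-means receives no labels and only groups the inputs. All function names and the toy data in the usage note are hypothetical.

```python
import math
import random

def nearest_centroid_fit(X, y):
    """Supervised: use the labels y to compute one centroid per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Map an input to the label of its closest class centroid."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))

def k_means(X, k, iters=20, seed=0):
    """Unsupervised: no labels; repeatedly assign points to the nearest of k
    centres and move each centre to the mean of its assigned points."""
    rng = random.Random(seed)
    centres = rng.sample(X, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda i: math.dist(x, centres[i]))
            groups[nearest].append(x)
        centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres
```

For example, with four 2-D points labelled "healthy" or "sick", the classifier needs those labels to be trained, whereas `k_means(X, 2)` recovers the two groups from the coordinates alone and never sees which group is which.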


Problem -> How to classify small medical data with machine learning

  • Medical data is a collection of hospital/clinical records that helps an expert, who could be a medical doctor and/or a technical person, machine, or algorithm, make decisions.

  • Meta-learning, a term introduced by Donald B. Maudsley, refers to learning algorithms applied to data to understand the interaction between the mechanism of learning and the concrete contexts in which that mechanism is applied.

  • Meta-learning provides a methodology that allows systems to become more effective through experience.

  • Meta-learning differs from base learning in the scope of the level of adaptation: base learning accumulates experience on a specific learning task, whereas meta-learning accumulates experience on the performance of a learning system across multiple applications.

http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

Material

  • Datasets*
  • Arrhythmia
  • Heart disease (Cleveland)
  • Vertebral column (2C)
  • CTG
  • Diabetes (Pima Indians)
  • Mammographic mass
  • Parkinson
  • Wisconsin breast cancer
  • WEKA


Material -> Datasets

Dataset*                       Instances  Attributes  Classes
Arrhythmia [6]                       452         279        2
Heart disease (Cleveland) [7]        303          13        5
Vertebral column (2C) [8]            310           6        2
CTG [9]                             2126          21        3
Diabetes (Pima Indians) [10]         768           8        2
Mammographic mass [11]               961           5        2
Parkinson [12]                       194          22        2
Wisconsin breast cancer [13]         699           9        2

*All of these datasets are taken from the UCI Machine Learning Repository.

Material -> Weka

  • Waikato Environment for Knowledge Analysis (WEKA)
  • It is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
  • 100+ algorithms for classification
  • 75 for data preprocessing
  • 25 to assist with feature selection
  • 20 for clustering, finding association rules, etc.

The Explorer: data preprocessing, classification, clustering, association rules, attribute selection, and data visualization.


Methods

  • Bagging
  • Dagging
  • Decorate
  • Rotation Forest
  • Filtered Classifier


  • Bagging: a bootstrap method that improves the accuracy of a model by training it on multiple bootstrap replicates (random samples drawn with replacement) of the training set [14].

  • The main point of the bagging algorithm is that averaging the misclassification errors over different subsets of the data gives a better estimate of the predictive ability of a learning method. Thus, bagging seeks to reduce the error rate by reducing the variance of the base classifier.

  • Dagging: similar to Bagging, but instead of bootstrap samples it feeds each member of the ensemble a disjoint, stratified fold of the training data [15].
  • Decorate (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) directly builds ensembles of diverse classifiers by using specially constructed artificial training examples.

  • It is a simple, general meta-learner that can use any strong learner as the base classifier to build diverse committees [16].

  • Rotation Forest: another method for generating classifier ensembles, based on feature extraction.
  • The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble simultaneously.

Diversity is promoted through feature extraction for each base classifier. Decision trees are typically chosen as base classifiers because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is sought by keeping all principal components and by using the whole dataset to train each base classifier [17].

  • Filtered Classifier: runs a base classifier on data that has been passed through a filter. The filter is built using the training data only and is then applied to the test data.

Test instances are processed by the filter without changing the filter's structure [18].
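Bagging, as described above, can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the WEKA implementation used in the study: examples are `(x, y)` pairs with a single numeric feature, the base learner is a hypothetical one-threshold decision stump, and members are combined by majority vote.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]

def bagging_fit(data, base_learner, n_models=11, seed=1):
    """Train one base model per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [base_learner(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Combine the ensemble members by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def stump_learner(sample):
    """Hypothetical base learner for labels {0, 1}: the single threshold on x
    (taken from the sample) that makes the fewest training errors."""
    best = None
    for threshold, _ in sample:
        for low, high in ((0, 1), (1, 0)):
            errors = sum((low if x < threshold else high) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x < threshold else high
```

Because each stump sees a slightly different replicate, individual thresholds vary; the vote smooths out that variation, which is the variance-reduction effect the text refers to.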
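The Dagging idea (disjoint, stratified folds instead of bootstrap replicates) can be sketched as follows. This is a minimal illustration, not WEKA's implementation: examples are `(x, y)` pairs with one numeric feature, `nearest_mean_learner` is a hypothetical base learner, and combining members by majority vote is an assumption of this sketch.

```python
import random
import statistics
from collections import Counter, defaultdict

def stratified_disjoint_folds(data, n_folds, seed=1):
    """Split (x, y) pairs into n_folds disjoint chunks with similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for example in data:
        by_class[example[1]].append(example)
    folds = [[] for _ in range(n_folds)]
    i = 0
    for label in sorted(by_class):
        examples = by_class[label]
        rng.shuffle(examples)
        for example in examples:  # deal examples round-robin across the folds
            folds[i % n_folds].append(example)
            i += 1
    return folds

def dagging_fit(data, base_learner, n_folds=4, seed=1):
    """Train one base model per fold; no training example is used twice."""
    return [base_learner(fold) for fold in stratified_disjoint_folds(data, n_folds, seed)]

def dagging_predict(models, x):
    """Majority vote over the members (assumed combination rule)."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

def nearest_mean_learner(sample):
    """Hypothetical base learner: predict the class whose feature mean is closest."""
    labels = {y for _, y in sample}
    means = {c: statistics.mean(x for x, y in sample if y == c) for c in labels}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))
```

Each member trains on only 1/n_folds of the data, which keeps individual models cheap; this is one reason the approach suits slow base learners.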
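Decorate's "artificial training examples" step can be sketched as below. This covers only how one round's training set is built, under simplifying assumptions of ours: all attributes are numeric and sampled independently from per-attribute Gaussians fitted to the training data, `ensemble_proba` is a hypothetical callable that returns the current ensemble's class-probability dict for an input, and the rest of the loop (training the new member, discarding it if ensemble training error rises) is omitted.

```python
import random
import statistics

def generate_artificial(data, n, rng):
    """Draw n artificial inputs; each numeric attribute is sampled independently
    from a Gaussian with that attribute's training mean and std."""
    columns = list(zip(*(x for x, _ in data)))
    params = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]

def oppositional_label(class_probs, rng, eps=1e-6):
    """Sample a label with probability inversely proportional to the ensemble's
    predicted probability, so labels the ensemble finds unlikely are favoured.
    eps guards against division by zero."""
    labels = list(class_probs)
    weights = [1.0 / max(class_probs[c], eps) for c in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

def decorate_training_set(data, ensemble_proba, r_size=1.0, seed=1):
    """Original (x, y) examples plus round(r_size * len(data)) artificial,
    oppositionally relabelled ones, used to train the next ensemble member."""
    rng = random.Random(seed)
    artificial = generate_artificial(data, round(r_size * len(data)), rng)
    return data + [(x, oppositional_label(ensemble_proba(x), rng)) for x in artificial]
```

If the ensemble predicts class 0 with probability 0.9, the artificial example is labelled 1 about 90% of the time; training on such disagreeing labels is what pushes the new member away from the existing ones.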
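The Rotation Forest construction described above can be sketched with NumPy. This is a simplified illustration, not the reference implementation: all features are numeric, the original method's random class-subset selection before PCA is omitted, `base_learner` is any hypothetical callable that fits rotated data and returns a predictor, and members are combined by majority vote.

```python
import numpy as np

def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one rotation matrix R (assumes 1 <= n_subsets <= n_features).

    Features are split at random into n_subsets disjoint groups. For each group,
    PCA is fitted on a bootstrap sample of sample_frac of the rows and ALL
    principal components are kept. The loadings are written into that group's
    rows and columns, so X @ R rotates each feature group independently.
    """
    n_samples, n_features = X.shape
    groups = np.array_split(rng.permutation(n_features), n_subsets)
    size = max(2, int(sample_frac * n_samples))
    R = np.zeros((n_features, n_features))
    for cols in groups:
        rows = rng.choice(n_samples, size=size, replace=True)
        cov = np.atleast_2d(np.cov(X[np.ix_(rows, cols)], rowvar=False))
        _, vecs = np.linalg.eigh(cov)  # orthonormal principal axes of this group
        R[np.ix_(cols, cols)] = vecs
    return R

def rotation_forest_fit(X, y, base_learner, n_members=5, n_subsets=2, seed=0):
    """Each member is trained on the whole training set, rotated by its own R."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        R = rotation_matrix(X, n_subsets, rng)
        members.append((R, base_learner(X @ R, y)))
    return members

def rotation_forest_predict(members, x):
    """Majority vote; each member sees its own rotated copy of x."""
    votes = [model(x @ R) for R, model in members]
    return max(set(votes), key=votes.count)
```

Keeping every component makes each R orthogonal, so no information is discarded (the accuracy side), while different random feature groupings give each member a different view of the data (the diversity side).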
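The filtered-classification idea (fit the filter on training data only, then reuse it unchanged on test data) can be sketched as follows. `FilteredModel`, `fit_standardize_filter`, and `one_nn_learner` are hypothetical names for this illustration, not WEKA's `FilteredClassifier` API; standardization stands in for whatever filter is actually configured.

```python
import math
import statistics

def fit_standardize_filter(train_X):
    """Learn the filter's parameters (per-attribute mean and std) from training data only."""
    columns = list(zip(*train_X))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero

    def apply(X):
        # Always uses the stored training statistics; never refits on X.
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

    return apply

def one_nn_learner(X, y):
    """Hypothetical base learner: 1-nearest-neighbour on the filtered data."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: math.dist(p[0], x))[1]

class FilteredModel:
    """Hypothetical wrapper: filter built at fit time, then applied unchanged."""

    def __init__(self, make_filter, base_learner):
        self.make_filter = make_filter
        self.base_learner = base_learner

    def fit(self, X, y):
        self.filter = self.make_filter(X)  # filter structure is fixed here
        self.model = self.base_learner(self.filter(X), y)
        return self

    def predict(self, X):
        return [self.model(x) for x in self.filter(X)]
```

For example, a filter fitted on training values [1.0] and [3.0] (mean 2, std 1) maps test values 5.0 and 7.0 to 3.0 and 5.0. Refitting on the test data would instead give -1.0 and 1.0, which is exactly the leakage the filtered classifier avoids.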


Results & Discussions

  • tp -> true positive
  • fp -> false positive
  • fn -> false negative

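$$\text{Recall (sensitivity)} = \frac{tp}{tp + fn}$$

$$\text{Precision (positive predictive value)} = \frac{tp}{tp + fp}$$

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$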
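The recall, precision, and F-measure formulas translate directly into code. A minimal Python sketch; returning 0.0 when a denominator is zero is our own convention, not something the slides specify.

```python
def recall(tp, fn):
    """Recall (sensitivity): share of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

def precision(tp, fp):
    """Precision: share of predicted positives that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def f_measure(tp, fp, fn):
    """F-measure: harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0
```

With hypothetical counts tp = 40, fp = 10, fn = 5, precision is 40/50 = 0.8, recall is 40/45 ≈ 0.889, and F = 80/95 ≈ 0.842. Equivalently, F = 2tp / (2tp + fp + fn).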

  • We evaluate 5 meta-learning algorithms on 8 datasets using 10-fold cross-validation, reporting the ROC area and the F-measure.
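The 10-fold cross-validation protocol can be sketched generically in Python. This is an illustration, not WEKA's own implementation: instances are dealt into folds class by class so each fold keeps roughly the overall class proportions (stratification), each fold is held out once, and the per-fold scores are averaged. `learner` and `score` are hypothetical callables supplied by the caller.

```python
import random
from collections import defaultdict

def stratified_k_fold_indices(labels, k=10, seed=1):
    """Return fold_of[i] in 0..k-1 for every instance, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    fold_of = [0] * len(labels)
    position = 0
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:
            fold_of[i] = position % k
            position += 1
    return fold_of

def cross_validate(X, y, learner, score, k=10, seed=1):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    fold_of = stratified_k_fold_indices(y, k, seed)
    scores = []
    for fold in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        model = learner([X[i] for i in train], [y[i] for i in train])
        predictions = [model(X[i]) for i in test]
        scores.append(score([y[i] for i in test], predictions))
    return sum(scores) / k
```

Stratification matters for small datasets such as Parkinson (194 instances): an unstratified split could leave a fold with very few minority-class instances, making that fold's score unreliable.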


Results I -> ROC Area for the datasets

Figure: ROC area (0.0 to 1.0) of Bagging, Dagging, Decorate, Rotation Forest, and Filtered Classifier on Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic, Parkinson, Wisconsin, and the average.
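For a two-class problem, the ROC area equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counting as one half. A minimal pairwise Python sketch of that definition follows; `scores` are assumed to be a classifier's estimated probabilities for the positive class. For multi-class datasets such as Cleveland (5 classes), per-class one-vs-rest values would additionally have to be averaged, which this sketch does not show.

```python
def roc_area(scores, labels, positive=1):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive instance is scored higher (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this scale 0.5 means the scores rank instances no better than chance, and 1.0 means every positive is ranked above every negative.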

Results I -> ROC Area of the algorithms on the datasets

Figure: ROC area (0.0 to 1.0) on Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic, Parkinson, Wisconsin, and the average, for Bagging, Dagging, Decorate, Rotation Forest, and Filtered Classifier.

Results II -> F-measure for the algorithms

Figure: F-measure (0.1 to 1.0) of Bagging, Dagging, Decorate, Rotation Forest, and Filtered Classifier on Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic, Parkinson, Wisconsin, and the average.

Results II -> F-measure for the datasets

Figure: F-measure (0.1 to 1.0) on Arrhythmia, Cleveland, Vertebral column, CTG, Diabetes, Mammographic, Parkinson, Wisconsin, and the average, for Bagging, Dagging, Decorate, Rotation Forest, and Filtered Classifier.

Conclusions

  • In this study, we focused on classifying small biomedical datasets using meta-learning.
  • The best results were a ROC area of 0.992 on the Wisconsin Breast Cancer dataset (2 classes) with the Decorate algorithm, and an F-measure of 0.972 on the same dataset with the Rotation Forest algorithm.
  • The worst results were a ROC area of 0.68 and an F-measure of 0.479, both on the Cleveland dataset (5 classes) with the Filtered Classifier algorithm.
  • Meta-learning can serve as a useful approach for classifying medical datasets by exploiting knowledge.
  • It has strong potential impact in medical informatics applications.


Future Works

  • We will focus on classifying big biomedical data with different algorithms, such as
  • Support Vector Machines,
  • Deep learning.
  • Different features can be added to the datasets.
  • New datasets.


References-Selected

  • Maudsley, D.B.: A Theory of Meta-Learning and Principles of Facilitation: An Organismic Perspective. University of Toronto (1979) 40, 8, 4354-4355-A

  • Muresan, S.: Pre-processing flow for enhancing learning from medical data. Int. Computer Comm. and Processing (ICCP) (2015) 27-34

  • Randa El-Bialy, Mostafa A. S., Omar H. K., M. Essam K.: Feature Analysis of Coronary Artery Heart Disease Data Sets. Procedia Computer Science 65 (2015) 459-468

  • Arredondo, T., Ormazabal, W.: Meta-learning framework applied in bioinformatics inference system design. Int. J. Data Min. Bioinform. 11(2) (2015) 139-166

  • Lichman, M.: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (2013)

  • Guvenir, H.A., Acar, B., Demiroz, G., Cekin, A.: A Supervised Machine Learning Algorithm for Arrhythmia Analysis. Proceedings of the Computers in Cardiology Conference 24 (1997) 433-436

  • Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., Froelicher, V.: International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology 64 (1989) 304-310

  • Rocha Neto, A.R., Barreto, G.A.: On the Application of Ensembles of Classifiers to the Diagnosis of Pathologies of the Vertebral Column: A Comparative Analysis. IEEE Latin America Transactions 7(4) (2009) 487-496


Questions?
