

SLIDE 1

On Windowing as a subsampling method for Distributed Data Mining

David Martínez-Galicia

Director: Alejandro Guerra-Hernández
Co-directors: Nicandro Cruz-Ramírez, Xavier Limón

Universidad Veracruzana
Centro de Investigación en Inteligencia Artificial
Sebastián Camacho No. 5, Xalapa, Veracruz, Mexico (91000)

Thesis presentation, August 21, 2020

SLIDE 2

Introduction

Data Mining (DM) consists of applying analysis algorithms that produce models to predict or describe the data [1].

Figure: The Knowledge Discovery in Databases (KDD) process: Selection → Preprocessing → Transformation → Data Mining → Interpretation/Evaluation, turning target data into knowledge patterns.


SLIDE 3

Introduction

Distributed Data Mining (DDM) concerns the application of DM procedures in distributed environments, trying to optimize the use of the available resources [2].

Figure: Distributed Data Mining (DDM): transformed data at Sites 1 to N is mined in a distributed fashion, and the results are interpreted and evaluated to yield knowledge patterns.


SLIDE 4

Scope

This work studies three points necessary to adopt Windowing as a subsampling technique in distributed environments:

1. Method generalization.
2. Sub-sampling characterization.
3. Model description.

SLIDE 5

Windowing

A technique proposed by John Quinlan that induces models from large datasets by selecting a small sample of the training instances [3].

Figure: Windowing diagram. A window is subsampled from the training examples and a model is induced from it; the remaining examples are evaluated against the model, and any counterexamples found are added to the window before inducing again. When no counterexamples remain, the process stops.
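A minimal sketch of this loop, using a scikit-learn decision tree as a stand-in for the learners considered in the thesis (the function name, defaults, and round cap are illustrative assumptions, not Quinlan's or JaCa-DDM's implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def windowing(X, y, init_frac=0.20, max_rounds=15, seed=0):
    """Induce a model from a growing window of counterexamples (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    window = rng.choice(n, size=max(1, int(init_frac * n)), replace=False)
    rest = np.setdiff1d(np.arange(n), window)
    model = DecisionTreeClassifier(random_state=seed)
    for _ in range(max_rounds):
        model.fit(X[window], y[window])       # induce from the current window
        counter = rest[model.predict(X[rest]) != y[rest]]
        if counter.size == 0:                 # no counterexamples left: stop
            break
        window = np.concatenate([window, counter])
        rest = np.setdiff1d(rest, counter)
    return model, window
```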


SLIDE 6

Related Work I

  • J. Quinlan based his research on the hypothesis that it is possible to generate an accurate decision tree to explain a large dataset, even when only a small part of the examples is selected for induction [3].

SLIDE 7

Related Work II

  • J. Wirth and J. Catlett published an early critique [4] of the costs of Windowing, suggesting that its use be avoided in noisy domains because it considerably increases the CPU requirements.

SLIDE 8

Related Work III

  • J. Fürnkranz focused his research on new mechanisms to optimize convergence time, accuracy, and performance in noisy domains [5].

SLIDE 9

Related Work IV

  • X. Limón et al. introduce a new framework for DDM, proposing different Windowing-based strategies capable of performing aggressive samplings [6].

SLIDE 10

Hypothesis

Windowing exhibits consistent behavior across different Machine Learning models in DDM scenarios, i.e., models with high levels of accuracy are induced from small samples. In these scenarios, it is possible to obtain gains in performance, model complexity, and data compression compared with traditional sub-sampling methods.


SLIDE 11

Objectives

General objective: studying the behavior of Windowing through the use of different Machine Learning models.

Specific objectives:

1. Measuring the correlation between the model accuracy and the percentage of instances.
2. Suggesting metrics that measure informational features to compare the samples and the induced models.
3. Comparing Windowing with other sub-sampling techniques to observe the advantages of its use.
4. Characterizing the operation of this technique on different types of datasets.
5. Providing a wide description of Windowing behavior and the best conditions for its use.

SLIDE 12

Justification I

Johannes Fürnkranz [7] has argued that this method offers three advantages:

1. It copes well with memory limitations, considerably reducing the number of examples needed to induce a model of acceptable accuracy.
2. It offers an efficiency gain by reducing the time of convergence, especially when using a rule learning algorithm, such as Foil.
3. It offers an accuracy gain, particularly in noiseless datasets, possibly because learning from a sample may result in a less over-fitting theory.

SLIDE 13

Justification II

Articles related to JaCa-DDM [8, 6] have shown:

1. A strong correlation between the accuracy of the learned Decision Trees and the percentage of examples used to induce them.
2. Reductions as large as 90% of the available training examples.

SLIDE 14

Contributions

1. Empirical evidence that the use of Windowing can be generalized to other Machine Learning algorithms.
2. A methodology involving different Information Theory metrics to characterize the data transformation performed by a sampling.
3. An implementation of the proposed metrics, available in a digital repository: https://github.com/DMGalicia/Thesis-Windowing
4. Two papers resulting from our participation in MICAI:
   • Windowing as a Sub-Sampling Method for Distributed Data Mining. Mathematical and Computational Applications, 25(3), 39. MDPI AG.
   • Towards Windowing as a Sub-Sampling Method for Distributed Data Mining. Research in Computing Science Journal. In press.

SLIDE 15

Methodology

The methodological design of this work includes three experiments to study:

1. Windowing generalization.
2. Sample characterization (comparison with traditional samplings).
3. The evolution of the windows.

JaCa-DDM (https://github.com/xl666/jaca-ddm) is adopted to run the experiments.

SLIDE 16

Counter Strategy

JaCa-DDM defines a set of Windowing-based strategies using J48, the Weka implementation [9] of C4.5. Due to its great similarity with Windowing's original formulation, the Counter strategy is selected.

Figure: Counter strategy. Node 1 holds the window and the induced model, while Workers on Nodes 2 to j process their local partitions of the training data.
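A minimal sketch of one round of this scheme, assuming each node holds a partition of the training data, workers report their local counterexamples, and the coordinator grows the window with them (illustrative Python, not the actual JaCa-DDM agents):

```python
import numpy as np

def counter_round(model, partitions, window_X, window_y):
    """Grow the window with the counterexamples found by each worker (sketch)."""
    xs, ys = [window_X], [window_y]
    for X_part, y_part in partitions:            # one (X, y) partition per node
        wrong = model.predict(X_part) != y_part  # local counterexamples
        xs.append(X_part[wrong])
        ys.append(y_part[wrong])
    return np.concatenate(xs), np.concatenate(ys)
```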


SLIDE 17

Datasets

Experiments are tested on 15 datasets selected from the UCI [10] and MOA [11] repositories.

Dataset        #Instances  #Attributes  Attrib. Type  Missing Val.  #Classes
Adult               48842           15  Mixed         Yes                  2
Australian            690           15  Mixed         No                   2
Breast                683           10  Numeric       No                   2
Diabetes              768            9  Mixed         No                   2
Ecoli                 336            8  Numeric       No                   8
German               1000           21  Mixed         No                   2
Hypothyroid          3772           30  Mixed         Yes                  4
Kr-vs-kp             3196           37  Numeric       No                   2
Letter              20000           17  Mixed         No                  26
Mushroom             8124           23  Nominal       Yes                  2
Poker-lsn          829201           11  Mixed         No                  10
Segment              2310           20  Numeric       No                   7
Sick                 3772           30  Mixed         Yes                  2
Splice               3190           61  Nominal       No                   3
Waveform5000         5000           41  Numeric       No                   3

SLIDE 18

On Windowing generalization I

This experiment seeks to:

• Corroborate the correlation reported in the literature.
• Provide evidence about the generalization of Windowing.
• Characterize the sampling with informational properties.

Decision trees (j48) and four other Weka models are induced by running a 10-fold stratified cross-validation on each dataset.

SLIDE 19

On Windowing generalization II

Weka algorithms:

• Naive Bayes: a probabilistic classifier based on Bayes' theorem [12].
• jRip: an inductive rule learner based on RIPPER [13].
• Multilayer-Perceptron: a perceptron trained by backpropagation [14].
• SMO: an implementation for training a support vector classifier [15].

In order to measure the performance of the models, their accuracy is defined as the percentage of correctly classified instances:

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \qquad (1)$$

SLIDE 20

On Windowing generalization III

Kullback-Leibler divergence ($D_{KL}$) [16] is defined as:

$$D_{KL}(P_{DS} \,\|\, P_{Window}) = \sum_{c \in Class} P_{DS}(c) \log_2 \frac{P_{DS}(c)}{P_{Window}(c)} \qquad (2)$$

Sim1 [17] is a similarity measure between datasets, defined as:

$$sim1(Window, DS) = \frac{|Item(Window) \cap Item(DS)|}{|Item(Window) \cup Item(DS)|} \qquad (3)$$

Red [18] measures the redundancy of a dataset in terms of its conditional population entropy (CPE):

$$Red = 1 - \frac{CPE}{\sum_{a \in Attrs} \log_2 |dom(a)|} \qquad (4)$$
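A minimal sketch of Eqs. 2 and 3 (illustrative helpers; the metrics implemented for the thesis are available at https://github.com/DMGalicia/Thesis-Windowing):

```python
import numpy as np

def kl_divergence(p_ds, p_window):
    """Eq. 2: D_KL between class distributions, assuming aligned class order
    and nonzero window probabilities."""
    p_ds, p_window = np.asarray(p_ds), np.asarray(p_window)
    return float(np.sum(p_ds * np.log2(p_ds / p_window)))

def sim1(window_items, ds_items):
    """Eq. 3: Jaccard-style similarity over attribute-value items."""
    w, d = set(window_items), set(ds_items)
    return len(w & d) / len(w | d)

# e.g., kl_divergence([0.9, 0.1], [0.5, 0.5]) ~= 0.531 bits
```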

SLIDE 21

Results: Generalization I

Figure: Accuracy vs. the percentage of training instances used by Windowing, for all 15 datasets. Correlations per classifier: j48 -0.985, Naive Bayes -0.966, jRip -0.980, Multilayer-Perceptron -0.984, SMO -0.991.

SLIDE 22

Results: Generalization II

Figure: KL divergence between the class distributions of the window and the full dataset vs. the percentage of training instances, for all 15 datasets. Correlations per classifier: j48 -0.494, Naive Bayes -0.174, jRip -0.439, Multilayer-Perceptron -0.275, SMO -0.115.

SLIDE 23

Results: Generalization III

Figure: Sim1 between the window and the full dataset vs. the percentage of training instances, for all 15 datasets. Correlations per classifier: j48 0.184, Naive Bayes 0.054, jRip 0.274, Multilayer-Perceptron 0.122, SMO -0.017.

SLIDE 24

Results: Generalization IV

Figure: Red (redundancy) of the window vs. the percentage of training instances, for all 15 datasets. Correlations per classifier: j48 -0.256, Naive Bayes 0.312, jRip -0.105, Multilayer-Perceptron -0.278, SMO -0.206.

SLIDE 25

Comparing Windowing with subsampling techniques I

This experiment seeks to:

• Obtain a deeper understanding of the informational properties of the computed models, as well as those of the samples.
• Compare Windowing with traditional sampling techniques.

For this, decision trees (j48) are adopted as classifiers.

SLIDE 26

Comparing Windowing with subsampling techniques II

The Area Under the ROC Curve (AUC), defined as the probability that a random instance is correctly classified:

$$AUC = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right) \qquad (5)$$

The Minimum Description Length (MDL), defined as the sum of the length of the model, L(H), and the length of the data when encoded using the theory as a predictor for the data, L(D|H) [19]:

$$MDL = L(H) + L(D|H) \qquad (6)$$
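A worked sketch of Eqs. 1 and 5 from confusion-matrix counts (hypothetical helpers, for illustration only):

```python
def accuracy(tp, fp, tn, fn):
    """Eq. 1: fraction of correctly classified instances."""
    return (tp + tn) / (tp + fp + tn + fn)

def auc(tp, fp, tn, fn):
    """Eq. 5: mean of the true-positive and true-negative rates."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# e.g., tp=40, fn=10, tn=45, fp=5 gives accuracy 0.85 and AUC 0.85
```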

SLIDE 27

Comparing Windowing with subsampling techniques III

The metrics are used to compare the window and the model computed by Windowing against those obtained as follows:

• Without sampling, using all the available data to induce the model.
• By random sampling, using samples of the size of the windows.
• By stratified random sampling, using samples of the size of the windows.
• By balanced random sampling, using samples of the size of the windows.

10 repetitions of 10-fold stratified cross-validation are run on each dataset.
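Illustrative sketches of the three baseline samplers, assuming index-based selection over a label vector y (stand-ins for the actual facilities used in the experiments):

```python
import numpy as np

def random_sample(y, size, rng):
    """Uniform sample without replacement."""
    return rng.choice(len(y), size=size, replace=False)

def stratified_sample(y, size, rng):
    """Sample that preserves the original class proportions."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        k = max(1, round(size * len(members) / len(y)))
        idx.append(rng.choice(members, size=min(k, len(members)), replace=False))
    return np.concatenate(idx)

def balanced_sample(y, size, rng):
    """Sample with (as far as the data allows) equal instances per class."""
    classes = np.unique(y)
    per_class = size // len(classes)
    idx = [rng.choice(np.flatnonzero(y == c),
                      size=min(per_class, np.sum(y == c)), replace=False)
           for c in classes]
    return np.concatenate(idx)
```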

SLIDE 28

Statistical Test

The comparison of A algorithms on D datasets follows the method proposed by Demšar [20]. It is based on the Friedman test [21, 22] with a corresponding post-hoc test (the Nemenyi test). The null hypothesis states that if the performance of the algorithms is similar, their average ranks should be equal:

$$R_a = \frac{1}{D} \sum_{d \in D} R_a^d \qquad (7)$$

SLIDE 29

Statistics

Friedman statistic:

$$\chi^2_F = \frac{12D}{A(A+1)} \left[ \sum_a R_a^2 - \frac{A(A+1)^2}{4} \right]$$

distributed according to the χ² distribution with A - 1 degrees of freedom.

Iman and Davenport statistic:

$$F_f = \frac{(D-1)\,\chi^2_F}{D(A-1) - \chi^2_F}$$

distributed according to the F-distribution with A - 1 and (A - 1)(D - 1) degrees of freedom.

If the null hypothesis of similar performances is rejected, the Nemenyi post-hoc test is applied for pairwise comparisons.

SLIDE 30

Post-hoc Test

The performance of two classifiers is significantly different if their corresponding average ranks differ by at least the critical difference:

$$CD = q_\alpha \sqrt{\frac{A(A+1)}{6D}} \qquad (8)$$

Critical values $q_\alpha$ are based on the studentized range statistic divided by $\sqrt{2}$. Results can be visually represented with a Critical Difference diagram, which places the algorithms on a rank axis and joins those whose rank difference is below CD.
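A sketch of the whole test pipeline, assuming a scores matrix with one row per dataset and one column per algorithm; the $q_\alpha$ values are the two-tailed Nemenyi critical values at $\alpha = 0.05$ tabulated by Demšar [20]:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Two-tailed Nemenyi critical values at alpha = 0.05 (from Demšar [20]).
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}

def nemenyi_cd(num_algorithms, num_datasets):
    """Critical difference of Eq. 8 at alpha = 0.05."""
    q = Q_ALPHA_05[num_algorithms]
    return q * np.sqrt(num_algorithms * (num_algorithms + 1) / (6 * num_datasets))

# scores[d, a]: performance of algorithm a on dataset d (placeholder values).
scores = np.random.rand(15, 5)
stat, p = friedmanchisquare(*scores.T)   # Friedman test over the 5 algorithms
if p < 0.05:                             # ranks differ: pairwise Nemenyi check
    print("CD =", nemenyi_cd(num_algorithms=5, num_datasets=15))
```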

SLIDE 31

Results: Accuracy

Figure: Accuracy on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Stratified_Sampling, Balanced_Sampling, and Random_Sampling.

SLIDE 32

Results: Area Under the ROC curve

Figure: Area Under the ROC Curve on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Stratified_Sampling, Balanced_Sampling, and Random_Sampling.

SLIDE 33

Results: Model complexity

Figure: Model complexity on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Stratified_Sampling, Balanced_Sampling, and Random_Sampling.

SLIDE 34

Results: Data compression

Figure: Data compression on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Stratified_Sampling, Balanced_Sampling, and Random_Sampling.

SLIDE 35

Results: Minimum Description Length

Figure: Minimum Description Length on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Stratified_Sampling, Balanced_Sampling, and Random_Sampling.

SLIDE 36

Results: KL Divergence

Figure: KL divergence on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Stratified_Sampling, Balanced_Sampling, and Random_Sampling.

SLIDE 37

Results: Sim1

Figure: Sim1 on Diabetes, German, Mushroom, and Waveform-5000 for Windowing (W), the full dataset (CV), random sampling (RS), stratified sampling (SS), and balanced sampling (BS), with the corresponding Critical Difference diagram comparing Full_Dataset, Windowing, Random_Sampling, Balanced_Sampling, and Stratified_Sampling.

SLIDE 38

Window evolution over time

This experiment aims to yield a full description of the evolution of the windows and their effects on the model. For this:

• Counter was modified to save the window at every iteration.
• A 10-fold stratified cross-validation is run on every dataset.
• The metrics from experiments A and B are calculated at every iteration.
• Decision trees (j48) are adopted as classifiers.

SLIDE 39

Results: Evolution of performance

Figure: Evolution of accuracy (top) and AUC (bottom) over the Windowing iterations for Hypothyroid (unbalanced), Sick (unbalanced), and Letter (balanced).

SLIDE 40

Results: Evolution of MDL

Figure: Evolution of L(H), L(D|H), and MDL (in bits) over the Windowing iterations for Letter, Hypothyroid, and Sick.

SLIDE 41

Results: Evolution of the class distribution

Figure: Evolution of the standard deviation of the class distribution (top) and of the KL divergence (bottom) over the Windowing iterations for Letter (balanced), Hypothyroid (unbalanced), and Sick (unbalanced).

SLIDE 42

Results: Iterations vs. Accuracy

Figure: Number of Windowing iterations vs. accuracy (correlation coefficient: -0.563) and vs. AUC (correlation coefficient: -0.508) across the datasets.

SLIDE 43

Conclusions

Counter, the Windowing-based learning strategy, not only supplies a natural workflow for distributed scenarios, but also offers some benefits:

• A homogeneous behavior beyond decision trees.
• The induction of accurate models while performing an aggressive sampling.
• The determination of an appropriate sample size, a problem most of the time tackled by trial and error.
• Decision trees with better data compression: models tend to be larger, but more accurate than those induced from traditional samplings.
• Samples with more balanced class distributions, a behavior restricted by the number of instances and their relevance.

SLIDE 44

Future Work

This work suggests future lines of research on Windowing, including:

1. Optimizing the model search process.
2. Adopting metrics for detecting relevant data. PhD proposal: detection of noisy, redundant, and relevant data to improve Windowing performance. Maillo et al. [23] review multiple metrics describing the redundancy, complexity, and density of a problem, and also propose two big data metrics.
3. Dealing with datasets with a higher number of dimensions.

SLIDE 45

References I

[1] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "The KDD process for extracting useful knowledge from volumes of data," Commun. ACM, vol. 39, pp. 27-34, Nov. 1996.

[2] L. Zeng, L. Li, L. Duan, K. Lu, Z. Shi, M. Wang, W. Wu, and P. Luo, "Distributed data mining: A survey," Information Technology and Management, vol. 13, Dec. 2012.

[3] J. R. Quinlan, "Induction over large data bases," Tech. Rep. STAN-CS-79-739, Computer Science Department, School of Humanities and Sciences, Stanford University, Stanford, CA, USA, May 1979.

[4] J. Wirth and J. Catlett, "Experiments on the costs and benefits of windowing in ID3," in Proceedings of the Fifth International Conference on Machine Learning, Ann Arbor, Michigan, USA, June 12-14, 1988 (J. E. Laird, ed.), pp. 87-99, Morgan Kaufmann, 1988.

SLIDE 46

References II

[5] J. Fürnkranz, "More efficient windowing," in Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, AAAI 97, IAAI 97, July 27-31, 1997, Providence, Rhode Island, USA (B. Kuipers and B. L. Webber, eds.), pp. 509-514, AAAI Press / The MIT Press, 1997.

[6] X. Limón, A. Guerra-Hernández, N. Cruz-Ramírez, and F. Grimaldo, "Modeling and implementing distributed data mining strategies in JaCa-DDM," Knowledge and Information Systems, vol. 60, no. 1, pp. 99-143, 2019.

[7] J. Fürnkranz, "Integrative windowing," Journal of Artificial Intelligence Research, vol. 8, pp. 129-164, 1998.

[8] X. Limón, A. Guerra-Hernández, N. Cruz-Ramírez, H. G. Acosta-Mesa, and F. Grimaldo, "A windowing strategy for distributed data mining optimized through GPUs," Pattern Recognition Letters, vol. 93, pp. 23-30, July 2017.

[9] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques. Burlington, MA, USA: Morgan Kaufmann Publishers, 2011.

SLIDE 47

References III

[10] D. Dua and C. Graff, "UCI machine learning repository," 2017.

[11] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: Massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601-1604, 2010.

[12] G. H. John and P. Langley, "Estimating continuous distributions in Bayesian classifiers," in Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, pp. 338-345, Morgan Kaufmann, 1995.

[13] W. W. Cohen, "Fast effective rule induction," in Twelfth International Conference on Machine Learning, pp. 115-123, Morgan Kaufmann, 1995.

[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, pp. 318-362. Cambridge, MA, USA: MIT Press, 1986.

[15] J. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods - Support Vector Learning (B. Schoelkopf, C. Burges, and A. Smola, eds.), MIT Press, 1998.

SLIDE 48

References IV

[16] S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, 1951.

[17] S. Zhang, C. Zhang, and X. Wu, Knowledge Discovery in Multiple Databases. Advanced Information and Knowledge Processing, London, UK: Springer-Verlag London, Limited, 2004.

[18] M. Møller, "Supervised learning on large redundant training sets," International Journal of Neural Systems, vol. 4, no. 1, pp. 15-25, 1993.

[19] J. Rissanen, "Stochastic complexity and modeling," The Annals of Statistics, vol. 14, pp. 1080-1100, 1986.

[20] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," J. Mach. Learn. Res., vol. 7, pp. 1-30, 2006.

[21] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675-701, 1937.

SLIDE 49

References V

[22] M. Friedman, "A comparison of alternative tests of significance for the problem of m rankings," Ann. Math. Statist., vol. 11, pp. 86-92, Mar. 1940.

[23] J. Maillo, I. Triguero, and F. Herrera, "Redundancy and complexity metrics for big data classification: Towards smart data," IEEE Access, vol. 8, pp. 87918-87928, 2020.

SLIDE 50

Conditional Population Entropy

$$CPE = -\sum_{i=1}^{n_c} p(c_i) \sum_{a=1}^{n_a} \sum_{v=1}^{n_{v_a}} p(x_{a,v} \mid c_i) \log_2 p(x_{a,v} \mid c_i)$$

Where:

• $n_c$ is the number of classes,
• $n_a$ is the number of attributes,
• $n_{v_a}$ is the number of values of attribute a,
• $c_i$ stands for the i-th class,
• $x_{a,v}$ represents the v-th value of attribute a.
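A sketch of CPE and Red (Eq. 4) for nominal data, assuming a pandas DataFrame whose columns are the attributes plus a class column (illustrative, not the thesis implementation):

```python
import numpy as np
import pandas as pd

def cpe(df: pd.DataFrame, class_col: str) -> float:
    """Conditional population entropy over all attribute values and classes."""
    total = 0.0
    attrs = [c for c in df.columns if c != class_col]
    for _, group in df.groupby(class_col):
        p_c = len(group) / len(df)                # p(c_i)
        for a in attrs:
            p_v = group[a].value_counts(normalize=True).to_numpy()
            total -= p_c * np.sum(p_v * np.log2(p_v))
    return total

def red(df: pd.DataFrame, class_col: str) -> float:
    """Eq. 4: redundancy as 1 - CPE / sum_a log2 |dom(a)|."""
    attrs = [c for c in df.columns if c != class_col]
    denom = sum(np.log2(df[a].nunique()) for a in attrs)
    return 1 - cpe(df, class_col) / denom
```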

SLIDE 51

Counter configuration

Parameter                              Value
Maximum number of rounds               10 - 15
Initial percentage for the window      0.20
Validation percentage for the test     0.25
Change step of accuracy every round    0.35

SLIDE 52

Auto-adjust stop procedure

The changeStep parameter defines a threshold: if the accuracy of the current model, compared with that of the previous model, changes by more than this parameter, another round is computed; otherwise, the process stops.
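Under that reading, a minimal sketch of the rule (an assumed interpretation; the exact comparison in the JaCa-DDM implementation may differ):

```python
def should_continue(acc_now, acc_prev, change_step=0.35):
    """Run another Windowing round while the accuracy change still
    exceeds the changeStep threshold (assumed semantics)."""
    return abs(acc_now - acc_prev) > change_step
```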
