An Exercise in Machine Learning - PowerPoint PPT Presentation

An Exercise in Machine Learning. http://www.cs.iastate.edu/~cs573x/bbsilab.html Machine Learning Software, Preparing Data, Building Classifiers, Interpreting Results.


  1. An Exercise in Machine Learning
     http://www.cs.iastate.edu/~cs573x/bbsilab.html
     • Machine Learning Software
     • Preparing Data
     • Building Classifiers
     • Interpreting Results

  2. Machine Learning Software
     • Suites (General Purpose)
       • WEKA (Source: Java)
       • MLC++ (Source: C++)
       • SAS
       • List from KDNuggets (Various)
     • Specific
       • Classification: C5.0, SVMlight
       • Association Rule Mining
       • Bayesian Net …
     • Commercial vs. Free vs. Programming

  3. What does WEKA do?
     • Implementation of state-of-the-art learning algorithms
     • Main strengths in classification
     • Regression, association rules, and clustering algorithms
     • Extensible to try new learning schemes
     • Large variety of handy tools (transforming datasets, filters, visualization, etc.)

  4. WEKA resources
     • API Documentation, Tutorial, Source code
     • WEKA mailing list
     • Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
     • Weka-related Projects:
       • Weka-Parallel - parallel processing for Weka
       • RWeka - linking R and Weka
       • YALE - Yet Another Learning Environment
       • Many others …

  5. Getting Started
     • Installation (Java runtime + WEKA)
     • Setting up the environment (CLASSPATH)
     • Reference book and online API documentation
     • Preparing data sets
     • Running WEKA to build classifiers
     • Interpreting results

  6. ARFF Data Format
     • Attribute-Relation File Format
     • Header: describing the attribute types
     • Data: (instances, examples) comma-separated list
     • Use the right data format:
       • Filestem, CSV → ARFF format
       • Use C45Loader and CSVLoader to convert
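
To make the header/data layout on this slide concrete, here is a minimal Python sketch that writes a small CSV table as ARFF text. It is not WEKA's CSVLoader: the helper name `csv_to_arff`, the relation name, the made-up sample rows, and the type-guessing rule (numeric if every value parses as a float, otherwise nominal) are all assumptions for illustration.

```python
import csv
import io


def csv_to_arff(csv_text, relation="data"):
    """Convert CSV text (first row = column names) to an ARFF string.

    Hypothetical helper for illustration only; it is not WEKA's CSVLoader.
    A column is declared numeric if every value parses as a float,
    otherwise nominal with its distinct values listed in order of appearance.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    names, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Header section: relation name, then one @attribute line per column.
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(names):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append("@attribute %s numeric" % name)
        else:
            distinct = list(dict.fromkeys(values))  # de-duplicate, keep order
            lines.append("@attribute %s {%s}" % (name, ",".join(distinct)))

    # Data section: one comma-separated instance per line.
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Made-up rows in the style of a weather dataset (illustrative only).
sample = """outlook,humidity,play
sunny,85,no
overcast,86,yes
rainy,80,yes
"""
arff = csv_to_arff(sample, relation="weather")
print(arff)
```

The sketch ignores quoting, missing values, and date or string attributes, all of which a real converter would have to handle.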

  7. Launching WEKA

  8. Load Dataset into WEKA

  9. Data Filters
     • Useful support for data preprocessing
     • Removing or adding attributes, resampling the dataset, removing examples, etc.
     • Creates stratified cross-validation folds of the given dataset, and class distributions are approximately retained within each fold.
     • Typically split data as 2/3 in training and 1/3 in testing
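
The idea of stratified folds can be sketched in a few lines of Python. This is an illustration of the concept, not WEKA's filter: the function name `stratified_folds` and the round-robin dealing strategy are assumptions, and the 9 "yes" / 5 "no" label list simply mirrors the class counts of the 14-instance example used later in the deck.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each instance index to one of k folds, class by class.

    Hypothetical helper: indices of each class are shuffled and dealt out
    round-robin, so every fold keeps roughly the overall class distribution.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Continue dealing where the previous class stopped,
        # so the fold sizes stay balanced as well.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


labels = ["yes"] * 9 + ["no"] * 5
folds = stratified_folds(labels, k=3)
for fold in folds:
    counts = {c: sum(labels[i] == c for i in fold) for c in ("yes", "no")}
    print(len(fold), counts)
```

With k = 3, training on two folds and testing on the third gives roughly the 2/3 to 1/3 split mentioned above.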

  10. Building Classifiers
     • A classifier model: a mapping from dataset attributes to the class (target) attribute. Creation and form differ.
     • Decision Tree and Naïve Bayes Classifiers
     • Which one is better?
     • No Free Lunch!

  11. Building Classifier

  12. (1) weka.classifiers.rules.ZeroR
     • Building and using a 0-R classifier. Predicts the mean (for a numeric class) or the mode (for a nominal class).
     (2) weka.classifiers.bayes.NaiveBayes
     • Class for building a Naive Bayesian classifier
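
The 0-R rule described above is simple enough to sketch directly. The following Python function is a stand-in for the idea, not the `weka.classifiers.rules.ZeroR` implementation; the name `zero_r` and the way it decides whether a target is numeric are assumptions.

```python
from collections import Counter
from statistics import mean


def zero_r(targets):
    """Return the constant prediction of a 0-R style baseline.

    Numeric targets -> their mean; anything else -> the most frequent value.
    Hypothetical helper; attributes are ignored entirely, as in 0-R.
    """
    numeric = all(
        isinstance(t, (int, float)) and not isinstance(t, bool) for t in targets
    )
    if numeric:
        return mean(targets)
    return Counter(targets).most_common(1)[0][0]


nominal_prediction = zero_r(["yes"] * 9 + ["no"] * 5)  # nominal class -> mode
numeric_prediction = zero_r([1.0, 2.0, 6.0])           # numeric class -> mean
print(nominal_prediction, numeric_prediction)
```

Because it never looks at the attributes, such a baseline is mainly useful as a reference point: on a 9 "yes" / 5 "no" class split it is right 9/14 (about 64.3%) of the time on the same data.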

  13. (3) weka.classifiers.trees.J48
     • Class for generating an unpruned or a pruned C4.5 decision tree.

  14. Test Options
     • Percentage Split (2/3 Training; 1/3 Testing)
     • Cross-validation
       • Estimating the generalization error based on resampling when limited data; averaged error estimate.
       • Stratified 10-fold
       • Leave-one-out (Loo)
       • 10-fold vs. Loo
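
The "averaged error estimate" behind cross-validation can be sketched as follows. This is a simplified illustration under stated assumptions: the folds are formed by index modulo k (not stratified), the learner is a toy majority-class predictor, and the names `majority_learner` and `cross_validated_error` are hypothetical. Leave-one-out is the special case where k equals the number of instances.

```python
from collections import Counter


def majority_learner(train_labels):
    """Toy learner: always predict the most frequent training label."""
    mode = Counter(train_labels).most_common(1)[0][0]
    return lambda: mode


def cross_validated_error(labels, k):
    """Average the per-fold error rate over k folds.

    Requires 1 < k <= len(labels); k == len(labels) is leave-one-out.
    """
    n = len(labels)
    fold_errors = []
    for fold in range(k):
        # Hold out every instance whose index falls in this fold.
        test_idx = [i for i in range(n) if i % k == fold]
        train = [labels[i] for i in range(n) if i % k != fold]
        predict = majority_learner(train)
        wrong = sum(predict() != labels[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


labels = ["yes"] * 9 + ["no"] * 5
seven_fold_error = cross_validated_error(labels, k=7)
loo_error = cross_validated_error(labels, k=len(labels))
print(seven_fold_error, loo_error)
```

A real run would replace the toy learner with an actual classifier trained on the attribute values; the fold loop and the averaging stay the same.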

  15. Understanding Output

  16. Decision Tree Output (1)

      J48 pruned tree
      ------------------
      outlook = sunny
      |   humidity <= 75: yes (2.0)
      |   humidity > 75: no (3.0)
      outlook = overcast: yes (4.0)
      outlook = rainy
      |   windy = TRUE: no (2.0)
      |   windy = FALSE: yes (3.0)

      Number of Leaves : 5
      Size of the tree : 8

      === Error on training data ===
      Correctly Classified Instances      14     100 %
      Incorrectly Classified Instances     0       0 %
      Kappa statistic                      1
      Mean absolute error                  0
      Root mean squared error              0
      Relative absolute error              0 %
      Root relative squared error          0 %
      Total Number of Instances           14

      === Detailed Accuracy By Class ===
      TP    FP    Precision    Recall    F-Measure    Class
      1     0     1            1         1            yes
      1     0     1            1         1            no

      === Confusion Matrix ===
      a b   <-- classified as
      9 0 | a = yes
      0 5 | b = no
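
One way to read the pruned tree printed on this slide is as a chain of nested tests. The Python function below is a hand-written transcription of that tree for illustration; the function name and argument names are assumptions, and it is not something WEKA generates.

```python
def j48_weather_tree(outlook, humidity, windy):
    """Hand transcription of the pruned J48 tree shown on the slide.

    outlook: "sunny", "overcast", or "rainy"; humidity: number; windy: bool.
    """
    if outlook == "sunny":
        return "yes" if humidity <= 75 else "no"
    if outlook == "overcast":
        return "yes"
    if outlook == "rainy":
        return "no" if windy else "yes"
    raise ValueError("unknown outlook value: %r" % outlook)


print(j48_weather_tree("sunny", 70, False))
print(j48_weather_tree("rainy", 80, True))
```

The five possible return paths correspond to "Number of Leaves : 5"; adding the three tested attributes (outlook, humidity, windy) as internal nodes gives the reported tree size of 8.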

  17. Decision Tree Output (2)

      === Stratified cross-validation ===
      Correctly Classified Instances       9      64.2857 %
      Incorrectly Classified Instances     5      35.7143 %
      Kappa statistic                      0.186
      Mean absolute error                  0.2857
      Root mean squared error              0.4818
      Relative absolute error             60 %
      Root relative squared error         97.6586 %
      Total Number of Instances           14

      === Detailed Accuracy By Class ===
      TP Rate    FP Rate    Precision    Recall    F-Measure    Class
      0.778      0.6        0.7          0.778     0.737        yes
      0.4        0.222      0.5          0.4       0.444        no

      === Confusion Matrix ===
      a b   <-- classified as
      7 2 | a = yes
      3 2 | b = no
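
Several of the figures in this cross-validation report follow directly from the confusion matrix. The Python sketch below recomputes accuracy, the kappa statistic, and the per-class rates from the 2x2 matrix on the slide, using the standard textbook definitions; the function name `summarize` is an assumption and this is not WEKA's evaluation code.

```python
def summarize(confusion):
    """Compute accuracy, kappa, and per-class rates from a square confusion matrix.

    confusion[i][j] = number of instances of actual class i predicted as class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(size))
    accuracy = correct / total

    # Chance agreement: sum over classes of (actual share) * (predicted share).
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)

    per_class = []
    for i in range(size):
        tp = confusion[i][i]
        actual = sum(confusion[i])                    # row total
        predicted = sum(row[i] for row in confusion)  # column total
        tp_rate = tp / actual                         # same as recall
        fp_rate = (predicted - tp) / (total - actual)
        precision = tp / predicted
        f_measure = 2 * precision * tp_rate / (precision + tp_rate)
        per_class.append((tp_rate, fp_rate, precision, f_measure))
    return accuracy, kappa, per_class


# Rows: actual yes, actual no. Columns: predicted yes, predicted no.
accuracy, kappa, per_class = summarize([[7, 2], [3, 2]])
print(round(accuracy * 100, 4), round(kappa, 3))
for row in per_class:
    print([round(value, 3) for value in row])
```

Rounded to the precision shown on the slide, these values agree with the reported 64.2857 % accuracy, kappa of 0.186, and the "Detailed Accuracy By Class" rows. The error measures (mean absolute error and related figures) depend on predicted class probabilities and cannot be recovered from the matrix alone.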
