CS 472 - Machine Learning
Projects Data Representation Basic testing and evaluation schemes
CS 472 – Data and Testing
Programming Your Project Models
• Program in Python, the most popular language for ML
• NumPy: great with arrays, …
• Can also explode the variable into n-1 input nodes, where the most …
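A minimal Python sketch of exploding a nominal variable into input nodes, using the Crust attribute from the pizza example. `one_hot` and `one_hot_n_minus_1` are hypothetical helper names. Because the slide's rule for choosing which value gets no node of its own is cut off, this sketch simply lets the caller pick that reference value, which is then encoded as all zeros.

```python
def one_hot(value, domain):
    """Explode a nominal value into len(domain) input nodes, one per value."""
    if value not in domain:
        raise ValueError(f"{value!r} not in domain {domain}")
    return [1 if value == v else 0 for v in domain]


def one_hot_n_minus_1(value, domain, reference):
    """Explode a nominal value into len(domain) - 1 input nodes.

    The caller-chosen `reference` value has no node of its own and is
    represented by all nodes being 0 (an assumption of this sketch).
    """
    if reference not in domain or value not in domain:
        raise ValueError("value and reference must both be in the domain")
    others = [v for v in domain if v != reference]
    return [1 if value == v else 0 for v in others]


crust_domain = ["Pan", "Thin", "Stuffed"]
print(one_hot("Thin", crust_domain))                   # [0, 1, 0]
print(one_hot_n_minus_1("Thin", crust_domain, "Pan"))  # [1, 0]
print(one_hot_n_minus_1("Pan", crust_domain, "Pan"))   # [0, 0]
```

The n-1 form removes the redundancy of the full one-hot vector: once n-1 nodes are known, the last one is implied.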
An ARFF file specifies:
• Name of relation (data set)
• List of attributes and their domains
• Actual instances, or rows, of the relation
% 1. Title: Pizza Database
% 2. Sources:
%    (a) Creator: BYU CS 472 Class…
%    (b) Statistics about the features, etc.
@RELATION Pizza
@ATTRIBUTE Weight CONTINUOUS
@ATTRIBUTE Crust {Pan, Thin, Stuffed}
@ATTRIBUTE Cheesiness CONTINUOUS
@ATTRIBUTE Meat {True, False}
@ATTRIBUTE Quality {Good, Great}
@DATA
.9, Stuffed, 99, True, Great
.1, Thin, 2, False, Good
?, Thin, 60, True, Good
.6, Pan, 60, True, Great
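To make the file layout concrete, here is a hypothetical `parse_arff` function that reads the pizza file into a relation name, an attribute list, and rows. It is a minimal sketch, not a full ARFF reader: it assumes unquoted attribute names and comma-free nominal values, treats any type that is not a `{...}` list (such as `CONTINUOUS`) as numeric, and maps `?` to `None` as a missing value.

```python
PIZZA_ARFF = """\
% 1. Title: Pizza Database
@RELATION Pizza
@ATTRIBUTE Weight CONTINUOUS
@ATTRIBUTE Crust {Pan, Thin, Stuffed}
@ATTRIBUTE Cheesiness CONTINUOUS
@ATTRIBUTE Meat {True, False}
@ATTRIBUTE Quality {Good, Great}
@DATA
.9, Stuffed, 99, True, Great
.1, Thin, 2, False, Good
?, Thin, 60, True, Good
.6, Pan, 60, True, Great
"""


def parse_arff(text):
    """Parse a small ARFF-style file into (relation, attributes, rows).

    attributes is a list of (name, domain); domain is a list of nominal
    values, or the upper-cased type keyword (e.g. "CONTINUOUS") for
    attributes treated as numeric. '?' values become None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        keyword = line.split(None, 1)[0].upper()
        if keyword == "@RELATION":
            relation = line.split(None, 1)[1]
        elif keyword == "@ATTRIBUTE":
            _, name, spec = line.split(None, 2)
            spec = spec.strip()
            if spec.startswith("{"):
                domain = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                domain = spec.upper()  # any non-nominal type is read as numeric
            attributes.append((name, domain))
        elif keyword == "@DATA":
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            row = []
            for (name, domain), value in zip(attributes, values):
                if value == "?":
                    row.append(None)  # missing value
                elif isinstance(domain, list):
                    if value not in domain:
                        raise ValueError(f"{value!r} not in domain of {name}")
                    row.append(value)
                else:
                    row.append(float(value))
            rows.append(row)
    return relation, attributes, rows


relation, attributes, rows = parse_arff(PIZZA_ARFF)
print(relation)  # Pizza
print(rows[2])   # [None, 'Thin', 60.0, 'True', 'Good']
```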
• Static split test set CV
• Random split test set CV
• N-fold cross-validation
• One is used for learning/training (i.e., inducing a model), and
• One is used exclusively for testing
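A minimal sketch of evaluating with a fixed train/test split: the model is induced from the training set only, and accuracy is measured only on the test set. The majority-class learner `train_majority`, the `accuracy` helper, and the toy instances are all hypothetical stand-ins for a real learning model and data set.

```python
from collections import Counter


def train_majority(train):
    """Hypothetical stand-in learner: always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def accuracy(model, test):
    """Fraction of test instances whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


# Made-up (features, label) instances, for illustration only.
train = [([0.9], "Great"), ([0.1], "Good"), ([0.6], "Great")]
test = [([0.4], "Good"), ([0.8], "Great")]

model = train_majority(train)  # induced from the training set only
print(accuracy(model, test))   # 0.5
```

The test set never influences `train_majority`, so the reported accuracy estimates performance on unseen instances.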
• Instances are randomly assigned to either set
• The distribution of instances (with respect to the target class) is hopefully …
• Typically 60% to 90% of the instances are used for training and the remainder for testing
• Could get a lucky or unlucky test set
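A sketch of a random split, assuming instances are `(features, label)` pairs. `random_split` and `class_distribution` are hypothetical helpers; the 0.75 default is an arbitrary choice within the 60% to 90% range, and the seed only makes the split reproducible. Comparing the class distributions of the two sets shows whether a particular split happened to be representative.

```python
import random
from collections import Counter


def random_split(instances, train_fraction=0.75, seed=None):
    """Randomly assign each instance to either the training or the test set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def class_distribution(instances):
    """Fraction of instances carrying each target-class label."""
    counts = Counter(label for _, label in instances)
    return {label: c / len(instances) for label, c in counts.items()}


data = [([i], "B" if i % 3 == 0 else "A") for i in range(100)]
train, test = random_split(data, train_fraction=0.8, seed=0)
print(len(train), len(test))  # 80 20
# A lucky or unlucky split shows up as differing class distributions.
print(class_distribution(train))
print(class_distribution(test))
```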
• Let $M_k$ be the model induced from $D - S_k$
• Let $a_k$ be the accuracy of $M_k$ on the instances of the test fold $S_k$
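A sketch of N-fold cross-validation in this notation, assuming the data set $D$ is shuffled and partitioned into $N$ disjoint folds $S_1, \dots, S_N$. For each $k$, $M_k$ is induced from $D - S_k$ and $a_k$ is its accuracy on $S_k$. Reporting the mean of the $a_k$ as the overall estimate is an assumption here, since the slide text is cut off, though it is the usual convention. `n_fold_cv`, `train_majority`, and `accuracy` are hypothetical names, and the majority-class learner stands in for a real model.

```python
import random
from collections import Counter


def train_majority(train):
    """Hypothetical stand-in learner: predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def accuracy(model, test):
    """Fraction of test instances whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


def n_fold_cv(data, n_folds, learner, seed=None):
    """Return the per-fold accuracies a_1..a_N and their mean."""
    if not 2 <= n_folds <= len(data):
        raise ValueError("need 2 <= n_folds <= number of instances")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    # Partition D into N disjoint folds S_1..S_N (sizes differ by at most one).
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    fold_accuracies = []
    for k in range(n_folds):
        test_fold = folds[k]                                  # S_k
        train = [inst for j, fold in enumerate(folds) if j != k
                 for inst in fold]                            # D - S_k
        model_k = learner(train)                              # M_k
        fold_accuracies.append(accuracy(model_k, test_fold))  # a_k
    return fold_accuracies, sum(fold_accuracies) / n_folds


# 7 instances of class A and 3 of class B, with one instance per fold.
data = [([i], "A") for i in range(7)] + [([i], "B") for i in range(7, 10)]
accs, mean_acc = n_fold_cv(data, n_folds=10, learner=train_majority, seed=1)
print(mean_acc)  # 0.7 (each B is misclassified when held out)
```

Every instance is used for testing exactly once, which avoids depending on a single lucky or unlucky test set.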