course content

Course Content Week 5 (April 7) and Week 6 (April 14) Introduction - PDF document



  1. Lecture 4
     Classification: Neural Networks, Naïve Bayesian Classification, k-Nearest Neighbors, Decision Trees & Associative Classifiers
     Lecture by: Dr. Osmar R. Zaïane
     33459-01: Principles of Knowledge Discovery in Data – March-June, 2006 (Dr. O. Zaiane)

     Course Content
     Week 5 (April 7) and Week 6 (April 14)
     • Introduction to Data Mining
     • Association analysis
     • Sequential Pattern Analysis
     • Classification and prediction
     • Contrast Sets
     • Data Clustering
     • Outlier Detection
     • Web Mining

     What is Classification?
     The goal of data classification is to organize and categorize data in distinct classes.
     • A model is first created based on the data distribution.
     • The model is then used to classify new data.
     • Given the model, a class can be predicted for new data.
     With classification, I can predict in which bucket to put the ball, but I can’t predict the weight of the ball.
     [Figure: a ball marked "?" above buckets numbered 1, 2, 3, 4, … n]

     Classification = Learning a Model
     [Figure: a labeled training set is used to learn a classification model; the model then labels new unlabeled data. Labeling = Classification]

     Classification is a three-step process
     1. Model construction (Learning):
     • Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label.
     • The set of all tuples used for construction of the model is called the training set.
     • The model is represented in one of the following forms:
       • Classification rules (IF-THEN statements)
       • Decision tree
       • Mathematical formulae

     Classification is a three-step process
     2. Model Evaluation (Accuracy):
     Estimate the accuracy rate of the model based on a test set.
     – The known label of each test sample is compared with the classified result from the model.
     – The accuracy rate is the percentage of test set samples that are correctly classified by the model.
     – The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is too optimistic.
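
The accuracy rate from step 2 can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy_rate` and the five labels are hypothetical and do not come from the lecture.

```python
def accuracy_rate(known_labels, predicted_labels):
    """Fraction of test samples whose predicted label equals the known label."""
    if len(known_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not known_labels:
        raise ValueError("test set is empty")
    # Compare each known label with the label the model produced.
    correct = sum(1 for k, p in zip(known_labels, predicted_labels) if k == p)
    return correct / len(known_labels)


# Hypothetical labels for a five-sample test set (not from the lecture).
known = ["good", "bad", "good", "good", "bad"]
predicted = ["good", "good", "good", "good", "bad"]

print(f"accuracy = {accuracy_rate(known, predicted):.0%}")
```

Here four of the five predictions match the known labels. The number is only meaningful if none of the five samples were used to build the model.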

  2. Classification is a three-step process
     3. Model Use (Classification):
     The model is used to classify unseen objects.
     • Give a class label to a new tuple
     • Predict the value of an actual attribute

     Classification with Holdout
     [Figure: Data is split into Training Data and Testing Data; Training Data → Derive Classifier (Model) → Estimate Accuracy, using the Testing Data]
     • Holdout
     • Random sub-sampling
     • K-fold cross validation
     • Bootstrapping
     • …

     1. Classification Process (Learning)
     [Figure: Training Data → Classification Algorithms → Classifier (Model)]

     Name      Income   Age        Credit rating
     Bruce     Low      <30        bad
     Dave      Medium   [30..40]   good
     William   High     <30        good
     Marie     Medium   >40        good
     Anne      Low      [30..40]   good
     Chris     Medium   <30        bad

     IF Income = ‘High’ OR Age > 30 THEN CreditRating = ‘Good’

     2. Classification Process (Accuracy Evaluation)
     [Figure: Testing Data → Classifier (Model) → How accurate is the model?]

     Name      Income   Age        Credit rating
     Tom       Medium   <30        bad
     Jane      High     <30        bad
     Wei       High     >40        good
     Hua       Medium   [30..40]   good

     IF Income = ‘High’ OR Age > 30 THEN CreditRating = ‘Good’

     3. Classification Process (Classification)
     [Figure: New Data → Classifier (Model) → Credit Rating?]

     Name      Income   Age        Credit rating
     Paul      High     [30..40]   ?

     Improving Accuracy
     [Figure: Data → Classifier 1, Classifier 2, Classifier 3, …, Classifier n → Combine votes; New Data → Composite classifier]
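
The evaluation and classification steps on the credit-rating example can be sketched as follows. Several things here are assumptions rather than part of the lecture: that Tom, Jane, Wei and Hua form the testing data (read off the flattened slide layout), that the age bands `[30..40]` and `>40` both count as "Age > 30", that tuples not matched by the rule default to `bad`, and that "Combine votes" means a simple majority vote. The two extra classifiers `income_only` and `age_only` are hypothetical and exist only to give the vote something to combine.

```python
from collections import Counter

# Test tuples as (name, income, age_band, known_credit_rating).
# Assumption: these four rows are the testing data from the slides.
TEST_SET = [
    ("Tom", "Medium", "<30", "bad"),
    ("Jane", "High", "<30", "bad"),
    ("Wei", "High", ">40", "good"),
    ("Hua", "Medium", "[30..40]", "good"),
]

# Assumption: these age bands are treated as satisfying "Age > 30".
OVER_30_BANDS = {"[30..40]", ">40"}


def rule_classifier(income, age_band):
    """IF Income = 'High' OR Age > 30 THEN 'good'; default to 'bad' (assumed)."""
    if income == "High" or age_band in OVER_30_BANDS:
        return "good"
    return "bad"


def income_only(income, age_band):
    """Hypothetical classifier that looks at income alone."""
    return "good" if income == "High" else "bad"


def age_only(income, age_band):
    """Hypothetical classifier that looks at the age band alone."""
    return "good" if age_band in OVER_30_BANDS else "bad"


def evaluate(classifier, test_set):
    """Accuracy rate: share of test tuples whose known label is reproduced."""
    hits = sum(1 for _, income, age, label in test_set
               if classifier(income, age) == label)
    return hits / len(test_set)


def majority_vote(classifiers, income, age_band):
    """Composite classifier: return the label predicted by most classifiers."""
    votes = Counter(c(income, age_band) for c in classifiers)
    return votes.most_common(1)[0][0]


print("rule accuracy on test set:", evaluate(rule_classifier, TEST_SET))

# New, unlabeled tuple from the slides: Paul, High income, age [30..40].
ensemble = [rule_classifier, income_only, age_only]
print("Paul ->", majority_vote(ensemble, "High", "[30..40]"))
```

Under these assumptions the rule misclassifies only Jane (high income, rated bad), so three of the four test tuples are correct. An odd number of voters is used so that a two-class vote cannot tie.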

  3. Framework (Supervised Learning)
     [Figure: Labeled Data is split into Training Data and Testing Data; Training Data → Derive Classifier (Model) → Estimate Accuracy, using the Testing Data; the classifier is then applied to Unlabeled New Data]

     Classification Methods
     • Decision Tree Induction (next week)
     • Neural Networks (today)
     • Bayesian Classification (today)
     • Associative Classifiers (next week)
     • K-Nearest Neighbour (next week)
     • Support Vector Machines
     • Case-Based Reasoning
     • Genetic Algorithms
     • Rough Set Theory
     • Fuzzy Sets
     • Etc.

     Lecture Outline
     Part I: Artificial Neural Networks (ANN) (1 hour)
     • Introduction to Neural Networks
       • Biological Neural System
       • What is an artificial neural network?
       • Neuron model and activation function
       • Construction of a neural network
     • Learning: Backpropagation Algorithm
       • Forward propagation of signal
       • Backward propagation of error
       • Example
     Part II: Bayesian Classifiers (Statistical-based) (1 hour)
     • What is Bayesian Classification
     • Bayes theorem
     • Naïve Bayes Algorithm
       • Using Laplace Estimate
       • Handling Missing Values and Numerical Data
     • Belief Networks

     Human Nervous System
     • We have only just begun to understand how our neural system operates
     • A huge number of neurons and interconnections between them
       – 100 billion (i.e. 10^11) neurons in the brain
         • for comparison, a full Olympic-sized swimming pool contains on the order of 10^10 raindrops, and the number of stars in the Milky Way is of a similar magnitude
       – 10^4 connections per neuron
     • Biological neurons are slower than computers
       – Neurons operate in 10^-3 seconds, computers in 10^-9 seconds
       – The brain makes up for the slow rate of operation of a single neuron by the large number of neurons and connections (think about the speed of face recognition by a human, for example, and the time it takes fast computers to do the same task.)

     Biological Neurons
     • The purpose of neurons: transmit information in the form of electrical signals
       – a neuron accepts many inputs, which are all added up in some way
       – if enough active inputs are received at once, the neuron will be activated and fire; if not, it remains in its inactive state
     • Structure of a neuron
       • Cell body: contains the nucleus holding the chromosomes
       • Dendrites
       • Axon
       • Synapse
         – couples the axon with the dendrite of another cell;
         – information is passed from one neuron to another through synapses;
         – there is no direct linkage across the junction; it is a chemical one.

     Operation of biological neurons
     • Signals are transmitted between neurons by electrical pulses (action potentials, AP) traveling along the axon;
     • When the potential at the synapse is raised sufficiently by the AP, it releases chemicals called neurotransmitters
       – it may take the arrival of more than one AP before the synapse is triggered
     • The neurotransmitters diffuse across the gap and chemically activate gates on the dendrites, which allow charged ions to flow
     • The flow of ions alters the potential of the dendrite and provides a voltage pulse on the dendrite (post-synaptic potential, PSP)
       • some synapses excite the dendrite they affect, while others inhibit it
       • the synapses also determine the strength of the new input signal
     • Each PSP travels along its dendrite and spreads over the soma (cell body)
     • The soma sums the effects of thousands of PSPs; if the resulting potential exceeds a threshold, the neuron fires and generates another AP.
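
The sum-and-threshold behaviour of the soma described above can be sketched as a toy threshold unit. This is a deliberate simplification and not the neuron model the lecture goes on to define: the name `neuron_fires`, the weights and the threshold are all hypothetical. Positive weights stand in for excitatory synapses, negative weights for inhibitory ones, and the magnitude for synaptic strength.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the summed, weighted inputs exceed the threshold.

    inputs:    1 if an action potential arrived at that synapse, else 0
    weights:   synaptic strength; > 0 excitatory, < 0 inhibitory (assumed encoding)
    threshold: potential the soma must exceed before the neuron fires
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    # The soma sums the post-synaptic potentials of all active synapses.
    potential = sum(x * w for x, w in zip(inputs, weights))
    return potential > threshold


# Hypothetical neuron with three excitatory synapses and one inhibitory synapse.
weights = [0.6, 0.5, 0.9, -0.4]
threshold = 0.5

print(neuron_fires([1, 1, 0, 1], weights, threshold))  # two excitatory inputs active
print(neuron_fires([1, 0, 0, 1], weights, threshold))  # one excitatory input active
```

With two excitatory inputs active the summed potential clears the threshold despite the inhibitory input, so the unit fires; with only one it stays inactive.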
