Course Content Week 5 (April 7) and Week 6 (April 14) Introduction - PDF document

Lecture 4
Week 5 (April 7) and Week 6 (April 14)
33459-01: Principles of Knowledge Discovery in Data – March-June, 2006 (Dr. O. Zaiane)
Classification: Neural Networks, Naïve Bayesian Classification, k-Nearest Neighbors, Decision Trees & Associative Classifiers
Lecture by: Dr. Osmar R. Zaïane

Course Content
• Introduction to Data Mining
• Association analysis
• Sequential Pattern Analysis
• Classification and prediction
• Contrast Sets
• Data Clustering
• Outlier Detection
• Web Mining

What is Classification?
The goal of data classification is to organize and categorize data in distinct classes.
• A model is first created based on the data distribution.
• The model is then used to classify new data.
• Given the model, a class can be predicted for new data.
With classification, I can predict in which bucket to put the ball, but I can’t predict the weight of the ball.

Classification = Learning a Model
[Figure: a labeled training set is used to learn a classification model; the model then assigns new unlabeled data to one of the classes 1, 2, 3, 4, …, n. Labeling = Classification.]

Classification is a three-step process
1. Model construction (Learning):
• Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label.
• The set of all tuples used for construction of the model is called the training set.
• The model is represented in one of the following forms:
  – Classification rules (IF-THEN statements)
  – Decision tree
  – Mathematical formulae

2. Model Evaluation (Accuracy):
Estimate the accuracy rate of the model based on a test set.
– The known label of each test sample is compared with the classified result from the model.
– The accuracy rate is the percentage of test set samples that are correctly classified by the model.
– The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected.
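One simple way to obtain a test set that is independent of the training set is to hold out a random part of the labeled tuples. The following is a minimal sketch under stated assumptions: the function name `holdout_split`, the 30% test fraction, the fixed seed and the toy `labeled` data are all illustrative and do not come from the lecture.

```python
import random

def holdout_split(tuples, test_fraction=0.3, seed=0):
    """Randomly partition labeled tuples into disjoint training and test sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(tuples)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    # No tuple appears in both parts, so the test set is independent of training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled tuples: (identifier, class label).
labeled = [(i, "good" if i % 2 == 0 else "bad") for i in range(10)]
training_set, test_set = holdout_split(labeled)
```

The model would be built from `training_set` only, and its accuracy rate estimated on `test_set` only.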

Classification is a three-step process
3. Model Use (Classification):
The model is used to classify unseen objects.
• Give a class label to a new tuple
• Predict the value of an actual attribute

Classification with Holdout
[Figure: the data is split into training data, from which the classifier (model) is derived, and testing data, on which its accuracy is estimated.]
• Holdout
• Random sub-sampling
• K-fold cross validation
• Bootstrapping
• …

1. Classification Process (Learning)
[Figure: classification algorithms derive the classifier (model) from the training data.]

Training data:
Name      Income   Age        Credit rating
Bruce     Low      <30        bad
Dave      Medium   [30..40]   good
William   High     <30        good
Marie     Medium   >40        good
Anne      Low      [30..40]   good
Chris     Medium   <30        bad

Classifier (model):
IF Income = ‘High’ OR Age > 30 THEN CreditRating = ‘Good’

2. Classification Process (Accuracy Evaluation)
[Figure: the classifier (model) is applied to the testing data.] How accurate is the model?

Testing data:
Name      Income   Age        Credit rating
Tom       Medium   <30        bad
Jane      High     <30        bad
Wei       High     >40        good
Hua       Medium   [30..40]   good

Classifier (model):
IF Income = ‘High’ OR Age > 30 THEN CreditRating = ‘Good’

3. Classification Process (Classification)
[Figure: the classifier (model) is applied to new data.] Credit rating?

New data:
Name      Income   Age        Credit rating
Paul      High     [30..40]   ?

Improving Accuracy
[Figure: Classifier 1, Classifier 2, Classifier 3, …, Classifier n are each derived from the data; their votes on new data are combined into a composite classifier.]
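The credit-rating rule above can be checked by hand or with a few lines of code. The sketch below makes three assumptions that the slides do not state explicitly: the rows Tom, Jane, Wei and Hua form the testing data, the age bands "[30..40]" and ">40" both count as "Age > 30", and any tuple not covered by the rule is labeled "bad". The function names are illustrative.

```python
def age_over_30(age_band):
    # Assumption: the bands "[30..40]" and ">40" both satisfy "Age > 30".
    return age_band in ("[30..40]", ">40")

def classify(income, age_band):
    """IF Income = 'High' OR Age > 30 THEN 'good'; otherwise 'bad' (assumed default)."""
    return "good" if income == "High" or age_over_30(age_band) else "bad"

# Assumed testing data: (name, income, age band, known credit rating).
test_rows = [
    ("Tom", "Medium", "<30", "bad"),
    ("Jane", "High", "<30", "bad"),
    ("Wei", "High", ">40", "good"),
    ("Hua", "Medium", "[30..40]", "good"),
]

# Accuracy rate: fraction of test samples whose predicted label matches the known label.
correct = sum(classify(income, age) == label for _, income, age, label in test_rows)
accuracy = correct / len(test_rows)

# Model use: classify the new, unlabeled tuple for Paul.
paul_rating = classify("High", "[30..40]")
```

Under these assumptions the rule misclassifies only Jane (high income, but a bad rating), so it is correct on 3 of the 4 test rows, and it labels Paul as "good".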

Classification Methods
• Decision Tree Induction (next week)
• Neural Networks (today)
• Bayesian Classification (today)
• Associative Classifiers (next week)
• K-Nearest Neighbour (next week)
• Support Vector Machines
• Case-Based Reasoning
• Genetic Algorithms
• Rough Set Theory
• Fuzzy Sets
• Etc.

Framework (Supervised Learning)
[Figure: labeled data is split into training data, from which the classifier (model) is derived, and testing data, on which its accuracy is estimated; the classifier is then applied to unlabeled new data.]

Lecture Outline
Part I: Artificial Neural Networks (ANN) (1 hour)
• Introduction to Neural Networks
  – Biological Neural System
  – What is an artificial neural network?
  – Neuron model and activation function
  – Construction of a neural network
• Learning: Backpropagation Algorithm
  – Forward propagation of signal
  – Backward propagation of error
  – Example
Part II: Bayesian Classifiers (Statistical-based) (1 hour)
• What is Bayesian Classification
• Bayes theorem
• Naïve Bayes Algorithm
  – Using Laplace Estimate
  – Handling Missing Values and Numerical Data
• Belief Networks

Human Nervous System
• We have only just begun to understand how our neural system operates
• A huge number of neurons and interconnections between them
  – 100 billion (i.e. 10^11) neurons in the brain
    • for comparison, a full Olympic-sized swimming pool contains about 10^10 raindrops; the number of stars in the Milky Way is of a similar order of magnitude
  – 10^4 connections per neuron
• Biological neurons are slower than computers
  – Neurons operate in 10^-3 seconds, computers in 10^-9 seconds
  – The brain makes up for the slow rate of operation of a single neuron by the large number of neurons and connections (think about the speed of face recognition by a human, for example, and the time it takes fast computers to do the same task.)

Biological Neurons
• The purpose of neurons: transmit information in the form of electrical signals
  – a neuron accepts many inputs, which are all added up in some way
  – if enough active inputs are received at once, the neuron will be activated and fire; if not, it remains in its inactive state
• Structure of neuron
  – Cell body: contains the nucleus holding the chromosomes
  – Dendrites
  – Axon
  – Synapse
    • couples the axon with the dendrite of another cell;
    • information is passed from one neuron to another through synapses;
    • no direct linkage across the junction, it is a chemical one.

Operation of biological neurons
• Signals are transmitted between neurons by electrical pulses (action potentials, AP) traveling along the axon;
• When the potential at the synapse is raised sufficiently by the AP, it releases chemicals called neurotransmitters
  – it may take the arrival of more than one AP before the synapse is triggered
• The neurotransmitters diffuse across the gap and chemically activate gates on the dendrites, which allow charged ions to flow
• The flow of ions alters the potential of the dendrite and provides a voltage pulse on the dendrite (post-synaptic potential, PSP)
  – some synapses excite the dendrite they affect, while others inhibit it
  – the synapses also determine the strength of the new input signal
• Each PSP travels along its dendrite and spreads over the soma (cell body)
• The soma sums the effects of thousands of PSPs; if the resulting potential exceeds a threshold, the neuron fires and generates another AP.
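The summing-and-threshold behaviour of the soma described above can be sketched as a simple threshold unit. This is only an illustration, not a biological model: the function name, the signed synaptic strengths (positive for excitatory, negative for inhibitory synapses), the threshold and all numeric values are assumptions chosen for the example.

```python
def neuron_fires(inputs, synaptic_strengths, threshold):
    """Return True if the summed post-synaptic potentials exceed the threshold."""
    # Each synapse turns an incoming pulse (1 = AP arrived, 0 = none) into a PSP
    # whose sign and size are set by the synapse (excitatory > 0, inhibitory < 0).
    psps = [strength * pulse for strength, pulse in zip(synaptic_strengths, inputs)]
    # The soma sums all PSPs and fires only if the total exceeds the threshold.
    return sum(psps) > threshold

strengths = [0.6, 0.5, -0.3]   # hypothetical: two excitatory synapses, one inhibitory
threshold = 0.5                # hypothetical firing threshold, arbitrary units

all_active = neuron_fires([1, 1, 1], strengths, threshold)     # 0.6 + 0.5 - 0.3 = 0.8
one_missing = neuron_fires([1, 0, 1], strengths, threshold)    # 0.6 - 0.3 = 0.3
```

With these values the neuron fires when all three inputs are active, but stays inactive when the second excitatory input is missing.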
