

SLIDE 1

CS 570 Data Mining Classification and Prediction 3

Cengiz Gunay

Partial slide credits: Li Xiong; Han, Kamber, and Pei; Tan, Steinbach, and Kumar


SLIDE 2

Collaborative Filtering Examples

- Movielens: movies
- Moviecritic: movies again
- My launch: music
- Gustos starrater: web pages
- Jester: jokes
- TV Recommender: TV shows
- Suggest 1.0: different products


SLIDE 3

Chapter 6. Classification and Prediction

- Overview
- Classification algorithms and methods
  - Decision tree induction
  - Bayesian classification
  - Lazy learning and kNN classification
  - Support Vector Machines (SVM)
  - Others
- Prediction methods
- Evaluation metrics and methods
- Ensemble methods


SLIDE 4

Prediction

- Prediction vs. classification
  - Classification predicts a categorical class label
  - Prediction predicts continuous-valued attributes
- Major method for prediction: regression
  - Models the relationship between one or more independent (predictor) variables and a dependent (response) variable
- Regression analysis
  - Linear regression
  - Other regression methods: generalized linear model, logistic regression, Poisson regression, regression trees


SLIDE 5

Linear Regression

- Linear regression: Y = b0 + b1 X1 + b2 X2 + … + bp Xp
- Line fitting: y = w0 + w1 x
- Polynomial fitting: y = b2 x^2 + b1 x + b0
- Many nonlinear functions can be transformed
- Method of least squares: estimates the best-fitting straight line

$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$
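As a minimal sketch, the formulas above translate directly into Python (the helper name fit_line is illustrative, not from the slides):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1*x using the closed-form slide formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # w1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Example: points near y = 1 + 2x
w0, w1 = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(w0, w1)  # approximately 1.0 and 2.0
```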


SLIDE 6

Linear Regression: Loss Function
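Presumably the loss being minimized here is the standard squared-error (least-squares) loss, which for line fitting is

$$L(w_0, w_1) = \sum_{i=1}^{|D|} \bigl(y_i - (w_0 + w_1 x_i)\bigr)^2 .$$

Minimizing this loss with respect to w0 and w1 gives the closed-form estimates on the previous slide.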

SLIDE 7

Other Regression-Based Models

- Generalized linear model
  - Logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables; vs. Bayesian classifier: assumes the logistic model
  - Poisson regression (log-linear model): models data that exhibit a Poisson distribution; assumes a Poisson distribution for the response variable
- Maximum likelihood method

SLIDE 8

Logistic Regression

- Logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables
- Logistic function (see below)
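The logistic function itself is not reproduced on the slide; its standard form, for predictors x with weights w, is

$$p(Y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(w_0 + \mathbf{w}^{\top}\mathbf{x})}} .$$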

SLIDE 9

Poisson Regression

- Poisson regression (log-linear model): models data that exhibit a Poisson distribution
- Assumes a Poisson distribution for the response variable
- Assumes the logarithm of its expected value follows a linear model
- Simplest case: see below
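The simplest case referred to is presumably the single-predictor log-linear model,

$$\log E(Y \mid x) = \theta_0 + \theta_1 x ,$$

i.e., the expected count grows (or decays) exponentially in the predictor.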

SLIDE 10

Lasso

- Subset selection
- The lasso is defined below (standard form, after this list)
- Using a small t forces some coefficients to 0
- Explains the model with fewer variables
- Ref: Hastie, Tibshirani, Friedman, The Elements of Statistical Learning
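A standard statement of the lasso, as given in the cited reference, is the constrained least-squares problem

$$\hat{w} = \arg\min_{w} \sum_{i=1}^{|D|} \Bigl( y_i - w_0 - \sum_j w_j x_{ij} \Bigr)^2 \quad \text{subject to} \quad \sum_j |w_j| \le t .$$

Shrinking the budget t drives some coefficients exactly to 0, which is what lets the model be explained with fewer variables.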

SLIDE 11

Other Classification Methods

- Rule-based classification
- Neural networks
- Genetic algorithms
- Rough set approaches
- Fuzzy set approaches


SLIDE 12

Linear Classification

- Binary classification problem
- The data above the red line belong to class 'x'
- The data below the red line belong to class 'o'
- Examples: SVM, Perceptron, probabilistic classifiers

[Figure: scatter plot of 'x' and 'o' points separated by a red line]
SLIDE 13

Classification: A Mathematical Mapping

- Mathematically: x ∈ X = ℝ^n, y ∈ Y = {+1, −1}
- We want a function f: X → Y
- Linear classifiers:
  - Probabilistic classifiers (naive Bayesian)
  - SVM
  - Perceptron

SLIDE 14

Discriminative Classifiers

- Advantages
  - Prediction accuracy is generally high (as compared to Bayesian methods, in general)
  - Robust: works when training examples contain errors
  - Fast evaluation of the learned target function (Bayesian networks are normally slow)
- Criticism
  - Long training time
  - Difficult to understand the learned function (weights); Bayesian networks, by contrast, can be used easily for pattern discovery
  - Not easy to incorporate domain knowledge; easy in Bayesian methods, in the form of priors on the data or distributions

SLIDE 15

Support Vector Machines (SVM)

- Find a linear separation in input space
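As a hedged illustration (scikit-learn's LinearSVC is an assumed dependency, not part of the slides), a linear SVM can be fit in a few lines:

```python
import numpy as np
from sklearn.svm import LinearSVC  # assumed dependency, not from the slides

# Toy 2-D data: class +1 roughly above the line x1 + x2 = 2, class -1 below it.
X = np.array([[2.0, 2.0], [1.5, 2.5], [3.0, 1.5],
              [0.0, 0.0], [0.2, 0.1], [-1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = LinearSVC(C=1.0)              # C trades off margin width vs. training errors
clf.fit(X, y)
print(clf.coef_, clf.intercept_)    # learned w and b of the separating hyperplane
print(clf.predict([[2.5, 2.5]]))    # -> [1]
```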

SLIDE 16

SVM vs. Neural Network

- SVM
  - Relatively new concept
  - Deterministic algorithm
  - Nice generalization properties
  - Hard to learn: learned in batch mode using quadratic programming techniques
  - Using kernels, can learn very complex functions
- Neural network
  - Relatively old
  - Nondeterministic algorithm
  - Generalizes well but doesn't have a strong mathematical foundation
  - Can easily be learned in incremental fashion
  - To learn complex functions, use a multilayer perceptron (not that trivial)


SLIDE 17

Why Neural Networks?

- Inspired by the nervous system
- Formalized by McCulloch & Pitts (1943) as the perceptron

SLIDE 18

A Neuron (= a perceptron)

- The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping

[Figure: inputs x0 … xn with weight vector w0 … wn feeding a weighted sum, plus bias µk, passed through activation function f to produce output y]

For example:

$$y = \operatorname{sign}\Bigl( \sum_{i=0}^{n} w_i x_i + \mu_k \Bigr)$$

SLIDE 19

Perceptron & Winnow Algorithms

(Notation: vector x vs. scalar x.)

- Input: {(x(1), y(1)), …}
- Output: a classification function f(x)
  - f(x(i)) > 0 for y(i) = +1
  - f(x(i)) < 0 for y(i) = −1
- f(x) uses the inner product; the decision boundary is w · x + b = 0, or w1 x1 + w2 x2 + b = 0 in two dimensions
- Learning updates w:
  - Perceptron: additively
  - Winnow: multiplicatively
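A sketch of the additive perceptron update under the usual mistake-driven loop (the learning rate and epoch count are illustrative assumptions, not from the slides):

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=1.0):
    """Additive perceptron updates: on each mistake, move w toward the example."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                # labels yi in {+1, -1}
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on boundary)
                w += lr * yi * xi               # additive update; Winnow would
                b += lr * yi                    # instead scale w multiplicatively
    return w, b
```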

SLIDE 20

SLIDE 21

SLIDE 22

- Linearly non-separable input? Use multiple perceptrons.
- Advantage over SVM? No need for kernels, although a Kernel Perceptron algorithm exists.

SLIDE 23

Neural Networks

- A neural network: a set of connected input/output units where each connection is associated with a weight
- Learning phase: adjusting the weights so as to predict the correct class label of the input tuples
- Backpropagation
- From a statistical point of view, networks perform nonlinear regression

SLIDE 24

A Multi-Layer Feed-Forward Neural Network

[Figure: multi-layer feed-forward network mapping an input vector X through an input layer, a hidden layer, and an output layer to an output vector; connection weights wij]

SLIDE 25

A Multi-Layer Neural Network

- The inputs to the network correspond to the attributes measured for each training tuple
- Inputs are fed simultaneously into the units making up the input layer
- They are then weighted and fed simultaneously to a hidden layer
- The number of hidden layers is arbitrary, although usually only one
- The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction
- The network is feed-forward: none of the weights cycles back to an input unit or to an output unit of a previous layer
- From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function


SLIDE 26

Defining a Network Topology

- First decide the network topology: # of units in the input layer, # of hidden layers (if > 1), # of units in each hidden layer, and # of units in the output layer
- Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
- One input unit per domain value, each initialized to 0
- Output: for classification with more than two classes, one output unit per class is used
- If a trained network's accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights

SLIDE 27

Backpropagation

- For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
- Modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer; hence "backpropagation"
- Steps:
  - Initialize weights (to small random numbers) and biases in the network
  - Propagate the inputs forward (by applying the activation function)
  - Backpropagate the error (by updating weights and biases)
  - Check the terminating condition (when error is very small, etc.)


SLIDE 28

A Multi-Layer Feed-Forward Neural Network

[Figure: the same network as Slide 24, with weights wij]

Forward propagation, for a unit j with inputs Oi, bias θj, and logistic activation:

$$I_j = \sum_i w_{ij} O_i + \theta_j, \qquad O_j = \frac{1}{1 + e^{-I_j}}$$

Error terms (Tj is the target value for output unit j; k ranges over the units fed by hidden unit j):

$$\mathrm{Err}_j = O_j (1 - O_j)(T_j - O_j) \quad \text{(output unit)}$$

$$\mathrm{Err}_j = O_j (1 - O_j) \sum_k \mathrm{Err}_k\, w_{jk} \quad \text{(hidden unit)}$$

Weight and bias updates with learning rate l:

$$w_{ij} = w_{ij} + (l)\,\mathrm{Err}_j\, O_i, \qquad \theta_j = \theta_j + (l)\,\mathrm{Err}_j$$
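A hedged sketch of one backpropagation step for a single hidden layer, following the update rules above directly (array shapes and the learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W1, b1, W2, b2, lr=0.1):
    """One forward/backward pass; W1 is hidden x input, W2 is output x hidden."""
    # Forward propagation: I_j = sum_i w_ij * O_i + theta_j, O_j = sigmoid(I_j)
    h = sigmoid(W1 @ x + b1)                    # hidden-layer outputs
    o = sigmoid(W2 @ h + b2)                    # output-layer outputs
    # Error terms
    err_o = o * (1 - o) * (target - o)          # output units
    err_h = h * (1 - h) * (W2.T @ err_o)        # hidden units: sum_k Err_k * w_jk
    # Updates (in place): w_ij += l * Err_j * O_i, theta_j += l * Err_j
    W2 += lr * np.outer(err_o, h)
    b2 += lr * err_o
    W1 += lr * np.outer(err_h, x)
    b1 += lr * err_h
    return o
```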

SLIDE 29

Backpropagation and Interpretability

- Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| × w) time, with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case
- Rule extraction from networks: network pruning
  - Simplify the network structure by removing weighted links that have the least effect on the trained network
  - Then perform link, unit, or activation value clustering
  - The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers
- Sensitivity analysis: assess the impact that a given input variable has on a network output; the knowledge gained from this analysis can be represented in rules

SLIDE 30

Neural Network as a Classifier: Comments

- Weakness
  - Long training time
  - Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
  - Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network
- Strength
  - High tolerance to noisy data
  - Ability to classify untrained patterns
  - Well suited for continuous-valued inputs and outputs
  - Successful on a wide array of real-world data
  - Algorithms are inherently parallel
  - Techniques have recently been developed for the extraction of rules from trained neural networks

SLIDE 31

Other Classification Methods

- Rule-based classification
- Neural networks
- Genetic algorithms
- Rough set approaches
- Fuzzy set approaches


SLIDE 32

Genetic Algorithms (GA)

- Genetic algorithm: based on an analogy to biological evolution
- An initial population is created, consisting of randomly generated rules
  - Each rule is represented by a string of bits
  - E.g., "if A1 and ¬A2 then C2" can be encoded as 100
- Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring
  - The fitness of a rule is represented by its classification accuracy on a set of training examples
  - Offspring are generated by crossover and mutation (see the sketch after this list)
- The process continues until a population P evolves in which each rule in P satisfies a prespecified fitness threshold
- Slow but easily parallelizable
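A minimal sketch of the two bit-string operators named above; the single-point crossover and flip rate are illustrative choices, not from the slides:

```python
import random

def crossover(parent1, parent2):
    """Single-point crossover of two equal-length bit-string rules (e.g. '100')."""
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def mutate(rule, rate=0.05):
    """Flip each bit independently with a small probability."""
    return ''.join(b if random.random() > rate else str(1 - int(b))
                   for b in rule)

child = mutate(crossover('100', '011'))
```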


SLIDE 33

- No local minima, but training takes longer and the problem must be designed well.

SLIDE 34

Rough Set Approach

- Rough sets are used to approximately, or "roughly", define equivalence classes
- A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)
- Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix (which stores the differences between attribute values for each pair of data tuples) is used to reduce the computational intensity


SLIDE 35

Fuzzy Set Approaches

- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as in a fuzzy membership graph)
- Attribute values are converted to fuzzy values
  - E.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy values calculated
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed, and these sums are combined

