Classification and Prediction
- Classification: predict categorical class labels
  – Build a model for a set of classes/concepts
  – Classify loan applications (approve/decline)
- Prediction: model continuous-valued functions
  – Predict the economic growth in 2015
Classification: A 2-step Process
- Model construction: describe a set of predetermined classes
  – Training dataset: tuples for model construction
  – Each tuple/sample belongs to a predefined class
  – The model is represented as classification rules, decision trees, or mathematical formulae
- Model application: classify unseen objects
  – Estimate the accuracy of the model using an independent test set
  – If the accuracy is acceptable, apply the model to classify tuples with unknown class labels
Model Construction
Training data → classification algorithm → classifier (model)

Learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Training data:

  Name  Rank            Years  Tenured
  Mike  Assistant Prof  3      No
  Mary  Assistant Prof  7      Yes
  Bill  Prof            2      Yes
  Jim   Associate Prof  7      Yes
  Dave  Assistant Prof  6      No
  Anne  Associate Prof  3      No
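As an illustrative sketch (not from the slides; the function name and data encoding are my own), the learned rule can be applied back to the training tuples:

```python
# Hypothetical sketch: applying the learned rule from the slide above.
# The rule and the tuples come from the slide; the rest is illustrative.

def tenured(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return rank == "Prof" or years > 6

training = [
    ("Mike", "Assistant Prof", 3, "No"),
    ("Mary", "Assistant Prof", 7, "Yes"),
    ("Bill", "Prof", 2, "Yes"),
    ("Jim", "Associate Prof", 7, "Yes"),
    ("Dave", "Assistant Prof", 6, "No"),
    ("Anne", "Associate Prof", 3, "No"),
]

for name, rank, years, label in training:
    prediction = "Yes" if tenured(rank, years) else "No"
    print(f"{name}: predicted {prediction}, actual {label}")
```

On this training set the rule predicts every label correctly.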
Model Application
The classifier is first evaluated on testing data; if accurate enough, it is applied to unseen data, e.g., (Jeff, Professor, 4) → Tenured?

Testing data:

  Name     Rank            Years  Tenured
  Tom      Assistant Prof  2      No
  Merlisa  Associate Prof  7      No
  George   Prof            5      Yes
  Joseph   Assistant Prof  7      Yes
Supervised/Unsupervised Learning
- Supervised learning (classification)
  – Supervision: objects in the training data set have class labels
  – New data is classified based on the training set
- Unsupervised learning (clustering)
  – The class labels of the training data are unknown
  – Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Data Preparation
- Data cleaning
  – Preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection)
  – Remove irrelevant or redundant attributes
- Data transformation
  – Generalize and/or normalize data (see the sketch below)
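A small illustrative sketch of the normalization step (the min-max scheme and function name are my own choices, not from the slides):

```python
# Min-max normalization: rescale values into [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max([3, 7, 2, 6]))   # -> [0.2, 1.0, 0.0, 0.8]
```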
Measurements of Quality
- Prediction accuracy
- Speed and scalability
  – Construction speed and application speed
- Robustness: handles noise and missing values
- Scalability: builds models for large training data sets
- Interpretability: understandability of models
Decision Tree Induction
- Decision tree representation
- Construction of a decision tree
- Inductive bias and overfitting
- Scalable enhancements for large databases
Decision Tree
- A node in the tree: a test of some attribute
- A branch: a possible value of the attribute
- Classification (see the sketch below)
  – Start at the root
  – Test the attribute
  – Move down the tree branch
Example tree (PlayTennis):

  Outlook?
  ├─ Sunny    → Humidity? (High → No, Normal → Yes)
  ├─ Overcast → Yes
  └─ Rain     → Wind? (Strong → No, Weak → Yes)
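The classification loop can be sketched as follows; the nested-dict encoding of the tree and the function name `classify` are my own choices:

```python
# The PlayTennis tree above, as nested dicts; leaves are class labels.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(node, example):
    """Start at the root, test the attribute, move down the branch."""
    while isinstance(node, dict):
        attribute = next(iter(node))            # attribute tested at this node
        node = node[attribute][example[attribute]]
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High"}))  # -> No
```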
Training Dataset
  Outlook   Temp  Humid   Wind    PlayTennis
  Sunny     Hot   High    Weak    No
  Sunny     Hot   High    Strong  No
  Overcast  Hot   High    Weak    Yes
  Rain      Mild  High    Weak    Yes
  Rain      Cool  Normal  Weak    Yes
  Rain      Cool  Normal  Strong  No
  Overcast  Cool  Normal  Strong  Yes
  Sunny     Mild  High    Weak    No
  Sunny     Cool  Normal  Weak    Yes
  Rain      Mild  Normal  Weak    Yes
  Sunny     Mild  Normal  Strong  Yes
  Overcast  Mild  High    Strong  Yes
  Overcast  Hot   Normal  Weak    Yes
  Rain      Mild  High    Strong  No
Appropriate Problems
- Instances are represented by attribute-value pairs
  – Extensions of decision trees can handle real-valued attributes
- Disjunctive descriptions may be required
- The training data may contain errors or missing values
Basic Algorithm ID3
- Construct the tree in a top-down, recursive, divide-and-conquer manner (see the sketch below)
  – Which attribute is the best at the current node?
  – Create a node for each possible attribute value
  – Partition the training data into the descendant nodes
- Conditions for stopping the recursion
  – All samples at a given node belong to the same class
  – No attribute remains for further partitioning (majority voting is employed to classify the leaf)
  – There is no sample at the node
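A compact, self-contained sketch of this recursion (a simplified ID3; the dict-based data encoding and helper names are my own, and the entropy/gain helpers mirror the formulas introduced on the following slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target):
    """Information gain of splitting `examples` on `attr`."""
    g = entropy([e[target] for e in examples])
    for v in {e[attr] for e in examples}:
        part = [e[target] for e in examples if e[attr] == v]
        g -= len(part) / len(examples) * entropy(part)
    return g

def id3(examples, attributes, target):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:             # all samples in the same class
        return labels[0]
    if not attributes:                    # no attribute left: majority voting
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([e for e in examples if e[best] == v], rest, target)
                   for v in {e[best] for e in examples}}}
```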
Which Attribute Is the Best?
- The attribute most useful for classifying examples
- Information gain and gini index
  – Statistical properties
  – Measure how well an attribute separates the training examples
Entropy
- Measures the homogeneity of examples
  – S is the training data set, and p_i is the proportion of S belonging to class i
- The smaller the entropy, the purer the data set
$$Entropy(S) \equiv -\sum_{i=1}^{c} p_i \log_2 p_i$$
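A direct transcription of the formula (restating the helper from the ID3 sketch above; the label-list encoding is my own choice):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["Yes"] * 9 + ["No"] * 5))   # ~0.940 for 9 Yes / 5 No
```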
Information Gain
- The expected reduction in entropy caused by partitioning the examples according to an attribute
$$Gain(S, A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v)$$
where Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which attribute A has value v.
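A sketch of the gain formula, reusing the entropy() function from the previous sketch; examples are dicts and `target` names the class attribute:

```python
def gain(examples, attr, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    g = entropy([e[target] for e in examples])
    for v in {e[attr] for e in examples}:           # v ranges over Values(A)
        part = [e[target] for e in examples if e[attr] == v]
        g -= len(part) / len(examples) * entropy(part)
    return g
```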
Example
Training dataset: the PlayTennis table from the earlier slide.
$$Entropy(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.94$$

$$Gain(S, Wind) = Entropy(S) - \sum_{v \in \{Weak,\,Strong\}} \frac{|S_v|}{|S|}\, Entropy(S_v) = 0.94 - \frac{8}{14}\times 0.811 - \frac{6}{14}\times 1.00 = 0.048$$
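Checking these numbers with the entropy() sketch above (in the table, S_Weak has 6 Yes / 2 No and S_Strong has 3 Yes / 3 No):

```python
s      = ["Yes"] * 9 + ["No"] * 5   # Entropy ~ 0.940
weak   = ["Yes"] * 6 + ["No"] * 2   # Entropy ~ 0.811
strong = ["Yes"] * 3 + ["No"] * 3   # Entropy = 1.000

print(entropy(s) - 8/14 * entropy(weak) - 6/14 * entropy(strong))  # ~0.048
```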
Hypothesis Space Search in Decision Tree Building
- Hypothesis space: the set of possible decision trees
- ID3: a simple-to-complex, hill-climbing search
  – Evaluation function: information gain
Capabilities and Limitations
- The hypothesis space is complete
- Maintains only a single current hypothesis
- No backtracking
  – May converge to a locally optimal solution
- Uses all training examples at each step
  – Makes statistics-based decisions
  – Not sensitive to errors in individual examples
Natural Bias
- The information gain measure favors attributes with many values
- An extreme example
  – Attribute "date" may have the highest information gain
  – It yields a very broad decision tree of depth one
  – Such a tree is inapplicable to any future data
Alternative Measures
- Gain ratio: penalizes attributes like date by incorporating split information

  $$GainRatio(S, A) \equiv \frac{Gain(S, A)}{SplitInformation(S, A)}$$

- Split information is sensitive to how broadly and uniformly the attribute splits the data

  $$SplitInformation(S, A) \equiv -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

  where S_1, ..., S_c are the subsets of S produced by partitioning S on the c values of attribute A

- Gain ratio can be undefined or very large
  – Only test attributes with above-average gain
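A sketch of both formulas, reusing gain() from the earlier sketch:

```python
import math

def split_information(examples, attr):
    """SplitInformation(S, A) = -sum_i |S_i|/|S| * log2(|S_i|/|S|)."""
    n = len(examples)
    fractions = [sum(1 for e in examples if e[attr] == v) / n
                 for v in {e[attr] for e in examples}]
    return -sum(f * math.log2(f) for f in fractions)

def gain_ratio(examples, attr, target):
    # Undefined (division by zero) when the attribute has a single value.
    return gain(examples, attr, target) / split_information(examples, attr)
```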
Measuring Inequality
- Lorenz curve
  – X-axis: quintiles of the population; Y-axis: cumulative share of income earned by the plotted quintile
  – The gap between the actual curve and the line of equality shows the degree of inequality
- Gini index
  – Gini = 0: perfectly even distribution; Gini = 1: perfectly unequal
  – The greater the gap, the more unequal the distribution
Gini Index (Adjusted)
- A data set T contains examples from n classes
  – p_j is the relative frequency of class j in T

  $$gini(T) = 1 - \sum_{j=1}^{n} p_j^2$$

- When T is split into two subsets T_1 and T_2 with sizes N_1 and N_2, respectively (N = N_1 + N_2):

  $$gini_{split}(T) = \frac{N_1}{N}\, gini(T_1) + \frac{N_2}{N}\, gini(T_2)$$

- The attribute providing the smallest gini_split(T) is chosen to split the node
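A direct transcription of the two measures (the label-list encoding is my own choice):

```python
from collections import Counter

def gini(labels):
    """gini(T) = 1 - sum_j p_j^2 over the class frequencies p_j."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Size-weighted gini of a binary split of T into T1 and T2."""
    n1, n2 = len(left), len(right)
    return n1 / (n1 + n2) * gini(left) + n2 / (n1 + n2) * gini(right)

print(gini(["Yes"] * 9 + ["No"] * 5))   # ~0.459 for the PlayTennis labels
```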
Extracting Classification Rules
- Classification rules can be extracted from a decision tree
- Each path from the root to a leaf corresponds to an IF-THEN rule
  – All attribute-value pairs along a path form a conjunctive condition
  – The leaf node holds the class prediction
  – Example: IF age = "<=30" AND student = "no" THEN buys_computer = "no"
- Rules are easy to understand (see the sketch below)
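A sketch of the extraction, walking the nested-dict tree from the earlier Decision Tree sketch and emitting one IF-THEN rule per root-to-leaf path:

```python
def extract_rules(node, conditions=()):
    if not isinstance(node, dict):                    # leaf: emit the rule
        print("IF", " AND ".join(conditions), "THEN", node)
        return
    attribute = next(iter(node))
    for value, child in node[attribute].items():
        extract_rules(child, conditions + (f'{attribute} = "{value}"',))

extract_rules(tree)  # e.g. IF Outlook = "Sunny" AND Humidity = "High" THEN No
```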
Inductive Bias
- The set of assumptions that, together with the training data, deductively justifies the classifications assigned to future instances
  – The preferences built into classifier construction
- Shorter trees are preferred over longer trees
- Trees that place high-information-gain attributes close to the root are preferred
Why Prefer Short Trees?
- Occam's razor: prefer the simplest hypothesis that fits the data
  – "One should not increase, beyond what is necessary, the number of entities required to explain anything"
  – Also known as the principle of parsimony
- There are fewer short trees than long trees
- A short tree is less likely to be a statistical coincidence
Overfitting
- A decision tree T overfits the training data if there exists an alternative tree T' such that T has higher accuracy than T' over the training examples, but T' has higher accuracy than T over the entire distribution of data
- Why overfitting?
  – Noisy data
  – Bias in the training data
Avoid Overfitting
- Prepruning: stop growing the tree early
  – It is difficult to choose an appropriate threshold
- Postpruning: remove branches from a "fully grown" tree
  – Use an independent set of data to prune
- Key question: how to determine the correct final tree size
Determine the Final Tree Size
- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation (see the sketch below)
- Use all the data for training
  – Apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node is likely to improve accuracy over the entire distribution
- Use the minimum description length (MDL) principle
  – Halt growth of the tree when the encoding is minimized
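An illustrative 10-fold cross-validation for choosing the tree size, using scikit-learn rather than Weka (an assumption: scikit-learn and a numeric dataset are available; any dataset would do):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in (1, 2, 3, None):           # candidate tree sizes
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth),
                             X, y, cv=10)
    print(depth, round(scores.mean(), 3))   # pick the size with best accuracy
```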
Enhancements
- Allow for attributes with continuous values
  – Dynamically discretize continuous attributes
- Handle missing attribute values
- Attribute construction
  – Create new attributes based on existing ones that are sparsely represented
  – Reduce fragmentation, repetition, and replication
To-Do List
- Read Chapters 8.1-8.2
- Figure out how to use decision trees for classification in Weka