
Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank
07/20/06

Implementation: Real machine learning schemes
• Decision trees
  - From ID3 to C4.5 (pruning, numeric attributes, ...)
• Classification rules
  - From PRISM to RIPPER and PART (pruning, numeric data, ...)
• Extending linear models
  - Support vector machines and neural networks
• Instance-based learning
  - Pruning examples, generalized exemplars, distance functions
• Numeric prediction
  - Regression/model trees, locally weighted regression
• Clustering
  - Hierarchical, incremental, probabilistic
• Bayesian networks
  - Learning and prediction, fast data structures for learning

Industrial-strength algorithms
• For an algorithm to be useful in a wide range of real-world applications it must:
  - Permit numeric attributes
  - Allow missing values
  - Be robust in the presence of noise
  - Be able to approximate arbitrary concept descriptions (at least in principle)
• Basic schemes need to be extended to fulfill these requirements

Decision trees
• Extending ID3:
  - to permit numeric attributes: straightforward
  - to deal sensibly with missing values: trickier
  - stability for noisy data: requires pruning mechanism
• End result: C4.5 (Quinlan)
  - Best-known and (probably) most widely-used learning algorithm
  - Commercial successor: C5.0

Numeric attributes
• Standard method: binary splits
  - E.g. temp < 45
• Unlike nominal attributes, every attribute has many possible split points
• Solution is straightforward extension:
  - Evaluate info gain (or other measure) for every possible split point of attribute
  - Choose “best” split point
  - Info gain for best split point is info gain for attribute
• Computationally more demanding

Weather data (again!)

  Outlook    Temperature  Humidity  Windy  Play
  Sunny      Hot          High      False  No
  Sunny      Hot          High      True   No
  Overcast   Hot          High      False  Yes
  Rainy      Mild         High      False  Yes
  Rainy      Cool         Normal    False  Yes
  Rainy      Cool         Normal    True   No
  …          …            …         …      …

  If outlook = sunny and humidity = high then play = no
  If outlook = rainy and windy = true then play = no
  If outlook = overcast then play = yes
  If humidity = normal then play = yes
  If none of the above then play = yes
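The split-point search described under "Numeric attributes" can be sketched in Python as follows. This is a minimal illustration, not the C4.5 implementation. The function names `info`, `weighted_info`, and `best_split` are hypothetical, and the sketch minimizes the weighted average information after the split, which is equivalent to maximizing info gain. The sample data are the 14 temperature values and play labels from this chapter's temperature example.

```python
from math import log2

def info(counts):
    """Entropy in bits of a list of class counts, e.g. [4, 2]."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def weighted_info(left_counts, right_counts):
    """Average information of a binary split, weighted by subset size."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return n_left / n * info(left_counts) + n_right / n * info(right_counts)

def best_split(values, labels):
    """Sort once, then scan once while maintaining running class counts.

    Returns (split_point, weighted_info) for the candidate with the lowest
    weighted information. Candidates lie halfway between adjacent distinct values.
    """
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    left = {c: 0 for c in classes}
    right = {c: labels.count(c) for c in classes}
    best = None
    for i in range(len(pairs) - 1):
        value, label = pairs[i]
        left[label] += 1      # move one instance from the right side to the left
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue          # equal values cannot be separated by a threshold
        score = weighted_info(list(left.values()), list(right.values()))
        if best is None or score < best[1]:
            best = ((value + next_value) / 2, score)
    return best

temperature = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no",
        "yes", "yes", "yes", "no", "yes", "yes", "no"]

# The split at 71.5 gives class counts [4, 2] and [5, 3]: about 0.939 bits.
print(round(weighted_info([4, 2], [5, 3]), 3))
print(best_split(temperature, play))
```

Because the running counts are updated incrementally, every candidate is scored in a single scan after the initial sort.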

Weather data (again!)

  Outlook    Temperature  Humidity  Windy  Play
  Sunny      85           85        False  No
  Sunny      80           90        True   No
  Overcast   83           86        False  Yes
  Rainy      70           96        False  Yes
  Rainy      68           80        False  Yes
  Rainy      65           70        True   No
  …          …            …         …      …

  If outlook = sunny and humidity > 83 then play = no
  If outlook = rainy and windy = true then play = no
  If outlook = overcast then play = yes
  If humidity < 85 then play = no
  If none of the above then play = yes

Example
• Split on temperature attribute:

  64   65   68   69   70   71   72   72   75   75   80   81   83   85
  Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No

  - E.g. temperature < 71.5: yes/4, no/2
         temperature ≥ 71.5: yes/5, no/3
  - $\mathrm{info}([4,2],[5,3]) = \tfrac{6}{14}\,\mathrm{info}([4,2]) + \tfrac{8}{14}\,\mathrm{info}([5,3]) = 0.939$ bits
• Place split points halfway between values
• Can evaluate all split points in one pass!

Can avoid repeated sorting
• Sort instances by the values of the numeric attribute
  - Time complexity for sorting: $O(n \log n)$
• Does this have to be repeated at each node of the tree?
• No! Sort order for children can be derived from sort order for parent
  - Time complexity of derivation: $O(n)$
  - Drawback: need to create and store an array of sorted indices for each numeric attribute

Binary vs multiway splits
• Splitting (multi-way) on a nominal attribute exhausts all information in that attribute
  - Nominal attribute is tested (at most) once on any path in the tree
• Not so for binary splits on numeric attributes!
  - Numeric attribute may be tested several times along a path in the tree
• Disadvantage: tree is hard to read
• Remedy:
  - pre-discretize numeric attributes, or
  - use multi-way splits instead of binary ones

Computing multi-way splits
• Simple and efficient way of generating multi-way splits: greedy algorithm
• Dynamic programming can find optimum multi-way split in $O(n^2)$ time
  - $\mathrm{imp}(k, i, j)$ is the impurity of the best split of values $x_i \ldots x_j$ into $k$ sub-intervals
  - $\mathrm{imp}(k, 1, i) = \min_{0 < j < i} \left[ \mathrm{imp}(k-1, 1, j) + \mathrm{imp}(1, j+1, i) \right]$
  - $\mathrm{imp}(k, 1, N)$ gives us the best $k$-way split
• In practice, greedy algorithm works as well

Missing values
• Split instances with missing values into pieces
  - A piece going down a branch receives a weight proportional to the popularity of the branch
  - weights sum to 1
• Info gain works with fractional instances
  - use sums of weights instead of counts
• During classification, split the instance into pieces in the same way
  - Merge probability distribution using weights
