Decision Tree Algorithm – Week 4 – PowerPoint PPT Presentation



SLIDE 1

Decision Tree Algorithm

Week 4

SLIDE 2

Team Homework Assignment #5

  • Read pp. 105 – 117 of the textbook.
  • Do Examples 3.1, 3.2, 3.3 and Exercise 3.4 (a). Prepare to present the results of the homework assignment.

  • Due date

– beginning of the lecture on Friday February 25th.

SLIDE 3

Team Homework Assignment #6

  • Decide on a data warehousing tool for your future homework assignments
  • Play with the data warehousing tool

  • Due date

– beginning of the lecture on Friday February 25th.

SLIDE 4

Classification – A Two-Step Process

  • Model usage: classifying future or unknown objects
    – Estimate accuracy of the model
      • The known label of test data is compared with the classified result from the model
      • Accuracy rate is the percentage of test set samples that are correctly classified by the model
    – If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
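The accuracy rate described above can be sketched in a few lines of Python; the test labels and model predictions below are made-up illustrative values, not data from the slides.

```python
def accuracy_rate(known_labels, predicted_labels):
    """Fraction of test-set tuples whose predicted class matches the known label."""
    correct = sum(k == p for k, p in zip(known_labels, predicted_labels))
    return correct / len(known_labels)

# Known labels of the test data vs. the model's classifications (illustrative)
known = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]
print(accuracy_rate(known, predicted))  # 4 of 5 correct -> 0.8
```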


SLIDE 5

Process (1): Model Construction

Figure 6.1 The data classification process: (a) Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules.

SLIDE 6

Figure 6.1 The data classification process: (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.

SLIDE 7

Decision Tree Classification Example

SLIDE 8

Decision Tree Learning Overview

  • Decision tree learning is one of the most widely used and practical methods for inductive inference over supervised data.
  • A decision tree represents a procedure for classifying categorical data based on their attributes.
  • It is also efficient for processing large amounts of data, so it is often used in data mining applications.
  • The construction of a decision tree does not require any domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
  • Their representation of acquired knowledge in tree form is intuitive and easy for humans to assimilate.

SLIDE 9

Decision Tree Algorithm – ID3

  • Decide which attribute (splitting point) to test at node N by determining the “best” way to separate or partition the tuples in D into individual classes
  • The splitting criterion is determined so that, ideally, the resulting partitions at each branch are as “pure” as possible
    – A partition is pure if all of the tuples in it belong to the same class

SLIDE 10

Figure 6.3 Basic algorithm for inducing a decision tree from training examples.
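Figure 6.3 itself is not reproduced in this text, so the following is a hedged Python sketch of the basic induction loop it describes (pick the best splitting attribute, partition the tuples, recurse until the partition is pure or no attributes remain). Function and variable names are my own, not taken from the figure.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split minimizes expected entropy
    (equivalently, maximizes information gain)."""
    def expected_entropy(a):
        splits = {}
        for row, lab in zip(rows, labels):
            splits.setdefault(row[a], []).append(lab)
        return sum(len(s) / len(labels) * entropy(s) for s in splits.values())
    return min(attributes, key=expected_entropy)

def id3(rows, labels, attributes):
    """rows: list of dicts mapping attribute name -> categorical value."""
    if len(set(labels)) == 1:          # pure partition: make a leaf
        return labels[0]
    if not attributes:                 # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)
    tree = {a: {}}
    for value in sorted({row[a] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[a] == value]
        tree[a][value] = id3([rows[i] for i in idx],
                             [labels[i] for i in idx],
                             [x for x in attributes if x != a])
    return tree

print(id3([{"x": "a"}, {"x": "a"}, {"x": "b"}],
          ["yes", "yes", "no"], ["x"]))  # {'x': {'a': 'yes', 'b': 'no'}}
```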


SLIDE 11

What is Entropy?

  • The entropy is a measure of the uncertainty associated with a random variable
  • As uncertainty and/or randomness increases for a result set, so does the entropy
  • Values range from 0 – 1 to represent the entropy of information

Entropy(D) ≡ −Σ_{i=1}^{c} p_i log₂(p_i)
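The entropy formula can be checked with a few lines of Python; this sketch takes the class probabilities p_i directly.

```python
import math

def entropy(probabilities):
    # Entropy(D) = -sum_i p_i * log2(p_i); a zero probability contributes 0
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # 1.0: two equally likely classes (maximum uncertainty)
print(entropy([9/14, 5/14]))  # ~0.940: the class distribution used on the later slides
```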

SLIDE 12

Entropy Example (1)

SLIDE 13

Entropy Example (2)

SLIDE 14

Entropy Example (3)

SLIDE 15

Entropy Example (4)

SLIDE 16

Information Gain

  • Information gain is used as an attribute selection measure
  • Pick the attribute that has the highest information gain

Gain(A, D) = Entropy(D) − Σ_{j=1}^{v} (|Dj| / |D|) × Entropy(Dj)

D: a given data partition
A: an attribute
v: the number of distinct values of attribute A
D is split into v partitions or subsets {D1, D2, …, Dv}, where Dj contains those tuples in D that have outcome aj of A.
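A small sketch of Gain(A, D) as defined above; the function and variable names are illustrative, not from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(A, D) = Entropy(D) - sum_j |Dj|/|D| * Entropy(Dj), where Dj
    groups the tuples of D that share one distinct value of attribute A."""
    partitions = {}
    for v, lab in zip(values, labels):
        partitions.setdefault(v, []).append(lab)
    expected = sum(len(dj) / len(labels) * entropy(dj)
                   for dj in partitions.values())
    return entropy(labels) - expected

# A perfect split recovers all of Entropy(D); an uninformative one gains nothing
print(information_gain(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))  # 1.0
print(information_gain(["a", "b", "a", "b"], ["yes", "yes", "no", "no"]))  # 0.0
```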


SLIDE 17

Table 6.1 Class-labeled training tuples from AllElectronics customer database.

SLIDE 18
  • Class P: buys_computer = “yes”
  • Class N: buys_computer = “no”

Entropy(D) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.940

  • Compute the expected information requirement for each attribute: start with the attribute age

Gain(age, D) = Entropy(D) − Σ_{v ∈ {youth, middle_aged, senior}} (|Sv| / 14) × Entropy(Sv)
             = Entropy(D) − (5/14) Entropy(S_youth) − (4/14) Entropy(S_middle_aged) − (5/14) Entropy(S_senior)
             = 0.246

Gain(income, D) = 0.029
Gain(student, D) = 0.151
Gain(credit_rating, D) = 0.048
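These numbers can be reproduced directly. The age and buys_computer columns below are transcribed from the standard AllElectronics example that Table 6.1 refers to; this is an assumption, since the table itself is not reproduced in this text.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# age and buys_computer columns of the 14 training tuples (assumed transcription
# of Table 6.1, which is only referenced here, not shown)
age = ["youth", "youth", "middle_aged", "senior", "senior", "senior",
       "middle_aged", "youth", "youth", "senior", "youth",
       "middle_aged", "middle_aged", "senior"]
buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

# Partition the class labels by the value of age, then apply the Gain formula
parts = {}
for a, b in zip(age, buys):
    parts.setdefault(a, []).append(b)
gain_age = entropy(buys) - sum(len(d) / len(buys) * entropy(d)
                               for d in parts.values())
print(round(entropy(buys), 3))  # 0.94
print(round(gain_age, 3))       # 0.247 (the slide's 0.246 rounds intermediate values)
```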


SLIDE 19

Figure 6.5 The attribute age has the highest information gain and therefore becomes the splitting attribute at the root node of the decision tree. Branches are grown for each outcome of age. The tuples are shown partitioned accordingly.

SLIDE 20

Figure 6.2 A decision tree for the concept buys_computer, indicating whether a customer at AllElectronics is likely to purchase a computer. Each internal (nonleaf) node represents a test on an attribute. Each leaf node represents a class (either buys_computer = yes or buys_computer = no).

SLIDE 21

Exercise

Construct a decision tree to classify “golf play.”

Weather and Possibility of Golf Play

Weather  Temperature  Humidity  Wind  Golf Play
fine     hot          high      none  no
fine     hot          high      few   no
cloud    hot          high      none  yes
rain     warm         high      none  yes
rain     cold         medium    none  yes
rain     cold         medium    few   no
cloud    cold         medium    few   yes
fine     warm         high      none  no
fine     cold         medium    none  yes
rain     warm         medium    none  yes
fine     warm         medium    few   yes
cloud    warm         high      few   yes
cloud    hot          medium    none  yes
rain     warm         high      few   no
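As a first step toward the exercise, the sketch below computes the information gain of each attribute of the golf table to pick the root split; the rows are transcribed from the table above (with the humidity value normalized to "medium"), and the helper names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(column, labels):
    """Gain(A, D): entropy of D minus the expected entropy after splitting on A."""
    parts = {}
    for v, lab in zip(column, labels):
        parts.setdefault(v, []).append(lab)
    return entropy(labels) - sum(len(d) / len(labels) * entropy(d)
                                 for d in parts.values())

# (Weather, Temperature, Humidity, Wind, Golf Play) rows of the exercise table
rows = [
    ("fine", "hot", "high", "none", "no"),
    ("fine", "hot", "high", "few", "no"),
    ("cloud", "hot", "high", "none", "yes"),
    ("rain", "warm", "high", "none", "yes"),
    ("rain", "cold", "medium", "none", "yes"),
    ("rain", "cold", "medium", "few", "no"),
    ("cloud", "cold", "medium", "few", "yes"),
    ("fine", "warm", "high", "none", "no"),
    ("fine", "cold", "medium", "none", "yes"),
    ("rain", "warm", "medium", "none", "yes"),
    ("fine", "warm", "medium", "few", "yes"),
    ("cloud", "warm", "high", "few", "yes"),
    ("cloud", "hot", "medium", "none", "yes"),
    ("rain", "warm", "high", "few", "no"),
]
headers = ("Weather", "Temperature", "Humidity", "Wind")
labels = [r[4] for r in rows]
gains = {h: gain([r[i] for r in rows], labels) for i, h in enumerate(headers)}
root = max(gains, key=gains.get)
print(root)  # Weather has the highest gain, so it becomes the root split
```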
