Machine Learning & Decision Trees
CS16: Introduction to Data Structures & Algorithms Spring 2020
Outline
Motivation
Supervised learning
Decision Trees
ML Bias
Machine Learning
Algorithms that use data to design algorithms
… recognition)
data → Learning Algo → Algo
input → Algo → output
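The data → Learning Algo → Algo picture can be read as code: a learning algorithm is a function that takes data and returns another function (the "Algo") mapping inputs to outputs. The sketch below is a hypothetical, deliberately trivial learner, `learn_majority`, which ignores the input and always predicts the most common label in its training data. It only illustrates the "data in, algorithm out" shape and is not a method from these slides.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A labeled example: (input, output label).
Example = Tuple[str, str]


def learn_majority(data: List[Example]) -> Callable[[str], str]:
    """Hypothetical learning algorithm: uses data to build a new algorithm."""
    # Count how often each output label appears in the data.
    majority_label, _ = Counter(label for _, label in data).most_common(1)[0]

    def algo(_input: str) -> str:
        # The learned algorithm: maps any input to the majority label.
        return majority_label

    return algo


algo = learn_majority([("x1", "Yes"), ("x2", "Yes"), ("x3", "No")])
print(algo("x4"))  # Yes
```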
Applications of ML
Classes of ML
… feedback
Supervised Learning
Example: Waiting for a Table
… for a table
Example: Waiting for a Table?
Training Data
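As a stand-in for the training data table, the sketch below encodes a hypothetical 12-example version of the waiting-for-a-table data. It keeps only two attributes, `Type` and `Patrons`, plus the label `WillWait`. The per-branch Yes/No counts match the ID3 splits shown later in the deck, but the pairing of values within each row is an assumption. Each example carries its known classification, which is what makes this supervised learning.

```python
from collections import Counter

# Hypothetical toy training data: (Type, Patrons, WillWait) per example.
# Per-value counts match the ID3 slides; the row pairings are assumed.
ROWS = [
    ("French", "Some", "Yes"), ("Thai", "Full", "No"), ("Burger", "Some", "Yes"),
    ("Thai", "Full", "Yes"), ("French", "Full", "No"), ("Italian", "Some", "Yes"),
    ("Burger", "None", "No"), ("Thai", "Some", "Yes"), ("Burger", "Full", "No"),
    ("Italian", "Full", "No"), ("Thai", "None", "No"), ("Burger", "Full", "Yes"),
]

# Each training example: attribute values plus the known label (the supervision).
TRAINING_DATA = [{"Type": t, "Patrons": p, "WillWait": w} for t, p, w in ROWS]

label_counts = Counter(ex["WillWait"] for ex in TRAINING_DATA)
print(len(TRAINING_DATA), label_counts["Yes"], label_counts["No"])  # 12 6 6
```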
Supervised Learning
Outline
Decision Trees
Decision Tree Example
Activity #1
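A decision tree classifies an example by starting at the root, testing the attribute stored there, following the branch labeled with the example's value, and repeating until it reaches a leaf that holds a classification. The sketch below assumes a simple representation using hypothetical `Leaf` and `Node` dataclasses and a hand-built, purely illustrative tree. It is not the tree from the activity slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # classification returned when an example reaches this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this node
    branches: Dict[str, "Tree"] = field(default_factory=dict)  # value -> subtree


Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, str]) -> str:
    # Walk from the root, following the branch that matches the example's value.
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label


# Hypothetical hand-built tree, for illustration only.
tree = Node("Patrons", {
    "None": Leaf("No"),
    "Some": Leaf("Yes"),
    "Full": Node("Type", {
        "French": Leaf("No"), "Italian": Leaf("No"),
        "Thai": Leaf("Yes"), "Burger": Leaf("Yes"),
    }),
})
print(classify(tree, {"Patrons": "Full", "Type": "Thai"}))  # Yes
```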
Our Goal: Learning a Decision Tree
Training Data → Learn → Decision Tree
What is a Good Decision Tree?
Number of possible decision trees over n Boolean attributes: $\Omega(2^{2^n})$
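One way to see where the bound comes from: with n Boolean attributes there are $2^n$ combinations of attribute values, and a Boolean function may assign Yes or No to each combination independently, giving $2^{2^n}$ distinct functions. Each distinct function needs at least one tree of its own, so there are at least that many trees, far too many to search exhaustively even for small n. The snippet below just tabulates the count (`num_boolean_functions` is an illustrative name).

```python
def num_boolean_functions(n: int) -> int:
    # 2**n rows in the truth table; each row's output is chosen independently.
    return 2 ** (2 ** n)


for n in range(1, 7):
    print(n, num_boolean_functions(n))
# 1 4
# 2 16
# 3 256
# 4 65536
# 5 4294967296
# 6 18446744073709551616
```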
Iterative Dichotomizer 3 (ID3) Algorithm
Data → ID3 → (Learned) Decision Tree
Ross Quinlan
ID3
Splitting on Type (6×Yes, 6×No at the root):
French: 1×Yes, 1×No
Thai: 2×Yes, 2×No
Burger: 2×Yes, 2×No
Italian: 1×Yes, 1×No
Uncertainty about whether we should wait or not
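The split above is a group-by on the attribute followed by counting labels in each group. A minimal sketch, reusing a hypothetical 12-example toy dataset (Type, Patrons, WillWait) whose counts match this slide; `split` and `branch_counts` are illustrative helper names, not part of the original material.

```python
from collections import defaultdict

# Hypothetical toy data; counts match the slides, row pairings are assumed.
ROWS = [
    ("French", "Some", "Yes"), ("Thai", "Full", "No"), ("Burger", "Some", "Yes"),
    ("Thai", "Full", "Yes"), ("French", "Full", "No"), ("Italian", "Some", "Yes"),
    ("Burger", "None", "No"), ("Thai", "Some", "Yes"), ("Burger", "Full", "No"),
    ("Italian", "Full", "No"), ("Thai", "None", "No"), ("Burger", "Full", "Yes"),
]
DATA = [{"Type": t, "Patrons": p, "WillWait": w} for t, p, w in ROWS]


def split(data, attribute):
    """Partition examples by their value for `attribute` (one subset per branch)."""
    subsets = defaultdict(list)
    for ex in data:
        subsets[ex[attribute]].append(ex)
    return dict(subsets)


def branch_counts(data, attribute, label="WillWait"):
    """Map each attribute value to its (#Yes, #No) pair."""
    counts = {}
    for value, subset in split(data, attribute).items():
        n_yes = sum(ex[label] == "Yes" for ex in subset)
        counts[value] = (n_yes, len(subset) - n_yes)
    return counts


print(branch_counts(DATA, "Type"))
# {'French': (1, 1), 'Thai': (2, 2), 'Burger': (2, 2), 'Italian': (1, 1)}
```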
ID3
Splitting on Type versus Patrons (both start from 6×Yes, 6×No):
Type: French 1×Yes 1×No; Thai 2×Yes 2×No; Burger 2×Yes 2×No; Italian 1×Yes 1×No
Patrons: Some 4×Yes (no uncertainty!); Full 2×Yes 4×No (subproblem: recur!); None 2×No (no uncertainty!)
ID3
… same classification)
… different classification)
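The Patrons split shows the two outcomes ID3 distinguishes: a branch whose examples all share one classification can become a leaf immediately, while a mixed branch becomes a smaller subproblem to recurse on. A sketch of that check on the same hypothetical toy data; `is_pure` and `split` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical toy data; counts match the slides, row pairings are assumed.
ROWS = [
    ("French", "Some", "Yes"), ("Thai", "Full", "No"), ("Burger", "Some", "Yes"),
    ("Thai", "Full", "Yes"), ("French", "Full", "No"), ("Italian", "Some", "Yes"),
    ("Burger", "None", "No"), ("Thai", "Some", "Yes"), ("Burger", "Full", "No"),
    ("Italian", "Full", "No"), ("Thai", "None", "No"), ("Burger", "Full", "Yes"),
]
DATA = [{"Type": t, "Patrons": p, "WillWait": w} for t, p, w in ROWS]


def split(data, attribute):
    """Partition examples by their value for `attribute`."""
    subsets = defaultdict(list)
    for ex in data:
        subsets[ex[attribute]].append(ex)
    return dict(subsets)


def is_pure(subset, label="WillWait"):
    """True if every example in the subset has the same classification."""
    return len({ex[label] for ex in subset}) <= 1


for value, subset in split(DATA, "Patrons").items():
    print(value, "leaf" if is_pure(subset) else "recur")
# Some leaf
# Full recur
# None leaf
```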
ID3
How do we distinguish “good” attributes from “bad” attributes?
Type (6×Yes, 6×No): French 1×Yes 1×No; Thai 2×Yes 2×No; Burger 2×Yes 2×No; Italian 1×Yes 1×No: many mixed subsets, a “bad” attribute
Patrons (6×Yes, 6×No): Some 4×Yes; Full 2×Yes 4×No; None 2×No: many unmixed subsets, a “good” attribute
ID3
Entropy
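Entropy turns "how mixed is this subset?" into a number. For a yes/no classification the deck uses H(data) = −(p·log2 p + (1 − p)·log2(1 − p)) with p = #Yes / (#Yes + #No), under the usual convention 0·log2 0 = 0. It is 1 bit for an evenly mixed subset and 0 for an unmixed one. A minimal sketch working directly from the two counts (`entropy` is an illustrative name):

```python
import math


def entropy(n_yes: int, n_no: int) -> float:
    """Binary entropy H(data), in bits, from the #Yes and #No counts."""
    total = n_yes + n_no
    h = 0.0
    for count in (n_yes, n_no):
        if count:  # convention: 0 * log2(0) = 0, so empty classes are skipped
            p = count / total
            h -= p * math.log2(p)
    return h


print(entropy(6, 6))            # 1.0  (evenly mixed)
print(entropy(4, 0))            # 0.0  (unmixed)
print(round(entropy(2, 4), 3))  # 0.918
```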
ID3 - Information Gain
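Type (6×Yes, 6×No, entropy 1): French 1×Yes 1×No (entropy 1); Thai 2×Yes 2×No (entropy 1); Burger 2×Yes 2×No (entropy 1); Italian 1×Yes 1×No (entropy 1)
Weighted sum of branch entropies (remainder) for Type: (2/12)(1) + (4/12)(1) + (4/12)(1) + (2/12)(1) = 1
Patrons (6×Yes, 6×No, entropy 1): Some 4×Yes (entropy 0); Full 2×Yes 4×No (entropy ≈ 0.918); None 2×No (entropy 0)
Weighted sum of branch entropies (remainder) for Patrons: (4/12)(0) + (6/12)(0.918) + (2/12)(0) ≈ 0.459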
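The weighted sums above are the remainder Rem(Att): each branch's entropy weighted by the fraction of examples that reach it. Information gain is the root entropy minus that remainder. A sketch on the hypothetical toy data (helper names `entropy_of`, `remainder` and `gain` are illustrative). Gain(Type) comes out as 0 up to floating-point rounding, while Gain(Patrons) ≈ 1 − 0.459 ≈ 0.541, which is why Patrons is the better first split.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; counts match the slides, row pairings are assumed.
ROWS = [
    ("French", "Some", "Yes"), ("Thai", "Full", "No"), ("Burger", "Some", "Yes"),
    ("Thai", "Full", "Yes"), ("French", "Full", "No"), ("Italian", "Some", "Yes"),
    ("Burger", "None", "No"), ("Thai", "Some", "Yes"), ("Burger", "Full", "No"),
    ("Italian", "Full", "No"), ("Thai", "None", "No"), ("Burger", "Full", "Yes"),
]
DATA = [{"Type": t, "Patrons": p, "WillWait": w} for t, p, w in ROWS]


def entropy_of(examples, label="WillWait"):
    """Entropy (bits) of the label distribution in `examples`."""
    total = len(examples)
    counts = Counter(ex[label] for ex in examples)
    # Only labels that actually occur are counted, so log2(0) never happens.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def remainder(examples, attribute):
    """Rem(Att): branch entropies weighted by the fraction of examples per branch."""
    subsets = defaultdict(list)
    for ex in examples:
        subsets[ex[attribute]].append(ex)
    total = len(examples)
    return sum(len(s) / total * entropy_of(s) for s in subsets.values())


def gain(examples, attribute):
    """Gain(Att) = H(data0) - Rem(Att)."""
    return entropy_of(examples) - remainder(examples, attribute)


for att in ("Type", "Patrons"):
    print(f"{att}: Rem = {remainder(DATA, att):.3f}, Gain = {gain(DATA, att):.3f}")
```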
ID3 - Notation
data0: all examples at the root, before splitting on Type (6×Yes, 6×No)
data1 to data4: the subsets that reach the four Type branches (French 1×Yes 1×No, Thai 2×Yes 2×No, Burger 2×Yes 2×No, Italian 1×Yes 1×No)
Information gain
$$\mathrm{Gain}(Att) = H(data_0) - \mathrm{Rem}(Att), \qquad \mathrm{Rem}(Att) = \sum_{i} \frac{|data_i|}{|data_0|}\, H(data_i)$$
where the sum runs over the subsets data_i produced by the values of Att, and
$$H(data) = -\left( \frac{\#Yes}{\#Yes + \#No} \log_2 \frac{\#Yes}{\#Yes + \#No} + \left(1 - \frac{\#Yes}{\#Yes + \#No}\right) \log_2\!\left(1 - \frac{\#Yes}{\#Yes + \#No}\right) \right)$$
ID3 Pseudocode
id3_algorithm(data, attributes, parent_data):
    if data is empty:
        # no examples with this combination of attribute values
        return new node w/ most frequent classification in parent_data
    else if all examples in data have same classification:
        return new node with that classification
    else if attributes is empty:
        # examples with this combination of attribute values, but they have different classifications
        return new node w/ most frequent classification in data
    else:
        A = attribute with largest information gain
        tree = new decision tree with root A
        for each value a of attribute A:
            new_data = all examples in data such that example.A = a
            subtree = id3_algorithm(new_data, attributes - A, data)
            attach subtree to tree with branch labeled “a”
        return tree
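A runnable Python sketch of the pseudocode above, under stated assumptions: examples are dicts with a hypothetical label key `WillWait`; a leaf is a label string and an internal node is a tuple `(attribute, {value: subtree})`; and, unlike the three-argument pseudocode, `id3` takes an extra `values` map listing each attribute's full set of values, so the "no examples" base case can actually arise. Ties (equal counts or equal gains) go to whichever candidate comes first, since the slides do not specify a tie rule. The data is the same hypothetical 12-example toy set.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; counts match the slides, row pairings are assumed.
ROWS = [
    ("French", "Some", "Yes"), ("Thai", "Full", "No"), ("Burger", "Some", "Yes"),
    ("Thai", "Full", "Yes"), ("French", "Full", "No"), ("Italian", "Some", "Yes"),
    ("Burger", "None", "No"), ("Thai", "Some", "Yes"), ("Burger", "Full", "No"),
    ("Italian", "Full", "No"), ("Thai", "None", "No"), ("Burger", "Full", "Yes"),
]
DATA = [{"Type": t, "Patrons": p, "WillWait": w} for t, p, w in ROWS]
LABEL = "WillWait"  # hypothetical name of the classification field


def plurality(examples):
    """Most frequent classification among `examples` (first seen wins ties)."""
    return Counter(ex[LABEL] for ex in examples).most_common(1)[0][0]


def entropy(examples):
    total = len(examples)
    counts = Counter(ex[LABEL] for ex in examples)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def gain(examples, attribute):
    """Gain(Att) = H(examples) - Rem(Att)."""
    subsets = defaultdict(list)
    for ex in examples:
        subsets[ex[attribute]].append(ex)
    rem = sum(len(s) / len(examples) * entropy(s) for s in subsets.values())
    return entropy(examples) - rem


def id3(examples, attributes, parent_examples, values):
    if not examples:
        # No examples with this combination of attribute values.
        return plurality(parent_examples)
    labels = {ex[LABEL] for ex in examples}
    if len(labels) == 1:
        # All examples have the same classification.
        return labels.pop()
    if not attributes:
        # Same attribute values, but different classifications.
        return plurality(examples)
    best = max(attributes, key=lambda a: gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for v in values[best]:
        subset = [ex for ex in examples if ex[best] == v]
        branches[v] = id3(subset, remaining, examples, values)
    return (best, branches)


def classify(tree, example):
    """Follow branches from the root until reaching a leaf label."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


ATTRIBUTES = ["Type", "Patrons"]
VALUES = {a: sorted({ex[a] for ex in DATA}) for a in ATTRIBUTES}
tree = id3(DATA, ATTRIBUTES, DATA, VALUES)
print(tree[0])  # Patrons
print(classify(tree, {"Type": "Thai", "Patrons": "Some"}))  # Yes
```

On this toy data only two attributes exist, so the Full branch runs out of attributes after Type and falls back to the plurality rule; on the full restaurant data the tree would keep splitting.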
ID3
Testing Decision Trees
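A common way to test a learned tree is to run it on labeled examples held out from training and report the fraction it classifies correctly (accuracy). A sketch assuming the tuple-based tree representation `(attribute, {value: subtree})`; the tree and the four test examples are invented for illustration, and `accuracy` is a hypothetical helper name.

```python
def classify(tree, example):
    """Follow branches from the root until reaching a leaf label."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


def accuracy(tree, test_examples, label="WillWait"):
    """Fraction of held-out examples whose predicted label matches the true one."""
    correct = sum(classify(tree, ex) == ex[label] for ex in test_examples)
    return correct / len(test_examples)


# Hypothetical tree and held-out examples, for illustration only.
TREE = ("Patrons", {"None": "No", "Some": "Yes", "Full": "No"})
TEST = [
    {"Patrons": "Some", "WillWait": "Yes"},
    {"Patrons": "Full", "WillWait": "Yes"},
    {"Patrons": "None", "WillWait": "No"},
    {"Patrons": "Full", "WillWait": "No"},
]
print(accuracy(TREE, TEST))  # 0.75
```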
Outline
ML Applications
Risk Assessment
Criminal Risk Assessment
Criminal Risk Assessment Tools
Maryland, New York, Oklahoma, Virginia, Washington, Wisconsin, …
… effectiveness
New York started using the tool in 2001
ProPublica Study
Algorithms
References
… Norvig
… criminal-sentencing
… York State COMPAS-Probation Risk and Need Assessment Study
… DCJS_OPCA_COMPAS_Probation_Validity.pdf
… Assessment System by Northpointe
… Behavior-COMPAS.pdf
… Your Dragon”