Machine Learning & Decision Trees (CS16: Introduction to Data Structures & Algorithms)

SLIDE 1

Machine Learning & Decision Trees

CS16: Introduction to Data Structures & Algorithms Spring 2020

SLIDE 2

Outline

  • Motivation
  • Supervised learning
  • Decision Trees
  • ML Bias
SLIDE 3

Machine Learning

  • Algorithms that use data to design algorithms
  • Allows us to design algorithms
  • that predict the future (e.g., picking stocks)
  • even when we don’t know how (e.g., facial recognition)

[Diagram: data → Learning Algo → Algo; input → Algo → output]
SLIDE 4

CS 147
SLIDE 5

Applications of ML

  • Agriculture
  • Astronomy
  • Bioinformatics
  • Classifying DNA
  • Computer Vision
  • Finance
  • Linguistics
  • Medical diagnostics
  • Insurance
  • Economics
  • Advertising
  • Self-driving cars
  • Recommendation systems (e.g., Netflix)
  • Search engines
  • Translations
  • Robotics
  • Risk assessment
  • Drug discovery
  • Fraud discovery
  • Computational Anatomy
SLIDE 6

Classes of ML

  • Supervised learning
  • learn to make accurate predictions from training data
  • Unsupervised learning
  • find patterns in data without training data
  • Reinforcement learning
  • improve performance with positive and negative feedback
SLIDE 7

Supervised Learning

  • Make accurate predictions/classifications
  • Is this email spam?
  • Will the snowstorm cancel class?
  • Will this flight be delayed?
  • Will this candidate win the next election?
  • How can our algorithm predict the future?
  • We train it using “training data” which are past examples
  • Examples of emails classified as spam and of emails classified as non-spam
  • Examples of snowstorms that have led to cancellations and of snowstorms that have not
  • Examples of flights that have been delayed and of flights that have left on time
  • Examples of candidates that have won and of candidates that have lost
SLIDE 8

Supervised Learning

  • Training data is a collection of examples
  • An example includes an input and its classification
  • inputs: flights, snowstorms, candidates, …
  • classifications: delayed/non-delayed, canceled/not canceled, win/lose
  • But how do we represent inputs for our algorithm?
  • What is a student? What is a flight? What is an email?
  • We have to choose attributes that describe the inputs
  • flight is represented by: source, destination, airline, number of passengers, …
  • snowstorm is represented by: duration, expected inches, winds, …
  • candidate is represented by: district, political affiliation, experience, …
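
One way to picture "an example includes an input and its classification" is a small record type. The sketch below is only an illustration: the class name `Example`, the attribute names, and all values (including the passenger count and the labels) are made up for this example and are not taken from any real dataset.

```python
from typing import NamedTuple

class Example(NamedTuple):
    """One training example: the input's attributes plus its classification."""
    attributes: dict  # attribute name -> attribute value
    label: str        # the classification

# Illustrative values only (hypothetical, not real data).
flight = Example(
    attributes={"source": "PVD", "destination": "Paris", "airline": "DL", "passengers": 180},
    label="delayed",
)
snowstorm = Example(
    attributes={"duration_hours": 12, "expected_inches": 8, "winds": "strong"},
    label="canceled",
)

# Training data is a collection of such examples.
training_data = [flight, snowstorm]
```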
SLIDE 9

Example: Waiting for a Table

  • Design an algorithm that predicts whether a patron will wait for a table
  • What are the inputs?
  • the “context” of the patron’s decision
  • What are the attributes of this context?
  • is patron hungry? is the line long?
SLIDE 10

Example: Waiting for a Table?

  • Input attributes
  • A1: Alternatives = {Yes, No}
  • A2: Bar = {Yes, No}
  • A3: Fri/Sat = {Yes, No}
  • A4: Hungry = {Yes, No}
  • A5: Patrons = {None, Some, Full}
  • A6: Price = {$, $$, $$$}
  • A7: Raining = {Yes, No}
  • A8: Reservation = {Yes, No}
  • A9: Type = {French, Italian, Thai, Burger}
  • A10: Wait = {10-30, 30-60, >60}
  • Classification: {Yes, No}
SLIDE 11

Training Data

  • S. Russell & P. Norvig. Artificial Intelligence - A Modern Approach
SLIDE 12

Supervised Learning

  • Classification
  • If classifications are from a finite set
  • ex: spam/not spam, delayed/not delayed
  • Regression
  • If classifications are real numbers
  • ex: temperature
SLIDE 13

Outline

  • Motivation
  • Supervised learning
  • Decision Trees
  • Algorithmic Bias
SLIDE 14

Decision Trees

  • A decision tree maps
  • inputs represented by attributes…
  • …to a classification
  • Examples
  • snowstorm_dt(12h,8”,strong winds) returns Yes
  • flight_dt(DL,PVD,Paris,night,no_storm,…) returns No
  • restaurant_dt(estimate,hungry,patrons,…) returns No
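
A decision tree can be sketched in code as nested data plus a function that walks it. The tree below is a hypothetical, hand-written example (it is not the tree from the slides); the representation, a classification string for a leaf or an `(attribute, children)` pair for an attribute node, is an assumption chosen for brevity.

```python
# A tree is either a classification (str) or a pair (attribute, {value: subtree}).
# This particular tree is hypothetical and hand-written for illustration.
restaurant_tree = ("Patrons", {
    "None": "No",
    "Some": "Yes",
    "Full": ("Hungry", {"Yes": "Yes", "No": "No"}),
})

def classify(tree, example):
    """Walk from the root to a leaf, following the edge labeled with the example's value."""
    while not isinstance(tree, str):
        attribute, children = tree
        tree = children[example[attribute]]
    return tree

print(classify(restaurant_tree, {"Patrons": "Full", "Hungry": "No"}))  # No
```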
SLIDE 15

Decision Tree Example

SLIDE 16

Decision Tree Example

Activity #1 (2 min)

SLIDE 19

Decision Tree Example

SLIDE 20

Decision Tree Example

SLIDE 21

Our Goal: Learning a Decision Tree

[Diagram: Training Data → Learn → Decision Tree]
SLIDE 22

What is a Good Decision Tree?

  • Consistent with training data
  • classifies training examples correctly
  • Performs well on future examples
  • classifies future inputs correctly
  • As small as possible
  • Efficient classification
  • How can we find a small decision tree?
  • there are $\Omega(2^{2^n})$ possible decision trees over $n$ Yes/No attributes
  • so brute force is not possible
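
To get a feel for the bound: with n Yes/No attributes there are 2^n possible inputs, and each input can independently be classified Yes or No, so there are 2^(2^n) distinct classification functions, each needing at least one tree. The function name below is hypothetical and the snippet is only an arithmetic illustration.

```python
def count_boolean_functions(n: int) -> int:
    """Number of distinct Yes/No classifications of all inputs over n Yes/No attributes."""
    return 2 ** (2 ** n)

# The count explodes quickly, which is why enumerating every tree is hopeless.
for n in range(1, 6):
    print(n, count_boolean_functions(n))
```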
SLIDE 23

Iterative Dichotomizer 3 (ID3) Algorithm

[Diagram: Data → ID3 → (Learned) Decision Tree]

  • Developed by Ross Quinlan
SLIDE 24

ID3

  • Starting at root
  • node is either an attribute node or a classification node (leaf)
  • outgoing edges are labeled with attribute values
  • children are either a classification node or another attribute node
  • Tree should be as small as possible
SLIDE 25

ID3

Split on Type (6 Yes, 6 No at the root):

  Type     Yes  No
  French    1    1
  Thai      2    2
  Burger    2    2
  Italian   1    1

Uncertainty about whether we should wait or not
SLIDE 26

ID3

Split on Type (6 Yes, 6 No at the root):

  Type     Yes  No
  French    1    1
  Thai      2    2
  Burger    2    2
  Italian   1    1

Split on Patrons (6 Yes, 6 No at the root):

  Patrons  Yes  No
  Some      4    0   No uncertainty!
  Full      2    4   Subproblem: recur!
  None      0    2   No uncertainty!
SLIDE 27

ID3

  • Start at root with entire training data
  • Choose attribute that creates a “good split”
  • Attribute “splits” data into subsets
  • good split: children with subsets that are unmixed (with same classification)
  • bad split: children with subsets that are mixed (with different classification)
  • Children with unmixed subsets lead to a classification
  • Children with mixed subsets handled with recursion
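
The splitting step can be sketched as grouping examples by an attribute's value and checking whether each group is unmixed. The helper names `split` and `is_unmixed`, the `(example, label)` pair representation, and the four toy examples are all assumptions made for this illustration.

```python
from collections import defaultdict

def split(data, attribute):
    """Group (example, label) pairs by the example's value for `attribute`."""
    subsets = defaultdict(list)
    for example, label in data:
        subsets[example[attribute]].append((example, label))
    return dict(subsets)

def is_unmixed(subset):
    """True if every example in the subset has the same classification."""
    return len({label for _, label in subset}) == 1

# Toy data, made up for illustration.
data = [
    ({"Patrons": "Some"}, "Yes"),
    ({"Patrons": "Full"}, "No"),
    ({"Patrons": "Full"}, "Yes"),
    ({"Patrons": "None"}, "No"),
]
for value, subset in split(data, "Patrons").items():
    print(value, "unmixed -> classification" if is_unmixed(subset) else "mixed -> recurse")
```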
SLIDE 28

ID3

How do we distinguish “good” attributes from “bad” attributes?

  • “bad” attribute: many mixed subsets
  • “good” attribute: many unmixed subsets

  Type (root 6 Yes, 6 No):    French 1 Yes / 1 No, Thai 2 Yes / 2 No, Burger 2 Yes / 2 No, Italian 1 Yes / 1 No
  Patrons (root 6 Yes, 6 No): Some 4 Yes / 0 No, Full 2 Yes / 4 No, None 0 Yes / 2 No
SLIDE 29

ID3

  • How do we decide if attribute is good?
  • Compute entropy of each child
  • quantifies how mixed/alike it is
  • quantifies amount of certainty/uncertainty
  • Combine the entropies of all the children
  • Compare combined entropy of children to entropy of node
  • This is called the information gain
SLIDE 30

Entropy

  • Entropy of a dataset of examples

$$H(\text{data}) = -\left(\frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}} \cdot \log_2\left(\frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right) + \left(1 - \frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right) \cdot \log_2\left(1 - \frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right)\right)$$

with the convention $\log(0) = 0$
SLIDE 31

Entropy

  • Entropy of a dataset of examples
  • Intuition:
  • H(perfectly mixed) = 1
  • H(all the same) = 0

$$H(\text{data}) = -\left(\frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}} \cdot \log_2\left(\frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right) + \left(1 - \frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right) \cdot \log_2\left(1 - \frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right)\right)$$

with the convention $\log(0) = 0$
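
The entropy formula translates almost directly into code. This is a minimal sketch: the function name and the choice to return 0 for an empty dataset are assumptions, and terms with zero probability are skipped to implement the log(0) = 0 convention.

```python
import math

def entropy(num_yes: int, num_no: int) -> float:
    """Entropy H(data) of a dataset with the given Yes/No counts."""
    total = num_yes + num_no
    if total == 0:
        return 0.0  # assumption: an empty dataset carries no uncertainty
    h = 0.0
    for count in (num_yes, num_no):
        p = count / total
        if p > 0:  # convention: a term with probability 0 contributes 0
            h -= p * math.log2(p)
    return h

print(entropy(6, 6))  # 1.0 (perfectly mixed)
print(entropy(4, 0))  # 0.0 (all the same)
```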
SLIDE 32

ID3 - Information Gain

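Root (6 Yes, 6 No): entropy 1

Split on Type:
  • French (1 Yes, 1 No): entropy 1
  • Thai (2 Yes, 2 No): entropy 1
  • Burger (2 Yes, 2 No): entropy 1
  • Italian (1 Yes, 1 No): entropy 1
  • Weighted sum (remainder): 1
  • Information gain: entropy 1 − remainder 1 = 0

Split on Patrons:
  • Some (4 Yes, 0 No): entropy 0
  • Full (2 Yes, 4 No): entropy ≈ 0.92
  • None (0 Yes, 2 No): entropy 0
  • Weighted sum (remainder): ≈ 0.46
  • Information gain: entropy 1 − remainder 0.46 ≈ 0.54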
SLIDE 33

ID3 - Notation

Split on Type:
  • data_0: all examples at the node (6 Yes, 6 No)
  • data_1, …, data_4: the subsets of data_0 in the children, one per value of Type (French, Thai, Burger, Italian)
SLIDE 34

Information gain

  • Entropy of a dataset of examples

$$H(\text{data}) = -\left(\frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}} \cdot \log_2\left(\frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right) + \left(1 - \frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right) \cdot \log_2\left(1 - \frac{\#\text{Yes}}{\#\text{Yes} + \#\text{No}}\right)\right)$$

  • Remainder of an attribute

$$\text{Rem}(\text{Att}) = \sum_{i=1}^{d} \frac{\#\text{data}_i}{\#\text{data}_1 + \cdots + \#\text{data}_d} \cdot H(\text{data}_i)$$

  • Information gain of an attribute

$$\text{Gain}(\text{Att}) = H(\text{data}_0) - \text{Rem}(\text{Att})$$
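
As a worked check of these definitions, the sketch below evaluates the remainder and gain for the Type and Patrons splits, using the Yes/No counts shown in the ID3 figures. The function names and the `(num_yes, num_no)` tuple representation are assumptions for this example.

```python
import math

def entropy(num_yes, num_no):
    """H(data) for the given counts; zero-probability terms are skipped."""
    total = num_yes + num_no
    probs = [c / total for c in (num_yes, num_no) if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def remainder(children):
    """Weighted sum of child entropies; children is a list of (num_yes, num_no)."""
    total = sum(y + n for y, n in children)
    return sum((y + n) / total * entropy(y, n) for y, n in children)

def gain(parent, children):
    """Information gain: entropy of the node minus the remainder of the split."""
    return entropy(*parent) - remainder(children)

type_children = [(1, 1), (2, 2), (2, 2), (1, 1)]  # French, Thai, Burger, Italian
patrons_children = [(4, 0), (2, 4), (0, 2)]       # Some, Full, None

print("Gain(Type)    =", round(gain((6, 6), type_children), 3))
print("Gain(Patrons) =", round(gain((6, 6), patrons_children), 3))
```

Patrons has the larger gain, so it would be chosen over Type as the attribute at the root.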
SLIDE 35

ID3 Pseudocode

    id3_algorithm(data, attributes, parent_data):
      if data is empty:
        # no examples w/ this combination of attribute values
        return new node w/ most frequent classification in parent_data
      else if all examples in data have same classification:
        return new node with that classification
      else if attributes is empty:
        # examples w/ this combination of attribute values exist,
        # but they have different classifications
        return new node w/ most frequent classification in data
      else:
        A = attribute with largest information gain
        tree = new decision tree with root A
        for each value a of attribute A:
          new_data = all examples in data such that example.A = a
          subtree = id3_algorithm(new_data, attributes - A, data)
          attach subtree to tree with branch labeled “a”
        return tree
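
The pseudocode can be turned into a short runnable sketch. Several choices here are assumptions rather than part of the slides: data is a list of `(example, label)` pairs, a tree is either a label or an `(attribute, children)` pair, a `domains` argument lists every possible value of each attribute (so that values with no examples still get a branch), and the entropy is written for any number of classes, which matches the Yes/No formula when there are two. The toy dataset is made up.

```python
import math
from collections import Counter

def entropy(data):
    counts = Counter(label for _, label in data)
    return -sum(c / len(data) * math.log2(c / len(data)) for c in counts.values())

def split(data, attribute, values):
    """One subset per possible value of the attribute (possibly empty)."""
    return {v: [(ex, lab) for ex, lab in data if ex[attribute] == v] for v in values}

def gain(data, attribute, values):
    subsets = split(data, attribute, values).values()
    rem = sum(len(s) / len(data) * entropy(s) for s in subsets if s)
    return entropy(data) - rem

def most_frequent(data):
    return Counter(label for _, label in data).most_common(1)[0][0]

def id3(data, attributes, parent_data, domains):
    """Return a classification (leaf) or (attribute, {value: subtree})."""
    if not data:                      # no examples with these attribute values
        return most_frequent(parent_data)
    labels = {label for _, label in data}
    if len(labels) == 1:              # unmixed: all examples agree
        return labels.pop()
    if not attributes:                # mixed, but nothing left to split on
        return most_frequent(data)
    best = max(attributes, key=lambda a: gain(data, a, domains[a]))
    rest = [a for a in attributes if a != best]
    children = {
        value: id3(subset, rest, data, domains)
        for value, subset in split(data, best, domains[best]).items()
    }
    return (best, children)

# Toy data, made up for illustration.
domains = {"Patrons": ["None", "Some", "Full"], "Hungry": ["Yes", "No"]}
data = [
    ({"Patrons": "None", "Hungry": "No"}, "No"),
    ({"Patrons": "Some", "Hungry": "Yes"}, "Yes"),
    ({"Patrons": "Some", "Hungry": "No"}, "Yes"),
    ({"Patrons": "Full", "Hungry": "Yes"}, "Yes"),
    ({"Patrons": "Full", "Hungry": "No"}, "No"),
]
tree = id3(data, list(domains), data, domains)
print(tree)
```

On this toy data the root is Patrons, and only the mixed Full branch recurses (splitting on Hungry).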
SLIDE 36

ID3

  • S. Russell & P. Norvig. Artificial Intelligence - A Modern Approach
SLIDE 37

Testing Decision Trees

  • ID3 learns a decision tree from training data
  • How good is this decision tree?
  • How accurately does it classify inputs it hasn’t seen?
  • New flights, new snowstorms, new patrons, …
  • Split examples into two non-overlapping sets
  • training data & testing data
  • by assigning each example to one set uniformly at random
  • Accuracy of decision tree
  • # of correct classifications on testing data / # of testing examples
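
The evaluation procedure above can be sketched as follows. The function names, the `test_fraction` and `seed` parameters, and the toy examples are assumptions for illustration; each example is assigned to one of the two sets by an independent random draw, and accuracy is the fraction of testing examples classified correctly.

```python
import random

def train_test_split(examples, test_fraction=0.5, seed=0):
    """Randomly assign each example to exactly one of the two sets."""
    rng = random.Random(seed)
    train, test = [], []
    for example in examples:
        (test if rng.random() < test_fraction else train).append(example)
    return train, test

def accuracy(classify, test_data):
    """# of correct classifications on testing data / # of testing examples."""
    correct = sum(1 for example, label in test_data if classify(example) == label)
    return correct / len(test_data)  # assumes test_data is non-empty

# Toy held-out examples and a trivial classifier, both made up for illustration.
held_out = [
    ({"Patrons": "Some"}, "Yes"),
    ({"Patrons": "Full"}, "No"),
    ({"Patrons": "None"}, "No"),
    ({"Patrons": "Some"}, "Yes"),
]
always_no = lambda example: "No"
print(accuracy(always_no, held_out))  # 0.5
```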
SLIDE 38

Outline

  • Motivation
  • Supervised learning
  • Decision Trees
  • ML Bias
SLIDE 39

ML Applications

  • Learned algorithms are different from classical algorithms
  • A learned algorithm depends on training data
  • A learned algorithm depends on features
  • Learned algorithms are being embedded everywhere
  • Traditional applications
  • spell checking
  • ad targeting
  • recommendations
  • speech & handwriting recognition
  • computer vision
SLIDE 40

ML Applications

  • Newer applications
  • News feeds, self-driving cars
  • insurance companies, medical systems, K-12 education
  • Risk assessment
  • These have serious impact on society
  • should news be tailored to your political beliefs?
  • should car save driver or pedestrian?
  • should we deny freedoms based on risk assessments?
SLIDE 41

Risk Assessment

  • Learned algorithms that predict risk
  • Insurance premiums
  • Credit
  • Crime
  • Recidivism
  • Bonds
  • Sentencing
SLIDE 42

Criminal Risk Assessment

  • ML in criminal justice system
  • better predict who will commit new crimes
  • Purported goals
  • a fairer system
  • fewer people in jail
  • for less time
SLIDE 43

Criminal Risk Assessment Tools

  • COMPAS by Northpointe predicts
  • Risk of new violent crime
  • Risk of general recidivism
  • Pretrial risk (failure to appear)
  • Public Safety Assessment by Arnold Foundation
  • helps judges make release/detention decisions
SLIDE 44

Criminal Risk Assessment Tools

  • Used in
  • Arizona, Colorado, Delaware, Kentucky, Louisiana, Maryland, New York, Oklahoma, Virginia, Washington, Wisconsin, …
  • Often adopted without independent study of effectiveness
  • Example
  • New York started using the tool in 2001
  • Studied it only in 2012
SLIDE 45

ProPublica Study

  • In 2016 ProPublica conducted a study of COMPAS
  • 7000 arrests in Broward County, FL
  • between 2013 and 2014
  • OK predictions for all crimes (misdemeanors included)
  • 61% of people labeled high risk committed new crimes
  • But unreliable for violent crimes
  • 20% of people labeled high risk committed new violent crimes
SLIDE 46

ProPublica Study

  • Found significant racial disparities
  • Among people who didn’t re-offend, the share labeled high risk was
  • 44.9% for African Americans
  • 23.5% for Whites
  • Among people who did re-offend, the share labeled low risk was
  • 28% for African Americans
  • 47.7% for Whites
  • Study accounted for
  • Criminal history, age and gender
SLIDE 47

Algorithms

  • Algorithms are impacting our lives
  • You are learning how to
  • design algorithms
  • analyze algorithms
  • implement algorithms
  • But don’t forget that…
  • …your algorithms will impact people
SLIDE 48

Algorithms

  • Algorithms can do a lot of amazing things
  • But they can also
  • deny people their freedom
  • addict people
  • sway elections
  • expose people’s private lives
SLIDE 49

References

  • Artificial Intelligence - A Modern Approach by S. Russell & P. Norvig
  • Machine Bias by ProPublica
  • https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  • New York State COMPAS-Probation Risk and Need Assessment Study
  • http://www.northpointeinc.com/downloads/research/DCJS_OPCA_COMPAS_Probation_Validity.pdf
  • Evaluating the Predictive Validity of the COMPAS Risk and Needs Assessment System by Northpointe
  • http://www.northpointeinc.com/files/publications/Criminal-Justice-Behavior-COMPAS.pdf
SLIDE 50

References

  • Fairness, Accountability and Transparency in ML
  • http://fatml.mysociety.org
  • Center for Humane Technology
  • http://humanetech.com/
SLIDE 51

References

  • Slide #2
  • Toothless from “How to Train Your Dragon”