SLIDE 1

ARTIFICIAL INTELLIGENCE

Lecturer: Silja Renooij

Supervised learning: classification

Utrecht University The Netherlands

These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

INFOB2KI 2019-2020

SLIDE 3

Requirements

Supervised learning algorithms for classification learn the relation between

  • class‐labels (the things to predict) and
  • feature/attribute values (observable things).

Various algorithms use probabilistic relations between class and features. The required probabilities can be assessed from the data using frequency counting.

SLIDE 4

When to play tennis?

Example dataset D: N=14 cases, 4 attributes, 1 class variable (the standard PlayTennis dataset; the values below are consistent with all counts used on the later slides):

       Outlook   Temp  Humidity  Wind    PlayTennis
  D1   sunny     hot   high      weak    no
  D2   sunny     hot   high      strong  no
  D3   overcast  hot   high      weak    yes
  D4   rain      mild  high      weak    yes
  D5   rain      cool  normal    weak    yes
  D6   rain      cool  normal    strong  no
  D7   overcast  cool  normal    strong  yes
  D8   sunny     mild  high      weak    no
  D9   sunny     cool  normal    weak    yes
  D10  rain      mild  normal    weak    yes
  D11  sunny     mild  normal    strong  yes
  D12  overcast  mild  high      strong  yes
  D13  overcast  hot   normal    weak    yes
  D14  rain      mild  high      strong  no

SLIDE 5

Frequency counting

Our example class variable PlayTennis (PT) has 2 possible values (yes, no); feature Temperature has 3 values (hot, mild, cool). With N=14 cases, frequency counting gives the following prior probabilities for the class labels:

  • 9 out of N=14 examples are positive ⇒ p(PT=yes) = 9/14
  • 5 out of these 14 are negative ⇒ p(PT=no) = 5/14

Similarly, the conditional probabilities for the features given the class can be determined. E.g. given PT=yes, we find that out of the 9 cases, 2 were in hot conditions, 4 in mild and 3 in cool:

  p(Temp=hot  | PT=yes) = 2/9
  p(Temp=mild | PT=yes) = 4/9
  p(Temp=cool | PT=yes) = 3/9
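The counting is easy to mechanise. A minimal sketch, assuming the (Temperature, PlayTennis) pairs of the 14 cases as listed on the data slide; `Fraction` keeps the results as exact ratios like 9/14:

```python
from collections import Counter
from fractions import Fraction

# (Temperature, PlayTennis) for the 14 cases of the example dataset
cases = [("hot", "no"), ("hot", "no"), ("hot", "yes"), ("mild", "yes"),
         ("cool", "yes"), ("cool", "no"), ("cool", "yes"), ("mild", "no"),
         ("cool", "yes"), ("mild", "yes"), ("mild", "yes"), ("mild", "yes"),
         ("hot", "yes"), ("mild", "no")]

N = len(cases)
class_counts = Counter(pt for _, pt in cases)          # {"yes": 9, "no": 5}
prior = {pt: Fraction(c, N) for pt, c in class_counts.items()}
# p(PT=yes) = 9/14, p(PT=no) = 5/14

pair_counts = Counter(cases)
cond = {(t, pt): Fraction(pair_counts[(t, pt)], class_counts[pt])
        for t in ("hot", "mild", "cool") for pt in ("yes", "no")}
# e.g. p(Temp=hot | PT=yes) = 2/9
```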

SLIDE 6

Naïve Bayes classifier

Supervised learning of naive Bayes classifier

updated given forecast….

SLIDE 7

Naive Bayes classifier: learning

A naive Bayes classifier specifies

  • a class variable C
  • feature variables F1,…,Fn
  • a prior distribution p(C); probabilities sum to one (!)
  • conditional distributions p(Fi|C); probabilities sum to one for each value C=c

Distributions p(C) and p(Fi|C) can be ‘learned’ from data, e.g. with a simple approach: frequency counting.

A more sophisticated approach also learns the ‘structure’ of the model, i.e. determines which features to include ⇒ requires a performance measure (e.g. accuracy).

SLIDE 8

Naive Bayes classifier: use

A naive Bayes classifier predicts a most likely value c for class C given observed features Fi = fi from:

  c = argmax_c p(C=c | f1,…,fn) = argmax_c (1/Z) · p(C=c) · ∏i p(Fi=fi | C=c)

where 1/Z = 1/p(F1,…,Fn) is a normalisation constant. This formula is based on

  • Bayes’ rule: p(A|B) = p(B|A)·p(A)/p(B)
  • the naive assumption that all n feature variables are independent given the class variable.
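The prediction rule (prior times the product of conditionals, maximised over class values) fits in a few lines. A sketch with illustrative names; note that 1/Z is the same for every class value, so the argmax can skip the normalisation entirely:

```python
from math import prod

def nb_predict(prior, cond, evidence):
    """Most likely class value given observed feature values.

    prior:    {class_value: p(C=c)}
    cond:     {(feature, value, class_value): p(F=f | C=c)}
    evidence: {feature: observed value}
    1/Z cancels in the argmax, so unnormalised scores suffice.
    """
    def score(c):
        return prior[c] * prod(cond[(f, v, c)] for f, v in evidence.items())
    return max(prior, key=score)

# the PlayTennis numbers from the example slides
prior = {"yes": 9/14, "no": 5/14}
cond = {("O", "sunny", "yes"): 2/9, ("O", "sunny", "no"): 3/5,
        ("T", "hot", "yes"): 2/9, ("T", "hot", "no"): 2/5,
        ("H", "normal", "yes"): 6/9, ("H", "normal", "no"): 1/5,
        ("W", "weak", "yes"): 6/9, ("W", "weak", "no"): 2/5}
prediction = nb_predict(prior, cond,
                        {"O": "sunny", "T": "hot", "H": "normal", "W": "weak"})
```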

SLIDE 9

Learn NBC - example

Model ‘structure’ is fixed; we just need probabilities from the data.

Class variable: PlayTennis
Feature variables: Outlook, Temp., Humidity, Wind

Class priors: P(C) = P(PlayTennis) = { p(PlayTennis=yes) = 9/14, p(PT=no) = 5/14 }

Conditionals p(Fi|C):

              PT=yes   PT=no
  O=sunny     2/9      3/5
  O=overcast  4/9      0/5
  O=rain      3/9      2/5
  T=hot       2/9      2/5
  T=mild      4/9      2/5
  T=cool      3/9      1/5
  H=high      3/9      4/5
  H=normal    6/9      1/5
  W=weak      6/9      2/5
  W=strong    3/9      3/5


Probabilities based on frequency counting.

SLIDE 10

Classify with NBC - example

Class variable: PT; feature variables: O, T, H, W
Class priors: { p(PT=yes) = 9/14, p(PT=no) = 5/14 }

Conditionals p(Fi|C):

              PT=yes   PT=no
  O=sunny     2/9      3/5
  O=overcast  4/9      0/5
  O=rain      3/9      2/5
  T=hot       2/9      2/5
  T=mild      4/9      2/5
  T=cool      3/9      1/5
  H=high      3/9      4/5
  H=normal    6/9      1/5
  W=weak      6/9      2/5
  W=strong    3/9      3/5

Classify ‘instance’ e = <O=sunny, T=hot, H=normal, W=weak>:

  p(PT=yes | e) = 1/Z · 9/14 · 2/9 · 2/9 · 6/9 · 6/9 = 1/Z · 0.01411
                > p(PT=no | e) = 1/Z · 5/14 · 3/5 · 2/5 · 1/5 · 2/5 = 1/Z · 0.00686
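The two competing (unnormalised) scores for this instance can be checked directly:

```python
# prior * O=sunny * T=hot * H=normal * W=weak, for each class value
p_yes = 9/14 * 2/9 * 2/9 * 6/9 * 6/9
p_no  = 5/14 * 3/5 * 2/5 * 1/5 * 2/5

print(round(p_yes, 5), round(p_no, 5))  # 0.01411 0.00686
print(p_yes > p_no)                     # True -> classify as PT=yes
```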

SLIDE 11

NBC Properties

  • NBC learning is complete (probabilistic: it can handle inconsistencies in the data)
  • NBC learning is not optimal (unrealistic independence assumptions ⇒ class posterior often unreliable; yet accurate prediction of the most likely value)
  • Time and space complexity: the independence assumptions strongly reduce dimensionality
  • NBC can overfit on the training data (especially with a large number of features)
  • NBC has been further optimized ⇒ TAN/FAN/KDB

SLIDE 12

Decision tree learning

Supervised learning of a decision tree classifier by means of ‘splitting’ on attributes:
  1. What is that?
  2. How to split? (ID3)

SLIDE 13

Example data set: when to play tennis, again

SLIDE 14

Decision Tree splits I

Let’s start building the tree from scratch ⇒ we first need to decide on which attribute to make a decision. Let’s say[1] we selected “Humidity”; split the data according to the attribute’s values:

[Tree]
Humidity
  high:   D1,D2,D3,D4,D8,D12,D14
  normal: D5,D6,D7,D9,D10,D11,D13


[1] NB: using ID3, this choice will be made by the algorithm…

SLIDE 15

Decision Tree splits - II

Now let’s split the first subset (H=high) D1,D2,D3,D4,D8,D12,D14 using attribute “Wind”:

[Tree]
Humidity
  high: Wind
    strong: D2,D12,D14
    weak:   D1,D3,D4,D8
  normal: D5,D6,D7,D9,D10,D11,D13

slide-16
SLIDE 16

Decision Tree splits - III

Now let’s split the subset H=high & W=strong (D2,D12,D14) using attribute “Outlook”

[Tree]
Humidity
  high: Wind
    strong: Outlook (sunny: No, overcast: Yes, rain: No)
    weak:   D1,D3,D4,D8
  normal: D5,D6,D7,D9,D10,D11,D13

entire subset classified

SLIDE 17

Decision Tree splits - IV

Now let’s split the subset H=high & W=weak (D1,D3,D4,D8) using attribute “Outlook”:

[Tree]
Humidity
  high: Wind
    strong: Outlook (sunny: No, overcast: Yes, rain: No)
    weak:   Outlook (sunny: No, overcast: Yes, rain: Yes)
  normal: D5,D6,D7,D9,D10,D11,D13

SLIDE 18

Decision Tree splits – V

Now let’s split the subset H= normal (D5,D6,D7,D9,D10,D11,D13) using “Outlook”

[Tree]
Humidity
  high: Wind
    strong: Outlook (sunny: No, overcast: Yes, rain: No)
    weak:   Outlook (sunny: No, overcast: Yes, rain: Yes)
  normal: Outlook (sunny: Yes, overcast: Yes, rain: D5,D6,D10)

SLIDE 19

Decision Tree splits – VI

Now let’s split subset H=normal & O=rain (D5,D6,D10) using “Wind”

[Tree]
Humidity
  high: Wind
    strong: Outlook (sunny: No, overcast: Yes, rain: No)
    weak:   Outlook (sunny: No, overcast: Yes, rain: Yes)
  normal: Outlook (sunny: Yes, overcast: Yes, rain: Wind (strong: No, weak: Yes))

SLIDE 20

Final Decision Tree

Note: the decision tree can be expressed as a set of if‐then‐else sentences, or – in case of binary outcomes – as a logical formula. For PlayTennis = yes:

  (humidity=high ∧ wind=strong ∧ outlook=overcast)
  ∨ (humidity=high ∧ wind=weak ∧ outlook=overcast)
  ∨ (humidity=high ∧ wind=weak ∧ outlook=rain)
  ∨ (humidity=normal ∧ outlook=sunny)
  ∨ (humidity=normal ∧ outlook=overcast)
  ∨ (humidity=normal ∧ outlook=rain ∧ wind=weak)

[Tree]
Humidity
  high: Wind
    strong: Outlook (sunny: No, overcast: Yes, rain: No)
    weak:   Outlook (sunny: No, overcast: Yes, rain: Yes)
  normal: Outlook (sunny: Yes, overcast: Yes, rain: Wind (strong: No, weak: Yes))

SLIDE 21

Classifying with Decision Trees

Now classify instance <O=sunny, T=hot, H=normal, W=weak> = ???

[Tree]
Humidity
  high: Wind
    strong: Outlook (sunny: No, overcast: Yes, rain: No)
    weak:   Outlook (sunny: No, overcast: Yes, rain: Yes)
  normal: Outlook (sunny: Yes, overcast: Yes, rain: Wind (strong: No, weak: Yes))

SLIDE 22

Classifying with Decision Trees

Now classify instance <O=sunny, T=hot, H=normal, W=weak>: following the tree, H=normal ⇒ Outlook=sunny ⇒ Yes.

Note that this was an ‘unseen’ instance (not in data).

SLIDE 23

Alternative Decision Trees

Another tree can be built from the same data, using different attributes. We can build quite a large number of (unique) decision trees… So which attribute should we choose at the branches?

SLIDE 24

ID3: an entropy-based decision tree learner

SLIDE 25

Entropy

A measure of the disorder or randomness in a closed system with variable(s) of interest S:

  Entropy(S) = − Σ_{i=1}^{n} p_i · log2 p_i

where n = |S| is the number of values of S.

  • Convention: 0 log2 0 = 0
  • For a degenerate distribution, the entropy will be 0 (why?)
  • For a uniform distribution, the entropy will be log2 n (= 1 for a binary‐valued variable)
  • Recall: log2 x = logb x / logb 2 for any base‐b logarithm

SLIDE 26

Entropy: example

In our system we have 1 variable of interest (S = PlayTennis), with 2 possible values (yes, no) ⇒ n = |S| = 2. Let p+ = p(PT=yes) and p− = p(PT=no); we again use frequency counting to establish these probabilities from the data; recall:

  • 9 out of N=14 examples are positive ⇒ p+ = 9/14
  • 5 out of these 14 are negative ⇒ p− = 5/14

⇒ Entropy(PlayTennis) = − p+ log2 p+ − p− log2 p−
                       = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
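The computation can be checked with a small helper that also illustrates the degenerate and uniform special cases:

```python
from math import log2

def entropy(probs):
    """Entropy(S) = -sum p_i * log2(p_i), with the convention 0*log2(0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_pt = entropy([9/14, 5/14])   # ~0.940, the PlayTennis prior
h_uni = entropy([0.5, 0.5])    # 1.0: uniform binary distribution (log2 2)
h_deg = entropy([1.0, 0.0])    # 0.0: degenerate distribution
```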

SLIDE 27

Conditional & Expected Entropy

Conditional entropy Entropy(S | X) is the entropy we expect in a system S when another variable X is given; it is the expected value of the entropy over the possible values x of X:

  Entropy(S | X) = Σ_x p(x) · Entropy(S | X = x)

where, for a specific value x of X:

  Entropy(S | X = x) = − Σ_{i=1}^{n} p(s_i | x) · log2 p(s_i | x), with n = |S|

NB we will use the following short‐hand notations (!):

  • Entropy(SX) for Entropy(S | X) = conditional entropy
  • Entropy(Sx) for Entropy(S | X = x) = entropy given specific x

SLIDE 28

Conditional Entropy - example

For example, we can evaluate the attribute “Temperature”, which has 3 values: hot, mild, cool. So we need to consider 3 subsystems: Shot, Smild, Scool.

For each subsystem, probabilities are assessed from a subset of the data D:

  Dhot  = {D1,D2,D3,D13}            ⇒ p(hot)  = 4/14
  Dmild = {D4,D8,D10,D11,D12,D14}   ⇒ p(mild) = 6/14
  Dcool = {D5,D6,D7,D9}             ⇒ p(cool) = 4/14

Now first compute the entropy in the subsystems: Entropy(Shot), Entropy(Smild), Entropy(Scool). We can then evaluate each attribute by calculating how much change it makes in entropy.

SLIDE 29

Conditional Entropy example II

  • Dhot = {D1(−),D2(−),D3(+),D13(+)}
    ⇒ p+|hot = 0.5 and p−|hot = 0.5
    ⇒ Entropy(Shot) = − 0.5 log2 0.5 − 0.5 log2 0.5 = 1

  • Dmild = {D4(+),D8(−),D10(+),D11(+),D12(+),D14(−)}
    ⇒ p+|mild = 0.666 and p−|mild = 0.333
    ⇒ Entropy(Smild) = − 0.666 log2 0.666 − 0.333 log2 0.333 = 0.918

  • Dcool = {D5(+),D6(−),D7(+),D9(+)}
    ⇒ p+|cool = 0.75 and p−|cool = 0.25
    ⇒ Entropy(Scool) = − 0.75 log2 0.75 − 0.25 log2 0.25 = 0.811

SLIDE 30

Conditional Entropy example III

The conditional entropy after splitting on “Temperature” now is:

  Entropy(STemperature)
    = p(hot)·Entropy(Shot) + p(mild)·Entropy(Smild) + p(cool)·Entropy(Scool)
    = (4/14)·1 + (6/14)·0.918 + (4/14)·0.811 = 0.9108

Okay: but does this mean we should split on this attribute??
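This weighted sum can be recomputed from the class distributions in the three subsystems; note the slide works with entropies rounded to three decimals (0.918, 0.811), which gives 0.9108 rather than the exact ~0.9111:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# (p(x), class distribution in subsystem) for each Temperature value
subsystems = [
    (4/14, [2/4, 2/4]),  # hot:  D1(-),D2(-),D3(+),D13(+)
    (6/14, [4/6, 2/6]),  # mild: 4 positive, 2 negative
    (4/14, [3/4, 1/4]),  # cool: 3 positive, 1 negative
]
cond_entropy = sum(p_x * entropy(dist) for p_x, dist in subsystems)
print(round(cond_entropy, 4))  # ~0.9111
```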

slide-31
SLIDE 31

Information Gain

We now define the Gain (reduction in entropy) of splitting on attribute X as:

  Gain(S, X) = Entropy(S) − Entropy(S | X)

  • Information gain is always a non‐negative value! (Why?)
  • If Entropy(SX) = 0, then all cases in SX are correctly classified

⇒ split on the attribute with the smallest conditional entropy; equivalently: split on the attribute with the highest gain.

SLIDE 32

Information Gain - example

The gain of splitting on “Temperature” is: Gain(S, Temp) = 0.940 − 0.9108 = 0.029. Compute the gain of splitting for all other attributes:

  • Gain(S, Outlook)  = 0.246
  • Gain(S, Humidity) = 0.151
  • Gain(S, Wind)     = 0.048

We therefore split on Outlook and repeat the process for:

  • S → Ssunny with D → Dsunny
  • S → Sovercast with D → Dovercast
  • S → Srain with D → Drain
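These gains can be recomputed end-to-end. A sketch, with the 14 cases written out as tuples; the values are taken from the standard PlayTennis dataset, which is consistent with all counts on the earlier slides:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c/n * log2(c/n) for c in Counter(labels).values())

# (Outlook, Temp, Humidity, Wind, PlayTennis) for D1..D14
data = [
    ("sunny", "hot", "high", "weak", "no"),        ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),     ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"), ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),  ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),  ("rain", "mild", "high", "strong", "no"),
]
labels = [row[-1] for row in data]
attrs = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Wind": 3}

def gain(attr):
    groups = defaultdict(list)
    for row in data:
        groups[row[attrs[attr]]].append(row[-1])
    cond = sum(len(g)/len(data) * entropy(g) for g in groups.values())
    return entropy(labels) - cond

for a in attrs:
    print(a, round(gain(a), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048 (slide values are rounded)
```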

SLIDE 33

ID3 (Decision Tree Algorithm)

Building a decision tree with the ID3 algorithm

  1. Start from an empty node
  2. Select the attribute with the most information gain
  3. Split: create the subsystems (children) for each value of the selected attribute
  4. For each associated subset of the data: if not all elements belong to the same class, then repeat steps 2–3 for the subset
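The four steps map directly onto a short recursive function. A minimal sketch, not the course's reference implementation: rows are (feature-tuple, label) pairs, attributes are feature indices, and "most gain" is implemented as "least conditional entropy", which is equivalent:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c/n * log2(c/n) for c in Counter(labels).values())

def id3(rows, attrs):
    """rows: list of (feature_tuple, label); attrs: indices of still-unused features."""
    labels = [lbl for _, lbl in rows]
    if len(set(labels)) == 1 or not attrs:           # step 4: subset is pure (or no attributes left)
        return Counter(labels).most_common(1)[0][0]  # leaf: (majority) class label

    def cond_entropy(a):                             # Entropy(S|X) for attribute index a
        groups = defaultdict(list)
        for feats, lbl in rows:
            groups[feats[a]].append(lbl)
        return sum(len(g)/len(rows) * entropy(g) for g in groups.values())

    best = min(attrs, key=cond_entropy)              # step 2: max gain = min conditional entropy
    children = defaultdict(list)                     # step 3: one child per attribute value
    for feats, lbl in rows:
        children[feats[best]].append((feats, lbl))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3(sub, rest) for v, sub in children.items()})
```

The returned tree is a nested (attribute_index, {value: subtree}) structure with class labels at the leaves.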

SLIDE 34

Domain for ID3 example

[Figure: a robot amid obstacles.] The robot can turn left & right, and move forward.

SLIDE 35

Cases for ID3 example

Data set S:

      LeftSensor  RightSensor  ForwardSensor  BackSensor  PreviousAction  Action
  X1  Obstacle    Free         Obstacle       Free        MoveForward     TurnRight
  X2  Free        Free         Obstacle       Free        TurnLeft        TurnLeft
  X3  Free        Obstacle     Free           Free        MoveForward     MoveForward
  X4  Free        Obstacle     Free           Obstacle    TurnLeft        MoveForward
  X5  Obstacle    Free         Free           Free        TurnRight       MoveForward
  X6  Free        Free         Free           Obstacle    TurnRight       MoveForward

SLIDE 36

ID3 Example

Entropy(S) = − 1/6·log2(1/6) − 1/6·log2(1/6) − 4/6·log2(4/6) = 1.25

Entropy(SLeftSensor)     = 2/6·Entropy(SLS=obstacle) + 4/6·Entropy(SLS=free) = 2/6·1 + 4/6·0.811 = 0.874
Entropy(SRightSensor)    = 2/6·Entropy(SRS=obstacle) + 4/6·Entropy(SRS=free) = 2/6·0 + 4/6·1.5 = 1
Entropy(SForwardSensor)  = 2/6·Entropy(SFS=obstacle) + 4/6·Entropy(SFS=free) = 2/6·1 + 4/6·0 = 0.333
Entropy(SBackSensor)     = 2/6·Entropy(SBS=obstacle) + 4/6·Entropy(SBS=free) = 2/6·0 + 4/6·1.5 = 1
Entropy(SPreviousAction) = 2/6·Entropy(SPA=MoveForw) + 2/6·Entropy(SPA=TurnL) + 2/6·Entropy(SPA=TurnR) = 2/6·1 + 2/6·1 + 2/6·0 = 0.666

Gain(S, LeftSensor)     = 1.25 − 0.874 = 0.376
Gain(S, RightSensor)    = 1.25 − 1     = 0.25
Gain(S, ForwardSensor)  = 1.25 − 0.333 = 0.917
Gain(S, BackSensor)     = 1.25 − 1     = 0.25
Gain(S, PreviousAction) = 1.25 − 0.666 = 0.584

Select ForwardSensor
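The selection can be verified by recomputing the gains from the six cases in the table (exact values come out slightly higher than the slide's, which rounds Entropy(S) to 1.25):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c/n * log2(c/n) for c in Counter(labels).values())

# X1..X6: (Left, Right, Forward, Back, PreviousAction) -> Action
cases = [
    (("Obs", "Free", "Obs", "Free", "MF"), "TR"),
    (("Free", "Free", "Obs", "Free", "TL"), "TL"),
    (("Free", "Obs", "Free", "Free", "MF"), "MF"),
    (("Free", "Obs", "Free", "Obs", "TL"), "MF"),
    (("Obs", "Free", "Free", "Free", "TR"), "MF"),
    (("Free", "Free", "Free", "Obs", "TR"), "MF"),
]
labels = [a for _, a in cases]

def gain(i):
    groups = defaultdict(list)
    for feats, a in cases:
        groups[feats[i]].append(a)
    cond = sum(len(g)/len(cases) * entropy(g) for g in groups.values())
    return entropy(labels) - cond

names = ["LeftSensor", "RightSensor", "ForwardSensor", "BackSensor", "PreviousAction"]
for i, name in enumerate(names):
    print(name, round(gain(i), 3))
# ForwardSensor has the largest gain (~0.918), so it is selected
```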

SLIDE 37

Decision Tree ID3 Example

[Tree]
ForwardSensor
  free:     MoveForward
  obstacle: {X1, X2} = S’

Entropy(S’) = −1/2·log2(1/2) − 1/2·log2(1/2) = 1   (X1: Action = TR; X2: Action = TL)

Entropy(S’LeftSensor)     = 1/2·Entropy(S’LS=obstacle) + 1/2·Entropy(S’LS=free) = 1/2·0 + 1/2·0 = 0 ⇒ Gain = 1 − 0 = 1
Entropy(S’RightSensor)    = 1·Entropy(S’RS=free) = 1·1 = 1 ⇒ Gain = 1 − 1 = 0
Entropy(S’BackSensor):    exactly the same ⇒ Gain = 1 − 1 = 0
Entropy(S’PreviousAction) = 1/2·Entropy(S’PA=MoveForw) + 1/2·Entropy(S’PA=TurnL) = 1/2·0 + 1/2·0 = 0 ⇒ Gain = 1 − 0 = 1

Select either LeftSensor or PreviousAction, depending on the execution order.

SLIDE 38

Decision Tree ID3 Example

Two resulting trees (one per choice of second split):

[Tree 1]
ForwardSensor
  free:     MoveForward
  obstacle: LeftSensor
              obstacle: TurnRight (X1)
              free:     TurnLeft  (X2)

[Tree 2]
ForwardSensor
  free:     MoveForward
  obstacle: PreviousAction
              MoveForward: TurnRight (X1)
              TurnLeft:    TurnLeft  (X2)

SLIDE 39

ID3 preference bias example I

Babylon 5 universe. Data set S:

      Race     Name      BeenToB5  GoodPerson
  D1  Minbari  Delenn    Yes       Yes
  D2  Minbari  Draal     Yes       Yes
  D3  Human    Morden    Yes       No
  D4  Narn     G’Kar     Yes       Yes
  D5  Human    Sheridan  Yes       Yes

p_yes = 0.8, p_no = 0.2
Entropy(S) = − 0.2·log2 0.2 − 0.8·log2 0.8 = 0.72

Split on Race:

  Dminbari = {D1(+),D2(+)} ⇒ Entropy(Sminbari) = 0
  Dhuman   = {D3(−),D5(+)} ⇒ Entropy(Shuman) = 1
  Dnarn    = {D4(+)}       ⇒ Entropy(Snarn) = 0

Entropy(SRace) = 2/5·0 + 2/5·1 + 1/5·0 = 2/5
⇒ Gain(S, Race) = 0.72 − 2/5 = 0.32

SLIDE 40

ID3 preference bias example II

Babylon 5 universe, same data set S (p_yes = 0.8, p_no = 0.2; Entropy(S) = 0.72).

Split on Name:

  DDelenn = {D1(+)} ⇒ Entropy(SDelenn) = 0
  DDraal  = {D2(+)} ⇒ Entropy(SDraal) = 0
  DMorden = {D3(−)}, DG’Kar = {D4(+)}, DSheridan = {D5(+)}: the entropies of all subsets are 0

⇒ Entropy(SName) = 0 ⇒ Gain(S, Name) = 0.72 − 0 = 0.72

SLIDE 41

ID3: Preference Bias

[Tree]
Name
  Delenn: Yes | Draal: Yes | Morden: No | G’Kar: Yes | Sheridan: Yes

ID3 prefers some trees over others:
  ‐ it favors shorter trees over longer ones
  ‐ it selects trees that place the attributes with the highest information gain closest to the root

Its bias is solely a consequence of the ordering of hypotheses by its search strategy.

SLIDE 42

ID3: Overfitting (illustrated)

  • Suppose we receive an additional data point

SLIDE 43

Extra data: effect on our tree

NB in the previous tree, instance <O=sunny, . , H=normal, . > was classified as PlayTennis = yes…
SLIDE 44

Effects of ID3 Overfitting

  • Trees may grow to include irrelevant attributes (e.g., Date, Color, etc.)
  • Noisy examples may add spurious nodes to the tree

SLIDE 45

ID3 Properties

  • ID3 is complete for consistent(!) training data
  • ID3 is not optimal (greedy hill‐climbing approach ⇒ no guarantees)
  • ID3 can overfit on the training data (accuracy of the learned model = prediction on a test set)
  • Use of information gain ⇒ preference bias
  • Continuous data: many more places to split an attribute ⇒ time‐consuming search for the best split
  • ID3 has been further optimized ⇒ e.g. C4.5 and C5.0
  • ID3 for iterative online learning: ID4

SLIDE 46

Which is more powerful?
