Machine Learning
CS 486/686: Introduction to Artificial Intelligence
Outline
- Forms of Learning
- Inductive Learning: Decision Trees

What is Machine Learning?
Definition: A computer program is said to learn from
experience E with respect to some class of tasks T and performance measures P, if its performance at tasks in T, as measured by P, improves with experience E.
[T. Mitchell, 1997]
Example: a handwriting recognition learning problem
- T: recognizing and classifying handwritten words within images
- P: percent of words correctly classified
- E: a database of handwritten words with given classifications
Example: a robot driving learning problem
- T: driving on public four-lane highways using vision sensors
- P: average distance traveled before an error was made (as judged by human overseer)
- E: a sequence of images and steering commands recorded while observing a human driver
Forms of Learning

Supervised learning: learn a function from examples of its inputs and outputs.

Unsupervised learning: learn patterns in the input when no specific output values are given.

Reinforcement learning: learn from feedback in the form of rewards (and punishments).
[Overview diagram relating the forms of learning; annotations: "Today's lecture" and "Special case for neural nets".]
Inductive Learning: Training Examples

Training examples for the target concept EnjoySport. Each instance x is described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast; the target value f(x) is the EnjoySport label.

Sky    AirTemp  Humidity  Wind    Water  Forecast | EnjoySport
Sunny  Warm     Normal    Strong  Warm   Same     | Yes
Sunny  Warm     High      Strong  Warm   Same     | Yes
Sunny  Warm     High      Strong  Warm   Change   | No
Sunny  Warm     High      Strong  Cool   Change   | Yes
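As a concrete illustration (not part of the original slides), the training set can be written down in code. The dict-based encoding and the variable names here are assumptions chosen for readability.

```python
# EnjoySport training data from the table above.
# Each example is a pair (x, f(x)): x is a dict of attribute values,
# f(x) is the target label "Yes"/"No".
ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

examples = [
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "Normal",
      "Wind": "Strong", "Water": "Warm", "Forecast": "Same"}, "Yes"),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High",
      "Wind": "Strong", "Water": "Warm", "Forecast": "Same"}, "Yes"),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High",
      "Wind": "Strong", "Water": "Warm", "Forecast": "Change"}, "No"),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High",
      "Wind": "Strong", "Water": "Cool", "Forecast": "Change"}, "Yes"),
]
```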
Representing Hypotheses

A hypothesis h is a conjunction of constraints on the attributes. Each constraint can be a specific required value (e.g., Warm), "?" (any value is acceptable), or "∅" (no value is acceptable). If an instance x satisfies all the constraints of h, then h classifies x as a positive example (h(x) = 1).

Example: the hypothesis "enjoys her favorite sport only on cold days with high humidity" is written <?, Cold, High, ?, ?, ?>.
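A minimal sketch of this representation, reusing the data encoding above; "?" matches any value (the "∅" case is omitted for brevity, and the helper name is an assumption).

```python
def satisfies(x, h):
    """True iff instance x (dict) satisfies every constraint of
    hypothesis h (dict: attribute -> required value or "?")."""
    return all(v == "?" or x[a] == v for a, v in h.items())

# "Enjoys her favorite sport only on cold days with high humidity":
# <?, Cold, High, ?, ?, ?>
h = {"Sky": "?", "AirTemp": "Cold", "Humidity": "High",
     "Wind": "?", "Water": "?", "Forecast": "?"}

x = examples[0][0]
print(satisfies(x, h))   # False: AirTemp is Warm, not Cold, so h(x) = 0
```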
The hypothesis space is the set of all hypotheses that the learner may consider regarding the target concept.
Ockham’s Razor: Prefer the simplest hypothesis consistent with the data.

Whether a consistent hypothesis can be found at all depends on the hypothesis space: a learning problem is realizable if the hypothesis space contains the target function, and unrealizable otherwise.
Decision Trees

A decision tree classifies instances by sorting them down the tree from root to leaf. Each internal node tests an attribute, each branch corresponds to one of the values that attribute can take, and each leaf assigns a classification.
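A sketch of this procedure; the nested-dict tree encoding is an assumption made for illustration.

```python
# A tree is either a leaf (a classification string) or a dict
# {"attribute": name, "branches": {value: subtree, ...}}.
def classify(tree, instance):
    """Sort an instance down the tree from root to leaf."""
    while isinstance(tree, dict):          # internal node: test an attribute
        value = instance[tree["attribute"]]
        tree = tree["branches"][value]     # follow the branch for that value
    return tree                            # leaf: the classification
```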
Example tree: the root tests Outlook. Outlook = Overcast leads directly to Yes; Outlook = Sunny leads to a test of Humidity (High: No, Normal: Yes); Outlook = Rain leads to a test of Wind (Strong: No, Weak: Yes).

An instance: <Outlook=Sunny, Temp=Hot, Humidity=High, Wind=Strong>. Classification: No.
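Using the `classify` sketch above, the example tree and the instance's classification look like this:

```python
example_tree = {
    "attribute": "Outlook",
    "branches": {
        "Overcast": "Yes",
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Rain":  {"attribute": "Wind",
                  "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

instance = {"Outlook": "Sunny", "Temp": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(classify(example_tree, instance))   # -> "No"
```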
Note: Decision trees represent disjunctions of conjunctions of constraints on attribute values
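For the example tree above, the positive (Yes) classification is exactly the disjunction

(Outlook = Overcast) ∨ (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Rain ∧ Wind = Weak).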
Expressiveness: decision trees are as expressive as propositional languages; any Boolean function of the attributes can be written as a decision tree. Each conjunction of attribute values corresponds with a path in the tree. Some functions, however, require exponentially large trees (e.g., the majority function, parity function).
Learning Decision Trees

Main loop: choose the best attribute A and use it as the test at the root of the (sub)tree; create a descendant for each value of A; sort the training examples down to the descendants; and recurse on each descendant that contains both positive and negative examples.
Choosing the best attribute: information gain. An attribute A with v distinct values partitions the examples E into subsets E1,...,Ev according to their values for A. For a set E with p positive and n negative examples, the entropy is

  H(E) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

The expected entropy remaining after the attribute test is

  Remainder(A) = Σi (|Ei|/|E|) H(Ei)

and the information gain is Gain(A) = H(E) - Remainder(A). Pick the attribute with the largest gain.
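A sketch of these computations, assuming the (instance, label) example format used earlier; `best_attribute` is a hypothetical helper name, not something from the slides.

```python
from collections import Counter
from math import log2

def entropy(examples):
    """H(E) for a list of (x, label) pairs."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(examples, attribute):
    """Gain(A) = H(E) - sum_i |Ei|/|E| * H(Ei)."""
    subsets = {}                       # partition E by the value of A
    for x, label in examples:
        subsets.setdefault(x[attribute], []).append((x, label))
    remainder = sum(len(s) / len(examples) * entropy(s)
                    for s in subsets.values())
    return entropy(examples) - remainder

def best_attribute(examples, attributes):
    """Choose the attribute with the largest information gain."""
    return max(attributes, key=lambda a: gain(examples, a))
```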
Assessing Performance

Methodology: collect a large set of examples and divide it into a training set and a test set. Learn h from the training set, then measure the percentage of test examples that are correctly classified by h. Repeat for a range of training-set sizes, using different randomly selected test sets for each size.
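A sketch of this methodology; the split ratio, seed handling, and helper names are assumptions, and `classify` is reused from the earlier sketch.

```python
import random

def accuracy(tree, test_set):
    """Fraction of test examples correctly classified by the tree."""
    correct = sum(classify(tree, x) == label for x, label in test_set)
    return correct / len(test_set)

def split(examples, test_fraction=0.25, seed=0):
    """Randomly divide examples into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)
```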
Learning curve: plotting test-set accuracy against training-set size shows that, as the training set grows, accuracy increases.
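A learning curve can then be sketched by re-running training on ever larger subsets; `learn_tree` stands in for the tree-growing algorithm and is assumed here, not defined by the slides.

```python
def learning_curve(examples, learn_tree, sizes, trials=10):
    """For each training-set size, average test accuracy over
    several random splits (one point on the curve per size)."""
    points = []
    for n in sizes:
        accs = []
        for t in range(trials):
            train, test = split(examples, test_fraction=0.25, seed=t)
            tree = learn_tree(train[:n])      # train on first n examples
            accs.append(accuracy(tree, test))
        points.append((n, sum(accs) / trials))
    return points
```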
Overfitting

The algorithm grows each branch of the tree just deep enough to perfectly classify the training examples. This can go wrong when the data are noisy, or when the training set is too small to be a representative sample of the true target function.
Definition: a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h′ ∈ H such that h has smaller error than h′ on the training examples, but h′ has smaller error than h over the entire distribution of instances:

  errorTr(h) < errorTr(h′)  but  errorTe(h′) < errorTe(h)

Overfitting has been observed to decrease the accuracy of learned decision trees by 10-25% on typical problems.
Avoiding Overfitting

One remedy is to prune irrelevant attribute tests using a statistical significance test: under the null hypothesis that the attribute is irrelevant, ask how likely it is that a sample of size v would exhibit the observed deviation from a perfectly random split; if the deviation is insignificant, prune the test.
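A sketch of such a test using SciPy's chi-squared distribution: the deviation statistic compares the observed positive/negative counts in each subset against the counts expected if the attribute were irrelevant. The function name and the 5% significance level are assumptions.

```python
from scipy.stats import chi2

def is_significant(subset_counts, alpha=0.05):
    """subset_counts: list of (pos_k, neg_k), one pair per attribute value.
    Returns True if the split deviates significantly from what an
    irrelevant attribute would produce (chi-squared test)."""
    p = sum(pk for pk, nk in subset_counts)
    n = sum(nk for pk, nk in subset_counts)
    dev = 0.0
    for pk, nk in subset_counts:
        size = pk + nk
        p_hat = p * size / (p + n)      # expected positives if irrelevant
        n_hat = n * size / (p + n)      # expected negatives if irrelevant
        if p_hat > 0:
            dev += (pk - p_hat) ** 2 / p_hat
        if n_hat > 0:
            dev += (nk - n_hat) ** 2 / n_hat
    dof = len(subset_counts) - 1        # degrees of freedom
    return chi2.sf(dev, dof) < alpha    # small p-value => keep the split
```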
Cross-validation: run k experiments, each time putting aside 1/k of the data to test on, and average the results.
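A minimal k-fold sketch, reusing the assumed `learn_tree` and the `accuracy` and `random` helpers from the earlier sketches.

```python
def cross_validation(examples, learn_tree, k=10, seed=0):
    """Run k experiments, each time putting aside 1/k of the data
    to test on; return the average test accuracy."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]    # k disjoint folds
    accs = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        accs.append(accuracy(learn_tree(train), test))
    return sum(accs) / k
```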