Artificial Intelligence: Representation and Problem Solving
15-381, April 17, 2007

Probabilistic Learning
Michael S. Lewicki, Carnegie Mellon
Reminder
- No class on Thursday - spring carnival.
[Figure: decision regions; in region R_23, class C_2 is misclassified as C_3.]
<2 years at current job?   missed payments?   defaulted?
N                          N                  N
Y                          N                  Y
N                          N                  N
N                          N                  N
N                          Y                  Y
Y                          N                  N
N                          Y                  N
N                          Y                  Y
Y                          N                  N
Y                          N                  N
Mushroom data

 #   EDIBLE?     CAP-SHAPE   CAP-SURFACE
 1   edible      flat        fibrous
 2   poisonous   convex      smooth
 3   edible      flat        fibrous
 4   edible      convex      scaly
 5   poisonous   convex      smooth
 6   edible      convex      fibrous
 7   poisonous   flat        scaly
 8   poisonous   flat        scaly
 9   poisonous   convex      fibrous
10   poisonous   convex      fibrous
11   poisonous   flat        smooth
12   edible      convex      smooth
13   poisonous   knobbed     scaly
14   poisonous   flat        smooth
15   poisonous   flat        fibrous
The input is a set of T observations, each an N-dimensional vector (binary, discrete, or continuous). The model (e.g. a decision tree) is defined by M parameters.
[Figure: iris data, petal width (cm) vs. petal length (cm).]
[Figure: decision boundary.]
Counting outcomes for each input configuration in the credit data (x1 = <2 years at current job?, x2 = missed payments?; C1 = did default, C2 = did not default):

x1   x2   C1 (did default)   C2 (did not default)
N    N    0/3                3/3
N    Y    2/3                1/3
Y    N    1/4                3/4
Y    Y    0/0                0/0
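These fractions can be checked by counting directly over the ten records. A minimal sketch in Python (the tuples transcribe the credit table above):

```python
from collections import Counter

# Credit records from the slides:
# (<2 years at current job?, missed payments?, defaulted?)
data = [
    ("N", "N", "N"), ("Y", "N", "Y"), ("N", "N", "N"), ("N", "N", "N"),
    ("N", "Y", "Y"), ("Y", "N", "N"), ("N", "Y", "N"), ("N", "Y", "Y"),
    ("Y", "N", "N"), ("Y", "N", "N"),
]

outcome = Counter()   # (x1, x2, defaulted?) -> count
totals = Counter()    # (x1, x2) -> count
for x1, x2, d in data:
    outcome[(x1, x2, d)] += 1
    totals[(x1, x2)] += 1

for x1 in "NY":
    for x2 in "NY":
        n = totals[(x1, x2)]
        c1 = outcome[(x1, x2, "Y")]   # C1: did default
        c2 = outcome[(x1, x2, "N")]   # C2: did not default
        print(f"x1={x1} x2={x2}: C1 {c1}/{n}  C2 {c2}/{n}")
```

Note the (Y, Y) configuration never occurs in the data, so its counts are 0/0 — a problem the Bayesian estimation below addresses.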
posterior = likelihood × prior / normalizing constant:

p(θ | D) = p(D | θ) p(θ) / p(D)

We know the likelihood; what about the prior? A uniform prior on [0, 1] is a reasonable assumption, i.e. "we don't know anything."

What is the form of the posterior? With a uniform prior, the posterior is just proportional to the likelihood:

p(θ | D) ∝ p(D | θ)
What do we know initially, before observing any trials?
What is our belief about θ after observing one "tail"?
Now after two trials we observe 1 head and 1 tail.
3 trials: 1 head and 2 tails.
4 trials: 1 head and 3 tails.
5 trials: 1 head and 4 tails.
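The evolving posterior in this sequence can be sketched numerically. Under a uniform prior and a Bernoulli likelihood, the unnormalized posterior after h heads and t tails is θ^h (1−θ)^t (a Beta(h+1, t+1) shape). The grid size and variable names below are illustrative, not from the slides:

```python
import numpy as np

def unnorm_posterior(theta, heads, tails):
    """Uniform prior times Bernoulli likelihood: theta^h * (1-theta)^t."""
    return theta**heads * (1 - theta)**tails

theta = np.linspace(0.0, 1.0, 101)   # grid over the bias parameter

# Replay the slide sequence: 0 trials, then 1 tail, 1H/1T, ..., 1H/4T.
for h, t in [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (1, 4)]:
    p = unnorm_posterior(theta, h, t)
    p = p / p.sum()                   # normalize on the grid
    mean = (theta * p).sum()          # approximates (h+1)/(h+t+2)
    print(f"{h} heads, {t} tails: posterior mean ≈ {mean:.3f}")
```

Unlike the ratio h/(h+t), the posterior mean is well defined even for zero trials (it is 1/2) and never collapses to exactly 0 or 1 after a few observations.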
Bayes, in his original 1763 paper, showed that with a uniform prior the posterior expectation after y successes in n trials is E[θ | y, n] = (y + 1)/(n + 2).
The MAP and ratio estimates would both say θ = y/n = 0. Does this make sense? What would a better estimate be?
What happens for zero trials?
 #   EDIBLE?     CAP-SHAPE   CAP-SURFACE   CAP-COLOR   ODOR    STALK-SHAPE   POPULATION   HABITAT
 1   edible      flat        fibrous       red         none    tapering      several      woods
 2   poisonous   convex      smooth        red         foul    tapering      several      paths
 3   edible      flat        fibrous       brown       none    tapering      abundant     grasses
 4   edible      convex      scaly         gray        none    tapering      several      woods
 5   poisonous   convex      smooth        red         foul    tapering      several      woods
 6   edible      convex      fibrous       gray        none    tapering      several      woods
 7   poisonous   flat        scaly         brown       fishy   tapering      several      leaves
 8   poisonous   flat        scaly         brown       spicy   tapering      several      leaves
 9   poisonous   convex      fibrous       yellow      foul    enlarging     several      paths
10   poisonous   convex      fibrous       yellow      foul    enlarging     several      woods
11   poisonous   flat        smooth        brown       spicy   tapering      several      woods
12   edible      convex      smooth        yellow      anise   tapering      several      woods
13   poisonous   knobbed     scaly         red         foul    tapering      several      leaves
14   poisonous   flat        smooth        brown       foul    tapering      several      leaves
15   poisonous   flat        fibrous       gray        foul    enlarging     several      woods
16   edible      sunken      fibrous       brown       none    enlarging     solitary     urban
17   poisonous   flat        smooth        brown       foul    tapering      several      woods
18   poisonous   convex      smooth        white       foul    tapering      scattered    urban
19   poisonous   flat        scaly         yellow      foul    enlarging     solitary     paths
20   edible      convex      fibrous       gray        none    tapering      several      woods
Attributes (# values):

EDIBLE (2): edible poisonous
CAP-SHAPE (6): bell conical convex flat knobbed sunken
CAP-SURFACE (4): fibrous grooves scaly smooth
CAP-COLOR (10): brown buff cinnamon gray green pink purple red white yellow
BRUISES (2): bruises no
ODOR (9): almond anise creosote fishy foul musty none pungent spicy
GILL-ATTACHMENT (2): attached free
GILL-SPACING (2): close crowded
GILL-SIZE (2): broad narrow
GILL-COLOR (12): black brown buff chocolate gray green orange pink purple red white yellow
STALK-SHAPE (2): enlarging tapering
STALK-ROOT (4): bulbous club equal rooted
STALK-SURFACE-ABOVE-RING (4): fibrous scaly silky smooth
STALK-SURFACE-BELOW-RING (4): fibrous scaly silky smooth
STALK-COLOR-ABOVE-RING (9): brown buff cinnamon gray orange pink red white yellow
STALK-COLOR-BELOW-RING (9): brown buff cinnamon gray orange pink red white yellow
VEIL-TYPE (2): partial universal
VEIL-COLOR (4): brown orange white yellow
RING-NUMBER (3): none one two
RING-TYPE (5): evanescent flaring large none pendant
SPORE-PRINT… (9): (values truncated in source)
POPULATION (6): abundant clustered numerous scattered several solitary
HABITAT (7): grasses leaves meadows paths urban waste woods
Naive Bayes assumption: the class-conditional likelihood factors over the attributes,

p(x | C_k) = ∏_{n=1}^{N} p(x_n | C_k)
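As a sketch, this factored likelihood can be fit by counting on the 15-row mushroom table from earlier (cap shape and surface only). Add-one smoothing is my addition here, to avoid zero counts; it is not specified on the slides:

```python
from collections import Counter, defaultdict

# Mushroom rows from the slides: (class, cap-shape, cap-surface)
data = [
    ("edible", "flat", "fibrous"), ("poisonous", "convex", "smooth"),
    ("edible", "flat", "fibrous"), ("edible", "convex", "scaly"),
    ("poisonous", "convex", "smooth"), ("edible", "convex", "fibrous"),
    ("poisonous", "flat", "scaly"), ("poisonous", "flat", "scaly"),
    ("poisonous", "convex", "fibrous"), ("poisonous", "convex", "fibrous"),
    ("poisonous", "flat", "smooth"), ("edible", "convex", "smooth"),
    ("poisonous", "knobbed", "scaly"), ("poisonous", "flat", "smooth"),
    ("poisonous", "flat", "fibrous"),
]

class_counts = Counter(c for c, *_ in data)
# feature_counts[n][(c, v)] = times attribute n took value v in class c
feature_counts = [defaultdict(int), defaultdict(int)]
values = [set(), set()]
for c, *x in data:
    for n, v in enumerate(x):
        feature_counts[n][(c, v)] += 1
        values[n].add(v)

def posterior(x):
    """Normalized p(Ck) * prod_n p(xn|Ck), with add-one smoothing."""
    scores = {}
    for c, nc in class_counts.items():
        p = nc / len(data)
        for n, v in enumerate(x):
            p *= (feature_counts[n][(c, v)] + 1) / (nc + len(values[n]))
        scores[c] = p
    total = sum(scores.values())
    return {c: p / total for c, p in scores.items()}

print(posterior(("flat", "fibrous")))
```

Even though two of the three flat/fibrous rows are edible, the smoothed posterior here favors poisonous: with only 15 examples, the 10/15 class prior dominates the two weak per-attribute likelihood ratios.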
Each column is a distinct eight-dimensional binary feature. There are five underlying causal feature patterns. What are they? [Figure: true hidden causes of the data; inferred causes of the data.]
p(S_i = 1 | S_j : j ∈ parents(i)) = σ( Σ_j S_j w_ji ),  where σ(a) = 1 / (1 + e^−a)
[Figure: the true generative model, and patterns sampled from the model.] Can we infer the structure of the network given only the patterns?
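As a sketch of "patterns sampled from the model": assuming each binary unit S_i turns on with probability σ(Σ_j S_j w_ji + b_i) given its earlier-numbered parents (the biases and all weights below are illustrative, not the lecture's network), ancestral sampling generates patterns:

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_pattern(weights, biases, rng):
    """Ancestral sampling: units are ordered so each unit's parents come
    earlier; S_i ~ Bernoulli(sigmoid(sum_j S_j * w[j][i] + b[i]))."""
    s = []
    for i, b in enumerate(biases):
        a = b + sum(s[j] * weights[j][i] for j in range(i))
        s.append(1 if rng.random() < sigmoid(a) else 0)
    return s

# Toy 3-unit network (hypothetical weights): unit 0 is a root,
# with a strong positive weight 0->1 and a strong negative weight 0->2.
W = [[0, 6, -6],
     [0, 0, 0],
     [0, 0, 0]]
b = [0.0, -3.0, 3.0]

rng = random.Random(0)
patterns = [sample_pattern(W, b, rng) for _ in range(1000)]
frac_s1_given_s0 = (
    sum(p[1] for p in patterns if p[0] == 1)
    / max(1, sum(p[0] for p in patterns))
)
# The true conditional is sigmoid(-3 + 6) = sigmoid(3) ≈ 0.95.
print(f"P(S1=1 | S0=1) ≈ {frac_s1_given_s0:.2f}")
```

Structure learning then asks the reverse question: recover W from the sampled patterns alone.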
The first layer of weights learns that patterns are combinations of lines; the second layer learns combinations of the first-layer features.
Once the structure is learned, Gibbs updating converges in two sweeps.