Model Building:
General Strategies, Data Pre-processing, and Partial Least Squares
Max Kuhn and Kjell Johnson Nonclinical Statistics, Pfizer
Monday, March 24, 2008
Objective: To construct a model of predictors that can be used to
[Figure: two panels with axes Predictor A and Predictor B]
problem
frequency of the second most common value
O[C@H](CCn1c(c2ccc(F)cc2)c(c2ccccc2)c(C(=O)Nc2ccccc2)c1C(C)C)C[C@@H](O)CC(=O)O
[Figure: Accuracy and Kappa plotted against log10 Cost]
Accuracy 81.8 84.2 (81.9, 86.3)
Kappa 0.63 0.68
[Figure: density plots of Accuracy and Kappa]
[Figure: plot with axes Predictor 1 and Predictor 2]
with only a few directions
resulting scores.
[Figure: Scatter of Predictors (axes: Predictor 2, Predictor 1)]
[Figure: Scatter of First PCA Scores with Response (axes: Response, First PCA Scores)]
[Diagram: Predictor1 through Predictor6 and Response1 through Response3]
[Diagram: Predictor1 through Predictor5 and Response1]
(many predictors, one response)
(many predictors, one response)
variability (although this will be tempered by the need to also be related to the response)
scores, and relationship with the response. Specifically, outliers can
– make it appear that there is no relationship between the predictors and response when there truly is a relationship, or
– make it appear that there is a relationship between the predictors and response when there truly is no relationship
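The two outlier effects described here can be illustrated with a small sketch. The data below are hypothetical and hand-constructed for illustration (they are not from the presentation), and the Pearson correlation is used only as a simple stand-in for "relationship between predictor and response".

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Case 1: an exactly linear relationship, then one extreme point that masks it.
x_related = np.arange(1.0, 11.0)         # 1, 2, ..., 10
y_related = 2.0 * x_related + 1.0        # response is linear in the predictor
x_masked = np.append(x_related, 30.0)    # one outlying predictor value (hypothetical)
y_masked = np.append(y_related, 4.6)     # paired with a response far below the line

# Case 2: no linear relationship, then one extreme point that manufactures one.
x_unrelated = np.arange(1.0, 9.0)        # 1, 2, ..., 8
y_unrelated = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
x_spurious = np.append(x_unrelated, 50.0)
y_spurious = np.append(y_unrelated, 50.0)

r_related = pearson(x_related, y_related)        # strong relationship
r_masked = pearson(x_masked, y_masked)           # relationship hidden by one point
r_unrelated = pearson(x_unrelated, y_unrelated)  # essentially no relationship
r_spurious = pearson(x_spurious, y_spurious)     # apparent relationship from one point

print(f"related: {r_related:.3f}, with outlier: {r_masked:.3f}")
print(f"unrelated: {r_unrelated:.3f}, with outlier: {r_spurious:.3f}")
```

In both cases a single observation changes the conclusion, which is why checking the predictors, scores, and response for outliers before fitting is worthwhile.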
[Figure: Scatter of Predictors (axes: Predictor 2, Predictor 1)]
70
1.25 2.50 3.75 5.00
1.25 2.50 3.75 5.00
Scatter of First PLS Scores with Response
Response First PLS Scores
70 Monday, March 24, 2008
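The contrast between first PCA scores and first PLS scores, each plotted against the response, can be sketched numerically. This is a minimal illustration under stated assumptions, not the presenters' code: the two-predictor data set is hypothetical, the predictors are centered but deliberately left unscaled so the variance difference stays visible, and the first PLS weight vector is taken as the normalized X'y (the standard first step for a single response). The helper names `first_pca_score`, `first_pls_score`, and `abs_corr` are made up for this example.

```python
import numpy as np

def first_pca_score(X):
    """Scores on the first principal component of the centered predictors."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def first_pls_score(X, y):
    """Scores on the first PLS direction for a single response.

    The weight vector is proportional to X'y on centered data.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)
    return Xc @ w

def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(float(np.corrcoef(a, b)[0, 1]))

# Hypothetical data: predictor 1 has large variance but is unrelated to the
# response; predictor 2 has small variance and fully determines the response.
high_variance = 10.0 * np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
low_variance = 0.1 * np.arange(1.0, 9.0)
X = np.column_stack([high_variance, low_variance])
y = 3.0 * low_variance + 2.0

pca_corr = abs_corr(first_pca_score(X), y)     # PCA chases predictor variance only
pls_corr = abs_corr(first_pls_score(X, y), y)  # PLS also uses the response

print(f"|corr(first PCA score, y)| = {pca_corr:.3f}")
print(f"|corr(first PLS score, y)| = {pls_corr:.3f}")
```

Because PCA chooses its direction from predictor variability alone, its first score can be unrelated to the response, whereas the first PLS direction is pulled toward predictors that covary with the response.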
business acres per town
variable (= 1 if tract bounds river; 0 otherwise)
per dwelling
1940
Boston employment centers
radial highways
town
(outcome)