Diagnostics & Kernel Methods Visualized
Ondřej Bojar
April 3, 2019
NPFL104 Machine Learning Methods
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (unless otherwise stated)
Slides based on:
Suppose your ML system does not perform sufficiently well. You can consider random improvements:
… but some may be fixing problems you don’t have.
First figure out what’s going on.
Trivial but vital:
hw_my_dataset (Python source on the seminar web page.)
An excellent resource: https://matplotlib.org/gallery.html
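Before doing anything else, plot the data. A minimal sketch of this advice (the data here is made up for illustration; any two numeric columns of your dataset will do):

```python
# Plot-your-data-first sketch: a scatter plot and a histogram side by side.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)           # hypothetical feature
y = 2 * x + rng.normal(0, 1, 500)   # hypothetical target

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y, s=5)
ax1.set(xlabel="x", ylabel="y", title="scatter")
ax2.hist(x, bins=30, density=True)
ax2.set(xlabel="x", title="histogram")
fig.savefig("my_dataset.png")
```

The matplotlib gallery linked above has a ready-made recipe for nearly every plot type used in the following slides.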
[Plot: histogram of the data (“hist”)]
[Plot: histogram with fitted Gaussian (“gauss hist”)]
[Plot: heart-rate histogram for activity 1]
[Plot: heart-rate histograms for activities 1, 3, and 5]
[Plot: “data” with “initial” fit]
[Plot: “data” with “initial” fit, different scale]
High Variance = Overfitting:
High Bias = Underfitting:
Consider:
Expected error Err(x) = E[(y − f̂_D(x))²] of learning f̂_D over various datasets D, on a fixed test input x with observed values y = f(x) + ε, can be decomposed as:

Err(x) = (E[f̂_D(x)] − f(x))² + E[(f̂_D(x) − E[f̂_D(x)])²] + σ²

Err(x) = Bias² + Variance + Noise

Bias: f̂_D(x) differs from the ideal value f(x).
Variance: f̂_D(x) differs from the average prediction E[f̂_D(x)], on average over datasets D.
More: http://scott.fortmann-roe.com/docs/BiasVariance.html Derivation: see slides by Cohen
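The decomposition can be checked numerically. The following is my own sketch (not from the slides): fit a deliberately biased model (a straight line) to many fresh noisy training sets drawn around a true function f(x) = sin(x), and compare Bias² + Variance + Noise against the directly estimated expected error at a fixed test point:

```python
# Simulated check of Err(x) = Bias^2 + Variance + Noise at one test point.
import numpy as np

rng = np.random.default_rng(42)
f = lambda x: np.sin(x)   # true function (illustrative choice)
sigma = 0.3               # noise std; Noise = sigma^2
x_test = 1.0

preds = []
for _ in range(2000):
    # Draw a fresh training set D and fit a degree-1 polynomial (a biased model).
    x_tr = rng.uniform(-3, 3, 20)
    y_tr = f(x_tr) + rng.normal(0, sigma, 20)
    coefs = np.polyfit(x_tr, y_tr, 1)
    preds.append(np.polyval(coefs, x_test))
preds = np.array(preds)

bias2 = (preds.mean() - f(x_test)) ** 2          # (E[f_hat(x)] - f(x))^2
variance = preds.var()                           # E[(f_hat(x) - E[f_hat(x)])^2]
noise = sigma ** 2                               # sigma^2

# Direct estimate of Err(x) = E[(y - f_hat(x))^2] over fresh noisy observations:
y_obs = f(x_test) + rng.normal(0, sigma, 2000)
err = ((y_obs - preds) ** 2).mean()
```

The sum `bias2 + variance + noise` agrees with `err` up to sampling error, which is the content of the decomposition above.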
Picture from: http://scott.fortmann-roe.com/docs/BiasVariance.html
See the slides by Andrew Ng, plots on slides 7 and 8.
Search Error:
Modelling Error:
Consider:
See the slides by Andrew Ng, slide 14.
Error Analysis:
Ablative Analysis:
… including their parameters
Based on http://scikit-learn.org/stable/modules/svm.html and other scikit-learn demos.
K(x, y) = x ⋅ y (linear kernel = no kernel)

The parameter C in (linear) SVM:

Penalty C for errors   Points considered   Margin   Bias   Variance
Low                    Many                Wide     High   Low
High                   Few                 Narrow   Low    High

Think C for varianCe.
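The table above can be observed directly in scikit-learn: with a small C many points end up inside the (wide) margin and become support vectors, while a large C shrinks the margin and the support-vector count. A small sketch on made-up overlapping data:

```python
# Effect of the SVC penalty C on the number of support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs (illustrative data).
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(1, 1.2, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

n_sv = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Low C -> wide margin -> many points considered (many support vectors).
    n_sv[C] = int(clf.n_support_.sum())
```

Here `n_sv[0.01]` comes out larger than `n_sv[100.0]`, matching the Low/Many vs. High/Few rows of the table.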
Some data do not allow linear separation. The trick: map the coordinates to another space where separation is possible:
K(x, y) = xy + x²y²   (1)
Picture from https://en.wikipedia.org/wiki/Kernel_method
Slides from ?).
K(x, y) = (gamma ⋅ x ⋅ y + coef0)^degree
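The formula uses scikit-learn's own parameter names (`gamma`, `coef0`, `degree` of `SVC(kernel="poly")`). A quick sketch checking the formula against scikit-learn's implementation on made-up points:

```python
# Verify K(x, y) = (gamma * x.y + coef0)^degree against scikit-learn.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[1.0, 2.0], [0.5, -1.0]])   # two illustrative points
gamma, coef0, degree = 0.5, 1.0, 3

K_manual = (gamma * X @ X.T + coef0) ** degree
K_sklearn = polynomial_kernel(X, X, degree=degree, gamma=gamma, coef0=coef0)
```

The two matrices are identical up to floating-point rounding.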
K(x, y) = exp(−gamma ⋅ ‖x − y‖²);  gamma > 0
[Plot: exp(−γx²) for γ = 0.5, 1, 2, 3]
… Totally “flips” the space:
C      Decision surface   Model     Bias   Variance
Low    Smooth             Simple    High   Low
High   Peaked             Complex   Low    High

gamma  Affected points
Low    can be far from the training examples
High   must be close to the training examples
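The gamma row of the table follows directly from the RBF formula: a large gamma makes K(x, y) vanish for distant points, so only nearby training examples influence a prediction. A small numeric sketch (points and gamma values are illustrative):

```python
# RBF kernel locality: how fast K(x, z) decays with gamma at fixed distance.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
z = np.array([[2.0, 0.0]])   # squared distance to x is 4

vals = {}
for gamma in (0.5, 3.0):
    vals[gamma] = rbf_kernel(x, z, gamma=gamma)[0, 0]
    # Matches the formula K(x, z) = exp(-gamma * ||x - z||^2) exactly.
    assert np.isclose(vals[gamma], np.exp(-gamma * 4.0))
```

With gamma = 0.5 the kernel value is exp(−2) ≈ 0.135 (the distant point still matters); with gamma = 3 it is exp(−12) ≈ 6e−6 (effectively zero), so the decision surface bends only near training examples.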
http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html
Two implementations in scikit-learn:
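Presumably the two implementations meant here are `SVC(kernel="linear")` (libsvm-based, supports arbitrary kernels) and `LinearSVC` (liblinear-based, linear only, scales better to large data). A minimal comparison on made-up, well-separated data:

```python
# SVC vs. LinearSVC: two linear-SVM implementations in scikit-learn.
import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(1)
# Two well-separated blobs (illustrative data).
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

acc_svc = SVC(kernel="linear", C=1.0).fit(X, y).score(X, y)
acc_lin = LinearSVC(C=1.0).fit(X, y).score(X, y)
```

On easy data both reach the same (here perfect) training accuracy; they differ in training cost and in the exact loss optimized, so results need not be identical in general.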
Diagnostics:
Kernels and Effects of Hyperparameters, Visualizations:
For hw_gridsearch, see the web.
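For the homework, a hedged sketch of what a hyperparameter grid search over the C and gamma of an RBF SVM looks like with scikit-learn's `GridSearchCV` (the dataset and grid values are illustrative, not taken from the assignment):

```python
# Cross-validated grid search over C and gamma of an RBF SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

`best_params_` then gives the (C, gamma) pair with the best cross-validated accuracy, directly trading off the bias/variance effects tabulated above.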