The bits the whirlwind tour left out ...
BMVA Summer School 2016
extra background slides (from teaching material at Durham University)

Machine Learning
Definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [Mitchell, 1997]
Top-down induction of decision trees (ID3) – main loop:
  node = root of tree
  Main loop:
    A = "best" decision attribute for next node; assign A as the decision attribute for node
    for each value of A, create a new descendant of node and sort the training examples to the leaves
    if training examples perfectly classified then stop, else iterate over the new leaf nodes
Entropy(S) = −p⊕ log2 p⊕ − p⊖ log2 p⊖
– S is a sample of training examples
– p⊕ is the proportion of positive examples in S
– p⊖ is the proportion of negative examples in S
Information gain:
– i.e. expected reduction in impurity in the data
– (improvement in consistent data sorting)
– Gain(S, A) = Entropy(S) − Σ(v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)
  i.e. original entropy minus the (weighted) sum of the entropies of the sub-nodes if split on A
– reduction in entropy in set of examples S if split on attribute A
– Sv = subset of S for which attribute A has value v
– "information provided about the target function given the value of some attribute A"
– How well does A sort the data into the required classes?
– generalises to c classes (not just ⊕ or ⊖):
  Entropy(S) = Σ(i = 1 … c) −pi log2 pi
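As a concrete sketch of these definitions, the snippet below computes entropy and information gain for a toy attribute split; the PlayTennis-style data and attribute name are illustrative assumptions, not from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = sum over the c classes of -p_i log2 p_i
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)
    n = len(labels)
    gain = entropy(labels)
    for v in {ex[attribute] for ex in examples}:
        s_v = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(s_v) / n) * entropy(s_v)
    return gain

# toy sample: 4 examples with one attribute, classes "+" / "-"
examples = [{"wind": "weak"}, {"wind": "weak"},
            {"wind": "strong"}, {"wind": "strong"}]
labels = ["+", "+", "+", "-"]
print(information_gain(examples, labels, "wind"))  # ~0.31 bits
```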
Selecting the Next Attribute – which attribute should we split on next?
Learning Boosted Classifier (AdaBoost algorithm):
  Assign equal weight to each training instance
  For t iterations:
    Apply learning algorithm to weighted training set, store resulting (weak) classifier
    Compute classifier's error e on weighted training set
    If e = 0 or e > 0.5: terminate classifier generation
    For each instance in training set:
      If classified correctly by classifier: multiply instance's weight by e / (1 − e)
    Normalize weights of all instances

Classification using Boosted Classifier:
  Assign weight = 0 to all classes
  For each of the t (or fewer) classifiers:
    For the class this classifier predicts, add −log(e / (1 − e)) to this class's weight
  Return class with highest weight
  (e = error of classifier on the training set)
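A minimal runnable sketch of both procedures, assuming scikit-learn decision stumps as the weak learner (the library choice and parameter defaults are my assumptions, not from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, t=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                   # equal weight to each instance
    ensemble = []
    for _ in range(t):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)      # learn on the weighted set
        pred = stump.predict(X)
        e = float(w[pred != y].sum())         # weighted training error
        if e == 0 or e > 0.5:                 # terminate classifier generation
            break
        beta = e / (1.0 - e)
        ensemble.append((stump, beta))        # store the weak classifier
        w[pred == y] *= beta                  # shrink weights of correct ones
        w /= w.sum()                          # normalize all instance weights
    return ensemble

def adaboost_classify(ensemble, x):
    votes = {}                                # weight = 0 for all classes
    for clf, beta in ensemble:
        c = clf.predict(x.reshape(1, -1))[0]
        votes[c] = votes.get(c, 0.0) - np.log(beta)   # add -log(e/(1-e))
    return max(votes, key=votes.get)          # class with highest weight
```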
– Weight adjustment means the (t+1)th classifier concentrates on the examples the tth classifier got wrong
– Each classifier must be able to achieve greater than 50% success
– Results in an ensemble of t classifiers
– Training error decreases exponentially (theoretically)
– several additions/modifications to handle this
– Works best with weak classifiers
– set of t decision trees of limited complexity (e.g. depth)
– Select a training set T' (size N) by randomly selecting (with replacement) N instances from training set T
– Select a number m < M: at each node, a random subset of m of the M attributes is used to determine the best split (m is constant across all trees in the forest)
– Grow each tree using T' to the largest extent possible, without any pruning
[Breiman, 2001]
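A short sketch of these three steps (the scikit-learn trees and the parameter defaults here are my own illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=50, m=None, seed=0):
    N, M = X.shape
    m = m or max(1, int(np.sqrt(M)))         # a common choice for m < M
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, N, size=N)     # bootstrap sample T' (with replacement)
        tree = DecisionTreeClassifier(max_features=m)  # m random attributes per split
        tree.fit(X[idx], y[idx])             # grown fully: no pruning by default
        forest.append(tree)
    return forest

def forest_predict(forest, x):
    votes = [t.predict(x.reshape(1, -1))[0] for t in forest]
    return max(set(votes), key=votes.count)  # majority vote over the ensemble
```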
Notation:
– training examples d = {1 … D}, each a pair {input vector, target vector}
– node index n = {1 … N}
– weight wji connects node j → i
– input xji is the input on the connection node j → i
– output error for node n is δn
[Figure: feed-forward network – input layer (input, x), hidden layer, output layer (output vector, Ok)]
For each training example d, the error term of an output node is based on:
  δk = ok (1 − ok) (tk − ok)
– (tk − ok): difference between target and output
– ok (1 − ok): derivative of the sigmoid function
– δ is proportional to the node's contribution to the output error
Terminate when:
– number of iterations reached
– or error below suitable bound
[Figure series: the backpropagation weight updates illustrated layer by layer on the same network diagram – input layer (input, x), hidden layer(s) (unit h), output layer (unit k), output vector Ok]
– i.e. stochastic (online) learning, as updates are based on training one sample at a time
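A minimal stochastic backpropagation loop for a single-hidden-layer sigmoid network, following the δ terms above; the network size, learning rate, and XOR-style toy data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_ih, b_h = rng.normal(0, 0.5, (3, 2)), np.zeros(3)   # input(2) -> hidden(3)
W_ho, b_o = rng.normal(0, 0.5, (1, 3)), np.zeros(1)   # hidden(3) -> output(1)
eta = 0.5                                             # learning rate (assumed)

# toy training set: {input vector, target vector} pairs (XOR)
data = [(np.array([0., 0.]), np.array([0.])),
        (np.array([0., 1.]), np.array([1.])),
        (np.array([1., 0.]), np.array([1.])),
        (np.array([1., 1.]), np.array([0.]))]

for epoch in range(5000):                 # stop when iteration count reached
    for x, t in data:                     # one sample at a time (stochastic)
        o_h = sigmoid(W_ih @ x + b_h)     # forward pass: hidden outputs
        o_k = sigmoid(W_ho @ o_h + b_o)   # forward pass: network outputs
        delta_k = o_k * (1 - o_k) * (t - o_k)            # output error terms
        delta_h = o_h * (1 - o_h) * (W_ho.T @ delta_k)   # hidden error terms
        W_ho += eta * np.outer(delta_k, o_h); b_o += eta * delta_k
        W_ih += eta * np.outer(delta_h, x);   b_h += eta * delta_h

print([round(sigmoid(W_ho @ sigmoid(W_ih @ x + b_h) + b_o).item(), 2)
       for x, _ in data])                 # outputs approach 0, 1, 1, 0
```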
– http://deeplearning.net/tutorial/lenet.html
– http://www.deeplearning.net/tutorial/
2D LINES REMINDER
[Figure: a 2D line defined by its normal to the line and its offset from the origin]
http://www.mathopenref.com/coordpointdisttrig.html
For a line ax + by + c = 0 with unit normal (a, b), i.e. for: a² + b² = 1, the value
  d = a·x0 + b·y0 + c
is the distance of point (x0, y0) from the line:
– result is +ve if the point is on one side of the line (the side the normal points towards)
– result is -ve if the point is on the other side
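A tiny sketch of this computation (the function name and example values are illustrative; the line need not be pre-normalised, since we divide by the normal's length):

```python
import math

def signed_distance(a, b, c, x0, y0):
    # distance of (x0, y0) from the line ax + by + c = 0;
    # dividing by |(a, b)| makes the normal a unit vector
    return (a * x0 + b * y0 + c) / math.hypot(a, b)

print(signed_distance(1.0, 1.0, -1.0, 1.0, 1.0))   # +0.707...: positive side
print(signed_distance(1.0, 1.0, -1.0, 0.0, 0.0))   # -0.707...: negative side
```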
Instances (i.e. examples) {xi, yi}:
– xi = point in instance space (Rn) made up of n attributes
– yi = class value for classification of xi
Classification of example: function f(x) = y ∈ {+1, -1}, i.e. 2 classes (figure: y = +1 and y = -1 regions)
N.B. we have a vector of weight coefficients w⃗
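As a tiny illustration, the two-class decision function can be written with the weight vector and an offset term; the specific weight values below are arbitrary assumptions:

```python
import numpy as np

w = np.array([0.8, -0.5])   # vector of weight coefficients
b = -0.1                    # offset from origin (cf. the 2D lines reminder)

def f(x):
    # sign of the (signed) distance from the separating hyperplane
    return 1 if np.dot(w, x) + b >= 0 else -1   # y in {+1, -1}

print(f(np.array([1.0, 0.5])))    # -> +1 (positive side)
print(f(np.array([-1.0, 0.5])))   # -> -1 (negative side)
```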
Non-linear separation (red / blue data items):
– kernel projection to a higher dimensional space
– find hyperplane separator (plane in 3D) via optimization
– non-linear boundary in the original dimension (e.g. circle in 2D) defined by a planar boundary (cut) in 3D
– project X into some higher dimensional space X' = Rm where data will be linearly separable
– let Φ : X → X' be this projection
– Training depends only on dot products of the form Φ(xi) · Φ(xj)
– So we can train in Rm with the same computational complexity as in Rn, provided we can find a kernel basis function K such that:
    K(xi, xj) = Φ(xi) · Φ(xj)   (the kernel trick)
– Classifying new instance x now requires calculating the sign of:
    f(x) = Σi αi yi K(xi, x) + b
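A brief illustration of the kernel trick in practice, assuming scikit-learn's SVC and a toy concentric-circles dataset (both my assumptions): a linear kernel cannot separate a circular class boundary in 2D, while an RBF kernel can.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric rings: not linearly separable in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # hyperplane in the original space
rbf = SVC(kernel="rbf").fit(X, y)         # K(xi, xj) = exp(-gamma * ||xi - xj||^2)

print("linear kernel accuracy:", linear.score(X, y))   # poor (~0.5)
print("RBF kernel accuracy:", rbf.score(X, y))         # near perfect
```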
– Unbiased
– Representative
– Accurate
– Available
– split overall data set into separate training and test sets
– … splits common
– Training on one, test on the other
– Test error = error on the test set
– Training error = error on training set
– Weakness: susceptible to bias in data sets or "over-fitting"
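A minimal sketch of such a split (the dataset, the 1/3 held-out fraction, and the scikit-learn usage are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

clf = DecisionTreeClassifier().fit(X_tr, y_tr)        # train on one set
print("training error:", 1 - clf.score(X_tr, y_tr))   # error on training set
print("test error:", 1 - clf.score(X_te, y_te))       # error on the test set
```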
– Randomly split (all) the data into k subsets
– For i = 1 to k: train using all data except the ith subset, test using the ith subset
– report mean error over all k tests
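A short sketch of k-fold cross validation (k = 5; the dataset and classifier are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])   # train on k-1 subsets
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))         # test on held-out subset
print("mean error over all k tests:", np.mean(errors))
```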
tp = true positive / tn = true negative
fp = false positive / fn = false negative
Often quoted or plotted when comparing ML techniques.
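As a hedged reminder (the measures below are the standard definitions, not taken from the slides), the quantities usually quoted from these four counts are:

```python
def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                  # correct fraction of positive predictions
    recall = tp / (tp + fn)                     # fraction of actual positives found
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # overall fraction correct
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

print(metrics(tp=40, tn=45, fp=5, fn=10))   # illustrative counts
```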
Cohen's kappa: κ = (Pr(a) − Pr(e)) / (1 − Pr(e)), where Pr(a) is the observed agreement and Pr(e) is the agreement expected by chance:
– e.g. chance agreement for 2 categories = 50% (0.5), 3 categories = 33% (0.33) ... etc.
– Pr(e) can be replaced with Pr(b) to measure agreement between classifiers/techniques a and b
[Cohen, 1960]
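A small sketch of this computation (the two label sequences are illustrative assumptions):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n   # observed agreement
    ca, cb = Counter(labels_a), Counter(labels_b)
    pr_e = sum((ca[c] / n) * (cb[c] / n) for c in ca)            # chance agreement
    return (pr_a - pr_e) / (1 - pr_e)

print(cohens_kappa([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1]))   # ~0.67
```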