MACHINE LEARNING – 2013
Introduction
Lecturer: Prof. Aude Billard (aude.billard@epfl.ch)
Assistants: Dr. Basilio Noris, Nicolas Sommer (basilio.noris@epfl.ch; n.sommer@epfl.ch)
Practicalities
http://lasa.epfl.ch/teaching/lectures/ML_Phd/index.php
http://lasa.epfl.ch/teaching/lectures/ML_Phd/
Compulsory reading of background chapters before class!
Machine Learning is the field of scientific study that concentrates on induction algorithms and on other algorithms that can be said to "learn."
- Machine Learning Journal, Kluwer Academic

Machine Learning is an area of artificial intelligence involving the development of techniques that allow computers to "learn". More specifically, machine learning is a method for creating computer programs by the analysis of data sets, rather than the intuition of engineers. Machine learning overlaps heavily with statistics, since both fields study the analysis of data.
- Webster Dictionary

Machine learning is a branch of statistics and computer science which studies algorithms and architectures that learn from data sets.
- WordIQ
and Blind Signal Separation, pp. 934-940, 2006.
Piano note vs. the same note played by an oboe.
Wrinkles, eyelids and eyelashes: Support Vector Regression (Noris et al., 2011, Computer Vision and Image Understanding).
Linear PCA vs. kernel PCA projections of handwritten digits: kernel PCA extracts some of the texture better and is less sensitive to noise than linear PCA, which boosts reconstruction and recognition of the digits (Mika et al., NIPS 2000).
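As an aside, the core of kernel PCA is short enough to sketch directly. The following is a minimal numpy illustration on synthetic ring data (not the digit data from Mika et al.); the RBF kernel and its bandwidth are assumed choices for the sketch:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared Euclidean distances, then Gaussian kernel
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    """Project training data onto the top principal axes in RBF feature space."""
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    # Center the kernel matrix in feature space
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; take the largest ones
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.abs(eigvals[idx]))
    return Kc @ alphas  # projections of the training points

# Two noisy concentric rings: not linearly separable in input space
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
Z = kernel_pca(X, n_components=2, gamma=0.5)
```

The kernel matrix is centered in feature space before the eigendecomposition, and the projections are the eigenvectors scaled by the square roots of their eigenvalues.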
Person identification task. Top row: query image and 10 candidates in the gallery set. Bottom row: projections of the query image onto the pre-learned (through kernel PCA) appearance manifolds of the 10 candidates.
Yang et al., "Person Re-identification by Kernel PCA Based Appearance Learning", Canadian Conf. on Computer and Robot Vision (2011).
Figure: mapping of x into feature space, and projections in feature space.
Chandra et al, “A Multivariate Time Series Clustering Approach for Crime Trends Prediction”, IEEE SMC 2008.
Multispectral medical image segmentation (left: MRI image from one channel; right: classification into 9 clusters via semi-supervised learning). The clusters should identify patterns such as cerebro-spinal fluid, white matter, striated muscle, and tumor (Lundervolt et al., 1996).
Jain, 2010, Data clustering: 50 years beyond K-means, Pattern Recognition Letters
Figure: original data, and the result after 4-class classification using an SVM.
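The figure's result was produced with an SVM; as a much simpler stand-in that still makes the 4-class setup concrete, here is a nearest-centroid classifier on synthetic 4-class data (all data and class centers are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Four Gaussian blobs, one per class
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]], dtype=float)
X = np.vstack([c + rng.normal(0, 0.5, (30, 2)) for c in centers])
y = np.repeat(np.arange(4), 30)

# "Training" amounts to computing one centroid per class
centroids = np.array([X[y == k].mean(axis=0) for k in range(4)])

def predict(points):
    # Assign each point to the nearest class centroid
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

acc = (predict(X) == y).mean()
```

With well-separated blobs this trivially reaches near-perfect accuracy; an SVM becomes necessary when the class boundaries are not this simple.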
Swiderski et al., "Multistage classification by using logistic regression and neural networks for assessment of financial condition of company", Decision Support Systems, 2012.
5-class insolvency risk: excellent, good, satisfactory, passable, poor.
The data from Swiderski et al. contain more positive examples than negative examples.
Training data: a set of $M$ input-output pairs $\{(x^i, y^i)\}_{i=1}^{M}$, e.g. the pairs $(x^1, y^1), (x^2, y^2), (x^3, y^3), (x^4, y^4)$.
Wang & Zhu, "Financial market forecasting using a two-step kernel learning method for the support vector regression", Annals of Operations Research, 2010.
They found that short-term (daily and weekly) trends had a bigger impact than long-term (monthly and quarterly) trends in predicting the next day's return.
Kronander, Khansari and Billard, JTSC award, IEEE Int. Conf. on Intelligent Robots and Systems 2011.
Comparison of the predictions of two methods, Gaussian Process Regression (GPR) and Gaussian Mixture Regression (GMR), in terms of precision and generalization.
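Of the two methods compared, GPR is compact enough to sketch. Below is a minimal 1-D posterior-mean implementation with an RBF kernel on toy data (not the robot data from the paper; the kernel width and noise level are assumed):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # RBF kernel between two 1-D input arrays
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gpr_predict(X, y, Xq, ell=1.0, noise=1e-2):
    """Posterior mean of a Gaussian Process with RBF kernel (1-D inputs)."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))  # kernel on training inputs + noise
    Ks = rbf(Xq, X, ell)                         # cross-kernel query vs. training
    return Ks @ np.linalg.solve(K, y)

X = np.linspace(0, 2 * np.pi, 20)
y = np.sin(X)
mean = gpr_predict(X, y, np.array([np.pi / 2]))  # should be close to sin(pi/2) = 1
```

The posterior mean interpolates the noisy observations, smoothed according to the kernel width and the assumed noise variance.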
Regression minimizing the Mean Square Error $\frac{1}{M}\sum_{i=1}^{M}\left(y^i - f(x^i)\right)^2$ (illustrated: effect of sampling).
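For a linear model $f(x) = ax + b$, the MSE criterion has a closed-form minimizer (ordinary least squares); a sketch on synthetic data (slope, intercept, and noise level assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, 50)  # noisy samples of y = 3x + 1

# Design matrix [x, 1]; lstsq solves min_w ||A w - y||^2,
# which is equivalent to minimizing the mean square error over the M samples
A = np.c_[x, np.ones_like(x)]
w, *_ = np.linalg.lstsq(A, y, rcond=None)
mse = np.mean((A @ w - y)**2)
```

The recovered slope and intercept approach the true values (3, 1) as the number of samples grows, and the residual MSE approaches the noise variance.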
Regression minimizing the Mean Square Error $\frac{1}{M}\sum_{i=1}^{M}\left(y^i - f(x^i)\right)^2$ (illustrated: crossvalidation).
Crossvalidation: the data are split into a training set, a validation set, and a testing set.
The training and validation sets are used to determine the sensitivity of the learning to the choice of hyperparameters (i.e. parameters not learned during training). Values for the hyperparameters are set through a grid search. Once the optimal hyperparameters have been picked, the model is trained on the combined training + validation set and tested on the testing set. In practice, one often uses only training and testing sets and performs crossvalidation directly on the training set.
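The split-and-grid-search procedure above can be sketched with ridge regression, whose regularization weight is the hyperparameter being tuned (the data, the true weights, and the grid values are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + rng.normal(0, 0.5, 60)

# Split: 40 training, 10 validation, 10 testing
Xtr, Xva, Xte = X[:40], X[40:50], X[50:]
ytr, yva, yte = y[:40], y[40:50], y[50:]

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Grid search over the hyperparameter lam, scored on the validation set
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
val_err = [np.mean((Xva @ ridge_fit(Xtr, ytr, lam) - yva)**2) for lam in grid]
best_lam = grid[int(np.argmin(val_err))]

# Retrain on training + validation with the chosen lam; evaluate once on the test set
w = ridge_fit(X[:50], y[:50], best_lam)
test_mse = np.mean((Xte @ w - yte)**2)
```

The test set is touched exactly once, after the hyperparameter has been fixed, so the reported test error is an unbiased estimate of generalization.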
Time performance: how long can it take before an acceptable level of performance is achieved? When is "good enough" reached?
Progress in a machine's performance must be measurable and significant, and the machine must eventually reach a minimal level of performance ("good enough") within an acceptable time frame.
$P(y = 1 \mid x)$
http://www.machinelearning.org/index.html
and Computational Learning (summer schools and workshops)
Databases:
Journals:
Conferences: www.nips.org
Topics for the literature survey and the mini-project: the exact list will be posted by March 8 (in the second week of March). Mini-project topics will entail implementing one of the methods from the list.
The joint probability that the two events A (variable x takes value $x_i$) and B (variable y takes value $y_j$) both occur is expressed as $P(A, B) = P(A \mid B)\,P(B)$. $P(A \mid B)$ is the conditional probability that event A takes place given that event B has already taken place.

Bayes' theorem: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
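Bayes' theorem can be checked numerically on a small joint distribution over two binary events (the joint probabilities below are chosen arbitrarily for illustration):

```python
# Joint probabilities P(A, B) over two binary events
p_joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

p_A = sum(p for (a, b), p in p_joint.items() if a == 1)  # P(A)
p_B = sum(p for (a, b), p in p_joint.items() if b == 1)  # P(B)
p_A_given_B = p_joint[(1, 1)] / p_B                      # P(A|B) from the joint
p_B_given_A = p_joint[(1, 1)] / p_A                      # P(B|A) from the joint

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
bayes = p_B_given_A * p_A / p_B
```

The value computed via Bayes' theorem matches the conditional computed directly from the joint, as it must.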
The so-called marginal probability that variable x takes value $x_i$ is given by summing the joint over all values of y: $p(x_i) = \sum_{j=1}^{N} p(x_i, y_j)$.
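With the joint stored as a table, marginalizing is just a row or column sum (the numbers are assumed for illustration):

```python
import numpy as np

# p_joint[i, j] = p(x = x_i, y = y_j)
p_joint = np.array([[0.10, 0.20],
                    [0.25, 0.15],
                    [0.05, 0.25]])

p_x = p_joint.sum(axis=1)  # marginal p(x_i) = sum_j p(x_i, y_j)
p_y = p_joint.sum(axis=0)  # marginal p(y_j) = sum_i p(x_i, y_j)
```

Both marginals sum to 1 because the joint does.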
p(x), a continuous function, is the probability density function (PDF) of variable x (sometimes also called the probability distribution or simply the density). The pdf is not bounded by 1: it can grow unbounded, depending on the value taken by x.
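For instance, a Gaussian with a small standard deviation has density values well above 1 at its mean, while still integrating to 1; a quick check using the standard Gaussian pdf:

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian probability density function
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

peak = gauss_pdf(0.0, sigma=0.1)  # density at the mean for sigma = 0.1: about 3.99

# Midpoint Riemann sum over [-1, 1] (10 standard deviations): should be close to 1
total = sum(gauss_pdf(-1 + (k + 0.5) * 0.001, sigma=0.1) * 0.001 for k in range(2000))
```

The density exceeds 1 at the peak, yet the area under the curve is still 1, which is what defines a valid pdf.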
The probability that the variable x takes a value in the interval $[a, b]$ is given by $P(a \le x \le b) = \int_a^b p(x)\,dx$; thus $p(x)\,dx$ is the probability of x falling within an infinitesimal interval $[x, x + dx]$. The cumulative distribution function (or simply distribution function) of x is $D(x) = \int_{-\infty}^{x} p(x')\,dx'$.
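For a standard normal, the interval probability computed from the cumulative distribution (via the error function) agrees with direct numerical integration of p(x); a sketch:

```python
import math

def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(x):
    # D(x) = 0.5 * (1 + erf(x / sqrt(2))) for the standard normal
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

a, b = -1.0, 1.0
p_cdf = std_normal_cdf(b) - std_normal_cdf(a)  # P(a <= x <= b) via the CDF

# Midpoint Riemann sum of the pdf over [a, b]
n = 10000
h = (b - a) / n
p_sum = sum(std_normal_pdf(a + (k + 0.5) * h) for k in range(n)) * h
```

Both give the familiar value of about 0.683 for one standard deviation around the mean.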
Figure: three Gaussian distributions, and the distribution resulting from their superposition (std = 1.38).
The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Consider two random variables x and y with joint distribution p(x, y); the marginal probability of x is obtained by integrating out y: $p(x) = \int p(x, y)\,dy$.
Consider two random variables x and y with joint distribution p(x, y); the marginal probability of x is $p(x) = \int p(x, y)\,dy$. Consider now that the pdf of x, y is parametrized with parameters $\theta$, so that one can compute the conditional $p(x \mid \theta)$. Then the likelihood function (in short, the likelihood) of the model parameters is given by $L(\theta \mid x) = p(x \mid \theta)$.
Machine learning techniques often assume that the form of the distribution function is known and that only its parameters must be optimized to best fit a set of observed data points, through likelihood optimization. The principle of maximum likelihood consists of finding the optimal parameters of a given distribution by maximizing the likelihood function of these parameters, or equivalently by maximizing the probability of the data given the model and its parameters: $\theta^* = \arg\max_{\theta}\, p(X \mid \theta)$.
If p is the Gaussian function, then the above has an analytical solution (assuming that one has enough observations of x to draw from).
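For the Gaussian case, the analytical maximum-likelihood solution is the sample mean and the (biased) sample variance; a quick numerical check on synthetic data (the true mean and standard deviation are assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.5, size=10000)  # samples from N(2, 1.5^2)

# Analytical ML estimates for a Gaussian: sample mean and sample variance
mu_ml = x.mean()
var_ml = np.mean((x - mu_ml)**2)
```

With enough observations, the estimates converge to the true mean 2.0 and true variance 1.5^2 = 2.25.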