Datasets Preprocessing Dimensionality reduction
Preprocessing and Dimensionality Reduction
Jérémy Fix
CentraleSupélec, jeremy.fix@centralesupelec.fr
2017
Where to get data
Kaggle, ICML 2013
Classes: person / bird, cat, cow, dog, horse, sheep / aeroplane, bicycle, boat, bus, car, motorbike, train / bottle, chair, dining table, potted plant, sofa, tv-monitor
http://host.robots.ox.ac.uk/pascal/VOC/
ImageNet Large Scale Visual Recognition Challenge, Russakovsky et al. (2015)
https://github.com/openimages/dataset
http://cocodataset.org/
Make your own dataset
We need vectors, appropriately scaled, without missing values
Pennington et al. (2014). GloVe: Global Vectors for Word Representation
Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space
https://fasttext.cc/
Silva (2014). A brief review of the main approaches for treatment of missing data
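As a minimal sketch of what "appropriately scaled, without missing values" can mean in practice, the following Python code fills missing entries with the column mean and then standardizes each column to zero mean and unit variance. Mean imputation is only one of the possible treatments of missing data and is chosen here for brevity. The function names `impute_mean` and `standardize`, the use of `None` as the missing-value marker and the toy data are assumptions made for this illustration, not taken from the slides.

```python
import math

def impute_mean(rows):
    """Replace None entries by the mean of the observed values of their column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def standardize(rows):
    """Center each column to zero mean and scale it to unit variance."""
    n, n_cols = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(n_cols)]
    # A constant column has zero spread: map it to 0 instead of dividing by 0.
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(n_cols)]
            for r in rows]

# Hypothetical toy data: two features on very different scales, one missing value.
raw = [[1.0, 200.0], [2.0, None], [3.0, 400.0]]
filled = impute_mean(raw)
scaled = standardize(filled)
```

After these two steps both columns live on the same scale, so a distance-based method no longer gives the second feature more weight simply because its raw values are larger.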
Feature selection
$x_i \in \mathbb{R}^n$, $z \in \mathbb{R}^d$
$\frac{(x_0 - x)^2}{2\sigma^2}$, ..., $\frac{(x_{N-1} - x)^2}{2\sigma^2}$
[Figure, two panels. "Fit": samples, true, lreg, lreg_l1. "Parameters".]
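The Gaussian terms and the lreg / lreg_l1 legend suggest a linear regression on Gaussian features centered on the samples, fitted with and without an L1 penalty. The sketch below illustrates that reading, which is an assumption about the slide rather than its actual code. It minimizes $\frac{1}{2n}\|y - \Phi w\|^2 + \lambda \|w\|_1$ by cyclic coordinate descent. The target function, `sigma`, `lam` and all function names are hypothetical choices.

```python
import math

def gaussian_features(x, centers, sigma):
    """One feature per center c: exp(-(c - x)^2 / (2 sigma^2))."""
    return [math.exp(-(c - x) ** 2 / (2 * sigma ** 2)) for c in centers]

def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def objective(Phi, y, w, lam):
    """(1 / 2n) * squared error + lam * L1 norm of the weights."""
    n = len(y)
    sq = sum((yi - sum(p * wj for p, wj in zip(row, w))) ** 2
             for row, yi in zip(Phi, y))
    return sq / (2 * n) + lam * sum(abs(wj) for wj in w)

def lasso_cd(Phi, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for L1-regularized least squares."""
    n, d = len(Phi), len(Phi[0])
    w = [0.0] * d
    resid = list(y)  # residual y - Phi w, kept up to date
    for _ in range(n_sweeps):
        for j in range(d):
            col = [Phi[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[j] = new_wj
    return w

# Hypothetical toy problem: samples of a sine, one Gaussian feature per sample.
xs = [i / 19 for i in range(20)]
ys = [math.sin(2 * math.pi * x) for x in xs]
Phi = [gaussian_features(x, xs, sigma=0.1) for x in xs]
w_l1 = lasso_cd(Phi, ys, lam=0.01)
n_zero = sum(1 for wj in w_l1 if wj == 0.0)  # features discarded by the penalty
```

The soft-thresholding step is what makes the L1 penalty act as a feature selector: a weight whose correlation with the residual stays below `lam` is set exactly to zero, so the corresponding feature drops out of the model.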
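Feature subsets, from $\emptyset$ to the full set $X_d$, with the number of sets at each level:
$\emptyset$ : 1
$\{x_0\}, \{x_1\}, \dots, \{x_{d-1}\}$ : $d$
$\{x_0, x_1\}, \{x_0, x_2\}, \dots, \{x_0, x_{d-1}\}, \dots, \{x_1, x_{d-1}\}, \dots, \{x_{d-2}, x_{d-1}\}$ : $d(d-1)/2$
subsets of size $k$ : $\frac{d!}{k!(d-k)!}$
$X_d$ : 1
Sequential Forward Search, Sequential Backward Search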
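Because the number of subsets of size $k$ grows as $\frac{d!}{k!(d-k)!}$, an exhaustive search quickly becomes impractical. A Sequential Forward Search can be sketched as a greedy loop that starts from the empty set and adds, at each step, the feature that most improves a score. The code below is a minimal illustration under that assumption. The `score` callback, the `usefulness` table and the toy scoring rule are hypothetical stand-ins for a real evaluation such as a cross-validated error.

```python
def sequential_forward_search(n_features, k, score):
    """Greedily grow a subset of feature indices up to size k.

    `score` maps a tuple of feature indices to a number (higher is better).
    """
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k and remaining:
        # Try adding each remaining feature and keep the best candidate.
        best = max(remaining, key=lambda j: score(tuple(selected + [j])))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy score: features 2 and 0 are useful, each feature has a small cost.
usefulness = {0: 1.0, 1: 0.1, 2: 3.0, 3: 0.0}

def toy_score(subset):
    return sum(usefulness[j] for j in subset) - 0.05 * len(subset)

chosen = sequential_forward_search(4, 2, toy_score)  # → [2, 0]
```

A Sequential Backward Search follows the same pattern in the other direction: it starts from the full set and removes, at each step, the feature whose removal hurts the score the least.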
$J_{CFS}(\chi) = \frac{k\,\bar{r}(\chi, y)}{\sqrt{k + k(k-1)\,\bar{r}(\chi, \chi)}}$
$\bar{r}(\chi, y) = \frac{1}{k} \sum_{j \in \chi} r(x_{\cdot,j}, y)$
$\bar{r}(\chi, \chi) = \frac{1}{k(k-1)} \sum_{j_1 \neq j_2} r(x_{\cdot,j_1}, x_{\cdot,j_2})$
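To make the criterion concrete, here is a small sketch that evaluates a correlation-based merit of this form for a subset of $k$ features: the mean feature-target correlation in the numerator, penalized by the mean feature-feature correlation in the denominator. It assumes that $r$ is the Pearson correlation and takes absolute values of the correlations, a common convention that the slide does not state. The function names and the toy columns are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cfs_merit(columns, y):
    """Merit of a feature subset given as a list of feature columns."""
    k = len(columns)
    # Mean feature-target correlation (absolute values: an assumption).
    r_xy = sum(abs(pearson(c, y)) for c in columns) / k
    if k == 1:
        return r_xy
    # Mean feature-feature correlation over the k(k-1) ordered pairs.
    r_xx = sum(abs(pearson(columns[a], columns[b]))
               for a in range(k) for b in range(k) if a != b) / (k * (k - 1))
    return k * r_xy / math.sqrt(k + k * (k - 1) * r_xx)

# Hypothetical toy data.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
f1 = [1.0, 2.0, 3.0, 4.0, 5.0]    # perfectly correlated with y
f2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # redundant copy of f1
f3 = [1.0, -1.0, 0.0, -1.0, 1.0]  # uncorrelated with y and with f1

merit_single = cfs_merit([f1], y)
merit_redundant = cfs_merit([f1, f2], y)
merit_irrelevant = cfs_merit([f1, f3], y)
```

In this toy case the redundant feature leaves the merit unchanged and the irrelevant feature lowers it, which is the behavior the criterion is designed to have: it rewards features correlated with the target and penalizes features correlated with each other.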
Feature extraction
[Figure: plot showing the two directions w0 and w1]
$(\mathrm{Id} - W W^T) h$ is the residual of the orthogonal projection of $h$ onto the span of the column vectors of $W$.
If $h \in \mathrm{span}\{w_1, \dots, w_r\}$, the residual is $0$.
If $h \in \mathrm{span}\{w_1, \dots, w_r\}^\perp$, the residual is $h$.
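The two special cases can be checked numerically. The sketch below computes $(\mathrm{Id} - W W^T) h$ by subtracting from $h$ its component along each column of $W$. It assumes the columns $w_1, \dots, w_r$ are orthonormal, which is what makes $W W^T$ an orthogonal projector. The function name `project_residual` and the example vectors are hypothetical.

```python
def project_residual(W_cols, h):
    """Return (Id - W W^T) h, with W given as a list of orthonormal columns."""
    residual = list(h)
    for w in W_cols:
        coeff = sum(wi * hi for wi, hi in zip(w, h))  # w^T h
        residual = [ri - coeff * wi for ri, wi in zip(residual, w)]
    return residual

# Hypothetical example in R^3 with r = 2 orthonormal directions.
w1 = [1.0, 0.0, 0.0]
w2 = [0.0, 1.0, 0.0]

in_span = project_residual([w1, w2], [3.0, -2.0, 0.0])    # h in span{w1, w2}
orthogonal = project_residual([w1, w2], [0.0, 0.0, 5.0])  # h orthogonal to the span
mixed = project_residual([w1, w2], [3.0, -2.0, 5.0])      # general case
```

In the general case only the component of $h$ outside the span survives, here `[0.0, 0.0, 5.0]`, so the norm of the residual measures how much of $h$ is lost when it is represented by its $r$ coordinates $W^T h$.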
[Figure: plot with axes w1 and w2]