The curse of dimensionality
Julie Delon
Laboratoire MAP5, UMR CNRS 8145 Université Paris Descartes up5.fr/delon
Introduction

Modern data are often high dimensional:
- computational biology: DNA data, with few observations and a huge number of variables;
- images or videos: an image from a digital camera has millions of pixels, hence millions of coordinates;
- data coming from consumer preferences: Netflix, for instance, owns a huge database of user ratings for movies.
The term "curse of dimensionality" was first used by R. Bellman in the introduction of his book Dynamic Programming (1957). He used it to describe the difficulty of finding an optimum in a high-dimensional space by exhaustive search, in order to promote dynamic programming as an alternative.
Classification: you know the classes of n points from your learning sample, and you want to predict the class of a new observation.
Regression: you observe n i.i.d. observations (x_i, y_i), assumed to follow the model y_i = f(x_i) + ε_i, and you want to estimate the regression function f.
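To see the curse at work in this setting, here is a small numpy experiment (my own illustration, not from the slides; the target f, the noise level, and the sample sizes are arbitrary choices): a 5-nearest-neighbour regressor degrades quickly as p grows, even though f depends on the first coordinate only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                    # training sample size

def f(x):
    # the signal depends on the first coordinate only
    return np.sin(2 * np.pi * x[..., 0])

for p in [1, 2, 10, 50]:
    X = rng.uniform(size=(n, p))           # training inputs, uniform in [0,1]^p
    y = f(X) + 0.1 * rng.standard_normal(n)
    Xt = rng.uniform(size=(500, p))        # test inputs
    # 5-nearest-neighbour estimate of f at each test point
    d = np.linalg.norm(Xt[:, None, :] - X[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :5]
    pred = y[nn].mean(axis=1)
    print(f"p = {p:2d}: test MSE = {np.mean((pred - f(Xt)) ** 2):.3f}")
```

The neighbours used for averaging are close in [0,1] but far apart in [0,1]^50, so the local average stops being local.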
To capture a fraction s of the volume of the unit cube [0,1]^p with a sub-cube, the edge length of that sub-cube must be s^{1/p}, which tends to 1 as p grows.

[Figure: edge length s^{1/p} as a function of the fraction of volume s, for p = 1, 2, 3, 10.]

For p = 10: capturing s = 0.01 of the volume already requires an edge length of 0.01^{1/10} ≈ 0.63, and s = 0.1 requires 0.1^{1/10} ≈ 0.80. Neighbourhoods that are "local" in volume are no longer local in range.
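These numbers are easy to check (a one-line sketch per case):

```python
# edge length s**(1/p) of a sub-cube of [0,1]^p capturing a fraction s of the volume
for s in (0.01, 0.1):
    for p in (1, 2, 3, 10):
        print(f"s = {s:4}, p = {p:2d}: edge length = {s ** (1 / p):.2f}")
```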
[Figure: histograms of pairwise distances between points drawn uniformly in [0,1]^p, for increasing dimension p. As p grows, the distances concentrate around their mean: all points become nearly equidistant.]
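The phenomenon behind this figure can be reproduced with a short Monte Carlo sketch (the sample size and dimensions below are my own choices): the ratio std/mean of the pairwise distances shrinks as p grows.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for p in [2, 100, 1000]:
    X = rng.uniform(size=(500, p))     # 500 points uniform in [0,1]^p
    d = pdist(X)                       # all pairwise Euclidean distances
    print(f"p = {p:4d}: mean = {d.mean():6.2f}, std/mean = {d.std() / d.mean():.3f}")
```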
Since high-dimensional spaces are almost empty, it should be easier to separate groups in high-dimensional space with a hyperplane: the larger p is, the higher the likelihood that we can separate the classes perfectly. See the sketch below.
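A minimal sketch of this effect (my own experiment, assuming scikit-learn is available): with n = 100 points carrying purely random labels, a linear classifier separates the training data almost perfectly once p is of the order of n, a sign of overfitting rather than of real structure.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 100
labels = rng.integers(0, 2, size=n)    # purely random class labels
for p in [2, 10, 50, 100, 200]:
    X = rng.standard_normal((n, p))    # n random points in dimension p
    clf = LinearSVC(C=1e6, max_iter=100_000).fit(X, labels)
    print(f"p = {p:3d}: training accuracy = {clf.score(X, labels):.2f}")
```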
Volume of the unit ball in dimension p:

    V_p = π^{p/2} / Γ(p/2 + 1).

[Figure: V_p as a function of the dimension p; the volume peaks around p = 5 (V_5 ≈ 5.26) and then decreases to 0.]

As p → ∞, V_p → 0; indeed, Stirling's formula gives V_p ~ (2πe/p)^{p/2} / √(pπ). The unit ball occupies a vanishing fraction of any cube containing it.
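A minimal numerical check of the formula (using scipy for the Gamma function):

```python
import numpy as np
from scipy.special import gamma

# V_p = pi**(p/2) / Gamma(p/2 + 1) increases up to p = 5, then tends to 0
for p in [1, 2, 5, 10, 20, 100]:
    print(f"p = {p:3d}: V_p = {np.pi ** (p / 2) / gamma(p / 2 + 1):.3e}")
```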
The volume concentrates near the boundary: if X is uniform on the unit ball and S_{0.9} = {x : 0.9 ≤ ‖x‖ ≤ 1} denotes the outer shell of thickness 0.1, then P(X ∈ S_{0.9}) = 1 − 0.9^p → 1 as p → ∞.

[Figure: P(X ∈ S_{0.9}) as a function of the dimension p.]
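Numerically (the shell probability is an exact formula, so this is just a check):

```python
# P(X in S_0.9) = 1 - 0.9**p for X uniform on the unit ball
for p in [2, 10, 50, 100]:
    print(f"p = {p:3d}: P(X in S_0.9) = {1 - 0.9 ** p:.4f}")
```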
Norm of a standard Gaussian vector X ~ N(0, I_p): ‖X‖ concentrates around √p, so the mass of the Gaussian escapes from any fixed neighbourhood of its mode; already in moderate dimension, the probability that X lies at distance greater than 2 from 0 is around 0.99.
[Figure: density p(r) of ‖X‖ for p = 1, 2, 10, 20; the peak moves away from 0 (towards √p) as p grows.]
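These densities are those of the chi distribution with p degrees of freedom, so the figure can be checked with scipy (a sketch; the probabilities printed illustrate the claim above):

```python
import numpy as np
from scipy.stats import chi

# ||X|| for X ~ N(0, I_p) follows a chi distribution with p degrees of freedom
for p in [1, 2, 10, 20]:
    print(f"p = {p:2d}: E||X|| = {chi.mean(p):.2f} (sqrt(p) = {np.sqrt(p):.2f}), "
          f"P(||X|| > 2) = {chi.sf(2, p):.4f}")
```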
Consequences for estimation: it is often necessary to estimate the covariance matrix of the data, for instance in
- classification with Gaussian mixture models,
- principal component analysis (PCA),
- linear regression with least squares, etc.

[Figure: two-dimensional point cloud.]

If n is not large enough, the estimates are unreliable.
Empirical covariance matrix: for n i.i.d. centered observations X_1, ..., X_n in R^p with covariance Σ_p,

    Σ̂_p = (1/n) Σ_{k=1}^n X_k X_kᵀ.

For fixed p, the law of large numbers gives Σ̂_p → Σ_p a.s. as n → ∞.

If n, p → ∞ with p/n → c > 0, then this consistency breaks down: the eigenvalues of Σ̂_p no longer converge to those of Σ_p. The approximation is even false for p/n = 1/100.
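A quick simulation with Σ_p = I_p (an illustrative sketch; all eigenvalues should be close to 1) shows how far the empirical eigenvalues spread, even at p/n = 1/100:

```python
import numpy as np

rng = np.random.default_rng(0)
for n, p in [(10_000, 100), (1_000, 100), (200, 100)]:
    X = rng.standard_normal((n, p))          # samples from N(0, I_p)
    ev = np.linalg.eigvalsh(X.T @ X / n)     # eigenvalues of the empirical covariance
    print(f"p/n = {p / n:5.3f}: eigenvalues in [{ev.min():.2f}, {ev.max():.2f}]")
```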
If p/n → c > 1, then Σ̂_p has rank at most n < p, so it is not even invertible. Convergence in ℓ∞ norm still holds: max_{i,j} |(Σ̂_p)_{ij} − (Σ_p)_{ij}| → 0 a.s. However, we lose the convergence in spectral norm, since the extreme eigenvalues of Σ̂_p stay away from those of Σ_p (see the Marčenko–Pastur theorem below).
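Both claims can be observed numerically (same Σ_p = I_p setting as above, with the ratio p/n held fixed at c = 2): the entrywise error shrinks with n while the operator-norm error stays bounded away from 0.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in [50, 200, 800]:
    p = 2 * n                              # keep p/n = c = 2
    X = rng.standard_normal((n, p))
    E = X.T @ X / n - np.eye(p)            # error of the empirical covariance
    print(f"n = {n:3d}, p = {p:4d}: ||E||_inf = {np.abs(E).max():.2f}, "
          f"||E||_op = {np.linalg.norm(E, 2):.2f}")
```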
Marčenko–Pastur theorem: assume Σ_p = I_p and p/n → c > 0. Then the empirical spectral distribution µ_{Σ̂_p} = (1/p) Σ_i δ_{λ_i(Σ̂_p)} converges weakly, a.s., to the Marčenko–Pastur law µ_c, with density √((b − x)(x − a)) / (2πcx) on [a, b] = [(1 − √c)², (1 + √c)²] and an atom at zero of mass µ_c({0}) = max(0, 1 − c⁻¹).
[Figure: empirical eigenvalue distribution of Σ̂_p compared with the Marčenko–Pastur law.]
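A sketch reproducing the data behind this figure (with Σ_p = I_p as in the theorem; the eigenvalue histogram can then be plotted against the density computed below):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2_000, 1_000                        # c = p/n = 0.5
X = rng.standard_normal((n, p))
ev = np.linalg.eigvalsh(X.T @ X / n)
c = p / n
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(f"empirical eigenvalue range: [{ev.min():.3f}, {ev.max():.3f}]")
print(f"Marchenko-Pastur support:   [{a:.3f}, {b:.3f}]")
x = np.linspace(a + 1e-9, b, 400)
density = np.sqrt((b - x) * (x - a)) / (2 * np.pi * c * x)   # MP density on [a, b]
```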
How to fight the curse of dimensionality?
- The problem comes from the fact that p is too large: therefore, reduce the data dimension to d ≪ p, so that the curse of dimensionality vanishes!
- The problem comes from the fact that parameter estimates are unstable: therefore, regularize these estimates, so that the parameters are correctly estimated!
- The problem comes from the fact that the number of parameters to estimate is too large: therefore, make restrictive assumptions on the model, so that the number of parameters to estimate becomes more "decent"!

A concrete sketch of the regularization remedy is given below.
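As one concrete instance of the "regularize" remedy (my example, not from the slides): Ledoit-Wolf shrinkage, available in scikit-learn, pulls the unstable empirical covariance towards a multiple of the identity and sharply reduces the spectral error when p ≫ n.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
n, p = 50, 200                             # far fewer observations than variables
X = rng.standard_normal((n, p))            # true covariance is I_p
emp = X.T @ X / n                          # raw empirical covariance
lw = LedoitWolf().fit(X).covariance_       # shrinkage-regularized estimate
for name, S in [("empirical", emp), ("Ledoit-Wolf", lw)]:
    print(f"{name:12s}: ||S - I||_op = {np.linalg.norm(S - np.eye(p), 2):.2f}")
```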