


The curse of dimensionality

Julie Delon

Laboratoire MAP5, UMR CNRS 8145, Université Paris Descartes (up5.fr/delon)


Introduction

Modern data are often high dimensional:

• computational biology: DNA data, with few observations and a huge number of variables;

• images or videos: an image from a digital camera has millions of pixels, and 1h of video contains more than 130,000 images;

• data coming from consumer preferences: Netflix, for instance, owns a huge (but sparse) database of ratings given by millions of users on thousands of movies or TV shows.

The curse of dimensionality

The term was first used by R. Bellman in the introduction of his book “Dynamic programming” in 1957:

“All [problems due to high dimension] may be subsumed under the heading ‘the curse of dimensionality’. Since this is a curse, [...], there is no need to feel discouraged about the possibility of obtaining significant results despite it.”

He used this term to describe the difficulty of finding an optimum in a high-dimensional space by exhaustive search, in order to promote dynamic programming approaches.

Outline

• In high dimensional spaces, nobody can hear you scream
• Concentration phenomena
• Surprising asymptotic properties for covariance matrices

Nearest neighbors and neighborhoods in estimation

Supervised classification and regression often rely on local averages:

• Classification: you know the classes of n points from your learning database; you can classify a new point x by computing the most represented class in the neighborhood of x.

• Regression: you observe n i.i.d. observations (x_i, y_i) from the model y_i = f(x_i) + ε_i, and you want to estimate f. If you assume f is smooth, a simple solution consists in estimating f(x) as the average of all y_i corresponding to the k nearest neighbors x_i of x.

This makes sense in small dimension. Unfortunately, not so much when the dimension p increases...
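As an illustration of this local-averaging idea, here is a minimal k-NN regression sketch in Python (not from the slides); the synthetic data, the smooth function f and the helper knn_regress are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_regress(x, X, y, k=10):
    """Estimate f(x) as the average of y_i over the k nearest neighbors x_i of x."""
    dist = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return y[nearest].mean()

p, n = 2, 500
X = rng.random((n, p))                           # training inputs, uniform in [0, 1]^p
f = lambda x: np.sin(2 * np.pi * x[..., 0])      # a hypothetical smooth f
y = f(X) + 0.1 * rng.standard_normal(n)          # noisy observations y_i = f(x_i) + eps_i
x0 = np.full(p, 0.5)
print("k-NN estimate:", knn_regress(x0, X, y), "  true value:", f(x0))
```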

High dimensional spaces are empty

Assume your data live in [0, 1]^p. To capture a neighborhood which represents a fraction s of the hypercube volume, you need an edge length of s^{1/p}:

• s = 0.01, p = 10: s^{1/p} ≈ 0.63;
• s = 0.1, p = 10: s^{1/p} ≈ 0.8.

[Figure: edge length s^{1/p} as a function of the captured fraction s of the volume, for p = 1, 2, 3, 10.]

Neighborhoods are no longer local.
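A quick numerical check of the edge-length formula s^{1/p} (a small sketch in plain Python):

```python
# Edge length of a sub-cube of [0, 1]^p that covers a fraction s of the volume: s^(1/p).
def edge_length(s, p):
    return s ** (1.0 / p)

for p in (1, 2, 3, 10, 100):
    print(f"p={p:3d}   s=0.10 -> edge {edge_length(0.10, p):.2f}   "
          f"s=0.01 -> edge {edge_length(0.01, p):.2f}")
```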

The volume of a hypercube with edge length r = 0.1 is 0.1^p. When p grows, this quickly becomes so small that the probability of capturing points from your database becomes very, very small...

Points in high dimensional spaces are isolated.

To overcome this limitation, you need a number of samples which grows exponentially with p...
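A small simulation of this emptiness (a sketch assuming NumPy): the probability that a uniform sample falls in a given sub-cube of edge 0.1 is 0.1^p, so for moderate p even a large database leaves such neighborhoods empty.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10_000, 0.1
for p in (2, 5, 10, 20):
    X = rng.random((n, p))                       # n uniform samples in [0, 1]^p
    frac = np.all(X < r, axis=1).mean()          # fraction landing in [0, 0.1]^p
    print(f"p={p:2d}   empirical fraction {frac:.1e}   theoretical {r**p:.1e}")
```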

Nearest neighbors

Let X, Y be two independent variables with uniform distribution on [0, 1]^p. The squared distance ||X − Y||^2 satisfies E[||X − Y||^2] = p/6 and Std[||X − Y||^2] ≃ 0.2 √p, so the relative spread Std/E ≃ 1.2/√p tends to 0: all pairwise distances become nearly equal.

[Figure: histograms of pairwise distances between n = 100 points sampled uniformly in the hypercube [0, 1]^p, for p = 2, 100, 1000.]

The notion of nearest neighbors vanishes.
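A simulation of this concentration effect (a sketch assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n = 100
for p in (2, 100, 1000):
    X = rng.random((n, p))                       # n uniform points in [0, 1]^p
    d2 = pdist(X, "sqeuclidean")                 # squared pairwise distances
    print(f"p={p:4d}   mean {d2.mean():7.2f} (p/6 = {p/6:7.2f})   "
          f"std {d2.std():6.2f} (~0.2*sqrt(p) = {0.2 * np.sqrt(p):5.2f})   "
          f"std/mean {d2.std() / d2.mean():.3f}")
```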

Classification in high dimension

• Since high-dimensional spaces are almost empty, it should be easier to separate groups in high-dimensional space with an adapted classifier;

• the larger p is, the higher the likelihood that we can separate the classes perfectly with a hyperplane.

⇒ Overfitting.
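A sketch of this overfitting effect (assuming NumPy, with a plain least-squares fit standing in for a linear classifier): once p is of the order of n, even purely random labels can be separated perfectly on the training set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
for p in (2, 10, 50, 200):
    X = rng.standard_normal((n, p))
    y = rng.choice([-1.0, 1.0], size=n)          # completely random labels
    w, *_ = np.linalg.lstsq(X, y, rcond=None)    # linear fit used as a classifier
    acc = np.mean(np.sign(X @ w) == y)
    print(f"p={p:3d}   training accuracy {acc:.2f}")
```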

Outline

• In high dimensional spaces, nobody can hear you scream
• Concentration phenomena
• Surprising asymptotic properties for covariance matrices

Volume of the ball

The volume of the ball of radius r in dimension p is

    V_p(r) = r^p π^{p/2} / Γ(p/2 + 1).

[Fig. Volume of the unit ball as a function of the dimension p: it tends to 0 as p grows.]

Consequence: if you want to cover [0, 1]^p with a union of n unit balls, you need

    n ≥ 1 / V_p(1) = Γ(p/2 + 1) / π^{p/2} ∼ (p / (2πe))^{p/2} √(pπ)   as p → ∞.

For p = 100, this gives n ≈ 4.2 × 10^39.
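These numbers can be reproduced with SciPy's log-gamma function (a small sketch; working with logarithms avoids overflowing Γ(p/2 + 1)):

```python
import numpy as np
from scipy.special import gammaln

def log_unit_ball_volume(p):
    """log V_p(1) = (p/2) log(pi) - log Gamma(p/2 + 1)."""
    return 0.5 * p * np.log(np.pi) - gammaln(0.5 * p + 1)

for p in (2, 5, 10, 50, 100):
    logV = log_unit_ball_volume(p)
    print(f"p={p:3d}   V_p(1) = {np.exp(logV):.3e}   covering bound n >= {np.exp(-logV):.3e}")
```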

Corners of the hypercube

Assume you draw n samples with uniform law in the hypercube: most sample points will lie in the corners of the hypercube. Indeed, the ball inscribed in [0, 1]^p has volume V_p(1/2), which tends to 0, so almost all of the cube's volume sits in its corners, far from the center.

Volume of the shell

The probability that a uniform variable on the unit ball belongs to the shell between the spheres of radius 0.9 and 1 is

    P(X ∈ S_0.9(p)) = 1 − 0.9^p → 1   as p → ∞.

[Fig. Probability that X belongs to the shell S_0.9 as a function of the dimension p.]
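A Monte Carlo check (a sketch assuming NumPy); uniform samples in the unit ball are drawn with a Gaussian direction and a radius distributed as U^{1/p}:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_ball(n, p):
    """n uniform samples in the unit ball of R^p."""
    g = rng.standard_normal((n, p))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.random((n, 1)) ** (1.0 / p)
    return directions * radii

n = 100_000
for p in (2, 10, 50):
    X = sample_unit_ball(n, p)
    frac = (np.linalg.norm(X, axis=1) > 0.9).mean()
    print(f"p={p:2d}   empirical P(||X|| > 0.9) = {frac:.3f}   theory 1 - 0.9^p = {1 - 0.9**p:.3f}")
```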

Samples are close to an edge of the sample

Let X_1, . . . , X_n be i.i.d. in dimension p, with uniform distribution on the unit ball. The median distance from the origin to the closest data point is

    med(p, n) = (1 − (1/2)^{1/n})^{1/p}.

For n = 500 and p = 10, med ≈ 0.52, which means that most data points are closer to the edge of the ball than to the center.
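The closed-form median together with a small Monte Carlo check (a sketch assuming NumPy):

```python
import numpy as np

def med(p, n):
    """Median distance from the origin to the closest of n uniform points in the unit ball."""
    return (1 - 0.5 ** (1.0 / n)) ** (1.0 / p)

print("closed form:", round(med(10, 500), 3))        # ~0.52

rng = np.random.default_rng(0)
p, n, trials = 10, 500, 500
g = rng.standard_normal((trials, n, p))
X = g / np.linalg.norm(g, axis=2, keepdims=True) * rng.random((trials, n, 1)) ** (1.0 / p)
closest = np.linalg.norm(X, axis=2).min(axis=1)      # distance to the nearest point, per trial
print("Monte Carlo median:", round(float(np.median(closest)), 3))
```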

Concentration phenomena and estimation

Consequence: samples are closer to the boundary of the sample space than to other samples, which makes prediction much more difficult. Indeed, near the edges of the training sample, one must extrapolate from neighboring sample points rather than interpolate between them.

Example. Assume n data points are sampled independently with a uniform law on [−1, 1]^p. You want to estimate f(x) = e^{−||x||^2/8} at 0 from your data, and you choose as an estimator the observed value at x_i, the nearest neighbor of 0. For n = 1000 samples and p = 10, the probability that this nearest neighbor is at a distance larger than 1/2 from 0 is around 0.99.
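A Monte Carlo estimate of this probability (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 1000, 10, 2000
far = 0
for _ in range(trials):
    X = rng.uniform(-1, 1, size=(n, p))
    far += np.linalg.norm(X, axis=1).min() > 0.5     # nearest neighbor of 0 farther than 1/2
print("P(nearest neighbor farther than 1/2) ~", far / trials)
```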

Where is the mass of the Gaussian distribution located?

The mass of the standard Gaussian distribution in the ring between radius r and r + dr is

    P[r ≤ ||X|| ≤ r + dr] ≃ e^{−r²/2} / (2π)^{p/2} · (V_p(r + dr) − V_p(r)) ≃ e^{−r²/2} / (2π)^{p/2} · p r^{p−1} V_p(1) dr,

which is maximal for r = √(p − 1).

[Figure: radial density of ||X|| as a function of r, for p = 1, 2, 10, 20.]

Most of the mass of a Gaussian distribution is located in areas where the density is extremely small compared to its maximum value.
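A simulation of this concentration of the norm (a sketch assuming NumPy): the norms cluster around √(p − 1), where the density is a factor e^{−(p−1)/2} below its value at the origin.

```python
import numpy as np

rng = np.random.default_rng(0)
for p in (2, 10, 100):
    X = rng.standard_normal((20_000, p))
    r = np.linalg.norm(X, axis=1)
    print(f"p={p:3d}   mean ||X|| = {r.mean():6.2f}   sqrt(p-1) = {np.sqrt(p - 1):6.2f}   "
          f"density ratio exp(-(p-1)/2) = {np.exp(-(p - 1) / 2):.1e}")
```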

Outline

• In high dimensional spaces, nobody can hear you scream
• Concentration phenomena
• Surprising asymptotic properties for covariance matrices

Covariance matrices

Sample covariance matrices appear everywhere in statistics:

• classification with Gaussian mixture models,
• principal component analysis (PCA),
• linear regression with least squares, etc.

Problems:

• it is often necessary to invert Σ;
• if n is not large enough, the estimates of Σ are ill-conditioned or singular;
• it is sometimes necessary to estimate the eigenvalues of Σ.

Covariance matrices

Context: x_1, . . . , x_n ∈ R^p are i.i.d. samples from a multivariate Gaussian distribution N(0, Σ_p). The maximum likelihood estimator of Σ_p is the sample covariance matrix

    Σ̂_p = (1/n) ∑_{k=1}^{n} x_k x_k^T.

If p is fixed and n → ∞, then (strong law of large numbers), for any matrix norm,

    ||Σ̂_p − Σ_p|| → 0   almost surely.

Random matrices

If n, p → ∞ with p/n → c > 0, then (even with Σ_p = I_p) ||Σ̂_p − I_p||_2 does not converge to 0, where ||·||_2 denotes the spectral norm. Convergence already fails for p/n = 1/100.
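A quick empirical look at this failure in the spectral norm (a sketch assuming NumPy); for Σ_p = I_p the deviation ||Σ̂_p − I_p||_2 stabilizes around roughly 2√c + c instead of vanishing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
for c in (0.01, 0.1, 0.5):
    p = int(c * n)
    X = rng.standard_normal((n, p))              # rows x_k ~ N(0, I_p)
    S = X.T @ X / n                              # sample covariance
    dev = np.linalg.norm(S - np.eye(p), ord=2)   # spectral norm deviation
    print(f"p/n = {c:.2f}   ||S - I||_2 = {dev:.3f}   (~ 2*sqrt(c) + c = {2 * np.sqrt(c) + c:.3f})")
```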

Covariance matrices - Random matrix regime

Context: x_1, . . . , x_n ∈ R^p are i.i.d. samples from a multivariate Gaussian distribution N(0, I_p). Write X = (x_1, . . . , x_n).

Case p/n = c > 1. We still have convergence in the entrywise ℓ∞ norm:

    max_{i,j} |Σ̂_{i,j} − δ_{i,j}| → 0   almost surely.

However, we lose the convergence in spectral norm, since

    rank(Σ̂_p) ≤ rank(X) ≤ n < p  ⇒  λ_min(Σ̂_p) = 0 < 1 = λ_min(Σ_p).

There is no contradiction with the fact that all norms are equivalent in finite dimension: the equivalence constants depend on p.
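A sketch of the p > n situation (assuming NumPy): the sample covariance has rank at most n, hence at least p − n zero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 300                                  # more variables than samples
X = rng.standard_normal((n, p))                  # rows x_k ~ N(0, I_p)
S = X.T @ X / n                                  # sample covariance, rank <= n
eig = np.linalg.eigvalsh(S)
print("smallest eigenvalue:", eig[0])            # 0 up to numerical precision
print("numerically zero eigenvalues:", int((eig < 1e-10).sum()), "out of", p)   # >= p - n = 200
```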

Covariance matrices - Random matrix regime

More precisely, random matrix theory tells us that when p, n → ∞ with p/n → c > 0 [Marčenko-Pastur theorem, 1967],

    (1/p) ∑_{k=1}^{p} δ_{λ_k(Σ̂_p)} → µ   weakly, almost surely,

where µ is the Marčenko-Pastur law of parameter c, which satisfies:

• µ({0}) = max(0, 1 − c^{−1});
• on (0, ∞), µ has a continuous density supported on [(1 − √c)², (1 + √c)²].
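A sketch comparing the empirical eigenvalue distribution with the Marčenko-Pastur density (assuming NumPy; the same p = 500, n = 2000 as in the next figure, so c = 0.25):

```python
import numpy as np

def mp_density(x, c):
    """Marchenko-Pastur density on (0, inf) for ratio c = lim p/n (unit variance)."""
    lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    d = np.zeros_like(x)
    inside = (x > lo) & (x < hi)
    d[inside] = np.sqrt((hi - x[inside]) * (x[inside] - lo)) / (2 * np.pi * c * x[inside])
    return d

rng = np.random.default_rng(0)
p, n = 500, 2000                                 # c = 0.25
X = rng.standard_normal((n, p))
eigs = np.linalg.eigvalsh(X.T @ X / n)
hist, edges = np.histogram(eigs, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for x, h in list(zip(centers, hist))[::6]:
    print(f"lambda = {x:4.2f}   empirical {h:4.2f}   MP density {mp_density(np.array([x]), p / n)[0]:4.2f}")
```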

The Marčenko-Pastur law

[Figure: histogram of the eigenvalues of Σ̂_p for p = 500, n = 2000, Σ_p = I_p, together with the Marčenko-Pastur density.]

[Figure: Marčenko-Pastur densities for different limit ratios c = lim p/n, with c = 0.1, 0.2, 0.5.]

Classical ways to avoid the curse of dimensionality

Dimension reduction: the problem comes from the fact that p is too large; therefore, reduce the data dimension to d ≪ p, so that the curse of dimensionality vanishes!

Regularization: the problem comes from the fact that parameter estimates are unstable; therefore, regularize these estimates, so that the parameters are correctly estimated!

Parsimonious models: the problem comes from the fact that the number of parameters to estimate is too large; therefore, make restrictive assumptions on the model, so that the number of parameters to estimate becomes more “decent”!
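As an illustration of the first strategy, here is a minimal PCA-style dimension-reduction sketch (assuming NumPy; the synthetic data and the choice d = 10 are hypothetical, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 200, 1000, 10

# Synthetic data lying near a d-dimensional subspace of R^p, plus noise.
latent = rng.standard_normal((n, d))
mixing = rng.standard_normal((d, p))
X = latent @ mixing + 0.1 * rng.standard_normal((n, p))

Xc = X - X.mean(axis=0)                          # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # PCA via SVD
Z = Xc @ Vt[:d].T                                # reduced n x d representation

explained = (s[:d] ** 2).sum() / (s ** 2).sum()
print(f"variance explained by the first {d} components: {explained:.3f}")
```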