
Distance in data space: Notion of distance (metrics) in data space



  1. Fundamentals of AI: Introduction and the most basic concepts. Distance in data space

  2. Notion of distance (metrics) in data space. Who is my closest neighbor?

  3. Euclidean distance. Shape of the 2D sphere, R=1

  4. Euclidean distance
     • Euclidean distance is the most fundamental distance because the physical world is locally Euclidean (with a rather large locality radius!)
     • A data space is not obliged to be a Euclidean metric space
     • Duality connections between Euclidean distance and the Normal (Gaussian) distribution
     • Duality connections between Euclidean distance and linear regression, principal components
     • Euclidean distance is sometimes denoted as the L2-norm or L2-metric
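The question "Who is my closest neighbor?" can be sketched with a brute-force search under Euclidean distance. This is only a minimal illustration: the function names `euclidean` and `closest_neighbor` and the sample points are made up for the example and do not come from the slides.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def closest_neighbor(query, points):
    """Index of the point closest to `query` (brute-force search)."""
    return min(range(len(points)), key=lambda i: euclidean(query, points[i]))

# Made-up sample data, not taken from the slides.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
query = (0.9, 1.2)

nearest = closest_neighbor(query, points)
print(nearest, points[nearest])   # index and coordinates of the closest point
```

Brute force costs one distance evaluation per stored point, which is fine for small data sets.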

  5. Metric axioms
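The body of this slide did not survive extraction. For reference, the standard textbook axioms for a distance function d are given here (an assumption about what the slide showed, not a transcription):

```latex
\begin{align*}
d(x, y) &\ge 0 && \text{(non-negativity)} \\
d(x, y) &= 0 \iff x = y && \text{(identity of indiscernibles)} \\
d(x, y) &= d(y, x) && \text{(symmetry)} \\
d(x, z) &\le d(x, y) + d(y, z) && \text{(triangle inequality)}
\end{align*}
```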

  6. L1-distance. Shape of the 2D sphere, R=1. $d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$

  7. L1-distance. Shape of the 2D sphere, R=1. $d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$ L1-distance is not rotationally invariant!
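The lack of rotational invariance can be checked numerically. The sketch below rotates one point by 45 degrees about the origin and compares L1 and L2 distances before and after. The helper names `l1`, `l2` and `rotate` are illustrative only.

```python
import math

def l1(x, y):
    """L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2(x, y):
    """L2 (Euclidean) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])

origin, p = (0.0, 0.0), (1.0, 0.0)
p_rot = rotate(p, math.pi / 4)             # rotate by 45 degrees

print(l1(origin, p), l1(origin, p_rot))    # L1 changes: 1.0 -> about 1.414
print(l2(origin, p), l2(origin, p_rot))    # L2 stays at 1.0 (up to rounding)
```

This is why the L1 unit "sphere" in 2D is a diamond whose shape depends on the choice of axes, while the L2 unit sphere is a circle.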

  8. Lp-distance. Shape of the spheres. $d(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
     • p = 2: Euclidean distance
     • p = 1: L1-distance
     • p = ∞: max-distance
     • p < 1: fractional (pseudo)metrics, which violate the triangle axiom!
     If a distance axiom is not satisfied, it is better to use the word dissimilarity instead of distance or metric!
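A small sketch of the Lp family, including a concrete case where p < 1 breaks the triangle inequality. The function name `lp_distance` and the three sample points are chosen for illustration.

```python
import math

def lp_distance(x, y, p):
    """L_p dissimilarity; a true metric only for p >= 1."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)                       # p = infinity: max-distance
    return sum(d ** p for d in diffs) ** (1.0 / p)

a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)

print(lp_distance(a, c, 1))          # 2.0
print(lp_distance(a, c, 2))          # about 1.414
print(lp_distance(a, c, math.inf))   # 1.0

# p = 0.5: the direct path a -> c is "longer" than the detour via b.
direct = lp_distance(a, c, 0.5)                            # 4.0
detour = lp_distance(a, b, 0.5) + lp_distance(b, c, 0.5)   # 2.0
print(direct > detour)               # True: triangle inequality violated
```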

  9. Correlation dissimilarity*. Definition of the Pearson coefficient, -1 <= Corr <= 1. Correlation dissimilarity = (1 - Corr(X,Y))/2 >= 0; also absolute correlation dissimilarity = 1 - |Corr(X,Y)| >= 0. *Not to be confused with distance correlation, dCor!
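Both dissimilarities can be sketched directly from the Pearson coefficient. The implementation below uses the usual sample formula for Pearson correlation. The function names and the sample vectors are illustrative, and constant vectors (zero variance) are not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def corr_dissimilarity(x, y):
    """(1 - Corr) / 2: about 0 for Corr = 1, about 1 for Corr = -1."""
    return (1.0 - pearson(x, y)) / 2.0

def abs_corr_dissimilarity(x, y):
    """1 - |Corr|: about 0 for perfectly (anti)correlated vectors."""
    return 1.0 - abs(pearson(x, y))

x = [1.0, 2.0, 3.0, 4.0]
up = [2.0, 4.0, 6.0, 8.0]      # perfectly correlated with x
down = [8.0, 6.0, 4.0, 2.0]    # perfectly anti-correlated with x

print(corr_dissimilarity(x, up), corr_dissimilarity(x, down))
print(abs_corr_dissimilarity(x, down))
```

The absolute variant treats anti-correlated vectors as similar, which the first variant does not.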

  10. Cosine similarity and angular distance. CosSim(A, B)
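The formula on this slide did not survive extraction. The sketch below uses the standard definition of cosine similarity (dot product divided by the product of the norms). Dividing the angle by pi to obtain an angular distance in [0, 1] is one common convention and is an assumption here, not something the slide confirms.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angular_distance(a, b):
    """Angle between a and b divided by pi, so the result lies in [0, 1]."""
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))   # guard against rounding
    return math.acos(cos) / math.pi

print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))   # orthogonal vectors
print(angular_distance((1.0, 0.0), (0.0, 1.0)))    # a quarter turn
print(angular_distance((1.0, 0.0), (-1.0, 0.0)))   # opposite directions
```

Cosine similarity ignores vector length: `(1, 0)` and `(2, 0)` have similarity 1.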

  11. Distance matrix
      • Non-negative, symmetric
      • Convenient for searching neighbours
      • Inconvenient to store because the number of elements grows quadratically: 100000 * 100000 * 2 bytes (float16 size) = 20 GB of RAM
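The storage arithmetic can be reproduced in a few lines. The sketch also shows that, because the matrix is symmetric with a zero diagonal, keeping only the entries above the diagonal roughly halves the cost. That condensed layout is an addition for illustration, not something the slide proposes. Function names are hypothetical.

```python
import math

def distance_matrix(points):
    """Full n x n matrix of Euclidean distances (list of lists)."""
    return [[math.dist(p, q) for q in points] for p in points]

def full_matrix_bytes(n, bytes_per_value=2):
    """Memory for a dense n x n matrix (2 bytes per value = float16)."""
    return n * n * bytes_per_value

def condensed_matrix_bytes(n, bytes_per_value=2):
    """Memory if only the n*(n-1)/2 entries above the diagonal are kept."""
    return n * (n - 1) // 2 * bytes_per_value

n = 100_000
print(full_matrix_bytes(n) / 1e9)        # 20.0 (GB)
print(condensed_matrix_bytes(n) / 1e9)   # just under 10 GB

D = distance_matrix([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
print(D[0][1], D[1][0])                  # symmetric entries
```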

  12. k Nearest Neighbor (kNN) graph

  13. k Nearest Neighbor (kNN) graph is directed!
      • In higher-dimensional spaces, the asymmetry of kNN graphs increases
      • This can lead to hubness (points which are neighbours of many (>> k) other points)
      • Hubness might be detrimental for machine learning methods based on kNN graphs
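A tiny example makes the directedness concrete. With three points on a line and k = 1, the outlying point selects the middle point as its neighbour, but not the other way round. The in-degree count gives a simple way to look for hubs. The function names and the 1D data are illustrative assumptions.

```python
def knn_graph(points, k):
    """Directed kNN graph as {i: [indices of the k nearest other points]}."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: abs(points[j] - p))   # 1D points for simplicity
        graph[i] = others[:k]
    return graph

def in_degrees(graph):
    """How many times each node is chosen as somebody's neighbour."""
    counts = {i: 0 for i in graph}
    for neighbours in graph.values():
        for j in neighbours:
            counts[j] += 1
    return counts

points = [0.0, 1.0, 3.0]
graph = knn_graph(points, k=1)
print(graph)               # {0: [1], 1: [0], 2: [1]}
print(in_degrees(graph))   # {0: 1, 1: 2, 2: 0}
```

The edge 2 -> 1 has no reverse edge 1 -> 2, and node 1 already has in-degree 2 with k = 1. In high-dimensional data the in-degree distribution can become far more skewed.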

  14. Mutual Nearest Neighbours (MNN) graph. Mutually nearest neighbours

  15. Mutual Nearest Neighbours (MNN) graph. Matching objects in two datasets

  16. Mutual Nearest Neighbours (MNN) graph. Matching objects in two datasets (figure labels: Mismatch, Match)
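Matching objects across two datasets by mutual nearest neighbours can be sketched as follows, assuming k = 1 and 1D data for brevity. A pair is kept only when each object is the other's nearest neighbour. The function names and both datasets are made up for the example.

```python
def nearest_index(value, candidates):
    """Index of the candidate closest to `value` (1D data for simplicity)."""
    return min(range(len(candidates)), key=lambda j: abs(candidates[j] - value))

def mutual_nearest_pairs(a, b):
    """Pairs (i, j) where b[j] is nearest to a[i] AND a[i] is nearest to b[j]."""
    a_to_b = [nearest_index(x, b) for x in a]
    b_to_a = [nearest_index(y, a) for y in b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

dataset_a = [0.0, 1.0, 5.0]
dataset_b = [0.1, 4.0]

print(mutual_nearest_pairs(dataset_a, dataset_b))   # [(0, 0), (2, 1)]
```

Here `dataset_a[1]` points to `dataset_b[0]`, but that object prefers `dataset_a[0]`, so the pair is a mismatch and is dropped.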

  17. Metric learning
      • Example: learn the distance function from labeled data
      • By choosing the distance: make red lines closer! Make blue lines more distant!
      (Figure labels: Label Orange, Label Green, Label Blue)
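One very simple way to "choose the distance" from labels is to weight each feature by the inverse of its within-class variance, so that features that vary a lot inside a class count less. This is a deliberately minimal stand-in for metric learning, not the method shown on the slide. The data, labels and function names are hypothetical, and features with zero within-class variance are not handled.

```python
import math

def within_class_variances(points, labels):
    """Per-feature variance around each class mean, pooled over all points."""
    dims = len(points[0])
    sums = [0.0] * dims
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        for d in range(dims):
            mean = sum(p[d] for p in members) / len(members)
            sums[d] += sum((p[d] - mean) ** 2 for p in members)
    return [s / len(points) for s in sums]

def weighted_distance(x, y, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# Feature 0 separates the classes; feature 1 is mostly within-class noise.
points = [(0.0, 0.0), (0.2, 4.0), (1.0, 0.5), (1.2, 4.5)]
labels = ["orange", "orange", "green", "green"]

weights = [1.0 / v for v in within_class_variances(points, labels)]

same = (points[0], points[1])    # same label
diff = (points[0], points[2])    # different labels
print(math.dist(*same), math.dist(*diff))                    # plain Euclidean
print(weighted_distance(*same, weights),
      weighted_distance(*diff, weights))                     # learned weights
```

Under plain Euclidean distance the same-label pair is farther apart than the different-label pair. With the learned weights the order is reversed.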

  18. Dimensionality curse, measure concentration. Point neighborhood in multidimensional space of radius ε·D, ε << 1, where D = mean distance between points (figure panels: low-dimensional case, high-dimensional case)
      • When number of features >> number of objects
      • When the intrinsic dimension of the data > log2(number of objects)
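Measure concentration can be observed with a short simulation. For points drawn uniformly from the unit cube, the spread of pairwise distances relative to their mean shrinks as the dimension grows, so a neighbourhood of radius ε·D with small ε tends to be empty. The uniform distribution, the sample size and the function name `relative_spread` are assumptions made for this sketch.

```python
import math
import random

def relative_spread(n_points, dim, seed=0):
    """Std / mean of all pairwise Euclidean distances between uniform random points."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n_points) for j in range(i + 1, n_points)]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean

# The ratio drops as the dimension increases: distances concentrate
# around their mean and "near" versus "far" loses contrast.
for dim in (2, 20, 200):
    print(dim, round(relative_spread(60, dim), 3))
```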
