SLIDE 1

MDS Embedding

MDS takes as input a distance matrix D, containing all N × N pairwise distances between elements xi, and embeds the elements in a d-dimensional space such that the pairwise distances Dij are preserved as much as possible by ||xi − xj|| in the embedded space.
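As a minimal sketch of what such an embedding looks like in code, the block below uses classical (Torgerson) MDS, which is only one of several MDS variants and is not the SMACOF solver discussed later in these slides. The function names `classical_mds` and `pairwise_distances` are illustrative choices, not taken from the presentation.

```python
import numpy as np

def classical_mds(D, d):
    """Embed N items in d dimensions from an N x N distance matrix D.

    Classical (Torgerson) MDS: double-center the squared distances and
    keep the top-d eigenvectors. Illustrative sketch only.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of the centered points
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]            # indices of the d largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)       # guard against small negative values
    return V[:, idx] * np.sqrt(w_top)

def pairwise_distances(X):
    """Euclidean distances ||xi - xj|| between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

When D comes from points that really live in d dimensions and contains no noise, `pairwise_distances(classical_mds(D, d))` reproduces D up to rounding; with noisy or outlier distances it generally does not.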

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

128 dim space visualized by t-SNE

Joint Embeddings of Shapes and Images

SLIDE 6
SLIDE 7

Image based Shape Retrieval

SLIDE 8

Shape based Image Retrieval

SLIDE 9

Cross-View Image Retrieval

SLIDE 10

MDS Embedding

SLIDE 11

Common MDS methods do not handle outliers

Figure panels: input, SMACOF, Sammon

SLIDE 12

Two outlier distances lead to significant distortion in the embedding

In many real-world scenarios, input distances may be noisy or contain outliers, due to malicious acts, system faults, or erroneous measurements.

SLIDE 14

Least squares fitting

SLIDE 16

RANSAC

  • Generate Lines using Pairs of Points.
  • Count number of points within ε of line.
  • Pick the best line.
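The three steps above can be sketched as follows. This is a minimal illustration for 2D line fitting, assuming random sampling of point pairs; the function name `ransac_line` and the parameters `n_iters` and `seed` are illustrative and not from the slides.

```python
import numpy as np

def ransac_line(points, eps, n_iters=200, seed=0):
    """Fit a line to 2D points with RANSAC (illustrative sketch).

    Returns ((a, b, c), mask): the line a*x + b*y + c = 0 with unit
    normal (a, b), and a boolean mask of the points within eps of it.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_line, best_mask = None, None
    for _ in range(n_iters):
        # Generate a candidate line from a random pair of points.
        i, j = rng.choice(len(pts), size=2, replace=False)
        p, q = pts[i], pts[j]
        direction = q - p
        norm = np.hypot(direction[0], direction[1])
        if norm == 0:
            continue                      # coincident points define no line
        a, b = -direction[1] / norm, direction[0] / norm
        c = -(a * p[0] + b * p[1])
        # Count the points within eps of the candidate line.
        mask = np.abs(pts @ np.array([a, b]) + c) < eps
        # Keep the line supported by the most points.
        if best_mask is None or mask.sum() > best_mask.sum():
            best_line, best_mask = (a, b, c), mask
    return best_line, best_mask
```

A pair of points is enough to define a candidate line, which is why a small random sample has a good chance of being outlier-free.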

SLIDE 17

RANSAC

Sadly, RANSAC cannot be applied to MDS directly: a lot of data is needed to generate an embedding, so almost every sample will still contain outliers.

SLIDE 18

Forero and Giannakis method

λ is the Lasso regularization parameter (the larger it is, the fewer outliers are reported). The non-zero entries of O represent the outlier pairs.

  • Tuning the regularization parameter is not a simple task.
  • There are N×N unknowns instead of just d×N, so the problem is significantly harder to solve accurately and is very sensitive to the initial guess.
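The equation these annotations belonged to did not survive in the slide text. As a hedged sketch only, assuming the usual ℓ1-relaxed (Lasso) form in which O holds one correction term per pair of elements, an objective of this kind can be written as:

```latex
\min_{X,\,O} \; \sum_{i<j} \bigl( D_{ij} - \|x_i - x_j\| - O_{ij} \bigr)^2 \;+\; \lambda \sum_{i<j} |O_{ij}|
```

Under this assumed form, a larger λ penalizes non-zero entries of O more heavily, so fewer pairs are marked as outliers, and the optimization runs over both the d×N coordinates in X and the N×N entries of O.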

SLIDE 19

Different values of λ applied to the same dataset with the same initial guess lead to different embedding qualities.

SLIDE 20

The same λ applied to the same dataset with different initial guesses yields different embedding qualities.

SLIDE 21

This graph presents the number of non-zero elements in O (which represent outliers) as a function of λ. The three plots were generated using different initial guesses that were uniformly sampled.

FG12 method is overly sensitive to the initial guess.

SLIDE 22

Embed and remove pairs which are overly stressed…

Sadly, the overly stressed edges are not necessarily outliers (for example, a long edge that became a short one can cause many short edges to deform in the embedding). Other stress weightings also have their shortcomings; we tested that method for a while.

SLIDE 23

Geometric Reasoning

An outlier distance tends to break many triangles. We detect those outliers and filter them.

SLIDE 24

Broken Triangles

For a triangle with edge lengths d1, d2 and d3, where d3 is the longest: if d3 > d1 + d2, then the triangle is broken.
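A brute-force sketch of this test, assuming "broken" means that the triangle inequality is violated (the longest edge exceeds the sum of the other two). It counts, for every pair, how many broken triangles the pair takes part in. The function name `broken_triangle_counts` is illustrative, and the O(N³) loop is meant for clarity rather than speed.

```python
import numpy as np
from itertools import combinations

def broken_triangle_counts(D):
    """Count, for each pair (i, j), the broken triangles it belongs to.

    A triangle (i, j, k) is treated as broken when its longest edge is
    longer than the sum of the other two (triangle inequality violated).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    counts = np.zeros((n, n), dtype=int)
    for i, j, k in combinations(range(n), 3):
        d1, d2, d3 = sorted((D[i, j], D[i, k], D[j, k]))
        if d3 > d1 + d2:
            # Every edge of a broken triangle gets one vote.
            for a, b in ((i, j), (i, k), (j, k)):
                counts[a, b] += 1
                counts[b, a] += 1
    return counts
```

If a single distance is replaced by a very large value, that pair collects a vote from each triangle it takes part in, while every other edge of those triangles collects only one, which is what makes a histogram of these counts informative.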

SLIDE 25

Broken Triangles

An edge in a broken triangle is not necessarily an outlier.

Not every outlier edge necessarily breaks a triangle.

SLIDE 26

Histogram of Broken Triangles

SLIDE 27

Histogram of Broken Triangles

We set φ to be the smallest value that satisfies the following two requirements:

SLIDE 28

Shepard Diagram

Each point represents a distance. The X-axis represents the input distances and the Y-axis represents the distance in the embedding result.
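The data behind such a diagram can be computed as in the sketch below, assuming D is the symmetric input distance matrix and X holds the embedded coordinates, one row per element. The function name `shepard_points` is illustrative.

```python
import numpy as np

def shepard_points(D, X):
    """Return (input distances, embedded distances) for every pair i < j."""
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(D.shape[0], k=1)          # each unordered pair once
    embedded = np.linalg.norm(X[i] - X[j], axis=1)   # ||xi - xj|| in the embedding
    return D[i, j], embedded
```

Plotting the first array against the second gives one point per distance; pairs whose distance is preserved exactly fall on the diagonal.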

SLIDE 29

The red dots are the distances classified as outliers. Some of them lie on the diagonal; those are the false positives.

SLIDE 30

Precision and Recall
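For reference, precision and recall of an outlier classification can be computed as in this minimal sketch, assuming a ground-truth outlier mask is available (as in a synthetic experiment). The function name `precision_recall` and the mask layout are illustrative.

```python
import numpy as np

def precision_recall(predicted, actual):
    """Precision and recall of a boolean outlier classification."""
    predicted = np.asarray(predicted, dtype=bool)
    actual = np.asarray(actual, dtype=bool)
    tp = np.sum(predicted & actual)      # outliers correctly flagged
    fp = np.sum(predicted & ~actual)     # inliers wrongly flagged (false positives)
    fn = np.sum(~predicted & actual)     # outliers that were missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)
```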

SLIDE 31

Threshold Performance

The outlier detection rate as a function of the shrinkage or enlargement of the outliers relative to the ground-truth value. Edges that are strongly deformed (either squeezed or enlarged) are likely to be detected. Note: the X-axis is logarithmic: log2(Dout / DGT).

SLIDE 32

Qualitative Comparison

A comparison between SMACOF and our method as a function of the outlier rate. Up to an outlier rate of 22%, our method performs better.

S_{ij} = \log \frac{\|X_i - X_j\|}{D_{ij}}, \qquad \mathrm{Score} = \sum_{i,j} S_{ij}

SLIDE 33

The embedding of a 'PLUS'-shaped dataset with 10% outliers, and a 'SPIRAL'-shaped dataset with 15% outliers.

(a,c) SMACOF (b,d) Our technique.

SLIDE 34

128 US Cities

Two-dimensional embedding of SGB128 distances with 10% outliers. The green dots are the ground-truth locations and the magenta dots represent the embedded points. (a) SMACOF (b) Our filtering technique.

SLIDE 35

Protein Dataset

Average cluster index value of 10 executions. The embedding dimension is set to 6, since for lower dimensions SMACOF fails due to co-located points.

SLIDE 36

Outlier Detection for Robust Multi-dimensional Scaling

Thank You

SLIDE 37

Outlier Detection for Robust Multi-dimensional Scaling

Leonid Blouvshtein and Daniel Cohen-Or
