SLIDE 1

Crafting and learning for image matching

presented by Dmytro Mishkin

joint work with

Anastasia Mishchuk, Milan Pultar, Filip Radenovic, Daniel Barath, Michal Perdoch, Jiri Matas

2019.07.06, Odesa, EECVC

slide-2
SLIDE 2

What is image matching?

  • The task is to find correspondences between pixels in two images and/or the geometric relation between the camera poses

  • A special version of it is also known as “wide baseline stereo”:
  • large changes in viewpoint, illumination, time & occlusions, modality

2019.07.06, Odesa, EECVC 2

slide-3
SLIDE 3

Where is image matching useful? 3D reconstruction & SLAM

  • R. Mur-Artal and J. D. Tardós. ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras, arXiv 2016

  • J. L. Schönberger and J.-M. Frahm, Structure-from-Motion Revisited, CVPR 2016 (COLMAP)

3 2019.07.06, Odesa, EECVC

Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich MagicLeap SLAM

slide-4
SLIDE 4

Image retrieval 2.0

  • F. Radenovic, series of works

Google Landmark Retrieval challenge 2019 winner

Where is image matching useful? Image retrieval

slide-5
SLIDE 5

What is NOT the topic of my talk

  • Semantic correspondences (the object/scene is not the same)

2019.07.06, Odesa, EECVC 5

Images from Aberman et al. “Neural Best-Buddies: Sparse Cross-Domain Correspondence”, SIGGRAPH 2018

slide-6
SLIDE 6

What is NOT the topic of my talk

  • Short baseline stereo (wait for Anastasiia Mishchuk's talk at 17:20)
  • Optimization methods for stereo
  • see talks from previous EECVCs:

Alexander Shekhovtsov (2017) and Tolga Birdal (2018)

2019.07.06, Odesa, EECVC 6

slide-7
SLIDE 7

Wide baseline stereo pipeline

7 2019.07.06, Odesa, EECVC

[Pipeline diagram: Detector → Measurement region selector → Descriptor → Matching → Geometrical verification (RANSAC)]

Image credit: Andrea Vedaldi, ICCVW 2017

Single feature visualization

slide-8
SLIDE 8

Toy example for illustration: matching with OpenCV SIFT

2019.07.06, Odesa, EECVC 8

Try yourself: https://github.com/ducha-aiki/matching-strategies-comparison
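For illustration, a minimal sketch of this toy pipeline with the OpenCV Python API (image paths and thresholds are placeholders; SIFT lives in the main cv2 module from OpenCV 4.4 on, earlier in opencv-contrib):

import cv2
import numpy as np

# Load the two images to be matched (placeholder file names)
img1 = cv2.imread('img1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('img2.jpg', cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute descriptors
sift = cv2.SIFT_create()
kps1, descs1 = sift.detectAndCompute(img1, None)
kps2, descs2 = sift.detectAndCompute(img2, None)

# 2-NN matching + Lowe's ratio test (the SNN strategy, see the matching slides)
bf = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in bf.knnMatch(descs1, descs2, k=2) if m.distance < 0.8 * n.distance]

# Geometric verification: robustly fit a homography with RANSAC
src = np.float32([kps1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kps2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 1.0)
print('inliers:', int(inlier_mask.sum()))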

slide-9
SLIDE 9

Toy example for illustration: matching with OpenCV SIFT

2019.07.06, Odesa, EECVC 9

Recovered 1st-to-2nd image projection, ground-truth 1st-to-2nd image projection, inlier correspondences

slide-10
SLIDE 10

Geometric verification (RANSAC)

2019.07.06, Odesa, EECVC 10


slide-11
SLIDE 11

Homography: planar surface/static camera

2019.07.06, Odesa, EECVC 11 Image credit: forums.fast.ai

Planar surface or static camera → use homography
Image with a dominant plane → use homography
Not sure what to use? → try homography first.

slide-12
SLIDE 12

Fundamental matrix: general two-view case

2019.07.06, Odesa, EECVC 12

  • General two-view geometry in a static scene. A corresponding point lies somewhere on a line in the other image. Where on the line depends on the (unknown) depth
  • Weaker constraint than a homography
  • Still rigid (no motion in the scene assumed)

Image credit: https://en.wikipedia.org/wiki/Epipolar_geometry

slide-13
SLIDE 13

RANSAC: fitting the data with gross outliers

  • What it is: fit the model to random minimal samples and keep the hypothesis with the most inliers, so that gross outliers do not spoil the estimate

Image credit: https://scipy-cookbook.readthedocs.io/items/RANSAC.html

OpenCV functions: cv2.findHomography(), cv2.findFundamentalMat(). We will soon publish a Python package which is 2–5 times faster and has additional tricks inside: https://github.com/ducha-aiki/pyransac (save this link; the repo is private now, will be cleaned up and opened next week)
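A hedged usage sketch of the two OpenCV calls (pts1, pts2 are assumed to be N×2 float32 arrays of putative correspondences; the thresholds are illustrative):

import cv2

def verify(pts1, pts2):
    # Planar scene or rotating camera: fit a homography with RANSAC
    H, mask_H = cv2.findHomography(pts1, pts2, cv2.RANSAC, 1.0)
    # General rigid two-view geometry: fit a fundamental matrix with RANSAC
    F, mask_F = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    return H, mask_H, F, mask_F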

slide-14
SLIDE 14

Pitfalls and solutions: homography

  • Wrong geometry estimated because of dirty correspondences

2019.07.06, Odesa, EECVC 14

OpenCV finds 31 wrong inliers in 0.018 s. CMP RANSAC finds 6 wrong inliers in 0.004 s. Note the same pattern in img1: 3 correspondences on a line plus a group; homography RANSAC is prone to such cases.

slide-15
SLIDE 15

Pitfalls and solutions: homography

  • Solution: 2-way error metric

2019.07.06, Odesa, EECVC 15

CMP RANSAC + transfer check: finds 48 correct inliers in 0.005 s. H(2→1) = H(1→2)⁻¹

Check that the number of inliers is consistent in the opposite direction. The Python CMP RANSAC package will be available soon.

  • D. Mishkin, J. Matas and M. Perdoch. MODS: Fast and Robust Method for Two-View Matching, CVIU 2015,
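A rough sketch of the transfer-check idea in plain OpenCV/numpy (this is not the CMP RANSAC code; the consistency thresholds are assumptions):

import cv2
import numpy as np

def two_way_homography(pts1, pts2, thresh=1.0):
    # Estimate the homography in both directions
    H12, mask12 = cv2.findHomography(pts1, pts2, cv2.RANSAC, thresh)
    H21, mask21 = cv2.findHomography(pts2, pts1, cv2.RANSAC, thresh)
    if H12 is None or H21 is None:
        return None
    # Transfer check: H(2->1) composed with H(1->2) should be close to identity (up to scale)
    C = H21 @ H12
    err = np.linalg.norm(C / C[2, 2] - np.eye(3))
    # Inlier counts should also agree between the two directions
    n1, n2 = int(mask12.sum()), int(mask21.sum())
    consistent = err < 0.1 and abs(n1 - n2) <= 0.2 * max(n1, n2)
    return H12 if consistent else None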
slide-16
SLIDE 16

Pitfalls and solutions: fundamental matrix

  • F is too permissive (a point-to-line constraint)

2019.07.06, Odesa, EECVC 16

slide-17
SLIDE 17

Pitfalls and solutions: fundamental matrix

  • LAF check: remember that a local feature is an oriented circle or ellipse, not just a point.
  • Check whether additional points on the circle are consistent with the geometry

2019.07.06, Odesa, EECVC 17

  • D. Mishkin, J. Matas and M. Perdoch. MODS: Fast and Robust Method for Two-View Matching, CVIU 2015,
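A numpy sketch of such a LAF check (not the MODS implementation; the 2×3 frame parameterization and the 3 px threshold are assumptions):

import numpy as np

def laf_points(A):
    # A: 2x3 local affine frame [a11 a12 x; a21 a22 y].
    # Return the frame center plus two extra points of the frame (unit circle mapped by A).
    pts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]).T
    return (A @ pts).T                      # 3 points, each (x, y)

def sym_epipolar_dist(F, x1, x2):
    # Symmetric epipolar distance (in pixels) of a single correspondence
    p1, p2 = np.append(x1, 1.0), np.append(x2, 1.0)
    l2, l1 = F @ p1, F.T @ p2               # epipolar lines in image 2 and image 1
    val = abs(p2 @ F @ p1)
    return 0.5 * (val / np.hypot(l2[0], l2[1]) + val / np.hypot(l1[0], l1[1]))

def laf_consistent(F, A1, A2, thresh=3.0):
    # LAF check: all three frame points must agree with F, not only the center
    return all(sym_epipolar_dist(F, x1, x2) < thresh
               for x1, x2 in zip(laf_points(A1), laf_points(A2)))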
slide-18
SLIDE 18

Matching strategies

2019.07.06, Odesa, EECVC 18


slide-19
SLIDE 19

Nearest neighbor (1NN) strategy

2019.07.06, Odesa, EECVC 20

Features from img1 are matched to features from img2. Note that this matching is asymmetric and allows “many-to-one” matches.

slide-20
SLIDE 20

Nearest neighbor (NN) strategy

2019.07.06, Odesa, EECVC 21

Features from img1 are matched to features from img2. OpenCV RANSAC failed to find a good model with NN matching. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow.

slide-21
SLIDE 21

Mutual nearest neighbor (MNN) strategy

2019.07.06, Odesa, EECVC 22

Features from img1 are matched to features from img2. Only cross-consistent (mutual NN) matches are retained.

slide-22
SLIDE 22

Mutual nearest neighbor (MNN) strategy

2019.07.06, Odesa, EECVC 23

OpenCV RANSAC failed to find a good model with MNN matching. No one-to-many connections, but still bad. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow. Features from img1 are matched to features from img2; only cross-consistent (mutual NN) matches are retained.

slide-23
SLIDE 23

Second nearest neighbor ratio (SNN) strategy

2019.07.06, Odesa, EECVC 24

[Diagram: features from img1 matched to features from img2; matches with 1stNN/2ndNN distance ratio > 0.8 are dropped, those with ratio < 0.8 are kept]

  • We look for the 2 nearest neighbors
  • If both are too similar (1stNN/2ndNN distance ratio > 0.8) → discard
  • If the 1st NN is much closer (1stNN/2ndNN distance ratio ≤ 0.8) → keep

  • D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004
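The NN, MNN and SNN strategies above, sketched in OpenCV terms (descs1/descs2 are the descriptor arrays of the two images; the 0.8 ratio follows Lowe):

import cv2

def match_1nn(descs1, descs2):
    # Plain nearest neighbor: every feature from img1 gets its closest match in img2
    return cv2.BFMatcher(cv2.NORM_L2).match(descs1, descs2)

def match_mnn(descs1, descs2):
    # Mutual NN: keep a match only if it is the nearest neighbor in both directions
    return cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(descs1, descs2)

def match_snn(descs1, descs2, ratio=0.8):
    # Lowe's ratio test (SNN): keep the 1st NN only if it is clearly closer than the 2nd NN
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(descs1, descs2, k=2)
    return [m for m, n in knn if m.distance < ratio * n.distance]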
slide-24
SLIDE 24

2019.07.06, Odesa, EECVC 25

[Diagram: matches with 1stNN/2ndNN ratio > 0.8 dropped, ratio < 0.8 kept]

With SNN matching, OpenCV RANSAC found a roughly correct model. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow.

Second nearest neighbor ratio (SNN) strategy

slide-25
SLIDE 25

1st geometrically inconsistent nearest neighbor ratio (FGINN) strategy

2019.07.06, Odesa, EECVC 26

The SNN ratio is good, but what about symmetric structures or too closely detected features? The ratio test will kill them. Solution: take as the 2nd nearest neighbor the one that is far enough from the 1st nearest.

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015

slide-26
SLIDE 26

1st geometrically inconsistent nearest neighbor ratio (FGINN) strategy

2019.07.06, Odesa, EECVC 27

The SNN ratio is good, but what about symmetric structures or too closely detected features? The ratio test will kill them. Solution: take as the 2nd nearest neighbor the one that is far enough from the 1st nearest.

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015
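A brute-force numpy sketch of FGINN (not the MODS code; the 10 px inconsistency radius and the 0.8 ratio are the commonly used values, treated as assumptions here):

import numpy as np

def fginn_match(descs1, descs2, kps2_xy, ratio=0.8, min_dist_px=10.0):
    # descs1: N1 x D, descs2: N2 x D, kps2_xy: N2 x 2 keypoint coordinates in img2
    d = np.linalg.norm(descs1[:, None, :] - descs2[None, :, :], axis=2)          # descriptor distances
    spatial = np.linalg.norm(kps2_xy[:, None, :] - kps2_xy[None, :, :], axis=2)  # keypoint distances in img2
    matches = []
    for i in range(descs1.shape[0]):
        order = np.argsort(d[i])
        first = order[0]
        # "Second" NN = closest descriptor whose keypoint is far enough from the 1st NN keypoint
        far = [j for j in order[1:] if spatial[first, j] >= min_dist_px]
        if not far or d[i, first] < ratio * d[i, far[0]]:
            matches.append((i, int(first)))
    return matches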

slide-27
SLIDE 27

SNN vs FGINN

2019.07.06, Odesa, EECVC 28

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015

SNN: roughly correct FGINN: more correspondences, better geometry found

slide-28
SLIDE 28

Symmetrical FGINN

2019.07.06, Odesa, EECVC 29

Recall that FGINN is still asymmetric: matching (Img1 → Img2) ≠ (Img2 → Img1). We can do both (Img1 → Img2) and (Img2 → Img1) and keep either all FGINN matches (union)…

  • …or only the cross-consistent FGINN matches (intersection)
slide-29
SLIDE 29

Learned filtering strategy (CVPR 2018)

Yi et al., Learning to Find Good Correspondences, CVPR 2018. https://arxiv.org/abs/1711.05971

Input: matches (x1, y1, x2, y2) [ N x 4 ] Output: scores [ N x 1 ]

slide-30
SLIDE 30

Evaluation on IMW2019 data

2019.07.06, Odesa, EECVC 31

CVPR 2019 competition: https://image-matching-workshop.github.io/ Stereo evaluation: features ⇨ matching ⇨ OpenCV RANSAC ⇨ pose estimation

Participants

  • Organizers

Metric: # of precise enough recovered camera poses (mAP @ 15°)

slide-31
SLIDE 31

Evaluation on IMW2019 data

2019.07.06, Odesa, EECVC 32

  • NN – are you kidding? Never use it alone
  • SNN is simple and good
  • FGINN is always a bit better
  • Symmetrical FGINN rocks
  • Learning is not that powerful (yet?)
slide-32
SLIDE 32

Descriptor: HardNet (NIPS, 2017)

Mishchuk et al. Working hard to know your neighbor’s margins: Local descriptor learning loss. NIPS 2017

33 2019.07.06, Odesa, EECVC


slide-33
SLIDE 33
  • Q1: How to find correct correspondences?
  • Q2: How to filter out features, which do not have a correspondence?
  • A1: Nearest neighbor by descriptor distance
  • A2: Threshold the second-to-first nearest ratio (SNN)

HardNet: let's use it for training CNNs!

34

Classical way to select good correspondences

2019.07.06, Odesa, EECVC

slide-34
SLIDE 34

Architecture: deep, VGGNet style

  • Adopted from the previous state-of-the-art L2Net descriptor, Tian et al. (CVPR 2017)

  • Vanilla CNN: Convolution + BatchNorm + ReLU
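A PyTorch sketch of such an L2Net/HardNet-style network (layer widths and the 32×32 patch input follow the L2Net configuration; treat the exact settings as an assumption, not the authors' code):

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
                         nn.BatchNorm2d(cout),
                         nn.ReLU(inplace=True))

class HardNetLike(nn.Module):
    # 32x32 grayscale patch -> 128-D L2-normalized descriptor
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(1, 32), conv_bn_relu(32, 32),
            conv_bn_relu(32, 64, stride=2), conv_bn_relu(64, 64),
            conv_bn_relu(64, 128, stride=2), conv_bn_relu(128, 128),
            nn.Conv2d(128, 128, 8, bias=False),   # global 8x8 conv -> 1x1 spatial size
            nn.BatchNorm2d(128))

    def forward(self, patches):                    # patches: B x 1 x 32 x 32
        x = self.features(patches).view(patches.size(0), -1)
        return F.normalize(x, p=2, dim=1)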

36

2019.07.06, Odesa, EECVC

slide-35
SLIDE 35

37

Sampling: positives – random; negatives – hardest-in-batch
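A PyTorch sketch of the hardest-in-batch triplet margin loss (row i of anchor and positive are descriptors of the same 3D point; the margin of 1.0 is the usual choice, assumed here):

import torch

def hardest_in_batch_loss(anchor, positive, margin=1.0):
    # anchor, positive: B x D L2-normalized descriptors; row i of both comes from the same point
    dist = torch.cdist(anchor, positive)                              # B x B distance matrix
    pos = dist.diag()                                                  # distances of matching pairs
    off = dist + 1e6 * torch.eye(dist.size(0), device=dist.device)    # exclude the matching pairs
    neg = torch.min(off.min(dim=1).values, off.min(dim=0).values)     # hardest negative per pair
    return torch.clamp(margin + pos - neg, min=0).mean()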

2019.07.06, Odesa, EECVC

slide-36
SLIDE 36

HardNet vs SIFT descriptor

2019.07.06, Odesa, EECVC 38

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015

SIFT: 71 inliers HardNet: 121 inliers

slide-37
SLIDE 37

39

Results: HPatches

1.5–2 times better than RootSIFT:

2019.07.06, Odesa, EECVC

slide-38
SLIDE 38

GeoDesc: same architecture, special loss and sampling utilizing 3D reconstruction data.

HardNet training scales well with bigger datasets

Luo et al., ECCV 2018

2019.07.06, Odesa, EECVC

slide-39
SLIDE 39

Descriptor: creating the dataset (CVWW, 2019)

Leveraging Outdoor Webcams for Local Descriptor Learning Milan Pultar, Dmytro Mishkin, Jiří Matas

41 2019.07.06, Odesa, EECVC


slide-40
SLIDE 40
  • Brown dataset
  • HPatches
  • PhotoSynth
  • GL3D

Existing datasets for local descriptor learning

42

slide-41
SLIDE 41
  • 1,128 million (over a billion) images
  • Almost 30K webcams
  • Continuously growing
  • Cameras placed all over the world
  • ~10 TB of data
  • Each camera in one directory

Split further into folders by year

Image timestamp in GMT

GPS info not always available

Archive of Many Outdoor Scenes (AMOS)

43

[Diagram: per-camera directories (Camera 1001, Camera 1002, …), each split into per-year folders (Images - 2010, Images - 2011, Images - 2013, …)]

slide-42
SLIDE 42

AMOS views

44

Good cameras Bad cameras

slide-43
SLIDE 43

Pipeline of AMOS Patches

45

slide-44
SLIDE 44

Camera selection

  • Choose 20 images randomly from each camera
  • Test each image against the selection criteria
  • Keep the camera if 14/20 images pass

46

  • → 474 cameras

Sky segmentation in the wild: An empirical study.

  • R. P. Mihail et al, 2012

(https://github.com/kuangliu/torchcv)

slide-45
SLIDE 45

Appearance clustering

  • Solves data redundancy
  • Use the fc6 layer of an ImageNet-pretrained AlexNet
  • Run K-means in the AlexNet output space
  • Choose the K = 120 most representative images (by looking at the corresponding outputs)
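A sketch of this clustering step using torchvision's AlexNet (weights API of torchvision ≥ 0.13) and scikit-learn K-means as stand-ins; the preprocessing and the exact fc6 extraction in the paper may differ, this only illustrates the idea:

import torch
import torchvision
from sklearn.cluster import KMeans

def representative_images(image_tensors, k=120):
    # image_tensors: N x 3 x 224 x 224 ImageNet-preprocessed images from one camera
    alexnet = torchvision.models.alexnet(weights='IMAGENET1K_V1').eval()
    fc6 = torch.nn.Sequential(alexnet.features, alexnet.avgpool,
                              torch.nn.Flatten(), alexnet.classifier[:2])  # up to the fc6 output
    with torch.no_grad():
        feats = fc6(image_tensors).numpy()
    km = KMeans(n_clusters=k, n_init=10).fit(feats)
    # For every cluster, keep the image closest to its center
    return km.transform(feats).argmin(axis=0)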

47

  • → 474 cameras, 120 images each

Imagenet classification with deep convolutional neural networks.

  • A. Krizhevsky et al, 2012
slide-46
SLIDE 46

Viewpoint reclustering

  • Solves switching of cameras between views
  • Uses MODS (image matching) in a greedy algorithm:
  • 1. Pick a reference image
  • 2. Find matching pairs
  • 3. Create a new view; exclude its images from the original sequence
  • 4. If the original sequence is not empty: repeat

  • Keep the biggest view from each camera, 50 images each (if available)

48

  • → 273 views
slide-47
SLIDE 47

Registration

  • Results are still not satisfactory
  • Why? MODS often outputs a homography valid only for a small part of the image, so a final manual check is needed
  • → Use GDB-ICP
  • In each view:

Run registration on pairs of images

If a single pair fails → remove the whole view

49

  • → 151 registered views
slide-48
SLIDE 48

Manual pruning

  • Several problems are not detected by the previous steps:

Dynamic scenes

Cloud-dominated scenes

Views with very similar content

50

  • → 27 registered views, 50 images each
slide-49
SLIDE 49

Patch selection

51

  • Apply masks (crop out text etc.)
  • Sampling of centers (response function)
  • Random rotation (any angle)
  • Random scale
slide-50
SLIDE 50

Experiments: (de)registration

  • Displace each patch randomly
  • Observe the influence on precision
  • Result: precise registration is important

  • mAP - mean average precision

52

HPatches matching task, full split

slide-51
SLIDE 51

Experiments: batch composition

  • We use the hardest-in-batch triplet margin loss
  • The composition of a batch influences precision
  • Idea: choose a subset of views as the source of patches
  • Intuition: tough pairs often come from the same image
  • → Improvement

53

HPatches matching task, full split

slide-52
SLIDE 52

Evaluation

  • New state-of-the-art in matching under illumination changes (to the best of our knowledge)
  • Outperforms the recently proposed HardNetPS on the full split
  • We propose the AMOS Patches test split for evaluation of robustness to lighting- and season-related conditions

55

slide-53
SLIDE 53

Measurement region selector: orientation

56 2019.07.06, Odesa, EECVC


slide-54
SLIDE 54

Which patch should we describe?

2019.07.06, Odesa, EECVC 57

Detector gives: x, y, scale. Should we rotate the patch? Should we deform the patch? Handcrafted: dominant gradient orientation. Learned orientation: CNN.

Yi et al. Learning to Assign Orientations to Feature Points CVPR 2016

  • D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004
slide-55
SLIDE 55

If images are upright for sure: don't detect orientation

2019.07.06, Odesa, EECVC 58

DoG + HardNet matches + FGINN union + RANSAC. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow. Dominant gradient orientation: 123 inliers. Learned orientation: 140 inliers. Constant orientation: 181 inliers.

slide-56
SLIDE 56

AffNet (ECCV 2018) Measurement region selector

59 2019.07.06, Odesa, EECVC


slide-57
SLIDE 57

AffNet: learning measurement region

Mishkin et al. Repeatability Is Not Enough: Learning Affine Regions via Discriminability. ECCV 2018

slide-58
SLIDE 58

Does AffNet help? Yes, if the problem is hard

2019.07.06, Odesa, EECVC 61

FGINN union + RANSAC. Found 1st image projection: blue, ground truth: green , inlier correspondences: yellow

DoG + HardNet 2.0: 123 inliers DoG + AffNet + HardNet 2.0: 165 inliers

slide-59
SLIDE 59

62

  • Find the affine shape that maximizes the difference between positive and hardest-in-batch negative examples
  • Positive-only learning (Yi et al., CVPR 2015) leads to degenerate ellipses
  • Triplet margin (HardNet) – unstable for training affine shape

AffNet: learning measurement region

2019.07.06, Odesa, EECVC

slide-60
SLIDE 60

Local feature detector

2019.07.06, Odesa, EECVC 63


slide-61
SLIDE 61

The detector is often the failure point of the whole process

  • Yet we still use 10–20-year-old methods like SIFT or FAST, because nothing significantly better for practical purposes has been proposed
  • So let's stick to the basics

2019.07.06, Odesa, EECVC 64

Stylianou et al., WACV 2015. Characterizing Feature Matching Performance Over Long Time Periods

slide-62
SLIDE 62

SIFT is the DoG detector + SIFT descriptor

  • Really, there is no such thing as a “SIFT detector”.
  • But everyone is so used to calling DoG “SIFT”

2019.07.06, Odesa, EECVC 65

DoG filter is a simple blob template

https://docs.opencv.org/3.4.3/da/df5/tutorial_py_sift_intro.html

Gaussian scalespace, “stack of gradually smoothed versions” of original image

Detections on synthetic image
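A tiny illustration of the DoG response itself: subtract two Gaussian-smoothed versions of the image; blobs of the matching size light up (the image path and sigmas are placeholders):

import cv2
import numpy as np

img = cv2.imread('img1.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
sigma = 1.6
g1 = cv2.GaussianBlur(img, (0, 0), sigma)
g2 = cv2.GaussianBlur(img, (0, 0), sigma * 2 ** (1.0 / 3))   # next level of the scale space
dog = g2 - g1   # Difference-of-Gaussians: blob-detector response at this scale
# SIFT keypoints are the extrema of such responses across space and scale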

slide-63
SLIDE 63

ORB: FAST detector + BRIEF descriptor

2019.07.06, Odesa, EECVC 66

FAST is a corner detector based on segment test

slide-64
SLIDE 64

Joint detectors and descriptors

SuperPoint (CVPRW 2017) DELF (ICCV 2017) D2Net (CVPR 2019)

2019.07.06, Odesa, EECVC 67


slide-65
SLIDE 65

SuperPoint

2019.07.06, Odesa, EECVC 68

DeTone et al., SuperPoint: Self-Supervised Interest Point Detection and Description CVPRW2017

slide-66
SLIDE 66

DELF

2019.07.06, Odesa, EECVC 69

Noh et al., Large-Scale Image Retrieval with Attentive Deep Local Features, ICCV 2017

“Attention” as weighting for global descriptor

slide-67
SLIDE 67

D2Net

2019.07.06, Odesa, EECVC 70

Dusmanu et al, D2-Net: A Trainable CNN for Joint Description and Detection of Local Features CVPR 2019

slide-68
SLIDE 68

Comparison on toy example

2019.07.06, Odesa, EECVC 71

SuperPoint: 51 inliers DoG + HardNet: 123 inliers D2Net: 26 inliers, incorrect geometry

slide-69
SLIDE 69

All things together

2019.07.06, Odesa, EECVC 72

SIFT + SNN match + OpenCV RANSAC: 27 inliers SIFT + NoOri + HardNet + FGINN union match + CMP RANSAC: 179 inliers

slide-70
SLIDE 70

I really need to match this

  • View synthesis: MODS

2019.07.06, Odesa, EECVC 73

slide-71
SLIDE 71

MODS (controller and preprocessor)

MODS handles angular viewpoint difference up to:

  • 85° for planar scenes
  • 30° for structured
  • D. Mishkin, J. Matas and M. Perdoch.

MODS: Fast and Robust Method for Two-View Matching, CVIU 2015,

[Flowchart: Images → affine view synthesis → Det1-Desc1 / Det2-Desc2 → Match → RANSAC → Match! If not matched, try more view synthesis]

slide-72
SLIDE 72
  • If you DO NOT need correspondences & camera pose → DO NOT use local features.

Use global descriptor (ResNet101 GeM) + fast search (faiss)

  • Step 0: try OpenCV SIFT
  • Use proper RANSAC (private now, will be cleaned up and opened next week)
  • Matching → use FGINN in two-way mode
  • Need to be faster → ORB.
  • Need to be more robust → use SIFT + HardNet 2.0
  • Custom data → train on your own dataset
  • Even more robust → use SIFT + AffNet + HardNet 2.0
  • If images are upright, DO NOT DETECT the ORIENTATION
  • Landmark data → DELF

2019.07.06, Odesa, EECVC 75

Thank you for your attention

ducha_aiki ducha-aiki