SLIDE 1

Crafting and learning for image matching

presented by Dmytro Mishkin

joint work with

Anastasia Mishchuk, Milan Pultar, Filip Radenovic, Daniel Barath, Michal Perdoch, Jiri Matas

2019.07.06, Odesa, EECVC

slide-2
SLIDE 2

What is image matching?

  • The task is to find correspondences between pixels in two images and/or the geometric relation between the camera poses

  • A special version of it is also known as “wide baseline stereo”:
  • large changes in viewpoint, illumination, time & occlusions, modality

2019.07.06, Odesa, EECVC 2

slide-3
SLIDE 3

Where is image matching useful? 3D reconstruction & SLAM

  • R. Mur-Artal and J. D. Tardós. ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras, arXiv 2016

  • J. L. Schönberger and J.-M. Frahm, Structure-from-Motion Revisited, CVPR 2016 (COLMAP)

3 2019.07.06, Odesa, EECVC

Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich MagicLeap SLAM

slide-4
SLIDE 4

Image retrieval 2.0

  • F. Radenovic, series of works

Google Landmark Retrieval challenge 2019 winner

Where is image matching useful? Image retrieval

slide-5
SLIDE 5

What is NOT the topic of my talk

  • Semantic correspondences (the object/scene is not the same)

2019.07.06, Odesa, EECVC 5

Images from Aberman et al. “Neural Best-Buddies: Sparse Cross-Domain Correspondence”, SIGGRAPH 2018

slide-6
SLIDE 6

What is NOT the topic of my talk

  • Short baseline stereo (wait for Anastasiia Mishchuk's talk at 17:20)
  • Optimization methods for stereo
  • see talks from previous EECVCs:

Alexander Shekhovtsov (2017) and Tolga Birdal (2018)

2019.07.06, Odesa, EECVC 6

slide-7
SLIDE 7

Wide baseline stereo pipeline

7 2019.07.06, Odesa, EECVC

[Pipeline diagram: Detector → Measurement region selector → Descriptor → Matching → Geometrical verification (RANSAC)]

Image credit: Andrea Vedaldi, ICCVW 2017

Single feature visualization

slide-8
SLIDE 8

Toy example for illustration: matching with OpenCV SIFT

2019.07.06, Odesa, EECVC 8

Try yourself: https://github.com/ducha-aiki/matching-strategies-comparison
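For illustration, a minimal sketch of this toy pipeline with the OpenCV Python API (image paths and thresholds are placeholders; SIFT lives in the main cv2 module from OpenCV 4.4 on, earlier in opencv-contrib):

import cv2
import numpy as np

# Load the two images to be matched (placeholder file names)
img1 = cv2.imread('img1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('img2.jpg', cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute descriptors
sift = cv2.SIFT_create()
kps1, descs1 = sift.detectAndCompute(img1, None)
kps2, descs2 = sift.detectAndCompute(img2, None)

# 2-NN matching + Lowe's ratio test (the SNN strategy, see the matching slides)
bf = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in bf.knnMatch(descs1, descs2, k=2) if m.distance < 0.8 * n.distance]

# Geometric verification: robustly fit a homography with RANSAC
src = np.float32([kps1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kps2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 1.0)
print('inliers:', int(inlier_mask.sum()))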

slide-9
SLIDE 9

Toy example for illustration: matching with OpenCV SIFT

2019.07.06, Odesa, EECVC 9

Recovered 1st-to-2nd image projection, ground-truth 1st-to-2nd image projection, inlier correspondences

slide-10
SLIDE 10

Geometric verification (RANSAC)

2019.07.06, Odesa, EECVC 10


slide-11
SLIDE 11

Homography: planar surface/static camera

2019.07.06, Odesa, EECVC 11 Image credit: forums.fast.ai

Planar surface or static camera → use homography
Image with a dominant plane → use homography
Not sure what to use? → try homography first.

slide-12
SLIDE 12

Fundamental matrix: general two-view case

2019.07.06, Odesa, EECVC 12

  • General two-view geometry in a static scene. A corresponding point lies somewhere on a line in the other image. Where on the line depends on the (unknown) depth
  • Weaker constraint than a homography
  • Still rigid (no motion in the scene assumed)

Image credit: https://en.wikipedia.org/wiki/Epipolar_geometry

slide-13
SLIDE 13

RANSAC: fitting the data with gross outliers

  • What it is: fit the model to random minimal samples and keep the hypothesis with the most inliers, so that gross outliers do not spoil the estimate

Image credit: https://scipy-cookbook.readthedocs.io/items/RANSAC.html

OpenCV functions: cv2.findHomography(), cv2.findFundamentalMat(). We will soon publish a Python package which is 2–5 times faster and has additional tricks inside: https://github.com/ducha-aiki/pyransac (save this link; the repo is private now, will be cleaned up and opened next week)
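A hedged usage sketch of the two OpenCV calls (pts1, pts2 are assumed to be N×2 float32 arrays of putative correspondences; the thresholds are illustrative):

import cv2

def verify(pts1, pts2):
    # Planar scene or rotating camera: fit a homography with RANSAC
    H, mask_H = cv2.findHomography(pts1, pts2, cv2.RANSAC, 1.0)
    # General rigid two-view geometry: fit a fundamental matrix with RANSAC
    F, mask_F = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    return H, mask_H, F, mask_F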

slide-14
SLIDE 14

Pitfalls and solutions: homography

  • Wrong geometry estimated because of dirty correspondences

2019.07.06, Odesa, EECVC 14

OpenCV finds 31 wrong inliers in 0.018 s. CMP RANSAC finds 6 wrong inliers in 0.004 s. Note the same pattern in img1: 3 correspondences on a line plus a group; homography RANSAC is prone to such cases.

slide-15
SLIDE 15

Pitfalls and solutions: homography

  • Solution: 2-way error metric

2019.07.06, Odesa, EECVC 15

CMP RANSAC + transfer check: finds 48 correct inliers in 0.005 s. H(2→1) = H(1→2)⁻¹

Check that the number of inliers is consistent in the opposite direction. The Python CMP RANSAC package will be available soon.

  • D. Mishkin, J. Matas and M. Perdoch. MODS: Fast and Robust Method for Two-View Matching, CVIU 2015,
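A rough sketch of the transfer-check idea in plain OpenCV/numpy (this is not the CMP RANSAC code; the consistency thresholds are assumptions):

import cv2
import numpy as np

def two_way_homography(pts1, pts2, thresh=1.0):
    # Estimate the homography in both directions
    H12, mask12 = cv2.findHomography(pts1, pts2, cv2.RANSAC, thresh)
    H21, mask21 = cv2.findHomography(pts2, pts1, cv2.RANSAC, thresh)
    if H12 is None or H21 is None:
        return None
    # Transfer check: H(2->1) composed with H(1->2) should be close to identity (up to scale)
    C = H21 @ H12
    err = np.linalg.norm(C / C[2, 2] - np.eye(3))
    # Inlier counts should also agree between the two directions
    n1, n2 = int(mask12.sum()), int(mask21.sum())
    consistent = err < 0.1 and abs(n1 - n2) <= 0.2 * max(n1, n2)
    return H12 if consistent else None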
slide-16
SLIDE 16

Pitfalls and solutions: fundamental matrix

  • F is too permissive (a point-to-line constraint)

2019.07.06, Odesa, EECVC 16

slide-17
SLIDE 17

Pitfalls and solutions: fundamental matrix

  • LAF check: remember that a local feature is an oriented circle or ellipse, not just a point.
  • Check whether additional points on the circle are consistent with the geometry

2019.07.06, Odesa, EECVC 17

  • D. Mishkin, J. Matas and M. Perdoch. MODS: Fast and Robust Method for Two-View Matching, CVIU 2015,
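A numpy sketch of such a LAF check (not the MODS implementation; the 2×3 frame parameterization and the 3 px threshold are assumptions):

import numpy as np

def laf_points(A):
    # A: 2x3 local affine frame [a11 a12 x; a21 a22 y].
    # Return the frame center plus two extra points of the frame (unit circle mapped by A).
    pts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]).T
    return (A @ pts).T                      # 3 points, each (x, y)

def sym_epipolar_dist(F, x1, x2):
    # Symmetric epipolar distance (in pixels) of a single correspondence
    p1, p2 = np.append(x1, 1.0), np.append(x2, 1.0)
    l2, l1 = F @ p1, F.T @ p2               # epipolar lines in image 2 and image 1
    val = abs(p2 @ F @ p1)
    return 0.5 * (val / np.hypot(l2[0], l2[1]) + val / np.hypot(l1[0], l1[1]))

def laf_consistent(F, A1, A2, thresh=3.0):
    # LAF check: all three frame points must agree with F, not only the center
    return all(sym_epipolar_dist(F, x1, x2) < thresh
               for x1, x2 in zip(laf_points(A1), laf_points(A2)))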
slide-18
SLIDE 18

Matching strategies

2019.07.06, Odesa, EECVC 18


slide-19
SLIDE 19

Nearest neighbor (1NN) strategy

2019.07.06, Odesa, EECVC 20

Features from img1 are matched to features from img2. Note that this matching is asymmetric and allows “many-to-one” matches.

slide-20
SLIDE 20

Nearest neighbor (NN) strategy

2019.07.06, Odesa, EECVC 21

Features from img1 are matched to features from img2. OpenCV RANSAC failed to find a good model with NN matching. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow.

slide-21
SLIDE 21

Mutual nearest neighbor (MNN) strategy

2019.07.06, Odesa, EECVC 22

Features from img1 are matched to features from img2. Only cross-consistent (mutual NN) matches are retained.

slide-22
SLIDE 22

Mutual nearest neighbor (MNN) strategy

2019.07.06, Odesa, EECVC 23

OpenCV RANSAC failed to find a good model with MNN matching. No one-to-many connections, but still bad. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow. Features from img1 are matched to features from img2; only cross-consistent (mutual NN) matches are retained.

slide-23
SLIDE 23

Second nearest neighbor ratio (SNN) strategy

2019.07.06, Odesa, EECVC 24

[Diagram: features from img1 matched to features from img2; matches with 1stNN/2ndNN distance ratio > 0.8 are dropped, those with ratio < 0.8 are kept]

  • We look for the 2 nearest neighbors
  • If both are too similar (1stNN/2ndNN distance ratio > 0.8) → discard
  • If the 1st NN is much closer (1stNN/2ndNN distance ratio ≤ 0.8) → keep

  • D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004
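The NN, MNN and SNN strategies above, sketched in OpenCV terms (descs1/descs2 are the descriptor arrays of the two images; the 0.8 ratio follows Lowe):

import cv2

def match_1nn(descs1, descs2):
    # Plain nearest neighbor: every feature from img1 gets its closest match in img2
    return cv2.BFMatcher(cv2.NORM_L2).match(descs1, descs2)

def match_mnn(descs1, descs2):
    # Mutual NN: keep a match only if it is the nearest neighbor in both directions
    return cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(descs1, descs2)

def match_snn(descs1, descs2, ratio=0.8):
    # Lowe's ratio test (SNN): keep the 1st NN only if it is clearly closer than the 2nd NN
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(descs1, descs2, k=2)
    return [m for m, n in knn if m.distance < ratio * n.distance]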
slide-24
SLIDE 24

2019.07.06, Odesa, EECVC 25

[Diagram: matches with 1stNN/2ndNN ratio > 0.8 dropped, ratio < 0.8 kept]

With SNN matching, OpenCV RANSAC found a roughly correct model. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow.

Second nearest neighbor ratio (SNN) strategy

slide-25
SLIDE 25

1st geometrically inconsistent nearest neighbor ratio (FGINN) strategy

2019.07.06, Odesa, EECVC 26

The SNN ratio is good, but what about symmetric structures or too closely detected features? The ratio test will kill them. Solution: take as the 2nd nearest neighbor the one that is far enough from the 1st nearest.

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015

slide-26
SLIDE 26

1st geometrically inconsistent nearest neighbor ratio (FGINN) strategy

2019.07.06, Odesa, EECVC 27

The SNN ratio is good, but what about symmetric structures or too closely detected features? The ratio test will kill them. Solution: take as the 2nd nearest neighbor the one that is far enough from the 1st nearest.

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015
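A brute-force numpy sketch of FGINN (not the MODS code; the 10 px inconsistency radius and the 0.8 ratio are the commonly used values, treated as assumptions here):

import numpy as np

def fginn_match(descs1, descs2, kps2_xy, ratio=0.8, min_dist_px=10.0):
    # descs1: N1 x D, descs2: N2 x D, kps2_xy: N2 x 2 keypoint coordinates in img2
    d = np.linalg.norm(descs1[:, None, :] - descs2[None, :, :], axis=2)          # descriptor distances
    spatial = np.linalg.norm(kps2_xy[:, None, :] - kps2_xy[None, :, :], axis=2)  # keypoint distances in img2
    matches = []
    for i in range(descs1.shape[0]):
        order = np.argsort(d[i])
        first = order[0]
        # "Second" NN = closest descriptor whose keypoint is far enough from the 1st NN keypoint
        far = [j for j in order[1:] if spatial[first, j] >= min_dist_px]
        if not far or d[i, first] < ratio * d[i, far[0]]:
            matches.append((i, int(first)))
    return matches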

slide-27
SLIDE 27

SNN vs FGINN

2019.07.06, Odesa, EECVC 28

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015

SNN: roughly correct FGINN: more correspondences, better geometry found

slide-28
SLIDE 28

Symmetrical FGINN

2019.07.06, Odesa, EECVC 29

Recall that FGINN is still asymmetric: matching (Img1 → Img2) ≠ (Img2 → Img1). We can do both (Img1 → Img2) and (Img2 → Img1) and keep either all FGINN matches (union)…

  • …or only the cross-consistent FGINN matches (intersection)
slide-29
SLIDE 29

Learned filtering strategy (CVPR 2018)

Yi et al., Learning to Find Good Correspondences, CVPR 2018. https://arxiv.org/abs/1711.05971

Input: matches (x1, y1, x2, y2) [ N x 4 ] Output: scores [ N x 1 ]

slide-30
SLIDE 30

Evaluation on IMW2019 data

2019.07.06, Odesa, EECVC 31

CVPR 2019 competition: https://image-matching-workshop.github.io/ Stereo evaluation: features ⇨ matching ⇨ OpenCV RANSAC ⇨ pose estimation

Participants

  • Organizers

Metric: # of precise enough recovered camera poses (mAP @ 15°)

slide-31
SLIDE 31

Evaluation on IMW2019 data

2019.07.06, Odesa, EECVC 32

  • NN – are you kidding? Never use it alone
  • SNN is simple and good
  • FGINN is always a bit better
  • Symmetrical FGINN rocks
  • Learning is not that powerful (yet?)
slide-32
SLIDE 32

Descriptor: HardNet (NIPS, 2017)

Mishchuk et al. Working hard to know your neighbor’s margins: Local descriptor learning loss. NIPS 2017

33 2019.07.06, Odesa, EECVC


slide-33
SLIDE 33
  • Q1: How to find correct correspondences?
  • Q2: How to filter out features, which do not have a correspondence?
  • A1: Nearest neighbor by descriptor distance
  • A2: Threshold the second-to-first nearest ratio (SNN)

HardNet: let's use it for training CNNs!

34

Classical way to select good correspondences

2019.07.06, Odesa, EECVC

slide-34
SLIDE 34

Architecture: deep, VGGNet style

  • Adopted from the previous state-of-the-art L2Net descriptor, Tian et al. (CVPR 2017)

  • Vanilla CNN: Convolution + BatchNorm + ReLU
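A PyTorch sketch of such an L2Net/HardNet-style network (layer widths and the 32×32 patch input follow the L2Net configuration; treat the exact settings as an assumption, not the authors' code):

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
                         nn.BatchNorm2d(cout),
                         nn.ReLU(inplace=True))

class HardNetLike(nn.Module):
    # 32x32 grayscale patch -> 128-D L2-normalized descriptor
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(1, 32), conv_bn_relu(32, 32),
            conv_bn_relu(32, 64, stride=2), conv_bn_relu(64, 64),
            conv_bn_relu(64, 128, stride=2), conv_bn_relu(128, 128),
            nn.Conv2d(128, 128, 8, bias=False),   # global 8x8 conv -> 1x1 spatial size
            nn.BatchNorm2d(128))

    def forward(self, patches):                    # patches: B x 1 x 32 x 32
        x = self.features(patches).view(patches.size(0), -1)
        return F.normalize(x, p=2, dim=1)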

36

2019.07.06, Odesa, EECVC

slide-35
SLIDE 35

37

Sampling: positives – random; negatives – hardest-in-batch
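A PyTorch sketch of the hardest-in-batch triplet margin loss (row i of anchor and positive are descriptors of the same 3D point; the margin of 1.0 is the usual choice, assumed here):

import torch

def hardest_in_batch_loss(anchor, positive, margin=1.0):
    # anchor, positive: B x D L2-normalized descriptors; row i of both comes from the same point
    dist = torch.cdist(anchor, positive)                              # B x B distance matrix
    pos = dist.diag()                                                  # distances of matching pairs
    off = dist + 1e6 * torch.eye(dist.size(0), device=dist.device)    # exclude the matching pairs
    neg = torch.min(off.min(dim=1).values, off.min(dim=0).values)     # hardest negative per pair
    return torch.clamp(margin + pos - neg, min=0).mean()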

2019.07.06, Odesa, EECVC

slide-36
SLIDE 36

HardNet vs SIFT descriptor

2019.07.06, Odesa, EECVC 38

Mishkin et al., “MODS: Fast and Robust Method for Two-View Matching”, CVIU 2015

SIFT: 71 inliers HardNet: 121 inliers

slide-37
SLIDE 37

39

Results: HPatches

1.5–2 times better than RootSIFT:

2019.07.06, Odesa, EECVC

slide-38
SLIDE 38

GeoDesc: same architecture, special loss and sampling utilizing 3D reconstruction data.

HardNet training scales well with bigger datasets

Luo et al., ECCV 2018

2019.07.06, Odesa, EECVC

slide-39
SLIDE 39

Descriptor: creating the dataset (CVWW, 2019)

Leveraging Outdoor Webcams for Local Descriptor Learning Milan Pultar, Dmytro Mishkin, Jiří Matas

41 2019.07.06, Odesa, EECVC


slide-40
SLIDE 40
  • Brown dataset
  • HPatches
  • PhotoSynth
  • GL3D

Existing datasets for local descriptor learning

42

slide-41
SLIDE 41
  • 1,128 million (over a billion) images
  • Almost 30K webcams
  • Continuously growing
  • Cameras placed all over the world
  • ~10 TB of data
  • Each camera in one directory

Split further into folders by year

Image timestamp in GMT

GPS info not always available

Archive of Many Outdoor Scenes (AMOS)

43

[Diagram: per-camera directories (Camera 1001, Camera 1002, …), each split into per-year folders (Images - 2010, Images - 2011, Images - 2013, …)]

slide-42
SLIDE 42

AMOS views

44

Good cameras Bad cameras

slide-43
SLIDE 43

Pipeline of AMOS Patches

45

slide-44
SLIDE 44

Camera selection

  • Choose 20 images randomly from each camera
  • Test each image against the selection criteria
  • Keep the camera if 14/20 images pass

46

  • → 474 cameras

Sky segmentation in the wild: An empirical study.

  • R. P. Mihail et al, 2012

(https://github.com/kuangliu/torchcv)

slide-45
SLIDE 45

Appearance clustering

  • Solves data redundancy
  • Use the fc6 layer of an ImageNet-pretrained AlexNet
  • Run K-means in the AlexNet output space
  • Choose the K = 120 most representative images (by looking at the corresponding outputs)
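A sketch of this clustering step using torchvision's AlexNet (weights API of torchvision ≥ 0.13) and scikit-learn K-means as stand-ins; the preprocessing and the exact fc6 extraction in the paper may differ, this only illustrates the idea:

import torch
import torchvision
from sklearn.cluster import KMeans

def representative_images(image_tensors, k=120):
    # image_tensors: N x 3 x 224 x 224 ImageNet-preprocessed images from one camera
    alexnet = torchvision.models.alexnet(weights='IMAGENET1K_V1').eval()
    fc6 = torch.nn.Sequential(alexnet.features, alexnet.avgpool,
                              torch.nn.Flatten(), alexnet.classifier[:2])  # up to the fc6 output
    with torch.no_grad():
        feats = fc6(image_tensors).numpy()
    km = KMeans(n_clusters=k, n_init=10).fit(feats)
    # For every cluster, keep the image closest to its center
    return km.transform(feats).argmin(axis=0)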

47

  • → 474 cameras, 120 images each

Imagenet classification with deep convolutional neural networks.

  • A. Krizhevsky et al, 2012
slide-46
SLIDE 46

Viewpoint reclustering

  • Solves switching of cameras between views
  • Uses MODS (image matching) in a greedy algorithm:
  • 1. Pick a reference image
  • 2. Find matching pairs
  • 3. Create a new view; exclude its images from the original sequence
  • 4. If the original sequence is not empty: repeat

  • Keep the biggest view from each camera, 50 images each (if available)

48

  • → 273 views
slide-47
SLIDE 47

Registration

  • Results are still not satisfactory
  • Why? MODS often outputs a homography valid only for a small part of the image, so a final manual check is needed
  • → Use GDB-ICP
  • In each view:

Run registration on pairs of images

If a single pair fails → remove the whole view

49

  • → 151 registered views
slide-48
SLIDE 48

Manual pruning

  • Several problems are not detected by the previous steps:

Dynamic scenes

Cloud-dominated scenes

Views with very similar content

50

  • → 27 registered views, 50 images each
slide-49
SLIDE 49

Patch selection

51

  • Apply masks (crop out text etc.)
  • Sampling of centers (response function)
  • Random rotation (any angle)
  • Random scale
slide-50
SLIDE 50

Experiments: (de)registration

  • Displace each patch randomly
  • Observe the influence on precision
  • Result: precise registration is important

  • mAP - mean average precision

52

HPatches matching task, full split

slide-51
SLIDE 51

Experiments: batch composition

  • We use the hardest-in-batch triplet margin loss
  • The composition of a batch influences precision
  • Idea: choose a subset of views as the source of patches
  • Intuition: tough pairs often come from the same image
  • → Improvement

53

HPatches matching task, full split

slide-52
SLIDE 52

Evaluation

  • New state-of-the-art in matching under illumination changes (to the best of our knowledge)
  • Outperforms the recently proposed HardNetPS on the full split
  • We propose the AMOS Patches test split for evaluation of robustness to lighting- and season-related conditions

55

slide-53
SLIDE 53

Measurement region selector: orientation

56 2019.07.06, Odesa, EECVC


slide-54
SLIDE 54

Which patch should we describe?

2019.07.06, Odesa, EECVC 57

Detector gives: x, y, scale. Should we rotate the patch? Should we deform the patch? Handcrafted: dominant gradient orientation. Learned orientation: CNN.

Yi et al. Learning to Assign Orientations to Feature Points CVPR 2016

  • D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004
slide-55
SLIDE 55

If images are upright for sure: don't detect orientation

2019.07.06, Odesa, EECVC 58

DoG + HardNet matches + FGINN union + RANSAC. Found 1st-image projection: blue, ground truth: green, inlier correspondences: yellow. Dominant gradient orientation: 123 inliers. Learned orientation: 140 inliers. Constant orientation: 181 inliers.

slide-56
SLIDE 56

AffNet (ECCV 2018) Measurement region selector

59 2019.07.06, Odesa, EECVC


slide-57
SLIDE 57

AffNet: learning measurement region

Mishkin et al. Repeatability Is Not Enough: Learning Affine Regions via Discriminability. ECCV 2018

slide-58
SLIDE 58

Does AffNet help? Yes, if the problem is hard

2019.07.06, Odesa, EECVC 61

FGINN union + RANSAC. Found 1st image projection: blue, ground truth: green , inlier correspondences: yellow

DoG + HardNet 2.0: 123 inliers DoG + AffNet + HardNet 2.0: 165 inliers

slide-59
SLIDE 59

62

  • Find the affine shape that maximizes the difference between positive and hardest-in-batch negative examples
  • Positive-only learning (Yi et al., CVPR 2015) leads to degenerate ellipses
  • Triplet margin (HardNet) – unstable for training affine shape

AffNet: learning measurement region

2019.07.06, Odesa, EECVC

slide-60
SLIDE 60

Local feature detector

2019.07.06, Odesa, EECVC 63


slide-61
SLIDE 61

The detector is often the failure point of the whole process

  • Yet we still use 10–20-year-old methods like SIFT or FAST, because nothing significantly better for practical purposes has been proposed
  • So let's stick to the basics

2019.07.06, Odesa, EECVC 64

Stylianou et al., WACV 2015. Characterizing Feature Matching Performance Over Long Time Periods

slide-62
SLIDE 62

SIFT is the DoG detector + SIFT descriptor

  • Really, there is no such thing as a “SIFT detector”.
  • But everyone is so used to calling DoG “SIFT”

2019.07.06, Odesa, EECVC 65

DoG filter is a simple blob template

https://docs.opencv.org/3.4.3/da/df5/tutorial_py_sift_intro.html

Gaussian scalespace, “stack of gradually smoothed versions” of original image

Detections on synthetic image
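A tiny illustration of the DoG response itself: subtract two Gaussian-smoothed versions of the image; blobs of the matching size light up (the image path and sigmas are placeholders):

import cv2
import numpy as np

img = cv2.imread('img1.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
sigma = 1.6
g1 = cv2.GaussianBlur(img, (0, 0), sigma)
g2 = cv2.GaussianBlur(img, (0, 0), sigma * 2 ** (1.0 / 3))   # next level of the scale space
dog = g2 - g1   # Difference-of-Gaussians: blob-detector response at this scale
# SIFT keypoints are the extrema of such responses across space and scale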

slide-63
SLIDE 63

ORB: FAST detector + BRIEF descriptor

2019.07.06, Odesa, EECVC 66

FAST is a corner detector based on segment test

slide-64
SLIDE 64

Joint detectors and descriptors

SuperPoint (CVPRW 2017) DELF (ICCV 2017) D2Net (CVPR 2019)

2019.07.06, Odesa, EECVC 67


slide-65
SLIDE 65

SuperPoint

2019.07.06, Odesa, EECVC 68

DeTone et al., SuperPoint: Self-Supervised Interest Point Detection and Description CVPRW2017

slide-66
SLIDE 66

DELF

2019.07.06, Odesa, EECVC 69

Noh et al., Large-Scale Image Retrieval with Attentive Deep Local Features, ICCV 2017

“Attention” as weighting for global descriptor

slide-67
SLIDE 67

D2Net

2019.07.06, Odesa, EECVC 70

Dusmanu et al, D2-Net: A Trainable CNN for Joint Description and Detection of Local Features CVPR 2019

slide-68
SLIDE 68

Comparison on toy example

2019.07.06, Odesa, EECVC 71

SuperPoint: 51 inliers DoG + HardNet: 123 inliers D2Net: 26 inliers, incorrect geometry

slide-69
SLIDE 69

All things together

2019.07.06, Odesa, EECVC 72

SIFT + SNN match + OpenCV RANSAC: 27 inliers SIFT + NoOri + HardNet + FGINN union match + CMP RANSAC: 179 inliers

slide-70
SLIDE 70

I really need to match this

  • View synthesis: MODS

2019.07.06, Odesa, EECVC 73

slide-71
SLIDE 71

MODS (controller and preprocessor)

MODS handles angular viewpoint difference up to:

  • 85° for planar scenes
  • 30° for structured
  • D. Mishkin, J. Matas and M. Perdoch.

MODS: Fast and Robust Method for Two-View Matching, CVIU 2015,

[Flowchart: Images → affine view synthesis → Det1-Desc1 / Det2-Desc2 → Match → RANSAC → Match! If not matched, try more view synthesis]

slide-72
SLIDE 72
  • If you DO NOT need correspondences & camera pose → DO NOT use local features.

Use global descriptor (ResNet101 GeM) + fast search (faiss)

  • Step 0: try OpenCV SIFT
  • Use proper RANSAC (private now, will be cleaned up and opened next week)
  • Matching → use FGINN in two-way mode
  • Need to be faster → ORB.
  • Need to be more robust → use SIFT + HardNet 2.0
  • Custom data → train on your own dataset
  • Even more robust → use SIFT + AffNet + HardNet 2.0
  • If images are upright, DO NOT DETECT the ORIENTATION
  • Landmark data → DELF

2019.07.06, Odesa, EECVC 75

Thank you for your attention

ducha_aiki ducha-aiki