SuperGlue: Learning Feature Matching with Graph Neural Networks



slide-1
SLIDE 1

SuperGlue: Learning Feature Matching with Graph Neural Networks

Paul-Edouard Sarlin1 Daniel DeTone2 Tomasz Malisiewicz2 Andrew Rabinovich2

slide-2
SLIDE 2

Feature matching is ubiquitous

  • 3D reconstruction
  • Visual localization
  • SLAM
  • Place recognition

[Google VPS] [Image Matching Workshop 2020] [ScanNet]

slide-3
SLIDE 3
  • Extreme wide-baseline image pairs in real-time on GPU
  • State-of-the-art indoor+outdoor matching with SIFT & SuperPoint

SuperGlue = Graph Neural Nets + Optimal Transport

slide-4
SLIDE 4
Visual SLAM

  • Front-end: images to constraints
      ○ Recent works: deep learning for feature extraction → Convolutional Nets!
  • Back-end: optimize pose and 3D structure

[Cadena et al, 2016]

slide-5
SLIDE 5
A middle-end

  • Our position: learn the data association!
  • We propose a new middle-end: SuperGlue
  • 2D-to-2D feature matching

Pipeline: front-end (feature extraction) → middle-end (data association) → back-end (MAP estimation)

slide-6
SLIDE 6

A minimal matching pipeline

image pair → detection → description → feature matching → outlier filtering → pose estimation

  • Detection and description
      > Classical: SIFT, ORB
      > Learned: SuperPoint, D2-Net
  • Feature matching
      > Nearest Neighbor Matching
  • Outlier filtering
      > Heuristics: ratio test, mutual check
      > Learned: classifier on set, deep net

SuperGlue: context aggregation + matching + filtering

[DeTone et al, 2018] [Yi et al, 2018]
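As a point of comparison, the heuristic baseline named on this slide (nearest neighbor matching followed by a ratio test and a mutual check) can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the talk: the function name nn_match, the Euclidean distance, and the ratio threshold of 0.8 are all assumptions.

```python
import numpy as np

def nn_match(desc_a, desc_b, ratio=0.8, mutual=True):
    """Match rows of desc_a (M x D) to rows of desc_b (N x D).

    Hypothetical helper: returns the (i, j) index pairs that survive the filters.
    """
    # Pairwise Euclidean distances, shape (M, N).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = dist.argmin(axis=1)  # nearest j for each i
    nn_ba = dist.argmin(axis=0)  # nearest i for each j
    matches = []
    for i, j in enumerate(nn_ab):
        # Ratio test: the best distance must be clearly below the second best.
        if dist.shape[1] > 1:
            best, second = np.partition(dist[i], 1)[:2]
            if best > ratio * second:
                continue
        # Mutual check: i must also be the nearest neighbor of j.
        if mutual and nn_ba[j] != i:
            continue
        matches.append((i, int(j)))
    return matches

desc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
desc_b = np.array([[0.0, 1.0], [1.0, 0.1]])
print(nn_match(desc_a, desc_b))  # [(0, 1), (1, 0)]
```

The third descriptor in desc_a is almost equally close to both candidates, so the ratio test rejects it. Each keypoint is decided in isolation here, which is the limitation that context aggregation is meant to address.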

slide-7
SLIDE 7

The importance of context

Figure panels: with SuperGlue | no SuperGlue

slide-8
SLIDE 8

Problem formulation

Inputs → SuperGlue → Outputs

Inputs

  • Images A and B
  • 2 sets of M, N local features
      ○ Keypoints:
          • Coordinates
          • Confidence
      ○ Visual descriptors

Outputs

  • A single match per keypoint + occlusion and noise → a soft partial assignment
  • Each row and each column of the M × N assignment matrix sums to ≤ 1
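To make the "sum ≤ 1" constraints concrete, the following sketch checks whether a matrix is a valid soft partial assignment: entries in [0, 1], with every row sum and every column sum at most 1. The function name and the tolerance are assumptions chosen for illustration.

```python
import numpy as np

def is_partial_assignment(P, tol=1e-6):
    """Check the soft partial assignment constraints on an M x N matrix."""
    P = np.asarray(P, dtype=float)
    in_range = np.all((P >= -tol) & (P <= 1 + tol))
    rows_ok = np.all(P.sum(axis=1) <= 1 + tol)  # each keypoint in A: at most one match
    cols_ok = np.all(P.sum(axis=0) <= 1 + tol)  # each keypoint in B: at most one match
    return bool(in_range and rows_ok and cols_ok)

# M = 3 keypoints in A, N = 2 keypoints in B; the third row is mostly unmatched.
good = [[0.9, 0.0], [0.0, 0.8], [0.05, 0.1]]
# Two keypoints in A both claim the first keypoint in B.
bad = [[0.9, 0.0], [0.3, 0.0]]
print(is_partial_assignment(good), is_partial_assignment(bad))  # True False
```

The slack in each row or column (one minus its sum) is the mass left for occluded or noisy keypoints.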

slide-9
SLIDE 9

A Graph Neural Network with attention

  • Encodes contextual cues & priors
  • Reasons about the 3D scene

Solving a partial assignment problem

  • Differentiable solver
  • Enforces the assignment constraints = domain knowledge

Diagram: local features (visual descriptor + position) → Keypoint Encoder → Attentional Aggregation (Self / Cross, × L) → matching descriptors → score matrix + dustbin score ((M+1) × (N+1)) → Sinkhorn Algorithm (row normalization, column norm., × T) → partial assignment

slide-10
SLIDE 10
  • Initial representation for each keypoint
  • Combines visual appearance and position with an MLP: visual descriptor + MLP(position)

Diagram: local features (visual descriptor, position) → Keypoint Encoder (Multi-Layer Perceptron) → Attentional Graph Neural Network → Optimal Matching Layer → partial assignment
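The keypoint encoder described above can be sketched as follows: the position (coordinates plus confidence) is lifted by an MLP to the descriptor dimension and added to the visual descriptor. The layer sizes, the random weights, and the names mlp and encode_keypoints are assumptions made for illustration, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Apply a small ReLU MLP; `weights` is a list of (W, b) pairs."""
    for k, (W, b) in enumerate(weights):
        x = x @ W + b
        if k < len(weights) - 1:  # no activation after the last layer
            x = np.maximum(x, 0.0)
    return x

def encode_keypoints(descriptors, positions, confidence, weights):
    # Per-keypoint position input: (x, y, confidence).
    p = np.concatenate([positions, confidence[:, None]], axis=1)
    # Initial representation: visual descriptor + MLP(position).
    return descriptors + mlp(p, weights)

D = 8  # descriptor dimension (illustrative)
weights = [(rng.normal(size=(3, 16)) * 0.1, np.zeros(16)),
           (rng.normal(size=(16, D)) * 0.1, np.zeros(D))]
descriptors = rng.normal(size=(5, D))
positions = rng.uniform(size=(5, 2))
confidence = rng.uniform(size=5)
x0 = encode_keypoints(descriptors, positions, confidence, weights)
print(x0.shape)  # (5, 8)
```

Adding the encoded position rather than concatenating it keeps the feature dimension unchanged for the layers that follow.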

slide-11
SLIDE 11

Update the representation based on other keypoints:

  • in the same image: “self” edges
  • in the other image: “cross” edges

→ A complete graph with two types of edges

Notation in the diagram: the feature of a keypoint in a given image at a given layer.

slide-12
SLIDE 12

Update the representation using a Message Passing Neural Network: each keypoint's feature is updated from its current value and the message aggregated from the keypoints it is connected to.
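A residual message-passing update of this kind can be sketched as below. Several choices are assumptions for the sake of a short example: the message is a plain mean over the source keypoints (a stand-in for attentional aggregation), the same weights are reused at every layer, and layers alternate between self and cross edges.

```python
import numpy as np

def mlp(x, weights):
    """Small ReLU MLP; `weights` is a list of (W, b) pairs."""
    for k, (W, b) in enumerate(weights):
        x = x @ W + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def update_features(x_a, x_b, weights, edge):
    """One residual update of x_a: x_a + MLP([x_a || message])."""
    # Self edges read from the same image, cross edges from the other image.
    source = x_a if edge == "self" else x_b
    # Placeholder message: the mean source feature, repeated for each keypoint.
    message = np.tile(source.mean(axis=0), (x_a.shape[0], 1))
    return x_a + mlp(np.concatenate([x_a, message], axis=1), weights)

rng = np.random.default_rng(0)
D = 4
weights = [(rng.normal(size=(2 * D, 2 * D)) * 0.1, np.zeros(2 * D)),
           (rng.normal(size=(2 * D, D)) * 0.1, np.zeros(D))]
x_a = rng.normal(size=(3, D))  # features of image A
x_b = rng.normal(size=(5, D))  # features of image B
for layer in range(4):
    edge = "self" if layer % 2 == 0 else "cross"
    # Both images are updated simultaneously from the previous layer's features.
    x_a, x_b = (update_features(x_a, x_b, weights, edge),
                update_features(x_b, x_a, weights, edge))
print(x_a.shape, x_b.shape)  # (3, 4) (5, 4)
```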

slide-13
SLIDE 13

Attentional Aggregation

  • Compute the message using self and cross attention
  • Soft database retrieval: query, key, and value

Example features: [tile, position (70, 100)], [tile, pos. (80, 110)], [corner, pos. (60, 90)], [grid, pos. (400, 600)]

Example queries: query neighbors, query salient points

[Vaswani et al, 2017]
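The "soft database retrieval" view of attention can be sketched with single-head dot-product attention. The projection matrices Wq, Wk, Wv are random placeholders, and the scaling by the square root of the dimension follows Vaswani et al; treating it as single-head is a simplification made here, not a claim about the actual model.

```python
import numpy as np

def attention_message(x_query, x_source, Wq, Wk, Wv):
    """Return one message per query keypoint, plus the attention weights."""
    q = x_query @ Wq   # queries, one per keypoint being updated
    k = x_source @ Wk  # keys, one per source keypoint
    v = x_source @ Wv  # values, one per source keypoint
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over source keypoints
    return alpha @ v, alpha                      # message = weighted average of values

rng = np.random.default_rng(0)
D = 4
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
x_a = rng.normal(size=(3, D))
x_b = rng.normal(size=(5, D))
msg_self, _ = attention_message(x_a, x_a, Wq, Wk, Wv)       # self-attention: sources in A
msg_cross, alpha = attention_message(x_a, x_b, Wq, Wk, Wv)  # cross-attention: sources in B
print(msg_cross.shape, alpha.shape)  # (3, 4) (3, 5)
```

Self and cross attention differ only in which image supplies the keys and values.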

slide-14
SLIDE 14

Attention builds a soft, dynamic, sparse graph

  • Self-attention = intra-image information flow
  • Cross-attention = inter-image

Figure labels (images A and B): distinctive points, candidate matches

slide-15
SLIDE 15

Compute a score matrix for all matches: the score of a pair of keypoints is the similarity (inner product) of their matching descriptors.
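Assuming the score is the inner product between matching descriptors, the full M × N score matrix is a single matrix product. The descriptor values below are made up for illustration.

```python
import numpy as np

def score_matrix(f_a, f_b):
    """Pairwise inner products between M and N matching descriptors."""
    return f_a @ f_b.T  # shape (M, N)

f_a = np.array([[1.0, 0.0], [0.0, 2.0]])              # M = 2 descriptors from image A
f_b = np.array([[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]])  # N = 3 descriptors from image B
print(score_matrix(f_a, f_b))
# [[1. 0. 2.]
#  [2. 2. 0.]]
```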

slide-16
SLIDE 16

  • Occlusion and noise: unmatched keypoints are assigned to a dustbin
  • Augment the scores with a learnable dustbin score
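The dustbin augmentation can be sketched as padding the M × N score matrix with one extra row and one extra column, all filled with a single scalar. In training that scalar would be a learned parameter; here it is just a fixed number, and the function name add_dustbins is an assumption.

```python
import numpy as np

def add_dustbins(scores, dustbin_score):
    """Pad an M x N score matrix to (M+1) x (N+1) with a shared dustbin score."""
    M, N = scores.shape
    augmented = np.full((M + 1, N + 1), dustbin_score, dtype=float)
    augmented[:M, :N] = scores  # original scores stay in the top-left block
    return augmented

scores = np.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
print(add_dustbins(scores, 1.0))
# [[2. 0. 0. 1.]
#  [0. 2. 0. 1.]
#  [1. 1. 1. 1.]]
```

The last column collects keypoints of image A with no match, and the last row does the same for image B.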

slide-17
SLIDE 17

  • Compute the assignment that maximizes the total score
  • Solve an optimal transport problem
  • With the Sinkhorn algorithm: differentiable & soft Hungarian algorithm

[Sinkhorn & Knopp, 1967]
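A log-domain Sinkhorn iteration over the dustbin-augmented scores can be sketched as follows. It alternates row and column normalization for a fixed number of iterations. The target sums are an assumption of this sketch: each keypoint row or column sums to 1, while the dustbin row sums to N and the dustbin column to M so that total mass is consistent. The iteration count and example scores are illustrative.

```python
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn(aug_scores, iters=100):
    """Soft assignment from (M+1) x (N+1) scores (last row/column = dustbins)."""
    M, N = aug_scores.shape[0] - 1, aug_scores.shape[1] - 1
    log_a = np.log(np.append(np.ones(M), N))  # target row sums: 1, ..., 1, N
    log_b = np.log(np.append(np.ones(N), M))  # target column sums: 1, ..., 1, M
    u = np.zeros(M + 1)
    v = np.zeros(N + 1)
    for _ in range(iters):
        u = log_a - logsumexp(aug_scores + v[None, :], axis=1)  # row normalization
        v = log_b - logsumexp(aug_scores + u[:, None], axis=0)  # column normalization
    return np.exp(aug_scores + u[:, None] + v[None, :])

aug_scores = np.ones((3, 4))              # dustbin score 1 everywhere
aug_scores[:2, :3] = [[2.0, 0.0, 0.0],
                      [0.0, 2.0, 0.0]]    # M = 2, N = 3 pairwise scores
P = sinkhorn(aug_scores)
print(np.round(P, 3))
```

Dropping the last row and column of P leaves an M × N block whose row and column sums are at most 1, which is the partial assignment. Every step is differentiable, so gradients can flow back to the scores.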

slide-18
SLIDE 18

  • Compute ground truth correspondences from pose and depth
  • Find which keypoints should be unmatched
  • Loss: maximize the log-likelihood of the GT cells
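Maximizing the log-likelihood of the ground-truth cells amounts to minimizing a negative log-likelihood over three groups of cells: true matches, keypoints of A that should go to the dustbin, and keypoints of B that should go to the dustbin. The sketch below assumes an (M+1) × (N+1) assignment with dustbins in the last row and column; the function name, the epsilon guard, and the example values are assumptions.

```python
import numpy as np

def matching_loss(P, gt_matches, unmatched_a, unmatched_b, eps=1e-12):
    """Negative log-likelihood of the ground-truth cells of P."""
    loss = 0.0
    for i, j in gt_matches:
        loss -= np.log(P[i, j] + eps)   # matched pair (i, j)
    for i in unmatched_a:
        loss -= np.log(P[i, -1] + eps)  # keypoint i of A belongs in the dustbin
    for j in unmatched_b:
        loss -= np.log(P[-1, j] + eps)  # keypoint j of B belongs in the dustbin
    return float(loss)

# M = 2, N = 2, plus dustbin row and column (made-up values).
P = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.20, 0.70],
              [0.00, 0.75, 0.25]])
loss = matching_loss(P, gt_matches=[(0, 0)], unmatched_a=[1], unmatched_b=[1])
print(loss)
```

The loss approaches zero as the assignment places all of its mass on the ground-truth cells.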

slide-19
SLIDE 19

Results: indoor - ScanNet

Figure panels: SuperPoint + NN + heuristics | SuperPoint + SuperGlue

SuperGlue: more correct matches and fewer mismatches

slide-20
SLIDE 20

Results: outdoor - SfM

Figure panels: SuperPoint + NN + mutual check | SuperPoint + NN + OA-Net (inlier classifier) | SuperPoint + SuperGlue

SuperGlue: more correct matches and fewer mismatches

slide-21
SLIDE 21


Results: attention patterns

Flexibility of attention → diversity of patterns

Patterns shown: global context, neighborhood, distinctive keypoints, self-similarities, match candidates

slide-22
SLIDE 22

Evaluation

SuperGlue yields large improvements in all cases

Baseline groups: heuristics | learned inlier classifier

slide-23
SLIDE 23

SuperGlue @ CVPR 2020

First place in the following competitions:

  • Image matching challenge
  • Local features for visual localization
  • Visual localization for handheld devices

vision.uvic.ca/image-matching-challenge
www.visuallocalization.net

slide-24
SLIDE 24

SuperGlue: Learning Feature Matching with Graph Neural Networks

A major step towards end-to-end deep SLAM & SfM

psarlin.com/superglue

slide-25
SLIDE 25

Thank you

psarlin.com/superglue