SuperGlue: Learning Feature Matching with Graph Neural Networks

SuperGlue: Learning Feature Matching with Graph Neural Networks
Paul-Edouard Sarlin¹, Daniel DeTone², Tomasz Malisiewicz², Andrew Rabinovich²

Feature matching is ubiquitous
● 3D reconstruction
● Visual localization
● SLAM
● Place recognition
[Image Matching Workshop 2020] [ScanNet] [Google VPS]

SuperGlue = Graph Neural Nets + Optimal Transport
● Extreme wide-baseline image pairs in real time on GPU
● State-of-the-art indoor + outdoor matching with SIFT & SuperPoint

Visual SLAM
● Front-end: images to constraints
  ○ Recent works: deep learning for feature extraction → Convolutional Nets!
● Back-end: optimize pose and 3D structure
[Cadena et al, 2016]

A middle-end
front-end: feature extraction → middle-end: data association → back-end: MAP estimation
● Our position: learn the data association!
● We propose a new middle-end: SuperGlue
● 2D-to-2D feature matching

A minimal matching pipeline
image pair → feature detection → feature description → matching → outlier filtering → pose estimation
● Feature detection and description
  ○ Classical: SIFT, ORB
  ○ Learned: SuperPoint, D2-Net [DeTone et al, 2018]
● Matching: Nearest Neighbor Matching
● Outlier filtering
  ○ Heuristics: ratio test, mutual check
  ○ Learned: classifier on set, deep net [Yi et al, 2018]
● SuperGlue: context aggregation + matching + filtering
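For reference, the classical baseline named on this slide (nearest-neighbor matching filtered by a ratio test and a mutual check) can be sketched as below. This is a minimal NumPy illustration, not the code used in the talk; the function name `nn_match` and the default `ratio=0.8` are assumptions made for the example.

```python
import numpy as np

def nn_match(desc_a, desc_b, ratio=0.8, mutual=True):
    """Match rows of desc_a (M, D) to rows of desc_b (N, D).

    Hypothetical helper: returns a list of (i, j) index pairs.
    """
    # Pairwise Euclidean distances, shape (M, N).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    matches = []
    for i, j in enumerate(nearest):
        # Ratio test: the best distance must be clearly below the second best.
        if dist.shape[1] > 1:
            best, second = np.partition(dist[i], 1)[:2]
            if best > ratio * second:
                continue
        # Mutual check: i must also be the nearest neighbor of j.
        if mutual and dist[:, j].argmin() != i:
            continue
        matches.append((i, int(j)))
    return matches
```

Both filters look at one keypoint at a time and ignore the rest of the image, which is the limitation the slide contrasts with SuperGlue's context aggregation.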

The importance of context
[Figure: matches with no SuperGlue vs. with SuperGlue]

Problem formulation
Inputs
● Images A and B
● 2 sets of M, N local features
  ○ Keypoints:
    - Coordinates
    - Confidence
  ○ Visual descriptors
Outputs
● A single match per keypoint + occlusion and noise → a soft partial assignment: row sums ≤ 1, column sums ≤ 1
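The two "sum ≤ 1" constraints can be read as: every keypoint in A gives out at most one unit of match probability, and every keypoint in B receives at most one. A small check, assuming the assignment is stored as an M x N NumPy array (the function name and tolerance are illustrative):

```python
import numpy as np

def is_partial_assignment(P, tol=1e-6):
    """True if P (M x N) is non-negative and each row and column sums to at most 1."""
    P = np.asarray(P, dtype=float)
    return bool(
        (P >= -tol).all()
        and (P.sum(axis=1) <= 1 + tol).all()  # each keypoint in A matches at most once
        and (P.sum(axis=0) <= 1 + tol).all()  # each keypoint in B matches at most once
    )
```

A row or column whose sum is well below 1 corresponds to a keypoint that is mostly unmatched, for example because of occlusion.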

Architecture overview
[Figure: local features (visual descriptor + position) → Keypoint Encoder → Attentional Aggregation (Self, Cross; L layers) → matching descriptors → score matrix + dustbin score, size (M+1) × (N+1) → Sinkhorn Algorithm (row normalization, column normalization; T iterations) → partial assignment]
● A Graph Neural Network with attention
  ○ Encodes contextual cues & priors
  ○ Reasons about the 3D scene
● Solving a partial assignment problem
  ○ Differentiable solver
  ○ Enforces the assignment constraints = domain knowledge

Attentional Graph Neural Network
● Initial representation for each keypoint $i$: $^{(0)}\mathbf{x}_i = \mathbf{d}_i + \mathrm{MLP}_{\mathrm{enc}}(\mathbf{p}_i)$, with $\mathbf{d}_i$ the visual descriptor and $\mathbf{p}_i$ the position
● Combines visual appearance and position with an MLP (Multi-Layer Perceptron)
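A minimal sketch of such a keypoint encoder, assuming the position input is the keypoint coordinates concatenated with the detection confidence, and that the MLP output has the same dimension as the descriptor. The layer sizes and helper names are hypothetical.

```python
import numpy as np

def mlp(x, weights, biases):
    """Plain MLP with ReLU between layers and no activation on the output."""
    for k, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def encode_keypoints(desc, kpts, conf, weights, biases):
    """desc: (M, D) descriptors, kpts: (M, 2) coordinates, conf: (M,) confidences."""
    pos = np.concatenate([kpts, conf[:, None]], axis=1)  # (M, 3)
    # Add the embedded position to the visual descriptor.
    return desc + mlp(pos, weights, biases)

rng = np.random.default_rng(0)
desc = rng.normal(size=(5, 8))
kpts, conf = rng.uniform(size=(5, 2)), rng.uniform(size=5)
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 8))]
biases = [np.zeros(16), np.zeros(8)]
x0 = encode_keypoints(desc, kpts, conf, weights, biases)  # shape (5, 8)
```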

Attentional Graph Neural Network
Update the representation based on other keypoints:
- in the same image: “self” edges
- in the other image: “cross” edges
→ A complete graph with two types of edges
Notation: $^{(\ell)}\mathbf{x}_i^A$ is the feature of keypoint $i$ in image $A$ at layer $\ell$

Attentional Graph Neural Network
Update the representation using a Message Passing Neural Network:
$^{(\ell+1)}\mathbf{x}_i^A = {}^{(\ell)}\mathbf{x}_i^A + \mathrm{MLP}\left(\left[{}^{(\ell)}\mathbf{x}_i^A \,\|\, \mathbf{m}_{\mathcal{E} \to i}\right]\right)$
where $\mathbf{m}_{\mathcal{E} \to i}$ is the message
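One way to picture this update: each keypoint concatenates its current feature with the incoming message, passes the result through an MLP, and adds the output back to its feature. The sketch below assumes a residual form and a single hidden layer; the weight shapes are placeholders, not values from the talk.

```python
import numpy as np

def update_features(x, message, W1, b1, W2, b2):
    """x, message: (M, D). Returns the updated features, shape (M, D)."""
    h = np.concatenate([x, message], axis=1)  # (M, 2D): feature and message side by side
    h = np.maximum(h @ W1 + b1, 0.0)          # hidden layer with ReLU
    return x + (h @ W2 + b2)                  # residual update

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
m = rng.normal(size=(4, 6))
W1, b1 = rng.normal(size=(12, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 6)), np.zeros(6)
x_next = update_features(x, m, W1, b1, W2, b2)  # shape (4, 6)
```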

Attentional Aggregation
● Compute the message using self and cross attention
● Soft database retrieval: query, key, and value
[Figure: example query and key entries: [tile, position (70, 100)]; neighbors: [tile, pos. (80, 110)], [corner, pos. (60, 90)]; salient points: [grid, pos. (400, 600)]]
[Vaswani et al, 2017]
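The "soft database retrieval" can be sketched as single-head dot-product attention: each query keypoint scores every source keypoint through its key, and the message is the softmax-weighted average of the values. The single head and the 1/sqrt(d) scaling (from Vaswani et al.) are assumptions of this illustration. Passing the same features as query and source gives self-attention; passing the other image's features as source gives cross-attention.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_message(x_query, x_source, Wq, Wk, Wv):
    """x_query: (M, D), x_source: (N, D). Returns messages (M, D_v) and weights (M, N)."""
    q = x_query @ Wq
    k = x_source @ Wk
    v = x_source @ Wv
    # Attention weights: similarity of each query to every key, normalized per query.
    alpha = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    return alpha @ v, alpha

rng = np.random.default_rng(0)
x_a, x_b = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
I = np.eye(4)
msg_self, _ = attention_message(x_a, x_a, I, I, I)        # intra-image
msg_cross, alpha = attention_message(x_a, x_b, I, I, I)   # inter-image
```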

● Self-attention = intra-image information flow (distinctive points)
● Cross-attention = inter-image information flow (candidate matches)
Attention builds a soft, dynamic, sparse graph

Optimal Matching Layer
Compute a score matrix for all matches: $\mathbf{S}_{i,j} = \langle \mathbf{f}_i^A, \mathbf{f}_j^B \rangle$, the inner product of the matching descriptors

Optimal Matching Layer
● Occlusion and noise: unmatched keypoints are assigned to a dustbin
● Augment the scores with a learnable dustbin score
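A minimal sketch of the score matrix and its dustbin augmentation, assuming the pairwise score is the inner product of the matching descriptors and the dustbin score is a single scalar shared by the extra row and column. In training that scalar would be a learned parameter; here it is a plain argument.

```python
import numpy as np

def score_matrix(f_a, f_b):
    """Pairwise inner products between matching descriptors: (M, D) x (N, D) -> (M, N)."""
    return f_a @ f_b.T

def augment_with_dustbin(scores, dustbin_score):
    """Append one dustbin row and one dustbin column, giving shape (M+1, N+1)."""
    M, N = scores.shape
    aug = np.full((M + 1, N + 1), dustbin_score, dtype=float)
    aug[:M, :N] = scores
    return aug

f_a = np.array([[1.0, 0.0], [0.0, 1.0]])
f_b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = score_matrix(f_a, f_b)
scores_aug = augment_with_dustbin(scores, dustbin_score=0.5)  # shape (3, 4)
```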

Optimal Matching Layer
● Compute the assignment $\bar{\mathbf{P}}$ that maximizes $\sum_{i,j} \bar{\mathbf{S}}_{i,j} \bar{\mathbf{P}}_{i,j}$, where $\bar{\mathbf{S}}$ is the score matrix augmented with the dustbin
● Solve an optimal transport problem
● With the Sinkhorn algorithm: a differentiable, soft version of the Hungarian algorithm
[Sinkhorn & Knopp, 1967]
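The alternating row and column normalization can be sketched in log space as below. This is an illustrative version, not the authors' implementation: it assumes each regular keypoint carries one unit of mass while the dustbin row and column can absorb N and M units respectively, and it runs a fixed number of iterations (`n_iters` is a placeholder for T).

```python
import numpy as np

def logsumexp(z, axis):
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn(scores_aug, n_iters=100):
    """scores_aug: (M+1, N+1) scores including the dustbin. Returns a soft assignment."""
    M, N = scores_aug.shape[0] - 1, scores_aug.shape[1] - 1
    # Target marginals (assumed): 1 per keypoint, N and M for the two dustbins.
    log_a = np.log(np.concatenate([np.ones(M), [N]]))
    log_b = np.log(np.concatenate([np.ones(N), [M]]))
    u, v = np.zeros(M + 1), np.zeros(N + 1)
    for _ in range(n_iters):
        u = log_a - logsumexp(scores_aug + v[None, :], axis=1)  # row normalization
        v = log_b - logsumexp(scores_aug + u[:, None], axis=0)  # column normalization
    return np.exp(scores_aug + u[:, None] + v[None, :])

S = np.zeros((3, 3))                     # 2 x 2 scores plus dustbin row and column
S[:2, :2] = [[2.0, 0.0], [0.0, 2.0]]
P = sinkhorn(S)
```

Every step is a differentiable array operation, so gradients can flow from the assignment back to the scores. Dropping the last row and column of `P` leaves a partial assignment whose rows and columns each sum to at most 1.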

● Compute ground truth correspondences from pose and depth
● Find which keypoints should be unmatched
● Loss: maximize the log-likelihood of the ground-truth (GT) cells
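Maximizing the log-likelihood of the ground-truth cells amounts to minimizing a negative log-likelihood over three groups of cells in the augmented assignment: matched pairs, keypoints of A sent to the dustbin column, and keypoints of B sent to the dustbin row. A sketch under those assumptions (the function name, argument layout, and `eps` are illustrative):

```python
import numpy as np

def matching_loss(P_aug, gt_matches, unmatched_a, unmatched_b, eps=1e-12):
    """P_aug: (M+1, N+1) soft assignment including the dustbin row and column."""
    M, N = P_aug.shape[0] - 1, P_aug.shape[1] - 1
    log_p = np.log(P_aug + eps)
    loss = 0.0
    for i, j in gt_matches:   # ground-truth correspondences
        loss -= log_p[i, j]
    for i in unmatched_a:     # keypoints of A with no match go to the dustbin column
        loss -= log_p[i, N]
    for j in unmatched_b:     # keypoints of B with no match go to the dustbin row
        loss -= log_p[M, j]
    return float(loss)
```

The loss approaches zero only when every ground-truth cell, including the dustbin cells of unmatched keypoints, receives probability close to 1.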

Results: indoor - ScanNet
[Figure panels: SuperPoint + NN + heuristics; SuperPoint + SuperGlue]
SuperGlue: more correct matches and fewer mismatches

Results: outdoor - SfM
[Figure panels: SuperPoint + NN + OA-Net (inlier classifier); SuperPoint + NN + mutual check; SuperPoint + SuperGlue]
SuperGlue: more correct matches and fewer mismatches

Results: attention patterns
[Figure panels: global context; neighborhood; distinctive keypoints; self-similarities; match candidates]
Flexibility of attention → diversity of patterns

Evaluation
[Figure: comparison of heuristics, a learned inlier classifier, and SuperGlue]
SuperGlue yields large improvements in all cases

SuperGlue @ CVPR 2020
First place in the following competitions:
- Image matching challenge: vision.uvic.ca/image-matching-challenge
- Local features for visual localization: www.visuallocalization.net
- Visual localization for handheld devices

SuperGlue: Learning Feature Matching with Graph Neural Networks
A major step towards end-to-end deep SLAM & SfM
psarlin.com/superglue

Thank you psarlin.com/superglue
