Inductive Visual Localisation: Factorised Training for Superior Generalisation


Inductive Visual Localisation: Factorised Training for Superior Generalisation. Ankush Gupta, Andrea Vedaldi, Andrew Zisserman. Visual Geometry Group (VGG), University of Oxford. BMVC 2018, Newcastle upon Tyne.

SLIDE 1 BMVC 2018, Newcastle upon Tyne | Ankush Gupta 1 Ankush Gupta Andrea Vedaldi Andrew Zisserman

Inductive Visual Localisation: Factorised Training for Superior Generalisation

Visual Geometry Group (VGG) University of Oxford

SLIDE 2

RNNs have a problem. Poor generalization to sequence lengths beyond those in the training set.

[Figure panels: Training, Testing]

SLIDE 3

Example: Enumerative Counting

Counting objects one-by-one. Total count = 3

[Figure panel: Training. Labels: Stop?, 1]

SLIDE 4

Example: Enumerative Counting

Failure when tested on inputs with more than 3 objects. Total count = 6

[Figure panel: Testing. Labels: Stop?, 1]

SLIDE 5

Why? A non-interpretable recurrent state (s_t), trained end-to-end, may not learn the correct loop invariant.

SLIDE 6

Our Solution

1. Train for one-step inductive updates (not end-to-end).

2. Restrict the recurrent state to a spatial-memory map, which tracks the progress made so far.
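
The two points above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' model: the image is a small binary grid in which every 1 is one object, and `one_step` is a hand-written rule standing in for the learned one-step network. The names `one_step` and `count_objects` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def one_step(image: Grid, memory: Grid) -> Tuple[Grid, bool]:
    """One inductive update (hand-written stand-in for a learned network).

    Marks the next object that is not yet recorded in the memory map.
    Returns the updated memory and a stop flag, which is True when
    every object in the image is already marked.
    """
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel == 1 and memory[r][c] == 0:
                updated = [list(m) for m in memory]  # copy, then mark
                updated[r][c] = 1
                return updated, False
    return memory, True


def count_objects(image: Grid) -> int:
    """Count by repeating the one-step update until it signals stop."""
    memory = [[0] * len(row) for row in image]  # nothing counted yet
    count = 0
    while True:
        memory, stop = one_step(image, memory)
        if stop:
            return count
        count += 1


three = [[1, 0, 1], [0, 1, 0]]
six = [[1, 1, 1], [1, 1, 1]]
print(count_objects(three), count_objects(six))
```

Because the only state carried between steps is the memory map, the same update handles three objects or six; nothing in the loop depends on the sequence length.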

SLIDE 7

Inductive Training

Train for one-step updates, not end-to-end.

[Figure: input image and spatial memory map in; updated memory and Stop? out]
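
One way to read one-step training is that each annotated image is unrolled into independent supervised examples, one per update, instead of a single end-to-end sequence. The sketch below assumes a toy setting in which each object occupies one grid cell and the counting order is given; `one_step_examples` is a hypothetical helper, and emitting the stop target on an extra final step is an assumption, not something the slides specify.

```python
from typing import List, Tuple

Grid = List[List[int]]
# (memory_in, memory_target, stop_target); the image is shared by all steps
Example = Tuple[Grid, Grid, bool]


def one_step_examples(height: int, width: int,
                      objects: List[Tuple[int, int]]) -> List[Example]:
    """Unroll one annotated image into one-step training examples.

    `objects` lists (row, col) positions in the order they are counted.
    Each example pairs the memory before a step with the memory after it.
    """
    memory: Grid = [[0] * width for _ in range(height)]
    examples: List[Example] = []
    for row, col in objects:
        target = [list(r) for r in memory]  # copy the memory, then mark
        target[row][col] = 1
        examples.append((memory, target, False))
        memory = target
    # Assumed convention: a final example with full memory teaches "stop".
    examples.append((memory, [list(r) for r in memory], True))
    return examples


examples = one_step_examples(2, 3, [(0, 0), (1, 2), (0, 1)])
print(len(examples))
```

Each example can then be used as an ordinary supervised pair, so the loss never has to be propagated through a whole sequence.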

SLIDE 8

Results: Enumerative Counting

Coloured Shapes & DOTA Airplanes

train on 3-5 objects, test on >5 objects
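
A protocol like this can be scored by reporting accuracy separately for counts seen in training and for larger, unseen counts. The sketch below is illustrative only: exact-match count accuracy is an assumed metric (the slides do not state one), and `accuracy_by_split` and the numbers in the example are made up for the demonstration.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_split(true_counts: List[int],
                      predicted_counts: List[int],
                      max_train_count: int = 5) -> Dict[str, float]:
    """Exact-match accuracy for seen (<= max_train_count) and unseen counts."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for truth, pred in zip(true_counts, predicted_counts):
        split = "seen" if truth <= max_train_count else "unseen"
        totals[split] += 1
        hits[split] += int(truth == pred)
    return {split: hits[split] / totals[split] for split in totals}


# Made-up predictions: correct up to 7 objects, one error at 8.
result = accuracy_by_split([3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 7])
print(result)
```

A gap between the "seen" and "unseen" figures is the generalisation failure that the talk sets out to remove.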

SLIDE 9

Multi-line Text Recognition

Read one line at each step

SLIDE 10

Results: Multi-line Text Recognition

Synth Text Blocks

train on 1-4 lines, test on up to 10 lines

SLIDE 11

Results: Multi-line Text Recognition

Vs. State-of-the-art @ ICDAR 2013 Blocks

Outperform (in terms of Recall, F-score)
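
For reference, recall and F-score can be computed from match counts as below. These are the standard definitions; how a predicted word is matched to a ground-truth word under the ICDAR 2013 protocol is not given in the slides, so the true-positive, false-positive and false-negative counts are taken as inputs, and `precision_recall_f` is a hypothetical helper name.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) from match counts.

    tp: predictions matched to ground truth
    fp: predictions with no ground-truth match
    fn: ground-truth items that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Illustrative counts only, not results from the talk.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
print(round(p, 3), round(r, 3), round(f, 3))
```

The F-score is the harmonic mean of precision and recall, so it rewards a method only when both are high.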

SLIDE 12 Ankush Gupta Andrea Vedaldi Andrew Zisserman

Inductive Visual Localisation: Factorised Training for Superior Generalisation

Visual Geometry Group (VGG) University of Oxford

Poster #111