SLIDE 1
BMVC 2018, Newcastle upon Tyne

Ankush Gupta, Andrea Vedaldi, Andrew Zisserman

Inductive Visual Localisation: Factorised Training for Superior Generalisation

Visual Geometry Group (VGG) University of Oxford

SLIDE 2

RNNs have a problem: poor generalisation to sequence lengths beyond those in the training set.

[Figure: Training vs. Testing]

SLIDE 3

Example: Enumerative Counting

Counting objects one-by-one. Total count = 3

[Figure: Training example with a "Stop?" decision]

SLIDE 4

Example: Enumerative Counting

Failure when tested on inputs of length > 3. Total count = 6

[Figure: Testing example with a "Stop?" decision]

SLIDE 5

Why? The non-interpretable recurrent state (s_t), which is trained end-to-end, may not learn the correct loop invariant.

SLIDE 6

Our Solution

  1. Train for one-step inductive updates (not end-to-end).
  2. Restrict the recurrent state to a spatial-memory map, which tracks the progress made so far.
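
The two points above can be pictured as a loop whose only state is a memory of what has already been handled. The sketch below is illustrative only and is not the authors' model: objects are given as a list of hypothetical (row, col) positions instead of being found in pixels, the memory is a set of visited positions instead of a spatial map, and `one_step` and `count` are made-up names, with `one_step` a hand-written stand-in for the learned update.

```python
def one_step(objects, memory):
    """One inductive update: pick an unvisited object and mark it in memory.

    `objects` is a list of (row, col) positions (a stand-in for the image).
    `memory` is the set of positions already counted (a stand-in for the
    spatial-memory map). Returns (updated_memory, stop).
    """
    remaining = [p for p in objects if p not in memory]
    if not remaining:
        return memory, True       # nothing left to visit: signal stop
    nxt = min(remaining)          # fixed visiting order (row, then column)
    return memory | {nxt}, False


def count(objects):
    """Apply the one-step update repeatedly until it signals stop."""
    memory, steps = frozenset(), 0
    while True:
        memory, stop = one_step(objects, memory)
        if stop:
            return steps
        steps += 1
```

In this toy setting the update depends only on the input and the memory, so the same function handles 3 objects or 6; the number of steps is never stored in an opaque state.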

SLIDE 7

Inductive Training

[Figure labels: input image, spatial memory map, Stop?, updated memory]

Train for one-step updates (not end-to-end).
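
One way to read "train for one-step updates" is that each annotated sequence is split into independent (input, memory before) → (memory after, stop) examples, so that no gradient has to flow through the whole sequence. The sketch below shows only that splitting step, under assumptions that are not from the slides: a ground-truth visiting order is available (here simply sorted order), objects are hypothetical (row, col) positions, and `one_step_examples` is a made-up helper name.

```python
def one_step_examples(objects):
    """Split one annotated sequence into independent one-step examples.

    Each example is ((objects, memory_before), (memory_after, stop)).
    The final example has an unchanged memory and stop=True.
    """
    order = sorted(objects)       # assumed ground-truth visiting order
    examples = []
    for t in range(len(order) + 1):
        before = frozenset(order[:t])
        if t < len(order):
            target = (frozenset(order[:t + 1]), False)
        else:
            target = (before, True)   # everything visited: stop
        examples.append(((tuple(order), before), target))
    return examples
```

A sequence with n objects yields n + 1 examples, each of which can be used as a separate supervised training pair for the one-step update.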

SLIDE 8

Results: Enumerative Counting

Coloured Shapes & DOTA Airplanes

train on 3-5 objects, test on >5 objects

SLIDE 9

Multi-line Text Recognition

Read one line at each step

SLIDE 10

Results: Multi-line Text Recognition

Synth Text Blocks

train on 1-4 lines, test on up to 10 lines

SLIDE 11

Results: Multi-line Text Recognition

  • Vs. state-of-the-art on ICDAR 2013 Blocks
  • Outperform (in terms of recall and F-score)

SLIDE 12

Ankush Gupta, Andrea Vedaldi, Andrew Zisserman

Inductive Visual Localisation: Factorised Training for Superior Generalisation

Visual Geometry Group (VGG) University of Oxford

Poster #111