Learning to Read by Spelling
Towards Unsupervised Text Recognition
Ankush Gupta Andrea Vedaldi Andrew Zisserman
Visual Geometry Group (VGG) University of Oxford ICVGIP 2018, Hyderabad
Text Recognition
Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, Visual Geometry Group (VGG), Oxford | ICVGIP 2018, Hyderabad
Imaged text (e.g. “tion of regular fits of the gout, one or more joints”) → ASCII text
Assumes localisation is given
ConvNet → Sequence Model (e.g. RNNs) → output characters (a .. n, <stop>)
Paired image / annotations, e.g. “tion of regular fits of the gout one or more joints part of the brain the tunica arachnoides was rated and of a pale yellow colour and with the”
Requires lots of paired data
Manual labour
Synthetic data (Jaderberg et al., NIPS DLW 2014; Gupta et al., CVPR16, BMVC18)
but only 15K are valid English strings
(n-grams etc.) further constrain the output
2 sub-problems:
→ Solve for a |A| × |A| permutation matrix, where A is the alphabet, e.g. the 26 English letters {a, b, c, …, z}
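If each glyph class decodes to exactly one letter, recognition reduces to a substitution cipher, and decoding is a permutation matrix P applied to one-hot vectors. The sketch below is illustrative only: it assumes a toy 4-letter alphabet, labels glyph classes with the same symbols as the letters for convenience, and uses a hypothetical mapping; `permutation_matrix` and `decode` are names introduced here, not taken from the paper.

```python
# Toy alphabet (assumption: the slide's setting uses the 26 English letters)
ALPHABET = ["a", "b", "c", "d"]

def one_hot(symbol, alphabet):
    return [1 if s == symbol else 0 for s in alphabet]

def permutation_matrix(mapping, alphabet):
    # P[i][j] = 1 iff glyph class j decodes to letter i
    n = len(alphabet)
    P = [[0] * n for _ in range(n)]
    for j, glyph in enumerate(alphabet):
        i = alphabet.index(mapping[glyph])
        P[i][j] = 1
    return P

def mat_vec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def decode(glyphs, P, alphabet):
    # Apply P to the one-hot vector of every glyph and read off the letter
    out = []
    for g in glyphs:
        y = mat_vec(P, one_hot(g, alphabet))
        out.append(alphabet[y.index(1)])
    return "".join(out)

# Hypothetical cipher: every glyph class stands for a different letter
mapping = {"a": "c", "b": "a", "c": "d", "d": "b"}
P = permutation_matrix(mapping, ALPHABET)
print(decode("abcd", P, ALPHABET))  # cadb
```

Every row and every column of P contains exactly one 1, which is what makes the mapping invertible.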
Learn from unaligned text corpora and text-images
Fully-Conv Recognition Net: images with ≤ L characters → softmax over the alphabet A at each of the L positions (“fake” text)
|A| = 28: English letters {a, b, c, …, z} + space + pad
Valid language strings, e.g. from WMT, NewsGroup etc. (“real” text)
Discriminator: “fake” vs. “real” text → adversarial loss
Recognizer maps the visual domain to the language domain
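A minimal sketch of the two kinds of input the discriminator sees, assuming |A| = 28 as on the slide (26 letters, space, pad) and a fixed maximum length L. Real strings become one-hot rows padded to L; the recognizer output is an independent softmax over A at each position. The helper names and the `<pad>` token spelling are hypothetical.

```python
import math
import string

# Assumed symbol set: 26 letters + space + pad token, so |A| = 28
PAD = "<pad>"
SYMBOLS = list(string.ascii_lowercase) + [" ", PAD]

def encode_real(text, L):
    # "Real" text: one one-hot row per position, padded to length L
    assert len(text) <= L
    tokens = list(text) + [PAD] * (L - len(text))
    return [[1.0 if s == t else 0.0 for s in SYMBOLS] for t in tokens]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def recognizer_output(logits_per_position):
    # "Fake" text: an independent softmax over A at each of the L positions
    return [softmax(row) for row in logits_per_position]

real = encode_real("the gout", L=10)
fake = recognizer_output([[0.0] * len(SYMBOLS) for _ in range(10)])
print(len(real), len(real[0]))  # 10 28
```

Both representations are L × |A| matrices, which is what allows a single discriminator to score recognizer outputs and real strings on the same footing.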
The “recognizer” can generate valid text without “recognizing”:
→ it can fool the discriminator without solving the task, e.g. by using the “text-image” only as noise → it learns a generator for valid English strings
Fully-Conv Recognition Net → “this is a valid English string which looks real to the discriminator” → invalid recognition
Local recognizer: restricted receptive field (~3 characters), e.g. on “tion of regular fits of the gout one or more joints”
Global discriminator: checks the validity of the entire sentence
No reconstruction, unlike CycleGAN,
because text → image is highly ambiguous
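The receptive field of a stack of convolutions grows by (k − 1) × (product of earlier strides) per layer, which is how a fully-convolutional recognizer ends up seeing only a few characters of context. The layer list below is a hypothetical configuration, not the architecture used in the paper; converting pixels to characters would additionally require an assumed character width.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) along the horizontal axis."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= s             # distance in pixels between adjacent outputs
    return rf

# Hypothetical configuration, not the paper's architecture
LAYERS = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
print(receptive_field(LAYERS))  # 21
```

Keeping this number small (here 21 px) forces each output position to rely on local evidence, while the discriminator judges the whole string.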
Unaligned “text source” (no overlapping pages)
Sample images and valid strings independently:
images → Fully-Conv Recognition Net → softmax predictions
Valid Language Strings → strings
softmax predictions vs. strings → Discriminator → Adversarial Loss
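The slides say only “adversarial loss”, without naming the GAN objective. As an assumption, the sketch below uses the standard binary cross-entropy formulation with the non-saturating generator loss, plus a sampler that draws images and strings independently, mirroring the unpaired setup. All function names are hypothetical.

```python
import math
import random

def bce_with_logits(logit, target):
    # Numerically stable binary cross-entropy on a raw discriminator logit
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def adversarial_losses(real_logits, fake_logits):
    # Discriminator: label valid strings 1, recognizer outputs 0
    d_loss = (mean(bce_with_logits(l, 1.0) for l in real_logits)
              + mean(bce_with_logits(l, 0.0) for l in fake_logits))
    # Recognizer (generator): non-saturating loss, wants fakes scored as real
    g_loss = mean(bce_with_logits(l, 1.0) for l in fake_logits)
    return d_loss, g_loss

def sample_unpaired_batch(images, strings, batch_size, rng):
    # Images and strings are drawn independently: no (image, text) pairs exist
    return rng.sample(images, batch_size), rng.sample(strings, batch_size)

d, g = adversarial_losses([5.0], [-5.0])
print(round(d, 4), round(g, 4))  # confident discriminator: small d, large g
```

When the discriminator separates real and fake strings confidently, its own loss is near zero while the recognizer's loss is large, which is the signal pushing the recognizer towards valid-looking text.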
→ Generalization to other lengths (test accuracy)
Why?
Varying spacing / non-fixed-width fonts are challenging for a fully-conv. recognizer
45% word accuracy!
Why?
Varying spacing / non-fixed-width fonts are challenging for a fully-conv. recognizer
→ Let features travel using a “skip-RNN” in the last layer
Conv RNN Skip
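A toy, one-dimensional illustration of why a recurrent layer lets features travel: a width-3 convolution output at position t depends only on inputs t − 1 to t + 1, whereas a skip-RNN output (RNN state added back to its input) can depend on everything to its left. The scalar tanh RNN, its weights and the unidirectional scan are assumptions for illustration; the slide only shows “Conv RNN Skip”.

```python
import math

def local_conv(xs, weights=(0.25, 0.5, 0.25)):
    # Width-3 convolution with zero padding: output t sees only x[t-1..t+1]
    padded = [0.0] + list(xs) + [0.0]
    return [sum(w * padded[t + i] for i, w in enumerate(weights))
            for t in range(len(xs))]

def skip_rnn(xs, w_x=0.5, w_h=0.9):
    # Scalar tanh RNN whose output is added back to its input (skip connection)
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        out.append(x + h)
    return out

a = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.0] * 6
print(local_conv(a)[5] == local_conv(b)[5])  # True: x[0] is outside the window
print(skip_rnn(a)[5] == skip_rnn(b)[5])      # False: x[0] reached t = 5 via h
```

Changing only the first input leaves the convolution output at the last position unchanged but alters the skip-RNN output there; the skip connection keeps the local convolutional features available alongside the propagated context.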
85% character accuracy now (vs. 45% before)!
96.2% character accuracy.
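The slides report character and word accuracy without defining them. One common choice, assumed here, is 1 minus the edit distance normalised by the ground-truth length, computed over characters or over whitespace-separated words (clamped at 0); the paper's exact definition may differ.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance over any two sequences
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(pred, gt):
    return max(0.0, 1.0 - levenshtein(pred, gt) / max(len(gt), 1))

def word_accuracy(pred, gt):
    gt_words = gt.split()
    return max(0.0, 1.0 - levenshtein(pred.split(), gt_words) / max(len(gt_words), 1))

gt = "regular fits of the gout"
pred = "regular fits of the got"
print(round(char_accuracy(pred, gt), 3))  # 0.958
print(word_accuracy(pred, gt))            # 0.8
```

A single missing letter costs little at the character level but a whole word at the word level, which is why word accuracy is typically the lower of the two.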
the different forms in which this disease ap pears have rendered it necessary to divide it into regular and aregular gout in the former the attacks of which are known by the denomina tion of regular fits of the got onne o more joints
tender and frequently in an exquisite deqree a symptoneatic fever proportioned to the degree of pain and inflammation with evening exacerba tions accompand the other complaints which dis tress the patient for uncertain perions sometimes for several weeks wo he the fit goes off the piont which have been the seat of the disease are always found to have become rigid and inflexible in pro portion to the degree in which the disease has existed in themuffeequently remaining enlarged and incapable of free motion for a considerable timer cen he other hand the patient at the sas time experiences so perfect an exemption from disease as generally to lead to the opinion that the fit has occasioned the most salutary changes in the sten in ■ne■ e erd y i er in the irregular gout the affection of the joints is much less confined than in the former some e times it leaves the joints at fort tttached and fixes on some distant part■ and sometimes after harassing the patient by making a circuit in■ Ground Truth Prediction the different forms in which this disease ap pears have rendered it necessary to divide it into regular and irregular gout in the former the attacks of which are known by the denomina tion of regular fits of the gout one or more joint
tender and frequently in an exquisite degree a symptomatic fever proportioned to the degree of pain and inflammation with evening exacerba tions accompany the other complaints which dis tress the patient for uncertain periods sometimes for several weeks when the fit goes off the joints which have been the seat of the disease are always found to have become rigid and inflexible in pro portion to the degree in which the disease has existed in them■ frequently remaining enlarged and incapable of free motion for a considerable time on the other hand the patient at the same time experiences so perfect an exemption from disease ag generallyto leadu to the opinion that the fit has occasioned the most salutary changes in the system ■ ■■■ ■ ■i■■ in the irregular gout the affection of the joints is much less confined than in the former yiomeu timcs it leaves the joints at first attacked and fixes on some distant part■ and sometimes after harassing the patient by making a circuit in■ Image
Training with sequences of different lengths:
no convergence
faster convergence for longer sequences: length 13 > 11 > 9 > 7
Figure: order in which characters are learned, for 4 training runs (colour scale: low to high).
(Spearman’s rank correlation coefficient ρ = 0.80, p-value < 1e−5).
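Spearman's ρ is the Pearson correlation of ranks; with no ties it reduces to ρ = 1 − 6 Σ d² / (n(n² − 1)), where d is the rank difference of each item. The slides do not say which two rankings give ρ = 0.80, so the letter orders below are hypothetical. For real analyses, `scipy.stats.spearmanr` also returns the p-value and handles ties.

```python
def ranks(values):
    # Rank positions 1..n (assumes no ties)
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    # rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid only without ties
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical learning positions of five letters in two training runs
letters = ["a", "e", "t", "s", "d"]
run_a = {"a": 1, "e": 2, "t": 3, "s": 4, "d": 5}
run_b = {"a": 2, "e": 1, "t": 3, "s": 4, "d": 5}
rho = spearman_rho([run_a[c] for c in letters], [run_b[c] for c in letters])
print(rho)  # 0.9
```

Swapping one adjacent pair out of five items already drops ρ to 0.9, so a value of 0.80 indicates broadly, but not exactly, consistent orderings.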
Figure: char and word accuracy (%, axis from 88 to 100) with text sources WMT, WMT (No overlap) and War & Peace.
→ Apply to any input domain, as long as the output is still language
Any Questions?
Figure: per-character accuracy (0.0 to 1.0) against training iterations (up to 10,000), for characters a e t s d i u l r n y m v w b j x f z c g p k q.
tttttttttttttttttttttttttttttttttttttttttttrssssss ttttttt ny nytt nr nrttttttt ny ntt nrttttttt zzzz bcorote to whol th ticunthss tidio tiostolonzzzz trougfht to ferr oy disectins it has dicomered brought to view by dissection it was discovered
(Transcriptions ordered by training iterations.) The last transcription also corresponds to the ground truth (punctuation is not modelled). The colour bar on the right indicates the accuracy (darker means higher accuracy).