Learning to Read by Spelling: Towards Unsupervised Text Recognition

Learning to Read by Spelling: Towards Unsupervised Text Recognition
Ankush Gupta, Andrea Vedaldi, Andrew Zisserman
Visual Geometry Group (VGG), University of Oxford
ICVGIP 2018, Hyderabad

Text Recognition
Imaged Text → ASCII Text, e.g. “tion of regular fits of the gout, one or more joints”
Assumes localisation is given
• Word / line level bounding boxes
Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, Visual Geometry Group (VGG), Oxford | ICVGIP 2018, Hyderabad

Let’s solve this

Text Recognition: Sequence Learning 101
• Lots of paired data (paired image / annotations)
• ConvNet → Sequence Model (e.g. RNNs) → characters, ending in <stop>
[Figure: text-line images paired with their annotations, e.g. “tion of regular fits of the gout”, “one or more joints”, “part of the brain the tunica arachnoides was”, “rated and of a pale yellow colour and with the”]

Text Recognition: Paired Data?
Manual labour
• Expensive
• Boring
Synthetic data (Jaderberg et al., NIPS DLW 2014; Gupta et al., CVPR16, BMVC18)
• New engine for each domain
• Complex pipelines
• Domain gap

Can we do without paired data?

Language is highly structured
• There are ~8 billion random strings of length 7 (26 characters), but only 15K are valid English strings
• The frequency of characters and words, and their co-occurrence (n-grams etc.), further constrain the output
We leverage this structure for supervision.
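The count on the slide above can be checked with a few lines of arithmetic. This is only a sketch: the 15K figure is copied from the slide rather than computed, and the variable names are illustrative.

```python
# Number of possible strings of length 7 over a 26-letter alphabet.
ALPHABET_SIZE = 26
LENGTH = 7
total_strings = ALPHABET_SIZE ** LENGTH

# Figure quoted on the slide (not derived here): ~15K valid English strings.
valid_strings = 15_000

# Fraction of random strings that happen to be valid English.
fraction_valid = valid_strings / total_strings

print(f"{total_strings:,} possible strings")
print(f"about 1 in {round(total_strings / valid_strings):,} is a valid string")
```

The tiny valid fraction is what makes "looks like real language" a strong training signal.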

Method

Text Recognition: 2 sub-problems
1. Segment the text image into characters, and cluster them into consistent classes (visual)
2. Assign each cluster to the correct “character” label (language)
→ Solve for an |A| x |A| permutation matrix, where A is the alphabet, e.g. the 26 English letters {a,b,c,…,z}
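To make the second sub-problem concrete, the sketch below applies a permutation matrix to a sequence of cluster ids. It illustrates only what such a matrix represents, not how the method finds it. The 3-letter alphabet, the cluster ids, and the function names are all hypothetical.

```python
# Toy alphabet for illustration; the slide uses the 26 English letters.
ALPHABET = ["a", "b", "c"]


def is_permutation_matrix(perm):
    """True if every row and every column contains exactly one 1 (rest 0)."""
    n = len(perm)
    rows_ok = all(len(r) == n and sorted(r) == [0] * (n - 1) + [1] for r in perm)
    cols_ok = rows_ok and all(sum(r[j] for r in perm) == 1 for j in range(n))
    return rows_ok and cols_ok


def apply_permutation(cluster_ids, perm):
    """Map each cluster id to a character.

    perm[i][j] == 1 means visual cluster i is assigned to ALPHABET[j].
    """
    return "".join(ALPHABET[perm[cid].index(1)] for cid in cluster_ids)


# Hypothetical assignment: cluster 0 -> "b", cluster 1 -> "c", cluster 2 -> "a".
perm = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

# Hypothetical output of sub-problem 1 for a 3-character image.
cluster_ids = [1, 2, 0]
print(apply_permutation(cluster_ids, perm))
```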

Unpaired Text Recognition
Learn from unaligned text corpora (language) and text images (visual)
• Alphabet size |A| = 28: English letters {a,b,c,…,z} + space + pad
• Fully-Conv Recognition Net: images with <= L characters → softmax at each position (L x |A|) = “fake” text
• Valid Language Strings (e.g. from WMT, NewsGroup etc.) → one-hot at each position (L x |A|) = “real” text
• Discriminator with adversarial loss tells “real” text from “fake” text
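The “real” text side of the diagram can be sketched as a one-hot encoding over the 28-symbol alphabet, padded to a fixed length L. This is a minimal illustration. The pad character `_`, the symbol ordering, and the function name are assumptions, not details taken from the slides.

```python
import string

# 26 letters + space + pad = 28 symbols, as on the slide.
# The pad character "_" and the symbol order are assumptions for illustration.
PAD = "_"
ALPHABET = list(string.ascii_lowercase) + [" ", PAD]


def one_hot_encode(text, max_len):
    """Return a max_len x |A| list of one-hot rows, right-padded with PAD."""
    if len(text) > max_len:
        raise ValueError("text is longer than max_len")
    padded = text + PAD * (max_len - len(text))
    rows = []
    for ch in padded:
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(ch)] = 1  # raises ValueError for unknown symbols
        rows.append(row)
    return rows


encoded = one_hot_encode("one or more", max_len=24)
print(len(encoded), "positions x", len(encoded[0]), "symbols")
```

The recognizer's softmax output has the same L x |A| shape, so a discriminator can take either one as input.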

Unpaired Text Recognition

Pitfall: Uncorrelated Predictions
The “recognizer” can generate valid text without “recognizing”
→ Fool the discriminator without solving the task
e.g. use the “text-image” as noise → learn a generator for valid English strings
[Figure: the Fully-Conv Recognition Net outputs an invalid recognition, “this is a valid English string which looks real to the discriminator”]

Uncorrelated Predictions: Solution
• Global discriminator: checks validity of the entire sentence
• Local recognizer: restricted receptive field (~3 characters)
• No reconstruction, unlike CycleGAN, because text → image is highly ambiguous
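How wide a “local” recognizer sees can be estimated with the standard receptive-field recurrence for stacked convolutions. The sketch below is illustrative only: the layer list and the 7-pixel character width are hypothetical values chosen to give about 3 characters, not the architecture from the talk.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of stacked 1-D convolutions.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    rf = 1    # a single output unit initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical recognizer: five width-3 convolutions, two of them strided.
layers = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]

# Hypothetical character width for a fixed-width font, in pixels.
CHAR_WIDTH_PX = 7

rf_pixels = receptive_field(layers)
print(rf_pixels, "pixels, about", rf_pixels / CHAR_WIDTH_PX, "characters")
```

Keeping this number small forces each output position to depend on the pixels in front of it, rather than on the whole line.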

Experiments

Synthetic Text Images
• Fixed-width font
• WMT Newscrawl text source (EMNLP datasets)
• Control over nuisance factors → used for analysis
• 100K training sequences, 1K test sequences

Real Text Images
• Google Books scans
• Non-fixed width font
• Varying word spacing due to alignment
• “See-through” from back
• Different case (small / capital)
• Italics / noise / fading etc.
• 3K training lines, 300 test lines (no overlapping pages)
• Use Google OCR output as unaligned “text source”

Training Strategy
Sample images and valid strings independently
[Figure: images → Fully-Conv Recognition Net → softmax predictions; Valid Language Strings → one-hot strings; both feed the Discriminator, trained with an Adversarial Loss]
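Independent sampling means a batch of images and a batch of strings are drawn with no link between them. A minimal sketch with placeholder data follows. The function name, the image identifiers, and the batch size are hypothetical.

```python
import random


def sample_unpaired_batch(images, strings, batch_size, rng):
    """Draw images and strings independently of each other.

    No string in the batch is the transcription of any image in the batch,
    except by coincidence.
    """
    image_batch = [rng.choice(images) for _ in range(batch_size)]
    string_batch = [rng.choice(strings) for _ in range(batch_size)]
    return image_batch, string_batch


# Placeholder data: image identifiers and unrelated valid strings.
images = [f"line_image_{i}" for i in range(100)]
strings = ["one or more joints", "the fit goes off", "rigid and inflexible"]

rng = random.Random(0)  # seeded only to make the sketch repeatable
image_batch, string_batch = sample_unpaired_batch(images, strings, 8, rng)
print(image_batch[:2], string_batch[:2])
```

In training, the image batch would go through the recognizer and the string batch through one-hot encoding before both reach the discriminator.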

Results

Synthetic Text Images: test accuracy
• ~99% character accuracy
• ~95% word accuracy
• Trained on 24-length sequences → test on 3, 5, 7, 9, 11, 24, 32, 48 → generalization to other lengths
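The slides do not define how the two accuracies are computed. One common convention, assumed here, is character accuracy as 1 minus the edit distance normalised by the reference length, and word accuracy as the fraction of reference words matched position by position. The example strings are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_accuracy(reference, prediction):
    """Assumed definition: 1 - edit distance / reference length."""
    return 1.0 - edit_distance(reference, prediction) / max(len(reference), 1)


def word_accuracy(reference, prediction):
    """Assumed definition: fraction of reference words matched in position."""
    ref_words = reference.split()
    pred_words = prediction.split()
    matches = sum(r == p for r, p in zip(ref_words, pred_words))
    return matches / max(len(ref_words), 1)


reference = "tion of regular fits of the gout one or more joints"
prediction = "tion of regular fits of the got onne o more joints"
print(character_accuracy(reference, prediction))
print(word_accuracy(reference, prediction))
```

Under these definitions a single wrong character costs a whole word, which is why word accuracy sits below character accuracy.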

Real Text Images
45% word accuracy! Why?
• Varying spacing / non-fixed width font is challenging for the fully-conv. recognizer

Real Text Images
Why? Varying spacing / non-fixed width font is challenging for the fully-conv. recognizer
→ Let features travel using a “skip-RNN” in the last layer
[Figure: Conv layers followed by a Skip RNN]

Real Text Images
• 85% word accuracy (now, vs. 45% before)!
• 96.2% character accuracy

Real Text Example
Columns: Ground Truth | Image | Prediction. The two text columns of each line are shown side by side.
the different forms in which this disease ap | the different forms in which this disease ap
pears have rendered it necessary to divide it | pears have rendered it necessary to divide it
into regular and aregular gout in the former | into regular and irregular gout in the former
the attacks of which are known by the denomina | the attacks of which are known by the denomina
tion of regular fits of the got onne o more joints | tion of regular fits of the gout one or more joint
of the extremities become inflamed painful and | of the extremities become inflamed painful and
tender and frequently in an exquisite deqree a | tender and frequently in an exquisite degree a
symptoneatic fever proportioned to the degree of | symptomatic fever proportioned to the degree of
pain and inflammation with evening exacerba | pain and inflammation with evening exacerba
tions accompand the other complaints which dis | tions accompany the other complaints which dis
tress the patient for uncertain perions sometimes | tress the patient for uncertain periods sometimes
for several weeks wo he the fit goes off the piont | for several weeks when the fit goes off the joints
which have been the seat of the disease are always | which have been the seat of the disease are always
found to have become rigid and inflexible in pro | found to have become rigid and inflexible in pro
portion to the degree in which the disease has | portion to the degree in which the disease has
existed in them ■ frequently remaining enlarged | existed in themuffeequently remaining enlarged
and incapable of free motion for a considerable | and incapable of free motion for a considerable
timer cen he other hand the patient at the sas | time on the other hand the patient at the same
time experiences so perfect an exemption from | time experiences so perfect an exemption from
disease as generally to lead to the opinion that | disease ag generallyto leadu to the opinion that
the fit has occasioned the most salutary changes | the fit has occasioned the most salutary changes
in the sten in ■ ne ■ e erd y i er | in the system ■ ■■■ ■ ■ i ■■
in the irregular gout the affection of the joints | in the irregular gout the affection of the joints
is much less confined than in the former some e | is much less confined than in the former yiomeu
times it leaves the joints at fort tttached and | timcs it leaves the joints at first attacked and
fixes on some distant part ■ and sometimes after | fixes on some distant part ■ and sometimes after
harassing the patient by making a circuit in ■ | harassing the patient by making a circuit in ■

Analysis

Effect of Sequence Length on Training Convergence
Training with sequences of different lengths:
• Short lengths (3-5): no convergence
• Longer sequences → faster convergence: 13 > 11 > 9 > 7
