


SLIDE 1

CVIP-WM 2017

Aarushi Agrawal1, Prerana Mukherjee2, Siddharth Srivastava2 and Brejesh Lall2

1Department of Electrical Engineering, Indian Institute of Technology, Kharagpur

2Department of Electrical Engineering, Indian Institute of Technology, Delhi

SLIDE 2

Objective

To develop a novel language-agnostic text detection method utilizing edge-enhanced Maximally Stable Extremal Regions in natural scenes by defining strong characterness measures.


SLIDE 3
  • Text co-occurring in images and videos serves as a warehouse of valuable information for describing images.
  • A few interesting applications are
  • Extracting street names, numbers, and textual indications such as β€œdiversion ahead”
  • Autonomous vehicles following traffic rules based on road-sign interpretation
  • Indexing and tagging of images

Performing the above tasks is trivial for humans, but segregating text from a challenging background remains a complicated task for machines.


Introduction

SLIDE 4

Related Works

  • Maximally Stable Extremal Regions (MSERs)
  • With Canny edge detector
  • MSER is applied to the image to determine regions with characters
  • Pixels outside of Canny edges are removed
  • With a graph model
  • Apply MSER for generating blobs
  • Generate a graph model using the positioning, color, etc. of the blobs
  • Then define cost functions to separate foreground and background regions
  • Stroke Width Transform
  • Finds the stroke width for each image pixel
  • A stroke is a contiguous part of an image that forms a band of nearly constant width
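The MSER-with-Canny idea above can be sketched in a few lines. Here a toy gradient-magnitude mask stands in for the Canny detector and a hand-made blob stands in for MSER output; both stand-ins are assumptions for illustration, not the cited implementations:

```python
# Keep only candidate-region pixels that lie on detected edges.
import numpy as np

def edge_mask(gray, thresh=30):
    # Simple gradient-magnitude "edge detector" standing in for Canny.
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy) > thresh

def filter_region_by_edges(region_mask, edges):
    # Drop region pixels that have no edge support.
    return region_mask & edges

img = np.zeros((8, 8))
img[2:6, 2:6] = 255                  # a bright square "character"
region = img > 0                     # pretend this blob came from MSER
edges = edge_mask(img)

kept = filter_region_by_edges(region, edges)
# Interior pixels (zero gradient) are removed; boundary pixels survive.
```

With a real pipeline, `cv2.MSER_create().detectRegions(...)` and `cv2.Canny(...)` would replace the two stand-ins.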


SLIDE 5

Related Works

  • Feature based techniques
  • Histogram of Oriented Gradients
  • Gabor based features
  • Shape descriptors
  • Fourier Transform
  • Zernike moments
  • Characterness
  • Text-specific saliency detection method
  • Uses saliency cues to accentuate boundary information
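As an illustration of the gradient-based features listed above, a minimal orientation histogram (the building block of HOG, and of PHOG when computed over a spatial pyramid) can be written as follows; the bin count and normalization are illustrative choices, not the settings of any cited method:

```python
# Weighted histogram of gradient orientations over an image patch.
import numpy as np

def orientation_histogram(gray, bins=9):
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180   # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    s = hist.sum()
    return hist / s if s else hist               # L1-normalise

img = np.zeros((16, 16))
img[:, 8:] = 255                                 # vertical edge
h = orientation_histogram(img)
# A vertical edge has horizontal gradients (orientation ~0 deg),
# so the histogram mass concentrates in the first bin.
```

PHOG simply concatenates such histograms computed over increasingly fine spatial subdivisions of the patch.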


SLIDE 6
  • We develop a language-agnostic text identification framework using text candidates obtained from edge-based MSERs and a combination of various characterness cues. This is followed by an entropy-assisted non-text region rejection strategy. Finally, the blobs are refined by combining regions with similar stroke width variance and distribution of characterness cues in the respective regions.
  • We provide a comprehensive evaluation on popular text datasets against recent text detection techniques and show that the proposed technique provides equivalent or better results.


Contributions

SLIDE 7

Methodology


SLIDE 8

Text candidate generation using eMSERs:

  • Generate an initial set of text candidates using the edge-enhanced Maximally Stable Extremal Regions (eMSERs) approach.
  • MSER is a method for blob detection which extracts covariant regions.
  • It aggregates regions with similar intensity at various thresholds.
  • In order to handle the presence of blur, eMSERs are computed over the gradient-amplitude image.
  • Two sets of regions are generated: dark and bright; dark regions are those with lower intensity than their surroundings, and vice-versa.
  • Non-text regions are rejected based on geometric properties such as aspect ratio, number of pixels (to reject noise) and skeleton length.
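The geometric rejection step above can be sketched as a simple predicate on each candidate blob; the aspect-ratio range and minimum pixel count below are illustrative assumptions, not the paper's thresholds:

```python
# Reject candidate blobs whose geometry is implausible for a character.
import numpy as np

def keep_blob(mask, ar_range=(0.1, 10.0), min_pixels=10):
    ys, xs = np.nonzero(mask)
    if len(xs) < min_pixels:             # too few pixels: likely noise
        return False
    w = xs.max() - xs.min() + 1
    h = ys.max() - ys.min() + 1
    aspect = w / h
    return ar_range[0] <= aspect <= ar_range[1]

noise = np.zeros((20, 20), bool); noise[5, 5] = True          # 1-pixel speck
letter = np.zeros((20, 20), bool); letter[4:16, 6:12] = True  # letter-sized blob
```

A skeleton-length test (the third property mentioned above) would be applied analogously after thinning the mask.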

[Figure: original image with the lighter-side and darker-side eMSER regions]

Methodology

SLIDE 9

Elimination of non-text regions:

  • Text usually appears on a surrounding having a distinctive intensity.
  • Find the corresponding image patch R for each eMSER blob. As the patch may contain spurious data, obtain a binarized image patch b_j using Otsu's threshold for that region, and the common region CR_j between b_j and R. Retain the blob if the overlap (b_j ∩ R) exceeds 90%.
  • Define various characterness cues:
  • Stroke width variance: for every pixel p in the skeletal image of the region, the stroke width SW(p) (the distance from p to the region boundary) is obtained, and the following statistics of the SW distribution are evaluated: var(SW)/meanΒ²(SW) and (max(SW) βˆ’ min(SW))/mode(SW).
  • HOG and PHOG: HOG is invariant to geometric and photometric transformations. PHOG helps in providing a spatial layout for the local shape of the image.
  • Entropy: calculated as Shannon's entropy over the common region (b_j ∩ R), given as

H = βˆ’ Ξ£_{j=0}^{Nβˆ’1} p_j log p_j

where N = number of gray levels and p_j = probability associated with gray level j.
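The stroke-width statistics and the entropy cue can be computed directly from the stated formulas; the sketch below is a minimal illustration under those formulas, not the paper's implementation:

```python
# Two characterness cues: stroke-width statistics and Shannon entropy.
import numpy as np

def stroke_width_stats(widths):
    """var(SW)/mean(SW)^2 and (max(SW) - min(SW))/mode(SW) for integer widths."""
    widths = np.asarray(widths)
    mode = np.bincount(widths).argmax()           # most frequent width
    ratio1 = widths.var() / widths.mean() ** 2
    ratio2 = (widths.max() - widths.min()) / mode
    return ratio1, ratio2

def shannon_entropy(patch, levels=256):
    """H = -sum_j p_j log p_j over the gray-level histogram of a patch."""
    hist = np.bincount(patch.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                                  # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

sw = [2, 2, 2, 3]                                 # near-constant stroke widths
r1, r2 = stroke_width_stats(sw)

flat = np.full((8, 8), 7, dtype=np.uint8)         # single gray level: H = 0
half = np.zeros((8, 8), dtype=np.uint8)
half[:, 4:] = 255                                 # two equally likely levels
```

In practice the stroke widths SW(p) would come from a distance transform evaluated on the skeleton of each blob.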

[Figure: initial blob, binarised image patch, and selected individual characters β€˜w’ and β€˜n’]

Methodology

SLIDE 10

Bounding Box Refinement:

  • The characterness cue distribution is defined by computing the values over the ICDAR 2013 dataset.
  • Using the above distribution, the stroke width distribution and the stroke width difference, combine neighboring candidate regions and aggregate them into one larger text region.
  • Combine all the neighboring regions into a single text candidate.
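A minimal sketch of the merging step above: neighboring boxes with similar stroke widths are unioned into one region. The proximity and similarity thresholds (`gap`, `tol`) are illustrative assumptions, not the paper's values:

```python
# Greedily merge nearby candidate boxes with similar mean stroke width.

def boxes_close(a, b, gap=10):
    # a, b: (x0, y0, x1, y1); True if the boxes are within `gap` pixels.
    return not (a[2] + gap < b[0] or b[2] + gap < a[0] or
                a[3] + gap < b[1] or b[3] + gap < a[1])

def merge_boxes(boxes, widths, gap=10, tol=0.3):
    boxes, widths = [list(b) for b in boxes], list(widths)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                similar = abs(widths[i] - widths[j]) <= tol * max(widths[i], widths[j])
                if boxes_close(boxes[i], boxes[j], gap) and similar:
                    a, b = boxes[i], boxes[j]
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    del boxes[j]
                    del widths[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

letters = [(0, 0, 8, 12), (12, 0, 20, 12), (24, 0, 32, 12)]  # three characters
widths = [2.0, 2.1, 1.9]                                     # similar strokes
result = merge_boxes(letters, widths)
```

All three character boxes end up aggregated into one word-level box.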

[Figure: smaller regions selected as individual blobs, and the final result after combining them]

Methodology

SLIDE 11

Training and Testing:

Training is performed on the ICDAR 2013 dataset, while the test set consists of the MSRATD and KAIST datasets. This setting makes the evaluation challenging and allows us to evaluate the generalization ability of the various techniques.
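Precision, recall and F-measure can be computed by IoU matching of detected boxes against ground truth; the protocol below (0.5 IoU threshold) is a common convention assumed for illustration, and the benchmarks' own evaluation scripts may differ in detail:

```python
# Detection precision / recall / F-measure via IoU matching.

def iou(a, b):
    # a, b: (x0, y0, x1, y1)
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def prf(detections, truths, thresh=0.5):
    matched = sum(any(iou(d, t) >= thresh for t in truths) for d in detections)
    recalled = sum(any(iou(d, t) >= thresh for d in detections) for t in truths)
    p = matched / len(detections) if detections else 0.0
    r = recalled / len(truths) if truths else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

dets = [(0, 0, 10, 10), (50, 50, 60, 60)]       # one hit, one false alarm
gts = [(1, 1, 10, 10), (100, 100, 110, 110)]    # one found, one missed
p, r, f = prf(dets, gts)
```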

Qualitative Results

Results

SLIDE 12

Quantitative Results

MSRATD
                      Precision  Recall  F-Measure
Proposed              0.85       0.33    0.46
Characterness [1]     0.53       0.25    0.31
Blob Detection [2]    0.8        0.47    0.55
Epshtein et al. [3]   0.25       0.25    0.25
Chen et al. [4]       0.05       0.05    0.05
TD-ICDAR [5]          0.53       0.52    0.5
Gomez et al. [6]      0.58       0.54    0.56

KAIST - Mixed
                      Precision  Recall  F-Measure
Proposed              0.8485     0.3299  0.4562
Characterness [1]     0.5299     0.2467  0.3136
Blob Detection [2]    0.8047     0.4716  0.5547

KAIST - English
                      Precision  Recall  F-Measure
Proposed              0.9545     0.3556  0.4994
Characterness [1]     0.7263     0.3209  0.4083
Blob Detection [2]    0.9091     0.5141  0.6269

KAIST - Korean
                      Precision  Recall  F-Measure
Proposed              0.9702     0.3362  0.4838
Characterness [1]     0.8345     0.3043  0.4053
Blob Detection [2]    0.9218     0.4826  0.5985

KAIST - All
                      Precision  Recall  F-Measure
Proposed              0.9244     0.3407  0.4798
Characterness [1]     0.6969     0.2910  0.3757
Blob Detection [2]    0.8785     0.4898  0.5933
Gomez et al. [6]      0.66       0.78    0.71
Lee et al. [7]        0.69       0.60    0.64

Results

SLIDE 13
  • Proposed a language-agnostic text identification scheme using text candidates obtained from edge-enhanced MSERs (eMSERs).
  • Processing steps are used to reject the non-textual blobs and combine smaller blobs into one larger region by utilizing strong characterness measures.
  • The effectiveness has been analyzed with the precision, recall and F-measure evaluation metrics, showing that the proposed scheme performs better than traditional text detection schemes.


Conclusion

SLIDE 14

[1] Li, Yao, Wenjing Jia, Chunhua Shen, and Anton van den Hengel. "Characterness: An indicator of text in the wild." IEEE Transactions on Image Processing 23, no. 4 (2014): 1666-1677.
[2] Jahangiri, Mohammad, and Maria Petrou. "An attention model for extracting components that merit identification." In 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 965-968. IEEE, 2009.
[3] Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. "Detecting text in natural scenes with stroke width transform." In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2963-2970. IEEE, 2010.
[4] Chen, Xiangrong, and Alan L. Yuille. "Detecting and reading text in natural scenes." In 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. II-II. IEEE, 2004.
[5] Yao, Cong, Xiang Bai, Wenyu Liu, Yi Ma, and Zhuowen Tu. "Detecting texts of arbitrary orientations in natural images." In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1083-1090. IEEE, 2012.
[6] Gomez, Lluis, and Dimosthenis Karatzas. "Multi-script text extraction from natural scenes." In 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 467-471. IEEE, 2013.
[7] Lee, SeongHun, Min Su Cho, Kyomin Jung, and Jin Hyung Kim. "Scene text extraction with edge constraint and text collinearity." In 2010 20th International Conference on Pattern Recognition (ICPR), pp. 3983-3986. IEEE, 2010.


References

SLIDE 15

SLIDE 16