
The world's information is not digital!



  1. The world's information is not digital!
     ● An estimated 130 million unique books in existence
     Attempting to digitize this information:
     ● Google Books
     ● Project Gutenberg
     ● Million Book Project
     ● LOC: American Memory
     And many more

  2. Viewing scanned documents on mobile devices – two options: TOO BIG, TOO SMALL

  3. Is OCR the solution?

  4. BITeR: Bidirectional Image Text Reflow

  5. Concept: Reflowed Image (* Actual program output!)
     Rearrange text and images while preserving document layout

  6. Special Properties of text: Usually organized in paragraphs

  7. Special Properties of text: High directional variance in gradient

  8. Special Properties of text: Spacing between words is directly related to line height

  9. Special Properties of text: Paragraph margins are usually smaller than image/graph margins

  10. Methodology: Outline
     ● Line height detection
     ● Segmentation into textual and non-textual elements
     ● Verification of segment classification
     ● Further segmentation of textual segments
     ● Ordering of textual segments

  11. Methodology: Line height detection
     ● Sum pixel values on X axis
     ● Filter using median value
     ● Find median length of continuous segments
     ● Robust – automatic re-targeting
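
The first three bullets of the line height slide can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes a binarized page stored as a list of rows with 1 for ink and 0 for background, and the function names `row_profile`, `run_lengths` and `estimate_line_height` are hypothetical.

```python
from statistics import median


def row_profile(image):
    """Sum the pixel values of each row (projection along the X axis)."""
    return [sum(row) for row in image]


def run_lengths(flags):
    """Lengths of the consecutive runs of True values in flags."""
    runs, count = [], 0
    for flag in flags:
        if flag:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


def estimate_line_height(image):
    """Median length of the runs of rows whose ink sum exceeds the median sum."""
    profile = row_profile(image)
    threshold = median(profile)  # assumption: the median itself is the cut-off
    runs = run_lengths([value > threshold for value in profile])
    return median(runs) if runs else 0
```

Taking the median of the run lengths, rather than the mean, keeps a single tall figure or a merged pair of lines from skewing the estimate.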

  12. Methodology: Line height detection (figure label: 28)

  13. Methodology: Image Segmentation
     ● Compute directional variance of gradient along the X coordinate
     ● Threshold result using median value
     ● Absorb segments smaller than computed document line height
     ● Verify and reclassify using two metrics:
       1. Margin width
       2. Segment line height
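
A rough sketch of the first three segmentation bullets, under stated assumptions: the page is a list of pixel rows, "directional variance of gradient along the X coordinate" is read as the variance of horizontal pixel differences within each row, and rows scoring above the median are treated as text candidates. The helper names and the `TEXT`/`OTHER` labels are hypothetical, and the verification step (margin width, segment line height) is not covered.

```python
from statistics import median, pvariance


def row_gradient_variance(image):
    """Variance of the horizontal pixel differences, one value per row."""
    return [pvariance([b - a for a, b in zip(row, row[1:])]) for row in image]


def label_rows(image):
    """Label each row TEXT if its gradient variance exceeds the median."""
    scores = row_gradient_variance(image)
    threshold = median(scores)
    return ["TEXT" if score > threshold else "OTHER" for score in scores]


def to_segments(labels):
    """Collapse per-row labels into [label, start, end) runs."""
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1][2] = i + 1
        else:
            segments.append([label, i, i + 1])
    return segments


def absorb_small(segments, line_height):
    """Merge runs shorter than line_height into the run that precedes them."""
    merged = []
    for label, start, end in segments:
        too_small = end - start < line_height
        if merged and (too_small or merged[-1][0] == label):
            merged[-1][2] = end
        else:
            merged.append([label, start, end])
    return merged
```

Absorbing short runs removes the thin non-text slivers that inter-line gaps would otherwise produce inside a paragraph.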

  14. Methodology: Image Segmentation [Figure panels: Original, Initial Segmentation, Final Segmentation; regions labeled TEXT, IMAGE, TEXT]

  15. Methodology: Word Identification
     ● Smooth segment using a filter based on detected segment line height.
     ● Detect connected components after smoothing.
     ● Filter out small components.
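
The three word identification steps might look like the sketch below. It makes several assumptions that the slide does not state: the smoothing filter is a horizontal smear whose radius is a quarter of the line height, components are 4-connected, and "small" means fewer than `min_pixels` pixels. All names are hypothetical.

```python
from collections import deque


def smear(binary, radius):
    """Set a pixel if any ink lies within radius columns of it on the same row."""
    return [
        [int(any(row[max(0, x - radius):x + radius + 1])) for x in range(len(row))]
        for row in binary
    ]


def components(binary):
    """Bounding box (x0, y0, x1, y1) and pixel count of each 4-connected blob."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    found = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            size = 0
            while queue:
                cx, cy = queue.popleft()
                size += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            found.append(((x0, y0, x1, y1), size))
    return found


def find_words(binary, line_height, min_pixels=4):
    """Word boxes, measured on the smeared image, with tiny blobs removed."""
    radius = max(1, line_height // 4)  # assumed link between filter and line height
    blobs = components(smear(binary, radius))
    return [box for box, size in blobs if size >= min_pixels]
```

Tying the smear radius to the line height follows the earlier observation that word spacing scales with line height: letter gaps close up while word gaps stay open.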

  16. Methodology: Text Baseline
     Problem: Due to close-cropping, words will become misaligned
     Solution: Detect true word baseline using local maxima of vertical gradient
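
One possible reading of the baseline step, offered only as a sketch: take the per-row ink sums of a cropped word, treat the drop between consecutive rows as the vertical gradient, and pick the strongest peak of that drop as the baseline. The slide does not say how the local maxima are chosen, so this choice and the names `baseline_row` and `vertical_offsets` are assumptions.

```python
def baseline_row(word):
    """Row index just above the sharpest downward drop in ink.

    word: binary crop of one word, a list of rows with 1 = ink.
    """
    # Pad with an empty row so the bottom edge of the crop counts as a drop.
    profile = [sum(row) for row in word] + [0]
    drops = [profile[y] - profile[y + 1] for y in range(len(word))]
    # The largest drop is the strongest local maximum of the gradient.
    return max(range(len(drops)), key=drops.__getitem__)


def vertical_offsets(words):
    """Downward shift for each crop so that all baselines share one row."""
    baselines = [baseline_row(word) for word in words]
    target = max(baselines)
    return [target - baseline for baseline in baselines]
```

With these offsets, a close-cropped word such as "one" and a word with ascenders and descenders such as "play" can be placed on a common baseline instead of being aligned by the tops or bottoms of their crops.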

  17. Methodology: Word order
     ● Detect text line locations using the same methodology as line height detection
     ● Order words by lines, then by X coordinate
     ● RTL and LTR languages easily accommodated
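
The ordering rule on this slide reduces to a two-part sort key. The sketch below assumes word boxes of the form `(x0, y0, x1, y1)` and text lines given as `(top, bottom)` row bands. Both formats, and the function name `order_words`, are hypothetical.

```python
def order_words(boxes, line_bands, rtl=False):
    """Sort word boxes by text line, then by X in reading direction."""

    def line_index(box):
        center = (box[1] + box[3]) / 2
        for i, (top, bottom) in enumerate(line_bands):
            if top <= center < bottom:
                return i
        return len(line_bands)  # words outside every band are placed last

    def key(box):
        # LTR reads from the smallest left edge, RTL from the largest right edge.
        x = -box[2] if rtl else box[0]
        return (line_index(box), x)

    return sorted(boxes, key=key)
```

Supporting right-to-left scripts only requires flipping the sign of the X component, which matches the slide's claim that both directions are easy to accommodate.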

  18. Methodology: Output
     ● Segmented sections are output as individual images
     ● Original document order is preserved along segments
     ● Result is displayed as an HTML file, allowing easy viewing on multiple platforms.
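
To illustrate why HTML is a convenient output format, the sketch below assembles a page from already saved segment images. The input format (a list of `(kind, paths)` pairs), the file names and the markup are assumptions for illustration and do not come from the presentation.

```python
from html import escape


def build_html(segments, title="Reflowed document"):
    """Return an HTML page that lists segment images in document order.

    segments: list of (kind, [image paths]); kind is "text" or "image".
    """
    lines = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head><body>",
    ]
    for kind, paths in segments:
        if kind == "text":
            # Inline word images wrap like ordinary text at any screen width.
            tags = [f'<img src="{escape(path)}" alt="">' for path in paths]
            lines.append("<p>" + " ".join(tags) + "</p>")
        else:
            # Non-text segments stay whole and shrink to fit the screen.
            for path in paths:
                lines.append(
                    f'<div><img src="{escape(path)}" alt="" style="max-width:100%"></div>'
                )
    lines.append("</body></html>")
    return "\n".join(lines)
```

Because each word is an inline image, the browser performs the reflow itself, so the same file adapts to phone and desktop widths.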

  19. Results

     | Test Set | Accuracy |
     |---|---|
     | Excerpt from V.S. Nalwa's “A guided tour of computer vision”, 1993 (1) | 96.1% |
     | Excerpt from V.S. Nalwa's “A guided tour of computer vision”, 1993 (2) | 91.6% |
     | 2010 English journal article | 97.8% |
     | 1907 German journal article | 99.7% |
     | Hebrew sample text | 96.8% |
     | Hebrew sample text 2 | 84.9% |
     | 1953 article by Kuffler | 93.7% |
     | 1925, Gestalt theory by Max Wertheimer | 57.3% |
     | 1983 excerpt from Human and Machine Vision, Vitkin & Tenenbaum | 98.2% |

     $$\text{accuracy} = 0.25 \cdot \frac{\text{correctly identified}}{\text{total segments}} + \frac{0.75 \cdot (\text{correctly segmented}) + 0.5 \cdot (\text{baseline/segmenting errors}) - (\text{dropped/misplaced words})}{\text{total words}}$$
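
As a worked example of the scoring, the function below evaluates the accuracy formula from the Results slide under one reading of it: a segment term weighted 0.25, plus a word term in which correctly segmented words count 0.75, words with baseline or segmenting errors count 0.5, and dropped or misplaced words are subtracted. The parameter names and the sample counts are hypothetical.

```python
def accuracy(correct_segments, total_segments,
             correct_words, error_words, dropped_words, total_words):
    """Accuracy score under the assumed reading of the slide's formula."""
    segment_term = 0.25 * correct_segments / total_segments
    word_term = (0.75 * correct_words
                 + 0.5 * error_words
                 - dropped_words) / total_words
    return segment_term + word_term


# Hypothetical page: 4 of 4 segments identified; of 100 words, 90 are
# segmented correctly, 6 have baseline errors and 4 are dropped.
score = accuracy(4, 4, 90, 6, 4, 100)
print(f"{score:.1%}")
```

Under this reading a page with every segment identified and every word segmented correctly scores exactly 1.0, since the two weights sum to one.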

  20. Future Work
     ● Improved adaptive parameters for filters
     ● Better verification of segment identification
     ● Support for multi-column layouts
     ● Detection of special text formats (lists etc.)

  21. Thank you! TOO BIG, TOO SMALL, JUST RIGHT!
