The world's information is not digital!
● An estimated 130 million unique books in existence
Attempting to digitize this information:
● Google Books
● Project Gutenberg
● Million Book Project
● LOC: American Memory
And many more

Viewing scanned documents on mobile devices, two options: TOO BIG or TOO SMALL

Is OCR the solution?

BITeR: Bidirectional Image Text Reflow

Concept: Reflowed Image
Rearrange text and images while preserving document layout
* Actual program output!

Special properties of text
● Usually organized in paragraphs
● High directional variance in gradient
● Spacing between words is directly related to line height
● Paragraph margins are usually smaller than image/graph margins

Methodology: Outline
● Line height detection
● Segmentation to textual and non-textual elements
● Verification of segment classification
● Further segmentation of textual segments
● Ordering of textual segments

Methodology: Line height detection
● Sum pixel values on X axis
● Filter using median value
● Find median length of continuous segments
● Robust: automatic re-targeting
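
The first three bullets of the line height detection slide can be sketched as a row projection profile. This is a minimal illustration, not the authors' code: it assumes a binarized page stored as a list of rows with 1 for ink and 0 for background, and the function names are hypothetical. The re-targeting step is not covered.

```python
from statistics import median

def row_profile(image):
    """Sum ink along X for every row (assumes 1 = ink, 0 = background)."""
    return [sum(row) for row in image]

def runs_above(profile, threshold):
    """Return (start, length) for each maximal run of rows above threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(profile) - start))
    return runs

def estimate_line_height(image):
    """Median length of the runs of rows whose ink sum exceeds the median sum."""
    profile = row_profile(image)
    runs = runs_above(profile, median(profile))
    if not runs:
        return 0
    return median(length for _, length in runs)
```

Using the median twice (once as the threshold, once over run lengths) keeps the estimate stable when a few rows belong to figures or rules rather than text.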

Methodology: Line height detection
[Figure: line height detection example, labeled 28]

Methodology: Image Segmentation
● Compute directional variance of gradient along the X coordinate
● Threshold result using median value
● Absorb segments smaller than computed document line height
● Verify and reclassify using two metrics:
  1. Margin width
  2. Segment line height
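
A rough sketch of the first three segmentation bullets follows. It reads "directional variance of gradient along the X coordinate" as the per-row variance of adjacent-pixel differences, which is an assumption; the label names and helper functions are hypothetical, rows are assumed to be at least two pixels wide, and the verification step (margin width, segment line height) is left out.

```python
from statistics import median, pvariance

def row_gradient_variance(image):
    """Variance of the horizontal gradient (adjacent-pixel differences) per row."""
    return [pvariance([b - a for a, b in zip(row, row[1:])]) for row in image]

def label_rows(image):
    """Rows whose gradient variance exceeds the median are labeled as text."""
    scores = row_gradient_variance(image)
    cut = median(scores)
    return ["TEXT" if s > cut else "NON_TEXT" for s in scores]

def to_segments(labels):
    """Collapse per-row labels into [label, start_row, end_row_exclusive] runs."""
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1][2] = i + 1
        else:
            segments.append([label, i, i + 1])
    return segments

def absorb_small(segments, line_height):
    """Merge runs shorter than line_height into the segment before them."""
    merged = []
    for label, start, end in segments:
        if merged and end - start < line_height:
            merged[-1][2] = end          # too small: absorbed by its predecessor
        elif merged and merged[-1][0] == label:
            merged[-1][2] = end          # same label as predecessor: extend it
        else:
            merged.append([label, start, end])
    return merged
```

In this sketch a short leading segment has no predecessor and is kept as-is; absorbing it into the following segment instead would be an equally reasonable choice.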

Methodology: Image Segmentation
[Figure: three panels (Original, Initial Segmentation, Final Segmentation); segments labeled TEXT, IMAGE, TEXT]

Methodology: Word Identification
● Smooth segment using a filter based on detected segment line height
● Detect connected components after smoothing
● Filter out small components
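
The word identification steps could look roughly like the sketch below. The slide does not name the smoothing filter, so a horizontal dilation stands in for it here; the radius (a quarter of the line height) and the minimum area are hypothetical choices, and the input is again assumed to be a binary list of rows.

```python
from collections import deque

def smear(image, radius):
    """Horizontal dilation: a pixel becomes ink if any pixel within radius columns is ink."""
    return [[1 if any(row[max(0, x - radius):x + radius + 1]) else 0
             for x in range(len(row))] for row in image]

def components(image):
    """4-connected components as inclusive bounding boxes (x0, y0, x1, y1)."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def find_words(image, line_height, min_area=None):
    """Smear letters into word blobs, then keep components of at least min_area pixels."""
    radius = max(1, line_height // 4)    # hypothetical: tie the filter to line height
    if min_area is None:
        min_area = line_height           # hypothetical cutoff for specks
    boxes = components(smear(image, radius))
    return [b for b in boxes if (b[2] - b[0] + 1) * (b[3] - b[1] + 1) >= min_area]
```

Because word spacing scales with line height, tying the smear radius to the line height lets letter gaps close while word gaps stay open. The returned boxes are measured on the smeared image, so they extend up to `radius` columns past the original ink.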

Methodology: Text Baseline
Problem: Due to close-cropping, words will become misaligned
Solution: Detect true word baseline using local maxima of vertical gradient
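
One simplified way to picture the baseline step: in a close-cropped word image, the ink per row drops sharply just below the baseline, because only descenders continue past it. The sketch below takes the strongest maximum of the vertical gradient of the row ink profile; the slide mentions local maxima, so treat this single-maximum version, the binary input, and the function name as assumptions.

```python
def baseline_row(word):
    """Index of the last row above the sharpest top-to-bottom drop in ink.

    `word` is a close-cropped binary word image (list of rows, 1 = ink).
    """
    # Pad with an empty row so the bottom edge of the crop counts as a drop.
    profile = [sum(row) for row in word] + [0]
    drops = [profile[y] - profile[y + 1] for y in range(len(word))]
    return max(range(len(drops)), key=lambda y: drops[y])
```

Two crops of different heights, one with a descender and one without, then report the same baseline row relative to their tops and can be aligned on it rather than on their bottom edges.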

Methodology: Word order
● Detect text line locations using the same methodology as line height detection
● Order words by lines, then by X coordinate
● RTL and LTR languages easily accommodated
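
Ordering by line and then by X coordinate amounts to a two-part sort key. In this sketch the box format `(x0, y0, x1, y1)` and the `line_tops` list (top row of each detected text line, sorted) are assumed inputs, and the function name is hypothetical.

```python
def order_words(boxes, line_tops, rtl=False):
    """Sort word boxes into reading order: by text line, then along X."""
    def line_index(box):
        center = (box[1] + box[3]) / 2
        index = 0
        # Pick the last line whose top is at or above the word's vertical center.
        for i, top in enumerate(line_tops):
            if top <= center:
                index = i
        return index

    def key(box):
        # For RTL scripts, start from the right edge and move left.
        x = -box[2] if rtl else box[0]
        return (line_index(box), x)

    return sorted(boxes, key=key)
```

Switching between LTR and RTL only changes the sign of the X component of the key, which matches the slide's point that both directions are easy to accommodate.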

Methodology: Output
● Segmented sections are output as individual images
● Original document order is preserved along segments
● Result is displayed as an HTML file, allowing easy viewing on multiple platforms
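
To make the output step concrete, here is a hedged sketch of an HTML writer. It assumes each text segment has already been cut into word images and that inline `<img>` elements are used so the browser performs the reflow; the segment tuple format, the file names, and the function name are illustrative, not taken from the slides.

```python
from html import escape

def build_html(segments, rtl=False):
    """Build an HTML page from segments given in original document order.

    Each segment is ("text", [word image paths]) or ("image", image path).
    """
    parts = []
    for kind, payload in segments:
        if kind == "text":
            # Inline images wrap like words, so the browser reflows them.
            words = " ".join(
                f'<img src="{escape(path, quote=True)}" alt="">' for path in payload
            )
            parts.append(f"<p>{words}</p>")
        else:
            src = escape(payload, quote=True)
            parts.append(f'<div><img src="{src}" alt="" style="max-width:100%"></div>')
    direction = "rtl" if rtl else "ltr"
    body = "\n".join(parts)
    return f'<!DOCTYPE html>\n<html dir="{direction}">\n<body>\n{body}\n</body>\n</html>\n'
```

For example, `build_html([("text", ["w1.png", "w2.png"]), ("image", "fig.png")])` yields a paragraph of two word images followed by the figure, in that order.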

Results

Test set | Accuracy
Excerpt from V.S. Nalwa's “A guided tour of computer vision”, 1993 (1) | 96.1%
Excerpt from V.S. Nalwa's “A guided tour of computer vision”, 1993 (2) | 91.6%
2010 English journal article | 97.8%
1907 German journal article | 99.7%
Hebrew sample text | 96.8%
Hebrew sample text 2 | 84.9%
1953 article by Kuffler | 93.7%
1925, Gestalt theory by Max Wertheimer | 57.3%
1983 excerpt from Human and Machine Vision, Vitkin & Tenenbaum | 98.2%

$$
\text{accuracy} = 0.25 \cdot \frac{\text{correctly identified}}{\text{total segments}} + 0.75 \cdot \frac{(\text{correctly segmented}) + 0.5 \cdot (\text{baseline/segmenting errors}) - (\text{dropped/misplaced words})}{\text{total words}}
$$
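
One reading of the accuracy score above, written out as a function. The grouping of terms is an assumption (segment classification weighted 0.25, word-level quality weighted 0.75, baseline or segmenting errors given half credit), and all parameter names are hypothetical.

```python
def accuracy(identified, total_segments, segmented, partial_errors, dropped, total_words):
    """Weighted score: 25% segment classification, 75% word-level quality.

    partial_errors counts words with baseline or segmenting errors (half credit);
    dropped counts dropped or misplaced words (penalized in full).
    """
    segment_score = identified / total_segments
    word_score = (segmented + 0.5 * partial_errors - dropped) / total_words
    return 0.25 * segment_score + 0.75 * word_score
```

Under this reading, a page with every segment classified correctly and every word segmented cleanly scores exactly 1.0.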

Future Work
● Improved adaptive parameters for filters
● Better verification of segment identification
● Support for multi-column layouts
● Detection of special text formats (lists etc.)

Thank you!
TOO BIG, TOO SMALL, JUST RIGHT!