Downtown Osaka Scene Text Dataset
Masakazu Iwamura, Takahiro Matsuda, Naoyuki Morimoto, Hitomi Sato, Yuki Ikeda and Koichi Kise (Osaka Prefecture University)
DOST Dataset
1. Introduction
2. Unique Features of DOST Dataset
3. Construction of
Dataset
[Bar chart, axis 10 to 100: IIIT5K (50, 1k, None); SVT (50, None); ICDAR2003 (50, Full, 50k)]
Recent results exceed 80%
This does not mean that these methods can read a wide variety of text in the real environment
(as much as possible)
(with regard to resolution, capture angle and so on)
“Scene Text in the Wild”
in the real environment
like MJSynth and SynthText datasets
captured
“incidental scene text” dataset captured with Google Glass
direction
single word
(MJSynth and SynthText)
(Google Streetview House Number dataset)
[Bar chart: size of each dataset, axis up to 80,000]
Image DB: ICDAR2003: 509; ICDAR2013 Chal. 2: 462; ICDAR2015 Chal. 4: 1,670; NEOCR: 659; KAIST: 3,000; SVT: 349; IIIT5K: 5,000; COCO-Text: 63,686
Video DB: ICDAR2013 Chal. 3: 15,277; ICDAR2015 Chal. 3: 27,824; YVT: 11,791; DOST: 32,147
Almost double
[Bar chart: size of each dataset, axis up to 800,000]
Image DB: ICDAR2003: 2,268; ICDAR2013 Chal. 2: 2,524; ICDAR2015 Chal. 4: 17,548; NEOCR: 5,238; KAIST: 3,000; SVT: 904; IIIT5K: 5,000; COCO-Text: 173,589
Video DB: ICDAR2013 Chal. 3: 93,598; ICDAR2015 Chal. 3: 125,141; YVT: 16,620; DOST: 797,919
x4.6
Images were captured in shopping streets where a lot of text exists
[Bar chart: video datasets, axis up to 25,000]
ICDAR2013 Chal. 3: 1,962; ICDAR2015 Chal. 3: 3,562; YVT: 245; DOST: 22,398
x6.3
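The multipliers on these charts appear to compare DOST with the next-largest dataset in the same chart. A minimal Python sketch, assuming that reading and using only counts printed on the slides (the helper name `ratio_to_runner_up` and the dictionary names are hypothetical):

```python
# Counts copied from the charts; which datasets the multipliers compare
# is an assumption (DOST vs. the next-largest entry in the same chart).
second_chart = {
    "COCO-Text": 173_589,
    "ICDAR2013 Chal. 3": 93_598,
    "ICDAR2015 Chal. 3": 125_141,
    "DOST": 797_919,
}
third_chart = {
    "ICDAR2013 Chal. 3": 1_962,
    "ICDAR2015 Chal. 3": 3_562,
    "YVT": 245,
    "DOST": 22_398,
}


def ratio_to_runner_up(counts, target="DOST"):
    """Return counts[target] divided by the largest count among the other datasets."""
    runner_up = max(value for name, value in counts.items() if name != target)
    return counts[target] / runner_up


print(round(ratio_to_runner_up(second_chart), 1))  # 4.6
print(round(ratio_to_runner_up(third_chart), 1))   # 6.3
```

Under this assumption the computed ratios agree with the "x4.6" and "x6.3" annotations.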
words are contained
[Bar chart: counts by character type, axis up to 800,000]
Alphabet: 837,489; Kanji: 723,805; Katakana: 696,697; Hiragana: 355,158; Digit: 324,742; Symbol: 22,802
Japanese characters: Kanji, Katakana, Hiragana
Kanji: 日 本 店 円 大 工 中 四 業 房 会 北 月 千 元 年 間 販 売 酒 家 取 台 止
Hiragana: あ い う え お か き く け こ さ し す せ そ た ち つ て と
Katakana: ア イ ウ エ オ カ キ ク ケ コ サ シ ス セ ソ タ チ ツ テ ト
Symbol: ~ ! # & ( ) * , - . / : ? × ’ ↑ → ★ 、 。 々 〇 」 ・
Completed in 2012
ICDAR2013 & 2015 Challenge 3 datasets
in neighboring frames
preservation
We spent more than 1,500 man-hours
Bunsetsu: the smallest unit of words that sounds natural in a spoken sentence
same basic units as long as it can be traced
goes out from the frame
are not tight enough
is not comprehensive
cameras
We will improve them
Illegible regions are marked as “Don’t care”
Subset of DOST that contains words consisting of alphabetic characters and digits
Data were sampled
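One way to picture such a subset is a filter over word transcriptions. The sketch below is an illustration only: it assumes a word qualifies when every character is an ASCII letter or digit, which the slides do not spell out, and the names `is_latin_word` and `latin_subset` are hypothetical.

```python
import re

# Assumed criterion: every character is an ASCII letter or digit.
# The slides do not say how symbols or full-width characters are handled.
LATIN_WORD = re.compile(r"[A-Za-z0-9]+")


def is_latin_word(transcription: str) -> bool:
    """True when the whole transcription is made of ASCII letters and digits."""
    return LATIN_WORD.fullmatch(transcription) is not None


def latin_subset(words):
    """Keep only the words that pass the assumed Latin criterion."""
    return [word for word in words if is_latin_word(word)]


# Made-up transcriptions, not taken from the dataset.
sample = ["OSAKA", "100", "大阪", "カフェ", "Cafe24", "¥100"]
print(latin_subset(sample))  # ['OSAKA', '100', 'Cafe24']
```

A stricter or looser rule (for example, allowing punctuation inside a word) would change which words are kept.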
[Bar chart: F-measure [%] per dataset]
Image DB: ICDAR2003: 18.7; ICDAR2013 Chal. 2: 6.1; ICDAR2015 Chal. 4: 13; SVT: 19; COCO-Text: 11.9
Video DB: ICDAR2015 Chal. 3: 8.5; YVT: 28.5; DOST: 2.4; DOST Latin: 1.2
[Bar chart: F-measure [%] per dataset]
Image DB: ICDAR2003: 47.5; ICDAR2013 Chal. 2: 4.8; ICDAR2015 Chal. 4: 6.3; SVT: 29.1; COCO-Text: 1.5
Video DB: ICDAR2015 Chal. 3: 3.9; YVT: 1.9; DOST: 2.8; DOST Latin: 2.1
[Bar chart: F-measure [%] per dataset]
Image DB: ICDAR2003: 81.8; ICDAR2013 Chal. 2: 71.3; ICDAR2015 Chal. 4: 48.5; SVT: 24.2; COCO-Text: 17.1
Video DB: ICDAR2015 Chal. 3: 44.1; YVT: 37.7; DOST: 2.7; DOST Latin: 11.2
Recognized in Japanese mode
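For reference, the F-measure on these charts is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition; the slides do not describe the matching protocol behind the precision and recall values, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall, not taken from the slides.
print(f"{100 * f_measure(0.5, 0.25):.1f}")  # 33.3
```

Because it is a harmonic mean, the score stays low whenever either precision or recall is low.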