Generating Handwritten Character Clones from an Incomplete Seed Character Set using Collaborative Filtering
Kazuaki Nakamura, Eiji Miyazaki, Naoko Nitta, and Noboru Babaguchi Osaka University, Japan
on 6th Aug. 2018, at ICFHR2018
Research goal: generating handwritten character images resembling a target user's actual handwriting.
The training dataset collected from the target user is fed into a generator (e.g., auto-encoders, GANs), which automatically generates handwritten character clones (HCCs).
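Issue 1: incomplete seed character set. It is difficult to collect many images from the target user, especially in the case of Asian languages, so the training dataset covers only some characters (seed characters); the other characters (non-seed characters) have no training image. For example, if the user writes "this is a pen. i am japanese. i like baseball.", the seed characters are a, b, e, h, i, j, k, l, m, n, p, s, t, and the non-seed characters are c, d, f, g, o, q, r, u, v, w, x, y, z.
Issue 2: within-person variety. Handwriting shapes vary even if the same writer writes the same character: all the samples have similar characteristics but are slightly different from each other.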
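To make the seed/non-seed split concrete, the sketch below derives both sets from the example sentence. Treating only the 26 lowercase Latin letters as the character set is an assumption of this example, and `split_seed_characters` is a hypothetical helper name.

```python
import string

def split_seed_characters(text: str) -> tuple[list[str], list[str]]:
    """Split the lowercase alphabet into seed characters (letters that appear in
    the text the target user wrote) and non-seed characters (all the others)."""
    written = {ch for ch in text.lower() if ch in string.ascii_lowercase}
    seed = sorted(written)
    non_seed = [ch for ch in string.ascii_lowercase if ch not in written]
    return seed, non_seed

seed, non_seed = split_seed_characters("this is a pen. i am japanese. i like baseball.")
print("seed:", ", ".join(seed))          # seed: a, b, e, h, i, j, k, l, m, n, p, s, t
print("non-seed:", ", ".join(non_seed))  # non-seed: c, d, f, g, o, q, r, u, v, w, x, y, z
```

Half of the alphabet has no training image at all, which is exactly the situation the method has to handle.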
Therefore, not a single HCC but an HCC distribution should be created for each character, even though at most one image (or none) is available per character as training data.
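Existing approaches fall into two types, and neither handles both issues:
- INPUT: many images of each character (a, a, …, b, b, …, z, z, …). OUTPUT: an HCC distribution for each character, $p_{\mathrm{HCC}}(\text{"a"}), p_{\mathrm{HCC}}(\text{"b"}), \ldots, p_{\mathrm{HCC}}(\text{"z"})$. Handles within-person variety, but not an incomplete seed character set.
- INPUT: a few images of several seed characters written in a certain style (e.g., a, c, n, y). OUTPUT: images of the other characters (e.g., b, d, e, x, z) that seem to be written in the same style. Handles an incomplete seed character set, but not within-person variety.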
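Proposed framework:
- Side dataset: handwriting images of many writers $w_1, w_2, w_3, w_4, \ldots$ (e.g., an image per character). The encoder (feature extractor) converts them into the feature set $f_w$, from which the parameter pool $\theta_1, \theta_2, \ldots, \theta_K$ is built.
- Training dataset $I_u$: images of an incomplete character set written by the target user $u$. The encoder converts them into the feature set $f_u$.
- Parameter selection: select the parameter $\hat{\theta}$ that best fits $f_u$; this gives the feature distribution $p(f \mid \hat{\theta})$ for each character.
- Generation: sample a new feature $f \sim p(f \mid \hat{\theta})$ and feed it into the decoder, which outputs an HCC.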
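The generation step can be sketched as follows, assuming that $p(f \mid \hat{\theta})$ is a Gaussian with $\hat{\theta} = (\mu, \Sigma)$ and using a hypothetical `toy_decode` function in place of the trained decoder. Drawing several features from the same distribution yields HCCs that differ slightly from each other, which is how within-person variety arises.

```python
import numpy as np

def generate_hccs(mu, sigma, decode, n_samples, seed=0):
    """Sample features f ~ N(mu, sigma) and decode each one into an HCC.

    Assumption: p(f | theta_hat) is Gaussian with theta_hat = (mu, sigma).
    `decode` stands in for the trained decoder (hypothetical here).
    """
    rng = np.random.default_rng(seed)
    features = rng.multivariate_normal(mu, sigma, size=n_samples)
    return [decode(f) for f in features]

def toy_decode(f):
    # Hypothetical stand-in for the decoder: reshape a 4-d feature into a 2x2 "image".
    return np.asarray(f).reshape(2, 2)

mu = np.zeros(4)
sigma = 0.01 * np.eye(4)
hccs = generate_hccs(mu, sigma, toy_decode, n_samples=5)
print(len(hccs), hccs[0].shape)  # 5 (2, 2)
```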
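Hypothesis: there are many writer pairs whose handwriting shapes are similar for some characters. For such pairs, not only the average shape but also the shape distribution of their handwriting would be similar.
Parameter pool construction: separately perform the following procedure for each character $c$:
1. Extract the features of character $c$ from the side dataset with the encoder (feature extractor).
2. Cluster the features, and compute the mean $\mu_k^c$ and the covariance $\Sigma_k^c$ for each cluster $k$.
3. Store $\theta_k^c = (\mu_k^c, \Sigma_k^c)$ in the parameter pool; e.g., with three clusters the pool is $\{(\mu_1^c, \Sigma_1^c), (\mu_2^c, \Sigma_2^c), (\mu_3^c, \Sigma_3^c)\}$.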
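A minimal sketch of the pool construction for one character $c$, assuming the features are NumPy vectors and using k-means as the clustering method (the slides do not name the algorithm); `kmeans` and `build_parameter_pool` are hypothetical helper names. A small ridge `reg` is added to each covariance so that it stays invertible when a cluster has few members.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means; returns the cluster label of each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every feature to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def build_parameter_pool(features, k, reg=1e-6, seed=0):
    """features: (num_writers, dim) features of one character c from the side dataset.
    Returns the pool [(mu_1, Sigma_1), ..., (mu_k, Sigma_k)]; empty clusters are skipped."""
    dim = features.shape[1]
    labels = kmeans(features, k, seed=seed)
    pool = []
    for i in range(k):
        members = features[labels == i]
        if len(members) == 0:
            continue
        mu = members.mean(axis=0)
        if len(members) > 1:
            sigma = np.atleast_2d(np.cov(members, rowvar=False))
        else:
            sigma = np.zeros((dim, dim))
        pool.append((mu, sigma + reg * np.eye(dim)))
    return pool

rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
pool = build_parameter_pool(features, k=2)
print(sorted(round(float(mu[0])) for mu, _ in pool))  # [0, 5]
```

Each cluster's covariance captures how much similar-looking writers vary, which is what the hypothesis uses as a proxy for one writer's within-person variety.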
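Best-Fit strategy for a seed character $c$: the encoder extracts the feature $f_u^c$ from the target user's training images $I_u^c$, and the parameter in the pool $\{\theta_1^c, \theta_2^c, \theta_3^c\}$ that best fits $f_u^c$ is selected as $\hat{\theta}^c$ (in the example, $\hat{\theta}^c = \theta_3^c$).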
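The slides do not spell out the fitting criterion; a natural choice, assumed here, is to pick the pool entry under which the target user's feature $f_u^c$ has the highest Gaussian log-likelihood. `best_fit` is a hypothetical helper name, and indices are 0-based, so index 2 corresponds to $\theta_3^c$.

```python
import numpy as np

def gaussian_log_likelihood(f, mu, sigma):
    """log N(f | mu, sigma) for a single feature vector f."""
    d = len(mu)
    diff = f - mu
    _, logdet = np.linalg.slogdet(sigma)
    mahalanobis = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + mahalanobis)

def best_fit(f_u, pool):
    """Return the 0-based index of the pool parameter (mu_k, Sigma_k) that best fits f_u."""
    scores = [gaussian_log_likelihood(f_u, mu, sigma) for mu, sigma in pool]
    return int(np.argmax(scores))

pool = [
    (np.array([0.0, 0.0]), np.eye(2)),  # theta_1^c
    (np.array([5.0, 0.0]), np.eye(2)),  # theta_2^c
    (np.array([0.0, 5.0]), np.eye(2)),  # theta_3^c
]
f_u = np.array([0.2, 4.8])
print(best_fit(f_u, pool))  # 2, i.e. theta_3^c
```

If the user wrote several images of $c$, one option under the same assumption is to sum their log-likelihoods before taking the argmax.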
For a non-seed character $c'$, the training dataset contains no image, so $f_u^{c'}$ cannot be extracted and the best-fit parameter $\hat{\theta}^{c'}$ cannot be selected in the same way.
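Writer-character matrix: the entry $\phi_{jm}$ is the index of the best-fit parameter of writer $w_j$ for the $m$-th character $c_m$. For the target user $u$, $c_1, c_2, \ldots, c_M$ are seed characters, while $c_m$ is a non-seed character.
$$
\begin{array}{c|cccccc}
 & c_1 & c_2 & \cdots & c_m & \cdots & c_M \\ \hline
w_1 & \phi_{11} & \phi_{12} & \cdots & \phi_{1m} & \cdots & \phi_{1M} \\
w_2 & \phi_{21} & \phi_{22} & \cdots & \phi_{2m} & \cdots & \phi_{2M} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
w_J & \phi_{J1} & \phi_{J2} & \cdots & \phi_{Jm} & \cdots & \phi_{JM} \\
u & \phi_{u,1} & \phi_{u,2} & \cdots & ? & \cdots & \phi_{u,M}
\end{array}
$$
$\phi_{u,1}, \phi_{u,2}, \phi_{u,M}$: known (estimated by the Best-Fit strategy). $\phi_{u,m}$: unknown; we try to estimate it by collaborative filtering.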
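Hypothesis: if the feature distributions of two writers are similar to each other for some characters, their distributions for another character also tend to be similar. Example with four characters, where $c_3$ is a non-seed character of $u$:
$$
\begin{array}{c|cccc}
 & c_1 & c_2 & c_3 & c_4 \\ \hline
w_1 & \phi_{11} & \phi_{12} & \phi_{13} & \phi_{14} \\
w_2 & \phi_{21} & \phi_{22} & \phi_{23} & \phi_{24} \\
w_3 & \phi_{31} & \phi_{32} & \phi_{33} & \phi_{34} \\
u & \phi_{u,1} & \phi_{u,2} & ? & \phi_{u,4}
\end{array}
$$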
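User-based collaborative filtering (UserCF):
1. Compute the similarity $\mathrm{sim}(u, w_j)$ between the target user $u$ and each writer $w_j$ ($w_1, w_2, w_3$ in the example) based on the feature vectors of all the seed characters.
2. Choose the top-$N$ similar writers, and regard the remaining writers as "other writers".
3. Majority voting: for each similar writer $w_j$, vote the similarity score $\mathrm{sim}(u, w_j)$ for the $\phi_{j3}$-th parameter. The parameter with the largest total vote is adopted for the non-seed character $c_3$.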
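A sketch of UserCF with similarity-weighted majority voting. The slides do not specify the similarity measure; this sketch assumes $\mathrm{sim}(u, w_j) = 1 / (1 + \lVert f_u - f_{w_j} \rVert)$ on the concatenated seed-character features, which keeps every vote positive. `user_cf_vote` is a hypothetical name, and `phi_m` holds the column $\phi_{1m}, \ldots, \phi_{Jm}$ of the writer-character matrix.

```python
import numpy as np

def user_cf_vote(f_u, f_writers, phi_m, top_n):
    """Estimate phi_{u,m} for a non-seed character c_m.

    f_u:       (D,) concatenated seed-character features of the target user u
    f_writers: (J, D) the same features for the J side-dataset writers
    phi_m:     (J,) best-fit parameter index of each writer for c_m
    top_n:     number N of similar writers allowed to vote
    """
    # Assumed similarity: 1 / (1 + Euclidean distance), always positive.
    sims = 1.0 / (1.0 + np.linalg.norm(f_writers - f_u, axis=1))
    similar = np.argsort(sims)[::-1][:top_n]  # top-N similar writers
    votes = {}
    for j in similar:
        k = int(phi_m[j])
        votes[k] = votes.get(k, 0.0) + sims[j]  # vote sim(u, w_j) for the phi_{jm}-th parameter
    return max(votes, key=votes.get)

f_u = np.array([0.0, 0.0, 1.0, 1.0])
f_writers = np.array([
    [0.0, 0.0, 1.0, 1.1],
    [0.1, 0.0, 1.0, 0.9],
    [5.0, 5.0, -1.0, -1.0],
    [5.0, 4.0, -1.0, 0.0],
])
phi_m = np.array([2, 2, 0, 0])
print(user_cf_vote(f_u, f_writers, phi_m, top_n=2))  # 2
```

In this sketch the "other writers" simply do not vote; weighting votes by similarity lets a very close writer outweigh several marginally similar ones.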
Experiments (ETL4 and ETL5 datasets):
BestFit can generate HCCs quite similar to the original. UserCF and HybrCF can also generate good HCCs, whereas the performance of ItemCF is almost the same as that of Random.
… becomes statistically unreliable with large K.
[Figure: average distance between the feature of the original image and that of the generated HCCs, plotted against K, the number of clusters (size of the parameter pool)]
[Figure: average distance between the feature of the original image and that of the generated HCCs vs. K (number of clusters)] A similar result was obtained.
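The evaluation metric can be sketched as below, assuming Euclidean distance in feature space (the slides only say "distance"); `average_feature_distance` is a hypothetical name. Smaller values mean the generated HCCs lie closer to the original image in feature space.

```python
import numpy as np

def average_feature_distance(f_original, f_generated):
    """Average distance between the original image's feature and each generated HCC's feature."""
    f_generated = np.atleast_2d(f_generated)
    return float(np.mean(np.linalg.norm(f_generated - f_original, axis=1)))

print(average_feature_distance(np.array([0.0, 0.0]), np.array([[3.0, 4.0], [0.0, 0.0]])))  # 2.5
```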
HCCs generated by BestFit are quite similar to the original, and UserCF is more suitable for the HCC generation task.
Examples (Original / BestFit / UserCF): HCCs generated by BestFit differ slightly from each other while keeping a shape similar to the original. This is also the case with UserCF; that is, within-person variety is reproduced.
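Conclusion: HCCs are generated from a small number of images written by the target user, i.e., an incomplete seed character set. The feature distribution of the target user's handwriting is estimated for each character, so HCCs can be generated with a certain level of within-person variety.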