

SLIDE 1

Deep Transfer Mapping for Unsupervised Writer Adaptation

Hong-Ming Yang1,2, Xu-Yao Zhang1,2, Fei Yin1,2, Jun Sun4, Cheng-Lin Liu1,2,3

1 NLPR, Institute of Automation, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
3 CAS Center for Excellence in Brain Science and Intelligence Technology
4 Fujitsu Research & Development Center

Aug. 8, 2018

SLIDE 2

Outline

• Introduction
• Style Transfer Mapping
• Motivation and the Proposed Method
• Experiments and Analysis
• Conclusions


SLIDE 3

Introduction


A main challenge for handwriting recognition: the large variability of distributions between the training data and different test data.

– Different writing styles of different writers
– Different writing tools (e.g., different pens or electronic writing devices)
– Different writing environments (e.g., normal or emergency situations)
– ...

[Figure: written characters of two writers.]

SLIDE 4

Introduction

  • Domain adaptation: a form of transfer learning

Adapt the base classifier to each domain in the test dataset.

  • Recent methods: mainly based on deep learning

– Fine-tuning with target-domain data
– Learning domain-invariant representations (features)
– Projecting source or target domain data to align the distributions

[Figure: training datasets → training methods → base classifier, which is then adapted to each writer (writer 1, writer 2, …, writer N).]

SLIDE 5

Style Transfer Mapping

[Xu-Yao Zhang et al., Writer Adaptation with Style Transfer Mapping, TPAMI 2013]


Main idea: project the target-domain (test) data so that its distribution matches the source-domain (training) distribution.

Style transfer mapping (STM):

$p(x_s) \neq p(x_t)$,  $\tilde{x}_t = A_t x_t + b_t$,  $p(x_s) \approx p(\tilde{x}_t)$

Learn the classifier on $x_s$, and feed $\tilde{x}_t$ to the base classifier.

Learning of the projection ($A_t$ and $b_t$):

$\min_{A \in \mathbb{R}^{d \times d},\, b \in \mathbb{R}^{d}} \sum_{i=1}^{n} f_i \| A s_i + b - t_i \|_2^2 + \beta \| A - I \|_F^2 + \gamma \| b \|_2^2$

Source points $s_i$: features of the target-domain data, i.e., $x_t$.
Target points $t_i$: class prototype (LVQ) or class mean (MQDF) of class $y_i$ ($y_i$ is the label of sample $s_i$).

SLIDE 6

Style Transfer Mapping

[Xu-Yao Zhang et al., Writer Adaptation with Style Transfer Mapping, TPAMI 2013]


Solution: a convex quadratic programming problem with a closed-form solution:

$A = Q P^{-1}$,  $b = \frac{1}{\hat{f}} (\hat{t} - A \hat{s})$

where

$Q = \sum_{i=1}^{n} f_i t_i s_i^\top - \frac{1}{\hat{f}} \hat{t} \hat{s}^\top + \beta I$,  $P = \sum_{i=1}^{n} f_i s_i s_i^\top - \frac{1}{\hat{f}} \hat{s} \hat{s}^\top + \beta I$

$\hat{s} = \sum_{i=1}^{n} f_i s_i$,  $\hat{t} = \sum_{i=1}^{n} f_i t_i$,  $\hat{f} = \sum_{i=1}^{n} f_i + \gamma$
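A minimal NumPy sketch of this closed-form solution, assuming the source points S and target points T are stacked as (n, d) arrays with confidences f of shape (n,); the function name and API are illustrative, not taken from the authors' code:

```python
import numpy as np

def stm_fit(S, T, f, beta=1.0, gamma=1.0):
    """Closed-form STM: minimize sum_i f_i ||A s_i + b - t_i||^2
    + beta ||A - I||_F^2 + gamma ||b||^2 over A and b."""
    n, d = S.shape
    f_hat = f.sum() + gamma                       # weighted count plus bias regularizer
    s_hat = (f[:, None] * S).sum(axis=0)          # confidence-weighted sums
    t_hat = (f[:, None] * T).sum(axis=0)
    Q = (f[:, None] * T).T @ S - np.outer(t_hat, s_hat) / f_hat + beta * np.eye(d)
    P = (f[:, None] * S).T @ S - np.outer(s_hat, s_hat) / f_hat + beta * np.eye(d)
    A = Q @ np.linalg.inv(P)                      # A = Q P^{-1}
    b = (t_hat - A @ s_hat) / f_hat
    return A, b
```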

Extend to convolutional neural networks (CNNs)

Main idea: perform adaptation on the deep features

$f(x)$: CNN feature extractor; $\tilde{x}_s = f(x_s)$, $\tilde{x}_t = f(x_t)$

Dealing with unsupervised adaptation

– Use the pseudo labels predicted by the base classifier
– Iterative method: base classifier → pseudo labels → adaptation → better pseudo labels → adaptation → …

[Xu-Yao Zhang et al., Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark, PR 2017]
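A sketch of this pseudo-label iteration, reusing stm_fit from above; features(), classifier.predict(), and the prototypes array are assumed helpers, not the authors' API:

```python
def adapt_writer(classifier, X_writer, prototypes, n_iters=3):
    """Unsupervised STM adaptation for one writer via pseudo-label iteration."""
    S0 = features(X_writer)                    # deep features of the writer's samples
    S = S0
    for _ in range(n_iters):
        labels, conf = classifier.predict(S)   # pseudo labels + confidences f_i
        T = prototypes[labels]                 # class prototypes as target points
        A, b = stm_fit(S0, T, conf)            # re-estimate mapping on raw features
        S = S0 @ A.T + b                       # better alignment -> better labels
    return A, b
```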

SLIDE 7

Motivations & Methods

Limitations of traditional adaptation methods with CNNs:
– Consider only the fully connected layers
– Perform adaptation only on one layer

Proposed:
– Adaptation on both fully connected layers and convolutional layers
– Adaptation on multiple (or all) layers of the base CNN

Adaptation method for fully connected layers:

STM on the deep features of the layer (unsupervised adaptation)

Adaptation methods for convolutional layers:

– Use a linear transformation to project the target-domain data, aligning the data distributions
– Propose four variants of the linear transformation, based on different assumptions about the spatial relations in the feature maps

SLIDE 8

Motivations & Methods


– Output of a convolutional layer for an input $x_i$:

$o_i = \{ d_{cjk} \}_{c=1, j=1, k=1}^{c=C,\, j=H,\, k=W}$

$c, j, k$: indices of the feature maps, the rows, and the columns of each feature map

  • Fully associate adaptation (FAA):

– Assumption: all positions $(c, j, k)$ are related to each other
– Method: expand $o_i$ into a long vector $v_i$ of dimension $CHW$, and learn a transformation $A \in \mathbb{R}^{CHW \times CHW}$, $b \in \mathbb{R}^{CHW}$ by STM
– $v_i' = A v_i + b$, $[v_i']_j = \sum_{k=1}^{CHW} A_{jk} [v_i]_k + b_j$: each position $j$ of $v_i'$ is related to all positions of $v_i$
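A sketch of FAA under these assumptions, reusing stm_fit; the array names are illustrative. Note the cost: $A$ has $(CHW)^2$ entries, so this is only practical for small convolutional outputs:

```python
def faa_fit(O_src, O_tgt, f, beta=1.0, gamma=1.0):
    """O_src, O_tgt: (n, C, H, W) conv outputs (source / target points)."""
    n = O_src.shape[0]
    S = O_src.reshape(n, -1)               # flatten each output to a CHW vector
    T = O_tgt.reshape(n, -1)
    return stm_fit(S, T, f, beta, gamma)   # one global (A, b) over all positions
```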

SLIDE 9

Motivations & Methods


  • Partly associate adaptation (PAA):

– Assumption: positions within the same feature map are related to each other, but the feature maps are mutually independent
– Method: expand each feature map into a vector of dimension $HW$, and learn a transformation $A_c \in \mathbb{R}^{HW \times HW}$, $b_c \in \mathbb{R}^{HW}$ for each feature map $c$ separately by STM
– The transformation $(A_c, b_c)$ captures the relations between positions within a feature map; learning each $(A_c, b_c)$ separately preserves the independence between feature maps

Separate STM for each feature map.
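A per-channel sketch of PAA under the same assumptions, again reusing stm_fit; names are illustrative:

```python
def paa_fit(O_src, O_tgt, f, beta=1.0, gamma=1.0):
    """Fit one HW-dimensional STM per feature map; returns a list of (A_c, b_c)."""
    n, C, H, W = O_src.shape
    maps = []
    for c in range(C):                       # feature maps treated as independent
        S = O_src[:, c].reshape(n, -1)       # (n, HW) for this channel
        T = O_tgt[:, c].reshape(n, -1)
        maps.append(stm_fit(S, T, f, beta, gamma))
    return maps
```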

SLIDE 10

Motivations & Methods


  • Weakly independent adaptation (WIA):

– Assumption: all positions $(c, j, k)$ of $o_i$ are independent of each other
– Learn a transformation $a, b \in \mathbb{R}$ for each position $(c_0, j_0, k_0)$ separately by STM
– $o_i'(c_0, j_0, k_0) = a\, o_i(c_0, j_0, k_0) + b$
– $\min_{a, b \in \mathbb{R}} \sum_{i=1}^{N_t} f_i \left( a\, o_i(c_0, j_0, k_0) + b - t_i(c_0, j_0, k_0) \right)^2 + \beta (a - 1)^2 + \gamma b^2$
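Each per-position problem is the $d = 1$ case of the STM closed form, so all $C \times H \times W$ scalar pairs can be solved at once. A vectorized sketch, with illustrative names:

```python
def wia_fit(O_src, O_tgt, f, beta=1.0, gamma=1.0):
    """One scalar (a, b) per position; returns arrays a, b of shape (C, H, W)."""
    w = f[:, None, None, None]                 # broadcast confidences over positions
    f_hat = f.sum() + gamma
    s_hat = (w * O_src).sum(axis=0)            # weighted sums, per position
    t_hat = (w * O_tgt).sum(axis=0)
    Q = (w * O_tgt * O_src).sum(axis=0) - t_hat * s_hat / f_hat + beta
    P = (w * O_src ** 2).sum(axis=0) - s_hat ** 2 / f_hat + beta
    a = Q / P                                  # scalar analogue of A = Q P^{-1}
    b = (t_hat - a * s_hat) / f_hat
    return a, b
```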

SLIDE 11

Motivations & Methods


  • Strong independent adaptation (SIA):

– Assumption: all positions are independent of each other, and positions within the same feature map share the same linear transformation
– Learn a transformation $a, b \in \mathbb{R}$ for each feature map separately by STM
– $\min_{a, b \in \mathbb{R}} \sum_{i=1}^{N_t} \sum_{j=1}^{H} \sum_{k=1}^{W} f_i \left( a\, o_i(c_0, j, k) + b - t_i(c_0, j, k) \right)^2 + \beta (a - 1)^2 + \gamma b^2$
– Similar to the linear (affine) transformation in a batch normalization (BN) layer

SLIDE 12

Motivations & Methods


  • Analysis and comparison

– Complexity & flexibility: FAA > PAA > WIA > SIA
– Computation & memory efficiency: SIA > WIA > PAA > FAA

Method | Assumption                                    | Feature dimension | Matrix size | # Transformations | Total parameters
FAA    | All positions related                         | CHW               | CHW × CHW   | 1                 | CHW(CHW + 1)
PAA    | Positions related within a feature map        | HW                | HW × HW     | C                 | C·HW(HW + 1)
WIA    | All positions independent                     | 1                 | 1 × 1       | CHW               | 2CHW
SIA    | All positions independent & parameters shared | 1                 | 1 × 1       | C                 | 2C

SLIDE 13

Motivations & Methods


Deep transfer mapping (DTM): perform adaptation on multiple layers in a deep manner. The procedure is sketched after this list.

  • Algorithm
  • 1. Select a group of layers M on which to perform adaptation.
  • 2. From the bottom to the top layer in M, perform adaptation on each layer with the proposed adaptation methods, keeping the other layers unchanged.
  • 3. After adapting each layer, insert a linear layer after it and set the weights and bias of that linear layer to the solved A and b.

  • Advantages of DTM

– More powerful and flexible in aligning the distributions between the domains
– Captures more comprehensive information and minimizes the distribution discrepancy at different abstraction levels
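A high-level sketch of this layer-by-layer procedure; model.predict, run_up_to, targets_for, fit_adaptation, and insert_linear_after are assumed helpers standing in for the per-layer details above, not the authors' actual API:

```python
def deep_transfer_mapping(model, X_writer, layer_ids):
    """Adapt the selected layers bottom-up, freezing each solved mapping."""
    for lid in sorted(layer_ids):                 # bottom to top layers in M
        S = run_up_to(model, X_writer, lid)       # features at this layer
        labels, conf = model.predict(X_writer)    # pseudo labels + confidences
        T = targets_for(model, lid, labels)       # per-layer target points
        A, b = fit_adaptation(S, T, conf, lid)    # STM / FAA / PAA / WIA / SIA
        model.insert_linear_after(lid, A, b)      # keep other layers unchanged
    return model
```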

SLIDE 14

Experiments & Analysis

Datasets


Dataset        | Dataset info                         | #Samples (3755 classes) | #Writers (domains)
Training set   | CASIA OLHWDB 1.0-1.2                 | 2,697,673               | 1,020
Test set       | Online ICDAR2013 competition dataset | 224,590                 | 60
Adaptation set | Unlabeled samples from each domain (writer) in the test set

Online handwritten Chinese characters. The samples of each writer are stored in a single file and can be viewed as one domain.

SLIDE 15

Experiments & Analysis


Different adaptation methods for convolutional layers

– Base classifier: 11 layers, 97.55% accuracy on the test set
– The four adaptation methods are compared on the same convolutional layer (#8)

SLIDE 16

Experiments & Analysis

Adaptation property of different layers in CNN


– From bottom to top layers, the adaptation performance increases
– Bottom layers extract general features that are applicable across different domains, so the improvement after adaptation is small
– Top layers produce abstract features that are more domain-specific, so adaptation is more helpful for these layers

Adaptation method: WIA

SLIDE 17

Experiments & Analysis

Deep transfer mapping


– DTM can further boost the performance of the base classifier
– DTM still has limitations: the improvement is not obvious when too many layers are adapted
SLIDE 18

Conclusions

  • Unsupervised domain adaptation to alleviate writing-style variation, assuming each writer has a consistent style
  • Four variants of adaptation methods for convolutional layers, based on different assumptions about the spatial relations in the convolutional outputs
  • Deep transfer mapping (DTM) to perform adaptation on multiple (or all) layers of a CNN, to better align the data distributions of different styles

  • Remaining problems

– What is the best way to adapt deep neural networks?
– How to adapt with few samples in adaptation/testing (currently 3,755 samples per writer)?
– Theoretical modeling of within-writer and between-writer style variation
– Continuous adaptation of the classifier

SLIDE 19

Thanks for your attention!