Dilated Convolutional Network with Iterative Optimization for - PowerPoint PPT Presentation

Dilated Convolutional Network with Iterative Optimization for Continuous Sign Language Recognition
Junfu Pu, Wengang Zhou, Houqiang Li
CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, EEIS Department, University of Science and Technology of China
pjh@mail.ustc.edu.cn, zhwg@ustc.edu.cn, lihq@ustc.edu.cn
July 2018

Outline
• Background
• Contribution
• Proposed Architecture
• Iterative Optimization
• Experimental Results
• Conclusions

Background
• What is Sign Language?
  ◼ A language used primarily by deaf people to communicate
  ◼ Conveys meaning through several channels, such as the hands and the face
• Why Sign Language?
  ◼ More than 20 million people with hearing impairment
  ◼ Recognition algorithms can be applied to human-machine interaction
  ◼ Social impact: AI techniques improve the quality of life of people with disabilities

Background: the problem in the real world
[Figure: hearing and language impairment cause communication difficulty, which calls for translation. Research topic: a recognition (translation) system that takes sign video as input and produces text results.]

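Background
• Problem Formulation
  ➢ Isolated SLR: $\hat{c} = \arg\max_{i} p(c_i \mid \mathbf{V})$, $i = 1, 2, \dots, K$
  ➢ Continuous SLR: $\hat{\mathbf{s}} = \arg\max_{\mathbf{s} \in \mathbf{s}^*} p(\mathbf{s} \mid \mathbf{V})$, where $\mathbf{s} = \{ s_i \in \mathcal{V} \mid i = 1, 2, \dots, K \}$
[Figure: an input sign video with example outputs: the single word "Democracy", and the gloss sequence MOEGLICH HEUTE NACHT FROST GLATT VORSICHT FLUSS MOEGLICH PLUS ACHT.]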
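The two decision rules on this slide can be illustrated with a toy sketch. Isolated SLR picks the single most probable class for a video, while continuous SLR picks the most probable sentence from a set of candidates. All scores below are made-up values, and the function names are hypothetical; in the real system the probabilities would come from the trained networks.

```python
# Toy illustration of the two decision rules; every score here is invented.

def recognize_isolated(class_probs):
    """Return the class c_i with the highest posterior p(c_i | V)."""
    return max(class_probs, key=class_probs.get)


def recognize_continuous(candidates, score):
    """Return the candidate sentence s with the highest score p(s | V)."""
    return max(candidates, key=score)


# Hypothetical posteriors for one isolated sign video.
isolated_probs = {"FROST": 0.1, "NACHT": 0.7, "FLUSS": 0.2}

# Hypothetical sentence-level scores for one continuous sign video.
sentence_scores = {
    ("HEUTE", "NACHT"): 0.6,
    ("HEUTE", "FROST"): 0.3,
    ("NACHT",): 0.1,
}

best_word = recognize_isolated(isolated_probs)
best_sentence = recognize_continuous(sentence_scores.keys(), sentence_scores.get)
```

Enumerating candidate sentences is only feasible for toy inputs; the deck trains the sequence model with CTC instead of scoring sentences one by one.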

Contribution
• Develop a new framework based on a 3D residual network and dilated convolutions for continuous sign language recognition
• Propose an iterative optimization strategy with Connectionist Temporal Classification (CTC) for our sign language recognition system
• Outperform state-of-the-art methods on the RWTH-PHOENIX-Weather dataset

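Proposed Architecture
• Overall Framework
  ➢ Visual Feature Extractor: 3D-ResNet
    $\mathbf{X} = \{x_t\}_{t=1}^{T}$, $\mathbf{V} = \{v_t\}_{t=1}^{N}$, $\mathbf{F} = \{\Phi_{\Theta}(v_t)\}_{t=1}^{N}$
  ➢ Sequence Learning Model: Dilated Conv. Net with CTC
    $z = \tanh(\mathcal{C}_d(g_t^{(i-1)})) \odot \sigma(\mathcal{C}_d(g_t^{(i-1)}))$
    $o_t^{(i)} = \tanh(\mathcal{C}_{1 \times 1}(z))$
    $g_t^{(i)} = g_t^{(i-1)} + o_t^{(i)}$
    $o_t = \sum_{\text{all-blocks}} \sum_{i} o_t^{(i)}$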

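Proposed Architecture
• 3D ResNet
  $\mathbf{X} = \{x_t\}_{t=1}^{T}$, $\mathbf{V} = \{v_t\}_{t=1}^{N}$, $\mathbf{F} = \{\Phi_{\Theta}(v_t)\}_{t=1}^{N}$
• Dilated Cell
  $z = \tanh(\mathcal{C}_d(g_t^{(i-1)})) \odot \sigma(\mathcal{C}_d(g_t^{(i-1)}))$
  $o_t^{(i)} = \tanh(\mathcal{C}_{1 \times 1}(z))$
  $g_t^{(i)} = g_t^{(i-1)} + o_t^{(i)}$
  $o_t = \sum_{\text{all-blocks}} \sum_{i} o_t^{(i)}$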
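A minimal pure-Python sketch of the dilated cell may help make the gating and the residual and skip paths concrete. It is not the authors' implementation, and several details are assumptions made for illustration: a single feature channel, kernel size 3, zero padding, separate weights for the tanh and sigmoid branches, a scalar standing in for the 1x1 convolution, and two blocks in the usage example.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D dilated convolution (cross-correlation) with zero padding."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):  # taps outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def dilated_cell(g_prev, w_filter, w_gate, w_out, dilation):
    """One gated cell: z = tanh(conv) * sigmoid(conv), o = tanh(1x1(z)), g = g_prev + o."""
    f = dilated_conv1d(g_prev, w_filter, dilation)
    s = dilated_conv1d(g_prev, w_gate, dilation)
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, s)]
    o = [math.tanh(w_out * v) for v in z]    # 1x1 conv on one channel is a scalar multiply
    g = [p + q for p, q in zip(g_prev, o)]   # residual connection
    return g, o


def dilated_network(x, params, dilations=(1, 2, 4, 8, 16)):
    """Stack blocks of cells and sum every cell output o (skip connections).

    params[b][i] holds (w_filter, w_gate, w_out) for cell i of block b.
    """
    g = list(x)
    total = [0.0] * len(x)
    for block in params:
        for (w_filter, w_gate, w_out), d in zip(block, dilations):
            g, o = dilated_cell(g, w_filter, w_gate, w_out, d)
            total = [a + b for a, b in zip(total, o)]
    return total


# Usage with hypothetical weights: 2 blocks of 5 cells each.
cell = ([0.5, 1.0, 0.5], [0.2, 0.3, 0.2], 0.8)
params = [[cell] * 5 for _ in range(2)]
features = [0.1 * t for t in range(20)]
outputs = dilated_network(features, params)
```

Doubling the dilation from one cell to the next lets the temporal context grow quickly without pooling, so the output keeps one value per input clip.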

Iterative Optimization
  ➢ Step 1: Optimize the dilated convolutional network with the CTC loss and generate pseudo labels.
    $\mathcal{L}_{\mathrm{CTC}} = -\ln p(\mathbf{s} \mid \mathbf{V})$, $\ell_i = \arg\max_{j} P_{i,j}$
  ➢ Step 2: Fine-tune the 3D-ResNet with a category (classification) loss, using the pseudo labels.
  ➢ Step 3: Extract improved C3D features for sequence learning.
Alternately run Step 1 and Step 2 until convergence.
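The two quantities in Step 1 can be sketched in a few lines of plain Python. The first function computes the CTC negative log-likelihood with the standard forward algorithm, and the second derives per-clip pseudo labels by taking the most probable class at each step. Reserving index 0 for the blank symbol and passing probabilities as nested lists are assumptions made for this sketch, not details from the slides.

```python
import math

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def ctc_neg_log_likelihood(probs, target):
    """-ln p(target | probs) via the CTC forward algorithm.

    probs[t][k] is the probability of symbol k at time step t.
    target is a list of non-blank label indices.
    """
    # Extended target: blanks around and between the labels.
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            a = alpha[s]
            if s >= 1:
                a += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    total = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(total)  # raises ValueError if no valid alignment exists


def pseudo_labels(probs):
    """Pseudo label for each clip i: the index j with the largest probs[i][j]."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Toy posteriors for 4 clips over {blank, class 1, class 2}; values are invented.
toy = [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.2, 0.1, 0.7]]
labels = pseudo_labels(toy)
loss = ctc_neg_log_likelihood(toy, [1, 2])
```

In the iterative scheme, labels such as these would supervise the fine-tuning of the feature extractor in Step 2. A production system would normally compute the loss in log space to avoid underflow on long sequences.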

Experiments
• Dataset and Evaluation
  ◼ Continuous SLR Dataset: RWTH-PHOENIX-Weather
  ◼ Evaluation Metric: Word Error Rate (WER)
• 3D-ResNet Setups and Initialization
  ◼ Image crops: 224x224
  ◼ Sliding window: length 8, step 4 (50% overlap)
  ◼ Pre-trained on an isolated Chinese SLR dataset
  ◼ Batch size 5, learning rate 0.001, weight decay $5 \times 10^{-5}$
  ◼ Pooling-5b activations for clip representation
• Dilated Convolutional Network Setups
  ◼ Dilations for each layer: 1, 2, 4, 8, 16
  ◼ Size of blocks: 5
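Two of the settings above are easy to make concrete: how a sliding window of length 8 and step 4 splits a video into clips, and how WER is computed from an edit-distance alignment. The sketch below uses only the standard library. The function names are hypothetical, and the WER function assumes a non-empty reference.

```python
def clip_windows(num_frames, length=8, step=4):
    """(start, end) frame indices, end exclusive, of the sliding-window clips."""
    return [(s, s + length) for s in range(0, num_frames - length + 1, step)]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    n, m = len(reference), len(hypothesis)
    prev = list(range(m + 1))  # distance from an empty reference prefix
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[m] / n


# A 20-frame video yields four clips, each overlapping its neighbour by 4 frames.
windows = clip_windows(20)

# One deleted gloss out of three reference glosses.
wer = word_error_rate("HEUTE NACHT FROST".split(), "HEUTE FROST".split())
```

Frames left over after the last full window are dropped in this sketch; how the original system handles them is not stated on the slides.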

Experimental Results
• Iterative Results
• Comparison

Experimental Results
• An example of iterative optimization

Conclusions
• A novel framework with dilated convolutions for continuous sign language recognition.
• An iterative optimization strategy that trains the proposed architecture by generating pseudo labels.
• Performs well in both accuracy and speed.
