LipNet
End-to-End Sentence-level Lipreading
Yannis Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
NVIDIA GTC San Jose 2017
Outline
1. Introduction
2. Background
3. LipNet
4. Analysis
1. Introduction
How easy do you think
We can improve it…
https://goo.gl/hyFBVQ
Among others:
https://goo.gl/RTXh9Q
descent
[Figure: a deep network mapping an input through Layer 1, Layer 2, ..., Layer L to a predictive distribution]
deeplearning.net
marginalises over all alignments
p(am) = p(aam) + p(amm) + p(_am) + p(a_m) + p(am_)
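The sum above can be checked by brute force. The sketch below assumes the usual CTC collapsing rule (merge adjacent repeats, then drop the blank `_`), a two-letter alphabet, and three time steps; the function names and the uniform per-frame probabilities are illustrative and not taken from the slides.

```python
from itertools import product

BLANK = "_"

def collapse(alignment):
    """CTC collapse: merge adjacent repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in alignment:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

def alignments(target, alphabet, num_frames):
    """All length-num_frames strings over alphabet + blank that collapse to target."""
    symbols = alphabet + BLANK
    return ["".join(a) for a in product(symbols, repeat=num_frames)
            if collapse(a) == target]

def sequence_prob(target, frame_probs, alphabet):
    """p(target) as the sum over alignments of the product of per-frame probabilities."""
    total = 0.0
    for a in alignments(target, alphabet, len(frame_probs)):
        p = 1.0
        for t, sym in enumerate(a):
            p *= frame_probs[t][sym]
        total += p
    return total

if __name__ == "__main__":
    print(sorted(alignments("am", "am", 3)))
    # Hypothetical uniform distribution over {a, m, _} at each of 3 frames.
    uniform = [{"a": 1 / 3, "m": 1 / 3, "_": 1 / 3}] * 3
    print(sequence_prob("am", uniform, "am"))
```

Enumeration grows exponentially with the number of frames, so it is only useful as a sanity check; CTC implementations compute the same sum with a dynamic-programming forward pass.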
Hearing Impaired: 3 students from the Oxford Students’ Disability Community
Baseline-LSTM: replicates the previous state-of-the-art architecture of Wand et al. (2016)
Baseline-2D: spatial-only convolutions
Baseline-NoLM: language model disabled
Method              Unseen Speakers      Overlapped Speakers
                    CER       WER        CER       WER
Hearing Impaired    -         47.7%      -         -
Baseline-LSTM       38.4%     52.8%      15.2%     26.3%
Baseline-2D         16.2%     26.7%      4.3%      11.6%
Baseline-NoLM       6.7%      13.6%      2.0%      5.6%
LipNet              6.4%      11.4%      1.9%      4.8%
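For readers unfamiliar with the two metrics, the sketch below assumes the standard definitions: character error rate (CER) and word error rate (WER) are the edit distance between prediction and reference, counted over characters or words, divided by the reference length. The function names are illustrative and this is not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # delete r
                            curr[j - 1] + 1,          # insert h
                            prev[j - 1] + (r != h)))  # substitute or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

if __name__ == "__main__":
    # Made-up sentence pair, for illustration only.
    print(wer("place blue at f two now", "place blue at f two soon"))
    print(cer("place blue at f two now", "place blue at f two soon"))
```

Because a single wrong character makes the whole word count as an error, WER is typically higher than CER for the same predictions, as in the rows above.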
DGX-1