Visually grounded cross-lingual keyword spotting in speech
SLTU, August 2018
Herman Kamper1 and Michael Roth2
1E&E Engineering, Stellenbosch University, South Africa 2Saarland University, Germany
http://www.kamperh.com/
[Figure: model architecture. An image I is passed through VGG-16, which produces German (text) tags ŷ_de (e.g. Feld, Hunde, springt). English speech X is passed through a speech network made of convolutional, max-pooling and feedforward layers, giving f(X); output dimension w corresponds to keyword w. The loss ℓ compares f(X) with ŷ_de, and the trained speech network acts as a cross-lingual keyword spotter.]
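The figure names the loss ℓ but does not spell it out. A minimal sketch follows. It assumes ℓ is the summed per-keyword binary cross-entropy between the speech network's outputs f(X) and the soft visual tags ŷ_de, which is a common choice for multi-label tagging. It also assumes that a keyword is "spotted" when its output exceeds a threshold. The function names `keyword_loss` and `spot_keywords`, the 0.5 threshold and the three-word vocabulary are hypothetical and do not come from the slides.

```python
import math
from typing import List, Sequence


def keyword_loss(f_x: Sequence[float], y_de: Sequence[float], eps: float = 1e-7) -> float:
    """Summed binary cross-entropy between f(X) and soft visual tags y_de.

    Assumption: entry w of each vector is the probability of keyword w.
    """
    if len(f_x) != len(y_de):
        raise ValueError("f(X) and y_de must have one entry per keyword")
    loss = 0.0
    for p, y in zip(f_x, y_de):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss


def spot_keywords(f_x: Sequence[float], vocab: Sequence[str], threshold: float = 0.5) -> List[str]:
    """Return the keywords whose predicted probability exceeds the threshold."""
    return [w for w, p in zip(vocab, f_x) if p > threshold]


if __name__ == "__main__":
    vocab = ["Feld", "Hunde", "springt"]  # hypothetical German keyword vocabulary
    f_x = [0.2, 0.9, 0.7]                 # hypothetical outputs for one English utterance
    y_de = [0.1, 1.0, 0.8]                # hypothetical soft tags from the visual tagger
    print(spot_keywords(f_x, vocab))
    print(round(keyword_loss(f_x, y_de), 4))
```

Soft targets let the speech network learn from the tagger's uncertainty rather than from hard 0/1 labels. Thresholding each dimension independently allows several keywords to be detected in one utterance.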
[Figure: comparison of the DETextPrior, DEVisionCNN, XVisionSpeechCNN and XBoWCNN models]