Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark
CHiVE
Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Modelling intonation in prosody
A conditional variational autoencoder captures the different intonations
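As a rough illustration of why a conditional variational autoencoder can produce varied intonation, the sketch below draws two latent samples for the same linguistic input and decodes each one. It is a generic NumPy sketch and not the CHiVE architecture: the linear `decode` function, the latent size, and the feature and output dimensions are all assumptions made for the example.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterised draw z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over the latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def decode(z, linguistic_features, weights):
    """Hypothetical linear decoder conditioned on z and the linguistic features."""
    return np.concatenate([z, linguistic_features]) @ weights

rng = np.random.default_rng(0)
mu = np.zeros(4)                       # assumed 4-dimensional latent space
log_var = np.zeros(4)
features = np.ones(3)                  # stand-in for linguistic features
weights = rng.standard_normal((7, 2))  # stand-in for trained decoder weights

# Same linguistic input, two latent draws: two different outputs.
contour_a = decode(sample_latent(mu, log_var, rng), features, weights)
contour_b = decode(sample_latent(mu, log_var, rng), features, weights)
```

Because the conditioning features are held fixed, any difference between `contour_a` and `contour_b` comes from the latent sample alone, which is the mechanism that lets such a model vary its output for one and the same text.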
Language has a hierarchical linguistic structure
[Figure: the utterance "hello" between silences (sil), broken down by level]
Sentence
Words: sil hello sil
Syllables: sil h+e l+ou sil
Phonemes: sil h e l sil
Frames
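The sentence, word, syllable and phoneme levels can be sketched as a nested data structure. This is only an illustration with hypothetical class names, not the representation used by the authors. The frame level is left out, and the phonemes are derived by splitting the syllable labels from the slide on "+".

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Syllable:
    phonemes: List[str]

@dataclass
class Word:
    text: str
    syllables: List[Syllable]

@dataclass
class Sentence:
    words: List[Word]

    def phonemes(self) -> List[str]:
        """Flatten the hierarchy down to the phoneme level."""
        return [p for w in self.words for s in w.syllables for p in s.phonemes]

def word_from_syllables(text: str, syllable_labels: List[str]) -> Word:
    """Build a word from syllable labels such as "h+e" (phonemes joined by "+")."""
    return Word(text, [Syllable(label.split("+")) for label in syllable_labels])

# The "hello" example, with silence (sil) on either side.
sentence = Sentence([
    word_from_syllables("sil", ["sil"]),
    word_from_syllables("hello", ["h+e", "l+ou"]),
    word_from_syllables("sil", ["sil"]),
])
```

Each level holds a variable number of children, so the depth of the structure is fixed while its width changes from one sentence to the next.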
Add linguistic knowledge to the network
The structured model is better
Preference: CHiVE (46.1%), Baseline (30.7%), No preference (23.2%)