CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network (PowerPoint presentation)

SLIDE 1

Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark

CHiVE

Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

SLIDE 2

Modelling intonation in prosody

SLIDE 3

A conditional variational autoencoder (CVAE) captures the different intonations
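
The slide gives no equations, so the following is only a hedged sketch of the generic CVAE pieces, not CHiVE's code. It shows the reparameterization step and the closed-form KL term between a diagonal Gaussian posterior and a prior. Several parts are assumptions: the standard normal prior, the `beta` weight, and every function name. The conditioning on linguistic features, which enters through the encoder and decoder inputs, is left out. In a CVAE of this kind, sampling different latent vectors z for the same conditioning input is what produces varied outputs.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per latent dimension.

    Written this way, z stays differentiable with respect to mu and log_var
    when used inside an autodiff framework (the reparameterization trick).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(reconstruction_error: float, mu: List[float], log_var: List[float],
                  beta: float = 1.0) -> float:
    """Training loss: reconstruction term plus a KL term (beta weighting is hypothetical)."""
    return reconstruction_error + beta * kl_to_standard_normal(mu, log_var)


# Usage: the same posterior parameters yield a different z on each random draw.
rng = random.Random(0)
z1 = reparameterize([0.0, 1.0], [0.0, 0.0], rng)
z2 = reparameterize([0.0, 1.0], [0.0, 0.0], rng)
```

A posterior that exactly matches the prior (mu = 0, log_var = 0) has zero KL cost. Moving the mean away from zero is penalized quadratically.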

SLIDE 4

Language has a hierarchical linguistic structure

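Example hierarchy for the utterance "hello", with silences (sil) at the boundaries:

Sentence: hello
Words: sil | hello | sil
Syllables: sil | h+e | l+ou | sil
Phonemes: sil | h | e | l | ou | sil
Frames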
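
The hierarchy above can be written down as a nested data structure. The sketch below is a hedged illustration, and all class and function names are hypothetical. The frame counts per phoneme are made up, since the slide gives no durations. Boundary silences are treated as their own word, syllable and phoneme, following the figure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Phoneme:
    symbol: str
    num_frames: int  # hypothetical duration, in acoustic frames


@dataclass
class Syllable:
    phonemes: List[Phoneme]


@dataclass
class Word:
    text: str
    syllables: List[Syllable]


@dataclass
class Sentence:
    words: List[Word]

    def unit_counts(self) -> Dict[str, int]:
        """Number of units at each level of the hierarchy."""
        syllables = [s for w in self.words for s in w.syllables]
        phonemes = [p for s in syllables for p in s.phonemes]
        return {
            "words": len(self.words),
            "syllables": len(syllables),
            "phonemes": len(phonemes),
            "frames": sum(p.num_frames for p in phonemes),
        }


def silence(num_frames: int) -> Word:
    """A boundary silence: one word containing one syllable containing one 'sil' phoneme."""
    return Word("sil", [Syllable([Phoneme("sil", num_frames)])])


def build_hello() -> Sentence:
    """'hello' split as h+e | l+ou, with made-up frame counts."""
    hello = Word("hello", [
        Syllable([Phoneme("h", 5), Phoneme("e", 8)]),
        Syllable([Phoneme("l", 6), Phoneme("ou", 12)]),
    ])
    return Sentence([silence(10), hello, silence(10)])


counts = build_hello().unit_counts()
# counts == {"words": 3, "syllables": 4, "phonemes": 6, "frames": 51}
```

Each level has a different number of units for the same utterance, and those counts change from sentence to sentence. A model that follows this structure therefore cannot use a single fixed-length layout.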

SLIDE 5

Add linguistic knowledge to the network
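
The slide does not describe how the linguistic structure is wired into the network. The sketch below is a toy stand-in, not CHiVE's architecture, for one way to do it. A bottom-up encoder summarizes each syllable from its phonemes, each word from its syllables, and the sentence from its words. At every level it runs a small recurrent cell over that unit's children. Because the number of steps at each level comes from the utterance itself, the computation graph changes from sentence to sentence. Several parts are assumptions: the scalar features, the tanh cell, the shared weights in `LEVEL_WEIGHTS`, and the function names. A real model would learn vector-valued weights and might also attach linguistic features at each level.

```python
import math
from typing import List, Sequence

# Hypothetical per-level weights (w_h, w_x, b); a trained model would learn these.
LEVEL_WEIGHTS = {
    "syllable": (0.5, 1.0, 0.0),
    "word": (0.5, 1.0, 0.0),
    "sentence": (0.5, 1.0, 0.0),
}


def rnn_summary(inputs: Sequence[float], level: str) -> float:
    """Run a scalar tanh RNN over one unit's children and return its final state."""
    w_h, w_x, b = LEVEL_WEIGHTS[level]
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x + b)
    return h


def encode_sentence(sentence: List[List[List[float]]]) -> float:
    """Bottom-up encoding: phonemes -> syllables -> words -> sentence.

    `sentence` is a list of words; each word is a list of syllables; each
    syllable is a list of per-phoneme scalar features.
    """
    word_states = []
    for word in sentence:
        syllable_states = [rnn_summary(syllable, "syllable") for syllable in word]
        word_states.append(rnn_summary(syllable_states, "word"))
    return rnn_summary(word_states, "sentence")


# Same phoneme features, different structure: one two-phoneme syllable
# versus two one-phoneme syllables.
one_syllable = encode_sentence([[[1.0, 1.0]]])
two_syllables = encode_sentence([[[1.0], [1.0]]])
```

The two calls receive identical phoneme features but give different encodings. The grouping into syllables and words is itself information the network can use.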

SLIDE 6

The structured model is better

Preference results:
CHiVE: 46.1%
Baseline: 30.7%
No preference: 23.2%
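
The slide reports only percentages, with no listener or trial counts, so no significance test can be reproduced from it. The sketch below is plain arithmetic on the reported figures. It checks that the three shares sum to 100% and computes CHiVE's share among decisive judgments, leaving out "no preference". The function and variable names are hypothetical.

```python
def preference_among_decisive(a_pct: float, b_pct: float) -> float:
    """Percentage of decisive (non-'no preference') judgments that went to system A."""
    return 100.0 * a_pct / (a_pct + b_pct)


# Percentages as reported on the slide.
results = {"CHiVE": 46.1, "Baseline": 30.7, "No preference": 23.2}

total = sum(results.values())  # should be 100, up to floating-point rounding
share = preference_among_decisive(results["CHiVE"], results["Baseline"])  # about 60.0
```

Among listeners who expressed a preference, about 60% chose CHiVE over the baseline.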