CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network


Aug 21, 2022


SLIDE 1

Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark

CHiVE

Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

SLIDE 2

Modelling intonation in prosody

SLIDE 3

A conditional variational autoencoder captures the different intonations
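As a rough illustration of the idea on this slide, the sketch below shows how a conditional variational decoder can produce several different contours for the same linguistic input by sampling different latent vectors. It is not the CHiVE architecture: the linear decoder, the dimensions, and names such as `decode_contour` and `sample_contours` are hypothetical, chosen only to make the mechanism concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 4   # assumed size of the prosody latent vector
COND_DIM = 6     # assumed size of the linguistic feature vector
N_FRAMES = 10    # assumed number of output frames

# Hypothetical fixed decoder weights (a trained model would learn these).
W_z = rng.normal(size=(N_FRAMES, LATENT_DIM))
W_c = rng.normal(size=(N_FRAMES, COND_DIM))


def decode_contour(z, cond):
    """Map a latent vector plus linguistic conditioning to a toy contour."""
    return W_z @ z + W_c @ cond


def sample_contours(cond, n_samples):
    """Draw latents from a standard normal prior and decode each one."""
    zs = rng.normal(size=(n_samples, LATENT_DIM))
    return np.stack([decode_contour(z, cond) for z in zs])


cond = rng.normal(size=COND_DIM)      # the same text features for every sample
contours = sample_contours(cond, 3)   # three different renditions of that text
```

Because the conditioning vector is held fixed, any difference between the rows of `contours` comes from the sampled latent vector alone, which is the sense in which the latent space can capture different intonations of the same sentence.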

SLIDE 4

Language has a hierarchical linguistic structure

Sentence
Words: sil hello sil
Syllables: sil h+e l+ou sil
Phonemes: sil h e l sil
Frames
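The hierarchy on this slide can be written down as a nested data structure. The following is a minimal sketch, assuming the example word "hello" padded by silence ("sil") on both sides; the class names and the `level_sizes` helper are hypothetical, the second phoneme of the syllable "l+ou" is inferred from the syllable label, and the frame level is left out because the slide gives no frame counts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Syllable:
    phonemes: List[str]


@dataclass
class Word:
    text: str
    syllables: List[Syllable]


@dataclass
class Sentence:
    words: List[Word]


def level_sizes(sentence: Sentence) -> Dict[str, int]:
    """Count the units at each level of the hierarchy."""
    syllables = [s for w in sentence.words for s in w.syllables]
    phonemes = [p for s in syllables for p in s.phonemes]
    return {
        "words": len(sentence.words),
        "syllables": len(syllables),
        "phonemes": len(phonemes),
    }


# Silence is treated as a one-syllable, one-phoneme word (an assumption).
silence = Word("sil", [Syllable(["sil"])])
hello = Word("hello", [Syllable(["h", "e"]), Syllable(["l", "ou"])])
sentence = Sentence([silence, hello, silence])

sizes = level_sizes(sentence)
```

Each level contains one or more units of the level below, so a model that follows this structure can run at a different rate per level rather than treating the utterance as a flat sequence.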

SLIDE 5

Add linguistic knowledge to the network

SLIDE 6

The structured model is better

CHiVE (46.1%) Baseline (30.7%) No preference (23.2%)