Scott Stevenson
scott@faculty.ai
Deep learning for speech synthesis: The good news, the bad news, and the fake news
“Hot mic” incidents can bring huge negative publicity
Incidents can change voting intentions and the
This has been demonstrated on multiple occasions
Politicians are particularly at risk
realistic audio they can fabricate “hot mic” recordings
expertise
this imminently possible
we can inoculate people against it
text → linguistic representation → audio
L IH NG G W IH S T IH K . R EH P R AH Z EH N T EY SH AH N .
T R IH L Y AH N . D AA L ER Z .
The CMU Pronouncing Dictionary http://www.speech.cs.cmu.edu/cgi-bin/cmudict
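The phoneme strings above can be reproduced with a simple dictionary lookup. The sketch below is a minimal illustration, not the speaker's code: `CMUDICT_SUBSET` and `to_phonemes` are hypothetical names, the four entries are hand-copied from the CMU Pronouncing Dictionary with stress digits removed, and treating "." as the word separator is an assumption based on the strings shown in the slides.

```python
# Hand-copied subset of CMU Pronouncing Dictionary entries (stress digits removed).
# A real front end would load the full dictionary and handle unknown words.
CMUDICT_SUBSET = {
    "LINGUISTIC": ["L", "IH", "NG", "G", "W", "IH", "S", "T", "IH", "K"],
    "REPRESENTATION": ["R", "EH", "P", "R", "AH", "Z", "EH", "N", "T", "EY", "SH", "AH", "N"],
    "TRILLION": ["T", "R", "IH", "L", "Y", "AH", "N"],
    "DOLLARS": ["D", "AA", "L", "ER", "Z"],
}


def to_phonemes(text):
    """Convert text to a space-separated phoneme string, ending each word with '.'."""
    tokens = []
    for word in text.upper().split():
        word = word.strip(".,!?\"'")
        if word not in CMUDICT_SUBSET:
            raise KeyError(f"no pronunciation for {word!r} in this subset")
        tokens.extend(CMUDICT_SUBSET[word])
        tokens.append(".")  # word boundary marker, as in the slides
    return " ".join(tokens)


print(to_phonemes("trillion dollars"))
# T R IH L Y AH N . D AA L ER Z .
```

A plain lookup like this cannot choose between alternative pronunciations of the same spelling, so it only covers the unambiguous part of the front end.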
linguistic representation
concatenative systems
audio samples (“units”)
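As a rough illustration of how a concatenative back end joins audio units, the sketch below looks up one stored waveform per phoneme and joins them with a short linear crossfade. Everything here is an assumption made for illustration: `crossfade_concat`, `synthesise`, and the toy `unit_db` are hypothetical, and real systems also choose among many candidate units per phoneme, which this sketch skips.

```python
def crossfade_concat(units, overlap):
    """Join waveforms (lists of floats), linearly crossfading `overlap` samples at each join.

    Each unit must contain at least `overlap` samples.
    """
    out = list(units[0])
    for unit in units[1:]:
        start = len(out) - overlap
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight of the incoming unit rises across the join
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[overlap:])
    return out


def synthesise(phonemes, unit_db, overlap=2):
    """Look up one stored unit per phoneme and concatenate them."""
    return crossfade_concat([unit_db[p] for p in phonemes], overlap)


# Toy unit database: constant "waveforms" stand in for recorded audio samples.
unit_db = {"AA": [1.0] * 4, "L": [0.0] * 4}
audio = synthesise(["AA", "L"], unit_db, overlap=2)
print(len(audio))  # 6
```

The crossfade only smooths the join itself; it does nothing about mismatched pitch or timing between neighbouring units.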
○ “My latest project is to learn how to project my voice”: two pronunciations of “project”
○ Liaison in French: a final consonant is no longer silent if the following word begins with a vowel
information to synthesise speech
systems because of DSP artefacts
arXiv 1609.03499
sufficiently large receptive field for good prosody
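To make "receptive field" concrete, the sketch below computes how many past samples a stack of dilated causal convolutions can see. It assumes kernel size 2 and dilations that double from 1 to 512, as described in the WaveNet paper (arXiv 1609.03499). The function name, the three-block stack, and the 16 kHz sample rate are assumptions chosen for illustration.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input samples that influence one output sample of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


block = [2 ** i for i in range(10)]       # dilations 1, 2, 4, ..., 512
one_block = receptive_field(block)        # 1024 samples
three_blocks = receptive_field(block * 3) # 3070 samples

sample_rate = 16000  # assumed sample rate in Hz
print(one_block, three_blocks)
print(round(1000 * three_blocks / sample_rate, 1), "ms")
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, whereas undilated layers of the same kernel size would add only one sample each.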
Filter Gate
arXiv 1606.05328
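The "Filter" and "Gate" labels refer to the gated activation unit, in which a tanh "filter" branch is multiplied elementwise by a sigmoid "gate" branch. The sketch below is a minimal single-channel version in plain Python. The helper names and the scalar kernel-size-2 weights are assumptions for illustration, whereas real models use many channels and learned weights.

```python
import math


def causal_dilated_conv(x, w_prev, w_curr, dilation):
    """Kernel-size-2 causal convolution: y[t] = w_prev * x[t - dilation] + w_curr * x[t].

    Samples before the start of the signal are treated as zero.
    """
    return [
        w_prev * (x[t - dilation] if t >= dilation else 0.0) + w_curr * x[t]
        for t in range(len(x))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_activation(x, filter_w, gate_w, dilation):
    """z = tanh(filter conv of x) * sigmoid(gate conv of x), elementwise."""
    f = causal_dilated_conv(x, filter_w[0], filter_w[1], dilation)
    g = causal_dilated_conv(x, gate_w[0], gate_w[1], dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


z = gated_activation([1.0, 0.0, 0.0], filter_w=(0.5, 1.0), gate_w=(0.0, 0.0), dilation=1)
print(z)
```

With the gate weights set to zero the gate is a constant 0.5, so the output is simply half of the tanh filter response. With learned gate weights, the gate decides per time step how much of the filter signal passes through.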
arXiv 1609.03499
Via backpropagation, the Generator learns to produce better audio, while the Discriminator learns to better distinguish synthetic from real.
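The two competing objectives can be written down as losses on the Discriminator's outputs. The sketch below assumes the standard binary cross-entropy formulation with the non-saturating generator loss; the slides do not say which loss was used, so this is one common choice, not the speaker's. The function names and the example probabilities are made up for illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: low when real clips score near 1 and synthetic clips near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: low when the Discriminator scores synthetic clips near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical Discriminator outputs: probability that each clip is real.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident < confused)                            # True
print(generator_loss([0.9]) < generator_loss([0.1]))   # True
```

In training, the gradients of these losses are backpropagated through the Discriminator and, for the generator loss, onward into the Generator, in alternating update steps.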
We are hiring data scientists and machine learning engineers at all levels. If you’re interested in finding