FloWaveNet: A Generative Flow for Raw Audio

ICML 2019. Sungwon Kim1, Sang-gil Lee1, Jongyoon Song1, Jaehyeon Kim2, Sungroh Yoon1,3 (1Seoul National University, 2Kakao Corporation, 3ASRI, INMC, Institute of Engineering Research, Seoul National University)


SLIDE 1

FloWaveNet: A Generative Flow for Raw Audio

Sungwon Kim1, Sang-gil Lee1, Jongyoon Song1, Jaehyeon Kim2, Sungroh Yoon1,3

1Seoul National University, 2Kakao Corporation, 3ASRI, INMC, Institute of Engineering Research, Seoul National University

ICML 2019

Poster 6/12 6:30 PM @Pacific Ballroom #2

SLIDE 2

WaveNet

$$\log P_X(x_{1:T}) = \sum_{t=1}^{T} \log P_X(x_t \mid x_{<t})$$

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

SLIDE 3

WaveNet

$$\log P_X(x_{1:T}) = \sum_{t=1}^{T} \log P_X(x_t \mid x_{<t})$$

Sequential sampling

https://deepmind.com/blog/wavenet-generative-model-raw-audio/
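The autoregressive factorization on this slide means that sample t cannot be drawn before samples 1 to t-1 exist. The following is a minimal sketch of that constraint, not WaveNet itself: `toy_conditional_mean` is a hypothetical stand-in for the network, and the Gaussian conditional with fixed `sigma` is an assumption made only for illustration.

```python
import math
import random


def toy_conditional_mean(history):
    """Hypothetical stand-in for a WaveNet: mean of x_t given x_{<t}."""
    return 0.9 * history[-1] if history else 0.0


def sample_autoregressive(num_steps, sigma=0.1, seed=0):
    """Draw x_1..x_T one step at a time; step t needs all earlier samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_steps):
        mean = toy_conditional_mean(samples)
        samples.append(rng.gauss(mean, sigma))
    return samples


def log_likelihood(samples, sigma=0.1):
    """Sum of log P(x_t | x_{<t}) over t, matching the factorization."""
    total = 0.0
    for t, x_t in enumerate(samples):
        mean = toy_conditional_mean(samples[:t])
        total += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                  - (x_t - mean) ** 2 / (2 * sigma ** 2))
    return total


audio = sample_autoregressive(num_steps=16)
print(len(audio), log_likelihood(audio))
```

When the whole waveform is known, as in training, every term of `log_likelihood` could be evaluated independently. The loop in `sample_autoregressive` cannot be parallelized this way, which is the bottleneck the following slides address.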

SLIDE 4

Previous parallel speech synthesis models

$$D_{KL}\big(P_S(x) \,\|\, P_T(x)\big)$$

  • Pre-trained WaveNet (teacher, $P_T$)
  • Inverse Autoregressive Flows (IAFs) (student, $P_S$)
  • Probability Density Distillation

Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning. 2018.

SLIDE 5

Previous parallel speech synthesis models

$$D_{KL}\big(P_S(x) \,\|\, P_T(x)\big)$$

  • Pre-trained WaveNet (teacher, $P_T$)
  • Inverse Autoregressive Flows (IAFs) (student, $P_S$)
  • Probability Density Distillation
  • Parallel sampling

Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning. 2018.

SLIDE 6

Previous parallel speech synthesis models

$$D_{KL}\big(P_S(x) \,\|\, P_T(x)\big)$$

  • Pre-trained WaveNet (teacher, $P_T$)
  • Inverse Autoregressive Flows (IAFs) (student, $P_S$)
  • Probability Density Distillation + Power Loss, Perceptual Loss, Contrastive Loss, Frame Loss
  • Parallel sampling

Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning. 2018.
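The distillation objective on this slide is a KL divergence from the student distribution to the teacher distribution, estimated from samples drawn by the student. A minimal sketch of that estimate, assuming one-dimensional Gaussians for both distributions (the `student` and `teacher` parameters are illustrative values, not from the paper, and no network is involved):

```python
import math
import random


def gaussian_log_pdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def kl_monte_carlo(student, teacher, num_samples=20000, seed=0):
    """Estimate D_KL(P_S || P_T) = E_{x ~ P_S}[log P_S(x) - log P_T(x)]."""
    rng = random.Random(seed)
    s_mean, s_std = student
    t_mean, t_std = teacher
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(s_mean, s_std)  # samples come from the student
        total += gaussian_log_pdf(x, s_mean, s_std) - gaussian_log_pdf(x, t_mean, t_std)
    return total / num_samples


def kl_closed_form(student, teacher):
    """Exact KL divergence between two univariate Gaussians, for comparison."""
    s_mean, s_std = student
    t_mean, t_std = teacher
    return (math.log(t_std / s_std)
            + (s_std ** 2 + (s_mean - t_mean) ** 2) / (2 * t_std ** 2) - 0.5)


student = (0.2, 0.8)   # (mean, std), illustrative values
teacher = (0.0, 1.0)
kl_estimate = kl_monte_carlo(student, teacher)
kl_exact = kl_closed_form(student, teacher)
print(kl_estimate, kl_exact)
```

The estimate only needs samples from the student and density evaluations from both models, which is why this setup requires a well-trained teacher to be available first.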

SLIDE 7

Our Objectives

  • Simplify the training procedure for parallel sampling
  • Maintain the quality of speech samples

SLIDE 8

Our Objectives

  • Simplify the training procedure for parallel sampling
  • Maintain the quality of speech samples

Flow-based generative models for raw audio!

SLIDE 9

FloWaveNet

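$$\log P_X(x_{1:T}) = \log P_Z\big(f_\theta(x_{1:T})\big) + \log \det\left(\frac{\partial f_\theta(x)}{\partial x}\right)$$

Training phase: $f_\theta$ maps raw audio ($P_X$) to Gaussian noise ($P_Z$).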

SLIDE 10

FloWaveNet

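$$\log P_X(x_{1:T}) = \log P_Z\big(f_\theta(x_{1:T})\big) + \log \det\left(\frac{\partial f_\theta(x)}{\partial x}\right)$$

Training phase: $f_\theta$ maps raw audio ($P_X$) to Gaussian noise ($P_Z$).

Sampling phase: draw $z = z_{1:T} \sim P_Z(z) = \mathcal{N}(0, I)$, then compute $\hat{x} = f_\theta^{-1}(z)$, which maps Gaussian noise back to raw audio.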

SLIDE 11

FloWaveNet

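$$\log P_X(x_{1:T}) = \log P_Z\big(f_\theta(x_{1:T})\big) + \log \det\left(\frac{\partial f_\theta(x)}{\partial x}\right)$$

Training phase: $f_\theta$ maps raw audio ($P_X$) to Gaussian noise ($P_Z$).

Sampling phase: draw $z = z_{1:T} \sim P_Z(z) = \mathcal{N}(0, I)$, then compute $\hat{x} = f_\theta^{-1}(z)$, which maps Gaussian noise back to raw audio.

Both transformations, $f_\theta$ and $f_\theta^{-1}$, are designed to be computed efficiently

→ Efficient training & Parallel sampling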
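The change-of-variables formula on this slide can be illustrated with the simplest invertible map. This sketch assumes an elementwise affine transform with scalar `scale` and `shift` (hypothetical parameters standing in for a learned flow); it is not the FloWaveNet architecture, only an instance of the same likelihood and sampling recipe.

```python
import math
import random


def flow_forward(x, scale, shift):
    """f(x) = scale * x + shift, elementwise; returns z and log|det J|."""
    z = [scale * x_t + shift for x_t in x]
    log_det = len(x) * math.log(abs(scale))
    return z, log_det


def flow_inverse(z, scale, shift):
    """Inverse map, used at sampling time."""
    return [(z_t - shift) / scale for z_t in z]


def standard_normal_log_pdf(z):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * z_t ** 2 for z_t in z)


def log_likelihood(x, scale, shift):
    """Training objective: log P_Z(f(x)) + log|det(df/dx)|."""
    z, log_det = flow_forward(x, scale, shift)
    return standard_normal_log_pdf(z) + log_det


def sample(num_steps, scale, shift, seed=0):
    """Sampling: draw every z_t up front, then invert the flow once."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(num_steps)]
    return flow_inverse(z, scale, shift)


waveform = sample(num_steps=8, scale=2.0, shift=0.5)
print(log_likelihood(waveform, scale=2.0, shift=0.5))
```

Unlike the autoregressive loop, no output of `sample` depends on another output, so all time steps can be computed at once. Training needs only the exact log-likelihood, with no teacher model.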

SLIDE 12

FloWaveNet

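$$\log P_X(x_{1:T}) = \log P_Z\big(f(x_{1:T})\big) + \sum_{n} \log \det\left(\frac{\partial \big(f^{n}_{AC} \cdot f^{n}_{AN}\big)(x)}{\partial x}\right)$$

Diagram: the flow $f$ is a stack of flows; the $n$-th flow applies an activation normalization layer $f^{n}_{AN}$ followed by an affine coupling layer $f^{n}_{AC}$.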
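The sum of per-flow log-determinants on this slide can be sketched as follows. The layer forms are assumptions in the general Glow/RealNVP style: `toy_scale_and_shift` is a hypothetical conditioner standing in for a neural network, ActNorm uses a single scalar scale and bias, and the list reversal between flows is one simple way to swap the roles of the two halves. None of this reproduces the exact FloWaveNet layers.

```python
import math


def actnorm_forward(x, log_scale, bias):
    """Affine normalization with a scalar scale; log|det| = len(x) * log_scale."""
    return [v * math.exp(log_scale) + bias for v in x], len(x) * log_scale


def actnorm_inverse(y, log_scale, bias):
    return [(v - bias) * math.exp(-log_scale) for v in y]


def toy_scale_and_shift(x_a):
    """Hypothetical conditioner; a real model uses a neural network here."""
    return [math.tanh(v) for v in x_a], [0.5 * v for v in x_a]


def coupling_forward(x):
    """Affine coupling: first half passes through and parameterizes the second."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_scale_and_shift(x_a)
    y_b = [b * math.exp(ls) + sh for b, ls, sh in zip(x_b, log_s, t)]
    return x_a + y_b, sum(log_s)


def coupling_inverse(y):
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_scale_and_shift(y_a)  # y_a equals x_a, so this is recomputable
    return y_a + [(b - sh) * math.exp(-ls) for b, ls, sh in zip(y_b, log_s, t)]


def flow_stack_forward(x, params):
    """Apply ActNorm then coupling per flow, accumulating the log-determinants."""
    total_log_det = 0.0
    for log_scale, bias in params:
        x, ld_an = actnorm_forward(x, log_scale, bias)
        x, ld_ac = coupling_forward(x)
        x = x[::-1]  # permutation swaps the halves; contributes log|det| = 0
        total_log_det += ld_an + ld_ac
    return x, total_log_det


def flow_stack_inverse(z, params):
    for log_scale, bias in reversed(params):
        z = actnorm_inverse(coupling_inverse(z[::-1]), log_scale, bias)
    return z


params = [(0.1, 0.0), (-0.2, 0.3)]  # illustrative (log_scale, bias) per flow
z, log_det = flow_stack_forward([1.0, 2.0, -0.5, 0.25], params)
print(z, log_det)
```

Each layer has a triangular or diagonal Jacobian, so its log-determinant is a plain sum and the total is the sum over layers. The inverse reuses the same conditioner outputs, so it costs about the same as the forward pass.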

SLIDE 13

Mean Opinion Scores

FloWaveNet ≥ Gaussian IAF

SLIDE 14

Sampling speed

FloWaveNet ≅ Gaussian IAF ≅ Parallel WaveNet >> Autoregressive WaveNet

Thousands of times faster than Autoregressive WaveNet

SLIDE 15

Conclusion

  • FloWaveNet produces audio samples of quality comparable to previous distilled models.
  • FloWaveNet synthesizes audio samples in parallel
      – without a well pre-trained WaveNet (no distillation!)
      – without auxiliary loss terms

Demo page Code

Poster 6/12 6:30 PM @Pacific Ballroom #2
