
FloWaveNet: A Generative Flow for Raw Audio



  1. ICML 2019. FloWaveNet: A Generative Flow for Raw Audio. Sungwon Kim¹, Sang-gil Lee¹, Jongyoon Song¹, Jaehyeon Kim², Sungroh Yoon¹,³. ¹Seoul National University; ²Kakao Corporation; ³ASRI, INMC, Institute of Engineering Research, Seoul National University. Poster: 6/12, 6:30 PM, Pacific Ballroom #2.

  2. WaveNet: $\log P_X(x_{1:T}) = \sum_{t=1}^{T} \log P_X(x_t \mid x_{<t})$. Source: https://deepmind.com/blog/wavenet-generative-model-raw-audio/

  3. WaveNet (sequential sampling): $\log P_X(x_{1:T}) = \sum_{t=1}^{T} \log P_X(x_t \mid x_{<t})$. Source: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
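The factorization on the WaveNet slides can be illustrated with a minimal sketch. It assumes a toy Gaussian conditional whose mean depends only on the previous sample; `conditional_mean` is a hypothetical stand-in for the network and is not WaveNet itself. The point is only that the log-likelihood is a sum of per-step conditionals, and that sampling has to proceed one step at a time.

```python
import math
import random

def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def conditional_mean(history, coeff=0.9):
    """Hypothetical stand-in for the network: predicts x_t from x_{<t} (here only x_{t-1})."""
    return coeff * history[-1] if history else 0.0

def ar_log_likelihood(x, std=0.1):
    """log P(x_{1:T}) = sum over t of log P(x_t | x_{<t})."""
    return sum(gaussian_logpdf(x[t], conditional_mean(x[:t]), std) for t in range(len(x)))

def ar_sample(T, std=0.1, seed=0):
    """Sequential sampling: step t cannot start before steps 1..t-1 have finished."""
    rng = random.Random(seed)
    x = []
    for _ in range(T):
        x.append(rng.gauss(conditional_mean(x), std))
    return x

samples = ar_sample(T=5)
total_logp = ar_log_likelihood(samples)
```

All conditionals can be evaluated at once when the whole waveform is known, as in training, but the sampling loop is inherently serial.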

  4. Previous parallel speech synthesis models: Pre-trained WaveNet; Inverse Autoregressive Flows (IAFs); Probability Density Distillation, $D_{KL}(P_S \,\|\, P_T)$. Reference: Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning, 2018.

  5. Previous parallel speech synthesis models: Pre-trained WaveNet; Inverse Autoregressive Flows (IAFs) with parallel sampling; Probability Density Distillation, $D_{KL}(P_S \,\|\, P_T)$. Reference: Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning, 2018.

  6. Previous parallel speech synthesis models: Pre-trained WaveNet; Inverse Autoregressive Flows (IAFs) with parallel sampling; Probability Density Distillation, $D_{KL}(P_S \,\|\, P_T)$, plus auxiliary terms: Power Loss, Perceptual Loss, Contrastive Loss, Frame Loss. Reference: Oord, Aaron, et al. "Parallel WaveNet: Fast High-Fidelity Speech Synthesis." International Conference on Machine Learning, 2018.
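The distillation term $D_{KL}(P_S \,\|\, P_T)$ can be sketched numerically. This is a toy illustration under strong assumptions: two univariate Gaussians stand in for the student (IAF) and the teacher (pre-trained WaveNet), and the function names are hypothetical. It shows only the general idea of estimating the KL divergence from samples drawn from the student, checked against the Gaussian closed form.

```python
import math
import random

def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def kl_monte_carlo(student, teacher, n=20000, seed=0):
    """Estimate D_KL(P_S || P_T) as the mean of log P_S(x) - log P_T(x) over x drawn from P_S."""
    rng = random.Random(seed)
    mu_s, sd_s = student
    mu_t, sd_t = teacher
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_s, sd_s)  # samples come from the student only
        total += gaussian_logpdf(x, mu_s, sd_s) - gaussian_logpdf(x, mu_t, sd_t)
    return total / n

def kl_closed_form(student, teacher):
    """Exact KL divergence between two univariate Gaussians, for comparison."""
    mu_s, sd_s = student
    mu_t, sd_t = teacher
    return math.log(sd_t / sd_s) + (sd_s ** 2 + (mu_s - mu_t) ** 2) / (2.0 * sd_t ** 2) - 0.5

student = (0.5, 1.0)   # (mean, std), toy values
teacher = (0.0, 1.5)
kl_estimate = kl_monte_carlo(student, teacher)
kl_exact = kl_closed_form(student, teacher)
```

In this setup the student is trained to match the teacher, which is why the slides list a pre-trained WaveNet as a prerequisite.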

  7. Our Objectives: • Simplify the training procedure for parallel sampling. • Maintain the quality of speech samples.

  8. Our Objectives: • Simplify the training procedure for parallel sampling. • Maintain the quality of speech samples. Proposal: flow-based generative models for raw audio!

  9. FloWaveNet: raw audio $x$ is mapped by $f$ to Gaussian noise $z$. Training phase: $\log P_X(x_{1:T}) = \log P_Z(f(x_{1:T})) + \log \det\left(\frac{\partial f(x)}{\partial x}\right)$.

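  10. FloWaveNet: raw audio $x$ is mapped by $f$ to Gaussian noise $z$, and back by $f^{-1}$. Training phase: $\log P_X(x_{1:T}) = \log P_Z(f(x_{1:T})) + \log \det\left(\frac{\partial f(x)}{\partial x}\right)$. Sampling phase: $z = z_{1:T} \sim P_Z(z) = \mathcal{N}(0, I)$, $x = f^{-1}(z)$.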

  11. FloWaveNet: raw audio $x$ is mapped by $f$ to Gaussian noise $z$, and back by $f^{-1}$. Training phase: $\log P_X(x_{1:T}) = \log P_Z(f(x_{1:T})) + \log \det\left(\frac{\partial f(x)}{\partial x}\right)$. Sampling phase: $z = z_{1:T} \sim P_Z(z) = \mathcal{N}(0, I)$, $x = f^{-1}(z)$. Both the transformation $f$ and $f^{-1}$ are designed to be computed efficiently → efficient training and parallel sampling.
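The training and sampling phases on the FloWaveNet slides can be sketched with a toy invertible map. The single affine coupling step below is an assumption chosen for illustration, and `shift_and_log_scale` is a hypothetical stand-in for a learned network. It does not reproduce the actual FloWaveNet architecture. It shows the change-of-variables log-likelihood and an inverse that needs no sample-by-sample loop.

```python
import math
import random

def shift_and_log_scale(x_even):
    """Hypothetical stand-in for a network conditioned on the untouched half."""
    shift = [0.5 * v for v in x_even]
    log_scale = [0.1 * math.tanh(v) for v in x_even]
    return shift, log_scale

def forward(x):
    """f: audio x -> latent z. Returns z and log det(df/dx)."""
    assert len(x) % 2 == 0, "this toy needs an even number of samples"
    x_even, x_odd = x[0::2], x[1::2]
    shift, log_scale = shift_and_log_scale(x_even)
    z = list(x)  # even positions pass through unchanged
    z[1::2] = [(xo - m) * math.exp(-s) for xo, m, s in zip(x_odd, shift, log_scale)]
    # The Jacobian is triangular with diagonal entries 1 (even) and exp(-s) (odd).
    return z, -sum(log_scale)

def inverse(z):
    """f^{-1}: latent z -> audio x; every odd position is recovered independently."""
    assert len(z) % 2 == 0, "this toy needs an even number of samples"
    z_even, z_odd = z[0::2], z[1::2]
    shift, log_scale = shift_and_log_scale(z_even)
    x = list(z)
    x[1::2] = [zo * math.exp(s) + m for zo, m, s in zip(z_odd, shift, log_scale)]
    return x

def log_likelihood(x):
    """log P_X(x) = log P_Z(f(x)) + log det(df/dx), with P_Z = N(0, I)."""
    z, log_det = forward(x)
    log_pz = sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * v * v for v in z)
    return log_pz + log_det

def sample(T, seed=0):
    """Sampling phase: draw z ~ N(0, I), then x = f^{-1}(z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(T)]
    return inverse(z)

x = [0.1, -0.2, 0.3, 0.4]
z, log_det = forward(x)
x_rec = inverse(z)
```

A single coupling step leaves half of the signal unchanged, so practical flows stack many such steps and alternate which half is transformed. The training objective is the exact log-likelihood above, with no teacher model involved.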

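  12. FloWaveNet: [architecture figure; layer-level equations not recoverable from the extracted text]. $\log P_X(x_{1:T}) = \log P_Z(f(x_{1:T})) + \log \det\left(\frac{\partial f(x)}{\partial x}\right)$.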

  13. Mean Opinion Scores: FloWaveNet ≥ Gaussian IAF.

  14. Sampling speed: FloWaveNet ≅ Gaussian IAF ≅ Parallel WaveNet >> Autoregressive WaveNet (thousands of times faster).

  15. Conclusion: • FloWaveNet produces audio samples of a quality comparable to previous distilled models. • FloWaveNet synthesizes audio samples in parallel, without a well pre-trained WaveNet (no distillation!) and without auxiliary loss terms. Links: Demo page, Code.

