Vector Quantized Neural Networks for Acoustic Unit Discovery - PowerPoint PPT Presentation



SLIDE 1

Vector Quantized Neural Networks for Acoustic Unit Discovery

Benjamin van Niekerk, Leanne Nortje, Herman Kamper

SLIDE 2

The Generative Factors of Speech

HH / Y / UW / M / ER → "HUMOUR"

Content:

  • Discrete phonetic units.
  • ≈44 phonemes in English.

Prosody:

  • Rhythm
  • Intonation
  • Stress

Timbre:

  • Quality of a particular voice.
  • Characterized by the frequency spectrum.


SLIDE 11

What is Acoustic Unit Discovery?

The goal is to learn discrete representations of speech that separate phonetic content from the other factors, all without any labels or annotations!


SLIDE 15

Applications

Bootstrap training of low-resource speech systems:

  • Automatic speech recognition
  • Text-to-speech
  • Non-parallel voice conversion


SLIDE 19

But how do we learn discrete representations using neural networks?

  • A. van den Oord and O. Vinyals, "Neural Discrete Representation Learning," Advances in Neural Information Processing Systems, 2017.

SLIDE 21

Vector Quantization Layer

Encoder → Codebook
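At its core the VQ layer on these slides is a nearest-neighbour lookup: each continuous encoder output is replaced by the closest vector in a learned codebook. A minimal numpy sketch (illustrative, not the authors' code; in training the non-differentiable argmin is bypassed with a straight-through gradient estimator):

```python
import numpy as np

def quantize(z, codebook):
    """Map each encoder output frame to its nearest codebook vector.

    z:        (T, D) array of continuous encoder outputs
    codebook: (K, D) array of learned code vectors
    Returns the discrete unit index per frame and the quantized vectors.
    """
    # Squared Euclidean distance between every frame and every code: (T, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)      # discrete acoustic unit per frame
    return idx, codebook[idx]       # shapes (T,) and (T, D)

# Toy example with the codebook size used in the paper's bottleneck, VQ(512).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))
z = rng.normal(size=(100, 64))      # 100 frames of 64-dim features
idx, z_q = quantize(z, codebook)
```

The sequence of indices `idx` is the discovered acoustic-unit transcription of the utterance.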

SLIDE 30

Our contribution: we propose and compare two models for acoustic unit discovery in the ZeroSpeech 2020 Challenge.

  1. A vector-quantized variational autoencoder (VQ-VAE), inspired by J. Chorowski et al., "Unsupervised Speech Representation Learning Using WaveNet Autoencoders," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019.

  2. A combination of vector quantization and contrastive predictive coding (VQ-CPC), inspired by A. van den Oord et al., "Representation Learning with Contrastive Predictive Coding," 2018.

Encoder → VQ layer → Decoder

SLIDE 33

Vector-Quantized Variational Autoencoder

Encoder → VQ layer → Decoder

  • Trained to minimize reconstruction error.
  • The VQ layer acts as an information bottleneck.
  • The decoder is conditioned on speaker identity.
  • The decoder is a powerful autoregressive model.
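The training objective behind these slides combines the three standard VQ-VAE terms from van den Oord et al. (2017). A numpy sketch of the terms for clarity (illustrative only: a real implementation applies the stop-gradients noted in the comments, and this paper's decoder reconstructs mu-law samples with a cross-entropy loss rather than the MSE used here):

```python
import numpy as np

def vq_vae_loss_terms(x, x_hat, z_e, z_q, beta=0.25):
    """The three VQ-VAE objective terms, spelled out for clarity.

    x, x_hat: target signal and its reconstruction
    z_e:      continuous encoder output, shape (T, D)
    z_q:      nearest codebook vectors for z_e, shape (T, D)
    beta:     commitment weight (0.25 is the value from the VQ-VAE paper)
    """
    recon = np.mean((x - x_hat) ** 2)          # reconstruction error (MSE for simplicity)
    codebook = np.mean((z_e - z_q) ** 2)       # moves codes toward encoder outputs
                                               # (stop-gradient on z_e in practice)
    commit = beta * np.mean((z_e - z_q) ** 2)  # keeps the encoder committed to its code
                                               # (stop-gradient on z_q in practice)
    return recon, codebook, commit

r, c, m = vq_vae_loss_terms(
    x=np.zeros(4), x_hat=np.ones(4),
    z_e=np.full((3, 2), 2.0), z_q=np.ones((3, 2)))
```

Because the VQ indices carry so few bits per frame, the decoder must get speaker and prosody information from elsewhere (the speaker conditioning and its own autoregression), which is what pushes the codes toward purely phonetic content.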

SLIDE 38

Vector-Quantized Contrastive Predictive Coding

Input → Encoder → VQ layer → Context model → Predictions

SLIDE 43

Vector-Quantized Contrastive Predictive Coding

From each context vector, the model is trained to pick out a positive example (a true future code) from a set of negative examples.
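The picking-out step on these slides is the InfoNCE objective from the CPC paper. A minimal sketch of one prediction step (an illustration of the idea, not the authors' exact code):

```python
import numpy as np

def info_nce(context, positive, negatives):
    """Contrastive loss for one prediction step (InfoNCE-style sketch).

    context:   (D,) context vector produced by the context model
    positive:  (D,) the code the model should prefer
    negatives: (N, D) distractor codes
    Low when the positive scores highest under dot-product similarity.
    """
    candidates = np.vstack([positive[None, :], negatives])  # (N+1, D)
    scores = candidates @ context                           # similarity to context
    log_probs = scores - np.log(np.exp(scores).sum())       # log-softmax
    return -log_probs[0]                                    # positive sits at index 0

ctx = np.array([1.0, 0.0])
loss_easy = info_nce(ctx, np.array([1.0, 0.0]), np.array([[0.0, 1.0]]))  # aligned positive
loss_hard = info_nce(ctx, np.array([0.0, 1.0]), np.array([[1.0, 0.0]]))  # misaligned positive
```

Minimizing this loss forces the codes to capture whatever is predictable across time, i.e. phonetic content, without ever reconstructing the waveform.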

SLIDE 48

Evaluation - Voice Conversion

Encoder → VQ layer → Decoder

Evaluation metrics:

  • Speaker similarity (1-5 scale).
  • Intelligibility (character error rate).
  • Mean opinion score (1-5 scale).

SLIDE 54

Evaluation - ABX Score

Triphone A: bug   Triphone B: bag   Triphone X: bag

Each triphone is passed through the encoder; the trial is correct if the representation of X is closer to that of B (the matching triphone) than to that of A.
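A single ABX trial can be sketched as follows (the official ZeroSpeech metric aligns frame sequences with DTW before comparing; using mean feature vectors here is a simplifying assumption):

```python
import numpy as np

def abx_correct(a, b, x):
    """One ABX trial: X belongs to the same phone category as B, A differs.
    The trial counts as correct when X's representation is closer to B's
    than to A's. Comparing utterance-mean vectors is a simplification of
    the DTW-based distance used in the real evaluation.
    """
    dist = lambda u, v: np.linalg.norm(u.mean(axis=0) - v.mean(axis=0))
    return dist(x, b) < dist(x, a)

# Toy encodings standing in for "bug" (A), "bag" (B), and another "bag" (X).
a = np.zeros((5, 4))        # triphone A features
b = np.ones((8, 4))         # triphone B features
x = np.full((6, 4), 0.9)    # X resembles B
```

The ABX error rate is the fraction of trials that come out incorrect, averaged over many triphone pairs and speakers.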

SLIDE 59

Questions?

SLIDE 60

Vector-Quantized Variational Autoencoder: Architecture

Encoder (log-Mel spectrogram input, 100 Hz):

  • conv3(768), batchnorm, ReLU
  • conv3(768), batchnorm, ReLU
  • conv4, stride 2 (768), batchnorm, ReLU (downsamples 100 Hz to 50 Hz)
  • conv3(768), batchnorm, ReLU
  • conv3(768), batchnorm, ReLU

Bottleneck (50 Hz):

  • linear(64), VQ(512)
  • jitter(0.5), embedding

Decoder:

  • concat, upsample
  • biGRU(128), biGRU(128), upsample
  • GRU(896)
  • linear(256), ReLU
  • linear(256), ReLU
  • softmax, sample; inputs: mu-law embedding, speaker embedding
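The 100 Hz and 50 Hz annotations above pin down the rate at which discrete units are emitted. A small bookkeeping sketch (the sample rate and hop length are assumed typical values, not stated on the slide):

```python
def code_rate_hz(sample_rate=16000, hop_length=160, conv_stride=2):
    """Frame-rate bookkeeping for the encoder above. Log-Mel frames at
    sample_rate / hop_length Hz (100 Hz with these assumed defaults) are
    halved once by the single stride-2 convolution, so the VQ layer emits
    one discrete code every 20 ms of audio."""
    mel_rate = sample_rate / hop_length   # 100.0 Hz with the defaults
    return mel_rate / conv_stride         # 50.0 Hz at the bottleneck

rate = code_rate_hz()
```

With VQ(512), each 20 ms segment is thus summarized by one of 512 codes, i.e. at most 9 bits per code, which is the information bottleneck the earlier slides rely on.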