NASNet, Speech Synthesis, External Memory Networks


  1. NPFL114, Lecture 12: NASNet, Speech Synthesis, External Memory Networks
  Milan Straka, May 18, 2020. Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics.

  2. Neural Architecture Search (NASNet) – 2017
  We can design neural network architectures using reinforcement learning. The designed network is encoded as a sequence of elements and is generated by an RNN controller, which is trained using the REINFORCE with baseline algorithm. For every generated sequence, the corresponding network is trained on CIFAR-10 and the development accuracy is used as the return.
  [Figure 1 of "Learning Transferable Architectures for Scalable Image Recognition", https://arxiv.org/abs/1707.07012]
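  To make the training loop concrete, below is a minimal sketch of REINFORCE with a moving-average baseline for such a controller. It is not the actual NASNet controller: here the controller emits a flat sequence of categorical choices, and `train_child_and_get_accuracy` (decoding the choices into a child network, training it on CIFAR-10, and returning its development accuracy) is an assumed helper, not shown.

```python
import tensorflow as tf

# Hypothetical sketch of the controller training loop (REINFORCE + baseline).
NUM_STEPS, NUM_CHOICES, HIDDEN = 10, 8, 64

class Controller(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.cell = tf.keras.layers.LSTMCell(HIDDEN)
        self.embed = tf.keras.layers.Embedding(NUM_CHOICES, HIDDEN)
        self.head = tf.keras.layers.Dense(NUM_CHOICES)

    def sample(self):
        states = [tf.zeros([1, HIDDEN]), tf.zeros([1, HIDDEN])]
        inputs = tf.zeros([1, HIDDEN])
        actions, log_probs = [], []
        for _ in range(NUM_STEPS):
            output, states = self.cell(inputs, states)
            logits = self.head(output)
            action = tf.random.categorical(logits, 1)[0, 0]
            actions.append(int(action))
            log_probs.append(tf.nn.log_softmax(logits)[0, action])
            inputs = self.embed(tf.reshape(action, [1]))
        return actions, tf.add_n(log_probs)

controller, optimizer = Controller(), tf.keras.optimizers.Adam(1e-3)
baseline = tf.Variable(0.0, trainable=False)  # moving average of returns

for _ in range(1000):
    with tf.GradientTape() as tape:
        actions, log_prob = controller.sample()
        reward = train_child_and_get_accuracy(actions)  # assumed helper
        loss = -(reward - baseline) * log_prob          # REINFORCE with baseline
    grads = tape.gradient(loss, controller.trainable_variables)
    optimizer.apply_gradients(zip(grads, controller.trainable_variables))
    baseline.assign(0.95 * baseline + 0.05 * reward)
```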

  3. Neural Architecture Search (NASNet) – 2017
  The overall architecture of the designed network is fixed; only the Normal Cells and the Reduction Cells are generated by the controller.
  [Figure 2 of "Learning Transferable Architectures for Scalable Image Recognition", https://arxiv.org/abs/1707.07012]

  4. Neural Architecture Search (NASNet) – 2017
  Each cell is composed of B blocks (B = 5 is used in NASNet). Each block is designed by the RNN controller generating 5 parameters, as illustrated in the sketch below.
  [Figure 3 and Page 3 of "Learning Transferable Architectures for Scalable Image Recognition", https://arxiv.org/abs/1707.07012]
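  As an illustration of the search space, the following sketch shows the five per-block choices (two inputs, two operations, one combination method) from Figure 3 of the paper. It samples uniformly at random, whereas NASNet uses the RNN controller, and only a subset of the paper's candidate operations is listed.

```python
import random

# A subset of the candidate operations from the NASNet paper.
OPERATIONS = ["identity", "3x3 separable conv", "5x5 separable conv",
              "3x3 average pooling", "3x3 max pooling", "3x3 dilated conv"]
COMBINATIONS = ["add", "concatenate"]

def sample_block(available_inputs):
    """The 5 parameters describing one block (here chosen at random)."""
    return {
        "input_1": random.choice(available_inputs),
        "input_2": random.choice(available_inputs),
        "operation_1": random.choice(OPERATIONS),
        "operation_2": random.choice(OPERATIONS),
        "combination": random.choice(COMBINATIONS),
    }

# A cell consists of B = 5 blocks; outputs of earlier blocks become
# available as inputs to later ones.
available, cell = ["previous cell output", "cell before that"], []
for b in range(5):
    cell.append(sample_block(available))
    available.append(f"block {b} output")
```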

  5. Neural Architecture Search (NASNet) – 2017
  The final proposed Normal Cell and Reduction Cell:
  [Page 3 of "Learning Transferable Architectures for Scalable Image Recognition", https://arxiv.org/abs/1707.07012]

  6. EfficientNet Search
  EfficientNet changes the search in two ways. First, computational requirements are part of the return: the goal is to find an architecture m maximizing
  DevelopmentAccuracy(m) · (TargetFLOPS=400M / FLOPS(m))^0.07,
  where the constant 0.07 balances the accuracy and the FLOPS. Second, it uses a different search space, which allows controlling kernel sizes and channels in different parts of the overall architecture (compared to using the same cell everywhere as in NASNet).
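  A minimal sketch of this FLOPS-aware return (the 400M target and the 0.07 constant come from the slide; the accuracy and FLOPS numbers below are made up):

```python
def search_return(dev_accuracy, flops, target_flops=400e6, w=0.07):
    """Accuracy scaled by a soft FLOPS penalty: fewer FLOPS than the target
    slightly increase the return, more FLOPS slightly decrease it."""
    return dev_accuracy * (target_flops / flops) ** w

# A slightly more accurate but more expensive architecture can still lose
# to a cheaper one, because the FLOPS penalty is part of the maximized return.
print(search_return(dev_accuracy=0.76, flops=500e6))  # ~0.748
print(search_return(dev_accuracy=0.75, flops=350e6))  # ~0.757
```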

  7. EfficientNet Search
  The overall architecture consists of 7 blocks, each described by 6 parameters – 42 parameters in total, compared to the 50 parameters of the NASNet search space.
  [Figure 4 and Page 4 of "MnasNet: Platform-Aware Neural Architecture Search for Mobile", https://arxiv.org/abs/1807.11626]

  8. EfficientNet-B0 Baseline Network
  [Table 1 of "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks", https://arxiv.org/abs/1905.11946]

  9. WaveNet
  Our goal is to model speech, using an auto-regressive model P(x) = ∏_t P(x_t | x_{t-1}, …, x_1).
  [Figure 2 of "WaveNet: A Generative Model for Raw Audio", https://arxiv.org/abs/1609.03499]
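  A minimal sketch of what this factorization means at generation time; `model` is a hypothetical stand-in for a trained WaveNet returning a categorical distribution over the quantized values (the real WaveNet conditions only on a finite receptive field of past samples):

```python
import numpy as np

def generate(model, length, num_classes=256):
    """Auto-regressive sampling: draw x_t from P(x_t | x_{t-1}, ..., x_1)."""
    samples = []
    for _ in range(length):
        probs = model(np.array(samples))              # distribution over x_t
        samples.append(np.random.choice(num_classes, p=probs))
    return np.array(samples)
```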

  10. WaveNet
  [Figure 3 of "WaveNet: A Generative Model for Raw Audio", https://arxiv.org/abs/1609.03499]

  11. WaveNet Output Distribution
  The raw audio is usually stored in 16-bit samples. However, classification into 65,536 classes would not be tractable; instead, WaveNet adopts the μ-law transformation and quantizes the samples into 256 values using
  sign(x) · ln(1 + 255|x|) / ln(1 + 255).
  Gated Activation
  To allow greater flexibility, the outputs of the dilated convolutions are passed through the gated activation units
  z = tanh(W_f ∗ x) ⋅ σ(W_g ∗ x).
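  A minimal sketch of the μ-law companding and the gated activation unit, assuming the raw audio is already normalized to [−1, 1] and the discretization into 256 bins follows the usual WaveNet recipe; the Conv1D layers stand in for the dilated causal convolutions:

```python
import numpy as np
import tensorflow as tf

def mu_law_encode(audio, mu=255, bins=256):
    """Compress audio in [-1, 1] with the mu-law transform, then quantize."""
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return np.rint((compressed + 1) / 2 * (bins - 1)).astype(np.int32)  # {0, ..., 255}

def gated_activation(x, filters, dilation):
    """z = tanh(W_f * x) * sigmoid(W_g * x) on top of dilated causal convolutions."""
    f = tf.keras.layers.Conv1D(filters, kernel_size=2, dilation_rate=dilation,
                               padding="causal", activation="tanh")(x)
    g = tf.keras.layers.Conv1D(filters, kernel_size=2, dilation_rate=dilation,
                               padding="causal", activation="sigmoid")(x)
    return f * g
```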
