

  1. A General Artificial Neural Network Extension for HTK
     Chao Zhang & Phil Woodland
     University of Cambridge
     15 April 2015

  2. Overview
   • Design Principles
   • Implementation Details
     – Generic ANN Support
     – ANN Training
     – Data Cache
     – Other Features
   • A Summary of HTK-ANN
   • HTK based Hybrid/Tandem Systems & Experiments
     – Hybrid SI System
     – Tandem SAT System
     – Demo Hybrid System with Flexible Structures
   • Conclusions

  3. Design Principles
   • The design should be as generic as possible.
     – Flexible input feature configurations.
     – Flexible ANN model architectures.
   • HTK-ANN should be compatible with existing functions.
     – To minimise the effort of reusing previous source code and tools.
     – To simplify the transfer of many technologies.
   • HTK-ANN should be kept “research friendly”.

  4. Generic ANN Support
   • In HTK-ANN, ANNs have layered structures.
     – An HMM set can have any number of ANNs.
     – Each ANN can have any number of layers.
   • An ANN layer has
     – Parameters: weights, biases, and activation function parameters.
     – An input vector: defined by a feature mixture structure.
   • A feature mixture has any number of feature elements.
   • A feature element defines a fragment of the input vector by
     – Source: acoustic features, augmented features, or the outputs of some layer.
     – A context shift set: integers indicating the time differences.

  5. Generic ANN Support
   • In HTK-ANN, ANN structures can form any directed graph, including cyclic ones.
   • Since only standard EBP is included at present, HTK-ANN can properly train only non-recurrent ANNs (directed acyclic graphs).
   [Figure: an example of a feature mixture with three feature elements.
    Feature Element 1: source = input acoustic features; context shift set = {-6, -3, 0, 3, 6}.
    Feature Element 2: source = ANN 1, Layer 3 outputs; context shift set = {0}.
    Feature Element 3: source = ANN 2, Layer 2 outputs; context shift set = {-1, 0, 1}.]
   A data-structure sketch of this example is given below.
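To make the feature mixture structure concrete, here is a minimal Python sketch (illustrative only, not HTK source; the class and field names are assumptions) that models feature elements and computes the input vector size for the example in the figure. The per-frame dimensionalities are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FeatureElement:
    source: str                # e.g. "acoustic" or "ANN1/layer3"
    dim: int                   # dimensionality of one frame from this source
    context_shifts: List[int]  # time offsets relative to the current frame t

@dataclass
class FeatureMixture:
    elements: List[FeatureElement]

    def input_dim(self) -> int:
        # Each context shift contributes one copy of a source frame.
        return sum(e.dim * len(e.context_shifts) for e in self.elements)

# The three feature elements from the figure above.
mixture = FeatureMixture([
    FeatureElement("acoustic",    dim=13, context_shifts=[-6, -3, 0, 3, 6]),
    FeatureElement("ANN1/layer3", dim=26, context_shifts=[0]),
    FeatureElement("ANN2/layer2", dim=40, context_shifts=[-1, 0, 1]),
])
print(mixture.input_dim())  # 13*5 + 26*1 + 40*3 = 211
```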

  6. ANN Training
   • HTK-ANN supports different training criteria
     – Frame-level: CE, MMSE
     – Sequence-level: MMI, MPE, MWE
   • ANN model training labels can come from
     – Frame-to-label alignment: for the CE and MMSE criteria
     – Feature files: for autoencoders
     – Lattice files: for the MMI, MPE, and MWE criteria
   • Gradients for SGD can be modified with momentum, gradient clipping, weight decay, and max norm (see the sketch below).
   • Supported learning rate schedulers include List, Exponential Decay, AdaGrad, and a modified NewBob.
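A minimal NumPy sketch of one SGD step combining the four gradient modifications listed above (illustrative only; the hyper-parameter names and default values are assumptions, and max norm is applied to the whole parameter vector here, whereas it is often applied per hidden unit):

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9,
             clip=1.0, weight_decay=1e-4, max_norm=2.0):
    """One SGD update with momentum, clipping, weight decay, and max norm."""
    grad = grad + weight_decay * w               # weight decay: L2 penalty gradient
    grad = np.clip(grad, -clip, clip)            # gradient clipping: bound components
    velocity = momentum * velocity - lr * grad   # momentum: smooth successive updates
    w = w + velocity
    norm = np.linalg.norm(w)                     # max norm: rescale oversized weights
    if norm > max_norm:
        w = w * (max_norm / norm)
    return w, velocity
```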

  7. Data Cache
   • HTK-ANN has three types of data shuffling (contrasted in the sketch below)
     – Frame based shuffling: CE/MMSE for DNNs and (unfolded) RNNs
     – Utterance based shuffling: MMI, MPE, and MWE training
     – Batch of utterance level shuffling: RNNs, ASGD
   [Figure: examples of the different types of data shuffling, showing how frames and utterances are grouped into batches.]
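A small Python sketch contrasting the three shuffling modes (illustrative; the real cache lives in the HNCache module and is written in C):

```python
import random

# Toy corpus: each utterance is a list of frame indices.
utterances = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]

def frame_shuffle(utts):
    """Frame-based: pool all frames and shuffle them individually (CE/MMSE)."""
    frames = [f for u in utts for f in u]
    random.shuffle(frames)
    return frames

def utterance_shuffle(utts):
    """Utterance-based: shuffle utterance order, keep each utterance's
    frames in sequence (MMI/MPE/MWE training)."""
    order = utts[:]
    random.shuffle(order)
    return order

def utterance_batch_shuffle(utts, batch_size=2):
    """Batch-of-utterances: shuffle, then group whole utterances into
    batches (RNNs, asynchronous SGD)."""
    order = utterance_shuffle(utts)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```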

  8. Other Features
   • Math Kernels: new CPU, MKL, and CUDA based kernels for ANNs
   • Input Transforms: compatible with HTK SI/SD input transforms
   • Speaker Adaptation: online replacement of ANN parameter units
   • Model Edit
     – Insert/remove/initialise an ANN layer
     – Add/delete a feature element to/from a feature mixture
     – Associate an ANN model with HMMs
   • Decoders
     – HVite: tandem/hybrid system decoding/alignment/model marking
     – HDecode: tandem/hybrid system LVCSR decoding
     – HDecode.mod: tandem/hybrid system model marking
     – A joint decoder: log-linear combination of systems sharing the same decision tree (see the sketch below)
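The joint decoder scores each tied state by a log-linear combination, i.e. a weighted sum of the component systems' log-likelihoods. A minimal sketch of the scoring rule (the function and argument names are illustrative); the BOLT experiment later uses the weights (1.0, 0.2):

```python
def joint_log_score(log_p_hybrid, log_p_tandem,
                    w_hybrid=1.0, w_tandem=0.2):
    """Log-linear combination of two systems' per-state log-likelihoods.

    Both systems must share the same decision tree so that their tied
    states, and hence their scores, align one-to-one.
    """
    return w_hybrid * log_p_hybrid + w_tandem * log_p_tandem
```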

  9. A Summary of HTK-ANN
   • Extended modules: HFBLat, HMath, HModel, HParm, HRec, HLVRec
   • New modules
     – HANNet: ANN structures & core algorithms
     – HCUDA: CUDA based math kernel functions
     – HNCache: data cache for random access to data
   • Extended tools: HDecode, HDecode.mod, HHEd, HVite
   • New tools
     – HNForward: ANN evaluation & output generation
     – HNTrainSGD: SGD based ANN training

  10. Building Hybrid SI Systems
   • Steps for building CE based SI CD-DNN-HMMs using HTK
     – Produce the desired tied-state GMM-HMMs by decision tree tying (HHEd)
     – Generate ANN-HMMs by replacing the GMMs with an ANN (HHEd)
     – Generate frame-to-state labels with a pre-trained system (HVite)
     – Train the ANN-HMMs based on CE (HNTrainSGD)
   • Steps for CD-DNN-HMM MPE training (the objective is given below)
     – Generate num./den. lattices (HLRescore & HDecode)
     – Phone mark the num./den. lattices (HVite or HDecode.mod)
     – Perform MPE training (HNTrainSGD)
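For reference, the MPE objective maximised in the final step can be written in its standard form (this formulation is general background, not taken from the slides; here $\kappa$ is the acoustic score scaling factor, $A(\mathcal{H}, \mathcal{H}^{\mathrm{ref}}_u)$ is the phone accuracy of hypothesis $\mathcal{H}$ against the reference for utterance $u$, and both sums run over the hypotheses in the phone-marked num./den. lattices):

```latex
\mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_{u}
  \frac{\sum_{\mathcal{H}} p_{\lambda}(\mathcal{O}_u \mid \mathcal{H})^{\kappa}\,
        P(\mathcal{H})\, A(\mathcal{H}, \mathcal{H}^{\mathrm{ref}}_{u})}
       {\sum_{\mathcal{H}'} p_{\lambda}(\mathcal{O}_u \mid \mathcal{H}')^{\kappa}\,
        P(\mathcal{H}')}
```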

  11. ANN Front-ends for GMM-HMMs
   • ANNs can be used as GMM-HMM front-ends by using a feature mixture to define the composition of the GMM-HMM input vector.
   • HTK can accommodate a tandem SAT system as a single system
     – Mean and variance normalisations are treated as activation functions.
     – SD parameters are replaceable according to speaker ids.
   [Figure: a composite ANN as a tandem SAT system front-end, combining PLP and pitch features via mean/variance normalisation, a bottleneck DNN, and STC, HLDA, and CMLLR transforms. A sketch of the pipeline is given below.]
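A minimal Python sketch of the composite front-end idea (all component shapes and values are stand-ins, and the exact ordering of the STC/HLDA/CMLLR stages is an assumption for illustration, since the figure only lists the components; in HTK the stages are expressed as ANN layers, with the speaker-dependent CMLLR transform swapped in online by speaker id):

```python
import numpy as np

rng = np.random.default_rng(0)
PLP_DIM, PITCH_DIM, BN_DIM = 13, 3, 26
IN_DIM = PLP_DIM + PITCH_DIM

# Stand-in trained components (random here purely for illustration).
W_bn = rng.standard_normal((BN_DIM, IN_DIM))
bottleneck_dnn = lambda x: np.tanh(W_bn @ x)       # bottleneck DNN
stc = np.eye(BN_DIM)                               # semi-tied covariance transform
hlda = rng.standard_normal((39, IN_DIM + BN_DIM))  # HLDA projection
cmllr_by_speaker = {"spk01": np.eye(39)}           # SD transforms, keyed by speaker

def mean_var_norm(x, mean=0.0, var=1.0):
    # Normalisation treated as an "activation function" within the network.
    return (x - mean) / np.sqrt(var)

def tandem_front_end(plp, pitch, speaker):
    x = mean_var_norm(np.concatenate([plp, pitch]))
    bn = stc @ bottleneck_dnn(x)                     # BN features + STC
    feats = hlda @ np.concatenate([plp, pitch, bn])  # tandem vector + HLDA
    return cmllr_by_speaker[speaker] @ feats         # online SD replacement

out = tandem_front_end(rng.standard_normal(PLP_DIM),
                       rng.standard_normal(PITCH_DIM), "spk01")
print(out.shape)  # (39,)
```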

  12. Standard BOLT System Results
   • Hybrid DNN structure: 504 × 2000^4 × 1000 × 12000
   • Tandem DNN structure: 504 × 2000^4 × 1000 × 26 × 12000

     System                    Criterion   %WER
     ------------------------------------------
     Hybrid SI                 CE          34.5
     Hybrid SI                 MPE         31.6
     Tandem SAT                MPE         33.2
     Hybrid SI ⊗ Tandem SAT    MPE         31.0

   Table: Performance of BOLT tandem and hybrid systems with standard configurations, evaluated on dev'14. ⊗ denotes joint decoding with system dependent combination weights (1.0, 0.2).

  13. WSJ Demo Systems with Flexible Structures
   • Stacking MLPs: (468 + (n - 1) × 200) × 1000 × 200 × 3000, n = 1, 2, ....
     Each MLP takes all previous BN features as input (the size arithmetic is worked through below).
   • The top MLP does not have a BN layer.
   • Systems were trained with CE based discriminative pre-training and fine-tuning.
   • Systems were trained on 15 hours of Wall Street Journal data (WSJ0).

     FNN    %Accuracy            %WER
     Num    Train   Held-out     65k dt   65k et
     --------------------------------------------
     1      69.9    58.1         9.3      10.9
     2      72.8    59.1         9.0      10.4
     3      73.9    59.1         8.8      10.7

   Table: Performance of the WSJ0 demo systems.
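To make the stacking arithmetic explicit, a small Python sketch computing the layer sizes of the n-th stacked MLP (the function name is illustrative):

```python
def stacked_mlp_dims(n, top=False):
    """Layer sizes of the n-th MLP in the stack.

    The base acoustic input is 468-dimensional; each earlier MLP
    contributes a 200-dimensional bottleneck (BN) feature vector.
    The top MLP omits the 200-dimensional BN layer.
    """
    input_dim = 468 + (n - 1) * 200
    return [input_dim, 1000, 3000] if top else [input_dim, 1000, 200, 3000]

for n in (1, 2, 3):
    print(n, stacked_mlp_dims(n))
# 1 [468, 1000, 200, 3000]
# 2 [668, 1000, 200, 3000]
# 3 [868, 1000, 200, 3000]
```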

  14. Conclusions
   • HTK-ANN integrates native support for ANNs into HTK.
   • HTK based GMM technologies can be directly applied to ANN-based systems.
   • HTK-ANN can train FNNs with very flexible configurations
     – Topologies equivalent to DAGs
     – Different activation functions
     – Various input features
     – Frame-level and sequence-level training criteria
   • Experiments on a 300h CTS task showed HTK can produce standard state-of-the-art tandem and hybrid systems.
   • WSJ0 experiments showed HTK can build systems with flexible structures.
   • HTK-ANN will be available with the release of HTK 3.5 in 2015.
