Transfer learning for cross-lingual automatic speech recognition
Amit Das
Abstract—In this study, two instance-based transfer learning approaches to phoneme modeling are presented to mitigate the effects of limited data in a target language using data from richly resourced source languages. In the first approach, a maximum likelihood (ML) learning criterion is introduced to learn the model parameters of a given phoneme class using data from both the target and source languages. In the second approach, a hybrid learning criterion is introduced that combines the ML of the target data with the maximum mutual information (MMI) of the training data and the phoneme class labels. This not only increases the ML estimates of the models using data from both target and source languages but also improves the discriminative ability of the estimated models using incorrect phoneme class labels.

Index Terms—Transfer learning, maximum likelihood, maximum mutual information
I. INTRODUCTION
WITH the widespread use of hands-free electronic gadgets, speech applications have been gaining importance throughout the world. The utility of speech technologies like automatic speech recognition (ASR) in these gadgets depends on the versatility of ASR systems across users who speak different languages in different parts of the world. Hidden Markov Models (HMMs) have gained the widest acceptance in building ASR systems. Ideally, language-dependent or monolingual HMMs can be deployed in electronic gadgets where they are expected to be used by a majority of the population speaking the most common language. Although feasible, this is not commercially attractive for two reasons. Firstly, data collection for a specific language is a time-consuming and expensive process. Secondly, experienced transcribers who can mark word or phoneme boundaries with a high degree of accuracy may be available only for a limited set of more popular languages like English. Hence, the need arises for building multilingual ASR systems and/or using them for rapid adaptation to a new target (desired) language. In this section, a brief overview of several techniques used in building multilingual systems is presented first, followed by a brief explanation of some of the popular language adaptation techniques.

A multilingual ASR system is sometimes known as a language-independent system since it is versatile across multiple languages. This implies that acoustic-phonetic similarities across languages must be exploited. In [1], multilingual phone modeling was achieved using three approaches. In the first and most obvious approach, given a set of corpora of multiple languages, language-dependent phonemes can be mapped to a common convention such as WORLDBET [2], which has wide phonetic symbol coverage across multiple
languages. With this, all language-dependent transcriptions can be converted to the WORLDBET convention. Therefore, this represents a semantic way of handling multilingual phoneme units. All the transcriptions and speech files from the different language corpora are pooled together into one single global multilingual corpus. HMM training can be performed on this global corpus to form language-independent acoustic models. The main disadvantage of this approach is that subtle language-dependent variations might be lost during the mapping procedure. For example, the monolingual phonemes for the alveolar “r” and the palato-alveolar “r” sound different, but they might be represented with the same symbol in two different languages. After mapping to WORLDBET, both phonemes will be mapped to the same symbol, thereby blurring their distinct language properties.

The second approach is data driven, as opposed to the semantic approach described earlier. Here, the phonemes are mapped to a multilingual set using a bottom-up clustering procedure based on a log-likelihood distance measure [3] between two phoneme models. The models with the smallest distances are merged to form a new cluster. Because estimating new phone models for a merged cluster is difficult, the distance between two clusters is computed as the maximum of all distances found by pairing a phone model in the first cluster with a phone model in the second cluster. This “furthest-neighbor” merging heuristic encourages compact clusters and is known to work well empirically. The clustering process continues until all calculated cluster distances exceed a pre-defined distance threshold or until a specified number of clusters has been formed. The disadvantage of a data-driven approach is that the phoneme models in a single cluster lose their original phonetic symbols and adopt the symbol that best represents the cluster. Hence, it is possible that models for the fricatives /s/ and /f/ might fall into the same cluster whose phonetic symbol is simply denoted by /f/. Thus, /s/ loses its original semantic representation by taking /f/ as its identity, which is misleading.
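The furthest-neighbor merging loop just described can be sketched as follows. This is a minimal illustration, assuming a precomputed symmetric matrix of pairwise distances between phoneme models (e.g. log-likelihood distances); the toy distance values and the stopping threshold are placeholders, not the actual quantities used in [3].

```python
import numpy as np

def furthest_neighbor_cluster(dist, threshold):
    """Bottom-up clustering of phone models with the "furthest-neighbor"
    (complete-linkage) heuristic: the distance between two clusters is
    the MAXIMUM pairwise distance between their members.

    dist: (N, N) symmetric matrix of model-to-model distances.
    threshold: stop once every remaining cluster pair is farther
               apart than this value.
    Returns a list of clusters, each a list of model indices.
    """
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > 1:
        best, best_d = None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: max over all cross-cluster pairs.
                d = max(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best = d, (a, b)
        if best_d > threshold:  # all remaining pairs are too far apart
            break
        a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy example: models 0 and 1 are close, 2 and 3 are close,
# and all cross pairs are far apart.
dist = np.array([[0., 1., 9., 9.],
                 [1., 0., 9., 9.],
                 [9., 9., 0., 1.],
                 [9., 9., 1., 0.]])
clusters = furthest_neighbor_cluster(dist, threshold=2.0)
```

A production implementation would also stop when a specified number of clusters is reached and would recompute distances incrementally rather than rescanning all pairs each iteration.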
The third approach is a hybrid of the semantic and data-driven approaches. Here, all monolingual triphone HMMs that have the same phonetic symbol for a given state (left, center, or right) are pooled together. For example, the Gaussian mixture densities of the phoneme /k/ in state 1 (left) of “cat”, “cut”, and “kin” may be pooled together to form a pool of mixture densities modeling the phoneme /k/. Clustering is performed by taking a weighted L1-norm of the difference of all possible pairs of mean vectors present in this pool.
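This distance can be sketched as below. The per-dimension weight vector is an assumption here (for instance, inverse standard deviations so that low-variance dimensions count more); the actual weighting used in the hybrid approach may differ, and the pool of mean vectors is hypothetical.

```python
import numpy as np

def weighted_l1(mu_a, mu_b, w):
    """Weighted L1-norm of the difference between two Gaussian
    mean vectors pooled for the same phoneme state.
    w: per-dimension weights (assumed, e.g. inverse std-devs)."""
    return float(np.sum(w * np.abs(mu_a - mu_b)))

# Hypothetical pool of mean vectors for /k/ in state 1 (left).
pool = [np.array([1.0, 2.0]), np.array([2.0, 4.0]), np.array([1.2, 2.1])]
w = np.array([1.0, 0.5])  # assumed per-dimension weights

# Weighted L1 distances for all possible pairs in the pool.
pair_dist = {(i, j): weighted_l1(pool[i], pool[j], w)
             for i in range(len(pool)) for j in range(i + 1, len(pool))}
```

The resulting pairwise distances feed directly into a bottom-up clustering of the mixture densities in the pool.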
The motivation behind this is that performing clustering at the level of mixture densities helps retain some distinctive