From phonetics to speech technology
Einar Meister
Laboratory of Phonetics and Speech Technology
Institute of Cybernetics
Tallinn University of Technology
Introduction
- Spoken language communication
- Progress in ASR
- Multi-disciplinary approach
- Projects at our lab
- Some future plans
- Co-operation
Complexity of spoken language processing
Human speech communication is … “the most sophisticated behaviour of the most complex organism in the known universe”
Prof. R. Moore, University of Sheffield
Complexity of spoken language processing
- There is a huge and diverse literature describing human speech processing behaviour
- Many different disciplines are involved
- Most knowledge is based on indirect observation
- More is known about the peripheral auditory and articulatory systems than about the higher-level phonetic, linguistic and cognitive processes
- Research is fragmented across different levels of human SLP
- Models tend to address single aspects of human SLP behaviour
- There is little integration between models
- Many models are descriptive rather than computational
Progress in ASR
- Substantial progress has taken place in the past 20 years
- Dragon “Naturally Speaking 10” Large Vocabulary Continuous Speech Recognition (LVCSR) is available in 11 languages:
- American English
- Australian English
- Southern Asian English
- Indian English
- UK English
- Teen English
- Dutch
- French
- German
- Italian
- Spanish
- Up to 99% accurate and three times faster than typing
- Price: $99-$349
Progress in ASR
MS Windows Vista offers ASR in 8 languages:
- English (United States)
- English (United Kingdom)
- German
- French
- Spanish
- Japanese
- Traditional Chinese
- Simplified Chinese
http://www.youtube.com/watch?v=2Y_Jp6PxsSQ (July 29, 2006)
http://www.youtube.com/watch?v=KyLqUf4cdwc (February 10, 2007)
http://www.microsoft.com/enable/demos/windowsvista/speechdemo.aspx
Progress in ASR
Progress has NOT been achieved as a result of deep insights into human SLP
Improvements have come from:
- extensive use of statistical learning algorithms (data-driven approach)
- availability of large speech and text corpora
- increase in computer power
Need for multi-disciplinary approach
[Diagram: SPOKEN LANGUAGE PROCESSING in relation to ARTIFICIAL INTELLIGENCE, ENGINEERING, PSYCHOLOGY and LINGUISTICS. Related fields shown: Computational Linguistics, Human-Computer Interaction, Pattern Processing, Psycho-Linguistics, Natural Language Processing, Information Retrieval, Cognitive Science, Dialogue. Overall label: COGNITIVE INFORMATICS]
Need for multi-disciplinary approach
Chin-Hui Lee (Georgia Institute of Technology, Atlanta, USA): From Knowledge-Ignorant to Knowledge-Rich Modelling: A New Speech Research Paradigm for Next Generation Automatic Speech Recognition (ICSLP 2004)
Knowledge-Ignorant Modelling – there’s no data like more data
Knowledge-Rich Modelling:
- Sound-specific features – other acoustic-phonetic features should be used in addition to spectral (cepstral) features: duration, loudness, F0, etc.
- Keyword recognition and phrase verification
- Human-like speech processing models
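To make the sound-specific features listed above concrete, here is a minimal sketch of how duration, a loudness proxy and F0 could be measured on one voiced segment. It is only an illustration, not the method from the cited talk: the function names are hypothetical, RMS level in dB stands in for perceptual loudness, F0 is estimated by a plain autocorrelation peak search, and the input is a synthetic 100 Hz tone rather than real speech.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to full scale (1.0); a crude loudness proxy."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 in Hz from the strongest autocorrelation peak.

    Only lags that correspond to f0_min..f0_max are searched (assumed range).
    Returns None if no lag gives a positive correlation.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def segment_features(samples, sample_rate):
    """Collect the three features for one segment."""
    return {
        "duration_s": len(samples) / sample_rate,
        "level_db": rms_level_db(samples),
        "f0_hz": estimate_f0(samples, sample_rate),
    }


# Synthetic test segment: 50 ms of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 100.0 * n / sample_rate)
        for n in range(400)]
features = segment_features(tone, sample_rate)
print(features)
```

In a recogniser such values would be computed per frame or per segment and appended to the cepstral feature vector; real speech would also need a voicing decision before F0 is trusted.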
Laboratory of Phonetics and Speech Technology
- Speech research at IoC since the 1960s, Lab. of Phonetics and Speech Tech since 1990
- Mission: research on Estonian phonetics and speech technology
- Partner in:
  - eVikings 2 project (2002-2005)
  - NordForsk VISPP-network (2004-2005)
  - Doctoral School of Linguistics and Language Technology at the University of Tartu (2005-2008)
  - National Programme for Estonian Language Technology (2006-2010)
Laboratory of Phonetics and Speech Technology
- Staff:
  - Einar Meister:
    - head of the laboratory, senior researcher
    - MSc (1998) in system engineering, PhD (2003) in general linguistics
    - experimental phonetics, speech synthesis, speech databases
  - Tanel Alumäe:
    - senior researcher, currently post-doc at LIMSI (France)
    - PhD (2006) in computer science
    - speech recognition, language modelling, spoken document retrieval, dialogue systems
  - Toomas Kirt:
    - researcher
    - PhD (2007) in computer science
    - data processing, neural networks, pattern recognition
  - Lya Meister:
    - researcher
    - MA (2004) in linguistics, doctoral student at Tartu University
    - experimental phonetics, foreign accent, speech corpora
  - Temporary staff: 2-3 (1 doctoral student)
Projects funded by the National Programme for Estonian Language Technology
1. Research and development of methods for Estonian speech recognition (T. Alumäe)
Main tasks:
- determining optimal basic lexical units for Estonian LVCSR
- development of statistical language modelling techniques
- application of acoustic model adaptation techniques
- delivering optimal solutions for the development of medium-vocabulary speech recognition systems
- development of methods and algorithms for large/unlimited vocabulary speech recognition systems
- implementation of speech recognition prototype systems
Current results:
- software for automatic segmentation of the speech signal
- prototype for large vocabulary speech recognition
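As a rough illustration of what statistical language modelling over a chosen set of lexical units means, the sketch below trains a bigram model with add-one smoothing. It is a textbook baseline, not the lab's actual technique; the function names and the toy English corpus are hypothetical, and the tokens could equally be words or sub-word units.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams; each sentence is a list of unit strings."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        histories.update(tokens[:-1])                 # units seen as history
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (history, next unit)
    # Units that can be predicted: every corpus unit plus the end marker.
    vocab = {t for s in sentences for t in s} | {"</s>"}
    return histories, bigrams, vocab


def bigram_prob(prev, unit, model):
    """Add-one smoothed estimate of P(unit | prev)."""
    histories, bigrams, vocab = model
    return (bigrams[(prev, unit)] + 1) / (histories[prev] + len(vocab))


def sentence_logprob(sentence, model):
    """Natural-log probability of a whole sentence under the model."""
    tokens = ["<s>"] + sentence + ["</s>"]
    return sum(math.log(bigram_prob(p, u, model))
               for p, u in zip(tokens[:-1], tokens[1:]))


# Toy corpus (hypothetical); real training data would be a large text corpus.
corpus = [["good", "morning"], ["good", "evening"], ["nice", "evening"]]
model = train_bigram(corpus)

seen = sentence_logprob(["good", "morning"], model)
unseen = sentence_logprob(["morning", "good"], model)
print(seen, unseen)  # the attested word order scores higher
```

Changing the lexical unit only changes how the sentences are tokenised before training; the counting and smoothing code stays the same, which is why unit selection and language modelling can be studied as separate tasks.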
Projects funded by the National Programme for Estonian Language Technology
2. Speech analysis and speech variability modelling (E. Meister)
Main tasks:
- microprosody – acoustic and perceptual analysis of intrinsic durations and fundamental frequency
- macroprosody – temporal organisation and acoustic, lexical and syntactic features of spontaneous (lecture) speech
- acoustics and perception of foreign accent in Estonian
Projects funded by the National Programme for Estonian Language Technology
3. Speech resources and databases (E. Meister)
Main tasks:
- recording, segmentation and labelling of different speech corpora for acoustic studies and speech technology
- development of infrastructure for speech data storage, access and management
Under development:
- Accent corpus – recordings of Estonian spoken as a foreign language
- Corpus of lecture speech – recordings of academic lectures, public talks, conference presentations, etc.
- News corpus – recordings of radio news
Past projects
- SpeechDat-like Estonian speech database (2000-2003)
- Estonian Text-to-Speech Synthesiser (2000-2002) in co-operation with:
  - Institute of the Estonian Language
  - Filosoft Ltd.
- Dialogue interface to a theatre information database (2002-2004) in co-operation with Tartu University
Future plans
Audio-visual speech synthesis
- Classification of Estonian visemes (E. Liba at UT)
- A model of a talking head (M. Rei at TUT)
Co-operation with industry
- Several projects in the past:
  - EMT, ELION, Tele2 – SpeechDat database recordings
  - Skype – speech quality assessment
- Industry is ready to buy an (almost) complete solution, but is not willing to invest in research
- Don’t try to sell lab prototypes – there is a huge gap between a prototype and a real application
- To cover the gap, a lot of funding for development is needed