
SLIDE 1

From phonetics to speech technology

Einar Meister Laboratory of Phonetics and Speech Technology Institute of Cybernetics Tallinn University of Technology

SLIDE 2

Introduction

Spoken language communication
Progress in ASR
Multi-disciplinary approach
Projects at our lab
Some future plans
Co-operation

SLIDE 3

Complexity of spoken language processing

Human speech communication is … “the most sophisticated behaviour of the most complex organism in the known universe”

Prof R. Moore, University of Sheffield

SLIDE 4

Complexity of spoken language processing

There is a huge and diverse literature describing human speech processing behaviour
Many different disciplines are involved
Most knowledge is based on indirect observation
More is known about the peripheral auditory and articulatory systems than about the higher-level phonetic, linguistic and cognitive processes
Research is fragmented across different levels of human SLP
Models tend to address single aspects of human SLP behaviour
There is little integration between models
Many models are descriptive rather than computational

SLIDE 5

Progress in ASR

Substantial progress has taken place in the past 20 years

Dragon “NaturallySpeaking 10” Large Vocabulary Continuous Speech Recognition (LVCSR) is available in 11 languages:

American English
Australian English
Southern Asian English
Indian English
UK English
Teen English
Dutch
French
German
Italian
Spanish

Up to 99% Accurate and Three Times Faster than Typing

Price: $99-$349
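
Accuracy claims such as the one above are commonly expressed through word error rate (WER), with accuracy taken as roughly 1 − WER. The slides do not say how the 99% figure was measured, so the following is only a minimal sketch of the standard word-level edit-distance definition; the function name and the example strings are illustrative assumptions, not taken from the presentation or from any product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); whitespace tokenisation is
    an assumption and real evaluations normalise text more carefully.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.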

SLIDE 6

Progress in ASR

MS Windows Vista offers ASR in 8 languages:
English (United States)
English (United Kingdom)
German
French
Spanish
Japanese
Traditional Chinese
Simplified Chinese

http://www.youtube.com/watch?v=2Y_Jp6PxsSQ (July 29, 2006)
http://www.youtube.com/watch?v=KyLqUf4cdwc (February 10, 2007)
http://www.microsoft.com/enable/demos/windowsvista/speechdemo.aspx

SLIDE 7

Progress in ASR

Progress has NOT been achieved as a result of deep insights into SLP by humans

Improvements have come from:

extensive use of statistical learning algorithms (data-driven approach)

availability of a number of large collections of speech and text corpora

increase in computer power
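
As a rough illustration of the data-driven approach listed above, the sketch below estimates a bigram language model purely from counts in a text corpus, with add-one smoothing. It is a toy example under stated assumptions: the function names, the whitespace tokenisation and the three-sentence corpus in the usage note are hypothetical and are not taken from the slides or from any system they mention.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenised text."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # words that can be predicted, including end-of-sentence
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return histories, bigrams, vocab


def bigram_prob(prev, word, histories, bigrams, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))
```

Trained on the toy corpus `["speech recognition works", "speech synthesis works", "speech recognition improves"]`, the model assigns a higher probability to "recognition" than to "synthesis" after "speech", simply because that pair was seen more often. No linguistic knowledge is involved, which is the point the slide makes.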

SLIDE 8

Need for multi-disciplinary approach

[Diagram: SPOKEN LANGUAGE PROCESSING and its contributing fields]

Disciplines: ARTIFICIAL INTELLIGENCE, ENGINEERING, PSYCHOLOGY, LINGUISTICS
Areas: Computational Linguistics, Human-Computer Interaction, Pattern Processing, Psycho-Linguistics, Natural Language Proc., Information Retrieval, Cognitive Science, Dialogue
Also labelled: COGNITIVE INFORMATICS

SLIDE 9

Need for multi-disciplinary approach

Chin-Hui Lee (Georgia Institute of Technology, Atlanta, USA): From Knowledge-Ignorant to Knowledge-Rich Modelling: A New Speech Research Paradigm for Next Generation Automatic Speech Recognition (ICSLP 2004)

Knowledge-Ignorant Modelling – there’s no data like more data

Knowledge-Rich Modelling:

Sound-specific features – in addition to spectral (cepstral) features, other acoustic-phonetic features should be used: duration, loudness, F0, etc.

Keyword recognition and phrase verification
Human-like speech processing models

SLIDE 10

Laboratory of Phonetics and Speech Technology

Speech research at IoC since the 1960s; Laboratory of Phonetics and Speech Technology since 1990

Mission: research on Estonian phonetics and speech technology

Partner in:
eVikings 2 project (2002-2005)
NordForsk VISPP-network (2004-2005)
Doctoral School of Linguistics and Language Technology at the University of Tartu (2005-2008)
National Programme for Estonian Language Technology (2006-2010)

SLIDE 11

Laboratory of Phonetics and Speech Technology

Staff:
Einar Meister:
head of the laboratory, senior researcher
MSc (1998) in system engineering, PhD (2003) in general linguistics
experimental phonetics, speech synthesis, speech databases
Tanel Alumäe:
senior researcher, currently post-doc at LIMSI (France)
PhD (2006) in computer science
speech recognition, language modelling, spoken document retrieval, dialogue systems

Toomas Kirt:
researcher
PhD (2007) in computer science
data processing, neural networks, pattern recognition
Lya Meister:
researcher
MA (2004) in linguistics, doctoral student at Tartu University
experimental phonetics, foreign accent, speech corpora
Temporary staff: 2-3 (1 doctoral student)

SLIDE 12

Projects funded by the National Programme for Estonian Language Technology

1. Research and development of methods for Estonian speech recognition (T.Alumäe)

Main tasks:
determining optimal basic lexical units for Estonian LVCSR
development of statistical language modelling techniques
applying acoustic model adaptation techniques
delivering optimal solutions for development of medium-vocabulary speech recognition systems
development of methods and algorithms for large/unlimited vocabulary speech recognition systems
implementation of speech recognition prototype systems

Current results:
software for automatic segmentation of speech signal
prototype for large vocabulary speech recognition

SLIDE 13

Projects funded by the National Programme for Estonian Language Technology

2. Speech analysis and speech variability modelling (E.Meister)

Main tasks:
microprosody – acoustic and perceptual analysis of intrinsic durations and fundamental frequency
macroprosody – temporal organisation and acoustic, lexical and syntactic features of spontaneous (lecture) speech
acoustics and perception of foreign accent in Estonian

SLIDE 14

Projects funded by the National Programme for Estonian Language Technology

3. Speech resources and databases (E.Meister)

Main tasks:
recording, segmentation and labelling of different speech corpora for acoustic studies and speech technology
development of infrastructure for speech data storage, access and management

Under development:
Accent corpus – recordings of Estonian spoken as a foreign language
Corpus of lecture speech – recordings of academic lectures, public talks, conference presentations, etc.
News corpus – recordings of radio news

SLIDE 15

Past projects

SpeechDat-like Estonian speech database (2000-2003)

Estonian Text-to-Speech Synthesiser (2000-2002) in co-operation with:
Institute of the Estonian Language
Filosoft Ltd.

Dialogue interface to a theatre information database (2002-2004) in co-operation with Tartu University

SLIDE 16

Future plans

Audio-visual speech synthesis

Classification of Estonian visemes (E.Liba at UT)
A model of a talking head (M.Rei at TUT)

SLIDE 17

Co-operation with industry

Several projects in the past:
EMT, ELION, Tele2 – SpeechDat database recordings
Skype – speech quality assessment

Industry is ready to buy an (almost) complete solution, but is not willing to invest in research

Don’t try to sell lab prototypes – there is a huge gap between a prototype and a real application

To cover the gap, a lot of funding for the development phase is necessary!

SLIDE 18

Thanks!