From phonetics to speech technology Einar Meister Laboratory of - - PowerPoint PPT Presentation



SLIDE 1

From phonetics to speech technology

Einar Meister
Laboratory of Phonetics and Speech Technology
Institute of Cybernetics
Tallinn University of Technology

SLIDE 2

Introduction

  • Spoken language communication
  • Progress in ASR
  • Multi-disciplinary approach
  • Projects at our lab
  • Some future plans
  • Co-operation

SLIDE 3

Complexity of spoken language processing

Human speech communication is … “the most sophisticated behaviour of the most complex organism in the known universe”

Prof. R. Moore, University of Sheffield

SLIDE 4

Complexity of spoken language processing

  • There is a huge and diverse literature describing human speech processing behaviour
  • Many different disciplines are involved
  • Most knowledge is based on indirect observation
  • More is known about the peripheral auditory and articulatory systems than about the higher-level phonetic, linguistic and cognitive processes
  • Research is fragmented across different levels of human SLP
  • Models tend to address single aspects of human SLP behaviour
  • There is little integration between models
  • Many models are descriptive rather than computational

SLIDE 5

Progress in ASR

  • Substantial progress has taken place in the past 20 years
  • Dragon NaturallySpeaking 10, a Large Vocabulary Continuous Speech Recognition (LVCSR) product, is available in 11 language versions:

  • American English
  • Australian English
  • Southern Asian English
  • Indian English
  • UK English
  • Teen English
  • Dutch
  • French
  • German
  • Italian
  • Spanish
  • Up to 99% accurate and three times faster than typing

  • Price: $99-$349
SLIDE 6

Progress in ASR

MS Windows Vista offers ASR in 8 languages:

  • English (United States)
  • English (United Kingdom)
  • German
  • French
  • Spanish
  • Japanese
  • Traditional Chinese
  • Simplified Chinese

http://www.youtube.com/watch?v=2Y_Jp6PxsSQ (July 29, 2006)

http://www.youtube.com/watch?v=KyLqUf4cdwc (February 10, 2007)

http://www.microsoft.com/enable/demos/windowsvista/speechdemo.aspx

SLIDE 7

Progress in ASR

Progress has NOT been achieved as a result of deep insights into human SLP

Improvements have come from:

  • extensive use of statistical learning algorithms (data-driven approach)
  • availability of large speech and text corpora
  • increase in computer power

SLIDE 8

Need for multi-disciplinary approach

[Diagram: SPOKEN LANGUAGE PROCESSING shown in relation to ARTIFICIAL INTELLIGENCE, ENGINEERING, PSYCHOLOGY and LINGUISTICS, together with the related areas Computational Linguistics, Human-Computer Interaction, Pattern Processing, Psycho-Linguistics, Natural Language Processing, Information Retrieval, Cognitive Science, Dialogue and COGNITIVE INFORMATICS]

SLIDE 9

Need for multi-disciplinary approach

Chin-Hui Lee (Georgia Institute of Technology, Atlanta, USA): “From Knowledge-Ignorant to Knowledge-Rich Modelling: A New Speech Research Paradigm for Next Generation Automatic Speech Recognition” (ICSLP 2004)

Knowledge-Ignorant Modelling: “there’s no data like more data”

Knowledge-Rich Modelling:

  • Sound-specific features: in addition to spectral (cepstral) features, other acoustic-phonetic features should be used, such as duration, loudness and F0

  • Keyword recognition and phrase verification
  • Human-like speech processing models
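
As a rough illustration of the acoustic-phonetic features named on this slide (duration, loudness, F0), the following Python sketch computes all three for one segment of speech samples. Everything in it is an assumption made for illustration: the function names, the use of RMS level in dB as a stand-in for loudness, the plain autocorrelation F0 estimator and the 60-400 Hz search range. None of it is taken from Lee's paper or from the laboratory's software.

```python
import math


def duration_seconds(samples, sample_rate):
    """Segment duration in seconds."""
    return len(samples) / sample_rate


def rms_level_db(samples):
    """RMS level in dB relative to full scale (a crude proxy for loudness)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 in Hz as the lag that maximises the raw autocorrelation.

    The sum is not normalised by the number of terms, so longer lags are
    slightly penalised; this helps avoid picking a multiple of the true period.
    Returns 0.0 if no lag in the search range has a positive score.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break
        score = sum(samples[i] * samples[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    # Synthetic test signal: a 125 Hz sine, 0.1 s long, sampled at 8 kHz.
    rate = 8000
    tone = [0.5 * math.sin(2 * math.pi * 125 * i / rate) for i in range(800)]
    print("duration (s):", duration_seconds(tone, rate))
    print("level (dB):", round(rms_level_db(tone), 2))
    print("F0 (Hz):", estimate_f0(tone, rate))
```

A real front end would compute such features frame by frame on voiced speech and would need a more robust pitch tracker; the sketch only shows that these features can be derived alongside the usual cepstral ones.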
SLIDE 10

Laboratory of Phonetics and Speech Technology

  • Speech research at IoC since the 1960s; Laboratory of Phonetics and Speech Technology since 1990
  • Mission: research on Estonian phonetics and speech technology
  • Partner in:
      • eVikings 2 project (2002-2005)
      • NordForsk VISPP-network (2004-2005)
      • Doctoral School of Linguistics and Language Technology at the University of Tartu (2005-2008)
      • National Programme for Estonian Language Technology (2006-2010)

SLIDE 11

Laboratory of Phonetics and Speech Technology

  • Staff:
  • Einar Meister:
      • head of the laboratory, senior researcher
      • MSc (1998) in system engineering, PhD (2003) in general linguistics
      • experimental phonetics, speech synthesis, speech databases
  • Tanel Alumäe:
      • senior researcher, currently post-doc at LIMSI (France)
      • PhD (2006) in computer science
      • speech recognition, language modelling, spoken document retrieval, dialogue systems
  • Toomas Kirt:
      • researcher
      • PhD (2007) in computer science
      • data processing, neural networks, pattern recognition
  • Lya Meister:
      • researcher
      • MA (2004) in linguistics, doctoral student at Tartu University
      • experimental phonetics, foreign accent, speech corpora
  • Temporary staff: 2-3 (1 doctoral student)
SLIDE 12

Projects funded by the National Programme for Estonian Language Technology

1. Research and development of methods for Estonian speech recognition (T. Alumäe)

Main tasks:

  • determining optimal basic lexical units for Estonian LVCSR
  • development of statistical language modelling techniques
  • application of acoustic model adaptation techniques
  • delivering optimal solutions for the development of medium-vocabulary speech recognition systems
  • development of methods and algorithms for large/unlimited vocabulary speech recognition systems
  • implementation of speech recognition prototype systems

Current results:

  • software for automatic segmentation of the speech signal
  • prototype for large vocabulary speech recognition

SLIDE 13

Projects funded by the National Programme for Estonian Language Technology

2. Speech analysis and speech variability modelling (E. Meister)

Main tasks:

  • microprosody: acoustic and perceptual analysis of intrinsic durations and fundamental frequency
  • macroprosody: temporal organisation and acoustic, lexical and syntactic features of spontaneous (lecture) speech
  • acoustics and perception of foreign accent in Estonian
SLIDE 14

Projects funded by the National Programme for Estonian Language Technology

3. Speech resources and databases (E. Meister)

Main tasks:

  • recording, segmentation and labelling of different speech corpora for acoustic studies and speech technology
  • development of infrastructure for speech data storage, access and management

Under development:

  • Accent corpus: recordings of Estonian spoken as a foreign language
  • Corpus of lecture speech: recordings of academic lectures, public talks, conference presentations, etc.
  • News corpus: recordings of radio news

SLIDE 15

Past projects

  • SpeechDat-like Estonian speech database (2000-2003)
  • Estonian Text-to-Speech Synthesiser (2000-2002), in co-operation with the Institute of the Estonian Language and Filosoft Ltd.
  • Dialogue interface to a theatre information database (2002-2004), in co-operation with Tartu University

SLIDE 16

Future plans

Audio-visual speech synthesis:

  • Classification of Estonian visemes (E. Liba at UT)
  • A model of a talking head (M. Rei at TUT)

SLIDE 17

Co-operation with industry

Several projects in the past:

  • EMT, ELION, Tele2: SpeechDat database recordings
  • Skype: speech quality assessment

Lessons learned:

  • Industry is ready to buy an (almost) complete solution but is not willing to invest in research
  • Don’t try to sell lab prototypes: there is a huge gap between a prototype and a real application
  • Bridging that gap requires substantial funding for the development phase!

SLIDE 18

Thanks!