Speech Processing
Course Number: 40967
Semester: 1397-2
Instructor: Hossein Sameti
Room CE706
sameti@sharif.edu
Home page: CE courses
Speech Processing:
– Review of DSP Concepts
– Review of Probability and Stochastic Processes
– Anatomy and Physiology of Speech Production System
– Phonemics and Phonetics
– Spectrogram Reading
– Linear Prediction Analysis
– Speech Coding and Compression
– Speech Synthesis (Text to Speech)
– Speech Quality Assessment (Subjective and Objective)
– Speech Recognition (Speech to Text)
– Speech Enhancement
Speech Processing:
Marking Scheme:
– Homeworks (written and programming): 20%
– Course Projects: 10%
– Quizzes: 15%
– Midterm: 25%
– Final Exam: 30%
Speech Processing:
Text:
– Spoken Language Processing, Huang, Acero, and Hon, 2000
– Introduction to Digital Speech Processing, Lawrence R. Rabiner and Ronald W. Schafer, 2007
– Discrete-Time Processing of Speech Signals, Deller, Proakis, and Hansen, 1993
– Fundamentals of Speech Recognition, Rabiner and Juang, 1993
Password for any documents for the course: 40967spring97
Aristotle: "Man is a speaking animal."
Old Speech Synthesizers
– Speech organ of Wheatstone, based on a system proposed by Wolfgang von Kempelen in 1791
Old Speech Synthesizers (cont’d)
– Speech organ of Joseph Faber (1830-40)
Old Speech Synthesizers (cont’d)
– Voder demonstrated in 1939
Source: http://www.ling.su.se/staff/hartmut/kemplne.htm
More modern labs
(ICP lab in Grenoble, France)
– Study of facial movements for inclusion in speech synthesis (and recognition).
Communication via Spoken Language
Virtues of Spoken Language
– Natural: Requires no special training
– Flexible: Leaves hands and eyes free
– Efficient: Has high data rate
– Economical: Communicated inexpensively
– Expressive: Conveys more than just words
– Popular/preferred: Verbal-acoustic problem solving
– Much longer evolution, compared to written language
Virtues of Spoken Language
Speech interfaces are ideal for information access and management when:
– The information space is broad and complex,
– The users cannot use their eyes to read text messages (they are not allowed to, not at ease doing so, or not able to),
– The users are technically naive, or
– Only telephones are available.
Diverse Sources of Constraint for Spoken Language Communication
– Acoustic: human vocal tract
– Phonetic: "let us pray" vs. "lettuce spray"
– Phonological: "gas shortage", "fish sandwich"
– Phonotactic: "sprachst" (German)
– Syntactic: "I am flying to Chicago tomorrow" vs. "tomorrow I flying Chicago am to"
– Semantic: "Is the baby crying" vs. "Is the bay bee crying"
– Contextual: "It is easy to recognize speech" vs. "It is easy to wreck a nice beach"
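As a rough illustration of how word-sequence constraints can separate two similar-sounding hypotheses such as the "recognize speech" / "wreck a nice beach" pair, here is a minimal sketch of a bigram language model with add-one smoothing. The tiny corpus, the function names, and the length-normalized score are all assumptions made for this example. They are not taken from the course material, and a real recognizer would combine such a score with acoustic evidence.

```python
import math
from collections import Counter

# Tiny hand-made corpus: purely illustrative, not real training data.
CORPUS = [
    "it is easy to recognize speech",
    "it is hard to recognize speech in noise",
    "machines can recognize speech",
    "it is easy to read text",
    "we walked on a nice beach",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, using <s> and </s> as sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def avg_log_prob(sentence, unigrams, bigrams):
    """Average add-one smoothed bigram log-probability per transition.

    Averaging keeps a longer sentence from being penalized for length alone.
    """
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(words, words[1:]))
    total = 0.0
    for prev, cur in pairs:
        # Counter returns 0 for unseen words or pairs, so smoothing handles them.
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total / len(pairs)

unigrams, bigrams = train_bigrams(CORPUS)
hypotheses = [
    "it is easy to recognize speech",
    "it is easy to wreck a nice beach",
]
scores = {h: avg_log_prob(h, unigrams, bigrams) for h in hypotheses}
best = max(scores, key=scores.get)
print(best)
```

With this toy corpus the first hypothesis scores higher, because its word pairs were seen in the corpus while "to wreck" and "wreck a" were not. A different corpus could reverse the ranking, which is the sense in which such constraints depend on context.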
A Conversational System Architecture