
Spoken Language Structure Hsin-min Wang References: - X. Huang et al., Spoken Language Processing, Chapter 2

Human Speech Communication
• Spoken language is used to communicate information from a speaker to a listener. Speech production and perception are both important components of the speech chain
  – Speech begins with a thought and an intent to communicate in the brain, which activates muscular movements to produce speech sounds
  – A listener receives the sound in the auditory system, which processes it for conversion to neurological signals the brain can understand
  – The speaker continuously monitors and controls the vocal organs by receiving his or her own speech as feedback

Components of Human Speech Communication
[Figure: block diagram of the speech chain. Speech generation path: Message Formulation → Language System → Neuromuscular Mapping → Vocal Tract System → sound. Speech understanding path: sound → Cochlea Motion → Neural Transduction → Language System → Message Comprehension. Other labels in the figure: Application Semantics, Actions; Phone, Word, Prosody; Feature Extraction; Articulatory Parameter; Speech Analysis; Speech Generation; M, W, A.]

Speech Generation
• Message Formulation: creates the concept (message) to be expressed
• Language System: converts the message into a sequence of words
  – the pronunciation of the words (i.e., the phoneme sequence)
  – the prosodic pattern: duration of each phoneme, intonation of the sentence, and loudness of the sounds
• Neuromuscular Mapping: performs articulatory mapping to control the vocal cords, lips, jaw, tongue, and velum to produce the sound sequence

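Speech Generation - Explanations
• First, the speaker organizes his or her thoughts and decides on the content of the message to be conveyed (Message Formulation)
• The message is turned into an appropriate linguistic form: suitable vocabulary is chosen and, following the rules of the language, assembled into phrases and sentences that express the intended message (Language System)
• In the form of physiological nerve impulses, the commands travel along the motor nerves to the muscles of the vocal cords, tongue, lips, and other organs, driving those muscles to move (Neuromuscular Mapping)
• The air pressure changes and, shaped by the vocal cavity, produces the ordinary speech sound wave (Vocal Tract System)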

Speech Understanding
• Cochlea Motion: the signal is passed to the cochlea in the inner ear, which performs frequency analysis as a filter bank
• Neural Transduction: converts the spectral signal into activity signals on the auditory nerve, corresponding roughly to a feature extraction component
It is unclear how neural activity is mapped into the language system and how message comprehension is achieved in the brain

From Sound to Phonetics and Phonology
• Speech signals are composed of analog sound patterns that serve as the basis for a discrete, symbolic representation of the spoken language
  – phonemes, syllables, and words
• The production and interpretation of these sounds are governed by the syntax and semantics of the language spoken
• We will take a bottom-up approach to introduce the basic concepts from sound to phonetics and phonology
  – Syllables and words are followed by syntax and semantics, which form the structure of spoken language processing
• Contents of this part:
  – Sound and human speech systems (speech production and perception)
  – Phonetics and phonology
  – Characteristics of the Chinese language

Sound
• Sound is a longitudinal pressure wave formed of compressions and rarefactions of air molecules, in a direction parallel to that of the application of energy
• Compressions are zones where air molecules have been forced by the application of energy into a tighter-than-usual configuration
• Rarefactions are zones where air molecules are less tightly packed

Sound (cont.)
• The alternating configurations of compression and rarefaction of air molecules along the path of an energy source are sometimes described by the graph of a sine wave
• The use of the sine graph is only a notational convenience for charting local pressure variations over time
[Figure: sine wave of local pressure over time; labels: maximal compression, maximal rarefaction, crest, trough]

Measures of Sound
• Amplitude is related to the degree of displacement of the molecules from their resting position
  – Measured on a logarithmic scale in decibels (dB)
  – A decibel scale is a means for comparing the intensity of two sounds: $10 \log_{10}(I / I_0)$, where $I$ and $I_0$ are two intensity levels
  – The intensity is proportional to the square of the sound pressure $P$. The Sound Pressure Level (SPL) is a measure of the absolute sound pressure $P$ in dB: $\mathrm{SPL\,(dB)} = 20 \log_{10}(P / P_0)$
  – The reference 0 dB corresponds to the threshold of hearing, which is $P_0 = 0.0002\ \mu\text{bar}$ for a tone of 1 kHz
    • e.g. conversational speech at 3 feet is about 60 dB SPL; a jackhammer is about 120 dB SPL
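The two decibel formulas on this slide can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the only value taken from the slide is the reference pressure of 0.0002 μbar.

```python
import math

# Reference pressure from the slide: threshold of hearing for a 1 kHz tone.
P0_MICROBAR = 0.0002


def intensity_ratio_db(i, i0):
    """Level difference in dB between two intensities: 10 * log10(I / I0)."""
    return 10.0 * math.log10(i / i0)


def spl_db(p, p0=P0_MICROBAR):
    """Sound Pressure Level in dB: 20 * log10(P / P0)."""
    return 20.0 * math.log10(p / p0)


def pressure_from_spl(spl, p0=P0_MICROBAR):
    """Invert the SPL formula to recover the pressure (same unit as p0)."""
    return p0 * 10.0 ** (spl / 20.0)


if __name__ == "__main__":
    # Conversation (~60 dB SPL) and jackhammer (~120 dB SPL) from the slide.
    for label, level in [("conversation", 60.0), ("jackhammer", 120.0)]:
        print(label, level, "dB SPL ->", pressure_from_spl(level), "microbar")
```

Because intensity is proportional to the square of pressure, `intensity_ratio_db(p**2, p0**2)` and `spl_db(p, p0)` give the same value, which is why the factor changes from 10 to 20. The 60 dB gap between conversation and a jackhammer corresponds to a thousandfold pressure ratio.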

Measures of Sound (cont.)
• Absolute threshold of hearing: the maximum amount of energy of a pure tone that cannot be detected by a listener in a noise-free environment
[Figure annotations: sonic boom; gun muzzle]

Speech Production – Articulation
• Speech is produced by air-pressure waves emanating from the mouth and the nostrils of a speaker
• The inventory of phonemes, the basic units of speech, can be split into two classes
  – Consonants: articulated in the presence of constrictions in the throat or obstructions in the mouth (tongue, teeth, lips) as we speak
  – Vowels: articulated without major constrictions and obstructions

Speech Production – Articulation (cont.)
• Lungs: source of air during speech
• Vocal cords (larynx): when the vocal folds are held close together and oscillate against one another during a speech sound, the sound is voiced; when the folds are too slack or too tense to vibrate periodically, the sound is unvoiced
• Soft palate (velum): allows passage of air through the nasal cavity (m, n)
• Hard palate: the tongue is placed on it to produce certain consonants
• Teeth: brace the tongue for certain consonants
• Lips: rounded or spread to affect vowel quality (rounded: /u/; spread: /i/), and closed completely to stop the oral air flow for certain consonants (p, b, m)

Speech Production - The Voicing Mechanism
• Voiced sounds
  – Including vowels, have a roughly regular pattern in both time and frequency structures that voiceless sounds lack
  – Have more energy
  – When the vocal folds vibrate during phoneme articulation, the phoneme is considered voiced; otherwise it is unvoiced
    • Vocal fold vibration rate: from about 60 Hz (man) to 300 Hz (woman or child)
  – The distinct vowel timbres (tone qualities) are created by using the tongue and lips to shape the main oral resonance cavity in different ways

Speech Production - The Voicing Mechanism (cont.)
• Voiced sounds (cont.)
  – The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency
    • The fundamental frequency contributes more than any other single factor to the perception of pitch in speech
    • Used in tone recognition for tonal languages (e.g. Chinese) or as a measure of speaker identity or authenticity
[Figure annotation: fundamental frequency ~120 Hz (1 / 8 ms)]
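The relation between the vocal-fold cycle and the fundamental frequency can be illustrated with a small estimator. This is a minimal sketch, not a method given on the slides: it assumes a plain autocorrelation peak search, a synthetic test signal, and the 60 to 300 Hz search range quoted earlier for vocal fold vibration. All function names are hypothetical.

```python
import math


def frequency_from_period_ms(period_ms):
    """A cycle of period_ms milliseconds repeats 1000 / period_ms times per second."""
    return 1000.0 / period_ms


def synth_voiced(f0, sample_rate, n_samples):
    """Hypothetical test signal: a fundamental plus two weaker harmonics."""
    return [
        math.sin(2 * math.pi * f0 * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / sample_rate)
        + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / sample_rate)
        for n in range(n_samples)
    ]


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=300.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    The signal must be longer than sample_rate / f_min samples.
    """
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    print(frequency_from_period_ms(8.0))  # an 8 ms cycle
    signal = synth_voiced(125.0, 8000, 640)
    print(estimate_f0(signal, 8000))  # should land near 125 Hz
```

An 8 ms cycle corresponds to 125 Hz, which is consistent with the "~120 Hz" annotation on the slide. Real speech is only roughly periodic, so practical pitch trackers add windowing, normalization, and voicing decisions that this sketch leaves out.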

Speech Production - Pitch
[Figure omitted; slide glosses: "subtle", "psychoacousticians"]

Speech Production - Spectrogram
[Figure: spectrogram (a short-term frequency analysis), with a panel showing spectral analysis at a single time-point]
• The darkness or lightness of a band indicates the relative amplitude or energy present at a given frequency
• The dark horizontal bands show the formants, which are harmonics of the fundamental at the natural resonances of the vocal tract cavity position
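A spectrogram of the kind described here comes from repeating a short-term frequency analysis over successive frames. The sketch below is one simple way to do that using only the standard library; the Hamming window, the frame length, the hop size, and all function names are assumptions made for illustration, not details from the slides.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (bins 0 .. N/2) via a direct DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len, hop):
    """Short-term analysis: one magnitude spectrum per Hamming-windowed frame."""
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
        for i in range(frame_len)
    ]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        rows.append(dft_magnitudes(frame))
    return rows  # rows[time_index][frequency_bin]


if __name__ == "__main__":
    sample_rate = 8000
    # Hypothetical input: a steady 1 kHz tone.
    tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
    spec = spectrogram(tone, frame_len=64, hop=32)
    for row in spec:
        peak_bin = max(range(len(row)), key=row.__getitem__)
        print(peak_bin * sample_rate / 64, "Hz")
```

Each row is the "spectral analysis at a single time-point"; stacking the rows and drawing larger magnitudes darker gives the familiar picture. The direct DFT is used only to keep the example dependency-free; an FFT would be the usual choice for real signals.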

Speech Production - Formants
• The resonances of the cavities that are typical of particular articulator configurations (e.g. the different vowel timbres) are called formants

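Explanations for Speech Production
The human speech organs can be divided into three major parts
• Power organs: the respiratory system and its muscles
  – Well-coordinated movement of the respiratory muscles pushes the air in the chest cavity out through the larynx at a steady pressure, providing the power source for phonation.
• Phonation organ: the vocal folds in the larynx
  – Through the combined action of the intrinsic laryngeal muscles and other surrounding muscles, the vocal folds take on a particular glottal configuration. When the airflow from the power source passes through the glottis, it sets the soft vocal fold mucosa into wave motion. Depending on the glottal configuration at that moment, the mucosa produces waves of a particular frequency and shape, so that the passing airflow is interrupted at regular intervals, producing compression and rarefaction waves in the air.
  – The vocal folds are the sounding body itself and provide the main sound source for speech. The series of impulses produced by vocal fold vibration is a periodic wave whose spectrum contains many harmonic components, with frequencies that are integer multiples of the fundamental frequency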
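The statement that a periodic impulse train contains harmonics at integer multiples of the fundamental frequency can be checked with a direct DFT. The sketch below assumes an idealized train of unit impulses, which is a simplification of real glottal pulses, and the function names and parameter values are hypothetical.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude spectrum (bins 0 .. N/2) via a direct DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def harmonic_frequencies(period, n_periods, sample_rate, tol=1e-6):
    """Frequencies (Hz, excluding DC) where an impulse train has non-zero energy.

    `period` is the spacing between impulses in samples.
    """
    n = period * n_periods
    pulses = [1.0 if t % period == 0 else 0.0 for t in range(n)]
    mags = dft_magnitudes(pulses)
    return [k * sample_rate / n for k, m in enumerate(mags) if k > 0 and m > tol]


if __name__ == "__main__":
    # One impulse every 64 samples at 8 kHz, i.e. a 125 Hz fundamental.
    freqs = harmonic_frequencies(period=64, n_periods=4, sample_rate=8000)
    print(freqs[:5])
```

With these values every frequency that carries energy is a multiple of 125 Hz, and the bins in between are empty up to rounding error. In real speech the vocal tract then shapes this harmonic series, emphasizing the harmonics near its resonances.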
