Spoken Language Structure, Hsin-min Wang (PowerPoint presentation)

SLIDE 1

Spoken Language Structure

Hsin-min Wang

References:

  • X. Huang et al., Spoken Language Processing, Chapter 2
SLIDE 2

Human Speech Communication

Spoken language is used to communicate information from a speaker to a listener. Speech production and perception (知覺) are both important components of the speech chain.

– Speech begins with a thought and an intent to communicate in the brain, which activates muscular (肌肉的) movements to produce speech sounds
– A listener receives the sound in the auditory system (聽覺系統) and converts it into neural signals (神經信號) that the brain can understand
– The speaker continuously monitors and controls the vocal organs (發聲器官) by receiving his or her own speech as feedback

SLIDE 3

Components of Human Speech Communication

[Figure: components of human speech communication. Speech generation: Message Formulation → Language System → Neuromuscular Mapping → Vocal Tract System → sound. Speech understanding: Cochlea (耳蝸) Motion → Neural Transduction → Language System → Message Comprehension. Corresponding system components: application semantics and actions; phone, word, and prosody; articulatory parameters; speech generation; speech analysis; feature extraction.]

SLIDE 4

Speech Generation

Message Formulation: creates the concept (message) to be expressed
Language System: converts the message into a sequence of words and determines

– the pronunciation of the words (i.e., the phoneme sequence)
– the prosodic pattern: the duration of each phoneme, the intonation (語調) of the sentence, and the loudness of the sounds

Neuromuscular (神經肌肉) Mapping: performs articulatory (發聲的) mapping to control the vocal cords (聲帶), lips (唇), jaw (下顎), tongue (舌), and velum (軟顎) to produce the sound sequence

SLIDE 5

Speech Generation - Explanations

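First, the speaker organizes his or her thoughts and decides on the message to convey (Message Formulation)
The message is turned into an appropriate linguistic form: suitable words are chosen and combined into phrases and sentences according to the rules of the language, to express the intended content (wording and sentence construction) (Language System)
Physiological nerve impulses travel along the motor nerves to the muscles of the vocal cords, tongue, lips, and other organs, driving their movements (Neuromuscular Mapping)
The air undergoes pressure changes which, shaped by the vocal tract, produce ordinary speech sound waves (Vocal Tract System)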

SLIDE 6

Speech Understanding

Cochlea (耳蝸) Motion: the signal is passed to the cochlea in the inner ear, which performs frequency analysis like a filter bank
Neural Transduction (神經傳導): converts the spectral signal into activity signals on the auditory nerve, corresponding roughly to a feature extraction component
It is unclear how neural activity is mapped into the language system and how message comprehension (理解) is achieved in the brain

SLIDE 7

From Sound to Phonetics and Phonology

Speech signals are composed of analog sound patterns that serve as the basis for a discrete, symbolic representation of the spoken language: phonemes (音素), syllables (音節), and words (詞)
The production and interpretation of these sounds are governed by the syntax (語法) and semantics (語意) of the language spoken
We will take a bottom-up approach to introduce the basic concepts from sound to phonetics (語音學) and phonology (音韻學)

– Syllables and words are followed by syntax and semantics, which form the structure of spoken language processing

Contents of this part:

– Sound and the human speech system (speech production and perception)
– Phonetics and phonology
– Characteristics of the Chinese language

SLIDE 8

Sound

Sound is a longitudinal (縱向的) pressure wave formed of compressions (壓縮) and rarefactions (稀疏) of air molecules (微粒), in a direction parallel to that of the application of energy
Compressions are zones where air molecules have been forced by the application of energy into a tighter-than-usual configuration
Rarefactions are zones where air molecules are less tightly packed

SLIDE 9

Sound (cont.)

The alternating configurations of compression and rarefaction of air molecules along the path of an energy source are sometimes described by the graph of a sine wave
The use of the sine graph is only a notational convenience for charting local pressure variations over time

[Figure: sine graph of air pressure; crest = maximal compression, trough = maximal rarefaction]

SLIDE 10

Measures of Sound

Amplitude is related to the degree of displacement of the molecules from their resting positions

– Amplitude is measured on a logarithmic scale in decibels (dB, 分貝)
– A decibel scale is a means of comparing the intensities (強度) of two sounds: $10\log_{10}(I_1/I_2)$ dB, where $I_1$ and $I_2$ are the two intensity levels
– The intensity is proportional to the square of the sound pressure P, so the same level difference can be written as $20\log_{10}(P_1/P_2)$ dB. The Sound Pressure Level (SPL) is a measure of the absolute sound pressure P in dB:

$$\text{dB SPL} = 20\log_{10}\left(\frac{P}{P_0}\right)$$

– The reference 0 dB corresponds to the threshold of hearing, which is $P_0 = 0.0002$ μbar (20 μPa) for a 1 kHz tone
  • e.g., conversational speech at 3 feet is about 60 dB SPL, and a jackhammer (手提鑿岩機) is about 120 dB SPL
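The two decibel formulas above can be checked numerically. This is a minimal sketch assuming pressures in pascals (1 μbar = 0.1 Pa, so $P_0$ = 0.0002 μbar = 20 μPa); the function names are illustrative only, not from any library.

```python
import math

P0_PA = 20e-6  # threshold-of-hearing reference: 0.0002 ubar = 20 uPa


def intensity_level_db(i1: float, i2: float) -> float:
    """Level difference between two intensities: 10 * log10(I1 / I2)."""
    return 10.0 * math.log10(i1 / i2)


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level re 20 uPa: 20 * log10(P / P0)."""
    return 20.0 * math.log10(pressure_pa / P0_PA)


if __name__ == "__main__":
    # Intensity grows with P**2, so doubling the pressure multiplies the
    # intensity by 4 and adds 10*log10(4), about 6 dB.
    print(round(intensity_level_db(4.0, 1.0), 2))  # 6.02
    print(round(spl_db(0.02)))   # 60: roughly conversational speech at 3 feet
    print(round(spl_db(20.0)))   # 120: roughly a jackhammer
```

Each factor of 10 in pressure adds 20 dB, which is why the range from the threshold of hearing to a jackhammer (a million-fold pressure ratio) fits into 120 dB.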

SLIDE 11

Measures of Sound (cont.)

Absolute threshold of hearing: the maximum amount of energy of a pure tone that cannot be detected by a listener in a noise-free environment

[Figure labels: sonic boom (音爆), gun muzzle (砲口)]

SLIDE 12

Speech Production – Articulation

Speech is produced by air-pressure waves emanating (發出) from the mouth and the nostrils (鼻孔) of a speaker
The inventory (清單) of phonemes (音素), the basic units of speech, can be split into two classes

– Consonants (子音): articulated (發音) in the presence of constrictions (壓縮) in the throat or obstructions (阻礙) in the mouth (tongue, teeth, lips) as we speak
– Vowels (母音/元音): articulated without major constrictions and obstructions
SLIDE 13

Speech Production – Articulation (cont.)

  • Lungs (肺): source of air during speech
  • Vocal cords (larynx, 喉頭): when the vocal folds (聲帶) are held close together and oscillate against one another during a speech sound, the sound is voiced; when the folds are too slack or too tense to vibrate periodically, the sound is unvoiced
  • Soft palate (velum, 軟顎): allows passage of air through the nasal cavity (e.g., /m/, /n/)
  • Hard palate (硬顎): the tongue is placed on it to produce certain consonants
  • Teeth: brace (支撐) the tongue for certain consonants
  • Lips: rounded or spread to affect vowel quality, or closed completely to stop the oral air flow for certain consonants (/p/, /b/, /m/); rounded vowel: /u/, spread vowel: /i/

SLIDE 14

Speech Production - The Voicing Mechanism

Voiced sounds

– Voiced sounds, including vowels, have a roughly regular pattern in both time and frequency structure that voiceless sounds lack
– Voiced sounds have more energy
– When the vocal folds vibrate during phoneme articulation, the phoneme is considered voiced; otherwise it is unvoiced
  • Vocal fold vibration ranges from about 60 Hz (men) to about 300 Hz (women or children)
– The distinct vowel timbres (timbre: the characteristic quality of a sound, 音質/音色) are created by using the tongue and lips to shape the main oral resonance cavity in different ways
SLIDE 15

Speech Production - The Voicing Mechanism (cont.)

Voiced sounds (cont.)

– The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency (基頻)
  • The fundamental frequency contributes more than any other single factor to the perception of pitch in speech
  • It is used in tone recognition for tonal languages (e.g., Chinese) and as a measure of speaker identity or authenticity

[Figure: voiced speech waveform with a fundamental frequency of about 120 Hz, i.e., a period of about 8 ms]
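Because the fundamental frequency is the rate at which the vocal folds open and close, one common way to estimate it is to find the lag at which a voiced frame best matches a shifted copy of itself. The sketch below uses plain autocorrelation with an assumed 60-400 Hz search range (slightly wider than the 60-300 Hz quoted earlier) on a synthetic 120 Hz signal; a practical pitch tracker would add voicing decisions and error correction.

```python
import numpy as np


def estimate_f0_autocorr(frame: np.ndarray, fs: float,
                         fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate F0 (Hz) of a voiced frame from its autocorrelation peak."""
    x = frame - np.mean(frame)
    # r[k] = sum_n x[n] * x[n + k] for lags k >= 0
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)          # shortest period considered (samples)
    lag_max = int(fs / fmin)          # longest period considered (samples)
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.1 * fs)) / fs
    # Synthetic "voiced" signal: 120 Hz fundamental plus decaying harmonics.
    signal = sum(np.sin(2 * np.pi * k * 120 * t) / k for k in range(1, 6))
    f0 = estimate_f0_autocorr(signal, fs)
    print(f"F0 = {f0:.1f} Hz, period = {1000 / f0:.2f} ms")
```

The estimate is quantized to whole-sample lags, so at 16 kHz a 120 Hz voice comes out near, not exactly at, 120 Hz.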

SLIDE 16

Speech Production - Pitch

[Figure on pitch; glossed terms: psychoacoustician (心理聲學家), subtle (細微的)]

SLIDE 17

Speech Production - Spectrogram

[Figure panels: spectral analysis at a single time point; spectrogram]
A spectrogram is a short-term frequency analysis of speech displayed over time
The darkness or lightness of a band indicates the relative amplitude or energy present at a given frequency
The dark horizontal bands show the formants, which are the harmonics of the fundamental frequency at the natural resonances of the vocal tract cavity position
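A spectrogram is built by repeating a short-term spectral analysis frame by frame. This minimal sketch assumes 25 ms Hamming-windowed frames with a 10 ms hop and a 512-point FFT (common choices, not values given in the slides) and returns log magnitudes, so larger values correspond to the darker regions of a printed spectrogram.

```python
import numpy as np


def spectrogram_db(x: np.ndarray, fs: float, frame_ms: float = 25.0,
                   hop_ms: float = 10.0, n_fft: int = 512) -> np.ndarray:
    """Short-term log-magnitude spectra; rows are frames, columns are frequency bins."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return 20 * np.log10(magnitude + 1e-10)  # small floor avoids log(0)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs                    # one second of audio
    tone = np.sin(2 * np.pi * 1000 * t)       # a steady 1 kHz tone
    spec = spectrogram_db(tone, fs)
    bin_hz = fs / 512                         # 31.25 Hz per frequency bin
    peak_bins = spec.argmax(axis=1)
    print(spec.shape, peak_bins[0] * bin_hz)  # energy sits in the 1000 Hz bin
```

A steady tone shows up as a single horizontal band; in voiced speech the bands follow the harmonics and are strongest near the formants.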
SLIDE 18

Speech Production - Formants

The resonances (共振/共鳴) of the cavities that are typical of particular articulator configurations (e.g., the different vowel timbres) are called formants (共振峰)

SLIDE 19

Explanations for Speech Production

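The human speech organs can be divided into three major parts. Power organs: the respiratory system and its muscles

– Well-coordinated movements of the respiratory muscles push air from the chest cavity out through the larynx at a steady pressure, providing the power source for phonation.

Phonation organ: the vocal folds in the larynx

– The intrinsic laryngeal muscles and other surrounding muscles act together to set the vocal folds into a particular glottal configuration. When the airflow from the power source passes through the glottis, it sets the soft mucosa of the vocal folds into wave-like motion. Depending on the glottal configuration, the mucosa vibrates with a particular frequency and pattern, periodically interrupting the airflow and producing a pressure wave of compressions and rarefactions.
– The vocal folds are the sound-producing body itself and provide the main sound source for speech. Their vibration produces a series of impulses, a periodic wave whose spectrum contains many harmonic components at frequencies that are integer multiples of the fundamental frequency.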

SLIDE 20

Explanations for Speech Production (cont.)

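The three major parts of the human speech organs (cont.). Resonance (modulating) organs: the oral cavity, the nasal cavity, and the pharyngeal cavity (collectively called the vocal tract, 聲腔 or 聲道)

– The vocal tract is an air-filled tube with certain natural frequencies. When a harmonic of the pulses from the vocal folds is equal or close to one of these natural frequencies, resonance (共鳴) occurs and that harmonic component is reinforced. The spectrum of the speech radiated from the mouth therefore has peaks, called formants (共振峰), at the natural frequencies of the vocal tract; their frequencies are called formant frequencies.
– The articulation mechanism refers to the resonating and modulating effect of the vocal tract on the sound produced by the vocal folds; it is closely related to the timbre of speech.
– Changes in the vocal tract are caused mainly by the height and front/back position of the tongue, as shown in the vowel tongue-position charts commonly used in phonetics.
– The lips and teeth are the only speech organs visible from outside, and they can provide additional information for verbal communication.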
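The resonance argument above can be made concrete. As an assumption not taken from the slides, this sketch models the vocal tract as a uniform tube about 17.5 cm long, closed at the glottis and open at the lips, whose natural frequencies are the odd quarter-wave resonances (2n - 1)c/(4L); it then finds, for each resonance, the harmonic of a 120 Hz source that lies closest and would therefore be reinforced.

```python
def harmonics(f0: float, max_freq: float) -> list[float]:
    """Frequencies of the source harmonics: integer multiples of F0 up to max_freq."""
    return [k * f0 for k in range(1, int(max_freq // f0) + 1)]


def uniform_tube_resonances(length_m: float = 0.175, c: float = 350.0,
                            n: int = 3) -> list[float]:
    """Natural frequencies of a tube closed at one end: (2i - 1) * c / (4 * L)."""
    return [(2 * i - 1) * c / (4 * length_m) for i in range(1, n + 1)]


def nearest_harmonic(f0: float, resonance: float) -> tuple[int, float]:
    """Index and frequency of the harmonic closest to a given resonance."""
    k = max(1, round(resonance / f0))
    return k, k * f0


if __name__ == "__main__":
    f0 = 120.0  # assumed voice source
    for formant in uniform_tube_resonances():
        k, fk = nearest_harmonic(f0, formant)
        print(f"resonance {formant:.0f} Hz -> harmonic {k} at {fk:.0f} Hz")
```

The uniform tube corresponds only to a neutral vowel; moving the tongue and lips changes the cavity shapes and shifts these resonances, which is what distinguishes one vowel from another.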

SLIDE 21

Explanations for Speech Production (cont.)

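Behavior of the vocal tract when producing vowels and consonants

– There is no blockage in the vocal tract when producing a vowel. When producing a consonant, two parts of the vocal tract must form a blockage or obstruction; the blocked air is then released, leaking out through a narrow gap or bursting out suddenly, which produces noise.
– The timbre of a consonant depends directly on where the vocal tract is blocked and how the blockage is released.

The variety of sounds humans can produce arises mainly because the laryngeal muscles can form countless glottal configurations and the shape of the vocal tract can vary endlessly; these variations in resonance and articulation allow humans to produce a great variety of sounds. Producing non-speech vocal sounds is an innate human ability, whereas speech is learned.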

SLIDE 22

Speech Perception - Physiology of the Ear

The ear processes an acoustic pressure signal by

– First transforming it into a mechanical vibration pattern on the basilar membrane (基底膜)
– Then representing the pattern by a series of pulses to be transmitted by the auditory nerve

Physiology (生理機能) of the Ear

– When air pressure variations reach the eardrum (鼓膜) from the outside, it vibrates and transmits the vibrations to the bones adjacent to its opposite side
– The energy is then transferred by the mechanical action of the stapes (鐙骨) into an impression on the membrane stretching over the oval window (卵圓窗)
– The cochlea (耳蝸) can be roughly regarded as a set of filter banks whose outputs are ordered by location
  • Frequency-to-place transformation
SLIDE 23

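Speech Perception - Physiology of the Ear (cont.)

[Figure: anatomy of the ear, labeled: outer ear (外耳), middle ear (中耳), inner ear (內耳), malleus (錘骨), incus (砧骨), stapes (鐙骨), eardrum (鼓膜), Eustachian tube (耳咽管), cochlea (耳蝸), auditory nerve (聽神經), pinna (耳翼), oval window (卵圓窗), semicircular canals (半規管)]

The pinna channels the received sound into the external ear canal and amplifies it by about 4-6 dB.
The semicircular canals are filled with fluid and connected to the cochlea. They are one of the body's balance organs, telling us whether we are moving, in which direction, and where the body is in space.
The external ear canal is about 2.3-2.9 cm long; the adult average is about 2.5 cm.
The ear canal is about 0.7 cm in diameter and in most people is not straight.
The eardrum is very elastic; it picks up the vibrations of sound waves and passes them to the middle ear.
The three ossicles form the ossicular chain, which receives the eardrum's vibrations and transmits the sound to the oval window (inner ear).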

SLIDE 24

Speech Perception - Physiology of the Ear (cont.)

SLIDE 25

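Speech Perception - Physiology of the Ear (cont.)

Cochlea (耳蝸)

– It is part of the inner ear and the most important structure for hearing.
– The cochlea is a spiral structure filled with lymph fluid and protected by the hardest bone in the human body.
– The cochlea is about 5 cm long from base to apex; this is where the analysis of sound is performed.
– On top of the basilar membrane (基底膜) sits the end organ of hearing, called the Organ of Corti (柯氏器).
– The Organ of Corti contains hair cells, divided into inner and outer hair cells, which are covered with cilia.
– There is one row of about 3,500 inner hair cells and three rows of about 12,000 outer hair cells.
– The hair cells connect to the fibers of the auditory nerve, which number about 30,000.
– The hair cells respond to vibrations transmitted by the fluid in the scala media (中階) or to movements of the membrane covering them.
– The hair cells convert fluid motion and vibration into nerve impulses, which travel along the auditory nerve to the auditory area of the brain, where they are interpreted as "sound".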

SLIDE 26

Explanations for Speech Perception

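How hearing works:

  • 1. Sound is collected by the pinna and travels through the external ear canal to the eardrum.
  • 2. The eardrum receives the sound energy and converts it into mechanical energy, so the first energy conversion takes place at the eardrum.
  • 3. The eardrum passes this mechanical energy to the ossicular chain.
  • 4. The footplate of the stapes sits on the oval window and converts the mechanical energy into fluid energy; this is the second energy conversion.
  • 5. Energy in the scala vestibuli (前庭階) is passed to the scala media (中階); the movement of the fluid in the scala media moves the hair cells on the Organ of Corti.
  • 6. The scala media then converts the fluid energy into electrical energy; this is the third energy conversion.
  • 7. The hair cells stimulate the nerve cells at the base of the Organ of Corti, and these neural signals travel along the auditory nerve to the brain.
  • 8. Summary of energy conversions: outer ear (acoustic energy) → middle ear (mechanical energy) → inner ear (fluid and electrical energy)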
SLIDE 27

Speech Perception

Physical vs. Perceptual Attributes

One fundamental divergence between physical and perceptual qualities is the phenomenon of non-uniform equal loudness perception of tones of varying frequencies

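[Figure: physical vs. perceptual attributes; glossed terms: binaural (雙耳的), intensity (強度), loudness (響度), phase difference (相位差)]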

SLIDE 28

Speech Perception

Physical vs. Perceptual Attributes

Non-uniform equal loudness perception of tones of varying frequencies

– Tones of different pitch have different inherent perceived loudness
– The sensitivity of the ear varies with the frequency and the quality of the sound
– Hearing sensitivity reaches a maximum around 4000 Hz, which is near the first resonance frequency of the outer ear canal
SLIDE 29

Speech Perception

Physical vs. Perceptual Attributes (cont.)

Masking: when the ear is exposed to two or more different tones, it is a common experience that one tone may mask the others

– Masking is an upward shift in the hearing threshold of the weaker tone caused by the louder tone
– Pure tones close together in frequency mask each other more than tones widely separated in frequency
– A pure tone masks tones of higher frequency more effectively than tones of lower frequency
– The greater the intensity of the masking tone, the broader the range of frequencies it can mask

SLIDE 30

Speech Perception

Physical vs. Perceptual Attributes (cont.)

Masking (cont.)

[Figure panels: frequency masking; temporal masking]

Temporal masking: a weaker sound that occurs too close in time to a louder sound may not be perceived.

SLIDE 31

Speech Perception

Physical vs. Perceptual Attributes (cont.)

Binaural listening greatly enhances our ability to sense the direction of the sound source

– Time and intensity cues have different impacts at low and high frequencies
– Low-frequency sounds are lateralized mainly on the basis of interaural time differences
– High-frequency sounds are lateralized mainly on the basis of interaural intensity differences

The question of distinctive voice quality

– Speech from different people sounds different, e.g., because of different fundamental frequencies and different vocal-tract lengths
– The concept of timbre (音質) is defined as that attribute of auditory sensation by which a subject can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar

SLIDE 32

Speech Perception - Frequency Analysis

Researchers undertook psychoacoustic (心理聲學) experiments to derive frequency scales that attempt to model the natural response of the human perceptual system (the cochlea acts as a spectrum analyzer)

– The perceptual attributes of sounds at different frequencies may not be entirely simple or linear in nature

Western musical pitch is described in octaves (八度音程) and semitones (半音)

– A tone of frequency $f_1$ is said to be an octave above a tone of frequency $f_2$ if and only if $f_1 = 2 f_2$
– There are 12 semitones in an octave, so a tone of frequency $f_1$ is said to be a semitone above a tone of frequency $f_2$ if and only if $f_1 = 2^{1/12} f_2 \approx 1.05946 f_2$
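The octave and semitone definitions amount to a frequency ratio of 2 per octave and $2^{1/12}$ per semitone, so the interval between two tones is $12\log_2(f_1/f_2)$ semitones. A minimal sketch (function names are illustrative):

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)  # about 1.05946


def shift_semitones(f_hz: float, n: float) -> float:
    """Frequency n semitones above (n > 0) or below (n < 0) f_hz."""
    return f_hz * SEMITONE_RATIO ** n


def semitones_between(f1_hz: float, f2_hz: float) -> float:
    """Interval from f2 up to f1 in semitones: 12 * log2(f1 / f2)."""
    return 12 * math.log2(f1_hz / f2_hz)


if __name__ == "__main__":
    print(round(SEMITONE_RATIO, 5))                # 1.05946
    print(round(shift_semitones(440, 12), 6))      # 880.0: one octave above 440 Hz
    print(round(semitones_between(300, 150), 6))   # 12.0: doubling F0 is one octave
```

Because the scale is logarithmic, the same 12-semitone step spans 150 Hz between 150 and 300 Hz but 440 Hz between 440 and 880 Hz.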

SLIDE 33

Speech Perception - Frequency Analysis (cont.)

Fletcher's work (1940) pointed to the existence of critical (臨界) bands in the cochlear response

– The cochlea acts as if it were made up of overlapping filters whose bandwidth equals the critical bandwidth
– One class of critical-band scales is the Bark frequency scale (24 critical bands)
– By treating spectral energy over the Bark scale, a more natural fit with spectral information processing in the ear can be achieved
– The perceptual resolution is finer at lower frequencies
– The critical bands are continuous, such that a tone of any audible frequency always finds a critical band centered on it

$$b(f) = 13\arctan(0.00076 f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^2\right)$$

where f is in Hz and b is in Bark
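The Bark formula above is straightforward to evaluate. This sketch assumes f is in Hz and simply implements $b(f)$ as written; other Bark approximations exist, so values from different references may differ slightly.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark: 13*atan(0.00076 f) + 3.5*atan((f/7500)**2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 8000):
        print(f"{f:>5} Hz -> {hz_to_bark(f):5.2f} Bark")
    # The same 100 Hz step covers far more Bark at low frequencies,
    # i.e., the perceptual resolution is finer there.
    print(hz_to_bark(600) - hz_to_bark(500), hz_to_bark(5100) - hz_to_bark(5000))
```

Evaluating near the upper end of hearing gives roughly 24 Bark, consistent with the 24 critical bands mentioned above.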

SLIDE 34

Speech Perception - Frequency Analysis (cont.)

Bark Frequency Scale (cont.)

SLIDE 35

Speech Perception - Frequency Analysis (cont.)

Mel Frequency Scale (Mel): linear below 1 kHz and logarithmic above

– Models the sensitivity of the human ear
– Mel: a unit of measure of the perceived pitch or frequency of a tone

Stevens and Volkmann (1940)

– Arbitrarily chose the frequency 1000 Hz as "1000 mels"
– Listeners were then asked to change the physical frequency until the pitch they perceived was twice the reference, then 10 times, and so on; and then half the reference, 1/10, and so on
  • These pitches were labeled 2000 mels, 10000 mels, and so on; and 500 mels, 100 mels, and so on
– This determines a mapping between the real frequency scale (Hz) and the perceptual frequency scale (mel)
– The mel scale has been widely used in modern speech recognition systems

SLIDE 36

Speech Perception - Frequency Analysis (cont.)

Mel Frequency Scale (cont.)

$$\text{Mel}(f) = 1125 \ln\left(1 + \frac{f}{700}\right)$$
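The mel formula and its inverse make it easy to place filters uniformly on the perceptual scale, as mel-frequency feature extractors do. A minimal sketch using the $1125\ln(1 + f/700)$ form above; the filter count and frequency range in the usage example are arbitrary assumptions.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Mel(f) = 1125 * ln(1 + f / 700); works on scalars or NumPy arrays."""
    return 1125.0 * np.log1p(np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse mapping: f = 700 * (exp(Mel / 1125) - 1)."""
    return 700.0 * np.expm1(np.asarray(mel, dtype=float) / 1125.0)


def mel_spaced_centers(f_min: float, f_max: float, n_filters: int) -> np.ndarray:
    """Center frequencies (Hz) of n_filters equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels[1:-1])  # drop the two edge points


if __name__ == "__main__":
    print(float(hz_to_mel(1000.0)))                       # close to 1000 mels
    print(np.round(mel_spaced_centers(0.0, 8000.0, 10)))  # denser at low frequencies
```

Equal steps in mel become progressively wider steps in Hz, mirroring the roughly linear-then-logarithmic behavior described on the previous slide.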

SLIDE 37

Phonetics and Phonology

Phonetics (語音學): the study of speech sounds and their production, classification, and transcription
Phonology (音韻學): the study of the distribution and patterning of speech sounds in a language and of the tacit (內隱的、模糊的) rules governing pronunciation

SLIDE 38

Phonetics and Phonology – Phonemes (cont.)

Phoneme: a notation system to represent the phonetic phenomena that are crucial for meaning

– Like fingerprints, every speaker's vocal anatomy (組織/構造) is unique, and this makes for unique vocalization of speech sounds
– Language communication is based on commonality of form at the perceptual level

Phoneme vs. Phone

– In speech science, the term phoneme is used to denote any of the minimal units of speech sound in a language that can serve to distinguish one word from another
– The term phone is used to denote a phoneme's acoustic realization
– E.g., the English phoneme /t/ has two very different acoustic realizations in the words sat and meter; it is better to treat them as two different phones when building a spoken language system

We usually use the terms phoneme and phone interchangeably to refer to the speaker-independent and context-independent units of meaningful speech sound in spoken language processing research

SLIDE 39

Phonetics and Phonology – Phonemes (cont.)

– The set of phonemes will differ in realization across individual speakers
– But phonemes will always function systematically to differentiate meaning in words

SLIDE 40

Phonetics and Phonology - Vowels

The tongue shape and positioning in the oral cavity do not form a major constriction (阻塞) of air flow during vowel articulation

– Variations of tongue placement give each vowel its distinct character by changing the resonance (just as different sizes and shapes of bottles give rise to different acoustic effects when struck)
– The linguistically important dimensions of the tongue movements are generally the ranges [front <-> back] and [high <-> low]

F1 and F2

– The primary energy entering the pharyngeal (咽的) and oral cavities in vowel production vibrates at the fundamental frequency
– The major resonances of these two cavities for vowels are called F1 and F2, the first and second formants

  • Determined by the tongue placement and oral tract shape in vowels
  • Determine the characteristic timbre or quality of the vowel
SLIDE 41

Phonetics and Phonology - Vowels (cont.)

F1 and F2 (cont.)

– Vowels can be described by the relationship of F1 and F2 to one another
– F1 corresponds to the back or pharyngeal portion of the complete vocal tract
– F2 is determined more by the size and shape of the oral portion
  • The cavity from the glottis to the tongue extrusion (擠壓) is longer than the forward part of the oral cavity; since a longer cavity resonates at a lower frequency, F2 is higher than F1
  • Rounding the lips has the effect of extending the front-of-tongue cavity, thus lowering F2

See Table 2.5: Phoneme labels and typical formant values for vowels of English
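Because vowels are characterized largely by F1 and F2, a very simple classifier can assign a measured (F1, F2) pair to the nearest vowel prototype. The prototype values below are hypothetical round numbers chosen only to follow the pattern described above (/iy/ high front: low F1, high F2; /uw/ high back and rounded: low F1, low F2; /aa/ low: high F1); they are not the values of Table 2.5. Distances are measured on a logarithmic frequency scale, which is an assumed design choice.

```python
import math

# Hypothetical (F1, F2) prototypes in Hz, for illustration only.
VOWEL_PROTOTYPES = {
    "iy": (300.0, 2300.0),   # high front, lips spread: low F1, high F2
    "uw": (300.0, 900.0),    # high back, lips rounded: low F1, low F2
    "aa": (750.0, 1100.0),   # low back: high F1
    "ae": (650.0, 1700.0),   # low front
}


def classify_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the label of the prototype closest in log-frequency (F1, F2) space."""
    def distance(proto: tuple[float, float]) -> float:
        p1, p2 = proto
        return math.hypot(math.log2(f1_hz / p1), math.log2(f2_hz / p2))
    return min(VOWEL_PROTOTYPES, key=lambda v: distance(VOWEL_PROTOTYPES[v]))


if __name__ == "__main__":
    print(classify_vowel(280.0, 2250.0))  # a high-front measurement
    print(classify_vowel(320.0, 850.0))   # a high-back, rounded measurement
```

Real formant values vary with speaker, gender, and context, so practical systems learn such prototypes (or richer models) from labeled data rather than fixing them by hand.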

SLIDE 42

Phonetics and Phonology - Vowels (cont.)

Diphthongs (雙母音): a special class of vowels that combine two distinct sets of F1/F2 values

– As the articulators move, the initial vowel targets glide smoothly to the final configuration
– Since the articulators are working faster in the production of a diphthong, sometimes the ideal formant target values of the component vowels are not fully attained

SLIDE 43

Phonetics and Phonology - Vowels (cont.)

SLIDE 44

Phonetics and Phonology - Vowels (cont.)

[Figure: English vowels with example words: tie, ate, coin, foul, tool, book, feel, hit, ten, at, car, dog, book, go, you, ago]

SLIDE 45

Phonetics and Phonology - Vowels (cont.)

The tongue hump (彎曲) is the major actor in vowel articulation
The most important secondary vowel mechanism for English and many other languages is lip rounding

– E.g., /iy/ (see) and /uw/ (blue)
– When you say /iy/, your tongue is in the high/front position and your lips are flat, slightly open, and somewhat spread
– When you say /uw/, your tongue is in the high/back position and your lips begin to round out, ending in a more puckered (噘嘴的) position

SLIDE 46

Phonetics and Phonology - Consonants

[Figure: consonant classes: fricative (摩擦音), plosive (爆裂音), lateral liquid (側流音), retroflex liquid (捲舌流音), glide (滑音), nasal (鼻音); semivowel]

Consonants are characterized by significant constriction (壓縮) or obstruction (阻礙) in the pharyngeal (咽) and/or oral (口) cavities

– Some consonants are voiced; others are not
– Many consonants occur in pairs, i.e., the two members share the same configuration of articulators, and one member additionally has voicing which the other lacks (e.g., /s, z/)

SLIDE 47

Phonetics and Phonology – Consonants (cont.)

Plosive (or Stop, 爆裂音、破裂音)

– Consonants that involve complete blockage of the oral cavity
– E.g., /b, p/ /d, t/ /g, k/

Fricative (摩擦音)

– Consonants that involve nearly complete blockage of the oral cavity
– E.g., /s, z/

Affricate (爆擦音)

– A stop (e.g., /t/) followed by a fricative (e.g., /sh/); the two combine to make a unified sound with rapid phases of closure and continuancy (e.g., {t+sh} = ch, as in church)
– E.g., /jh/ (d+zh), /ch/ (t+sh)

SLIDE 48

Phonetics and Phonology – Consonants (cont.)

Nasal (鼻音)

– The oral cavity has significant constriction (by the tongue or lips); with the velar flap open, air passes through the nasal cavity
– E.g., /m, n/

Retroflex liquid (捲舌流音、捲舌音)

– The tip of the tongue is curled back slightly
– E.g., /r/

Lateral liquid (側流音、舌邊音)

– The airstream flows around the sides of the tongue
– E.g., /l/

Glide (滑音)

– Glides /y, w/ are basically the vowels /iy, uw/, whose initial position within the syllable requires them to be a little shorter and to lack the ability to be stressed (e.g., yes, well)

SLIDE 49

Phonetics and Phonology – Consonants (cont.)

Semivowel (半母音)

– The English phones that typically have voicing without complete obstruction or narrowing of the vocal tract are called semivowels
– Semivowels include the liquid group /l, r/ and the glide group /y, w/

Sonorant (響音)

– Semivowels + vowels

Voiced vs. Unvoiced (Voiceless)

– Voiced/unvoiced pairs of stops: /b, p/ /d, t/ /g, k/
– Voiced/unvoiced pairs of fricatives: /z, s/ (lazy, sit) /dh, th/ (then, thin) /zh, sh/ (genre, she)
– Voiced/unvoiced pairs of affricates: /jh, ch/ (edge, march)
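The voiced/unvoiced pairing rule (same articulator configuration, differing only in voicing) can be expressed as a lookup over a small feature table. The table is a hypothetical, simplified encoding: the manner classes follow the slides, while the place labels (labial, dental, alveolar, postalveolar, palatal, velar) are standard phonetic categories added here for illustration.

```python
from itertools import combinations

# phone -> (manner, place, voiced); a simplified, illustrative feature table
CONSONANTS = {
    "b": ("plosive", "labial", True),     "p": ("plosive", "labial", False),
    "d": ("plosive", "alveolar", True),   "t": ("plosive", "alveolar", False),
    "g": ("plosive", "velar", True),      "k": ("plosive", "velar", False),
    "z": ("fricative", "alveolar", True), "s": ("fricative", "alveolar", False),
    "dh": ("fricative", "dental", True),  "th": ("fricative", "dental", False),
    "zh": ("fricative", "palatal", True), "sh": ("fricative", "palatal", False),
    "jh": ("affricate", "palatal", True), "ch": ("affricate", "palatal", False),
    "m": ("nasal", "labial", True),       "n": ("nasal", "alveolar", True),
    "l": ("lateral liquid", "alveolar", True),
    "r": ("retroflex liquid", "postalveolar", True),
    "y": ("glide", "palatal", True),      "w": ("glide", "labial", True),
}

SEMIVOWEL_MANNERS = {"lateral liquid", "retroflex liquid", "glide"}


def voicing_pairs() -> list[tuple[str, str]]:
    """(voiced, unvoiced) pairs sharing manner and place but differing in voicing."""
    pairs = []
    for a, b in combinations(CONSONANTS, 2):
        ma, pa, va = CONSONANTS[a]
        mb, pb, vb = CONSONANTS[b]
        if (ma, pa) == (mb, pb) and va != vb:
            pairs.append((a, b) if va else (b, a))
    return sorted(pairs)


def semivowels() -> list[str]:
    """Liquids and glides, i.e., the semivowels."""
    return sorted(p for p, (m, _, _) in CONSONANTS.items() if m in SEMIVOWEL_MANNERS)


if __name__ == "__main__":
    print(voicing_pairs())
    print(semivowels())  # ['l', 'r', 'w', 'y']
```

Nasals, liquids, and glides come out unpaired because every entry in those classes is voiced, matching the slides' observation that only some consonants occur in voicing pairs.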

SLIDE 50

Phonetics and Phonology – Consonants (cont.)

SLIDE 51

Phonetics and Phonology – Consonants (cont.)

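[Figure: consonants by place of articulation: labial (唇音), labiodental (唇齒音), dental (齒音), alveolar (齒槽音), palatal (上顎音), velar (軟顎音), glottal (聲門的); example words: then, thin, she, genre]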

SLIDE 52

Phonetics and Phonology – Consonants (cont.)

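[Figure: Mandarin consonants (Zhuyin) and their places of articulation.
Nasals (the velum is lowered so that the nasal and oral cavities are connected): ㄇ, blockage at the lips; ㄋ, blockage between the tongue tip and the back of the teeth; ㄤ/ㄥ, blockage between the tongue root and the palate.
Stops: ㄅ, blockage at the lips; ㄉ, blockage between the tongue tip and the back of the teeth; ㄍ, blockage between the tongue root and the palate.
Fricatives: ㄙ, obstruction between the tongue tip and the back of the teeth; ㄕ, obstruction between the tongue tip and the front of the hard palate; ㄒ, obstruction between the tongue blade and the hard palate.]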

SLIDE 53

Phonetic Typology (語音的類型)

What is linguistically distinctive in one language could be less distinctive in other languages

– Length: Japanese vowels have a characteristic length distinction that can be hard for non-natives to perceive and use when learning the language
  • The words kado (corner) and kaado (card) are spectrally identical, differing only in their durations
  • Length is phonemically distinctive in Japanese
– Trilled r sound (顫音):
  • Distinctive in Spanish; in American English, only a non-lexical sound used by circus ringmasters
– Pitch:
  • A primary dimension of contrast that English lacks
  • Many Asian and African languages are tonal, e.g., Chinese
  • Tonal languages have lexical meaning contrasts cued by pitch; e.g., Mandarin Chinese has four primary tones

SLIDE 54

Phonetic Typology (cont.)

[Figure: the Mandarin phrase 語音實驗室 (speech laboratory)]

– Pitch: (cont.)
  • Though English does not make systematic use of pitch in its inventory of word contrasts, pitch is systematically varied in English to signal a speaker's emotions, intentions, and attitudes, and it has some linguistic function in signaling grammatical structure as well

SLIDE 55

The Allophone: Sound and Context

Phonetic units should be correlated with potential meaning distinctions

– mean /m iy n/ vs men /m eh n/

However, the fundamental meaning-distinguishing sound is often modified in some systematic way by its phonetic neighbors

– Coarticulation: the process by which neighboring sounds influence one another
– Allophone (音位變體、同位音): when the variations resulting from coarticulatory processes can be consciously perceived, the modified phonemes are called allophones
  • /p/ in pin (/p ih n/) produces a noticeable puff (噴出) of air, called aspiration (送氣), but loses its aspiration in spin (/s p ih n/)
  • A vowel before a voiced consonant, e.g., /d/, is typically longer than the same vowel before its unvoiced counterpart, /t/ (bad vs. bat)
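The two allophonic patterns above can be approximated by context-dependent rewrite rules over a phone sequence. This sketch is a deliberately simplified assumption: it aspirates /p, t, k/ only when they begin the word, so the /p/ of spin (preceded by /s/) stays unaspirated (real English aspiration depends on stress and syllable position), and it marks a vowel as lengthened when the next phone is a voiced obstruent. The `_h` and `:` suffixes are hypothetical notation.

```python
VOICELESS_STOPS = {"p", "t", "k"}
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "dh", "z", "zh", "jh"}
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw"}


def realize_allophones(phones: list[str]) -> list[str]:
    """Apply two simplified allophone rules to the phoneme sequence of one word."""
    out = []
    for i, ph in enumerate(phones):
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if ph in VOICELESS_STOPS and i == 0:
            out.append(ph + "_h")        # word-initial voiceless stop: aspirated
        elif ph in VOWELS and nxt in VOICED_OBSTRUENTS:
            out.append(ph + ":")         # vowel before a voiced obstruent: longer
        else:
            out.append(ph)
    return out


if __name__ == "__main__":
    for word in (["p", "ih", "n"], ["s", "p", "ih", "n"],
                 ["b", "ae", "d"], ["b", "ae", "t"]):
        print(" ".join(word), "->", " ".join(realize_allophones(word)))
```

Rules of this kind are one way to derive context-dependent phone labels from a context-independent phoneme transcription.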

SLIDE 56

The Allophone: Sound and Context (cont.)

SLIDE 57

Speech Rate and Coarticulation

Did you hit it to Tom?

  • 1. Palatalization (顎音化) of /d/ before /y/ in Did you
  • 2. Reduction of unstressed /u/ to schwa (語音含糊的母音, ə) in you
  • 3. Flapping of intervocalic (位於兩個母音之間的) /t/ in hit it
  • 4. Reduction of schwa and devoicing of /u/ in to
  • 5. Reduction of geminate (double consonant) /t/ in it to
SLIDE 58

Structural Features of Chinese Language

Not alphabetic
At least 10,000 commonly used characters (字)

– Almost all are morphemes (詞素) with their own meaning
– All are monosyllabic

An unlimited number of words (詞), at least 100,000 of them commonly used, each composed of one to several characters (字)

– The meaning of a word can be directly related, partly related, or even completely unrelated to the meanings of its component characters

書店 (bookstore), 大學 (university), 和尚 (Buddhist monk), 光棍 (bachelor)

Tonal language

– 4 lexical tones plus 1 neutral tone (Mandarin)

SLIDE 59

Structural Features of Chinese Language (cont.)

Only about 1,335 syllables (Mandarin)

– About 408 base-syllables if tone differences are disregarded (Mandarin)

A large number of homonym characters (同音字) share the same syllable
Monosyllabic structure of the Chinese language

– Each syllable stands for many characters with different meanings
– Combinations of syllables (characters) give an unlimited number of words, e.g., 電腦 (computer), 火山 (volcano)
– A small number of syllables carries a plurality (多重性) of linguistic information

Almost every character has its own meaning, thus playing some linguistic role independently
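Putting the counts on this and the previous slide together gives a rough sense of how heavily each syllable is loaded. Taking the quoted figures at face value (about 10,000 commonly used characters, about 1,335 tonal syllables, about 408 base-syllables, and 4 lexical tones plus the neutral tone), a back-of-the-envelope calculation is:

$$
\frac{10{,}000 \text{ characters}}{1{,}335 \text{ syllables}} \approx 7.5 \text{ characters per syllable},
\qquad
\frac{10{,}000}{408} \approx 24.5 \text{ characters per base-syllable}
$$

$$
408 \times 5 = 2{,}040 \text{ possible base-syllable/tone combinations},
\qquad
\frac{1{,}335}{2{,}040} \approx 65\% \text{ of them actually occur}
$$

These are averages only; they say nothing about how the characters are distributed across individual syllables.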

SLIDE 60

Structural Features of Chinese Language (cont.)

No word boundaries in a Chinese sentence

電腦科技的進步改變了人類的生活和工作方式 (Advances in computer technology have changed the way people live and work)

– Word segmentation is not unique
– Words are not well defined
– No commonly accepted lexicon exists
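One common baseline for segmenting such text is forward maximum matching: scan left to right and, at each position, take the longest lexicon word that matches, falling back to a single character. This sketch uses a tiny hypothetical lexicon built for the example sentence above; adding the compound 電腦科技 (computer technology) changes the segmentation, which illustrates why segmentation is not unique and depends on the lexicon. It is a simple illustration, not a method prescribed by the slides.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation using the longest matching lexicon entry."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:   # single characters are always accepted
                tokens.append(piece)
                i += n
                break
    return tokens


if __name__ == "__main__":
    sentence = "電腦科技的進步改變了人類的生活和工作方式"
    lexicon = {"電腦", "科技", "進步", "改變", "人類", "生活", "工作", "方式"}
    print("/".join(forward_max_match(sentence, lexicon)))
    print("/".join(forward_max_match(sentence, lexicon | {"電腦科技"})))
```

Any word missing from the lexicon falls apart into single characters, which is one face of the out-of-vocabulary problem discussed on the next slide.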

Open vocabulary nature with a flexible wording structure

– New words are easily created every day

  • 電 (electricity) + 腦 (brain)→電腦 (computer)

– Long word arbitrarily abbreviated

  • 臺灣大學 (Taiwan University) →臺大

– Name/title

  • 李登輝總統 (President T.H. Lee) →李總統登輝

– Unlimited number of compound words

  • 高 (high) + 速 (speed) + 公路 (highway) →高速公路(freeway)
SLIDE 61

Structural Features of Chinese Language (cont.)

Difficult for word-based approaches popularly used in alphabetic languages

– Serious out-of-vocabulary (OOV) problem

Phonetic structure of Mandarin syllables

– INITIAL / FINAL's

  • INITIAL: the initial consonant of a syllable
  • FINAL: the vowel part, including an optional nasal ending

– Phonemes

Different degrees of context dependency

– intra-syllable only
– intra-syllable plus inter-syllable
– right context dependent only
– both right and left context dependent
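The INITIAL/FINAL decomposition can be illustrated on romanized syllables. This sketch assumes Hanyu Pinyin spelling with a trailing tone digit (1-4, or 5 for the neutral tone), which the slides do not specify; it splits off the longest matching initial from the 21 standard Pinyin initials and treats the rest as the FINAL. Spellings with y/w (e.g., yang for iang) are not normalized, so it is only a rough sketch.

```python
PINYIN_INITIALS = [
    "zh", "ch", "sh",                      # two-letter initials, checked first
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s",
]


def split_syllable(syllable: str) -> tuple[str, str, int]:
    """Split e.g. 'zhuang1' into (INITIAL, FINAL, tone); a missing initial is ''."""
    base, tone = syllable[:-1], int(syllable[-1])
    for initial in PINYIN_INITIALS:
        if base.startswith(initial) and len(base) > len(initial):
            return initial, base[len(initial):], tone
    return "", base, tone                  # zero-initial syllable such as 'an'


if __name__ == "__main__":
    for s in ("zhuang1", "dian4", "nao3", "an4", "shi5"):
        print(s, split_syllable(s))
```

Dropping the tone digit gives the base-syllable, so a syllable inventory can be viewed as base-syllables times tones and each base-syllable as an INITIAL plus a FINAL.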

SLIDE 62

Structural Features of Chinese Language (cont.)

Examples

– 22 INITIAL's extended to 113 right-context-dependent INITIAL's
– 33 phone-like units extended to 145 intra-syllable right-context-dependent phone-like units, or 481 with both intra- and inter-syllable context dependency
– 4,606 triphones with intra- and inter-syllable context dependency

[Figure: hierarchy of Mandarin phonetic units: syllables (1,345); base-syllables (408); tones (4+1); FINAL's (37); INITIAL's (21); medials (3); nucleus (9); ending (2); consonants (21); vowels plus nasals (12); phones (31)]