Speech Information Processing, Akinori Ito, Graduate School of Engineering, Tohoku Univ. (PowerPoint presentation)

SLIDE 1

Sound Media Engineering part II

Speech Information Processing

Akinori Ito Graduate School of Engineering, Tohoku Univ. aito@fw.ipsj.or.jp

SLIDE 2

Overview of the lecture

  • #1: Production and coding of speech (1)
    – Speech production, features of speech sound
    – Basic codecs: PCM, DPCM, ADPCM
  • #2: Coding of speech (2)
    – Linear prediction of speech: Linear Prediction Coefficients, PARCOR coefficients and LSP
    – CELP coding
    – Audio coding
  • #3: Speech enhancement
    – Spectral subtraction
    – Microphone array

SLIDE 3

Production of speech

  • Organs that produce speech
    – vocal cords
    – larynx
    – pharynx
    – tongue
    – gums
    – teeth
    – lips
    – nasal cavity

(figure: these organs form the vocal tract)

SLIDE 4

Acoustic tube model

  • Human speech production is similar to wind instruments

(figure: vocal cords, vocal tract, larynx, lips, nasal cavity; they determine the pitch of the voice, the linguistic content, and the personality)

SLIDE 5

Linguistic and speaker feature

(figure: vocal cords, vocal tract, larynx, lips, nasal cavity)

A speaker can control the shape of this part

SLIDE 6

Linguistic and speaker feature

(figure: vocal cords, vocal tract, larynx, lips, nasal cavity)

A speaker cannot control the shape of this part (the total length of the vocal tract)

SLIDE 7

Speech waveform

  • Speech waveforms are quite complex

(figure: waveforms of /a/ /i/ /u/ /o/ /e/)

SLIDE 8

Speech waveform

  • It is complex, but almost periodic

Fundamental period T [s]; fundamental frequency F0 = 1/T [Hz]

SLIDE 9

Various "a"

  • Two /a/'s with different fundamental frequencies

– Same phone = same vocal tract shape
– Completely different waveforms
– What is the same between these waveforms?

SLIDE 10

Spectrum of speech

  • Spectrum of two /a/'s

– Spectral shapes are similar → shape of the vocal tract
– "Jaggies" of the spectrum differ → fundamental frequency

SLIDE 11

Spectrum and formant frequencies

  • F0: fundamental frequency
  • F1, F2, …: formant frequencies

(figure: spectrum with the fundamental frequency F0 and the formant frequencies F1, F2, F3, F4 marked)

SLIDE 12

Speech coding

  • Sound (analog) → convert to digital data
    – Handle with computers
    – Transmit over digital lines
  • How do we digitize sound?
    – Goals
      • Good quality when converted back to analog sound
      • Lower bit-rate
    – Methodology
      • Exploit various features of speech
SLIDE 13

Basics of speech coding

  • Sampling
    – Observe the temporally continuous signal at discrete times
    – Rate of the discrete observation: sampling frequency fs
    – The original signal can be restored from the sampled data when the original signal contains only frequency components below fs/2 (sampling theorem)

SLIDE 14

Basics of speech coding

  • Quantization
    – Round off the magnitude of the signal to discrete levels
      • The magnitude of the signal can then be represented as an integer
    – The spacing of the discrete levels: quantization step
    – The difference between the original signal and the quantized signal: quantization error

SLIDE 15

Sampling and quantization: how are they determined?

  • The sampling frequency is determined by the highest frequency in the sound
    – Telephone: 8 kHz (up to 4 kHz sound)
    – High-quality speech: 16 kHz (up to 8 kHz sound)
    – CD: 44.1 kHz (up to 22.05 kHz sound)
  • Quantization is determined by the dynamic range of the sound
    – To code speech is to quantize speech

SLIDE 16

PCM coding

  • PCM (Pulse Code Modulation)
    – Represent the quantized values as binary numbers
  • What must be determined in PCM
    – How many bits to use for one sample
    – How to set the quantization levels
      • Equal steps: linear quantization
      • Unequal steps: nonlinear quantization
  • Examples of PCM coding
    – CD: 16-bit linear quantization
    – VoIP (G.711): 8-bit nonlinear quantization

SLIDE 17

PCM linear quantization

  • There is nothing difficult about it

(figure: a waveform and its sample values rounded to integer levels)

CD: quantize to 16 bits (−32768 to +32767)
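As a worked illustration of linear quantization, here is a minimal Python sketch; the function names and the round-to-nearest-level rule are our own, and the 16-bit range follows the CD example above:

```python
def quantize_linear(x, bits=16):
    # Map x in [-1.0, 1.0) to an integer level; the quantization step is 2 / 2^bits
    levels = 2 ** bits
    step = 2.0 / levels
    q = int(round(x / step))
    # clip to the representable range, e.g. -32768..+32767 for 16 bits
    return max(-(levels // 2), min(levels // 2 - 1, q))

def dequantize_linear(q, bits=16):
    # Convert the integer level back to an amplitude
    return q * (2.0 / 2 ** bits)
```

The round trip `dequantize_linear(quantize_linear(x))` differs from `x` by at most half a quantization step, which is exactly the quantization error of the previous slide.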

SLIDE 18

Nonlinear quantization

  • Most samples are nearly zero
    → The total error can be reduced by quantizing values near zero more finely

(figure: linear vs nonlinear spacing of quantization levels on the same waveform)
SLIDE 19

Example of nonlinear quantization: G.711

  • Speech coding for the 64 kbit/s digital phone line
    – 8 kHz sampling, 8-bit nonlinear quantization
    – μ-law (Japan, US), A-law (Europe)
    – μ-law: 14-bit linear quant. → 8-bit nonlinear quant.

Y = 128 sign(X) log(1 + 255|X| / 8192) / log 256

(figure: the 8-bit μ-law code (0 to about 150) plotted against the 14-bit linear input (0 to 8192))
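The companding curve can be transcribed directly; this is a sketch of the 14-bit-to-continuous mapping (a real G.711 encoder then rounds the result to an 8-bit integer code):

```python
import math

def mu_law(x):
    # Y = 128 * sign(X) * log(1 + 255|X|/8192) / log(256), X in [-8192, 8191]
    sign = -1.0 if x < 0 else 1.0
    return sign * 128.0 * math.log(1.0 + 255.0 * abs(x) / 8192.0) / math.log(256.0)
```

Small inputs get fine resolution: inputs up to 32 (of 8192) already span about 16 of the 128 magnitude levels.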

SLIDE 20

Differential PCM (DPCM)

  • In an ordinary speech signal, the values of two contiguous samples do not differ very much
    → Reduce the bit-rate by transmitting the differences between samples

(figure: DPCM block diagram with a delay element z⁻¹ and a quantizer Q)
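The idea can be sketched as follows, with a toy uniform quantizer of the differences (the step size and function names are illustrative):

```python
def dpcm_encode(samples, step=2):
    # Transmit quantized differences between contiguous samples
    codes, prev = [], 0
    for s in samples:
        d = s - prev
        c = int(round(d / step))   # crude uniform quantizer of the difference
        codes.append(c)
        prev = prev + c * step     # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=2):
    out, prev = [], 0
    for c in codes:
        prev = prev + c * step
        out.append(prev)
    return out
```

Note the encoder predicts from its own reconstruction, exactly as the decoder does, so quantization errors stay bounded instead of accumulating across samples.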

SLIDE 21

Differential PCM(DPCM)

  • Original waveform vs. differential waveform

(figure: the differential waveform has a much smaller dynamic range than the original)
SLIDE 22

Adaptive Differential PCM(ADPCM)

  • To enhance the efficiency of DPCM
    – Use more sophisticated prediction rather than the simple difference
    – Adaptively change the quantization step
      • When the difference between two samples is large, the difference to the next sample is likely to be large too
      • When the difference between two samples is small, the difference to the next sample is likely to be small too

SLIDE 23

Block diagram of ADPCM

(figure: ADPCM encoder/decoder block diagram: the PCM input x(k) minus the prediction xe(k) gives the differential signal d(k); an adaptive quantizer produces the ADPCM output I(k); an adaptive de-quantizer gives the quantized differential signal dq(k); adding the prediction gives the reconstructed signal xr(k), which feeds the predictor)

SLIDE 24

Calculation algorithm of ADPCM

1. Compute the prediction signal: xe(k)
2. Compute the difference: d(k) = x(k) − xe(k)
3. Quantize (ADPCM output): I(k) = Q(d(k))
4. De-quantize: dq(k) = Q⁻¹(I(k))
5. Reconstruct the signal: xr(k) = xe(k) + dq(k)
6. Compute the next prediction: xe(k+1) = pred(xr(k), dq(k), …)

SLIDE 25

Prediction of speech signal

  • ADPCM quantizes the difference between the input signal and the predicted signal
  • How to predict the signal
    – DPCM: xe(k) = xr(k−1)
    – A little better: xe(k) = 2 xr(k−1) − xr(k−2)
    – G.726: xe(k) = Σ_{i=1}^{2} a_i xr(k−i) + Σ_{i=1}^{6} b_i dq(k−i)

SLIDE 26

Determine quantization step adaptively (example)

  • Observe the difference from the previous sample using the scale
  • If the difference falls in the "blue" (inner) region, halve the size of the next scale
  • If the difference falls in the "red" (outer) region, double the size of the next scale

(figure: two 8-level scales, one halved and one doubled)

SLIDE 27

For high-efficiency speech coding

  • PCM, DPCM, and ADPCM encode general sound signals
    – DPCM and ADPCM partly exploit properties of the input signal
  • Human speech is a small part of all sound signals
    → We can enhance coding efficiency by considering the properties of human speech
  • What are the properties of human speech?
SLIDE 28

High-level speech coding

(figure: coding hierarchy from digital data through speech features, phones, words/sentences, to semantics; AD/DA and PCM coders (public phone) work at the data level, CELP coders (mobile phone) and vocoders at the feature level, speech synthesis / Text-to-Speech at higher levels, and a "summarizing telephone" at the semantic level is under research)

SLIDE 29

Speech production model

(figure: vocal cords, larynx, vocal tract, nasal cavity, lips)

X(ω) = S(ω) T(ω) R(ω)

S(ω): source (vocal cords), T(ω): vocal tract (larynx, nasal cavity), R(ω): radiation (lips)

SLIDE 30

Speech production model

(figure: the source spectrum S(ω))

SLIDE 31

Speech production model

(figure: S(ω) shaped by the vocal tract and radiation, T(ω) R(ω))

SLIDE 32

Modeling speech using parameters

  • Modeling speech using linear prediction (LPC)
    – Spectral shape: parameters of the linear prediction filter
    – Vocal cord vibration: residual
    – In the spectral domain:

x(k) = −Σ_{i=1}^{p} a_i x(k−i) + e(k)

X(ω) = E(ω) / (1 + Σ_{n=1}^{p} a_n e^{−jnω}) = E(ω) H(ω) ≈ S(ω) T(ω) R(ω)

Estimate the coefficients so as to minimize the residual

SLIDE 33

Analysis and transmission of speech by LPC

  • Information to be transmitted
    – LP coefficients a_i and residual e(k)
  • How to transmit them?
    – Estimate a_i for a fixed number of samples (a block)
    – Calculate e(k) using the estimated a_i
    – Transmit a_i and e(k) as parameters of the block
  • How to restore the signal?
    – Use the LPC formula: x(k) = −Σ_{i=1}^{p} a_i x(k−i) + e(k)

SLIDE 34

Estimation of LP coefficients

  • How to estimate the LP coefficients from x(1)…x(k)
    – Solve a system of simultaneous equations (the Yule-Walker equation)
      → the LP coefficients are the least-error solution
    – A faster algorithm exists (the Levinson-Durbin algorithm)
  • LPC equation in matrix form:

−F A = V − E, where row m (for m = k, k−1, …, p) reads −Σ_{i=1}^{p} a_i x(m−i) = x(m) − e(m), with A = (a_1, …, a_p)ᵀ, V = (x(k), x(k−1), …, x(p))ᵀ, and E = (e(k), e(k−1), …, e(p))ᵀ

SLIDE 35

Estimation of LP coefficients

  • Least-squares solution: minimize |E|² → minimize |F A + V|²
  • Equation to be solved:

Fᵀ F A = −Fᵀ V, with (Fᵀ F)_{ij} = φ_{ij} and (Fᵀ V)_j = φ_{0j}

(φ_{ij}) (a_1, …, a_p)ᵀ = −(φ_{01}, …, φ_{0p})ᵀ : the Yule-Walker equation

SLIDE 36

Estimation of LP coefficients

  • Elements of the Yule-Walker equation:

φ_{ij} = Σ_{n=p}^{N−1} y(n−i) y(n−j)

  • Solve the equation directly, or
  • Fast algorithm (autocorrelation method), valid for N ≫ p:

φ_{ij} ≈ r(|i−j|) = Σ_{n=0}^{N−|i−j|−1} y(n) y(n+|i−j|)

    – The matrix then has a special form (symmetric Toeplitz matrix)
    – A quick solution algorithm exists (the Levinson-Durbin algorithm)

SLIDE 37

Analysis and transmission of speech by LPC

  • Problem
    – Re-synthesis by the LPC formula can be unstable
      → the output signal may oscillate when the a_i contain quantization errors
  • Solution
    – Transmit parameters that are equivalent to LPC but stable against quantization error
      • PARCOR coefficients
      • LSP coefficients
SLIDE 38

LPC and PARCOR coefficients

  • PARCOR (partial correlation) coefficients:

k_i = Σ_n f^{(i−1)}(n) b^{(i−1)}(n) / √( Σ_n f^{(i−1)}(n)² · Σ_n b^{(i−1)}(n)² )

(sums over n from −∞ to ∞)

Forward prediction error: f^{(i−1)}(n) = x(n) + Σ_{j=1}^{i−1} a_j^{(i−1)} x(n−j)
Backward prediction error: b^{(i−1)}(n) = x(n−i) + Σ_{j=1}^{i−1} b_j^{(i−1)} x(n−j)

The PARCOR coefficient is the correlation of the forward and backward prediction errors.

SLIDE 39

PARCOR coefficients

(figure: forward prediction of x(n) and backward prediction of x(n−i), both from the samples x(n−i+1) … x(n−1); k_i is the correlation of the two prediction errors)

SLIDE 40

PARCOR and LPC

k_1 = Σ_n f^{(0)}(n) b^{(0)}(n) / √( Σ_n f^{(0)}(n)² · Σ_n b^{(0)}(n)² ), with f^{(0)}(n) = x(n), b^{(0)}(n) = x(n−1)

k_1 is the correlation coefficient between x(n−1) and x(n). As x(n−1) and x(n) have the same variance and zero mean,

x̂(n) = k_1 x(n−1), so a_1^{(1)} = −k_1
x̂(n−1) = k_1 x(n), so b_1^{(1)} = −k_1

SLIDE 41

PARCOR and LPC

k_2 = Σ_n f^{(1)}(n) b^{(1)}(n) / √( Σ_n f^{(1)}(n)² · Σ_n b^{(1)}(n)² )

where f^{(1)}(n) = x(n) − k_1 x(n−1) and b^{(1)}(n) = x(n−2) − k_1 x(n−1).

Predicting f^{(1)}(n) from b^{(1)}(n): f̂^{(1)}(n) = k_2 b^{(1)}(n) = −k_1 k_2 x(n−1) + k_2 x(n−2), so

x̂(n) = k_1 (1 − k_2) x(n−1) + k_2 x(n−2)

a_1^{(2)} = −k_1 (1 − k_2) = a_1^{(1)} − k_2 b_1^{(1)},  a_2^{(2)} = −k_2

Similarly, b_2^{(2)} = b_1^{(1)} − k_2 a_1^{(1)},  b_1^{(2)} = −k_2

SLIDE 42

PARCOR and LPC

In general,

a_j^{(i)} = a_j^{(i−1)} − k_i b_j^{(i−1)}
b_j^{(i)} = b_{j−1}^{(i−1)} − k_i a_{j−1}^{(i−1)}

(with the conventions a_0^{(i)} = 1 and b_i^{(i−1)} = 1 for the leading terms)

We can calculate the LP coefficients using this recurrence relation.
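The recurrence above can be sketched compactly: since the backward predictor of order i−1 is the reversed forward one (b_j^{(i−1)} = a_{i−j}^{(i−1)}), one array suffices. The function name is ours:

```python
import math

def parcor_to_lpc(ks):
    # Convert PARCOR coefficients k_1..k_p into LP coefficients a_1..a_p in
    # the slides' convention: a_i^{(i)} = -k_i and
    # a_j^{(i)} = a_j^{(i-1)} - k_i * a_{i-j}^{(i-1)}  for j = 1..i-1
    a = []
    for i, k in enumerate(ks, start=1):
        a = [a[j - 1] - k * a[i - 1 - j] for j in range(1, i)] + [-k]
    return a
```

For p = 2 this reproduces the previous slide: a_1 = −k_1(1 − k_2), a_2 = −k_2.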

SLIDE 43

LPC and LSP coefficients

  • What is LSP (Line Spectrum Pair)?
    – Representation of the LPC equation in the z-domain:

x(k) + Σ_{i=1}^{p} a_i x(k−i) = e(k)  →  X(z) (1 + Σ_{i=1}^{p} a_i z^{−i}) = X(z) A_p(z) = E(z)

    – Decompose A_p(z) into P(z) and Q(z):

P(z) = A_p(z) − z^{−(p+1)} A_p(z^{−1})
Q(z) = A_p(z) + z^{−(p+1)} A_p(z^{−1})
A_p(z) = (P(z) + Q(z)) / 2

SLIDE 44

LSP coefficients

  • The roots of P(z) = 0 and Q(z) = 0 lie on the frequency axis (on the unit circle in the z-domain)
  • P(z) and Q(z) can be written as

P(z) = (1 − z^{−1}) Π_{i=2,4,…,p} (1 − 2 z^{−1} cos ω_i + z^{−2})
Q(z) = (1 + z^{−1}) Π_{i=1,3,…,p−1} (1 − 2 z^{−1} cos ω_i + z^{−2})

  • Roots of P(z) = 0 and Q(z) = 0: z = cos ω_i ± j sin ω_i
  • LSP: ω_1, ω_2, …, ω_p

SLIDE 45

Properties of LSP

  • Frequency-domain parameters
  • Calculated by numerical analysis
  • Easy to check stability
  • More robust against quantization than PARCOR
  • More computation needed than PARCOR
  • Widely used in current speech codecs

0 < ω_1 < ω_2 < ⋯ < ω_p < π

SLIDE 46

CELP coder

  • CELP (Code-Excited Linear Prediction)
    – Basic coding scheme for mobile phones
    – Analysis and synthesis based on LPC
    – Transmits LSP coefficients and the residual

SLIDE 47

Overview of CELP

(figure: CELP encoder: LPC analysis and LSP quantization; code vectors from a residual codebook, scaled by a gain codebook, drive LPC synthesis; the code vector with the minimum auditory-weighted distance between the input and synthesized speech is selected, and the selected indices form the output bitstream)
SLIDE 48

Speech codings based on CELP

  – LD-CELP (Low-Delay CELP)
    • G.728, 16 kbit/s
  – CS-ACELP (Conjugate Structure Algebraic CELP)
    • G.729, 8 kbit/s
  – RPE-LTP (Regular Pulse Excitation with Long Term Prediction)
    • GSM standard, 13 kbit/s
  – VSELP (Vector Sum Excitation LP)
    • PDC standard, 6.7 kbit/s
  – PSI-CELP (Pitch Synchronous Innovation CELP)
    • PDC half-rate standard, 3.45 kbit/s
  – ACELP (Algebraic CELP)
    • GSM revised standard, 7.4 kbit/s
SLIDE 49

Audio coding

  • Coding of general sound and music
    – We cannot make assumptions (as we can for speech) about the input signal
      • e.g. that higher frequency components have smaller power
    – Model-based coding (as for speech) cannot be used
  • Coding based on multi-band analysis
    – Split the input signal into bands from low to high frequency
      • Frequency analysis using a filter bank / MDCT
    – Change the quantization step frequency by frequency
      • Coarse quantization for high frequencies
      • Coarse quantization where the sound is not salient
SLIDE 50

Basic framework of audio coding

(figure: frequency analysis (QMF / MDCT / wavelet) feeds per-band quantizers controlled by a psychoacoustic analysis of auditory properties; the quantized bands are entropy coded (Huffman, arithmetic) into a bitstream; the decoder restores the bitstream, dequantizes each band, and converts back into the time domain)

SLIDE 51

SB-ADPCM

  • 16 kHz middle-quality speech coder (G.722)
    – Sub-Band ADPCM
    – Split the input signal into high-band and low-band signals and encode each with ADPCM individually
      • Frequency splitting using a quadrature mirror filter (QMF)
      • ADPCM coding: 2 bits for the high band, 4 to 6 bits for the low band (48 to 64 kbit/s)

(figure: analysis QMF → two ADPCM encoders → bitstream; the decoder restores the bitstream with two ADPCM decoders and a synthesis QMF)
SLIDE 52

Quadrature Mirror Filter (QMF)

  • Split the input signal into high-frequency and low-frequency signals; the total data amount is unchanged
  • The original signal can be perfectly restored by combining the low- and high-frequency signals
  • Example of a simple QMF (Haar wavelet):

QMF split: y(i) = (x(2i) + x(2i+1)) / 2 (low band), z(i) = (x(2i) − x(2i+1)) / 2 (high band)
QMF synthesis: x(2i) = y(i) + z(i), x(2i+1) = y(i) − z(i)
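The Haar QMF pair can be checked directly; a minimal sketch (even-length input assumed, function names ours):

```python
def qmf_split(x):
    # Haar split: y = low band (pairwise averages), z = high band (differences)
    y = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    z = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return y, z

def qmf_synthesize(y, z):
    # Perfect reconstruction: x(2i) = y(i) + z(i), x(2i+1) = y(i) - z(i)
    x = []
    for yi, zi in zip(y, z):
        x.append(yi + zi)
        x.append(yi - zi)
    return x
```

Note the two half-rate bands together hold exactly as many samples as the input, as the slide states.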

SLIDE 53

MPEG1 audio

  • The audio part of MPEG, the standard for video encoding
    – Layer 1 (MP1), Layer 2 (MP2), Layer 3 (MP3)
    – Frequency analysis, psychoacoustic model

(figure: polyphase filter bank / MDCT → quantizers → bitstream generation; an FFT-driven psychoacoustic model controls the quantizers; the decoder restores the bitstream, dequantizes, and converts back to the time domain)

SLIDE 54

MP1

  • MPEG1 audio layer 1
    – Frequency analysis by a polyphase filter bank (32 frequency bands)
    – Normalization and scalar quantization every 12 samples

(figure: each band is normalized by its block average power and nonlinearly scalar quantized; the block average power is itself scalar quantized)

SLIDE 55

MP3

  • Frequency analysis by a polyphase filter bank (32 bands) followed by MDCT
  • Variable frame length (18 points standard)
  • Entropy coding

(figure: polyphase filter bank → MDCT per band → nonlinear scalar quantization → Huffman coding → bitstream)

SLIDE 56

Modified Discrete Cosine Transform (MDCT)

  • Converts n points of a time-domain signal into n/2 points of frequency-domain signal
  • Uses an n/2-point-overlapped window f(k)

X(m) = Σ_{k=0}^{n−1} f(k) x(k) cos{ (π/2n) (2k + 1 + n/2) (2m + 1) }

x(k) = (4 f(k) / n) Σ_{m=0}^{n/2−1} X(m) cos{ (π/2n) (2k + 1 + n/2) (2m + 1) }

SLIDE 57

Conversion to time-domain by Overlap-Add

  • The original signal can be restored by adding the temporally overlapping data

(figure: MDCT frames → IMDCT → overlap-add)
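A minimal sketch of the MDCT/IMDCT formulas with overlap-add, assuming a sine window f(k) = sin(π(k + 1/2)/n), which satisfies the Princen-Bradley condition f(k)² + f(k + n/2)² = 1; adding the two overlapping inverse transforms reconstructs the middle half-frame:

```python
import math, random

def mdct(x, f):
    # X(m) = sum_k f(k) x(k) cos[(pi/2n)(2k+1+n/2)(2m+1)], m = 0..n/2-1
    n = len(x)
    return [sum(f[k] * x[k] * math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * m + 1))
                for k in range(n)) for m in range(n // 2)]

def imdct(X, f):
    # x(k) = (4 f(k)/n) sum_m X(m) cos[(pi/2n)(2k+1+n/2)(2m+1)], k = 0..n-1
    n = 2 * len(X)
    return [4.0 * f[k] / n * sum(X[m] * math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * m + 1))
                                 for m in range(len(X))) for k in range(n)]

n = 16
f = [math.sin(math.pi / n * (k + 0.5)) for k in range(n)]
random.seed(1)
x = [random.uniform(-1, 1) for _ in range(3 * (n // 2))]

# Two frames with hop n/2; each inverse transform alone is time-domain aliased
y1 = imdct(mdct(x[0:n], f), f)
y2 = imdct(mdct(x[n // 2: n // 2 + n], f), f)
# Overlap-add cancels the aliasing in the shared half-frame
mid = [y1[n // 2 + i] + y2[i] for i in range(n // 2)]
```

Each frame stores only n/2 coefficients for n samples, yet the overlap-add output matches x exactly (time-domain aliasing cancellation).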

SLIDE 58

Speech Enhancement

  • What is speech enhancement?
    – Extract a specific speech signal from an input signal that contains the target speech and other noise
    – It is generally difficult: some kind of assumption is needed
  • Basic methods
    – Single channel
      • Linear method: Wiener filter
      • Nonlinear method: spectral subtraction
    – Multiple channels
      • Linear method: microphone array
      • Nonlinear method: multichannel spectral subtraction
SLIDE 59

Single-channel case

  • Speech signal x, noise signal n, observed signal y
  • Aim: estimate x from y (both x and n are unknown)
  • An assumption is needed
    – e.g. the spectrum of n is known

y(t) = x(t) + n(t),  Y(ω) = X(ω) + N(ω)

SLIDE 60

Linear method: the Wiener filter

  • Consider a filter W that minimizes the error
  • Consider it in the time-frequency domain:

X̂(ω) = W(ω) Y(ω) = W(ω) (X(ω) + N(ω))

∫ |X(ω) − X̂(ω)|² dω → min

Σ_t Σ_{i=0}^{N−1} |X_i(t) − X̂_i(t)|² = Σ_t Σ_{i=0}^{N−1} |X_i(t) − W_i (X_i(t) + N_i(t))|² → min

SLIDE 61

Linear method: the Wiener filter

  • Differentiate the previous formula:

∂/∂W_i Σ_t |X_i(t) − W_i (X_i(t) + N_i(t))|² = 0

Σ_t [ −2 X_i(t) (X_i(t) + N_i(t)) + 2 W_i |X_i(t) + N_i(t)|² ] = 0

If X(t) and N(t) have no correlation, Σ_t |X_i(t) N_i(t)| ≈ 0, so

W_i = Σ_t |X_i(t)|² / ( Σ_t |X_i(t)|² + Σ_t |N_i(t)|² )    (the Wiener filter)
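The closed form above is easy to evaluate per frequency band; a toy sketch with hypothetical per-band powers (the numbers are illustrative, not measured data):

```python
# Per-band Wiener gains W_i = Px_i / (Px_i + Pn_i), computed from the
# (assumed known) average signal and noise power spectra.
signal_power = [100.0, 50.0, 10.0, 1.0]   # hypothetical per-band sums of |X_i(t)|^2
noise_power = [1.0, 1.0, 10.0, 10.0]      # hypothetical per-band sums of |N_i(t)|^2

W = [px / (px + pn) for px, pn in zip(signal_power, noise_power)]
# Bands dominated by noise are strongly attenuated; clean bands pass almost unchanged
```

The gain is always between 0 and 1: the Wiener filter only attenuates, more strongly where the noise power dominates.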

SLIDE 62

The Wiener filter

  • Conditions for applying it
    – The average spectra of the signal and the noise are known
      • In practice this hardly holds for the signal
    – The signal and the noise have no correlation
  • Meaning of the Wiener filter
    – Suppress frequency bands with large noise power:

W_i (X_i(t) + N_i(t)) = [ Σ_t |X_i(t)|² / ( Σ_t |X_i(t)|² + Σ_t |N_i(t)|² ) ] · (X_i(t) + N_i(t))

    – On average, the power of each frequency band becomes comparable to E[X_i²]

SLIDE 63

Example of the Wiener filter

(figure: power spectra of the speech and the noise, and the resulting gain of the filter across frequency)
SLIDE 64

Spectral subtraction

  • Suppress noise by nonlinear processing
  • The noise is assumed to be stationary
  • Subtract the noise spectrum from the observed spectrum
  • Definitions
    – Signal: X_i(t)
    – Noise: N_i(t)
    – Observed signal: Y_i(t) = X_i(t) + N_i(t)

SLIDE 65

Spectral subtraction

  • Principle of SS
    – Power spectrum of the observed signal:

|Y_i(t)|² = |X_i(t) + N_i(t)|² ≤ |X_i(t)|² + 2 |X_i(t) N_i(t)| + |N_i(t)|²

    – The signal is assumed to have no correlation with the noise:

|X_i(t) N_i(t)| ≪ |X_i(t)|² + |N_i(t)|²,  so  |X_i(t)|² ≈ |Y_i(t)|² − |N_i(t)|²

    – The noise is assumed to be stationary: |N_i(t)|² = N_i², so

|X_i(t)|² ≈ |Y_i(t)|² − N_i²

SLIDE 66

Spectral subtraction

  • Estimating the noise spectrum
    – The noise spectrum must be prepared beforehand
    – It is estimated from the silent part before the voice
  • To estimate the magnitude spectrum:

X_i(t) ≈ √( (|Y_i(t)|² − |N_i(t)|²) / |Y_i(t)|² ) · Y_i(t)
SLIDE 67

Practical problem and its solution

  • The estimated power spectrum can become negative
    – Solution by flooring:

|X_i(t)|² ≈ |Y_i(t)|² − N_i²  if |Y_i(t)|² > N_i²,  else  ε |Y_i(t)|²   (0 < ε ≪ 1)

  • Overestimating the noise improves the quality:

|X_i(t)|² ≈ |Y_i(t)|² − α N_i²  if |Y_i(t)|² > α N_i²,  else  ε |Y_i(t)|²   (α > 1)

SLIDE 68

Examples (waveform)

(figure: waveforms of the speech, the noise, the speech with noise, and the enhanced speech)

SLIDE 69

Examples (spectrogram)

(figure: spectrograms of the speech, the noise, the speech with noise, and the enhanced speech)

SLIDE 70

Speech enhancement using multiple microphones

  • Exploit speech signals recorded by more than one microphone
    – Spatial information can be used
      • Record speech and noise individually
      • Beamforming
  • Various methods
    – Linear processing
      • Delayed-sum array (superdirective microphone)
      • Adaptive array
    – Nonlinear processing
      • Multi-channel spectral subtraction
SLIDE 71

Delayed-sum array

  • Record the sound arriving from a specific angle using multiple microphones
    – The sound is assumed to be a plane wave

(figure: microphones spaced d apart; a plane wave at angle θ travels an extra distance d sin θ between adjacent microphones)

SLIDE 72

Delayed-sum array

(figure: for a plane wave at angle θ, the four microphones (spacing d) observe
sin(ωt), sin(ω(t − d sin θ / c)), sin(ω(t − 2d sin θ / c)), sin(ω(t − 3d sin θ / c)) )

SLIDE 73

Delayed-sum array

(figure: delays of 3d sin θ / c, 2d sin θ / c, and d sin θ / c applied to the channels align them; each channel becomes sin(ω(t − 3d sin θ / c)))

SLIDE 74

Delayed-sum array

(figure: after the delays, the sum of the four aligned channels is 4 sin(ω(t − 3d sin θ / c)))

SLIDE 75

Delayed-sum array

(figure: for sound arriving from angle φ while the array is steered to angle θ, the output is

Σ_{n=0}^{3} sin(ω(t − (n d sin φ + (3−n) d sin θ) / c)) )
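The four-channel sum can be evaluated with complex phasors to plot the array's directivity; a sketch (parameters mirror the example plot on the next slide: n = 4, d = 1; the function name is ours):

```python
import cmath, math

def array_gain(phi, theta, n_mics=4, d=1.0, w=10.0, c=1.0):
    # Magnitude of the delayed-sum output for a unit plane wave arriving
    # from angle phi when the array is steered to angle theta:
    # |sum_n exp(-j w (n d sin(phi) + (n_mics-1-n) d sin(theta)) / c)|
    s = sum(cmath.exp(-1j * w * (n * d * math.sin(phi)
                                 + (n_mics - 1 - n) * d * math.sin(theta)) / c)
            for n in range(n_mics))
    return abs(s)
```

At phi = theta every term has the same phase, so the gain is exactly n_mics; off the steering angle the phasors partially cancel.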

SLIDE 76

Example

(figure: array gain vs. incident angle (rad) for n = 4, d = 1, at ω = 5, 10, 50)

SLIDE 77

Property of the delayed-sum array

  • Simple processing
    – Fast calculation
    – Easy hardware realization
  • Directivity
    – The main lobe becomes narrower when
      • more microphones are used
      • the frequency becomes higher
      • the microphone-to-microphone distance becomes wider
    – Spatial aliasing
      • Condition for no aliasing: d < c/2f
SLIDE 78

Adaptive noise suppression

  • When the noise can be recorded by a separate microphone
  • Linear filtering

(figure: one input carries speech + noise, another carries noise only; signal processing outputs the enhanced speech)

SLIDE 79

Adaptive noise suppression

  • Processing by an adaptive filter
    – n(k) is not the noise signal actually mixed into x(k), so we can't subtract n(k) from x(k) directly
      → use a filter W(z)

(figure: x(k) carries speech + noise, where the noise n(k) reaches the speech microphone through G(z); W(z) filters n(k) to produce y(k), which is subtracted from x(k) to give e(k); W(z) is updated so that the power of the output signal becomes minimum)
SLIDE 80

Adaptive filter

  • Realize W(z) as an FIR filter
    – w_i(k): the i-th filter coefficient at time k

y(k) = Σ_{i=1}^{p} w_i(k) n(k−i)

  • Update the coefficients using the LMS algorithm (many other algorithms have been developed):

w_i(k+1) = w_i(k) + 2μ e(k) n(k−i)

SLIDE 81

Subtractive array

  • Eliminate the noise using a microphone array (when the direction of the noise is known)

(figure: the noise from angle θ_N is delayed by d sin θ_N / c in one channel and subtracted from the other, cancelling the noise while the speech remains)

SLIDE 82

Adaptive subtractive array

  • When the direction of the noise is not known
    – Noise suppression using adaptive filters

(figure: two-microphone array; each channel passes through an adaptive filter before the subtraction)

SLIDE 83

Adaptive subtractive array

  • Some constraints on the filters are needed
    – Without any constraints, the output becomes zero

(figure: noise and speech arriving at two microphones with filters H1, H2 before the subtraction)

SLIDE 84

Adaptive subtractive array

  • Adaptive filters: H_i(ω)
  • Transfer function from the speech source to microphone i: G_i(ω)
  • Constraint on the filters:

F(ω) = Σ_i G_i(ω) H_i(ω) = 1

  • Example
    – The Griffith-Jim array

SLIDE 85

Griffith-Jim array

(figure: Griffith-Jim array: the delayed channels are summed to form the delayed-sum array output; pairwise differences of the delayed channels contain noise only and drive the adaptive filters H1, H2, whose outputs are subtracted from the delayed-sum output)

SLIDE 86

2ch spectral subtraction

  • Spectral subtraction using a microphone array
    – Estimate the noise spectrum by suppressing the target signal
    – Subtract the noise spectrum from the observed spectrum
  • Features
    – Nonlinear processing
    – No need to prepare the noise spectrum beforehand
    – Effective for non-stationary noise
SLIDE 87

2ch spectral subtraction

(figure: the delayed and summed channels give the delayed-sum output; the difference of the delayed channels gives a noise-only signal; |DFT|² of both are taken, and the noise power spectrum is subtracted from the output spectrum with nonlinear processing (overestimation, flooring))