Some Notes on the Psychoacoustics and Signal Processing of RASTA-PLP




SLIDE 1

Some Notes on the Psychoacoustics and Signal Processing of RASTA-PLP Analysis of Speech

  • Introduction
  • Reflection Masking Model
  • RASTA-PLP Processing
  • RASTA applied to RIR filtered speech

Wire Communication Laboratory - University of Patras

Jörg Buchholz

SLIDE 2

Comments on Perceptual Modelling

[Diagram. Elements: Psychoacoustics; Physiology; Perceptual Model (Masking Model); Speech enhancement in reverberant environments. Arrow labels: Comparison, Description, Application, Requirements, Requirements.]

SLIDE 3

Illustration of Masking and Suppression

[Figure: time-frequency plane showing a masker, two suppressors and test signals, with the regions of pre (backward) masking, simultaneous masking and post (forward) masking. Axes: time, frequency.]

SLIDE 4

Structure of the Masking Module

[Block diagram. Blocks: Directivity Module; Two-Tone Suppression Module (BP-Filterbank); TMM blocks for the band signals s_{i-1}(t), s_i(t), s_{i+1}(t); Simultaneous Masking Module; Transformation / Resynthesis. Output: Feature Vectors / Audible Signal.]

SLIDE 5

Block Diagram of the RASTA-PLP Method

Speech

  • FFT
  • CB-integration (mel-scale)
  • compressing static NL (log) (RASTA)
  • linear BP-filtering (RASTA)
  • expanding static NL (exp) (RASTA)
  • equal loudness curve
  • power law of hearing
  • IFFT / IDFT
  • solving set of linear equations
  • cepstral recursion

cepstral coefficients of RASTA-PLP model
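The last stages of the chain (power law of hearing, IDFT, solving the set of linear equations, cepstral recursion) can be sketched as follows. This is a minimal illustration and not the presenter's implementation: the exponent 0.33, the model order, the choice of c[0], the omission of the equal-loudness weighting and all function names are assumptions made for the example.

```python
import numpy as np


def lpc_from_autocorr(r, order):
    """Solve the Toeplitz normal equations R a = -r[1:] of an all-pole model.

    Returns a[1..order] of A(z) = 1 + a1 z^-1 + ... and the squared gain.
    """
    r = np.asarray(r, dtype=float)
    # Autocorrelation matrix with entries R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1:order + 1])
    gain_sq = r[0] + np.dot(a, r[1:order + 1])
    return a, gain_sq


def lpc_to_cepstrum(a, gain_sq, n_ceps):
    """Cepstral recursion for the model G / A(z)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    c[0] = np.log(gain_sq)  # assumption: conventions for c[0] differ
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = sum(k * c[k] * a[n - k - 1] for k in range(1, n) if n - k <= p)
        c[n] = -a_n - acc / n
    return c


def plp_cepstrum(band_power, order=8, n_ceps=8, exponent=0.33):
    """One frame of band powers -> all-pole model -> cepstral coefficients."""
    compressed = np.asarray(band_power, dtype=float) ** exponent  # power law
    # The inverse DFT of the real, even spectrum yields autocorrelation values
    r = np.fft.irfft(compressed)[:order + 1]
    a, gain_sq = lpc_from_autocorr(r, order)
    return lpc_to_cepstrum(a, gain_sq, n_ceps)
```

Solving the normal equations directly with `numpy.linalg.solve` keeps the sketch short; a Levinson-Durbin recursion would exploit the Toeplitz structure and is the more usual choice in practice.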

SLIDE 6

Time / Frequency Analysis

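[Figure: short-time analysis of the speech signal in frames i, i+1, ..., i+5 (time / ms, ticks 10 to 60). Each frame is transformed by FFT into a power spectrum (amplitude vs. frequency, up to fa/2) and by CB integration into a mel-scale power spectrum. The values of band k over successive frames form a time trajectory.]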
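The path from short-time frames to per-band time trajectories can be sketched as below. This is an illustration under stated assumptions: the slide does not give the filter shapes or frame parameters, so the triangular mel filters, the common formula mel = 2595 log10(1 + f / 700), the 20 ms Hamming frame, the 10 ms hop and all function names are chosen for the example only.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_fft, fs):
    """Triangular band weights, shape (n_bands, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    top = float(hz_to_mel(fs / 2.0))
    edges = mel_to_hz(np.linspace(0.0, top, n_bands + 2))
    fb = np.zeros((n_bands, freqs.size))
    for k in range(n_bands):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def band_trajectories(x, fs, frame_ms=20.0, hop_ms=10.0, n_bands=15):
    """Return shape (n_frames, n_bands): column k is the time trajectory of band k."""
    x = np.asarray(x, dtype=float)
    frame = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hamming(frame)
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # short-time power spectrum
    return power @ mel_filterbank(n_bands, frame, fs).T    # band (CB) integration
```

Calling `band_trajectories(x, 8000)` on one second of signal yields one row per 10 ms frame; each column can then be filtered along time independently of the others.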

SLIDE 7

CB-integration (mel-scale)

[Figure: two plots of magnitude in dB vs. frequency / Hz on a logarithmic frequency axis from 10^2 to 10^4 Hz.]

SLIDE 8

Overview of some RASTA Methods

  • Additive Noise (uncorrelated) → Lin-RASTA
  • Convolutional Noise → Log-RASTA
  • Convolutional and Additive Noise → J-RASTA (Lin/Log)

Optimal J depending on the noise power!

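Convolutional noise: $s(t) \otimes h(t) \xrightarrow{\mathrm{FT}} S(\omega) \cdot H(\omega) \xrightarrow{\log} \log S(\omega) + \log H(\omega)$

J-RASTA: $y = \ln(1 + J \cdot x)$, $x = \frac{e^{y} - 1}{J} \approx \frac{e^{y}}{J}$

Additive noise: $s(t) + n(t) \xrightarrow{\mathrm{FT}} S(\omega) + N(\omega)$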
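The J-RASTA nonlinearity y = ln(1 + J x) and its two inverses can be tried out with the short sketch below. The function names are hypothetical and the values of J are arbitrary; the slide only states that the optimal J depends on the noise power.

```python
import math


def j_compress(x, j):
    """y = ln(1 + J * x)."""
    return math.log1p(j * x)


def j_expand_exact(y, j):
    """x = (exp(y) - 1) / J."""
    return math.expm1(y) / j


def j_expand_approx(y, j):
    """x ~ exp(y) / J, the approximation used for the inverse."""
    return math.exp(y) / j


# Small J * x: nearly linear (Lin-RASTA behaviour, suited to additive noise)
y_lin = j_compress(1.0, 1e-6)
# Large J * x: nearly logarithmic (Log-RASTA behaviour, suited to convolutional noise)
y_log = j_compress(1e6, 1.0)
```

The approximate inverse differs from the exact one by the constant 1 / J, which is why it is acceptable only when exp(y) is large compared with 1.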

SLIDE 9

Rasta BP-Filter

[Figure: impulse response of the RASTA filter (amplitude vs. time / ms, 0 to 400 ms) and its magnitude response (magnitude in dB vs. modulation frequency / Hz, logarithmic axis from 10^-2 to 10^2 Hz).]

$H(z) = 0.1 \cdot z^{4} \cdot \frac{2 + z^{-1} - z^{-3} - 2 z^{-4}}{1 - 0.98 z^{-1}}$
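A minimal sketch of this band-pass filter applied to one time trajectory follows. Two things are assumptions of the example rather than statements from the slide: the factor z^4 (a four-frame advance) is dropped so that the filter is causal, and a frame rate of 100 Hz (10 ms hop) is used to convert modulation frequency to normalised frequency.

```python
import cmath
import math

NUM = [0.2, 0.1, 0.0, -0.1, -0.2]   # 0.1 * (2 + z^-1 - z^-3 - 2 z^-4)
POLE = 0.98                          # denominator 1 - 0.98 z^-1


def rasta_filter(trajectory):
    """Causal RASTA band-pass filtering of one time trajectory (list of floats)."""
    out = []
    prev = 0.0
    for n in range(len(trajectory)):
        fir = sum(b * trajectory[n - k] for k, b in enumerate(NUM) if n - k >= 0)
        prev = fir + POLE * prev     # first-order recursive part
        out.append(prev)
    return out


def rasta_gain_db(mod_freq_hz, frame_rate_hz=100.0):
    """Magnitude response in dB; undefined at 0 Hz, where the gain is zero."""
    z_inv = cmath.exp(-2j * math.pi * mod_freq_hz / frame_rate_hz)
    num = sum(b * z_inv ** k for k, b in enumerate(NUM))
    den = 1.0 - POLE * z_inv
    return 20.0 * math.log10(abs(num / den))
```

Because the numerator coefficients sum to zero, a constant trajectory is driven to zero after the initial transient. In the log domain this is what removes a fixed convolutional term log H.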

SLIDE 10

Lin-RASTA-Processing of clean speech

[Figure: three panels, normalised amplitude vs. time / ms (0 to 2000 ms): Time Trajectory (1 kHz band); BP-filtered Time Trajectory; Negative values set to zero.]

SLIDE 11

Lin-RASTA-Processing of noisy speech (-5 dB SNR)

[Figure: three panels, normalised amplitude vs. time / ms (0 to 2000 ms): Time Trajectory (1 kHz band); BP-filtered Time Trajectory; Negative values set to zero.]
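The three panels (trajectory, band-pass filtered trajectory, negative values set to zero) can be imitated on synthetic data. The sketch below is not the presenter's experiment: the trajectory is an artificial 4 Hz modulation, stationary additive noise is modelled as a constant offset in the linear power domain, and the filter is the causal form of the RASTA band-pass at an assumed 100 Hz frame rate.

```python
import math

NUM = [0.2, 0.1, 0.0, -0.1, -0.2]   # RASTA numerator, causal form
POLE = 0.98


def lin_rasta(trajectory):
    """Band-pass filter a linear-domain trajectory, then zero the negative values."""
    filtered, prev = [], 0.0
    for n in range(len(trajectory)):
        fir = sum(b * trajectory[n - k] for k, b in enumerate(NUM) if n - k >= 0)
        prev = fir + POLE * prev
        filtered.append(prev)
    rectified = [max(v, 0.0) for v in filtered]
    return filtered, rectified


# Synthetic band-power trajectory: 4 Hz modulation sampled at 100 frames per second
clean = [0.5 * (1.0 + math.sin(2.0 * math.pi * 4.0 * n / 100.0)) for n in range(1000)]
noisy = [v + 2.0 for v in clean]     # constant offset standing in for stationary noise

f_clean, r_clean = lin_rasta(clean)
f_noisy, r_noisy = lin_rasta(noisy)
```

After the start-up transient the filtered clean and noisy trajectories coincide, since the filter has zero gain for a constant. Real noise fluctuates from frame to frame, so the suppression is only partial in practice.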

SLIDE 12

RASTA applied to Room Impulse Responses

Basic Assumptions for RASTA processing:

  • Analysis Window length >> Filter (RIR) length
  • Filter (RIR) should be constant (slowly changing) for a duration >> Analysis Window length

  • For ASR: Window length ≈ 20-30 ms

Conflict: a room impulse response is typically much longer than a 20-30 ms analysis window.

Isolated Digits Word Error Rates

              Baseline   Multiresolution
Clean          8.6 %       13.5 %
Reverberant   34.8 %       22.8 %
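The first assumption (analysis window much longer than the filter) can be probed numerically. The sketch below uses white noise as a stand-in for speech and an exponentially decaying noise sequence as a stand-in for a room impulse response; both, along with the frame sizes and the function name, are assumptions of the example. If filtering acted as a fixed multiplication of each short-time spectrum, the log-power difference between output and input would be constant over time in every bin, so its spread over frames measures how badly the assumption is violated.

```python
import numpy as np


def log_ratio_spread(x, h, frame=512, hop=256):
    """Mean over bins of the std over frames of log|Y|^2 - log|X|^2."""
    x = np.asarray(x, dtype=float)
    y = np.convolve(x, h)[:len(x)]           # filtered ("reverberant") signal
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)

    def log_power(sig):
        frames = np.stack([sig[s:s + frame] * window for s in starts])
        return np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12)

    ratio = log_power(y) - log_power(x)       # constant per bin if the assumption held
    return float(np.mean(np.std(ratio, axis=0)))


rng = np.random.default_rng(0)
x = rng.standard_normal(16384)
short_h = np.array([1.0, 0.5, 0.25, 0.125])                             # 4 taps << window
long_h = rng.standard_normal(4096) * np.exp(-np.arange(4096) / 1000.0)  # taps >> window

spread_short = log_ratio_spread(x, short_h)
spread_long = log_ratio_spread(x, long_h)
```

With the short filter the spread stays small, so a band-pass filter on the log trajectory can remove the filter term. With the long filter the difference fluctuates from frame to frame and is no longer a constant that filtering could take out.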

SLIDE 13

Multiresolution Processing Concept (Avendano)

[Figure: the signal x(n) analysed at two time-frequency resolutions, giving X(n1, ω1k) and X(n2, ω2k). Other labels: n1, n2, ω1, ω2, A.]

SLIDE 14

Equal-Loudness Curves

[Figure: equal-loudness curves with level labels -12 dB, 0 dB, 0 dB, -6 dB, +18 dB.]

SLIDE 15

Loudness function (Zwicker)