The ERBlet transform, auditory time-frequency masking and perceptual sparsity. Thibaud Necciari (1), joint work with P. Balazs (1), B. Laback (1), P. Soendergaard (1,3), R. Kronland-Martinet (2), S. Meunier (2), S. Savel (2), and S. Ystad (2). (1) Acoustics Research Institute, Vienna, Austria; (2) Laboratoire de Mécanique et d'Acoustique, Marseille, France; (3) Technical University of Denmark. 2nd SPLab Workshop, October 24–26, 2012, Brno.

Context: Analysis-Synthesis of Sound Signals. Idea: Integrate aspects of human auditory perception in the signal representation

Goal of the Study. Achieve a perceptually-motivated and invertible TF transform based on: (1) Properties of TF transforms: linear, allow perfect reconstruction, adapted to non-stationary signals. (2) Results on human auditory perception (psychoacoustics).

Some Aspects of Human Auditory Perception. 1. Spectral Resolution: The Auditory Filters. = Ability to resolve sinusoidal components in complex sounds. Peripheral filtering ≡ bank of bandpass filters = auditory filters

Some Aspects of Human Auditory Perception. 1. Spectral Resolution: The ERB Scale [Moore & Glasberg, 1983]. Each auditory filter is characterized by its ERB = Equivalent Rectangular Bandwidth.
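To give a feel for how the ERB grows with center frequency, here is a minimal sketch. It assumes the widely used Glasberg & Moore (1990) approximation, ERB(f) = 24.7 (4.37 f / 1000 + 1) Hz, which is not necessarily the exact fit from the 1983 reference cited on the slide; the function name `erb_hz` is hypothetical.

```python
def erb_hz(fc_hz: float) -> float:
    """ERB in Hz at center frequency fc_hz.

    Assumption: Glasberg & Moore (1990) approximation, not the 1983 fit
    cited on the slide.
    """
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


# The bandwidth widens as the center frequency increases.
for fc in (250.0, 1000.0, 4000.0):
    print(f"fc = {fc:6.0f} Hz -> ERB ~ {erb_hz(fc):6.1f} Hz")
```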

Some Aspects of Human Auditory Perception. 2. Temporal Resolution. = Ability to detect rapid changes in sounds over time. The time axis is partitioned into time windows (analogous to spectral resolution). Window length = temporal resolution. Window length is frequency dependent ≈ "internal" TF analysis [van Schijndel et al., 1999]. Window length ≈ 4 periods of the center frequency, e.g., 4 ms @ 1 kHz and 1 ms @ 4 kHz.
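The rule of thumb above (window length ≈ 4 periods of the center frequency) is simple arithmetic. The sketch below only restates it; the function name `auditory_window_ms` and its `periods` parameter are hypothetical.

```python
def auditory_window_ms(fc_hz: float, periods: float = 4.0) -> float:
    """Approximate temporal window length in ms: `periods` cycles of fc_hz."""
    return 1000.0 * periods / fc_hz


for fc in (1000.0, 4000.0):
    print(f"{fc:.0f} Hz -> {auditory_window_ms(fc):.1f} ms")
```

This reproduces the two values quoted on the slide (4 ms at 1 kHz, 1 ms at 4 kHz).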

Some Aspects of Human Auditory Perception. 3. Auditory Masking. = Increase in the detection threshold of a sound ("target") in the presence of another sound ("masker"). Measurement: amount of masking (dB) = masked threshold − absolute threshold, where the masked threshold is the detection threshold of the target in presence of the masker and the absolute threshold is the detection threshold of the target in quiet.

Some Aspects of Human Auditory Perception. 3. Auditory Masking. Main parameters: time; frequency; stimulus duration; stimulus level; frequency region of the audible spectrum [20 Hz ... 20 kHz].

Some Aspects of Human Auditory Perception. 3. Auditory Masking: Consequence in Signal Representation. $s(t) = \frac{1}{C_g} \iint_{\mathbb{R}^2} \mathrm{STFT}(\tau, \omega)\, g_{\tau,\omega}(t)\, d\tau\, d\omega$, where $\frac{1}{C_g}$ is a normalization and $g_{\tau,\omega}$ is a TF atom. Can we represent only audible atoms? If so, which atoms can be removed?
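The synthesis formula says that the signal is a normalized sum of TF atoms weighted by their STFT coefficients. A discrete, circular analogue can be sketched as follows. This is an illustration only, not the authors' or LTFAT's implementation: the Hann window, the hop size, the test signal, and the names `stft` and `istft` are all assumptions, and the array `norm` plays the role of the normalization constant.

```python
import cmath
import math


def stft(s, g, hop):
    """Circular STFT: one length-M DFT of the windowed signal per time shift."""
    N, M = len(s), len(g)
    coeffs = []
    for n in range(0, N, hop):
        frame = [s[(n + j) % N] * g[j] for j in range(M)]
        coeffs.append([
            sum(frame[j] * cmath.exp(-2j * math.pi * k * j / M) for j in range(M))
            for k in range(M)
        ])
    return coeffs


def istft(coeffs, g, hop, N):
    """Sum the atoms weighted by their coefficients, then normalize."""
    M = len(g)
    acc = [0j] * N
    norm = [0.0] * N  # discrete counterpart of the normalization constant
    for i, row in enumerate(coeffs):
        for j in range(M):
            t = (i * hop + j) % N
            # Summing over k rebuilds M * s[t] * g[j]; weight again by the atom g[j].
            acc[t] += g[j] * sum(
                row[k] * cmath.exp(2j * math.pi * k * j / M) for k in range(M)
            )
            norm[t] += M * g[j] ** 2
    return [(acc[t] / norm[t]).real for t in range(N)]


# Hypothetical toy setup: 64 samples, 16-sample Hann window, hop of 4 samples.
N, M, hop = 64, 16, 4
g = [0.5 - 0.5 * math.cos(2 * math.pi * j / M) for j in range(M)]
s = [math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.cos(2 * math.pi * 12 * t / N)
     for t in range(N)]

rec = istft(stft(s, g, hop), g, hop, N)
max_err = max(abs(a - b) for a, b in zip(s, rec))
print(f"max reconstruction error: {max_err:.2e}")
```

Removing atoms amounts to zeroing entries of `coeffs` before calling `istft`; the question raised on the slide is which entries can be zeroed without an audible change.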

Proposed Approach. To obtain a perceptually-motivated and invertible TF transform: (1) Adapt the transform parameters to mimic the auditory TF resolution → a variable-resolution transform is required. (2) Use a psychoacoustic model of TF masking to represent only the audible components (perceptual sparsity concept).

Outline. 1. Perceptually-based TF transform: The ERBlet (concept, implementation, example). 2. Perceptual sparsity concept: Investigating auditory TF masking. 3. Discussion: Combination of ERBlet & perceptual sparsity?

The ERBlet Transform. Concept. The non-stationary Gabor transform (NSGT) [Balazs et al., 2011]: allows the resolution to evolve freely over T and/or F; we can adapt both the shape of g(t), either in T or F, and the redundancy; perfect reconstruction is achieved if the frame inequality is fulfilled. Idea: develop a perceptually-motivated NSGT. Use an NSGT with resolution evolving over frequency to mimic the ERB scale → the ERBlet transform.

ERBlet Implementation. 1. Analysis Functions. An NSGT with resolution evolving over time is available in LTFAT [Soendergaard, 2010]: function nsdgt.m. Applying nsdgt to the Fourier transform of the signal, $s(t) \mapsto \hat{s}(\nu)$, makes it possible to construct an NSGT with resolution evolving over frequency (= constant-Q NSGT in [Velasco et al., 2011] but with different functions). Analysis functions (Gaussian windows): $\hat{h}_m(\nu) = \frac{1}{\sqrt{\Gamma_m}}\, e^{-\pi (\nu / \Gamma_m)^2}$, where $m$ = frequency index and $\Gamma_m = \mathrm{ERB}_m$ (in Hz). [Figure: $\Gamma_m = f(m)$, $\Gamma_m$ (Hz) plotted against the frequency index $m$ (axis labeled in kHz).]
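One way to see why Γ_m can be set directly to the ERB is that a Gaussian of the form exp(−π (ν/Γ)²) has an area-to-peak ratio equal to Γ. The sketch below checks this numerically for the amplitude profile (not the power response, which would give a narrower value); the Riemann-sum integration, the grid, and the function names are assumptions made for illustration.

```python
import math


def gaussian_window(nu_hz: float, gamma_hz: float) -> float:
    """Gaussian analysis window: exp(-pi (nu/Gamma)^2) / sqrt(Gamma)."""
    return math.exp(-math.pi * (nu_hz / gamma_hz) ** 2) / math.sqrt(gamma_hz)


def equivalent_rectangular_width(gamma_hz: float, steps_per_gamma: int = 1000) -> float:
    """Area under the window divided by its peak value (Riemann sum over +/- 6 Gamma)."""
    d_nu = gamma_hz / steps_per_gamma
    n = 6 * steps_per_gamma
    area = sum(gaussian_window(k * d_nu, gamma_hz) for k in range(-n, n + 1)) * d_nu
    return area / gaussian_window(0.0, gamma_hz)


gamma = 132.6  # hypothetical ERB value in Hz
print(f"Gamma = {gamma} Hz, equivalent rectangular width ~ "
      f"{equivalent_rectangular_width(gamma):.3f} Hz")
```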

ERBlet Implementation. 2. Spectral Resolution. [Figure: analysis windows and dual windows, amplitude versus frequency (0 to 8000 Hz).] 1 window/ERB (≡ auditory filterbank); 34 channels @ 8 kHz, 49 channels @ 22 kHz.
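Placing one window per ERB means spacing the center frequencies uniformly on an ERB-number scale. The sketch below assumes the Glasberg & Moore (1990) ERB-number formula, 21.4 log10(1 + 0.00437 f), which may differ from the scale actually used for the slide; the function names are hypothetical. Under this assumption the channel count up to 8 kHz comes out as 34, while counts for other upper frequencies depend on the convention used and need not match the slide.

```python
import math


def hz_to_erb_number(f_hz: float) -> float:
    """ERB-number of a frequency (assumed Glasberg & Moore 1990 formula)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_number_to_hz(e: float) -> float:
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def center_frequencies(f_max_hz: float, step_erb: float = 1.0) -> list:
    """Center frequencies from 0 Hz up to f_max_hz, spaced step_erb ERBs apart."""
    n = int(math.floor(hz_to_erb_number(f_max_hz) / step_erb)) + 1
    return [erb_number_to_hz(i * step_erb) for i in range(n)]


centers = center_frequencies(8000.0)
print(f"{len(centers)} channels, highest center ~ {centers[-1]:.0f} Hz")
```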

ERBlet Implementation. 3. Temporal Resolution. [Figure: analysis windows in time, amplitude versus time index.] 4 kHz: resolution = 1.1 ms (auditory = 1 ms). 1 kHz: resolution = 3.7 ms (auditory = 4 ms).

ERBlet Example. LTFAT Speech Test Signal "greasy". [Figure: two spectrograms in dB SPL, frequency (Hz) versus time (s): standard Gabor and ERBlet.] Standard Gabor: frame bounds ratio = 1.5, redundancy ≈ 4, reconstruction error < 10^−16. ERBlet: frame bounds ratio = 1, redundancy ≈ 4.6, reconstruction error < 10^−16.

Outline. 1. Perceptually-based TF transform: The ERBlet. 2. Perceptual sparsity concept: Investigating auditory TF masking (problem statement, experimental methods, results). 3. Discussion: Combination of ERBlet & perceptual sparsity?

Auditory TF Masking: Problem Statement. Which atoms can be removed from the signal representation? A representation of TF masking for short and narrowband signals is required.

Auditory TF Masking: Problem Statement. Current masking data are not suitable for predicting masking between TF atoms. Psychoacoustical studies mostly focused on T OR F. Very few studies measured TF masking [Fastl, 1979; Kidd & Feth, 1981; Soderquist et al., 1981; Moore et al., 2002]. These studies used long-duration maskers, which are not compatible with an atomic decomposition.

Experimental Methods. 1. Stimuli (Masker & Target). Formula: $s(t) = A \sqrt{\Gamma}\, \sin\!\left(2\pi f_0 t + \frac{\pi}{4}\right) e^{-\pi (\Gamma t)^2}$, where $f_0$ = carrier frequency; the $\frac{\pi}{4}$ phase shift makes the signal energy independent of $f_0$; $\Gamma$ = shape factor of the Gaussian window. Spectro-temporal characteristics: ERB ⇔ Γ = 600 Hz [van Schijndel et al., 1999]; ERD ⇔ Γ^−1 = 1.7 ms; 0-amplitude duration = 9.6 ms.
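The claim that the π/4 phase shift makes the energy independent of f_0 can be checked numerically. Expanding sin² shows that the f_0-dependent term integrates to zero by symmetry, leaving an energy of A²/(2√2). The sketch below evaluates the stimulus formula with Γ = 600 Hz and compares the energy at two carrier frequencies; the sampling rate, integration span, and function names are assumptions made for illustration.

```python
import math


def gaussian_tone(t: float, f0: float, gamma: float, amp: float = 1.0) -> float:
    """Gaussian-windowed sinusoid from the stimulus formula."""
    return (amp * math.sqrt(gamma)
            * math.sin(2 * math.pi * f0 * t + math.pi / 4)
            * math.exp(-math.pi * (gamma * t) ** 2))


def energy(f0: float, gamma: float, amp: float = 1.0,
           fs: float = 200_000.0, half_span_s: float = 0.01) -> float:
    """Energy (integral of s^2) by a Riemann sum on a symmetric time grid."""
    dt = 1.0 / fs
    n = int(half_span_s * fs)
    return sum(gaussian_tone(k * dt, f0, gamma, amp) ** 2
               for k in range(-n, n + 1)) * dt


gamma = 600.0  # shape factor in Hz, as on the slide
print(f"1/Gamma = {1000.0 / gamma:.2f} ms")
for f0 in (1000.0, 4000.0):
    print(f"f0 = {f0:.0f} Hz -> energy ~ {energy(f0, gamma):.6f}")
print(f"A^2 / (2 sqrt 2) = {1.0 / (2.0 * math.sqrt(2.0)):.6f}")
```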
