perceptual audio coding



  1. Perceptual Audio Coding

     Sources:
     – Kahrs, Brandenburg (Editors). (1998). "Applications of digital signal processing to audio and acoustics". Kluwer Academic.
     – Bernd Edler. (1997). "Low bit rate audio tools". MPEG meeting.

     Contents:
     • Introduction
     • Requirements for audio codecs
     • Perceptual coding vs. source coding
     • Measuring audio quality
     • Facts from psychoacoustics
     • Overview of perceptual audio coding
     • Description of coding tools
     • Filterbanks
     • Perceptual models
     • Quantization and coding
     • Stereo coding
     • Real coding systems

     1 Introduction
     • Transmission bandwidth increases continuously, but the demand increases even more → need for compression technology
     • Applications of audio coding
       – audio streaming and transmission over the internet
       – mobile music players
       – digital broadcasting
       – soundtracks of digital video (e.g. digital television and DVD)

     Requirements for audio coding systems
     • Compression efficiency: sound quality vs. bit-rate
     • Absolute achievable quality
       – often required: given a sufficiently high bit-rate, no audible difference compared to the CD-quality original audio
     • Complexity
       – computational complexity: main factor for general-purpose computers
       – storage requirements: main factor for dedicated silicon chips
       – encoder vs. decoder complexity
         • the encoder is usually much more complex than the decoder
         • encoding can be done off-line in some applications

     Requirements (cont.)
     • Algorithmic delay
       – depending on the application, the delay is or is not an important criterion
       – very important in two-way communication (~ 20 ms OK)
       – not important in storage applications
       – somewhat important in digital TV/radio broadcasting (~ 100 ms)
     • Editability
       – a certain point in the audio signal can be accessed from the coded bitstream
       – requires that decoding can start at (almost) any point of the bitstream
     • Error resilience
       – susceptibility to single or burst errors in the transmission channel
       – usually combined with error correction codes, but that costs bits
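The trade-off between bit-rate and quality can be made concrete with a little arithmetic. The sketch below computes the bit-rate of uncompressed CD-quality PCM and the compression ratio implied by a target coded bit-rate. The CD parameters (44.1 kHz, 16 bits, 2 channels) are the standard values and are not stated on the slides; the 128 kbit/s target is a hypothetical example.

```python
# Standard CD-quality PCM parameters (assumed; not given on the slides).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bit-rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bitrate, coded_bitrate):
    """How many times smaller the coded stream is than the source."""
    return source_bitrate / coded_bitrate


cd_bitrate = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
target_bitrate = 128_000  # hypothetical coded bit-rate in bit/s

print(f"CD PCM bit-rate: {cd_bitrate / 1000:.1f} kbit/s")
print(f"Compression ratio at {target_bitrate / 1000:.0f} kbit/s: "
      f"{compression_ratio(cd_bitrate, target_bitrate):.1f}:1")
```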

  2. Source coding vs. perceptual coding
     • Usually signals have to be transmitted with a given fidelity, but not necessarily perfectly identical to the original signal
     • Compression can be achieved by removing
       – redundant information that can be reconstructed at the receiver
       – irrelevant information that is not important for the listener
     • Source coding: emphasis on redundancy removal
       – speech coding: a model of the vocal tract defines the possible signals; the parameters of the model are transmitted
       – works poorly in generic audio coding: any kind of signal is possible, and can even be called music
     • Perceptual coding: emphasis on the removal of perceptually irrelevant information
       – minimize the audibility of distortions

     Source coding vs. perceptual coding
     • Speech and non-speech audio are quite different
       – in the coding context, the word "audio" usually refers to non-speech audio
     • For audio signals (as compared to speech), typically
       – sampling rate is higher
       – dynamic range is wider
       – power spectrum varies more
       – high quality is more crucial than in the case of speech signals
       – stereo and multichannel coding can be considered
     • The bit-rate required for speech signals is much lower than that required for audio/music

     Lossless coding vs. lossy coding
     • Lossless or noiseless coding
       – able to reconstruct the original samples perfectly
       – compression ratios approximately 2:1
       – can only utilize redundancy reduction
     • Lossy coding
       – not able to reconstruct the original samples perfectly
       – compression ratios around 10:1 or 20:1 for perceptual coding
       – based on perceptual irrelevancy and statistical redundancy removal

     Measuring audio quality
     • Lossy coding of audio causes inevitable distortion to the original signal
     • The amount of distortion can be measured using
       – subjective listening tests, for example using mean opinion score (MOS): the most reliable way of measuring audio quality
       – simple objective criteria such as the signal-to-noise ratio between the original and reconstructed signal (quite non-informative from the perceptual quality viewpoint)
       – complex criteria such as objective perceptual similarity metrics that take into account the known properties of the auditory system (for example the masking phenomenon)
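The signal-to-noise ratio mentioned as a simple objective criterion can be sketched in a few lines. The example below quantizes a sine wave with a plain uniform quantizer and measures the SNR between the original and the reconstruction. The test signal (997 Hz at 44.1 kHz, 4096 samples) and the quantizer are illustrative assumptions, not part of the slides. The number says nothing about where the noise lies relative to the masked threshold, which is why it is a poor predictor of perceived quality.

```python
import math


def quantize(x, bits):
    """Uniform (mid-tread) quantizer with step size 2 / 2**bits; no clipping."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step


def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB between two equally long sample lists."""
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - r) ** 2 for s, r in zip(original, reconstructed))
    return 10.0 * math.log10(signal_power / noise_power)


def sine(freq_hz, sample_rate_hz, num_samples):
    """Full-scale sine wave as a list of floats in [-1, 1]."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]


def quantized_sine_snr(bits):
    """SNR of a hypothetical 997 Hz test tone after quantization to `bits` bits."""
    x = sine(997.0, 44_100.0, 4096)
    y = [quantize(s, bits) for s in x]
    return snr_db(x, y)


for bits in (8, 12, 16):
    print(f"{bits:2d} bits: SNR = {quantized_sine_snr(bits):.1f} dB")
```

Each extra bit buys roughly 6 dB of SNR, regardless of whether the listener could hear the difference.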

  3. Measuring audio quality
     • MOS
       – test subjects rate the encoded audio using an N-step scale
       – MOS is defined as the average of the subjects' ratings
     • MOS is widely used but also has drawbacks
       – results vary across time and test subjects
       – results vary depending on the chosen test signals (typical audio material vs. critical test signals)
     • Figure: example scale for rating the disturbance of coding artefacts

     2 Some facts from psychoacoustics (Recap from Hearing lecture)
     • Main question in perceptual coding:
       – How much noise (distortion, quantization noise) can be introduced into a signal without it being audible?
     • The answer can be found in psychoacoustics
       – psychoacoustics studies the relationship between acoustic events and the corresponding auditory sensations
     • The most important keyword in audio coding is "masking"
     • Masking describes the situation where a weaker but clearly audible signal (maskee) becomes inaudible in the presence of a louder signal (masker)
       – masking depends both on the spectral composition of the maskee and masker, and on their variation over time

     2.1 Masking in frequency domain
     • Model of the frequency analysis in the auditory system
       – subdivision of the frequency axis into critical bands
       – frequency components within the same critical band mask each other easily
       – Bark scale: frequency scale that is derived by mapping frequencies to critical band numbers
     • Narrowband noise masks a tone (sinusoid) more easily than a tone masks noise
     • Masked threshold refers to the raised threshold of audibility caused by the masker
       – sounds with a level below the masked threshold are inaudible
       – masked threshold in quiet = threshold of hearing in quiet

     Masking in frequency domain
     • Figure: masked thresholds [Herre95]
       – masker: narrowband noise around 250 Hz, 1 kHz, 4 kHz
       – spreading function: the effect of masking extends to the spectral vicinity of the masker (spreads more towards high frequencies)
     • Additivity of masking: the joint masked threshold is approximately (but slightly more than) the sum of the components
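The Bark scale and the threshold of hearing in quiet are often approximated with closed-form expressions. The sketch below uses two widely quoted approximations (the Zwicker and Terhardt arctangent formula for Bark, and Terhardt's formula for the threshold in quiet in dB SPL). Neither formula appears on the slides, so treat both as assumptions chosen for illustration; the function names are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band number (Bark) for a frequency in Hz.

    Assumed formula: z = 13 * atan(0.00076 f) + 3.5 * atan((f / 7500)^2).
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in quiet (dB SPL) for a frequency in Hz.

    Assumed formula (f in kHz):
    3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 0.001 f^4.
    """
    f_khz = freq_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def audible_in_quiet(level_db_spl, freq_hz):
    """A tone with no masker present is audible only above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


for f in (100, 250, 1000, 4000, 10000):
    print(f"{f:6d} Hz -> {hz_to_bark(f):5.2f} Bark, "
          f"threshold in quiet {threshold_in_quiet_db(f):6.2f} dB SPL")
```

With a masker present, the same comparison would use the masked threshold instead, which is never lower than the threshold in quiet.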

  4. 2.2 Masking in time domain
     • Forward masking (= post-masking)
       – masking effect extends to times after the masker is switched off
     • Backward masking (= pre-masking)
       – masking extends to times before the masker has been switched on
     • Figure [Sporer98]:
       → forward/backward masking does not extend far in time
       → simultaneous masking is the more important phenomenon

     Pre-echo
     • Pre-echo: if coder-generated artifacts (distortions) are spread in time to precede the signal itself, the resulting audible artifact is called "pre-echo"
       – common problem, since filter banks used in coders cause temporal spreading
     • Figure: example of pre-echo
       – lower curve (noise signal) reveals the shape of the analysis window

     2.3 Variability between listeners
     • An underlying assumption of perceptual audio coding is that there are no great differences in individuals' hearing
     • More or less true
       – absolute threshold of hearing: varies even for one listener over time; perceptual coders have to assume very good hearing
       – masked threshold: variations are quite small
       – masking in time domain: large variations, a listener can be trained to hear pre-echoes

     2.4 Conclusion
     • Research on hearing is by no means a closed topic
       – simple models can be built rather easily and can lead to reasonably good coding results
       – when designing more advanced coders (perceptual models), the limits of psychoacoustic knowledge are soon reached

     3 Overview of perceptual audio coding
     • Basic idea is to hide quantization noise below the signal-dependent threshold of hearing (masked threshold)
     • Modeling the masking effect
       – the most important masking effects are described in the frequency domain
       – on the other hand, effects of masking extend only up to about 15 ms distance in time (see "masking in time domain" above)
     • Consequence:
       – perceptual audio coding is best done in the time-frequency domain → common basic structure of perceptual coders
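The basic idea of hiding quantization noise below the masked threshold can be illustrated per frequency band. The sketch below assumes a uniform quantizer, whose noise power is about step²/12, and the rough rule of about 6.02 dB of SNR per bit. The band levels are hypothetical numbers, and the function names are made up for this illustration; a real coder derives the thresholds from a perceptual model.

```python
import math


def max_step_size(noise_power_limit):
    """Largest uniform-quantizer step whose noise power (step**2 / 12)
    stays at or below the given limit (linear power units)."""
    return math.sqrt(12.0 * noise_power_limit)


def bits_for_smr(smr_db):
    """Rough bit count needed to push quantization noise below the masked
    threshold, assuming about 6.02 dB of SNR per bit.

    Bands whose signal lies at or below the threshold get 0 bits.
    """
    if smr_db <= 0.0:
        return 0
    return math.ceil(smr_db / 6.02)


# Hypothetical per-band levels in dB: (signal level, masked threshold).
bands = [(60.0, 42.0), (55.0, 50.0), (30.0, 35.0)]

for index, (signal_db, threshold_db) in enumerate(bands):
    smr_db = signal_db - threshold_db  # signal-to-mask ratio
    print(f"band {index}: SMR = {smr_db:5.1f} dB -> {bits_for_smr(smr_db)} bits")
```

Bands with a large signal-to-mask ratio need fine quantization, while bands that are fully masked can be dropped, which is where the bit-rate saving comes from.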
