Lossy Audio Data Compression Effects

This page demonstrates the distortion that can be added to a soundfile when it is saved with lossy data compression. It is nearly impossible to distinguish between the original and the compressed version of the test file by listening. The spectrograms, however, reveal significant distortions (additional noise) around rapidly modulated sounds. Soft sounds (such as the constant sine signal at 17 kHz) are also degraded. Additional background noise seems to have no major effect on the spectrographic representation, because any spurious noise is buried in that original masking noise. The occurrence of the spurious noise is also very unpredictable: the original zig-zag shaped signal ranging from 1.13 to 1.56 sec is identical to that from 3.1 to 3.54 sec, yet the two passages are not distorted in the same way.

In the first example, the MP3 system was used. There are differences between the various systems (including the ATRAC system employed in MiniDisk recorders), but the principle is always the same. If the available bit rate is not sufficient for encoding a given signal, the data reduction algorithm has to remove those parts of the sound that are inaudible or less important for human perception. This is done by reducing the bit depth in some frequency bands, which may produce additional spurious (quantization) noise in the decoded signal. Even more sophisticated techniques, such as the bit reservoir feature of MP3, will lead to loss of information or distortion as soon as complicated sounds last for more than a few milliseconds.
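The link between bit depth and quantization noise can be illustrated with a minimal sketch. It quantizes a full-scale sine across the whole band rather than per frequency band, so it is not how an MP3 or ATRAC encoder works internally. The function names, the 997 Hz test tone and the rounding scheme are assumptions chosen for illustration.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to a signed grid with `bits` bits."""
    levels = 2 ** (bits - 1) - 1      # e.g. 127 steps per polarity for 8 bits
    return round(x * levels) / levels

def snr_db(bits, n=44100, freq=997.0, rate=44100.0):
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2.0 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x   # quantization error of this sample
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical comparison: fewer bits means a higher noise floor.
for bits in (16, 8, 4):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each bit removed raises the noise floor by roughly 6 dB, which is why coarser quantization in a band shows up as extra noise in the spectrogram.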

This is the spectrogram of the original sound file. Listen to the original soundfile.

This is the spectrogram of the compressed MP3 sound file. Listen to the compressed soundfile (decoded back into .wav file format) and the compressed .mp3 soundfile.

		This is a single spectrum taken from the spectrogram at t=3.34 sec (uncompressed file)
		This is a single spectrum taken from the spectrogram at t=3.34 sec (compressed MP3 file). The spurious signal components ranging from 5 to 12 kHz have a maximum amplitude of -28 dB (relative to the peak amplitude of the original signal). Theoretically, a bit depth of only about 5 bits (28 dB / 6 dB per bit ≈ 4.7) would be sufficient to represent that worst-case situation.
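The rule of thumb behind that estimate (roughly 6 dB of amplitude range per bit) can be checked with a short calculation. The helper names below are hypothetical and only serve this illustration.

```python
import math

# Doubling the amplitude range adds one bit, i.e. 20*log10(2) dB.
DB_PER_BIT = 20.0 * math.log10(2.0)

def bits_for_range(db):
    """Number of bits whose amplitude range spans `db` decibels."""
    return db / DB_PER_BIT

def range_for_bits(bits):
    """Amplitude range in dB covered by `bits` bits."""
    return bits * DB_PER_BIT

noise_level_db = 28.0                              # worst-case level quoted above
print(round(bits_for_range(noise_level_db), 2))    # fractional number of bits
print(math.ceil(bits_for_range(noise_level_db)))   # whole bits needed
```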

This is the spectrogram of the test signal after passing through a MiniDisk recorder employing ATRAC 4.5. It reveals more dramatic artifacts than the MP3 example: the constant sine signals at 16 and 17 kHz temporarily disappear completely, and the spurious noise surrounding the rapidly frequency-modulated sound structures is considerably stronger at some locations. Listen to the compressed soundfile (decoded back into .wav file format).

		This is a single spectrum taken from the original test signal (uncompressed file)
		This is a single spectrum taken from the test file at the same location, but after passing through MiniDisk (worst-case situation at t=1.374 sec).

See also the spectrogram of the 8 bit version of the original soundfile. Even with a very poor resolution of 8 bit (dynamic range of 42 dB), the spectrogram is much more precise than that of the compressed 16 bit version!

		This is a high-pitched synthesized mouse call. Thanks to the synthetic generation, there are no environmental disturbances such as reverberation or other background noise, which are often present in field recordings.
		This is the compressed version of the above call. The additional noise caused by the encoder is clearly visible. The bit reservoir feature of MP3 was disabled in this example.
		A simple digital reverberation filter was applied to the original uncompressed sound file in order to simulate a poor field recording.* Even without the influence of a lossy data reduction system, an emitted signal may be degraded by other effects. However, these effects represent the acoustic properties of the environment and are not artifacts introduced by the technology used. * Such effects cause 'poor' recordings only in terms of the spectrographic representation. To our ears, these effects are not necessarily unpleasant, because they may provide additional information on the acoustic environment (e.g. on the leaves in a forest).
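The page does not say which reverberation filter was applied. As a minimal sketch of what a simple digital reverberation can look like, the following uses a single feedback comb filter. The filter type, the function name and the parameter values are all assumptions.

```python
def comb_reverb(samples, delay, feedback=0.5):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    out = []
    for n, x in enumerate(samples):
        # Feed a delayed copy of the output back into the signal.
        echo = out[n - delay] if n >= delay else 0.0
        out.append(x + feedback * echo)
    return out

# Hypothetical test input: a single click followed by silence.
impulse = [1.0] + [0.0] * 9
print(comb_reverb(impulse, delay=3, feedback=0.5))
```

A single click turns into a decaying train of echoes. In a spectrogram this smears short sound elements along the time axis, much like reverberation in a real environment.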

The following two spectrograms represent natural bird sounds that were recorded with both DAT (Tascam DA-P1) and MiniDisk (HHB Portadisk with ATRAC 4.5) recorders. The microphone signal was fed simultaneously into both recorders. This test was conducted by Jeremy Minns, Brazil. Listen to the underlying soundfile. The artifacts visible on the spectrogram at t=6.3 sec can also be recognized when listening carefully via headphones.

DAT recording

MiniDisk recording

The amount of distortion added by the encoder depends heavily on the frequency content of the recorded signals. Band-limited signals carry less information and are therefore easier to encode. To illustrate this effect, the original artificial sound file has been low-pass filtered with a cut-off frequency of 8.5 kHz (spectrogram and soundfile of the uncompressed file). The remaining signals below 8.5 kHz are reproduced much better (spectrogram and soundfile of the compressed file).
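The low-pass filter used for this test is not specified. One common way to band-limit a signal at 8.5 kHz is a windowed-sinc FIR filter, sketched below. The filter design, the number of taps and the function names are assumptions, not the settings used for the files on this page.

```python
import math

def lowpass_taps(cutoff_hz, rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR coefficients, unity gain at DC."""
    fc = cutoff_hz / rate_hz              # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc              # limit of the sinc term at its centre
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]      # normalise so that DC passes unchanged

def gain_db(taps, freq_hz, rate_hz):
    """Magnitude response of the FIR filter at one frequency, in dB."""
    w = 2.0 * math.pi * freq_hz / rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20.0 * math.log10(math.hypot(re, im))

taps = lowpass_taps(8500.0, 44100.0)
for f in (1000.0, 8500.0, 17000.0):
    print(f, "Hz:", round(gain_db(taps, f, 44100.0), 1), "dB")
```

Components well below the cut-off pass almost unchanged, while the 16 and 17 kHz tones of the test signal would be strongly attenuated before they ever reach the encoder.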

Care should be taken even when transferring digital data between a digital recorder and the PC. This example shows the spectrogram of a resampled and compressed soundfile that was digitally transferred onto a MiniDisk recorder and transferred back to the PC via S/PDIF. Listen to both the original resampled signal on the left channel and the compressed soundfile on the right channel. Apparently, the real-time sample-rate conversion of the S/PDIF interface from 44.1 to 48 kHz introduced the additional artifacts at the higher frequencies (especially those above 22 kHz). Under these circumstances it would be better to transfer the data via the analog path, even if some (negligible) additional quantization noise were introduced.
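The conversion from 44.1 to 48 kHz is awkward because the two rates are not related by a small integer ratio, so nearly every output sample has to be interpolated between input samples. The sketch below computes the exact ratio and applies a deliberately crude linear-interpolation resampler. Real sample-rate converters use far better interpolation filters, and the function names here are hypothetical.

```python
from fractions import Fraction

def conversion_ratio(src_rate, dst_rate):
    """Exact output/input sample ratio as a reduced fraction."""
    return Fraction(dst_rate, src_rate)

def resample_linear(samples, src_rate, dst_rate):
    """Crude sample-rate conversion by linear interpolation."""
    step = Fraction(src_rate, dst_rate)   # input samples per output sample
    out = []
    pos = Fraction(0)
    while pos <= len(samples) - 1:
        i = int(pos)                      # index of the left neighbour
        frac = float(pos - i)             # fractional distance to the right one
        right = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + right * frac)
        pos += step
    return out

print(conversion_ratio(44100, 48000))
print(resample_linear([0.0, 1.0], 1, 2))
```

Because the reduced ratio has large terms, the interpolation positions only line up with the input samples again after many samples, which is why imperfect real-time converters can leave visible artifacts.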

Methods

The synthetic test signal was generated using Avisoft-SASLab Pro, version 4.21, and saved as an uncompressed mono 16 bit/44.1 kHz .wav file. The data rate of this uncompressed file is 705.6 kbit/s. This file was then converted into an .mp3 file using LAME V3.93.1 (MP3DEV.ORG). The default settings were used (bitrate of 128 kbit/s, bit reservoir not disabled, ...). The bitrate of 128 kbit/s for mono files (256 kbit/s in stereo) corresponds to the 5:1 compression of comparable ATRAC systems (the compression ratio in the MP3 sample is 5.38 : 1). The .mp3 file was then converted back to a .wav file using the LAME decoder.
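The data rate of the uncompressed file follows directly from its format, as the short calculation below shows. The helper name is hypothetical. The nominal ratio it prints is slightly higher than the 5.38 : 1 quoted above, which was presumably derived from the actual file sizes (that reading is an assumption).

```python
def pcm_bitrate_kbit(bits_per_sample, sample_rate_hz, channels=1):
    """Data rate of uncompressed PCM audio in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

pcm = pcm_bitrate_kbit(16, 44100)     # mono, 16 bit, 44.1 kHz
mp3 = 128.0                           # nominal MP3 bitrate in kbit/s

print(pcm)                            # uncompressed data rate
print(round(pcm / mp3, 2))            # nominal compression ratio
```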

In the MiniDisk (ATRAC) example, the original test file was digitally recorded onto a Sony MDS-JE520 MiniDisk deck (ATRAC 4.5). The recorded test signal was then played back to the PC using the same equipment. Please note that this is not an evaluation of the equipment used. It is only meant to demonstrate the horrific effects that may occur in extreme situations.

The spectrograms of both the compressed and the original file were made in Avisoft-SASLab Pro. The spectrogram parameters were: FFT length = 256, Frame size = 100%, Window = FlatTop, Overlap = 87.5%. This configuration provides an analysis bandwidth of 650 Hz and a temporal resolution of 1.5 milliseconds.
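These parameters can be related to each other with a little arithmetic, assuming the 44.1 kHz sample rate of the test file. The function name is hypothetical. Reading the quoted 1.5 ms as the reciprocal of the 650 Hz bandwidth, rather than as the step between successive frames, is an interpretation and not something the page states.

```python
def spectrogram_resolution(rate_hz, fft_length, overlap):
    """Return (bin spacing in Hz, hop in samples, hop in ms)."""
    bin_spacing_hz = rate_hz / fft_length
    hop_samples = fft_length * (1.0 - overlap)    # step between successive frames
    hop_ms = 1000.0 * hop_samples / rate_hz
    return bin_spacing_hz, hop_samples, hop_ms

bin_hz, hop, hop_ms = spectrogram_resolution(44100.0, 256, 0.875)
print(round(bin_hz, 1), hop, round(hop_ms, 3))

# Width of the quoted 650 Hz analysis bandwidth, expressed in FFT bins.
print(round(650.0 / bin_hz, 2))

# Reciprocal of the quoted bandwidth, in milliseconds.
print(round(1000.0 / 650.0, 2))
```

The quoted bandwidth spans several FFT bins, which is consistent with the wide main lobe of a flat-top window.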

Links

MP3 Tech (see the overview of the MP3 techniques)
Fraunhofer Institute, which developed the MPEG audio technology
ATRAC : Adaptive Transform Acoustic Coding for MiniDisc
Tutorial CD-ROM on coding artifacts, published by the Audio Engineering Society