Lossy Audio Data Compression Effects

This page demonstrates the distortion that can be added to a soundfile when it is saved with lossy data compression. By listening alone, it is nearly impossible to distinguish between the original and the compressed version of the test file. The spectrograms, however, show significant distortions (additional noise) around rapidly modulated sounds. Soft sounds (the constant sine signal at 17 kHz) are degraded as well. Additional background noise seems to have no major effect on the spectrographic representation, because the potential spurious noise is buried in that original masking noise. It should also be noted that the occurrence of the spurious noise is very unpredictable (compare the zig-zag shaped signal ranging from 1.13 to 1.56 s with the identical one from 3.1 to 3.54 s).

In the first example, the MP3 system was used. There are differences between the various lossy compression systems (including the ATRAC system employed in MiniDisc recorders), but the principle is always the same. If the available bit rate is not sufficient for encoding a given signal, the data reduction algorithm has to remove those parts of the sound that are inaudible or less important for human perception. This is done by reducing the bit depth in some frequency bands, which may produce additional spurious (quantization) noise in the decoded signal. Even more sophisticated mechanisms, such as the bit reservoir feature of MP3, will lead to loss of information or distortion as soon as complicated sounds last for more than a few milliseconds.

This is the spectrogram of the original sound file. Listen to the original soundfile.

This is the spectrogram of the compressed MP3 sound file. Listen to the compressed soundfile (decoded back into .wav file format) and the compressed .mp3 soundfile.
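The link between reduced bit depth and added quantization noise can be illustrated with a minimal Python sketch. It is not the MP3 or ATRAC algorithm: a real encoder requantizes individual frequency bands under a psychoacoustic model, whereas this sketch simply requantizes a whole signal uniformly. The test tone, the function names and the chosen bit depths are assumptions made for illustration only.

```python
import math

def quantize(samples, bits):
    """Round samples in [-1, 1] to a uniform grid with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]

def snr_db(original, processed):
    """Signal-to-noise ratio in dB, treating the difference as noise."""
    signal_power = sum(a * a for a in original) / len(original)
    noise_power = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    return 10.0 * math.log10(signal_power / noise_power)

SAMPLE_RATE = 44100  # Hz, same rate as the test file on this page

# Hypothetical test tone (not the signal used on this page): 1 kHz sine, 0.1 s long
tone = [0.9 * math.sin(2 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(4410)]

# Fewer bits means a coarser grid and therefore more quantization noise
for bits in (16, 12, 8, 4):
    print(bits, "bits:", round(snr_db(tone, quantize(tone, bits)), 1), "dB SNR")
```

Each bit removed lowers the signal-to-noise ratio by roughly 6 dB. In a perceptual encoder the same effect is confined to the bands whose bit depth was reduced, which is where the spurious noise shows up in the spectrograms.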
This is the spectrogram of the test signal after passing through a MiniDisc recorder employing ATRAC 4.5. Listen to the compressed soundfile (decoded back into .wav file format). The spectrogram reveals more dramatic artifacts than the MP3 example: the constant sine signals at 16 and 17 kHz temporarily disappear completely, and the spurious noise surrounding the fast frequency-modulated sound structures is considerably stronger at some locations.
See also the spectrogram of the 8 bit version of the original soundfile. Even at a very poor resolution of 8 bit (dynamic range of 42 dB), the spectrogram is much more precise than that of the compressed 16 bit version!
The following two spectrograms represent natural bird sounds that were recorded with both a DAT recorder (Tascam DA-P1) and a MiniDisc recorder (HHB Portadisk with ATRAC 4.5). The microphone signal was fed simultaneously into both recorders. This test was conducted by Jeremy Minns, Brazil. Listen to the underlying soundfile. The artifacts visible on the spectrogram at t = 6.3 s can also be recognized when listening carefully via headphones.

DAT recording

MiniDisc recording

The amount of distortion added by the encoder depends heavily on the frequency content of the recorded signals. Band-limited signals carry less information and are therefore easier to encode. To illustrate this effect, the original artificial sound file was low-pass filtered with a cut-off frequency of 8.5 kHz (spectrogram and soundfile of the uncompressed file). The remaining signals below 8.5 kHz are reproduced much better (spectrogram and soundfile of the compressed file).

Care should be taken even when transferring digital data between a digital recorder and the PC. This example shows the spectrogram of a resampled and compressed soundfile that was digitally transferred onto a MiniDisc recorder and transferred back to the PC via S/PDIF. Listen to both the original resampled signal on the left channel and the compressed soundfile on the right channel. Evidently, the real-time sample-rate conversion from 44.1 to 48 kHz at the S/PDIF interface introduced the additional artifacts at the higher frequencies (especially those above 22 kHz). Under these circumstances it would be better to transfer the data via the analog path, even if some (negligible) additional quantization noise is introduced.

Methods

The synthetic test signal was generated using Avisoft-SASLab Pro, version 4.21, and saved as an uncompressed mono 16 bit/44.1 kHz .wav file. The data rate of this uncompressed file is 705 kbit/s. This file was then converted into an .mp3 file using LAME V3.93.1 (MP3DEV.ORG).
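The data rate quoted for the uncompressed file follows directly from its sample format. A minimal Python sketch of that arithmetic, with a hypothetical helper name:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

# Mono, 16 bit, 44.1 kHz, as used for the test file
rate = pcm_bit_rate(44100, 16, 1)
print(rate / 1000.0, "kbit/s")  # 705.6 kbit/s
```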
The default settings were used (bitrate of 128 kbit/s, bit reservoir not disabled, ...). The bitrate of 128 kbit/s for mono files (256 kbit/s in stereo) corresponds to the 5:1 compression of comparable ATRAC systems (the compression ratio in the MP3 sample is 5.38:1). The .mp3 file was then converted back to a .wav file using the LAME decoder.

In the MiniDisc (ATRAC) example, the original test file was digitally recorded onto a Sony MDS-JE520 MiniDisc deck (ATRAC 4.5). The recorded test signal was then played back to the PC using the same equipment. Please note that this is not an evaluation of the equipment used. It is only intended to demonstrate the horrific effects that may occur in extreme situations.

The spectrograms of both the compressed and the original file were made in Avisoft-SASLab Pro. The spectrogram parameters were: FFT-Length = 256, Frame size = 100%, Window = FlatTop, Overlap = 87.5%. This configuration provides an analysis bandwidth of 650 Hz and a temporal resolution of 1.5 milliseconds.

Links

MP3 Tech (go to "Overview of the MP3 techniques")
Fraunhofer Institute (developed the MPEG audio technology)
ATRAC: Adaptive Transform Acoustic Coding for MiniDisc
Tutorial CD-ROM on coding artifacts, published by the Audio Engineering Society

Last modified on 18 August 2003, Raimund Specht