Audio file formats flow effortlessly around the Internet - that's their charm and their challenge. But which is best?
The file-naming limitations of the Microsoft DOS operating system helped name the prevalent digital music format of our time. Karlheinz Brandenburg and colleagues at the Fraunhofer Institute in Germany were putting the finishing touches on their psychoacoustic compression scheme and needed a name for it.
'At that time, file extensions were limited to three letters,' Brandenburg recalled at the Audio Engineering Society Convention in London earlier this year. 'On 14 July 1995, we started to use the extension '.mp3' for our software. If there is an official birthday for MP3, that is it.'
Not long after the initial release of the reference encoder, Fraunhofer fell victim to something that MP3 has repeatedly been blamed for since: Internet piracy. Brandenburg recalls: 'A student from Australia using a stolen credit card number took the software and put it out as "freeware", saying "thanks to Fraunhofer".
'We did find who it was in the end,' Brandenburg adds.
The first MP3 download websites were online by the end of 1996. 'They were mostly for the illegal distribution of music,' says Brandenburg.
At first, the Fraunhofer researchers had difficulty persuading anyone outside the small community of researchers working on audio coding to accept the idea of a high-quality compression scheme.
'The work started originally in 1982 when Dieter Seitzer came up with an idea for transmitting music over ISDN - a digital telephony standard limited to 128kb/s. Seitzer talked to a patent examiner about the concept.
'The patent examiner said "there is no high-quality music at 128kb/s",' says Brandenburg. 'In the early days I thought the patent examiner was right. It was not possible.
'But about the same time a lot of groups started doing this kind of work. A lot of people had similar ideas. The question was: how to compress to one-tenth of the original bitrate? Information theory tells us there has to be a loss of information. The way to deal with this is to make sure we are not able to hear that loss of information.
'The basic idea is now well known: it's called masking. Louder sounds mask weaker sounds. And it's the same for everybody.'
Brandenburg plays samples of masking effects and points to what he calls the '13dB miracle'. The masking threshold is, on average, 13.6dB below the louder sound.
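To put the 13.6dB figure in more familiar terms, a decibel offset can be converted into a plain ratio. The sketch below is only an illustration of that standard conversion; the function and constant names are made up for this example and do not come from Fraunhofer's work. It suggests that a sound sitting at the average masking threshold has roughly a fifth of the amplitude of the louder sound that hides it.

```python
# Convert a level difference in decibels into linear ratios.
# Names here are illustrative, not taken from any MP3 encoder.

def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure) ratio for a level difference in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power (intensity) ratio for a level difference in dB."""
    return 10 ** (db / 10)

# Average masking threshold quoted in the article: 13.6dB below the masker.
MASKING_OFFSET_DB = -13.6

amplitude_ratio = db_to_amplitude_ratio(MASKING_OFFSET_DB)
power_ratio = db_to_power_ratio(MASKING_OFFSET_DB)

print(f"amplitude ratio: {amplitude_ratio:.3f}")
print(f"power ratio: {power_ratio:.4f}")
```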
Psychoacoustic theory describes the human hearing system as a set of band-pass filters, each of which covers an area of the audio spectrum called a critical band. Each of these critical bands corresponds to a separate area on the basilar membrane in the inner ear. Two independent tones that lie within the same critical band will often be perceived as a single sound; two tones that are not within the same critical band are perceived as two separate tones, and our brains hear the combination of the two as a louder overall signal.
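One common way to put numbers on critical bands is the Bark scale, which maps frequency onto a scale where each unit is about one critical band wide. The sketch below uses the Zwicker and Terhardt approximation, which is not mentioned in the article and is an assumption of this example, as is the rough rule that two tones less than one Bark apart share a band. Real critical bands have no fixed edges, so treat this as an illustration only.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def probably_same_critical_band(f1_hz, f2_hz, width_bark=1.0):
    """Rough proxy: tones closer than about one Bark are treated as sharing a band."""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) < width_bark

# Two tones 50Hz apart around 1kHz versus two tones an octave apart.
print(probably_same_critical_band(1000, 1050))
print(probably_same_critical_band(1000, 2000))
```

With this approximation the first pair lands well within one Bark, while the second pair is several Bark apart.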
The masking effect means that it is possible to encode quiet signals at a much lower level of resolution than would be needed to make the louder signal sound undistorted.
The key to the MP3 format is to split the audio signal into multiple frequency bands, work out how many bits to allocate to each, up to a limit set by the user, and then use more conventional data-compression techniques to squeeze the bitstream further.
At conferences, recording engineer George Massenburg plays the difference signal between the uncompressed audio and the decoded MP3 version to demonstrate how much information is lost even with high-bitrate compression. However, the important factor is not the information that is lost but whether it is detectable - and at which level of compression.
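A difference signal of this kind is simply the original samples minus the decoded ones. The sketch below shows the idea under a loud assumption: it does not run a real MP3 codec, but stands in for one by quantising a test tone to 8 bits, and all the names are invented for the example. It then reports how far the residual sits below the original in level.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def crude_lossy_round_trip(samples, bits):
    """Stand-in for encode/decode: uniform quantisation, NOT a real MP3 codec."""
    step = 2.0 / (2 ** bits)  # full scale assumed to be -1.0 to 1.0
    return [round(v / step) * step for v in samples]

SAMPLE_RATE_HZ = 44_100
# A tenth of a second of a 440Hz test tone.
original = [0.8 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
            for n in range(SAMPLE_RATE_HZ // 10)]
decoded = crude_lossy_round_trip(original, bits=8)

# The difference signal: everything the lossy round trip changed.
residual = [a - b for a, b in zip(original, decoded)]
residual_level_db = 20 * math.log10(rms(residual) / rms(original))

print(f"residual is {residual_level_db:.1f}dB relative to the original")
```

Heard on its own, such a residual can sound alarming; whether it is audible underneath the music it was removed from is a separate question.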
Demonstrations of the poor quality of MP3 tend to focus on bitrates at or below Seitzer's target of ISDN bandwidth. A 2006 study by the University of Milan and STMicroelectronics used both subjective listening tests and more objective measurements designed to find known compression artefacts, such as 'birdies' - squeaking in heavily compressed files - and pre-echoes, in which the sharp attack sounds of instruments such as castanets are smeared so badly they can be heard before they are actually hit.
The objective tests found no practical difference above 200kb/s, a finding largely backed up by the listening tests, although listeners did not reliably fail to hear a difference until the bitrate reached 320kb/s.
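For a sense of scale, these bitrates can be compared with uncompressed audio. The sketch below assumes the original is CD-format stereo at 44.1kHz and 16 bits per sample, which the article does not state explicitly; the names are illustrative. Under that assumption, 128kb/s works out at about 11:1, in line with the one-tenth target Brandenburg describes, while 200kb/s and 320kb/s are closer to 7:1 and 4.4:1.

```python
# Assumed source format: CD-quality stereo PCM (an assumption of this example).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

def cd_bitrate_kbps():
    """Uncompressed CD bitrate in kb/s (1kb = 1,000 bits)."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000

def compression_ratio(target_kbps):
    """How many times smaller the compressed stream is than CD audio."""
    return cd_bitrate_kbps() / target_kbps

for kbps in (128, 200, 320):
    print(f"{kbps}kb/s is about {compression_ratio(kbps):.1f}:1")
```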
Other, more sophisticated schemes have followed MP3. Brandenburg explains: 'MP3 was a compromise. In 1994, some companies decided they could do it better. We said if there is to be competition we want to be part of it.'
The result was the Advanced Audio Coding (AAC) system that, in principle, could halve the bitrate for the same quality. It took advantage of higher compute power to increase the number of frequency bands used for analysing the audio signal, although the overall approach to perceptual coding remains the same for AAC as well as for the open-source Vorbis codec.
Although the Internet has made it possible to distribute any number of audio formats, from highly compressed AAC through to surround sound at 192kHz and 24-bit resolution, Brandenburg believes the problem is now one of choice: 'I meet more and more people who have terabytes of music at home. The question is how to get that organized. I met people ten years ago who said everything will be streamed. But we still have the genes of hunter-gatherers who collect for listening at home.'
But when they collect all that music, they are confronted with the problem of choosing what to listen to. Fraunhofer's researchers are looking at the problem of music classification so that people can have a computer pick something that fits their mood or taste that day.
But determining how well a piece of music fits a genre is tricky. 'To the algorithms, The Beatles' "Ob-La-Di, Ob-La-Da" sounds like a children's song,' says Brandenburg. However, few Beatles fans would want 'Barney the Purple Dinosaur' to follow a favourite tune from the Fab Four.