Digital & Analogue Recording
Sunday, 16. September 2007, 01:26:01
Sound is an impulse: an impulse of energy, in this case the energy of moving air or of some other resonant material. Vitality, energy, harmony - all that the West treats separately, Chinese calls qi, the life essence itself. Music is a direct incarnation, a reflection, of that.
Humans usually describe the charge, potential or power of a sound impulse in terms of frequency (pitch) and amplitude (loudness).
Digital processing involves converting an analogue image (or sound) into a bit-coded representation. A single bit has only two states: on or off (true/false, yes/no, 1/0, etc.). Assigning more bits increases the complexity that can be represented, since n bits give 2^n possible states: several bits can encode the position of a character in a mapped alphabet, other bits the position of that character within a line, other bits how the character is drawn on screen, and so on.
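As a small illustration of the bit arithmetic above, here is a sketch in Python: n bits give 2**n states, and a character's position in a mapped alphabet is just a bit pattern. ASCII is assumed here as the mapping; the function names are illustrative only.

```python
def states(bits: int) -> int:
    """Number of distinct states that `bits` binary digits can represent."""
    return 2 ** bits


def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern of a character's position in the ASCII table."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")


print(states(1))          # 2: a single bit is on or off
print(states(8))          # 256: one byte
print(char_to_bits("A"))  # 01000001: position 65 in ASCII
```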
In digital representation, a whole picture or a fragment of sound is broken down into the smallest possible fragments - quanta. In imagery, the smallest component of a picture is the pixel; in sound, it is the sample. A pixel is the smallest square dot of a bitmapped image; a sample is a single measurement of the sound wave's amplitude at one instant, and a sequence of samples taken at a fixed rate traces the wave's movement from one state to the next. The more samples per second, the higher the frequency of sound wave that can be captured, the higher the resolution (and precision) of the digital waveform, and the more natural the sound.
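To make "a sample is one measurement at one instant" concrete, the sketch below measures a sine wave at evenly spaced times and counts how many samples describe one cycle. The 1 kHz and 10 kHz test tones are arbitrary examples; 44,100 Hz is the CD rate discussed later.

```python
import math


def sample_sine(freq_hz: float, rate_hz: int, count: int) -> list[float]:
    """Measure a unit-amplitude sine wave at `count` evenly spaced instants.

    Each returned value is one sample: the wave's amplitude at time n / rate_hz.
    """
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


def samples_per_cycle(freq_hz: float, rate_hz: int) -> float:
    """How many samples describe one cycle of the wave at this rate."""
    return rate_hz / freq_hz


interval_us = 1e6 / 44100                         # time between CD-rate samples, in microseconds
print(round(interval_us, 2))                      # 22.68
print([round(v, 3) for v in sample_sine(1000, 44100, 5)])
print(samples_per_cycle(1000, 44100))             # 44.1 samples per cycle of a 1 kHz tone
print(samples_per_cycle(10000, 44100))            # only 4.41 for a 10 kHz tone
```

Higher-frequency waves get fewer samples per cycle, which is why the sampling rate bounds the highest frequency that can be captured.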
In both digital audio and imaging, the number of values a sample or pixel can take is determined by the bit depth. 16-bit means only 2^16 = 65,536 different values are possible, whether for the red, green or blue channel of an image or for the amplitude of an audio sample.
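A minimal sketch of what bit depth means for a single audio sample. It assumes one common convention (scale a value in [-1.0, 1.0] by 2**(bits-1), round, then clamp to the signed range); real converters and file formats may scale slightly differently.

```python
def levels(bits: int) -> int:
    """Number of distinct values a sample of this bit depth can hold."""
    return 2 ** bits


def quantize(x: float, bits: int = 16) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed integer sample value.

    Values are rounded to the nearest step and clamped to the signed range
    [-2**(bits-1), 2**(bits-1) - 1], e.g. -32768..32767 for 16-bit audio.
    """
    scale = 2 ** (bits - 1)
    value = round(x * scale)
    return max(-scale, min(scale - 1, value))


print(levels(16), levels(24))  # 65536 16777216
print(quantize(0.5))           # 16384
print(quantize(1.0))           # 32767 after clamping
print(quantize(0.00001))       # 0: detail finer than one step is lost
```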
Any transform carried out within the original 16-bit mathematical space introduces rounding errors and other distortion. If a computed sample or pixel value falls between the points of the existing value grid (or colour space), it has to be rounded to the nearest available value. In straight, unsmoothed transforms, these rounding errors follow the signal itself, leaving stray samples across the soundwave and introducing short spikes across the bandwidth. Dither - a small amount of very low-level (below -70 dB) noise added before quantising to 16-bit - helps solve this problem: it randomises the rounding, so the error no longer tracks the signal, and even detail smaller than one quantisation step survives within the noise instead of being rounded away. Introducing minimal noise thus trades distortion for a constant, far less objectionable noise floor.
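Dither can be shown with a toy example. The sketch below is a hypothetical setup measured in quantisation steps (LSBs) rather than real 16-bit audio: it quantises a sine wave whose peak is only 0.4 of a step. Without dither every sample rounds to zero; with triangular (TPDF) dither added before rounding, the average of many quantised passes still traces the sine. The averaging only makes the effect visible; a real recording is quantised once, and the quiet detail survives inside the noise.

```python
import math
import random


def quantize(x: float) -> int:
    """Round a value measured in LSBs (quantisation steps) to the nearest step."""
    return round(x)


def tpdf_dither() -> float:
    """Triangular-PDF dither: the sum of two independent uniform values, spanning +/-1 LSB."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)


def quiet_sine(n: int, length: int, amplitude_lsb: float = 0.4) -> float:
    """One period of a sine wave whose peak is smaller than one quantisation step."""
    return amplitude_lsb * math.sin(2 * math.pi * n / length)


random.seed(1)
LENGTH = 64
TRIALS = 2000

# Plain rounding: the whole wave falls below half a step and vanishes
undithered = [quantize(quiet_sine(n, LENGTH)) for n in range(LENGTH)]

# Dithered rounding, averaged over many passes to expose what survives
averaged = []
for n in range(LENGTH):
    x = quiet_sine(n, LENGTH)
    total = sum(quantize(x + tpdf_dither()) for _ in range(TRIALS))
    averaged.append(total / TRIALS)

print(set(undithered))                   # {0}
print(round(averaged[LENGTH // 4], 2))   # near the sine's true peak of 0.4
```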
This is why the minimum bit depth for any serious audio processing work is 24 bits per sample, which gives 2^24 = 16,777,216 possible values. In digital imagery the same figure usually means something different: "24-bit colour" is 8 bits for each of red, green and blue, again giving 16.7+ million colours per pixel, and an additional 8 bits are often reserved for an alpha channel, for a total of 32 bits per pixel.
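For the imaging side, a sketch of how three 8-bit colour channels plus an 8-bit alpha channel fit into one 32-bit pixel. The RGBA byte order is an assumption; real formats also use ARGB, BGRA and others.

```python
def pack_rgba(r: int, g: int, b: int, a: int = 255) -> int:
    """Pack four 8-bit channels into one 32-bit pixel value (RGBA byte order assumed)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel: int) -> tuple[int, int, int, int]:
    """Split a 32-bit RGBA pixel back into its four 8-bit channels."""
    return (pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


print(2 ** 24)                      # 16777216 colours from three 8-bit channels
print(hex(pack_rgba(255, 128, 0)))  # 0xff8000ff: opaque orange
```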
In audio there is no alpha channel, so 32-bit means, well, 32 bits per sample (in processing, usually as floating-point numbers). Many current plugins already process internally at 64 bits, to avoid even the small rounding errors that remain possible with 32-bit processing.
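Why 64-bit processing helps can be sketched with a running sum, the kind of accumulation that happens inside mixers and filters. This is a simplified stand-in, not how any particular plugin works: 32-bit floats are simulated by rounding every intermediate result through Python's `struct` module.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum(value: float, count: int, single_precision: bool) -> float:
    """Add `value` to an accumulator `count` times.

    With single_precision=True, every intermediate result is rounded to 32 bits.
    """
    total = 0.0
    step = to_float32(value) if single_precision else value
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)
    return total


exact = 10_000.0
error32 = abs(running_sum(0.1, 100_000, single_precision=True) - exact)
error64 = abs(running_sum(0.1, 100_000, single_precision=False) - exact)
print(error32, error64)  # the 32-bit error is many orders of magnitude larger
```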
Digital audio itself stores a sound as a sequence of samples over time, not as a Fourier transform, but any recorded waveform can be analysed with a Fourier transform. Any periodic sound can be represented as a sum of sine waves whose frequencies form a harmonic series:
"Any non-sinusoidal waveforms, such as square waves or even the irregular sound waves made by human speech, can be represented as a collection of sinusoidal waves of different periods and frequencies blended together. The technique of transforming a complex waveform into its sinusoidal components is called Fourier analysis."
"According to the Fourier theorem, any periodic waveform may be analyzed as the sum of a series of sine waves with frequencies in a harmonic series, each of which has an amplitude and phase angle given by the Fourier coefficients. Since a sine wave has only a single frequency associated with it, it may be considered the simplest sound."
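The Fourier analysis quoted above can be sketched with a direct discrete Fourier transform: build one period of a tone from a few harmonics, then recover each harmonic's amplitude and phase. The partial amplitudes are made up for the example, and the amplitude formula holds for harmonics below half the number of samples.

```python
import cmath
import math


def harmonic_analysis(x: list[float], max_harmonic: int) -> dict[int, tuple[float, float]]:
    """Return {k: (amplitude, phase)} for harmonics 1..max_harmonic of one period x.

    A direct (slow, O(N * K)) discrete Fourier transform, fine for illustration.
    """
    n_total = len(x)
    result = {}
    for k in range(1, max_harmonic + 1):
        coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total))
        result[k] = (2 * abs(coeff) / n_total, cmath.phase(coeff))
    return result


# A hypothetical periodic tone: a fundamental plus two weaker harmonics
N = 64
partials = {1: 1.0, 2: 0.5, 3: 0.25}
one_period = [
    sum(amp * math.sin(2 * math.pi * k * n / N) for k, amp in partials.items())
    for n in range(N)
]

for k, (amp, _phase) in harmonic_analysis(one_period, 4).items():
    print(k, round(amp, 3))  # recovers 1.0, 0.5, 0.25, and ~0 for harmonic 4
```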
An ideal Fourier series can contain infinitely many harmonics, so ideal digital audio equipment would need infinite frequency resolution. Ironically, equipment with no fixed sampling grid had been invented, and used for years, before digital audio equipment. Magnetic tape stores a waveform as a pattern of magnetisation, which is picked up by the deck's electromagnetic head. Effective frequency reproduction tends to decline past ~16000 Hz, but tape has no fixed sampling period or sample bit depth: its resolution is continuous, limited instead by tape noise and by the physical properties of the entire recording chain. The technology mirrors what's coming in as an electric analogue, much in the same way photography mirrors what's coming through a camera's lens. Unlike digital media, the electromagnetic analogue method also adds harmonics of its own - overtones - to the signal, much as a moving ship leaves a trail in the water. These overtones tend to make analogue recordings sound warmer and more consistent than digital recordings, especially as the human ear does not pick up just a single sound wave, but is affected by the secondary harmonics of the "sound trail" - the entire body of sound changes around a musician playing live. In this way, analogue recordings can carry more vital energy than digital recordings: overtones provide additional feedback that adds to the original recording, amplifying its energy.
Valve amplifiers also add overtones when distorted or overdriven: single-ended valve stages in particular produce mostly even harmonics, whereas the hard clipping of overdriven transistor stages and digital converters produces mostly odd, harsher harmonics. Some currently manufactured audio equipment also includes analogue input limiters (Edirol UA-101, etc.) that cure the problem of "digital zero": when a digital recorder's input reaches full scale (0 dBFS), the wave is clipped and harsh odd-harmonic distortion occurs.
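The even/odd distinction can be sketched by distorting a pure sine two ways and measuring its 2nd and 3rd harmonics. Symmetric hard clipping stands in for digital or transistor clipping; the lopsided curve `v + 0.2 * v * v` is a hypothetical stand-in for a single-ended stage's asymmetric transfer curve, not a model of any real valve.

```python
import cmath
import math

N = 64  # samples in one period of the test tone


def harmonic_amplitude(x: list[float], k: int) -> float:
    """Amplitude of harmonic k in one period x, via a single DFT bin (valid for 1 <= k < len(x) / 2)."""
    n_total = len(x)
    coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total))
    return 2 * abs(coeff) / n_total


def hard_clip(v: float, limit: float = 0.5) -> float:
    """Symmetric hard clipping, as when a signal hits full scale."""
    return max(-limit, min(limit, v))


def lopsided(v: float) -> float:
    """Hypothetical asymmetric transfer curve."""
    return v + 0.2 * v * v


sine = [math.sin(2 * math.pi * n / N) for n in range(N)]
clipped = [hard_clip(v) for v in sine]
bent = [lopsided(v) for v in sine]

# hard clip: 2nd harmonic ~0, 3rd clearly present; lopsided: the reverse
for name, wave in (("hard clip", clipped), ("lopsided", bent)):
    print(name, [round(harmonic_amplitude(wave, k), 3) for k in (2, 3)])
```

Symmetric curves treat the positive and negative halves of the wave identically, so the even harmonics cancel; any asymmetry lets them through.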
CD audio is limited by its 44100 Hz sampling frequency, which can represent sound frequencies only up to 22050 Hz (half the sampling rate, the Nyquist frequency). Any harmonics of the original sound above that limit are lost, so the number of harmonics captured is rather limited; in straight words, the richness of the original recording can only be captured as a "thin-thread" approximation. Analogue recording, by contrast, works by creating an electrical analogue of the signal that is stored in some physical shape (the groove cut mechanically into vinyl disc masters, the magnetic trail on tape, punches or rivets on piano rolls, etc.). Where exactly the most precise copy is created depends on the many influences involved (microphone sensitivity, analogue-to-digital converters, signal processing, the physical properties of recording mechanisms and media, etc.). However, CD audio, by its requirement that every sound fit within a 22050 Hz bandwidth at a relatively coarse precision (65,536 possible values for each sample, above or below the time axis), cuts away part of the original bouquet of frequencies and harmonics that musical instruments create. Of course, that is a limitation of all digital sampling - every system has some upper frequency limit and some finite precision - but CD audio does sound thinner than a digital waveform recorded at a higher sampling frequency and bit depth.
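The effect of the Nyquist limit on harmonics can be counted directly. The sketch uses an idealised 1 kHz square wave, which contains only odd harmonics, as an assumed test tone, and compares the CD rate with 96 kHz.

```python
def nyquist(rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return rate_hz / 2


def surviving_harmonics(fundamental_hz: float, rate_hz: float, odd_only: bool = False) -> list[int]:
    """Harmonic numbers of a tone that lie below the Nyquist frequency of rate_hz."""
    limit = nyquist(rate_hz)
    step = 2 if odd_only else 1
    result = []
    k = 1
    while k * fundamental_hz < limit:
        result.append(k)
        k += step
    return result


# An ideal 1 kHz square wave contains only odd harmonics (1, 3, 5, ...)
cd = surviving_harmonics(1000, 44_100, odd_only=True)
hi_res = surviving_harmonics(1000, 96_000, odd_only=True)
print(nyquist(44_100))          # 22050.0
print(len(cd), cd[-1])          # 11 harmonics, the highest being the 21st
print(len(hi_res), hi_res[-1])  # 24 harmonics, up to the 47th
```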
Analogue audio relies entirely on the physical properties of the recording chain and media to transmit the original sound. This means that analogue media can be fine-tuned to the point where they capture a better reflection of the original, much as analogue photography can compared with digital photography, which is more limited in resolution and requires complex technology. Unlike digital sampling, analogue recording does not force the signal into discrete coordinates: just as digital photography has its square pixels, sampled waves have steps in time and in level.
Recording equipment can only capture a finite amount of the initial energy of sound. The problem with digital recording equipment, though, is that its limits are hard ones: any level above full scale is clipped, and anything above half the sampling frequency has to be filtered out. Analogue recording equipment, by contrast, is limited more gradually, by the physical properties of the recording medium and equipment itself. The result of exceeding full scale in digital recording is "sawoff" or "square-off" distortion: instead of the smooth, rounded peaks of a sine wave, the peaks are cut flat. These artefacts can be smoothed, and the lost sample values estimated, by audio processors, but the software is only so effective. Such cutoffs are especially noticeable in the harshness of cymbals and hi-hats in CD audio.
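Before a repair tool can estimate lost samples, it first has to find where the peaks were cut flat. The sketch below clips a too-loud sine and then locates runs of consecutive samples stuck at full scale. The `min_run` threshold of two samples is an assumption, since a single sample can legitimately touch full scale; the estimation step itself is not shown.

```python
import math


def hard_clip(samples: list[float], full_scale: float = 1.0) -> list[float]:
    """Flatten every sample beyond +/- full_scale, as an overloaded converter does."""
    return [max(-full_scale, min(full_scale, s)) for s in samples]


def find_clipped_runs(samples: list[float], full_scale: float = 1.0,
                      min_run: int = 2) -> list[tuple[int, int]]:
    """Return (start, length) of runs of consecutive samples stuck at +full_scale or -full_scale."""
    runs = []
    start, sign = None, 0
    for i, s in enumerate(samples + [0.0]):  # sentinel 0.0 closes any final run
        s_sign = 1 if s >= full_scale else -1 if s <= -full_scale else 0
        if start is not None and s_sign != sign:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
        if s_sign != 0 and start is None:
            start, sign = i, s_sign
    return runs


N = 64
loud_sine = [1.5 * math.sin(2 * math.pi * n / N) for n in range(N)]  # peaks 50% over full scale
clipped = hard_clip(loud_sine)
print(find_clipped_runs(clipped))  # one flattened run on each peak of the cycle
```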
Further, even if a microphone is rated for frequencies up to only 22050 Hz, in reality that means its sensitivity falls outside the acceptable tolerance (usually +/-3 dB) above 22050 Hz. It may be down to -12 dB at 32000 Hz, but that doesn't mean there's no sound at 32 kHz. On the contrary, a drumkit's hi-hats produce energy at 32 kHz as well. A digital recording device working at CD rates has to discard everything above 22.05 kHz, and any of that content that is not filtered out cleanly before sampling folds back into the audible range as aliasing, introducing distortion into the original wave.
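The folding-back can be checked numerically. The sketch computes where an unfiltered tone lands after sampling and shows that, at 44,100 Hz, the samples of a 32 kHz tone are the same numbers as those of an inverted 12.1 kHz tone. It assumes no anti-aliasing filter at all, which is the failure case the paragraph describes.

```python
import math


def alias_frequency(freq_hz: float, rate_hz: float) -> float:
    """Frequency at which a tone appears after sampling at rate_hz with no anti-aliasing filter."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)


RATE = 44_100
print(alias_frequency(32_000, RATE))  # 12100: a 32 kHz partial lands at 12.1 kHz

# The sampled values of the 32 kHz tone match an inverted 12.1 kHz tone
ultrasonic = [math.sin(2 * math.pi * 32_000 * n / RATE) for n in range(100)]
audible = [-math.sin(2 * math.pi * 12_100 * n / RATE) for n in range(100)]
print(max(abs(a - b) for a, b in zip(ultrasonic, audible)))  # tiny: floating-point rounding only
```

Once sampled, nothing distinguishes the two, which is why the filtering has to happen before the converter.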
As a consequence, digital recording equipment requires high sampling rates to avoid muddying the sound. Sampling theory requires a rate of more than twice the highest frequency present; a good rule of thumb is three times the maximum frequency sensed by the microphones (measured closer to the noise floor, not at the -3 or -10 dB point), which leaves room for a gentle anti-aliasing filter. And of course, human hearing plays no role here - it is the recording equipment's "hearing", its precision, which matters.
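The rule of thumb can be turned into a small helper that picks the smallest common sampling rate meeting it. The list of standard rates is an assumption covering the usual studio choices; `factor=2.0` gives the bare Nyquist minimum for comparison.

```python
STANDARD_RATES_HZ = [44_100, 48_000, 88_200, 96_000, 176_400, 192_000]


def minimum_rate(max_freq_hz: float, factor: float = 3.0) -> float:
    """Sampling rate suggested by the rule of thumb: factor x the highest frequency sensed."""
    return factor * max_freq_hz


def pick_standard_rate(max_freq_hz: float, factor: float = 3.0) -> int:
    """Smallest common sampling rate that satisfies the rule of thumb."""
    needed = minimum_rate(max_freq_hz, factor)
    for rate in STANDARD_RATES_HZ:
        if rate >= needed:
            return rate
    raise ValueError(f"no standard rate reaches {needed:.0f} Hz")


print(pick_standard_rate(16_000))              # 48000
print(pick_standard_rate(32_000))              # 96000: the hi-hat example above
print(pick_standard_rate(20_000, factor=2.0))  # 44100: the bare Nyquist minimum
```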

