Thursday, September 27, 2007

Creating Digital Audio (1)

There are two ways to create digital audio. One is to convert analog sound to digital form, which is called analog-to-digital (A/D) conversion. The other is to create the digital sound directly using a computer or other digital machine, which is called digital audio synthesis.

Analog to Digital Conversion

The song we hear on a Compact Disc is digital audio that has been converted from analog to digital format. The singer sings into a microphone, and the resulting signal is recorded digitally on the CD.

To record digital audio on a CD, you first need to convert the analog sound of the singer into bits and bytes using analog-to-digital conversion. Once you have created a stream of bytes for the sound, you can store it on a CD, send it over the internet or process it with a computer.

Digital Sampling

Analog-to-digital sound conversion requires an analog-to-digital converter (ADC). This device samples the sound at periodic intervals and calculates the binary number corresponding to the value of the analog signal at that instant. This process is called sampling. This is how a continuous analog waveform is broken into multiple samples, each represented by a binary number.

The whole process is not as simple as it sounds. First of all, the sampling rate has to be fast enough to capture enough samples of the waveform; otherwise you cannot faithfully reconstruct the analog signal from the samples collected. The required sampling rate depends on the frequency of the waveform being sampled: the higher the frequency of the waveform, the higher the sampling rate needed.
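The sampling step can be sketched in a few lines of Python. This is only an illustration, not how a hardware ADC is built: the function names and the 1 kHz test tone standing in for the analog input are assumptions made for the example.

```python
import math

def sample_signal(signal, sample_rate_hz, duration_s):
    """Read signal(t) at evenly spaced instants, one reading per sample period."""
    count = round(sample_rate_hz * duration_s)
    return [signal(n / sample_rate_hz) for n in range(count)]

def tone(t):
    # Hypothetical stand-in for the analog input: a 1 kHz sine wave.
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of the tone sampled 8000 times per second gives 8 samples,
# which is exactly one cycle of the 1 kHz wave.
samples = sample_signal(tone, sample_rate_hz=8000, duration_s=0.001)
print(len(samples))
print([round(s, 3) for s in samples])
```

The values here are still floating-point numbers; turning each one into a fixed-size binary number is the job of the sample size, discussed further down.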

Sampling Rate

The sampling rate must be at least twice the highest frequency component in the analog audio signal being sampled. Human hearing spans roughly 20 Hz to 20 kHz, so capturing the full audible range needs a sampling rate of 40 kHz or more, while speech, which lies mostly below 4 kHz, can be sampled at 8 kHz. Sample rates commonly used in practice are 8 kHz, 16 kHz, 22.05 kHz and 44.1 kHz. The higher the sampling rate, the higher the frequencies that can be reproduced faithfully.

It is important to note that we need to limit the maximum frequency of the analog signal based on the sample rate we are using. The audio signal is first passed through a low-pass (anti-aliasing) filter, which removes any frequencies too high to be sampled correctly at the sample rate being used.
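A small Python experiment shows what goes wrong when the "twice the highest frequency" rule is broken. In this sketch (the helper name and the chosen frequencies are assumptions for illustration), a 5 kHz tone is sampled at 8 kHz, which can only represent frequencies up to 4 kHz, and the result is compared with a 3 kHz tone sampled at the same rate.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a cosine wave of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

rate = 8000                           # can represent frequencies up to 4000 Hz
low = sample_cosine(3000, rate, 16)   # within the limit
high = sample_cosine(5000, rate, 16)  # above the limit

# Sample for sample, the two tones produce the same numbers, so after
# sampling there is no way to tell the 5 kHz tone from a 3 kHz one.
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
print(indistinguishable)
```

This effect is known as aliasing, and it is the reason frequencies above half the sampling rate have to be removed before sampling.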

Sample Size

The sample collected at each sampling interval is a number that represents some property (e.g. amplitude) of the signal at that instant. If we use only 1 byte (8 bits) to represent a sample, a signed sample can hold values from -128 to +127. That means the range between the minimum and maximum amplitude of the signal is divided into 256 discrete levels. When we sample the signal, its value is rounded to the nearest of these 256 levels, and that level is the sampled value at that instant.

256 discrete levels may not be enough to capture small variations in the analog signal (two slightly different signal values may fall onto the same level), so this variation is lost in the digital format. To capture the analog signal with high quality, more discrete levels (a bigger range) are needed, which requires a bigger sample size such as 16 bits or 24 bits. A bigger sample size means better quality when the digital audio is played back.
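The effect of sample size can be tried out with a rough Python sketch. The `quantize` helper below is hypothetical, and it assumes the signal has been scaled to the range -1.0 to +1.0 and uses symmetric levels (-127 to +127 for 8 bits) to keep the arithmetic simple.

```python
def quantize(value, bits):
    """Map a value in [-1.0, 1.0] to the nearest signed integer level."""
    max_level = 2 ** (bits - 1) - 1       # 127 for 8 bits, 32767 for 16 bits
    clipped = max(-1.0, min(1.0, value))  # keep out-of-range input in bounds
    return round(clipped * max_level)

# Two signal values that differ only slightly.
a, b = 0.501, 0.503

print(quantize(a, 8), quantize(b, 8))    # both land on the same 8-bit level
print(quantize(a, 16), quantize(b, 16))  # 16 bits keeps them apart
```

With 8 bits the small difference between the two values disappears, while with 16 bits it survives, which is the loss of detail described above.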

Number of channels

To reflect the directional properties of a sound, it must be measured in more than one place. For example, to record stereo sound we need two microphones placed apart from each other, such as in two corners of the room.

To convert such audio into digital format, the signals from both microphones should be sampled at the same time. Sampling audio signals from two microphones requires encoding two separate channels together in a single digital stream.
The reason we collect samples into different channels is that we can then reproduce the sound captured by each microphone separately while playing a single digital stream.
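One common way to put two channels into a single stream is to interleave them, storing a left sample, then a right sample, and so on. The sketch below assumes 16-bit little-endian samples; the function names are made up for the example, and real file formats may order or pack the data differently.

```python
import struct

def interleave_stereo(left, right):
    """Pack two equal-length lists of 16-bit samples as L0 R0 L1 R1 ... bytes."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    stream = bytearray()
    for l, r in zip(left, right):
        stream += struct.pack("<hh", l, r)  # little-endian signed 16-bit pair
    return bytes(stream)

def split_stereo(stream):
    """Recover the two channels from the interleaved byte stream."""
    values = [v[0] for v in struct.iter_unpack("<h", stream)]
    return values[0::2], values[1::2]

left = [100, 200, 300]
right = [-100, -200, -300]
stream = interleave_stereo(left, right)
print(len(stream))           # 3 samples x 2 channels x 2 bytes
print(split_stereo(stream))  # the original two channels
```

Because the samples for the same instant sit next to each other, a player can read the stream from start to end and feed each speaker its own channel.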

The number of samples collected for digital audio depends on the number of channels it contains. For a digital stereo stream (two channels), the number of samples collected is double that of a mono (single-channel) stream containing the same audio.

Frame Rate – For linearly sampled (uncompressed) data, a sample frame stores the samples for all channels at a given instant of time. The frame size is the number of bytes needed for one such frame, that is, the sample size multiplied by the number of channels. For example, if the sample size is 16 bits, the frame size of a stereo stream is 4 bytes, 2 bytes for each channel. The frame rate is the number of frames per second, and in most cases it is the same as the sample rate.

However, the frame rate will differ from the sample rate if the sound is in a compressed format. Usually the frame rate is lower than the sample rate in this case. We will talk about compressed formats a little later in this chapter.
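The frame arithmetic can be checked against Python's standard `wave` module, which handles uncompressed WAV audio. This is a sketch under assumed settings (16-bit stereo at 44100 frames per second, written to an in-memory buffer); `frame_size_bytes` is a helper invented for the example.

```python
import io
import wave

def frame_size_bytes(sample_bits, channels):
    """Bytes in one frame: one sample per channel for a single instant."""
    return (sample_bits // 8) * channels

size = frame_size_bytes(16, 2)  # 16-bit stereo

# Write ten silent frames into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)       # bytes per sample
    writer.setframerate(44100)   # frames per second
    writer.writeframes(bytes(size * 10))

# Read it back and ask how many frames it holds.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    frame_count = reader.getnframes()
    frame_rate = reader.getframerate()

print(size, frame_count, frame_rate)
```

Forty bytes of data come back as ten frames, which matches the 4-byte frame size worked out above for 16-bit stereo.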

Digital audio size

As we have seen above, a higher sampling rate and a bigger sample size mean a better quality audio signal. This introduces another problem: size. The amount of data collected is much higher and requires more space/memory to store.

For example, assume a sampling rate of 8 kHz and a sample size of 1 byte.
Amount of data collected in one second = 8K (8000) samples X 1 byte = 8K bytes

Now suppose we use a better sampling rate of 16 kHz and a sample size of 16 bits (2 bytes).
Amount of data collected in one second = 16K (16000) samples X 2 bytes = 32K bytes

32K bytes are needed to store one second of good-quality digital audio. If a song stored in digital format lasts 5 minutes, it needs 5 X 60 X 32K = 9600K bytes, or 9.6 MB.

If we now consider that we recorded the song in stereo, it means 9.6 MB for each channel, that is, 19.2 MB for a stereo recording of a 5-minute song.
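The same arithmetic can be written as a small Python function, which makes it easy to try other settings. The function name is an assumption for the example, and "K" is taken to mean 1000 as in the figures above.

```python
def pcm_size_bytes(sample_rate_hz, sample_bytes, channels, seconds):
    """Storage needed for uncompressed audio with the given settings."""
    return sample_rate_hz * sample_bytes * channels * seconds

five_minutes = 5 * 60

mono = pcm_size_bytes(16000, 2, 1, five_minutes)
stereo = pcm_size_bytes(16000, 2, 2, five_minutes)

print(mono)    # 9600000 bytes, i.e. 9.6 MB
print(stereo)  # 19200000 bytes, i.e. 19.2 MB
```

Doubling any one of the four inputs doubles the storage needed, which is why higher quality settings become expensive so quickly.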

To reduce the amount of space required, several techniques are used, which are called “coding/compression techniques”. These coding techniques try to compress the data and store it with little or no audible loss in the quality of the digital audio.
