Thursday, September 27, 2007

Computer Audio

Sound cards

A sound card can be a separate add-on card in your PC or can be part of the motherboard itself. The sound card interfaces the computer to the outside world for the input/output of audio. It has various input/output ports (or jacks):

Line in (input)
Microphone (input)
Line out (or headphones) (Output)

You need to connect external speakers or headphones to the line-out jack to listen to the audio played on your PC.


Software Applications and Sound Card

The software applications on your PC cannot talk to the sound card directly or use its capabilities for audio input/output. There are multiple layers involved:

Applications
Operating System
Device driver
Sound card

At the bottom is the device driver layer. Sound cards usually come with their own device drivers. Device drivers are installed by the operating system in order to support the applications that want to use the services of the device.

Sound card device drivers are low-level software programs which interface with the hardware directly (for programming the hardware) and provide APIs to the operating system to send and receive digital data from the sound card. The device driver may also supply other low-level APIs for device initialization and various controls which may not be available directly to applications. The operating system controls the sound card using these primitive low-level APIs.

Audio applications (such as audio recorders, Winamp etc.) use the APIs provided by the operating system to play or record digital audio. A few applications may bypass the operating system APIs for performance reasons or for better control of the underlying sound card.
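As a small illustration, an application can simply hand a WAV file to the operating system and let the OS and the sound card driver do the rest. A minimal sketch using Python's standard winsound module on Windows (the file name song.wav is just a placeholder):

    import winsound

    # Ask the Windows audio stack (and, through it, the sound card driver)
    # to play a WAV file synchronously. "song.wav" is a placeholder name.
    winsound.PlaySound("song.wav", winsound.SND_FILENAME)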


Digital Audio files
To play digital audio on your PC you need files which store the digital audio. Digital audio is stored in various formats (and hence with different file extensions).
There are two major groups of audio file formats

Uncompressed formats
There are many uncompressed data formats. The most popular of them is WAV. It is a flexible file format designed to store more or less any combination of sampling rates and sample sizes. This makes it an adequate file format for storing and archiving an original recording. There are other uncompressed formats such as AU and AIFF which are not very popular now.

Compressed formats
Compressed file formats are based on the principle of leaving out sounds that humans cannot hear or can hardly hear, e.g. a quiet sound immediately after a loud sound. Compressed file formats optimize for size rather than audio quality: they require less disk space than uncompressed formats but lack the same quality. MP3 is a popular example of a compressed audio format.

Wave file format
A WAVE file is a collection of a number of different types of chunks. There is a mandatory Format ("fmt ") chunk which contains important parameters describing the waveform, such as its sample rate. The Data chunk, which contains the actual waveform data, is also required. All other chunks are optional.
There are no restrictions upon the order of the chunks within a WAVE file, with the exception that the Format chunk must precede the Data chunk.


The figure below shows the format of the wave file


A Note about data segment

The Data chunk contains the actual audio samples. Based on the "bits per sample" field you can decide how to read the data. If the bits per sample field is 8, the samples are stored as bytes and you can read them byte by byte.
If the bits per sample field is 16, the samples are stored as 16-bit words and you should read one 16-bit word at a time.

For multi-channel sounds (for example, a stereo waveform), single sample points from each channel are interleaved. For example, assume a stereo (i.e., 2-channel) waveform. Instead of storing all of the sample points for the left channel first and then all of the sample points for the right channel, you "mix" the two channels' sample points together: the first sample point of the left channel, then the first sample point of the right channel, then the second sample point of the left channel, then the second sample point of the right channel, and so on, alternating between the two channels. This is what is meant by interleaved data; you store the next sample point of each channel in turn, so that the sample points which are meant to be "played" simultaneously are stored contiguously.
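As a rough sketch of how these fields are used in practice (assuming a 16-bit stereo PCM file; the file name sample.wav is a placeholder), Python's standard wave module can read the fmt parameters and the interleaved Data chunk:

    import wave, struct

    with wave.open("sample.wav", "rb") as wav:        # placeholder file name
        channels = wav.getnchannels()                 # 1 = mono, 2 = stereo
        sample_width = wav.getsampwidth()             # bytes per sample (2 for 16-bit)
        sample_rate = wav.getframerate()              # e.g. 44100
        raw = wav.readframes(wav.getnframes())        # interleaved sample bytes

    # For 16-bit audio each sample is a signed little-endian 16-bit word ("h").
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)

    # De-interleave: for stereo, even indices are the left channel, odd the right.
    left = samples[0::channels]
    right = samples[1::channels] if channels == 2 else left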
The figure below shows the bytes of an actual wave file.



Digital Audio effects

Digital waveforms can be altered in many ways to create interesting effects. Audio effects are based on a few fundamental principles:
  • Amplitude Modulation
  • Time Delay
  • Waveform Shaping
  • Frequency modulation

Amplitude Modulation effects

Volume Control - The effect produced by varying the amplitude of the signal. The amplitude can be cut down by attenuating the input signal and can be increased by using amplifiers.

Volume controls are useful for placing between effects, so that the relative volumes of the different effects can be kept at a constant level. However, most, if not all effects have volume controls built-in, allowing the user to adjust the volume of the output with the effect on relative to the volume of the unaffected signal (when the effect is off).

Compression
The compression effect amplifies the input signal in such a way that louder signals are amplified less, and softer signals are amplified more. It is essentially a variable gain amplifier, whose gain is inversely dependent on the volume of the input signal.
It is mostly used in studio recordings, to give the recording a constant volume, especially to vocals. Compression tends to increase background noise, especially during periods of silence. Thus, a noise gate is usually used in conjunction with the compressor.

Expansion
An expander performs the opposite effect of the compressor. This effect is used to increase the dynamic range of a signal.

Panning
Stereo recordings have two channels: left and right. The volume of each channel can be adjusted - this adjustment effectively adjusts the position of the perceived sound within the stereo field. The two extremes being: all sound completely on the left, or all sound completely on the right.
Usually the 'balance' knob on a music system does this.

Noise Gating
A noise gate blocks input signals whose amplitude lies below a certain threshold, and lets other signals through. This is useful for eliminating background noises, such as hiss or hum, during periods of silence in a recording or performance.
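A minimal sketch of the idea (assuming samples is a list of values normalized to the range -1.0..1.0; the threshold value is arbitrary):

    def noise_gate(samples, threshold=0.02):
        # Pass samples whose amplitude is above the threshold and replace
        # everything below it (background hiss or hum) with silence.
        return [s if abs(s) >= threshold else 0.0 for s in samples]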

Time Delay effects
Echo
Echo is produced by adding a time-delayed copy of the signal to the output. This produces a single echo. Multiple echoes are achieved by feeding the output of the echo unit back into its input through an attenuator. The attenuator determines the decay of the echoes, i.e. how quickly each echo dies out.
Chorus
The chorus effect is so named because it makes the recording of a vocal track sound as if it were sung by two or more people singing in chorus. This is achieved by adding a single delayed signal (echo) to the original input. However, the delay of this echo is varied continuously between a minimum and a maximum delay at a certain rate.
Reverb
Reverb effect is used to simulate the acoustical effect of rooms and enclosed buildings. In a room, for instance, sound is reflected off the walls, the ceiling and the floor. The sound heard at any given time is the sum of the sound from the source, as well as the reflected sound.
Flanging
Flanging is similar to the chorus effect in that the delay of the echo is varied continuously; it can be thought of as an extreme form of chorus.
Phasing
When two signals that are identical, but out of phase, are added together, then the result is that they will cancel each other out. If, however, they are partially out of phase, then partial cancellations, and partial enhancements occur. This leads to the phasing effect.
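As an illustration of a time-delay effect, a simple feedback echo can be sketched as follows (assuming samples is a list of floats, delay is the delay measured in samples and decay is the attenuation factor; the names are made up for illustration):

    def echo(samples, delay, decay=0.5):
        # Output = input + an attenuated, delayed copy of the output (feedback),
        # which produces a train of echoes that fade by 'decay' on each repeat.
        out = list(samples) + [0.0] * (delay * 4)    # leave room for the tail
        for i in range(delay, len(out)):
            dry = samples[i] if i < len(samples) else 0.0
            out[i] = dry + decay * out[i - delay]
        return out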

Wavetable synthesis of sound


Wave Table Synthesis
The majority of synthesizers available today use some form of sampled-sound or wavetable synthesis. The idea behind this technique is to use small sound samples of real instruments. Based on which musical note needs to be played, the original sound sample is digitally modified or modulated to create the desired sound. This is a much simpler approach and the synthesizer can sound very much like the real instrument.


Looping and Envelope Generation
One of the primary techniques used in wavetable synthesizers to conserve sample memory space is the looping of sampled sound segments. For many instrument sounds, the sound can be modeled as consisting of two major sections: the attack section and the sustain section. The attack section is the initial part of the sound, where the amplitude and the spectral characteristics of the sound may be changing very rapidly. The sustain section is the part following the attack, where the characteristics of the sound change less dynamically. Usually the sustain section can be looped for a long time, recreating the sound of a natural instrument such as a flute or violin.


The figure on the left shows a waveform with the portions which could be considered the attack and the sustain sections indicated. In this example, the characteristics of the waveform remain constant throughout the sustain section, while the amplitude decreases at a fairly constant rate. This is an exaggerated example; in most natural instrument sounds, both the spectral characteristics and the amplitude continue to change throughout the duration of the sound. The sustain section, if one can be identified, is the section over which the characteristics of the sound are relatively constant.


A lot of memory can be saved in wavetable synthesis systems by storing only a short segment of the sustain section of the waveform, and then looping this segment during playback. If the original sound had a fairly constant spectral content and amplitude during the sustained section, then the sound resulting from this looping operation should be a good approximation of the sustained section of the original.

For many acoustic string or wind instruments (such as violin, flute, saxophone), the spectral characteristics of the sound remain almost constant during the sustain section, while the amplitude (or volume) of the signal decays. This can be achieved with a looped segment by decaying its volume with some factor over a period of time.


A typical wavetable synthesis system would store sample data for the attack section and the looped section of an instrument sound. These sample segments might be referred to as the initial sound and the loop sound. The initial sound is played once through, and then the loop sound is played repetitively until the note ends. An envelope generator function is used to create an envelope which is appropriate for the particular instrument, and this envelope is applied to the output samples during playback.


Playback of the sample data corresponding to the initial sound (with the attack portion of the envelope applied) begins the moment a note starts to play, for example the moment a person presses a key on the keyboard. The length of the initial sound depends on the kind of sample used and what kind of instrument sound it is. The lengths of the attack and decay sections of the envelope are usually fixed for a given instrument sound. The moment the attack section finishes, the sustain section is played (while the key is still pressed). The sustain section keeps repeating the loop samples while applying the sustain slope of the envelope (which decays slowly), until the key is released. Releasing the key triggers the release portion of the envelope.
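A rough sketch of this playback scheme (assuming attack and loop are lists of samples already extracted from an instrument recording, and using a simple per-sample decay factor in place of a full envelope generator):

    def wavetable_note(attack, loop, held_samples, sustain_decay=0.9995):
        # Play the attack (initial) section once.
        out = list(attack)
        # Then repeat the loop section while the key is held, applying a
        # slowly decaying sustain envelope to each output sample.
        env = 1.0
        for i in range(held_samples):
            out.append(loop[i % len(loop)] * env)
            env *= sustain_decay
        return out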


Sustain Loop length
The sustain loop length depends on the basic length of the sustain section being looped. The loop length is measured as a number of samples. The length of the loop should be equal to an integer number of periods of the fundamental pitch of the sound being played.
Percussive sounds

Instruments like the violin, flute or other strings have sustain sections because they can play the same note continuously for long durations. Percussive sounds like drums or cymbals do not have a sustain section, as their sound starts and decays very quickly. For such instruments the looping of a sustain section cannot be employed; these sounds are stored as one sample which is played as it is. The figure on the left shows the waveform of a snare drum sound. Note that it does not have a sustain section.


Wavetable samples
There is a lot of processing involved in preparing an audio sample recorded from a natural instrument before it can be used in wavetable synthesis. We need to extract the initial (attack) portion and the sustain portion which can be looped. The portion to be looped must also be corrected so that its end and start points blend with each other, otherwise there will be a glitch every time it loops. The dynamic range of the sound sample may also need to be compressed to save sample memory.
Samples for various instruments can then be combined in a table which is called a soundbank or patch table. The synthesizer can load a specified sample in memory from the table when user wants to play a particular instrument.
Pitch Shifting/ Transpose
We know that the notes of a piano are related to each other in terms of pitch. The same note in two adjacent octaves has double the frequency: for example note C1 has a frequency of 32.7032 Hz while note C2 has a frequency of 32.7032 x 2 = 65.4064 Hz.
In order to minimize sample memory requirements, wavetable synthesizers use pitch shifting techniques so that the same sample can be used to play various pitches (notes) of the same instrument.
For example, if the sample memory contains a sample of a middle C note on the acoustic piano, then this same sample data could be used to generate the C Sharp (C#) note or D note above middle C using pitch shifting.
Pitch shifting can be achieved by controlling the speed of the playback of the stored sample.
Suppose an audio sample used for wavetable synthesis contains 100 frames (100 digital samples) representing one cycle of a waveform, and it is played back at a speed which completes 10 of these cycles per second. The result is a sound of a particular frequency (say F1).
Now if we double the playback speed to 20 cycles per second, the frequency of the produced sound also doubles (say F2 = 2 x F1).
If the sample contained a tone of 32.7032 Hz, it would sound like note C1 at the original speed. If the playback speed is doubled, the frequency doubles to 65.4064 Hz, which is note C2.
You can see that the same digital sample can be used to play two notes one octave apart just by changing the speed at which the stored sample is played.
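A crude way to change the playback speed in software is to step through the stored sample with a fractional increment. A minimal sketch using linear interpolation (the ratio 2 ** (semitones / 12) follows from the twelfth-root-of-two relation between adjacent notes, described later in this chapter):

    def pitch_shift(samples, semitones):
        # Step through the sample table faster (ratio > 1 raises the pitch)
        # or slower (ratio < 1 lowers it), interpolating between neighbours.
        ratio = 2 ** (semitones / 12.0)
        out = []
        pos = 0.0
        while pos < len(samples) - 1:
            i = int(pos)
            frac = pos - i
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
            pos += ratio
        return out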

Sound synthesis techniques

The most important goal of sound synthesis is to produce sounds which sound like real musical instruments. A common method of sound synthesis is to use oscillators. Oscillators produce the desired waveforms by mathematical computation. A higher quality reproduction of a natural instrument can typically be achieved using more oscillators, but this requires more computational power and human programming effort.

Amplitude envelope
One of the salient aspects of any sound is its amplitude envelope. This envelope determines whether the sound is percussive, like a snare drum, or continuous, like a violin string. The envelope defines how the amplitude of the sound varies with time. A sound's amplitude profile is commonly described by the "ADSR" (Attack, Decay, Sustain, Release) envelope model.

Attack time is the time taken for initial run-up of the sound level from nil to 100%.

Decay time is the time taken for the subsequent run down from 100% to the designated Sustain level.

Sustain level, the third stage, is the steady volume produced when a key is held down.

Release time is the time taken for the sound to decay from the Sustain level to nil when the key is released. If a key is released during the Attack or Decay stage, the Sustain phase is usually skipped. Similarly, a Sustain level of zero will produce a more-or-less piano-like (or percussive) envelope, with no continuous steady level, even when a key is held. Exponential rates are commonly used because they closely model real physical vibrations, which usually rise or decay exponentially.
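A minimal sketch of an ADSR envelope generator (using linear segments rather than the exponential curves mentioned above; all durations are in samples and the sustain level is a fraction of full amplitude):

    def adsr(attack, decay, sustain_level, sustain, release):
        env = []
        # Attack: ramp from 0 up to full amplitude.
        env += [i / attack for i in range(attack)]
        # Decay: ramp from full amplitude down to the sustain level.
        env += [1 - (1 - sustain_level) * i / decay for i in range(decay)]
        # Sustain: hold the sustain level while the key is down.
        env += [sustain_level] * sustain
        # Release: ramp from the sustain level down to silence.
        env += [sustain_level * (1 - i / release) for i in range(release)]
        return env

Each envelope value is then multiplied with the corresponding output sample of the oscillator to shape the note.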


Synthesis methods
There are many different kinds of synthesis methods, each applicable to both analog and digital synthesizers. These techniques tend to be mathematically related, especially frequency modulation and phase modulation. You can find the details of these algorithms in any book on digital signal processing. Some common methods are:

Subtractive synthesis
Additive synthesis
Granular synthesis
Wavetable synthesis
Frequency modulation synthesis
Digital sampling

Physics behind piano

You must have seen a piano or a digital keyboard (synthesizer). These instruments are played using a keyboard, called a musical keyboard, which has sets of black and white keys. If you look closely, a pattern of 12 keys (5 black and 7 white) is repeated several times across the keyboard.

Small or cheap keyboards may have fewer of these sets while bigger pianos may have 5 or more such sets.

All music is made up of twelve different notes, which means 12 fixed pitches within each octave. Notes increase in pitch as you move from left to right on the keyboard.

The seven white keys (notes) are named C D E F G A B. Moving from left to right, when you reach note B the next note is again called C. Although its name is the same, it has a higher pitch than the previous C. If you play both notes (the C notes from two adjacent sets) together you will notice that in spite of the different pitches they sound like the same note. This special relationship is called an octave.

Notes one octave apart do differ in frequency: the frequency of the higher-octave note is exactly double the frequency of the same note in the previous octave.

Sharps and flats
The pitch difference between two adjacent notes (or keys) is called a semitone. A semitone is 1/12th of an octave. For example keys E and F are one semitone apart.
Keys C and D are two semitones apart (there is a black key in between). Two semitones are referred to as a TONE.

The black keys are given names relative to the notes on either side. For example the black key between F and G can be called F sharp (F#) or G flat (Gb), because it is a half step above F and a half step below G.

Sharp means raising the pitch of a note by one semitone. Similarly, flat means lowering the pitch of a note by one semitone.

Since the note names are same for each octave or set, they are identified by the set to which they belong.

For example notes of first set are called C1,D1..B1. Notes of fourth set would be called C4,D4…B4.

The middle C
In Western music the expression "middle C" refers to note C4 (or "Do"). It also tends to fall near the middle of the keyboard. Note C4, or middle C, is near the top of the male vocal range and the bottom of the female vocal range.
Although C4 is commonly known as "middle C", the expression is keyboard specific. On a small keyboard which has only 3 sets of keys, the note at the very first C from the left could be the "middle C".

Synthesizer keyboards indicate the key corresponding to "middle C" by labelling it C4. It is not necessarily the first key of the fourth set on the keyboard.

Note Frequencies
A440 is the 440 Hz tone that serves as the internationally recognized standard for musical pitch. A440 is the musical note A above middle C (A4). It serves as the audio frequency reference for the calibration of pianos, violins, and other musical instruments
The following table shows the frequencies of the various notes of an 88-key piano. Each successive pitch is derived by multiplying the previous one by the twelfth root of two.
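The twelfth-root-of-two relation makes these frequencies easy to compute from the A440 reference. A small sketch (key number 49 is A4 on an 88-key piano, counting from A0 as key 1):

    # Frequency of the n-th key of an 88-key piano, counting from 1 (A0).
    # Key 49 is A4 = 440 Hz; each step multiplies the pitch by 2 ** (1/12).
    def piano_key_frequency(n):
        return 440.0 * 2 ** ((n - 49) / 12.0)

    print(piano_key_frequency(40))   # middle C (C4), roughly 261.63 Hz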

Creating Digital Audio (2)

Digital Synthesis of digital audio

This method is used to generate unique sounds directly, without any analog-to-digital conversion. Naturally, it is difficult to create human or realistic sounds with this approach.

The method involves directly generating the samples which would otherwise have been produced when an ADC sampled an analog signal at a specified sample rate.

As long as we know the waveform and frequency of a certain kind of sound, we can re-create it in software using this method. However, the sound produced may not sound very realistic. Modern music software uses this technique to produce some unnatural sounds.

For example, we can write a program which computes the sampled values of a sine wave signal using some arithmetic. The sequence of these samples can be replayed to create the sound of a sine wave.
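A minimal sketch of this idea, generating one second of a 440 Hz sine tone at an 8 kHz sample rate as 16-bit samples (the file name tone.wav is just a placeholder):

    import math, struct, wave

    sample_rate = 8000                       # samples per second
    frequency = 440.0                        # Hz (note A4)
    samples = [int(32767 * math.sin(2 * math.pi * frequency * n / sample_rate))
               for n in range(sample_rate)]  # one second of samples

    # Write the samples out as a mono, 16-bit WAV file so they can be replayed.
    with wave.open("tone.wav", "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))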



Synthesizers – Devices which are capable of generating (synthesizing) sounds are called synthesizers. They can be played by a human just like any other instrument such as a flute or guitar. The idea behind them is to imitate the mechanism of sound production electronically.

A synthesizer is a very versatile musical instrument. First of all, it can produce a variety of sounds such as piano, violin, flute, bass, strings and many more. You can play all these musical instruments using a single synthesizer: you press a few buttons and program it to sound like a flute, a piano or whatever you like.
There are both analog and digital synthesizers available in the market. Analog synthesizers were the first to hit the market, in the 1970s.

Analog synthesizers mimic the mechanism of natural sound production involving vibration, modulation and pitch, just as humans produce sound by the vibration of our vocal cords. Analog synthesizers use amplifiers, filters, oscillators and envelope generators to produce various kinds of sounds. The sound is produced by various settings of these modules under human control.
Moog, Korg, Yamaha and Roland manufactured these analog synthesizers.


Picture on left shows an analog synthesizer.





A digital synthesizer is a synthesizer that uses digital signal processing (DSP) techniques to make musical sounds. It directly creates digital samples of sounds by mathematical computation on binary data. It can transform the samples in more than one way to give special effects to the sound or to give it a different pitch.

Early commercial digital synthesizers used simple hard-wired digital circuitry to implement techniques such as additive synthesis and FM Synthesis. Other techniques, such as Wavetable Synthesis became possible with the advent of high-speed microprocessor and digital signal processing technology. We will talk about synthesis techniques little later in this chapter.

Digital synthesizers are realized as separate hardware which looks very much like a musical instrument. Synthesizers typically have a keyboard which provides the human interface to the instrument, and they are often thought of as keyboard instruments. However, a synthesizer's human interface does not necessarily have to be a keyboard, nor does a synthesizer strictly need to be playable by a human.

A software synthesizer, also known as a softsynth or virtual instrument is a computer program for creating digital audio. It works like a common synthesizer, but is realized entirely in software.

A software synthesizer can use many methods of generating sounds. It can implement oscillators, filters and envelope generators in software itself, similar to analog synthesizers, or it can use wavetable synthesis techniques (which use sampled sounds, described later) to produce the sound of various musical instruments.

Another way a software synthesizer can produce the sound of a musical instrument is by allowing the user to load small files containing the sound of instruments such as drums or cymbals. These small samples of user-defined sounds can be played in any order to produce the music of an orchestra.

Many software synthesizers also support MIDI which is a standard method for communicating with synthesizers. More on MIDI later.

Software synthesizers can run standalone or as plugins within other applications. You can program, control and compose melodies, all using softsynths.
Picture on left shows a softsynth.



Polyphonic Synthesizers
Synthesizers which can play more than one note of the same voice (instrument) at a time are called polyphonic synthesizers. It does not mean that the device can play multiple instruments together. You might have heard of polyphonic ring tones. Polyphony gives a rich quality to the sound, resembling an orchestra concert. For example a guitar chord is polyphonic because it involves many strings being plucked together at once.
A polyphonic synthesizer is able to generate a guitar-chord kind of sound because it can produce the sound of each plucked string together. Most of the standard synthesizers (whether hardware or software) available today are polyphonic. A few cheap ones available in the market are monophonic and they don't sound good.

Multitimbral Synthesizer –

Synthesizers which can play the sound of more than one instrument together are called multitimbral synthesizers. For example, on such a synthesizer you can play a piano and a violin sound together. Controlling two instruments together would need two keyboards, which may not be practical. Some synthesizers offer the ability to split the keys of the keyboard into two sections; each section then maps to one instrument, so both can be played simultaneously by a single user.

Patches/Soundbank -
The different sounds that a synthesizer or sound generator can produce are sometimes called "patches". Programmable synthesizers commonly assign "program numbers" (or patch numbers) to each sound. For instance, a sound module might use patch number 1 for its acoustic piano sound, and patch number 36 for its fretless bass sound. The association of all patch numbers with all sounds is often referred to as a patch map. Usually one patch corresponds to one instrument sound, like piano or accordion. A collection of patches is called a soundbank.

Creating Digital Audio (1)

There are two ways to create digital audio. One way is to convert analog sound to digital, which is called analog-to-digital (A/D) conversion. The other is to create the digital sound directly using a computer or some other digital machine, which is called digital audio synthesis.

Analog to Digital Conversion

The songs we hear on a Compact Disc are digital audio which has been converted from analog to digital format. The singer sings the song into a microphone and it is recorded digitally onto the CD.

To record digital audio on a CD you first need to convert the analog sound of the singer into bits and bytes using analog-to-digital conversion. Once you have created a stream of bytes for the sound, you can store it on a CD, send it over the internet or process it with a computer.

Digital Sampling

Analog-to-digital sound conversion requires the use of an ADC, or analog-to-digital converter. This device samples the sound at periodic intervals and calculates the binary number corresponding to the value of the analog signal at that instant. This process is called sampling. This is how a continuous analog waveform is broken into multiple samples, each represented by a binary number.

The whole process is not as simple as it sounds. First of all, the sampling rate has to be fast enough to capture enough samples of the waveform, otherwise you cannot faithfully reconstruct the analog signal from the samples collected. The required sampling rate depends on the frequency content of the waveform being sampled: the higher the frequency of the waveform, the higher the sampling rate needed.

Sampling Rate

The sampling rate must be at least twice the highest frequency component in the analog audio signal being sampled. The human audible range of frequencies extends from about 20 Hz to 20 kHz, while speech occupies roughly the band up to 4 kHz, which is why telephone-quality speech is sampled at 8 kHz. The sample rates commonly used in practice are 8 kHz, 16 kHz, 22.05 kHz and 44.1 kHz. The higher the sampling rate, the more faithful the reproduction of the original sound.

It is important to note that we need to limit the maximum frequency of the analog signal based on the sample rate we are using. The audio signal is first passed through a low-pass (anti-aliasing) filter which removes any higher-frequency components that cannot be represented at the sample rate being used.

Sample Size

The sample collected at each sampling interval is a number which represents some property (e.g. amplitude) of the signal at that instant. If we use only 1 byte (8 bits) to represent a sample, we can store values from -128 to +127. That means we can divide the range between the minimum and maximum amplitude of the signal into 256 discrete values (or levels). When we sample the signal we measure its value and round it to the closest of the 256 discrete levels; that is the sampled value at that instant.

256 discrete values may not be enough to capture small variations in the analog signal (two consecutive samples may fall into the same discrete level), and such variations are then lost in the digital format. To capture the analog signal with high quality there should be more discrete levels (a bigger range) available, which requires a bigger sample size such as 16 bits or 24 bits. A bigger sample size means better quality of the digital audio when it is replayed.
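As a rough illustration (assuming an analog value already normalized to the range -1.0..1.0), quantizing to 8 bits versus 16 bits simply means rounding to a coarser or finer grid of levels:

    def quantize(value, bits):
        # Round a normalized value (-1.0..1.0) to the nearest of the
        # available levels for the given sample size.
        levels = 2 ** (bits - 1) - 1     # e.g. 127 for 8 bits, 32767 for 16 bits
        return round(value * levels) / levels

    print(quantize(0.12345, 8))     # coarse approximation
    print(quantize(0.12345, 16))    # much closer to the original value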

Number of channels

To reflect the directional properties of a sound, it must be measured in more than one place. For example to record stereo sounds we need two microphones to be placed separately in two corners of the room.

To convert such audio into digital format, the signals from both microphones should be sampled together at the same time. Sampling audio signals from two microphones requires encoding two separate channels together in a single digital stream.
The reason we keep the samples in separate channels is that we can then reproduce the sound captured by each microphone separately while playing back a single digital stream.

The number of samples collected for a piece of digital audio depends on the number of channels it contains. For a digital stereo stream (containing two channels) the number of samples collected is double that of a mono stream (containing a single channel) of the same audio.

Frame Rate – For linearly sampled data, a sample frame stores the samples of all channels at a given instant of time. The frame size is the number of bytes collected across all channels for that instant; for example, if the sample size is 16 bits then the frame size of a stereo stream is 4 bytes, 2 bytes for each channel. The frame rate is the number of frames per second of sound, and in most cases it is the same as the sample rate.

However, the frame rate will differ from the sample rate if the sound is in a compressed format; usually the frame rate is lower than the sample rate in this case. We will talk about compressed formats a little later in this chapter.

Digital audio size

As we have seen above, a higher sampling rate and a larger sample size mean a better quality audio signal. This introduces another problem: size. The number of samples collected is much higher and requires more space/memory to store.

For example, assume a sampling rate of 8 kHz and a sample size of 1 byte.
Amount of data collected in one second = 8K (8000) samples X 1 byte = 8K bytes

Now if we use a better sampling rate of 16 kHz and a sample size of 16 bits (2 bytes):
Amount of data collected in one second = 16K (16000) samples X 2 bytes = 32K bytes

32K bytes are needed to store one second of good quality digital audio. Assuming a song stored in digital format lasts 5 minutes, that means 5 X 60 X 32K = 9600K bytes, or about 9.6 MB.

If we now consider that we recorded a stereo song, that means 9.6 MB for each channel, i.e. 19.2 MB for a stereo recording of a 5 minute song.
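The same arithmetic can be captured in a small helper (a sketch; the function and parameter names are made up for illustration):

    def pcm_size_bytes(sample_rate, sample_bytes, channels, seconds):
        # Uncompressed PCM size = rate x sample size x channels x duration.
        return sample_rate * sample_bytes * channels * seconds

    # 5-minute stereo recording, 16 kHz, 16-bit samples: 19,200,000 bytes (~19.2 MB).
    print(pcm_size_bytes(16000, 2, 2, 5 * 60))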

To reduce the amount of space required there are several other techniques used which are called “coding/compression techniques”. These coding techniques try to compress the data and store it without compromising the quality of the digital audio.

Analog Vs Digital

Analog / Digital sound

The act of listening to or hearing sound is analog in nature. Analog means that physical motion of some sort is involved. For example, when we hear the sound of an airplane, the particles in the air vibrate and create a disturbance which is picked up by our ears. There is physical motion of the air particles involved here.

To store analog sound directly you need some mechanism to record the air particle vibrations in some medium and replay them later by reproducing the same motion of air particles from the medium.

The cassette tape recorder stores the voice in analog format. When we record a voice, the vibrations of the air particles are converted into variations of a magnetic field which are stored on the tape. When we replay the tape, the magnetic variations on the tape are converted back into air particle vibrations by the loudspeaker and we can hear the sound again.

Digital audio, on the contrary, is sound that is stored and processed in digital format.
You still need to convert it into analog form to hear it through a loudspeaker or headphones, as our ears can hear only analog sound.

Since digital audio involves bits and bytes, or rather 1s and 0s, you need some kind of machine which understands 1s and 0s. CD players, computers and DVD players are all digital devices capable of processing digital data, and we use them whenever we are dealing with digital audio.

People have been debating the respective merits of analog and digital audio ever since the appearance of digital sound recordings. Generally, digital audio is more versatile than analog sound. We can process digital audio in whatever way we like using a computer or some dedicated hardware. You can cut, paste and modify the sound the way you like, store it on your hard disk or a CD easily, download it over the internet or send it via e-mail.

Analog/Digital recording

An analog recording is one where the original sound signal is modulated onto another physical signal carried on some medium, such as the groove of a gramophone record or the magnetic field of a magnetic tape. A physical quantity in the medium (e.g., the intensity of the magnetic field) is directly related to the physical properties of the sound (e.g., the amplitude, phase and possibly direction of the sound wave). The reproduction of the sound will in part reflect the nature of the medium and any imperfections on its surface.

A digital recording, on the other hand, is produced by first encoding the physical properties of the analog sound into digital information which can then be decoded for reproduction. While it is subject to noise and imperfections in capturing the original sound, as long as the individual bits and bytes can be recovered, the original sound can be reproduced by decoding these bits.

Sound, waves and frequencies


Sound is a very basic element in our lives. Sounds are all around us: falling trees, trucks honking, dogs barking and countless others which we don't even notice. But what is sound anyway?

Any sound, whatever it might be, is caused by something vibrating. Without vibration there can be no sound. The vibrating body causes the air particles next to it to vibrate; those air particles, in turn, cause the particles next to them to vibrate. In this way a disturbance of the air moves out from the source of the sound and may eventually reach the ears of a listener. This disturbance causes our ear drums to vibrate and our brains pick up the sound that we hear.

The sounds we hear are propagated in the air, which is a transmission medium. Sound can travel in other media as well (as long as the medium has particles to vibrate), such as water, glass and buildings.

The sound travels in the medium in the form of waves. For example when you throw a stone in a pond or river, you would see circular waves moving out from the place where stone touched the water.

So what is a wave then?

A wave represents the periodic motion (vibration) of an object or of particles in a medium. It describes the pattern in which an object vibrates. Since vibrations can happen in any form, a wave can be of arbitrary shape. The following figure shows some examples of waveforms:

Sine Wave
Saw Tooth
Square wave
Zigzag

A wave which is periodic in nature, i.e. where the movement of the particles has a definite pattern, can be described by a mathematical formula. For example, a sine wave can be described by the function y(t) = A sin(2*pi*f*t), where A is the amplitude and f the frequency.

Every wave, irrespective of its shape, has certain common characteristics:

Amplitude – The maximum displacement of the particle from its mean position.

Cycle – Each complete vibration is called a cycle: from the starting position, to a maximum displacement in one direction, back through the starting position, to a maximum displacement in the opposite direction, and back to the starting place.

Frequency – The number of cycles in one second is called the frequency of the wave and is measured in Hz (hertz). (One thousand hertz = 1 kilohertz = 1 kHz.)

For example the frequencies of notes that can be played on a piano range from 27.5 Hz to just over 4kHz.

Pitch – Pitch is a subjective quality, often described as highness or lowness, and is perceived by the human ear. For example the sound produced by a drum has a lower pitch than the sound produced by a whistle. Usually it is related to the frequency and amplitude of the sound, but not always.

Tone – Sound consisting of a single wave of fixed frequency is usually referred to as tone or “pure tone”.

Harmonics – Most musical instruments (for example a guitar, when a string is plucked) generate more than one wave of different frequencies (sounding all together), and these frequencies are related to each other. They are related to one frequency (usually the lowest) which gives the sound its characteristic pitch.

The tone with the lowest frequency is called the fundamental. The other tones are called overtones. If the overtones have frequencies that are whole-number multiples (x2, x3 ... up to x14) of the fundamental frequency they are called harmonics. It is the difference in the harmonic content of notes that gives each musical instrument its characteristic sound or timbre ("tam-brah").

Sound intensity or Loudness - It is the strength (or loudness) of the sound that we hear. For example an airplane sounds much louder than the sound of a car.

Loudness is a subjective term describing the strength of the ear's perception of a sound. It is intimately related to sound intensity but can by no means be considered identical to it. The intensity of sound is measured in decibels (dB). The decibel scale is logarithmic, not linear.

The sound intensity also depends on the distance of the source from the listener, as sound waves lose strength (energy) when traveling through air.
Musical Note
What we perceive as a musical note is subject to our interpretation. Any sound which is pleasant in nature can be called a musical note. Musical notes have been defined as sounds which are smooth, regular, pleasant and of definite pitch, while unmusical sounds are those which are rough, irregular, unpleasant and of no definite pitch.

Usually it is the presence of harmonics in the sound which makes it pleasant. That is why musical instruments (such as the guitar or violin) sound pleasant: the sound they produce contains a fundamental tone and a lot of other overtones.

A musical instrument such as a piano is capable of producing a range of musical notes. Pressing a particular key of the piano produces a musical note (sound) consisting of a tone and overtones of definite frequencies. Pressing another key produces another musical note with different frequencies, sounding totally different from the previous one.

It is important to note that the frequencies of the tones produced by two different piano keys are still related to each other.