Tuesday, April 13, 2010

Back to blogging

April the 13th, and after a long time I realized I have a blog, ignored for so long that the site services were probably thinking of closing my account. Anyway, after a hot day in Bangalore, there is a heavy wind and pleasant weather. As always, the power went out as soon as BESCOM realized that you don't need a fan to beat the heat. Right now I am sitting in the dark on a borrowed laptop, trying to do whatever is possible in the pitch black.
Hope to come back soon this time.

Thursday, September 27, 2007

Computer Audio

Sound cards

A sound card can be a separate add-on card in your PC or part of the motherboard itself. The sound card interfaces the computer to the outside world for the input/output of audio. It has various input/output ports (or jacks):

Line in (input)
Microphone (input)
Line out (or headphones) (Output)

You need to connect external speakers or headphones to the line-out jack to listen to the audio played on your PC.


Software Applications and Sound Card

The software applications on your PC cannot directly talk to the sound card or use its capabilities for audio input/output. There are multiple layers involved:

Applications
Operating System
Device driver
Sound card

At the bottom is the device driver layer. Sound cards usually come with their own device drivers. Device drivers are installed by the operating system to provide support to applications that want to use the services of the device.

Sound card device drivers are low-level software programs which interface with the hardware directly (for programming the hardware) and provide APIs to the operating system to send and receive digital data from the sound card. The device driver may also supply other low-level APIs for device initialization and various controls which may not be available directly to applications. The operating system controls the sound card using these primitive low-level APIs.

Audio applications (such as audio recorders, Winamp, etc.) use APIs provided by the operating system to play or record digital audio. A few applications may bypass the operating system APIs for performance reasons or for better control of the underlying sound card.
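As a tiny, hedged illustration of an application going through the operating system rather than the hardware: on Windows, Python's standard-library winsound module wraps the OS audio API, so playing a WAV file is a single call (the file name here is just a placeholder).

    # Minimal sketch: an application asking the OS (Windows) to play audio.
    # The actual sound card programming happens in the driver, not here.
    import winsound

    # SND_FILENAME tells the API that the argument is a path to a WAV file.
    winsound.PlaySound("chime.wav", winsound.SND_FILENAME)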


Digital Audio files
To play digital audio on your PC you need files which store the digital audio. Digital audio is stored in various formats (and hence with different file extensions).
There are two major groups of audio file formats:

Uncompressed formats
There are many uncompressed data formats. The most popular of them is WAV. It is a flexible file format designed to store more or less any combination of sampling rates and bit depths. This makes it an adequate file format for storing and archiving an original recording. There are other uncompressed formats, such as .AU and .AIFF, which are not very popular now.

Compressed formats
Compressed file formats are based on the principle of leaving out sounds that humans cannot hear, or can hardly hear, e.g. a low-volume sound immediately after a loud sound. Compressed formats optimize for size rather than audio quality: they require less disk space than uncompressed formats but do not retain the same quality. MP3 is a popular example of a compressed audio format.

Wave file format
A WAVE file is a collection of a number of different types of chunks. There is a mandatory Format ("fmt ") chunk which contains important parameters describing the waveform, such as its sample rate. The Data chunk, which contains the actual waveform data, is also required. All other chunks are optional.
There are no restrictions upon the order of the chunks within a WAVE file, with the exception that the Format chunk must precede the Data chunk.


The figure below shows the format of the wave file
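A minimal sketch of reading these parameters with Python's standard-library wave module (which parses the "fmt " and data chunks for you); the file name is only an example.

    # Minimal sketch: reading the format parameters of a WAV file.
    import wave

    with wave.open("example.wav", "rb") as w:
        print("channels:       ", w.getnchannels())   # 1 = mono, 2 = stereo
        print("bits per sample:", w.getsampwidth() * 8)
        print("sample rate:    ", w.getframerate())   # e.g. 44100 Hz
        print("total frames:   ", w.getnframes())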


A Note about data segment

The data segment contains the actual audio samples. Based on the "bits per sample" field you can decide how to read the data. If the bits-per-sample field is 8, the samples are organized as bytes and you can read them byte by byte.
If the bits-per-sample field is 16, the samples are organized as 16-bit words and you should read one 16-bit word at a time.

For multi-channel sounds (for example, a stereo waveform), single sample points from each channel are interleaved. In a stereo (i.e., 2-channel) waveform, instead of storing all of the sample points for the left channel first and then all of the sample points for the right channel, you "mix" the two channels' sample points together: the first sample point of the left channel, then the first sample point of the right channel, then the second sample point of the left channel, then the second sample point of the right channel, and so on, alternating between the channels. This is what is meant by interleaved data: the next sample point of each channel is stored in turn, so that the sample points meant to be "played" simultaneously are stored contiguously.
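A minimal sketch of reading such interleaved data, assuming a 16-bit stereo file (struct's "<h" format reads one little-endian 16-bit signed word at a time; the file name is an example).

    # Minimal sketch: de-interleaving 16-bit stereo samples from a WAV file.
    import struct
    import wave

    with wave.open("stereo.wav", "rb") as w:
        raw = w.readframes(w.getnframes())   # interleaved bytes: L0 R0 L1 R1 ...

    left, right = [], []
    for i in range(0, len(raw), 4):          # 4 bytes per frame: 2 per channel
        l, r = struct.unpack("<hh", raw[i:i + 4])
        left.append(l)
        right.append(r)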
The figure below shows the bytes of an actual wave file.



Digital Audio effects

Digital waveforms can be altered in multiple ways to create innovative effects. Audio effects are based on a few fundamental principles:
  • Amplitude Modulation
  • Time Delay
  • Waveform Shaping
  • Frequency modulation

Amplitude Modulation effects

Volume Control - The effect produced by varying the amplitude of the signal. The amplitude can
be cut down by attenuating the input signal and can be increased by using amplifiers.

Volume controls are useful for placing between effects, so that the relative volumes of the different effects can be kept at a constant level. However, most, if not all, effects have volume controls built in, allowing the user to adjust the volume of the output with the effect on, relative to the volume of the unaffected signal (when the effect is off).
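A minimal sketch of the idea, assuming the audio is held as 16-bit signed values in a Python list called samples (a placeholder name):

    # Minimal sketch: volume control by scaling every sample by a gain factor.
    def apply_gain(samples, gain):
        out = []
        for s in samples:
            v = int(s * gain)
            # Clamp to the 16-bit range to avoid wrap-around distortion.
            out.append(max(-32768, min(32767, v)))
        return out

    quieter = apply_gain(samples, 0.5)   # attenuate
    louder = apply_gain(samples, 2.0)    # amplify (loud passages may clip)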

Compression
The compression effect amplifies the input signal in such a way that louder signals are amplified less, and softer signals are amplified more. It is essentially a variable-gain amplifier, whose gain is inversely dependent on the volume of the input signal.
It is mostly used in studio recordings, to give the recording a constant volume, especially to vocals. Compression tends to increase background noise, especially during periods of silence. Thus, a noise gate is usually used in conjunction with the compressor.
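A very rough sketch of that variable-gain idea (a real compressor works on the signal envelope with attack and release times; the threshold and ratio here are arbitrary, and samples is the same placeholder list as above):

    # Minimal sketch: reduce the level of samples above a threshold, which
    # narrows the gap between loud and soft passages.
    def compress(samples, threshold=8000, ratio=4.0):
        out = []
        for s in samples:
            mag = abs(s)
            if mag > threshold:
                # Only the part above the threshold is reduced, by the ratio.
                mag = threshold + (mag - threshold) / ratio
            out.append(int(mag) if s >= 0 else -int(mag))
        return out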

Expansion
An expander performs the opposite effect of the compressor. This effect is used to increase the dynamic range of a signal.

Panning
Stereo recordings have two channels: left and right. The volume of each channel can be adjusted; this adjustment effectively changes the position of the perceived sound within the stereo field. The two extremes are all sound completely on the left, or all sound completely on the right.
Usually the 'balance' knob on a music system does this.

Noise Gating
A noise gate blocks input signals whose amplitude lies below a certain threshold, and lets other signals through. This is useful for eliminating background noises, such as hiss or hum, during periods of silence in a recording or performance.
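A minimal sketch of the gating idea (real gates track the signal envelope and use attack/release times; the threshold here is arbitrary and samples is the same placeholder list):

    # Minimal sketch: mute samples whose amplitude is below the threshold.
    def noise_gate(samples, threshold=500):
        return [s if abs(s) >= threshold else 0 for s in samples]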

Time Delay effects
Echo
Echo is produced by adding a time-delayed signal to the output. This produces a single echo. Multiple echoes are achieved by feeding the output of the echo unit back into its input through an attenuator. The attenuator determines the decay of the echoes, that is, how quickly each echo dies out.
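A minimal sketch of an echo with feedback (delay is in samples and decay is the attenuator factor; both values are arbitrary, and samples is the same placeholder list):

    # Minimal sketch: feedback echo. Each sample gets an attenuated copy of
    # the already-processed sample `delay` positions earlier added to it.
    def echo(samples, delay=8000, decay=0.5):
        out = list(samples)
        for i in range(delay, len(out)):
            v = out[i] + decay * out[i - delay]
            out[i] = int(max(-32768, min(32767, v)))   # keep within 16-bit range
        return out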
Chorus
The chorus effect is so named because it makes the recording of a vocal track sound like it was sung by two or more people singing in chorus. This is achieved by adding a single delayed signal (echo) to the original input. However, the delay of this echo is varied continuously between a minimum delay and a maximum delay at a certain rate.
Reverb
Reverb effect is used to simulate the acoustical effect of rooms and enclosed buildings. In a room, for instance, sound is reflected off the walls, the ceiling and the floor. The sound heard at any given time is the sum of the sound from the source, as well as the reflected sound.
Flanging
Flanging is similar to the chorus effect in that the delay of the echo is varied continuously; it can be thought of as an extreme form of chorus.
Phasing
When two identical signals that are completely out of phase are added together, they cancel each other out. If, however, they are only partially out of phase, partial cancellations and partial enhancements occur. This leads to the phasing effect.

Wavetable synthesis of sound


Wave Table Synthesis
The majority of synthesizers available today use some form of sampled-sound or wavetable synthesis. The idea behind this technique is to use small sound samples of real instruments. Based on which musical note needs to be played, the original sound sample is modified or modulated digitally to create the desired sound. This is a much easier approach, and such a synthesizer can sound very much like the real instrument.


Looping and Envelope Generation
One of the primary techniques used in wavetable synthesizers to conserve sample memory space is the looping of sampled sound segments. For many instrument sounds, the sound can be modeled as consisting of two major sections: the attack section and the sustain section. The attack section is the initial part of the sound, where the amplitude and the spectral characteristics of the sound may be changing very rapidly. The sustain section is the part of the sound following the attack, where the characteristics of the sound change less dynamically. Usually the sustain section can be looped for a long time, recreating the sound of a natural instrument such as a flute or violin.


The figure on the left shows a waveform with the portions which could be considered the attack and the sustain sections indicated. In this example, the characteristics of the waveform remain constant throughout the sustain section, while the amplitude decreases at a fairly constant rate. This is an exaggerated example; in most natural instrument sounds, both the spectral characteristics and the amplitude continue to change throughout the duration of the sound. The sustain section, if one can be identified, is that section for which the characteristics of the sound are relatively constant.


A lot of memory can be saved in wavetable synthesis systems by storing only a short segment of the sustain section of the waveform, and then looping this segment during playback. If the original sound had a fairly constant spectral content and amplitude during the sustained section, then the sound resulting from this looping operation should be a good approximation of the sustained section of the original.

For many acoustic string or wind instruments (such as violin, flute, saxophone), the spectral characteristics of the sound remain almost constant during the sustain section, while the amplitude (or volume) of the signal decays. This can be achieved with a looped segment by decaying its volume with some factor over a period of time.


A typical wavetable synthesis system would store sample data for the attack section and the looped section of an instrument sound. These sample segments might be referred to as the initial sound and the loop sound. The initial sound is played once through, and then the loop sound is played repetitively until the note ends. An envelope generator function is used to create an envelope which is appropriate for the particular instrument, and this envelope is applied to the output samples during playback.


Playback of the sample data corresponding to the initial sound (with the attack portion of the envelope applied) begins when a note starts to play, for example the moment a person presses a key on the keyboard. The length of the initial sound depends on the kind of sample used and what kind of instrument sound it is. The lengths of the attack and decay sections of the envelope are usually fixed for a given instrument sound. The moment the attack section finishes, the sustain section is played (while the key is still pressed). The sustain section continues to repeat the loop samples while applying the sustain envelope slope (which decays slowly), until the key is released. Releasing the key triggers the release portion of the envelope.
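A minimal sketch of this looped-playback scheme (attack and loop are assumed to be lists of already-extracted sample values, and the decay factor is illustrative):

    # Minimal sketch: play the attack segment once, then repeat the sustain
    # loop with a slowly decaying envelope while the key is held.
    def play_note(attack, loop, held_frames, sustain_decay=0.9995):
        out = list(attack)                  # initial (attack) sound, played once
        env = 1.0
        for i in range(held_frames):
            out.append(int(loop[i % len(loop)] * env))   # looped sustain samples
            env *= sustain_decay            # sustain envelope decays slowly
        return out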


Sustain Loop length
The sustain loop length depends on the basic length of the sustain section being looped. The loop length is measured as a number of samples. The length of the loop should be equal to an integer number of periods of the fundamental pitch of the sound being played.
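For example (the sample rate, pitch and number of periods are all illustrative), a loop spanning a whole number of periods of the fundamental can be sized like this:

    # Minimal sketch: loop length equal to an integer number of periods.
    sample_rate = 44100.0     # samples per second (assumed)
    fundamental = 261.63      # Hz, roughly middle C (assumed)
    periods = 50              # number of whole periods in the loop
    loop_length = round(periods * sample_rate / fundamental)
    print(loop_length)        # about 8428 samples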
Percussive sounds

Instruments like the violin, flute, or other strings have sustain sections because they can hold the same note continuously for long durations. Percussive sounds like drums or cymbals do not have a sustain section, as their sound starts and decays very fast. For such instruments, looping of a sustain section cannot be employed; these sounds are stored as a single sample which is played as-is. The figure on the left shows the waveform of a snare drum sound. Note that it does not have a sustain section.


Wavetable samples
There is a lot of processing involved on an audio sample collected from a natural instrument before it can be used in a wavetable synthesis technique. We need to extract the initial (attack) and sustain portions from it so the sustain can be looped. The portion to be looped must also be corrected so that its end and start points blend with each other; otherwise there will be a glitch every time it loops. The sound sample's dynamic range may also need to be compressed to save on sample memory.
Samples for various instruments can then be combined in a table, which is called a soundbank or patch table. The synthesizer can load a specified sample into memory from the table when the user wants to play a particular instrument.
Pitch Shifting/ Transpose
We know that the notes of a piano are related to each other in terms of pitch. The same note in the next set (octave) has double the frequency of the one in the previous set. For example, note C1 has a frequency of 32.7032 Hz while note C2 has a frequency of 32.7032 × 2 = 65.4064 Hz.
In order to minimize sample memory requirements, wavetable synthesizers use pitch-shifting techniques so that the same sample can be used to play various pitches (notes) of the same instrument.
For example, if the sample memory contains a sample of a middle C note on the acoustic piano, then this same sample data could be used to generate the C Sharp (C#) note or D note above middle C using pitch shifting.
Pitch shifting can be achieved by controlling the speed of the playback of the stored sample.
Suppose an audio sample used for wavetable synthesis contains 100 frames (100 digital samples) and is played back at a certain rate, producing a sound of a particular frequency (say F1).
Now if we double the playback speed, the frequency of the produced sound also doubles (say F2 = 2 × F1).
If the sample contained a tone of 32.7032 Hz, it will sound like note C1 in the first case. In the second case, since the frequency is doubled (by changing the playback speed), it will sound like a 65.4064 Hz tone, which is note C2.
You can now see that the same digital sample can be used to play two notes one octave apart just by changing the speed at which the stored sample is played.
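A minimal sketch of pitch shifting by changing the playback step size (nearest-neighbour resampling, no interpolation; the factor 2^(n/12) is the equal-temperament ratio for a shift of n semitones, and samples is the same placeholder list of PCM values as before):

    # Minimal sketch: resample a stored waveform to shift its pitch.
    def pitch_shift(samples, semitones):
        step = 2.0 ** (semitones / 12.0)    # playback-speed factor
        out = []
        pos = 0.0
        while pos < len(samples):
            out.append(samples[int(pos)])   # nearest-neighbour resampling
            pos += step
        return out

    octave_up = pitch_shift(samples, 12)    # double speed: one octave higher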

Sound synthesis techniques

The most important goal of sound synthesis is to produce sounds which sound like real musical instruments. The common method for sound synthesis is to use oscillators. Oscillators produce the desired waveforms by mathematical computation. A higher-quality reproduction of a natural instrument can typically be achieved using more oscillators, but this requires more computational power and more human programming.

Amplitude envelope
One of the salient aspects of any sound is its amplitude envelope. This envelope determines whether the sound is percussive, like a snare drum, or continuous, like a violin string. The envelope defines how the amplitude of the sound varies with time. A sound's amplitude profile is described by the "ADSR" (Attack Decay Sustain Release) envelope model.

Attack time is the time taken for the initial run-up of the sound level from nil to 100%.

Decay time is the time taken for the subsequent run down from 100% to the designated Sustain level.

Sustain level, the third stage, is the steady volume produced when a key is held down.

Release time is the time taken for the sound to decay from the Sustain level to nil when the key is released. If a key is released during the Attack or Decay stage, the Sustain phase is usually skipped. Similarly, a Sustain level of zero will produce a more-or-less piano-like (or percussive) envelope, with no continuous steady level, even when a key is held. Exponential rates are commonly used because they closely model real physical vibrations, which usually rise or decay exponentially.
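A minimal sketch of a linear ADSR envelope generator (all times are in samples and the values in the example call are arbitrary; as noted above, real envelopes often use exponential rather than linear segments):

    # Minimal sketch: a linear ADSR envelope, returned as a list of gain
    # values (0.0 to 1.0) to be multiplied with the oscillator output.
    def adsr(attack, decay, sustain_level, release, held):
        env = []
        for i in range(attack):                       # Attack: 0 -> 1
            env.append(i / float(attack))
        for i in range(decay):                        # Decay: 1 -> sustain level
            env.append(1.0 - (1.0 - sustain_level) * i / float(decay))
        for _ in range(held):                         # Sustain: constant level
            env.append(sustain_level)
        for i in range(release):                      # Release: sustain -> 0
            env.append(sustain_level * (1.0 - i / float(release)))
        return env

    envelope = adsr(attack=441, decay=2205, sustain_level=0.6,
                    release=4410, held=22050)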


Synthesis methods
There are also many different kinds of synthesis methods, each applicable to both analog and digital synthesizers. These techniques tend to be mathematically related, especially frequency modulation and phase modulation. You can find the details of these algorithms in any book dealing with Digital signal processing.

Subtractive synthesis
Additive synthesis
Granular synthesis
Wavetable synthesis
Frequency modulation synthesis
Digital sampling

Physics behind piano

You must have seen a piano or a digital keyboard (synthesizer). To play these instruments, a keyboard is provided, which is called a musical keyboard. The keyboard has sets of black and white keys. If you look closely, a pattern of 12 keys (5 black and 7 white) is repeated several times across the keyboard.

Small or cheap pianos may have fewer of these sets, while bigger pianos may have 5 or more such sets.

Western music is built from twelve different notes, which means 12 fixed pitches. Notes increase in pitch as you move from left to right on the keyboard.

The seven white keys (notes) are named C D E F G A B. Moving from left to right, when you reach note B, the next note is again called C. Although its name is the same, it has a higher pitch than the previous C. If you play both notes (the C from two different sets) together, you will notice that in spite of the different pitches they sound like the same note. This special relationship is called an octave.

Notes one octave apart do differ in frequency: the frequency of the higher-octave note is exactly double the frequency of the same note in the previous octave.

Sharps and flats
The pitch difference between two adjacent notes (or keys) is called a semitone. A semitone is 1/12th of an octave. For example, keys E and F are one semitone apart.
Keys C and D are two semitones apart (there is a black key in between). Two semitones are referred to as a TONE.

The black keys are given names relative to the notes on either side. For example, the black key between F and G can be called F sharp (F#) or G flat (Gb), because it is a half step above F and a half step below G.

Sharp means raising the pitch of a note by one semitone. Similarly, flat means lowering the pitch of a note by one semitone.

Since the note names are the same for each octave (set), notes are identified by the set to which they belong.

For example, the notes of the first set are called C1, D1 ... B1. The notes of the fourth set would be called C4, D4 ... B4.

The middle C
In Western music the expression "middle C" refers to note C4, or "Do". It also tends to fall near the middle of the keyboard. Note C4, or middle C, is near the top of the male vocal range and the bottom of the female vocal range.
Although C4 is commonly known by the expression "middle C", the expression is keyboard-specific. On a small keyboard which has only 3 sets of keys, the key corresponding to middle C could be the very first C from the left.

Synthesizer keyboards indicate the key corresponding to middle C by labelling it C4. It is not necessarily the first key of the fourth set on the keyboard.

Note Frequencies
A440 is the 440 Hz tone that serves as the internationally recognized standard for musical pitch. A440 is the musical note A above middle C (A4). It serves as the audio frequency reference for the calibration of pianos, violins, and other musical instruments.
Each successive pitch is derived by multiplying the previous one by the twelfth root of two. The table below gives the frequencies of the notes of an 88-key piano.
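A minimal sketch of how these frequencies can be computed (using A4 = 440 Hz as the reference and the standard 88-key numbering, where key 49 is A4 and key 40 is middle C):

    # Minimal sketch: frequency of the n-th key of an 88-key piano,
    # derived from A440 using the twelfth root of two.
    def key_frequency(n):
        # Key 49 is A4 (440 Hz); each key is one semitone from the next.
        return 440.0 * (2.0 ** ((n - 49) / 12.0))

    print(key_frequency(40))   # middle C (C4): about 261.63 Hz
    print(key_frequency(1))    # lowest piano key (A0): 27.5 Hz
    print(key_frequency(88))   # highest key (C8): about 4186 Hz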

Creating Digital Audio (2)

Digital Synthesis of digital audio

This method is used to generate unique sounds directly, without any analog-to-digital conversion. Obviously you cannot easily create human or very realistic sounds with this approach.

The method involves directly generating the samples that would otherwise have been produced by an ADC sampling an analog signal at a specified sample rate.

As long as we know the waveform and frequency of a certain kind of sound, we can re-create it in software using this method. However, the sound produced may not sound very realistic. Modern music software uses this technique to produce some unnatural (synthetic) sounds.

For example, we can write a program that computes a sine wave's sample values using simple arithmetic. The sequence of these samples can then be played back to produce the sound of a sine wave (a pure tone).
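A minimal sketch of this, assuming 16-bit mono PCM at 44.1 kHz (the 440 Hz tone and one-second duration are arbitrary choices); it writes the generated samples to a WAV file using Python's standard-library wave module:

    # Minimal sketch: directly synthesizing the samples of a 440 Hz sine wave
    # and saving them as a 16-bit mono WAV file.
    import math
    import struct
    import wave

    sample_rate = 44100
    frequency = 440.0            # Hz (note A4)
    duration = 1.0               # seconds

    with wave.open("sine.wav", "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16 bits per sample
        w.setframerate(sample_rate)
        for n in range(int(sample_rate * duration)):
            value = math.sin(2.0 * math.pi * frequency * n / sample_rate)
            w.writeframes(struct.pack("<h", int(value * 32767)))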



Synthesizers – Devices which are capable of generating (synthesizing) sounds are called synthesizers. They can be played by a human just like any other instrument, such as a flute or guitar. The idea behind them is to imitate the mechanism of sound production electronically.

A synthesizer is a very versatile musical instrument. First of all, it can produce a variety of sounds such as piano, violin, flute, bass, strings and many more. You can play all of these musical instruments using a single synthesizer: you press a few buttons and program it to sound like a flute or a piano or whatever you like.
There are both analog and digital synthesizers available in the market. Analog synthesizers were the first to hit the market, in the 70s.

Analog synthesizers mimic the mechanism of natural sound production involving vibration, modulation and pitch, much like humans produce sound through the vibration of our vocal cords. Analog synthesizers use amplifiers, filters, oscillators and envelope generators to produce various kinds of sounds. The sound is produced by the various settings of these modules under human control.
Moog, Korg, Yamaha and Roland manufactured these analog synthesizers.


Picture on left shows an analog synthesizer.





A digital synthesizer is a synthesizer that uses digital signal processing (DSP) techniques to make musical sounds. It directly creates digital samples of sounds by mathematical computation on binary data. It can transform the samples in more than one way to give special effects to the sound or to give the sound a different pitch.

Early commercial digital synthesizers used simple hard-wired digital circuitry to implement techniques such as additive synthesis and FM synthesis. Other techniques, such as wavetable synthesis, became possible with the advent of high-speed microprocessors and digital signal processing technology. We will talk about synthesis techniques a little later in this chapter.

Digital synthesizers are realized in separate hardware which looks very much like a musical instrument. Synthesizers typically have a keyboard which provides the human interface to the instrument, and they are often thought of as keyboard instruments. However, a synthesizer's human interface does not necessarily have to be a keyboard, nor does a synthesizer strictly need to be playable by a human.

A software synthesizer, also known as a softsynth or virtual instrument, is a computer program for creating digital audio. It works like a common synthesizer, but is realized entirely in software.

A software synthesizer can use many methods of generating sounds. It can implement oscillators, filters and envelope generators in the software itself, similar to analog synthesizers, or it can use wavetable synthesis techniques (which use sampled sounds, described later) to produce the sounds of various musical instruments.

Another way a software synthesizer can produce the sound of a musical instrument is by allowing the user to load small pieces of files containing the sound of instruments such as drums or cymbals. These small samples of user-defined sounds can be played in any order to produce the music of an orchestra.

Many software synthesizers also support MIDI which is a standard method for communicating with synthesizers. More on MIDI later.

Software synthesizers can run standalone or as plug-ins within other applications. You can program, control and compose melodies all using soft synths.
Picture on left shows a softsynth.



Polyphonic Synthesizers
Synthesizers which can play more than one note at a time of the same voice (instrument) are called polyphonic synthesizers. This does not mean that the device can play multiple instruments together. You might have heard of polyphonic ring tones. Polyphony gives a rich quality to the sound, somewhat like an orchestra concert. For example, a guitar chord is polyphonic because it involves many strings being plucked together at once.
A polyphonic synthesizer would be able to generate a guitar-chord kind of sound because it can produce the sound of each plucked string at the same time. Most of the standard synthesizers (whether hardware or software) available today are polyphonic. A few cheap ones available in the market are monophonic, and they don't sound as good.

Multitimbral Synthesizer –

Synthesizers which can play the sound of more than one instrument together are called multitimbral synthesizers. For example, on such a synthesizer you can play a piano and a violin sound together. Controlling two instruments together would otherwise need two keyboards, which may not be easy. Some synthesizers offer the ability to split the keys of the keyboard into two sections. Each section then maps to one instrument, so both can be played by a single user simultaneously.

Patches/Soundbank -
The different sounds that a synthesizer or sound generator can produce are sometimes called "patches". Programmable synthesizers commonly assign "program numbers" (or patch numbers) to each sound. For instance, a sound module might use patch number 1 for its acoustic piano sound, and patch number 36 for its fretless bass sound. The association of all patch numbers with all sounds is often referred to as a patch map. Usually one patch corresponds to one instrument sound, like piano or accordion. A collection of patches is called a soundbank.