General audio terminologies:

Acoustic Signature: The acoustic signature of a system is data containing all of the sound characteristics of a system. This includes such things as reverb time, frequency response and other timbral qualities. Impulse files used by Acoustic Mirror can be thought of as acoustic signatures.

Additive Synthesis — A system for generating audio waveforms or sounds by combining basic waveforms or sampled sounds prior to further processing with filters and envelope shapers. The Hammond tonewheel organ was one of the first additive synthesizers.

ADSR — When creating artificial waveforms in a synthesizer, changes in the signal amplitude over time are controlled by an ‘envelope generator’ which typically has controls to adjust the Attack, Decay, Sustain and Release times, controlled by the pressing and subsequent release of a key on the keyboard. The Attack phase determines the time taken for the signal to grow to its maximum amplitude, triggered by the pressing of a key. The envelope then immediately enters the Decay phase, during which time the signal level reduces until it reaches the Sustain level set by the user. The signal remains at this level until the key is released, at which point the Release phase is entered and the signal level reduces back to zero.

AES — Acronym for Audio Engineering Society, one of the industry's professional audio associations.

A-Law: A-Law is a companded compression algorithm for voice signals defined by the Geneva Recommendations (G.711). The G.711 recommendation defines A-Law as a method of encoding 16-bit PCM signals into a nonlinear 8-bit format. The algorithm is commonly used in European telecommunications. A-Law is very similar to µ-Law; however, each uses a slightly different coder and decoder.

Aliasing: A type of distortion that occurs when digitally recording high frequencies with a low sample rate. For example, in a motion picture, when a car's wheels appear to slowly spin backward while the car is quickly moving forward, you are seeing the effects of aliasing. Similarly, when you try to record a frequency greater than one half of the sampling rate (the Nyquist Frequency), instead of hearing a high pitch, you may hear a low-frequency rumble.

Ambience — The result of sound reflections in a confined space being added to the original sound. Ambience may also be created electronically by some digital reverb units. The main difference between ambience and reverberation is that ambience doesn't have the characteristic long delay time of reverberation; the reflections mainly give the sound a sense of space.

Amp (Ampere) — Unit of electrical current (A).

Amplitude — The waveform signal level. It can refer to acoustic sound levels or electrical signal levels.

Amplitude Modulation: Amplitude Modulation (AM) is a process whereby the amplitude (loudness) of a sound is varied over time. When varied slowly, a tremolo effect occurs. If the frequency of modulation is high, many side frequencies are created that can strongly alter the timbre of a sound.
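The slow-modulation (tremolo) case can be sketched in a few lines of Python with numpy; the 44.1kHz sample rate, 440Hz carrier and 5Hz modulator here are arbitrary illustrative choices, not part of the definition:

```python
import numpy as np

SR = 44100                                       # assumed sample rate in Hz
t = np.arange(SR) / SR                           # one second of time values

carrier = np.sin(2 * np.pi * 440.0 * t)          # the sound being modulated
lfo = 0.5 * (1.0 + np.sin(2 * np.pi * 5.0 * t))  # slow 5 Hz modulator, range 0..1

tremolo = carrier * lfo                          # slow AM is heard as tremolo
```

Raising the modulator into the audible frequency range turns the same multiplication into a timbre-altering effect, creating sum and difference side frequencies around the carrier.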

Analog: When discussing audio, this term refers to a method of reproducing a sound wave with voltage fluctuations that are analogous to the pressure fluctuations of the sound wave. This is different from digital recording in that these fluctuations are infinitely varying rather than discrete changes at sample time. See Quantization.

Analogue (cf. Digital) — The origin of the term is that the electrical audio signal inside a piece of equipment can be thought of as being ‘analogous’ to the original acoustic signal. Analogue circuitry uses a continually changing voltage or current to represent the audio signal.

Analogue Synthesis — A system for synthesizing sounds by means of analogue circuitry, usually by filtering simple repeating waveforms.

Arming — Arming a track or channel on a recording device places it in a condition where it is ready to record audio when the system is placed in record mode. Unarmed tracks won’t record audio even if the system is in record mode. When a track is armed the system monitoring usually auditions the input signal throughout the recording, whereas unarmed tracks usually replay any previously recorded audio.

Arpeggiator — A device (or software) that allows a MIDI instrument to sequence around any notes currently being played. Most arpeggiators also allow the sound to be sequenced over several octaves, so that holding down a simple chord can result in an impressive repeating sequence of notes.

ASCII — American Standard Code for Information Interchange. An internationally recognised standard code for representing text characters as binary numbers.

Audio Data Reduction — A system used to reduce the amount of data needed to represent some information such as an audio signal. Lossless audio data reduction systems (eg. FLAC and ALAC) can fully and precisely reconstruct the original audio data with bit-accuracy, but the amount of data reduction is rarely much more than 2:1. Lossy audio data reduction systems (eg. MP3, AAC, AC3 and others) permanently discard audio information that is deemed to have been 'masked' by more prominent sounds. The original data can never be retrieved, but the reduction in total data can be considerable (12:1 is common).

Audio Frequency — Signals in the range of human audibility, nominally 20Hz to 20kHz.

Balance — This word has several meanings in recording. It may refer to the relative levels of the left and right channels of a stereo recording (eg. Balance Control), or it may be used to describe the relative levels of the various instruments and voices within a mix (ie. Mix balance).

Bandwidth — The range of frequencies passed by an electronic circuit such as an amplifier, mixer or filter. The frequency range is usually measured at the points where the level drops by 3dB relative to the maximum.

Baseline: The baseline of a waveform is also referred to as the zero-amplitude axis or negative infinity.



Beats Per Minute (BPM): The tempo of a piece of music can be written as a number of beats in one minute. If the tempo is 60 BPM, a single beat will occur once every second.

Bias — A high-frequency signal used in analogue recording to improve the accuracy of the recorded signal and to drive the erase head. Bias is generated by a bias oscillator.

Bit: The most elementary unit in digital systems. Its value can only be 1 or 0, corresponding to a voltage in an electronic circuit. Bits are used to represent values in the binary numbering system. As an example, the 8-bit binary number 10011010 represents the unsigned value of 154 in the decimal system. In digital sampling, a binary number is used to store individual sound levels, called samples.
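The worked example above can be checked directly in Python:

```python
# Interpret the 8-bit pattern 10011010 as an unsigned binary number
value = int("10011010", 2)   # 128 + 16 + 8 + 2
print(value)                 # → 154
```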

Bit Depth: The number of bits used to represent a single sample. For example, 8- or 16-bit are common sample sizes. While 8-bit samples take up less memory (and hard disk space), they are inherently noisier than 16- or 24-bit samples.

Bit Rate — The number of data bits replayed or transferred in a given period of time (normally one second). Normally expressed in terms of kb/s (kilo bits per second) or Mb/s (mega bits per second). For example, the bit rate of a standard CD is (2 channels x 16 bits per sample x 44.1 thousand samples per second) = 1411.2 kilobits/second. Popular MP3 file format bit rates range from 128kb/s to 320kb/s, while the Dolby Digital 5.1 surround soundtrack on a DVD-Video typically ranges between 384 and 448kb/s.
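The CD figure quoted above follows directly from the arithmetic, as this small Python check shows:

```python
channels = 2                  # stereo
bits_per_sample = 16
sample_rate = 44_100          # samples per second per channel

bit_rate = channels * bits_per_sample * sample_rate
print(bit_rate / 1000)        # → 1411.2 (kilobits per second)
```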

Bi-Timbral — A synthesizer that can generate two different sounds simultaneously.

Bouncing — The process of mixing two or more recorded tracks together and re-recording these onto another track.

BPM — Beats Per Minute.

Buffer: Memory used as an intermediate repository in which data is temporarily held while waiting to be transferred between two locations. A buffer ensures that there is an uninterrupted flow of data between computers. Media players may need to rebuffer when there is network congestion.

Bus: A virtual pathway where signals from tracks and effects are mixed. A bus's output is a physical audio device in the computer from which the signal will be heard.

Byte: Refers to a set of 8 bits. An 8-bit sample requires one byte of memory to store, while a 16-bit sample takes two bytes of memory to store.

Cut-off Frequency — The frequency above or below which attenuation begins in a filter circuit.

Cycle — One complete vibration (from maximum peak, through the negative peak, and back to the maximum again) of a sound source or its electrical equivalent. One cycle per second is expressed as 1 Hertz (Hz).

Damping — The control of a resonant device. In the context of reverberation, damping refers to the rate at which the reverberant energy is absorbed by the various surfaces in the environment. In the context of a loudspeaker it relates to the cabinet design and internal acoustic absorbers.

DANTE — A form of audio-over-IP (layer 3) created by Australian company Audinate in 2006. DANTE is an abbreviation of 'Digital Audio Network Through Ethernet'. The format provides low-latency multichannel audio over standard ethernet infrastructures. It has been widely adopted in the broadcast, music studio, and live sound sectors.

DAW — (Digital Audio Workstation): A term first used in the 1980s to describe early ‘tapeless’ recording/sampling machines like the Fairlight and Synclavier. Nowadays, DAW is more commonly used to describe Audio+MIDI ‘virtual studio’ software programs such as Cubase, Logic Pro, Digital Performer, Sonar and such-like. Essentially elaborate software running on a bespoke or generic computer platform which is designed to replicate the processes involved in recording, replaying, mixing and processing real or virtual audio signals. Many modern DAWs incorporate MIDI sequencing facilities as well as audio manipulation, a range of effects and sound generation.

Decibel dB — The deciBel is a method of expressing the ratio between two quantities in a logarithmic fashion. Used when describing audio signal amplitudes because the logarithmic nature matches the logarithmic character of the human sense of hearing. The dB is used when comparing one signal level against another (such as the input and output levels of an amplifier or filter). When the two signal amplitudes are the same, the decibel value is 0dB. If one signal has twice the amplitude of the other the decibel value is +6dB, and if half the size it is -6dB.

When one signal is being compared to a standard reference level the term is supplemented with a suffix letter representing the specific reference. 0dBu implies a reference voltage of 0.775V rms, while 0dBV relates to a reference voltage of 1.0V rms. The two most common standard audio level references are +4dBu (1.228V rms) and -10dBV (0.316V rms). The actual level difference between these is close to 12dB. The term dBm is also sometimes encountered, and this relates to an amount of power rather than a voltage, specifically 1mW dissipated into 600 Ohms (which happens to generate a voltage of 0.775V rms). When discussing acoustic sound levels, 0dB SPL (sound pressure level) is the typical threshold of human hearing at 1kHz.
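These reference levels can be verified with a short Python sketch (the function names are illustrative, not a standard API):

```python
import math

def dbu_to_volts(dbu):
    """Volts rms for a level in dBu (reference 0.775 V rms)."""
    return 0.775 * 10 ** (dbu / 20)

def dbv_to_volts(dbv):
    """Volts rms for a level in dBV (reference 1.0 V rms)."""
    return 1.0 * 10 ** (dbv / 20)

v_pro = dbu_to_volts(4)                         # ≈ 1.228 V rms (+4dBu)
v_consumer = dbv_to_volts(-10)                  # ≈ 0.316 V rms (-10dBV)
diff_db = 20 * math.log10(v_pro / v_consumer)   # ≈ 11.8 dB apart
```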

dB/Octave — A means of measuring the slope or steepness of a filter. The gentlest audio filter is typically 6dB/Octave (also called a first-order slope). Higher values indicate sharper filter slopes. 24dB/octave (fourth order) is the steepest normally found in analogue audio applications.

Decay — The progressive reduction in amplitude of a sound or electrical signal over time, eg. The reverb decay of a room. In the context of an ADSR envelope shaper, the Decay phase starts as soon as the Attack phase has reached its maximum level.

Dithering: Dithering is the practice of adding low-level noise to a signal to mask quantization noise.
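A minimal sketch of the idea in Python with numpy, assuming triangular-PDF (TPDF) dither of ±1 LSB, which is one common choice; real dither implementations vary:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_with_dither(x, bits=16):
    """Quantize a float signal in [-1, 1) to `bits` bits, adding
    triangular-PDF dither of +/- 1 LSB before rounding."""
    scale = 2 ** (bits - 1)
    lsb = 1.0 / scale
    # The sum of two uniform sources gives a triangular distribution
    dither = (rng.uniform(-0.5, 0.5, np.shape(x)) +
              rng.uniform(-0.5, 0.5, np.shape(x))) * lsb
    return np.round((x + dither) * scale) / scale
```

Without the dither, low-level signals quantize to distortion that is correlated with the signal; with it, the error is decorrelated into benign noise.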

Dolby — A manufacturer of analogue and digital audio equipment in the fields of tape noise reduction systems and cinema and domestic surround sound equipment. Dolby’s noise-reduction systems included types B, C and S for domestic and semi-professional machines, and types A and SR for professional machines. Recordings made using one of these systems must also be replayed via the same system. These systems varied in complexity and effectiveness, but essentially they all employed multiband encode/decode processing that raised low-level signals during recording, and reversed the process during playback. Dolby’s surround sound systems started with an analogue phase-matrix system with a very elaborate active-steering decoder called ProLogic, before moving into the digital realm with Dolby Digital, Dolby Digital Plus, Dolby True HD and others.

Dynamics — A way of describing the relative levels within a piece of music.

Dynamic Range: The difference between the maximum and minimum signal levels. It can refer to a musical performance (high-volume vs. low-volume signals) or to electrical equipment (peak level before distortion vs. noise floor).

Effect — A treatment applied to an audio signal in order to change or enhance it in some creative way. Effects often involve the use of delays, and include such treatments as reverb and echo.

Envelope — The way in which the amplitude of a sound signal varies over time.

Equivalent Input Noise — A means of describing the intrinsic electronic noise at the output of an amplifier in terms of an equivalent input noise, taking into account the amplifier’s gain.

Fast Fourier Transform (FFT) Analysis: A Fourier Transform is the mathematical method used to convert a waveform from the Time Domain to the Frequency Domain.

Since the Fourier Transform is computationally intensive, it is common to use a technique called a Fast Fourier Transform (FFT) to perform spectral analysis. The FFT uses mathematical shortcuts to lower the processing time at the expense of putting limitations on the analysis size.

The analysis size, also referred to as the FFT size, indicates the number of samples from the sound signal used in the analysis and also determines the number of discrete frequency bands. When a high number of frequency bands are used, the bands have a smaller bandwidth, which allows for more accurate frequency readings.
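The relationship between analysis size and frequency-band width can be seen with numpy (a sketch; the 8kHz sample rate and 1024-point FFT size are arbitrary choices):

```python
import numpy as np

SR = 8000                      # assumed sample rate in Hz
N = 1024                       # FFT (analysis) size in samples

t = np.arange(N) / SR
signal = np.sin(2 * np.pi * 1000.0 * t)     # 1 kHz test tone

spectrum = np.abs(np.fft.rfft(signal))
bin_width = SR / N                          # 7.8125 Hz per frequency band
peak_hz = np.argmax(spectrum) * bin_width   # reads back the 1 kHz tone
```

Doubling N halves bin_width, giving more accurate frequency readings at the cost of more computation and a longer analysis window.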

Foldback — A system for making one or more separate mixes audible to musicians while performing, recording and overdubbing. Also known as a Cue mix. May be auditioned via headphones, IEMs or wedge monitors.

Formant — The frequency components or resonances of an instrument or voice sound that don't change with the pitch of the note being played or sung. For example, the body resonance of an acoustic guitar remains constant, regardless of the note being played.

Frequency — The number of complete cycles of a repetitive waveform that occur in 1 second. A waveform which repeats once per second has a frequency of 1Hz (Hertz).

Frequency Response — The variation in amplitude relative to the signal frequency. A measurement of the frequency range that can be handled by a specific piece of electrical equipment or loudspeaker. (Also see Bandwidth)

FSK — Frequency Shift Keying. An obsolete method of recording a synchronisation control signal onto tape by representing it as two alternating tones.

Fundamental — The lowest frequency component in a harmonically complex sound. (Also see harmonic and partial.)

Gain — The amount by which a circuit amplifies a signal, normally denoted in decibels.

Glitch — Describes an unwanted short term corruption of a signal, or the unexplained, short term malfunction of a piece of equipment.

Group — A collection of signals within a mixer that are combined and routed through a separate fader to provide overall control. In a multitrack mixer several groups are provided to feed the various recorder track inputs.

Harmonic — High frequency components of a complex waveform, where the harmonic frequency is an integer multiple of the fundamental.

Headroom — The available ‘safety margin’ in audio equipment required to accommodate unexpected loud audio transient signals. It is defined as the region between the nominal operating level (0VU) and the clipping point. Typically, a high quality analogue audio mixer or processor will have a nominal operating level of +4dBu and a clipping point of +24dBu — providing 20dB of headroom. Analogue meters, by convention, don’t show the headroom margin at all; but in contrast, digital systems normally do — hence the need to try to restrict signal levels to average around -20dBFS when tracking and mixing with digital systems to maintain a sensible headroom margin. Fully post-produced signals no longer require headroom as the peak signal level is known and controlled. For this reason it has become normal to create CDs with zero headroom.

Hertz (Hz) — The standard measurement of frequency. 10Hz means ten complete cycles of a repeating waveform per second.

Head-Related Transfer Function (HRTF): Sounds are perceived differently depending on the direction the sound comes from. This occurs because of the echoes bouncing from your shoulders and nose and the shape of your ears. A head-related transfer function contains the frequency and phase response information required to make a sound appear to originate from a certain direction in 3-dimensional space.

High Resolution — A misnomer, but used to refer to digital formats with long word-lengths and high sample rates, eg. 24/96 or 24/192. Audio resolution in properly configured digital systems is infinite, just as in analogue systems. Word-length defines only the system's signal-to-noise ratio (equivalent to tape width in analogue systems), while sample rate defines only the audio bandwidth (equivalent to tape speed in analogue systems).

Hiss — Random noise caused by random electrical fluctuations.

Hum — Audio Signal contamination caused by the addition of low frequencies, usually related to the mains power frequency.

Hysteresis — A condition whereby the state of a system is dependent on previous events or, in other words, the system's output can lag behind the input. Most commonly found in audio in the behaviour of ferro-magnetic materials such as in transformers and analogue tape heads, or in electronic circuits such as 'switch de-bouncing'. Another example is the way a drop-down box on a computer menu remains visible for a short while after the mouse is moved.

Hz — The SI symbol for Hertz, the unit of frequency.

Inverse Telecine (IVTC): Telecine is the process of converting 24 fps (cinema) source to 30 fps video (television) by adding pulldown fields. Inverse telecine, then, is the process of converting 30 fps (television) video to 24 fps (cinema) by removing pulldown.

k — (lower-case k) The standard abbreviation for kilo, meaning a multiplier of 1000 (one thousand). Used as a prefix to other values to indicate magnitude, eg. 1kHz = 1000Hz, 1kOhm = 1000 Ohms.

K-Metering — An audio level metering format developed by mastering engineer Bob Katz which must be used with a monitoring system set up to a calibrated acoustic reference level. Three VU-like meter scales are provided, differing only in the displayed headroom margin. The K-20 scale is used for source recording and wide dynamic-range mixing/mastering, and affords a 20dB headroom margin. The K-14 scale allows 14dB of headroom and is intended for most pop music mixing/mastering, while the K-12 scale is intended for material with a more heavily restricted dynamic-range, such as for broadcasting. In all cases, the meter's zero mark is aligned with the acoustic reference level.

Latency (cf. Delay) — The time delay experienced between a sound or control signal being generated and it being auditioned or taking effect, measured in seconds.

Load — An electrical load is a circuit that draws power from another circuit or power supply. The term also describes reading data into a computer system.

Loudness — The perceived volume of an audio signal.

Low-range (low, lows) — The lower portion of the audible frequency spectrum, typically denoting frequencies below about 1kHz.

LUFS — The standard measurement of loudness, as used on Loudness Meters corresponding to the ITU-R BS.1770 specification. The acronym stands for 'Loudness Units (relative to) Full Scale'. Earlier versions of the specification used LKFS instead, and this label remains in use in America. The K refers to the 'K-Weighting' filter used in the signal measurement process.

Mid-Side recording: Mid-side (MS) recording is a microphone technique in which one mic is pointed directly towards the source to record the center (mid) channel, and a figure-of-eight mic is pointed 90 degrees away from the source to record the stereo (side) information. For proper playback on most systems, MS recordings must be converted to a standard left/right track.
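The conversion to left/right is a simple sum and difference matrix, sketched here in Python (the 0.5 scaling sometimes applied to avoid clipping is omitted):

```python
def ms_to_lr(mid, side):
    """Decode mid-side sample values to left/right via sum and difference."""
    return mid + side, mid - side
```

A sound captured only by the mid mic decodes equally to both channels, while reversing the polarity of the side signal swaps left and right.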

Mix: Mixing allows multiple sound files to be blended into one file at user-defined relative levels.

Multiple-Bit-Rate Encoding: Multiple-bit-rate encoding allows you to create a single file that contains streams for several bit rates. A multiple-bit-rate file can accommodate users with different Internet connection speeds, or these files can automatically change to a different bit rate to compensate for network congestion without interrupting playback. To take advantage of multiple-bit-rate encoding, you must publish your media files to a Windows Media server or a RealServer G2.

Nyquist Frequency: The Nyquist Frequency (or Nyquist Rate) is one half of the sample rate and represents the highest frequency that can be recorded using the sample rate without aliasing. For example, the Nyquist Frequency of 44,100 Hz is 22,050 Hz. Any frequencies higher than 22,050 Hz will produce aliasing distortion in the sample if no anti-aliasing filter is used while recording.
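The frequency an unfiltered recording actually captures can be predicted by folding around the Nyquist frequency, as in this Python sketch (the function name is illustrative):

```python
def alias_frequency(f, sample_rate):
    """Frequency heard when a pure tone at `f` Hz is sampled with no
    anti-aliasing filter: frequencies fold back around Nyquist."""
    nyquist = sample_rate / 2
    f = f % sample_rate              # sampling is periodic in the sample rate
    return sample_rate - f if f > nyquist else f

print(alias_frequency(30_000, 44_100))   # → 14100: a 30 kHz tone aliases to 14.1 kHz
```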

Punch-In: Punching-in during recording means automatically starting and stopping recording at user-specified times.

Root Mean Square (RMS): The Root Mean Square (RMS) of a sound is a measurement of the intensity of the sound over a period of time. The RMS level of a sound corresponds to the loudness perceived by a listener when measured over small intervals of time.
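The name describes the computation exactly: square the samples, take the mean, then the square root. In Python:

```python
import math

def rms(samples):
    """Root mean square: square each sample, average, then square-root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

print(rms([1.0, -1.0, 1.0, -1.0]))   # → 1.0 (a full-scale square wave)
```

A full-scale sine wave instead measures about 0.707 (1/√2), which is one reason RMS tracks perceived loudness better than peak level does.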

Sample: The word sample is used in many different (and often confusing) ways when talking about digital sound. Here are some of the different meanings:

  • A discrete point in time at which a sound signal is measured when digitizing. For example, an audio CD contains 44,100 samples per second. Each sample is really only a number representing the amplitude of the waveform at that instant in time.
  • A sound that has been recorded in a digital format; used by musicians who make short recordings of musical instruments to be used for composition and performance of music or sound effects. These recordings are called samples. In this Help system, we try to use sound file instead of sample whenever referring to a digital recording.
  • The act of recording sound digitally, i.e. to sample an instrument means to digitize and store it.

Sample Rate: The Sample Rate (also referred to as the Sampling Rate or Sampling Frequency) is the number of samples per second used to store a sound. High sample rates, such as 44,100 Hz, provide higher fidelity than lower sample rates, such as 11,025 Hz. However, more storage space is required when using higher sample rates.

Sample Value: The Sample Value (also referred to as sample amplitude) is the number stored by a single sample.

  • In 32-bit audio, these values range from -2147483648 to 2147483647.
  • In 24-bit audio, they range from -8388608 to 8388607. 
  • In 16-bit audio, they range from -32768 to 32767. 
  • In 8-bit audio, they range from -128 to 127. 

The maximum allowed sample value is often referred to as 100% or 0 dB.
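The relationship between a sample value and the 0 dB full-scale reference can be sketched in Python (the function name is illustrative):

```python
import math

def sample_to_dbfs(value, bits=16):
    """Level of a sample value in dB relative to full scale (0 dB = maximum)."""
    full_scale = 2 ** (bits - 1)
    return 20 * math.log10(abs(value) / full_scale)

print(round(sample_to_dbfs(16384)))   # → -6 (half of 16-bit full scale)
```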


Sampler: A sampler is a device that records sounds digitally. Although, in theory, your sound card is a sampler, the term usually refers to a device used to trigger and play back samples while changing the sample pitch.

Secure Digital Music Initiative (SDMI): The Secure Digital Music Initiative (SDMI) is a consortium of recording industry and technology companies organized to develop standards for the secure distribution of digital music. The SDMI specification will answer consumer demand for convenient accessibility to quality digital music, enable copyright protection for artists' work, and enable technology and music companies to build successful businesses.

SCSI MIDI Device Interface (SMDI): SMDI is a standardized protocol for music equipment communication. Instead of using the slower standard MIDI serial protocol, it uses a SCSI bus for transferring information. Because of its speed, SMDI is often used for sample dumps.

Sign-Bit: Data that has positive and negative values and uses zero to represent silence. Unlike the signed format, twos complement is not used. Instead, negative values are represented by setting the highest bit of the binary number to one without complementing all other bits. This is a format option when opening and saving RAW sound files.

Signed: Data that has positive and negative twos complement values and uses zero to represent silence. This is a format option when opening and saving raw sound files.

Signal-to-Noise Ratio: The signal-to-noise ratio (SNR) is a measurement of the difference between a recorded signal and noise levels. A high SNR is always the goal.

The maximum signal-to-noise ratio of digital audio is determined by the number of bits per sample. In 16-bit audio, the signal-to-noise ratio is 96 dB, while in 8-bit audio it's 48 dB. However, in practice this SNR is never achieved, especially when using low-end electronics.
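The figures quoted follow from the common rule of thumb of roughly 6 dB per bit (more precisely 6.02 dB; a full-scale sine wave adds a further 1.76 dB, which the simpler rule ignores):

```python
def max_snr_db(bits):
    """Theoretical quantization SNR using the ~6.02 dB-per-bit rule of thumb."""
    return 6.02 * bits

print(round(max_snr_db(16)))   # → 96
print(round(max_snr_db(8)))    # → 48
```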

Small Computer Systems Interface (SCSI): SCSI is a standard interface protocol for connecting devices to your computer. The SCSI bus can accept up to seven devices at a time, including CD-ROM drives, hard drives and samplers.

Society of Motion Picture and Television Engineers (SMPTE): SMPTE time code is used to synchronize time between devices. The time code is calculated in hours:minutes:seconds:frames, where frames are fractions of a second based on the frame rate. Frame rates for SMPTE time code are 24, 25, 29.97 and 30 frames per second.

Sound Card: The sound card is the audio interface between your computer and the outside world. It is responsible for converting analog signals to digital and vice-versa. There are many sound cards available on the market today, covering the spectrum of quality and price. Sound Forge software will work with any Windows-compatible sound card.

Streaming: A method of data transfer in which a file is played while it is downloading. Streaming technologies allow Internet users to receive data as a steady, continuous stream after a brief buffering period. Without streaming, users would have to download files completely before playback.

Telecine: The process of creating 30 fps video (television) from 24 fps film (cinema).

Tempo: Tempo is the rhythmic rate of a musical composition, usually specified in beats per minute (BPM).

µ-Law: µ-Law (mu-Law) is a companded compression algorithm for voice signals defined by the Geneva Recommendations (G.711). The G.711 recommendation defines µ-Law as a method of encoding 16-bit PCM signals into a nonlinear 8-bit format. The algorithm is commonly used in North American and Japanese telecommunications. µ-Law is very similar to A-Law; however, each uses a slightly different coder and decoder.
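The nonlinear curve at the heart of µ-Law can be sketched in Python. This is the continuous compression formula with µ = 255 as used by G.711; the actual codec additionally maps the result to discrete 8-bit codes:

```python
import math

MU = 255   # the mu constant used by G.711

def mu_law_compress(x):
    """Continuous mu-law compression for a sample value x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of the compression curve."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Low-level samples get far more of the 8-bit code space than high-level ones, which is what makes the 16-bit to 8-bit reduction tolerable for voice signals.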

Waveform: A waveform is the visual representation of wave-like phenomena, such as sound or light. For example, when the amplitude of sound pressure is graphed over time, pressure variations usually form a smooth waveform.