Psychoacoustics is the study of subjective human perception of sounds. Alternatively, it can be described as the study of the psychological correlates of the physical parameters of acoustics.
Background
Hearing is not a purely mechanical phenomenon of wave propagation, but is also a sensory and perceptual event. When a person hears something, it arrives at the ear as a mechanical sound wave traveling through the air, but within the ear it is transformed into neural action potentials. These nerve pulses then travel to the brain, where they are perceived. Hence, in many problems in acoustics, such as audio processing, it is advantageous to take into account not just the mechanics of the environment, but also the fact that both the ear and the brain are involved in a person's listening experience.
The inner ear, for example, does significant signal processing in converting sound waveforms into neural stimuli, so certain differences between waveforms may be imperceptible. Audio compression techniques, such as MP3, make use of this fact. In addition, the ear has a nonlinear response to sounds of different loudness levels. Telephone networks and audio noise reduction systems make use of this fact by nonlinearly compressing data samples before transmission and then expanding them for playback. Another effect of the ear's nonlinear response is that sounds that are close in frequency produce phantom beat notes, or intermodulation distortion products.
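The compress-then-expand idea can be sketched with the continuous μ-law curve, one companding law associated with telephony. This is only an illustration under stated assumptions: the function names and the choice μ = 255 are introduced here, and real telephone codecs use a segmented 8-bit encoding rather than this smooth curve.

```python
import math

MU = 255.0  # assumed value; commonly associated with mu-law telephony


def mu_law_compress(x, mu=MU):
    """Compress a sample in [-1, 1]; quiet samples get a larger share of the output range."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Invert mu_law_compress, restoring the original sample value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    # Quiet samples are boosted before transmission and restored on playback.
    for sample in (0.001, 0.01, 0.1, 1.0):
        packed = mu_law_compress(sample)
        print(f"{sample:>6} -> {packed:.3f} -> {mu_law_expand(packed):.6f}")
```

Because the curve is steep near zero, a fixed number of quantization steps applied to the compressed value spends more resolution on quiet samples, roughly matching the ear's nonlinear response to loudness.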
Limits of perception
Figure: An equal-loudness contour. Note the peak sensitivity between 2 kHz and 4 kHz, the frequency range around which the human voice centers.
The human ear can nominally hear sounds in the range 20 Hz to 20,000 Hz (20 kHz). This upper limit tends to decrease with age, most adults being unable to hear above 16 kHz. The ear itself does not respond to frequencies below 20 Hz, but these can be perceived via the body's sense of touch. Some research has also reported a hypersonic effect: although sounds above about 20 kHz cannot consciously be heard, evidence suggests that ultrasonic sounds can induce changes in the EEG (electroencephalogram) readouts of listeners in controlled test environments. In addition, listeners in the same study gave qualitatively different judgments of sound when ultrasonic frequencies were present.
Frequency resolution of the ear is about 3.6 Hz within the octave of 1,000–2,000 Hz. That is, changes in pitch larger than about 3.6 Hz can be perceived in a clinical setting. However, even smaller pitch differences can be perceived through other means. For example, the interference of two pitches can often be heard as a repetitive variation in loudness at the (low) difference frequency. This effect of phase variance upon the resultant sound is known as beating.
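The arithmetic behind beating can be sketched as follows. The sum of two equal-amplitude sine tones is identical to a tone at their mean frequency multiplied by a slow envelope, and the loudness rises and falls at the difference frequency. The function names are hypothetical, chosen for this illustration.

```python
import math


def beat_frequency(f1_hz, f2_hz):
    """Rate (Hz) at which the loudness of two summed tones rises and falls."""
    return abs(f1_hz - f2_hz)


def two_tone_sum(f1_hz, f2_hz, t):
    """Direct sum of two unit-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)


def carrier_times_envelope(f1_hz, f2_hz, t):
    """The same signal written as a mean-frequency tone times a slow envelope."""
    carrier = math.sin(math.pi * (f1_hz + f2_hz) * t)
    envelope = 2 * math.cos(math.pi * (f1_hz - f2_hz) * t)
    return carrier * envelope


if __name__ == "__main__":
    print(beat_frequency(440.0, 443.0), "beats per second")
    t = 0.1234
    print(two_tone_sum(440.0, 443.0, t), carrier_times_envelope(440.0, 443.0, t))
```

Two tones only 1 Hz apart would therefore produce one loudness swell per second, which is easy to hear even when the pitch difference itself is too small to detect.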
The semitone scale used in Western musical notation is not a linear frequency scale but a logarithmic one. Other scales have been derived directly from experiments on human hearing perception, such as the mel scale and the Bark scale (these are used in studying perception, but not usually in musical composition). These are approximately logarithmic in frequency at the high-frequency end, but nearly linear at the low-frequency end.
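As a rough numerical illustration of these scales, the sketch below computes equal-tempered semitone frequencies and a mel value for a given frequency. The mel formula is one common approximation among several and is an assumption here, as are the function names and the 440 Hz reference.

```python
import math


def semitone_to_hz(n, reference_hz=440.0):
    """Frequency n equal-tempered semitones above the reference (negative n goes below)."""
    return reference_hz * 2.0 ** (n / 12.0)


def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz, using one common approximation (assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


if __name__ == "__main__":
    # Each semitone multiplies frequency by the same factor: a logarithmic scale.
    print([round(semitone_to_hz(n), 1) for n in (0, 1, 2, 12)])
    # Mel steps: roughly equal per 100 Hz at the low end, roughly equal per octave at the high end.
    for f in (100, 200, 4000, 8000, 16000):
        print(f, round(hz_to_mel(f), 1))
```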
The intensity range of audible sounds is enormous. Our eardrums are sensitive only to variations in sound pressure, but can detect pressure changes as small as 2×10⁻¹⁰ atm and as great as or greater than 1 atm. For this reason, sound pressure level is also measured logarithmically, with all pressures referenced to 1.97385×10⁻¹⁰ atm (20 µPa). The lower limit of audibility is therefore defined as 0 dB, but the upper limit is not as clearly defined. While 1 atm (194 dB peak, or about 191 dB for the RMS level of a sine wave) is the largest pressure variation an undistorted sound wave can have in Earth's atmosphere, larger sound waves can be present in other atmospheres, or on Earth in the form of shock waves. The upper limit is more a question of the level at which the ear will be physically harmed or a hearing disability may result. This limit also depends on the duration of exposure. The ear can be exposed for short periods to levels in excess of 120 dB without permanent harm, albeit with discomfort and possibly pain, but long-term exposure to sound levels over 80 dB can cause permanent hearing loss.
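The logarithmic measure can be checked with a short calculation. The sketch below converts a pressure to decibels relative to the 20 µPa reference (the same quantity as 1.97385×10⁻¹⁰ atm); the constant and function names are chosen for this example only.

```python
import math

P_REF_PA = 20e-6      # reference pressure: 20 micropascals
ATM_PA = 101_325.0    # one standard atmosphere in pascals


def spl_db(pressure_pa):
    """Sound pressure level in dB relative to the 20 micropascal reference."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


if __name__ == "__main__":
    print(f"reference in atm:   {P_REF_PA / ATM_PA:.5e}")
    print(f"threshold:          {spl_db(P_REF_PA):.1f} dB")
    print(f"1 atm peak:         {spl_db(ATM_PA):.1f} dB")
    # A sine wave's RMS pressure is its peak divided by sqrt(2), about 3 dB lower.
    print(f"1 atm peak as RMS:  {spl_db(ATM_PA / math.sqrt(2)):.1f} dB")
```

The roughly 3 dB gap between the peak and RMS figures accounts for the two values, 194 dB and 191 dB, that are commonly quoted for a 1 atm pressure swing.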
A more rigorous exploration of the lower limits of audibility determines that the minimum threshold at which a sound can be heard is frequency-dependent. By measuring this minimum intensity for test tones of various frequencies, a frequency-dependent absolute threshold of hearing (ATH) curve may be derived. Typically, the ear shows a peak of sensitivity (i.e., its lowest ATH) between 1 kHz and 5 kHz, though the threshold changes with age, with older ears showing decreased sensitivity above 2 kHz.
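For illustration, the shape of an ATH curve can be approximated in closed form. The sketch below uses a widely quoted analytic approximation attributed to Terhardt; it is not derived from the text above, it stands in for measured data, and the function name is hypothetical.

```python
import math


def ath_db_spl(f_hz):
    """Approximate threshold in quiet (dB SPL) for a pure tone at f_hz.

    Analytic approximation (assumed for this sketch), not measured data.
    """
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


if __name__ == "__main__":
    for f in (50, 100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz: {ath_db_spl(f):6.1f} dB SPL")
    most_sensitive = min(range(100, 16001, 100), key=ath_db_spl)
    print("lowest threshold near", most_sensitive, "Hz")
```

Under this approximation the lowest threshold falls between 1 kHz and 5 kHz, consistent with the sensitivity peak described above, and the threshold climbs steeply toward both ends of the audible range.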
The ATH is the lowest of the equal-loudness contours. Equal-loudness contours indicate the sound pressure level (dB), over the range of audible frequencies, that is perceived as being of equal loudness. Equal-loudness contours were first measured by Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced via headphones, and the data they collected are called Fletcher-Munson curves. Because subjective loudness was difficult to measure, the Fletcher-Munson curves were averaged over many subjects.
Robinson and Dadson refined the process in 1956 to obtain a new set of equal-loudness curves for a frontal sound source measured in an anechoic chamber. The Robinson-Dadson curves were standardized as ISO 226 in 1986. In 2003, ISO 226 was revised using data collected from 12 international studies.
Overview
The term psychoacoustics describes the characteristics of the human auditory system on which modern audio coding technology is based. The most important psychoacoustic fact is the masking effect of spectral sound elements, such as tones and noise, in an audio signal. For every tone in the audio signal, a masking threshold can be calculated. If another tone lies below this masking threshold, it is masked by the louder tone and remains inaudible.
Masking effects
Figure: Audio masking graph.
In some situations an otherwise clearly audible sound can be masked by another sound. For example, conversation at a bus stop can be completely impossible if a loud bus is driving past. This phenomenon is called masking. A weaker sound is masked if it is made inaudible in the presence of a louder sound. The masking phenomenon occurs because any loud sound will distort the absolute threshold of hearing, making quieter, otherwise perceptible sounds inaudible.
If two sounds occur simultaneously and one is masked by the other, this is referred to as simultaneous masking. Simultaneous masking is also sometimes called frequency masking. The tonality of a sound partially determines its ability to mask other sounds. A sinusoidal masker, for example, requires a higher intensity to mask a noise-like maskee than a loud noise-like masker does to mask a sinusoid. Computer models which calculate the masking caused by sounds must therefore classify their individual spectral peaks according to their tonality.
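A toy version of such a model is sketched below. It places the masker and the probe on the Bark scale (using one published approximation, assumed here) and lets the masking threshold fall off linearly on either side of the masker. The slopes and the tonal and noise offsets are illustrative placeholder values, not parameters of any real codec, and all names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz (assumed formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_threshold_db(masker_hz, masker_db, probe_hz, tonal_masker=True):
    """Level below which a probe at probe_hz is treated as masked (toy model)."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    # Illustrative slopes: masking spreads further upward in frequency than downward.
    slope_db_per_bark = 25.0 if dz < 0 else 10.0
    # Illustrative offsets: a tonal masker is a weaker masker than a noise-like one.
    offset_db = 15.0 if tonal_masker else 5.0
    return masker_db - offset_db - slope_db_per_bark * abs(dz)


def is_masked(masker_hz, masker_db, probe_hz, probe_db, tonal_masker=True):
    """True if the probe lies below the masker's threshold in this toy model."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz, tonal_masker)


if __name__ == "__main__":
    print(is_masked(1000.0, 80.0, 1100.0, 40.0))  # probe close to the masker
    print(is_masked(1000.0, 80.0, 4000.0, 40.0))  # probe far from the masker
```

A practical model would also combine the contributions of many maskers and take the absolute threshold of hearing into account; this sketch handles a single masker only.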
Similarly, a weak sound emitted soon after the end of a louder sound is masked by the louder sound. Even a weak sound just before a louder sound can be masked by the louder sound. These two effects are called forward and backward temporal masking, respectively.
'Phantom' fundamentals
A low pitch (also known as the "pitch of the missing fundamental" or "virtual pitch") can sometimes be heard when there is no apparent source or component of that frequency. This perception is due to the brain interpreting repetition patterns determined by the audible harmonics that are present.
A harmonic series of pitches related as 2×f, 3×f, 4×f, 5×f, etc., gives human hearing the psychoacoustic impression that the pitch 1×f is present. This phenomenon is used by some pro audio manufacturers to allow sound systems to seem to produce notes that are lower in pitch than they are capable of reproducing.
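The repetition pattern can be made concrete with a small experiment. The sketch below builds a signal from the harmonics 2×f to 5×f only and then looks for the lag at which the waveform best matches a shifted copy of itself. Autocorrelation is used here as a simple stand-in for pattern detection, not as a model of what the brain does; the sample rate, search range, and function names are assumptions.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def harmonic_signal(f0_hz, harmonics, n_samples):
    """Sum of unit-amplitude sine waves at the given multiples of f0_hz."""
    return [sum(math.sin(2 * math.pi * h * f0_hz * n / SAMPLE_RATE) for h in harmonics)
            for n in range(n_samples)]


def repetition_pitch_hz(signal, min_hz=70.0, max_hz=1000.0, window=800):
    """Pitch implied by the strongest repetition period within the search range."""
    min_lag = int(SAMPLE_RATE / max_hz)
    max_lag = int(SAMPLE_RATE / min_hz)

    def correlation(lag):
        # Similarity between the waveform and a copy of itself shifted by `lag` samples.
        return sum(signal[n] * signal[n + lag] for n in range(window))

    best_lag = max(range(min_lag, max_lag + 1), key=correlation)
    return SAMPLE_RATE / best_lag


if __name__ == "__main__":
    # No energy at 100 Hz itself: only 200, 300, 400 and 500 Hz are present.
    tone = harmonic_signal(100.0, [2, 3, 4, 5], 1000)
    print(repetition_pitch_hz(tone), "Hz")
```

Although the signal contains no component at 1×f, its waveform repeats once every 1/f seconds, which is the period the search recovers.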
Software
Figure: Perceptual audio coding uses the psychoacoustic algorithm.
The psychoacoustic model provides for high-quality lossy signal compression by describing which parts of a given digital audio signal can be removed (or aggressively compressed) safely, that is, without significant losses in the (consciously) perceived quality of the sound.
It can explain how a sharp clap of the hands might seem painfully loud in a quiet library but is hardly noticeable after a car backfires on a busy urban street. This provides great benefit to the overall compression ratio, and psychoacoustic analysis routinely leads to compressed music files that are 1/10 to 1/12 the size of high-quality original masters with very little discernible loss in quality. Such compression is a feature of nearly all modern audio compression formats. Some of these formats include MP3, Ogg Vorbis, AAC, WMA, MPEG-1 Layer II (used for digital audio broadcasting in several countries) and ATRAC, the compression used in MiniDisc and Walkman.
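The size ratio quoted above can be checked with simple arithmetic. The sketch below assumes the original master is CD-quality PCM (44.1 kHz, 16 bits, two channels) and the compressed file uses a constant 128 kbit/s; neither figure comes from the text, and both are chosen only as a plausible example.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is than the original."""
    return original_bps / compressed_bps


if __name__ == "__main__":
    cd_bps = pcm_bitrate_bps(44_100, 16, 2)   # assumed CD-quality master
    lossy_bps = 128_000                        # assumed lossy bit rate
    print(cd_bps, "bit/s uncompressed")
    print(f"ratio about {compression_ratio(cd_bps, lossy_bps):.1f} to 1")
```

Under these assumptions the ratio comes out at about 11 to 1, inside the 1/10 to 1/12 range mentioned above.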
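Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, these limitations are the high-frequency limit, the absolute threshold of hearing, temporal masking, and simultaneous masking.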
Given that the ear will not be at peak perceptive capacity when dealing with these limitations, a compression algorithm can assign a lower priority to sounds outside the range of human hearing. By carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds a listener is most likely to perceive are of the highest quality.
Music
Psychoacoustics includes topics and studies that are relevant to music psychology and music therapy. Theorists such as Benjamin Boretz consider some of the results of psychoacoustics to be meaningful only in a musical context.
Applied psychoacoustics
Figure: Psychoacoustics model.
Psychoacoustics is presently applied in many fields: in software development, where developers map proven and experimental mathematical patterns; in digital signal processing, where many audio compression codecs such as MP3 use a psychoacoustic model to increase compression ratios; in the design of (high-end) audio systems for accurate reproduction of music in theatres and homes; and in defense systems, where scientists have experimented with limited success in creating new acoustic weapons, which emit frequencies that may impair, harm, or kill.
It is also applied today within music, where musicians and artists continue to create new auditory experiences by masking unwanted frequencies of instruments, causing other frequencies to be enhanced. Yet another application is in the design of small or lower-quality loudspeakers, which use the phenomenon of missing fundamentals to give the effect of low-frequency bass notes that the system, due to frequency limitations, cannot actually reproduce (see references).
See also