Guest contribution by Owen Green
[This is the first of a two part series on loudness, mixing, metering and the new ITU loudness spec]
Understanding the complex relationship between sound level and perceived loudness turns out to be very important to us as designers of sound for a number of reasons:
- Our perception of loudness is not constant (or even remotely linear) at different frequencies, so it is possible to have high level signals that nonetheless sound weak (and vice versa) depending on their frequency content.
- The levels at which we monitor in the studio have an impact on how we hear our work, and consequently on how the work translates to different spaces and systems, because our relative perception of frequency is not constant across different levels.
- Our judgement of frequency balance and loudness is not constant with time. This is particularly true if we tire our ears out with working – it becomes harder to make reasonable decisions.
- To get the best out of our equipment, we need to understand how it works and interconnects; this means knowing about the various different dB scales we will encounter, and how they should align.
- A number of sectors have recently adopted an ITU recommendation on a loudness-based (rather than level-based) form of metering and specification for broadcast. Others are investigating following suit, so it is quite likely that this will become standard and required practice.
Insofar as the ITU recommendation arose as an attempt to circumvent the sonic race to the bottom of the ‘loudness war’, this whole issue of how we relate the levels of our equipment, our perceptions of loudness and our working practices occurs at a complex intersection of technology, psychology, aesthetics, economics, politics, philosophy, etc…
Levels and dB
Sound pressure level, or dB SPL, is defined as
$\text{dB SPL} = 20 \log_{10}(p / p_0)$, where
$p_0$ is the pressure that represents the nominal threshold of human hearing (0.00002 pascals), and $p$ is our measured pressure.
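As a quick worked example of the dB SPL formula, here is a minimal Python sketch. The function name `pressure_to_db_spl` is purely illustrative, and the input is assumed to be an RMS pressure in pascals.

```python
import math

P0 = 0.00002  # reference pressure in pascals (nominal threshold of hearing)

def pressure_to_db_spl(p):
    """Convert a sound pressure in pascals to dB SPL."""
    return 20 * math.log10(p / P0)

print(round(pressure_to_db_spl(P0), 1))   # the reference pressure itself sits at 0 dB SPL
print(round(pressure_to_db_spl(1.0), 1))  # 1 pascal comes out at roughly 94 dB SPL
```

Note that doubling the pressure adds about 6 dB, whatever the starting level.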
dBfs is the decibel scale for digital signal levels. In this case the reference quantity is the maximum sample value for a given word length (the number of bits being used: 16, 24, etc.), i.e. the largest number you can represent with that many binary bits. You can work this out simply by
$\text{max value} = 2^{\text{word length}} - 1$. So for 8 bits, 255; for 16 bits, 65535; and so on.
This means that the maximum possible ratio is 1 (i.e. max value / max value), and therefore the top of the scale is always 0 dBfs. (The log of 1, in any base, always equals zero, just as any number raised to the power 0 always equals 1.) The bottom of the scale is indicated by the dynamic range afforded by a particular word length, which can be worked out by
$\text{dynamic range} = 20 \log_{10}\left(2^{\text{word length}} \sqrt{3/2}\right) \approx (6.02 \times \text{word length}) + 1.76$ dB.
So the bottom of the digital scale is approximately -98 dB for 16 bits, and approximately -146 dB for 24 bits. In practice the noise floor will be above this, of course. (If you see -96 and -144 dB around, this is because the above is often approximated without the √(3/2), or simply by 6 times the number of bits). Be aware that things are slightly different for floating-point formats, such as we use in most DAWs (although the converters are still fixed-point 16 or 24 bit).
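The two formulas above are easy to check numerically. The following is a minimal sketch that follows the article's own convention of max value = 2^(word length) - 1; the function names are illustrative, not taken from any library.

```python
import math

def max_sample_value(word_length):
    """Largest value representable with the given number of bits."""
    return 2 ** word_length - 1

def dynamic_range_db(word_length):
    """Theoretical dynamic range in dB, including the sqrt(3/2) term."""
    return 20 * math.log10(2 ** word_length * math.sqrt(3 / 2))

def sample_to_dbfs(sample, word_length):
    """Level of a non-zero sample value relative to full scale."""
    return 20 * math.log10(abs(sample) / max_sample_value(word_length))

for bits in (8, 16, 24):
    print(bits, max_sample_value(bits), round(dynamic_range_db(bits), 2))

# A sample at the maximum value sits at the top of the scale (0 dBfs)
print(sample_to_dbfs(max_sample_value(16), 16))
```

For 16 and 24 bits the dynamic range comes out at roughly 98 dB and 146 dB, matching the figures quoted above.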
Finally, we need to meet the electrical decibel. Somewhat confusingly, there are (at least) three different references we might encounter. Most commonly, we come across two: dBu and dBV. Here, we are dealing with ratios of voltages to some reference. For dBu, this is 0.775 V RMS unloaded; for dBV, 1 V RMS.
High end audio gear is normally referenced to a standard line level of +4 dBu. Cheaper, ‘semi-pro’ and consumer equipment uses -10 dBV. Be aware that because these use different references, the difference between them is not 14 dB! When you work it out properly, it is closer to 12 dB.
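To see where the "closer to 12 dB" figure comes from, both line levels can be converted to volts and compared directly. This is a small sketch using the reference voltages given above (0.775 V and 1 V RMS); the function names are illustrative.

```python
import math

DBU_REF_VOLTS = 0.775  # reference for the +4 "pro" line level
DBV_REF_VOLTS = 1.0    # reference for the -10 "consumer" line level

def db_to_volts(level_db, ref_volts):
    """Convert a decibel level to RMS volts for a given reference voltage."""
    return ref_volts * 10 ** (level_db / 20)

def level_difference_db(v1, v2):
    """Difference in dB between two RMS voltages."""
    return 20 * math.log10(v1 / v2)

pro_volts = db_to_volts(4, DBU_REF_VOLTS)         # about 1.23 V RMS
consumer_volts = db_to_volts(-10, DBV_REF_VOLTS)  # about 0.316 V RMS

print(round(level_difference_db(pro_volts, consumer_volts), 1))  # 11.8
```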
So, we have three different ‘realms’ to consider in our signal chains, and correspondingly three different types of decibel:
digital (dBfs) ↔ electrical (dBu / dBV) ↔ acoustic (dB SPL)
Meters: Peak and Averaging
Our perception of loudness is not based so much upon instantaneous peak values in a signal, but more closely resembles the average energy over time.
The meters on a lot of our equipment, and in our DAWs (at least at the moment) tend to be peak meters (PPMs or similar); that is, they report peak values of the signal. However, it is important to realise that neither digital nor analogue PPMs are necessarily instantaneous. Analogue variants tend to be quasi-PPMs because they are still performing some (albeit very quick) averaging operation on the signal. Similarly, digital PPMs (obviously) aren’t updating at 22.5kHz and, moreover, may only report a clip if some number of successive samples exceeds 0 dBfs.
On some other gear (and available as plug-ins) we can get meters with a slower averaging process, most commonly the VU (Volume Unit) meter. These use a moving average of the signal over 300 milliseconds (which is still quite quick). Whilst these correlate more closely with subjective loudness than PPMs, the correspondence is still pretty approximate; the same deflection from a 30 Hz signal and a 1 kHz signal will have markedly different perceived volumes, because the meter is still a linear measure of (average) amplitude.
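The difference between peak and averaging readings can be illustrated with a small sketch. It compares the peak level of a 300 ms block with its RMS level; this is only a stand-in for an averaging meter, not a model of real VU or PPM ballistics. The 48 kHz sample rate, the test tone and the single-sample click are all assumptions made for the example, with full scale taken as 1.0.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate in Hz

def to_db(x):
    """Linear amplitude to dB relative to full scale (1.0)."""
    return 20 * math.log10(x) if x > 0 else float("-inf")

def peak_db(samples):
    """Peak reading: the largest absolute sample value in the block."""
    return to_db(max(abs(s) for s in samples))

def rms_db(samples):
    """Averaging reading: root-mean-square level of the block."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))

# 300 ms of a 1 kHz sine at half of full scale
n = SAMPLE_RATE * 300 // 1000
tone = [0.5 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(n)]

# The same tone with one full-scale sample added as a click
clicked = list(tone)
clicked[100] = 1.0

print("tone:   ", round(peak_db(tone), 1), round(rms_db(tone), 1))
print("clicked:", round(peak_db(clicked), 1), round(rms_db(clicked), 1))
```

The click lifts the peak reading by about 6 dB while the averaged reading barely moves, which is the behaviour described above: the slower measure follows energy over time rather than instantaneous peaks.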
You will notice also that all VU meters and most PPMs have a dB scale; ‘dB with reference to what?’ you may well ask. The answer is: it kinda varies; where we set the 0 dB point on a VU meter or PPM is not relative to a fixed physical quantity as with SPL or dBu, but is a matter of convention, and not everyone uses the same convention! What fun!
Reference Levels, Alignment and Monitoring Levels
Let's pause and review the story so far:
- We have a number of different dB scales for describing levels of phenomena (pressure, voltage, digital amplitude) relative to some reference quantity.
- We know that human hearing is significantly non-linear, and that our perception of loudness in particular is a complex function of sound pressure level, frequency content, time and other factors.
- We have a working practice that encompasses digital, electrical and acoustic realms, and a need to produce work that translates well between diverse variants of this basic system recipe.
- The most common instrumentation we are offered for monitoring signal levels in our equipment corresponds, at best, approximately to our perception of actual volume.
Moreover, we know from experience that our ears can be fooled; with fatigue we are liable to start making misjudgements and might (for instance) start to over emphasise high frequency content, whilst our auditory system is busy defensively damping our high frequency sensitivity; we might also (and often do) mistake something louder for something better and over-work our audio. A lot of this can be brought under control, at least to some extent, by being systematically rigorous in how we set our equipment up: how the levels of components relate to each other (alignment) and at what level we do our work (monitoring level).
Why is this important? It does, after all, seem kind of dull. Well, here’s the thing, and it’s so important it gets red letters:
The crucial skill in doing good work with sound is trusting your ears. However, there are so many interfering factors and uncertainties that, in order to stand any chance of this happening, we have to bring as many of those factors under control as we can.
So, first is the matter of how we line up our digital and electrical realms. The process is actually pretty simple. All you need to do is decide upon (sensible) reference levels for your electrical and digital equipment, and then make sure they line up. Likewise, given that there is (sometimes) some degree of choice over where the 0 point on meters might sit, this too needs to be lined up with your sensible reference level.
Fortunately, there are standard practices for this. Less fortunately, there are different standards depending on where / for whom you are working (and furthermore, not all gear is well designed enough to either behave in spec or allow calibration). In Europe, the EBU R68 recommendation uses a reference level of -18 dBfs = 0 dBu; in European and UK film post, 0 dB VU is often set to 0 dBu. In the US, SMPTE RP 155 uses a reference of -20 dBfs = +4 dBu = 0 dB VU. There are, however, other references in use.
The basic goal, however, is the same: to ensure that there is sufficient headroom throughout the chain. This, in turn, is based on some set of assumptions about the range of the material (the EBU spec, for instance, is predicated on having a peak level of -9 dBfs, thus allowing 9 dB of headroom before clipping) and about the clipping point of the analogue equipment relative to these levels. High quality equipment will operate comfortably with any of these conventions, whereas cheaper gear (where the manufacturer is more likely to have skimped on quality components and adequate supply voltages) could start to become nonlinear (i.e. distort) before these limits are reached. The solution in this case (short of buying some better equipment) is to move the reference down (e.g. 0 dBu becomes -8 dBVU), albeit whilst grumbling.
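The arithmetic behind alignment and headroom can be sketched as follows. The two alignments are the ones quoted above; the +20 dBu analogue clip point is a hypothetical figure chosen only to illustrate the "cheaper gear" case, and the function name is illustrative.

```python
def dbfs_to_dbu(level_dbfs, ref_dbfs, ref_dbu):
    """Map a digital level to its analogue equivalent under a given alignment."""
    return level_dbfs - ref_dbfs + ref_dbu

# (digital reference in dBfs, electrical reference in dBu), as quoted in the text
ALIGNMENTS = {
    "EBU R68": (-18.0, 0.0),
    "SMPTE": (-20.0, 4.0),
}

# Hypothetical clip point for a piece of budget analogue gear (an assumption)
ANALOGUE_CLIP_DBU = 20.0

for name, (ref_dbfs, ref_dbu) in ALIGNMENTS.items():
    full_scale_dbu = dbfs_to_dbu(0.0, ref_dbfs, ref_dbu)
    margin = ANALOGUE_CLIP_DBU - full_scale_dbu
    print(f"{name}: 0 dBfs = {full_scale_dbu:+.0f} dBu, "
          f"margin to analogue clip = {margin:+.0f} dB")
```

With these assumed numbers, digital full scale lands at +18 dBu under the EBU alignment and +24 dBu under the SMPTE one, so gear that clips at +20 dBu would distort before the converters do in the second case. That is the situation in which moving the reference down makes sense.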
Ok, so we’ve lined up our digital and analogue levels (or it’s been done for us). Now what? We calibrate our amplifiers such that our reference levels are associated with some reference SPL.
Now, this is murkier territory than the alignment of our dBfs and dBu levels, and it is worth explaining both why the commonly used numbers are chosen and why there will necessarily be some variation. The numbers you see most commonly are in the range 79-85 dB SPL for a -20 dBfs reference (which is quite loud if you’re listening to it constantly). The broad reasoning is that it is in this region that the ear tends to exhibit the least spectacular frequency variation, and it is loud enough that loud passages have impact, without being dangerously loud for prolonged exposure. The variation comes about in part due to individual difference and preference, and also because the upper end of this range feels generally too loud in smaller spaces, particularly those without acoustic treatment.
Setting this up is pretty easy. You need some pink noise at -20 dBfs, and a sound level meter (set to C weighting, slow). Put the meter where your head would be at the sweet spot and measure the level coming from each speaker on its own; adjust to make each speaker equal. There are obviously more involved extensions to this, including placing and tuning sub-woofers, looking at level at different frequencies, adding acoustic treatment etc.
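The per-speaker adjustment in that procedure is simple arithmetic. In this sketch the 83 dB SPL target is just one value from the range discussed above, and the meter readings are made-up numbers for illustration.

```python
TARGET_SPL = 83.0  # assumed target in dB SPL (C-weighted, slow) for -20 dBfs pink noise

def trim_db(measured_spl, target_spl=TARGET_SPL):
    """Gain change in dB needed to bring a speaker to the target level."""
    return target_spl - measured_spl

# Hypothetical meter readings taken at the listening position, one speaker at a time
readings = {"left": 85.5, "right": 81.0}

for speaker, spl in readings.items():
    print(f"{speaker}: adjust by {trim_db(spl):+.1f} dB")
```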
The mastering engineer Bob Katz made a proposal a few years ago that is, in some ways, a precursor to the new ITU recommendation (described in the next part of this two part series). Katz suggested that at lower monitoring levels than the ~83 dB SPL region, the inclination of the engineer is to make it louder, and that this translates to greater use of dynamic range compression (so as to lower the peaks and enable raising the average level). In an attempt to resist the tendency of the loudness war to produce recordings with less and less dynamic range, and more and more distortion, Katz suggested that one could set fixed attenuation points of one’s master volume in relation to a 0 dB point set at -20 dBfs = 83 dB SPL. Leaving the volume at 0 dB would work for the most dynamic material (similar to established practice in the film industry), and by turning it down by 6-8 dB people could work on more dynamically constrained material.
The K-System made no particular recommendation about metering, beyond Katz’s suggestion that a move to slower meters may be beneficial and that the system was amenable to more sophisticated loudness models than RMS (such as A-, B- or C-weighted measures, LAeq or Zwicker’s). However, despite the emphasis on trying to reclaim dynamics and on coupling reference level to monitoring level, there was nothing really to stop people using the K-system as a slightly more sophisticated way of targeting RMS levels with their masters.
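To give a sense of what a frequency-weighted measure does, here is a sketch of the A-weighting curve, one of the weightings mentioned above. The formula and constants are the commonly published analytic form of the standard A-weighting curve, not something taken from this article, and the function name is illustrative.

```python
import math

def a_weighting_db(f):
    """Approximate A-weighting gain in dB at frequency f (in Hz)."""
    f2 = f * f
    numerator = 12194.0 ** 2 * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalises the curve to 0 dB at 1 kHz
    return 20 * math.log10(numerator / denominator) + 2.00

for freq in (30, 100, 1000, 4000):
    print(freq, round(a_weighting_db(freq), 1))
```

The curve is close to 0 dB at 1 kHz and attenuates a 30 Hz tone by around 40 dB, a crude acknowledgement that equal physical levels at those two frequencies do not sound equally loud.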
The measures mentioned above illustrate some different attempts to take account of the complexity of how we hear in relation to linear measurements of physical phenomena, albeit with different levels of sophistication and with different goals in mind. It is worthwhile to emphasise just how complex and partially understood our hearing is, even for such a fundamental and basic aspect as the perceived loudness of a sound.
Whilst there is a reasonable level of understanding about what happens in the outer, middle and inner parts of the ear (‘the periphery’), much of what goes on in the brain remains poorly understood. There are a couple of things worth noting:
- In the inner ear, sound is decomposed into different frequency bands. However, these bands are not linear in frequency (nor even, it seems, linear in pitch), but form a set of non-uniform, highly overlapping zones referred to as critical bands.
- Sounds within the same critical band will give rise to a number of psychoacoustic phenomena as they interfere with each other. For example, we can get a sense of dissonance or ‘roughness’ from tones that are close together. Within these channels, there is also a mechanism that acts like a dynamic range compressor, so that the change in perceived loudness will be different for two sounds in the same band than it would be for two sounds in distant bands.
- Sounds in neighbouring bands of different levels can obscure each other, something called masking. This forms part of the basis for lossy compression formats like MP3 and AAC.
- Once sounds leave the periphery and enter the central nervous system all kinds of quite radical information reduction takes place. Moreover, there are feedback channels to the periphery that seem to affect, at a physical level, how the ear behaves. Remarkably, it seems that our sense of expectation plays a role in how this information reduction takes place, which is important both for our ability to focus on particular sounds in complex auditory scenes, and gives an indication of how tired or confused ears can be fooled into making strange mixing decisions!
So, the ear is complex and adaptive. Meanwhile, the loudness war has given rise to a number of practical problems in almost all fields of audio distribution: in music, recordings are produced with ever-dwindling dynamic range and ever greater distortion; in broadcast, viewers are subjected to radical level jumps between segments; and so on.
[Part 2 will be about the new ITU loudness standard. Watch out for it next week!]
Owen Green is a sonic artist and designer based in Edinburgh.