Finding Your Way With High Dynamic Range Audio In Wwise
Guest Contribution by: Louis-Xavier Buffoni – Software engineer at Audiokinetic
HDR in a Nutshell
HDR (“High Dynamic Range”) audio is a technique that draws its inspiration from the local adaptation method used in HDR imaging, which “attempts to maintain local contrast, while decreasing global contrast.” In audio, this local/global dichotomy applies to time, and contrast refers to loudness instead of brightness. The technique consists of using an automatic mixing system that maps virtual world loudness to living room loudness. Clerwall’s phrase “every sound is important, but not at the same time” summarizes the essence of its algorithm: the mapping is adaptive to what is playing in the virtual world, and can be represented by a “sliding window”, as illustrated in the following figure.
HDR audio has received a lot of attention since it was presented by DICE a few years ago, backed up by their astoundingly good-sounding games Battlefield: Bad Company and Battlefield 3. It left in many minds the impression that their system had solved the complex problem of mixing in an interactive context.
The HDR Mixing System versus Dynamic Range Compression
The HDR system affects your mix by making soft sounds inaudible when loud sounds play, and making them audible again when they play alone. The levels of sounds relative to one another in the HDR world are preserved, which creates the illusion of a greater dynamic range and is particularly effective with impact sounds, while in fact the sounds are compressed within the output device’s lower dynamic range.
This mechanism is, apart from a few implementation details, numerically equivalent to audio limiting. A limiter estimates the input signal’s level, computes a gain based on this value, and applies it to the signal. When the input signal goes above a given threshold, the limiter scales it down to the desired output value, and consequently, the level of all sounds of the mixture drops by the same amount. The sliding HDR window is a representation of this process, and it is adequate because decibels are a logarithmic measure of amplitude; on a linear scale, the window would undergo homothetic transformations instead.
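The sliding window can be sketched in a few lines of Python. This is only an illustration of the limiter analogy, not the actual implementation of any HDR system: the function names are hypothetical, and taking the loudest voice as the level estimate is an assumption made to keep the example short.

```python
def hdr_window_offset(voice_levels_db, threshold_db):
    """Gain in dB (always <= 0) that a limiter-style HDR stage applies to every voice.

    Assumption: the input level is estimated as the loudest voice currently playing.
    """
    loudest = max(voice_levels_db, default=threshold_db)
    # Below the threshold nothing happens; above it, the window top follows
    # the loudest voice and everything slides down by the excess.
    return min(0.0, threshold_db - loudest)


def apply_hdr(voice_levels_db, threshold_db):
    """Slide all voices down by the same number of decibels."""
    offset = hdr_window_offset(voice_levels_db, threshold_db)
    return [level + offset for level in voice_levels_db]


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# A 100 dB explosion over a 70 dB ambience, with the threshold at 90 dB.
print(apply_hdr([100.0, 70.0], 90.0))  # [90.0, 60.0]
# The ambience alone is below the threshold and is left untouched.
print(apply_hdr([70.0], 90.0))         # [70.0]
```

In decibels, the window slides by a constant offset. On a linear scale, the same operation multiplies every amplitude by one common factor, `db_to_linear(offset)`, which is the homothetic transformation mentioned above.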
Now, anyone who has ever mixed/mastered a song or a game knows that fixing dynamics is all about tradeoffs. There is no good and definite answer to how to set up audio compressors. For example, with an infinite compression ratio such as what is used in a limiter, any two sounds above threshold will come out of the system at exactly the same volume as long as they are not playing at the same time! In other words, HDR provides you with infinite dynamic range in your virtual world, but does so at the expense of good old “real” dynamic range for your output device. You might find a better compromise by using a finite ratio instead.
Likewise, ballistics (attack/release) may induce artifacts, such as coloration, usually due to the window moving too fast, or pumping, resulting from “too slow or too fast…or too, um, medium” attack and release settings.
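To make the ratio tradeoff concrete, here is a minimal sketch of a textbook hard-knee compressor curve applied to a sound playing alone. The function name and the example levels are hypothetical, and a real HDR system may shape its curve differently.

```python
def compressed_level_db(level_db, threshold_db, ratio):
    """Static hard-knee compressor curve, in dB.

    Above the threshold, each `ratio` dB of input yields 1 dB of output.
    An infinite ratio (a limiter) pins the output at the threshold.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


limiter = float("inf")

# Two sounds played one after the other, with the threshold at 90 dB.
for level in (100.0, 120.0):
    print(level,
          compressed_level_db(level, 90.0, limiter),
          compressed_level_db(level, 90.0, 4.0))
# 100.0 90.0 92.5
# 120.0 90.0 97.5
```

With the infinite ratio, both sounds come out at 90 dB and the 20 dB difference between them is lost. With a 4:1 ratio, a 5 dB difference survives at the output, at the cost of a higher peak level.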
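Ballistics can be pictured as a smoothing filter on the position of the window top. The sketch below uses a generic one-pole (exponential) smoother with separate attack and release times. It is an assumption chosen for illustration, not the curve used by any particular HDR system, and all names are hypothetical.

```python
import math


def smooth_window_top(targets_db, attack_s, release_s, step_s):
    """Smooth a sequence of target window-top positions (dB), one value per step.

    attack_s and release_s are time constants in seconds; 0 means instantaneous.
    """
    if not targets_db:
        return []

    def coeff(time_s):
        # Fraction of the remaining distance that is kept after one step.
        return 0.0 if time_s <= 0 else math.exp(-step_s / time_s)

    attack, release = coeff(attack_s), coeff(release_s)
    top = targets_db[0]
    smoothed = []
    for target in targets_db:
        # Rising targets use the attack time, falling targets the release time.
        c = attack if target > top else release
        top = target + c * (top - target)
        smoothed.append(top)
    return smoothed


# A loud event raises the target to 110 dB for one step, then it falls back to 90 dB.
targets = [90.0, 110.0, 90.0, 90.0, 90.0]
print(smooth_window_top(targets, attack_s=0.0, release_s=1.0, step_s=0.1))
```

With an instantaneous attack and a one-second release, the window jumps up with the loud event and then drifts back down over the following steps. A release that is too long keeps quieter sounds ducked well after the event, while one that is too short makes the level changes audible.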
HDR Audio Workflow
The core of HDR audio thus consists of some kind of dynamic range compressor. According to DICE, the technique also involves a drastically new workflow in which designers tag assets with loudness values that span the entire range of hearing, expressed in dB SPL*, and rely entirely on the above-mentioned system to correctly scale them back to device-friendly levels. In this workflow, loudness values should be seen as a measure of priority.
In practice, users of HDR systems often report that they actually need to tweak the dB SPL values so much that they are nowhere close to the “real life” original value. This is quite understandable: as we have just seen, dynamic range compression involves tradeoffs, and yet the whole mix depends on it. It is a bit like trusting the HDR image processor to get good pictures: it does not necessarily give good or realistic results (type “bad HDR images” in Google and see for yourself). When we think about it, tweaking dB SPL values is similar to tweaking levels on a traditional mixer.
However, you do not need to flip the mixing paradigm upside down in order to benefit from the use of the HDR compressor. You can instead start with a traditional mixing approach, and then push the volume of a few sounds above the HDR threshold, little by little. Perhaps only a handful of sounds should trigger HDR compression in your game.
Behavioral Dynamics in Wwise
The HDR system made available in Wwise v.2013.1 is implemented as a dynamic range compressor acting on the logical volume of individual voices. In other words, it can be described as a behavioral dynamics effect. Working with sound metadata instead of an audio mixdown gives us the opportunity, among other things, to perform a few tricks that help reduce artifacts that are typical of compression.
On the other hand, the system is unaware of the actual amplitude of the sounds, and therefore sees them as black boxes with constant volume throughout their duration. Most of the time, the amplitude of a sound varies. Imagine an impact sound with a transient and a decaying part. If this sound is loud enough to fix the position of the HDR window, this position will remain unnaturally constant during the whole duration of the sound. See the resulting effect in the following figure, showing snapshots of the Wwise Voice Monitor. The colored lines represent logical volumes of each voice with their amplitude envelope. The image on the left shows the volumes at the input of the HDR system (high-dynamic range voices, sliding window in blue), the image on the right shows them at the output (low-dynamic range, fixed window position), and below is the corresponding output wave.
Notice the lull after the impact sound, caused by the HDR system interpreting it as being constant. To solve this, it is possible to let the HDR system peek into the black box by enabling envelope tracking. The HDR system uses it to move the window appropriately. Compare the resulting effects in the following figure.
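The difference can be sketched numerically. The code below is a simplified model, not Wwise code: the function name is hypothetical, the envelope is assumed to be a list of per-frame offsets in dB relative to the voice's logical volume, and the window is assumed to behave like a limiter with no release time.

```python
def ambience_output_db(impact_db, envelope_db, ambience_db, threshold_db, track_envelope):
    """Per-frame output level (dB) of a steady ambience playing under an impact sound.

    envelope_db holds the impact's amplitude envelope as per-frame dB offsets (<= 0).
    """
    output = []
    for env in envelope_db:
        # Without envelope tracking, the impact is a black box at constant volume.
        impact_now = impact_db + env if track_envelope else impact_db
        window_top = max(threshold_db, impact_now, ambience_db)
        # Every voice is lowered by the amount the window top exceeds the threshold.
        output.append(ambience_db - (window_top - threshold_db))
    return output


decay = [0, -10, -20, -30]  # transient followed by a decaying tail

print(ambience_output_db(110, decay, 80, 90, track_envelope=False))  # [60, 60, 60, 60]
print(ambience_output_db(110, decay, 80, 90, track_envelope=True))   # [60, 70, 80, 80]
```

Without tracking, the ambience stays 20 dB down for the whole duration of the impact, which is the lull described above. With tracking, it comes back up as the impact decays.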
Uses of Behavioral Dynamics Processing
HDR, as a dynamic range compressor, has many benefits, and one of them is to add clarity to a mix. In the following example, a loud machine gun sound is fired repeatedly. It takes up all the room in the mix, so nothing else needs to remain audible alongside it. The figure shows that its presence effectively ducks down the volume of other presumably ambient or less important sounds. This helps us focus on what is important at this point in the game by removing unnecessary noise. This can also be accomplished using side-chaining, because one sound is clearly more important than the other with regard to gameplay.
Another potentially beneficial aspect of HDR is that it can help you control loudness between various elements of the audio scene. For example, say you have two distinct sounds constituting an ambience, and the amplitude of one of them varies greatly. Whenever a sound goes above the HDR threshold, a portion of its energy is spent ducking down the other ambient sound instead of just piling up. The relative level between them is maintained, so perceptually the louder sound will effectively be louder than the other, but the overall level of the ambience mixture remains around the desired target level. This way, ambiences remain ambiences and do not interfere with other elements of the scene. HDR achieves this task more elegantly than side-chaining because the relative importance of both sounds is not known a priori. The system figures it out based on their respective instantaneous loudness.
As we have just seen, a sound’s envelope has a subtractive effect on other sounds. One of the most exciting features of HDR in Wwise is the ability to edit envelopes manually, independently of the original audio material. Thus, you can handily edit the way a sound is going to carve its place into a mix! A simple but classic example would be to boost the envelope of an explosion sound before the bulk of the explosion’s energy actually occurs, as can be seen below. This will create a short period of silence just before the big blast.
Subjective mixing and HDR audio are often viewed as antagonists. On the contrary, the HDR feature set gives you additional tools that, to paraphrase Rob Bridgett, can help you design the dynamics at the beginning of the chain. At the other end of the spectrum, a game that is entirely unpredictable in terms of events might need to rely further on the system’s control of dynamic range to obtain a mix that sounds plausible in any situation. It is up to you to find where you stand within this spectrum, depending on the nature of your game.
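As a rough illustration of the explosion trick, the sketch below compares how much other voices are ducked with an envelope that follows the audio and with a hand-edited one. The function name, the envelope values and the limiter-like behavior are all assumptions made for the example.

```python
def duck_amount_db(sound_db, envelope_db, threshold_db):
    """Per-frame attenuation (dB, >= 0) applied to every other voice.

    envelope_db holds per-frame dB offsets relative to the sound's logical volume.
    """
    return [max(0, sound_db + env - threshold_db) for env in envelope_db]


# Hypothetical explosion: two frames of near silence, then the blast and its decay.
audio_envelope = [-60, -60, 0, -10]
# Hand-edited envelope: boosted one frame before the energy actually arrives.
edited_envelope = [-60, 0, 0, -10]

print(duck_amount_db(110, audio_envelope, 90))   # [0, 0, 20, 10]
print(duck_amount_db(110, edited_envelope, 90))  # [0, 20, 20, 10]
```

With the edited envelope, the rest of the mix is already pushed down by 20 dB in the frame before the blast, while the explosion itself is still quiet. That frame is the short silence that precedes the impact.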
* A dB SPL is a decibel value relative to a reference level that corresponds to the threshold of human hearing.
Cambridge in Colour. http://www.cambridgeincolour.com/tutorials/high-dynamic-range.htm
Anders Clerwall – How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOM – GDC 2009 – Slides 8 & 9. http://publications.dice.se
Stefan Strandberg & David Möllerstedt – Adaptive Mixing in Frostbite – GDC Austin 2007. http://publications.dice.se/publications.asp?show_category=yes&which_category=Audio
Alex Case – Sound FX, p. 160. Focal Press, 2007. Cited in http://en.wikipedia.org/wiki/Pumping_(audio)
Audiokinetic’s Wwise. https://www.audiokinetic.com
Rob Bridgett – Dynamic Range: The Symptom at the End of the Chain. http://gameaudiomix.com/
We’re always open to guest contributions…and sometimes we may even hunt people down for one (like this article). You may consider every theme announcement a solicitation to share your thoughts with the community. If you have an article you’d like to pitch, contact shaun [at] designingsound [dot] org.