Reverb: The Science And The State-of-the-Art
Guest Contribution by Anish Chandak
In the previous article by Ian Palmer on this month’s theme of Reverb, you have already seen various creative uses of reverb. Here are my two cents on the creative side: using reverb to synthesize physics-based sounds for percussion instruments.
In the rest of this article, we stick to the uncreative: the basics, the science behind reverb. We will also discuss the (academic) state-of-the-art, the research work being done at various academic institutes across the globe, focused on developing reverb techniques for interactive applications like video games. Hopefully, you will see many of these promising approaches used in video games released in the near and distant future. In the spirit of an academic discourse, this article is long and, in all certainty, incomplete in its discussion of both the fundamentals and the state-of-the-art.
Reverb, as commonly used within the gaming community, refers to the effect of geometry on the sound waves emitted by a sound source as they undergo reflection, diffraction, and scattering in a scene before reaching the listener (player). These effects are governed by physical principles (the wave equation), giving a space its characteristic reverb. The physical reverb may or may not be used within a game, depending on the extent of dramatization. However, here we talk about what a space is supposed to sound like physically. A discussion of artificial (non-physical) reverberation can be found in the article titled “Fifty Years of Artificial Reverberation”.
A reverb can be decomposed into three components: (a) direct sound, which reaches the listener along a direct path from the source; (b) early reflections, which reach the listener after a few reflections, within a short time of being emitted by the source (a few tens or hundreds of milliseconds); and (c) late reflections, the dense tail of sound arriving after many reflections.
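The decomposition above is usually done by arrival time. The sketch below partitions an impulse response into the three components; the 5 ms and 80 ms thresholds are illustrative assumptions for this example, not fixed standards.

```python
# Sketch: splitting an impulse response into direct sound, early reflections,
# and late reflections by time windows. The thresholds are assumptions.

SAMPLE_RATE = 44100  # samples per second

def split_impulse_response(ir, direct_ms=5.0, early_ms=80.0):
    """Partition an impulse response (a list of samples) into
    (direct, early, late) segments by arrival time."""
    direct_end = int(direct_ms / 1000.0 * SAMPLE_RATE)
    early_end = int(early_ms / 1000.0 * SAMPLE_RATE)
    return ir[:direct_end], ir[direct_end:early_end], ir[early_end:]

# Usage: a one-second impulse response split into the three components.
ir = [0.0] * SAMPLE_RATE
direct, early, late = split_impulse_response(ir)
```

In practice the boundary between early and late reflections is scene-dependent; many systems pick it where the reflection density becomes too high to track individual arrivals.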
Direct sound and early reflections help in localizing a sound source (see the precedence effect). For example, think of a stealth game, say the upcoming Thief 4, where the protagonist’s ability to locate an enemy around a corner, a few corridors away, or in shadows by hearing early reflections could add a new dimension to the game play. Similarly, as video games are increasingly being used for training, modeling direct sound and early reflections correctly could lead to better training outcomes.
Late reflections provide an immersive and emotional experience. Think about listening to an orchestra in a concert hall, or imagine immersing yourself in a cave scene. The late reflections modify the sound from a source and place it within the visual environment, giving a more immersive experience. Similarly, realistic changes in the late reflections as a player moves from one environment to another improve the immersive experience.
Once the reverb is computed, it can be encoded in an impulse response. An impulse response encodes the response of a linear time-invariant system (in this case, the reverberant space) to an impulse. Once an impulse response is computed, any audio played at a source within a given environment can be rendered at the listener by convolving the impulse response with the input audio signal. Hence the term convolution reverb, which is becoming increasingly popular within the game audio community.
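The convolution itself is simple to state. The sketch below shows the direct O(n·m) form just to make the math concrete; real convolution reverbs use FFT-based or partitioned convolution for speed.

```python
# Sketch of convolution reverb: the output at the listener is the discrete
# convolution of the dry input signal with the impulse response of the space.

def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# A toy impulse response: direct path plus one quieter reflection
# arriving three samples later.
ir = [1.0, 0.0, 0.0, 0.5]
dry = [1.0, 0.0, 0.0, 0.0, 0.0]
wet = convolve(dry, ir)  # the echo appears 3 samples after the direct sound
```

Each sample of the dry signal launches a scaled, delayed copy of the impulse response, which is exactly what a space does to an emitted sound.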
There are a few drawbacks of existing reverb techniques used in games.
Lack of variation in reverb (mainly late reflections)
Little or no computation of early reflections.
Lack of realistic base reverbs for a given environment.
Physically-based techniques can address the above drawbacks. Below are some of the challenges associated with using physically-based techniques within video games to compute reverb during the game play (at run-time).
Real-time computation of reverb is expensive and may not fit within the limited budget for audio-related computation in a video game.
Offline computation of reverb (or other similar information) is possible, but the pre-computed data may not fit within the memory budget of a game.
Dynamic scenarios within a game (like moving geometry) are challenging to handle for offline pre-computation based techniques.
The (Academic) State-of-the-Art
Promising research is being done in academia to address the challenges mentioned above. Below are some of these techniques. Discussing the pros and cons of the various techniques is beyond the scope of this article; only an overview of each is provided.
Aural Proxies and Directionally-Varying Reverberation
This technique provides variation in reverb (both early and late reflections) by using depth information computed by shooting rays from the listener. A reverb filter for the late reflections can be computed by estimating the mean free path of sound waves in the environment from a source or listener position. This approach approximates the mean free path from the depth information instead of performing the full mean free path computation, which could take a lot of compute time. The artist-specified reverb is adjusted based on the change in mean free path as the listener moves through the scene. Early reflections are computed by fitting a shoebox around the listener based on the depth information and computing early reflections within that shoebox. This technique takes only a few milliseconds at run-time and has been demonstrated to create varying reverb for both indoor and outdoor scenes. More details on this approach can be found here.
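The late-reflection adjustment can be sketched very simply. Below, a minimal sketch under the assumption that the mean free path is approximated as the average of the ray-hit distances and that the artist-specified decay time is scaled linearly with it; all names and numbers are illustrative, not the paper’s actual formulation.

```python
# Sketch: adjusting an artist-specified reverb decay time from ray depths.

def estimate_mean_free_path(depths):
    """Approximate the mean free path as the average of the distances
    returned by rays shot from the listener (a stand-in for the full
    mean-free-path computation)."""
    return sum(depths) / len(depths)

def adjusted_decay_time(base_rt60, base_mfp, current_depths):
    """Scale an artist-specified decay time by the ratio of the current
    mean-free-path estimate to the one in the authored reference space."""
    mfp = estimate_mean_free_path(current_depths)
    return base_rt60 * (mfp / base_mfp)

# The listener moves from the authored room (mean free path ~5 m)
# into a larger hall; the hypothetical ray-hit distances are in meters.
depths = [12.0, 9.5, 14.0, 11.0, 10.5]
rt60 = adjusted_decay_time(base_rt60=1.2, base_mfp=5.0, current_depths=depths)
```

The appeal of this scheme is that a handful of rays per frame is enough to make the authored reverb stretch and shrink plausibly with the surrounding space.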
Precomputed Wave Simulation for Real-Time Sound Propagation
This technique is based on wave simulation to create very accurate and realistic reverbs. The input scene is sampled for all possible source-listener positions, and full wave simulation (accurate even for engineering applications) is used to compute impulse responses for all these positions. As storing these impulse responses may require tens of gigabytes of memory, various techniques are used to reduce their size. These techniques compress early and late reflections differently and bring the memory requirements of the pre-computed data down to close to a gigabyte. The pre-computation may require a few hours, depending on the scene. At run-time, the appropriate impulse response is looked up and convolved with the input audio to create the final audio output. The run-time step takes only a few milliseconds. More details on this approach can be found here.
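The run-time half of this pipeline is essentially a table lookup. A minimal sketch, assuming a nearest-neighbour lookup over the sampled positions (the real systems also interpolate between samples and decompress the stored responses):

```python
# Sketch: run-time lookup of a precomputed impulse response by listener
# position. The table and positions below are illustrative.

def nearest_ir(listener_pos, ir_table):
    """ir_table maps sampled (x, y, z) positions to impulse responses;
    return the response stored at the nearest sampled position."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    key = min(ir_table, key=lambda p: dist2(p, listener_pos))
    return ir_table[key]

ir_table = {
    (0.0, 0.0, 0.0): [1.0, 0.3, 0.1],
    (5.0, 0.0, 0.0): [1.0, 0.5, 0.2],
}
ir = nearest_ir((4.2, 0.1, 0.0), ir_table)  # picks the (5, 0, 0) sample
```

All the expense lives in the offline simulation; the per-frame cost is just this lookup plus a convolution, which is why the run-time step fits in a few milliseconds.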
Pre-Computed Acoustic Radiance Transfer
In the past few years, a room acoustic rendering equation has been proposed, which can be used for reverb computation in video games. It is similar to the rendering equation in computer graphics, developed nearly three decades ago. Various techniques from computer graphics can be applied to solve the acoustic rendering equation. One such technique is based on pre-computing acoustic radiance transfer operators. In this technique, transfer operators between the surfaces of an environment are computed during a pre-processing step taking a few minutes. Storing the transfer operators could require tens of gigabytes of memory, but efficient compression (using the Karhunen-Loeve transform) reduces them to only a few megabytes. At run-time, the transfer operators are applied to compute an impulse response for a specified source and listener position, which is convolved with the input audio to create the final audio output. The run-time operation takes a few hundred milliseconds. More details on this approach can be found here.
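The reason compression helps at run-time as well as in storage can be seen with a toy example. A full n-by-n transfer operator costs O(n²) to store and apply; a rank-k factorization T ≈ U·V with k ≪ n, of the kind a Karhunen-Loeve style decomposition yields, costs O(n·k). The matrices below are illustrative, not data from the paper.

```python
# Sketch: applying a low-rank compressed transfer operator T ~ U @ V
# to a vector of surface radiance values.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def apply_compressed(U, V, radiance):
    """Apply T ~ U @ V as two cheap products (n*k each) instead of one
    expensive full-operator product (n*n)."""
    return matvec(U, matvec(V, radiance))

U = [[1.0], [0.5], [0.25]]   # n x k, with n = 3 surfaces and rank k = 1
V = [[0.2, 0.2, 0.2]]        # k x n
reflected = apply_compressed(U, V, [1.0, 1.0, 1.0])
```

With thousands of surface patches and a modest rank, the same trick is what turns tens of gigabytes of operators into a few megabytes while keeping the run-time application cheap.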
Pre-Computing Geometry-Based Reverberation Effects for Games
A popular technique for modeling early reflections is the image source method. However, using the image source method at run-time may require a few hundred seconds to model early reflections. This technique instead stores image sources and their position gradients for sampled source-listener locations during a pre-computation step. Depending on the size of the scene, the pre-computation may take hundreds of minutes but requires only a few hundred kilobytes of memory. At run-time, the source and listener positions are snapped to nearby sampled positions from the pre-computation step. The image sources, however, are extrapolated using the gradients computed during pre-computation to account for the correct source position. More details on this approach can be found here.
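The extrapolation step amounts to a first-order Taylor correction of each stored image-source position. A minimal sketch, with all values illustrative:

```python
# Sketch: correcting a precomputed image-source position when the actual
# source has moved away from the sampled source location.

def extrapolate_image_source(stored_pos, gradient, source_delta):
    """First-order correction of an image source's position, where
    gradient[i][j] = d(stored_pos[i]) / d(source_pos[j]) and source_delta
    is the source's displacement from the sampled location."""
    return [
        stored_pos[i] + sum(gradient[i][j] * source_delta[j] for j in range(3))
        for i in range(3)
    ]

# A source mirrored in a wall at x = 0: its image moves opposite in x,
# and with it in y and z.
grad = [[-1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
image = extrapolate_image_source([-2.0, 1.0, 0.0], grad, [0.5, 0.0, 0.0])
```

Because the gradients are precomputed, this correction costs almost nothing at run-time, which is what makes the hundreds-of-seconds image source computation usable in a game.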
Cell and Portal Based Approaches to Compute Reverberation for Games
Decomposing a game level into cells and portals is typical. A few techniques have been proposed based on cell-and-portal decomposition to compute reverb: reverberation graphs and the directional propagation cache. The basic idea behind these approaches is to compute reverb per cell and transport operators between portals during a pre-processing step. At run-time, the portals connecting a source and a listener are found, and the appropriate transport operators are applied to compute the reverb. These techniques are closest in spirit to the techniques being used in video games today.
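The run-time step of these approaches can be sketched as a graph search. Below, the cell adjacency graph, cell names, and transport operators (reduced here to scalar attenuation factors for illustration) are all assumptions for this example:

```python
# Sketch: finding the portal chain between the source's cell and the
# listener's cell and chaining per-portal transport factors.

from collections import deque

def portal_path(graph, src_cell, dst_cell):
    """Breadth-first search over the cell adjacency graph; graph maps
    each cell to {neighbour_cell: transport_factor}. Returns the list
    of portals crossed and the accumulated transport gain."""
    queue = deque([(src_cell, [], 1.0)])
    seen = {src_cell}
    while queue:
        cell, path, gain = queue.popleft()
        if cell == dst_cell:
            return path, gain
        for nxt, factor in graph[cell].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(cell, nxt)], gain * factor))
    return None, 0.0

graph = {
    "hall":     {"corridor": 0.7},
    "corridor": {"hall": 0.7, "vault": 0.5},
    "vault":    {"corridor": 0.5},
}
path, gain = portal_path(graph, "hall", "vault")  # two portals crossed
```

In the published techniques the per-portal quantities are full operators (directional and frequency-dependent) rather than scalars, but the traversal structure is the same.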
Wave-Based Sound Propagation in Large Open Scenes
Large outdoor scenes pose significant challenges for full wave simulation: creating reverb for such spaces may require hundreds of gigabytes of memory and hundreds of hours of pre-computation, even for a simple outdoor scene. This technique computes transfer operators for the various objects in an outdoor scene (e.g. houses, trees, vehicles) during a pre-computation step and uses compression techniques (based on the Equivalent Source Method) that reduce the memory requirement to only a few megabytes. The sound source in this scenario is fixed, and the pre-computation may still take a few hours. However, this allows the creation of realistic reverbs for outdoor scenes and takes only a few milliseconds at run-time to compute the impulse response for a moving listener. As with the other approaches, the impulse response is then convolved with the input audio to create the final output. More details on this approach can be found here.
In conclusion, a lot of exciting work is being done on developing efficient reverb techniques for video games, and the adoption of these techniques will enhance players’ gaming experience.
Anish Chandak is the CEO of an audio startup, Impulsonic. He received his Ph.D. in Computer Science from UNC-Chapel Hill and his B.Tech. in Computer Science from IIT Bombay, India. His research interests include sound rendering (synthesis, propagation, and 3D audio rendering) and computer graphics (ray tracing, visibility, and GPGPU).