There’s been more virtual reality (VR) content made in the past year than in the last twenty combined, thanks to the emergence of the Oculus Rift, Sony’s Project Morpheus and other such devices. There’s lots of innovation happening on the visual front, including new methods of gameplay, narrative structure and visual design. The obvious question: what’s happening on the audio front?
There are discussions about audio for VR across the Internet, but most of them are about the technology behind binaural/3D positional audio. There is also plenty of academic research on auditory interfaces spanning the past couple of decades; a search on Google Scholar will turn up lots of material worth reading. [This post is focussed on first-person, game-like environments, where audio-visual realism and synchronisation are necessary.]
Over the past two and a half years I have been involved with Two Big Ears, where we’ve been developing 3Dception, a very efficient, easy-to-use, real-time binaural audio engine that works everywhere (you can head to the website to watch and download demos). During this period I’ve had the opportunity to design sound for about fourteen augmented and virtual reality projects, including games, interfaces for the visually impaired and audio-led tourism apps. My experience so far, especially when working with binaural audio, has shown that some of the ‘tricks’ we take for granted in non-VR applications don’t work as well. This article is a summary of a few things that I’ve learnt, as a designer, when dealing with such technologies.
This article is by no means exhaustive. My hope is that it can be expanded as more sound designers experiment in this area. I’ve also made a copy of this article on a wiki, which I hope to update as I continue work in this area (it is on a wiki to facilitate community contribution!). I’m also currently working on a short playable game that demonstrates some of what is discussed in this post, and I will post a link here when it is done.
Experimentation is key with emerging technologies and new mediums. If there are no ‘recipes’, it would be great to get in early and begin defining some of them. Fail early, fail quickly, iterate.
2D Or 3D Audio
Most game and game audio engines allow you to choose between 2D and 3D sound sources. 2D sounds do not respond to positional parameters from the game world, i.e., they do not automatically pan depending on the position of the sound source relative to the player. 3D sounds, on the other hand, automatically pan and change in intensity depending on their relative position and distance from the player. Most sound engines have an amplitude-based stereo panning algorithm that takes care of the positioning. Some engines add a low-pass filter to the source if it is behind the listener to accentuate the effect. This has worked quite well for the past few decades, especially with visual information to support it, but in VR the experience can be a bit limiting.
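To make the amplitude-based approach concrete, here is a minimal sketch of a constant-power stereo panner with a simple inverse-distance roll-off. It is not taken from any particular engine: the function name, the coordinate convention (x to the right, z forward, yaw turning to the right) and the `min_distance` parameter are all assumptions made for illustration.

```python
import math


def stereo_pan_gains(source_xz, listener_xz, listener_yaw, min_distance=1.0):
    """Return (left_gain, right_gain) for a source on the horizontal plane.

    Hypothetical helper: x points right, z points forward, and
    listener_yaw is in radians, positive when turning to the right.
    """
    dx = source_xz[0] - listener_xz[0]
    dz = source_xz[1] - listener_xz[1]
    distance = math.hypot(dx, dz)

    # Angle of the source relative to where the listener is facing.
    azimuth = math.atan2(dx, dz) - listener_yaw

    # Collapse the angle to a pan position in [-1, 1]. Front and back
    # map to the same value, so this panner cannot tell them apart.
    pan = math.sin(azimuth)

    # Constant-power pan law: the two gains always sum to unit power.
    theta = (pan + 1.0) * math.pi / 4.0

    # Inverse-distance roll-off, held at full level inside min_distance.
    attenuation = min_distance / max(distance, min_distance)

    return math.cos(theta) * attenuation, math.sin(theta) * attenuation
```

A source directly in front and one directly behind produce identical gains here, which is the ambiguity the low-pass filter trick tries to paper over, and there is no notion of height at all.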
First-person VR worlds are all about making the experience as believable as possible. This means that every aspect of the audio-visual experience must contribute to it. With VR and head tracking, the difference between binaural audio and traditional stereo panning is huge, especially when dealing with both the horizontal and vertical planes. It is quite a cool experience to see something fly over you and hear it move over your head too. Most VR experiences are personal and are usually experienced through headphones, making real-time binaural audio a perfect fit.
Mono Or Stereo?
Binaural engines and panners require mono audio sources, because the engine places each source at a single point in space and generates the left and right ear signals itself. What about stereo audio material? I prefer recording all sources in mid-side (MS) to give me options while designing. I’ve used the following strategies for pre-recorded and synthetic stereo material:
- Choose a single channel: Discard one of the channels, if possible. Sounds that are equally spread across the left and right channels (e.g. close-up stereo recordings) can benefit from this.
- Mixdown to mono: If both left and right channels include content that is critical, a mixdown of both channels might work better. This depends on the content, and attention must be paid to phasing issues.
- Two binaural sources: Depending on the size of the object in the visual world, you could create two audio sources and assign one stereo channel to each. This has rarely worked in my experience, except for one time when designing the sound of a river, which benefited from the sound being a bit diffused. I’ve generally found this technique not to be very effective, and it can also lead to phasing.
- A combination of binaural and stereo audio: More on this in the next section.
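The first two strategies above can be sketched as follows. This is only an illustration under my own assumptions: the function names and the correlation threshold are hypothetical, and the rule of falling back to a single channel when the two channels are out of phase is one simple way to avoid the cancellation that a naive sum would cause.

```python
import math


def correlation(left, right):
    """Normalised correlation: +1 fully in phase, -1 fully out of phase."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0


def to_mono(left, right, threshold=0.0):
    """Reduce a stereo pair to one mono signal for a binaural source.

    If the channels are anti-correlated (below the assumed threshold),
    summing would cancel content, so keep the left channel only.
    Otherwise return the average of both channels.
    """
    if correlation(left, right) < threshold:
        return list(left)
    return [0.5 * (l + r) for l, r in zip(left, right)]
```

In practice I would still audition the result rather than trust a single number, since correlation measured over a whole file can hide short passages that cancel badly.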
Sound Size And Diffusion
When working with mono binaural sources it can be difficult to control the diffusion of a sound. A diffuse or large sound object shouldn’t sound like it is being emitted from a single point in space. Examples include distant traffic, streams/rivers, large vehicles and so on. With stereo panning techniques this is quite easy to achieve. With binaural sources, changing the minimum distance value for distance roll-offs can help. There are more complex methods for achieving this, but I prefer simply mixing the binaural sound source with a stereo sound source. The mix ratio between the two sets the amount of diffusion, while still letting the listener localise the sound. This obviously costs some localisation quality, but diffuse sounds are difficult to localise to a single point in space anyway!
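As a rough sketch of the mix-ratio idea, the snippet below blends the two-channel output of a binaural renderer with a plain stereo version of the same sound. The `diffusion` parameter and the equal-power crossfade are my own assumptions for illustration; a linear blend, or leaving the binaural source at full level and only adding the stereo layer, may suit some material better.

```python
import math


def mix_diffuse(binaural_lr, stereo_lr, diffusion):
    """Blend binaural and stereo frames, each a list of (left, right) pairs.

    diffusion = 0.0 keeps only the binaural (point-like) source,
    diffusion = 1.0 keeps only the stereo (diffuse) layer.
    """
    d = min(max(diffusion, 0.0), 1.0)

    # Equal-power crossfade so perceived loudness stays roughly constant.
    g_binaural = math.cos(d * math.pi / 2.0)
    g_stereo = math.sin(d * math.pi / 2.0)

    return [
        (g_binaural * bl + g_stereo * sl, g_binaural * br + g_stereo * sr)
        for (bl, br), (sl, sr) in zip(binaural_lr, stereo_lr)
    ]
```

Tying `diffusion` to the distance or the visual size of the object is one way to make a river sound wide up close and more point-like from far away.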
Early Reflections And Reverb
Early reflections play a very important role in understanding the size, dimensions and type of environment the sound source and listener are in. Traditional stereo reverbs include early reflections that are panned across the stereo (or surround) stage. One of the features we put high on our list for 3Dception was binaurally spatialised early reflections and a full room modelling system to approximate as much of reality as possible. The mix ratio of early reflections, reverberation and direct sound makes a huge difference to the localisation quality. All three work together to provide the right cues to the brain and help it willingly suspend disbelief.
One of the tricks commonly used in audio is to use reverberation as a ‘glue’ to help blend various elements of the soundtrack together while still communicating information about the space. A traditional reverb with stereo early reflections can decrease the localisation quality of a binaural sound source, because the reflections and reverberation sound like they originate from inside the listener’s head while the binaural sound source is properly externalised. This can cause confusion and degrade the quality of spatialisation. Binaural audio actually demands greater use of techniques from game audio such as reactive reverberation zones, detailed sculpting of roll-off curves and HDR mixing. The mix becomes all the more important when dealing with full 3D positional panning. Less is more.
Pre-delay values become all the more important too. I often delay the reverb slightly (using the pre-delay) to increase the quality of spatialisation, while still getting the late reverberation to act as a ‘glue’. The room modelling system in 3Dception makes it quite easy to get the best of both worlds — good sounding late reverberation and a room modelling system with early reflections that properly match the space in the VR/game environment.
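The pre-delay trick can be sketched in a few lines. This assumes the reverb return is already available as a list of samples; the function names, the default sample rate and the wet gain are hypothetical and are not 3Dception parameters.

```python
def apply_pre_delay(wet, pre_delay_ms, sample_rate=48000):
    """Push the reverb return later in time by pre_delay_ms milliseconds."""
    delay_samples = int(round(pre_delay_ms * sample_rate / 1000.0))
    return [0.0] * delay_samples + list(wet)


def mix_dry_wet(dry, wet, wet_gain=0.3):
    """Sum the direct signal with the (possibly delayed) reverb return."""
    length = max(len(dry), len(wet))
    # Pad the shorter signal with silence so both have the same length.
    dry = list(dry) + [0.0] * (length - len(dry))
    wet = list(wet) + [0.0] * (length - len(wet))
    return [d + wet_gain * w for d, w in zip(dry, wet)]
```

The gap leaves the direct sound and its spatial cues on their own for a moment before the reverb arrives; how long that gap should be is a matter of taste and of the space being simulated.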
Reactive And Procedural Content
There’s been increasing use of procedural techniques, or a combination of procedural and more traditional techniques, over the past few years. I would argue for a greater push for such reactive techniques with VR. While fully procedural content would be great, it doesn’t even need to go that far. Opening up more parameters of the audio engine and tying them to visual components can go a long way. Slight changes in pitch or carefully dialled-in doppler values can create a very convincing experience. Thinking of every audio asset as a multi-layered reactive element instead of just a static audio file can help change the design and implementation mindset. Synchronising visual elements with audio parameters is very important and can help seal the experience.
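As one example of tying an audio parameter to the game world, here is a sketch of a doppler pitch ratio for a moving source and a stationary listener. The `doppler_level` scaling factor is an assumption standing in for the ‘dialled-in’ control mentioned above, the clamp is an arbitrary safety limit, and listener velocity is ignored to keep the example short.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C


def doppler_pitch_ratio(source_pos, source_vel, listener_pos, doppler_level=1.0):
    """Return the playback pitch multiplier for a moving source.

    Positions are in metres and velocity in metres per second, as 3D
    sequences. doppler_level = 0.0 disables the effect, 1.0 is physical.
    """
    offset = [s - l for s, l in zip(source_pos, listener_pos)]
    distance = math.sqrt(sum(o * o for o in offset))
    if distance == 0.0:
        return 1.0

    # Component of the velocity along the listener-to-source line:
    # positive when the source is moving away from the listener.
    radial = sum(v * o for v, o in zip(source_vel, offset)) / distance
    radial *= doppler_level

    # Clamp so very fast approaching sources cannot blow up the ratio.
    radial = max(radial, -0.9 * SPEED_OF_SOUND)

    return SPEED_OF_SOUND / (SPEED_OF_SOUND + radial)
```

A source approaching at 34.3 m/s comes out at a ratio of about 1.11, and one receding at the same speed at about 0.91. Exaggerating or softening `doppler_level` per object is often more useful than strict realism.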
With motion and head tracking, could foley evolve beyond footsteps and basic movement sounds? Can the game logic that triggers these sounds evolve beyond player controller input? For example, could close-up binaural sources be used to simulate the sounds of a diving suit? What about head movements of a robot? Detailed cloth movements for when the player spins? Could foley be used to give the player a better sense of their presence in the virtual world? There’s only one way to find out: experimentation until it feels right. The interfacing of subtle design like foley with amazing technology is an area I hope to document more as I explore the world of VR and 3D audio.
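To give one of these questions a concrete shape, here is a purely speculative sketch of driving a cloth-movement layer from head tracking instead of controller input. Everything in it is hypothetical: the function names, the thresholds and the idea of using yaw speed alone are assumptions to experiment with, not an established technique.

```python
import math


def yaw_speed(prev_yaw, yaw, dt):
    """Head turn speed in radians per second between two tracker samples.

    Takes the shortest way around the circle so that a wrap from +pi
    to -pi is not mistaken for a very fast spin.
    """
    delta = (yaw - prev_yaw + math.pi) % (2.0 * math.pi) - math.pi
    return abs(delta) / dt


def cloth_gain(speed, threshold=1.0, full_speed=4.0):
    """Map turn speed to the gain of a cloth-rustle layer, from 0.0 to 1.0.

    Below the threshold the layer is silent; at full_speed or faster
    it plays at full level, with a linear ramp in between.
    """
    if speed <= threshold:
        return 0.0
    return min((speed - threshold) / (full_speed - threshold), 1.0)
```

Feeding `cloth_gain(yaw_speed(...))` into the volume of a looping close-up cloth recording each frame would be a quick way to test whether this kind of foley adds to the sense of presence or just becomes distracting.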
The sound design community, including the game audio community, is amazing when it comes to sharing knowledge, and this website is a testament to that. If you are working with VR, I’d love to hear about your thoughts and design process. If you have something that you’d like to share as an article, contact us about guest contributions. I’ll continue documenting my thoughts on the wiki and through future blog posts.