Finally! With the advent of somewhat affordable VR systems and the first titles making their way into the public’s hands, we are hearing something from the general populace and other non-audio developers that we sound professionals have always known: Sound matters!
Since VR is an enclosed experience, with the user entirely enveloped visually in the world they are experiencing, sound plays an even more critical role in either enhancing or destroying the sense of immersion that comes with having an entire sense taken over. VR developers have taken some novel approaches to augmenting immersion in their games through various tricks and smart thinking. While many of these techniques are already in use in non-VR experiences, it is my belief that we can adapt some of the more novel concepts from VR to create a more immersive experience outside of VR as well.
The Known Knowns
As the initial developers began to share some of their earliest findings, a lot of what made sense and was necessary in VR was already common practice in non-VR 3D worlds. Take attaching sounds to individual joints of a character or object rather than having them play from the object root or its center point: in VR this isn’t just a matter of more accurate positioning, it becomes a matter of believability. In full 360-degree immersion, when things do not behave as our brain expects, the sense of immersion in the visual world completely breaks down.
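For illustration, here is a minimal sketch of what that joint attachment might look like in engine-agnostic C++. The SoundEmitter interface and the per-frame sync hook are hypothetical stand-ins for whatever your engine or middleware actually exposes:

```cpp
#include <string>
#include <unordered_map>

struct Vec3 { float x, y, z; };

// Hypothetical minimal emitter; real engines and middleware (Wwise,
// FMOD, Unity, Unreal) each expose their own equivalent.
struct SoundEmitter {
    void setPosition(const Vec3& p) { position = p; }
    Vec3 position{};
};

// One emitter per joint instead of a single emitter at the object root.
class CharacterAudio {
public:
    // joint name -> emitter for that joint's sounds (footsteps, cloth, etc.)
    std::unordered_map<std::string, SoundEmitter> jointEmitters;

    // Called once per frame with the animated skeleton's world-space
    // joint positions, so each sound tracks the limb that produces it.
    void syncToSkeleton(const std::unordered_map<std::string, Vec3>& jointPositions) {
        for (auto& [name, emitter] : jointEmitters) {
            if (auto it = jointPositions.find(name); it != jointPositions.end())
                emitter.setPosition(it->second);
        }
    }
}
;
```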
Similarly, placing ambience in the world as point-source emitters for better localization of sound is something 3D games have been doing for well over a decade. But in an enveloping visual world (with 3D audio, no less!), the practice becomes imperative for localizing sound sources in most cases.
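A sketch of the idea, with illustrative positions and loop names; the AmbienceEmitter type is a hypothetical stand-in for an engine's positional source component:

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical looping ambience voice, standing in for an engine's
// positional audio source component.
struct AmbienceEmitter {
    Vec3 position;
    const char* loop;   // e.g. "wind_through_window", "dripping_pipe"
    float maxDistance;  // beyond this the loop is silent or virtualized
};

// Instead of one flat stereo room tone, scatter point sources that the
// listener can localize as they move through the space.
std::vector<AmbienceEmitter> buildRoomAmbience() {
    return {
        { {  2.0f, 1.5f, -3.0f }, "wind_through_window", 12.0f },
        { { -4.0f, 0.2f,  1.0f }, "dripping_pipe",        8.0f },
        { {  0.0f, 3.0f,  5.0f }, "humming_vent",        10.0f },
    };
}
```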
Even the notion of “audio mip-mapping” (that is, designing sounds that become more detailed as the user gets closer to the source) becomes ever more critical in VR, because detail is another key to immersion. We often take a similar approach in non-VR titles, introducing additional layers or entirely different sounds as you approach a sound source, but everything in VR is put under a finer microscope and thus becomes all the more important to selling the experience.
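One way to sketch that distance-based layering is an equal-power crossfade between a close-up detail layer and a simpler distant layer. The distance thresholds here are illustrative, not values from any shipped title:

```cpp
#include <algorithm>
#include <cmath>

// Crossfade a close-up "detail" layer against a distant layer based on
// listener distance: one rough approximation of "audio mip-mapping."
struct MipLevels {
    float detailGain; // close-up layer (extra rattles, texture, etc.)
    float farGain;    // simpler distant layer
};

MipLevels mipGains(float distance, float nearDist = 2.0f, float farDist = 15.0f) {
    // t is 0 at nearDist or closer, 1 at farDist or further.
    float t = std::clamp((distance - nearDist) / (farDist - nearDist), 0.0f, 1.0f);
    // Equal-power crossfade keeps perceived loudness roughly constant.
    return { std::cos(t * 1.5707963f), std::sin(t * 1.5707963f) };
}
```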
Another critical element of sound design whose nuanced importance we occasionally overlook is attenuation. In many games, the visuals are getting so photorealistic that we can be led down the path of attempting to make all of our sounds behave according to real-world physics. While this can often help the mix sit better, it is not a cure-all for mixing. We may want or need to emphasize or deemphasize certain sounds in ways that do not match real-world sound propagation, and we should always remember that, as makers of an entertainment product, we want to craft the best-sounding experience for our users, not necessarily the most realistic one. This can be even more critical in VR, where users need to be able to hear things at varied distances across a greater range of three-dimensional space.
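As a sketch, a designer-shaped falloff curve might look like the following, where the exponent is an illustrative knob for holding important sounds up in the mix longer (or dropping them faster) than physical falloff would:

```cpp
#include <cmath>

// A designer-shaped attenuation curve rather than physical 1/d falloff.
// The exponent lets important sounds (say, enemy footsteps) stay audible
// further out than realism would allow; all values are illustrative.
float attenuation(float distance, float minDist, float maxDist, float exponent) {
    if (distance <= minDist) return 1.0f;
    if (distance >= maxDist) return 0.0f;
    float t = (distance - minDist) / (maxDist - minDist);
    // exponent < 1.0 holds the sound up in the mix longer;
    // exponent > 1.0 drops it off faster than linear.
    return std::pow(1.0f - t, exponent);
}
```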
So we may already be doing a lot of things in the non-VR world that VR developers are finding critical to selling the immersion in their worlds. But what more is VR doing well that we could be doing in non-VR projects?
New Territory
One intriguing technique, used by the Sony London team in the Ocean Descent demo from PlayStation VR Worlds, is a “focus” effect. As Simon Gumbleton explained in talks last year at GDC and PLAY Expo Manchester, they altered the game’s mix based on what the camera/player was looking at. If the camera stayed focused on an object, say a manta ray, for a few seconds, they would pull its sounds up in the mix a bit, in effect simulating the cocktail party effect of the brain focusing on elements it deems important and filtering out the rest. I could see this working in interesting ways in non-VR games, where we begin to use camera movement and positioning to modify the mix in a nuanced way so that the player is always hearing what they’re focused on (and any other information the player needs to know at any given time). Naturally, this will not work for all situations. Even in VR, Kenny Young wrote recently about his decision to NOT use this technique in Tethered, because what the player was looking at was not necessarily what they were focusing on…they could have just been soaking in the world or staring into space. It’s a hard balance to maintain when going this route: while it can be a great way to focus the mix on whatever the user is paying attention to at any given time, that may not be the most important thing for the user to be aware of. I suspect we will see this approach used more as a selective form of mixing in experiences that are more linear or guided and less open to a larger sense of exploration.
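My guess at the rough shape of such a focus effect follows; the dwell time, cone angle, and boost amount are illustrative, not the values the Sony London team used:

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& o) const { return { x - o.x, y - o.y, z - o.z }; }
};

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// If the camera has been pointed at an object for a couple of seconds,
// nudge that object's bus gain up to simulate the cocktail party effect.
struct FocusMixer {
    float dwellSeconds = 0.0f;

    // Returns a gain multiplier for the object's sounds this frame.
    // camForward is assumed to be normalized.
    float update(const Vec3& camPos, const Vec3& camForward,
                 const Vec3& objectPos, float dt) {
        Vec3 toObject = normalize(objectPos - camPos);
        bool looking = dot(camForward, toObject) > 0.95f; // ~18 degree cone
        dwellSeconds = looking ? dwellSeconds + dt : 0.0f;
        // After ~2s of sustained gaze, apply roughly a +2 dB boost.
        return dwellSeconds > 2.0f ? 1.26f : 1.0f;
    }
};
```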
In the beginning of VR audio research, most developers were confounded by a core element of most multimedia projects: music. It felt jarring or disorienting to have music playing all the time. Localized, diegetic music worked okay, but there isn’t always a place for it in the world. If you’re making a prehistoric game not set in the Flintstones universe, how do you convey music without a caveman group banging on bones and sticks all the time? Through much experimentation and trial and error, developers have discovered that VR does not need to cut out music entirely or use only diegetic placement; instead, music needs to be used more judiciously. In large part, this is due to the prioritization of spatialized sounds over music. The spatialization of the soundscape is really what sells the VR experience on the audio side, and if music is spatialized well it gels right into the mix. Otherwise, it needs to be used selectively to give the spatialized sounds priority in the mix. This sparser, more deliberate use also adds greater weight to music’s presence. In the VR experiences I’ve had, I have never felt the music was immersion-breaking. And comparing the presence of music in VR and non-VR titles, I notice most non-VR titles have music playing more frequently. Speaking specifically about video games, I feel the majority of game design still operates under the belief that music should ALWAYS be playing. This can break immersion if the user is focused more on the music than the action. Furthermore, reserving music for key moments in the game, whether emotional or action-based, can emphasize those areas and create more memorable moments for users. Music is a critical component for so much sonically: selling the setting, a character’s emotional journey, or even the emotion the player should be feeling at any given time. But being more judicious about its use, as VR is (albeit for a unique set of reasons), can help non-VR titles be more emotionally affecting.
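A minimal sketch of that prioritization, assuming a simple ducking model where the music bus eases down while high-priority spatialized sounds are active (the rates and duck floor are illustrative):

```cpp
#include <algorithm>

// Duck the music bus while high-priority positional sounds play,
// then ease back up; one simple way to give spatialized sounds
// priority over the score.
struct MusicDucker {
    float musicGain = 1.0f;

    float update(int activePrioritySounds, float dt) {
        float target = activePrioritySounds > 0 ? 0.4f : 1.0f; // duck floor
        float rate   = activePrioritySounds > 0 ? 4.0f : 0.5f; // fast duck, slow recover
        musicGain += (target - musicGain) * std::min(rate * dt, 1.0f);
        return musicGain;
    }
};
```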
There is yet another area where VR is exposing the warts of our current technology. The uncanny valley has been a struggle for visual artists and technologists for years. As computer-rendered imagery approaches photorealism, our brains begin to disbelieve the images more, because they are so close to reality but not quite there. On the audio side, this is often tied to synchronization. As our human character models have developed into near life-like analogs of the actors who portray them, facial and lip synchronization is what begins to break down the sense of immersion in cutscenes and other close-up scenes involving dialogue. If the mouth and face are not moving exactly as we expect, we get pulled out of the emotion of a scene. In VR, the synchronization of visual and audio is even more critical, and not just with regard to lip sync. If a sound doesn’t hit when a visual does, the brain has a hard time processing the two events as unified. We do a lot to mitigate these potential problems, like the aforementioned attaching of sounds to joints of a character model, or attaching sounds directly to particle effects, but VR is placing synchronization under a microscope, where it really needs to be. I think this will become more of an issue on the non-VR front if we embrace 3D audio outside of virtual reality.
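A sketch of one way to keep a sound locked to its visual: trigger it from the particle system itself rather than from a parallel gameplay timer. The callback hook here is hypothetical; real engines and middleware expose their own equivalents:

```cpp
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical particle system with a spawn callback, so audio fires
// in the exact frame the visual event happens.
struct ParticleSystem {
    std::vector<std::function<void(const Vec3&)>> onSpawn;

    void spawnParticle(const Vec3& pos) {
        // ... create the visual particle ...
        for (auto& cb : onSpawn) cb(pos); // audio fires the same frame
    }
};

// Stand-in for an engine's positional one-shot call.
void playOneShotAt(const char* /*event*/, const Vec3& /*pos*/) { /* engine call */ }

void hookSparks(ParticleSystem& sparks) {
    sparks.onSpawn.push_back([](const Vec3& pos) {
        playOneShotAt("spark_crackle", pos);
    });
}
```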
Not Our Concern?
There are also some issues in VR where non-VR titles may not need to be as concerned about immersion. From a video game perspective, the non-linear nature of most games means we often need to take numerous potential player actions into account and design and mix accordingly. In VR this concept explodes exponentially, because when the user is in a world they can interact with, the perception in this closed system is that you can interact with EVERYTHING. In many non-VR games, your character may not be able to pick up and interact with every single object as you can in, say, Skyrim, and that becomes an easy mechanic to accept. But in VR, having objects the player can see but not interact with is disconcerting, especially in a first-person game where you can see your avatar’s hands. The PlayStation VR Worlds team ran into this issue while working on The London Heist. As Simon Gumbleton explains, there is a sequence where you’re a passenger in a car. In initial tests, people would stick their head out the window. In a non-VR game this is not really feasible, because we don’t have our heads as a control device; it’s all relegated to the controller. As a result, the team ended up creating a unique mix for when the player sticks their head out the window, to maintain that sense of immersion. This can lead to additional small details that really sell the world. In the same sequence, there’s a cup on the dashboard. Early testers would pick up the cup, and the team noticed something interesting: the testers would all try to drink from it! So Simon and the audio team made it so that if you made a sound into the microphone with the cup near your mouth, it would play a straw-sipping sound. They did the same with the cigar at the beginning of the game: pick it up, put it near your mouth, and inhale/make noise into the microphone, and the cigar will glow and burn hotter. Some super cool detail, all coming about as a result of the paradoxical freedom that the closed experience of VR goggles can provide.
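Purely as a guess at the shape of that cup trick, the check might reduce to a prop-to-head distance test plus a microphone loudness threshold (all values illustrative):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float distance(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx*dx + dy*dy + dz*dz);
}

// If the held prop is near the HMD (a proxy for the mouth) and the mic
// input crosses a loudness threshold, play the interaction sound.
bool shouldPlaySip(const Vec3& cupPos, const Vec3& headPos, float micLevel) {
    const float nearMouth    = 0.15f; // metres, illustrative
    const float micThreshold = 0.2f;  // normalized input level, illustrative
    return distance(cupPos, headPos) < nearMouth && micLevel > micThreshold;
}
```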
Virtual reality is still such a new field, full of new and evolving technology and practice. Much of what we’re doing on the audio side borrows heavily from the non-VR space, but there are already plenty of novel techniques in use, some of which should be considered and utilized in non-VR projects to help enhance their immersion. I am most excited to see what the future holds: as the technology and our experience grow, we will undoubtedly come up with novel new approaches to make our interactions even more immersive.
(Editor’s note: I’d like to thank Simon Gumbleton, Kenny Young, and Tom Smurdon for their feedback, thoughts, and insights for this article.)
Dale Adams says
“If the camera was focused on an object, say a manta ray, for a few seconds, they would pull its sounds up in the mix a bit; in effect, simulating the cocktail party effect of the brain focusing on elements it deems important and filtering out the rest.”
I feel as though, without a purpose behind it, this focus technique doesn’t make much sense. It’s not how our mind works; we don’t attenuate volume based on where we are looking or our head position in space. It is, however, a great way to drive narration and the story in the space. At that point the value is clearly presented in the UX and serves a function, clarifying the reason why it’s introduced in that way. In the end we are still telling a story, just with sound.
Peter Weis says
I have been a composer and engineer for 10+ years and recently discovered the world of immersive audio and programs like FMOD Studio and Unity. I was blown away by the capabilities for immersing the user in a 2D or 3D environment, and by how easy it is to adapt a sound to a spatial environment. The features to randomize and blend sounds in ways that interact with the user could be applied in so many different contexts. The need for audio in VR is going to be huge once both consumers and developers realize it’s not just a gimmick. Imagine recording a live concert in VR, where the end user could walk around a 3D immersive rendering of the venue.