THE FUTURE OF SOUND DESIGN IN VIDEO GAMES, Part 1
The following article contains excerpts from the “Future of Sound Design” lectures at GDC, VFS and Dutch Film Festival originally presented in 2006. Please note that certain expressions are personal opinions, and cannot be read as “fact”.
In our endless passion to give games a soundscape experience that matches or exceeds other media, we constantly look for new ways and techniques to get there. Some people ask “why are we comparing ourselves to film sound design, we’re very different”. Others say “Film sound experiences are the ultimate goal”. I say both are right. But to really figure out what the future may hold, we first have to learn from the past, so that we can measure which objectives and goals are still missing.
To answer, we have to begin by asking ourselves some questions:
- What’s been done in the past?
- What’s broken/missing?
- How does this compare to Visuals?
- What about Emotions?
- Is there a future for Audio?
- What about everything else?
The Past – Evolution in Numbers
Ever since the X360 and PS3, technical hindrances have been much less of a hurdle for a sound designer trying to create engaging soundscapes. Let’s look at the history, based on the most popular game machine/console during each period.
As you can see in Fig 1 & 2, the amount of sound-data currently storable on the console is so much, that in comparison the old consoles barely show up on the graphs. Memory isn’t really a technological barrier anymore.
What’s broken/missing?
The obvious one to look at would be the Sound Designer/Artist. Are the requirements of creative vs technical understanding still too high? Are they a hurdle we still have to overcome? In Fig 3 I’m showing my estimates of the job requirements of a sound artist/designer working in the video games industry, when looking at the biggest selling platforms. Funnily enough, the industry seems to repeat itself. Consumers are now using phones and other small devices to play games. These devices seem to be about as powerful as game consoles were 4-5 years ago, which brings back the same technical hurdles, all well known and documented.
The 2nd obvious one to look at is the “no boundary” storytelling experience. Over the years I’ve noticed that for some sound folks who grew up in the technically restricted era, it’s very hard to cross over to new platforms. As an example, this is one of the reasons I originally stopped doing music in the mid 90’s. I was pretty good at making small processors like the C64 and SNES do things they weren’t meant to do, and therefore got an edge on making enjoyable music. With the introduction of Redbook (CD) audio, the playing field was open to everybody, and I was no longer able to take advantage of any hurdles others hadn’t overcome yet. It’s critical that these folks find ways to move to the storytelling, uninhibited way of thinking. They have to gain this experience, or they’ll be left behind.
So what does this lead us to? To me, the biggest general missing link in making games have equal or better sound experiences than film is an investment in emotionally believable audio, followed by treating the player as smart, in both gameplay and audio. Let’s focus on the first one for now.
Too often when playing games (including our own), I still feel “disconnected” from the experience due to sound. Some games make a great attempt at it, but in the end, there’s always something happening which breaks the focus on the experience. During the remainder of this article we’ll touch upon what causes these disconnects. To understand this better, let’s also look at how audio and visuals work together.
Visual Media
Look at the pictures below. You’ll probably have a different instant feeling or emotion about each. You can tell that approaching visual realism isn’t always a good thing (the “uncanny valley” effect). It distracts from the believability, and the connected emotion you’re supposed to have. We’ll see later that Dialog has a similar issue.
Another interesting effect is that feelings generated from visuals can be interpreted differently from person to person, and feelings created by visuals are culturally relevant at times. Images generate a feeling, a response that we learned during our life.
VISUALS and AUDIO – THE MARRIED COUPLE
Now, let’s do a quick exercise to see the relationship between visuals and audio (note: due to copyright, we can’t put this music here).
- Pick one of the above pictures. Look at it carefully. You’ll notice the feeling you had initially withers away quickly.
- After a few seconds, cue up your favorite rock piece near the chorus… Did the meaning of the picture change?
- Now cue a film-score (I like to use “Seven Pounds” as an example). How dramatically did the meaning of the picture change? More importantly, did the feeling it generated in combination with the picture sustain?
What can you conclude out of this? How does Audio fit into this picture?
- If picture gives you the instant feeling/reaction, audio maintains this feeling over time.
- Audio cues can change the expected emotion a picture generates.
- Audio can enhance picture in more than a support role and change the emotional outcome.
- Audio emotions usually take time to establish.
A quick word about Feelings and Emotions
We have to understand when to say “Feeling” and when to say “Emotion”, as both are pretty different. It’ll also help us understand how audio plays a big part in this.
- Feelings are a learned response of the culture and your surroundings in which you grew up.
- Feelings are a subset of all your mind-body states (i.e. disappointment, hunger, hope etc.)
- A Feeling is the response part of the Emotion. (“I feel disappointed”… a resulting emotional reaction could be “I’m Angry”)
- Emotions are cross-cultural: they have the same meaning all over the world.
- Emotion is a chemical state in our brains. Those same chemicals inhibit our capabilities and limit what we call rational thought.
So how can a game use this? If visuals and audio work together to establish feelings and emotions, you can use this to a certain degree to influence game-play:
- More emotion = less judgement
- If you want to remember something, get into the emotional mood you were in when you first experienced it.
- You’re likely to come back to a “liked” emotion. Some emotional states can be addictive.
- A person’s mood tends to follow that of the situation presented in front of them.
OK, so let’s get back to Audio…
The main ingredients of Audio in Game:
Everyone reading this probably knows the 4 main “technical” ingredients of audio in a game: Music, Dialog, Sound Effects and Mix. During the many lectures on this topic I always asked the audience: “What is the most important element of a believable and emotionally engaging soundscape?”, “What is the top contributor?”, and “What is the top damager?”. The answers usually ranged across the board, with each person picking their “favorite” one. Composers would pick music, sound editors would pick sound effects. Repetitive footsteps were often mentioned. Dialog was the most picked damaging ingredient. Seldom was the answer “all”.
It’s probably obvious to you that every single ingredient is of equal importance to create an emotionally believable soundscape. You can’t approach a single ingredient in a lackluster way. Believability is key.
Believable Dialog
Dialog is still the #1 offender in the believability area. I’ll probably get flamed for saying this, but I’ve yet to hear a single game which makes me believe I’m listening to the characters on-screen for an extended period of time. None have captured it as truly “believable” yet. Space, placement, acting, story, odd breaks, visual discrepancies etc. all contribute to dialog flaws. We’re running up against the same “uncanny valley” effect as visuals. We’re approaching reality, and the human ear will now pick up every flaw and is no longer forgiving. Yet if we’re very far from reality, we’ll believe it. Have you noticed you can watch a Saturday morning cartoon and believe the characters?
What also isn’t helping us is that some characters in-game on-screen still look robotic (or don’t even exist), forcing us to work even harder to make that certain voice believable in contrast to the visuals.
One issue I’m hearing quite a bit is the recording method used. Lots of dialog for games is still recorded in the traditional “music” way of placing a U87 (or similar) close-up to the actor. Often, movement, space, air and body shocks are left out entirely.
In 2005 we did a quick test for a game called “NBA Street”, for which we no longer accepted this type of recording. We rigged up a lot of players with wireless lavs, and had them play for several hours while feeding them scenarios and lines. The result was a much higher degree of believability. Following is an example with the “U87” version back to back with the “Lavalier” version. On purpose, similar hokey lines were picked to illustrate the concept (this was not an exercise in good acting). Which one is most believable to you?
Example: nba.mp3
#1 is the old U87 version, #2 is the Lavalier version, separated by a beep
Believability Gap
Believability “gaps” are the #2 offender. Every time a player is jerked away from the game’s believability, it makes him realize it’s a game, and intensity lessens. When this happens the intensity buildup has to be restarted to a certain degree. The game will never be able to reach the full potential of engaging, emotionally believable audio. Some examples of these gaps are:
- Awkward forced Loading screens (i.e. “silence screen”)
- Repetition of dialog, sound effects, or anything else noticeable
- Unnatural imbalance (Vol / EQ / Space etc.)
- Non-believable Dialog
- Anything which goes against “learned” sounds, if not on purpose (more on this later)
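Of the gaps listed above, repetition is the one that is easiest to illustrate in code. The following is a minimal Python sketch, not a description of how any particular game or engine handles it. The class name `VariationPicker`, the `history` parameter and the sample names are all hypothetical. The idea is simply to pick a random variation of a sound while refusing to replay any of the last few that were heard.

```python
import random


class VariationPicker:
    """Pick sound variations at random while avoiding recently played ones.

    Hypothetical helper for illustration only.
    """

    def __init__(self, variations, history=2, rng=None):
        if not variations:
            raise ValueError("need at least one variation")
        self.variations = list(variations)
        # Never block so many items that no choice would be left.
        self.history = min(history, len(self.variations) - 1)
        self.recent = []
        self.rng = rng or random.Random()

    def next(self):
        # Only variations that were not played recently are candidates.
        candidates = [v for v in self.variations if v not in self.recent]
        choice = self.rng.choice(candidates)
        if self.history > 0:
            # Remember the pick and keep only the last `history` picks.
            self.recent.append(choice)
            self.recent = self.recent[-self.history:]
        return choice


# Example: four footstep samples, never repeating either of the last two.
picker = VariationPicker(["step_a", "step_b", "step_c", "step_d"], history=2)
sequence = [picker.next() for _ in range(8)]
```

With only a handful of variations the pattern can still become noticeable over time, so a sketch like this helps but does not replace having enough source material.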
Audio Mix – “The Glue”
This is a large topic, one that’s too big for this article to cover, and one we’ll come back to later. It’s the #3 offender in creating sustained, believable soundscapes. Too many games still ship with the “wall of sound” approach. A player can only take this so long. There are many solutions to this problem. One of the causes still seems to be “producer X listening on his TV in a noisy floor area wanting to hear every detail”, which is a hard one to overcome.
With any of the solutions, the mix shouldn’t make a user notice what’s happening, yet it should get them more engaged. If your producer is asking for the “wall” approach, he’s in reality not asking for this. He just couldn’t hear something he wanted to hear. It’s now up to you to refocus the mix constantly so that he can hear what he wants to hear, without taking away from the rest.
Much more to come on this topic as there are many tricks to accomplish this.
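One common way to “refocus” a mix is priority-based ducking: whatever matters most at a given moment stays at full level, and everything else is pulled down a little. The sketch below is only an illustration of that idea in Python. The bus names, the priority numbers and the -6 dB duck amount are assumptions made for the example, not values from any real project, and a real mix would smooth these gain changes over time rather than switching them instantly.

```python
def duck_gains(active_buses, priorities, duck_db=-6.0):
    """Return a gain offset in dB for each active bus.

    The highest-priority active bus stays at 0 dB; every other
    active bus is lowered by `duck_db`. All values are illustrative.
    """
    if not active_buses:
        return {}
    top = max(priorities[bus] for bus in active_buses)
    return {
        bus: 0.0 if priorities[bus] == top else duck_db
        for bus in active_buses
    }


def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20.0)


# Hypothetical priorities: dialog wins over effects, effects over music.
priorities = {"dialog": 3, "sfx": 2, "music": 1}

# While a line of dialog plays, music and effects are pulled down.
gains = duck_gains(["music", "sfx", "dialog"], priorities)
multipliers = {bus: db_to_linear(db) for bus, db in gains.items()}
```

When the dialog stops, calling `duck_gains` again with only the remaining buses puts the effects back in focus, which is the constant refocusing described above in its simplest form.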
Excuses
One of the other blockers in achieving an emotionally engaging soundscape is excuses. Yeah, the ones that every Sound Designer or sound-sup makes when he can’t achieve the needed result. We’re all guilty of it (including me). Yet, it’s one of the hurdles we have to overcome if we ever want our industry to excel above the film media’s level of engagement. If you know you can’t mix for an emotionally engaging soundscape, don’t do it. Don’t pretend you can learn it overnight. If you don’t have enough money to achieve the result, scope down, sell your ideas to execs or whatever you have to do. Don’t use ’em as an excuse for why you couldn’t succeed. There are many reasons, many of which are direct blockers, and many of which can be overcome with creative solutions. I often hear “well, we don’t have the tools that some others have”… Tools are a means to get to a result, but not the only way. Come up with creative ways to get the tools you need.
Wrap up
So what is the future of sound design? It’s not some sort of new tool. It’s not a new console, it’s not a new “cool way” of creating sound in real-time. It is purely us overcoming our hurdles to find new ways of creating emotionally engaging and believable soundscapes. Breaking out of traditional ways, learning what the human reacts to. How feelings and emotions tie in with sound is a must-know. Content is no longer king, technology is no longer the queen. The combo of all of it, and the stimulating, game supportive result is what the player will be experiencing and wanting.
–end of part 1– We’ll look at “ear deficiencies” and “experienced sound”, including methods on how to use those to build an emotional soundscape, in the next few days.
Written by Charles Deenen for Designing Sound
Bobby says
I really enjoyed NBA Jam example of bringing VO out of the booth. This is something of a huge hurdle that we as a community need to push to overcome. There’s going to be resistance from the “old guard” voice directors. I know a few personally who would probably rather stay inside their cushy studios’ director chair.
Si says
hmm, about the nba back to back examples of VO/dialogue, I don’t think it’s a fair comparison as the studio recorded part has no processing (i.e. reverb) and is not mixed with environmental audio. The lavalier recorded VO has subtle environmental/ambient sound behind it which could lead to difficulty in the mix.
Charles Deenen says
Hi Si
There was no problem with the dialog like that in the mix. In fact, it made it easier as it jelled much more together with the BG’s etc. EA Sports and various other games are still using this method on some of their games to great success.
Think of it in this way though: You can have pristine recordings that might be great for a mix, or you can have raw gritty recordings that have the needed emotion, phaseshifting etc., and still work perfectly well in a mix. I know which one I’d pick :)
Games in general use too much “saturday morning cartoon style” dialog, and a lot of it has to do with the way it’s recorded, and the way the actors feel comfortable, how they move, how they react.
I’ll take a real sounding dialog line recorded in a noisy spot over a pristine line that feels “sterile” any day. It’s emotion and believability that we are selling. The user doesn’t care that the line was nice and clean during the recording.