Whether you heard of Firewatch when it was announced a couple of years ago, around the time of its successful release, or when it won awards such as Unity’s own “Best 3D Visuals” last year, you were probably struck first by its wonderful visual design and the incredibly natural, realistic flow of the conversations between its two protagonists, Henry (played by Rich Sommer) and Delilah (played by Cissy Jones). Delightful as those aspects are, they shouldn’t overshadow the game’s gorgeous, emotion-filled audio direction. On this topic, I had the pleasure of interviewing Chris Remo, audio director, composer and game designer on Firewatch.
One thing that struck me about Firewatch’s ambience is how discreet, and at times even nonexistent, life is (almost no birds singing, etc.). Was that a design choice to emphasize the player’s loneliness, or is there another reason for it?
Firewatch is very much about solitude. Companionship is a big part of the game as well—you spend nearly the entire game in constant communication with your supervisor Delilah—but it’s companionship at a remove. Henry is very isolated, and we wanted the soundscape of the game to reflect that. I also think it’s striking how few quiet spaces are left in our lives; most of us, certainly myself included, are surrounded by sound of some kind fairly constantly, and when you go out into these massive wilderness areas, it can be amazing how quiet it can be. It allows you to be alone with your thoughts and your own presence in the world. It felt important to reflect that.
Different locations have different densities of in-world elements—for instance, the forest area surrounding Henry’s lookout tower (which is called “Towerhub” in our game assets) has a lot of birdsong, creaking trees, whistling wind, and so on. Down by the lake, you hear the lapping of water and—on days when the teens are not present—the constant honking of ducks. Some areas are denser audio-wise, some areas more sparse.
Speaking of discreet, the whole audio direction for the game was quite minimalistic, be it the player foley sounds (reduced to footsteps only) or even the UI. What drove you to make this choice?
Our first pass at sound design was much punchier and “video gamey,” but I thought it was simply too much. The game’s visuals are stylized, but they’re also very subtle and pastoral, and I didn’t want the sound design to stomp on that at all. When there was a choice between punching up the sound design or toning down a bit, I almost always went for the latter. There are a few exceptions where the audio is a bit showier—in particular, one big surprise moment that verges on a jump scare, which I won’t spoil—and some little bits of foley I wanted to be really satisfyingly “chunky,” like manipulating the combination locks, interacting with the cave gate, and so on. But for the most part, my instinct was to tamp down the soundscape whenever possible. The ambience layer has more going on than most people probably think (it’s a multi-layer soundscape that blends between various mixes depending on factors like location, time of day, wind source locations, and so on) but it rarely calls attention to itself.
I had a lot of sound design assistance from Jared Emerson-Johnson of Bay Area Sound, and I remember he said something to the effect of “when in doubt, turn down the footsteps,” and that really summed up our approach. Pretty much right up until content lock, I was tweaking the audio mix of the game, and often it was to turn down those footstep sounds even more. I wanted the footsteps to nestle right into the ambience layer. It’s a big annoyance of mine in games when footstep sounds start to become almost a percussive score unto themselves; it’s a distraction from what you should be devoting your active attention to. (Obviously, if your game actually uses that kind of thing for a specific stylized effect, that’s another story!)
Did you do some location and/or foley recordings for the game?
Near the beginning of the development cycle, we had a team camping trip in Yosemite National Park, which I used as an opportunity to get some reference material and really feel what it’s like to be in a massive open protected wilderness area. None of that audio was put into the game, it just got filed away into useful experience. But Jared did actually get quite a lot of field recordings on a trip he did elsewhere, and I used much of that material in both ambience layers as well as specific bits of nature-related foley.
A glimpse of Firewatch’s astonishing landscape – Source: firewatchgame.com
Did you have any specific influences regarding Firewatch’s audio direction (video games or other)?
I remember when I was getting started on ambience, I loaded up The Elder Scrolls V: Skyrim and spent a bunch of time just standing around in different environments to get a sense for the variety of ambiences they used. Obviously that game is physically massive compared to ours, but it actually has a lot of commonalities in terms of the variety of outdoor environments it depicts. I was surprised to note how stripped-down much of the ambience was, and it sort of gave me confidence to take a minimalist approach in our game as well. I haven’t gone back to do a comparison between the two games since Firewatch shipped, so I have no idea if they actually ended up very similar. Probably not, but who knows?
I remember reading that Henry and Delilah’s lines were recorded as actual dialogues in order to give a flow as natural as possible for their interactions. Could you tell us a bit about those recording sessions?
All of the dialogue sessions were directed by Sean Vanaman, who wrote nearly every line in the game. One of the interesting properties of those sessions was that they were all recorded in real time over Skype, which exactly mirrors how Henry and Delilah communicate in the game. Even though Rich Sommer and Cissy Jones both live in L.A., Sean requested they not meet in person before sessions started, so their relationship would be as similar as possible to that of the game characters. Unsurprisingly, they did end up meeting before development wrapped, but I think that early dynamic still comes through.
Firewatch’s music is used very sparingly but always very effectively. What design choices did you have to make to end up with such a finely crafted balance between scored and non-scored moments?
Implementing the Firewatch score was one area where also being a game designer on the project made a huge difference. Ultimately, most of my time on the project was spent as a designer rather than as a composer, so I was extremely deeply familiar with the design and structure of the game as I incorporated musical elements into it. There are a number of tracks that are directly scored to specific conversational moments, which is pretty traditional. However, even though Firewatch is a narrative game with an essentially linear story in the broad strokes, there is a massive amount of variation in how any given player will work through the conversations and move from location to location. There are often different ways to reach your objective, entirely different and mutually exclusive conversations you can have on the way there, and in some cases the possibility to entirely skip big chunks of content, which the game then has to compensate for, so it all appears seamless to the player.
Because of all this, I thought it would be worthwhile to introduce elements of reactivity into the soundtrack. So I wrote a system that actually evaluates game state as you progress through the narrative, looking for moments that are appropriate to drop music in. For example, a piece of music might have 20 different places it CAN play, but it won’t trigger until, say, you’ve completed objective A, and you’ve already gone through conversation B on the way to location C, and it’s been more than X seconds since you’ve spoken with Delilah. Essentially, there are a series of different conditions that must be fulfilled, but for any one piece there might be different sets of conditions that work, to increase the likelihood you’ll get that piece of music at an appropriate time. Often, especially with the really chilled-out acoustic guitar pieces, the music was looking for “down time”—moments where maybe nothing big is going on in the narrative and it’s generally quiet, so the music can come in and almost elevate the moment by implicitly lending some emotional meaning to what’s going on. I think music can often encourage the player to reflect on their recent experiences in the game, in a more active way than they might otherwise do, because there’s this audible emotional material that serves as a vessel for their thoughts.
Then, once a given piece plays, it will never play again. The same piece of music never plays twice in Firewatch. I really didn’t want anyone to ever have the experience of thinking, “Oh, this piece again.” I didn’t want it to sound like a looping soundtrack with the “forest track” and the “river track” and the “cave track” or anything like that. Instead, there are musical motifs that are mainly assigned to specific characters and elements in the story, and they weave in and out of the soundtrack over the course of the game, with differing instrumentation reflecting the changing state of the story. And, much like with the overall soundscape, I figured it was on the whole better to err on the side of minimalism rather than on the side of overuse and noticeable repetition.
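For readers curious how such a reactive music system might be structured, here is a minimal Python sketch of the idea Chris describes: each piece carries several alternative sets of conditions, it becomes eligible when every condition in any one set holds, and once played it is never eligible again. This is not Firewatch's actual code (the game was built in Unity with Wwise). Every name below, including `GameState`, `MusicCue`, `pick_cue`, the objective and conversation labels, and the 30 and 60 second thresholds, is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class GameState:
    """Hypothetical snapshot of narrative progress (not Firewatch's real data model)."""
    completed_objectives: Set[str] = field(default_factory=set)
    finished_conversations: Set[str] = field(default_factory=set)
    location: str = ""
    seconds_since_delilah: float = 0.0


# A condition is any predicate over the current game state.
Condition = Callable[[GameState], bool]


@dataclass
class MusicCue:
    name: str
    # Alternative sets of conditions: the cue is eligible when ALL
    # conditions in ANY one set are satisfied.
    condition_sets: List[List[Condition]]
    played: bool = False

    def is_eligible(self, state: GameState) -> bool:
        if self.played:  # a piece that has played never plays again
            return False
        return any(
            all(cond(state) for cond in cond_set)
            for cond_set in self.condition_sets
        )


def pick_cue(cues: List[MusicCue], state: GameState) -> Optional[MusicCue]:
    """Return the first eligible cue and mark it as played, or None."""
    for cue in cues:
        if cue.is_eligible(state):
            cue.played = True
            return cue
    return None


# Illustrative cue: all labels and thresholds are made up for this example.
quiet_guitar = MusicCue(
    name="quiet_guitar",
    condition_sets=[
        [   # objective A done, conversation B heard, at location C, some quiet time
            lambda s: "objective_a" in s.completed_objectives,
            lambda s: "conversation_b" in s.finished_conversations,
            lambda s: s.location == "location_c",
            lambda s: s.seconds_since_delilah > 30.0,
        ],
        [   # an alternative route to the same piece of music
            lambda s: "objective_d" in s.completed_objectives,
            lambda s: s.seconds_since_delilah > 60.0,
        ],
    ],
)
```

In this sketch, the game would call `pick_cue` at each candidate trigger point with the current state. The alternative condition sets are what raise the odds that a given piece lands at an appropriate moment whichever path the player took, and the `played` flag enforces the rule that no piece is heard twice.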
Firewatch’s original soundtrack cover – Source & purchase: bandcamp.com
Having done some research, I noticed you used SECTR to build your open world. How was this decision made, and how did it help the audio design of the game (if it did!)?
SECTR was a great streaming tool that filled some big gaps in Unity’s out-of-the-box technology. It has an audio component, but we actually didn’t use that, we just used the level streaming technology. Audio was implemented entirely in Wwise, which I had not previously used but which I really grew to love over the course of the project. It does just about anything you could ever want in terms of audio implementation, if you’re willing and able to hook it all up as needed. It’s an incredibly mature and full-service tool.