Capcom Audio Director Tomoya Kishi Interview
Japanese video game composers and musicians get a good bit of coverage and acclaim in the West, but the people making the booms and whooshes don’t seem to get much visibility. To that end, I reached out to Tomoya Kishi, the Audio Director and Senior Manager of Audio Design and Production at Capcom:
“Tomoya Kishi joined Capcom in 2001, beginning his career as an audio editor on the Onimusha series. In 2004, he was assigned to be the audio director for Lost Planet: Extreme Condition, a role he continued in on the sequel, Lost Planet 2. During this time he built a workflow to improve efficiency in video game audio development and worked to forge more active collaborations with Hollywood sound studios, activities that have given rise to new ideas and new possibilities across the industry.
Tomoya’s most recent work has been as audio director of Dragon’s Dogma. Working on its audio, where he drew on the workflow developed for previous titles and incorporated a number of collaborations, has been one of the most interesting, challenging and inspiring projects of his career. He has also led the development of Capcom’s original audio middleware, cooperating with professors and researchers with the aim of inventing new technology for video game audio.
Tomoya is currently the senior manager of the audio production team at Capcom. The team consists of 60 members from various fields, including sound design, composition, engineering, programming, and audio production.”
Designing Sound: How did you get started in sound design? What inspired you to do sound design for games?
Tomoya Kishi: It’s a bit of a long story, and a little embarrassing, but I’ve never formally studied music; I originally studied marketing in my university’s commerce department. I first got into music when my parents bought me a Yamaha synthesizer at age 14. I started remixing my favorite artists’ tracks and experimented with composing my own.
It was the ’90s, right when club music like house and hip-hop was breaking out in the underground here, and I was hooked on creating breakbeats with the AKAI S01 sampler. The RAM on the AKAI S01 is fairly limited, so I experimented with sampling at a higher pitch and then lowering it, sampling at a lower bit rate, and shortening samples as much as possible. I was always trying to cram as much as I could into that limited space, never thinking the experience would come in handy down the road.
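The memory arithmetic behind those tricks can be sketched in a few lines of Python. This is a minimal illustration under assumed figures: the 32 kHz rate, bit depths, and 4-second loop are hypothetical, not the S01’s actual specifications, and “lower bit rate” is read here as fewer bits per sample.

```python
def sample_bytes(seconds: float, sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Memory needed to store an uncompressed PCM sample."""
    return round(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)


def pitched_up_bytes(seconds: float, sample_rate_hz: int, bits_per_sample: int, speed_factor: float) -> int:
    """Memory needed when the source is played faster by speed_factor while sampling.

    The sampler later pitches the sample back down by the same factor, restoring
    the original length, so the stored audio is speed_factor times shorter.
    """
    return sample_bytes(seconds / speed_factor, sample_rate_hz, bits_per_sample)


# Hypothetical 4-second mono breakbeat at 32 kHz.
full = sample_bytes(4.0, 32_000, 16)                 # 256,000 bytes
octave_up = pitched_up_bytes(4.0, 32_000, 16, 2.0)   # 128,000 bytes
eight_bit = sample_bytes(4.0, 32_000, 8)             # 128,000 bytes
print(full, octave_up, eight_bit)
```

The trade-off is bandwidth: pitching the stored sample back down an octave halves its effective sampling rate, and with it the highest frequency the sample can reproduce.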
In college, I DJ’d at clubs, put on shows, and self-published my own album. At that time big beat was in, so artists like Fatboy Slim were hot.
Around this time I ran into someone from Capcom at a club and first learned about sound effects. It turned out they were working on the sounds for Street Fighter. I was job hunting and wanted to work in sound, so with their encouragement I dove into this world.
Whew, that’s quite a bit of backstory. Basically, I started sound design when I joined Capcom at age 22. Luckily, I was already used to most of the equipment involved; I just had to learn Pro Tools and put my sampling and mixing sense to work. In the end, it was less that I was inspired and more that my career just happened to start with game sound design.
-What software do you use for sound design? Any favorite plugins and workflows?
In general I stick to Pro Tools, and because of total recall I don’t use outboard gear. I like to keep things simple and stay away from physical controllers, so I control everything with a trackball. The plug-ins I use most are McDSP’s FilterBank and CompressorBank, Waves, Pitch ’n Time, Duy EverPack, Pultec EQ and Comp, and finally Altiverb. Conceptually, Altiverb is about sampling acoustic spaces, so for someone as into sampling as I am it was love at first sight. I use it for a lot of different things: voice effects, adding a little something to a digital track to make it more organic, and so on. I also get a lot of use out of Pitch ’n Time. I love Pultec for how it dirties up sounds with heavy compression, so I use it to spice things up or to create nuances in footsteps. I like plug-ins that let you twist sounds a little with dirty effects while keeping the original, organic texture.
-How much field recording has typically been done for projects you have worked on? What sort of field recording gear do you use? (microphones, digital recorders and accessories)
I don’t actively go out to do field recording. The time allotted for production is quite limited, and with everything else on our plate the schedule just doesn’t allow it. This is especially true for someone like me who is both a sound designer and the Audio Director. I also have a lot to do on the game design side, such as giving my opinion on the game itself, participating in meetings, training new recruits, and working to improve sound production across the whole organization (since most large Japanese publishers also do in-house development, I imagine it’s the same at those places). For better or worse, we typically rely on our existing library when creating sounds, since that’s more efficient.
When we do need Foley or field recordings, we often work with studios in Hollywood to make the most of our limited production time. For example, we worked with Peter Zinda of Soundelux DMG on the sound design for Lost Planet 1 and 2. We asked Todd-AO to handle the Foley recording with Foley artist Gary Hecker and Foley mixer Nerses Gezalyan; it was truly sound design at its finest, just superb. Foley artists James Moriana and Jeffrey Wilhoit are also amazing; the way they make sounds is almost magical. Working with them was a great experience.
For the Dragon’s Dogma sound design, Bryan Watkins of Warner Bros. did a great deal of field recording for us. One early morning we all gathered together at the Warner Bros. Studio Facilities to record the sound of a bell echoing across the set, which we were able to reproduce in the game’s village. We also worked with John Fasal on animal recordings. Thanks to him we got some superb sounds for bears, lions, and tigers.
-Do you have any preferred sound effects libraries? Do you prefer field recording or sound libraries when designing sounds?
As I said, we typically use our library when editing sounds, but when we need something fresh or have a very important sound to create, we often work with professionals on the recording, though of course we may also record things ourselves. We do Foley recording at our in-house studio, and field recording to capture environmental sounds. Also, since we can’t record gunshots here in Japan, we go abroad to get them. We’re actually in the process of building up our own library at the moment. As you may have guessed, sound design for games can get hectic, so it’s important to streamline business operations, including the workflow, so that you can make more time for the creative process. But what matters is not whether the sounds in our library, which we have to rely on, are our own original creations or commercially acquired; it’s whether they sound good in-game and communicate what the player needs. Seen that way, the middleware you use becomes very important.
-For you, what is the most important process in creating sound for video games?
That’s a good question. Firstly, since players are accustomed to Hollywood-level sound quality, it’s imperative for us to build an efficient workflow capable of delivering high-quality sounds. In a nutshell, ours looks like this: recording, editing, creation of the in-game audio system, implementation, balancing, etc.
Here in Japan there is a fast food chain whose slogan is “fast, cheap, delicious”. Our sound production works on the same premise: our output has to be on par with a 5-star restaurant’s, while our processes have to be optimized to fit our budget. There are all kinds of obstacles. No matter how good a recording you make, if you drop the ball during implementation, the final result is no good; if you spend too much time on editing, you’ll lose time for balancing later on; and even if everything goes well, all of your work may have to be redone because of changes in animation. You really need a good working relationship with your animators and VFX team. The most important things are probably implementation (how randomization, pacing, and priority are controlled for each sound), balancing (the timing and volume of sounds in each category), and the interactive mix (which is like having an in-game Re-Recording Mixer: something that adjusts sounds on the fly in response to changes in the in-game environment). These processes, which you might call post-dubbing in movie terms, have a great effect on a game’s sound quality. This could turn into a long discussion, but it’s also critical to decide on the target loudness from the start, and then, from the source-creation stage onward, to make sure the entire sound team creates audio with the target dynamic range in mind.
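As a rough illustration of two of the implementation controls mentioned above, pacing and priority, here is a minimal Python sketch. The class, method names, and numbers are hypothetical; they are not Capcom’s middleware API, and a real voice manager would also release voices when they finish playing.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    event: str
    priority: int      # higher value = more important
    start_time: float  # seconds


class VoiceManager:
    """Hypothetical voice manager: priority-based voice limiting plus per-event pacing."""

    def __init__(self, max_voices: int, min_interval_s: dict[str, float]):
        self.max_voices = max_voices
        self.min_interval_s = min_interval_s  # minimum gap between triggers of the same event
        self.voices: list[Voice] = []
        self.last_trigger: dict[str, float] = {}

    def play(self, event: str, priority: int, now: float) -> bool:
        """Try to start a sound; return True if it actually plays."""
        # Pacing: ignore retriggers that arrive too soon after the last accepted one.
        last = self.last_trigger.get(event)
        if last is not None and now - last < self.min_interval_s.get(event, 0.0):
            return False
        if len(self.voices) >= self.max_voices:
            # Candidate to steal: lowest priority first, then the oldest voice.
            victim = min(self.voices, key=lambda v: (v.priority, v.start_time))
            if victim.priority > priority:
                return False  # everything playing outranks the new sound
            self.voices.remove(victim)
        self.voices.append(Voice(event, priority, now))
        self.last_trigger[event] = now
        return True


vm = VoiceManager(max_voices=2, min_interval_s={"gunshot": 0.05})
print(vm.play("gunshot", 5, 0.00))    # True
print(vm.play("gunshot", 5, 0.01))    # False: too soon after the last gunshot
print(vm.play("footstep", 1, 0.02))   # True
print(vm.play("explosion", 8, 0.03))  # True: steals the footstep voice
```

Stealing the lowest-priority voice keeps important cues audible when the voice budget is full, while the per-event interval stops rapid retriggers from piling up into a wall of identical sounds.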
Game audio is not done in post-production but in tandem with development, so it’s important to streamline every process in the workflow. What supports all of this is our proprietary game engine and audio middleware, which let us create the optimum settings for a given game genre. Being able to develop applications that link our engine and audio middleware is another advantage. We have both creatives and programmers on the sound team, so we can whip up applications tailored to our creatives’ needs at any time.
-As the Audio Director of Lost Planet 1 and 2 as well as Dragon’s Dogma, how did you approach each project in terms of sound design? Lost Planet is very sci-fi and Dragon’s Dogma is medieval fantasy; did you enjoy working on projects with such different sound design requirements?
Let’s talk about Lost Planet first. I have to say it’s rare to find a game so jam-packed with material to satisfy a sound designer: real guns, lasers and all sorts of other cool weapons; creatures and robots; real Foley and a variety of cutscenes. The blend of reality and sci-fi gave us a lot of room to express ourselves, and that really got my adrenaline pumping! On top of that, we were launching a new franchise on brand-new hardware (the PS3 and Xbox 360 had just come out), which threw up all kinds of new sound design challenges. Since we were targeting a global audience, we kept certain details in mind, like how robots in Western media make different noises from those in Japanese anime, all the while striving for a sense of realism. We also added some randomization, and made in-game sounds swell and fade the way they would in a cutscene. The basic concept was what I like to call exaggerated realism. Much of the dynamics and sense of distance was recreated in the game as it would exist in the real world; however, for other sounds, like the player’s gun or the impact of enemy attacks, I ignored distance altogether and overemphasized them, giving them a dry, full-bodied feel.
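The “exaggerated realism” rule described above can be sketched as a per-category switch: most sounds follow a physical distance law, while a chosen set ignores distance and stays dry. This is a minimal Python illustration; the category names, boost, and wet-mix values are hypothetical, and the inverse-distance law is an assumed stand-in for whatever curve Lost Planet actually used.

```python
import math

# Hypothetical categories that bypass distance in the "exaggerated realism" style.
HERO_CATEGORIES = {"player_weapon", "enemy_impact"}


def inverse_distance_gain_db(distance_m: float, ref_m: float = 1.0) -> float:
    """Point-source attenuation relative to ref_m: about -6 dB per doubling of distance."""
    d = max(distance_m, ref_m)  # no boost closer than the reference distance
    return -20.0 * math.log10(d / ref_m)


def mix_params(category: str, distance_m: float,
               hero_boost_db: float = 3.0, default_wet: float = 0.3) -> tuple[float, float]:
    """Return (gain in dB, reverb wet mix from 0 to 1) for one sound instance."""
    if category in HERO_CATEGORIES:
        # Ignore distance, push the sound forward, and keep it dry and full-bodied.
        return hero_boost_db, 0.0
    return inverse_distance_gain_db(distance_m), default_wet


print(mix_params("ambience", 10.0))       # (-20.0, 0.3)
print(mix_params("player_weapon", 10.0))  # (3.0, 0.0)
```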
Games are a bit different from film: the player hears sound effects, grasps what they mean, and gets satisfaction from them, so we took extra care with the more exhilarating, cathartic sounds, where the player pushes a button (for gunfire) and then hears a reaction (explosions, etc.). This is when I started making use of acoustic physics in sound design. For example, I prepared multiple explosion samples for various distances. For distant explosions I added more latency so they could be better heard, and for explosions behind walls and obstacles I damped and blurred the sound.
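One way to express this explosion handling in code: choose a distance-appropriate sample, delay it, and damp and low-pass it when it is occluded. In this sketch the “latency” is assumed to be the speed-of-sound propagation delay; the sample names, distance bands, and occlusion values are hypothetical placeholders, not figures from Lost Planet.

```python
from dataclasses import dataclass

SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C

# Hypothetical bank of explosion recordings, each used up to a maximum distance.
EXPLOSION_BANK = [
    (30.0, "explosion_near"),
    (150.0, "explosion_mid"),
    (float("inf"), "explosion_far"),
]


@dataclass
class ExplosionPlayback:
    sample: str
    delay_s: float     # propagation delay before the sound starts
    gain_db: float     # extra damping for occlusion
    lowpass_hz: float  # "blur": cutoff of a low-pass filter


def plan_explosion(distance_m: float, occluded: bool) -> ExplosionPlayback:
    """Pick a sample by distance, delay it by travel time, and muffle it if occluded."""
    sample = next(name for max_d, name in EXPLOSION_BANK if distance_m <= max_d)
    delay_s = distance_m / SPEED_OF_SOUND_M_S
    if occluded:
        # Behind a wall or obstacle: quieter and duller (placeholder values).
        return ExplosionPlayback(sample, delay_s, gain_db=-9.0, lowpass_hz=800.0)
    return ExplosionPlayback(sample, delay_s, gain_db=0.0, lowpass_hz=20_000.0)


print(plan_explosion(343.0, occluded=False))  # far sample, delayed by 1.0 s
print(plan_explosion(10.0, occluded=True))
```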
So, for about five years I was immersed in the sci-fi universe of Lost Planet 1 and 2, while all around me people were working on games like Monster Hunter, Devil May Cry and Resident Evil. This may be unique to Japanese game development, but here the sounds for a number of different games are all made on one floor. In an environment like that, I got to hear snippets of things other than sci-fi, and sometimes I got a little envious of other folks’ projects! It was around the time we were wrapping up Lost Planet 2 that I heard about Dragon’s Dogma. As a high fantasy world set in the Middle Ages, it called for very realistic sounds. Having done nothing but sci-fi, I thought this might be a bit easier; boy, I couldn’t have been more wrong! I went out of my way to make sure we created thoroughly realistic sounds for the game, but of course there were challenges: being an open-world game brought its own technical limitations and problems. At the end of the day, though, none of that matters to the player, so I concentrated on creating the realism needed to make the game world believable. Conceptually, we were going for distance and atmosphere; we wanted a sound design that felt immersive, as if you were actually there. Since Lost Planet I had been expressing myself aurally through exaggerated realism, so I did the same in Dragon’s Dogma. I also wanted to keep bringing acoustic physics into our games, so I added elements like air absorption relative to distance (basically, a low-pass filter driven by distance). Sound effects were kept as simple as possible. Where the sounds in Lost Planet were strong, sharp and clear, in Dragon’s Dogma we went with mild sounds that would be pleasing to the ear. This was to make things easier on players, as role-playing games tend to be quite long.
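A distance-driven low-pass filter of the kind mentioned above can be sketched in a few lines. The mapping from distance to cutoff (falling log-linearly from 20 kHz at 10 m to 2 kHz at 500 m) is a hypothetical tuning, not Dragon’s Dogma’s curve, and real air absorption depends on frequency, temperature, and humidity in more detail. The filter itself is a standard first-order (one-pole) low-pass.

```python
import math


def air_absorption_cutoff_hz(distance_m: float,
                             near_m: float = 10.0, far_m: float = 500.0,
                             near_hz: float = 20_000.0, far_hz: float = 2_000.0) -> float:
    """Hypothetical curve: cutoff falls log-linearly with distance between near_m and far_m."""
    if distance_m <= near_m:
        return near_hz
    if distance_m >= far_m:
        return far_hz
    t = math.log(distance_m / near_m) / math.log(far_m / near_m)  # 0 at near_m, 1 at far_m
    return near_hz * (far_hz / near_hz) ** t


def one_pole_lowpass(samples: list[float], cutoff_hz: float, sample_rate_hz: int = 48_000) -> list[float]:
    """First-order low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


for d in (5.0, 70.0, 500.0):
    print(d, round(air_absorption_cutoff_hz(d)))
```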
We also took great care to treat every sound as important; for example, sounds like footsteps or cutting into enemies were all recorded outdoors to eliminate the reflections and ambient noise characteristic of rooms. All of the sound designers on the project shared the understanding that we were making very organic sounds.
-You mentioned to me that your sound team has 60 people including composers, sound designers, engineers and audio producers. How many projects do they work on simultaneously?
We work on games released on a number of platforms, so in a given year we can work on 30 to 40 titles in total, with 10 projects running simultaneously.
-How many audio people work in each discipline? (sound designer, composers) How do you keep the entire 60-person team communicating with each other effectively?
Altogether we have 30 sound designers, 10 composers, 5 sound programmers, 6 audio producers and 3 mixing engineers―oh, and then there’s production management. As I mentioned earlier, we all work on the same floor, so you’ll have everyone from newbies to old-hands all in one space but divided up into project-based islands. This is great because if you have a question you can just walk right over to someone who knows about it and ask. Actually, we’ve built an environment where knowledge can flow freely; each week we hold role-based meetings for knowledge sharing where all of the designers or composers will each get together and discuss their work or new technologies. We also have web resources for knowledge sharing, where information can be shared and accessed in real time. We post all sorts of things: project post-mortems, reports on business trips, information about current projects, and information about business partners. It’s all a part of the sound production development strategy. It’s an environment that keeps all of our members on the same page, but which also inspires a little friendly competition. We all use the same proprietary audio middleware as well, so whatever the project is we can shape things to fit. The same goes for improvements: if a given team member has an idea, we can hash it out and add it in. Our designers and engineers are all involved in production, so we can make decisions and develop tools and applications quickly.
-Japanese games typically have fewer assets and less randomization for some sounds than Western games do. For example, Western games tend to have much more varied footstep sounds. What is your opinion of this difference in approaches?
Well, you could say we’ve developed a mindset where sounds are thought of as symbols, but explaining that further requires looking at the long history of Japanese games. Back in the days of the NES, very simple sounds generated by a PSG (programmable sound generator) would play when the player pushed a button. Animation has also flourished in Japan, and thanks to that medium’s prolific use of sounds as symbols, symbolic sound effects have been ingrained in our minds. So that is our starting point. Even when we moved on to sample-based production, we were constrained by the memory limits of the PS1 and PS2, which effectively cemented the idea that a limited number of symbolic sounds could serve as game sounds. Take footsteps: it’s very important to convey walking or running to the player. For the character’s movement to feel right, we need sounds that convey ‘Left! Right! Left! Right!’ rhythmically, so we make the footsteps go ‘Clip! Clop! Clip! Clop!’ Western game development progressed differently, though. By the time Western developers started to become dominant, sampling technology was readily available, so it seems only logical that they would pursue greater realism and more natural variation in sounds.
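The two footstep philosophies described here can be contrasted in a short Python sketch. The sample names are hypothetical. The “symbolic” version alternates two fixed sounds to stress the Left/Right rhythm, while the “variation” version picks randomly from a per-foot pool and avoids immediately repeating the same take, a common technique assumed here for illustration rather than taken from any particular studio.

```python
import random


def symbolic_footsteps(steps: int) -> list[str]:
    """Symbolic approach: two fixed sounds alternate, Left/Right as "clip"/"clop"."""
    return ["clip" if i % 2 == 0 else "clop" for i in range(steps)]


def varied_footsteps(steps: int, left_pool: list[str], right_pool: list[str],
                     rng: random.Random) -> list[str]:
    """Variation approach: random take per foot, never the same take twice in a row for that foot."""
    last = {"L": None, "R": None}
    out = []
    for i in range(steps):
        foot = "L" if i % 2 == 0 else "R"
        pool = left_pool if foot == "L" else right_pool
        choices = [s for s in pool if s != last[foot]] or pool  # fall back if the pool has one take
        pick = rng.choice(choices)
        last[foot] = pick
        out.append(pick)
    return out


print(symbolic_footsteps(4))  # ['clip', 'clop', 'clip', 'clop']
print(varied_footsteps(6, ["dirt_L1", "dirt_L2", "dirt_L3"],
                       ["dirt_R1", "dirt_R2", "dirt_R3"], random.Random(7)))
```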
I think Western games are less concerned with symbolic sounds and put much greater importance on creating an immersive, realistic sound experience. Personally, I think the difference in the use of sounds as symbols is most salient in the UI and menus. Whereas in Japan we use very sharp, granular sounds, Western developers favor subtle noises that won’t detract from the game experience. Sound effects in games are vital for players to judge a situation, feel exhilaration, grasp the results of their actions, and gain an overall sense of accomplishment, and I feel they are something to be judged subjectively. I think Japanese games do a better job of really making you hear the sound effects, while Western games focus more on the overall experience, but of course this could just be a matter of personal taste. One could even argue that sounds in games might need to be localized for different markets because of differences in living environments and in how people view the world. In Japan, for example, space is tight, so you can’t have the volume cranked up in everyday life; maybe that’s why we needed to make small, sharp, distinctive sound effects for games.
-You recently spoke about the Interactive Reverberator at the 2012 AES Convention. It is implemented in Capcom’s MT Framework. Could you explain a little about the Interactive Reverberator and your experiences with it?
The Interactive Reverberator is being developed through an industry-academia collaboration between Capcom, a professor of architectural acoustics, and studio design researchers. In simple terms, it is based on statistical acoustics: given a room’s volume, surface area and average absorption coefficient, it can derive the reverberation of that space. In architectural acoustics this is used as a first step in designing halls, and we began this project figuring that it could also be applied to the virtual space of a game. Reverb in games is often set just as you would with a general-purpose plug-in, by adjusting reverb parameters. However, when you have many different stages that each need this, it can become quite a chore. With the Interactive Reverberator, you will be able to set the reverb for a given area simply by entering the room volume, surface area, and average sound absorption coefficient. If you’re so inclined, you can even pull the spatial data from the DCC tools, such as Maya, that your background design team is using. In other words, you can construct the reverb for an in-game space as soon as that space has been created. It can handle repeated changes on the fly, including immediately adjusting to changes in the environment such as walls collapsing. It works on the same principle as IR reverb, but happens instantly in-game based on the three inputs described above. As you might expect, it requires quite a bit of processing power, so we’re testing it on a high-spec PC running the MT Framework, though I feel it should be ready to debut on the next generation of consoles.
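The statistical-acoustics idea described above can be illustrated with the textbook reverberation-time formulas of Sabine and Eyring, which take exactly the three inputs mentioned: room volume V, total surface area S, and average absorption coefficient. This is a minimal sketch; the interview does not say which formula Capcom’s Interactive Reverberator uses. The comb-filter feedback conversion at the end is a standard Schroeder-style relation, included only to show how an RT60 could drive a conventional reverb parameter.

```python
import math

SABINE_K = 0.161  # s/m, SI units, air at roughly room temperature


def _check(volume_m3: float, surface_m2: float, mean_alpha: float) -> None:
    if volume_m3 <= 0 or surface_m2 <= 0 or not 0.0 < mean_alpha < 1.0:
        raise ValueError("need V > 0, S > 0 and 0 < mean absorption < 1")


def rt60_sabine(volume_m3: float, surface_m2: float, mean_alpha: float) -> float:
    """Sabine: T60 = 0.161 V / (S * alpha). Best suited to fairly live rooms."""
    _check(volume_m3, surface_m2, mean_alpha)
    return SABINE_K * volume_m3 / (surface_m2 * mean_alpha)


def rt60_eyring(volume_m3: float, surface_m2: float, mean_alpha: float) -> float:
    """Eyring: T60 = 0.161 V / (-S ln(1 - alpha)). More accurate for absorbent rooms."""
    _check(volume_m3, surface_m2, mean_alpha)
    return SABINE_K * volume_m3 / (-surface_m2 * math.log(1.0 - mean_alpha))


def comb_feedback_gain(delay_s: float, rt60_s: float) -> float:
    """Feedback gain that makes a comb filter with this delay decay by 60 dB in rt60_s."""
    return 10.0 ** (-3.0 * delay_s / rt60_s)


# Hypothetical 10 m x 8 m x 4 m room with an average absorption coefficient of 0.2.
volume = 10 * 8 * 4                      # 320 m^3
surface = 2 * (10 * 8 + 10 * 4 + 8 * 4)  # 304 m^2
t60 = rt60_sabine(volume, surface, 0.2)  # about 0.85 s
print(round(t60, 2), round(rt60_eyring(volume, surface, 0.2), 2), round(comb_feedback_gain(0.0297, t60), 3))
```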
In game audio production, it’s important to balance the volume of work with the limited production timeframe, and honing efficient production processes is becoming more and more important. As with music and film audio post-production, experience and gut feeling play a large part in game audio production. The most appealing thing about game audio production, though, is that while it’s tied to a visual product, you can use programming to create audio effects in the game world, such as panning and sound damping with distance. Perhaps it’s natural, however, for those who rely mostly on experience and instinct to shy away from that sort of thing. There’s also the problem that acoustic physics is, by and large, not well understood. I would venture to guess that only a handful of developers have tried out sound damping with distance, working out that each doubling of distance gives a level drop of 6 dB, or what have you. The reality, however, is that this doesn’t work properly. That rule assumes an ideal point source, and it breaks down because a true point source doesn’t exist in real space.
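The 6 dB figure comes from the inverse-square law for an ideal point source, the same idealization described above as breaking down in practice. With intensity I falling off as 1/r² and levels measured relative to a reference distance r₀:

```latex
I(r) \propto \frac{1}{r^{2}}
\;\Longrightarrow\;
L(r) = L(r_0) - 10\log_{10}\!\left(\frac{r}{r_0}\right)^{2}
     = L(r_0) - 20\log_{10}\frac{r}{r_0},
\qquad
L(2r_0) = L(r_0) - 20\log_{10}2 \approx L(r_0) - 6.02\ \mathrm{dB}
```

So each doubling of distance lowers the level by about 6 dB.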
We know far too little about how acoustics should behave in response to events in the game space. Taking the point source as an example again, one could argue that since it doesn’t work, there’s no point in incorporating acoustic physics into a game at all, which leads naturally to the conclusion that experience and instinct are best after all. Here at Capcom, however, we have spent the last few years working to incorporate acoustic physics into our games as a way to reproduce a given sound quickly in-game and therefore increase efficiency. Starting from a small number of preset acoustic-physics parameters, we can turn to our creatives’ experience and instinct for the fine tuning. After all is said and done, it’s the human ear that really matters. According to our professor, even after the reverb has been calculated for a hall’s design, the finishing touches have to be made by an acoustic engineer. But even though acoustic physics might not be perfect, using its principles means that the time we would have spent implementing correct acoustics for an ever-growing amount of in-game material, and getting the right reverb and distance damping, can now be spent on more creative work. I feel that this technology is the future of game audio design, but also that human feeling and instinct will remain the most vital element in the mix.
-Many Western-developed AAA games use audio middleware such as FMOD and Wwise for their sound engines. Japanese developers seem to prefer their own tools, such as Capcom with its MT Framework. Why do you think Japanese developers prefer to make their own custom tools rather than use third-party software? What is your opinion of middleware such as Wwise and FMOD?
I can think of a number of reasons. Most major Japanese publishers are also developers, and historically these companies were all rivals with proprietary engines, so some of that background must still play a part. Also, I reviewed Wwise around the time it launched, but there wasn’t much merit to switching over then, as I couldn’t see much difference between it and our in-house sound drivers, so that could be another reason. Further, many companies here have been using CRI’s ADX audio codec and their video codec since the PS1 days. CRI’s Japanese tech support is really excellent, which could be one reason they’re so widely used, and their audio codec itself is truly top-notch. Setting aside the finer details, there isn’t a huge difference in what sound drivers can do, and even more important than the drivers are the applications used to implement audio into the game (for example, to add sounds to animations, set environmental sounds, and so on). We make all of our games with the MT Framework at Capcom, so the sound drivers are common to all projects, and our applications are modular so that we can implement what’s needed on a per-project basis. We also optimize things for the MT Framework itself. We are in the same building as the MT Framework developers, and we have both creatives and sound programmers working on sound production, so we can handle application development and driver updates efficiently. This means we can customize everything to suit our production needs perfectly. Wwise and FMOD are really great, but if you think about integration with our game engine and the continuation of our in-house development culture, there are a lot of good reasons to use our own middleware. Since we do both publishing and development, we need to view production from a long-term operations perspective.
I’ve already talked about the Interactive Reverberator, but to implement something like that you need to optimize the processing so it can read real-time information from the game scene (room volumes, surface areas, and the average sound absorption coefficient) and then reflect all of that data in the sound. This is something that could only be achieved with internally developed tools. These tools represent a unification of the game engine and audio production, and I feel they will play a major role in improving quality going forward. Sound design would be even more enjoyable if we had a lightweight, high-quality multi-platform codec, or if we could bring in more third-party plug-ins, link to external hardware controllers, and handle multiple outputs.
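Reading scene data and reflecting it in the sound largely comes down to recomputing the reverb inputs whenever geometry changes. The sketch below derives the average absorption coefficient as an area-weighted mean of per-surface coefficients, and treats a collapsed wall as an opening that absorbs all incident sound, a common simplification in architectural acoustics. The surface names, coefficients, and the `collapse` helper are hypothetical; this is not the MT Framework implementation.

```python
from dataclasses import dataclass, replace

SABINE_K = 0.161  # s/m, SI units


@dataclass(frozen=True)
class Surface:
    name: str
    area_m2: float
    alpha: float  # absorption coefficient: 0 = fully reflective, 1 = opening


def mean_alpha(surfaces: list[Surface]) -> float:
    """Area-weighted average absorption coefficient."""
    total_area = sum(s.area_m2 for s in surfaces)
    return sum(s.area_m2 * s.alpha for s in surfaces) / total_area


def rt60(volume_m3: float, surfaces: list[Surface]) -> float:
    """Sabine reverberation time from the current surface list."""
    total_area = sum(s.area_m2 for s in surfaces)
    return SABINE_K * volume_m3 / (total_area * mean_alpha(surfaces))


def collapse(surfaces: list[Surface], name: str) -> list[Surface]:
    """Hypothetical game event: the named wall is gone, so treat it as an opening (alpha = 1)."""
    return [replace(s, alpha=1.0) if s.name == name else s for s in surfaces]


# Hypothetical 10 m x 8 m x 4 m hall.
room = [
    Surface("floor", 80.0, 0.10), Surface("ceiling", 80.0, 0.30),
    Surface("north", 40.0, 0.05), Surface("south", 40.0, 0.05),
    Surface("east", 32.0, 0.05), Surface("west", 32.0, 0.05),
]
before = rt60(320.0, room)
after = rt60(320.0, collapse(room, "north"))
print(f"RT60 before: {before:.2f} s, after the north wall collapses: {after:.2f} s")
```

This only covers the parameter update; a reverb processor would still have to render the new decay time.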
-How do you see the sound in Japanese and Western video games today? What could be improved, and what do you think the next step is?
While this isn’t limited to sound, Western games really shine with their use of well-orchestrated, powerful spectacles. Japanese games, or perhaps just Capcom games, are very good at paying close attention to the finer details. We refine graphics and sound down to their smallest parts in order to satisfy players who are accustomed to Hollywood productions. Leaving aside the question of whether this is good or bad, even though the finer points are taken care of, sometimes our big scenes are not executed as well as they could be. Unnatural sounds can deeply sway a user’s impression of a game. In order to make the game more immersive, we adjust everything down to the finest detail―the difference in sound levels and dynamics for in-game audio and cutscene audio, the reverb and acoustic pressure in the dialog, the quality of the sound effects, the mix, and so on. However, there is one critical area which we should improve―the dialog. Getting a good performance from the game’s characters, which means developing in a foreign language and then carrying out a perfect localization, can be very difficult. This issue goes beyond sound production though; it covers the entirety of the project from prototyping to script design and voice over, right through to implementation. I’d like to design sound that can be used as a performance for an in-game character, and also makes the player feel as if they are actually there in the game world. Movie audio is made up of performances that have been recorded via a microphone, but game audio is made up of sounds that are heard in the game environment―I’d like to make this difference one of the distinctive characteristics of game audio. I think the next step is to optimize the ways in which game audio can express distance, space, echo, and reverb via acoustic physics while leaning on the methodologies used for TV and film performances, or even taking a psychoacoustic approach, in order to bring the entire production to a higher level.
Audio production is composed of numerous elements, and while it overlaps with movie or television post-production it can also incorporate physics. At Capcom, our game engine and middleware allow for a high level of integration between performances and physics. This lets our creative staff work efficiently and concentrate on expressing themselves. As I mentioned, every aspect of the workflow is important; quality is assured by distributing the most appropriate resources to all areas of production, which in turn breeds high levels of user satisfaction. Being both a developer and a publisher means we are directly involved with everything, from the early phases right through to promotion; we have an awareness of how things are at all of the different stages of production, and this allows us to maintain a sense of cohesiveness. Using the approaches we’ve developed, we can produce sounds that are even better than the very best things that we’ve produced up to now. I work every day hoping that something we create could set off a new trend in the field of sound production. I can say with confidence that Capcom could do that because we represent the best of what Japan can do in game sound production. We are evolving all the time, so I encourage you to check out what we are up to―and I’d like to emphasize that beyond designing sound, we are making games.
-What are some of your favorite video games overall and in terms of sound?
Because games are a multimedia entertainment package, it’s a bit difficult to think of them only in terms of sound, but I found BioShock to be a really well-made, cohesive title with an immersive world. I had always wanted to make a game where the music came from the game environment, so it was somewhat frustrating to have someone beat me to the punch! It’s truly an excellent title. Other than that, it’s tough to single out one game that I really love, though most AAA games are really enjoyable as entertainment. Also, although they may not have had the best sales numbers, over the years I’ve played a number of Japanese games that I would call works of art rather than products; I really enjoy those kinds of games, and they stick with me.
Special thanks to Tomoya Kishi for taking time out of his very busy schedule to answer our questions, and to Capcom for providing video, pictures, and translation services. You can find more information about the Capcom Sound Team at their official site, which even has some English-language posts related to Resident Evil 6.