Guest contribution by Martin Roth
We’ve all heard the promises of procedural game audio. A veritable Valhalla where sounds are created out of thin air, driven by the game engine, eliminating the need for huge sample libraries and tedious recording. Sounds great! So why aren’t we hearing more of it in games today? We’ve all experienced Rockstar’s work in GTA 5; those bicycles sure do sound great! Some indie games such as Fract or Pugs luv Beats have dabbled. But if procedural audio were all that it promised, it would be much more common. What’s the deal?
The hard truth is that while the idea is great in theory, no one knows what they’re doing in practice. The field is lacking in design principles, tools, and technical performance. This is especially true considering the end-to-end workflow. On one end, high-level tools are needed to give designers the flexibility to explore sound and its interactions. On the other, low-level tools are needed to make those creations available where they’re needed, be that on the desktop, mobile, console, embedded systems, web, or anywhere else. The end-to-end workflow is key to the adoption of procedural audio.
For the purposes of this article the terms procedural, generative, and interactive as they relate to sound and composition will be used interchangeably. Their distinction is important, but we’ll leave that for another article.
Scarce Design Resources
The field suffers from a lack of resources for learning how to make procedural audio, and from a lack of standards for judging its merits. Undoubtedly the best learning resource is Andy Farnell’s book Designing Sound. Its presentation focuses on design from first principles, but may leave those without a technical background struggling to follow the reasoning (don’t let that stop you from reading it!). The book is written for clarity, not for absolute performance or maximum sound quality. Beyond it, resources are scattered; the gap is usually filled by personal interest or continued education on the topic.
Tools, Well, Almost
Undoubtedly there are many excellent tools available for designing sounds, especially musical ones. A nearly fifty-year history of electronic music has created a wealth of knowledge, best practices, and interfaces for exploring sound. But here the end-to-end argument is critical. Unless the designer can run the sounds on the target platform, the tools are of no help except as part of the creative process.
To satisfy this requirement, the available tools are generally limited to audio programming languages (or even general-purpose programming languages). These include Pure Data, Max/MSP, SuperCollider, Csound, ChucK, C/C++; the list goes on. Many of these have robust and knowledgeable communities supporting them. All of these tools allow the user to “do stuff” with sound, but how well they meet the needs of sound designers is debatable. Many would say the learning curve is far too steep. The target audience for these tools has typically been those more interested in experimental work.
This leaves us in the difficult situation where the ideal solution is fragmented between tools that satisfy the high-level design requirements and those that satisfy the low-level technical requirements.
Low-Level Really Is Low
But let’s suppose that you’re able to design a sound you’re excited about, with the interactions you want, using the tools available. Now you need to embed that asset in the environment where it will run. Pd’s BSD license makes it very flexible. Max/MSP’s gen~ code export is a great step in the right direction. SuperCollider, on the other hand, is open source under the GPL, making it practically incompatible with almost all interesting platforms. And even if you manage to embed your work, chances are that performance will be poor.
As sound designers we dream of bigger technical budgets, but in the meantime we may be better served by making our assets as efficient as possible. We want the polish of a high-level tool like NI’s Massive for designing sounds, the interactive possibilities offered by Max/MSP, and the efficiency of an optimised native implementation. No toolchain comes close.
The Only Way Is Up
So where does that leave us? A dearth of design resources, incomplete tools, and poor performance. That’s why procedural audio hasn’t taken off. How can we improve the situation? Where do we see ourselves as sound designers, not only in games, but for any application, in the next few years?
Design Principles For Procedural Audio
Starting from the beginning, clear design principles are needed. Whereas sound design is typically only judged on its aesthetic merits, procedural sound must also be judged on its technical merits. An incredible sound isn’t useful if it uses up all of the CPU budget. A lightweight process that sounds bad isn’t helpful either.
From the aesthetic perspective, a lot can be learned from the world of music synthesis. There are many resources available teaching how to make certain kinds of sounds with various synthesizers, including large communities of musicians. These are typically musical sounds like bass drums, the Hoover, or vocal pads, and not the kinds of sounds that games would be interested in, like gunfire, footsteps, or environmental effects. But the aesthetic principles are the same and skills in one practice translate easily to the other.
Of course, musicians often have dedicated hardware for their sounds in the form of synthesizers. Game sound designers do not. Unfortunately, the knowledge necessary to generate sounds in a lightweight manner is typically reserved for those with deep technical knowledge of computer architecture and digital signal processing. Those techniques need to be either taught, or simplified and made accessible to a wider audience (preferably both).
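To give a flavour of how little computation a useful procedural sound can take, here is a toy sketch (not from the article, and in plain Python purely for readability): a wind-like texture made from white noise passed through a one-pole low-pass filter whose coefficient is slowly modulated to suggest gusts. The cost is a handful of multiplies and adds per sample; the specific constants are invented for illustration.

```python
import math
import random

def wind(num_samples, sample_rate=48000, seed=1):
    """Toy procedural wind: white noise through a one-pole low-pass
    whose coefficient is slowly modulated by a 0.25 Hz sine ('gusts').
    Costs only a few arithmetic operations per sample."""
    rng = random.Random(seed)
    out = []
    y = 0.0  # filter state
    for n in range(num_samples):
        # slow modulation keeps the coefficient between ~0.90 and ~0.99
        a = 0.945 + 0.045 * math.sin(2 * math.pi * 0.25 * n / sample_rate)
        noise = rng.uniform(-1.0, 1.0)
        y = a * y + (1.0 - a) * noise  # one-pole low-pass
        out.append(y)
    return out

samples = wind(48000)  # one second of audio at 48 kHz
```

In a real engine the same loop would be written in optimised native code and driven by game state (wind speed, shelter, surface), but the structure is the same: a tiny program standing in for megabytes of recorded ambience.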
For these two merits to come together, community resources such as books, forums, and showcases are needed. Open source sound is the order of the day. Forums are needed for practitioners not only to ask questions, but also to showcase their work. The questions and presented sounds can be collected into an ongoing list of best practices. Designers such as Anton Woldhek and Graham Gatheral have already taken it upon themselves to post their own work, but such examples are rare and hard to find. It’s also worth mentioning that many impressive techniques exist in academia, but they rarely find their way into practice because there isn’t a professional community to receive or translate them.
New Challenges, New Tools
New tools must be written specifically to address the needs of creating interactive audio. To be clear, it isn’t just about putting synthesizers in video games. It’s a shift in thinking about audio and audio production from linear to non-linear. All musicians and sound designers are familiar with traditional linear-time DAWs. The tools we need must build the bridge from the linear DAW, not in the direction of performance tools such as Ableton Live, but in the direction of design without a timeline.
It is important for these new DAWs to make traditional sound design skills portable. This means being able to work at a very high level, with a great deal of visual feedback (e.g. spectrograms, per-connection preview, frequency responses for filters). All underlying complexity must be abstracted away, a detail that most existing audio programming language IDEs ignore.
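The linear-to-non-linear shift above can be made concrete with a tiny hypothetical sketch (the function, parameter names, and numbers are all invented for illustration): a linear asset is fixed data played back on a timeline, whereas a procedural asset is a function of game state, evaluated whenever the engine asks for it.

```python
def footstep_gain(surface, speed):
    """Hypothetical procedural parameter mapping: the loudness of a
    footstep is computed from game state (surface type, movement speed)
    rather than read from a fixed clip. All values are illustrative."""
    base = {"grass": 0.3, "gravel": 0.6, "concrete": 0.5}[surface]
    return min(1.0, base * (0.5 + speed))

# Evaluated per event, not per timeline position:
running_on_gravel = footstep_gain("gravel", 1.0)
sneaking_on_grass = footstep_gain("grass", 0.0)
```

The design question for new tools is how a sound designer expresses mappings like this one visually and audibly, without writing code.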
88 Miles Per Hour
And finally, the DAWs cannot constrain the designer to the desktop environment. The designs must be exportable to any interesting platform: embedded systems, desktop, mobile, or even the web, in the form of libraries, plugins, standalone executables, and applications. The key requirement is performance: floating-point operations (flops) per sample.
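To make “flops per sample” concrete, here is a back-of-the-envelope sketch. The budget figures are assumptions chosen for illustration, not numbers from the article or from any particular engine.

```python
# Illustrative per-sample budget arithmetic (all figures are assumptions).
sample_rate = 48_000        # samples per second
cpu_budget_flops = 50e6     # suppose audio is allotted 50 MFLOPS of CPU
voices = 16                 # simultaneous procedural sounds

flops_per_sample = cpu_budget_flops / sample_rate   # total ops per sample
flops_per_voice = flops_per_sample / voices         # ops per voice per sample

print(round(flops_per_sample), round(flops_per_voice))
```

Roughly a thousand operations per sample across all voices, and only a few dozen per voice: enough for a filtered-noise texture like the wind example, but far too little for a naive physical model. This is why code export from a high-level design tool must produce genuinely optimised output.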
Reminiscing About The Future
Technology always seems to travel in circles, or at least spirals. The opportunities and challenges presented by procedural audio today offer parallels to the development of modules and trackers for computer game music in the late 1980s. At the time it simply wasn’t possible to deliver hundreds of megabytes of compressed audio, so music and effects had to be created locally. And today, all of the recorded audio in the world can’t make a world sound alive. Could it be that “in the future” we won’t be downloading raw music data off the internet at all? Perhaps we will go back to the future when modules reigned supreme: the data is the program. Sound won’t just be replayed from bulk data; it will be executed as a program.
Martin Roth is co-founder of London-based Enzien Audio. Enzien are the creators of the Heavy audio compiler, a tool focused on solving the performance problem of procedural and interactive audio. He has previously served as CTO of RjDj, software engineer at Google, and researcher at Deutsche Telekom Laboratories. He earned his PhD in electrical engineering from Cornell University. Find him online at @enzienaudio and @supersg559.