Procedural Audio: Interview with Andy Farnell
[Continuing with the procedural audio series...]
Andy Farnell – a familiar name in computer audio – is a computer scientist, sound designer, author and a pioneer in the field of procedural audio. He is a visiting professor at several European universities and a consultant to game and audio technology companies. His book, ‘Designing Sound’, is a bible for procedural sound and should be on your bookshelf, if it isn’t already!
He was kind enough to find time in his busy schedule when I visited London, and we talked about what procedural audio is, where it stands now and what it can be in the future. This article is a transcription of our conversation, which he again kindly helped me edit. It was no easy task because there was so much good content!
Thank you Andy!
DS: Where does Procedural Audio stand now? Would you say it is comparable to where CGI was in the 70s/80s, when computers weren’t powerful enough?
Andy: That is a central mythology – that computers aren’t powerful enough to do it. Skeptics often bring it out as a straw man argument against Procedural Audio. One of the things I did with my 2005 demo was to make all of the sounds (they weren’t very high in quality) that you would need for a first person shooter game – fire, water, wind, rain, some animals, some footsteps, some guns, some vehicles. This was 2005, and I had them all running on a 533 MHz processor, generating a realistic-ish sort of soundscape, to prove that if you had a 1 GHz processor and used half of it for the graphics, it would be quite possible to synthesise all the sounds using the remainder. Six years after doing that, people would still come to me with this straw man argument. They would say, “You know Andy, we love this Procedural Audio stuff but there’s just not enough CPU available”. But we now have two to the power of five (32) times more CPU than when I did my 2005 proof-of-concept demo. So, what’s behind that? Why are they saying that? It’s not true. What happens is the internal politics of resources. The requirements always expand to fit the resources available. The game worlds get bigger and bigger and the graphics get more and more demanding. The audio team will always have the least amount of CPU allocated to them, as an afterthought, because in the current structural model of production sound is “post production”, and nobody wants to commit to giving audio that much CPU bandwidth. I feel that is the real reason behind the argument. These straw man arguments enter into a culture and just get recycled. People know that there is an argument, it comes to their tongue very quickly, and they say “Yes we could do it but there is not enough CPU”. With the leftover CPU on a modern games console I could provide you great procedural sound. On an eight core architecture, we would need one or two CPU cores to give procedural sound.
Even more interesting is what happens when we run models on the GPU, since many Procedural Audio models are inherently parallelisable. So, yes, Procedural Audio is somewhere in that era before the Tron movie, or before the Pixar CGI revolution: it’s possible, but not yet seen as viable. Perhaps the shift is too painful for big companies to make.
DS: Have you tried doing a similar demo using today’s technology?
Andy: No. I just don’t have time at the moment. Life is moving so quickly and I’m involved in so many other interesting projects, some of them to do with computer science, some of them to do with philosophy. Interesting times. But in the future I want to have a research department with a bunch of really smart guys who are just on this and want to do it, and I can help direct their research. Because when you look at what Procedural Audio breaks down into, it’s actually very deep specialisations – just like CGI. There is room for intense talents within the area. Let us use the CGI analogy again – if you are a really good texture artist you are great at looking at skin and saying, “That is the skin of a 40 year old, that is the skin of a particular kind of salamander, look at the way the bone structure moves underneath it, look at the way the light hits it”. Within Procedural Audio you get specialists who are very good at fluids; they are great at doing waterfalls and drops of water and boiling mud and lava. They understand that sound. They are able to model it and come up with great sounding objects and great processes that produce it.
Before I took up this umbrella term, this banner of Procedural Audio, and tried to make a focussed idea out of it, I had mentors – I mean people I looked up to, leaders with ideas that nobody else in industry or academia was working on, people like Perry Cook. He is like the grandfather of Procedural Audio. He was doing it in the early 90s, when the argument that there wasn’t enough CPU really was a good argument [laughs]. After him came Dinesh Pai and Kees van den Doel, who worked on impacts and fluids. They did that as very narrow academic work. I don’t know if they saw the generality of the possibilities: that the water could be taken and integrated with a glass so we get an object that could be filled up or emptied, or become raindrops in a particle-based weather system that interacts with the different objects the rain falls on… The object-object interaction based idea of “sounding objects” really came out of the North Italian schools; otherwise we just have event-driven sample playback. But they did extremely good work, and a lot of my stuff is just interpreting their work, generalising and extending it, and making a coherent philosophy of sound as process rather than data. We must always be mindful of that background. It didn’t just pop out of the air. It is a project that has been growing slowly in the background since Mathews in the 1950s. If anything, I have been a very vocal advocate of these ideas applied to the general case of everyday sonic simulation, and been instrumental in defining what procedural audio is.
DS: So the obstacles aren’t purely technological?
Andy: I don’t know what the “real obstacles” are now. I’ve said in another interview that around 2006-2007 it dawned on me that there weren’t any fundamental obstacles to radical technical progress. We could do this. The obstacles were structural and political. How do you introduce a new technology? How do you get people to take risks on it? One of its weaknesses, which is also a very deep philosophical strength (and this is quite subtle), is that sound as data fits into a capital model, and procedural sound does not. Intellectual property allows you to own a sound asset. So if you record or create a sound, it is an asset that you own. You can trade assets. But the procedural model breaks with the ownership model, because what you are doing is substituting fixed assets with general sounding objects, from which we can make a million sounds in the future. There is no redundancy built in.
DS: Although, if a game developer spent time and resources building a procedural audio engine, as they would spend time building an audio engine, wouldn’t it be an asset that could be owned?
Andy: The observation there is that the code is the asset. But the code is useless without a group of people who understand how to make it sing. We move the value from residing in the thing itself to how it is used. I see procedural audio as an art to be practiced, not just an application layer to be built.
DS: So even with Procedural Audio you will need a sound designer to understand what it can sound like and how it can impact a player/end user?
Andy: I think this was on the game audio forum or something years back. Someone raised the accusation at me that, “Guys like you put people like us out of business and you are making technology that is going to replace our art”. I took that on board as a very valid point. Being a sound designer myself, the last thing I want to do is put other sound designers out of work. I see it as a liberating step – you have your sound samples and you have this. I always see it as a complementary technology and not a replacing technology; that is point number one. Point number two, which is more important, is that every new technology that comes along generates a need for new skill sets, which the talented people in that business become really good at. So every Procedural Audio team would need a good sound designer. I wouldn’t leave it to the programmers. I want somebody who has a great set of ears, and I would actually put them in a higher position and get them to direct the programmers and say, “No, it’s more like this, listen to these examples. I want to get this emotion across”; they can direct it aesthetically. It’s not really putting sound designers out of work, and it is not a totalitarian project. This is why I worry that the bean counters, the alienating/asset-oriented capitalists, are seduced by this kind of technology, because they just think, “Well, we plug that in and we get rid of the sound department”. That’s not what I want to see happen. One of the great advantages is that it gives you 90% of your assets for free. You just put your objects in the world and you get default sounds. What that means is that you don’t have to worry about an asset-event matrix anymore.
You don’t have to worry that somebody has forgotten to put a sound on something, because everything will be covered by default. Now the sound designer is freed from thinking up every single little rock sound and can focus on the emotionally significant sounds – the hero’s sword, the getaway car, the gun sound. They can put all their time and energy into getting those right and not have to worry about the other stuff. That’s another argument for Procedural Audio. It raises the bar you start from.
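Andy’s point about default sounds can be illustrated with a small modal-synthesis sketch in Python, in the spirit of the impact models by van den Doel and Pai that he mentions earlier (this is not code from them or from Andy). An impact is rendered as a sum of exponentially decaying sinusoids whose frequencies and decay rates come from a material preset, so any object tagged with a material gets a plausible sound without an authored sample. The `MATERIALS` table, its mode ratios and decay rates, and the `impact` function are hypothetical, illustrative choices.

```python
import math

# Hypothetical material presets: mode frequency ratios (roughly those of a
# free-free bar) and a base decay rate in 1/s. Illustrative values, not measurements.
MATERIALS = {
    "wood":  {"ratios": [1.0, 2.76, 5.40], "decay": 30.0},
    "metal": {"ratios": [1.0, 2.76, 5.40, 8.93], "decay": 3.0},
}

def impact(material, base_freq, strength, duration=0.5, sample_rate=44100):
    """Render an impact as a sum of exponentially decaying sinusoids (modal synthesis)."""
    preset = MATERIALS[material]
    n = int(duration * sample_rate)
    out = [0.0] * n
    for k, ratio in enumerate(preset["ratios"]):
        freq = base_freq * ratio
        if freq >= sample_rate / 2:
            continue                           # skip modes above Nyquist
        amp = strength / (k + 1)               # higher modes start quieter
        decay = preset["decay"] * (k + 1)      # and die away faster
        for i in range(n):
            t = i / sample_rate
            out[i] += amp * math.exp(-decay * t) * math.sin(2.0 * math.pi * freq * t)
    return out
```

In a game, an object’s size could set `base_freq` and the collision speed `strength`, so that a crate and a girder dropped into the level both sound without anyone placing a sample on them.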
DS: So, Procedural Audio is just another tool in the arsenal of a sound designer? A combination of different techniques?
Andy: Procedural Audio is a philosophy about sound being a process and not data. In its broadest sense, I would say it is the philosophy of sound design in a dynamic space, and the method is as irrelevant to Procedural Audio as whether you use oils or water colours is to painting, or whether you use papier-mâché and glue or whatever is to sculpture – the end is in the artist and not the method. So you can mix and match the methods; they exist separately. In the industry now, all the successful Procedural Audio is mixed methods based on samples, mostly granular methods. The exception, as far as I can see, is Nick Fournel’s work, which is basically a similar kind of resynthesis, but phaselet or PVOC (phase vocoder) type resynthesis. First let’s see why that happens. There is obviously a clear bridge there between existing technologies and the direction that Procedural Audio can go in. It gives you an immediate start. You can use your existing sample libraries and your guys out in the field. I incorporate this into my understanding of Procedural Audio as: your sound guys now do analytical recording, not for the purpose of using those recordings as final products but for exposing and analysing the sound underneath, so you can build your procedural model. Granular methods are a very direct way of doing that. You just take the input sound, bust it up into its component waveforms, and then resynthesise them as grain clouds in different ways. Or, in the phase vocoder or linear predictor sense, you split them up into transient-exciter components and resonant parts. In that sense it is a direct resynthesis. Now these approaches, which have an analysis part and a synthesis method, in effect have a one-to-one mapping. So you are doing resynthesis, but you can fiddle around with the parameters in the middle. I call this a shallow or phenomenal approach.
What it means is that the way in which you can change the sounds is limited mostly by your understanding of the parametric interface of the resynthesis method. The resynthesis method doesn’t capture any of the physics; it doesn’t capture any of the process of the sounds. Whereas when building a procedural model, where you have got a model and a method which complements the model, you are interested in the behaviour of the sound that is built into your model. That is not phenomenal. That is not surface; I call that essential or deep. Guys doing that kind of thing include Zheng and James; they are using really accurate models. They are basically doing what computer scientists mean when they say computer modelling. They are using fluid dynamic models to model fluids. They need a rack of computers that take days to process a few seconds of sound, so it’s not practical Procedural Audio as would be used in games, because it does not meet the real-time criteria for a start. At least not yet.
But having said that, I also think there is another side of (non-real-time) Procedural Audio which is not in computer games. It’s in animation, where the idea is that once we have introduced the sounding objects as models into the scenes, sound ceases to be post production. You re-arrange the objects and their behaviour in your scene to do your visuals, and the sounds come out for free. You can even change the location of the microphone virtually after the fact. This is the future of cinematic sound, CGA or computational audio. The best thing about this is that we can drop the real-time constraint and trade speed for quality. Why not have thunder rendered with a million N-waves, which sounds more like the real thing but takes a long time to compute, because you have got a render farm and you are a million dollar Pixar type company? You need good model programs and lots of computing resources, but you don’t care about the real-time thing. I think Procedural Audio encompasses that as an art. Skywalker once kindly offered me computing resources, I think Randy Thom set it up, but it’s hard to work remotely out of context. I would like to put a masters or PhD student on that one day. I grew up with that rather dangerous Radiophonic Workshop ethos of Oram/Derbyshire (two original women sound designers): creative research and commercial production combined. It is an iconoclasm of experts, expectations and traditions. Much too risky for today’s world.
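To make the “shallow” granular approach Andy describes a little more concrete, here is a toy Python sketch of granular resynthesis (not Andy’s or Nick Fournel’s code): the input sound is cut into Hann-windowed grains, which are scattered across a new timeline at a chosen playback rate. The function name `granulate`, its parameters and the nearest-neighbour resampling are our own simplifying assumptions. Note that the only knobs are parameters of the method (grain length, density, pitch); nothing in it knows about the physics of the source, which is exactly the limitation he points out.

```python
import math
import random

def hann(n):
    """Hann window of length n (n >= 2), used to fade each grain in and out."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def granulate(source, grain_len, num_grains, out_len, pitch=1.0, seed=0):
    """Chop `source` into windowed grains and scatter them over a new timeline.

    pitch > 1 reads the source faster (higher), pitch < 1 slower (lower).
    """
    span = int(grain_len * pitch)  # source samples consumed per grain
    if grain_len < 2 or span >= len(source) or grain_len > out_len:
        raise ValueError("grain_len must be >= 2 and fit inside source and output")
    rng = random.Random(seed)      # seeded, so the cloud is reproducible
    window = hann(grain_len)
    out = [0.0] * out_len
    for _ in range(num_grains):
        src_start = rng.randrange(0, len(source) - span)
        dst_start = rng.randrange(0, out_len - grain_len + 1)
        for i in range(grain_len):
            # nearest-neighbour resampling keeps the sketch short
            sample = source[src_start + int(i * pitch)]
            out[dst_start + i] += window[i] * sample
    return out
```

For example, `granulate(recording, grain_len=256, num_grains=200, out_len=2 * len(recording), pitch=0.8)` would stretch a recording to twice its length at a lower pitch; changing the grain parameters changes the texture, but never the underlying process.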
DS: Much about sound design is about achieving hyperreality and not reality. You wouldn’t want a gun to sound like a real gun. So is Procedural Audio about creating realistic models and building on them to achieve hyperrealism?
Andy: The many good sound designers that I have met all turn out to be really quite well-rounded, smart people. They aren’t purely phenomenal; they see the world not only through an artist’s gaze but with an informed, worldly apprehension. Sound is about going into the world, what happens inside and outside things; it’s about deep knowledge of how things work and what that mechanism means to your emotions. I think the sound designer becomes somebody whose language is not about computers but about understanding the mechanisms of sounds. And the natural progression for sound designers, when they run out of all the shiny plugin technology in the world, is Procedural Audio, because it lets their art develop, connecting with sound and its causes, its propagation and reflection. Deep knowledge of sound is what a sound designer has, even when they can’t express or vocalise it. People who have become really talented at that often have hidden knowledge, ineffable knowledge. They don’t have a way to vocalise it or make it explicit, but they understand things intuitively, and you see them in the studio just do stuff, and if you ask them how they did it, they would go “I don’t know, I knew that’s what was needed”. Procedural Audio gives them, I don’t like the word, but it gives them a “rationale” to explicate their knowledge about sounds.
DS: How does Procedural Audio fit into this? Most sound designers aren’t mathematicians or scientists; most of them have a good understanding of how objects react in the world but not necessarily the science behind it.
Andy: In the very long run, and I’m talking about ten, twenty, thirty, forty years in the future – I hope to see it in my lifetime – I think all of this that we are talking about today will be well understood as part of the discipline of sound design. I think the philosophy of Procedural Audio will be part of it, just as an artist now can talk about textures and different kinds of lighting. In a way sound is a real throwback right now. Some people have said to me, “Sound technology is fifteen-twenty years behind graphics technology”. That’s crazy; all of the algorithms that made visuals possible came out of sound. One-dimensional signals first, and then two and three dimensions… it comes from radio and radar. And somehow, culturally, sound got left behind, because we are visual beings and we put all our energy into manipulating and creating ways to have a visual reality. Twenty-thirty years in the future all of what we are saying now will be a part of the language of sound design. But right now, there’s a fork in the road between my understanding of it, the philosophy and the way it should go, and the way it will go practically, which is what is happening now. So part of my philosophy in education is principles, not products. It’s a reaction against the commodification of skills and people, techniques and arts being reduced to products. So if you look on the sound design list, somebody says, “How do I make such and such a sound?”, and somebody else says, “Oh you need the zzaaaq plugin, that does that”, and they completely abdicate any desire for knowledge. They don’t even care to pay for it, because they can get it off BitTorrent anyway. Somebody else has packaged that capability and knowledge and given it to them.
And by doing that they have robbed them of the knowledge (Zarathustra says: be careful what I give you as a gift, because I may take something away from you). The knowledge is useless by itself, but for an artist, for your career, your development, your ability to do things, that knowledge is important; it’s part of it. Both of the commercial approaches to Procedural Audio at the moment, and this isn’t dissing these guys (in Audio Gaming and Sony… I am in touch with Amaury quite a bit and try to help them out recruiting and seeing the way ahead), are interested in producing products (that is business). They want Procedural Audio models as drop-in objects in the game. With this model, given to the current middleware developers, you’ll probably have opaque Procedural Audio objects: you can’t see inside them, and they have a few exposed parameters. Say you have a car, and you can choose a four or six cylinder engine, a bunch of different configurations for the exhaust, the body material, the tyres, and that would be it. That is good! That is how you want your end user to see the object, at a certain level. So one approach would be to sell these as closed objects, with their functionality hidden. But for me as an “academic”, as a pioneer, I am much more interested in enabling people (my relationship is with the sound designer, with people): you should get into this stuff. To me, the good stuff will be toolkits of the kind Dylan Menzies and others have proposed (using pluggable physical components built on FEM/discrete numerical difference schemes, as Stefan Bilbao explicates). Now you have an engine model, it’s part of the car, and you can replace the engine – real world analogy here – you can tinker, you can take it apart and change the way the camshaft and the pistons work. You can replace the method used for that engine with a subtractive method that you wrote in C++ and drop it in, because there is a well defined interface.
And the kind of well defined interfaces that work are data flow interfaces – like Max/MSP and Pure Data type interfaces, where you can just plumb these objects together very quickly in the studio and test them in-world while playing. Ultimately, open-ended data-flow user interfaces with efficient JIT/bytecode compilers are the future for creative Procedural Audio. Then you will have a vibrant community, a vibrant ecosystem of programmer-sound designers and sound designer-programmers and people who work in teams, and Procedural Audio will be a vibrant technology. But I think first it will go through what basically becomes a (mystified) plugin culture. Procedural Audio in my (wishful) philosophy is open and based around knowledge [laughs]. That will come with time, after these products have driven a path, and we will think about sound differently. The construction of sounds will be more technically informed and richer than saying, “Oh yeah, Hollywood Edge track 6 number 4, that’s the one you need”.
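The data-flow idea, objects with a well defined interface that can be plumbed together and swapped, can be sketched in a few lines of Python. This is a toy, pull-based graph of our own, not how Max/MSP or Pure Data are implemented (both process audio in blocks of samples), and the names `Node`, `Oscillator`, `Saw`, `Square`, `patch` and `render` are hypothetical. Swapping `Saw` for `Square` stands in for Andy’s example of dropping a different method into the engine slot: the rest of the patch does not change because every oscillator honours the same interface.

```python
class Node:
    """Base dataflow object: computes one output sample from its input objects."""
    def __init__(self, *inputs):
        self.inputs = list(inputs)
        self._last_t = None
        self._last_value = 0.0

    def out(self, t):
        # Cache per sample index so a node feeding several others runs once per tick.
        if t != self._last_t:
            self._last_t = t
            self._last_value = self.compute([node.out(t) for node in self.inputs])
        return self._last_value

    def compute(self, ins):
        raise NotImplementedError

class Const(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def compute(self, ins):
        return self.value

class Mul(Node):
    """Multiplies all of its inputs (used here for gain and modulation)."""
    def compute(self, ins):
        result = 1.0
        for x in ins:
            result *= x
        return result

class Oscillator(Node):
    """Phase-accumulating oscillator; subclasses only define the waveform shape."""
    def __init__(self, freq, sample_rate=44100):
        super().__init__(freq)
        self.sample_rate = sample_rate
        self.phase = 0.0

    def compute(self, ins):
        value = self.shape(self.phase)
        self.phase = (self.phase + ins[0] / self.sample_rate) % 1.0
        return value

    def shape(self, phase):
        raise NotImplementedError

class Saw(Oscillator):
    def shape(self, phase):
        return 2.0 * phase - 1.0  # naive (aliasing) sawtooth in [-1, 1)

class Square(Oscillator):
    def shape(self, phase):
        return 1.0 if phase < 0.5 else -1.0  # naive square wave

def patch(osc_class, freq_hz, gain):
    """Wire Const -> oscillator -> gain. Any Oscillator subclass can be dropped in."""
    return Mul(osc_class(Const(freq_hz)), Const(gain))

def render(node, num_samples):
    return [node.out(t) for t in range(num_samples)]
```

The per-sample cache in `Node.out` lets one object feed several others, as in `Mul(s, s)`, without advancing its phase twice; real environments such as Pd compute whole blocks at a time instead, which is far more efficient.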
DS: Would you relate the questions raised about Procedural Audio to people talking about motion capture replacing actors?
Andy: I never made that connection, that’s really good. Mo-cap in relation to CGI is the same as what I am calling analytical recording for Procedural Audio. You go out in the field and try and look at the behavioural features and try and capture them and then use them as data.
DS: So would that be the important point to make then? To use the analysis as data and not exactly copy it, just as how mo-cap is used in CGI?
Andy: Yeah, I was talking to some guys who are into this mo-cap and human face stuff recently, and they were telling me some amazing things about how Tom Cruise voiced an animated character and they captured all his facial expressions – the eyebrows and everything. The trouble was it was a dog character or something, and when they played it back it looked too much like Tom Cruise [laughs]. This was how you could extrapolate to hyperreality: you just scale everything so that the eyebrows move double the distance. So they got this super hyper-real version of Tom Cruise in the character, but somehow you still knew it was him. This is built into procedural technologies and data analytical technologies. You can create models based on interpretations of real data and then extrapolate them off into hyperreality, and that’s really powerful in film and games. You want that capability built in. They were doing something else really weird, like morphing characters: they would take something like Sigourney Weaver and Tom Cruise and make Sigourney Cruise. Like you do with hidden Markov models in composition, where you can hybridise them and have Beethoven and Shostakovich and get new ones, Shothoven or Beethovich (??) [laughs] or whatever your new composer is.
DS: The obvious use of Procedural Audio is gaming and animation, as you mentioned. Where else do you see it being applied? There has been some talk of it being used in electric cars to simulate engine sounds.
Andy: I can see it as a really interesting safety feature in a car. I would also worry about this becoming a real nuisance – imagine a car running along the street sounding like a clown’s car, and the person is drunk and it’s 3 in the morning! I can see a lot of social tension about objects having arbitrary sounds. Sound can be very intrusive, and it has very different implications across different cultures and age groups.
DS: I came across a Harman-owned company called HALOSonic that promises to deliver technology that seems to simulate engine sounds for ‘cars of the future’. While there is very little information on the actual technology being used, I wouldn’t be surprised if it used Procedural Audio.
Andy: Let us analyse that as a product and say why Procedural Audio is really powerful for something like this. On the face of it you would think of using samples. I am the CEO of the company, I want a cheap product and I want to get it out. RAM and disk space are so cheap. I’ll get a bunch of sound designers, get them to make loops, load up my product with a thousand loops, and I can sell you extra ones too. Why the hell would I be interested in Procedural Audio? There are two really powerful reasons – technical, computer science reasons. Number one: when you haven’t got memory space, as with greeting cards, watches and mobile devices. One of the biggest errors of judgement in mobile/casual gaming and the whole mobile technology industry has been overestimating the available bandwidth. And what’s a good technology when you haven’t got any bandwidth? That is your biggest asset when you deliver procedural content. It occupies 4kb and plays for six hours. It has a million things that you can change all the time, or you can download another one. So, as a technology, procedural content is very powerful in situations where you’ve got limited bandwidth. With cars, another place where procedural technology is very powerful is where you want the sound to encode a large vector of changing parameters. Why is it useful to have a sound on a car? By listening to a car engine I can tell a lot about it – is it slowing down, speeding up, is it a large car or a small car? I can localise it pretty well. So to replace a completely silent car engine, what you want is a procedural sound object which behaves like the car (one that matches people’s expectations vis-à-vis reality – and hence safety), with engine, with tyre sounds, with exhaust simulation to delineate rear and front approach. In fact you could encode all kinds of other information about the car as a safety feature, which people would quite quickly get used to.
If it is a bus, it could make a bigger noise; if it is a bike, it’s got a lighter sound. That would be difficult to do with a sample. So the procedural object would be more versatile and able to encode more information. That would be argument number two. Argument number three might be that developing a library of a thousand different car engines would be very expensive. But once Procedural Audio technologies mature, I should be able to buy an engine model as one piece of software and adapt it – I could commission it as a one-off piece of software or buy it on a license, put it into my product, and I have all its versatility. There comes a point, I think, where the code becomes cheaper than the recording, for a limited use case. Maybe. I’m not sure about the economics of that. I am interested to see how it turns out. If, on the other hand, there are a lot of people out there recording things and there is a very buoyant market in recordings…
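The car example can be sketched as a tiny procedural sound object driven by a vector of vehicle parameters. The firing-rate formula is the standard one for a four-stroke engine (rpm / 60 × cylinders / 2); everything else here, the `VehicleState` fields, the size-to-brightness and size-to-loudness mapping, the tyre-noise level and the one-pole low-pass, is a hypothetical, deliberately crude choice meant only to show how differences such as bus versus bike could be encoded in the sound rather than in a library of loops.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class VehicleState:
    rpm: float        # engine speed in revolutions per minute
    cylinders: int    # e.g. 2 for a motorbike, 6 for a bus
    size: float       # hypothetical 0.0 (small) .. 1.0 (large) scale
    speed_mps: float  # road speed in metres per second, drives tyre noise

def controls(state):
    """Map vehicle state to synthesis controls. The mapping is illustrative only."""
    # Four-stroke engine: each cylinder fires once every two revolutions.
    firing_hz = state.rpm / 60.0 * state.cylinders / 2.0
    cutoff_hz = 4000.0 - 3000.0 * state.size  # larger vehicle -> darker tone
    level = 0.2 + 0.6 * state.size            # larger vehicle -> louder
    tyre_level = min(1.0, state.speed_mps / 30.0)
    return firing_hz, cutoff_hz, level, tyre_level

def render(state, seconds, sample_rate=16000, seed=1):
    """Pulse train at the firing rate plus tyre noise, through a one-pole low-pass."""
    firing_hz, cutoff_hz, level, tyre_level = controls(state)
    rng = random.Random(seed)
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)  # one-pole coefficient
    phase, y, out = 0.0, 0.0, []
    for _ in range(int(seconds * sample_rate)):
        phase += firing_hz / sample_rate
        pulse = 0.0
        if phase >= 1.0:  # a cylinder fires: emit one click
            phase -= 1.0
            pulse = 1.0
        x = level * pulse + 0.05 * tyre_level * rng.uniform(-1.0, 1.0)
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out
```

For scale, the six hours of playback Andy mentions would take roughly 3.8 GB as uncompressed 16-bit, 44.1 kHz stereo PCM (6 × 3600 × 44100 × 2 channels × 2 bytes), whereas a model like this is a few dozen lines of source.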
DS: Do you see Procedural Audio code becoming a commodity that is built and sold?
Andy: I think so; I see that happening with the apps market. Production models have changed, and this is very much down to Apple. I’m very anti-Apple at the moment (for ethical reasons), but to their credit one of the really good things that Apple has done is (by standardising experience and lowering expectations) to accelerate the tool chain to the point where the production of an app is sometimes half a day’s work. Cookie-cutter template things that go out in the store and you download them. That is code. It’s the entire application. Before, code was something that was produced over weeks or months; that still happens with bespoke apps, but we’re seeing the trajectory towards commodity code like that. So yeah, if you are a great procedural sound designer, why could you not be a guy who specialises in engines, who sold two to Volkswagen and one to Mercedes and is working on a couple of other ones for some company? People come to you because they know you are the ‘engine guy’. Because there is a standard API for interfacing with the procedural code, you know at the end of the day you are going to get six floats or something – four rotational velocities for your wheels, an engine speed and a bunch of other stuff – and you plug that in. I would like to be optimistic about the futures for these marketplaces; I would like to think they might work. That is very much my (humanist) philosophy as a person: I think that these things should create work, create opportunities and create markets, things that people can do that involve their talents, and this is why I say you balance technology, art and business. And any one of those can become an overbearing, dominant thing.
The culture can override the technology and the business; the technology can become the “oh technology oh technology!” and stamp on the business; but more often these days it’s about the business running out of control, dictating the technology and the art. If any one of those three sets out to kill the others, it’s bad. As for Apple, they want to commodify and control your creative experience for a profit motive – the absolute antithesis of their “nineteen eighty four” position. So what I would fear, in the case that you are describing, is that commercial attempts at pushing procedural audio initially turn out quite watered down, neutered and over-packaged like supermarket food, in which case it will be procedural audio only in name, only in marketing speak. But then I’m a dreamer, 20 years ahead with this stuff in my head; disappointment with present reality is built in, it’s what keeps us pushing. Computational audio, more intelligent structuring of audio in games and cinema, will be a new creative frontier. But we need to get past a crisis of purpose with technology: the idea that easier is always better, that more is always better; dumbing down, disabling and concealing rather than opening up, enabling and enhancing. To paraphrase Laing, who says “The emphasis is more and more on communication, but people have less and less to communicate”, we don’t really need more ways to do things, but better, more thoughtful and reflective ways to do things, to replace the brute force and dizzying excess of choice of the previous epoch with more focussed and elegant ways. I hope some of the philosophies of procedural audio will shape sound design in a wider sense.