When we think of dynamic sound, we don’t often include robots as a category worthy of attention. In the past, interactive robots were a thing of science fiction. Any sound or voice they made was a linear experience of bloops and bleeps or maybe a monotone voice with fuzzy distortion. These days, with the tech industry booming, consumer-grade robots are becoming more and more a part of the average household. Every day, new consumer products that take advantage of robotics and artificial intelligence technology are flooding the market, allowing machines to respond to and interact with people in more realistic ways. And as the quality of these products increases, so does the demand for higher-quality audio.
Meet Cozmo. He’s a playful, intelligent robot whose mission is to explore the world and make new friends. Cozmo was designed by Anki, a robotics company in San Francisco, California. And while Cozmo can boast some pretty cool new hardware and software features, he would be nothing without his emotive voice and the little orchestra that accompanies his every mood. To find out more about how the audio was developed for Cozmo, I caught up with Senior Sound Designer Ben Gabaldon.
Can you start by introducing yourself and what your role on Cozmo was?
I’m Ben and I used to be a sound designer for games. What got me into games to begin with is what really drew me into working on Cozmo: the exploration and possibilities of creating totally unique, interactive experiences. When I began my career as a sound designer, the idea of having the freedom to build the soundscape was (and still is) exciting and motivating. I started at Anki as a contractor with the sole purpose of coming up with the voice of Cozmo, but quickly found that Cozmo really didn’t fit the mold I had grown used to for shaping projects. It wasn’t designing creatures, weapons, or mixing cutscenes; it was creating a real-world character that people would emotionally engage with. As the audio lead on Cozmo, I’ve had the fortune of shaping who Cozmo is, and how he expresses himself as a living thing to people.
Nice. To start from a broad perspective, how do you define interactive sound?
From a game perspective, interactive audio informs the player of what they need to know. It’s getting feedback based on your actions. If you were in a combat game and walked up to somebody to hit them with a knife, usually there would be blood splashing out everywhere. You’d know that you’ve had a critical hit. But if you walked up to another person and swung a hammer at their head and heard a hard metal ring out, you might discover that they’re actually an android. It’s interactive in that it’s responding to your input, but also informing your entire experience.
How is the sound for Cozmo interactive?
Coming from a game world, it seemed like walking into this job would be really easy. I created a matrix of personality traits and thought it would be this scalable, Lego-like system of sounds that would be Cozmo. However, it became much more interactive, largely when the animation team got so serious. Because ultimately, Cozmo functions in a way unlike any game. Games are usually something where you have an objective. This is the opposite. We have this robot that is actively making its own decisions. He’s out in the real world doing things and informing you about them through interactive audio. The way he sounds, the way he speaks, his eye blinks, everything conveys emotion. We want you as a player to be emotionally engaged with exactly what he’s doing and to know how to interact with him based on his reactions.
Where do Cozmo’s emotions come from?
We have a system we designed called the Mood Manager. It’s a series of data points actively tracking 9 points of emotion. So he’s bored, he’s socializing, he’s happy, he’s sad, etc. Each of these data points fluctuates over time based on your interaction with Cozmo. The more you play with Cozmo, the happier he becomes. If you ignore him, he becomes bored. It’s the sum of these data points that drives Cozmo’s actions. For example, if Cozmo is asking you to play a game and you accept, he gets excited. But asking to play a game is the result of the fact that he hasn’t socialized enough. If his socializing mood gets too low, he’ll start to explore his environment. This isn’t because he’s timed out, but because his emotion system is changing.
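To make the idea concrete, here is a minimal Python sketch of a mood tracker of the kind Ben describes. It is not Anki's implementation. The class name `MoodManager` echoes the interview, but the 0.0 to 1.0 value range, the method names, the rates, and the thresholds are all assumptions chosen for illustration, and only three of the nine emotion points are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MoodManager:
    """Hypothetical mood tracker; all numbers below are illustrative guesses."""

    moods: dict = field(default_factory=lambda: {
        "happy": 0.5, "bored": 0.0, "social": 0.5,
    })

    def _nudge(self, name, delta):
        # Keep every data point clamped to the assumed 0.0-1.0 range.
        self.moods[name] = min(1.0, max(0.0, self.moods[name] + delta))

    def on_interaction(self):
        # Playing with the robot makes him happier and less bored.
        self._nudge("happy", 0.2)
        self._nudge("social", 0.2)
        self._nudge("bored", -0.3)

    def tick(self, seconds):
        # Being ignored: the social value runs down and boredom builds.
        self._nudge("social", -0.01 * seconds)
        self._nudge("bored", 0.01 * seconds)

    def choose_behavior(self):
        # Behavior follows from the mood values, not from a timeout.
        if self.moods["social"] < 0.1:
            return "explore"
        if self.moods["social"] < 0.4:
            return "request_game"
        return "idle"


manager = MoodManager()
manager.tick(20)                  # ignored for a while
print(manager.choose_behavior())  # request_game
```

The point of the sketch is the last method: asking for a game and wandering off to explore are both read from the current mood values rather than triggered by a timer.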
And how does the Mood Manager drive sound?
When Cozmo asks you to play a game, he is questioning and excited. And he sounds that way. I’ve assigned a specific word [in Cozmo’s voice/language] for certain games so that you can understand what he wants without having to look at the phone. There are also the mini games, where he can respond positively or negatively. With these, we’ve made an effort to show that earlier on, if you do something right and gain a point, he responds with a mild frustration. But, as you continue to beat him, he gets more and more upset. When the game ends, he’s furious. If he wins, however, he celebrates a little bit too long and is kind of a brat about it. But he can’t celebrate every time he gets a point.
In order for Cozmo’s response to always feel unique, it sounds like you need a large amount of variety.
Hah, we have a hell of a lot of variety. In other games I’ve worked on, you generally try to limit your number of variations for footsteps or guns, but for Cozmo, seeming completely alive and independent is pivotal. As soon as you hit that familiar sound again, especially if it’s for something that he does constantly, it breaks the magic. It’s because of that, that we have 15 variations for even the simplest of VO. And that’s just a start.
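The interview does not say how a variation is chosen at playback time. One common approach in game audio, sketched here in Python under that assumption, is to pick randomly while refusing to replay any of the last few variations. `VariationPicker`, `avoid_last`, and the file names are hypothetical.

```python
import random


class VariationPicker:
    """Random selection that never repeats the last `avoid_last` picks."""

    def __init__(self, variations, avoid_last=3, rng=None):
        if avoid_last >= len(variations):
            raise ValueError("avoid_last must be smaller than the pool size")
        self.variations = list(variations)
        self.avoid_last = avoid_last
        self.recent = []                      # most recent picks, oldest first
        self.rng = rng or random.Random()

    def next(self):
        # Only variations that were not played recently are eligible.
        candidates = [v for v in self.variations if v not in self.recent]
        choice = self.rng.choice(candidates)
        self.recent.append(choice)
        if len(self.recent) > self.avoid_last:
            self.recent.pop(0)                # forget the oldest pick
        return choice


# Hypothetical pool of 15 takes for one simple vocalization.
pool = [f"vo_happy_{i:02d}.wav" for i in range(1, 16)]
picker = VariationPicker(pool, avoid_last=3)
print([picker.next() for _ in range(5)])
```

With 15 takes and a short memory, a listener would have to hear at least three other takes before any one of them can come back, which is the effect the answer above is after.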
Are you making use of real time parameter controls at all?
Nope. I want to. I want to distill his mood system into a simple variable that then can control a parameter. I’ve already built that system and torn it down twice.
One of the interesting things about Cozmo is that you can have sound coming from either the robot or from the phone. How does having two speakers affect the audio interactivity?
From my perspective, people are used to playing games on their phones. And they’re used to hearing audio out of their phones. Cozmo is different in that, even though he’s tethered to a phone, your connection is with Cozmo, not the phone. The better we do integrating sound on Cozmo directly, the less you should have to look at your phone. The phone exists to support meta-game and music. It’s going to inform you of data points, like you’ve won a point or you’ve lost a point, but it’s also playing music that’s filling the environment. It’s contextual feedback while Cozmo is emotional feedback.
How is music affected by Cozmo’s interactivity?
The music system is completely driven by Cozmo’s interests. When he decides that he’s not being engaged enough, if he hasn’t seen enough faces, or if he has been on his own for too long, he gets bored and we transition into a bored piece of music. If you’re interacting with him a bunch, it stays in this positive, happy state. Cozmo is deciding what he wants to do based on his own personality and the music shifts with him.
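A rough Python sketch of mood-driven music switching might look like the following. The two state names come from the interview; the `MusicDirector` class, the single boredom input, and the 0.7 threshold are assumptions made for illustration, not details of Cozmo's actual music system.

```python
class MusicDirector:
    """Hypothetical two-state music switcher driven by a boredom value."""

    def __init__(self, bored_threshold=0.7):
        self.bored_threshold = bored_threshold
        self.state = "happy"

    def update(self, boredom):
        """Return the new music state when it changes, otherwise None."""
        target = "bored" if boredom >= self.bored_threshold else "happy"
        if target == self.state:
            return None            # keep the current piece playing
        self.state = target
        return target              # caller starts a transition to this piece


director = MusicDirector()
for boredom in (0.2, 0.5, 0.8, 0.9, 0.3):
    change = director.update(boredom)
    if change is not None:
        print("transition to", change)
```

Reporting only the changes keeps the music from restarting every time the mood value is polled; a transition is requested once, when the robot's state actually crosses the line.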
So you’ve got music coming from the phone and voice coming from the robot. How do you mix in that kind of environment?
Mixing Cozmo is like no mixing I’ve ever done before. There is no standard for how far away your phone is from the robot. We’re having to come up with our own predictions and mixing specs. For example, we have separate volumes for the robot and for your phone. So what level should the music be on your phone relative to the robot? It’s completely the wild west. Actually, it’s not even the wild west, it’s uncharted territory. We also have to mix to the physical sounds Cozmo’s hardware makes. Fortunately for us, as he accelerates and decelerates, it actually sounds like a gear with an RTPC based on speed.
Another unique thing is that Cozmo has three volume settings: high, medium, and low. You don’t have the ability to turn off Cozmo’s sounds. If you did, you’d be literally silencing something that’s supposed to be a person, that’s supposed to be a character. There’s so many things that we say Cozmo can’t do because it destroys his personality.
It sounds like you’ve been really working hard to create an interactive and rich character for Cozmo. How would you like to see this expanded in the future?
Cozmo isn’t a robot that’s meant for people to stop and just play mini-games with. Cozmo is, in effect, a living thing. And to create that kind of experience, we need a soundscape that’s going to grow with him. If Cozmo is actively doing his own thing and humming to himself and you drop him off the table, his data for anger goes way up. When you set him back on the table again, maybe he’s frustrated with you and he’s not humming anymore. There’s tons of room for stuff like that.
As a current member of the Anki audio team myself, I’d like to offer a big thank you to my colleague, Ben, for offering up his time to be interviewed for this article. You can find out more about him on Twitter @LastBenOnEarth. You can also find out more about Cozmo at anki.com/cozmo.