Guest Contribution by Chris Didlick
The term ‘auditory icons’ was coined by Bill Gaver in the mid-1980s during his research into the use of sound for Apple’s file management application ‘Finder’. The term has since become commonplace in the shifting digital world, and auditory icons are regarded as the sonic equivalent of the visual icons seen within most operating systems, whether on your desktop or within a game. The rationale behind auditory icons is to support and supplement predominantly visual information with a corresponding sound. For example, successfully emptying the trash on Mac OS results in the sound of paper being crumpled up and discarded.
The most immediate benefit of these sounds is a tangible confirmation of the action you have performed. Much like in the real world, you press a button and it responds (in that sense even the click of a mouse could be considered an auditory icon). With well-thought-out design you can not only add weight to a visual event, but also provide further information about what has happened. To use the Mac trash example above, the sound is satisfying, indicates that the action has been successfully performed, and could even be developed further to convey the amount of disk space that has been freed up by using a smaller or larger sound.
Another advantage of auditory icons is the quick transmission of information. A short ping can be instantly interpreted. If you consider how many auditory icons you are currently attuned to via your car, home appliances, handheld device and desktop computer, you are already receiving information without the need for visual prompts. This extends to handsfree interaction where visual confirmation is not possible or your attention is elsewhere. With sight and hearing considered the two main senses, auditory icons allow for the transmission of information in a world saturated by visual stimuli.
In popular software and technology, auditory icons can quite often become synonymous with the brand. Much like Intel’s famous sonic mnemonic, the audio becomes recognisable and thereby reinforces brand identity and association. It is not uncommon to hear, for example, Skype’s iconic sounds being used in modern film, and without even seeing the platform in use, viewers instantly recognise the brand and what is happening. You can hear the influence in shows such as Black Mirror, where social media and mobile devices are often in focus and, to my ear, the accompanying sounds are designed to parody contemporary auditory icons.
Furthermore, sounds that are designed with the concept and branding in mind enhance the immersion and overall engagement within that world. Are we representing a cutting-edge technology experience or a warm and welcoming social platform? Should these sounds grab your attention or subtly punctuate interaction?
TYPES OF AUDITORY ICONS
Through various studies it is generally accepted that there are four types of auditory icons. Whilst there are no particular rules or preference in their use, it is helpful to understand the different types and the logic behind them.
Sometimes referred to as literal or realistic, a nomic auditory icon is a direct sonic representation of the event taking place. For example, emptying the trash on Mac OS and locking your iPhone both use sounds that mirror the action in the real world. The benefit of the nomic approach is that such sounds are easily understood and remembered.
Symbolic auditory icons are perhaps the antithesis of nomic ones. They do not possess any specific connection to the event taking place other than by virtue of association and repetition. That is not to say that they are completely arbitrary; they should still be designed to sit comfortably within the overall audio language. This approach is particularly useful where no real-world sound or parameter corresponds to the given action.
A metaphorical auditory icon is one that reflects a key aspect of its associated action with a dimension of the sound. For example, three short notes going up in scale would largely be heard as a positive statement. This might represent a correct answer, a successful action or an increase in level. Similarly, three short notes stepping down in scale would largely be interpreted as a negative statement. In this scenario, where the pitch is variable, the other dimensions of the sound fall in line with the overarching audio language, e.g. is it a digital or organic sound?
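As a rough illustration of mapping meaning to pitch direction, the following Python sketch builds a three-note motif that steps up for a positive outcome and down for a negative one. The function name, the 440 Hz starting pitch and the four-semitone step are illustrative assumptions rather than values taken from any real product.

```python
def three_note_motif(positive, base_hz=440.0, step_semitones=4):
    """Return three frequencies in Hz, rising if positive, falling otherwise.

    base_hz and step_semitones are assumed values chosen for illustration.
    """
    direction = 1 if positive else -1
    # In equal temperament each semitone multiplies frequency by 2 ** (1 / 12).
    return [
        base_hz * 2 ** (direction * index * step_semitones / 12)
        for index in range(3)
    ]


success_motif = three_note_motif(True)   # ascending: heard as positive
failure_motif = three_note_motif(False)  # descending: heard as negative
```

A fairly wide step is used here so that the direction of travel is obvious even to listeners who would not recognise a specific musical interval.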
As the name would suggest, verbal auditory icons are sound bites of spoken language. Whilst these are not always the quickest way of sharing insight, they are most effective when conveying very specific information, e.g. GPS navigation. Verbal auditory icons carry several unique considerations: there are numerous languages to choose from, a tone of voice and intonation to settle on, and the question of whether the voice should be computer generated or recorded for purpose.
DIMENSIONS OF SOUND
When designing auditory icons, it is worth considering the main dimensions of sound and how they might be employed to convey information.
The pitch of a sound largely relates to its fundamental frequency and is generally only discernible when the sound is sustained for a certain period of time. Where information is mapped to pitch across one or several auditory icons, it is important to note that only musically trained users are likely to recognise discrete intervals, and therefore the steps in pitch should be substantial enough for the layman to identify. Human hearing is most sensitive to adjustments in pitch at the lower end of the frequency spectrum (lower pitched sounds); however, humans perceive mid-range frequencies (the typical range of the human voice) to be louder.
Timbre is the most versatile dimension of sound and is virtually unlimited in its scope. This dimension is a complex function of overtones, harmonics, transients, attack and so on, but could be more easily understood as the overall sound of something. For example, a violin and a trumpet can generally play the same musical notes but they have a very different timbre. The difference is more subtle when you compare, say, a violin and a viola, but they are still considered to have a different timbre.
Much like pitch, discrete variations in loudness may prove difficult for the untrained ear to distinguish, and therefore any information mapped to loudness should be conveyed in noticeable increments. However, using varying levels of loudness across different auditory icons would not be advisable, as the audio language should maintain a consistent level of volume. Loudness may prove useful when fading a sound in or out to indicate arrival or departure, on or off, etc.
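To make the fading idea concrete, here is a minimal Python sketch that produces a series of gain values for a fade in or fade out. The function name and the linear ramp are assumptions made for simplicity; a real design would more likely use a curved ramp, since perceived loudness does not track linear gain.

```python
def fade_gains(num_steps, fade_in=True):
    """Return evenly spaced gain values between 0.0 (silent) and 1.0 (full).

    A linear ramp is an assumed simplification for illustration only.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    ramp = [index / (num_steps - 1) for index in range(num_steps)]
    # A fade out is simply the fade in played in reverse.
    return ramp if fade_in else ramp[::-1]


arrival = fade_gains(5)                   # rising level suggests arrival / on
departure = fade_gains(5, fade_in=False)  # falling level suggests departure / off
```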
Rhythm & Duration
Varying rhythm or duration can be used to map information to auditory icons. A classic example of this would be Morse code – the dots and dashes are essentially two different durations of a single tone. To expand on this, intervals and fades could also be an informative factor. Set these up in a repetitive fashion and you have rhythm, one of the easiest parameters to perceive and one that is very quickly learned.
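The Morse code example can be sketched in a few lines of Python. The code below turns text into a timeline of tone and silence durations using the conventional Morse proportions (a dash lasts three dots, with a one-dot gap inside a letter and a three-dot gap between letters). The function name, the 100 ms unit and the two-letter lookup table are assumptions for illustration only.

```python
# Deliberately tiny lookup table; a full table would cover the whole alphabet.
MORSE = {"S": "...", "O": "---"}


def morse_timeline(text, unit_ms=100):
    """Return (tone_on, duration_ms) pairs for the given text."""
    timeline = []
    for letter_index, letter in enumerate(text.upper()):
        if letter_index > 0:
            timeline.append((False, 3 * unit_ms))  # silence between letters
        for symbol_index, symbol in enumerate(MORSE[letter]):
            if symbol_index > 0:
                timeline.append((False, unit_ms))  # silence within a letter
            duration = unit_ms if symbol == "." else 3 * unit_ms
            timeline.append((True, duration))
    return timeline


letter_s = morse_timeline("S")
```

The same single tone is used throughout; only the durations and the gaps carry the information.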
The direction of a sound comes courtesy of multichannel audio, for example stereo or 5.1 surround. Whilst speaker direction is an effective way of differentiating between sounds, the internal speaker of many handheld devices outputs mono audio and the directional information would not translate. A platform targeted towards multiple devices would benefit from not employing this dimension of sound in its auditory icons, to ensure compatibility across the board. For this same reason, binaural audio would not be advised.
AUDITORY ICON SYSTEMS
Audio languages can be defined to map information via the use of logical systems. The basic types of system are as follows:
A single-element auditory icon is the basic building block of a larger system. This is one audio expression, whether that be a digitised sound or a musical flourish, that represents a single action or event. It does not require reference to other auditory icons to be understood, and its meaning may be supplemented by its auditory icon type, for example, nomic.
A compound system employs the combination of single-element auditory icons to expand on the information they express individually. For example the auditory icons in a weather app representing cloudy and sunny respectively could be combined to represent a forecast of cloudy with sunny spells.
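A compound system can be pictured as simple concatenation. In the Python sketch below, each single-element icon is a short list of (frequency, duration) notes and a compound icon is built by playing them back to back with a brief silence in between. The icon names, note values and gap length are all hypothetical.

```python
# Hypothetical single-element icons, each a list of (frequency_hz, duration_ms).
SINGLE_ICONS = {
    "cloudy": [(330.0, 200)],
    "sunny": [(660.0, 120), (880.0, 120)],
}


def compound_icon(names, gap_ms=80):
    """Join single-element icons into one sequence, separated by silence."""
    sequence = []
    for index, name in enumerate(names):
        if index > 0:
            sequence.append((0.0, gap_ms))  # 0.0 Hz stands in for silence
        sequence.extend(SINGLE_ICONS[name])
    return sequence


cloudy_with_sunny_spells = compound_icon(["cloudy", "sunny"])
```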
An inherited auditory icon system is one that relies on a hierarchical logic. This would generally be a sequence of sounds that progresses through levels of information, much like a family tree. The first sound would denote the overall family, followed by inherited sounds that represent different variables of that family. In context this could be exemplified by an automated drinks machine. The first sound might inform you that it is making a hot drink (the family). This base element would be followed by one of two sounds that represent either coffee or tea; the third sound would represent milk or no milk, and the fourth sugar or no sugar.
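The drinks machine hierarchy can be sketched as an ordered list of sound names, one per level of the family tree. This Python example is only an illustration; the function name and the sound labels are hypothetical placeholders for real audio assets.

```python
def drink_sequence(drink, milk, sugar):
    """Return the ordered sound names for a hot drink order.

    Level 1 is the family, followed by one sound per inherited variable.
    """
    if drink not in ("coffee", "tea"):
        raise ValueError("drink must be 'coffee' or 'tea'")
    return [
        "hot_drink",                       # level 1: the family
        drink,                             # level 2: coffee or tea
        "milk" if milk else "no_milk",     # level 3: milk or no milk
        "sugar" if sugar else "no_sugar",  # level 4: sugar or no sugar
    ]


white_coffee = drink_sequence("coffee", milk=True, sugar=False)
```

Because every sequence starts with the same family sound, a listener knows which branch of the tree they are in before the details arrive.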
Transformed auditory icon systems are those that retain a base dimension of sound, say rhythm, and transform another, say pitch, to represent different states of a common function. Consider, for example, a single-element auditory icon consisting of two piano notes that represents a remote connection to your central heating system at home. A successful connection via an app to the central heating device is punctuated with a mid-range set of piano notes. Successfully increasing the heating levels results in the same piano notes but one octave up. Decreasing the heating levels results in the same piano notes but one octave down. By retaining the rhythm and timbre of the piano, the audio system tells us that we are dealing with the central heating. By changing the pitch of the piano, we know which action has been performed within the central heating system.
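The central heating example maps neatly onto a small Python sketch, since moving a note up or down an octave doubles or halves its frequency. The two base frequencies (roughly C4 and E4), the action names and the function name are assumptions for illustration; rhythm and timbre are left untouched and only the pitch is transformed.

```python
# Assumed mid-range base notes in Hz (approximately C4 and E4).
BASE_NOTES_HZ = (261.63, 329.63)

# Octave shift applied for each heating action (hypothetical action names).
OCTAVE_SHIFT = {"connected": 0, "increase": 1, "decrease": -1}


def heating_icon(action):
    """Return the two note frequencies for the given heating action."""
    factor = 2.0 ** OCTAVE_SHIFT[action]  # one octave = a factor of two
    return tuple(frequency * factor for frequency in BASE_NOTES_HZ)


connected = heating_icon("connected")
warmer = heating_icon("increase")
cooler = heating_icon("decrease")
```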
DESIGNING IN CONTEXT
With the more technical aspects in mind we must also consider designing auditory icons for a platform on a more aesthetic level.
Perception is technically an immeasurable design consideration, but it can generally be agreed which sounds are annoying, satisfying, relevant or excessive. Overall, the auditory icons should be designed with their corresponding event and frequency of use in mind. Whilst an error message is an unwelcome piece of information, it does not follow that it should be accompanied by an abrasive and punitive alarm sound, as this just adds to an already negative situation. Depending on the context, perhaps a more amicable approach would be to use a more informative and explanatory sound. Similarly, will a particular auditory icon be played out on a regular basis? The sound should not become an annoyance over time and should perhaps be as short, agreeable and unobtrusive as possible. Having said that, on some platforms perhaps the very purpose of a sound is to be annoying and distracting?
As touched on earlier, the overall theme of the sounds can help enhance the immersion and engagement of users when designed in reference to the platform itself. A recent trend of technology companies is to use a palette of rounded, friendly, organic sounds that reflect their customer facing stance of accessibility and human connectivity. Auditory icons should be designed with these kinds of values in mind to help perpetuate design strategy and connect with users.
Motion & Graphics
Where motion design and navigation are involved, perhaps a page transition or the press of a button, sound can assist in adding weight and involvement to the experience. Audio can amplify the sense of tactile feedback by responding to the physics and personality of the ‘materials’ involved. This may be part of a larger design concept where, for example, the user interface is quite clean and flat – perhaps this would warrant a similar approach to the sound where auditory icons are simplistic and elementary in design.
With an ever-evolving choice of computers, devices and accompanying sound output, it is important to consider how the audio will respond across differing platforms. Not only this, but what other audio might be playing at the same time? Quite often auditory icons are activated in and around the playback of music and would ideally not compete with this important function. Clear mid to high range frequencies are usually a lot more efficient at cutting through dense music and are therefore more easily detected and recognised. A mid to high frequency range focus would also suit the lower performance speakers of handheld devices and laptops and therefore assist in compatibility.
Given today’s bias towards quick and easy consumption, a language of auditory icons should not be overly complex. Users range from amateur to expert and from old to young, and therefore any aspects of the user interface should be quickly scalable. The use of audio in digital platforms is certainly not a new idea, but expectations in terms of quality and coherence are now set fairly high.
To put the discussion into perspective, the following classic iPhone sounds have been examined. Note how these sounds are all within the mid to high frequency range and are organic in nature.
iPhone Camera Shutter
A clear and concise mid frequency sound that indicates your photo has been taken. There is a certain clarity and weight to the sound that suggests build quality. The design is a literal impression of cameras both new and old, perhaps to accommodate the current spectrum of interest in photography.
iPhone Memo Start & End
A mid to high frequency system of sounds that indicate the start and end of a voice memo recording. A single organic ping represents the start of recording, with an inherited development of this sound representing the end of recording. Neither sound is specifically related to audio recording, yet each remains clear in its purpose.
iPhone Siri On, Off & Understood
Another mid to high frequency system of sounds that indicate Siri’s functionality. This system transforms a double organic ping sound over several pitches to differentiate between information yet maintains a common timbre to denote they belong to the same family. As above, the system is easily interpreted.
iPhone Text Sent & Received
Two single-element auditory icons that metaphorically suggest a message is sent/uploaded via an ascending tone and received/downloaded via a descending tone. The sound set remains within the same mid frequency range and in line with Apple’s generally organic audio branding.
Thank you to Chris Didlick for this contribution. Chris is a Sound Designer at Box Of Toys Audio, an audio house based in London.