9. Vocal persona

`Don't worry about me'

I remember my mother telling me as a child `don't worry about me -- I'm fine', all in a very sad voice [X00]. I also remember the confusion that statement caused me: did she mean the words `don't worry about me -- I feel fine', or should I listen to the music in her statement: `Please worry about me -- I feel miserable'?

The second interpretation was, I think, nearer the truth than the first, not least because of the narrative context of her statement: she was not always a happy person and she'd just been involved in a domestic argument. Another reason for prioritising the music of her statement was that her facial expression, body posture and gestures (in this case, lack of gesture), all aligned with her vocal timbre, volume, intonation, diction and speech rhythm but contradicted the meaning of her words.1 That said, I remember opting, as a child, to take my mother's don't worry about me at denotative face value, a decision which prompted my father to chide me for being insensitive. Although I was able to quip `but she told me not to worry', my father's reprimand encouraged me to listen more to people's `music' and pay less attention to their words. The trouble with this interpretative strategy was that I would start `reading' people much more on the basis of their `music' (timbre, volume, inflexion, posture, facial expression, etc.) and forget about their words. Besides, if I had responded to my mother's plaintive tone rather than to her words, by asking her sympathetically `what's the matter, mum?', I risked insulting her pride and hearing her retort: `I said I was fine. Why don't you ever listen to what I say?' It was, as they say, a lose-lose situation.

It took me many years to realise how I could interpret my mother saying [plaintive tone →] don't worry about me [normal reading →] as an integral statement, despite its contradictions. It actually meant something like `I'm very sad and I find it hard to put on the brave face of self-control I know that grown-ups should. So, please show me some kindness but respect the fact that I at least know I'm supposed to put on a brave face, even if I expect you to see through it'. I was slow to learn that you could consider the narrative context, the scenario, the body language, the words and the music of my mother's complex statements as a whole to be grasped instantaneously. It was a musicogenic statement in the same way as the clear musical moods, mentioned in Chapter 2, where they were expressed as pallid verbal approximations like desperately troubled in the midst of calm and beauty, or sick of the world and feeling alive because of that disgust.2

The don't worry about me anecdote illustrates at least three important issues of musical meaning, the first two of which have been discussed in earlier chapters. This chapter concentrates on the third point.

  1. Musical meaning is never created by the sounds on their own. They always exist in a syntactic, semantic and socio-culturally pragmatic context upon which their semiosis depends.
  2. Precision of musical meaning does not equal precision of verbal meaning or that of any other symbolic system. Hence, its logocentrically apparent contradictions of meaning (see Chapter 2 and `Polysemy and connotative precision') are irrelevant and should be treated as musically coherent.
  3. Vocal timbre, pitch, intonation, inflexion, accentuation, diction and volume, plus the speed, metre, rhythm and periodicity of vocal delivery are indicators of the affective disposition communicated by an individual speaker or singer using those means of expression.

`Are you talking to me'

The third point, just mentioned, is illustrated in clip 00, which uses a twelve-second extract from the film Taxi Driver (1976, see Fig. 9-1 and Table 9-1) to highlight central aspects of links between voice and personality. In that extract, Travis, the film's main taxi driver character, played by Robert De Niro, practises his famous question are you talking to me? in the mirror, with the camera as that mirror. The De Niro character is appalled by the cruelty and decadence he meets in his job3 but he is also a rather nondescript loser who has exercised his second-amendment right and acquired a gun to bolster his confidence. Now he feels able to face the hoodlums he meets but his low self-esteem still demands that he prepare for eventual confrontations by practising his Are you talking to me? line in the mirror. He utters three variants of that question. Please note that the timings in brackets, below, refer to clip 00 (see Table 9-1), not to their position in the actual film.

Fig. 9-1. `Are you talking to me?': Robert De Niro in Taxi Driver (1976)

It takes De Niro less than two seconds to say the line and he pauses over two seconds between each time he does so (0:30, 0:34, 0:38). Leaving aside gesture, posture and facial expression for the moment and concentrating solely on the sound of De Niro's voice, minor differences of inflection, intonation, volume and accentuation can be discerned. The first time he asks the question (0:30) his voice is low-key but quite rapid with the quick but substantial rise of pitch normally used in English for questions expecting the answer `yes' or `no', but it does sound a little bit sudden, as if he had been taken off guard. The second utterance (0:34) is slightly slower, a little more deliberate and has clearer diction, suggesting that the imaginary low-life interlocutor may not have heard Travis the first time. The third utterance (0:38) is once again quite low-key but includes slightly more emphasis on `me' and a little less on `talking', this shift in accentuation underlining his personal involvement in the imagined encounter. Apart from these minor variants, it should be noted that De Niro does not raise (the volume of) his voice in anger or frustration, and that his is the normal voice of a young, probably white, North American, English-speaking male. In fact, without the narrative context and without De Niro's body language, there is nothing remarkable about his vocal persona in this scene any more than Travis himself is supposed to be a remarkable personality, even though his distinct lack of charisma may be what makes him narratively interesting.

Given that this relatively normal, neutral and uncharismatic personality has a correspondingly normal, neutral and uncharismatic vocal persona, it ought to be possible to replace his voice with others in order to understand how certain vocal elements are compatible or incompatible with other simultaneous aspects of non-verbal communication. We'll deal first with the latter in the Taxi Driver mirror scene, referring to clip 00 -- The Vocal Persona Commutations (see Table 9-1).

The fact that we're in quite a noisy and untidy kitchen and that the young man is white, unshaven and wearing what appears to be a grey flannel or denim air-force jacket tells us quite a lot. It rules out vocal personae who are children (6:20, 6:25, 7:15), women, old men, African-American-sounding or from the higher echelons of society (4:05), unless they're slumming it, of course. It also rules out robots (3:03), death-metal monsters (3:40), chipmunks (6:25) or anything else that doesn't look --or sound-- like a Caucasian male, between 25 and 50 years of age and a member of the popular classes.4 But there is more in the visuals (0:30-0:45) that narrows down the vocal commutation possibilities.

Since De Niro is about one metre away from the camera, alternative voiceovers cannot sound too close (3:17, 3:57) or too far away (6:40). For example, the `repugnant intimacy' of the lecherous voice (3:17) only works if De Niro's face is in extreme close-up (3:30, 6:50). Obviously, then, one element of vocal persona is its perceived proximity. Another element is acoustic space: the `monster', `big guy' and `evil god' voices, for instance (3:40, 6:16, 6:59), have all been given generous amounts of reverb incompatible with Travis's kitchen.

Table 9-1. Vocal Persona Commutations: Are You Talking to Me?
    Clip timings at http://uk.youtube.com/watch?v=OL7uc6L5nMQ
    TD = Taxi Driver

Timing  Visuals                      Voice
------  ---------------------------  ------------------------------------------
0:00    Logos and titles             robot (repeated)
0:30    TD (=Taxi Driver)            original (×3)
0:52    TD, comments                 original (×3×3)
1:35    TD                           Dad talking to baby (×3)
1:57    TD, comments                 Dad talking to baby (×3×3)
2:30    Webcam                       Commentator (on baby talk)
3:03    TD, comments                 Robot
3:17    TD, comments, close-up       Lecherous (repugnant intimacy)
3:40    TD, comments                 Monsters (English, German)
3:57    TD, comments                 Pathetic, despondent (×4)
4:05    TD, comments                 Posh Southern UK (×4)
4:32    TD, comments                 Regular Southern UK (×5)
4:49    TD, comments                 Disbelief/ridicule (×5)
5:06    comments                     Commentator
5:45    TD                           original (×2)
5:51    TD, comments                 `It's tacky to be' [or not to be] (×2)
5:56    TD, comments                 `Då tar det väl vi.' (×3)
6:02    TD, comments                 `Are you watching TV?' (×2)
6:03    TD, comments                 `Distracting a bee.' (×2)
6:08    TD, comments                 `J'suis bien ton ami' (×2)
6:14    comments                     Original (×2)
6:16    comments                     Surprised/indignant; Big guy, big reverb
6:20    comments                     Kiddy robot; Posh Southern UK
6:25    comments                     chipmunk; exasperated (1); exasperated (2)
6:31    comments                     Swedish; disbelief; normal German
6:38    end titles                   Estuarian; angry robot; very quiet
6:42    end titles                   Posh Southern UK; Liverpudlian
6:46    end titles                   despondent; whispered; plaintive
6:50    TD, extreme close-up         lecherous
6:57    TD, inverted colours         slow kiddy robot
6:59    TD, distorted face           evil god
7:02    TD, severely distorted face  monster: `Sprichst du mit mir?'
7:06    TD, pixelated                robot, repeated
7:15    Webcam                       commentator, evil child

The first time (0:30, 0:52) Travis asks the question he is at the far right edge of the screen with his body facing screen left. He turns his head towards us, as if just having heard something coming from the direction of the camera. He looks surprised, his eyebrows are slightly raised and his head tossed back a bit. It is the look of someone literally taken aback. However, there is nothing except the immediate narrative context that rules out the possibility of pleasant surprise, which is why the first baby talk voiceover (1:35) works well if viewers imagine the camera being the baby's point of view and that the De Niro character is a proud father, surprised and delighted by his infant's happy and communicative gurgling as he walks past.

For the second version of the question (0:34, 1:03) De Niro has half turned toward the mirror/camera, tossed his head back a bit more and raised his eyebrows higher. Once again, it is mainly the narrative context that rules out a possibly positive interpretation of Travis's body language and which leads us to believe that this more clearly `taken aback' posture is more likely to express affront and irritation than surprised delight. Even his teeth, visible for a short moment in an unsmiling mouth, may suggest an attitude of confrontation. Moreover, he seems to be `looking down his nose' at his imagined interlocutor, and since his diction and accentuation are slightly more forceful than before, the baby talk voiceover of the delighted dad is less convincing here. Furthermore, the despondent, depressed and weak voiceovers (3:57, 6:46) align badly with De Niro's posture, facial expression, accentuation and diction during these three seconds.

The third version (0:38, 1:25) is gesturally the clearest. His body is turned a little more towards us (the camera/mirror/imagined interlocutor), as he points to his own chest in sync with `to me'. Yet again, prior knowledge of Travis's character and story will most likely lead viewers to interpret his grin as insolent, and his hand gesture as expressing personal affront. However, without such prior knowledge and with the addition of a few sonic correctives to the narrative (baby gurgling, mother going `aaah!', as in `how cute!'), are you talking to me?, spoken by a personally delighted and surprised father, aligns quite convincingly with this third version of the famous question (c. 1:50, 2:20).

Several vocal persona commutations do not work because of problems with lip sync (synchronisation). For example, stereotypical robot voices (3:03) tend to apply equal durations for each syllable --a non-human speaking trait if ever there was--, while depressed and despondent statements (3:57) are much slower than the rate at which De Niro delivers Travis's famous question in a normal speaking voice. Similarly, whispering and other types of vocal close-up are incompatible not only with the lack of visual close-up in the Taxi Driver sequence but also with its speed of delivery: whispering has to be slower than talking because it has to compensate for its lack of voiced consonants and clear vowel sounds, while many types of intimate statement are unsuitable if delivered in a rapid tempo (e.g. `I love you' at breakneck speed).

Poïetic, acoustic and aesthesic descriptors

None of the observations just made about the Vocal Persona Commutations clip should come as a surprise. As Hughes et al. (2004: 296) remark:

`[L]isteners who hear voice samples can infer the speaker's socio-economic status..., personality traits,... and emotional and mental state... Listeners exposed to voice samples are also capable of estimating the age, height, and weight of speakers with the same degree of accuracy achieved by examining photographs... Independent raters are also capable of matching a speaker's voice with the person's photograph over 75% of the time.'

Indeed, the relationship between an individual voice and its unique personal identity has given rise to the voice print branch of the security industry, complete with its `biometric' claims about defeating credit card fraud or ensuring `that prisoners incarcerated in their homes or out on temporary passes [are] where they were supposed to be'.5 Whether or not the `scientific' sales spiel of voice print marketeers has any validity is not the point here, although incredulity may be warranted, bearing in mind the technical crudity and socio-linguistic incompetence of most corporate `voice recognition' phone systems.6 The point is that insights about congruence between individual voice and personal identity are nothing new. Indeed, the very word person contains the morpheme son, meaning sound, and Latin's personare literally means to sound through (per), hence to sound forth, to proclaim, etc. Moreover, the original meaning of the Latin word persona is `a mask... as worn by actors in Greek and Roman drama'. Its transferred meanings of enacted role, personality, etc. derive from the fact that revealing the true nature of a dramatic character involved projecting the voice of that individual through the mask worn by the actor playing that role. His or her voice had literally to sound through the mask --vox personans-- and out into the audience's ears.7

None of these voice descriptions will sound very `scientific' to the sceptical reader: they are more likely to come across as highly subjective, at best as amusing or imaginative. However, the fact that `[i]ndependent raters are... capable of matching a speaker's voice with the person's photograph over 75% of the time', the existence of voice print companies, and the patterns of congruence and incongruence in the Taxi Driver commutation clip, not to mention the etymology of the word person[a] itself, all suggest that well-established patterns of linking voice with personality do exist and that such links can be verified intersubjectively in given cultural contexts. We shall shortly return to these links and to their usefulness in discussing the `meaning' of singing voices, the central point of this chapter, but it is essential to be aware of other approaches to the issue of describing vocal sound.

The `musical' properties of vocal sound, spoken or sung, can in general be understood and verbalised using one or more of three main perspectives: [1] the physical techniques of its production (poïetic perspective); [2] its measurable physical attributes as sound (acoustic); [3] its perception, interpretation and effects (aesthesic).

The poïetic perspective focuses by definition on how particular parts of the human body are used to produce particular vocal sounds, e.g. larynx, throat, mouth, jaw, tongue, nose, lungs, diaphragm, shoulders, chest, head. Recurrent concepts are breathing, control, projection and register (chest, mixed, head, falsetto). Now, as we shall see later in this chapter, the ability to reproduce, at least roughly, a vocal sound can help us understand its `meaning'. That's why some familiarity with the physical implications of the terms just mentioned can be useful in identifying the body posture (shoulders, chest, head, etc.) and facial expression (mouth, jaw, nose, etc.) most conducive to the production of a particular vocal sound. That knowledge in its turn contributes to insights about the emotional state of the person[a] producing the vocal sound in question.

The acoustic perspective focuses on the physical properties of vocal sound, i.e. on volume (dynamics, intensity), attack, envelope, decay, fundamental pitch, overtones (partials, transients, harmonics, sound spectrum), etc. The number of possible variations in these quantifiable parameters is virtually infinite; their combination forms the physical basis of the enormous variation of sounds that human voices can produce and of how those sounds are perceived. Differing values in different parameters produce the complex phenomenon we simply refer to as timbre and which Riemann, one of musicology's most notable father figures, in a moment of similar simplicity defined as `the quality which differentiates sounds of the same pitch'.9

Now, there is no room here to explain even the rudiments of acoustic physics in relation to the human voice and its perception. Readers are instead referred to a wealth of literature, some of it online, some of it dealing with correlations between the measurable physical properties of particular sounds and their perception.10 That said, basic awareness of parameters like fundamental pitch, overtones, intensity, attack and envelope11 can, by drawing attention to the physical properties of a particular sound, refine procedures of commutation (e.g. changing the tone spectrum to check on possible changes of perceived effect) and lead to greater precision of semiotic analysis.
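As a rough illustration of the acoustic perspective and of the kind of commutation just mentioned, the following Python sketch builds two tones on the same fundamental pitch but with different overtone strengths. It is not taken from the chapter: the function names, the 220 Hz fundamental, the amplitude values and the use of an amplitude-weighted mean frequency (a simplified `spectral centroid') as a stand-in for perceived brightness are all assumptions made purely for illustration.

```python
import math

def harmonic_tone(fundamental_hz, partial_amps, sample_rate=8000, duration_s=0.05):
    """Sum sine-wave partials at integer multiples of the fundamental."""
    n_samples = round(sample_rate * duration_s)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        samples.append(sum(
            amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, amp in enumerate(partial_amps)
        ))
    return samples

def spectral_centroid(fundamental_hz, partial_amps):
    """Amplitude-weighted mean frequency of the partials (crude brightness proxy)."""
    weighted = sum(amp * fundamental_hz * (k + 1) for k, amp in enumerate(partial_amps))
    return weighted / sum(partial_amps)

FUNDAMENTAL_HZ = 220.0          # hypothetical pitch, identical for both tones
MELLOW = [1.0, 0.3, 0.1]        # weak overtones (illustrative values)
BRIGHT = [1.0, 0.8, 0.6, 0.4]   # strong overtones (illustrative values)

# Same pitch, different tone spectrum: only the overtone weights change.
mellow_tone = harmonic_tone(FUNDAMENTAL_HZ, MELLOW)
bright_tone = harmonic_tone(FUNDAMENTAL_HZ, BRIGHT)

print(round(spectral_centroid(FUNDAMENTAL_HZ, MELLOW), 1))
print(round(spectral_centroid(FUNDAMENTAL_HZ, BRIGHT), 1))
```

Both tones share one fundamental and differ only in their partials, which is the situation Riemann's definition of timbre describes. Whether a higher centroid is actually heard as `brighter' or `harsher' is an aesthesic question that the numbers alone cannot settle.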

The third perspective is aesthesic, i.e. characterised by how sounds are perceived, interpreted, reacted to and used by those who hear them. Since, as repeatedly stated, this book is aimed primarily at music's users, I focus mainly on the aesthesic perspective and this chapter is no exception. So, I'll try, in what follows next, to sort out the various ways in which we seem to verbalise our perception of different vocal sounds. Then, after an excursion discussing basic differences between speaking and singing, the chapter will end with suggestions about how categories of vocal persona can be used in the semiotic analysis of music.

Aesthesic descriptors

Since 2006, I have, off and on, been trawling through websites looking for various combinations of voice, vocal or voiceover with words like quality, timbre, persona, personality and character. In addition to having annoyed students, friends and colleagues by asking them to describe voices to me, I have also taken an interest in vocal casting, a specialist profession in which verbal descriptions of voice play an essential part. For example:12

`Seeking voiceover talent who can recreate a female witch voice for an animated feature. The... project involves an English dub of a Russian animated feature... The witch is also very old, around 70.

Also seeking a counsellor voice. High pitched and whiny... This character is middle-aged.'

Here's a character description circulated by a Hollywood agency looking for computer game voiceover artists.13

`X is the comically annoying, shape-shifting spirit of an ancient Druid Priest who serves as a kind of guide to [the hero] throughout the ages, as well as being a bothersome pest. He pops up unexpectedly to give advice, frequently at less than opportune moments, although he basically means well. He has a sarcastic, dry wit and is an irritating, amusing, occasionally caring and sincere presence that [the hero] has little choice but to tolerate throughout time. Since he can become anyone or anything, he exhibits a wide variety of voices and personalities. [This character is] "a sophisticated elder" voice in the range of Sean Connery or Ian McKellan, as Gandalf in Lord of the Rings, with comedic undertones. Vocal Quality: should be older and wise-sounding, but also with a "Celtic"-type accent'...

That neither of these two adverts describes voice from the poïetic or acoustic perspective is hardly surprising since the jobs aren't for musicologists, singing teachers or acousticians. On the other hand, the paucity of aesthesic sound-descriptive words, given their denotative clarity and simplicity, does seem a little strange --just high-pitched and whiny for the counsellor and nothing else. Is this type of descriptor less relevant than others when advertising for a voice relating to a specific dramatic personality? To answer that question we need to explain our aesthesic voice-descriptive categories. These categories are based on observations made from: [1] student comments in popular music analysis seminars since 1992; [2] online descriptions of speaking and singing voices; [3] comments from a voice casting agent in direct response to specific questions (see p. 000). The table `Aesthesic voice description categories with examples' includes examples of descriptors from these three sources, arranging them in the following three principal categories:

  1. Sound descriptors directly denote perceived qualities of sound. Apart from the sample of adjectives listed in the table `Aesthesic voice description categories with examples', a large number of vocal verbs could be added to the list, including babble, bark, bawl, belch, bellow, bleat, blubber, boom, caterwaul, chant, chatter, chuckle, chirp, cluck, complain, croon, cry, declaim, denounce, drone, exclaim, gargle, gasp, giggle, growl, grumble, gurgle, hiccup, hoot, howl, hum, laugh, lilt, moan, mumble, mutter, proclaim, pronounce, quack, rasp, recite, roar, scream, screech, shout, shriek, sigh, snap [at], snarl, snigger, snort, sob, spit, splutter, squawk, squeak, stammer, stutter, twitter, ululate, wail, warble, weep, wheeze, whimper, whinge, whisper, whistle, whoop, yammer, yap, yell, yelp and yowl. The human voice, lungs, mouth and nose can between them make all those sounds.
  2. Transmodal metaphors like rough, smooth, velvety and gravelly connote sound on the basis of homologies from the other senses. These descriptors are like anaphones in reverse in that they denote mainly kinetic and tactile sensations that are transferred to the perception of sound.
  3. Persona descriptors seem, perhaps unsurprisingly after our account of links between voice and personality, to be the most common type of vocal characterisation. They can be divided into four subcategories.

 

Aesthesic voice description categories with examples14

1. Sound descriptors
    e.g. high-pitched, whiny; squeaky, booming, low-pitched, deep, full-throated, gravelly, gruff, breathy, husky, guttural, distinct, harsh, rough, indistinct, muffled, plaintive, rasping, roaring, shrill, stammering, loud, soft, quiet, monotone, lispy, foghorn, hoarse, throaty

2. Transmodal descriptors (anaphonic descriptors)
    e.g. sweet, smooth, rough, rounded, sharp, angular, velvety, scratchy, piercing, textured, clean, clear, shaky, wobbly, brassy, strained, grainy, gravelly

3. Persona descriptors

  3a. Named persons with distinctive voices
    e.g. Sean Connery or Ian McKellan; Clint Eastwood, the `Clint Eastwood IS Dirty Harry guy', The Smurfs, Donald Duck, Richard Attenborough, Orson Welles, Morgan Freeman, Billie Holiday, Elvis Presley, Kate Bush, Björk

  3b. Demographic
    e.g. female, male | very old, around 70, middle-aged, older; young | `Celtic' accent; African American, French, Asian, Southern [US], British, upper class, working class

  3c. Psychological, emotional traits
    comical, annoying, dry wit, sarcastic, bothersome, means well, caring, sincere, sophisticated | childlike, cute, cuddly, sweet, nice | wise, intelligent, controlled, well spoken, confident, regal | arrogant, dramatic, over-the-top, extravert | wilful, determined, courageous | energetic, flamboyant, bubbly, cheeky, cheery, coquette, jaunty, playful, keen, eager, sassy | interesting, complicated, quirky, eccentric, cartoony | hip, cool, sensual, seductive, sexy, friendly | vulnerable, embarrassed, scared, edgy, nervous, angry, frustrated, irritated, exasperated, bitter | dark, mysterious, introvert | sad, depressed, heartbroken, miserable, anguished | melancholy, bored, bland, nondescript, neutral | intimate, subdued, laid-back, relaxed, soft spoken, humble, simple, innocent | angelic, ethereal | raw, rude, tough, rugged, gritty, macho, aggressive | devious, slimy, nasty, evil, petty | sardonic, sarcastic, ironic, acerbic

  3d. Professions, roles and archetypes
    witch, counsellor, Druid Priest, guide, elder; bitch, little boy, little girl, heroine, hero, leading woman, leading man, mother, father, evil queen, loving mother, princess, vamp, villain, monster, alien, soldier, old wise man, big boss, fat cat, gangster, robot, sissy, miser, nervous teenager, Barbie doll, imp, suicidal student, lager lout, football hooligan, wiseguy, nerd, geek, evil child, dirty old man

Subcategory 3a, Named persons with distinctive voices, is often found in reviews, presumably to give readers an idea of what sort of vocal sound to expect from a record they have yet to hear. My unjustifiably disparaging remark that Portishead's Beth Gibbons, in Western Eyes (1997), sounds like an under-age Billie Holiday belongs to this descriptive subcategory.15

Subcategory 3b, Demographic descriptors, covers the gender and age, as well as the ethnic, cultural, social and economic background, of the vocal persona in question. These descriptors are very common in characterisations of both singing and speaking voices.

Psychological and emotional descriptors (subcategory 3c) are the most common of all. They are mainly adjectives qualifying the feelings, state of mind, attitude or morality of the vocal persona in question.16

Subcategory 3d, Archetypal descriptors, combines traits from all the other categories into archetypal or stereotypical units of personality, sometimes in the guise of professions (priests, teachers, etc.), more often as narrative roles (heroes, villains, victims, lovers, parents, sages, witches, wizards, fools, tricksters, etc.). This subcategory has obvious advantages and drawbacks. Consider, for example, the following extract from a review of Audio Bullys' 2005 album Generation.17

[T]he intro welcomes back Simon Franks' pot-smoking, pill-popping, wife-beating, bottle-lobbing, 'yes I do live on a council estate thank you very much', vocal persona...

Even though pot-smoking, pill-popping, wife-beating and bottle-lobbing may derive from the duo's lyrics, those epithets also connote the sort of voice most urban UK residents would associate with (male) slob behaviour (uneducated, careless, thoughtless, self-centred), not least because the activities of wife beating and bottle lobbing imply a specific (and severely impaired) emotional state as well as specific body postures, breathing patterns, etc.18 Restricting ourselves to words listed in the table `Aesthesic voice description categories with examples', it is much more likely that the vocal persona in question is loud and booming rather than soft or muffled, brassy rather than wobbly, working-class rather than upper-class, arrogant rather than humble, over-the-top rather than subdued, etc., in fact the sort of voice associated with soccer hooligans (typically loud, male and working-class) and lager louts (vocally similar to soccer hooligans but with bottle lobbing as a plausible additional trait).

The advantage of epithets like bottle-lobbing and lager lout is that they each encapsulate in a single concept a wide range of behavioural, psycho-social and vocal characteristics. The disadvantage is that epithets like lager lout are culturally restrictive: only those familiar with relatively recent phenomena in UK popular culture are likely to grasp the social and vocal implications of lager lout. As for the final epithet --the `yes I do live on a council estate thank you very much' vocal persona--, it would take another chapter to convincingly explain council estate and all its relevant connotations, yet another to provide a viable socio-linguistic analysis of `Yes I do live'... and the final `thank you very much'.19 In short, while the semantic efficiency of such epithets is undeniable within a restricted socio-cultural sphere, their connotations may well be meaningless to the rest of humanity, unless adequate equivalents can be identified in other cultural contexts.

Despite problems of cultural specificity, there is little doubt that aesthesic descriptors are in much wider general use than their poïetic or acoustic counterparts and that persona descriptors, especially the demographic, psychological and archetypal subcategories, are particularly popular. This observation is substantiated by a Hollywood professional specialising in vocal casting for videogames and animated productions for film and TV. Here are two abbreviated extracts from our email correspondence on the subject.20

What problems do [producers] have in describing the type of voice they want?

The biggest problem they have when they first contact me is that they... describe body type, hair color, [etc.]... I often need to ask more questions, such as age, accent, vocal quality, personality traits, quirks, and temperament...

How often do you/they refer to voices in terms of character archetypes?...

Almost always. Most frequently requested are little boy, little girl, 20s heroine, 20s hero, leading man, evil queen, villain, monster, alien, soldier, wise old man, big boss, fat cat, gangster .

Of course, none of the aesthesic vocal description categories discussed so far are mutually exclusive. For example, a particular kind of witch voice (descriptor category 3d) might also be described as high-pitched and cackling (category 1), scratchy and piercing (2), as sounding like an angry and evil (3c) eighty-year-old (3b) version of the Annette Bening character in American Beauty (3a). Moreover, many descriptors bridge two or more categories: rasping , for example, may be most commonly used to qualify sound (category 1), but the act of rasping (using a rasp as a coarse file in the original sense of the word) has as much to do with touch and movement (category 2) as with sound. Similar observations apply to words like scratchy, piercing, clean, shaky, strained and gravelly . Indeed the whole point of introducing the categories just mentioned is not to create some sort of watertight taxonomy --a fruitless task in view of music's synaesthetic properties (see `Cross-domain representation and synaesthesis')-- but to provide insights into the various ways that vocal sound is popularly perceived and described on an everyday basis. The aim of that exercise is in its turn to develop richer and more nuanced descriptions of what a vocal sound can communicate.

 

Vocal costume

`[C]lothing for a particular activity' or `an actor's clothes for a part' are, according to The Oxford Concise English Dictionary (1995), two common meanings of the word costume . With expressions like national costume, notions of group identity are added to the concept. In simple terms of perception, someone wearing a swimming costume is probably dressed for swimming (although it may be just a photo shoot), someone wearing the garb of a sixteenth-century Italian nobleman might be acting in Shakespeare's Romeo and Juliet (or just going to a fancy dress party), and a man in a tartan kilt and tweed jacket might have some intimate ties with the Scottish Highlands (or might be a complete fake). Costume is etymologically related to custom (`a particular established way of behaving') and semantically to the noun uniform , meaning `distinctive clothing worn by members of the same body', i.e. another type of costume signalling group identity.

Vocal costume 21 is a metaphorical expression meaning those aspects of phonation that serve the same three sorts of function as literal costumes do: [1] to more easily carry out a particular activity; [2] to assume a role or to act a part; [3] to signal a particular group identity and/or to conform to a given set of cultural norms. Vocal costumes are something people put on like clothes for any or all of the reasons just mentioned: they are used on an everyday basis in both speaking and singing, as the following discussion will hopefully illustrate.

Spoken costumes

Phone voices provide a rich resource for studying vocal costumes, most probably because talking on the phone involves a particular type of sensory dislocation. It's one-to-one audio close-up (if the line is good) but without the visual, kinetic and potentially tactile aspects of one-to-one close encounters. A phone call takes place in the intimate acoustic space determined by the minimal distances between earpiece and eardrum, between lips and mouthpiece. Like it or not, we are almost at sonic kissing distance from our telephonic interlocutor down the road or on another continent. Such sensory dislocation may be less problematic when phoning `friends and family' but it requires some corrective measures if we're on the phone to someone we don't know, perhaps talking to a representative of a large corporation or a public institution. It's mainly in these types of telephone encounter that vocal costumes come in handy.

When phones were a novelty in UK homes after World War II, many people of my parents' generation put on a special vocal costume when answering the phone. It was a more posh, more official-sounding voice whose diction, vowel sounds and intonation resembled that of BBC radio announcers or newsreaders of the day. These closely miked but widely broadcast official voices, by occupying the public space of the contemporary media, seem to have been taken to represent a sort of common ground for close-up speaking with which everyone was familiar. Of course, since this vocal costume was also that of the old British establishment, it was not the most comfortable clothing to wear and was usually dropped if it became clear that the person at the other end of the line was more `friends and family' than `authority'. Moreover, as with the increasing incongruity of using military marches for news and sports broadcasts in the UK (see Chapter 00), the old-establishment BBC voice later became an anomaly in the wake of socio-economic change leading to the use of other vocal costumes. Technological development played a central role in this process.

As the number of radio channels increased, and as TV and hi-fi recordings became part of both individual and domestic acoustic space, the repertoire of closely miked but widely disseminated voice types available for use as vocal costumes expanded radically. Consumerist propaganda, infamous for its just for you messages disseminated to millions of others belonging to the same target group, started to choose particular voice types corresponding to the intersubjectively verifiable values and desires of a particular demographic. Those voice types are often used today in automatic phone `dialogue' and `voice recognition' systems. Or, as one EU-funded eCommerce document puts it:22

`Advertisers adopt different strategies depending on the product they are selling and the intended audience. The same is true for creating automated telephone service dialogues.... Two of the [phone answering] personalities [`John' and `Kate'] were created with the intention that they would portray younger, more streetwise [bank] agents and therefore would appeal to younger users.'

This sort of vocal costume marketing has led to such cybernetic disasters as `Simone' (Virgin Mobile USA), `Claire' (Sprint), `Julie' (Amtrak) and `Emily' (Bell Canada). While each pre-programmed vocal persona sounds like an attractive, engaging, educated, helpful young woman, she turns out, in the reality of dialogue, to have the brains of a pea and the socio-linguistic skills of a drainpipe. However, so blind is the faith of corporations in the hocus-pocus of vocal pseudo-personalisation that they spend vast amounts of consumer money replacing fallible but relatively efficient humans with crude, incompetent, expensive and time-wasting machinery.23 That said, although `John', `Kate', `Simone', `Claire', `Julie' and `Emily' are no more than attractive vocal drapes covering lifeless dummies in a shop window, vocal costumes can serve some purpose, even inside the field of telephony, as long as no false claims about `interactive dialogue systems' are made. If, for example, you call Radio Taxi 8585 in Milan, the message telling you to hold so as not to lose your place in the phone queue has been recorded to sound like an attractive female secretary with a hint of coquettish fun in her voice. There's a hidden laugh of flirtatious complicity in her tone, or, as my friend Alessandra puts it:24

`It's as if she's saying to male customers "who knows what you and I could get up to while you wait?"... It's not the voice of a mother --that would sound too old-- or of a wife because that would be no fun. It's closer to the voice of an attractive and well-spoken lover... They assume of course... that most customers are men in need of flattery.'

Outside the weird world of brand-conscious, market-driven automated telephony, vocal costumes are simply a very real part of everyday life. For example, if you have to address a crowd of people and there is no microphone, or if you have to keep order in a primary school class, or if you have to make your bid heard in a capitalist casino like the New York Stock Exchange, you will have to put on a vocal costume to do your job and to avoid causing long-term damage to your larynx. Hopefully, you will change into a much softer, happier, more sing-song costume (`motherese') when you talk to your baby child, into something less lilting when you have to answer important job interview questions, into something more sincerely contrite yet competent when you have to explain why you are late delivering work to your boss, and so on. Or perhaps you are a psychoanalyst dealing with a highly strung patient, in which case you may want to put on your professional vocal valium costume and suggest `tell me how you felt about that' in a nice, relaxed manner. That way, your patient is less likely to throw a fit and, even if he/she does start screaming, at least you can keep your calm.

Attentive readers will already have noted that public speaking voice, primary school teacher voice, lilting parent voice (motherese), psychologist voice and nervous interviewee voice are all aesthesic vocal descriptors, more precisely persona descriptors designating professions, roles or archetypes.25 Those labels act as shorthand not just for a type of person (teacher, trader, psychologist, parent, etc.) but also for the type of voice associated with that type of person in particular circumstances. One final example of spoken vocal costume should clarify the issue once and for all.

Before I went googling for vocal persona-related concepts in 2006, I had never heard of the girlfriend voice . The online Urban Dictionary defines the phenomenon as `[t]he change in pitch or tone of a man's voice when talking to their significant other'.26 The dictionary continues:

`The girlfriend voice is characterized by a higher pitch and a more effeminate tone with speech patterns scattered with pet names and childish words. This type of speech is usually frowned upon when used in the presence of other men. When another man uses this voice they will usually receive a fair amount of ridicule...

When he answers his phone and it's a guy, he uses his normal voice, but when he sees that it's his girlfriend calling, his voice instantly climbs several octaves and acquires a whiny, please-don't-be-mad-at-me tone. He's also the kind of guy who, when he gets on the phone with his girl, immediately walks away from the group, leaves the room, or tells everybody to shut up so he can talk.'

Even if `several octaves' is an obvious exaggeration, this explanation of the girlfriend voice provides a clear example of all three functions of vocal costume. The girlfriend voice involves traits of phonation that firstly enable the man adopting it to more easily carry out a particular activity, in this case that of talking to his `significant other' in the way he imagines will please her. Secondly, the same man assumes the role and acts the part of boyfriend rather than that of `one of the guys'. Thirdly, he signals that he belongs to the social sphere of the couple by conforming to the cultural norms of conversation he considers appropriate for that sphere of interaction, even to the extent of walking away from the group, leaving the room and telling his male peers to shut up.

Sung costumes

Many vocal costumes used in singing relate to the first definition of costume in the sense of what you wear to more easily carry out a particular activity (the `swimming costume' function). Singing classical opera, for example, demands techniques of breathing, diction and phonation allowing the unmiked voice to cross the orchestra pit and stalls to reach listeners high up and far away in the opera house balcony. It can take years of conservatory training to master these `natural' amplification techniques.

lyrical, conversational, declamatory, dramatic soprano, operatic voice,

SPOKEN

almost all people who have rough voices are probably badasses (game character Auron in Final Fantasy X)

Spatial placement

- proximity, panning, still, moving

- acoustic spaces: the voice's, the setting's, relationship between the 2

Diction

- clear, crisp, blurred, slurred, indistinct, muffled

1. It is difficult to sound miserable with a cheerful face and with lively body movements.

2. These linguistically contradictory approximations of unequivocal musical mood are explained under `Synaesthesis and cross-domain representation'. That explanation starts: `Imagine, for example, the not uncommon state of mind characterised by a mixture of, say, irritation or resentment and the feeling that it is nevertheless a nice day and good to be alive. Using the linguistic domain, you could express this single dynamic state of mind directly to a friend, partner, child, parent, or to the authorities, telling them first how strongly you disapprove of their behaviour: you could start by speaking with sharp timbre and choppy delivery, then switch to a smooth, mellifluous voice. Using the fine motoric domain, you could frown then smile, tap your fingers nervously then flutter your eyelids encouragingly, grit your teeth then relax your mouth. Socially, you might want to avoid the people causing the irritation and then make efforts to welcome them into your company. Using the physical and gross motoric domains of representation to communicate your state of mind, you'd almost have to first beat up the person or people concerned, then caress or hug them. Emotionally, you'd probably want to first yell and stamp your feet, then sit down and relax; or perhaps you'd first tense your shoulders and clench your fists, then lean back, open your arms and show the palms of your hands.' It ends: `These platitudes about love and life serve merely to illustrate the fact that while language only occasionally lets us conceptualise dynamic states of being as integral experiences, music almost always does so. Feeling angry on a good day, or desperately troubled in the midst of calm and beauty, or totally sick of the world and feeling alive because of that disgust-- these are no more than pale verbal hints of just part of three of the innumerable kinds of dynamic mood categories that music can create. We should therefore not be surprised that respected critics can describe the same piece of music --in this case the first movement of Mozart's 40th symphony-- in terms of both "deepest sadness" and "highest elation". Was Mozart confused when he wrote the music? Probably no more so than usual. Does the music make a confused or contradictory impression? Certainly not to modern European ears: it's one of the most well-known, highly valued and widely covered pieces in the Viennese classical repertoire. Were the critics confused when they wrote about sadness and elation in the same breath about the same music? No again: they, too, were just giving pallid verbal hints of what they felt the music to be expressing...'

3. `All the animals come out at night' is another well-known quote from the film.

4. Monster and robot voiceovers work better if you manipulate the visuals (6:50-7:10).

5. SearchSecurity.com |searchsecurity.techtarget.com/sDefinition/0,,sid14_gci944937,00.html| [2008-02-11]. `Voice authentication products', the site goes on to inform, `are available from a number of vendors, including Vocent, Nuance Communications, Courion Corp., and VoiceVault'.

6. For documentary evidence of `voice recognition' incompetence see clip 00.

7. Cassell's Latin-English Dictionary, London, 1968. Also worth noting: the Ancient Greek for mask , προσωπεῖον, derives from πρόσωπον, meaning face or appearance (visual rather than vocal identity) and, later, person . Modern Greek's πρόσωπο only means person , with its derivatives προσωπικό ( personnel ), προσωπικότητα ( personality ), etc.

8. The first three comments were online at: [1] rollingstone.com/artists/chakakhan/albums/album/243746/review/5945280/chaka [2] furia.com/page.cgi?type=twas&id=twas0196 [3] whiteperil.com/posts/1093202710.shtm. The Buddy Holly comment is in Bradby & Torode (1984) and the Eminem description comes from one of my students in Liverpool (c. 1997). The Ronstadt words were at | superseventies.com/spronstadt.html |.

9. Dictionnaire de Musique (1913), Princeton University Music Department: silvertone.princeton.edu/~john/timbretimeline.htm [2008-02-21]. Maybe Riemann's definition should have included `intensity' (volume)?

10. bibliographical refs. Serge/Caroline, UUtrechtLOT, other online stuff.

11. See Chapter 7, `Musical parameters'.

12. voice123.com/lv/3093614.html [2006-04-20] (errors of English corrected).

13. Thanks to Dawn Hershey of Blindlight ( blindlight.com , December 2007), for invaluable help with charting voice-descriptive language in the casting profession. Thanks to Peter D Kaye (Santa Monica) for putting me in touch with Blindlight.

14. Words in small capitals are taken from the two voiceover adverts (see the advert starting `Seeking voiceover talent who can recreate a female witch voice for an animated feature. The... project involves an English dub of a Russian animated feature... The witch is also very old, around 70...', ff.).

15. Tagg & Clarida (2003: 456).

16. Ear-Nose-Throat specialists have documented numerous links between phonation and emotional state. For example, Deary et al. (2003: 374) describe how `[v]oice production is subject to and indicative of psychological status', while McHugh-Munier et al. (1997) conclude that `[e]motional state affects the physiological mechanism involved in phonation. Differences in acoustical parameters of the voice under stress', they add, `have been attributed to the coping mechanism used, which is based on the individual's perception of the situation.' Their study examines the relationship between coping strategies, personality, and voice in female subjects.

17. Review by Jamil Ahmad -- musicomh.com/albums/audio-bullys_1005.htm [2007-02-25]. Audio Bullys are a West London-based duo to whom a wide range of genre labels have been applied, e.g. dance, urban, garage, electro, pop, punk, etc.

18. This voice has much in common with the `narcissistic-aggressive' type described by A M Benis | narcissism.homestead.com/npatype.html [2005-04-23]|.

19. Here's a drastically simplified and pallid summary of implied connotations with whose `logic' I don't necessarily agree! Council estates are areas of low-cost rentable housing in the UK. Living on a council estate implies lower economic and educational status. Proclaiming `yes I do live on a council estate' implies being proud of not aspiring to higher economic or educational status. Adding `thank you very much' to the statement is ironic, implying that the interlocutor has a negative view of council estates and of those who live there. The speaker is, in short, `proud to be ignorant, happy to be a slob', etc.

20. Thanks to Dawn Hershey of Blindlight ( blindlight.com , December 2007).

21. It took a year of sporadic reflexion to come up with the term vocal costume. Vocal mould, uniform, habitus, template, etc. were all dumped for a variety of reasons.

22. Spotlight project 1999-10314: `Mass Market eCommerce Services using Multi-language Natural Spoken Dialogues', funded by the European Commission's Information Society Technologies (IST) | spotlight.ccir.ed.ac.uk/ [080224]| .

23. Bell launched `customer-service' `Emily' in 2003 to the tune of $10 million. See | http://www.speechtechmag.com/Articles/Editorial~Feature~Its-a-Persona,-Not-a-Personality-36311.aspx | and | tagg.org/zmisc/FidoCallTranscr.htm | [both 080224].

24. Thanks to Alessandra Gallone (Milan) for answering questions about prerecorded phone voices in Italy [080225]. `Stiamo cercando il vostro taxi. Restate in linea per non perdere la priorità acquisita' is what the flirting secretary voice says. Milan taxi customers can be excused for wondering what sort of `priority' entails waiting in line.

25. See the table `Aesthesic voice description categories with examples', subcategory 3d.

26. |urbandictionary.com/define.php?term=girlfriend+voice [080224]| .