


The Mass Media Music Scholars’ Press (MMMSP)
87 West Brookside Drive, Larchmont, NY 10538 (USA);
Department of Music and Music Technology,
University of Huddersfield HD1 3DH (UK)
© Philip Tagg, 2012-2013

Bibliographical data
Tagg, Philip
Music’s Meanings: a modern musicology for non-musos
New York & Huddersfield: The Mass Media Music Scholars’ Press, 2012
e-book version 2.4.2, 2013-05-14, 710 pages, ISBN 978-0-9701684-5-0
as first hard-copy edition, 2013-03-06, 710 pages, ISBN 978-0-9701684-8-1
First published (version 1.0) as e-book, 2012-09-26

music, musicology, musemes, analysis, semiotics, signification, sign type, connotation, denotation, logogenic, musogenic, communication, dual consciousness, history of ideas, epistemology, education, emotion, gesturality, intersubjectivity, intertextuality,
interobjectivity, interdisciplinarity, structural designation, aesthesis, poïesis, genre, style, extended present, expression, emotion, affect, metaphor, time, space, motion, touch,
texture, timbre, tone, rhythm, metre, speed, tempo, surface rate, periodicity, loudness, volume, voice, vocal persona, form, episodes, diataxis, syncrisis, anaphones,
etymophony, synecdoche, figure-ground, melody, accompaniment, muso, non-muso, film music, classical music, popular music.

Production and distribution data
Set in Palatino Linotype using Adobe FrameMaker 8 and Photoshop CS3, 10;
CaptureWiz Pro 4.5, FontCreator 6.1, Steinberg WaveLab 6, Videolan 2.0.3
on a Toshiba Satellite Pro laptop U500D (Windows XP)
Editing, indexing and page laying 2009-2012 by the author.
Proof read by Tommi Uschanov, Yngvar Steinholt and the author.

Hard-copy printing, binding, sale and shipping on demand directly through

e-book, hard-copy and further information available through

For Dave Laing and Simon Frith who told me a long time ago that I should write this book; and for my daughter, Mia, and all the other intelligent people who don’t know what a diminished seventh is but who are as passionate as I am about music and want to know more about how it all works.

MUSIC’S MEANINGS - a modern musicology for non-musos ―“also useful for musos” by Philip Tagg


I’m indebted to my teachers and mentors who encouraged me to make music and to think about music as if it really meant something other than itself. I’m thinking in particular of Jared Armstrong, Ken Naylor, Aubrey Hickman and Jan Ling without whom I would not have seen any of this through. I’d like also to thank Wilfred Mellers for having blazed a trail for musicologists interested in matters popular and semiotic, and both Line Grenier and Jean-Jacques Nattiez for taking the time to discuss important theoretical issues with me. Thanks also to Simon Bertrand and Shawn Pitre for having so generously and competently acted as my teaching assistants in Montréal, and to Margit Kronberg without whom I would never have dared make connections between music and so many ‘other sorts of something else’.

I want to thank those friends and colleagues who asked me ages ago to write this book, particularly Dave Laing (London) and Simon Frith (Edinburgh). Thanks also to Coriún Aharonián (Montevideo), Gillian Anderson (Boston & Bologna), Bob Clarida (New York), Bob Davis (Leeds & Huddersfield), Franco Fabbri (Milan), Serena Facci (Rome), Susana González (Mexico City), Stan Hawkins (Oslo), Markus Heuger (Köln), Bruce Johnson (Sydney), Peter D Kaye (Santa Monica), André Lambert (Montréal), Laura Leante (Durham), Fred Maus (Charlottesville), François de Médicis (Montréal), Morten Michelsen (Copenhagen), Richard Middleton (Castle Douglas), Piero Milesi (Mattarana), Yngvar Steinholt (Tromsø), Ola Stockfelt (Göteborg), Martha Ulhôa (Rio de Janeiro), Peter Wicke (Berlin) and Tim Wise (Salford) for input and feedback over the years.

I’m also grateful to graduate students past and present, in particular to Karen Collins, Laura Jordán, Serge Lacasse, Guillaume Samson, Luana Stan and Garry Tamlyn, as well as to other students (Alison Beck, Solène Derbal, Joanne Fellows, Anne-Laure Feron, François Gauthier, Marie Goffette, Hélène Laurin, Nicolas Masino, Jonathan Shave, Nick Thompson and others) for testing my methods to the limits and for producing excellent analyses. In fact I’m grateful to all students —in Göteborg, Liverpool, Montréal and elsewhere— who enrolled for one of my classes in either Popular Music Analysis or Music and the Moving Image, who introduced me to so much music I’d never heard before and whose insights, questions, frustrations, curiosity and enthusiasm taught me so much about how music communicates what to whom with what effect.

Sincere thanks go to Bob Davis and his family for their warm welcome and generous hospitality when I finally returned home after so many years working abroad and to Bob in particular for his patience, intelligence and experience when I felt unsure about what I was writing. Last but not least, thanks go to my daughter, Mia Tagg, for keeping me at least half sane and for just ‘being there’ wherever I was during the long trek of teaching, talking, thinking, learning, listening, playing, arranging, arguing, ranting, editing, composing, computing, travelling, reading and writing that culminated in this book. It has been a long journey.


Huddersfield, 26 September 2012.




Background 1: non-muso 3; Readership and aims 4; Background 2: muso 7; TLTT 17; Terminology 19; Overview of chapters 20; Appendices 24;

Publication issues 29; Formalia (typography, timings, etc.) 31

1. How much music? …35

Time budget 35 Money budget 37

2. The most important thing… 43

Definition and axioms 44 Conceptual comparisons 50

Evolution and development 54 Music and socialisation 58

Cross-domain representation and synaesthesis 62

A quick trip around the brain 68 Emotion, mood and metaphor 71

3. The epistemic oil tanker…83

Articles of faith and musical power agendas 84

Classical absolutism: ‘music is music’ 89

‘Absolute’ and ‘non-absolute’ 91 ‘Absolute’ and ‘arsehole art’ 94

‘Postmodern’ absolutism and text denial 101

Musical knowledges 115

Structural denotation 116 Skill, competence, knowledge 118

Notation: ‘I left my music in the car’ 121; Summary and bridge 130

4. Ethno, socio, semio…133

Ethno 133; Socio 137; Semio 145; Bridge 151

5. Meaning and communication…155

Concepts of meaning 155

Sign and semiotics 155; Semiosis: your aunt’s dog and a steel guitar 156

Semantics 158; Semiotics and semiology 159 ; First, second, third 160;

Icon, index, arbitrary sign 161; Denotation and connotation 164

Polysemy and connotative precision 167

Concepts of communication 172

Basic communication model 174 Codal incompetence 179

Codal interference 182 Representing immigrants 186

‘Somatic’ and ‘connotative’ 189 Summary 192

6. Intersubjectivity…195

Aesthesic focus 196; Ethnographic intersubjectivity 199; Reception tests 200

Unguided association 204 Classifying test responses 208

VVA taxonomy issues 215 Lissa and library music 222

Summary of main points 227

7. Interobjectivity…229

Intro 229; Basic terminology: Object and structure 230; Museme 232

Interobjective comparison 238

Collecting IOCM 241

Ask a musician 241; Caveat 244; Recommender systems 246 ;

The more the merrier 248; Reverse engineering 1: from IOCM to AO 249

Reverse engineering 2: recomposition 251

Commutation 253

Structural designation 256

Unequivocal timecode placement 256; Paramusical synchronicity 260

Summary of main points 260

8. Terms, time and space…263

About Chapters 8-12 263

Basic concepts (1): Genre and style 266; Paramusical expression 268

Parameters of musical expression 271

Basic concepts (2) 272 (incl. Piece of music, Extended present,

Note, Pitch, Tone, Timbre)

Time, speed and space 281

Duration 281

Micro-durations: 281; Meso-durations 283; Mega-durations 288

Speed 288

Tempo, beat and pulse 288; Surface rate 289; Harmonic rhythm 291

Rhythm and emphasis 291 Metre and groove 293

Space and aural staging 298

9. Timbre, loudness and tonality…305

Timbre 305

Instrumental timbre 306; Ethnic stereotyping 306 ;

Conventions of mood and style 307; Acoustic instrument devices 309

Effects and effects units 309 incl. Distortion, Filter, Modulation, effects.

Loudness 313

Pitch and tonality 315

Pitch 316

Melodic contour 318

Tonality 319

Tuning systems 321 Intervals 322

Tonal vocabulary: modes and keys 325

Structural theory 326 Mode and connotation 332

Melody 335

Tonal polyphony 337

Drone 337; Heterophony 338; Homophony & counterpoint 338

Harmony 339

Chord types and harmonic idiom 339; Chord progressions 340

10. Vocal persona… 343

Persona and vernacular sources 344

‘Don’t worry about me’ 345 ‘Are you talking to me?’ 347

Poïetic, acoustic and aesthesic descriptors 350

Aesthesic descriptors 353

Vocal costume 360

Spoken costumes 361 Sung costumes 364

Singing as costume 365 Suiting up for opera 369

Group identity costumes 371 Genre-specific vocal costumes 373

Grasping vocal persona 376

Vocal parody 380

11. Diataxis…383

Three types of ‘form’ 383 Diataxis in Fernando 386

Cyclical processuality 392

General diatactic schemes 395

Popular song 395

Chorus, refrain, verse, etc. 395 AABA (Chorus-Bridge) 397

‘Post-Yesterday’ diataxis 401

Extensional diataxis 404

Mega-duration 405; Sonata form and other euroclassical diataxis 409

Harmony as episodic parameter 414

Conclusions and questions 415

12. Syncrisis…417

Overview 418 Musogenic scenes 418

Density and sparsity 420 Singularity and multiplicity 424

Figure/ground = melody/accompaniment: musical monocentrism 425

Psychotherapists & the soundscape 433 ; Sonic power and subjectivity 435

Bikes, guitar distortion and heavy metal 436; Post-biker paradigms 442

Syncrisis and social anaphones 446 Participants, strands, layers 446

Syncritic organisation and social meaning 449

Solo, unison, heterophony, homophony and counterpoint 449

Cross rhythm 456

Sub-Saharan cross rhythm 456 Hemiola 457

Transatlantic cross rhythm 463 Funk and hocket 465

Group-type manifestations 467 Responsoriality 470

Figure-ground relativity 474

Secondary figures 475

Return to the scene 479 Summary in ten points 483

13. A simple sign typology…485

Anaphones 487

Sonic anaphones 487

Non-vocal anaphones 488 Vocal anaphones 489

Linguistic vocal anaphones 489 Paralinguistic anaphones 492

Tactile anaphones 494

Sensuous string pads 494 Rough and grainy 497

Kinetic anaphones 498

Gross-motoric, fine-motoric and holokinetic 499

Spatial anaphones 500 Gestural interconversion 502

Composite anaphones 509

Galloping 509, Stabbing 510, Newscasting 512, Social anaphones 514

Diataxemes 515

Episodic determinants and episodic markers 515

Unidimensional markers 516

Propulsive reiteration 518 Finality markers 520

Breaks 520 Bridges and tails 521

Diataxis (narrative form patterns) 522

Style flags 522

Style indicator 523 Genre synecdoche 524

So what? 528

14. Analysing film music 529

Invisible music 529 Course description 531

Before the analysis 534

History 534

Introducing concepts 541

Cheap tricks 542 Lissa’s film music functions 546

Other useful concepts 552 Bridge (scribal) 556

The analysis project 556

Overview and aims 556

Choice of film 558 Producing a cue list 558

Choice of analysis scene 562 Feedback 562

Written work 564

Preliminaries 564 Table of musical ideas 565

In-depth analysis 567

Diaboli in musica 567 Sick strings. 567

Graphic score 568 Discursive analysis text 572

General discussion of music throughout the film 573

Appendices, procedure, presentation, technical considerations 575

Too much? 577









Please be aware that hyperlinks, including links to other pages in the book, do not work in this specimen file.


SEX is as good a word as any with which to start this book. That’s not so much because it’s an obvious attention grabber as because Western attitudes towards sex share much in common with widespread notions about music: both are characterised by the epistemic dissociation of public from private. Since such dissociation lurks behind key issues addressed in the first part of this book I’d better explain what I mean.

No-one in their right mind would claim that sex, one of the most intimate aspects of human behaviour, has nothing to do with society because no society can exist without human reproduction and because different cultures regulate the relation between sex and society in different ways. Three simple examples serve to prove this obvious point. [1] Public celebrities (politicians, film stars, sports personalities, etc.) are often publicly censured for intimate behaviour relating to their private parts. [2] A wife who has extramarital sex in private can, in some societies, be legally stoned to death in public. [3] In the West we are often subjected to the public display of private sexual fantasies in adverts plastered on billboards, or broadcast to millions of TV viewers, all of whom have to hear intimate voiceovers breathing in their ears or to see extreme close-ups of body parts, all from the audiovisual perspective of a sexual partner in a private space and, at the same time, all mass diffused by cable or satellite.

Music also oscillates between private and public because musical experiences that seem intensely intimate and personal are often performed publicly or diffused globally. Media corporations rely on shared subjectivity of musical experience not just to sell as much of the same music to as many as possible but also to involve us emotionally in the films and games they produce, to help market the products they want us to buy, and even to sell us as a target group, defined by commonality of musical taste, to advertisers.

In contemporary Western culture the differences between private and public spheres in the fields of both sex and music involve a dual consciousness in that our sense of identity and agency in private is dissociated from whatever sense we may have of ourselves in the public sphere. Deep fissures can arise between how we see ourselves as sexual beings in private and how we respond to displays of sexuality in the media, just as our intensely personal musical experiences seem to be at the opposite end of the notional spectrum to all the technical, economic and sociocultural factors without which much of the music that so deeply moves us could not exist.

Having served its purpose to kick start the central issue of dual consciousness, sex can now be dumped and attention drawn to the rationale behind this book about music. Clearly, I must have thought there was a problem to solve, a lacuna to fill, or at least some error or half-truth to correct, otherwise I could have saved myself the bother of writing these words and you of having to read them. The point is that during my career in music studies I came to realise that the central problem in understanding how music works derives not from the dichotomies of private and public or of subjective and objective in themselves, but from the dual consciousness of individuals unable to link the two poles of those dichotomies. That is of course an epistemological observation. It means that over the years I’ve repeatedly found prevailing patterns of understanding connections between the various spheres of human activity relating to music to be inadequate. Now, if that’s supposed to be a reason for writing a book, it’s also a statement in need of substantiation. In Chapters 2-4 I present evidence supporting the statement. Here in this preface, however, I think it’s better to explore the problem from a more down-to-earth and personal perspective.

Background 1: non-muso

Before concretising this book’s rationale let me first explain what I mean by muso and non-muso. I use muso (without the non-) colloquially to denote someone who devotes a lot of time and energy to making music or talking about it, especially its technical aspects. A muso is in other words someone with either formal training in music, or who makes music on a professional or semi-professional basis, or who just sees him/herself as a musician or musicologist rather than as a sociologist or cultural studies scholar. Non-musos are simply those who don’t exhibit the traits just described and it’s they who feature in this book’s subtitle. The obvious question is why I as a muso think I both can and ought to write about music for non-musos.

The basic idea behind this book started to take shape in the early 1980s when music videos, cable TV, and academics specialising in popular music were novelties. That bizarre conjuncture was, I suppose, one reason why I was asked on several occasions to talk about music videos, a topic on which I’ve never been an expert. The invitations came mostly from people in media studies, linguistics, political science and the like, more rarely from fellow music educators or scholars. Those colleagues in other disciplines seemed to find music videos problematic because, if I understood them rightly, standard narrative analysis was unable to make much sense of audiovisuals that clearly spoke volumes to their (then) young MTV-viewing students. Some of those non-muso teachers had of course deduced that pop video narrative made a different sort of sense when it functioned as visualised music rather than as visual narrative with musical accompaniment. Those colleagues, all qualified to talk about socio-economic aspects of music and about Hollywood film narrative, seemed in other words to be asking me, a musicologist, to help solve epistemological problems relating to music as a sign system.

Aware of musicology’s embarrassing inability at that time to help fellow educators and scholars outside our discipline solve an important problem, I have to admit that, faced with the task of deconstructing musical narrative for non-musos and their students, I felt at the best of times like the one-eyed man (with severely impaired sight to boot) in the land of the blind. Since then I’ve acquired partial vision in the other metaphorical eye. That slight improvement means I think I can now see enough, however blurred, to write this book, a task I wish were unnecessary and which I wouldn’t have undertaken if I didn’t think music was important. Trouble is that, judging from music’s humble status in the pecking order of competences housed in most institutions of learning, it’s all too easy to believe that maths, natural sciences and language must all be more useful than music whose pigeonholing as art or entertainment implies that it’s little more than auditory icing on the cake of ‘real knowledge’. As we’ll see in Chapters 1-3, everyday extramural reality tells quite a different story.

Readership and aims

Although this book will hopefully also interest musos, it’s primarily intended for people like Dave Laing, Simon Frith, my daughter and the teachers just mentioned, i.e. educated individuals without formal or professional qualifications in music or musicology —‘non-musos’— who want to know how the sounds of music work in the contemporary urban West. It’s for those who want to understand: [1] how music’s sounds can carry which types of meaning, if any; [2] how someone with no formal musical training can talk or write intelligently about those sounds and their meanings. To cover that territory in a single book, simplifications and generalisations will be unavoidable. At the same time, in order to make sense of the territory, it will also be necessary to summarise basic tenets of music’s specificity as a sign system and to defuse such epistemic bombs as absolute music and music as a universal language (Chapters 2-3).

This book will not tell you how to make music, nor does it provide potted accounts of composers, artists, genres or of the music industry; nor will it be of any use to students cramming for music theory or history exams. It certainly won’t help you bluff your way through conversations about jazz, folk, rap, rock, dub step, classical music or ‘world music’. And under no circumstances whatsoever will it claim the superiority of one type of music over another: there is plenty of literature of all the types just mentioned. This book’s job is to present, without resorting to more than an absolute minimum of musical notation and in terms accessible to the average university student outside music[ology], ways of understanding the phenomenon of music as a meaningful system of sonic representation.

The appearance of this book is further motivated by factors linked to the emergence of popular music studies as a field of inquiry in higher education. The majority of scholars in this field have tended to come from the social sciences and the non-muso humanities (communication studies, cultural studies, film studies, political science, sociology, anthropology, cinema, literature, etc.) rather than from departments of music or musicology. Like the teachers flummoxed by pop video narrative in the early 1980s, these colleagues have understandably tended to steer clear of the music in popular music, leaving an epistemic void which musicologists have only recently started trying to fill. Since the early 1980s, when I conducted reception tests on title tune connotations and, more notably, since the 1990s, when I started teaching popular music analysis to students with no formal musical training, I’ve seen repeated proof of great musical competence among those who never set foot inside musical academe. It’s a largely uncodified vernacular competence that has with few exceptions been at best underestimated, more often trivialised or ignored, not only in conventional music studies but also by those individuals themselves. This kind of competence is discussed in Chapter 3 and used as one starting point for the method and analysis sections in this book.

It would at this stage be fair to ask, given ‘musicology’s embarrassing inability… to help fellow educators and scholars outside [the] discipline’, how a musicologist, with all the baggage of that discipline, can possibly explain anything useful about music to non-musos.

Although initially trained as a musician and composer, my involvement in popular music studies, including music and the moving image, has brought me into contact just as much with non-musos as with fellow musicians and musicologists. That contact with non-musos ought, I hope, to have taught me enough to know what sort of things need explaining about the specifics of music as meaningful sound to those who have heard, enjoyed or otherwise reacted to it but who aren’t specialists at making it or verbalising about how it’s made. Nevertheless, since it’s impossible to gauge each reader’s prior knowledge in or about music, I have to apologise in advance if I misjudge the reader’s intelligence or musical competence. I must also apologise to any muso readers if, in the interests of a projected non-muso readership, I oversimplify the complexities and subtleties of music making. With those two caveats out in the open, I have to mention a third risk of misunderstanding, particularly about the first part of this book (Chapters 1-5).

If one of the book’s aims is to help seal the epistemic fissure of dual consciousness in relation to music, then I will, like it or not, have to visit areas of knowledge in which I myself have no formal training. The trouble is that the notional gaps between music as subjective experience and everything else to which it’s clearly related are more likely to be exacerbated than healed by disciplinary boundaries institutionally delineating distinct areas of competence. This means that if, as a muso, I cross the border into, say, sociology, semiotics, neurology or communication studies, I risk offending specialists whose institutional territory I enter without the mandatory visa of disciplinary competence. In such instances I can only apologise and beg authorities in the territory I am judged to have violated to treat me no worse than they would an uninformed but inquisitive tourist with honourable intentions. Notwithstanding that apology, it might be more constructive to interpret at least some of my ‘illegal entries’ in terms of a naïve but potentially useful challenge to the foreign discipline. After all, challenges in the opposite direction —against music studies from the non-muso ‘outside world’— inform many of this book’s key issues.

Background 2: muso

When, as described above, those non-muso teachers asked me to explain how the music in pop videos worked they were indirectly questioning ‘my’ discipline. They seemed to be assuming that musicology could come to the rescue at a time when the discipline rarely showed interest in either popular music or matters of musical meaning. Their assumption could in that sense be considered naïve because it didn’t account for the institutional reality of conventional musicology; but it also indirectly and, I believe, justifiably questioned our discipline’s usefulness and legitimacy. Be that as it may, their non-muso assumption about what musicology ought to be doing resonated with my own misgivings about the discipline, particularly in terms of its apparent reluctance to deal with matters popular or semiotic. My questioning was different from theirs only in that it derived, as I see it, from mainly muso experience. That experience is worth recounting for several reasons. [1] It helps me retrospectively sort out key events influencing my involvement in and ideas about music. [2] Some familiarity with that process makes my personal and ideological baggage more transparent to readers who can then ‘see where I’m coming from’ and apply whatever filter seems appropriate to any passage with which they may disagree. [3] The account that follows also illustrates central problems in the epistemology of music and partially explains why this book has been such a long time in the making.

Brief muso autobiography

I can’t have been much older than four when I first registered that music was as sound connected to things other than itself. I remember bashing clusters on the top notes of a piano and screaming ‘lightning’, then thumping a loud cluster on its lowest notes and yelling ‘thunder’ as I sat under the keyboard in delighted trepidation at the threatening sounds I’d produced. Not even then (1948) did I actually believe that the top notes ‘were’ or even ‘meant’ lightning and the bottom ones thunder, although I might well have said so if asked, but I was even then clear that the high sounds could not possibly be linked to thunder and that the low ones were unthinkable in terms of lightning. Having patiently put up with this sort of cacophony on the piano for a year or two, my parents decided, for the sake of the family’s sonic sanity, that I should be given piano lessons.

In 1952, aged eight, I was blessed with a piano teacher, Jared Armstrong, who, identifying the motoric torpor of the fingers on my left hand, looked out of the window at snow falling from a grey sky and jotted down an eight-bar piece called North Street in a Snow Storm, complete with a mournful melody to exercise my left hand and bare, static sonorities to occupy the right. In the summer he swapped my hands around in By the Banks of the Nene, another eight-bar mini-piece which this time featured a quasi-folk tune in the right hand and a static bagpipe-like drone in the left. As with the thunder and lightning, I didn’t think North Street in a Snow Storm ‘was’ or even ‘meant’ a snowstorm in the street outside our house any more than I believed the banks of our local river to actually ‘be’ in By the Banks of the Nene. I just instantly recognised the sort of mood my piano teacher had intended to put across and was in no doubt whatsoever as to which title belonged to which piece. I knew in other words that the pieces neither sounded nor looked like what their titles denoted, but I did think they sounded like what it might feel like to see or to be in the scene designated by each title, even though I was obviously incapable at that age of distinguishing, albeit in such simple terms, between that type of connotation and other sorts of signification.

One year later I had to take lessons from a different piano teacher who made me sit national piano exams for which I had to prepare pieces drawn mainly from the euroclassical repertoire. Then, aged twelve, I was awarded a music prize. It was in front of the whole school that a local classical music celebrity presented me with a cloying biography of Mozart the Wunderkind and made a short speech in which he seemed to imply that the tiny classical parody I’d recently written was something of which the young Mozart would not have been ashamed. Well, Mozart might not have been but I was. That the local celebrity had mistaken my facetious parody for a straight style composition was one thing; worse was the resentment I felt, caused partly by the Mozart book prize and partly by the local celebrity’s words, at being compared to a sad freak in a powdered wig who used boyish charm and pretty music to ingratiate himself among doting rich-and-famous grown-ups in late eighteenth-century Austria. It struck me that classical music’s local representatives —my piano teacher, the celebrity dishing out the prize, etc.— were treating me too as a precocious freak, perhaps hoping that, if flattered enough at regular intervals, I’d join their ridiculous ranks, and, like an obedient dog, perform more musical tricks for them. In retrospect I suppose that recruiting another circus animal might have helped boost their credibility in the artistic talent stakes of their own social aspirations, but at the time I felt angry and insulted. Wanting no part in their weird world I resolved to outrun everyone both in the 200 metres and on the rugby pitch, to go for longer bike rides, and to devote myself at the earliest opportunity to music that seemed to actually work, that had some real use and that didn’t ‘ponce about’.

As luck would have it my next teacher, Ken Naylor, held no fascination for freaks. He was an accomplished pianist, composer and church organist who ran choirs and orchestras with great skill, who wrote mean close-harmony arrangements and who taught me how to play jazz standards. He encouraged me to compose and improvise, and introduced me to Bartók, Stravinsky and Charlie Parker, as well as to the anthems and madrigals of Elizabethan composers. As my organ teacher, he also made me transpose hymns into more manageable keys for the congregation, and encouraged me to change their harmonies in the last verse to add a bit of drama to the drab routine of daily prayers. He even helped me overcome my Mozart trauma by drawing attention to the composer’s ability to transform ‘prettiness and wit’ into passages of wondrously disturbing regret. Ken Naylor’s professional eclecticism was living proof that no type of music could be seen as intrinsically superior or inferior to another, and that music learnt and produced by ear was just as legitimate as what you played or sang from notation. Of more obvious direct relevance to the analysis parts of this book were his practical demonstrations of relations between music as sound and ‘something other than itself’, most strikingly the word-painting skills I learnt from him when accompanying hymns in the school chapel.

Following through on the vows I’d made aged twelve, I joined a trad jazz combo while still at school and later, at university, a Scottish country dance outfit and an R&B/soul band. In those three ensembles, as well as in other non-classical groups I subsequently worked with, I was the only member with any formal musical training. Being in the minority, I had to curb my specialist tongue whenever we needed to discuss the sorts of sound we wanted to make. Fortunately, verbal denotation of musical structure was rarely necessary because differences of opinion were almost always settled practically using actual or imagined sound to compare musical idea x with alternative y. At no time did I ever think that my fellow band members’ lack of formal vocabulary denoting tonal structure meant that their musical skills and knowledge were in any way less valid or less systematic than those I had learnt in formal studies of the European classical repertoire. On the contrary, it soon became clear that the arsenal of structural terms I’d had to acquire in order to obtain a B.A. in music was quite inadequate, not least when it came to issues of rhythmic/motivic bounce and drive (as in grooves and riffs), even more so when denoting the details of timbre so important in so many types of popular music.

It also became clear that I was inhabiting at least two different sociomusical worlds with different repertoires, technologies, functions, values and modes of metadiscourse. However, I never really believed that I was myself living two musically separate lives. True, the institutional and social dividing lines between the official version of euroclassical music and all the other musics with which I’d come into contact were real enough; but just as my personality remained basically intact when I learned to speak other languages, I felt I was the same musical person regardless of whichever musical idiom I happened to be playing in or listening to. The problem, I insisted perhaps arrogantly, was not with me but on the outside. If that were so I would, in the social reality outside my head, so to speak, have to confront one sphere of musical activity with another. That sort of confrontation involved not only efforts to persuade fellow rock musicians to join me at a performance of Bach’s Matthew Passion and fellow euroclassical music students to listen to my Beatles tapes; it also involved developing verbal discourse, comprehensible to members of whichever group I was arguing with, that could explain in their terms the expressive and creative qualities of whichever music was unfamiliar in their sociomusical sphere. This stubborn insistence, inspired in no small part by Ken Naylor’s living proof of musical eclecticism’s obvious advantages, meant that I acquired practical training in verbal mediation between musos and non-musos, rockers and jazzos, classical buffs and pop fans, etc. That practical training was also useful preparation for writing this book.

The sort of confrontation just described seemed in general to go down better with popular music acquaintances than with their euroclassical counterparts. One probable reason, I think, is that the former had nothing to lose in opening up to the latter whereas those whose career, mortgage payments or meta-identity depended on attaining or maintaining a higher sociocultural status did. As explained in Chapter 3 the classical music = high class equation was fuelled by the metaphysical aesthetic of ‘absolute music’ which, by theoretically locating the most noble of musical experiences outside the material world, enabled the privileged classes not only to feel culturally superior by appearing to transcend mundane material reality but also to divert attention from the fact that it was they who wielded the real power actually in the material world. Given that no-one likes losing their privileges, even if (or perhaps especially if) they are illusory, it was in retrospect naïve of me, if not plain stupid, to expect those with a vested interest in maintaining the absolute music aesthetic as part of the classical = class equation to recognise equal value in other musics or to welcome the discussion of music as if it meant anything except itself. The difficulty was that the world of euroclassical music, as I knew it in 1960s Britain, was highly contradictory about these matters.

While I knew very well, from working at the Aldeburgh Festival and from frequent visits to evensong at King’s College Chapel (Cambridge), that euroclassical music was often performed with great expressivity (‘as if it really meant something’), the music degree programme I followed at Cambridge focused mainly on technical and archival tasks. We had to ‘complete this motet in the style of Palestrina’ without considering the expressive imperative of words like crucifixus or resurrexit, to decipher lute tablature without sparing a thought for Dowland’s word painting, and to write essays about Wagner without linking his work to the moral, philosophical or political ideas of the composer or his times. None of it seemed to make any sense. Meanwhile I carried on gigging sporadically with the R&B band in pubs, in clubs and on student dance nights, performing numbers like I’ll Go Crazy, Walking The Dog and Route 66. That sort of musical activity, on the other hand, made very obvious social sense to me.

It was with relief that I left the Renaissance theme park of Cambridge in 1965 to do a teaching diploma in Manchester where the pragmatics of music education, including its social implications, were clearly on the agenda. It was at the height of the pop boom in northern England and I was encouraged to submit an end-of-year mini-thesis about the possible uses of pop in music education (Tagg, 1966). I also managed, during my teaching practice, to keep a class of usually rowdy pupils quietly and enthusiastically occupied writing horror film scenarios following the third movement of Bartók’s Music for Strings, Percussion and Celesta. Fourteen years later Stanley Kubrick repeated the same exercise, using the same music to underscore three scenes in The Shining (1980). If it was OK for Kubrick to link music and picture in that way, I argued retrospectively, it can’t have been wrong for me or my pupils to have tried our hands at it, even if the scenarios we produced were nowhere near as good as Kubrick’s. It was in any case more grist to the mill of linking music to other things than just music, and it was further evidence of unquestionable musical competence among a non-muso majority that included both Kubrick and my secondary school pupils.

Despite considerable encouragement from my supervisor for what must at the time have seemed quite bizarre ideas for music education, other end-of-year examiners were more conservative and predictable. They seemed to dislike my lack of enthusiasm for subjecting boys aged thirteen through sixteen to intensive vocal training and they disapproved of my reluctance to make proper use of the school’s Orff instruments. Then, when I looked in the Times Educational Supplement for music teaching jobs, my heart sank deeper as I discovered I’d be expected to run recorder groups in one school, enter pupils for Associated Board exams in another, to teach piano and at least one wind instrument in a third, and so on. I had to conclude that there was no job in education for someone passionate about the popular and semiotic sides of music, plenty for those plodding down the same old path of performing the classics. That’s why I dumped music education as a career option and took a job in Sweden teaching English as a foreign language, keeping music on as just a hobby (1966-68).

I was much happier with music on the sidelines, so, after two years at my new job in Sweden, I decided to retrain as a language teacher (1968-71). I enrolled at the University of Göteborg and changed my musical sideline from being in a rock band to singing in a choir. Now, one of the altos in the choir (Britt) was married to a man called Jan Ling, who had recently been asked by the Swedish government to set up a new music teacher training college. Ling told me that popular music would be on the curriculum and that I was the only person he had met with the triaxial profile: [1] degree in music, [2] teaching diploma, [3] experience of making popular music. When asked to teach some music analysis at the new college in 1971 I leapt at the opportunity. I was eager to try out ideas that had lain dormant since abandoning music as a career option, but I soon ran into difficulties.

The main problem was that the ideas I had about ‘meaning’ in popular music were mostly intuitive, informed by music-making experience, not by any process of analytical reasoning. I had no coherent theory codifying that intuitive knowledge and only very patchy empirical evidence of structural aspects relating to musical semiosis in any shape or form. It was clear that if those ideas were to be of any use in education, they would have to be tested in various ways until viable patterns started to emerge that in the longer term might together constitute an at least partially coherent body of theory and method. Most of the initial testing took place in analysis classes where the students’ recurrent mistakes, questions and insights forced me to formulate potentially useful patterns of analytical theory and approach. So, armed with my own experiences of music and music making, with comments and questions from music students, with Dave Laing’s appeal for ‘a semiotic dimension’ to the study of popular music (Laing, 1969: 194-6), and with a few rudimentary concepts imported into musicology from hermeneutics and semiotics, I ended up producing a doctoral thesis in 1979 about the meanings of the title music to the TV series Kojak.

The Kojak thesis generated plenty of encouraging reactions but it was also criticised for concentrating on one single piece of music and for its lack of empirical underpinning. That’s why, in the 1980s, I conducted numerous reception tests on ten title tunes (not just one) and, with Bob Clarida’s help, started dealing with response data, transcriptions and musical analyses. The idea was to investigate listener responses in relation to structural elements in the ten theme tunes and, in the process, to thoroughly test, fine-tune and improve the analytical methods proposed in the Kojak thesis. Due mainly to the wealth of listener responses and their often complex connection to the musical structures eliciting them, Ten Little Title Tunes (TLTT) proved to be a mammoth undertaking. In addition, logistical factors, including full-time teaching commitments, the academic imperative to publish a yearly quota of articles or die, and moving continents, meant that the 914-page book was not completed until December 2003.

Even though I’d been encouraged, at various points during the 1980s and 1990s, by respected friends and colleagues like Dave Laing and Simon Frith to produce a book like this one, and even though I’d been approached by a respected publishing house interested in a book with the working title Music’s Meanings, I felt unable to start work on it before completing TLTT (Ten Little Title Tunes). It just didn’t feel right to write, let alone publish, Music’s Meanings until the theory and method I wanted to propose in it had been tested. TLTT documents that process of testing in considerable detail. It’s often used as a source for ideas and information in this book (see p. 17).

Just as important in laying the groundwork for this book are all the students who since 1971 attended my analysis classes. Between October 1993, when non-musos first joined my MA seminar in Liverpool, and December 2009, when I retired, I spent over 2,000 hours teaching some sort of semiotic music analysis to around 800 students. That means lots of analyses marked, lots of questions asked, lots of discussion and lots of opportunity to observe which ideas and methods caused problems or led to good results. Much of this book relies heavily on that teaching experience and on the lessons I learnt about what did and did not work, what was unnecessary, what needed clearer explanation, etc.

This book also draws on decades of having to confront ‘received wisdom’ about music and musical learning. I’m referring to various taboos and articles of faith according to which music is considered as an almost exclusively subjective, magical and irrational phenomenon of human experience that needs to be kept in a conceptually separate compartment from any systematic or rational notion of how knowledge and meaning are created and mediated. My personal credo is that failure to be rational and objective about what is habitually pigeon-holed as irrational and subjective is tantamount to intellectual treachery in a culture and society which exploits our dual consciousness for short-term goals of political or financial gain. Therefore, in order to prepare the way for the sort of theory and method presented in Chapters 6-14, I have to examine, explain and deconstruct the articles of faith which have for such a long time obstructed the development and spread of viable and democratic ways of talking about music ‘as if it meant something other than itself’.

In short, extensive testing of analysis procedures in the classroom and repeated exposure to ‘received wisdom’ about music meant that I felt confident enough in 2007 to start work on this book so that the background, theory and practice of those analysis procedures could be presented to a wider public.


TLTT (Ten Little Title Tunes; Tagg & Clarida, 2003) is a 914-page tome to which I often refer in this book. To avoid having to explain the rationale and procedures of TLTT each time, here’s a brief resumé of information relevant to its use in this publication. My back cover sales pitch for TLTT included the following statements.

‘[TLTT] documents the associations of hundreds of respondents to ten extracts of music, each heard without visual accompaniment but used… as film or TV title music. It deals with links between listener connotations and musical structures in the global, Anglo-US-American mass-media culture of the late twentieth century, analysing musogenic categories of thought which own serious ideological potential.

Under headings like Minor Amen and crisis chords, Sighing sixths and sevenths, Country & Latin clip-clop, Big-country modalism, Ethnic folk lutes, anaphonic telegraphy, Busy xylophones and comic bustle, The Church of the Flatted Fifth and P.I. Cool, Latin percussion and eye shadow, etc., [TLTT] reveals how notions of gender, love, loneliness, injustice, nostalgia, sadness, exoticism, nature, crime, normality, urgency, fashion, fun, the military, etc. are musically mediated.’

The basic story is that between 1980 and 1985, and for methodological reasons already mentioned (p. 15), I played the ten title tunes to individuals attending one of my lectures or seminars. Most of the 600-odd respondents subjected to this exercise were Swedish, but the tunes were also tested on 44 Latin Americans. Many respondents were students still in, or who had recently left, tertiary education, some were in secondary education, others in adult education. The representation of men and women as well as of musos and non-musos was roughly equal. The basic reception test procedures, including their construction, implementation and result classification, are described in Chapter 6.

TLTT involved a lot of statistical and analytical donkeywork. Since one main aim was to find out how much of what respondents imagined as associated with what in the ten pieces, each tune had to be painstakingly transcribed, not to mention all the relevant bits of interobjective comparison material (IOCM), and responses had to be grouped in categories so that, for example, the number of men or women imagined in connection with one tune could be reasonably compared with the number of men or women associated with another. That comparison provoked an enlightening but disturbing discussion of the representation of male and female through music. Suffice it here to say that response statistics from TLTT cited in this book can be interpreted using the following example.

Over 50% of respondents mentioned something in either of the categories love or male-female couple on hearing the first tune in the test battery. Bearing in mind that the average number of concepts reported per person per tune was greater than three and that the test was one of unguided association, 50% is a very high score indicating that every other respondent independently chose to write down words like love, romance or couple on hearing the piece —and that’s excluding responses like stroking, floating, slow motion, embracing, kissing, dreaming and wondering. Associations in the campestral category (grass, meadows, fields, etc.) were also common (15%), as were responses like walking through/over/across the scene (14%), in spring or summer (13%), some time in the nineteenth century (8%), most likely somewhere in Northwestern Europe (5%), definitely not in Asia, Africa or anywhere on the American continent (all 0%). Nor were any detectives, spies, cowboys, villains, crime, streets, disorder, or modern times mentioned by anybody: there was nothing fast, cosmic, urban, inimical, threatening, eruptive, conflictive, military, asocial or anything else of that type evoked by one or more of the other nine pieces, in any respondent’s imagination on hearing the piece. The percentages simply represent the probability of any of the individual test subjects coming up with a particular connotation in unguided response to one of the ten test tunes, or of mentioning a connotation subsequently classified in one of the categories listed in the VVA taxonomy shown as Table 6-1 (p. 209, ff.).
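
The arithmetic behind percentages of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for TLTT: the function name category_percentages, the category labels and the four toy respondents are all invented for the purpose. The sketch assumes that each respondent’s associations to one tune have already been classified into categories, and that a percentage is the share of respondents who mentioned at least one concept in a given category.

```python
from collections import Counter


def category_percentages(responses):
    """Return {category: percentage of respondents mentioning it}.

    `responses` holds one collection of category labels per respondent
    for a single tune; a respondent counts at most once per category.
    """
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter()
    for labels in responses:
        counts.update(set(labels))  # de-duplicate within one respondent
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Invented toy data: four imaginary respondents, NOT figures from TLTT.
toy_responses = [
    ["love", "campestral"],
    ["love", "love"],  # two 'love' words still count as one respondent
    ["walking"],
    ["campestral", "season"],
]

percentages = category_percentages(toy_responses)
print(percentages["love"])
```

Counting each respondent only once per category is what allows a figure like 50% to be read as ‘every other respondent’, however many related words any one person wrote down.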


To avoid unnecessary confusion I’ve tried as much as possible to stick to established concepts and definitions when writing this book. The only trouble is that established terminology is sometimes the cause of confusion, not its remedy. This is partly true for semiotic concepts in need of adaptation to specifically musogenic types of semiosis, whence neologisms like anaphone, genre synecdoche and transscansion (see Chapter 13 and Glossary). Much more serious is an embarrassingly illogical and ethnocentric set of key concepts used in conventional music studies in the West to denote musical structures bearing on the organisation of pitch. I’ve dealt with these issues in ‘The Troubles with Tonal Terminology’ (Tagg, 2011f) and suggested more adequate definitions of words like note, tone, tonality, mode, polyphony and counterpoint. The most important of those clarifications are summarised in Chapter 8 (p. 272, ff.).

Just as problematic is the notion of form which in conventional music theory means the way in which episodes (sections) in a piece of music are arranged in succession into a whole along the unidirectional axis of passing time. That is indeed one aspect of musical form, but there is another, equally important ―and arguably more fundamental― aspect of form which seems to have largely escaped the attention of conventional musical scholarship. I’m referring to ‘now sound’ as form created through the arrangement of simultaneously sounding strands of music into a synchronic whole inside the extended present (p. 272, ff.). Without the shape and form of those batches of ‘now sound’, the conventionally diachronic aspect of musical form cannot logically exist. It’s sometimes called ‘texture’ but that’s only one aspect of synchronic form. Obviously, if both types of form constitute ‘form’, other words are needed to distinguish between the two. To cut a very long story short, I was unable, after extensive investigation and epistemic agonising, to find any adequate conceptual pair of labels to cover the essential distinction between those two types of musical form. I had no alternative but to introduce the terms diataxis to denote the diachronic and syncrisis the synchronic types of musical form. The two concepts are explained in a little more detail at the start of Chapter 11.

Overview of chapters in Music’s Meanings

This book falls roughly into two parts. Part 1, ‘Meanings of “music”’ (Chapters 1-5), clears the conceptual and theoretical ground for the bulk of the book in Part 2, ‘Meanings of music’ (Chapters 6-14), which focuses on analysing music ‘as if it meant something other than itself’ and on the parameters of musical expression.

Part 1 —Meanings of ‘music’?

Chapter 1 —How much music? (pp. 35-41)— estimates the importance of music in terms of time and money in the everyday life of people living in the urban West.

Chapter 2 —The most important thing… (pp. 43-81)— starts with definitions of and axioms about ‘music’, including the concept of concerted simultaneity, the non-antagonistic contradiction between music’s intra- and extrageneric aspects, and the basic tenet that music is not a ‘universal language’. After an intercultural comparison of words denoting what we call ‘music’ and a short history of the concept in European thinking, music’s relation to other modes of human expression is discussed using observations from the anthropology of human evolution as well as from theories of cross-domain representation, synaesthesis and the cognitive neuroscience of music. The chapter finishes with a section on affect, emotion, feeling and mood, followed by a final word about the use of verbal metaphors of perceived musical meaning.

Chapter 3 —The epistemic oil tanker (pp. 83-132)— confronts the notion of absolute music, tracing its history, demystifying its articles of faith, including those of its latter-day ‘postmodernist’ counterpart, and deconstructing its ideological implications. The chapter’s second part identifies institutional splits in musical knowledge (poïetic v. aesthesic etc.) that exacerbate the polarities of dual consciousness. It also explains why notation was for such a long time considered the only valid storage medium in conventional music studies.

Chapter 4 —Ethno, socio, semio (pp. 133-154)— discusses the three main disciplinary challenges to conventional music studies in the twentieth century: ethnomusicology, the sociology of music and the semiotics of music. It highlights their contribution, real or potential, to developing the sort of music analysis covered in Part 2, underlining the importance of ethnomusicology and empirical sociology, and addressing the problems of music semiotics in dealing with semantics and pragmatics.

Chapter 5 —Meaning and communication (pp. 155-193)— is the book’s semiotic theory chapter. It explains key concepts like semiotics, semiology, semiosis (incl. object - sign - interpretant), semantics, syntax, pragmatics, sign type (icon - index - arbitrary sign), denotation, connotation, connotative precision, polysemy, transmitter, receiver, codal incompetence and codal interference. All these concepts are essential to the adequate treatment of the book’s main analytical questions about musical meaning.

Part 2 —Meanings of music

Chapter 6 —Intersubjectivity (pp. 195-228)— presents the first of two ways of getting to grips with the meaning of a musical analysis object. Six reasons for prioritising the aesthesic rather than poïetic pole are followed by a brief presentation of how ethnographic observation can help in the semiotic analysis of music. Much of the chapter deals with reception tests, the categorisation of verbal-visual associations (VVAs), the establishment of paramusical fields of connotation (PMFCs) and other important steps in the collection and collation of response data. The chapter ends with a short section on the use of library music in systematising reception test responses.

Chapter 7 —Interobjectivity (pp. 229-261)— focuses on intertextual approaches to the investigation of meaning in music. After the definition of essential terms — object, structure, museme — the two-stage process of interobjective comparison is explained, complete with advice on collecting interobjective comparison material (IOCM) and on the establishment of paramusical fields of connotation (PMFC). Verification procedures — recomposition, commutation — are also explained and the chapter ends with a section that should allay non-muso anxieties about the designation of music’s structural elements as an essential part of analysis procedure.

Chapter 8 —Terms, time & space (pp. 263-303)— is the first of five to focus on parameters of expression, i.e. on structurally identifiable factors determining how music sounds and what it potentially communicates. The first section summarises paramusical parameters (audience, venue, lyrics, images, etc.) and their role in the construction of musical meaning. It also includes explanations of basic terms essential to subsequent discussion: genre, style, note, pitch, tone, timbre and the extended present. Most of the chapter is devoted to simple explanations of temporal-spatial parameters, including duration, phrase, motif, period, episode, speed, pulse, beat, subbeat, tempo, surface rate, rhythm, accentuation, metre and groove. It ends with a section on aural staging, i.e. the placement of different sounds in different (or similar) types of acoustic space, both in relation to each other and as a whole in relation to the listener.

Chapter 9 — Timbre, loudness and tone (pp. 305-342)— covers the second set of parameters of musical expression. After reviewing instrumental timbre (vocal timbre is covered in Chapter 10) and how it creates meaning, an overview of acoustic devices and digital effects units explains everything from pizzicato and vibrato to distortion, filtering, phasing, limiting and gating. Then, after a short section dealing with loudness, volume and intensity, the rest of the chapter provides a rudimentary guide to things tonal, including pitch, octave, register, interval, mode, key, tonic, melody, tonal polyphony, heterophony, homophony, counterpoint, harmony, chords and chord progressions.

Chapter 10 — Vocal persona (pp. 343-381)— concentrates on one complex of parameters of musical expression whose meaningful details non-musos tend to identify and label more easily than musos do. These aesthesic and vernacular characterisations of spoken and singing voices are sorted into a taxonomy including descriptors of vocal costume, as well as those derived from demographics, professions, psychological and narrative archetypes. Practical ways of relating vocal sound to posture and attitude are explained so that its meanings can be more easily grasped and verbalised as part of the semiotic analysis.

Chapter 11 —Diataxis (pp. 383-416)— is the first of two long chapters about composite macro-parameters of musical expression. It deals with the narrative shape and form of music’s episodes, with its diachronic, extensional and chronologically more ‘horizontal’ aspects. It focuses on concepts like verse, chorus, refrain, hook, bridge, strophic form, AABA form, sonata form and the ways in which such ordering of musical episodes creates meaning.

Chapter 12 ― Syncrisis (pp. 417-484) ― deals with the synchronic combination of sounds in music, with the intensional and chronologically more ‘vertical’ aspects of form, with issues of singularity, multiplicity, density and sparsity, etc. The melody-accompaniment dualism is examined as musical parallel to the perceptual grid of figure-ground in other art forms and leads to a discussion of how different types of subjectivity and patterns of social organisation can be heard in contrapuntal polyphony, heavy metal, electronic dance music, unison singing, heterophony, homophony, cross rhythm, responsorial practices, bass lines, etc., as well as in various group-type manifestations, e.g. rock bands, symphony orchestras. The chapter ends with examples of the dual figure-ground relationship heard in innumerable pop songs and title themes, and with a brief glimpse into ‘figureless’ or ‘bodiless’ types of syncrisis.

Chapter 13 —A simple sign typology. With potentially meaningful musical structures (musemes, museme strings and stacks, diataxis and syncrisis) identified and linked to possible fields of paramusical connotation, this chapter presents workable ways of checking the viability of those links. Does the museme relate to its PMFC as an anaphone through the process of gestural interconversion, or as a genre synecdoche by referring to other music and its connotations, or is it an episodic marker signifying start, end or bridge…? Or does it, as a style indicator, identify a ‘home style’ in relation to other styles of music? Or is it a combination of more than one of those basic sign types?

Chapter 14 —Analysing film music — illustrates how ideas and procedures presented in the book can be put into practice. After a description of the course Music and the Moving Image and a discussion of conceptual prerequisites to the subject, the rest of the chapter focuses on the student assignment Cue list and analysis of a feature film, concentrating on underscore and presenting ways of explaining how music contributes to the overall ‘message’ of both individual scenes and to the film as a whole.



Terms that I’ve borrowed, adapted or had to coin in order to designate phenomena relevant to the ideas presented in this book are listed alphabetically and defined in the Glossary (p. 579, ff.). Specifically muso terms that may need explanation (e.g. pizzicato, sul ponte) and aren’t included in the Glossary can be easily checked online using, say, the reliable Wikipedia glossary of musical terminology at G [120111]. Please note that ‘G’ indicates a web address (URL, see ‘Internet references’, below).


To save space and to avoid confusion about which appendix to consult when checking source references, this book has only one reference appendix, the ‘Reference Appendix’ (abbreviated ‘RefAppx’). Other substantial reasons for including ‘everything’ in one appendix, as well as all the icons used to save space, are explained at the start of the Reference Appendix on page 605.


The software used to produce this book, Adobe FrameMaker v.8, has one irritating defect: if there isn’t enough room at the bottom of the current page for the complete text of a footnote, it puts the entire footnote text at the bottom of the following page. Therefore, if there is no text at the bottom of the page on which a footnote flag number occurs in the main body of text, don’t be alarmed. The complete footnote text will appear at the bottom of the following page.

You may also occasionally find the same footnote number, like the little ‘28’ here, occurring in the main text twice in succession, like this.28 Don’t fret. Both numbers intentionally refer to the same footnote.

I know that some readers find my use of footnotes excessive and annoying. While I sincerely regret causing readers irritation, I persist in my struggle for the right to footnote for the following eight reasons.

1. Many footnotes consist of either references to other work or of extended argumentation about, or exemplification of (see §2), a topic which, for reasons of space and clarity, cannot be included in the main body of text. Readers sceptical about some of the things I try to put across need to know if I have any backing for what I write. Since it would be unfair to lumber all readers with that sort of extra evidence, I try to make it as unobtrusive as possible by consigning it to footnotes.

2. Many footnotes refer to actual pieces of music exemplifying observations made in the main text. All those musical references are listed in the RefAppx, together with source details. A substantial proportion of those sources include direct hyperlinks to recordings that can be heard at the click of a mouse. Since I cannot possibly know which of my comments about music will be understandable without exemplification to every reader, even less know which music examples will be familiar to each and every one, and since it would be unfair to lumber every reader with text that may be obvious to some, I put reference to those musical examples in footnotes for those who want to ‘hear what it sounds like’.

3. A fair number of footnotes contain URLs, some of which are notoriously long and cannot be included in the main body of text without seriously upsetting the flow of reading.

4. Some readers are simply inquisitive and just want to know a bit more about a topic that I can’t fully cover in the main body of text. I try to provide pointers for those readers if and when I can.

5. Since this book is written with a mainly non-muso readership in mind, I’ve painstakingly tried to reduce both musical notation and musicological jargon to an absolute minimum in the main body of text. On a few occasions, however, additional structural information potentially useful to musos has been consigned to footnotes.

6. Despite the donkeywork involved in writing footnotes (about 50% of the effort invested in producing this book), I think that academic procedures for source referencing are important so readers know when the author is aware of using someone else’s ideas. It’s also important, I think, for readers to be able to find verbal, musical and audiovisual source materials relevant to what I write about. The main body of text would be much less readable if it included all those references. Footnotes provide a compromise solution to that problem.

7. As I try to explain in Chapter 2, music is a combinatory and holistic symbolic system involving cross-domain representation and synaesthesis. That in turn means that talking or writing about music can (and maybe should) go off in almost any direction. Although I make valiant efforts in this book to toe the one-dimensional line of the written word, it would be dishonest to give readers the impression that the richness and precision of musical meaning can be realistically explained using the linearity of verbal discourse and nothing else. Therefore, while such linearity can be useful when discussing music’s meanings, there are occasions when it becomes inappropriate and when ‘going off at a tangent’ is the only viable discursive strategy. That said, if I were to put every possible tangent, every pertinent train of lateral musogenic thought, into the main body of text it would at best read like a bad parody of passages from Tristram Shandy (Sterne, 1759-67). I therefore take the occasional liberty of putting some of the inevitably lateral thinking that comes with the territory of music into footnotes.

8. Contradictions inside conventional music theory, as well as between musical and verbal discourse, are sometimes downright comical. I’ve included a few such items in the main text, for instance the dubious assumption that music is polysemic and the implication that ‘atonal’ music contains no tones. A few other jokes are peripheral to the main argument and have been relegated to footnotes. Typical examples of footnote frivolity are: [1] in the section on transscansion, where I suggest gormless words you could sing to the Star Wars theme (Williams, 1977); [2] in the section on sonic anaphones, where I raise the issue of whether or not live poultry was used in Psycho Chicken (The Fools, 1980).

It’s for these eight reasons that I beseech those irritated by footnotes to treat them indulgently, to tolerate their presence or, if need be, to simply ignore them. Reading footnotes is after all an option. They aren’t forced on you and, unlike advertising and other types of propaganda, they don’t assume you’re an infantile moron. If the footnotes still bother you, just treat them like bonus features on a dvd: you don’t have to watch those any more than you have to read my footnotes or open online ad links. You decide what you want to read. I don’t.


I was initially embarrassed by the number of references made in parts of this book to my own work. Rest assured that I’ve nothing to gain from self-promotion now that I’m a pensioner and my career is over. I have to refer to myself simply because this book draws much more on my own experience as a music practitioner, teacher and scholar than on anyone else’s. I just thought it better, where appropriate, to refer to my own work than to pretend with false modesty that I’d produced nothing that could possibly provide a little more ‘meat on the bone’.

In-text and footnote source references

Audiovisual and musical source references follow the same principles as bibliographical source references. For example, ‘Norman (1962)’ refers uniquely to publishing details entered in the Reference Appendix (RefAppx, p. 605) for the original recording of The James Bond Theme (p. 635). This book also contains active hyperlinks, for example:


Since I cannot predict how familiar every single reader will be with all the topics discussed in this book, I’ve included many internal cross references to pages where particular topics are covered. If you’re reading this on a computer or on a tablet, and if you’re using Adobe Reader, most of these internal references will work as active hyperlinks: clicking on the page number will take you to the relevant page.

Many internet references in this book work as active hyperlinks. Try, for example, clicking this link: G. If my home page doesn’t appear, you’re either reading this as hard copy, or you’re not connected to the internet, or someone has removed my web site, or you’re using book-reading software that doesn’t support hyperlinks in PDF files (see under ‘Formats, platforms and devices’ on page 29). This same proviso applies to internal page references inside the book.

Internet references

To save space in the References Appendix (‘RefAppx’) and footnotes, URLs are shortened, where possible, by replacing the internet address prefixes http://, http://www. etc. with the internet download icon G. Dates of access to internet sites are reduced to six-digit strings in square brackets. For example: ‘G [120704]’ means a visit to my home page on 4th July 2012.

YouTube references are reduced in length from 42 to 13 characters by using the unique 11-character code appearing in their absolute URL address, preceded by the YouTube icon ‘E’. For example: (42 chars.)

is rendered as just ‘E msM28q6MyfY’ (symbol + 11 chars). Most of these YouTube references are active hyperlinks.
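The shortening just described is mechanical and can be reversed. The following is a minimal sketch only: the function name `expand_youtube_ref` is hypothetical, and the long address form is an assumption inferred from the 42-character length stated above (a 31-character prefix plus the 11-character code).

```python
# Assumed long form of a YouTube address (31 characters).
# This prefix is an assumption for illustration, not quoted from the text.
YOUTUBE_PREFIX = "http://www.youtube.com/watch?v="


def expand_youtube_ref(code: str) -> str:
    """Turn an 11-character YouTube code into a full address."""
    code = code.strip()
    if len(code) != 11:
        raise ValueError(f"expected an 11-character code, got {code!r}")
    return YOUTUBE_PREFIX + code


full_address = expand_youtube_ref("msM28q6MyfY")
print(full_address)
print(len(full_address), "characters in full,", len("msM28q6MyfY"), "in the code")
```

Under that assumption the full address has 42 characters, which matches the saving claimed for each YouTube reference.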

Publication issues

Formats, platforms and devices

This version (1.0) of the book cannot be read as hard copy unless you print it out. If response to this publication is favourable I’ll correct errors sent to me by readers, provide the book with an exhaustive index, and arrange for it to be available as a print-on-demand item.

Since the functionality of relevant software and hardware varies considerably and is in a constant state of change, information about devices, file formats, book-reading apps, etc. is given on line. You are advised to consult ‘Publication format and devices’ at G [120707]. In September 2012 page numbering and hyperlinks (both inside this book and to the internet) were fully functional using Adobe Reader X version 10.1.3 on both PC and Mac, and Adobe Reader version 10.2.1 on an Android tablet.

Caveat about internet references

Please be aware that material on the internet can be deleted, moved, renamed, or removed for any number of reasons. Inaccessibility of internet material referred to in this book is due to circumstances beyond my control as simple author/editor. I cannot guarantee the functionality of any such reference.

If you’re using a tablet to read this, you may also occasionally receive the error message ‘The author has not made this [content] available on mobiles’. You can either not bother about the reference or view it on a computer instead. Another error message might be ‘Access may be forbidden’. This usually turns up when the reference is to a pay-for-knowledge site of the JStor type. I’ve tried to keep reference to such sites to a minimum. Here again, you can either ignore the reference or use a computer in an institution that can afford a JStor-type subscription.

Sometimes you may find that a video or audio hyperlink doesn’t play. That’s usually because the file is in a format for which your computer or tablet has no plug-in. Please also note that compacted files on the internet (e.g. zip format) usually need to be downloaded to your device first and then opened in software other than the one you are using to read this book.


Many of the musical and audiovisual works referred to in this book have at one time or another been issued commercially. It would in the early 1990s have been absurd to expect readers to have access to more than a very small proportion of those works. In 2012, however, it is in most cases a very simple matter if you know where to look. Fearing prosecution for inducement to illegal acts, I can’t be more precise here than to say that there are several well-known websites where you can hear the majority of recorded works, audio or audiovisual, I refer to in this book. Some of those sites are pay-per-download and legal, some are legal and free, while other free sites may have posted recordings illegally.

This much I can say: an online search for Police "Don’t Stand So Close To Me" (with the quote marks) produced 32,200 hits [2009-06-13], the first two of which, when clicked, took me to actual online recordings (on YouTube) of the original issue of the tune (Police, 1980). Using the on-screen digital timecode provided by YouTube, I was able to pinpoint the radical change from verse to chorus at 1:48. The whole process of checking a precise musical event in just one among millions of songs took less than a minute.

Please be aware that while it is not illegal to listen to media posted on line, downloading works under copyright without permission or payment most probably is. I have, for the reader’s convenience, included many references to YouTube postings or to postings on my site. These references are mainly to two types of work: those in the public domain or which I’ve produced myself, and those which were, at the time of publication and to my knowledge, unavailable or otherwise not readily accessible. If you find any such reference to be in breach of copyright legislation please inform me and I’ll either take it down, delete the reference or contact my legal advisor. For more on publishing knowledge about music in the modern media, please visit G


No index is included in this online version of this book because words, concepts and names can be more easily found using your PDF-reading software’s Search option. Any future hard-copy version (see p. 29) will include an extensive index featuring page references to all proper names appearing in the book, i.e. to authors, editors, performers, composers, etc., as well as to titles of musical works, songs, tracks, albums, films, TV productions and so on. The index will also include page references to all topics and to important concepts covered in the book. Page numbers appearing in the index will work as internal hyperlinks (see ‘In-text and footnote source references’, p. 28).



1. A small Tahoma font is used to save space, especially when internet URLs are presented, e.g. G

2. Sans-serif is used for two other purposes: [i] to distinguish computer keyboard input from the words around it, for example: ‘a Google search for Police "Don’t Stand So Close To Me" produced 32,200 hits’; [ii] to distinguish the headings of tables and figures from the surrounding text.

3. Bold Courier lower-case is occasionally used to distinguish note names (a b♭ b♮ c♯ etc.) from other uses of single letters, as well as from the very few chord names mentioned in this book and which are given in upper-case serif (B♭, C♯m7♭5, etc.).

4. A phonetic font is occasionally used to indicate the UK pronunciation of potentially unfamiliar words according to the symbols shown in Table P-1 overleaf.

Table P-1. Phonetic symbols for standard southern UK English

Symbol   Examples
A:       ah!, harp, bath, laugh, half
Q        hat, cat, map, Africa
aI       eye, I, my, fine, high, hi-fi, why
aU       down, about, Bauhaus, bow (bend), now (not know [n9U]), plough (cf. o: and 9U)
D        the, that, breathe, clothes, although, weather (cf. T)
dZ       jazz, John, general, gin, footage, bridge, Fiji, Django, Giacomo
E        help, better, measure, leisure
E:0      air, bear, bare, there, they’re
EI       date, day, wait, station, email, Australia, patient, hey!
I        it, fit, minute, pretend
i:       eat, sees, seas, seize, Fiji, email
I:)      hear, here, beer, pier
j        yes, yak, use, Europe, Göteborg
N        singing, synchronise, think, gong, incredible
O        hot, shot, what, want, Australia
o:       or, oar, awe, war, all, taught, ought
OI       toy, boy, coil, Deutschland
(        about, killer, tutor, nation, currant, current, colour, fuel, little, liar, lyre, future, India, confer, persist, adapt
(:       circumspect, fern, fir, fur, learn
(U       no, know, toe, toad, cold, bow (knot), although (cf. aU, o:)
S        Sean, shirt, station, champagne, Niš
tS       church, itch, cello, future, Czech háček
T        think, throw, nothing, cloth (cf. D)
Y        but, luck, won, colour
u:       food, cool, rule, rude, through, threw
U        foot, look, bush, put
ju:      use, few, future, new music, tune
Z        genre [!ZA:nr0 Fr. Z1%], decision, measure, garage, Rózsa, Janeiro, Žižek

! = start of stressed syllable
ù = long vowel


CAPITALS are in general used according to the norms set out in section 6.9 of Assignment and Dissertation Tips at G

Small capitals are used for five purposes, the first four of which occur in the main body of text, the first two of those deriving from their usage in Lakoff and Johnson (1979).

1. To save space and to avoid having to insert a plethora of hyphens and inverted commas when introducing a short string of words, often adjectivally, to denote an integral concept, for example: The MUSIC IS MUSIC myth is a symptom of dual consciousness.

2. To distinguish between typically authorial words and those of real or imaginary listeners responding to music, for example: it’s essential to know how much AUSTRIA rather than, say, BRAZIL or JAPAN, and how much SHAMPOO rather than GUNS or CIGARETTES respondents imagined on hearing the reception test piece.

3. To highlight an important term introduced for the first time (roman font), or to refer to a term explained elsewhere in the same chapter or in the Glossary (italic).

4. To save page space with frequently recurring capital-letter abbreviations, for example dvd instead of DVD, iocm instead of IOCM.

5. To facilitate quicker identification of alphabetically ordered entries in the Reference Appendix.


Italics are in general used according to the norms set out in Assignment and Dissertation Tips (Tagg, 2001: 49-52) at G

Italics are also used to demarcate longer expressions that for reasons of syntax and comprehension have to be included as part of the sentence containing them and which would be even clumsier if delimited with quotation marks, for instance: ‘you can also refer to musical structures in relative terms, for example the danger stabs just before the final chord, or the last five notes of the twangy guitar tune just before it repeats.’

Timings and durations

Given that most musical recordings exist in digital form, and given that digital playback equipment includes real-time display, the position of events within recordings discussed in this book is given in minutes and seconds. 0:00 or 0:00:00 indicates the start of the recording in question, 0:56 a point 56 seconds after 0:00, and 1:12:07 a point one hour, twelve minutes and seven seconds from the start (see ‘Unequivocal timecode placement’, p. 256, ff.). Durations are expressed in the same form, e.g. 4:33 or 04:33 or 0:04:33 meaning 4 minutes and 33 seconds. To save space, simple timings may sometimes be expressed as follows (examples): 6" = six seconds, 12½" or 12.5" = twelve and a half seconds, 4'33 or 4'33" = four minutes and thirty-three seconds.

Milliseconds are given either as an integer followed by the abbreviation ‘ms’ (e.g. ‘5 ms’ for five milliseconds) or, when denoting exact points in a recording, as the final part after the decimal point following the number of seconds, e.g. 1:12.500 for one minute and twelve point five seconds, or 1:12:05.750 for one hour, twelve minutes and 5¾ seconds.

Frame counts in audiovisual recordings are expressed like milliseconds except that they consist of only two digits and are separated from the seconds count by a semicolon, e.g. 1:12:07;16 = one hour, twelve minutes, seven seconds and sixteen frames. Unless otherwise stated, frame counts are based on the NTSC rate of thirty (29.97) per second.
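The timing conventions above can be read mechanically. The sketch below is an illustration under stated assumptions, not a procedure used in this book: the function name `timecode_to_seconds` is hypothetical, frames are simply divided by the frame rate (the drop-frame subtleties of NTSC timecode are ignored), and the inch-mark shorthand (6", 4'33") is not handled.

```python
def timecode_to_seconds(timecode: str, fps: float = 29.97) -> float:
    """Convert m:ss, h:mm:ss, m:ss.mmm or h:mm:ss;ff into seconds."""
    frames = 0
    if ";" in timecode:
        # Two-digit frame count after the semicolon, e.g. 1:12:07;16
        timecode, frame_part = timecode.split(";", 1)
        frames = int(frame_part)
    fields = timecode.split(":")
    if not 2 <= len(fields) <= 3:
        raise ValueError(f"unrecognised timecode: {timecode!r}")
    seconds = float(fields[-1])          # may carry milliseconds, e.g. 12.500
    minutes = int(fields[-2])
    hours = int(fields[0]) if len(fields) == 3 else 0
    # Simplification: frames / fps, an assumption made for this sketch only.
    return hours * 3600 + minutes * 60 + seconds + frames / fps


for example in ("0:56", "4:33", "1:12.500", "1:12:05.750", "1:12:07;16"):
    print(example, "->", round(timecode_to_seconds(example), 3), "seconds")
```

Durations such as 4:33 or 0:04:33 pass through the same function because they share the form of the timings.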

Date abbreviations

When abbreviated, dates are usually formatted yyyy-mm-dd (e.g. 2011-02-18) in the main body of text. In footnote references and appendices they also appear as yymmdd (e.g. 110218). The date in both cases here is the 18th of February, 2011. The 9th of November 1981 would be 1981-11-09 (main text) or 811109 (references).
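Both date forms can be decoded with a few lines of code. This is a minimal sketch: the function name `parse_abbreviated_date` is hypothetical, and the century chosen for two-digit years follows Python's own convention (69-99 read as 19xx, 00-68 as 20xx), which is an assumption of the sketch rather than a rule stated in this book.

```python
from datetime import date, datetime


def parse_abbreviated_date(text: str) -> date:
    """Read either yyyy-mm-dd (main text) or yymmdd (references)."""
    text = text.strip()
    # A hyphen signals the long form; otherwise assume the six-digit form.
    date_format = "%Y-%m-%d" if "-" in text else "%y%m%d"
    return datetime.strptime(text, date_format).date()


for example in ("2011-02-18", "110218", "1981-11-09", "811109", "120704"):
    print(example, "->", parse_abbreviated_date(example).isoformat())
```

The same function decodes the six-digit access dates given in square brackets after internet references.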



1. How much music?

One crude but effective way of understanding music’s importance is to estimate the amount of time and money the average citizen of the urban West devotes to music on a daily basis.

Time budget

[1] If the TV monitor in the average household is switched on for four and a half hours a day, about 120 minutes of music —mostly as jingles, logos, advertising music, theme tunes and underscore, less often as musical performance or music videos— will pass through the TV’s speakers into its viewers’ ears and brains.

[2] Music heard in shops, boutiques, malls, supermarkets, hotels, bars and lifts (elevators), or at religious and sporting events, or at the dentist’s, or in public spaces like airports and railway stations, or at the cinema, or in the theatre, occupies roughly thirty minutes a day in the life of the average citizen of the urban West.

[3] Some people wake up to a clock radio, some listen to weather and traffic reports and some just keep the radio on in the background for large parts of the day. Another thirty minutes per day seems a reasonable estimate here, given that most radio time consists of music between bouts of news and weather.

[4] Some people are exposed to music all day in their place of work, others aren’t. Another average of thirty minutes per day would hardly be an excessive estimate for this source of music.

[5] Most people listen to some music of their own choice at home, in the car or on their smartphone. We may also hear music performed at festivals, on the street, in clubs, bars, concert halls, theatres and so on. Many of us sing, whistle or hum in the shower or in the kitchen and parents still sing lullabies and nursery rhymes to their young children. Some of us go to karaoke bars and most of us join in Happy Birthday and other festive songs. Some of us even play an instrument or sing in a choir: if so, we have to practise. These voluntary acts of music will likely account for another average of thirty minutes per person per day.

[6] Young people in the USA spend an hour every day playing computer games with virtually constant audio. If young people constitute one fifth of the population, the average citizen will hear another twelve minutes of music per day while gaming.

[7] If you have to phone a large corporation or public institution, you will, after ‘your call is important to us’, be subjected to hold music before you finally reach a human being. On an average day you will also hear a fair number of mobile phone ring tones, as well as several musical attention-grabbers over P.A. systems in airports or train stations. You may even be within earshot of a belfry or carillon. It’s not unreasonable to estimate an average of another five minutes per day for hold music, ring tones and tonal signals, bell chimes, etc.

Table 1: Estimated average daily dose of music

Source of music                   Estimated minutes/day
TV, DVD, video, games             120
Shops, bars, airports, etc.       30
Radio                             30
Place of work                     30
Personal choice                   30
Gaming, phones, signals, etc.     17 (12+5)
Total                             257 mins. = 4 hrs., 17 mins.



If these figures have any validity, average citizens of the Western world (including babies, pensioners and the deaf, as well as pop fans and music students) hear music for more than one quarter of their waking life. Even if you think these figures are exaggerated, it’s unlikely that any other sign system —the spoken or written word, pictures, dancing, etc.— can on its own rival music’s share of our average daily dose of symbolic perception.
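The arithmetic behind Table 1 and the ‘one quarter’ claim can be checked in a few lines. This is only a sketch: the estimates are those given in points [1] to [7] above, while the figure of sixteen waking hours per day is an assumption introduced here for illustration.

```python
# Gaming: one hour a day for one fifth of the population (point [6]).
gaming_minutes = 60 // 5                 # 12 minutes per average citizen
signal_minutes = 5                       # hold music, ring tones, etc. (point [7])

daily_minutes = {
    "TV, DVD, video": 120,
    "Shops, bars, airports, etc.": 30,
    "Radio": 30,
    "Place of work": 30,
    "Personal choice": 30,
    "Gaming, phones, signals, etc.": gaming_minutes + signal_minutes,
}

total_minutes = sum(daily_minutes.values())
hours, minutes = divmod(total_minutes, 60)

WAKING_HOURS = 16                        # assumption, not a figure from the text
waking_share = total_minutes / (WAKING_HOURS * 60)

print(f"Total: {total_minutes} mins. = {hours} hrs., {minutes} mins.")
print(f"Share of a {WAKING_HOURS}-hour waking day: {waking_share:.1%}")
```

With sixteen waking hours the share comes out a little above one quarter; a longer waking day would lower it.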


Money budget

Music’s share of our time budget is echoed by its economic importance. Despite doomsday declarations from the industry about the supposedly adverse effects of file sharing, global phonogram sales rose constantly to stay at over $40 billion (US) between 1995 and 2001, since when they have fallen back to 1990 levels of around $25 billion. This recent decline should be seen against the backdrop of substantial global increases in the following areas: [1] collection of publishing rights for recorded music; [2] sale of satellite/cable TV services and of computer games, both featuring more than their fair share of music; [3] digital delivery of music, accounting for 29% of industrial revenue in 2010; [4] the recent emergence of live music promotion as the industry’s biggest money spinner (Cloonan, 2011). All of these trends should in their turn be seen in the context of the financial meltdown of 2008 and of the resultant radical reduction of disposable income experienced by citizens of those nations on whose statistics the trends are based. It’s also worth noting that music is an important source of revenue for the national economy of countries like the UK, the USA and Sweden. It can therefore be quite instructive to estimate how much money the average citizen of the industrialised West spends on music.

Let’s say you buy a new sound system for your home every ten years and let’s assume that the music you hear via the TV and DVD equipment you buy every ten years is worth one quarter of the purchase price value. Perhaps you have a mobile phone that plays audio and video, most likely also a sound card and audiovisual playback software on your computer. You may also be among the one in twenty who buys musical instruments, sheet music, etc. and you might be paying for private music lessons. You’ll almost certainly have to buy cables, plugs and batteries for various items of your music equipment and you’ll definitely be paying for the electricity you use to run it all. Estimating all these costs at $3,600 over ten years works out at one dollar a day.

If you still buy recorded CDs, or if you regularly pay to download music files, or if you buy blank CDs or DVDs, or extra memory to store your films and music, you’ll probably be spending about $150 annually ($0.40/day). In addition to that, the share of the money covering music production and copyright costs when you buy or rent a DVD, or when you use pay-per-view, plus whatever musical activities, including public music education, your local and national authorities may see fit to provide or subsidise via taxation and levies, may well account for another $150 annually. All in all that makes another $300 per average year or $0.80 on a daily basis.

Much of our musical spending is indirect. Radio and TV licence fees have to cover the costs of broadcasting copyrighted music as a public service while commercial broadcasters pay for the same rights with the money they get from the pedlars of consumerist propaganda who in their turn pass down their advertising costs to those of us who buy the goods or services being marketed. Marketeers use money they get from us to pay radio and TV stations to broadcast music that will make us want to stay tuned to whatever channel diffuses their propaganda. This means that whenever we buy something advertised on broadcast media we aren’t just paying for propaganda production: we’re also paying for the very thing that exposes us to their propaganda, i.e. music on our favourite format radio station. It’s very difficult to quantify what proportion of a commodity’s retail price is devoted to its marketing, let alone determine what part of the advertising budget goes to musical production, but there is little doubt that the amounts of money passing hands here are substantial.

Every time we visit a café, restaurant, shopping mall, hospital, railway station, etc. where piped music is publicly diffused, the costs of licensing that music are once again passed down to the customer or user. Every time we visit a bar or club featuring live music or a karaoke machine we will either have to pay an entrance fee or more than the usual bar price for drinks. Even mobile phone ringtone rights and telephone hold music costs are ultimately paid for by us, the customers.

Perhaps you are a member of the Lady Gaga or Karlheinz Stockhausen fan club, in which case you might buy a T-shirt or other merchandising memorabilia. Add to these indirect payments for music the possibility of two visits each year to musical performances in a concert hall, theatre, opera house, entertainment complex or sports arena, plus your travel expenses for getting to and from the venue, and we are looking at another estimated $250 each year or $0.70 a day.

In short, we probably spend on average the best part of $900 each year on music, the equivalent of about $2.50 each day. In January 2007, $2.50 was roughly what you would pay in Canada for a standard loaf of bread or for a litre of milk.
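The money budget can be tallied in the same way as the time budget. The sketch below simply adds up the three annual estimates given above; the grouping labels and the 365-day year are choices made for this illustration.

```python
DAYS_PER_YEAR = 365

# Annual estimates in US dollars, taken from the preceding paragraphs.
annual_costs = {
    "equipment, lessons, electricity": 3600 / 10,   # $3,600 over ten years
    "recordings, storage, rental, tax-funded music": 150 + 150,
    "merchandise, concerts, travel": 250,
}

annual_total = sum(annual_costs.values())
daily_total = annual_total / DAYS_PER_YEAR

for item, cost in annual_costs.items():
    print(f"{item}: ${cost:.0f}/year = ${cost / DAYS_PER_YEAR:.2f}/day")
print(f"Total: ${annual_total:.0f}/year = ${daily_total:.2f}/day")
```

The daily figures quoted in the text ($1.00, $0.80 and $0.70) are rounded, which is why they sum to $2.50 while the unrounded annual total is slightly above $900.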



If music is as important as the descriptions just presented suggest, why does it so often seem to end up near the bottom of the academic heap? The short answer is that education and research (including this book) are largely language-based while music is a non-verbal system for mediating ideas. We may like to talk enthusiastically about our musical experiences and tastes but we are often at a loss when it comes to explaining why and how which sounds have what effect.

‘Why and how does who communicate what to whom and with what effect’ is of course the million-dollar question of semiotics and much of this book will suggest ways of tackling that question in relation to music. Still, before launching into the treacherous waters of music semiotics it’s essential to establish a workable definition of the word music according to its use in contemporary Western culture. We at least need to know what sort of boat we’re in before navigating those troubled seas, because some of our difficulties about explaining music come from culturally specific assumptions about its very nature.



2. The most important thing…

Music as a ‘universal language’, as the ‘language of love’, or as the ‘natural expression of feelings’, or as an art transcending the sordid social realities of everyday life, or as auditory icing on the verbal-visual-numerical cake of logic and the material sciences… Those are just five of the more colourful notions of music that I’ve heard in the cultural environment in which I was brought up and have lived. Glancing through the estimates of music’s everyday importance (pp. 35-40), it’s clear that those assumptions about music won’t be much use in explaining how and why, in the everyday reality of most people living in this media-saturated society, music communicates what to whom with what effect. Indeed, to avoid confusion in what follows I’ll need to come up with a much more prosaic working definition of music so that readers will know what at least I mean by the word. The trouble is that defining music tout court would be an intellectually reckless undertaking. Therefore, please note that in what comes next I am not trying to describe what I think music means globally, nor what I think it ought to mean in general, nor what it meant in times gone by. No, my working definition of music and the axioms following it are no more than an attempt to distil the essence of the sorts of thing music seems to mean in the cultural environments with which I am familiar and where I’ve worked as a musician or music teacher.

Definition and axioms

In this book, music will be understood as that form of interhuman communication in which humanly organised non-verbal sound can, following culturally specific conventions, carry meaning relating to emotional, gestural, tactile, kinetic, spatial and prosodic patterns of cognition.

That rather convoluted working definition can be made clearer with the help of the following eight axioms.

1. Music cannot exist unless it’s heard or registered by someone, whether out loud or inside someone’s head.

2. Although the original source of musical sound does not have to be human, music is always the result of some kind of human mediation, intention or organisation, typically through production practices like composition, arrangement and performance. In other words, to become music, one or more humans has/have to organise sounds (that may or may not be considered musical in themselves), into sequentially and synchronically ordered patterns. For example, the sound of a smoke alarm is unlikely to be regarded in itself as music, but sampled and repeated over a drum track, or combined with sounds of screams and conflagration edited in at certain points, it can become music.

3. If points 1 and 2 are valid, then music is a matter of interhuman communication.

4. Like the spoken word, music is mediated as sound but, unlike speech, music’s sounds don’t need to include words, even though one of the most common forms of music making entails singing, chanting or reciting words. Another way of understanding the distinction is to remember that while the prosodic, or ‘musical’ aspects of speech —tonal, timbral, durational and metric elements such as inflexion, intonation, accentuation, timbre, speed of delivery, timing, periodicity, etc.,— are all important to the communication of the spoken word, a wordless utterance consisting only of prosodic elements ceases by definition to be speech because it has no words: it’s much more likely to be understood as music.

5. Although closely related to human touch, gesture and movement —dancing, marching, strolling, jumping, hitting, tapping, shaking, breathing, blowing, stroking, scraping, wiping, etc.—, human touch, gesture and movement can exist without music even if music cannot be produced without the mediation of some sort of human touch, gesture or movement (even at the computer keyboard).

6. If points 4 and 5 are valid, music is no more equivalent to touch, gesture or movement than it is to speech, even though it’s intimately associated with all four.

7. If music involves the human organisation and perception of non-verbal sound (points 1-6), and if it’s closely associated with touch, gesture, movement and prosodic aspects of speech, it is close to preverbal modes of sensory perception and, consequently, to the mediation of somatic (corporeal) and affective (emotional) aspects of human cognition.

8. Although music is a universal human phenomenon, and even though there may be some general bio-acoustic universals of musical expression (p. 47, ff.), the same sounds or combinations of sounds are not necessarily intended, heard, understood or used in the same way in different musical cultures (Tenet 3, p. 47).

In addition to these eight axioms it’s important to posit three more tenets about the concept of music.

Tenet 1. Concerted simultaneity and collective identity

Musical communication can take place between: [1] an individual and himself/herself; [2] two individuals; [3] individuals within the same group; [4] an individual and a group; [5] a group and an individual; [6] members of one group and those of another.

Particularly musical (and choreographic) types of communication are those involving a concerted simultaneity of sound events or movements, that is, between a group and its members, between a group and an individual or between two groups. While you can sing, play, dance, talk, paint, sculpt and write to or for yourself and for others, it’s very rare for several people to simultaneously talk, write, paint or sculpt in time with each other. In fact, as soon as speech is subordinated to temporal organisation of its prosodic elements it becomes intrinsically musical, as is evident from the choral character of rhythmically chanted slogans in street demonstrations or in the role of the choir in Ancient Greek drama. Thanks to this factor of concerted simultaneity, music and dance are particularly suited to expressing collective messages of affective and corporeal identity of individuals in relation to themselves, to each other, and to their social, as well as physical, surroundings.

Tenet 2. Intra- and extrageneric

Direct imitation of, or reference to, sound outside the framework of musical discourse is relatively uncommon in most Western musics. In fact, musical structures often seem to be objectively related to either: [a] their occurrence in similar guise in other music; or [b] their own context within the piece of music in which they (already) occur. At the same time, it’s silly to treat music as a self-contained system of sound combinations because changes in musical style are often found in conjunction with (accompanying, preceding, following) change in the society and culture of which the music is part.

The contradiction between ‘music refers only to music’ (the intrageneric notion) and ‘music is related to society’ (the extrageneric notion) is non-antagonistic. A recurrent symptom observed when studying how musics vary inside society and from one society to another in time or place is the way in which new means of musical expression are incorporated into the main body of any given musical tradition from outside the framework of its own discourse. These intonation crises (Asafyev, 1976: 100-101) work in a number of different ways. They can:

• refer to other musical codes, by acting as social connotors of what sort of people use those ‘other’ sounds in which situations, for example an ‘ethnic’ flute in the middle of a piece of mainstream pop or a ‘pastoral’ drone inserted into a Baroque oratorio;

• reflect changes in sound technology, acoustic conditions, or the soundscape, as well as changes in collective self-perception accompanying these developments, for example from clavichord to grand piano, from bagpipe to accordion, from rural to urban blues, from rock music to electronic dance music;

• reflect fluctuations in class structure or other notable demographic change, such as reggae influences on British rock; or the shift in dominance of US popular music (1930s - 1960s) from Broadway shows to the more rock-, blues- and country-based styles from the US South and West;

• act as a combination of any of the three processes just mentioned.

Tenet 3. Musical universals

Cross-cultural universals of musical code are bio-acoustic. While such relationships between musical sound and the human body are at the physical basis of all music, the majority of musical communication is culturally specific. The basic bio-acoustic universals of music can be summarised in the following four relationships:

• between [1] the rate[s] at which notes or groups of notes are presented (pulse, surface rate, accentuations etc.) and [2] rates of heartbeat (pulse) or breathing, or footsteps when walking or running, or other bodily movement (shaking, shivering, waving, pulling, pushing, etc.). Put simply, no-one can musically relax in a hurry or stand still while running;

• between [1] musical loudness and timbre (attack, envelope, decay, etc.) and [2] certain types of physical activity. This means no-one can make gentle or ‘caressing’ kinds of musical statement by striking hard objects sharply and that it’s counterproductive to yell jerky lullabies at breakneck speed. Conversely, no-one is likely to use smooth phrasing or soft timbres for hunting or war situations because those involved will be too relaxed to do their job;

• between [1] speed and loudness in the presentation of notes and [2] acoustic setting. Quick, quiet notes are indiscernible if there is a lot of reverberation while slow, long, loud ones are hard to sustain if there is little or no reverb. This is one reason why bands playing venues with different acoustics have to supply their own acoustic space, using adjustable effects for echo, reverb, chorus, etc.

• between [1] musical phrase lengths and [2] the capacity of the human lung. This means that few people can sing or blow and breathe in at the same time. It also implies that musical phrases tend to last between roughly one and eight seconds.

The general areas of connotation just mentioned (spatial acoustics, energy, speed, movement, non-musical sound) are all in a bio-acoustic relationship to the various musical parameters with which they are associated (pulse, volume, duration, timbre, etc.). These relationships may well be cross-cultural, but it does not mean that evaluation of such phenomena as large spaces (cold and lonely versus free and open), hunting (exhilarating versus cruel), hurrying (exciting versus stressful) will also be the same even inside one culture, let alone between cultures. That’s because the musical parameters listed as potentially ‘universal’ (pulse, volume, phrase duration, certain aspects of timbre and pitch, etc.) do not include the way in which rhythmic, metric, timbral, tonal, melodic, or harmonic parameters are organised in relation to each other inside the musical discourse. Such musical organisation presupposes some sort of social organisation and cultural context before it can be created, understood or otherwise invested with meaning. In other words, only very general bio-acoustic types of connotation can be considered as cross-cultural universals of music. Consequently, even if musical and linguistic boundaries don’t necessarily coincide, it is as fallacious to regard music as a universal language as it is to say that language is a universal music.

To clarify this essential point about music’s cultural specificity, it’s worth mentioning a little experiment I once conducted at a symposium on cross-cultural communication. I informed thirteen participants, all working in the sphere of immigrant cultures in Sweden, that they would hear eight short examples of music which were ‘all connected to the same thing: an important event in any culture and something which happens to every human being’. The participants were asked to guess what the common denominator might be and, if they could not think of anything, to jot down on a piece of paper whatever mood, type of action, behaviour, images or thoughts the music suggested to them. All eight examples, each taken from a different non-Western music tradition, were connected with death, a universal phenomenon if ever there was one because, with the exception of mass casualties in wars, natural disasters etc., the death of virtually every individual is marked by some form of ritual in all cultures. Did the thirteen cross-cultural experts manage to spot death in the music they heard?

Despite the obvious initial hint (‘an important event in any culture and something that happens to every human being’), not a single respondent associated death or anything death-related (wake, funeral, mourning etc.) with any of the eight death-related music examples. True, connotations like complaint, wailing, sadness, serious and suffering occurred in response to two of eight extracts, but the most common descriptions of all the examples had to do with either [1] energetic action or excitement, for example work, war, fighting, hunting, agitation, dancing, adventure, gymnastics; or [2] happiness and celebration, including joy, confidence, feasting, abandon, contentment etc. There was even some love and tenderness as well as one wedding. More significant is perhaps that eleven of the thirteen respondents tried to identify the cultural origin of the music: there were two Africas (plus one jungle), two Arabs (plus one each for bazaar, desert, camels and Yemen), as well as one each for China, Greece, India and Turkey. Clearly, the examples presenting music for funerals, burials, etc. were considered foreign and associated with a variety of moods and events, the vast majority of which have no discernible link with anything ‘death-like’ in contemporary urban Western culture.

Conceptual comparisons

Another way of understanding the Western concept of music is to compare it to different but related concepts in other cultures. Although no human society of which we have any knowledge has ever been without music in the sense defined on page 44, the concept of music is by no means universal. For example, the Tiv nation of West Africa (Keil, 1977) and the Ewe of Togo and Eastern Ghana do not appear to have found it necessary to single out music as a phenomenon requiring a special word any more than the British have needed different words for the three basic types of snow that the Inuktitut language conceptually refines into several subcategories. To be fair, the Ewe do actually use the English word music, but only as an untranslated loan word to denote foreign phenomena like singing church hymns or listening to the radio. The music they make themselves in traditional village life has no equivalent label in the Ewe language. According to a Ghanaian colleague:

‘Vù really means ‘drum’ and há is the word for club or association. A vù há is the club you belong to in the village… Voice is called bá, so singing is vù bá. Vù is used to signify the whole performance or occasion: the music, singing, drums, drama and so on.’

Having no exact verbal equivalent to our ‘music’ clearly does not mean that the culture in question is without music any more than the English language’s lack of verbal equivalent to the Hindi notion of rasa or to the German notion of Weltanschauung means that Anglophones cannot conceive of different types of mood/state-of-mind (rasa) or of different ways of looking at the world (Weltanschauung). Nor is a lack of equivalent to our word music connected to village communities in West Africa because the Japanese, with their long-standing traditions of music and theatre in official religion and at feudal courts, did not feel obliged to invent a word equivalent to the European concept of ‘music’ until the nineteenth century. The Japanese translated ‘music’ as ongaku (音楽), on meaning sound and gaku enjoyment, i.e. sounds performed for listening enjoyment or entertainment.

In other words, neither the Japanese nor the Ewe seem to have needed a word for what we mean by ‘music’ until confronted by Europeans and our culture. It must have been strange to meet people like us who treated what we call music as if it could exist independently of a larger whole (drama, poetry, singing, dancing, ritual, etc.), and the Japanese zoomed in on this difference with the word ongaku, identifying the European notion of music as referring to the non-verbal sounding bits of what they themselves considered as part of a larger set of symbolic practices. The Ewe reacted similarly, using the untranslated English colonial word music to label European music which was not an integral part of their own traditional culture and which we Europeans conceptualise as distinct from other related cultural practices.

Both the Ewe (vù) and Japanese (gaku) concepts resemble to some extent that of the Ancient Greeks whose μουσική / mousikē was short for τέχνη μουσική / technē mousikē, meaning the art or skill of all the muses, including drama, poetry and dancing, as well as singing or playing an instrument. The musica of ancient Rome seems to have covered a similarly broad semantic field. However, there appears to have been a gradual shift in the meaning of mousikē and musica in learned circles, so that Saint Augustine (d. 430), worrying about the seductive dangers of music, seems to use musica in our contemporary sense of the word music.

It’s likely that this more restricted meaning of mousikē and musica prevailed amongst scholars and clerics in Europe from the fifth century onwards. Certainly, Arab scholarship of the eighth through thirteenth centuries, on which later European theorising about music is largely based, used the Greek loan word mousikē (al mūsiqā/الموسيقى) to refer to what we mean by ‘instrumental music’ today, not to the gamut of artistic expressions covered by the mousikē of Plato or Aristotle. It should also be noted that Mohammed the Prophet is said to have shown great interest in music and that the Qur'ān itself contains no directly negative pronouncements against music. However, conservative clerics of Islam were later to warn, like St. Augustine, against the alleged evils of music, the main controversy being whether the Prophet’s judgement of ‘poets’, including musicians, in the Qur'ān’s 26th sura referred to music connected to infidel rites or to music in general. The point here is that influential ascetic patriarchs of Mediterranean and Middle-Eastern monotheism were worried about the sensual power of the non-verbal aspect of sonic expression and that they needed a concept to isolate and identify it.

What happens to the word music in the vernacular languages of Western and Central Europe before the twelfth century is anybody’s guess. Perhaps, as in Old Norse or modern Icelandic, there was a blanket term covering what bards, narrators of epic poetry and minstrels all did. Certainly, the Northern French trouvères and the Provençal troubadours of the eleventh through thirteenth centuries were not only known as singers, players and tunesmiths (trouver / trobar = find, invent, compose) but also as entertainers, jugglers and poets.

Music enters the English language in the thirteenth century via Old French, whose musique appears about a century earlier. On its arrival in the vernacular of both nations, the word denotes more or less what we mean by music today. That arrival also coincides with the granting of charters to merchant boroughs and with the establishment of the first universities. Unfortunately, there is hardly enough evidence to support the idea that use of the word music in its modern sense connects with the ascendancy of a merchant class, even though the Hellenic period, Arab mercantile hegemony, and the ascendancy of the European bourgeoisie, all seem to feature the new concept. Whatever the case, the European ruling classes were able to use the word music in its current meaning well before the eighteenth century: the semiotic field had been prepared by ecclesiastical theorists who had, by the eleventh century, established a metaphysical pecking order of musics. This type of hierarchy is, as we shall see (p. 84, ff.), important to the development of Romantic notions of music’s supposedly transcendental qualities.

These brief cross-cultural and historical observations about the word music indicate that the concept denotes particular sets of non-verbal sound produced by humans and associated with certain other forms of symbolic representation, sounds which relate enough to physical and emotional aspects of human experience to be considered disconcerting by ascetic clerics. The question is: which ‘sets of humanly produced sounds’ relate to which other forms of symbolic representation? One answer to that question is provided by theories of human evolution.



Evolution and development

Animal music?

In 1995 a flute made from the femur of the now extinct European bear was found by archaeologists working on a Neanderthal burial site in today’s Slovenia. The flute is between 45,000 and 84,000 years old. 50,000 years may seem like a long time but it’s the twinkling of an eye in terms of the evolution of our species: the earliest hominid forms evolved from the higher primates at least 3½ million years ago.

Evolutionist theories of music explain its origins in terms of adaptation, by which is meant the ability of a species to find effective survival strategies by adapting to their environment. One theory is that music derives from the synchronous chorusing of higher primates (Merker, 2000), while another argues that:

‘[I]t is in the evolution of affiliative interactions between mothers and infants that we can discover the origins of the competencies and sensitivities that gave rise to human music.’ (Dissanayake, 2000).

Several other theories stress the importance of what Steven Brown (2000) calls ‘musilanguage’, i.e. that language and music, both sonic and both neurologically intertwined, stem from a common origin, ‘evolving together as brain size increased during the last two million years in the genus homo’ (Falk, 2000). Like the mother-and-infant theory, this explanation seems quite plausible because both Homo sapiens and Homo neanderthalensis had, if the Slovenian bone flute and other finds of prehistoric musical instruments are anything to go by, started to treat oral language and music as distinct modes of sonic communication. Although neurologically interrelated, these two sonic systems were used for different functions. This aspect of evolution is important because the separation of music and language is often seen, rightly or wrongly, as a trait distinguishing humans from other animals.

One common objection to the theory of distinction between music and language as a basis for understanding the origins of music as a defining trait of human behaviour argues that if we, as humans, say that birds and whales sing, then we are talking about music, simply because that is how we hear it. The sonic habits of humpback whales provide fuel for this argument. As those great mammals migrate or swim around their breeding grounds, they piece together repeated phrases, singing song after song for up to twenty-four hours at a stretch. Humpback whales have a seven-octave range similar to that covered by the piano keyboard, i.e. a range of fundamental frequencies within the limits of what humans can hear, but much larger than the restricted range of pitches the human voice can produce. As the months go by, whales modify their song patterns and most males end up singing the same new song. Humpback whale song also contains rhythms and phrases which, strung together, build forms of a length comparable to ballads or symphonic movements. It also seems that their songs contain recurrent formulae which end off different phrases in much the same way as we use rhyme in poetry; in fact the more elaborate the whale’s song pattern, the more likely it is to ‘rhyme’.

All these traits of whale song come across as typically musical to the human ear. But the ‘music’ of the animal kingdom does not stop there. Certain insects produce distinct rhythmic patterns which, like those of human music, vary and repeat in longer patterns. Moreover, eleven percent of primate species can produce short strings of notes that, though less musical to our ears than the songs of humpback whales, form a recognisable pattern in time. This behavioural trait, characteristic of most of our own music, is thought to have evolved independently four times within primates. Such evidence suggests that music is not exclusive to the human species.

One problem with the objections just raised is that they are anthropomorphic in that they interpret non-human behaviour on the basis of human experience, perception and behaviour. The ‘animals make music’ standpoint assumes, in other words, that the whales, insects and primates just mentioned hear and react to the sounds they make themselves in the same way that we hear and react to them. It also assumes that animals produce those patterns of sound for the same reasons as we make what we hear as comparable patterns of sound in our music. For example, although we hear birds as the greatest songsters of the animal kingdom, they don’t necessarily hear or use the melodies we hear them making in the same way as we hear and use melody in the music we make. Ornithologist Eugene Morton puts it this way:

‘Any analogy to human music is not interesting to me. It doesn’t explain anything about how the world is, except how humans want to perceive it. Good on ’em, but I want to understand animals… Birdsong constitutes an avian broadcasting network, letting birds minimise the arduous work of flying about during interactions’.

If singing can replace the amount of flying around birds would otherwise have to do, it’s certainly part of a symbolic system. Instead of physically repelling every potential invader of its own space, a bird can claim its territory by making sounds we call birdsong. Instead of flying round to check that local members of the family are all there before they shut down for the night, and all there again in the morning, an individual bird can join in the evening and dawn choruses. Birdsong is in other words a strategy for the survival of individuals within the group, because they all have to have a place to nest, and for the group as a whole, because they may all need to collect for foraging or migration. It seems that singing is just an energy-efficient way for birds to establish these relations essential to their survival.

It would in a similar way be unrealistic to expect whales, who have to cover huge distances in search of food but reconvene for breeding, to keep visual or tactile underwater checks on the whereabouts of each other, as individuals or as family groups, across vast stretches of ocean. In this sense, whale song, by replacing tactile and visual contact with sonic communication, also acts symbolically to facilitate the social cohesion necessary for the survival of their species. It’s also highly probable that the various functions of sonic communication in the animal kingdom are linked with what we humans might qualify as pleasure and pain, tension and relaxation, etc., i.e. with what we think of as emotions and which are essential ingredients in the evolutionary process of most sentient beings. If such ‘emotions’ are linked to situations in the animal kingdom where what we hear as their ‘music’ is used to signal messages we might understand verbally in terms like ‘get off my property!’ or ‘it’s OK, we’re all here’, then it’s also probable that the sounds in question are accompanied by patterns of hormone production comparable to those found in humans when stimulated in certain ways by certain sounds in certain situations.

If there is any grain of truth in the line of reasoning just presented, there may be grounds for calling that animal ‘music’ music. After all, such an argument would go, what we have described tallies well with the seventh of our eight axioms about music, with our observations about ‘concerted simultaneity and collective identity’, and with several other points mentioned earlier (p. 44, ff.).

Whether or not zoomusicologists can demonstrate a separation between music and other forms of sonic communication produced by non-human animals, the point here is that we humans seem to have done so for at least 100,000 years. One sound-based symbolic system (language) is more suited, though not wholly dedicated, to the denotation of objects and ideas, while the other (music) is more closely, though not entirely, linked to movement, gesture, touch and emotion (axiom 4, p. 44). As stated earlier, language and music, both neurologically intertwined and both using the sense of hearing, seem to stem from a common origin, evolving together as brain size increased during the last two million years of evolution in the genus Homo. However, even though the oldest musical instrument found so far may be from a Neanderthal burial site, it’s after the demise of our Neanderthal cousins some 50,000 years ago that we start to leave significant numbers of complex sonic objects behind us.

To summarise: the separation of sonic representation into two distinct but physically related spheres of activity —language and music— almost certainly started evolving in our hominid ancestors and developed further when we became the only surviving species of the genus. Cross (1999) goes as far as to suggest that this distinction between language and music may be the most important thing humans ever did. I’ll return to this point after the next section which deals with music’s importance for another fundamental aspect of human development.

Music and socialisation

At the age of minus four months most humans start to hear. By the time we enter this world and long before we can focus our eyes on objects at varying distances, our aural faculties are well developed. Most small humans soon learn to distinguish pleasant from unpleasant sounds and most parents will witness that any tiny human in their household acts like a hyperactive radar of feelings and moods in their environment. You know it’s no use telling baby in an irritated voice ‘Daddy’s not angry’ because the little human sees straight through such emotional deceit and starts to howl.

But baby’s hearing isn’t what most parents notice first about sound and their own addition to the human race. They are more likely to register the little sonic terrorist’s capacity to scream, yell, cry and generally dominate the domestic soundscape. Babies are endowed with non-verbal vocal talents seemingly out of proportion to other aspects of their size, weight and volume: they appear to have inordinate lung power and unfailing vocal cords capable of producing high decibel and transient values, cutting timbres and irregular phrase lengths, all communicating messages that parents interpret as ‘I’m uncomfortable’, ‘I’m irritated’, ‘I’m in pain’ or ‘I’m hungry’, messages demanding action such as ‘change my nappies!’, ‘comfort me!’ or ‘provide immediate nutrition!’ Maybe these tiny humans have to yell not just because they can’t speak but also because they need to dispel whatever state of adult torpor we happen to be in while watching TV, chatting, reading or, worst of all, sleeping. Babies instinctively use sharp timbres at high pitch and volume, sounds that carry well, cutting through whatever ambient hum and mumble there may be in the adult world, be it idle conversation, background media, fridges, ventilation, etc. Also, irregular rhythms and intonation by definition avoid the sort of repetition that can gradually transform into ambient (background) sound: a baby’s yell is always up front, foreground, urgent, of varying periodicity and quite clearly designed to shatter whatever else mother, father, big sister or big brother is doing. That sonic shattering is designed to provoke immediate response. Desires and needs must be fulfilled now.

Now is the operative word here. Sonic statements formed as short repetitions of irregularly varying length are also statements of urgency, as well we know from news jingles — important, flash, new, the latest update. Babies seem to have no conscious past or notion of future: all is present. The baby’s lack of adult temporal perspective in relation to self is of course related to its lack of adult senses of social space, which, in its turn, relates to baby’s egocentricity, essential for survival in the initial stages of its life.

Non-verbal sound is essential to humans. We monitor it constantly from inside the womb until deafness or death do us part from its influence. We use our non-verbal voices to communicate all sorts of messages from the time we are born until we die or turn dumb. Together with the sense of touch, non-verbal sound is one of the most important sources of information and contact with social and natural environments at the most formative stages of any human’s development. It’s vital to senso-motoric and symbolic learning processes at the preverbal stage of development and central to the formation of any individual’s personality. Then we all have to experience the process by which we gradually learn that we are not the centre of others’ constant and immediate attention: we have to get used to being just one human subject and social object among many others. We have to have some sort of working relationship with whatever society and culture we belong to and we cannot live in the vain hope of returning to a state where we are the sonically dominant or foreground figures. We can never regain any lost paradise, whatever advertisers, spin doctors, religious fanatics or drug-peddling pharmaceutical corporations might have us believe.

Different cultures and subcultures develop different norms for what course the process from baby via child to adult should run. The ultimate goal —becoming a fully functioning adult— depends on whatever the society in question at any given time sees as desirable on account of its material basis and cultural heritage. Assuming we have all been babies and if baby’s power over the domestic soundscape in the early development of every human is a biological necessity that must be relinquished for that individual to survive among fellow humans in adulthood, then we ought to gain important insights into how any culture works by studying patterns of socialisation that relate directly to non-verbal sound.

Humans can emit an enormous variety of non-verbal sounds. We breathe, talk, cry, shout, yell, call, sob, sigh, laugh, giggle, burp, fart, crunch, slurp, gulp, swallow, yawn, groan, moan, growl, cough, splutter, slobber, wheeze, sniffle, sneeze, kiss, hiss, snort, spit, scratch our heads, smack our lips, blow our noses, clear our throats, cough up phlegm, etc. Our hearts beat, tummies rumble and intestines gurgle. We make noise, however weak or strong, whenever we move our bodies —when we sit down or stand up, walk, run, stroll, tiptoe, limp, jump, hop, skip, drag our feet, stumble, fall, etc. We also shudder with fear, tremble with delight, or shiver with cold so that our teeth chatter. We make sound when we hit, kick, drag, push, cut, tap, pat, clap, caress, chop, saw, hammer, grind, scrape, slap, splash, smash, etc. Some of these sounds are loud, others soft; some are heavy, others light; some are fast, others slow; some are high-pitched, others less so; some are long or ongoing and repetitive, others short and discrete and so on. All these humanly produced sounds are made within a context that is itself full of sound. In urban industrialised societies we have fridges, freezers, computer drives, traffic, aeroplanes, mains hum, air conditioning and all sorts of other mechanical sounds. Elsewhere we may be able to hear wind in the trees, rain, sea swell, animals, birds, insects, running water, thunder, earthquakes, ice breaking, crisp or slushy snow under foot, waves breaking on the shore, etc.

Some of these sounds we make ourselves, others we just hear in a wide variety of acoustic settings, including those inside our own heads and bodies. Which (combinations of) sounds are evaluated as pleasant and unpleasant, which ones are deemed to be part of music and which ones not, will largely depend on the culture we belong to and on what sort of motoric and sonic behaviour proves to be generally compatible with the needs of that community, be it a youth subculture in late capitalism or a nomadic people using neolithic technology.

All of us have been babies who have had to learn that we cannot for ever remain at the centre of the world around us, acoustically or otherwise. We have to learn to cooperate, to negotiate social space for ourselves in relation to the community we belong to. Music and dance provide socially constructed sonic and kinetic frameworks for that learning process. We learn to sing, hum and whistle in accordance with the norms of what our culture regards as music, rather than just yelling, laughing, mumbling, or bashing objects at will. As we acquire the gift of language we learn to distinguish between humanly organised verbal and non-verbal sound. More importantly, we are repeatedly exposed, within the music culture to which we belong, to the simultaneous occurrence of certain types of musical sound with certain types of action, attitude, behaviour, emotional state, environment, gesture, movement, personality, people, pictures, words, social functions, etc. From those recurrent patterns of interconnection we construct a vast array of categories combining several of the constituent elements just mentioned into overriding and integral musogenic concepts.

Many of us also go on to learn how to play an instrument as a way of making sound whose functions are clearly different not only to those of spoken language but also to those of the sounds we make when chopping wood, hammering nails, ironing clothes, doing the washing up, flushing the toilet, taking a shower, walking upstairs, driving a car, eating food, operating machinery, folding a newspaper, closing the door, etc. It would, from the perspectives just presented, be absurd to regard music as some sort of pleasant but parasitic appendage to human life, as ‘auditory cheesecake’ as one writer put it.

Cross-domain representation and synaesthesis

There are other reasons for understanding music as an essential part of the survival kit for any human society, not as just cultural icing on the socio-economic cake. Some of these reasons can be summarised in the following simplified terms.

Our capacity as humans to process signals from the world around us via different domains of representation (verbal, visual, motoric, emotional, etc.) seems to have been one of our species’ great advantages in the evolutionary struggle, in that we can sort out abstractions of cause and effect by distinguishing between visual, verbal, sonic and motoric impulses. To put it simply, what we hear at a particular time (a sonic event) does not have to represent the same phenomenon as a movement or emotion we may perceive at that same time.

Of course, such domain-specific signal processing in no way prevents humans from making connections between several simultaneous domain-specific signals if they co-occur on a regular basis. For example, when a loving parent talks in a sing-song voice to a baby while holding and rocking it, the little one receives signals that are at the same time specific to the sonic, motoric and emotional domains of representation. As these combinations of domain-specific signals are repeated, the infant learns to make connections between them so that another, overriding or ‘embodying’ type of representation comes into play. Such combinations of sonic, motoric and emotional signals are sometimes called proto-musical. They also relate to synaesthetic patterns of cognition.

Fig. 2-1. Domains of representation and the ‘embodying’ cross-domain level

The specific domains relating to (proto-) musical representation, shown in figure 2-1, partially overlap and need some explanation.

1. The physical domain covers the ballistics, trajectory and kinetic relationship of a body (or bodies, including one’s own) to the type of space through which it travels or in which it is motionless. Fast or slow, jerky or smooth, regular or irregular movement, or no movement at all, in an open or closed space; movement which arrives or leaves within that space, towards or away from a point inside or outside it, movement which waits or passes over or under, up or down, to the left or right, to the back or front, to and fro or in one direction, suddenly or gradually: these aspects of movement and space, when performed by a human, are all part of the physical domain of representation. It also includes the presentation of some aspects of heaviness or darkness and lightness, of density and sparsity, as well as of multitude and singularity.

2. The gross motoric domain of representation involves the movement of arms, legs, head, etc., e.g. walking, running, jumping, dancing, pushing, pulling, thrusting, dragging, waving, rolling, hitting.

3. The fine motoric domain of representation involves the movement of fingers, eyes, lips, mouth, throat, etc. Blinking, glittering, shimmering, rustling, babbling, clicking, tapping, fiddling, dripping, etc. all exemplify movement requiring fine motoric representation.

4. The linguistic domain is mainly concerned with prosodic patterning, with the ‘musical’ elements of speech, i.e. with intonation, timbre, accentuation, rhythm, dynamics, etc., including the sonic characteristics of vowels and consonants.

5. The social domain involves the representation of patterns of human interaction, for example of individuals to a group or vice versa. As we’ll see later, particular strategies for structuring musical parts or voices can correspond to particular socialisation patterns.

6. The emotional domain is self-evident. It involves evaluating a situation in response to different body states such as posture, muscular tension or relaxation, hormonal stimulation, adrenalin count, etc. It includes evaluation of experience whose verbal conceptualisation is often formulated in polarities like pleasing/painful, happy/sad, beautiful/ugly, love/hate, security/threat, etc.

It should be clear that these six domains of representation are in no way mutually exclusive. For instance, it’s impossible to imagine a gross motoric activity like dragging (domain 2) without considering bodily movement in space and aspects of heaviness (domain 1). Moreover, any aspect of the emotional domain needs to be qualified by aspects from other domains. For example, is the expression of pain sharp and sudden? Is it relentless, throbbing and ongoing, or is it stifled in the background? Does the pain come in gradual waves or as violent shocks? Does it make you quiver, shudder, jump, fall over, fall apart, yell, scream, groan or grumble? Or does it hit, stab, pierce or poison you? Or does it make you depressed and apathetic? Is the pain repressed and under control, or is it up front and violent? Perhaps it paralyses or silences you altogether? Is it the pain of a solitary individual or does it more closely resemble a community of suffering?

Proto-music’s six domains of representation also overlap in terms of synaesthesis. For example, some onomatopoeic pairs, like babble and bubble or rumble and tumble, are normally, though not exclusively, associated with the sonic and visual/kinetic aspects respectively of the same basic type of movement, as, indeed, are rustle and glisten. Other sonically similar words like bustle, hustle and hassle not only lend themselves to expression in visual or sonic terms: they also include aspects of social interaction and emotional evaluation. It’s the combination of all these aspects that makes those concepts particularly musogenic.

Before going any further in this explanation, I need to clarify that I’m using the noun synaesthesis, not synaesthesia, to denote any normal use of two or more modes of perception at the same time. While synaesthesia is used as a clinical term denoting a neurological condition involving the disturbance of normal perception by the involuntary intrusion of impulses from more than one sensory mode, synaesthesis is no more than a transliteration of συναισθήσις, aisthēsis meaning ‘perception’ and syn = ‘with’, ‘accompanying’, i.e. simultaneous perception in more than one sensory mode. Synaesthesis is therefore not a pathological condition but a normal and essential part of human cognition. The only terminological trouble here is that synaesthesis and synaesthesia both give rise to the adjective synaesthetic. To avoid further confusion, then, synaesthetic will here qualify any type of perception using more than one sensory mode at the same time. In more concrete terms, I will qualify, for example, the combined tactile, kinetic, visual and sonic aspects of babble, bubble, bumble, rumble, crumble, tumble, rustle, bustle, hustle or hassle as synaesthetic because they constitute instances of normally functioning synaesthesis.

To summarise the argument so far, music can, as defined on page 44, be understood as a specifically human type of activity which lets us mix elements from any of the six domains of representation into an integral whole. It’s an activity allowing us to represent combinations of signals from its constituent domains in one symbolic package rather than in merely linguistic, social or somatic terms. As a meaningful system of non-verbal sound, music lets us engage in interpersonal activity on many levels simultaneously, either by making the music or by responding to it individually or together with others. To express ourselves on all these levels at the same time, we don’t need to confront each other with verbal outbursts, bodily display or physical interaction: we can use music instead. In other words, music provides relatively risk-free action to members of the culture producing and using it because it provides socio-culturally regulated forms of potentially risky interaction between humans. But music does more than that: it can also help avoid confusion. Avoid confusion? How can that be when music is so often thought of as ‘polysemic’? I had better explain (see also p. 167, ff.).

Imagine, for example, the not uncommon state of mind characterised by a mixture of irritation or resentment and the feeling that it is nevertheless a nice day and good to be alive. Using the linguistic domain, you could communicate this single dynamic state of mind directly to a friend, partner, child, parent, or to the authorities, by expressing first your disapproval, then your generally positive mood. You could start by speaking with sharp timbre and choppy delivery, then switch to a smooth, mellifluous voice. Using the fine motoric domain, you could first frown, then smile; or tap your fingers nervously then flutter your eyelids encouragingly; or grit your teeth then relax your mouth. Socially, you could first avoid the people causing the irritation and then welcome them into your company. Using the physical and gross motoric domains of representation, you’d almost have to first beat up the individual[s] causing the irritation, then cuddle them. Emotionally, you’d probably want to first yell and stamp your feet, then sit down and relax; or perhaps you’d first tense your shoulders and clench your fists, then lean back, open your arms and show the palms of your hands.

Although feeling irritation on a basically good day is hardly a symptom of emotional instability, expressing that dynamic using just one of music’s constituent domains of representation, as described in the previous paragraph, might well come across as contradictory and confused. It might even cause offence, perhaps even provoke a diagnosis of manic depression. However, thanks to its character of cross-domain representation, music is able to mediate that same sort of dynamic as a unified single experience in a socially negotiated and culturally specific sonic form. After all, we seem to readily accept that the single linguistic concept of love involves feelings of vulnerable anxiety and the fear of loss in addition to the occasional, indescribably powerful bout of euphoria. Similarly, it’s impossible for us mortals to entertain the notion of human life without considering death.

These platitudes about love and life serve to illustrate the fact that while language occasionally lets us conceptualise dynamic states of being as integral experiences, music almost always does so. Feeling angry on a good day, or desperately troubled in the midst of calm and beauty, or totally sick of the world and feeling irrepressibly alive because of that disgust— these are no more than inadequate verbal hints of just three of the innumerable kinds of mood categories music can create. We should therefore not be surprised when respected critics describe the first movement of Mozart’s 40th symphony (1788) in terms of both ‘deepest sadness’ and ‘highest elation’. Was Mozart confused when he wrote the music? Probably no more so than usual. Does the music make a confused or contradictory impression? Not to modern European ears: it’s one of the most well-known, highly valued and widely covered pieces in the Viennese classical repertoire. Were the critics confused when they wrote about sadness and elation in the same breath about the same music? No: they, too, were just giving pallid verbal hints of what they felt the music to be expressing.

By combining input from its constituent domains of representation, music forms integral categories of cognition that, from a logocentric viewpoint, seem contradictory or confused, even though those categories may correspond more accurately with what we actually feel or imagine on a daily basis: angry on a good day, troubled in beautiful surroundings, sad and elated, vulnerable and euphoric, etc. This holistic aspect of musical cognition may well be one reason for music’s ability to move us so deeply, sometimes even to occupy our whole sensory being. It may also be one reason for music’s therapeutic usefulness. Furthermore, music helps cognitive flexibility, the ability to mix, switch and correlate across different domains of representation. Viewed from these perspectives, the development of distinctions between music and speech as two different modes of aural expression may well be, as Cross (1999) suggests, ‘the most important thing we humans ever did’.

A quick trip around the brain

The cognitive neuroscience of music tells a very similar story, albeit in a very different way, about the holistic, synaesthetic and cross-domain characteristics of music.

‘As soon as the primary auditory cortex receives a musical signal, our “primitive”, subcortical brain kicks in at once: the cerebellum’s timing circuits fire up to pick up the pulse and rhythm, and the thalamus takes “a quick look”, apparently to check for danger signals that require any immediate action before more complex processing occurs. The thalamus then connects with the amygdala to produce an emotional response — which, if danger is detected, might be fear. Only after this primeval scan to warn of hazards does the detailed dissection of the sound signal begin.’ (Ball, 2010: 244-245; author’s quotes, my italics)

That ‘detailed dissection’ starts when sound signals, sent from the cochlea to the brain stem for initial pitch treatment, are forwarded as information to several different cerebral departments, as follows.

‘[P]itch intervals and melody are processed in the lateral part of […] Heschl’s gyrus, within the temporal lobe, which has circuitry involved in pitch perception. [Pitch intervals and melody] are also handled by the planum temporale, a domain that deals with […] timbre and the spatial location of sound sources, and by the anterior superior temporal gyrus, which deals with streams of sound, including spoken sentences.’ (loc. cit.)

These two citations, the second focusing on pitch and intervals, are just sample sketches of which parts of the brain deal with which aspects of music. Other aspects of pitch and tonality are missing from the samples, as are most aspects of metre, periodicity, long-term narrative, overall spatiality, compositional texture and patterns of expectation. To cut a long story short, music involves activity in the temporal, frontal and parietal lobes of both brain hemispheres, as well as in the amygdala, hippocampus and cerebellum. Even the occipital lobe is active because playing from sheet music or following a score aren’t the only musical activities involving vision. If you’re dancing to music and want to avoid bumping into other dancers (or the wall); or if you’re reacting to film underscore, or watching musicians or the crowd at a live music event; or if you’re just ‘seeing things inside your head’ as you listen to music, then your occipital lobe will also be in action. Put tersely, one of the most striking neurological features of musical experience is that neurons fire up all over the brain.

It should at the same time be obvious that the brain is not some sort of computer hardwired to process the same sound signals in the same naturally predetermined way in each individual or group of individuals living anywhere in the world at any time in history. However, that is what a small but significant minority of students I’ve met over the years apparently believe. It’s a belief that rests on two assumptions: [1] since experience of music affects body and emotions without seeming to involve much, if anything, by way of intellectual reasoning, it is intuitive; [2] if the process is intuitive it is also instinctive and therefore natural. The first assumption is not unreasonable but the second is fundamentally flawed because intuition and instinct aren’t the same thing. Instinct is an ‘innate, fixed pattern of behaviour in response to certain stimuli’, which by definition (‘fixed’, ‘innate’) involves natural hardwiring, whereas intuition, defined as ‘immediate apprehension by the mind’, clearly does not (‘apprehension’, ‘mind’). In fact, many of the intuitive skills we possess, like casual everyday speech in our mother tongue, are culturally specific. When it comes to musical intuition, I’m certainly not the only individual never to have learnt the intuitive skills involved in reaching the ‘natural’ state of trance experienced by those familiar with the sounds of shamanistic ritual or with the singing of worshippers possessed by the Holy Spirit in extremist sects of evangelical Christianity. Nor can I sing in one metre, clap my hands in another and walk in a third like many Sub-Saharan Africans, whose intuitive ability to do so ‘comes naturally’, but no more nor less so than mine when I immediately and with no conscious effort recognise harmonic closure in a V-I cadence. The difference between those intuitive skills is a result of nurture, not nature.

Of course, none of this means that there is no musical instinct in humans any more than our inability to speak or understand most of the world’s languages means that humans aren’t hardwired for language. It simply means that what seems to come naturally to us in music —what we hear as pleasant or unpleasant, appropriate or inappropriate, either together or in sequence, and for which purposes— cannot be explained in terms of innate instinct, genetic make-up or any other sort of hardwiring. That should be patently obvious, not just from observable differences between ‘what comes naturally’ in one music culture and in another but also by paying attention to how the brain actually works, to how it lets us think, learn, adapt, discover and innovate. The brain is like an amazingly complex and dynamically adaptive organism, not a crippling corporate or state bureaucracy. Plasticity is one of its defining features. We would have never survived, let alone evolved, as a species if the brain had been a glorified operating system running bio-behavioural software. Instead, it enables us, whatever our genetic inheritance, to survive and flourish in the vast variety of changing situations in which we have to live and learn from birth to death. If anything is natural —and wonderful— about how the brain deals with music, it’s the way it lets us all experience so many parts of our being at the same time, whatever our predispositions and circumstances.

Emotion, mood and metaphor

Before rounding off this chapter, two common assumptions about music need to be addressed: [1] music expresses moods and emotions; [2] music cannot be described in words. Neither assumption is wrong: it’s just that it’s misleading to reduce our understanding of music to those general assumptions alone. There’s no room here to discuss these issues in any depth but I’ll try to pinpoint some crucial conceptual problems and, where possible, suggest ways of dealing with them.

First let’s confront the notion that music expresses the feelings of the artist. Tchaikovsky certainly did not think so.

‘Those who imagine that a creative artist can —through… his art— express his feelings at the moment when he is moved, make the greatest mistake. Emotions, sad or joyful, can only be expressed retroactively.’

I’d go further than Tchaikovsky because I don’t think you need to have felt the emotion previously to present it convincingly. After all, a good actor playing a dastardly Richard III or a psychotic Hitler does not himself need to have ever felt or behaved like those villains to elicit emotions of disgust or horror in his audience. Similarly, as Frith (2001: 93-94) noted, the applause for Elton John’s rendition of Candle In The Wind at Princess Diana’s funeral was not ‘for being sincere… (his business alone)… but for performing sincerity… [It was] a performance of grief in which we could all take part’.

In order to convincingly communicate a sense of grief, loneliness, joy, contentment, or whatever other state of mind is required, the musician (composer, performer, etc.) must first be in some way aware of that state of mind: it has to be felt and appropriated before it can be presented in a culturally competent form that can be understood by an audience. If you ever had to rush to be on time for an engagement to sing or play at a funeral, and if one of the deceased’s nearest and dearest thanks you afterwards for ‘the beautiful music’ before you hurry off to another appointment, you’ll know exactly what I mean. If you had expressed your own feelings through music at the funeral you would have shown total disrespect for both the bereaved and deceased. If you can’t come up with something suitably dignified and moving for a funeral, however you might be feeling yourself, you’re simply not doing your job.

Viewing musical competence in this prosaïc way is useful because it lets us distinguish between emotion and the representation of emotion, none of which means that the artist’s composition or performance is fake. It’s simply a presentation, based on either memory and retrospection or on empathy, sensitivity, imagination and skill. That presentation process also involves some distancing from the emotion or mood in question because it needs to be identified and grasped conceptually —almost always in intuitively musical rather than in consciously verbal terms— before it can be packaged in a culturally viable form.

Having made this cardinal distinction between emotion and the musical representation of emotion, we are still not much wiser about differences of meaning between ‘woolly words’ like emotion, affect, feeling and mood that are commonly used when talking about music and an individual’s internal state of being. Let’s try to unravel some of that ‘wool’.

Emotions are characterised by the involuntary physiological response of an individual to an object or situation relating both to that individual’s physical state and to sensory input. This means there has to be an observable response to such stimuli for emotion to exist. That is not so with affect, which can exist and be felt by a subject without concomitant observable emotion. Affect can in that sense be seen as a larger set of phenomena in which emotion is a subset of primary importance.

Feelings are strictly speaking neither emotions nor affects although all three words are often used synonymously. Feelings are the subjective experience of emotion or affect. For example, people in a state of uncontrolled fury, paralysed panic or euphoric ecstasy are overwhelmingly occupied by living out that ‘involuntary physiological response’ (the emotion), but that does not altogether preclude self-awareness, however fleeting it may be, which allows the emotion or affect to be registered by the subject as a feeling.

Mood is usually thought of as an ongoing state of mind —positive or negative, static or dynamic. A mood is psychologically more likely to last for hours or even days compared to the mere seconds normally occupied by the expression of an emotion. Perhaps this simple distinction can help us sort out notions of mood and emotion in relation to music. To test that hypothesis, I’ll return to two of the musogenic but ‘inadequate verbal hints of musical meaning’ suggested on page 67 —feeling angry on an otherwise good day and desperately troubled in the midst of calm and beauty. It would not be unreasonable to identify mood with the general scene —the good day, the calm and beauty— and emotion with the more explicit state of mind —feeling angry and desperately troubled. One problem with this distinction is that neither ‘good day’ nor ‘calm and beauty’ are simply moods without affect or emotion because a good day (rather than a bad one) involves some degree of elation rather than depression, while calm and beauty (rather than stress and ugliness) implies a sense of contentment and wonder (rather than of frustration and indifference). Meanwhile, the ‘otherwise’ and the ‘in the midst of’ both imply that the listener providing the ‘inadequate verbal hints of musical meaning’ hears the music in both cases as each presenting a different type of emotion felt by the same subject. It’s the same person feeling both angry and happy, just as the person feeling both troubled and the effects of great calm and beauty is also one and the same. This ‘multi-affect, single-subject’ conceptualisation would certainly fit the third of page 67’s ‘inadequate verbal hints’ —sick to the teeth of the world and feeling irrepressibly alive because of that disgust. Does this mean that a musical mood is a combination of musically encoded emotions or affects? Or are emotions in music, like melody in comparison to speech, extended to last long enough to become a mood? Or is there a deeper problem preventing us from distinguishing between mood and emotion in music?

The underlying difficulty is, however tautological it may sound, that words denote states of mind in logogenic, not musogenic, terms. A brief scan of mood categories for silent film or in library music catalogues reveals this problem clearly. Some musical mood labels denote emotions (joy, sadness, etc.), but others use demographic, ethnic or geographical categories (children, Gypsy, Russia, etc.), or generic locations (sea, wide open spaces, laboratory, 1960s, etc.), or types of activity, social function or ceremony (battle, sport, funeral, etc.), or generic movement (action, tranquillity, flying, etc.), or narrative genre (crime, science fiction, etc.), or episodic function (intro, bridge, ending, etc.), or musical style and genre or instrumentation (classical, jazz, electronica, pan pipes, etc.).

‘Emotion words’

The fact that emotion words present just one of several ways of labelling musical ‘moods’ may partly be due to the audiovisual contexts in which silent film and library music are used, but that is certainly not the whole story. The underlying problem with emotion words when talking about music is that they denote states of mind in abstracto. They are not like music which culturally packages an emotion or affect into a performance, live or recorded, through the process just described (pp. 71-72). Instead they do what words are particularly good at: as signifiers they lexically denote their signifieds. It’s in this way that whatever emotion or affect a word denotes can be conceptually distinguished from all the gesturality, spatiality, tactility, temporality and kinetics that are part of the ‘physiological response’ that is by definition emotion (p. 72) and which is intrinsic to musical conceptualisation of that emotion or affect.

Consider, for example, the emotion or affect behind the verbal label ‘joy’. Are we talking about: [a] the joy of a small boy excitedly bubbling over as he plays with a new toy; [b] a calm, confident sense of joy slowly welling up inside someone realising that the end of the tunnel may be in sight; [c] the joy of two young girls giggling as they share an exciting secret; [d] the joy of a large crowd, seen from above in a city square, celebrating liberation from war and oppression; [e] the joy of a parent tenderly cradling his/her new-born baby? Those five ‘joys’ demand very different musics. Some of them will be fast, others slow; some loud, others soft; some gentle and delicate, others energetic and ebullient; some high-pitched, others pitched lower; some rhythmically regular, others irregular; some metric, others rhapsodic; some expansive, others moderated; some private, others public; some outdoors, others indoors in a confined space, etc. Whatever the single word ‘joy’ may mean, it cannot be musical because it gives no hint of the motoric, social, spatial or physical elements that music must by definition contain as a cross-domain, synaesthetic type of human communication which causes neurons to fire up all over the brain. But the logogenic-musogenic contradiction of representing affect can also go the other way.

If ‘joy’ is too general and abstract as a musical mood descriptor, other emotion words can be too precise. There are, for example, clear lexical differences between the following five states of mind: [1] envy —discontentment or resentful longing aroused by another’s better fortune; [2] jealousy —suspicion or resentment of rivalry in love or of another person’s advantages; [3] suspicion —distrust or doubt of the innocence or genuineness of someone or something; [4] guilt —shameful awareness of having done wrong; [5] embarrassment —awkwardness or discomfort in social interaction. The problem with these words is that —unless we’re talking about a fit of uncontrolled rage of envy or jealousy (in which case the rage itself rather than its causes would be musically important)— the five verbally denoted states of mind are musogenically very similar. They all involve psychological discomfort linked to bodily postures of defensive containment. The distrust, disgrace or indignity involved to differing degrees in those five states of mind are much more likely to be physically expressed in terms of a motionless body, hunched shoulders, eyes down or to one side, a furrowed brow and sealed lips rather than in effusive gestures, upright body posture, full-on eye contact and expressive speech. Everyday language makes this link quite clear. We say we are paralysed (not liberated) by envy, consumed (not empowered) with jealousy and burdened with (not relieved by) guilt, while we hide or hang our heads in shame and cringe with embarrassment —we literally shrink; we do not stand tall. In short, the musogenic aspect of these five emotion words is in the commonality of ‘involuntary physiological response’ they all share. Precision of musical meaning is more likely to be determined by how much of which sort of paralysis, burden, hiding, hunching or cringing is involved, not in verbal distinctions between the causes of the unpleasantness linked to the bodily postures just described. As with joy, the problem with envy, jealousy, suspicion, guilt, shame and embarrassment is down to the same old tautology: the verbal-lexical precision of words is logogenic, not musogenic. But that’s not all.

Not only are emotion words, as we saw earlier, just one among several types of musical mood label; they are also quite uncommon in silent film and library music collections. Those collections rarely use words clearly relating to the ‘physiological response’ aspect of affect —what defines it as an emotion in terms of an individual’s body posture and movement. There is, so to speak, very little by way of jumping for joy or cringing with embarrassment. None of this means that emotion words and, more importantly, verbal descriptions of body posture and movement are useless when trying to give some verbal indication of a musical mood. It simply means that conventions of musical mood labelling used on an everyday basis in audiovisual production do not seem to give emotion-related words any pride of place. Apart from distinctly musical and episodic labels like classical, jazz, or pan pipes; intro, bridge or ending, and those referring to narrative genres like detective, disaster or documentary, the most common library music labelling categories, many of them overlapping, are those based on the following sorts of distinction: [1] demographic, ethnic, geographical or historical concepts like children, Gypsy, Russia, olden times; [2] generic locations like laboratory, sea, open spaces; [3] types of activity, social function or ceremony like battle, sport, funeral; [4] generic movement like action, tranquillity, flying. I’m not suggesting that these four labelling categories are more important than the verbal designation of emotion, merely that they are more common in a well-established and practice-based convention of musical mood nomenclature.

That observation, together with the problem of logogenic versus musogenic precision, raises an obvious question: why have emotion words like happy, sad, tense and relaxed so often been default description mode for my students when they try to answer the question ‘what do you think the music is telling us here’? Do they think paramusical associations to music are childish? Don’t they know that grown-up professionals in audiovisual media production use the sorts of musical mood labels just mentioned? Do they believe in notions of ‘absolute music’, euroclassical or postmodernist, that are still propagated in many conventional institutions of cultural learning? Or, given that language has words denoting emotion and that music seems to touch our emotions, do they think those emotion words give any real sense of ‘what the music is telling us’, despite the obvious problem of logogenic versus musogenic meaning just discussed?

I try to deal with some of these questions in the next chapter, but it should already be clear that prioritising emotion words at the expense of other types of vocabulary can seriously skew our understanding of what music can and cannot communicate. Not only will we be less able to grasp the prosodic, motoric and kinetic aspects of music’s cross-domain representation that are intrinsic to the types of ‘physiological response’ defining an emotion; we also risk neglecting music’s demonstrable ability to present an infinite range of complex patterns relating to spatiality and tactility, as well as to historical, ethnic and social location.


If, as I’ve argued several times, music could be described in words, it would be unnecessary. But since no human society of which we have any knowledge has ever been without music in the sense defined on page 44, and since one of this book’s main aims is to suggest ways of talking about music as if it meant more than just itself, we will have to find words indicating at least something of its perceived meanings, however inadequate those indications may be. Given the restrictive problems of ‘emotion words’ and of music’s holistic combination of simultaneous modes of expression and perception in specific cultural contexts, it would be logical to talk about the meaning of musical sound in ways that recognise its intrinsic multimodality. This entails considering the synaesthetic and metaphorical characterisation of music less in terms of dubious or fanciful subjectivity and more as a potentially valid mode of providing at least partial clues to its perceived meanings, particularly if, as we shall see in Chapter 6, those clues trace lines of intersubjective consistency.

Metaphors have two poles: [1] a source which acts as a previously known semantic network or model for an analogy; [2] a target on to which that network of meaning is mapped. For example, the target of both ‘love is a battlefield’ and ‘love is a jewel’ is love, but the sources mapped on to that same target are very different. Of course, neither statement is literally true but neither is metaphorically false, since the connotative model of both battlefield (victims, pain, destruction, etc.) and of jewel (sparkling, valuable, precious, etc.) can be mapped on to different aspects of love.

A similar sort of mapping is used in suggestive titles given to pieces of library music like Across the Plains, Caresses by Candlelight, Century of Progress, Days of the Roman Empire, Fogbound, Green Heritage, Psychotic Transients, Reactor Test and The Sleepy Cossack. Each piece can be understood as metaphorical ‘target’ and its title as the linguistic ‘source’ embodying the semantic field or network acting as a model for some essential aspect of how the music is perceived. Connotative responses to music work similarly: they supply verbal-visual associations (‘VVAs’) acting as source models whose meaning is mapped on to the music eliciting the response. Verbal metaphors of musical meaning are by definition metonymic. They ‘are not’ the music and do no more than suggest part of its perceived meaning. Even more importantly, they are almost always culturally specific because different audiences belonging to different social groups in different traditions at different times in different places under different conditions cannot be expected to map the same verbal ‘source’ on to the same musical ‘target’. However, the fact that music is not a universal ‘language’ (p. 47, ff.) does not mean that it’s any less universal a phenomenon than (verbal) language. On the contrary, to understand how any music can communicate anything apart from itself it’s necessary to study individual occurrences of musical semiosis in specific cultural contexts. It’s only on that basis that more general patterns of musical semiosis can be extrapolated, some of which may be applicable in a wider cultural context.

In short, verbal metaphors of perceived musical meaning are a useful starting point for anyone wanting to understand ‘how music’s sounds can carry which types of meaning’ (p. 4). Some readers may be uncomfortable with the notion of words as approximate metaphors for music because words in our logocentric tradition of knowledge are favoured as reliable bearers of meaning in a way that music isn’t. I would simply ask those readers to at least consult the following sections of this book before rejecting the cultural reality of words as metaphors of music: Polysemy and connotative precision (pp. 167-169), Intersubjectivity (pp. 195-228) and Gestural interconversion (pp. 502-509).


Summary of ten main points

[1] Whether or not we humans are alone in having developed two systems of sonic communication (language and music), we are probably the only species to distinguish so radically between them (p. 54, ff.).

[2] Music is a form of communication involving the emission and perception of non-verbal sounds structured or arranged by humans for humans. As such, music is a universal phenomenon in the sense that no human society has ever been without it, even though what we mean by the word ‘music’ may have no exact verbal equivalent in many languages (p. 44, ff.).

[3] Music is no more a universal ‘language’ than language itself. Being a universal phenomenon does not mean that the same sounds, musical or verbal, have the same meaning in all cultures. The fact that language and music don’t trace the same cultural boundaries in no way means that any music or language can be understood by everyone on the planet (p. 47, ff.).

[4] Music often involves a concerted simultaneity of sound events or movements. Unlike speech, writing, painting, etc., music is particularly suited to expressing collective messages of affective and corporeal identity, since individual participating voices or instruments must relate to the underlying temporal, timbral or tonal basis of the particular music being performed (p. 45).

[5] By combining input from several domains of representation, music forms integral categories of cognition that, from a verbal viewpoint, may seem contradictory or polysemic but which correspond more accurately and holistically with states of mind as they are actually felt (verbal hints: angry on a good day, sad and elated, vulnerable and euphoric, etc.). Music also helps synaesthesis and cognitive flexibility (p. 62, ff.).

[6] Cognitive neuroscientists have demonstrated that musical experience causes neurons to ‘fire up all over the brain’. Such observations reinforce notions of music as a particularly synaesthetic and holistic type of human expression (p. 68, ff.).

[7] Emotion and affect are essential aspects of musical ‘meaning’ but preoccupation with individual subjectivity in Western discourse about music tends to divert attention from equally important issues like spatiality, movement, energy and tactility, as well as from aspects of ethnic, historical and demographic connotation (p. 71, ff.).

[8] If treated with care, verbal metaphors of perceived musical meaning can serve as a useful entry point into the discussion of ‘how music’s sounds can carry which types of meaning’ (p. 78, ff.).

[9] Music is, in different ways and to varying degrees, essential to any human in the socialisation process leading from egocentric baby to collaborative adult (p. 58, ff.).

[10] Music is important in contemporary everyday life in terms of the amounts of time and money spent on it: about four hours and the price of a loaf of bread or of a litre of milk per person per day (p. 35, ff.).

Given these ten points and the discussion they summarise, the next question to ask is why music, if it is important in so many ways to humans, seems to have so often ended up near the bottom of the academic heap. Although its status in Western institutions of learning may not be as lowly as that occupied by other important aspects of human existence like dance or domestic science, it’s clearly not ‘up there’ with maths, the natural sciences and language. This anomaly is explained in the next chapter.



3. The epistemic oil tanker

If the stopping distance of an oil tanker is measured in nautical miles and its turning radius in kilometres, the inertia of a cultural legacy loaded with social, economic, technological and ideological ballast is better calculated in centuries than in years. This chapter identifies one such metaphorical oil tanker with a view to charting a less hazardous course through the troubled waters for which the vessel was not designed. The oil tanker in question is a certain set of Western notions about music, the troubled waters are those of the post-Edison era and the epistemological hazards are the anomalies relating to the unsuitability of that unwieldy vessel in those waters. Now, one of those hazards is the contradiction between music’s humble academic status and its importance in everyday life. It’s an antagonistic contradiction: either music just isn’t as important as I’ve made out (in which case no contradiction exists) or else music’s importance is underestimated and its character misunderstood. Assuming, on the basis of evidence given in Chapters 1 and 2, the second alternative to be more plausible, it will be necessary to examine the persistent belief system of which that contradiction is a symptom in order to clear the ground for the ideas presented later in this book. That’s why in this chapter I’ll try to identify and demystify some widely held articles of faith about music, which in its turn entails considering connections between ideology and musical institutions, as well as between notions of music and knowledge.

The basic anomaly

Compared to the visual and verbal arts, music in Western academe lives in a sort of conceptual and institutional isolation from the epistemological mainstream. This relative isolation in academe stands in stark contrast to music’s much greater integration into media production and perception processes. Every time you put on a DVD, play a computer game, or are subjected to consumerist propaganda on the TV, music is usually an integral part of what has been produced and of whatever it is you experience on hearing and seeing that multi-media production. Assuming that music makes a contribution to that experience, why, you might well wonder, in our tradition of knowledge, do we seem to lack the conceptual tools that could help us understand basic questions of musical meaning?

I’ve already questioned the notion of music as a ‘universal language’ (p. 47, ff.) and suggested that music’s humble status in the pecking order of sign systems in a largely logocentric and scopocentric tradition of knowledge may be due to its essentially alogogenic character. As should be clear from the previous paragraph, there is, unfortunately, more to the problem than that.

Articles of faith

One problem about understanding how music works as a sign system is that those who have written about such things have not always been transparent about their agenda. Another problem is that many sources we rely on for ideas about music date from before the advent of free public education and that verbal literacy was until then the preserve of an élite. These sources have a long historical legacy. They are also often normative, propounding, from particular standpoints in specific socio-historical situations, notions of musical right and wrong, good and bad, true and false, beautiful and ugly, elegant and vulgar, learned and ignorant, etc. Of course, the fact that literacy was until recently the preserve of privileged minorities in no way implies that societies with little or no division of labour have no musical norms, or that oral cultures have no notions of how their music should sound. It simply means that, in our largely scribal tradition of institutionalised and academically codified knowledge, we tend to rely heavily on written documents whose power agendas are rarely made explicit.

Musical power agendas: a historical excursion

One recurrent trait in documents about music from ancient ‘high’ cultures (Mesopotamia, Egypt, China, Greece, etc.), is its link to official religious doctrine or to ostensibly indisputable physical phenomena. In ancient Mesopotamia for example (3,000-600 BC), music theory was connected to astrology and mathematics. The general idea was that if you knew the motions of the stars, if you believed in their sway over human destiny, then you understood the harmony of the universe. You could theoretically be at one with the universe by making music which abided by the rules of its harmony. Music of the court and of official religion was held to conform to such rules; that of other classes and peoples did not. It was through such metaphysical links that an oppressive political system could be identified with a system of musical organisation which was in its turn aligned with the immutable system of the universe. Like the deification of the worldly system’s kings, metaphysical connections between the ruling classes, their music and the heavenly spheres created the illusion that their unjust political system was as divine, eternal, unquestionable and unchangeable as the universe.

Written records from ancient China are even more explicit. The tonal system of imperial music, based on observations about the relation of rising fifths to the perfect ratio 3:2, was put into a cosmic perspective. According to documents from around 450 BC, ‘[s]ince 3 is the numeral of Heaven and 2 that of the Earth, sounds in the ratio 3:2 harmonise as Heaven and Earth.’ The importance of official music in ancient China and its connection with irrefutable truths is also demonstrated by the establishment of a Music Bureau (乐府, Yuèfu) under the Imperial Office of Weights and Measures (141-87 BC). The Bureau’s brief was to standardise pitch, supervise music and build up musical archives. More importantly, for over 2000 years of Chinese imperial history (221 BC - 1911), one set of musical practices was identified by ruling-class ideologues as the ‘right music’: yăyuè (雅乐) or ‘elegant music’, as it was called, refers both to court music of that long period and, more particularly, to court music associated with Confucian philosophy.

The music of imperial Chinese courts, especially yăyuè (‘elegant music’), was, as we just saw, related to the cosmic values of the numerals 2 and 3 which, in their turn, were related to notions of heaven and earth, male and female, Yang (阳, sun) and Yin (阴, shade), etc. Yăyuè was certainly regulated by strict rules of performance, not only in terms of detailed stage positions for instrumentalists and dancers, but also with regard to tonal norms. Intricate division and subdivision of genres in terms of both musical style and audience type illustrate further aspects of complex codification, as do all the ancient texts setting out the history, aesthetics and metaphysics of imperial music-making. These sources also imply that knowledge of such intricacies was important for those producing and consuming the ‘elegant’ music, whose history could be traced back to what was, even then, the distant past of an ancient dynasty. Moreover, imperial Chinese music could be reproduced quite consistently from one performance or generation to another, not only because of the many treatises codifying its aesthetics and practice, but also because certain types of notation were used. Although such notation, either as ideograms indicating pitch or as tablature for string instruments, was probably used less prescriptively than the sheet music followed by euroclassical musicians, it at least helped ensure that singers and musicians could make the music they composed or performed conform adequately to prescribed patterns.

Similar hierarchies of music are found in written sources from other ‘high’ cultures. For example, to qualify as classical (i.e. as belonging to the ‘Great Tradition’), Indian performing art, be it from the North or South, must, as Powers (1995:72) points out, satisfy two main criteria.

‘Firstly it must establish a claim to be governed by authoritative theoretical doctrine; secondly, its practitioners must be able to authenticate a disciplined oral tradition of performance extending back over several generations.’

The important concept here is doctrine (śāstra), more specifically sangita-śāstra (musical doctrine). For Indian music to qualify as doctrinally correct, it must adhere to at least one canonical precept: melodic construction should be governed by one of the tradition’s rāgas. This rule is so important that the proper term for correct musical practices, śāstriya-sangit (‘doctrinal music’), is less frequently used than rāgdar-sangit (music based on a rāga). Indians also often use the English word classical when distinguishing rāga traditions from popular music practices. The Oxford Concise English Dictionary (1995) defines classical, qualifying the arts, as:

‘serious or conventional; following traditional principles and intended to be of permanent rather than ephemeral value… representing an exemplary standard; having a long-established worth.’

Calling śāstriya-sangit or rāgdar-sangit ‘classical music’ is in other words quite appropriate because not only do buzzwords of higher and lasting value occur in the connotative spheres of both terms: śāstriya-sangit and classical music also both allude to notions of tradition, doctrine, convention and learning. Besides, śāstriya-sangit’s qualification as scientific or knowledgeable rhymes well with European-language equivalents of classical music, like musique savante, musica colta, música culta, música erudita, E-Musik, serious music and art music. Unlike most types of ‘popular’ and ‘folk’ music, the musical practices qualified by such epithets as classical are all associated with doctrinal texts codifying the philosophy, aesthetics, performance, interpretation, understanding and structural basis of the music in question.

To cut a long story short, the division of music in Western culture into categories of art or classical and folk or popular has numerous parallels and forerunners. It’s even possible that elements of Mesopotamian theory passed via Greek and Arabic scholars into the metamusical mindset of Europe’s medieval clerics and their trichotomy of musics. This trichotomy consisted of musica mundana (the music of the heavens, of spheres in the universe), musica humana (music providing equilibrium of soul and body and instilled by liturgical song) and musica instrumentalis (the singing and the playing of instruments that were at the service of the devil as well as of God). As Ling (1983: 97) explains:

‘[I]n the world of heavenly light, the harmonious and well-tuned music of eternity is heard. Its opposite is the unbearable noise and dissonant, discordant music of hell. Both heaven and hell exist on earth: the music of heaven is reflected in liturgical chant —it is organised, well-measured and based on science and reason. All other music is of the devil, being chaotic, ill-measured and uneducated.’

Since musica mundana was an entirely metaphysical idea (the music of the spheres, of heaven, of God’s perfect creation, etc.), the real world contained only two sorts of music according to the aesthetic and religious precepts of the church fathers: (1) musica humana as the uplifting liturgical song of Mother Church and of God’s representatives on earth and (2) musica instrumentalis as all other music, be it of the devil or of God. This basic dualism of musics changes character quite radically as part of the lengthy and complex process by which the value systems of feudal and ecclesiastical élites are superseded by those of the ascendant bourgeoisie. These bourgeois music values are important to understand because they’ve been at the basis of much discourse about music in Western institutions of education and research since the mid nineteenth century. They include notions of the musically Good, Beautiful and True that still hold sway in many of our musical institutions and still exert a strong influence on what sort of meanings, if any, those of us who see ourselves as educated think that music can carry.

Classical absolutism: ‘music is music’

The notion of absolute music and of its superiority is a striking feature of institutional music aesthetics in the Western world. Hegel (1815), for example, made the following distinction between the musical values of the initiated and those of the average punter.

‘[W]hat the layman (Laie) likes in music is the comprehensible expression of emotions and ideas, something substantial, its content, for which reason he prefers accompanimental music (Begleitmusik); the connoisseur (Kenner), on the other hand, who has access to the inner musical relation of tones and instruments, likes instrumental music for its artistic use of harmonies and of melodic intricacy as well as for its changing forms; he can be quite fulfilled by the music on its own.’

The most famous absolute music aphorism was formulated by Austrian music critic Eduard Hanslick who, in his treatise On Musical Beauty (1854), wrote: ‘Music’s complete content and total subject matter is nothing other than tonal forms in movement.’ Since then, similar views of music have ruled the roost in euroclassical circles to such an extent that some composers whose ‘tonal forms in movement’ clearly relate to ‘other subject matter’ have denied any such relation. Stravinsky (1882-1971), for example, once quipped that his music expressed nothing but itself, implying that stage works of his (Petrushka, The Firebird, The Rite of Spring, for example) were ‘pure’ music. It may be true that Stravinsky, a bit like David Bowie, frequently recast his public persona but the very fact that he saw fit, even just once, to do so from the standpoint of musical absolutism suggests that adopting that view may have advanced his artistic credibility in influential circles. This is certainly what Mahler (1860-1911) once felt compelled to do: having already written programme notes to his first three symphonies, he is reported to have raised his glass at a soirée with Munich illuminati in 1900 and to have proclaimed ‘death to all programme music!’.

The pressure on composers to conform to the notion of absolute music throughout the twentieth century should not be underestimated. For example, famous film composers like Korngold (1897-1957) and Rózsa (1907-1995) lived double lives, compelled to separate their music for music’s sake from their work for the movies. Similarly, Morricone has on occasions expressed disappointment at the scant recognition he receives for his concert music, however widely acclaimed he may be as a musical pioneer because of his work for the cinema. The point is: if the institutional dominance of absolutist aesthetics can affect the lives of widely acclaimed figures like Mahler, Stravinsky, Korngold, Rózsa and Morricone, then such a view of music will have exerted at least as much influence on lesser figures in musical academe. For example, Francès (1958), in his pioneering research about musical reception, received several indignant responses from his music student informants in which they expressed strong absolutist views of the following type:

‘No, no and no again. Music is music. I cannot conceive of it as a source of emotional or literary ramblings.’

I still (2012) meet individuals who object to the idea that music can relate to anything except itself. Musical absolutism, it seems, continues to exert such a strong influence that it has, as we’ll see later in this chapter, even spilled over into discourse about various types of popular music. The notion of absolute music clearly conflicts with semiotic approaches to music analysis, but its apparent tenacity also suggests that it’s an epistemic force to be reckoned with. If that is so, it would be foolish to simply write off the notion without first examining it in some detail, not least because, as already noted, musical structures can in one sense be objectively related to only ‘either: [a] their occurrence in similar guise in other music; or [b] their own context within the piece of music in which they (already) occur’ (p. 46). ‘In one sense’ is of course the problem here because the exclusively intrageneric stance of musical absolutism ignores everything else to which music can be related. In what comes next I’ll try to explain the nature of and reasons for musical absolutism’s epistemic lopsidedness.

‘Absolute’ and ‘non-absolute’

Calling music absolute literally means that the music so qualified is neither mixed up with, nor dependent on, nor conditioned by, nor otherwise related to anything else. The first problem with this absolute definition of absolute is that not even the most adamant musical absolutist would claim such ‘absolute’ music as a late Beethoven quartet to be 100% independent of the musical tradition to which it belongs. Since the quartet cannot have existed in isolation from the musical traditions to which its composer and audiences belonged, any notion of absolute music must be dependent on at least the existence of other absolute music for its own identity. Absolute is in this case relative, allowing the music in question to be absolute only in the sense of unrelated to anything else except other (‘absolute’) music. Now, apart from the fact that the other absolute music would relate to more absolute music, either in a loop (circular argument) or, at some final point in an otherwise endless chain of ‘absolute’ references, to something other than absolute music, the slight qualification, just proposed, of ‘absolute’ as partly relative is problematic for two more substantial reasons.

The first reason is that absolute music relies on the existence of non-absolute music for its distinction as ‘absolute’. Since non-absolute music must, at least by inference, be related to other music and to phenomena that aren’t intrinsically musical, absolute music must also, even if indirectly, be related to other phenomena than music, thanks to its sine qua non relation to non-absolute music, and to that music’s relation to things other than itself. Moreover, since those who distinguish one type of music from others by the qualifier ‘absolute’ in no way make up the entire population, they are just one of many sociocultural groups identifiable by their specific musical values and opinions. This means that the term absolute music is, like it or not, linked to the sociocultural position, tastes, attitudes and behaviour of those that use it. It thereby identifies not only absolute music in relation to other music but also its fans in relation to users of other music. Due to such inevitable sociocultural connotation, absolute music is a contradiction in terms.

The second reason for refuting the notion of absolute music is its implication that the music thus qualified transcends not only social connotations and uses but also patterns of synaesthesis. If that sort of transcendence existed it would mean that demonstrable patterns of juxtaposition between music and pictures, between music and words, or between music and bodily movement (as in dance, film, opera, Lieder, pop songs, adverts, videos, computer games etc.) could never influence the production or perception of absolute music and vice versa. Moreover, if absolute music were indeed absolute, it would need no elements of biologically or culturally acquired synaesthesis to exist, with the consequence that non-absolute music (opera overtures, ballet suites, TV themes, dance tunes, etc.) would be pointless in a ‘music only’ situation (at a concert, on the radio, on your smartphone) where their visual, dramatic or choreographic accompaniment is normally absent. Conversely, it would mean that absolute music played in connection with anything but itself or other absolute music would also be useless because its ‘autonomy’ would preclude any synaesthetic perception. This would in turn imply, for example, that the Taviani brothers were deluded when they used snippets from the slow movement of Mozart’s Clarinet Concerto in A (K622) as underscore to key scenes in Padre Padrone (1977); it would also mean that Kubrick misunderstood the values of euroclassical music in 2001 (1968), The Shining (1980) and Eyes Wide Shut (1999), or that Widerberg, not to mention his cinema audience, were musically incompetent when responding to the Elvira Madigan (1967) effect. In other terms, absolute music contradicts music’s inherent properties as a site of cross-domain representation (pp. 62-68).

In short, if music called absolute has ever had any social connotations, if it was ever written or performed in given historical contexts by certain musicians, if it was ever heard in particular social contexts or used in particular ways by a particular audience, if it was ever related to any drama, words or dance, then it cannot be absolute. Absolute music can therefore only exist as an illogical concept or as an article of faith. If so, how can it have been so influential and why is it so resilient? A first clue to this enigma is provided in the next three quotes.

‘Passions must be powerful; the musician’s feelings must be full-blown — no mind control, no witty remarks, no clever little ideas!’

This sort of statement could have been made by a dedicated jazz musician (see p. 408). In fact the words date from 1762 and are uttered by the rebellious main character in Diderot’s play Rameau’s Nephew.

German romanticist Wilhelm Wackenroder had similar ideas. In 1792 he described the optimal music listening mode as follows.

‘[I]t consists in alert observations of the notes and their progression, in fully surrendering my spirit to the welling torrent of sensations and disregarding every disturbing thought and all irrelevant impressions of my senses.’

In 1799, Wackenroder’s collaborator Ludwig Tieck wrote:

‘[O]nce music is freed from having to depict “finite”, distinct emotions, it becomes the expression of “infinite yearning”, and this indefinite quality is superior to the exactness of vocal music, rather than inferior, as was believed during the Enlightenment.’

Powerful passion, fully surrendering the spirit, infinite yearning etc. on the one hand and, on the other, mind control, disturbing thought, irrelevant impressions, distinct emotions and so on: the value dichotomy is clear in the three views of music just cited. Other important common denominators are that they all, like the Hegel passage that started this section (p. 89), come from the same period in European history and that they are all qualifiable as Romantic.

‘Absolute’ subjectivity and ‘arsehole art’

The rise of instrumental music in eighteenth-century Europe can be understood in the context of the Enlightenment, rationalism and the bourgeois revolution. The emancipatory values of these developments and the subjective experience of that emancipation found collective expression not only in emotive slogans like liberté, égalité, fraternité but also in a music that was itself thought of as liberated. Instead of having to make music under the constraints of feudal patronage and of the Baroque theories of affect associated with the ancien régime, music could now, it was believed, be purely instrumental, free to express emotions without the encumbrance of words or stage action.

Of importance to this historical background is the fact that Romantic views of music were conflated with notions of ‘personality’ and ‘free will’ central to bourgeois subjectivity, both of which were treated as conceptual opposites to the external world of material objectivity. Individuality, emotionality, feelings and subjectivity came to be imagined as opposite poles to the social, rational, factual and objective. Music played a central role in this history of ideas according to which the subject’s alienation from objective social processes was not so much reflected as reinforced, even celebrated. Since the humanist liberation of the ego from feudalist metaphysical dogma went hand in hand with the bourgeois revolution against the absolutism of the ecclesiastical and monarchist hierarchy, it’s hardly surprising to find contemporary notions of music unwilling to tie down musical expression by means of verbal denotation or any other type of reference to anything outside itself. After all, as long as the musical ideals were emancipatory in relation to an outmoded system of thought they could lend support to the development of revolutionary forms of music and society. But what happened when those musical ideals became the rule and their advocates the rulers?

Perhaps the most significant change is that the radical instrumental music of late eighteenth-century Central Europe, initially dubbed ‘Romantic’, acquires the label ‘classical’. This rebranding was established by the mid nineteenth century, along with the music’s institutionalisation in philharmonic societies, concert halls, conservatories, etc.

Another striking symptom of the same process was the adoption of recurrent buzzwords to signal aesthetic excellence: Art, Masterpiece, Genius, free, natural, complete, inspired, infinite, eternal, sublime, etc. Raised to the status of classical, the once emancipatory qualities of the music were mystified and its Great Composers mummified into those little white alabaster busts that classical buffs used to keep on top of well-polished pianos. Although the canonised instrumental music had once been dynamic and independent in relation to older forms of music that were considered fettered by certain types of extra-musical bonding, it was, as ‘classical’ music, stripped of that historicity. In its new state of sanctity it was conserved in conservatories that by 1900 had successfully eradicated anything that might upset the canon, including the improvisation techniques that had once been part of the tradition whose champions the same conservatories professed to be. This institutionalisation process left the seemingly suprasocial absolute music deep frozen as sacrosanct notation: a century-and-a-half’s worth of performers were subsequently conservatory trained to perpetuate it. At the same time, concerts included less and less new music. For example, the proportion of living to dead composers’ music on the concert repertoire in France fell from 3:1 in the 1780s to 1:3 in the 1870s.

Freedom of expression without verbal or theatrical constraint had been the revolutionary drive of the new instrumental music that was later canonised as ‘classical’. Once canonised, it needed theories that would identify and codify those special qualities. And if the new music’s emancipatory driving power had been its unfettered emotional expression then that would be an obvious trait to conserve in conservatories and to expound upon in serious writings on music. One problem was that the new instrumental music had derived its perceived freedom of expression, its own internal musical rhetoric and drama, not from being devoid of words or dramatic action but from the fact that similar music had been repeatedly associated with particular words or stage action. When music went instrumental and crossed the street from the opera house or theatre into the concert hall, it simply carried with it those links to words and dramatic situations.

Yet even though the classical symphony could never have acquired its sense of dramatic narrative without a legacy of affects from the Baroque era, many experts still regard the European instrumental classics as absolute music. As Dahlhaus (1988: 56) explains:

‘Early German romanticism dates back to the 1790s with Wackenroder’s and Tieck’s metaphysic of instrumental music, a metaphysic that laid the foundations of nineteenth-century music aesthetics and reigned virtually unchallenged even in the decades of fin-de-siècle modernism.’

That metaphysic lived on through much of the twentieth century. Even Adorno’s hit list of listening types is clearly Hegelian and music is still sometimes taught as if it were at its best when divorced from words and the visual arts. Polarising the issue for purposes of clarity, it could be said that keepers of the absolute music seal condemned music, if deemed bad, to the aesthetic purgatory of entertainment or primitive ritual; if deemed good, they raised it to the lofty realms of Art. It’s no exaggeration to say that a large proportion of musicological scholarship since A B Marx has been devoted to propagating an arsenal of terms and methods describing the complexities of European instrumental music in the classical tradition at the expense of other musics. Among those ‘inferior others’ we find not only the music of peoples colonised or enslaved by the European capitalist classes (‘primitive’), but also the ‘light music’ (Trivialmusik) of the nineteenth-century European proletariat oppressed by the same ruling classes (‘entertainment’). That deprecation of low-brow by high-brow is callous, to say the least, because the French Revolution of 1789 and the Code Napoléon of 1804 would never have materialised without the support and sacrifice of the popular majority. Despite that support, the bourgeois revolution reneged on the promise of liberty and equality for all as it betrayed the fourth estate (workers, peasants, etc.). You don’t have to be a professor of political history to work out that deprivation directly affects people’s relationship to music, as the following simple points demonstrate.

• The less money you have, the less you can afford concert tickets, instruments, rehearsal and performance space, musical tuition, etc.

• The less money you have, the more crowded your living conditions will be, the less room you will have for musical instruments, and the more likely you will disturb your neighbours when you make music or be disturbed by them when they make music.

• The less leisure time you have, the less likely you are to be able to try out musics other than those readily accessible to you and the less likely you are to opt for music requiring patient listening or years of training to perform yourself.

• The noisier your work and leisure environments, the less use you have for music inaudible in those environments, or for music demanding that you listen or perform in a concentrated fashion without disturbance or interruption.

Bearing these points in mind, Wackenroder’s ‘right way’ of relating to music (p. 94) would be out of the question under the conditions that most people had to endure in industrial cities across nineteenth-century Europe. Nor were the old musical ways of the countryside much of an alternative. Apart from the fact that music connected with the cycle of the seasons was not suited to life in an industrial town, most members of the new working class were refugees from semi-feudal repression in the countryside who had little reason to idealise their rural past in musical or any other terms. Instead, the old folk music was replaced by street ballads, low church hymns, music hall tunes, popular airs from opera and operetta, dance tunes, marches and so on. It was this musical fare that nineteenth-century music authorities branded as light, trivial, trite, crude, shallow, low-brow, commercial, ephemeral entertainment in contrast to the deep, serious, classical, high-brow, transcendental Art of lasting value which they prized. Major figures in musicology were instrumental in propagating this view of high and low in music. For example, Hugo Riemann (1901: 313) dismissed popular music as embodying ‘the lowest musical instincts of the masses addicted to arsehole art.’ Such overt contempt for the popular classes and their music may strike us as both vulgar and elitist but even those charitable burghers who sought to raise the musical standards of the masses unwittingly adopted, through their efforts to improve others less fortunate than themselves, a position of cultural superiority. So, the first probable reason for the staying power of absolutist aesthetics in Europe is that it worked for a long time as a reliable marker of class membership. Even today, adverts for financial services are much more common on classical format radio than on pop or country stations. However, the classical music = high class equation did not just work as a sociocultural indicator.

Members of the new ruling classes faced a series of moral dilemmas, the most striking of which is probably that between the monetary profit imperative of the capitalist system and the charitable imperatives of Christianity. ‘Sell all that thou hast and give unto the poor’ rhymed badly with paying your employees as little as possible to produce as much as possible or with sending children to work down the mine. As a businessman in a ‘free’ market with ‘free’ competition, it might ease your conscience if you could draw clear dividing lines between your business and your religion, between work and leisure, public and private, personal and social, morals and money, etc. Any conceptual system rubber-stamping such polarities would offer welcome relief and help you sleep at night. Seen in this light, even the most outré statements of Romantic music metaphysics have to be taken seriously because the institutionalised concept of absolute music provided an ethical get-out clause: if listening to music in the ‘right way’ was a matter of the emotions, of the music itself and nothing else, then good business ought to be a matter of making money, business itself and nothing else. Or, to put it another way, feeling compassion or any other ‘irrelevant’ emotion while making money would be as inappropriate as thinking about money when listening to instrumental music in the ‘right’ way (see p. 94). To put it in a nutshell, ‘music is music’ (absolute music) can only exist in the same way as ‘orders are orders’ or ‘business is business’. All three statements are of course tautological nonsense, otherwise there would be no music industry, no War Crimes Tribunal and no International Monetary Fund; but that isn’t the point because the effects of the practices characterised by such conceptual absolutism and by the ideological purposes it serves are still painfully real.
The conceptual dissociation of money from morality, military orders from ethics, and the world outside music from music, all illustrate the way in which capitalist ideology can isolate and alienate our subjectivity from involvement in social, economic and political processes. We are in other words back to the issue of dual consciousness raised at the start of the preface to this book.

Refocusing on ‘music is music’, we need to mention one final reason for the staying power of musical absolutism. I’m referring here to the way in which members of the haute-bourgeoisie, already on top of society’s monetary pyramid, could easily, by claiming the artistic high ground of musical taste transcending mundane material reality, convince themselves that they were superior to the masses in more than merely monetary terms: they cultivated what established experts agreed was good taste in music, they adopted the ‘right way’ of listening to the ‘right’ music; lesser mortals did not. By theoretically locating their musical experience outside the material world, the privileged classes were not only able to feel superior: they could also divert attention from the fact that it was they who exerted the real power, they who enjoyed the real material privileges, actually in the material world.

In this historical context, the Romantic metaphysics of music and its notion of absolute music, both of which became cornerstones in the capitalist state’s musical establishment, can be seen as essential supplies in the conceptual survival kit of bourgeois subjectivity. It’s for such reasons hardly surprising if academic institutions in a society still governed by the same basic mechanisms of capital accumulation have until recently propagated conceptual systems validating dissociation of the subjective, individual, intuitive, emotional and corporeal from the objective, collective, material, rational and intellectual. It’s also historically logical that this same dissociation should affect our understanding of music and dance, the most clearly affective and corporeal of symbolic systems, with particular severity. That dissociation lives on in our culture even outside the euroclassical sphere.

‘Postmodern’ absolutism: text denial

Remember Rameau’s fictitious nephew back in 1762 and his ideal of music making with no mind control? Or Tieck and Wackenroder’s dream of music freed from having to depict distinct emotions and their ideal listener fully surrendering to the welling torrent of sensations, disregarding every disturbing thought? Now compare that with this.

[1] ‘The point is… [to] overthrow… the power structure in your own head. The enemy is the mind’s tendency to systematise, to sew up experience. […] The goal [is] in OBLIVION’. (Reynolds, 1987)

[2] ‘[T]he power of pop lies not in its meaning but in its noise,… the non-signifying, extra-linguistic elements that defy “content analysis”: the grain of the voice, the materiality of the sound, the biological effect of the rhythm, the fascination of the star’s body’. (Reynolds, 1990)

[3] ‘Rock and roll is corporeal and “invasive”… [W]ithout the mediation of meaning, the sheer volume and repetitive rhythms of rock and roll produce a real material pleasure for its fans.’ (Grossberg, 1990: 113)

It’s worth noting first that aversion to the idea of music mediating anything but itself isn’t the only common denominator between anglophone pomorockology in the late twentieth century and the musical metaphysics of German Romanticism in the late eighteenth because, as Table 3-1 (p. 103) shows, both trace a similar path from radical alternative to institutional norm with the following traits. [1] A canonic repertoire is established (row 4 in Table 3-1). [2] Subjectivity and individual freedom are promoted as key notions (row 7). [3] A strong relationship with political power develops (rows 9-10). [4] The educational institutionalisation of each body of music takes place a generation or so after the apogee of the original musical development subjected to subsequent canonisation (rows 1-2 in Table 3-1).

Ossification (row 4 in Table 3-1) of the European classical repertoire causes few eyebrows to be raised: it’s seen as the nature of the beast, so to speak. Less common knowledge is that similar tendencies developed in the anglophone world of pop and rock music: whereas album charts from the 1960s and 1970s included very few re-issues, back catalogue accounted for the majority of pop sales in 2000. Such processes of repertoire consolidation and conservation occur after an initial period of musical innovation associated with social change (row 8). These processes also include the adoption and patronage by state or corporate power of music that was seen as at least inappropriate, sometimes even as a threat, in the recent past.

Table 3-1: Classical and popular music as institutionalised fields of study

1. Initial period
   Classical music studies: 1830s-1860s
   Popular music studies: 1970s-2000s

2. Institutions created
   Classical music studies: conservatories, departments of music and musicology
   Popular music studies: performing arts colleges, social science and media courses

3. Musical heritage
   Classical music studies: instrumental ‘classical’
   Popular music studies: first jazz, then pop/rock

4. Ossification tendencies
   Classical music studies: music by dead composers gradually dominates
   Popular music studies: 1960s: few re-issues in charts; 2000: 60% of sales back catalogue

5. Musical lingua franca
   Classical music studies: Central European, mainly Germanic
   Popular music studies: Anglo-American

6. Global hegemony
   Classical music studies: European colonialism
   Popular music studies: US imperialism

7. Liberties and attitude to pleasure
   Classical music studies: liberation of the ego, emotionality, postponed gratification
   Popular music studies: liberation of the id, corporeality, consumerism, immediate gratification

8. Hegemonic class movement
   Classical music studies: rising capitalist merchant class against feudal aristocracy and abandoned fourth estate
   Popular music studies: nouveau riche against old ‘cultured’ capitalism and new lumpenproletariat

9. Examples of state appropriation and sanctioning in UK
   Classical music studies: Händel (mass appeal) becomes Handel, musical representative of UK state power
   Popular music studies: Queen’s jubilee: Brian May, Eric Clapton, Brian Wilson; Abba songs on symphony orchestra repertoire

10. UK official honours bestowed (selection)
   Classical music studies: Sir: A. Sullivan, C.V. Stanford, C.H.H. Parry, E. Elgar, R. Vaughan Williams, A. Bliss, W. Walton, M. Tippett, P. Maxwell-Davies, R.R. Bennett, J. Tavener
   Popular music studies: Sir: Cliff Richard, Andrew Lloyd-Webber, George Martin, Paul McCartney, Bob Geldof, Tom Jones, Elton John, Mick Jagger. Dame: Vera Lynn, Shirley Bassey. OBE: Van Morrison, Richard Starkey (Ringo), George Harrison


One obvious UK example of state sanctioning (row 9) is the concert held in Buckingham Palace gardens in June 2002 to celebrate Queen Elizabeth II’s fifty years on the throne. A succession of ageing rock stars, including Eric Clapton, Brian Wilson, Ray Davies and Sir Paul McCartney, trooped on stage to perform a string of tunes from the late sixties and early seventies. Sir is perhaps a particularly reliable indicator of the process because popular music knighthoods (bottom of Table 3-1) didn’t exist before Thatcher came to power in 1979, since when Brits have been presented with Sir Cliff Richard, Sir Paul McCartney, Sir Elton John, Sir Robert Geldof, Sir George Martin, Sir Andrew Lloyd-Webber, Sir Michael Jagger, Sir Tom Jones, Dame Vera Lynn and Dame Shirley Bassey, plus several pop-rock OBEs to boot.

If the incorporation of previously oppositional music into established power structures had only been a matter of honorary titles, things would not be so bad. Unfortunately, the old epistemic patterns underpinning the dual consciousness that prevents individuals from integrating subjective and objective aspects of (musical) life have been much more substantially boosted by the way in which (not by the fact that!) various forms of popular music have become academically institutionalised on two fronts: knowledge in and knowledge about music (see p. 115, ff.). In the first of these (the ‘rock/pop conservatoire’) repertoire canons and national exams were established, first for jazz and later for rock performance, not just to legitimise those musics and those who rose socially with them (row 8 in Table 3-1), but also to meet the neo-managerial monster’s insatiable appetite for assessment, accountability, enhancement, excellence, outcome, quality assurance, benchmarking, league tables and other Kafkaesque concepts created to bring the complex dynamics of teaching, learning and extramural reality into bizarre one-dimensional schemes of quantification. These standardisation mechanisms have often made it hard for educators to find room on the curriculum for music still in dynamic interaction with extramural reality, while budget restrictions often cause problems in keeping up with the changes in media technology that music technology graduates will have to confront in the outside world.

However, much more damaging to the development of the analytical perspective presented in this book has been the inverted musical absolutism that was so fashionable, at least in the anglophone world, around 1990 and which, like the old-style art-music absolutism discussed earlier, exhibits avid aversion to making links between music as sound on the one hand and its meanings, uses and functions on the other. Inverted musical absolutism has its own articles of faith as part of an irrational belief system and still rules the roost in a significant number of institutions supposedly devoted to studies of culture, including music. I’m referring here to what, in the context of popular music studies, I call pomorockology and whose tendencies are exemplified in the three numbered quotes on page 101.

There’s no room here to provide much detail about the rise of ‘postmodernist’ approaches to music and I take the liberty of referring readers elsewhere for a fuller account. It is, however, important to understand its basic problems, not least because it circulated widely in the non-muso humanities and social sciences before gaining any foothold in musicology.

It was around 1980 that ‘postmodernist’ approaches to music seemed to take root among intellectual would-be radicals who, through lack of anything else they could understand about music, had, so to speak, nothing to read but Adorno. The most striking traits in Adorno’s writings on popular music are: [1] ignorance of the music on which he passes judgement; [2] absence of musical-structural levels of concretion; [3] absence of empirical evidence (sociological, anthropological, or otherwise) to support his theorising. More directly influential on rock criticism in particular was Adorno’s protégé Herbert Marcuse who in the 1960s popularised the social-critical philosophy of the Frankfurt School among radical US students, including founding Rolling Stone columnist Jon Landau and pioneer rock historian Carl Belz (1969). It was from these origins among upper middle-class students of philosophy, literature and sociology (not music) that a literary style of rock journalism developed which promoted subversion and spontaneity as key criteria of ‘authenticity’. To cut a long story short, while in the 1960s and 1970s rock critics of this school concerned themselves with radicalism and alternative politics, using terms like the spirited underdog, or body music that entertains and provokes, the discourse shifts, as rock becomes part of the mass media establishment, in the direction of noise and oblivion, away from opposition and subversion. Characteristic is the insistence on, to use Reynolds’ own words, ‘music’s non-signifying elements that defy “content analysis”’. Like Hegel’s connoisseur ‘fulfilled by the music on its own’ and Wackenroder’s ideal listener ‘disregarding every disturbing thought’, pomorockology promoted a ‘music is music’ aesthetic in which refusal to consider music as a meaningful sign system became an article of faith. The origins of this problem lie partly in the history of Cultural Studies.

Almost all Cultural Studies pioneers examined the verbal or visual media, i.e. the symbolic systems privileged in public education and those which were technically reproducible for teaching purposes at the time: ‘literally, the sort of thing you could photocopy’, as Simon Frith once put it. Their concern with music, at least as we musos understand the word, seemed with few exceptions to be sporadic, unsystematic or marginal. Indeed, there was little point in photocopying musical notation if neither students nor teachers could decipher it. More importantly, there was, as several colleagues in cultural and communication studies have pointed out, nothing much to read in English at that time about music except Adorno. For example, when asked over the phone in 2002 why he thought Cultural Studies scholars had engaged so little with music as text, Dave Laing replied:

‘[T]here wasn’t much out there by musicologists, except for loners like Wilfred [Mellers]. But there’s more to it. I think Adorno got in the way. He had high-art and left-wing cred that suited the way things were going in sociology [and] Cultural Studies. [H]is ‘On Popular Music’ reinforced… prejudices about the popular-classical split. We knew that pop music had different values (intensional versus extensional and so on) and musicology seemed mostly to be about notes on the page. Besides, [popular music] was so much about style and clothes and a way of life, not just about the music and definitely not about notes. So, I don’t think it even occurred to us to ask anyone in the Music Department [and] I don’t think any of us were really aware of ethnomusicology either.’

Laing’s retrospective ties in with the justified questioning of musicology’s usefulness, as expressed by the non-muso colleagues who in the 1980s asked me for a musicological explanation of pop videos (p. 3, ff.) and which first prompted me to think about writing this book. That said, conventional musicology’s general inability to deal with relations between musical text and context wasn’t the only problem because it takes two to tango and at least two to decide not to. Nor does it explain the paradox whereby pomo rock critics developed their own variant of the absolute music value aesthetics which characterised the conventional type of musicology whose legitimacy they criticised.

The fact that music became, and largely remained, a ‘troublesome appendage to Cultural Studies’ is frustratingly clear to those of us who have worked as musicians. Apart from the fact that the last to be hired and first to be fired as a staff member at Birmingham University’s famous CCCS was its only musicologist, it has often been disheartening to register that the efforts we make as musicians to present sound x rather than y to produce effect z in contexts a, b or c are usually passed over in silence by non-muso colleagues supposedly studying music. I’ve even felt like the inferior partner in an unequal marriage where I’m expected to enthuse about Baudrillard and ‘embeddedness’ while very few of my Cultural Studies colleagues bother to find out what pentatonicism or synth presets are all about, or to understand how structuring sounds in different ways relates to the sociocultural contexts in which they’re produced and used. Still, however annoying such lack of reciprocity may be, griping about it will not improve matters. It’s better to explain.

One problem with Cultural Studies was that it had by the 1980s become the victim of its own success. Having started with a democratic agenda, including studies of cultural identities formed around various types of popular music (youth subcultures), the Birmingham school understandably attracted acolytes like moths to a flame from a wide range of disciplines. One symptom of the problem was the Centre’s need to maintain its identity by providing a common epistemological umbrella for all those recruits from all those different disciplines. The ensuing theoretical superstructure that swelled to unmanageable proportions was largely based on what Mattelart and Neveu (1996) call (in French) La ‘French Theory’ and features the following heroes of the archetypal pomo bibliography: Barthes, Baudrillard, Derrida, Deleuze, Kristeva, Lacan, Lyotard and Žižek. Members of this disciplinarily heterogeneous bunch of scholars (linguists, literary critics, philosophers, social theorists, but not a single muso) have featured as mandatory authorities to which countless writers seem obliged to refer in texts emanating from the postmodernist establishment. This practice has two deplorable side-effects: [1] those who don’t comply with its imperatives can be ostracised from the institutional community it helps define; [2] theory sections of writings about any cultural phenomenon often swell to ludicrous proportions, leaving little or no room for empirical or structural investigation, that is assuming that such practices are allowed at all in extremist pomo circles. Mattelart and Neveu (1996: §69) do not mince words about such meta-theoretical excesses.

‘Faced with a world whose complexity is no more than a convenient slogan, Cultural Studies took up the challenge by introducing an abusive inflation of meta-discourses rather than by investigating a theory of that complexity.’

The fact that you are reading these words right now means that even this book is blighted by the problem. I cannot avoid the issue and cannot pretend that these obstacles to our understanding of music don’t exist because they are still widely accepted in circles where, at least in anglophone academe, the very people I’m writing for actually work and study. It’s regrettable having to devote so much time to unravelling misconceptions in order to focus on what really needs to be written about, but I shall persevere.

The second problem relating to Cultural Studies in the 1980s concerns the subject’s new recruits. Unlike the baby-boomer generation, they had no first-hand experience of the postwar changes in popular culture that were musically manifested in the form of rock and pop music and which were related to radical changes in patterns of subjectivity. Raised with a TV in the home and with access to 24-hour pop radio channels, the new Cultural Studies generation entered an intellectual environment that differed noticeably from what confronted baby-boomers when they had attended university twenty years earlier. The new scholars also lived in a very different political climate to that of the 1960s as the Thatcher and Reagan regimes unleashed their virulent strain of capitalism on the population. Working-class values of community and resistance suffered severe setbacks from anti-union and anti-welfare policies, while left-wing intellectuals were in a quandary about how to react as their own security was threatened by government crusades against ‘sociology’ and by the imposition of monetarist management models on universities. The problem was compounded by the apparent inability of Cultural Studies to manage its own success inside an establishment to which it had been at least partially opposed and, perhaps more significantly, its loss of its social foothold outside academe. One symptom of this institutional malaise was that one of the subject’s most influential theoretical models, that of subcultural opposition, became something of a paradox once it was itself part of a successful enterprise: it became a sort of academic parallel to the more blatant anomaly of continuing to celebrate the subversive underdog when rock was already part of the big-business establishment.

The most obvious change in Cultural Studies in the 1980s is probably the shift in emphasis on popular agency away from active participation in sites of opposition to the celebration of mass-culture consumers as agents in the ‘construction of meaning’. Of course, information about audiences is vital to understanding the dynamic of any cultural exchange but, as Mattelart and Neveu (1996: §§70, 76) explain, obsession with the notion of the audience’s freedom to determine the meaning of mass-mediated messages easily obscures the relations of power that exist between members of the audience and the socio-economic order which imposes restrictions on the range of readings effectively open to negotiation. Such idealisation of ‘alternative readings’ constitutes little more than an academic variation on the old freedom of choice theme chanted at consumers by zealots of the ‘free’ market. This change of focus coincides with the replacement of a partially Keynesian economic policy by neo-liberalist monetarism. It also coincides with pomorockology’s abandonment of the subversive underdog in favour of the sort of decontextualised body celebrated in the quotes on page 101.

The decontextualised body is perhaps the most insidious article of faith to come out of ‘postmodern’ Cultural Studies and rock criticism. Like the ideal of uninhibited ‘full-blown feelings’ and the establishment of an absolute music aesthetic around the time of the bourgeois revolution, postmodernist bodyism also celebrates music in absolutist terms but with one significant difference: it believes in the liberation of the body rather than of the emotions (the id and the ego in row 7 of Table 3-1 p. 103) and it celebrates the immediacy and ‘oneness’ of musical experience so that the sound of the music is seen as inseparable from the body responding to it. This notion is problematic because it implies that musical ‘texts’ don’t exist, as the following anecdote illustrates.

During a discussion I had in the mid 1990s with some popular music studies colleagues, two pomo-rockologists in a state of text denial held that the Percy Sledge hit When A Man Loves A Woman from 1966 was not the same in 1987 after its use in a widely diffused jeans commercial, even though the music used in the advert, not to mention the song’s re-issue following the popularity of the commercial, were both identical with the same original recording from 1966. The pomo case against this empirically demonstrable fact started quite convincingly: since the record-buying and TV-viewing public are the ultimate arbiters of pop’s meanings and values, and since the context of the jeans commercial was different to that of the same recording’s original release in 1966, different connotations and different values were perceived in relation to the song. I had no trouble with that. Then my pomo colleagues argued that if the same music did not come across as the same thing to those using it in the new context, it could not be the same as before because audiences are the arbiters of musical meaning. That, I thought, was a non-sequitur because, according to their line of reasoning, music was defined only as the response it receives and/or as the symbolic values attributed to it in some context or other, not as, and not even in relation to, the sonic text which elicited that response or to which those meanings were attributed in that context. The fact that commercial exploitation of the original recording’s connotations was dependent, twenty years later, on the TV audience’s ability to recognise the music as ‘that song’ (a musical text) with its own connotations for each listener in an earlier context rather than another song with other connotations from another time and place did not seem to matter; nor, apparently, did the fact that Atlantic (Sledge’s record label) cashed in on the same song’s renewed popularity, under new circumstances and with new connotations, by issuing a simple re-release, i.e. without having to re-record a single track, without having to produce a new musical text.

By marginalising or disregarding the musical text, pomo-rockologists conflated specific sets of culturally organised sound with the activities and reactions they believed to occur in connection with those particular sounds under a particular set of circumstances, even if they presented neither evidence of those activities and reactions, nor details of the context they had in mind. Obviously, if no musical text exists there can be no relatively autonomous set of musical sounds which can exist in other contexts where those same sounds may or may not be invested with different meanings, give rise to different reactions, have different functions, etc. All that remains, in other words, is just one idealised and absolute context. Now, absolute context is of course just as much an aberration as absolute music (text), not only because a context must by definition contain a text to exist as such (just as no text can exist without a context), but also because no context can have a specific character if no other contexts exist with which to compare it (just as no text can exhibit specific traits if there are no other texts from which it can differ). Put in simple semiotic terms, whereas the old musical absolutism had potential signifiers but no signifieds, pomo absolutism had only potential signifieds but no signifiers. Whichever way you look at it, semiosis is out of the question. Such a standpoint is clearly of no use if you want to know how music communicates what to whom, but it must be a godsend to anyone with a canonic axe to grind: with semantics and pragmatics out of the picture, the coast is clear for propagating an authoritarian view of music, not so much because socio-semiotic evidence is inadmissible as because it has been abolished. By mystifying text and disregarding context, Romantic music metaphysics could rank ways of responding to music on a scale of arbitrary aesthetic excellence compatible with bourgeois notions of subjectivity. By mystifying context and abolishing text, pomo-rockology did the same in reverse for the latter-day ideology of consumerism.

The gist of the argument is that pomo-rockology’s socially decontextualised body in an idealised ‘absolute context’ is no better than conventional musicology’s idealised, socially decontextualised emotions expressed in an idealised ‘absolute text’. Bodyism may in fact be worse in one sense, because while conventional musicology relies at least on syntax and diataxis, pomo-rockology speculates about pop/rock aesthetics, viewing semantics with suspicion and throwing both syntax and pragmatics out the window. Indeed, if, as seems to be the case in extreme pomo-rockology, there is no musical text, then there can be neither pragmatics, nor syntax, nor even semantics because, so to speak, the music IS the body (or vice versa) in no specific social context; or rather (which amounts to the same thing), music IS the body in one implicit, idealised, absolute and ‘seamless’ context. If that is the case, we aren’t so much dealing with a latter-day variant of Hanslick’s absolutist claim that music is ‘nothing other than tonal forms in movement’ (music is music), but with something even more metaphysical: the IS of pomo-rockologist aesthetics conflates music with the body instead of clarifying particular types of relationship between the two, while the body, devoid of social context, remains a culturally undefined entity.

The problem should be clear enough. By conflating signifier with signified, medium with message, message with response, response with text and text with context, pomo-rockology has, like the finance capitalism under which it grew and flourished, created an inscrutable black box whose contents are jealously protected from scrutiny. All those constituent parts of semiosis are conceptually imprisoned, inaccessible, invisible, nameless. All you get to see is the box, the packaging. This reification of an abstraction which obscures the material and social dynamics of music seems to mirror larger contemporary processes of reification too faithfully for it to be interpreted as a historical fluke, especially in view of other coincidences between, for instance, the celebration of rock intensionality and consumerism’s dependence on immediate gratification, or between the abandonment of rock’s subversive underdog and the dismantling of the welfare state. Viewed from this perspective, it seems that pomo-rockology has helped create the impression of an inscrutable monolith of power in which the political economy, its ideology, culture and patterns of subjectivity are fused into a seamless ‘postmodern’ whole; for if one type of subjective experience of a musical text in a particular context is confused with the music as text, and if that experience is conflated with the specific cultural context in which it occurs, then there can be no negotiation of meaning between text and context. With the effective denial of such negotiation, individual and collective experiences of music are bound to be conceptualised as inscrutable and monolithic. It’s in this way that canonic corporeal oblivion can be understood as a consumerist variation on the old absolutist theme of music as utter submersion, infinite yearning or eternal essence. In short, it should be obvious by now that postmodernist absolutism will be of as little use as its euroclassical counterpart in getting to grips with matters of musical meaning.

Musical knowledges

The staying power of absolute music, be it packaged as ‘classical’ or ‘postmodernist’, is reflected in and reinforced by the institutional organisation of musical knowledge. This symbiosis of institutional and value-aesthetic categories is fuelled by the intrinsically alogogenic and largely non-denotative nature of music. The problem can be understood in terms of five anomalies, one of which has already been mentioned several times: music’s lowly status in institutions of education and research versus its obvious importance in everyday reality.

The second anomaly follows from the first. While, for example, critical reading and the ability to see below the surface of advertising and other forms of propaganda are considered essential to independent thought, and although such skills are widely taught in literary or Cultural Studies, equivalent skills relevant to understanding musical messages are not. This book is supposed to be a contribution to filling that gap.

Structural denotation

The third anomaly is really another aspect of the second. It highlights disparity between the analytical metalanguage of music in the Western world and that of other symbolic systems. More specifically, it deals with peculiarities in the derivation patterns of terms denoting structural elements in music (structural denotors) when compared to equivalent denotative practices applied in linguistics or the visual arts. This third anomaly requires some clarification.

It is possible at this stage, using a simplified version of terms explained at the start of Chapter 5, to equate the notion of a ‘meaningful musical structure or element’ with Peirce’s sign, i.e. that part of musical semiosis which represents whatever is encoded by a composer, performer, studio engineer, DJ, etc. (the sign’s object) and which leads to whatever is decoded by a listener (the sign’s interpretant). For example, the final chord of the James Bond theme (EmΔ9), played on a Fender Stratocaster treated with slight tremolo and some reverb, is a structural element (sign) encoding whatever its composer, arranger, guitarist and recording engineer intended (object) and decoded as listener response (interpretant) verbalisable in approximate terms like an excitement/action cue associated with crime, spies, danger, intrigue, etc. The musical structure (sign) is described here from a poïetic standpoint: ‘EmΔ9’ (‘E minor major nine’) designates how the chord is constructed, ‘Fender Stratocaster’ the instrument on which that chord is played and so on. The description is not aesthesic because it isn’t presented in terms of its interpretant: it isn’t identified as a ‘danger cue’, ‘spy sound’, ‘crime chord’, ‘detective chord’ etc.

Poïetic will qualify terms denoting a structural element of music from the viewpoint of its construction in that such a term derives primarily from the techniques and/or materials used to produce that element (e.g. con sordino, glissando, major minor-nine chord, analogue string pad, phasing, anhemitonic pentatonicism). Aesthesic, on the other hand, will qualify terms denoting structural elements primarily from the viewpoint of perception (e.g. allegro, legato, spy chord, Scotch snap, cavernous reverb).

In the analysis of visual art, it seems, at least from a layperson’s point of view, that it’s just as common for the identification of structural elements to derive from notions of iconic representation or of cultural symbolism as from concepts of production materials and technique. For example, structural descriptors like gouache or broad strokes clearly derive from aspects of production technique and are therefore poïetic, while the iconic representation of, say, a dog in a figurative work of art would be called dog, an aesthesic term, rather than be labelled with details of how the visual sign of that dog was produced. To put some meat on the theoretical dog’s bone, the dog in Van Eyck’s famous Arnolfini marriage portrait could also be considered a sign on indexical as well as iconic grounds, if it were established that dog was consistently interpreted in a similar way by a given population of viewers in a given social and historical context: the dog might be understood as a recurrent symbol of fidelity, in which case faithful dog would work as an aesthesic descriptor on both indexical and iconic grounds.

In linguistics there also seems to be a mixture of poïetic and aesthesic descriptors of structure. For example, the phonetic term voiced palato-alveolar fricative is poïetic in that it denotes the sound /ʒ/, as in genre [ˈʒɑnrə], Žižek [ˈʒiːʒɛk] or Zhivago [ʒɪˈvɑːɡəʊ], by referring to how it’s produced or constructed, not how it’s normally perceived or understood: it’s an etic (as in ‘phonetic’) rather than emic (as in ‘phonemic’) term. On the other hand, terms like finished and unfinished, used to qualify pitch contour in speech, are aesthesic rather than poïetic: they refer to what is intended by the particular sound or to how it’s interpreted, not to technicalities of its construction.

Given these perspectives, it should be clear that, compared to the study of visual arts and of spoken language, conventional music analysis in the West exhibits a predilection for poïetic terminology, sometimes excluding aesthesic categories from its vocabulary altogether. This terminological tendency may be fine for formally trained musicians but it’s usually gobbledygook to the majority of people and prevents them from verbally denoting musical structures.

Skills, competences, knowledges

The fourth anomaly involves inconsistency in Western thinking with regard to the status of aesthesic competence in language compared to other symbolic systems. Whereas the ability to understand both the written and spoken word (aesthesic skills) is generally held to be as important as speaking and writing (poïetic skills), aesthesic competence is not held in equal esteem when it comes to music and the visual arts. For example, teenagers able to make sense of multiple intertextual visual references in computer games aren’t usually dubbed artistic, nor credited with the audiovisual literacy they clearly own. Similarly, the widespread and empirically verifiable ability to distinguish between, say, two different types of detective story after hearing no more than two seconds of TV music does not apparently allow us to qualify the majority of the population as musical. Indeed, artistic usually seems to qualify solely poïetic skills in the visual arts sphere and musicality seems to apply only to those who perform as vocalists, or who play an instrument, or can decipher musical notation. It’s as if the musical competence of the non-muso majority of the population did not count. The fifth and final anomaly, in fact a set of two times two dichotomies, offers some clues as to a possible remedy.

Table 3-2 divides musical knowledge into two main categories: music as knowledge and knowledge about music. By the former is meant knowledge relating directly to musical discourse and that is both intrinsically musical and culturally specific. This type of musical knowledge can be divided into two subcategories: poïetic competence, i.e. the ability to compose, arrange or perform music, and aesthesic competence, i.e. the ability to recall, recognise and distinguish between musical sounds, as well as between their culturally specific connotations and social functions. Neither poïetic nor aesthesic musical competence relies on any verbal denotation, and both are more usually referred to as skills or competences rather than as knowledge.

Table 3-2: Types of musical knowledge

1. Music as knowledge (knowledge in music)

1a. Poïetic competence
Explanation: creating, originating, producing, composing, arranging, performing, etc.
Seats of learning: conservatories, colleges of music

1b. Aesthesic competence
Explanation: recalling, recognising, distinguishing musical sounds, as well as their culturally specific connotations and social functions

2. Metamusical knowledge (knowledge about music)

2a. Competence in musical metadiscourse
Explanation: ‘music theory’, music analysis, identification and naming elements and patterns of musical structure
Seats of learning: departments of music(ology), academies of music

2b. Competence in contextual metadiscourse
Explanation: explaining how musical practices relate to culture and society, including approaches from semiotics, acoustics, business studies, psychology, sociology, anthropology, Cultural Studies
Seats of learning: social science departments, literature and media studies, ‘popular music studies’

The institutional underpinning of the division between these four types of musical knowledge is strong in the West. In tertiary education, poïetic competence (1a) is usually taught in special colleges or conservatories, musical metadiscourse (2a) in departments of music or musicology as well as in conservatories or colleges, and contextual metadiscourse (2b) in practically any humanities or social science department, less so in music colleges and conventional music(ology) departments.

Aesthesic competence (1b) is virtually impossible to place institutionally because the ability to distinguish, without resorting to words, between musical sounds, as well as between their culturally specific connotations and social functions is, with the exception of isolated occurrences in aural training and in some forms of ‘musical appreciation’, generally absent from institutions of learning. Aesthesic competence remains a largely vernacular and extracurricular affair. Indeed, there are no courses in when and when not to bring out your lighter at a rock concert, nor in when and when not to stage dive, not even in when and when not to applaud during a jazz performance or at a classical concert. And what about the ability to distinguish musically between degrees of threat, between traits of personality, between social or historical settings, between states of mind, behavioural attitudes, types of love or of happiness, sadness, wonder, anger, pleasure, displeasure, etc.; or between types of movement, of space, of location, of scenario, of ethnicity and so on? Those sorts of musical competence are rarely acquired in the classroom: they are usually learnt in front of the TV or computer screen, or through interaction with peers and with other social groups. In fact, the epistemic problem with music, as it has in general been academically categorised in the West, can be summarised in two main points.

Firstly, knowledge relevant to music’s production and structural denotation has been largely separated from that relating to its perception, uses and meanings. Established institutions of musical education and research have therefore tended to favour etic rather than emic and poïetic rather than aesthesic perspectives. Such imbalance, in symbiosis with a long history of class-specifically powerful and metaphysical notions of ‘good’ music’s absolute and transcendent qualities (pp. 84-101), has led to frequent misconceptions about music as a symbolic system (e.g. pp. 46-50, 89-91). This imbalance has also exacerbated ontological problems of music’s alogogenicity and made the incorporation of musical knowledge(s) into a verbally and scribally dominated tradition of learning an even more difficult task.

Secondly, the virtual absence of aesthesic learning (knowledge type 1b) in official education has meant that, compared to analytical metalanguage used with visual or verbal arts, relatively few viable aesthesic denotors of structure exist in musical scholarship. This paucity of user-oriented terminology has restricted musicology’s ability to address the semantic and pragmatic aspects essential to musical semantics. If that were not the case, this book would be superfluous. In addition to these two overriding problems relevant to the development of a simple semiotic approach to music analysis (the real subject of this book), one final major issue of institutional legacy needs to be addressed: Western musical notation.

Notation: ‘I left my music in the car’

Use and limitation

Notational literacy is useful, even in the age of digital sound. Let’s say you need to add extra backing vocals to a recording, that neither you nor the other musicians in your band are able to produce the sound you’re looking for and that you contact some professional vocalists to resolve the problem. You could give those singers an audio file of the mix so far and indicate where in the track you want each of them to come in to sing roughly what at which sort of pitch using which kind of voice. This would be a time-consuming task involving your recording, for demonstration purposes only, something no-one in your band can sing anyhow. It would also involve either extra rehearsal with the vocalists or the risk of them arriving in the studio and failing to sing what you actually had in mind. It’s clearly much more efficient to send the vocalists their parts written out in advance. It’s quicker for them and it’s both quicker and much less expensive for you because you won’t waste studio time and money on unnecessary retakes.

This utilitarian aspect of notation is important for two reasons: [1] it highlights the absurdity of excluding notational skills from the training of professional musicians and it contradicts widely held notions about notation’s irrelevance to the study of popular music; [2] it illustrates that the prime function of musical notation is to act as a set of particular instructions about musical performance rather than as a storage medium for musical sound. This last reason is of particular relevance to the discussion of musical meanings.

Many well-trained musicians can read a score and convert what’s on the page into sounds inside their heads. This ability is no more magical than being able to imagine scenery when perusing a decent physical map. However, although no sign system is totally irreversible, the ability to make sense of any such system presupposes great familiarity with its limitations, more specifically an intimate knowledge, usually non-verbalised, of what the system does not encode and of what needs to be supplied to interpret it usefully. For example, if the vocalists hired for your recording session are professionals and if the notation you sent them is adequate, they should be able to deduce from experience whatever else you want them to come up with in addition to the mere notes on the page. Just by looking at that notation, an experienced musician will understand what musical style it belongs to and, in the case of professional vocalists, will produce classical vibrato, gospel ornamentation, smooth crooning, rock yelling or whatever else you had taken for granted. In short, they will know to apply a whole range of expressive devices relevant to their craft and to the style in question, making decisions about timbre, diction, dialect, pronunciation, breathing, phrasing, vocal register and so on that are nowhere to be seen on the paper or in the email attachment you sent them.

Western musical notation is in other words a useful performance shorthand for certain types of music. It graphically encodes aspects of musical structure that are hard to memorise, especially sequences of pitch in terms of melodic line, chordal spacing and harmonic progression. It can also encode these tonal aspects in temporal terms of rhythmic profile and periodic placement, but it does not convey the detailed articulation of these elements. Moreover, elements of timbre and aural staging hardly ever appear in notation and parameters of dynamics (volume), phrasing, and sound treatment are, if they appear at all on the page, limited to terse or imprecise written instructions like f, cresc., leg., con sord., sotto voce, laisser vibrer, medium rock feel, brisk, etc.

Another important limitation of Western notation is that it was developed to visualise some of the tonal and temporal parameters particular to a specific musical tradition. Just as the Roman alphabet was not conceived to deal with foreign phonemes like /θ/, /ð/ (th), /ʃ/ or /ʒ/ (sh, zh), Western music notation was not designed to accommodate African, Arab, Indian, Indonesian or even some European tonal practices. Moreover, since the establishment, in the early eighteenth century, of the ubiquitous bar line in Western music notation, it has been virtually impossible to graphically encode polymetric aspects of music from West Africa or parts of Latin America where the notion of a downbeat often makes little sense. Even the frequent downbeat anticipations in basically monometric jazz, blues, gospel, funk and rock styles, so familiar to almost anyone living in the urbanised West, can only be clumsily represented on paper. In terse technical terms, the efficiency of our notation system is restricted to the graphic encoding of monometric music containing fixed pitches which conform to a division of the octave into twelve equal intervals.

Once aware of the restrictions just explained, it is of course possible to make good use of written music, not only as performance shorthand, as with the backing vocalists mentioned on page 121, but also, if you have that kind of training, as a viable way of putting details of tonal and rhythmic parameters on to paper, provided of course that the music in question lends itself to such transcription. Indeed, the analysis of music and its meanings would be easier if scholars held such a pragmatic view. The problem is that these simple truths still have to be explained to some students and colleagues who hold the scopocentric belief that the score is in some way the musical text or the music itself.

Now, given the hegemony of the written word in institutions of European knowledge, it would in one sense be odd if, before the advent of sound recording, music on the page, rather than just fleetingly in the air or as the momentary firing of neurons in the brain cells of members of a musical community, had not acquired a privileged status. After all, notation, despite its obvious shortcomings, was for centuries music’s only tangible medium of storage and distribution. The weight of this legacy should not be underestimated because it ties in with important historical developments in law, economy, technology and ideology. There is no room here to disentangle that nexus but it’s essential to grasp something of notation’s radical influence on music and on ideas about music in Western culture.

Law, economy, technology, subjectivity

Well before the advent of music printing around 1500, notation was already linked to the sort of subjectivity that later became central to bourgeois ideology. Of particular interest in this context is a passage in the entry on notation (Notenschrift) from the 1956 edition of Musik in Geschichte und Gegenwart. The article draws attention to the musical doodlings of an anonymous monk who should have been copying plainchant but whose own musical imagination seems to have spilled out on to the parchment. He was supposed to be using the technology of notation to perpetuate the immutable musica humana of Mother Church, not for recording ideas like ‘what if I arrange the notes like this instead?’ or ‘what if I combine these two tunes?’ or ‘what if I change their rhythm to this?’ Of course, the abbot overseeing the duplication of liturgical music has crossed out the offending monk’s notes. Not only had this insubordinate brother made an unholy mess in a holy book; he had also, by committing his own musical thoughts to paper, challenged ecclesiastical authority and the supposed transcendence of God’s music in its worldly form (musica humana). Preserving Mother Church’s music for perpetuity was good; allowing the musical thoughts of a mere mortal to be stored for posterity was not. A millennium or so later, the democratic potential of music technologies like digital sequencing, recording and editing, not to mention internet file sharing, is sometimes ignored or demonised by other authorities, elitist or commercial, whose interests, like those of the medieval abbot, lie in preserving hierarchical legacies of social, economic and cultural privilege.

At least two lessons can be learnt from this story of the wayward monk. One is that there is nothing conservative about musical notation as such, even though its long-standing symbiosis with conservatory training and its conceptual opposition to graphically uncodified aspects of musical production (improvisation, etc.) can lead those who rarely make compositional use of the medium to believe that ‘notes on the page’ constitute an intrinsically restrictive type of musical practice. The anonymous monk’s doodlings and our studio vocalists’ notational literacy (p. 121) both suggest the opposite. It’s also worth remembering that, unlike European classical music, other traditions of ‘learned’ music rely rarely, if at all, on any form of notation to ensure their doctrinally correct reproduction over time.

The second lesson is that the connection between notation and subjectivity has a long history whose development runs parallel with the emergence of notions of the individual discussed earlier. Of particular importance is the process by which, in the wake of legislation about authorial ownership in literary works, creative musicians, no longer subjected to the anonymity of feudal patronage, were able to put their printed compositions on the ‘open market’. In late eighteenth-century London, for example, the market was a growing throng of bourgeois consumers wanting to cultivate musical habits befitting the status to which they aspired. As Barron (2006: 123) remarks:

‘The capacity to earn a living by selling one’s works in the market freed the artist of the burden of pleasing the patron; the only requirement now was to please the buying public.’

Notation was a key factor in this development. As the judge, Lord Mansfield, stated during a 1774 court action brought by Johann Christian Bach against a London music publishing house:

‘Music is a science: it can be written; and the mode of conveying the idea is by signs and marks [on the page].’

Thanks to these marketable ‘signs and marks’, composers became the legal owners of the ideas the sheet music was seen to convey. Composers became authors of not only a tangible commodity (sheet music) but also of financially quantifiable values derived from use of that commodity: they became central figures and principal public actors in the production and exchange of musical goods and services.

‘As the buying public diversified its tastes, many [composers] cultivated greater self-expression and individuality (it was a way of being noticed). Under the sway of patronage,… [the composer] was expected to be self-effacing… Craft counted more than uniqueness… The rise of a wider, more varied and anonymous [public] encouraged [composers] to carve out distinctive niches for themselves. They were freer to experiment, because less commonly working to peer expectation or commission, instead producing in anticipation of demand, even to satisfy their own sense of Creative Truth and personal authority.’

Rameau’s nephew (p. 93) would have been delighted at this turn of events, perhaps even more pleased by the magic attributed to the Artist by representatives of German Romanticism, at least if the following characterisation of their notion of ‘the text’ is anything to go by.

‘The text, which results from an organic process comparable to Nature’s creations and is invested with an aesthetic of originality, transcends the circumstantial materiality of the [score]… [I]t acquires an identity immediately referable to the subjectivity of its [composer].’

Here we are back in the metaphysical musical world of Tieck, Wackenroder and Hegel, except that this time we’re armed with notation as legally valid proof of the composer’s subjectivity and of the ‘authenticity’ of his Text/Work/Oeuvre. In short, musical notation in Europe around 1800 stands in the middle of a complex intersection between:

• the establishment of music as a marketable commodity;

• developments in the jurisprudence of intellectual property;

• the emergence of composers from the anonymity of feudal patronage and their appearance as public figures and principal actors in the exchange of musical goods and services;

• Romantic notions of genius and subjectivity.

Add to these four points the problem of music is music (absolute music) and its institutionalisation (pp. 84-95), plus the fact that notation was the only viable form of musical storage and distribution for centuries in the West, and it should come as no surprise that many people in musical academe still adhere to the scopocentric belief that notation is The Music it encodes so incompletely. Indeed, this belief is so entrenched in some muso circles that the word music still often denotes no more than ‘signs and marks’ on paper, as in statements like ‘I left my music in the car’. The institutional magic of this equation should not be underestimated. For example, one research student told me his symphonic transcription of a Pink Floyd track was intended to ‘give the music the status it deserves’; and I was once accused of trying to ‘legitimise trash’ because I had included transcriptions in my analyses of the Kojak theme and Abba’s Fernando.

Another important reason for the longevity of the equation music = sheet music is of course that notation was, for about a century and a half (roughly 1800-1950), the most lucrative mass medium for the musical home entertainment industry. In most bourgeois parlours, the piano was as focal a piece of furniture as the TV in latter-day living rooms. Before the mass production of electro-magnetic recordings in the late 1920s, or even as late as the 1950s and the advent of vinyl records, sheet music was, like an audio file, encoded ‘content’ in need of software and hardware to decode and reproduce. The parlour piano was only part of that hardware; the rest of the hardware and all the necessary software resided in the varying ability of sheet music consumers to decode notes on the page into appropriate motoric activity on the piano keys (or on other instruments, or by using the voice). The sheet music medium on which consumers relied in order to realise an aesthetic use value, hopefully commensurate with the commodity’s exchange value (its monetary price), demanded that they contribute actively to the production of the sounds from which any aesthetic use value might be derived. In this way, consumer preoccupation with poïetic aspects of musical communication was much greater than it was to become in the era of sound recording. Poïetic consumer involvement in musical home entertainment was also greater than that required for deriving use value, aesthetic or otherwise, from a newspaper or novel, especially after the introduction of compulsory education and its insistence on verbal literacy for all citizens: notational literacy was never considered such a necessity, even in the heyday of sheet music publishing.

The fact that those who regularly use Western notation today are almost exclusively musicians, not the general listening public, reinforces the dichotomy between knowledges of music, especially that between vernacular aesthesic competence (e.g. aural recognition of a particular chord in terms of crime and its detection) and the professional ability to denote musical structures in poïetic terms (e.g. ‘minor major nine’). What composers, arrangers or transcribers put on to the page is, as we’ve repeatedly stated, usually intended as something to be performed by trained musicians who, in order to make sense of the ‘signs and marks’, have to supply from their own experience at least as much of what is not as of what is on the page. It goes without saying that it would today be economic suicide to produce sheet music en masse in the hope that Joe Public would derive any value from it. Despite this patent shift in principal commodity form during the twentieth century from sheet music to sound recording, musical scopocentrism is still going strong, not only in the musical academy but also in legal practice. As late as November 2003, a California judge declined to award compensation to a jazz musician whose improvisation had been sampled on a Beastie Boys track. Judgement was passed on the grounds that the improvisation was part of a work whose score the plaintiff had previously deposited for copyright purposes in written form but that the improvisation in question was not included in that copyrighted score.

One final aspect of the dynamic between notation, subjectivity and the institutionalisation of musical knowledges deserves attention if any strategy for developing more democratically accessible types of discourse about music is to be at all viable. This dynamic has to do with the composer’s star status in the Western classical tradition after 1800.

Back-tracking to the nineteenth-century bourgeois music market for the last time, composers became, as we have seen, the legal owners and recognised authors of ideas conveyed through the tangible commodity of sheet music. In this way they also became the most easily identifiable individuals involved in the production of music. For example, the biggest names on popular sheet music covers were, in the heyday of notation, those of the composer and lyricist, while the optional as performed by… data, which only starts to appear in the inter-war years after the commercial breakthrough of electro-magnetic recording, was assigned a much smaller font. Of course, in the classical field, piano reductions and pocket scores virtually never include details of notable recordings of the work in question. Indeed, although nineteenth-century artists like Jenny Lind or Niccolò Paganini were unquestionably treated like pop stars in their day, they never acquired the lasting high-art status of composers enshrined as Great Masters in Western musical academe’s hall of fame. Romantic notions of the individual, of music as a refuge of the higher arts and of virtually watertight boundaries between subjective and objective contributed to this canonisation process. Among the continuing symptoms of this romanticised auteurcentrism is historical musicology’s considerable zeal for discovering musical Urtexts or for re-interpreting Beethoven’s notebooks compared to its relative lack of interest in how such music was used and in what it meant to audiences, either then or more recently. In short, musicological textbooks still tend to deal more with composers, their subjectivity, their intentions and their works, the latter overwhelmingly equated with the poïetically focused medium of notation, than with the effects, uses and meanings of that music from the viewpoint of the usually much greater number of individuals who make up the music’s audiences.

The consequences of notation’s long-standing central position in music education are, in the perspectives just presented, quite daunting. Thankfully, several major twentieth-century developments have highlighted many aspects of the anomalies brought together in the discussion so far. These developments, discussed in the next chapter, have not only enabled a critique of conventional musicology: they also prefigure the sort of ideas presented in Chapters 6-14.

Summary of main points

[1] Music’s relatively low status in the academic pecking order is due not only to its inherently alogogenic nature but also to its institutional isolation from the epistemological mainstream of European thought.

[2] The relative isolation of music from other aspects of knowledge in our tradition of learning is not only due to the latter’s logocentric and scopocentric bias but also to a powerful nexus of historical, social, economic, technological and ideological factors.

[3] Socio-musical power agendas are a severe obstacle to the understanding of music as a meaningful sign system. Music’s relative isolation in our tradition of knowledge is partly due to a long history of institutional mystification: notions of suprasocial transcendence have for thousands of years been a recurrent trait in learned writings about learned musics. The doctrinal ghost of one such notion of suprasociality —absolute music (‘music is music’)— still haunts the corridors of musical academe in the West.

[4] The strong link between absolute music and Romanticist (bourgeois) notions of subjectivity reinforces a more general dissociation or alienation of individuals from social, economic and political processes. In so doing, the link between absolute music and bourgeois notions of individuality also obscures the objective character of shared subjectivity among audiences, placing disproportionate emphasis on the individual composer or artist in the musical communication process.

[5] Postmodernist absolutism is a latter-day variant on its euroclassical counterpart. It exhibits similar traits of: [i] change from radical alternative to established intellectual canon; [ii] repertoire ossification; [iii] adoption and propagation by privileged classes; [iv] metaphysical and illogical discourse, often authoritarian, promoting the superiority of certain musical practices over others.

[6] Postmodernist absolutism came out of literary-style rock journalism and Cultural Studies, not out of institutionalised music studies. While classical absolutism focused on musical texts at the expense of their context, postmodernist absolutism tended to deny the existence of a musical text altogether. In either case semiotic approaches to music are out of the question.

[7] Overriding emphasis on the production of music, rather than on its uses and meanings, is so firmly entrenched in (classical) Western institutions of musical learning that terms denoting elements of musical structure are mostly poïetic, rarely aesthesic. Consequently, those without formal musical training are unable to refer in a doctrinally correct fashion to such structural elements (signs). This lack of officially recognised aesthesic structural denotors makes the discussion of musical meaning by those without formal training a very difficult task.

[8] The longevity of notation as the only medium of musical storage and distribution before the advent of recorded sound, combined with its subsequent status as the most lucrative medium during the early part of the twentieth century, has compounded many of the difficulties mentioned above. Unlike the written word, notation, conceived and used almost exclusively for the production of musical sound rather than for its perception, exacerbates the poïetic imbalance of musical learning in the West. At the same time, notation’s long-standing status as commodity form, combined with its historical association with European notions of subjectivity, especially during the Romantic era and in the wake of legislation rubber-stamping the composer as an authentic originator and owner of marketable property, has further contributed to the poïetic lopsidedness of thought about music in Western institutions. It has in the process also reinforced the metaphysical views of music and subjectivity mentioned in points 3 and 4.


The long and short of these eight points and of the discussion they summarise is that it should come as no surprise if intelligent people capable of embracing a socially informed semiotics of language or cinema are generally unable to do the same with music: the historical legacy of musical learning in the West has simply made that task extremely difficult. At the same time, although it is vital to understand the causes of this problem, it’s also obvious that it must be solved. Musical realities after a century of mass-diffused sound clearly demand that the mental machinery of the historical legacy be overhauled.

Therefore, returning to the analogy that started this chapter, we are perhaps now slightly better placed to determine what cargo to salvage and what to discard along with the ballast of the oil tanker representing the historical legacy just reviewed. Although we may be able to neither manoeuvre the massive vessel satisfactorily nor bring it to a complete standstill, we can at least decrease its inertia and more easily predict its behaviour. If all else fails, we can abandon ship and row our lifeboats towards another point on the shoreline. Hopefully the tanker can be safely moored before it causes more damage so that we can use as much fuel as possible salvaged from its hold to run less cumbersome vessels providing a more efficient and ecologically friendly shipping service in the public interest. Several epistemological lifeboats have already put out. They are the subject of the next chapter.




4. Ethno, socio, semio


This chapter deals with the ‘epistemological lifeboats’ mentioned at the end of Chapter 3. They form an important part of the foundations on which the rest of this book rests. For reasons of brevity, I shall call these lifeboats ethno (as in ethnomusicology), socio (as in the sociology of music) and semio (as in the semiotics or semiology of music). These three qualifiers imply that studying music should, unlike conventional studies in the West which have no such qualifying prefixes, entail considering music as an integral part of human activity rather than as just ‘music as sound’ (absolute music). Put simply, ethno relates music, as defined on page 44, to peoples and their culture, socio to the society producing and using the music in question, semio to the dynamic relations between structure and perceived meaning in music.


The earliest major challenge to institutionalised wisdom about music in the nineteenth-century West came from what is generally called either ethnomusicology or the anthropology of music.

There are several plausible explanations for the rise, in Europe and North America around 1900, of these ethno approaches. One reason may be that alienated European and North American intellectuals sought alternative cultural values to those of the brutal monetary economy they lived in. Another reason may have been concern for the fate of pre-industrial cultures threatened by urbanisation, a third the search for national musical identity. Whatever factors may have sparked interest in ‘folk’ and ‘other’ musics at the turn of the previous century, one thing is clear: ethnomusicology would not have flourished without the invention of recorded sound.


Now, although notation, not sound recording, was, during the first half of the twentieth century, the main musical storage medium in the West, acoustic recording, commercially available since around 1900, allowed collectors of non-notated music to store what they sought to document as it sounded rather than as scholars heard it or were able to transcribe it. Thanks to the new recording technology, standards of reliability in musical documentation improved: collectors could no longer return from field trips with mere transcriptions of the music they wanted to study. Through repeated listening to a recording of an identical sequence of musical events, they could more easily grasp unfamiliar ways of structuring pitch, timbre and rhythm, taking note of all relevant parameters of expression, not just those suited to storage in the European system of notation.

This early development in ethnomusicology is of importance to anyone studying music stored and/or distributed in aural rather than graphic form because focus on musical ‘texts’ shifts from notation to sound recording. With the early ethnomusicologists, audio recording became the primary medium for musical storage and acted as the basis for transcription. Put another way, the roles of notation and recording were reversed. Euroclassical composers and arrangers produced notation that served as the primary medium on which live performance and any subsequent recording were based, whereas the notation of music in other traditions relied on sound recording of a primary live performance for its existence as a text used for purposes of study rather than for (re)performance. Later, after the advent of moving coil microphones and electrical amplification in the 1920s, field recordings by collectors like Peer, Hammond and Lomax were to have an even greater impact: previously non-notated music traditions like hillbilly and the blues could now be stored, reproduced and distributed in quantities that would soon outstrip those of sheet music publishing. By the time of the Beatles’ Sergeant Pepper (1967), of course, media primacy is in the recording, live performance becoming at best an attempt to re-enact the recording on stage, often an outright impossibility, while notation has little or no relevance. Given this historical background, there are at least three reasons for stressing the importance of ethnomusicology’s three great challenges to Western institutions of conventional musical learning.

First: by using audio recording in their studies, early twentieth-century scholars, researchers, collectors and musicians made ‘other’ musics available for interested Westerners to hear, study and appreciate. Through subsequent work by scholars and collectors, more music from more cultures became available on phonogram, this development increasing the Western listener’s chances of finding aesthetic values in a greater variety of musics and substantially reducing the viability of maintaining a single dominant aesthetic canon for music.

Second: due to obvious differences in structure between Central Europe’s musical lingua franca and the ‘other’ musics studied by ethnomusicologists, ‘we Westerners’ could never take the meanings and functions of ‘their’ music for granted in the same way as ‘we’ thought we could with our own. ‘We’ needed explanations as to why ‘their’ music sounded so different from ‘ours’. ‘Their’ music remained incomprehensible to us unless it was related to paramusical phenomena, that is, unless it could be conceptually linked to social or cultural activity and organisation other than what we would call ‘musical’ —to religion, work, the economy, patterns of behaviour and subjectivity etc. If applying notions of the ‘absolute’ to familiar music in familiar surroundings is, as we already argued (p. 91, ff.), a contradiction in terms, applying such notions to unfamiliar music in unfamiliar contexts would be even sillier. So, forced to put the sounds of unfamiliar music into the specific social context of ‘foreign’ culture in order to make any sense of them at all, we had to compare the sounds of our own music with those of people living in other cultures, and the context of their music with our own cultural tradition. Perhaps we would need to ask how ‘our’ music worked in ‘their’ context if ‘their’ music was incomprehensible to us without understanding it in ‘their’ context; and if we had to ask those sorts of question, maybe we would need to start thinking more seriously about how ‘our’ music worked in ‘our’ own context. Whatever the case, understanding anything of the unfamiliar music that ethnomusicologists recorded meant thinking comparatively. It meant reflecting on the givens of our own music, culture and society in order to understand ‘theirs’; it entailed thinking in terms of cultural relativity. Under such circumstances, musical absolutism was out of the question.

Third: as already suggested, attempts at transcribing other musics actualised the limitations of our own system of notation and thereby the limitations of music encodable within that system. This process provided insights into the relative importance of different parameters of musical expression in different music cultures and paved the way for a musicology of non-notated musics. Diversity of aesthetic norms for music became reality and musical ethnocentricity, including Eurocentric notions of musical ‘superiority’, ‘absolute music’ and ‘eternal’ or ‘universal’ values, could be challenged. This sense of the relativity of aesthetic norms for music was of central importance in the later formulation of aesthetic values for all forms of music outside the European classical canon.

In short, ethnomusicology refuted the viability of maintaining just one aesthetic canon. It also drew attention to the importance of non-notatable parameters of expression and, of particular relevance to this book, it obliged any serious scholar of music to deal with questions of function and meaning in a socio-cultural framework.



The earliest text devoted explicitly to the sociology of music appeared in 1921. That date coincides roughly with the invention of the moving coil microphone and with the first broadcasting boom. A few years later, patents were taken out on electro-magnetic recording and on optical sound. These new sound-carrying technologies were essential to the development of radio, records and talking film. Mass diffusion of music via these new media highlighted differences in musical habits between social classes within the same nation state because people were now much more frequently exposed to what ‘everyone else’ —those ‘others’ again!— listened to. It’s also essential to note that the same inter-war years saw momentous social and political upheavals, including the emergence of the Soviet Union, the increasing strength of working-class organisations, general strikes and such disastrous effects of capitalism as the Wall Street Crash, economic depression, rampant inflation and the rise of fascism.

Realisation of this socio-economic-cultural conjuncture and concern about the future of individuals within this new and unstable type of mass society seem to be the main reasons behind the development, not least during the socio-political turmoil of the Weimar republic, of a sociology of music dealing with the everyday musical practices of the popular majority (those ‘others’ again!). Hence, for example, the establishment in 1930 of the Berlin journal Musik und Gesellschaft, subtitled ‘Working Papers for the Social Care and Politics of Music’. Before disappearing after the Nazi Machtübernahme in 1933, Musik und Gesellschaft had contained articles about, for example, music and youth, amateur musicians, urban music consumers and music in the workplace. There were, in short, good ethical and political reasons for intellectuals to take a serious look at interactions between music, culture, class, society and values. Out of these political, social and aesthetic concerns about pre-war popular culture emerge two general trends which exert considerable indirect influence on the understanding of music in the West. One of these socio trends was more empirical, the other more theoretical.

The empirical trend in the sociology of music concentrated largely on documenting the musical tastes and habits of different population groups. It can in very general terms be understood as serving both exploitative and democratic purposes. It’s exploitative, for example, when the demographic data it produces is used by commercial media to sell socio-musically defined target groups to advertisers, while its democratic potential lies in the fact that similar demographic data can be used as arguments for the democratisation of public policy in the arts and education. Put simply, the democratic potential of empirical sociology not only contributed to a general broadening of the notion of culture, a conceptual cornerstone in what became Cultural Studies; it also fuelled the opinion that publicly funded music institutions were undemocratic. Such critique helped pave the way for the serious study of musics of the popular majority, musics whose producers, mediators and users are so tangibly involved in the complex construction and negotiation of sounds, meanings, values and attitudes in our own society. Under such circumstances it would be absurd to study music as ‘just music’, illogical to determine any aspect of musical structuration without considering its function or meanings.

Several proponents of the ‘more theoretical’ socio trend held very different views about the music of the popular majority. The most well-known representative of this trend was Adorno, a figure so frequently referred to by other writers on popular culture that anyone seriously studying music in the mass media is almost ritualistically obliged to mention him. One reason for Adorno’s academic notoriety is that, despite the Musik und Gesellschaft connection just mentioned, he is treated as if he were the first music scholar to deal with popular music. The chapter ‘On Popular Music’ from his Introduction to the Sociology of Music (1962) is Adorno’s claim to academic fame in this respect.

Adorno’s ‘On Popular Music’ can best be described as uninformed and elitist. The author seems to have very vague notions about the music, musicians and audience on whom he passes judgement. He also presents a hierarchy of listening modes, according to which concentrated listening as you follow events in the score is right and having music on in the background as you wash the dishes is wrong. Moreover, Adorno’s equation of a strong, regular beat and an easily singable tune with the manipulation of the masses expresses disdain for music’s somatic properties, as well as for the working class which, according to the socialism he professed to embrace, would rid society of the capitalism he himself criticised. How can such a learned man be so contradictory? According to Paul Beaud (1980), Adorno’s deaf ear for popular music can be explained as follows:

‘His texts’ [on popular music] ‘date from his [US-] American period when he was on the lookout for fascism everywhere. Anything resembling rhythm he equated with military music. This was the visceral reaction of the exiled, aristocratic Jew during the Hitler period.’

This plausible explanation raises two other problems. One is that popular music in the Third Reich was not dominated by military marches but by sentimental ballads (Wicke, 1985), a fact substantiating the view that Adorno was out of touch with the musical habits of the populace. The other problem is that Adorno’s aversion to music’s somatic power is contradictory to the point of anti-intellectualism because it precludes the development of rational models capable of explaining music’s relation to the body and emotions. Since, as we’ll see next, Adorno exerted considerable indirect influence on ‘alternative’ studies of music in the second half of the twentieth century, and since no mean amounts of music in our contemporary media have such clear emotional or somatic functions, awareness of Adorno’s shortcomings is essential. Ignorance of popular music, disdain for the musical habits of the popular classes, visceral aversion to music’s corporeal aspects and celebration of its cerebral aspects are hardly the ideal premises on which to base an understanding of Abba, Adèle, Bob Marley, Céline Dion, death metal, James Brown, the Dixie Chicks, games music audio, line dancing, Radiohead, salsa festivals, techno rave, TV themes and so on and so forth.

So, why bother about Adorno at all? ‘Because he has been so influential’ is the easy answer I just gave, but it’s an answer that begs other questions. If Adorno was himself light years away from forming a viable approach to understanding music in the mass media, why is he so often referred to by scholars with that particular field of interest? That question raises serious epistemological issues which anyone trying to develop a musicology of mass-mediated music would be wise to consider. One explanation is that Adorno’s influence on two areas of thought about music has been indirect and paradoxical.

First, Adorno, a musicologist with some high-art composition credentials, introduced music academics to a vocabulary of social philosophy which, despite its obvious shortcomings, made it just that little bit harder for those academics to bury their heads in wonted formalist sand. Second, and more importantly, Adorno was Herbert Marcuse’s mentor and it was Marcuse who popularised the social-critical philosophy of the Frankfurt School among radical U. S. students in the sixties, not least among those who, wittingly or not, contributed to the formulation of the rock canon. It’s in this second way that Adorno indirectly contributed to the establishment of influential types of postwar English-language discourse on music. In journalistic or academic guise, this discourse, which was also influenced by traditions of literary criticism and political theory, seems typically to concern itself with a certain set of social and cultural issues —youth, subculture, fashion, the business and the media, etc.— and with alternative aesthetic canons of ‘authenticity’ in popular music. This aspect of Adorno’s indirect influence is paradoxical because the rock canon of authenticity, for example the ‘spirited underdog’ and the ‘body music that … provokes’, contrasts starkly with Adorno’s cerebral anti-somatic stance.

Two other explanations will serve to complete the bizarre picture that is Adorno’s position in the pantheon of authorities to which scholars of contemporary culture so often seem obliged to refer. One reason is simple: that Adorno was much more widely translated into English than other comparable authorities. This prosaic reply begs the question ‘why Adorno and not others?’

The general gist of the second explanation is that many aspects of Adorno’s writing neatly align with pre-existing value systems and conventional categories of thought in the humanities. More precisely, Adorno is empiriphobic and undialectic on two fronts, for not only are the voices of music’s creators and users conspicuous by their absence in his writings; his work also involves little or no discussion of music as sound. Adorno is on this second count at an advantage in institutions where conceptual boundaries between musical and other types of knowledge are kept tight because no discussion of musical structure means that scholars without musical training can be spared the embarrassment of not knowing what ‘minor-major-nine’ and other items of muso jargon actually mean (see p. 89). For scholars in other arts or in social science, theorising around music (metacontextual discourse) is simply more accessible than discourse involving reference to the sounds of music in the terms of those who produce them (metatextual discourse with its poïetic descriptors). At the same time, Adorno’s lack of ethnographic and socio-empirical concretion, combined with his evident unfamiliarity with the realities of popular culture, are symptomatic of the sort of art criticism or literary theory in which little or no substantiation of value judgements seems to be required. As long as the language is abstruse enough and as long as shared aesthetic values are largely confirmed, disciplinary boundaries can be maintained and there need be no disconcerting paradigm shifts. Add to all this the left-wing credibility inherent in Adorno’s status as a critical intellectual Jew having fled from the Nazis to the English-speaking West and his popularity as reference point for anglophone academics who see themselves politically left of centre should come as no surprise.

In short, Adorno’s value-laden theorising has thrown two major obstacles in the path of those who want to understand how music can carry meaning in contemporary urban society.

[1] By omitting musical ‘texts’ from his discussions of music, Adorno reinforces disciplinary boundaries between studies of musical structuration and other important aspects of understanding music.

[2] By excluding empirical concretion, by privileging unsubstantiated value judgements and by his apparent unawareness of his own ignorance about the music of the popular majority, Adorno has reinforced scholastic tendencies in arts academe to confuse the elegant expression of aesthetic opinion with scholarship.


To summarise: Adorno’s value lies in what his status as much quoted authority tells us about the tradition of knowledge that has kept him in that position. It’s in spite of him that the socio challenge to the old absolutist aesthetics of music met with any success. That challenge came mainly from empirical studies of musical life in the industrialised West, studies enabling scholars to argue for the democratisation of institutions of musical learning, as well as for the validity of studying musics of the popular majority. Socio was also, it should be added, a convenient multi-purpose label which for a very long time could be stuck on to studies that discussed music as an integral part of sociocultural activity or which examined musics outside both the European classical canon and the conventional hunting grounds of ethnomusicology.

One final symptom of problems with both socio trends in music studies links back to the absence of musical ‘texts’ in most work about music in the mass media. Such studies are still overwhelmingly conducted by scholars with a background in the social sciences or cultural studies. It would be unreasonable to demand of those colleagues the expertise associated with the description of musical structures, more reasonable to expect musicologists to have devoted more effort to studying the vast repertoire of musics circulating on an everyday basis via the mass media. With the exception of ethnomusicologists, who until quite recently in general avoided that vast repertoire, very few music scholars examined relationships between that music and the social, economic and cultural configurations in which it plays a central part. As a result of this epistemological gap and thanks to the relative accessibility of the unsubstantiated theorising produced by Adorno, the denial of context associated with Romantic theories of absolute music could be replaced, just as idealistically, with explicit denial of the existence of musical texts. From the musician’s perspective, such text denial is of course absurd. How this problem affects the main point of this book may be easier to understand with the help of Table 4-1 (p. 144).

Fig. 4-1. Typical topics for ethno and socio studies of music

Figure 4-1 suggests that socio approaches deal mainly with social aspects of Western music outside the classical tradition, rarely with music in non-Western cultures. Ethno studies, on the other hand, have traditionally dealt with the musics of non-Western cultures and, as the thick double-ended arrow indicates, with the interaction between music as sound and the sociocultural field of which it is part. Figure 4-1 also suggests that conventional European music studies are mainly concerned with the production and description of euroclassical texts, less with the music’s social aspects or with interaction between ‘text’ and ‘context’. An ethnomusicology of ‘other musics in Western society’ (the first two columns on the ethno line in Table 4-1) would therefore be extremely useful if we want to understand the meanings and functions of music in the contemporary mass media. Since such studies are still relatively rare, we may have to look elsewhere.



The semiotics of music, in the broadest sense of the term, deals with relations between the sounds we call musical and what those sounds signify to those producing and hearing the sounds in specific sociocultural contexts. Defined in this way, semio approaches to music ought logically to throw some light on the interaction between any music as text, anywhere or at any time, and the socio-cultural field[s] in which the text is produced and used. Indeed, semio studies of music should ideally exhibit the profile shown in Figure 4-2. The only trouble is that the majority of music studies carrying the semio label deal only with certain types of music and/or only with certain aspects of meaning. This very broad generalisation needs some explanation since there is no single semiotic theory of music but rather, as Nattiez (1975: 19) has suggested, a range of ‘possible semiotic projects’.

Semio approaches to studying music first appeared with that label around 1960 and initially drew quite heavily on linguistic theory of the time. These early studies were later criticised by semio-musicologists who drew attention to problems caused by transferring concepts associated with the denotative aspects of language to the explanation of musical signification. Such laudable caution about grafting linguistic concepts of meaning on to music seems nevertheless to have encouraged a reversion to a largely congeneric view of music. Indeed, the majority of articles in volumes of semio-musical scholarship published in the 1980s and 1990s show an overwhelming concern with theories of music’s internal structuration. The same literature shows much less interest in music’s interrelation with other modes of expression and pays scant attention to music’s paratextual connections (semantics). Evidence linking musical structure to musician intentions or listener responses, and discussion relating these aspects of semiosis to the technology, economy, society and ideology in which that semiosis takes place (pragmatics), are conspicuous by their absence. This observation is based on the perusal of 88 articles published in three learned semio-musical volumes. Of those 88 articles, 59 (67%) either discuss overriding theoretical systems rather than direct evidence for the validity of those systems, or else deal with syntax, usually in terms of narrative form (diataxis), rather than with semantics or pragmatics. In the remaining 29 articles (33%) a few semantic issues are addressed but only three articles (3.4%) discuss pragmatics, each of those three focusing on musicians, none on music’s final arbiters of signification: its users. Clearly, fixation on narrative form (diataxis) and a lack of attention to semantics and pragmatics will not be much use if we want to understand ‘how music communicates what to whom’ on an everyday basis in the modern world.
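The proportions quoted in the survey of 88 articles can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python: the counts are those given in the text, while the variable and function names are illustrative assumptions of this sketch.

```python
# Article counts as reported in the text; the names below are
# hypothetical labels chosen for this sketch only.
TOTAL_ARTICLES = 88
theory_or_syntax = 59                          # grand theory or syntax (diataxis)
remaining = TOTAL_ARTICLES - theory_or_syntax  # articles touching on semantics
pragmatics = 3                                 # articles discussing pragmatics


def share(count, total=TOTAL_ARTICLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in [
    ("theory or syntax", theory_or_syntax),
    ("remaining", remaining),
    ("pragmatics", pragmatics),
]:
    print(f"{label}: {count} of {TOTAL_ARTICLES} = {share(count)}%")
```

Rounded to whole numbers, the first two shares give the 67% and 33% cited above; the pragmatics share is the 3.4% cited.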
Indeed, Eco (1990: 256 ff.), emphasising the necessity of integrating syntax, semantics and pragmatics in any study of meaning, provides a very critical opinion of the semiotic tendencies just mentioned.

‘To say that pragmatics is one dimension of semiotic study does not mean depriving it [the semiotic study] of an object. Rather, it means that the pragmatic approach concerns the totality of the semiosis… Syntax and semantics, when found in splendid isolation become… “perverse” disciplines.’ (Eco, 1990: 259)

One possible reason for the lack of semantics and pragmatics in so many music-semiotic texts may be the fact that the type of linguistics from which theoretical models were initially derived accorded semiotic primacy to the written word, to denotation and to the arbitrary or conventional sign. Such notions of denotative primacy were understandably considered incompatible with the general nature of musical discourse. However, denotative primacy has been radically challenged by many linguists. Some of them argue that prosody and the social rules of speech (including also timbre, diction, volume, facial expression and gesture) are as intrinsic to language as words, and that they should not be regarded as mere paralinguistic add-ons. Other linguists refute denotation’s primacy over connotation, and all underline the importance of studying language as social practice (pragmatics). Music semiotics has, it seems, been slow to assimilate such developments in linguistics. How can such reluctance be explained if incompatibility with linguistic theory is so much less of an issue today than it was in the 1960s and 1970s?

The syntax fixation of many musicologists rallying under the semio banner is regrettably difficult to understand in any other terms than those discussed in Chapter 3: the hegemony of musical absolutism in Western seats of musical learning. While ethnomusicologists had to relate musical structure to social practice if they wanted to make any sense of ‘foreign’ sounds, and while the sociology of music dealt mostly with society and hardly ever with the (socially immanent) phenomenon of music as sound, most music semioticians were attached to institutions of musical learning in which the absolutist view still ruled the roost. Their tendency to draw almost exclusively on euroclassical music for their supply of study objects provides circumstantial evidence for this explanation, not because music in that repertoire relates to nothing outside itself (on the contrary, see pp. 89-91), but because the notion of ‘absolute’ music has been applied with particular vigour to music in that tradition. Without exaggerating too grossly, it could be said that the tradition of music semiotics we are referring to is not only ‘perverse’ in the sense put forward by Eco, but also based on a flawed (absolutist) notion of a limited musical repertoire developed during a limited period of one continent’s history by a minority of the population in a limited number of communication situations.

The main problems with the majority of semio-musical writing in the late twentieth-century West can be summarised in five simple points.

1. It’s hampered by its institutional affiliation with the ‘absolute’ aesthetics of music.

2. Its objects of study are usually drawn from the limited repertoire of the euroclassical canon.

3. It exhibits an overwhelming predilection for either syntax or general theorising, much less interest in semantics and virtually none in pragmatics.

4. It concentrates almost exclusively on works whose compositional techniques must be considered as marginal, i.e. as the exception to rather than as the rule of current musical practices, codes and uses.

5. It resorts to notation as the main form of storage on which to base analysis.

The general neglect, by musicologists and semioticians, of Western musics outside the classical canon as a field of serious study is of course a matter of cultural politics, but it’s also a matter of importance to the development of both musicology and semiotics. The reason is that music circulating in contemporary media cannot be analysed using only the traditional tools of musicology developed in relation to euroclassical music because the former, unlike the latter, is:

1. conceived for mass distribution to large and sometimes heterogeneous groups of listeners;

2. stored and distributed in mainly non-written form;

3. subject, under capitalism, to the laws of ‘free’ enterprise according to which it should help sell as much as possible of the commodity (e.g. film, TV programme, game, sound recording) to as many as possible.

According to the third point, the majority of music heard via the mass media should elicit some ‘attraction at first listening’ if the music is to stand a chance of making a sale or, in the case of music and the moving image, of catching audience attention and involvement more efficiently than competing products. It also means that music produced under such conditions will tend to require the use of readily recognisable codes as a basis for the production of (new or old) combinations of musical message. Failure to study this vast corpus of familiar and globally available music means failing to study what the music around us usually mediates as a rule. It surely makes more sense to start by trying to understand what is mediated in our culture’s mainstream media before positing general theories of signification based on discussion of subcultural, counter-cultural or other ‘alternative’ musical codes like avant-garde techno, speed metal, bebop, Boulez, Beethoven’s late period or any other repertoire contradicting or complementing rather than belonging to the dominant mainstream of musical practices in our society. Using exceptions to establish rules may be considered standard practice for scholars projecting an image of high-art or high-cred cool but it is not a viable intellectual strategy for constructing a semiotics of music in the everyday life of citizens in the Western world.

The neglect of popular music as an area for semiotic analysis causes other basic problems of method. We have already touched on tendencies of graphocentrism which treat the score as reification of the ‘work’ or ‘text’ when in fact the notes represent little more than an incomplete shorthand of musical intentions. Such confusion is less likely in the study of popular music because notation has for some time been superseded as the primary mode of storage and dissemination to the extent that popular music ‘texts’ are usually either commodified in the form of sound recording carried on film, tape or disc, or stored digitally for access over the internet. Due to the importance of non-notatable parameters in popular music and to the nature of its storage and distribution as recorded sound, notation cannot function as a reliable representation of the musical texts circulating in the mass media.

Moreover, it is probable that the professional habitat of music semioticians in institutions of conventional music studies which still focus on the euroclassical canon tends to encourage a return to the old absolutist aesthetics as the line of least intellectual resistance. Conventional musicology’s pre-occupation with long-term thematic and harmonic narrative (diataxis) usually precludes discussion of the meaningful elements of ‘now sound’ (syncrisis) from which musical episodes or sections are constructed, and without which no narrative form can logically exist.

This account of the semio phase has so far been quite discouraging. We seem to have ended up where we started (p. 133), still dogged by notions of musical absolutism. I find myself describing a subdiscipline that is semiotic by name rather than by nature. In fact, I’d argue that if the semiotics of music, at least as I’ve encountered it institutionally, were a commercial venture, it might well qualify for indictment under the Trade Descriptions Act.

There are, however, exceptions to the general trends of grand theory and syntax fixation just discussed. A few of these exceptions are explicitly semio, while most of them are semiotic by nature if not by name. They have all informed, to varying degrees and in different ways, the type of approach presented later in this book and have all challenged, sometimes in the face of considerable opposition, the institutionalised conventions of absolute music. One work deserves special mention in this context: it is Francès’ doctoral dissertation La perception de la musique (1958), a thoroughly researched and pioneering semio-musical work that has influenced the ideas presented in this book but which is seldom mentioned by those who defer to Adorno or who rally under the semio-musical banner. For reasons of space we can do no more than merely list, in the next footnote (no. 27), some of the other ‘semio exceptions’ relevant to the issues raised in this book.


This chapter has dealt with twentieth-century challenges to the graphocentrism and to the absolutist aesthetics of music in official institutions of education and research in the West. Although some of the tendencies described seem to have done little more than reformulate conventional conceptual differences between musical and other forms of knowledge (the socio avoidance of music as sound, the semio syntax fixation, etc.), the three challenges (ethno in particular) have made it much easier to address questions of musical meaning in the everyday life of citizens in the Western world. At the same time, although an absolutist aesthetics of music may still be on the agenda of many learned institutions, it can also be viewed as a historical parenthesis: it has after all only been ‘official policy’ in Western institutions for a century and a half. More importantly, everyday musical reality outside the academy has been consistently ‘unabsolute’. Musicians have continued to incite dancers to take to the floor and to gesticulate energetically or smooch amorously, while lonely listeners have regularly been moved to tears by sad songs and derived joy or confidence from others. More recently, movie-goers and TV viewers have been scared out of their seats, or they have distinguished between the good and bad guys, or reacted to urgency cues preceding news broadcasts, or registered a new scene as peaceful or threatening, or understood that they are in Spain rather than in Japan or Jamaica, etc., etc., all thanks to a second or two of music carrying the relevant message on each occasion.

Even inside the academy, the notion of music as a symbolic system never really died. There were always champions of musical meaning, people like Hermann Kretzschmar, who declared ‘autonomous instrumental music’ to be a ‘general danger to the public’, or Deryck Cooke (1959), or, as just mentioned, Robert Francès. But there were also organists. Organists? Yes, church organists have always had to do things like extemporise between the end of their initial voluntary and the arrival of the bride at a wedding service or the coffin at a funeral. On such occasions, organists have to create moods encouraging the congregation to adopt appropriate postures and attitudes. My own organ teacher even encouraged me to word-paint hymns, as the following zoom-in on one microcosm of actual music-making demonstrates.

Number 165 in the old Methodist Hymn Book (1933) is ‘Forty Days and Forty Nights’, a popular hymn for Lent, referring to Jesus fasting in the wilderness and usually sung to the tune Heinlein by M Herbst (1654-1681). The words of verse two are:

Sunbeams scorching all the day, Chilly dewdrops nightly spread,

Prowling beasts about Thy way, Stones Thy pillow, earth Thy bed.

From my organ teacher, Ken Naylor, I learnt to apply variations of timbre to each of the four lines just cited. For line one I would, on the Great manual, push down all mixture tabs, fifteenths, etc., flick up all 16-foot and loud 8-foot tabs, and remove my feet from the pedals. These poïetically described actions translate into aesthesic terms as follows: I removed the dark, booming low notes and produced a sparkling, sharp, bright, high-pitched, edgy timbre for ‘sunbeams scorching all the day’.

For line two’s ‘chilly dewdrops’ I moved from Great to Choir organ, making sure that 4- and 2-foot claribel flutes were in evidence. I would still desist from using the pedal board. This operation produced a smaller, much less sharp, more rounded, cooler, slightly airy but precise and delicate kind of timbre, still without the darkness of bass notes.

For the ‘prowling beasts’ of line three I lifted my hands up to the full Swell organ with all its reed stops connected, ensuring at the same time that my feet were playing all possible passing notes in the bass line assigned to the 16-foot Posaune. Full reeds on the Swell is as close as a church organ gets to guitar distortion: it gives a rich, gravelly, ‘dangerous’ kind of sound. Together with the low-pitched, rough sounding Posaune —not unlike the fat bass timbre of an Oberheim synth— and the insertion of extra notes to produce a walking bass line, the ‘prowling beasts’ were appropriately ‘musicked’, I thought.

In line four I returned to the Great, this time with only 8-foot Diapasons selected, while disabling the 16-foot Posaune pedal tab and suppressing the tendency to go on playing passing notes with my feet. The idea here was to create a medium-volume sound, quite large but devoid of brilliance, delicacy or rough edges: a loudish sort of flat, medium, ‘grey’, ‘matter-of-fact’ sound for ‘stones thy pillow, earth thy bed’.

This personal anecdote documents a musical reality that flies in the face of ideas propounded by musical absolutists, partly because the sounds I produced actually communicated something to someone other than myself, making me aware of relationships between timbre and various aspects of touch, movement and space. As a musician I also learnt which harmonies made the old ladies in the local Methodist church more sentimental, which bass licks worked better with members of my university’s Scottish Country Dance Society, which placement of which mike connected to which amp with which settings made me sound more like Jerry Lee Lewis, which patterns on a Hammond organ made people think our band resembled Deep Purple, which type of arpeggiation made the accordion sound more French, etc. It’s this kind of experience, which I share with countless other musicians, arrangers and composers, that motivated my attempts to critique the dry theme-spotting exercises of syntax-fixated music analysis —the story so far in this book— and to develop ways of examining music as if it had uses beyond its mere self as just sound, i.e. as if it actually meant something. The rest of this book takes that sort of empirically proven poïetic conviction for granted.

Summary of main points

[1] Ethnomusicology has been particularly important in developing ways of relating music as sonic ‘text’ to its meanings, uses and functions. It has also demonstrated the absurdity of propagating one single aesthetic canon for all music and, through its pioneering use of sound recording, drawn attention to the importance of non-notatable parameters of musical expression.

[2] Two types of sociology, neither of which concerned itself with musical structuration, have made an indirect contribution to the development of analysis methods presented in this book. Through Adorno a tradition of critical theory became popular among students of literature, communication studies and Cultural Studies, while, more importantly, empirical, demographic sociology helped motivate the inclusion of popular music in academe, i.e. music evidently incompatible with notions of the ‘absolute’ and clearly demanding a different mind-set.

[3] Despite its theoretically promising potential, the semiotics of music, with its disciplinary habitat in seats of conventional musical learning whose corridors were still haunted by the ghost of absolute music at the turn of the millennium, focused largely on syntactical aspects of musical semiosis at the expense of semantics and pragmatics. Alternative views of music as meaningful sign system (e.g. Kretzschmar, church organists) nevertheless persisted throughout the reign of musical absolutism and have influenced the development of analytical method used in this book.

Fig. 4-2. Ideal topics for semio studies


5. Meaning and communication


Fig. 5-2. Shampoo: (a: left) Timotei advert (Sweden, c. 1980); (b: right) Elvira Madigan (Widerberg, 1967): videocassette cover


Sorting out notions of ‘music’ is what this book has mainly been about so far and the previous chapter ended with a promise to treat music as if it actually meant something beyond itself. Indeed we shall, but the promise cannot be fulfilled without first bringing some order into the concepts of meaning and communication.

Concepts of meaning

Meaning, sign, semiotics

Meaning, in the sense of one thing conveying, indicating or referring to something else, is a recurrent concept in this book. Signification, treated here as a virtual synonym of meaning, contains the morpheme sign. Sign, in its turn, simply means a thing indicating or representing something other than itself. It’s in this sense that Charles Peirce, US philosopher and father of modern semiotics, ended up using the word. Sign also turns up in expressions like sign system and sign type.

Sign system denotes a set of conventions of meaning, like this kind of written English, or like impressionist painting, or like music for silent films in North America and Western Europe. Sign type designates the way in which a sign relates to what it signifies, for example, if it physically resembles what it means (icon, p. 161) or if the relation is arbitrary or conventional (p. 163). Sign is also a translation of the Ancient Greek words sēma (σῆμα) and sēmeîon (σημεῖον) found at the root of words like semiotics, semiology, semiosis, semaphore and semantics.

Semiotics, deriving from Peirce’s semeiotic, means the systematic study of sign systems. Semiology, a term coined by Swiss linguist Ferdinand de Saussure, is generally used to mean the same thing as semiotics. There are some important differences, a few of which will be discussed shortly, between Peircean and Saussurean terminology. Saussure’s most widely used concepts are probably the signifier, a translation of the French word signifiant (≈ sign), and the signified (signifié = what the sign stands for or represents).

Unlike silicosis, semiosis is not a clinical condition but, like osmosis, a process. Semiosis is simply the process by which meaning is produced and understood. It includes the totality of, and the connections between, three elements that Peirce called sign, object and interpretant, and which I’ll explain next. As already suggested, it’s simplest to think of the sign as a thing, with an identifiable physical existence, that represents or stands for something other than itself.

Semiosis: your aunt’s dog and a steel guitar

Let’s say that the sign is a photo you once took of your aunt’s dog. The photo clearly isn’t your aunt’s dog (it’s a photo of it), even though you might point to the photo and say ‘that’s my aunt’s dog’: the photo represents your aunt’s dog. What you saw the moment you took the photo, that momentary visual perception, constitutes what Peirce calls the object, while the photo representing that object is its sign. However, when you look at the photo long after you took it and see ‘my aunt’s dog’, your visual perception can never totally correspond with what you saw when you took the photo (its object). This later perception and interpretation of the sign, rather than your perception of the dog when you took the photo, is called its interpretant. Now this distinction between object and interpretant might seem like academic nit-picking because it’s obvious that the photo looks like your aunt’s dog. Still, that very obviousness can be a problem because differences between object and interpretant, as well as between interpretants, inevitably occur in relation to the same sign. Those differences cause meanings to be renegotiated, to change and to adapt to new needs, functions and situations. To understand that dynamic more easily, let’s go back to your aunt’s dog and put some more meat on the poor animal’s conceptual bone.


Many years after taking the snapshot, you open your family album and look at that same old photo of your aunt’s dog. Note first that it has now become ‘that same old photo’. Time has passed, you are different and circumstances have changed but the photo (the sign) remains the same. Maybe your beloved aunt has died in the meantime, or maybe you subsequently learnt things about her that put her in a bad light. Or perhaps you yourself now have a devoted dog, or perhaps you were badly bitten recently by one that looked like the dog in your photo. Any of these factors could easily affect the interpretant[s] you form when looking at the same photo at that later date. True, the prosaic ‘my aunt’s dog’ aspect of the interpretant will still work after all those years, but it will likely give rise to an array of different final interpretants, ranging from wistful longing for bygone days, when you were a child and you played with your kind aunt’s dog, to ‘what a mangy mongrel!’ or ‘what a mean old woman!’ And just wait until you start showing your ‘my aunt’s dog’ photo to friends and family. When you do, they will, in their turn, form other final interpretants of the photo. The content of those interpretants will depend on things like how well your family or friends knew your aunt and her dog, on whether or not they like dogs, whether or not they like you, and on a whole host of other factors. Whatever the case may be, this ‘my aunt’s dog’ story illustrates the necessity of distinguishing between object and interpretant, as well as between interpretants, in relation to the sign. These distinctions are essential when it comes to understanding how musical signs work, how the same sounds can mean different things to different people in different contexts at different times.

A complementary way of understanding semiosis is, as I just implied, to look at it in terms of a message and its communication. There are three main aspects to this process, too: [1] the thing or idea to be encoded (similar to Peirce’s object), [2] the concrete form of that code (the sign) and [3] the decoded version or interpretation of that code (similar to Peirce’s interpretant). Seen in this light of intention and interpretation, the ‘ideal’ semiosis would theoretically produce total unity between the sign as semiotically intended and as interpreted. The word ‘chair’ would, for example, represent a fully identical notion of chair in the minds of both speaker/writer (as an object) and listener/reader (as an interpretant), while the photo of your aunt’s dog would be perceived, by anyone at any time, in exactly the same way as you saw the dog when you took the photo. Since exact correspondence between intended and interpreted message is impossible (and we’ll shortly see how, even in the case of ‘chair’), semiosis is also sometimes used to refer to processes by which meanings of existing signs are modified and renegotiated, as with your interpretants that changed over time in relation to the same ‘my aunt’s dog’ photo.

To put a musical slant on these observations about shifts in meaning over time, just think of the distinctive whining sound of the pedal steel guitar in Country & Western music. This sound may have derived something from dobro and slide guitar techniques in the US south, but its most obvious sonic forerunner is the Hawaiian guitar, popular in the USA in the late 1920s and early 1930s, before electrically amplified musical instruments were commonplace. To cut a long story short, from originally connoting things like Hawaii and sunshine, those steel guitar glissandi (swooping, sliding sounds) were gradually incorporated into the C&W mainstream and ended up as style indicators of Country music without the Hawaiian connotations. The advantage of looking at semiosis in such ways is that, by including intention as well as interpretation, the semiotic process is more open to understanding in terms of social and cultural interaction.


Semantics, a term coined by French linguist Michel Bréal, is defined as ‘the study of the relationships between signs… and what they represent’. Semantics is just one aspect of semiotics (or semiology) and the word is often used in contradistinction to both [a] syntax (the formal relationships of one sign to another without necessarily considering their meaning) and [b] pragmatics (the use of a sign system in concrete situations, especially in terms of cultural, ideological, economic and social activity). Now, as we noted earlier, to prevent semantics, the main focus of this book, from becoming a ‘perverse discipline’ (Eco, 1990: 259), it must be related to pragmatics. This imperative has at least two important implications.

Eco’s imperative firstly implies that a synchronic semantics (examining signs at one given point in time in one given culture) isn’t enough on its own: it needs a diachronic perspective that involves studying meaning as part of a dynamic sign system subject to change. The ‘from Hawaii to Country mainstream’ process, described above, illustrates a diachronic line of semantic reasoning that can be called etymophony. If etymology studies the ‘historically verifiable sources of the formation of a word and the development of its meanings’, etymophony simply means studying the origins of a non-verbal sonic structure and the development of its meanings and functions over time.

The second implication of Eco’s imperative is both synchronic and diachronic. It entails relating semantics (‘relationships between signs and what they represent’) to factors in the socio-cultural field in which the musical meanings under examination are generated and used. These meanings obviously both inform and are informed by value systems, identities, economic interests, ideologies and a whole host of other factors that constitute the socio-cultural biosphere without which music and its meanings, as just one semiotic sub-system among others, cannot logically exist. We’ll soon return to one aspect of this essential part of musical semantics (see ‘Codal interference’, p. 182, ff.).

Semiotics and semiology

When denoting the study of sign systems, speakers of French and Spanish seem to prefer sémiologie/semiologia, while anglophones, Italians and others tend to use semiotics/semiotica. This confusion may eventually be resolved like the VHS versus Betamax battle over videocassette formats in the 1980s but it’s impossible to predict which concept, if indeed either, will oust the other. In the meantime, I’ll be using semiotics rather than semiology for two reasons. [1] A book written in English ought logically to use English-language terms. [2] Two of Peirce’s numerous trichotomies (sign - object - interpretant and icon - index - ‘symbol’) substantially inform the conceptual basis of what follows. Even so, in order to save space, Saussure’s binary notion of signifier and signified, where signifier is roughly equivalent to Peirce’s sign and signified means what the sign stands for (in terms of both object and interpretant), will sometimes be used as shorthand, not as a replacement, for Peirce’s trichotomy object - sign - interpretant. Another terminological problem is that Peirce uses symbol to denote what Saussure calls sign and vice versa. To avoid this confusion when discussing semiosis, I shall try to avoid symbol altogether and stick to sign in the Peircean sense. That means Peirce’s symbol / Saussure’s sign needs another label. Arbitrary sign is what I use to cover the concept (p. 163).

Two Peircean trichotomies

First, second, third

Peirce closely examined and classified all types of signification. Radically simplifying his overall system, you could say that the relationship between an audible sound and the human perception of that sound —as that sound alone without mediation— constitutes his notion of firstness: it’s phenomenologically just one thing, so to speak, even though the sound and its perception are physically separate entities. It’s just like the oneness of your aunt’s dog as such and your perception of it when you took the famous photo.

Secondness is easier to grasp semiotically because (surprise!) it has two poles. The musical sound as sign (one pole) includes, relates to and represents its firstness (the other pole), just as the celebrated dog shot relates to your perception of the dog when you took the photo. For example, soft, slow, smoothly swaying music, as in a lullaby, isn’t the same thing as soft, slow, smooth, swaying as such: it represents that movement in sound. There is a sign (the sound) and an object (the idea of movement and touch perceived as representable in sound).

The three elements of thirdness are: [1] sign (the sound of the lullaby); [2] object (explained under secondness) and [3] interpretant[s] (interpretations of the lullaby, including recognising it as a lullaby rather than a war song). Final interpretants might be: nostalgic feelings of comfort, images of an adoring parent singing a much loved infant to sleep, the smell of baby powder, evening light shining through a chink in the bedroom curtains, etc.

Icon, index, arbitrary sign

Peirce’s next three trichotomies are like a ninefold Kyrie in that firstness, secondness and thirdness each gives rise to its own three categories of sign. Since I shall concentrate on musical semantics, oneness will be taken as read. Secondness and thirdness, however, are of direct relevance to the topic. Still, to avoid death by conceptual drowning in Peirce’s trinities of 9, 27 and 81 categories, each with its own abstruse label, and so as to open up our musical semantics to sociocultural considerations through pragmatics, thirdness will be discussed in more accessible terms and use of Peirce’s sign types will be restricted to those of secondness. Peirce’s trichotomy of secondness distinguishes between icon, index (plural: indices) and arbitrary sign (what Peirce called symbol and Saussure called sign).


Icons are signs bearing physical resemblance to what they stand for. Iconic resemblance can be striking, as in photographs or figurative painting, but maps and certain types of diagram are also iconic because there is at least some structural resemblance, though less patent, between the signs and what those signs stand for. Even the representation of rising and falling pitch, of legato slurs (smooth) and staccato dots (choppy) in musical notation can to some extent be qualified as iconic. However, the visual representation of sonic events can only be considered a resemblance if conventions of synaesthetic homology are in operation allowing us to equate certain signs encoded in one mode of perception (e.g. visually, as staccato dots on the page) with certain objects/interpretants existing in another (e.g. sonically, as intermittent, choppy, pointillistic, aurally pixelated, etc.). Since, as explained earlier (pp. 62-67), synaesthesis is intrinsic to music, we will have to refine the notion of icons in music to cater for conventions of synaesthetic homology (see ‘Anaphones’, p. 487, ff.). Here, though, we need to get to the most obvious aspect of musical iconicity, i.e. to sounds as signs physically resembling the sounds they stand for.

If a photo like the one of my aunt’s dog is an icon of whatever it’s supposed to represent, then a musical recording ought logically to be considered an icon of the music as it sounded when recorded. However reasonable that assumption may be for live recordings, there are good reasons for considering icons differently as a musical sign type. One reason is that the sound of a recording does not even reach semiotic oneness until it is actually perceived by someone hearing it, still less the semantic stages of secondness and thirdness where sonic signs can relate to objects and interpretants. It’s at these stages that musical icons (sonic anaphones, see p. 487, ff.) come into play, such as a low-pitched drum roll sounding like the rumble of distant thunder, or an overdriven electric guitar sounding like a Harley Davidson, or two consecutive notes a third apart on the piano imitating the call of a cuckoo, etc. None of these ‘sounds like’ examples functions solely as an icon because distant thunder can mean danger, while a Harley might connote a pack of Hell’s Angels and cuckoo notes on the piano might make you think of a spring morning or of your junior school music teacher.


Distant thunder meaning danger, smoke meaning fire, dark clouds meaning rain: these are all examples of semiosis using a causal index as sign. Indices are signs connected either by causality, or by spatial, temporal or cultural proximity, to what they stand for. This sign type is so important in music that virtually all musical sign types can be considered as at least partially indexical. Some types of indexical sign are more common than others in musical semiosis, for example a type of metonymy called synecdoche [sɪˈnɛkdəkɪ]. In language, synecdoches are part-for-whole expressions like the crown meaning the monarch and royal power in toto, not just a piece of bejewelled headgear; or like fifty head of cattle meaning not just the animals’ heads but fifty complete bovine beings. Synecdoches work similarly in music, for example, the overdriven guitar connoting, via the ‘sounds like a Harley’ icon, an entire pack of Hell’s Angels and not just the bike, or the cuckoo notes on the piano connoting the entirety of a spring morning rather than just the cuckoo that happened to be part of the soundscape at the time. Another example would be (at least for non-French listeners) seeing old Paris in your mind’s eye on hearing specific figurations in waltz time played on a French accordion (accordéon musette). That semiosis is typically synecdochal because only one tiny set of all the musical sounds circulating in Paris before World War II has come to connote the totality of that time, that place, its culture, its popular classes, their habits and activities, all more likely in black and white, too, rather than in colour.

Arbitrary sign

An arbitrary sign (Peirce’s symbol) is connected only by convention to what it represents. Examples of arbitrary signs in the English language are table, because, grass, semiotics, but, think, grateful, pullover and most other words and phrases. This sign type is called conventional or arbitrary because it is supposed that nothing but convention prevents a word like theology from denoting a can-opener, whereas it’s highly unlikely that an indexical sign like Champagne (the wine) will ever mean Polish vodka or lawn-mower, and impossible that smoke from a fire will mean the fire has gone out or that you have run out of sugar. In other words, a sign can be called arbitrary when its semiosis exhibits no readily discernible elements of structural similarity (icons), or of proximity or causality (indices), between sign and object/interpretant.

Arbitrary signs are rare in music, except for things like instrumental versions of national anthems or instrumental passages from Eurovision Song Contest tunes. In these cases there is rarely any musical signifier, iconic or indexical, of a particular national identity, the main point of the music often being generic, apparently: to sound like a national anthem or like a Eurovision Song Contest entry. It is only paramusical evidence (the language in which the melodies are sung or, in the case of a national anthem, which flags are flown behind the Olympic medallists’ podium) that gives uninitiated listeners a clue as to which nation the anthem or the Eurovision song represents. In other instances where musical signs are apparently stylised to the point of convention, some vestige of non-arbitrary semiosis, iconic or indexical, always remains. For instance, four French horns, in unison, playing broad, strong, consonant melodies in the upper middle register of the instrument, still sound heroic, even in space (as in Star Wars), despite the fact that the etymophony of that horn sound is shrouded in the historical mists of rural Europe, when horns were used in hunting or to clear the road for stagecoaches. That specific indexical link in history with quick, strong, energetic male activity may be lost on modern listeners but it has passed into stylised convention. Other aspects of the original semiosis remain, because those ‘heroic’ horn melodies move swiftly in broad, strong, sweeping and energetic gestures and because fast, broad, and strong are still supposed to be heroic characteristics.

Denotation and connotation

Denotation and connotation designate two different types of semiosis. By denotation is meant the lexical type of meaning associated with dictionary definitions and with arbitrary signs. The word table, for instance, denotes ‘a flat horizontal slab or board supported by one or more legs’; it doesn’t connote it. Similarly, theology doesn’t connote the idea of studying religious beliefs: it denotes that idea. However, in the statement smoke means fire, neither the phenomenon smoke nor the word smoke denotes fire: it’s the perception of smoke that connotes the presence of fire through causal indexicality. Despite the fact that smoke means fire exemplifies a more tangible type of semiosis than does theology’s link with the idea of studying religion, denotation is still often considered to be a less vague type of semiosis than connotation. Eco (1990: 6) challenges this assumption, branding the imagined solidity of denotative signification through arbitrary signs ‘rigid designation’, adding that language ‘always says something more than its inaccessible literal meaning’. If Eco’s observation is true for language, it’s even more relevant to music which, as just suggested, rarely uses arbitrary signs. Since music is highly connotative, it’s worth examining the concept of indexical connotation in more detail. I’ll apply Eco’s ideas to the semiosis involved in the statement ‘smoke means fire’.

I’ve shortened where there’s smoke there’s fire to smoke means fire. In so doing, I substituted an indexical observation of simultaneity (smoke at the same time as fire) with one of causality. I can do that because, unless we’re talking about stage smoke (liquid CO2), fire causes smoke. Now fit your smoke alarm as instructed (good) and go to sleep with a burning cigarette (bad). Your smoke alarm wakes you up. Its piercing sound is triggered by smoke caused by fire. You hear that loud, sharp sound (the sign) and you know it means fire (interpretant) and other alarming things, like wake up, get out of the house and don’t die (final interpretants). The alarm sound doesn’t denote fire like the word fire, nor does it directly mean fire indexically like the smoke you may or may not see that is caused by fire you are even less likely to see. The connection between the smoke alarm sound and fire is one of connotation: the alarm connotes a particular sort of fire and everything you know goes with it, because the relationship between the alarm sound as signifier and the fire as signified, with all its connotations, presupposes previously established levels of signification. These distinctions are essential in understanding how connotation, a central aspect of musical semantics, actually works.

The ‘previous levels’ just mentioned are all indexical and causal, namely the relationships [1] between the alarm sound and smoke (smoke triggers the alarm), [2] between smoke and fire (fire causes smoke), [3] between fire and danger (babies have to learn that fire hurts). With these three previous levels of signification you are able to connote the specific threats of multiple burns, asphyxiation and possible death with the sound of a smoke alarm. In Eco’s terms (1976: 55), ‘connotation arises when a signification is conveyed by a previous signification, which gives rise to a superelevation of codes’. The form of this ‘connotative semiotics’ is shown in Table 5-1.

Table 5-1. Smoke alarm: connotation as superelevation of previous signification

Level    Signifier                 Signified
3        [level 2 as a whole]      Danger! Get out!
2        [level 1 as a whole]      fire
1        alarm noise               smoke


According to Eco (1976: 55), ‘there is a connotative semiotics when there is a semiotics whose expression plane is another semiotics’. So, in the smoke alarm example, the interpretant (signified) of the three former significations combined —[1] the alarm sound is caused by smoke, [2] smoke is caused by fire and [3] the great pain of skin burns is caused by fire — becomes the signifier of a fourth signified: don’t die! get out! Thus the smoke signifies fire indexically, but the sound of the smoke alarm also connotes both danger and evacuation associated with fire thanks to the previous semiotic relationships. Eco continues his critique of denotative hegemony in conventional linguistics as follows.

‘The difference between denotation and connotation is not... the difference between “univocal” and “vague” signification, or between “referential” and “emotional” communication, and so on. What constitutes a connotation as such is the connotative code which establishes it; the characteristic of a connotative code is the fact that the further signification conventionally relies on a primary one.’

This critique of received wisdom about denotation and connotation segues into the next and equally problematic point —the widely held assumption that music is intrinsically polysemic.


Polysemy and connotative precision

Polysemic, from Greek poly (πολύ = many) and sēma (σήμα = sign), means signifying many things at the same time, i.e. that the same sign is linked to many different objects and/or interpretants. Now, there is no doubt that music is polysemic from a logocentric viewpoint and I often produce the lexically incongruent concepts Austria and shampoo to illustrate the point. Austria is a middle-sized Central European nation famous for its capital city, Vienna, for mountains, Strauss waltzes, downhill skiing, Mozart and a host of other things that have nothing to do with the viscous liquid that comes in small plastic bottles and that you apply to your scalp when washing your hair in the privacy of your own bathroom. Despite these patent differences, I claim that Austria and shampoo belong to the same, well-defined semantic field. That sounds ridiculous, so I’d better explain.

A one-minute extract from a romantic film theme (The Dream of Olwen by Charles Williams) was played without visual accompaniment to 607 listeners. Respondents were asked to jot down notes for a suitable film scene or anything else that came into their mind when hearing the piece. The most common responses were love, romance and either a couple or a single woman seen strolling through the grass of a summer meadow. Other common responses were waving corn, rolling hills, the long flowing hair and dress of the woman they saw, the swell of the sea in a summer breeze, billowing sails, a flowing river, olden times, etc. Several respondents imagined scenes in either England, France or Austria. Now, the Austria envisaged by respondents was not the Dolomites in bad weather, nor skiing at Kitzbühel, nor eating Sachertorte in a Konditorei, nor the airport or oil refinery at Schwechat. No, it was the Austria of The Sound of Music, in particular a woman in a long dress strolling through green meadows. This cluster of responses describes the scene, shown as Figure 5-1 (p. 168), in which Julie Andrews bursts into the film’s title song (‘The hills are alive with the sound of music’). Now, that scene features a fine open-landscape panorama quite different to the confines of a shower cabin where shampoo is applied to the scalp. The question is obvious: how can shampoo be like strolling through the green grass of an open meadow?

Well, the shampoo that respondents mentioned was no more shampoo as such than the Austria they saw was lexically Austria. Respondents were in fact alluding to a Timotei shampoo advert featuring a young woman, with long, flowing hair and a long, flowing, old-style white cotton dress, moving in slow-motion through the long grass of a summer meadow and watched longingly by a young man in the background (Fig. 5-2a). This scene may well derive from the famous ‘love in the long grass’ scene from Elvira Madigan (Fig. 5-2b).


Fig. 5-1. Austria: Julie Andrews bursts into song in The Sound of Music

Still captured from DVD © 20th Century Fox, 1958, 1965, 1993

Obvious similarities between these pictures suggest that respondents, some of whom said Austria and others shampoo, were not the least bit confused about what sort of scene, movements, gestures, activities, emotions or moods they got from hearing the music, even though there is no connection between dictionary definitions of Austria and shampoo. It is therefore only from a logocentric viewpoint that Austria and shampoo, not to mention hills, hair, cornfields, sailing ships, dresses and manor houses, all common responses to the same music, can be considered contradictory, incongruous or polysemic.

Observations similar to those just made about Austria and shampoo apply just as well to very different sets of musical sound, for instance to those associated with city streets at night, with concrete, rain, crime, delinquency, flickering lights, urban loneliness, etc. This latter set of sounds and those of the Austria and shampoo piece cover mutually distinguishable fields of connotation, but the fact that each of the two sets of associations contains lexically disparate concepts does not mean that either of the two fields of connotation is in itself musically contradictory. On the contrary, play the music connoting either of those moods to anyone belonging to the culture in and for which the music was produced, and listeners will be in no doubt about which is which. Misconceptions of music as polysemic arise partly because academe demands that we present ideas about music, not in music, not even in terms of moving picture or of dance, but in words like these. These notions of music’s supposed polysemy can be questioned in at least two other ways: [1] by considering different symbolic representations of the same physical reality; [2] by turning the tables on denotative language and by absurdly branding it as polysemic instead.

Fig. 5-3. Castletown (Isle of Man): same geography, different representations

Figure 5-3 shows three representations of the same location. Images A and B can’t be called polysemic just because the area’s geological details (image C) aren’t included. Nor can image C be called vague because it doesn’t show buildings, roads or surface terrain. The point is that a physical location can be visually represented in a variety of ways, each symbolising different aspects of the same reality from different perspectives, using different rules of stylisation and abstraction, as well as different techniques for encoding different types of information for different purposes. If it’s accepted that the same location can be visually symbolised in different ways for different purposes, how come music, whose basic nature and functions differ so obviously from those of language or from graphic forms of representation, is expected to live up to linguistic or visual rather than musical criteria of semiotic precision?

Since different individuals within the same culture tend repeatedly to respond to the same music in quite similar ways, music cannot reasonably be considered polysemic. To underline the problem with logocentric thinking about musical meaning, you only need to apply musocentric arguments to language and ask, for example, what the sound of the spoken word table [ˈtɛibəl] really means. True, like masă, mesa, pöytä, стол, stół, stůl, bord, Tisch, tavola, τραπέζι and other words denoting ‘a flat horizontal slab or board supported by one or more legs’, table is pretty monosemic, but it is, as [ˈtɛibəl], musically indistinguishable from rhyming words like able, Babel, bagel, cable, cradle, Dave’ll [do it], fable, gable, label, ladle, Mabel, naval, navel or stable, each spoken with the same voice, intonation, timbre, inflexion, accentuation and speed of delivery. However, whereas no sane musicologist would dream of calling language polysemic just because all but the most onomatopoeic of words are musically ambiguous, many otherwise intelligent people still think of music as polysemic, just because musical categories of signification don’t coincide with verbal ones. This logocentric fallacy, part of the epistemic inertia discussed in Chapter 3, can also be refuted with the help of two final examples relating to a very simple, tangible, concrete and ostensibly denotative noun: chair.

[1] You can sit on one type of chair in the kitchen, in another in front of the TV; you can take the chair at a meeting, occupy another sort at a university and be fried on a final one in a Texas prison. Chair has to do for the lot of them and only the noun’s context or the addition of qualifiers like kitchen, easy, research or electric will clarify which chair is relevant. Words, in other words, even nouns denoting concrete objects, can be context sensitive and polysemic.

[2] The spoken word chair [ˈtʃɛːə] is as musically polysemic as singing the Twilight Zone jingle is verbally polysemic. Neither utterance carries clear meaning if judged according to the norms of semiosis applicable to the other sign system. A verbal statement is made less polysemic (not more so) by prosody, i.e. by the ‘musical’ elements of speech, just as the precision of musical meaning can become more focused when heard along with words, actions or pictures.

In short, precision of musical meaning can never be the same as precision of verbal meaning. Music and language are not interchangeable sign systems: if they were, they would not exist separately. It’s for this tautologous reason that connotations given in response to the Austria and shampoo and urban alienation pieces of music mentioned earlier must be understood as belonging to musogenic, not logogenic, categories of meaning. Connotations elicited by music are verbally accurate in relation not to verbal but to musical discourse. Music is an alogogenic sign system whose semantic precision relies largely on connotation and on indexical signs. Mendelssohn put it this way:

‘The thoughts which are expressed to me by a piece of music which I love are not too indefinite to be put into words, but on the contrary too definite.’


Concepts of communication

So far this chapter has presented some background concepts essential to an understanding of musical meaning. Now, no semiosis can take place without communication, be it intimate and small-scale or broadcast by satellite from a stadium venue. Even singing alone in the shower is impossible without having first learnt patterns of melodic construction that pass for song in the culture[s] you are familiar with because all communication relies on some aspect of social organisation. Indeed, as we saw in the section about music as a universal language (p. 47, ff.), musical competence, poïetic or aesthesic, is to an overriding extent culturally specific. Even the simple word-painting tricks described at the end of Chapter 4 (sunbeams scorching, chilly dewdrops, etc.) had to be learnt, as did the Austria and shampoo connotations provided by respondents hearing separate musical extracts without verbal or visual accompaniment.

Returning briefly to the word-painting tricks described at the end of Chapter 4 (p. 152, ff.), I assumed, as an organist trained in a particular tradition, that my timbral variations would communicate to the congregation the basics of the kinetic, tactile, emotional and culturally connotative effects I had learnt: sunbeams scorching as sonically sparkling, sharp, bright, high-pitched and edgy; chilly dewdrops as rounder, cooler, slightly airy but precise and delicate, and so on. As ‘co-author’ of the music I was playing, I was simply acting in accordance with the assumption posited by Eco (1979b: 7):

‘[T]o make his text communicative, the author has to assume that the ensemble of codes he relies upon is the same as that shared by his possible reader. The author has thus to foresee a model of the possible reader… supposedly able to deal interpretatively with the expressions in the same way as the author deals generatively with them.’

That said, it would have been rash to assume that every member of the congregation registered exactly the same effects in exactly the same way, because social, physiological, neurological and psychological factors, including the momentary state of mind of each individual, would inevitably produce variations of response between members of the same basic musical community. More importantly, it would not be so much rash as absurd to expect members of a very different musical culture, with very different conventions of structuring and understanding timbre, to register my timbral effects in the same way as the congregation of the school chapel where I played organ in the early 1960s.

Here we enter the tricky territory of communication theory and (semiotic) pragmatics in which musical semantics (the relation between musical signs and what they mean) needs viewing within the framework of the relevant socio-cultural field. A short, explanatory disclaimer is called for here because this section of the chapter will not necessarily conform to the course content of B.A. programmes in communication studies. That said, what comes next is influenced partly by the Peircean tripartite semiotic models already presented, partly by Eco’s (1976: 32-47) reasoning about ‘signification and communication’ and by a more music-specific model presented by Bengtsson (1972). Even so, I should, in the interests of transparency, make three admissions: [1] that the main source of ideas presented in this section consists of observations and reflections made over sixty years of experience using, as transmitter or receiver, different kinds of music for different purposes, under different economic, social, physical and cultural circumstances; [2] that such experience has more often determined the theoretical models I adopt (perceptual learning) than vice versa (conceptual learning); [3] that 38 years of running courses in the analysis of music ‘as if it meant something’ forced me to abandon some intriguing but educationally less practicable conceptual universes (e.g. 18 of Peirce’s 27 sign types, not to mention all the specialised poïetic descriptors of musical structure). Instead I’ve prioritised concepts that gel more easily with students’ perceptions of music and its meanings, even though those perceptions are sometimes, as I suggest elsewhere, in need of problematisation. With that academic proviso out in the open I feel less inhibited about presenting a basic communication model.


Basic communication model

Fig. 5-4. Musical communication model in a socio-cultural framework

Figure 5-4 visualises basic elements of musical communication within a socio-cultural framework. The twisted arrows at the top and bottom of the diagram indicate that the model should be read as vertically circular (cylindrical), so that the store of signs and the sociocultural norms are seen as part of the same constellation of culturally specific values and activities, i.e. as part of the same socio-cultural field. More precisely, the store of signs is really just one of the socio-cultural norms shown at the bottom of the model because it contains all the social conventions of what constitutes music in the relevant culture, as well as all the socially negotiated norms about which elements of music have which connotations and are suited to which purposes, etc. I apologise for this problem of graphic representation but we need to distinguish between two types of ‘non-communication’ (incompetence and interference) and I was unable to graphically encode, all in one single diagram, that important distinction while at the same time visualising the store of signs as a subset of sociocultural norms. In fact, the diagram should really be spherical and (at least) three-dimensional, because it’s also horizontally circular, as suggested by the various arrows at the left and right edges. These arrows show that the uses to which we put the music we hear and the meanings we attribute to it, whether or not those uses and meanings are intended by those who made the music, influence the symbolic and behavioural conventions (the store of signs and the socio-cultural norms) which, in their turn, form the cultural starting point without which music’s ‘transmitters’ cannot meaningfully produce work as composers, arrangers, musicians, singers, studio engineers, producers, DJs, etc.

Since Figure 5-4 should really be spherical, you could theoretically trace any musical communication process starting at any point in the diagram. Indeed, many scholars have, without considering musical semantics, instructively examined interactions relating to music in the socio-cultural field, such as those between commercial and aesthetic value, between patterns of ethnic, religious, sexual or social identity and their representation in the media, etc. In such cases, the communication model would almost certainly, like the geographical representations in Figure 5-3 (p. 169), look very different. Be that as it may, since the main focus of this book is semantic, it’s logical to put the musical ‘message’ process at the centre of the model. That process runs as follows: the intended message, informed by specifics of transmitter subjectivity in objective relation to the socio-cultural field, passes from idea or intention, via its concretion in sonic form (‘channel’) to ‘receivers’ who respond to what they hear. Let’s first zoom in on that central semantic line in the communication process.

By transmitter is meant any individual or group of individuals producing music: composer, arranger, musician, vocalist (including you singing in the shower), studio engineer, DJ, etc. By channel or coded message is meant the music as it sounds (an array of signs), while receivers are those hearing or using the music, be they simultaneously the music’s transmitters or not. The intended message, similar but not identical to Peirce’s object, is what transmitters hope to express: the right sounds at the right time in the right order creating the right ‘feel’, so to speak. Since transmitters rarely use words to conceptualise intended messages (they do that in music), I’ve provided a few verbal approximations hinting at a range of ‘feels’ that a musician working in the Western media might have to consider producing (Table 5-2).

Even though musicians within the European and North American cultural sphere might never use any of the words in Table 5-2 to describe any musical idea, professionals among them would still be able to come up with sounds corresponding to most of the ‘feels’ in the list. Similarly, codally competent listeners from the same cultural background would be able to distinguish that music according to categories similar to those in Table 5-2, a list that could go on for ever or include a totally different selection of mood categories. The point here is just to give some examples, in the form of pallid verbal approximations in the very verbal medium that is this book, of what an intended musical message might be, whether such intentions are verbalised or, as is more usual, just musically conceived. Of course, an intended musical message (or object), however inspired, doesn’t drop magically out of the blue. As the arrows on the left edge of Figure 5-4 indicate, such messages are informed by conventions existing in the sociocultural field, including its store of signs, which in their turn are informed by previous acts of semiosis involving transmitters, receivers and the sociocultural field.

Table 5-2: Ethnocentric selection of connotative spheres (‘feels’/‘moods’)

rock’n’roll kick-ass ethereal sublimity erotic tango

rural loneliness urban loneliness muso jazz cleverness

street-philosophising PI gospel ecstatic brave new machine world

yuppie yoghurt lifestyle cheerful children sex, aerobics style

headbanging thrash romantic sensuality bitter-sweet innocence

noble suffering slavery, drudgery wide-screen Western

Italian Western medieval meditation hippy meditation

psychedelia evil East Asians nice East Asians

savage Indians noble Native Americans slapstick comedy

pomp and circumstance sixties sound acid house body immersion

cybernetic dystopia death by frostbite twinkling happy Christmas

football singalong music hall pub song Methodist hymn

pastoral idyll the throbbing tropics inexorable violence

horror mystery grace and sophistication

Dracula’s drooling organ depravity and

decadence scorching sun,

blistering heat

wide and open smoky dive Arabic sound

West African drums distant bagpipe Barry Manilow ballad

Abba Aphex sound laid-back rock ballad seventies disco

1930s German cabaret Aboriginals inconsolably unjust tragedy

pagan ritual religious wonder Celtic mists

lullaby the march of death existential Angst


Thanks to Table 5-2, there is now a little meat on the bone of intention, which we’ll follow from transmitter to receiver. Does the music actually sound as intended? If so, does it physically reach receivers? If it does, what happens when they hear it? Is the message interpreted or used as intended or in a different way? We’ll start with the last of these questions, taking as an example the first ‘feel’ in Table 5-2.

A typically ‘adequate response’ would probably come into play if, in the case of intended kick-ass, rock concert-goers reacted by gesticulating enthusiastically, perhaps also joining in by yelling out the hook line of the chorus. Stage diving would be good at a speed metal gig and brandishing a cigarette lighter appropriate for a rock ballad. Such activity would, however, not constitute ‘adequate response’ at a string quartet recital: listening in silence and without visible expression, not clapping between movements but giving the musicians a round of applause after the performance would be more appropriate. If people sit in expressionless silence during the intended kick-ass rock or if they bop around loudly to the existential Angst or ethereal sublimity of a late Beethoven quartet, or if they hear something intended as delicate and tender in terms of sentimental tack, or something intended as interesting in terms of horror, then there has been a breakdown in musical communication. In these cases, musicians have to ask themselves what went wrong. It’s not much use for composers to moan ‘they just don’t understand my work’, because that erroneously implies that a breakdown in musical communication is solely due to malfunction at the reception end of the process.

Of course, with live performance there can be difficulties at the actual venue. Is there disturbing background noise? Can’t careful miking, mixing, equalising or speaker placement help? Did the violins have to work too hard to make their notes last in a dead acoustic space? If such problems aren’t solved, some of the intended message won’t even make it into the ‘channel’: it won’t materialise as the signs, the sounds that you, the transmitter, want to put across so that your audience (the receivers) can form their interpretants. However, and more likely, maybe your performance or recording sounds fine to you but the message still doesn’t seem to get across. Is it the wrong audience for your music or did you make the wrong music for them? Perhaps they laugh when they should cry, or gape apathetically instead of shouting and jumping? These problems of musical communication are attributable to what I call codal incompetence and codal interference.

Now, incompetence and interference both sound quite negative but neither term is intended in any pejorative sense. The two words are just shorthand for two types of breakdown in musical communication. Neither the ‘incompetence’ nor the ‘interference’ implies any stupidity or malice on the part of transmitter or receiver. Each concept simply highlights a particular set of mechanisms causing the varying degrees of difference that inevitably arise, in semiotic terms, between object and interpretant or, in terms of intentional communication, between intended and interpreted message. Codal incompetence and codal interference are in fact essential to the renegotiation of music’s possible meanings and to its survival as a sign system capable of adapting to different functions for different individuals in different populations at different times and in different places.

Codal incompetence

For musical communication to work, transmitter and receiver need access to the same basic store of signs, by which I mean a common vocabulary of musical sounds and norms (see p. 172). If the two parties don’t share a common store of signs, codal incompetence will arise, at either the transmitting or receiving end of the message, or at both ends.

Imagine, as a Westerner, hearing a field recording of traditional music from a village community in East Africa and thinking ‘this sounds festive’. Then you read the CD inlay and discover the song isn’t festive at all, at least if the notes written by a reputed ethnomusicologist are to be trusted. She describes the singing as ‘strident’, explaining that the track you’re hearing consists largely of stylised hyena calls and that packs of hyenas regularly ravage the villagers’ cattle. Whoops! Codal incompetence is at work here on several fronts. Firstly, you heard no hyenas in the music whereas, reportedly, those making or dancing to the music did so at the time of the recording. Secondly, you may not even know what a hyena sounds like, let alone what cultural conventions determine which aspects of hyena calls are stylised in which way into which types of song. Furthermore, you are unlikely to know how hyenas are regarded in the music’s original cultural context. Did you hear the threat to your livelihood that the calls of those hyenas connote or did the imitations of those animals ‘laughing’ make you want to laugh, too? Clearly, strident, rather than festive, would be an appropriate attitude for the villagers to adopt if, as you learn from the introduction to the CD inlay notes, courage, energy, organisation and determination are needed to effectively combat ravaging packs of hyenas. Mistaking strident for festive may be less inaccurate than hearing the music as mournful or gentle but codal incompetence on your part as listener is in clear evidence because you didn’t hear the music in the same way as would a member of the community producing and using those sounds. None of this means that your festive and no hyenas response is ‘wrong’. Codal incompetence at the receiving end just means an ‘inadequate response’ in terms of the music’s original cultural setting, functions and intentions. Besides, codal incompetence is in no way a trait exclusive to musical reception, as the next example suggests.

In the early 1990s someone in Liverpool informally asked me to come up with theme tune ideas for a series of local TV programmes. I understood the series was to include a fair amount of populist nostalgia for the ‘good old days’ when ‘ordinary people’ were supposed to have enjoyed themselves in ‘simple honest ways’. Having just returned to the UK after living in Sweden for many years, I had learnt to associate that kind of nostalgia with Swedish gammaldans, a cheery type of old-time, proletarian fun-and-games dance music featuring the accordion. Now, if, on that basis, I’d mixed some gammaldans into a signature tune to promote some of that populist nostalgia for the ‘good old days’, I would have exhibited gross codal incompetence because Liverpool listeners would not have known what to make of those sounds and of their specifically Swedish connotations. So, perhaps my local theme tune would be less codally incompetent if I tried to emulate the sound of the older popular artists from Merseyside, maybe a Searchers pastiche to take viewers back to the city in the 1960s. The problem with that idea was that it too was likely to fall on deaf ears because younger Liverpudlians might not even recognise a Searchers sound, let alone be familiar with its connotations. In this latter case, however, there would also have been some codal incompetence from the receiving end, since the young audience would be unable to interpret musical signs that would be quite meaningful to older Liverpudlians. Thankfully, none of these ideas saw the light of day because the TV project never passed the stage of loose chat in a pub.

Codal incompetence can also occur at more basic levels of musical structuration. For example, if you listen to recordings of Bulgarian women singing traditional harvest songs, you’ll hear a lot of semitone clashes similar to those often used to help create tension, horror or discomfort in Western film music. The Bulgarian women’s semitone dyads and clusters may sound harsh and discordant to us Westerners the first time we hear them: that sound will at best come across as exciting or exotic. But to the Bulgarian harvest singers themselves, pictured smiling and laughing in Figure 5-5, there’s nothing bizarre or exotic about their own music, nothing horrific about their semitones. It would in fact be codally incompetent, from the receiving end, to apply the semiotic conventions of semitones in Hollywood film music to the sound of Bulgarian women singing traditional harvest songs.

Fig. 5-5. Women singing harvest songs in Madzhare (Shopsko, Bulgaria) (Musik från Bulgarien, 1965).

It would also be codally incompetent, from the transmitting end, to use the semitones of traditional Bulgarian harvest songs to celebrate the Christmas break at an office party in Milan or Milwaukee, that is unless a disproportionate number of ‘world music’ fans are among the party-goers. In that case Bulgarian semitones might work as a group identity marker of sociocultural difference. With these ‘ethno’ fans and their radical recontextualisation of the Bulgarian women’s vocal techniques, we would be dealing not so much with codal incompetence as with codal interference.

Codal interference

Codal incompetence arises, as we just saw, when transmitter and receiver do not share the same store of musical signs, when the same musical sound, as sign, stands for different things at the transmitting and receiving ends of the communication process. Codal interference, on the other hand, arises when transmitter and receiver do share the same basic vocabulary of musical signs but differ in terms of sociocultural norms. Codal interference means that the intended sounds get across and are basically understood but that ‘adequate response’ is obstructed by other factors, such as receivers’ general like or dislike of the music and of what they think it represents. It can also result from visual, verbal, social or ideological recontextualisation of the music.

For purposes of illustration let’s go back to kick-ass rock from the 1980s. Those that hated the sounds of heavy metal and decried the music’s lyrics and lifestyle did not necessarily fail to understand the music’s message as you or I did with the East African hyenas (p. 179). No, metal haters were codally competent enough to register that the music was loud and powerful, that its lead singers tended to yell, that it made its listeners headbang, extend their arms in huge V-signs and so on. Indeed, heavy metal protagonists (soloists) had to be loudmouthed and loud-gestured because the backing they set themselves to be heard above, just like the society they and their audience inhabited, would otherwise have drowned them. They would, so to speak, have otherwise disappeared inaudibly and invisibly into an amorphous mass of sound and society.

Metal haters, just like its fans, knew that nice guys and good girls, with a well-mannered, reserved and demure behavioural strategy for social success, were incompatible with an aesthetic demanding a studied type of vulgarity, lavish amounts of ego projection and high volume to make the music work. Codal interference would obviously arise if you had invested time and energy into cultivating a nice-guy or good-girl identity and little or none into nourishing the self-celebratory and exhibitionist parts of your being. Metal aesthetics would be intolerable to you, not so much because the music seemed to spit on the nice guys and good girls as because you’d worked hard at repressing that anarchistic loudmouth and garish slob inside you which, if let loose, might ruin your efforts to please those in authority and to acquire social power and approval. You will have understood the music only too well but your sociocultural norms and motivations would have been antagonistically opposed to the expression of cathartic disgust, desperation or self-celebration that the music could have given you if you’d wanted.

Codal interference can work in the opposite direction if you think of metal, hardcore, techno, gangsta or industrial fans incapable of deriving any enjoyment from a classical string quartet. The subtle means of expression associated with classical chamber music can easily become a taboo area of affective and gestural activity for those who experience alienation at school, those whose peer group enthusiasm and social restlessness get them thrown out of class, those who hate having to buckle under, learn the recorder or sing in the school choir, or who just resent all the goody-goody pupils and teachers who seem to love classical music so much. It’s no wonder if individuals feeling such alienation do not embrace music involving, among other expressive features, qualities like delicacy, control and containment. Still, just like the good guys and girls who repress the heavy-metal exhibitionist parts of themselves, alienated metal and rap fans who hate classical string quartets also miss out on essential aspects of music’s semiotic richness.


If psycho-social fear or resentment of certain music, and of what it is heard to represent, interferes with the communication of intended musical messages, deep identification with a certain music can do the same in reverse. In 1973, for example, the Strawbs, a politically conservative English band, released a tune called Part Of The Union in which they parodied a trade union member in the lyrics and a proletarian pub or music-hall singalong ‘feel’ in the music: they intended to ridicule political views, people and music they did not like. Unfortunately for the Strawbs but fortunately for socialists in the UK, the British left loved Part Of The Union and adopted it as their own anthem on picket lines in 1984-5. Codal interference arose in this instance because of diametrically opposed political views and divergence of cultural identity between transmitter and receiver. It’s also clear that codal interference is in this instance related to codal incompetence because the Strawbs had radically misunderstood the British record-buying public’s store of signs.

Sometimes the words of a song can interfere with your perception of it as music. For example, if you had sung the well-known Welsh hymn tune Cwm Rhondda with its original words ‘Guide me, O thou great Jehovah!’ for twenty years in the local Methodist chapel and then, for the first time, heard lager louts sing it with lewd lyrics as you walked past the pub one night, it’s doubtful whether you would ever sing or feel the tune in the same way again. Similarly, visual narrative can also interfere with musical message, as so often happens with the use in TV ads of music you know from before. You only need think of the start of Richard Strauss’s Also sprach Zarathustra in ads for fabric softeners, office machinery and mobile phones, or of Dvořák’s New World Symphony for sliced bread, or of Muddy Waters’ Mannish Boy for jeans worn by young white US males.

Codal interference can work in two ways with the TV ads just mentioned. First, if you knew the music before seeing the ad, the connotations of those previous hearings will be challenged, interfered with, just as the lager-lout words interfered with your previously established understanding of the Methodist hymn tune. Of course, the advertising idea is that positive values attached by target-group listeners like yourself to the borrowed music will magically migrate to the product being advertised. However, if you know the music well, or if it means a lot to you, it’s more likely that its commercial use will seem like abuse and put you off the product advertised. In cases like this, advertising zeal to sell by associating product with assumed musical values can have the opposite effect, while, conversely, your prior knowledge of the music interferes with an ‘adequate response’ to the advertisers’ intended sales pitch. Secondly, if, on the other hand, you didn’t know the music before seeing the advert and then heard the music at a concert or on the radio, you would probably think of the advert you saw earlier. In this case, the music’s paramusical accompaniment (visual, verbal) in the ad won’t necessarily interfere with your perception of the music because you never heard it before without visuals or voiceover. It will, however, certainly conflict with types of semiosis relevant to hearing the same music without such accompaniment, or in a different paramusical context, because you just can’t get the previously established paramusical connotations of the ad out of your head. Codal interference is certainly intentional in the advertising examples just given, the whole idea being that consumers associate the music, previously intended for, and used under, other circumstances, with the product being marketed. It’s a form of connotative hijacking.

Sometimes these intentional codal interferences, including connotative hijacking, serve their purpose, as do the adverts just mentioned, or Joe Hill’s parodies of Salvation Army hymns to union lyrics, or the Sousa march which became the Monty Python theme tune. Still, sometimes intended interference doesn’t work, as we just saw with the Strawbs’ Part Of The Union (p. 184), and sometimes it only half works, as in the next and final example, drawn once again from personal experience.

Representing immigrants

In 1981, Swedish Radio asked me to provide theme music for a programme series for and about immigrants. The programme’s title, Jag vill leva, jag vill dö i Norden (=‘I want to live and die in the North’, i.e. in a Nordic country), is the last line of the Swedish national anthem and provided a useful starting point. Since Sweden was the host nation into whose established majority culture immigrants had to assimilate, I decided to start with a full-blown, grandiose, official-sounding version of the national anthem’s last line. My budget couldn’t pay for a symphony orchestra or a decent brass band, so I settled for recording the line myself on full organ in a local church. In fact, that may have been a better solution because end-of-year school ceremonies in Sweden are often held in churches and are quite a nationalistic affair. OK, the official national ceremony organ sound took care of the powerful host-nation side of the story but the series was not supposed to be a nationalist PR stunt, so I also needed to reflect something of the conflicts and problems of immigrant life.

(Incidentally, when describing my intentions here, I am retrospectively verbalising mainly musical concepts and ‘feels’ that constituted the object of the recording which became the sign. It was really only when codal interference affected the relationship between my object and the producer’s final interpretants that I had to start rationalising, in verbal terms, what I had done musically.)

I put the first aspect of immigrant problems into music by replacing the grand final chord of the national anthem with an unresolved sonority. I quickly faded that worry chord to a much lower volume that could be held throughout the rest of the signature to allow solo ‘immigrant instruments’ to play the same melodic phrase (the last line of the Swedish national anthem) at different points in different keys and at different pitches. The first ‘out-of-key individual immigrant’ to play the national anthem was a dirty-sounding electric guitar which I included for two reasons: [1] I was not the only rock-playing anglophone immigrant in the country; [2] rock music was in 1981 itself fast becoming an integral part of the host nation’s mainstream culture. After the rock guitar I added accordion (Swedish and immigrant again) in another different key and then mandolin as a generic ‘ethnic folk lute’ to suggest Sweden’s numerous Greek (bouzouki), Turkish (saz), Eastern European (balalaika/cimbalom etc.) and Andean (charango) immigrants (instruments). The last ‘out-of-key ethnic instrument’ representation was soprano recorder as ‘generic folk flute’, perhaps an Andean quena or a West Asian ney/näi/gagri. The final flute note was left loud, high, piercing, alone and long enough, with extra reverb, so it could be easily cross-faded into the programme speaker’s introductory words.

Those twenty-odd seconds of theme music were not without humour but I also wanted them to sound a little bit disconcerting. Why? Well, as an immigrant in a majority host culture, you try to fit in and to ‘sing from the same hymn sheet’ as the majority, but you often get the feeling that you’ll always be somehow out of step, out of tune and out of place because, like it or not, you think, feel, act, look or sound different to the host-nation majority. Since it was part of that experience that needed to be in those twenty seconds of music, I thought it would be good to juxtapose musical soundbites that didn’t normally belong together in the same piece: I was in other words intentionally using codal interference. Hence the official-sounding festive pomp of the organ plus the worry chord, plus each timbrally distinct instrument representing a different culture. All those elements were supposed to interfere, like immigrants, with the first and most powerful statement on the organ.

The recording engineer and I made numerous versions of the recording. Apart from the full mix, there was one without the organ, another without the distorted guitar, a third with neither organ nor guitar, and so on. The only mix the producer liked was the dubbed mandolin solo. She even made me dump the flute because it was ‘too shrill’. I tried to explain why I’d gone to the trouble of recording the organ track but neither organ nor guitar was acceptable, I understood, because ‘they don’t sound like immigrants’. ‘But,’ I objected, ‘you can’t put over what it feels like to be an immigrant if there’s no host culture.’ To cut a long story short, the only concession granted by the producer was that, after much insistence from my side, the unresolved worry chord could be held under the dubbed mandolin parts. It’s that version which was finally used as programme signature. I had to content myself with the fact that there was at least a slight musical hint that being an immigrant and/or hosting immigrants might not be entirely unproblematic.

My interpretation of the producer’s selection of just one element and her rejection of all the others is not that it was a matter of ‘personal taste’. She seemed to me to be saying that flutes can be cute or exotic, not strident, in the same way that host nations appreciate grateful and deferential immigrants who are never angry, alienated or frustrated. She also seemed to be saying that immigrants could be neither English-speaking nor electric (so much for yours truly and hundreds of Vietnam draft dodgers in Sweden at the time). It was as if, in her mind, we should all conform to the host-nation immigrant stereotype that assumes we all come from far-off and backward rural areas where we all play pleasantly unfamiliar music on pleasantly unfamiliar acoustic instruments. The strangest thing was, however, that the signature theme should not allude to the overriding power of the host nation as a central issue affecting the lives of immigrants.

This little signature theme story illustrates codal interference on a grand scale. The producer knew as well as I did the values, attitudes and feelings encoded in the ‘channel’. However, although we probably both had access to a very similar store of signs, our sociocultural norms and expectations were in definite conflict. She did not think my musical view of being an immigrant was suitable and, as an immigrant, I thought hers was both unrealistic and unsympathetic.

Of course, the producer had the final word and, who knows, she may have been right. Maybe she saw me as a codally incompetent transmitter, as an unreliable or unprofessional young composer who ‘didn’t come up with the goods’. Perhaps I was supposed to produce something happier and more catchy, something that would just acoustically identify the programme and put potential listeners in a ‘no problems’ frame of mind. However, since the only information I was given about the programme dealt with its content, I assumed that I was to focus on that. If, on the other hand, my job was to provide an innocuous musical identifier and to prevent listeners from switching channels, I should have been told so, or was I expected to read that between the lines?

Whatever the case may be, it’s very possible that another communication problem caused the codal interference just described. That problem relates to the task of formulating an adequate brief, i.e. the instructions given to a musician or composer by someone who is usually neither. Those difficulties are, in their turn, one reason for writing this book. The fact that muso and non-muso discourses about music differ so radically, for all the reasons given in Chapters 2-4, calls for the development of models and of a terminology allowing musos and non-musos to better understand each other.


‘Somatic’ and ‘connotative’

Throughout this book, connotative verbal expressions are used to designate interpretants linked to musical sounds. Those expressions turn up repeatedly in Chapter 6 as respondent VVAs (=verbal-visual associations), but there have been plenty in this chapter, too. Apart from all the ‘moods’ listed in Table 5-2 (p. 176), we had to explain the Austria/shampoo idea as part of a semantic field that also includes pleasant aspects of femininity, romance, open countryside, rounded shapes, and soft materials, as well as movements qualifiable as smooth, flowing and wavy but containing elements of rustling or tingling. On several occasions I warned that these connotative verbal expressions are but pallid verbal approximations of musical meaning. I’ve also suggested that they can sometimes act as culturally specific, metonymic labels or verbal metaphors of music. For example, an adequate aesthesic label like ‘spy chord’ (p. 116) does not mean that the chord signifies spy: it simply functions as a cognitive reference point allowing us to name a particular set of musical interpretants in relation to a particular set of musical signs. So what’s the problem?

The problem is that, despite the repeated caveats just mentioned, many people still object to any use of verbal connotation in the discussion of musical meaning because, they argue, such connotations falsify the intrinsically alogogenic character of music. Sometimes they argue their point by using the adjectives primary and secondary to qualify levels of signification, such ordinal categorisation leading to the assumption that being ‘on top of the pile’ (hierarchically primary) or ‘first in line’ (sequentially primary) implies greater importance or superior value. Now, Middleton, who introduced the terms ‘primary’ and ‘secondary signification’ (1990: 220-227), in no way views the difference between the two categories in that way. While his valid distinction is that between how ‘meaning might be produced at the introversive or “primary” level of signification’ and how ‘the associative sphere of musical meaning, the level of connotation and extramusical reference’ (‘secondary’), he strongly warns against the temptation to reduce the former to the sort of bodyist essentialism criticised in Chapter 3.

‘[T]he fields of gesture and connotation (primary and secondary meaning as I’ve called them elsewhere) are actually correlated, through the action of what some semiologists have termed a ‘semantic gesture’: a unifying, generating principle traversing semiotic levels (somatic; referential) and tied to deep cultural functions’. (Middleton, 2000: 116)


The common denominators of gesturality contained in the Austria/shampoo trope (rounded, soft, smooth, flowing, wavy, etc.) and discussed under ‘Gestural interconversion’ (pp. 502-509) demonstrate such ‘unifying, generating principles’ which very clearly ‘traverse somatic and referential levels’ of mediation (summer meadows as well as undulation, so to speak) and are ‘tied to deep cultural functions’ (e.g. romantic and parental love). The point is that if we abstain, for whatever reason, from using connotative verbal expression to designate musical interpretants, we’ll never understand the social, cultural and corporeal nature of the ‘unifying semantic gesture’ because we will have failed to verbally identify its constituent parts. It will moreover be impossible to democratise the denotation of musical signs because aesthesic designation of musical structure relies by definition on their perception and interpretation. If, as Middleton (loc. cit.) suggests and as argued here, the somatic and connotative aspects of musical meaning are, despite differences, neither contradictory nor mutually exclusive then there is no problem with using connotative verbal expressions to designate musical signs and their interpretants. This does not mean that connotative aspects of musical mediation are more important than somatic perception any more than the reverse is true: it simply means that if humans are more than mere animal automata, then their use of music will make little sense if somatic response is considered ‘primary’, just as it would be absurd to privilege the patently obvious power of music to move souls as well as bodies.


Chapters 1-4 were supposed to demystify notions of music and to explain why the epistemic divisions between music and other forms of knowledge are so entrenched in the West. In this chapter the focus was on basic concepts of meaning and communication. The main arguments can be summarised in the following seven points.

[1] Peirce’s distinction between object and interpretant in relation to the sign allows for a dynamic view of musical semiosis. Even though it saves time in semantics if you use Saussure’s signifier - signified, Peirce’s triad object - sign - interpretant is more compatible with thinking about music in terms of symbolic interaction between humans. It’s from this perspective that the object can be understood as conception or intended message at the transmitting end of a simple transmitter - channel - receiver communication model, and the interpretant as (surprise!) its interpretation at the receiving end.

[2] Since music works to such an overwhelming extent as a culturally specific sign system, its ability to carry meaning relies on the existence of a shared store of signs common to transmitters and receivers in the relevant cultural context. Although object (≈ intended message) and interpretant (≈ listener response) can never be identical, musical communication usually works, otherwise there would be no call for music on ceremonial occasions, nor in TV ads, computer games or anywhere else for that matter. However, there will be communication failure if the music includes signs unfamiliar to its audience, or if interpretation of signs from the common store varies radically between transmitter (composer, musician, etc.) and receiver (audience).

[3] Musical communication failure can occur for logistic reasons of acoustics, technology, etc., but its most common causes are codal incompetence or codal interference. Codal incompetence arises if transmitter and receiver do not share the same store of signs (including their meanings); it can occur at both the transmitting and receiving ends of the communication process. Codal interference arises when transmitter and receiver do share the same store of signs and their meanings but do not translate those same meanings into the same final interpretants. Differences in sociocultural values often cause codal interference.

[4] Codal incompetence and codal interference (intentional or not) are prerequisites for shifts in musical meaning. Signs from one culturally specific store (or vocabulary) can be appropriated into another where they acquire a different meaning or function.

[5] Among Peirce’s numerous trinities of sign types, one is of particular use to musical semantics: icon - index - arbitrary sign. Arbitrary signs are rare in music, whereas icons are not uncommon and indices are virtually omnipresent.

[6] Connotation isn’t less concrete or less efficient than denotation and music is definitely not more polysemic than language. Music is a connotative, alogogenic sign system. Verbal descriptions of musical meaning must therefore be treated as very approximate verbal connotations of musically precise messages.

[7] Since connotation relies on the existence of previously established meaning[s], and since indices are signs connected by either causality or proximity to what they signify, musical semiosis tends to be both connotative and indexical. In the next two chapters I'll try to explain how that sort of semiosis can be substantiated and understood.




6. Intersubjectivity

ALOGOGENIC has been used several times in this book to qualify the noun music. It basically means that music is unverbalisable and that’s because its semiotic precision, linked to gestural, tactile, corporeal, emotional and prosodic forms of communication, relies mainly on iconic, indexical and connotative types of semiosis. It certainly doesn’t need the denotative sort of signs used in this sentence! Talking and writing about music ‘as if it meant something other than itself’ is in other words very difficult, at least in the tradition of learning with which I’m familiar. Chapters 6 and 7 confront that problem head on. Their basic rationale is as follows.

Given music’s obvious traits of social organisation and cultural specificity, it ought to be possible, using words and other sign types, to form some idea of the links between the sounds of music and something other than themselves, even if trying to put those sounds directly into words is a pointless undertaking. If that rationale makes any sense at all, we ought logically to be able to suggest how anyone capable of reading these words can investigate musical meaning and discuss such meaning in viable terms. That is at least the aim of Chapters 6 and 7.

As suggested earlier (p. 174 ff.), musical communication works best when those at the emitting and receiving ends of the process share similar sociocultural norms and the same basic store of signs. Since those norms and that store of signs are both part of the sociocultural field in which musical semiosis takes place, it makes obvious sense to look at that semiosis in socially verifiable terms. That’s where intersubjectivity (see next paragraph) comes in. This chapter deals with shared subjectivity at the receiving end of the communication process, while Chapter 7 considers the question of interobjective references or hypertexts. Both these fields of investigation can provide valuable information about musical meaning.

Intersubjectivity arises when at least two individuals experience the same thing in a similar way. The same (or a similar) experience is in other words shared between (inter) two or more human subjects. Now, musical experiences are often regarded as highly personal and subjective, but it’s just as easy to understand the fact that without intersubjectivity there would be no communities of musical taste, no format radio, no music industry and no other objective social phenomena demonstrably related to different musical configurations. Indeed, music for film, TV, games, advertising, dancing, weddings, funerals, sports events and so on would all be pointless if all individuals in a given audience understood and reacted to the same musical sounds in radically different ways. This simple truth implies that anyone looking for evidence of musical meaning might do well to look for patterns of intersubjectivity relevant to the music under analysis. That means turning in the first instance to the final arbiters of musical meaning, to those who hear the music in question, who use it and react to it, in order to verify the existence or non-existence of shared interpretants.

Aesthesic focus

It’s often tempting, especially for musos, to investigate matters of musical meaning at the transmitting end of the communication process, i.e. by studying poïesis rather than aesthesis. Now, issues of authorial intent can indeed be important in terms of insights about processes of musical production —why musicians choose to make sound x rather than sound y in relation to phenomenon z, so to speak— but that is not the focus of attention in this book for the following six reasons.

[1] It’s often difficult to contact the artist, composer or musician behind the music you’re analysing. Some are inaccessible ―they may be protected by media industry guard dogs― while others may quite simply be dead.

[2] If you do manage to contact your ‘transmitters’ they won’t necessarily want to talk about what they meant to mediate through their music. Many will say they intended nothing in particular or tell you the music speaks for itself. Others will talk about their music in poïetic terms and leave you none the wiser about what they meant by it all.

[3] When ‘transmitters’ verbalise comprehensibly about their music in interviews or in writing, they have to consider their image and credibility in particular sociocultural circumstances because what they say or write can determine what, if any, their next gig might be. You need to know more about what they really mean by their music than how they currently see it in relation to their public persona.

[4] Information from just one individual (composer, arranger, producer, artist, etc.) can by definition never be intersubjective. Greater reliability of intersubjective information is gained by consulting a greater number of individuals. Therefore, unless you’re studying esoteric musical situations where ‘transmitters’ outnumber ‘receivers’, it makes more sense to investigate patterns of intersubjectivity about music’s meanings among its listeners (aesthesis), less so to focus on the production pole (poïesis) of the communication process.

[5] Focusing on the poïetic pole can certainly be useful in providing technical tips to budding composers and musicians; but the risk with that focus is, as we saw in Chapter 3, that it privileges poïetic at the expense of aesthesic competence. This neglect of aesthesis does little to promote the democratic sort of musicology alluded to in the non-muso part of this book’s title, a musicology which, as we’ll see, seeks to use aesthesic competence to help construct a vocabulary of descriptors for aspects of musical structuration (e.g. vocal timbre) that conventional poïetic terminology does not cover satisfactorily.

[6] Although ‘transmitters’ and ‘receivers’ both obviously consist of individuals, the former are much more likely to be identified as such (the named composer or artist, the ‘star’) than the latter who usually remain nameless, viewed en masse in terms of an ‘audience’, ‘the public’, ‘the fans’, etc. It may be understandable if, from this perspective, conventional studies of music favour focus on readily identifiable musical individuals at the expense of the faceless masses; but one consequence of such institutionalised auteurcentrism is that, by privileging authorial intent and skill, it marginalises and disqualifies the demonstrable musical competence of individuals comprising the music’s audience. Authorship is conflated with authority, so to speak: more importance is attributed to intended meaning than to its perception, the sign’s object (in Peirce’s sense) taking pride of place over its interpretants. However, as we saw with the Strawbs’ song Union Man (p. 184, ff.) and with the title theme I recorded for Swedish radio (p. 186, ff.), assigning semiotic privilege to the poïetic pole can be fatal because, whatever authorial intention may have been, listeners are the final arbiters of musical meaning. It is they, not me, not The Strawbs, nor any other ‘transmitter’, who form its final interpretants, they who use the music in particular sociocultural contexts, they who negotiate and adapt the music’s meanings after it has left authorial hands. Besides, there are many more of them than of me or of The Strawbs. Put tersely, the final proof of the semiomusical pudding is in its eating.

None of this means to say that discussion of authorial intention is irrelevant to the discussion of musical semiosis. However, the six reasons just presented suggest that it would be inadvisable to prioritise poïesis in the investigation of musical meaning in everyday life, more prudent and productive to turn primarily to its final arbiters so that the existence or non-existence of shared interpretants can be studied on the basis of some sort of empirical evidence. Shared interpretants, if they exist, can be observed at two general levels, one more ethnographic, the other more connotative, more explicitly semiotic.


Ethnographic intersubjectivity

Behavioural and demographic intersubjectivity can be observed ethnographically and involves such factors as:

• listening mode, e.g. whether the music under analysis is played in the background or if it’s more the focus of audience attention;

• listening venue, e.g. if the music is heard in a car, at home, in public spaces, in clubs or bars, or at a place of worship, through speakers or headphones, live or prerecorded;

• listener activity, e.g. whether the music incites the audience to sing or dance, stroll or march, to rise up or sit down, to break into tears or out laughing, to wake up or go to sleep, etc.

• cultural location (‘scene’), including demographic, historical, geographical, ethnic, linguistic and sartorial information; e.g. if the music is/was made and/or heard/used by middle-class Swedes in their thirties around 1975, by young male gang members in South L.A. in the 1980s, by elderly Kosovo Albanians in the 1990s, by exile Tamils in Toronto, by goths, punks, lager louts or bank executives wearing baggy jeans, national costume, pin-striped suits, flip-flops, denim or clubwear, etc.

Observations of the sort just listed can provide useful information about certain aspects of musical meaning. If we think of musical structure in terms of signs and of responses observed on hearing that music in terms of interpretants, it follows that if a particular piece or extract of music gives rise to reasonably consistent audience responses, in the form of particular types of activity, emotion or connotation, then the music in question in some sense signifies the complex of physical, social, cultural and emotional response with which it’s associated or which it appears to elicit. The only problem is that some of the points listed above, especially those included under listening mode and cultural location, will vary considerably in terms of semiosis depending on cultural context, especially in relation to which audience is identified at the receiving end of the musical communication process. Different audiences in different cultural circumstances give rise to different patterns of shared subjectivity in relation to the same music. Since one single set of intersubjectively shared responses can never be applied to all audiences at all times in all situations, it is vital, when using patterns of intersubjectivity observed at the receiving end of the communication process as a basis for discussing questions of musical meaning, to be clear about which audience you are referring to in which historical and cultural circumstances. Such demographic precision is also essential when it comes to the main source of user information discussed in this chapter: connotative intersubjectivity.

Reception tests

Connotative intersubjectivity involves indirect observations about shared responses to music. Such observations are often made through the mediation of words describing what listeners see, feel, imagine or otherwise associate to when hearing a particular piece or extract of music. In the interests of brevity I’ll call ‘a particular piece or extract of music’ the musical analysis object —AO for short— and I’ll refer to ‘the verbal expression of what listeners see, feel, imagine or otherwise associate to’ as verbal-visual associations —VVAs for short.

VVAs in response to a particular AO can of course be gathered by studying writings about the AO in reviews, inlay or sleeve notes, blogs, etc.; but it can often be productive to ask listeners directly for their response to music, either in conversation or by means of a reception test. The immediacy and informality of one-to-one conversations more closely resemble everyday listening situations but their transcription and semiotic collocation, in addition to the task of actually conducting those conversations, can be very time consuming. Reception tests also demand, as we shall see, their fair share of semiotic collocation work but they have the distinct advantage of needing no transcription and can be run on many respondents at the same time. Such tests, it should be added, aren’t tests in the usual sense of the word: they in no way test the skill of listeners to provide ‘right answers’ in the form of previously determined VVAs in response to the AO. The only things they do test are hypotheses about what an AO might ‘mean’, and listener responses to that test are supposed to help verify or falsify those hypotheses.

Reception tests can be conducted live in a classroom situation or posted on the internet. One advantage with live reception tests is that you have a captive audience whose responses you can collect on the spot. A distinct disadvantage is that classrooms are supposed to be sites of rational discourse rather than of the holistic, lateral and synaesthetic types of cognition associated with music, as discussed in Chapter 2. There are at least two ways of minimising that cognitive contradiction. You can: [1] present very short music examples that give little or no time for rational reflexion or intellectual reasoning and, if you’re testing more than one example, leave little or no time for deductive thinking between each example; [2] more importantly, you can give respondents clear instructions underlining that you’re looking for immediate responses to music, not for verbally well-reasoned argumentation or high standards of writing.

Internet reception tests also have pros and cons. One obvious drawback is that some individuals may listen more times or more attentively through better sound equipment and spend longer formulating their response than others. Such variation of listening attitude and situation can generate data that may be irrelevant to what you want to test. They can produce extraneous variations in response that are less likely to arise in the uniform classroom situation and that may introduce variables that aren’t part of the exercise. This risk can be reduced if respondents are given clear instructions about how they are supposed to listen to the music example[s] in question.

The advantages of internet testing are: [1] you avoid problems of illegible handwriting because subjects enter their responses via a computer keyboard; [2] you can cut and paste responses into whatever document you need to produce when writing up the results; [3] the test environment will probably resemble that of everyday listening more closely than does the classroom situation.

At least four main issues need to be addressed before you actually test any music on any respondents under any circumstances.

• Which music’s meanings do you want to test?

• Who do you want to test those meanings on?

• What sort of listening attitude should respondents ideally adopt?

• In what form do you want the responses?

Those four questions give rise to several other important considerations of which there’s room here to mention just a few. Firstly, you’ll need to decide if you want to test responses for several pieces or just one, or if you want to concentrate on one or two short extracts highlighting particular points of musical structure and meaning inside one and the same piece. Here it’s worth remembering that the more pieces or extracts you include in a battery of test examples, the more listener responses are likely to be influenced by what they just heard. For example, a suspense-chord stab preceded by a thrash metal riff may not sound as threatening as after a wistful ballad. Similarly, the longer the example or extract you play to your listeners, the more likely it is to involve some sort of narrative, i.e. to ‘go elsewhere’ or to move through more than just one relatively coherent musogenic semantic field. That can cause problems if you’re testing hypotheses of signification relating to one such single set of musical structures or to just one musogenic semantic field; but if you’re interested in VVAs elicited by musical narrative a longer test example will be necessary to discover how much your respondents hear processes roughly verbalisable in concepts like about to, then, suddenly, gradually, changes to…, all the time, once again, just before…, after which…, etc.

Secondly, if you want to test hypotheses of musical signification without respondents being influenced by verbal or visual message, you need to consider, in the case of a song, concentrating on instrumental passages or choosing a song with lyrics in a language that respondents don’t understand. In the case of music and the moving image it’s often worth selecting relevant instrumental extracts from the soundtrack album or, failing that, playing extracts from the full soundtrack that contain as little dialogue and as few sound effects as possible. On the other hand, you might actually want to focus on vocal production or on the effects of music in conjunction with images. In those cases you’ll probably have to construct your own test examples, juxtaposing two or more different vocalisations of the same lyrics, or, in the case of pictures, either two or more different musics to the same images or different visual sequences to the same music. Of course, if you wanted to test the effects of lyrics on musical message you would have to construct examples with different lyrics to the same music or different music to the same lyrics. Such cross-testing can be very useful but it poses one problem of method. The difficulty is that if respondents hear in succession identical music with different verbal or visual accompaniment, or different musics accompanying the same words or visuals, or the same vocal statement treated in different ways at the mixing desk, listener attention will automatically be focused on those differences. Since that kind of focus rarely occurs under the sort of everyday listening conditions which you might ideally want to replicate in a test situation, you could try playing some of the examples to one group of listeners and the others to different but demographically similar respondents. 
Of course, that procedure involves more work and raises other problems, for example the task of verifying to what extent the different respondent groups are in fact culturally and demographically similar.

Thirdly, the second of the four main questions posed at the start of this subsection (p. 202) asked what sort of audience you have in mind for your reception test. You might, for example, want to concentrate on fans, devotees or experts of a particular type of music; or maybe you’d prefer to use as wide and heterogeneous a population as possible. In the first instance it’s a good idea to also test your AO[s] on a control group of ‘non-experts’ to find out what VVAs are specific to fans and which are shared by a wider community. In either case, it’s essential to gather standard demographic and other culturally relevant data from each respondent.
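The comparison between a specialist group and a control group can be thought of as a matter of simple set operations. The following Python fragment is only a rough sketch of that idea: the function name `split_vvas` and the sample responses are invented for the illustration and don't come from any actual reception test.

```python
def split_vvas(fan_vvas, control_vvas):
    """Separate VVAs shared by both groups from those specific to one group."""
    # Normalise case and surrounding spaces so 'Castle' and 'castle' match
    fans = {v.strip().lower() for v in fan_vvas}
    control = {v.strip().lower() for v in control_vvas}
    return {
        "shared": sorted(fans & control),        # given by both groups
        "fans_only": sorted(fans - control),     # specific to the fans
        "control_only": sorted(control - fans),  # specific to the control group
    }


# Invented responses to one and the same AO
fan_responses = ["Castle", "organ", "vampire", "Hammer horror"]
control_responses = ["castle", "church", "organ", "funeral"]

result = split_vvas(fan_responses, control_responses)
print(result)
```

A sketch like this only matches identical wordings. Deciding whether two differently worded VVAs belong together is a matter of semiotic classification, not of string comparison.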

The fourth question —in what form do you want the responses?— is basically an issue of multiple choice versus unguided association. Multiple choice answers are much easier to deal with because they present no problems of legibility and because they convert conveniently into statistics. However, multiple choice tests will be methodologically flawed if you can’t convincingly explain which processes led you to exclude every thinkable response possibility and to include only the very few alternatives you allow respondents to select from.

Unguided association

There are several important advantages in using unguided association. The first point is that although the relative immediacy of response involved in noting a few words on a blank screen or sheet of paper does not satisfactorily simulate everyday music listening situations, it does so much less inadequately than having to read a prepared text, put figures into boxes or tick alternatives on a neatly prepared test form. Moreover, the multiple in ‘multiple choice’ is really a misnomer in that such tests restrict listener response options much more severely than does a blank sheet or computer screen answer box preceded by a few basic instructions. In fact it’s reasonable to interpret each freely induced response as culturally more significant than multiple choice answers because each response is actively created by the listener with music as main stimulus without the restrictions of a limited number of ready-made alternatives. In addition to these advantages, it should be remembered that one main aim of the sort of musical reception test discussed here is to find out how people relate music to other phenomena than just music. Like it or not, using multiple choice testing implies a large degree of certainty as to what alternatives ought and ought not to be included in connection with each AO. Since very few scholars, if any, can lay claim to such certainty when it comes to musical meanings, multiple choice testing cannot be considered the wisest option.

Another problem with multiple-choice methods of gathering musical reception data is that they have tended to favour adjectives describing general moods or emotions and to avoid reporting other types of listener response. This kind of affective adjectival bias has meant that extremely common types of VVA like people (e.g. villain, princess, teenagers, lovers, James Bond), objects (e.g. car, crinoline, cigarettes, shampoo, neon lights), settings (e.g. sea, fields, church, street, suburb, Paris, distant galaxy; medieval, 1950s, distant future; aristocratic, working class) are usually absent from such studies. Of course, affective adjectives like sad, happy, pleasant, unpleasant, romantic, calm, threatening are all perfectly viable co-descriptors of musical experience and must also be taken into account but they should never be the exclusive, nor necessarily the primary, focus of reception tests. If they are, response data can become skewed and misleadingly vague, for not only will physical, historical and social connotation be absent: so will music’s obvious capacity to communicate notions of space, gesture and movement.

To concretise the issue just raised, imagine two sets of response to a sound recording of a Hammer horror film pastiche of J. S. Bach’s highly popular Toccata and Fugue in D minor (1705). One listener responds majestic, ecclesiastical and ominous while the other writes Count Dracula drooling over the organ in his damp and degenerate castle before hitting the night air in search of young blood. The first response certainly contains appropriate adjectives but the second is more musogenic, for not only does it imply all three adjectives in the first response (Count and castle are majestic, the organ is ecclesiastical and the remaining concepts are ominous enough); it also connotes gestural, tactile and kinetic detail missing in the first response: drooling, damp, degenerate, hitting, night air, searching, young blood. In short, the general affectivity expressed by adjectives selected from multiple choice alternatives, themselves by definition a very restricted selection of all available affective adjectives, may seem fine from a verbal semantic viewpoint, but they are musogenically inadequate. Therefore, if you want to avoid the pitfall of affective adjectival restriction, why not tell your respondents, before they hear anything, something along the following lines?

‘During the next m minutes you’ll hear n short musical extracts. I’ll say the number of each one just before you hear it. Please note that number in the left margin of the page and then write down whatever you think could be happening on an imaginary film or TV screen along with each extract you hear. There won’t be much time to think, nor to write, so you don’t need to formulate complete sentences or bother about spelling or grammar; just jot down the impressions that come into your head for each piece of music. It might be a mood, or people you see in your mind’s eye, what they’re doing, what’s happening (if anything), where and when it’s happening, what it feels like and so on.’

These were the basic reception test instructions given to the 607 respondents whose VVAs provided the empirical intersubjective data used in the project Ten Little Title Tunes (TLTT). The aim of that test was to discover what kind of connotations well-known musical structures in relatively unknown pieces of film and TV music would elicit from a wide range of listeners.

Obviously, reception test instructions will diverge from those just cited depending on which music is being tested on which respondents for which purposes, but one thing is clear: unguided association responses will probably need to be written up and that means putting them into some sort of classificatory system. Indeed, although the kinds of reception test discussed in this chapter all involve some sort of methodological problem, they will be neither unreliable nor pointless as long as their aims, parameters and limitations are made clear. In fact, treated carefully and transparently, reception tests, even on just a single extract of music heard by a mere handful of people, can provide useful empirical information about degrees of intersubjectivity in response to a musical AO. Of course, the viability of a reception test using unguided association depends on the way in which the responses it produces are interpreted, collated and presented.
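One simple way of expressing a 'degree of intersubjectivity' is the proportion of respondents who mention a given VVA at least once. The Python sketch below assumes that each respondent's jottings have already been reduced to a list of short VVAs. The function name `vva_shares` and the sample data are invented for the illustration.

```python
from collections import Counter


def vva_shares(responses):
    """Return, for each VVA, the fraction of respondents mentioning it at least once."""
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter()
    for vvas in responses:
        # A set counts each VVA only once per respondent
        counts.update({v.strip().lower() for v in vvas})
    return {vva: n / total for vva, n in counts.items()}


# Invented VVAs from four respondents to one AO
responses = [
    ["sea", "lovers", "sunset"],
    ["lovers", "beach"],
    ["Sea", "sea", "boat"],
    ["romantic"],
]

shares = vva_shares(responses)
print(shares["sea"])     # mentioned by 2 of 4 respondents
print(shares["lovers"])  # mentioned by 2 of 4 respondents
```

Counting each VVA once per respondent, rather than once per occurrence, keeps one talkative respondent from inflating the figures.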

Classifying test responses

I’ve already described how, when trying to make semiotic sense of our respondents’ associations to the music we made them hear, Bob Clarida and I came to the conclusion that the two linguistically disparate VVAs Austria and shampoo had to be understood as musogenically similar when taken as responses to one and the same short extract of music. Of course, in order to argue that point we had to know ‘how much’ Austria rather than, say, Brazil or Japan and ‘how much’ shampoo rather than, say, guns or cigarettes our respondents imagined on hearing the reception test piece in question. That in turn meant devising ways of thinking about responses in categories like general moods and emotions, possible protagonists and background figures, animals, objects, scenes (geographical, ethnic, social, architectural, historical, etc.), action, movement, speed, stasis, spatiality, singularity, multiplicity, narrative, causality, and so on, while at the same time considering responses mentioning other pieces of music or other types of symbolic representation like drama, film or TV, including names of musicians, composers, actors, artists and directors. We eventually came up with a response grid, which we constructed on an ongoing basis to house the responses we received, so that we could report, for example, how much of which sort of humans, animals or insects (if any) were imagined doing what (if anything) in which way with what effect in which sort of setting and atmosphere at which time of day or night and at what time in history, in which type of weather, at what speed and with which type of movement, calmly or with agitation, or with humour, gently or threateningly, happily or sadly, robotically or gracefully, quietly and peacefully or noisily and frenetically, etc., etc. A short version of that response grid, with the overriding single-digit categories (1, 2, etc.) divided into double-digit subcategories (11, 12, etc.) and again into three-digit subtypes (111, 112, etc.), but without the original four-digit sub-subtypes (1111, 1112, etc.), occupies the next few pages. It’s included here as an illustration of how unguided associations to music can be grouped into semantic categories. It’s followed by explanations of its most important issues of theory and method.
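Because each level of the grid adds one digit to the code of the level above it, responses coded at the three-digit level can be tallied at the broader levels simply by truncating the code. The Python sketch below assumes that this prefix relationship holds throughout the grid and that codes are stored as strings, so that leading zeros (as in 031) are preserved. The function name `tally_by_level` and the list of coded responses are invented for the illustration.

```python
from collections import Counter


def tally_by_level(codes):
    """Tally coded responses at the one-, two- and three-digit levels of the grid."""
    levels = {1: Counter(), 2: Counter(), 3: Counter()}
    for code in codes:
        for digits, counter in levels.items():
            # A three-digit code such as '112' also counts towards '11' and '1'
            if len(code) >= digits:
                counter[code[:digits]] += 1
    return levels


# Invented coded responses: 111 and 112 are culturally positive affects,
# 212 is a single female, 031 is an episodic time position (future)
coded = ["111", "112", "112", "212", "031"]

levels = tally_by_level(coded)
print(levels[1]["1"])    # 3 responses in category 1
print(levels[2]["11"])   # 3 responses in subcategory 11
print(levels[3]["112"])  # 2 responses in subtype 112
```

Tallies of this kind are one way of saying 'how much' of a given category respondents imagined, at whatever level of detail the analysis requires.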

Table 6-1: VVA taxonomy — overview

0. Statistics and relative time position

00. Test statistics

001-003: blanks, recognitions, illegible responses

02. Synoptic time position

021. Start: curtain comes up, introduction, main theme, opening, overture

022. Middle: scene, episode (part of prod.), entr'acte, break (in action)

023. End: final scene, showdown, epilogue

03. Episodic time position

031. Future: about to…, will soon…; imminent, …is expected, [s.g.] will happen [now]; leading to…, [will have] consequences; [will] eventually

032. Present: at this moment, has just started, after a while, turns into, changes mood, we switch to, we follow, now [x happens], during, meanwhile

033. Past: goal reached, journey over, finally, has [done x], after a long time (in past), once again, used to [do x], what we did

1. General attributive affects

10. Culturally ambivalent

101. Relative dynamism: excited; emotions, stimulating; complicated

102. Relative stasis: usual, familiar, neutral; no danger, no problems; simple

103. Reflexion, sentimentality, lyricism: bitter-sweet; nostalgic; introvert

104. Determination: deliberate, confident, resolute

105. Abandon: uncontrolled, ecstatic, passionate, no holds barred, extravert

106. Balance, control: reserved, cool and collected, serious

107. Humour: comical, funny, jokes, irony

108. Cultural dominance: important, prestigious, grandiose, sophisticated

109. Cultural emergence: dare-devil, rebellious, cheeky, cool (hip)

11. Culturally positive

110. General: pleasant, all is well, good feeling, nice atmosphere

111. Love, kindness: friendly; romantic; seductive; gentle; kind; well-meaning

112. Tranquillity, serenity: peaceful, quiet, still, harmonious, relaxed

113. Joy, festivity: happy, carefree, amusing, celebratory

114. Beauty, attraction: good-looking, elegant, nice proportions

115. Lightness, openness, freshness: clear, fair, frank, fresh, young, pure, clean, free, luminous, transparent

116. Strength, pride, success: brave, heroic, victorious, honourable

117. Wisdom, trust: reliable, experienced

118. Order: correct, tidy, well organised, efficient

12. Culturally negative

120. Generic: bad, nasty, unpleasant, suffering, pain, disaster

121. Enmity, aggression, implacability: hate, rage, hostile, cruel, violent, destructive, vengeful, merciless

122. Disturbance, danger: unrest, adversity, setbacks, worried, troubled, threat, ominous, fateful, tense, scary, nerve-wracking

123. Sadness, boredom: disappointed, depressed, tragic, sorrow, melancholy, abandoned (alone), deserted, bored, alienated, monotonous, listless

124. Ugliness, repulsion: disgusting, revolting, crude, creepy, gross

125. Darkness, encumbrance, clandestinity, miasma: gloomy, hidden, stealth, heavy, confined, ill, decadent, dirty, rotting, dead, drugged, drunk

126. Weakness, fear, failure: hesitant, defeated, cowardly, miserly

127. Madness, futility, suspicion: absurd, stupid, useless, jealous, guilty

128. Disorder: messy, chaotic, confused, tangled, incomprehensible

129. Asociality: crime, delinquency, greed, robbery, prostitution, corruption

14. Culturally neutral

141. Asperity: rough, tough, sharp, jagged, hard, steep, bitter, sour, dry

142. Mollity: smooth, mild, soothing, rounded, curved, soft, wet, sweet

143. Heat: glowing, boiling, hot, warm, lukewarm

144. Cold: cool, freezing, icy

145. Largeness: big, huge, great, broad, wide, tall, high, long

146. Smallness: little, tiny, minuscule, narrow, short

147. Density: compact, crowded, full, deep

148. Sparsity: diluted, spread out, dissipated, empty, shallow

149. Colour: colourful, pastel shades, white, black, blue, green, yellow, etc.

2. Beings, props, gatherings

20. General

201. Generic gender: male, female (no person specified)

21. One human

210. Either gender: a figure, a person, a child, best friend (unspec.)

211. Single male: boy, man, [old] man, cowboy, cop, spy, soldier, hero, gangster, villain; + named males (e.g. Hitler, Bing Crosby, Bond, Dr Who)

212. Single female: girl, woman, heroine, princess, witch; Julie Andrews, Marilyn Monroe, Mother Teresa, Lisbeth Salander, Queen Elizabeth I

22. Two humans

220. Either gender: two people, me and you

221. Two males: two men, two buddies, Laurel & Hardy, Starsky & Hutch

222. Male and female: couple, lovers, Romeo & Juliet

223. Two females: best friends (fem.), Thelma & Louise

23. Several humans

230. Either gender: some people, [in] company, children, group of people

231. Males: sons, cowboys, ‘suits’, tough guys, goodies, baddies, football team

232. Females: girls, ladies, women, ladettes, ballerinas, prostitutes, nurses

24. Many humans

240. Either gender: crowd, many children

241. Many males

242. Many females

26. Props, objects, couture

261. Human body: hair, eyes, nose, mouth, arms, legs, hands, feet

262. Clothes: dressed up, skirt, uniform, jacket, coat, dress

263. Furnishings etc.: window, curtain, chair, fire, bath, swimming pool

264. Comestibles, props: food, drink, eggs, chewing gum, sugar, beer, cigarettes, drugs, balloons (ludic), smoke (cigs.), briefcase, plastic bag

265. Vehicles: boat, car, bike, train, aeroplane, helicopter, space ship

266. Appliances: machine, fan, rope, gun, chain saw, cigarette lighter

267. Stones, metal: gold, iron, jewels, treasure, bricks, concrete

268. Paper: book, newspaper, banknote

269. Mortal remains: carcass, corpse, skull, bones

27. Social activity

270. General: society, culture, night life

271. Ritual: wedding, funeral, initiation rite, confirmation (rite)

272. Festive: party, picnic, gala, festival, birthday, public holiday

273. Presentational: performance, parade, display, spectacle, circus

274. Sport: Olympic Games, World Cup, football, horse racing, swimming

275. Military: army, battle, war, navy, air force

276. Recreational: entertainment, holidays, excursion, on leave

277. Economic: business, bank, sale, marketing

278. Educational: school, college, academy

279. Religious: prayer, liturgy, Salvation Army

28. Domesticated animals

281-284. Pets (dog, cat). Livestock (cattle, sheep). Horses. Birds (parrot)

29. Wild animals

291-294. Predatory (tiger). Flock (buffalo). Birds (swallows, wild geese)

3. Location, scene, setting

30. General or indoors/outdoors

300. Generic setting: at home (geog.), abroad, heaven, hell, local area

301. Generic outdoors: in the open air, outside

302. Indoors: at home (dom.), at work; disco, club, bar

304. Subterranean: under ground, tunnel, cave

305. Generic buildings: house, palace, (railway) station

31. Rural

310. General: countryside, pastoral, rural, bucolic

311. Campestral: fields, meadows, cornfields

312. Edificial: farm, manor, (country) cottage, castle (rural setting)

313. Undulant: hills, valleys, slopes

314. Sylvan: woods, forests, trees

315. Horticultural: garden, lawn, flowers, fruit trees, spa, cemetery

316. Fluvial, lacustrine: rivers, lakes, brooks, creeks, (rural) canals

32. Panoramic

320. General: big country, broad expanses, vistas, open space, horizon

322. Flat: plain, fen, steppe, prairie, moor, savanna

323. Barren: wild country, desert, polar regions

324. Tropical: jungle, palm trees

33. Aqueous, aerial

330. Generic: water (unspecified)

331. Pelagic: sea, ocean, open water

332. Littoral: bay, inlet, beach, shore, island, archipelago, jetty

334. Aerial: air, clouds, sky

335. Cosmic: (outer) space, stars, planets, galaxy, universe

34. Miscellaneous outdoors

341. Natural: leaves, cliffs, den (animals), clouds of dust/sand

342. Artefactual: road, track, path, bridge, railway, highway

35. Urban

350. Generic: town, city

351. Thoroughfares: street, square, market place, 5th Avenue, Piccadilly

352. Neighbourhoods: slum, downtown, red light district, suburb

353. Edificial etc.: factory, skyscraper, supermarket, airport, fun fair

354. Traffic: (lots of) cars, traffic jam, rush hour

355. Miscellaneous: street lights, neon signs, (outdoor) adverts, asphalt, kerb

36. Social location

361. Upper class: aristocracy, rich, [haut] bourgeois, royalty

361. Middle class

362. Lower class: working class, unemployed, poor, ‘the little guy’

37. Geographical location

371-379. Northern Europe, Southern Europe, North Africa and Middle East, Subsaharan Africa, South Asia, East Asia, Australasia, Oceania, North America, Central America, South America etc.

38. Historical location

380. Generic past: bygone days, olden times, in the past, once upon a time

381. Distant past: prehistoric, ancient times

382-386. Middle-distant past: medieval, baroque, 19th century, etc.

387. Recent past (relative to time and aim of reception test)

388. Today: modern, contemporary, up-to-date (relative to time of test)

389. Future: tomorrow’s world, times to come, near/distant future

39. Weather, season, time of day

391. Weather: sun, rain, fog, haze, mist, wind

392. Season: spring, summer, autumn, winter

393. Time of day: night, day, morning, evening, sunrise, sunset, dawn

5. Explicit space-time relations, movements, actions and interactions

50. Generic movement

51. Essive relations

511. Inessive: in, among, in the middle of

512. Superessive: above, overhead, on high, high up, on top of

513. Subessive: under[neath]

514. Retroessive: behind

515. Pre-essive: in front of, facing, on the other side [of], opposite

516. Circumessive: around (static), surrounding (static)

517. Conessive: present (loc.), [is/are] there, alongside

518. Non-essive: absent, not there, missing

52. Velocity and simultaneity

521. Low speed: slow, gradually, all the time, for a long time

522. High speed: fast, quick, suddenly, momentary, short time

523. Simultaneity: at the same time, together, synchronised, in phase

524. Asynchronicity: out of time, out of step, out of sync, separate, divided

53. Non-specified movement, specific relative direction

531. Adventive: approach, arrive, enter, return [to here]

532. Exitive: leave, go away, part, [say] goodbye, walk out, escape

533. Transitional: pass, past [the window], [move] across [the field], [move] over [the meadow], forwards, along, between (mvt.), through

534. Ascending: [a]rise, go up, open up/out, reveal, upwards, from below

535. Descending: go down, close up/in/down, downwards, from above

536. Circular: [going a]round, circling, enveloping

54. Oscillatory and repetitious movement

541. Curvilinear: roll, undulate, wave, sway, whirl, spin, round and round

542. Tremulous: tremble, wobble, quiver, glitter, flicker, flutter, rustle, babble

543. Pulsating: throb, flash, jerk, pump, again and again

55. Prolapsual and volitative movement (directional)

551. Flowing (of liquids): flow, stream, run, pour

552. Floating, sliding (direction): float, sail, slip, slide

553. Volitative (direction): fly, glide, swoop

56. Specific movement, unspecified direction

561. Constant: shine, gleam, glare

562. Eruptive, tumescent, torrential: explode, gush, surge, burst

563. Pedestrian: walk, run, wander, trot, march, footsteps

564. Vehicular: travel, journey, ride, cruise, cycling, riding, driving

565. Ludic: play, perform, dance, swim, skate, hop, skip, jump

57. Stationary acts

570. Wait, hang around

571. Quiescent: rest, sleep, relax

573. Sedentary: sit

574. Upright: standing, on [his] feet

58. Suspension

580. Inactivity: motionless, do nothing

581. Aquatic: float (no direction)

582. Aerial: hover (no direction)

583. Other: hanging, dangling

59. Interaction

591. Appreciative, affectionate, respectful: ‘I love you’, marry, embrace, kiss, caress, smile, laugh, celebrate, salute, console

592. Conflictive, coercive, contusive: beat, hit, break, pierce, crash, shatter, smash, fight, struggle, bully, force, wound, shoot, kill, conquer

593. Cogitative, intentional: think, ponder, plan, try to, dream, decide, discuss, experience, feel, recognise, remember, understand, misunderstand

594. Transferential: push, pull, bring, take, drag, drive, carry, fetch, chase, accompany, follow, fill, empty, disseminate, spread, collect, retrieve

596. Symbolic communication: show, gesticulate, look, see, hear, listen, talk, whisper, shout, groan, sigh, cry, sing, read

597. Culinary: eat, drink, cook, fry, boil (tr.)

8. Media immanence

81. Musical

811. Genres and styles: classical, opera, jazz, punk, techno

812. Instruments: strings, brass band, orchestra, covers band, flute, trumpet, Fender Stratocaster, kick drum, piano, Hammond organ, church organ

813. Musicians: bass player, lead singer, Beethoven, Zappa, Britney Spears

814. Musical structure: singable tune, [nice] rhythms, dissonant, verse, chorus, bridge [section], minor key, diminished seventh

815. Musical works: Apache (Shadows), Boléro (Ravel), Liberty Bell (Sousa), Moonlight Sonata (Beethoven), God Save The Queen (Sex Pistols)

816. Dance: ballet, samba, shake, waltz; Pan’s People (cf. 232)

82. Extra- and paramusical

821. Cinema: feature film, movies, black and white film, cinemascope, silent film, Hitchcock, Disney, MGM; The Girl with the Dragon Tattoo, The Godfather, The Sound of Music, Taxi Driver, The World of Apu.

822. Television: TV series, [TV] documentary, news programme, soap opera, nature programme; Bonanza, Emmerdale Farm, Maigret, Wallander.

823. Videos, adverts, games: music video, [shampoo] advert, Mario Kart Wii

824. Radio: Melodiradion (Sweden), Radio 4 (BBC); DJ (radio)

825. Verbal media: books, poems, novels, newspapers, plays; Henning Mankell, Val McDermid, William Shakespeare

826. Other media: sculpture, painting; The Garden of Earthly Delights (Bosch), Kandinsky’s Composition X

83. Target groups

831-839: for the whole family, for children, young audience

84. Non-music genres

841-849: film noir, Western, science fiction

85. Production techniques

851-859: panning shots, cut-ins, slow motion

87. Production origin

871-879: as category 37, e.g. Czech [TV series], Hollywood [blockbuster], Bollywood [musical], Hong Kong [martial arts movie], Japanimation, Manga [movie], spaghetti [Western]

88. Production vintage

881-889: prewar [film]; 1980s [game show]

9. Evaluative and judgemental

91. Positive evaluation

911-919: enjoyable, good [tune], well produced

92. Negative evaluation

921-929: bloody awful, brainless, third rate, contrived, trash, kitsch, slushy, cloying, syrupy, schmaltz, speculative, badly produced


VVA taxonomy issues and explanations

The obvious advantage of a taxonomy like the one just shown is that you can group, say, the VVA love under kindness and romance (category 111) rather than with its alphabetical neighbours lousy (category 92), lout (129) and Löwenbräu (264). The taxonomy is, however, not without problems.

[1] Demographic inadequacy and cultural specificity. The taxonomy presented above, based on 8,552 VVAs collected in the early 1980s from over 600 individuals (mainly Swedes and Latin Americans) responding to ten different film and TV title tunes, cannot represent in any semantically exhaustive way the totality of those respondents’ imagination on hearing those pieces. That’s because what we present in such a list is the result of no more (nor less) than our interpretation and classification —in their turn based on criteria described below (§7 pp. 219-221)— of verbal-visual responses that in themselves inadequately express what the music ‘means’ to each respondent. Of course, that is the nature of the beast because, as already stated several times, trying to put music directly into words is a pointless undertaking. Still, that is mercifully not the object of this kind of reception test whose VVAs need to be considered metaphorically and musogenically, not just in terms of literal verbal denotation. However, the most substantial problem with our taxonomy is its cultural specificity: 8,552 VVAs from 561 Scandinavians and 46 Latin Americans hearing ten short pieces of stereotypical title music in the 1980s represents an absolutely infinitesimal part of all the VVAs imaginable in response to any music heard by any population at any time in any place. For this reason our taxonomy should be understood as just one example of VVA classification among a virtually infinite number of possible variants. It’s in no way intended as a universally applicable or scientifically watertight taxonomy.

Here it’s worth noting that aesthesically based structural denotors also run into problems of cultural difference. For example, while it’s quite common in English to call a reverb ‘wet’ if its secondary signals create a constant and fairly loud ‘wash’ (long decay time), the same expression translated into Italian — un eco umido or un eco bagnato— ‘means nothing’, Franco Fabbri told me once in answer to the question ‘How would you translate wet echo into Italian?’. To explain what I meant by ‘wet’ I had to make a ‘schplAaaaaaaafff!’ sound lasting about three seconds, while using outstretched arms and cupped hands to symbolise a very large space. ‘I see’, replied Franco, ‘un eco della Madonna’, whose literal translation back into English, ‘an echo of Our Lady’, would make as little sense to English-speaking studio engineers as un eco bagnato would to their Italian colleagues.

[2] Returning to VVA classification, the cultural specificity of the music, the respondents and the time at the basis of the taxonomy can also cause problems. For example: [i] historical location categories 387 and 388 (recent history and today/modern), altered here from the 1980s to fit the year 2012, will be in constant need of adjustment. That’s because the 1970s were, at the time of the actual reception tests, recent and the 1980s up-to-date, today and modern, and because 2012, as I write these words, will be history by the time you read this; [ii] the national cultures of our respondents and the mainly English-language origins of the music they were subjected to are reflected in what may seem like ethnocentric categories of geographical location. These responses necessitated a fine-tuning of Europe (categories 371-374, 871-874) while few or no distinctions needed to be made under Asia (375, 875) or Africa (376, 876) because we received virtually no responses referring to anywhere Asian or African. Clearly, this aspect of the taxonomy has to change according to demographic, musical, cultural and historical factors relevant to each reception test situation.

[3] Taxonomic fine-tuning. The original taxonomy has four, not just three, levels of categorisation and most VVAs from the reception tests on which it is based are arranged accordingly. Romance, for example, sorts under category 1112 together with romantic love but not with just love on its own (1111), i.e. neither with love that might just as well be brotherly, parental or patriotic, nor with tender and gentle (1117). That’s simply because romance isn’t always tender and because confusing parental with romantic love would be incestuous: you just don’t feel the same sort of love towards your lover, your child and your nation and that means different music for all three types of love. Still, despite such important types and subtypes of love, of music and of human behaviour, those related concepts belong to the same main three-digit category 111 (Love and kindness) which is distinct from other positive three-digit categories like Joy and festivity or Lightness and openness (113 and 115 respectively, also positive but not necessarily love) and, much more radically at the opposite end of the affective spectrum, from 121 (Enmity and aggression) or 125 (Darkness, encumbrance, clandestinity and miasma). The four-digit categories are excluded from the taxonomy shown above not because they are unimportant but for reasons of space and clarity. In other words, the particular type of taxonomic fine-tuning just illustrated down to the four-digit level in our classificatory grid may well be irrelevant to reception tests whose aim and scope differ from those of the list occupying pages 209-215. Even so, the 1-, 2- and 3-digit levels may still be of some use for many reception test situations.
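The way the levels of the grid nest inside each other can be sketched in a few lines of Python. This is only an illustration, not a description of how the original classification was carried out: it assumes that category codes are stored as strings of digits and that a broader category is obtained simply by cutting a code down to its first digits. The function names (ancestors, at_level, share_category) are hypothetical.

```python
def ancestors(code):
    """Return the broader categories of a code, most general first.

    Assumption: a parent category is the code minus its last digit(s).
    """
    return [code[:i] for i in range(1, len(code))]


def at_level(code, digits):
    """Collapse a code to the given number of digits (e.g. 4 -> 3)."""
    return code[:digits]


def share_category(a, b, digits=3):
    """True if both codes fall under the same category at that level."""
    return (len(a) >= digits and len(b) >= digits
            and a[:digits] == b[:digits])


# Codes quoted in the text above.
LOVE = "1111"          # love on its own
ROMANCE = "1112"       # romance, romantic love
TENDER = "1117"        # tender, gentle
NERVOUS_TENSION = "1223"

print(ancestors(ROMANCE))                         # broader categories of 1112
print(at_level(ROMANCE, 3))                       # three-digit category
print(share_category(ROMANCE, TENDER))            # same three-digit category?
print(share_category(ROMANCE, NERVOUS_TENSION))   # different three-digit category
```

Collapsing four-digit codes to three digits in this way is one possible route to the coarser grid shown on pages 209-215, if the finer distinctions are irrelevant to a particular reception test.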

[4] Polysemic VVAs. As already explained, responses in the form of unguided associations need to be discretised into individual concepts so that, for example, love in an original response like The femme fatale whispers ‘I love you’ while sipping her cocktail and romance in Film noir romance from 1950s can both be considered indicative of similar love and romance connotations in response to the same music. That sort of classification is relatively unproblematic but some concepts, not least proper names, are not so simple thanks to the wealth of further connotations they carry. James Bond is a classic example of that problem.

The VVA James Bond can be correctly classified as a single named male person, real or fictional (category 211X), and, indeed, the presence or absence of a male individual is an appropriate item of musogenic information to register. However, that single male name also connotes or implies spy (category 2116), thriller (841T), tough (1091), hard (1412), adventure/action (1015), excessive bravery (1092), sex (1055, not romantic love), women (232), most of whom are probably ‘ooh-la-la’ (1145), not to mention villains (2319), murder (5928), crime (1290), etc., etc. Depending on the number of respondents you’re dealing with, there are two ways of dealing with this issue of verbally connotative polysemy. If you have many respondents, you’ll almost certainly find that the music eliciting the Bond VVA from one person gives rise to VVAs from other respondents in the other Bond-related categories just listed or that the single Bond respondent has him/herself included VVAs in one or more of those categories. Otherwise, if you only have a few respondents you can consider including cross-references to Bond as a single named male (cat. 211X) from whichever of the other categories you consider relevant.
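The cross-referencing option for small respondent groups can be sketched as follows. The category codes are those quoted in the paragraph above; everything else (the two dictionaries, the function name, the decision to keep cross-references in a separate count) is an assumption made for the sake of illustration and not part of the original study.

```python
from collections import Counter

# Primary classification of a polysemic VVA (code quoted in the text).
PRIMARY = {"James Bond": "211X"}

# Categories the name also connotes or implies (codes quoted in the text).
IMPLIED = {
    "James Bond": ["2116", "841T", "1091", "1412", "1015", "1092",
                   "1055", "232", "1145", "2319", "5928", "1290"],
}


def tally_with_cross_refs(vvas, with_cross_refs=False):
    """Count primary category hits and, optionally, cross-references.

    Cross-references are kept in a separate counter so that implied
    categories are never confused with what respondents actually wrote.
    VVAs missing from PRIMARY are skipped in this sketch.
    """
    primary = Counter()
    cross = Counter()
    for vva in vvas:
        if vva not in PRIMARY:
            continue
        primary[PRIMARY[vva]] += 1
        if with_cross_refs:
            for code in IMPLIED.get(vva, []):
                cross[code] += 1
    return primary, cross


# Few respondents: include cross-references to the Bond VVA.
primary, cross = tally_with_cross_refs(["James Bond"], with_cross_refs=True)
print(primary)
print(sorted(cross))
```

With many respondents the cross-references can simply be switched off, on the expectation that other respondents will have supplied VVAs in the related categories themselves.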

[5] Verbal context. Unguided associations demand that VVA classification take verbal context into consideration. For example, abandon means both leave in the lurch (e.g. ‘an abandoned child’, cat. 1236) and letting yourself go in the sense of ‘no holds barred’ (cat. 105). Those two emotional states suggest very different music, as do over (as in Over the Rainbow), over (riding over the prairie), over (the party’s over) and over (a dark cloud over the city). In such cases it’s not just a matter of interpreting VVAs according to the verbal context of the response in question: it’s also necessary to think, as in the case of all those overs, musogenically in terms of kinetic, spatial, gestural and tactile difference. After all, the response words were elicited by music and not vice versa. Put simply, the musogenic difference between a dark cloud over the prairie and riding over the prairie is quite significant.

[6] Distanced VVAs. At least two types of ‘distanced’ response demand special consideration: [i] those in quotes, for example, ‘nice’, ‘heroic’, ‘love’, ‘freedom’ and [ii] negative or diminutive VVAs, for example not military, or not too much violence, or slightly scary. Those who offer these sorts of response clearly think that the music in question is supposed to connote something specific (a nice, happy or scary feeling, notions of freedom, heroic deeds or of misery and so on); but the same respondents just as clearly question the credibility of that supposed connotation. If these types of response, however distanced or critical, identify specific connotative categories (nice, happy, scary, etc.) it is appropriate to register that recognition because the respondents in question did so. At the same time, respondent distancing from that recognition also needs to be registered. That’s why it can be useful to include a special ‘distanced VVA’ subtype under the relevant main category, for example 110~ for ‘nice’ under 110 (Positive in general), 121~ for not too much violence under 121 (Enmity, etc.). That classification device lets you account for both recognition and critical distance when discussing the effects of the music you’re analysing.
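One way of registering both recognition and distance in a tally is sketched below. It assumes, purely for illustration, that classified responses are stored as code strings and that a trailing tilde marks the distanced subtype, as in 110~ and 121~ above; the function names are hypothetical.

```python
from collections import Counter


def split_code(code):
    """Split '110~' into ('110', True); a plain code gives (code, False)."""
    distanced = code.endswith("~")
    return code.rstrip("~"), distanced


def tally_distanced(codes):
    """Count recognition and distancing per main category.

    Every response counts as recognition of its category; responses
    marked with '~' are additionally counted as distanced.
    """
    recognised = Counter()
    distanced = Counter()
    for raw in codes:
        base, is_distanced = split_code(raw)
        recognised[base] += 1
        if is_distanced:
            distanced[base] += 1
    return recognised, distanced


# Hypothetical responses: a plain "nice", a quoted 'nice',
# and "not too much violence".
recognised, distanced = tally_distanced(["110", "110~", "121~"])
print(recognised)
print(distanced)
```

Keeping the two counts apart means that a category can be reported as recognised by, say, two respondents, one of whom questioned its credibility.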

[7] Visual bias. The VVAs at the base of the taxonomy on pages 209-215 are exactly that —verbal-visual associations— and the visual character of many responses to our reception test pieces is exactly what we had encouraged our respondents to come up with. Now, it would be perfectly reasonable to object that our interest in the visual misrepresents musical perception which under normal circumstances seems to cause few, if indeed any, images to appear in listeners’ minds. From this valid standpoint it is logical to argue that tactile, gestural, sonic, spatial and kinetic, not visual, connotations should have been the focus of our study. I wish that had been possible. The trouble with this laudable line of reasoning is that it is impracticable because it assumes non-visual modes of cognition to have an equal status to that of vision in our scopocentric tradition of knowledge. The problem is that words descriptive of touch, of gesture, of sound, of para- or extramusical space and movement are so much less common than those denoting what we see. As Johnson (2003 §5) notes:

‘English is very strong in visual modes. Read a page of English and try to delete all visual metaphors. Even harder: replace them with aural ones. It becomes instructively frustrating to discover how many terms we take for granted in discussing ways of knowing, for which we have only visually oriented vocabulary’.

Space and movement are more often than not popularly verbalised, concretised and, yes, visualised in terms of beings, objects and places that are, however tautological it may sound, visible, either in reality or in the mind’s eye rather than in the mind’s ear or at the mind’s fingertips. The sad conclusion here is that if we want to understand music’s meanings through the ears and minds of its final arbiters of signification (the whole point of the reception tests discussed here) we must, at least in our scopocentric tradition of knowledge, rely to a large extent on verbalisation of the visual as an unavoidable symbolic intermediary. Of course, those verbalisations can in themselves never be much more than metaphorical hints of whatever the music really seems to be expressing. Or, to use two visual metaphors, we shall at best ‘see through a glass darkly’ or be ‘the one-eyed man in the land of the blind’. But that doesn’t mean the responses discretised and classified in our VVA taxonomy let us ‘see’ nothing at all. The common denominator of the Austria of Julie Andrews in The Sound of Music and of the shampoo in the Timotei commercial (pp. 167-169) certainly suggests otherwise, as does the fact that our respondents unequivocally agreed in considerable detail about what the different test pieces connoted. For example: [1] no children (category 2301) were imagined in connection with the test battery’s only romantic love (cat. 1112) tune; [2] armed forces (275) and a total absence of reflective thought (103) were exclusive as combined characteristics for the only march; and [3] nervous tension (1223), sweat (2619) and no rural scenario (31-32) constituted a combination of VVAs exclusive to another of the ten tunes.

[8] Generic annexing. One problem imposed by the necessity of visual imaging as an intermediary mode of perception is that respondents sometimes come up with VVAs whose semiotic link to the music they’re hearing may seem inexplicable. For example, the second of the Ten Little Title Tunes (TLTT) was a Western theme eliciting several Clint Eastwood and Italian Western responses even though the test piece contained very little resembling Morricone’s iconic sounds for the ‘dollar’ movies. In such instances the process of generic annexing works something like this: [1] ‘this is obviously a Western theme’; [2] ‘the Westerns I remember best were Italian and starred Clint Eastwood’. Those VVAs derived in other words much less from interobjective similarity between the test piece and Morricone’s music for the Sergio Leone Westerns, much more from the visual annexing of narrative tropes typical for the Western as a much broader genre. The same goes for the few respondents who mentioned Indians: nothing in the test piece resembled all those familiar Hollywood cues of ‘Injun’ savagery in productions like Stagecoach (Steiner & Hageman 1939), Valley of the Sun (Sawtell 1942) or How the West Was Won (Broughton 1976). Those listeners simply annexed Indians as an automatic ingredient of visual narrative in relation to music that clearly spelt Western as a narrative film genre but which gave no sonic hint of any Indians.

As long as you’re aware that the visual, non-musogenic extension of an overall narrative genre suggested by the music can occur in reception test situations there need be no major problem. That’s mainly because generic annexing is the exception rather than the rule in responses to test pieces and because it is usually more than adequately counterbalanced by a majority of patently musogenic VVAs.

[9] Taxonomic criteria. A quick glance through pages 209-215 might give the impression that the taxonomy was based on subjective intuition. That objection is partly valid because Bob Clarida and I did from time to time ask each other what sort of music a previously unclassified VVA demanded so that we could compare the music we imagined suitable for that VVA with what we knew to be typical for a particular subcategory or subtype already included in our taxonomy. If such a category already existed we could classify the new VVA accordingly or, if unclassifiable in the grid as it existed at that point in time, we could create a new subtype for it and often, as it turned out, welcome others into its company. That procedure was, however, secondary and normally used only if our overriding classification criteria proved inadequate. Those overriding criteria derived from two sources: Polish musicologist Zofia Lissa’s list of film music’s functions and the descriptive tags and titles given to pieces in library music collections. Both are worth considering here because they represent widespread everyday practices in the verbal characterisation of musical meaning.

Lissa and library music

Authors treating the subject of music and the moving image in any depth usually propose some kind of system organising the different ways in which music relates to the images, sounds, dialogue and narrative it accompanies. Such classifications of film music’s functions are clearly relevant to anyone writing about music ‘as if it meant something other than itself’ and have influenced the construction of our VVA taxonomy (p. 209 ff.). I’ve always found Zofia Lissa’s systematisation of film music functions particularly useful (it is explained in Chapter 14, p. 546 ff.) because she constructs, discusses and exemplifies her classification through musicological argumentation that allows for verbalisation of a musical understanding of musical functions. For example, referring back to our classification grid, Lissa’s function number 1 (underlining movement) is closely related to our category 5, her function 3 (location) to our categories 30-37 and her function 4 (representing time) to our categories 02, 03, 38 and 39. However, although influential at this general level in the development of our VVA taxonomy, film music function classifications, including Lissa’s, were of less use when it came to the finer distinctions of musically constructed semantic fields at the three- and four-digit levels. Here we had to turn to our own musical experience and, more importantly, to library music characterisations of musical message.

Library music is also known as production music and, as those names imply, denotes a collection of recordings of almost exclusively instrumental music, each of which can be taken out of that collection (the ‘library’) for use as jingles, title themes, underscore etc., typically in TV and radio programming, in adverts and in low-budget films. Library music differs from music specifically commissioned for particular audiovisual productions (the usual procedure with film music) in that it is created and recorded in advance, in isolation from and without prior knowledge of any particular production in which it might later be used. Since library music is rarely conceived for use in a particular media production it cannot guarantee the uniquely customised ‘feel’ or exact synchronisation which a good working relationship between the composer and the film director or TV producer can create. Library music is in this sense a contradictory phenomenon: it has to be specific by providing particular moods, scenarios and dramatic functions, but it is at the same time generic because those particular moods and functions must have the potential to be used at any suitable point in any media production. Since the specific musical demands of specific situations in specific media productions can be most satisfactorily met through direct contact between director/producer and composer, library music companies have to compensate for their disadvantage in this respect by being as specific as possible about the character of each track in their repertoire. These verbal specifications appear on vinyl sleeves, in CD inlays or in the catalogues, indexes and databases issued by the larger library music companies. Table 6-2 (p. 225) presents, in alphabetical order, a wide but by no means exhaustive selection of descriptive tags culled from various library music collections.

The categories listed in Table 6-2 are, from the standpoint of verbal taxonomy, pretty disparate. They refer to: [1] musical genres, instruments or structural traits (e.g. jazz, pop, soul, classical, strings, guitar, percussion); [2] states of mind (happy, sad, sentimental, solitude, love, stress, etc.); [3] synoptic or episodic functions (openings, links, bridges, titles); [4] narrative genres (adventure, thriller, detective, Western); [5] historical periods (medieval, contemporary); [6] generic scenarios (sea, nature, rural, water, foreign); [7] named locations, regions or cultures (African, Celtic, Oriental); [8] animals (birds); [9] social functions, rituals and activity (sport, funeral, science, industry); [10] speed and movement (fast, slow); [11] affective descriptions of people, locations, actions and environments (big, clumsy, dainty, glamorous, impressive, eerie, intimate, urgent) and so on.

Table 6-2: Selection of library music descriptive tags

action, adventure, African, amusement, ancient
animal, archaic, Asian, atmospheric, austere
ballad, bands, battle, big, birds
bridges, bright, bucolic, Celtic, children
classical, closes (ends), clumsy, comedy, contemporary
corporate, dances, danger, dark, detective
disaster, drama(tic), dreaming, ecological, eerie
electronic, endings, enterprising, ethereal, exotic
fashion, fast, festival, folklore, foreign
funeral, futuristic, glamour, grandiose, Gregorian
grotesque, gruesome, guitar[s], happiness, heavy industry
horror, humour, hurry, impressive, industrial
industry, intimate, introspective, jazz, jingles
joyfulness, laboratory, Latin-American, light industry, links
love, luxurious, majestic, marches, medieval
melancholy, melodic, melodrama, military, monotony
musette, mystery, national, nature, neutral
night club, nostalgia, obsessive, olden times, open air
openings, organ, Oriental, panoramic, parody
passion, pastoral, pathétique, percussion, period
playful, pop, prestigious, purity, relaxing
religious, rhythmical, ritual, rock, romance
royal, rural, sad, scenic, science
science fiction, sea, sentimental, serious, 17th century
slow, solitude, solo instrum., soul, S. American
space, spectacular, sport, storm, stress
strings, suspense, swing, symphonic, synthesiser
tails (ends), tender, tense/tension, themes, thriller
titles, traditional, tragic, transitions, travel
underwater, urgent, violent, vocal, water
wedding, Western

Table 6-2 and the comments preceding it deal with verbal clues about music conceived to facilitate the musical production of audiovisual presentations by individuals, most of whom, like the majority of our respondents, lack formal musical training. The main difference between our respondents and the average user of library music is that the former are at the receiving end, the latter at the transmitting end of the musical communication process. In other words, respondents have to come up with words descriptive of music rather than with music already hinted at in words. The importance of this dialectic is that in both cases the words referring to music are not so much structurally descriptive of music (poïetic) as functionally or synaesthetically grounded (aesthesic) in the established everyday practices of musical perception in an audiovisual mass-media context. This semio-musical reality makes the descriptive terms used by library music catalogue editors a logical starting point in the establishment of a taxonomy of verbal-visual associations (VVAs) to music. Of course, the taxonomically disparate concepts listed in Table 6-2 needed arranging more systematically for the purposes of response classification, but they do constitute the raw materials of our taxonomy.

With library music we are back to the start of this chapter and to the idea that ‘VVAs in response to a particular AO can be gathered by studying writings about the AO in reviews, album inlay notes, blogs and so on’ (p. 200). That ‘so on’ is important because it includes the working vocabulary of library music company staff describing individual recordings in their collections so that potential users will have at least some idea as to whether the music so described will communicate whatever it is they want their audience to experience. And yet this working tradition of everyday music semiotics in practice, which includes choosing a suitable title for each piece, this tradition of using words on a daily basis in media production to characterise musical messages, seems to be either unknown or of little interest to scholars of music semiotics, at least to the extent that I’m unaware of any reception research into whether the characterisations offered by library music company staff actually tally with what respondents imagine or feel on hearing the music in question. If that is so we’ll be unable to make use of library music’s working repertoire of aesthesic musical descriptors when discussing the meaning of other music. But there is a way out of the epistemic impasse just described.


If we could establish verifiable links of structural similarity between our musical analysis object (AO) and certain pieces of library music, then we could test the hypothesis that some of the structurally similar library music’s verbal characterisations might also apply to our AO, i.e. that similar musical signs relate to similar musical interpretants. We could also look for structural similarity between our AO and other pieces of music, perhaps a song with lyrics, or a particular type of dance, or music for particular types of scene in theatre, film, TV or games productions. We could then test hypotheses about semiotic links between our AO and those lyrics, dances and scenes. These procedures are the main subject of the next chapter.


Summary of main points

[1] Intersubjectivity arises when at least two individuals experience the same thing in a similar way. Intersubjectivity is important when trying to understand how music is received, used and interpreted.

[2] There are at least six good reasons why focus has to be on the aesthesic pole when applying intersubjective approaches to understanding music and what it communicates.

[3] Ethnographic observations can be useful in intersubjective studies of music but the most common and direct way of finding out what sort of intersubjectivity exists in relation to a piece of music is to carry out some kind of reception test.

[4] Many different factors determine how a reception test will be conducted. How many respondents? Using interviews, handwriting on paper or online questionnaires? These choices are influenced by factors like methodological and demographic focus in terms of audience type, social scene, style-specific issues, etc.

[5] Unguided responses are more reliable and informative than results gathered through multiple choice tests.

[6] Full answers from each respondent have to be discretised into individual VVAs (verbal-visual associations) and sorted into categories so that it becomes clear which kinds of association respondents made, and how often, on hearing each piece in the test.

[7] A four-tier taxonomy is presented as starting point for VVA categorisation work. Special care needs to be taken with polysemic VVAs, distanced VVAs, and questions of cultural specificity.

[8] Systematisations of film music functions and, in particular, library music categories and descriptions provide interesting models for grouping VVAs into useful categories. Library music is also an excellent source when tracking down IOCM (see Chapter 7).






7. Interobjectivity



In Chapter 6 we started trying to unpack the black box of musical meaning (Figure 7-1). Ethnographic observation, reception tests and a taxonomy of VVAs led to the establishment of shared subjectivity of response, as evidence of ‘other things than just music’ that demonstrate the existence of semantic fields linked to musical structure in an analysis object (AO). Those ‘other things’ are called paramusical fields of connotation, or PMFCs for short. The links are not extra- but paramusical because they exist alongside or in connection with the music, as an intrinsic part of musical semiosis in a real cultural context, not as external appendages to the music. The VVAs in Chapter 6 —all verbalised in terms of movement, location, mood, feeling and people, all those library music titles and descriptions etc.— are intrinsically paramusical. They are essential to the establishment of PMFCs, i.e. of particular semantic fields connected to particular sets of musical sound in particular cultural contexts. Now, the PMFCs in Chapter 6 derived mainly from intersubjective observations of response in relation to particular structural configurations in particular pieces of music. This chapter focuses on an interobjective approach to musical semiosis (Figure 7-2, p. 238).

Interobjectivity clearly has something to do with relationships between objects. It presupposes that objects consist of structural elements, and that one object can be more or less like another depending on the elements, if any, they share in common. Now, if any of music’s structural elements are, as we’ve argued, capable of carrying meaning we’ll need first to have some idea of what is meant by three concepts: [1] a musical object; [2] a musical structure; [3] ‘a musical structure that carries meaning’ or museme. With those working definitions behind us we’ll be able to focus more clearly on interobjective procedures.

Basic terminology

Object and structure

In the expression ‘analysis object’ (AO), object is not used in the Peircean sense (p. 156). Here it just means an identifiable piece of music in audible form, the object of analysis. It can be a pop song, a classical symphony movement, a jingle, a film music cue, a TV theme etc., and it usually has a name or title of some sort. When used in this sense, a musical object, if stored as recorded sound, will typically occupy one CD track or constitute a single audio file. Therefore, the interobjective procedures explained later in this chapter involve the establishment of sonic relationships between an analysis object (AO) and at least one other musical object (piece, song, movement, track, etc.). The recurring proposition in interobjective analysis is that something in musical object A (the AO) sounds like something in musical object B (or C or D… or Z).

Now, that something that sounds like… could be almost anything. It might be a turn of melodic phrase, a riff, a sonority, a rhythmic pattern, a harmonic sequence or type of chord, a particular use of particular instruments, of vocal timbre, of acoustic space, any of which could be presented at a particular speed in a particular register at a particular level of intensity and so on. Any such ‘something’ can be poïetically identified as a particular configuration of different parameters of musical expression of the sort just mentioned (rhythm, pitch, timbre, etc.). It will also usually be a combination of several such ‘somethings’. It could be a particular harmonic sequence played by particular instruments using a particular rhythmic pattern, or a particular melodic turn of phrase delivered with a particular vocal timbre at a particular pitch and volume in a particular type of acoustic space towards the front, back, left, right or centre of the mix. Most of these ‘somethings’ will be short enough to fit into the extended present but they can also be processual, comprising the order and manner in which different sections (episodes) in the AO are presented, varied, extended, shortened or repeated.

Whatever the exact structural characteristics of the possible types of ‘something’ may be, I just used poïetic rather than aesthesic terms to exemplify those constituent aspects of a musical object, i.e. I used terms derived from the process of constructing the sounds rather than from how they’re perceived as communicating anything other than themselves. The ‘somethings’ of the previous paragraph are in that sense qualifiable as structural because any one of them can be conceptualised as a musical structure regardless of semiotic potential. Just like these words typed into my computer, written to disk and useless until they are read or heard, musical ideas also have a semiotically dormant mode of existence, whether stored as an audio recording, or as a score, or in the brain cells of individuals constituting a musical community: they are also useless until they are reproduced and heard inside the head or out loud. In other words, a musical structure, as a poïetically determinable entity and set of sounds in physical form, always has the potential to become a sign in Peirce’s primary trinity of semiosis. In that case its status as sign presupposes that the structural entity materialises an initial idea or intention, and, more importantly, that it’s linked to an interpretant. If such a structure is not considered semiotically it remains just that, a mere structure, but if it’s considered along with intended or perceived meaning it also becomes a sign, a structural item of musical signification. A structural item with semiotic properties in music will be called a museme. If only things were that simple…


The term museme was coined by Charles Seeger (1960: 76).

[It is a] ‘unit of three components —three tone beats— [which] can constitute two progressions and meet the requirements for a complete, independent unit of music-logical form or mood in both direction and extension…. It can be regarded as… a musical morpheme or museme.’

The last part of this statement is clear enough: if a morpheme is the smallest linguistic unit that has meaning in and of itself then a museme is the smallest unit embodying meaning in music. If that is so, Seeger’s explanation of the term is problematic for several reasons.

Tone, as in ‘tone beat’, is the first problem with Seeger’s definition of museme. If tone means a note of discernible fundamental pitch, then a musical structure consisting of three notes without discernible fundamental pitch, as in a drum pattern, would have no ‘music-logical form or mood’ and would carry no meaning. Since that conclusion is both false and an insult to drummers let’s assume that Seeger meant ‘three notes’, using note in the MIDI sense of the word, i.e. a single, discrete sound of finite duration in a piece of music, whether or not the sound is tonal. At least that definition caters for the connotative distinction most Western listeners are capable of making between, say, a symphonic timpani roll and a funky drummer loop. It would also let us use the term museme to ‘horizontally’ identify meaningful units of rhythmic and melodic structuration, i.e. in terms of at least three consecutive notes, and to think about such unlayered musemes as constituent elements in single-strand units of musical meaning (museme strings), as evidenced in musical motifs, phrases, ostinato patterns or riffs, etc. So far, so good. The trouble is that musical meaning is not solely dependent on note sequences (the diachronic, ‘horizontal’ aspect). It is, as we’ll see, at least as much a matter of simultaneous layering (the synchronic, ‘vertical’ aspect) of notes.

This is neither the time nor place to discuss the epistemological background to Seeger’s pioneering ideas about musical meaning, except to say that its problems may derive partly from the type of linguistic theory circulating in his day, partly from conventional musicology’s fixation with narrative form (diataxis) and an apparent reluctance to deal with semantics or pragmatics. Many linguists have since Seeger’s day argued that prosodic aspects of speech —timbre, diction, intonation, volume, facial expression, gesture, etc.— are semiotically at least as important as the words they accompany. If such layering of sonic structuration is important to the mediation of meaning in speech, it’s absolutely essential and intrinsic to music because notes cannot exist without the sound carrying them, be that sound and its note[s] imagined inside your head or heard out loud. To put it in simple terms from the musician’s standpoint, the sound you put with the notes —how you play or sing them— is semiotically at least as important as the notes you put with your sound. Neither can exist as music without the other and, when it comes to musical signs, the how (notes or sound) is inevitably an intrinsic and inseparable part of the what (sound or notes). These ideas may become clearer with a bit of concretisation.

The two statements don’t worry about me said nonchalantly and don’t worry about me spoken with bitter resentment quite clearly send no more the same message than do the first line of your national anthem played by a professional symphony orchestra accompanying a large chorus of trained voices and the same passage sung out of tune, with the wrong words, by someone with a foreign accent accompanied by two drunks mistreating a concertina and a battered old acoustic guitar. Of course, the difference between the first three sung notes of one national anthem and another is semiotically significant, however they are performed, because that difference allows listeners to musically distinguish one nation from the other during, say, TV coverage of the Olympics. That said, the way those notes are sounded is at least as important, for while the symphony orchestra version of your national anthem may well be heard in terms of national pride and dignity, the foreign drunks are more likely to come across as disrespectful, as performing a musical equivalent to burning the flag. That cardinal difference between pride and ridicule is just as much a matter of musical structure (volume, timbre, instrumentation, intonation, accentuation, phrasing, etc.) as the notes (pitch and rhythm profile) telling us which nation’s patriotism is being extolled or dragged through a dung heap. All such structures and their connotations are in other words determined by different use of music’s various parameters of expression as well as, of course, by culturally specific conventions of musical perception and interpretation.

Now, assuming, at least for the time being, that museme means a minimal unit of musical meaning, it could be argued that the first notes in the tune of the Star Spangled Banner and of the Marseillaise each constitute a museme if neither of them can, as a sequence of notes producing a particular profile of rhythm and pitch, be broken down into smaller units that carry any meaning in themselves. But it would also imply that the official symphony and raucous drunks renderings of those two national anthems mean the same thing. That would be absurd because both versions of the two national anthems clearly contain other structural elements that semiotically link not to France or the USA but to interpretants which can be referred to in terms like patriotic pride and national ridicule respectively, regardless of which nation is the object of eulogy or derision. Moreover, both those types of ‘other’ musical sign can be broken down into smaller meaningful units, for example what the symphony orchestra’s string section plays on its own, or the sound of the drunk’s concertina without the raspy foreign vocals. And even those smaller but musically meaningful units may in their turn be reducible to yet smaller meaningful entities until the point where only one meaningful note is left, like the single-note museme struck on tubular bell at 0:04 in the title music for Monty Python’s Flying Circus.

If a museme can consist of as little as one single note, Seeger’s three-note criterion for qualification as a museme doesn’t work. Indeed, a one-note museme can exist because its semiotic charge relies more (though not exclusively) on how it’s constructed ‘vertically’ —by the way it’s struck on which instrument at which volume over which chord played by which other instrument[s] in which register in which tonal idiom and so on— than on its immediate ‘horizontal’ context (by its relation to whatever precedes and follows it). This clearly means that explanations of musical semiosis need to consider several individually meaningful layers that sound simultaneously but which do not necessarily occupy the same duration as each other. These composite layers of simultaneously sounding musemes are called museme stacks and constitute ‘now-sound form’ or syncrisis (Chapter 12). They’re particularly useful when forming hypotheses about which structural elements in an AO may be linked to which sort of interpretants.

Returning to the initial melodic notes of your national anthem performed in two different ways, Table 7-1 (p. 236) identifies the first museme (1a) as the first part of its first melodic line (e.g. the ‘Allons, enfants’ part of ‘Allons, enfants de la patrie’ in the Marseillaise, or just the ‘Oh, say’ bit of ‘Oh, say, can you see?’ at the start of The Star Spangled Banner) in both the symphonic (A) and drunk (B) versions. As suggested above, the most obvious interpretant for museme 1 in both versions is the official identity of the nation in question. Museme 2, on the other hand, is actually a museme stack (or syncrisis) consisting of three constituent musemes for version A (2a-2c) and five for version B (2a-2e), some of which can in their turn also be understood as subsidiary museme stacks broken down into yet more constituent musematic entities. That sort of musematic hierarchy is illustrated by museme 2 in the B section of Table 7-1 and can be explained as follows.

Table 7-1: National anthem musemes: symphony orchestra and foreign drunks

museme | museme sign designation | feasible interpretants

A. Symphony orchestra and chorus

1a | first part of first melodic line | my national identity
2a | professional symphony orchestra in classical vein | official, organised, ‘classical’, quality, polished, dignified, impressive, etc.
2b | professional chorus | as 2a + large collective, synchronised individuals, common goal
2c | big concert hall with long reverb time | large official venue, space for lots of people and a big sound
1+2 | Total = The nation, its values and institutions are big, strong, honourable, etc. I may be small but I am proud to be one of its citizens. United we stand. I belong. Together we are just great.

B. Foreign drunk singing in a pub

1a | first part of first melodic line | my national identity
2a | single foreign vocalist | not one of ‘us’, alien, inappropriate; just one person
2b | raspy voice | unpolished, crude, unsophisticated
2c (stack) | [2c1] out-of-tune guitar; [2c2] simple irregular strum; [2c3] simplified chords | unpolished, unofficial, careless, messy, disrespectful; popular portable sound for parties or camp fires
2d | concertina (diatonic) | simple, portable, old-time, proletarian
2e | background noise: glasses, chatter, raucous laughter | disrespectful, inappropriate
1+2 | Total = Either The nation, its citizens, its values and institutions are being vilely ridiculed and demeaned; or The bloated pomp and arrogance of those running my country is being rightly debunked.


The single foreign vocalist (museme 2a) does not represent the same thing as his raspy voice (2b) because a raspy foreign voice, a raspy native voice, a well-trained native voice and a well-trained foreign voice all sound different and embody four different interpretants. Nor does either museme 2a or 2b mean the same thing as the out-of-tune guitar strummed irregularly with simplified chords (2c) which, in its turn, does not have the same effect on its own as the concertina without the guitar (2d). The total effect of these constituent musemes would also be slightly but significantly different without the background noise of museme 2e. Moreover, museme 2c (guitar) contains three subsidiary structural elements, each of which contributes to its overall meaning: it isn’t properly tuned (2c1); it’s strummed simply and irregularly (2c2); and the chords played on it are much more rudimentary than in an official version of the same piece (2c3). Alter or remove any of those three structural elements and both the overall structure and probable interpretants of museme 2c change too. Finally, add museme 1 to the equation and you have quite a complex museme stack capable of generating, inside a mere second or so, the two radically different sets of interpretants (PMFCs) shown at the bottom of each section in Table 7-1. To quote Mendelssohn again:

‘The thoughts which are expressed to me by a piece of music… are not too indefinite to be put into words, but on the contrary too definite.’

Although this discussion of the term museme will have hopefully provided a few insights into how musical signs may be constructed, identified and deconstructed, I’ve given the term no conclusive definition, simply because I can’t. It would after all be foolhardy to try and distil the theoretical essence of museme without providing much more extensive evidence of how the construction (poïesis) and reception (aesthesis) of individual musical structures are demonstrably and systematically linked to things other than themselves within the same broad music culture. Initial steps in the investigation of those links were suggested in Chapter 6 —‘Intersubjectivity’. Still, we are now, after discussing the terms object, structure and museme, in a better position to expand analytical method into the realm of interobjectivity as we seek to identify and interpret structural elements that carry musical meaning, be they musemes, museme stacks or museme strings.

Interobjective comparison

Fig. 7-2. The alogogenic ‘black box’: two escape routes

If procedures establishing shared similarity of response to music between several human subjects are called intersubjective (vertical arrow on the left in Figure 7-2), then those establishing shared similarity of structure between two or more musical objects can be called interobjective. Interobjective procedure is intertextual. It first entails finding structural elements in other music that sound like structural elements in the AO. That process of establishing musical intertextuality is called interobjective comparison. The ‘other music’ containing structural resemblance to the AO is called interobjective comparison material or IOCM for short. That type of sounds-like link is represented in Figure 7-2 by the horizontal arrow (‘structural similarity’) between the AO and the IOCM.

Now, it may seem odd to suggest that referring to other music can help us escape from the black box of music is music: it’s like advocating regression into musical absolutism and to the notion that music refers only to itself. That’s why it’s essential to understand that interobjective comparison is only the first of two steps in a procedure relating the AO to the PMFCs appearing bottom right in Figure 7-2. Interobjective comparison simply exploits the non-antagonistic contradiction between music’s intra- and extrageneric characteristics, combining the potential of both. Considering first the intrageneric aspect, it’s worth recalling part of the second tenet in Chapter 2’s definition section.

‘[M]usical structures often seem to be objectively related to either: [a] their occurrence in similar guise in other music; or [b] their own context within the piece of music in which they (already) occur.’

As shown in Figure 7-2, interobjective comparison exploits this intrageneric side of the contradiction as a first step (horizontal arrow AO-IOCM) in opening up a second store of paramusical information (vertical arrow between IOCM and PMFC to the right in the diagram). A fictional example may help concretise this line of thinking.

Let’s say your AO is a short extract of film music containing sounds reminiscent of a library music piece called Mysteries of the Lake. Since that piece sounds, in part or whole, like your AO, you can assume it shares sonic structural traits in common with your AO. If that is so, the library music piece qualifies as potential interobjective comparison material —IOCM— linked to the AO by the ‘structural similarity’ arrow in Figure 7-2. At the same time, the library music piece’s suggestive title, Mysteries of the Lake, is an obvious hint at a paramusical field of connotation (PMFC) belonging to that piece of IOCM (step 2, vertical arrow on right in Figure 7-2). Noting also that library music company staff characterise the same piece as eerie and icy (also step 2), it’s possible to summarise the piece’s PMFCs so far as mystery, lake, eerie, icy. The point of this simple two-step process is that if, as in this fictional instance, the concepts mystery, lake, eerie and icy are linked to music sounding like something in your AO, then it’s conceivable that those paramusical concepts may also apply to the AO, in short that your extract of film music may be linked to a PMFC embodying notions of mystery, lake, eerie and icy. That is at least by no means unreasonable as a hypothesis. The only trouble is that one swallow doesn’t make a summer, or, less poetically, that one single piece of IOCM and its connotations do not prove that the relevant sounds in the original AO actually connote whatever mystery, lake, eerie and icy together create by way of a PMFC.

There are several ways of verifying or falsifying individual occurrences of paramusical connotation deduced through interobjective comparison. One way is to use the sort of reception tests discussed in Chapter 6 to check if the VVAs they produce (the vertical arrow of intersubjectivity in Figure 7-2) show any consistency with those deduced using IOCM. Put simply, do the two sets of PMFC at the bottom of the diagram match up? If, for instance, staying with the mystery lake example, reception test respondents associate to not just mystery, lake, eerie and icy but also to things like swirling mist, dark forest and medieval myth, all well and good; but if responses include significant amounts of, say, sunshine, airports, fashion shows, happiness and cowboys you’ll need to think again. But there are other ways of testing initial hypotheses of paramusical connotation.
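The ‘do the two sets match up?’ question can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the tag sets are the fictional mystery lake values used above, and the Jaccard index (shared tags divided by all tags) is just one arbitrary way of putting a number on the overlap, not a measure prescribed in this chapter.

```python
def compare_pmfcs(iocm_pmfcs, reception_vvas):
    """Compare PMFCs deduced via IOCM with VVAs from a reception test.

    Returns (shared tags, tags only in the reception test, Jaccard index).
    """
    iocm, vvas = set(iocm_pmfcs), set(reception_vvas)
    shared = iocm & vvas
    union = iocm | vvas
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, vvas - iocm, jaccard


# Fictional values taken from the mystery lake example in the text.
iocm_pmfcs = {"mystery", "lake", "eerie", "icy"}
consistent = iocm_pmfcs | {"swirling mist", "dark forest", "medieval myth"}
inconsistent = {"sunshine", "airports", "fashion shows", "happiness", "cowboys"}

shared_ok, extra_ok, score_ok = compare_pmfcs(iocm_pmfcs, consistent)
shared_bad, extra_bad, score_bad = compare_pmfcs(iocm_pmfcs, inconsistent)
```

With the consistent responses every IOCM-derived concept recurs among the VVAs; with the sunshine and airports responses nothing is shared, which is the ‘think again’ case. Real responses would of course need discretising and categorising first, as described in Chapter 6, since respondents rarely use identical words.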

The more instances of interobjective similarity you find, the better your chances will be of finding PMFCs relevant to your AO and of examining degrees of consistency between the PMFCs to all those different pieces of IOCM. For example, still using the fictional mystery lake AO, the more IOCM you find connected to PMFCs like mystery, lake, eerie, icy, swirling mist, dark forest and medieval myth, the more plausible your initial hypothesis will be. On the other hand, perhaps lake only occurs in conjunction with your initial piece of IOCM and with none of the others whose PMFCs veer more towards, say, mist, myth, medieval, Lord of the Rings or Harry Potter. If so, you might have to tweak your initial hypothesis, that is unless your respondents mention, or you find IOCM linked to, particular medieval myth elements like Merlin, King Arthur or Excalibur, in which case lake (as in ‘the lady of the lake’) would still be significant. Of course, in the unlikely event of other IOCM being connected to PMFCs verbalisable in terms like sunshine, airports, fashion shows, happiness and cowboys, you’d either have to abandon the initial hypothesis or to check how much those happy sunshine airport pieces of IOCM actually resemble your AO in musical-structural terms. You might also need to ask if the happy sunshine airport pieces are conceived in the same broad set of musical idioms as the film music cue whose ‘message’ you’re trying to explain in words.
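Checking consistency across several pieces of IOCM amounts to simple counting, as the following minimal Python sketch suggests. Everything in it is hypothetical: apart from the fictional Mysteries of the Lake, the piece names and their tags are invented purely for illustration.

```python
from collections import Counter

# Hypothetical IOCM pieces and their PMFC tags (invented for illustration).
iocm = {
    "Mysteries of the Lake": ["mystery", "lake", "eerie", "icy"],
    "IOCM piece B": ["mist", "myth", "medieval", "eerie"],
    "IOCM piece C": ["mystery", "dark forest", "medieval", "mist"],
    "IOCM piece D": ["myth", "mystery", "eerie"],
}


def pmfc_support(iocm_tags):
    """Count in how many IOCM pieces each PMFC tag occurs."""
    counts = Counter()
    for tags in iocm_tags.values():
        counts.update(set(tags))  # set(): count each tag once per piece
    return counts


support = pmfc_support(iocm)

# Tags found in only one piece are candidates for tweaking the hypothesis.
isolated = sorted(tag for tag, n in support.items() if n == 1)
```

In this invented data set mystery and eerie recur in three of the four pieces, while lake turns up only once. That is the situation described above in which lake might have to be dropped from the hypothesis unless other evidence, such as Arthurian references, brings it back in.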

The collection of IOCM necessary for the sort of procedure just sketched can seem like a daunting task, especially if you aren’t a musicologist or practising musician. There are three practical ways, explained next, of overcoming this difficulty: Ask a musician (with caveat), Digital recommenders and Reverse engineering.

Collecting IOCM

1. Ask a musician

One of the distinct advantages of interobjective comparison is that it treats music as music. Putting not too fine a point on it, you could say that it uses (other) music as a sort of direct metalanguage for music. The only trouble is that (verbal) language trumps all other sign systems in our tradition of knowledge and that IOCM can only be used as a first step in the semiotic analysis of music. That said, the direct structural intertextuality of interobjective comparison can, as we shall see, produce valid insights about the meaning of an AO. Musicians (instrumentalists, composers, singers, studio engineers, etc.) are very useful when it comes to tracking down IOCM because of their audio-muscular memory.

One way of conceptualising muscular memory (without the audio) is to imagine you’re at a cash machine and to tap your PIN code on the nearest flat surface. You probably have a spatial-kinetic-tactile memory of your code reinforced each time you withdraw cash and you would, if your PIN includes other numbers than 4, 5, 6 and 0, be confused if numbers 1-9 were arranged as shown on the left (A) of Figure 7-3 because muscular memory of your PIN is based on layout B. You may even remember the gestural pattern of the phone numbers you most often call and I bet, if you’re not French and you’re confronted with a French computer keyboard, that you’ll curse every time you need to type A, M, Q, W or Z because your hands and fingers are used to making patterns on a QWERTY, not AZERTY, keyboard. And what is more annoying than a new DVD player or TV whose remote control buttons are placed differently to those on your old remote so that the setup menu appears when your fingers press where the mute button used to be or the TV changes channel instead of turning down the volume? In these cases you simply recall and unconsciously repeat hand and finger movements that are reinforced by the rewards they regularly produce: money from the cash machine, phone contact with your nearest and dearest, your own words on the computer monitor, TV adverts with no sound, etc.

It’s very similar with musicians and their physical relation to the sounds they’ve learnt to produce. To illustrate this point in teaching situations I often ask keyboard players in the class to ‘give me an octave’ on the nearest available flat surface. Regardless of hand size, they infallibly present a hand shape spanning just over 16 cm between the points at which thumb and small finger touch the flat surface. The audio aspect of muscular memory is even clearer in the case of cover band musicians who start work on a song they don’t know by playing along with a recording of the original version (direct audio-gestural mimicking of the relevant parts). Another example of the phenomenon is when musicians trying to transcribe what they hear use gestural patterns peculiar to their instrument to check that they’re hearing the music correctly. Even if they produce no audible sound, they hope that their gestures will correspond to what they hear in their head.

Air guitar provides another illustration of audio-muscular memory at work in music. As the Virtual Air Guitar project website puts it, ‘you don’t really need to know anything about guitar solos, except for how rock guitarists perform on stage’. The project team, like conventional air guitarists, have observed and mimicked particular gestural patterns in conjunction with particular rock guitar sounds; but they have then reversed the process so that particular gestures trigger particular sorts of sound without the performer having to play any instrument at all.

These examples of audio-muscular memory, not to mention the practice of speech shadowing and its implications for music making, illustrate that hearing musical structures is intimately linked with the gestures that produce those sounds and that this connection almost never needs verbal reasoning in order to work. Exploiting this phenomenon makes the collection of IOCM more direct and more efficient.

Let’s say you’ve identified a snippet of music in your AO whose connotations you want to investigate. All you need do is to ask musicians if they’ve ever before played (or sung, or composed, etc.) anything like that snippet and, if so, in what other piece of music it occurs. The musicians you ask will usually be able to recall and create or imagine a gesture that produces something resembling the musical structure in question. If they are able to isolate and identify that structure, they may even be able to imagine it in other pieces of music, perhaps a bit higher or lower, or a bit faster or slower, with a different ‘before’ or ‘after’, maybe in a different key or on a different instrument, or, if sung, with different words, etc., etc. In any case, that’s how I work to find my own IOCM and if I’m unable to come up with anything because I’m unfamiliar with repertoire relevant to the snippet or sound in question, I’ll not hesitate to contact those who know it better and to ask them instead. For example, I’ve never been a brass player and I needed to test my gut feeling that the horn whoops in the theme for the 1970s TV series Kojak were heroic. That’s why I asked a friend who played French horn in the local symphony orchestra to tell me if, and if so where, he’d played such whoops before. He immediately came up with licks from Richard Strauss’s Ein Heldenleben and the Hauptthema des Mannes from Don Juan, as well as with the main Star Wars theme, all of them highly heroic.

The great advantage of interobjective comparison is that it bypasses the frustrating exercise of trying to describe music in words. It arrives at its approximate verbal hints of musical meaning (the PMFCs lower right in Figure 7-2, p. 238) interobjectively, i.e. primarily through demonstrable musical-structural connection. The second step linking the IOCM to its verbally denotable PMFCs is merely a matter of registering previously established connections between particular musical structures and particular words (e.g. titles, lyrics), or particular types of people, action, space, energy, location, mood, movement and so on (PMFCs on right in Figure 7-2). Such patterns are of course culturally specific and warrant an important caveat.


Since the notion of music as a ‘universal language’ is so dubious (pp. 47-50), ‘sounds like’ connections of the sort just described should as a rule be made using only IOCM that is part of the same broad music culture as that of the AO. Just as, say, the morpheme [wiː] can, depending on various cultural factors, be understood as we, oui, wee, Wii or weee!, the same melodic figure or instrumental sound or textural sonority is unlikely to have the same connotative charge in, for instance, bebop jazz, rap, Italian opera and Balinese gamelan music. Therefore, if the sound, whose connotations you guess to be, say, ‘weird’, is from a recent computer game, then the eerie, icy Mysteries of the Lake library music piece could well be relevant; but if the AO is a piece of traditional court music from Cambodia it would almost certainly not.

The sort of cultural incompatibility just alluded to can occur when a musician you’ve asked to provide IOCM, having first managed to reproduce the musical structure whose connotative charge you’re investigating, then places that structure in a musical context irrelevant to the broad music culture to which your AO and its listeners belong. For example, I remember hearing something resembling the hook line of an Abba song in an orchestral work by Bartók. Although the hand shape and movement required to produce (poïesis) both the Abba and the Bartók snippets are quite similar they just don’t sound the same. This aesthesic impression (not sounding the same) is due partly to differences between the tonal, orchestral and rhythmic contexts of the AO (Abba) and the potential IOCM (Bartók), partly to the fact that Abba and Bartók audiences tend more often than not to inhabit different sociocultural spaces. Although this meant I had to discard the Bartók reference in my discussion of the Abba hook line, it did seem right to use IOCM from the classical and Romantic periods in the euroclassical tradition, as well as twentieth-century popular song from Europe, North America and Latin America because: [1] the AO itself belonged to the same broad musical culture as those repertoires; [2] those musical idioms were not unfamiliar to Abba listeners in Sweden in the mid 1970s.

This issue of locating IOCM in relevant musical contexts is, as we’ll see later, a matter of precision about parameters of musical expression: the same tune played first on cathedral organ, then on kazoo will not sound the same and does not produce the same effect, so to speak. This means that the same structure with a different ‘before’ and ‘after’, in a different metre, with different instrumentation, etc., etc. cannot be expected to sound the same, let alone produce the same effect. As the Abba-Bartók incident suggests, a poïetically determined musical element in one piece, isolated and repeated with slight variations in the hopes of discovering IOCM, is by definition decontextualised: it assumes the quasi-autonomous status of poïetic structure in a dormant state and nothing else. That is clearly unsatisfactory if the aim of semiotic music analysis is, however tautological it may sound, to explain musical semiosis because that in its turn demands the existence of a musematic link between sign (the sonically concrete encoded part of the process) and musical or paramusical interpretant (whatever is decoded from the sign). This implies that a meaningful musical structure (a museme, a museme stack or museme string) should ideally be denotable in aesthesic as well as poïetic terms. The trouble is, as we saw in Chapter 3, that structural descriptors are, in Western institutions of musical learning, overwhelmingly poïetic, aesthesic descriptors much rarer and more vernacular. It’s for this reason essential, especially if using musicians to track down IOCM, to be aware of the poïetic risks involved in the process, even though instances of musically or culturally incompatible references are thankfully rare. But there are other solutions to the problems of identifying musical signs in your AO and of collecting pieces of IOCM that contain such signs.

Recommender systems

Digital music recommender systems like iTunes, Last.fm and Pandora have been under development since 2000 and can be a useful starting point when hunting for IOCM, as long as their limitations are understood. These systems are currently designed to make money in various ways by using music you already listen to as a basis for suggesting similar music they might be able to sell you. iTunes, for example, takes ratings from your playlists and compares those with ratings given by other iTunes users. By identifying and cross-referencing your tastes in this way, iTunes tries to predict what else you might like to hear or buy. Last.fm works in a similar way. However, instead of using ratings, the software installed by Last.fm on your computer logs every piece of music you listen to and builds up a detailed profile of your preferences. Your song log data is sent to Last.fm’s central database and cross-referenced with log data from other users listening to similar sorts of music. It’s on that basis that the system tells you what else you ‘might enjoy’.

Unlike iTunes and Last.fm, the Pandora system determines its recommendations on the basis of musical-structural traits in the music you listen to, as long as the music has already been analysed by a member of Pandora’s team of musician-scrutineers. Since the Pandora system relies on interobjective comparison (on similarities of musical structure observed by musicians) rather than on metamusical information (ratings, playlist logs, etc.), it’s hardly surprising that it currently receives so many positive online reviews as a reliable ‘sounds like’ recommender system. However, whatever the relative merits of these systems, it should be remembered that their function is not to identify and compare individual items of musical structure within a piece of music but to identify the characteristics of an entire piece with a view to selling you more pieces of music exhibiting similar characteristics. That said, these systems, particularly Pandora, ought to be able to provide you with enough titles of enough music in relevant styles that you can then test for structural similarities using your own ears.



The more the merrier

Before continuing with other possible procedures of interobjective comparison, it’s worth emphasising the following four points.

1. The more informants you ask to provide IOCM, the more pieces of relevant IOCM you are likely to find.

2. The more pieces of relevant IOCM you find, the greater your chances will be of finding PMFCs relevant to your AO.

3. The more your IOCM structurally resembles your AO, the more reliable your argumentation will be about connections between the AO and the PMFCs linked to the IOCM.

4. The greater consistency there is between PMFCs linked to your IOCM, the clearer will be your presentation of musical meaning.

These four points are only guidelines. You just can’t expect every music analysis to involve a statistically reliable sample of informants, nor an exhaustive bank of accurate IOCM for every relevant musical structure, nor an unequivocal set of PMFCs for every piece of IOCM relating to every musical structure in your AO. But there are a few simple steps that can be taken to improve analytical reliability: one is explained in the next paragraph, two more under Reverse engineering 1 and 2 (pp. 249-253) and another in the section on Commutation (p. 253, ff.).

If a reception test is part of your analysis (Chapter 6), you can always ask your respondents to provide not only the sort of connotations alluded to in the instructions on page 207: you can also ask them to jot down the name of any other music, artist, composer, style or genre the test piece reminds them of. That extra information may increase the size of your IOCM and, consequently, the number of PMFCs associated with it. As mentioned earlier, a cross-check between the two sets of PMFCs at the bottom of Figure 7-2 (p. 238) can help verify or falsify your hypotheses about the musical meaning of your test piece (AO).
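
The cross-check between two sets of PMFCs can be kept as a simple tally. The Python sketch below is only one possible way of doing it: the function name `cross_check` is hypothetical and the sample connotations are invented for illustration, not taken from any real reception test. It counts how many respondents, and how many pieces of IOCM, are linked to each connotation and lists the connotations that turn up in both sets.

```python
from collections import Counter


def cross_check(reception_pmfcs, iocm_pmfcs):
    """Tally connotations from two sources and report the overlap.

    Each argument is a list of lists: one inner list of connotation
    words per respondent (reception test) or per piece of IOCM.
    """
    def tally(groups):
        counts = Counter()
        for words in groups:
            # Count each word at most once per respondent or piece.
            counts.update({w.strip().lower() for w in words})
        return counts

    reception = tally(reception_pmfcs)
    iocm = tally(iocm_pmfcs)
    shared = sorted(set(reception) & set(iocm))
    return {
        "shared": shared,
        "reception_only": sorted(set(reception) - set(iocm)),
        "iocm_only": sorted(set(iocm) - set(reception)),
        # (number of respondents, number of IOCM pieces) per shared word
        "support": {w: (reception[w], iocm[w]) for w in shared},
    }


# Hypothetical data, invented purely for illustration.
responses = [["eerie", "cold"], ["Eerie", "lake"], ["mysterious"]]
iocm_links = [["mysterious", "eerie"], ["ominous", "eerie"]]
result = cross_check(responses, iocm_links)
print(result["shared"])
print(result["support"])
```

Connotations that appear in only one of the two sets are not thereby wrong; they simply lack support from the other side and may deserve a second look.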

You can also switch the direction of the arrows in Figure 7-2. That gives two more useful ways of testing hypotheses about the meaning of sounds in your AO. Both procedures constitute a sort of reverse engineering by which you theoretically reconstruct sounds in your AO on the basis of PMFCs you think may be related to it. The first of these two procedures even lets you collect IOCM relevant to your AO without having to ‘ask a musician’.

Reverse engineering 1: from IOCM to AO

If you’re having trouble collecting IOCM for an AO you think communicates a certain mood or gives rise to certain connotations, you can start with that mood or with those connotations as hypotheses and try finding pieces of other music with titles, lyrics, on-screen action, moods and so on, that correspond to your hypotheses. For example, if your AO is a pop song whose lyrics recurrently include the words teen and angel, you can start by entering those words in the YouTube search box. Among countless versions of the actual song Teen Angel and innumerable episodes of the homonymous TV series, you’ll also find recordings of songs like Teenager in Love, Angel Baby, Tell Laura I Love Her, and Devil Or Angel, some of which may well contain passages sounding like something in your AO with all its teens and angels. If that search fails to turn up anything of relevance, you can always use a search engine like Google to look for song lyrics containing teen or angel. If you find any (you will!), you can go to iTunes or YouTube and search by name for the relevant songs you found in Google. If the songs you find either way sound musically like your AO, you can count them as IOCM.

You can of course also use the sorts of search just explained if your AO reminds you of music by another artist or composer. Listening to short extracts from their music will soon tell you how viable any ‘sounds like’ hunch might be. You can then check if any of the music your searches produce is linked to particular lyrics, moods, situations or audiences. If a particular extract from the music of another artist or composer bears structural resemblance to something in your AO (remembering the cultural caveat, of course), then those ‘particular lyrics, moods, situations or audiences’ become PMFCs of potential relevance to the discussion of meaning in your AO.

Hunting for IOCM does not necessarily entail online work. You can also scour your own or your friends’ music collections. In my own analysis work I often formulate hypotheses about musical meaning as keywords which I then shamelessly use to look for likely titles of CD and LP tracks of film music and pop songs, or, if appropriate, of classical Lieder, of Baroque arias, Romantic programme music and so on. I also search for the same keywords in the filename and title metadata of media files on my computer. If those searches produce results (they usually do) I then check, either aurally or in the score (if I have it), whether there’s anything in any of the pieces I manage to locate that sounds like anything in my AO. If there is, I note the location of the relevant musical structure within each of those pieces, along with the name of the piece and, if any, the piece’s publishing details. I then add the piece to my bank of IOCM.
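
The keyword search of file names just described can be automated. The following Python sketch is a minimal illustration under stated assumptions, not the procedure I actually use: the function names, the list of file suffixes and the example keywords are all hypothetical, and it looks only at file names. Reading the title metadata stored inside media files would need an additional tag-reading library, which is not shown here.

```python
from pathlib import Path

# Assumed set of audio file types; extend it to suit your own collection.
AUDIO_SUFFIXES = {".mp3", ".flac", ".wav", ".ogg", ".m4a"}


def title_matches(title, keywords):
    """Return the keywords (lower-cased) that occur in a title."""
    lowered = title.lower()
    return [k.lower() for k in keywords if k.lower() in lowered]


def search_collection(root, keywords):
    """Yield (path, matched keywords) for audio files under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            # path.stem is the file name without its suffix.
            hits = title_matches(path.stem, keywords)
            if hits:
                yield path, hits
```

Calling `list(search_collection("Music", ["mystery", "lake"]))`, where `"Music"` stands for whatever folder holds your files, returns candidate tracks only. Whether any of them actually sounds like something in the AO still has to be checked by ear or in the score.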

But what if you’re having difficulties finding IOCM for an AO with no obvious verbal, visual or dramatic connections of its own? Perhaps it doesn’t even have a descriptive title. No problem, as long as you have a viable hypothesis about its PMFCs.

Let’s say that our fictitious mystery lake AO has no title, that it’s just listed as a numbered cue on a limited edition CD for film music buffs. As long as I have a hypothesis about its mood (it’s the mysterious lake) I’m not lost. In fact, having googled the search string |+"library music" +mystery lake| I was able, in a couple of minutes and going no further than the first few of the 16,500 hits supposedly answering to my search string, to hear sample demos from three library music pieces corresponding well with sonic particularities in the AO. The IOCM I was able to locate so quickly consisted of two atmospheric synthesiser tracks called Secrets and Unseen, and a symphonic piece entitled Approaching Unknown. This third piece was described by library music staff as ‘cautious, intense, surreal… moving, ominous, emotional, soaring… atmospheric, haunting… mysterious, suspenseful, apprehensive… eerie’, [giving] ‘a sense of the unknown, approaching trouble, mystery’ [and containing] ‘hypnotic flute, celeste, piano and harp ostinato’. No actual lake, admittedly, but I still thought the descriptions sounded about right, as indeed did the actual demo recording answering to those descriptions.

The point of these brief sorties into cyberspace is to show how simple it can be to find and hear music whose lyrics, title or descriptions tally with your hypothesis about what particular structural traits in your AO may be connoting. If something in the music of the piece[s] you discover through this sort of reverse engineering sounds like something in your AO, all well and good: your hypothesis is substantiated, at least in part. If not, your hypothesis might be faulty, or your IOCM might be conceived in a different musical idiom to that of your AO.

Whether you’ve ‘asked a musician’, used digital recommender systems or applied the sort of reverse engineering just described to hunt down pieces of IOCM and their PMFCs for your analysis, your findings can be cross-checked with results from the reception test you may have conducted (see Chapter 6). They can also be cross-checked using another sort of reverse engineering.

Reverse engineering 2: recomposition

Another control mechanism for checking the validity of the PMFCs you’ve collected intersubjectively or interobjectively, or that you’re simply putting forward as a hypothesis, is to provide musicians with a summary of your PMFCs and ask them to come up with ideas for music they think would fit those fields of connotation. Of course, the musicians should not know the identity of your AO. The reverse arrow in this recomposition procedure goes from either of the two PMFC boxes in Figure 7-2 (p. 238) up to the AO because you’re asking musicians to reconstruct the AO on the basis of its supposed connotations. The obvious point here is that if your musicians suggest structural traits similar to those of the AO, your PMFCs will have greater validity than if their suggestions don’t sound like it. There is, however, one major problem with this procedure. If your musicians can’t verbalise their suggestions in terms you understand, if you’re unable to decipher jargon like ‘a saw-tooth cluster at 110 dB with maximum distortion at 3k’ (ouch!), and if you can’t persuade them to play or record their suggestions, then this type of reverse engineering won’t work. However, if you don’t stumble on this sort of problem, ‘composing back’ towards the AO from a set of PMFCs can be a very useful and convincing tool of semiotic analysis.

For example, during a postgraduate musicology seminar in Göteborg (Sweden) in the early 1980s, a psychologist from Lund told participants what a patient had said when listening to a particular piece of music under hypnosis. The instructions to the patient had been to say what the music made him/her see, like in a daydream. The seminar knew neither the identity of nor anything else about the piece of music that evoked the hypnotised patient’s associations which were recounted roughly as follows by the visiting psychologist.

‘Alone, out in the countryside on a gently sloping field or meadow near some trees at the top of the rise where there was a view of a lake and the forest on the other side’.

Using this statement as a starting point, seminar participants were asked to make a rough sketch of the sort of music they thought might have evoked such associations. The seminar’s collective sketch suggestion, which took about thirty minutes to produce, consisted of very quiet high notes sustained in the violins and a very quiet low note sustained in the cellos and basses. These two ongoing, extremely calm pitch polarities were in consonant relation to each other. A rather undecided, quiet but slightly uneasy melodic figure appeared now and again in the middle between the two pitch polarities. A solo woodwind instrument (either flute, oboe or clarinet) played smoothly, in a ‘folk’ vein, a wistful but not unpleasant tune that wandered quietly, slowly and a bit aimlessly over the rest of the barely audible static sounds.

The seminar’s quick sketch proved to correspond on many counts with the original musical stimulus —the ‘last post’ section at c. 4:20 in the slow movement from Vaughan Williams’ Pastoral Symphony (1922). This brief experiment suggests that people with some musical training are able to conceive generalities of musical structure linked to given paramusical spheres of association, not merely to perceive them. The recomposition exercise also suggested that the seminar participants and the patient from Lund made very similar connections, albeit in opposite directions, between specific musical structures and a specific paramusical field of connotation. The patient’s connotations and the seminar participants’ musical ideas reinforced each other.

Whichever methods of IOCM collection and PMFC verification you use, one thing is certain: the more precisely you indicate which musical-structural element[s] in the AO sound like which structural element[s] in the IOCM the more convincing your analysis will be. Besides, a musical structure can’t be treated as a sign (museme) if it isn’t also identified as a structure. This structural imperative is usually enough to make non-musos nervous, unnecessarily so, as I’ll explain under ‘Structural designation’ (p. 256, ff.). First, though, I’ll present the last of the procedures (‘Commutation’) allowing you to check the validity of conclusions you may have drawn about which structural elements in your AO relate to which PMFCs.


Commutation

In linguistics, commutation means substituting one element among several in a group with something else to check if the meaning of the whole group of elements changes. For example, replacing the U sound /ʌ/ in southern UK English [lʌk] (luck) with the oo sound /ʊ/ in [lʊk] (look) changes the meaning of the word, but making the same change from [bʌs] to [bʊs] doesn’t because [bʌs] (southern) and [bʊs] (northern UK English) are accepted regional variants of the same word meaning the same thing: bus. Commutation is useful in the analysis of musical meaning for determining which structural elements are semiotically more or less operative than others.

Returning once more to the ‘official’ and ‘drunk’ versions of your national anthem, it’s clear from the discussion of their musemes and feasible interpretants (Table 7-1, p. 229) that some structural elements make for more radical differences of attitude towards your nation and its flag than do others. For example, replacing the raucous foreign voice with kazoo or exchanging the concertina for a ukelele would probably not make as much difference to the drunk version as would replacing the raucous foreign singer with an equally foreign classical baritone or the concertina player with a proficient pianist on a well-tuned concert grand. Similarly, it would change the character of the official version quite noticeably if even one member of the choir or orchestra were to perform their part out of time or tune, while considerably less difference of attitude toward your nation and its flag would result from a complete change of personnel from professional symphony orchestra to a proficient and well-rehearsed military band.

This sort of commutation is also called hypothetical substitution and more often than not it stays at the what if? stage. But the substitution can sometimes make you think of other music that sounds similar to the new variant you just imagined or created. That new IOCM may or may not be similar to that of your AO. If the new IOCM is different and if the PMFCs linked to it don’t align with those of your analysis object, then the structural element subjected to commutation in your AO can be considered operative in producing the PMFCs you found to be linked with your AO because changing that structural element to something else led to different music (the ‘new’ IOCM) and to different PMFCs. Conversely, if your commutation leads to the same sort of IOCM and PMFCs as those of your AO you’ll know that the element you replaced with something else was not so important in producing the PMFCs in question. An episode from an analysis class clearly illustrates this principle.

At a pop music analysis session devoted to finding IOCM for a 1990s electro-dance track I was sure I was hearing a chord shuttle resembling that under the hook lines of well-known pop tunes like My Sweet Lord, He’s So Fine and Oh Happy Day. But when I started playing along with the track I discovered it was pitched in an unusual key and that I had to force my hands and fingers into unfamiliar shapes. Luckily my students didn’t notice how much effort I had to put into making it sound like one of the most familiar chord shuttles in the pop repertoire. The point is that I’d had to do something that was poïetically, from my point of view as a keyboard player, quite different: it was hard to make the music sound like ‘the same thing’. The conclusion my students and I drew from that episode was that significant changes from the musician’s poïetic standpoint don’t necessarily lead to changes of musical message because the fact that I’d had to struggle at the piano made not a blind bit of semiotic difference. Further discussion ensued and, asked what structural features would have made a difference to the musical message, the students mentioned different rhythmic and accentual patterning, a distinctly slower or faster tempo, playing the chords at a noticeably different pitch, or on a detuned piano or some other instrument. We all agreed that making simple changes to rhythm, tempo, articulation and instrumentation definitely made a difference while transposing the music up or down a semitone made virtually no difference at all. By the end of the lesson we had learnt that what musicians produce usually does make a difference to the message but that the degree of semiotic difference at the receiving end doesn’t necessarily correspond to the degree of structural difference perceived by musicians at the transmitting end.

The last example of commutation procedure comes from the fictitious mystery lake piece. Let’s say we’ve identified sounds in it that we think may somehow connote water, that none of the IOCM we found has anything aqueous among its PMFCs, and that the IOCM contains none of the structural elements we’ve identified as potentially watery in the AO. We can first imagine the AO without the sounds we think may be watery (i.e. take them out and replace them with nothing). If our AO with that omission sounds more like all the IOCM whose PMFCs did not include water, then our hypothesis about the watery sounds in the AO may have some mileage. But it’s less likely to be a question of whether the structural element is itself included or omitted as a whole because its ‘wateriness’ could depend on any number or combination of factors: on volume/intensity, register, timbre, articulation, phrasing, tempo, metre, periodicity, tonal vocabulary, acoustic staging, etc. In fact it’s in conjunction with those parameters of musical expression that commutation is most useful because we can test, at least hypothetically, how different the music would sound if the values of any (combination of) those parameters were to be changed. In short, you have to ask what if structural element x is played faster, slower, higher, lower, smoother, choppier, using different notes, in waltz time, with a bossa nova groove, by strings or brass, with lots of reverb or dry, with the tune more up front or further back, without the bass line, etc., etc.?
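
The logic of the commutation test described in this section can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the function names, the parameter values, the invented PMFCs and above all the 0.5 threshold, which is arbitrary. The sketch lists one-at-a-time substitutions of parameter values and then treats a structural element as operative if the connotations of the imagined variant overlap only weakly with those of the AO.

```python
def jaccard(a, b):
    """Share of connotations two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_substitutions(baseline, alternatives):
    """Yield (parameter, new value, variant), changing one parameter at a time."""
    for parameter, values in alternatives.items():
        for value in values:
            if value != baseline.get(parameter):
                variant = dict(baseline)
                variant[parameter] = value
                yield parameter, value, variant


def is_operative(ao_pmfcs, variant_pmfcs, threshold=0.5):
    """True if the substitution shifts the PMFCs below the overlap threshold.

    The threshold is an arbitrary assumption, not a value from the text.
    """
    return jaccard(ao_pmfcs, variant_pmfcs) < threshold


# Hypothetical what-if questions and connotations, invented for illustration.
baseline = {"tempo": "original", "key": "original", "instrument": "piano"}
alternatives = {
    "tempo": ["much slower"],
    "key": ["semitone higher"],
    "instrument": ["kazoo"],
}
variants = list(single_substitutions(baseline, alternatives))

ao_pmfcs = {"joyful", "gospel", "uplifting"}
slower_pmfcs = {"solemn", "gospel"}  # imagined result of a slower tempo
transposed_pmfcs = set(ao_pmfcs)     # imagined result of transposition

print(is_operative(ao_pmfcs, slower_pmfcs))
print(is_operative(ao_pmfcs, transposed_pmfcs))
```

No number can replace listening. The overlap measure merely records a judgement already made by ear about how far the substitution changes the connotations.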

Structural designation

The structural imperative in interobjective comparison, I wrote a few pages ago, is usually enough to make non-musos quite nervous. Indeed, how, you may well ask, can someone with little or no formal musical training, someone who can’t tell a diminished seventh from a hole in the wall, be expected to accurately identify musical structures, especially given the predilection in conventional music studies for poïetic descriptors of structure? Well, that objection may once have had some validity but it has in my view, at least since the mid-1990s, become more of an excuse for not confronting music as sound in the study of music. In fact I think there is today very little apart from epistemic sloth and institutional inertia that prevents non-musos from accurately identifying musical structures. I state that opinion categorically because there are at least two complementary ways of confronting the issue of structural designation, neither of which involves any muso skill or jargon: timecode placement and paramusical synchronicity.

Unequivocal timecode placement

CD tracks, films on dvd, audio files, video files, etc. all include timecode as part of the digital recording. That timecode is either displayed or displayable on stand-alone cd and dvd players; it’s also present in media playback software for computers, tablets and smartphones. As long as the piece is digitally recorded or rerecorded, the real time elapsed since the start of the piece you’re analysing is continually updated and shown as it is played. This means that you can hit the pause button when you hear any musical event of interest and note the timing at that point. Stand-alone players (cd, dvd, MiniDisc) and normal playback software on computers and smartphones let you pinpoint events to the nearest second, standard audiovisual recording and editing applications to the nearest fraction of a second. Currently (2012) the best solution is to make sure you have your AO as a sound file on the computer and to open it using audio editing software. That way you can see points of relative quiet and loudness, changes in sound wave shape, etc. that make it easier to find your way around the piece, as shown in the top part of Figure 7-4 on page 258.

The top line of Figure 7-4 is a screen capture of the whole of the original 1962 version of the James Bond Theme as displayed by the audio recording and editing software I use. Using the line tool in an image editing application, I’ve marked up the starting points of the tune’s sections as I hear them. I can label them with vernacular terms like twangy guitar tune and spy chord because I can designate the sound I’m referring to by indicating the exact point, to the nearest second, in the tune’s timecode where that sound first occurs, for example the twangy guitar at 0:07 (for the entrance of 007 himself), the danger stabs at 1:33 and the final spy chord at 1:40. Those structural designations are all accurate and unequivocal. No reader with access to the same recording can be in any doubt about the sounds I’m referring to.


The four small screen shots in Figure 7-4 show displays at four points in the same mp3 file of the James Bond Theme, this time using a freely and widely available media player. Please note that the total duration of the piece is 1:45 and that the screen shots have been taken at (a) 0:07, when, appropriately, the 007 tune is first heard; (b) 0:33, when the intro returns, not long before the brass first enters with its angular ‘danger’ tune at 0:40; (c) 1:17, a point unmarked in the top line of Figure 7-4; (d) 1:39 for the famous final spy chord. The timing 1:17 (c) marks the start of the last return of the intro, except that its up-and-down pattern only occurs once before the twangy guitar kicks in for the last time.

Simple media playback software is usually enough for simple analysis tasks but it has several drawbacks. [1] The pause button can be slow to react and you may find yourself noting timings that are a second too late. [2] Time resolution isn’t perfect and it can be difficult to start playing the music from exact points inside the recording. [3] You cannot extract individual mini-files or construct loops of particular sounds or passages you need to listen to repeatedly, or which you need to draw to the attention of those providing you with PMFCs or IOCM without them hearing what comes just before or after. [4] You cannot display enough of your AO on screen at one time to use as visual basis for a graphic score or for discussion of overall form and narrative process.

By creating an overview of your AO with precise timings of important events and its division into sections (see top of figure 7-4, p. 258, and the table of musematic occurrence for Abba’s Fernando, p. 387), you can also start referring to musical structures relatively, for example the danger loops just before the final chord, or the last five notes of the twangy guitar tune just before it repeats. It is, however, best when in doubt to provide an accurate timing so as to avoid any confusion about which sound you’re referring to.
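For readers who keep their timings in a text file or spreadsheet, the overview just described can be sketched in a few lines of Python. This is only an illustration: the function names `to_seconds` and `marker_at` are hypothetical, and the marker list simply collects the timings quoted above for the James Bond Theme rather than offering a complete segmentation.

```python
from bisect import bisect_right

def to_seconds(timecode):
    """Convert 'm:ss' or 'h:mm:ss' timecode to whole seconds."""
    seconds = 0
    for part in timecode.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Start points quoted in the text (total duration 1:45).
# Labels are the vernacular ones used above; the list is illustrative only.
MARKERS = [
    ("0:07", "twangy guitar tune"),
    ("0:33", "intro returns"),
    ("0:40", "brass 'danger' tune"),
    ("1:17", "last return of intro"),
    ("1:33", "danger stabs"),
    ("1:40", "final spy chord"),
]

def marker_at(timecode):
    """Return the label of the most recent marker at or before `timecode`."""
    starts = [to_seconds(t) for t, _ in MARKERS]
    i = bisect_right(starts, to_seconds(timecode))
    return MARKERS[i - 1][1] if i else "(before first marker)"

print(marker_at("0:35"))
print(to_seconds("1:02:15"))
```

The same table can be used in reverse: a relative designation like ‘the danger stabs just before the final chord’ can always be backed up by the absolute timing stored next to its label.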

Paramusical synchronicity

Paramusical synchronicity sounds much fancier than what it actually means, but it’s also much shorter than its explanation which, however brief, runs as follows. If, unlike the solely audio version of the James Bond Theme, your AO features lyrics, moving images, stage action or dance, its musical structures can also be designated by referring to paramusical events occurring simultaneously with or in close proximity to those structures. Three fictitious examples will suffice to illustrate this simple technique: [1] the singer’s contented growl on the last ‘oh, baby!’ in verse 1 (at 0:31 in a pop song); [2] the distant screeching sound just before she pours poison into his whiskey (at 1:02:15 in a feature film on DVD); [3] the drum pattern that synchronises with the quick zoom-in on to the lead vocalist’s lips (at 2:20 in a music video). It’s usually advisable to supplement this type of structural indication with timecode designation to ensure that whoever reads your analysis can find the relevant musical structure in the recording without wasting time waiting for the moment to arrive.

Summary of main points

[1] Structural elements in music can be considered as either: [i] dormant structures regardless of semiotic potential; [ii] structural elements that can be shown to carry some sort of meaning —musematic structures.

[2] A museme is a minimal unit of musical meaning but it’s often more useful to consider meaningful musical units in terms of museme stacks, museme strings, or as syncrisis (Chapter 12).

[3] In addition to the intersubjective procedures described in Chapter 6, a musical analysis object (AO: an identifiable and usually nameable piece of music) can be subjected to interobjective investigation.

[4] Interobjective comparison material (IOCM) is music other than the AO that sounds like (bears structural resemblance to) the AO.

[5] The collection of IOCM is the first of two steps in the procedure of interobjective comparison. The second step involves relating the IOCM to its own paramusical fields of connotation (PMFCs).

[6] PMFCs related to the IOCM can be posited as PMFCs relating to the AO.

[7] IOCM can be collected by exploiting the audio-muscular memory of musicians. This method is direct and reliable since it is intrinsically musical, avoiding the mediation of words and using other music as a sort of initial metalanguage for the music under analysis.

[8] IOCM can also be gathered by searching for music whose title, lyrics, accompanying images, connotations, including hypotheses you may have yourself, are relevant to the AO. Online searches usually result in quick access to relevant pieces of IOCM (‘Reverse engineering 1’).

[9] Conclusions about musical meaning drawn from interobjective procedures can, if applicable, be cross-checked for viability with reception test results (see Chapter 6). They can also be verified/falsified using the techniques of recomposition (‘Reverse engineering 2’) and commutation (hypothetical substitution).

[10] Accurate structural designation is essential in interobjective analysis. Digital timecode placement and paramusical synchronicity are two simple ways in which anyone can unequivocally denote musical structures without having to use any muso jargon.

Fig. 7-4. Screen capture of James Bond Theme (Norman, 1962) in audio editing software display


Fig. 7-1. ‘Black box’: escape route 1

Fig. 7-3. Numerical keypads

Screen capture of four points from VLC display of same MP3 as above 2012-09-28, 19:30



Fig. 8-1. Periodic and aperiodic sound waves

Fig. 8-6. 3D model (frontal view)



8. Terms, time & space


About Chapters 8-12

‘Digital timecode placement and paramusical synchronicity are two simple ways in which anyone can unequivocally denote musical structures without having to use any muso jargon.’

If that, the last sentence in Chapter 7, is true, why, you may well ask, are the next five chapters about musical structure? One reason is that understanding basic structural phenomena like tempo, timbre and tonality provides additional insights into what might be hidden in the black box of musical semiosis. Another is that it’s impossible to entirely avoid poïetic terms when referring to musical signs. I’ve already used many such words without any sort of explanation. Page 255, for example, included the following sentence.

‘The “wateriness” [of the music] could depend on any number… of factors —on loudness, register, timbre, articulation, phrasing, tempo, metre, periodicity, tonal vocabulary, aural staging, etc.’

Those ‘factors’, in italics, are categories of structuration I call parameters of musical expression. They are sets of properties constituting the vast variety of sounds we hear as musical. Just think of the following six sorts of musical change: [1] of instrumentation from electronic dub mix to string quartet; [2] of volume from loud to soft; [3] of pitch from high to low; [4] of tempo from fast to slow; [5] of tonal vocabulary from major to minor; [6] of timbre from smooth to rough. Such changes are likely to produce different effects on the listener and need to be named. The problem is that those parameters (and many more) are already subjects of entire books: I just can’t deal with them all in detail. It’s also why, despite valiant efforts to be brief, they occupy the next five chapters.

The aim of Chapters 8-12 is twofold. One is to complement intersubjective and interobjective procedures by providing a perspective based on categories of musical structuration: was that ‘watery effect’ caused by timbre or phrasing, tempo or volume, surface rate or pitch, or by a combination of all of those, or by none? The other main aim of these chapters is to provide a conceptual basis for identifying the sonic properties operative in creating musical meaning. Two concrete examples of mistaken structural identity should clarify the point.

Asked to explain why they think a film music cue sounds romantic, students often say things like ‘it’s the strings’. That’s certainly true if string instruments are involved but it’s also quite misleading because another trope of music for strings suggests the opposite. I’m referring to the ‘screeching, stabbing sound-motion of extra-ordinary viciousness’ in Herrmann’s music for the shower scene in Hitchcock’s Psycho (1960). To distinguish between romantic and psychopathic strings, you have to consider parameters like attack (smooth and soft, not sharp and hard), melodic and rhythmic profile (continuous, regular, gradually varied and overarching, not detached, jerky, sudden and repetitive), phrase length (long, not short), timbre (round, smooth and full, not harsh, rough and piercing) and harmony (consonant, not dissonant). In short, the effect of romance or horror isn’t down to the instruments as such but to how they are used to play what.

Another common case of mistaken connotative identity is caused by the popular equations major = happy and minor = sad. Even if this dualism of tonal vocabulary has some validity in the euroclassical repertoire, it’s inapplicable to any lively minor-mode chalga, cueca, hornpipe, jenka, jig, klezmer, lambada, malagueña, polska, reel, syrtos, tarantella or verbunkos. If you believe in the minor mode’s intrinsic morosity, try acting depressed as you sing along to merry minor tunes like Kalinka, Hava Nagila, God Rest You Merry Gentlemen or the Lambada. Or else try joyous abandon while hearing mournful major-key pieces like Handel’s Largo. No, indicators of happy or sad are less likely to be a matter of modality (p. 325 ff.), more likely due to particular usage of parameters like tempo, surface rate, loudness, phrase length, melodic pitch range and contour, accompanimental register and rhythmic configuration.

As already mentioned, there’s no room here to go into much detail about the sort of parameters of musical expression just listed. I will at best be able to give a rough idea of some of the essential ‘nuts and bolts’ involved in making and reacting to music. Readers requiring more detail must regrettably look elsewhere. Another caveat is that I have to pay more attention to concepts that conventional music theory treats either confusingly or cursorily (if at all). These priorities are necessary because many institutions of musical learning in the West still conceptualise parameters of expression hierarchically, as either primary — ‘syntax-based discrete relational categories (pitch, duration)’ — or secondary — ‘tempo, dynamics, timbre’. Such conceptual hierarchies are inapplicable to most of the music we hear on a daily basis.

After a brief discussion of concepts essential to the rest of the book, this chapter is devoted to parameters of time, speed, space and movement (duration, phrase, episode, tempo, beat, metre, groove, aural staging etc.). Chapter 9 deals with issues of timbre and tonality, including sections on effects units, loudness, tuning, octave, interval, mode, melody, chords, harmony, etc. Chapter 10 is devoted entirely to vocal persona, and Chapters 11-12 to the aggregated ‘macro’ parameters of narrative form or diataxis (extensional) and of syncrisis (intensional).

Still, before taking on all those structural issues, it’s wise to first clarify a few fundamental and recurrent concepts like genre, style, paramusical factors, the extended present, note, pitch, tone and timbre.

Basic concepts (1)

Genre and style

According to Fabbri (1999: 8-9; 2008: 121-136), musical genres evolve as named categories to define similarities and recurrences —‘rules’— that members of a given community find useful in identifying a given set of musical and music-related practices. Such rules, writes Fabbri,

‘can be explicit, as in an aesthetic manifesto or a marketing campaign,’ [but they are just as likely to be] ‘implicit or never declared’… ‘Rules that define a genre can relate to any of the codes involved in a musical event —including rules of behaviour,… proxemic and kinesic codes, business practices, etc.’

I interpret Fabbri to mean that particular types of language (lyrics, paralinguistics, metadiscourse, etc.), gesture, location, clothing, personal appearance, social attitudes and values, as well as modes of congregation, interaction, presentation and distribution, are all sets of rules that, together with musical-structural rules, build a larger set of rules identifying a particular genre. The fact that music, as a cross-domain symbolic system, is central to genre identity should come as no surprise. After all, the business rationale of format radio assumes musical taste to be a key indicator of demographic factors (age, income, ethnicity, education, etc.) defining a target audience. Each subset of genre rules reinforces the others and music is at the centre because, as Fabbri adds, ‘[k]nowing what kind of music you’re listening to, or talking about, or actually making, will act as a compass’, helping you ‘choose the proper codes and tools’ for the genre as a whole.

Fabbri (1999: 8-9) defines style as:

‘a recurring arrangement of features in musical events which is typical for an individual (composer, performer), a group of musicians, a genre, a place, a period of time.’… ‘As a codified way of making music, which may (or must) conform to specific social functions, style is related to genre, and is sometimes used as its synonym… However, style implies an emphasis on the musical code, while genre covers all kinds of code relevant to a musical event.’

Style can in other words be seen as a set of musical-structural rules or norms, genre as a larger set of cultural codes that also include musical rules. This does not mean that styles are mere subsets of genre. For example, Morricone’s musical style —his personal idiolect— is unmistakable whichever genre he’s working in: sounds typical of his concert pieces turn up in his film scores, some of his film themes closely resemble his work with popular song, and his unique style of orchestration can be heard in all three genres.

Fabbri’s distinction between genre and style is useful for two reasons. Firstly, it allows for differentiation between two sorts of style flag (pp. 523 ff.): [a] musical structures that establish a ‘home style’ (style indicators) and [b] those using elements of another style to refer to a genre other than that pertinent to the home style of the music under analysis (genre synecdoches). The second reason is that, seeing how music and musical rules are central to the fusion of other aspects of genre into a recognisable (albeit fuzzy) sociocultural whole, it’s important to consider those other aspects of genre, too. And that’s why the next few pages deal with paramusical matters.

Parameters of paramusical expression

Since music is not a universal language (pp. 47-50) it’s essential to consider cultural parameters defining the act of musical communication. Obviously, what members of different populations intend by and interpret from the music they make and hear will vary considerably; and, as we saw earlier (pp. 178-182), the same musical structure doesn’t necessarily mean the same thing to all individuals included in the same basic demographic. That’s one reason why, under Ethnographic intersubjectivity (p. 199, ff.), listening mode, venue, activity and scene were put forward as important initial points in a semiotic approach to music analysis. Those general considerations refer back to the communication model (pp. 172-178) and can be summarised as follows.

General aspects of paramusical communication

1. Who, culturally and demographically, are the music’s transmitter[s] and receiver[s]? Do they belong to the same population? What sort of relationship exists between transmitter[s] and receiver[s] of the music in general and at the particular occasion of musical communication you’re studying?

2. What motivates receiver[s] to use the music and what motivates transmitter[s] to create and transmit the music?

3. What interference (p. 182, ff.) is the intended message subjected to in its passage in the channel? Do transmitter[s] and receiver[s] share the same store of symbols and the same sociocultural norms/motivations? What bits of the music do[es] the receiver[s] hear, use and respond to? What sort of response is observable?

4. What aspects of attitude or behaviour at the transmitting and receiving ends affect the musical ‘message’?

5. What is the intended and actual situation of musical communication for the music both as a piece and as part of a genre, e.g. dance, home, work, ritual, concert, meeting, film? Where, physically and socially, is the music produced and where is it heard and used?

These issues of genre rather than style affect what music is actually made and heard: they influence which parameters of musical expression are operative. Even if cultural context isn’t the main focus of your study they must be addressed in order to avoid the ‘perverse discipline’ of semiotics without pragmatics.

Simultaneous paramusical forms of cultural expression

As briefly illustrated by clinking glasses, lively chatter and raucous laughter in the comparison between the official and foreign drunk versions of your national anthem (pp. 229-237), musical meanings aren’t only affected by the overriding sociocultural and acoustic circumstances under which the music is created and heard: they are also influenced by paramusical expression. Obviously, hearing a rendition of your national anthem along with clinking glasses and raucous laughter does not have the same effect as hearing it without. Nor does the same opening to Richard Strauss’s Also sprach Zarathustra (1896) mean the same thing in TV commercials for Foxy Bingo or Silan fabric conditioner as it did in films like Clueless (1995), My Favourite Martian (1999) or Zoolander (2001). It certainly meant something quite different in the film that initiated this musical trope of audiovisual grandeur—Kubrick’s 2001 (1968)—, not to mention its origins in a philosophical fantasy novel by Nietzsche.

Checklist of paramusical types of expression

Paramusical forms of expression connected to a musical analysis object are summarised in the eight points listed next. The relative absence or presence and properties of the points enumerated are not only able to affect the meaning of the music with which they co-occur: they can also be used in the process of establishing paramusical fields of connotation related to your analysis object.

1. Paramusical sound, e.g. church bells, background chatter, rattling crockery, applause, engine hum, birdsong, sound effects.

2. Oral language, incl. dialect, accent, idiom, vocabulary used in dialogue, commentary, voice-over or lyrics.

3. Paralinguistics, e.g. vocal type, timbre and intonation of people talking; type and speed of conversation or dialogue.

4. Written language, e.g. programme or liner notes, advertising material, title credits, subtitles, written devices on stage or screen, expression marks and other scribal performance instructions.

5. Graphics, typeface/font, design, layout, etc.

6. Visuals, e.g. photos, moving picture, type of action, narrative genre, mise en scène, scene, props, lighting, camera angle and distance, POV, editing rhythm and techniques, superimpositions, fades, zooms, pans, gestures, facial expressions, clothing.

7. Movement, e.g. dance, walk, run, drive, fall, lie, sit, stand, jump, rise, dive, swerve, sway, slide, glide, hit, stroke, kick, stumble, forwards, backwards, sideways, up, down, approach, leave, fast, slow, sudden, gradual.

8. Location, venue and audience (when, where, and who for), e.g. 18th-century French aristocrats in a château, aliens on the starship Enterprise, euroclassical concert hall audience, rock fans at a stadium concert, 1970s disco clubbers, football match crowd, etc. Sartorial, gestural and other group-behavioural codes are an important ingredient of paramusical connotation.



Parameters of musical expression

Parameters of musical expression can be thought of in four main interrelated and overlapping categories: [1] Time, speed and space (this chapter); [2] Timbre and loudness (first part of Chapter 9); [3] Tone and tonality (second part of Chapter 9); [4] Totality (the parameter ‘aggregates’, Chapters 11-12). Chapter 10 is entirely devoted to ways of designating different types of vocal expression. Please remember that very few concepts denoting parameters of musical expression fit neatly into any one of the first three categories and that category 4 includes several by definition. For example, nothing in categories 2 (timbre and dynamics) or 3 (tone and tonality) can exist without the parameters of time and space (category 1); nor can elements of temporal organisation like rhythm and metre exist without timbral, dynamic or tonal patterning, nor can tone or timbre be understood without considering pitch and loudness.

None of this taxonomic untidiness will surprise those familiar with cross-domain representation, synaesthesis or music and the brain (pp. 62-71). After all, no sound can exist without the movement of an object or mass of some kind (incl. air and water) interacting with another (hitting, stroking, scraping, shaking, ruffling, blowing, stirring, etc.), nor can such sound-producing friction occur without energy enabling the movement which, in its turn, presupposes space in which the movement takes place. Even synthesised sound needs energy (electrical) to generate wave forms of sufficient amplitude to power movement in speaker and headphone membranes. Since all this sound-producing energy and movement occupies both space and time, parameters of expression primarily relating to time and space are presented first. However, it’s virtually impossible to discuss any aspect of musical structure without using four very common concepts whose meanings are often unclear. That’s why note, pitch, timbre and tone each needs its working definition. A piece of music and the extended present are two other essential terms requiring at least some sort of clarification. We’ll start with the latter.

Basic concepts (2)

Piece of music

A piece of music is usually delimited, both before and after, by something that isn’t heard as music (e.g. silence, talking, background sound). A piece of music can also start or end when immediately preceded or followed by other music that is clearly recognised to have a different identity. If a piece of music exists as recorded sound, it will typically occupy one CD track or constitute a single audio file.

Extended present

The extended present is a key notion in the conceptualisation of time in music. It can be understood as lasting for about as long as breathing in and out, or as a few heartbeats, or as enunciating a phrase or short sentence, i.e. as a duration equivalent to that of a musical phrase, or to a short pattern of gestures or dance steps. Such immediate, present-time activities usually last, depending on tempo plus degree of exertion, for between around one and eight seconds of ‘objective’ time.

The extended present is also a concept implied in the distinction between the intensional and extensional aesthetics of music (Chester, 1970). According to this polarity, a classical sonata form movement (see p. 409 ff.) is more likely to derive interest from the presentation of ideas over a duration of several minutes (extensional), while a pop song or film music cue is more likely to do so in batches of ‘now sound’ in the extended present (intensional). The 3.6 seconds of guitar riff accompanied by bass and drumkit in Satisfaction (Rolling Stones, 1965) provides a textbook example of rock intensionality in the extended present.

There’s no clear boundary between the extended present and the passing of time along a unidimensional axis from infinite past to infinite future through a point of supposedly no duration (the present). If you have ever stared transfixed at a sunset over the sea, or drowned with delight in the eyes of your beloved, you’ll know that ‘now’ can extend for many seconds. Time just seems to stand still. But it’s also worth knowing that the extended present has an objective existence inside the human brain. For example, knowing how to finish the spoken sentence you just started relies on the short-term storage of information in a different part of the brain ―the working memory― to that used for medium- and long-term storage. The phonological loop is a key component in working memory. It can hold about two seconds of sound and, like a loop-based tape echo unit, involves two stages: short-term storage with rapidly decaying auditory memory traces and an ‘articulatory rehearsal component’ that can revive those traces. Each phonological loop is like an ongoing mini-chunk of information that can be recalled and strung together with up to three others in immediate succession (four in all) to produce a larger chunk of ‘now sound’ covering a maximum of eight seconds. This distinction is not unlike that between a computer’s RAM and its hard drive. It also means that setting the duration of the extended present to ‘between one and eight seconds’, an estimation based solely on timing a large number of musemes and musical phrases, cannot be dismissed as fanciful speculation.


Note

In musical contexts note means four different things: [1] a single, discrete sound inside a piece of music; [2] such a sound with discernible fundamental pitch (p. 277); [3] the duration, relative to the music’s underlying pulse (p. 288), of any note (e.g. ‘whole note’, ‘quarter note’); [4] the scribal or graphic representation of a note, according to any of the above definitions, in musical notation. Note will be used here in its first sense, i.e. to mean any single, finite, discrete minimal sonic event in a piece of music, irrespective of that event’s duration, pitch, timbre or graphic representation.


Pitch

Pitch is that aspect of a sound determined by the rate of vibrations producing it. Pitch is scientifically measured in units of sound wave frequency called cycles per second or Hertz (Hz). 27.5 Hz is, for instance, the pitch of the lowest note on a piano (‘bottom a’), 4,186 Hz its highest (‘top c’). Pitch is simply the degree of perceived ‘highness’ or ‘lowness’ of a sound.
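The two piano frequencies just quoted can be checked with a short calculation. The sketch below rests on assumptions not stated in the text: twelve-tone equal temperament, a reference pitch of 440 Hz for the a above middle c, and a standard 88-key piano whose keys are numbered from 1 (bottom a) to 88 (top c). The function name `piano_key_frequency` is made up for the purpose.

```python
def piano_key_frequency(key, a4_hz=440.0):
    """Frequency in Hz of piano key `key` (1 = bottom a, 49 = a at 440 Hz, 88 = top c).

    Assumes twelve-tone equal temperament: each key is a factor of 2**(1/12)
    above its lower neighbour, so twelve keys (an octave) double the frequency.
    """
    return a4_hz * 2 ** ((key - 49) / 12)

print(round(piano_key_frequency(1), 1))   # bottom a
print(round(piano_key_frequency(88)))     # top c
```

Under those assumptions bottom a comes out at 27.5 Hz and top c at roughly 4,186 Hz, matching the figures given above.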

High pitch is in general associated with light in both the ‘not dark’ and ‘not heavy’ senses of the word. Perhaps that’s because gusts of wind scatter leaves, plastic bags and other small, light objects, blowing them up into the air towards the sky, the clouds and the sun. Heavy objects are more difficult to move, more likely to stay on the ground, which is normally perceived as darker and heavier than air. Not only do large, heavy objects need lots of energy (a tornado, say, or vast amounts of jet fuel) to get them off the ground; their very weight makes them appear less volatile, more likely to be understood as heavy, dark or massive rather than quick, small or light. Besides, small children have smaller bodies and vocal equipment producing ‘higher’, ‘lighter’ sounds than grown-ups; and the process whereby adolescent male voices break and deepen reinforces the same sort of synaesthetic patterning, as does the fact that singers tend to use the head register to produce high notes, the chest register for low ones. Moreover, the vibrations of a loud bass instrument, or of an earthquake, are felt in the abdomen, whereas dissonant high-pitched sounds are often used in film music as a sort of sonic headache to accompany scenes of mental disorder, relentless sunlight, etc.

Along with volume, timbre and duration, pitch is a basic element of sound. It allows humans to distinguish between, for example, a hi-hat and a large gong struck in the same way, or between the top notes of a piccolo flute and the lowest ones on alto flute played at the same volume with the same sort of attack for the same duration. Now, there’s a problem with that previous sentence because the high or low pitch of flute notes is different from the high or low pitches of hi-hat and large gong, even though the sound of a big gong contains a lot of low frequencies and the hi-hat (quelle surprise!) sounds high. That problem has to do with the difference between note and tone.


Tone

The difference of pitch between hi-hat and large gong, on the one hand, and, on the other, between high and low flute notes is that flute notes, high or low, each have one clearly discernible fundamental pitch while hi-hat, snare drum, bass drum and gong notes do not. It’s this factor of clearly discernible fundamental pitch —a concept explained under timbre (p. 279, ff.)— that determines whether the note in question is also a tone. A tone is simply a note of discernible fundamental pitch.

The main reason why, technically speaking, a tone contains a fundamental pitch is because its sound wave rate is steady or periodic, whereas aperiodic sounds exhibit no such regularity (fig. 8-1). That’s why singing is heard as more tonal than talking, whistling more so than hissing, groaning more tonal than grunting. All six sounds can be used as notes in music but only three of them (singing, whistling and groaning) are likely to be tonal. It may be worth adding the obvious point that, in our culture, tones are the only type of notes with pitch names like a, b♭, b♮, c, c♯, etc. That all seems quite straightforward but there are at least two major problems with the word tone.
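The difference between periodic and aperiodic sound can be made tangible with a rough numerical test. The sketch below is an assumption-laden illustration, not a description of how pitch is actually perceived: it compares a signal with a copy of itself shifted by one supposed period (a simple autocorrelation). A steady sine wave matches its shifted copy almost perfectly, whereas random noise, standing in for something like a hiss, does not. The sample rate, the 80 Hz test tone and the function name `autocorrelation` are all chosen purely for illustration.

```python
import math
import random

def autocorrelation(samples, lag):
    """Normalised correlation between a signal and itself shifted by `lag` samples.

    Values close to 1 mean the wave shape repeats after `lag` samples
    (periodic at that rate); values near 0 mean no such regularity.
    """
    mean = sum(samples) / len(samples)
    centred = [s - mean for s in samples]
    a = centred[:len(centred) - lag]
    b = centred[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

SAMPLE_RATE = 8000           # samples per second (illustrative)
PERIOD = 100                 # samples per cycle, i.e. an 80 Hz tone

# Periodic: a steady sine wave repeating every PERIOD samples.
tone = [math.sin(2 * math.pi * i / PERIOD) for i in range(2000)]

# Aperiodic: pseudo-random noise (seeded so the result is repeatable).
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

print(autocorrelation(tone, PERIOD))
print(autocorrelation(noise, PERIOD))
```

Real singing or whistling is of course never as perfectly regular as the sine wave used here; the point is only that a tone’s wave shape repeats at a steady rate while that of a hiss or grunt does not.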

One problem is that tone means so many different things in relation to sound. It can refer to aspects of speech that express attitude, as in ‘I don’t like your tone’. It can even mean timbre, as with the ‘tone’ knob on a guitar amp, where tone is short for tone colour. Tone is often used to mean not a note of discernible fundamental pitch but the pitch step or interval (p. 322 ff.) between two neighbouring tones, as in whole tone (e.g. between the notes c and d) and semitone (e.g. between e and f).

Another critical problem with tone is the use of its derivatives tonal and tonality in conventional Western music theory. Given our commonsense definition of tone, tonal should logically mean having discernible fundamental pitch and tonality should mean any system according to which tones are configured in music. Unfortunately, many music scholars in the West still use tonal and tonality to refer to just one way in which tones are configured —that of the euroclassical repertoire between c. 1730 and c. 1910. This ethnically, socially and historically restrictive use of the word has bizarre consequences, one being the nonsensical dualism tonal v. modal —all modes are by definition tonal!—, another the anachronism of twelve-tone music which despite its name is called atonal instead of atonical! To avoid such lexical absurdity, here are the definitions I’ll be using.

• tone (n.): a note with discernible fundamental pitch;

• tonal (adj.): having the properties of a tone;

• tonality (n.): any system according to which tones are configured;

• tonic (n.): musical keynote or reference tone;

• tonical (adj., neol.): having a tonic or keynote.


Timbre

Timbre [ˈtæmbrə] and its adjective timbral [ˈtɪmbrəl] are words denoting acoustic features that allow us to distinguish between two notes, tonal or otherwise, sounded at the same pitch and volume. Timbre, sometimes also called ‘tone quality’ or ‘tone colour’ (Klangfarbe), is a complex acoustic phenomenon whose four basic phases were simplified by analogue synthesiser manufacturers in an ‘ADSR’ scheme: A for attack, D for decay, S for sustain and R for release. The properties of each of these elements, and how those properties vary as the sound of a note is produced, continues and ends, determine the specific qualities of what we hear as timbre. That whole unit from start to finish is called the envelope (Fig. 8-2, p. 278).

The envelopes of notes played on drums, piano and other percussion instruments, as well as those of notes on plucked acoustic instruments, consist of only attack and decay. Those played by bowed strings, woodwind, brass and electrically amplified instruments contain all four phases. The first type of note relies on a one-off action to produce a sound that can last from as little as just a few milliseconds (e.g. xylophone) to several seconds (e.g. large gong, loud held note on the piano, as in Fig. 8-2a and b). The second type is generated by ongoing action (bowing, blowing, electric current, etc., as with the violins and synthesiser in Fig. 8-2c and d.). These and other distinctions are essential to the understanding of how timbre is produced. However, for the purposes of a perception-based semiotic analysis the following three phases, explained next, will probably suffice: attack, continuant and release.
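The ADSR scheme can be sketched as a simple function giving a note’s loudness level at any moment after it starts. The version below is a deliberately crude model under stated assumptions: every phase is drawn as a straight line, whereas real envelopes curve, and all names and durations are invented for illustration. A struck or plucked note is modelled by setting the sustain level, sustain time and release to zero, leaving only attack and decay.

```python
def adsr_level(t, attack, decay, sustain_level, sustain_time, release):
    """Level (0.0 to 1.0) at time `t` seconds of a piecewise-linear ADSR envelope.

    attack:        seconds to rise from 0 to full level
    decay:         seconds to fall from full level to `sustain_level`
    sustain_level: level held while the sound-producing action continues
    sustain_time:  seconds for which that level is held
    release:       seconds to fall from `sustain_level` to silence
    """
    if t < 0:
        return 0.0
    if t < attack:
        return t / attack
    t -= attack
    if t < decay:
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain_time:
        return sustain_level
    t -= sustain_time
    if t < release:
        return sustain_level * (1.0 - t / release)
    return 0.0

# One-off action (e.g. a struck note): attack and decay only.
print(adsr_level(0.26, attack=0.01, decay=0.5,
                 sustain_level=0.0, sustain_time=0.0, release=0.0))

# Ongoing action (e.g. a bowed note): all four phases.
print(adsr_level(0.5, attack=0.1, decay=0.1,
                 sustain_level=0.7, sustain_time=1.0, release=0.2))
```

In terms of the three perception-based phases, the rising part corresponds to the attack, the final fall to the release, and whatever lies in between (decay or sustain) to the continuant.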

Attack refers to the initial fraction of a note corresponding to the way the note is struck, hit, plucked, scraped, blown, etc. on an acoustic instrument, or ‘attacked’ by the voice. For example, it’s easy to distinguish the same note of the same duration played at the same volume in the same position on the same string on the same guitar in the same room if the instrument is plucked with the flesh of the thumb rather than with a plectrum.

Release refers to the way a note ends. For example, xylophone and unsustained piano notes end more abruptly than piano notes played with the sustain pedal pushed down, or than undamped or unclipped notes on, say, guitar, French horn or cello. Release is often audible when violinists take their bow off the string at the end of a long note (Fig. 8-2c).


Fig. 8-2. Attack, decay, sustain, release: four envelopes

Continuant is a term I’ve borrowed from phonetics where it means an extendable or sustainable consonant, like /rː/ as in ‘Rrreally!’ or /ʃː/ as in ‘Shshsh!’ when you want others to be quiet. I’m adapting continuant here to denote in a more aesthesically friendly way the ongoing ‘body’ of a note, i.e. the part that is most likely to be heard as tonal, regardless of whether it’s the decay of struck or plucked notes or the sustain part of notes produced in other ways. Continuants are easily conceptualised in onomatopoeias like ding and pling (two small bells?) or twang and blang (two electric guitar sounds?): the initial consonants represent the sound’s attack, ng its release and the vowels its continuant. Unless you’re hearing, say, a xylophone or short, unsustained notes on piano or acoustic guitar, a note’s continuant is usually, compared to the attack, a longer sound whose timbre is acoustically determined by its frequency spectrum, i.e. by how much of which frequencies it contains. And that, finally, is where fundamental pitch comes in.

As we just saw, some musical sounds, like those of the hi-hat and a kick drum, although heard as high- and low-pitched respectively, are aperiodic (Fig. 8-1, p. 275): they have no audible fundamental pitch. The frequency spectrum of tonal instruments and singing voices, on the other hand, is periodic in relation to a fundamental. Now, a tone sung or played at a particular pitch doesn’t just consist of waves oscillating at the rate corresponding to that single pitch, its fundamental: it also contains the sound waves of overtones or harmonics (a.k.a. partials) oscillating at integral multiples of the fundamental’s own frequency. How strongly which harmonics are present in which parts of an envelope is an essential aspect of timbre.
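The phrase ‘integral multiples of the fundamental’s own frequency’ can be made concrete with a few lines of Python. This is only a sketch: the function name is hypothetical and the 110 Hz fundamental is an arbitrary example value.

```python
def harmonic_series(fundamental_hz, count=6):
    """List the first `count` integral multiples of a fundamental frequency.

    Multiple 1 is the fundamental itself; multiple 2 oscillates at twice
    its frequency, multiple 3 at three times, and so on.
    """
    return [n * fundamental_hz for n in range(1, count + 1)]


if __name__ == "__main__":
    # 110 Hz is an assumed example fundamental, not a value from the text.
    for n, hz in enumerate(harmonic_series(110.0), start=1):
        print(f"multiple {n}: {hz:.0f} Hz")
```

Timbre then depends on how strongly each of these frequencies is present, which this sketch deliberately leaves out.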

Fig. 8-3. Sound waves for flute, clarinet, trumpet and piano

The four sound waves shown in Figure 8-3 are all periodic in that they all have a regularly recurring wave pattern. They also all have a strong fundamental (the first peak in each phase) but the similarities end there. Flute tones contain a strong first overtone, oscillating at twice the frequency of the fundamental, but not much else (hence the wave form’s characteristic single bulge), while tones played on the other instruments consist of a more complex array of frequencies in the harmonic series producing more complex wave forms. The almost limitless range of combinations of variable amounts of harmonics present in a tone (its frequency spectrum) makes timbre an essential parameter of musical expression. Variations of vocal timbre can be particularly expressive and are discussed in Chapter 10.

With these explanations of basic terms out of the way we can now confront the main topic of the next few chapters —the parameters of musical expression. The underlying premise is that change in any of the parameters, by definition involving a change of sonic structure, can also bring about a change of meaning.




Time, speed and space


It may be helpful to think of musical durations in five fuzzy categories: [1] micro-durations, lasting typically less than 1 second; [2] meso-durations, equivalent to the time span of at least one but no more than, say, eight bouts of the extended present (= 1"-60"); [3] mega-durations, ranging from the time occupied by a long advert or title theme (c. 1 min.), through that of an up-tempo dance number (= 2 mins.) to the standard length of a pop song, rock track, Schubert Lied, or short euroclassical movement (= 3-6 mins.); [4] macro-durations, typical for extended euroclassical symphony movements, for jazz or prog rock tracks containing multiple sections and/or lengthy solo improvisations (= 6-30 mins.); [5] giga-durations (>= 30 mins.), as for a complete opera, a Mahler symphony, or a traditional live rāga performance. Only micro-, meso- and mega-durations need concern us here.

Micro-durations: notes and pauses

Micro-differences of 100 milliseconds (one tenth of a second), or even less, can produce significantly different linguistic and musical effects. Figure 8-4 (p. 282) shows the durations in milliseconds (ms) of four different ways of asking the question ‘What did you say?’. Version [a] sounds angry — ‘WHAT …[the] & DID YOU SAY?!’; [b] asks ‘what did you actually say rather than mean to say?’; [c] sounds robotic; version [d] is spoken quickly, as in everyday conversation.

Fig. 8-4. ‘What did you say?’ – four patterns of micro-duration

At least five points of micro-duration are worth noting here.

1. Version [a] is longer than both [b] and [c], much longer than version [d]. Loudness, pitch, timbre and propulsive reiteration (p. 518 ff.) aren’t the only parameters determining sonic emphasis because, clearly, the more time you spend on one idea, the more it will be heard. This observation holds for individual syllables (notes) within the phrase (e.g. the what in [a], say in [b]) as well as for the whole phrase in relation to other phrases in its vicinity.

2. Variants [b] and [d] are both articulated as one single and uninterrupted stream of sounds, i.e. legato, Italian for ‘joined’. The same phrase is broken in variant [a] (anger) by a pregnant pause (tacet is Latin for ‘is silent’) lasting over half a second: ‘WHAT [pause] did you say?’. The 600-millisecond pause, as long as the whole of variant [d], is just as communicative as the ‘WHAT’ outburst that preceded it. Duration of silence in speech and in music can be as communicative as duration of sound.

3. Instead of one unbroken enunciation, all four notes (syllables) in variant [c] (robot) are sounded for the same duration (200-250 ms each) separated by short silences of equal length (150-200 ms). The phrase’s notes are presented in a choppy manner (staccato): the notes or syllables are detached from the flow of the statement to which they would normally belong in human speech.

4. Each note or pause in each of the four variants occupies a micro-duration inside a longer duration —the phrase ‘What did you say?’. Those different configurations of micro-durations give each variant its own identifiable rhythm.

5. Variation of micro-durations in music corresponds roughly with what classically trained musicians call phrasing because those durations are constituent elements in a musical phrase.

The patterning of micro-durations in music can have other significant effects. For example, the time difference between placing notes one half and two thirds of the way between beats is one sixth of a beat, e.g. 76 milliseconds at 132 bpm. That micro-duration makes all the difference between straight (½: the beat divided into two equal halves) and swung (⅔: the beat divided long-short, roughly two to one) articulations of the beat. Micro-durations are significant because their patterning contains important emotional and kinetic information. They are essential in mediating ‘feels’ that sound choppy or smooth, straight or swung, stuttering or flowing, distinct or fuzzy, nervous or confident, bold or timid, etc., as well as in mediating certain notions of space (p. 298 ff.).
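The 76-millisecond figure can be checked with a short calculation. The Python sketch below uses a hypothetical function name and treats ‘straight’ and ‘swung’ simply as fractions of the beat (one half and two thirds), which is the simplification used in the paragraph above.

```python
def offbeat_gap_ms(bpm, straight=1 / 2, swung=2 / 3):
    """Time difference in milliseconds between a straight and a swung offbeat.

    `straight` and `swung` are positions within the beat, expressed as
    fractions of the beat's duration.
    """
    beat_ms = 60_000 / bpm              # duration of one beat in milliseconds
    return (swung - straight) * beat_ms


if __name__ == "__main__":
    for tempo in (90, 132, 180):        # 132 bpm is the tempo cited in the text
        print(f"{tempo} bpm: {offbeat_gap_ms(tempo):.0f} ms")
```

The faster the tempo, the smaller the gap, which is one reason why swing is harder to articulate distinctly at high speed.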




Musical phrases are the basic units of meso-durations. Depending on tempo and degree of exertion, they can last for as little as one second and as much as around eight —the time span of the extended present (p. 272). Consecutive phrases tend to be separated from each other by the sort of time it takes to breathe in. Four seconds is a typical phrase length, equivalent to the time it normally takes to in- and exhale (the ‘ventilation cycle’, i.e. the duration of the extended present). Pertinent questions to ask about phrases are: What is their length? Are they extensive, controlled, lyrical or ecstatic? Or are they short, stressed, ‘out of breath’, just consisting of short motifs? Is phrase length consistent or does it vary? What effects do these phrase lengths create?


Motifs are either constituent parts of a phrase or extremely short melodic figures in themselves. They rarely last for more than a second or two and are building blocks not only in melodic construction but also in instrumental patterns like riffs. Motifs differ from phrases, not just by being shorter, but also in that they can be ongoing, as in the obvious case of repeated guitar riffs where no breathing space is required.


A musical period consists of at least one phrase. In many types of dance music and popular song periodicity is regular: the periods are of equal length, often arranged in multiples of 4 bars. A period consisting of 4 bars of 4/4 metre at 96 bpm (16 beats, 10 seconds) will most likely consist of two 2-bar phrases, each lasting 5 seconds. A period consisting of 8 bars of 4/4 at 120 bpm (32 beats, 16 seconds) will probably consist of four 2-bar phrases, each lasting 4 seconds.

In urban Western cultures, music relating to gross-motoric movement, especially music with an energetic groove (p. 296), is, as just mentioned, usually organised in larger symmetric durational units, typically in multiples of four —the four-bar phrase, the eight-bar period and so on. This quadratic symmetry of meso-durations applies not just to the obvious rectangularity of marches and techno but also to music for many types of dance, from slow foxtrots or waltzes to energetic jives, jigs, reels or sambas. Such symmetrical periodicity serves to organise dance steps into longer patterns, as exemplified in Table 8-1.

Table 8-1. Gay Gordons step patterns at 112 bpm over 8 bars of 4/4 (17")

bars   secs.       beats   dir.   steps
1-2    0-4.3       1-8     ↺      ♀♂ 4 steps forward, turn; ♀♂ 4 steps back
3-4    4.3-8.6     9-16    ↻      ♀♂ 4 steps forward, turn; ♀♂ 4 steps back
5-6    8.6-12.9    17-24   ↺      ♀ ↻↻ under ♂’s right arm held high
7-8    12.9-17.1   25-32   ↺      ♀♂ ↻↻, ♀♂ ↻↻ together, polka twirl


Each of the first sixteen footsteps in a Gay Gordons coincides with a beat of the music. At 112 bpm that means one beat = one footstep every 0.54 seconds. Each step pattern lasts four beats (one 4/4 bar) or 2.1 seconds. Each of these two-second, four-beat patterns has the duration of a musical phrase and is repeated with minor variations to create a larger pattern spanning two bars of 4/4 (8 beats in 4.3"), a duration still within the limits of the extended present. That two-bar pattern is in its turn repeated with minor variations to create a four-bar period of sixteen beats (4 bars of 4/4) or 8.6 seconds, a duration equivalent at 112 bpm to two bouts of the extended present. The complete 17-second or 8-bar pattern of Gay Gordons steps falls into two clearly distinguishable four-bar periods, the first (bars 1-4) containing simple steps forward and backward, the second (bars 5-8) featuring two sets of two clockwise spins. Ladies (♀) spin clockwise eight times and men (♂) four times (↻ in column 5 of Table 8-1) as both partners proceed in a generally anticlockwise direction (↺ in column 4) to complete the entire 17-second pattern consisting of 32 beats (8 bars of 4/4). That entire sequence is then repeated starting from a new position on the circumference of the shared dance floor. Finally, if the whole sequence of steps is repeated eight times at 112 bpm the dance will last for 2:17, by which time you will have held hands with your partner for 1:43 and spun around in each other’s arms, polka style, for the remaining 0:34.
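The durations in this paragraph all follow from one number, the tempo. The Python sketch below recomputes them; the function and key names are hypothetical, and the only assumption beyond the text is that the polka twirl occupies the last two bars of each eight-bar cycle, as in the final row of Table 8-1.

```python
def mmss(seconds):
    """Format a duration in seconds as m:ss, rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


def gay_gordons_timing(bpm=112, beats_per_bar=4, bars=8, repeats=8):
    """Return the hierarchy of durations (in seconds) for the dance."""
    beat = 60 / bpm                     # one footstep
    bar = beats_per_bar * beat          # one four-beat step pattern
    cycle = bars * bar                  # complete eight-bar sequence
    total = repeats * cycle             # whole dance
    twirl = repeats * 2 * bar           # bars 7-8 of every cycle (assumed)
    return {
        "beat": beat,
        "bar": bar,
        "two_bars": 2 * bar,
        "four_bar_period": 4 * bar,
        "cycle": cycle,
        "total": total,
        "twirl": twirl,
        "holding_hands": total - twirl,
    }


if __name__ == "__main__":
    timing = gay_gordons_timing()
    for name in ("beat", "bar", "two_bars", "four_bar_period", "cycle"):
        print(f"{name:16s} {timing[name]:6.2f} s")
    for name in ("total", "holding_hands", "twirl"):
        print(f"{name:16s} {mmss(timing[name])}")
```

Changing `bpm` shows how quickly a two-bar pattern drifts towards, or beyond, the limits of the extended present.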

The Gay Gordons example serves two purposes. The first is to illustrate the hierarchy of music’s meso-durations, ranging from smaller units within the extended present, through longer periods incorporating two or more such segments, to complete episodes like the entire 17-second cycle of dance movements. Periodicity is the operative word here. All too often overlooked in conventional music analysis, periodicity simply means the way in which musical meso-durations (phrases in particular) are configured within the same piece, whether they are long or short, regular or irregular, symmetrical or asymmetrical, etc. The second point of Table 8-1 is to exemplify the regular periodicity that characterises not only most types of dance and march but also work songs, in fact any music with an energetic groove (p. 296) relating to gross-motoric body movement, be it fast or slow.

Regular periodicity is common in situations where coordination of movement between individuals is essential, as in sharing space on the dance floor, marching on a parade ground, or in collaborative tasks of manual labour like hoisting the topsail, weighing anchor, cross-cutting trees, hauling barges, or track lining a railway. The greater the need for concerted simultaneity and the greater the number of people involved in the activity, the more regular and symmetrical the periodicity of music connected with that activity is likely to be. Just consider the difference between delivering a political speech (one person) and chanting political slogans (many people), or between a ‘personal’ rock ballad like Your Song (John, 1970) and a ‘collective’ rock anthem like We Will Rock You (Queen, 1977), or between country blues (one performer) and urban blues (usually several musicians).

Irregular periodicity, on the other hand, is more likely to cause Western listeners some sort of surprise, even confusion, because it either delays (‘We should be in the next bit by now!’) or anticipates whatever is expected to happen next (‘Whoops! That caught me off guard.’). It’s also common when the rhythm of lyrics or visual narrative overrides the expected regularity of musical events. An extended period of, say, 4½ or 5 bars instead of the usual 4 in a popular song can communicate something like ‘The words are important here and I’m going to fit them in even if it means spending a little extra time on them’, while cutting a period short can tell the listener ‘there’s no waiting for the next bit’. However, irregular periodicity can also help create effects at the opposite end of the spectrum from urgency or confusion. If the music is soft, the tempo moderate or slow, the articulation relatively smooth, if there are no sudden surprises and, most importantly, if the music features little or no energetic groove, then the symmetric arrangement of meso-durations into regular quadratic patterns can be discarded. That helps create a reflective, meditative and rhapsodic groove free from the constrictions of movement immanent in the social organisation of time and space typical for marching, dancing, work songs or street slogans. Instead, a floating sense of relative stasis and tranquillity can be produced, with tonal and timbral parameters helping define its mood as serene or desolate, relaxing or foreboding, etc.

Appropriate questions about periodicity might be: Is it constant or varied? Regular or irregular? What effects are created by the music’s periodicity? Are the lyrics or the dance groove more important? Is it a theme tune (regular periodicity more likely) or a piece of underscore (irregular periodicity more likely)? Is it a solo or ensemble piece?

Episode (section)

One step up the hierarchy of durations from phrases and periods, but below that of a complete normal-length piece, comes the category of section or episode. In point of fact, episodes aren’t so much definable in terms of duration as of their distinctiveness as regards sonic content and expressive character. Episodes like verse and refrain, ‘A’ and ‘B’ sections in a jazz standard, etc. are the basic building blocks of the longer narrative processes (mega-durations) discussed in Chapter 11.


The basic unit of mega-duration is that of an entire piece, be it a title theme lasting less than a minute, a pop song or Schubert Lied lasting less than three minutes, or a shortish movement from a euroclassical symphony, or a prog rock or jazz track lasting six minutes or more. There’s clearly no point thinking in terms of mega-duration if the piece is a jingle, bridge or tail lasting no more than a few seconds, but even a 60-second TV theme tune contains phrases and periods, often also episodes. If so, its identity as a piece is partly determined by the way in which constituent episodes are managed in terms of order, relative duration, etc. inside its total duration. We’ll return to questions of musical narrative in Chapter 11.


A sense of speed in music is primarily created by using two parameters of expression: tempo and surface rate.

Tempo, beat and pulse

Musical pulse or tempo is measured in beats per minute (bpm), a rate also known as its metronome marking. Pulsus, the Latin origin of the word pulse, means beat, as in heartbeat. Metronome markings range from 40 to 212. This range of bpm relates directly to human pulse: 40 bpm is that of a well-trained athlete in deep sleep and 212 bpm that of a baby in a serious state of stress. Most metronome markings are in the range 50 to 160 bpm and can also be related to footsteps: 52 bpm for a slow funeral procession, 90 bpm for a pleasant stroll, 120 for a brisk march, 160 for long-distance running and 200 for an Olympic 100-metre sprint. Tempo provides the underlying pace of a piece of music.

Tempo beats can be stated either explicitly, as in the obvious four to the floor kick drum sound of electronic dance music, or implicitly as points at regular intervals inferable from the rate at which prominent parts of the music seem to move.

Beat is often used loosely to refer to combinations of tempo, metre and rhythm (e.g. ‘breakbeat’), but it is strictly speaking no more than the occurrence, at regular intervals of between 0.67 and 3.5 per second (40-210 bpm), of points in time comparable with those defining the duration of heartbeats or breaths. The constant presence of these elemental biological functions throughout life makes them inevitable reference points for experiencing and measuring the speed and duration of other sound and movement. That’s why we can feel a beat in music even if none is audible and why the beat, in this strict sense of the word, is not only the basic unit of tempo but also of metre. It’s also why notes sounded regularly at a rate outside the limits of the metronome can’t be musical beats. If they’re much below 50 bpm we’ll hear them occurring on at least every other beat; if they exceed 200 we’ll hear two for every passing beat. Such extra-metronomic rates are often important in communicating a musical sense of speed, as we’ll see next.

Surface rate

If tempo, with its pulse quantifiable in bpm, indicates the music’s underlying pace, its surface rate can be measured in notes per minute (npm) indicating the speed at which actual notes are sounded or implied. For example, a homophonic hymn sung at 80 bpm with virtually all its notes running at the same rate (80 npm, i.e. one note per beat) sounds much slower than a TV theme tune played also at 80 bpm but with plenty of notes running two, three, four or even six times faster than the underlying tempo. Put simply, dum diddley diddley dum sounds a bit faster than dum diddle diddle dum, definitely faster than dum did- did- dum, and radically faster than dum —— dum, even though the duration between each beat on dum is identical in all four cases. Therefore, when talking about a sense of speed in music, it’s essential to consider both the underlying pulse (the tempo, the metronome marking, the dum-dum or boom-boom factor, ‘the bpm’) and the music’s surface rate (the diddle-diddle or diddley-diddley factor, ‘the npm’). Put another way, tempo (bpm) can usually, though not always, be thought of in terms of gross motoric movement and surface rate (npm) as fine motoric (p. 63, ff.).

Now, although surface rate is usually quicker than tempo, it can often be the same, as in homophonic hymns, and, occasionally, slower. Consider, for example, the TV theme for NYPD Blue (Post, 1993 H) which runs for one minute at a stable 120 bpm. At the start, loud drums establish a surface rate four times faster (480 npm) than the tempo (120 bpm), but at 0:19 the drums fade into the far distance and a pastoral theme, carried by sampled cor anglais and a string pad, occupies the foreground with a surface rate four times slower (30 npm) than the underlying tempo. The stark contrast of relationship between the constant tempo (120 bpm) and the two radically different surface rates (480 and 30 npm) creates two dramatically different moods giving two very different impressions of speed and space.
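The relation between tempo and surface rate in the NYPD Blue example is a simple multiplication, sketched here in Python. The function name is hypothetical; `notes_per_beat` is greater than 1 when the surface moves faster than the pulse and less than 1 when it moves more slowly.

```python
def surface_rate_npm(tempo_bpm, notes_per_beat):
    """Surface rate in notes per minute for a given tempo and note density."""
    return tempo_bpm * notes_per_beat


if __name__ == "__main__":
    tempo = 120                                         # constant throughout the theme
    print("drums:   ", surface_rate_npm(tempo, 4))      # four notes per beat
    print("pastoral:", surface_rate_npm(tempo, 1 / 4))  # one note every four beats
    print("hymn:    ", surface_rate_npm(80, 1))         # one note per beat at 80 bpm
```

The tempo argument stays the same in the first two calls; only the density of notes changes, and with it the impression of speed.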

Surface rate tends to vary much more than underlying tempo and is, as the name suggests, audible ‘on the surface’. Nevertheless, the same surface rate can often be heard more or less permanently throughout a complete episode or entire piece of music. The hi-hat sounding repeatedly in a pop song at twice the rate of the underlying pulse is one case in point and its consistent bisection of the beat creates what can be called subbeats. Subbeat, the regular subdivision of a beat, is a useful concept in the understanding of metre (p. 293 ff.).

Appropriate questions to ask about tempo and surface rate might be: Does the music have a regular pulse measurable in bpm? If not, does it change suddenly or gradually? If the music has no pulse, what effect does that create? If it has pulse, what’s the tempo in bpm? Is there a constant surface rate or does it vary? Or are there several simultaneous surface rates? How fast (or slow) is the surface rate in relation to the tempo? What effect is created by, say, a fast surface rate and a slow underlying tempo, or by a slow surface rate and a fast tempo?

Harmonic rhythm

Harmonic rhythm isn’t really a rhythm but a rate, more precisely the rate at which different chords (if any) are presented. A single held chord, or a very slow rate of harmonic change, is more likely to bring about an effect of stasis, quick change between several chords more likely to favour a sense of speed.


Rhythm has many meanings. Apart from its use as a blanket term to cover cyclical events (annual, seasonal, menstrual, daily, etc.) it’s also often used loosely to refer to one or more of several parameters like tempo, surface rate, metre and groove. Here, however, rhythm means temporal configuration of notes and pauses between notes (short or long, weak or strong, etc.) to produce recognisable patterns of sound in movement. Such configurations can be heard as smooth or jerky, monotonous or invigorating, varied or repetitive, and so on. As a specific configuration in micro-durations, a rhythm can consist of just two notes, like the ‘Scotch snap’ (short-long), or four notes, like the dadada daahm motif at the start of Beethoven’s fifth symphony (three short notes and one long). The eight notes of the famous Satisfaction riff also form a rhythm constituting an entire musical phrase (3.6") but it can also be heard as two elided motifs.

A single note on its own does not constitute a rhythm, nor, strictly speaking, does the ticking of a clock or metronome. When we say the clock goes tick-tock we ascribe two different sounds to what is in fact an identically repeated single sound. We binaurally configure that one sound into a rhythm using a timbral/tonal distinction that seems logical for the ding dong of a two-tone door chime but anachronistic for the monotone ticking of a clock. Still, even monotonous ticking can, like the inexorable four to the floor of house and techno, become a rhythm if it’s in a piece of music containing other temporal configurations of notes and pauses between notes (rhythms). The monotony then becomes one of several rhythms that together create a groove, however mechanical or metronomic it may seem to some. Besides, rhythm usually involves the configuration of notes into specific, identifiable patterns of either pitch or emphasis.


A note can be emphasised using the following types of accent:

1. dynamic: the note is louder than those immediately preceding it;

2. agogic: the note itself, or the duration of the note plus the silence immediately following it, is/are longer than the note[s] immediately preceding it (its upbeat[s], its lead-in or anacrusis);

3. tonic: the note clearly diverges in pitch in relation to notes immediately before it;

4. metric: see next, under Metre.


In most types of Western music, emphasised notes recurring at regular intervals separated by the same number of beats or subbeats are heard as the regular grouping of beats into a metre with metric accents on the first note of each group. As we shall see, those beats can be either reinforced by coinciding with one or more of the three other types of emphasis, or contradicted by placing accents elsewhere in the metre.

The basic unit of metre is the bar or measure. A bar is a duration defined by a given number of component beats, all of a consistent duration definable in beats per minute. 2/4 (‘two-four’), 3/4, 4/4 and 6/8 (‘six-eight’) are the commonest time signatures (= symbols indicating metre) for music of European origin. The lower figure in the time signatures 2/4, 3/4 and 4/4 denotes a quarter-note (¼) or crotchet (♩), the most common scribal unit for designating a single beat of music. The ‘2’, ‘3’ and ‘4’ on top in 2/4, 3/4 and 4/4 indicate the number of beats in each bar. At 120 bpm (♩ = 120), a 2/4 bar lasts for 1 second, a 3/4 or 6/8 bar for 1½ and a 4/4 bar for 2 seconds. The 16 subbeats shown in Figure 8-5 (p. 294: 8 beats at 120 bpm in 2/4, 3/4 and 4/4; 6 at 80 bpm in 6/8) occupy 4 seconds. In such standard types of metre, metric accents are on the first beat (‘one’) of each bar. That beat is also called the downbeat because down is the direction of the euroclassical conductor’s baton at those points in the music. Bars are user-friendly units for denoting musical durations because they can be both felt and counted as the music actually progresses, whereas thinking about duration in seconds is virtually impossible with music in progress unless its tempo is exactly 60 or 120 bpm.
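The bar durations just quoted can be reproduced by counting subbeats. The Python sketch below assumes, as the figures in the text imply, that the quaver subbeat runs at the same speed in all four metres (half a crotchet at crotchet = 120), so that two-four, three-four, four-four and six-eight bars contain 4, 6, 8 and 6 quavers respectively. The names are hypothetical.

```python
# Quaver subbeats per bar for the four time signatures discussed in the text.
QUAVERS_PER_BAR = {"2/4": 4, "3/4": 6, "4/4": 8, "6/8": 6}


def quaver_seconds(crotchet_bpm=120):
    """Duration of one quaver subbeat, i.e. half a crotchet beat."""
    return 60 / crotchet_bpm / 2


def bar_seconds(signature, crotchet_bpm=120):
    """Duration in seconds of one bar in the given time signature."""
    return QUAVERS_PER_BAR[signature] * quaver_seconds(crotchet_bpm)


if __name__ == "__main__":
    for signature in QUAVERS_PER_BAR:
        print(f"{signature}: {bar_seconds(signature):.2f} s per bar")
    # A six-eight beat groups three quavers, hence its slower pulse in bpm.
    print("6/8 beat rate:", 60 / (3 * quaver_seconds()), "bpm")
    print("16 subbeats:", 16 * quaver_seconds(), "s")
```

On these assumptions the six-eight pulse comes out at 80 bpm and sixteen subbeats at four seconds, in line with the figures given for Figure 8-5.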

While a recurring rhythmic pattern at a particular tempo and metre may be semiotically significant (e.g. slow waltz, frenetic jig, relentless march, laid-back ballad, etc.), difference of metre is on its own no guarantee for difference of perceived kinetic effect. For example, although music in fast 2/4 time will more likely sound like a polka or reel rather than a lyrical ballad, 3/4 is the metre not only for swirling Viennese waltzes but also for some types of lyrical ballad, as well as for the sedate UK national anthem. Similarly, depending on tempo, rhythmic patterning and other factors, 6/8 metre might just as well signal a lullaby as a cavalry march, or lively galliard or cueca, while 4/4 is the most common metre for an almost limitless variety of Western music, ranging from up-tempo rock via son and foxtrot to funeral dirges.


Fig. 8-5. Metre: usual Western time signatures, bars, beats and subbeats

The explanations just given about metre apply in general to music of Central and Western European origin. That music is mainly monometric (= uses only one metre at a time) and symmetric, meaning that its patterns of strong and weak [sub-]beats are consistently grouped into simple multiples of two or three. However, in a lot of traditional music from West Africa, metric practices are much more complex. There are often polymetric configurations running in cycles of 12 or 24 subbeats simultaneously divisible into patterns of 2, 3, 4 and 6 subbeats, and with variable placement of what ears raised on monometric music hear as up- and downbeats in any one of those simultaneously sounding metres. Another tradition of metre differing from that of Western Europe is that found in many types of music from the Balkans, Turkey, the Arab world and Indian subcontinent. There, asymmetric metre is quite common, featuring time signatures like five (in groups of 3+2 or 2+3 beats or subbeats), seven (4+3 or 3+4), ten (3+4+3, 3+3+4, 3+2+3+2), etc.


In music from the urban West the most common exceptions to the symmetric articulation of beats and subbeats are those that configure the eight subbeats of a 4/4 bar as 3+3+2 instead of 4+4, or the sixteen subbeats of two 4/4 bars as 3+3+3+3+2+2 instead of 4+4+4+4. These alternative patterns involve metric accents, placed on the first note in each group of subbeats, that don’t coincide with regular points of emphasis in the underlying metre. Such variation of metric accent is sometimes called cross-rhythm; or, to put it more conventionally, ‘if a part of the [bar] that is usually unstressed is accented’, as with cross-rhythm, it can be called ‘a syncopation’. Of course, syncopation can also, as noted earlier, result from dynamic, tonic or agogic accents falling on a metrically unstressed point in the bar, but syncopation can logically exist only if the music is monometric. If two or more metres are in operation, as in many types of West African traditional music, or if the location of the downbeat varies between instruments or from one bar to the next, as in the combination of Cuban Montuno with claves and other ‘salsa’ patterns, there can be no syncopation because the music is, at least to eurocentric ears, in a permanent state of cross-rhythm or ‘syncopation’. Anyhow, patterns of metric grouping (of beats, subbeats and accents) are, whatever their character, key elements in the construction of different ‘feels’ or grooves.
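To see where the accents of a 3+3+2 grouping fall against those of 4+4, it helps to number the subbeats from zero and mark the first subbeat of each group. The following Python sketch does just that; the function names are hypothetical and the output is merely a list of subbeat positions, not a claim about how any particular piece is accented.

```python
from itertools import accumulate


def accent_positions(grouping):
    """Zero-based subbeat index on which each group in `grouping` starts."""
    return [0] + list(accumulate(grouping))[:-1]


def off_grid_accents(grouping, underlying):
    """Accents of `grouping` that miss every accent of the underlying grouping."""
    regular = set(accent_positions(underlying))
    return [p for p in accent_positions(grouping) if p not in regular]


if __name__ == "__main__":
    print(accent_positions([4, 4]))                    # regular accents in one bar
    print(accent_positions([3, 3, 2]))                 # alternative grouping
    print(off_grid_accents([3, 3, 2], [4, 4]))         # accents heard as syncopated
    print(off_grid_accents([3, 3, 3, 3, 2, 2], [4, 4, 4, 4]))
```

Only the first accent of each alternative grouping coincides with the underlying metre; the others land between its regular points of emphasis.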


Like the grooves on a vinyl record, musical grooves are cyclical. They consist of one or more rhythm patterns lasting, as single units, no longer than the extended present (usually just a couple of seconds), but those patterns have to be repeated several times before they constitute grooves. They fit into an overriding tempo and metre, and are usually repeated constantly, though often with minor variations, throughout entire pieces of music, or at least for a complete episode inside one piece. Groove relates directly to the gross-motoric movement of the human body and is most obviously connected to dance, different grooves being suited to different types of body movement, step patterns, etc.

Although the musical sense of the word groove originated in discourse about one culturally specific type of rhythmic and metric patterning (the swing articulation and anticipated downbeats of jazz, later also applied to rock, reggae, funk, R&B, etc.), it’s a concept that can be usefully applied to any music whose present-time cyclic configurations of tempo, rhythm and metre relate to bodily movement. The Viennese waltz, the Chilean cueca, the bourrée, the courante, the jig, the slip jig, the reel, the mazurka and the minuet each has its specific groove, as does even a march, be it swung like The Washington Post (Sousa, 1889) or straight like the Marseillaise (Rouget de Lisle, 1792). This means that while mazurka or march grooves may sound ‘ungroovy’ to jazz or funk fans, they are, like it or not, metric/rhythmic configurations connected to bodily movement, configurations which, just like jazz or funk grooves, are repeated and whose duration as individual occurrences does not exceed the limits of the extended present. Of course, a march or mazurka groove sounds quite different to that of a Funky Drummer loop, but there is no reason to reserve such a useful concept as groove for certain musical traditions and deny it to others any more than there is to insist that a composition must be the written work of one individual, rather than, for example, the result of aural collaboration between band members with input from producers and recording engineers.

Repeated metric/rhythmic configurations (grooves in the sense just described) in the music you’re analysing might suggest continual or repeated movements like tiptoe-ing through the tulips, or marching to war, or trudging to a place of execution, or twirling around as an elegant couple waltzing in an imperial ballroom, or chopping the air with robotic arms, or singing your baby to sleep, or gyrating like a belly dancer, or hauling a heavy load, or swimming against the tide, or grinding and thrusting your pelvis, or floating on your back in a swimming pool, or shuffling your feet fast and forwards, or spinning round with others in an eightsome reel, or galloping hell for leather, or taking a leisurely stroll, etc. ad infinitum. In fact it’s probably best, if you feel uncertain about the lexical niceties of metre and rhythm, to describe your kinetic impression of the groove in question in the sort of aesthesic terms just listed. And if you feel uncertain about your own kinetic impressions of that groove you can always test those impressions intersubjectively (see Chapter 6).


Space is intimately correlated with time and movement. That simple assertion is borne out every moment of the day: we have to time our own movements through space correctly if we want to cross the road without being run over, to fetch food from the fridge, or, in fact, to take any action at all. Even when we’re motionless we rely on time and movement, as well as on loudness and timbre, to let us know what sort of space we’re in. The milliseconds it takes for a sound we emit to rebound, once or several times, loudly or softly at 343 metres per second, from different surfaces of different materials placed at different angles and distances from our ears help inform us if we’re in, say, a bedroom, a bathroom, a cathedral, an open field, an empty street or alley, or a long (or short) corridor in a luxury hotel or large prison. Such aspects of acoustic space can be part of live performance but are used much more extensively as parameters of expression in recorded music where input signals from voices and instruments can be treated, separately or together, so that they appear to be sounding in a particular sort of acoustic space. Each acoustic space has a unique profile defined by many different parameters determining two sorts of sound reflection: echo, where return signals are heard as distinct repeats of part or whole of the input signal, and reverb, where return signals merge into one overall spatial impression. The first question to ask is therefore pretty obvious: What kind of space are we hearing in a piece of music through the use of echo or reverb?
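A rough sense of the time scales involved can be worked out from the 343 metres per second figure quoted above. The following is only a sketch: the room names and distances are hypothetical, and it considers a single reflection from one surface, whereas a real acoustic space returns many reflections of differing strength.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # figure quoted in the text


def round_trip_delay_ms(distance_m):
    """Time for a sound to travel to a surface and back to the listener, in milliseconds."""
    return 2.0 * distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Hypothetical distances (metres) from the listener to the nearest large reflecting surface.
spaces = {"bathroom": 1.0, "bedroom": 2.5, "cathedral": 30.0}
delays = {name: round_trip_delay_ms(d) for name, d in spaces.items()}

for name, ms in delays.items():
    print(f"{name}: {ms:.1f} ms")
```

Short delays of this kind tend to merge into a single spatial impression, while much longer ones are more likely to be heard as distinct repeats.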

Aural staging

With live acoustic performance we mostly hear just one space unless we walk around the venue to check out the sound from different angles and distances. But by the late 1920s, after the invention of the coil microphone, performance of popular song could include two simultaneous spaces: one for the band in the untreated acoustic space, the other for the vocalist who could, through amplification, sing softly in close-up to the listener without being drowned out by the band. Since the advent of multi-track recording, each strand (track, line, part, stream) of the music can be treated separately and so placed in different two-dimensional positions relative to the listener’s ears in the same given space (left, right or centre; near or far). Moreover, each strand of the music can also be assigned its own acoustic space that can be combined with other strands in the music to form a spatial composite impossible ‘out there’ in external reality, but which can be both suggestive and convincing inside our heads as virtual audio reality. Aural staging is what I call the use of acoustic parameters to create such virtual reality. Lacasse (2005) explains this sort of sonic mise-en-scène, illustrating its use in two recordings by Peter Gabriel (1992, 2002) in which a powerful dynamic between inner thoughts and emotional outbursts is created through the subtle treatment of vocal tracks in relation to the rest of the music. Widely used and extremely important in film soundtracks, video games and studio recordings, aural staging is still often overlooked as a vital parameter of expression to consider in the analysis of the vast majority of music produced since the mid 1960s.

Now there’s no room here to even start trying to explain the acoustics, neurology or psychology of aural staging because it involves not just the representation of particular types of space in music, but also the placement of different sound sources in their own spaces, how those sound sources are positioned (either stationary or in motion) in relation to each other, as well as how each of these various configurations produces a specific overall effect on the listener. Without that sort of background theory and without the poïetic experience of a sound engineer, the only viable analytical approach consists of: [1] being aware of aural staging and its importance; [2] the aesthesic description of its effects on the listener. As with many other parameters of musical expression, this approach involves registering and describing its effects on yourself and, if possible, on other listeners. Are you hearing a large or small space? Or several spaces? What sort of spaces? Which strands of the music (e.g. vocals, drums, bass, backing singers, individual instruments or instrument sections, sound effects, etc.) are in which space? Are they situated to the left, right or in the middle? Are they close by, far off or in the middle distance? Are they constantly in the same position? Which sounds are internal (‘thoughts in sound’) rather than external (‘statements out loud’)? Which sounds are more ambient, creating more of a background or environment, and which ones are more like a figure (near or far) against that background, or in that environment?

Now, even if the most practical way of dealing with aural staging may be based in interpretations of perception, one theoretical issue is essential to the understanding of how a sense of musical space can be mediated. It has to do with how the three dimensions of Euclidean space (Fig. 8-6) are represented acoustically. In two-channel stereo, sounds can be placed anywhere on the horizontal (lateral) x axis running from left through centre to right (‘panning’, Fig. 8-7). It’s simple: a sound placed on the left, right or in the centre will be literally heard as coming from that position. With the vertical y axis things aren’t that simple because music is rarely, if ever, recorded or diffused in vertical stereo. In live performance you never see piccolo flutes, hi-hats or sopranos placed significantly higher than bass instruments or voices and it’s only in the most experimental studios that you’ll find tweeters in the ceiling or woofers under the floor: everything comes at you from the same height. The vertical placement and perception of sound relies in other words on a much less literal type of mediation, most obviously on pitch parameters.

Fig. 8-7. Speaker placement for (a) two-channel stereo; (b) 5.1 surround sound

The third dimension (z axis) has no standard adjectival label equivalent to the horizontal and vertical of the x and y axes. Often referred to as ‘depth’, the z axis might be more accurately called frontal in terms of simple stereo and frontal-retral in a surround-sound setup (Fig. 8-7). In fact, the z axis can, strictly speaking, only work properly in surround sound because if points on the x axis range from far left to far right, and those on the y axis from high up to low down, then those on the z axis must logically range from far behind to far in front of the listener. ‘Far’ is the operative word here because we are dealing with distance along both axes of stereo sound: horizontal (x) and frontal (z). As shown in Figure 8-7, the stereo’s acoustic horizon traces a semicircle, like the top half of a compass face or analogue clock, running from far left (‘west’ or ‘quarter to’), round through a long way in front of the listener (‘far north’ or ‘twelve o’clock’) to far right (‘east’ or ‘three o’clock’). Sounds can be placed anywhere within that semicircle and their distance or proximity to the listener is mediated by setting different values for parameters of loudness, timbre and reverb.
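The placement of a sound within the stereo semicircle can be sketched numerically. The example below is an illustration under stated assumptions, not a description of any particular mixing desk: it uses a constant-power pan law for the x axis and a simple inverse-distance gain for the z axis, and it ignores the timbre and reverb changes that the text also names as cues for distance. The function name and the unit of distance are hypothetical.

```python
import math


def stage_gains(pan, distance):
    """Return (left, right) gains for pan in [-1, 1] and distance >= 1 (arbitrary units)."""
    angle = (pan + 1.0) * math.pi / 4.0     # -1 gives 0 (far left), +1 gives pi/2 (far right)
    attenuation = 1.0 / max(distance, 1.0)  # assumed rule: more distant sources are quieter
    return math.cos(angle) * attenuation, math.sin(angle) * attenuation


for pan, distance in [(-1.0, 1.0), (0.0, 1.0), (0.0, 4.0), (1.0, 2.0)]:
    left, right = stage_gains(pan, distance)
    print(f"pan {pan:+.1f}, distance {distance}: left {left:.3f}, right {right:.3f}")
```

With this pan law the combined power of the two channels stays constant as a sound moves from left to right, so only the distance setting changes its overall level.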

This short theorisation should make at least one thing clear: the mediation of acoustic space and the positioning of sounds within that space is, with the exception of lateral placement, not so much a matter of putting those sounds literally in their respective positions in relation to the prospective listener’s ears as generating, by other means, sonic data that listeners intuitively interpret as relatively to the left or right, far or near, high or low, diffuse or compact etc. in relation to an overall space that seems large or small, public or intimate, open or closed, and so on. The listener is in other words, as Figure 8-7 shows, placed centre stage with the aural staging arranged around him/her. It’s as if the audience and the actors —the auditorium and the stage— had changed places.

Now, this section has mainly dealt with space as a parameter of musical expression, but that isn’t the same as perception of space in music, as, indeed, was clear in the contradiction between our ability to hear sounds vertically and the absence of literal verticality in aural staging. In fact, our perception of space in music relies extensively on other parameters of expression, many of which are kinetic. To make this link quite clear, imagine the sort of movement you would make in which type of space when taking the following four actions: crossing the road, fetching milk from the fridge, surveying the surrounding countryside from the top of a hill at sunrise or sunset, and cramming yourself into an overcrowded train at rush hour. Obviously, you don’t act as if you’re trying to catch a rush hour train when you’re at peace on top of the hill, or meander meditatively with arms outstretched and eyes contemplating the horizon when you have to contend with other commuters trying to board the same busy train. Nor do you rush frenetically to fetch milk from the fridge for Grandma’s tea on a Sunday afternoon, just as little as you would try to cross a busy road by sauntering three metres into the traffic as if moving leisurely through the kitchen. Expansive musical gesture, slow and smooth or quick and sudden, will obviously suggest more space than do tight or contained types of gesture, be they nervous and claustrophobic or gentle and delicate.

All these aspects of movement through space can be musically mediated by the sorts of kinetic anaphone discussed in Chapter 13, especially under ‘Gestural interconversion’ (p. 502 ff.), as by overall aspects of compositional texture (Chapter 12). But first we need to identify and discuss parameters other than those of time, speed and space.




9. Timbre, loudness and tonality


Fig. 9-2. Piano keyboard: one octave



The poïetic basics of timbre have already been explained and distinguished from those of tone (pp. 277-280). With its wide variety of types of frequency spectrum, attack, sustain, decay and release, timbre, in conjunction with particular combinations of pitch and loudness, seems to relate synaesthetically with senses of touch, texture, grain, consistency and substance (pp. 494-498). With its component parts produced in a matter of milliseconds (p. 278), timbre is a parameter of expression suited to the expression of various aspects of immediate materiality. Particular combinations of timbre with pitch and loudness are often easiest to denote using synaesthetic-aesthesic descriptors like rough, smooth, rounded, sharp, hard, blunt, cutting, piercing; soothing, watery, airy, sweet, sour, velvety, silky, scratchy; clean, clear, crystalline, bright, limpid; dull, dirty, muddy, muffled, nebulous; brassy, woody, metallic, grainy, gritty, gravelly; full, fat, full-blooded, rich, meaty, compact, thick; thin, nasal, spindly, stringy, wiry, hollow; cold, warm, etc.

Since synaesthetic aspects of vocal timbre are discussed in Chapter 10, the next section focuses on instrumental timbre, including an overview of effects units and devices (p. 309 ff.). That is followed by an exploration of the closely related parameter of loudness (p. 313 ff.). The second half of this Chapter (p. 319 ff.) provides no more than a cursory account of tonality. It consists of short explanations of such parameters as pitch, range, register, interval, mode, melody and harmony, all of which are covered much more substantially in Everyday Tonality (Tagg, 2009).


Instrumental timbre

As a parameter of musical expression, timbre can be understood to work in two ways: [1] anaphonically, where the timbre in question has an iconic semiotic connection with the sensations described by the sort of adjectives listed at the start of this chapter; [2] synecdochally, where the timbre relates indexically to a musical style and genre, producing connotations of a particular culture or environment. These two types of timbral semiosis are not mutually exclusive. Let’s start with the latter.

Instrumental timbre as ethnic stereotyping

The timbre of a musical instrument is often used as part of a genre synecdoche (p. 524 ff.) to connote an ‘elsewhere’ heard from a musical ‘home’ perspective, i.e. through the ears of the culture into which it’s imported. For example, to most non-Japanese listeners the koto or shakuhachi is likely to suggest Japan, while Highland bagpipes may conjure up generic Braveheart and tartanry notions of Scotland to those unfamiliar with differences between a pibroch lament and pipers parading to the strains of Scotland The Brave at the Edinburgh Tattoo. Other well-known examples of ethnic timbre stereotypes are the French accordion spelling France (usually Paris) to the non-French, quena or zampoñas and charango to signal Andean folk to non-Andean folk, and sitar with tablas to evoke a generic India in the ears of most non-Indian listeners in the West. All these and countless other examples of ethnic instrument stereotyping will only work if listeners are unaware of the range of moods and functions with which the relevant instrumental sound is associated inside the ‘foreign’ music culture.

Instrumental timbre and conventions of mood and style

Inside our own familiar and broad tradition of musical cultures in the urban West, a symphonic string section can, as we already saw (p. 264), be used to produce familiar tropes ranging from love and romance to violent psychopathy. Similarly, depending on how they play what, French horns can be associated with heroism, hunting, danger or lyricism, while the different sounds of a saxophone might lead listeners to think of wind bands, big bands, jazz, rock or sex. The sex connotation is well established in music for film and television where slightly jazzy saxophone licks played legato have so often been used to suggest an erotic mood that they constitute a trope, referred to by such epithets as sexaphone or high-heeled sax. Or, as Ben and Kerry, writing on the TV Tropes website put it, ‘What’s Kenny G doing in everyone’s bedroom?’

Despite the sort of differences just mentioned, some kinds of instrumental timbre have more focused connotations because they connect, by culturally specific convention, with particular styles or functions. It’s in this way that the harpsichord is sometimes used in audiovisual productions to suggest olden times, typically the eighteenth century in a European upper-class setting, rather than, say, a kitchen-sink drama from the 1960s. It’s also why the symphony orchestra is linked to either euroclassical concert halls and opera houses or to big-budget Hollywood productions rather than to pub gigs or experimental cinema. And it’s why the sound of a church organ suggests church rather than pole dancing, and at least partly why a legato oboe tune is more likely to signal nostalgic pastoral idyll than an angst-ridden post-apocalyptic dystopia.

Anaphonic conventions of instrumental timbre

Some instrumental sounds act anaphonically (p. 487 ff.) in that they resemble sound, touch or movement that exist outside musical discourse. Timpani rolls, for example, sound more like the rumbling of an earthquake or of distant thunder than like pattering rain or clinking glasses. That’s why a timpani roll is more likely to connote danger, as just before a daring feat of acrobatics at the circus, rather than the sparkling magic of a tinselly fantasy world of a Disney Christmas. The latter would be more aptly suggested by the tinkling of a glockenspiel or celesta because such sounds resemble those of a music box, which is more likely to connote a protected ‘olde-worlde’ sort of childhood than bombs exploding in a war-torn neighbourhood. The tinkling timbre of tiny metallophones also suggests shiny, small, brittle, delicate objects like the clinking glasses just mentioned, or like Tchaikovsky’s ‘Sugar Plum Fairy’ (1892). By the same anaphonic token, the grainy sound of a seriously overdriven electric guitar resembles more closely the growl of a Harley Davidson or of a Ducati fitted with a Termignoni exhaust than of a babbling brook, while sonorously smooth viscous string pads are, as tactile anaphones, more likely to be linked to sensations of voluptuousness and romantic luxury than to those of digging up the road with a jackhammer. Still, even though there may be demonstrable anaphonic resemblance between types of instrumental timbre and what they seem to connote, it should be remembered that such semiosis is largely contingent on culturally specific conventions of stylisation.

Acoustic instrument devices

The basic principles according to which musical instruments produce different timbres in different ways have already been explained (pp. 277-280) and the rudiments of acoustic instrumental timbre semiosis have just been summarised. In addition to these basic considerations, acoustic instruments can produce countless different types of attack, continuant, decay, release and frequency spectrum by using different playing techniques, for example pizzicato, col legno and sul ponte on violin, or damping, laisser vibrer, picking and strumming on guitar. Acoustic devices are also used to vary the timbre of many instruments, for example the different sorts of mutes used by string and brass players, the different sorts of reed types used by woodwind players, the different kinds of mouthpiece available to players of instruments like the flute or trumpet, the array of sticks and brushes that drummers use, not to mention the variety of registration (stops or tabs) available to organists. The range of timbral variation has radically expanded since the 1960s, first with the spread of electro-acoustic and, later, digital devices —effects units— for treating audio input signals.

Effects and effects units

Timbre can be altered by using different types of echo and reverb effects, as well as by placement in the stereo space (left/right, far/near), and by using different types of microphone placed in different positions in relation to the original sound source. Other common alterations of timbre are produced by the following sorts of device.


Distortion effects (a.k.a. overdrive, saturation) radically alter the character of overtones in a sound’s frequency spectrum to create timbres that have been variously described as rough, gritty, harsh, rich and full-bodied. Fuzz, as in the famous Satisfaction riff (Rolling Stones, 1965), produces a slightly more piercing type of distortion effect.


Filter effects are those that boost or weaken particular pitch ranges in an audio signal. The most widely used filtering device is the equaliser (abbr. EQ). EQ settings can be used to make a signal more or less prominent in the mix, to get rid of unwanted sounds, or to create specific effects like the ‘disconnected’, disembodied, boxed-in sort of telephone sound that has sometimes been applied to vocal tracks.

The talk box is a filter device that sends input audio through a tube into the instrumentalist’s mouth which, shaped to produce any vowel sound, creates output audio giving the impression that the instrument is talking.

The wah-wah pedal creates a similar effect to the talk box, except that it only covers one binary of vowel sounds, from ‘oo’ to ‘ah’ [wa] and back [aʊ]. Wah-wah probably derived from acoustic muting techniques developed by jazz musicians. As an effects unit, wah-wah is usually applied to electric guitar sounds and is common in psychedelic music, as well as in certain types of funk and disco.

The vocoder manipulates frequencies in an audio signal to produce a non-human, robotic sort of sound.
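To return to distortion, the first of the devices just described: its effect on overtones can be illustrated with a small numerical sketch. The code assumes a tanh soft-clipping curve, which is one common way of modelling overdrive and not a description of any particular pedal or amplifier. The function names and the drive value are hypothetical. It passes one cycle of a pure sine tone through the curve and measures how much third-harmonic content appears.

```python
import math


def overdrive(sample, drive):
    """Soft-clip one sample (range -1..1) with a tanh curve; more drive, more distortion."""
    return math.tanh(drive * sample) / math.tanh(drive)


def harmonic_level(signal, harmonic):
    """Amplitude of a given harmonic in a signal holding exactly one fundamental cycle."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * harmonic * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * harmonic * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


N = 512
clean = [math.sin(2 * math.pi * i / N) for i in range(N)]
driven = [overdrive(s, drive=5.0) for s in clean]

print(f"3rd harmonic, clean:  {harmonic_level(clean, 3):.4f}")
print(f"3rd harmonic, driven: {harmonic_level(driven, 3):.4f}")
```

The clean sine contains no third harmonic, while the driven version does: the added overtones are what the ear registers as a change of timbre.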

Modulation effects

Modulation effects mix two or more audio signals to create a whole array of different sounds. Apart from ring modulation, which, depending on the input signal, produces bell-like or sci-fi sounds, modulation effects can be thought of in two main aesthesic categories that I call diffusive and oscillatory.
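Ring modulation is the easiest of these effects to sketch because, in its simplest form, it just multiplies the input signal by a second signal (the carrier). The example below assumes two pure sine tones and a hypothetical sample rate. It shows that the output contains the sum and the difference of the two frequencies but not the input frequency itself, which helps explain why the result can sound bell-like and inharmonic.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def ring_modulate(input_hz, carrier_hz, n_samples):
    """Multiply an input sine by a carrier sine, sample by sample."""
    return [
        math.sin(2 * math.pi * input_hz * i / SAMPLE_RATE)
        * math.sin(2 * math.pi * carrier_hz * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def level_at(signal, freq_hz):
    """Amplitude of one frequency component (single-bin discrete Fourier transform)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


output = ring_modulate(440.0, 100.0, SAMPLE_RATE)  # one second of audio
for f in (340.0, 440.0, 540.0):
    print(f"{f:.0f} Hz: {level_at(output, f):.3f}")
```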

Diffusive effects

Diffusive effects are those that use various techniques to diffuse a single sound so that its position on the aural stage seems less precise, so that it seems to fluctuate or cover more acoustic space. These effects, particularly phasing and flanging, create a sweeping, swishing, swooshing sort of effect. Chorus effects are similar except that they sound fuller and often seem to shimmer rather than swish or swoosh. Dubbing (or doubling) is not strictly a modulation effect but it can, like chorus, make audio input sound bigger (not louder) and create the impression that there is ‘more of the sound occupying more space’, especially if the original and dubbed tracks are assigned different positions on the aural stage. Digital dubbing involves copying the input signal, detuning it very slightly, offsetting it by a few milliseconds and mixing that copy with the original. Applied frequently to vocal tracks, dubbing can be used to flesh out a thin voice or to make a single voice sound like two or more of the same vocal persona, or like two or more sides of the same vocal persona. Digital dubbing has not replaced ‘real’ dubbing practices in which the artist physically re-records the same passage a second time on to a different track. Real dubbing is useful if a radically different overdub is required, for example if the singer needs to whisper the words he/she has previously recorded in song so that listeners can hear the message both out loud and inside their heads.
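The digital dubbing procedure described above (copy, detune slightly, offset by a few milliseconds, mix) can be sketched as follows. The detuning amount, the offset and the sample rate are hypothetical values chosen for illustration, and the detuning is done by crude resampling, which also shortens the copy slightly. Commercial doubling effects are likely to be more refined.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed)


def detune(signal, ratio):
    """Resample with linear interpolation; a ratio above 1 raises the pitch of the copy."""
    out = []
    pos = 0.0
    while pos < len(signal) - 1:
        i = int(pos)
        frac = pos - i
        out.append(signal[i] * (1.0 - frac) + signal[i + 1] * frac)
        pos += ratio
    return out


def double_track(signal, cents=8.0, offset_ms=15.0):
    """Mix a signal with a slightly detuned, slightly delayed copy of itself."""
    ratio = 2.0 ** (cents / 1200.0)                # 100 cents make one semitone
    delay = int(SAMPLE_RATE * offset_ms / 1000.0)  # offset expressed in samples
    copy = [0.0] * delay + detune(signal, ratio)
    copy = (copy + [0.0] * len(signal))[: len(signal)]  # pad or trim to the original length
    return [0.5 * (a + b) for a, b in zip(signal, copy)]


# Hypothetical input: a tenth of a second of a 220 Hz sine standing in for a vocal track.
voice = [math.sin(2 * math.pi * 220.0 * i / SAMPLE_RATE) for i in range(SAMPLE_RATE // 10)]
doubled = double_track(voice)
```

Sending the original and the copy to different positions on the aural stage, rather than summing them as here, would correspond to the spatial spread mentioned in the text.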

Oscillatory effects

Oscillatory effects are those that add rapid to-and-fro movement to a sound. Vibrato involves microtonal oscillation between two pitches and is used by classical violin players to give more body to longer tones. The wide, wobbling, slow vibrato that was once fashionable as a device of heightened emotion among Mediterranean opera singers is rare in popular song, except for the infamous gospel jaw wobble applied to the end of long notes by vocalists performing slow ballads involving the public presentation of ‘deep personal feelings’. Tremolo involves no change of pitch but rapid oscillations in the loudness (volume) of a note. Tremolo produces more of a pulsating, shuddering, or, as the name suggests, trembling rather than wobbling sort of effect.
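The difference between the two effects can be made concrete in a short sketch: tremolo leaves the pitch alone and moves the loudness up and down, while vibrato leaves the loudness alone and moves the pitch up and down. The oscillation rate, the depths and the sample rate below are hypothetical values.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)


def tremolo(freq_hz, rate_hz, depth, n):
    """Constant pitch; the gain oscillates between (1 - depth) and 1."""
    out = []
    for i in range(n):
        t = i / SAMPLE_RATE
        gain = 1.0 - depth * 0.5 * (1.0 + math.sin(2 * math.pi * rate_hz * t))
        out.append(gain * math.sin(2 * math.pi * freq_hz * t))
    return out


def vibrato(freq_hz, rate_hz, depth_cents, n):
    """Constant loudness; the pitch oscillates around freq_hz by up to depth_cents."""
    out, phase = [], 0.0
    for i in range(n):
        t = i / SAMPLE_RATE
        cents = depth_cents * math.sin(2 * math.pi * rate_hz * t)
        instantaneous_hz = freq_hz * 2.0 ** (cents / 1200.0)
        phase += 2 * math.pi * instantaneous_hz / SAMPLE_RATE
        out.append(math.sin(phase))
    return out


trem = tremolo(440.0, 6.0, 0.5, SAMPLE_RATE)
vib = vibrato(440.0, 6.0, 30.0, SAMPLE_RATE)
```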

Loudness effects

Loudness effect units in common use are compression, limiting, gating and the volume pedal. Compression basically makes loud sounds weaker and weak sounds louder, thereby compressing the audio signal’s dynamic range. An audio track can be compressed to make it sound fuller and ‘tighter’ so that it stands out from other input sources. Compression is also often applied to the complete mix, to an entire song or album, even to the entire output of a radio station. Overall compression is useful if the music is to be heard in spaces containing a lot of extramusical sound, for example when driving a vehicle.
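How compression narrows dynamic range can be shown with a few lines of arithmetic. The sketch below assumes the simplest kind of compressor model: levels above a threshold are reduced by a fixed ratio, and the whole signal is then raised by a fixed amount of ‘make-up’ gain. The threshold, ratio and gain values are hypothetical, and levels are given in decibels relative to the loudest level the system can carry.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=10.0):
    """Map an input level to an output level (both in dB relative to full scale)."""
    if level_db > threshold_db:
        # Above the threshold, each extra 'ratio' dB of input gives only 1 dB more output.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db  # make-up gain lifts everything, quiet sounds included


quiet_in, loud_in = -40.0, 0.0
quiet_out, loud_out = compress_db(quiet_in), compress_db(loud_in)

print(f"quiet: {quiet_in} dB becomes {quiet_out} dB")
print(f"loud:  {loud_in} dB becomes {loud_out} dB")
print(f"dynamic range: {loud_in - quiet_in} dB becomes {loud_out - quiet_out} dB")
```

With these settings the quiet sound ends up 10 dB stronger and the loud one 5 dB weaker, so the distance between them shrinks from 40 dB to 25 dB.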

Limiters set a ceiling for the maximum strength of a sound and are mostly used to avoid unwanted distortion. Gating does the opposite: it sets a minimum level of intensity below which nothing passes through into audio output. By excluding certain elements of a sound’s attack and decay, gating alters the timbre of the input signal. A particularly common gating practice is the gated reverb that has often been applied to drum tracks in order to create a bigger, more compact sort of sound. Strong compression and gated reverb on kick drum tracks are largely responsible for the voluminous, sub-bass boof sound of electronic dance music’s four-to-the-floor aesthetic.
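The ceiling and floor just described can be sketched with two small functions. They operate on a list of level values (for example, the strength of a drum hit measured at successive moments), not on raw audio samples, and they leave out the attack and release times that real units use to smooth their response. The ceiling, the floor and the drum envelope are hypothetical values.

```python
def limit(levels, ceiling=0.8):
    """Limiter: no level passes above the ceiling."""
    return [min(level, ceiling) for level in levels]


def gate(levels, floor=0.1):
    """Gate: levels below the floor are silenced altogether."""
    return [level if level >= floor else 0.0 for level in levels]


# Hypothetical envelope of a drum hit: faint onset noise, loud attack, then decay.
drum_hit = [0.02, 0.95, 0.6, 0.3, 0.12, 0.05, 0.01]

print(limit(drum_hit))
print(gate(drum_hit))
```

The gated version loses both the faint onset and the tail of the decay, which is the sense in which gating alters timbre by excluding parts of a sound’s attack and decay.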

The volume pedal lets instrumentalists adjust their output level without having to take their fingers off the instrument they’re playing. Church organists use the instrument’s swell pedal to adjust the volume of passages played on the Swell manual and guitarists can use a volume pedal to increase volume during a solo. The device is also often used to change the timbre of individual notes, most commonly by weakening or muting their attack and shifting to full volume for the continuant. This technique, sometimes called violining, can make an overdriven electric guitar sound a bit like a bowed violin: it produces a swell effect that seems smoother, softer (not quieter), rounder, less percussive, less brash, more ethereal and more reflective than the untreated sound.


The words ‘softer (not quieter)’, used in the previous sentence, raise the first of several problems about the adjective loud. The first of these is that loud is a bit like light (adj.), whose opposite can be either dark or heavy, in that it also has two opposites: soft and quiet. There is in other words a difference between the more timbral-tactile (loud/soft) and the more dynamic-kinetic (loud/quiet) aspect of loud. Dynamic-kinetic has obviously to do with energy, power and movement, and that is literally what loudness is all about, at least in poïetic terms. It obviously takes more energy to produce a loud sound than a quiet one: string players bow more energetically, pianists hit the keys harder, wind players blow more forcefully, and your amp uses more electricity to make stronger sound waves that have greater amplitude. That means in its turn that the sounds so produced cover more three-dimensional space or, to be more accurate, that they literally occupy a greater volume. Volume and dynamics are commonly used as synonyms for loudness and are conventionally measured in decibels (dB), a unit which, in acoustics, quantifies sound pressure levels in air. These levels range from the threshold of human hearing (0 dB), through the sound of, for example, rustling leaves (10 dB), a washing machine (60 dB), a screaming child (90 dB), a helicopter (110 dB), an averagely loud rock band (120 dB), to the threshold of pain (130 dB) and a rocket launch (180 dB).
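Because the decibel scale is logarithmic, the figures in the list above are easy to misread as if they were evenly spaced quantities. The sketch below uses the standard definition of sound pressure level, with 20 micropascals as the conventional reference pressure for 0 dB in air. The function names are hypothetical.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals: conventional 0 dB reference in air


def pressure_to_spl_db(pressure_pa):
    """Sound pressure level in dB for a given sound pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


def spl_db_to_pressure(spl_db):
    """Sound pressure in pascals for a given sound pressure level in dB."""
    return REFERENCE_PRESSURE_PA * 10.0 ** (spl_db / 20.0)


# Levels taken from the examples in the text.
for label, spl in [("washing machine", 60.0), ("screaming child", 90.0), ("rock band", 120.0)]:
    print(f"{label}: {spl:.0f} dB, {spl_db_to_pressure(spl):.3f} Pa")

ratio = spl_db_to_pressure(120.0) / spl_db_to_pressure(60.0)
print(f"pressure ratio, rock band to washing machine: {ratio:.0f}")
```

Every step of 20 dB multiplies the sound pressure by ten, so the rock band at 120 dB involves a thousand times the sound pressure of the washing machine at 60 dB, not twice as much.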

Another conceptual problem with loudness is that it isn’t just a matter of simple decibels. Many amplifiers used to come not only with the standard volume control regulating the total audio output signal strength (measured in dB), but also with a knob or button labelled ‘loudness’. The point here is that signals at the upper and lower ends of the audible frequency range need to be stronger if they are to be heard as loud as those in the middle. Loudness control compensated for this idiosyncrasy of human hearing by letting listeners boost (more dB) those highs and lows without simultaneously boosting mid-range frequencies. By regulating the decibel level of sounds at different frequencies, the amp’s loudness control also altered the timbre of the overall output. More recently, however, loudness has been used to denote the overall effect of compression in a complete audio production (see also p. 312). In that context it’s worth noting that the quest for constant maximum loudness, using radical amounts of compression, has, in many recent recordings of rock and electronica, led to a reduction in the dynamic range so that it’s now inferior to that of an Edison cylinder recording from 1909 (Vickers, 2010: 27). Whether this trend is a mere fad or whether it meets a need among listeners to block the extramusical world from impinging on an ‘absolute’ internal experience in an exercise of acoustic self-harm is a matter that cannot be discussed here.

Sound signal strength (measurable in dB) is only one factor determining the relative loudness or quietness of what we hear. Temporal, timbral and tonal parameters are all at least as important, especially if loudness is considered aesthesically in terms of the prominence and audibility of one strand or layer of sound in relation to others. Musical strands with a clear rhythmic, tonal and timbral profile simply stand out more than those without and can seem louder, even if their output signal strength (dB) is lower than that of other strands in the music. Loudness is in short a parameter of musical expression in which signal strength (dB) is a central factor but which also relies on combinations of timbre, pitch, tone and timing to produce maximum effect.

Taking loudness as an aesthesic category, the obvious questions to ask of an analysis piece are: How loud is the music? Is the music constantly loud or quiet? If not, which passages are louder and which ones quieter? Are changes from loud to quiet or quiet to loud sudden or gradual? Which, if any, of the strands in the music are louder than others? Do any individual notes or motifs stand out as louder or stronger than others? What effects are created by these differences between loud and quiet? Are any features of loudness indicative of a particular type of music?


Pitch and tonality

Pitch and tonality, already defined (pp. 275-276), are, along with narrative form (diataxis), the parameters of expression covered in most detail by conventional music theory. Since I’ve dealt at some length with tonal topics in Everyday Tonality (Tagg, 2009), and since their explanation is more likely to involve poïetic jargon than has been necessary so far in this book, this section is stripped to its barest essentials.


Pitch and octave

Pitch, as we already saw, means the perceived ‘lowness’ or ‘highness’ of a tone and is measured in cycles per second or Hertz (Hz) with 440 Hz as internationally agreed concert pitch, the frequency of the note a in the middle of the human range of hearing. As mentioned under ‘Timbre’ (p. 277), the first harmonic or overtone (2f) has a frequency twice that of its fundamental (1f). For example, 880 Hz (a5, the note a in octave 5) is 2f in relation to 1f at 440 Hz (a4, a in octave 4) which, in its turn, is 2f in relation to 220 Hz (a3). The note name for the three pitches 220 Hz, 440 Hz and 880 Hz is identical (a) and the pitch difference between any given pitch and another at twice or half its frequency is one octave. So, a4 (440 Hz) is one octave above a3 (220 Hz) and one below a5 (880 Hz). As the word itself suggests, an octave is the eighth note you arrive at if you ascend a heptatonic scale step by step, for example a b c d e f g a, where the first a is 1 and the last a is either 8, or 1 in the next octave.
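The doubling relationship between octaves can be written as a short calculation. The sketch takes the 440 Hz concert pitch and the octave numbering (a3, a4, a5) from the text. The function names are hypothetical.

```python
import math

A4_HZ = 440.0  # concert pitch, as given in the text


def a_frequency(octave):
    """Frequency in Hz of the note a in a given octave (a4 = 440 Hz)."""
    return A4_HZ * 2.0 ** (octave - 4)


def octaves_between(low_hz, high_hz):
    """Number of octaves separating two frequencies."""
    return math.log2(high_hz / low_hz)


for octave in (3, 4, 5):
    print(f"a{octave}: {a_frequency(octave):.0f} Hz")

print(f"octaves from 220 Hz to 880 Hz: {octaves_between(220.0, 880.0):.0f}")
```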

The octave is a central concept in music for at least three reasons. [1] All known music traditions tend to treat two pitches an octave apart as the same note in another register: men are understood to be singing the same tune as women and children if both parties follow the same pitch contour. [2] The register and pitch range of audible fundamental frequencies can be referred to by octave (a3, b♭4, c♯5, etc.) without having to think about cycles per second (Hz). [3] The organisation of pitch intervals, the most important determinant of differences in tonal vocabulary (see pp. 322-332), is conceptualised within the framework of a single octave and is as a rule applicable to pitches in any octave.


Pitch range and register

Pitch range means exactly what it says. It’s either: [1] the range of pitches between the lowest and highest notes that can be played on a particular instrument or sung by a certain (type of) voice; or [2] the range of pitches between the lowest and highest notes in a certain strand of music, or in a particular passage or piece of music. For example, the pitch range of an oboe covers almost three octaves (from b$3 to g6), and the pitch range of the tune Happy Birthday is the single octave between its first note (lowest) and the note on the third occurrence of birth[day] (highest). Most people have an effective vocal range of just under two octaves, a range which no widely sung melody exceeds. Two octaves may seem quite puny compared to the ten-octave range of a humpback whale but there is an unmistakable difference in gestural affect between tunes that span an octave or more (expansive) and those that cover no more than a third (constrained). Although pitch range is more often applied to the sort of pitch spans just mentioned, it can also be used (see Chapter 12) to describe overall impressions of vertical space in terms of orchestral or chordal density and sparsity.

Register is easiest to explain by example. Depending on how you count what, the average human voice uses between two and four registers. Apart from the chest register and head register, so called because that’s where the sound of low and high notes usually resonates in the singer’s body, it’s also possible to speak of a mid register between the two. In addition, the human voice has a falsetto range that both overlaps with and extends higher than the head register. Since different vocal registers draw on different parts of the human anatomy they also produce different timbres: register is in other words a pitch range associated with particular timbral traits.

The larger the intervallic leap between two consecutively sung notes, the more likely it is that there will be a change of vocal register. For example, a deeply felt sigh of relief, delight or despair has to descend an interval of at least a sixth, but sliding down a mere third, an interval demanding no change of vocal register, will sound more like an indifferent uh-uh of negation, acknowledgement or resignation. Conversely, leaping an octave to a strong high note involves a much more expansive, proclamatory upwards-and-outwards gesture than ascending a mere second or third. Since musical instruments also vary in timbre from one register to another, patterns of vocal intonation and articulation, including ‘sighs’ and go-get-’em upward leaps, can be effectively expressed without involving the human voice.

Melodic contour

Melodic lines, including motifs, bass lines and riffs, all have a pitch contour, i.e. a pattern of ups and downs of the sort shown in Figure 9-1. Contour patterns can be typical for certain musical styles (e.g. the tumbling strain for blues) and some can be related to connotative categories like recitation (often ‘centric’) or dream (‘wavy’). Initial and final motifs (melodic cadence figures) can also be indicative of a particular music culture or type of gesture.

Fig. 9-1. Melodic phrase contour types

Here are a few basic questions that can be asked about pitch in a piece of music. What are its highest and lowest pitches? Do the high and low pitches occur at the same time? If so, how would you describe the texture: thin, full, top heavy, bottom heavy, all in the middle, with no middle? If pitch texture varies in the piece, where, how and why does it do so? Which strands in the music are in which register[s]? Is their pitch range large and expansive or narrow and constrained? Are there any noticeable intervallic ups or downs (disjunct motion and changes of register) or do the music’s different strands move in small steps ( conjunct motion)? How would you describe the pitch contours of melodic strands in the music? Do they suggest a certain style of music? Do any of the above suggest any sort of gestural affect? If so, which?



Tonality means the way in which tones (notes with discernible fundamental pitch) are configured. Now, since the octave is, as we just saw, cross-culturally accepted as presenting the same note in another register, differences of tonality between musical styles and cultures are to be found in how tones are arranged within any octave and in how those tones are treated, for example in terms of which ones are heard as sounding appropriate or inappropriate together or in sequence.

Now, if music, as I’ve repeatedly argued, is no more a universal language than language itself, it can also be argued that tonality is, with the exception of the octave, the least universal aspect of musical structuration and expression. For example, those of us brought up in the urban West are likely to hear most notes played on gamelan instruments as out of tune, even though gamelan musicians take great care to ensure their pitches conform to the appropriate tradition and function of their music. Closer to home, some of my music students have raised eyebrows at old recordings of white Appalachian or African-American vocalists who they hear singing ‘out of tune’, or ‘in the cracks between notes’ when nothing out of tune or ‘in-between’ was either intended or heard in the original context. As for norms about which tones sound good together or in succession, please listen again to the Bulgarian women singing their happy harvest song: they derive much cheer from the sort of semitone dyads that we’re more used to hearing as harsh clashes in underscore for a horror film. In short, since tonal norms can vary so greatly from one style or culture to another, we cannot assume that the conventions of our own traditions (of being in or out of tune, of consonance and dissonance, of what sounds pleasant and unpleasant, etc., however ‘natural’ or intuitive it may all sound to us) should apply to others.

Tonality is also probably the most difficult aspect of structuration for non-musos trying to get to grips with musical meaning. There are at least four reasons for this problem: [1] conventional musicology has developed a sizeable arsenal of terms relevant to tonality in the euroclassical tradition; [2] those terms can be problematic, even ethnocentric, and need critical discussion; [3] such discussion involves other specialist terms in need of explanation; [4] tonal phenomena are virtually impossible to explain in writing without resorting to musical notation which, as we saw earlier, developed to graphically encode aspects of musical structure that are hard to memorise, especially sequences of pitch (p. 122). That’s why, after a few initial words of practical advice, the next few pages provide no more than an extremely rudimentary summary of some of the most important aspects of tonality.

So, what can you do as a non-muso if your analysis piece contains something you hear in terms of a mood, gesture or connotation but which seems to be a tonal issue more than a matter of speed, rhythm, periodicity, loudness, timbre, narrative form, aural staging, or any of the other parameters more conducive to aesthesic description? Could it be a question of mode (major, minor, pentatonic, etc.), harmonic idiom (e.g. euroclassical, romantic, avant-garde, jazz, rock, etc.) or what? I would initially suggest the following. [1] Don’t be alarmed: tonal parameters aren’t necessarily the most important in your analysis piece. [2] Read relevant passages in Everyday Tonality (Tagg, 2009) to see if you can find any answers to your problem. [3] Use the unequivocal timecode placement tips (p. 256, ff.) to focus on the tonal features you’ve identified as potentially meaningful and, if need be, ask a musician for help in identifying and naming them. [4] Make valiant efforts to read and understand the next few pages.

Tuning systems

Much of the music we hear in the urban West conforms to the convention of equal-tone tuning (a.k.a. equal-tone temperament) which divides the octave into twelve equal and slightly doctored semitone intervals arranged on a piano keyboard in the familiar pattern of seven white and five black notes shown in figure 9-2. The twelve pitches in just-tone tuning, on the other hand, are based on more ‘natural’ frequency ratios. Just-tone tuning is suited to styles involving no more than seven different notes to the octave, as with many types of blues, bluegrass, blue-based rock, folk rock, not to mention the traditional musics of Africa, the Arab world, the Balkans, the British Isles, the Indian subcontinent, Scandinavia, etc. Just-tone tuning often sounds more ‘open’, ‘bright’ and ‘clean’ than equal-tone tuning, especially if drones are involved (p. 337, ff.).


In everyday speech interval means the ‘horizontal’ distance in time between two events. In music theory, interval refers to the ‘vertical’ distance in pitch between two tones. In Western music theory, pitch intervals are expressed as ordinal numbers based on the heptatonic (seven-note) scale and on the inclusive principles of Roman counting. Two notes at the same pitch (no difference) are in unison (unum = one), a difference of one tone between two pitches is called a second, a difference of two tones a third, and so on until you reach a difference of seven tones at an interval called not seventh but octave. Of course, a quick look at the piano keyboard (fig. 9-2, p. 321) reveals that the octave contains not only seven white notes but also five black ones. Each of those twelve notes is at an interval of one semitone (one fret on guitar) from its neighbours above and below. This means that some of the seven standard interval names (especially seconds, thirds, sixths and sevenths) need some sort of qualification. For example, the difference between a minor and major third in relation to the music’s keynote or tonic, an interval of three and four semitones respectively, is at the basis of notions about the character of minor and major tonality.

Intervals can also be expressed in terms of frequency ratio, as shown in column 4 of table 9-1 (p. 323) which sets out the twelve intervals inside an octave whose tonic (doh) I’ve set to the note c. Column 1 shows the names of the twelve notes, both white and black, column 2 the number of semitones separating each note from the low tonic on c, and column 3 the heptatonic scale degree in relation to that same c, for example ‘$3’ (‘flat three’) for the note e$, ‘5’ (‘five’) for g. Column 5 presents the full name of each interval, according to conventional Western music theory, also in relation to the low tonic on c.

Table 9-1. Western intra-octave intervals: a selection in just temperament and descending order with tonic (keynote) set to C.

1. Note name (doh = c) | 2. Semitones above doh | 3. Scale degree shorthand | 4. Frequency ratio to lower tonic | 5. Music theory interval name (here in relation to lower tonic on c)
c | 12 | 8 | 2:1 | octave
b | 11 | #7 | 15:8 | major seventh
b$ | 10 | $7 | 9:5 | minor seventh
a | 9 | #6 | 5:3 | major sixth
a$ | 8 | $6 | 8:5 | minor sixth
g | 7 | 5 | 3:2 | perfect fifth
f# | 6 | $5 or #4 | 45:32 | tritone or diminished fifth or augmented fourth
f | 5 | 4 | 4:3 | perfect fourth
e | 4 | #3 | 5:4 | major third
e$ | 3 | $3 | 6:5 | minor third
d | 2 | #2 | 9:8 | major second or whole tone
d$ | 1 | $2 | 25:24 | minor second or semitone
c | 0 | 1 | 1:1 | prime or unison


This table reveals, for example, that e$ (column 1) is three semitones (column 2) above c. As shown in column 5, an interval spanning three semitones is also known as a minor third, or $3 (‘flat three’) for short (column 3). Given that concert pitch for middle c is 261.63 Hz, that e$ is a minor third (three semitones) above c (columns 2, 3, 5), and that the pitch frequency ratio for a minor third is 6:5 (column 4), e$ should be pitched at 313.96 Hz ({6×261.63}÷5). And so it is, at least ‘naturally’, except that, like the major third (5:4), as well as the two sixths, the minor seventh and the major second, the minor third has been doctored to fit into the system of equal-tone temperament that has been in widespread use in the West since around 1800. Intervals with more complex pitch ratios than 6:5, in particular the minor second (25:24), the major seventh (15:8) and the tritone (45:32), are subjected to greater adjustment in equal-tone temperament, but the fourth (4:3) and fifth (3:2) are adjusted by minimal amounts while the octave ratio of 2:1 is left entirely intact.
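The arithmetic behind the e$ example can be checked with a short Python sketch. It is an illustration under stated assumptions rather than part of the original account: the just ratios and the 261.63 Hz figure come from the text and table 9-1, whereas the equal-tempered semitone ratio (the twelfth root of two) is a standard value that I am assuming here, and the function names are hypothetical.

```python
MIDDLE_C_HZ = 261.63  # concert pitch for middle c, as given in the text

# Just ratios (numerator, denominator) from column 4 of table 9-1,
# keyed by the number of semitones above the tonic c (column 2).
JUST_RATIOS = {3: (6, 5), 5: (4, 3), 7: (3, 2), 12: (2, 1)}
NOTE_NAMES = {3: "e$", 5: "f", 7: "g", 12: "c"}


def just_frequency(semitones: int, tonic_hz: float = MIDDLE_C_HZ) -> float:
    """Pitch obtained by applying the just ratio to the tonic."""
    numerator, denominator = JUST_RATIOS[semitones]
    return tonic_hz * numerator / denominator


def equal_frequency(semitones: int, tonic_hz: float = MIDDLE_C_HZ) -> float:
    """Pitch in twelve-step equal temperament.

    Assumption (not stated in the text): each semitone has the ratio 2 ** (1 / 12).
    """
    return tonic_hz * 2.0 ** (semitones / 12)


# Compare the 'natural' and the 'doctored' pitch for four intervals above c.
for semitones, name in NOTE_NAMES.items():
    just = just_frequency(semitones)
    equal = equal_frequency(semitones)
    print(f"{name:>3}: just {just:7.2f} Hz, equal {equal:7.2f} Hz, "
          f"difference {equal - just:+.2f} Hz")
```

Under those assumptions the minor third moves by a few Hertz, the fourth and fifth by less than one Hertz, and the octave not at all.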

Now, it may well be that the simple acoustic ratios for fourths and fifths make them more likely candidates than major sevenths or minor seconds (semitones) for cross-cultural treatment as consonances. Walter Werzowa, creator of the famous four-note Intel Inside audio logo d$d$

Although it’s impossible here to do more than scratch the surface of the topic, two aspects of intervallic affect, both relating to the linguistic domain of representation, are easy to observe and useful in semiotic music analysis. The first of these has to do with the fact that someone expressing surprise, enthusiasm, fright, frustration or indignation will normally speak using a wider pitch range than someone expressing boredom, depression, indifference or resignation. In other words, a melodic line, instrumental or vocal, that contains large intervallic leaps and bounds is more likely to be heard in terms of heightened emotional energy than one that doesn’t. However, the affective precision of that energy (interest or indignation, surprise or shock, etc.) will depend on matters of relative consonance or dissonance, as well as on timbre, loudness, rhythm, tempo, surface rate and tonal vocabulary.

Tonal vocabulary: modes and keys

Tonal vocabulary (a.k.a. ‘pitch pool’) means the store of different pitches used to create tonal structures in a body of music, be it a phrase, passage, work or an entire style. As mentioned earlier, some traditions use tonal vocabularies unfamiliar to Western ears in that they contain pitches that don’t match the twelve semitones of Western equal-tone temperament, while other traditions use ‘our’ twelve semitones in ways that can strike us as strange or exotic. One way of getting to grips with these important semiotic differences in tonical music is to distil tonal vocabulary down to single occurrences of each constituent pitch and to arrange those pitches, normally in ascending scalar order, inside one octave. A mode is simply the manageable conceptual unit resulting from that process of distillation and ordering. Please note that modes can be used to designate tonal vocabularies in terms of both melody and harmony but that the following account is based on solely melodic theories of mode.

Structural theory

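Table 9-2. Western heptatonic modes on the seven white notes of a piano keyboard

Mode > | A | B | C | D | E | F | G | Note
Note \ | Æolian | Locrian | Ionian | Dorian | Phrygian | Lydian | Mixolydian |
f | $6 | $5 | 4 | $3 | $2 | 8=1 | $7 | f
e | 5 | 4 | #3 | 2 | 8=1 | #7 | #6 | e
d | 4 | $3 | 2 | 8=1 | $7 | #6 | 5 | d
c | $3 | $2 | 8=1 | $7 | $6 | 5 | 4 | c
b | 2 | 8=1 | #7 | #6 | 5 | #4 | #3 | b
a | 8=1 | $7 | #6 | 5 | 4 | #3 | 2 | a
g | $7 | $6 | 5 | 4 | $3 | 2 | 8=1 | g
f | $6 | $5 | 4 | $3 | $2 | 1 | $7 | f
e | 5 | 4 | #3 | 2 | 1 | #7 | #6 | e
d | 4 | $3 | 2 | 1 | $7 | #6 | 5 | d
c | $3 | $2 | 1 | $7 | $6 | 5 | 4 | c
b | 2 | 1 | #7 | #6 | 5 |  | #3 | b
a | 1 | $7 | #6 | 5 |  |  | 2 | a
g | $7 | $6 | 5 |  |  |  | 1 | g
Tonic > | A | B | C | D | E | F | G | Note
sol-fa | La | Si | Doh | Ré | Mi | Fa | Sol |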



The most commonly used modes in the urban West are, in terms of the number of pitches they contain within one octave, pentatonic (5; pp. 330-331), hexatonic (6) and heptatonic (7). Most modes have their own keynote or tonic, the main tonal centre or reference point for other notes in the mode. Restricting this part of the account to tonal idioms compatible with Western equal-tone tuning and to the seven white notes on a piano keyboard inside one octave, table 9-2 shows the seven possible tonics (keynotes) in capital letters above and below the contents of the table. Reading from the bottom up, the table shows each heptatonic mode starting on each of those seven notes as tonic and each including its own seven steps ascending from 1 (the tonic) to 8=1, the same tonic note one octave higher. The columns far left and right show the white-note names corresponding to the numbers in each mode column, for example a b c d e f g for scale degrees 1 to 7 in the æolian (A) mode, c d e f g a b for 1 to 7 in the ionian (C).

The white-note heptatonic modes on A (æolian, ‘the A mode’ or ‘la mode’) and on C (ionian, ‘C mode’ or ‘doh mode’) are easily recognised by most people in the Western world. The ionian (C/doh) mode is configured exactly as the Western major scale and the æolian corresponds to one variant of the Western minor scale. Staying with only the seven white notes on a piano keyboard, both the ionian (‘C major’) and the æolian (‘A minor’) contain the same seven notes (a b c d e f g), but c is keynote or tonic in C major, the ionian mode, while a is tonic in A minor (= the aeolian mode). The point is that the same seven notes have a different flavour if the tonic changes place from C ionian (major) to A aeolian (minor), or indeed to any of the seven notes —D for the dorian mode, E for phrygian, F for lydian, G for mixolydian etc. Table 9-2 shows the structural basis of those configurations in several ways: [1] the scalar position of the two semitone intervals b-c and e-f is unique to each mode (e.g. 2-$3 and 5-$6 for the æolian, #3-4 and #7-8 for the ionian); [2] the pattern of scalar intervals —$ (flat), # (sharp) or unaltered— is unique to each mode (e.g. 1 2 $3 4 5 $6 $7 for the æolian and none other, 1 2 #3 4 5 #6 #7 for the ionian only); [3] the pattern of whole- and half-tone scalar steps is unique to each mode. Using semitones as a unit for counting intervals (‘1’ = 1 semitone), the ascending interval steps of an æolian scale create the unique pattern 2 1 2 2 1 2 2, those of the ionian 2 2 1 2 2 2 1, the dorian 2 1 2 2 2 1 2, the phrygian 1 2 2 2 1 2 2, the lydian 2 2 2 1 2 2 1, and the mixolydian 2 2 1 2 2 1 2.
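The claim that each white-note mode has its own pattern of steps and scale degrees can be verified mechanically. The Python sketch below is an illustration only: it assumes nothing beyond the white-note layout described in the text (semitone steps at b-c and e-f, whole tones elsewhere) and uses the text’s shorthand of ‘$’ for flat and ‘#’ for sharp. All names in the code are hypothetical.

```python
from itertools import accumulate

WHITE_NOTES = ["a", "b", "c", "d", "e", "f", "g"]
# Semitones from each white note up to the next white note (b-c and e-f are semitones).
STEP_ABOVE = {"a": 2, "b": 1, "c": 2, "d": 2, "e": 1, "f": 2, "g": 2}
MODE_NAMES = {"a": "æolian", "b": "locrian", "c": "ionian", "d": "dorian",
              "e": "phrygian", "f": "lydian", "g": "mixolydian"}
# Shorthand used in table 9-2: label for each degree by its distance
# in semitones above the tonic ('$' = flat, '#' = sharp).
DEGREE_LABELS = {
    1: {0: "1"},
    2: {1: "$2", 2: "2"},
    3: {3: "$3", 4: "#3"},
    4: {5: "4", 6: "#4"},
    5: {6: "$5", 7: "5"},
    6: {8: "$6", 9: "#6"},
    7: {10: "$7", 11: "#7"},
}


def mode_steps(tonic: str) -> list:
    """Ascending step pattern (in semitones) of the white-note mode on `tonic`."""
    start = WHITE_NOTES.index(tonic)
    rotated = WHITE_NOTES[start:] + WHITE_NOTES[:start]
    return [STEP_ABOVE[note] for note in rotated]


def scale_degrees(tonic: str) -> list:
    """Scale degrees 1 to 7 of the mode on `tonic`, in the shorthand of table 9-2."""
    steps = mode_steps(tonic)
    offsets = [0] + list(accumulate(steps[:-1]))  # semitones above the tonic
    return [DEGREE_LABELS[degree][offset]
            for degree, offset in enumerate(offsets, start=1)]


for tonic in WHITE_NOTES:
    steps = " ".join(str(step) for step in mode_steps(tonic))
    degrees = " ".join(scale_degrees(tonic))
    print(f"{MODE_NAMES[tonic]:<11} steps {steps}   degrees {degrees}")
```

Rotating the same seven notes around a different tonic is all it takes to produce seven distinct step patterns, which is the structural point made in the paragraph above.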

When it comes to the modal specificity of scalar intervals in relation to a tonic, table 9-2 shows that the ionian (C or doh mode), lydian (F/fa mode) and mixolydian (G/sol mode) all contain #3 (‘sharp three’ or ‘major third’), a trait which gives rise to their common qualification as ‘major modes’, while the label ‘minor mode’ is given to the dorian (D/ré), phrygian (E/mi) and aeolian (A/la), since these three all contain $3 (‘flat three’ or a ‘minor third’). It’s also worth noting that the lydian is the only one of the seven diatonic heptatonic modes to include a raised fourth (#4) and that the locrian is alone without a normal ‘perfect’ fifth, which is the most likely reason for it being used so rarely and why it isn’t included in the discussion that follows.

All these dry facts about mode may seem nerdy and arcane but they can be useful in understanding the semiotic potential of tonality, at least if this theoretical knowledge is rooted in some practical familiarity with real sounds. Such familiarity is easy to acquire even if you aren’t a musician, or if you have no access to a piano keyboard, because many user-friendly midi keyboard apps can be downloaded free to your computer, tablet or smartphone. To ‘check out the feel’ of a mode using only the white notes of the keyboard, all you need to do is:

1. Hold down or repeat the tonic note (c for ionian, d for dorian, etc.) like a drone in the bass register.

2. With the keynote (tonic) sounding constantly, play short melodic patterns, circling first round the keynote, then venturing further afield, using rising and falling patterns, or any of the melodic contours shown in figure 9-1 (p. 319).

3. Listen out for how the mode sounds when you include the semitone intervals e-f or b-c in short phrases that finish on the keynote or on the fifth.

4. Apply these white-notes-only tricks to any of the seven modes shown in table 9-2 (p. 326).

Each of the seven heptatonic modes in table 9-2 can be transposed so that any of the Western octave’s twelve constituent semitone steps can act as tonic, just as long as the mode’s unique sequence of tones and semitones is retained. For example, the ionian mode or ‘major scale’, with its unique ascending pattern of steps, 2 2 1 2 2 2 1 (still counting in semitone units), and of intervals (1 2 #3 4 5 #6 #7), produces, with c as its tonic, the notes c d e f g a b (plus c an octave higher). Transposing that same mode, with those same patterns of step and interval up one semitone from C to D$ produces an ionian mode on d$ (the D$ major scale): d$ e$ f g$ a$ b$ c. Then, if you transpose the same pattern down a minor third from C to A you end up with the ionian mode in A (A major: a b c# d e f# g#). If you carry out those two transpositions of the ionian mode, you will have played the same scale in three different keys: C major, D$ major and A major.
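Transposition of this kind amounts to keeping the step pattern fixed and moving the tonic. The following Python sketch shows one way of doing that; it is a simplified illustration with hypothetical names, it uses the text’s ‘$’ and ‘#’ shorthand, and it assumes tonics that need no double sharps or double flats.

```python
LETTERS = "cdefgab"
# Pitch class (semitones above c) of each white note.
NATURAL_PC = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}
IONIAN_STEPS = [2, 2, 1, 2, 2, 2, 1]  # the 'major scale' pattern from the text
ACCIDENTAL = {-1: "$", 0: "", 1: "#"}  # '$' = flat, '#' = sharp


def parse(note: str) -> tuple:
    """Split a note name like 'd$' into its letter and its pitch class."""
    letter, mark = note[0], note[1:]
    shift = {"": 0, "$": -1, "#": 1}[mark]
    return letter, (NATURAL_PC[letter] + shift) % 12


def transpose_mode(tonic: str, steps=IONIAN_STEPS) -> list:
    """Spell the seven notes of a heptatonic mode starting on `tonic`.

    Each scale degree takes the next letter name in turn; the accidental is
    whatever is needed to reach the pitch required by the step pattern.
    Raises KeyError for keys that would need double sharps or double flats.
    """
    letter, pitch_class = parse(tonic)
    first = LETTERS.index(letter)
    notes = []
    for degree in range(7):
        current = LETTERS[(first + degree) % 7]
        # Signed distance from the natural letter to the required pitch.
        diff = (pitch_class - NATURAL_PC[current] + 6) % 12 - 6
        notes.append(current + ACCIDENTAL[diff])
        pitch_class = (pitch_class + steps[degree]) % 12
    return notes


for key in ("c", "d$", "a"):
    print(f"{key.upper()} major:", " ".join(transpose_mode(key)))
```

The three lines printed correspond to the three keys named above: C major, D$ major and A major. Passing another step pattern as `steps` (for instance the dorian 2 1 2 2 2 1 2) transposes a different mode in the same way.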

Minor keys are more problematic for reasons too complicated to discuss here. Simplifying matters drastically it can be said that in Western music theory, based on the euroclassical tradition, any mode including a minor third (dorian, phrygian, æolian, la-pentatonic, mi-pentatonic, etc.) is understood as generically minor in a simple major-minor dualism. The only criterion for qualifying music as major or minor is in other words whether the third scale degree is three or four semitones above the tonic. In fact, if a euroclassical piece is billed as being in the key of A minor, you are unlikely to hear the dorian or phrygian modes and you will probably hear the æolian mode only in descent. Despite this idiosyncrasy of euroclassical tonality it’s possible to consider any musical passage or piece as being ‘in a key’ provided that it is [i] tonical —it has a keynote, a central point of tonal reference— and [ii] that the keynote can be designated in absolute terms (A, B$, C, etc.). Indications of key also often refer to the music’s tonal vocabulary, for example: ‘God Save The Queen is in G major’, ‘Beethoven’s fifth is in C minor’, ‘Steeleye Span’s 1971 version of The Blacksmith is in C# dorian’.

Of course, some intervals in the heptatonic modes presented above can be altered and others added by ornamentation, inflection or through adjustment to tonal context, but the basic principles just summarised hold good for these and for other modes. One of those ‘other’ modes is the heptatonic ‘Gypsy scale’ of flamenco music, a mixture of the phrygian and Hijaz modes. Two well known ‘other modes’, by sound if not by name, are the hexatonic whole-tone scale, whose unique interval pattern runs (in semitones) 2 2 2 2 2 2, and the octatonic scale which runs in alternate steps of whole and half tones (2 1 2 1 2 1 2 1). Both are common in Hollywood film music mystery cues. However, the most widespread type of ‘other modes’ is pentatonic.

Pentatonicism is common in traditional musics from such far-flung parts of the world as West Africa, the Andes, many parts of East Asia (including China, Japan and Indonesia), the British Isles (notably Scotland) and Hungary. It’s also commonly used among Native Americans and the Sami. Moreover, it’s often heard in blues, gospel and in traditional music from the Appalachians. The most widespread type of pentatonicism is anhemitonic (= without semitones), an easy concept to grasp if you check out the piano keyboard’s five black notes, conveniently arranged within the octave in one group of three notes (g$ a$ b$) and the other of two (d$ e$). The gap between adjacent black notes is a whole tone (2 semitones) while that between the two groups of black notes is a minor third (three semitones). Column 4 in table 9-3 shows how the five anhemitonic pentatonic scales contain three interval steps of a whole tone (= 2 semitones), two three-semitone intervals (‘3’), but no single-semitone steps (anhemitonic). Each of the five modes has its own unique configuration of those two types of interval. The bottom line in table 9-3 is quite different. It shows the notes in a hemitonic pentatonic mode used in traditional music from Japan. As shown in column 4, it contains two single-semitone steps (‘1’), one whole-tone step (‘2’) and two steps of a major third or four semitones (‘4’).

Table 9-3. The five anhemitonic pentatonic modes (doh, ré, mi, sol, la) plus one hemitonic pentatonic mode.

1. Mode name | 2. Black notes only | 3. Heptatonic scale degrees | 4. ½-tones betw. notes | 5. White notes only
doh-pentatonic | g$ a$ b$ d$ e$ [g$] | 1 2 #3 5 #6 [8] | 2 2 3 2 3 | c d e g a [c]
ré-pentatonic | a$ b$ d$ e$ g$ [a$] | 1 2 4 5 $7 [8] | 2 3 2 3 2 | d e g a c [d]
mi-pentatonic | b$ d$ e$ g$ a$ [b$] | 1 $3 4 $6 $7 [8] | 3 2 3 2 2 | e g a c d [e]
sol-pentatonic | d$ e$ g$ a$ b$ [d$] | 1 2 4 5 #6 [8] | 2 3 2 2 3 | g a c d e [g]
la-pentatonic* | e$ g$ a$ b$ d$ [e$] | 1 $3 4 5 $7 [8] | 3 2 2 3 2 | a c d e g [a]
‘Trad. Japanese’ | g# a♮ c# d# e♮ [g#] | 1 $2 4 5 $6 [8] | 1 4 2 1 4 | e f a b c [e]


The different ‘feels’ of these pentatonic modes can be tested using the tricks listed on page 328 for heptatonic modes. Just ensure that you hold down the keynote (shown in bold font) in the bass and that you then play no other notes than those shown in column 2, if you prefer just black notes, or in column 5, if you’d rather stick to white notes.
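The step patterns in column 4 of table 9-3 can be derived in the same way as those of the heptatonic modes, by rotating the five black notes around each possible tonic. The Python sketch below is an illustration with hypothetical names; the black-note layout and the ‘Trad. Japanese’ pattern are taken from the text and table, nothing else is assumed.

```python
# Black notes in ascending order within one octave ('$' = flat, as in the text).
BLACK_NOTES = ["g$", "a$", "b$", "d$", "e$"]
# Semitones from each black note up to the next black note above it.
STEP_ABOVE = {"g$": 2, "a$": 2, "b$": 3, "d$": 2, "e$": 3}
# Mode names of table 9-3, keyed by their black-note tonic (column 2).
MODE_BY_TONIC = {"g$": "doh", "a$": "ré", "b$": "mi", "d$": "sol", "e$": "la"}
# Step pattern of the hemitonic 'Trad. Japanese' mode, copied from column 4.
TRAD_JAPANESE_STEPS = [1, 4, 2, 1, 4]


def pentatonic_notes(tonic: str) -> list:
    """The five black notes in ascending order, starting on `tonic`."""
    start = BLACK_NOTES.index(tonic)
    return BLACK_NOTES[start:] + BLACK_NOTES[:start]


def pentatonic_steps(tonic: str) -> list:
    """Ascending step pattern (in semitones) of the black-note mode on `tonic`."""
    return [STEP_ABOVE[note] for note in pentatonic_notes(tonic)]


def is_anhemitonic(steps: list) -> bool:
    """True if the pattern contains no single-semitone step."""
    return 1 not in steps


for tonic, name in MODE_BY_TONIC.items():
    steps = pentatonic_steps(tonic)
    print(f"{name}-pentatonic: {' '.join(pentatonic_notes(tonic))}  "
          f"steps {steps}  anhemitonic: {is_anhemitonic(steps)}")
print("Trad. Japanese steps", TRAD_JAPANESE_STEPS,
      "anhemitonic:", is_anhemitonic(TRAD_JAPANESE_STEPS))
```

Every pattern adds up to the twelve semitones of the octave; only the arrangement of the steps, and in the last case the presence of semitone steps, differs from mode to mode.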

Mode and connotation

Anhemitonic pentatonic modes may not have much geo-ethnic specificity but the Aeolians, Locrians, Dorians, Ionians, Phrygians and Lydians were all ethnic groups heard and seen from the perspective of the ruling class in Ancient Athens. Those peoples aren’t alone in providing mode labels. Several Arab modes also have ethnic or regional names — Hijaz (حجاز), Iraq (عراق), Kurd (كرد), for example— and mixtures of the phrygian and Hijaz modes are often referred to as Gypsy, flamenco or Arab. Westerners are also likely to hear the hemitonic pentatonic hirajoshi mode variant shown in the bottom line of table 9-3, as typically Japanese. Now, if you’re Japanese you’ll more likely hear that mode as old or traditional rather than as just Japan because, unlike the outsider, you’re familiar with all the other modes, including those of the urban West, that are more widely used in the music you hear on a daily basis and just as much Japan to you as the traditional mode. To the outsider, however, Japan cannot be musically represented as specifically Japan if it isn’t treated as different from us, as ‘another’. That’s why ‘exotic’ instrumental timbre (e.g. koto, shakuhachi) and ‘exotic’ modes containing scalar steps of four semitones (major thirds) and semitones are heard in the West as Japanese. It’s also why we hear heptatonic modes like Hijaz and Shad 'Araban (شد عربان), with their three-semitone and single-semitone scale steps, as more ‘typically Arab’ modes than Ajam (عجم) or Rast (راست) which resemble the Western ionian and æolian modes respectively.


Modes containing a flat seventh ($7) and no semitones next to the tonic (no $2, no #7), i.e. the æolian, dorian and mixolydian (see table 9-2, p. 326) and hexatonic modes with no sixth degree, are, along with doh- and la-pentatonicism (p. 331), more common in the popular song repertoire of pre‐industrial Britain, Ireland and Appalachia than in most music of continental European origin. These sorts of mode are often nebulously associated with either ‘Celtic’ or old anglophone ‘folk’ traditions even though, for example, mixolydian tunes are two a penny in baião music from Northeastern Brazil. It’s worth adding that dorian and mixolydian harmonies are extremely common in rock music and that certain types of mixolydian chord progressions are often used as style flags in Hollywood Westerns.

But connotations of mode are not solely ethnic or regional. After all, the words mood and mode are etymologically interrelated.

The most obvious convention of modal connotation in the urban West is of course the major-minor dualism and the widespread notion that major key = happy and minor key = sad. This notion, I argued earlier (p. 264, ff.), is a common cause of mistaken connotative identity because of all the sad music we hear in a major key, and happy music we hear in the minor. That said, the major-minor dualism does have some validity in euroclassical and jazz repertoires where a minor-key piece is probably (but not necessarily) more likely to involve states of mind like dejection, melancholy, sadness and fury.

Other musical traditions are, or have been, much more detailed about links between particular modes and particular states of mind. For example, in The Republic (c. 380 BC) Plato reports that Socrates wanted to ban the lydian and ionian modes from his ideal city state because they were allegedly too sad, relaxed, effeminate or drunken, and to promote instead the toughness, courage, moderation, prudence, openness and humility that were apparently associated, in that cultural context, with the dorian and phrygian modes.

Much music from the Arab world and the Indian subcontinent has over centuries developed a degree of melodic sophistication not found in the Central European tradition, much of whose tonal interest is harmonic. This is perhaps why melodic tonality in Arabian and Indian musics seems to offer a more detailed and varied range of connotations than we Westerners are used to. For example, the Arabian tonal configuration Rast is supposedly related to masculinity, pride and a stable mind, while Bayati is thought to evoke joy and femininity, Sikah love, Saba pain and sadness, and Hijaz the distant desert. Similarly, the rāgas of Northern Indian classical music were traditionally linked with certain seasons, times of the day and to particular moods or states of mind (rasa), each with their resident deity, colour, etc.


Melodies are monodic tonal sequences perceived as musical statements with distinct rhythmic profile (p. 291), pitch contour (p. 318) and tonal vocabulary (mode). Since melody is given extensive coverage in Everyday Tonality (Tagg, 2009: 57-79), this account is limited to: [1] a list of melody’s most important general characteristics; [2] an explanation of melismatic and syllabic singing; [3] some pertinent questions to ask about the ‘meaning’ of a melodic statement.

Melody has the following five important characteristics.

1. It’s usually the easiest part of the music to recognise, appropriate and reproduce vocally.

2. Its phrases cover the duration of an exhalation (the extended present again).

3. It’s normally delivered at a rate ranging from that of medium to very slow speech.

4. It’s often articulated with rhythmic fluidity and unbroken delivery of tonal material within one sequence. These properties mean that melody, as tonal monodic movement, is often understood as a heightened form of human speech and as that aspect of music most closely connected to human utterance, both gestural and vocal.

5. In most music traditions of the urban West, melody is the monodic musical foreground to which accompaniment and harmony are generally understood as providing the background. The semiotic importance of this dualism is discussed in Chapter 12 (p. 425, ff.).


A useful conceptual pair when considering vocal melody is syllabic ↔ melismatic. A melisma is a string of several consecutive notes sung to the same syllable; singing one syllable per note is simply called syllabic. Syllabic singing is common in homophonic (p. 338) settings of hymns, as well as in the verse parts of recordings by singer-songwriters. Melismas are common in rock and gospel phrases like ‘oh yeah!’, and in liturgical settings of ‘Kyrie eleison’ and ‘Alleluia’.

Apart from profiles of pitch, rhythm, tonality and melisma, together with whatever they may suggest by way of gesture, affect and connotation, it’s also worth considering the overall ‘melodicity’ of the music under analysis. For instance:

• Is melody important in your AO or is there greater focus on riffs and rhythms, or on long, held sonorities?

• Is the melody mixed up front and centre stage or is it more like an equal part among all the other strands of the music?

• Are melodic lines performed by the same voice[s] or instrument[s] throughout? If not, do the melodic lines occur at the same time (‘tonal polyphony’), or in succession, or do they partially overlap?

• What effects are created by such ways of treating melody?

• How does the treatment of melody (including its absence) relate to the expression of notions of figure and ground (see pp. 425-481)?

There may also be other significant melodic traits. For example:

• Are any motifs, melodic cadences, turns of melodic phrase, or any ‘licks’ indicative of a particular musical tradition or language?

If so, is it due to language rhythm or tonal vocabulary (mode)? Or both? Or neither?

• Does any of the melodic material in any way resemble any type of vocal statement or mode of phonation, for example affirming, announcing, bewailing, celebrating, complaining, confiding, confirming, cursing, crying, encouraging, giggling, groaning, hiccupping, laughing, moaning, mourning, mumbling, pleading, praising, praying, preaching, proclaiming, ranting, reciting, sighing, stammering, whining?

Tonal polyphony

Polyphony (without the ‘tonal’) simply means more than one sound at the same time. Singing without accompaniment is monophonic but as soon as you stamp your foot in time with the tune, the music becomes polyphonic. If you get out your guitar and strum a few accompanying chords to your song, or if someone else starts singing along in parallel thirds, you’re creating tonal polyphony.


One simple and very common form of tonal polyphony is the use of a drone to accompany melody. Drones are easiest to understand as the continuous notes that sound at the same pitch throughout part or whole of a piece of music. They act as tonal reference point and background for the changing pitch of the music’s other strands. Drones occur in bagpipe music from many parts of the world and usually (not always) feature the keynote or tonic of whichever melodic mode they accompany. Lower strings on the guitar or fiddle are also used to create drone effects that have a more rhythmic character in that note[s] of identical pitch are repeated at short intervals. Drones are also used in audiovisual productions as a suspension device to suggest either stasis (e.g. the stillness of wide open spaces) or, if booming in the bass register, an ongoing, oppressive threat.



Heterophony is polyphony resulting from simultaneous differences of pitch produced when two or more people sing or play more or less the same melodic line at roughly the same time. Heterophony can denote everything from the unintentional polyphonic effect of unsynchronised unison singing to the intentional discrepancies between vocal line and its instrumental embellishment that are characteristic of much music from Greece, Turkey and the Arab world. Another type of heterophony can occur in the final chorus of trad jazz performances when players improvise their individual variants of the same tune at the same time. An extreme example of multi-strand heterophony can be heard in traditional ‘home worship’ singing from the Scottish Hebrides where each florid improvisation on the same hymn tune is thought to present each individual’s ‘relation to God on a personal basis’.

Homophony and counterpoint

Homophony (pp. 453-454) is the type of tonal polyphony (different pitches sounding at the same time) in which different strands of the music move in the same rhythm at the same time. It’s the antithesis of counterpoint, meaning polyphony whose instrumental or vocal lines clearly differ in melodic and/or rhythmic profile. Most hymns and national anthems are homophonic.

Polyphony is homophonic or contrapuntal only by degree. The less concurrent similarity of rhythmic and melodic profile between the music’s strands, the more contrapuntal it becomes. For example, Bach fugues, Renaissance motets, rock recordings, funk grooves and overlapping call-and-response phrases in gospel music all exhibit varying degrees of counterpoint (with some homophony), while hymns, nursery rhyme harmonisations and Sousa marches display varying degrees of homophony (with some counterpoint).
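The idea of homophony and counterpoint as matters of degree can be illustrated with a rough sketch. It assumes, purely for the sake of illustration, that each strand is reduced to the beats on which its notes start, and it uses a simple overlap ratio of my own choosing (shared onsets divided by all onsets). The function name and the two toy examples are hypothetical, and the measure ignores melodic profile altogether.

```python
def rhythmic_overlap(onsets_a, onsets_b):
    """Share of note onsets that two strands have in common (0.0 to 1.0).

    Illustrative assumption: 1.0 means the strands move in identical
    rhythm (fully homophonic in rhythmic terms); lower values mean
    more rhythmic independence, i.e. a more contrapuntal texture.
    """
    a, b = set(onsets_a), set(onsets_b)
    if not a and not b:
        return 1.0  # two silent strands are trivially 'together'
    return len(a & b) / len(a | b)


# Toy hymn texture: tune and bass change note on every beat of the bar.
hymn_tune = [0, 1, 2, 3]
hymn_bass = [0, 1, 2, 3]

# Toy overlapping call and response: the answer starts as the call ends.
call = [0, 1, 2]
response = [2, 3, 4]

print(rhythmic_overlap(hymn_tune, hymn_bass))  # 1.0
print(rhythmic_overlap(call, response))        # 0.2
```

Real music falls somewhere between such extremes, which is the point of saying that polyphony is homophonic or contrapuntal only by degree.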



Harmony is popularly thought of as that aspect of tonality which has to do with chords. Chord just means the simultaneous sounding of two or more tones with different note names. A chord consisting of two different notes is called a dyad, of three a triad, of four a tetrad, etc., and a chord of several neighbouring notes is called a cluster.
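For readers who find it easier to think in terms of procedure, the naming convention just given can be sketched as follows. The sketch assumes that notes are written as a note name plus an octave number (e.g. E4), that enharmonic spellings are not merged, and that clusters are left out because ‘neighbouring’ would need a definition of its own. The function names are hypothetical.

```python
SIZE_NAMES = {2: "dyad", 3: "triad", 4: "tetrad"}


def note_name(note):
    """Drop a trailing octave number: 'E4' -> 'E', 'Bb3' -> 'Bb'."""
    return note.rstrip("0123456789")


def chord_size_name(notes):
    """Name a sonority by how many different note names it contains."""
    count = len({note_name(n) for n in notes})
    if count < 2:
        # Octave doublings of one note name do not make a chord.
        return "not a chord"
    return SIZE_NAMES.get(count, f"{count}-note chord")


print(chord_size_name(["E4", "E5"]))              # not a chord
print(chord_size_name(["E4", "B4"]))              # dyad
print(chord_size_name(["E3", "G3", "B3", "E4"]))  # triad
print(chord_size_name(["C4", "E4", "G4", "Bb4"])) # tetrad
```

Note that the third example contains four tones but only three different note names, which is why it counts as a triad.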

If counterpoint is imagined as the ‘horizontal’ or diachronic aspect of tonal polyphony, harmony is often thought of as its ‘vertical’ or synchronic aspect, as ‘the chords’. This distinction can be misleading, not only because tonal counterpoint produces chords in the sense just given but also because even the most homophonic types of tonal polyphony are inevitably diachronic. There are two basic reasons why harmony needs also to be considered ‘horizontally’, one being the fact that the individual notes in one chord lead to individual notes in the next one (‘voice leading’). The other reason is that the same set of chords, each of a particular note length presented inside a particular overall duration, doesn’t sound the same, or have the same effect, as those same chords sounded in a different order with different durations. The point here is that chord progressions constitute a diachronic parameter of musical expression that can signal a musical style as well as a sense of musical movement, flow or direction. Still, let’s first consider the synchronic aspects of harmony.

Chord types and harmonic idiom

If I play the James Bond chord (Em^9) on a piano rather than, as was originally intended, on a Fender Stratocaster treated with slight tremolo and some reverb, many non-musos are still able to identify the sound in terms of a spy chord, detective chord, etc. Such codal competence suggests that a chord’s tonal information can on its own, at least under certain conditions, carry culturally specific connotations. In fact, choosing the right chord type can be just as effective as instrumental timbre (p. 307 ff.) or melodic mode (p. 325 ff.) in establishing a musical idiom, as well as in suggesting moods and environments. I’ve found that many students, muso and non-muso, can, if their attention is drawn to the sonority in question, recognise not only detective chords but also other aesthesically labelled sonorities like the bitter-sweet chord, the romantic pathos chord and Burt Bacharach chords. They can also usually distinguish between drone-based and busily over-harmonised arrangements of folk tunes, between the harmonic idioms of trad jazz and bebop, between Elizabethan and late Romantic harmonies, etc. The problem is in other words not one of aural competence but of poïetic nomenclature (like major minor nine (m^9)) because aesthesic descriptors like bitter-sweet, romantic pathos, Burt Bacharach and twangy folk chords have (as yet, if ever) little or no validity in institutions of conventional musical learning.

Chord progressions

Like the types of chord just described, chord progressions often have semiotic significance. They can indicate a home style, refer out to a ‘foreign’ style and sometimes suggest a mood. They can also act kinetically and syntactically by contributing to the establishment of metre, phrase, period and overall form in a piece of music. It may be useful to think of chord progressions as existing at three levels of duration: [1] short-term shuttles or loops contained within one or two bouts of the extended present; [2] medium-term loops or matrices covering at least one period (several phrases); [3] long-term harmonic narrative.

Chord shuttles

‘Shuttle’ denotes an ongoing oscillation between two chords, ‘loop’ a repeated sequence of (typically) three or four chords. Chord shuttles and loops are common in many types of popular song and dance music. For example, the aeolian shuttle, as heard in All Along The Watchtower (Dylan, 1968; Hendrix, 1968), Whispering Thunder (Cain, 1972), Money (Pink Floyd, 1973) and Chopin’s Marche funèbre (1839), is, in slow to moderate tempo, a habitual harbinger of things dark and ominous. On the other hand, the ‘floating dorian shuttle’ —as heard in He’s So Fine (Chiffons, 1963), Oh! Happy Day (Hawkins, 1972), My Sweet Lord (Harrison, 1972) and, most notably at several brightness points on the Pink Floyd album Dark Side of the Moon (1972)— is a less ominous and a much more open-ended sort of affair.

Chord loops

Among the most familiar loop progressions must surely be the three-chord La Bamba pattern, so common in many types of Latin-American music and the chordal basis of tunes like Guantanamera, Pata Pata, Do You Love Me?, Twist And Shout, Hang On Sloopy and Wild Thing. Just as common is the four-chord vamp loop (also known as ‘vamp until ready’) that accompanied countless ‘milksap’ numbers recorded in the USA around 1960. I’m referring to ‘teen angel’ hits like Diana, Teenager In Love, Poetry In Motion, Oh! Carol, Happy Birthday Sweet Sixteen, Dream Lover and Sherry Baby, as well as the ‘A’ sections of jazz standards like Blue Moon (p. 398) and At Last.

A medium-term repeated chord progression covering several bouts of present-time can be called a harmonic matrix or chord matrix. One of the most well-known chord matrices is the standard twelve-bar blues pattern whose simplest form runs {I I I I IV IV I I V IV I I} where ‘I’ means a chord based on the tonic (keynote or first degree of the mode), ‘IV’ a chord on degree 4 of the tonic’s major or minor mode and ‘V’ a chord based on scale degree 5. That means the chords of a simple 12-bar blues in E run {E E E E A A E E B A E E} and in F {F F F F B♭ B♭ F F C B♭ F F}. Performed at 120 bpm, a twelve-bar blues matrix lasts 24 seconds, the matrix’s three four-bar periods each occupying eight seconds. Among other common cyclical harmonic matrices are the New Orleans R&B eight-bar pattern (e.g. I IV I I V IV I I) and the wide variation of chaconne and passacaglia sequences found in the euroclassical tradition, for example in the ever-popular Pachelbel’s Canon.
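The transposition and timing of the twelve-bar matrix can be checked with a short sketch. It assumes four beats to the bar (which is what the 24-second figure implies), a constant tempo, and chord roots only, with flats spelled using a plain ‘b’ (so B flat appears as Bb). The names used in the code are hypothetical.

```python
# Semitone distance of scale degrees I, IV and V above the tonic.
DEGREE_OFFSETS = {"I": 0, "IV": 5, "V": 7}

# The twelve note names, flats spelled with 'b' (an assumption of this sketch).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

TWELVE_BAR_BLUES = ["I", "I", "I", "I", "IV", "IV", "I", "I", "V", "IV", "I", "I"]


def realise(matrix, tonic):
    """Turn Roman-numeral degrees into chord roots for the given tonic."""
    root = NOTE_NAMES.index(tonic)
    return [NOTE_NAMES[(root + DEGREE_OFFSETS[degree]) % 12] for degree in matrix]


def duration_seconds(bars, bpm, beats_per_bar=4):
    """Length of a passage in seconds at a constant tempo."""
    return bars * beats_per_bar * 60 / bpm


print(" ".join(realise(TWELVE_BAR_BLUES, "E")))  # E E E E A A E E B A E E
print(" ".join(realise(TWELVE_BAR_BLUES, "F")))  # F F F F Bb Bb F F C Bb F F
print(duration_seconds(12, 120))                 # 24.0
print(duration_seconds(4, 120))                  # 8.0
```

Twelve bars of four beats make 48 beats; at 120 beats per minute each beat lasts half a second, which gives the 24 seconds for the whole matrix and eight seconds for each four-bar period.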

Large-scale harmonic progression that lends overall tonal structure to the mega-durations of entire pieces of music is dealt with in Chapter 11 under ‘General diatactic schemes’, especially in the sections on the 32-bar jazz standard (p. 397, ff.) and sonata form (p. 409, ff.).




10. Vocal persona


The voice is mankind’s primary musical instrument. Its importance has already been mentioned in conjunction with prosody, with timbre and aural staging, with pitch range and register, and of course with melody. As we’ll see in Chapter 13, voice is also at the basis of several musical sign types, including transscansions, language identifiers and paralinguistic anaphones. The purpose of this chapter is to suggest ways of denoting perceptions of the nonverbal aspects of voice.

Before going any further I need to clarify two points. One is the meaning of persona, the other an explanation of the mainly vernacular source of ideas presented in this chapter.


Person, without the final a, means an individual human being and personality ‘the distinctive character or qualities of a person’. In Latin, Italian and Spanish persona (with the final a), simply means person but in English persona denotes ‘an aspect of the personality as shown to or perceived by others’. Actors, singers and other types of performer aren’t the only ones to present personas because we all have to assume different roles in different situations at different times of life. Here are sixteen examples from my own life: [1] child in relation to parents; [2] parent in relation to a child; [3] student in relation to teachers and [4] fellow student; [5] teacher in relation to students and [6] colleagues as well as [7] administrators; [8] lover; [9] husband; [10] good friend; [11] reasonably ‘angry young man’; [12] even more reasonably (and cheerful) ‘angry old man’, latterly also ‘benevolent but eccentric patriarch’; [13] ‘one of the guys’; [14] classical musician; [15] rock musician; [16] solitary writer of academic texts like this.

It’s not always easy to adopt the right persona in the right situation, especially if the role expected of us has to change, for example from child to parent or from student to teacher, but there’s nothing intrinsically dishonest or schizophrenic about our ability to adapt to the appropriate role in the appropriate situation. On the contrary, it’s an essential social skill. That’s why vocal persona should not, in what follows, be primarily understood as role play in the sense of putting on a vocal front, although that may sometimes be the case, but as any aspect of personality as shown to or perceived by others through the medium of either prosody or of the singing voice.

Vernacular sources

The ideas presented in this chapter derive less from the wealth of scholarly writing on voice, much more from having run popular music analysis classes for many years. Insights gained from that experience are supplemented with observations about how voice seems to be described in music reviews, album inlays, in ads for voiceover artists, even in casual conversation. All these vernacular sources for the verbal description of voice share a common trait: unlike the poïetic terms designating musical structure defined by parameters of pitch, tonality, metre and episodicity, descriptions of voice, like those of timbre, are mainly aesthesic. This tendency may well be due to the fact that conventional music studies have yet to succeed in establishing a systematic and widely accepted terminology for vocal expression. There’s simply very little by way of poïetic jargon to intimidate non-musos, many of whom may struggle with the designation of music’s tonal aspects but who can often excel at timbral and vocal description.

Another ‘democratic’ aspect of voice as part of musical analysis is that it’s an instrument we all use in a musical way —prosodically — every time we speak. Most of us are experts at using our voices, not just to utter words but also to present our individual or group identity, and to express emotions, attitudes and behavioural positions (vocal personas). That’s why I’ll start with the music of the spoken voice, more precisely with my mother, followed closely by Robert De Niro.

‘Don’t worry about me’

When I was a child my mother would sometimes say ‘don’t worry about me — I’m fine’ in a very sad voice. I remember the confusion that statement caused me. Did she mean the words don’t worry about me — I’m fine or should I pay more attention to the music (prosody) in her statement: Please worry about me — I’m miserable?

The second interpretation was probably nearer the truth than the first, not least because of the narrative context of her statement: she wasn’t always a happy person. She might have been feeling unwell or have just been involved in a domestic disagreement. Another reason for prioritising the ‘music’ of her statement was that her facial expression, body posture and gestures (in this case a lack of gesture), all aligned with her vocal timbre, volume, intonation, diction and speech rhythm but contradicted the meaning of her words. With a child’s understanding of words and reason as privileged modes of symbolic interaction among grown-ups (although I wouldn’t have put it that way at that time), I remember opting to take my mother’s don’t worry about me at lexical face value. That decision once prompted my father to chide me for being insensitive. I didn’t know what ‘insensitive’ meant but it didn’t sound good, so I reverted to a more instinctive (or childish?) mode of interpretation, paying more attention to mother’s ‘music’ and less to her words. Unfortunately, reading her statements on the basis of their ‘music’ (timbre, volume, inflexion, posture, facial expression, etc.) and ignoring her words also turned out to be wrong, because if I responded to her plaintive tone by asking ‘What’s the matter?’ in a sympathetic tone of voice, I risked insulting her pride and hearing her retort: ‘I said I was fine. Why do you never listen to what I say?’

It took me many years to realise that I could interpret my mother’s [plaintive voice →] don’t worry about me — I’m fine [normal →] as an integral statement, despite its mixed message. She actually meant:

I’m very sad and I find it hard to put on the brave face of self-control I know that grown-ups should. So, please show me some kindness while respecting the fact that I at least know I’m supposed to put on a brave face, even if I expect you to see through it.

That statement would have taken mother much longer and have demanded an unrealistic amount of reflective self-control. Her ‘mixed message’ was in that sense more efficient. I was simply slow to learn that you could consider the narrative context, scene, body language, the words and the music of my mother’s mixed messages as a whole. It was a musogenic statement like the clear but complex musical moods mentioned in Chapter 2. I’m referring to those ‘pallid verbal approximations’ like desperately troubled in the midst of calm and beauty, or sick of the world and feeling alive because of that disgust.

The don’t worry about me anecdote illustrates three important points about musical meaning, the first two of which have been discussed earlier. This chapter focuses on the third point.

1. Musical meaning is never created by the sounds on their own. They always exist in a syntactic, semantic and socioculturally pragmatic context upon which their semiosis depends.

2. Precision of musical meaning does not equal precision of verbal meaning or that of any other symbolic system. Its apparent contradictions of verbal meaning (pp. 66-67; 167, ff.) should be understood as musically coherent.

3. Vocal timbre, pitch, intonation, inflexion, accentuation, diction and volume, plus the speed, metre, rhythm and periodicity of vocal delivery are parameters of expression conveying information about the sociocultural and personal identity (including meta-identity) presented by speakers or singers, as well as about their attitudes, feelings and emotions (i.e. their vocal persona).

‘Are you talking to me?’

The third point just listed is illustrated in the video Vocal Persona Commutations (E OL7uc6L5nMQ) which uses a twelve-second extract from the film Taxi Driver (1976) to highlight central aspects of links between voice and personality. In that twelve-second extract, Travis Bickle, the film’s taxi-driving main character played by Robert De Niro, has just exercised his second-amendment right and acquired a gun to bolster his confidence when faced with the miscreants he meets in his job. In the clip he prepares to confront such scumbags by rehearsing the famous line ‘are you talking to me?’ in the mirror. It’s worth examining the twelve seconds it takes De Niro to ask the question three times, including pauses, in order to discover which parameters of vocal expression communicate what. It’s also worth testing which voices can and cannot be substituted for De Niro’s in that famous scene so as to reveal the extent to which vocal persona is dependent on congruence with such factors as gender, ethnicity, age, social position, personality, clothing, opinion and attitude, acoustic distance and setting.

Leaving aside gesture, posture and facial expression for the moment and concentrating solely on the sound of De Niro’s voice, minor differences of inflection, intonation, volume and accentuation can be discerned between the three variants of are you talking to me? In the first variant his voice is low-key but quite rapid with the quick but substantial rise of pitch normally used in English to pose questions expecting the answer yes or no; but it does sound sudden, as if he had been taken off guard. The second utterance is slightly slower, a little more deliberate and has clearer diction, suggesting that the imaginary low-life interlocutor did not take him seriously the first time. The third utterance is once again quite contained but includes more emphasis on ‘me’ and a little less on ‘talking’. This shift in accentuation underlines personal involvement in the imagined encounter. Apart from these minor variants, it should be noted that De Niro does not raise (the volume of) his voice in anger or frustration, and that his is the normal voice of a young, probably white, North American, English-speaking male. In fact, without the narrative context and without De Niro’s body language, there is nothing remarkable about his vocal persona in this scene any more than there is about Travis himself, even though his lack of charisma may be what makes him narratively interesting.

Given that this relatively normal, neutral and uncharismatic personality has a correspondingly normal, neutral and uncharismatic vocal persona, it ought to be possible to replace his voice with others in order to discover which vocal elements are compatible or incompatible with which other simultaneous aspects of non-verbal communication.

The fact that we’re in a noisy kitchen and that Travis is white, unshaven and wearing what appears to be a grey flannel air-force jacket tells us quite a lot. It certainly rules out several of the persona substitutions in the Vocal Commutations video. It’s obvious that we’re not hearing/seeing a child, nor a woman or old man. It isn’t anyone African-American or East Asian, nor anyone from the higher echelons of society (unless they’re slumming it). Nor can it be a samurai warrior from the sixteenth century or a young executive in Qatar or Saudi Arabia. The visuals also rule out robots, death-metal monsters, chipmunks or anything else that doesn’t look or sound like a Caucasian male, a member of the popular classes, and aged between 25 and 45. But there’s more visual information restricting the vocal commutation possibilities.

Since De Niro is about one metre away from the camera, convincing alternative voiceovers cannot sound too close or too distant. For example, the repugnant intimacy of the lecherous dirty old man voice in the commutation video only works if De Niro’s face is in extreme close-up. Obviously, then, one parameter of expression for vocal persona is perceived proximity. Another parameter is acoustic space. The commutation video’s monster and evil god voices, for instance, have been given cavernous reverb incompatible with the size and acoustic properties of the cluttered kitchen we see on screen.

The first time Travis asks the famous question he is at the far right edge of the screen with his body facing screen left. He turns his head towards us, as if just having heard something coming from the direction of the camera. He looks surprised, his eyebrows are raised and his head tossed back a bit. It’s the look of someone literally taken aback. However, there is nothing except the immediate narrative context that rules out the possibility of pleasant surprise, which is why the commutation video’s first baby talk voiceover works well if viewers imagine the camera being the baby’s point of view and that the De Niro character is a proud father, surprised and delighted by his infant’s contented gurgling as he walks past.

For the second version of the question De Niro has half turned toward the mirror/camera, tossed his head back a bit more and raised his eyebrows higher. Once again, it’s mainly the narrative context that rules out a possibly positive interpretation of Travis’s body language and which leads us to believe that this more clearly ‘taken aback’ posture is more likely to express affront and irritation than surprised delight. Even his teeth, visible for a short moment in an unsmiling mouth, suggest confrontation. He also seems to be looking down his nose at his imagined interlocutor, and since his diction and accentuation are slightly more forceful than before, the baby talk voiceover of the delighted dad is less convincing here. Furthermore, the despondent, depressed and weak vocal persona substitutions align badly with De Niro’s posture, facial expression, accentuation and diction during these three seconds.

The third version is gesturally the clearest. His body is turned a little more towards the camera as he points to his own chest in sync with ‘to me’. Again, prior knowledge of the Travis character will likely lead viewers to see his grin as insolent, and his hand gesture as expressing personal affront. However, without such prior knowledge and with the addition of a few sonic correctives to the narrative (gurgling baby, the mother’s ‘aaah!’), are you talking to me?, spoken by a delighted and proud father, aligns quite convincingly with this third variant of the famous question.

Several vocal persona commutations don’t work because of problems with lip sync. For example, stereotypical robot voices, as we saw earlier (pp. 281-282), apply equal durations for each syllable, while depressed and despondent statements are delivered at a slower rate than that of are you talking to me? spoken normally. Besides, a depressed voice is usually accompanied by depressed body posture and facial expression —drooping shoulders, head hung low, eyes looking down, no eye contact, etc. Lip-sync problems also demonstrate that whispering and other types of vocal close-up are incompatible not only with the lack of extreme visual close-up in the Taxi Driver sequence but also with its speed of delivery. Whispering has to be slower than talking because it must compensate for the absence of voiced consonants and the full transients that identify vowel sounds, while intimate statements delivered forcefully at breakneck speed sound ridiculous.

Poïetic, acoustic and aesthesic descriptors

None of the observations just made about are you talking to me? in the Vocal Persona Commutations clip should come as a surprise.

‘[L]isteners who hear voice samples can infer the speaker’s socio-economic status…, personality traits,… and emotional and mental state… Listeners exposed to voice samples are also capable of estimating the age, height, and weight of speakers with the same degree of accuracy achieved by examining photographs… Independent raters are also capable of matching a speaker’s voice with the person’s photograph over 75% of the time.’ (Hughes et al., 2004: 296)

Indeed, the relationship between an individual voice and its unique personal identity has given rise to the voice print branch of the security industry with its biometric claims about defeating credit card fraud or ensuring ‘that prisoners incarcerated in their homes or out on temporary passes [are] where they were supposed to be’. Whether or not the sales spiel of voice print marketeers has any validity isn’t the point here, although incredulity may be warranted, bearing in mind the technical crudity and socio-linguistic stupidity of most corporate ‘voice recognition’ systems. The point is that insights about congruence between individual voice and personal identity are nothing new. Indeed, the very word person contains the morpheme son, meaning sound, and Latin’s personare literally means to sound (sonare) through (per), to sound forth, etc. Moreover, the original meaning of the Latin word persona is ‘a mask… as worn by actors in Greek and Roman drama’. Its transferred meanings of performed role, personality, etc. derive from the fact that revealing the true nature of a dramatic character involved projecting the voice of that individual through the mask worn by the actor playing that role. His or her voice had literally to sound (sonare) through (per) the mask (vox personans) out into the auditorium, into the audience’s ears and brains.

Links between voice and personality are also clear from numerous online searches for terms like voice, vocal, persona and personality. Although descriptive adjectives of voices were, as we shall see, far from uncommon, another frequently recurring type of voice characterisation related, unsurprisingly, voice to personality. Among the more striking examples found of persona descriptors of Anglo-US singing voices were (artists in brackets) hard-edged sexual exuberance (Chaka Khan), impish chirp (Katryna in The Nields), [they looked and sang like] Barbie dolls (Wilson Phillips), cuddly vocal personality (Beverly Sills), a nervous teenager, fearful of being rejected (Buddy Holly), an angry Smurf (Eminem) and the Western mythical girl/woman, heartbroken yet resilient and entirely feminine… [with an] edge between vulnerability and willfulness (Linda Ronstadt).

The voice descriptions just listed sound neither serious nor scientific. They’re more likely to come across as spuriously subjective, at best as amusing or imaginative. That’s an understandable objection but it needs to be moderated in the light of four points made so far: [1] the fact that ‘[i]ndependent raters are… capable of matching a speaker’s voice with the person’s photograph over 75% of the time’; [2] the apparent commercial success of voice print companies; [3] the patterns of congruence and incongruence in the Taxi Driver commutation clip; [4] the etymology of the word person[a] itself. Those four points suggest that patterns of linking voice with personality do exist and that such links can be verified intersubjectively in given cultural contexts. We’ll return to these links and to their usefulness in discussing the ‘meaning’ of singing voices, but it’s useful to be first aware of other approaches to the issue of describing vocal sound.

The ‘musical’ properties of vocal sound, spoken or sung, can in general be understood and verbalised using one or more of three main perspectives: [1] the physical techniques of its production (poïetic perspective); [2] its measurable physical attributes as sound (acoustic); [3] its perception, interpretation and effects (aesthesic).

The poïetic perspective focuses by definition on how particular parts of the human body are used to produce particular vocal sounds, e.g. larynx, throat, mouth, jaw, tongue, nose, lungs, diaphragm, shoulders, chest, head. Recurrent concepts are breathing, control, projection and register (chest, mixed, head, falsetto). Now, as we’ll see later in this chapter (p. 376 ff.), the ability to reproduce, at least roughly, a vocal sound can help us understand its meaning. That’s why some familiarity with the physical implications of the terms just mentioned can be useful in identifying the body posture (shoulders, chest, head, etc.) and facial expression (mouth, jaw, nose, etc.) most conducive to the production of a particular vocal sound. That knowledge in its turn contributes to insights about the emotional state of the person[a] behind the vocal sound in question.

The acoustic perspective focuses on the physical properties of vocal sound, i.e. on volume (dynamics, intensity) and timbre (attack, decay, fundamental pitch, overtones, etc.). The number of possible variations in these quantifiable parameters is virtually infinite; their combination forms the physical basis of the enormous variation of sounds that human voices can produce and of how those sounds are perceived. Now, there’s no room here to explain even the rudiments of acoustic physics in relation to the human voice and its perception. Readers are instead referred to a wealth of literature dealing with correlations between the measurable physical properties of particular sounds and their perception. That said, basic awareness of parameters like fundamental pitch, overtones, intensity, attack and envelope can, by drawing attention to the physical properties of a particular sound, refine procedures of commutation (e.g. changing timbre to check on possible changes of perceived effect) and lead to greater precision of semiotic analysis.

The aesthesic perspective is characterised by how sounds are perceived, interpreted, reacted to and used by those who hear them. Since this book is aimed primarily at music’s users I’ll try, in what comes next, to sort out the various ways in which we seem to verbalise our perception of different voices. Then, after an excursion discussing basic differences between speaking and singing, the chapter will end with suggestions about how categories of vocal persona can be used in the semiotic analysis of music.

Aesthesic descriptors

Between 2005 and 2008, I trawled cyberspace for websites containing various combinations of voice, vocal or voiceover and including words like quality, timbre, persona, personality, attitude and character. In addition to having annoyed students, friends and colleagues by asking them to describe voices to me, I also took an interest in vocal casting, a specialist profession in which verbal descriptions of voice play an essential part. For example:

‘Seeking voiceover talent who can recreate a female witch voice… [The] project involves an English dub of a Russian animated feature… The witch is very old, around 70. Also seeking a counsellor voice. High pitched and whiny,… middle-aged.’13

Here’s a character description circulated by a Hollywood agency looking for computer game voiceover artists.

‘X is the comically annoying, shape-shifting spirit of an ancient Druid Priest who serves as a kind of guide to [the hero] throughout the ages, as well as being a bothersome pest. He pops up unexpectedly to give advice, frequently at less than opportune moments, although he basically means well. He has a sarcastic, dry wit and is an irritating, amusing, occasionally caring and sincere presence that [the hero] has little choice but to tolerate throughout time. Since he can become anyone or anything, he exhibits a wide variety of voices and personalities. [This character is] “a sophisticated elder” voice in the range of Sean Connery or Ian McKellan, as Gandalf in Lord of the Rings, with comedic undertones. Vocal Quality: should be older and wise-sounding, but also with a “Celtic”-type accent.’

That neither of these adverts describes voice from the poïetic or acoustic perspective is hardly surprising since the jobs aren’t for musicologists, singing teachers or acousticians. On the other hand, the paucity of aesthesic sound-descriptive words does seem a little strange: just high-pitched and whiny for the counsellor and nothing else. Is this type of descriptor less relevant than others when advertising for a voice relating to a specific dramatic personality? To answer that question it’s best to have an overview of the basic categories of aesthesic voice description. These categories are based on observations made from: [1] student comments in popular music analysis seminars since 1992; [2] online descriptions of speaking and singing voices; [3] comments from a voice casting agent in direct response to specific questions (p. 359).14 Table 10-1 (pp. 356-357) includes examples of descriptors from these three sources, arranging them in the following four principal categories.

[1] Sound descriptors denote perceived qualities of sound and are of two types: [1a] directly sound-descriptive adjectives and verbs; [1b] genre descriptors referring to the musical style and by extension to the genre associated with particular types of voice.

[2] Transmodal / synaesthetic metaphors like rough, smooth, velvety and gravelly connote sound on the basis of homologies from senses other than hearing. These synaesthetic descriptors are like anaphones in reverse in that they denote mainly kinetic and tactile sensations that are transferred to the perception of sound.

[3] Persona descriptors seem to be the most common type of vocal characterisation. They can be divided into four subcategories.

Subcategory 3a in Table 10-1 (p. 357), named persons with distinctive voices, is often found in reviews, presumably to give readers an idea of what sort of vocal sound to expect from a recording they have yet to hear. My unjustifiably disparaging remark that Portishead’s Beth Gibbons, in Western Eyes (1997), sounds like an under-age Billie Holiday belongs to this descriptive subcategory.

Subcategory 3b in Table 10-1, demographic descriptors, covers the gender and age, as well as the ethnic, cultural, social and economic background, of the vocal persona in question. These descriptors are very common in characterisations of both singing and speaking voices.

Subcategory 3c, psychological, psychosomatic and emotional descriptors (p. 356), is the most common of all. These descriptors qualify or allude to the feelings, attitude and morality, and to the state of mind or body of the vocal persona in question.


Table 10-1. (a) Aesthesic voice description categories with examples

1. Sound descriptors

1a. Directly sound-descriptive adjectives and verbs

Adjectives: high-pitched, whiny;* squeaky, booming, low-pitched, deep, full-throated, gruff, breathy, husky, guttural, distinct, harsh, indistinct, muffled, plaintive, rasping, roaring, shrill, stammering, loud, declamatory, soft, quiet, monotone, lispy, bird-like, hoarse, throaty.

Verbs: babble, bark, bawl, belch, bellow, bleat, blubber, boom, buzz, cackle, caterwaul, chant, chatter, chuckle, chirp, cluck, complain, cough, croak, croon, cry, declaim, denounce, drone, exclaim, gargle, gasp, giggle, growl, grumble, gurgle, hiccup, hiss, hoot, howl, hum, lament, laugh, lilt, moan, mumble, mutter, praise, preach, proclaim, pronounce, quack, quip, rant, rap, recite, roar, scream, screech, shout, shriek, sigh, snap [at], snarl, snigger, snore, snort, sob, spit, splutter, squawk, squeak, stammer, stutter, twitter, ululate, wail, warble, weep, wheeze, whimper, whine, whinge, whisper, whistle, whoop, yammer, yap, yawn, yell, yelp, yowl.

1b. Genre descriptors

E.g. blues shouter, Bollywood vocalist, cantautore, cantor, chansonnier, crooner, death metal growler, dramatic ballad star, fadista, folk singer, gospel artist, Irish tenor, jazz vocalist, lyrical soprano, muezzin, opera diva, payador, rapper, singer-songwriter, troubadour.

2. Transmodal descriptors (anaphonic/synaesthetic descriptors)

abrasive, angular, bouncy, brassy, clean, clear, creamy, effortless, full (-bodied), grainy, gravelly, hollow, laid back, meaty, piercing, rasping, relaxed, robotic, rough, rounded, sandpapery, scratchy, shaky, sharp, smooth, stilted, strained, sweet, textured, thick, thin, velvety, wobbly.


Subcategory 3d, archetypal descriptors, combines traits from all the other categories into personality tropes, sometimes in the guise of professions (priests, teachers, etc.), more often as narrative roles (heroes, villains, victims, lovers, parents, sages, witches, wizards, fools, tricksters, etc.). This subcategory has obvious advantages and drawbacks. Consider, for example, the following extract from a review of the 2005 Audio Bullys album Generation.

‘[T]he intro welcomes back Simon Franks’ pot-smoking, pill-popping, wife-beating, bottle-lobbing, “yes I do live on a council estate thank you very much”, vocal persona’…


Table 10-1. (b) Aesthesic voice description categories with examples18

3. Persona descriptors

3a. Named persons with distinctive voices

E.g. Sean Connery or Ian McKellan;* Clint Eastwood, the Clint-Eastwood-is-Dirty Harry guy, The Smurfs, Donald Duck, R2-D2, Richard Attenborough, Orson Welles, Morgan Freeman, Billie Holiday; Elvis Presley, Adele, Kate Bush, Björk, Maria Callas, Elba Ramalho.

3b. Demographic

E.g. female, male | very old, around 70, middle-aged, older; young, child | ‘Celtic’ accent; African American, French, Asian, Southern [US], British, upper class, working class, well spoken, from the country/slums, slang, regional accent.

3c. Psychological, psychosomatic & emotional traits

means well*, caring*, sincere*, kind, friendly | cute, cuddly, sweet, nice | wise, intelligent, controlled, confident, regal | arrogant, dramatic, over-the-top, extravert, provocative, ecstatic, orgasmic | willful, determined, courageous | energetic, flamboyant, bubbly, cheeky, cheery, comical*, coquette, jaunty, playful, keen, eager, sassy, interested | interesting, complicated, quirky, annoying,* bothersome,* eccentric, cartoony | hip, cool, sophisticated,* sensual, seductive, sexy | vulnerable, embarrassed, scared, edgy, nervous, angry, frustrated, irritated, exasperated, bitter | dark, mysterious, introvert | sad, depressed, heart-broken, miserable, anguished | melancholy, bored, bland, nondescript, neutral | intimate, subdued, laid-back, relaxed, soft spoken, humble, simple, innocent, childlike | angelic, ethereal | raw, rude, tough, rugged, gritty, macho, aggressive | devious, slimy, sleazy, nasty, evil, petty | sardonic, sarcastic,* dry wit,* ironic, acerbic.

3d. Professions, roles and archetypes

witch,* counsellor,* Druid Priest,* guide,* elder* | little girl, heroine, leading woman, loving mother, devoted wife | evil queen, witch, violent bitch, pretty princess, Barbie doll, vamp | villain, big boss, gangster, lager lout, hooligan, dirty old man | little boy, hero, father figure, leading man, wise old man | monster, alien, robot | sissy, miser, imp, evil child, suicidal student, nervous teenager, wiseguy, nerd, geek.


Even though pot-smoking, pill-popping, wife-beating and bottle-lobbing may derive from the duo’s lyrics, those epithets also connote the sort of voice many urban UK residents would, in 2005, associate with (male) slob behaviour (uneducated, careless, thoughtless, self-centred), not least because the activities of wife beating and bottle lobbing imply a particular (and particularly impaired) emotional state, as well as specific body postures, breathing patterns, etc. Restricting ourselves to words listed in Table 10-1, it’s much more likely that the vocal persona in question is loud and booming rather than soft or muffled, brassy rather than wobbly, working-class rather than upper-class, arrogant rather than humble, over-the-top rather than subdued, etc., in fact the sort of voice associated with football (soccer) hooligans (typically loud, male and working-class) and lager louts (vocally similar to football hooligans but with bottle lobbing as a likely additional trait).

The advantage of epithets like bottle-lobbing and lager lout is that they each encapsulate in a single concept a wealth of behavioural, psycho-social and vocal characteristics. The disadvantage is that descriptors like lager lout are culturally restrictive: only those familiar with particular aspects of UK popular culture in the post-Thatcher era will grasp the relevant social and vocal implications. As for the final epithet, the ‘yes I do live on a council estate thank you very much vocal persona’, it would take another chapter to convincingly explain council estate and its relevant connotations, yet another to provide a viable socio-linguistic analysis of ‘Yes I do live’… and the final ‘thank you very much’. In short, while the semantic efficiency of such epithets is undeniable within a restricted socio-cultural sphere, their connotations may well be meaningless to the rest of humanity, unless adequate equivalents can be identified in other cultural contexts.

Despite problems of cultural specificity, there is little doubt that aesthesic descriptors are in much wider general use than their poïetic or acoustic counterparts and that persona descriptors, especially the demographic, psychological and archetypal subcategories, are particularly popular. This observation was substantiated by Dawn Hershey, a Hollywood professional specialising in vocal casting for video games and animated productions for film and TV. Here are two abbreviated extracts from email correspondence I had with Dawn on the subject.

What problems do [producers] have in describing the type of voice they want?

The biggest problem they have when they first contact me is that they… describe body type, hair color [etc.]… I often need to ask more questions, such as age, accent, vocal quality, personality traits, quirks, and temperament…

How often do you or they refer to voices in terms of character archetypes?…

Almost always. Most frequently requested are little boy, little girl, 20s heroine, 20s hero, leading man, evil queen, villain, monster, alien, soldier, wise old man, big boss, fat cat, gangster.

Of course, none of the aesthesic vocal description categories discussed so far are mutually exclusive. For example, a particular kind of witch voice (description category 3d) might also be described as high-pitched and cackling (category 1), as scratchy and piercing (2), and as sounding like an angry and evil (3c) eighty-year-old (3b) version of the Annette Bening character in American Beauty (3a). Moreover, many descriptors bridge two or more categories: rasping, for example, may be most commonly used to qualify sound (category 1), but the act of rasping (using a rasp as a coarse file in the original sense of the word) has as much to do with touch and movement (category 2) as with sound. Similar observations apply to words like scratchy, piercing, clean, shaky, strained and gravelly. Indeed the whole point of introducing the categories just mentioned isn’t to create some sort of watertight taxonomy (a fruitless task in view of music’s synaesthetic properties, p. 62 ff.) but to provide insights into the various ways that vocal sound is popularly perceived and described on an everyday basis. The aim of that exercise is in its turn to develop richer and more nuanced descriptions of what a vocal sound can communicate.

As an endnote to this section it’s worth mentioning the rich store of vocal personas exploited in consumerist propaganda. You only need think of the motivational football coach voice hyperventilating about ‘all the fantastic bargains’ (‘Only 99.99!’… ‘And that’s not all!’… ‘Hurry!’… ‘Get yours now!’, etc.), or of the hard-boiled no-nonsense serious-business tough man of action film trailers (‘Clint Eastwood is Dirty Harry’ etc.) to get the idea. Then there’s the female best-friend voice telling ‘the girls’ how to lose weight by buying low-fat cereal brand X, the breathy lips-in-your-ear voice seducing you to buy super-silky hair product Y or to stuff your face with super-smooth creamy chocolate type Z. And don’t forget the cheerful but matter-of-fact young mother enthusing about supermarket A or microwave meal brand B. The list could go on forever. The point is that this supply of regrettably recurrent vocal stereotypes in commodity fetishism can be a very useful source of vocal persona descriptors, as long as you’re sharing your observations about voice, spoken or sung, with others steeped in the same consumerist media culture as yourself.

Vocal costume

‘[C]lothing for a particular activity’ or ‘an actor’s clothes for a part’ are, according to The Oxford Concise English Dictionary (1995), two common meanings of the word costume. With expressions like national costume, notions of group identity are added to the concept. In simple terms of perception, someone wearing a swimming costume is probably dressed for swimming (although it may be just a photo shoot), someone wearing the garb of a sixteenth-century Italian nobleman might be acting in Shakespeare’s Romeo and Juliet (or just going to a fancy dress party), and a man in a tartan kilt and tweed jacket might have intimate ties with the Scottish Highlands (or be a tartanry fake). Costume is etymologically related to custom (‘a particular established way of behaving’) and semantically to the noun uniform, meaning ‘distinctive clothing worn by members of the same body’, i.e. another type of costume signalling group identity.

Vocal costume is a metaphorical expression meaning those aspects of phonation serving the same three sorts of function as literal costumes do: [1] to more easily carry out a particular activity; [2] to assume a role or to act a part; [3] to signal a particular group identity and/or to conform to a given set of cultural norms. Vocal costumes are something people put on like clothes for any or all of the reasons just mentioned: they are used on an everyday basis in both speaking and singing, as, I hope, the next section will illustrate.

Spoken costumes

Phone voices provide a rich resource for studying vocal costumes, most probably because talking on the phone involves a particular type of sensory dislocation. It’s one-to-one audio close-up (if the line is good) but without the visual, kinetic and potentially tactile aspects of one-to-one close encounters. A phone call takes place in the intimate acoustic space determined by the minimal distances between earpiece and eardrum, between lips and mouthpiece. Like it or not, we are at sonic kissing distance from our telephonic interlocutor down the road or on another continent. Such sensory dislocation may be less problematic when phoning ‘friends and family’ but it requires corrective measures if we’re on the phone to someone we don’t know, maybe talking to a representative for a large corporation or public institution. In these types of telephone encounter vocal costumes can come in handy.

When phones were a novelty in UK homes after World War II, many people of my parents’ generation put on a special vocal costume when answering the phone. It was a more posh, more official-sounding voice whose diction, vowel sounds and intonation resembled those of BBC radio announcers or newsreaders of the day. These closely miked but widely broadcast official voices, by occupying the public space of the then contemporary media, seem to have been taken to represent a sort of common ground for close-up speaking with which everyone was familiar. Of course, since this vocal costume was also that of the old British establishment, it was not the most comfortable clothing to wear and was usually dropped when the person at the other end of the line was identified as more ‘friends and family’ than ‘authority’. Moreover, the old-establishment BBC voice later became an anomaly in the wake of socio-economic change leading to the use of other vocal costumes. Technological development played a central role in this process.

As the number of radio channels increased, and as TV and hi-fi recordings became part of both individual and domestic acoustic space, the repertoire of closely miked but widely disseminated voice types available for use as vocal costumes expanded radically. Consumerist propaganda was not slow to start using particular voice types corresponding to the intersubjectively verifiable and exploitable desires of a particular demographic. Those voice types are often used today in automatic phone ‘dialogue’ and ‘voice recognition’ systems. Or, as one EU-funded eCommerce document puts it:

‘Advertisers adopt different strategies depending on the product they are selling and the intended audience. The same is true for creating automated telephone service dialogues.… Two of the [phone answering] personalities [‘John’ and ‘Kate’] were created with the intention that they would portray younger, more streetwise [bank] agents and therefore would appeal to younger users.’

This sort of vocal costume marketing has led to telecommunications catastrophes like ‘Simone’ (Virgin Mobile USA), ‘Claire’ (Sprint), ‘Julie’ (Amtrak) and ‘Emily’ (Bell Canada). While each pre-programmed vocal persona initially sounds like an attractive, engaging, educated, helpful young woman, she turns out, in the reality of dialogue, to have the brains of a pea and the socio-linguistic skills of a drainpipe. However, so blind is the faith of corporations in the hocus-pocus of vocal pseudo-personalisation that huge amounts of consumer time and corporate money are wasted by replacing human beings with machines. That said, although ‘John’, ‘Kate’, ‘Simone’, ‘Claire’, ‘Julie’ and ‘Emily’ are mere vocal drapes covering dummies in a sonic shop window, vocal costumes can serve some purpose, even inside the field of telephony, as long as no false claims are made about ‘interactive dialogue systems’. For example, calling Milan’s Radio Taxi 8585 in 2008 triggered a hold message advising you not to lose your place in the phone queue. The recorded voice sounded like that of a coquettish female secretary with a hidden laugh of flirtatious complicity in her tone; or, as a Milanese friend put it:

‘It’s as if she’s saying to male customers “who knows what you and I could get up to while you wait?”… It’s not the voice of a mother —that would sound too old— or of a wife because that would be no fun. It’s closer to the voice of an attractive and well-spoken lover… They assume of course… that most customers are men in need of flattery.’

Outside the weird world of brand-fixated, market-driven automated telephony, vocal costumes are simply a very real part of everyday life. If you have to address a crowd of people and there’s no microphone, or if you have to keep order in a primary school class, or if you have to make your bid heard in a capitalist casino (stock exchange), you’ll have to put on a vocal costume to do your job and to avoid causing long-term damage to your larynx. Hopefully, you’ll change into a softer, happier, more sing-song costume (‘motherese’) when you talk to your baby child, into something less lilting when you have to answer important job interview questions, into something more contrite yet competent when you have to explain why you are late delivering work to your boss, and so on. Or perhaps you’re a psychoanalyst dealing with a highly strung patient, in which case you may well be tempted to put on your psychologist’s vocal valium costume. If you do, your patient will hopefully be less likely to throw a fit and, even if he/she does start kicking and screaming, you can at least pretend to keep your calm.

Attentive readers will already have noted that public speaking voice, primary school teacher voice, a lilting parent voice (motherese), the psychologist voice (‘vocal valium’) and the earnest interviewee voice are all aesthesic vocal descriptors, more precisely persona descriptors designating professions, roles or archetypes. Those labels act as shorthand not just for a type of person (teacher, trader, psychologist, parent, etc.) but also for the type of voice associated with that type of person in particular circumstances. One final example of spoken vocal costume should clarify the issue once and for all.

Before I first went searching for vocal persona-related concepts in 2005, I’d never heard of the girlfriend voice. The online Urban Dictionary defines it as ‘[t]he change in pitch or tone of a man's voice when talking to their significant other’. The dictionary continues:

‘The girlfriend voice is characterised by a higher pitch and a more effeminate tone with speech patterns scattered with pet names and childish words. This type of speech is usually frowned upon when used in the presence of other men’.… ‘When he answers his phone and it's a guy, he uses his normal voice, but when he sees that it's his girlfriend calling, his voice instantly climbs several octaves and acquires a whiny, please-don't-be-mad-at-me tone. He's also the kind of guy who, when he gets on the phone with his girl, immediately walks away from the group, leaves the room, or tells everybody to shut up so he can talk.’

Even if ‘several octaves’ is a gross exaggeration, this explanation of the girlfriend voice provides a clear example of all three functions of vocal costume. It involves traits of phonation that firstly enable the man adopting it to more easily carry out a particular activity, in this case that of talking to his ‘significant other’ in the way he imagines will please her. Secondly, the same man vocally assumes the role and acts the part of boyfriend rather than that of ‘one of the guys’. Thirdly, he signals that he belongs to the social sphere of the couple by vocally conforming to the cultural norms of conversation considered appropriate for that sphere of interaction, even to the extent of walking away from his male peers and telling them to shut up.

Sung costumes

Although pitch, loudness, timbre and tempo are parameters of expression common to both speech and music, and although prosody is a key element in music’s cross-domain mode of representation (p. 62 ff.), there is apparently no language unable to distinguish in some clear way between what we call speaking and singing. If that is so, what’s the actual difference between the two?

Singing as costume

Differences between speaking and singing can be understood in two general ways: [1] in terms of use, function, context and connotation; [2] in sonic terms. We’ll start with the first of those.

If someone changes vocal mode from talking to singing you can say they ‘burst into song’ but no-one ever says that they ‘burst into speech’ from song because speech is in most situations the default vocal mode. The idea of song as an exceptional, special or heightened form of vocal expression can be understood in four ways.

1. Being airborne. This is the popular notion of song as vocal expression at literally a higher level, either as air (air is a synonym for tune and aria, the Italian for a tune, literally means air), or as something carrying us up into the air, so that we are borne on the ‘wings of song’, ‘flying’ (volare), singing (cantare), ‘in the blue’ (nel blu), ‘happy to be up there’ (felice di stare lassù), etc.

2. Special occasions. People in the urban West tend to sing more on special occasions than in their day-to-day lives. We don’t usually burst into song while filling out tax returns or having lunch with workmates; but we might well sing at birthdays, weddings, funerals, the New Year, or on a night out in a karaoke club. We are also more likely to sing in patriotic or religious contexts where some aspect of ritualised transcendence is the order of the day.

3. Heightened emotion. Circumstances of heightened emotion such as lulling your little child to sleep, falling in or out of love, righteous indignation, erotic arousal, deep sympathy or sorrow, painful separation, great elation, bitter resentment, angry alienation, wondrous amazement, suicidal depression, blissful contentment, etc. are more liable to bring on a song than what you feel when reading an instruction manual or attending a committee meeting. Put tersely, it can be ‘worth making a song and dance’ about some experiences but not about others.

4. Religious chanting. Before the advent of PA systems, speaking was for centuries replaced by chanting in reverb-rich venues like cathedrals and large mosques. The Word of God merely spoken by an officiant under such acoustic conditions could easily end up as an incomprehensible sonic blur in the ears of the congregation. The fixed pitches and measured delivery of chanting helped overcome this prosaic problem. This historical observation reinforces the notion of song as ‘transcendent’, more ‘otherworldly’ than speech.

Although those four observations clearly suggest that song is a special or heightened mode of vocalisation, it could also be argued that singing is more down-to-earth, more somatic, or at least more directly emotional, than talking, the dominant or default mode of vocal interaction among grown-ups. However, just as falling in love can be regarded as regression to emotions of infancy and at the same time an important step forwards in the personal development of adults, singing provides an instantaneous direct connection between, on the one hand, preverbal and/or nonverbal (infant and/or animal) vocalisation and, on the other, verbal vocalisation, all in the socially constructed cultural environment of a musical genre.

Turning to sonic differences between speech and song, it’s possible to make the following five general observations about typical traits.

1. Singing is more tonal than talking: sung pitches are longer and, if free from wide vibrato, more stable than spoken pitches.


2. When words are sung, vowels (and, sometimes, voiced continuants) tend to become longer while durations of non-continuant, unvoiced consonants remain much closer to those of speech.

3. Sung statements (phrases) tend to be longer and more fluid than those of speech.

4. Disjointed, staccato delivery containing short breaks is less common in song than in speech, while breaks between phrases or periods are generally longer in song than in speech.

5. Singing uses more regular and recurrent patterns of accentuation, metre and periodicity than does speech.

There are of course hybrid vocal modes mixing traits from both speech and song. I’m thinking here of four such modes: metric chanting, recitative, intoned chanting and Sprechgesang.

1. In metric chanting the tonal traits of speech replace those of song while the rhythmic and metric traits of song remain intact, as in rap, in the scanned slogans of street demonstrations, and in some types of poetry reading.

2. In recitative (recitativo = sung solo dialogue in opera or oratorio) the tonal traits of song (fixed pitches) are retained and a full melodic tonal range is in operation but speech rhythm replaces that of song and there is no clear musical metre (parlando; senza misura).

3. In intoned chanting, as in recitative, speech rhythms dominate and the tonal traits of song are in clear evidence, but melodic range is very restricted (sometimes to just one note) and/or highly formulaic (e.g. consisting of a start motif, a recitation tone and a final motif). Non-metric psalm and canticle singing, synagogue cantillation, as well as Qur'anic recitation and calls to prayer are all examples of intoned chanting. Incantation usually takes the form of intoned chanting.

4. In Sprechgesang, a technique used only by individual voices, pitch range can be extensive, the overall pitch profile of a phrase well defined and the rhythmic patterning more similar to that of song than speech, but the individual pitches of each syllable are unfixed and much closer to those typical of speech.

To end this section it’s worth considering the use of sung tones on certain words in everyday speech. One of the most common examples in standard UK English must surely be the sudden application of sing-song motherese intonation, featuring a descending third delivered in a highish register, on to a particular disyllabic in utterances like: ‘Baby go bye-byes!’, ‘Oh-oh!’, ‘That’s naugh-ty’, ‘You’ll be sor-ry!’, ‘[I] love you!’, ‘Bo-ring!’, ‘[Good] bye-eee!’ (sing-song disyllabics in italics). This use of over-intoned ‘kiddie-speak’ can have effects ranging from humorous and childish to rude and patronising. How such effects are created and why they are used would be the subject of another entire book. The point here is that there is a momentary but marked change from normal speech into song, into a demonstrably different vocalisation mode to create a particular effect.

Talking is definitely more common than singing. That’s why, when we burst into song, we’re adopting a special human mode of vocalisation in a way that to some extent resembles changing clothes for a special occasion. It’s in that sense possible to think of singing itself as a vocal costume. Now, there’s more to it than that because there’s a clear difference between the general ‘singing costume’ that we’ve all worn at some time and that of a singer performing for an audience. However, since music semiotics rather than psycho-social role analysis is at the core of this book, I’ll leave issues of vocal stardom to colleagues in media studies and focus here on vocal costume and persona in terms of links between music as sound and its perceived meanings.

Suiting up for opera

Many vocal costumes used in singing relate to the first definition of costume (p. 360) in the sense of what you wear to carry out a particular task (the ‘swimming costume’ function). Classical opera singing, for example, demands techniques of breathing, diction and phonation allowing the unmiked voice to be projected across the orchestra pit and stalls to reach listeners high up and far away in the opera house balcony. It can take years of training to master these somatic amplification and projection techniques. Inside that tradition there are costume variants like the dramatic soprano, the heroic tenor; and inside, or across, those categories there are idiosyncratic differences of vocal timbre and style letting you distinguish between, say, dramatic tenors like Pavarotti, Domingo and Carreras. If you enjoy and listen to a lot of opera you’ll hear those differences instantaneously; if not, you may well hear no more than generic ‘male opera singers’.

Although I ought to know better, I’ve always had a problem with classical opera’s dislocation of vocal sound from narrative reality and psychological verisimilitude. I’m thinking here of the following two types of intimate scene. [1] Two characters alone on stage are in an embrace and perform a duet declaring their undying love for each other. This patently private declaration is even more patently public because the soloists belt out the duet for the benefit of listeners fifty metres away in the balcony, not for the narratively realistic ‘nearest and dearest’ on-stage partner who, in that role and situation, would surely take offense if his/her beloved were to bellow in his/her ear. [2] A heroine in a small room breathes her last few faint breaths but nevertheless manages to muster maximum lung power to perform a final aria for a large crowd in a large auditorium. Such operatic anomalies, however silly they may seem, are simply dramatic conventions that cause opera lovers no problem. After all, it could be argued, the sheer power and drama of operatic vocal costume can be heard as congruent with the power of emotions felt in such dramatic circumstances as falling in love or dying: both are in that sense ‘worth shouting about’. The anomalies are in fact no more absurd than those of hearing extreme vocal close-ups carrying intimate lyrics that are sung, recorded and broadcast or sold to millions of people all over the world.

So why do I, and many others besides me, accept, without batting an eyelid, Peter Gabriel’s dubbing of a whisper on to a full-throated vocal line —the voice simultaneously inside the head and out loud (p. 311)— but not opera’s way of dealing vocally with the dynamic between internal-private-subjective and external-public-objective aspects of expression? I think my problem with opera treatment of that duality stems from being born a generation after the invention of coil microphones and the amplification techniques that brought singing voices up close to the ears of individual listeners. Having reached adulthood in the era of multitrack recording, I’m simply used to hearing a vocalist breathe, whisper, croon and so on, not just declaim, exclaim or proclaim, whenever the recording artist, engineer or producer deems whichever sort of vocalisation to be appropriate. I expect intimacy to sound intimate.

The wealth of vocal detail audible, and manipulable, through multi-track recording is a prerequisite for the infinite variety of vocal personas which have become key elements in the aesthetics of popular music. This is a topic to which we’ll shortly return (p. 376). Here, though, it serves as an example of how differences in the perception of vocal costume, and, by extension, in the functions and meaning of that costume, can arise. Put simply, lovers of classical opera hear operatic voices as standard vocal clothing suited to a particular activity (singing opera) and differentiate easily between individuals, both performers and the roles they perform while wearing that vocal clothing, in the same sort of way that a laboratory assistant recognises the different roles and identities of other white-coated individuals working in the same lab. I mean: most of us will just see ‘white coats in a lab’ and think of, say, microbiology or genetics, unaware that one co-worker, Sharon, gram-stains bacteria and likes hill walking, while another, Gary, model-builds phenotypes and plays cricket. The semiotics of vocal costume are in other words dependent on degrees of familiarity with the real or potential variations of function and meaning inside the sphere of activity linked with the costume in question. Less familiarity and greater distance tend to shift the type of perceived vocal costume from suited to a particular activity (more familiar) towards signalling group identity (less familiar).

Group and genre identity costumes

The group identity function of vocal costume perception is perhaps clearest when vocal styles are heard by unfamiliar ears. In the urban West we often apply ethnic labels to singing styles (‘Arabic’, ‘Bulgarian’, ‘Indian’, ‘Mongolian’, ‘Native American’, etc. as ethnic vocal costumes, so to speak) because we seem to hear the unfamiliar singing voices primarily in terms of ‘other people elsewhere’. That perception of otherness filtered through our own familiar frames of vocal reference tends to make us deaf to variants of style or genre that members of those foreign music cultures hear as distinctive and significant. Indeed, as we saw in the cross-cultural ‘death music’ experiment (pp. 49-50), we’re liable to identify particular functions and meanings in a foreign music culture not with those functions and meanings (funeral and death in that case) but with the foreignness we perceive in the music: Africa, Arab, China, Greece, India, Turkey, Yemen, bazaar, desert, jungle, etc.

We also tend to project the semiotic norms of familiar vocal styles on to unfamiliar ones. Hearing Bulgarian women singing traditional songs in semitone dyads as harsh and discordant rather than as standard procedure or good-natured fun (pp. 180-182) is one example. Another is when we talk about the Bollywood girlie voice, even though Indian film’s most famous female singers were in their seventies when they were still, quite recently, recording vocals for roles lip-synced by actresses in their twenties. It’s also worth noting that Lata Mangeshkar and Asha Bhosle, pre-eminent vocal doyennes of Bollywood, were trained in the Indian classical music tradition. In that tradition a strong, straight high-soprano voice is preferred because it traces a cleaner and clearer melodic profile against the overtone-rich instrumental drones than would a deeper, more mellow vocal tone and timbre subjected to Western-style vibrato. If that is so, the girlie voice notion makes little sense because we’re not dealing with a particular female vocal persona (girlie), but with a vocal costume suited to a particular activity, that of presenting the female vocal line in tune with the drone-filled accompaniment so that the melody is clearly audible.

None of this means that we’re ‘wrong’ to hear Bulgarian semitone diaphony as discordant or Bollywood female vocals as girlish any more than I am to hear operatic voices as tonally blurred, wobbly, loud and generally ‘over the top’. It’s just that codal incompetence or interference is in action preventing us from hearing the unfamiliar sort of voice in an unfamiliar setting as we would if it were a familiar sort of voice in a familiar setting. Now, if you find such cultural relativity (or respect) uncomfortable, you might like to consider the work of Alan Lomax and his Cantometrics collaborators who, in Folk Song Style and Culture (1968), documented correlations between vocal style preferences and modes of food production in different types of pre-industrial society in different parts of the world. Their findings describe how, for example, the hunting communities studied in the project tended to show a general preference for a raspy solo male sound, while the horticultural societies seemed more likely to favour mellow mixed-voice chorality. To conclude, unlike Lomax and his collaborators, that these observations demonstrate the existence of a universally viable vocal persona for ‘the hunter’ and another for ‘the gardener’ would be out of order but some of the project’s findings could provide some ideas about crossovers between vocal costume and vocal persona.

Genre-specific vocal costumes

Male singer-songwriters

Fabrizio de André, Wolf Biermann, Jacques Brel, Johnny Cash, Leonard Cohen, Bob Dylan, Serge Gainsbourg, Socrates Málamas, Caetano Veloso, Tom Waits and Atahualpa Yupanqui, to name but a few, are all male singer-songwriters, each with a very distinctive voice. So, what vocal costume, if any, do they all wear that could possibly identify each one as belonging to the same overall genre?

‘In the canzone d'autore [= singer-songwriting], things that might be considered as mistakes of intonation, delivery and bad pronunciation in other genres are accepted as characteristics of individual personality, which is of primary importance in this genre.’ (Fabbri, 1982: 67)

Difference and non-conformity can in other words be understood as the singer-songwriter’s vocal costume. It’s a sort of ‘anti-uniform uniform’ at the opposite end of the spectrum from the relative uniformity of operatic vocal costumes, as well as from that of all those young hopefuls given the Melodyne auto-tuning treatment on TV talent shows like The X-Factor (2011). Being occasionally ‘out of tune, or too shy, or too “shouty”’, writes Fabbri (2005: 145), are vocal traits contributing to the singer-songwriter’s credibility as a ‘real person’, an ‘authentic voice’, a ‘true character’, complete with all the imperfections that inevitably come with every one of us and with our voices. It doesn’t seem to matter if the male singer-songwriter’s voice covers only a limited bass range (e.g. Cash, Cohen, Gainsbourg, Waits), or if he stays in mid register (e.g. Biermann, Dylan, Yupanqui), or if he covers a much wider range (e.g. De André, Brel, Málamas, Veloso). Nor does it matter if he sounds like a ranting preacher (Dylan), or a rueful ruminator (Cohen), or a gruff drunkard on sixty cigarettes a day (Waits), or like a degenerate rogue with little more than a dirty old man growl left by way of a voice (late Gainsbourg), or like a wise and simple but enigmatic bard (Yupanqui), or like a full-blooded but vulnerable thinker with a mellow voice that can break out into passionate exclamation (De André, Brel, Málamas). 
Almost any voice will work, just as long as the following stylistic conditions are met: [1] the voice is no-one else’s and does not appear to conform to norms established through formal training or audio technology; [2] the words are intelligent or enigmatic, thoughtful or provocative, poetic or witty and usually audible: the artist’s voice is up front and centre stage; [3] the song, recorded or performed live, should not bear obvious traces of intricate arrangement, orchestration or audio signal processing even if it may in fact have been subjected to such types of treatment. And the singer-songwriter's ‘no-frills’ performance, live or recorded, will be even more effective if reinforced by sartorial, behavioural, linguistic and other rules of the genre, especially if the lines between performing and non-performing persona are blurred. With all these attributes the singer-songwriter is easy to identify, not just as an ‘honest artist’ but also as the song lyric’s authoritative and authorial first person (Fabbri, 2005: 145).

Other genre-specific vocal costumes

It goes without saying that other vocal genre costumes exhibit different traits to those of the singer-songwriter. Nevertheless, whether it be a cantautore, a chansonnier, a fadista, a payador, or an opera diva, or a female Bollywood singing star; or, in the anglophone world of popular song, a singer-songwriter, a death metal growler, a female gospel artist, a dramatic ballad star, a blues shouter, a crooner, a rapper, a mainstream jazz vocalist, a riot grrl or a folk revival songster, one thing is certain: every one of those different types of vocalist will be wearing some sort of vocal costume identifying him/her with the style and genre in question. As explained earlier, some vocal costumes may exist, at least partly, out of acoustic necessity (operatic voices, the Bollywood ‘girlie’ voice, intoned chanting etc.), but every one of the vocal costumes just mentioned will be signalling some kind of genre group identity. ‘But’, as the advertisers say, ‘that’s not all’.

If you’re familiar with the musical genre and style in question you’ll not only recognise the vocal style as a genre costume: you’ll also be able to distinguish the voices of individual singers and to recognise differences of vocal persona performed by those singers in those genres. Vocal genre costumes tend to be better suited than others to the presentation of certain types of vocal persona. For example, a death metal growler (e.g. Carcass, 1990) is incompatible with the smooth Mr Nice-Guy sort of persona a convincing crooner can create (e.g. Bowlly, 1933); and a crooner, in his turn, would not be much use as a hoodie gangsta-rapping about ‘slappin' up de hoes 'n' bitches’ (e.g. Eazy E, 1987), who in his turn would be useless as a ‘sincere’ lovestruck torch ballad persona (e.g. Houston, 1992), who would make a lousy riot grrl (e.g. Bikini Kill, 1996), and so on.

Grasping vocal persona


Despite the vernacular terms students use, often with considerable insight of lateral (transmodal) thinking, to describe the character of vocal sounds, I’ve also often registered blank faces in response to questions like ‘What does the voice actually express here?’ or ‘What sort of person is singing to us?’ I never interpret those blank faces as a sign of incompetence because I’ve learnt that all hearing individuals intuitively know, within the same broad music culture, what a voice is communicating and what sort of person is behind it. The blank faces seem rather to express a reticence that probably stems from the discomfort of being asked to verbalise personal impressions of emotions in front of a cohort of fellow students: no-one wants to risk making a fool of themselves by revealing too much of their emotional sensitivity in the company of peers. That peer pressure problem is compounded by the fact that talking about voice in terms like nervous teenager, Barbie Doll or suicidal student isn’t regarded as commensurate with the serious or grown-up sort of impression imagined appropriate in the supposedly serious grown-up context of a university analysis seminar. The reticence is in other words a symptom of the dual consciousness in which ‘our sense of identity and agency in private is dissociated from whatever sense we may have of ourselves in the public sphere’ (p. 2). For while we seem to accept that a successful artist can use voice to express all sorts of intimate, emotional and personal things (private) to millions of listeners all over the world (public), some individuals still find the verbal description of feelings and impressions evoked in them by the same artist’s voice too personal, too private to talk about ‘live’, even in front of just a small group of people, and even though those subjective impressions are almost certainly shared by thousands of other human subjects. 
This contradictory vicious circle of dual consciousness has to be broken in semiotic music analysis. Discussion of vocal meaning tackles the problem head on, as we’ll soon see, in a clear and tangible way. So, how can talking about vocal persona help break that vicious circle of dual consciousness? There are, I think, two main ways of approaching the problem, one theoretical, the other practical.

From the theoretical angle it’s firstly reasonable to assume that familiarity with issues of dual consciousness (see Preface) and intersubjectivity (Chapter 7) will make the discussion of voice less embarrassing. That’s because understanding the socially objective character of subjectivity (intersubjectivity) gives greater confidence in considering personal emotions and impressions in relation to those of others. Secondly, knowledge about psycho-somatic links between voice, mind and body can help liberate notions of subjectivity from their conceptual isolation and bring them out into contact with the external, objective, material world. Here are five broad categories of such links: [1] the vocal behaviour of trauma sufferers; [2] the vocal characteristics of depression and of Parkinson’s disease; [3] connections between voice disorders and other physical or psycho-somatic conditions; [4] gender variation and attractiveness in voice quality; [5] personality inference from voice quality.

Those are all areas in which it’s absurd to act as if personal, subjective experiences had no empirically demonstrable connection with external, objective, physical realities.

Turning to the practical side of analysing vocal persona, I’ve found the following ten simple steps useful in teaching situations.

1. Isolate a short passage in the AO where the vocal characteristics to be studied are particularly clear.

2. Play back that passage as a loop.

3. Listening eyes closed to the repeated loop, use your own voice to impersonate (i.e. to imitate and to appropriate) the vocal sound[s] whose meaning you want to focus on. You don’t need to actually sing, just to make the general sound of the voice whose meaning you want to describe. Do not sing the lyrics at this stage! The object of this exercise is to understand the connotative meaning of a vocal sound, not the lexical meaning of words carried by that sound.

4. When you’re reasonably satisfied that the sounds you’re making sufficiently resemble the vocal sound in the loop, stop playback but carry on doing your vocal impersonation with your hands cupped round your ears as you continue to growl, moan, chirp, bellow, warble or vocalise in any other appropriate and convincing manner.

5. Still impersonating the appropriate vocal sound run a quick poïetic check. Are you using falsetto, head register or chest register? Is the sound you’re producing at all nasal or guttural? Is your voice pitched high, low or in between? Are you using a narrow or wide pitch range? Does the pitch of your impersonation change often, suddenly, gradually, or not at all? Does your vocal impersonation sound loud or soft? Is your breathing short and fast or deep and slow, or in between? If you add words, how is your diction? Muffled and mumbling or crisp and clear? How much of your impersonation is like song and how much like speech?

6. Freeze face and body at some point while impersonating the recorded voice. Is your head held high, hung down, tossed back, leaning to one side? Are your eyes wide open, shut or squinting? Are they cast down, rolled upwards, looking straight in front or to one side? Is your mouth open or shut? Are your lips pursed? What shape is your mouth? Are your teeth clenched? Are your teeth visible? Are your face muscles taut and wrinkled or relaxed? Are you frowning? Is your chin pointing forwards or has your jaw dropped? Is there tension in your shoulders or are they relaxed, or drooping? Are your arms outstretched, folded, by your side, or held in front of you? Are your fists clenched? Are your hands cupped? Are your fingers stretched and splayed or are they relaxed and together? Are the palms of your hands open and visible or closed and hidden? Do your posture and facial expression fit better with standing, sitting, kneeling, lying, walking, running, etc.? In short does anything in your facial and bodily expression correspond to any particular emotion, state of mind or attitude?

7. What words best fit the vocal sound you’re imitating? Is it any of these? I love you. I hate you. Life is pointless. This is fun. I’m bored. Don’t mess with me! Don’t you think I’m sexy? I’m a creep. I’m coming to get you. Come closer! Go away! You’re gorgeous. You’re stupid. This makes me laugh. I despise you. I’m sick of it. I’m worried. I’m terrified. I won’t give in. I don’t care. This is fantastic. What words sound ridiculous or are impossible to say with the facial expression and body posture you’ve adopted to produce your impersonation? If there are lyrics, how does their meaning fit with the words you think best correspond to the vocal sound?

8. What sort of person (age, gender, nationality, occupation, etc.) might typically be talking in that way? Is it a lover, sister, brother, teacher, preacher, best friend, enemy, trickster, philosopher, or any of those listed in section 3 of the table on page 356? Or is it someone or something completely different? Perhaps it’s an animal or a machine? Who might the vocal persona you’re imitating be addressing? Him/her/itself or someone else? Just one other person, or several, or many? What sort of relationship could there be between the vocal persona and whoever they’re addressing?

9. Where is the voice you’re impersonating most likely to be heard? Indoors, outdoors or inside your head, or all three? In a bedroom or a church? In a bar, car or club, or at school? In the street or countryside? At the far end of a long corridor or breathing in your ear?

10. What words best describe the vocal sound you’re impersonating? Is it any of the concepts shown in Table 10-1 (pp. 356-357)?


The main value of this ten-step exercise is that it tangibly relates non-verbal vocal sound with other types of expression inside the listening subject. Vocal impersonation concretises the attitude and emotional state of the voice under analysis. The exercise provides direct access to the identification and meanings of vocal persona and makes it easier to overcome the negative effects of dual consciousness.

And finally: parody

If, despite the tips just presented, the task of denoting vocal persona still seems difficult or embarrassing, why not try some humour? Just look on line for parodies of the sort of voice you’re struggling to describe. Parody involves the humorous exaggeration of stylistic traits which, like caricatures, become larger than life and which make salient features of the style and genre extremely clear. Here are a few examples of vocal persona parody that I found useful while putting this chapter together: [1] Reggie Watts’s rap spoof Fuck Shit Stack, his ‘Irish folk ballad’ Fields Of Donegal (both 2010a), and, sharpest of all, Big-Ass Purse (2010b); [2] vocal-instrumental gags by Bill Bailey, for example his Bryan Adams lampoon Hats Off To Zebras, or his Billy Bragg parody Chip Shop, or Dr Qui, the ‘Jacques Brel/Belgian jazz’ version of the Dr Who theme (all 2000); [3] Jon Lajoie’s boy band parody Pop Song (2009), complete with obligatory rapper for ‘a slice of the urban market’ and a verse for the ‘gay voice to let you know I’m sensitive’.

Then there are the acrobatic, ecstatic, post-gospel princess caricatures in Nile Rodgers’ ‘Soul Glo’ spoof ad in Coming to America (1988) and in Stevie Van Lange’s orgasmic ‘Whoaa!’ for the 1993 Bodyform TV ad (Tagg, 2008c). Add to that the looped coloratura phrase from The Queen of the Night’s aria in The Magic Flute (Mozart, 1791) set to visuals of ‘perfectly groomed young women in the back-arching, pupils-dilating throes of carnal abandon’ (Service, 2008) for the Durex Play-O TV advert (2008), and you have a fascinating but gender-politically disturbing can of semiotic worms that should be, if it isn’t already, the subject of a complete book discussing ‘auditeurism’, the audio equivalent of voyeurism (see Corbett and Kapsalis, 1996). I can’t deal with any of that here but the issue certainly suggests that the power of vocal persona should never be underestimated.

Of course, musical parody isn’t just limited to the humorous exaggeration of vocal traits. Vocal, instrumental and compositional style are parodied in different ways by different entertainers who draw larger-than-life musical cartoons of sounds you may need to describe in your analysis. Therefore, to end this chapter on a usefully frivolous note, I take the liberty of listing a few artists from the anglophone world whose musical parodies might be useful if you need to pinpoint style-specific musical traits. In addition to the instrumental as well as vocal mannerisms parodied by Reggie Watts, Bill Bailey and Jon Lajoie (p. 380) those few examples would be Dudley Moore (1961), Peter Schickele (1967, 1971), Stan Freberg (1957) and Frank Zappa (1965, 1967, 1981). I would also recommend mockumentaries like The Rutles (1978) and Spinal Tap (1984), as well as sketches from Mad TV, not to mention style-specific novelty songs such as Disco Duck (Dees, 1976). Finally, different stylistic versions of the same tune automatically draw attention to parameters of musical expression like instrumentation, vocal persona and aural staging that can be missed when the melodic line and its lyrics are the main focus of interest. One striking set of multiple examples of the same tune was broadcast in the Australian TV series The Money or the Gun (1989-1990). It featured a radically different version of Stairway To Heaven (Led Zeppelin, 1971) every week over a six-month period (see Stairways to Heaven 1992). Who said music analysis was a drag?






11. Diataxis

Three types of ‘form’

Syntax, diataxis and syncrisis are three different aspects of form in music. Form means the shape or pattern into which different parts or elements are arranged, ordered, or otherwise combined into a whole. For instance, the three words in the two sentences Tim hit Tom and Tom hit Tim have, in accordance with the norms of English syntax, the same form (subject - verb - object) but different meanings. Syntax also exists in music in that melodic phrases consist of at least two motifs (usually elided) which, when ordered differently, produce the same form but different effects. Now, conventional Western music theory rarely considers such syntax as form in the production of meaning in music. Instead it uses ‘form’ almost exclusively to designate the way in which episodes rather than phrases are ordered along the unidimensional and unidirectional axis of passing time to create extensional patterns of musical change and recurrence like ‘sonata form’ or ‘rondo form’. This long-term, linear, ‘horizontal’ or ‘diachronic’ sort of form needs to be distinguished from the ‘short-term horizontal’ type of syntax. Diataxis [daɪəˈtæksɪs] (which originally meant the order of service in Byzantine Orthodox liturgy) was the least ambiguous word I could find to mark that distinction. But that is neither the only nor most important reason for having to use the term.

Form in conventional painting, sculpture and photography has no diachronic aspect because its constituent elements do not unfold over time, as in music, dance or film. Form in the visual arts is usually called ‘composition’ and refers, at least in its perception, to the synchronic arrangement of a work’s constituent elements which are, so to speak, fixed on the canvas, or in the sculpture or photo. Neither those elements nor the form in which they are presented change once you start viewing the work, even if you interpret them differently the longer or more attentively you look, or if you view the work in a different context or under different circumstances. Among the parameters defining form (‘composition’) in the visual arts are size, proportion, perspective, positioning and orientation of constituent elements, the viewer’s point and angle of entry, colour, negative space, contrast, symmetry and lighting. Several of these parameters are relevant to music, not least the synchronic placement and relative importance of constituent elements. Indeed, as noted in the discussion of structure in music (p. 235):

‘[E]xplanations of musical semiosis need to consider several individually meaningful layers that sound simultaneously… These composite layers of simultaneously sounding musemes are called museme stacks and, as “now‐sound form” (or syncrisis), are particularly useful… in forming hypotheses about which structural elements in an AO may be linked to which sort of interpretants.’

We are in other words dealing with an aspect of form that is neither short-term syntax nor diataxis. Since such form is perceptible within the limits of the extended present (for example as a composite of aurally staged, simultaneously sounding motifs, riffs, chords, instruments, voices, timbres, pitches, rhythms, etc. in a particular metre at a particular speed and dB level) it can be considered synchronic. Moreover, since stacking (as in ‘museme stack’) implies height rather than length (‘museme string’), this synchronic type of musical form can also be thought of as more vertical than horizontal, more intensional than extensional. It takes the form of a state more than of a process or narrative even though it can contain elements of short-term syntax. In contradistinction to diataxis, form consisting of a composite of ‘now sounds’ will be called syncrisis [ˈsɪnkrɪsɪs]. To summarise, it’s essential to distinguish between three aspects of musical form.

[1] Syntax denotes aspects of form and signification bearing on the temporal relationship of constituent elements. It normally covers the short-term ordering of elements inside the extended present (synchronic).

[2] Diataxis is the long-term, diachronic, processual and episodic aspect of syntax covering the extensional ordering of events over durations exceeding that of the extended present. It can be thought of in terms of overall narrative form, and as horizontal rather than vertical. In conventional Western music theory diataxis is usually called ‘form’ as if no other type of musical form existed.

[3] Syncrisis denotes aspects of form and signification bearing on the synchronic, intensional, arrangement of structural elements inside the extended present. It can contain elements of short-term syntax and be thought of as vertical stacking rather than as a horizontal array.

This chapter deals with diataxis, Chapter 12 with syncrisis. They are the two main macro-parameters configuring the ways in which a piece of music’s component parts, themselves constructed using the sort of parameters discussed in Chapters 8-10, are combined to create a whole with a particular overall shape and form.

Having defined basic terms, let’s see how diataxis can create meaning in musical reality, using an Abba tune as test case. We’ll take it from the bottom up, starting with musemes, identifying episodes and discussing the meaning of its diataxis.



Diataxis in Fernando

Figure 11-1 (p. 387) charts the occurrence of the eleven main musemes heard in Abba’s Fernando (1975). The musemes and their PMFCs were identified using the procedures set out in Chapters 6 and 7. Museme numbers appear in the diagram’s left column and their names in its bottom three rows. For example, museme 2 (symbolised ‘<•’ and labelled ‘sunrise’) occurs only twice: just before the start of Verse 1 at 0:25 and just after Refrain 1 at 2:01.

Episodes are easy to spot in a piece like Fernando, firstly because of the recording’s obvious verse-refrain duality: words for each of the three verses are different while those of the refrains are the same each time. Secondly, the diagram shows (with the exception of museme 6, the hook motif ‘Fernando’) that musemes in the instrumental plus verse sections (I and V for short) and in the refrains (R) are mutually exclusive. That fits the dictionary definition of episode as ‘a passage containing distinct material as part of a larger sequence of events’: next to nothing of what you hear in I or V is present in R and vice versa. Fernando’s episodic running order is therefore, as shown in Figure 11-1: Instrumental 1 plus Verse 1, Verse 2, Refrain 1, Instrumental 2 plus Verse 3, Refrain 2, Refrain 3, Refrain 4 with fade; or, in abbreviated form with timings, I+V (0:00-0:55) V (0:55-1:24) R (1:24-2:01) I+V (2:01-2:49) R R R (2:49-4:12). Stripped to its bare essentials and with I+V and V represented by the letter A (because they come first in the song) and the refrains as B (because they come after A), the diataxis or narrative form of Fernando can be distilled to the simple formula AABABBB. So what?

There’s not much point in reducing a piece of music to a string of capital letters unless you either need to show you can theme spot in a conventional music theory exam question, or, more seriously, unless you consider diataxis as a parameter of musical expression. In fact, a string of letters like AABABB(B) or VVRVRR(R) can be meaningful in two ways.

Fig. 11-1. Fernando (Abba, 1975): Table of Musematic Occurrence

Firstly, Fernando’s episodic order can act as a style indicator (p. 523) signalling that the song was probably written after 1970 because until some time in the mid 1960s variants of the 32-bar AABA pattern of jazz standards (p. 397 ff.) were still in common use in anglophone pop. Secondly, the distinct character of episodes, of how much durational ‘space’ each of them occupies at which points in the piece, and of how much overlap there is between the two in terms of both simultaneity and shared material, etc. are factors of serious semiotic potential.

For example, the virtual exclusivity of musical material between verse and refrain (V-R or A-B) in Fernando contrasts starkly with another work shuttling between musical representations of Europe and indigenous South America. In Morricone’s score for The Mission (1986) the initially ‘alien’ panpipe punctuations (0:00:56 - 0:03:31) have by the end of the film been combined and harmonised with musical ideas associated with European humanism (e.g. just before 2:00:15). Very little syncretising of that sort occurs in Fernando whose European and Native American spheres largely remain musematically separate.

In Fernando the Latin American sphere is hinted at by a single reference in the lyrics to ‘The Rio Grande’ but made abundantly clear by museme 1, the charango and quena flute sounds from 0:00 to 0:55 and from 2:18 to 2:49. Museme 1 is an example of genre synecdoche by ethnic connotation of foreign, ‘world-musicky’ instrumental sounds. In this case it’s a reference to the huayno or Flûte indienne styles of the Andes and to the sort of music heard in the peñas of Chile’s Unidad Popular period, or performed by refugees in exile after the 1973 fascist coup. But there’s more to Fernando’s verse episodes than ‘just’ that. They also include a variant of one of the euroclassical tradition’s most famous tropes — the sunrise motif (museme 2) near the start of Also sprach Zarathustra (R Strauss, 1896)—, angel harps and tiptoe bass (museme 3), distant bolero drums (museme 4) and a particular type of vocal delivery from Annifrid Lyngstad (museme 5a).

As explained elsewhere (Tagg 2000b: 38-50), the angel harps and tiptoe bass (museme 3) have connotations of sincerity, innocence, grace, devotion, transcendence, heaven, etc., and the bolero drums (museme 4) of things hispanic and military. The vocal track of the verses (museme 5a with variants) adds even more connotative detail to the sonic picture of Fernando’s verses. The words are delivered less metrically than in the refrains and greater liberties are taken in terms of inflection of rhythm and pitch, giving the delivery an element of recitative (p. 367). This parlando aspect is reinforced by irregular periodicity: the verses’ five phrases cover 2, 3, 2, 3 and 3½ bars respectively, compared to the refrain’s four phrases each covering four bars. These asymmetrical aspects of vocal delivery signal sincerity and involvement in the meaning of words because they aren’t subjected to regular scanning patterns, nor to a consistent metre or length of period. They tell the listener that the words are important, an effect reinforced by the fact that words are delivered at a faster rate in the verses than in the refrain. Moreover, the verses’ melodic phrases contain no interval greater than a second (conjunct motion) while the refrain contains several. Finally, the verses’ melodic lines consist entirely of appoggiature, a common device of rhythmic-tonal articulation in euroclassical song but quite exceptional in English-language pop or rock music and, unsurprisingly, absent in Fernando’s refrains.

The overall connotations of musical material in Fernando’s instrumental and verse sections (the I+V or A episodes) can, in the ears of a generic Northern European urban listener in the mid 1970s, be crudely summarised as Andean, rural, far away, outdoors, devotional, transcendent, a tiny bit ‘classical’, honest, sincere, involved, ethnically exotic (not ‘here’) and ‘olde worlde’ (not ‘now’, not ‘today’, not modern pop). It’s a geo-ethnic and historical elsewhere, a picture-postcard there-and-then. This episodic sphere is reduced visually in Figure 11-2 (p. 391) to the stereotype of a North American or Northern European solo songstress (‘la gringa vocalista’) clad in a fashionable poncho and flanked by two indigenous quena players (‘los indios quenistas’), in less fashionable ponchos, all set against an Andean altiplano backdrop.

The refrains are entirely different. Not only is the singing ‘collective’ (a man has joined the female first-person vocalist in a singalong manner) and metrically/periodically regular, the general scene (accompaniment) now includes musemes 9 and 10, a stack of soft disco features including a synthesised string pad, regular guitar strumming, a Fats Domino good-time second-line riff, and full drumkit featuring hi-hat patterns like those used later in Staying Alive, all endowed with plenty of disco-dance reverb (musemes 9-10). Connotations for these musemes were glistening, romance, urban mating rituals, indoors, dance hall/disco, Saturday night, familiar, recreational, pleasant, fun. There’s nothing geo-ethnic or historical here for the mainstream 1970s home audience in urban North America or Northern Europe. It’s a non-threatening here-and-now entertainment situation that is visually caricatured in Figure 11-2 by silhouettes singing, playing and dancing against the starry, twinkling backdrop of a slowly rotating disco ball.

With each episode identified in terms of both musical structure and PMFCs, we can now ask what the string of letters AABABB(B) or VVRVRR(R) actually means. If we split the string into two at 2:01, the tune’s halfway point, the song can be heard and seen to consist of two narrative processes: [1] from more of V (far away, another time and another place) to less of R (happy at home in the here and now), and [2] from less of V (there and then) to a lot of R (here and now). As shown in the top half (i) of Figure 11-2, those two processes together form the single overall process from A to B (verse to refrain). The semiotic significance of this diataxis becomes clear when compared to the processual commutation shown in the lower half (ii) of Figure 11-2, where what was previously R or B (the refrain episodes) has become A (what were the instrumental and verse episodes) and vice versa. It’s obvious: proceeding from there and then to here and now (i) isn’t the same thing as going from here and now to there and then (ii).
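The two processes can be checked against the timings given earlier for the song's running order. The following is only a rough sketch: the episode boundaries are those quoted in this chapter, while the helper names (`to_seconds`, `sphere_seconds`) and the treatment of the three final refrains as a single 2:49-4:12 block are assumptions made for illustration.

```python
# Episode timings for Fernando as quoted in this chapter (m:ss).
# A = instrumental/verse sphere (there and then),
# B = refrain sphere (here and now).
# The last entry lumps the three final refrains together, as the text does.
EPISODES = [
    ("A", "0:00", "0:55"),  # I+V
    ("A", "0:55", "1:24"),  # V
    ("B", "1:24", "2:01"),  # R
    ("A", "2:01", "2:49"),  # I+V
    ("B", "2:49", "4:12"),  # R R R (with fade)
]


def to_seconds(stamp):
    """Convert an 'm:ss' timing to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def sphere_seconds(episodes, start, end):
    """Total seconds spent in each sphere between two timings."""
    lo, hi = to_seconds(start), to_seconds(end)
    totals = {"A": 0, "B": 0}
    for sphere, begin, finish in episodes:
        # Count only the part of the episode that lies inside [lo, hi].
        overlap = min(to_seconds(finish), hi) - max(to_seconds(begin), lo)
        if overlap > 0:
            totals[sphere] += overlap
    return totals


first_half = sphere_seconds(EPISODES, "0:00", "2:01")
second_half = sphere_seconds(EPISODES, "2:01", "4:12")
print(first_half)   # {'A': 84, 'B': 37}
print(second_half)  # {'A': 48, 'B': 83}
```

On these figures the first part of the song spends 84 seconds 'there and then' against 37 'here and now', and the second part 48 against 83, which is the shift from more V and less R to less V and more R described above.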

Fig. 11-2. Overall diataxis in Fernando, with commutation:

(i) there/then to here/now; (ii) here/now to there/then

The commutation just suggested in Fig. 11-2 (ii) assumes that the familiar home sphere is given some typically verse-like traits and that the unfamiliar foreign sphere is turned into something a little more metrically regular, refrain-like and conducive to singalong. The salient point in either instance is awareness of which musical sphere, in terms of both sonic material and connotation, is ‘home’ and which is ‘away’. Identification of these spheres has to be aesthesically based by referring to the general cultural habitat of an audience, all of which raises the issue of how diataxis is conceptualised. That’s not as complicated as it sounds, as we’ll see next, but it can be important, as in the case of Fernando, for determining the semiotic effects of diataxis.

Cyclical processuality

Fig. 11-3. Centripetal (recursive) process: (a) on a unidirectional time axis; (b) as centre and periphery; (c) from centre to periphery and back; (d) with centrifugal ending (Eisler, 1931).

In Western music studies it is, as already noted, customary to visualise narrative processes running along a time axis from left to right, as in figure 11-3a. The problem with this chronological visualisation is that the usual focal point of recursive diataxis, the A section, the episode establishing the home key as well as the central rhythmic and melodic ideas of the piece, is located at its extremities while the B section, the episode whose material diverges from that of the central core, appears in the middle. To overcome this paradox and to cater for the cyclical character of the episodic recurrences (repeats and reprises) that occur in so many types of music, it can be useful to conceptualise diataxis cyclically with the music’s central episode placed more experientially, as in figure 11-3b, at the musical centre of the piece rather than at the extremities of external time. This cyclical conceptualisation lets us track a simple AABA order of events in the cardioid shape shown as Figure 11-3c. A, the first episode, runs (clockwise) once round the inner circle from ‘start’ to ‘end’, returns to ‘start’ and runs round the same circle to the ‘end’ point a second time, whereafter it launches into the outer circle, the B section, returning to the start of A and to a third and final lap of the inner circle.

Figure 11-3d shows another basically centripetal model (R = refrain, V = verse) but with a centrifugal ending. It represents the diataxis of Eisler’s Solidaritätslied (1931) which starts with the refrain (R1, ‘Vorwärts und nicht vergessen’), launches out into verse 1 (V1), returns to the same refrain (R2), out into verse 2 (V2) and so on until, after verse 5 (V5), it reaches the sixth occurrence of the refrain (R6). R6 starts as did refrains 1-5 but ends quite differently (‘Wessen Welt ist die Welt?’), taking the music and its listeners to a new place outside the previously presented material. This centrifugal, non-recursive gesture contrasts starkly with the usual centripetal return of finality in V-R diataxis to the central core of the music’s material and meanings. With Solidaritätslied it’s as if we spin round cumulatively in patterns of recurrent verse and refrain to be finally hurled like a discus out of the established comfort zone:

V-R > V-R > V-R > V-R > V-R > V-R > ???

The overall centripetal process from verse to refrain episodes in Abba’s Fernando is represented as a skewed figure-of-eight shape in figure 11-4 (p. 394). The here and now at home sphere R (Refrain) is placed closer to extramusical time (the audience’s home ground) at the bottom of the diagram, the there and then elsewhere sphere V (verses including instrumentals) further away. The music’s pick-up point is at X, the start of the tune at 0:00, which fades and zooms in like the initial establishing shot of a film’s foreign location: the scene is set for two clockwise laps of the there and then verse circle (V1 and V2). We then come closer to home for a single anticlockwise lap of the here and now refrain circuit after which we are back in the there and then sphere for one more lap. Then we return to the here and now refrain sphere for good, running not just one, not two but almost three anticlockwise laps round the home circuit before we are eased smoothly back to extramusical time at home with the home-based refrain fade-out ending at musical drop-off point Y, 4:12 after the initial pick-up point X.

Fig. 11-4. Verse-refrain pattern in Fernando as centripetal process


The main advantage of cyclical modelling is that it more adequately represents the cyclical aspects of diataxis. While we often talk of returning to a physical location, even if both we and it have changed since we last visited, we are more likely to dismiss the idea of returning to a point in time as science-fiction fantasy. But in music it’s as self-evident as it is both real and common to go back to the start (da capo) of a piece or episode: it just doesn’t matter that da capo is impossible in external reality. The way in which music can freeze irreversible time in an ongoing series of meso-durations allows points in time to be revisited without difficulty. The chief disadvantage is that cyclical models are virtually impossible to create in two dimensions flat on the screen or on the printed page. Although that’s the main reason why no more such diagrams appear in this book it can still be semiotically revealing, as in the case of Fernando, to consider the narrative of an analysis piece in cyclical terms.


General diatactic schemes

Popular song

Figure 11-5 (p. 396) suggests that there are three basic types of common diataxis in popular song: [1] strophic (AA…A); [2] verse - refrain (VR or AB, like Fernando); [3] [verse - ] chorus - bridge - chorus (AABA or CB). Before discussing these three basic types it’s wise to first posit general definitions of chorus, refrain and verse.

Chorus, refrain, verse, etc.

In Ancient Greek theatre the chorus (χορός) was a troupe of artists who danced, sang or recited between other parts of the drama that were played by individual actors. In latter-day musicals the chorus also involves groups of people both singing and dancing but the word otherwise usually denotes a group of people concertedly singing or chanting rather than dancing. In English, chorus also means a recurrent musical passage, section or episode (or set of episodes) that is, or can be, sung concertedly by a group of people as opposed to a passage sung solo (or as a duet or trio). This singalong (or ‘dance-along’ or ‘play-along’) aspect of concerted performance is central to all episodic notions of chorus. As we shall shortly see, this general meaning is applied quite differently to different types of chorus sections in different types of popular song.

Refrain is a less generic concept than chorus: it’s simply a type of chorus episode whose words and tune change very little, if at all, each time it occurs in a song. Refrain is paired with verse, by which is in general meant a song episode whose tune remains the same (or similar) each time it occurs but whose words differ on each occasion. The problem with verse is that, like chorus, it means different things in different contexts, as will gradually become clear in what follows.

Figure 11-5 (p. 396) shows three basic types of diataxis heard in a large proportion of English-language popular song. Strophic form is presented in the top line using Bill Haley’s Rock Around The Clock (1955) as an example. In basic terms the recording is little more than a series of up-tempo twelve-bar blues periods AAAAAAA (p. 342). It begins with eight bars (or 11") of sung introduction (‘1, 2, 3 o’clock’, etc.). Then there are two sung verses (starting ‘Put your glad rags on’ and ‘When the clock strikes 2’) followed by a guitar solo occupying the same duration (12 bars or 15") and following the same chords as each verse. Verses 3 and 4 (‘When the chimes ring 5’ and ‘When it’s 8, 9, 10’) come next, followed by the one-note horn riff version of the same 12-bar, 15-second period. The recording ends with a final sung verse (‘When the clock strikes 12’) and a brief cadential coda without vocals.

Fig. 11-5. Common types of recursive (cyclical) diataxis in popular song:

(i) strophic; (ii) verse-refrain (VR); (iii) chorus-bridge (CB) or AABA (32-bar ‘jazz standard’)

The small Hs in Figure 11-5 (i) denote the occurrence of the song’s hook lines (‘We’re gonna rock around the clock tonight’ etc.) which occupy the last part of each twelve-bar period or verse. While most hooks have the chorus character of a refrain in that they present recurrences of the same tune with the same words, they are no more than just one element in the episode containing them, be it verse, refrain or chorus: hooks do not in themselves constitute a chorus or refrain. They can occur at the start of a chorus, as in the AABA song Blue Moon (p. 398) or, as in the case of the title museme from Abba’s Fernando, on the last three syllables of phrases 1 and 2 (of 5) in the verses and phrases 1, 2 and 4 (of 4) in the refrain, or in fact virtually anywhere in the song, though most commonly somewhere in the refrain (if VR pattern) or chorus (if CB).

The middle line (ii) in Figure 11-5 is a generic representation of the Verse-Refrain (VR) pattern. Fernando’s overall VVRVRR shape is just one variant of the VRVRVRR shown in the diagram. Staying with Abba, the IVRVRV iR+I of SOS (1975a) is another variant, the IRVVR+VR++(I) of Dancing Queen (1976) another, and the IVRVRR of Hasta Mañana (1974b) yet another. Honey Honey (Abba, 1974a) is a different kettle of episodic fish. Like many early Beatles songs it follows an AABA or CB (Chorus-Bridge) rather than a VR (Verse-Refrain) pattern: its I C C B C B i C (AABABA) resembles, for example, the ICCBCC iBCI order (AAAABA) of From Me To You (Beatles, 1963). Since this distinction between VR and AABA patterns may seem a bit arcane to some readers I’d better explain its importance.

AABA (Chorus-Bridge)

The bottom line (iii) in Figure 11-5 represents Chorus-Bridge (CB) diataxis, as typified in the AABA form of the 32-bar jazz standard or ‘evergreen’. Songs like Autumn Leaves, Blue Moon, Body & Soul, Misty, My Funny Valentine, Night & Day, Ol’ Man River, Over The Rainbow, Satin Doll, Smoke Gets In Your Eyes and Stormy Weather are just eleven among the countless AABA tunes dominating music sales in the anglophone world between 1925 and 1960. Many of these jazz standards were written for musicals and their first constituent episode, the verse, was a lengthy recitativo type of affair, containing little by way of memorable melodic or rhythmic material. Its function was to link the plot of the drama to the moment when action could be frozen and the stage set for a show-stopping tune. Here, for example, are some extracts from the lyrics of the preparatory verse to Blue Moon (R Rodgers, 1934).

‘Once upon a time | Before I took up smiling, | I hated the moonlight…

With no one to stay up for, | I went to sleep at ten…

My heart was just an organ, | My life had no mission … [but]

Now that I have you |… I awake in Heaven’ [that’s why I want to say…]

Very few people remember a show tune’s verse but thousands know the chorus that kicks in on cue from the verse’s ‘that’s why I want to say/sing’ setup. The lyrics of Blue Moon’s 32-bar chorus run as follows (see also figures 11-5 and 11-6, pp. 396 and 399).

A. ‘Blue moon, | You saw me standing alone [bars 1-4]

Without a dream in my heart, | Without a love of my own. [bars 5-8]

A. Blue moon, | You knew just what I was there for. [bars 9-12]

You heard me saying a prayer for | Someone I really could care for. [13-16]

B. And then suddenly appeared before me [bars 17-18]

The only one my arms could ever hold. [bars 19-20]

I heard somebody whisper “please adore me” [bars 21-22]

And when I looked that moon had turned to gold. [bars 23-24]

A. Blue moon, | Now I’m no longer alone [bars 25-28]

Without a dream in my heart, | Without a love of my own.’ [bars 29-32]

Verses to tunes like this fell quickly into obscurity. One year after Blue Moon was published, Al Bowlly and the Ray Noble band recorded the song’s total chorus, without its verse, twice in succession as AABA + AABA. That recording forms the basis of Figure 11-6 (p. 399) which shows how four two-bar phrases constitute each of the AABA form’s eight-bar periods, and how those four eight-bar periods produce four episodes: [1] the A section that is [2] repeated so that it spans two periods (35" at 110 bpm); [3] the bridge or middle eight (the B section), containing contrasting material, some of it in a different key (17½"); [4] the A section in recap, also spanning a single period (17½").
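The durations quoted here follow from simple arithmetic. As a minimal sketch, assuming four beats to the bar (the metre is not stated above) and the 110 bpm tempo given for the recording, a hypothetical helper `span_seconds` yields:

```python
def span_seconds(bars, bpm, beats_per_bar=4):
    """Duration in seconds of a number of bars at a given tempo.

    beats_per_bar=4 is an assumption (common time), not a figure
    taken from the text.
    """
    return bars * beats_per_bar * 60 / bpm


period = span_seconds(8, 110)    # one eight-bar period
chorus = span_seconds(32, 110)   # one complete AABA chorus

print(round(period, 1))      # 17.5
print(round(2 * period, 1))  # 34.9
print(round(chorus, 1))      # 69.8
```

That gives about 17½" per eight-bar period, about 35" for the repeated A section and, on the same assumptions, a little under 70" for one complete 32-bar chorus.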

Fig. 11-6. Episodes in a typical 32-bar jazz standard chorus (AABA form)


Episode: A (twice), chorus incl. hook, 1 × 2 periods (35" at 110 bpm) | B, bridge or middle 8 (17½") | A, chorus incl. hook (17½")

Period: 8 bars | 8 bars | 8 bars | 8 bars

Phrase: 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4

Bars: 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8


Since songs like Blue Moon, Autumn Leaves, Body And Soul, Misty and My Funny Valentine are almost always performed without their introductory verses, they are to all intents and purposes just their chorus sections. Those choruses contain in their turn a contrasting B section called bridge in North America and middle eight in the UK. Since the A section has no special name it’s sometimes called the chorus just to distinguish it from the bridge, even though A and B are both part of the chorus in the original sense of the word in this context.

The choruses of songs like Blue Moon are, as complete songs in themselves, often referred to as jazz standards, and a complete 32-bar (AABA) unit is what jazz musicians usually mean by chorus. Indeed, it’s worth remembering that jazz standards (all those choruses) not only provided the harmonic and melodic basis for jazz improvisation but were also the main source of material for all those highly popular dance bands of which many noted jazz musicians had been members during the period 1925-1950. And it’s not as if AABA diataxis suddenly started in the 1920s, flourished and then suddenly disappeared again. Its direct predecessors are found in European music for the stage and can be traced back to the da capo aria of Baroque operas, oratorios and cantatas. The da capo aria was usually preceded by an easily forgotten introductory recitative like the jazz standard’s verse. Then, the aria proper, like the jazz standard chorus, started with the main A section. That A was followed by a shorter second episode (B) of contrasting character and the aria was rounded off with a recap (da capo) of the first section (A) during which the vocalist was, like the jazz musician, expected to exhibit improvisatory skills. The [A]ABA scheme also has a long history as instrumental music: marches and popular dances like the bourrée, minuet, polka and waltz almost always include a contrasting middle (B) section, often called the trio.

The centripetal [A]ABA pattern of the da capo aria, of marches and of older dance forms, consisting of two, not three, contrasting episodes plus recap of the first one, is widely but rather confusingly known in euroclassical music theory as ‘ternary form’. When jazz was at its most popular, the ‘ternary form’ of the AABA ‘standard’ appeared in versions that were often solely instrumental — some strictly arranged, others highly improvised — or that were sung through from start to finish, or in dance band recordings where the vocalist was given just one out of the four or five choruses that would fit on a 78 rpm disc. Moreover, like the twelve-bar blues matrix that sometimes runs to, say, 11½ or 14 bars, the 32-bar AABA ‘standard’ can also vary in length, as with the 38 bars of A Nightingale Sang In Berkeley Square, or the 29 bars of Yesterday (Beatles, 1965). But by the time of Yesterday we are at the end of the era when the AABA diataxis of jazz standards was such a common narrative form in English-language popular song. This does not mean that strophic verses and AABA form were the only diatactical strategies in the repertoire before Yesterday ―Roy Orbison and his dramatically non-recursive big ballads are a striking case in point― and it’s certainly not as if AABA form somehow disappeared after 1965. Yesterday is no more than an approximate reference point: AABA form was in general more common before it and less common after.


Strophic, verse-refrain and AABA types of diataxis may be very common in English-language popular song but they are by no means the only ways of ordering musical episodes in that repertoire. Apart from understanding how music helps create narrative cohesion in film (Chapter 14), it’s also worth noting how diataxis diverged from established patterns in pop-rock music of the post-Yesterday era. Figures 11-7, 11-8 (p. 402) and 11-9 (p. 404) provide three examples of such change.

Fig. 11-7. Abba: Name Of The Game (1977) ― ABCDE

Abba’s The Name Of The Game is unusual because after the first three statements of the short (12") minor-key A section (the instrumental intro and two sung verses) no episode recurs until half the song is over at 2:03 — AAABCDE. The second half consists only of episodes C, D and E — CDECC. If you consider the non-recapitulated A and B sections as a sort of verse (V) and CDECC as a sort of centripetal chorus in which the short D and E passages together act as a contrasting bridge between the recurring C episodes, you could argue that the tune follows a sort of jazz-standard pattern of V (‘verse’) plus a ‘ternary chorus’ running CBCBCC. However, given that the song’s A and B episodes are in the same groove as C, D and E, it’s difficult to hear them as a ‘verse’ in relation to a ‘chorus’: the overall effect remains one of non-recursive episodic progression rather than of ‘ternary’ centripetality.

Fig. 11-8. Beatles: A Day In The Life (1967) ― episodic overview

The Beatles’ A Day In The Life (fig. 11-8) starts with a short intro (‘I’), after which John Lennon (‘JL’) sings not two but three narrative verses about the curious death of a minor celebrity. That establishes a strophic, story-telling, multi-verse ballad pattern, but instead of a fourth verse, or a refrain, or instrumental interlude, we’re swept at 1:37 into 45 seconds of cacophonous orchestral clusters in a long ascending glissando (‘Gliss’) that could end anywhere at any time. That tonal and temporal unpredictability is unsettling. In fact we arrive at 2:17 in a sphere even more mundane than that of the previous JL ‘ballad’ verses. But this time, it’s someone else, McCartney (‘McC’), who’s the first-person narrator, telling a different story (banal details of matutinal stress) in a different voice, to a different tune, with a much less ballad-like, more mechanically animated sort of accompaniment until he falls ‘into a dream’ at 2:46 and the floating ‘Ah!’ phrases take over, swimming in unreal amounts of reverb (‘Ah’). Lennon returns at 3:15 with what sounds like a recap of the initial ballad verses (‘JL’), except that this time he only sings one such verse, and that its lyrics are even more surreal (the ‘4000 holes’, etc.) than those of the initial verses. This recurrence at 3:15 may technically be a recap but it also works like a flashback that interprets the initial story-telling reality in the light (or fog) of the intervening psychedelia (the ‘Gliss’ and ‘Ah!’ episodes). After just one JL verse we are launched back into an unmistakable recap of mega-glissando cacophony at 3:43 which leads this second time not back to the sphere of mundane morning stress but to the 45-second final orchestral chord at 4:18 and its cavernous reverb word-painting the ‘4000 holes’ they now know ‘it takes to fill the Albert Hall’. 
If you interpret this piece in terms like the absurdity of conventional narrative, the strangeness of normality, everyday reality as surreal, etc., then changing the order of episodes in this piece would, like the Fernando commutations described earlier, produce a radically different statement. Imagine what would happen if you started A Day In The Life with glissando chaos and ended with several verses of a normal strophic ballad, or if you ended it with a short, sharp cadence, or with a fade-out over a repeated ‘I read the news today, oh boy, oh boy, oh boy!’ Abnormal diataxis is simply an expressive advantage if the strangeness of everyday normality is not just to be understood verbally but something to be experienced musically.

Figure 11-9 (p. 404) shows the narrative form of The House, The Street, The Room (Gentle Giant, 1971). The most obvious feature in the diagram is the guitar solo which, with the break preceding it at 2:39, occupies 34% of the album track’s total duration (6:04). The sheer length and tonal cleverness of that solo aren’t the only things likely to impress serious young musicians: its muso music status is further enhanced by the arcane instrumentation and aleatoric character of the two quodlibets occupying another 22% of the track, traits that draw attention to the band’s avant-garde and modern jazz credentials. That leaves less than half the piece (44%) to house its main ideas which include two clever but forgettable riffs and nothing at all conducive to dance or singalong. Whether such musical scholasticism is interesting and innovative or pretentious and boring is not the point here. The point is that the Gentle Giant track also uses a general narrative pattern sending signals of musical erudition that work as follows. The initial V V I I section (0:11-1:02) forms a large episode (A1) that repeats (1:02 to 1:51) as V V I I I (A2) and is recapitulated (4:42-5:30), after the intervening quodlibet and guitar solo (B), as V V I I I (A3). That extended AABA structure, with the second quodlibet as coda, closely resembles the overall AABA episodicity of euroclassical sonata form (p. 409). In the context of anglophone rock from the early 1970s, the piece’s duration and diataxis, as well as the contents and connotations of its constituent episodes, place it squarely in the avant-garde ballpark of the prog rock playing field.
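The proportions quoted above can be roughly reconstructed from the timings in the paragraph. This is a sketch only: the start of the first quodlibet (1:51), the start of the coda (5:30) and the 11-second opening are inferred here from the adjacent timings rather than stated in the text, and the labels and variable names are made up for illustration.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' timing to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


TOTAL = to_seconds("6:04")

# (label, start, end); boundaries marked 'inferred' are assumptions.
SECTIONS = [
    ("main ideas", "0:00", "0:11"),           # opening before A1 (inferred)
    ("main ideas", "0:11", "1:02"),           # A1
    ("main ideas", "1:02", "1:51"),           # A2
    ("quodlibets", "1:51", "2:39"),           # first quodlibet (inferred)
    ("break + guitar solo", "2:39", "4:42"),  # B
    ("main ideas", "4:42", "5:30"),           # A3
    ("quodlibets", "5:30", "6:04"),           # coda (inferred)
]

# Add up the seconds spent under each label.
shares = {}
for label, start, end in SECTIONS:
    length = to_seconds(end) - to_seconds(start)
    shares[label] = shares.get(label, 0) + length

for label, length in shares.items():
    print(f"{label}: {length}s, {100 * length / TOTAL:.1f}%")
```

Under those assumptions the three shares come out within about one percentage point of the 34%, 22% and 44% given in the paragraph.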

Fig. 11-9. Gentle Giant: The House, The Street, The Room (1971)

― episodic overview

Extensional diataxis

Both A Day In The Life and the Gentle Giant track just discussed are long pieces compared to the usual two, three or four minutes of most popular song recordings. Indeed, tracks on symphonic and prog rock albums tend not only to last longer but also to receive more extensive episodic treatment than a pop song to which a simple AABAABAA or VVRVRVRR order of events is well suited. None of this means that ostensibly simple diataxis becomes boring once it extends beyond three or four minutes. Indeed, the six minutes of verse and refrain in Dylan’s Like A Rolling Stone (1965b), not to mention the forty-three strophic verses in the nine-minute Brazilian rock classic Faroeste caboclo (Legião Urbana, 1987) both testify eloquently to the contrary. That said, you are as a rule more likely to find complex diataxis in longer rather than shorter pieces. The question is what, if anything, the narrative forms and actual durations of these long pieces can communicate apart from themselves. Let’s start with the durations.


As we saw in Chapter 8 (p. 288), the basic unit of musical mega-duration is that of an entire piece, be it a title theme lasting one minute, a pop song or Schubert Lied lasting around three minutes, or a prog rock or jazz fusion track, or a movement from a euroclassical symphony, lasting six minutes or more. Now, there’s no room here to discuss the diataxis of individual pieces (mega-durations) as elements in a larger work or as ingredients in another musically coherent context (giga-durations). True, the ordering of tunes in a stage musical or rock opera, or on a concept album, or of movements in a euroclassical work, or of dance tracks on a DJ’s playlist, or of items in a segment of Muzak, or of tunes performed by a live band, etc. can all obviously influence ‘what is communicated to whom with what effect’. Therefore, if the piece under analysis is an intrinsic part of a more extensive musical work that your listeners hear as a whole (e.g. a euroclassical movement or a concept album track not released as a single), it’s advisable to discuss its direct musical context as part of that larger work; but if the piece is a replaceable item in a performance with variable content or running order (e.g. a DJ playlist), there’s no need for direct musical contextualisation unless the entire performance is as much the focus of attention as the individual piece itself. Since detailed discussion of these issues could easily occupy another book of this size, let’s return to the level of the mega-duration and to the significance (or non-significance) of an analysis piece’s running time.

There’s no point thinking in terms of mega-duration if your analysis piece is a jingle, bridge or tail lasting just a few seconds. However, even a 60-second TV theme tune contains periods, often also episodes. If so, its identity as a piece is partly determined by the pattern created by how the constituent episodes are managed in terms of order, relative duration, etc. inside its total duration. So having set rough upper and lower limits to this part of the discussion (between ≈1' and ≈6'), let’s see if a piece’s duration can in any way indicate style, genre or function by using 4:33 as a test-case running time.

John Cage’s famous ‘piece of silence’ 4:33 (1952) lasts about as long as the Hallelujah chorus [4:34] (Handel, 1741), Gimme Shelter [4:32] (Rolling Stones, 1969), I Am The Walrus [4:36] (Beatles, 1967b), the Alla marcia from Sibelius’s Karelia Suite [4:36] (1893), Édith Piaf’s Milord [4:30] (1959), Ali Hassan’s Samiry [4:33] (2002), the verse anthem This is the Record of John [4:34] (Gibbons, 1615), and Einstürzende Neubauten’s Haus der Lüge [4:30] (1989). 4:33 is also close to the average durations of Purcell sonatas, Italian house tracks from the early 1990s, shorter movements in Bach Brandenburg Concertos or in symphonies by Haydn and Mozart, and Billboard chart-toppers from 1990. Not much style specificity there! Still, 4:33 is much shorter than movements in a Mahler symphony or than prog rock and jazz fusion tracks. It’s also longer than most pop songs, marches and Schubert Lieder (≈3'), much longer than dances in suites and pot-pourris (≈2') and much, much longer than TV jingles, themes, tails and bridges (one minute or less), not to mention the last three of Webern’s Fünf Sätze für Orchester (1913). In short, the duration of a piece is more likely to be a partial style indicator if it lasts much longer than five minutes or less than around two. It can in other words be useful to see how typical the duration of an analysis piece is for the style and genre to which you think it belongs and to compare that duration with those of other styles and genres (Table 11-1).

Table 11-1. Very rough and incomplete guide to average durations of recordings in different types of music

< 1 min. TV themes, film cues, jingles, short oratorio recitatives, etc.

≈ 2 mins. uptempo dances, national anthems, Christmas carols, tangos, Mussorgsky’s Pictures, shorter country songs (e.g. Loretta Lynn), Billboard #1s in 1960 (≈ 2'-2½'), talent show performances (e.g. X-Factor, ≈ 1½'), etc.

≈ 3 mins. Schubert Lieder, most country songs, Billboard #1s in 1947 and 1970, trad and mainstream jazz tracks, tarantelle, rebetiko, fado, choro, Scriabin études, punk tracks; songs having to fit on one side of a 78 rpm or 45 rpm single (7 inch).

≈ 4 mins. Beethoven symphony 3rd movements, Billboard #1s in 1980 and 2000, cúmbia tracks, arias in Bizet’s Carmen, etc.

≈ 4½ mins. see p. 406, esp. ftnt. 46, plus Billboard #1s 1990, Strauss waltzes; Radiohead, heavy metal, industrial and raï tracks, etc.

≈ 5-8 mins. tracks by Oscar Peterson, Iron Maiden, Metallica, Pink Floyd; jazz funk and prog rock tracks; Barber’s Adagio for Strings; movements in Elgar’s Enigma Variations and Holst’s Planets, etc.

≈ 8-10 mins. Ornette Coleman tracks, Liszt piano pieces; first movements in symphonies by Beethoven, Dvořák and Mozart (late period), etc.

≈ 10-20 mins. fusion jazz tracks, Liszt tone poems, Wagner overtures, movements in Bruckner’s 4th and Berlioz’s Symphonie fantastique; Cambodian court music pieces.

> 20 mins. movements in Mahler symphonies, rāga performances, very long prog rock tracks, etc.



It goes without saying that you’re more likely to find extensional diataxis in pieces of music that are themselves extended. As Table 11-1 shows, durations of over five minutes are rare for pieces (songs, tracks, etc.) in the everyday musical diet of most citizens of the urban West. We hear very few purely instrumental pieces lasting more than five minutes, unless ‘free’ jazz, fusion jazz or some other form of instrumental art music (including noubas and rāgas as well as euroclassical symphonies, concertos and chamber music pieces) are on our musical daily diet. These types of music use a variety of underlying structural rules to give long instrumental pieces cohesion to performers and listeners engrossed in a narrative that relies on neither verbal nor visual sequences of events. Unless such music is used as background to other activities, listeners need to pay close attention in order to draw maximum benefit from whatever they’re hearing. Charles Mingus certainly believed this to be so when he threatened to walk out of a New York venue if, during his performance, anyone in the audience started talking, or if he heard the kching of a cash register or ice cubes clinking in a drinks glass. By insisting on immersion in performance Mingus adopted the same position as Wackenroder who in 1792 underlined the necessity of ‘disregarding every disturbing thought and all irrelevant impressions of [the] senses’ as the preferred listening mode for the new instrumental music of his day (p. 94). Rosen (1976: 155) explains the diataxis of that music from the late eighteenth century as follows.

‘[T]he application of dramatic technique and structure… was the natural outcome of an age which saw the development of the symphonic concert as a public event. The symphony was forced to become a dramatic performance, and it accordingly developed not only something like a plot, with a climax and a dénouement, but also a unity of tone, character and action it had only partially reached before.’

Sonata form

The most common narrative device used by composers of the (then) new instrumental music just mentioned was what theorists were later to call sonata form. Its dynamic is worth understanding, at least as a general principle, partly because it so clearly illustrates the extensional pole of the distinction drawn by Chester in Second Thoughts on a Rock Aesthetic (1970) between intensional and extensional music aesthetics (p. 272), partly because it’s something with which every student of Western music is supposed to be familiar.

Haydn, Mozart and Beethoven are three famous exponents of sonata form. The first movements of their symphonies, concertos, quartets and sonatas are usually constructed using some variant of an overall sonata form scheme (see Figure 11-10, p. 410). It’s a radically extended sort of AABA diataxis heard in movements lasting not three or four but typically between around seven and twelve minutes. Sonata form relies to a large extent on changes of tonal centre and on motivic difference to create interest and a coherent sense of narrative. Whereas a typical performance of a generic 32-bar AABA jazz standard running at 90 bpm in 4/4 time might last for three minutes and consist of an Intro (0:15) followed by the actual chorus with repeats, for example AABAABA (1:25 + 1:05 = 2:30), plus some sort of ending (0:15), euroclassical sonata-form movements elaborate on the underlying AABA structure in a very different way. For instance, the 1981 Kubelik recording of the extremely popular first movement in Mozart’s 40th Symphony in G minor (1788) has, as shown in Figure 11-10, a running time of 8:47. That’s almost three times longer than the generic jazz standard just mentioned.
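The running times in the jazz standard example can be checked with a little arithmetic. The following is only a rough sketch: it assumes four beats to the bar and eight bars for each A or B episode, and all function and variable names are invented for illustration.

```python
def seconds_for_bars(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Duration in seconds of `bars` bars at `bpm` beats per minute."""
    return bars * beats_per_bar * 60.0 / bpm


def as_min_sec(seconds: float) -> str:
    """Format a duration as m:ss, rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


BARS_PER_EPISODE = 8  # assumption: each A or B episode lasts eight bars


def form_seconds(form: str, bpm: float) -> float:
    """Duration of a string of episodes such as 'AABA' at a given tempo."""
    return seconds_for_bars(len(form) * BARS_PER_EPISODE, bpm)


full_chorus = form_seconds("AABA", 90)    # complete 32-bar chorus
partial_repeat = form_seconds("ABA", 90)  # 24-bar repeat without the first A
performance = 15 + full_chorus + partial_repeat + 15  # intro and ending of 0:15 each

mozart = 8 * 60 + 47  # Kubelik recording, 8:47

print(as_min_sec(full_chorus), as_min_sec(partial_repeat), as_min_sec(performance))
print(round(mozart / performance, 2))  # Mozart movement relative to the jazz standard
```

On those assumptions the whole performance comes out at roughly three minutes and the Mozart movement at just under three times that length.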

Fig. 11-10. Simplified sonata form diagram of first movement in Mozart’s 40th Symphony in G minor, K550 (1788)

Figure 11-10 shows that this Mozart movement consists of: [A1] an exposition section (0:00-2:07), repeated once [A2] (2:07-4:22) with an extended cadence; [B] a development section (4:22-5:44); [A3] a recapitulation with coda (5:44-8:47). Inside that overall AABA structure there are several key changes. Everything in the home key is shown on the bottom line of the diagram, the dominant neighbouring key on the upper line. There is modulation roughly halfway through the exposition (A1 and A2) from the initial home key and the movement’s main theme (a) to a secondary theme (b) of contrasting character in the dominant neighbouring key. Expositions conclude with a final cadence in that other key (just before 2:07 and 4:22).
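To see how much of the movement each section occupies, the timings just quoted can be turned into durations and percentages. This is a minimal sketch using only the four time spans given above; the labels and function names are illustrative, not part of any established tool.

```python
def to_seconds(stamp: str) -> int:
    """Convert an m:ss timing such as '4:22' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Section boundaries as quoted from Figure 11-10 (1981 Kubelik recording).
SECTIONS = [
    ("A1 exposition", "0:00", "2:07"),
    ("A2 exposition repeat", "2:07", "4:22"),
    ("B development", "4:22", "5:44"),
    ("A3 recapitulation + coda", "5:44", "8:47"),
]


def section_shares(sections):
    """Return (label, seconds, percentage of total running time) per section."""
    total = to_seconds(sections[-1][2]) - to_seconds(sections[0][1])
    rows = []
    for label, start, end in sections:
        length = to_seconds(end) - to_seconds(start)
        rows.append((label, length, round(100 * length / total, 1)))
    return rows


for label, length, share in section_shares(SECTIONS):
    print(f"{label:26s} {length // 60}:{length % 60:02d}  {share}%")
```

In this recording the development is the shortest section, taking up less than a sixth of the running time, while the recapitulation with coda is the longest.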

Development sections (B) are often short but they can be quite dramatic since they often mix ideas from both the A and B themes and because they usually visit several different keys not necessarily closely related to the home key or tonic. In fact the irregular shape of harmonic process in the development section of Figure 11-10 does not represent actual key changes: it just serves to indicate that development sections can move the music out of the tonal comfort zone of the exposition and recapitulation. With the recapitulation (5:44 in Fig. 11-10) the famous Mozart first movement returns to the initial a theme in the initial home key. The B theme is also stated in the home key and the movement is rounded off with an extended final cadence plus coda, all in the home key.

The basically recursive, centripetal and extensional narrative just described is based on: [1] the affective and connotative value of each of the movement’s constituent episodes, including the A theme, B theme and passages that join those themes (‘bridge’ or ‘transitory’ passages) and/or that modulate between keys; [2] sequences of keys visited for different durations at different points in the movement; [3] the relation of those keys to the home key or main tonic; [4] the general treatment and interplay of the A and B themes, including when and where they occur in which key. Although this sort of narrative may once have been heard and felt without effort, it’s unfamiliar to most of us brought up in the urban West since the first half of the twentieth century and we have to learn it conceptually before we can actually perceive and feel its dynamic. However, this rewarding learning process has tended to overshadow other aspects of analysis and appreciation in conventional studies of Western music in two unfortunate ways.

First, the teaching of sonata form, once institutionalised, fell into a rut of key- and theme-spotting exercises that were easier to correct and quantify in exams than were insights into emotional, kinetic and other semiotic aspects of the narrative. Secondly, the extensional aspect of sonata form and the canonisation of euroclassical ‘masterworks’ conceived in that mould fell prey to the syntax fixation discussed in Chapter 4, tendencies that trivialised intensional aesthetics and that put tonality, especially its harmonic aspects, on top in a hierarchy distinguishing ‘primary’ from ‘secondary’ parameters of expression.

Other types of euroclassical narrative

Although sonata form may take pride of place in conventional studies of euroclassical music, it is by no means the only type of diataxis in that tradition. Apart from the da capo aria (p. 399 ff.), all the minuets, bourrées, courantes, jigs, allemandes, waltzes, marches, scherzos, etc. found in overtures and dance suites, as well as in the third movements of symphonies and quartets, are, with their less extensional types of diataxis resembling those of popular song and dance, just as much part of the tradition as is sonata form. Among other euroclassical narrative schemes still heard today, though labelled differently, are rondo and theme and variations.

Rondo involves a main theme (‘A’) that functions as a ritornello (instrumental) or refrain (vocal/instrumental) occurring before, between and after the presentation of a number of different episodes, typically ABACADAE…A. This sort of narrative is rare in contemporary Western popular music but is often heard in Indian rāga-based music where the recurring theme is called bandish (vocal) or gat (instrumental).

Theme and variations has, as a narrative strategy in music, a long history and is still going strong. It comprises a main theme that is first stated ‘as is’, then subjected to a series of modifications (the variations) before it rounds off the entire piece in either its original guise or as an easily recognisable version of it. Variations are sometimes used to display virtuosity involving lots of fast notes or elaborate melodic ornamentation. In euroclassical music, variations can be harmonic as well as melodic, but in jazz performance of 32-bar standards, as well as in blues or rock renderings of eight- or twelve-bar matrices, the chord changes are constant and attention is focused on the solo improvisation of melodic lines that diverge more radically from the original tune than do those of euroclassical variations in relation to their theme. In euroclassical, jazz and rock traditions each variation (classical) or improvised instrumental solo ‘chorus’ (jazz and rock) has the same duration as the theme (classical) or first and final chorus (jazz and rock).

Many patterns of extensional diataxis in euroclassical music, such as the fugue and classical concerto form, have, like sonata form itself, fallen out of common use since the nineteenth century. The same goes for other euroclassical ‘forms’ that are really compositional devices rather than types of diataxis. The chaconne or passacaglia in slow triple metre is one such device. It belongs to the larger category of ground bass by which is meant a short bass line that repeats every few bars and over which different rhythmic, tonal or harmonic lines can be created. The ground bass is still widely used in popular music. For instance, Pachelbel’s well-known Canon in D major (c. 1690), whose entire bass line consists of a repeated sequence of eight notes of equal duration, is used prominently in pop or rock recordings like All Together Now (Farm, 1991), Basket Case (Green Day, 1994), Cryin’ (Aerosmith, 1993) and The Streets Of London (McTell, 1974). Another example is the descending bass line, drawing on the start of Bach’s famous Air (1721), in A Whiter Shade Of Pale (Procol Harum, 1967).


Harmony as episodic parameter

Many types of popular song stay in the same key throughout. The twelve-bar blues matrix, for example, repeats the same underlying single-key chord sequence, while others distinguish between sections by introducing different melodic or rhythmic ideas, different instrumentation, different chord sequences and so on. However, key changes can also mark episodic difference in other types of popular music, one obvious example being the 32-bar jazz standard, whose B section (the ‘bridge’ or ‘middle eight’) almost always features material in a different key to that of the A section (p. 287 ff., 397 ff.). Among countless examples of verse and refrain in different keys are Granada (Lara, 1932), SOS (Abba, 1975) and Don’t Stand So Close To Me (Police, 1980), all of which proceed from minor (verse) to major (chorus). Military marches and minuet movements from euroclassical symphonies almost always include a ‘B’ section (‘trio’) set in a different key closely related to that of the main ‘A’ section. Similarly, different tunes played in uninterrupted sequence by ceilidh bands as part of the same set are often heard in different but closely related keys.

Another easily recognised type of wholesale episodic key change in popular music is described as follows on the TV Tropes website.

‘The Truck Driver’s Gear Change’ [occurs] ‘near the end of a song, shifting upwards by some relatively small pitch increment —usually by one semitone (half step) or whole tone (whole step), but occasionally by other intervals.’

This device was commonly used in Eurovision songs from the 1960s and 1970s to crank up listener involvement by literally taking the song to new heights and signalling an imminent end to the performance. Among countless well-known recordings to feature the truck driver’s gear change are Hasta Mañana (Abba, 1974a), Seasons In The Sun (Jacks, 1974) and Living On A Prayer (Bon Jovi, 1986).
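In terms of pitch, the gear change can be put in numbers. The sketch below assumes twelve-tone equal temperament, which the description above does not specify, so that each semitone multiplies frequency by the twelfth root of two. The function name and the 440 Hz reference pitch are illustrative choices only.

```python
def shifted_frequency(freq_hz: float, semitones: int) -> float:
    """Frequency after shifting by a number of equal-tempered semitones."""
    return freq_hz * 2 ** (semitones / 12)


A4 = 440.0  # illustrative reference pitch in Hz

print(round(shifted_frequency(A4, 1), 2))  # up one semitone (half step)
print(round(shifted_frequency(A4, 2), 2))  # up one whole tone (whole step)
```

A one-semitone gear change therefore raises every pitch in the song by about six per cent and a whole-tone change by about twelve per cent.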

Conclusions and questions

Diataxis is semiotically important in two main ways:

1. It can act as style indicator (p. 523 ff.), letting listeners know what sort of music they are hearing, what sort of events to expect in which sort of order, and potentially connoting the historical, social and cultural location of the piece.

2. By arranging episodes in a particular order with particular patterns of change and recurrence, diataxis exerts strong influence on the narrative sense of a piece in that, for example, A followed by B and ending with A does not convey the same overall message as B followed by A and ending with B, or A ending with B repeated, etc.

Some useful semiotic questions to ask in conjunction with a piece’s diataxis might be:

1. Does the piece change mood/character at all? If so, where do the piece’s various episodes start, end or recur?

2. Which parameters of expression (instrumentation, timbre, vocal persona, groove, metre, periodicity, harmony, etc.) determine identification of different episodes in the piece?

3. Does the piece have a narrative form corresponding to any of the diataxis types mentioned in this chapter (strophic, verse-refrain, AABA, sonata form, theme and variations, etc.)? If so, which?

4. How much of the piece is occupied by which episodes? Is there a lot more of A than B or vice versa? Or are they of equal importance?

5. Is one episode more memorable or singable than another? Is there any sort of build-up or clear run-in to one episode (see ‘Episodic markers’, p. 515 ff.)?

6. Does the piece have a hook? If so, where and how often does it occur?

7. What would happen if the piece’s episodes were ordered differently (see discussion of Fernando (p. 386 ff.) and A Day In The Life (p. 402))? Would changing the piece’s diataxis make any difference to its overall sense? If so, what sort of sense would it make instead?

8. Does the piece go round in circles or proceed more linearly (see ‘Cyclical processes’, p. 392 ff.)? If the piece is cyclical, does it return to the same basic mood as it started in (centripetal)? Or does it lead somewhere different?

9. How different in mood and character are the piece’s episodes? Are changes from one episode to another abrupt or gradual? What effect, if any, is created by changes from one episode to another?

10. Is the piece’s diataxis indicative of any particular musical style or genre (pp. 266 ff., 523 ff.)? If so, which?

And what about the other types of macro-parameters discussed in the next chapter? Do they in any way contribute to the identification of episodes in your analysis piece or of its stylistic and generic habitat?


Fig. 12-4. Monocentric

Fig. 12-10. ‘The bells! The Bells!’ (a) Vaughan Williams: The Lark Ascending (1914); (b) Schubert: Ave Maria (1820); (c) Emmett: Dixie (1859)









13. A simple sign typology

This chapter summarises basic ways in which musical structures relate to what they can be seen to elicit by way of response. At least that is how this sign typology came into being. Its rationale is easiest to grasp by considering a concrete example.

The second of the Ten Little Title Tunes (TLTT) tested on 607 respondents was a cover version of a TV Western theme —The Virginian. Common responses to that piece were across, broad expanses (incl. panorama, open plains, prairie), cowboy, hero, horse, riding, towards, TV and, of course, Western. Those VVAs fall into two main categories: [1] Adventure, TV and Western; [2] hero riding across open landscape. The first category consists of narrative genre responses, while the second —a hero riding across open landscape towards some destination or other— consists of VVAs that might just as well apply to a Hungarian galloping across the Puszta in 1241 as to a cowboy riding the range in 1881. A lone hero dashing across wide-open spaces on horseback is kinetically and spatially, not geoculturally or historically, specific and music for such scenes would typically include patterns of sonic movement suggesting horse riding rather than skating or caressing, an individual rather than a gang, plains rather than streets, speed rather than calm, etc. Indeed, horses, riding and some sort of open landscape were, in addition to Western, The Virginian’s most common VVAs while other Western narrative tropes were absent. There were no wagon trains, no unshaven villains terrorising the townsfolk, no U.S. cavalry, no pow-wows or wigwams, no bar room brawls, no Mexican bandits, no buggy rides, no shoot-outs on main street, and no horses being serenaded by the light of a camp fire. The Virginian’s respondents clearly heard a particular type of Western. Some of the musical structures signalling that particularity were indicative of a particular musical style (style indicators) while others were more kinetic and spatial (kinetic anaphones). Such distinctions were useful in trying to determine which musical elements in the piece were more likely to connect with which responses.

The simple sign typology that follows has two complementary uses. [1] It facilitates, as just suggested, consideration of the ways in which certain responses to the music relate to certain sounds in the analysis object (AO). [2] It can help in the formulation of hypotheses about the connotations of the musical structures under analysis.

Table 13-1. Sign typology: basic overview

Sign type | Page | Minimal description
Anaphone: sonic | 487 | similarity to paramusical sound
Anaphone: tactile | 494 | similarity to paramusical perception of touch
Anaphone: kinetic or spatial | 498 | similarity to paramusical movement and/or paramusical space
Anaphone: composite | 509 | similarity using several modes of perception
Anaphone: social | 514 | similarity to size and type of (social) group
Diataxeme: episodic determinant | 515 | structural elements determining the division of music into distinct sections
Diataxeme: episodic marker | 516 | short processual structure signalling temporal position or relative importance of events
Diataxeme: diataxis | 522 | overall patterning of sections (episodes) into one process or set of processes
Style flag: style indicator | 523 | aspects of musical structuration indicating the ‘home’ style of the music in question
Style flag: genre synecdoche | 524 | pars pro toto reference to ‘foreign’ musical style, thence to cultural context of that style


Before explaining and exemplifying each of these sign types it should be noted that only two pairs of sign types shown in the table are mutually exclusive: [1] genre synecdoches and style indicators; [2] episodic determinants and episodic markers. With those exceptions, explained later, links between musical structure and VVAs can almost always be understood as combinations of any of the sign types just listed, as composites. That observation also tallies with the qualities of music as a mode of cross-domain representation and synaesthesis (p. 62, ff.), as well as with the importance of considering musemes not so much as single entities but rather as layered stacks of meaning (p. 235, ff.). It’s also worth flagging up that some sign types —anaphones, episodic markers and genre synecdoches— work at the level of musemes and museme stacks while the others —style indicators, episodic determinants and form patterns— are meaningful at the level of idiom and process. The former are the contents of the latter, so to speak.


Anaphone is etymologically analogous to analogy; but instead of meaning the ‘imitation of existing models... in the formation of words’ (ana-logy), anaphone means the use of existing models in the formation of (meaningful musical) sounds. Anaphones are in that sense homologous sign types and can be thought of in three main categories —sonic, kinetic and tactile— depending on which mode of perception —sound, movement or touch— is most striking in the link between musical structure and paramusical phenomena. That said, anaphones are usually composites of sonic, tactile and kinetic perception.

Sonic anaphones

A sonic anaphone can be thought of as the musical stylisation of sound that exists outside the discourse of music. Such sound can be produced by the human body, by animals or by elements and objects in the natural or man-made environment. In this section sonic anaphones are divided into two principal but overlapping subcategories: those produced by the human voice (vocal anaphones) and those that aren’t (non-vocal). Let’s start with the latter.

Non-vocal anaphones

The babbling brook piano accompaniment in Schubert’s Die schöne Müllerin (1822), the thunderstorm in Beethoven’s Pastoral Symphony (1808), the earthquake in Bach’s Matthew Passion (1729), William Byrd’s bells (c. 1600), Daquin’s cuckoo (1735), Messiaen’s blackbird (1952), Hendrix’s B52 bombers (1971), the first motorbike second of Sweet Hitch-Hiker (Creedence, 1971) and The Fools’ Psycho Chicken (1980) all contain pretty obvious sonic anaphones. However, as Rösing (1977) points out, sonograms of Schubert brooks or of Beethoven thunder share very little objectively in common with the real sounds those musical stylisations are supposed to represent. Still, that is hardly the point because the structural homologies between real and musical brooks or between real and musical thunder stem partly from cultural convention and social experience, partly from differences in sound technology. This dual mechanism explains why Vangelis’s sampled rain in Soil Festivities (1984) sounds much more like actual rain than Beethoven’s or, for that matter, any of Eisler’s Fourteen Ways of Describing Rain (1941).

To put things simply, you may well have a non-vocal sonic anaphone on your hands if a musical structure under analysis can be described using words of the following type:

bang, beep, boing, boom, bubble, bump, buzz, chatter, chug, chunter, clang, clank, clatter, click, clunk, clip-clop, crack, crackle, crash, creak, crumble, crumple, crunch, diddle, ding, dong, drip, drop, fizz, fizzle, flutter, grate, grind, gush, jangle, jingle, kerplunk, knock, lap, patter, plip, plop, pop, pow, rat-a-tat, rattle, ring, ripple, rumble, rustle, scrape, scratch, shatter, sizzle, slam, slap, slosh, smack, smash, smatter, snap, splash, splatter, splutter, squelch, squish, swish, swoosh, throb, thrum, thud, thump, thunder, thwack, tick-tock, tinkle, tintinnabulate, trickle, twang, vroom, whirr, whoosh, whizz.

Add to that all the sounds emanating from the animal kingdom —bark, bleat, chirp, cluck, moo, purr, quack, twitter, whinny, etc.—, plus words that can describe both human and non-human sounds —babble, belch, breathe, call, choke, chortle, cough, croak, cry, fart, groan, growl, gulp, gurgle, hiccup, hiss, howl, hum, moan, murmur, roar, scream, sigh, slobber, snarl, sneeze, snivel, splutter, squeak, whisper, whine, whistle, yell, etc. (see p. 492)— and you may find it easier to identify a non-vocal sonic anaphone somewhere in your analysis object. If not, you will almost certainly stumble on some other type of anaphone.

Vocal anaphones

Vocal anaphones are those in which the music’s melodic and rhythmic profile resembles that of speech or other human types of vocal utterance. Vocal anaphones musically stylise either linguistic rhythm and intonation or paralinguistic utterances like sighs and interjections of disgust, delight, horror, surprise, relief, contentment and so on.

Linguistic vocal anaphones


Transscansions are the most striking type of linguistic vocal anaphone. They are short wordless motifs whose melodic and rhythmic profile closely resembles that of at least two spoken syllables associated with the music in which they occur. The talking drums of some traditional musics from West Africa demonstrate one of the most obvious and consistent uses of transscansion but transscansions also abound in contemporary Western mass media, notably in advertising and title music. The Intel Inside jingle provides one familiar example, its four notes echoing the rhythmic pattern of the four spoken syllables ‘Intel inside’.

Transscansions typically serve to highlight and reinforce key words or phrases that are sung elsewhere in the same piece of music or that appear in its title. They are common in the instrumental intros and outros to opera arias, Lieder, oratorio choruses, parlour songs and pop songs, where intros pre-empt and outros echo the prosodic profile of words in the vocal line, as in the Hallelujah chorus from Handel’s Messiah (1741), or Schubert’s An die Musik (1816), or Claribel’s I Cannot Sing The Old Songs (c.1855), or The Beatles’ Please Please Me (1963). A similar process operates when church organists play the first and/or last line of a hymn tune to prepare the congregation for what they are about to sing: hearing the first line of, say, ‘O God, our help in ages past’ without words establishes in advance both tune and words of the hymn ‘O God, our help in ages past’.

There are two other sorts of linguistic anaphone. Less striking than transscansions, the language identifier and the stock-phrase homology involve no explicit echoing of actual words and can occur in either vocal or instrumental strands of the music.

Language identifiers

As its label suggests, the language identifier involves melodic-rhythmic motifs characteristic of the prosody of a particular language. For example, the two-note disyllabic Scotch snap, as in coming, going, body, hit it etc., is as typical a rhythmic trait of English or Gaelic as it is uncommon in Italian or Spanish, which in their turn feature trisyllabic patterns (Milano, Sevilla, ti amo, tus ojos, te quiero, la notte, mi vida, il mare, la playa, etc.) that are much rarer in English or Gaelic. Although anaphonic in that they resemble speech independent of musical discourse, the vocal anaphones of language identification also function as style indicators or genre synecdoches because, by signalling specific speech patterns, they can connote the language and culture of which those patterns are but a small part.

Stock-phrase homologies

Stock-phrase homologies also relate to prosodic aspects of speech. Like transscansions and language identifiers they are also wordless melodic-rhythmic motifs but they neither echo syllables directly associated with the piece in which they occur (transscansions), nor do they necessarily connote (although they can) any culture through the specific prosody of a particular language (language identifiers). As the label suggests, stock-phrase homologies are melodic-rhythmic motifs that can be heard to resemble the prosody of common verbal statements uttered with the attitude appropriate to their message. The motif under investigation might melodically and rhythmically resemble more a short, sharp shut up! or go to hell! than a mellifluous I love you because you understand, dear. Or maybe it sounds more like a flustered get a move on! or leave me alone!, or perhaps a sincere that’s so kind of you, or a provocative you ain’t seen nothing yet. For example, the sentence I love you because you understand, dear, as sung by Jim Reeves (1964), has exactly the same rhythm, duration and number of syllables as I hate you because you make me vomit. Unless comic effect is intended, these two isorhythmic and equidurational sentences are unlikely to be uttered using the same timbre, intonation, diction or dynamics.

Paralinguistic anaphones

Paralinguistic anaphones involve the musical stylisation of non-verbal vocal expression. Such anaphones can stylise the sounds of, for example:

babbling, booing, cackling, calling, crying, cheering, giggling, groaning, growling, grunting, gurgling, hiccupping, howling, laughing, moaning, mumbling, murmuring, muttering, screaming, sighing, slobbering, snarling, snivelling, squawking, squeaking, squealing, stammering, whimpering, whining, whooping, yammering, yawning.

They can also be heard as resembling the vocal expression, with or without words, of states of mind and emotions, such as:

[1] abandon, ecstasy, extraversion or control, restraint, introversion; [2] loneliness, solitude or a sense of community and belonging; [3] contentment, hope, enthusiasm or anger, apathy, irritation, confusion, consternation, despair, despondency, disinterest, regret; [4] calm, confidence, determination or fear, worry, panic, apprehension; [5] delight, appreciation, encouragement, happiness or disgust, disdain, grief and sadness; [6] relief or tension; [7] surprise or shock; and of course [8] pleasure or pain.

Another way of understanding paralinguistic anaphones is to hear them in terms of either the prosody or social function of the type of utterance they resemble. Are any of the melodic motifs in your AO in any way similar to the prosody of any of the following types of speech?

accusing, agreeing, affirming, announcing, apologising, approving, arguing, asking, beseeching, bewailing, cajoling, challenging, chattering, chiding, complaining, condemning, confiding, confirming, comforting, cursing, declaiming, denying, disagreeing, encouraging, exhorting, gossiping, negating, objecting, ordering, persuading, pleading, praising, praying, preaching, provoking, ranting, reciting, teasing, taunting, threatening, warning, whining.

Or perhaps some melodic utterances in your ao may be presented in a form that resembles one of these types of speech:

monologue (incl. monologue intérieur), rumination, discussion, declamation, conversation, proclamation, question and answer, call and response, opening line, punchline, comment, (theatrical) aside.

Vocal anaphones can sometimes also be roughly indicated using interjections like aah!, aïee!, brrr!, eh?, gasp!, groan!, grrr!, gulp!, help!, hm!, ho-ho!, huh?, hush!, no!, oh yes!, oh-ho!, oh no!, ooh!, oomph!, ouch!, phew!, phwoah!, sh!, sigh!, slurp!, tee-hee!, ugh!, uh-uh!, whoopee!, wow!, yay!, yawn!, yee-ha!, yes!, yuck! or zzzz! However, given the connotative precision of music, it’s often more instructive if you can find words that coherently express the attitude or state of mind stylised in the relevant anaphone. Remembering that logogenic precision rarely resembles its musogenic counterpart, it’s unlikely that the words you find to indicate the relevant state of mind will create a stock-phrase homology, let alone a transscansion. Concocting lyrics is not the point of the exercise here: it’s a matter of prosaic verbal precision about musical discourse, not of it. That’s why it’s more useful to hazard interpretative guesses like If only things were that easy!, or I’m totally lost and confused, or Now I’ll show you, or even Life may be hard but I’ll not be put down than to hide behind the convenient truth that music cannot be put into words. The aim is not to put music into words but to use words to formulate an approximate or metaphorical description of what you think the music might mean. You will never be able to prove conclusively that your interpretation is right but you can at least test its plausibility using the approaches presented in chapters 6 and 7.

The fact that vocal expression, verbal or non-verbal, is intimately related to wordless musical statements will be evident to anyone familiar with certain types of blues, rock, Country, or with any other type of music in which call-and-response patterns are divided between voice (the ‘call’) and instrument (the ‘response’). For example, instrumental fills between vocal phrases in the musical styles just mentioned are anaphonic in that they suggest conversation: sometimes they seem to merely answer the vocal line, while at others they comment and expand on what has just been sung. Perhaps even more indicative of language’s close relationship with melodic lines is the use of wah-wah and the talk box in rock music. Instrumental fills and comments, as well as the devices just mentioned, all serve in one way or another to make an instrumental statement resemble the utterance of a human voice.

Sonic anaphones, non-vocal or vocal, may be the most obvious sort of anaphone because they are unimodal (sound representing other sound) but they are neither the only nor the most common type of anaphone. Transmodal anaphones in which musical sound links with other sensory modes and domains of representation, notably those of touch and, even more importantly, of movement, are central to the semiotics of music.

Tactile anaphones

Sensuous string pads

One of the most familiar tactile anaphones to Western ears must surely be the sound of orchestral strings bowed smoothly to produce a slowly moving, continuous, chordal texture. On synthesisers such sounds are called string pads because they pad holes and fill gaps in the music’s overall texture. Performed live on a group of stringed instruments, string pads are characterised by their lack of distinct attack and decay, and by their relatively consistent envelope, all often enhanced by considerable amounts of reverb. This sort of sound produces a homogeneous, thick, rich, viscous sonic effect and, by haptic synaesthesis, sensations of luxury, comfort and smoothness. This observation can be substantiated by noting titles and in-house descriptions of library music featuring thick (rich, lush) string scoring of pleasant harmonies, for example: [1] Lullaby Of The City: home, soft and velvety, gently flowing, quiet, intimate and restful; [2] Penthouse Affair: fashions, sweetly melodic, slightly nostalgic but sophisticated, ‘dressed in silk and satin’; [3] Amethysts for Esmeralda — rich and dreamy; [4] Girl In Blue — lush, smooth melody; [5] Valse Anastasie — romantic, lush; [6] Sequence for Sentimentalists — rich, romantic theme. Viscous string pads have of course also acted as sonic emulsifiers in many a voluptuous Hollywood love scene.

Soft, gentle, velvety, silk and satin, lush, smooth and rich all connote a particular range of tactile sensations to which I added, more prosaically, homogeneous, thick, emulsified and viscous in an effort to concretise the shared physical characteristics of the unctuous aspect of the sort of string pad anaphone under review here. Oily, sticky and glutinous would admittedly be strange descriptors of the sound, but saccharine, sugary, syrupy, sappy, slushy, gooey and mushy have all been used disparagingly to qualify music that listeners find too lush, too cloying, sweet or rich. That said, sweet viscosity, musical or otherwise, cannot be unpleasant per se, as apposite correlatives like juicy, succulent, creamy, soft and smooth clearly suggest. Besides, the other tactile aspect of the library music descriptions (the silk and satin side of the anaphone) embodies lithe delicacy and gently flowing smoothness as counterbalance to the deeper, thicker and more viscid aspect of smoothness in the rich string texture.

The lighter quality of smooth sheen (silk and satin) rather than of deep viscosity is largely attributable to higher pitch (higher as lighter, less heavy, less dark) in terms of fundamental or harmonics. String pads, live or synthesised, are characterised by a timbre that is normally rich in high harmonics so that notes played in the mid, or even low, register include an element of ‘shine’. Indeed, expressions like ‘shimmering strings’ and ‘silver strings’ are so widespread that suggesting synaesthetic links between the sheen of silk or satin and the ‘shine’ of ‘satin strings’ seems superfluous. The sweetly melodic co-descriptor of the silk and satin piece Penthouse Affair (p. 494) hints at another explanation of the contrast between the silky delicacy and rich viscosity types of smoothness: not only is the piece’s melody set at a higher (lighter) pitch than the average of its accompanying string pad; its legato notes also change at a quicker pace than those of the underlying string pad’s chords so that it can become, as the library music company put it, sweetly melodic and more gently flowing like Lullaby Of The City.

Now, although soft, gentle, velvety, silk and satin, lush, smooth and rich all indicate clearly tactile qualities, several of the words just used to describe string pad sounds obviously relate to taste —sweet, sugary, syrupy and creamy, for example. So why are gustatory anaphones absent from this taxonomy?

The first reason is that most of the taste words used so far have as much to do with texture or consistency (creamy, syrupy, juicy, etc.) as with taste itself and that those of unquestionably gustatory derivation (typically sweet and its opposite numbers sour and bitter) are just as often used in everyday speech to qualify perception that is non-gustatory, as in ‘a sweet-natured person’, ‘a sour face’, ‘a bitter experience’, etc. Although the sweet in the sweetly melodic description of Penthouse Affair (p. 494) may relate transmodally to the actual taste of sugar, honey or a ripe peach, that use of sweet is just as likely to be non-gustatory in the sense of ‘attractive, pleasant or endearing’. The second reason for excluding gustatory —and olfactory— anaphones from the taxonomy is that English has far fewer words descriptive of taste —and even less of smell— than of non-verbal sound, not to mention vision. This heavy reliance on other modes of perception (mainly tactile) to describe taste and smell explains why gustatory and olfactory anaphones are less useful in the semiotic analysis of music than those relating to sound, touch, space, mass and movement.

Rough and grainy

At the other end of the tactile spectrum from smooth, rich, lush sounds that pad holes, fill gaps and flow gently are sounds perceived as rough, coarse, grainy, uneven, choppy, harsh, bitter, sharp, piercing, etc. Excluding the more kinetic than directly tactile pair of opposites flowing-choppy, one clear example of rough, grainy sound in music is the distorted electric guitar used in some types of rock, in particular to produce the power chords of heavy metal.

In a 30-second TV ad from 1986 for an electric shaver, a fully distorted power chord on electric guitar, very similar to those heard at the start of Money For Nothing (Dire Straits, 1985), accompanies footage of a man riding a red Ducati through desert scenery. It includes close-ups of spinning bike wheels cross-cut with the shaver’s rotor blades and shots of the man’s face in the bike’s rear-view mirror juxtaposed with close-ups of the same guy in his shaving mirror as he attacks some serious stubble. We’re obviously supposed to equate the typically male activity of shaving with the typically male excitement of riding a power bike and with a successful rock and roll lifestyle. These visual links rely partly on the well-established sonic anaphone rock power chord = motor bike, but in reality the shaver sounds much lighter (mid-range, low-volume buzz), much less powerful than the bike (full-throated roar). That obvious incongruity is counteracted in two ways: [1] a commonality of maleness between shaving and power-bike riding is taken as read; [2] the distorted guitar timbre works not only as a sonic anaphone for the bike but also as a tactile anaphone reminiscent of the not altogether unpleasant abrasive sandpaper sound and scratchy, tingling sensation of shaving. After all, both dzzzz (shaver buzz) and grrrr (bike vroom) have more in common, both in terms of touch and sound, with each other than they do with, say, babble, bang, clank, clink, ding-dong, eek, hiccup, kerrang, mmmm, oops, patter, rustle, sigh, splash, splutter, swish, tee-hee, tinkle, ugh or whisper.

It’s worth noting that tactile anaphones, be they smooth and silky or rough and abrasive, usually relate to timbral properties of music and that the borders distinguishing them from sonic anaphones are fluid. The dividing line between tactile and kinetic anaphones is even fuzzier because touch is impossible without movement.

Kinetic anaphones

Since neither sound nor touch can exist without movement (energy and mass in space), both sonic and tactile anaphones should perhaps have been dealt with as subtypes of the kinetic anaphone. Indeed, all aspects of musical structuration stylising anything on a scale running from total stasis to frenetic action are intrinsically kinetic. However, since experience has taught me that students find it easier to distinguish types of musical semiosis on the basis of different modes of sensory perception than by considering different combinations of different quantities of mass, energy and space, I’ve persisted in subdividing anaphones into sonic, tactile and kinetic, the latter relating strongly to elements of visual perception. Of course, this visual aspect does not mean that movement has to be seen, as any unsighted person will tell you and as any seeing person knows full well, especially if they’ve experienced a storm on the high seas in total darkness. It just means that kinetic anaphones are those that are, at least for the seeing, the easiest to conceptualise in visual terms.

Gross-motoric, fine-motoric and holokinetic

Kinetic anaphones relate closely to at least three of the six domains of representation that make up the embodying or combinatory (or ‘proto-musical’) level of representation discussed in chapter 2 (pp. 62-68). Those three domains are the gross-motoric, the fine-motoric and the ‘physical’. A very brief recap may be in order here.

The gross-motoric side of kinetic anaphones is easiest to envisage in terms of humans riding, dragging, pushing, pulling, driving, flying, walking, running, marching, trudging, skipping etc. along, through, round, across, over, to and fro, up and down, in relation to a particular environment or from one environment to another. Most types of dance contain culturally stylised kinetic anaphones —grooves— appropriate to certain types of human body movement. Gross-motoric kinetic anaphones can also be visualised as the movement of animals en masse (birds, insects, cattle, etc.) or of objects (machinery, vehicles, trees, cornfields, bodies of water, wind, etc.).

The fine-motoric aspect of kinetic anaphones relates to smaller, lighter and more delicate types of movement, for example blinking, glittering, shimmering, rustling, fluttering, babbling, clicking, tapping, ticking, fingering, fiddling, twiddling, dripping, tickling, and so on.

The physical domain of representation gives rise to complex kinetic anaphones in which different aspects of space are key ingredients and which always also contain fine- or gross-motoric elements. These anaphones, which can be called holokinetic, are spatio-temporal and involve the interaction of one or more objects (including bodies) with another or others, or with a mass. As stated on page 63, ‘the physical domain covers the ballistics, trajectory and kinetic relationship of a body (or bodies, including one’s own) to the type of space through which it travels (or they travel), or in which it is (or they are) motionless, as well as the relationship of movement between one body (or several bodies) and others in simultaneous movement or stasis’.

Spatial anaphones

Spatial anaphones are a subset of holokinetic anaphones that are easy to identify. Just as the same piece of live music heard in a hotel lounge sounds radically different when performed in a cathedral, the types of acoustic space assigned to different tracks in a multi-track recording, or to the mix-down as a whole, create virtual acoustic space[s] in the listener’s speakers, headphones and actual head. For example, my audio software features ready reverb templates allowing me to fake the following sorts of space: rich hall, wide open hall, concert hall, deep hall, long hall, medium hall, warm space, cavernous space, bright corridor, medium room, small room, live ambience, warm ambience, metal tank and sewer. I can also use different delay settings to create spaces like an empty urban alley, even to suggest to listeners that they’re hearing voices inside their own heads. I can apply these effects to one, more or all the tracks I record, depending on how I want them to be heard in relation to each other and on how I want the overall acoustic space of the recording to come across. I can of course also pan different tracks at various points to the left, right or centre of the recording and place them towards the back or front of the mix; and I can use pitch and EQ to make different tracks seem spatially higher or lower.

Identifying spatial anaphones in a recording lets you determine not only the positioning and relative importance of different strands in the music. It can also provide insight into the mediation of mood in the music. Do the spaces suggest, for example, something mystical or matter-of-fact, intimate or public, general or particular, open or enclosed, expansive or crowded, external and documentary or internal and psychedelic, an acoustic performance or a sci-fi scene? And how do the spaces assigned to different strands of the music relate to each other? The following descriptions of two rock/pop recordings illustrate how important spatial anaphones and phonographic staging can be in the mediation of musical message.

[1] The lead vocals seem to be a few metres away, together with a metallic-sounding drumkit, in an acoustic space resembling that of a disused factory. Guitar fills and synthesised stabs are punched straight into your face from much closer: they’re almost inside your head.

(Yes: Owner Of A Lonely Heart, 1983)

[2] The lead vocals are yelled from what sounds like the far end of a very long corridor. Closer by, life seems to move on inexorably with smooth but rather ominous or lugubrious synth pad patterns and an insistently tinny bell sound. (Massive Attack: Unfinished Sympathy, 1991)

Other kinetic anaphones, like those of the vocal and instrumental tracks just alluded to, move inside (or can move in and out of) such varied constellations of acoustic space. Such movement can be fast or slow, hectic or serene, jerky or smooth, regular or irregular, oscillatory or unidirectional, sudden or gradual, repeated or varied, wavy or jagged, circular or angular, or relatively motionless. The movement, or lack of it, can occur in a wide open or tightly closed space that objects or beings can enter or leave in various ways, or in which they can remain. Music is eminently suited to representing perception of all such aspects of movement through combinations of kinetic anaphones. The notion of gestural interconversion, explained next, should shed a little more light on this matter.

Gestural interconversion

Diligent readers will recall the Austria and shampoo episode in chapter 5 (p. 167 ff.), featuring the Timotei shampoo advert and the theme song from The Sound of Music. I insisted that Austria and shampoo were musogenically the ‘same thing’, arguing that point on the basis of two facts: [1] that the Austria mentioned by respondents was not any old Austria but that of the Julie Andrews character in a long dress as she strolls, arms outstretched, through a grassy summer meadow against the backdrop of distant hills which, she proclaims in song, are ‘alive with the sound of music’; [2] that the shampoo in question was not so much a plastic bottle containing viscous liquid applied to the scalp in the intimacy of a shower cabin as the Timotei shampoo advert’s young blonde woman in a long dress sauntering through another summer meadow as she gathers wild flowers under the admiring gaze of a young man. A very similar type of gestural interconversion is at work in both the Austria and the shampoo scene: long flowing dresses, a leisurely walking pace, the potential sensation of summer grass against the legs, etc. The gestures immanent in both sets of images have a lot in common. So what is gestural interconversion?

Gestural interconversion is a two-way process that relies on anaphonic connections between music and phenomena perceived in relation to music. Since interconversion means the conversion into each other of two entities, gestural interconversion means two-way transfer via a commonality of gesture between, on the one hand, particular sensations that seem to be both subjective and internal, and, on the other hand, particular external objects (animate or inanimate) in the material world. Gestural interconversion entails in other words both the projection of an internal sensation via an appropriate gesture on to external phenomena and the internalisation or appropriation of external phenomena through the medium of a gesture corresponding in some way to the perceived form, shape, movement, grain, density, viscosity, etc. of those external phenomena. Of course, different cultures and individuals are liable to exhibit different patterns of gestural interconversion, so there can be no universal agreement as to which particular phenomena relate via which particular gestures to which particular sensations and vice versa. How does this work? Let’s start with some hills.

Fig. 13-1. Profile of the Clwydian range viewed from the Northeast

Figure 13-1 shows the outline of Welsh hills visible on a clear day from where I used to live in Liverpool. Moel Famau, in the middle, is 35 kilometres away as the crow flies, and the range stretches 45 km in a northwesterly direction from left to right. Apart from the vertical axis of Figure 13-1, exaggerated for purposes of clarity, these facts are irrefutable. Equally irrefutable is the fact that if you put your hand about 50 cm in front of you with your fingertips at eye level, you’ll find that the range of hills measures about 30 cm. You’ll also find that it takes roughly seven seconds for your fingertips to trace the ups and downs of that skyline from left to right. Three complementary types of movement are involved in completing this gesture: [1] your arm pans gradually from left to right; [2] your wrist rocks very slightly to outline the vertical contour of each hill and valley; [3] your elbow gently swivels a few degrees at each hilltop and valley floor to offset the vertical rocking and to ensure fluidity of movement and direction. A two-way process of gestural interconversion links the physical characteristics of these hills to the human movements just described. In one direction there is gestural internalisation from 45 km of external reality to 30 cm of hand gesture, quite a small-scale conversion in terms of map reading (1:150,000). In the other direction there is gestural externalisation (projection) from hand gesture to hills on a reciprocally large scale.

Figure 13-2 (p. 504) consists of five shots taken from the other side of the same range of hills as shown in figure 13-1. A sweep of the hand over the hills stretches this time from left (north) to the knoll on the far right (south). The scale of gestural projection here is now down to 1:40,000 because a panoramic 33 cm hand sweep over the hills or under the cumulus clouds only covers about 13 km of external reality.

Fig. 13-2. Dee valley looking east from Plâs Berwyn (Llangollen, North Wales)

Fig. 13-3. Some patterns of undulant gesturality immanent in Fig. 13-2

Much closer than the hills in the background, at about 25 metres from the camera, the tree line presents eight separate waves, all contained within one overall descending sweep spanning about 300 metres, or 33.3 cm at a hand-to-view scale of 1:900. Slightly closer still, at a scale of 1:600, the foreground terrain descends roughly 200 metres from left to right and is contoured by two rounded dips.

Figure 13-4a (p. 505) brings the external objects of gestural interconversion even closer. Ignoring the distant cordillera, a five-second hand gesture of 30 cm extending from your chest will trace over a kilometre of gently curved shoreline as small waves break towards you and ebb back every seven or eight seconds. Waves are also visible on the sea, as are changing areas of shadow and sunlight, both producing smooth patterns of movement a few tens of metres away. At least three different scales of gestural interconversion are at work in terms of undulation and of easy hand movements in this scene: 1:5,000 for the 1.5 km curve of the beach, 1:70 for the waves on the sea twenty metres away, and 1:20 for the small waves lapping at your feet from a distance of six metres.
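The scales quoted so far all come from the same simple division: the extent of the external object divided by the extent of the hand gesture that traces it. The following Python sketch merely rechecks that arithmetic for the figures given in the text. The function name gesture_scale and the list layout are illustrative assumptions, not part of any established method.

```python
def gesture_scale(real_extent_m: float, gesture_extent_m: float) -> float:
    """Return N for a 1:N scale between a hand gesture and the object it traces.

    Both extents are in metres. The name and signature are hypothetical,
    chosen only to illustrate the arithmetic described in the text.
    """
    if gesture_extent_m <= 0:
        raise ValueError("gesture extent must be positive")
    return real_extent_m / gesture_extent_m


# (label, real extent in metres, gesture extent in metres, scale quoted in text)
cases = [
    ("Clwydian range seen from Liverpool", 45_000, 0.30, 150_000),
    ("Hills behind the Dee valley", 13_000, 0.33, 40_000),
    ("Tree line in the Dee valley", 300, 1 / 3, 900),
    ("Curve of the beach", 1_500, 0.30, 5_000),
]

for label, real_m, gesture_m, quoted in cases:
    computed = round(gesture_scale(real_m, gesture_m))
    print(f"{label}: computed 1:{computed:,} (text: 1:{quoted:,})")
```

The Dee valley hills come out slightly under the quoted figure because the text rounds to the nearest convenient scale; the other three cases match the quoted values.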

Fig. 13-4. (a) Sunny day on a beach in Baja California; (b) unidentified clip art

You have to zoom in even further to trace the curves of the woman in Figure 13-4b. In order to see her in full figure, you’d need to be at a distance of about three metres. From that distance, a five-second, thirty-centimetre sweep of the arm down the curves, exaggerated by the clothing of the day and indicated by arrows, would take you from head to toe at a scale of about 1:7. The general ‘zoom-in’ drift of this wavy narrative should be obvious by now.

In Figure 13-5 gestural interconversion zooms in to a scale of 1:3. The Virgin’s head, shoulders and breast are now only just over a metre away from the viewer, her hands and knees a little less, just out of touching range at 1:1. In addition to the little round bundles of baby and breasts, Morales has painted his subject with long hair (locks on the left, strands on the right), and emphasised the arc of her neck and shoulders while creating additional curvature through folds in the smooth and supple material of her flowing garments. The Morales Virgin is far from the only European female subject to be portrayed in such terms. Long hair, arc of the neck, body curvature accentuated through exposure of bare flesh or by choice of suitable fabric etc. are recurrent features in our culture’s visual representations of desirable womanhood.

The ‘obvious narrative’ here is that an undulant 30-cm hand gesture fitting the outline of hills and valleys in the distance, or gentle waves on the sea or of wind across a cornfield at 100 metres, or the summer foliage of a large elm, oak or lime tree at 30 metres, corresponds to the actual length of curves in the adult human body, male or female, for example round the back of the head to the neck, round the nape of the neck to the shoulders, round the shoulders to the elbow, from the elbow round the wrist to the hand and fingers, and so on. What, you may well ask, does all this have to do with music?

As described in the Austria and shampoo section of chapter 5, among the most common VVAs provided by 607 respondents on hearing, without visual accompaniment, the first of Ten Little Title Tunes (The Dream of Olwen), were, in descending order of frequency: romantic, love, [long, green] grass (usually a meadow), summer, countryside, sea, a couple [man and woman], walking, sun, [sailing] boats, pastoral, flowers, girl, woman, family, woods, coast, beach, wind, lakes, hills, gliding, fields, cornfield. The same tune also scored well above average on connotations like summer, wavy, beck (creek) and river, and it was the only one to elicit the responses slow motion, valleys, rolling [hills], long dress, long hair and, of course, the kinetic and tactile aspects of shampoo as displayed in the shiny body-of-hair swish captured as stills in figure 13-6.

Fig. 13-6. TV advert for Pantène Pro Plus Shampoo (Canada, 2003)

The gestural common denominators, both kinetic and tactile, of all those waves on the sea, rolling hills and valleys, sandy beaches, trees, fields, long hair and dresses can be summarised in six points. [1] Long grass, flowers, corn in a field, trees and seas can all sway, undulate, ripple and flow. [2] Clouds can float, hills and valleys can roll. [3] Rivers, streams, long hair and long dresses can flow. [4] Flowing garments and long hair can sway. [5] Human movement filmed in slow motion floats or glides. [6] The ideal beach has smooth sand and a gently curving shoreline by the waves of the sea. Such observations illustrate the first premise of gestural interconversion: its shared characteristics do not derive from the unmediated objective qualities of the phenomena in question but from human gesturality, tactility, bodily movement and sensual perception which, within a given culture, can be observed as relating to the same objective phenomena — hence the waving of corn and of the sea, the swaying of hair and of trees, the flowing of loose garments and of rivers, etc.

The second premise of gestural interconversion follows from the first and states that it is possible to project the same basic set of human gestures on to all matter and objects perceived, in the manner just described, as sharing the same general qualities. This premise also assumes that the phenomena in question are perceived from particular perspectives, i.e. placed at such a distance from the human and viewed at such an angle as to allow a particular type of gesture to coincide with the perceived form, shape, surface or movement of the phenomena in question. If, as we saw, hills and dales are viewed from a distance of ten kilometres, or a cornfield or meadow from a hundred metres, or a large tree or waves on the sea from twenty, or the full figure of a human from three metres, or ‘his/her’ head and shoulders from one metre, etc., then the size, proportions and trajectory of a viewer’s gestures outlining the profile of those phenomena will be quite similar.

The third premise is a corollary of the second. Just as the same human gesture can be projected on to a set of gesturally compatible external objects, the same external phenomena can, if perceived from the relevant perspective, also be appropriated and internalised through the medium of gesture. It’s this two-way process of projection and appropriation through gesture which gives rise to the term gestural interconversion.

To clarify these principles, please note that The Dream of Olwen clocked up no response involving speed, conflict, aggression, crime, asperity, fear, eruption or disorder, and nothing in the urban, North American, darkness, danger, modernity or future-time departments (fig. 13-7, p. 509). That’s because the music featured kinetic anaphones perceived as wavy, undulating, smooth, curved, swaying, rolling, rippling, flowing and no sharp, square, angular, rough, sudden or choppy sorts of movement or touch. Gestural interconversion is in other words useful in semiotic music analysis because by synaesthetically connecting appropriate gesture with the music it allows us to concretise the sort of spaces, objects, textures and movements represented in sound. Of course, neither hair, hills, waves, fields and trees, nor motherboards, skyscrapers, radiators and Nazi phalanxes, are to be heard literally anywhere in any music, but space and movement mediated through commonality of gestural interconversion appropriate to either of those two sets of external objects is most certainly something that music can express with considerable force and precision.

Fig. 13-7. ‘Non-fluid’ (choppy/angular) gestural interconversion: (a) computer motherboard; (b) Chicago skyline; (c) radiator; (d) Nazi rally 1938

Composite anaphones

As mentioned earlier, the categories and subcategories of this simple typology of musical signs are rarely mutually exclusive. The following three examples —galopping, stabbing and newscasting— each illustrate how anaphones often combine sonic, tactile and kinetic aspects into one single museme stack.


The most common equine anaphone in European and North American music for the stage and the moving image must surely be the fast diddle-dum diddle-dum galop motif heard in such pieces as Rossini’s William Tell overture (1829 —also as tv theme music for The Lone Ranger), the signature tune for Bonanza (Livingston, 1959), and Morricone’s title music for both For a Few Dollars More (1965) and The Good, The Bad and the Ugly (1966). There may not be much tactility in this popular galop motif but its sonic and kinetic aspects are quite clear. Sonically this diddle-dum or giddy-up motif offers only a very approximate stylisation of horse hooves hitting the ground in full galop. That’s partly because the animal has four, not just three legs: it takes off after its fourth hoof hits the ground (on the ‘dum’ of diddle-dum or diddledy-dum, or on the literal up of giddy-up) for a duration similar to that occupied by the other three hooves touching the ground ([a] and [b] propelling into [c] and [d] in figure 13-8). Those three, not two, other hooves should logically prompt a four-syllable onomatopoeia like diddledy-dum rather than the mere three syllables of the popular galop motif diddle-dum or giddy-up. The strongest kinetic element in the motif is the group of two quick notes —the diddle or giddy— propelling energy towards an agogic accent at the take-off point —the dum or up.

Fig. 13-8. Galop: ‘diddle-dum’ (‘giddy-up’) or ‘diddledy-dum’?


Figure 13-9 (p. 511) is a reduction of Herrmann’s famous score for the shower scene at 0:48:24 in Hitchcock’s Psycho (1960). It’s included here as a textbook example of composite anaphones in action. Of the eighteen bars cited in the example, the first twelve consist of 36 equidurational stabs (8 every 5 seconds at 96 bpm), each played forcefully and very loud (sff = sforzando fortissimo) with down-bow strokes (⊓). Such articulation creates strong, sharp, percussive, abrasive and piercing attacks for each note. Strident harmonic dissonances are added in a descending pattern so that the simultaneously sounded notes cover not just the high pitches heard on their own in bars 1-3 but progressively also encompass mid-register dissonances (bars 4-12).
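The tempo figures just quoted can be rechecked with a few lines of arithmetic. This Python sketch assumes one stab per beat, which is what ‘8 every 5 seconds at 96 bpm’ implies; the function names are illustrative only.

```python
def stabs_in(seconds: float, bpm: float) -> float:
    """Number of stabs heard in `seconds`, assuming one stab per beat."""
    return bpm * seconds / 60


def duration_s(stab_count: int, bpm: float) -> float:
    """Time in seconds taken by `stab_count` equidurational stabs."""
    return stab_count * 60 / bpm


BPM = 96    # tempo quoted in the text
STABS = 36  # stabs in the first twelve bars
BARS = 12

print("stabs per 5 seconds:", stabs_in(5, BPM))
print("stabs per bar:", STABS / BARS)
print("duration of the 36 stabs in seconds:", duration_s(STABS, BPM))
```

On these assumptions the 36 stabs fall three to a bar and last a little over twenty seconds in all.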

Fig. 13-9. Herrmann: Shower scene music from Psycho (1960)

Sonically, the sudden, sharp, repeated notes in high register resemble a combination of female screams of terror and the sound of a large knife being sharpened. Tactile and kinetic anaphones are richer in this paragon of musical horror: the sharp, piercing, painful dissonances are one, the replacement of kheek-kheek [Xi:k] (bars 1-12) with the bweep-bweep [bwi:p] of bars 13-14 another. The short, sliding bw (glissando) of those six bweeps is particularly disconcerting because it replaces the straight hacking of the first 36 kheeks with a momentary softness of attack (bw instead of kh) at the onset of each stab. That modified attack resembles, in terms of both touch and movement, the initial resistance offered by the skin just before the knife point plunges into the body. Equally unsettling is the addition of progressively lower dissonances in bars 4-12, not so much because the hapless Marion slumps gradually down to her death as because pain occupies more and more of her entire body, starting with the high-pitched terror in (her) head register and gradually extending to the chest register and abdomen —from short, sharp, excruciating pain to more generalised, throbbing and internal haemorrhaging, so to speak.

The kinetic anaphone of regularly repeated stabs is also disturbing, perhaps surprisingly so since regularity usually implies predictability and because predictability generally implies order rather than chaos. However, since Hitchcock’s cutting of this scene is itself intentionally chaotic and disorientating (frequent and sudden changes of angle in jaggedly cut takes, etc.), Herrmann’s metronomically repeated dissonant stabs work as a disturbing counterpoint, as an inexorable, almost mechanically unstoppable, counterpart to the horror behind the chaos and mayhem shown on screen.


‘Staccato signals of constant information’ is how Paul Simon (1986) characterised this composite anaphone. He’s referring of course to the rapid, irregular rhythms of ‘news music’ that were so widely used throughout the 1970s and 1980s as urgency cues identifying TV productions like Sportsnight and Capital Radio News in the UK, like Sportnytt, TV-nytt and Aktuellt in Sweden, like TF1 Téléjournal Vingt Heures in France; and like This Week, Wall Street Week and The McLaughlin Group (world affairs programme) in the USA.

The kinetic aspect of these ‘staccato signals of constant information’ is closely linked to their sonic character. The rhythm of this type of anaphone resembles: [1] the unpredictable patterns of dots and dashes heard while sending or receiving Morse code messages, a sound associated with immediacy and urgency since the early days of telegraphy; [2] the fast but irregular string of repeated consonants uttered by someone stammering, a sound associated with stress, worry and urgency because even individuals who don’t normally stutter are more likely to do so if under pressure to say something important instantaneously. Moreover, the timbre of instruments articulating these rhythms often sounds ‘mechanical’ or ‘synthetic’, i.e. unlike the normal sound and utterance of a human voice or of an acoustic instrument stating a melodic line.

In terms of touch, this ‘news anaphone’ bears no trace of anything smooth, sweet, rich, soft, velvety, restful, dreamy, lush, fluid, flowing, rolling, rounded, wavy, swaying or viscous; but nor is it particularly rough, abrasive or eruptive. (Here the lines between touch and movement are obviously blurred.) Its short, quick notes clearly connect with fine-motoric rather than gross-motoric movement, with fingers tapping, or teeth chattering, rather than bodies bending, arms swirling or legs kicking. It more closely resembles rapid twitching, spluttering or flickering than babbling, rippling or rustling. It doesn’t have enough mass (too high, too light) to be square, jerky or angular but it does have a certain sharpness: not that of being cut by a knife or razor but of something more like running your fingertips down an irregularly pebble-dashed wall: no sensation of smoothness or fluidity there, just an incessant sequence of very small, hard and sharp points. Given also the mechanical or synthetic timbre that often carries these motifs and given their sonic etymophony (telegraphy, stammering), little wonder that these fine-motoric kinetic anaphones (sharply twitching, flickering) were also conflated with the stuttering sound of ticker-tape machines, with teleprinters and, consequently, considered thoroughly appropriate as urgency cues for newscasting.

Social anaphones

Some composite anaphones are not merely aggregates of textural, kinetic and spatial features because they also exhibit social traits involving the iconic representation of patterns of human interaction. These composites can be called social anaphones. As discussed at some length in chapter 12 (pp. 446-475), the organisation of different voices, instruments or recorded tracks into syncritic forms like unison, heterophony, homophony, contrapuntal polyphony, polymetricity, antiphony, call and response, figure-ground dualism, etc. can be heard as representing different types of relationship between individuals, or between an individual and his/her alter ego, or between individuals and their environment. We could be hearing a sort of conversation, or an argument, or agreement, or the statement of a single individual, or of several, or the relationship of an individual to an environment, or the ostensible absence of any individual in an environment. We might also be hearing everyone ‘singing from the same hymn sheet’ (performing exactly the same thing at the same time), or several people ‘saying’ basically the same thing but each audible as an individual (performing more or less the same thing at roughly the same time), or several people ‘saying’ different things but still simultaneously part of the same community. Or perhaps those individuals or groups of individuals are in conflict with each other, or maybe so many different things are being ‘said’ at the same time that the impression is more one of chaos or of a faceless crowd than of a collection of mutually distinguishable individuals. Although only a few concepts can be called upon to denote these social anaphones, they are easily perceived and identified if you are aware of their existence as a sign type and if you listen out for them in the music you’re analysing.



Diataxis is, as we saw at the start of chapter 11, the diachronic, episodic, extensional, long-term narrative aspect of form. A diataxeme is therefore an identifiable element of meaning relating to the music’s episodic order of events. There are three types of diataxeme: episodic determinant, episodic marker and overall diataxis.

Episodic determinants

Episodic determinants and markers, like most of the sign types presented in this chapter, had to be conceptualised because of differences in musical semiosis observed in conjunction with the reception test data referred to on several previous occasions. To be more precise, some pieces in the test battery gave rise to far more episodic associations than did others, i.e. to responses like has just, after that, after a long time, about to happen, leading to, and then. How were those pieces structurally different from less episodic examples?

The obvious conclusion was that test pieces featuring more structural change were heard as more episodic than those that just ‘chugged along’ recycling the same basic musical material from start to finish. In our test tunes we found five main structural factors operative in eliciting episodic responses: non-recurrent themes, harmonic blur, orchestral change, temporal change and unidirectional sweeps. Most of those structural change types are immediate. They put the music in a different place, so to speak, with little or no warning: the music changes key without preparatory modulation, or gets louder without an introductory crescendo, or changes register without a preliminary ascent or descent, or changes instrumentation, or melodic contour and rhythm, or underlying accompanimental figures, or metre, or tempo, or aural staging etc., or several of these all at once. These changes make one section of the music sound different to another (e.g. verse ↔ chorus, A theme ↔ B theme, instrumental ↔ vocal, etc.) but are they episodic?

If episode, applied to music, means ‘a passage containing distinct material’ as part of a larger sequence of events, and if that passage lasts long enough to establish itself as an episode rather than as a mere phrase or event, then the sorts of change just listed are definitely episodic. However, such immediate changes do not so much mark episodic change as constitute in themselves the actual materials of difference that determine the existence of an episode. Such sounds determining the identity of one passage as distinct from another are therefore referred to as episodic determinants: they are the constituent ingredients of what sounds characteristic for the duration of the musical episode in question. They are in a manner of speaking the style indicators of a musical section, not of a whole piece.

Episodic markers

Unidirectional markers

One of the five episodic traits mentioned above —the unidirectional sweep— differs from the others. Since it does not in itself constitute episodic change but rather prepares it or draws attention to it, the unidirectional sweep does not determine the nature of a musical episode (as just defined): it ‘is’ not the change itself but the structure marking its imminent occurrence. Two of the more episodic pieces in the TLTT test battery contained this sort of unidirectional sweep. They were melodic figures rising quickly —about one second’s worth of ‘take-off’, ‘run-up’, ‘whoosh’, ‘scoop’, ‘swoop’, etc.— from mid to high register, at which point new melodic material was presented in that higher register. Neither figure lasted long enough to become an episodic determinant, nor did either of them occur in the material immediately preceding them. Since they were short events leading into and drawing attention to episodic change, not constituting it, they were called episodic markers.

It’s important to understand that an episodic marker has to be unidirectional. If a line of music leaps and falls straight back down again (<>) it’s no more than one single event inside an episode; but if it goes in only one direction —< or >— and leads into the new register continuing from the point of the arrow tip, it marks an episodic change from one register to another. The same principle applies to a crescendo (< : from quiet to loud) or diminuendo (> : from loud to quiet). If the process is unidirectional it can become an episodic marker drawing attention to the louder or softer section it prepares; but if it goes ‘there and straight back’ (<> or ><; < > or > <) it cannot: it merely draws attention to itself. It’s like queuing at the post office. When you see a cashier open a new window for service it’s an episodic marker (from closed to open), as is also closing the window after it’s been open for some time. But if the cashier opens then immediately closes the window there’s no episodic marker, just one single (and very annoying) event inside the ‘episode’ of being closed.
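The unidirectional condition can be stated quite mechanically. The following sketch is an illustration only, not a method used in the reception tests: it assumes a melodic line or dynamic curve has been reduced to a plain list of numbers (pitches or loudness levels, both hypothetical encodings) and checks whether that list moves in one direction only. It covers just the necessary condition described above; whether the sweep actually leads into a new section still has to be judged by ear.

```python
from typing import Sequence


def is_unidirectional(values: Sequence[float]) -> bool:
    """True if the values only rise (<) or only fall (>), with some net change.

    A 'there and straight back' shape (<> or ><) returns False, as does a
    completely flat line, since neither leads anywhere new.
    """
    # Differences between successive values, ignoring repeated values
    moves = [b - a for a, b in zip(values, values[1:]) if b != a]
    if not moves:
        return False
    return all(m > 0 for m in moves) or all(m < 0 for m in moves)


if __name__ == "__main__":
    print(is_unidirectional([60, 64, 67, 72]))  # rising sweep (<)
    print(is_unidirectional([60, 72, 60]))      # leap and straight back (<>)
```

In the post office comparison, a window going from closed to open is the first case and a window opened then immediately closed is the second.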

Chord sequences work in similar ways. Some are modulations (short chord progressions leading from one key into another), as just before the middle-eight (‘bridge’) section of A Nightingale Sang In Berkeley Square or Once In A While and in so many other jazz standards that the device works not only as a marker of episodic change (to a new key) but also as a style indicator signalling ‘this sort of key change is typical for jazz standards’. But at other times, as we saw in chapter 8 (p. 341 ff.), short chord progressions operate as loops, like in the vamp until ready introductions to jazz standards or as tonal background to melody and riffs in rock and pop music. In addition to its function as style indicator, vamp until ready can work as an episodic marker when repeated at the start of a song because it draws attention to whatever breaks the repetition in terms of either a different chord sequence or, more commonly, the entrance of the main melodic line. Riffs in pop and rock, on the other hand, rarely, if ever, function as episodic markers because they are repeated throughout most of either an entire episode or piece (as kinetic anaphones), in which case they also act as style indicators.

Propulsive reiteration

One of the most common types of episodic marker is the propulsive reiteration. It’s exemplified in figure 13-10, which shows a stripped-down version of kit patterns you might hear from an extremely rudimentary rock drummer during the last two bars (= 8 beats or 4 seconds) of a four-bar period running in 4/4 metre at 120 bpm.

Fig. 13-10. Diddle-diddle drum fill as propulsive reiteration (episodic marker)

The diddle-diddle-diddle-diddle on toms packs eight notes into the final second (2 beats) of the period compared to the rate of just two notes per second (boom plus thwack) throughout the preceding three seconds (6 beats). Mid-to-low percussion notes are in other words articulated four times faster during that final second. Increasing the surface rate at the end of a period is a very common device for propelling the music towards whatever comes next, be it something new or simply a signal that the episode just ending is about to be repeated: it marks the end of one episode and the potential start of another. Such propulsion can also be generated by an increase in harmonic rhythm (the rate at which chords change), or by introducing a quick one-off scalar figure like the unidirectional upward sweep in the two episodic reception test tunes, or by repeating a short motif a few times.
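The ‘four times faster’ figure follows directly from the numbers given for figure 13-10. The sketch below reproduces that arithmetic; the function name is invented for the example, and the note counts (six boom or thwack notes over six beats, eight tom notes over two beats) are read off the description above rather than from a transcription.

```python
def surface_rate(note_count: int, beats: float, bpm: float) -> float:
    """Notes per second for `note_count` notes spread over `beats` at `bpm`."""
    seconds = beats * 60 / bpm
    return note_count / seconds


if __name__ == "__main__":
    groove = surface_rate(note_count=6, beats=6, bpm=120)  # boom + thwack pattern
    fill = surface_rate(note_count=8, beats=2, bpm=120)    # diddle-diddle fill
    print(groove, fill, fill / groove)
```

The same calculation can be applied to any period-ending fill by counting its notes and the beats it occupies.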

Repeating a short motif a few times is a very common device for driving music forward into whatever comes next. This device, the propulsive reiteration exemplified in figure 13-10 (p. 518), is a kinetic anaphone also known as ready steady go!, sometimes as 1-2-3-go!, where steady is a repetition of ready, where 2 and 3 are repetitions of 1, and where go! is the point towards which the reiteration is propelled. Now, propulsive reiteration is a basic element of melodic construction in popular euroclassical pieces, in popular song from the Arab World, the British Isles, Russia, Scandinavia and elsewhere. However, even though all these sorts of reiteration propel movement towards whatever breaks the repetition, they only become episodic markers when their go! coincides with the start of an episode, old or new. Such episodic markers are clearest when the repeated motif is really short, as in the case of the diddle in our diddle diddle diddle diddle (actually a 1-2-3-4-go!) on page 518. The six repeated notes leading into the chorus of Abba’s Fernando (1975) provide another example of propulsive reiteration as episodic marker. In that instance, the surface rate of notes in the vocal line does not increase: the propulsion is generated by repeating the same note at the same pitch at the same surface rate for the six syllables there was something in the… producing a three-beat, 1-2-3-go! anacrusis: (1) there was, (2) something, (3) in the (go!)… It’s a bit like someone saying ‘Let me tell you that…, tell you that…, tell you that…’ before finally reaching ‘I love you’, ‘I hate you’ or something else altogether.

Other clear examples of propulsive reiteration are found as repeated centrifugal melodic mini-swirls in two popular waltzes: Die Fledermaus (J Strauss, 1874) and the Minute Waltz (Chopin, 1847). Their kinetic character is more like that of a discus thrower circling on the spot before reaching the go! point of actually hurling the object, less like a pole vaulter’s run-up to the take-off point. Whatever their kinetic niceties, these sorts of reiteration are often called ‘lead-ins’, ‘pick-ups’ or, more technically, anacruses. Their propulsive quality is anacrustic and they gather momentum to arrive at the go! point with maximum effect. That point may be a new section or the start of a reprise, but it can also be the final chord or note in a piece (1-2-3-stop!), because that final sound event is always new, given that it can logically occur only once in any piece of music.

Finality markers

Episodic markers of finality are obvious. The fade-out of a pop song, the repeated chords at the end of euroclassical pieces, the virtuosic milking of the penultimate chord to end numbers in stadium rock gigs, the descending flourish on sitar in a rāga performance, the final gong in gamelan music, etc. are all episodic markers of finality that are also archetypal style indicators. Finality markers can also exist inside the same piece of music. These end-of-phrase or end-of-section markers usually take the form of melodic, harmonic or rhythmic cadence patterns, also often formulaic enough to act as style indicators.


Breaks are also episodic markers. They are often used in pop and rock, as well as in jazz and dance music, and can in general terms be characterised as follows. A break starts when ongoing accompaniment patterns suddenly stop to give sonic space to, and thereby highlight, whatever occupies it. That ‘whatever’ can be total musical silence (in which case the ‘nothing’ is important), but it’s more likely to draw attention to catch phrases in the lyrics, or to the tune’s hook line, or to the skills of the drummer or of another musician, or to a point at which dancers can execute special moves, or freeze on the spot, or wave their arms in the air, etc. Obviously, breaks end when the suspended accompanimental activity is resumed.

Bridges and tails

In music for the moving image, episodic markers are often used to bring affective clarity to visual narrative that might otherwise seem confusing or incongruous. These markers are particularly clear in the short music cues used as bridges between two scenes of disparate character. Imagine an episode of the popular TV soap Dallas (1978-1991) in which the scene changes from [a] burly bronzed men in hard hats as they struggle to plug an oil rig that has blown to [b] the ice blue satin sheets of the blonde prom queen’s bedroom as she attends to her recently shampooed, conditioned and blow-dried tresses with leisurely legato sweeps of a bejewelled hairbrush. Staged in a theatre, there would be a scene change giving the audience time to adapt from oil rig to boudoir but on TV the cut from one to the other is instantaneous to the point of absurdity. A musical bridge lasting a few seconds helps solve this problem of narrative incongruity by first very briefly stating something compatible with the oil rig scene, for example a chordal stab that is held and cross-faded into musical elements compatible with the cosseted bedroom, typically a string pad with rich harmonic texture underscoring a short sexaphone motif. A bridge like that would last no more than a few seconds, ending on a sonority without closure that would fade, leaving the acoustic space open for dialogue and sound effects. Such endings, called tails because they literally tail off into whatever follows them, are episodic markers of non-finality. They serve a sort of to be continued purpose, typically to bring TV viewers back into the narrative after attacks of consumerist propaganda (commercial breaks).

Diataxis (narrative form patterns)

As shown in the commutation of order of events in Abba’s Fernando, in The Beatles’ A Day In The Life, as well as in the discussion of cyclical processes (pp. 390-403), diataxis (= narrative form patterns) can contribute substantially to the meaning of a piece of music. More prosaically but just as importantly, it’s evident from labels like twelve-bar blues, 32-bar evergreen, strophic ballad, verse-refrain pop song, not to mention sonata form, that the sorts of diataxis discussed in Chapter 11 are indicative of particular musical styles and, by extension, of genres to which those styles belong. They can in other words act as style flags.

Style flags

As its label suggests, this third and final main sign type uses particular sounds to identify a particular musical style and often, by connotative extension, the cultural genre to which that musical style belongs. There are two main categories of style flag: those that establish a ‘home’ style or musical idiom —style indicators— and those that refer from inside a home style to a ‘foreign’ style and to the genre associated with that style —genre synecdoches. Style flags of both types use different combinations of different aspects of duration, rhythm, timbre, tonality, spatiality, diataxis, etc. to determine style identity and these combinations often include structures that simultaneously work as anaphones, episodic markers or as both. They let listeners instantaneously know if they’re hearing 1970s disco rather than zouk, rococo chamber music rather than death metal, glitch dub rather than Gregorian plainchant, mbaqanga rather than Muzak, an Elizabethan madrigal rather than a low-church hymn, a rāga performance rather than a romantic pop ballad, a national anthem rather than a tv detective theme, and so on.

As already mentioned (p. 306, ff.), perhaps the most obvious sort of style flag, be it style indicator or genre synecdoche, is instrumental. For example, a string quartet (two violins, viola and cello) indicates ‘string quartet’, not ‘barbershop quartet’ or ‘glitch dub’, just as the standard guitar band line-up (lead, rhythm and bass guitars plus drumkit) indicates a particular type of rock, not a madrigal group or gamelan ensemble. Vocal types or mannerisms can also be clearly indicative of musical style, for example the lead vocalist and his two or three male backing vocalists for doo-wop, or the ‘authentic’ and ‘individual’ vocal persona cultivated by male singer-songwriters performing serious lyrics.

Of course, tonal vocabulary, rhythmic configurations, volume, diataxis and so on are all operative in communicating one style rather than another. For example, music using only a few chords (rarely inverted) but sporting plenty of vocal and instrumental inflection (of particular types) might be regarded as indicative of blues rather than of Viennese classicism, whereas plenty of different chords (frequently inverted) and much less variation in terms of vocal or instrumental inflection might be regarded as indicating Viennese classicism rather than blues.

Style indicator

A style indicator is that type of style flag which contributes to the establishment of a ‘home’ musical style. Style indicators are in other words those aspects of musical structure that state the compositional norms and identity of a given style and that tend to be constant for the duration of an entire piece. They lay down the basis of the ‘home’ style without which genre synecdoches (discussed next) cannot exist and into which the latter must be imported from outside. Apart from the sorts of style flag mentioned in the previous two paragraphs, it’s worth adding that obvious style indicators occur in dance music where characteristic rhythmic configurations (waltz, tango, twist, samba, etc.) for obvious reasons remain constant for the duration of the piece. Episodic markers of propulsion and finality can also act as style indicators. It is, for example, easy to distinguish between the way in which final cadences are milked in a euroclassical symphony and in a stadium rock gig. Similarly, a rapidly rising scalar figure (or a Stamitz-style crescendo, or ‘Mannheim rocket’) is more likely to provide propulsion in a euroclassical symphony than in a rock recording whose propulsive episodic markers are more likely to be articulated as a drum fill on toms leading to a downbeat cymbal crash and kick drum hit.

Genre synecdoche

The second category of style flag is the genre synecdoche [ˈʒɑnrə sɪnˈɛkdəkɪ]. Synecdoche means a figure of speech in which a part stands for the whole (pars pro toto), as in the expression ‘fifty head of cattle’ meaning fifty entire animals, not just their heads. A musical synecdoche is therefore a set of musical structures imported into a musical ‘home’ style that refer to another (different, ‘foreign’, ‘alien’) musical style by citing one or more elements supposed to be typical of that ‘other’ style when heard in the context of the ‘home’ style. By including part of the ‘other’ style, the imported sounds allude not only to that other style in its entirety but also to the complete genre of which that other musical style is but a part.

Herrmann’s murder music for the shower scene in Psycho (fig. 13-9, p. 511), played in a concert or radio context to popular music listeners who don’t recognise the piece, might well be perceived as a genre synecdoche. Why? Well, since it sounds more like, say, a Penderecki composition than a Céline Dion recording, it might produce the style reference modern art music connoting, as genre synecdoche, avantgarde Angst rather than, as composite anaphone, murder by multiple stabbing in a popular horror film.

Genre synecdoches operate in many types of music, not least in well-known works from the euroclassical repertoire. I’m thinking here of a museme stack consisting of a long held note (drone or pedal point) over which a simple melody is played. This museme stack permeates the pastoral symphonies from both J. S. Bach’s Christmas Oratorio (1734) and Handel’s Messiah (1741), as well as the start of Beethoven’s Pastoral Symphony (1808). With its drone and harmonic stasis, this sort of syncrisis was out of place in the urbane musical idiom of euroclassical traditions which featured no droned instruments but plenty of chord changes. That meant it could be used as a device referring from inside the euroclassical idiom to ‘other’ music on the outside, more specifically to the music of Europe’s (then) contemporary agrarian proletariat. That musematic device was in other words not only a style reference from inside a ‘home’ musical style to another on the outside; it also referred, by extension and connotation, to the people, a culture and a lifestyle on the outside. That’s why those familiar at that time with the euroclassical idiom could recognise that other music as ‘country’, ‘peasant’ and ‘pastoral’. It’s also why Bach and Handel found it handy for passages involving ‘shepherds keeping watch over their flocks by night’, and why Beethoven could use it to suggest the ‘awakening of cheerful feelings upon arriving in the country.’

Unlike anaphones, genre synecdoches connote paramusical semantic fields —another place, another time in history, another culture, other sorts of people — not by synaesthetic or structural homology, but by the intermediary of another (‘foreign’) musical style. Since the intermediate ‘foreign’ style is only one part of a larger set of cultural phenomena (way of life, attitudes, perceived environment, clothing, behaviour, etc.) as viewed by the ‘home’ style’s audience, the ‘foreign’ style acts as synecdoche for that larger set of ‘foreign’ phenomena. Put tersely, genre synecdoches contain three stages of semiosis in a connotative chain: [1] from certain sounds considered, rightly or wrongly, as typical for a ‘foreign’ musical style to the totality of that same style; [2] from that style to the genre of which it’s considered to be part; [3] from that genre to the rest of the culture of which that ‘foreign’ style and its concomitant genre are thought to be part.

Much use is made of instrumental or vocal timbre and technique to produce genre synecdoches. Among the more stereotypical Western notions about the geo-ethnic identity of vocal or instrumental sound, many of which have been mentioned earlier, are the following twenty ‘equations’: [1] quena flute, panpipes (zampoñas), charango and bombo drum = the Andes; [2] shakuhachi and koto = Japan; [3] accordéon musette = Paris; [4] tin whistle, uilleann [!Il9n] pipes and keening (caoine) = Ireland; [5] Highland pipes, pibroch (piobaireachd), Scotch snaps = Scotland; [6] steel drums = Trinidad; [7] castanets and flamenco guitar = Spain; [8] mariachi trumpets = Mexico; [9] bouzouki = Greece; [10] balalaika = Russia; [11] ud, ney and darbuka = the Arab world; [12] bottleneck guitar and dobro = the deep rural south of the USA; [13] sitar, shehnai, tablas and adult women with ‘girlie’ voices = India; [14] erhu, pipa, qin, sheng, guan and gong = China; [15] gamelan metallophones = Indonesia; [16] didgeridoo = Aboriginal Australia; [17] women singing semitone dyads = Bulgaria; [18] Inuit competitive song (katajjaq); [19] Mongolian throat singing; [20] Alpine yodelling.

As explained under ‘Mode and connotation’ (p. 332, ff.), tonal vocabulary is also used to produce genre synecdoches. Musicians refer to blues scales, Gypsy scales, Arab scales (including Hijaz and Kurd), medieval ecclesiastical modes, ‘pseudo-Russian’ guitar tuning, Celtic pentatonicism, Chinese pentatonicism, and so on. Perhaps the most familiar example of a different tonal vocabulary to connote ‘others elsewhere’, at least to Northern Europeans and North Americans, is the genre synecdoche linking the phrygian mode with all things Spanish, as in Rimsky-Korsakov’s Capriccio Espagnol (1887), Bizet’s Carmen (1875) and Rodrigo’s Concierto de Aranjuez (1940), including Miles Davis’s version of the piece on Sketches of Spain (1959).

In addition to the frequent use of tonal vocabulary and vocal or instrumental timbre to connote ‘others elsewhere’, anaphones of any type can in fact also double as genre synecdoches. For example, by importing Scotch snaps and pentatonicism into the home style of the euroclassical orchestral tradition, Dvořák was able to produce his ‘Symphony from the New World’ (1893): those imports gave his work a convincing flavour of ‘others elsewhere’ without radically disrupting the symphonic ‘home’ idiom.

Other genre synecdoches occur in pop music. In Abba’s Fernando (1975), for example, you hear quasi-Andean instruments and instrumental techniques, irregular periodicity, an extremely sparse bass line, military figures on snares but no other part of the drumkit, and vocal delivery resembling more recited speech than the strophically regular patterns of pop song (p. 386 ff.). These devices situate the tune’s verses in a verbally and musically foreign there-and-then that contrasts with the here-and-now home territory of the tune’s choruses, complete with their mid-1970s easy-disco backing patterns featuring full drumkit and constant bass riffs, their strophically regular vocal phrases, and their overall regular periodicity (4 × 4 = 16 bars). Of course, the ‘Andean’ elements in Fernando can only be genre synecdoches in the here and now context of Abba’s home style; but they would be style indicators in the original style and genre context from which they were borrowed: Huayno music from Peru, Bolivia or Chile. Whether the style flag is genre synecdoche or style indicator all depends on where ‘home’ is.

Sometimes a genre synecdoche can become a style indicator if it’s repeatedly used over time until it no longer sounds foreign in the home style. For example, from originally connoting things like Hawaii and sunshine in early styles of Country music in the USA, whining steel guitar glissandi gradually became style indicators of mainstream Country music and ceased to operate as genre synecdoches connoting Hawaii or sunshine. Such incorporation of ‘foreign’ elements into a ‘home’ style is of course part and parcel of musical acculturation and of semiotic processes in which music’s meanings and social functions are renegotiated over time from one audience to another under different social, cultural and technological conditions. The distinction between genre synecdoche and style indicator is in other words useful because it lets us discuss musical meaning from a historical perspective.


So what?


The aim of this chapter has been to explain some of the ways in which musical structures can be related to their perceived meanings, i.e. to their connotations, uses and paramusical contexts. In fact that has been the aim of this book in general. The obvious question is whether any of the previous 528 pages have been of any use. Of course, I’d like to think so but there’s really only one way of finding out and that’s to see if it works in everyday practice. That’s why the next and final chapter of this book is devoted to an account of courses I ran from 1993 to 2010 about the meanings of the sort of music people in the contemporary urban West hear more than any other. I mean all that invisible music we hear in conjunction with the moving image.

Fig. 13-5 (a, b). Morales (c.1568): Virgin and Child (Prado)




14. Analysing film music

Invisible music

About half the music Westerners hear on average every day accompanies moving images. The vast majority of that music is invisible in the sense that we don’t see anybody actually making the sounds we hear. If you also consider music for religious and other ritual functions, for the stage, for dancing, on the radio, on smartphones, through speakers in bars, cars, cafés, trains, planes, shopping centres etc., it becomes obvious that most music, not just most music heard in conjunction with moving images, is invisible.

If music in everyday life is overwhelmingly invisible for the majority of those who hear and use it, it’s not unreasonable to ask why music studies have been so dominated by visible music making, mostly in terms of performance, less commonly (and less visibly) as composition, arrangement or recording. That anomaly is partly explained in Chapter 3’s deconstruction of the absolute music aesthetic, but it also relates to issues, discussed in Chapter 6, about music education’s lopsided concern for poïesis at the expense of aesthesis. While it’s obvious that music making cannot exist without being able to see and/or feel ‘where the notes are’, the actual music-making process (poïesis) is as a rule visibly absent at the moment of musical perception (aesthesis). True, the effects of the music may well be tactile and/or visual —dancing, goose pimples, tears, synaesthetic connotations, etc.— but these aesthesic aspects of touch and sight cannot be equated with those involved in music making. It’s from this perspective that analysing music used in connection with moving images becomes an important field of study, not just because music on TV or in films and games reaches our ears and brains more than does music from any other source but also because the music’s meanings cannot be fully understood without considering the audiovisual events with which the music co-occurs.

Another important reason for studying music and the moving image is that it’s a subject of equal interest to those with and those without formal training in music. For example, although my job was to teach in university schools of music, a significant number of students on my Music and Moving Image courses came from subjects like cinema and communication studies. This mixture of musos and non-musos on the same course has three distinct advantages.

[1] It rhymes better with the reality of audiovisual production where musical experts (composer, music editor, music supervisor, etc.) and others (director, producer, etc.) have to collaborate.

[2] Musos have to learn how to talk about their ideas in ways that non-musos can understand and to try and decipher what non-muso collaborators want by way of music.

[3] Non-musos who want to work in the audiovisual sphere have to rely on their own aesthesic competence in music and learn how to give composers and music editors a coherent and comprehensible brief.

These three points further imply that:

[4] Musos have to learn the rudiments of cinematographic terms and practices, while non-musos have to plunge into the weird world of musical thought.

[5] Musos have to learn that music serves other purposes than those extolled by their music-making peers, that their art communicates more than just itself, and that visual narrative rarely aligns squarely with musical patterns of change, continuation and finality.

[6] Non-musos need to stop claiming they are unmusical and to learn to trust their own ears as well as eyes. They also need to understand something of how musicians tend to think and act and to grasp the potential of music’s main parameters of expression (Chapters 8-12).

[7] As shown in Chapter 7 (pp. 256-260), anyone, muso or non-muso, can, since the advent of real-time counters, unequivocally designate, to the nearest second, any musical structure by referring to the timecode at which it starts in a digital recording. Non-muso unfamiliarity with the poïetic denotors of conventional music theory is in other words no longer an excuse for avoiding structural denotation in music.

[8] Both musos and non-musos involved in audiovisual production need to have realistic notions of what music can and cannot do, and of how it can communicate things other than itself.

[9] Both musos and non-musos need to be aware that their own musical experience is not necessarily that of their prospective audience.

Since these nine points derive substantially from teaching experience, the rest of this chapter is devoted to explaining key elements in the course Music and the Moving Image. Particular attention will be paid to the hands-on analysis all students, muso and non-muso alike, have had to do by way of coursework.

Course description

Music and the Moving Image is the name of a single-semester undergraduate course I taught between 1993 and 2009. It most recently involved thirteen effective teaching weeks of three-hour sessions and an average class size of forty-four students (Table 14-1, p. 532).

Table 14-1: Music and the Moving Image course overview

1 G Intro: aims, content, tasks, evaluation, admin., etc.

” Discussion of music in two clips (pp. 542-543).

— 1. Film music’s functions (p. 546, ff.). 2. Semiotic analysis method (Ch. 5-13).

— Cue list and analysis assignment explanations (pp. 558-576).

2 ” Discussion of feature film choices for final assignment (p. 558).

— Origins of silent film music in Euro-classical repertoire (p. 534, ff.).

G Explanation of silent film group work (see p. 543, ff.).

3 ” Discussion of feature film choices for final assignment (p. 558).

˜ Group work: silent film and modern mood category comparison (p. 543, ff.).

4 ” Discussion of feature film choices for final assignment (p. 558).

— Film music during the silent era; arrival of the talkies; click tracks.

˜ Group work (continued; p. 543, ff.).

5 G Deadline for choice of feature film to analyse

˜ Silent film group work presentations 1-8.

6 ” Silent film group work presentations 9-11.

— Final assignment explanations (pp. 558-576).

7 — ‘Classical’ Hollywood film scores: Steiner, Tiomkin, Rózsa, etc.

— ‘New’ styles & stereotypes: cowboys, PIs, men and women in film music.

8 — Postwar additions to the film music palette: folk, jazz, rock, ‘world’, etc.

— Digital technology (incl. SMPTE), synthesisers, etc.

˜ Group supervision of work with cue sheet and graphic score.

9 ˜ Feedback sessions 1-3 (plenary) (pp. 562-564).

˜ Feedback sessions 4-8 and 9-13 (two classrooms; pp. 562-564)

10 ˜ Feedback sessions 14-21 and 22-29

11 ˜ Feedback sessions 30-37 and 38-44

12 ˜ Overspill for feedback sessions, supervision of final assignment, etc.

— Guest lecture 1: film music composer

13 G Deadline: submission of final assignment Cue list and analysis.

— Guest lecture 2: games audio expert


Music and the Moving Image contains two interrelated elements: [1] theory and history; [2] analysis. With the exception of the two guest lectures (weeks 12-13 in Table 14-1), theory and history are mainly dealt with in weeks 1, 2, 4, 7 and 8 while analysis elements occur throughout the course, occupying the entirety of sessions 3, 5, 6 and 9-11. The course’s main theoretical elements are: [1] the functions of film music (week 1; p. 546, ff.); [2] rudimentary semiotic method for film music analysis (week 1, based on Chapters 4-13 in this book); [3] cinematographic and film-musical terminology (reading, weeks 1-4; p. 552, ff.). These elements are introduced in direct connection with audio or audiovisual examples presented in class. The historical elements are: [1] origins of film music (week 2; p. 534, ff.); [2] music for silent film, including the comparative analysis carried out in groups (weeks 2-6; see p. 543, ff.); [3] ‘classic’ Hollywood film scores (week 7); [4] postwar film scores (week 8). This aspect of the course includes mandatory historical reading, plus obligatory viewing and listening repertoires.

The course focuses mainly on feature film music rather than on music for TV or games for the following four practical reasons.

[1] The feature film has the longest history of any extant type of recorded audiovisual production in which music plays an important part. It’s also the type of recorded production with which most people are most familiar.

[2] There is more literature of a serious nature about film music than about, for example, music for television or computer games.

[3] Legal recordings of feature films are in general easier to acquire than recordings of TV programming, adverts, short films, etc.

[4] Since video game narrative unfolds according to the skill of individual players, its music has to adapt to each player’s position in the game and to his/her relative speed in reacting to each situation where musical change may be appropriate. Since adaptive music conceived for such conditions cannot be mapped as fixed sync points on the immutable time line of a film or TV production, the composition and analysis of games music demand skills and practices that cannot be easily included in a single-semester course containing ‘hands-on’ elements.

Before the analysis


A basic history of film music is included in the course for several reasons.

[1] Technological developments (cue sheets, click tracks, SMPTE, timecode, etc.) highlight: [i] the centrality of synchronisation in the provision of music for the moving image; [ii] radical changes in the treatment of relationships between dialogue, sound effects and music (from one-channel mono and through-composed scores to multi-channel audio mixes, so to speak).

[2] Studying stylistic change provides insights into how film music’s range of idioms expands to include sounds appropriate to new sorts of popular narrative (Westerns, detectives, modern social drama, science fiction, stories set in unfamiliar locations, etc.). Style history also gives recurrent evidence of the need to use words to refer to music in ways that directors and producers, not just musicians, understand (silent film music compilations, library music categories, etc.).

[3] By hearing/viewing and talking about examples from the historical repertoire, students learn ways of discussing music and the moving image that are directly applicable to their analysis work (p. 562, ff.).

Although I cannot, for reasons of space and clarity, discuss the historical part of the course in any detail, one period is of particular importance to the analysis work that is the main focus of this chapter and to the topic of this book in general: the origins of film music.

Origins of film music

Two basic questions arise at the start of any history of film music. [1] Why did silent films need music? [2] Why, at least in North America and Europe, did music for silent film draw so much on one musical tradition and so little on others?

The usual answer to question 1 is that music was used to mask unwanted noise from the projector, from the audience and, in the non-soundproofed venues where films were often shown in the early years of cinema, from the street outside. This is certainly one feasible explanation for music being played at films shown in penny arcades, nickelodeons, fairgrounds and, in general, for cinema as working-class entertainment before 1910 or so; but it doesn’t explain why the practice continued after projectors were boxed off from the auditorium in picture houses with a foyer separating the audience from the street outside. While it’s possible that the practice may to some extent have continued into the 1910s and 1920s out of force of habit, a more likely explanation is that seeing just movement without any accompanying sound at all is simply unreal. It can even be quite upsetting. Just imagine a bad dream in which no sound comes out of your mouth even though you’re yelling at the top of your voice, or in which you’re trying to run away from a tsunami, an avalanche, a pyroclastic flow or some other loud threat you see but cannot hear bearing down on you. Little wonder, then, that the Lumière Brothers’ silent short Arrival of a Train at La Ciotat (1895) scared some of the audience, not just because a large locomotive came towards them in 2D black-and-white but also because it approached them without its sound. At the same time, the passengers and station staff bustling about on the platform were all silent too, unreal, ghostlike and sonically disembodied.

If you think I’m overstating the case for sound as intrinsic to energy and movement please consider the following pieces of circumstantial evidence: [1] ‘what a noise!’ translates into Swedish as ‘vilket liv!’ which literally means ‘what a life!’ in the sense of ‘what a noise and/or commotion!’, implying that sound, motion and energy are intrinsic to life; [2] motion occupies most of the words commotion and emotion, the latter linked to being moved by, for example, the sounds of music; [3] clamour, uproar, racket and hubbub (sound), as well as agitation, bustle, tumult and turbulence (motion), are all synonyms for commotion. These points might help explain why sound was necessary in the early days of cinema but not why the sound should be music. The simple answer is that music was cheaper and the only practical alternative at the time.

Before the general spread of electrically amplified sound technology in the late 1920s, dialogue, sound effects and ambience would either have had to be recorded and played back without amplification or produced live at every showing of every film in every cinema. The latter can be ruled out straight away because the amount of staff and equipment required to dub dialogue and to convincingly produce an infinite number of different sounds was never a financial or logistical option; nor were the newfangled 78 rpm audio discs because they had no bass register and because playback was unamplified. If the Lumière brothers’ large steam locomotive had been shown accompanied by an unamplified acoustic recording of the actual sound it had made when shot to celluloid, the cinema audience might have heard some faint hissing and distant clinking but the massive engine’s rumbling and clanking would have already been filtered out with all the other bass frequencies at the recording stage. The visual effect of the approaching locomotive’s power, size and volume and its presence close to the audience’s eyes would in other words have been contradicted by their ears. Moreover, since multi-channel mixing on to audio tape was not in widespread use until much later and stereo uncommon before the 1960s, it was impossible to acoustically stage any scene convincingly so that, for example, the ‘smaller’ sounds audible on the La Ciotat station platform when the visuals were shot (general bustle, passengers talking, station staff shouting, etc.) could be heard in the right place of the audio picture, if they were audible at all on the recording. Add to that the fact that selective audio focus —recreating the cocktail party effect, making an actor’s inner thoughts audible, even a simple voice-over — was out of the question and it’s clear that the camera’s and projector’s ability to produce a convincing simulacrum of visual reality had no sonic counterpart until decades later. 
And even if none of the technological and expressive restrictions just mentioned had applied, film distribution companies would have had to include twenty brittle and scratchable 78 rpm discs with the two or three reels of celluloid they circulated to cinemas for each hour-long film. Such hypothetical ‘solutions’ would in their turn assume that projectors and record turntables ran at a fixed rate in every cinema to ensure that sound and image were in sync, a technology unavailable before 1926 and which was replaced soon after by the more reliable and less cumbersome system of optical sound. Meanwhile it was music that offered the easiest solution to the silent film’s dilemma of sonic disembodiment.

Music could at least temporarily solve the synchronisation problem because it has its own logic of temporal and kinetic narrative independent of the way in which visual events unfold. Music’s sonic logic can in other words override the need to hear the audible counterpart to every potentially audible event or situation on film: neither lip-sync-ed speech, nor Foleys, nor even the ambient soundscape has to be present if music is played. As long as suitable music started on cue in the silent film era, it could be played for the duration of an entire scene until the general mood, atmosphere or location changed and different music was required. Moreover, since the music was played live, it was by definition heard in hi-fi, variable enough in volume and adaptable enough in all its other parameters of expression to accompany large locomotives, bustling crowds, pastoral idylls and intimate love scenes. What sort of repertoire could silent film musicians draw on to provide such a wide variety of ‘suitable musics’?

A quick glance through any extant cue sheet from the 1910s or any collection of silent film music from the early 1920s shows that nineteenth-century euroclassical music dominates the repertoire. For example, Ernő Rapée’s Motion Picture Moods for Pianists and Organists (1924) contains 19 pieces by Grieg, 12 by Mendelssohn, 5 each by Beethoven and Bizet, 4 by Schubert, another dozen or two by the likes of Brahms, Dvořák, Schumann, J. Strauss, Tchaikovsky and Wagner, plus a significant number of pieces in the same vein by less well-known figures. There are at least three reasons for the dominance of nineteenth-century euroclassical music in the silent film era.

1. It was, at least in both Europe and North America, the most widespread and interculturally viable of any musical tradition available at the time.

2. It was a musical tradition with well-established practices for use together with a wide variety of paramusical forms of expression.

3. Its use of notation enabled silent film musicians to synchronise their performance with what was shown on screen.

Reasons two and three need some elaboration.

The most obvious ‘classical’ forerunners to film music are found in opera, ballet and music for the theatre. Here was a living tradition of invisible instrumental music accompanying stage action. It was music that could be easily adapted for use with silent film. Tone poems and other types of programme music provided another source, as did the piano parts of parlour song whose musical ideas had to fit the lyrics. The euroclassical tradition’s popular ‘character pieces’ were also useful since they were explicitly linked to extramusical phenomena and were already written for the piano. Moreover, no other repertoire was at that time as interculturally viable as music in the euroclassical tradition, and no other repertoire had the European tradition’s system of physically storing essential aspects of its sound (staff notation) so that musicians could retrieve and perform a much wider range of music than they could possibly memorise.

Reason number three, notation as a prerequisite for satisfactory synchronisation, is even more prosaic. Let’s say we need music to cover visual footage lasting forty seconds and that we have a suitable musical extract occupying 16 bars of 4/4 time (64 beats in all) that sounds good played at 92 bpm. Using a metronome and simple arithmetic we can calculate the duration of those 16 bars by first converting the tempo of 92 beats per minute into seconds per beat [60÷92 = 0.652174], then by multiplying that figure by the number of beats (64) in our musical extract [0.652174×64 = 41.74]. Since 41.74 seconds is almost two seconds too long for the 40-second footage we want our music to fit, we’ll either have to play the piece a little quicker or use simple arithmetic again to calculate the right metronome tempo for our 16 bars of music. Dividing the 64 beats of our piece by the duration (0:40) of the visual extract it has to fit (64÷40) means we’ll need to perform the piece at 1.6 beats per second, i.e. at 96 beats per minute (1.6×60), to make our music fit the footage.
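The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function names are my own labels and not part of any established tool.

```python
def cue_duration_seconds(beats: int, bpm: float) -> float:
    """Duration in seconds of a passage of `beats` beats played at `bpm`."""
    seconds_per_beat = 60.0 / bpm
    return beats * seconds_per_beat


def tempo_to_fit(beats: int, footage_seconds: float) -> float:
    """Metronome marking (bpm) at which `beats` beats last `footage_seconds`."""
    beats_per_second = beats / footage_seconds
    return beats_per_second * 60.0


beats = 16 * 4  # 16 bars of 4/4 time

# How long do the 16 bars last at the tempo that 'sounds good'?
print(round(cue_duration_seconds(beats, 92), 2))  # 41.74

# What tempo makes the same 16 bars fit 40 seconds of footage?
print(tempo_to_fit(beats, 40))  # 96.0
```

The two functions are inverses of each other: the first turns a tempo into a duration, the second turns a required duration back into a tempo.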

This way of aligning musical cues to fit exactly with visual events became even more important with the advent of the talkies because no longer was a single film subjected to as many variations of music plus image as there were cinemas. Instead one single product was distributed to every cinema and it was obviously desirable that the centrally produced combination of music and moving image be as convincingly synchronised as possible. It was to this end that the system of click tracks, expressed as beats per minute and film frames per click, was developed. It was a synchronisation system that remained in use until the 1980s. With the exception of music scenes, title sequences and certain types of animated film, the fact that music had as a rule to adapt to visual events rather than vice versa demanded that its synchronisation be planned in the minutest detail. With musical notation as the only viable editing and storage technology allowing for such planning it’s hardly surprising that, between roughly 1930 and 1980, film composers were almost exclusively classically trained, with advanced skills in orchestration, arrangement, composition and conducting. And that is of course why, in the early days of the talking film, Hollywood head-hunted Europeans like Korngold (praised by Mahler), Steiner (pupil of Brahms and Mahler) and Tiomkin (pupil of Glazunov) to score films like Robin Hood (Korngold, 1938), King Kong (Steiner, 1933) and Lost Horizon (Tiomkin, 1937).
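The relation between the two click-track units mentioned above, beats per minute and film frames per click, can be sketched as follows. The sketch assumes the standard sound-film rate of 24 frames per second, a figure not stated in the text, and the function names are hypothetical.

```python
FRAMES_PER_SECOND = 24  # assumption: standard sound-film projection rate


def frames_per_click(bpm: float, fps: float = FRAMES_PER_SECOND) -> float:
    """Number of film frames between two successive clicks at `bpm`."""
    frames_per_minute = fps * 60
    return frames_per_minute / bpm


def bpm_from_frames(frames: float, fps: float = FRAMES_PER_SECOND) -> float:
    """Tempo in beats per minute of a click track with `frames` frames per click."""
    return fps * 60 / frames


# The 96 bpm needed in the forty-second example above:
print(frames_per_click(96))  # 15.0

# A click every 12 frames:
print(bpm_from_frames(12))  # 120.0
```

Under that assumption a click every 15 frames gives exactly the 96 bpm required in the forty-second example, so the 64 beats would occupy 960 frames of film.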

The main lessons to be learnt from this short excursion into the origins of film music are:

1. Sound is experientially intrinsic to movement. Visible movement without sound can come across as unrealistic and disembodied.

2. During the silent film era music was the only viable solution to the problem of soundless visible movement, not just for technical and logistical reasons but also because music’s own narrative logic can for short periods of time override that of the visuals.

3. Synchronisation of music’s entry, exit or radical alteration with key points in the visual narrative, especially with the starts and ends of scenes, is essential to the overall narrative credibility of any film.

4. The nineteenth-century euroclassical tradition was the most viable and widely used source for music capable, thanks to staff notation, of providing reliable synchronisation with the visuals. It also provided the only interculturally viable repertoire of ‘invisible’ instrumental music with well-established links to visual action.

5. The fact that film has relied so heavily on music in the euroclassical tradition squarely contradicts the notions of absolute music deconstructed in Chapter 3.

Introducing concepts

Despite the need to explain basic theory and method in the first few weeks of the course (the functions of film music and the semiotic analysis method in week 1, the origins of film music in week 2, etc.), it’s essential to confront students from the outset with concrete examples of music’s ability to tell us what the pictures on their own cannot, and to start identifying the sort of thing music actually can communicate to audiences. It’s with this goal in mind that I resort to two well-tried ‘cheap tricks’ at the start of the course.

Cheap trick no. 1: musical commutation

As soon as initial administrative tasks are out of the way, I play two short clips to the class. The first one consists of the same 30-second visual sequence played four times in succession: [i] without any sound to allow for a musically ‘unbiased’ view of the visuals; [ii] with the original music, in this case the pastoral signature tune to an interminable UK soap; [iii] with a loop from a library music track that sync-ed well with the visuals and was characterised by record label staff as ominous and agitated; [iv] with the looped instrumental intro to an up-tempo Deep Purple track which sync-ed abominably with the visuals. Students are often surprised by the radical difference of narrative between versions [ii] and [iii]. What was calm and peaceful in version [ii] comes across as eerie, desolate and threatening in version [iii]; what was an idyll without stress turns into the aftermath of a killer virus; what was in version [ii] a small car driven by a little old lady at a leisurely pace past the village green to the pretty stone farmhouse on the green hillside becomes in version [iii] a serial killer who, driving with an evil grin of determination, is transporting human body parts back to his necrophiliac stash in the insanitary cellar of his dark, dank isolated hideout. Version [iv], the up-tempo rock loop, usually provokes merriment because it’s so obviously inappropriate, not just in terms of the frontal cultural collision between its own connotations and those of the visual subject matter it accompanies but also in terms of its tempo, its surface rate, its short, square periodicity and boisterous bounce, none of which matches the long, slow, gliding helicopter sweeps and the smooth, soft cross-fades of the visual idiom.
The point is that by viewing this clip and by registering the differences of narrative caused by differences of music and nothing else, attention is drawn to the central importance of synchronisation and of understanding the potential of music’s parameters of expression (Chapter 8) in mediating the differences observed. The discussion also provokes the need for a systematic understanding of music’s various functions in connection with moving images.

The second ‘week-one’ example I used during the last few years of teaching the course is a six-minute clip (0:06:14 to 0:12:21) from American Beauty (1999). It exemplifies virtually every film music function in the book (see p. 546, ff.), including source music that is so much more than merely diegetic (0:06:47, at the dinner table), psychological underscore of considerable poignancy (0:09:13, in the kitchen) and plenty of representing place and underlining movement (0:10:44) as the Annette Bening character cleans the house she hopes to sell. Unfortunately, if I start writing about the wealth of meaning mediated in any of those three cues, for example about the sunny, carefree, corporate marimba sounds in relation to the misery of desirable properties (many of which are probably now foreclosed) in the soulless suburbs of the great American Dream, I’ll never finish this book.
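Since the clip and its cues are designated by timecode, their positions relative to the start of the clip are easy to work out. The following is a minimal sketch, assuming timecodes written as h:mm:ss as in the paragraph above; the helper names and the short cue labels are my own.

```python
def timecode_to_seconds(timecode: str) -> int:
    """Convert an 'h:mm:ss' timecode into a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_timecode(total: int) -> str:
    """Convert a whole number of seconds back into 'h:mm:ss'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


clip_start = timecode_to_seconds("0:06:14")
clip_end = timecode_to_seconds("0:12:21")
print(seconds_to_timecode(clip_end - clip_start))  # 0:06:07

# Cue start points quoted in the text, with hypothetical short labels.
cues = {
    "source music (dinner table)": "0:06:47",
    "psychological underscore (kitchen)": "0:09:13",
    "place and movement (cleaning the house)": "0:10:44",
}
for label, timecode in cues.items():
    offset = timecode_to_seconds(timecode) - clip_start
    print(f"{label}: {offset} seconds into the clip")
```

The three cues start 33, 179 and 270 seconds into the clip respectively.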

Cheap trick no. 2: the silent film music group assignment

The second trick is to involve students as early as possible in actual coursework. This element of the course, described below, involves access to Rapée’s Motion Picture Moods for Pianists and Organists (1924, see Figure 14-1, p. 544) in a library with photocopying facilities, dividing the class into groups of four, and the participation of at least one notationally literate keyboard-playing student in each group. This group work, called Musical mood comparison between silent film and recent feature film, runs roughly as follows.

Fig. 14-1. Motion Picture Moods for Pianists and Organists (Rapée, 1924: 10), showing mood categories in left margin.

Each group chooses at least one of the musical mood categories listed in Rapée (e.g. children, comedy, love, neutral, pastoral, Western, etc., as enumerated on the left of Fig. 14-1) and photocopies the pages of notation covering that mood (e.g. pp. 10-20 for battle, including Agitato no. 3). A group member with some piano skills then either plays representative extracts of the silent film mood live for the other group members or records them to an editable MIDI or audio file using a piano preset and appropriate audio software. Group members then list their impressions, paying particular attention to musical structures they consider typical for their chosen silent film music mood, and noting anything they find surprising, for example (typically) that horror music from 1924 doesn’t sound very horrific to our ears.

The second phase of the project entails scouring group members’ DVD collections in search of music from recent films containing scenes that can be characterised using whichever of Rapée’s musical mood categories was chosen for the project (action, children, comedy, love, neutral, pastoral, Western, etc.). The final stage of preparation involves describing the music heard in connection with the relevant scenes in the recent films and comparing that with music for scenes labelled in a similar way back in 1924.

This entire group project is presented in class. Group members are expected to have managed their own internal division of labour so that the following tasks are distributed equitably: [1] playing and recording the pages from Rapée; [2] structural description of music in Rapée; [3] structural description of music in the ‘recent films’; [4] description of other audiovisual aspects in the ‘recent films’; [5] formulation of comments and conclusions; [6] organising the presentation in class. Standards of presentation usually range from acceptable to excellent, some ambitious in their mode of presentation (e.g. commutation of music for the same visual footage or vice versa), others more in terms of structural detail and conclusions drawn.

Apart from inevitable comments like ‘they only had piano or organ back then’, this project can produce interesting insights. For example, several groups described how recent film-musical notions of children and love differ from those in circulation in 1924. Another group concluded that neutral music is never neutral because it has to be dynamic enough to avoid sounding static, and that music neutral in one context will inevitably sound culturally specific in another. The group work sketched above focuses in other words on musical difference over time and on how changes of attitude towards paramusical phenomena are expressed in music. It provides essential basic insights to which we can refer later in the course when discussing the mediation of ideology through film music in relation to, say, Africa, the American dream, crime and its detection, death, the English, the jungle, Native Americans, nature itself, peace, war, the Wild West and women.

The two ‘cheap tricks’ just described don’t just kick-start involvement in the course by exploiting real issues of musical meaning; nor is the sole function of the group work to create a better social environment for everyone involved in the course. Both ‘tricks’ also create a need for getting to grips with the functions of music in connection with the moving image.

Lissa’s film music functions

In The Aesthetics of Film Music, Polish musicologist Zofia Lissa (1965: 115-256) discusses the basic functions of film music. I’ve freely translated and adapted her function labels as shown in Table 14-2 (p. 547).

[1] Emphasising movement mainly involves the use of kinetic and tactile anaphones (p. 498, ff.), i.e. music relatable to verbs like run, rush, stress, bustle, gallop, stroll, drag, push, pull, jump, ascend, climb, descend, fall, approach, leave, pass [by], relax, wave, sway, swell, shrink, spin, fly, hover, caress, hit, stab, cut, stretch, open, close, flicker, stay [motionless], and to adverbs like towards, away from, over/across, to-and-fro, quickly, slowly, calmly, roughly, smoothly, jerkily and so on. The relevant movement may or may not be visible on screen.

Table 14-2: Lissa’s ten basic film music functions

1. Emphasising movement | Unterstreichung von Bewegungen
2. Stylisation of real sounds | Stilisierung realer Geräusche
3. Representing place/space/location | Repräsentation des dargestellten Raums
4. Representing time (day/history) etc. | Repräsentation der dargestellten Zeit
5. Commentary (≈ ‘counterpoint’) | Kommentar
6. Source music (diegetic) | Musik in natürlichen Rolle
7. Expressing psychological experiences | Ausdrucksmittel psychischer Erlebnisse
8. Providing empathy | Grundlage der Einfühlung
9. Anticipation of subsequent action | Antizipierung des Handlungsinhalts
10. Enhancement and demarcation of formal structure | Musik als formal einender Faktor


[2] Stylisation of real sounds involves the use of sonic anaphones (p. 487), i.e. the stylised musical expression of sounds, actual or potential, in the ‘reality’ of the concurrent visual narrative. Sonic anaphones, very common in the era of the silent film, in early talkies and in some types of animated film, have largely been replaced by sound effects (Foleys, atmoses, etc.) but it’s still possible to hear feature film scores using sonic anaphones of machines, screams, sighs, laughter, etc.

[3] Representing place has two main aspects: [3a] mediating a particular sense of space irrespective of cultural considerations; [3b] representing a culturally defined location. Function 3a is closely linked to function 1 in that a culturally non-specific space —open, closed, large, small, high, low, crowded, empty, indoors, outdoors, etc.— inevitably implies movement —large, wide and sweeping gestures as opposed to small, cramped and restrained ones, for example. Function 3b usually involves the use of some sort of style indicator or genre synecdoche to connote a culturally generic or specific location. Examples of culturally generic locations might range from Western notions of the tundra via pastoral rurality and small market towns to the hubbub of modern metropoles, while culturally specific locations are often connoted using geo- or ethno-musical stereotypes, for example koto or shakuhachi playing in traditional modes for rural Japan in bygone days, or accordéon musette playing a cheery waltz for proletarian Paris in the inter-war years.

[4] Representing time has also, like function 3, two aspects: [4a] connoting a historical period; [4b] suggesting a time of day. Function 4a is very similar to 3b, as the phrases ‘rural Japan in bygone days’ and ‘Paris in the inter-war years’ suggest. It also involves the use of some sort of archetypal style indicator or genre synecdoche to establish the period, for example Gregorian plainchant or a Carmina burana pastiche for medieval mystery, non-organic electro-acoustic sounds for some types of science fiction. Function 4b, on the other hand, could be easily subsumed under function 1 since times of the day tend to link musically with types of activity or states of mind seen as typical for the hour in question. For example, Respighi’s ‘The Trevi Fountain at Midday’ (loud, bustling and bright) is quite different to his ‘The Villa Medici Fountain at Sunset’ (calm, darker) not least in terms of tempo, surface rate, register and volume, i.e. in terms of movement, energy and space.

[5] Lissa’s Comment function, a.k.a. counterpoint, entails the use of music to comment upon the images so as to create a distancing effect which, in the theatre and according to Brecht, ‘prevents the audience from losing itself passively and completely in the character created by the actor, and which consequently leads the audience to be a consciously critical observer’. In the cinema the risk is more likely that viewers/listeners become so engrossed in the complete media package with all its well-established congruences of sound and vision that they never get the chance to ‘back off’ and reflect on the values of what they hear and see. Film-musical comment or counterpoint can be used to create a distancing effect by contradicting the connotative sphere of the visuals, as Kubrick did at the end of Dr Strangelove (1964), setting Vera Lynn’s mellifluous rendering of We’ll Meet Again (1939) to the dropping of doomsday bombs. Conversely, the agitated and ominous music used in Cheap trick 1 (p. 542) to underscore idyllic images of rural bliss always provokes basic questions of the type ‘what’s wrong with this picture?’. In both cases the musical comment or counterpoint obliges the audience to step back from what they’re seeing and to reflect consciously on what’s actually happening underneath the visuals.

[6] Lissa’s category Real music situations is usually referred to as source music or diegetic music. Source music could be the best term to use for three reasons: [i] the real in real music situations is problematic; [ii] source is three syllables shorter than diegetic; [iii] source music is what the phenomenon is called in Hollywood production circles. However, diegetic is also a good word because: [i] it’s the standard term used by academics; [ii] unlike source music it has a useful antonym: non-diegetic; [iii] Anglophones using no rhotic vowels can avoid jokes about ‘sauce music’ (HP, Worcestershire, soy, wasabi, béarnaise, tabasco, etc.).

Diegesis (διήγησις) is Greek for ‘narrative’ (noun) and diegetic qualifies anything that belongs, literally or by inference, to the film’s narrated story, i.e. to the ‘reality’ supposed or proposed in the film. The term can be exemplified as follows:

• Two adjoining studio sets can represent locations thousands of kilometres apart in diegetic space.

• Two or more actors, for example a star and a stunt double, can successively play the same diegetic character.

Diegetic music simply means music whose source is justified by the film’s visual narrative (hence the term source music). Non-diegetic music is film music whose source is not motivated within the film’s visual narrative. Most underscore and title music is non-diegetic.

Source music or diegetic music can be thought of as music audible to (hearing) characters (if any) in the scene where it occurs. The sounding source of the diegetic music may be visible on screen —a marching band, a karaoke bar, a parent singing a lullaby, a concert, a church organ and congregation, etc.— but it might just as well be invisible —a car radio, Muzak in an airport or shopping mall, an inconsiderate neighbour’s sound system on full blast, etc.

[7] Expressing psychological experiences entails the use of music to communicate the affective state of an on-screen character. Imagine, for example, a neutral shot of the heroine reading a letter with an expressionless face, normal body posture and without any gestures but with horror music as underscore. Only the music tells the audience the state of shock or terror she experiences on receiving such bad news.

[8] Providing empathy involves the use of music to communicate a certain set of emotions that may or may not be the same as that supposedly experienced by an on-screen character. For example, imagine exactly the same scene and same music as in function 7, except that this time it’s the psychopathic serial killer, not the heroine, we see reading the letter. Once again the horror music underscore tells us (the audience) that something awful is going to happen but in this case the letter brings good news to its reader, perhaps the opportunity for temporary psychological release through the fulfilment of some indescribably perverse act of violence. That’s not good news for the killer’s victims, nor for the other people with whom we, the audience, hopefully identify. The horror music helps us to empathise with their horror rather than with the psychopath’s feelings of excitement or relief.

[9] Anticipation of subsequent action is virtually self-explanatory. Just as a pleasurable experience can be enhanced by preparing for it, a really nasty event can be made even worse if you have premonitions of the imminent horror. That’s why presenting a mood of threat while the visuals are still quite pleasant or neutral just before they cut to the foul deed itself can increase the effect of horror when it arrives on screen.

[10] Enhancement and demarcation of the film’s formal structure can be divided into two parts: [10a] themes and motifs; [10b] episodic markers.

Themes and motifs are usually quite melodic and serve mainly to identify characters, moods and environments recurring, not necessarily on screen, throughout a film, thereby helping to make the narrative more coherent. For example, if we hear the film’s warm love theme as the wounded hero is shown dying alone in the cold mud of a World War I trench we may be experiencing a bit of function 5 (comment or counterpoint), but by hearing a theme previously linked to ‘characters, moods and environments’ not currently presented on screen, it’s easier for us to make emotional sense of an important part of the film’s overall narrative.

Themes are more substantial and melodically more extensive than motifs which can be very short and may even consist of no more than just a rhythm or a sonority. A leitmotif may be associated with a person, place or idea but since it can be subjected to harmonic, rhythmic and orchestral modification, it’s not necessarily linked to a particular mood or emotion. Adapting the same basic leitmotif to different moods is one way of bringing structural cohesion to film scores that have to cover a wide range of atmospheres, locations and situations.

Episodic markers consist of openings, links and bridges, tails and endings. Openings (‘something new starts now’) and endings (‘that’s the end of that’) need no explanation. Links and bridges are short music cues that bridge two scenes. They are particularly useful when the two scenes are of a disparate character since music can quickly but seamlessly join one mood to another. Tails are snippets of music, often after a change of scene or at the end of a bridge, that set the mood of the new scene and tail off, often on an unresolved sonority signalling that the narrative will continue and leaving the acoustic space open for dialogue and sound effects.

Please note that the functions just listed aren’t mutually exclusive. For example, imagine some fictional footage from the 1970s of a female fashion model slinking around her sumptuous penthouse apartment in a silk nightgown to the accompaniment of a smooth sounding bossa nova album she has just put on the turntable. We could be hearing film music functioning in any or all of the following ways.

• It’s diegetic music (source music, function 6) because we saw her put the vinyl disc on the turntable and flick the start lever.

• It could be underscoring how we, the audience, are supposed to feel about her (function 8): luxurious, desirable, etc.

• As we see her do her make-up in the mirror, the music tells us a bit about how she may be feeling herself (function 7): laid-back, cosseted, smooth, sexy, sophisticated, etc.

• Since the music emphasises lazy slithering, sliding or caressing rather than, say, angular or energetic battering, clodhopping or head banging, there is also an element of function 1.

• The music tells us that we are far more likely to be in a luxurious North American urban penthouse, or possibly drinking cocktails under Martini parasols with the Ipanema élite rather than in the neighbouring favela of Rocinha; nor are we in a Scunthorpe scrap yard, nor the Gobi desert, nor halfway to Mars. Function 3 is therefore also in operation.

• Depending on what sort of bossa nova is being played, the music might also be a comment (function 5) communicating that what we see is really a bit old-hat: she might be feeling fine swanning round her apartment but there’s something in her choice of music possibly suggesting that she could be a bit older than she tries to look.

Other useful concepts

Apart from insights into the origins and functions of film music, students also need, before embarking on their analysis work, to be familiar with the most common terms used in cinematography and in film music production. There’s no room here to list, let alone explain, the cinematographical terms that students need to know and readers are referred to any of the numerous glossaries available on line to check the meaning of concepts like dolly, crane, boom, pan, tilt, tracking, jump cut, wipe-in/out, dissolve, pov, ls, ms, cu, detail shot, montage, post-sync, etc., etc. Since there is also a wealth of useful literature about film sound other than music, I’ll restrict this part of the chapter to brief explanations, in alphabetical order, of the most essential terms relevant to the analysis of film music.

Breakdown notes, a.k.a. timing notes, are prepared by the music editor. They comprise a detailed list of significant events inside a scene, including cuts and camera moves, as well as key points in the action and dialogue, each with its relevant timecode location. This listing allows the composer to synchronise, where appropriate, specific points in his/her score with specific visual events in the scene (hit points). Breakdown notes should not be confused with cue list or cue sheet.

Click denotes the metronome sound that conductors and/or musicians hear in headphones when recording music to picture. This procedure means that the music for each cue can be recorded so as to align exactly as intended with the visuals.

Cue originally meant an event signalling that another event should take place, for example ‘cutting to a low-level shot of Danny’s tricycle in the corridor was the cue to fade in Bartók’. Over the years cue has come to denote not so much the point at which the music starts (the cue point) as the complete musical continuum starting at that cue point. The duration of a cue can vary from just a few seconds to several minutes.

Cue list: a list of cue points for part or whole of an audiovisual production, i.e. the chronological enumeration of timecode locations corresponding to the start and end of each music cue (not to be confused with cue sheet or breakdown notes).

Cue point: point, expressible in terms of timecode location, at which a musical cue starts; not to be confused with hit point.

Cue sheet: [1] a list of all cues in an audiovisual production, specifying details of duration, composer, publishing rights, type of usage (e.g. visual vocal music, instrumental underscore), and prepared by the music editor for copyright purposes (cf. cue list); [2] a list of scenes in a silent film (c. 1910-1920) together with titles and sheet music publishing details of pieces suggested as suitable for each scene in the film.

Hit point: point, expressible in terms of timecode and frame location, at which a particular musical event synchronises with a particular visual event inside a cue; not to be confused with cue point.

Music Editors time, organise and manage music cues for an audiovisual production. They are present at spotting, recording and post-production sessions. They also produce the breakdown notes and the cue sheet, sometimes also the clicks.

Music-led montage is a home-grown term denoting footage in which visuals are edited to fit music rather than vice versa. Music-led montage is typical for music videos and is also common in title sequences.

Music supervisors choose, or at least suggest to the director and/or producer, which music should be included where in a film, tv show, video game or live event.

Set pieces (another home-grown term) constitute that subset of diegetic music in which musical performance is visible on screen as part of the narrative. If the performance of the piece is the main focus of the narrative, as in most musicals, the visuals will be cut to music and whatever other action may be present can also be choreographed in time with the music. However, if the set piece is more of a backdrop to other activity, cutting points are less likely to be in sync with musical episodicity. Such diegetic music may change from visual foreground to background in the narrative, even to the extent that the source music is faded out and replaced by underscore. For example, in an episode of The Return of the Saint entitled ‘The Brave Goose’, source music (function 6) is provided as a set piece by a DJ and dancers in a Saint Tropez discothèque. In the middle of an up-tempo number a murder is committed. The camera zooms in on the heroine (who alone realises what has happened) and back to the dancers who are still bopping away on camera despite the fact that the source music has been replaced by non-diegetic music. This music underlines the emotions of horror the heroine is supposed to be feeling (function 7) and provides a basis for the audience’s emotions (function 8). Since the dancers are still shown to be having a good time, the set piece interrupted by underscore also counterpoints horror (non-diegetic music only) against gaiety (silenced source music and continued dancing), making the horror more poignant (function 5).

SMPTE = Society of Motion Picture and Television Engineers, more precisely the Society’s standard timecode system used in audiovisual production and according to which passing time is given in hours, minutes, seconds and frames so that, for example, 01:09:50;12 refers exactly to a point one hour, nine minutes, fifty seconds and twelve frames after the start of the production at 00:00:00;00.
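A timecode location of this kind is easy to take apart mechanically, which can be handy if you keep your timings in a script or spreadsheet. The Python sketch below is only an illustration under stated assumptions: the names Timecode, parse_timecode and whole_seconds are hypothetical, the frame separator is accepted as either a colon or a semicolon, and frames are not converted into fractions of a second because that would depend on the frame rate of the production.

```python
import re
from typing import NamedTuple


class Timecode(NamedTuple):
    """Hours, minutes, seconds and frames, as written in HH:MM:SS;FF."""
    hours: int
    minutes: int
    seconds: int
    frames: int


# Assumption: two digits per field; frames separated by ':' or ';'.
_TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})[:;](\d{2})$")


def parse_timecode(text: str) -> Timecode:
    """Split a timecode string into its four numeric fields."""
    match = _TIMECODE.match(text)
    if match is None:
        raise ValueError(f"not an HH:MM:SS;FF timecode: {text!r}")
    return Timecode(*(int(field) for field in match.groups()))


def whole_seconds(tc: Timecode) -> int:
    """Seconds elapsed since 00:00:00;00, ignoring the frame count."""
    return tc.hours * 3600 + tc.minutes * 60 + tc.seconds


# The example from the text: one hour, nine minutes, fifty seconds, twelve frames.
location = parse_timecode("01:09:50;12")
# location.frames is 12 and whole_seconds(location) is 4190.
```

For the cue list in this project the frame field can simply be ignored, since timings in hours, minutes and seconds are precise enough.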

The Spotting Session is held after work on the visual footage has finished. The director and composer discuss what sort of music should be used at which points in the production (‘cue spotting’). Those points are noted as timecode locations by the Music Editor and sent to both composer and director.

Temp track, a.k.a. temp music, temp score, scratch score, is existing music added to an audiovisual production during the editing phase. It is used: [1] to test a film on audience focus groups and on production executives; [2] to give the soundtrack composer an idea of the sort of music the director envisages at various points in the production.

Timing notes: see Breakdown notes (p. 552).

Title music is a generic term denoting music conceived for an audiovisual production’s title sequences (or credits), usually at the start (the main or opening titles) and/or end of the film or programme (end titles). Music for opening titles has three main functions:

1. Reveille function: wake up! Something different is about to start.

2. Preparatory function: something of a particular type, set in a particular environment, including particular types of character and particular types of action and mood is about to start.

3. Mnemonic function: a particular, identifiable and recurrent (type of) production is about to start, e.g. another news broadcast, another James Bond movie, another episode of Dr Who, Seinfeld or Coronation Street.

The most common function of theme tunes on radio or TV is of course mnemonic, but no title music could work properly without the preparatory function. Feature films rely mainly on the preparatory function to mediate the musical message of their title sequences.

Title sequences present the composer with the rare opportunity of writing music on music’s own conditions. While underscore demands strict adherence of music to visual narrative, title music often determines the pace and type of visual flow, at least within the obvious limits of duration assigned to the sequences and the general character of the complete audiovisual production. In fact, visual titles are more likely to be cut to the music (music-led montage), whereas underscore is recorded to picture.

Underscore is invisible non-diegetic music, usually background or incidental music, written to fit an existing visual sequence. Unlike title music and set pieces, underscore is normally recorded to picture.

Bridge (scribal)

Having sketched the outlines of the course Music and the Moving Image and dealt with some of its central historical, theoretical and terminological issues, we can now finally focus on the actual title of this chapter: Analysing film music.


The analysis project

Six of the course’s thirteen sessions, as well as everything students are expected to do between sessions from week six to thirteen, are devoted to one mandatory piece of individual coursework: Cue list and analysis of music in a feature film. To complete the project successfully students need to read the following text. It’s addressed to students and ends on page 576.

Overview and aims

This project consists of the following stages: [1] choosing a film; [2] creating a cue list of the film; [3] choosing one scene to analyse in detail; [4] presenting an analysis scene in class; [5] writing up the project.

After choosing, in consultation with the course leader, a full-length feature film recorded on dvd, you produce a cue list for the film and choose one of its scenes to analyse in detail. You also write up a discussion of the music’s uses and functions throughout the film as a whole. The main aim of this project is to let you discover, through hands-on work with existing audiovisual productions, how music interacts with moving images. This work entails observing, documenting and analysing details of sound and picture with a view to understanding which means of musical expression in conjunction with which visuals can produce which effects.

Since scores for audiovisual productions are virtually impossible to come by in the form of notation, music for the moving image has to be analysed by ear and eye without any pre-existing scribal or graphic intermediary. In-depth analysis is a central part of this project because it trains sonic and visual observation skills that are useful to composers, music editors and film directors when deciding what sort of music, if any, should occur with which images at which points in the film. However, since such analysis demands great attention to detail it cannot be applied to more than a short extract from the whole film. That’s why, in order to understand [i] the musical and filmic functions of the extract chosen in relation to the film as a whole, and [ii] issues involved in the production of a complete score for an entire film, it’s also important to study more generally how musical ideas are used throughout the film. Producing a complete cue list (enumerating ‘what happens when’) is therefore another key element in this project.

The in-depth analysis and cue list both involve the investigation of what happens when in the film and are essential to any discussion of what the film’s music may be communicating. They also let you more clearly and convincingly discuss to what extent the music, including its relative or total absence, makes the visual narrative more effective.

Given that each in-depth analysis extract comes from a different film and that the extracts are presented in class, this project also lets you come into contact with a wide range of styles and techniques used by different composers with different backgrounds for different purposes.

With so much attention paid to the composer’s work on the film you study, your aural awareness should improve. Involvement in this project, including your participation in feedback sessions presented by other students, should also provide you with insights about what sort of musical ideas you might or might not want to use in which way in your own audiovisual production work.

1. Choice of film

In consultation with the teacher you should by week 4 choose a feature film with a typical running time of between 75 and 130 minutes. That feature film will be your object of study for the whole project which constitutes the only assessed coursework during the remaining weeks.

The film’s narrative theme should not be primarily musical, i.e. it should not be focused on the production or performance of music, nor contain many scenes of visible musical performance, dance, singing, etc. The film should contain a minimum of thirty minutes of music of which at least twenty are not primarily motivated by any on-screen musical ‘reality’. Focus should in other words be on underscore or on music accompanying title sequences. Despite these restrictions a vast quantity of films available on dvd remains to choose from.

After choosing your film (stage 1), four more stages have to be covered: [2] producing a cue list; [3] choosing and analysing an extract for in-depth analysis (p. 562); [4] presenting your analysis scene for feedback (p. 562); [5] writing up the project (p. 564, ff.).

2. Producing a cue list

Your cue list should consist of four or five columns, as shown in Table 14-3 on page 559.

Table 14-3: Final cue list extract 0:00:00-0:04:22 in The Mission (1986)

1. TIME | 2. [thumbnail stills not reproduced] | 3. DIALOGUE, etc. | 4. MUSIC
0:00:00 | | 8 1. Letter to the Pope. Warner logo fade in and out | [no sound until 0:00:45]
0:00:11 | [black] | |
0:00:15 | White on black or just black | Basic production credits; black at 0:00:33 |
0:00:37 | (on black) | Fade-in introductory text, then fade to black... |
0:00:45 | [black] | | m1A1 (Sick string slide)
0:00:49 | | Cardinal, screen left ECU, harshly silhouetted, sweaty, uncomfortable. | m1A1, m2A (Death drum)
0:00:56 | | Cardinal dictates letter: ‘free to be enslaved’, ‘not the right note’, ‘begin again’. | m1A1, 2A, m3A1 (Woodwind intermittence: (a) pan pipes; (b) low-reg. wood flutes).
0:01:33 | | Outdoor sounds at mission, Violin ensemble indoors. Monologue continues until 0:02:07. | Fade in m8A1 (La folia): m1A1, 2A, 3A1 stay; death drum at ‘Rome’.
0:01:47 | | View of mission (trees, cows); cut to jungle highlands. Voice-over ends at ‘martyrdom’ (0:02:07) | Fade out La folia; m1A1, 2A, 3A1 stay. Hit point at ‘martyrdom’ across cut at 0:02:07 to…
0:02:07 | | Jesuit cross and Guaraní leader in dark jungle. | … m2A, m3A1 …; … m3A1 …; … m3A1 … (dim.)
0:02:46 | | 8 2. Over Iguazu Falls. Guaraní talk carrying s.g. heavy through dark jungle |
(2:46) | | It’s a priest, tied to cross and thrown into river. Floats downstream into rapids. | m3A1 gradually drowned by water FX
0:03:31 | | Increasing rapids. Priest on cross over falls at 04:11 [1'08''] | [water FX only]
0:04:22 | | 8 3. Credits (end 06:23): Iguazu falls with water FX |

The four columns in Table 14-3 contain the following information.

1. Timecode location —in hours, minutes and seconds (frame count not essential in this project)— at which the music enters, exits, or otherwise changes significantly.

2. Thumbnail stills or storyboard-style drawings typical of on-screen events starting at the timing given in the left column and ending at the subsequent timing in the cue list.

3. Brief verbal indications of important paramusical events, such as action, dialogue and sound effects (if not contained in a separate fifth column), occurring between the timing given in column 1 and the subsequent timing in the cue list.

4. Brief verbal indications of music heard during the cue (see ‘Creating a table of musical ideas’, p. 565, ff.).

Columns 1-3 in the table are self-explanatory. Column 4 (Music), on the other hand, contains codes (‘m1A1’, ‘m2A’, etc.) that save space and act as shorthand for musemes listed in the Table of Musical Ideas, explained shortly (p. 565, ff.).

It’s much easier to find your way around a cue list if you clearly distinguish between where music is present and absent. How you do so will depend on what software you use. The double-lined bounding box in Table 14-3 is just one example of how music cues can be highlighted. Other useful devices in a cue list are: [i] indication of dvd chapter starts (e.g. the thick horizontal line and ‘8’ in Table 14-3) to facilitate navigation to particular points in the film on dvd; [ii] noting the duration of music cues and of music’s absences, for example the ‘02:46’ in column 1 at 0:03:03 indicating that music is present until the end of the cue at 0:03:31 and has been audible since 0:00:45 (0:03:31 - 0:00:45 = 0:02:46).
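The subtraction used to work out the duration of a music cue can be left to a few lines of code if you prefer not to do it by hand. This is a minimal Python sketch, assuming timings written as h:mm:ss; the function names to_seconds, to_timing and cue_duration are hypothetical and chosen only for this illustration.

```python
def to_seconds(timing: str) -> int:
    """Convert an 'h:mm:ss' timing into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timing.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_timing(total_seconds: int) -> str:
    """Convert a number of seconds back into 'h:mm:ss' form."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def cue_duration(start: str, end: str) -> str:
    """Duration of a cue (or of a stretch without music) as 'h:mm:ss'."""
    return to_timing(to_seconds(end) - to_seconds(start))


# The cue in The Mission that starts at 0:00:45 and ends at 0:03:31:
print(cue_duration("0:00:45", "0:03:31"))  # prints 0:02:46
```

The same function gives the length of a passage without music if you feed it the exit point of one cue and the entry point of the next.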

You obviously don’t need to provide much detail in your cue list for passages containing no music. For example, while musical events are ‘up front’ between 0:00:45 and 0:02:07 in The Mission, from 0:02:07 to 0:03:31 they assume much more the role of background audio colouring to be eventually drowned out by other sound. This later section does not need to be covered in as much detail as the section preceding it. For the section from 0:03:31 to 0:06:23 (end of table entry at 0:04:22) there is no music at all and cue list entries can be limited to dvd chapter starts, scene changes and other important narrative events.

Before trying to finalise the complete cue list in the sort of form shown in Table 14-3 you’ll need to start with a rough working version containing timings for the film’s musical entry and exit points, as well as for the main changes of scenes, the dvd’s chapter starts, etc. That way you’ll soon have a good overview of what happens when in the film and be in a better position to choose your scene for in-depth analysis, as well as to decide what you’ll need to include by way of verbal description in columns three and four, and by way of thumbnails in column 2. Making this provisional rough list means you’ll also have a better idea of the musical ideas used throughout the film and that you’ll be able to sort them into categories because your provisional cue list will let you know where to find them on the dvd. At this stage you can just use numbers or temporary labels for the musical ideas and jot down, in poïetic or aesthesic terms, something to help you identify each of them.

Using database or spreadsheet software it’s easy to insert, alter or delete cue points in your cue list since you can index column 1 (Time) so that your list is always presented in chronological order of events in the film. Remember to include, where appropriate, leading zeros in your timings, for example ‘0:01:20’ for one minute and twenty seconds. If you don’t, ‘1:20’ (1 minute and 20 seconds) will appear after ‘1:19:55’ (1 hour, 19 minutes and 55 seconds)!
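The sorting problem caused by missing leading zeros is easy to reproduce and to repair. The following Python sketch is an illustration only, assuming timings are stored as text in a list rather than in any particular spreadsheet or database; the function name normalise is hypothetical.

```python
def normalise(timing: str) -> str:
    """Pad a timing such as '1:20' out to the full 'h:mm:ss' form."""
    parts = [int(part) for part in timing.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected timing: {timing!r}")
    # Missing leading fields (hours, or hours and minutes) count as zero.
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return f"{hours}:{minutes:02d}:{seconds:02d}"


timings = ["1:20", "1:19:55"]

# Sorted as plain text, '1:20' lands after '1:19:55'.
print(sorted(timings))                         # ['1:19:55', '1:20']

# With leading zeros restored the order is chronological.
print(sorted(normalise(t) for t in timings))   # ['0:01:20', '1:19:55']
```

Text sorting of this kind stays chronological only while the hours field has a single digit, which is a safe assumption for a feature film.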

A cue list of the sort just described usually occupies between fifteen and twenty pages. If you think that’s excessive, please remember that Hollywood breakdown notes, the nearest professional equivalent to your cue list, usually run to hundreds rather than tens of pages.

3. Choice of analysis scene

For your in-depth analysis you’ll need to choose an extract which contains music, which interests you and which you think could interest other course participants. The extract might be typical of the film or of the film genre in general, or it might be a key scene in the film. The length of the scene you choose will depend on the following factors.

If your graphic score (p. 568, ff.) presents a lot of visually and sonically complex detail that is thoroughly discussed in your analytical text section (p. 572, ff.) then the duration of your extract can be much shorter than if you are presenting something simpler. In general your extract will last for between about 60 seconds for a really complicated passage treated in great detail and up to around 8 minutes for something really simple. The extract can consist of a single music cue or of several short cues whose lengths add up to the sort of durations just given.

4. Class presentation of analysis extract (feedback)

The primary aim of the feedback sessions is to discover what other course participants think the music in your chosen scene connotes and communicates. A secondary aim is to find out, time permitting, what other course participants think about the relation between music and other aspects of your chosen scene (sound, dialogue, mise-en-scène, visual action, camera work, etc.).

You have a time slot of 15 minutes of which at least half should be devoted to discussion and to hearing the associations and reactions of the other course participants.

Given that the main aim of this project is to find out ‘which means of musical expression in conjunction with which visuals can produce which effects’ (p. 557), your in-depth analysis will need to be substantially semiotic. You can use any of the procedures explained earlier in this book. The class presentation of your analysis extract is designed to provide you with intersubjectively generated evidence of the music’s possible connotations, meanings and functions. Now, since it’s obviously impossible to discover anything about how music works in your chosen scene if you know nothing of what the music communicates on its own, and since most people tend to notice pictures and words without paying much conscious attention to the music, it’s essential to start your presentation by focusing on the music alone, without the visuals, preferably also without any dialogue or sound effects.

The best way of isolating the music from everything else in your analysis scene is to find, if you can, an ‘original soundtrack’ recording of the relevant music because such recordings rarely include dialogue or sound effects that could distract listener attention from the music. Failing that you can just play your chosen scene to the others without showing them the pictures. In this case you’ll need to create a separate audio file of the extract[s] so that no-one sees the dvd’s menu images you otherwise depend on to navigate to your extract, or the first few frames of the extract before you manage to disconnect the visuals. Nor is it a good idea to let people see your dvd box with its title, colours and images. The less people know at this stage about the scene and its film the better. Besides, with an audio file of your scene you can yourself start work with the sound, musical or otherwise, in your graphic score without being distracted by the images or dialogue.

Preparing separate audio and video files of your scene has other important advantages in feedback sessions. You don’t have to waste time thumbing through menus, fiddling around with next, previous, fast-forward and rewind buttons in the hope of eventually arriving at the right place. You just click on the file you need to play.

You are strongly advised to identify in advance of your feedback presentation any problems you may have with the music in your analysis scene. You might need help identifying a musical sound, or in understanding the possible connotations of a particular sound or passage. In those cases it’s advisable to isolate those elements and to present them outside their musical or audiovisual context.

Actual feedback

Other participants are asked to note on a sheet of paper whatever comes to mind on hearing the music-only or audio-only version of your chosen scene. They are in other words subjected to a short musical reception test of the sort discussed in Chapter 6 and their responses can be of any type similar to those listed in the VVA taxonomy (p. 209, ff.).

You should be well prepared for your feedback presentation. You should also: [i] collect in response sheets; [ii] note or record comments arising during the discussion of your scene. This information constitutes an empirical basis for semiotic aspects of your analysis.

You are also expected to participate actively when others present their chosen scenes at feedback sessions.

5. Written work

Your written work should include the following sections: [1] preliminaries; [2] cue list; [3] table of musical ideas; [4] in-depth analysis including graphic score and discursive text; [5] general discussion of music in the film as a whole; [6] appendices.

5.1. Preliminaries

Filmography and important credits should include: [1] film title and original year of production and/or release; [2] production and distribution companies; [3] film director, composer and producer, as well as principal actors and the roles they play.

Publishing details of the dvd used in this project should be included in your references appendix, for example as in this book (p. 605 ff.).

You should include a brief summary of the film’s story line in your preliminary comments. This summary would typically include descriptions of the main characters and locations, as well as any traits of mise-en-scène contributing to the overall character of the film. You should also motivate your choice of film and choice of scene.

5.2. Table of musical ideas

Since one of film music’s functions is to make what the viewer sees easier to interpret affectively, it’s hardly surprising if musical ideas recur during the course of a film. You should therefore create a table of all the main musical ideas in your chosen film (Table 14-4, p. 567). There are also two practical reasons why creating this sort of table is a good idea: [i] you need to describe each musical idea only once for the whole film; [ii] you can refer back to the table of musical ideas not just from the cue list but also from your analysis, your general discussion, even from the graphic score. Creating a table of musical ideas presupposes that you have an overview of what happens when in the film (you’ve already done that in your provisional cue list, p. 558, ff.) and that you’ve named and/or numbered all the musical ideas you need to refer to.

Numbering musical ideas is normally a relatively simple process. It’s easiest to count the first musical idea to occur in the film as number 1 and increment the integer for each new idea that is presented. It can also be a good idea to think of the musical ideas in terms of what you think they communicate, grouping together those that both sound and feel similar (e.g. 1a, 1b; 1a1, 1a2). Using that kind of numbering system lets you more easily check which sort of musical idea belongs with which sort of scene, character, action or mood in your film.

Number codes may be shorter than names — useful when saving space in a cue list, for example — but most people find it easier to identify and recall musical ideas when referred to by name rather than by number. There are several viable ways of naming a film’s musical ideas.

Original names of film music cues can sometimes be found in the track listings on soundtrack albums. If you can’t find such an album, try searching for it on line: several sites let you hear short samples of each track listed. In such cases you can consider using the composer’s name for each relevant cue, otherwise dvd chapter names can sometimes give good ideas for appropriate names. In other cases you’ll have to invent your own names, either poïetically (e.g. distorted guitar; minor-key strings) or aesthesically (e.g. James Bond; hounds of hell) or a mixture of the two (e.g. celestial choir; high-heeled sax). Sometimes a catch phrase from the dialogue in the relevant scene can work as a music cue name (e.g. Never again; You don’t care). Of course you’ll need to explain any labels whose relation to the music or film is not obvious.

Every musical idea you include in your Table of Musical Ideas must be given an unequivocal timecode placement indicating where in the film it first occurs. It also helps if you provide timecode placements for other occurrences of the same idea, especially if it’s a variant. Once columns 1, 3 and 4 of the cue list are ready, a table of musical ideas for a feature film can be compiled, using database or spreadsheet software.
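As a minimal sketch of how such a table might be compiled if you prefer a few lines of script to a spreadsheet, the following Python fragment stores some of the ideas from Table 14-4 with their first-occurrence timecodes and lists them in order of first appearance. The class `MusicalIdea`, its field names and the helper `to_seconds` are illustrative assumptions, not a prescribed format, and only a sample of the table's rows is included.

```python
from dataclasses import dataclass, field


def to_seconds(timecode: str) -> int:
    """Convert an h:mm:ss timecode (counted from the start of the film) to seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class MusicalIdea:
    code: str    # number code, e.g. "1A1"
    name: str    # descriptive name, e.g. "Sick string slithers"
    first: str   # timecode of the first occurrence in the film
    also_at: list = field(default_factory=list)  # timecodes of later occurrences


# Sample rows taken from Table 14-4 (The Mission); the layout is hypothetical.
ideas = [
    MusicalIdea("1A1", "Sick string slithers", "0:00:45", ["0:24:24", "0:29:23"]),
    MusicalIdea("1A2", "Ongoing string screech", "0:27:07", ["1:40:34", "1:51:11"]),
    MusicalIdea("1B", "Tangled woodwind", "0:33:00", ["0:34:15", "1:29:39"]),
    MusicalIdea("1C", "Semitone bells", "0:26:45"),
]

# Order the ideas by where they first occur in the film.
by_first_occurrence = sorted(ideas, key=lambda idea: to_seconds(idea.first))

for idea in by_first_occurrence:
    print(f"{idea.first}  {idea.code:<4} {idea.name}")
```

Sorting on seconds rather than on the timecode text avoids mistakes if some timecodes are written with and others without leading zeros.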

Table 14-4 shows the first of seven main sets of musical ideas used in The Mission. This first set is called diaboli in musica because: [i] musemes labelled 1D are entirely based on the tritone, an interval which, in the history of Western music, was also called the Diabolus in musica; [ii] all the musemes in category 1 are heard in conjunction with unpleasant ideas, statements or feelings in the film. Museme 1A1 is called sick string slithers because glissando means sliding, because the ideas are played by strings, and because slow glissandi between neighbouring tones, by stating no fixed pitches, seem tonally unstable and are often used in situations of mental, emotional or even gastric instability.

5.3. In-depth analysis

Criteria for selection of an analysis scene are given on page 562. Your analysis should be presented in two parts: [i] a graphic score; [ii] a discursive analysis (p. 572).


Table 14-4: Musical ideas in The Mission (sample extract)

1. Diaboli in musica

1A. Sick strings.

1A1. Sick string slithers

0:00:45 Quiet, slithering, string clusters mid register as cardinal reads letter “Your holiness”, “free to be enslaved”, etc. Also at 0:24:24 (Mendoza Alone) and 0:29:23 (Mendoza’s Remorse).

1A2. Ongoing string screech

0:27:07 Rodrigo totally alone after killing Felipe; also at 1:36:20 – 1:36:20 (Mercenaries scale the falls); 1:40:34 (Battle preparations/Refusal); 1:51:11 (Massacre 2).

1A3. Visceral disturbance

0:15:14 Mendoza confronted by Gabriel in forest; also (varied) at 0:16:34 – Nervous plucked strings off-key as slaves dragged into Asunción; 0:26:03 – Constant quiet string wobble before fratricide; 0:26:26 – Rising timp./str. dissonance before Rodrigo explodes; 0:29:40 – “Jaws” idea: Mendoza Alone (jealousy); 0:30:15 – Mendoza’s Remorse: loud bass (threatens Gabriel); 0:14:44 – Slave Hunt: descending danger woodwind and strings

1B. Tangled woodwind


0:33:00 Mendoza’s Penance (main theme; break for Dies Irae at 0:32:44); also at 0:34:15 – Penance: repeated in woodwind; Blessing the troops at 1:29:39.

1C. Semitone bells

0:26:45 Source sound just before Mendoza kills Felipe (quasi-diegetic)

1D. Tritones

0:15:14 String punctuations in jungle at “making Christians”… “if you have the time” (Confronting Mendoza). Also at 0:24:20, ff. – The jealous Mendoza Alone: strings and/or woodwind, loud; 0:30:15 – Mendoza’s Remorse: string punctuations; 0:26:26 – Rising string discord in the Duel with Felipe; 0:39:40 – Knife held to Mendoza’s throat by Guaraní (str., bass; repeated); 1:42:01 – Portuguese paddle to battle; 1:51:44 – Off-key trumpet fanfare as mercenaries hack through jungle.



5.3.1. Graphic score

It is vital that your graphic score clearly shows — in hours, minutes and seconds counting from the start of the film, not of the extract — the timing at which your analysis scene starts.
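If you work from a separate file containing only your extract, positions in that file have to be converted back to film time before they go into the graphic score. The arithmetic can be sketched as follows; the function names are hypothetical, and the start time 0:00:37 is the one given for the specimen graphic score of The Mission (Fig. 14-1).

```python
def to_seconds(timecode: str) -> int:
    """Convert an h:mm:ss timecode to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_timecode(total_seconds: int) -> str:
    """Convert a number of seconds back to an h:mm:ss timecode."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def film_time(extract_start: str, position_in_extract: str) -> str:
    """Timecode, counted from the start of the film, of a point inside the extract."""
    return to_timecode(to_seconds(extract_start) + to_seconds(position_in_extract))


# An event 58 seconds into an extract that starts at 0:00:37 in the film.
print(film_time("0:00:37", "0:00:58"))
```

Adding the extract's start time to every position in this way keeps the time line of the graphic score consistent with the cue list and the table of musical ideas.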

Graphic scores usually consist at least of the following vertically stacked parts, each of them running horizontally left to right:

1. time line showing timecode in relation to the start of the film;

2. storyboard line, i.e. visual events represented by thumbnail photos, or by storyboard drawings or by brief verbal descriptions;

3. paramusical sound line showing speech, sound effects, etc.;

4. musical line[s].

The next three pages contain a specimen five-line graphic score covering the first scene in The Mission. The time line is at the top, in the middle and at the bottom, speech on the second line, sound effects (mostly shown as icons to save space) on the third, thumbnails on the fourth, and the scene’s three different strands of musical events on the fifth (strings, drum and ‘ethnic’ flutes).

Fig. 14-1. The Mission 0:00:37-0:02:15 – Graphic score (page 1 of 3)

The Mission 0:00:37-0:02:15 – Graphic score (page 2 of 3)

The Mission 0:00:37-0:02:15 – Graphic score (page 3 of 3)

5.3.2. Discursive analysis text

In this part of the project you discuss how the musical events detailed in your graphic score give rise to particular moods, effects or connotations. You should also discuss how those musical ‘meanings’ combine with the visuals, sound effects, dialogue, etc. to create an overall audio-visual complex of meaning. The discussion should in other words be semiotic and take into account comments and reactions gathered during the feedback session.

In the case of the graphic score just presented I’d probably start the analysis by dividing feedback responses into three categories: [1] tense, worry, headache, inner turmoil, ill, sick, confused, repressed terror; [2] unpredictable, fateful, ominous, foreboding, death, funeral, execution; [3] ethnic, South America, Indians, tropical birds, jungle. I would relate these categories to musemes 1a1 (Sick strings), 2a (Death drum) and 3a1 (Worrying woodwind – ethnic), more specifically to 3a1a (Panpipe punctuations and breathy blasts), 3a1b (Wood flute hoots), 3a1c (Screaming bird flutes). I’d try to substantiate this interpretation for museme 1a1 using iocm like [i] a very similar-sounding section one minute into Penderecki’s Threnody for the Victims of Hiroshima and [ii] various snippets of film music linked to seediness, drunkenness, motion sickness, madness, etc. I’d try to argue similarly for the death drum and the ethnic woodwind musemes, pointing out the temporal unpredictability but semantic consistency with which the latter are inserted at words relating to colonisers and the colonised (‘enslaved’, ‘settlers’, ‘La Plata’, ‘San Miguel’, ‘Indians’, ‘plateau’). I’d draw attention to the co-occurrence of the death drum with [i] the film’s first image (a European man of authority sweating uncomfortably in tropical heat at 0:00:54), and [ii] references to Rome (‘Your holiness’…’year of Our Lord 1758’ at 0:01:19 and ‘the academies of Rome’ at 0:01:46), noting how the drum is struck more regularly and repeatedly towards the end of the excerpt (from 0:01:54), accompanying the change of location (visuals and sound effects) from the enclosed European colonial mission out on to the plateau and into the jungle. 
It would also be worth noting how the sick strings linked to the moral turmoil inside the cardinal’s head give way at 0:01:35 to pleasantly melodic and harmonious quasi-diegetic music linked visually to the film’s ‘good guys’ —Father Gabriel and his indigenous pupils dressed in angelic white— and verbally to ‘the noble souls of these Indians’. Other musical events worth discussing might be: [i] the ‘ethnic’ flute figure after ‘I don’t think I’m setting the right note’ (0:01:12); [ii] the transscansion after ‘Begin again’ (0:01:15); [iii] the loud, spluttered pan pipe burst at ‘martyrdom’ (0:02:07); [iv] the fact that the very first audiovisual event in this film, at 0:00:45 and lasting four seconds, consists of total darkness containing a single note of music sliding uneasily down to reveal (at 0:00:49) the cardinal’s immobile face in extreme close-up; and [v] another sick string slither plus a death drum hit before the cardinal finally moves and opens his mouth to speak (0:56:24).

5.4. General discussion of music throughout the film

This discussion should be based on observations presented in the Cue list (p. 558), the Table of musical ideas (p. 565) and the In-depth analysis (p. 567). You should summarise your findings about the association of particular musical ideas with particular characters, locations, actions, attitudes and moods throughout the film, drawing on your knowledge of film music’s functions (p. 546) and other relevant concepts (p. 552) to make your arguments more convincing. You should discuss any ethical or ideological dimension you think the music adds to the film, pointing out, if applicable, which individuals or groups of people, which locations, which types of action and attitude etc. are scored in positive, negative or neutral terms. This section of the project also gives you the chance to express your own opinions about how well or badly the music works in the film. In the case of The Mission I think the following sort of points would be worth discussing.

[1] Music is heard during 58% of the film’s total running time. Does the film really need that much music? Why is there relatively little music in the middle of the film?

[2] Morricone’s score contains a wide variety of musical styles, ranging from dissonant European modernism (for the cardinal’s inner turmoil and colonialist acts of violence) to idealised indigenous music, and from finely crafted exemplars of euroclassical tonality (for attitudes and acts of hope and generosity) to the death drum and other ‘threatening’ uses of percussion.

[3] The score contains several cues in which ‘humanist’ classical ideas combine with the idealised indigenous music to accompany scenes of concord where the common interests of the lowly Jesuit brothers and the indigenous Guaraní are presented visually or verbally.

[4] The unpredictable intermittent woodwind bursts on ‘ethnic’ instruments heard in the first scene become successively less threatening as the story unfolds. By the end of the film they are an integral, consonant part of the combination European humanism plus indigenous rights and dignity mentioned under point 3.

[5] The film’s main theme is really more of a motif. Built on three notes, it appears in many different guises, for example: [i] played by full symphony orchestra with classical harmonies for the main titles in front of the falls (0:05:01-0:06:26); [ii] much more discordantly for Rodrigo’s anguish and outburst in prison (0:29:23-0:31:40); [iii] sung in a very high register by an indigenous treble voice (‘innocence’) as a Miserere (= ‘Have mercy’) accompanied by almost dirge-like mid-to-low-register strings playing very simple held chords (1:59:59-2:00:50).

[6] One of the film’s main ‘humanist’ themes, Gabriel’s Oboe, occurs only once and very briefly as diegetic music (0:10:59). It’s otherwise used for longer underscore cues (at 0:14:13, 0:40:41, 1:15:58, 1:26:02, 1:27:13, 1:38:26, 1:55:22) and in the end credits. Despite being used mainly as underscore, Gabriel’s Oboe is one of the world’s most popular tunes, published in every conceivable type of arrangement, played at weddings, in concert halls, on bandstands, in bars and clubs, at figure skating competitions, in airline adverts, performed by amateurs and professionals, set to lyrics so it can be sung, used as soundtrack for ‘beautiful nature’ montages of stills on YouTube, available as a mobile phone ringtone, etc., etc. Why has this technically demanding tune that features long phrases and contains several subtleties of timing become such a popular piece to perform as well as to hear?

5.5. Appendices

Your written project should correctly list all verbal and audiovisual references you have used in your work. Norms for formulating these appendices are online at ||.

The Bibliography should contain all written verbal references (books, articles, web pages etc.) you have used in your work.

The List of Recorded References (LRR) should similarly list all recorded or broadcast materials (films, TV programmes, games, discs, etc.) you refer to in your work.

5.6. Procedure and presentation

To complete this project successfully it’s best to do its various tasks in the following order: [1] Choice of film; [2] Preliminaries (film details, motivations, etc.); [3] Cue list; [4] Choice of scene for in-depth analysis; [5] Table of musical ideas; [6] Graphic score; [7] Discursive analysis; [8] General discussion of music throughout the film; [9] Appendices; [10] Table of contents. This order of working is not the same as that in which the project’s various parts should be submitted.

Your project should be submitted with its constituent parts in the following order: [1] Table of contents; [2] Preliminaries (filmography, motivations); [3] Table of musical ideas; [4] Cue list; [5] Graphic score; [6] Discursive analysis; [7] General discussion; [8] Appendices.

5.7. Technical considerations

This project requires an absolute minimum of the following software:

• word processing or desktop publishing application;

• indexable spreadsheet;

• audiovisual playback capable of reading dvds and standard video file formats (mpg, avi, etc.).

Also extremely useful are the following sorts of software:

• audiovisual recording and editing;

• audio recording and editing;

• audiovisual format conversion and dvd decryption;

• still image editing;

• metronome and time calculator.

With all these applications you will be able to:

• convert dvd format (vob) to editable video formats (mpg, avi, etc.);

• join dvd vob files into one single file;

• add timecode to your film;

• split your film into manageable lengths showing the right timecode in relation to the start of the movie;

• dump screen stills to image files for use as thumbnails;

• put different music to picture (or different pictures to music);

• export sound, including music, to a separate file;

• manipulate sound files for presentation in feedback sessions;

• manipulate (still) image files (e.g. reduce to thumbnail size);

• establish metronome rate of music (in bpm);

• calculate bpm from durations and number of beats;

• calculate total duration of cues in part or all of your film.
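The last two calculations in this list need no specialised software. As a rough sketch, with hypothetical function names: the timings below are film excerpts quoted elsewhere in this chapter for The Mission, treated here as if they were cue boundaries purely for the sake of illustration.

```python
def to_seconds(timecode: str) -> int:
    """Convert an h:mm:ss timecode to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def bpm_from_beats(beats: int, duration_seconds: float) -> float:
    """Metronome rate implied by a counted number of beats over a measured duration."""
    return beats * 60 / duration_seconds


def total_duration(cues) -> int:
    """Sum of the lengths, in seconds, of (start, end) timecode pairs."""
    return sum(to_seconds(end) - to_seconds(start) for start, end in cues)


# Four beats counted over three seconds.
print(f"{bpm_from_beats(4, 3):.0f} bpm")

# Illustrative (start, end) pairs using timings quoted in this chapter.
timings = [
    ("0:00:37", "0:02:15"),
    ("0:05:01", "0:06:26"),
    ("0:29:23", "0:31:40"),
    ("1:59:59", "2:00:50"),
]
total = total_duration(timings)
minutes, seconds = divmod(total, 60)
print(f"total: {minutes} min {seconds} s")
```

Dividing such a total by the film's running time in seconds gives the proportion of the film that contains music.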

Too much?

‘It’s too much. You can’t expect students to do all of that, especially if they’ve no formal training in music.’ These are some of the comments I hear from other teachers of both music and other subjects when I talk, often enthusiastically, of some of the student projects submitted by musos and non-musos alike. I’ve even had to ask students to lend me their work after it’s been graded and it’s back in their hands so I can show the doubting teachers who won’t check the examples of student work I’ve put on line that I’m not exaggerating. True, some students regret having registered for the course when they discover the extent of what they are expected to produce during weeks 5-13, and one or two abandon the course in week 2 after hearing what sort of work is in store for them. But even those who initially swore while slaving over their cue list must have found it worth the effort because anonymous student evaluations have been consistently positive and the generally high standard of student work suggests that there must be considerable interest and motivation for the subject. Although I’ve heard some students complain about the scope of the coursework, the most common irritant is expressed in terms like ‘I can’t watch a film any more without paying attention to the music’. ‘Don’t worry’, I reply when I get the chance: ‘I’ve learnt to switch off my analytical ear when I want, because I know I can always switch it back on again if need be’.

The scepsis of some colleagues towards the work described in this chapter contrasts starkly with the enthusiasm of many students for the work they have to do on the course. This contrast may well reflect some of the differences between knowledges discussed in Chapters 3 and 6, in that skills involved in Music and the Moving Image rely largely on aesthesic competence in music (knowledge type 1b on page 119) and its conventional status as a largely vernacular, extracurricular affair. Moreover, the project just described demands interaction with a mainly non-scribal medium including large amounts of invisible music: there is no notation to follow, there are no canonical texts to ingest, and, to quote Simon Frith again, it’s ‘literally not the sort of thing you [can] photocopy’. And yet, knowing that films, tv programmes and video games are media with which every student on every Music and Moving Image course I’ve run since 1993 has been familiar since birth, it strikes me not so much as absurd as wasteful not to help students understand and systematise their aesthesic competence in reacting to the messages, musical and otherwise, circulating in those media.

To put it bluntly, given the issues of dual consciousness and the ubiquity of invisible music in contemporary media, both raised at the start of this book, what, I wonder, should general music education be about if not the sort of thing set out in this chapter or, come to that, in this book as a whole? 2012-09-28, 19:30




abbr. = abbreviation | adj. = adjective | adv. = adverb | a.k.a. = also known as | attrib. = attribute/attributive | cf. = confer (Latin), i.e. compare | colloq. = colloquial | deriv. = derivation | derog. = derogatory | e.g. = exempli gratiā, Latin for ‘by way of an example’ | Eng. = English | etym. = etymology | Fr. = French | Gk. = Greek | i.e. = id est, Latin for ‘that is (to say)’ | It. = Italian | Lat. = Latin | ling. = linguistic[s] | mus. = music[al] | M-W = Merriam-Webster (dictionary) | n. = noun | n. ph. = noun phrase | neol. = neologism | phon. = phonetic[s] | pl. = plural | pron. = pronunciation | q.v. = quod/quae vide, Latin for ‘which see’, i.e. go look up the term[s] just mentioned | relig. = religion | semio. = semiotic[s] | v. = verb



a cappella [akəˈpɛlə] adv. mus. [1] usual sense: voice[s] only without instrumental accompaniment; etym. It. cappella = chapel, choir, i.e. in the manner of a chapel choir; [2] specialist usage: voice[s] accompanied by only church organ.

aeolian mode > church modes.

aesthesic [ɪsˈθiːzɪk] adj. (from Fr. esthésique, Molino via Nattiez); relating to the aesthesis [ɪsˈθiːsɪs] or perception of music rather than to its production or construction; opposite of poïetic.

a.k.a. abbr. also known as, alias.

aleatoric [alɪəˈtɔrɪk] adj. based on elements of chance; n. aleatorics.

alogogenic [ɛɪlogəʊˈdʒɛnɪk] adj. opposite of logogenic (q.v.).

anacrusis [ænəˈkru:sɪs] n. short musical event having the character of an upbeat or pickup, i.e. a rhythmic figure and/or short tonal process propelling the music into whatever it immediately precedes; adj. anacrustic [ænəˈkru:stɪk].


anaphone n. [ˈænəfəʊn] neol. (1990); musical sign type bearing iconic resemblance to what it can be heard to represent (p. 487, ff.); adj. anaphonic [ænəˈfɔnik]; see also sonic anaphone, tactile anaphone, kinetic anaphone.

anaphora [əˈnaforə] n. rhetorical device by which successive sentences start identically but end differently, as in Martin Luther King’s I have a dream speech; transferred to music, a melodic anaphora means that successive phrases start with the same motif but end differently, while a harmonic anaphora means that successive chord sequences start with the same change[s] but end differently. Anaphora is the opposite of epistrophe.

anhemitonic adj. mus. (of modes and scales) containing no semitone intervals (cf. hemitonic).

annexing > generic annexing.

antiphony [ænˈtɪfənɪ] n. mus; etym. Gk. ἀντί (= opposite) and φωνή (= voice), adj. antiphonal [anˈtɪfənəl]. Antiphony is a responsorial (>) practice in which two equally dimensioned groups of singers or players exchange phrases or passages. Antiphonal practices include alternate singing by men and women, themes passed from one instrumental section to another, and the division of an English cathedral or collegiate choir into two equal halves placed on opposite sides of the quire with a central aisle between them (Decani and Cantoris).

AO [ɛɪˈəʊ] n. ph. abbr. neol. (1979) analysis object, i.e. a piece of music subjected to analysis.

appoggiatura [əpɔdʒaˈtu:ra] n. mus. [1] in euroclassical music theory an accentuated, ‘dissonant’ grace note of equal duration to the following note on to which the dissonance ‘resolves’; [2] more generally a pattern of two adjacent, conjunct and equidurational notes of which the first is given more weight and joined smoothly to the second; etym. It. appoggiarsi = to lean, i.e. a leaning note; pl. appoggiature [əpɔdʒaˈtu:rɛ].

arbitrary sign (a.k.a. conventional sign) n. ph. semio: sign connected by convention to what it signifies (see p. 163, ff.); cf. icon, index.


arpeggio It. [arˈpedʒɔ], UK Eng. [ɑ:ˈpɛdʒɪəʊ], n. mus. (adj. arpeggiato or arpeggiated): chord whose constituent notes are played in sequence instead of simultaneously; from It. arpeggiare = to play the harp.

atmos [ˈætmɔs] n. (pl. atmoses [ˈætməsɪz]) a.k.a. ambience, ambient sound; an atmos presents the general ongoing soundscape, the audio scenery, the sonic backcloth, etc. relevant to the visual footage with which it is heard; etym. abbr. atmosphere.

aural staging n. ph. abbr. neol. (2011) the mise-en-scène of sound sources (voices, instruments, sound effects, etc.), in one or more acoustic spaces; particularly important in audio recordings —phonographic staging (Lacasse, 2005)— but also in film and games sound, as well as in live performance situations (see p. 299).

auto-tune [1] name of digital pitch correction software produced by Antares Audio Technologies; [2] generic term for any digital pitch correction plug-in used in studio recording or live performance, e.g. Melodyne. Best known for use in improving intonation of amateur vocalists in TV talent shows (>E X-Factor (2010)) but also used by professional artists like Peter Gabriel and Coldplay; deriv. forms auto-tuned, auto-tuner, auto-tuning.

bar n. mus. UK-English for measure (US), i.e. a recurrent musical duration defined by the number of beats (measured in bpm) of a given metre; e.g. one 4/4 bar (4 quarter-notes or crotchets) at 80 bpm lasts 3 seconds. Unless the music’s tempo is exactly 60 or 120 bpm, bars are much easier to count than seconds and minutes.

bpm abbr. beats per minute (unit of tempo measurement, cf. npm).

break n. mus. [1] very short section during which ongoing accompaniment patterns in a piece of music stop to give sonic space to, and thereby highlight, whatever occupies them (see p. 520); [2] musical event[s] inside a break, as just defined, e.g. a ‘drum break’. N.B. Break, breakdown and drop have different meanings in post-1990 DJ parlance, notably in the sphere of hip hop and electronica.

breakdown notes a.k.a. timing notes n. notes prepared by the music editor and comprising a detailed list of significant events inside a single scene in an audiovisual production (cf. cue list and cue sheet).

bridge n. mus. [1] North American term for the middle eight (UK English), i.e. the contrasting B episode (normally lasting 8 bars) in the narrative form of an AABA 32-bar jazz standard (p. 397 ff.); [2] a short passage joining two contrasting sections in a euroclassical piece of music; [3] a short passage filling in between two statements of the theme in a euroclassical fugue; [4] a short musical cue joining two scenes of a different character in a film or TV production (see also tail).

call and response > responsorial.

CCCS n. abbr. Centre for Contemporary Cultural Studies, University of Birmingham, UK.

charango [tʃaˈraŋgo] n. small stringed instrument of the lute family; used in traditional Andean music.

chord loop n. mus. sequence of (typically) three or four chords repeated several times in succession, for example: [1] the 4-chord ‘milksap’ loop or ‘vamp until ready’ { I - vi - ii/IV - V } (see E Tagg, 2007); [2] the 3-chord mixolydian rock loop { I ♭VII IV } or { ♭VII IV I } (see E Tagg, 2009b).

chord shuttle n. neol. mus. (1993) oscillation between two chords, for example the to-and-fro between tonic minor (i, B♭m) and submediant major (♭VI, G♭) in Chopin’s Marche funèbre (1839), or Dylan’s All Along The Watchtower (1968: Am↔F, a.k.a. aeolian pendulum (Björnberg, 1989)); or between ii7 and V in He’s So Fine (Chiffons, 1963), Oh Happy Day (Edwin Hawkins, 1969), or My Sweet Lord (Harrison, 1970). Chord shuttles are indicated by double ended arrows, e.g. i↔♭VI or B♭m↔G♭ for Chopin’s funeral march; cf. chord loop.

‘church’ mode n., a.k.a. ecclesiastical mode; one of the six main heptatonic diatonic modes which, when arranged in scalar form with the initial note repeated at the octave, contain, in varying positions, two semitone and five whole-tone steps. The six main ‘church’ modes are: [1] ionian (c-c on the white notes of the piano); [2] dorian (d-d on the white notes); [3] phrygian (e-e); [4] lydian (f-f); [5] mixolydian (g-g); [6] aeolian (a-a). See also pp. 325-334 and Tagg (2009: 34-37).
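The white-note description of these modes can be checked with a short sketch. It assumes only the standard layout of the piano keyboard (semitones between e and f and between b and c, whole tones elsewhere); the names `STEP_UP_FROM`, `MODES` and `step_pattern` are hypothetical. Each mode turns out to contain two semitone steps and five whole-tone steps, which together make up the twelve semitones of the octave.

```python
WHITE_NOTES = ["c", "d", "e", "f", "g", "a", "b"]

# Size in semitones of the step from each white note up to the next white note
# (the step from b leads to the c above).
STEP_UP_FROM = {"c": 2, "d": 2, "e": 1, "f": 2, "g": 2, "a": 2, "b": 1}

# Starting note of each mode when played on the white notes only.
MODES = {
    "ionian": "c",
    "dorian": "d",
    "phrygian": "e",
    "lydian": "f",
    "mixolydian": "g",
    "aeolian": "a",
}


def step_pattern(start: str) -> list:
    """Step sizes (2 = whole tone, 1 = semitone) from start up to its octave."""
    index = WHITE_NOTES.index(start)
    rotated = WHITE_NOTES[index:] + WHITE_NOTES[:index]
    return [STEP_UP_FROM[note] for note in rotated]


for mode, start in MODES.items():
    pattern = step_pattern(start)
    print(f"{mode:<11}{pattern}  semitones: {pattern.count(1)}  whole tones: {pattern.count(2)}")
```

What distinguishes one mode from another is therefore not the number of semitone steps but their position within the octave.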


click denotes the metronome sound that conductors and/or musicians hear in headphones when recording music to picture. This procedure means that the music for each cue can be recorded so as to align exactly as intended with the visuals.

cluster n. mus. simultaneous sounding of several neighbouring tones.

conjunct motion n. ph. mus. melodic movement by small, normally single, intervallic steps; opposite of disjunct motion.

connote [kəˈnəʊt] v. ling. to mean or signify by implication or association; n. connotation [kɔnəˈtɛɪʃən]; adj. connotative [kəˈnɔtətɪv]; see pp. 164-166; cf. denote.

contrary motion n. ph. mus. movement of two strands (parts) in opposite pitch directions; pitch movement away from each other; opposite of parallel motion.

constructional adj., neol. (2001) See poïetic.

continuant n. [1] phon. extendable consonant, e.g. /r/ as in ‘rrreally!’ or a long /ʃ/ (‘shshsh’) when you want people to be quiet; [2] neol. (2011) the continuous ‘body’ of a timbre regardless of whether it’s technically the decay or the sustain part of the envelope (see pp. 278-279).

counterpoint [ˈkaʊntəpɔɪnt] n. [1] mus. type of polyphony whose instrumental or vocal lines (strands) clearly differ in melodic and/or rhythmic profile; polyphonic antithesis of homophony; adj. contrapuntal [kɔntrəˈpʌntəl]; [2] intentional contradiction in music of concurrent verbal or visual events.

cue n. musical continuum in an audiovisual production; the duration of a cue can vary from just a few seconds to several minutes.

cue list n. a list of cue points for part or whole of an audiovisual production, i.e. the chronological enumeration of timecode locations corresponding to the start and end of each music cue (not to be confused with cue sheet or breakdown notes).

cue point n. point at which a musical cue starts, typically (but not exclusively) the start of a scene; not to be confused with hit point.


cue sheet n. [1] list of all cues in an audiovisual production, specifying details of duration, composer, publishing rights, type of usage; not to be confused with cue list; [2] list of scenes in a silent film, together with titles and sheet music publishing details of pieces suggested as suitable for each scene in the film.

da capo [daˈkæpəʊ] adv. mus. instruction in musical notation telling musicians to go back to the start and to play or sing from the top; etym. It. da capo (= from the beginning, from the top).

denote [dɪˈnəʊt] v. to signify lexically; n. denotation [dɪnəʊˈtɛɪʃən]; adj. denotative [dɪˈnɒtətɪv]; see pp. 164-166; cf. connote.

diaphony [daɪˈæfənɪ] n. two-part vocal harmony typically featuring semitone dyads considered discordant in Western theories of harmony; often used to denote traditions of female singing in rural Bulgaria; etym. Ancient Greek διαφωνία (diafonía = discord) as opposed to συμφωνία (symfonía = concord); adj. diaphonic [daɪəˈfɒnik].

diataxis [daɪəˈtæksɪs] n. mus. neol. (2011) arrangement / disposition / order of musical episodes in terms of chronological placement and relative importance (p. 383, ff.); in contradistinction to syncrisis (q.v.); etym. διάταξις = disposition, arrangement, order of events, running order, order of service, etc., as of processions, prayers, chants, bible readings, sacraments, and other ‘episodes’ in Byzantine Orthodox liturgy; adj. diatactical [daɪəˈtæktɪkəl]; deriv. n. diataxeme [daɪəˈtæksiːm] identifiable element of diatactical meaning; see also episode, episodic determinant, episodic marker and syntax.

diatonic adj. conforming to the heptatonic tonal vocabulary of any of the European ‘church modes’ in which each constituent note is in English named after one of the first seven letters of the alphabet, for example a b c d e f g (aeolian in A), d e f# g a b c# (ionian in D), g a♭ b♭ c d e♭ f (phrygian in G). Arranged in scalar form, all diatonic modes contain five whole-tone (1) and two semitone steps (½), e.g. c d (1), d e (1), e f (½), f g (1), g a (1), a b (1) and b c (½) in C ionian. Semitone steps in European diatonic modes are separated by a fifth (e.g. e-f and b-c on the white notes of a piano keyboard).


disjunct motion n. ph. mus. melodic movement containing large intervallic steps; opposite of conjunct motion.

doo-wop n. primarily vocal genre with origins in black US gospel of the 1940s and in barber shop quartet singing. Originally sung a cappella or with simple percussion, doo-wop became part of US-mainstream pop in the 1950s and early 1960s. The term’s etymology is onomatopoeic (like fa la la la in Elizabethan madrigals), deriving from the style’s use of paralinguistic syllables vocalising approximations of instrumental accompaniment patterns, e.g. The Marcels’ version of Blue Moon (1961), Barry Mann’s Who Put The Bomp (1961).

dorian mode > church modes.

drone n. mus. continuous or frequently sounded note[s] of the same pitch. Drones are often used as tonal reference point and background for the changing pitch of the music’s other strands; see also p. 337.

dual consciousness n. perception of the self as having two conflicting identities. Fanon (1952) referred specifically to the two different cultural identities of the colonised individual in relation to [1] colonisers and [2] colonised peers. I’ve taken the liberty of using the expression to denote the widespread phenomenon of dual consciousness involving the private and public identities of an individual (p. 2, ff.).

dyad n. chord consisting of two notes of different pitch.

episode n. mus. passage containing distinct material as part of a larger sequence of events in a piece of music.

episodic determinant n. neol. (2011) sign type determining the identification of a musical passage as an episode; episodic determinants are essential to the understanding of musical diataxis, i.e. the order, placement, disposition and duration of episodes (passages, periods, sections, etc.) in a piece of music; see also episodic marker.

episodic marker n. neol. (1990) musical sign type consisting of a short processual structure mediating temporal position or relative importance (see p. 516, ff.); see also diataxis and episodic determinant.

equidurational adj. neol. (2000) of equal duration; lasting for the same amount of time.

extended present n. ph., a.k.a. present-time experience. In music, the extended present is a duration roughly equivalent to no more than that of a musical phrase (exhalation), or to a few footsteps, or a short gestural pattern, or a few heartbeats; i.e. a duration experienced as a single unit (Gestalt) in present time, as ‘now’ rather than as an extended sequence of musical ideas (see p. 272, ff.; see also intensional, phonological loop, syncrisis). The extended present can be imagined as the human brain’s equivalent to a computer’s RAM, where information is processed immediately, rather than to its hard drive (longer-term memory), where access and retrieval times are longer.

extensional adj. (Chester, 1970) relating to ‘horizontal’ and diatactical aspects of musical expression extended over longer durations; opposite of intensional.

falsetto n., adj. vocal phonation distinct from that of ‘normal’ singing or speaking; it covers a pitch range extending from the upper end of the head register to considerably higher pitches; falsetto singing produces a characteristically high, ‘clean’ and flute-like timbre.

fill n. mus. (e.g. ‘guitar fill’, ‘drum fill’) short melodic and/or rhythmic phrase heard in the gap between two longer melodic phrases presented on [an]other instrument[s] or by [an]other voice[s]. A fill can sometimes overlap momentarily with the longer phrase preceding and/or following it (elision); cf. lick, riff, turnaround.

Foley n. (c. 1930) sound effect recorded to synchronise with (usually visible) on-screen events (e.g. footsteps, door shutting, clothes rustling); etym. Jack Foley, sound effects specialist in the early days of talking film; pl. Foleys.

generic annexing n. ph. neol. (2011) process whereby a verbalised response to music (> VVA) derives not from a simple structural link to other music (> IOCM) and its connotations (> PMFC) but from generically typical connotations of those primary connotations, e.g. hearing a lyrical extract from a euroclassical piano concerto and responding ‘Timotei shampoo’, not because the advert uses such music but because the respondent has seen a woman strolling through the long grass of a summer meadow in connection with another piece of music resembling the lyrical extract. That extract may have resembled the second movement of Mozart’s 21st piano concerto (1785b), which underscored the summer meadow scene in Elvira Madigan (1967), which visually, not musically, resembles the Timotei shampoo advert (see also p. 221, ff.).

genre n., mus. [ˈʒɑːnrə] set of norms, rules or habits that ‘members of a given community find useful in identifying a given set of musical and music-related practices… Genre rules can relate to any of the codes involved in a musical event, including rules of behaviour,… proxemic and kinesic codes, business practices, etc.’ (Fabbri, 2005: 8-9); cf. style; see also pp. 266-268.

genre synecdoche [ˈʒɑnrə sɪnˈɛkdəkɪ] n. ph. mus. neol. (1992) part-for-whole sign type referring to a musical style other than that of its immediate surroundings and, by extension, to paramusical or extramusical aspects of the genre with which that ‘other’ musical style is associated; see also genre, style, synecdoche.

gestural interconversion n. ph. mus. neol. (c. 2000) anaphonic type of semiosis involving transmodal connotation in a two-way transfer via a commonality of gesture between, on the one hand, particular sensations that seem to be both subjective and internal, and, on the other hand, particular external objects (animate or inanimate) in the material world; for example, gently undulating, legato sonorities in the music compatible with smooth, rounded, caressing gestures projectable on to either a loved one, or on to rolling hills, or a cornfield billowing in the breeze (one way); and (in the other direction) the rolling hills or cornfield traceable by the hand in smooth, round, caressing hand gestures compatible with gently undulating, legato sonorities.

gospel jaw [ˈgɒspəldʒoː] n. ph. mus. vocal technique used primarily by female singers in the gospel and soul music tradition to simulate real vocal vibrato. The simulation, produced by wobbling the jaw rapidly up and down, is often applied towards the end of long notes by such artists as Whitney Houston.

graphocentric [græfəʊˈsɛntrɪk] adj. neol. (J-J Nattiez in conversations with the author, c. 2005) assuming written or other graphic signs to be more important than others (see logocentric and scopocentric).

groove n. mus. sense of gross-motoric movement produced by one or more simultaneously sounded rhythm patterns lasting, as single units, no longer than the extended present, and repeated throughout a musical episode or piece. Most commonly used in reference to the perception of continuous propulsion created, typically for dancing, by the interaction of musicians in a band’s rhythm section or its accompanying parts, groove can also denote other types of perceived gross-motoric movement, as in work songs and marches. See p. 296, ff. for more.

hemiola n. mus., etym. Gk. adj. ἡμιόλιος (= ‘half as much again’); sextuple metric pattern created when the same short duration (six subbeats) is divided in two different ways, one based on its division into two groups of three subbeats (2 × 3), the other based on its division into three groups of two (3 × 2), for example ‘I wanna be in America’ from West Side Story (Bernstein, 1957) sung as ‘I wanna | be in A-’ (2 groups of 3 subbeats each) ‘|-me-|ri-|ca’ (3 × 2 subbeats); see also p. 457, ff.
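The two ways of grouping the same six subbeats can be illustrated with a small Python sketch. The function name accents is hypothetical; it simply lists the subbeats (counted from 1) on which a new group starts.

```python
def accents(group_size, total=6):
    """Return the 1-based subbeat positions that start a group of `group_size` subbeats."""
    return [pos for pos in range(1, total + 1) if (pos - 1) % group_size == 0]


two_groups_of_three = accents(3)  # 2 x 3, as in 'I wanna | be in A-'
three_groups_of_two = accents(2)  # 3 x 2, as in '-me- | ri- | ca'

print(two_groups_of_three)  # [1, 4]
print(three_groups_of_two)  # [1, 3, 5]
```

Both groupings fill the same six subbeats; only the first subbeat is accented in both.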

hemitonic adj. mus. (of modes or scales) containing one or more semitone intervals within the octave.

heptatonic adj. mus. (of modes or scales) having a tonal vocabulary of seven different notes within the octave. A heptatonic mode could contain any combination of different notes, but Western music’s familiar heptatonic modes all contain a note based on each of the first seven letters of the alphabet, e.g. a b c d e f g (aeolian heptatonic in A), d e f# g a b c# (ionian heptatonic in D), g a♭ b♭ c d e♭ f (phrygian heptatonic in G); see also diatonic, pentatonic, hexatonic.

heterophony n. mus. etym. Gk. ἕτερος (héteros = other) and φωνή (fonē = sound); polyphony resulting from simultaneous differences of pitch produced when two or more people sing or play roughly the same melodic line at the same time.

hexatonic adj. mus. (of scales and modes) containing six different tones inside each octave (cf. pentatonic, heptatonic).

high-heeled sax: see sexaphone.

hit point n. point in an audiovisual production at which a particular musical event synchronises with a particular visual event inside a cue; not to be confused with cue point.

holokinetic [hɒləʊkaɪˈnɛtɪk] adj. neol. (2011) relating to or characterised by all aspects of movement.

homophony [hɒˈmɒfənɪ] n. mus., etym. Gk. homófonos (= sounding in unison or at the same time); type of polyphony in which different strands of the music move in the same rhythm at the same time; polyphonic antithesis of counterpoint; adj. homophonic [hɒməˈfɒnɪk].

hook n. mus. the most ear-catching and memorable museme[s] in a popular song.

IASPM [ˈaɪjɛɪjɛspiˈjɛm] or [ˈjæspəm] n. abbr. International Association for the Study of Popular Music.

icon n. semio. sign bearing physical resemblance to what it signifies (see p. 161, ff.).

IFPI [ˈɪfpiː] abbr. International Federation of the Phonographic Industry.

index (pl. indices) n. semio. sign connected either by causality, or by spatial, temporal or cultural proximity, to what it signifies (p. 162, ff.).

intensional adj. (Chester, 1970) relating to ‘vertical’ aspects of musical expression and to the limits of the extended present; opposite of extensional; see also syncrisis.

interval n. pitch difference between two tones; adj. intervallic.

IOCM [aɪəʊsɪˈjɛm] abbr., n., neol. (1979) Interobjective Comparison Material: musical intertext[s], i.e. music other than the analysis object which bears sonic resemblance to (i.e. sounds like part or parts of) the analysis object.

IPM [aɪpiːˈjɛm] n. abbr. Institute of Popular Music (Univ. of Liverpool).

kinetic anaphone n. neol. (1990) type of anaphone relating musical structure with perception of movement (p. 498, ff.).

lexical adj. relating to the words of a language rather than to its grammar, syntax, style or prosody and to the denotative rather than connotative meaning of those words.

library music n. a.k.a. production music; music, mostly instrumental, prerecorded and typically used in TV or radio programming, in adverts and low-budget films. Library music differs from music commissioned for particular audiovisual productions in that it’s created and recorded in advance without prior knowledge of any specific audiovisual production in which it might later be used (see p. 222, ff.).

lick n. mus. ‘a stock pattern or phrase consisting of a short series of notes that is used in solos and melodic lines and accompaniment’. Licks often occur in fills and riffs, and are often used as a basis for melodic improvisation in solo passages.

LMR abbr. List of Musical References. See LRR.

logocentric adj. assuming, often implicitly, that the semiotic properties of (verbal) language apply to other symbolic systems.

logogenic adj. having properties that can adequately be put into words; conducive to verbal expression (etym. λόγος: word; γένος: type); deriv. abstr. n. logogeneity [lɒgəʊdʒəˈniːətɪ]; cf. musogenic.

loop n. mus. [1] (a) originally, a strip of recording tape whose start is attached to its end and which, when played, repeats continuously; (b) by extension, a short audio or video file whose content can be repeated continuously; [2] > chord loop (short sequence of chords repeated continuously).

lydian mode > church modes.

madrigalism n. mus. [1] vocally performed word painting; [2] any occurrence of word painting (vocal or instrumental).

measure n. mus. US-American for bar (q.v.).

Melodyne n. name of digital audio pitch correction software produced by Celemony Software GmbH (München) > autotune.

meta-identity n. image of yourself that you think others have of you. It’s an identity that does not necessarily correspond with how you see yourself or with how others in fact see you.

metre n. mus. see pp. 293-296.

mic [maɪk] n. abbr. (1961, M-W) microphone; see also mike (v.).


mickey-mousing n. mus. film music technique featuring anaphonic reinforcement of on-screen action, typically patterns of speech and other types of vocalisation (e.g. laughing, crying, teasing), but also of movement (e.g. running, galloping, flying), and sound effects (e.g. door slamming, birds singing). Mickey-mousing is often criticised for its apparently redundant duplication of on-screen rhythms and patterns of movement; see also ‘Lissa’s film music functions’, pp. 546-550.

middle eight n. ph. mus. UK English term for bridge [1] (q.v.).

MIDI [ˈmɪdi] n., adj. abbr. Musical Instrument Digital Interface, the music industry’s universal protocol enabling the interconnection of electronic instruments and devices. MIDI neither generates nor transmits audio, whether digital or analogue. MIDI code includes the following sort of data about each note: [1] which sample, ‘instrument’, preset or other type of sound should be used to produce the note in question; [2] the pitch at which the note should sound (or, if [1] is a bank of non-tonal sounds, the individual sound assigned to that ‘pitch’); [3] the volume/intensity of the note (‘velocity on’); [4] the points in time at which the note should start and end.
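As a rough illustration of the four kinds of note data listed above, the Python sketch below builds the raw bytes of three MIDI 1.0 channel messages. The function names are hypothetical; the status bytes (0xC0 Program Change, 0x90 Note On, 0x80 Note Off) are those of the MIDI 1.0 specification. A note's start and end correspond to when its Note On and Note Off messages are sent, so timing is not part of the bytes themselves.

```python
def _check(value, upper, label):
    """Reject values outside the range 0..upper."""
    if not 0 <= value <= upper:
        raise ValueError(f"{label} must be between 0 and {upper}")


def program_change(channel, program):
    """[1] Select the preset or 'instrument' used on a channel (0-15)."""
    _check(channel, 15, "channel")
    _check(program, 127, "program")
    return bytes([0xC0 | channel, program])


def note_on(channel, note, velocity):
    """[2] pitch as a note number and [3] intensity as velocity; sending it starts the note."""
    _check(channel, 15, "channel")
    _check(note, 127, "note")
    _check(velocity, 127, "velocity")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """[4] Sending this message ends the note started by the matching Note On."""
    _check(channel, 15, "channel")
    _check(note, 127, "note")
    return bytes([0x80 | channel, note, 0])


# Note number 69 is the a above middle c (440 Hz in standard tuning).
print(note_on(0, 69, 100).hex(" "))
print(note_off(0, 69).hex(" "))
```

None of these messages contains audio: they only tell a receiving device what to play and when.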

mike [maɪk] v. abbr. (1939, M-W) to supply with a microphone; to position a microphone of a particular type in relation to a sound source: miking [ˈmaɪkɪŋ], miked [maɪkt]; occasionally also as n. (see mic).

mixolydian mode > church modes.

mode n. mus. distillation of a tonal vocabulary to individual occurrences of each tone used within an octave and to the relationship of those tones to each other and, in particular, to one reference tone (the tonic) or, if bimodal, to two. For more detail, see church modes.

monody [ˈmɒnədɪ] n. music consisting of a single vocal line, or of a single melodic line with instrumental accompaniment; adj. monodic [məˈnɒdɪk]; cf. monophony.

monophony [məˈnɒfənɪ] n. music consisting of one single strand, of only one note at a time; often used in reference to unaccompanied melody (cf. monody, heterophony, homophony, polyphony); adj. monophonic [mɒnəʊˈfɒnik].

morpheme n. ling. minimal unit of speech that is recurrent and meaningful; a linguistic form that is not further divisible without destruction of meaning, for example (in English) an, at, cat, sat, mat, man, van, lychee, banana. Morphemes consist of at least one, most commonly of several phonemes (q.v.).

ms. n. abbr. [1] milliseconds; [2] manuscript.

musematic [mjuːzɪˈmætɪk] adj. (of musical structure) carrying musical meaning; having the characteristics of a museme, museme stack or museme string.

museme [ˈmjuːziːm] n. (Seeger, 1960) minimal unit of musical meaning; see pp. 232-238.

museme stack n. neol. (1979) compound of simultaneously occurring musical sounds to produce one meaningful unit of ‘now sound’ (see extended present and syncrisis); components of a museme stack may or may not be musematic in themselves.

museme string n. neol. (1979) compound of consecutively occurring musemes in one strand of music.

music editor n. audiovisual production worker responsible for timing, organising and managing music cues; liaises between director, producer and composer.

music-led montage n. neol. (2010) audiovisual footage in which visuals are edited to fit music rather than vice versa. Music-led montage is typical for music videos and is common in title sequences.

muso [ˈmjuːzəʊ] n. colloq. musician or musicologist, more specifically someone who devotes a lot of time and energy to making or talking about music, especially its technical, structural and poïetic aspects; someone with either formal training in music, or who makes music on a professional or semi-professional basis.

musogenic [ˈmjuːzəʊˈdʒɛnɪk] adj. having properties that can adequately be put into music; conducive to musical expression; cf. logogenic.

muso music n. ph. colloq. neol. (c. 1988) music most of whose devotees are musos, e.g. avant-garde types of prog rock, jazz fusion.

non-muso n. colloq. someone not exhibiting muso characteristics.

note n. mus. any single, discrete sound of finite duration in a piece of music (cf. tone).

npm abbr. neol. (2011) notes per minute —unit of measurement for surface rate and subbeats (cf. bpm and see p. 289).
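As a minimal worked example, and assuming a surface rate that divides each beat evenly (the full definition is on p. 289), npm can be derived from bpm as follows. The function name notes_per_minute is hypothetical.

```python
def notes_per_minute(bpm, notes_per_beat):
    """Surface rate in npm, assuming every beat is divided into the same number of notes."""
    return bpm * notes_per_beat


# Four evenly spaced notes per beat at 120 bpm give a surface rate of 480 npm.
print(notes_per_minute(120, 4))
```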

octave n. mus. interval of eight tones between notes of the same name at a pitch separated by a frequency factor of two, e.g. a3 at 220 Hz, a4 at 440 Hz, a5 at 880 Hz.
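The frequency factor of two can be checked with a one-line Python function. The name octave_shift is hypothetical; the reference pitch of 440 Hz for a4 is taken from the entry above.

```python
def octave_shift(frequency_hz, octaves):
    """Frequency reached by moving up (positive) or down (negative) whole octaves."""
    return frequency_hz * 2 ** octaves


a4 = 440
print(octave_shift(a4, -1))  # a3
print(octave_shift(a4, 1))   # a5
```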

nouba [ˈnuːba] n. mus. erudite type of traditional instrumental music from Tunisia and other parts of the Arab world.

NTSC n. abbr. National Television System Committee; also the video scanning and recording system used in North and Central America, and consisting of 29.97 interlaced frames per second in which each frame consists of 525 scan lines, of which 486 cover the actual picture. cf. PAL and SÉCAM.

P.A. n. abbr. Public Address, as in ‘PA-system’ whereby a speaker can make announcements that are amplified and relayed over multiple speakers in a large venue, to the public (see also ftnt. 37, p. 366).

PAL n. abbr. Phase Alternate Line; [1] analogue television encoding system used throughout the world except in those areas where NTSC or SÉCAM is in operation; [2] scanning and recording standard running at a rate of 25 frames per second with 625 scan lines per frame.

parallel motion n. ph. mus. movement of two or more strands (parts/voices) at different pitches in the same pitch direction; opposite of contrary motion.

paramusical adj. neol. (1983) literally ‘alongside’ the music, i.e. semiotically related to a particular musical discourse without being structurally intrinsic to that discourse; see also PMFC.

parlando adj./adv. mus. spoken rather than sung.

pentatonic adj. mus. (of scales and modes) containing five different tones inside each octave (cf. diatonic, hexatonic, heptatonic).

perceptional See aesthesic.

performance squad n. ph. neol. (2012) musical ensemble consisting of between three and seven or eight members; any type of trio, quartet, quintet, sextet or septet consisting of singers and/or instrumentalists with shared musical background; see page 468.

phoneme n. ling. smallest constituent unit of sound used to construct meaning in speech (cf. morpheme). Ten different phonemes are used in UK English to construct the 24 morphemes ə, æn, æt, hæt, kæt, væt, kæn, væn, hɛn, kɛn, ɪt, hɪt, kɪt, ɒn, kɒn, hɒt, kɒt, hʌn, hʌt, kʌt, ɑː, kɑː, bəˈnɑːnə, nɪˈvaːnə (= a, an, at, hat, cat, vat, can, van, hen, ken, it, hit, kit, on, con, hot, cot, Hun, hut, cut, are, car, banana, Nirvana). The ten phonemes are /æ/ /k/ and /t/ as in cat, /h/ and /ɛ/ as in hen, /ə/ as in a (indefinite article) or the unstressed a-s in banana and Nirvana, /ɑː/ as in car and the stressed middle a in banana or Nirvana, /ɪ/ as in hit, /ɒ/ as in hot, and /ʌ/ as in hut. Only one of the phonemes can also work as a morpheme: the phoneme /ə/, as in [bəˈnɑːnə], becomes also a morpheme when used as the unstressed indefinite article a [ə], as in [ə bəˈnɑːnə] (a banana), which has a completely different meaning to uː bəˈnɑːnə, ɑː bəˈnɑːnə, əʊ bəˈnɑːnə and ðə bəˈnɑːnə, i.e. Ooh! Banana! (nice surprise), Ah! Banana! (I understand!), O banana! (vocative, addressing the fruit, as people so often do) and the banana (not just any old banana).

phonogram n. physical object on to which sound has been recorded acoustically, electro-acoustically or digitally; sound carrier usually sold as a commodity and which can be played on stand-alone audio equipment, e.g. LP, CD, MiniDisc, audiocassette but not audio files or sheet music; see also text.

phonographic staging n. ph. (Lacasse, 2005) > aural staging.

phonological loop n. ph. neurol. short-term (c. 2 seconds), ongoing mini-chunk of audio information inside the brain’s working memory that can be instantly recalled and strung together with up to three others in immediate succession to produce a larger chunk of ‘now sound’; see also extended present.

piece of music n. ph. musical continuum delimited, both before and after, by something that is not heard as music (e.g. silence, talking, other sound). A piece of music can also start or end when immediately preceded or followed by other music that is clearly recognised as having a different identity. If a piece of music exists as recorded sound, it will typically occupy one CD track or constitute a single audio file.

phrygian mode > church modes.

pitch n. mus. the perceived ‘height’ or ‘lowness’ of a sound, measurable in terms of high or low frequency (Hertz).

PMFC [piːɛmɛfˈsiː] abbr., neol., n. (1991) paramusical field of connotation, i.e. connotatively identifiable semantic field relating to identifiable (sets of) musical structure(s) (see paramusical).

poïetic [pɒˈjɛtɪk] adj. (from Fr. poïétique, Molino via Nattiez; etym. Gk. ποιητικός (≈ productive)): relating to poïesis [pɒˈjiːsɪs], i.e. to the making of music rather than to its perception (a.k.a. constructional). The opposite of aesthesic, poïetic qualifies the denotation of musical structures from the standpoint of their construction rather than their perception, e.g. con sordino, minor major-seven chord, augmented fourth, pentatonicism, etc. rather than delicate, detective chord, allegro, etc.

polyphony n. [pəˈlɪfənɪ] etym. Gk. πολύ (polý = many) and φωνή (fonē = sound); music in which at least two sounds of clearly differing pitch, timbre or mode of articulation occur at the same time; adj. polyphonic [pɒlɪˈfɒnɪk]. Warning: some scholars of conventional musicology use polyphony to refer solely to contrapuntal tonal polyphony of the type used by certain European composers between c.1400 and c.1650.

polysemic [pɒlɪˈsiːmɪk] adj. having many meanings; n. polysemy [pɒˈlɪsəmiː].

pomo [ˈpəʊməʊ] n. & adj. abbr. neol. colloq. derog. postmodern, postmodernism, postmodernist, postmodernising.

pomorockology [pəʊməʊrɒˈkɒlədʒi] n. neol. (2002) tradition of rock criticism and journalism influenced by the discourse of literary criticism and celebrating the supposedly liberating aspects of rock music without considering matters of musical structuration or meaning.

pragmatics n. branch of semiotics focusing on the use of a sign system in concrete situations, especially in terms of cultural, ideological, economic and social activity.

present-time experience: see extended present.

production music: see library music.

prosody [ˈprəʊzədiː] n. ling. the rhythm, speed, accentuation, intensity, intonation, etc. of speech; i.e. the ‘musical’ rather than the lexical-semantic aspects of speech; adj. prosodic [prəˈzɒdɪk]; adv. prosodically [prəˈzɒdɪklɪ].

quena [ˈkeːna] n. flute (flauto diritto) used in traditional Andean music.

quodlibet [kwɒdlɪˈbɛt] n. musical piece or episode ‘combining several different melodies, usually popular tunes, in counterpoint’; etym. Lat. ‘what pleases’, i.e. ‘whatever you please’.

rec. n., v., abbr. recording, recorded by.

receptional adj., neol. (2001) See aesthesic.

recitative [rɛsɪtəˈtiːv] n. mus., fr. It. recitativo [retʃɪtaˈtiːvo] type of vocal delivery in which pitches are tonal (melodic) but whose rhythms are much closer to those of speech than to those of metric song.

refrain [rɪˈfrɛɪn] n. mus. recurring chorus episode in a piece of music.

reification [riːjɪfɪˈkɛɪʃən] n. process of alienation whereby human relations, actions and ideas are understood as objects or things and, in an ideological environment dominated by capital and quantification, the inverse process whereby objects assume (e.g. through ‘advertising’) a subjective, abstract value as ideas, as signs of human interaction (commodity fetishism); see Marx (1859), Lukács (1920), Pérec (1965) and Petrović (1983).

responsorial [rɛspɒnˈsoːrɪəl] adj. mus. characterised by exchange of musical material between different participants making music together. One familiar type of responsorial practice is that between a lead (solo) singer or instrumentalist and a group of singers (choir, backing vocalists) or instrumentalists (tutti), another between singer[s] and instruments, a third between a solo singer and a particular instrument. Exchange of material involving only groups of participants is likely to constitute antiphony. When a solo or lead singer is answered by backing vocalists or by a solo instrument the practice is often called call and response. Although call and response techniques are particularly common in Sub-Saharan and African-American traditions, they have a long history in other parts of the world and occur in many different music cultures, for example as responsories in the Benedictine Divine Office, or as the sawal-jawab (= ‘question-answer’) in Indian rāga music.

rhapsody n. mus. a piece of music, or part thereof, in relatively free form, often of an improvisatory character; adj. rhapsodic.

riff n. short, repeated pattern of notes with pronounced rhythmic-melodic profile lasting no longer than a musical phrase, usually less. Similar to the euroclassical notion of ostinato, riffs are particularly common in rock music, in big band and jump music, and in many types of Latin-American music; e.g. Boléro (Ravel, 1928), In The Mood (Miller, 1940), Choo-Choo Ch’Boogie (Jordan, 1946), Satisfaction (Rolling Stones, 1965), Malandro (Buarque, 1985), Tim Pop con Birdland (Van Van, 2002). Riffs are often key elements in the production of groove; see also fill, lick.

rock n. and attrib. adj. (qualifying ‘music’); a wide range of popular and mainly, though not exclusively, English-language musics produced since the mid 1950s for a primarily youth audience, more usually male than female. Rock spans everything from prog rock (e.g. Genesis) to country rock (e.g. Byrds), from punk rock (e.g. Sex Pistols) to folk rock (e.g. Steeleye Span) and from heavy metal (e.g. Led Zeppelin) through thrash (e.g. Metallica) to death and speed metal (e.g. Slayer). It’s well-nigh impossible to pinpoint stylistic common denominators for such a wide range of musics, apart from the fact that the music is usually loud and its tonal instruments electrically amplified. The heyday of rock lasted from the mid 1960s to the 1990s and its musicians are mainly, though not exclusively, male. Fun, anger, opposition and corporeal celebration (‘kick-ass’) are aesthetic concepts frequently linked to rock.

rock and roll — basically synonymous with rock; cf. rock ’n’ roll.

rockology n. derog. neol. (1994) academic study, with value-aesthetic agenda, of rock music; see also pomorockology.

rock ’n’ roll n. is a much more restrictive term than rock or rock and roll; it denotes rock music produced in the 1950s and early 1960s by such artists as Chuck Berry, Bill Haley, Little Richard, Jerry Lee Lewis and Elvis Presley.

scale n. mus. single occurrences of different tones in a mode presented in strict ascending or descending order of pitch; adj. scalar.

scopocentric [skɒpəʊˈsɛntrɪk] adj. neol. (Bruce Johnson, c. 1994) assuming, usually implicitly, other types of expression than visual to be of lesser importance (cf. logocentric, graphocentric, scribal).

scribal [ˈskraɪbəl] adj. [1] orig. of or relating to a scribe (1857, M-W); [2] relating to written rather than to oral/aural symbols (cf. logocentric, logogenic, graphocentric, musogenic, scopocentric).

SÉCAM n. abbr. = Séquentiel couleur à mémoire, Europe’s first colour TV standard; analogue TV scanning and recording system used mainly in France, the ex-USSR, and in France’s ex-colonies. cf. NTSC, PAL.

semantics n. branch of semiotics focusing on the relationship between signs and what they represent; adj. semantic; cf. syntax, pragmatics.

semiology n. term used in some language cultures, for example sémiologie (francophone) and semiología (hispanophone), to denote basically the same thing as semiotics (see p. 159, ff.).

semiosis n. activity or process involving signs and the production of meaning (see p. 156, ff.).

semiotics n. the study of semiosis, i.e. of processes involving the production of signs, their formal characteristics, their intended and perceived meanings, etc.

semitone n. interval of 100 cents, or one twelfth of an octave, i.e. a pitch difference equivalent to that between the tone produced by a black key and its immediately adjacent white key on a piano keyboard, or of that between neighbouring frets on a guitar.
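The relation between cents, semitones and octaves can be sketched in Python. The sketch assumes twelve-tone equal temperament, in which every semitone has the same frequency ratio; the function names cents and semitone_up are hypothetical.

```python
import math


def cents(f1_hz, f2_hz):
    """Size of the interval from f1 to f2 in cents (1200 cents = one octave)."""
    return 1200 * math.log2(f2_hz / f1_hz)


def semitone_up(frequency_hz, semitones=1):
    """Frequency a given number of equal-tempered semitones above the starting frequency."""
    return frequency_hz * 2 ** (semitones / 12)


a4 = 440.0
print(round(semitone_up(a4), 2))           # frequency one semitone above a4
print(round(cents(a4, semitone_up(a4))))   # size of that step in cents
print(round(cents(a4, 2 * a4)))            # size of an octave in cents
```

Twelve such steps multiply the frequency by exactly two, which is why the semitone counts as one twelfth of an octave.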

senza misura adv. mus. lit. without measure, without bar line, i.e. without regular metre.

set piece n. in film music contexts a type of diegetic (source) music in which the actual musical performance is prominently visible on screen as a central part of narrative ‘reality’.

sexaphone n., a.k.a. high-heeled sax, media trope consisting of short, jazzy, legato phrases on (usually alto) saxophone to underscore sexual potential in stage or on-screen narrative; see e.g. ‘What is Kenny G doing in everyone’s bedroom?’. See footnote 6, p. 307, for more details.

shuttle > chord shuttle.

singalong [ˈsɪngəlɔŋ] n. a tune or passage to which, when performed, it’s easy for members of an audience to sing along; in general a tune easily sung by many people, or an occasion on which such tunes are performed (e.g. ‘Friday night singalongs at the old people’s home’); adj. attrib., e.g. ‘a singalong evening with pianist Fred Bloggs’ or ‘the singalong part of the recording’.

SMPTE [ˈsɪmptɪ] n. ph. abbr. the (US) Society of Motion Picture and Television Engineers; often used as oral shorthand for SMPTE code, i.e. the Society’s standard timecode system used in audiovisual production and according to which passing time is given in hours, minutes, seconds and frames, e.g. ‘01:09:50;12’ for a point at which one hour, nine minutes, fifty seconds and twelve frames have elapsed since the start of the production at 00:00:00;00.
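
As a minimal sketch of how such a timecode breaks down, the example below splits the string into its four fields and converts it to a nominal frame count. The names `Timecode`, `parse_timecode` and `nominal_frame_count` are hypothetical, the default of 30 frames per second is an assumption (the entry does not state a frame rate), and no drop-frame compensation is attempted.

```python
import re
from typing import NamedTuple


class Timecode(NamedTuple):
    hours: int
    minutes: int
    seconds: int
    frames: int


def parse_timecode(text: str) -> Timecode:
    """Split 'hh:mm:ss;ff' (or 'hh:mm:ss:ff') into its four numeric fields."""
    match = re.fullmatch(r"(\d{2}):(\d{2}):(\d{2})[:;](\d{2})", text)
    if match is None:
        raise ValueError(f"not an hh:mm:ss;ff timecode: {text!r}")
    return Timecode(*(int(part) for part in match.groups()))


def nominal_frame_count(tc: Timecode, fps: int = 30) -> int:
    """Frames elapsed since 00:00:00;00, assuming exactly `fps` frames per second."""
    whole_seconds = (tc.hours * 60 + tc.minutes) * 60 + tc.seconds
    return whole_seconds * fps + tc.frames


tc = parse_timecode("01:09:50;12")
```

Here `tc` holds one hour, nine minutes, fifty seconds and twelve frames, matching the reading given in the entry.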

sonic anaphone n. neol. (1990) type of anaphone relating musical structure with para- or extramusical sound (p. 487, ff.).

spotting session n. preparatory stage in composing for movies: director and composer discuss what sort of music should be used at which points in the production (‘cue spotting’).

squad > performance squad.


stand-alone equipment n. electrically powered apparatus for playback and/or recording, without need of a computer connection, of audio or audiovisual material using external media carriers, e.g. record turntable, Walkman, VCR, CD player, MiniDisc player, DVD player.

strand n. mus. single thread of sound with identifiable traits (timbre, rhythm, register, pitch contour) distinguishing it from other simultaneously sounding strands or sonorities in the music; a.k.a. line (e.g. ‘melodic line’, ‘bass line’), part (e.g. ‘oboe part’, ‘four-part harmony’), voice (e.g. ‘madrigal for five voices’, ‘harmonic voicing’), stream (Lacasse, 2000). Each musical strand is usually assigned its own track in the processes of audio recording and mixing.

stringalong; see Charity stringalong.

style (musical) n. use of musical materials typical of an individual (composer, performer), or of a group of musicians, or of a genre, a place, a culture, a historical period, etc. See pp. 266-268 and Fabbri (2005: 8-9).

subbeat n. mus. unit resulting from division by either 2 or 3 of a beat into equal durations; for example, the arrangement of subbeats in a bar of 6/8 time can be: [1] 1 × 6 subbeats (one dotted minim = six quavers); [2] 2 × 3 subbeats (two dotted crotchets = two groups of three quavers); [3] 3 × 2 subbeats (three crotchets = three groups of two quavers) (see also hemiola).

sync [sɪŋk] (abbr; 1945, M-W) [1] v. synchronise; sync-ing [ˈsɪŋkɪŋ] (pres. particip.), sync-ed [sɪŋkt] (past); [2] n. synchronisation.

syncrisis [ˈsɪnkrɪsɪs] n. mus. neol. (2012) musical form in terms of the aggregation of several simultaneously ongoing sounds perceptible as a combined whole inside the limits of the extended present (> museme stack), as distinct from diataxis (q.v.); etym. σύγκρισις = a putting together, aggregate, combination, from συγκρίνω = to combine, compound, put together.

syntax n. etym. σύνταξις = order, array [1] (general) the study of principles and rules for constructing ‘texts’, including written or spoken language, musical works, recordings, etc; [2] branch of semiotics focusing on the formal relationship of signs to each other without necessarily considering their meaning; [3] mus. ordering of events in sequence rather than simultaneously, particularly inside a phrase but also inside an episode (motifs, phrases, harmonic progressions etc.). The ordering of episodes throughout a whole piece of music into an overall sequence (‘long-term syntax’) is referred to as diataxis.

tactile anaphone n. neol. (1990) type of anaphone relating musical structure with the sense of touch (p. 494, ff.).

tail n. mus. snippet of film music, often after a change of scene or at the end of a bridge, that sets the mood of the new scene and tails off, often on an unresolved sonority signalling that the dramatic narrative will continue with acoustic space clear for dialogue and sound effects.

temp track (a.k.a. temp music, temp score, scratch score) n. existing music added to an audiovisual production during the editing phase; used [1] to test a film on audience focus groups and on production executives, [2] to give the soundtrack composer an idea of the sort of music the director envisages at various points in the production.

text n. mus. piece of music whose sounds are physically fixed in stored form but which, when repeated identically, are rarely heard (‘read’) in the same way as either originally intended or as heard by previous audiences. Although sheet music resembles verbal text in that it is a visual medium, its identification as musical text is questionable since it has to be put into sound, made into music (‘interpreted’) by performers: its mode of storage is no more than the visual representation of certain aspects of the music’s actual sounds. Most sound recordings can, on the other hand, be considered as musical texts.

timing notes see breakdown notes.

title music n. generic term denoting music conceived for an audiovisual production’s title sequences (or credits), at or near the start (the main or opening titles) and/or at the end of the film or programme (end titles).

TLTT abbr. = Ten Little Title Tunes; 914-page source book so often referred to that its title is abbreviated to save space; see Tagg & Clarida (2003) and explanations in the preface (p. 17, ff.).

tonal adj. mus. having the properties of a tone or tones.

tonality n. mus. system according to which tones are arranged and used.

tonatim [təʊˈnɛɪtɪm] adv., neol. (1992) tone for tone (etym. tone + [verb]atim).

tone n. mus. note with discernible fundamental pitch.

tonic n. mus. main reference tone in any mode or key, usually numbered ‘1’ (or ‘I’ if designating a tonic chord) in scalar sequence; adj. tonical = tonal music featuring a tonic.

transcendence n. (relig.) ‘the aspect of God's nature and power which is [imagined as] wholly independent of (and removed from) the physical universe’ (Wikipedia); (gen.) any power or quality experienced as independent of or disconnected from the material world; adj. transcendent: extending beyond the limits of ordinary experience (Kant).

transmodal [trænsˈməʊdəl] adj. crossing from one sensory mode to another, e.g. ‘loud colours’, ‘meaty guitar sound’, or as in gestural interconversion; see also under ‘Transmodal anaphones’ (p. 494 ff.); etym. transmodal logistics companies moving freight between various modes of transport, e.g. by road, then by rail, then by sea.

transpose v. mus. to pitch shift, up or down, all notes in a piece or passage to a different key; deriv. n. transposition.

transscansion [trænˈskænʃən] n. neol. (c. 1989) short wordless motif whose melodic and rhythmic profile closely resembles that of at least two spoken syllables associated with the music in which it occurs; etym. trans (across) + scan (speak or read metrically), i.e. with the metre and rhythm of the word[s] transferred from speech into music (see p. 489).

trio n. mus. [1] three people singing and/or playing instruments; [2] the less well-known middle episode of a march or dance piece, as in ‘minuet and trio’; see also chorus and bridge.

truck driver’s gear change n. ph. colloq. change of key occurring ‘near the end of a song, shifting upwards’ (see transpose) by some relatively small pitch increment, most commonly by one semitone (half step) or whole tone (whole step) (p. 414).

turnaround n. short chord sequence at the end of one section in a song or instrumental number and whose purpose is to facilitate recapitulation of the complete harmonic sequence of that section.

underscore n. invisible non-diegetic music, usually background or incidental music, written to fit an existing visual sequence.

vamp n. chord loop with several variants whose chords generically run I-vi-ii/IV-V.

videogram n. physical object containing an audiovisual recording, usually, but not necessarily, of a single work; carrier of recorded sound and moving image usually sold as a commodity and playable on stand-alone equipment, e.g. videocassette, DVD, games disc.

vocal persona n. ph. vocal representation of an individual or type of individual in terms of personality, state of mind, age, gender, nationality, ethnicity, narrative archetype, etc. (see Chapter 10).

vocal staging n. (Lacasse, 2000) vocal aspect of aural staging.

VVA [viːviːˈɛɪ] n. ph. abbr. neol. (1983): verbal-visual association, more specifically a response to music, expressed in words and/or images.

whole-tone scale n. hexatonic (six-tone) mode consisting solely of whole-tone steps: either c d e f♯ g♯/a♭ b♭ or c♯/d♭ e♭ f g a b♮.
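
The two scales named in this entry can be derived mechanically by stepping through the twelve pitch classes two semitones at a time. The sketch below assumes equal-tempered pitch classes numbered 0 to 11 from c. The spelling of each note (sharps written #, flats written ♭) is one arbitrary choice among enharmonic equivalents, and `whole_tone_scale` is a hypothetical helper name.

```python
from typing import List

# One conventional spelling per pitch class, starting from c = 0.
NOTE_NAMES = ["c", "c#", "d", "e♭", "e", "f", "f#", "g", "g#", "a", "b♭", "b"]


def whole_tone_scale(start: int) -> List[str]:
    """Six notes, each a whole tone (two semitones) above the previous one."""
    return [NOTE_NAMES[(start + 2 * step) % 12] for step in range(6)]


scale_on_c = whole_tone_scale(0)
scale_on_c_sharp = whole_tone_scale(1)
```

Starting on any other pitch class only rotates one of these two collections, which is why the entry lists exactly two scales.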

word painting n. mus. anaphonic rendering of some aspect of a sung text by either singers or instrumentalists, e.g. a rising figure for et resurrexit, descending for descendit, discords for crucifixus, quick notes in a high register for fluttering, etc. Occurrences of word painting performed by vocalists are also called madrigalisms (see also mickey-mousing).


WARNING. Please note that hyperlinks, incl. links to other pages in the book, do not work in this specimen file.

________________________________________ 2012-09-28, 19:30




To save space and to avoid the inconvenience of being referred to several appendices to obtain information about one item, this appendix contains all references of every type. This deviation from standard scholarly procedure also relates to arguments, presented in Chapters 2 and 3, about the cross-domain character of music as a symbolic system and about the equal value of different types of knowledge. Books, articles, musical notation and other scribal-verbal sources are, in other words, treated as neither more nor less valid sources of information and ideas than audiovisual ones. Since all types of knowledge must logically interconnect in a book about how music works, there is no good reason to separate source types into historically conventional categories of storage technology.


Order of presentation

Items appear in alphabetical order of author (writer, composer/artist etc.) or, in the case of multi-authored or anonymous works, in alphabetical order of title (book, article, film, TV show, tune, album, etc.).

Items by the same author are arranged in chronological order of first known appearance. For example, music for a film which appeared in 1968 and which is included on a sound recording from 1988 or on a DVD from 2002 is ordered as 1968, not as 1988 or 2002. Similarly, a work known to have been composed, published or released in 1832, which also appears in a volume of sheet music published in 1954 and which is included on a CD issued in 1998, is chronologically listed as 1832, except in those instances where the original year of release, publication or performance is unknown. ‘n.d.’ (= no date) signals that the year of the item’s appearance/release/publication/first performance is unknown, except in cases where no date details are to be expected (e.g. sources of traditional music).
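
The ordering rule described above can be sketched as a two-part sort key: first the author (or the title, for multi-authored or anonymous works), then the year of first known appearance rather than the year of the carrier consulted. The `Entry` fields and the function name are hypothetical, the sample entries are placeholders rather than real references, and undated (‘n.d.’) items are left out because the text does not say where they are placed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    sort_name: str         # author, or title for multi-authored/anonymous works
    title: str
    first_appearance: int  # year of first known appearance
    source_year: int       # year of the carrier actually consulted (not used for sorting)


def presentation_order(entries: List[Entry]) -> List[Entry]:
    """Alphabetical by sort name, then chronological by first known appearance."""
    return sorted(entries, key=lambda e: (e.sort_name.lower(), e.first_appearance))


# Placeholder entries: 'Work X' first appeared in 1968 but was consulted on a 2002 carrier.
entries = [
    Entry("Author B", "Work Z", 1950, 1950),
    Entry("Author A", "Work Y", 1975, 1976),
    Entry("Author A", "Work X", 1968, 2002),
]
ordered = presentation_order(entries)
```

In this sketch ‘Work X’ precedes ‘Work Y’ under the same author because 1968 is earlier than 1975, regardless of the later carrier date.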

Space-saving icons and abbreviations

To save space and to facilitate identification of the type of source and the role of various authors referred to in this comprehensive reference appendix, the following symbols are used.


Table X-1: Symbols used in this appendix

F  film production             n  musical notation      b  written word
t  TV production               c  composer[s]           o  cover version
w  off-air recording           C  conductor             P  first published
D  DVD                         v  vocalist[s]           R  1st recorded
V  videocassette               m  performer[s]          $  advert
E  YouTube                     j  writer or lyricist    T  title theme
G  on line                     f  film director         H  audio example
g  video/computer game         *  star, actor           §  section/paragraph
0  phonogram (CD, LP, etc.)    p  publisher
L  audiocassette               Ä  arranger              >  see …


Three example entries with explanations

1. Addison, John (1984) c Murder She Wrote T t CBS w SvTV (1990).

John Addison is composer of the title theme (T) for this TV production (t), first broadcast by CBS in 1984 and recorded off-air (w) from Swedish TV in 1990.

2. High Noon (1952) F Criterion/Republic/UA f Fred Zinnemann; V 4Front 054 1463 (1998); >cT Dimitri Tiomkin; 0vo> Frankie Laine; 0vR> Tex Ritter.

The source used for the music throughout this 1952 film (F) from production companies Criterion, Republic and United Artists (UA), and directed (f) by Zinnemann, is a videocassette (V) released in 1998. Details of the sources used for the title theme (T) composed (c) by Dimitri Tiomkin can be found under other entries (>): [1] Tiomkin himself; [2] Frankie Laine, who sang (v) a popular cover version (o) of [3] the original recording (R) sung (v) by Tex Ritter.

3. Mozart, W A (1791) Concerto for Clarinet and Orchestra in A major, K622 •2nd mvt. F Padre Padrone > Macchi (1977); F Out of Africa >0 Barry (1986).

Details of the sound carriers used as sources for the second movement of this Mozart concerto from 1791 are provided under two other author entries, to which the reader is referred (>): [1] the album containing Egisto Macchi’s music for the film (F) Padre Padrone (released in 1977); [2] the album (