Animating OSCAR:
Design for an Interactive Improvisation System

Excerpts from the original paper

Introduction

Composers are arguably the group of musicians who have benefited the most from the proliferation of computers. The tools that computers provide for composers have generated radical shifts in the way composers approach the creation of music. Computers have done this in two important ways. First, computers allow musicians to compose, not only with sound, but in sound as well. Many contemporary composers do not write scores to be realized by the sounds of traditional instruments. Rather, they write the sound itself, and then shape the sound they create in meaningful, musical ways. Composers had been doing this with electronics for some time before the advent of computers, but computer composition programs, such as Csound (and its earlier cousins, such as the Music N languages pioneered by Max Mathews), give the composer unprecedented control over every nuance of sound. The musical ramifications of such power could not have even been considered before the advent of computers.

The second way in which computers have profoundly changed the mindset of composers is that they have provided the possibility for interaction between musicians and machines. Once again, there is some precedent for this notion in pre-computer electronic music. Some composers tried to have musicians interact with systems of sequencers and synthesizers that were fed back into each other in such a way that an action on the part of the musician would "steer" the electronics in one direction or another.1 Others were content with the illusion of interaction: a musician playing along with a tape of prerecorded music that seemed to be echoing and altering the live musician's activity. The potential for true interactivity, a musician reacting to a computer and the computer reacting to the musician in a real and musical way, has only existed since computers became readily available. The prospect for real musical interactivity has changed the way that many composers think about composition; it is a truly new and essentially unprecedented way to approach the creation of music. There are, in short, two reasons for composers to use computers. "One is access to sound. The other is interaction." 2

In the following sections, I intend to explain the idea of musician-computer interaction, and describe the process that has gone into the development of my own interactive music system, OSCAR. Along the way, I hope to illustrate why interactive systems have sparked such excitement in the field of music composition.

 

Interactive Music Systems

An interactive music system is a computer system whose behavior changes in response to musical input, allowing for its participation in the live performance of notated and/or improvised music. The development of an interactive music system involves aspects of music theory, cognitive science, and artificial intelligence. Music theory attempts to develop an understanding of the processes of composing, performing, and listening to music, and interactive systems apply this understanding in a practical way. Cognitive science strives to understand how concepts and activities are dealt with by the human mind, often in terms of abstract representations and manipulation of those representations. For example, understanding the musical concept of "phrase" involves a collaborative understanding of harmony, rhythm, dynamics, and so on. Artificial intelligence involves the simulation of human cognitive behavior. For example, a computer can be programmed to recognize changes in dynamics, rhythm, and harmony. Artificial intelligence is concerned with how a computer interprets these low-level representations into high-level representations of concepts such as the musical phrase.3 Very often, theories about how we understand music (created on the surface by music theory and at a deeper level by cognitive science) are tested by simulating them using the tools of artificial intelligence. For example, the Longuet-Higgins and Lee Musical Parser 4 is a model of human rhythmic perception that is based on Lerdahl and Jackendoff's theories of human musical cognition. 5 Interactive systems take such cognitive models and apply them in real musical settings.

 

Types of Interactive Systems

Many different interactive music systems have been created, each with a specific goal in mind, a goal that influenced its design, capabilities, and limitations. However, interactive systems can generally be classified according to three dimensions. First, interactive systems can be either score-driven or performance-driven. Score-driven systems store predetermined collections of events that are matched against the musical input. The system's activity is directed by its perception that the musician is at a certain point in the score. Alternatively, performance-driven systems do not use stored representations. Instead, these systems rely on real-time analysis of the music that the musician is playing. A system can be programmed so that certain types of events, such as a series of dense chords, trigger the system's activity. Next, interactive music systems can be transformative, generative, or sequenced. Transformative systems apply transformations to produce variations on musical input or stored musical material. The input controls the type and amount of transformations that occur. Generative systems, on the other hand, use programmed musical "knowledge" as source material. Algorithms and rules are used to assemble output material that is appropriate to the input. Sequenced systems vary pre-recorded music based on the musical input. Finally, interactive music systems can be either instrument systems or player systems. Instrument systems use a human's gestures to control or "steer" the musical output, resulting in a "solo" instrument whose output potential is extended beyond that of traditional instruments. Very often, these systems use input controllers that are based on traditional musical instruments, such as the Zeta Violin. This allows musicians to use musical skills that they have spent years developing to musically interact with the computer. Player systems, on the other hand, construct an artificial player or accompaniment with a musical personality of its own. Once again, musicians often use electronic variants of traditional musical instruments to interact with these systems, and the computer acts as an "orchestral" accompaniment or duet partner. 6

Many interactive music systems cannot be clearly classified along one or more of these three dimensions. Nonetheless, classifying a system in this way can help identify its creator's goals and intent, which undoubtedly have a large influence on the way the system is designed.

 

Existing Interactive Music Systems

An exhaustive list of existing interactive music systems is beyond the scope of this paper. Rather, the examples below are illustrative of the wide variety of interactive systems that have been created.

Intelligent Music's M is a performance-driven, transformative, instrument system that was created for the Macintosh computer by Joel Chadabe and others in 1986. "M was a collection of algorithms, portrayed graphically on the computer screen and manipulated with particular graphic controls - range bars, numerical grids, sliders . . . designed by [David] Zicarelli 7 with musicians in mind. A composer could record some basic material through a MIDI keyboard, for example, and then use the graphic controls to transform the pitches, rhythms, and timbres of that material in a wide variety of ways." 8 M is unique among interactive music systems in that it was the first system to be designed for the home entertainment market.

Morton Subotnick's All My Hummingbirds Have Alibis is a composition created in 1993 for interactive CD-ROM. It allows the home listener to select the ordering of the sections of the composition. Users can also choose to see pictures, words, or the musical score as the resulting composition plays. Similarly, Peter Gabriel's Xplora I allows a home listener to use a mouse to move on-screen faders to change the sound of the composition. 9

Robert Rowe's Cypher is a performance-driven player system designed for performance and composition. It uses transformative and generative techniques. Cypher consists of a listener and a player. The listener looks for certain types of events to occur in the music being played into it. When these events occur, the listener sends a message to the player, which generates an appropriate response. The user can connect specific event types in the listener to specific response types in the player. Response types vary from transformation of the input to generation of new material. For example, Cypher can be programmed so that when the listener hears a dense chord, the player will output an arpeggiated version of the chord. Or, when the listener hears a loud note, the player can respond with a trill. 10
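The listener/player division lends itself to a small sketch. The following Python fragment is not taken from Cypher itself; it is a toy illustration of the pattern Rowe describes, with invented event types, thresholds, and responses.

    # Minimal sketch of the listener/player pattern described above.
    # Event names, thresholds, and responses are hypothetical, not Cypher's.

    def classify(notes):
        """Listener: label a group of simultaneous (pitch, velocity) notes."""
        events = []
        if len(notes) >= 4:                                    # many notes at once
            events.append("dense_chord")
        if any(velocity > 110 for _, velocity in notes):       # a very loud attack
            events.append("loud_note")
        return events

    def arpeggiate(notes):
        """Play the chord's pitches one at a time, low to high."""
        return [([pitch], 120) for pitch, _ in sorted(notes)]

    def trill(notes):
        """Alternate rapidly between the top pitch and its upper neighbor."""
        top = max(pitch for pitch, _ in notes)
        return [([top], 60), ([top + 2], 60)] * 4

    # User-configurable mapping from listener events to player responses.
    connections = {"dense_chord": arpeggiate, "loud_note": trill}

    def player(notes):
        """Player: generate a response for every event the listener reports."""
        output = []
        for event in classify(notes):
            output.extend(connections[event](notes))
        return output

    # A dense, loud C major seventh chord, as (MIDI pitch, velocity) pairs:
    print(player([(60, 90), (64, 95), (67, 100), (71, 118)]))

The essential design point is the connections table: the user rewires the system's behavior by changing the mapping, not by rewriting the listener or the player.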

 

OSCAR Interactive Improvisation System

OSCAR is the interactive music system that I have been developing. The capital letters in OSCAR's name don't stand for anything; 11 OSCAR's name comes from the fact that the system's behavior can be, at times, quite grouchy.

Being an interactive music system, OSCAR is designed to actively participate in the live performance of music. Ideally, OSCAR should work solely on musical input: once the program is set up, input from a mouse or QWERTY keyboard should not be necessary.

My goal in creating OSCAR is this: if I play a melody (or two), OSCAR should create another melody (or two) as appropriate counterpoint to what I am playing. Further, OSCAR should achieve this by referencing "knowledge" that it has previously stored about the structure of Bach fugues. In other words, before a performance, I should be able to play OSCAR a series of Bach fugues, and it should be able to dissect and analyze these fugues to obtain "knowledge" about how Bach works, and then use this knowledge later in a performance.

In terms of the classifications discussed above, OSCAR is performance-driven, since its activity is based on the music that is being played into it during a performance, rather than on a stored representation of a score. OSCAR is mostly generative, in that it generates music based on stored knowledge of music. OSCAR occasionally behaves in a way that is transformative, since, true to the Bach paradigm in which it was created, it should occasionally capture segments of the music being played into it and use these segments to create imitative sequences. Finally, OSCAR is a player system, since it should act as an improvisation partner to the user, and should have a unique musical "personality."

OSCAR is a MIDI-based program created in the PD programming environment. My reasons for choosing to work with MIDI (rather than digital audio) and with PD (rather than Max-MSP or C++) are somewhat complex, and require explanation.

 

Why MIDI?

MIDI is the Musical Instrument Digital Interface, a transmission protocol developed in the 1980s to allow computers, synthesizers, samplers, and controllers made by different manufacturers to communicate and therefore be used together in a single musical environment. MIDI represents music at the note level, transmitting discrete eight-bit words to communicate musical "events" such as note on, note off, and pitch bend. MIDI has inherent limitations that make it less than ideal for the creation of computer music. First, it was designed around a "keyboard controller" mentality, which forces the user to think in terms of discrete notes of the chromatic scale. Creating music that is not based around traditional Western conceptions of pitch and harmony is very difficult when one is using MIDI. Next, MIDI transmits information serially at a rate of 31.25 kbps. This is far too slow to transmit the information-rich phenomenon of music, so much important musical information is lost. If, for example, one wanted to closely track the pitches being played by a violin player who was using a lot of vibrato, the bandwidth supplied by MIDI would quickly be used up, resulting in "MIDI clog": information would be transmitted late, or not at all. Finally, and perhaps most importantly for the creation of computer music, MIDI transmits very little information about timbre; aside from general program change information ("change now to an acoustic piano sound"), timbral information cannot be transmitted via MIDI. Instead, the creation of timbre is left up to the synthesizer or sampler at the receiving end of a MIDI transmission. Changing a timbre in expressive or realistic ways during a live performance is nearly impossible to accomplish when using MIDI. MIDI is therefore very limited in its possible use for the production of computer music.

In spite of these limitations, MIDI can be of use when one is creating an interactive music system. The building of a program such as OSCAR involves combining the program's "understanding" of elementary sonic events into more complex structures. If one can program the system to understand harmonic progression and rhythm, one can combine these understandings and teach the program to understand concepts such as "phrase." If one were to try to make a program like OSCAR that worked with digital audio instead of MIDI, one would be starting at the most basic level of sonic understanding. Before anything else could be done, the system would have to be programmed to determine when notes start and stop, and what the fundamental frequency of each note is. Each note would have to be assigned a stable, quantifiable measurement of its amplitude. This would have to be accomplished despite the fact that, in real musical settings, multiple notes often sound simultaneously, starting and stopping at different times. Acoustic instruments have complex timbres that further complicate this task: pitch, amplitude, and harmonic information all vary rapidly over time. Devices such as pitch trackers and envelope followers have been developed, but they operate best when analyzing monophonic music played on instruments with fairly simple harmonic content. Even under these restricted conditions, such devices do not work particularly well, and they generally fail completely when used in "real" musical situations, where multiple instruments with different harmonic content are playing any number of notes at the same time.

MIDI provides easily quantifiable note numbers and velocity (amplitude) measurements. The starting and stopping points of each note are precisely known. MIDI therefore allows the "perceptual" level of analysis to be skipped. One can then begin to work on the complex task of combining these low-level concepts of notes into higher-level conceptions such as melody, harmony, and rhythm.
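A short sketch makes the point concrete. Recovering note events from a MIDI stream is a matter of bookkeeping rather than signal analysis; the message layout below (status bytes 0x90 and 0x80, note-on with velocity zero acting as note-off) is standard MIDI, but the function and data layout are my own illustration.

    # Turn raw MIDI channel messages into note events: no pitch tracking,
    # no envelope following, just bookkeeping on note-on/note-off pairs.

    NOTE_ON, NOTE_OFF = 0x90, 0x80

    def note_events(messages):
        """messages: (time, status, pitch, velocity) tuples in time order."""
        sounding = {}   # pitch -> (start_time, velocity)
        events = []
        for time, status, pitch, velocity in messages:
            kind = status & 0xF0                    # strip the channel nibble
            if kind == NOTE_ON and velocity > 0:
                sounding[pitch] = (time, velocity)
            elif kind == NOTE_OFF or (kind == NOTE_ON and velocity == 0):
                if pitch in sounding:               # velocity 0 also ends a note
                    start, vel = sounding.pop(pitch)
                    events.append((pitch, vel, start, time - start))
        return events

    # Middle C (pitch 60) held for 500 ms at velocity 100:
    print(note_events([(0, 0x90, 60, 100), (500, 0x80, 60, 0)]))

Everything a digital-audio front end would have to estimate (onset, offset, pitch, amplitude) arrives here already quantified.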

 

Why PD?

PD ("Pure Data" or "Public Domain") is a graphical programming environment that is oriented toward the creation of musical systems, interactive or otherwise. PD was created by Miller Puckette, and closely resembles the Max-MSP programming environment, which was created by Puckette at IRCAM during the late 1980's. Like Max, PD uses a visual box-and-patch-cord approach to programming. An "object box" contains a data structure or functional operation, and the inputs and outputs of different object boxes are connected with patch cords. Programming in PD and Max is a matter of connecting various objects in ways that produces useful behavior. The architecture of PD is much more open-ended than that of Max. Many users have created extensions of PD to allow it perform specialized operations.  Gem, for example, is a set of libraries and objects created by Mark Danks that allows PD to create and interpret graphics. Also, some users have created Max-like graphical objects for PD. While these graphical objects resemble their Max counterparts, they are much more powerful. A button in Max, for example, sends out a bang (interpreted by most objects as a prompt to perform their specified operations) when clicked, but a similar-looking button in PD can be made to output anything: a number, a message, or a long string of information.

While Max is available only for the Macintosh, PD is supported across multiple platforms, including Windows NT and 98, Linux, and IRIX. Patches are portable between these platforms, so a patch created on a Windows machine would theoretically work on a Linux machine.

Finally, and perhaps most importantly, PD more closely approximates true object-oriented programming design. Max is object-based, but it falls short of true object-oriented design in several areas, most notably polymorphism and multiple inheritance. In PD, on the other hand, you can create template patches that can be modified with input, initialization arguments, and the activity of other patches. You can even create PD patches that, on demand, create other instances of themselves or of other patches. This greatly facilitates efficient programming, and provides capabilities that would otherwise not be possible.
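The text-based analogue of a template patch is a class: one definition, many instances, each specialized by its creation arguments, and able to spawn further instances on demand. The following is a rough Python analogy of that idea, not PD code; the names are invented for illustration.

    # Rough analogy to a PD template patch: one definition, many instances,
    # each specialized by its creation arguments. This is Python, not PD.

    class Voice:
        def __init__(self, transpose=0):
            self.transpose = transpose      # like an initialization argument

        def play(self, pitch):
            return pitch + self.transpose

        def spawn(self, extra):
            """Create another instance of the same template on demand."""
            return Voice(self.transpose + extra)

    lead = Voice()             # like typing the template's name in an object box
    shadow = lead.spawn(7)     # a new copy, a fifth higher
    print(lead.play(60), shadow.play(60))   # 60 67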

 

Tools of Artificial Intelligence

If computers are to interact with musicians in real, musical ways, they must clearly be able to understand music on some level. How is such understanding accomplished?

As mentioned, artificial intelligence involves the simulation of human cognitive behavior. Specifically, artificial intelligence is concerned with making a computer interpret low-level representations (such as a stream of MIDI note information) into high-level representations of concepts (such as a conception of harmonic progression). Artificial intelligence systems have historically used a wide variety of tools to accomplish this goal, but systems generally take one of two possible approaches: rule-based or connectionist.

Rule-based systems use explicit rules to manipulate symbols: they are formal systems. For a formal system to be used, input must be turned into symbols. The system then processes the symbols according to explicitly stated rules. After processing, the symbols can then be converted to meaningful output. The important point is that while formal systems are processing symbols, they have no understanding of what the symbols represent; they simply know how to operate on the symbols. Calculus is an example of a formal system, in that it consists of a set of symbols and possible operations on the symbols. What the symbols actually represent is irrelevant as far as the system is concerned. The goal when designing a formal system is to make sure that any possible set of operations results in a set of symbols that can still be converted to meaningful output. So, when constructing a rule-based system, a programmer must find ways to convert the input to symbols, and must create a set of operations that can be used to convert symbols into other meaningful symbols.
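A toy example in Python may clarify; the interval symbols and rules below are invented for illustration, not drawn from any real system. Note that the program manipulates the symbols without any notion of what "M7" sounds like.

    # A toy formal system: musical intervals are encoded as symbols, and
    # explicitly stated rules rewrite one symbol into the next.

    RULES = {
        "M7": "P8",   # "a major seventh is followed by an octave"
        "TT": "M6",   # "a tritone is followed by a major sixth"
    }

    def next_interval(current):
        """Apply the rule table; symbols the rules don't cover pass through."""
        return RULES.get(current, current)

    progression = ["M7"]
    for _ in range(3):
        progression.append(next_interval(progression[-1]))
    print(progression)   # ['M7', 'P8', 'P8', 'P8']: correct, but rigidly the same

The rigidity of this output is exactly the problem that the fuzzy-logic approach, discussed below, is meant to soften.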

Connectionist systems, also called "neural networks," represent a different approach to programming. Connectionist systems "consist of a large number of simple elements, or cells, each of which has its own activation level. These cells are interconnected in a network, the connections serving to excite or inhibit others." 12 In connectionist systems, all knowledge is represented implicitly. The behavior of connectionist systems depends not on rules, but on the manner in which these simple cells are interconnected. 13 The most important characteristic of connectionist systems is that they can learn their own rules. For example, a neural network can be used to teach a computer to identify the roots of chords. The system would have twelve input "nodes"; each node corresponds to one of the twelve pitch classes in the chromatic scale. Each input node is connected to nodes representing all possible chord roots. A process of back propagation is used: the system is given three input notes (say, C, E, and G) along with the correct answer, corresponding to one of the chord root nodes (C in this case). The system automatically adjusts the "weights" of the connections between the correct chord root node and each of the input nodes. The connections to the correct (C, E, and G) input nodes are given a stronger weight, and the connections to the other input nodes are given weaker weights. Over time, given a "teaching set" of inputs and correct answers that is redundant and varied enough to represent all possible combinations of inputs, the weights of all of the connections will be refined. The system will then, given an input of three notes, always identify the correct chord root, without the programmer having ever stated the rules needed to accomplish this. 14
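The Python sketch below drastically simplifies the network just described: it keeps the twelve input nodes and twelve root nodes but collapses back propagation into a single-layer weight adjustment. It is an illustration of the learning idea, not a faithful back-propagation implementation.

    # Simplified sketch of the chord-root network: 12 input nodes (pitch
    # classes) fully connected to 12 output nodes (candidate roots).
    # Training nudges weights toward the correct root, away from the rest.

    weights = [[0.0] * 12 for _ in range(12)]   # weights[root][pitch_class]

    def activate(pitch_classes):
        """Return the root node with the highest total activation."""
        scores = [sum(weights[root][pc] for pc in pitch_classes)
                  for root in range(12)]
        return scores.index(max(scores))

    def train(pitch_classes, correct_root, rate=0.1):
        """Strengthen connections to the correct root, weaken the others."""
        for root in range(12):
            delta = rate if root == correct_root else -rate / 11
            for pc in pitch_classes:
                weights[root][pc] += delta

    # Teaching set: major triads on every root (pitch classes mod 12).
    for _ in range(20):
        for root in range(12):
            train([root, (root + 4) % 12, (root + 7) % 12], root)

    print(activate([0, 4, 7]))   # C, E, G -> 0 (C)

No rule about thirds or fifths appears anywhere in the program; the "knowledge" resides entirely in the learned weights.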

A "fuzzy logic" approach is an attempt to create formal rule-based systems that are less reliant on explicit rules. A fuzzy logic rule has a set of possible solutions, and these solutions are ranked according to how often they should occur. Such systems can produce output that is orderly but not explicitly predictable. For example, in creating a system that generates harmonic progressions, one could create explicit rules: for example, "a major seventh should be followed by a octave." A system like this, however, would result in output that is always the sameóthis result would be "correct" in the strictest sense but would not, obviously, be particularly musical. A fuzzy-logic approach would soften the rules, allowing for the possibility of more varied output: "a major seventh should be followed by an octave seventy-five percent of the time, by a perfect fifth ten percent of the time, and by a major sixth five percent of the time." Such a system would result in output that would be orderly but varied. 15

A related approach is that of Markov chains. "Markov chains are series of linked states. Each state moves to a successor, in what is called a transition. The state at the beginning of a transition is the source; the state at the end is the transition's destination. In a Markov chain, each successive destination of one transition becomes the source of the next. The behavior of the chain is captured by a table of transition probabilities, which gives the likelihood of any particular destination being reached from some source." 16 As a musical example, a first-order Markov chain can, given the previous harmonic interval, determine what the next harmonic interval should be. That generated interval becomes the next "previous" interval, and is used to generate what the next interval will be. A second-order Markov chain would look at the two most recent intervals to determine what the next interval should be. Markov chains can be created using a quasi-connectionist approach, enabling a system to construct the transition probability tables automatically.
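A first-order chain of this kind is compact to sketch in Python. The interval vocabulary and training sequence below are invented for illustration; the table is built by counting transitions in example data, which is the automatic table construction just mentioned.

    # A first-order Markov chain over harmonic intervals: the transition
    # table is learned from example data, then used to generate new intervals.
    import random
    from collections import defaultdict

    def build_table(intervals):
        """Count transitions: table[source][destination] = frequency."""
        table = defaultdict(lambda: defaultdict(int))
        for source, destination in zip(intervals, intervals[1:]):
            table[source][destination] += 1
        return table

    def next_interval(table, source):
        """Pick a destination with probability proportional to its count."""
        destinations = table[source]
        if not destinations:     # dead end (e.g., the last interval seen): restart
            source = random.choice([s for s in table if table[s]])
            destinations = table[source]
        choices, weights = zip(*destinations.items())
        return random.choices(choices, weights=weights, k=1)[0]

    training = ["M3", "P5", "M3", "M6", "P5", "M3", "P5", "P8"]
    table = build_table(training)

    chain = ["M3"]
    for _ in range(6):
        chain.append(next_interval(table, chain[-1]))
    print(chain)

A second-order version would differ only in using pairs of intervals as the table's keys.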

 

Notes

  1. Joel Chadabe, Electric Sound: The Past and Promise of Electronic Music (New Jersey: Prentice Hall, 1997): 286-293.
  2. Ibid., 323 [tense change mine].
  3. Robert Rowe, Interactive Music Systems (Cambridge, Mass.: The MIT Press, 1993): 1-6.
  4. H. C. Longuet-Higgins and C. S. Lee, "The Perception of Musical Rhythms," Perception 11 (1982): 115-128. Also "The Rhythmic Interpretation of Monophonic Music," Music Perception 1, no. 4 (1984): 424-441.
  5. F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music (Cambridge, Mass.: The MIT Press, 1983).
  6. Rowe, 6-7.
  7. David Zicarelli was also the creator of the commercial version of the Max-MSP graphical programming environment for musicians.
  8. Chadabe, 317.
  9. Ibid., 333.
  10. Rowe, 41-50.
  11. My apologies to Peter Beyls, who created an earlier system called "Oscar." I was unaware of his system's existence at the time that I named mine.
  12. Peter Desain, "A connectionist and a traditional AI quantizer, symbolic versus sub-symbolic models of rhythm perception," Contemporary Music Review 9, parts 1 & 2 (1993): 239.
  13. P. Desain and H. Honing, "The Quantization of Musical Time: A Connectionist Approach," Computer Music Journal 13, no. 3 (1989): 57.
  14. Rowe, 135-136.
  15. Peter Elsea, "Musical Applications of Fuzzy Logic" (University of California at Santa Cruz, 1995). Online resource, available at: http://arts.ucsc.edu/EMS/Music/research/FuzzyLogicTutor/FuzzyTut.html.
 

Bibliography

    Bresin, Roberto, and Anders Friberg. "Emotional Coloring of Computer-Controlled Music Performances." Computer Music Journal 24, no. 4 (Winter 2000): 44-63.
    Castle, Harry D. Musical Expression and the Human-Computer Interface. Online resource, available at http://www.crca.ucsd.edu/~hcastle/FinalPaper.htm. Spring 1999.
    Chadabe, Joel. Electric Sound: The Past and Promise of Electronic Music. New Jersey: Prentice Hall, 1997.
    Clarke, Eric R. "Rhythm and Timing in Music." In The Psychology of Music, 2d ed., edited by Diana Deutsch, 473-500. San Diego, California: Academic Press, 1999.
    Desain, Peter. "A connectionist and a traditional AI quantizer, symbolic versus sub-symbolic models of rhythm perception." Contemporary Music Review 9, parts 1 & 2 (1993): 239-254.
    Desain, P., and H. Honing. "The Quantization of Musical Time: A Connectionist Approach." Computer Music Journal 13, no. 3 (1989): 56-66.
    ________. "Advanced issues in beat induction modeling: syncopation, tempo, and timing." Proceedings of the 1994 International Computer Music Conference, 92-94. San Francisco: International Computer Music Association, 1994.
    ________. "Computational models of beat induction: the rule-based approach." Journal of New Music Research 28, no. 1 (1999): 29-42.
    Dreyfus, Hubert L. What Computers Can't Do. New York: Harper & Row, 1972.
    Elsea, Peter. "Musical Applications of Fuzzy Logic." University of California at Santa Cruz. Online resource, available at: http://arts.ucsc.edu/EMS/Music/research/FuzzyLogicTutor/FuzzyTut.html. 1995.
    Fodor, Jerry A., and Zenon W. Pylyshyn. "Connectionism and cognitive architecture: a critical analysis." In Connections and Symbols, edited by Steven Pinker and Jacques Mehler, 3-71. Cambridge, Massachusetts: MIT Press, 1988.
    Hamman, Michael. "From Symbol to Semiotic: Representation, Signification, and the Composition of Music Interaction." Journal of New Music Research 28, no. 2: 90-104.
    Heijink, Hank, et al. "Make Me a Match: An Evaluation of Different Approaches to Score-Performance Matching." Computer Music Journal 24, no. 1 (Spring 2000): 43-56.
    Large, Edward W. "The Resonant Dynamics of Beat Tracking and Meter Perception." Proceedings of the 1994 International Computer Music Conference, 90-91. San Francisco: International Computer Music Association, 1994.
    Lerdahl, F., and R. Jackendoff. A Generative Theory of Tonal Music. Cambridge, Mass.: The MIT Press, 1983.
    Longuet-Higgins, H. C., and C. S. Lee. "The Perception of Musical Rhythms." Perception 11 (1982): 115-128.
    ________. "The Rhythmic Interpretation of Monophonic Music." Music Perception 1, no. 4 (1984): 424-441.
    Parncutt, Richard. "A model of beat induction accounting for perceptual ambiguity by continuously variable parameters." Proceedings of the 1994 International Computer Music Conference, 83-84. San Francisco: International Computer Music Association, 1994.
    Rosenthal, David. "A Model of the Process of Listening to Simple Rhythms." Music Perception 6, no. 3 (1989): 315-328.
    Rosenthal, D., M. Goto, and Y. Muraoka. "Rhythm Tracking Using Multiple Hypotheses." Proceedings of the 1994 International Computer Music Conference, 85-87. San Francisco: International Computer Music Association, 1994.
    Rowe, Robert. Interactive Music Systems. Cambridge, Mass.: The MIT Press, 1993.
    Tenney, J., and L. Polansky. "Temporal Gestalt Perception in Music." Journal of Music Theory 24 (1980): 205-241.
