Sound Architecture Overview

Design Goals

The Java Sound API is a low-level API for effecting and controlling the input and output of audio media. It provides explicit control over the capabilities commonly required for audio input and output, in a framework that promotes extensibility and flexibility.

Because sound is so fundamental, Java Sound fulfills the needs of a wide range of customers. Potential application areas include communication frameworks, such as conferencing and telephony; end-user content delivery systems, such as media players and streaming music; interactive applications, such as games and Web sites that use dynamic content; content creation and editing; and tools, toolkits, and utilities.

Java Sound enables applications to extend audio support with specialized capabilities, and it integrates well with architectures that provide higher-level interfaces and integration with other media types.

Java Sound provides the lowest level of audio support on the Java platform. It provides a high degree of control over audio-specific functionality. For example, it provides mechanisms for installing, accessing, and manipulating system resources such as digital audio and MIDI (Musical Instrument Digital Interface) devices. It does not include sophisticated sound editors and GUI tools; rather, it provides a set of capabilities upon which such applications can be built. It emphasizes low-level control beyond that commonly expected by the end user, who benefits from higher-level interfaces built on top of Java Sound.

Note: Throughout this document, the word "application" refers generically to Java applets as well as Java applications.


Packages

The Java Sound API includes support for both digital audio and MIDI data. These two major modules of functionality are provided in separate packages:

javax.sound.sampled
This package specifies interfaces for capture, mixing, and playback of digital (sampled) audio.

javax.sound.midi
This package provides interfaces for MIDI synthesis, sequencing, and event transport.

Two other packages permit service providers (as opposed to application developers) to create custom components that can be installed on the system:

javax.sound.sampled.spi
javax.sound.midi.spi

The next section of this document discusses the sampled-audio system, including an overview of the javax.sound.sampled API. The final section covers the MIDI system and the javax.sound.midi API.


Sampled Audio

The javax.sound.sampled package handles digital audio data, also referred to as sampled audio. "Samples" are successive snapshots of a signal, which in the case of digital audio is a sound wave; for example, the audio recorded for storage on compact discs is sampled 44,100 times per second. Typically, sampled audio comes from a sound recording, but the sound could instead be synthetically generated. The term "sampled audio" refers to the type of data, not its origin. Sampled audio can be thought of as the sound itself, whereas MIDI data can be thought of as a recipe for creating musical sound.

Java Sound does not assume a specific audio hardware configuration; it is designed to allow different sorts of audio components to be installed on a system and accessed by the API. Java Sound supports common functionality such as input and output from a sound card (for example, for recording and playback of sound files) as well as mixing of multiple streams of audio. Here is one example of a typical audio architecture for which Java Sound might be used:

Audio data flows from the input device through the software mixer and finally to the output device. Each device has various input and output ports.

A Typical Audio Architecture

In this example, a device such as a sound card has various input and output ports, and mixing is provided in software. The MIDI synthesizer shown as one of the mixer’s audio inputs might also be a feature of the sound card, or it might be implemented in software. (The javax.sound.midi package, discussed later, supplies a Java interface for synthesizers.)

The major concepts used in the javax.sound.sampled package are described in the sections below.


Lines

A line is an element of the digital audio "pipeline," such as an audio input or output port, a mixer, or an audio data path into or out of a mixer. The audio data flowing through a line can be mono or multichannel (for example, stereo). Each type of line is described shortly, but first their functional relationships are illustrated to show the flow of audio through the "pipeline." The following diagram shows different types of lines in a simple audio-output system:


A Possible Configuration of Lines for Audio Output

In this example, an application has asked a mixer for one or more available clips and source data lines. A clip is a mixer input into which you can load audio data prior to playback; a source data line is a mixer input that accepts a real-time stream of audio data. The application preloads audio data from a sound file into the clips, and then pushes other audio data into the source data lines. The mixer reads data from these lines, each of which may have its own reverberation, gain, and pan controls, and uses the reverb settings to mix the "dry" audio signals with the reverberated ("wet") mix. The mixer delivers its final output to one or more output ports, such as a speaker, a headphone jack, and a line-out jack.

Although the various lines are depicted as separate rectangles in the diagram, they are all "owned" by the mixer, and can be considered integral parts of the mixer. The reverb, gain, and pan rectangles represent processing controls (rather than lines) that can be applied by the mixer to data flowing through the lines. (Note that this is just one example of a possible audio system that is supported by the API. Not all audio configurations will have all the features illustrated. An individual source data line might not support panning, a mixer might not implement reverb, and so on.)
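
To make these relationships concrete, here is a minimal sketch of obtaining the two kinds of mixer inputs just described. It uses the javax.sound.sampled classes as they appear in shipping releases of the API; the audio format shown is an assumption, and a real application would choose the format its data actually uses.

    import javax.sound.sampled.*;

    public class OutputLinesSketch {
        public static void main(String[] args) throws LineUnavailableException {
            // CD-quality PCM: 44100 Hz, 16-bit, stereo, signed, little-endian.
            AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);

            // A clip is a mixer input preloaded with audio data; a source
            // data line is a mixer input that accepts a real-time stream.
            Clip clip = (Clip) AudioSystem.getLine(new DataLine.Info(Clip.class, format));
            SourceDataLine stream = (SourceDataLine) AudioSystem.getLine(
                    new DataLine.Info(SourceDataLine.class, format));

            System.out.println("Obtained " + clip.getLineInfo()
                    + " and " + stream.getLineInfo());
        }
    }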

A simple audio-input system might be similar:


A Possible Configuration of Lines for Audio Input

Here, data flows in to the mixer from one or more input ports, commonly the microphone or the line-in jack. Gain and pan are applied, and the mixer delivers the captured data to an application via the mixer's target data line. A target data line is a mixer output, containing the mixture of the streamed input sounds. The simplest mixer has just one target data line, but some mixers can deliver captured data to multiple target data lines simultaneously.

The different types of line will now be examined more closely. Several types of line are defined by subinterfaces of the basic Line interface. The interface hierarchy is shown below.


The Line Interface Hierarchy

The base interface, Line, describes the minimal functionality common to all lines: a line can be opened and closed, it notifies registered listeners when its status changes (for example, when it is opened or closed), and it can expose a set of controls, such as gain and pan, that affect the signal passing through it.

Ports are simple lines for input or output of audio to or from audio devices. The Port interface has an inner class, Port.Info, that specifies the type of port. Some common types are the microphone, line input, CD-ROM drive, speaker, headphone, and line output.
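
As a sketch, an application can test for and open one of these ports through the AudioSystem class (described later). In shipping releases of the API, Port.Info supplies constants such as MICROPHONE and LINE_IN for the common port types:

    import javax.sound.sampled.*;

    public class PortSketch {
        public static void main(String[] args) throws LineUnavailableException {
            // Port.Info defines constants for common port types.
            if (AudioSystem.isLineSupported(Port.Info.MICROPHONE)) {
                Port mic = (Port) AudioSystem.getLine(Port.Info.MICROPHONE);
                mic.open();
                System.out.println("Opened " + mic.getLineInfo());
                mic.close();
            }
        }
    }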

The Mixer interface represents a hardware or software device that has one or more input lines and one or more output lines. This definition means that a mixer need not actually mix data; it might have only a single input. The Mixer API is intended to encompass a variety of devices, but the typical case supports mixing.

The Mixer interface provides methods for obtaining a mixer's lines. These can include target data lines, from which an application can read captured audio data, source data lines, to which an application can write audio data for playback (rendering), and clips, in which an application can preload sound data for playback. The mixer can lock these resources. For example, if the mixer has only one target data line and it is already in use, an attempt by an application to obtain a target data line will cause an exception to be thrown.

You can query a mixer for lines of different types, by passing the appropriate type of Line.Info. You can also ask the mixer how many lines of a particular type it supports.

A mixer maintains textual information about its specific device type in an inner class called Mixer.Info. This information includes the product's name, version, and vendor, along with a textual description.
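
A short sketch of these queries, using the shipping javax.sound.sampled API:

    import javax.sound.sampled.*;

    public class ListMixers {
        public static void main(String[] args) {
            // Enumerate the installed mixers and print their descriptions.
            for (Mixer.Info info : AudioSystem.getMixerInfo()) {
                System.out.println(info.getName() + " (" + info.getVendor() + "): "
                        + info.getDescription());

                // Ask each mixer which kinds of source (playback) lines it supports.
                Mixer mixer = AudioSystem.getMixer(info);
                for (Line.Info lineInfo : mixer.getSourceLineInfo()) {
                    System.out.println("    supports " + lineInfo);
                }
            }
        }
    }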

Notice that the generic Line interface does not provide a means to start and stop playback or recording. For that you need a data line. The DataLine interface supplies the following additional media-related features beyond those of a Line: a data line has an audio format and a buffer of known size; it reports its current position in the media (in sample frames) and whether it is currently active; it generates START and STOP events; and it can be started, stopped, drained, and flushed.

An application can obtain a data line from a mixer. If the data line cannot be allocated because of resource constraints (for example, if the mixer supports only one target data line and it is already in use), an exception is thrown.

A TargetDataLine receives audio data from a mixer. Commonly, the mixer has captured audio data from a port such as a microphone; it might process or mix this captured audio before placing the data in the target data line's buffer. The TargetDataLine interface provides methods for reading the data from the target data line's buffer and for determining how much data is currently available for reading. If an application attempts to read more data than is available, the read method blocks until the requested amount of data is available. This applies even if the amount of data requested is greater than the line's buffer size. The read method returns if the line is closed, paused, or flushed.

Applications recording audio should read data from the target data line fast enough to avoid overflow of the buffer, which results in discontinuities in the captured data. If the buffer does overflow, the oldest queued data is discarded and replaced by new data.
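
Here is a minimal capture sketch illustrating this pattern; the chunk size and chunk count are arbitrary choices for the example:

    import javax.sound.sampled.*;
    import java.io.ByteArrayOutputStream;

    public class CaptureSketch {
        public static void main(String[] args) throws LineUnavailableException {
            AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);
            TargetDataLine line = (TargetDataLine) AudioSystem.getLine(
                    new DataLine.Info(TargetDataLine.class, format));
            line.open(format);
            line.start();

            // Read promptly in a loop so the capture buffer never overflows.
            ByteArrayOutputStream captured = new ByteArrayOutputStream();
            byte[] chunk = new byte[line.getBufferSize() / 4];
            for (int i = 0; i < 100; i++) {
                int n = line.read(chunk, 0, chunk.length); // blocks until data arrives
                captured.write(chunk, 0, n);
            }
            line.stop();
            line.close();
        }
    }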

A SourceDataLine receives audio data for playback. It provides methods for writing data to the source data line's buffer for playback, and for determining how much data the line is prepared to receive without blocking. If an application attempts to write more data than the line can currently accept, the write method blocks until the requested amount of data can be written. This applies even if the amount of data written is greater than the line's buffer size. The write method also returns if the line is closed, paused, or flushed.

Applications playing audio should write data to the source data line fast enough to avoid underflow (emptying) of the buffer, which may result in discontinuities in audio playback. If audio playback stops due to underflow, a STOP event is generated. A START event is generated when presentation resumes.
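
The corresponding playback pattern reads from a sound file and keeps the line's buffer supplied. This sketch uses the AudioSystem class, described later, to open the stream; "music.wav" is a placeholder file name:

    import javax.sound.sampled.*;
    import java.io.File;

    public class PlaybackSketch {
        public static void main(String[] args) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("music.wav"));
            AudioFormat format = in.getFormat();

            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(
                    new DataLine.Info(SourceDataLine.class, format));
            line.open(format);
            line.start();

            // Write steadily to avoid underflow of the line's buffer.
            byte[] chunk = new byte[line.getBufferSize() / 4];
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                line.write(chunk, 0, n);  // blocks until the line can accept the data
            }
            line.drain();  // let queued data play out before closing
            line.close();
        }
    }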

A Clip is a data line into which audio data can be loaded prior to playback. Because the data is pre-loaded rather than streamed, the clip’s duration is known before playback, and you can choose any starting position in the media. Clips can be looped, meaning that upon playback, all the data between two specified loop points will repeat a specified number of times, or indefinitely.
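
A looping sketch, assuming a short placeholder file "beep.wav":

    import javax.sound.sampled.*;
    import java.io.File;

    public class ClipSketch {
        public static void main(String[] args) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("beep.wav"));

            Clip clip = (Clip) AudioSystem.getLine(
                    new DataLine.Info(Clip.class, in.getFormat()));
            clip.open(in);                         // preload all of the audio data
            clip.setFramePosition(0);              // choose a starting position
            clip.loop(3);                          // repeat the loop region three times
            // clip.loop(Clip.LOOP_CONTINUOUSLY);  // or repeat indefinitely

            Thread.sleep(5000);  // playback is asynchronous; keep the JVM alive briefly
            clip.close();
        }
    }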

A GroupLine is a synchronized group of data lines. If a mixer supports group lines, you can specify which data lines should be treated as a group. Then you can start, stop, or close all those data lines by sending a single message to the group, instead of having to control each line individually.


Controls

Data lines and ports often have a set of controls that affect the audio signal passing through the line. The way in which the signal is affected depends on the type of control. The Java Sound API defines subclasses of Control for the commonly needed kinds of processing, such as gain, pan, reverb, and sample-rate controls.

Programmatically, you obtain a particular control object from a line through a reference to the control’s class. You can also obtain an array of all the controls for that line.
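
For example, here is a sketch using the control classes as they appear in shipping releases of the API, where most continuous controls are instances of FloatControl:

    import javax.sound.sampled.*;

    public class ControlSketch {
        // Given an open line, list its controls and, if supported, adjust its gain.
        static void adjustGain(Line line) {
            for (Control control : line.getControls()) {
                System.out.println(control);   // a control prints its type and current value
            }
            if (line.isControlSupported(FloatControl.Type.MASTER_GAIN)) {
                FloatControl gain =
                        (FloatControl) line.getControl(FloatControl.Type.MASTER_GAIN);
                gain.setValue(-6.0f);          // attenuate the signal by 6 dB
            }
        }
    }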


AudioSystem

The AudioSystem class serves as an application's entry point for accessing the installed sampled-audio resources. You can query the AudioSystem to learn what sorts of audio components have been installed, and then you can obtain access to them. For example, an application might start out by asking the AudioSystem class whether there is a mixer that has a certain configuration, such as one of the input or output configurations illustrated earlier in the discussion of lines. From the mixer, the application would then obtain data lines, and so on.

Here are some of the resources an application can obtain from the AudioSystem: mixers that are installed on the system; lines, such as data lines and ports, obtained either directly or through a mixer; conversions between different audio data formats; and streams of audio data read from sound files, or sound files written from streams of audio data.
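
For instance, this sketch uses the AudioSystem to read a sound file, convert it to linear PCM if a suitable format converter is installed, and write the result out as a WAVE file. The file names are placeholders:

    import javax.sound.sampled.*;
    import java.io.File;

    public class AudioSystemSketch {
        public static void main(String[] args) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("in.au"));
            AudioFormat source = in.getFormat();
            System.out.println("Source format: " + source);

            // Describe the desired target format: 16-bit signed PCM,
            // same sample rate and channel count as the source.
            AudioFormat target = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    source.getSampleRate(), 16, source.getChannels(),
                    source.getChannels() * 2, source.getSampleRate(), false);

            if (AudioSystem.isConversionSupported(target, source)) {
                AudioInputStream pcm = AudioSystem.getAudioInputStream(target, in);
                AudioSystem.write(pcm, AudioFileFormat.Type.WAVE, new File("out.wav"));
            }
        }
    }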

System configuration (SPI classes)

Service provider interfaces for the sampled audio system are defined in the javax.sound.sampled.spi package. Service providers can extend the classes defined here so that their own audio devices, sound file parsers and writers, and format converters can be installed and made available by a Java Sound implementation.



MIDI

Interfaces describing MIDI event transport, synthesis, and sequencing are defined in the javax.sound.midi package. The major concepts used in the package are described in the sections below.


Functional Overview

The diagram below illustrates the functional relationships between the major components in a typical Java Sound MIDI configuration. (Java Sound permits a variety of devices to be installed and interconnected. The system shown here is just one possible scenario.) The flow of data between components is indicated by arrows. The data can be in a standard file format, or (as indicated by the key in the lower right corner of the diagram), it can be audio, raw MIDI bytes, or Java Sound's MidiEvent objects.


A Typical MIDI Configuration

In this example, the application prepares a musical performance by loading a musical score that is stored as a Standard MIDI File on a disk (lower left corner of the diagram). Standard MIDI files contain tracks, each of which is a list of time-tagged MIDI events. This MIDI file is read into a Sequence object, whose data structure reflects the file. A Sequence contains a set of Track objects, each of which contains a set of MidiEvent objects. The Sequence is then "performed" by a Sequencer. A Sequencer performs its music by sending MidiEvents to some other device, such as an internal or external synthesizer.

As illustrated, MidiEvents must be translated into raw (non-time-tagged) MIDI before being sent through a MIDI output port to an external synthesizer. This conversion is accomplished by a MIDI-output device called a StreamGenerator. Similarly, raw MIDI data coming into the computer from an external MIDI source is translated into MidiEvents by a StreamParser.

The internal synthesizer (the rectangle marked "Synthesizer" in the diagram) accepts MidiEvents directly from the Sequencer or StreamParser. It parses each event and usually dispatches a corresponding command (such as noteOn) to one of its MidiChannel objects, according to the MIDI channel number specified in the event. (The MIDI Specification calls for 16 MIDI channels, so a Synthesizer typically has 16 MidiChannel objects.)

The MidiChannel uses the note information in these messages to synthesize music. For example, a noteOn message specifies the note's pitch and "velocity" (volume). However, the note information is insufficient; the synthesizer also requires precise instructions on how to create the audio signal for each note. These instructions are represented by an Instrument. Each Instrument typically emulates a different real-world musical instrument or sound effect. The Instruments might come as presets with the synthesizer, or they might be loaded from soundbank files. In the synthesizer, the Instruments are arranged by bank number (the rows in the diagram) and program number (the columns). An Instrument can make use of stored digital audio, included as Sample objects in the soundbank. For example, to play the sound of a trumpet playing a 5-second-long note, the synthesizer might loop (cycle) through a half-second snippet of a recording of a trumpet.

Now that the components have been introduced from a functional perspective, we will take a brief look at the API from a programmatic perspective.


MidiEvent

A MidiEvent object specifies the type, data length, and status byte of the raw MIDI message for which it serves as a wrapper. In addition, it provides a tick value that is used by devices involved in MIDI timing, such as sequencers.

There are three categories of events, each represented by a MidiEvent subclass: ShortEvents, which wrap the one-, two-, or three-byte channel and system messages that make up most MIDI traffic; SysexEvents, which wrap system-exclusive messages; and MetaEvents, which wrap the meta messages, such as tempo and lyric events, found in MIDI files.


MidiDevice

The base interface for devices is MidiDevice. All devices provide methods for listing the set of MIDI modes that they support, and for querying and setting the current mode. (The mode is a combination of MIDI's Omni mode and Mono/Poly mode.) Devices can be opened and closed, and they provide descriptions of themselves through a MidiDevice.Info object.

The following diagram illustrates the MidiDevice interface hierarchy. Also depicted are two classes, connected by dashed lines to the MidiDevice interfaces they implement.


MidiDevice Hierarchy

Devices are generally either transmitters or receivers of MidiEvents. The Transmitter subinterface of MidiDevice includes methods for setting and querying the receivers to which the transmitter is sending MidiEvents. From the perspective of a transmitter, these receivers fall into two categories: MIDI Out and MIDI Thru. The transmitter sends events that it generates itself to its MIDI Out receivers. If the transmitter is itself also a receiver, it passes along events that it has received from elsewhere to its MIDI Thru receivers. The Receiver subinterface of MidiDevice consists of a single method for receiving MidiEvents. Typically, this method is invoked by a Transmitter.

Java Sound includes concrete classes for converting between MidiEvent objects and the raw byte stream used in MIDI wire protocol. A StreamGenerator is a Receiver that accepts MidiEvent objects from a Transmitter and writes out a raw MIDI byte stream. Similarly, a StreamParser is a Transmitter that accepts a raw MIDI byte stream and writes the corresponding MidiEvent objects to its Receiver.
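
As an aside, shipping releases of the API arrange this slightly differently: a Transmitter and a Receiver are obtained from a MidiDevice rather than being kinds of MidiDevice, and the raw-byte conversion happens inside the MIDI ports themselves. A sketch of wiring one device to another under that design:

    import javax.sound.midi.*;

    public class WiringSketch {
        public static void main(String[] args) throws MidiUnavailableException {
            // Route the default sequencer's output to the default synthesizer.
            Sequencer sequencer = MidiSystem.getSequencer(false); // not auto-connected
            Synthesizer synth = MidiSystem.getSynthesizer();
            sequencer.open();
            synth.open();
            sequencer.getTransmitter().setReceiver(synth.getReceiver());
        }
    }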


Synthesis

A Synthesizer is a type of MidiDevice that generates sound. The Synthesizer interface extends both Receiver and Transmitter. It provides methods for manipulating soundbanks and instruments. In addition, it provides access to a set of MIDI channels through which sound is actually produced. A Synthesizer receives MidiEvents and invokes corresponding MidiChannel messages.

MidiChannels have methods representing the common MIDI voice messages such as "note on" and "control change." They also permit queries of the channel's current state.
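
A sketch of driving a channel directly, using the MidiSystem class (described below) to obtain the default synthesizer:

    import javax.sound.midi.*;

    public class SynthSketch {
        public static void main(String[] args) throws Exception {
            Synthesizer synth = MidiSystem.getSynthesizer();
            synth.open();

            // A synthesizer typically exposes 16 MidiChannel objects.
            MidiChannel[] channels = synth.getChannels();
            channels[0].programChange(0);  // select an instrument for channel 0
            channels[0].noteOn(60, 93);    // middle C, velocity 93
            Thread.sleep(1000);            // let the note sound for one second
            channels[0].noteOff(60);
            synth.close();
        }
    }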


Sequencing

Like Synthesizer, the Sequencer interface extends both Transmitter and Receiver (and therefore MidiDevice). Sequencer adds methods for basic MIDI sequencing operations. A sequencer can load and play back a sequence, query and set the tempo, and control the master and slave sync modes. An application can register to be notified when the sequencer processes MetaEvents and controller events. (A controller event occurs when a MIDI controller, such as a pitch-bend wheel or a data slider, changes its value. These events are not MidiEvents, but are created when the Sequencer encounters certain ShortEvents in the Sequence.)

The Sequence object represents a MIDI sequence as one or more tracks and associated timing information. A track contains a list of time-stamped MIDI events. Sequences can be read from MIDI files, or created from scratch and edited by adding Tracks to the Sequence (or removing them). Similarly, MidiEvents can be added to or removed from the Tracks.

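A sketch of building a one-note sequence from scratch and writing it to disk. Note that in shipping releases of the API a MidiEvent wraps a MidiMessage object (such as a ShortMessage) rather than being subclassed as described earlier; the file name is a placeholder:

    import javax.sound.midi.*;
    import java.io.File;

    public class MidiFileSketch {
        public static void main(String[] args) throws Exception {
            Sequence sequence = new Sequence(Sequence.PPQ, 480); // 480 ticks per quarter note
            Track track = sequence.createTrack();

            ShortMessage noteOn = new ShortMessage();
            noteOn.setMessage(ShortMessage.NOTE_ON, 0, 60, 93);   // channel 0, middle C
            track.add(new MidiEvent(noteOn, 0));                  // at tick 0

            ShortMessage noteOff = new ShortMessage();
            noteOff.setMessage(ShortMessage.NOTE_OFF, 0, 60, 0);
            track.add(new MidiEvent(noteOff, 480));               // one quarter note later

            // Write the sequence out as a Type 0 (single-track) Standard MIDI File.
            MidiSystem.write(sequence, 0, new File("one-note.mid"));
        }
    }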

It is not necessary to load a MIDI file into a Sequence object before playing the file. The setSequence(java.io.InputStream) method of Sequencer lets you read a MIDI file directly into a Sequencer, without creating a Sequence object first.
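
Either way, playback itself takes only a few calls; "song.mid" is a placeholder file name:

    import javax.sound.midi.*;
    import java.io.File;

    public class SequencerSketch {
        public static void main(String[] args) throws Exception {
            Sequence sequence = MidiSystem.getSequence(new File("song.mid"));

            Sequencer sequencer = MidiSystem.getSequencer(); // usually pre-wired to a synthesizer
            sequencer.open();
            sequencer.setSequence(sequence);
            sequencer.start();                               // plays asynchronously

            // Wait for playback to finish before exiting.
            Thread.sleep(sequence.getMicrosecondLength() / 1000 + 1000);
            sequencer.close();
        }
    }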


MidiSystem

MidiSystem acts as an application's entry point to the MIDI music system. It provides information about, and access to, the set of installed devices, including transmitters, receivers, synthesizers, and sequencers.

The MidiSystem class provides methods for reading MIDI files to create Sequence objects, and for writing Sequences to MIDI files. A MIDI Type 0 file contains only one track, while a Type 1 file may contain any number. MidiSystem also provides methods to create Soundbank objects by parsing soundbank files.


System configuration (SPI interfaces)

Configuration of the MIDI system is handled in the javax.sound.midi.spi package. The abstract classes in this package allow service providers to supply and install their own MIDI devices, MIDI file readers and writers, and soundbank file readers.


Copyright © 1999 Sun Microsystems, Inc. All Rights Reserved.

Please send comments to: javasound-comments@javamedia.eng.sun.com