The Java Sound API is a low-level API for effecting and controlling input and output of audio media. It provides explicit control over the capabilities commonly required for audio input and output in a framework that promotes extensibility and flexibility.
Because sound is so fundamental, Java Sound fulfills the needs of a wide range of customers. Potential application areas include:

- communication frameworks, such as conferencing and telephony
- end-user content delivery systems, such as media players and music using streamed content
- interactive application programs, such as games and web sites that use dynamic content
- content creation and editing
- tools, toolkits, and utilities
Java Sound enables applications to extend audio support with specialized capabilities, and integrates well with architectures that provide higher-level interfaces and integration with other media types.
Java Sound provides the lowest level of audio support on the Java platform. It provides a high degree of control over audio-specific functionality. For example, it provides mechanisms for installing, accessing, and manipulating system resources such as digital audio and MIDI (Musical Instrument Digital Interface) devices. It does not include sophisticated sound editors and GUI tools; rather, it provides a set of capabilities upon which such applications can be built. It emphasizes low-level control beyond that commonly expected by the end user, who benefits from higher-level interfaces built on top of Java Sound.
Note: Throughout this document, the word "application" refers generically to Java applets as well as Java applications.
The Java Sound API includes support for both digital audio and MIDI data. These two major modules of functionality are provided in separate packages:
javax.sound.sampled
This package specifies interfaces for capture, mixing, and playback of digital (sampled) audio.

javax.sound.midi
This package provides interfaces for MIDI synthesis, sequencing, and event transport.
Two other packages permit service providers (as opposed to application developers) to create custom components that can be installed on the system: javax.sound.sampled.spi and javax.sound.midi.spi.
The next section of this document discusses the sampled-audio system, including an overview of the javax.sound.sampled API. The final section covers the MIDI system and the javax.sound.midi API.
The javax.sound.sampled package handles digital audio data, also referred to as sampled audio. ("Samples" are successive snapshots of a signal, which in the case of digital audio is a sound wave. For example, the audio recorded for storage on compact discs is sampled 44,100 times per second.) Typically, sampled audio comes from a sound recording, but the sound could instead be synthetically generated. The term "sampled audio" refers to the type of data, not its origin. Sampled audio can be thought of as the sound itself, whereas MIDI data can be thought of as a recipe for creating musical sound.
Java Sound does not assume a specific audio hardware
configuration; it is designed to allow different sorts of audio
components to be installed on a system and accessed by the API.
Java Sound supports common functionality such as input and output
from a sound card (for example, for recording and playback of sound
files) as well as mixing of multiple streams of audio. Here is one
example of a typical audio architecture for which Java Sound might
be used:
In this example, a device such as a sound card has various input and output ports, and mixing is provided in software. The MIDI synthesizer shown as one of the mixer's audio inputs might also be a feature of the sound card, or it might be implemented in software. (The javax.sound.midi package, discussed later, supplies a Java interface for synthesizers.)
The major concepts used in the javax.sound.sampled package are described in the sections below.
A line is an element of the digital audio "pipeline," such as an audio input or output port, a mixer, or an audio data path into or out of a mixer. The audio data flowing through a line can be mono or multichannel (for example, stereo). Each type of line will be described shortly, but first some of their functional relationships will be illustrated, showing the flow of audio through the "pipeline." The following diagram shows different types of lines in a simple audio-output system:
A Possible Configuration of Lines for Audio Output

In this example, an application has asked a mixer for one or more available clips and source data lines. A clip is a mixer input into which you can load audio data prior to playback; a source data line is a mixer input that accepts a real-time stream of audio data. The application preloads audio data from a sound file into the clips, and then pushes other audio data into the source data lines. The mixer reads data from these lines, each of which may have its own reverberation, gain, and pan controls, and uses the reverb settings to mix the "dry" audio signals with the reverberated ("wet") mix. The mixer delivers its final output to one or more output ports, such as a speaker, a headphone jack, and a line-out jack.
Although the various lines are depicted as separate rectangles in the diagram, they are all "owned" by the mixer, and can be considered integral parts of the mixer. The reverb, gain, and pan rectangles represent processing controls (rather than lines) that can be applied by the mixer to data flowing through the lines. (Note that this is just one example of a possible audio system that is supported by the API. Not all audio configurations will have all the features illustrated. An individual source data line might not support panning, a mixer might not implement reverb, and so on.)
A simple audio-input system might be similar:
A Possible Configuration of Lines for Audio Input

Here, data flows into the mixer from one or more input ports, commonly the microphone or the line-in jack. Gain and pan are applied, and the mixer delivers the captured data to an application via the mixer's target data line. A target data line is a mixer output, containing the mixture of the streamed input sounds. The simplest mixer has just one target data line, but some mixers can deliver captured data to multiple target data lines simultaneously.
The different types of line will now be examined more closely. Several types of line are defined by subinterfaces of the basic Line interface. The interface hierarchy is shown below.
The Line Interface Hierarchy
The base interface, Line, describes the minimal functionality common to all lines:

- An informational object (Line.Info) that indicates what mixer (if any) sends its mixed audio data as output directly to the line, and what mixer (if any) gets audio data as input directly from the line. Subinterfaces of Line may have corresponding subclasses of Line.Info that provide other kinds of information specific to the particular types of line.
- An open or closed status. Closing a line indicates that any resources used by the line may now be released. To free up resources, applications should close lines whenever they are not in use, and must close all opened lines when exiting. Mixers are assumed to be shared system resources, and can be opened and closed repeatedly. Other lines may or may not support re-opening once they have been closed. Mechanisms for opening lines vary with the different sub-types and are documented where they are defined.
- Events. A line generates events when it is opened or closed. These events are instances of the LineEvent class. Two types of LineEvent are OPEN and CLOSE, but subinterfaces of Line can introduce other types of events.
- Listeners. Whenever a line generates an OPEN or CLOSE event, the event is sent to all objects that have registered to "listen" for events on that line. Such objects must implement the LineListener interface. An application can create these objects, register them to listen for line events, and react to the events as desired.

Ports are simple lines for input or output of audio to or from audio devices. The Port interface has an inner class, Port.Info, that specifies the type of port. Some common types are the microphone, line input, CD-ROM drive, speaker, headphone, and line output.

The Mixer interface represents a hardware or software device that has one or more input lines and one or more output lines. This definition means that a mixer need not actually mix data; it might have only a single input. The Mixer API is intended to encompass a variety of devices, but the typical case supports mixing.
The Mixer interface provides methods for obtaining a mixer's lines. These can include target data lines, from which an application can read captured audio data; source data lines, to which an application can write audio data for playback (rendering); and clips, into which an application can preload sound data for playback. The mixer can lock these resources: for example, if the mixer has only one target data line and it is already in use, an attempt by an application to obtain a target data line will cause an exception to be thrown.
You can query a mixer for lines of different types by passing the appropriate type of Line.Info. You can also ask the mixer how many lines of a particular type it supports.
A mixer maintains textual information about its specific device type in an inner class called Mixer.Info. This information includes the product's name, version, and vendor, along with a textual description.
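As a concrete illustration, here is a minimal sketch that enumerates the installed mixers and prints the fields of each Mixer.Info. It relies on the AudioSystem class, which is introduced later in this document as the entry point to the sampled-audio system.

    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Mixer;

    public class ListMixers {
        public static void main(String[] args) {
            // Each installed mixer is described by a Mixer.Info object.
            for (Mixer.Info info : AudioSystem.getMixerInfo()) {
                System.out.println("Name:        " + info.getName());
                System.out.println("Vendor:      " + info.getVendor());
                System.out.println("Version:     " + info.getVersion());
                System.out.println("Description: " + info.getDescription());
            }
        }
    }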
Notice that the generic Line interface does not provide a means to start and stop playback or recording. For that you need a data line. The DataLine interface supplies the following additional media-related features beyond those of a Line:

- An audio format (AudioFormat) that specifies the arrangement of the bytes in the audio stream. Some of the format's properties are the number of channels, the sample rate, the sample size, and the encoding technique. Common encoding techniques include linear pulse-code modulation (PCM), mu-law encoding, and a-law encoding.
- START and STOP events, which are produced when active presentation or capture of data from or to the data line starts or stops.

An application can obtain a data line from a mixer. If the data line cannot be allocated because of resource constraints (for example, if the mixer supports only one target data line and it is already in use), an exception is thrown.
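As a sketch of how this looks in the javax.sound.sampled API as shipped, an application describes the line it wants with a DataLine.Info object and asks the AudioSystem for a matching line; LineUnavailableException signals the resource-constraint case just described.

    import javax.sound.sampled.*;

    public class ObtainLine {
        public static void main(String[] args) throws LineUnavailableException {
            // 44.1 kHz, 16-bit, stereo, signed, little-endian linear PCM.
            AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);

            // Describe the desired line: a mixer input that accepts streamed
            // data in the format above.
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            if (!AudioSystem.isLineSupported(info)) {
                System.err.println("No line matching " + info + " is available.");
                return;
            }

            // Throws LineUnavailableException if a matching line exists but
            // is already in use.
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);
            System.out.println("Opened line; buffer size " + line.getBufferSize() + " bytes.");
            line.close();
        }
    }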
A TargetDataLine
receives audio data from a
mixer. Commonly, the mixer has captured audio data from a port such
as a microphone; it might process or mix this captured audio before
placing the data in the target data line's buffer. The
TargetDataLine
interface provides methods for reading
the data from the target data line's buffer and for determining how
much data is currently available for reading. If an application
attempts to read more data than is available, the read method
blocks until the requested amount of data is available. This
applies even if the amount of data requested is greater than the
line's buffer size. The read method returns if the line is closed,
paused, or flushed.
Applications recording audio should read data from the target data
line fast enough to avoid overflow of the buffer, which results in
discontinuities in the captured data. If the buffer does overflow,
the oldest queued data is discarded and replaced by new
data.
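A minimal capture sketch under the shipped API: read from the target data line's buffer in a loop, promptly enough to avoid the overflow just described. The five-second duration is an arbitrary choice for illustration.

    import java.io.ByteArrayOutputStream;
    import javax.sound.sampled.*;

    public class CaptureSketch {
        public static void main(String[] args) throws LineUnavailableException {
            AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);
            TargetDataLine line = (TargetDataLine) AudioSystem.getLine(
                    new DataLine.Info(TargetDataLine.class, format));
            line.open(format);
            line.start();  // begin capturing

            ByteArrayOutputStream captured = new ByteArrayOutputStream();
            byte[] chunk = new byte[line.getBufferSize() / 4];
            int wanted = (int) (format.getFrameRate() * format.getFrameSize() * 5);

            while (captured.size() < wanted) {
                // read blocks until the requested amount is available, and
                // returns early if the line is closed, stopped, or flushed.
                int n = line.read(chunk, 0, chunk.length);
                if (n <= 0) break;
                captured.write(chunk, 0, n);
            }
            line.stop();
            line.close();
            System.out.println("Captured " + captured.size() + " bytes.");
        }
    }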
A SourceDataLine receives audio data for playback. It provides methods for writing data to the source data line's buffer for playback, and for determining how much data the line is prepared to receive without blocking. If an application attempts to write more data than the line can currently accept, the write method blocks until the requested amount of data can be written. This applies even if the amount of data requested is greater than the line's buffer size. The write method also returns if the line is closed, paused, or flushed.
Applications playing audio should write data to the source data
line fast enough to avoid underflow (emptying) of the buffer, which
may result in discontinuities in audio playback. If audio playback
stops due to underflow, a STOP
event is generated. A
START
event is generated when presentation
resumes.
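A corresponding playback sketch: stream bytes from a sound file into a source data line, writing fast enough to keep the buffer from underflowing. The file name is hypothetical; any file in a supported PCM format would do.

    import java.io.File;
    import javax.sound.sampled.*;

    public class PlaybackSketch {
        public static void main(String[] args) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("ding.wav"));
            AudioFormat format = in.getFormat();

            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(
                    new DataLine.Info(SourceDataLine.class, format));
            line.open(format);
            line.start();

            byte[] chunk = new byte[line.getBufferSize() / 4];
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                // write blocks until all n bytes have been queued for playback.
                line.write(chunk, 0, n);
            }
            line.drain();  // wait for the buffer to empty before closing
            line.close();
            in.close();
        }
    }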
A Clip is a data line into which audio data can be loaded prior to playback. Because the data is pre-loaded rather than streamed, the clip's duration is known before playback, and you can choose any starting position in the media. Clips can be looped, meaning that upon playback, all the data between two specified loop points will repeat a specified number of times, or indefinitely.
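A sketch of clip usage under the shipped API (the file name is hypothetical): the data is loaded in full when the clip is opened, after which playback can start at any position and loop indefinitely.

    import java.io.File;
    import javax.sound.sampled.*;

    public class ClipSketch {
        public static void main(String[] args) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("loop.wav"));

            Clip clip = AudioSystem.getClip();
            clip.open(in);                      // preload all the audio data

            clip.setFramePosition(0);           // choose a starting position
            clip.loop(Clip.LOOP_CONTINUOUSLY);  // repeat indefinitely

            Thread.sleep(5000);                 // let it play for a while
            clip.stop();
            clip.close();
        }
    }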
A GroupLine is a synchronized group of data lines. If a mixer supports group lines, you can specify which data lines should be treated as a group. Then you can start, stop, or close all those data lines by sending a single message to the group, instead of having to control each line individually.

Data lines and ports often have a set of controls that affect the audio signal passing through the line. The way in which the signal is affected depends on the type of control. The Java Sound API defines the following subclasses of Control:

- GainControl. A GainControl object can be queried for its resolution and for the minimum and maximum gain values it permits. The resolution is expressed as the number of increments over which the range of possible values is distributed.
- PanControl
- ReverbControl
- SampleRateControl. As with a GainControl, the change can be made gradually instead of immediately, and the control can be queried for its resolution and its minimum and maximum possible values.

Programmatically, you obtain a particular control object from a line through a reference to the control's class. You can also obtain an array of all the controls for that line.
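Note that in the API as eventually shipped, these draft control classes were consolidated: gain, pan, and sample rate are exposed as FloatControl objects identified by FloatControl.Type constants. Under that assumption, a minimal sketch of querying and adjusting a line's gain:

    import javax.sound.sampled.*;

    public class GainSketch {
        public static void main(String[] args) throws LineUnavailableException {
            AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(
                    new DataLine.Info(SourceDataLine.class, format));
            line.open(format);

            if (line.isControlSupported(FloatControl.Type.MASTER_GAIN)) {
                // Obtain the control through a reference to its type,
                // then query its range and resolution before setting it.
                FloatControl gain =
                        (FloatControl) line.getControl(FloatControl.Type.MASTER_GAIN);
                System.out.println("Gain range: " + gain.getMinimum() + " dB to "
                        + gain.getMaximum() + " dB, precision " + gain.getPrecision());
                gain.setValue(-6.0f);  // attenuate by 6 dB
            }

            // An array of all the controls for this line:
            for (Control c : line.getControls()) {
                System.out.println(c);
            }
            line.close();
        }
    }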
The AudioSystem class serves as an application's entry point for accessing the installed sampled-audio resources. You can query the AudioSystem to learn what sorts of audio components have been installed, and then you can obtain access to them. For example, an application might start out by asking the AudioSystem class whether there is a mixer that has a certain configuration, such as one of the input or output configurations illustrated earlier in the discussion of lines. From the mixer, the application would then obtain data lines, and so on.
Here are some of the resources an application can obtain from the AudioSystem:

- Mixers. A particular mixer is identified by a Mixer.Info object. You can learn what mixers are available by invoking the getMixerInfo method, which returns an array of Mixer.Info objects.
- Lines. Certain lines can be obtained directly from the AudioSystem, without dealing explicitly with mixers.

An application can use format conversions to translate audio
data from one format to another. (See the discussion of AudioFormat above.) Format conversions
are often used to compress and decompress audio data. An
application can query the AudioSystem
to learn what
translations are supported. It can then pass the
AudioSystem
a stream of audio data and get back a
translated stream in a particular format.
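A sketch of such a translation with the shipped API: ask the AudioSystem whether the conversion is supported, then get back a converted stream. The mu-law source file is an arbitrary assumption for illustration.

    import java.io.File;
    import javax.sound.sampled.*;

    public class ConvertSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical mu-law encoded source file.
            AudioInputStream muLaw = AudioSystem.getAudioInputStream(new File("speech.au"));

            // Query whether a translation to linear PCM is supported.
            if (AudioSystem.isConversionSupported(AudioFormat.Encoding.PCM_SIGNED,
                                                  muLaw.getFormat())) {
                // Pass in the stream; get back a translated stream.
                AudioInputStream pcm = AudioSystem.getAudioInputStream(
                        AudioFormat.Encoding.PCM_SIGNED, muLaw);
                System.out.println("Converted format: " + pcm.getFormat());
                pcm.close();
            }
            muLaw.close();
        }
    }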
An audio stream is an input stream with a specified audio data
format (AudioFormat
) and data length. The
AudioInputStream
class represents such a stream, from
which you can read bytes. Some audio input streams permit you to
remember positions in the stream and skip around in it. The
AudioSystem
class provides methods for translating
between audio files and audio streams. The AudioSystem
can also report the file format of a sound file and can write files
in the different formats. A file format is represented by the
AudioFileFormat
class. An AudioFileFormat
includes an AudioFormat
, the file's length, and its
type (WAV, AIFF, AU, etc.).
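A short sketch of the file-related methods (file names hypothetical): report a sound file's format, then write its audio out in a different file type.

    import java.io.File;
    import javax.sound.sampled.*;

    public class FileFormatSketch {
        public static void main(String[] args) throws Exception {
            File source = new File("sound.aiff");

            // Report the file format: type, byte length, and the AudioFormat inside.
            AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(source);
            System.out.println("Type: " + fileFormat.getType()
                    + ", bytes: " + fileFormat.getByteLength()
                    + ", format: " + fileFormat.getFormat());

            // Write the same audio out as a WAV file.
            AudioInputStream in = AudioSystem.getAudioInputStream(source);
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, new File("sound.wav"));
            in.close();
        }
    }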
Service provider interfaces for the sampled audio system are
defined in the javax.sound.sampled.spi
package.
Service providers can extend the classes defined here so that their
own audio devices, sound file parsers and writers, and format
converters can be installed and made available by a Java Sound
implementation.
Interfaces describing MIDI event transport, synthesis, and
sequencing are defined in the javax.sound.midi
package. The major concepts used in the package are described in
the sections below.
The diagram below illustrates the functional relationships
between the major components in a typical Java Sound MIDI
configuration. (Java Sound permits a variety of devices to be
installed and interconnected. The system shown here is just one
possible scenario.) The flow of data between components is
indicated by arrows. The data can be in a standard file format, or
(as indicated by the key in the lower right corner of the diagram),
it can be audio, raw MIDI bytes, or Java Sound's
MidiEvent
objects.
In this example, the application prepares a musical performance
by loading a musical score that is stored as a Standard MIDI File
on a disk (lower left corner of the diagram). Standard MIDI files
contain tracks, each of which is a list of time-tagged MIDI events.
This MIDI file is read into a Sequence
object, whose
data structure reflects the file. A Sequence
contains
a set of Track
objects, each of which contains a set
of MidiEvent
objects. The Sequence
is
then "performed" by a Sequencer
. A
Sequencer
performs its music by sending
MidiEvents
to some other device, such as an internal
or external synthesizer.
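In the javax.sound.midi package as it eventually shipped, the raw message is represented by MidiMessage (with subclasses such as ShortMessage), and MidiEvent is the wrapper that pairs a message with a tick value. Under that naming, a minimal sketch of building a one-track Sequence and performing it with a Sequencer:

    import javax.sound.midi.*;

    public class SequenceSketch {
        public static void main(String[] args) throws Exception {
            // Tempo-based timing, 480 ticks per quarter note.
            Sequence sequence = new Sequence(Sequence.PPQ, 480);
            Track track = sequence.createTrack();

            // Middle C on channel 0: note on at tick 0, note off at tick 480.
            ShortMessage on = new ShortMessage();
            on.setMessage(ShortMessage.NOTE_ON, 0, 60, 93);
            track.add(new MidiEvent(on, 0));

            ShortMessage off = new ShortMessage();
            off.setMessage(ShortMessage.NOTE_OFF, 0, 60, 0);
            track.add(new MidiEvent(off, 480));

            // "Perform" the sequence by sending its events to a device.
            Sequencer sequencer = MidiSystem.getSequencer();
            sequencer.open();
            sequencer.setSequence(sequence);
            sequencer.start();

            Thread.sleep(2000);  // let the note sound
            sequencer.close();
        }
    }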
As illustrated, MidiEvents
must be translated into
raw (non-time-tagged) MIDI before being sent through a MIDI output
port to an external synthesizer. This conversion is accomplished by
a MIDI-output device called a StreamGenerator.
Similarly, raw MIDI data coming into the computer from an external
MIDI source is translated into MidiEvents
by a
StreamParser
.
The internal synthesizer (the rectangle marked "Synthesizer" in
the diagram) accepts MidiEvents
directly from the
Sequencer
or StreamParser
. It parses each
event and usually dispatches a corresponding command (such as
noteOn
) to one of its MidiChannel
objects, according to the MIDI channel number specified in the
event. (The MIDI Specification calls for 16 MIDI channels, so a
Synthesizer
typically has 16 MidiChannel
objects.)
The MidiChannel
uses the note information in these
messages to synthesize music. For example, a noteOn
message specifies the note's pitch and "velocity" (volume).
However, the note information is insufficient; the synthesizer also
requires precise instructions on how to create the audio signal for
each note. These instructions are represented by an
Instrument
. Each Instrument
typically
emulates a different real-world musical instrument or sound effect.
The Instruments
might come as presets with the
synthesizer, or they might be loaded from soundbank files. In the
synthesizer, the Instruments
are arranged by bank
number (the rows in the diagram) and program number (the columns).
An Instrument
can make use of stored digital audio,
included as Sample
objects in the soundbank. For
example, to play the sound of a trumpet playing a 5-second-long
note, the synthesizer might loop (cycle) through a half-second
snippet of a recording of a trumpet.
Now that the components have been introduced from a functional perspective, we will take a brief look at the API from a programmatic perspective.
A MidiEvent
object specifies the type, data length,
and status byte of the raw MIDI message for which it serves as a
wrapper. In addition, it provides a tick value that is used by
devices involved in MIDI timing, such as sequencers.
There are three categories of events, each represented by a
MidiEvent
subclass:
- ShortEvents are the most common and have at most two data bytes following the status byte.
- SysexEvents contain system-exclusive MIDI messages. They may have many bytes, and generally contain manufacturer-specific instructions.
- MetaEvents occur in MIDI files, but not in raw MIDI data streams. Meta events contain data, such as lyrics or tempo settings, that might be useful to sequencers but are usually meaningless for synthesizers.

The base interface for devices is MidiDevice. All devices provide methods for listing the set of MIDI modes that they support, and for querying and setting the current mode. (The mode is a combination of MIDI's Omni mode and Mono/Poly mode.) Devices can be opened and closed, and they provide descriptions of themselves through a MidiDevice.Info object.
The following diagram illustrates the MidiDevice
interface hierarchy. Also depicted are two classes, connected by
dashed lines to the MidiDevice
interfaces they
implement.
Devices are generally either transmitters or receivers of
MidiEvents
. The Transmitter
subinterface
of MidiDevice
includes methods for setting and
querying the receivers to which the transmitter is sending
MidiEvents
. From the perspective of a transmitter,
these receivers fall into two categories: MIDI Out and MIDI Thru.
The transmitter sends events that it generates itself to its MIDI
Out receivers. If the transmitter is itself also a receiver, it
passes along events that it has received from elsewhere to its MIDI
Thru receivers. The Receiver
subinterface of
MidiDevice
consists of a single method for receiving
MidiEvents
. Typically, this method is invoked by a
Transmitter
.
Java Sound includes concrete classes for converting between
MidiEvent
objects and the raw byte stream used in MIDI
wire protocol. A StreamGenerator
is a
Receiver
that accepts MidiEvent
objects
from a Transmitter
and writes out a raw MIDI byte
stream. Similarly, a StreamParser
is a
Transmitter
that accepts a raw MIDI byte stream and
writes the corresponding MidiEvent
objects to its
Receiver
.
A Synthesizer
is a type of MidiDevice
that generates sound. The Synthesizer
interface
extends both Receiver
and Transmitter
. It
provides methods for manipulating soundbanks and instruments. In
addition, it provides access to a set of MIDI channels through
which sound is actually produced. A Synthesizer
receives MidiEvents
and invokes corresponding
MidiChannel
messages.
MidiChannels
have methods representing the common
MIDI voice messages such as "note on" and "control change." They
also permit queries of the channel's current state.
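A minimal sketch under the shipped API: obtain the default synthesizer, pick one of its 16 channels, and send voice messages to it directly.

    import javax.sound.midi.*;

    public class ChannelSketch {
        public static void main(String[] args) throws Exception {
            Synthesizer synth = MidiSystem.getSynthesizer();
            synth.open();

            // A synthesizer typically exposes 16 MidiChannel objects.
            MidiChannel channel = synth.getChannels()[0];

            channel.programChange(0);  // select an instrument by program number
            channel.noteOn(60, 93);    // middle C, velocity 93
            Thread.sleep(1000);
            channel.noteOff(60);

            synth.close();
        }
    }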
Like Synthesizer
, the Sequencer
interface extends both Transmitter
and
Receiver
(and therefore MidiDevice
).
Sequencer
adds methods for basic MIDI sequencing
operations. A sequencer can load and play back a sequence, query
and set the tempo, and control the master and slave sync modes. An
application can register to be notified when the sequencer
processes MetaEvents
and controller events. (A
controller event occurs when a MIDI controller, such as a
pitch-bend wheel or a data slider, changes its value. These events
are not MidiEvents
, but are created when the
Sequencer
encounters certain ShortEvents
in the Sequence
.)
The Sequence
object represents a MIDI sequence as
one or more tracks and associated timing information. A track
contains a list of time-stamped MIDI events. Sequences can be read
from MIDI files, or created from scratch and edited by adding
Tracks
to the Sequence
(or removing
them). Similarly, MidiEvents
can be added to or
removed from the Tracks
.
It is not necessary to load a MIDI file into a
Sequence
object before playing the file. The
setSequence(java.io.InputStream)
method of
Sequencer
lets you read a MIDI file directly into a
Sequencer
, without creating a Sequence
object first.
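A sketch of that shortcut (the file name is hypothetical):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.sound.midi.*;

    public class PlayMidiFile {
        public static void main(String[] args) throws Exception {
            Sequencer sequencer = MidiSystem.getSequencer();
            sequencer.open();

            // Read the MIDI file directly into the sequencer; the application
            // never constructs a Sequence object itself.
            InputStream in = new BufferedInputStream(new FileInputStream("song.mid"));
            sequencer.setSequence(in);
            in.close();

            sequencer.start();
            while (sequencer.isRunning()) {
                Thread.sleep(200);  // wait for playback to finish
            }
            sequencer.close();
        }
    }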
MidiSystem
acts as an application's entry point to
the MIDI music system. It provides information about, and access
to, the set of installed devices, including transmitters,
receivers, synthesizers, and sequencers.
The MidiSystem
class provides methods for reading
MIDI files to create Sequence
objects, and for writing
Sequences
to MIDI files. A MIDI Type 0 file contains
only one track, while a Type 1 file may contain any number.
MidiSystem
also provides methods to create
Soundbank
objects by parsing soundbank files.
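A sketch of reading and writing (file names hypothetical): parse a standard MIDI file into a Sequence, then write it back out as a Type 1 file if the system supports that type.

    import java.io.File;
    import javax.sound.midi.*;

    public class MidiFileSketch {
        public static void main(String[] args) throws Exception {
            // Parse a standard MIDI file into a Sequence object.
            Sequence sequence = MidiSystem.getSequence(new File("in.mid"));
            System.out.println("Tracks: " + sequence.getTracks().length);

            // Type 1 files may contain any number of tracks.
            if (MidiSystem.isFileTypeSupported(1, sequence)) {
                MidiSystem.write(sequence, 1, new File("out.mid"));
            }
        }
    }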
Configuration of the MIDI system is handled in the javax.sound.midi.spi package. The abstract classes in
this package allow service providers to supply and install their
own MIDI devices, MIDI file readers and writers, and soundbank file
readers.