Audio stream

From GTAMods Wiki

Revision as of 02:13, 20 August 2008

Introduction

San Andreas handles music very differently from GTA III and Vice City. Rather than playing a single long loop for each radio station, San Andreas builds its stations dynamically, changing them based on game conditions. To support this, the stream format was introduced with San Andreas. This article details what is known about the stream format.

Encoding

The streams on disk are encoded, so they cannot be read or written directly. Fortunately, the encoding algorithm is very simple: the stream data is XORed with a fixed key. The key is 16 bytes long. Consecutive bytes of the stream are XORed with consecutive bytes of the key; once the end of the key is reached, it wraps around to the start again. There are several ways to approach the encoding; this article discusses encoding byte by byte. A program could also encode a stream in chunks of 2, 4, 8, or 16 bytes by adjusting the key appropriately (streams are little-endian).

The 16-byte encoding key, in hex notation, is:

 EA 3A C4 A1   9A A8 14 F3   48 B0 D7 23   9D E8 FF F1

The algorithm for encoding would be:

  • XOR the first byte of the stream with the first byte of the key (EA)
  • XOR the second byte of the stream with the second byte of the key (3A)

... (continues) ...

  • XOR the sixteenth byte of the stream with the sixteenth byte of the key (F1)
  • XOR the seventeenth byte of the stream with the first byte of the key (EA)

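Here is a simple C++ routine for the encoding (adapted from the San Andreas Audio Toolkit source code; the key definition is added here so that the snippet is complete):

#include <stddef.h>
#include <stdint.h>
// The 16-byte key listed above.
static const uint8_t encode_key[16] = {
  0xEA, 0x3A, 0xC4, 0xA1, 0x9A, 0xA8, 0x14, 0xF3,
  0x48, 0xB0, 0xD7, 0x23, 0x9D, 0xE8, 0xFF, 0xF1};

// index is the current key position (0-15); it is carried across calls.
void stream_encode(int8_t *buff, const size_t size, int &index) {
  for (size_t i = 0; i < size; ++i) {
    buff[i] ^= encode_key[index];
    index = (index + 1) % 16;
  }
}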

This is a two-way encoding method: because XOR is its own inverse, running a chunk of data through the algorithm twice (starting from the same key position both times) yields the original data. Thus, the same algorithm can be used for both encoding and decoding.
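Because the key position depends only on a byte's offset within the stream, a chunk taken from the middle of a stream can be encoded or decoded on its own. The following is a minimal sketch of that idea, not code from the game or from any existing tool; the names kKey and xor_at_offset are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The 16-byte key from this article.
static const std::uint8_t kKey[16] = {
    0xEA, 0x3A, 0xC4, 0xA1, 0x9A, 0xA8, 0x14, 0xF3,
    0x48, 0xB0, 0xD7, 0x23, 0x9D, 0xE8, 0xFF, 0xF1};

// Hypothetical helper: XORs `data` in place, where `offset` is the position
// of data[0] within the whole stream. The key byte used for each stream byte
// is chosen by (stream offset mod 16), so chunks can be handled independently.
void xor_at_offset(std::vector<std::uint8_t> &data, std::size_t offset) {
  for (std::size_t i = 0; i < data.size(); ++i) {
    data[i] ^= kKey[(offset + i) % 16];
  }
}
```

As an illustration, the four bytes 4F 67 67 53 located at stream offset 0 become A5 5D A3 F2, and a second call with the same offset restores them.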

Tracks

The decoded stream is simply a consecutive list of tracks. The tracks contain a header with metadata followed by the actual audio in Ogg Vorbis format.

Track Header

The track header is 8068 bytes long. It is divided into 3 major sections: 8000 bytes of Beat information, 64 bytes which contain the length of the sound file and other information, and 4 constant bytes of unknown meaning.

Track Header:
 8000 bytes - Beat Entry x 1000 - see below for details
   64 bytes - Length Entry x 8  - see below for details
    4 bytes - CHAR[4]           - always "01 00 CD CD" (hex)

The Beat section is used for the Dancing minigame and the Low Rider Challenge minigame. In each of these games, the player has to match the rhythm of a song by entering specified controls at specified times. This beat information is specified in the track header.

Beat Entry:
  4 bytes - DWORD - Timing value
  4 bytes - DWORD - Control value

The timing value is an integer denoting the point in the song (in milliseconds) where the beat is triggered. If the track doesn't need any beat information, the default value is -1 (0xFFFFFFFF). The control value is an integer denoting which button, key, etc. should be pressed. These values are defined below. If the track doesn't need any beat information, the default value is 0.
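As a rough illustration of the Beat Entry layout, the sketch below collects the used entries from the first 8000 bytes of a decoded track header. It assumes that any entry with a timing value of -1 is unused and can be skipped; the names Beat, read_le32 and read_beats are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Beat {
  std::int32_t timing_ms;  // position in the song, in milliseconds
  std::uint32_t control;   // control value (see the control tables)
};

// Reads a little-endian 32-bit value from four bytes.
static std::uint32_t read_le32(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: `header` must point at the start of a decoded track
// header, i.e. at 1000 consecutive 8-byte beat entries.
std::vector<Beat> read_beats(const std::uint8_t *header) {
  std::vector<Beat> beats;
  for (std::size_t i = 0; i < 1000; ++i) {
    const std::uint8_t *entry = header + i * 8;
    const std::uint32_t raw_timing = read_le32(entry);
    if (raw_timing == 0xFFFFFFFFu) continue;  // -1: assumed to mean "unused"
    Beat beat;
    beat.timing_ms = static_cast<std::int32_t>(raw_timing);
    beat.control = read_le32(entry + 4);
    beats.push_back(beat);
  }
  return beats;
}
```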

Dance Minigame Controls:
 Value (hex)  Xbox  PS2       PC Default
 0x01         A     X         Down Arrow
 0x02         X     Square    Left Arrow
 0x03         Y     Triangle  Up Arrow
 0x04         B     Circle    Right Arrow
 0x21         End of Beats Token

Lowrider Minigame Controls:
 Value (hex)  Xbox or PS2                     PC Default
 0x09         Rt Analog Stick: Right          Numpad 6
 0x0a         Rt Analog Stick: Left           Numpad 4
 0x0b         Rt Analog Stick: Up & Right     Numpad 8 & Numpad 6
 0x0c         Rt Analog Stick: Down & Left    Numpad 2 & Numpad 4
 0x0d         Rt Analog Stick: Up             Numpad 8
 0x0e         Rt Analog Stick: Down           Numpad 2
 0x0f         Rt Analog Stick: Up & Left      Numpad 8 & Numpad 4
 0x10         Rt Analog Stick: Down & Right   Numpad 2 & Numpad 6
 0x21         End of Beats Token
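A tool that dumps beat data might want to print these control values in readable form. The lookup below is only a sketch covering the PC default bindings from the two tables above; the function name control_name_pc is hypothetical, and any value not listed in the tables is reported as unknown.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps a control value to its PC default binding.
std::string control_name_pc(std::uint32_t control) {
  switch (control) {
    case 0x00: return "None (unused entry)";
    // Dance minigame
    case 0x01: return "Down Arrow";
    case 0x02: return "Left Arrow";
    case 0x03: return "Up Arrow";
    case 0x04: return "Right Arrow";
    // Lowrider minigame
    case 0x09: return "Numpad 6";
    case 0x0a: return "Numpad 4";
    case 0x0b: return "Numpad 8 & Numpad 6";
    case 0x0c: return "Numpad 2 & Numpad 4";
    case 0x0d: return "Numpad 8";
    case 0x0e: return "Numpad 2";
    case 0x0f: return "Numpad 8 & Numpad 4";
    case 0x10: return "Numpad 2 & Numpad 6";
    // Shared by both minigames
    case 0x21: return "End of Beats Token";
    default:   return "Unknown";
  }
}
```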

The second header section is somewhat strange. It is a series of 8 Length Entries, each a pair of DWORDs, but all but one of these pairs contain only unused padding values. Additionally, which of the 8 pairs holds the real data varies from track to track.

Length Entry:
 4 byte - DWORD - Length of Ogg Vorbis file or 0xCDCDCDCD padding
 4 byte - DWORD - Variable, probably unused or 0xCDCDCDCD padding

If a given entry is padding, both DWORDs are 0xCDCDCDCD. For the one length entry pair which isn't padding, the first DWORD is the length of the following Ogg Vorbis file, and the second DWORD is a variable value which was probably intended to be a sample rate and is now unused. In the unmodified audio streams, this value is generally 24000 for AMBIENCE tracks, 0 for CUTSCENE tracks, and 48000 for other tracks. There are some exceptions to these guidelines.

In the unmodified game, the first of the 8 length entries normally contains the useful length information. However, for the 6 tracks which contain beat information, the length information is instead stored in the second length entry pair.
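Since the position of the useful entry is not fixed, a reader is probably safest scanning all 8 pairs rather than hard-coding the first or second one. The following sketch does that, assuming the first pair that is not entirely padding is the one that matters; read_le32 and find_length_entry are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a little-endian 32-bit value from four bytes.
static std::uint32_t read_le32(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: `section` points at the 64-byte length section of a
// decoded track header. Returns true and fills in both DWORDs of the first
// pair that is not padding; returns false if all 8 pairs are padding.
bool find_length_entry(const std::uint8_t *section,
                       std::uint32_t &length, std::uint32_t &second) {
  const std::uint32_t kPadding = 0xCDCDCDCDu;
  for (std::size_t i = 0; i < 8; ++i) {
    const std::uint32_t a = read_le32(section + i * 8);
    const std::uint32_t b = read_le32(section + i * 8 + 4);
    if (a == kPadding && b == kPadding) continue;  // padding pair
    length = a;
    second = b;
    return true;
  }
  return false;
}
```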

As noted above, the final four bytes of the Track header are always "01 00 CD CD" (hex).
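The header layout described in this section can be written down as plain C++ structures. This is only a sketch: the struct and field names are invented here, the second field of each length entry is called `unknown` because its purpose is not established, and copying raw header bytes straight into these structs is only valid on a little-endian machine.

```cpp
#include <cstddef>
#include <cstdint>

struct BeatEntry {
  std::int32_t timing;    // milliseconds into the song, or -1 if unused
  std::uint32_t control;  // control value, or 0 if unused
};

struct LengthEntry {
  std::uint32_t length;   // length of the Ogg Vorbis data, or 0xCDCDCDCD
  std::uint32_t unknown;  // variable value, or 0xCDCDCDCD
};

struct TrackHeader {
  BeatEntry beats[1000];    // 8000 bytes
  LengthEntry lengths[8];   //   64 bytes
  std::uint8_t trailer[4];  //    4 bytes, always 01 00 CD CD
};

// Compile-time checks that the structs match the sizes given in the text.
static_assert(sizeof(BeatEntry) == 8, "beat entry must be 8 bytes");
static_assert(sizeof(LengthEntry) == 8, "length entry must be 8 bytes");
static_assert(sizeof(TrackHeader) == 8068, "track header must be 8068 bytes");
static_assert(offsetof(TrackHeader, lengths) == 8000, "lengths follow beats");
static_assert(offsetof(TrackHeader, trailer) == 8064, "trailer follows lengths");
```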

Ogg Vorbis Sound

Following the Track Header, the actual sound (in Ogg Vorbis format) is stored directly in the stream. The length of the sound is variable, but is known from the value stored in its header. While none of the Ogg Vorbis sounds in the unmodified game streams contain comment tags, the existence of such tags within the sound will not create any problems.
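Putting the pieces together, a decoded stream can be walked track by track: read the 8068-byte header, find the length, skip over that many bytes of Ogg Vorbis data, and repeat. The sketch below shows one way this might look. It assumes the stream has already been decoded, that the first non-padding length entry gives the audio length, and that walking should simply stop at the first truncated or unreadable track. TrackSpan, read_le32 and list_tracks are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TrackSpan {
  std::size_t audio_offset;  // where the Ogg Vorbis data starts in the stream
  std::size_t audio_length;  // its length in bytes
};

// Reads a little-endian 32-bit value from four bytes.
static std::uint32_t read_le32(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: lists the audio spans in an already decoded stream.
std::vector<TrackSpan> list_tracks(const std::vector<std::uint8_t> &stream) {
  const std::size_t kHeaderSize = 8068;
  const std::size_t kLengthSection = 8000;  // offset of the 8 length entries
  const std::uint32_t kPadding = 0xCDCDCDCDu;

  std::vector<TrackSpan> tracks;
  std::size_t offset = 0;
  while (stream.size() - offset >= kHeaderSize) {
    const std::uint8_t *entries = stream.data() + offset + kLengthSection;
    bool found = false;
    std::uint32_t length = 0;
    for (std::size_t i = 0; i < 8 && !found; ++i) {
      const std::uint32_t a = read_le32(entries + i * 8);
      const std::uint32_t b = read_le32(entries + i * 8 + 4);
      if (a != kPadding || b != kPadding) {
        length = a;
        found = true;
      }
    }
    if (!found) break;  // no usable length entry

    const std::size_t audio_offset = offset + kHeaderSize;
    if (stream.size() - audio_offset < length) break;  // truncated audio

    TrackSpan span;
    span.audio_offset = audio_offset;
    span.audio_length = static_cast<std::size_t>(length);
    tracks.push_back(span);
    offset = audio_offset + length;  // next header follows the audio
  }
  return tracks;
}
```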