Mission Scripting (Overview)

From GTAMods Wiki
Revision as of 13:02, 28 July 2013 by Wesser (talk | contribs) (Completed documenting Stories' data types hopefully.)
This article covers the scripting technology and background of the GTA 3 game series (including GTA LCS and GTA VCS). It does not cover GTA IV.
For more information about GTA IV scripts, read the article about the SCO format.

This article gives a general overview of mission scripting in the GTA 3D series. Mission scripting is the process of writing scripts: small pieces of code that control many aspects of gameplay. Although most game features are hardcoded, a lot can still be done via scripting. In fact, every single mission in the Grand Theft Auto series comes from the scripts. Knowing the script format and having a proper tool, it is therefore possible to change mission details and even create an entirely new story plot. That is considered the most complex area of GTA modding, though, so scripting most often results in small scripts that add new gameplay features.

Introduction

The original mission script looked like this[*] (taken from the Vice City debug.sc file):

IF IS_BUTTON_PRESSED PAD2 RIGHTSHOULDER1
AND flag_create_car = 1
AND button_press_flag = 0
	IF IS_CAR_DEAD magic_car
		DELETE_CAR magic_car
	ELSE
		IF NOT IS_PLAYER_IN_CAR player magic_car
			DELETE_CAR magic_car
		ELSE
			MARK_CAR_AS_NO_LONGER_NEEDED magic_car
		ENDIF
	ENDIF 
	flag_create_car = 0
	initial_car_selected = 0
	button_press_flag = 1
ENDIF

This is easy to read: it is fairly basic, so anyone with an idea of basic coding (or maybe even just English) can understand it. However, very little code came with the game in this form. The majority of the mission script comes in a file called main.scm (San Andreas also has alternate mains and external scripts, but they all follow the same basic format: hex codes). For example, take the code:

IF IS_CAR_DEAD magic_car
	DELETE_CAR magic_car

The equivalent in the main.scm would look something like this:

D6 00 04 00 19 01 02 45 0E 4D 00 01 FE 3D 87 02 A6 00 02 45 0E
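The byte string above can be split back into instructions with a few lines of Python. This is only a minimal sketch: it assumes that every opcode in this fragment takes exactly one parameter, and it handles only the three data types that occur in it (the data type byte is explained in the Data types section). The names `TYPES` and `decoded` are illustrative, not part of any tool.

```python
import struct

# Bytes quoted above for the IF IS_CAR_DEAD / DELETE_CAR fragment.
code = bytes.fromhex(
    "D6 00 04 00 19 01 02 45 0E 4D 00 01 FE 3D 87 02 A6 00 02 45 0E"
)

# Assumption: only these data types occur here.
# Data type byte -> (struct format of the value, size of the value in bytes).
TYPES = {0x01: ("<i", 4), 0x02: ("<H", 2), 0x04: ("<b", 1)}

pos = 0
decoded = []
while pos < len(code):
    # Opcode: 16-bit unsigned integer, little-endian.
    (opcode,) = struct.unpack_from("<H", code, pos)
    data_type = code[pos + 2]
    fmt, size = TYPES[data_type]
    (value,) = struct.unpack_from(fmt, code, pos + 3)
    decoded.append((opcode, data_type, value))
    # Assumption: exactly one parameter per opcode in this fragment.
    pos += 3 + size

for opcode, data_type, value in decoded:
    print(f"{opcode:04X}: type {data_type:02X}, value {value:#x}")
```

The walk finds four instructions (00D6, 0119, 004D, 00A6), and both variable parameters decode to the same global offset, which matches the single magic_car variable in the source fragment.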

This is what the beginning of the San Andreas mission script looks like:

Byte data                             | Decompiled data | Decompiled data with description
A4 03   09   4D 41 49 4E 00 00 00 00  | 03A4: 'MAIN'    | 03A4: name_thread 'MAIN'
6A 01   04   00   04   00             | 016A: 0 0       | 016A: fade 0 time 0
2C 04   05   93 00                    | 042C: 147       | 042C: set_total_missions_to 147
0D 03   05   BB 00                    | 030D: 187       | 030D: set_max_progress 187

Script instructions

An SCM file is bytecode: a sequence of instructions telling the game what to do. An instruction consists of an opcode and its parameters (if there are any). Sometimes the whole script instruction is called an opcode.

Opcode

This section deals with technical information on the opcode format. For the opcode documentation, see Opcodes.

Each script instruction is represented by a number called an operation code (opcode), which is stored as a 16-bit unsigned integer. The game engine uses this number to identify the action to perform. For example, opcode 0001 waits for an amount of time, 0003 shakes the camera, 0053 creates a player, etc.

This is how opcode 0001 looks in an SCM file:

0100 04 00
  • The first part (01 00) is the opcode number in little-endian format.
  • The second part (04) is the data type.
  • The third part (00) is the parameter value.
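The three parts can be read back with Python's standard struct module. This is a minimal sketch for this one instruction only; the variable names are illustrative.

```python
import struct

# The instruction from the example above: opcode 0001 with one parameter.
raw = bytes.fromhex("01 00 04 00")

# First part: the opcode, a 16-bit unsigned integer in little-endian order.
(opcode,) = struct.unpack_from("<H", raw, 0)

# Second part: the data type byte (04 is an immediate 8-bit signed int).
data_type = raw[2]

# Third part: the parameter value, one signed byte for data type 04.
(value,) = struct.unpack_from("<b", raw, 3)

print(f"{opcode:04X}: data type {data_type:02X}, value {value}")
```

Reading the two opcode bytes in little-endian order is what turns the stored 01 00 into the opcode number 0001.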

When a mission script is disassembled, opcodes are written in a human-readable format. The example above will look something like this:

wait 0

This is done for the end user's convenience only. The game does not know what the word wait means, but it knows what opcode 0001 is, so when a mission script is assembled the commands are written back in raw byte form.

As stated above, an opcode is a UINT16 number, so the minimum opcode is 0000 and the maximum is 0xFFFF. However, due to a peculiarity of the SCM language, any number above 0x7FFF denotes a negated conditional opcode. The original unmodded game supports far fewer opcodes (the maximum is 0A4E for San Andreas), but there are tools that add new ones, most notably the CLEO library.
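The check for values above 0x7FFF amounts to testing the highest bit of the 16-bit number. The sketch below illustrates this; the function name is hypothetical, and recovering the plain opcode by clearing that bit is an assumption based on the usual 8xxx notation for negated conditions rather than something stated in this article.

```python
def split_opcode(raw_opcode):
    """Return (base opcode, negated) for a raw 16-bit opcode value."""
    if not 0 <= raw_opcode <= 0xFFFF:
        raise ValueError("an opcode must fit in 16 bits")
    # Values above 0x7FFF have bit 15 set.
    negated = raw_opcode > 0x7FFF
    # Assumption: clearing bit 15 yields the plain (non-negated) opcode.
    base = raw_opcode & 0x7FFF
    return base, negated


base, negated = split_opcode(0x8119)
print(f"{base:04X}", negated)
```

Under that assumption, a conditional such as 0119 and its negated form 8119 share the same handler and differ only in how the result is interpreted.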

After the opcode number, the data types and parameter values follow[*].

Data types

The data type is a single byte written before each parameter[*]. Its purpose is to tell the game how many bytes to read next and what kind of data they hold.

Data type (hex) | Arg. length (bytes) | Game | Description

Typified
00 | 0 | GTA III, Vice City, San Andreas | End of argument list (EOAL, 004F or 0913 and similar)[*]
01 | 4 | GTA III, Vice City, San Andreas | Immediate 32-bit signed int
    unValue.m_iDWord = *(int *)&paucScriptBuffer[uiIp];
02 | 2 | GTA III, Vice City, San Andreas | Global integer/floating-point variable
    usGlobalOffset = *(unsigned short *)&paucScriptBuffer[uiIp];
03 | 2 | GTA III, Vice City, San Andreas | Local integer/floating-point variable
    usLocalId = *(unsigned short *)&paucScriptBuffer[uiIp];
04 | 1 | GTA III, Vice City, San Andreas | Immediate 8-bit signed int
    unValue.m_cByte = paucScriptBuffer[uiIp];
05 | 2 | GTA III, Vice City, San Andreas | Immediate 16-bit signed int
    unValue.m_sWord = *(short *)&paucScriptBuffer[uiIp];
06 | 4 | Vice City, San Andreas | Immediate 32-bit floating-point
    unValue.m_fFloat = *(float *)&paucScriptBuffer[uiIp];
06 | 2 | GTA III | Immediate 16-bit fixed-point
    unValue.m_fFloat = (float)(*(short *)&paucScriptBuffer[uiIp]) / 16.0f;
07 | 6 | San Andreas | Global integer/floating-point array[*]
    usArrayGlobalOffset = *(unsigned short *)&paucScriptBuffer[uiIp];
    usArrayIndex = *(unsigned short *)&paucScriptBuffer[uiIp + 2];
    usArraySize = paucScriptBuffer[uiIp + 4];
    usArrayFlag = paucScriptBuffer[uiIp + 5];
08 | 6 | San Andreas | Local integer/floating-point array[*]
    usArrayLocalId = *(unsigned short *)&paucScriptBuffer[uiIp];
    usArrayIndex = *(unsigned short *)&paucScriptBuffer[uiIp + 2];
    usArraySize = paucScriptBuffer[uiIp + 4];
    usArrayFlag = paucScriptBuffer[uiIp + 5];
09 | 8 | San Andreas | Immediate 8-byte string[*]
    strcpy(unValue.m_szShort, &paucScriptBuffer[uiIp]);
0A | 2 | San Andreas | Global 8-byte string variable
    usGlobalOffset = *(unsigned short *)&paucScriptBuffer[uiIp];
0B | 2 | San Andreas | Local 8-byte string variable
    usLocalId = *(unsigned short *)&paucScriptBuffer[uiIp];
0C | 6 | San Andreas | Global 8-byte string array[*]
    usArrayGlobalOffset = *(unsigned short *)&paucScriptBuffer[uiIp];
    usArrayIndex = *(unsigned short *)&paucScriptBuffer[uiIp + 2];
    usArraySize = paucScriptBuffer[uiIp + 4];
    usArrayFlag = paucScriptBuffer[uiIp + 5];
0D | 6 | San Andreas | Local 8-byte string array[*]
    usArrayLocalId = *(unsigned short *)&paucScriptBuffer[uiIp];
    usArrayIndex = *(unsigned short *)&paucScriptBuffer[uiIp + 2];
    usArraySize = paucScriptBuffer[uiIp + 4];
    usArrayFlag = paucScriptBuffer[uiIp + 5];
0E | 1+x | San Andreas | Immediate variable-length string[*]
    strncpy(unValue.m_szVarlen, &paucScriptBuffer[uiIp + 1], paucScriptBuffer[uiIp]);
0F | 16 | San Andreas | Immediate 16-byte string[*]
    strcpy(unValue.m_szLong, &paucScriptBuffer[uiIp]);
10 | 2 | San Andreas | Global 16-byte string variable
    usGlobalOffset = *(unsigned short *)&paucScriptBuffer[uiIp];
11 | 2 | San Andreas | Local 16-byte string variable
    usLocalId = *(unsigned short *)&paucScriptBuffer[uiIp];
12 | 6 | San Andreas | Global 16-byte string array[*]
    usArrayGlobalOffset = *(unsigned short *)&paucScriptBuffer[uiIp];
    usArrayIndex = *(unsigned short *)&paucScriptBuffer[uiIp + 2];
    usArraySize = paucScriptBuffer[uiIp + 4];
    usArrayFlag = paucScriptBuffer[uiIp + 5];
13 | 6 | San Andreas | Local 16-byte string array[*]
    usArrayLocalId = *(unsigned short *)&paucScriptBuffer[uiIp];
    usArrayIndex = *(unsigned short *)&paucScriptBuffer[uiIp + 2];
    usArraySize = paucScriptBuffer[uiIp + 4];
    usArrayFlag = paucScriptBuffer[uiIp + 5];

Untypified
N/A | 8 | GTA III, Vice City | Immediate 8-byte string[*]
    strcpy(unValue.m_szShort, &paucScriptBuffer[uiIp]);

As can be seen from the table, the two bytes 02 00 can have three different meanings as a parameter: preceded by data type 02 they denote a global variable ($2), by data type 03 a local variable (2@), and by data type 05 a 16-bit number (2). Only the data type allows the game to determine the correct meaning of a parameter.
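The three readings of 02 00 can be reproduced with a small lookup keyed by the data type byte. This is a minimal sketch covering only these three 2-byte types; the table and function names are hypothetical.

```python
import struct

# Data type byte -> (struct format of the 2-byte value, display template).
# The $n / n@ notation follows the paragraph above.
TWO_BYTE_TYPES = {
    0x02: ("<H", "${}"),  # global variable
    0x03: ("<H", "{}@"),  # local variable
    0x05: ("<h", "{}"),   # immediate 16-bit signed int
}


def show_param(param):
    """Render one typed 2-byte parameter (data type byte followed by value)."""
    fmt, template = TWO_BYTE_TYPES[param[0]]
    (value,) = struct.unpack_from(fmt, param, 1)
    return template.format(value)


# The same value bytes 02 00 behind three different data type bytes.
results = [show_param(bytes([t, 0x02, 0x00])) for t in (0x02, 0x03, 0x05)]
print(results)
```

The value bytes never change; only the leading data type byte selects how they are interpreted.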

Data types for Liberty City Stories and Vice City Stories are very different. First of all, many data types themselves denote an immediate value. For example, data type 01 is the value 0, data type 02 the value 0.0, etc. Floating-point values can be packed (1, 2 or 3 bytes long instead of the usual 4). Some data types also act as the identifier of a variable.

Data type (hex) | Arg. length (bytes) | Game | Description

Typified
00 | 0 | Liberty City Stories, Vice City Stories | End of argument list (EOAL)
01 | 0 | Liberty City Stories, Vice City Stories | Immediate integer constant 0
    unValue.m_cByte = 0;
02 | 0 | Liberty City Stories, Vice City Stories | Immediate 8-bit floating-point constant 0.0
    unValue.m_fFloat = 0.0;
03 | 1 | Liberty City Stories, Vice City Stories | Immediate 8-bit packed floating-point
    unValue.m_iDWord = paucScriptBuffer[uiIp] << 24;
04 | 2 | Liberty City Stories, Vice City Stories | Immediate 16-bit packed floating-point
    unValue.m_iDWord = *(unsigned short *)&paucScriptBuffer[uiIp] << 16;
05 | 3 | Liberty City Stories, Vice City Stories | Immediate 24-bit packed floating-point
    unValue.m_iDWord = (*(unsigned short *)&paucScriptBuffer[uiIp] << 16) | (paucScriptBuffer[uiIp + 2] << 8);
06 | 4 | Liberty City Stories, Vice City Stories | Immediate 32-bit signed integer
    unValue.m_iDWord = *(int *)&paucScriptBuffer[uiIp];
07 | 1 | Liberty City Stories, Vice City Stories | Immediate 8-bit signed integer
    unValue.m_cByte = paucScriptBuffer[uiIp];
08 | 2 | Liberty City Stories, Vice City Stories | Immediate 16-bit signed integer
    unValue.m_sWord = *(short *)&paucScriptBuffer[uiIp];
09 | 4 | Liberty City Stories, Vice City Stories | Immediate 32-bit floating-point
    unValue.m_fFloat = *(float *)&paucScriptBuffer[uiIp];
0A | 8 | Vice City Stories | Immediate null-terminated string
    strcpy(unValue.m_szString, &paucScriptBuffer[uiIp]);

Untypified
T<0C | 1 | Liberty City Stories | Local timer
    ucTimerLocalId = paucScriptBuffer[uiIp] + 0x5E;
T<0D | 1 | Vice City Stories | Local timer
    ucTimerLocalId = paucScriptBuffer[uiIp] + 0x5D;
T<6C | 1 | Liberty City Stories | Local integer/floating-point variable
    ucLocalId = paucScriptBuffer[uiIp] - 0x0C;
T<6D | 1 | Vice City Stories | Local integer/floating-point variable
    ucLocalId = paucScriptBuffer[uiIp] - 0x0D;
T<CC | 3 | Liberty City Stories | Local integer/floating-point array
    ucArrayLocalId = paucScriptBuffer[uiIp] - 0x6C;
    ucIndexLocalId = paucScriptBuffer[uiIp + 1];
    ucArraySize = paucScriptBuffer[uiIp + 2];
T<CD | 3 | Vice City Stories | Local integer/floating-point array
    ucArrayLocalId = paucScriptBuffer[uiIp] - 0x6D;
    ucIndexLocalId = paucScriptBuffer[uiIp + 1];
    ucArraySize = paucScriptBuffer[uiIp + 2];
T<E6 | 2 | Liberty City Stories, Vice City Stories | Global integer/floating-point variable
    usGlobalId = (*(short *)&paucScriptBuffer[uiIp] - (isGameLcs() ? 0x00CC : 0x00CD)) << 8;
T>=E6 | 4 | Liberty City Stories, Vice City Stories | Global integer/floating-point array
    usArrayGlobalId = (*(short *)&paucScriptBuffer[uiIp] - 0x00E6) << 8;
    ucIndexLocalId = paucScriptBuffer[uiIp + 2];
    ucArraySize = paucScriptBuffer[uiIp + 3];
N/A | 8 | Liberty City Stories | Immediate 8-byte string
    strcpy(unValue.m_szString, &paucScriptBuffer[uiIp]);

None of the data types above have been tested in a decompiling process yet.
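The packed floating-point types in the Stories table store only the upper bytes of a 32-bit float and leave the missing low bits at zero. The sketch below mirrors the three C expressions from the table as they are written there; since the table itself is marked as untested, treat the byte order of the 3-byte case in particular as an assumption. The function name is hypothetical.

```python
import struct


def unpack_packed_float(data):
    """Expand a 1-, 2- or 3-byte packed float (data types 03, 04, 05)."""
    if len(data) == 1:
        # Data type 03: the single byte becomes bits 24..31.
        dword = data[0] << 24
    elif len(data) == 2:
        # Data type 04: a little-endian 16-bit value becomes bits 16..31.
        dword = int.from_bytes(data, "little") << 16
    elif len(data) == 3:
        # Data type 05: as above, plus a third byte placed in bits 8..15.
        dword = (int.from_bytes(data[:2], "little") << 16) | (data[2] << 8)
    else:
        raise ValueError("packed floats are 1 to 3 bytes long")
    # Reinterpret the 32-bit pattern as an IEEE 754 single-precision float.
    return struct.unpack("<f", struct.pack("<I", dword))[0]


print(unpack_packed_float(bytes([0x40])))        # bit pattern 0x40000000
print(unpack_packed_float(bytes([0x80, 0x3F])))  # bit pattern 0x3F800000
```

Values whose low mantissa bits are zero, such as small whole numbers, lose nothing in the shorter encodings, which is presumably why the format saves space.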

This section is incomplete. You can help by fixing and expanding it.

Parameters

The game engine knows the number of parameters for each opcode (1 for 0001, 2 for 0004, 13 for 014B, etc.). If the script supplies a different number of parameters, the game crashes.

The parameters can be one of the following kinds:

This section is incomplete. You can help by fixing and expanding it.

Strings

Strings are sequences of symbols, including letters, digits and other characters such as the underscore or the at sign. GTA places no limits on which symbols may be used in strings, and a string may begin with any symbol, even a space.

There are two kinds of strings.

^ Short string. This is the most common type of string, used in every game since GTA 3. The term short means that the length of the string is strictly limited: it can contain at most 7 symbols, and the last (8th) byte is a null terminator. When compiled, such a string occupies 8 bytes of the SCM file even if it is actually shorter (the remaining bytes are filled with zeros).

^ The SA scripting engine also has data type 0F (15 in decimal), which denotes a short string containing up to 15 symbols. This kind of string is only supported by Sanny Builder. Such strings are handled in the same manner as 8-byte strings, but always occupy 16 bytes of the SCM file.

String             | Equivalent in SCM
'MAIN'             | 09   4D 41 49 4E 00 00 00 00
'MODDING'          | 09   4D 4F 44 44 49 4E 47 00
'SAVE_OUR_SOULS!'  | 0F   53 41 56 45 5F 4F 55 52 5F 53 4F 55 4C 53 21 00
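The zero padding described above is easy to reproduce. The sketch below encodes a string the way San Andreas stores it, with the data type byte 09 (8-byte string) or 0F (16-byte string) in front; in GTA 3 and Vice City the type byte would be absent. The function name and the `sixteen` flag are hypothetical.

```python
def encode_short_string(text, sixteen=False):
    """Encode an SA immediate string: type byte plus zero-padded text."""
    size = 16 if sixteen else 8
    raw = text.encode("ascii")
    # One byte is reserved for the null terminator.
    if len(raw) > size - 1:
        raise ValueError(f"at most {size - 1} symbols fit in {size} bytes")
    type_byte = 0x0F if sixteen else 0x09
    # ljust fills the rest of the fixed-size field with zero bytes.
    return bytes([type_byte]) + raw.ljust(size, b"\x00")


print(encode_short_string("MAIN").hex(" "))
print(encode_short_string("SAVE_OUR_SOULS!", sixteen=True).hex(" "))
```

A 7-symbol string such as 'MODDING' fills the 8-byte field exactly, leaving only the terminator.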

^ Long string. This type was first introduced in San Andreas. Maximum length depends on the opcode[*].

This section is incomplete. You can help by fixing and expanding it.

Arrays

^ Native array support was introduced in GTA SA, although Vice City already had different implementations of arrays. In SA SCM, arrays are assembled as 2 UINT16s and 2 bytes:

2b - UINT16 - array offset[*]
2b - UINT16 - array index[*]
1b - BYTE   - array size
1b - BYTE   - flags[*]

^ An array offset is basically a variable number. For a global array, the offset is the index of the global variable at which the array begins. For example, if the global array offset is 150 (96 00), the first element of the array is $150, the second one is $151, etc. The same holds for local arrays (the offset is a local variable index).

^ An array index is the number of a variable (global or local) that holds the index value. For example, if the array index is 3 (03 00), the game reads either global variable $3 or local variable 3@, depending on the array flags (see below). This variable holds the ID of the array element to work with. For example, if the array index is $3 and $3 holds the number 5, the game reads the 5th element of the array.

^ Flags

Flag | Index is a local variable  | Flag | Index is a global variable
0x0  | integer array              | 0x80 | integer array
0x1  | float array                | 0x81 | float array
0x2  | short string array         | 0x82 | short string array
0x3  | long string array          | 0x83 | long string array

Array        | Equivalent in SCM
$150(3@,6f)  | 07   96 00 03 00 06 01
10@(9@,5s)   | 0D   0A 00 09 00 05 02
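Both rows of the array table can be rebuilt from the layout given above (two UINT16s followed by two bytes) with the data type byte in front. This is a minimal sketch; the function name is hypothetical, and the element letters i and v are assumptions that extend the f and s suffixes seen in the examples.

```python
import struct

# Element kind -> low bits of the flags byte, per the flags table above.
# Assumption: "i" and "v" are illustrative letters; "f" and "s" come from
# the examples in the table.
ELEMENT_FLAGS = {"i": 0x0, "f": 0x1, "s": 0x2, "v": 0x3}
GLOBAL_INDEX = 0x80  # set when the index variable is a global one


def encode_array(data_type, offset, index, size, element, global_index=False):
    """Encode an SA array parameter: type, offset, index, size, flags."""
    flags = ELEMENT_FLAGS[element] | (GLOBAL_INDEX if global_index else 0)
    # "<" means little-endian without padding: B=1 byte, H=2 bytes.
    return struct.pack("<BHHBB", data_type, offset, index, size, flags)


# $150(3@,6f): global float array, local variable 3@ as index, 6 elements.
print(encode_array(0x07, 150, 3, 6, "f").hex(" "))
# 10@(9@,5s): local short-string array, local variable 9@ as index.
print(encode_array(0x0D, 10, 9, 5, "s").hex(" "))
```

Note that the size and flags bytes sit at offsets 4 and 5 of the 6-byte argument, after the two 16-bit fields.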

Notes

^ In GTA 3, Vice City and Liberty City Stories, short strings (8 bytes) have no data type preceding them. If the byte does not fit the data type range (00-06 for GTA 3 and VC), it is recognized as the beginning of a string and the next 8 bytes are read.

^ Some opcodes have a variable number of parameters. The best-known one is 004F, which creates a new thread and passes arguments to it. Because the number of such parameters can vary, a special data type denotes the end of the parameter list.

The maximum number of parameters for any opcode is 16 for GTA 3 and VC, 32 for SA, and 96 for LCS and VCS.

^ San Andreas opcode 05B6 is a special opcode that defines a table. Immediately after the opcode number, a stream of data (128 bytes) follows, without a data type.

^ GTAForums: Post by Seemann describing limits for the long strings in SA
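The end-of-argument-list marker mentioned in the note on 004F can be handled with a simple loop that reads typed parameters until it meets data type 00. The sketch below is illustrative only: it covers the six fixed-size data types shared by Vice City and San Andreas, uses the SA limit of 32 as a default, and all names in it are hypothetical.

```python
# Data type byte -> size of the value in bytes (VC/SA sizes from the table).
SIZES = {0x01: 4, 0x02: 2, 0x03: 2, 0x04: 1, 0x05: 2, 0x06: 4}


def read_variadic_args(buf, pos, limit=32):
    """Collect (data type, raw value bytes) pairs until the 00 terminator.

    Returns the list of arguments and the position just past the terminator.
    """
    args = []
    while buf[pos] != 0x00:
        if len(args) == limit:
            raise ValueError("too many parameters")
        data_type = buf[pos]
        size = SIZES[data_type]
        args.append((data_type, buf[pos + 1 : pos + 1 + size]))
        pos += 1 + size
    return args, pos + 1  # skip the terminator byte itself


# Two arguments (int8 5, local variable 2) followed by the terminator.
args, next_pos = read_variadic_args(bytes.fromhex("04 05 03 02 00 00"), 0)
print(args, next_pos)
```

The third and fourth bytes show why the type byte matters here too: the 00 inside the value of the local variable parameter is not mistaken for the terminator, because it is consumed as part of a 2-byte value.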

Cracking the SCM

As has been said, very little of the code was supplied with the game in a decompiled state (only two small files, both test scripts). So how do we create our own scripts based on the original? With a decompiler. Rockstar never provided one, though, so how do these tools work?

The original SCM format was cracked shortly after the release of GTA 3 (the first game to use this mission coding method). People first had to figure out what all the sections did and where they started and ended: there are 5 segments in an SCM (memory, objects, mission defines, MAIN and missions; GTA SA has more, but only one of these, global variables, has had its use determined). They also had to figure out how many parameters each opcode had, and a lot more. Once this was done, they knew where each opcode began and ended, so they could split them up to make the script more readable. The information about what each one does was lost in compilation, however, so they still only had something that looked like this:

:label035F78
0001: 0?
00D6: 0?
0256: 4??


That still doesn't mean a lot though, so people had to try to figure out what the different opcodes meant.

(Note: this code is in early Mission Builder format:

:labelxxxxxx means this code was originally at this offset in the mission script (the 'label' is added in by the decompiler)
x? means a one byte number
x?? means a variable stored at this offset from the start

label (i.e. for if we wanted to 'jump' to a label))

Some were easy: the very first line of a decompiled script (besides decompiler headers) looks something like:


The only parameter this command has is a reference to a label, so it is most likely (and in fact is) a jump statement, and we know that all 0002s are jumps. Of course, finding out what opcodes do takes time (even the number of parameters took a while to confirm): you have to have an idea first and then test your theory. Many opcodes have still not been named, but with the number of opcodes discovered so far, we have a general idea of what the mission script does.

Once the mission script had been cracked, people could write programs to read through it and output it in a form we could understand. This form is nothing like the original: it consists of the opcode, text saying what it does and a list of parameter values. The opcode number is needed to determine which command it is; the describing text is completely ignored. Originally there were two main decompilers, BWME (Barton Waterduck's Mission Editor) and CyQ's disassembler, each with its own compiler (to compile the decompiled code back into an SCM file). BWME quickly became the most commonly used, especially among newer coders, probably because the parameters were intermixed with the code, so you had something like:

00B1: is_car $car in_cube $lowerx $lowery $lowerz $upperx $uppery $upperz 0

As opposed to the gtaMa/DisAsm format:

is_car_in_cube $car, $lowerx, $lowery, $lowerz, $upperx, $uppery, $upperz, 0

(Also note the lack of an opcode in the second example: this builder uses a lookup to find the opcode (if the function is known) instead of just quoting it.)

Although that example does not show much of a difference, it can make a lot of difference in practice. After Barton left the modding community, Seemann created an even more versatile decompiler, Sanny Builder. It has become the most popular mission builder.

The Tools

Main article: Mission Scripting Tools

There are three main builders for GTA 3, VC, and SA, and one for LCS and VCS.

Mission Builder

Main article: Mission Builder (BWME; although BWME was a slightly different tool, this section refers to Mission Builder by that name)

This tool uses only opcodes to compile the code; all other text on the line is ignored. Traditionally, it decompiles to a file called main.scm.txt, which is just a big text file with all the code in it, expanded to be readable. This tool used to be the most popular mission builder until Sanny Builder was released. The project is abandoned and its creator has retired.

Code format

Early builders used data type identifiers on all numbers. These were:

 ? - small int (data type 04)
& - medium int (data type 05)
&& - big int (data type 01)<- global jump (data type 01)<- mission jump (data type 01 - negative addressing)
! - float (data type 06)
$ - global variable (data type 02)
@ - local variable (data type 03)
?? - DMA global variable (like a global variable, only its memory position is its name, not assigned to it - data type 02)
: - label (text directly after used to reference this label in jumps)
"" - string (no data type, first 8 bytes after opcode when compiled)
# - model identifier (means you can enter the id name of a model rather than the number - data type 05 for the compiled number)

Later versions of the builder got rid of number type identifiers, assigning types based on the size of the number. Floats were recognized by being written with a decimal point (e.g. 10.5, or 10.0 if you want a whole number treated as a float). They also replaced DMA variables with global variables whose name was the memory address, in decimal, divided by 4 (each variable uses 4 bytes of memory).
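Assigning an integer type based on the size of the number can be sketched as picking the smallest of the three immediate integer data types (04, 05, 01) that can hold the value. This is an illustration of the idea, not the builder's actual code; the function name is hypothetical.

```python
import struct


def encode_int(value):
    """Encode an integer with the smallest fitting immediate data type."""
    if -128 <= value <= 127:
        return struct.pack("<Bb", 0x04, value)   # 8-bit signed int
    if -32768 <= value <= 32767:
        return struct.pack("<Bh", 0x05, value)   # 16-bit signed int
    return struct.pack("<Bi", 0x01, value)       # 32-bit signed int


for number in (0, 147, 100000):
    print(number, "->", encode_int(number).hex(" "))
```

The result for 147 (05 93 00) matches the parameter bytes of the 042C instruction in the San Andreas example near the top of this article.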

Advantages
Commands related to the parameters.
Macros and program execution facilities inbuilt.
Disadvantages
Not widely used anymore
Creator retired (no future updates / bug fixes).
Decompilation bugs (especially in certain advanced jumps).
Many unofficially usable SCM features uncatered for (although these are mostly advanced problems never experienced by the average coder).
Inconvenient syntax.
Other notes
GUI.
Compiler inbuilt.

Point

Main article: Point

It is a scripting tool developed by Jon Caruana. It never left the development stage and was abandoned in 2006. It was the first user-made high-level scripting tool (in fact, the first high-level one at all, as even Rockstar's compiler is only an advanced parser) for coding III-era GTAs. Originally developed for VC/III, it was expanded to work with all three 3D games (III, VC, SA). One major disadvantage is that the file headers it writes, while readable by the game, are not recognized by any line-by-line decompiler. So once a file has been compiled in Point, it cannot be decompiled again by another tool to see the exact generated code.

Sanny Builder

Main article: Sanny Builder

Sanny Builder is made by Seemann. It is the fastest and most powerful of these tools and is widely used by the modding community. It includes a compiler and a decompiler. It also supports the CLEO library, which extends the coding possibilities in all III-era games by adding many new and useful opcodes and by allowing scripts to run without modifying the main.scm file. It comes with detailed help, including descriptions of and solutions for all run-time error messages. It was initially developed for GTA:SA and based on MB code, but it has since been expanded with many features, some taken from Gtama and most of them new (such as a basic class system and direct HEX input (as requested by Y_Less)).

The code format is based on a combination of both Gtama and MB formats, although you can't force data types as you could in early Mission Builders (e.g. 0004: $var = 0&& which would normally be assigned one byte, not four). However, it supports high-level statements.

Data Types
$ - global variable
s$ - global string variable
v$ - global long string variable
@ - label (text directly AFTER used to reference this label in jumps)
@ - local variable (number BEFORE denotes variable)
@s - local string variable (number BEFORE denotes variable)
@v - local long string variable (number BEFORE denotes variable)
'...' - string (first 8 bytes after opcode when compiled)
"..." - debug string text
# - model identifier (means you can enter the id name of a model rather than the number)
0x - Hexadecimal number
& - ADMA (Advanced Direct Memory Access)
Advantages
It is the fastest
Supports CLEO library
It has many language translations
It is very well documented and it includes useful information and detailed help including description and solutions for all run-time error messages
It has useful tools such as Coords Manager, Opcode search, MB > SB syntax converter and many others
Supports all III-era games (can decompile GTA III, GTA VC, GTA SA, GTA LCS and GTA VCS; can compile for GTA III, GTA VC and GTA SA)
It uses color code highlighter
Supports high-level constructions
It has a coding method with classes
Online updateable ini.

Disassembler/Assembler

Main article: gtama (GTA Mission Assembler - gtaMa, Vice City Disassembler - DisAsm):

These tools use one word commands, although they may consist of multiple words concatenated by an underscore ("_"), e.g. is_player_defined. They still compile each line as-is (i.e. no interpretation or code generation) so the game will execute exactly the commands you enter. This is similar to programming in ASM mnemonics, whereas BWME is more similar to machine code. The decompiled file is split up into a number of .gsr files, each one containing the code to one mission. This reduces file sizes considerably as BWME generated files are huge (around 6 MB .txt files), containing the whole code. The code is in the format of a command, followed by a list of parameters, separated by spaces - this can make named variables easy to distinguish from commands.

The disassembler (DisAsm) is written by CyQ and the assembler (gtaMa) is written by Dan Strandberg. These two tools work together to de- and re-compile the code.

Code format
$ - global variable (data type 02)
! - local variable (data type 03)
@ - label (text directly after used to reference this label in jumps)
"" - string (no data type, first 8 bytes after opcode when compiled)
% - model identifier (means you can enter the id name of a model rather than the number - data type 05 for the compiled number)
Advantages
Small file sizes.
Clearer code - data and commands separated.
Active creator (although no longer developing).
More support for advanced features (supports memory hacking methods not widely used).
Open source.
Online updateable ini.
Format used on custom error handler for VC.
Disadvantages
Not widely used.
Code spread across multiple files - harder for searching.
Data not easily related to code.
Other notes
Command line based.

See also

External links

^ Sources of the GTA 3 missions at GTAModding.ru