PS2 Native Geometry

From GTAMods Wiki

This page describes the format of pre-instanced DFFs in RenderWare for PS2. For an explanation of how geometry is rendered on the PS2 see PS2All.

Terminology

The PS2 has the following integer data types:

  • byte
  • halfword (2 bytes)
  • word (4 bytes)
  • dword (8 bytes)
  • qword (16 bytes)

Format

The Native Data chunk contains a single Struct which contains all the instanced data. For every mesh there is a block of data, after which the next mesh's data block follows. The chunk begins with a header:

int32  platform        - always 4 for PS2

After that the following for each mesh:

uint32 size            - size of data
bool32 noPointers      - data contains no pointers and thus needs no relocation
uint8  data[size]      - instance data

The data can include trailing junk due to rounding and alignment. The general format of the data is a chain of DMA packets that are sent to the VIF when the geometry is rendered. Some kinds of DMA tags hold pointers to their data; if any such tag is present, the noPointers value is false.
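The layout above can be read with a few lines of Python. This is only a sketch: the function name and the num_meshes parameter are hypothetical, and it assumes the mesh count is known from elsewhere (presumably the mesh list of the enclosing geometry), since the header described here does not store it. Fields are read as little-endian.

```python
import struct

PLATFORM_PS2 = 4  # platform id stated in the header description


def parse_native_struct(buf, num_meshes):
    """Split the Struct payload into per-mesh instance data blocks.

    num_meshes is an assumption of this sketch: the count is not part
    of the layout described here and must be supplied by the caller.
    """
    (platform,) = struct.unpack_from("<i", buf, 0)
    if platform != PLATFORM_PS2:
        raise ValueError(f"not PS2 native data (platform={platform})")
    offset = 4
    meshes = []
    for _ in range(num_meshes):
        # uint32 size, bool32 noPointers
        size, no_pointers = struct.unpack_from("<II", buf, offset)
        offset += 8
        data = buf[offset:offset + size]
        if len(data) != size:
            raise ValueError("truncated instance data")
        offset += size
        meshes.append({"no_pointers": bool(no_pointers), "data": data})
    return meshes
```

Each returned entry holds the raw DMA chain of one mesh; when no_pointers is false, the addresses inside the chain still need relocation before use.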

VU1, VIF and DMA

This is only a brief explanation of the most important things. For a complete description of the DMA controller, the VIF and VU1, consult Sony's documentation (the EE User's Manual).

A mesh is rendered by sending data to the PS2's Vector Unit 1 (VU1), which performs operations such as transformation, projection, lighting and clipping, and then kicks the transformed geometry to the Graphics Synthesizer (GS) for rasterization. The VU1 has 16 KB of memory each for code and data, so geometry has to be split into manageable batches. Both the data and the program that transforms and renders it are sent to the VU1 by DMA transfers to the VU interface (VIF). The VIF understands a number of commands (VIFcodes) that control the vector unit and its memory.

DMA transfers are done using DMA packets; each consists of a DMA tag, which describes the type and size of the packet, and the actual data. A DMA tag is one qword in size, of which only the lower dword is used by the DMA controller; the upper dword can be filled with other data, and RW makes use of this. The size of the data transferred is stored in the DMA tag as a number of qwords. RW uses three types of DMA packets:

  • cnt: the data immediately follows the tag. The next packet is after the data.
  • ref: the data is pointed to by a pointer stored in the DMA tag. The next packet is after the tag.
  • ret: same as DMAcnt, but the next packet is after where the current chain was called from. In general, the last packet of the instance data must be of this type.

The (slightly simplified) format of a DMA tag is like this:

bits 0-15: number of qwords to transfer
bits 28-30: type (cnt = 1, ref = 3, ret = 6)
bits 32-63: address (if applicable, i.e. only ref in our case)

The address is a byte address but must be qword aligned.
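As an illustration of the simplified tag layout, the following sketch decodes one 16-byte tag. The function name and the returned dictionary are made up for this example; only the three type ids listed above are named, and the upper dword is returned as two raw 32-bit words because RW stores VIFcodes there.

```python
import struct

DMA_TYPES = {1: "cnt", 3: "ref", 6: "ret"}  # only the ids listed above


def decode_dma_tag(qword):
    """Decode one 16-byte DMA tag into its (simplified) fields."""
    if len(qword) != 16:
        raise ValueError("a DMA tag is exactly one qword (16 bytes)")
    # lower dword: tag word + address; upper dword: two free 32-bit words
    lo, addr, word2, word3 = struct.unpack("<IIII", qword)
    tag_id = (lo >> 28) & 0x7
    return {
        "qwc": lo & 0xFFFF,                      # bits 0-15
        "type": DMA_TYPES.get(tag_id, f"other({tag_id})"),
        "addr": addr,                            # bits 32-63, only used by ref
        "vifcodes": (word2, word3),              # upper dword, ignored by the DMAC
    }
```

For a ref tag, addr should be a multiple of 16; a decoder may want to check this before following the pointer.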

All DMA packets in the instanced data are sent to the VIF; their contents are VIFcodes. The ones important to us are:

  • NOP 0x00 - no operation
  • STCYCL 0x01 - set CYCLE register
  • ITOP 0x04 - set ITOPS register
  • STMOD 0x05 - set MODE register
  • MSKPATH3 0x06 - masks PATH3 transfer
  • MARK 0x07 - set MARK register
  • FLUSH 0x11 - wait for end of microprogram and GIF transfer
  • MSCALF 0x15 - call microprogram
  • MSCNT 0x17 - continue microprogram
  • UNPACK 0x60 - unpack the following data and write to VU memory
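The opcode values above occupy the CMD field of a 32-bit VIFcode. The sketch below assumes the field layout given in Sony's documentation as far as I recall it (immediate in bits 0-15, NUM in bits 16-23, CMD in bits 24-30, interrupt flag in bit 31) and treats every opcode in the range 0x60-0x7F as UNPACK, since the low CMD bits of UNPACK select the data format. The function and table names are invented for illustration.

```python
VIF_NAMES = {
    0x00: "NOP", 0x01: "STCYCL", 0x04: "ITOP", 0x05: "STMOD",
    0x06: "MSKPATH3", 0x07: "MARK", 0x11: "FLUSH",
    0x15: "MSCALF", 0x17: "MSCNT",
}


def decode_vifcode(word):
    """Return (name, num, immediate) for one 32-bit VIFcode."""
    cmd = (word >> 24) & 0x7F   # bit 31 (interrupt) is masked off
    num = (word >> 16) & 0xFF
    imm = word & 0xFFFF
    if (cmd & 0x60) == 0x60:
        # 0x60-0x7F: the low bits select the unpack format
        name = "UNPACK"
    else:
        name = VIF_NAMES.get(cmd, f"unknown(0x{cmd:02X})")
    return name, num, imm
```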

Another important VIFcode is MPG, which uploads program code into instruction memory, but it is not needed for geometry transfers. The general chain of commands looks like this:

  • STCYCL and UNPACK data into memory
  • ITOP to set vertex count of this batch
  • MSCALF or MSCNT to start/continue the program
  • FLUSH to wait for completion

Geometry is processed in a double buffer mode with two input buffers and (depending on the VU program) at least two output buffers. First a batch of geometry is uploaded into one input buffer. Then the program processes the data in this input buffer and writes to one output buffer. Meanwhile the other input buffer is filled. After the VU program kicks off the geometry in the first buffer, it processes the next input and writes to the other output buffer. Meanwhile the first input buffer is being filled again. In this way the VU program always fills one output buffer with the processed data of one input buffer while the other output buffer is being sent to the GS and the other input buffer is being filled by an UNPACK VIFcode.
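The overlap of the three stages can be pictured as an idealized lockstep schedule. The sketch below is not how the hardware is programmed; it only tabulates, per time step, which batch is being uploaded, processed and kicked, and which of the two buffers (index 0 or 1) each stage uses. The function name is hypothetical, and real timing depends on the VU program and on DMA/GS throughput.

```python
def double_buffer_schedule(num_batches):
    """List, per time step, the (batch, buffer index) handled by each stage."""

    def stage(batch):
        # None means the stage is idle at this step
        return (batch, batch % 2) if 0 <= batch < num_batches else None

    return [
        {
            "upload": stage(t),       # UNPACK fills an input buffer
            "process": stage(t - 1),  # VU program: input buffer -> output buffer
            "kick": stage(t - 2),     # output buffer is sent to the GS
        }
        for t in range(num_batches + 2)
    ]
```

With three batches, the third step shows all stages busy at once: batch 2 is uploaded into input buffer 0 while batch 1 is processed from buffer 1 and batch 0 is kicked from output buffer 0.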

Data format

Each vertex has a number of attributes: positions, texture coordinates, colors and normals are the most common. The pipeline is configured by specifying a so-called cluster for each attribute, which describes the way and the format in which the attribute is expected to be used.

Clusters can be opaque, which means that the instancing is handled internally by RenderWare and the programmer is not expected to read or write data for this cluster. Custom user clusters are never opaque; they are written by the programmer. For opaque clusters the data to be UNPACKed is split into manageable batches and embedded into the VIFcode stream inside DMAcnt/ret packets. User clusters are stored in continuous blocks called stripes that DMAref packets point into. If no user clusters are specified for the pipeline, all VIFcodes can be embedded into big DMAcnt/ret packets (in many cases just one DMAret). If user clusters are specified, the chain has to be broken up into DMAref and DMAcnt packets (for each batch, one DMAref per user cluster and one DMAcnt for all opaque clusters).

The clusters that can be instanced are the following:

  • xyz - 3 floats
  • xyzw - 3 floats + adc flag
  • uv - 2 floats
  • uv2 - 4 floats
  • rgba - 4 bytes
  • normal - 3 bytes
  • user1
  • user2
  • user3
  • user4

The general format for meshes having no stripes:

 DMAcnt/ret [FLUSH; FLUSH] {
     foreach batch {
             foreach cluster {
                     NOP; STMOD; STCYCL; UNPACK
                     unpack-data (padded to qword boundary)
             }
             ITOP; MSCALF/MSCNT; NOP/FLUSH; NOP/FLUSH
     }
 }

and for meshes having stripes:

 foreach batch {
     foreach broken out cluster {
             DMAref [STCYCL; UNPACK] -> pointer into stripe-data
             DMAcnt [NOP; NOP] empty
     }
     DMAcnt/ret [NOP; NOP] {
             foreach cluster {
                     NOP; STMOD; STCYCL; UNPACK
                     unpack-data (padded to qword boundary)
             }
             ITOP; MSCALF/MSCNT; NOP/FLUSH; NOP/FLUSH
     }
 }
 stripe-data

Since a DMA tag only occupies one dword, the other dword can hold two VIFcodes - given in brackets after the DMA tag above.

The first batch starts VU code execution with MSCALF; all subsequent batches restart the program with MSCNT.

Only the last batch has [FLUSH; FLUSH] after MSCALF/MSCNT.

In older versions of RW the MSKPATH3 VIFcode was used instead of the second FLUSH. The MARK VIFcode was used to mark batches before the STMOD VIFcode.

The argument to ITOP is the vertex count of this batch.

The WL field of the STCYCL VIFcode is set to 1, the CL is set to the vertex stride (= number of clusters).
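To make the STCYCL and ITOP arguments concrete, the following sketch builds both VIFcodes. It assumes the VIFcode layout from Sony's documentation as I understand it: the opcode in bits 24-30, NUM in bits 16-23, and a 16-bit immediate in which STCYCL keeps CL in bits 0-7 and WL in bits 8-15, while ITOP uses the low 10 bits. All function names are hypothetical.

```python
def vifcode(cmd, num=0, imm=0):
    """Pack a VIFcode from opcode, NUM field and 16-bit immediate."""
    return ((cmd & 0x7F) << 24) | ((num & 0xFF) << 16) | (imm & 0xFFFF)


def stcycl(num_clusters, wl=1):
    # CL = vertex stride (number of clusters), WL = 1 as described above
    return vifcode(0x01, imm=((wl & 0xFF) << 8) | (num_clusters & 0xFF))


def itop(vertex_count):
    # the immediate carries the vertex count of this batch (assumed 10 bits)
    return vifcode(0x04, imm=vertex_count & 0x3FF)
```

For a pipeline with four clusters and a batch of 32 vertices this gives stcycl(4) = 0x01000104 and itop(32) = 0x04000020.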

Batch contents

It is not possible to draw primitives by indexing into a vertex buffer on the PS2. Instead all indexing has to occur beforehand and vertices have to be sent in full to the VU1 memory.

PS2 pipelines can draw 5 different geometric primitives:

  • triangle lists
  • triangle strips
  • line lists
  • line strips
  • point lists

For triangle strips the last two vertices of the last triangle are used as the first two of the next one. Similarly for line strips the last vertex of the last line segment is used as the first vertex for the next one. In triangle and line lists every vertex is only used once.

Since meshes have to be split up into smaller batches, longer strips are interrupted. Therefore the last one (line strips) or two (triangle strips) vertices of a batch must be repeated as the first vertices of the next batch to restart the strip. This is important to keep in mind in order to understand some alignment issues.
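The repetition can be sketched as follows. The function is hypothetical and works on an already de-indexed vertex list; repeat is 2 for triangle strips and 1 for line strips, and batch_size is the number of vertices per batch including the repeated ones.

```python
def split_strip(vertices, batch_size, repeat):
    """Split one strip into batches that overlap by `repeat` vertices."""
    if batch_size <= repeat:
        raise ValueError("batch_size must be larger than repeat")
    if not vertices:
        return []
    batches = []
    start = 0
    while True:
        batches.append(vertices[start:start + batch_size])
        if start + batch_size >= len(vertices):
            break
        # the next batch begins with the last `repeat` vertices of this one
        start += batch_size - repeat
    return batches
```

Splitting a 10-vertex triangle strip with batch_size 6 yields vertices 0-5 and 4-9: vertices 4 and 5 appear in both batches, and the two batches together still produce all 8 triangles of the original strip.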

Alignment and vertex count

The data for an UNPACK VIFcode is always padded to a multiple of a word (this is a fact about the hardware). For opaque clusters the data is additionally padded to qword alignment with NOP VIFcodes.

For non-opaque clusters the situation is more complicated because two alignment requirements must be satisfied:

1. Data pointed to by a DMAref packet must be qword aligned (again, required by the hardware). Since consecutive batches are pointed to by the DMAref tag, this means that the beginning of each batch inside the stripe data must be qword aligned.

2. The contents of the data to be UNPACKed must fit exactly into qwords. This is because trailing data in the same DMA packet would be interpreted as VIFcodes. The packet data can obviously not be padded by NOP VIFcodes because they would have to be inserted into the (continuous) stripe data.

Do note that these two requirements are not the same, since consecutive batches may overlap when using triangle or line strips. So that both can be satisfied, all vertex attributes must be a multiple of 4 bytes in size.

For the first requirement the batch vertex count excluding the repeated vertices (!) must be a multiple of 4. This way the number of bytes the attribute pointer is advanced by for the next DMA tag is guaranteed to be a multiple of 16. Hence the maximum possible vertex count for a given input buffer size is rounded down to fit the general formula c = 4n - r, where c is the resulting vertex count, r is the number of vertices to repeat and n is the biggest natural number such that c would not imply overflowing the input buffer.

For the second requirement the UNPACK count must be a multiple of 4. This way the number of bytes unpacked into VU memory is guaranteed to be a multiple of 16. Hence the UNPACK count is the batch vertex count rounded up to a multiple of 4 or 2 depending on the vertex attribute size.

Note that if a pipeline has no stripe clusters the vertex count is still rounded to a multiple of 4 even though this is not strictly necessary.

Furthermore for list primitives the vertex count must be a multiple of the number of vertices needed to draw the primitive (3 for triangles, 2 for lines). For lines this is automatically the case since the base vertex count is already a multiple of 4. For triangles this means the base vertex count must be a multiple of both 4 and 3: 12.
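The rule for list primitives amounts to rounding down to a multiple of lcm(4, vertices per primitive). A minimal sketch, with hypothetical names, where max_vertices stands for the number of vertices that fit into the input buffer:

```python
from math import gcd

VERTS_PER_PRIMITIVE = {"trilist": 3, "linelist": 2, "pointlist": 1}


def list_batch_vertex_count(max_vertices, primitive):
    """Round the batch size down to a multiple of 4 and of the primitive size."""
    per_prim = VERTS_PER_PRIMITIVE[primitive]
    step = 4 * per_prim // gcd(4, per_prim)  # lcm: 12 for triangles, else 4
    return max_vertices // step * step
```

With room for 50 vertices, all three list types end up with 48 vertices per batch; with room for 23, triangle lists get 12 and line or point lists get 20.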