NVIDIA OptiX is the API for GPU-accelerated ray tracing with CUDA, and is often used to render scenes containing a wide variety of objects and materials. During an OptiX launch, when a ray intersects a geometric primitive, a hit shader is executed. The question of which shader is executed for a given intersection is answered by the Shader Binding Table (SBT). The SBT may also be used to map input data to the shading operation.
This post describes several different approaches to laying out the SBT in your application, and the different ways that your shader can access its data. By making both the SBT and the shading data as small as possible, you can save memory, improve performance, and simplify management of the SBT itself.
Shader Binding Table design patterns
Commonly, ray tracing applications store two main types of data for each mesh object: geometric information, such as shading normals, and material parameters, such as a diffuse reflectance or a roughness parameter. These data are accessed by the material shader to perform calculations such as lighting on the current intersection point.
The following sections develop two approaches to using the SBT and implementing the data lookup. The first approach offers a straightforward utilization of the SBT to store shader programs and lookup data. It is simple to implement but inefficient in memory usage. The second approach iterates on this basic approach to arrive at a more efficient layout, reducing redundancy in the SBT records and shading data to increase efficiency.
The two designs target the common case of a scene hierarchy with many instances sharing a set of materials. For expository purposes, we will assume all geometry uses OptiX built-in triangle intersection and that the application uses a single ray type. These constraints are straightforward to relax, as discussed in more detail below.
Naive approach: Geometry and material properties per instance
The easiest way to map shaders and data to an instance is to explicitly store references to all shaders and data on a separate SBT record per instance (per ray type). The shaders themselves are referred to in the header section of the SBT record while references to geometric data and shading parameters are stored within the user-defined data block. Small parameters may be stored directly.
One example is a path-traced renderer that supports simple diffuse and glossy materials and performs smooth shading through per-vertex normals. Note that heavy-weight data like vertex attributes are stored in a separate global memory allocation and only a reference to the data is stored directly in the data block. Figure 1 shows the associated data per component.
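The following structure would accommodate this in an SBT record:
struct ShadingParams {
    float3* normals;      // reference to per-vertex shading normals in global memory
    float3  reflectance;  // shared by the diffuse and glossy materials
    float   roughness;    // only used by the glossy material
};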
Note that the reflectance parameter can be shared between the diffuse and glossy materials, but the roughness parameter must be stored even when it is not necessary. This is a downside to this approach. The data section must be at least as large as the footprint of the largest parameter set out of all materials.
Let’s look at this layout applied to a hypothetical scene with:
- 100,000 instances
- 50,000 unique triangle meshes (GASs)
- Two material shaders (glossy and diffuse as previously discussed)
- 10,000 unique material parameter sets
The number of materials is often smaller than the number of instances since some objects might share the same material description.
The storage required for this scheme is the number of instances multiplied by the size of the SBT record. In this case, the SBT data section size is 24 bytes; however, the 16-byte alignment requirement rounds it up to 32 bytes. The header section is always 32 bytes, as required by the API. Each record is therefore 64 bytes, which gives a total shading data overhead of about 6 MB (100,000 × 64 bytes), the total size of the SBT hit group list.
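The arithmetic above can be checked with a small host-side sketch. It is only an illustration: `Float3` is a stand-in for CUDA's `float3`, the two constants mirror `OPTIX_SBT_RECORD_HEADER_SIZE` and `OPTIX_SBT_RECORD_ALIGNMENT` rather than including the OptiX headers, the helper names are hypothetical, and a 64-bit platform with 8-byte pointers is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-ins mirroring OPTIX_SBT_RECORD_HEADER_SIZE and
// OPTIX_SBT_RECORD_ALIGNMENT (assumption: values 32 and 16).
constexpr std::size_t kSbtHeaderSize = 32;
constexpr std::size_t kSbtRecordAlignment = 16;

// Host-side stand-in for CUDA's float3 (three packed floats, 12 bytes).
struct Float3 { float x, y, z; };

// Data block of the naive per-instance record.
struct ShadingParams {
    const Float3* normals;  // 8 bytes on a 64-bit platform
    Float3 reflectance;     // 12 bytes
    float roughness;        // 4 bytes
};

// Round value up to the next multiple of `multiple`.
inline std::size_t roundUp(std::size_t value, std::size_t multiple) {
    assert(multiple != 0);
    return (value + multiple - 1) / multiple * multiple;
}

// Size of one hit-group record: header plus data, padded to the alignment.
inline std::size_t naiveRecordSize() {
    return roundUp(kSbtHeaderSize + sizeof(ShadingParams), kSbtRecordAlignment);
}

// Total hit-group storage: one record per instance per ray type.
inline std::size_t naiveSbtBytes(std::size_t numInstances, std::size_t numRayTypes) {
    return numInstances * numRayTypes * naiveRecordSize();
}
```

For the hypothetical scene, `naiveSbtBytes(100000, 1)` evaluates to 6,400,000 bytes, and every additional ray type adds the same amount again.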
This approach is memory heavy, storing identical SBT headers and data sections many times. This memory bloat not only takes up valuable GPU storage but may cause a decrease in GPU-side performance due to incoherent memory accesses and incurs host-side overheads to populate and maintain an oversized SBT array. However, this technique can still be a reasonable choice for simple applications and scene setups.
Also, note that the data section can always be kept at the minimum non-zero size of 16 bytes by putting all shading and geometry data in a global allocation and simply storing a pointer to the data in the record. However, this incurs the storage cost of an additional pointer and an extra memory indirection.
Optimized approach: Reducing redundancy by moving away from per-instance storage
In this section we will work through optimizing the SBT and data layout to mitigate these problems. The key to reducing the memory footprint of the SBT and all shading data is to exploit redundancy.
First, completely remove the data section of the SBT record. Instead, store the shading data in global memory and use knowledge of the OptiX hierarchy to access it. In the case of a single SBT record per GAS, this mapping is trivial. The shading parameters are moved into an array in global memory with one entry for every unique combination of material parameters and geometry data present in the scene.
The number of ShadingParams entries in this array is at most one per instance, but in practice could be much smaller as there often are multiple instances of a given mesh with the same material parameters. Each instance in the scene then has its instance ID set to the array index for its shading parameters.
Figure 3 shows how the data now looks. All shading data is moved into a separate global array. The size of the array is determined by the number of unique combinations of geometry and material parameters.
The current shading data can be accessed in device code:
ShadingParams& params = shading_data_array[optixGetInstanceId()];
Note that instance IDs do not need to be unique. If multiple instances share the same shading and geometric data, they can share an identical instance ID used to index into the param list.
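One way to build such an array on the host is to deduplicate parameter sets while assigning instance IDs. The sketch below is a minimal illustration, not the method used by any particular renderer: `InstanceDesc`, `buildShadingArray`, and the exact-match key are hypothetical, device addresses are modeled as plain 64-bit integers, and the ID limit is taken from the range quoted later in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Host-side mirror of the shading data (device pointer kept as an address).
struct ShadingParams {
    std::uint64_t normals;  // device address of the per-vertex normal array
    float reflectance[3];
    float roughness;
};

// Hypothetical host-side description of one scene instance.
struct InstanceDesc {
    ShadingParams params;
    std::uint32_t instanceId;  // value to copy into OptixInstance::instanceId
};

// Exclusive upper bound on instance IDs, as quoted in this post (assumption).
constexpr std::uint32_t kInstanceIdLimit = 1u << 28;

using ParamKey = std::tuple<std::uint64_t, float, float, float, float>;

// Builds the global shading array with one entry per unique parameter set and
// writes the matching array index into each instance's ID.
std::vector<ShadingParams> buildShadingArray(std::vector<InstanceDesc>& instances) {
    std::vector<ShadingParams> shadingData;
    std::map<ParamKey, std::uint32_t> indexOf;
    for (InstanceDesc& inst : instances) {
        const ShadingParams& p = inst.params;
        const ParamKey key(p.normals, p.reflectance[0], p.reflectance[1],
                           p.reflectance[2], p.roughness);
        auto it = indexOf.find(key);
        if (it == indexOf.end()) {
            // First time this combination appears: append a new array entry.
            it = indexOf.emplace(key, static_cast<std::uint32_t>(shadingData.size())).first;
            shadingData.push_back(p);
        }
        inst.instanceId = it->second;  // shared by all instances with equal params
    }
    assert(shadingData.size() <= kInstanceIdLimit);
    return shadingData;
}
```

The returned vector would then be uploaded to global memory as `shading_data_array`. Matching is exact, so two parameter sets that differ only by floating-point rounding are stored separately.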
Next, tackle the redundant storage of geometry params. When a given GAS is instanced multiple times, the reference to its shading-normal array is stored multiple times: once per instance in the naive layout and once per unique geometry/shading parameter set in our current approach. Ideally, the normals should only be stored once per GAS since they do not vary from instance to instance.
Storage of per-GAS data can be easily implemented with the new optixGetGASPointerFromHandle function introduced in OptiX 8.1. This function retrieves the address of the binary acceleration structure data associated with a given GAS. An application may therefore prefix this acceleration structure data with an arbitrary chunk of user data when allocating memory at GAS build time. A device function can retrieve the user data by calling optixGetGASPointerFromHandle and subtracting the size of the user data segment.
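The address arithmetic behind this prefix scheme can be sketched on the host with device addresses modeled as integers. Everything here is illustrative: the helper names are hypothetical, and the sketch assumes that the GAS output buffer must start on a 128-byte boundary (mirroring `OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT`), so the user data segment is padded to a multiple of that alignment.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT (assumption: 128 bytes).
constexpr std::uint64_t kAccelAlignment = 128;

// Per-GAS user data stored in front of the acceleration structure.
struct GeometryParams {
    std::uint64_t normals;  // device address of the per-vertex normal array
};

// User data segment size, padded so the GAS that follows stays aligned.
constexpr std::uint64_t kPrefixSize =
    (sizeof(GeometryParams) + kAccelAlignment - 1) / kAccelAlignment * kAccelAlignment;

// Bytes to allocate for a GAS whose build reports gasOutputSize bytes.
inline std::uint64_t allocationSize(std::uint64_t gasOutputSize) {
    return kPrefixSize + gasOutputSize;
}

// Address to hand to the GAS build as its output buffer.
inline std::uint64_t gasAddress(std::uint64_t allocationBase) {
    assert(allocationBase % kAccelAlignment == 0);
    return allocationBase + kPrefixSize;
}

// Address of the user data, given the value a device function would obtain
// from optixGetGASPointerFromHandle.
inline std::uint64_t geometryParamsAddress(std::uint64_t gasPointer) {
    assert(gasPointer >= kPrefixSize);
    return gasPointer - kPrefixSize;
}
```

In this layout the application would copy a `GeometryParams` to the start of the allocation and build the GAS at `gasAddress(base)`. If the GAS is later compacted or relocated into a new buffer, the prefix would presumably need to be reserved and copied again for the new location.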
Now the data is organized as follows:
struct MaterialParams {
    float3 reflectance;
    float  roughness;
};
struct GeometryParams {
    float3* normals;  // stored once per GAS, in front of the GAS data
};
Figure 4 shows the memory layout for this new scheme. Geometry shading parameters are now associated with their respective geometry meshes. The global shading data array size is now only dependent on the number of unique material parameters.
This eliminates the repeated storage of each unique GeometryParams. It also reduces redundancy within the material parameters, since a unique entry no longer requires a unique combination of both material and geometry parameters.
The final optimization eliminates the redundant storage of material programs in the SBT header. There are only two unique sets of material programs, one for glossy and one for diffuse, yet these are stored many times over, once per instance. Rather than storing a single SBT entry per instance, it’s possible to store a single SBT record per material type (two in this example). Now each instance has its SBT offset set to either zero or one, depending on whether it is diffuse or glossy.
Figure 5 shows the final data layout. Note that the geometry and material parameter layout is unchanged from Figure 4.
The final data usage for shading data is as follows:
num_GASs * sizeof(GeometryParams) +
num_unique_material_instances * sizeof(MaterialParams) +
num_material_shaders * OPTIX_SBT_RECORD_HEADER_SIZE
This is well less than a megabyte. Note that the storage size is no longer tied at all to the number of instances in the scene. This can be a huge win for complex real-world scenes with massive instancing and shaders with tens or hundreds of parameters each.
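Plugging the hypothetical scene into this formula gives a concrete number. The sketch below is only a back-of-the-envelope check: the function name is hypothetical, the sizes assume an 8-byte pointer for GeometryParams and 16 bytes for MaterialParams, the header constant mirrors `OPTIX_SBT_RECORD_HEADER_SIZE`, and any alignment padding around the per-GAS data is ignored.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSbtHeaderSize = 32;       // mirrors OPTIX_SBT_RECORD_HEADER_SIZE
constexpr std::size_t kGeometryParamsSize = 8;   // one 64-bit pointer to the normals
constexpr std::size_t kMaterialParamsSize = 16;  // float3 reflectance + float roughness

// Shading-data footprint of the optimized layout. The instance count does not
// appear anywhere in the expression.
inline std::size_t optimizedShadingBytes(std::size_t numGas,
                                         std::size_t numMaterialSets,
                                         std::size_t numMaterialShaders) {
    assert(numMaterialShaders > 0);  // at least one hit group is needed
    return numGas * kGeometryParamsSize +
           numMaterialSets * kMaterialParamsSize +
           numMaterialShaders * kSbtHeaderSize;
}
```

With 50,000 GASs, 10,000 material parameter sets, and two material shaders, this comes to 400,000 + 160,000 + 64 = 560,064 bytes, compared with roughly 6.4 MB for the per-instance layout.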
Extending for alternative shading setups
OptiX intrinsics provide access to scene state, which enables more complicated data lookups than the constrained example. These state querying functions include:
- optixGetSBTDataPointer: The address of the current primitive’s SBT record data block. This is simply the address calculated using [Equation 1] offset by the size of the opaque record header.
- optixGetInstanceId: The ID assigned by the application when creating the current instance (OptixInstance::instanceId). This value does not need to be unique across instances and can be chosen arbitrarily within the range [0, 2^28).
- optixGetInstanceIndex: Returns the zero-based index of the instance within its instance acceleration structure.
- optixGetPrimitiveIndex: The primitive index of the currently intersected triangle, sphere, curve, or custom geometry. For details, see the OptiX Programming Guide.
- optixGetSbtGASIndex: The SBT offset within the current GAS (as specified by the build input’s sbtIndexOffsetBuffer).
- optixGetGASPointerFromHandle: This retrieves the address of the beginning of a geometry acceleration structure in device memory. The application can then precede the GAS memory with a per-geometry chunk of data.
Example shading setups
This section provides a few example shading setups and how they might be approached.
Multiple geometry types
The improved layout can easily accommodate multiple different types of geometry, but the number of SBT entries now depends on the number of combinations of geometry types and material shaders since a hit group encapsulates geometry through the intersection program, and the material through the hit programs. For three geometry types and two materials there would be at most six SBT hit-group entries.
Multiple ray types
The number of ray types is again a multiplier of the number of SBT hit-group entries. Even for real-world examples, the total product of ray types, geometry types, and material types is usually in the tens to low hundreds.
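One possible ordering of the hit-group records for these two extensions is sketched below. The layout (geometry type, then material, then ray type) and all names are assumptions chosen for illustration. The last function follows the SBT indexing rule from the OptiX Programming Guide, where the record index is the instance offset plus the GAS index times the trace-call stride plus the trace-call offset.

```cpp
#include <cassert>

// Hypothetical description of the shading setup.
struct SbtLayout {
    unsigned numRayTypes;
    unsigned numMaterials;
    unsigned numGeometryTypes;
};

// Total number of hit-group records: one per combination.
inline unsigned hitGroupCount(const SbtLayout& l) {
    return l.numRayTypes * l.numMaterials * l.numGeometryTypes;
}

// Value to store in OptixInstance::sbtOffset for an instance with the given
// geometry type and material. Records for all ray types are kept adjacent.
inline unsigned instanceSbtOffset(const SbtLayout& l, unsigned geometryType, unsigned material) {
    assert(geometryType < l.numGeometryTypes && material < l.numMaterials);
    return (geometryType * l.numMaterials + material) * l.numRayTypes;
}

// Record index resolved at trace time:
// instance offset + GAS index * trace stride + trace offset.
inline unsigned hitRecordIndex(unsigned instanceOffset, unsigned sbtGasIndex,
                               unsigned strideFromTrace, unsigned offsetFromTrace) {
    return instanceOffset + sbtGasIndex * strideFromTrace + offsetFromTrace;
}
```

With a single SBT record per GAS the GAS index is zero, so a trace call that passes the ray type as its SBT offset and the number of ray types as its stride lands on the record for that instance's material and geometry type.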
Multiple materials within a single GAS
There are multiple ways to handle this situation, but one easy method is to create an ancillary lookup table to redirect to shader param indices. This table can be indexed by the instance ID plus the SBT GAS index.
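A host-side model of such a table might look like the following. It is a sketch under stated assumptions: `MaterialTable` and its methods are hypothetical, each instance's ID is set to the position of its first table entry, and the instance owns one consecutive entry per SBT GAS index. On the device, the equivalent lookup would index the uploaded table with `optixGetInstanceId() + optixGetSbtGASIndex()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ancillary table that redirects (instance, SBT GAS index) to a shading
// parameter index.
struct MaterialTable {
    std::vector<std::uint32_t> entries;

    // Appends one entry per SBT GAS index of the instance and returns the base
    // position, which the application would store as the instance ID.
    std::uint32_t addInstance(const std::vector<std::uint32_t>& paramIndexPerGasSlot) {
        const std::uint32_t base = static_cast<std::uint32_t>(entries.size());
        entries.insert(entries.end(), paramIndexPerGasSlot.begin(), paramIndexPerGasSlot.end());
        return base;
    }

    // Host-side equivalent of the device lookup.
    std::uint32_t lookup(std::uint32_t instanceId, std::uint32_t sbtGasIndex) const {
        const std::size_t slot = static_cast<std::size_t>(instanceId) + sbtGasIndex;
        assert(slot < entries.size());
        return entries[slot];
    }
};
```

Instances that use the same materials in the same slots can share a base, so the table, like the shading array, need not grow with the instance count.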
Summary
The NVIDIA OptiX library provides flexible mechanisms for binding shaders and data to your ray tracing applications. Using the shader binding table to store and retrieve data at the instance level is a straightforward approach that might work well for simple applications. However, a more optimized approach that avoids redundant bindings and data storage can increase performance and avoid memory bloat.
To get started, download the NVIDIA OptiX SDK and check out the documentation for more details.