Efficient Ray Tracing with NVIDIA OptiX Shader Binding Table Optimization

NVIDIA OptiX is the API for GPU-accelerated ray tracing with CUDA, and is often used to render scenes containing a wide variety of objects and materials. During an OptiX launch, when a ray intersects a geometric primitive, a hit shader is executed. The question of which shader is executed for a given intersection is answered by the Shader Binding Table (SBT). The SBT may also be used to map input data to the shading operation.  

This post describes several different approaches to laying out the SBT in your application, and the different ways that your shader can access its data. By making both the SBT and the shading data as small as possible, you can save memory, improve performance, and simplify management of the SBT itself. 

Shader Binding Table design patterns 

Commonly, ray tracing applications store two main types of data for each mesh object: geometric information, such as shading normals, and material parameters, such as a diffuse reflectance or a roughness parameter. These data are accessed by the material shader to perform calculations such as lighting on the current intersection point.  

The following sections develop two approaches to using the SBT and implementing the data lookup. The first approach offers a straightforward utilization of the SBT to store shader programs and lookup data. It is simple to implement but inefficient in memory usage. The second approach iterates on this basic approach to arrive at a more efficient layout, reducing redundancy in the SBT records and shading data to increase efficiency.  

The two designs target the common case of a scene hierarchy with many instances sharing a set of materials. For expository purposes, we will assume all geometry uses OptiX built-in triangle intersection and that the application uses a single ray type. These constraints are straightforward to relax, as discussed in more detail below.

Naive approach: Geometry and material properties per instance 

The easiest way to map shaders and data to an instance is to explicitly store references to all shaders and data in a separate SBT record per instance (and per ray type). The shaders themselves are referenced in the header section of the SBT record, while references to geometric data and shading parameters are stored in the user-defined data block. Small parameters may be stored directly in the data block.

One example is a path-traced renderer that supports simple diffuse and glossy materials and performs smooth shading through per-vertex normals. Note that heavy-weight data like vertex attributes are stored in a separate global memory allocation and only a reference to the data is stored directly in the data block. Figure 1 shows the associated data per component.

Diagram showing three fundamental data blocks. From left to right: Per-Vertex, Per-Glossy Material, and Per-Diffuse Material.
Figure 1. The fundamental data blocks present in the example setup

The following structure would accommodate this in an SBT record: 

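struct ShadingParams {
	float3* normals;     // per-vertex shading normals, stored in a separate global allocation
	float3  reflectance; // shared by the diffuse and glossy materials
	float   roughness;   // used only by the glossy material
};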

Note that the reflectance parameter can be shared between the diffuse and glossy materials, but the roughness parameter must be stored even when it is not necessary.  This is a downside to this approach. The data section must be at least as large as the footprint of the largest parameter set out of all materials.  
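As a rough illustration of the per-instance record described above, the following host-side sketch pairs the opaque header with the ShadingParams data block. The names Float3, HitGroupRecord, kSbtHeaderSize, and kSbtAlignment are stand-ins invented for this sketch so that it compiles without the CUDA or OptiX headers; the two constants mirror OPTIX_SBT_RECORD_HEADER_SIZE and OPTIX_SBT_RECORD_ALIGNMENT. A 64-bit build is assumed.

```cpp
#include <cstddef>

// Stand-ins for OPTIX_SBT_RECORD_HEADER_SIZE and OPTIX_SBT_RECORD_ALIGNMENT,
// defined locally so the sketch builds without the OptiX headers.
constexpr std::size_t kSbtHeaderSize = 32;
constexpr std::size_t kSbtAlignment  = 16;

struct Float3 { float x, y, z; };  // stand-in for CUDA's float3

struct ShadingParams {
    const Float3* normals;      // reference to per-vertex normals in global memory
    Float3        reflectance;  // used by both diffuse and glossy shading
    float         roughness;    // only meaningful for glossy shading
};

// One record per instance: the opaque header followed by the user data block.
// In a real application the header bytes are written by optixSbtRecordPackHeader.
struct alignas(kSbtAlignment) HitGroupRecord {
    unsigned char header[kSbtHeaderSize];
    ShadingParams data;
};

// With 8-byte pointers the data block is 24 bytes, and the record alignment
// pads the whole record from 56 to 64 bytes.
static_assert(sizeof(ShadingParams) == 24, "assumes a 64-bit build");
static_assert(sizeof(HitGroupRecord) == 64, "32-byte header plus padded data");
```

Every diffuse instance still pays for the unused roughness field, because all records in the hit group list share one stride.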

Let’s look at this layout applied to a hypothetical scene with: 

  • 100,000 instances  
  • 50,000 unique triangle meshes (GASs) 
  • Two material shaders (glossy and diffuse as previously discussed) 
  • 10,000 unique material parameter sets 

The number of materials is often smaller than the number of instances since some objects might share the same material description.

Diagram showing simple data layout with inlined SBT data segments.
Figure 2. A simple layout with data inlined in the SBT data segments

The storage required for this scheme is the number of instances multiplied by the size of the SBT record. In this case, the SBT data section size is 24 bytes; however, the 16-byte alignment causes rounding up to 32 bytes. The header section is always 32 bytes, as required by the API, so each record occupies 64 bytes. For 100,000 instances, this gives a total shading data overhead of about 6 MB, the total size of the SBT hit group list.
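The arithmetic for the example scene can be checked with a few lines of C++. This is only a sketch: roundUp, recordStride, and naiveSbtBytes are helper names made up here, and the two constants are local stand-ins for OPTIX_SBT_RECORD_HEADER_SIZE and OPTIX_SBT_RECORD_ALIGNMENT.

```cpp
#include <cstddef>

constexpr std::size_t kSbtHeaderSize = 32;  // mirrors OPTIX_SBT_RECORD_HEADER_SIZE
constexpr std::size_t kSbtAlignment  = 16;  // mirrors OPTIX_SBT_RECORD_ALIGNMENT

// Round value up to the next multiple of alignment.
constexpr std::size_t roundUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Size of one SBT record: fixed header plus the aligned data section.
constexpr std::size_t recordStride(std::size_t dataBytes) {
    return kSbtHeaderSize + roundUp(dataBytes, kSbtAlignment);
}

// Naive layout: one record per instance and per ray type.
constexpr std::size_t naiveSbtBytes(std::size_t numInstances,
                                    std::size_t dataBytes,
                                    std::size_t numRayTypes = 1) {
    return numInstances * numRayTypes * recordStride(dataBytes);
}

static_assert(recordStride(24) == 64, "24-byte data section pads to 32");
static_assert(naiveSbtBytes(100000, 24) == 6400000, "about 6 MB");
```

Adding a second ray type doubles the figure, since the naive layout stores a record per instance for each ray type.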

This approach is memory heavy, storing identical SBT headers and data sections many times. This memory bloat not only takes up valuable GPU storage but may cause a decrease in GPU-side performance due to incoherent memory accesses and incurs host-side overheads to populate and maintain an oversized SBT array. However, this technique can still be a reasonable choice for simple applications and scene setups. 

Also, note that the data section can always be kept at the minimum non-zero size of 16 bytes by putting all shading and geometry data in a global allocation and simply storing a pointer to the data in the record. However, this incurs the storage cost of an additional pointer and an extra memory indirection. 

Optimized approach: Reducing redundancy by moving away from per-instance storage

In this section we will work through optimizing the SBT and data layout to mitigate these problems. The key to reducing the memory footprint of the SBT and all shading data is to exploit redundancy.  

First, completely remove the data section of the SBT record. Instead, store the shading data in global memory and use knowledge of the OptiX hierarchy to access it. In the case of a single SBT record per GAS, this mapping is trivial. The shading parameters are moved into an array in global memory with one entry for every unique combination of material parameters and geometry data present in the scene. 

The number of ShadingParams entries in this array is at most one per instance, but in practice could be much smaller as there often are multiple instances of a given mesh with the same material parameters. Each instance in the scene then has its instance ID set to the array index for its shading parameters.  

Figure 3 shows how the data now looks. All shading data is moved into a separate global array. The size of the array is determined by the number of unique combinations of geometry and material parameters.

Diagram of global memory array with ShadingParams elements.
Figure 3. Global array for shading params, separated from the SBT 

The current shading data can be accessed in device code: 

const ShadingParams& params = shading_data_array[optixGetInstanceId()];

Note that instance IDs do not need to be unique. If multiple instances share the same shading and geometric data, they can share an identical instance ID used to index into the param list. 
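One possible way to build the deduplicated array on the host is sketched below. The helper assignInstanceIds is hypothetical and not part of OptiX; it collapses identical ShadingParams entries and returns, for each instance, the array index to use as its instance ID. Float3 is a local stand-in for CUDA's float3, and the comparison assumes parameters contain no NaN values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Float3 { float x, y, z; };  // stand-in for CUDA's float3

struct ShadingParams {
    const Float3* normals;
    Float3        reflectance;
    float         roughness;
};

// Hypothetical helper: fills `unique` with one entry per distinct parameter
// set and returns one instance ID (an index into `unique`) per input entry.
std::vector<std::uint32_t> assignInstanceIds(
        const std::vector<ShadingParams>& perInstance,
        std::vector<ShadingParams>& unique) {
    assert(unique.empty());  // the output array is expected to start empty
    using Key = std::tuple<std::uintptr_t, float, float, float, float>;
    std::map<Key, std::uint32_t> seen;
    std::vector<std::uint32_t> ids;
    ids.reserve(perInstance.size());
    for (const ShadingParams& p : perInstance) {
        const Key key(reinterpret_cast<std::uintptr_t>(p.normals),
                      p.reflectance.x, p.reflectance.y, p.reflectance.z,
                      p.roughness);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this combination appears: append it to the array.
            it = seen.emplace(key, static_cast<std::uint32_t>(unique.size())).first;
            unique.push_back(p);
        }
        ids.push_back(it->second);  // shared by all instances with equal params
    }
    return ids;
}
```

The returned IDs would then be written to OptixInstance::instanceId, and the unique array uploaded as shading_data_array.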

Next, tackle the redundant storage of geometry params. When a given GAS is instanced multiple times, the reference to its shading-normal array is stored multiple times: once per instance in the naive layout, and once per unique geometry/shading parameter set in our current approach. Ideally, this reference should be stored only once per GAS, since the normals do not vary from instance to instance.

Storage of per-GAS data can be easily implemented with the new optixGetGASPointerFromHandle function introduced in OptiX 8.1. This function retrieves the address of the binary acceleration structure data associated with a given GAS. An application may therefore prefix this acceleration structure data with an arbitrary chunk of user data when allocating memory at GAS build time. A device function can retrieve the user data by calling optixGetGASPointerFromHandle and subtracting the size of the user data segment. 

Now the data is organized as follows:

struct MaterialParams {
	float3 reflectance;
	float roughness;
};
struct GeometryParams {
	float3* normals;
};
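The prefix scheme can be simulated on the host to make the pointer arithmetic concrete. In this sketch, a byte buffer plays the role of the device allocation, and gasStart plays the role of the address returned by optixGetGASPointerFromHandle. The helpers gasOffset, writePrefix, and readPrefix are names invented here. The sketch assumes the GAS output must start on a 128-byte boundary (mirroring OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT) and that the allocation base itself is suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Float3 { float x, y, z; };  // stand-in for CUDA's float3
struct GeometryParams { const Float3* normals; };

constexpr std::size_t kAccelAlignment = 128;  // mirrors OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT

constexpr std::size_t roundUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Offset of the GAS data inside an allocation that also holds the prefix.
constexpr std::size_t gasOffset(std::size_t userBytes) {
    return roundUp(userBytes, kAccelAlignment);
}

// Host side: place the user data in the bytes immediately before the GAS.
void writePrefix(unsigned char* allocation, const GeometryParams& params) {
    assert(allocation != nullptr);
    unsigned char* gasStart = allocation + gasOffset(sizeof(GeometryParams));
    std::memcpy(gasStart - sizeof(GeometryParams), &params, sizeof(GeometryParams));
}

// Device side (simulated): step back from the GAS address by the size of
// the user data segment to recover the per-GAS parameters.
GeometryParams readPrefix(const unsigned char* gasStart) {
    GeometryParams params;
    std::memcpy(&params, gasStart - sizeof(GeometryParams), sizeof(GeometryParams));
    return params;
}
```

In a real application the allocation would be sized as gasOffset(sizeof(GeometryParams)) plus the GAS output size, and the build would be given the address at that offset as its output buffer.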

Figure 4 shows the memory layout for this new scheme. Geometry shading parameters are now associated with their respective geometry meshes. The global shading data array size is now only dependent on the number of unique material parameters.

Diagram of geometry shading parameters associated with respective geometry meshes.
Figure 4. Geometry shading parameters are now associated with their respective geometry meshes 

This eliminates the repeated storage of each unique GeometryParams. It also reduces redundancy within the material params, since an entry in the global array is now unique by its material parameters alone rather than by the combination of material and geometry parameters.

The final optimization eliminates the redundant storage of material programs in the SBT header. There are only two unique sets of material programs, one for glossy and one for diffuse, yet these are stored many times over, once per instance. Rather than storing one SBT record per instance, it’s possible to store a single SBT record per material type (two in this example). Now each instance has its SBT offset set to either zero or one, depending on whether it is diffuse or glossy.
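A minimal sketch of the resulting per-instance setup follows. InstanceDesc is a hypothetical stand-in for the two OptixInstance fields that matter here (instanceId and sbtOffset), and the enum values are chosen for this example so that they double as the SBT offsets.

```cpp
#include <cstdint>

// The enumerator values double as indices into the two-record hit group list.
enum class MaterialType : std::uint32_t { Diffuse = 0, Glossy = 1 };

// Hypothetical stand-in for the OptixInstance fields used by this scheme.
struct InstanceDesc {
    std::uint32_t instanceId;  // index into the global MaterialParams array
    std::uint32_t sbtOffset;   // selects the hit group record (shader set)
};

constexpr std::uint32_t sbtOffsetFor(MaterialType type) {
    return static_cast<std::uint32_t>(type);
}

// Data lookup and shader selection are now independent: the instance ID
// picks the parameters, the SBT offset picks the programs.
constexpr InstanceDesc makeInstance(std::uint32_t materialParamsIndex,
                                    MaterialType type) {
    return InstanceDesc{materialParamsIndex, sbtOffsetFor(type)};
}
```

Two instances with different parameter sets but the same material type share one SBT record and differ only in their instance ID.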

Figure 5 shows the final data layout. Note that the geometry and material parameter layout is unchanged from Figure 4.

Reduced SBT showing Header (32 bytes) and Opaque Shader Handle.
Figure 5. SBT dependent only on the number of unique shaders in the scene 

The final data usage for shading data is as follows:  

num_GASs * sizeof(GeometryParams) +
num_unique_material_instances * sizeof(MaterialParams) +
num_material_shaders * OPTIX_SBT_RECORD_HEADER_SIZE

For the example scene, this comes to roughly 560 KB (assuming 8-byte pointers), well under a megabyte. Note that the storage size is no longer tied at all to the number of instances in the scene. This can be a huge win for complex real-world scenes with massive instancing and shaders with tens or hundreds of parameters each.
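Plugging the example scene into the formula gives a concrete figure. The sketch below follows the formula as written, so it ignores any alignment padding around the per-GAS data; Float3, optimizedBytes, and kSbtHeaderSize are local stand-ins introduced for illustration, and a 64-bit build is assumed.

```cpp
#include <cstddef>

struct Float3 { float x, y, z; };  // stand-in for CUDA's float3
struct MaterialParams { Float3 reflectance; float roughness; };
struct GeometryParams { const Float3* normals; };

constexpr std::size_t kSbtHeaderSize = 32;  // mirrors OPTIX_SBT_RECORD_HEADER_SIZE

// Shading data footprint of the optimized layout, term by term.
constexpr std::size_t optimizedBytes(std::size_t numGas,
                                     std::size_t numUniqueMaterialParams,
                                     std::size_t numMaterialShaders) {
    return numGas * sizeof(GeometryParams)
         + numUniqueMaterialParams * sizeof(MaterialParams)
         + numMaterialShaders * kSbtHeaderSize;
}

// 50,000 GASs * 8 B + 10,000 parameter sets * 16 B + 2 shaders * 32 B
static_assert(optimizedBytes(50000, 10000, 2) == 560064,
              "assumes 8-byte pointers");
```

The instance count does not appear anywhere in the expression, which is the point of the optimized layout.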

Extending for alternative shading setups 

OptiX intrinsics provide access to scene state, which enables more complicated data lookups than the constrained example. These state querying functions include: 

  • optixGetSbtDataPointer: The address of the current primitive’s SBT record data block. This is simply the address calculated using [Equation 1] offset by the size of the opaque record header. 
  • optixGetInstanceId: The ID assigned by the application when creating the current instance (OptixInstance::instanceId). This value does not need to be unique across instances and can be chosen arbitrarily within the range [0, 2^28). 
  • optixGetInstanceIndex: Returns the zero-based index of the instance within its instance acceleration structure. 
  • optixGetPrimitiveIndex: The primitive index of the currently intersected triangle, sphere, curve, or custom geometry. For details, see the OptiX Programming Guide. 
  • optixGetSbtGASIndex: The SBT offset within the current GAS (as specified by the build input’s sbtIndexOffsetBuffer). 
  • optixGetGASPointerFromHandle: This retrieves the address of the beginning of a geometry acceleration structure in device memory. The application can then precede the GAS memory with a per-geometry chunk of data. 

Example shading setups 

This section provides a few example shading setups and how they might be approached.

Multiple geometry types 

The improved layout can easily accommodate multiple different types of geometry, but the number of SBT entries now depends on the number of combinations of geometry types and material shaders since a hit group encapsulates geometry through the intersection program, and the material through the hit programs. For three geometry types and two materials there would be at most six SBT hit-group entries. 

Multiple ray types 

The number of ray types is again a multiplier of the number of SBT hit-group entries. Even for real-world examples, the total product of ray types, geometry types, and material types is usually in the tens to low hundreds.
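One way to organize such a table is sketched below. The ordering (geometry type, then material, then ray type) is a choice made for this example, not something OptiX prescribes, and the function names are invented here. The sketch assumes each GAS uses a single SBT record, so the instance SBT offset plus the ray type passed to the trace call fully determines the record.

```cpp
// Number of hit group records for a full cross product of combinations.
constexpr unsigned hitGroupCount(unsigned numGeometryTypes,
                                 unsigned numMaterials,
                                 unsigned numRayTypes) {
    return numGeometryTypes * numMaterials * numRayTypes;
}

// SBT offset to store on an instance: the first record of the block of
// numRayTypes records for this (geometry type, material) combination.
constexpr unsigned instanceSbtOffset(unsigned geometryType,
                                     unsigned material,
                                     unsigned numMaterials,
                                     unsigned numRayTypes) {
    return (geometryType * numMaterials + material) * numRayTypes;
}

// Record selected at trace time: the instance offset plus the ray type.
constexpr unsigned recordIndex(unsigned instanceOffset, unsigned rayType) {
    return instanceOffset + rayType;
}

static_assert(hitGroupCount(3, 2, 1) == 6, "three geometry types, two materials");
static_assert(hitGroupCount(3, 2, 2) == 12, "adding a second ray type doubles it");
```

Combinations that never occur in a scene can simply be left out, which is why six is an upper bound in the three-geometry, two-material case.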

Multiple materials within a single GAS 

There are multiple ways to handle this situation, but one easy method is to create an ancillary lookup table to redirect to shader param indices. This table can be indexed by the instance ID plus the SBT GAS index. 
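The lookup table idea can be sketched on the host as follows. Both functions are hypothetical helpers: appendInstanceMaterials builds the table and returns the base index to use as the instance ID, and materialParamsIndex simulates the device-side lookup, with its instanceId and sbtGasIndex arguments standing in for the values returned by optixGetInstanceId and optixGetSbtGASIndex.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends one table row per SBT GAS index of an instance and returns the
// base position of that run, which becomes the instance ID.
std::uint32_t appendInstanceMaterials(
        std::vector<std::uint32_t>& table,
        const std::vector<std::uint32_t>& paramIndexPerSbtGasIndex) {
    const std::uint32_t base = static_cast<std::uint32_t>(table.size());
    table.insert(table.end(),
                 paramIndexPerSbtGasIndex.begin(),
                 paramIndexPerSbtGasIndex.end());
    return base;
}

// Simulated device-side lookup: redirect (instance ID + SBT GAS index)
// to an index into the global material parameter array.
std::uint32_t materialParamsIndex(const std::vector<std::uint32_t>& table,
                                  std::uint32_t instanceId,
                                  std::uint32_t sbtGasIndex) {
    assert(instanceId + sbtGasIndex < table.size());
    return table[instanceId + sbtGasIndex];
}
```

Instances that use the same sequence of materials could share one run in the table, in the same way that instances share instance IDs in the single-material case.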

Summary

The NVIDIA OptiX library provides flexible mechanisms for binding shaders and data to your ray tracing applications. Using the shader binding table to store and retrieve data at the instance level is a straightforward approach that might work well for simple applications. However, a more optimized approach that avoids redundant bindings and data storage can increase performance and avoid memory bloat.

To get started, download the NVIDIA OptiX SDK and check out the documentation for more details.
