Using RTX to Accelerate Instant Radiosity

Posted by Daqi's Blog on January 24, 2019

The past 2018 was an exciting year for computer graphics. Nvidia announced RTX graphics cards which brings real-time ray tracing to consumers. Following the announcement, we saw new releases of mainstream game series including Battlefield V and Shadow of the Tomb Raider, putting RTX powered graphics in their games. This screenshot below captured from a Battlefield V promotion video ( shows super clear ray traced reflections in water.


However, the recent new game releases branded with RTX graphics mostly use RTX for tracing reflections and shadows (including soft shadow), many other possibilities with real-time ray tracing are still to be explored. Of course, ray traced reflections and shadows can largely enhance the overall graphics quality, considering the importance of these two components and how bad their quality was even using very complex rasterization tricks. But there are still tons of possible applications of real-time ray tracing that can lift the overall graphics quality to a new level. For example, we can achieve more faithful subsurface scattering in translucent objects like marbles and human skin. Current technique use shadow maps to estimate the distanced traveled by light inside the object, which can fall short with concave objects and have artefacts around object edges. But ray tracing simply avoids all artefacts brought by rasterization tricks and everything will appear as they should be.

The thing I want to advocate is using RTX to generate virtual point lights (VPL) and trace shadow rays to them. This technique is known as instant radiosity, introduced by Alexander Keller in 1997. It resembles bidirectional path tracing and photon mapping in the sense that it also traces light paths and records hit positions along the paths. These surface records are then used as point lights that represent discretized indirect lighting. Comparing to photon mapping, instant radiosity is cheap and effective to render smooth diffuse reflection, thanks to the low frequency spatial radiance distribution of point lights. A few hundreds of VPLs are enough to provide indirect lighting from a small light source with reasonable quality (of course, there is singularity problem caused by the point light approximation, but that can be bypassed by setting a minimum distance between point lights and surface points), which often requires more than 1M photons when using photon mapping to eliminate the wavy artefact. As a result, virtual point lights have been used in games for flashlights. An example is the rendering of indirect illumination in rooms from gun mounted flashlights in Gears of War 4. [Malmros, 2017] (talk: The developers used reflection shadow maps [Dachsbacher & Stamminger, 2005] ( to sample single-bounce VPLs and merge VPLs according to some geometric and material heuristics to lower the computational cost. This technique gives real-time single bounce GI (likely unshadowed). The following screenshots from the talk video a comparison between flashlight aiming at the red tapestry and the wall. Clearly, the technique generates reasonable color bleeding as shown by the change of color on the ceiling.


Looking carefully into the talk video, there are some temporal incoherence as the flashlight moves. However it is not obvious in a dark environment like this, especially when using a moving FPS game character. But the render quality can still be improved if the virtual point lights can be generated and evaluated at a lower cost. First, using more VPLs improves the temporal stability and reduces the bright blotches. Also, given strict performance requirements Gears of War 4 probably only used didn’t trace shadow rays to VPLs for indirect illumination shadows, which could be very important given a more complex scene setting. Now with RTX available, we can make VPLs much cheaper, solving the aforementioned problems.

DirectX Ray Tracing

There are several options to get access to the ray tracing function in RTX cards. But the most convenient one for PC gaming development is the DirectX Ray Tracing (DXR) API, which integrates seamlessly with the rasterization pipeline we use everyday. There is a very nice introduction to DXR from Nvidia ( In most concise words, DXR breaks the ray tracing process into three new shaders, “raygen”, “hit” and “miss”. Rendering starts from “raygen” or ray generation, invoked in a grid manner like a compute shasder. In all shaders, calling TraceRay() executes the fixed function hardware scene traversal using an acceleration structure (BVH). Because can a ray can hit any object, a shader table is used to store shading resources for each geometry object. Upon intersection, the entry for the current object can be retrieved from the shader table to determine which shaders and textures to use. With these functions, we can virtually implement all possible ray tracing functions with DXR, except choosing and our own acceleration structure (for example, k-d tree).

Example Implementation

Here I provide a brief code walk through of using DXR to implement the original (brute force) instant radiosity algorithm. Some parts of code is modified from the Microsoft MiniEngine DXR example ( Notice that it is not meant to be interactive as the original instant radiosity is an offline rendering algorithm that goes through all virtual point lights for each pixel, and shooting shadow rays to resolve the visibility. In our experiment we will generate 1 million VPLs for at most two-bounce indirect diffuse reflection and render a 1280x720 instant radiosity image. This means we need to trace 921.6 giga shadow rays against the scene. My DXR program running on an RTX 2080 takes 25 minutes to render a instant radiosity image for Crytek Sponza (262k triangles), about 600 mega rays per second which is quite impressive. Please notice that I a very unoptimzed way of tracing shadow rays (trace shadow rays for one light for all pixels in each frame with which contains a fixed G-buffer overhead). With proper optimization, the speed should at least reach 1 Giga rays per second. The VPL generation process is relatively fast, taking less than 2 ms. Here is the raygen shader I used to generate the VPLs.

To sample light rays from a directional distant light (sun light), I used a function like this to randomly sample a point from the top disk of the bounding cylinder of the scene bounding sphere aligned to the sun light direction (a technique introduced in PBRT). This point is then used as the origin of the light ray, which is then traced through the scene and make diffuse bounces when intersecting with surfaces.

void GenerateRayFromDirectionalLight(uint2 seed, out float3 origin, out float pdf)
	float3 v1, v2;
	CoordinateSystem(SunDirection, v1, v2); // get a local coordinate system
	float2 cd = GetUnitDiskSample(seed + DispatchOffset);
	float3 pDisk = - SceneSphere.w * SunDirection +
            SceneSphere.w * (cd.x * v1 + cd.y * v2);
	origin = pDisk;
	pdf = 1 / (PI * SceneSphere.w * SceneSphere.w);

DXR requires us to specify the ray using a ray description (RayDesc) structure that contains the origin, tMin (min parametric intersection distance along the ray), ray direction and tMax. Additionally, for this algorithm we also need to define a payload structure that records the carried radiance on the light path which I call “alpha”. It is multiplied with the surface albedo and the cosine factor during each surface hit. The payload also stores the current recursion depth. This payload feeds the last argument of TraceRay, which can pass the information from ray generation to hit shader and between bounces.

/// some resource and function definitions

struct RayPayload
	float3 alpha;
	uint recursionDepth;

void RayGen()
	float3 origin;
	float pdf;
        // using the 2D ray dispatch index as random seed
	GenerateRayFromDirectionalLight(DispatchRaysIndex().xy, origin, pdf);
	float3 alpha = SunIntensity / pdf;

	RayDesc rayDesc = { origin,
		FLT_MAX };
	RayPayload rayPayload;
	rayPayload.alpha = alpha;
	rayPayload.recursionDepth = 0;

        // defition of parameters can be found on

	TraceRay(accelerationStructure, RAY_FLAG_NONE, ~0, 0, 1, 0, rayDesc, rayPayload);    

Here is the “closesthit” shader that creates VPLs at surface hits. I use three DX12 linear buffers (StructuredBuffer) to store VPL positions, normals and colors (flux). An atomic counter is incremented and returned each time a new VPL is created to prevent writing to the same position. Following that, a diffuse reflection ray is sampled from the hemisphere centered at the surface normal to generate next bounce. Notice how DXR provides a wide range of fixed functions and variables that stores necessary ray tracing information we need for lighting computation. For example, barycentric coordinates are fetched from “BuiltInTriangleIntersectionAttributes” and the intersection distance is provided in RayTCurrent().

/// some resource and function definitions

void Hit(inout RayPayload rayPayload, in BuiltInTriangleIntersectionAttributes attr)
	uint materialID = MaterialID;
	uint triangleID = PrimitiveIndex();
	RayTraceMeshInfo info = meshInfo[materialID];

        ///fetch texture coordinates (uv0, uv1, uv2), vNormal), vBinormal, vTangent for triangle vertices

	float3 bary = float3(1.0 - attr.barycentrics.x - attr.barycentrics.y, attr.barycentrics.x, attr.barycentrics.y);
	float2 uv = bary.x * uv0 + bary.y * uv1 + bary.z * uv2;

	float3 worldPosition = WorldRayOrigin() + WorldRayDirection() * RayTCurrent();
	uint2 threadID = DispatchRaysIndex().xy + DispatchOffset;
	const float3 rayDir = normalize(-WorldRayDirection());
	uint materialInstanceId = info.m_materialInstanceId;
	const float3 diffuseColor = g_localTexture.Sample(defaultSampler, uv).rgb;
	float3 normal = g_localNormal.SampleLevel(defaultSampler, uv, 0).rgb * 2.0 - 1.0;
	float3x3 tbn = float3x3(vTangent, vBinormal, vNormal);
	normal = normalize(mul(normal, tbn));

        // sample a diffuse bounce
	float3 R = GetHemisphereSampleCosineWeighted(threadID * (rayPayload.recursionDepth+1), normal);
	float3 alpha = rayPayload.alpha;

        // attenuate carried radiance
	alpha *= diffuseColor;

        // increment atomic counter
	uint VPLid = g_vplPositions.IncrementCounter();

        // store a new vpl
	g_vplPositions[VPLid] = worldPosition;
	g_vplNormals[VPLid] = normal;
	g_vplColors[VPLid] = alpha;

        //push to vpl storage buffer
	if (rayPayload.recursionDepth < MAX_RAY_RECURSION_DEPTH)
		TraceNextRay(worldPosition + epsilon * R, R, alpha, rayPayload.recursionDepth);

Finally, we can gather the VPL contribution for all visible pixels to generate an irradiance buffer. Notice that texPosition and texNormal are surface world position and normal from the G buffer.

The irradiance $E$ can be calcuated as $ E = \Phi \frac{<n_s, r><n_l, -r>}{||r||^2} $, where $\Phi$ is VPL flux, $n_s$, $n_l$ are surface normal and VPL normal, $p_s$, $p_l$ are surface and VPL positions, and $r$ is the normalized shadow ray direction.

The following shader code shows computing the contribution from one VPL, indexed by VPLId. A shadow ray is traced to resolve the visibility between current pixel and VPL, and the max ray T is set to the light distance which means any hit is an occlusion. This message can be stored in the ray payload as a boolean. Because we are tracing shadow rays, it is better to set RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH argument in TraceRay() to avoid the unnecessary search of closest hit.

/// some resource and function definitions

/// an anyhit shader setting payload.IsOccluded to translucent

void RayGen()
	float3 output = 0;

	int2 screenPos = DispatchRaysIndex().xy;

	float3 SurfacePosition = texPosition[screenPos].xyz;
	float3 lightPosition = vplPositions[VPLId].xyz;
	float3 lightDir = lightPosition - SurfacePosition;
	float lightDist = length(lightDir);
	lightDir = lightDir / lightDist; //normalize

	// cast shadow ray
	RayDesc rayDesc = { SurfacePosition,
			  0.1f, //bias
			  lightDist };

	ShadowRayPayload payload;
	payload.IsOccluded = false;

		~0, 0, 1, 0, rayDesc, payload);
	if (!payload.IsOccluded) // no occlusion
		float3 SurfaceNormal = texNormal[pixelPos].xyz;

		float3 lightNormal = vplNormals[VPLId].xyz;
		float3 lightColor = vplColors[VPLId].xyz;

		output = max(dot(SurfaceNormal, lightDir), 0.0) * lightColor;
		output *= max(dot(lightNormal, -lightDir), 0.0) / (lightDist*lightDist + CLAMPING_BIAS);

	irradianceBuffer[pixelPos] += output / numVPLs;

After we iterate through all VPLs, we can modulate the irradiance buffer with the surface albedo to produce the final indirect illumination and add that to the direct illumination. And here is the final image.


If you look into VPL-based GI literature, you’ll find a ton of methods that approximates the instant radiosity with a dramatic reduction in evaluation cost. But none of them can reach real time performance yet. But we’ll continue with this topic to see what RTX can bring us for real time global illumination.


Dachsbacher, C., & Stamminger, M. (2005, April). Reflective shadow maps. In Proceedings of the 2005 symposium on Interactive 3D graphics and games (pp. 203-231). ACM.

Malmros, J. (2017, July). Gears of War 4: custom high-end graphics features and performance techniques. In ACM SIGGRAPH 2017 Talks (p. 13). ACM.