[GAPT Series - 13] Conclusions

Posted by Daqi's Blog on June 30, 2017

13.1 Summary

This series looked at how to improve real-time path tracing through optimizations in three categories: spatial acceleration structures (SAS), sampling, and SIMD. All of them are implemented in a program capable of real-time rendering and interaction. The SIMD optimizations build on the GPGPU parallel computing model and are aimed specifically at the real-time requirement. The first two categories, SAS and sampling, are not hardware dependent and are also used in offline renderers, since they are defined within a single computing thread. However, the models in these two categories can still be adapted to work better with the GPGPU model.

For SAS, a common bottleneck of ray tracing, the kd-tree and BVH based on the surface area heuristic (SAH) were introduced because they are the best of their peers at minimizing the expected global cost of ray-primitive intersection tests, and because each is indispensable in different applications. Optimization techniques for these data structures, including triangle clipping and short-stack traversal for the kd-tree and node refitting for the dynamic BVH, were also discussed with implementation details.

In the chapter on sampling, several context-based optimizations of the Monte Carlo algorithm were introduced, all aimed at decreasing variance in rendering: importance sampling of the BSDF, next event estimation for direct lighting, multiple importance sampling combining the previous two, and bidirectional path tracing for difficult lighting conditions. Metropolis Light Transport, a modification of the basic Monte Carlo process based on Markov chains, was also introduced, and some details of its GPU implementation were shared.

For SIMD optimization, three types of techniques were illustrated with code and test cases: data structure rearrangement, code-level thread divergence reduction, and thread compaction.

A more efficient ray-triangle intersection method that transforms the problem space was cited for its contribution to our program's performance. More importantly, we proposed a new GPU construction algorithm for the SAH kd-tree in full detail, which turned out to greatly reduce the initialization overhead for complex models. In addition, the underlying mechanisms of the rendering effects supported in our program (surface-to-surface reflection/refraction, volume rendering, and subsurface scattering) were analyzed to clarify possible complications in usage. For most of the methods discussed, test cases on our path tracer were provided to verify the ideas.

Finally, we benchmarked our program against the path tracing demo in NVIDIA's OptiX engine and a free mainstream path tracer. The results show that our program has a large advantage in rendering simple scenes like the Cornell Box, improving performance by up to 30%, and that it slightly outperforms the free mainstream path tracer on a complex rendering of a car. This suggests it is at least competitive with most mainstream path tracers today in real-time rendering of models with industrial complexity. By analyzing, gathering, testing, and integrating different optimization techniques into a whole process, and by choosing the right rendering methods, we can efficiently produce aesthetically pleasing, photorealistic results.
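
As a reminder of what the SAH criterion mentioned in this summary actually evaluates, here is a minimal sketch in Python. It is written for brevity and is not the GPU code used in the series. The names `AABB`, `sah_split_cost`, and `leaf_cost`, and the default cost constants `c_trav` and `c_isect`, are illustrative assumptions. The formula itself is the usual one: the cost of a split is one traversal step plus the intersection cost of each child, weighted by the probability that a ray hitting the parent also hits that child (estimated as the ratio of surface areas).

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AABB:
    """Axis-aligned bounding box (hypothetical helper type for this sketch)."""
    lo: Vec3  # minimum corner
    hi: Vec3  # maximum corner

    def surface_area(self) -> float:
        dx, dy, dz = (h - l for l, h in zip(self.lo, self.hi))
        return 2.0 * (dx * dy + dy * dz + dz * dx)


def leaf_cost(n_prims: int, c_isect: float = 1.0) -> float:
    """Expected cost of making a leaf: test every primitive in the node."""
    return c_isect * n_prims


def sah_split_cost(parent: AABB, left: AABB, right: AABB,
                   n_left: int, n_right: int,
                   c_trav: float = 1.0, c_isect: float = 1.0) -> float:
    """Expected cost of splitting `parent` into `left` and `right`.

    c_trav and c_isect are assumed relative costs of one traversal step
    and one ray-primitive test; real builders tune them empirically.
    """
    sa_parent = parent.surface_area()
    p_left = left.surface_area() / sa_parent    # P(ray hits left | hits parent)
    p_right = right.surface_area() / sa_parent  # P(ray hits right | hits parent)
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)


# Example: a 2x1x1 box holding 4 primitives, split in the middle along x.
parent = AABB((0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
left = AABB((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
right = AABB((1.0, 0.0, 0.0), (2.0, 1.0, 1.0))
split_is_worthwhile = sah_split_cost(parent, left, right, 3, 1) < leaf_cost(4)
```

A builder would evaluate `sah_split_cost` for each candidate plane, keep the cheapest, and split only if that cost is below `leaf_cost` for the node.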

13.2 Limitations & Recommendations for Further Work

Given the immense potential of GPGPU, path tracing may eventually offer a photorealistic, film-standard experience and replace rasterization-based graphics as the gaming standard, as hardware performance continues to multiply. However, improvements in algorithms and software structure are also needed to reduce the workload as much as possible and bring that day closer. This thesis addresses many distinctive issues of real-time path tracing, such as large thread divergence and dynamic geometry. Still, many problems that may appear in future real-world applications of path tracing were not considered due to time constraints. One such problem is efficiently rendering large sets of animation data, which may contain particle systems or complex deformations. Another is the insufficient optimization of the spatial acceleration structure, which is a bottleneck in ray-traced graphics: new algorithms or hardware need to be developed to keep improving traversal speed and to update or rebuild the SAS with minimal effort. In addition, better parallelization methods are still required for algorithms that are hard to parallelize but perform very well serially, such as Metropolis Light Transport, even though many such methods have already been developed. Finally, parsing could be moved to the GPU to greatly reduce the initialization time of geometrically complex scenes.


The following pictures show, in order, the result of rendering a BMW M6 car for one minute in Cycles Render, for one minute in our path tracer, and for one hour in our path tracer. The BMW M6 model was created by Fred C. M’ule Jr in 2006 and released under the CC-Zero (public domain) license; it was downloaded from http://www.blendswap.com/blends/view/3557.