[GAPT Series - 13] Final Remarks

Posted by Daqi's Blog on June 30, 2017

13.1 Summary

This series focuses on how to improve the solution to the real-time path tracing problem by introducing and discussing possible optimizations in 3 categories – SAS, sampling and SIMD, which are implemented in a program with real-time rendering and interaction capability. While the SIMD optimization bases itself on the parallel computing model in GPGPU and aimed specially for the real-time requirement, the first two categories – SAS and sampling – are not hardware dependent and also used in off-line renderers as they are defined in the domain of a single computing thread. However, it is also possible to improve the models involved in these two categories to achieve better collaboration with the GPGPU model. For SAS, as a common bottleneck of ray tracing processes, SAH based kd-tree and BVH were introduced for being the optimum of their peers in minimizing expected global cost of ray-primitive intersection test and their indispensable functions in different applications, and optimization techniques on such data structure including triangle clipping and short stack traversal for kd-tree and node refitting for dynamic BVH are also discussed with implementation details. In the chapter for sampling, different context-based optimization methods on Monte Carlo algorithm which are all aimed for decrease variance in rendering – importance sampling on BSDF, next event estimation for direct lighting, multiple importance sampling combining the previous two, and bidirectional path tracing for difficult lighting conditions – were introduced. Moreover, Metropolis Light Transport as a modification of the basic Monte Carlo process based on Markov Chain was introduced and some implementation details on GPU were shared. For SIMD optimization, data structure rearrangement, code-level thread divergence reduction, thread compaction as three different types were illustrated with codes and test cases. A more efficient ray-triangle intersection solution which transforms the problem space was cited for its contribution on the performance increase of our program. More importantly, we proposed a new GPU construction algorithm for SAH kd-tree in full details, which turns out to help greatly reducing the initialization overhead for complex model. In addition, the underlying mechanism of rendering effects chosen and supported in our program – surface-to-surface reflection/refraction, volume rendering, and subsurface scattering were analyzed to clarify possible complications in usage. For most methods we introduced and discussed, test cases on our path tracer were provided to verify the ideas. Finally, we benchmarked our program with the path tracing demo in NVIDIA’s Optix engine and a free mainstream path tracer to prove that our program has a large advantage in rendering simple scenes like the Cornell Box by improving the performance by up to 30% and slightly outperforms a free mainstream path tracer for a complex rendering of a car, which means it is at least competitive with most of the mainstream path tracers nowadays in real-time rendering of models with industrial complexity. By analyzing, gathering, testing, and integrating different optimization techniques into a whole process, and choosing the correct rendering methods, we can efficiently produce aesthetically-pleasing, photorealistic results.

13.2 Limitations & Recommendations for Further Work

Given the immense potential of GPGPU, it is possible to see path tracing offering a photorealistic, film standard experience, replacing rasterization-based graphics to be the gaming standard in the future as the hardware performance continues to multiply. However, improvements in algorithm and software structure are also necessary to reduce as much workload as possible to accelerate the coming of such day. This thesis addresses many distinctive issues of real-time path tracing such as large thread divergence and dynamic geometry. However, many problems that may appear in future real-world applications of path tracing have not been considered due to the time limit. One such problem is to efficiently render a large set of animation data which may contain particle system or complex deformation. Another problem is the insufficient optimization of the spatial acceleration structure which is a bottleneck in ray-traced graphics. New algorithms or hardware need to be developed to continuously improve the traversal speed and update or rebuild the SAS with minimal efforts. In addition, better parallelization methods are still required for some algorithms with relatively obscure parallelizability but tremendous serial performance like Metropolis Light Transport, even though many have been developed. Moreover, parsing can be transferred to the GPU to greatly reduce the initialization time of geometrically complex scenes.


Ashikhmin, M., & Shirley, P. (2000). An anisotropic Phong BRDF model. Journal of graphics tools, 5(2), 25-32.

Beason, K. (2007). Smallpt: Global Illumination in 99 lines of C++. Retrieved from http://www.kevinbeason.com/smallpt/

Chandrasekhar, S. (1960). Radiative Transfer. New York: Dover Publications. Originally published by Oxford University Press, 1950.

Chandrasekhar, S. (1960). The stability of non-dissipative Couette flow in hydromagnetics. Proceedings of the National Academy of Sciences, 46(2), 253-257.

Cook, R. L., & Torrance, K. E. (1982). A reflectance model for computer graphics. ACM Transactions on Graphics (TOG), 1(1), 7-24.

Foley, T., & Sugerman, J. (2005, July). KD-tree acceleration structures for a GPU raytracer. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (pp. 15-22). ACM.

Henyey, L. G., & Greenstein, J. L. (1941). Diffuse radiation in the galaxy. The Astrophysical Journal, 93, 70-83.

Jensen, H. W., Marschner, S. R., Levoy, M., & Hanrahan, P. (2001, August). A practical model for subsurface light transport. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques (pp. 511-518). ACM.

Kajiya, J. T. (1986, August). The rendering equation. In ACM Siggraph Computer Graphics (Vol. 20, No. 4, pp. 143-150). ACM.

Kopta, D., Ize, T., Spjut, J., Brunvand, E., Davis, A., & Kensler, A. (2012, March). Fast, effective BVH updates for animated scenes. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (pp. 197-204). ACM.

Lafortune, E. P., & Willems, Y. D. (1993). Bi-directional path tracing.

NVIDIA. (2015). Memory Transactions. NVIDIA® Nsight™ Development Platform, Visual Studio Edition 4.7 User Guide. Retrieved from http://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/ report/cudaexperiments/sourcelevel/memorytransactions.htm

NVIDIA. (2017). CUDA Toolkit Documentation. Retrieved From http://docs.nvidia.com/cuda/thrust/#axzz4dK4GrjBF

Pauly, M. (1999). Robust Monte Carlo Methods for Photorealistic Rendering of Volumetric Effects (Doctoral dissertation, Master’s Thesis, Universität Kaiserslautern).

Pharr, M., Jakob, W., & Humphreys, G. (2011). Physically based rendering: From theory to implementation. Second Edition. Morgan Kaufmann.

Santos, A., Teixeira, J. M., Farias, T., Teichrieb, V., & Kelner, J. (2012). Understanding the efficiency of KD-tree ray-traversal techniques over a GPGPU architecture. International Journal of Parallel Programming, 40(3), 331-352.

Schlick, C. (1994, August). An Inexpensive BRDF Model for Physically‐based Rendering. In Computer graphics forum (Vol. 13, No. 3, pp. 233-246). Blackwell Science Ltd.

Schmittler, J., Woop, S., Wagner, D., Paul, W. J., & Slusallek, P. (2004, August). Realtime ray tracing of dynamic scenes on an FPGA chip. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (pp. 95-106). ACM.

Veach, E. (1997). Robust monte carlo methods for light transport simulation (Doctoral dissertation, Stanford University).

Vinkler, M., Havran, V., & Bittner, J. (2014, May). Bounding volume hierarchies versus kd-trees on contemporary many-core architectures. In Proceedings of the 30th Spring Conference on Computer Graphics (pp. 29-36). ACM.

Wald, I., & Havran, V. (2006, September). On building fast kd-trees for ray tracing, and on doing that in O (N log N). In Interactive Ray Tracing 2006, IEEE Symposium on (pp. 61-69). IEEE.

Walter, B., Marschner, S. R., Li, H., & Torrance, K. E. (2007, June). Microfacet models for refraction through rough surfaces. In Proceedings of the 18th Eurographics conference on Rendering Techniques (pp. 195-206). Eurographics Association.

Ward, G. J. (1992). Measuring and modeling anisotropic reflection. ACM SIGGRAPH Computer Graphics, 26(2), 265-272.

Zhou, K., Hou, Q., Wang, R., & Guo, B. (2008). Real-time kd-tree construction on graphics hardware. ACM Transactions on Graphics (TOG), 27(5), 126.


The following pictures show the result of rendering a BMW M6 car for one minute in Cycles Render, one minute in our path tracer, and one hour in our path tracer, successively. The BMW M6 car model was modeled by Fred C. M’ule Jr in 2006, under CC-Zero (public domain) license, downloaded from http://www.blendswap.com/blends/view/3557.