Daqi's Blog


A CS geek, a travel enthusiast, a music arranger
News: The new version of my CUDA Interactive Path Tracing demo is out! Download it and try it out! It is very fast!
[demo (requires Nvidia graphics cards)]
News: I've uploaded the demo video for my GPU Path Tracer to YouTube. Check it out! I've also uploaded the slides and the paper of the project to my website. Enjoy!
[slides] [paper]
News: I have put my two original technical series online; click to check out what I was/am researching:
Physically-based Rendering Project [demo]
GPU Accelerated Path Tracing Project [demo (requires Nvidia graphics cards)]

Finding the Minimal Support Volume for 3D Printing


Latest Techniques in Interactive Global Illumination


Stencil Buffer Trick for Volume Deferred Shading


A Way of Rendering Crescent-shaped Shadows under Solar Eclipse


[GAPT Series - 13] Conclusions


Finding the Minimal Support Volume for 3D Printing

This post shares a very interesting topic I learned during my internship at the Graphics Lab, Xiamen University, last summer. Because 3D printing is additive manufacturing, upper layers need to be glued onto lower layers, which creates a very strict constraint: we can only print a layer within the boundary of the layer below it (though in practice, some materials have enough inner cohesion to allow part of a layer to protrude from its supporting structure).

...


Latest Techniques in Interactive Global Illumination

This is my presentation at the University of Utah graphics seminar, which is about a very exciting topic - interactive GI!

...


Stencil Buffer Trick for Volume Deferred Shading

Deferred shading is known to largely boost rendering efficiency when dealing with a large number of lights. The mechanism is very simple: separate geometry complexity from lighting. To do this, the G-buffer, an array of textures that typically stores the position, material color, and normal of the points to shade, is rendered in a first pass from the camera's point of view. Then an orthographic camera is used to render a quad that covers the whole screen. The normalized (0-1) screen coordinates are used to retrieve the geometry/material data at each point of the screen, which is fed into the lighting function. In this way, we avoid producing tons of fragments from the projected scene geometry and instead shade only those that are visible.
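
To make the two passes concrete, here is a minimal host-side sketch in C++/OpenGL. It assumes a G-buffer FBO with position/normal/albedo attachments, a geometry shader program, a lighting shader program with its sampler uniforms already set, and a full-screen quad VAO; all of these names (gBufferFBO, gPosition, drawScene, ...) are illustrative, not taken from the post.

    // Pass 1: render the scene geometry into the G-buffer.
    glBindFramebuffer(GL_FRAMEBUFFER, gBufferFBO);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    drawScene(geometryShader);                 // hypothetical helper: writes position/normal/albedo + depth

    // Pass 2: draw one full-screen quad; each fragment fetches its G-buffer texels
    // by its normalized screen coordinates and runs the lighting function once.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glUseProgram(lightingShader);
    glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, gPosition);
    glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, gNormal);
    glActiveTexture(GL_TEXTURE2); glBindTexture(GL_TEXTURE_2D, gAlbedo);
    glBindVertexArray(fullScreenQuadVAO);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);     // lighting cost is per visible pixel, not per scene fragment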

However, imagine we have a large group of lights. We still have to go through the whole list of lights for each screen pixel. With a more physically based lighting model, in which each light has an influence radius (a consequence of the inverse-squared falloff of an ideal point light source), fragments outside a certain light's influence radius waste time waiting for other fragments in the same batch that take a different branch. We know that branching is bad for the GPU, so this leads to a severe time penalty. Many techniques have been proposed to alleviate this problem. Tiled deferred shading is a very popular method that most of you have probably heard of: it partitions the screen into tiles and creates a “light list” for each tile using only the lights that intersect that tile. This is of course an elegant method. However, we always need to do some preprocessing whenever a new group of lights is generated (if there is a need to).

A simpler solution is volume deferred shading. We just need to render a “light volume” for each light, which, as you might guess, can be a sphere with a radius equal to the light's influence radius. For example, in OpenGL, we just need to create a list of vertices/indices of a sphere and prepare a model matrix for each light (which is simply a scaling and a translation). While rendering, we perform the draw call n times, where n is the number of lights. Each light volume produces fragments that cover exactly the region of the screen where pixels can possibly be shaded by that light. Of course, by doing this we lose the depth dimension: we have to explicitly test the fragments and make sure they are in roughly the same depth range as the light (which is only a necessary condition). Tiled rendering also requires such testing, but if lights are scattered everywhere in a very “deep” scene, the lights to be tested there are significantly fewer. Still, because no preprocessing is required, volume deferred shading has quite competitive performance in most cases.
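
As a small illustration, a unit-sphere mesh plus one model matrix per light is all the geometry setup needed. Here is a sketch using GLM; the function and variable names are mine, not from the post.

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // Model matrix for one light volume: scale a unit sphere (centered at the
    // origin) to the light's influence radius, then move it to the light position.
    glm::mat4 lightVolumeModel(const glm::vec3& lightPos, float influenceRadius) {
        glm::mat4 m(1.0f);
        m = glm::translate(m, lightPos);
        m = glm::scale(m, glm::vec3(influenceRadius));
        return m;   // m = T * S, so vertices are scaled first, then translated
    }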

Wait! We should not issue n separate draw calls. A better way is to use instanced rendering, which is supported on every modern graphics card, to avoid the latency caused by lots of communication between CPU and GPU. Another important detail: depth write should be disabled and additive blending should be enabled. Depth write must be disabled because light volumes are not real geometry. When two light volumes are close to each other and illuminate the same region of the scene, we don't want them to occlude each other so that some parts end up shaded by only one of the light volumes.
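
A minimal sketch of the render state and the instanced draw call, assuming the sphere VAO, the per-instance attribute buffers (model matrix, light position/color/radius), and the light-volume shader are already prepared; the identifiers are illustrative only.

    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);          // additive: overlapping lights accumulate
    glDepthMask(GL_FALSE);                // light volumes must not write depth
    glUseProgram(lightVolumeShader);
    glBindVertexArray(sphereVAO);         // per-instance data bound as instanced attributes
    glDrawElementsInstanced(GL_TRIANGLES, sphereIndexCount,
                            GL_UNSIGNED_INT, nullptr, numLights);   // one draw call for all lights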

If you implement the volume deferred shading described above directly, you will immediately discover that something goes wrong. When you approach a light's location (with a moving camera), at some point the screen suddenly darkens. This is because, whether you turn backface culling on or off, you face a dilemma: you either shade the pixels twice as bright when you are outside the light volume, or shade nothing at all when you are inside it.

It turns out that this situation can easily be solved by switching the culling mode to front-face culling. However, this is not good enough. We can actually keep the Z buffer created by the G-buffer pass and use this information to reject fragments that are not intersected by light volumes. Here is a nice stencil buffer trick introduced by Yuriy O'Donnell (2009). The idea is to use the stencil buffer as a counter that records whether the front face of a light volume is behind the visible scene geometry (so that it has no chance to shade the pixel). This is achieved by rendering only front faces (with color writes disabled) in a first pass and adding 1 to the stencil value of the pixels that fail the Z test. The other situation is that the back face of a light volume is in front of the visible scene geometry, which is handled by the second pass: render only back faces and use a Greater/Equal depth test to further filter the pixels among those with a zero stencil value (i.e., those that already passed the first test). In this way we keep only the light-volume pixels that “fail” the original “LESS” depth test, which intuitively corresponds to the scene-geometry pixels that intersect the light volume. Notice that this trick also works when you are inside a light volume, in which case no front faces are rendered at all (being inside a light volume and having its front face behind the visible geometry cannot both hold at the same time!), leaving a zero stencil value so that the Greater/Equal depth test alone filters the pixels to be shaded. Of course, in either pass we need to disable Z writes; we certainly don't want the light volumes bumping into each other.
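
Below is a minimal sketch of the two stencil passes as I understand them, shown for a single light volume (with many lights, the stencil handling is typically done per light or reset between lights). It assumes the depth buffer from the G-buffer pass is bound, depth testing is enabled, depth writes stay off, and drawLightVolume() is a hypothetical helper.

    glEnable(GL_STENCIL_TEST);
    glDepthMask(GL_FALSE);
    glClear(GL_STENCIL_BUFFER_BIT);

    // Pass 1: front faces only, no color writes; increment stencil where the
    // front face fails the normal LESS depth test (i.e. it is behind the scene).
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glEnable(GL_CULL_FACE);
    glCullFace(GL_BACK);                       // keep front faces
    glDepthFunc(GL_LESS);
    glStencilFunc(GL_ALWAYS, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_INCR, GL_KEEP);    // (stencil fail, depth fail, depth pass)
    drawLightVolume();

    // Pass 2: back faces only; shade where the stencil is still 0 and the back
    // face lies at or behind the stored depth (GEQUAL), i.e. the volume straddles
    // the visible geometry at this pixel.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glCullFace(GL_FRONT);                      // keep back faces
    glDepthFunc(GL_GEQUAL);
    glStencilFunc(GL_EQUAL, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    drawLightVolume();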

[Figure: The original diagram used by Yuriy O'Donnell.]

While this trick definitely increases efficiency, especially when the lighting situation is very complex, we can do even better. Modeling a detailed polygonal sphere often creates a large number of vertices that clog the pipeline. Instead, we can use a very coarse polygonal sphere (e.g., with 4 vertical/horizontal slices) with a slightly larger radius to ensure that the light volume is still bounded correctly. We can even use just a bounding cube of the light volume! The simplest option is to use a single quad; however, that gives up some of the aforementioned depth-test benefits and involves some fairly involved projective geometry. Just for fun, I also prepared a derivation of the axis-aligned screen-space bounding quad of the projected sphere.

[Figures: derivation of the axis-aligned screen-space bounding quad of the projected sphere.]

...


A Way of Rendering Crescent-shaped Shadows under Solar Eclipse

Happy new year! I haven't posted for a long time, but I have done many exciting projects in the last half year and I'll upload some of them soon. The first project I want to share with you is the render I created for the Utah Teapot Rendering Competition. We were all awed by the great solar eclipse on August 21st. Do you remember the crescent-shaped shadows cast by tree leaves? They were so beautiful that the first time I saw them, I wanted to render them with ray tracing. Before working on this competition, I thought there must be some complex math to figure out in order to simulate this rare phenomenon. However, the problem turned out to be embarrassingly simple: we can just model the actual geometry of the sun and the moon and trace rays. You might think that is a crazy idea. In fact, instead of creating a sun with a diameter of 1.4 million kilometers and putting it 150 million kilometers away, we can simply put a 1-unit-wide sun 100 units away from our scene, where 1 unit is approximately how big our scene is. This gives almost the same result. Then we use the same kind of trick to place a moon with a slightly smaller diameter slightly in front of the sun, so that the sun is eclipsed into a crescent shape. By treating the sun as an isotropic spherical emitter, the moon as a diffuse occluding sphere, and using a tree model with detailed alpha-masked leaf textures, I got results that are surprisingly good and also fast to compute. In such a simple way, I created a nice image of a glass Utah teapot sitting under a tree on a lawn behind the Warnock Engineering Building, with crescent-shaped leaf shadows in the background.
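
A quick back-of-envelope check (mine, not from the original post) that the scaled-down sun subtends roughly the same angle as the real one, which is what makes the penumbra, and hence the crescent shadows, look right:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double rad2deg = 180.0 / 3.14159265358979;
        // Real sun: ~1.4 million km in diameter, ~150 million km away.
        double realSun   = std::atan2(0.7e6, 1.5e8) * rad2deg;   // ~0.27 degrees
        // Scene sun: 1 unit in diameter, 100 units away.
        double scaledSun = std::atan2(0.5, 100.0)   * rad2deg;   // ~0.29 degrees
        std::printf("angular radius: real %.2f deg, scaled %.2f deg\n", realSun, scaledSun);
        return 0;
    }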

[Figure: The render. Click to enlarge.]

[Figure: See the "sun" and "moon"? This is really how it works!]

[Figure: A close-up showing the crescent shapes caused by the eclipse.]

Hope you like this project. Sometimes complex effects are really that simple!

Remark:

The tree model and the grass model (including their textures) are downloaded from TurboSquid.com
Grass: https://www.turbosquid.com/FullPreview/Index.cfm/ID/868103
Tree: https://www.turbosquid.com/FullPreview/Index.cfm/ID/884484

TurboSquid Royalty Free License
https://blog.turbosquid.com/royalty-free-license/

The pavement texture is downloaded from TextureLib.com
http://texturelib.com/texture/?path=/Textures/brick/pavement/brick_pavement_0099

License: http://texturelib.com/license/

The environment map is a panorama taken in the vicinity of the Warnock Engineering Building by Cameron Porcaro and uploaded to Google Street View. Under Google's Terms of Use this is considered fair use, since it is not for commercial purposes.

...


[GAPT Series - 13] Conclusions

...


[GAPT Series - 12] Benchmarking

...


[GAPT Series - 11] SIMD Optimization (cont.)

As mentioned half a year ago, apart from data structure rearrangement and thread divergence reduction, we can also optimize SIMD performance by doing thread compaction. The first section below introduces how I implemented it using the CUDA Thrust API, followed by a proposal of a new method for parallel construction of kd-trees on the GPU.
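
As a preview, here is a minimal sketch of the kind of stream compaction Thrust makes easy: keeping only the indices of rays that are still alive after a bounce, so the next kernel launch works on a dense array. The structure and names are illustrative, not taken from the thesis code; it compiles as C++ under nvcc.

    #include <thrust/device_vector.h>
    #include <thrust/copy.h>

    struct IsActive {
        const int* alive;   // 1 = ray still bouncing, 0 = terminated (filled by the bounce kernel)
        __host__ __device__ bool operator()(int rayIdx) const { return alive[rayIdx] != 0; }
    };

    // Compact the list of ray indices processed in the previous bounce.
    thrust::device_vector<int> compactRays(const thrust::device_vector<int>& rayIndices,
                                           const thrust::device_vector<int>& aliveFlags)
    {
        thrust::device_vector<int> compacted(rayIndices.size());
        IsActive pred{ thrust::raw_pointer_cast(aliveFlags.data()) };
        auto end = thrust::copy_if(rayIndices.begin(), rayIndices.end(),
                                   compacted.begin(), pred);
        compacted.resize(end - compacted.begin());
        return compacted;
    }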

...


[GAPT Series - 10] Rendering Effects

Before moving on to the discussion of SIMD optimization, this chapter briefly introduces the rendering effects supported by the path tracer; their importance comes from the fact that they are the direct application of the sampling methods discussed before.

...


[GAPT Series - 9] Sampling Algorithms (Cont.)

This post corrects some misconceptions in the earlier section [GAPT Series - 3] Path Tracing Algorithm and also introduces some new rendering methods.

...


[GAPT Series - 8] Spatial Acceleration Structure (Cont.)

Now we've added BVH as an alternative choice of SAS! It is necessary for real-time ray tracing against dynamic scene geometry with complex moving meshes, as it maintains the interior hierarchy of each mesh and only updates the exterior hierarchy, which is usually much simpler to do.

...


[GAPT Series - 7] Overview of Software Workflow

...


[GAPT Series - 6] Real-time Path Tracing on GPU - Introduction

Hello, it's been half a year since the last update and now I'm back! I've finally finished the project and renamed it to Real-time Path Tracing on GPU. As I told you before, the second part tries to push the boundary of GPU accelerated path tracing to approach real-time performance. I want to clarify here that by “real-time” I don't mean you can match the image quality of mainstream game graphics at a speed comparable to rasterized graphics using a ray-tracing technique. However, with proper constraints (limits on BRDF types, resolution, and model complexity) you can get pretty close to real-time performance for effects that would require tons of texture tricks in a rasterization setting (e.g., the 512x512 Cornell box, as I will show below). Besides introducing the optimizations I've used for such a great leap in speed, I will analyze the gap between current performance and the ideal performance we want to reach in the future, and try to provide some suggestions for improving this technique.

...


[GAPT Series - 5] Current Progress & Research Plan

Currently, I have implemented a Monte Carlo path tracer (demo) with a full range of surface-to-surface lighting effects, including specular reflection on anisotropic materials simulated by a GGX-based Ward model. A scene definition text file is read from the user; its format is modified from the popular Wavefront OBJ format by adding material descriptions and camera parameters. I use triangles as the only primitive for simplicity and generality. Integrated with OpenGL and using successive refinement, the path tracer can display the rendering result in real time. Optimization methods include algorithmic methods: a SAH-based kd-tree, short-stack kd-tree traversal, ray-triangle intersection in “unit triangle space”, and next event estimation (explicit light sampling); and hardware-oriented methods: adoption of GPU-friendly data structures with more coalesced memory access and better cache use, reduction of thread divergence to boost warp efficiency, etc.

...


[GAPT Series - 4] SIMD Optimization

With each thread rendering one screen pixel, path tracing can be solved in an embarrassingly parallel way, without any inter-thread communication. However, it is hard to exploit the full capability of single-instruction-multiple-data (SIMD) hardware. There is very little locality in the memory access pattern due to generally incoherent scene geometry, which means almost all scene data needs to be stored in global memory or texture memory. Even when the first ray hits form a coherent pattern, the subsequent bounces can diverge arbitrarily. Moreover, sampling with the Russian roulette method cannot avoid branching, which implies thread divergence. Nevertheless, two types of optimization based on the CUDA architecture, data structure rearrangement and thread divergence reduction, can be applied to reduce the overall rendering time.
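
For reference, here is the flavor of the data structure rearrangement, sketched in C++/CUDA style (illustrative only, not the thesis code): with a structure of arrays, thread i of a warp loads ox[i], oy[i], ..., so consecutive threads touch consecutive floats and the loads coalesce, whereas with an array of structures the same loads are strided by sizeof(Ray).

    // Array-of-structures element: thread i reads rays[i].ox, rays[i].oy, ... (strided access).
    struct Ray {
        float ox, oy, oz;   // origin
        float dx, dy, dz;   // direction
    };

    // Structure-of-arrays layout: thread i reads ox[i], oy[i], ... (coalesced access).
    struct RaysSoA {
        float *ox, *oy, *oz;   // origins, one device array per component
        float *dx, *dy, *dz;   // directions
    };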

...


[GAPT Series - 3] Path Tracing Algorithm

...


[GAPT Series - 2] Spatial Acceleration Structure

...


[GAPT Series - 1] Introduction to GPU Accelerated Path Tracing

This series records the progress of my final year paper - GPU Accelerated Path Tracing. It will be divided into 6 chapters.

  • 1 Introduction
  • 2 Spatial Acceleration Structure
  • 3 Path Tracing Algorithm
  • 4 SIMD Optimization
  • 5 Current Progress & Research Plan
...


[PBR Series - 6] Improvement of PBR

...


[PBR Series - 5] Real-time Skin Translucency

...


[PBR Series - 4] High Dynamic Range Imaging and PBR

...


[PBR Series - 3] Subsurface Scattering – Human Skin as Example

...


[PBR Series - 2] Image Based Lighting

...


[PBR Series - 1] Introduction to PBR

This series introduces the implementation of physically based rendering in lightweight game engines using GLSL shaders and precomputed textures. With these techniques, most physically based surface-to-surface reflection can be simulated across a variety of materials ranging from metals to dielectrics under HDR lighting, and subsurface scattering can be simulated as well, with human skin as an example, so that you can achieve high-fidelity game objects with minimal resources. Most of the content is collected from my project report as an intern R&D engineer at a game company from May to Nov 2015. I want to share what I learned, what I thought about, and what I did on this topic, to make it easier for people trying to dig into it. There are 6 episodes.

...


山西游记之一 (A Travel in Shanxi: Part 1)