Mittwoch, 27. Mai 2015

Sparse Voxel Octree Raytracing based Occlusion Culling (Theory)

I will share some thoughts on how the next generation of block based sandbox games rendering (triangle based) and creators pipelines might look like. In current games, overdraw is the most limiting factor when drawing high detailed scenes - that means, occlusions are not determined 100% accurate. Further, it is necessary to traverse some data structures on the CPU to create a list of objects to be rendered on the screen.

To improve on this, here the proposal for a completely GPU based solution that uses octree raycasting to determine the visibility.

The first step is to raycast a low res, lets say 128x128 pixel, ID buffer, which takes about 1ms from experiments. This is fast and only used to determine the visibility. In a second step, we create a list of visible objects and further merge with the list of objects that were visible in the past few frames. The list is generated as input buffer for glmultidrawelementsindirect. The steps until now are performed on the GPU using multiple CL or GL compute kernels. The final step is to draw all object with one GL call. The raycasting and grouping of duplicate IDs on the screen can be performed best in opengl compute or opencl. To not miss out distant objects only occupying a few pixels, it is necessary to add jittering to the rays casted. The proposed technology will allow the use of high resolution blocks for sandbox games while the performance is high enough to run on notebooks at high frame rates.

The creation pipeline is as usual:

First, sculpting / creating of the mesh as hi-res, then creating LODs+Normal maps using mesh simplification and transferring the details to a normal map.

Actually, there is already a similar method out there for a while called instant OC. It is not that effeciently integrated however. Its a script addon to Unity - yet, is uses raycasts to determine the visibility (1000-2000 have been the demo settings). 

Donnerstag, 21. Mai 2015

Sparse Voxel Octree (SVO) Reprojection Raycast Tech-DEMO

I decided to release a tech demo of the sparse voxel octree raytracing re-projection algorithm. Feel free to download and try it out. It uses OpenCL and should run on NVIDIA in any case. For ATI and INTEL HD I cannot guarantee. You might further need VS2012 redistributables 32 bit.


Mouse : Look around
w,s,a,d,q,e : Move around
Space : See the re-projected pixels (lower image)

The algorithm raycasts only a fraction of pixels for each screen. Most are re-used from the previous frame exploiting frame-to-frame coherency, which allows a speed up of up to 4x. The algorithm is explained in more detail in a previous post.

Sonntag, 12. April 2015

Work in Progress on Outstar (SSAO / Bugfixes)

After a year, I finally found time to continue the development. Here a shot from a castle I built in about 1 day, including sculpting of some new blocks. Note that there are also rooms behind windows where you can walk around inside The new version also has a lot of bugfixes and now is getting closer to be ready for a demo release. SSAO (the lowermost image) is in progress, but not yet to my satisfaction. The complete castle's octree ist stored in 2.1MB on Hard disk. Title of the game will be Outstar. You can soon check out . (The 20 fps in the screenshots is because I captured them on my notebook rather desktop PC)

Donnerstag, 22. Mai 2014

Voxel Engine Update - 18 Level Octree & Copy Paste in Action

Copy and Paste works well now. Areas of voxels can be grouped to new entities for placing them in the scene. The size of the entire voxel data of the screenshot below on disk is just 1.3MB, which is quite OK. Thanks to my new graphics card, voxel raycasting and terrain rendering also runs smoothly at 60+ fps at full HD resolution now. Also worth a note: The raycasting method is updated to allow octrees greater than 16 levels - the scene below is created in an 18 level octree. Now, problems with precision begin to arise, which need to be solved.

Dienstag, 13. Mai 2014

Quadric Mesh Simplification with Source Code

In the past days I have written a quadric based mesh simplification program. After searching the internet I couldnt find any code that was free to use, not unnecessarily bloated, fast and memory efficient, even the quadric based method is soon 20 years old. I therefore decided to write one myself.

Features / Summary:
  • Threshold based, therefore faster than sorting based methods
  • Since the Quadric Matrices are symmetric, only 10 elements are stored & computed per Matrix instead of 16
  • Non-closed meshes are supported by extra treating mesh borders
  • Simplifies 2.000.000 triangles to 20.000 triangles in 3 seconds on a Core i7
  • MIT License
  • MS Visual Studio 2012 , C++
Update Sept.20th 2014 : improved quality of reduced borders

The result is short and easy to use in case you need to adopt it to your project. You can fetch the C++ Project with source here: (about 300 lines for the main part, contained in Simplify.h )

Download Source and Data

The code is about 4x-7x faster than Meshlab, which is already fast. Using multi-core programming, it could even be faster.

Here a comparison along Meshlab, QSlim and this method:

Program output (left) and Meshlab (right). Note that Meshlab produces floating teeth and looses details around the eyes, nose and the mouth region. Reduction was 85k -> 3k Triangles.

Original (left) this code (middle) and meshlab (right)

Here another comparison: Program output (left) and Meshlab (right).

Samstag, 3. Mai 2014

Raycaster Speed-Up up to 400% by Image Warping (ReProjection)

Introduction: Since real-time raytracing is getting faster like with the Brigade Raytracer e.g., I believe this technology can be an important contribution to this area, as it might bring raytracing one step closer to being usable for video games.

Algorithm: A technology I am working on since a while now is to exploit temporal coherence between two consecutive rendered images to speed up ray-casting. The idea is to store the x- y- and z-coordinate for each pixel in the scene in a coordinate-buffer and re-project it into the following screen using the differential view matrix. The resulting image will look as Fig.1.

The method then gathers empty 2x2 pixel blocks on the screen and stores them into an indexbuffer for raycasting the holes. Raycasting single pixels too inefficient. Small holes remaining after the hole-filling pass are closed by a simple image filter. To improve the overall quality, the method updates the screen in tiles (8x4) by raycasting an entire tile and overwriting the cache. Doing so, the entire cache is refreshed after 32 frames. Further, a triple buffer system is used. That means two image caches which are copied to alternately and one buffer that is written to. This is done since it often happens that a pixel is overwritten in one frame, but becomes visible already in the next frame. Therefore, before the hole filling starts, the two cache buffers are projected to the main image buffer.

Limitations: The method also comes with limitations of course. So the speed up depends on the motion in the scene obviously, and the method is only suitable for primary rays and pixel properties that remain constant over multiple frames, such as static ambient lighting. Further, during fast motions, the silhouettes of geometry close to the camera tends to loose precision and geometry in the background will not move as smooth as if the scene is fully raytraced each time. There, future work might include creating suitable image filters to avoid these effects.

Results: Most of the pixels can be re-used using this technology. As only a fraction of the original needs to be raycasted, the speed up is significant and up to 5x the original speed, depending on the scene (see Fig.2 - Fig.4). Resolution for that test was 1024x768, the GPU was an NVIDIA GeForce GTX765M.

Here also two videos showing this technology in action: Video1 Video2 
(I uploaded them a while ago)

Finally a few papers for further reading:
Exploiting Temporal Coherence in Ray Casted Walkthroughs
Iterative Image Warping
A Shared-Scene-Graph Image-Warping Architecture for VR: Low Latency versus Image Quality
Three-Dimensional Image Warping on Programmable Graphics Hardware
Accelerating Real-Time Shading with Reverse Reprojection Caching

Fig.1 Result after basic re-projection

Fig.2 Original Version

Fig.3 With Re-Projection Enabled + In Motion

Fig.4 With Re-Projection Enabled + Standing

Mittwoch, 30. April 2014

Polygon Rendering vs. Voxel Raycasting

Since I started the project I have always wondered if it wouldnt be better to use polygons/triangles rather than voxels. To verify this decision, I created a reasonably complex test scene from multiple instances of the same block, and rendered the scene once with triangles using Display Lists & LOD and second using voxel raycasting (LOD inherently). The triangle block is created by exporting the voxels as polygon mesh, and then using Meshlab to reduce the triangles and create multiple LOD levels.

It turns out that triangles are much faster for a small to a medium number of instances (its about equal for 10x10x10 = 1000 instances) - but when having a large amount of 8000 instances (32 Million Triangles), 40x5x40 blocks, voxel raycasting is significantly faster ( 40 vs 13 fps ).

For triangles, I havent used any sophisticated occlusion culling. It would therefore be interesting to see how much more performance could be achieved using the voxel based umbra or the software based rasterization presented by intel.

For the ones of you that would like to experiment with occlusion culling, the source & data of the polygon benchmark can be downloaded [here].

Raycasting Outside
Raycasting inside

Triangles Outside
Triangles inside