Thursday, 26 November 2009

Perlin Noise Terrain Raycasting

This is a first attempt at raycasting Perlin noise on the fly to achieve volumetric terrain rendering. In the demo, a 128^3 volume of random data serves as the basis for the scenes in the screenshots above.

By optimizing the empty-space skipping, it is possible to raycast reasonably large outdoor scenes at interactive frame rates (20-40 fps) on an Nvidia GTX 260 GPU. The advantage of this kind of landscape is that it is extremely easy to handle and also very memory friendly (it is just 128^3 RGBA voxels = 8 MB of data). The performance can also be adjusted easily for older graphics cards by changing the empty-space skipping configuration.
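To illustrate the idea of empty-space skipping, here is a minimal CPU sketch. It is not the demo's code: the post does not say how the skipping is configured, so this assumes a coarse grid of "bricks" that records which regions contain any solid voxel, and a plain flag array stands in for the Perlin noise density. The names `Volume`, `march`, `brick` and `eps` are all made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dense volume plus a coarse "brick" occupancy grid.
struct Volume {
    int n;      // voxels per axis
    int brick;  // voxels per brick edge, must divide n
    std::vector<std::uint8_t> solid;      // n^3 flags, 1 = occupied voxel
    std::vector<std::uint8_t> brickUsed;  // (n/brick)^3 flags, 1 = brick holds a solid voxel

    Volume(int n_, int brick_) : n(n_), brick(brick_) {
        assert(n_ > 0 && brick_ > 0 && n_ % brick_ == 0);
        const std::size_t b = std::size_t(n_ / brick_);
        solid.assign(std::size_t(n_) * n_ * n_, 0);
        brickUsed.assign(b * b * b, 0);
    }
    std::size_t voxelIndex(int x, int y, int z) const {
        return (std::size_t(z) * n + y) * n + x;
    }
    std::size_t brickIndex(int x, int y, int z) const {
        const int b = n / brick;
        return (std::size_t(z / brick) * b + y / brick) * b + x / brick;
    }
    void setSolid(int x, int y, int z) {
        solid[voxelIndex(x, y, z)] = 1;
        brickUsed[brickIndex(x, y, z)] = 1;
    }
};

// Marches a ray (origin o inside the volume, direction d, both in voxel units).
// Returns the distance to the first solid voxel, or -1 if the ray leaves the
// volume without a hit. *samples counts how many positions were examined.
float march(const Volume& v, const float o[3], const float d[3],
            float step, int* samples) {
    assert(step > 0.0f);
    const float eps = 1e-3f;        // nudge past a brick face after a skip
    const float tMax = 2.0f * v.n;  // longer than the volume diagonal
    *samples = 0;
    for (float t = 0.0f; t < tMax;) {
        float p[3];
        int c[3];
        for (int i = 0; i < 3; ++i) {
            p[i] = o[i] + t * d[i];
            c[i] = int(std::floor(p[i]));
            if (c[i] < 0 || c[i] >= v.n) return -1.0f;  // left the volume
        }
        ++*samples;
        if (v.brickUsed[v.brickIndex(c[0], c[1], c[2])]) {
            // Occupied brick: sample voxels with the fixed step size.
            if (v.solid[v.voxelIndex(c[0], c[1], c[2])]) return t;
            t += step;
            continue;
        }
        // Empty brick: jump straight to the face where the ray exits it.
        float tExit = tMax;
        for (int i = 0; i < 3; ++i) {
            if (d[i] == 0.0f) continue;
            const int lo = (c[i] / v.brick) * v.brick;
            const float face = float(d[i] > 0.0f ? lo + v.brick : lo);
            tExit = std::min(tExit, (face - p[i]) / d[i]);
        }
        t += std::max(tExit, 0.0f) + eps;
    }
    return -1.0f;
}
```

In this sketch a larger `brick` means fewer samples in open air but more wasted fine steps near surfaces, which is the kind of trade-off a skipping configuration would tune. Both the fixed step and the `eps` nudge can miss voxels that the ray only clips at a corner.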

The demo can be downloaded here: Perlin_Noise_Raycasting.zip. Controls are W, S, A, D.

Saturday, 14 November 2009

SVO-Voxel-Raycasting

Here are some demos of my new sparse voxel octree (SVO) raycaster.

Technical details:

-Storage: approx. 100 bits/voxel
-Stack-based
-Uses a variant of persistent threads
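As a rough illustration of what "stack-based" means here, the following CPU sketch walks a sparse octree with an explicit stack instead of recursion. It is only an assumption of how such a traversal can look, not the demo's CUDA kernel: the node layout (eight child indices per node) is far less compact than the roughly 100 bits/voxel mentioned above, children are not visited front to back, and persistent threads are not shown. All names (`Octree`, `hitBox`, `raycast`) are invented for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical pointer-style sparse octree; the root covers [0, 2^depth)^3.
struct Octree {
    struct Node { std::array<std::int32_t, 8> child; };  // -1 = child absent
    int depth;
    std::vector<Node> nodes;

    explicit Octree(int depth_) : depth(depth_) {
        assert(depth_ > 0 && depth_ < 16);
        nodes.push_back(emptyNode());
    }
    static Node emptyNode() { Node n; n.child.fill(-1); return n; }

    // Creates the path of nodes down to the unit voxel at (x, y, z).
    void insert(int x, int y, int z) {
        std::int32_t node = 0;
        for (int level = depth - 1; level >= 0; --level) {
            const int oct = ((x >> level) & 1) | (((y >> level) & 1) << 1) |
                            (((z >> level) & 1) << 2);
            if (nodes[node].child[oct] < 0) {
                nodes[node].child[oct] = std::int32_t(nodes.size());
                nodes.push_back(emptyNode());
            }
            node = nodes[node].child[oct];
        }
    }
};

// Slab test against the cube [lo, lo + size)^3; reports the entry distance.
bool hitBox(const float o[3], const float d[3], const float lo[3],
            float size, float* tEnter) {
    float t0 = 0.0f, t1 = std::numeric_limits<float>::max();
    for (int i = 0; i < 3; ++i) {
        const float hi = lo[i] + size;
        if (d[i] == 0.0f) {
            if (o[i] < lo[i] || o[i] >= hi) return false;
            continue;
        }
        float a = (lo[i] - o[i]) / d[i], b = (hi - o[i]) / d[i];
        if (a > b) std::swap(a, b);
        t0 = std::max(t0, a);
        t1 = std::min(t1, b);
    }
    if (t0 > t1) return false;
    *tEnter = t0;
    return true;
}

struct StackEntry { std::int32_t node; int x, y, z, size; };

// Returns the entry distance of the nearest voxel hit, or -1 on a miss.
float raycast(const Octree& tree, const float o[3], const float d[3]) {
    float best = -1.0f;
    std::vector<StackEntry> stack;  // explicit stack replaces recursion
    stack.push_back({0, 0, 0, 0, 1 << tree.depth});
    while (!stack.empty()) {
        const StackEntry e = stack.back();
        stack.pop_back();
        const float lo[3] = {float(e.x), float(e.y), float(e.z)};
        float t;
        if (!hitBox(o, d, lo, float(e.size), &t)) continue;
        if (best >= 0.0f && t >= best) continue;  // cannot beat current hit
        if (e.size == 1) { best = t; continue; }  // unit voxel reached
        const int h = e.size / 2;
        for (int oct = 0; oct < 8; ++oct) {
            const std::int32_t c = tree.nodes[e.node].child[oct];
            if (c >= 0)
                stack.push_back({c, e.x + (oct & 1) * h,
                                 e.y + ((oct >> 1) & 1) * h,
                                 e.z + ((oct >> 2) & 1) * h, h});
        }
    }
    return best;
}
```

On a GPU the stack would typically be a small fixed-size array per thread rather than a `std::vector`, and pushing children in front-to-back order would allow stopping at the first leaf hit.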


Demo download: SVO-Demo-Cuda.2.3.7z

Monday, 22 June 2009

Tile-based memory layout

After a long time, here is another update. The next logical step in the development is to add a tile-based memory layout to allow large, unique, non-repeating landscapes. Here is a first screenshot showing the tiles.
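A tile-based layout could be addressed roughly as in the sketch below. This is only a guess at the general idea, not the actual implementation: it assumes square 2D tiles over the ground plane, a made-up tile size of 256, one byte of payload per column, and a hash map that holds only the tiles that exist. `kTileSize`, `TileAddress`, `toTileAddress` and `TiledMap` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr int kTileSize = 256;  // hypothetical tile edge length in columns

struct TileAddress {
    std::int32_t tileX, tileZ;  // which tile
    int localX, localZ;         // position inside that tile, 0..kTileSize-1
};

// Splits a world coordinate into tile index and local offset. Floor division
// keeps negative coordinates consistent (for example, -1 maps to tile -1).
inline TileAddress toTileAddress(std::int32_t worldX, std::int32_t worldZ) {
    auto floorDiv = [](std::int32_t a, std::int32_t b) {
        return a / b - ((a % b != 0 && (a < 0) != (b < 0)) ? 1 : 0);
    };
    const std::int32_t tx = floorDiv(worldX, kTileSize);
    const std::int32_t tz = floorDiv(worldZ, kTileSize);
    return {tx, tz, worldX - tx * kTileSize, worldZ - tz * kTileSize};
}

// Packs both tile indices into one 64-bit hash key.
inline std::uint64_t tileKey(std::int32_t tx, std::int32_t tz) {
    return (std::uint64_t(std::uint32_t(tx)) << 32) | std::uint32_t(tz);
}

// Sparse map of tiles: memory is only spent on tiles that were written to.
class TiledMap {
public:
    void setHeight(std::int32_t x, std::int32_t z, std::uint8_t h) {
        const TileAddress a = toTileAddress(x, z);
        assert(a.localX >= 0 && a.localX < kTileSize);
        assert(a.localZ >= 0 && a.localZ < kTileSize);
        std::vector<std::uint8_t>& tile = tiles_[tileKey(a.tileX, a.tileZ)];
        if (tile.empty()) tile.assign(std::size_t(kTileSize) * kTileSize, 0);
        tile[std::size_t(a.localZ) * kTileSize + a.localX] = h;
    }
    // Returns 0 for columns in tiles that do not exist.
    std::uint8_t height(std::int32_t x, std::int32_t z) const {
        const TileAddress a = toTileAddress(x, z);
        const auto it = tiles_.find(tileKey(a.tileX, a.tileZ));
        if (it == tiles_.end()) return 0;
        return it->second[std::size_t(a.localZ) * kTileSize + a.localX];
    }
    std::size_t tileCount() const { return tiles_.size(); }

private:
    std::unordered_map<std::uint64_t, std::vector<std::uint8_t>> tiles_;
};
```

Because each tile is stored once under its own key, every tile can hold unique content, in contrast to repeating a single block of data across the landscape.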

Tuesday, 31 March 2009

More Videos

Here are two videos showing the Happy Buddha scene (1024x2048x1024).
High-quality video here: Buddha avi [mirror]

The updated demo download from today (right side, first position in the links)
also includes the endless Buddha executable.



Monday, 30 March 2009

Video

For those of you who cannot run the demo for some reason, I just captured a short video of it. You can watch it in the window below or download the larger, better-quality version to see more details.

Landscape AVI [mirror]

Saturday, 28 March 2009

CUDA optimizations II

Today I would like to share a couple of interesting references about optimizing CUDA. There are many similarities among these presentations, but they are still interesting, as reading through them gives you new ideas about what is possible.

1.) Optimization Techniques for Large Data Structures on CUDA
2.) AstroGPU - CUDA Optimization Part I
3.) AstroGPU - CUDA Optimization Part II
4.) CUDA Programming Notes
5.) NVISION08: Advanced CUDA: Optimizing to Get 20x Performance
6.) Top 5 Optimization Strategies for CUDA
7.) CUDA at MIT - IAP2009

Looking at slide 3 of the first presentation, using the GPU should give an average speedup of about a factor of 10 compared to the CPU, provided the algorithm can be fully SIMD-parallelized (GPU: GTX 280, 933 GFlops / 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFlops / 12.8 GB/s memory bandwidth).
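The factor of 10 follows directly from the quoted peak figures. The small sketch below just divides them; the struct and function names are made up for illustration, and the numbers are the ones given above.

```cpp
#include <cassert>

// Peak figures as quoted in the post.
struct PeakSpec {
    double gflops;        // peak arithmetic throughput in GFlops
    double bandwidthGBs;  // peak memory bandwidth in GB/s
};

const PeakSpec kGtx280 = {933.0, 141.7};
const PeakSpec kCore2Qx9650 = {96.0, 12.8};

// Best-case speedup if the algorithm is limited by arithmetic throughput.
inline double computeBoundSpeedup(const PeakSpec& gpu, const PeakSpec& cpu) {
    assert(cpu.gflops > 0.0);
    return gpu.gflops / cpu.gflops;
}

// Best-case speedup if the algorithm is limited by memory bandwidth.
inline double bandwidthBoundSpeedup(const PeakSpec& gpu, const PeakSpec& cpu) {
    assert(cpu.bandwidthGBs > 0.0);
    return gpu.bandwidthGBs / cpu.bandwidthGBs;
}
```

With these inputs, `computeBoundSpeedup(kGtx280, kCore2Qx9650)` is about 9.7 and `bandwidthBoundSpeedup(kGtx280, kCore2Qx9650)` is about 11.1, so both limits land close to 10x. These are ratios of theoretical peaks only; real code on either side may reach a very different fraction of its peak.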

Now, looking at NVIDIA's CUDA page, I am often surprised to see that some algorithms seem to have been sped up by 100x or even more compared to the CPU. This seems rather hard to believe, taking the numbers above into account.

Monday, 23 March 2009

New Benchmark Version

Today I ported the CUDA version to the CPU (multicore); it is included in the updated demo.

[-Download-] (CUDA 2.1 Required - Driver version 181.20 or newer )

The first results so far are:

CPU (3 GHz Pentium D) - Single/Repeated/Repeated 2xAA: 3/1.2/0.6 fps
CPU (Intel Core 2 Quad Q6600, 4x 3 GHz) - Single/Repeated/Repeated 2xAA: 15/8/5 fps
GPU (8800 GTS) - Single/Repeated/Repeated 2xAA: 33/24/17 fps
GPU (GTX 285) - Single/Repeated/Repeated 2xAA: 44/34/36 fps

This time the scene is the complex version of the one shown in the pictures below
(spherescape_complex.rle4).

The low CPU performance is mostly due to the many floating-point operations, I guess. Changing the calculations to integer might improve the speed. For now, however, it is the fairest possible comparison, since CPU and GPU get the same C++ code to execute.
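One common way to let CPU and GPU execute the same C++ code is a small macro around CUDA's `__host__ __device__` qualifiers. The post does not say how the demo does it, so the following is only a sketch of that pattern; `HOST_DEVICE`, `shadePixel`, `renderCpu` and `renderKernel` are hypothetical names, and the per-pixel routine is a checkerboard placeholder rather than a ray caster.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Under nvcc, shared functions are compiled for both host and device;
// under a plain C++ compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Placeholder per-pixel routine standing in for the real ray caster.
// It returns a white or black ARGB value in an 8x8 checkerboard pattern.
HOST_DEVICE inline unsigned shadePixel(int x, int y) {
    return ((x / 8 + y / 8) % 2 == 0) ? 0xFFFFFFFFu : 0xFF000000u;
}

// CPU path: a plain loop over all pixels. A multicore build would split
// the rows across threads; that part is left out here.
std::vector<unsigned> renderCpu(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<unsigned> image(std::size_t(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            image[std::size_t(y) * width + x] = shadePixel(x, y);
    return image;
}

#ifdef __CUDACC__
// GPU path: one thread per pixel calls the very same routine.
__global__ void renderKernel(unsigned* image, int width, int height) {
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) image[y * width + x] = shadePixel(x, y);
}
#endif
```

With this arrangement any difference in frame rate comes from the hardware and compiler rather than from two separately written code paths, which is the sense in which such a comparison is fair.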