Here is a first attempt at raycasting Perlin noise on the fly for volumetric terrain rendering. In the demo, a 128^3 volume of random data serves as the basis for the scenes in the screenshots above.
By optimizing the empty-space skipping, reasonably large outdoor scenes can be raycast at interactive frame rates (20-40 fps) on an Nvidia GTX 260 GPU. The advantage of this kind of landscape is that it is extremely easy to handle and very memory friendly (it is just 128^3 RGBA voxels = 8 MB of data). Performance can also be adjusted easily for older graphics cards through the empty-space skipping configuration.
The demo can be downloaded here: Perlin_Noise_Raycasting.zip. Controls are w, s, a, d.
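To make the memory figure and the idea of empty-space skipping concrete, here is a minimal CPU-side sketch. It is not the code from the demo. It assumes, purely for illustration, that the alpha channel of each voxel stores the distance (in voxels, measured per axis) to the nearest solid voxel, with 0 meaning solid. The names Voxel, make_wall_volume and raycast are made up for this example, and the ray direction is assumed to be normalized.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int N = 128;                       // volume edge length, as in the post

struct Voxel { std::uint8_t r, g, b, a; };   // 4 bytes per RGBA voxel

// 128^3 voxels * 4 bytes = 8 MiB.
constexpr std::size_t volume_bytes() {
    return static_cast<std::size_t>(N) * N * N * sizeof(Voxel);
}

// Hypothetical test scene: everything at z >= wall_z is solid.
// Assumed encoding: a == 0 means solid, a == k means the nearest solid
// voxel is k voxels away (clamped to 255).
std::vector<Voxel> make_wall_volume(int wall_z) {
    std::vector<Voxel> vol(static_cast<std::size_t>(N) * N * N);
    for (int z = 0; z < N; ++z)
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x) {
                const int dist = std::min(std::max(0, wall_z - z), 255);
                vol[(static_cast<std::size_t>(z) * N + y) * N + x] =
                    Voxel{128, 128, 128, static_cast<std::uint8_t>(dist)};
            }
    return vol;
}

struct Hit { bool found; int samples; float t; };

// Marches a ray (origin o, normalized direction d) through the volume.
// Without skipping, the ray advances one voxel per sample. With skipping,
// it advances dist - 1 voxels whenever that is more than one step, which
// keeps it inside space that is known to be empty.
Hit raycast(const std::vector<Voxel>& vol, const float o[3], const float d[3],
            bool skip_empty) {
    Hit hit{false, 0, 0.0f};
    float t = 0.0f;
    while (t < 2.0f * N) {
        const float p[3] = {o[0] + t * d[0], o[1] + t * d[1], o[2] + t * d[2]};
        if (p[0] < 0 || p[1] < 0 || p[2] < 0 ||
            p[0] >= N || p[1] >= N || p[2] >= N)
            break;                           // ray left the volume
        const int x = static_cast<int>(p[0]);
        const int y = static_cast<int>(p[1]);
        const int z = static_cast<int>(p[2]);
        ++hit.samples;
        const int dist = vol[(static_cast<std::size_t>(z) * N + y) * N + x].a;
        if (dist == 0) {                     // reached a solid voxel
            hit.found = true;
            hit.t = t;
            break;
        }
        t += (skip_empty && dist > 1) ? static_cast<float>(dist - 1) : 1.0f;
    }
    return hit;
}
```

In this toy scene with the wall at z = 100, a ray starting at (64.5, 64.5, 0.5) and pointing along +z needs 101 samples with fixed steps and 3 samples with skipping, and both variants report the hit at the same distance. Capping the skip length is one possible way to trade quality against speed, which may be what a per-card "skipping configuration" would tune.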
Thursday, 26 November 2009
Saturday, 14 November 2009
SVO-Voxel-Raycasting
Here are some demos of my new sparse-voxel-octree (SVO) raycaster.
Technical details:
-Storage: approx. 100 bits/voxel
-Stack-based
-Uses a variant of persistent threads
Demo download: SVO-Demo-Cuda.2.3.7z
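As an illustration of what "stack-based" means for an octree raycaster, here is a small CPU sketch that walks an octree with an explicit stack instead of recursion. It is not the traversal from the demo: the node layout (eight child indices per node), the helper names and the unordered child visit are assumptions made for this example, and a real GPU version would use a fixed-size per-thread stack and a far more compact node format.

```cpp
#include <algorithm>
#include <array>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical node layout: index of each child in the node array, -1 = empty.
struct Node {
    std::array<int, 8> child;
    bool leaf;
};

Node make_node(bool leaf) {
    Node n;
    n.child.fill(-1);
    n.leaf = leaf;
    return n;
}

// Slab test against the cube [lo, lo + size]^3. Returns true and the entry
// distance (clamped to >= 0) if the ray hits the cube.
bool hit_box(const float o[3], const float d[3], const float lo[3],
             float size, float& t_enter) {
    float t0 = 0.0f;
    float t1 = std::numeric_limits<float>::max();
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {                  // ray parallel to this slab
            if (o[i] < lo[i] || o[i] > lo[i] + size) return false;
            continue;
        }
        float a = (lo[i] - o[i]) / d[i];
        float b = (lo[i] + size - o[i]) / d[i];
        if (a > b) std::swap(a, b);
        t0 = std::max(t0, a);
        t1 = std::min(t1, b);
        if (t0 > t1) return false;
    }
    t_enter = t0;
    return true;
}

struct StackEntry { int node; float lo[3]; float size; };

// Returns the distance to the nearest leaf hit by the ray, or -1 on a miss.
// The root (node 0) covers the cube [0, root_size]^3.
float trace(const std::vector<Node>& nodes, float root_size,
            const float o[3], const float d[3]) {
    float best = -1.0f;
    std::vector<StackEntry> stack;           // explicit stack, no recursion
    stack.push_back(StackEntry{0, {0.0f, 0.0f, 0.0f}, root_size});
    while (!stack.empty()) {
        const StackEntry e = stack.back();
        stack.pop_back();
        float t = 0.0f;
        if (!hit_box(o, d, e.lo, e.size, t)) continue;
        if (best >= 0.0f && t >= best) continue;   // cannot beat current hit
        const Node& n = nodes[e.node];
        if (n.leaf) { best = t; continue; }
        const float h = e.size * 0.5f;
        for (int c = 0; c < 8; ++c) {        // bit 0 = x, bit 1 = y, bit 2 = z
            if (n.child[c] < 0) continue;
            stack.push_back(StackEntry{
                n.child[c],
                {e.lo[0] + ((c & 1) ? h : 0.0f),
                 e.lo[1] + ((c & 2) ? h : 0.0f),
                 e.lo[2] + ((c & 4) ? h : 0.0f)},
                h});
        }
    }
    return best;
}
```

Visiting children in front-to-back order along the ray would allow stopping at the first leaf hit. The sketch skips that ordering for brevity and keeps the nearest hit found so far instead.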
Monday, 22 June 2009
Tile-based memory layout
Tuesday, 31 March 2009
More Videos
Here are two videos showing the Happy Buddha scene (1024x2048x1024).
High quality video here: Buddha avi [mirror]
The updated demo download from today (right side, first position in the links)
also includes the endless Buddha executable.
Monday, 30 March 2009
Video
For those of you who cannot run the demo for some reason, I have captured a short video of it. You can watch it in the window below or download the larger, higher-quality version to see more detail.
Landscape AVI [mirror]
Saturday, 28 March 2009
CUDA optimizations II
Today I would like to share a couple of interesting references about optimizing CUDA. There are many similarities among these presentations, but they are still worth reading, as going through them gives you new ideas about what is possible.
1.) Optimization Techniques for Large Data Structures on CUDA
2.) AstroGPU - CUDA Optimization Part I
3.) AstroGPU - CUDA Optimization Part II
4.) CUDA Programming Notes
5.) NVISION08: Advanced CUDA: Optimizing to Get 20x Performance
6.) Top 5 Optimization Strategies for CUDA
7.) CUDA at MIT - IAP2009
Looking at slide 3 of the first presentation, using the GPU should give an average speedup of about a factor of 10 compared to the CPU, provided the algorithm can be fully SIMD-parallelized (GPU: GTX 280, 933 GFlops / 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFlops / 12.8 GB/s memory bandwidth).
Looking at Nvidia's CUDA page, I am often surprised to see that some algorithms are claimed to have been sped up by 100x or even more compared to the CPU. Taking the numbers above into account, this seems rather hard to believe.
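The factor of 10 follows directly from the peak numbers quoted above. As a small worked example, the sketch below divides the GPU figures by the CPU figures. The struct and function names are made up for this illustration, and peak values say nothing about what a particular algorithm actually reaches.

```cpp
// Peak figures as quoted in the post.
struct Device {
    const char* name;
    double gflops;          // peak arithmetic throughput
    double bandwidth_gbs;   // peak memory bandwidth in GB/s
};

const Device kGtx280{"GTX 280", 933.0, 141.7};
const Device kQx9650{"Intel Core 2 QX9650", 96.0, 12.8};

// Upper bound on the speedup of a compute-bound algorithm.
double compute_ratio(const Device& gpu, const Device& cpu) {
    return gpu.gflops / cpu.gflops;
}

// Upper bound on the speedup of a bandwidth-bound algorithm.
double bandwidth_ratio(const Device& gpu, const Device& cpu) {
    return gpu.bandwidth_gbs / cpu.bandwidth_gbs;
}
```

With these inputs, compute_ratio gives roughly 9.7 and bandwidth_ratio roughly 11.1, so both limits sit near 10x. A reported 100x would therefore suggest that the CPU baseline was far from its peak, for example single-threaded or without SIMD.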
Monday, 23 March 2009
New Benchmark Version
Today I ported the CUDA version to the CPU (multicore). It is included in the updated demo:
[-Download-] (CUDA 2.1 required, driver version 181.20 or newer)
The first results so far are:
CPU (3 GHz Pentium D) - Single/Repeated/Repeated 2xAA: 3/1.2/0.6 fps
CPU (Intel Core 2 Quad Q6600, 4x 3 GHz) - Single/Repeated/Repeated 2xAA: 15/8/5 fps
GPU (8800 GTS) - Single/Repeated/Repeated 2xAA: 33/24/17 fps
GPU (GTX 285) - Single/Repeated/Repeated 2xAA: 44/34/36 fps
This time the scene is the complex version of the one shown in the pictures below
(spherescape_complex.rle4).
I guess the low CPU performance is mostly due to the many floating-point operations. Changing the calculations to integer arithmetic might improve the speed. As it stands, however, this is the fairest possible comparison, since the CPU and the GPU execute the same C++ code.
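One common way to let the CPU and the GPU run the same C++ source is to mark shared functions with a macro that expands to `__host__ __device__` when compiled by nvcc and to nothing otherwise. The sketch below shows the pattern. Whether the demo is organized this way is an assumption, and Vec3, dot and hit_sphere are hypothetical names chosen for the example.

```cpp
#include <math.h>

// nvcc defines __CUDACC__, so the same file builds for both targets.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

struct Vec3 { float x, y, z; };

HOST_DEVICE inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true if the ray (origin, normalized dir) hits the sphere at some
// distance t >= 0. Usable unchanged from a CPU loop or a CUDA kernel.
HOST_DEVICE inline bool hit_sphere(const Vec3& origin, const Vec3& dir,
                                   const Vec3& center, float radius) {
    const Vec3 oc{origin.x - center.x, origin.y - center.y, origin.z - center.z};
    const float b = dot(oc, dir);
    const float c = dot(oc, oc) - radius * radius;
    const float disc = b * b - c;            // quarter of the discriminant
    if (disc < 0.0f) return false;           // ray line misses the sphere
    return -b + sqrtf(disc) >= 0.0f;         // far intersection not behind origin
}
```

On the CPU, such a function would be called from an ordinary (possibly multithreaded) loop over the pixels, while a CUDA kernel would call it once per thread.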
