Thursday, March 19, 2009


Today I compared CPU and GPU performance to see whether it was really worth the effort of writing everything in CUDA rather than for the CPU. [detailed pics] [-CPU-Demo-]

The opponents:
CPU: 3.0 GHz Pentium D, 1 GB vs.
GPU: NVIDIA GTX 285, 1 GB

In the first round the CPU still performs well compared to the GPU: the GPU is just 3x faster than the CPU.

In the second round, however, the GPU already beats the CPU by a factor of 7.3 : 1.

In the third round the CPU loses all ground and the GPU wins by about 20:1 (47.5:2.4).
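The speed factors are simply the ratio of the two measurements. A minimal sketch of that arithmetic for round three, assuming the two figures in parentheses are frames per second (the function name `speedup` is made up for illustration):

```python
def speedup(gpu_fps: float, cpu_fps: float) -> float:
    """Return how many times faster the GPU rendered than the CPU."""
    if cpu_fps <= 0:
        raise ValueError("cpu_fps must be positive")
    return gpu_fps / cpu_fps


# Round three, using the two figures quoted in the post
# (assumed here to be frames per second).
print(f"{speedup(47.5, 2.4):.1f} : 1")  # prints "19.8 : 1"
```

So "about 20:1" is the rounded value of 19.8.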

Finally, it would be interesting to know why the GPU does not scale linearly at all. I don't have any idea why the framerate is not halved when the computations are doubled, or vice versa.
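One thing that might be worth checking, purely as an untested assumption: if every frame carries a fixed cost that does not depend on the amount of computation (for example launch or transfer overhead), then doubling the work does not halve the framerate. The sketch below only illustrates that idea; the function `fps` and all numbers in it are hypothetical and not measured on the GTX 285.

```python
def fps(work_units: float, fixed_ms: float, ms_per_unit: float) -> float:
    """Frames per second when one frame costs fixed_ms plus ms_per_unit per unit of work."""
    frame_ms = fixed_ms + ms_per_unit * work_units
    return 1000.0 / frame_ms


# Hypothetical numbers, not measurements: 10 ms fixed cost per frame
# and 10 ms per unit of work.
for units in (1, 2, 4):
    print(units, round(fps(units, fixed_ms=10.0, ms_per_unit=10.0), 1))
# prints:
# 1 50.0
# 2 33.3
# 4 20.0
```

In this toy model, doubling the work from 1 to 2 units drops the framerate from 50 to 33.3 rather than to 25. Whether anything like this is happening here would have to be confirmed by timing the individual stages.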


  1. Those are crazy numbers! Could you do a comparison of a quad-core CPU with 2 GB RAM vs a GTX 285 with 1 GB VRAM, to make the comparison a little more fair? Also, why are there no textures in the CPU version? Does texturing on CPU lower the framerate even more?

  2. That would be interesting to know. I don't have a real multicore CPU, but I have uploaded the CPU demo now, so if somebody has a decent CPU I'd be interested to see some benchmarks.

    I just updated the executables, so there are now versions for single-core and multicore CPUs. The scene is not identical to the GPU one, but I think it gives a general impression of the scalability.

    That's true: the CPU version has a much lighter workload than the GPU version; it is not 6DOF and has no texturing.

    Another way to compare is to run voxelstein or load the voxelstein level in Ken's Voxlap engine. Ken wrote everything in MMX-optimized assembly, so the performance there is much better than with plain compiled C code; it is also the full 6DOF algorithm with texturing.

  3. on my dual core laptop (Core 2 Duo @ 2.50 GHz with 4 GB RAM):

    single_multicore: > 30 fps

    repeated_multicore: about 10 fps

    repeated_double_multicore: about 5 fps

    it would be much easier if there was an fps counter though; for now the numbers are rough guesses.

  4. Thank you for the benchmarks so far.
    As for the fps counter, it is displayed in text mode in the background DOS window. It's a bit inconvenient but was easier to implement.

  5. A friend of mine just ran the test on an overclocked quad-core with 4x 3 GHz. There, the performance is already much higher, around 70/30/20.
    His performance in voxelstein is about 30 fps (1024x768) - but voxelstein only uses one core.

  6. my actual numbers on dual core 2.50 GHz were 31/12/7.

    On another note, did you think about implementing SSAO or baked GI?

  7. Yes, I actually implemented SSAO already for a previous voxel experiment with screen space normals. Maybe I can add it to see how it looks. But just plain colors is also not bad I think, as I like the old Amiga/C64 style very much. I'm trying to revive this in some way :-)

    For baked GI I don't have any plan yet; however, it can be implemented very efficiently, as each voxel is unique and not a texture.

    Now I'll first port the CUDA part back to the CPU for a fairer comparison between CPU and GPU.