Today I would like to share a couple of interesting references about optimizing CUDA. There are many similarities among these presentations, but they are still interesting, as reading through them gives you new ideas about what's possible.
1.) Optimization Techniques for Large Data Structures on CUDA
2.) AstroGPU - CUDA Optimization Part I
3.) AstroGPU - CUDA Optimization Part II
4.) CUDA Programming Notes
5.) NVISION08: Advanced CUDA: Optimizing to Get 20x Performance
6.) Top 5 Optimization Strategies for CUDA
7.) CUDA at MIT - IAP2009
Looking at slide 3 of the first presentation, using the GPU should give an average speedup of about a factor of 10 over the CPU, provided the algorithm can be fully SIMD-parallelized (GPU: GTX 280, 933 GFlops, 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFlops, 12.8 GB/s memory bandwidth).
Now looking at NVIDIA's CUDA page, I am often surprised to see that some algorithms seem to have been sped up by 100x or even more compared to the CPU. This seems rather hard to believe, taking the numbers above into account.
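To make the "factor 10" explicit, here is a small sketch that simply divides the peak figures quoted above. The constant and function names are my own, and the numbers are the peak values from the slide, not measurements, so treat the result as a rough upper bound for a fully parallel algorithm rather than a prediction.

```python
# Peak figures as quoted from slide 3 of the first presentation (not measured here).
GPU_GFLOPS, GPU_BW_GBS = 933.0, 141.7   # GTX 280
CPU_GFLOPS, CPU_BW_GBS = 96.0, 12.8     # Intel Core 2 QX9650


def peak_ratios():
    """Return (compute ratio, memory bandwidth ratio) of GPU peak over CPU peak."""
    return GPU_GFLOPS / CPU_GFLOPS, GPU_BW_GBS / CPU_BW_GBS


compute_ratio, bandwidth_ratio = peak_ratios()
print(f"compute: {compute_ratio:.1f}x, bandwidth: {bandwidth_ratio:.1f}x")
# prints "compute: 9.7x, bandwidth: 11.1x"
```

Both ratios land close to 10, which is where the expected average speedup comes from, whether the algorithm is compute bound or memory bound.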
well...don't ask me why...but I had better fps ratings while using your demos which used only cpu than only gpu... :P
system is overclocked.
specs:
Intel core 2 quad 8400 2.66GHz running at 3.21GHz
Ram is 3GB ddr2 800MHz running at 961MHz 4-4-4-12.
FSB over clocked to 1608 MHz.
Video Card is Asus EN9500 GT 1GB ddr2 (sold pre-overclocked) with 32 cores in parallel.
PS: anyway, I like what you did :)