CUDA: How many concurrent threads in total?

The GTX 580 can have 16 * 48 concurrent warps (32 threads each) running at a time. That is 16 multiprocessors (SMs) * 48 resident warps per SM * 32 threads per warp = 24,576 threads. Don’t confuse concurrency and throughput. The number above is the maximum number of threads whose resources can be stored …
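The arithmetic above can be written as a tiny C helper. The GTX 580 figures (16 SMs, 48 resident warps per SM, 32 threads per warp) come from the answer; the function and constant names are hypothetical. On real hardware, the CUDA runtime's `cudaGetDeviceProperties` reports `multiProcessorCount` and `maxThreadsPerMultiProcessor` (1536 = 48 * 32 on Fermi), whose product gives the same bound without hard-coding anything.

```c
/* Hypothetical constants: the GTX 580 (Fermi) figures quoted above. */
enum { GTX580_SMS = 16, GTX580_WARPS_PER_SM = 48, WARP_SIZE = 32 };

/* Maximum number of threads that can be resident on the GPU at once.
 * Resident means their registers and state are held on-chip; it does
 * not mean they all execute in the same clock cycle. */
long max_resident_threads(int sm_count, int warps_per_sm, int threads_per_warp)
{
    return (long)sm_count * warps_per_sm * threads_per_warp;
}
```

Calling `max_resident_threads(GTX580_SMS, GTX580_WARPS_PER_SM, WARP_SIZE)` yields 24,576, matching the figure in the answer.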

Use of cudaMalloc(). Why the double pointer?

All CUDA API functions return an error code (or cudaSuccess if no error occurred). Any other results are returned through parameters passed by reference. However, plain C has no references, which is why you have to pass the address of the variable in which you want the returned information stored. Since you are returning a pointer, …
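The same pattern can be reproduced on the host without CUDA. The sketch below is a hypothetical analogue of `cudaMalloc(void **devPtr, size_t size)`: `host_alloc` and the `alloc_status` codes are invented for illustration and simply wrap `malloc`. Because the return value is already taken by the status code, the new pointer has to come back through a parameter, and since C has no references, that parameter is the address of the caller's pointer, i.e. a `void **`.

```c
#include <stdlib.h>

/* Hypothetical status codes, playing the role of cudaError_t / cudaSuccess. */
typedef enum { ALLOC_OK = 0, ALLOC_OUT_OF_MEMORY = 1 } alloc_status;

/* Hypothetical host-side analogue of cudaMalloc.
 * `ptr` is the address of the caller's pointer variable, so writing
 * through it (*ptr = ...) updates the caller's pointer. */
alloc_status host_alloc(void **ptr, size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        return ALLOC_OUT_OF_MEMORY; /* caller's pointer is left untouched */
    }
    *ptr = p; /* store the new address in the caller's variable */
    return ALLOC_OK;
}
```

If the parameter were a plain `void *`, the function could only overwrite its own local copy of the argument, and the caller would never see the new address. That is why calls to `cudaMalloc` pass the address of the device pointer variable.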

Fortran vs. C++: does Fortran still hold any advantage in numerical analysis these days?

Fortran has stricter aliasing rules than C++ (a compiler may generally assume that array arguments to a procedure do not overlap), and Fortran compilers have been aggressively tuned for numerical performance for decades. Algorithms that use the CPU to work with arrays of data often have the potential to benefit from a Fortran implementation. The programming language shootout should not be taken too seriously, but of the 15 benchmarks, Fortran …
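C99 offers an opt-in counterpart to Fortran's no-overlap assumption through the `restrict` qualifier (standard C++ has no equivalent keyword, although many compilers accept `__restrict` as an extension). The sketch below is a generic axpy loop chosen for illustration, not code from the benchmarks mentioned; the function name is an assumption.

```c
#include <stddef.h>

/* y[i] += a * x[i].
 * The restrict qualifiers promise the compiler that x and y do not overlap,
 * so it need not reload x[i] after each store to y[i] and is free to
 * vectorize the loop. This mirrors what a Fortran compiler can generally
 * assume for procedure arguments. Passing overlapping arrays is undefined. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Without `restrict`, a C compiler must allow for `y` aliasing `x`, which can block the same optimizations that Fortran gets by default.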

Compression library using Nvidia’s CUDA

We have finished the first phase of our research into improving the performance of lossless data compression algorithms. Bzip2 was chosen for the prototype. Our team optimized only one operation, the Burrows–Wheeler transform, and got a 2x–4x speedup on highly compressible files. The code is faster in all of our tests. We are going to complete …
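For readers unfamiliar with the Burrows–Wheeler transform, the naive version below sorts all rotations of the input and keeps the last column. It is a didactic CPU sketch in C with O(n² log n) cost, not the team's CUDA implementation; it assumes the input ends with a unique sentinel character (`$` here) that sorts before every other symbol, and the function name `bwt_naive` is hypothetical. Production compressors use much faster sorting methods.

```c
#include <stdlib.h>
#include <string.h>

/* Input shared with the comparator, because qsort takes no context pointer. */
static const char *bwt_text;
static size_t bwt_len;

/* Compares the rotations starting at offsets *a and *b, character by character. */
static int cmp_rotations(const void *a, const void *b)
{
    size_t i = *(const size_t *)a, j = *(const size_t *)b;
    for (size_t k = 0; k < bwt_len; ++k) {
        unsigned char ci = (unsigned char)bwt_text[(i + k) % bwt_len];
        unsigned char cj = (unsigned char)bwt_text[(j + k) % bwt_len];
        if (ci != cj) {
            return ci < cj ? -1 : 1;
        }
    }
    return 0;
}

/* Writes the BWT of `text` into `out`, which must hold strlen(text) + 1 bytes.
 * Returns 0 on success, -1 if memory allocation fails. */
int bwt_naive(const char *text, char *out)
{
    size_t n = strlen(text);
    if (n == 0) {
        out[0] = '\0';
        return 0;
    }
    size_t *rot = malloc(n * sizeof *rot);
    if (rot == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        rot[i] = i; /* rotation i starts at text[i] */
    }
    bwt_text = text;
    bwt_len = n;
    qsort(rot, n, sizeof *rot, cmp_rotations);
    /* Last column: the character just before each sorted rotation's start. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = text[(rot[i] + n - 1) % n];
    }
    out[n] = '\0';
    free(rot);
    return 0;
}
```

For example, `bwt_naive("banana$", out)` produces `"annb$aa"`: runs of equal characters tend to cluster in the output, which is what makes the later bzip2 stages effective.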

Structure of Arrays vs Array of Structures

The choice of AoS versus SoA for optimum performance usually depends on the access pattern. This is not limited to CUDA, however: similar considerations apply to any architecture where performance can be significantly affected by the memory access pattern, e.g. where you have caches or where performance is better with contiguous memory access (e.g. coalesced memory …
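To make the layout difference concrete, here is a minimal C sketch; the type names, function names and the small fixed capacity are hypothetical. When a loop, or a CUDA warp with one thread per element, reads only the `x` field, the SoA layout places those values at consecutive addresses, while the AoS layout strides over the unused `y` and `z` fields. If every access uses all fields of one element together, AoS can be the better fit, which is why the choice depends on the access pattern.

```c
#include <stddef.h>

#define MAX_PARTICLES 8 /* small fixed capacity, for illustration only */

/* Array of Structures (AoS): x, y, z of one particle sit side by side, so
 * reading only x advances by sizeof(struct particle) bytes per element. */
struct particle {
    float x, y, z;
};

/* Structure of Arrays (SoA): all x values are contiguous, so a loop over x,
 * or consecutive GPU threads each reading x[i], touches adjacent addresses. */
struct particles_soa {
    float x[MAX_PARTICLES];
    float y[MAX_PARTICLES];
    float z[MAX_PARTICLES];
};

/* Sum of the x field in the AoS layout. */
float sum_x_aos(const struct particle *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += p[i].x;
    }
    return sum;
}

/* Sum of the x field in the SoA layout; requires n <= MAX_PARTICLES. */
float sum_x_soa(const struct particles_soa *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += p->x[i];
    }
    return sum;
}
```

Both functions compute the same result; only the memory traffic differs, and that difference is what caches and coalescing reward or penalize.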

How to let cmake find CUDA

CMake expects CUDA_TOOLKIT_ROOT_DIR as a CMake variable, not an environment variable. That is why it does not work when you put it into .bashrc. If you look into FindCUDA.cmake, it clearly says: The script will prompt the user to specify CUDA_TOOLKIT_ROOT_DIR if the prefix cannot be determined by the location of nvcc in the system path and …
