CUDA: How to use -arch and -code and SM vs COMPUTE

Some related questions/answers are here and here. I am still not sure how to properly specify the architectures for code generation when building with nvcc. A complete description is somewhat complicated, but there are intended to be relatively simple, easy-to-remember canonical usages. Compile for the architecture (both virtual and real) that represents the GPUs you … Read more

CUDA model – what is warp size?

Direct Answer: Warp size is the number of threads in a warp, which is a sub-division used in the hardware implementation to coalesce memory access and instruction dispatch. Suggested Reading: As @Matias mentioned, I’d go read the CUDA C Best Practices Guide (you’ll have to scroll to the bottom where it’s listed). It might help … Read more

Running more than one CUDA applications on one GPU

CUDA activity from independent host processes will normally create independent CUDA contexts, one for each process. Thus, the CUDA activity launched from separate host processes will take place in separate CUDA contexts, on the same device. CUDA activity in separate contexts will be serialized. The GPU will execute the activity from one process, and when … Read more

CUDA: How many concurrent threads in total?

The GTX 580 can have 16 * 48 concurrent warps (32 threads each) running at a time. That is 16 multiprocessors (SMs) * 48 resident warps per SM * 32 threads per warp = 24,576 threads. Don’t confuse concurrency and throughput. The number above is the maximum number of threads whose resources can be stored … Read more
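The arithmetic above can be checked with a short host-side sketch. The helper name `max_resident_threads` is hypothetical, and the GTX 580 figures (16 SMs, 48 resident warps per SM, 32 threads per warp) are taken from the excerpt rather than queried from a device.

```cuda
#include <cstdio>

// Hypothetical helper (not a CUDA API): upper bound on the number of threads
// that can be resident on the whole GPU at the same time.
constexpr int max_resident_threads(int num_sms, int warps_per_sm, int warp_size) {
    return num_sms * warps_per_sm * warp_size;
}

int main() {
    // Figures for the GTX 580 as quoted in the excerpt above.
    const int total = max_resident_threads(16, 48, 32);
    std::printf("max resident threads: %d\n", total);  // prints 24576
    return 0;
}
```

On real hardware the same inputs could instead be read from `cudaGetDeviceProperties`, using the `multiProcessorCount`, `maxThreadsPerMultiProcessor`, and `warpSize` fields of `cudaDeviceProp`.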

Compression library using Nvidia’s CUDA [closed]

We have finished the first phase of research to increase the performance of lossless data compression algorithms. Bzip2 was chosen for the prototype. Our team optimized only one operation, the Burrows–Wheeler transform, and we got some results: a 2x-4x speedup on files that compress well. The code works faster on all our tests. We are going to complete … Read more

CUDA gridDim and blockDim

blockDim.x,y,z gives the number of threads in a block, in the particular direction. gridDim.x,y,z gives the number of blocks in a grid, in the particular direction. blockDim.x * gridDim.x gives the number of threads in a grid (in the x direction, in this case). Block and grid variables can be 1, 2, or 3 dimensional. … Read more
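As a minimal sketch of how these built-in variables combine, the kernel below computes a global index in the x direction from `blockIdx.x`, `blockDim.x`, and `threadIdx.x`. The names `write_global_index` and `blocks_for`, the problem size of 1000, and the block size of 256 are illustrative assumptions, not taken from the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements (rounds up).
constexpr int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Each thread writes its own global x index into out[].
__global__ void write_global_index(int *out, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;  // the last block may have threads past the end
}

int main() {
    constexpr int n = 1000;                 // assumed problem size
    constexpr int threads_per_block = 256;  // becomes blockDim.x
    constexpr int blocks = blocks_for(n, threads_per_block);  // becomes gridDim.x

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    write_global_index<<<blocks, threads_per_block>>>(d_out, n);

    static int h_out[n];
    const cudaError_t err =
        cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // blockDim.x * gridDim.x is the number of threads launched in x.
    std::printf("threads launched in x: %d\n", blocks * threads_per_block);
    std::printf("last index written: %d\n", h_out[n - 1]);
    return 0;
}
```

The bounds check matters because the grid is rounded up to whole blocks, so blockDim.x * gridDim.x can exceed the number of elements.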

Does __syncthreads() synchronize all threads in the grid?

The __syncthreads() command is a block-level synchronization barrier. That means it is safe to use when all threads in a block reach the barrier. It is also possible to use __syncthreads() in conditional code, but only when all threads in the block evaluate the condition identically; otherwise the execution is likely to hang or produce unintended … Read more
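The block-level scope of the barrier can be illustrated with a small shared-memory reduction. This is a sketch under stated assumptions: the kernel name `block_sum`, the helper `run_demo`, the block size of 256, and the all-ones input are hypothetical choices made for illustration. Each block sums its own tile independently, and `__syncthreads()` is placed outside the `if` so that every thread in the block reaches it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // assumed block size (a power of two)
constexpr int kNumBlocks = 2;    // assumed grid size

// Sums kBlockSize inputs per block; blocks never synchronize with each other.
__global__ void block_sum(const int *in, int *out) {
    __shared__ int tile[kBlockSize];
    const int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();  // all loads by this block are visible past this point

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();  // outside the if: every thread in the block executes it
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical driver: returns true if every block produced the expected sum.
bool run_demo() {
    constexpr int n = kBlockSize * kNumBlocks;
    static int h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, kNumBlocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<kNumBlocks, kBlockSize>>>(d_in, d_out);

    int h_out[kNumBlocks] = {0};
    const cudaError_t err = cudaMemcpy(h_out, d_out, kNumBlocks * sizeof(int),
                                       cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < kNumBlocks; ++b) {
        if (h_out[b] != kBlockSize) return false;  // each block sums 256 ones
    }
    return true;
}

int main() {
    std::printf("%s\n", run_demo() ? "ok" : "failed");
    return 0;
}
```

Because the loop bound depends only on `blockDim.x`, every thread runs the same number of iterations and so hits the barrier the same number of times.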
