Allocating shared memory

CUDA supports dynamic shared memory allocation. If you define the kernel like this: __global__ void Kernel(const int count) { extern __shared__ int a[]; } and then pass the number of bytes required as the third argument of the kernel launch, Kernel<<< gridDim, blockDim, a_size >>>(count), then the array can be sized at run time. Be … Read more
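As a quick illustration, here is a minimal, self-contained sketch of that pattern (the kernel body and values are illustrative, not from the original answer):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void Kernel(const int count)
{
    extern __shared__ int a[];   // sized at launch by the third <<<>>> argument
    int tid = threadIdx.x;
    a[tid] = tid * 2;            // every thread stages one element
    __syncthreads();
    if (tid == 0)
        printf("a[%d] = %d\n", count - 1, a[count - 1]);
}

int main()
{
    const int count = 32;
    const size_t a_size = count * sizeof(int);  // bytes, not element count
    Kernel<<<1, count, a_size>>>(count);        // third argument sizes a[]
    cudaDeviceSynchronize();
    return 0;
}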

CUDA Driver API vs. CUDA runtime

The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don’t have to distribute cubin files with your application, or deal with loading them through the driver API. As you have noted, it is generally easier to use. In contrast, the driver API is harder to … Read more
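For contrast, a minimal driver API sketch, assuming a kernel compiled separately into kernel.cubin containing a function named Kernel (both names are illustrative); error checking is omitted for brevity:

#include <cuda.h>   // driver API header; link with -lcuda
#include <cstdio>

int main()
{
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Load the precompiled module and look up the kernel by name.
    CUmodule mod;
    cuModuleLoad(&mod, "kernel.cubin");      // illustrative file name
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "Kernel");

    int count = 32;
    void *args[] = { &count };
    // grid 1x1x1, block 32x1x1, no dynamic shared memory, default stream
    cuLaunchKernel(fn, 1, 1, 1, 32, 1, 1, 0, NULL, args, NULL);
    cuCtxSynchronize();

    cuCtxDestroy(ctx);
    return 0;
}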

High level GPU programming in C++ [closed]

There are many high-level libraries dedicated to GPGPU programming. Since they rely on CUDA and/or OpenCL, they have to be chosen wisely (a CUDA-based program will not run on AMD’s GPUs, unless it goes through a pre-processing step with projects such as gpuocelot). CUDA You can find some examples of CUDA libraries on the NVIDIA … Read more
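As one concrete example of the high-level style, Thrust (shipped with the CUDA toolkit) lets you sort on the GPU without writing a kernel; a minimal sketch:

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/functional.h>
#include <cstdio>

int main()
{
    // STL-like containers hide cudaMalloc/cudaMemcpy entirely.
    thrust::device_vector<int> d(1 << 20);
    thrust::sequence(d.begin(), d.end());                       // 0, 1, 2, ...
    thrust::sort(d.begin(), d.end(), thrust::greater<int>());   // GPU sort, descending
    thrust::host_vector<int> h = d;                             // implicit copy back
    printf("first = %d, last = %d\n", h.front(), h.back());
    return 0;
}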

CUDA: How to use -arch and -code and SM vs COMPUTE

Some related questions/answers are here and here. I am still not sure how to properly specify the architectures for code generation when building with nvcc. A complete description is somewhat complicated, but there are relatively simple, easy-to-remember canonical usages. Compile for the architecture (both virtual and real) that represents the GPUs you … Read more
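For example, the canonical usage looks like this (sm_70 / compute_70, i.e. Volta, is just an illustrative target):

# Shorthand: embeds Volta SASS plus the matching PTX, so newer GPUs can JIT it
nvcc -arch=sm_70 -o app app.cu

# Equivalent explicit form
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_70,code=compute_70 -o app app.cu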

CUDA model – what is warp size?

Direct Answer: Warp size is the number of threads in a warp, which is a sub-division used in the hardware implementation to coalesce memory access and instruction dispatch. Suggested Reading: As @Matias mentioned, I’d go read the CUDA C Best Practices Guide (you’ll have to scroll to the bottom where it’s listed). It might help … Read more
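If you want to see the value for yourself, it is queryable from both the host and the device (a minimal sketch; warpSize has been 32 on every CUDA GPU released to date):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void show_warp_size()
{
    // warpSize is a built-in variable in device code.
    if (threadIdx.x == 0)
        printf("device-side warpSize = %d\n", warpSize);
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("host-side warpSize = %d\n", prop.warpSize);
    show_warp_size<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}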

Running more than one CUDA applications on one GPU

CUDA activity from independent host processes will normally create independent CUDA contexts, one for each process. Thus, the CUDA activity launched from separate host processes will take place in separate CUDA contexts, on the same device. CUDA activity in separate contexts will be serialized. The GPU will execute the activity from one process, and when … Read more
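Whether separate host processes may hold contexts on the device at the same time also depends on its compute mode, which you can query from the runtime API; a minimal sketch:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    switch (prop.computeMode) {
    case cudaComputeModeDefault:
        printf("Default: multiple host processes may create contexts\n");
        break;
    case cudaComputeModeExclusiveProcess:
        printf("Exclusive-process: only one process at a time\n");
        break;
    case cudaComputeModeProhibited:
        printf("Prohibited: no contexts can be created\n");
        break;
    default:
        printf("Other/legacy compute mode: %d\n", prop.computeMode);
    }
    return 0;
}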
