CUDA determining threads per block, blocks per grid

In general, you want to size your blocks/grid to match your data and simultaneously maximize occupancy, that is, how many threads are active at one time. The major factors influencing occupancy are shared memory usage, register usage, and thread block size. A CUDA-enabled GPU has its processing capability split up into SMs (streaming multiprocessors), … Read more
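The usual sizing pattern that follows from this is to pick a block size first and derive the grid size from the data size, rounding up and guarding the last partial block. A minimal sketch, assuming a simple 1D elementwise kernel (the kernel, sizes, and names here are illustrative, not from the original answer):

#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global 1D thread index
    if (i < n)                                       // guard: the last block may be partially filled
        data[i] *= factor;
}

int main()
{
    const int n = 1000000;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    const int threadsPerBlock = 256;                                        // common starting point; tune for occupancy
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up so every element is covered

    scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}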

GPU Programming, CUDA or OpenCL? [closed]

If you use OpenCL, you can easily use it on both Windows and Linux, because the display drivers are enough to run OpenCL programs and for development you only need to install the SDK. CUDA has more requirements, such as specific GCC versions. But it is not much more difficult to install on Linux … Read more

When to call cudaDeviceSynchronize?

Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in one stream (which is the default behavior) are executed sequentially. So, for example:

kernel1<<<X,Y>>>(…);  // kernel starts execution, CPU continues to next statement
kernel2<<<X,Y>>>(…);  // kernel is placed in queue and will start after kernel1 finishes, CPU continues to next statement
cudaMemcpy(…);        // CPU … Read more
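The excerpt is cut off before the synchronization point; one case where an explicit cudaDeviceSynchronize() is required, sketched here under my own assumptions (managed memory, no blocking cudaMemcpy in between), is the host reading results a kernel has just written:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;                              // runs asynchronously with respect to the CPU
}

int main()
{
    const int n = 256;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));   // one pointer visible to both host and device

    fill<<<1, n>>>(data, n);                     // launch returns immediately; CPU continues
    cudaDeviceSynchronize();                     // without this, the host may read before the kernel finishes

    printf("data[42] = %d\n", data[42]);
    cudaFree(data);
    return 0;
}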

How can I flush GPU memory using CUDA (physical reset is unavailable)

Check what is using your GPU memory with

sudo fuser -v /dev/nvidia*

Your output will look something like this:

                USER       PID    ACCESS  COMMAND
/dev/nvidia0:   root       1256   F...m   Xorg
                username   2057   F...m   compiz
                username   2759   F...m   chrome
                username   2777   F...m   chrome
                username   20450  F...m   python
                username   20699  F...m   python

Then kill the PID that you no … Read more

LNK2038: mismatch detected for ‘RuntimeLibrary’: value ‘MT_StaticRelease’ doesn’t match value ‘MD_DynamicRelease’ in file.obj

This error can occur when you are statically linking your project with a library (typically a file with the .lib extension) but the linker settings in your Visual Studio project are set to link dynamically (meaning linking happens at runtime, usually against a .dll file). To define that you need the project to use … Read more
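For reference, the two 'RuntimeLibrary' values in the message correspond to MSVC's runtime-library options; the summary below is my own note, not part of the original answer:

// Visual Studio: Project Properties > C/C++ > Code Generation > Runtime Library
//   /MT  Multi-threaded (static release)       -> reported as 'MT_StaticRelease'
//   /MD  Multi-threaded DLL (dynamic release)  -> reported as 'MD_DynamicRelease'
//   (/MTd and /MDd are the corresponding debug variants.)
// Every object file and static library linked into the binary should be built
// with the same option; otherwise the linker reports LNK2038.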

Why is CUDA pinned memory so fast?

The CUDA driver checks whether the memory range is locked or not, and then uses a different code path. Locked memory is kept in physical memory (RAM), so the device can fetch it without help from the CPU (DMA, a.k.a. async copy; the device only needs a list of the physical pages). Non-locked memory can generate a page fault … Read more
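A minimal sketch of why that matters in practice, assuming cudaMallocHost for the page-locked buffer and an asynchronous copy in a stream (sizes and names are illustrative, not from the answer):

#include <cuda_runtime.h>

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_pinned;
    cudaMallocHost(&h_pinned, bytes);    // page-locked (pinned) host allocation; cannot be paged out

    float *d_data;
    cudaMalloc(&d_data, bytes);

    for (int i = 0; i < n; ++i)
        h_pinned[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // With pinned memory the copy can be a true DMA transfer that overlaps with CPU work;
    // with pageable memory, cudaMemcpyAsync falls back to a staged, effectively synchronous copy.
    cudaMemcpyAsync(d_data, h_pinned, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);       // wait for the transfer to finish before reusing the buffer

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_pinned);
    return 0;
}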

In CUDA, what is memory coalescing, and how is it achieved?

It’s likely that this information applies only to compute capability 1.x, or CUDA 2.0. More recent architectures and CUDA 3.0 have more sophisticated global memory access and, in fact, “coalesced global loads” are not even profiled for those chips. Also, this logic can be applied to shared memory to avoid bank conflicts. A coalesced memory … Read more
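As a small illustration (mine, not from the answer): in the first kernel below, consecutive threads of a warp read consecutive addresses, so the warp's loads coalesce into a few memory transactions; the strided version scatters them across many more.

__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;              // thread k of the warp touches element k
    if (i < n)
        out[i] = in[i];
}

__global__ void copy_strided(const float *in, float *out, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;   // neighbouring threads touch far-apart elements
    if (i < n)
        out[i] = in[i];
}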

Error!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)