cuda – Page 5 – Tarik Billa

What is a bank conflict? (Doing Cuda/OpenCL programming)

December 31, 2022 by Tarik

For NVIDIA (and AMD, for that matter) GPUs, the local memory (called shared memory in CUDA) is divided into memory banks. Each bank can only address one dataset at a time, so if a half-warp tries to load/store data from/to the same bank, the access has to be serialized (this is a bank conflict). For GT200 GPUs there are 16 banks … Read more
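The serialization described above can be modeled on the host with a little arithmetic. The following is a minimal sketch in Python, not GPU code: the 16 banks come from the excerpt, while the 16-thread half-warp size, the 4-byte bank width, and the helper names `conflict_degree` and `strided_addresses` are assumptions made for illustration.

```python
from collections import defaultdict

NUM_BANKS = 16   # from the excerpt: GT200 has 16 banks
HALF_WARP = 16   # assumption: 16 threads are serviced together
WORD_BYTES = 4   # assumption: successive 32-bit words map to successive banks


def conflict_degree(byte_addresses, num_banks=NUM_BANKS, word_bytes=WORD_BYTES):
    """Return how many serialized passes this model predicts for one access.

    Threads that hit the same bank at *different* words conflict; threads
    that read the very same word are treated as a single (broadcast) access.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // word_bytes
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


def strided_addresses(stride_words, threads=HALF_WARP, word_bytes=WORD_BYTES):
    """Byte address touched by each thread when thread t reads word t * stride."""
    return [t * stride_words * word_bytes for t in range(threads)]


for stride in (1, 2, 3, 16):
    degree = conflict_degree(strided_addresses(stride))
    print(f"stride {stride:2d} words -> {degree}-way access")
```

In this model a stride of 1 or 3 words spreads the half-warp over all 16 banks, a stride of 2 produces a 2-way conflict, and a stride of 16 sends every thread to the same bank.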

Is it possible to run CUDA on AMD GPUs?

December 30, 2022 by Tarik

Nope, you can’t use CUDA for that. CUDA is limited to NVIDIA hardware. OpenCL would be the best alternative. Khronos itself has a list of resources, as does the StreamComputing.eu website. For AMD-specific resources, you might want to have a look at AMD’s APP SDK page. Note that at this time there are … Read more

GPU Emulator for CUDA programming without the hardware [closed]

December 19, 2022 by Tarik

For those who are seeking the answer in 2016 (and even 2017) … Disclaimer: I failed to emulate a GPU after all. It might be possible to use gpuocelot if you satisfy its list of dependencies. I tried to get an emulator for BunsenLabs (Linux 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) i686 GNU/Linux). I’ll tell you … Read more

How do I select which GPU to run a job on?

December 18, 2022 by Tarik

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable correctly within the shell. To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES using

    export CUDA_VISIBLE_DEVICES=1

or

    CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation. … Read more
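The per-invocation form can also be reproduced from a launcher script. The sketch below, in Python, passes a modified environment to a single child process and leaves the launcher's own environment untouched. The helper name `run_with_visible_devices` is hypothetical, and the child here is just a Python one-liner that echoes the variable, standing in for a real CUDA executable.

```python
import os
import subprocess
import sys


def run_with_visible_devices(devices, argv):
    """Run argv with CUDA_VISIBLE_DEVICES set only for that child process."""
    env = dict(os.environ)                 # copy, so the parent stays unchanged
    env["CUDA_VISIBLE_DEVICES"] = devices  # e.g. "1" or "0,2"
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for ./cuda_executable: a child that prints what it was given.
child = [sys.executable, "-c",
         "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"]

result = run_with_visible_devices("1", child)
print("child saw:", result.stdout.strip())
```

This mirrors `CUDA_VISIBLE_DEVICES=1 ./cuda_executable`: only the launched process sees the setting.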

Difference between global and device functions

November 29, 2022 by Tarik

Global (__global__) functions are also called “kernels”. These are the functions that you can call from the host side using CUDA kernel call semantics (<<<…>>>). Device (__device__) functions can only be called from other device or global functions; they cannot be called from host code.

How do CUDA blocks/warps/threads map onto CUDA cores?

November 11, 2022 by Tarik

Two of the best references are the NVIDIA Fermi Compute Architecture Whitepaper and the GF104 reviews. I’ll try to answer each of your questions. The programmer divides work into threads, threads into thread blocks, and thread blocks into grids. The compute work distributor allocates thread blocks to Streaming Multiprocessors (SMs). Once a thread block is distributed to a … Read more

Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) [closed]

October 26, 2022 by Tarik

Hardware: If a GPU device has, for example, 4 multiprocessing units, and each can run 768 threads, then at any given moment no more than 4*768 threads will really be running in parallel (if you planned more threads, they will wait their turn). Software: Threads are organized in blocks. A block is executed … Read more
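The hardware limit above is plain arithmetic, which the following Python sketch spells out. The figures 4 and 768 come from the excerpt; the function names and the 10000-thread workload are hypothetical. The "rounds" figure is only a lower bound under the simplifying assumption that threads are scheduled individually, whereas real GPUs schedule whole blocks.

```python
import math


def max_resident_threads(sm_count, threads_per_sm):
    """Upper bound on threads that can be running at the same moment."""
    return sm_count * threads_per_sm


def min_rounds(total_threads, sm_count, threads_per_sm):
    """Lower bound on how many 'turns' are needed to run every thread."""
    return math.ceil(total_threads / max_resident_threads(sm_count, threads_per_sm))


print(max_resident_threads(4, 768))   # 4 * 768 = 3072
print(min_rounds(10000, 4, 768))      # ceil(10000 / 3072) = 4
```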

A top-like utility for monitoring CUDA activity on a GPU

October 15, 2022 by Tarik

To get real-time insight into used resources, do:

    nvidia-smi -l 1

This will loop and refresh the view every second. If you do not want to keep past traces of the looped call in the console history, you can also do:

    watch -n0.1 nvidia-smi

where 0.1 is the time interval, in seconds.

Using GPU from a docker container?

October 15, 2022 by Tarik

Regan’s answer is great, but it’s a bit out of date, since the correct way to do this is to avoid the lxc execution context, as Docker has dropped LXC as the default execution context as of Docker 0.9. Instead, it’s better to tell Docker about the NVIDIA devices via the --device flag, and just use … Read more

How to verify CuDNN installation?

October 10, 2022 by Tarik

The installation of CuDNN is just copying some files. Hence, to check whether CuDNN is installed (and which version you have), you only need to check those files. Install CuDNN. Step 1: Register an NVIDIA developer account and download CuDNN here (about 80 MB). You might need nvcc --version to get your CUDA version. Step … Read more