cuda – Page 6 – Tarik Billa

How to install CUDA in Google Colab GPU’s

May 1, 2023 by Tarik

CUDA is not showing up in your notebook because you have not enabled the GPU in Colab. Google Colab offers runtimes both with and without a GPU. You can enable or disable the GPU in the runtime settings: go to Menu > Runtime > Change runtime type and set the hardware accelerator to GPU. To check if GPU is running or … Read more

Using std::vector in CUDA device code

April 29, 2023 by Tarik

You can’t use STL containers such as std::vector in CUDA device code, but you may be able to use the Thrust library to do what you want. Otherwise, just copy the contents of the vector to the device and operate on them there as a plain array.

CUDA: How many concurrent threads in total?

April 28, 2023 by Tarik

The GTX 580 can have 16 * 48 concurrent warps (32 threads each) running at a time. That is 16 multiprocessors (SMs) * 48 resident warps per SM * 32 threads per warp = 24,576 threads. Don’t confuse concurrency and throughput. The number above is the maximum number of threads whose resources can be stored … Read more
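The arithmetic in the excerpt above can be written out as a small host-side sketch. The helper name `maxResidentThreads` is hypothetical (it is not a CUDA API call), and the three inputs are simply the GTX 580 figures quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the CUDA API: upper bound on the number
// of threads that can be resident on the device at the same time.
constexpr std::uint64_t maxResidentThreads(std::uint64_t smCount,
                                           std::uint64_t warpsPerSm,
                                           std::uint64_t threadsPerWarp) {
    return smCount * warpsPerSm * threadsPerWarp;
}

// Figures quoted in the excerpt for the GTX 580:
// 16 SMs, 48 resident warps per SM, 32 threads per warp.
static_assert(maxResidentThreads(16, 48, 32) == 24576,
              "matches the total given in the text");
```

As the excerpt stresses, this is a count of resident threads, not a measure of throughput.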

Python GPU programming [closed]

April 21, 2023 by Tarik

PyCUDA provides very good integration with CUDA and has several helper interfaces that make writing CUDA code easier than with the plain C API. Here is an example from the Wiki which does a 2D FFT without needing any C code at all.

Use of cudaMalloc(). Why the double pointer?

April 20, 2023 by Tarik

All CUDA API functions return an error code (or cudaSuccess if no error occurred). All other parameters are passed by reference. However, plain C has no references, so you have to pass the address of the variable in which you want the returned information to be stored. Since you are returning a pointer, … Read more
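The pattern described above can be sketched in plain host C++ without a GPU. Everything here is an illustrative stand-in: `Status`, `fakeMalloc`, and `allocateAndRelease` are hypothetical names, and `fakeMalloc` only mimics the shape of `cudaMalloc(void**, size_t)` by using `std::malloc` underneath.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical status codes standing in for cudaError_t.
enum Status { kSuccess = 0, kInvalidValue = 1, kOutOfMemory = 2 };

// Hypothetical allocator with the same shape as cudaMalloc: the return value
// is reserved for the status, so the new pointer is handed back through
// outPtr, which is the address of the caller's pointer variable.
Status fakeMalloc(void** outPtr, std::size_t bytes) {
    if (outPtr == nullptr) return kInvalidValue;
    void* p = std::malloc(bytes);
    if (p == nullptr) {
        *outPtr = nullptr;
        return kOutOfMemory;
    }
    *outPtr = p;  // write the result into the caller's variable
    return kSuccess;
}

// Caller side: pass the address of a pointer, then check the status.
bool allocateAndRelease(std::size_t count) {
    void* raw = nullptr;
    if (fakeMalloc(&raw, count * sizeof(float)) != kSuccess) return false;
    float* data = static_cast<float*>(raw);
    data[0] = 1.0f;
    const bool ok = (data[0] == 1.0f);
    std::free(raw);
    return ok;
}
```

If `fakeMalloc` took a plain `void*`, it would only receive a copy of the caller's pointer and could not change what the caller sees, which is why the extra level of indirection is needed.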

Fortran vs C++, does Fortran still hold any advantage in numerical analysis these days? [closed]

April 12, 2023 by Tarik

Fortran has stricter aliasing semantics than C++ and has been aggressively tuned for numerical performance for decades. Algorithms that use the CPU to work with arrays of data often have the potential to benefit from a Fortran implementation. The programming languages shootout should not be taken too seriously, but of the 15 benchmarks, Fortran … Read more

Compression library using Nvidia’s CUDA [closed]

April 11, 2023 by Tarik

We have finished the first phase of research to increase the performance of lossless data compression algorithms. Bzip2 was chosen for the prototype. Our team optimized only one operation, the Burrows–Wheeler transform, and we got some results: a 2x-4x speedup on highly compressible files. The code works faster on all our tests. We are going to complete … Read more

Structure of Arrays vs Array of Structures

April 8, 2023 by Tarik

The choice of AoS versus SoA for optimum performance usually depends on the access pattern. This is not limited to CUDA, however: similar considerations apply to any architecture where performance can be significantly affected by the memory access pattern, e.g. where you have caches or where performance is better with contiguous memory access (e.g. coalesced memory … Read more
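To make the two layouts concrete, here is a minimal host-side sketch. The type and function names (`PointAoS`, `PointsSoA`, `sumX`, `toSoA`) are hypothetical and chosen only for illustration; the sketch shows the layouts and makes no performance claim.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures: the x, y, z of one point sit next to each other.
struct PointAoS {
    float x, y, z;
};

// Structure of Arrays: all x values are contiguous, then all y, then all z.
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Reading only x from the AoS layout strides over the unused y and z fields.
float sumX(const std::vector<PointAoS>& pts) {
    float total = 0.0f;
    for (const PointAoS& p : pts) total += p.x;
    return total;
}

// Reading only x from the SoA layout walks one contiguous array.
float sumX(const PointsSoA& pts) {
    float total = 0.0f;
    for (float v : pts.x) total += v;
    return total;
}

// Convert from AoS to SoA by splitting each point into the three arrays.
PointsSoA toSoA(const std::vector<PointAoS>& pts) {
    PointsSoA out;
    out.x.reserve(pts.size());
    out.y.reserve(pts.size());
    out.z.reserve(pts.size());
    for (const PointAoS& p : pts) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
    }
    return out;
}
```

Both `sumX` overloads compute the same result; which layout is faster depends on which fields are accessed together, as the excerpt says.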

How can I compile CUDA code then link it to a C++ project?

April 8, 2023 by Tarik

I was able to resolve my issue with the help of a couple of different posts, including these ones. Don’t forget to link against the 64-bit library if you are using a 64-bit machine! It seems kind of obvious, but for clowns like me, that is something I forgot. Here is the makefile that … Read more

How to let cmake find CUDA

April 4, 2023 by Tarik

CMake documents CUDA_TOOLKIT_ROOT_DIR as a CMake variable, not an environment variable. That’s why it does not work when you put it into .bashrc. If you look into FindCUDA.cmake, it clearly says: The script will prompt the user to specify CUDA_TOOLKIT_ROOT_DIR if the prefix cannot be determined by the location of nvcc in the system path and … Read more