When is CUDA’s __shared__ memory useful?

In the specific case you mention, shared memory is not useful, because each data element is used only once. Shared memory helps only when the data transferred into it is reused several times, with good access patterns. The reason for this is simple: just reading from … Read more
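The reuse argument can be made concrete with a rough count of global-memory loads. The sketch below is a simplified model, not a benchmark: it assumes a hypothetical 1D stencil of radius `radius` processed in tiles of `tile` elements, ignores hardware caches and array-boundary effects, and uses illustrative function names that are not part of CUDA.

```python
def global_loads_without_shared(n, radius):
    """Each of n outputs reads its 2*radius + 1 inputs directly from global memory."""
    return n * (2 * radius + 1)


def global_loads_with_shared(n, radius, tile):
    """Each tile stages its own elements once, plus a halo of `radius` on each side."""
    num_tiles = -(-n // tile)  # ceiling division
    return n + num_tiles * 2 * radius


n, tile = 1 << 20, 256
for radius in (0, 3):
    direct = global_loads_without_shared(n, radius)
    staged = global_loads_with_shared(n, radius, tile)
    print(f"radius={radius}: direct={direct}, staged={staged}")
```

With `radius=0` every element is used exactly once, so staging through shared memory saves no global loads, which matches the case described above. Under this model the staged count only drops below the direct count once each element is reused.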

Cuda 12 + tf-nightly 2.12: Could not find cuda drivers on your machine, GPU will not be used, while every checking is fine and in torch it works

I think that, as of March 2023, the only TensorFlow distribution for CUDA 12 is the Docker package from NVIDIA. A TF package for CUDA 12 should show the following info:

>>> tf.sysconfig.get_build_info()
OrderedDict([('cpu_compiler', '/usr/bin/x86_64-linux-gnu-gcc-11'), ('cuda_compute_capabilities', ['compute_86']), ('cuda_version', '12.0'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

But if we run tf.sysconfig.get_build_info() on any tensorflow … Read more
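The same check can be scripted. This is a minimal sketch, assuming the mapping returned by `tf.sysconfig.get_build_info()` contains the keys shown in the output above; `describe_build` is a hypothetical helper, not part of TensorFlow.

```python
def describe_build(build_info):
    """Summarize the CUDA-related fields of a TensorFlow build-info mapping."""
    if not build_info.get("is_cuda_build", False):
        return "CPU-only build (no CUDA support compiled in)"
    cuda = build_info.get("cuda_version", "unknown")
    cudnn = build_info.get("cudnn_version", "unknown")
    return f"CUDA build: cuda_version={cuda}, cudnn_version={cudnn}"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # What the installed wheel was compiled against.
        print(describe_build(tf.sysconfig.get_build_info()))
        # What TensorFlow can actually see at runtime.
        print(tf.config.list_physical_devices("GPU"))
```

Comparing the reported `cuda_version` with the CUDA toolkit installed on the machine is one way to spot a build/runtime mismatch before looking any further.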

Cannot dlopen some GPU libraries. Skipping registering GPU devices

Judging from your logs, it looks like TensorFlow finds the correct CUDA version but the cuDNN library is missing.

2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64

Have you installed the … Read more
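To check which of these libraries the dynamic loader can actually open, independently of TensorFlow, something like the following may help. It is a sketch that assumes a Linux system and uses the library names from the log above; `can_dlopen` is a hypothetical helper built on the standard `ctypes` module.

```python
import ctypes


def can_dlopen(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# Library names taken from the log lines above; adjust for your versions.
for name in ("libcudart.so.10.0", "libcudnn.so.7"):
    status = "found" if can_dlopen(name) else "NOT found"
    print(f"{name}: {status}")
```

The loader consults `LD_LIBRARY_PATH` among other locations, so running this in the same shell environment as the failing TensorFlow process gives the most comparable result.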

How to use GPU for mathematics [closed]

I haven’t done it from C#, but basically you use the CUDA SDK and CUDA toolkit to pull it off (assuming you’re using an NVIDIA card here, of course). NVIDIA has ported (or written?) a BLAS implementation for use on CUDA-capable devices. They’ve provided plenty of examples for how to do number crunching, although you’ll … Read more

Setting up Visual Studio Intellisense for CUDA kernel calls

Wow, lots of dust on this thread. I came up with a macro fix (well, more like a workaround…) for this that I thought I would share:

// nvcc does not seem to like variadic macros, so we have to define
// one for each kernel parameter list:
#ifdef __CUDACC__
#define KERNEL_ARGS2(grid, block) <<< grid, block … Read more

What is actually a Queue family in Vulkan?

To understand queue families, you first have to understand queues. A queue is something you submit command buffers to, and command buffers submitted to a queue are executed in order[*1] relative to each other. Command buffers submitted to different queues are unordered relative to each other unless you explicitly synchronize them with VkSemaphore. You can … Read more
