gpu – Page 2 – Tarik Billa

When is CUDA’s shared memory useful?

December 3, 2023 by Tarik

In the specific case you mention, shared memory is not useful, because each data element is used only once. For shared memory to help, data transferred to it must be used several times, with good access patterns. The reason for this is simple: just reading from …
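The reuse argument can be put as rough arithmetic. The Python sketch below is a toy cost model, not a measurement: the helper `global_loads` is hypothetical, caches are ignored, and it assumes each element is staged into shared memory exactly once.

```python
def global_loads(num_elements: int, uses_per_element: int, use_shared: bool) -> int:
    """Count global-memory reads under a toy model (hypothetical helper).

    Without shared memory, every use of an element is a global read.
    With shared memory, each element is read from global memory once
    (to stage it) and all later uses are served from shared memory.
    """
    if use_shared:
        return num_elements
    return num_elements * uses_per_element


# Each element used once: staging through shared memory saves no global reads.
single_use_saving = global_loads(1024, 1, False) - global_loads(1024, 1, True)

# Each element used 16 times: 15 of every 16 global reads are avoided.
reuse_saving = global_loads(1024, 16, False) - global_loads(1024, 16, True)

print(single_use_saving, reuse_saving)
```

In the single-use case the saving is zero, and the extra copy into shared memory plus the synchronization it needs are pure overhead in this model.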

Fastest SVM implementation usable in Python [closed]

December 1, 2023 by Tarik

The most scalable kernel SVM implementation I know of is LaSVM. It’s written in C, hence wrappable in Python if you know Cython, ctypes or cffi. Alternatively you can use it from the command line. You can use the utilities in sklearn.datasets to convert data from a NumPy or CSR format into svmlight formatted …
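To show what svmlight-formatted data looks like, here is a minimal pure-Python sketch. The helper `to_svmlight_line` is hypothetical and written only for illustration, and the 1-based feature indices are an assumption; for real data, scikit-learn's `sklearn.datasets.dump_svmlight_file` is the utility to reach for.

```python
def to_svmlight_line(label, features):
    """Format one dense row as an svmlight line (hypothetical helper)."""
    # Only non-zero features are stored, as index:value pairs.
    # Indices are 1-based here, which is an assumption of this sketch.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


# Two toy samples: (label, dense feature vector).
rows = [(1, [0.5, 0.0, 2.0]), (-1, [0.0, 1.5, 0.0])]
lines = [to_svmlight_line(y, x) for y, x in rows]
print("\n".join(lines))
```

Each output line starts with the label, followed by the non-zero features of that sample.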

Cuda 12 + tf-nightly 2.12: Could not find cuda drivers on your machine, GPU will not be used, while every checking is fine and in torch it works

November 26, 2023 by Tarik

I think that, as of March 2023, the only tensorflow distribution for cuda 12 is the docker package from NVIDIA. A tf package for cuda 12 should show the following info:

>>> tf.sysconfig.get_build_info()
OrderedDict([('cpu_compiler', '/usr/bin/x86_64-linux-gnu-gcc-11'), ('cuda_compute_capabilities', ['compute_86']), ('cuda_version', '12.0'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

But if we run tf.sysconfig.get_build_info() on any tensorflow …
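The check described above can be sketched as a small function over the build-info mapping. The helper `describe_build` is hypothetical, and the sample dictionary simply repeats a few values from the quoted output, so the snippet runs without TensorFlow installed.

```python
def describe_build(build_info):
    """Summarise a build-info mapping (hypothetical helper)."""
    # A package built without CUDA reports is_cuda_build as False (or omits it).
    if not build_info.get("is_cuda_build", False):
        return "CPU-only build"
    version = build_info.get("cuda_version", "unknown")
    return f"CUDA build, cuda_version={version}"


# Values copied from the output quoted in the answer.
cuda12_build = {
    "cuda_version": "12.0",
    "cudnn_version": "8",
    "is_cuda_build": True,
    "is_rocm_build": False,
    "is_tensorrt_build": True,
}
print(describe_build(cuda12_build))

# With TensorFlow installed, the real mapping could be passed instead:
#   import tensorflow as tf
#   print(describe_build(tf.sysconfig.get_build_info()))
```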

printf inside CUDA global function

September 25, 2023 by Tarik

CUDA now supports printf directly in kernel code. For a formal description, see Appendix B.16 of the CUDA C Programming Guide.

WKWebView crashes in acceleratedAnimationDidStart

September 24, 2023 by Tarik

You should create the webView on the main thread:

- (void)createWebView {
    if (![NSThread isMainThread]) {
        dispatch_async(dispatch_get_main_queue(), ^{
            [self createWebView];
        });
        return;
    }
    self.webView = [[UIWebView alloc] initWithFrame:CGRectMake(0, 0, 320, 320)];
    // Rest of my code
}

Cannot dlopen some GPU libraries. Skipping registering GPU devices

September 14, 2023 by Tarik

Judging from your logs it looks like tensorflow finds the correct cuda version but the cudnn library is missing.

2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64

Have you installed the …
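Reading such logs by eye gets tedious when many libraries are involved. The sketch below pulls the names of libraries that failed to load out of log text; the helper `missing_libraries` and the regular expression are assumptions based only on the two log lines quoted in the answer, not on any documented log format.

```python
import re

# Matches the "Could not dlopen library '<name>'" fragment seen in the quoted log.
DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def missing_libraries(log_text):
    """Return the sorted, de-duplicated library names that failed to load."""
    return sorted(set(DLOPEN_FAILURE.findall(log_text)))


# The two log lines from the answer, with plain ASCII quotes.
log_text = (
    "2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0\n"
    "2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: "
    "libcudnn.so.7: cannot open shared object file: No such file or directory; "
    "LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64\n"
)

missing = missing_libraries(log_text)
print(missing)
```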

How to use GPU for mathematics [closed]

September 3, 2023 by Tarik

I haven’t done it from C#, but basically you use the CUDA SDK and CUDA toolkit (assuming you’re using an NVIDIA card here, of course) to pull it off. NVIDIA has ported (or written?) a BLAS implementation for use on CUDA-capable devices. They’ve provided plenty of examples for how to do number crunching, although you’ll …

Setting up Visual Studio Intellisense for CUDA kernel calls

August 29, 2023 by Tarik

Wow, lots of dust on this thread. I came up with a macro fix (well, more like a workaround…) for this that I thought I would share:

// nvcc does not seem to like variadic macros, so we have to define
// one for each kernel parameter list:
#ifdef __CUDACC__
#define KERNEL_ARGS2(grid, block) <<< grid, block …

What is actually a Queue family in Vulkan?

August 13, 2023 by Tarik

To understand queue families, you first have to understand queues. A queue is something you submit command buffers to, and command buffers submitted to a queue are executed in order[*1] relative to each other. Command buffers submitted to different queues are unordered relative to each other unless you explicitly synchronize them with VkSemaphore. You can …

How does CUDA assign device IDs to GPUs?

August 6, 2023 by Tarik

Set the environment variable CUDA_DEVICE_ORDER as:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

Then the GPU IDs will be ordered by PCI bus IDs.
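The same setting can be applied from inside a Python process, which may be convenient when the shell environment is not under your control. This is a minimal sketch: the helper `use_pci_bus_order` is hypothetical, and it assumes the variable is set before any CUDA-using library initializes the runtime in that process.

```python
import os


def use_pci_bus_order():
    """Ask CUDA to enumerate GPUs by PCI bus ID (hypothetical helper).

    Call this before importing or initialising any library that uses CUDA;
    setting the variable after the runtime has started is assumed to have
    no effect on the current process.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"


use_pci_bus_order()
print(os.environ["CUDA_DEVICE_ORDER"])
```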