When is CUDA’s __shared__ memory useful?

In the specific case you mention, shared memory is not useful, because each data element is used only once. Shared memory helps only when data transferred into it is reused several times, with good access patterns. The reason for this is simple: just reading from …
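As a contrast to the single-use case, here is a minimal sketch of a kernel where shared memory can pay off because every input element is reused by several threads. The 1D stencil, the names `stencil1d` and `run_stencil`, and the constants `RADIUS` and `BLOCK` are illustrative assumptions, not taken from the original question.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int RADIUS = 3;   // hypothetical stencil half-width
constexpr int BLOCK = 256;  // threads per block; the kernel assumes blockDim.x == BLOCK

// Each output is the sum of 2*RADIUS+1 neighbouring inputs, so every input
// element is needed by several threads of the same block.
__global__ void stencil1d(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // index inside the tile

    // One global read per thread; cells outside the array are treated as 0.
    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int left = g - RADIUS;
        int right = g + BLOCK;
        tile[l - RADIUS] = (left >= 0 && left < n) ? in[left] : 0.0f;
        tile[l + BLOCK] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // the whole tile must be loaded before any thread reads it

    if (g < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) sum += tile[l + k];
        out[g] = sum;  // 2*RADIUS+1 reads served from shared memory
    }
}

// Host wrapper (error checking omitted for brevity).
std::vector<float> run_stencil(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    assert(n > 0);
    std::vector<float> h_out(n);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    stencil1d<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

If each thread read only `in[g]` and wrote one result, the staging step through `tile` would add work without saving any global memory traffic, which is the situation the answer describes.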

How does one fix it when torch can’t find CUDA, error: version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference?

Like eval said, this happens because PyTorch 1.13 automatically installs nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11 and nvidia_cudnn_cu11. I already had my own CUDA Toolkit installed and hit the same problem. In my case, running pip uninstall nvidia_cublas_cu11 solved it. I think the PyTorch team should solve this issue, since users often have their own …

Nvcc missing when installing cudatoolkit?

I ran into this when installing cudatoolkit 10.1 with PyTorch 1.4. There is a conda-forge package, https://anaconda.org/conda-forge/cudatoolkit-dev. After installing it, nvcc and the other CUDA libraries are available under /home/li/anaconda3/envs/<env_name>/pkgs/cuda-toolkit, in bin/ and lib/.
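Once nvcc is on the path, one way to confirm that the toolkit actually works is to compile a tiny program that queries the runtime and driver versions. The sketch below is an illustration, not part of the original answer; the function name `print_cuda_versions` is hypothetical, while `cudaRuntimeGetVersion` and `cudaDriverGetVersion` are standard CUDA runtime calls.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Versions are encoded as 1000 * major + 10 * minor (for example, 10010 for 10.1).
// Returns false if either query fails, e.g. because no usable runtime was found.
bool print_cuda_versions() {
    int runtime = 0, driver = 0;
    if (cudaRuntimeGetVersion(&runtime) != cudaSuccess) return false;
    if (cudaDriverGetVersion(&driver) != cudaSuccess) return false;
    std::printf("runtime %d.%d, driver %d.%d\n",
                runtime / 1000, (runtime % 1000) / 10,
                driver / 1000, (driver % 1000) / 10);
    return runtime > 0;
}
```

Calling this from a `main` and building the file with nvcc exercises both the compiler and the runtime library that the package installed.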

cudaStreamSynchronize vs cudaDeviceSynchronize vs cudaThreadSynchronize

These are all barriers. Barriers prevent code execution beyond the barrier until some condition is met. cudaDeviceSynchronize() halts execution in the CPU/host thread (the one that cudaDeviceSynchronize() was issued in) until the GPU has finished processing all previously requested CUDA tasks (kernels, data copies, etc.). cudaThreadSynchronize(), as you’ve discovered, is just a deprecated version of cudaDeviceSynchronize(). …
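The difference between the stream-level and device-level barriers can be sketched with two streams. This is a minimal illustration under assumptions: the kernel `fill`, the function `sync_demo`, and the two-stream setup are hypothetical and not from the original answer, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cassert>

__global__ void fill(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Returns true if both host buffers hold the expected values after the barriers.
bool sync_demo(int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *h_a = nullptr, *h_b = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMallocHost(&h_a, bytes);  // pinned host memory for async copies
    cudaMallocHost(&h_b, bytes);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int blocks = (n + 255) / 256;
    fill<<<blocks, 256, 0, s1>>>(d_a, n, 1);
    cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s1);
    fill<<<blocks, 256, 0, s2>>>(d_b, n, 2);
    cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s2);

    // Blocks the host only until the work queued in s1 is done;
    // s2 may still be running, so h_b must not be read yet.
    cudaStreamSynchronize(s1);
    bool ok = (h_a[0] == 1 && h_a[n - 1] == 1);

    // Blocks the host until all previously issued work on the device,
    // in every stream, has finished.
    cudaDeviceSynchronize();
    ok = ok && (h_b[0] == 2 && h_b[n - 1] == 2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFreeHost(h_a);
    cudaFreeHost(h_b);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Waiting on a single stream lets the host consume one result while unrelated work keeps running, whereas the device-wide barrier is the simpler choice when everything must be finished before continuing.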
