gpgpu
CUDA apps time out & fail after several seconds – how to work around this?
I’m not a CUDA expert; I’ve been developing with the AMD Stream SDK, which AFAIK is roughly comparable. You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious. To disable it, use regedit to create a REG_DWORD value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\DisableBugCheck and set it to 1. You …
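Before touching the registry, a program can ask whether the run-time limit applies to a device at all, and can keep each launch short. The sketch below is not the answer's method. It assumes the CUDA runtime API: `cudaDevAttrKernelExecTimeout` is a real device attribute, while `watchdogEnabled`, `scaleChunk` and `scaleInChunks` are hypothetical helpers. It shows the common alternative of splitting one long job into several shorter kernel launches.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: true if kernels on `device` are subject to a run-time
// limit (the display watchdog). Returns false if the query itself fails.
bool watchdogEnabled(int device) {
    int enabled = 0;
    if (cudaDeviceGetAttribute(&enabled, cudaDevAttrKernelExecTimeout, device) != cudaSuccess) {
        return false;  // invalid device or no CUDA driver
    }
    return enabled != 0;
}

// Hypothetical kernel: scales `count` elements starting at `offset`.
__global__ void scaleChunk(float* data, size_t offset, size_t count, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) {
        data[offset + i] *= factor;
    }
}

// Hypothetical helper: processes n elements as several launches of at most
// `chunk` elements each instead of one long launch.
cudaError_t scaleInChunks(float* d_data, size_t n, float factor, size_t chunk) {
    const int threads = 256;
    for (size_t off = 0; off < n; off += chunk) {
        size_t count = (n - off < chunk) ? (n - off) : chunk;
        unsigned blocks = (unsigned)((count + threads - 1) / threads);
        scaleChunk<<<blocks, threads>>>(d_data, off, count, factor);
        // Wait for this chunk before issuing the next one and report errors per chunk.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            return err;
        }
    }
    return cudaGetLastError();
}
```

The chunk size is a tuning assumption. Pick it so that a single launch stays well below the Windows timeout, which is about two seconds by default.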
OpenCL vs OpenMP performance [closed]
The benchmarks I’ve seen indicate that OpenCL and OpenMP running on the same hardware are usually comparable in performance, or that OpenMP is slightly faster. I haven’t seen any benchmarks that I would consider conclusive, though, because most of them lack a detailed explanation of their methodology. However, there are a few useful things to …
What is the current status of C++ AMP?
What leads me to this thought is that even the MS C++ AMP blogs have been silent for about a year. Looking at the C++ AMP algorithms library (http://ampalgorithms.codeplex.com/wikipage/history), it seems nothing at all has happened for over a year. I used to work on the C++ AMP algorithms library. After the initial release, which Microsoft put …
Should I unify two similar kernels with an ‘if’ statement, risking performance loss?
You have a third alternative: use C++ templates and make the variable used in the if/switch statement a template parameter. Instantiate each version of the kernel you need, and you then have multiple kernels doing different things with no branch divergence or conditional evaluation to worry about, because the compiler …
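A minimal sketch of the templating approach, assuming a two-way choice. The names `transform`, `transformKernel` and `launchTransform` and the two operations are hypothetical placeholders for the two similar kernels. The former run-time flag becomes a `bool` template parameter, and the host picks the instantiation once, outside the kernel.

```cuda
#include <cuda_runtime.h>

// The former run-time flag is now a compile-time template parameter, so each
// instantiation contains only one of the two branches after optimization.
template <bool UseSquare>
__host__ __device__ inline float transform(float x) {
    if (UseSquare) {
        return x * x;        // hypothetical "kernel A" behaviour
    } else {
        return x + 1.0f;     // hypothetical "kernel B" behaviour
    }
}

template <bool UseSquare>
__global__ void transformKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = transform<UseSquare>(in[i]);
    }
}

// Host-side dispatch: the branch is evaluated once per launch, not per thread.
void launchTransform(bool useSquare, const float* d_in, float* d_out, int n) {
    if (n <= 0) {
        return;  // a launch with zero blocks would be an invalid configuration
    }
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (useSquare) {
        transformKernel<true><<<blocks, threads>>>(d_in, d_out, n);
    } else {
        transformKernel<false><<<blocks, threads>>>(d_in, d_out, n);
    }
}
```

With C++17 enabled, `if constexpr (UseSquare)` makes the discarded branch explicit. The plain `if` shown here also works because the condition is a compile-time constant.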
GPGPU vs. Multicore?
Interesting question. I have researched this very problem, so my answer is based on some references and personal experience. "What types of problems are better suited to regular multicore and what types are better suited to GPGPU?" As @Jared mentioned, GPGPUs are built for very regular throughput workloads, e.g., graphics, dense matrix-matrix multiply, simple Photoshop …
How does CUDA assign device IDs to GPUs?
Set the environment variable CUDA_DEVICE_ORDER before running your program:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
Then the GPU IDs will be ordered by PCI bus ID.
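To check which physical GPU each ID maps to, a program can print the PCI bus ID of every device. Without the variable, the runtime uses its default `FASTEST_FIRST` order. The sketch below assumes a POSIX system (`setenv`). The helpers `preferPciBusOrder` and `listDevices` are hypothetical. Setting the variable from inside the process only works if it happens before the first CUDA call, because the variable is read when CUDA initializes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: request PCI bus ordering from inside the program.
// Must run before any other CUDA call in the process to take effect.
void preferPciBusOrder() {
    setenv("CUDA_DEVICE_ORDER", "PCI_BUS_ID", 1);  // POSIX; on Windows use _putenv_s
}

// Hypothetical helper: prints each device ID with its PCI bus ID and returns
// the number of devices (0 if the query fails).
int listDevices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        char busId[32] = {0};
        if (cudaDeviceGetPCIBusId(busId, sizeof busId, dev) == cudaSuccess) {
            std::printf("device %d -> PCI %s\n", dev, busId);
        }
    }
    return count;
}
```

Calling `preferPciBusOrder()` and then `listDevices()` should list the devices in ascending bus order. `nvidia-smi` also orders GPUs by PCI bus ID, so the two listings can be compared.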
Choosing between GeForce or Quadro GPUs to do machine learning via TensorFlow
I think the GeForce TITAN is great and is widely used in Machine Learning (ML). In ML, single precision is enough in most cases. More detail on the performance of the GTX line (currently GeForce 10) can be found on Wikipedia. Other sources around the web support this claim. Here is a quote from …
CUDA Driver API vs. CUDA runtime
The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don’t have to distribute cubin files with your application or deal with loading them through the driver API. As you have noted, the runtime is generally easier to use. In contrast, the driver API is harder to …
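To show the difference in effort, here is a rough side-by-side sketch that copies a buffer to the GPU and back with each API. The helpers `roundTripRuntime` and `roundTripDriver` are hypothetical. The runtime creates its context implicitly. With the driver API, initialization, device selection and the context are explicit, and this sketch uses the device's primary context. Launching a kernel through the driver API would additionally need `cuModuleLoad`, `cuModuleGetFunction` and `cuLaunchKernel` on a separately built cubin/PTX file, which is the extra work the answer refers to.

```cuda
#include <cuda.h>          // driver API (link with -lcuda)
#include <cuda_runtime.h>  // runtime API
#include <vector>

// Runtime API: the context is created on first use.
bool roundTripRuntime(std::vector<float>& data) {
    if (data.empty()) return true;
    size_t bytes = data.size() * sizeof(float);
    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, data.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(data.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok;
}

// Driver API: initialization, device and context handling are explicit.
bool roundTripDriver(std::vector<float>& data) {
    if (data.empty()) return true;
    size_t bytes = data.size() * sizeof(float);
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS) return false;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return false;
    bool ok = false;
    if (cuCtxSetCurrent(ctx) == CUDA_SUCCESS) {
        CUdeviceptr d;
        if (cuMemAlloc(&d, bytes) == CUDA_SUCCESS) {
            ok = cuMemcpyHtoD(d, data.data(), bytes) == CUDA_SUCCESS &&
                 cuMemcpyDtoH(data.data(), d, bytes) == CUDA_SUCCESS;
            cuMemFree(d);
        }
    }
    cuDevicePrimaryCtxRelease(dev);  // balance the Retain above
    return ok;
}
```

Both functions leave `data` unchanged on success. The driver version just needs roughly twice the code to do the same work.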
CUDA model – what is warp size?
Direct Answer: Warp size is the number of threads in a warp, which is a subdivision used in the hardware implementation to coalesce memory access and instruction dispatch. Suggested Reading: As @Matias mentioned, I’d go read the CUDA C Best Practices Guide (you’ll have to scroll to the bottom where it’s listed). It might help …
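A small sketch of how the warp size shows up in code, assuming one-dimensional thread blocks. The helpers `laneOf`, `warpOf`, `recordWarpLayout` and `deviceWarpSize` are hypothetical. Inside a kernel the warp size is available as the built-in `warpSize`. On the host it can be queried with the `cudaDevAttrWarpSize` attribute. A thread's lane is its position inside its warp, and consecutive groups of `warpSize` threads form the warps of a block.

```cuda
#include <cuda_runtime.h>

// Lane (position inside the warp) and warp index for a 1-D block.
// For multi-dimensional blocks, pass the linearized thread index instead.
__host__ __device__ inline int laneOf(int threadIdxX, int warp) { return threadIdxX % warp; }
__host__ __device__ inline int warpOf(int threadIdxX, int warp) { return threadIdxX / warp; }

// Each thread records which lane and which warp of its block it belongs to.
__global__ void recordWarpLayout(int* lane, int* warpIdx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        lane[i] = laneOf(threadIdx.x, warpSize);     // warpSize: CUDA built-in
        warpIdx[i] = warpOf(threadIdx.x, warpSize);
    }
}

// Host-side query; returns -1 if the device cannot be queried.
int deviceWarpSize(int device) {
    int ws = 0;
    if (cudaDeviceGetAttribute(&ws, cudaDevAttrWarpSize, device) != cudaSuccess) {
        return -1;
    }
    return ws;  // 32 on NVIDIA GPUs released so far
}
```

Thread 37 of a block, for example, is lane 5 of warp 1 when the warp size is 32. Choosing block sizes that are multiples of the warp size avoids partially filled warps.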