CUDA apps time out & fail after several seconds – how to work around this?

I’m not a CUDA expert — I’ve been developing with the AMD Stream SDK, which AFAIK is roughly comparable. You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious. To disable it, open regedit, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display, create a REG_DWORD value named DisableBugCheck, and set it to 1. You … Read more
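For reference, that edit can also be captured in a .reg file (a sketch using the key path given in the answer; merge at your own risk, since it disables the display watchdog bug check):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display]
"DisableBugCheck"=dword:00000001
```

A reboot is typically needed before registry changes like this take effect.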

OpenCL vs OpenMP performance [closed]

The benchmarks I’ve seen indicate that OpenCL and OpenMP running on the same hardware are usually comparable in performance, or that OpenMP is slightly ahead. However, I haven’t seen any benchmarks I would consider conclusive, because most lack detailed explanations of their methodology. That said, there are a few useful things to … Read more

What is the current status of C++ AMP

What leads me to this thought is that even the Microsoft C++ AMP blogs have been silent for about a year. Looking at the history of the C++ AMP algorithms library (http://ampalgorithms.codeplex.com/wikipage/history), it seems nothing at all has happened for over a year. I used to work on the C++ AMP algorithms library. After the initial release, which Microsoft put … Read more

Should I unify two similar kernels with an ‘if’ statement, risking performance loss?

You have a third alternative: use C++ templates and make the variable used in the if/switch statement a template parameter. Instantiate each version of the kernel you need, and you then have multiple kernels doing different things with no branch divergence or conditional evaluation to worry about, because the compiler … Read more

GPGPU vs. Multicore?

Interesting question. I have researched this very problem, so my answer is based on some references and personal experience. What types of problems are better suited to regular multicore, and what types are better suited to GPGPU? As @Jared mentioned, GPGPUs are built for very regular throughput workloads, e.g., graphics, dense matrix-matrix multiplication, simple Photoshop … Read more

Choosing between GeForce or Quadro GPUs to do machine learning via TensorFlow

I think the GeForce TITAN is great and is widely used in machine learning (ML). In ML, single precision is enough in most cases. More detail on the performance of the GTX line (currently GeForce 10) can be found on Wikipedia. Other sources around the web support this claim. Here is a quote from … Read more

CUDA Driver API vs. CUDA runtime

The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means you don’t have to distribute cubin files with your application or load them through the driver API. As you have noted, it is generally easier to use. In contrast, the driver API is harder to … Read more

CUDA model – what is warp size?

Direct Answer: Warp size is the number of threads in a warp (32 on all NVIDIA GPUs to date), which is a sub-division used in the hardware implementation to coalesce memory access and instruction dispatch. Suggested Reading: As @Matias mentioned, I’d go read the CUDA C Best Practices Guide (you’ll have to scroll to the bottom, where it’s listed). It might help … Read more
