Why has atomicAdd not been implemented for doubles?

Question

Edit: As of CUDA 8, double-precision atomicAdd() is built into CUDA, with hardware support on GPUs of compute capability 6.x (Pascal) and higher.

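~~Currently, no CUDA devices support atomicAdd for double in hardware.~~ As you noted, it can be implemented in terms of atomicCAS on 64-bit integers, but that has a non-trivial performance cost: each update becomes a compare-and-swap retry loop, which can spin repeatedly when many threads hit the same address.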

Therefore, the CUDA software team chose to document a correct implementation as an option for developers, rather than make it part of the CUDA standard library. This way developers are not unknowingly opting in to a performance cost they don’t understand.
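For reference, here is a minimal sketch of the atomicCAS-based approach described above, along the lines of the pattern shown in the CUDA C Programming Guide. The names `atomicAddDoubleCAS`, `accumulate`, and `sumOnes` are made up for illustration; the device function is deliberately not called `atomicAdd`, so it should not clash with the built-in overload when compiling for compute capability 6.0 or higher. Error checking on the runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical name: software double-precision atomic add built on 64-bit atomicCAS.
// Returns the value stored at `address` before the add, like the built-in atomics.
__device__ double atomicAddDoubleCAS(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Try to swap in (assumed + val); atomicCAS returns what was actually there.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Compare the raw bits, not the doubles, so the loop also terminates
        // if the stored value is NaN (NaN != NaN as a floating-point compare).
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Illustrative kernel: each of the first n threads adds 1.0 to *sum.
__global__ void accumulate(double* sum, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAddDoubleCAS(sum, 1.0);
    }
}

// Hypothetical host helper: launches the kernel and returns the accumulated value.
double sumOnes(int n)
{
    double* d_sum = nullptr;
    double h_sum = 0.0;
    cudaMalloc(&d_sum, sizeof(double));
    cudaMemcpy(d_sum, &h_sum, sizeof(double), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    accumulate<<<blocks, threads>>>(d_sum, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    return h_sum;
}
```

Every thread here targets the same address, which is the worst case for the retry loop, so it makes the performance cost easy to see. On compute capability 6.x and higher, calling the built-in `atomicAdd(sum, 1.0)` instead should use the hardware path.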

Aside: I don’t think this question should be closed as “not constructive”. I think it’s a perfectly valid question, +1.
