How do I choose grid and block dimensions for CUDA kernels?
There are two parts to that answer (I wrote it). One part is easy to quantify; the other is more empirical.

Hardware Constraints: This is the easy-to-quantify part. Appendix F of the current CUDA programming guide lists a number of hard limits that restrict how many threads per block a kernel launch can …
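The per-block limits mentioned above can also be read at run time rather than looked up in the guide. The following is a minimal sketch, assuming device 0 and a 1D launch; the helper names `blocks_for`, `block_size_ok`, and `report_launch_limits` are hypothetical and are not part of the CUDA API. Only `cudaGetDeviceProperties`, `cudaGetErrorString`, and the `cudaDeviceProp` fields come from the CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements with
// `block` threads per block (ceiling division, written to avoid overflow).
inline unsigned int blocks_for(unsigned int n, unsigned int block) {
    return n / block + (n % block != 0 ? 1u : 0u);
}

// Hypothetical helper: true if a 1D block size respects the device's
// per-block thread limit and the limit on the block's x dimension.
inline bool block_size_ok(const cudaDeviceProp& prop, unsigned int block) {
    return block > 0 &&
           block <= static_cast<unsigned int>(prop.maxThreadsPerBlock) &&
           block <= static_cast<unsigned int>(prop.maxThreadsDim[0]);
}

// Hypothetical helper: query one device and print the launch limits
// that constrain block and grid dimensions.
inline cudaError_t report_launch_limits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return err;
    }
    std::printf("%s (compute capability %d.%d)\n",
                prop.name, prop.major, prop.minor);
    std::printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    std::printf("max block dims: %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("max grid dims: %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return cudaSuccess;
}
```

Calling `report_launch_limits(0)` from `main` prints the limits for the first device. A candidate block size can then be checked with `block_size_ok` before computing the grid size with `blocks_for(n, block)`. This covers only the hard limits; it says nothing about which legal block size performs best.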