loop-unrolling – Tarik Billa

What does #pragma unroll do exactly? Does it affect the number of threads?

June 3, 2023 by Tarik

No. It means you have called a CUDA kernel with one block, and that one block has 100 active threads. You’re passing size as the second function parameter to your kernel. In your kernel, each of those 100 threads executes the for loop 100 times. #pragma unroll is a compiler directive requesting an optimization that can, for example, …
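As a rough illustration of why unrolling leaves the amount of work unchanged, here is a minimal sketch in plain C++ rather than CUDA, so it is an analogy and not the kernel from the question. The function names and the unroll factor of 4 are assumptions made up for this example. Both versions run the loop body the same number of times; the unrolled one simply needs fewer trips around the loop, and nothing about it changes how many threads execute the code.

```cpp
#include <cstddef>

// Rolled loop: the body runs once per trip, n trips in total.
std::size_t body_runs_rolled(std::size_t n) {
    std::size_t runs = 0;
    for (std::size_t i = 0; i < n; i++) {
        runs++;  // stands in for the real loop body
    }
    return runs;
}

// The same loop unrolled by 4 by hand (hypothetical factor): each trip
// runs the body four times, so there are fewer compare-and-branch steps.
std::size_t body_runs_unrolled4(std::size_t n) {
    std::size_t runs = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        runs++;  // body for i
        runs++;  // body for i + 1
        runs++;  // body for i + 2
        runs++;  // body for i + 3
    }
    for (; i < n; i++) {
        runs++;  // leftover iterations when n is not a multiple of 4
    }
    return runs;
}
```

For any n the two functions return the same count, which is the sense in which unrolling changes the shape of the generated code but not the work each thread does.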

When, if ever, is loop unrolling still useful?

January 4, 2023 by Tarik

Loop unrolling makes sense if you can break dependency chains. This gives an out-of-order or superscalar CPU the chance to schedule instructions better and thus run faster. A simple example: for (int i=0; i<n; i++) { sum += data[i]; } Here the dependency chain of the arguments is very short. If you get …