You are using an unsigned 32-bit type for the array index (on line 21). Unsigned arithmetic wraps around on overflow, so at each step through the loop the compiler has to consider whether the index might have exceeded its available range, in which case it needs to go back to the beginning of the array. The extra code you see is related to this check! There are at least three ways to avoid this over-cautious approach by the compiler:
- Use a 64-bit type for the index on line 21. Now the compiler knows you will never wrap around the array, and generates the same code as without the lambda.
- Use a signed 32-bit type for the index on line 21. Now the compiler no longer cares about overflow: signed overflow is undefined behavior (UB), so the compiler is allowed to assume it never happens. I consider this to be a bug in the interpretation of the standard, but opinions differ on this.
- Make it clear to the compiler that an overflow will never occur, by adding a line ‘int32_t iter = 0;’ at the beginning of the function, and removing iter from the declaration. Clearly this does not solve your problem, but it illustrates how it is the overflow analysis that causes the extra code to be generated.
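To make the difference concrete, here is a minimal sketch. It is not your code: the functions `sum_u32` and `sum_u64`, and the parameters `data`, `offset` and `limit`, are hypothetical names chosen for illustration. It assumes a 64-bit target, where a 32-bit index has to be widened before it can be used in an address. The two functions differ only in the width of the index type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the loop in the question (names are illustrative).
// With a 32-bit unsigned index, `iter + offset` is evaluated modulo 2^32.
// If the sum exceeds UINT32_MAX it wraps back towards data[0], and the
// compiler must preserve that behaviour when it forms the 64-bit address.
std::int64_t sum_u32(const std::int32_t* data, std::uint32_t offset,
                     std::uint32_t limit) {
    std::int64_t total = 0;
    for (std::uint32_t iter = 0; iter < limit; ++iter) {
        total += data[iter + offset];  // index may wrap at 2^32
    }
    return total;
}

// Same loop with a 64-bit index. On a 64-bit target the index is already as
// wide as a pointer, so no separate 32-bit wrap has to be modelled.
std::int64_t sum_u64(const std::int32_t* data, std::uint64_t offset,
                     std::uint64_t limit) {
    std::int64_t total = 0;
    for (std::uint64_t iter = 0; iter < limit; ++iter) {
        total += data[iter + offset];
    }
    return total;
}
```

For inputs that do not wrap, both functions return the same value. Comparing the assembly for the two (for example with `-O2` on your compiler of choice) should show whether the 32-bit version carries extra instructions; the exact output depends on the compiler and its version.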
You aren’t complaining about the code before the loop starts, but here you have the same problem. Just make iter and limit int64_t, and you’ll see it gets considerably shorter as the compiler no longer considers the possibility of array overflow.
So to recap: it is not the calculation of X1 and X2 that gets moved into the loop that causes the size to balloon, but the use of an incorrectly-typed array index variable.