I developed this answer after following a link from dmckee’s answer, but it takes a different approach than his/her answer.
GCC's Function Attributes documentation says:
noinline
This function attribute prevents a function from being considered for inlining. If the function does not have side-effects, there are optimizations other than inlining that causes function calls to be optimized away, although the function call is live. To keep such calls from being optimized away, put asm ("");
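A minimal sketch of what that documentation describes, assuming GCC or a GCC-compatible compiler. The names tick and call_tick are hypothetical. tick has no side effects of its own, so without the empty asm statement GCC could drop the calls even though noinline stops it from inlining them. The sketch uses the __asm__ spelling so it also compiles in strict ISO modes.

```c
/* Hypothetical helper with no side effects of its own.
   noinline keeps GCC from inlining it; the empty asm statement
   acts as a side effect so calls to it are not optimized away. */
__attribute__((noinline)) static void tick(void)
{
    __asm__ ("");
}

/* Hypothetical wrapper: calls tick() n times and returns how many
   calls were made, so the caller has something to check. */
unsigned call_tick(unsigned n)
{
    unsigned calls = 0;
    while (calls < n) {
        tick();
        ++calls;
    }
    return calls;
}
```

For example, call_tick(10) makes ten real calls to tick and returns 10.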
This gave me an idea: instead of adding a nop instruction to the inner loop, I tried putting an empty assembly statement there, like this:
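unsigned char i, j;
j = 0;
while(--j) {        /* j wraps from 0 to 255, so 255 outer passes */
    i = 0;
    while(--i)      /* likewise 255 inner passes */
        asm("");    /* empty asm: emits no instruction, but is not optimized away */
}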
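To make the trip count explicit, here is the same loop wrapped in a hypothetical function, delay_loop_count, with an added counter. Because i and j are unsigned char and start at 0, the first --j (and --i) wraps to 255, so each loop runs 255 times and the empty asm statement executes 255 × 255 = 65025 times. The counter is instrumentation for this sketch only; the delay loop itself needs nothing but the empty asm. It is written with __asm__ so it compiles under any -std setting.

```c
/* Same nested loop as above, plus a counter that records how many
   times the empty asm statement runs (instrumentation only). */
unsigned long delay_loop_count(void)
{
    unsigned char i, j;
    unsigned long executed = 0;

    j = 0;
    while (--j) {          /* 0 wraps to 255: 255 outer passes */
        i = 0;
        while (--i) {      /* likewise 255 inner passes */
            __asm__ ("");  /* emits no instruction, but is not removed */
            ++executed;
        }
    }
    return executed;
}
```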
And it worked! The loop was not optimized out, and no extra nop instructions were inserted.
What’s more, if you declare the loop counters volatile, gcc stores them in RAM and adds a series of ldd and std instructions to copy them to and from temporary registers. The empty asm approach, on the other hand, doesn’t use volatile and generates no such overhead.
Update: If you compile with -ansi or with an -std option that selects a C dialect without GNU extensions (such as -std=c99), you must replace the asm keyword with __asm__, as described in the GCC documentation.
In addition, you can use __asm__ __volatile__("") if your assembly statement must not be removed or moved out of a loop as an optimization.
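One way to apply both remarks is to hide the spelling behind a macro. DELAY_BARRIER and spin are hypothetical names, not part of GCC. The double-underscore keywords are accepted by GCC with -ansi and strict -std options alike, and the volatile qualifier tells GCC the statement must not be deleted or hoisted out of the loop.

```c
/* Hypothetical macro: an empty, volatile inline-asm statement.
   The double-underscore spellings work even with -ansi or -std=c99,
   where the plain asm keyword is not recognized. */
#define DELAY_BARRIER() __asm__ __volatile__ ("")

/* Hypothetical busy-wait: runs the empty statement `count` times
   and returns the number of iterations actually performed. */
unsigned int spin(unsigned int count)
{
    unsigned int n;
    for (n = 0; n < count; ++n)
        DELAY_BARRIER();
    return n;
}
```

Because the macro expands to the same statement in every dialect, the loop body does not need to change when the -std setting does.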