The problem with low-level advice like this (even coming from Andrei Alexandrescu) is that it ignores the fact that compilers optimize.
Modern compilers optimize so aggressively (and, in general, successfully) that it really becomes a mug’s game to try to second-guess them. On the whole, writing clear, readable code will help you, your colleagues and your compilers analyze the code. And I honestly believe that is the best general advice that can be given.
One of the well-known optimizations which modern compilers use is the conversion between index- and pointer-based loops. In the particular case of your benchmark, with most optimization settings, gcc will compile both the pointer-based and the 32-bit-index-based loop to the same assembler output.
In the following, I replaced the chrono stuff with ++sentry where sentry is a volatile in order to reduce the code size. The assembly output corresponds to:
for (int* q{p}; q != p + n; ++q) ++(*q);
++sentry;
for (int i{0}; i < n; i++) p[i] += 1;
Compiled with -O2, this produced the following (%rdi and %ebp were still initialized from the loop which populated p):
movq %rdi, %rdx
cmpq %rcx, %rdi
je .L10
.L16:
addl $1, (%rdx)
addq $4, %rdx
cmpq %rcx, %rdx
jne .L16
.L10:
movl sentry(%rip), %eax
movq %rdi, %rdx
addl $1, %eax
movl %eax, sentry(%rip)
testl %ebp, %ebp
jle .L8
.L14:
addl $1, (%rdx)
addq $4, %rdx
cmpq %rdx, %rsi
jne .L14
.L8:
You can see that there is no difference at all between the loops at .L16 and .L14, apart from which register holds the end pointer.
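To reproduce this comparison, a minimal sketch of the reduced benchmark could look like the following. The wrapper name `bump_both` and the way `p` and `n` are passed in are assumptions for illustration; the original benchmark populated `p` in an earlier loop, so the labels and register allocation you get will differ from the listing above.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the timing code: a volatile access that the compiler
// must actually perform between the two loops.
volatile int sentry = 0;

// Hypothetical wrapper (not part of the original benchmark) holding the
// two loops back to back so one compilation shows both in the output.
void bump_both(int* p, int n) {
    // Pointer-based loop.
    for (int* q{p}; q != p + n; ++q) ++(*q);
    // Same effect as ++sentry, which C++20 deprecates on volatiles.
    sentry = sentry + 1;
    // Index-based loop with a 32-bit index.
    for (int i{0}; i < n; i++) p[i] += 1;
}
```

Compiling this with something like `g++ -O2 -S` and reading the generated assembly should let you compare the two loops for your own compiler version.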
Different optimization settings produce different results, of course. With -O3 the loops are vectorized using SIMD instructions and Duff’s device, but again are almost identical. clang already performs this vectorization at -O2.
None of that denies the point being made, which is that the compiler may need to work harder to prove that a pointer which is being written through cannot modify arbitrary memory.
But in this case, as in many cases, the loop index is a local variable and the loop is simple enough that the compiler can fully analyze it, thus allowing strength reduction, unrolling, and vectorization; whether the control variable is a pointer or an index is then irrelevant.
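By contrast, here is a sketch of a case where the compiler does have to work harder. The names `bump_reload` and `bump_local` are made up for this example. In the first function the loop bound is read through a pointer that could point into the array being written, so the compiler generally has to allow for the bound changing on every iteration; copying it into a local first restores the simple case described above.

```cpp
#include <cassert>

// The bound lives behind a pointer to the same type as the data, so a
// store to p[i] might change *n. The compiler has to allow for that.
void bump_reload(int* p, const int* n) {
    for (int i = 0; i < *n; ++i) p[i] += 1;
}

// The bound is copied into a local whose address never escapes, so no
// store through p can change it and the loop can be fully analyzed.
void bump_local(int* p, const int* n) {
    const int count = *n;
    for (int i = 0; i < count; ++i) p[i] += 1;
}
```

When `n` really does point into `p`, the two functions give different answers, which is exactly why the compiler cannot treat the first as if it were the second.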
A more interesting example (possibly) is a loop over two arrays where the base elements are different sizes. Given the following two functions:
void d2f_ptr(float* out, const double* in, int n) {
for (auto lim = out + n; out < lim;) *out++ = *in++;
}
void d2f_idx(float out[], const double in[], int n) {
for (int i = 0; i < n; ++i) out[i] = in[i];
}
gcc (v5.3.0, -O2) does produce different loops and the index-based loop is one instruction shorter:
d2f_ptr(float*, double const*, int):    d2f_idx(float*, double const*, int):
movslq %edx, %rdx                       xorl %eax, %eax
leaq (%rdi,%rdx,4), %rax                testl %edx, %edx
cmpq %rax, %rdi                         jle .L16
jnb .L11
.L15:                                   .L20:
addq $4, %rdi                           pxor %xmm0, %xmm0
addq $8, %rsi                           cvtsd2ss (%rsi,%rax,8), %xmm0
pxor %xmm0, %xmm0                       movss %xmm0, (%rdi,%rax,4)
cvtsd2ss -8(%rsi), %xmm0                addq $1, %rax
movss %xmm0, -4(%rdi)
cmpq %rdi, %rax                         cmpl %eax, %edx
ja .L15                                 jg .L20
.L11:                                   .L16:
ret                                     ret
But change the double and float to objects whose sizes no longer permit the use of the x86 scaled-index addressing mode (which only supports scale factors of 1, 2, 4 and 8), and the compiler once again converts the index-based code to a pointer-based variant.
Here the code is essentially the same as before, but the double has been padded to 48 bytes:
struct Big { double val; char padding[40]; };
struct Small {
float val;
Small& operator=(const Big& other) {
val = other.val;
return *this;
}
};
d2f_ptr(Small*, Big const*, int):       d2f_idx(Small*, Big const*, int):
movslq %edx, %rdx                       testl %edx, %edx
leaq (%rdi,%rdx,4), %rax                jle .L26
cmpq %rax, %rdi                         leal -1(%rdx), %eax
jnb .L21                                leaq 4(%rdi,%rax,4), %rax
.L25:                                   .L29:
addq $48, %rsi                          pxor %xmm0, %xmm0
addq $4, %rdi                           addq $4, %rdi
pxor %xmm0, %xmm0                       cvtsd2ss (%rsi), %xmm0
cvtsd2ss -48(%rsi), %xmm0               addq $48, %rsi
movss %xmm0, -4(%rdi)                   movss %xmm0, -4(%rdi)
cmpq %rdi, %rax                         cmpq %rax, %rdi
ja .L25                                 jne .L29
.L21:                                   .L26:
ret                                     ret
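For anyone wanting to reproduce the padded variant, this is a sketch of the complete source. The struct definitions are the ones given above (with the narrowing conversion made explicit); the two function bodies are an assumption on my part, namely the earlier loops with only the parameter types changed, which is what "essentially the same as before" suggests.

```cpp
#include <cassert>

struct Big { double val; char padding[40]; };
struct Small {
    float val;
    Small& operator=(const Big& other) {
        val = static_cast<float>(other.val);  // double -> float, made explicit
        return *this;
    }
};

// A 48-byte stride is not one of the scale factors (1, 2, 4, 8) that
// x86 indexed addressing supports.
static_assert(sizeof(Big) == 48, "assumes an 8-byte double");

// Assumed to mirror the earlier pointer-based loop.
void d2f_ptr(Small* out, const Big* in, int n) {
    for (auto lim = out + n; out < lim;) *out++ = *in++;
}

// Assumed to mirror the earlier index-based loop.
void d2f_idx(Small out[], const Big in[], int n) {
    for (int i = 0; i < n; ++i) out[i] = in[i];
}
```

Both functions should produce identical results; only the generated loop structure is of interest here.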
It’s possibly worth adding that for compilers, it is not necessarily more difficult to analyze which object a particular pointer write will modify. [Edited: There was a quote from Alexandrescu here, but it wasn’t as relevant as I thought, so I removed it leaving this section to be mostly a strawman.]
In fact, if a pointer is only directly assigned to once, and all other modifications are through increment and decrement operations (including += and -=), then the compiler is totally within its rights to assume that the pointer always points within the same object (or, for an array, one past its last element). If some additive modification of the pointer were to overshoot into some other object, that would be Undefined Behaviour and the compiler can discard that possibility. It’s easy enough to track assign and inc/dec operations in a flow graph, so in cases where the pointer could have been replaced with an index expression, it is quite possible for a compiler to figure that out and thus know that other objects are not being randomly mutated by writes through the pointer.
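As a rough illustration of that reasoning (all names here are hypothetical): the pointer `q` below is assigned once and afterwards only incremented, so every store through it must land inside `cells`. Reaching `guard` that way would be Undefined Behaviour, so a compiler is entitled to assume `guard` is unchanged across the loop. Whether a given compiler actually exploits this is a separate question.

```cpp
#include <cassert>

int guard = 42;               // a separate object, visible to other code
int cells[4] = {1, 2, 3, 4};  // the only object q is ever derived from

int clear_cells_then_read_guard() {
    const int before = guard;
    // q is initialized from cells and then only incremented.
    for (int* q = cells; q != cells + 4; ++q) *q = 0;
    // No store above can legally have touched guard, so the compiler
    // may reuse the value it already loaded instead of reloading it.
    return guard - before;
}
```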