It turned out that this is not an issue with lambda expressions. The compiler simply optimized out the outer loop in the first case by caching the result of the sum() function, since v never changed between iterations.
After changing the first case to this form:
out = 0.0;
for (size_t i = 0; i < MAX; ++i)
{
    out += sum(v);
    v[i] = 1.0; // this adds O(1) time and prevents caching
}
the timings in both cases are approximately equal, with the lambda version marginally ahead.
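
For anyone who wants to reproduce the comparison, here is a minimal sketch of the two variants side by side. The body of sum(), the lambda, and the names run_function, run_lambda and time_ms are my own assumptions for illustration, not the code from the question; both loops require v.size() >= max.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the sum() function from the question.
double sum(const std::vector<double>& v)
{
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Plain-function variant, including the write that defeats caching.
double run_function(std::vector<double>& v, std::size_t max)
{
    assert(max <= v.size()); // v[i] below must stay in bounds
    double out = 0.0;
    for (std::size_t i = 0; i < max; ++i)
    {
        out += sum(v);
        v[i] = 1.0; // O(1) write; sum(v) is no longer loop-invariant
    }
    return out;
}

// Lambda variant (assumed shape), with the same loop body.
double run_lambda(std::vector<double>& v, std::size_t max)
{
    assert(max <= v.size());
    auto sum_lambda = [](const std::vector<double>& w) {
        return std::accumulate(w.begin(), w.end(), 0.0);
    };
    double out = 0.0;
    for (std::size_t i = 0; i < max; ++i)
    {
        out += sum_lambda(v);
        v[i] = 1.0;
    }
    return out;
}

// Wall-clock time of a callable, in milliseconds.
template <typename F>
double time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare, wrap each call in time_ms on two separately initialized vectors of the same size, and print the returned totals afterwards so the compiler cannot discard the loops as dead code.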