The extra code handles possible misalignment, because the instruction used, vmovdqa64, requires its memory operand to be 64-byte aligned.
My testing shows that, even though the standard doesn't allow it, gcc in C mode does let a definition in another module override the one here. That definition might only satisfy the basic alignment requirement (4 bytes), so the compiler can't rely on the larger alignment. Technically, gcc emits a .comm assembly directive for this tentative definition, while an external definition uses a normal symbol in the .data section. During linking, the normal symbol takes precedence over the .comm one.
Note that if you change the program to use extern unsigned int buffer[2048]; then even the C++ version will have the added code. Conversely, making it static unsigned int buffer[2048]; will turn the C version into the optimized one.
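
To compare the two cases side by side, here is a minimal sketch. The names shared_buffer, private_buffer, fill_shared, fill_private and sum_private are hypothetical and not taken from the original program; the fill loops are just an assumed stand-in for whatever loop the compiler vectorizes there. Compiling it with something like gcc -O3 -mavx512f -S and reading the assembly of the two fill functions should show whether the alignment-handling code is present. On GCC 10 and later, which default to -fno-common, you would presumably need to add -fcommon to get the .comm behaviour described above.

```c
#include <stddef.h>

#define BUFFER_LEN 2048

/* Tentative definition (C only): with -fcommon, gcc emits a .comm symbol,
   so a real definition in another module could replace this one. */
unsigned int shared_buffer[BUFFER_LEN];

/* Internal linkage: no other module can supply this object, so the
   compiler is free to assume whatever alignment it gives the array. */
static unsigned int private_buffer[BUFFER_LEN];

/* Fills the externally visible buffer. */
void fill_shared(unsigned int value)
{
    for (size_t i = 0; i < BUFFER_LEN; i++)
        shared_buffer[i] = value;
}

/* Fills the file-local buffer. */
void fill_private(unsigned int value)
{
    for (size_t i = 0; i < BUFFER_LEN; i++)
        private_buffer[i] = value;
}

/* Sums the file-local buffer so the stores above are observable. */
unsigned long sum_private(void)
{
    unsigned long total = 0;
    for (size_t i = 0; i < BUFFER_LEN; i++)
        total += private_buffer[i];
    return total;
}
```

Both functions behave identically at run time; the only difference to look for is in the generated code, where fill_private should be the one without the extra alignment handling.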