Why does GCC pad functions with NOPs?

First of all, GCC doesn’t always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3: -falign-functions, -falign-functions=n: Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would … Read more
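The amount of NOP padding in front of a function follows from simple address arithmetic. The sketch below is an illustration only, not GCC's implementation; the helper name padding_to_boundary is hypothetical, and it assumes the boundary is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of padding bytes (NOPs, for code) needed to move
// `addr` forward to the next multiple of `boundary`. Returns 0 when `addr` is
// already aligned. `boundary` must be a power of two.
inline std::uint64_t padding_to_boundary(std::uint64_t addr, std::uint64_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    const std::uint64_t mask = boundary - 1;
    // (addr & mask) is the offset past the previous boundary; subtract it from
    // the boundary and mask again so an aligned address yields 0, not `boundary`.
    return (boundary - (addr & mask)) & mask;
}
```

For example, a function that would otherwise start at address 0x1003 needs 13 bytes of padding to reach the 16-byte boundary at 0x1010.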

Why can’t C compilers rearrange struct members to eliminate alignment padding? [duplicate]

There are multiple reasons why the C compiler cannot automatically reorder the fields: The C compiler doesn’t know whether the struct represents the memory structure of objects beyond the current compilation unit (for example: a foreign library, a file on disc, network data, CPU page tables, …). In such a case the binary structure of … Read more
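Because the compiler keeps members in declaration order, any reordering has to be done by hand. A minimal sketch follows, with hypothetical struct and member names; the exact sizes depend on the ABI, so the comments say "typically" rather than promising a number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Members in the order a programmer might naturally write them.
struct Padded {
    char         tag;    // 1 byte, typically followed by 3 bytes of padding
    std::int32_t value;  // typically needs 4-byte alignment
    char         flag;   // 1 byte, typically followed by 3 bytes of tail padding
};

// Same members, reordered by hand so the small fields share one padded slot.
struct Reordered {
    std::int32_t value;
    char         tag;
    char         flag;
};

// Bytes saved per object by the manual reordering on the current ABI.
inline std::size_t bytes_saved_by_reordering() {
    // The compiler did not reorder anything: offsets follow declaration order.
    assert(offsetof(Padded, tag) < offsetof(Padded, value));
    assert(offsetof(Padded, value) < offsetof(Padded, flag));
    return sizeof(Padded) - sizeof(Reordered);
}
```

On common 64-bit ABIs, Padded typically occupies 12 bytes and Reordered 8, but neither figure is guaranteed by the language.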

Why does struct alignment depend on whether a field type is primitive or user-defined?

I think this is a bug. You are seeing the side effect of automatic layout, which likes to align non-trivial fields to an address that’s a multiple of 8 bytes in 64-bit mode. It occurs even when you explicitly apply the [StructLayout(LayoutKind.Sequential)] attribute. That is not supposed to happen. You can see it by making the … Read more

What is the meaning of “__attribute__((packed, aligned(4)))”

Before answering, I would like to give you some data from Wiki: Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do … Read more
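The combined effect of the two attributes can be checked directly. This is a sketch that assumes GCC or Clang (the attribute syntax is a compiler extension); the struct and member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// packed alone: no padding between members, and the struct's alignment drops to 1.
struct __attribute__((packed)) PackedOnly {
    char         tag;
    std::int32_t value;
    std::int16_t extra;
};

// packed + aligned(4): members are still packed back to back, but the struct
// as a whole is placed on 4-byte boundaries and its size is rounded up to a
// multiple of 4.
struct __attribute__((packed, aligned(4))) PackedAligned {
    char         tag;
    std::int32_t value;
    std::int16_t extra;
};

// Size of the packed-and-aligned variant, after confirming its alignment.
inline std::size_t packed_aligned_size() {
    assert(alignof(PackedAligned) == 4);
    return sizeof(PackedAligned);
}
```

With these member types, the packed members occupy 7 bytes; aligned(4) rounds the overall size up to 8 while value still sits at offset 1.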

Compelling examples of custom C++ allocators?

As I mention here, I’ve seen Intel TBB’s custom STL allocator significantly improve performance of a multithreaded app simply by changing a single std::vector<T> to std::vector<T,tbb::scalable_allocator<T> > (this is a quick and convenient way of switching the allocator to use TBB’s nifty thread-private heaps; see page 7 in this document)
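Swapping the allocator really is a one-line change at the point of use. The sketch below uses a hypothetical CountingAllocator as a stand-in, since it only needs the standard library; it is not TBB's allocator and does nothing for performance. It only shows where a custom allocator plugs into std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counter of allocate() calls, for illustration only.
inline std::size_t& allocation_count() {
    static std::size_t count = 0;
    return count;
}

// Minimal allocator: forwards to operator new/delete and counts allocations.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Fills a vector that uses the custom allocator and reports how many
// allocations it triggered, optionally reserving capacity up front.
inline std::size_t count_vector_allocations(std::size_t elements, bool reserve_first) {
    allocation_count() = 0;
    std::vector<int, CountingAllocator<int>> v;  // the only line that names the allocator
    if (reserve_first) v.reserve(elements);
    for (std::size_t i = 0; i < elements; ++i) v.push_back(static_cast<int>(i));
    assert(v.size() == elements);
    return allocation_count();
}
```

Calling count_vector_allocations(100, true) reports a single allocation, whereas the version without reserve reallocates several times as the vector grows.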

Purpose of memory alignment

The memory subsystem on a modern processor is restricted to accessing memory at the granularity and alignment of its word size; this is the case for a number of reasons. Speed: Modern processors have multiple levels of cache memory that data must be pulled through; supporting single-byte reads would make the memory subsystem throughput tightly … Read more
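One way to see the cost of misalignment is to count how many word-sized units an access touches. The sketch below is a simplified model, not a description of any particular processor; the function name words_touched is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: memory is fetched in aligned units of `word_size` bytes.
// Returns how many such units an access of `size` bytes at `addr` overlaps.
inline std::uint64_t words_touched(std::uint64_t addr, std::uint64_t size,
                                   std::uint64_t word_size) {
    assert(size > 0 && word_size > 0);
    const std::uint64_t first_word = addr / word_size;
    const std::uint64_t last_word  = (addr + size - 1) / word_size;
    return last_word - first_word + 1;
}
```

In this model, a 4-byte read at address 0 with 8-byte words touches one word, while the same read at address 6 straddles a boundary and touches two.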