Does Rust’s memory management result in fragmented memory?

Question

TL;DR: Most programs will never have to worry about fragmentation in C, C++ or Rust. Those which do will have to handle it by themselves.

Does the Rust programming language’s automatic memory management need to reclaim fragmented memory?

Rust does not have automatic memory management; it has manual memory management which the compiler checks for correctness. The difference might sound theoretical, however it is important because it means that the memory operations map directly to the source code, there is no magic going on behind the scenes.

In general, a language needs to have a Compacting GC to be able to compact fragmented memory. Rust, like C and C++, does not have a GC so its memory may be fragmented depending on usage, and cannot be defragmented without the program freeing the annoying blocks because relocation is impossible.

However, before we start dreading fragmentation, we must first think about what it means.

What is the effect of fragmentation?

Fragmentation causes a waste of physical memory and address space: your program occupies more than it uses. At the extreme, this waste may prevent allocation requests even though the amount of unused memory should be sufficient to grant them.

When making a parallel with a GC’ed language, it’s important to realize that most GC’ed languages also cause some waste.

Indeed, it’s notable that fragmentation is not the ONLY source of waste; over-allocation is a common “issue” too:

a Vec will allocate a power-of-2 number of elements, but maybe you only use 2^N + 1, wasting 2^N - 1 slots
a BTreeMap or HashMap allocate more space than they really use
even the memory allocator generally allocates chunks of predefined sizes, so that asking for 157 bytes may actually be rounded up to maybe 196 bytes, wasting 39 bytes

And that’s not even counting the waste going on with the management of this memory, since the memory allocator maintains some state to know what pages it has, and what’s used in them.

This underlines a very important fact: there’s little point getting rid of fragmentation if you consume so much memory because of the book-keeping/overhead your allocation scheme imposes that you suffer the same issues.

How do modern allocators manage memory?

Modern allocators are NOT your da’s free-list allocators.

The typical allocation scheme is relatively simple, and yet very good at keeping fragmentation down for small requests:

Large slabs of memory for “big” requests (close or over the OS page size: 4kB in general)
Small slabs of memory for “smaller” requests

For the small slabs, a number of classes are defined by size. For example: (0, 8], (8, 12], (12, 16], …, (164, 196], …, (..., 512]. Each class size manage its own list of OS pages, and carves each OS page for its own private use. An example of the 512 bytes class on a 4kB OS page could be:

+---+---+---+---+---+---+---+---+
| a | b | c | d | e | f | g | h |
+---+---+---+---+---+---+---+---+

where 512-bytes slots a through g are available for allocations and the latest slots h is reserved for meta-data (free slots, next/prev pages in the same class, etc…). Note that the bigger the class size, the more is wasted in the last slot, which is why larger allocations use a different scheme.

When deallocating, the page is kept in the class size until the last slot is deallocated at which point the page is blank again and can be used for another group.

What does it mean for memory consumption?

The maximum memory consumption of the small slabs scheme¹ is a number of OS pages which can be computed as the sum of the maximum number of OS pages consumed by each bucket size, which itself is the maximum number of concurrent allocations in this size multiplied divided by the number of allocations fitting in a page (and rounded up).

This is because if you allocate 1000 slots of a given size, release most of them in a haphazard fashion that pokes holes in the OS pages, and then re-allocate slots of the same size until you reach 1000 again… then your memory consumption is constant because the allocator will use the free slots from the already partially filled OS pages to fulfill the second wave of allocations.

Which means that allocations in small class sizes is both fast, and yet does not contribute much to fragmentation.

Of course, that’s ignoring the case of a program which would make 1M 1 byte allocation, deallocate most of them in a way that leaves all pages used, then do the same with 2 bytes, 3 bytes, etc… but this seems like a pathological case.

¹ Yes, I am lying through my teeth. You also need to account for the allocator’s internal structures overhead and the fact that it may cache a few unused OS pages to prepare for future allocations, … still, it’s sufficient for explaining the effect of fragmentation.

So, is fragmentation an issue?

Well, it can still be. The address space can still be fragmented, though at the granularity of OS pages.

With virtual memory, the RAM need not be contiguous, so as long as there is sufficient space the pages can be used. That is, the address space gives the user the illusion of contiguous memory even if the memory is physically spread all over RAM.

And there lies the issue: this illusion of contiguous memory requires finding a contiguous region of the address space, which is subject to fragmentation.

This fragmentation does not show up in small requests, but for requests over the page size they can be an issue. These days, with 64-bits pointers, this is much less of an issue in practice (even when only using 47-bits for user-land) however in 32-bits programs it is slightly more likely to surface: for example, it is extremely difficult to mmap a 2GB file in the 32 bits address space, because it immediately occupies half of it… assuming no stray allocation prevents it (in which case the request will fail).

Is the fight lost?

Well, the main advantage of systems programming is that… you can talk the systems language.

If your program has an allocation behavior which the typical allocator does not handle well, you can:

take control of the particular allocations that cause issues (using sbrk/mmap yourself to handle them)
or just rewrite a specifically tuned-up allocator

In 10 years, I’ve personally never needed to, and only wrote allocators for fun in my spare time; but it’s possible.

Leave a Comment Cancel reply