Create a large buffer (MB+) in thread-local storage. (Actual memory on heap, management in TLS).
Allow clients to request memory from it in FILO manner (stack-like). (this mimics how it works in C VLAs; and it is efficient, as each request/return is just an integer addition/subtraction).
Get your VLA storage from it.
Wrap it pretty, so you can say stack_array<T> x(1024);
, and have that stack_array
deal with construction/destruction (note that ->~T()
where T
is int
is a legal noop, and construction can similarly be a noop), or make stack_array<T>
wrap a std::vector<T, TLS_stack_allocator>
.
Data will be not as local as the C VLA data is because it will be effectively on a separate stack. You can use SBO (small buffer optimization), which is when locality really matters.
A SBO stack_array<T>
can be implemented with an allocator and a std vector unioned with a std array, or with a unique ptr and custom destroyer, or a myriad of other ways. You can probably retrofit your solution, replacing your new/malloc/free/delete with calls to the above TLS storage.
I say go with TLS as that removes need for synchronization overhead while allowing multi-threaded use, and mirrors the fact that the stack itself is implicitly TLS.
Stack-buffer based STL allocator? is a SO Q&A with at least two “stack” allocators in the answers. They will need some adaption to automatically get their buffer from TLS.
Note that the TLS being one large buffer is in a sense an implementation detail. You could do large allocations, and when you run out of space do another large allocation. You just need to keep track each “stack page” current capacity and a list of stack pages, so when you empty one you can move onto an earlier one. That lets you be a bit more conservative in your TLS initial allocation without worrying about running OOM; the important part is that you are FILO and allocate rarely, not that the entire FILO buffer is one contiguous one.