
Memory Magic Part 2: The Arena of Power



The Generalist’s Burden

In Part 1, we tore down the grand illusion of memory and stared into the machinery of the OS. We now understand that our pointers are virtual and that a page fault is not a bug, but a conversation with the kernel.

With this knowledge, let’s look at the tools we use every day, new in C++ and malloc in C, with fresh, critical eyes. These functions are the workhorses of memory allocation. They are robust, they are thread-safe, and they are general-purpose. And that is their greatest weakness.

A general-purpose allocator is a jack-of-all-trades and a master of none. It must be prepared for any situation: a 1-byte allocation followed by a 1-gigabyte allocation, followed by freeing the 1-byte allocation. To handle this chaos, it pays a heavy price in performance and complexity.

When you call new, you are not just getting a piece of memory. You are triggering a cascade of hidden operations:

  1. Locking: The allocator must first acquire a lock (like a mutex) to prevent other threads from corrupting its internal state. This is a major point of contention in multithreaded applications.
  2. Searching: It then searches through its complex data structures (like free lists or trees) to find a block of memory that fits your request. This search is not free.
  3. Metadata: When it finds a block, it has to write metadata next to your allocation, a small “header” that stores the size of the block and other info. This metadata wastes memory and, more importantly, pollutes your CPU caches (see the sketch after this list).
  4. Unlocking: Finally, it releases the lock.
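
To make that metadata point concrete, here is a rough sketch of the bookkeeping a free-list allocator keeps in front of every block. The layout is purely illustrative; real allocators like dlmalloc or jemalloc are far more elaborate:

// Illustrative only; not the layout of any particular allocator.
struct BlockHeader {
    size_t       block_size; // size of this block; free() recovers it from the pointer
    BlockHeader* next_free;  // free-list link, used while the block is unused
};
// malloc() typically returns the address just past a header like this,
// which is why every small allocation carries extra bytes, and why the
// allocator has to touch (and cache-pollute with) this metadata on every
// call to malloc() and free().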

Every single allocation pays this tax. Deallocation is just as complex, involving more locking and manipulation of the free lists. This is the generalist’s burden. For high-performance applications like game engines, physics simulators, or servers, this overhead is unacceptable. We need a specialist. We need an Arena.

The Arena: A New Philosophy of Allocation

A Memory Arena (also known as a linear allocator, region, or bump allocator) is a radical departure from the new/delete model. The philosophy is simple and brutal:

Allocate a huge, contiguous block of memory upfront. To allocate memory within the arena, just increment a pointer. To free memory, destroy the entire arena in one shot.

That’s it. Let’s break down the earth-shattering implications of this model.

[Diagram: Typical Heap (new/malloc) vs. Arena Allocator]

Forging Our Arena with Virtual Memory

Now, let’s build one. A naive arena might just malloc a big chunk of memory. But we are no longer naive. We are armed with the power of virtual memory. We will build an arena that is not just fast, but also incredibly memory-efficient, using the reserve/commit pattern we learned in Part 1.
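
For reference, that naive version looks roughly like this (an illustration only, not the arena we are about to build): it already captures the bump-pointer idea, but it forces you to pick a fixed capacity up front and gives you no say over when pages are committed.

// The naive version, for contrast: one big malloc, then pure pointer bumping.
struct NaiveArena {
    char*  base;      // start of the malloc'd block
    size_t capacity;  // fixed size chosen at creation
    size_t offset;    // the bump pointer
};

void* NaiveAlloc(NaiveArena* a, size_t size) {
    if (a->offset + size > a->capacity) return NULL; // can't grow
    void* memory = a->base + a->offset;
    a->offset += size;
    return memory;
}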

Our Arena will have four key properties:

struct Arena {
    char* base_ptr;          // The start of our massive virtual reservation
    size_t reserved_size;    // The total size of the reservation (e.g., 1GB)
    size_t committed_size;   // How much memory is currently committed
    size_t current_offset;   // The "bump pointer" offset from the base
};

Step 1: Creation - Claiming Our Land

First, we need a function to create the arena. This function will reserve a huge swath of virtual address space but will commit nothing. We only pay the physical RAM cost for the pages we actually touch, when we touch them, thanks to demand paging. The first allocation will trigger the first commit.

// A simplified, cross-platform implementation
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h> // For sysconf
#endif

// Dynamically get the OS page size once.
const size_t PAGE_SIZE = []() {
#ifdef _WIN32
    SYSTEM_INFO sysInfo;
    GetSystemInfo(&sysInfo);
    return sysInfo.dwPageSize;
#else
    return (size_t)sysconf(_SC_PAGESIZE);
#endif
}();

Arena* CreateArena(size_t reserve_size) {
    Arena* arena = (Arena*)malloc(sizeof(Arena));
    if (!arena) return NULL;

    // Align reservation up to the nearest page size
    reserve_size = (reserve_size + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;

#ifdef _WIN32
    void* block = VirtualAlloc(NULL, reserve_size, MEM_RESERVE, PAGE_NOACCESS);
    if (!block) {
        free(arena);
        return NULL;
    }
#else
    void* block = mmap(NULL, reserve_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (block == MAP_FAILED) {
        free(arena);
        return NULL;
    }
#endif

    arena->base_ptr = (char*)block;
    arena->reserved_size = reserve_size;
    // IMPORTANT: Start with zero committed memory.
    arena->committed_size = 0;
    arena->current_offset = 0;

    return arena;
}
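
A quick usage sketch (the 1 GB figure here is just an example): because nothing is committed yet, reserving generously costs almost no physical memory.

// Reserve a 1 GB address range. RAM usage stays near zero until
// ArenaAlloc starts committing pages on demand.
Arena* scratch = CreateArena(1024ull * 1024 * 1024);
if (!scratch) {
    // The reservation itself can still fail, e.g. on 32-bit targets
    // with little free address space, so check before using it.
}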

Step 2: Allocation - The Bump Pointer

This is the heart of the arena: the ArenaAlloc function. Its logic is beautifully simple: check whether there’s enough committed space. If not, commit more. Then just bump the pointer.

[Diagram: The Allocation Process]

void* ArenaAlloc(Arena* arena, size_t size) {
    if (!arena || size == 0) return NULL;
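    // Note: for simplicity this arena hands out bytes with no alignment
    // guarantees beyond the page-aligned base. A production version would
    // round the offset up to alignof(max_align_t) or a caller-supplied
    // alignment (one way to do this is sketched after this function).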
    
    size_t new_offset = arena->current_offset + size;

    if (new_offset > arena->reserved_size) {
        // Out of reserved space
        return NULL;
    }

    if (new_offset > arena->committed_size) {
        // Align the required commit size up to the nearest page
        size_t new_commit_target = (new_offset + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
        // Clamp to the reservation limit
        if (new_commit_target > arena->reserved_size) {
            new_commit_target = arena->reserved_size;
        }

        size_t size_to_commit = new_commit_target - arena->committed_size;
        void* commit_start_addr = arena->base_ptr + arena->committed_size;

#ifdef _WIN32
        if (!VirtualAlloc(commit_start_addr, size_to_commit, MEM_COMMIT, PAGE_READWRITE)) {
            return NULL;
        }
#else
        if (mprotect(commit_start_addr, size_to_commit, PROT_READ | PROT_WRITE) != 0) {
            return NULL;
        }
#endif
        arena->committed_size = new_commit_target;
    }

    // The actual "allocation" is just this.
    void* memory = arena->base_ptr + arena->current_offset;
    arena->current_offset = new_offset;

    return memory;
}
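
In practice it’s convenient to layer small typed helpers on top of ArenaAlloc. These are my additions, not part of the arena above; PushArray also pads the offset so the returned pointer is properly aligned, which the bare ArenaAlloc deliberately skips.

// Hypothetical convenience wrappers around ArenaAlloc.
template <typename T>
T* PushArray(Arena* arena, size_t count) {
    // Pad the current offset up to alignof(T). base_ptr is page-aligned,
    // so base_ptr + padded offset is correctly aligned for T.
    size_t align = alignof(T);
    size_t misalign = arena->current_offset % align;
    if (misalign != 0 && !ArenaAlloc(arena, align - misalign)) {
        return NULL; // ran out of reserved space while padding
    }
    return (T*)ArenaAlloc(arena, sizeof(T) * count);
}

template <typename T>
T* PushStruct(Arena* arena) {
    return PushArray<T>(arena, 1);
}

With these, an allocation like the ones in the game loop below becomes PushArray<Particle>(frame_arena, 1000) instead of a manual sizeof and cast.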

Step 3: Deallocation - The Great Reset

How do you free an object from the arena? You don’t. That’s the point. When you are done with everything in the arena, you have two choices.

If you want to reuse the arena for the next task (e.g., the next frame), you simply reset it. This is the fastest deallocation in the world:

void ArenaReset(Arena* arena) {
    arena->current_offset = 0;
}
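
One optional variant, sketched here as my own extension rather than part of the arena above: if a reset arena is going to sit idle for a long time, you can also hand its physical pages back to the OS while keeping the address range reserved.

void ArenaResetAndDecommit(Arena* arena) {
    arena->current_offset = 0;
    if (arena->committed_size == 0) return;
#ifdef _WIN32
    // Give the physical pages back; the range stays reserved.
    VirtualFree(arena->base_ptr, arena->committed_size, MEM_DECOMMIT);
#else
    // Let the kernel reclaim the pages, then restore the PROT_NONE guard.
    madvise(arena->base_ptr, arena->committed_size, MADV_DONTNEED);
    mprotect(arena->base_ptr, arena->committed_size, PROT_NONE);
#endif
    arena->committed_size = 0; // the next ArenaAlloc will re-commit as needed
}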

If you are done with the arena forever, you release all of its memory back to the OS in a single call.

void ArenaRelease(Arena* arena) {
#ifdef _WIN32
    VirtualFree(arena->base_ptr, 0, MEM_RELEASE);
#else
    munmap(arena->base_ptr, arena->reserved_size);
#endif
    free(arena);
}

The Arena in Action: A Game Loop

Imagine a game frame. You need to allocate memory for particle effects, UI elements, temporary physics data, and AI command lists. All of this memory is needed for just 16 milliseconds, then it’s all garbage. This is the perfect use case for an arena.

// At the start of the game
Arena* frame_arena = CreateArena(1024 * 1024 * 64); // 64MB frame arena

// --- Main Game Loop ---
while (game_is_running) {
    // --- Start of Frame ---
    ArenaReset(frame_arena);

    // Allocate memory for various systems using our lightning-fast allocator
    Particle* particles = (Particle*)ArenaAlloc(frame_arena, sizeof(Particle) * 1000);
    UICommand* ui_cmds = (UICommand*)ArenaAlloc(frame_arena, sizeof(UICommand) * 50);
    TempCollisionData* coll_data = (TempCollisionData*)ArenaAlloc(frame_arena, sizeof(TempCollisionData) * 200);

    // ... do all game logic, physics, and rendering ...
    // All the pointers above are used here.

    // --- End of Frame ---
    // No need to call free() or delete on anything.
    // The ArenaReset at the top of the loop already handled it.
}

// At the end of the game
ArenaRelease(frame_arena);

We just serviced thousands of allocations with nothing but pointer bumps, and handled every one of their deallocations with a single trivial function call per frame. We have achieved memory management nirvana.

Level Up: Scoped Arenas for Nested Lifetimes

The frame arena is a powerful pattern, but what happens when a function needs its own temporary memory within a frame?

Imagine a function ProcessPhysicsCollisions() that needs to generate a temporary list of potential collision pairs. It could allocate from the main frame_arena, but then it permanently consumes that space for the rest of the frame, even though the data is only needed for a few milliseconds.

This is where the concept of a Scoped Arena (or scratch arena) comes in. Instead of creating a whole new virtual memory arena, we create a temporary, lightweight allocator that “borrows” memory from a parent arena. The mechanism is beautifully simple: we just save the state of the parent arena’s pointer and restore it when we’re done.

We can wrap this logic in a clean C++ RAII object:

// A temporary "save point" for a parent arena
struct ScopedArena {
    Arena* parent_arena;
    size_t original_offset;

    // Constructor saves the parent's current state
    ScopedArena(Arena* parent) {
        parent_arena = parent;
        original_offset = parent->current_offset;
    }

    // Destructor restores the parent's state, effectively "freeing"
    // all memory allocated during this scope's lifetime.
    ~ScopedArena() {
        parent_arena->current_offset = original_offset;
    }

    // We can even add an Alloc method for convenience
    void* Alloc(size_t size) {
        return ArenaAlloc(parent_arena, size);
    }
};

Now, our ProcessPhysicsCollisions function can have its own clean, isolated memory space without any long-term impact on the frame arena.

void ProcessPhysicsCollisions(Arena* frame_arena) {
    // All memory allocated inside this block will be automatically
    // reclaimed when 'scope' goes out of scope.
    ScopedArena scope(frame_arena);

    // This allocation is temporary
    PotentialPair* pairs = (PotentialPair*)scope.Alloc(sizeof(PotentialPair) * 1000);

    // ... do complex work with the 'pairs' list ...

} // <-- ~ScopedArena() is called here, 'frame_arena->current_offset' is reset!
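
Because each scope only records and restores an offset, scopes nest naturally as long as they are destroyed in LIFO order, which stack objects guarantee. A quick illustration with hypothetical function names:

void OuterWork(Arena* frame_arena) {
    ScopedArena outer(frame_arena);
    char* report = (char*)outer.Alloc(256);

    {
        ScopedArena inner(frame_arena);             // nested scratch space
        char* temp = (char*)inner.Alloc(64 * 1024); // temporary within the temporary
        // ... use temp to build up 'report' ...
    } // inner's 64KB is rewound here; 'report' is still valid.

} // outer's 256 bytes are rewound here.

If you adopt this pattern, it’s also worth deleting ScopedArena’s copy constructor so an accidental copy can’t rewind the parent twice.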

[Diagram: Scoped Arena]

The Performance Showdown: By the Numbers

Talk is cheap. Let’s write a benchmark. We’ll perform a stress test: allocate 1 million small, randomly-sized objects, then free them all. This is a nightmare scenario for malloc due to its overhead and search complexity.
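
The harness itself isn’t shown in this post, so here is a rough sketch of the kind of loop that produces numbers like the ones below. The size range and timing details are assumptions, and it reuses the Arena functions defined above:

#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <random>
#include <vector>

int main() {
    constexpr size_t kCount = 1000000; // 1 million allocations
    std::mt19937 rng(42);
    std::uniform_int_distribution<size_t> size_dist(16, 256); // assumed "small" sizes

    // --- malloc / free ---
    std::vector<void*> ptrs(kCount);
    auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < kCount; ++i) ptrs[i] = malloc(size_dist(rng));
    auto t1 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < kCount; ++i) free(ptrs[i]);
    auto t2 = std::chrono::steady_clock::now();

    // --- arena ---
    Arena* arena = CreateArena(1024ull * 1024 * 1024); // 1GB reservation
    rng.seed(42); // same size sequence for a fair comparison
    auto t3 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < kCount; ++i) ArenaAlloc(arena, size_dist(rng));
    auto t4 = std::chrono::steady_clock::now();
    ArenaReset(arena); // the entire "deallocation"
    auto t5 = std::chrono::steady_clock::now();
    ArenaRelease(arena);

    auto ms = [](auto a, auto b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };
    printf("malloc: alloc %.2f ms, free  %.2f ms\n", ms(t0, t1), ms(t1, t2));
    printf("arena:  alloc %.2f ms, reset %.6f ms\n", ms(t3, t4), ms(t4, t5));
    return 0;
}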

The Results:

Test               Allocation Time    Deallocation Time
malloc/free        ~8.77 ms           ~15.07 ms
Arena Allocator    ~0.84 ms           ~0.042 µs (42 ns)

Ran on an M4 Pro 12-core MacBook.

The arena isn’t just a little faster. It’s in a completely different league. In this run, the allocations were roughly 10.4x faster because they are just pointer bumps. The deallocation is astronomically faster, over 358,000x, because it’s a single integer assignment, so fast we have to measure it in nanoseconds. This isn’t an optimization; it’s a paradigm shift.

Embracing the Arena’s “Limitations”

At first glance, the arena’s “all or nothing” deallocation seems like a major limitation. It’s natural to think arenas are only useful for niche cases or trivial, throwaway data. But this perspective misses the bigger picture. In high-performance software like game engines, the vast majority of allocations are not random; they are grouped by a well-defined lifetime.

Think about it: the particle effects, UI commands, temporary physics data, and AI command lists from our game loop all come into existence at the start of a frame and all become garbage 16 milliseconds later.

These are not individual, unrelated allocations. They are groups of objects that are born together and die together. This is the exact pattern the arena is designed to master. The “limitation” of no individual free becomes a feature, forcing you to think about memory in terms of lifetimes, which leads to simpler, more robust code. It structurally eliminates entire classes of bugs like use-after-free and memory leaks from forgotten delete calls.

But what about truly temporary, scratch memory?

This is where the ScopedArena pattern shines. It’s the standard solution for handling nested lifetimes. If your main frame_arena lives for 16ms, but a function within that frame needs memory for only 50 microseconds, you create a ScopedArena. This gives you a “scratch pad” that borrows from the main arena and automatically cleans itself up when the function ends, without permanently consuming space from the frame’s budget. It’s the perfect way to handle temporary data without compromising the efficiency of the parent arena.

Conclusion

The arena allocator is not a drop-in replacement for every call to new. It’s a foundational tool that requires a shift in thinking—from managing individual objects to managing object lifetimes. Once you make that shift, you’ll find that arenas can handle almost all of the allocations in a typical game loop, making your code faster, safer, and easier to reason about.

And what about those remaining, long-lived objects that truly do need individual deallocation? As we’ll see in Part 3, the arena still serves as the perfect, high-performance foundation upon which we can build even more powerful, specialized allocators.

