Work-Stealing Deque Part 3: The Lock-Free Owner


Now that we understand memory ordering, we can start building the lock-free version of our deque. We’ll focus on the Owner’s operations first, specifically push_bottom and pop_bottom.

The Data Structure

First, we need to update our class members. We replace the raw integers with std::atomic and add some padding.

Preventing False Sharing

As we discussed in Part 1, we don’t want the top and bottom indices to sit on the same cache line.

If they share a cache line, the Owner and Thieves will constantly invalidate each other’s caches. We use alignas to force them apart.

[Figure: False Sharing]

#include <atomic>   // std::atomic, std::atomic_thread_fence
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint64_t
#include <new>      // std::hardware_destructive_interference_size

template<typename T>
class WSDeque {
public:
    WSDeque(std::size_t capacity_hint) { /* ... */ }
    ~WSDeque() { delete[] m_buffer; }

    bool push_bottom(T item);
    bool pop_bottom(T* out);
    bool steal_top(T* out); // To be implemented in Part 4

private:
    std::size_t m_capacity, m_mask;
    T* m_buffer;

    // Force m_bottom and m_top to be on separate cache lines
    alignas(std::hardware_destructive_interference_size) 
        std::atomic<std::uint64_t> m_bottom;
        
    alignas(std::hardware_destructive_interference_size) 
        std::atomic<std::uint64_t> m_top;
};

The Owner’s push_bottom

This is a classic Producer operation. The Owner produces a task and publishes it to the deque.

We need to ensure that:

  1. We read the current top to check for space.
  2. We write the item to the buffer.
  3. We update bottom to make the item visible.

template<typename T>
bool WSDeque<T>::push_bottom(T item) {
    // 1. Load `bottom`. Relaxed is fine. Only we (the owner) write to it.
    std::uint64_t b = m_bottom.load(std::memory_order_relaxed);

    // 2. Load `top`. We need ACQUIRE to ensure we see the latest updates 
    // from thieves (who might have incremented top).
    std::uint64_t t = m_top.load(std::memory_order_acquire);

    if (b - t >= m_capacity) {
        return false; // Deque is full.
    }

    // 3. Write the item. No atomics needed for the buffer slot itself
    // because it's not effectively public yet.
    m_buffer[b & m_mask] = item;

    // 4. Publish. RELEASE ensures the write to m_buffer happens BEFORE
    // the update to m_bottom is visible to others.
    m_bottom.store(b + 1, std::memory_order_release);
    return true;
}

The Owner’s pop_bottom

This one is trickier. Usually the Owner pops from the bottom and Thieves steal from the top, so the two ends never interact.

But what if there is only one item left?

In that case, top and bottom are adjacent. The Owner tries to pop the last item (decrementing bottom), while a Thief might try to steal it (incrementing top) at the exact same time.

To handle this, we use a Claim, Fence, Check strategy. We need a seq_cst fence to prevent the CPU from reordering the “write to bottom” with the “read from top”. Without this fence, the CPU’s store buffer might hold the new bottom value locally, causing the Owner to read a stale top and think it’s safe to proceed, while the Thief does the exact same thing from the other side.

template<typename T>
bool WSDeque<T>::pop_bottom(T* out) {
    std::uint64_t b = m_bottom.load(std::memory_order_relaxed);
    
    // Optimization: Check if empty before doing expensive work.
    // We use ACQUIRE to see recent steals.
    std::uint64_t t = m_top.load(std::memory_order_acquire);
    if (t >= b) {
        return false; // Empty
    }

    // 1. CLAIM: Decrement bottom. We tentatively claim the item.
    b--;
    m_bottom.store(b, std::memory_order_relaxed);

    // 2. FENCE: A SEQ_CST fence ensures
    // that our store to `bottom` is globally visible before we load `top`.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // 3. CHECK: Reload `top` to see if we raced.
    t = m_top.load(std::memory_order_relaxed);

    if (t <= b) {
        // Good case: There are items left (t < b) or we took the last one (t == b).
        T item = m_buffer[b & m_mask];

        if (t == b) { 
            // This was the LAST item. We need to be careful.
            // A thief might have incremented `top` concurrently.
            // We try to increment `top` ourselves to "seal" the deal.
            if (!m_top.compare_exchange_strong(t, t + 1,
                    std::memory_order_seq_cst, std::memory_order_relaxed)) {
                
                // We failed. A thief won the race and stole the item.
                // Restore `bottom` to reflect that the deque is empty.
                m_bottom.store(b + 1, std::memory_order_relaxed);
                return false;
            }
            // We won the race. `top` is now b + 1 while `bottom` is
            // still b, so we restore `bottom` to match `top`: the
            // deque then reads as empty (top == bottom).
            m_bottom.store(b + 1, std::memory_order_relaxed);
        }
        *out = item;
        return true;
    } else {
        // Bad case: t > b. This means a thief stole the item *after* we decremented bottom.
        // The deque is empty. Restore `bottom` and fail.
        m_bottom.store(b + 1, std::memory_order_relaxed);
        return false;
    }
}

This logic ensures that even in the worst-case scenario (simultaneous pop and steal of the last item), only one thread succeeds.

In the final part, we’ll implement the other side of this race: the Thief.

