Work-Stealing Deque Part 3: The Lock-Free Owner


Now that we understand memory ordering, we can start building the lock-free version of our deque. We’ll focus on the Owner’s operations first, specifically push_bottom and pop_bottom.

The Data Structure

First, we need to update our class members. We replace the raw integers with std::atomic and add some padding.

Preventing False Sharing

As we discussed in Part 1, we don’t want the top and bottom indices to sit on the same cache line.

If they share a cache line, the Owner and Thieves will constantly invalidate each other’s caches. We use alignas to force them apart.

[Figure: False Sharing]

#include <atomic>   // std::atomic, std::atomic_thread_fence
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint64_t
#include <new>      // std::hardware_destructive_interference_size

template<typename T>
class WSDeque {
public:
    WSDeque(std::size_t capacity_hint) { /* ... */ }
    ~WSDeque() { delete[] m_buffer; }

    bool push_bottom(T item);
    bool pop_bottom(T* out);
    bool steal_top(T* out); // To be implemented in Part 4

private:
    std::size_t m_capacity, m_mask;
    T* m_buffer;

    // Force m_bottom and m_top to be on separate cache lines
    alignas(std::hardware_destructive_interference_size) 
        std::atomic<std::uint64_t> m_bottom;
        
    alignas(std::hardware_destructive_interference_size) 
        std::atomic<std::uint64_t> m_top;
};

The Owner’s push_bottom

This is a classic Producer operation. The Owner produces a task and publishes it to the deque.

We need to ensure that:

  1. We read the current top to check for space.
  2. We write the item to the buffer.
  3. We update bottom to make the item visible.

template<typename T>
bool WSDeque<T>::push_bottom(T item) {
    // 1. Load `bottom`. Relaxed is fine. Only we (the owner) write to it.
    std::uint64_t b = m_bottom.load(std::memory_order_relaxed);

    // 2. Load `top`. We need ACQUIRE to ensure we see the latest updates 
    // from thieves (who might have incremented top).
    std::uint64_t t = m_top.load(std::memory_order_acquire);

    if (b - t >= m_capacity) {
        return false; // Deque is full.
    }

    // 3. Write the item. No atomics needed for the buffer slot itself
    // because it's not effectively public yet.
    m_buffer[b & m_mask] = item;

    // 4. Publish. RELEASE ensures the write to m_buffer happens BEFORE
    // the update to m_bottom is visible to others.
    m_bottom.store(b + 1, std::memory_order_release);
    return true;
}

The Owner’s pop_bottom

This one is trickier. Usually the Owner pops from the bottom and Thieves steal from the top, so the two ends never interact.

But what if there is only one item left?

In that case, top and bottom are adjacent. The Owner tries to pop the last item (decrementing bottom), while a Thief might try to steal it (incrementing top) at the exact same time.

To handle this, we use a Claim, Fence, Check strategy. We need a seq_cst fence to prevent the CPU from reordering the “write to bottom” with the “read from top”. Without this fence, the CPU’s store buffer might hold the new bottom value locally, causing the Owner to read a stale top and think it’s safe to proceed, while the Thief does the exact same thing from the other side.

template<typename T>
bool WSDeque<T>::pop_bottom(T* out) {
    std::uint64_t b = m_bottom.load(std::memory_order_relaxed);
    
    // Optimization: Check if empty before doing expensive work.
    // We use ACQUIRE to see recent steals.
    std::uint64_t t = m_top.load(std::memory_order_acquire);
    if (t >= b) {
        return false; // Empty
    }

    // 1. CLAIM: Decrement bottom. We tentatively claim the item.
    b--;
    m_bottom.store(b, std::memory_order_relaxed);

    // 2. FENCE: A SEQ_CST fence ensures
    // that our store to `bottom` is globally visible before we load `top`.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // 3. CHECK: Reload `top` to see if we raced.
    t = m_top.load(std::memory_order_relaxed);

    if (t <= b) {
        // Good case: There are items left (t < b) or we took the last one (t == b).
        T item = m_buffer[b & m_mask];

        if (t == b) { 
            // This was the LAST item. We need to be careful.
            // A thief might have incremented `top` concurrently.
            // We try to increment `top` ourselves to "seal" the deal.
            if (!m_top.compare_exchange_strong(t, t + 1,
                    std::memory_order_seq_cst, std::memory_order_relaxed)) {
                
                // We failed. A thief won the race and stole the item.
                // Restore `bottom` to reflect that the deque is empty.
                m_bottom.store(b + 1, std::memory_order_relaxed);
                return false;
            }
            // We won the race. `top` is now b + 1 while `bottom` is
            // still b, so we restore `bottom` to match `top`: the
            // deque then reads as empty (top == bottom).
            m_bottom.store(b + 1, std::memory_order_relaxed);
        }
        *out = item;
        return true;
    } else {
        // Bad case: t > b. This means a thief stole the item *after* we decremented bottom.
        // The deque is empty. Restore `bottom` and fail.
        m_bottom.store(b + 1, std::memory_order_relaxed);
        return false;
    }
}

This logic ensures that even in the worst-case scenario (simultaneous pop and steal of the last item), only one thread succeeds.

In the final part, we’ll implement the other side of this race: the Thief.

