Skip to content
Go back

Work-Stealing Deque Part 4: The Lock-Free Thief

Suggest an edit

Part 1 Part 2 Part 3 Part 4

We have the Owner’s code working. Now we need to implement the Thief’s logic, steal_top.

This function is called by other threads when they run out of work. They look at our deque and try to grab the oldest task.

The Thief’s steal_top

The logic here mirrors the Owner’s pop_bottom. It also needs to participate in the seq_cst handshake to handle the race for the last item.

A note on the ABA Problem: In many lock-free data structures, if a value changes from A to B and back to A, a thread might think nothing changed. Here, because our top and bottom indices are strictly increasing 64-bit integers, they will practically never wrap around (it would take centuries). This makes the ABA problem impossible for all practical purposes, simplifying our logic significantly compared to a pointer-based stack.

template<typename T>
bool WSDeque<T>::steal_top(T* out) {
    // 1. Load `top`. ACQUIRE ensures we see the latest state.
    std::uint64_t t = m_top.load(std::memory_order_acquire);

    // 2. FENCE. This matches the fence in `pop_bottom`.
    // It ensures our read of `top` happens before we read `bottom`.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // 3. Load `bottom`. ACQUIRE ensures that if we see a new bottom,
    // we also see the data written to the buffer.
    std::uint64_t b = m_bottom.load(std::memory_order_acquire);

    if (t < b) {
        // There is at least one item.
        // It is safe to read from the buffer *before* the CAS because
        // we know the item exists at index `t`.
        T item = m_buffer[t & m_mask];
        
        // 4. CAS. We try to increment `top` to claim the item.
        // This is the final arbiter. If multiple thieves (or the owner)
        // are fighting for this item, only one CAS will succeed.
        if (m_top.compare_exchange_strong(t, t + 1,
                std::memory_order_seq_cst, std::memory_order_relaxed)) {
            // Success! We stole the item.
            *out = item;
            return true;
        }
    }
    
    // Failed. Either empty, or we lost the race.
    return false;
}

How It All Fits Together

The correctness of this deque relies on the interactions between the fences and atomic operations.

1. Push vs. Steal (Producer-Consumer)

2. Steal vs. Steal (Thieves Fighting)

3. Pop vs. Steal (The Critical Race)

Practical Tips for the Scheduler

Now that you have a deque, you need a scheduler to use it. Here are two tips for building a robust work-stealing thread pool.

1. Random Victim Selection

When a thief needs to steal, which deque should it target? Don’t use a complex heuristic. Just pick a random thread. It’s statistically robust and simple. Crucial: Do not use a shared stateful random number generator (like std::mt19937 or rand()) protected by a lock. That introduces the same contention we just fought to remove. Instead, use a thread_local generator for each thread or a stateless hash function.

2. Exponential Backoff

If a thief tries to steal from everyone and fails, the system is likely under load (or empty). Don’t spin tightly in a loop. You’ll burn CPU cycles and slow down the working threads by contending for cache lines. Implement an exponential backoff:

  1. Yield (std::this_thread::yield()).
  2. Sleep for 1 microsecond.
  3. Sleep for 2 microseconds… up to a cap (e.g., 1ms). Reset the backoff as soon as you find work.

Conclusion

We started this series with a problem: a dynamic, unpredictable task tree that couldn’t be efficiently managed by a single shared queue. We saw how locks create bottlenecks and how false sharing silently kills performance.

By building this lock-free deque, we’ve created a structure that allows every thread to work independently on its own branch of that tree, only interacting when absolutely necessary. This is the engine of modern parallelism, powering systems like Intel TBB and Go’s scheduler.


Suggest an edit
Share this post on:

Previous Post
Taming Time in Game Engines
Next Post
Work-Stealing Deque Part 3: The Lock-Free Owner