This final chapter completes our journey. We will implement the steal_top function, putting the final gear in place. Then, we will stand back and analyze the machine as a whole, observing the beautiful and intricate dance of memory fences that makes the entire lock-free algorithm safe.
Implementing steal_top: The Thief’s Gambit
A thief’s logic is a mirror image of the owner’s pop. It must also contend for the last item, not just with other thieves, but with the owner. To do this safely, it must participate in the same global ordering protocol established in pop_bottom.
template<typename T>
bool WSDeque<T>::steal_top(T* out) {
    // 1. Load `top`. An acquire load ensures we see the latest value of `top`
    // as updated by other competing thieves.
    std::uint64_t t = m_top.load(std::memory_order_acquire);
    // 2. FENCE: The thief's half of the global ordering agreement. This
    // `seq_cst` fence synchronizes with the fence in `pop_bottom`. It
    // ensures that in the critical race for the last item, the two
    // operations are strictly ordered relative to each other.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    // 3. Load `bottom`. The acquire semantic synchronizes with the owner's
    // `release` store in `push_bottom`, guaranteeing that if we see an
    // increased `bottom`, we also see the item that was written to the
    // buffer.
    std::uint64_t b = m_bottom.load(std::memory_order_acquire);
    if (t < b) {
        // Deque appears to have at least one item.
        // It's safe to read from the buffer now, before the CAS.
        T item = m_buffer[t & m_mask];
        
        // 4. Attempt to claim the item. The `seq_cst` CAS is the final,
        // incorruptible referee. It participates in the single global
        // timeline and ensures that only one operation, this one, another
        // thief's, or the owner's in pop_bottom, can succeed.
        if (m_top.compare_exchange_strong(t, t + 1,
                std::memory_order_seq_cst, std::memory_order_relaxed)) {
            // Success! The item is ours.
            *out = item;
            return true;
        }
    }
    
    // The deque was empty, or we lost the CAS race.
    return false;
}
The Grand Unification: How It All Works Together
Our deque is complete. The owner pushes and pops from the bottom, while thieves steal from the top. Most of the time, they work on opposite ends and never interact. But when the deque is nearly empty, they collide. Let’s analyze how our carefully placed fences make these interactions safe.
Interaction 1: Producer-Consumer (Push ↔ Steal)
This is the simplest case, the classic “Mailbox Handshake.”
- In 
push_bottom, the owner writes tom_bufferthen executesm_bottom.store(..., std::memory_order_release). - In 
steal_top, the thief executesm_bottom.load(..., std::memory_order_acquire). - The Guarantee: The release-acquire pair ensures that if the thief’s load sees the updated 
bottomvalue, it is guaranteed to also see the data that was written to the buffer slot. The task is always in the buffer before it is made available. 
Interaction 2: The Multi-Thief Heist (Steal ↔ Steal)
What happens if two thieves try to steal the very last item at the exact same time?
- Both thieves will load the same 
topvalue, say5. - Both will pass the 
t < bcheck. - Both will attempt 
compare_exchange_strongto changetopfrom5to6. - The Guarantee: The CAS operation on 
m_topis atomic. Only one of them can succeed. Let’s say Thief A wins. Its CAS succeeds, and it takes the item. When Thief B’s CAS runs, it will see thattopis now6, not the5it expected, so its CAS will fail harmlessly. Theseq_cstordering ensures the winner’s update is published correctly to all other contenders. 
Interaction 3: The Critical Race (Pop ↔ Steal)
This is the heart of the algorithm’s correctness: the owner and a thief race for the last item.
- The owner’s 
pop_bottomexecutesm_bottom.store(relaxed)followed byfence(seq_cst). - The thief’s 
steal_topexecutes its ownfence(seq_cst). - The Guarantee: The C++ memory model mandates a single, total order for all 
seq_cstoperations. Therefore, the owner’s fence and the thief’s fence must execute in a specific order in this global timeline.- If Pop’s fence happens first: The owner’s claim on 
bottomis now globally visible. When the thief later loadsbottom, it is guaranteed to see the new, decremented value and will correctly conclude thatt >= b, causing it to fail without attempting a CAS. - If Steal’s fence happens first: The thief has established its view of the memory state. When the owner reloads 
topafter its fence, it is guaranteed to see a value oftopthat is at least as up-to-date as the one the thief saw. 
 - If Pop’s fence happens first: The owner’s claim on 
 - This “fencing duel” ensures that at least one of the participants gets a consistent, up-to-date view of the deque’s size. This prevents them from both mistakenly believing they can take the last item. The final, tie-breaking 
seq_cstCAS onm_topsimply confirms who won the race that the fences already refereed. 
Building the Work-Stealing Pool
Victim Selection & Per-Thread Randomness
A good scheduler needs a smart way for idle threads to choose a victim. Random selection is a robust general-purpose strategy. To avoid creating a new bottleneck, never use a single, shared random number generator with a mutex. The correct solution is a thread_local generator, giving each thread its own private instance with zero contention.
The Art of Idleness: Backoff Strategies
When a thread fails to find work, std::this_thread::yield() is a simple choice. But a better one is exponential backoff:
- If a thread fails to find work, it waits for a tiny duration.
 - If it fails again, it doubles the wait time, up to a maximum.
 - On success, the wait time resets. This keeps the system responsive when busy and quiet when idle.
 
Conclusion: The End of the Dojo
Our journey is complete. We started with the simple problem of a shared queue and progressed step-by-step to one of the most elegant and powerful data structures in concurrent programming. You have met the Out of Order Monster, and used the fundamental principles of memory ordering to tame it. The dojo is now yours.