Chapter 22 of 23

Concurrency and Multithreading

Learn to write concurrent C++ programs using std::thread, mutexes, atomic variables, condition variables, and std::async — with a producer-consumer queue as a practical worked example.

Meritshot · 10 min read
Tags: C++, Multithreading, std::thread, Mutex, std::atomic, std::async, Concurrency

Modern CPUs ship with 8, 16, or even 128 cores, yet a single-threaded program runs on only one of them at a time. For computationally intensive work (image processing, data analytics, game physics) this leaves most of the machine idle. Companies like Dream11, Razorpay, and Ola run backend services that must handle thousands of concurrent operations. C++11 standardised a portable threading library, so you no longer have to write directly against platform-specific APIs such as pthreads or Windows threads. It gives you std::thread, std::mutex, std::atomic, std::condition_variable, and std::future out of the box.

This chapter walks through all of these tools and ends with a classic interview problem: a producer-consumer queue that is safe under concurrent access.


std::thread — Launching a Thread

Include <thread> to use std::thread. You construct a thread object by passing it a callable (function, lambda, functor) and its arguments:

#include <iostream>
#include <string>
#include <thread>

void greet(const std::string& name) {
    std::cout << "Hello from " << name << "!\n";
}

int main() {
    std::thread t(greet, "Arjun");
    t.join(); // wait for t to finish
    std::cout << "Main thread done.\n";
}

The thread starts running immediately after construction. The call to join() blocks the calling thread until t completes.

Joining vs Detaching

Method | Behaviour
t.join() | Calling thread blocks until t finishes. Safe and recommended for most use cases.
t.detach() | Thread runs independently ("fire and forget"). The runtime cleans it up when it finishes. Dangerous if the thread accesses local variables of the spawning function.

Rule: Before a std::thread object is destroyed, you must call either join() or detach() on it. If the object is still joinable when its destructor runs, the destructor calls std::terminate() and the program aborts.

#include <iostream>
#include <thread>

int main() {
    std::thread t([]() {
        std::cout << "Background task running\n";
    });

    // Must join or detach before t goes out of scope
    t.join();
}

Passing Arguments to Threads

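Arguments are copied (or moved) into the new thread's own storage by default, so the thread never refers to the caller's originals unless you ask for it. Use std::ref to pass by reference, or a lambda to capture variables. Passing a plain variable to a function that takes a non-const reference, without std::ref, does not compile.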

#include <iostream>
#include <thread>
#include <functional>

void increment(int& val) {
    val += 10;
}

int main() {
    int counter = 0;
    std::thread t(increment, std::ref(counter));
    t.join();
    std::cout << "Counter: " << counter << "\n"; // 10
}

A lambda capture is often cleaner:

int counter = 0;
std::thread t([&counter]() { counter += 10; });
t.join();

Race Conditions

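When two threads access the same variable concurrently without synchronisation, and at least one of them writes, you have a race condition: the result depends on the order in which the threads happen to be scheduled, which is non-deterministic. On a non-atomic variable the C++ standard calls this a data race and makes it undefined behaviour.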

#include <iostream>
#include <thread>

int total = 0; // shared variable — DANGER

void addMillion() {
    for (int i = 0; i < 1000000; ++i)
        total++; // not atomic: read-modify-write is three steps
}

int main() {
    std::thread t1(addMillion);
    std::thread t2(addMillion);
    t1.join();
    t2.join();
    std::cout << total << "\n"; // Expected 2000000; usually prints less (data race)
}

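On a plain int, total++ is a read-modify-write that conceptually takes three steps: load, add, store. Two threads can interleave these steps and overwrite each other's result, losing increments. Because this is a data race, the program has undefined behaviour; in practice the final value is unpredictable and usually below 2000000.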


std::mutex and std::lock_guard

A mutex (mutual exclusion) ensures that only one thread executes a critical section at a time.

#include <iostream>
#include <thread>
#include <mutex>

int total = 0;
std::mutex mtx;

void addMillion() {
    for (int i = 0; i < 1000000; ++i) {
        std::lock_guard<std::mutex> lock(mtx); // locks on construction
        total++;
    } // lock released when 'lock' goes out of scope (RAII)
}

int main() {
    std::thread t1(addMillion);
    std::thread t2(addMillion);
    t1.join();
    t2.join();
    std::cout << total << "\n"; // 2000000 — always correct
}

std::lock_guard is RAII: it locks the mutex on construction and unlocks it on destruction. You never forget to unlock, even if an exception is thrown.

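Performance note: Locking on every iteration adds a lock/unlock pair to each of the two million increments and forces the threads to take turns, so this version can easily run slower than a single-threaded loop. In real code the critical section should be redesigned (e.g., accumulate into a thread-local variable, then take the lock once at the end). We keep it simple here for illustration.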

std::unique_lock

std::unique_lock is a more flexible RAII wrapper: you can unlock early, defer locking, or use it with condition variables. It has a small overhead over lock_guard due to the extra flexibility.

std::unique_lock<std::mutex> lock(mtx);
// ... do work ...
lock.unlock(); // unlock early if needed
// ... do other work without the lock ...

std::atomic — Lock-Free Simple Operations

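For simple types (integers, booleans, pointers), std::atomic provides thread-safe operations without an explicit mutex. Each atomic operation is indivisible: no other thread can observe it half-done. On mainstream platforms these types are usually lock-free, but the standard only guarantees that for std::atomic_flag; the is_lock_free() member reports what your platform actually does.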

#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> total{0};

void addMillion() {
    for (int i = 0; i < 1000000; ++i)
        total++; // atomic increment — no mutex needed
}

int main() {
    std::thread t1(addMillion);
    std::thread t2(addMillion);
    t1.join();
    t2.join();
    std::cout << total << "\n"; // 2000000 — correct and fast
}

Use std::atomic when:

  • The shared data is a simple type (int, bool, pointer).
  • The operation is a single read-modify-write (increment, compare-and-swap).

Use a std::mutex when the critical section involves multiple variables or complex invariants that must be updated together.


std::condition_variable — Synchronising on a Condition

A condition variable lets a thread wait until some condition becomes true, without busy-spinning. It must be used with a std::unique_lock.

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::condition_variable cv;
bool dataReady = false;

void producer() {
    {
        std::unique_lock<std::mutex> lock(mtx);
        dataReady = true;
        std::cout << "Producer: data is ready\n";
    }
    cv.notify_one(); // wake up one waiting thread
}

void consumer() {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, []{ return dataReady; }); // releases lock while waiting
    std::cout << "Consumer: received data\n";
}

int main() {
    std::thread c(consumer);
    std::thread p(producer);
    p.join();
    c.join();
}

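cv.wait(lock, pred) first checks the predicate; if it is already true, the call returns immediately without sleeping. Otherwise it atomically releases the mutex and puts the thread to sleep. When cv.notify_one() is called from another thread (or a spurious wakeup occurs), the waiting thread wakes up, re-acquires the lock, and checks the predicate again. If the predicate is still false, it goes back to sleep. This is why the example works even if the producer happens to run before the consumer starts waiting.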


std::async and std::future

std::async launches a task asynchronously (possibly on a new thread) and returns a std::future that you can later query for the result:

#include <future>
#include <iostream>

// long long: the partial sums exceed the range of a 32-bit int
long long sumRange(long long start, long long end) {
    long long total = 0;
    for (long long i = start; i <= end; ++i) total += i;
    return total;
}

int main() {
    // Launch two tasks concurrently
    auto f1 = std::async(std::launch::async, sumRange, 1,      500000);
    auto f2 = std::async(std::launch::async, sumRange, 500001, 1000000);

    long long result = f1.get() + f2.get(); // blocks until both finish
    std::cout << "Sum 1..1000000 = " << result << "\n"; // 500000500000
}

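std::launch::async runs the task as if on a new thread. std::launch::deferred runs the task lazily, on the thread that first calls get() or wait(). Without a policy, the default is std::launch::async | std::launch::deferred, so the implementation chooses between the two.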

future.get() blocks until the result is ready and returns it. If the async function threw an exception, get() rethrows it.


Worked Example: Producer-Consumer Queue

The producer-consumer pattern is a classic concurrency problem that appears in operating systems, message queues, and streaming pipelines. One or more threads produce work items and push them into a shared queue; one or more threads consume items and process them.

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <string>
#include <chrono>

class SafeQueue {
public:
    void push(const std::string& item) {
        {
            std::unique_lock<std::mutex> lock(mtx_);
            queue_.push(item);
        }
        cv_.notify_one(); // wake a waiting consumer
    }

    // Returns false if stopped and queue is empty
    bool pop(std::string& item) {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this]{ return !queue_.empty() || stopped_; });

        if (queue_.empty()) return false; // stopped with no items

        item = queue_.front();
        queue_.pop();
        return true;
    }

    void stop() {
        {
            std::unique_lock<std::mutex> lock(mtx_);
            stopped_ = true;
        }
        cv_.notify_all(); // wake all consumers so they can exit
    }

private:
    std::queue<std::string> queue_;
    std::mutex              mtx_;
    std::condition_variable cv_;
    bool                    stopped_ = false;
};

// Producer: sends task IDs into the queue
void producer(SafeQueue& q, int count) {
    for (int i = 0; i < count; ++i) {
        std::string task = "Task-" + std::to_string(i);
        q.push(task);
        std::cout << "[Producer] Pushed: " << task << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    q.stop(); // signal consumers that no more items will come
}

// Consumer: processes items from the queue
void consumer(SafeQueue& q, int id) {
    std::string item;
    while (q.pop(item)) {
        std::cout << "[Consumer " << id << "] Processing: " << item << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(25));
    }
    std::cout << "[Consumer " << id << "] Done.\n";
}

int main() {
    SafeQueue q;

    // One producer, two consumers
    std::thread prod(producer, std::ref(q), 6);
    std::thread cons1(consumer, std::ref(q), 1);
    std::thread cons2(consumer, std::ref(q), 2);

    prod.join();
    cons1.join();
    cons2.join();

    std::cout << "All done.\n";
}

Key design decisions:

  • The mutex protects both the queue and the stopped_ flag.
  • The condition variable uses a predicate that checks both !queue_.empty() and stopped_, handling spurious wakeups.
  • stop() calls notify_all() so every waiting consumer wakes up and can exit cleanly.
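  • The queue itself is already safe for multiple producers, because every push() takes the same mutex. Only shutdown needs care: stop() must be called once, after all producers have finished, not by whichever producer finishes first.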

Common Pitfalls

1. Forgetting to join or detach. Destroying a std::thread that is still joinable calls std::terminate(). Always join or detach before the thread object goes out of scope. Consider wrapping threads in RAII helpers or using std::jthread (C++20), which joins automatically in its destructor.

2. Data races on non-atomic shared variables. Any unsynchronised concurrent access to a non-atomic variable where at least one access is a write is undefined behaviour. Use a mutex or std::atomic.

3. Deadlocks from lock ordering. If thread A holds mutex1 and waits for mutex2, while thread B holds mutex2 and waits for mutex1, both block forever. Always acquire multiple mutexes in a consistent global order, or use std::scoped_lock (C++17), which applies a deadlock-avoidance algorithm when it locks several mutexes.

4. Spurious wakeups. cv.wait(lock) without a predicate can return spuriously (without notify being called). Always pass a predicate: cv.wait(lock, [] { return condition; });.

5. Calling get() twice on a future. get() moves the result out and leaves the std::future invalid (valid() returns false). Calling get() again is undefined behaviour; many implementations throw std::future_error, but you should not rely on that. A future can only be consumed once; use std::shared_future if several readers need the result.

6. Expensive locking in tight loops. Locking a mutex on every iteration of a million-iteration loop adds a lock/unlock pair to every step and serialises the threads. Batch work and lock once per batch, or use std::atomic for simple counters.


Practice Exercises

  1. Launch 5 threads, each printing its thread ID (hint: std::this_thread::get_id()) 3 times. Add a mutex so the prints do not interleave.

  2. Rewrite the race-condition example using std::atomic<int> and verify the output is always 2000000.

  3. Implement a thread-safe Counter class with increment(), decrement(), and get() methods, using a std::mutex internally.

  4. Use std::async to compute the maximum element in each half of a large vector concurrently, then combine the results.

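  5. Extend the SafeQueue to support multiple producers. push() needs no code change (explain why it is already safe), but decide which thread should call stop() so that consumers do not exit while other producers are still pushing. Then add a size() method that returns the current number of items in the queue.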


Summary

  • std::thread launches concurrent execution; you must join() or detach() before the thread object is destroyed.
  • Arguments to threads are copied by default; use std::ref() or a lambda capture for references.
  • A data race occurs when two threads access the same non-atomic variable without synchronisation and at least one access is a write; the result is undefined behaviour.
  • std::mutex with std::lock_guard (RAII) serialises access to shared data and prevents races.
  • std::atomic<T> provides thread-safe operations for simple types like integers and booleans without an explicit mutex; these are lock-free on most platforms.
  • std::condition_variable lets threads wait efficiently for a condition without busy-spinning; always use it with a predicate to handle spurious wakeups.
  • std::async with std::future provides a high-level way to run tasks asynchronously and retrieve their results.
  • The producer-consumer pattern uses a mutex-protected queue and a condition variable for producer-consumer coordination.
  • Common bugs: missing join/detach, data races, deadlocks from inconsistent lock ordering, and future.get() called twice.