Concurrency and Multithreading
Modern CPUs ship with 8, 16, or even 128 cores. Yet a single-threaded program uses exactly one of them. For computationally intensive work — image processing, data analytics, game physics — this is an enormous waste. Companies like Dream11, Razorpay, and Ola run backend services that must handle thousands of concurrent operations. C++11 standardised a portable threading library (no more platform-specific pthreads or Windows threads), giving you std::thread, std::mutex, std::atomic, std::condition_variable, and std::future out of the box.
This chapter walks through all of these tools and ends with a classic interview problem: a producer-consumer queue that is safe under concurrent access.
std::thread — Launching a Thread
Include <thread> to use std::thread. You construct a thread object by passing it a callable (function, lambda, functor) and its arguments:
#include <iostream>
#include <string>
#include <thread>
void greet(const std::string& name) {
std::cout << "Hello from " << name << "!\n";
}
int main() {
std::thread t(greet, "Arjun");
t.join(); // wait for t to finish
std::cout << "Main thread done.\n";
}
The thread starts running immediately after construction. The call to join() blocks the calling thread until t completes.
Joining vs Detaching
| Method | Behaviour |
|---|---|
| t.join() | Calling thread blocks until t finishes. Safe and recommended for most use cases. |
| t.detach() | Thread runs independently ("fire and forget"). The runtime cleans it up when it finishes. Dangerous if the thread accesses local variables of the spawning function. |
Rule: Before a std::thread object is destroyed, you must call either join() or detach(). If you do neither, the destructor calls std::terminate().
#include <iostream>
#include <thread>
int main() {
std::thread t([]() {
std::cout << "Background task running\n";
});
// Must join or detach before t goes out of scope
t.join();
}
Passing Arguments to Threads
Arguments are copied into the thread by default. Use std::ref to pass by reference, or a lambda to capture variables:
#include <iostream>
#include <thread>
#include <functional>
void increment(int& val) {
val += 10;
}
int main() {
int counter = 0;
std::thread t(increment, std::ref(counter));
t.join();
std::cout << "Counter: " << counter << "\n"; // 10
}
A lambda capture is often cleaner:
int counter = 0;
std::thread t([&counter]() { counter += 10; });
t.join();
Race Conditions
When two threads read and write the same variable concurrently without synchronisation, you have a race condition — the result depends on the order in which the threads happen to be scheduled, which is non-deterministic.
#include <iostream>
#include <thread>
int total = 0; // shared variable — DANGER
void addMillion() {
for (int i = 0; i < 1000000; ++i)
total++; // not atomic: read-modify-write is three steps
}
int main() {
std::thread t1(addMillion);
std::thread t2(addMillion);
t1.join();
t2.join();
std::cout << total << "\n"; // Expected 2000000, but usually prints less
}
The ++ operator is a read-modify-write: conceptually a load, an add, and a store. Two threads can interleave these steps, losing increments. Formally this is a data race, which is undefined behaviour in C++, so the final value is unpredictable.
std::mutex and std::lock_guard
A mutex (mutual exclusion) ensures that only one thread executes a critical section at a time.
#include <iostream>
#include <thread>
#include <mutex>
int total = 0;
std::mutex mtx;
void addMillion() {
for (int i = 0; i < 1000000; ++i) {
std::lock_guard<std::mutex> lock(mtx); // locks on construction
total++;
} // lock released when 'lock' goes out of scope (RAII)
}
int main() {
std::thread t1(addMillion);
std::thread t2(addMillion);
t1.join();
t2.join();
std::cout << total << "\n"; // 2000000 — always correct
}
std::lock_guard is RAII: it locks the mutex on construction and unlocks it on destruction. You never forget to unlock, even if an exception is thrown.
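Performance note: This version locks and unlocks the mutex two million times, and the two threads contend for it on every increment, so it can easily run slower than a single-threaded loop. In real code the critical section should be redesigned (e.g., accumulate into a thread-local variable, then take the lock once to add the subtotal). We keep it simple here for illustration.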
std::unique_lock
std::unique_lock is a more flexible RAII wrapper: you can unlock early, defer locking, or use it with condition variables. It has a small overhead over lock_guard due to the extra flexibility.
std::unique_lock<std::mutex> lock(mtx);
// ... do work ...
lock.unlock(); // unlock early if needed
// ... do other work without the lock ...
std::atomic — Lock-Free Simple Operations
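For simple types (integers, booleans, pointers), std::atomic provides thread-safe operations without a mutex. Atomic operations are indivisible: no other thread can observe one half-done. On mainstream platforms these types are usually lock-free, but the standard guarantees that only for std::atomic_flag; call is_lock_free() to check a particular type.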
#include <iostream>
#include <thread>
#include <atomic>
std::atomic<int> total{0};
void addMillion() {
for (int i = 0; i < 1000000; ++i)
total++; // atomic increment — no mutex needed
}
int main() {
std::thread t1(addMillion);
std::thread t2(addMillion);
t1.join();
t2.join();
std::cout << total << "\n"; // 2000000 — correct and fast
}
Use std::atomic when:
- The shared data is a simple type (int, bool, pointer).
- The operation is a single read-modify-write (increment, compare-and-swap).
Use a std::mutex when the critical section involves multiple variables or complex invariants that must be updated together.
std::condition_variable — Synchronising on a Condition
A condition variable lets a thread wait until some condition becomes true, without busy-spinning. It must be used with a std::unique_lock.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
std::mutex mtx;
std::condition_variable cv;
bool dataReady = false;
void producer() {
{
std::unique_lock<std::mutex> lock(mtx);
dataReady = true;
std::cout << "Producer: data is ready\n";
}
cv.notify_one(); // wake up one waiting thread
}
void consumer() {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, []{ return dataReady; }); // releases lock while waiting
std::cout << "Consumer: received data\n";
}
int main() {
std::thread c(consumer);
std::thread p(producer);
p.join();
c.join();
}
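cv.wait(lock, pred) first checks the predicate and returns at once if it is already true, so a notification sent before the consumer starts waiting is not lost. Otherwise it atomically releases the mutex and puts the thread to sleep. When cv.notify_one() is called from another thread, the waiting thread wakes up, re-acquires the lock, and checks the predicate again. If the predicate is still false (for example after a spurious wakeup), it goes back to sleep.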
std::async and std::future
std::async launches a task asynchronously (possibly on a new thread) and returns a std::future that you can later query for the result:
#include <iostream>
#include <future>
// long long: the sum 1..1000000 is 500000500000, which overflows a 32-bit int
long long sumRange(int start, int end) {
long long total = 0;
for (int i = start; i <= end; ++i) total += i;
return total;
}
int main() {
// Launch two tasks concurrently
auto f1 = std::async(std::launch::async, sumRange, 1, 500000);
auto f2 = std::async(std::launch::async, sumRange, 500001, 1000000);
long long result = f1.get() + f2.get(); // blocks until both finish
std::cout << "Sum 1..1000000 = " << result << "\n"; // 500000500000
}
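std::launch::async runs the task as if on a new thread. std::launch::deferred runs the task lazily, on the thread that calls get() or wait(). Without a policy the default is std::launch::async | std::launch::deferred, which leaves the choice to the implementation.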
future.get() blocks until the result is ready and returns it. If the async function threw an exception, get() rethrows it.
Worked Example: Producer-Consumer Queue
The producer-consumer pattern is a classic concurrency problem that appears in operating systems, message queues, and streaming pipelines. One or more threads produce work items and push them into a shared queue; one or more threads consume items and process them.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <string>
#include <chrono>
class SafeQueue {
public:
void push(const std::string& item) {
{
std::unique_lock<std::mutex> lock(mtx_);
queue_.push(item);
}
cv_.notify_one(); // wake a waiting consumer
}
// Returns false if stopped and queue is empty
bool pop(std::string& item) {
std::unique_lock<std::mutex> lock(mtx_);
cv_.wait(lock, [this]{ return !queue_.empty() || stopped_; });
if (queue_.empty()) return false; // stopped with no items
item = queue_.front();
queue_.pop();
return true;
}
void stop() {
{
std::unique_lock<std::mutex> lock(mtx_);
stopped_ = true;
}
cv_.notify_all(); // wake all consumers so they can exit
}
private:
std::queue<std::string> queue_;
std::mutex mtx_;
std::condition_variable cv_;
bool stopped_ = false;
};
// Producer: sends task IDs into the queue
void producer(SafeQueue& q, int count) {
for (int i = 0; i < count; ++i) {
std::string task = "Task-" + std::to_string(i);
q.push(task);
std::cout << "[Producer] Pushed: " << task << "\n";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
q.stop(); // signal consumers that no more items will come
}
// Consumer: processes items from the queue
void consumer(SafeQueue& q, int id) {
std::string item;
while (q.pop(item)) {
std::cout << "[Consumer " << id << "] Processing: " << item << "\n";
std::this_thread::sleep_for(std::chrono::milliseconds(25));
}
std::cout << "[Consumer " << id << "] Done.\n";
}
int main() {
SafeQueue q;
// One producer, two consumers
std::thread prod(producer, std::ref(q), 6);
std::thread cons1(consumer, std::ref(q), 1);
std::thread cons2(consumer, std::ref(q), 2);
prod.join();
cons1.join();
cons2.join();
std::cout << "All done.\n";
}
Key design decisions:
- The mutex protects both the queue and the stopped_ flag.
- The condition variable uses a predicate that checks both !queue_.empty() and stopped_, handling spurious wakeups.
- stop() calls notify_all() so every waiting consumer wakes up and can exit cleanly.
- SafeQueue itself is safe for multiple producers without any changes. Only the call to stop() has to move: with several producers it must run once, after all of them have finished, rather than at the end of each producer().
Common Pitfalls
1. Forgetting to join or detach
Destroying a std::thread that is still joinable calls std::terminate(). Always join or detach before the thread object goes out of scope. Consider wrapping threads in RAII helpers or using std::jthread (C++20) which auto-joins.
2. Data races on non-atomic shared variables
Any unsynchronised concurrent write (or write + read) to a non-atomic variable is undefined behaviour. Use a mutex or std::atomic.
3. Deadlocks from lock ordering
If thread A holds mutex1 and waits for mutex2, while thread B holds mutex2 and waits for mutex1, both block forever. Always acquire multiple mutexes in a consistent global order, or use std::scoped_lock (C++17) which uses deadlock-avoidance internally.
4. Spurious wakeups
cv.wait(lock) without a predicate can return spuriously (without notify being called). Always pass a predicate: cv.wait(lock, [] { return condition; });.
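5. Calling get() twice on a future
A std::future can be consumed only once: get() moves the result out and leaves the future invalid (valid() returns false). Calling get() again is undefined behaviour; many implementations throw std::future_error, but the standard does not require it. Use std::shared_future when several readers need the same result.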
6. Expensive locking in tight loops
Locking a mutex on every iteration of a million-iteration loop is extremely slow. Batch work and lock once per batch, or use std::atomic for simple counters.
Practice Exercises
- Launch 5 threads, each printing its thread ID (hint: std::this_thread::get_id()) 3 times. Add a mutex so the prints do not interleave.
- Rewrite the race-condition example using std::atomic<int> and verify the output is always 2000000.
- Implement a thread-safe Counter class with increment(), decrement(), and get() methods, using a std::mutex internally.
- Use std::async to compute the maximum element in each half of a large vector concurrently, then combine the results.
- Extend the SafeQueue example to support multiple producers (the class itself needs no code change; explain why it is already safe, and where stop() should now be called) and add a size() method that returns the current number of items in the queue.
Summary
- std::thread launches concurrent execution; you must join() or detach() before the thread object is destroyed.
- Arguments to threads are copied by default; use std::ref() or a lambda capture for references.
- A data race occurs when two threads access the same data without synchronisation and at least one of them writes; the result is undefined behaviour.
- std::mutex with std::lock_guard (RAII) serialises access to shared data and prevents races.
- std::atomic<T> provides thread-safe operations for simple types like integers and booleans, usually without locks.
- std::condition_variable lets threads wait efficiently for a condition without busy-spinning; always use it with a predicate to handle spurious wakeups.
- std::async with std::future provides a high-level way to run tasks asynchronously and retrieve their results.
- The producer-consumer pattern uses a mutex-protected queue and a condition variable to coordinate producers and consumers.
- Common bugs: missing join/detach, data races, deadlocks from inconsistent lock ordering, and future.get() called twice.