Multithreading in C++: Threads, Mutexes, and Writing Thread-Safe Code
Introduction: Why Concurrency Matters in Modern Software
Imagine you’re building a web server. While one request is waiting for data from the database, shouldn’t you be able to handle other requests simultaneously? Or picture a desktop application: while the program loads a large file from disk, shouldn’t the UI remain responsive?
This is where multithreading comes in. Modern C++ gives you powerful tools to write concurrent programs that can do multiple things at the same time. But with great power comes great responsibility—threads introduce complexity, and mistakes can lead to subtle, hard-to-catch bugs.
In this guide, we’ll build a solid mental model of multithreading, starting from the basics and working up to practical, production-ready patterns.
What is a Thread? Building the Mental Model
Think of a thread as a “worker” that executes code independently. If your program is a factory, a single-threaded program is like having one worker doing all the jobs sequentially. A multithreaded program is like hiring multiple workers who can tackle different jobs in parallel.
Here’s the crucial insight: all threads in a single process share the same memory. This is powerful (threads can easily communicate), but dangerous (if two workers modify the same piece of equipment without coordinating, chaos ensues).
In C++, threads are managed by the operating system. You can’t control exactly when each thread runs—the OS scheduler decides that. This unpredictability is a key source of threading bugs.
Creating Threads with std::thread
The std::thread class, introduced in C++11, makes thread creation straightforward:
#include <iostream>
#include <thread>
void worker() {
std::cout << "Hello from thread!" << std::endl;
}
int main() {
// Create a thread that runs the worker function
std::thread t(worker);
// The main thread continues here while t runs in parallel
std::cout << "Main thread continues" << std::endl;
return 0; // DANGER: t is still running!
}
Problem: main returns while t may still be running. Destroying a std::thread that is still joinable calls std::terminate(), aborting the program.
Solution: Use join() to wait for the thread to finish:
int main() {
std::thread t(worker);
std::cout << "Main thread continues" << std::endl;
t.join(); // Wait for t to complete
std::cout << "Thread finished" << std::endl;
return 0;
}
Passing Arguments to Threads
Threads can run functions with parameters:
void greet(const std::string& name, int id) {
std::cout << "Hello " << name << " (thread " << id << ")" << std::endl;
}
int main() {
std::thread t1(greet, "Alice", 1);
std::thread t2(greet, "Bob", 2);
t1.join();
t2.join();
return 0;
}
Important: Arguments are copied into the thread by default. If you want to pass by reference, use std::ref:
void modify(int& value) {
value += 10;
}
int main() {
int x = 5;
std::thread t(modify, std::ref(x));
t.join();
std::cout << x << std::endl; // Prints: 15
return 0;
}
Using Lambda Functions
Modern C++ threads often use lambdas:
int main() {
std::thread t([]() {
std::cout << "Running in a thread" << std::endl;
});
t.join();
return 0;
}
With captured variables:
int main() {
int value = 42;
std::thread t([value]() {
std::cout << "Value: " << value << std::endl;
});
t.join();
return 0;
}
Joining and Detaching Threads
join(): Waiting for Completion
join() blocks the calling thread until the target thread finishes. Use this when you need the thread’s work to complete before continuing.
detach(): Fire and Forget
detach() releases the thread, letting it run independently:
std::thread t(worker);
t.detach(); // Thread runs independently
// Warning: Be careful with detached threads!
Caution: With detached threads, you lose control. The thread might still be running when your program exits, potentially causing crashes. Use detach() only when you’re certain about the thread’s lifetime.
The Race Condition Problem: A Real Example
Here’s where multithreading gets tricky. Suppose two threads increment a shared counter:
#include <thread>
#include <iostream>
int counter = 0;
void increment() {
for (int i = 0; i < 100000; ++i) {
counter++; // DANGER: Race condition!
}
}
int main() {
std::thread t1(increment);
std::thread t2(increment);
t1.join();
t2.join();
std::cout << "Counter: " << counter << std::endl; // Expect 200000, but...?
return 0;
}
Run this several times. You’ll likely get a different result each time: 176543, 194201, 188900… almost never exactly 200000. Why?
The operation counter++ is not atomic. At the CPU level, it’s three steps:
- Load the current value of counter into a register
- Increment the register
- Store the register back to counter
With two threads running simultaneously, this can happen:
- Thread 1: Load counter (0)
- Thread 2: Load counter (0)
- Thread 1: Increment (1), store back to counter
- Thread 2: Increment (1), store back to counter
- Result: counter is 1, but we did 2 increments!
This is a race condition—the outcome depends on the unpredictable ordering of thread execution.
Mutexes: std::mutex and How to Use Them
A mutex (mutual exclusion) is a lock. Only one thread can hold a mutex at a time. When a thread needs to access shared data, it locks the mutex, does its work, then unlocks it.
#include <thread>
#include <mutex>
#include <iostream>
int counter = 0;
std::mutex counter_mutex;
void increment() {
for (int i = 0; i < 100000; ++i) {
counter_mutex.lock();
counter++;
counter_mutex.unlock();
}
}
int main() {
std::thread t1(increment);
std::thread t2(increment);
t1.join();
t2.join();
std::cout << "Counter: " << counter << std::endl; // Now consistently 200000
return 0;
}
Problem: If an exception is thrown between lock() and unlock(), the mutex is never released. Every other thread that tries to lock it then blocks forever.
std::lock_guard and std::unique_lock: RAII Locking
RAII (Resource Acquisition Is Initialization) applies to locking too. std::lock_guard automatically unlocks when it goes out of scope:
void increment() {
for (int i = 0; i < 100000; ++i) {
std::lock_guard<std::mutex> lock(counter_mutex);
counter++;
// Mutex automatically unlocked here
}
}
This is exception-safe. Even if an exception occurs inside the block, the lock is released.
std::unique_lock is more flexible—you can manually unlock and move ownership:
std::unique_lock<std::mutex> lock(counter_mutex);
counter++;
lock.unlock();
// lock can be re-locked or moved to another variable
Best practice: Use std::lock_guard for simple cases and std::unique_lock when you need flexibility.
Deadlock: What It Is and How to Avoid It
Deadlock occurs when threads are stuck waiting for each other. Classic example:
std::mutex m1, m2;
void thread1_func() {
std::lock_guard<std::mutex> l1(m1); // Thread 1 locks m1
std::this_thread::sleep_for(std::chrono::milliseconds(100));
std::lock_guard<std::mutex> l2(m2); // Thread 1 waits for m2
}
void thread2_func() {
std::lock_guard<std::mutex> l2(m2); // Thread 2 locks m2
std::this_thread::sleep_for(std::chrono::milliseconds(100));
std::lock_guard<std::mutex> l1(m1); // Thread 2 waits for m1
}
Both threads wait forever. Thread 1 holds m1 and wants m2; Thread 2 holds m2 and wants m1.
How to avoid deadlock:
- Always acquire locks in the same order across all threads
- Use std::lock() to acquire multiple mutexes safely:
void thread1_func() {
std::unique_lock<std::mutex> l1(m1, std::defer_lock);
std::unique_lock<std::mutex> l2(m2, std::defer_lock);
std::lock(l1, l2); // Safe: acquires both or neither
// ... critical section ...
}
std::atomic: Lock-Free Thread-Safe Operations
For simple data types, atomic operations are usually faster than mutexes. Atomics provide thread-safe access without explicit locks:
#include <atomic>
std::atomic<int> counter(0); // Initialize to 0
void increment() {
for (int i = 0; i < 100000; ++i) {
counter++; // Atomic increment, thread-safe!
}
}
Behind the scenes, the CPU provides atomic instructions. No mutex overhead.
Other atomic operations:
std::atomic<int> x(0);
x.store(5); // Atomic write
int val = x.load(); // Atomic read
x.exchange(10); // Swap and return old value
int expected = 10, new_val = 20;
x.compare_exchange_strong(expected, new_val); // CAS: sets x to new_val only if x == expected
When to use atomics: Simple counters, flags, or state variables. For complex data structures, use mutexes.
Condition Variables: Signalling Between Threads
Condition variables let threads wait for a specific event. Classic use case: producer-consumer pattern.
#include <condition_variable>
#include <queue>
std::queue<int> data_queue;
std::mutex queue_mutex;
std::condition_variable cv;
void producer() {
for (int i = 0; i < 5; ++i) {
{
std::lock_guard<std::mutex> lock(queue_mutex);
data_queue.push(i);
std::cout << "Produced: " << i << std::endl;
}
cv.notify_one(); // Wake one waiting thread
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
}
void consumer() {
while (true) { // note: this consumer never exits; a real one needs a shutdown flag (see the pattern below)
std::unique_lock<std::mutex> lock(queue_mutex);
cv.wait(lock, []() { return !data_queue.empty(); }); // Wait for data
if (!data_queue.empty()) {
int val = data_queue.front();
data_queue.pop();
std::cout << "Consumed: " << val << std::endl;
}
}
}
The wait() call releases the mutex while waiting, then reacquires it when notified. This prevents deadlocks and is more efficient than busy-waiting.
Thread-Local Storage with thread_local
Variables declared thread_local are independent for each thread:
thread_local int thread_id_cache = -1;
void worker(int id) {
thread_id_cache = id; // Each thread has its own copy
std::cout << "Thread " << thread_id_cache << std::endl;
}
Useful for caches, stateful objects, or thread-specific data.
Common Concurrency Patterns
Producer-Consumer Pattern
One or more threads produce data; others consume it. Use a queue and condition variables to coordinate:
std::queue<std::string> message_queue;
std::mutex queue_lock;
std::condition_variable queue_cv;
bool done = false;
void producer(const std::vector<std::string>& messages) {
for (const auto& msg : messages) {
{
std::lock_guard<std::mutex> lock(queue_lock);
message_queue.push(msg);
}
queue_cv.notify_one();
}
{
std::lock_guard<std::mutex> lock(queue_lock);
done = true;
}
queue_cv.notify_all();
}
void consumer() {
while (true) {
std::unique_lock<std::mutex> lock(queue_lock);
queue_cv.wait(lock, []() { return !message_queue.empty() || done; });
if (message_queue.empty() && done) break;
std::string msg = message_queue.front();
message_queue.pop();
std::cout << "Processing: " << msg << std::endl;
}
}
Thread Pool Concept
Instead of creating a new thread for each task, reuse a pool of worker threads. Tasks are queued, and workers pick them up as they become available. This avoids the overhead of thread creation.
std::async and std::future: Higher-Level Concurrency
For simpler cases, std::async handles thread management for you:
#include <future>
int expensive_calculation() {
std::this_thread::sleep_for(std::chrono::seconds(2));
return 42;
}
int main() {
std::future<int> result = std::async(std::launch::async, expensive_calculation);
std::cout << "Doing other work..." << std::endl;
int value = result.get(); // Block until calculation completes
std::cout << "Result: " << value << std::endl;
return 0;
}
std::async returns a std::future that you can wait on with get(). Cleaner than manual thread management for many scenarios.
Practical Example: Parallel File Processing
Process multiple files in parallel:
#include <vector>
#include <thread>
#include <fstream>
std::vector<std::string> files = {"file1.txt", "file2.txt", "file3.txt"};
std::mutex result_mutex;
std::vector<std::string> results;
void process_file(const std::string& filename) {
std::ifstream file(filename);
std::string content((std::istreambuf_iterator<char>(file)),
std::istreambuf_iterator<char>());
// Process content...
std::string result = "Processed: " + filename;
{
std::lock_guard<std::mutex> lock(result_mutex);
results.push_back(result);
}
}
int main() {
std::vector<std::thread> workers;
for (const auto& file : files) {
workers.emplace_back(process_file, file);
}
for (auto& worker : workers) {
worker.join();
}
std::cout << "Processed " << results.size() << " files" << std::endl;
return 0;
}
Performance Considerations
- Lock Contention: The more threads compete for the same mutex, the slower things get. Use fine-grained locking (protect small critical sections).
- Context Switching: Too many threads cause the OS to spend more time switching between threads than actually running them. A good rule of thumb: number of threads ≈ number of CPU cores (std::thread::hardware_concurrency() reports the latter).
- False Sharing: Two threads modifying different variables that happen to sit on the same CPU cache line force that line to be invalidated constantly. Align data carefully.
- Atomic vs Mutex: Atomics are faster for simple operations. Mutexes are clearer for protecting complex data structures.
- Lock-Free Data Structures: For extreme performance, use lock-free queues or other structures, but they’re complex to implement correctly.
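The false-sharing point can be addressed with alignas: padding each hot variable out to its own cache line keeps one thread’s writes from invalidating the other’s line. A sketch assuming a 64-byte cache line (the common x86 size; C++17’s std::hardware_destructive_interference_size is the portable query, where available):

```cpp
#include <cstddef>

// Each counter occupies its own 64-byte cache line, so two threads
// incrementing a and b no longer ping-pong the same line.
struct Counters {
    alignas(64) long a;
    alignas(64) long b;
};

static_assert(sizeof(Counters) == 128, "each member gets a full line");
```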
Conclusion
Multithreading is powerful but requires careful design. Start with clear mental models: threads share memory, race conditions arise from unsynchronized access, and mutexes/atomics protect shared data. Use RAII-style locking (std::lock_guard), avoid deadlock by consistent lock ordering, and test thoroughly—threading bugs are notoriously hard to reproduce.
The patterns shown here—producer-consumer, fine-grained locking, condition variables—form the foundation of robust concurrent C++ code. Master these, and you’ll write programs that are both correct and efficient.
What aspect of multithreading do you want to explore next? Drop a question in the comments!
Take Your C++ Further
If you’re looking to go deeper with C++, the C++ Better Explained Ebook is perfect for you — whether you’re a complete beginner or looking to solidify your understanding. Just $19.
👉 Get the C++ Better Explained Ebook — $19