Multithreading can dramatically boost program performance, but it also introduces subtle bugs such as data races, which are notoriously hard to find with traditional debugging techniques. Tools such as Helgrind, part of the Valgrind suite, make these bugs much easier to detect. This tutorial shows how to detect thread errors using Helgrind.
Here's a minimal C program that uses two threads to increment a shared counter:
#include <stdio.h>
#include <pthread.h>

int counter = 0;

void* increment(void* _) {
    for (int i = 0; i < 1000000; ++i) {
        ++counter; // Data race here (no synchronization)
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", counter);
    return 0;
}
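Compile the code with debug symbols (-g) so that Helgrind can report source file names and line numbers, and with -pthread so that it builds against the threads library on any toolchain:
gcc -g -pthread main.c -o my_program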
Run the program under Helgrind:
valgrind --tool=helgrind ./my_program
Helgrind output (excerpt):
==46268== This conflicts with a previous write of size 4 by thread #2
==46268== Locks held: none
==46268== at 0x1091C7: increment (main.c:8)
==46268== by 0x485396A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==46268== by 0x4908AC2: start_thread (pthread_create.c:442)
==46268== by 0x4999A03: clone (clone.S:100)
==46268== Address 0x10c014 is 0 bytes inside data symbol "counter"
This tells us that two threads write to counter at main.c:8 without holding any lock ("Locks held: none"), which is a data race.
To fix the code, use a pthread_mutex_t to ensure that only one thread can update counter at a time.
#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* increment(void* _) {
    for (int i = 0; i < 1000000; ++i) {
        pthread_mutex_lock(&lock);
        ++counter;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", counter);
    return 0;
}
Recompile the code and run the program under Helgrind again. Now you'll see:
==46565== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3000004 from 7)
Helgrind no longer reports any data races: every update to counter now happens while the mutex is held.