Detect Thread Errors using Helgrind

Multithreading can dramatically boost program performance, but it also introduces subtle bugs such as data races. These are notoriously hard to find with traditional debugging techniques because they depend on thread timing and may not reproduce from one run to the next. Helgrind, the thread-error detector in the Valgrind suite, finds such races by watching the program's memory accesses and lock operations while it runs. This tutorial explains how to detect thread errors using Helgrind.

Here's a minimal C program (saved as main.c) that uses two threads to increment a shared counter:

#include <stdio.h>
#include <pthread.h>

int counter = 0;

void* increment(void* _) {
    for (int i = 0; i < 1000000; ++i) {
        ++counter; // Data race here (no synchronization)
    }

    return NULL;
}

int main() {
    pthread_t t1, t2;

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("%d\n", counter);

    return 0;
}

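Compile the code with debug symbols (-g), so that Helgrind can report source file names and line numbers, and with POSIX threads support (-pthread):

gcc -g -pthread main.c -o my_program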

Run the program with Helgrind:

valgrind --tool=helgrind ./my_program

Helgrind output (excerpt):

==46268== This conflicts with a previous write of size 4 by thread #2
==46268== Locks held: none
==46268==    at 0x1091C7: increment (main.c:8)
==46268==    by 0x485396A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==46268==    by 0x4908AC2: start_thread (pthread_create.c:442)
==46268==    by 0x4999A03: clone (clone.S:100)
==46268==  Address 0x10c014 is 0 bytes inside data symbol "counter"

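This tells us that two threads write to counter with no lock held (Locks held: none) and nothing ordering the two accesses, which is a data race. The stack trace points at main.c:8, the ++counter line inside increment.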

To fix the code, use a pthread_mutex_t to ensure only one thread can update counter at a time.

#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* increment(void* _) {
    for (int i = 0; i < 1000000; ++i) {
        pthread_mutex_lock(&lock);
        ++counter;
        pthread_mutex_unlock(&lock);
    }

    return NULL;
}

int main() {
    pthread_t t1, t2;

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("%d\n", counter);

    return 0;
}

Recompile the code and run the program with Helgrind again. Now you'll see:

==46565== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3000004 from 7)

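Helgrind no longer reports a data race, because every update to counter now happens while the lock is held. Keep in mind that Helgrind only checks the code paths that actually execute during the run.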
