Memory allocation debugging with glibc – heap consistency checks

Although there are many great tools for debugging memory allocation errors, like Valgrind or Electric Fence, sometimes these tools aren’t available or it’s not feasible to use them. In this post I want to show you a debugging technique that doesn’t require any software besides the GNU C library. All of the examples were created and run on a standard x86-64 machine.

The GNU C library provides malloc(), free() and other related routines for dynamic memory allocation. Alongside malloc() and friends, glibc features two very interesting mechanisms that help find common dynamic memory allocation errors.

In the previous article I presented the methods for memory allocation tracing. Today, I want to show you another technique that helps to catch some common memory allocation and usage errors, like off-by-one errors (overflow and underflow) or double frees.

An overview of the Heap Consistency Checking feature.

The heap consistency checking feature consists of the following functions:

  • int mcheck(void (*abortfunc)(enum mcheck_status status))
  • int mcheck_pedantic(void (*abortfunc)(enum mcheck_status status))
  • void mcheck_check_all(void)
  • enum mcheck_status mprobe(void *ptr)

To enable heap consistency checking, you have to call mcheck() before the first malloc() call. mcheck() installs a set of debugging hooks for malloc() and friends. It must be called before the first malloc()/calloc() call because it would be unsafe to switch this functionality on when malloc is already in use. If you do call mcheck() after malloc(), it won’t install the hooks and will just return -1, indicating an error. For security reasons, the dynamic linker disables this feature for set-user-ID and set-group-ID applications. If you really want to enable heap consistency checking for these applications, create the /etc/suid-debug file.

The mcheck_pedantic() function behaves the same as mcheck(), but forces a check of all allocated blocks on every call to any of the malloc() family functions. This slows things down considerably.
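Switching the checks on could look roughly like the sketch below. The helper name enable_heap_checks() is made up for this example; only mcheck() and mcheck_pedantic() come from <mcheck.h>. Keep in mind that on recent glibc releases (2.34 and later, to my knowledge) the checking code lives in a separate libc_malloc_debug.so, so without preloading that library the call is expected to fail.

```c
#include <mcheck.h>
#include <stdio.h>

/*
 * Hypothetical helper: switch heap consistency checking on.
 * Call it as the very first statement of main(), before anything
 * has had a chance to call malloc().
 * Returns 0 on success, -1 if the hooks could not be installed.
 */
int enable_heap_checks(int pedantic)
{
    /* NULL selects the library's default abort function. */
    int rc = pedantic ? mcheck_pedantic(NULL) : mcheck(NULL);

    if (rc != 0)
        fprintf(stderr, "heap consistency checking is not active\n");
    return rc;
}
```

Calling enable_heap_checks(0) gives the normal behaviour, enable_heap_checks(1) the slower, pedantic one.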

If the library detects an inconsistency on the heap, the function that abortfunc points to is called, with the status parameter indicating what type of heap violation was detected. If you pass NULL to mcheck()/mcheck_pedantic(), a default abort function is called, which prints an error message to stderr and aborts the process.
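If you would rather supply your own abort function, a minimal sketch could look like this. The names heap_status_name(), report_heap_violation() and install_heap_handler() are my own inventions for this example; the enum mcheck_status constants come from <mcheck.h>.

```c
#include <mcheck.h>
#include <stdio.h>
#include <stdlib.h>

/* Translate a status code into its symbolic name. */
const char *heap_status_name(enum mcheck_status status)
{
    switch (status) {
    case MCHECK_DISABLED: return "MCHECK_DISABLED";
    case MCHECK_OK:       return "MCHECK_OK";
    case MCHECK_HEAD:     return "MCHECK_HEAD";
    case MCHECK_TAIL:     return "MCHECK_TAIL";
    case MCHECK_FREE:     return "MCHECK_FREE";
    }
    return "unknown status";
}

/* Hypothetical abort function: report the violation, then abort. */
void report_heap_violation(enum mcheck_status status)
{
    fprintf(stderr, "heap violation: %s\n", heap_status_name(status));
    abort();
}

/* Install the handler; this must run before the first allocation. */
int install_heap_handler(void)
{
    return mcheck(report_heap_violation);
}
```

The handler here still ends with abort(), so the process leaves a core dump just like with the default function. It only changes what gets printed.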

Of course, you can also trigger the consistency check on demand, but mcheck() must have been called beforehand. The mcheck_check_all() function checks all allocated blocks immediately, whereas mprobe(void *ptr) checks a given memory block and returns an enum mcheck_status value as the result. The result can have the following values:

  • MCHECK_DISABLED – heap consistency checking is disabled due to mcheck() not being called prior to the first memory allocation function call.
  • MCHECK_OK – no inconsistency detected
  • MCHECK_HEAD – memory before the tested block was overwritten
  • MCHECK_TAIL – memory after the tested block was overwritten
  • MCHECK_FREE – block was freed more than once
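On-demand checks could be wrapped like this. Both helper names are hypothetical, only mprobe() and mcheck_check_all() are library calls. As far as I can tell from malloc/mcheck.c, a failed check also invokes the abort function, so the "violation" branch below is only reached if you installed an abort function of your own that returns.

```c
#include <mcheck.h>
#include <stdio.h>

/*
 * Hypothetical helper: check one block on demand.
 * Returns 1 if the block looks intact, 0 if a violation was found,
 * and -1 if checking is not enabled at all.
 */
int block_is_intact(void *ptr)
{
    enum mcheck_status status = mprobe(ptr);

    if (status == MCHECK_DISABLED)
        return -1;
    if (status != MCHECK_OK)
        fprintf(stderr, "block %p failed the check (status %d)\n",
                ptr, (int)status);
    return status == MCHECK_OK;
}

/* Hypothetical checkpoint: verify every live block right now. */
void heap_checkpoint(const char *where)
{
    fprintf(stderr, "heap checkpoint: %s\n", where);
    mcheck_check_all();
}
```

Sprinkling a few heap_checkpoint() calls around suspicious code is a cheap way to narrow down where a block gets damaged, instead of learning about it only when the block is freed.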

“How does this stuff work, anyway?”, you might ask :). Well… 😉
If the HCC is enabled, every allocated memory block is “guarded” by certain known magic numbers both before and after the block. During the check, which occurs in the free() function hook, the data before and after the block is compared with these magic numbers. An error is reported if these values differ. The double-free situation is checked in the free() function hook as well: when the block is freed for the first time, it is marked with another magic number that means “already freed”. If free() is called again for the same block, the hook finds that marker before the block is actually freed and reports an error.
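To make the idea more tangible, here is a toy version of the guard technique. This is not glibc's code: the names, the block layout and the magic values are all made up for illustration, and the double-free marker is left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>

#define HEAD_BYTE 0xA5   /* fills the guard in front of the user data */
#define TAIL_BYTE 0xD7   /* single guard byte right behind the user data */

enum guard_status { GUARD_OK, GUARD_HEAD, GUARD_TAIL };

struct guard_hdr {
    size_t size;                          /* size the caller asked for */
    unsigned char guard[sizeof(size_t)];  /* ends right before the user data */
};

/* Allocate size bytes surrounded by guard values. */
void *guarded_alloc(size_t size)
{
    struct guard_hdr *hdr = malloc(sizeof *hdr + size + 1);

    if (hdr == NULL)
        return NULL;
    hdr->size = size;
    memset(hdr->guard, HEAD_BYTE, sizeof hdr->guard);
    ((unsigned char *)(hdr + 1))[size] = TAIL_BYTE;
    return hdr + 1;
}

/* Compare the guards around ptr with the expected values. */
enum guard_status guarded_check(const void *ptr)
{
    const struct guard_hdr *hdr = (const struct guard_hdr *)ptr - 1;
    size_t i;

    for (i = 0; i < sizeof hdr->guard; i++)
        if (hdr->guard[i] != HEAD_BYTE)
            return GUARD_HEAD;
    if (((const unsigned char *)ptr)[hdr->size] != TAIL_BYTE)
        return GUARD_TAIL;
    return GUARD_OK;
}

/* Release a block obtained from guarded_alloc(). */
void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((struct guard_hdr *)ptr - 1);
}
```

Writing a single byte just past the end of a block from guarded_alloc() destroys the tail guard, so the next guarded_check() returns GUARD_TAIL. A write just before the block is reported as GUARD_HEAD in the same way.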

Using Heap Consistency Checking with your applications.

Let’s see how it works in some real life scenarios, then. This is the first listing:
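A minimal sketch of such a bug, not the exact listing: assume a hypothetical helper fill_table() that is handed a block of 60 ints (240 bytes, which matches the Valgrind report quoted in the comments below).

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the kind of code discussed here:
 * fill a table of count ints with consecutive values.
 */
void fill_table(int *table, size_t count)
{
    size_t i;

    /* BUG: "<=" writes count + 1 elements, one past the end */
    for (i = 0; i <= count; i++)
        table[i] = (int)i;
}
```

Called as fill_table(block, 60) on a block obtained from malloc(60 * sizeof(int)), the last iteration writes 4 bytes past the end of the block.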

This little piece of code has a classic off-by-one error at the highlighted line. It is quite obvious here, but in real-life situations it is not so easy to spot. The worst thing about it is that this code will run just fine:

See? No problems whatsoever ;).

Now, let’s add the heap consistency check to that one:

Next, compile and run it:

Bingo! Now you can see that something is wrong. The library has aborted the application because it has found an inconsistency on the heap! If we compile this program with debugging information (by adding the -g switch), we will be able to pinpoint the offending memory block by debugging the core dump. On my system (Arch Linux) I can take the following steps to do this:

You can tell from the backtrace that the application was aborted at line 30 of the hcc02.c file (Listing 2). This is the place where we call free() to release the offending block. Then the freehook() function was called. This is the hook function (a handler) for free() installed by mcheck(). The checkhdr() function performed the check on the soon-to-be-freed block. Obviously, this check did not succeed, therefore the abort function was called. Since we didn’t provide our own abort function, the library called the default one, which is mabort(). This one eventually called the function which printed the error message and raised the SIGABRT signal.

Other methods of enabling the HCC.

There are other ways to enable heap consistency checking. These are most useful when you can’t or don’t want to modify your application source code.

The first one is to link the application against the mcheck library by using the -lmcheck linker option. Let’s compile Listing 1 with this option to see how it works:

Recompiling with -lmcheck caused the application to behave in the same manner as if we had added the mcheck() call with the default abort function (as seen above).

The second one is to set the MALLOC_CHECK_ environment variable (mind the trailing underscore!). The value of this variable is a 3-bit bitmask that controls how the feature behaves upon finding an inconsistency. Bit 0 enables detailed information about the error; bit 1 controls whether the process is aborted or allowed to continue. If both of these bits are set, the process backtrace and memory map are printed before the process is terminated. Bit 2 controls the verbosity of the message printed if bit 0 is set. Let’s use the Listing 1 program with different possible values of MALLOC_CHECK_:
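The bit layout can be spelled out in C like this. The struct and the function are hypothetical and follow only the description above; they are not part of any glibc API.

```c
/* Hypothetical decoder for a MALLOC_CHECK_ value as described above. */
struct malloc_check_bits {
    int print_message;   /* bit 0: print details about the error */
    int abort_process;   /* bit 1: abort instead of continuing */
    int verbosity_flag;  /* bit 2: changes how verbose the message is */
};

struct malloc_check_bits decode_malloc_check(unsigned int value)
{
    struct malloc_check_bits bits;

    bits.print_message  = (value >> 0) & 1u;
    bits.abort_process  = (value >> 1) & 1u;
    bits.verbosity_flag = (value >> 2) & 1u;
    return bits;
}
```

For example, MALLOC_CHECK_=3 decodes to "print and abort", while MALLOC_CHECK_=1 only prints the message and lets the process continue.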

As you can see, using the MALLOC_CHECK_ environment variable is the easiest way to utilize the heap consistency check with any application.

This article concludes the two-part series describing memory allocation debugging features included in the GNU C library. Both malloc tracing and heap consistency checking are easy to use and don’t require any external software. I strongly recommend using them in your software development practice, as these features can really save you some time (and a major p.i.t.a. ;)).

Please, feel free to comment on this post – and others as well ;).

Cheers!
Cristos. 🙂

4 thoughts on “Memory allocation debugging with glibc – heap consistency checks”

  • 12th November 2014 at 22:47

    Cristos, thank you for this article.
    I really like the idea of enabling HCC without source code changes. Also, I think you’ve pointed out a really important issue here by saying: “The worst thing about it is that this code will run just fine”. Memory/data corruption is even worse than a segmentation fault because the programmer does not know about its existence, so the error may only be discovered in production code – really bad!!

    I have one doubt: when debugging the core dump, the backtrace points at line 30, where the memory is freed. But in this example you described a memory overrun, so I expected an MCHECK_TAIL error at line 27 (the last iteration) instead of MCHECK_FREE at line 30. Did I miss something?

    Regards

    • 13th November 2014 at 20:04

      Jehoszafat,
      The Heap Consistency Checking feature works by installing some hooks when you call mcheck() in your code.
      These hooks are named “mallochook”, “reallochook”, “freehook” and “memalignhook”. Unless you call mcheck_pedantic() or perform the consistency check on demand with mprobe(), the consistency check is performed by the checkhdr() function in the freehook() function only (see the glibc source code – malloc/mcheck.c). So, in the given example, the memory was overrun in the last ‘for’ loop iteration at line 27, and the check was performed in the hook for the free() function. Then the inconsistency was detected (checkhdr() in freehook() failed) and the program was aborted by calling the default abort function – mabort().

      Best regards,
      Cristos

  • 28th April 2015 at 07:25

    Thanks for this post. It’s really useful for understanding the HCC.

    But one drawback of using mcheck() is that it doesn’t catch the place where the corruption happens. Even if we use mcheck_pedantic(), it checks the heap consistency only on every call to malloc(), free() or a related function.

    Another solution I tried to implement is using malloc hooks and protecting the memory using mprotect(). But on a 32-bit processor I ran out of memory very quickly, as it requires the memory to be aligned on page boundaries. Also, the hooks are deprecated as they are not thread-safe.

    Do you have any other solution for HCC which can detect the crash when and where we overwrite the memory?

    Any help is appreciated.

    Thank You!!

    • 28th April 2015 at 17:34

      Hi Rahul,
      Thanks for visiting my site, I’m glad that its content was helpful to you.

      As for your question, unfortunately you cannot do this with the HCC feature by itself.
      However, this kind of memory problem can be diagnosed with valgrind.
      For example:

      [cristos@tesla hcc]$ valgrind ./hcc01
      ==2906== Memcheck, a memory error detector
      ==2906== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
      ==2906== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
      ==2906== Command: ./hcc01
      ==2906==
      ==2906== Invalid write of size 4
      ==2906== at 0x400579: main (hcc01.c:18)
      ==2906== Address 0x51d9130 is 0 bytes after a block of size 240 alloc'd
      ==2906== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==2906== by 0x400557: main (hcc01.c:14)
      ==2906==
      ==2906==
      ==2906== HEAP SUMMARY:
      ==2906== in use at exit: 0 bytes in 0 blocks
      ==2906== total heap usage: 1 allocs, 1 frees, 240 bytes allocated
      ==2906==
      ==2906== All heap blocks were freed -- no leaks are possible
      ==2906==
      ==2906== For counts of detected and suppressed errors, rerun with: -v
      ==2906== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

      This is the most basic invocation of valgrind, which uses its ‘memcheck’ tool by default. As you can see in the output, it has nailed the memory overrun in the code perfectly.

      Valgrind is an indispensable tool. It is huge, with lots of different tools and options for diagnosing problems and profiling your code. Go ahead and look it up, read its docs (which are very good) and use it 🙂

      Cheers!
      Cristos
