A crash course in the GNU Debugger – a GDB tutorial

The GNU Debugger (GDB), originally written by Richard Stallman in 1986, is the standard debugger for the GNU operating system and an essential tool for software developers. Being portable, it is widely used on many Unix-like operating systems as well. In this article I’ll show some basic commands and techniques for debugging software with GDB. Unless noted otherwise, I’ll be using GDB version 7.9 under Arch Linux on x86-64. Source code for this article is available on my GitHub site.

Prerequisites – debugging information

First of all, it helps a lot if your application is built with debugging information, or if that information is available separately. Debugging information basically maps symbol addresses to object names in the source code (e.g. 0x25f352a is ‘int x’ and 0x7f25fea0 is ‘void printStuff()’), and compiled machine code instructions to particular source code lines.

Since GCC and GDB run on many different platforms, there are many different debugging information formats as well. On Linux and other Unix-like operating systems running on Intel’s x86, the ELF (Executable and Linkable Format) object file format and the DWARF debugging information format are the standard.

To generate debugging information for your program, use GCC’s ‘-g‘ or ‘-g2‘ option (they are equivalent, since ‘-g‘ defaults to level 2). Long story short, this generates debug info in your system’s native format. Use ‘-g3‘ to include extra information about preprocessor macros, so you can debug them as well. To get the most debug info possible, use ‘-ggdb3‘, which produces the most expressive format available on your system, including GNU extensions where possible. However, these extensions are meant for GDB, and other debuggers may fail to read them or even crash :).

It is possible to debug an optimized application as well, since GCC lets you combine ‘-g‘ with ‘-O‘ options. However, some “strange” (but, in that situation, still correct) behavior is likely to occur, e.g. variables or code that have been optimized out and are therefore unavailable. So, unless you really need to, please refrain from using optimization flags when debugging your application.

Interestingly, debugging information is not loaded into memory when the program runs normally; GDB reads it only when it needs it. So the program won’t run any slower or need more memory unless it is run under GDB. That is why the GDB manual recommends always compiling your code with debugging information.
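To see what ‘-g3‘ adds, you could compile a small file like the hypothetical macro_demo.c below (file and function names are made up for illustration) once with ‘-g‘ and once with ‘-g3‘. With ‘-g3‘, GDB’s ‘info macro SQUARE‘ and ‘macro expand SQUARE(n + 1)‘ commands should be able to show the macro’s definition and its expansion once you are stopped inside square_of_next(); with plain ‘-g‘, the macro is simply unknown to GDB.

```c
/* macro_demo.c - hypothetical example for comparing -g and -g3.
   Build: gcc -O0 -g3 -c macro_demo.c */

/* A function-like macro: with -g3 its definition is recorded in the
   debugging information, so GDB can show and expand it. */
#define SQUARE(x) ((x) * (x))

/* Returns (n + 1) squared; the parentheses inside SQUARE matter here. */
int square_of_next(int n)
{
    return SQUARE(n + 1);
}
```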

Many Linux distributions offer separate packages containing debugging information only. GDB can locate such debugging information files automatically in two ways. The .gnu_debuglink section in the executable specifies a file name (usually ending in .debug), whereas the .note.gnu.build-id section holds a hex number, a unique build identifier. If either is present, GDB reads it and tries to find the corresponding file in the global debug directory, which usually is /usr/lib/debug. Of course, you can always specify the location of a debugging information file manually, using the ‘-s‘ command line switch or the ‘sym‘ (or ‘symbol-file‘) command inside the GDB shell.

Note that you can always debug an application, even without any debugging information. However, in that case you will only have disassembled code and lots and lots of ‘bare’ addresses to examine. Nonetheless, it is fun (and a great learning experience) to see what’s going on in such programs :).

Controlling program execution

Now that you know how to generate the debugging information when compiling your application, let’s get down to business and debug something. Let’s start with a basic program:

Obviously, this code doesn’t do anything interesting, but it will do for demonstrating a debugging session. Let’s compile it with optimizations disabled (-O0) and debugging information enabled (-g), and, finally, run it under GDB:

After the “NO WARRANTY” disclaimer, you can see that GDB has successfully loaded the debugging information for our program and is waiting for user input. At this moment, the program is ready to be run. Of course, you can skip this verbose disclaimer by passing ‘-q‘ (or ‘--quiet‘, ‘--silent‘), so that only the symbol loading message and the prompt are displayed.

GDB features tab completion, a very handy, shell-like completion of commands, parameters, symbols and file names. Good help for all of the commands is available as well: just enter ‘help‘ followed by a command, or ‘apropos‘ followed by a word to search for in the help texts.

At this point, the program is loaded and ready to be run. Now, let’s look at some essential execution control commands:

  • run (r) – run program
  • start – create a temporary breakpoint at an entry point of a program and run it
  • continue (c, cont, fg) [count] – continue execution; if the program stopped at a breakpoint, count makes GDB ignore that breakpoint count - 1 more times, so it stops there again only when it is reached for the count-th time
  • next (n) [count] – continue to the next source line, stepping over any function calls; optionally, do it count times (but stop at a breakpoint if hit)
  • step (s) [count] – continue to the next source line, stepping into a function call; optionally, do it count times (but stop at a breakpoint if hit)
  • nexti (ni), stepi (si) – same as above, but execute one machine instruction rather than one source line
  • finish – finish execution of the function in the current stack frame
  • return [value] – return immediately (abort) from a function with an optional return value (i.e. pop the current stack frame)
  • until [location] – continue execution until the given location is reached (see below for the location parameter description)
  • jump [location] – jump to location and continue program execution

One useful shortcut – tapping <ENTER> at the GDB prompt will execute the previously entered command again.

We can use the ‘run‘ or ‘start‘ command to run our program. Command line arguments can be placed after these commands, and input/output redirection (e.g. the ‘<‘, ‘>‘ and ‘>>‘ operators) can be used as well. Once passed to ‘run‘ or ‘start‘, command line arguments stay the same across program runs, as they are stored in GDB’s args setting. They can also be set explicitly with the ‘set args‘ command; ‘set args‘ without any arguments clears the argument list. As with any other GDB setting (and there are a lot of them), you can show its value at any time, e.g. ‘show args‘. If your application needs a particular environment setup, use ‘set env variable value‘ or ‘unset env variable‘ for that. You can use ‘show env‘ with an optional variable name to show the whole environment or a given variable. Please bear in mind that all of these changes (both arguments and environment) take effect the next time your program is started.
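As a hypothetical example (the function and the APP_VERBOSE variable are made up for illustration), the function below looks at both the command line and the environment, which makes it convenient for trying ‘run‘ arguments, ‘set args‘ and ‘set env‘.

```c
/* config.c - hypothetical example: behavior depends on both the
   command line and the environment. */
#include <stdlib.h>
#include <string.h>

/* Returns 1 when verbose mode is requested, either by a "-v" argument
   or by the (hypothetical) APP_VERBOSE environment variable set to "1". */
int verbose_requested(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-v") == 0)
            return 1;

    const char *env = getenv("APP_VERBOSE");
    return env != NULL && strcmp(env, "1") == 0;
}
```

Called from a main() that passes its own argc and argv, ‘run -v‘ exercises the argument check. ‘set args‘ (to clear the arguments) followed by ‘set env APP_VERBOSE 1‘ and ‘run‘ exercises the environment check instead, and ‘show env APP_VERBOSE‘ confirms the value GDB will pass to the program.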

If you use the ‘start‘ command, keep in mind that what counts as the application’s entry point varies across programming languages. For C under Linux/Unix, it is the main() function, which is called after the _start initialization code from glibc (see the start.S file for your platform in the glibc source). In C++, however, constructors of static and global objects may run before main(), so that code has already executed by the time ‘start‘ stops. Likewise, if you’re developing for a bare metal embedded target, the entry point will be different.
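C has no static constructors, but GCC’s constructor attribute (a GCC extension, also supported by Clang) gives a comparable effect and makes the point about ‘start‘ easy to reproduce. The names below are hypothetical.

```c
/* ctor_demo.c - hypothetical C analogue of C++ static constructors,
   using the GCC constructor attribute. */

/* Set by init_before_main(); a breakpoint at main() (as placed by
   'start') is reached only after this has already happened. */
int init_ran = 0;

/* GCC extension: this function is run before main() is called. */
__attribute__((constructor))
static void init_before_main(void)
{
    init_ran = 1;
}
```

After ‘start‘, ‘print init_ran‘ should already show 1. To stop inside the constructor itself, set ‘break init_before_main‘ and then use ‘run‘ instead.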

This is the GDB output after ‘start‘ing our program:

GDB tells you that it created a temporary breakpoint at the beginning of the main() function, at line 8 of the listing1.c file. Don’t mind the warning about the linux-vdso.so object. It’s a common and harmless warning about missing debugging information for the vDSO (virtual dynamic shared object), which the kernel maps into the address space of every program to speed up certain system calls. After this warning, GDB informs you that your program hit the temporary breakpoint at line 8, prints the line about to be executed and waits for your next command at the prompt.

You can now find out how to use some of the commands listed above when examining the program flow, e.g.:

Although this program’s flow of execution is really simple, you can still see a number of interesting things going on here. First, note how ‘stepi‘ and ‘nexti‘ (abbreviated ‘si‘ and ‘ni‘, respectively) present their output: they show the actual program counter value before the number of the line to be executed. Second, note that using ‘step‘ (abbreviated ‘s‘) to enter the atoi() function at line 12 has no visible effect; GDB simply moves on to the next line. That’s because there is no debugging information for that function, so ‘step‘ behaves like ‘next‘ there. Moreover, when ‘finish‘ is invoked for the first time, the program counter is inside the malloc() function, which, obviously, has no debugging information on my system either. GDB tells you that it will run until malloc() returns, and then shows you where in main() execution will continue. Tapping <ENTER> executes ‘finish‘ again, but this time in the outermost frame, main(), where it is not meaningful, as the warning says.
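If you would like a program of a similar shape to follow along with, here is a hypothetical stand-in. It is not the article’s listing1.c, and its names are made up. It calls atoi() and malloc() from code compiled with debugging information, so stopping on the atoi() line and typing ‘step‘ should reproduce the behavior described above on a system without libc debugging information.

```c
/* squares.c - hypothetical stand-in, not the article's listing1.c.
   Build: gcc -O0 -g -c squares.c and call it from a small main(). */
#include <stdlib.h>

/* Parses a count with atoi() and returns a malloc()'d array holding
   0, 1, 4, ..., (count - 1)^2; *out_len receives the element count.
   Returns NULL (with *out_len set to 0) on bad input or allocation failure. */
int *make_squares(const char *count_str, int *out_len)
{
    int n = atoi(count_str);   /* 'step' won't enter atoi() without libc debug info */
    *out_len = 0;
    if (n <= 0)
        return NULL;

    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL)
        return NULL;

    for (int i = 0; i < n; i++)
        buf[i] = i * i;
    *out_len = n;
    return buf;
}
```

With a main() of your own that calls make_squares(argv[1], &len), try ‘break make_squares‘, ‘run 4‘, ‘step‘ and ‘finish‘.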

All about breakpoints… and more 🙂

Setting breakpoints

In most cases, though, stepping through your program from the entry point with ‘start‘ is not really practical. So, after you have loaded your program, you will usually want to set some breakpoints at specific places in your code with the ‘break‘ (or ‘b‘) command. The syntax is ‘break [PROBE_MODIFIER] [LOCATION] [thread THREADNUM] [if CONDITION]‘. All of the parameters are optional; when ‘break‘ is invoked without them, it sets a breakpoint at the next instruction to be executed in the selected stack frame. The LOCATION specifier, also known as the ‘linespec’, defines where to place the breakpoint. The same format is used by other commands (like ‘list‘, ‘edit‘ or ‘until‘), and the following specifiers are possible:

  • function, e.g. ‘break main‘ – set a breakpoint at the beginning of the function.
  • linenum, e.g. ‘break 17‘ – set a breakpoint at the 17th line of the current source file, or at the closest line possible after it.
  • filename:linenum, e.g. ‘break program.c:12‘ – set a breakpoint at the 12th line (or closest one possible) in the file program.c
  • filename:function, e.g. ‘break program.c:myPrint’ – set a breakpoint at the beginning of the myPrint function in the program.c file
  • +offset or -offset, e.g. ‘break +10‘ – set a breakpoint 10 lines after the line where the program stopped (in the current stack frame).
  • *address, e.g. ‘break *address‘ – set a breakpoint at a specific address. This can be specified using the current programming language expression that gives an address e.g., for C, ‘break *calculateFoo‘ (function name is its address). A very useful specifier for a code without debugging information.
  • label, e.g. ‘break cleanup‘ – set a breakpoint at the ‘cleanup’ label in the current function (current stack frame)
  • function:label – as above, but search in the specified function
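Here is a hypothetical function (not from the article’s sources) that uses the common goto-cleanup idiom, which gives the label specifiers above something to point at.

```c
/* cleanup_demo.c - hypothetical function with a 'cleanup' label. */
#include <stdlib.h>
#include <string.h>

/* Writes the reverse of src into dst (which must hold strlen(src) + 1
   bytes), going through a temporary buffer so src and dst may overlap.
   Returns 0 on success, -1 if the temporary buffer can't be allocated. */
int reverse_copy(const char *src, char *dst)
{
    int rc = -1;
    size_t len = strlen(src);
    char *tmp = malloc(len + 1);
    if (tmp == NULL)
        goto cleanup;

    for (size_t i = 0; i < len; i++)
        tmp[i] = src[len - 1 - i];
    tmp[len] = '\0';
    memcpy(dst, tmp, len + 1);
    rc = 0;

cleanup:
    free(tmp);   /* free(NULL) is a no-op, so this is safe on every path */
    return rc;
}
```

‘break reverse_copy:cleanup‘ stops at the label from anywhere in the session; once you are stopped inside reverse_copy(), plain ‘break cleanup‘ does the same. ‘break *reverse_copy‘, by contrast, stops at the function’s very first instruction, before the prologue that ‘break reverse_copy‘ skips.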

Conditional breakpoints are created using the if parameter. The CONDITION is an expression in the source language, evaluated each time the breakpoint is hit. If the expression is true (i.e. evaluates to a non-zero value), the program is stopped. Interestingly, a condition can even call functions defined in the program being debugged. For example, if you have a global function max() that returns the largest of its parameters, your breakpoint condition may look like ‘break program.c:120 if max(a,b,c)==3‘. You can also add a condition to an existing breakpoint with the ‘condition‘ command.
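The max() example above could look like this in practice. max3() and count_triples_with_max() are hypothetical names; the point is that the breakpoint condition calls a function compiled into the program.

```c
/* cond_demo.c - hypothetical helper that a breakpoint condition can call. */

/* Returns the largest of three ints. */
int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Walks over consecutive triples of values[0..n-1] and counts those whose
   maximum equals target. A conditional breakpoint on the 'if' line can
   stop only on the interesting triples. */
int count_triples_with_max(const int *values, int n, int target)
{
    int count = 0;
    for (int i = 0; i + 2 < n; i++) {
        int a = values[i], b = values[i + 1], c = values[i + 2];
        if (max3(a, b, c) == target)
            count++;
    }
    return count;
}
```

With the program running, ‘break cond_demo.c:LINE if max3(a, b, c) == 3‘, where LINE stands for the line of the if statement in your copy of the file, stops only for triples whose maximum is 3.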

The thread parameter restricts the breakpoint to the thread whose GDB thread number is THREADNUM. More about debugging multiple processes and threads can be found in the advanced part of this tutorial (coming soon 😉 ). There’s one more parameter for the break command: PROBE_MODIFIER, which is used when the breakpoint is to be placed at a static probe point. We won’t be using any SDT probes here; see the GDB help or manual for more information.

The same syntax and parameters are used with other commands that set different kinds of breakpoints. The ‘hbreak‘ command sets a hardware-assisted breakpoint. Such a breakpoint uses dedicated logic in the CPU, instead of a software trap instruction written into the code, to stop program execution and transfer control to the debugger. This may or may not be available, depending on your target platform. Please be aware that the program must already be running before you can set a hardware-assisted breakpoint. When you use the ordinary ‘break‘ command, GDB automatically switches to a hardware-assisted breakpoint if the location lies in read-only memory according to the target’s memory map (e.g. flash memory on an embedded board); this behavior is controlled by the ‘set breakpoint auto-hw {on|off}‘ command. The ‘tbreak‘ command sets a temporary breakpoint, a standard software breakpoint that is deleted as soon as it is hit. The ‘thbreak‘ command combines the previous two and sets a temporary, hardware-assisted breakpoint. The ‘rbreak regexp‘ command sets a breakpoint on every function whose name matches the regular expression regexp, using grep-like regular expression syntax.
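To try ‘rbreak‘, a file with several functions sharing a name prefix is handy; the hypothetical parse_demo.c below has two of them.

```c
/* parse_demo.c - hypothetical functions sharing the "parse_" prefix. */
#include <ctype.h>

/* Returns the number of leading decimal digits in s. */
int parse_digits(const char *s)
{
    int n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Returns the number of leading blanks (spaces or tabs) in s. */
int parse_blanks(const char *s)
{
    int n = 0;
    while (s[n] == ' ' || s[n] == '\t')
        n++;
    return n;
}
```

‘rbreak parse_demo.c:^parse_‘ should set one breakpoint per matching function in that file. Limiting the search to a file is worthwhile here, because an unqualified ‘rbreak ^parse_‘ can also match library functions with the same prefix (glibc, for instance, has parse_printf_format()).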

Breakpoint control

The ‘info breakpoints‘ (or ‘i b‘) command prints a table of all breakpoints (as well as watchpoints and catchpoints) that are set. Most of the commands in this section apply to watchpoints and catchpoints too. This is what a breakpoint table looks like:

It comprises several columns, which are:

  • Num – breakpoint number
  • Type – breakpoint, watchpoint or catchpoint
  • Disp – disposition, i.e. the action to be taken when the breakpoint is hit. Three dispositions are possible – keep, dis or del – meaning keep it, disable it or delete it, respectively
  • Enb – whether the breakpoint is enabled (y) or disabled (n)
  • Address – the address of the breakpoint in memory. Can be <PENDING> – when breakpoint is pending, i.e. is supposed to be set in a shared library that is not loaded yet, or <MULTIPLE> – when it refers to multiple locations in the code
  • What – source code location of the breakpoint

There are several commands for controlling the breakpoints in your program. Two of them delete breakpoints, with slightly different semantics. The first one, ‘clear [location]‘, deletes any breakpoints at the specified location or, when issued without parameters, any breakpoints at the next line to be executed in the selected stack frame. The other one is ‘delete breakpoints [no. or range]‘ (abbreviated ‘delete [no. or range]‘ or ‘d [no. or range]‘), which deletes breakpoints, watchpoints and catchpoints (or ranges of them) by number. If no arguments are given, all breakpoints are deleted; luckily, GDB asks for confirmation in that case by default.

Breakpoints can be disabled as well, using the ‘disable breakpoints [no. or range]‘ command (abbreviated ‘disable‘ or ‘dis‘). The ‘enable breakpoints‘ command (abbreviated ‘enable‘ or ‘en‘) re-enables them, and it has some interesting variants:

  • enable breakpoints [no. or range] – enable one or more breakpoints
  • enable breakpoints once [no. or range] – enable one or more breakpoints and disable when hit
  • enable breakpoints count [count] [no. or range] – enable one or more breakpoints, and disable them again after they have been hit count times
  • enable breakpoints delete [no. or range] – enable one or more breakpoints and delete when hit – effectively transforms a standard breakpoint to a temporary one.

When no parameters are given, enable or disable acts on every breakpoint (and on watchpoints and catchpoints as well).

Since breakpoints are deleted when another program file is loaded or when GDB exits, recreating them by hand would be very inconvenient. Instead, you can save the breakpoints into a GDB command file (a GDB script) with ‘save breakpoints filename‘, and load them back with ‘source filename‘. It is a plain text file, so you can even modify it by hand, or add other GDB commands if you like ;).

If it turns out that you need a conditional breakpoint instead of an ordinary one, you can always add a condition to an existing breakpoint using the ‘condition Num [expression]‘ command. Issued without the expression, it removes the condition from breakpoint Num.

Each breakpoint (and watchpoint or catchpoint) can be given a list of GDB commands to execute when it is hit. This very powerful feature can help you nail some of the trickier problems in your program. For example, you can enable or disable other breakpoints when one breakpoint is hit, calculate values ‘on the fly’, print or modify data, all while the program runs. This is done with the ‘commands [breakpoint no. or range] … end‘ command, where ‘…’ stands for the command list.

Here are a few examples of setting breakpoints in our program:

Here’s what happens if you want to set a hardware breakpoint with a condition:

At first, it seems that we can’t use hardware breakpoints at all, but after the program has been started, they work fine. So this is what our breakpoint table looks like after all the examples above:

We have four breakpoints, the last one being a hardware breakpoint with a condition. There was also a temporary breakpoint with number 4, but, being temporary, its disposition was ‘del’, so it was deleted as soon as it was hit. We can also add a condition to other breakpoints, like this:

And this is how you utilize the command list described above – a really powerful technique:

In this command list, we print the value of ‘i’ multiplied by 3.33 and then continue execution. The ‘silent’ command, which has to be the first one in the list, tells GDB not to print the usual message about the breakpoint being hit.
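A hypothetical loop to try this on (names made up for illustration) could look like the one below; a command list like the one described above then logs every iteration without stopping the program.

```c
/* loop_demo.c - hypothetical loop for trying breakpoint command lists. */

/* Sums i * 3.33 for i = 0 .. n-1. */
double scaled_sum(int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += i * 3.33;   /* put the breakpoint on this line */
    return total;
}
```

After setting a breakpoint on the marked line, type ‘commands‘ and then enter ‘silent‘, ‘printf "i = %d, i * 3.33 = %f\n", i, i * 3.33‘, ‘continue‘ and ‘end‘, one per line.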

There’s another interesting thing to keep in mind: a breakpoint location may refer to several places in the code. For example, setting a breakpoint on a C++ class constructor by name sets breakpoints in all of the class’s constructors. Look at the example below:

We have defined two constructors for the MyClass class. Now, the GDB session:

As you can see, one ‘break MyClass::MyClass‘ command sets breakpoints in 3 locations that match the parameter: the two explicitly defined constructors and a copy constructor generated by the compiler. Of course, you can always be more specific and, for example, enter ‘break MyClass::MyClass(int)‘ to set a breakpoint in that particular constructor.

Watchpoints and catchpoints

Watchpoints, or data breakpoints, are a special kind of breakpoint that makes GDB stop your program whenever the value of a given expression changes. The expression may be a simple variable, a memory region, or any other valid expression in the current programming language. Watchpoints can be implemented in hardware as well. A hardware watchpoint uses dedicated debug hardware in the CPU (just like hardware breakpoints do), so it won’t slow down your application. A software watchpoint, on the other hand, is implemented by single-stepping your program and checking the watched expression (e.g. a local or global variable, or a memory range) after every step, stopping whenever its value changes. This makes execution hundreds of times slower. Software and hardware watchpoints also behave differently when debugging multiple threads: a software watchpoint can only watch the expression in a single thread, so you must be quite sure that no other thread will change it, whereas hardware watchpoints watch the expression in all threads.

The ‘watch‘ command sets a watchpoint. The syntax is ‘watch [-l|-location] expr‘. If the ‘-l‘ (or ‘-location‘) switch is provided, GDB evaluates expr (which must be a valid expression in the language of the source code) and watches the memory location it refers to. The ‘rwatch‘ command (read watch) has the same syntax, but breaks when the value of expr is read, whereas ‘awatch‘ (access watch) breaks on both reads and writes. Here’s an example of how to use watchpoints:

An example of a read watchpoint behavior is presented below:

And here’s how access watchpoints work – a loop from calculateFoo function gives a perfect example:
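If you would rather experiment with the three kinds of watchpoints on code of your own, a hypothetical loop such as the one below works well (names made up for illustration). The variable is global on purpose: GDB deletes a watchpoint on a local variable once the frame it belongs to goes away.

```c
/* watch_demo.c - hypothetical loop for trying watch, rwatch and awatch. */

/* Global, so a watchpoint on it stays valid for the whole session. */
long running_total = 0;

/* Adds the squares of 1..n to running_total and returns the new value. */
long add_squares(int n)
{
    for (int k = 1; k <= n; k++)
        running_total += (long)k * k;   /* read and written on each pass */
    return running_total;
}
```

‘watch running_total‘ stops after each change and shows the old and new values, ‘rwatch running_total‘ stops on each read, and ‘awatch running_total‘ on both. Note that GDB can only set ‘rwatch‘ and ‘awatch‘ as hardware watchpoints.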

A catchpoint is an event breakpoint. It’s set with a ‘catch‘ or ‘tcatch‘ (temporary catchpoint) command, and it will stop the program execution on a certain event, like:

  • throw|rethrow|catch [regexp] – C++ exceptions being thrown, rethrown or caught; all of them, or only those whose type matches the optional regexp
  • exception – Ada exceptions
  • exec – an exec system call
  • syscall [name|number] – a system call, by name or number
  • fork/vfork – a fork or vfork system call
  • load/unload [regexp] – shared library load/unload event, all, or matching optional regexp parameter
  • signal [name|number|all] – signal delivered to the application, by name or number. ‘All’ means to break on all signals, even those used by GDB itself.
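To see ‘catch signal‘ in action, a hypothetical program (names made up for illustration) only needs to raise a signal that it handles itself. SIGUSR1 is a POSIX signal rather than an ISO C one, so this sketch assumes a Unix-like system.

```c
/* signal_demo.c - hypothetical program raising a signal it handles. */
#include <signal.h>

static volatile sig_atomic_t got_usr1 = 0;

/* Signal handler: only sets a flag, which is async-signal-safe. */
static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Installs the handler, raises SIGUSR1 synchronously and reports
   whether the handler ran (1) or not (0). */
int raise_and_check(void)
{
    got_usr1 = 0;
    if (signal(SIGUSR1, on_usr1) == SIG_ERR)
        return 0;
    raise(SIGUSR1);   /* the handler runs before raise() returns */
    return got_usr1;
}
```

With ‘catch signal SIGUSR1‘ set, the program should stop when the signal is about to be delivered, before on_usr1() runs; ‘continue‘ then lets the handler execute.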

Here’s a simple example of ‘catch’ usage and behavior:

As you can see, the catchpoint on the shared library load event was hit twice before reaching the temporary breakpoint in main(). First, linux-vdso is ‘loaded’ (as explained earlier in this article, it’s actually mapped into the address space by the kernel), and then libc.so, the C standard library, is loaded. Remember that ‘linux-vdso.so’ is a virtual library provided by the kernel and doesn’t really exist in the filesystem.

Examining your software

No real debugging would ever be possible without peeking and poking at both the data and the code. Of course, GDB offers an extensive set of features for this, so let’s take a look at them.

Listing source code

Obviously, the list command is used for listing code, and there are some interesting features that make this command quite versatile. Here are some examples of these:

  • list [file:]36 – list source at around the 36th line of the [file] or the current file
  • list main, list [file:]calculateFoo  – list source around the function definition
  • list [function]:label – list source around the label
  • list *0x400640 – list source at around the given address
  • list 10,45 – list source from the 10th to the 45th line.
  • list 10, – list from the 10th line
  • list ,45 – list up to the 45th line
  • list +20 – list 20 next lines
  • list -15 – list 15 previous lines
  • list - (a single minus sign) – list the lines before the previous listing
  • list – with no arguments, list more lines after the previous listing

Apart from an ordinary linespec, like the one ‘break‘ takes, a range can be used as well. By default, GDB lists ten lines “around” a given location, with the requested line near the middle of the listing. This can be changed (and I frequently do this) with ‘set listsize‘ followed by a number that is comfortable in a given situation, or unlimited (or 0). There is a slight change of behavior when <ENTER> is tapped to repeat the command: for list, it is equivalent to entering list with no arguments, so more lines are listed, continuing from the previous listing.

Another useful command for examining the code is ‘info line [linespec]‘. It shows which addresses in the program’s code area (e.g. the .text section) a given line maps to, like this:

By passing an address instead (e.g. ‘info line *0x400640‘), you get the ‘reverse’ mapping, i.e. the source line that a given address maps to.

More often than not, especially while debugging an application you don’t know well, you may lose track of which source file you’re in. In such moments of confusion, the ‘info source‘ command comes to the rescue: it prints some useful information about the current source file. You can use ‘info sources‘ to list all source files whose symbols have been loaded (or are pending load):

The ‘info functions [REGEXP]‘ command prints the signatures of all functions whose names match REGEXP (or of all functions, when used without an argument). To find out where a symbol is stored, use ‘info address symbol‘; to go the other way and find which symbol (and section) a given address belongs to, use ‘info symbol address‘, as below:

Disassembling your program

Disassembling is a process of translating the binary code found in the executable, library or object file to a human readable form of the assembly language mnemonics. This may be your last resort when the source code or debugging information are not available. Besides, getting to know your platform’s assembly language and the operating system’s application binary interface is an extremely valuable learning experience :).

You can disassemble your application in GDB using ‘disassemble‘ command.  When invoked without any parameters it will disassemble the function of the current stack frame. It can be invoked with the following parameters:

  • disassemble function – disassemble a given function
  • disas 0x40068d – disassemble the function surrounding a given address
  • disas 0x40068d,0x400700 – the range of addresses
  • disas 0x40068d, +length – the ‘length’ bytes starting from a given address
  • disas ‘file.c’::function – disassemble function defined in file.c – mind the syntax!

Two additional modifiers can be added to the parameters. The ‘/m‘ modifier (‘mixed’) adds the source code lines to the assembly output, if available, whereas ‘/r‘ adds the ‘raw’ opcodes to the assembly mnemonics. Look at the following example of disassembling the listing1.c program, compiled for x86_64. Note the arrow indicating current program counter address and the jump addresses in both hexadecimal and ‘function+offset’ form:

As you may have noticed, x86 assembly is shown in the AT&T syntax commonly used by the GNU toolchain. However, this can be changed using ‘set disassembly-flavor [att|intel]‘. Another useful disassembly-related setting is ‘set disassemble-next-line [auto|on|off]‘. Setting it to on makes GDB print the next source line together with its disassembly, including the raw opcodes. If set to auto, GDB displays the disassembly only if the source code for the next line is not available. See the example below:

Remember the ‘info line’ command from the previous section? When used with the ‘examine‘ command (abbr. ‘x‘), it is possible to look at the next assembly instruction to be executed:

The ‘info line‘ command, besides printing the line info, sets the default address for ‘examine‘. Next, ‘x/i‘ tells GDB to examine a memory block, starting from a default address, and interpret it as an assembly instruction. See ‘Examining data’ for more information on this command.
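For a quick experiment with the commands in this section, a one-line function is enough. add_scaled() below is hypothetical, and what its disassembly looks like depends on the compiler version and optimization level (at -O2, x86-64 GCC typically reduces it to a single lea instruction).

```c
/* disas_demo.c - hypothetical tiny function to disassemble. */

/* Returns a + 4 * b. */
int add_scaled(int a, int b)
{
    return a + 4 * b;
}
```

Try ‘disassemble /r add_scaled‘ before and after ‘set disassembly-flavor intel‘, and, once stopped in the function, ‘x/3i $pc‘ to see the next three instructions.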

Inspecting the stack

The stack can be examined with the ‘backtrace‘ command (abbr. ‘bt‘, also ‘where‘). A backtrace is a list of the function calls that led to the current point of execution. Each function call creates a stack frame, where the function’s arguments, return address and local variables are stored. So, the backtrace command lists the current stack frame and all of its parent frames, up to the outermost one, main(). It looks like this:

As you can see, it prints the exact place where the execution stopped, with all the arguments. However, this information is still somewhat general, as it is lacking information about local variables or the frame pointers. To get information about locals, use ‘backtrace full‘ command, which in our case prints this:

In most cases, ‘bt‘ or ‘bt full‘ will give you enough information about the stack. Detailed data about a given stack frame can be obtained with the ‘info frame [framespec]‘ command. It prints complete stack frame data, including addresses and the registers saved in that frame. The framespec parameter can be a frame number or a frame address. Oddly enough, this command does not print the locals, so you have to use ‘info locals‘ or the more elaborate ‘info scope‘ just for that ;). If you only want to see the function arguments, use ‘info args‘. An example of using these commands:

The ‘info args‘ and ‘info locals‘ commands, like many other GDB commands (e.g. ‘finish‘ or ‘return‘), operate on the currently selected stack frame. With the ‘frame [framespec]‘ command you can select any frame on the stack, by its number or address, to operate on. You can also move ‘up‘ or ‘down‘ by one or more frames. Each of these commands prints general information about the newly selected frame, like so:
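A recursive function is an easy way to get a stack with several frames to move around in. factorial() below is a hypothetical example, not part of the article’s listings.

```c
/* stack_demo.c - hypothetical recursion that builds several frames. */

/* Recursive factorial: each pending call keeps its own frame, so when
   stopped at the base case, 'bt' shows one frame per call. */
unsigned long factorial(unsigned int n)
{
    if (n <= 1)
        return 1;
    unsigned long rest = factorial(n - 1);
    return n * rest;
}
```

Called as factorial(4) from main() and stopped with ‘break factorial if n == 1‘, ‘bt‘ should show four factorial() frames on top of main()’s; ‘frame 2‘ (or ‘up 2‘) then selects the call with n == 3, where ‘info args‘ shows that frame’s argument.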

Investigating data

The ‘print‘ command (abbr. ‘p‘, alias ‘inspect‘) is one of the commands used for data inspection in GDB. As you might expect, it is more than just a data printer: it evaluates and prints the result of a given expression written in the language of the debugged program. By using an optional /FMT format specifier, you can present the output in one of the following formats:

  • /x – hexadecimal
  • /z – zero-padded hex
  • /d – decimal
  • /u – unsigned decimal
  • /o – octal
  • /t – binary
  • /f – floating point
  • /s – string
  • /c – character
  • /a – address

A special ‘@‘ operator may be used to show a memory referenced by a pointer as an array, e.g. “print *argv@3” – print 3 consecutive values from an array that starts from argv pointer. Here’s an example of using print command in the wild (using listing1.c):

Since the argument of print is an expression, you can call any function defined in your program with it, like in the last example. GDB also has a closely related ‘call‘ command, which evaluates an expression the same way but does not print void results. For more information about expressions, please see the GDB manual.
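The following hypothetical helpers (names made up for illustration) give you a heap array, which GDB only sees through a pointer, plus a function that GDB can call from an expression.

```c
/* print_demo.c - hypothetical data for trying the @ operator and calls. */
#include <stdlib.h>

/* Returns the sum of the first n elements of values. */
double sum_doubles(const double *values, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += values[i];
    return s;
}

/* Allocates n doubles 0.5, 1.5, 2.5, ...; GDB sees only a pointer to
   the first element, so '@' is needed to view the whole block. */
double *make_values(int n)
{
    double *values = malloc((size_t)n * sizeof *values);
    if (values != NULL)
        for (int i = 0; i < n; i++)
            values[i] = i + 0.5;
    return values;
}
```

With values pointing at the result of make_values(3), ‘print *values@3‘ shows all three elements, ‘print values[1]@2‘ shows the last two, and ‘print sum_doubles(values, 3)‘ calls the function inside the debugged process.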

Interestingly, each printed value looks like a variable assignment. This is because GDB stores the output of each print command in the value history: every printed value gets an index and can be accessed later as $index. Two convenience values can be used as well, ‘$‘ and ‘$$‘, meaning the last and the second-to-last printed value, respectively. This can be very useful in situations like the following. First, the code:

And the debugging session:

In this example session, we’re using a previous value from the history to get a list item by pointer and then follow the “next” pointer to walk through the linked list. Because we refer to the last printed value, getting the next item takes just a tap of the Enter key. You can print the value history with the ‘show values n‘ command when necessary. The print command is also highly configurable. A number of ‘set print …‘ commands are at your disposal to tailor its output to your needs, e.g. ‘set print array on‘ displays arrays in a more readable style, and ‘set print null-stop on‘ makes print stop printing a character array at the first null character. There are a lot of these settings, so go and see for yourself: enter ‘set print‘ for a complete list, use GDB’s built-in help or look them up in the GDB manual.
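Here is a hypothetical singly linked list (similar in spirit to the one described above, but not taken from the article’s sources) that you can walk through with the value history.

```c
/* list_demo.c - hypothetical singly linked list. */
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Frees every node of the list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a list holding values[0..n-1] in order. Returns NULL when n is 0
   or an allocation fails (freeing the nodes built so far). */
struct node *build_list(const int *values, int n)
{
    struct node *head = NULL;
    struct node **tail = &head;   /* where the next node gets linked in */
    for (int i = 0; i < n; i++) {
        struct node *item = malloc(sizeof *item);
        if (item == NULL) {
            free_list(head);
            return NULL;
        }
        item->value = values[i];
        item->next = NULL;
        *tail = item;
        tail = &item->next;
    }
    return head;
}
```

With head holding the list’s first node, ‘print *head‘ prints that node, ‘print *$.next‘ follows its next pointer, and each further <ENTER> moves one node along until GDB fails to dereference the NULL pointer at the end.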

More in-depth info can be obtained with the ‘examine‘ command (abbr. ‘x‘). It uses the same format modifiers as print and adds some more to control the output, such as the number of items to examine and the size of each item, e.g. ‘x/4g var‘ means “examine four giant words starting at the address stored in var”. The size letters are b, h, w and g: byte, halfword (2 bytes), word (4 bytes) and giant word (8 bytes), respectively. Initially, examine defaults to hexadecimal words; after that, it reuses the format and size you used last. This command also adds an /i format that interprets the data as target machine instructions; we have already used it in the disassembly section. It also sets two convenience variables: $_ (single underscore) holds the last address examined, whereas $__ (double underscore) holds the contents of that address. Here are two short examples of what the ‘x’ command can do, using a slightly modified program from Listing 3. First, the basic usage:

And another one – a deep memory inspection of a linked list:

So, in this example we have found out the exact memory layout of the linked list items (and the list itself of course) using the examine command.
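To relate what ‘x‘ shows to the C declaration, the hypothetical helper below reports where the next pointer sits inside a list node. On x86-64 (LP64), value occupies the first 4 bytes, 4 bytes of padding follow, and next starts at offset 8; other platforms may lay the structure out differently.

```c
/* layout_demo.c - hypothetical helper for comparing x output with the
   struct declaration. */
#include <stddef.h>

struct node {
    int value;           /* offset 0 */
    struct node *next;   /* offset 8 on x86-64, after 4 bytes of padding */
};

/* Byte offset of the next pointer inside struct node. */
size_t node_next_offset(void)
{
    return offsetof(struct node, next);
}
```

So, on x86-64, ‘x/2gx item‘ for a node pointer item should print two giant words: the first holds value in its low 4 bytes (x86 is little-endian) plus the padding, and the second is the next pointer. ‘x/4wx item‘ splits the same 16 bytes into 4-byte words instead.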

Oftentimes you may want to print a value at every step or, more generally, whenever your program stops. To save you the effort of printing it again and again, the ‘display‘ command automatically shows the value of a given expression whenever it can be evaluated in the current context. Each displayed expression is added to the display list, which can be printed with ‘info display‘. The display command accepts the same format modifiers as examine. Since all auto-displayed items are kept in a list, managing them is similar to managing the breakpoint list: you can ‘delete display x‘ (or ‘undisplay x‘ – apparently, whatever is displayed can be undisplayed 😉 ), as well as ‘enable display x‘ or ‘disable display x‘, where x is a list of display numbers to operate on.
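A loop whose variables change on every pass is a good playground for ‘display‘. popcount_loop() below is a hypothetical example that clears one set bit per iteration.

```c
/* display_demo.c - hypothetical loop for trying 'display'. */

/* Returns the number of set bits in x, clearing the lowest one each pass. */
int popcount_loop(unsigned int x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}
```

Once stopped inside popcount_loop(), ‘display/t x‘ and ‘display count‘ make GDB show the shrinking bit pattern and the growing counter after every ‘next‘.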

If you happen to have a GDB built with Python support (most Linux distros ship it that way, AFAIK), there’s an interactive command to help you investigate more complicated types in your software. The ‘explore‘ command accepts either an expression or a type name. If you want to explore the type of a variable rather than its value, use ‘explore type variable‘. Either way, you’ll eventually end up in an interactive menu that looks like this:

With this command, you can explore any type and any value you encounter in your programs. However, an interactive command is sometimes a bit unwieldy. If that’s the case, both ‘whatis’ and ‘ptype’ print type information about a given expression. The usage scenario for whatis is simple: during a debugging session you come across a variable or a complicated expression and want to know its type. Simple and effective, at least in easy cases. The ‘ptype‘ command, on the other hand, gives you much more detailed information about a type or an expression: it unrolls typedefs by default, prints the full definition of structures, and, in C++, can print the methods and typedefs defined in a class. This behavior is configurable with additional flags; see ‘help ptype‘ for details. Here’s an example of how to use these commands:

Apart from the dedicated data examination commands like print and examine, there are several info commands to display data. Your CPU registers and their contents can be printed with info registers; however, this prints all registers except the floating-point and vector ones. Floating-point unit status can be obtained with info float, and, if your CPU has a vector unit, you can print its registers too with info vector. To display the full set of registers in one go, type info all-registers (or info registers all). You can use print to display register contents as well; just prefix the register name with $, e.g. print $rbx or print /t $eflags. Some registers, such as the program counter and the stack pointer, exist on virtually every architecture supported by GDB, so there are convenient architecture-independent aliases for them:

  • $pc – a program counter
  • $sp – a stack pointer
  • $fp – a frame pointer
  • $ps – program status word (a.k.a. flags register)

Altering your software

So far we’ve only peeked at the internals of the software being run with GDB. But no debugger would be much use if it couldn’t modify a running program, both its data and its code. Since GDB is a complete debugger, let’s use it to poke our programs with some numbers, shall we? ;)

Modifying variables and registers

To set any variable visible in the current context, use ‘set var VARIABLE_NAME=VALUE‘. If the variable name is unambiguous (it doesn’t collide with the names of other set subcommands), the ‘var’ part may be omitted. An arbitrary memory address may be used instead of a variable name, but it must be cast to the correct type using GDB’s cast syntax. Here we have Listing 4; let’s use it for some ultimate peeking and poking 😉

Let’s start with something simple – print some values and then change them using various expressions:

And now, the change part:

Well, nothing to it, it works as expected :). However, remember that you’re assigning the value of an expression, so you can do even crazier things with this, like (restarting from before the change part):

In this case we’re using a convenience variable, $tmp: a variable managed by GDB that has no fixed type and takes on the type of whatever is assigned to it. These variables exist only within GDB; your program is not affected by them whatsoever. Thus, they are free to use as placeholders, temporaries, what have you. GDB offers convenience functions as well; these are defined inside GDB and may be used in any expression. Use ‘show convenience‘ to list all the convenience variables and functions, including the ones predefined by GDB. See the manual for more details on them.

In the example above we allocate a new three-element array of struct nice_struct and assign its address to the $tmp convenience variable. Then, after printing out the original array, we assign this address to the original variable. To confirm that it has changed, we print ns_arr again. With this technique, you can dynamically switch the data the inferior operates on during a debugging session.

If no symbolic variable is available (e.g. because there are no debugging symbols), you can alter data by address instead, as presented below:

Note that a raw address is a simple literal; it’s not an lvalue and cannot be assigned to. Therefore, it has to be cast to the desired type with the braces notation, like {int}, before assignment.
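The same rule holds in C itself, where the cast is written as a pointer cast followed by a dereference. This small sketch is only an analogy to GDB’s {int} notation: `poke_int()` and `peek_int()` are hypothetical helper names, and unlike GDB, they can only touch memory inside the process that runs them.

```c
#include <stdint.h>

/* C counterpart of GDB's "set {int}ADDR = VALUE": an integer address
 * becomes an assignable lvalue only after a cast to a pointer type. */
void poke_int(uintptr_t addr, int value)
{
    *(int *)addr = value;
}

/* C counterpart of GDB's "print {int}ADDR". */
int peek_int(uintptr_t addr)
{
    return *(const int *)addr;
}
```

For example, with `int x = 7;`, calling `poke_int((uintptr_t)&x, 42)` leaves x equal to 42, just as assigning through {int} and the address of x would in GDB.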

Remember that all registers can be printed with ‘print’ by prefixing their names with a $. The good news is that registers can also be poked with arbitrary values using set. For example, to make the program jump to another code location, set your program counter to a new value, like so:

Memory dump and restore

If you have to examine (or even change) larger chunks of memory, you can dump a memory region (or even a single variable or an expression result) to a file and restore it later. This is done with the ‘dump‘/‘append‘ and ‘restore‘ commands. Let’s dump 3 out of 4 elements of the ns_arr array:

Next, let’s print the contents of the dump file:

Now we can zero out this part of the ns_arr:

Now we can restore the array from file:

A very powerful and useful feature, this is. Apart from raw binary dump/restore, several other file formats are available, including the ubiquitous Intel HEX. Please refer to the GDB Manual or GDB’s built-in help for more details about these commands.
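To give an idea of what an Intel HEX dump (written by e.g. ‘dump ihex memory FILE START END‘) contains, here is a minimal C sketch that formats a single record. Each record is a colon, a byte count, a 16-bit big-endian address, a record type (00 for data, 01 for end of file), the data bytes and a checksum: the two’s complement of the low byte of the sum of all preceding bytes. The function name `ihex_record()` is hypothetical, and a full writer would also need extended address records for data beyond 64 KiB, which this sketch omits.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one Intel HEX record (":LLAAAATT<data>CC") into `out`.
 * Returns the number of characters written, or -1 if there are more
 * than 255 data bytes or the record would not fit into `outsz`. */
int ihex_record(char *out, size_t outsz, uint16_t addr, uint8_t type,
                const uint8_t *data, size_t len)
{
    /* ':' + count (2) + address (4) + type (2) + data (2*len)
     * + checksum (2) + terminating NUL. */
    if (len > 255 || outsz < 11 + 2 * len + 1)
        return -1;

    /* The checksum covers every byte after the colon. */
    unsigned sum = (unsigned)len + (addr >> 8) + (addr & 0xFFu) + type;
    int n = sprintf(out, ":%02X%04X%02X", (unsigned)len, (unsigned)addr,
                    (unsigned)type);
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
    }
    /* Two's complement of the low byte of the sum. */
    n += sprintf(out + n, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
    return n;
}
```

For example, the three bytes 01 02 03 at address 0 produce the record `:03000000010203F7`, and the end-of-file record (type 01, no data) is always `:00000001FF`.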

Code injection

With the brand-new, shiny GCC 5 and GDB 7.9 comes a very interesting feature: on-demand compilation of code and its injection into the inferior. The new commands ‘compile code‘ and ‘compile file‘ compile the code entered at the GDB prompt or read from the provided file, respectively. If the compilation succeeds, GDB injects the code into the inferior’s address space and executes it there. Thus, all the types, variables and functions visible in the current scope are available to the injected code, and you can do whatever you want with them, just as you would in normal C code. After execution, everything gets cleaned up: the injected code and any newly created objects (types, variables, functions) are removed from the inferior.

Because I don’t want to install GCC 5 on my Arch Linux machine yet, I made the example below using a virtualized Debian Sid with GCC 5 from the experimental repository. At the time of writing (March 25th, 2015), due to a bug in GDB 7.9, I had to compile GDB from the latest source with this patch to make code injection work.

Now, with our Listing 4 at hand, let’s mess around with the code injection:

In this example, a small piece of code is compiled, injected and executed on behalf of the inferior process. With this feature, you can write C code to be executed on demand during a debugging session to, say, test different algorithms or apply more elaborate logic when modifying the data in your program. Another great tool in the toolbox :).

Binary patching

By default, GDB opens executables and core files in read-only mode to prevent permanent modifications of the code. This can be changed either with the ‘--write‘ command line switch or with the ‘set write on‘ command. When using the latter, you have to re-read the executable file with ‘exec-file‘.

At the time of writing, GDB 7.9 has a bug that messes up the executable structure: just entering ‘gdb --write ./executable_file’ and quitting immediately renders the file useless. Because of that, I made the following example using a vanilla CentOS 7 on a VM, with GDB 7.6.1-57.el7.

Let’s try binary patching on our listing4 executable. To keep it simple, let’s change the amount of memory allocated by the malloc() call. First, we have to find the address to change:

We have stopped before the malloc() call. In the disassembled output we can see that the mov $0x120,%edi instruction at address 0x4005e8 places the “size” argument for malloc() into the %edi register before the “callq” to glibc’s malloc implementation (see the x86-64 System V ABI for more details about function calls and parameter passing). The instruction begins with a one-byte opcode, so the immediate value, and with it its least significant byte, starts at 0x4005e8+1. Let’s change this single byte to make 0x166 instead of 0x120:

Nothing to it ;). Now, let’s re-run our patched version to check it:

Great, we have made a binary-patched executable! With this technique you can make permanent modifications to the binary code, crack software, or, in other words, “fix a misbehaving application for which you don’t have the source code” ;).
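Why does a single byte suffice? Presumably the instruction is encoded in its usual five-byte form, bf 20 01 00 00: the opcode 0xbf (move a 32-bit immediate into %edi) followed by the immediate in little-endian order. The byte at 0x4005e8+1 is therefore the immediate’s least significant byte, and replacing 0x20 with 0x66 turns 0x120 into 0x166. The C sketch below replays that arithmetic on an in-memory copy of the instruction bytes; it does not touch any file, and the names `mov_edi_imm()` and `patch_imm_low_byte()` are hypothetical.

```c
#include <stdint.h>

/* "mov $imm32, %edi" is encoded as opcode 0xBF followed by a 32-bit
 * immediate stored little-endian (least significant byte first). */
enum { MOV_EDI_OPCODE = 0xBF, MOV_EDI_LEN = 5 };

/* Decodes the immediate operand of a 5-byte "mov $imm32, %edi". */
uint32_t mov_edi_imm(const uint8_t insn[MOV_EDI_LEN])
{
    return (uint32_t)insn[1]
         | (uint32_t)insn[2] << 8
         | (uint32_t)insn[3] << 16
         | (uint32_t)insn[4] << 24;
}

/* Overwrites the immediate's least significant byte, i.e. the byte at
 * the instruction's address + 1, like the single-byte patch above. */
void patch_imm_low_byte(uint8_t insn[MOV_EDI_LEN], uint8_t new_byte)
{
    insn[1] = new_byte;
}
```

Starting from the bytes {0xBF, 0x20, 0x01, 0x00, 0x00}, mov_edi_imm() yields 0x120; after patch_imm_low_byte(insn, 0x66) it yields 0x166. Patching the byte at +2 instead would change the 0x100 part of the value.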

Happy ending

Phew! That was a long one, wasn’t it? 😉

I really hope that this article will give you a head start with debugging using GDB on Linux. I am aware that I’ve only scratched the surface in many places, so, if you need more information, grab the GNU GDB Manual, which may already be installed as info pages on your distro. The GDB Documentation page is loaded with heaps of material as well, including standardization documents, ABI documents, GDB internals and more. Of course, the ultimate source of information is the source code, which can be pulled from git://sourceware.org/git/binutils-gdb.git.

Thank you for visiting my blog, and, as always, stay tuned for more (hopefully shorter and meatier :P) articles.

