Symptoms and causes

A crash happens when a program encounters a condition, usually an erroneous one, that it has no code to handle. The program stops abruptly, leaving a backtrace in the logs and, depending on the platform and programming language, sometimes a core file. The backtrace immediately tells us the code path that led to the crash. This code path is a victim of the erroneous condition, not necessarily its cause.

Debugging a crash means figuring out the unhandled condition and what led to it. A program has to handle many internal and external conditions, e.g., when reading a file from a disk, dereferencing a pointer, or sharing a resource among threads. Any of these can go wrong in various ways and result in a crash. Our final goal is to work back from the symptom, which is the crash, to whatever produced the condition.

The backtrace in the crash is a code path affected by the bug. Returning to our analogy, debugging a crash is like solving a murder mystery with just the victim’s last words. The last words could be a name, which would be a good starting point, but more is needed to secure a conviction.

We have to get from this backtrace to the bug, which could be in any of the functions listed in the backtrace or somewhere else entirely. Our objective is to go after the cause behind the backtrace.
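To make the victim-versus-cause distinction concrete, here is a minimal sketch in Python. The functions `load_config`, `connect`, and `main` are hypothetical, invented only for this illustration. The crash surfaces in `connect`, but the defect lives in `load_config`, which has already returned and therefore never appears in the backtrace.

```python
import traceback


def load_config():
    # Hypothetical bug: the "timeout" key is never set here.
    return {"host": "localhost"}


def connect(config):
    # Crash site: this line is the victim of the missing key, not its cause.
    return config["timeout"] * 2


def main():
    config = load_config()
    return connect(config)


def crash_frames():
    """Return the function names on the call stack at the time of the crash."""
    try:
        main()
    except KeyError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


print(crash_frames())  # ['crash_frames', 'main', 'connect']
```

The printed call stack ends at `connect` and never mentions `load_config`, so reading the backtrace alone would point at the wrong function.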

Pattern to debug a crash

In this section, we’ll present a general pattern: an ordered sequence of steps to follow for debugging crashes. The goal is to identify the erroneous condition that led to the crash and what caused it.

Step 1: Identify the erroneous condition

The first step is to pinpoint the unhandled erroneous condition that led to the crash. At the code level, a software process acts on a programming entity (opening a file, accessing a variable, etc.), and this usually succeeds. Sometimes, however, conditions are not favorable for the action: the entity is not in the state the attempted action requires, or the action is performed incorrectly on the entity, and a crash follows. This step aims to find the software entity, the action performed, and the faulty condition.
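As a rough sketch of this step, the Python snippet below separates a failure into the three pieces named above. The helper `describe_failure` and the dictionary keys are assumptions made for this illustration. The entity is a file, the action is opening it for reading, and the faulty condition is whatever the operating system reports.

```python
import os
import tempfile


def describe_failure(path):
    """Try to read a file and describe any failure as entity, action, and condition."""
    try:
        with open(path) as handle:
            handle.read()
    except OSError as exc:
        return {
            "entity": exc.filename,          # the file the program acted on
            "action": "open for reading",    # what the program tried to do
            "condition": exc.strerror,       # why the OS refused (text is platform dependent)
            "error": type(exc).__name__,     # the exception class that would reach the backtrace
        }
    return None


# A path inside a fresh temporary directory is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
print(describe_failure(missing))
```

In a real crash nothing catches the exception, so the same three pieces have to be read off the backtrace instead.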

In most crashes, both the software entity and the action are readily available at the top of the backtrace. A backtrace typically shows the reason for the crash at its top, followed by the call stack that led to it (some languages, Python among them, print the most recent call last, so the reason appears at the bottom instead). On some platforms and for some programming languages, a core file is also available and provides even more information about the crash. It is essential to dig deeper here to understand the responsibility and purpose of the software entity and of the action performed on it. For example, consider the following snippet from a backtrace from a Python program:
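As a stand-in illustration, the sketch below uses a hypothetical program (the functions `average` and `report` are invented here, not taken from the text) to produce a Python traceback and pull out the two items this step needs: the stated reason for the crash and the innermost frame, where the failing action took place.

```python
import traceback


def average(values):
    # Action on the entity: dividing by len(values), which fails for an empty list.
    return sum(values) / len(values)


def report(samples):
    return average(samples)


def summarize_crash():
    """Return the crash reason and the name of the function where it surfaced."""
    try:
        report([])
    except ZeroDivisionError as exc:
        # Python prints the reason as the last line of the traceback.
        reason = traceback.format_exception_only(type(exc), exc)[-1].strip()
        # The last extracted frame is the innermost call, where the action failed.
        innermost = traceback.extract_tb(exc.__traceback__)[-1]
        return reason, innermost.name
    return None


print(summarize_crash())
```

Here the entity is the `values` list, the action is the division, and the faulty condition is that the list is empty. Why `report` was handed an empty list is the question for the next stage of the investigation.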
