Memory Management and Garbage Collection
TL;DR: Stop reading if you have a computer with infinite memory.
Let’s go over some well-known garbage collection techniques and analyze the tradeoffs involved from a system design perspective. Garbage collection is a subject that is very close to my heart, and by writing about it I hope to improve my own understanding of modern language runtimes.
In this lesson, we’ll briefly compare the explicit and automatic memory management approaches. Then we’ll establish the terminology that’ll serve as the basis for a deeper dive into garbage collection algorithms and techniques.
Computer memory is a finite resource. Historically, for any reasonably large software system, memory management was typically one of the leading concerns of its designers. They had the momentous task of picking one of the following options:
- Explicit memory management
- Automatic memory management
Let's briefly look at the pros and cons of both approaches.
Explicit memory management requires programmers to manually free or delete dynamically allocated memory. Languages like C and C++ offer almost complete control over an object’s lifecycle (even over address alignment at times). For large systems, this requires strict programming discipline to ensure the long-term correctness and reliability of the programs. Some common coding pitfalls include:
- Dangling pointers due to releasing a live object
- Memory corruption due to double free
- Memory leaks due to forgetting to release memory
- Higher coupling between components due to modeling of ownership semantics in API signatures
There are many established patterns for avoiding these pitfalls, but pretty much all of them boil down to maintaining strict coding discipline and consistently applying the correct memory management strategy. This becomes tricky for large programs, where third-party dependencies or libraries might force a mix of multiple approaches. Memory issues are arguably among the hardest classes of bugs to investigate.
For example, with a dangling pointer or a double free, the best outcome is an immediate crash of the program; typically, though, the crash doesn’t happen and the program continues to run, exhibiting unpredictable behavior. Hence the saying: to err is human; to blame it on a computer is even more so.
In this day and age, the biggest problem with the option of explicit memory management is that it exists.
Automatic memory management is an essential feature of all modern programming languages. It relieves the programmer of the burden of memory management by presenting an illusion of infinite memory. For example, the following Java program will never run out of memory, even when run with no optimizations (`-Djava.compiler=NONE`):

```java
import java.util.UUID;

public class A {
    public static void main(String[] args) {
        while (true) {
            // Each iteration allocates a fresh String that immediately
            // becomes garbage; the collector keeps reclaiming it.
            System.out.println(new String(UUID.randomUUID().toString()));
        }
    }
}
```
Other pros include:
- No dangling pointers
- No double frees
- Simpler code
- Faster development cycle
Simpler code is the key win with automatic memory management. In garbage-collected languages, the code is more readable and easier to reason about.
Having said that, automatic memory management isn't a silver bullet. It doesn't guarantee the absence of memory leaks: an object the program will never use again, but that remains reachable, can never be reclaimed. It also introduces space/time tradeoffs with subtle impacts on a program's runtime behavior.
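To see why leaks are still possible, consider the following minimal sketch (the class name and allocation size are arbitrary, invented for illustration): the program no longer needs its old buffers, but they stay reachable through a static field, so the collector must retain them.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakyCache {
    // Objects added here stay reachable for the lifetime of the program,
    // so the collector can never reclaim them: a leak by reachability.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            CACHE.add(new byte[1024 * 1024]); // 1 MiB that is never used again
        }
    }
}
```

Run long enough, this ends in an `OutOfMemoryError` even though the GC is doing its job perfectly: reachability, not programmer intent, decides what counts as garbage. Let's start with the basics before getting into the subtleties.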
An automatic memory management system typically comprises two almost independent parts:
- Dynamic memory allocation
- Garbage collection
Dynamic memory allocation involves a heap that serves as a pool of memory backing all allocated objects. Heap allocation allows programmers to allocate objects of variable size and pass them around freely beyond the current execution context. All heap-allocated objects are accessed through references. A reference typically points to one of the following:
- The address of the allocated object's memory on the heap
- A handle that in turn contains the object's memory address on the heap
Handles offer the advantage that an object can be relocated easily by updating the handle's value, as opposed to updating all of the object's references.
The heap offers APIs to allocate and deallocate objects. We'll refer to this heap interface as the allocator.
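To make the allocator and handle ideas concrete, here is a minimal Java sketch. The names (`HandleAllocator`, `allocate`, `deref`, `relocate`, `free`) are invented for illustration and don't correspond to any real runtime's API:

```java
public class HandleAllocator {
    // Handle -> object table. Fixed-size for brevity; a real allocator
    // would grow the table and recycle freed slots.
    private final Object[] table = new Object[1024];
    private int next = 0;

    // Allocate: store the object and hand out an index, not an address.
    public int allocate(Object obj) {
        table[next] = obj;
        return next++;
    }

    // Every access goes through one extra level of indirection.
    public Object deref(int handle) {
        return table[handle];
    }

    // Relocation only touches the table slot; every holder of the
    // handle sees the moved object without being updated.
    public void relocate(int handle, Object movedCopy) {
        table[handle] = movedCopy;
    }

    public void free(int handle) {
        table[handle] = null;
    }
}
```

The indirection costs an extra dereference on every access, which is why many modern runtimes prefer direct references and instead fix up all references during compaction.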
The garbage collector is the magic component that makes the illusion of infinite memory possible. A garbage-collected runtime typically consists of two semi-independent components, known as the mutator and the collector.
Typically, all components of a language runtime that are responsible for executing application code, allocating/deallocating objects, and managing execution contexts are categorized as mutators. Most reasonably large programs have more than one mutator thread. The exact number of mutator threads is generally irrelevant to the garbage collection approaches.
The collector is what actually executes the garbage collection code. It discovers garbage (read: unreachable) objects and reclaims their storage. In concurrent GCs, there may be more than one collector thread. The exact number of collectors is mostly irrelevant to the approaches that we are going to discuss. Most GCs offer control over the behavior of collectors through configuration settings (more on that later).
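For instance, the HotSpot JVM exposes collector behavior through command-line flags. A hypothetical invocation (the application name and values are placeholders, but the flags themselves are real) might look like:

```
java -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc MyApp
```

Here `-Xmx4g` caps the heap, `-XX:+UseG1GC` selects the G1 collector, `-XX:MaxGCPauseMillis` sets a pause-time goal, and `-Xlog:gc` (JDK 9+) enables GC logging.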
The collector is correct only if it never reclaims live objects. Theoretically, determining the liveness of an object might not even be possible (e.g., in systems supporting hot patching), even if we had the time to scan the world. So most garbage collectors cheat a bit and treat reachability as liveness. An object is considered reachable if a mutator can reach it by following direct or indirect references from its execution contexts (local/global variables, stack frames, virtual/physical CPU registers, thread-local storage, heap references, etc.).
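Put differently, reachability is a graph traversal problem. The sketch below shows a minimal mark phase over a toy object graph; the `Node` type and the root set are invented for illustration, not taken from any real runtime:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class Marker {
    // A toy heap object: 'fields' are its outgoing references.
    static class Node {
        boolean marked;
        List<Node> fields = List.of();
    }

    // Mark everything transitively reachable from the roots
    // (locals, globals, stack slots, registers, ...).
    static void mark(List<Node> roots) {
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (n.marked) continue; // already visited
            n.marked = true;
            worklist.addAll(n.fields);
        }
    }
}
```

Anything left unmarked after the traversal is unreachable and can be reclaimed; real collectors differ mainly in how and when they perform this traversal and the reclamation.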
Just as in distributed systems, designing a garbage collection mechanism is an exercise in balancing tradeoffs. Some of the desirable properties of garbage collectors are:
- Correctness - A GC must be correct and safe: it must never reclaim live objects. Safety typically comes at the cost of performance or efficiency. There are cases where a GC's performance can be improved by feeding it hints from the compiler and other runtime systems. More on that later.
- Performance - The overall time spent by a GC should be as short as possible. A GC's performance is typically measured along two dimensions: latency and throughput. Both depend significantly on the implementation, the workload, and the runtime environment. GCs are intrusive by nature: they introduce pauses by stopping all mutators while performing garbage accounting. It is clearly desirable for these pauses to be as short and as infrequent as possible. In some systems (e.g., distributed lease/transaction coordinators), long pauses can trigger failures resulting in system downtime. Having said that, measuring pause time alone isn't particularly useful: a GC with no pauses but no garbage-reclaiming throughput wouldn't be very useful either (the sketch after this list shows one way to observe pause counts and cumulative collection time on the JVM). Ideally, a GC should reclaim all the garbage in the heap. However, this isn't always possible, or even desirable (say, for large heaps): for performance reasons, it may be better not to collect the whole heap on every collection cycle. Generational garbage collectors (used in Java, .NET, etc.) segregate objects by their age into two or more regions called generations. By concentrating effort on the youngest generation, generational collectors can both improve total collection time and reduce the average pause time of individual collections. More on that later.
- Space Overhead - The goal of automatic memory management is the safe and efficient use of space. Memory managers, both explicit and automatic, impose space overheads on the system. Some GCs impose a per-object space overhead (e.g., reference counts); others may be able to piggyback on an object's existing layout (e.g., a dirty bit). Collectors also impose space overhead of their own for the garbage accounting they do. The standard space-vs-time tradeoff applies to all algorithms, and GCs are no exception.
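On the JVM, collection counts and cumulative collection times can be observed through the standard management API. A small sketch using `GarbageCollectorMXBean` (a real JDK interface; the class name `GcStats` is just for illustration):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Each bean corresponds to one collector (e.g. young vs. old
        // generation); counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```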
GC algorithms are complex and juggle way too many tradeoffs. There are cases where programming language semantics can aid a GC's performance. For example, functional languages like ML and Scala distinguish mutable data from immutable data, and GC implementations can leverage this information to improve performance.