While trying to implement concurrent evacuation I've found out that there is a serious problem with my simple incremental low-pause garbage collector design. Implementing the phase ordering exactly as specified would corrupt the root set, resulting in not very fun to debug crashes, even with extensive heap verification and lots of assertions everywhere. Why? Let's go through the pertinent assumptions, and the way they interact, carefully, and find a solution.
What goes wrong?
This innocuous, perfectly reasonable code attempts to access from-space memory of an evacuated object:
root_pointer<widget> blue_widget = get_widget(color::blue);
// Suppose our blue widget gets evacuated by incremental GC during this call,
// and evacuation hasn't completed yet, so the update roots phase hasn't run.
// blue_widget now points to from-space memory, frobbing it won't affect
// the to-space copy!
// Or if the evacuation process poisons the from-space memory, frobbing it
// will crash.
The ingredients of our corruption soup
There are several assumptions the original design makes that together cause the problem:
- Root pointers are always valid, as far as the mutator is concerned. This makes accessing data through root pointers fast, because there is no barrier code to execute, it's just a regular memory access through a pointer.
- Evacuation is concurrent. Obviously, if it's not concurrent, the issue magically disappears, but Blue Ridge is a mostly concurrent collector, so evacuation better stay concurrent.
- Objects directly pointed to by roots can move. Treating all objects pointed to from the root set as pinned is not something I'd consider desirable, even if pinning in a regionalized collector can be easy and cheap compared to traditional collectors with unified heap.
Clearly, something needs to change, but none of the apparent solutions sound satisfactory. Introducing a barrier to resolve the to-space address of a root pointer is too expensive — just think of a just-in-time compiled implementation that keeps some of the root pointers in CPU registers having to run a non-trivial chunk of code containing several conditionals every time it wants to access something, the overhead would be prohibitive. Stop-the-world evacuation is out of the question in a mostly-concurrent collector, and treating root set pointees as pinned is a hack. Situation appears bleak…
Easy, just reorder phases!
…except there is a way out: what if the update roots phase ran before concurrent evacuation, thus ensuring the mutator can never have from-space pointers in the root set (remember that the load reference barrier prevents the mutator from loading a from-space pointer from the heap during evacuation already)? It cannot just update the root pointers, because it doesn't know the to-space addresses of those objects yet, because evacuation hasn't started yet, so this phase has to evacuate those objects itself; we can now call this phase evacuate roots.
It's a tradeoff
There is such thing as a free lunch, but this is not it, so naturally, there is a catch. Updating pointers is almost trivial: look at the pointee to see if it has a forwarding pointer, and change the pointer being updated to that value if it does. Evacuation is more involved, as it requires copying the object, so naturally, it is slower. Since this is a stop-the-world phase, the worst-case latency is going to rise. It's still affected only by the root set size, though, only the constant factor is larger, which is acceptable in my case, as this approach is much simpler than the alternatives listed above, relative simplicity being another important factor of the design after all.
Phase ordering, updated
For reference, the updated list of phases is now as follows.
- Mark roots.
- Incremental mark.
- Remark roots.
- Evacuate roots, the subject of this post.
- Incremental evacuation.
- Incremental update references.