Supervisors are one of the strongest selling points of Erlang/Elixir and the OTP set of abstractions. They let us structure the life cycle of the processes inside our application in a resilient way, which makes isolating failures a breeze.

However, supervisors are one of the toughest things to test that we've come across. This is because their main job is to let our application recover from complex, cascading failures, and these kinds of failures are hard to trigger on purpose during testing.

Imagine a complex, deep supervision tree. Now imagine that a child in a corner of the tree starts crashing and doesn't recover just by being restarted on its own. OTP handles this beautifully: once the child exceeds its supervisor's restart intensity (the maximum number of restarts allowed within a time window), that supervisor terminates its remaining children and exits, propagating the failure to its own parent supervisor, which restarts it together with all of its children. If that doesn't solve the problem, the failure keeps propagating up the tree until restarting a large enough part of your application fixes it (or, if the problem is severe, until the whole application crashes).
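To make this escalation concrete, here is a minimal sketch, assuming a two-level tree and a hypothetical `FlakyWorker` that stops itself shortly after every start, so a plain restart never fixes it. The module names and the restart limits (`max_restarts: 3` for the inner supervisor and `max_restarts: 2` for the top level, both within `max_seconds: 5`) are illustrative choices, not taken from any real application.

```elixir
defmodule FlakyWorker do
  # Hypothetical worker: it starts successfully, then stops with an error
  # ~10ms later, simulating a fault that restarting alone does not fix.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg) do
    Process.send_after(self(), :boom, 10)
    {:ok, arg}
  end

  @impl true
  def handle_info(:boom, state), do: {:stop, :persistent_failure, state}
end

defmodule InnerSup do
  # Hypothetical mid-level supervisor: tolerates at most 3 restarts in 5 seconds.
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg)

  @impl true
  def init(_arg) do
    Supervisor.init([FlakyWorker], strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

# Trap exits so the top-level supervisor's exit signal doesn't kill this script.
Process.flag(:trap_exit, true)

{:ok, top} =
  Supervisor.start_link([InnerSup], strategy: :one_for_one, max_restarts: 2, max_seconds: 5)

ref = Process.monitor(top)

# Wait for the failure to escalate all the way to the top of the tree.
reason =
  receive do
    {:DOWN, ^ref, :process, ^top, reason} -> reason
  after
    5_000 -> :still_running
  end

IO.inspect(reason, label: "top-level supervisor exited with")
```

Run as a script, it prints a burst of crash reports followed by `top-level supervisor exited with: :shutdown`. Each supervisor that exceeds its restart intensity terminates its children and exits with reason `:shutdown`; its parent restarts it, and when the parent's own limit is exceeded the failure climbs one more level.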
