Break Our Application Like a Server (Part I)

Learn what happens when an error occurs with your database.

Errors don’t always come from user-initiated actions; processes and tools on the server can fail too. Our application may experience network disconnections between servers, a slow or unavailable database, and processes that crash because of bugs or heavy load. It’s nearly impossible to anticipate everything that can go wrong in an application, so we often don’t discover a problem with our failure handling until it’s too late. Fortunately, we can simulate many of these issues locally and in staging environments before we experience them in production.

This lesson tests what happens to our application during database downtime and when different processes crash on the server. We’ll use the observer tool that ships with Erlang/OTP to view our application’s supervision tree, and we’ll kill various processes to make sure our application doesn’t end up in an incorrect state. A good rule is that we should be able to kill any custom GenServer, any custom Supervisor, or our Ecto Repo without crashing the application. We’ll perform manual acceptance tests throughout this section, but unlike typical acceptance tests, ours will do things that a normal user never could.
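
The property we’re checking, that a supervised process can be killed and comes back, can be sketched outside our application. The script below is a minimal standalone example (run it with elixir kill_check.exs). Demo.Worker and Demo.KillCheck are hypothetical names, not part of the course’s codebase: Demo.Worker stands in for one of our own GenServers. In the real application, calling :observer.start() from an iex -S mix phx.server session lets us browse the actual supervision tree interactively instead.

```elixir
defmodule Demo.Worker do
  # Hypothetical stand-in for one of our custom GenServers.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule Demo.KillCheck do
  # Kills the process registered under `name` with an untrappable :kill
  # signal, then polls until a *different* pid is registered under the same
  # name (i.e. its supervisor restarted it). Returns {:ok, new_pid} or :timeout.
  # Assumes `name` is currently registered.
  def kill_and_wait(name, timeout_ms \\ 1_000) do
    old_pid = Process.whereis(name)
    Process.exit(old_pid, :kill)
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    wait(name, old_pid, deadline)
  end

  defp wait(name, old_pid, deadline) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        {:ok, pid}

      _ ->
        if System.monotonic_time(:millisecond) > deadline do
          :timeout
        else
          Process.sleep(10)
          wait(name, old_pid, deadline)
        end
    end
  end
end

# Supervise the worker; :one_for_one restarts only the child that died.
{:ok, _sup} = Supervisor.start_link([Demo.Worker], strategy: :one_for_one)

old_pid = Process.whereis(Demo.Worker)
{:ok, new_pid} = Demo.KillCheck.kill_and_wait(Demo.Worker)
IO.puts("restarted: #{inspect(old_pid)} -> #{inspect(new_pid)}")
```

Polling with Process.whereis/1 avoids assuming when the supervisor finishes the restart; the same check applies to any named process in our tree, as long as its restart doesn’t also take down its parent.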

Simulate database downtime

A database outage is a serious issue. An application’s database is often its source of truth, so any operation that requires strong consistency should fail during an outage. Operations that don’t perform updates or require strong consistency may still work. This type of test goes beyond a normal QA process, but it’s useful when testing flows that involve money or other vital resources. It’s good to know how our application will respond when the database disconnects, even though, hopefully, that won’t happen very often.

Define the test

A shopper is connected to the store, waiting for a shoe to release, when the application’s database restarts. The shopper should be able to reload the page without error but should not see a shoe release during the outage. From an application admin’s perspective, the application should refuse to release a sneaker while the database is down.

The server should keep serving pages during the outage, but it will not work if it is restarted while the database is down.

Write steps for the test

  1. Start the server in a freshly seeded state.
  2. Load the webpage.
  3. Stop your database to simulate a downtime event.
  4. Refresh the webpage several times.
  5. Attempt to release the sneaker with ID 1.
  6. Start your database.
  7. Release the sneaker with ID 1 while viewing the page.

Write expectations for the test

  • The shopper sees “coming soon” after step 2.
  • The shopper can refresh the page without issue at step 4.
  • The release process should fail at step 5.
  • The release process should succeed at step 7.
  • The shopper sees the released shoe’s selector after step 7.

To perform this test, we need a way to stop our database locally. We’ll use service postgresql stop, and service postgresql start to bring it back in step 6. Depending on your operating system and how you installed Postgres, you may need a different command (and possibly sudo). Let’s run through our test now.
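
Before refreshing the page in step 4, and again after step 6, it helps to confirm that the database really is down or back up. One way is PostgreSQL’s own pg_isready utility. The sketch below wraps it in a small Elixir helper; the DbCheck module name and the localhost:5432 defaults are assumptions for illustration, so adjust them to match your local setup.

```elixir
defmodule DbCheck do
  # Hypothetical helper: asks pg_isready (shipped with PostgreSQL) whether the
  # server at host:port accepts connections. pg_isready exit codes:
  # 0 = accepting, 1 = rejecting (e.g. starting up), 2 = no response,
  # 3 = no attempt made (e.g. invalid parameters).
  def status(host \\ "localhost", port \\ 5432) do
    case System.find_executable("pg_isready") do
      nil ->
        :pg_isready_not_found

      exe ->
        {_output, code} =
          System.cmd(exe, ["-h", host, "-p", Integer.to_string(port)],
            stderr_to_stdout: true
          )

        case code do
          0 -> :accepting
          1 -> :rejecting
          2 -> :no_response
          _ -> :no_attempt
        end
    end
  end
end

IO.inspect(DbCheck.status(), label: "postgres")
```

After step 3 we’d expect :no_response, and after step 6 :accepting, before moving on to step 7.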
