The Data Problem

Learn how to work with data and the importance of refactoring code.

Fast tests

We still want the vast majority of our JUnit tests to be fast. This shouldn’t be a problem. If we isolate all of our persistence interaction to one place in the system, we end up with a reasonably small amount of code that must be integration-tested.
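As a minimal sketch of what isolating persistence to one place might look like, the code below puts all data access behind a single interface. Every name here (`Question`, `QuestionRepository`, `InMemoryQuestionRepository`, `QuestionCatalog`) is hypothetical and chosen only for illustration. The in-memory class is a simple test double for fast unit tests of the surrounding logic. It is not a stand-in for the production database, and the real database-backed implementation would still need its own integration tests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain type, used only for illustration.
final class Question {
    private final int id;
    private final String text;

    Question(int id, String text) {
        this.id = id;
        this.text = text;
    }

    int id() { return id; }
    String text() { return text; }
}

// The single seam through which the rest of the system touches persistence.
// Only the production implementation of this interface needs integration tests.
interface QuestionRepository {
    void save(Question question);
    List<Question> findAll();
}

// Test double that keeps data in a map, so unit tests never hit a database.
final class InMemoryQuestionRepository implements QuestionRepository {
    private final Map<Integer, Question> questionsById = new LinkedHashMap<>();

    @Override
    public void save(Question question) {
        questionsById.put(question.id(), question);
    }

    @Override
    public List<Question> findAll() {
        return new ArrayList<>(questionsById.values());
    }
}

// Example of business logic that depends only on the interface.
final class QuestionCatalog {
    private final QuestionRepository repository;

    QuestionCatalog(QuestionRepository repository) {
        this.repository = repository;
    }

    List<String> questionTexts() {
        List<String> texts = new ArrayList<>();
        for (Question question : repository.findAll()) {
            texts.add(question.text());
        }
        return texts;
    }
}
```

With this shape, tests for `QuestionCatalog` run entirely in memory, and the slower integration tests are limited to whichever class implements `QuestionRepository` against the real database.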

In-memory database

We might be tempted to use an in-memory database such as H2 to emulate our production database for testing purposes. This would give us the speed we want, but it can otherwise be a mess. Our attempts to use in-memory databases were fraught with problems caused by sometimes-subtle differences between the in-memory database and the production RDBMS.

When we write integration tests for code that interacts with the real database, the data in the database and how it gets there become important considerations. To verify that the database operations return query results as expected, for example, we need to either put appropriate data into the database or assume it’s already there.

Breaking tests

Assuming that data is already in the database is a recipe for long-term pain. Over time, the data will change without our knowledge and break tests. Divorcing the data from the test code also makes it a lot harder to understand why a particular test passes or fails: once everything is dumped into the database, the meaning of the data with respect to the tests is lost.

It’s preferable to let the tests create and manage the data.

Your own database

If it’s our database on our own machine, the simplest route might be for each test to start with a clean database (or one prepopulated with necessary reference data). Each test then becomes responsible for adding and working with its own data. This minimizes intertest dependency issues, where one test breaks because of data that another test left lying around, which can be a headache to debug!
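The clean-start approach can be sketched with plain JDBC, assuming a database we own and can safely empty. `DatabaseCleaner` and the table names passed to it are hypothetical. The sketch assumes that deleting every row is acceptable and that the caller lists tables in an order that respects foreign keys (child tables before the tables they reference).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper: empties a fixed list of tables so each test starts clean.
final class DatabaseCleaner {
    // Tables listed child-first so foreign-key constraints are not violated.
    private final List<String> tablesInDeleteOrder;

    DatabaseCleaner(List<String> tablesInDeleteOrder) {
        this.tablesInDeleteOrder = List.copyOf(tablesInDeleteOrder);
    }

    // Deletes all rows from each configured table, in order.
    // Table names come from test code, never from user input.
    void clean(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String table : tablesInDeleteOrder) {
                statement.executeUpdate("DELETE FROM " + table);
            }
        }
    }
}
```

A test class might call `clean` from its JUnit setup method (`@Before` in JUnit 4, `@BeforeEach` in JUnit 5) and then insert only the rows that the individual test needs, which keeps the data and its meaning visible in the test itself.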

Shared database
