Unit testing Spark

Unit testing in Spark does not stray far from the way we test individual units of source code in any other application. The goal remains the same: the parts, or units, of our application should be tested separately and, when possible, in isolation to ensure that they behave as expected and meet their intended design. A welcome by-product of unit testing is that it lets developers detect potential bugs early in the implementation process, which makes the code more robust in the long term.

In Spark, as in most applications, the procedure stays largely the same as long as the objects that collaborate with a given unit (usually a class) and the resources it relies on (such as a database) are faked, mocked, or stubbed:

  • Test the different possible execution flows. For example, if there is an if block followed by an else, make sure that the tests exercise both paths and assert that each one works as expected.

  • Test edge cases to make sure the code handles tricky or erroneous scenarios correctly.

  • To test in an isolated environment, remove the dependencies on collaborator objects (such as objects that communicate with other systems, databases, and so on) by mocking their behavior, so that different responses can be simulated according to the needs of each test case.
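The three points above can be sketched in plain Java. This is only an illustration under stated assumptions: ThresholdRepository and ValueClassifier are hypothetical names that do not come from the application template, and the collaborator is replaced by a hand-written stub (a lambda) so that the sketch runs without Spark, JUnit, or Mockito on the classpath.

```java
class IsolatedUnitSketch {

    /** Hypothetical collaborator: in a real job this might query a database. */
    interface ThresholdRepository {
        int loadThreshold();
    }

    /** Hypothetical unit under test: labels a value against the stored threshold. */
    static final class ValueClassifier {
        private final ThresholdRepository repository;

        ValueClassifier(ThresholdRepository repository) {
            this.repository = repository;
        }

        String classify(Integer value) {
            // Edge case: an erroneous input is rejected explicitly.
            if (value == null) {
                throw new IllegalArgumentException("value must not be null");
            }
            // Two execution flows: both the if and the else path need a test.
            if (value >= repository.loadThreshold()) {
                return "HIGH";
            } else {
                return "LOW";
            }
        }
    }

    /** Tiny assertion helper so the sketch does not depend on a test framework. */
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Stub the collaborator: no database is touched, the response is fixed.
        ValueClassifier classifier = new ValueClassifier(() -> 10);

        // Execution flows: one check per branch.
        check("HIGH".equals(classifier.classify(11)), "if branch");
        check("LOW".equals(classifier.classify(9)), "else branch");

        // Edge cases: the boundary value and an erroneous input.
        check("HIGH".equals(classifier.classify(10)), "boundary value");
        boolean rejected = false;
        try {
            classifier.classify(null);
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "null input should be rejected");

        // Simulate a different collaborator response by swapping the stub.
        ValueClassifier strict = new ValueClassifier(() -> 100);
        check("LOW".equals(strict.classify(11)), "same input, different stubbed response");

        System.out.println("all checks passed");
    }
}
```

In a JUnit 5 test, each check would typically become its own test method with an assertion such as assertEquals, and the stub could instead be created with Mockito's mock(...) and when(...).thenReturn(...).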

Batch application unit test example

Let’s make these concepts concrete by looking at a unit test written in the Spark batch application template. We won’t rely much on Spring Boot testing while developing unit tests, and for a good reason: Spring can take quite some time, and do far more than a unit test needs, while initializing before the tests run.

The testing framework our tests are written with is JUnit 5, which runs very quickly (as unit tests should, ideally within milliseconds) and is by now a de facto standard for Java testing. We do, however, make use of the Mockito framework, which ships with the Spring Boot testing dependency.

All the testing dependencies are declared in the Spark batch application’s pom.xml, starting at the following dependency and continuing below it:
