Modeling Gherkin
Learn how to write a Gherkin parser using Gherkin keywords.
Applied techniques: Writing a Gherkin parser
Gherkin is an indentation-based language that allows developers to write software tests in a way that reads like a natural language, such as English or French. We will not explain how to use Gherkin to run tests; instead, we will explore the structure of the language and write a parser for it in PHP. Although we do not want to discuss in depth how tests written in Gherkin are eventually used, we need to look at quite a few examples of the language to get a sense of what we are dealing with before we start writing our parser. Let’s take a look at the following code example from the Gherkin reference:
Feature: Guess the word

  # The first example has two steps
  Scenario: Maker starts a game
    When the Maker starts a game
    Then the Maker waits for a Breaker to join

  # The second example has three steps
  Scenario: Breaker joins a game
    Given the Maker has started a game with the word "silky"
    When the Breaker joins the Maker's game
    Then the Breaker must guess a word with 5 characters
We can already see a few notable things in the code snippet above before we get into our parser implementation. The first thing to note is that a Gherkin file always begins with a Feature block, which can contain multiple children. The snippet also contains two blocks of another type: Scenario.
Gherkin uses a scenario to express a testable behavior within the feature test. Every Gherkin scenario belongs to a feature, and a scenario can have many steps.
Gherkin’s parent-child relationships are indicated by the indentation of the file. In the code snippet above, we have the following program structure:
- Feature
  - Scenario
    - Step
    - Step
  - Scenario
    - Step
    - Step
    - Step
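The structure above can be sketched as a small set of PHP classes. This is only an illustration of the parent-child relationships, not the parser built in this lesson: the class names `Feature`, `Scenario`, and `Step` and their properties are assumptions chosen for this example, and the constructor syntax assumes PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

// Hypothetical node classes; the names are illustrative only.
final class Step
{
    public function __construct(public string $text) {}
}

final class Scenario
{
    /** @var Step[] */
    public array $steps = [];

    public function __construct(public string $name) {}
}

final class Feature
{
    /** @var Scenario[] */
    public array $scenarios = [];

    public function __construct(public string $name) {}
}

// Build the tree for the example snippet by hand.
$feature = new Feature('Guess the word');

$first = new Scenario('Maker starts a game');
$first->steps[] = new Step('When the Maker starts a game');
$first->steps[] = new Step('Then the Maker waits for a Breaker to join');

$second = new Scenario('Breaker joins a game');
$second->steps[] = new Step('Given the Maker has started a game with the word "silky"');
$second->steps[] = new Step("When the Breaker joins the Maker's game");
$second->steps[] = new Step('Then the Breaker must guess a word with 5 characters');

$feature->scenarios = [$first, $second];
```

A parser would eventually produce a tree like this from the source text instead of building it by hand.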
The empty lines on lines 2 and 7 can largely be ignored for our purposes. Lines 3 and 8 contain an interesting construct we need to consider: the Gherkin comment.
The Gherkin comment
Gherkin comments can appear on any line, with any amount of leading whitespace, but always start with the # character. These few facts will make comments relatively painless to parse later.
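As a minimal sketch of why these rules make comments easy to detect, the check below strips the leading whitespace and looks at the first remaining character. The function name `isCommentLine` is a hypothetical helper for this example, not part of any Gherkin library.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: true when the first non-whitespace character is '#'.
function isCommentLine(string $line): bool
{
    // ltrim() with its default character list strips spaces, tabs, and newlines.
    $trimmed = ltrim($line);

    // Guard against empty lines before reading the first character.
    return $trimmed !== '' && $trimmed[0] === '#';
}

var_dump(isCommentLine('  # The first example has two steps')); // bool(true)
var_dump(isCommentLine('Scenario: Maker starts a game'));       // bool(false)
var_dump(isCommentLine(''));                                    // bool(false)
```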
We can update our mental program structure to:
- Feature
  - Comment
  - Scenario
    - Step
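To make this updated structure concrete, here is a rough sketch that labels each line of a Gherkin snippet with one of the node types listed above. The `classifyLine` function is hypothetical and deliberately incomplete: it recognizes only the keywords seen in the example (plus the `And` and `But` step keywords) and does not build the tree or inspect indentation.

```php
<?php

declare(strict_types=1);

// Hypothetical classifier; covers only the keywords used in this lesson's example.
function classifyLine(string $line): ?string
{
    $trimmed = trim($line);

    if ($trimmed === '') {
        return null; // empty lines are ignored
    }
    if ($trimmed[0] === '#') {
        return 'Comment';
    }
    if (strncmp($trimmed, 'Feature:', 8) === 0) {
        return 'Feature';
    }
    if (strncmp($trimmed, 'Scenario:', 9) === 0) {
        return 'Scenario';
    }
    if (preg_match('/^(Given|When|Then|And|But)\s/', $trimmed) === 1) {
        return 'Step';
    }

    return 'Unknown';
}

$source = <<<'GHERKIN'
Feature: Guess the word

  # The first example has two steps
  Scenario: Maker starts a game
    When the Maker starts a game
    Then the Maker waits for a Breaker to join
GHERKIN;

$types = [];
foreach (explode("\n", $source) as $line) {
    $type = classifyLine($line);
    if ($type !== null) {
        $types[] = $type;
    }
}

echo implode(', ', $types), PHP_EOL;
// Feature, Comment, Scenario, Step, Step
```

Classifying lines by their leading keyword keeps the sketch short; a real parser would also need to track nesting so that each Step is attached to its Scenario.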