Data Tests

Learn how to set up data quality tests to ensure the accuracy, reliability, and integrity of the data models.

What are data tests?

When transforming data, it’s crucial to make sure that the results are accurate and reliable. As the number of models grows, checking each of them manually becomes impractical. dbt addresses this with a feature called data tests.

A data test is a validation mechanism used to verify the expected behavior of a data model. Tests are configured in the project and are run with the dbt test command.

dbt provides two types of data tests:

  • Generic tests

  • Singular tests

Generic tests

A generic test is a reusable test that can be applied to different models. For example, checking that a column contains no null values is a generic test: the same check can be applied to several models, each time on a different column.

Configuring generic tests

Generic tests need to be set up in property files (such as schema.yml). This setup allows the user to apply predefined tests to their models and columns to ensure data quality and integrity.

Property files

A property file is a YAML file that is stored in the models directory and contains information about model properties.

version: 2
models:
  - name: good_orders
    description: A model with valid orders.
    columns:
      - name: order_id
        description: A unique id for the order
      - name: customer_id
        data_type: integer
  - name: bad_orders
    columns:
      - name: product_id
      - name: order_status

It’s possible to store all model properties in a single file, but it’s usually more convenient to split the configuration into different files.

A generic test is typically attached to a column of a particular model, under that column's tests key:

version: 2
models:
  - name: good_orders
    columns:
      - name: order_id
        tests:
          - unique
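
To illustrate the earlier point that one generic test can be reused across models, the sketch below attaches dbt's built-in not_null test to columns in both models from the property file example, alongside unique. Which columns should actually be non-null is an assumption made for illustration; adjust it to the real data.

```yaml
version: 2
models:
  - name: good_orders
    columns:
      - name: order_id
        tests:
          # Two built-in generic tests on the same column.
          - unique
          - not_null
      - name: customer_id
        tests:
          # Assumption: every valid order has a customer.
          - not_null
  - name: bad_orders
    columns:
      - name: product_id
        tests:
          # Same generic test, different model and column.
          - not_null
```

Each entry under tests produces a separate test, so this file would define four tests in total.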

Only tests that are defined in a property file can be run. It’s not possible to trigger a test execution directly from the CLI without first configuring it in code. ...