Snapshots

Learn how to take snapshots of the current data to monitor changes in your tables.

Mutable tables

Sometimes, rows in a table can be updated or deleted. In that case, it can be tricky to access information based on a previous state of that table.

Imagine we have a table of orders that looks like this:

Orders

order_id

order_status

last_updated_at

1

Payment Accepted

2023-01-01

2

Sent

2023-01-01

After some time, we take another look at our table and it now looks like this:

Orders

order_id

order_status

last_updated_at

1

Sent

2023-01-02

2

Sent

2023-01-01

3

Payment Accepted

2023-01-02

If we want to know how long it took for order_id “1” to be delivered, we lost that information. Luckily, dbt provides a feature called snapshots. Every time they are run, snapshots track and record the changes that happened since the last snapshot.

Understanding snapshots

In our case, if we run the snapshot on the first of January and on the second, we would have a snapshot like this:

Orders Snapshot

order_id

order_status

dbt_updated_at

dbt_valid_from

dbt_valid_to

dbt_scd_id

1

Payment Accepted

2023-01-01

2023-01-01

2023-01-02

260b3e575635a5515b8a3af019fa3573

1

Sent

2023-01-02

2023-01-02


8361e7812ccc2614533d85b3fe4e6ba1

2

Sent

2023-01-01

2023-01-02


e8e880f6f30f74f6743705c4c8c097b3

3

Payment Accepted

2023-01-02

2023-01-02


26d498a4661ec3f4a9571b2a03ce93b2

We have two rows for order_id “1” because it has been in two different states: Payment Accepted and Sent.

If we want to have an overview of our orders at a precise date, we can run the following query:

Press + to interact
SELECT *
FROM {{ ref("orders_snapshot") }}
WHERE ("2023-01-02" between dbt_valid_from and dbt_valid_to) or dbt_valid_to is null

The frequency at which we run our table is crucial for the quality of the snapshot. If a row is changed twice between two snapshot executions, dbt will only be able to record the second change.

Creating a snapshot

✅ It’s considered a best practice to create a snapshot from your source tables, and to select all columns. This will avoid missing data in ...