Snapshots
Learn how to take snapshots of the current data to monitor changes in your tables.
Mutable tables
Sometimes, rows in a table can be updated or deleted. In that case, it can be tricky to access information based on a previous state of that table.
Imagine we have a table of orders that looks like this:
Orders
|
|
|
1 | Payment Accepted | 2023-01-01 |
2 | Sent | 2023-01-01 |
After some time, we take another look at our table and it now looks like this:
Orders
|
|
|
1 | Sent | 2023-01-02 |
2 | Sent | 2023-01-01 |
3 | Payment Accepted | 2023-01-02 |
If we want to know how long it took for order_id
“1” to be delivered, we lost that information. Luckily, dbt provides a feature called snapshots. Every time they are run, snapshots track and record the changes that happened since the last snapshot.
Understanding snapshots
In our case, if we run the snapshot on the first of January and on the second, we would have a snapshot like this:
Orders Snapshot
|
|
|
|
|
|
1 | Payment Accepted | 2023-01-01 | 2023-01-01 | 2023-01-02 | 260b3e575635a5515b8a3af019fa3573 |
1 | Sent | 2023-01-02 | 2023-01-02 | 8361e7812ccc2614533d85b3fe4e6ba1 | |
2 | Sent | 2023-01-01 | 2023-01-02 | e8e880f6f30f74f6743705c4c8c097b3 | |
3 | Payment Accepted | 2023-01-02 | 2023-01-02 | 26d498a4661ec3f4a9571b2a03ce93b2 |
We have two rows for order_id
“1” because it has been in two different states: Payment Accepted
and Sent
.
If we want to have an overview of our orders at a precise date, we can run the following query:
SELECT *FROM {{ ref("orders_snapshot") }}WHERE ("2023-01-02" between dbt_valid_from and dbt_valid_to) or dbt_valid_to is null
The frequency at which we run our table is crucial for the quality of the snapshot. If a row is changed twice between two snapshot executions, dbt will only be able to record the second change.
Creating a snapshot
✅ It’s considered a best practice to create a snapshot from your source tables, and to select all columns. This will avoid missing data in ...