Incremental Model with the MERGE Strategy
Learn how to create an incremental model with the MERGE strategy.
Defining an incremental model
Tables come with one big issue: they don’t stay updated. To refresh the table, we need to run the dbt model. Every time we run a dbt model, it recreates the whole table. When dealing with large amounts of data, it can become very slow and ineffective.
Fortunately, there’s a third type of materialization that incrementally merges new rows with the existing table through an incremental strategy. This materialization requires that we enable billing on our BigQuery project (it can be done through the free trial).
To use incremental models with BigQuery, you’ll need to enable billing on your project.
You’ll then get 300€ of credits to use before being charged.
{{ config(materialized="incremental")}}SELECT *FROM {{ ref("orders") }}LEFT JOIN {{ ref("addresses") }}USING(address_id){% if is_incremental() %}WHERE order_id > (SELECT MAX(order_id) FROM {{this}}){% endif %}
We specify an if
statement to filter the model and select only the rows that we want to add. Here, we specify that we want to add only the rows that have an order_id
higher than what we already have in the database. Here’s what happens in the background when we run this model:
merge into `project-id`.`dataset-id`.`shipping_tags` as DBT_INTERNAL_DESTusing (SELECT *FROM `project-id`.`dataset-id`.`orders`LEFT JOIN `project-id`.`dataset-id`.`addresses`USING(address_id)WHERE order_id > (SELECT MAX(order_id) FROM `project-id`.`dataset-id`.`shipping_tags`)) as DBT_INTERNAL_SOURCEon FALSEwhen not matched then insert(`address_id`, `order_id`, `gift_message`, `address_first_line`, `address_second_line`, `zip_code`, `city`, `country`)values(`address_id`, `order_id`, `gift_message`, `address_first_line`, `address_second_line`, `zip_code`, `city`, `country`)
The MERGE
statement in BigQuery
In BigQuery, a MERGE
statement is used to update a table based on the content of another table. First, it starts by declaring a target and a ...