Local Stack with DuckDB and dbt

Get familiar with dbt and data transformation pipelines.

A typical corporate analytics environment is built around central (often distributed) storage and a query engine optimized for analytics. Other components, such as ETL, business intelligence, and entity resolution, must integrate with this core to keep the data stack efficient.

The distributed nature of technologies like Snowflake, Databricks, and BigQuery is abstracted away, so it feels like there is a single place to store and query data. Let’s make this idea concrete by replicating this kind of stack with open-source tools on a single machine.

Configuring our data stack

The following image illustrates a technology stack that could consist of several proprietary (and costly) components. Here, we will replicate the basic functionality with open-source tools and refer to the different components by their colors in the image:

Example data stack, consisting of a local DuckDB instance for storage

Here is a quick overview of our open-source data stack:

  • Storage and compute: The heart of our stack is the DuckDB database with its powerful SQL data transformation engine. We also use a Python environment for data transformations that are painful to write in SQL.

  • Extract and load: A simple Python script ingests raw data ...