Jan 12, 2026

From Zero to Hero: dbt with Ibis Framework

A complete guide from dbt fundamentals to Python-powered transformations

Introduction

After completing the Complete dbt Data Build Tool Bootcamp, I realized dbt brings software engineering best practices to data transformation. But what if you could combine dbt’s structure with Python’s flexibility? That’s where dbt-ibis comes in.

This guide will take you from zero to hero – starting with dbt fundamentals, then showing you how to supercharge your transformations with Ibis framework. We’ll use DuckDB as our data warehouse.

Part 1: Understanding dbt Basics

What is dbt?

dbt (data build tool) is an open-source command-line tool that enables analysts and engineers to:

Transform data using SQL (or Python with Ibis)
Manage data models with version control
Test data quality automatically
Document your data pipeline
Deploy changes safely

Think of dbt as “software engineering practices for data” – it brings Git workflows, testing, and modularity to your data transformations.

dbt Core Concepts

1. Models

Models are SQL or Python files that transform raw data into useful tables/views.

SQL Model (models/customer_orders.sql):

SELECT
    customer_id,
    COUNT(*) as order_count
FROM {{ ref('orders') }}
GROUP BY customer_id

2. Sources

Define where your raw data lives (Snowflake, BigQuery, DuckDB, etc.).

models/sources.yml:

version: 2

sources:
  - name: raw_data
    schema: main
    tables:
      - name: orders
      - name: customers

3. Seeds

CSV files stored in your project that dbt loads as tables. Perfect for small reference data.

seeds/customer_segments.csv:

customer_id,segment_name,segment_priority
1,Enterprise,High
2,SMB,Medium
3,Startup,Low

Load seeds with:

dbt seed

Reference in models:

SELECT
    c.*,
    s.segment_name
FROM {{ ref('customers') }} c
LEFT JOIN {{ ref('customer_segments') }} s ON c.customer_id = s.customer_id

4. Snapshots

Track changes to your data over time (Type 2 Slowly Changing Dimensions).

snapshots/customer_snapshot.sql:

{% snapshot customer_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

SELECT * FROM {{ source('raw_data', 'customers') }}

{% endsnapshot %}

5. Refs and Sources

{{ ref(‘model_name’) }} – Reference another model
{{ source(‘source_name’, ‘table_name’) }} – Reference raw data
{{ ref(‘seed_name’) }} – Reference a seed

6. Tests

Ensure data quality with built-in and custom tests.

models/schema.yml:

version: 2

models:
  - name: customer_orders
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

dbt Project Structure

my_dbt_project/
├── dbt_project.yml          # Project configuration
├── profiles.yml             # Database connections (usually in ~/.dbt/)
├── models/
│   ├── sources.yml          # Raw data definitions
│   ├── schema.yml           # Tests and docs
│   ├── staging/             # Cleaned raw data
│   │   ├── stg_orders.sql
│   │   └── stg_customers.sql
│   └── marts/               # Business logic
│       └── customer_metrics.sql
├── seeds/                   # CSV files
├── snapshots/               # Historical tracking
└── tests/                   # Custom tests

Essential dbt Commands

# Setup and validation
dbt debug                    # Test connection
dbt deps                     # Install dependencies

# Development
dbt run                      # Run all models
dbt run --select stg_orders  # Run specific model
dbt run --select +customer_orders  # Run model + upstream

# Testing
dbt test                     # Run all tests
dbt test --select customer_orders  # Test specific model

# Documentation
dbt docs generate            # Generate docs
dbt docs serve               # View docs locally

# Other
dbt seed                     # Load CSV files
dbt snapshot                 # Capture historical data
dbt compile                  # Compile without running

Your First dbt Project

Step 1: Setup

# Install dbt with DuckDB adapter
pip install dbt-core dbt-duckdb

# Initialize project
dbt init my_project

# Navigate to project
cd my_project

Step 2: Configure DuckDB

~/.dbt/profiles.yml:

my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "analytics.duckdb"

Step 3: Create Sources

models/sources.yml:

version: 2

sources:
  - name: jaffle_shop
    schema: main
    tables:
      - name: orders
      - name: customers

Step 4: Create Staging Models

models/staging/stg_orders.sql:

SELECT
    id as order_id,
    customer_id,
    order_date,
    status
FROM {{ source('jaffle_shop', 'orders') }}

Step 5: Create Business Logic

models/marts/customer_orders.sql:

SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    MIN(o.order_date) as first_order,
    MAX(o.order_date) as most_recent_order,
    COUNT(o.order_id) as number_of_orders
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name

Step 6: Run and Test

dbt run
dbt test
dbt docs generate && dbt docs serve

Part 2: Why Switch to Ibis?

Now that you understand dbt fundamentals, let’s see why Ibis is a game-changer.

The Problem with Pure SQL

SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    MIN(o.order_date) as first_order,
    MAX(o.order_date) as most_recent_order,
    COUNT(o.order_id) as number_of_orders,
    SUM(o.amount) as total_spent
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) >= 2
ORDER BY total_spent DESC

Issues with SQL:

No type checking
Hard to reuse logic
Difficult to unit test
Limited IDE autocomplete
Database-specific syntax variations
Hard to debug complex queries

The Ibis Solution

Ibis Model (models/marts/customer_orders.ibis):

from dbt_ibis import depends_on, ref

@depends_on(ref("stg_customers"), ref("stg_orders"))
def model(stg_customers, stg_orders):
    """Calculate customer order metrics with Python"""
    # Filter completed orders
    completed_orders = stg_orders.filter(stg_orders["status"] == "completed")

    # Aggregate by customer
    customer_orders = completed_orders.group_by("customer_id").aggregate(
        first_order=completed_orders["order_date"].min(),
        most_recent_order=completed_orders["order_date"].max(),
        number_of_orders=completed_orders.count(),
        total_spent=completed_orders["amount"].sum()
    )

    # Join with customers
    result = stg_customers.join(customer_orders, "customer_id", how="left")

    # Filter and order
    return result.filter(
        result["number_of_orders"] >= 2
    ).order_by(result["total_spent"].desc())

Benefits of Ibis:

Type safety – Catch errors before execution
Reusable functions – Build complex transformations
Unit testing – Test with pytest
IDE support – Autocomplete and refactoring
Backend agnostic – Works with DuckDB, Snowflake, BigQuery, etc.
Python ecosystem – Use any Python library for logic

Comparison: SQL vs Ibis

Feature	SQL	Ibis
Type Safety	None	Full Python typing
Testing	Limited	pytest integration
Reusability	Copy-paste or Jinja	Functions & classes
Debugging	Hard	Python debugger
IDE Support	Basic	Full autocomplete
Multi-backend	Different syntax	Same code

Part 3: Implementing dbt with Ibis

Installation

# Install dbt-ibis with DuckDB support
pip install dbt-ibis dbt-duckdb "ibis-framework[duckdb]"

# Verify installation
dbt --version
dbt-ibis --version

Project Structure with Ibis

my_dbt_project/
├── dbt_project.yml
├── models/
│   ├── sources.yml
│   ├── schema.yml
│   ├── staging/
│   │   ├── stg_orders.ibis        # Note: .ibis extension
│   │   ├── stg_customers.ibis
│   │   └── __ibis_sql/            # Auto-generated SQL (don't edit)
│   │       ├── stg_orders.sql
│   │       └── stg_customers.sql
│   └── marts/
│       ├── customer_orders.ibis
│       └── __ibis_sql/
│           └── customer_orders.sql
├── seeds/
│   └── customer_segments.csv
├── snapshots/
│   └── customer_snapshot.sql      # Snapshots remain in SQL
└── tests/
    └── test_customer_orders.ibis

Important: Ibis models use the .ibis file extension, NOT .py. The dbt-ibis CLI automatically compiles these to SQL files in __ibis_sql/ subfolders.

Configuration

~/.dbt/profiles.yml:

my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "analytics.duckdb"

dbt_project.yml:

name: "my_project"
version: "1.0.0"
config-version: 2

profile: "my_project"

model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table

Building Models with Ibis

1. Define Sources with Column Data Types

Critical: For non-Ibis models, sources, seeds, and snapshots, you must specify column data types. This is how dbt-ibis knows the schema.

models/sources.yml:

version: 2

sources:
  - name: jaffle_shop
    schema: main
    tables:
      - name: orders
        columns:
          - name: id
            data_type: integer
          - name: customer_id
            data_type: integer
          - name: order_date
            data_type: date
          - name: status
            data_type: varchar
          - name: amount
            data_type: decimal
      - name: customers
        columns:
          - name: id
            data_type: integer
          - name: first_name
            data_type: varchar
          - name: last_name
            data_type: varchar
          - name: email
            data_type: varchar

Tip: Use the dbt-codegen package to auto-generate these column definitions:

dbt run-operation generate_source --args '{"database_name": "main", "schema_name": "main", "generate_columns": true, "table_names": ["orders", "customers"]}'

2. Create Staging Models

models/staging/stg_orders.ibis:

from dbt_ibis import depends_on, source

@depends_on(source("jaffle_shop", "orders"))
def model(orders):
    """Clean and standardize orders data"""
    return orders.select(
        order_id=orders["id"],
        customer_id=orders["customer_id"],
        order_date=orders["order_date"],
        status=orders["status"],
        amount=orders["amount"]
    )

models/staging/stg_customers.ibis:

from dbt_ibis import depends_on, source

@depends_on(source("jaffle_shop", "customers"))
def model(customers):
    """Standardize customer data"""
    return customers.select(
        customer_id=customers["id"],
        first_name=customers["first_name"],
        last_name=customers["last_name"],
        email=customers["email"].lower()
    )

3. Create Business Logic Models

models/marts/customer_orders.ibis:

from dbt_ibis import depends_on, ref

@depends_on(ref("stg_customers"), ref("stg_orders"))
def model(stg_customers, stg_orders):
    """
    Calculate customer order metrics:
    - First order date
    - Most recent order
    - Total number of orders
    - Total spent
    """
    # Filter for completed orders only
    completed_orders = stg_orders.filter(stg_orders["status"] == "completed")

    # Aggregate by customer
    customer_orders = completed_orders.group_by("customer_id").aggregate(
        first_order=completed_orders["order_date"].min(),
        most_recent_order=completed_orders["order_date"].max(),
        number_of_orders=completed_orders.count(),
        total_spent=completed_orders["amount"].sum()
    )

    # Join with customer details
    result = stg_customers.join(customer_orders, "customer_id", how="left")

    # Select final columns
    return result.select(
        "customer_id",
        "first_name",
        "last_name",
        "first_order",
        "most_recent_order",
        "number_of_orders",
        "total_spent"
    )

Note: Parameter names in the model() function must match the names used in ref() and source(). For example, ref("stg_customers") means the parameter should be stg_customers.

Running Your Ibis Project

The key command is dbt-ibis instead of dbt:

# Run all models (compiles .ibis to .sql, then runs dbt)
dbt-ibis run

# Run specific model
dbt-ibis run --select customer_orders

# Run with upstream dependencies
dbt-ibis run --select +customer_orders

# Run tests
dbt-ibis test

# Generate documentation
dbt-ibis docs generate
dbt-ibis docs serve

# Load seeds (seeds are still CSV, no Ibis)
dbt-ibis seed

# Run snapshots (snapshots are still SQL)
dbt-ibis snapshot

Understanding the Compilation Process

# This single command:
dbt-ibis run

# Is equivalent to:
dbt-ibis precompile    # Convert .ibis files to .sql
dbt run                # Run the generated SQL with dbt

The precompile step generates SQL files in __ibis_sql/ subfolders. You can inspect these to debug your Ibis expressions.

Shell Alias (Optional)

To continue using dbt command instead of dbt-ibis, add this alias to your ~/.bashrc or ~/.zshrc:

alias dbt="dbt-ibis"

Part 4: Working with Seeds and Snapshots

Seeds with Ibis

Seeds remain CSV files, but you can reference them in Ibis models.

seeds/customer_segments.csv:

customer_id,segment_name,segment_priority
1,Enterprise,High
2,SMB,Medium
3,Startup,Low

seeds/schema.yml (required for Ibis to know the types):

version: 2

seeds:
  - name: customer_segments
    columns:
      - name: customer_id
        data_type: integer
      - name: segment_name
        data_type: varchar
      - name: segment_priority
        data_type: varchar
        tests:
          - accepted_values:
              values: ['High', 'Medium', 'Low']

models/marts/enriched_orders.ibis:

from dbt_ibis import depends_on, ref

@depends_on(ref("stg_orders"), ref("customer_segments"))
def model(stg_orders, customer_segments):
    """Enrich orders with customer segments"""
    return stg_orders.join(
        customer_segments,
        "customer_id",
        how="left"
    ).select(
        "order_id",
        "customer_id",
        "order_date",
        "amount",
        "segment_name",
        "segment_priority"
    )

Snapshots

Snapshots currently remain in SQL (dbt-ibis limitation). You can still reference them in Ibis models.

snapshots/customer_snapshot.sql:

{% snapshot customer_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

SELECT * FROM {{ source('jaffle_shop', 'customers') }}

{% endsnapshot %}

models/schema.yml (define snapshot columns for Ibis):

version: 2

snapshots:
  - name: customer_snapshot
    columns:
      - name: customer_id
        data_type: integer
      - name: first_name
        data_type: varchar
      - name: last_name
        data_type: varchar
      - name: email
        data_type: varchar
      - name: dbt_valid_from
        data_type: timestamp
      - name: dbt_valid_to
        data_type: timestamp

models/marts/customer_history.ibis:

from dbt_ibis import depends_on, ref

@depends_on(ref("customer_snapshot"))
def model(customer_snapshot):
    """Analyze customer changes over time"""
    # Get current customers (dbt_valid_to is null)
    current = customer_snapshot.filter(
        customer_snapshot["dbt_valid_to"].isnull()
    )

    return current.select(
        "customer_id",
        "first_name",
        "last_name",
        "dbt_valid_from"
    )

Part 5: Writing Tests with Ibis

Singular Tests

Create .ibis files in your tests/ folder with a test function (not model).

tests/test_no_duplicate_names.ibis:

from dbt_ibis import depends_on, ref

@depends_on(ref("stg_customers"))
def test(stg_customers):
    """Test that no customer has first_name equal to last_name"""
    return stg_customers.filter(
        stg_customers["first_name"] == stg_customers["last_name"]
    )

The test passes if the query returns zero rows.

Unit Testing with pytest

You can also write unit tests outside of dbt using pytest and Ibis memtables:

tests/unit/test_transformations.py:

import pytest
import ibis

def test_customer_orders_aggregation():
    """Test that customer orders aggregation works correctly"""

    # Create test data using memtables
    test_orders = ibis.memtable({
        "order_id": [1, 2, 3, 4],
        "customer_id": [1, 1, 2, 3],
        "order_date": ["2024-01-01", "2024-01-15", "2024-01-20", "2024-02-01"],
        "status": ["completed", "completed", "completed", "pending"],
        "amount": [100.0, 150.0, 200.0, 50.0]
    })

    # Apply the same logic as your model
    completed_orders = test_orders.filter(test_orders["status"] == "completed")

    result = completed_orders.group_by("customer_id").aggregate(
        order_count=completed_orders.count(),
        total_spent=completed_orders["amount"].sum()
    ).execute()

    # Assert expectations
    assert len(result) == 2  # Only customers 1 and 2 have completed orders

    customer_1 = result[result["customer_id"] == 1].iloc[0]
    assert customer_1["order_count"] == 2
    assert customer_1["total_spent"] == 250.0

Run with:

pytest tests/unit/

Part 6: Advanced Ibis Patterns

Complex Aggregations

from dbt_ibis import depends_on, ref

@depends_on(ref("stg_orders"))
def model(stg_orders):
    """Customer lifetime value analysis"""
    return stg_orders.group_by("customer_id").aggregate(
        first_order=stg_orders["order_date"].min(),
        last_order=stg_orders["order_date"].max(),
        total_spent=stg_orders["amount"].sum(),
        avg_order_value=stg_orders["amount"].mean(),
        order_count=stg_orders.count()
    )

Window Functions

from dbt_ibis import depends_on, ref

@depends_on(ref("stg_orders"))
def model(stg_orders):
    """Rank orders by customer"""
    return stg_orders.mutate(
        order_rank=stg_orders["order_id"].rank().over(
            group_by="customer_id",
            order_by="order_date"
        ),
        running_total=stg_orders["amount"].sum().over(
            group_by="customer_id",
            order_by="order_date"
        )
    )

Conditional Logic with Case Statements

from dbt_ibis import depends_on, ref
import ibis

@depends_on(ref("stg_orders"))
def model(stg_orders):
    """Categorize order values"""
    return stg_orders.mutate(
        order_category=ibis.case()
            .when(stg_orders["amount"] Create reusable transformation functions:

```python
# models/utils/transformations.py (regular Python file)
def add_fiscal_quarter(table, date_column):
    """Add fiscal quarter based on date column"""
    import ibis
    month = table[date_column].month()
    return table.mutate(
        fiscal_quarter=ibis.case()
            .when(month.isin([1, 2, 3]), "Q1")
            .when(month.isin([4, 5, 6]), "Q2")
            .when(month.isin([7, 8, 9]), "Q3")
            .else_("Q4")
            .end()
    )

from dbt_ibis import depends_on, ref
from utils.transformations import add_fiscal_quarter

@depends_on(ref("stg_orders"))
def model(stg_orders):
    """Add fiscal quarter to orders"""
    return add_fiscal_quarter(stg_orders, "order_date")

Part 7: Configuration Tips

Column Name Casing (Snowflake, etc.)

For databases like Snowflake that store identifiers in uppercase:

dbt_project.yml:

vars:
  # Format: dbt_ibis_letter_case_in_db_{profile}_{target}
  dbt_ibis_letter_case_in_db_my_project_prod: upper
  dbt_ibis_letter_case_in_expr: lower

This lets you write lowercase column names in your Ibis code while Snowflake stores them in uppercase.

VS Code Configuration

Add to your VS Code settings.json:

{
    "files.associations": {
        "*.ibis": "python"
    }
}

Part 8: Limitations and Workarounds

Current dbt-ibis Limitations

No database connection in models – Ibis models generate SQL; they cannot query the database directly
Column types required – Non-Ibis sources, seeds, snapshots, and SQL models need explicit column type definitions
Snapshots remain SQL – Currently no Ibis support for snapshot definitions

Workarounds

For complex queries requiring database connection: Use a SQL model for that specific transformation, then reference it in your Ibis models.

For auto-generating column types:

# Install dbt-codegen
dbt deps  # After adding to packages.yml

# Generate source definitions
dbt run-operation generate_source \
  --args '{"database_name": "main", "schema_name": "main", "generate_columns": true, "table_names": ["orders"]}'

# Generate model column definitions
dbt run-operation generate_model_yaml \
  --args '{"model_names": ["stg_orders"]}'

Part 9: CI/CD Integration

GitHub Actions Example

.github/workflows/dbt.yml:

name: dbt CI

on: [push, pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install dbt-ibis dbt-duckdb "ibis-framework[duckdb]"
          pip install pytest

      - name: Verify Ibis compilation is up to date
        run: |
          dbt-ibis precompile
          # Check if any files changed
          if [[ -n $(git status --porcelain) ]]; then
            echo "Error: dbt-ibis precompile generated changes. Please commit them."
            git status
            exit 1
          fi

      - name: Run dbt
        run: |
          dbt-ibis run
          dbt-ibis test

      - name: Run pytest
        run: pytest tests/unit/

Pre-commit Hook

.pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: dbt-ibis-precompile
        name: dbt-ibis precompile
        entry: dbt-ibis precompile
        language: system
        pass_filenames: false
        files: '\.ibis$'

Part 10: Switching Backends

One of Ibis’s greatest strengths is backend portability. Your models work unchanged across databases.

Development with DuckDB, Production with Snowflake

~/.dbt/profiles.yml:

my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: "analytics.duckdb"

    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: analyst
      database: analytics
      warehouse: compute_wh
      schema: dbt_analytics

Run against different targets:

# Development
dbt-ibis run

# Production
dbt-ibis run --target prod

Your Ibis models remain unchanged – they automatically compile to the appropriate SQL dialect!

Conclusion

This guide took you from zero to hero – starting with dbt fundamentals and ending with Python-powered transformations using Ibis.

What You Now Know

dbt fundamentals: Models, sources, refs, tests, seeds, snapshots
dbt commands: run, test, docs, seed, snapshot
dbt-ibis syntax: .ibis files, @depends_on, ref(), source()
Column type requirements: Why and how to specify them
Testing: Singular tests and pytest integration
Advanced patterns: Window functions, aggregations, reusable code
CI/CD: GitHub Actions integration
Multi-backend: Same code for DuckDB and Snowflake

Quick Reference

Task	Command
Run all models	dbt-ibis run
Run specific model	dbt-ibis run —select model_name
Run tests	dbt-ibis test
Load seeds	dbt-ibis seed
Run snapshots	dbt-ibis snapshot
Generate docs	dbt-ibis docs generate
Precompile only	dbt-ibis precompile

Supported Backends

dbt-ibis supports: DuckDB, Snowflake, BigQuery, Postgres, Redshift, RisingWave, Databricks, Trino, MySQL, SQLite, Oracle

Resources

dbt-ibis Documentation
dbt-ibis GitHub
Ibis Project
dbt Documentation
dbt-ibis Blog Post

Ready to transform your data pipeline? Start with the examples above, experiment locally with DuckDB, and watch your productivity soar.