From Zero to Hero: dbt with Ibis Framework
A complete guide from dbt fundamentals to Python-powered transformations
Introduction
After completing the Complete dbt Data Build Tool Bootcamp, I realized dbt brings software engineering best practices to data transformation. But what if you could combine dbt’s structure with Python’s flexibility? That’s where dbt-ibis comes in.
This guide will take you from zero to hero – starting with dbt fundamentals, then showing you how to supercharge your transformations with Ibis framework. We’ll use DuckDB as our data warehouse.
Part 1: Understanding dbt Basics
What is dbt?
dbt (data build tool) is an open-source command-line tool that enables analysts and engineers to:
-
Transform data using SQL (or Python with Ibis)
-
Manage data models with version control
-
Test data quality automatically
-
Document your data pipeline
-
Deploy changes safely
Think of dbt as “software engineering practices for data” – it brings Git workflows, testing, and modularity to your data transformations.
dbt Core Concepts
1. Models
Models are SQL or Python files that transform raw data into useful tables/views.
SQL Model (models/customer_orders.sql):
SELECT customer_id, COUNT(*) as order_countFROM {{ ref('orders') }}GROUP BY customer_id2. Sources
Define where your raw data lives (Snowflake, BigQuery, DuckDB, etc.).
models/sources.yml:
version: 2
sources: - name: raw_data schema: main tables: - name: orders - name: customers3. Seeds
CSV files stored in your project that dbt loads as tables. Perfect for small reference data.
seeds/customer_segments.csv:
customer_id,segment_name,segment_priority1,Enterprise,High2,SMB,Medium3,Startup,LowLoad seeds with:
dbt seedReference in models:
SELECT c.*, s.segment_nameFROM {{ ref('customers') }} cLEFT JOIN {{ ref('customer_segments') }} s ON c.customer_id = s.customer_id4. Snapshots
Track changes to your data over time (Type 2 Slowly Changing Dimensions).
snapshots/customer_snapshot.sql:
{% snapshot customer_snapshot %}
{{ config( target_schema='snapshots', unique_key='customer_id', strategy='timestamp', updated_at='updated_at' )}}
SELECT * FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}5. Refs and Sources
-
{{ ref(‘model_name’) }} – Reference another model
-
{{ source(‘source_name’, ‘table_name’) }} – Reference raw data
-
{{ ref(‘seed_name’) }} – Reference a seed
6. Tests
Ensure data quality with built-in and custom tests.
models/schema.yml:
version: 2
models: - name: customer_orders columns: - name: customer_id tests: - unique - not_nulldbt Project Structure
my_dbt_project/├── dbt_project.yml # Project configuration├── profiles.yml # Database connections (usually in ~/.dbt/)├── models/│ ├── sources.yml # Raw data definitions│ ├── schema.yml # Tests and docs│ ├── staging/ # Cleaned raw data│ │ ├── stg_orders.sql│ │ └── stg_customers.sql│ └── marts/ # Business logic│ └── customer_metrics.sql├── seeds/ # CSV files├── snapshots/ # Historical tracking└── tests/ # Custom testsEssential dbt Commands
# Setup and validationdbt debug # Test connectiondbt deps # Install dependencies
# Developmentdbt run # Run all modelsdbt run --select stg_orders # Run specific modeldbt run --select +customer_orders # Run model + upstream
# Testingdbt test # Run all testsdbt test --select customer_orders # Test specific model
# Documentationdbt docs generate # Generate docsdbt docs serve # View docs locally
# Otherdbt seed # Load CSV filesdbt snapshot # Capture historical datadbt compile # Compile without runningYour First dbt Project
Step 1: Setup
# Install dbt with DuckDB adapterpip install dbt-core dbt-duckdb
# Initialize projectdbt init my_project
# Navigate to projectcd my_projectStep 2: Configure DuckDB
~/.dbt/profiles.yml:
my_project: target: dev outputs: dev: type: duckdb path: "analytics.duckdb"Step 3: Create Sources
models/sources.yml:
version: 2
sources: - name: jaffle_shop schema: main tables: - name: orders - name: customersStep 4: Create Staging Models
models/staging/stg_orders.sql:
SELECT id as order_id, customer_id, order_date, statusFROM {{ source('jaffle_shop', 'orders') }}Step 5: Create Business Logic
models/marts/customer_orders.sql:
SELECT c.customer_id, c.first_name, c.last_name, MIN(o.order_date) as first_order, MAX(o.order_date) as most_recent_order, COUNT(o.order_id) as number_of_ordersFROM {{ ref('stg_customers') }} cLEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_idGROUP BY c.customer_id, c.first_name, c.last_nameStep 6: Run and Test
dbt rundbt testdbt docs generate && dbt docs servePart 2: Why Switch to Ibis?
Now that you understand dbt fundamentals, let’s see why Ibis is a game-changer.
The Problem with Pure SQL
SELECT c.customer_id, c.first_name, c.last_name, MIN(o.order_date) as first_order, MAX(o.order_date) as most_recent_order, COUNT(o.order_id) as number_of_orders, SUM(o.amount) as total_spentFROM {{ ref('stg_customers') }} cLEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_idWHERE o.status = 'completed'GROUP BY c.customer_id, c.first_name, c.last_nameHAVING COUNT(o.order_id) >= 2ORDER BY total_spent DESCIssues with SQL:
-
No type checking
-
Hard to reuse logic
-
Difficult to unit test
-
Limited IDE autocomplete
-
Database-specific syntax variations
-
Hard to debug complex queries
The Ibis Solution
Ibis Model (models/marts/customer_orders.ibis):
from dbt_ibis import depends_on, ref
@depends_on(ref("stg_customers"), ref("stg_orders"))def model(stg_customers, stg_orders): """Calculate customer order metrics with Python""" # Filter completed orders completed_orders = stg_orders.filter(stg_orders["status"] == "completed")
# Aggregate by customer customer_orders = completed_orders.group_by("customer_id").aggregate( first_order=completed_orders["order_date"].min(), most_recent_order=completed_orders["order_date"].max(), number_of_orders=completed_orders.count(), total_spent=completed_orders["amount"].sum() )
# Join with customers result = stg_customers.join(customer_orders, "customer_id", how="left")
# Filter and order return result.filter( result["number_of_orders"] >= 2 ).order_by(result["total_spent"].desc())Benefits of Ibis:
-
Type safety – Catch errors before execution
-
Reusable functions – Build complex transformations
-
Unit testing – Test with pytest
-
IDE support – Autocomplete and refactoring
-
Backend agnostic – Works with DuckDB, Snowflake, BigQuery, etc.
-
Python ecosystem – Use any Python library for logic
Comparison: SQL vs Ibis
| Feature | SQL | Ibis |
|---|---|---|
| Type Safety | None | Full Python typing |
| Testing | Limited | pytest integration |
| Reusability | Copy-paste or Jinja | Functions & classes |
| Debugging | Hard | Python debugger |
| IDE Support | Basic | Full autocomplete |
| Multi-backend | Different syntax | Same code |
Part 3: Implementing dbt with Ibis
Installation
# Install dbt-ibis with DuckDB supportpip install dbt-ibis dbt-duckdb "ibis-framework[duckdb]"
# Verify installationdbt --versiondbt-ibis --versionProject Structure with Ibis
my_dbt_project/├── dbt_project.yml├── models/│ ├── sources.yml│ ├── schema.yml│ ├── staging/│ │ ├── stg_orders.ibis # Note: .ibis extension│ │ ├── stg_customers.ibis│ │ └── __ibis_sql/ # Auto-generated SQL (don't edit)│ │ ├── stg_orders.sql│ │ └── stg_customers.sql│ └── marts/│ ├── customer_orders.ibis│ └── __ibis_sql/│ └── customer_orders.sql├── seeds/│ └── customer_segments.csv├── snapshots/│ └── customer_snapshot.sql # Snapshots remain in SQL└── tests/ └── test_customer_orders.ibisImportant: Ibis models use the .ibis file extension, NOT .py. The dbt-ibis CLI automatically compiles these to SQL files in __ibis_sql/ subfolders.
Configuration
~/.dbt/profiles.yml:
my_project: target: dev outputs: dev: type: duckdb path: "analytics.duckdb"dbt_project.yml:
name: "my_project"version: "1.0.0"config-version: 2
profile: "my_project"
model-paths: ["models"]seed-paths: ["seeds"]snapshot-paths: ["snapshots"]test-paths: ["tests"]
target-path: "target"clean-targets: - "target" - "dbt_packages"
models: my_project: staging: +materialized: view marts: +materialized: tableBuilding Models with Ibis
1. Define Sources with Column Data Types
Critical: For non-Ibis models, sources, seeds, and snapshots, you must specify column data types. This is how dbt-ibis knows the schema.
models/sources.yml:
version: 2
sources: - name: jaffle_shop schema: main tables: - name: orders columns: - name: id data_type: integer - name: customer_id data_type: integer - name: order_date data_type: date - name: status data_type: varchar - name: amount data_type: decimal - name: customers columns: - name: id data_type: integer - name: first_name data_type: varchar - name: last_name data_type: varchar - name: email data_type: varcharTip: Use the dbt-codegen package to auto-generate these column definitions:
dbt run-operation generate_source --args '{"database_name": "main", "schema_name": "main", "generate_columns": true, "table_names": ["orders", "customers"]}'2. Create Staging Models
models/staging/stg_orders.ibis:
from dbt_ibis import depends_on, source
@depends_on(source("jaffle_shop", "orders"))def model(orders): """Clean and standardize orders data""" return orders.select( order_id=orders["id"], customer_id=orders["customer_id"], order_date=orders["order_date"], status=orders["status"], amount=orders["amount"] )models/staging/stg_customers.ibis:
from dbt_ibis import depends_on, source
@depends_on(source("jaffle_shop", "customers"))def model(customers): """Standardize customer data""" return customers.select( customer_id=customers["id"], first_name=customers["first_name"], last_name=customers["last_name"], email=customers["email"].lower() )3. Create Business Logic Models
models/marts/customer_orders.ibis:
from dbt_ibis import depends_on, ref
@depends_on(ref("stg_customers"), ref("stg_orders"))def model(stg_customers, stg_orders): """ Calculate customer order metrics: - First order date - Most recent order - Total number of orders - Total spent """ # Filter for completed orders only completed_orders = stg_orders.filter(stg_orders["status"] == "completed")
# Aggregate by customer customer_orders = completed_orders.group_by("customer_id").aggregate( first_order=completed_orders["order_date"].min(), most_recent_order=completed_orders["order_date"].max(), number_of_orders=completed_orders.count(), total_spent=completed_orders["amount"].sum() )
# Join with customer details result = stg_customers.join(customer_orders, "customer_id", how="left")
# Select final columns return result.select( "customer_id", "first_name", "last_name", "first_order", "most_recent_order", "number_of_orders", "total_spent" )Note: Parameter names in the model() function must match the names used in ref() and source(). For example, ref("stg_customers") means the parameter should be stg_customers.
Running Your Ibis Project
The key command is dbt-ibis instead of dbt:
# Run all models (compiles .ibis to .sql, then runs dbt)dbt-ibis run
# Run specific modeldbt-ibis run --select customer_orders
# Run with upstream dependenciesdbt-ibis run --select +customer_orders
# Run testsdbt-ibis test
# Generate documentationdbt-ibis docs generatedbt-ibis docs serve
# Load seeds (seeds are still CSV, no Ibis)dbt-ibis seed
# Run snapshots (snapshots are still SQL)dbt-ibis snapshotUnderstanding the Compilation Process
# This single command:dbt-ibis run
# Is equivalent to:dbt-ibis precompile # Convert .ibis files to .sqldbt run # Run the generated SQL with dbtThe precompile step generates SQL files in __ibis_sql/ subfolders. You can inspect these to debug your Ibis expressions.
Shell Alias (Optional)
To continue using dbt command instead of dbt-ibis, add this alias to your ~/.bashrc or ~/.zshrc:
alias dbt="dbt-ibis"Part 4: Working with Seeds and Snapshots
Seeds with Ibis
Seeds remain CSV files, but you can reference them in Ibis models.
seeds/customer_segments.csv:
customer_id,segment_name,segment_priority1,Enterprise,High2,SMB,Medium3,Startup,Lowseeds/schema.yml (required for Ibis to know the types):
version: 2
seeds: - name: customer_segments columns: - name: customer_id data_type: integer - name: segment_name data_type: varchar - name: segment_priority data_type: varchar tests: - accepted_values: values: ['High', 'Medium', 'Low']models/marts/enriched_orders.ibis:
from dbt_ibis import depends_on, ref
@depends_on(ref("stg_orders"), ref("customer_segments"))def model(stg_orders, customer_segments): """Enrich orders with customer segments""" return stg_orders.join( customer_segments, "customer_id", how="left" ).select( "order_id", "customer_id", "order_date", "amount", "segment_name", "segment_priority" )Snapshots
Snapshots currently remain in SQL (dbt-ibis limitation). You can still reference them in Ibis models.
snapshots/customer_snapshot.sql:
{% snapshot customer_snapshot %}
{{ config( target_schema='snapshots', unique_key='customer_id', strategy='timestamp', updated_at='updated_at' )}}
SELECT * FROM {{ source('jaffle_shop', 'customers') }}
{% endsnapshot %}models/schema.yml (define snapshot columns for Ibis):
version: 2
snapshots: - name: customer_snapshot columns: - name: customer_id data_type: integer - name: first_name data_type: varchar - name: last_name data_type: varchar - name: email data_type: varchar - name: dbt_valid_from data_type: timestamp - name: dbt_valid_to data_type: timestampmodels/marts/customer_history.ibis:
from dbt_ibis import depends_on, ref
@depends_on(ref("customer_snapshot"))def model(customer_snapshot): """Analyze customer changes over time""" # Get current customers (dbt_valid_to is null) current = customer_snapshot.filter( customer_snapshot["dbt_valid_to"].isnull() )
return current.select( "customer_id", "first_name", "last_name", "dbt_valid_from" )Part 5: Writing Tests with Ibis
Singular Tests
Create .ibis files in your tests/ folder with a test function (not model).
tests/test_no_duplicate_names.ibis:
from dbt_ibis import depends_on, ref
@depends_on(ref("stg_customers"))def test(stg_customers): """Test that no customer has first_name equal to last_name""" return stg_customers.filter( stg_customers["first_name"] == stg_customers["last_name"] )The test passes if the query returns zero rows.
Unit Testing with pytest
You can also write unit tests outside of dbt using pytest and Ibis memtables:
tests/unit/test_transformations.py:
import pytestimport ibis
def test_customer_orders_aggregation(): """Test that customer orders aggregation works correctly"""
# Create test data using memtables test_orders = ibis.memtable({ "order_id": [1, 2, 3, 4], "customer_id": [1, 1, 2, 3], "order_date": ["2024-01-01", "2024-01-15", "2024-01-20", "2024-02-01"], "status": ["completed", "completed", "completed", "pending"], "amount": [100.0, 150.0, 200.0, 50.0] })
# Apply the same logic as your model completed_orders = test_orders.filter(test_orders["status"] == "completed")
result = completed_orders.group_by("customer_id").aggregate( order_count=completed_orders.count(), total_spent=completed_orders["amount"].sum() ).execute()
# Assert expectations assert len(result) == 2 # Only customers 1 and 2 have completed orders
customer_1 = result[result["customer_id"] == 1].iloc[0] assert customer_1["order_count"] == 2 assert customer_1["total_spent"] == 250.0Run with:
pytest tests/unit/Part 6: Advanced Ibis Patterns
Complex Aggregations
from dbt_ibis import depends_on, ref
@depends_on(ref("stg_orders"))def model(stg_orders): """Customer lifetime value analysis""" return stg_orders.group_by("customer_id").aggregate( first_order=stg_orders["order_date"].min(), last_order=stg_orders["order_date"].max(), total_spent=stg_orders["amount"].sum(), avg_order_value=stg_orders["amount"].mean(), order_count=stg_orders.count() )Window Functions
from dbt_ibis import depends_on, ref
@depends_on(ref("stg_orders"))def model(stg_orders): """Rank orders by customer""" return stg_orders.mutate( order_rank=stg_orders["order_id"].rank().over( group_by="customer_id", order_by="order_date" ), running_total=stg_orders["amount"].sum().over( group_by="customer_id", order_by="order_date" ) )Conditional Logic with Case Statements
from dbt_ibis import depends_on, refimport ibis
@depends_on(ref("stg_orders"))def model(stg_orders): """Categorize order values""" return stg_orders.mutate( order_category=ibis.case() .when(stg_orders["amount"] Create reusable transformation functions:
```python# models/utils/transformations.py (regular Python file)def add_fiscal_quarter(table, date_column): """Add fiscal quarter based on date column""" import ibis month = table[date_column].month() return table.mutate( fiscal_quarter=ibis.case() .when(month.isin([1, 2, 3]), "Q1") .when(month.isin([4, 5, 6]), "Q2") .when(month.isin([7, 8, 9]), "Q3") .else_("Q4") .end() )from dbt_ibis import depends_on, reffrom utils.transformations import add_fiscal_quarter
@depends_on(ref("stg_orders"))def model(stg_orders): """Add fiscal quarter to orders""" return add_fiscal_quarter(stg_orders, "order_date")Part 7: Configuration Tips
Column Name Casing (Snowflake, etc.)
For databases like Snowflake that store identifiers in uppercase:
dbt_project.yml:
vars: # Format: dbt_ibis_letter_case_in_db_{profile}_{target} dbt_ibis_letter_case_in_db_my_project_prod: upper dbt_ibis_letter_case_in_expr: lowerThis lets you write lowercase column names in your Ibis code while Snowflake stores them in uppercase.
VS Code Configuration
Add to your VS Code settings.json:
{ "files.associations": { "*.ibis": "python" }}Part 8: Limitations and Workarounds
Current dbt-ibis Limitations
-
No database connection in models – Ibis models generate SQL; they cannot query the database directly
-
Column types required – Non-Ibis sources, seeds, snapshots, and SQL models need explicit column type definitions
-
Snapshots remain SQL – Currently no Ibis support for snapshot definitions
Workarounds
For complex queries requiring database connection: Use a SQL model for that specific transformation, then reference it in your Ibis models.
For auto-generating column types:
# Install dbt-codegendbt deps # After adding to packages.yml
# Generate source definitionsdbt run-operation generate_source \ --args '{"database_name": "main", "schema_name": "main", "generate_columns": true, "table_names": ["orders"]}'
# Generate model column definitionsdbt run-operation generate_model_yaml \ --args '{"model_names": ["stg_orders"]}'Part 9: CI/CD Integration
GitHub Actions Example
.github/workflows/dbt.yml:
name: dbt CI
on: [push, pull_request]
jobs: build-and-test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4
- name: Setup Python uses: actions/setup-python@v5 with: python-version: '3.11'
- name: Install dependencies run: | pip install dbt-ibis dbt-duckdb "ibis-framework[duckdb]" pip install pytest
- name: Verify Ibis compilation is up to date run: | dbt-ibis precompile # Check if any files changed if [[ -n $(git status --porcelain) ]]; then echo "Error: dbt-ibis precompile generated changes. Please commit them." git status exit 1 fi
- name: Run dbt run: | dbt-ibis run dbt-ibis test
- name: Run pytest run: pytest tests/unit/Pre-commit Hook
.pre-commit-config.yaml:
repos: - repo: local hooks: - id: dbt-ibis-precompile name: dbt-ibis precompile entry: dbt-ibis precompile language: system pass_filenames: false files: '\.ibis$'Part 10: Switching Backends
One of Ibis’s greatest strengths is backend portability. Your models work unchanged across databases.
Development with DuckDB, Production with Snowflake
~/.dbt/profiles.yml:
my_project: target: dev outputs: dev: type: duckdb path: "analytics.duckdb"
prod: type: snowflake account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}" user: "{{ env_var('SNOWFLAKE_USER') }}" password: "{{ env_var('SNOWFLAKE_PASSWORD') }}" role: analyst database: analytics warehouse: compute_wh schema: dbt_analyticsRun against different targets:
# Developmentdbt-ibis run
# Productiondbt-ibis run --target prodYour Ibis models remain unchanged – they automatically compile to the appropriate SQL dialect!
Conclusion
This guide took you from zero to hero – starting with dbt fundamentals and ending with Python-powered transformations using Ibis.
What You Now Know
-
dbt fundamentals: Models, sources, refs, tests, seeds, snapshots
-
dbt commands: run, test, docs, seed, snapshot
-
dbt-ibis syntax: .ibis files, @depends_on, ref(), source()
-
Column type requirements: Why and how to specify them
-
Testing: Singular tests and pytest integration
-
Advanced patterns: Window functions, aggregations, reusable code
-
CI/CD: GitHub Actions integration
-
Multi-backend: Same code for DuckDB and Snowflake
Quick Reference
| Task | Command |
|---|---|
| Run all models | dbt-ibis run |
| Run specific model | dbt-ibis run —select model_name |
| Run tests | dbt-ibis test |
| Load seeds | dbt-ibis seed |
| Run snapshots | dbt-ibis snapshot |
| Generate docs | dbt-ibis docs generate |
| Precompile only | dbt-ibis precompile |
Supported Backends
dbt-ibis supports: DuckDB, Snowflake, BigQuery, Postgres, Redshift, RisingWave, Databricks, Trino, MySQL, SQLite, Oracle
Resources
-
dbt-ibis Documentation
-
dbt-ibis GitHub
-
Ibis Project
-
dbt Documentation
-
dbt-ibis Blog Post
Ready to transform your data pipeline? Start with the examples above, experiment locally with DuckDB, and watch your productivity soar.