
Unit Testing Data Pipelines

Starlake lets you unit test data pipelines locally using DuckDB, with automatic SQL transpilation from your target data warehouse dialect. No cloud access, no service accounts, and fast deterministic feedback -- ideal for CI pipelines.

Why test locally with DuckDB?

Testing data pipelines on the target data warehouse presents several challenges:

| Challenge | Cloud-based testing | Local DuckDB testing |
| --- | --- | --- |
| Data access costs | Queries consume warehouse credits | Free, in-memory execution |
| Shared database | Unsuitable for isolated unit tests | Each test gets its own instance |
| Service accounts | Required for CI environments | Not needed |
| Feedback speed | Slow round-trips to the cloud | Sub-second execution |
| Reproducibility | Hard to ensure deterministic results | Fully deterministic |

How it works

Starlake transpiles your SQL from the target dialect (BigQuery, Snowflake, Databricks, etc.) to DuckDB-compatible SQL. Tests run against a local DuckDB instance populated with sample data. Starlake compares the actual results against expected data you define in CSV or JSON files.

You write SQL in your target dialect; Starlake handles the transpilation automatically, so no query modification is needed.

[Diagram: Starlake unit testing architecture]

What you can test

Starlake supports two types of unit tests:

  • Load tests -- Validate data loading pipelines. Provide sample input files and expected output data. Starlake loads the data using DuckDB and compares the result.
  • Transform tests -- Validate SQL transformations. Provide source data and expected output. Starlake transpiles and runs the SQL, then compares the result.

Both test types generate detailed HTML reports that include the DuckDB database, unexpected rows, and missing rows for debugging.

Running tests

Run all tests (load and transform):

starlake test

Run only load tests:

starlake test --load

Run only transform tests:

starlake test --transform

Run a specific test:

starlake test --name domain.table.test-name

Frequently Asked Questions

Why does Starlake use DuckDB for unit testing?

Starlake uses DuckDB because it is a fast, in-memory SQL database that runs locally without requiring cloud access or service accounts. This eliminates data access costs, provides fast feedback, and allows deterministic, repeatable tests.

How do Starlake unit tests work locally?

Starlake transpiles your target data warehouse SQL dialect to DuckDB-compatible SQL. This lets you run your production load and transform logic on a local DuckDB instance, comparing actual results against expected data defined in CSV or JSON files.

Can I run Starlake unit tests in my CI pipeline?

Yes, since Starlake unit tests run on DuckDB locally, they can be executed in any CI environment without needing a service account or access to the target data warehouse. This makes them ideal for automated testing before deployment.

What can I test with Starlake unit tests?

You can test both data loading and data transformations. For load tests, you provide sample input files and expected output data. For transform tests, you provide sample source data and verify the SQL transformation produces the expected results.

Does SQL transpilation change my queries?

No. You write SQL in your target dialect (BigQuery, Snowflake, Databricks, etc.) and Starlake automatically transpiles it to DuckDB for testing. Your original queries remain unchanged.