Prove your data.
Declarative data quality engine. Define checks in YAML, run anywhere, catch issues before they break your decisions.
    source:
      type: postgres
      table: orders
    checks:
      - not_null: [id, amount]
      - unique: id
      - range:
          column: amount
          min: 0

Check Types
Built-in validations
Connectors
DuckDB to BigQuery
Lines to Start
Minimal config
Open Source
Apache 2.0 forever
Declarative
Data quality in 3 lines of YAML. No boilerplate, no classes to extend, no framework to learn. Write what you expect, Provero handles the rest.
    checks:
      - not_null: [id, amount]
      - unique: id
16 check types: not_null, unique, range, regex, freshness, anomaly, custom_sql, row_count, completeness, accepted_values, type, latency, unique_combination, row_count_change, referential_integrity, email_validation.
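To make the semantics of the first three check types concrete, here is a minimal Python sketch of what `not_null`, `unique`, and `range` assert about a table. It is an illustration only, not Provero's implementation: the function names, the list-of-dicts row format, and the choice to skip nulls in the range check are all assumptions made for this example.

```python
# Illustrative only: hypothetical helpers, not part of Provero's API.
from typing import Any, Iterable, Optional

Row = dict[str, Any]


def check_not_null(rows: Iterable[Row], columns: list[str]) -> bool:
    """Pass when none of the listed columns is None in any row."""
    return all(row.get(col) is not None for row in rows for col in columns)


def check_unique(rows: Iterable[Row], column: str) -> bool:
    """Pass when no value in the column appears more than once."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def check_range(
    rows: Iterable[Row],
    column: str,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> bool:
    """Pass when every non-null value lies within the given bounds."""
    for row in rows:
        value = row[column]
        if value is None:
            continue  # assumption: nulls are left to the not_null check
        if min_value is not None and value < min_value:
            return False
        if max_value is not None and value > max_value:
            return False
    return True


orders = [
    {"id": 1, "amount": 19.99},
    {"id": 2, "amount": 5.00},
    {"id": 3, "amount": 0.0},
]
bad_orders = orders + [{"id": 3, "amount": -4.0}]

results = {
    "not_null": check_not_null(orders, ["id", "amount"]),
    "unique": check_unique(orders, "id"),
    "range": check_range(orders, "amount", min_value=0),
}
```

The `orders` rows satisfy all three checks, while `bad_orders` repeats an `id` and contains a negative `amount`, so it fails both `unique` and `range`.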
Free Anomaly Detection
Z-Score, MAD, IQR. No scipy, no cloud subscription, no asterisks. Statistical anomaly detection that runs locally, built into the core.
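The three methods named above can each be written with nothing but the Python standard library. The sketch below shows the textbook versions. It is not Provero's code: the function names, the default thresholds (3.0 for Z-Score, 3.5 for the MAD-based modified Z-Score, 1.5 for the IQR fences), and the sample data are assumptions chosen for illustration.

```python
# Illustrative only: textbook outlier tests using the standard library.
import statistics


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / stdev > threshold]


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[float]:
    """Flag values using the modified Z-Score based on the median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # more than half the values equal the median
    # 0.6745 rescales the MAD so it is comparable to a standard deviation.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily row counts with one obvious spike.
daily_rows = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99, 102, 98, 100, 100, 500]

z_flags = zscore_outliers(daily_rows)
mad_flags = mad_outliers(daily_rows)
iqr_flags = iqr_outliers(daily_rows)
```

All three methods flag only the 500 spike on this sample. On very short series the plain Z-Score can miss a single large outlier, because that outlier inflates the standard deviation it is measured against. The median-based MAD and IQR tests are more robust in that case.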
Apache 2.0. Forever.
No vendor lock-in. No surprise license changes. No features behind a paywall. Same simplicity as Soda, real open source license. Same power as GX, 90% less config.
Your AI is only as good as your data. Prove it.
Apache Griffin retired in November 2025. Soda Core moved to ELv2, putting anomaly detection and key features behind a cloud paywall. Great Expectations requires 50+ lines of Python to validate a single table.
There was no truly open source, lightweight, YAML-first data quality tool left. So we built one.
Provero fills that vacuum. Apache 2.0 licensed, vendor-neutral, zero heavy dependencies. Data quality that belongs to the community.
Apache Griffin retires
The only Apache-licensed DQ engine moves to the attic. The ecosystem loses its open standard.
Soda Core goes ELv2
Anomaly detection, data contracts, and key features locked behind Soda Cloud. Not truly open source anymore.
Provero is born
Declarative, vendor-neutral, Apache 2.0. Everything built-in. Nothing behind a paywall.
Three commands. That's it.
Install
Pure Python. No Java, no Spark, no Docker.
Define
YAML checks. Version-controlled, reviewable in PRs.
Run
Locally, in CI/CD, or inside Airflow DAGs.
Data quality checks in YAML. Runs anywhere. Free anomaly detection.
Define checks next to your pipeline code. Run with provero run in CI or locally. No Java, no Spark, no Docker.
Runs where your data lives
Same YAML, same checks, any source. From local DuckDB files to cloud warehouses, Provero connects natively without extra drivers or plugins.
Databases
Connect directly to your production or analytical databases.
Cloud Warehouses
Validate data at scale in your cloud data platform.
DataFrames
Run checks on in-memory data without a database connection.
Orchestration & Alerts
Integrate into your existing workflow and notification stack.
Missing a connector?
Provero has a plugin system. Build your own or request one.
How Provero compares
| Feature | Provero | Great Expectations | Soda Core | dbt Tests |
|---|---|---|---|---|
| Open Source License | Apache 2.0 | Apache 2.0 | ELv2 | Apache 2.0 |
| YAML-first Config | | Python | | SQL + YAML |
| Anomaly Detection | Built-in | Plugin | Cloud only | |
| Data Contracts | | | Cloud only | |
| Zero Config Start | 3 lines | ~50 lines | | Needs dbt |
| No Heavy Dependencies | | | | |
| SQL Batch Optimization | | | | |
| Airflow Integration | | | | |
| Migration Tools | SodaCL & dbt | | | |
Get in early.
Provero is being built in the open. Join now, shape the project, and be part of it from the start.
Your data deserves proof.
We're building Provero in the open. Star the repo to follow our progress and be the first to know when we launch.