v0.2.1 · Open Source · Apache 2.0

Prove your data.

Declarative data quality engine. Define checks in YAML, run anywhere, catch issues before they break your decisions.

provero.yaml
source:
  type: postgres
  table: orders
checks:
  - not_null: [id, amount]
  - unique: id
  - range:
      column: amount
      min: 0

terminal
$
not_null  id      PASS
not_null  amount  PASS
unique    id      PASS
range     amount  FAIL
Score: 75/100 · 3 passed, 1 failed · 18ms
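The score in the sample output is consistent with the share of passing checks: 3 of 4 gives 75. A minimal sketch of that arithmetic follows, assuming the score is simply the pass rate scaled to 100. Provero's actual scoring may weight checks differently, and `quality_score` is a hypothetical name used only for illustration.

```python
def quality_score(results):
    """Return the pass rate scaled to 0-100 (an assumed scoring rule)."""
    if not results:
        return 0
    passed = sum(1 for status in results.values() if status == "PASS")
    return round(100 * passed / len(results))


# Statuses copied from the sample terminal output.
results = {
    ("not_null", "id"): "PASS",
    ("not_null", "amount"): "PASS",
    ("unique", "id"): "PASS",
    ("range", "amount"): "FAIL",
}
score = quality_score(results)
```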

Check Types: built-in validations

Connectors: DuckDB to BigQuery

Lines to Start: minimal config

Open Source: Apache 2.0 forever

Declarative

Data quality in 3 lines of YAML. No boilerplate, no classes to extend, no framework to learn. Write what you expect, Provero handles the rest.

checks:
  - not_null: [id, amount]
  - unique: id

16 check types: not_null, unique, range, regex, freshness, anomaly, custom_sql, row_count, completeness, accepted_values, type, latency, unique_combination, row_count_change, referential_integrity, email_validation.
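To make the declarative idea concrete, here is a rough sketch of how three of these checks could be expressed as SQL and run against a table. It uses Python's standard `sqlite3` module purely for illustration. The `failing_rows` helper and the SQL it generates are assumptions, not Provero's internals.

```python
import sqlite3


def failing_rows(conn, table, check, column, **params):
    """Count rows that violate one check (illustrative SQL, not Provero's)."""
    # Table and column names are trusted here; a real engine would validate them.
    if check == "not_null":
        sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    elif check == "unique":
        # Every repeat of an already-seen value counts as one violation.
        sql = f"SELECT COUNT({column}) - COUNT(DISTINCT {column}) FROM {table}"
    elif check == "range":
        sql = f"SELECT COUNT(*) FROM {table} WHERE {column} < {float(params['min'])}"
    else:
        raise ValueError(f"unsupported check: {check}")
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, -4.0)],
)

checks = [
    ("not_null", "id", {}),
    ("not_null", "amount", {}),
    ("unique", "id", {}),
    ("range", "amount", {"min": 0}),
]
# Map (check, column) to the number of violating rows; 0 means the check passes.
report = {
    (name, column): failing_rows(conn, "orders", name, column, **params)
    for name, column, params in checks
}
```

With this sample data only the range check reports a violation, because one order has a negative amount.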

Free Anomaly Detection

Z-Score, MAD, IQR. No scipy, no cloud subscription, no asterisks. Statistical anomaly detection that runs locally, built into the core.

Z-Score · MAD · IQR · stdlib only · no cloud needed
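For readers unfamiliar with the three methods, the sketch below shows textbook versions of each using only Python's standard `statistics` module. The function names, thresholds, and sample data are illustrative assumptions, not Provero's implementation or defaults.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


def mad_outliers(values, threshold=3.5):
    """Flag values by modified z-score based on the median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside the quartiles widened by k times the IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


# Hypothetical daily row counts with one obvious spike at the end.
daily_row_counts = [
    100, 102, 98, 101, 99, 103, 97, 100, 101, 99,
    100, 102, 98, 100, 101, 99, 103, 97, 100, 500,
]
```

On this sample all three methods flag only the final value. The z-score is the most sensitive to the outlier itself, since one extreme value inflates the mean and standard deviation, which is one reason median-based methods such as MAD and IQR are often preferred for small samples.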

Apache 2.0. Forever.

No vendor lock-in. No surprise license changes. No features behind a paywall. Same simplicity as Soda, real open source license. Same power as GX, 90% less config.

Why Provero

Your AI is only as good as your data. Prove it.

Apache Griffin retired in November 2025. Soda Core moved to ELv2, putting anomaly detection and key features behind a cloud paywall. Great Expectations requires 50+ lines of Python to validate a single table.

There was no truly open source, lightweight, YAML-first data quality tool left. So we built one.

Provero fills that vacuum. Apache 2.0 licensed, vendor-neutral, zero heavy dependencies. Data quality that belongs to the community.

Nov 2025

Apache Griffin retires

The only Apache-licensed DQ engine moves to the Apache Attic. The ecosystem loses its open standard.

2024

Soda Core goes ELv2

Anomaly detection, data contracts, and key features locked behind Soda Cloud. Not truly open source anymore.

2025

Provero is born

Declarative, vendor-neutral, Apache 2.0. Everything built-in. Nothing behind a paywall.

How it works

Three commands. That's it.

1

Install

$ pip install provero && provero init

Pure Python. No Java, no Spark, no Docker.

2

Define

$ vim provero.yaml

YAML checks. Version-controlled, reviewable in PRs.

3

Run

$ provero run

Locally, in CI/CD, or inside Airflow DAGs.
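In a pipeline, the run step usually acts as a gate: later steps proceed only if the checks pass. The sketch below wraps a command with `subprocess` and treats a non-zero exit status as failure. It assumes `provero run` exits non-zero when a check fails, which this page does not state. The `run_quality_gate` helper is hypothetical, and a stand-in command is used so the example runs without Provero installed.

```python
import subprocess
import sys


def run_quality_gate(command):
    """Run a command and return True when it exits with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the tool's output so the CI log shows what failed.
        print(completed.stdout, completed.stderr, sep="\n", file=sys.stderr)
    return completed.returncode == 0


# In a real pipeline the command would be ["provero", "run"], assuming
# it signals failed checks through a non-zero exit status.
# This stand-in simply exits with status 1 to simulate a failed run.
ok = run_quality_gate([sys.executable, "-c", "import sys; sys.exit(1)"])
```

A caller would then stop the pipeline when `ok` is false, for example by calling `sys.exit(1)`.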

Data quality checks in YAML. Runs anywhere. Free anomaly detection.

Define checks next to your pipeline code. Run with provero run in CI or locally. No Java, no Spark, no Docker.

Ecosystem

Runs where your data lives

Same YAML, same checks, any source. From local DuckDB files to cloud warehouses, Provero connects natively without extra drivers or plugins.

Databases

Connect directly to your production or analytical databases.

DuckDB (stable)
PostgreSQL (stable)
MySQL (beta)
SQLite (beta)

Cloud Warehouses

Validate data at scale in your cloud data platform.

Snowflake (beta)
BigQuery (beta)
Redshift (beta)

DataFrames

Run checks on in-memory data without a database connection.

Pandas (stable)
Polars (stable)

Orchestration & Alerts

Integrate into your existing workflow and notification stack.

Apache Airflow (stable)
Flyte (stable)
Slack (stable)
PagerDuty (beta)

Missing a connector?

Provero has a plugin system. Build your own or request one.

Request Connector

Comparison

How Provero compares

Feature | Provero | Great Expectations | Soda Core | dbt Tests
Open Source License | Apache 2.0 | Apache 2.0 | ELv2 | Apache 2.0
YAML-first Config
Python
SQL + YAML
Anomaly Detection
Built-in
Plugin
Cloud only
Data Contracts
Cloud only
Zero Config Start
3 lines
~50 lines
Needs dbt
No Heavy Dependencies
SQL Batch Optimization
Airflow Integration
Migration Tools
SodaCL & dbt
Community

Get in early.

Provero is being built in the open. Join now, shape the project, and be part of it from the start.

GitHub

Follow the development, report issues, and help shape the project from day one.

Star on GitHub

Slack

Join the conversation, ask questions, and connect with early adopters.

Join Slack

Reddit

Discuss data quality, share feedback, and follow project updates.

Join r/provero

Contributing

Read the contributing guide and submit your first pull request.

Read Guide

Your data deserves proof.

We're building Provero in the open. Star the repo to follow our progress and be the first to know when we launch.