Great Expectations
Open-source Python library with declarative expectations to validate data in files, SQL databases, data warehouses, and in-memory DataFrames.
Best for data engineering teams looking for a code-first OSS data testing library with a large built-in expectation library and Python extensibility.
Deequ
Open-source Scala library built on Apache Spark to define and verify data quality constraints and profile large datasets at scale.
Best for data engineering teams using Apache Spark looking for a code-first OSS library to define data quality constraints programmatically in Scala or Python.
Google CloudDQ
Cloud-native data validation CLI with YAML-based data quality checks for BigQuery tables and GCS structured data.
Best for data teams looking for a BigQuery-native solution to write reusable SQL checks and consume data quality outputs programmatically.
DQX by Databricks
Data quality framework for Apache Spark with data quality rule generation from profiling results, and YAML and Python-based data validation checks.
Best for Databricks users looking to validate PySpark DataFrames and Tables across Spark Core, Spark Structured Streaming, and Lakeflow Pipelines / DLT.
Monte Carlo
Leading data observability platform with data monitors, anomaly detection, customizable data quality dashboards, and column-level lineage.
Best for data teams with a big budget looking for a mature and customizable data observability platform that also offers AI observability.
DQLabs
Unified data quality and observability platform with anomaly detection, data quality checks, end-to-end data lineage, and pipeline observability.
Best for enterprises looking for unified data quality and observability that integrates with modern data catalogs and issue management tools.
Qualytics
ML-powered data quality platform with auto-generated tests from profiling results, anomaly detection, and data quality context for humans and AI agents.
Best for enterprises in highly regulated industries looking for a scalable data quality platform with on-premise cloud deployments via Kubernetes.
DQOps
Open-source data quality testing and observability platform with data quality checks, monitors, data lineage with Marquez, and data quality dashboards.
Best for data teams looking to customize built-in data quality checks and data quality dashboards with Looker Studio to monitor data quality KPIs.
DataKitchen
Open-source data testing and observability platform with automated test generation, data profiling, and anomaly detection.
Best for data teams looking for a cost-effective data testing and observability solution that prices per database connection and user.
AWS Glue Data Quality
Managed data quality platform built on the open-source Deequ framework with data quality rulesets, scheduling, data quality dashboards, and anomaly detection.
Best for data teams using AWS Glue Data Catalog and ETL jobs that want to monitor data quality at rest and in transit, with the option to quarantine data.
By Ari Bajo - Data Engineer turned Writer.
Soda Core
Open-source Python library and CLI to write and run data contracts in YAML using SodaCL, with integrations for data warehouses, databases, and query engines.
Best for data engineering teams looking for a YAML-based OSS data testing library that embeds directly in pipelines and CI/CD workflows.
Soda Cloud
Managed data quality platform with built-in metrics to write data contracts (using YAML, UI, or AI), anomaly detection and AI agents to clean data.
Best for data teams looking to embed data contracts within data pipeline steps, collaborate with business users to fix bad data, and integrate with data catalogs.
OpenMetadata
Open-source unified metadata platform with data discovery, data quality checks, observability metrics, column-level lineage, and governance workflows.
Best for data teams looking for a self-hosted open-source platform covering data discovery, observability, and governance with a wide range of integrations.
Collate
Managed enterprise data platform built on OpenMetadata with data discovery, observability metrics, column-level lineage, and governance workflows.
Best for data teams looking for a fully managed enterprise version of OpenMetadata with dedicated support, security features, and advanced governance workflows.
SelectZero
Comprehensive data observability platform with data validation, data profiling, column-level data lineage, a data catalog, and a business glossary.
Best for enterprises looking for a data quality tool that can be easily self-hosted with a Docker deployment.
Ataccama ONE
Data trust platform with data quality evaluation rules, anomaly detection, data lineage, a data catalog, and master data management.
Best for organizations looking to scale data management initiatives with enterprise master data management, data quality, and data governance.