Column Validators

Validators check whether the data in a particular column is valid. They provide a framework for describing expectations about that data.

Validators are implemented in a functional style: a set of factory functions (eq, lt, le, gt, ge, is_in, and and_) each return a new Validator instance that checks whether data matches a given precondition.

The validator functions are built on pyarrow.compute calls, so the data types they can handle are determined by the available kernels in the Arrow C++ compute library.

Validators can be combined with the and_() function, which merges several validators into a single compound validator. For example, this code validates that the size column is between 5 and 30, inclusive:

```python
import quivr as qv
from quivr import and_, ge, le

class Hat(qv.Table):
    # Every size must satisfy both ge(5) and le(30), i.e. 5 <= size <= 30.
    size = qv.Int8Column(validator=and_(ge(5), le(30)))
```

quivr.eq(val)

Create a Validator that checks that all data in a column is equal to a given value.

Return type:

Validator

quivr.lt(val)

Create a Validator that checks that all data in a column is less than a given value.

Return type:

Validator

quivr.le(val)

Create a Validator that checks that all data in a column is less than or equal to a given value.

Return type:

Validator

quivr.gt(val)

Create a Validator that checks that all data in a column is greater than a given value.

Return type:

Validator

quivr.ge(val)

Create a Validator that checks that all data in a column is greater than or equal to a given value.

Return type:

Validator

quivr.is_in(val, fail_on_null=False)

Create a Validator that checks that all data in a column is in a given set of values.

Parameters:
  • val (Any) – The set of values to check against. This can be a list, tuple, or pyarrow.Array. If it is a list or tuple, it will be converted to a pyarrow.Array.

  • fail_on_null (bool) – If True, null values always fail validation. Otherwise (the default), nulls are matched against the value set just like regular values, so a null passes only if the value set itself contains null.

Return type:

Validator

quivr.and_(*validators)

Create a Validator that checks that all data in a column passes every one of the given validators.

Return type:

Validator

class quivr.Validator(func, args, label)

A Validator is a tool to validate that data in a pyarrow.Array matches a predicate expression.

Variables:
  • func – The predicate function to use for validation. This must be a scalar or scalar aggregate function.

  • args – The arguments to pass to the predicate function.

  • label – A label to use when reporting validation errors.

evaluate(array)

Evaluates the predicate function on the given array.

Parameters:

array (Array) – The array to evaluate.

Return type:

Array

valid(array)

Returns True if the given array is valid, False otherwise.

Parameters:

array (Array) – The array to validate.

Return type:

bool

validate(array)

Raises a ValidationError if the given array is not valid.

Parameters:

array (Array) – The array to validate.

failures(array)

Returns a tuple of two arrays, the first containing the indices of the invalid values, and the second containing the invalid values themselves.

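If the validator's predicate function is a scalar aggregate function, raises a TypeError, because an aggregate yields a single result for the whole array rather than one result per element, so individual failing values cannot be identified.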

Return type:

tuple[Array, Array]