Columns#

Columns are descriptions of the data that populates a given Table. Columns are associated with tables by writing them as class attributes on Table subclasses. For example:

import quivr

class Person(quivr.Table):
    name = quivr.StringColumn()
    age = quivr.UInt8Column()
    favorite_books = quivr.ListColumn(quivr.StringColumn())

The Person Table defined above has three columns: a string name, a uint8 age, and a list of strings for books.

All of the column types below inherit from Column, and so they are Descriptors: they all have Column.__get__(), Column.__set__(), and Column.__set_name__() methods.
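As a rough illustration of what being a descriptor means here, the following standard-library sketch mimics the three methods named above. `SketchColumn` and `SketchTable` are hypothetical stand-ins that store plain Python lists; they are not quivr's implementation.

```python
class SketchColumn:
    """Hypothetical stand-in for a column descriptor (not quivr's code)."""

    def __set_name__(self, owner, name):
        # Called once when the owning class body is executed;
        # records the attribute name the column was assigned to.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        # Accessed on an instance: return that instance's data.
        return obj._data[self.name]

    def __set__(self, obj, value):
        # Store the column's data on the instance.
        obj._data[self.name] = list(value)


class SketchTable:
    name = SketchColumn()
    age = SketchColumn()

    def __init__(self, **columns):
        self._data = {}
        for key, value in columns.items():
            setattr(self, key, value)  # routed through SketchColumn.__set__


people = SketchTable(name=["Ada", "Grace"], age=[36, 45])
```

Accessing `SketchTable.age` returns the descriptor object, while `people.age` returns the stored data, which mirrors the two `__get__` behaviours documented for Column.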

Simple Types#

These are the simplest column types. They all are initialized the same way:

import quivr

class MyTable(quivr.Table):
    # With no arguments, you get a nullable column that doesn't validate data.
    # This could be quivr.UInt8Column(), or quivr.StringColumn(), whatever
    plain = quivr.Int32Column()

    # Pass nullable=False to get a non-nullable column.
    required = quivr.Int32Column(nullable=False)

    # Pass a validator to check the input data against a constraint.
    positive = quivr.Int32Column(validator=quivr.gt(0))

    # Pass metadata for arbitrary extra information to
    # include. Metadata should be a string-to-string dictionary.
    duration = quivr.Int32Column(metadata={'units': 'seconds'})

    # Set a default value to be used if any inputs are null
    with_default = quivr.Int32Column(nullable=False, default=3)

When you access a column of a primitive type on a Table instance, you get a pyarrow.Array back. The data type of the array is described in this table:

| Column Type | Data Type | Description |
|---|---|---|
| StringColumn | pyarrow.StringArray | UTF-8 string data. Strings must all be less than 2^31 bytes long. |
| LargeStringColumn | pyarrow.LargeStringArray | UTF-8 string data up to 2^63 bytes long. |
| Int8Column | pyarrow.Int8Array | 8-bit signed integers (-128 to 127) |
| Int16Column | pyarrow.Int16Array | 16-bit signed integers (-32,768 to 32,767) |
| Int32Column | pyarrow.Int32Array | 32-bit signed integers (-2^31 to 2^31 - 1) |
| Int64Column | pyarrow.Int64Array | 64-bit signed integers (-2^63 to 2^63 - 1) |
| UInt8Column | pyarrow.UInt8Array | 8-bit unsigned integers (0 to 255) |
| UInt16Column | pyarrow.UInt16Array | 16-bit unsigned integers (0 to 65,535) |
| UInt32Column | pyarrow.UInt32Array | 32-bit unsigned integers (0 to 2^32 - 1) |
| UInt64Column | pyarrow.UInt64Array | 64-bit unsigned integers (0 to 2^64 - 1) |
| Float16Column | pyarrow.HalfFloatArray | 16-bit floating point values |
| Float32Column | pyarrow.FloatArray | 32-bit floating point values |
| Float64Column | pyarrow.DoubleArray | 64-bit floating point values |
| BooleanColumn | pyarrow.BooleanArray | Boolean (true/false) values |
| NullColumn | pyarrow.NullArray | A zero-sized array of nulls |
| BinaryColumn | pyarrow.BinaryArray | Arbitrary binary blobs, variably sized, up to 2^31 bytes long each (about 2GB) |
| LargeBinaryColumn | pyarrow.LargeBinaryArray | Arbitrary binary blobs, variably sized, up to 2^63 bytes long each (about 9 exabytes) |
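The integer ranges listed above follow directly from the bit width of each type. A small sketch of that arithmetic is shown here; `int_range` is a hypothetical helper for illustration and is not part of quivr or pyarrow.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable in a fixed-width integer."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Ranges for a few of the column types in the table.
int8_range = int_range(8, signed=True)
uint16_range = int_range(16, signed=False)
int32_range = int_range(32, signed=True)
```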

class quivr.StringColumn(nullable=True, metadata=None, validator=None, default=None)#

A column for storing strings.

This can be used to store strings of any length, but it is not recommended for storing very long strings (over 2GB, for example). For long strings, use LargeStringColumn instead.

class quivr.LargeStringColumn(nullable=True, metadata=None, validator=None, default=None)#

A column for storing large strings (over 2^31 bytes long). Large string data is stored in variable-length chunks.

class quivr.Int8Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 8-bit integers.

class quivr.Int16Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 16-bit integers.

class quivr.Int32Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 32-bit integers.

class quivr.Int64Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 64-bit integers.

class quivr.UInt8Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 8-bit unsigned integers.

class quivr.UInt16Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 16-bit unsigned integers.

class quivr.UInt32Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 32-bit unsigned integers.

class quivr.UInt64Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 64-bit unsigned integers.

class quivr.Float16Column(nullable=True, metadata=None, validator=None)#

A column for storing 16-bit floating point numbers.

16-bit floating point arrays have limited support in Arrow, and thus limited support in quivr.

In particular:

  • They cannot be written to or read from Parquet files.

  • They cannot be constructed from Python floats in a natural way (one must use numpy.float16).

  • They don’t support quivr column default values (because they are not supported by Arrow’s compute functions).

class quivr.Float32Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 32-bit floating point numbers.

class quivr.Float64Column(nullable=True, metadata=None, validator=None, default=None)#

A column for storing 64-bit floating point numbers.

class quivr.BooleanColumn(nullable=True, metadata=None, validator=None, default=None)#

A column for storing booleans.

class quivr.NullColumn(nullable=True, metadata=None, validator=None)#

A column for storing null values.

A null array records only its length. No data buffers are allocated for the values, so the nulls themselves do not take up any memory space.

class quivr.BinaryColumn(nullable=True, metadata=None, validator=None, default=None)#

A column for storing opaque binary data.

class quivr.LargeBinaryColumn(nullable=True, metadata=None, validator=None, default=None)#

A column for storing large binary objects (over 2^31 bytes long). Large binary data is stored in variable-length chunks.

Fixed-Size Binary Data#

BinaryColumn and LargeBinaryColumn work with variable-size binary data. If every item has the same size, you can use FixedSizeBinaryColumn to save some overhead.
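To show where the saved overhead comes from, here is a simplified, standard-library sketch of the two layouts. Variable-size binary needs an offsets array to find where each item starts, while fixed-size binary can compute each position from the item width. The function names are hypothetical and the layouts are a simplified picture, not Arrow's actual buffers.

```python
def pack_variable(blobs):
    """Pack blobs of any size: needs len(blobs) + 1 offsets plus the data."""
    offsets = [0]
    data = bytearray()
    for blob in blobs:
        data += blob
        offsets.append(len(data))
    return offsets, bytes(data)


def pack_fixed(blobs, byte_width):
    """Pack blobs that all have byte_width bytes: no offsets are needed."""
    for blob in blobs:
        if len(blob) != byte_width:
            raise ValueError(f"expected {byte_width} bytes, got {len(blob)}")
    return b"".join(blobs)


def get_fixed(data, byte_width, index):
    """Item positions are computed from the width instead of looked up."""
    return data[index * byte_width:(index + 1) * byte_width]


offsets, variable_data = pack_variable([b"ab", b"cdef", b"g"])
fixed_data = pack_fixed([b"abcd", b"efgh", b"ijkl"], byte_width=4)
```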

class quivr.FixedSizeBinaryColumn(byte_width, nullable=True, metadata=None, validator=None, default=None)#

A column for storing opaque fixed-size binary data.

Parameters:
  • byte_width (int) – The number of bytes per value.

  • nullable (bool) – Whether the column can contain null values.

  • metadata (Optional[Dict[Union[bytes, str], Union[bytes, str]]]) – Optional metadata to associate with the column.

  • validator (Optional[Validator]) – An optional validator to apply to the column.

  • default (Union[None, bytes, Callable[[], bytes]]) – An optional default value for the column. This can be a scalar value or a callable that takes no arguments.

Decimals#

Decimal data uses a fixed-point representation, which guarantees that it has a consistent number of significant digits.

Decimal columns can be either 128-bit or 256-bit. When you set them up, you provide the “precision” and “scale” to be used.

class quivr.Decimal128Column(precision, scale, nullable=True, metadata=None, default=None)#

A column for storing arbitrary-precision decimal numbers.

Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).

As an example, Decimal128Column(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively), but neither 12345.67 nor 123.4567.

Decimal128Column(5, -3) can exactly represent the number 12345000 (encoded internally as the 128-bit integer 12345), but neither 123450000 nor 1234500.

If you need a precision higher than 38 significant digits, consider using Decimal256Column.
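The precision and scale rules can be checked with Python's standard `decimal` module. The sketch below assumes the scaled-integer encoding described above; `fits_decimal` is a hypothetical helper, not a quivr or pyarrow function.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether value is exactly representable with this precision and scale."""
    # The stored integer is value * 10**scale.
    unscaled = value * (Decimal(10) ** scale)
    if unscaled != unscaled.to_integral_value():
        # Digits would be lost beyond the scale.
        return False
    # The stored integer may have at most `precision` digits.
    return abs(unscaled) < Decimal(10) ** precision


# The examples from the text, with precision 7 and scale 3.
ok_positive = fits_decimal(Decimal("1234.567"), 7, 3)
ok_negative = fits_decimal(Decimal("-1234.567"), 7, 3)
too_many_digits = fits_decimal(Decimal("12345.67"), 7, 3)
too_fine = fits_decimal(Decimal("123.4567"), 7, 3)

# A negative scale, with precision 5 and scale -3.
ok_negative_scale = fits_decimal(Decimal("12345000"), 5, -3)
```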

Parameters:
  • precision (int) – The number of significant digits.

  • scale (int) – The number of digits after the decimal point.

  • nullable (bool) – Whether the column can contain nulls.

  • metadata (Optional[Dict[Union[bytes, str], Union[bytes, str]]]) – A dictionary of metadata to attach to the column.

  • default (Union[None, Decimal, Callable[[], Decimal]]) – An optional default value for the column. This can be a scalar value or a callable that takes no arguments.

class quivr.Decimal256Column(precision, scale, nullable=True, metadata=None, default=None)#

A column for storing arbitrary-precision decimal numbers.

Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).

Values are stored as 256-bit integers.

Parameters:
  • precision (int) – The number of significant digits.

  • scale (int) – The number of digits after the decimal point.

  • nullable (bool) – Whether the column can contain nulls.

  • metadata (Optional[Dict[Union[bytes, str], Union[bytes, str]]]) – A dictionary of metadata to attach to the column.

  • default (Union[None, Decimal, Callable[[], Decimal]]) – An optional default value for the column. This can be a scalar value or a callable that takes no arguments.

Structured Data#

Columns can contain nested structural data. With these types, each row of the column contains some structure.

It is easy to get confused by this: All columns are “lists” in a sense, but ListColumn is a two-dimensional structure. Each row of the column is, itself, a list.

class quivr.ListColumn(value_type, nullable=True, metadata=None, validator=None)#

A column for storing variably-sized lists of values.

The values in the list can be of any type.

Note that all quivr Tables are storing lists of values, so this column type is only useful for storing lists of lists.

class quivr.FixedSizeListColumn(value_type, list_size, nullable=True, metadata=None, validator=None)#

A column for storing lists of values of a fixed size.

The values in the list can be of any type.

Note that all quivr Tables are storing lists of values, so this column type is only useful for storing lists of lists.

class quivr.LargeListColumn(value_type, nullable=True, metadata=None, validator=None)#

A column for storing large lists of values (over 2^31 objects).

Unless you need to represent data with more than 2^31 elements, prefer ListColumn.

The values in the list can be of any type.

Note that all quivr Tables are storing lists of values, so this column type is only useful for storing lists of lists.

class quivr.MapColumn(key_type, item_type, nullable=True, metadata=None, validator=None)#

A column for storing maps of key-value pairs.

The keys and values can be of any type, as long as the keys are hashable and unique.

class quivr.StructColumn(fields, nullable=True, metadata=None, validator=None)#

A column for storing structured data.

In general, prefer to define Tables and use their as_column method instead of using StructColumn.

class quivr.SubTableColumn(table_type, nullable=True, metadata=None)#

A column which represents an embedded Quivr table.

Parameters:
  • table_type (type[~T]) – The type of the table to embed.

  • nullable (bool) – Whether the column can contain null values.

  • metadata (Optional[Dict[Union[bytes, str], Union[bytes, str]]]) – A dictionary of metadata to attach to the column.

SubTableColumns are generally created through Table.as_column(), and are not created directly.

Types For Encoding Efficiency#

class quivr.DictionaryColumn(index_type, value_type, ordered=False, nullable=True, metadata=None, validator=None)#

A column for storing dictionary-encoded values.

This is intended for use with categorical data. See MapColumn for a more general mapping type.
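As a conceptual sketch of dictionary encoding, each distinct value is stored once and every row holds a small integer index into that list. The helper below uses only the standard library and hypothetical names; it illustrates the idea rather than Arrow's actual memory layout.

```python
def dictionary_encode(items):
    """Split items into a list of distinct values and per-row indices."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> index into dictionary
    indices = []      # one index per input row
    for item in items:
        if item not in positions:
            positions[item] = len(dictionary)
            dictionary.append(item)
        indices.append(positions[item])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original rows from the encoded form."""
    return [dictionary[i] for i in indices]


colors = ["red", "blue", "red", "red", "blue"]
dictionary, indices = dictionary_encode(colors)
```

The indices correspond to `index_type` and the distinct values to `value_type` in the signature above.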

Parameters:
  • index_type (DataType) – The type of the dictionary indices. Must be an integer type.

  • value_type (Union[DataType, Field, Column]) – The type of the values in the dictionary.

  • ordered (bool) – Whether the dictionary is ordered.

  • nullable (bool) – Whether the dictionary can contain null values.

  • metadata (Optional[Dict[Union[bytes, str], Union[bytes, str]]]) – A dictionary of metadata to attach to the column.

  • validator (Optional[Validator]) – A validator to run against the column’s values.

class quivr.RunEndEncodedColumn(run_end_type, value_type, nullable=True, metadata=None, validator=None)#

A column for storing run-end encoded data.

This is a special column type that is used to efficiently store highly ordered data, where the same value repeats in long consecutive runs. Internally, the data is stored as two arrays:

  • An array of values, with all consecutive runs of the same value reduced to a single element

  • An array of run ends, giving the cumulative position at which each run ends

This is more compact than storing the redundant values and also allows for very efficient computations like aggregations upon the data.
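A minimal standard-library sketch of the encoding follows. It assumes the Arrow convention of recording the cumulative end position of each run (the quantity that `run_end_type` describes); the function names are hypothetical.

```python
from itertools import groupby


def run_end_encode(items):
    """Collapse consecutive repeats into (values, run_ends)."""
    values, run_ends = [], []
    end = 0
    for value, group in groupby(items):
        end += sum(1 for _ in group)  # advance by the length of this run
        values.append(value)
        run_ends.append(end)          # cumulative position where the run ends
    return values, run_ends


def run_end_decode(values, run_ends):
    """Expand (values, run_ends) back into the original sequence."""
    out, start = [], 0
    for value, end in zip(values, run_ends):
        out.extend([value] * (end - start))
        start = end
    return out


data = ["a", "a", "a", "b", "c", "c"]
values, run_ends = run_end_encode(data)
```

Here six input rows are stored as three values and three run ends; the saving grows with the length of the runs.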

Parameters:
  • run_end_type (DataType) – The type of the run ends. Must be a 16-, 32-, or 64-bit integer type.

  • value_type (DataType) – The type of the values in the run-end encoded data.

  • nullable (bool) – Whether the data can contain null values.

  • metadata (Optional[Dict[Union[bytes, str], Union[bytes, str]]]) – A dictionary of metadata to attach to the column.

  • validator (Optional[Validator]) – A validator to run against the column’s values.

Base Class#

All of the column types inherit from the Column base class. Users are not expected to use Column directly. Instead, use one of the appropriate subclasses.

class quivr.Column(dtype, nullable=True, metadata=None, validator=None, default=None)#

A Column is an accessor for data in a Table, and also a descriptor for the Table’s structure.

This is a base class for all column types. It is not intended to be used directly; instead, use one of its subclasses.

Columns implement the descriptor protocol, so they should be used as class attributes on a Table subclass.

Variables:
  • dtype – The pyarrow data type of the column.

  • nullable – Whether the column can contain null values.

  • metadata – A dictionary of metadata to attach to the column.

  • validator – A validator to use when setting the column.

  • default – A default value to use when setting the column.

__get__(obj: None, objtype: type) → Self#
__get__(obj: quivr.tables.Table, objtype: type) → pyarrow.lib.Array
__get__(obj, objtype)

Gets the Column object from a Table class, or the associated data from a Table instance.

This method is part of the descriptor protocol.

Return type:

Union[Self, Array]

__set__(obj, value)#

Sets the data for this column on a Table instance.

This method is part of the descriptor protocol.

__set_name__(owner, name)#

Sets the name of the column.

This method is part of the descriptor protocol.

fill_default(array)#

Fills null values in the array with the Column’s default value.

Return type:

Array
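Conceptually, filling defaults replaces each null with the column's default, which may be a scalar or a zero-argument callable. The sketch below works on plain Python lists, with `None` standing in for null. `fill_nulls` is a hypothetical helper, and calling the callable once per fill is an assumption, not documented quivr behaviour.

```python
def fill_nulls(values, default):
    """Replace None entries with a default (a scalar or a zero-argument callable)."""
    # Assumption: a callable default is evaluated once per fill operation.
    fill = default() if callable(default) else default
    return [fill if value is None else value for value in values]


filled_scalar = fill_nulls([1, None, 3, None], default=3)
filled_callable = fill_nulls([None, "x"], default=lambda: "unknown")
```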

pyarrow_field()#

Returns a pyarrow Field object for this column.

Return type:

Field

Typing Helpers#

quivr.columns.T = TypeVar(T, bound=Table)#

Type:    TypeVar

Invariant TypeVar bound to quivr.tables.Table.

quivr.MetadataDict#

alias of Dict[Union[bytes, str], Union[bytes, str]]

quivr.Byteslike#

alias of Union[bytes, str]