Columns#
Columns are descriptions of the data that populates a given
Table. Columns are associated with tables by writing them as
class attributes on Table subclasses. For example:
import quivr

class Person(quivr.Table):
    name = quivr.StringColumn()
    age = quivr.UInt8Column()
    favorite_books = quivr.ListColumn(quivr.StringColumn())
The Person Table defined above has three columns: a string name, a
uint8 age, and a list of strings for books.
All of the column types below inherit from Column, and so
they are Descriptors: they all have Column.__get__(),
Column.__set__(), and Column.__set_name__() methods.
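To make the descriptor behaviour concrete, here is a minimal pure-Python sketch. ToyColumn and ToyTable are hypothetical stand-ins written for illustration; they are not quivr's implementation and store plain lists rather than pyarrow arrays.

```python
class ToyColumn:
    """Hypothetical stand-in for a column descriptor (not quivr's code)."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class body is executed.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        # Accessed on an instance: return that instance's data.
        return obj._data[self.name]

    def __set__(self, obj, value):
        # Store the data for this column on the instance.
        obj._data[self.name] = list(value)


class ToyTable:
    name = ToyColumn()
    age = ToyColumn()

    def __init__(self, **columns):
        self._data = {}
        for key, value in columns.items():
            setattr(self, key, value)  # routed through ToyColumn.__set__


people = ToyTable(name=["Ada", "Grace"], age=[36, 45])
print(type(ToyTable.age).__name__)  # class access yields the descriptor
print(people.age)                   # instance access yields the data
```

The same attribute therefore plays two roles: on the class it describes the column, and on an instance it returns that column's data.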
Simple Types#
These are the simplest column types. They all are initialized the same way:
import quivr

class MyTable(quivr.Table):
    # With no arguments, you get a nullable column that doesn't validate data.
    # This could be quivr.UInt8Column(), or quivr.StringColumn(), whatever.
    plain = quivr.Int32Column()
    # Pass nullable=False to get a non-nullable column.
    required = quivr.Int32Column(nullable=False)
    # Pass a validator to check the input data against a constraint.
    positive = quivr.Int32Column(validator=quivr.gt(0))
    # Pass metadata for arbitrary extra information to
    # include. Metadata should be a string-to-string dictionary.
    duration = quivr.Int32Column(metadata={'units': 'seconds'})
    # Set a default value to be used if any inputs are null.
    with_default = quivr.Int32Column(nullable=False, default=3)
When you access a column of a primitive type on a Table
instance, you get a pyarrow.Array back. The data type of the
array is described in this table:
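| Column Type | Data Type | Description |
|---|---|---|
| StringColumn | pyarrow.string() | UTF-8 string data. Strings must all be less than 2**31 characters long. |
| LargeStringColumn | pyarrow.large_string() | UTF-8 string data up to 2**63 characters long. |
| Int8Column | pyarrow.int8() | 8-bit signed integers (-128 to 127) |
| Int16Column | pyarrow.int16() | 16-bit signed integers (-32,768 to 32,767) |
| Int32Column | pyarrow.int32() | 32-bit signed integers (-2**31 to 2**31 - 1) |
| Int64Column | pyarrow.int64() | 64-bit signed integers (-2**63 to 2**63 - 1) |
| UInt8Column | pyarrow.uint8() | 8-bit unsigned integers (0 to 255) |
| UInt16Column | pyarrow.uint16() | 16-bit unsigned integers (0 to 65,535) |
| UInt32Column | pyarrow.uint32() | 32-bit unsigned integers (0 to 2**32 - 1) |
| UInt64Column | pyarrow.uint64() | 64-bit unsigned integers (0 to 2**64 - 1) |
| Float16Column | pyarrow.float16() | 16-bit floating point values |
| Float32Column | pyarrow.float32() | 32-bit floating point values |
| Float64Column | pyarrow.float64() | 64-bit floating point values |
| BooleanColumn | pyarrow.bool_() | Boolean (true/false) values |
| NullColumn | pyarrow.null() | A zero-sized array of nulls |
| BinaryColumn | pyarrow.binary() | Arbitrary binary blobs, variably sized, up to 2**31 bytes long each (about 2GB) |
| LargeBinaryColumn | pyarrow.large_binary() | Arbitrary binary blobs, variably sized, up to 2**63 bytes long each (about 9 exabytes) |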
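The integer bounds listed in the table follow from the bit width alone. As a quick check, this small sketch computes them; the helper name int_range is hypothetical and not part of quivr.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable in `bits` bits."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(f"Int{bits}Column:  {int_range(bits, signed=True)}")
    print(f"UInt{bits}Column: {int_range(bits, signed=False)}")
```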
- class quivr.StringColumn(nullable=True, metadata=None, validator=None, default=None)#
A column for storing strings.
This can be used to store strings of any length, but it is not recommended for storing very long strings (over 2GB, for example). For long strings, use LargeStringColumn instead.
- class quivr.LargeStringColumn(nullable=True, metadata=None, validator=None, default=None)#
A column for storing large strings (over 2**31 bytes long). Large string data is stored in variable-length chunks.
- class quivr.Int8Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 8-bit integers.
- class quivr.Int16Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 16-bit integers.
- class quivr.Int32Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 32-bit integers.
- class quivr.Int64Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 64-bit integers.
- class quivr.UInt8Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 8-bit unsigned integers.
- class quivr.UInt16Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 16-bit unsigned integers.
- class quivr.UInt32Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 32-bit unsigned integers.
- class quivr.UInt64Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 64-bit unsigned integers.
- class quivr.Float16Column(nullable=True, metadata=None, validator=None)#
A column for storing 16-bit floating point numbers.
16-bit floating point arrays have limited support in Arrow, and thus limited support in quivr.
In particular:
- They cannot be written to or read from Parquet files.
- They cannot be constructed from Python floats in a natural way (one must use numpy.float16).
- They don’t support quivr column default values (because they are not supported by Arrow’s compute functions).
- class quivr.Float32Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 32-bit floating point numbers.
- class quivr.Float64Column(nullable=True, metadata=None, validator=None, default=None)#
A column for storing 64-bit floating point numbers.
- class quivr.BooleanColumn(nullable=True, metadata=None, validator=None, default=None)#
A column for storing booleans.
- class quivr.NullColumn(nullable=True, metadata=None, validator=None)#
A column for storing null values.
Nulls are represented as a single bit, and do not take up any memory space.
- class quivr.BinaryColumn(nullable=True, metadata=None, validator=None, default=None)#
A column for storing opaque binary data.
- class quivr.LargeBinaryColumn(nullable=True, metadata=None, validator=None, default=None)#
A column for storing large binary objects (over 2**31 bytes long). Large binary data is stored in variable-length chunks.
Fixed-Size Binary Data#
BinaryColumn and LargeBinaryColumn work with
variably-sized binary. If every item is of identical size, you can
use FixedSizeBinaryColumn to save some overhead.
- class quivr.FixedSizeBinaryColumn(byte_width, nullable=True, metadata=None, validator=None, default=None)#
A column for storing opaque fixed-size binary data.
- Parameters:
  - byte_width (int) – The number of bytes per value.
  - nullable (bool) – Whether the column can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – Optional metadata to associate with the column.
  - validator (Optional[Validator]) – An optional validator to apply to the column.
  - default (Union[None,bytes,Callable[[],bytes]]) – An optional default value for the column. This can be a scalar value or a callable that takes no arguments.
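The overhead mentioned above comes from bookkeeping. The sketch below assumes an Arrow-style layout in which variably-sized binary keeps a list of offsets next to the data, while fixed-size binary can locate each item by arithmetic. The helper names are hypothetical and the code only illustrates the idea; it does not use quivr or pyarrow.

```python
def pack_variable(items):
    """Variable-size layout: one data buffer plus n + 1 offsets."""
    offsets = [0]
    for item in items:
        offsets.append(offsets[-1] + len(item))
    return b"".join(items), offsets


def pack_fixed(items, byte_width):
    """Fixed-size layout: one data buffer and no offsets at all."""
    for item in items:
        if len(item) != byte_width:
            raise ValueError(f"expected {byte_width} bytes, got {len(item)}")
    return b"".join(items)


blobs = [b"\x00\x01\x02\x03", b"\xff\xfe\xfd\xfc", b"abcd"]
var_data, var_offsets = pack_variable(blobs)
fixed_data = pack_fixed(blobs, byte_width=4)

# Reading the second item: offsets lookup versus plain multiplication.
second_variable = var_data[var_offsets[1]:var_offsets[2]]
second_fixed = fixed_data[1 * 4:2 * 4]
print(var_offsets, second_variable == second_fixed)
```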
Decimals#
Decimal data uses fixed-point, which guarantees that it has a consistent number of significant digits.
Decimal columns can be either 128-bit or 256-bit. When you set them up, you provide the “precision” and “scale” to be used.
- class quivr.Decimal128Column(precision, scale, nullable=True, metadata=None, default=None)#
A column for storing arbitrary-precision decimal numbers.
Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).
As an example, Decimal128Column(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively), but neither 12345.67 nor 123.4567.
Decimal128Column(5, -3) can exactly represent the number 12345000 (encoded internally as the 128-bit integer 12345), but neither 123450000 nor 1234500.
If you need a precision higher than 38 significant digits, consider using Decimal256Column.
- Parameters:
  - precision (int) – The number of significant digits.
  - scale (int) – The number of digits after the decimal point.
  - nullable (bool) – Whether the column can contain nulls.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - default (Union[None,Decimal,Callable[[],Decimal]]) – An optional default value for the column. This can be a scalar value or a callable that takes no arguments.
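The precision and scale rules can be checked with Python's standard decimal module. This is a sketch of the scaled-integer idea only; the function name encode_decimal is hypothetical, and the code does not call quivr or pyarrow.

```python
from decimal import Decimal


def encode_decimal(value: Decimal, precision: int, scale: int):
    """Return the scaled integer for `value`, or None if it does not fit."""
    scaled = value.scaleb(scale)  # value * 10**scale, computed exactly here
    if scaled != scaled.to_integral_value():
        return None  # more digits after the point than `scale` allows
    unscaled = int(scaled)
    if abs(unscaled) >= 10 ** precision:
        return None  # more significant digits than `precision` allows
    return unscaled


# precision=7, scale=3
print(encode_decimal(Decimal("1234.567"), 7, 3))
print(encode_decimal(Decimal("12345.67"), 7, 3))
print(encode_decimal(Decimal("123.4567"), 7, 3))

# precision=5, scale=-3 (a negative scale drops trailing zeros)
print(encode_decimal(Decimal("12345000"), 5, -3))
print(encode_decimal(Decimal("1234500"), 5, -3))
```

A value is rejected either because it needs more fractional digits than the scale provides or because its scaled integer has more digits than the precision.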
- class quivr.Decimal256Column(precision, scale, nullable=True, metadata=None, default=None)#
A column for storing arbitrary-precision decimal numbers.
Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).
Values are stored as 256-bit integers.
- Parameters:
  - precision (int) – The number of significant digits.
  - scale (int) – The number of digits after the decimal point.
  - nullable (bool) – Whether the column can contain nulls.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - default (Union[None,Decimal,Callable[[],Decimal]]) – An optional default value for the column. This can be a scalar value or a callable that takes no arguments.
Structured Data#
Columns can contain nested structural data. With these types, each row of the column contains some structure.
It is easy to get confused by this: All columns are “lists” in a
sense, but ListColumn is a two-dimensional structure. Each
row of the column is, itself, a list.
- class quivr.ListColumn(value_type, nullable=True, metadata=None, validator=None)#
A column for storing variably-sized lists of values.
The values in the list can be of any type.
Note that all quivr Tables are storing lists of values, so this column type is only useful for storing lists of lists.
- Parameters:
  - value_type (Union[DataType,Field,Column]) – The type of the values in the list.
  - nullable (bool) – Whether the list can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to run against the column’s values.
- class quivr.FixedSizeListColumn(value_type, list_size, nullable=True, metadata=None, validator=None)#
A column for storing lists of values of a fixed size.
The values in the list can be of any type.
Note that all quivr Tables are storing lists of values, so this column type is only useful for storing lists of lists.
- Parameters:
  - value_type (Union[DataType,Field,Column]) – The type of the values in the list.
  - list_size (int) – The size of the list. Must be greater than 0.
  - nullable (bool) – Whether the list can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to run against the column’s values.
- class quivr.LargeListColumn(value_type, nullable=True, metadata=None, validator=None)#
A column for storing large lists of values (over 2**31 objects).
Unless you need to represent data with more than 2**31 elements, prefer ListColumn.
The values in the list can be of any type.
Note that all quivr Tables are storing lists of values, so this column type is only useful for storing lists of lists.
- Parameters:
  - value_type (Union[DataType,Field,Column]) – The type of the values in the list.
  - nullable (bool) – Whether the list can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to run against the column’s values.
- class quivr.MapColumn(key_type, item_type, nullable=True, metadata=None, validator=None)#
A column for storing maps of key-value pairs.
The keys and values can be of any type, as long as the keys are hashable and unique.
- Parameters:
  - key_type (Union[DataType,Field,Column]) – The type of the keys in the map.
  - item_type (Union[DataType,Field,Column]) – The type of the values in the map.
  - nullable (bool) – Whether the map can contain null key-value pairs.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to run against the column’s values.
- class quivr.StructColumn(fields, nullable=True, metadata=None, validator=None)#
A column for storing structured data.
In general, prefer to define Tables and use their as_column method instead of using StructColumn.
- class quivr.SubTableColumn(table_type, nullable=True, metadata=None)#
A column which represents an embedded Quivr table.
SubTableColumns are generally created through Table.as_column(), and are not created directly.
Types For Encoding Efficiency#
- class quivr.DictionaryColumn(index_type, value_type, ordered=False, nullable=True, metadata=None, validator=None)#
A column for storing dictionary-encoded values.
This is intended for use with categorical data. See MapColumn for a more general mapping type.
- Parameters:
  - index_type (DataType) – The type of the dictionary indices. Must be an integer type.
  - value_type (Union[DataType,Field,Column]) – The type of the values in the dictionary.
  - ordered (bool) – Whether the dictionary is ordered.
  - nullable (bool) – Whether the dictionary can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to run against the column’s values.
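Dictionary encoding replaces each repeated value with a small integer index into a list of distinct values. The sketch below shows the idea in plain Python; the function names are hypothetical, and it illustrates the encoding rather than how quivr or Arrow implement it.

```python
def dictionary_encode(values):
    """Split `values` into integer indices and a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> index into `dictionary`
    indices = []
    for value in values:
        if value is None:
            indices.append(None)  # nulls stay null in the index array
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def dictionary_decode(indices, dictionary):
    """Rebuild the original values from indices and dictionary."""
    return [None if i is None else dictionary[i] for i in indices]


colors = ["red", "green", "red", None, "green", "red"]
indices, dictionary = dictionary_encode(colors)
print(indices)
print(dictionary)
```

The saving grows with the number of repeats: each long or repeated value is stored once, and every row only carries an integer of the chosen index type.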
- class quivr.RunEndEncodedColumn(run_end_type, value_type, nullable=True, metadata=None, validator=None)#
A column for storing run-end encoded data.
This is a special column type that is used to efficiently store highly ordered data. Internally, the data is stored as two arrays:
- An array of values, with all consecutive runs of the same value reduced to a single element
- An array of run ends, with the position at which each run ends
This is more compact than storing the redundant values and also allows for very efficient computations like aggregations upon the data.
- Parameters:
  - run_end_type (DataType) – The type of the run-end encoded values. Must be a 16-, 32-, or 64-bit integer type.
  - value_type (DataType) – The type of the values in the run-end encoded data.
  - nullable (bool) – Whether the data can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to run against the column’s values.
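A pure-Python sketch of run-end encoding follows. It assumes the Arrow convention in which the second array holds the cumulative end position of each run, which is why the index type is called run_end_type. The function names are hypothetical and the code does not use quivr or pyarrow.

```python
from bisect import bisect_right


def run_end_encode(values):
    """Collapse consecutive repeats into (run_values, run_ends)."""
    run_values, run_ends = [], []
    for position, value in enumerate(values, start=1):
        if run_values and run_values[-1] == value:
            run_ends[-1] = position  # extend the current run
        else:
            run_values.append(value)  # start a new run
            run_ends.append(position)
    return run_values, run_ends


def lookup(run_values, run_ends, index):
    """Find the value at logical position `index` by binary search."""
    # The owning run is the first one whose end is greater than `index`.
    return run_values[bisect_right(run_ends, index)]


states = ["a", "a", "a", "b", "b", "c"]
run_values, run_ends = run_end_encode(states)
print(run_values, run_ends)
print([lookup(run_values, run_ends, i) for i in range(len(states))])
```

Storing cumulative ends rather than lengths is what makes random access a binary search instead of a running sum.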
Base Class#
All of the column types inherit from the Column base
class. Users are not expected to use Column
directly. Instead, use one of the appropriate subclasses.
- class quivr.Column(dtype, nullable=True, metadata=None, validator=None, default=None)#
A Column is an accessor for data in a Table, and also a descriptor for the Table’s structure.
This is a base class for all column types. It is not intended to be used directly; instead, use one of its subclasses.
Columns implement the descriptor protocol, so they should be used as class attributes on a Table subclass.
- Parameters:
  - dtype (DataType) – The pyarrow data type of the column.
  - nullable (bool) – Whether the column can contain null values.
  - metadata (Optional[Dict[Union[bytes,str],Union[bytes,str]]]) – A dictionary of metadata to attach to the column.
  - validator (Optional[Validator]) – A validator to use when setting the column.
  - default (Union[None,Callable[[],Any],Any]) – A default value to use when setting the column.
- Variables:
dtype – The pyarrow data type of the column.
nullable – Whether the column can contain null values.
metadata – A dictionary of metadata to attach to the column.
validator – A validator to use when setting the column.
default – A default value to use when setting the column.
- __get__(obj: None, objtype: type) → Self#
- __get__(obj: quivr.tables.Table, objtype: type) → pyarrow.lib.Array
- __get__(obj, objtype)
Gets the Column object from a Table class, or the associated data from a Table instance.
This method is part of the descriptor protocol.
- __set__(obj, value)#
Sets the data for this column on a Table instance.
This method is part of the descriptor protocol.
- __set_name__(owner, name)#
Sets the name of the column.
This method is part of the descriptor protocol.
- fill_default(array)#
Fills null values in the array with the Column’s default value.
- Return type:
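The effect of filling defaults can be sketched on a plain Python list, with None standing in for nulls. This is an illustration only: quivr operates on pyarrow arrays, the name fill_default_sketch is hypothetical, and calling a callable default once per null is an assumption made for this example.

```python
def fill_default_sketch(values, default):
    """Return a copy of `values` with None replaced by the default."""
    if default is None:
        return list(values)  # no default configured: leave nulls alone

    def make_default():
        # A default may be a scalar or a zero-argument callable.
        return default() if callable(default) else default

    return [make_default() if value is None else value for value in values]


print(fill_default_sketch([1, None, 3], default=3))
print(fill_default_sketch([None, 7], default=lambda: 0))
print(fill_default_sketch([None, 7], default=None))
```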
Typing Helpers#
- quivr.columns.T = TypeVar(T, bound=Table)#
Type: TypeVar
Invariant TypeVar bound to quivr.tables.Table.