Linkages Example#
Let’s imagine we have two tables: one is a table of people, and the other is a table of their pets. The people table has a unique identifier for each person, and the pets table has a column that contains the identifier of the person who owns the pet.
from quivr import Table, StringColumn, UInt32Column, Linkage, concatenate
class People(Table):
id = UInt32Column()
name = StringColumn()
age = UInt32Column()
class Pets(Table):
name = StringColumn()
owner_id = UInt32Column()
species = StringColumn()
For example:
Owners |
||
|---|---|---|
id |
name |
age |
1 |
Bob |
30 |
2 |
Sue |
25 |
3 |
Joe |
40 |
4 |
Mary |
35 |
5 |
John |
50 |
Pets |
|||
|---|---|---|---|
name |
owner_id |
owner_id |
species |
Fido |
1 |
1 |
Dog |
Spot |
1 |
1 |
Dog |
Mittens |
2 |
2 |
Cat |
Rover |
3 |
3 |
Dog |
Lucy |
4 |
4 |
Dog |
Whiskers |
5 |
5 |
Cat |
Max |
5 |
5 |
Dog |
Let’s suppose we want to get the average age of people who own a particular species of pet.
The problem linkages solve#
One way without linkages would be to construct a new table that merges the two tables together:
class PetsAndOwners(Table):
owner = People.as_column()
pets = Pets.as_column()
But this table is inefficient: it contains extra copies of the owner information, and it includes columns we won’t use like the owner’s name. It also requires us to duplicate the owner information for each pet they own.
Take a look:
owner.id |
owner.name |
owner.age |
pet.name |
pet.owner_id |
pet.species |
|---|---|---|---|---|---|
1 |
Bob |
30 |
Fido |
1 |
Dog |
1 |
Bob |
30 |
Spot |
1 |
Dog |
2 |
Sue |
25 |
Mittens |
2 |
Cat |
3 |
Joe |
40 |
Rover |
3 |
Dog |
4 |
Mary |
35 |
Lucy |
4 |
Dog |
5 |
John |
50 |
Whiskers |
5 |
Cat |
5 |
John |
50 |
Max |
5 |
Dog |
Building a Linkage#
Instead, we’d like to link the two tables together. We can do this by using quivr.Linkage:
people = People.from_data(
id=[1, 2, 3, 4, 5],
name=['Bob', 'Sue', 'Joe', 'Mary', 'John'],
age=[30, 25, 40, 35, 50]
)
pets = Pets.from_data(
name=['Fido', 'Spot', 'Mittens', 'Rover', 'Lucy', 'Whiskers', 'Max'],
owner_id=[1, 1, 2, 3, 4, 5, 5],
species=['Dog', 'Dog', 'Cat', 'Dog', 'Dog', 'Cat', 'Dog']
)
linkage = Linkage(people, pets, people.id, pets.owner_id)
Using the linkage#
The linkage has two main methods: quivr.Linkage.select and
quivr.Linkage.iterate. These methods are also aliased for
ergonomics: you can use linkage[...] to select rows, and for row
in linkage to iterate over rows.
Let’s select the info associated with the person with ID=1:
person, pets = linkage.select(1)
print(person.name)
# [
# "Bob"
# ]
print(pets.name)
# [
# "Fido",
# "Spot"
# ]
We can also iterate over all the groups in the linkage. This yields
out tuples of (id, owner, pets) in our case. In general, it yields
the (key, left_table, right_table) for each unique key found
in the two tables.
for id, owner, pets in linkage:
print(f"{owner.name[0]} has {len(pets)} pets")
# Mary has 1 pets
# John has 2 pets
# Bob has 2 pets
# Sue has 1 pets
# Joe has 1 pets
One thing to notice here is that linkages are unsorted. The order that the keys were provided does not necessarily correspond to the order of the rows yielded from the linkage.
Using the linkage in computation#
Let’s put this together to compute the average age of people who own cats and dogs:
cat_owners = []
dog_owners = []
for id, owner, pets in linkage:
if 'Cat' in pets.species.tolist():
cat_owners.append(owner)
if 'Dog' in pets.species.tolist():
dog_owners.append(owner)
cat_owners = concatenate(cat_owners)
dog_owners = concatenate(dog_owners)
print(cat_owners.age.to_numpy().mean())
# 37.5
print(dog_owners.age.to_numpy().mean())
# 38.75
There are a few things to note here.
Iteration yields
Tableinstances. Each group is a table that contains the rows from the left and right tables that match the key. In this case, the left table is theownerstable, and the right table is thepetstable.The
Tableinstances are views into the original tables. No data is copied. This makes linkages very efficient.Combining across multiple groups is handled with
quivr.concatenate. This utility function takes a list of tables and concatenates them together.