Linkages Example#

Let’s imagine we have two tables: one is a table of people, and the other is a table of their pets. The people table has a unique identifier for each person, and the pets table has a column that contains the identifier of the person who owns the pet.

import quivr as qv

class People(qv.Table):
    id = qv.UInt32Column()
    name = qv.StringColumn()
    age = qv.UInt32Column()

class Pets(qv.Table):
    name = qv.StringColumn()
    owner_id = qv.UInt32Column()
    species = qv.StringColumn()

For example:

Owners

id

name

age

1

Bob

30

2

Sue

25

3

Joe

40

4

Mary

35

5

John

50

Pets

name

owner_id

owner_id

species

Fido

1

1

Dog

Spot

1

1

Dog

Mittens

2

2

Cat

Rover

3

3

Dog

Lucy

4

4

Dog

Whiskers

5

5

Cat

Max

5

5

Dog

Let’s suppose we want to get the average age of people who own a particular species of pet.

The problem linkages solve#

One way without linkages would be to construct a new table that merges the two tables together:

class PetsAndOwners(qv.Table):
    owner = People.as_column()
    pets = Pets.as_column()

But this table is inefficient: it contains extra copies of the owner information, and it includes columns we won’t use like the owner’s name. It also requires us to duplicate the owner information for each pet they own.

Take a look:

owner.id

owner.name

owner.age

pet.name

pet.owner_id

pet.species

1

Bob

30

Fido

1

Dog

1

Bob

30

Spot

1

Dog

2

Sue

25

Mittens

2

Cat

3

Joe

40

Rover

3

Dog

4

Mary

35

Lucy

4

Dog

5

John

50

Whiskers

5

Cat

5

John

50

Max

5

Dog

Building a Linkage#

Instead, we’d like to link the two tables together. We can do this by using quivr.Linkage:

people = People.from_kwargs(
    id=[1, 2, 3, 4, 5],
    name=['Bob', 'Sue', 'Joe', 'Mary', 'John'],
    age=[30, 25, 40, 35, 50]
)

pets = Pets.from_kwargs(
    name=['Fido', 'Spot', 'Mittens', 'Rover', 'Lucy', 'Whiskers', 'Max'],
    owner_id=[1, 1, 2, 3, 4, 5, 5],
    species=['Dog', 'Dog', 'Cat', 'Dog', 'Dog', 'Cat', 'Dog']
)

linkage = qv.Linkage(people, pets, people.id, pets.owner_id)

Using the linkage#

The linkage has two main methods: quivr.Linkage.select and quivr.Linkage.iterate. These methods are also aliased for ergonomics: you can use linkage[...] to select rows, and for row in linkage to iterate over rows.

Let’s select the info associated with the person with ID=1:

person, pets = linkage.select(1)
print(person.name)
# [
#  "Bob"
# ]
print(pets.name)
# [
#  "Fido",
#  "Spot"
# ]

We can also iterate over all the groups in the linkage. This yields out tuples of (id, owner, pets) in our case. In general, it yields the (key, left_table, right_table) for each unique key found in the two tables.

for id, owner, pets in linkage:
    print(f"{owner.name[0]} has {len(pets)} pets")
# Mary has 1 pets
# John has 2 pets
# Bob has 2 pets
# Sue has 1 pets
# Joe has 1 pets

One thing to notice here is that linkages are unsorted. The order that the keys were provided does not necessarily correspond to the order of the rows yielded from the linkage.

Using the linkage in computation#

Let’s put this together to compute the average age of people who own cats and dogs:

cat_owners = []
dog_owners = []
for id, owner, pets in linkage:
    if 'Cat' in pets.species.tolist():
        cat_owners.append(owner)
    if 'Dog' in pets.species.tolist():
        dog_owners.append(owner)

cat_owners = qv.concatenate(cat_owners)
dog_owners = qv.concatenate(dog_owners)

print(cat_owners.age.to_numpy().mean())
# 37.5
print(dog_owners.age.to_numpy().mean())
# 38.75

There are a few things to note here.

  1. Iteration yields Table instances. Each group is a table that contains the rows from the left and right tables that match the key. In this case, the left table is the owners table, and the right table is the pets table.

  2. The Table instances are views into the original tables. No data is copied. This makes linkages very efficient.

  3. Combining across multiple groups is handled with quivr.concatenate. This utility function takes a list of tables and concatenates them together.