The following tutorial gives a quick overview of both data algebra and the algebraixlib Python package.

Data in an algebraixlib program is represented as `MathObject`s. `MathObject`s come in four types: `Atom`, `Couplet`, `Set`, and `Multiset`; `Multiset`s are not covered in this tutorial. Values that aren't themselves modeled by data algebra, such as strings and numbers, are represented as `Atom`s.

In [1]:
from algebraixlib.mathobjects import Atom
peanut_butter = Atom("peanut butter")
jelly = Atom("jelly")

Every `MathObject` can be pretty-printed to the console using `print()`.

In [2]:
print(peanut_butter)

'peanut butter'


The non-`MathObject` value of the `Atom` can be accessed by its `value` property.

In [3]:
try:
    one = Atom(1)
    two = Atom(2)
    print("1 + 2 = {}".format(one.value + two.value))
    print("Will throw", one + two)
except TypeError as e:
    print("Error:", e)

1 + 2 = 3
Error: unsupported operand type(s) for +: 'Atom' and 'Atom'


`Couplet`s relate two pieces of information to each other. Both pieces must be represented as `MathObject`s; our two `Atom`s from earlier qualify.

In [4]:
from algebraixlib.mathobjects import Couplet
from algebraixlib.util.latexprinter import iprint_latex
from IPython.display import Math, display

import algebraixlib.util.latexprinter
algebraixlib.util.latexprinter.Config.colorize_output = False

together = Couplet(peanut_butter, jelly)
iprint_latex("together")
iprint_latex("nested", Couplet(together, together))





`MathObject` initializers will coerce their arguments to be `Atom`s if non-`MathObject`s are passed.

In [5]:
coerced = Couplet("this", "that")
print(repr(coerced))

Couplet(left=Atom('this'), right=Atom('that'))


The components of a `Couplet` are known as its `left` and `right`. Sometimes initializing a `Couplet` with named arguments adds clarity.

In [6]:
up_down = Couplet(left="up", right="down")
print("left is {}, right is {}".format(up_down.left, up_down.right))

left is 'up', right is 'down'


A `Couplet`'s components can be swapped by evaluating the unary operation $transpose$.

In [7]:
import algebraixlib.algebras.couplets as couplets
one_two = Couplet(1, 2)
transpose_result = couplets.transpose(one_two)
print("A couplet {} and its transpose {}".format(one_two, transpose_result))
iprint_latex("one_two")
iprint_latex("transpose_result")

A couplet (1->2) and its transpose (2->1)






When an operation in algebraixlib is undefined for its arguments, it returns a special value, the singleton `Undef`. `Undef` cannot be used as a value in a `MathObject` and cannot be compared to any value (not even itself). Use the `is` and `is not` operators to test whether a value is undefined.

In [8]:
from algebraixlib.undef import Undef
print(Undef() is Undef())
print(Undef() is not Undef())
print(None is not Undef())

True
False
True
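The singleton-plus-identity-test pattern described above can be sketched in plain Python. The class `UndefSketch` below is hypothetical and only models the behavior stated in the text (one shared instance, no comparisons allowed); it is not algebraixlib's `Undef` class, whose internals may differ.

```python
class UndefSketch:
    """Hypothetical stand-in for an 'undefined' singleton; not algebraixlib's Undef."""

    _instance = None

    def __new__(cls):
        # Every call returns the same shared instance, so identity tests work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __eq__(self, other):
        # Refuse equality comparisons entirely, even against itself.
        raise TypeError("UndefSketch cannot be compared; use 'is' / 'is not'")

    __ne__ = __eq__
    # Instances are unhashable, so the sentinel cannot be put into a Python set.
    __hash__ = None


print(UndefSketch() is UndefSketch())      # True
print(UndefSketch() is not UndefSketch())  # False
print(None is not UndefSketch())           # True
```

Because `==` raises, the only safe way to test for the sentinel is with `is` or `is not`, which matches the advice above.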


The binary operation $composition(a{\mapsto}b, c{\mapsto}d)$ evaluates to $c{\mapsto}b$ when $a = d$ and is undefined otherwise. Composition is often written with the infix operator $\circ$, so that $b{\mapsto}c \circ a{\mapsto}b = a{\mapsto}c$.

In [9]:
a_to_b = Couplet('a', 'b') # a->b
b_to_c = Couplet('b', 'c') # b->c
iprint_latex(r"b{\mapsto}c \circ a{\mapsto}b", couplets.compose(b_to_c, a_to_b)) # b->c * a->b = a->c
iprint_latex(r"a{\mapsto}b \circ b{\mapsto}c", couplets.compose(a_to_b, b_to_c)) # undef, composition is not commutative
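To make the definitions concrete, here is a plain-Python model of the two couplet operations. It assumes a couplet can be represented as a `(left, right)` tuple and uses `None` in place of `Undef`; the function names mirror `couplets.transpose` and `couplets.compose`, but this is an illustration, not the library's code.

```python
# Illustrative model: a couplet is a (left, right) tuple; None stands in for Undef.

def transpose(couplet):
    """Swap the left and right components of a couplet."""
    left, right = couplet
    return (right, left)

def compose(c1, c2):
    """compose(a->b, c->d) is c->b when a == d, otherwise undefined (None)."""
    a, b = c1
    c, d = c2
    return (c, b) if a == d else None

a_to_b = ('a', 'b')
b_to_c = ('b', 'c')
print(compose(b_to_c, a_to_b))  # ('a', 'c')
print(compose(a_to_b, b_to_c))  # None: composition is not commutative
print(transpose((1, 2)))        # (2, 1)
```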





`Set`s are used to create unordered collections of unique `MathObject`s. Note that this is a different class than Python's built-in `set` container. Non-`MathObject`s will be coerced into `Atom`s by `Set`'s initializer.

In [10]:
from algebraixlib.mathobjects import Set
many = Set(Atom("hello"), "world", Couplet("hola", "mundo"), "duplicate", "duplicate")
iprint_latex("many")
print("repr = ", repr(many))



repr = Set(Atom('duplicate'), Atom('hello'), Couplet(left=Atom('hola'), right=Atom('mundo')), Atom('world'))


`Set`s support `for...in` syntax for iteration and `in` and `not in` syntax for membership tests. Because sets are unordered, they do not support positional indexing.

In [11]:
nums = Set(1, 2, 3, 4, 5)
for elem in nums:
    print(elem)
print(1 in nums)
print(7 in nums)

1
2
3
4
5
True
False


`Set`s support union, intersection, and set difference (`minus`), and the predicates `is_subset_of` and `is_superset_of` are defined on them.

In [12]:
a = Set(1, 2)
b = Set(2, 3)

import algebraixlib.algebras.sets as sets

print("union(a, b) =", sets.union(a, b))
print("intersect(a, b) =", sets.intersect(a, b))
print("minus(a, b) =", sets.minus(a, b))
print("is_subset(a, b) =", sets.is_subset_of(a, b))
print("is_superset(a, {1}) =", sets.is_superset_of(a, Set(1)))

union(a, b) = {1, 2, 3}
intersect(a, b) = {2}
minus(a, b) = {1}
is_subset(a, b) = False
is_superset(a, {1}) = True


We can use a `Couplet` to model a single truth, such as ${sky}{\mapsto}{blue}$ or ${name}{\mapsto}{jeff}$. By collecting multiple `Couplet`s together in a `Set`, we form a mathematical model of a data record. This data structure, called a binary relation (abbreviated from here on as simply "relation"), is the fundamental set theory construct in a data algebra program.

In [13]:
record_relation = Set(Couplet('id', 123), Couplet('name', 'jeff'), Couplet('loves', 'math'),
 Couplet('loves', 'code'))
iprint_latex("record_relation")



Some relations specify a function from each couplet's left component to its right. This is the case when every left value maps to exactly one right value. Such a relation is called "left functional". Likewise, a relation can be said to be "right functional" when every right value maps to exactly one left value.

In [14]:
import algebraixlib.algebras.relations as relations

functional_relation = Set(Couplet('subject', 123), Couplet('name', 'james'), Couplet('level', 10))
print(relations.get_right(functional_relation, 'subject'))
print(relations.get_left(functional_relation, 123))
print(relations.get_right(record_relation,
 'loves')) # See non-functional record_relation above.

123
'subject'
undef
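The functional properties can be checked mechanically. The sketch below assumes relations are modeled as frozensets of `(left, right)` tuples and returns `None` where algebraixlib returns `Undef`. `is_left_functional`, `is_right_functional` and this `get_right` are illustrative helpers; the rule used here (undefined unless exactly one couplet has the requested left) is a simplification, not necessarily what the library checks.

```python
from collections import Counter

# Illustrative model: a relation is a frozenset of (left, right) tuples,
# and None stands in for algebraixlib's Undef.

def is_left_functional(rel):
    """True when no left component appears in more than one couplet."""
    return all(n == 1 for n in Counter(left for left, _ in rel).values())

def is_right_functional(rel):
    """True when no right component appears in more than one couplet."""
    return all(n == 1 for n in Counter(right for _, right in rel).values())

def get_right(rel, left):
    """The right paired with `left` if exactly one couplet has that left, else None."""
    rights = [rgt for lft, rgt in rel if lft == left]
    return rights[0] if len(rights) == 1 else None

functional_relation = frozenset({('subject', 123), ('name', 'james'), ('level', 10)})
record_relation = frozenset({('id', 123), ('name', 'jeff'), ('loves', 'math'), ('loves', 'code')})

print(is_left_functional(functional_relation))    # True
print(is_left_functional(record_relation))        # False: 'loves' appears twice
print(get_right(functional_relation, 'subject'))  # 123
print(get_right(record_relation, 'loves'))        # None
```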


Function evaluation syntax makes this more concise for left functional relations.

In [15]:
subject = functional_relation('subject')
one_two_three = functional_relation(123)
iprint_latex(r"functional_relation(\mbox{'subject'})", subject)
iprint_latex("functional_relation(123)", one_two_three)





The power set of a set $S$, which we'll denote as $P(S)$, is the set of all subsets of $S$. Note how in the example below, the elements of `set_s` are numbers, and the elements of `powerset_s` are sets of numbers.

In [16]:
set_s = Set(1, 2, 3)
powerset_s = sets.power_set(set_s)
iprint_latex("S", set_s)
iprint_latex("P(S)", powerset_s)
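A power set can be enumerated by taking all combinations of every size from $0$ to $|S|$, giving $2^{|S|}$ subsets in total. This is a standard-library sketch using `itertools.combinations`, with frozensets standing in for `Set`; it is not how `sets.power_set` is implemented.

```python
from itertools import chain, combinations

def power_set(s):
    """Return every subset of s as a frozenset, collected in a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return frozenset(frozenset(combo) for combo in subsets)

set_s = frozenset({1, 2, 3})
powerset_s = power_set(set_s)
print(len(powerset_s))  # 8, i.e. 2 ** 3
print(sorted(sorted(subset) for subset in powerset_s))
```

Frozensets are used for the inner subsets because Python's mutable `set` is unhashable and so cannot be an element of another set.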





Consider that if $C$ is the set of all couplets, then the set of all relations $R$ can be defined as $P(C)$; that is, every relation is an element of the power set of all couplets. We can exploit this relationship by "extending" operations on couplets so that they apply to relations. To extend a unary operation such as `couplets.transpose`, we apply it to every `Couplet` in a relation, which results in another relation.

In [17]:
import algebraixlib.extension as ext

first_relation = Set(Couplet('a', 1), Couplet('b', 2), Couplet('c', 3))
transposed_relation = ext.unary_extend(first_relation, couplets.transpose)
iprint_latex("first_relation")
iprint_latex("transposed_relation")
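Unary extension amounts to mapping the couplet operation over the relation. In this plain-Python model, couplets are tuples and relations are frozensets; `unary_extend` here is an illustrative function, not `ext.unary_extend` itself.

```python
# Illustrative model: couplets are (left, right) tuples, relations are frozensets.

def transpose(couplet):
    """Swap the left and right components of a couplet."""
    left, right = couplet
    return (right, left)

def unary_extend(relation, op):
    """Apply a unary couplet operation to every couplet and collect the results."""
    return frozenset(op(c) for c in relation)

first_relation = frozenset({('a', 1), ('b', 2), ('c', 3)})
transposed_relation = unary_extend(first_relation, transpose)
print(sorted(transposed_relation))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```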





Similarly, a binary operation like `couplets.compose` can be extended by evaluating it for every element of the cross product of two relations. Notice that `couplets.compose` is a partial binary operation: given two legitimate `Couplet`s, it may be undefined. When `couplets.compose(a, b)` is undefined, the result is simply not included in the resulting relation. By extending, we have turned $composition$ into a total binary operation in the power set algebra.

In [18]:
second_relation = Set(Couplet('one', 'a'), Couplet('won', 'a'), Couplet('four', 'd'))
composed_relation = ext.binary_extend(first_relation, second_relation, couplets.compose)
empty_relation = ext.binary_extend(second_relation, first_relation,
 couplets.compose) # empty relation; still not commutative
iprint_latex("second_relation")
iprint_latex("composed_relation")
iprint_latex("empty_relation")
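The same idea works for binary operations: evaluate the operation on every pair in the cross product and keep only the defined results. This sketch models couplets as tuples, relations as frozensets, and `Undef` as `None`; `binary_extend` and `compose` are illustrative re-implementations, not the library's functions.

```python
# Illustrative model: couplets are tuples, relations are frozensets, None means undefined.

def compose(c1, c2):
    """compose(a->b, c->d) is c->b when a == d, otherwise None."""
    a, b = c1
    c, d = c2
    return (c, b) if a == d else None

def binary_extend(rel1, rel2, op):
    """Evaluate op on every pair from the cross product; drop undefined (None) results."""
    results = (op(x, y) for x in rel1 for y in rel2)
    return frozenset(r for r in results if r is not None)

first_relation = frozenset({('a', 1), ('b', 2), ('c', 3)})
second_relation = frozenset({('one', 'a'), ('won', 'a'), ('four', 'd')})

composed = binary_extend(first_relation, second_relation, compose)
empty = binary_extend(second_relation, first_relation, compose)
print(sorted(composed))  # [('one', 1), ('won', 1)]
print(empty)             # frozenset()
```

Since undefined pairs are filtered out rather than propagated, the extended operation always returns a relation (possibly the empty one).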







These extended operations are defined as functions in the `relations` module.

In [19]:
transpose_is_same = transposed_relation == relations.transpose(first_relation)
compose_is_same = composed_relation == relations.compose(first_relation, second_relation)
print("transpose_is_same:", transpose_is_same)
print("compose_is_same:", compose_is_same)

transpose_is_same: True
compose_is_same: True


The following triple-quoted string contains a CSV table of words in various languages, with their meanings normalized to English.

In [20]:
vocab_csv = """word,language,meaning
hello,English,salutation
what's up,English,salutation
hola,Spanish,salutation
world,English,earth
mundo,Spanish,earth
gallo,Spanish,rooster
Duniyā,Hindi,earth
Kon'nichiwa,Japanese,salutation
hallo,German,salutation
nuqneH,Klingon,salutation
sekai,Japanese,earth
schmetterling,German,butterfly
mariposa,Spanish,butterfly
"""

Tables can be modeled as sets of binary relations, which we call "clans". In the case of tables, we can further specify that each relation will be a function from left to right (since each (row, header) coordinate corresponds to exactly one element).

In [21]:
from io import StringIO
from algebraixlib.io.csv import import_csv

file = StringIO(vocab_csv)
vocab_clan = import_csv(file)
iprint_latex("vocab_clan")
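Conceptually, importing a CSV table into a clan turns each row into a relation from column header to cell value. A minimal sketch with the standard `csv` module follows; `import_csv_as_clan` is a hypothetical helper (not `algebraixlib.io.csv.import_csv`), relations are frozensets of `(header, value)` tuples, and all values stay strings.

```python
import csv
from io import StringIO

def import_csv_as_clan(text):
    """Hypothetical helper: each CSV row becomes a relation {header -> value}."""
    reader = csv.DictReader(StringIO(text))
    return frozenset(frozenset(row.items()) for row in reader)

sample = """word,language,meaning
hello,English,salutation
hola,Spanish,salutation
mundo,Spanish,earth
"""
clan = import_csv_as_clan(sample)
print(len(clan))  # 3
```

Each header occurs once per row, so every relation produced this way is left functional. Because the clan is a set in this model, two identical rows would collapse into a single relation.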



$superstrict(A, B)$ is a partial binary operation on sets. It evaluates to $A$ if $A$ is a superset of $B$ and is undefined otherwise. We use the infix operator $\vartriangleright$ for superstriction.

In [22]:
hello_relation = Set(Couplet('word', 'hello'), Couplet('language', 'English'),
 Couplet('meaning', 'salutation'))
super_pos = sets.superstrict(hello_relation, Set(Couplet('language', 'English')))
super_neg = sets.superstrict(hello_relation, Set(Couplet('language', 'Mandarin')))

iprint_latex("hello_relation", hello_relation)
iprint_latex(r"hello_relation \vartriangleright \{ \mbox{'language'}{\mapsto}\mbox{'English'} \}", super_pos)
iprint_latex(r"hello_relation \vartriangleright \{ \mbox{'language'}{\mapsto}\mbox{'Mandarin'} \}", super_neg)
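On sets, superstriction reduces to a superset test. In this sketch relations are frozensets of tuples, so Python's `>=` operator performs the superset check, and `None` stands in for `Undef`; `superstrict` here is illustrative, not `sets.superstrict`.

```python
def superstrict(a, b):
    """Return a if a is a superset of b, else undefined (None)."""
    return a if a >= b else None

hello_relation = frozenset({('word', 'hello'), ('language', 'English'), ('meaning', 'salutation')})
super_pos = superstrict(hello_relation, frozenset({('language', 'English')}))
super_neg = superstrict(hello_relation, frozenset({('language', 'Mandarin')}))
print(super_pos == hello_relation)  # True
print(super_neg)                    # None
```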







By extending the $superstrict$ operation to clans, which are sets of sets (of couplets), we can define a helpful mechanism to restrict `vocab_clan` to only those relations that contain particular values.

In [23]:
import algebraixlib.algebras.clans as clans
salutation_records_clan = clans.superstrict(vocab_clan, Set(Set(Couplet('meaning', 'salutation'))))
earth_records_clan = clans.superstrict(vocab_clan, Set(Set(Couplet('meaning', 'earth'))))
iprint_latex("salutation_records_clan")
iprint_latex("earth_records_clan")
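Extending superstriction to clans follows the binary-extension pattern: superstrict every relation of the left clan against every relation of the right clan and collect the defined results. The sketch below uses a three-row excerpt of the vocabulary table, with clans as frozensets of frozensets; `clan_superstrict` is an illustrative helper, not `clans.superstrict`.

```python
def superstrict(a, b):
    """Return a if a is a superset of b, else None (standing in for Undef)."""
    return a if a >= b else None

def clan_superstrict(clan1, clan2):
    """Binary extension: keep each relation of clan1 that is a superset of some relation of clan2."""
    results = (superstrict(r1, r2) for r1 in clan1 for r2 in clan2)
    return frozenset(r for r in results if r is not None)

vocab = frozenset({
    frozenset({('word', 'hello'), ('language', 'English'), ('meaning', 'salutation')}),
    frozenset({('word', 'mundo'), ('language', 'Spanish'), ('meaning', 'earth')}),
    frozenset({('word', 'hola'), ('language', 'Spanish'), ('meaning', 'salutation')}),
})

salutations = clan_superstrict(vocab, frozenset({frozenset({('meaning', 'salutation')})}))
earths = clan_superstrict(vocab, frozenset({frozenset({('meaning', 'earth')})}))
print(sorted(dict(r)['word'] for r in salutations))  # ['hello', 'hola']
print(sorted(dict(r)['word'] for r in earths))       # ['mundo']
```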





Our extended `relations.compose` operation from earlier can be extended again to work with clans. By choosing an appropriate right-hand argument, clan composition can model the relational algebra notion of projection.

In [24]:
words_langs_clan = Set(Set(Couplet('word', 'word'), Couplet('language', 'language')))
iprint_latex("words_langs_clan")



The `relations.diag` and `clans.diag` utility functions create a "diagonal" relation or clan, respectively, with simpler syntax.

In [25]:
assert words_langs_clan == clans.diag('word', 'language')

Since the meaning ('salutation') is the same in every relation of `salutation_records_clan`, the 'meaning' `Couplet`s carry no information and can be dropped. Note that the cardinality of the resulting clan is unchanged, but each relation now contains only two `Couplet`s.

In [26]:
salutation_words_n_langs_clan = clans.compose(salutation_records_clan, words_langs_clan)
iprint_latex("salutation_words_n_langs_clan")



We can take this one step further and "rename" the 'word' attribute to something more specific by replacing the value 'word' with 'salutation' wherever it appears as the left of a `Couplet`. This both compresses the information in each relation and sets our data up for later processing.

In [27]:
salutations_n_langs_clan = clans.compose(salutation_words_n_langs_clan,
 Set(Set(Couplet("salutation", "word"),
 Couplet("language", "language"))))
iprint_latex("salutations_n_langs_clan")
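Projection and renaming both fall out of composing each row with a small "diagonal" or renaming relation. Below is a plain-Python model of clan composition (the binary extension of relation composition, which is itself the extension of couplet composition), applied to two rows of the vocabulary table. `diag`, `relation_compose` and `clan_compose` are illustrative re-implementations with `None` standing in for `Undef`, not the library's `clans.diag` or `clans.compose`.

```python
def compose(c1, c2):
    """Couplet composition: compose(a->b, c->d) is c->b when a == d, else None."""
    a, b = c1
    c, d = c2
    return (c, b) if a == d else None

def relation_compose(r1, r2):
    """Binary extension of compose to relations; undefined pairs are dropped."""
    results = (compose(x, y) for x in r1 for y in r2)
    return frozenset(c for c in results if c is not None)

def clan_compose(k1, k2):
    """Binary extension of relation_compose to clans (sets of relations)."""
    return frozenset(relation_compose(r1, r2) for r1 in k1 for r2 in k2)

def diag(*names):
    """A clan holding the single relation {name->name for each name}."""
    return frozenset({frozenset((n, n) for n in names)})

rows = frozenset({
    frozenset({('word', 'hello'), ('language', 'English'), ('meaning', 'salutation')}),
    frozenset({('word', 'hola'), ('language', 'Spanish'), ('meaning', 'salutation')}),
})

# Projection: keep only the 'word' and 'language' couplets.
projected = clan_compose(rows, diag('word', 'language'))
# Projection plus rename: 'word' becomes 'salutation' on the left.
renaming = frozenset({frozenset({('salutation', 'word'), ('language', 'language')})})
renamed = clan_compose(rows, renaming)

for rel in sorted(sorted(r) for r in renamed):
    print(rel)
```

The 'meaning' couplets disappear because no couplet in the right-hand relation has 'meaning' as its right component, so their compositions are undefined.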



We'll do the same for `earth_records_clan`, but do the projection and "rename" all in one composition operation.

In [28]:
earths_n_langs_clan = clans.compose(earth_records_clan,
 Set(Set(Couplet("earth", "word"),
 Couplet("language", "language"))))
iprint_latex("earths_n_langs_clan")



Our next task is to relate these clans to each other in a way that preserves the left-functional property of every relation. We can define a partial binary operation $functional\_union(A, B)$ on relations that evaluates to $union(A, B)$ if $union(A, B)$ is left functional and is undefined otherwise.

In [29]:
func_union_pos = relations.functional_union(hello_relation,
 Set(Couplet('language', 'English'),
 Couplet('more', 'info')))
func_union_neg = relations.functional_union(hello_relation,
 Set(Couplet('language', 'Spanish'),
 Couplet('more', 'info')))
iprint_latex("func_union_pos")
iprint_latex("func_union_neg")
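A functional union is an ordinary union followed by a left-functional check. This sketch models relations as frozensets of tuples and returns `None` in place of `Undef`; `is_left_functional` and `functional_union` are illustrative helpers, not `relations.functional_union` itself.

```python
from collections import Counter

def is_left_functional(rel):
    """True when no left component appears in more than one couplet."""
    return all(n == 1 for n in Counter(left for left, _ in rel).values())

def functional_union(a, b):
    """union(a, b) if the union is left functional, otherwise undefined (None)."""
    u = a | b
    return u if is_left_functional(u) else None

hello_relation = frozenset({('word', 'hello'), ('language', 'English'), ('meaning', 'salutation')})
func_union_pos = functional_union(hello_relation, frozenset({('language', 'English'), ('more', 'info')}))
func_union_neg = functional_union(hello_relation, frozenset({('language', 'Spanish'), ('more', 'info')}))
print(len(func_union_pos))  # 4: the shared couplet language->English appears once
print(func_union_neg)       # None: 'language' would map to two values
```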





Extending this operation to clans models natural join-like behavior.

In [30]:
salutations_words_langs_clan = clans.cross_functional_union(salutations_n_langs_clan,
 earths_n_langs_clan)
iprint_latex("salutations_words_langs_clan")
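Here is a compact model of the clan-level extension, showing why it behaves like a natural join on the shared 'language' attribute: pairs of rows that disagree on 'language' produce a union that is not left functional and are dropped. Relations are frozensets of tuples and `None` stands in for `Undef`; the helper names are illustrative, not the library's API.

```python
from collections import Counter

def is_left_functional(rel):
    """True when no left component appears in more than one couplet."""
    return all(n == 1 for n in Counter(left for left, _ in rel).values())

def functional_union(a, b):
    """union(a, b) if the union is left functional, otherwise None."""
    u = a | b
    return u if is_left_functional(u) else None

def cross_functional_union(k1, k2):
    """Functional union of every pair of relations (one from each clan); undefined results are dropped."""
    results = (functional_union(r1, r2) for r1 in k1 for r2 in k2)
    return frozenset(r for r in results if r is not None)

salutations = frozenset({
    frozenset({('salutation', 'hello'), ('language', 'English')}),
    frozenset({('salutation', 'hola'), ('language', 'Spanish')}),
    frozenset({('salutation', 'hallo'), ('language', 'German')}),
})
earths = frozenset({
    frozenset({('earth', 'world'), ('language', 'English')}),
    frozenset({('earth', 'mundo'), ('language', 'Spanish')}),
})

joined = cross_functional_union(salutations, earths)
for rel in joined:
    print(dict(rel))
```

As with an inner join, the German row has no partner in `earths` and does not appear in the result.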



Now that the clans have been related to each other through their 'language' attributes, we can do another projection. Notice how "renaming" 'word' to 'salutation' and 'earth' lets us tell the two words apart after joining the clans.

In [31]:
salutations_n_words_clan = clans.compose(salutations_words_langs_clan,
 clans.diag('salutation', 'earth'))
iprint_latex("salutations_n_words_clan")



Finally, we will distill this data down to a single relation describing "Hello, World" phrases.

In [32]:
greeting_relation = Set(Couplet(rel('salutation'), rel('earth'))
 for rel in salutations_n_words_clan)
iprint_latex("Greetings!!!", greeting_relation)
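As a cross-check on the final result, the same question (which salutation and which word for "earth" share a language?) can be answered directly with the standard library. This is a conventional filter-and-join in plain Python, not data algebra, and it repeats `vocab_csv` so the block runs on its own.

```python
import csv
from io import StringIO

vocab_csv = """word,language,meaning
hello,English,salutation
what's up,English,salutation
hola,Spanish,salutation
world,English,earth
mundo,Spanish,earth
gallo,Spanish,rooster
Duniyā,Hindi,earth
Kon'nichiwa,Japanese,salutation
hallo,German,salutation
nuqneH,Klingon,salutation
sekai,Japanese,earth
schmetterling,German,butterfly
mariposa,Spanish,butterfly
"""

rows = list(csv.DictReader(StringIO(vocab_csv)))

# Restrict by meaning and keep (word, language), like the superstrict + compose steps.
salutations = {(r['word'], r['language']) for r in rows if r['meaning'] == 'salutation'}
earths = {(r['word'], r['language']) for r in rows if r['meaning'] == 'earth'}

# Join on language, then keep only the salutation -> earth pairs.
greetings = {(s, e) for s, s_lang in salutations for e, e_lang in earths if s_lang == e_lang}

for salutation, earth in sorted(greetings):
    print(f"{salutation} -> {earth}")
# Kon'nichiwa -> sekai
# hello -> world
# hola -> mundo
# what's up -> world
```

German, Hindi and Klingon drop out because each of those languages has either a salutation or a word for "earth" in the table, but not both.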



----
© Copyright Permission.io, Inc. (formerly known as Algebraix Data Corporation), Copyright (c) 2022.

This file is part of [`algebraixlib`][].

[`algebraixlib`][] is free software: you can redistribute it and/or modify it under the terms of [version 3 of the GNU Lesser General Public License][] as published by the [Free Software Foundation][].

[`algebraixlib`][] is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with [`algebraixlib`][]. If not, see [GNU licenses][].

[`algebraixlib`]: (A Python library for data algebra)
[Version 3 of the GNU Lesser General Public License]: 
[Free Software Foundation]: 
[GNU licenses]: 