Data in an algebraixlib program is represented as `MathObject`s. `MathObject`s come in four types: `Atom`, `Couplet`, `Set`, and `Multiset`. A multiset is a mathematical construct similar to a set, except that it drops the restriction that all elements must be unique. A multiset is usually described as a function that maps each of its elements to a positive integer giving the number of times that element occurs in the multiset.

### The `Multiset` MathObject

In [1]:
from algebraixlib.mathobjects import Atom, Set, Couplet, Multiset

# Construct a multiset
ms_1 = Multiset({'a': 2, 'b': 3})
ms_2 = Multiset(['b', 'b', 'c'])

`Multiset` mathobjects can be constructed in two ways: by providing a dictionary that maps the elements to their multiplicities, or by providing a list or array, in which case the constructor tallies the equal elements.

In [2]:
print(ms_1)
print(ms_2)

['a':2, 'b':3]
['b':2, 'c':1]


The printed representation of a multiset is enclosed in brackets and uses Python dictionary style syntax to show each element and the number of times that element is in the multiset.
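
For readers who know the Python standard library, the two construction styles have a close analogue in `collections.Counter`. The sketch below is only an analogy and does not use algebraixlib; the names `from_mapping` and `from_sequence` are made up for illustration.

```python
from collections import Counter

# Analogy only: Counter is a stdlib multiset-like mapping, not an algebraixlib type.
from_mapping = Counter({'a': 2, 'b': 3})   # multiplicities given explicitly
from_sequence = Counter(['b', 'b', 'c'])   # equal elements are tallied

print(from_mapping['a'])    # multiplicity of 'a'
print(from_sequence['b'])   # multiplicity of 'b'

# Both styles describe the same kind of object: element -> positive count.
print(from_sequence == Counter({'b': 2, 'c': 1}))
```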

A multiset can tell you about the data it contains through iteration and the `get_multiplicity` method.

In [3]:
# Multiplicity and Iteration functions.
a = Atom('a')
print("This multiset has a cardinality of " + str(ms_1.cardinality))
print("There are " + str(ms_1.get_multiplicity(a)) + " elements of " + str(a) + " in ms_1") 
# iteration
for elem in ms_1:
    print(elem)

This multiset has a cardinality of 5
There are 2 elements of 'a' in ms_1
'a'
'a'
'b'
'b'
'b'


Notice that when iterating over a multiset, the iterator returns each element as many times as its multiplicity.
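
The relationship between cardinality, multiplicity, and iteration can be sketched with a plain `collections.Counter`. This is an illustration of the arithmetic only, under the assumption that a `Counter` is an acceptable stand-in; the variable names are hypothetical and nothing here calls algebraixlib.

```python
from collections import Counter

counts = Counter({'a': 2, 'b': 3})       # stand-in for the contents of ms_1

cardinality = sum(counts.values())       # total number of elements, repeats included
multiplicity_a = counts['a']             # how many times 'a' occurs
expanded = sorted(counts.elements())     # each element repeated per its multiplicity

print(cardinality)      # 5
print(multiplicity_a)   # 2
print(expanded)         # ['a', 'a', 'b', 'b', 'b']
```

The cardinality is the sum of all multiplicities, which is why iterating yields exactly that many items.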

The next code snippet loads some utility functions that render a mathobject as LaTeX markup, making it easier to read.

In [4]:
# Loading utility printing functions.
import algebraixlib.util.latexprinter
algebraixlib.util.latexprinter.Config.colorize_output = False

from algebraixlib.util.latexprinter import math_object_to_latex, iprint_latex
from IPython.display import Math, display

### Multiset Algebra Operations
The same operations that are available in the set algebra have been implemented in the multiset algebra. The notable difference is that the operations take the multiplicities of the elements into account.

The union operation in the multiset algebra merges its arguments into one result multiset in which the multiplicity of each element is the `max` of its multiplicities in the arguments. The intersect operation does the same with the `min` of the multiplicities.

In [5]:
import algebraixlib.algebras.multisets as multisets

iprint_latex("ms_1", ms_1)
iprint_latex("ms_2", ms_2)
iprint_latex("simple_union", r"ms\_1 \cup ms\_2")
simple_union = multisets.union(ms_1, ms_2) # Execute the operation
iprint_latex("simple_union", simple_union)









In [6]:
iprint_latex("simple_intersect", r"ms\_1 \cap ms\_2")
simple_intersect = multisets.intersect(ms_1, ms_2) # Execute the operation
iprint_latex("simple_intersect", simple_intersect)
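
The max/min rule can be checked by hand with `collections.Counter`, whose `|` and `&` operators happen to follow the same per-element maximum and minimum. This is a sketch with stand-in objects (`c_1`, `c_2` are hypothetical names mirroring `ms_1` and `ms_2`), not a call into algebraixlib.

```python
from collections import Counter

c_1 = Counter({'a': 2, 'b': 3})   # mirrors ms_1
c_2 = Counter({'b': 2, 'c': 1})   # mirrors ms_2

union_sketch = c_1 | c_2       # per-element maximum of the multiplicities
intersect_sketch = c_1 & c_2   # per-element minimum; a minimum of 0 drops the element

print(dict(union_sketch))
print(dict(intersect_sketch))
```

Only `'b'` occurs in both arguments, so it is the only element that survives the intersection, with multiplicity `min(3, 2) = 2`.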





This algebra also provides operations `add` and `minus`.

The `add` operation sums the multiplicities of like elements in the arguments into one result multiset. The `minus` operation subtracts the multiplicities of the elements in the right multiset from those in the left multiset. Elements whose resulting multiplicity is less than or equal to 0 are removed from the result, since a multiplicity is defined as being greater than 0.

In [7]:
iprint_latex("simple_add", r"ms\_1 + ms\_2")
simple_add = multisets.add(ms_1, ms_2) # Execute the operation
iprint_latex("simple_add", simple_add)





In [8]:
iprint_latex("simple_minus", r"ms\_1 - ms\_2")
simple_minus = multisets.minus(ms_1, ms_2) # Execute the operation
iprint_latex("simple_minus", simple_minus)
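
The arithmetic behind `add` and `minus` can be sketched with `collections.Counter`, whose `+` and `-` operators also sum multiplicities and discard results that are not positive. The names `c_1` and `c_2` are hypothetical stand-ins for `ms_1` and `ms_2`; the sketch does not use algebraixlib.

```python
from collections import Counter

c_1 = Counter({'a': 2, 'b': 3})   # mirrors ms_1
c_2 = Counter({'b': 2, 'c': 1})   # mirrors ms_2

added = c_1 + c_2       # multiplicities of like elements are summed
minus_12 = c_1 - c_2    # 'b': 3 - 2 = 1; 'c' would be -1 and is dropped
minus_21 = c_2 - c_1    # 'b': 2 - 3 is not positive, so only 'c' remains

print(dict(added))
print(dict(minus_12))
print(dict(minus_21))
```

Swapping the arguments of the subtraction changes the result, which shows that `minus`, unlike `add`, is not commutative.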





### Multiclan Algebra Operations
Like multisets, where elements are not restricted to be unique, multiclans allow the same flexibility for relations: the relations in a multiclan are not restricted to being unique. Unlike multisets, which have their own `MathObject` type (`Multiset`), multiclans do not have a dedicated type. Instead, a multiclan is represented as a multiset of relations.

In [9]:
# Building some relations for our multiclans.
rel_1 = Set(Couplet('x', 'y'), Couplet('w', 'y'))
rel_2 = Set(Couplet('a', 'x'), Couplet('b', 'w'))
rel_3 = Set(Couplet('x', 'z'), Couplet('v', 'y'))
rel_4 = Set(Couplet('c', 'z'), Couplet('a', 'v'))
rel_5 = Set(Couplet('b', 'w'), Couplet('w', 'y'))
rel_6 = Set(Couplet('a', 'x'), Couplet('x', 'y'))

In [10]:
# Creating multiclans (multisets of relations)
mc_1 = Multiset({rel_1: 2, rel_3: 3})
mc_2 = Multiset({rel_2: 5, rel_4: 1})
mc_3 = Multiset({rel_1: 2, rel_6: 7})
mc_4 = Multiset({rel_2: 5, rel_5: 11})

In [11]:
iprint_latex("mc_1", mc_1)
iprint_latex("mc_2", mc_2)
iprint_latex("mc_3", mc_3)
iprint_latex("mc_4", mc_4)









Note that all the multiclans use prime numbers for multiplicities. This is done on purpose, to help the reader see which left-hand and right-hand relations were combined by a binary operation. Since the multiplicity of each result is the product of the arguments' multiplicities, in most cases you can factor a result multiplicity to recover the two arguments' prime numbers.

`transpose` swaps the left and right components of the couplets in each relation; multiplicity is unaffected.

In [12]:
import algebraixlib.algebras.multiclans as multiclans

iprint_latex("mc_1", mc_1)
iprint_latex("simple_transpose", r"\overleftrightarrow{mc\_1}")
simple_transpose = multiclans.transpose(mc_1) # Execute the operation
iprint_latex("simple_transpose", simple_transpose)
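
What transposition does to the couplets and the multiplicities can be sketched in plain Python. The modeling is an assumption of this sketch, not the library's representation: a relation is a `frozenset` of `(left, right)` tuples, a multiclan is a `Counter` keyed by relations, and the helper names are hypothetical.

```python
from collections import Counter

# Stand-ins for rel_1 and rel_3 as frozensets of (left, right) tuples.
r_1 = frozenset({('x', 'y'), ('w', 'y')})
r_3 = frozenset({('x', 'z'), ('v', 'y')})
m_1 = Counter({r_1: 2, r_3: 3})   # mirrors mc_1


def transpose_relation(relation):
    """Swap left and right in every couplet of one relation."""
    return frozenset((right, left) for left, right in relation)


def transpose_multiclan(multiclan):
    """Transpose each relation; multiplicities carry over unchanged."""
    result = Counter()
    for relation, count in multiclan.items():
        result[transpose_relation(relation)] += count
    return result


transposed = transpose_multiclan(m_1)
for relation, count in transposed.items():
    print(sorted(relation), count)
```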







The multiclan `compose` operation composes every relation of the left multiclan with every relation of the right multiclan. The multiplicity of each result is the product of the multiplicities of the two relations involved.

In [13]:
iprint_latex("mc_1", mc_1)
iprint_latex("mc_2", mc_2)
simple_compose_1 = multiclans.compose(mc_1, mc_2)
iprint_latex("simple_compose_1", r"mc\_1 \circ mc\_2")
iprint_latex("simple_compose_1", simple_compose_1)









In [14]:
simple_compose_2 = multiclans.compose(mc_2, mc_1)
iprint_latex("simple_compose_2", r"mc\_2 \circ mc\_1")
iprint_latex("simple_compose_2", simple_compose_2)





`cross_union` applies the relation union to every pair of relations, one taken from each argument multiclan. Again, the resulting multiplicity is the product of the two relations' multiplicities.

In [15]:
simple_cross_union_1 = multiclans.cross_union(mc_1, mc_2)
iprint_latex("simple_cross_union_1", r"mc\_1 \blacktriangledown mc\_2")
iprint_latex("simple_cross_union_1", simple_cross_union_1)





Like other operations, `cross_union` can yield the same result relation from different pairs of relations. The multiplicities of such equal results are summed.

In [16]:
iprint_latex("mc_3", mc_3)
iprint_latex("mc_4", mc_4)
iprint_latex("simple_cross_union_2", r"mc\_3 \blacktriangledown mc\_4")
simple_cross_union_2 = multiclans.cross_union(mc_3, mc_4)
iprint_latex("simple_cross_union_2", simple_cross_union_2)
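
The product-then-sum rule for this example can be traced with a small standalone sketch. It assumes relations can be modeled as `frozenset`s of `(left, right)` tuples and multiclans as `Counter`s; `rel` and `cross_union_sketch` are hypothetical helpers, and the sketch makes no claim about how algebraixlib implements the operation.

```python
from collections import Counter
from itertools import product


def rel(*couplets):
    """Build a relation stand-in from (left, right) tuples."""
    return frozenset(couplets)


r_1 = rel(('x', 'y'), ('w', 'y'))
r_2 = rel(('a', 'x'), ('b', 'w'))
r_5 = rel(('b', 'w'), ('w', 'y'))
r_6 = rel(('a', 'x'), ('x', 'y'))

m_3 = Counter({r_1: 2, r_6: 7})    # mirrors mc_3
m_4 = Counter({r_2: 5, r_5: 11})   # mirrors mc_4


def cross_union_sketch(left, right):
    """Union every pair of relations; multiply the counts, sum equal results."""
    result = Counter()
    for (rel_l, n_l), (rel_r, n_r) in product(left.items(), right.items()):
        result[rel_l | rel_r] += n_l * n_r
    return result


crossed = cross_union_sketch(m_3, m_4)
for relation, count in crossed.items():
    print(sorted(relation), count)
```

In this sketch the relation with all four couplets is produced twice, once with multiplicity 2 · 5 = 10 and once with 7 · 11 = 77, so it ends up with multiplicity 87. The other two results keep their single products, 2 · 11 = 22 and 7 · 5 = 35.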









In [17]:
iprint_latex("mc_1", mc_1)
iprint_latex("mc_2", mc_2)
simple_cross_intersect_1 = multiclans.cross_intersect(mc_1, mc_2)
iprint_latex("simple_cross_intersect_1", r"mc\_1 \blacktriangle mc\_2")
iprint_latex("simple_cross_intersect_1", simple_cross_intersect_1)









`cross_union` uses the union defined on relations, and `cross_intersect` uses the intersection defined on relations.

In [18]:
iprint_latex("mc_3", mc_3)
iprint_latex("mc_4", mc_4)
iprint_latex("simple_cross_intersect_2", r"mc\_3 \blacktriangle mc\_4")
simple_cross_intersect_2 = multiclans.cross_intersect(mc_3, mc_4)
iprint_latex("simple_cross_intersect_2", simple_cross_intersect_2)









### A Data Algebra Example with Multisets

Sometimes importing the contents of a CSV file into a clan is not the right thing to do. A clan requires each relation to be unique, which means that either the CSV file must not contain duplicate rows or the user must accept losing that information.
If the file does contain duplicate rows and they matter, importing into a multiclan is the better choice: rather than dropping the duplicates, a multiclan keeps track of how many times each relation is present.

CSV import is therefore a good demonstration of the `Multiset` object and the multiclan algebra. In data algebra, each row corresponds to a relation, and it is not uncommon for CSV data to contain duplicate rows, that is, duplicate relations. A clan does not preserve their multiplicity, which makes the import lossy and may not be acceptable. A multiclan does preserve it.

Suppose we have a CSV data file that records products sold and the cashier who sold them. It is a likely and valid use case that the same product is sold by the same cashier more than once. This results in a data log with duplicate rows, each of which represents a distinct transaction, information that should not be lost. Importing such a file as a multiclan keeps these rows and makes the data easier to inspect.

In [19]:
# Importing data import tools
from io import StringIO
from algebraixlib.io.csv import import_csv
from algebraixlib.io.csv import export_csv

In [20]:
# Example CSV data log of cashiers' products sold. Note that the first row contains the column headings.
sales_csv = """product,cashier
apple,jane
banana,doug
apple,jane
peach,doug
rice,frank
rice,frank
banana,doug
apple,doug
rice,jane
apple,jane
"""

In [21]:
# Import the data. Note that we are simulating a file to simplify the example.
file = StringIO(sales_csv)
sales_multiclan = import_csv(file, has_dup_rows=True) # note the use of flag to return a multiclan
iprint_latex("sales_multiclan")
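
The difference between a lossy and a lossless import can be illustrated with the standard library alone. The sketch below assumes each row can be modeled as a `frozenset` of `(column, value)` pairs; it does not use `import_csv`, and the names `rows`, `as_clan`, and `as_multiclan` are hypothetical.

```python
import csv
from collections import Counter
from io import StringIO

csv_text = """product,cashier
apple,jane
banana,doug
apple,jane
peach,doug
rice,frank
rice,frank
banana,doug
apple,doug
rice,jane
apple,jane
"""

# Each row becomes a relation stand-in: a frozenset of (column, value) pairs.
rows = [frozenset(row.items()) for row in csv.DictReader(StringIO(csv_text))]

as_clan = set(rows)            # duplicates collapse, so the counts are lost
as_multiclan = Counter(rows)   # duplicates are counted instead

print(len(rows))                    # 10 rows in the file
print(len(as_clan))                 # 6 distinct rows
print(sum(as_multiclan.values()))   # 10, every transaction is still accounted for
```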



CSV data is frequently mined and used for data analysis. As an example, we will examine how many sales each cashier made and how many of each product was sold. Both can be computed with the multiclan `compose` operation.

In [22]:
cashier_diagonal = Multiset({Set(Couplet('cashier', 'cashier')): 1})
cashier_sales = multiclans.compose(sales_multiclan, cashier_diagonal)

iprint_latex("cashier_diagonal")
iprint_latex("cashier_sales", r"sales\_multiclan \circ cashier\_diagonal")
iprint_latex("cashier_sales")







In [23]:
product_diagonal = Multiset({Set(Couplet('product', 'product')): 1})
product_sales = multiclans.compose(sales_multiclan, product_diagonal)

iprint_latex("product_diagonal")
iprint_latex("product_sales", r"sales\_multiclan \circ product\_diagonal")
iprint_latex("product_sales")
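
Why composing with a diagonal acts as a projection that keeps the counts can be sketched as follows. Everything here is an assumption made for illustration: relations are `frozenset`s of `(left, right)` tuples, multiclans are `Counter`s, couplet composition is assumed to pair `(l1, r1)` with `(l2, r2)` into `(l2, r1)` when `l1 == r2`, and all helper names are hypothetical. Empty compositions do not occur in this example, so the sketch does not model how the library treats them.

```python
from collections import Counter
from itertools import product


def compose_relations(rel_1, rel_2):
    """Assumed rule: (l1, r1) with (l2, r2) gives (l2, r1) when l1 == r2."""
    return frozenset(
        (l2, r1) for (l1, r1), (l2, r2) in product(rel_1, rel_2) if l1 == r2
    )


def compose_multiclans(left, right):
    """Compose every pair of relations; multiply counts, sum equal results."""
    result = Counter()
    for (rel_l, n_l), (rel_r, n_r) in product(left.items(), right.items()):
        result[compose_relations(rel_l, rel_r)] += n_l * n_r
    return result


def row(product_name, cashier):
    """One CSV row as a relation stand-in."""
    return frozenset({('product', product_name), ('cashier', cashier)})


# The sales data from the CSV above, with duplicate rows already tallied.
sales = Counter({
    row('apple', 'jane'): 3, row('banana', 'doug'): 2, row('peach', 'doug'): 1,
    row('rice', 'frank'): 2, row('apple', 'doug'): 1, row('rice', 'jane'): 1,
})
cashier_diag = Counter({frozenset({('cashier', 'cashier')}): 1})
product_diag = Counter({frozenset({('product', 'product')}): 1})

per_cashier = compose_multiclans(sales, cashier_diag)
per_product = compose_multiclans(sales, product_diag)

for relation, count in sorted(per_cashier.items(), key=lambda item: sorted(item[0])):
    print(sorted(relation), count)
```

Because the diagonal has multiplicity 1, each product of multiplicities equals the row count, and rows that project to the same relation have their counts summed. That sum is the number of sales per cashier or per product.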







It is also possible to export this data back to CSV.

In [24]:
# confirm no data is lost by printing out entire multiclan
sales_csv_out = StringIO()
export_csv(sales_multiclan, sales_csv_out)
csv_str = sales_csv_out.getvalue()
print(csv_str)

cashier_sales_csv_out = StringIO()
export_csv(cashier_sales, cashier_sales_csv_out)
cashier_sales_csv_str = cashier_sales_csv_out.getvalue()
print(cashier_sales_csv_str)

product_sales_csv_out = StringIO()
export_csv(product_sales, product_sales_csv_out)
product_sales_csv_str = product_sales_csv_out.getvalue()
print(product_sales_csv_str)

cashier,product
doug,apple
doug,banana
doug,banana
doug,peach
frank,rice
frank,rice
jane,apple
jane,apple
jane,apple
jane,rice

cashier
doug
doug
doug
doug
frank
frank
jane
jane
jane
jane

product
apple
apple
apple
apple
banana
banana
peach
rice
rice
rice



----
© Copyright Permission.io, Inc. (formerly known as Algebraix Data Corporation), Copyright (c) 2022.

This file is part of [`algebraixlib`][] .

[`algebraixlib`][] is free software: you can redistribute it and/or modify it under the terms of [version 3 of the GNU Lesser General Public License][] as published by the [Free Software Foundation][].

[`algebraixlib`][] is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with [`algebraixlib`][]. If not, see [GNU licenses][].

[`algebraixlib`]: (A Python library for data algebra)
[Version 3 of the GNU Lesser General Public License]: 
[Free Software Foundation]: 
[GNU licenses]: 