DGtal  1.5.beta
Set operations on arbitrary containers
Author(s) of this documentation:
Jacques-Olivier Lachaud
Since
0.9.1

Part of the Base package.

This part of the manual describes how to perform set operations (union, intersection, difference, symmetric difference) on arbitrary containers.

The following programs are related to this documentation: testSetFunctions.cpp

Motivation for a common framework for set operations

The STL library provides algorithm for performing set operations (std::set_union, std::set_intersection, std::set_difference, std::set_symmetric_difference). These algorithms are generic given iterators on an ordered range and an output iterator. So, in circumstances where you have two sorted vectors with unique elements, you could/should use directly STL set operations. You could also do this when you have two sets.

However, there are several cases when you cannot use them straightforwardly:

  • if you wish to assign the result to one of the input set, you must use a temporary container.
  • if your set is represented with some unordered_set variant, you must build intermediate vectors.
  • if you wish to perform set operations on keys of datas stored in two maps, you have also to specify the comparison operator.
  • if you are in templated class where the Container type is generic (could be a set or unordered_set for instance) then your code must be adapted to each variant.
  • the syntax is heavier than just using binary operators |, &, -, ^ or |=, &=, -=, ^=.

Therefore we propose functions to perform set operations on arbitrary containers. At compile time, the compiler chooses the most adequate way to compute the set operations, depending on the container type. It uses the traits class ContainerTraits to determine the type of container and to select the appropriate code. For the user, this is totally transparent, at least when he uses the binary operators |, &, -, ^ or |=, &=, -=, ^=.

Performing set operations

It is enough to include module SetFunctions.h and then write using namespace DGtal::functions::setops to use binary operators directly on your containers. The following snippet shows an example:

#include "DGtal/base/SetFunctions.h"
...
using namespace DGtal::functions::setops;
typedef std::list<int> Container; // could be boost::unordered_set<int>, etc
int S1[ 10 ] = { 4, 15, 20, 17, 9, 7, 13, 12, 1, 3 };
int S2[ 6 ] = { 17, 14, 19, 2, 3, 4 };
Container A( S1, S1 + 10 );
Container B( S2, S2 + 6 );
Container AorB = A | B; // union
Container AandB = A & B; // intersection
Container AxorB = A ^ B; // symmetric difference
Container AminusB = A - B; // difference
Container BminusA = B - A; // difference
A |= B; // assign union
B &= A; // assign intersection
A ^= B; // assign symmetric difference
B -= A; // assign difference

If you dislike operators, you may also use functions (defined in namespace DGtal::functions):

There are templated versions of these functions that are useful if you know that, at this point in the program, your container is sorted (for instance your list or vector is already sorted). You may thus give the hint to set operations at this point.

using namespace DGtal::functions;
typedef std::vector<int> Container; // could be boost::unordered_set<int>, etc
int S1[ 10 ] = { 1, 3, 4, 7, 9, 12, 13, 15, 17, 20 };
int S2[ 6 ] = { 2, 3, 4, 14, 17, 19 };
Container A( S1, S1 + 10 );
Container B( S2, S2 + 6 );
Container AorB = makeUnion<Container,true>( A, B );
functions namespace gathers all DGtal functionsxs.

Benchmark for set operations

We benchmark set operations for different kind of containers. In each case, we do not take into account the time for constructing containers A and B, we only measure the time to perform the operation. For each container, we compare the running time of our implementation and the running time obtained by first converting the container to a vector and perform STL set operation and converting back. Running times are in milliseconds.

N = approx. size of A,B 10000 100000 1000000 10000000
vector<int> union 0.6 7.6 91.1 1039.2
(conversion) union 0.7 7.7 90.7 1034.3
vector<int> inter 0.6 7.1 86.4 986.5
(conversion) inter 0.6 7.1 86.6 981.0
vector<int> diff 0.6 7.0 84.8 991.0
(conversion) diff 0.6 7.3 83.9 986.1
vector<int> sym_diff 0.6 7.2 85.2 987.1
(conversion) sym_diff 0.6 7.1 84.4 982.4
set<int> union 2.0 25.8 308.7 3402.6
(conversion) union 2.1 27.0 322.3 3559.7
set<int> inter 1.5 17.3 202.0 2269.4
(conversion) inter 1.7 19.3 241.8 2515.9
set<int> diff 1.3 14.5 171.0 1821.7
(conversion) diff 1.4 16.5 203.3 2135.7
set<int> sym_diff 1.7 18.6 213.8 2403.7
(conversion) sym_diff 1.6 20.0 242.6 2660.7
unordered_set<int> union 1.0 10.5 176.2 2464.1
(conversion) union 2.5 28.5 439.3 5855.7
unordered_set<int> inter 0.8 10.6 171.4 2036.6
(conversion) inter 2.1 21.3 321.3 3929.2
unordered_set<int> diff 0.9 11.8 203.6 2429.1
(conversion) diff 1.9 19.7 290.2 3523.1
unordered_set<int> sym_diff 2.5 29.5 428.5 5855.4
(conversion) sym_diff 2.1 22.1 339.4 4025.6
Note
It is worthy to note that the genericity offered by these functions does not slow down direct set operations on vectors, as can be seen on the upper part of the table. A second remark is that our set operations are almost always faster for set and unordered_set structures. In fact, for container set, it uses STL set operations. The only case where you could consider converting a container is when performing symmetric difference on an unordered set.
An unordered_set is slightly faster than a set for union and intersection, slightly slower for difference, much slower for symmetric difference.
Container A and B are built by inserting N elements whose values are in \( \{0,...,N-1\} \). Hence, both A and B have less than N elements but are of the order of N.

Set relations

We have chosen not to overload operators == (equality), <= (subset), >= (supset), != (difference), because there is a big risk of mess up with other standard operator overloading (for instance, it poses a problem with catch test framework). You may use functions::isEqual and functions::isSubset to compare sets.

using namespace DGtal::functions;
using namespace DGtal::functions::setops;
typedef std::vector<int> Container; // could be boost::unordered_set<int>, etc
int S1[ 10 ] = { 1, 3, 4, 7, 9, 12, 13, 15, 17, 20 };
int S2[ 6 ] = { 2, 3, 4, 14, 17, 19 };
Container A( S1, S1 + 10 );
Container B( S2, S2 + 6 );
if ( isSubset( A, A | B ) ) std::cout << "ok" << std::endl;
else std::cout << "erreur" << std::endl;
if ( isSubset( A & B, B ) ) std::cout << "ok" << std::endl;
else std::cout << "erreur" << std::endl;
if ( isEqual( ( A | B ) - ( A & B ), A ^ B ) )
std::cout << "ok" << std::endl;
else std::cout << "erreur" << std::endl;
bool isEqual(const Container &S1, const Container &S2)
Definition: SetFunctions.h:789
bool isSubset(const Container &S1, const Container &S2)
Definition: SetFunctions.h:845

For developpers: a traits class for containers

In order to have set operations that can be indifferently applied to many kind of containers, we use a mechanism called traits. The class ContainerTraits is used to define the category for each container. There are several categories, which form a kind of hierarchy. Each ContainerTraits should contain an inner type called Category that defines the type of container. By default, it is NotContainerCategory. But the ContainerTraits class is specialized for every STL container, for instance with:

// Defines container traits for std::vector<>.
template < class T, class Alloc >
struct ContainerTraits< std::vector<T, Alloc> >
{
typedef SequenceCategory Category;
};
// Defines container traits for std::map<>.
template < class Key, class T, class Compare, class Alloc >
struct ContainerTraits< std::map<Key, T, Compare, Alloc> >
{
typedef MapAssociativeCategory Category;
};
...
NotContainerCategory Category

If you define a new container, you may associate its correct Category by specializing its ContainerTraits class with the correct category among NotContainerCategory, ContainerCategory, SequenceCategory, AssociativeCategory, SimpleAssociativeCategory, PairAssociativeCategory, UniqueAssociativeCategory, MultipleAssociativeCategory, OrderedAssociativeCategory, UnorderedAssociativeCategory, SetAssociativeCategory, MultisetAssociativeCategory, MapAssociativeCategory, MultimapAssociativeCategory, UnorderedSetAssociativeCategory, UnorderedMultisetAssociativeCategory, UnorderedMapAssociativeCategory, UnorderedMultimapAssociativeCategory.