[/
Copyright 2017 Nick Thompson
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]
[section:vector_functionals Vector Functionals]
[heading Synopsis]
``
#include
namespace boost{ namespace math{ namespace tools {
template
auto mean(ForwardIterator first, ForwardIterator last);
template
auto mean_and_variance(ForwardIterator first, ForwardIterator last);
template
auto median(ForwardIterator first, ForwardIterator last);
template
auto absolute_median(ForwardIterator first, ForwardIterator last);
template
auto gini_coefficient(ForwardIterator first, ForwardIterator last);
template
auto absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
template
auto hoyer_sparsity(ForwardIterator first, ForwardIterator last);
template
auto shannon_entropy(ForwardIterator first, ForwardIterator last);
template
auto l0_pseudo_norm(ForwardIterator first, ForwardIterator last);
template
auto l1_norm(ForwardIterator first, ForwardIterator last);
template
auto l2_norm(ForwardIterator first, ForwardIterator last);
template
auto sup_norm(ForwardIterator first, ForwardIterator last);
template
auto lp_norm(ForwardIterator first, ForwardIterator last, p);
template
auto total_variation(ForwardIterator first, ForwardIterator last);
}}}
``
[heading Description]
The file `boost/math/tools/vector_functionals.hpp` is a set of facilities for computing scalar values from vectors.
We use the word "vector functional" in the [@https://ncatlab.org/nlab/show/nonlinear+functional mathematical sense], indicating a map \u2113:\u211D[super n] \u2192 \u211D,
and occasionally maps from \u2102[super n] \u2192 \u211D and \u2102[super n] \u2192 \u2102.
The set of maps provided herein attempt to cover the most commonly encountered functionals from statistics, numerical analysis, and signal processing.
Many of these functionals have trivial naive implementations, but experienced programmers will recognize that even trivial algorithms are easy to screw up, and that numerical instabilities often lurk in corner cases.
We have attempted to do our "due diligence" to root out these problems-scouring the literature for numerically stable algorithms for even the simplest of functionals.
/Nota bene/: Some similar functionality is provided in [@https://www.boost.org/doc/libs/1_68_0/doc/html/accumulators/user_s_guide.html Boost Accumulators Framework].
These accumulators should be used in real-time applications; `vector_functionals.hpp` should be used when CPU vectorization is needed.
As a reminder, remember that to actually /get/ vectorization, compile with `-march=native -O3` flags.
We now describe each functional in detail.
Our examples use `std::vector` to hold the data, but this not required.
In general, you can store your data in an Eigen array, and Armadillo vector, `std::array`, and for many of the routines, a `std::forward_list`.
These routines are usable float, double, long double, and Boost.Multiprecision precision, as well as their complex extensions whenever the computation is well-defined.
[heading Mean]
std::vector v{1,2,3,4,5};
double mu = boost::math::tools::mean(v.cbegin(), v.cend());
The implementation follows [@https://doi.org/10.1137/1.9780898718027 Higham 1.6a].
The data is not modified and must be forward iterable.
Works with complex data.
[heading Mean and Population Variance]
std::vector v{1,2,3,4,5};
auto [mu, s] = boost::math::tools::mean_and_population_variance(v.cbegin(), v.cend());
The implementation follows [@https://doi.org/10.1137/1.9780898718027 Higham 1.6b].
Note that we do not provide computation of population variance alone;
we are unaware of any one-pass, numerically stable computation of population variance which does not simultaneously generate the mean.
If the mean is not required, simply ignore it.
The input datatype must be forward iterable and the range `[first, last)` must contain at least two elements.
It is /not/ in general sensible to pass complex numbers to this routine.
[heading Median]
Compute the median of a dataset:
std::vector v{1,2,3,4,5};
double m = boost::math::tools::median(v.begin(), v.end());
/Nota bene: The input vector is modified./
The calculation of the median is a thin wrapper around the C++11 [@https://en.cppreference.com/w/cpp/algorithm/nth_element nth-element].
Therefore, all requirements of `nth_element` are inherited by the median calculation.
[heading Absolute Median]
The absolute median is used in signal processing, where the median of the magnitude of the coefficients in some expansion are used to estimate noise variance.
See [@https://wavelet-tour.github.io/ Mallat] for details.
The absolute median supports both real and complex arithmetic, modifies its input, and requires random access iterators.
std::vector v{-1, 1};
double m = boost::math::tools::absolute_median(v.begin(), v.end());
// m = 1
[heading Gini Coefficient]
Compute the Gini coefficient of a dataset:
std::vector v{1,0,0,0};
double gini = boost::math::tools::gini_coefficient(v.begin(), v.end());
// gini = 1, as v[0] holds all the "wealth"
std::vector w{1,1,1,1};
gini = boost::math::tools::gini_coefficient(w.begin(), w.end());
// gini = 0, as all elements are now equal.
/Nota bene: The input data is altered-in particular, it is sorted./
/Nota bene:/ Different authors use different conventions regarding the overall scale of the Gini coefficient.
We have chosen to follow [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard's definition], which [@https://en.wikipedia.org/wiki/Gini_coefficient Wikipedia] calls a "sample Gini coefficient".
Hurley and Rickard's definition places the Gini coefficient in the range [0,1]; Wikipedia's population Gini coefficient is in the range [0, 1 - 1//n/].
If you wish to convert the Boost Gini coefficient to the population Gini coefficient, multiply by (/n/-1)//n/.
/Nota bene:/ There is essentially no reason to pass negative values to the Gini coefficient function.
However, a single use case (measuring wealth inequality when some people have negative wealth) exists, so we do not throw an exception when negative values are encountered.
You should have /very/ good cause to pass negative values to the Gini coefficient calculator.
The Gini coefficient, first used to measure wealth inequality, is also one of the best measures of the sparsity of an expansion in a basis.
A sparse expansion has most of its norm concentrated in just a few coefficients, making the connection with wealth inequality obvious.
However, for measuring sparsity, the phase of the numbers is irrelevant, so `absolute_gini_coefficient` should be used instead:
std::vector> v{{0,1}, {0,0}, {0,0}, {0,0}};
double abs_gini = boost::math::tools::absolute_gini_coefficient(v.begin(), v.end());
// now abs_gini = 1
std::vector> w{{0,1}, {1,0}, {0,-1}, {-1,0}};
double abs_gini = boost::math::tools::absolute_gini_coefficient(w.begin(), w.end());
// now abs_gini = 0
std::vector u{-1, 1, -1};
double abs_gini = boost::math::tools::absolute_gini_coefficient(u.begin(), u.end());
// now abs_gini = 0
Again, Wikipedia denotes our scaling as a "sample Gini coefficient".
We chose this scaling because it always returns unity for a vector which has only one nonzero coefficient.
If sorting the input data is too much expense for a sparsity measure (is it going to be perfect anyway?),
consider calculating the Hoyer sparsity instead.
[heading Hoyer Sparsity]
The Hoyer sparsity measures a normalized ratio of the \u2113[super 1] and \u2113[super 2] norms.
As the name suggests, it is used to measure sparsity in an expansion in some basis.
The Hoyer sparsity computes ([radic]/N/ - \u2113[super 1](v)/\u2113[super 2](v))/([radic]N -1).
For details, see [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard].
Usage:
std::vector v{1,0,0};
Real hs = boost::math::tools::hoyer_sparsity(v.begin(), v.end());
// hs = 1
std::vector v{1,-1,1};
Real hs = boost::math::tools::hoyer_sparsity(v.begin(), v.end());
// hs = 0
The container must be forward iterable and the contents are not modified.
[heading \u2113[super \u221E] norm]
Computes the supremum norm of a dataset:
std::vector v{-3, 2, 1};
double sup = boost::math::tools::sup_norm(v.cbegin(), v.cend());
// sup = 3
std::vector> v{{0, -8}, {1,1}, {-3,2}};
double sup = boost::math::tools::sup_norm(v.cbegin(), v.cend());
// sup = 8
Supports both real and complex arithmetic.
Container must be forward iterable and is not modified.
[heading \u2113[super /p/] norm]
std::vector v{-8, 0, 0};
double sup = boost::math::tools::lp_norm(v.cbegin(), v.cend(), 7);
// sup = 8
std::vector> v{{1, 0}, {0,1}, {0,-1}};
double sup = boost::math::tools::sup_norm(v.cbegin(), v.cend(), 3);
// sup = cbrt(3)
Supports both real and complex arithmetic.
The container must be forward iterable and the contents are not modified.
[heading \u2113[super 0] pseudo-norm]
Counts the number of non-zero elements in a container.
std::vector v{0,0,1};
size_t count = boost::math::tools::l0_pseudo_norm(v.begin(), v.end());
// count = 1
Supports real and complex numbers.
The container must be forward iterable and the contents are not modified.
Note that this measure is not robust against numerical noise.
[heading \u2113[super 1] norm]
The \u2113[super 1] norm is a special case of the \u2113[super /p/] norm, but is much faster:
std::vector v{1,1,1};
double l1 = boost::math::tools::l1_norm(v.begin(), v.end());
// l1 = 3
Requires a forward iterable input, does not modify input data, and works with complex numbers.
[heading \u2113[super 2] norm]
The \u2113[super 2] norm is again a special case of the \u2113[super /p/] norm, but is much faster:
std::vector v{1,1,1};
double l1 = boost::math::tools::l2_norm(v.begin(), v.end());
// l1 = sqrt(3)
Requires a forward iterable input, does not modify input data, and works with complex numbers.
[heading Total Variation]
std::vector v{1,1,1};
double tv = boost::math::tools::total_variation(v.begin(), v.end());
// no variation in v, so tv = 0.
v = {0,1};
double tv = boost::math::tools::total_variation(v.begin(), v.end());
// variation is 1, so tv = 1.
The total variation only supports real numbers.
All the constituent operations to compute the total variation are well-defined for complex numbers,
but the computed result is not meaningful.
The container must be forward iterable, and the contents are not modified.
[heading Shannon Entropy]
std::vector v{1/2.0, 1/2.0};
double Hs = boost::math::tools::shannon_entropy(v.begin(), v.end());
// Hs = ln(2).
The Shannon entropy only supports non-negative real-valued inputs, presumably for interpretational purposes in the range [0,1]-though this is not enforced.
The natural logarithm is used to compute the Shannon entropy; all other "Shannon entropies" are readily obtained by change of log base.
[heading References]
* Higham, Nicholas J. ['Accuracy and stability of numerical algorithms.] Vol. 80. Siam, 2002.
* Mallat, Stephane. ['A wavelet tour of signal processing: the sparse way.] Academic press, 2008.
* Hurley, Niall, and Scott Rickard. ['Comparing measures of sparsity.] IEEE Transactions on Information Theory 55.10 (2009): 4723-4741.
[endsect]
[/section:vector_functionals Vector Functionals]