histogram

Function Names

histogram

Description

This function creates a 1-dimensional or multi-dimensional histogram from a sequence of input values. A histogram represents the distribution of the values across a defined set of intervals, also known as bins. The intervals can be defined freely, including with different widths.

1-dimensional histograms are the simplest form: All input values are put into a simple set like { 4, 3, 9, ... } or into a nested set like {{ 4, 3, 9, ... }} and provided as the 1st function parameter. A set of interval values is then defined in the 2nd function parameter. For n interval values, a total of n+2 bins is created: the first bin counts the values below the first interval value, the next n bins each count the values starting at one interval value, and the last bin counts all non-numeric and invalid values, for example blanks. The optional 3rd function parameter provides the comparison rule. Return value: The counts are put into a set containing n+2 values corresponding to the bins described above.
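The binning rule for the 1-dimensional case can be modelled outside B4P. The following is a minimal Python sketch, not B4P's implementation: the function name `histogram_1d` and the treatment of every non-number Python value as "non-numeric" are assumptions made for illustration. The data are taken from the 1-dimensional example further down this page.

```python
from bisect import bisect_left, bisect_right
from numbers import Real


def histogram_1d(values, intervals, op=">="):
    """Count values into len(intervals) + 2 bins (illustrative model only)."""
    if op not in (">=", ">"):
        raise ValueError("Invalid comparison operator: " + repr(op))
    counts = [0] * (len(intervals) + 2)
    for v in values:
        if isinstance(v, bool) or not isinstance(v, Real):
            counts[-1] += 1                            # last bin: blanks, text, ...
        elif op == ">=":
            counts[bisect_right(intervals, v)] += 1    # number of interval values <= v
        else:
            counts[bisect_left(intervals, v)] += 1     # number of interval values < v
    return counts


values = [0, -1, "b", "c", "", 3.5, 4.5, 4.5, 5, 5.5, 5.5, 6, 7, 8, 12]
intervals = [0, 2, 4, 6, 8, 10]   # the bins shown in the example output for { 0 .. 2 .. 10 }

print(histogram_1d(values, intervals, ">="))  # [1, 1, 1, 5, 2, 1, 1, 3]
print(histogram_1d(values, intervals, ">"))   # [2, 0, 1, 6, 2, 0, 1, 3]
```

Both results agree with the counts printed in the Output section of this page.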

For multi-dimensional histograms (e.g. 2, 3, 4 or any number of dimensions), multiple sets of input values need to be provided in the form of a nested set. For example, for a 2-dimensional histogram, create a nested set containing two sets with the same number of values. Different counts result in an error. You also need to provide two sets of intervals in a nested set; the interval values and their number may differ between the two sets. The result is an n-dimensional matrix in the form of a nested set containing the counts.
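The multi-dimensional case can be sketched in the same spirit. This Python model is an assumption-laden illustration, not B4P code: the names `bin_index` and `histogram_nd` are hypothetical, and values at the same position in each input set are treated as one combined data point, which is what the 2-dimensional example output on this page suggests.

```python
from bisect import bisect_left, bisect_right
from numbers import Real


def bin_index(value, intervals, op):
    """Bin of one value in one dimension; the last index holds non-numeric values."""
    if isinstance(value, bool) or not isinstance(value, Real):
        return len(intervals) + 1
    return bisect_right(intervals, value) if op == ">=" else bisect_left(intervals, value)


def histogram_nd(value_sets, interval_sets, op=">="):
    """Nested-list matrix of counts, one axis per dimension (illustrative model only)."""
    if len(value_sets) != len(interval_sets):
        raise ValueError("Number of value sets differs from number of interval sets")
    if len({len(s) for s in value_sets}) > 1:
        raise ValueError("Value sets contain different numbers of values")

    def zeros(dim):
        # Build a nested list with len(intervals) + 2 entries along each axis.
        if dim == len(interval_sets):
            return 0
        return [zeros(dim + 1) for _ in range(len(interval_sets[dim]) + 2)]

    result = zeros(0)
    for point in zip(*value_sets):
        idx = [bin_index(v, iv, op) for v, iv in zip(point, interval_sets)]
        cell = result
        for i in idx[:-1]:
            cell = cell[i]
        cell[idx[-1]] += 1
    return result


values_1 = [1, 1.5, 2, 2.5, 2.6, 3, ""]
values_2 = [5, 9, 10, 11, 20, 10, 20]
bins_1 = [0, 1, 2, 3, 4, 5]
bins_2 = [0, 10, 20]

matrix = histogram_nd([values_1, values_2], [bins_1, bins_2], ">=")
print(len(matrix), len(matrix[0]))  # 8 5
print(matrix[3])                    # [0, 0, 2, 1, 0]  (row for interval value 2)
```

The matrix has one row per bin of the first dimension and one column per bin of the second, so its shape is (6+2) by (3+2) here.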

Call as: function

Restrictions

Indirect parameter passing is disabled

Parameter count

2-3

Parameters

No.  Type       Description
1.   input set  Input values

The input values may contain both numeric and non-numeric values. Simple nesting is allowed.
Valid examples:

  • { 15, 21, 13, '', 19, ... }
  • {{ 15, 21, 13, 'n/a', 19, ... }}

For multi-dimensional histograms, provide multiple sets of values in separate sets and enclose them in an overall nested parameter set. The number of values in each set must be the same.
Valid examples for 2 dimensions:

  • {{ 10, 20, 30 }, { 25, 3, 5 }}
  • {{ 10, '', 30 }, { 25, NA, 5 }}

2.   input set  Intervals

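The interval values define how the input values are distributed across the bins. For n interval values, n+2 bins are created. The first bin covers all values below the first interval value. Each of the following n bins starts at one interval value and extends up to the next one; the bin of the last interval value covers everything from there upwards. Whether a value equal to an interval value is counted in the bin starting there or in the bin before it is determined by the comparison operator (3rd parameter). The final bin counts all non-numeric values, for example blanks and text. All interval values must be numeric (dates, for example, are allowed) and in ascending order. Violations result in an error message and execution stops.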
Valid examples:

  • { 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100 }
  • { -50 .. -40 .. +50 }
  • { 0.1, 1, 10, 100, 1000 }
  • {{ 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100 }, { 0.1, 1, 10, 100, 1000 }} (Valid for a 2-dimensional histogram)

Invalid examples:

  • { 10, 20, 30, 25, 40 } (Not all values are ascending)
  • { below, 1..10, above } (Text is not allowed. The first bin covers the 'below' values anyway)
  • { 10, 20, 30, 'n/a' } (The last bin covers all non-numeric values, so do not specify this last 'n/a' entry)
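The interval rules above can be expressed as a small check. This is a hedged Python sketch rather than B4P's actual validation: `validate_intervals` is a hypothetical name, and rejecting equal neighbouring values is an assumption, since the page only states "ascending order".

```python
from numbers import Real


def validate_intervals(intervals):
    """Raise ValueError if the interval values break the documented rules."""
    if not intervals:
        raise ValueError("No interval values specified")
    for v in intervals:
        if isinstance(v, bool) or not isinstance(v, Real):
            raise ValueError("Interval value %r is not numeric" % (v,))
    for prev, cur in zip(intervals, intervals[1:]):
        if cur <= prev:   # assumption: equal neighbours are rejected as well
            raise ValueError("Interval values not ascending: %r follows %r" % (cur, prev))


validate_intervals([0.1, 1, 10, 100, 1000])        # passes silently

for bad in ([10, 20, 30, 25, 40], [10, 20, 30, "n/a"], []):
    try:
        validate_intervals(bad)
    except ValueError as err:
        print("rejected:", err)
```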

Opt. 3.  input string  Comparison Operator

The following operators are allowed: '>=' and '>'. With '>=', a value equal to an interval value is counted in the bin belonging to that interval value. For example, the value 10 compared with the interval value 10 increments the bin related to 10. With '>', a value must be greater than the interval value to be counted in its bin. For example, the value 10 is counted in the bin before the interval value 10.

Default value: >=
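The two operators differ only for values that lie exactly on an interval value. A minimal Python sketch of that boundary behaviour, assuming the rule as described above (`bin_index` is a hypothetical helper, and bin 0 denotes the bin below the first interval value):

```python
def bin_index(value, intervals, op=">="):
    """Return the 0-based bin of a numeric value (0 = below the first interval value)."""
    if op == ">=":
        return sum(1 for bound in intervals if value >= bound)
    if op == ">":
        return sum(1 for bound in intervals if value > bound)
    raise ValueError("Invalid comparison operator: " + repr(op))


intervals = [10, 20, 30]
print(bin_index(10, intervals, ">="))  # 1: the bin belonging to interval value 10
print(bin_index(10, intervals, ">"))   # 0: the bin before interval value 10
print(bin_index(15, intervals, ">="), bin_index(15, intervals, ">"))  # 1 1
```

Values strictly between two interval values land in the same bin under either operator.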

Return value

Type  Description
set   Histogram

An n-dimensional matrix modeled with (nested) sets contains the histogram results. For 1-dimensional histograms, a simple set containing n+2 values is returned, where n is the number of interval values specified. Note that the 1st value covers the range below the first interval value, and the last value holds the count of all non-numeric input values. For 2-dimensional histograms, a 2-dimensional matrix (nested set) is returned. For higher dimensions, the matrix consists of additional dimensions, e.g. a cubic matrix for 3-dimensional histograms.

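Overall, the sum of all values returned equals the number of input values in each set provided in the 1st function parameter, because every value (or, for multi-dimensional histograms, every combination of values at the same position) is counted exactly once.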
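This property can be checked against the counts printed in the Output section of this page. The short Python check below only re-adds those documented numbers; it does not call B4P.

```python
# 1-dimensional example: 15 input values, '>=' case counts from the Output section
values = [0, -1, "b", "c", "", 3.5, 4.5, 4.5, 5, 5.5, 5.5, 6, 7, 8, 12]
counts = [1, 1, 1, 5, 2, 1, 1, 3]
print(sum(counts) == len(values))   # True

# 2-dimensional example: 7 value pairs, '>=' case matrix from the Output section
values_1 = [1, 1.5, 2, 2.5, 2.6, 3, ""]
matrix = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
total = sum(sum(row) for row in matrix)
print(total == len(values_1))       # True
```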

Exceptions

Multiple sets of input values contain different numbers of values
Interval values are not numeric or not in ascending order
Invalid comparison operator
No interval values specified
Number of input value sets differs from number of interval sets (histogram dimensions)

Examples


echo("1-dimensional histogram:", new line);

values[]       = { 0, -1, b,c,'',3.5, 4.5, 4.5, 5, 5.5, 5.5, 6, 7, 8, 12 };
bins[]         = { 0 .. 2 .. 10 };
echo("Input values: ", values[], new line );

a[]            = histogram( values[], bins[], '>=' );
b[]            = histogram( values[], bins[], '>' );

bins a[]       = {'<0' } + bins[] + { 'n/a' };
bins b[]       = {'<0' } + ( '>' +^ vstr(bins[]) ) + { 'n/a' };

print matrix   ("0", 3, "|", ">= case: ", bins a[], a[], "   > case: ", bins b[], b[] );

echo(new line, "2-dimensional histograms:", new line);

values 1[]     = { 1,  1.5, 2,   2.5, 2.6,  3, ''   };
values 2[]     = { 5,  9,   10, 11,   20,  10, 20   };
echo("Input values 1: ", values 1[] );
echo("Input values 2: ", values 2[], new line );

bins 1[]       = { 0..5 };
bins 2[]       = { 0..10..20 };

a[]            = histogram( { values 1[], values 2[] }, { bins 1[], bins 2[] }, '>=' );
b[]            = histogram( { values 1[], values 2[] }, { bins 1[], bins 2[] }, '>'  );

bins 1a[]      = {Bins, '<0'  } + bins 1[] + { 'n/a' };
bins 1b[]      = {Bins, '<=0' } + ( '>' +^ vstr(bins 1[]) ) + { 'n/a' };
bins 2a[]      = { '<0'  } + bins 2[] + { 'n/a' };
bins 2b[]      = { '<=0' } + bins 2[] + { 'n/a' };

a[]            = {bins 2a[]} + a[]; // Before outputting as matrix, add bin legend to the top
b[]            = {bins 2b[]} + b[]; // Same for the '>' case

print matrix( "0", 4, "|", ">= case: ", bins 1a[], a[], "   > case: ", bins 1b[],  b[] );

Output

1-dimensional histogram:

Input values: {0,-1,'b','c','',3.5,4.5,4.5,5,5.5,5.5,6,7,8,12}

         | <0||  1|           | <0||  2|
         |  0||  1|           | >0||  0|
         |  2||  1|           | >2||  1|
         |  4||  5|           | >4||  6|
>= case: |  6||  2|   > case: | >6||  2|
         |  8||  1|           | >8||  0|
         | 10||  1|           |>10||  1|
         |n/a||  3|           |n/a||  3|

2-dimensional histograms:

Input values 1: {1,1.5,2,2.5,2.6,3,''}
Input values 2: {5,9,10,11,20,10,20}

         |Bins||  <0     0    10    20   n/a|           |Bins|| <=0     0    10    20   n/a|
         |  <0||   0     0     0     0     0|           | <=0||   0     0     0     0     0|
         |   0||   0     0     0     0     0|           |  >0||   0     1     0     0     0|
         |   1||   0     2     0     0     0|           |  >1||   0     2     0     0     0|
>= case: |   2||   0     0     2     1     0|   > case: |  >2||   0     1     2     0     0|
         |   3||   0     0     1     0     0|           |  >3||   0     0     0     0     0|
         |   4||   0     0     0     0     0|           |  >4||   0     0     0     0     0|
         |   5||   0     0     0     0     0|           |  >5||   0     0     0     0     0|
         | n/a||   0     0     0     1     0|           | n/a||   0     0     1     0     0|
Try it yourself: Open LIB_Function_histogram.b4p in B4P_Examples.zip. Decompress before use.

See also

table histogram