RANDOM_DATA
Generation of random data


RANDOM_DATA is a MATLAB library which generates sets of data points using certain random number distributions, in a given dimension, and in a given physical region.

Most of these routines assume that there is an available source of pseudorandom numbers, distributed uniformly in the unit interval [0,1]. In this package, that role is played by the routine R8_UNIFORM_01, which allows us some portability. We can get the same results in C, FORTRAN or MATLAB, for instance. In general, however, it would be more efficient to use the language-specific random number generator for this purpose.

If we have a source of pseudorandom values in [0,1], it's trivial to generate pseudorandom points in any line segment; it's easy to take pairs of pseudorandom values to sample a square, or triples to sample a cube. It's easy to see how to deal with square region that is translated from the origin, or scaled by different amounts in either axis, or given a rigid rotation. The same simple transformations can be applied to higher dimensional cubes, without giving us any concern.

For all these simple shapes, which are just generalizations of a square, we can easily see how to generate sample points that we can guarantee will lie inside the region; in most cases, we can also guarantee that these points will tend to be uniformly distributed, that is, every subregion can expect to contain a number of points proportional to its share of the total area.

However, we will not achieve uniform distribution in the simple case of a rectangle of nonequal sides [0,A] x [0,B], if we naively scale the random values (u1,u2) to (A*u1,B*u2). In that case, the expected point density of a wide, short region will differ from that of a narrow tall region. The absence of uniformity is most obvious if the points are plotted.

If you realize that uniformity is desirable, and easily lost, it is possible to adjust the approach so that rectangles are properly handled.

But rectangles are much too simple. We are interested in circles, triangles, and other shapes. Once the geometry of the region becomes more "interesting", there are two common ways to continue.

In the acceptance-rejection method, uniform points are generated in a superregion that encloses the region. Then, points that do not lie within the region are rejected. More points are generated until enough have been accepted to satisfy the needs. If a circle was the region of interest, for instance, we could surround it with a box, generate points in the box, and throw away those points that don't actually lie in the circle. The resulting set of samples will be a uniform sampling of the circle.

In the direct mapping method, a formula or mapping is determined so that each time a set of values is taken from the pseudorandom number generator, it is guaranteed to correspond to a point in the region. For the circle problem, we can use one uniform random number to choose an angle between 0 and 2 PI, the other to choose a radius. (The radius must be chosen in an appropriate way to guarantee uniformity, however.) Thus, every time we input two uniform random values, we get a pair (R,T) that corresponds to a point in the circle.

The acceptance-rejection method can be simple to program, and can handle arbitrary regions. The direct mapping method is less sensitive to variations in the aspect ratio of a region and other irregularities. However, direct mappings are only known for certain common mathematical shapes.

Points may also be generated according to a nonuniform density. This creates an additional complication in programming. However, there are some cases in which it is possible to use direct mapping to turn a stream of scalar uniform random values into a set of multivariate data that is governed by a normal distribution.

Another way to generate points replaces the uniform pseudorandom number generator by a quasirandom number generator. The main difference is that successive elements of a quasirandom sequence may be highly correlated (bad for certain Monte Carlo applications) but will tend to cover the region in a much more regular way than pseudorandom numbers. Any process that uses uniform random numbers to carry out sampling can easily be modified to do the same sampling with a quasirandom sequence like the Halton sequence, for instance.

The library includes a routine that can write the resulting data points to a file.

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Languages:

RANDOM_DATA is available in a C++ version and a FORTRAN90 version and a MATLAB version.

Related Data and Programs:

ASA183, a MATLAB library which implements the Wichman-Hill pseudorandom number generator.

BALL_GRID, a MATLAB library which computes grid points that lie inside a ball.

CIRCLE_GRID, a MATLAB library which computes grid points that lie inside a circle.

DISCRETE_PDF_SAMPLE, a MATLAB program which demonstrates how to construct a Probability Density Function (PDF) from a table of sample data, and then to use that PDF to create new samples.

RBOX, a C program which produces random data from a number of regions.

RSITES, a C++ program which produces random data in an M-dimensional box.

SIMPLEX_COORDINATES, a MATLAB library which computes the Cartesian coordinates of the vertices of a regular simplex in M dimensions.

TETRAHEDRON_GRID, a MATLAB library which computes a tetrahedral grid of points.

TETRAHEDRON_MONTE_CARLO, a MATLAB program which uses the Monte Carlo method to estimate integrals over a tetrahedron.

TETRAHEDRON_SAMPLES, a dataset directory which contains examples of sets of sample points from the unit tetrahedron.

TRIANGLE_GRID, a MATLAB library which computes a triangular grid of points.

TRIANGLE_HISTOGRAM, a MATLAB program which computes histograms of data on the unit triangle.

TRIANGLE_MONTE_CARLO, a MATLAB program which uses the Monte Carlo method to estimate integrals over a triangle.

TRIANGLE_SAMPLES, a dataset directory which contains examples of sets of sample points from the unit triangle.

UNIFORM, a MATLAB library which samples the uniform random distribution.

XYZ_DISPLAY, a MATLAB program which reads XYZ information defining points in 3D, and displays an image in the MATLAB graphics window.

XYZ_DISPLAY_OPENGL, a C++ program which reads XYZ information defining points in 3D, and displays an image using OpenGL.

Reference:

  1. Milton Abramowitz, Irene Stegun,
    Handbook of Mathematical Functions,
    National Bureau of Standards, 1964,
    ISBN: 0-486-61272-4,
    LC: QA47.A34.
  2. James Arvo,
    Stratified sampling of spherical triangles,
    Computer Graphics Proceedings, Annual Conference Series,
    ACM SIGGRAPH '95, pages 437-438, 1995.
  3. Gerard Bashein, Paul Detmer,
    Centroid of a Polygon,
    in Graphics Gems IV,
    edited by Paul Heckbert,
    AP Professional, 1994,
    ISBN: 0123361559,
    LC: T385.G6974.
  4. Paul Bratley, Bennett Fox, Linus Schrage,
    A Guide to Simulation,
    Second Edition,
    Springer, 1987,
    ISBN: 0387964673,
    LC: QA76.9.C65.B73.
  5. Russell Cheng,
    Random Variate Generation,
    in Handbook of Simulation,
    edited by Jerry Banks,
    Wiley, 1998,
    ISBN: 0471134031,
    LC: T57.62.H37.
  6. Jack Dongarra, Jim Bunch, Cleve Moler, Pete Stewart,
    LINPACK User's Guide,
    SIAM, 1979,
    ISBN13: 978-0-898711-72-1,
    LC: QA214.L56.
  7. John Halton,
    On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals,
    Numerische Mathematik,
    Volume 2, Number 1, December 1960, pages 84-90.
  8. John Halton, GB Smith,
    Algorithm 247: Radical-Inverse Quasi-Random Point Sequence,
    Communications of the ACM,
    Volume 7, Number 12, December 1964, pages 701-702.
  9. John Hammersley,
    Monte Carlo methods for solving multivariable problems,
    Proceedings of the New York Academy of Science,
    Volume 86, 1960, pages 844-874.
  10. Ladislav Kocis, William Whiten,
    Computational Investigations of Low-Discrepancy Sequences,
    ACM Transactions on Mathematical Software,
    Volume 23, Number 2, June 1997, pages 266-294.
  11. Pierre LEcuyer,
    Random Number Generation,
    in Handbook of Simulation,
    edited by Jerry Banks,
    Wiley, 1998,
    ISBN: 0471134031,
    LC: T57.62.H37.
  12. Albert Nijenhuis, Herbert Wilf,
    Combinatorial Algorithms for Computers and Calculators,
    Second Edition,
    Academic Press, 1978,
    ISBN: 0-12-519260-6,
    LC: QA164.N54.
  13. Reuven Rubinstein,
    Monte Carlo Optimization, Simulation and Sensitivity of Queueing Networks,
    Krieger, 1992,
    ISBN: 0894647644,
    LC: QA298.R79.
  14. Peter Shirley,
    Nonuniform Random Point Sets Via Warping,
    in Graphics Gems III,
    edited by David Kirk,
    Academic Press, 1992,
    ISBN: 0124096735,
    LC: T385.G6973
  15. Greg Turk,
    Generating Random Points in a Triangle,
    in Graphics Gems I,
    edited by Andrew Glassner,
    AP Professional, 1990,
    ISBN: 0122861663,
    LC: T385.G697
  16. Daniel Zwillinger, editor,
    CRC Standard Mathematical Tables and Formulae,
    30th Edition,
    CRC Press, 1996,
    ISBN: 0-8493-2479-3,
    LC: QA47.M315.

Source Code:

Examples and Tests:

The sample calling program generates sets of points:

A file of commands is provided to simplify the use of PLOT_POINTS:

PLOT_POINTS makes Encapsulated PostScript images of the points, in cases where the data is 2 dimensional. The CONVERT command was used to convert these to PNG files.

You can go up one level to the MATLAB source codes.


Last revised on 24 August 2010.