This is the mail archive of the
gsl-discuss@sources.redhat.com
mailing list for the GSL project.
Re: GSL 1.6 random numbers (gauss.c and rng.c)
- From: James Theiler <jt at lanl dot gov>
- To: Charles Karney <ckarney at sarnoff dot com>
- Cc: gsl-discuss <gsl-discuss at sources dot redhat dot com>
- Date: Wed, 26 Jan 2005 13:54:17 -0700 (MST)
- Subject: Re: GSL 1.6 random numbers (gauss.c and rng.c)
On Wed, 26 Jan 2005, Charles Karney wrote:
] > From: James Theiler <jt@lanl.gov>
] > Subject: Re: GSL 1.6 random numbers (gauss.c and rng.c)
] > Date: Wed, 26 Jan 2005 10:52:03 -0700 (MST)
] >
] > ] Now, I've got some comments on
] > ]
] > ] gsl_ran_gaussian_ratio_method
] > ] gsl_rng_uniform_int
] > ]
] > ] in GSL 1.6
] > ]
] > ] (a) [MINOR NIT] In gsl_ran_gaussian_ratio_method, the last digit of the
] > ] constant for sqrt(8/e) is wrong. Worse still, the number you have
] > ] is too small. Rounding the result UP still gives the correct
] > ] results, however (at the expense of a few extra rejects). So you
] > ] might replace
] > ]
] > ] x = 1.71552776992141359295 * (v - 0.5) / u;
] > ] by
] > ] x = 1.715527769921413593 * (v - 0.5) / u;
] > ] or
] > ] x = 1.71552777 * (v - 0.5) / u;
] > ]
] > ] In my implementation I round up after additional digits:
] > ]
] > ] 1.71552776992141359296037928256
] >
] > Hmmm, if x were used solely as the rejection criterion, then I would
] > be inclined to agree that there is no harm to accuracy (and utterly
] > negligible harm to efficiency) to use a rounded-up value for the
] > constant.
] >
] > But it's a little more complicated since x is also used as part of the
] > returned variate. You may still be right that it doesn't matter,
] > maybe everything just scales up, but it's not obvious to me.
]
] If you check the description of the algorithm in Knuth or the original
] paper, you will see that you are just trying to sample a point uniformly
] in a funny shaped region. This is done by choosing a point in a
] rectangle enclosing the region and rejecting the point unless it's
] inside the region. The rectangle [0,1] x [-sqrt(2/e), sqrt(2/e)] snugly
] encloses the region. However [0,1] x [-1,1] would also "work" at some
] cost in efficiency and precision. Leva uses 1.7156 for his multiplier.
Thanks. With that explanation, it _is_ obvious.
And on those grounds, I like using 1.7156; someday you might care
about statisical precision to one part in 10^16, but you'll never
care about efficiency differences of one part in ten thousand.
]
] > ] (b) [EFFICIENCY] Leva gives quadratic tests to perform in
] > ] gsl_ran_gaussian_ratio_method which result in log being evaluated
] > ] 0.012 per normal deviate instead of 1.369 times. (Knuth's tests
] > ] result in 0.232 logs/deviate). Depending on the relative speed of
] > ] log vs other arithmetic operations, this might speed up
] > ] gsl_ran_gaussian_ratio_method significantly. The reference is
] > ]
] > ] J. L. Leva, A Fast Normal Random Number Generator,
] > ] ACM TOMS 18, 449-453 and 454-455 (1992).
] >
] > Worth looking into. In my experience, logs are pretty fast on machines
] > I use (eg, Pentiums), so that the actual gains in reducing their
] > number are often not as substantial as one might hope. But I haven't
] > done any benchmarks lately, and besides, if this is a better
] > algorithm, we should at least look into it.
]
] I had mixed results on a Pentium with gcc. My previous implemention
] used Knuth's bounds. Compiling with -ffast-math, Leva gives a 2.5%
] speedup. Without -ffast-math, I got a 18% speedup. (As I recall, the
] Knuthian bounds gave a 25% speedup over the raw ratio method.)
]
] Note that the coefficients in Leva's quadratic bounds are only given to
] a few sig. figs. This is OK -- see discussion above about the the
] sqrt(8/e) const. The only important thing is that the FINAL exit test x
] * x <= -4.0 * log (u) is accurate.
]
] One other comment on both gsl_ran_gaussian_ratio_method and
] gsl_ran_gaussian. I think you would be better off using
] gsl_rng_uniform_pos consistently in both routines. This has the
] property that (gsl_rng_uniform_pos - 0.5) is symmetrically distributed
] about zero. With gsl_rng_uniform, the gaussian routines return negative
] numbers slightly more often than positive ones. (A tiny effect -- but
] better to fix it now than to worry about simulation results 5 years down
] the road.)
Good point; I agree.
]
] > ] (c) [GENERALIZE?] In gsl_rng_uniform_int you quit if n > range. However
] > ] within that test you could allow the case n-1 == range. Thus
] > ]
] > ] if (n > range)
] > ] {
] > ] if (n - 1 == range)
] > ] return ((r->type->get) (r->state)) - offset;
] > ] GSL_ERROR_VAL ("n exceeds maximum value of generator",
] > ] GSL_EINVAL, 0) ;
] > ] }
] > ]
] > ] (Of course if range = maxint - 1, n would not be representable...)
] > ]
] >
] > I have to admit, I haven't looked at this particular function before.
] > It is confusing to me; it looks like range is defined in a funny way;
] > eg if min=0, max=99, shouldn't the range be 100 ? Or,
] > if we are going to define range=max-min, then we
] > should define the scale differently
] >
] > unsigned long int scale = (range + 1)/n;
] >
] > where the "+ 1" is added to what we have now. Then the rest proceeds
] > correctly and even more efficiently.
] >
] > ...except when r->type->max is the maximum unsigned long (ie,
] > 0xffffffffUL), which is the case for a lot of the generators. Then
] > the "+ 1" causes an overflow.
]
] I suspect that this was merely to avoid overflow. Here's my independent
] C++ implemention of this functionality where I encountered the same
] problem. The comments tell the story...
]
] Here MM = 2^30 and UInt30() is Knuth's recommended RNG (returning a
] result in [0, MM))
]
] // A random number in [0, 2^32).
] unsigned long Random::Integer() {
] // xor 2 30-bit numbers to give 32-bit result
] return (UInt30() << 2) ^ UInt30();
] }
]
] // Return random number in [0,n)
] unsigned long Random::Integer(const unsigned long n) throw() {
] if (n == 0)
] // Special case, equivalent to Integer(2^32). (Otherwise n == 0
] // is meaningless.)
] return Integer();
] else if (n == 1)
] // Saves using up a random number.
] return 0;
] else {
] // Do we need 32-bit randoms as our raw material?
] const bool big = n > MM;
] // The maximum random that we need. We'd like the first number
] // to be 2^32, but we can't deal with such large numbers. The
] // following will give the correct results at a slight cost in
] // efficiency. (The case n = 2^31 will need two trips through
] // the loop instead of one.)
] const unsigned long q = big ? 0xffffffffUL : MM;
] const unsigned long r = q / n; // r > 0
] const unsigned long m = r * n; // 0 < m <= q
] unsigned long u; // Find a random number in [0,m).
] do {
] // For small n, this is executed once (since m is nearly q).
] // Worst cases are n = MM/2 + 1 and 2*MM, in which case the
] // loop is executed slightly less than twice on average.
] u = big ? Integer() : UInt30();
] } while (u >= m);
] // Since m is r * n, u/r is unbiased random number in [0,n). An
] // unbiased alternative is to use u % n. However, division is
] // preferred, as it uses the "more random" leading bits in u.
] // This also makes Bool() and Integer(2) equivalent.
] return u / r;
] }
] }
Nicely written.
I like the use of 0 to indicate 2^32.
]
] >
] > --
] > James Theiler Space and Remote Sensing Sciences
] > MS-B244, ISR-2, LANL Los Alamos National Laboratory
] > Los Alamos, NM 87545 http://nis-www.lanl.gov/~jt
]
] One final remark. knuthran.c is out of date! See
]
] http://www-cs-faculty.stanford.edu/~knuth/news02.html#rng
] http://www-cs-faculty.stanford.edu/~knuth/programs/rng.c
]
] There are two significant changes:
]
] (1) The generator is "warmed up" more thoroughly. (This avoids problems
] with correlations between random streams with adjacent seeds.)
]
] (2) Only 100 out of every 1009 random numbers are used. (This is needed
] so that the "birthday test" is satisfied.)
]
]
This is good to know.
jt
--
James Theiler Space and Remote Sensing Sciences
MS-B244, ISR-2, LANL Los Alamos National Laboratory
Los Alamos, NM 87545 http://nis-www.lanl.gov/~jt