This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: GSL 1.6 random numbers (gauss.c and rng.c)

From: James Theiler <jt at lanl dot gov>
To: Charles Karney <ckarney at sarnoff dot com>
Cc: gsl-discuss <gsl-discuss at sources dot redhat dot com>
Date: Wed, 26 Jan 2005 13:54:17 -0700 (MST)
Subject: Re: GSL 1.6 random numbers (gauss.c and rng.c)
On Wed, 26 Jan 2005, Charles Karney wrote:

]  > From: James Theiler <jt@lanl.gov>
]  > Subject: Re: GSL 1.6 random numbers (gauss.c and rng.c)
]  > Date: Wed, 26 Jan 2005 10:52:03 -0700 (MST)
]  > 
]  > ] Now, I've got some comments on
]  > ] 
]  > ]     gsl_ran_gaussian_ratio_method
]  > ]     gsl_rng_uniform_int
]  > ] 
]  > ] in GSL 1.6
]  > ] 
]  > ] (a) [MINOR NIT] In gsl_ran_gaussian_ratio_method, the last digit of the
]  > ]     constant for sqrt(8/e) is wrong.  Worse still, the number you have
]  > ]     is too small.  Rounding the result UP still gives the correct
]  > ]     results, however (at the expense of a few extra rejects).  So you
]  > ]     might replace
]  > ] 
]  > ]       x = 1.71552776992141359295 * (v - 0.5) / u;
]  > ]     by
]  > ]       x = 1.715527769921413593 * (v - 0.5) / u;
]  > ]     or
]  > ]       x = 1.71552777 * (v - 0.5) / u;
]  > ] 
]  > ]     In my implementation I round up after additional digits:
]  > ] 
]  > ]       1.71552776992141359296037928256
]  > 
]  > Hmmm, if x were used solely as the rejection criterion, then I would
]  > be inclined to agree that there is no harm to accuracy (and utterly
]  > negligible harm to efficiency) to use a rounded-up value for the
]  > constant.
]  > 
]  > But it's a little more complicated since x is also used as part of the
]  > returned variate.  You may still be right that it doesn't matter,
]  > maybe everything just scales up, but it's not obvious to me.
] 
] If you check the description of the algorithm in Knuth or the original
] paper, you will see that you are just trying to sample a point uniformly
] in a funny shaped region.  This is done by choosing a point in a
] rectangle enclosing the region and rejecting the point unless it's
] inside the region.  The rectangle [0,1] x [-sqrt(2/e), sqrt(2/e)] snugly
] encloses the region.  However [0,1] x [-1,1] would also "work" at some
] cost in efficiency and precision.  Leva uses 1.7156 for his multiplier.

Thanks.  With that explanation, it _is_ obvious.

And on those grounds, I like using 1.7156; someday you might care
about statisical precision to one part in 10^16, but you'll never
care about efficiency differences of one part in ten thousand.

] 
]  > ] (b) [EFFICIENCY] Leva gives quadratic tests to perform in
]  > ]     gsl_ran_gaussian_ratio_method which result in log being evaluated
]  > ]     0.012 per normal deviate instead of 1.369 times.  (Knuth's tests
]  > ]     result in 0.232 logs/deviate).  Depending on the relative speed of
]  > ]     log vs other arithmetic operations, this might speed up
]  > ]     gsl_ran_gaussian_ratio_method significantly.  The reference is
]  > ] 
]  > ]       J. L. Leva, A Fast Normal Random Number Generator,
]  > ]       ACM TOMS 18, 449-453 and 454-455 (1992).
]  > 
]  > Worth looking into. In my experience, logs are pretty fast on machines
]  > I use (eg, Pentiums), so that the actual gains in reducing their
]  > number are often not as substantial as one might hope.  But I haven't
]  > done any benchmarks lately, and besides, if this is a better
]  > algorithm, we should at least look into it.
] 
] I had mixed results on a Pentium with gcc.  My previous implemention
] used Knuth's bounds.  Compiling with -ffast-math, Leva gives a 2.5%
] speedup.  Without -ffast-math, I got a 18% speedup.  (As I recall, the
] Knuthian bounds gave a 25% speedup over the raw ratio method.)
] 
] Note that the coefficients in Leva's quadratic bounds are only given to
] a few sig. figs.  This is OK -- see discussion above about the the
] sqrt(8/e) const.  The only important thing is that the FINAL exit test x
] * x <= -4.0 * log (u) is accurate.
] 
] One other comment on both gsl_ran_gaussian_ratio_method and
] gsl_ran_gaussian.  I think you would be better off using
] gsl_rng_uniform_pos consistently in both routines.  This has the
] property that (gsl_rng_uniform_pos - 0.5) is symmetrically distributed
] about zero.  With gsl_rng_uniform, the gaussian routines return negative
] numbers slightly more often than positive ones.  (A tiny effect -- but
] better to fix it now than to worry about simulation results 5 years down
] the road.)

Good point; I agree.

] 
]  > ] (c) [GENERALIZE?] In gsl_rng_uniform_int you quit if n > range.  However
]  > ]     within that test you could allow the case n-1 == range.  Thus
]  > ] 
]  > ]       if (n > range) 
]  > ] 	{
]  > ] 	  if (n - 1 == range)
]  > ] 	    return ((r->type->get) (r->state)) - offset;
]  > ] 	  GSL_ERROR_VAL ("n exceeds maximum value of generator",
]  > ] 			    GSL_EINVAL, 0) ;
]  > ] 	}
]  > ] 
]  > ]     (Of course if range = maxint - 1, n would not be representable...)
]  > ] 
]  > 
]  > I have to admit, I haven't looked at this particular function before.  
]  > It is confusing to me; it looks like range is defined in a funny way;
]  > eg if min=0, max=99, shouldn't the range be 100 ? Or, 
]  > if we are going to define range=max-min, then we 
]  > should define the scale differently
]  > 
]  >       unsigned long int scale = (range + 1)/n;
]  > 
]  > where the "+ 1" is added to what we have now.  Then the rest proceeds
]  > correctly and even more efficiently.
]  > 
]  > ...except when r->type->max is the maximum unsigned long (ie,
]  > 0xffffffffUL), which is the case for a lot of the generators.  Then
]  > the "+ 1" causes an overflow.  
] 
] I suspect that this was merely to avoid overflow.  Here's my independent
] C++ implemention of this functionality where I encountered the same
] problem.  The comments tell the story...
] 
] Here MM = 2^30 and UInt30() is Knuth's recommended RNG (returning a
] result in [0, MM))
] 
] // A random number in [0, 2^32).
] unsigned long Random::Integer() {
]     // xor 2 30-bit numbers to give 32-bit result
]     return (UInt30() << 2) ^ UInt30();
] }
] 
] // Return random number in [0,n)
] unsigned long Random::Integer(const unsigned long n) throw() {
]     if (n == 0)
]         // Special case, equivalent to Integer(2^32).  (Otherwise n == 0
]         // is meaningless.)
]         return Integer();
]     else if (n == 1)
]         // Saves using up a random number.
]         return 0;
]     else {
]         // Do we need 32-bit randoms as our raw material?
]         const bool big = n > MM;
]         // The maximum random that we need.  We'd like the first number
]         // to be 2^32, but we can't deal with such large numbers.  The
]         // following will give the correct results at a slight cost in
]         // efficiency.  (The case n = 2^31 will need two trips through
]         // the loop instead of one.)
]         const unsigned long q = big ? 0xffffffffUL : MM;
]         const unsigned long r = q / n; // r > 0
]         const unsigned long m = r * n; // 0 < m <= q
]         unsigned long u;        // Find a random number in [0,m).
]         do {
]             // For small n, this is executed once (since m is nearly q).
]             // Worst cases are n = MM/2 + 1 and 2*MM, in which case the
]             // loop is executed slightly less than twice on average.
]             u = big ? Integer() : UInt30();
]         } while (u >= m);
]         // Since m is r * n, u/r is unbiased random number in [0,n).  An
]         // unbiased alternative is to use u % n.  However, division is
]         // preferred, as it uses the "more random" leading bits in u.
]         // This also makes Bool() and Integer(2) equivalent.
]         return u / r;
]     }
] }

Nicely written.
I like the use of 0 to indicate 2^32.


] 
]  > 
]  > -- 
]  > James Theiler            Space and Remote Sensing Sciences
]  > MS-B244, ISR-2, LANL     Los Alamos National Laboratory
]  > Los Alamos, NM 87545     http://nis-www.lanl.gov/~jt
] 
] One final remark.  knuthran.c is out of date!  See
] 
]     http://www-cs-faculty.stanford.edu/~knuth/news02.html#rng
]     http://www-cs-faculty.stanford.edu/~knuth/programs/rng.c
] 
] There are two significant changes:
] 
] (1) The generator is "warmed up" more thoroughly.  (This avoids problems
]     with correlations between random streams with adjacent seeds.)
] 
] (2) Only 100 out of every 1009 random numbers are used.  (This is needed
]     so that the "birthday test" is satisfied.)
] 
] 


This is good to know.

jt

-- 
James Theiler            Space and Remote Sensing Sciences
MS-B244, ISR-2, LANL     Los Alamos National Laboratory
Los Alamos, NM 87545     http://nis-www.lanl.gov/~jt
Follow-Ups:
- GSL 1.6: patch for gauss.c
  - From: Charles Karney
References:
- Re: GSL 1.6 random numbers (gauss.c and rng.c)
  - From: Charles Karney
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]