This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: True Random Numbers

From: "Robert G. Brown" <rgb at phy dot duke dot edu>
To: C J Kenneth Tan -- OptimaNumerics <cjtan at OptimaNumerics dot com>
Cc: Rodney Sparapani <rsparapa at post dot its dot mcw dot edu>, hobbsk at ohiou dot edu,gsl-discuss at sources dot redhat dot com
Date: Mon, 15 Nov 2004 10:47:11 -0500 (EST)
Subject: Re: True Random Numbers
References: <200411052017.iA5KHGO07874@post.its.mcw.edu><Pine.LNX.4.61.0411150822420.4627@krylov.OptimaNumerics.com>

On Mon, 15 Nov 2004, C J Kenneth Tan -- OptimaNumerics wrote:

> 
> 
> On 2004-11-05 14:17 -0600 Rodney Sparapani (rsparapa@post.its.mcw.edu) wrote:
> 
> > I guess it is possible.  I think the main problem with true random numbers is 
> > that you can't reproduce them as needed.  That's why pseudo-random numbers are 
> > so popular.  As long as you keep track of your seed, then you can replay the 
> > sequence when necessary.
> > 
> > >Is it possible, or easy, to use the GSL random number distributions with
> > >true random number devices?  I'm thinking of a generator that really
> > >just reads from /dev/random or a device driver for one of the really
> > >fancy hardware random number devices.
> 
> Another problem is that you cannot verify the quality of random
> numbers obtained from physical sources.  It has also be shown by
> George Marsaglia that physical sources which may be expected to be
> "random" are actually not good sources of randomness.  This was
> covered also in Knuth's "Art of Computer Programming".

An excellent point; this is one of the reasons that I've been working on
the testing program.  A member of our department has discovered an
optical process that he believes might generate truly random bits, and
talked to me about how he might go about testing the output of his
prototype device for randomness.  Interestingly, although the process is
(supposedly) random, the noise it produces is not white -- the signal
has a distribution.  There are ways of taking a random signal that isn't
uniform and transforming it into one that is first order uniform (that
is, has the right 0/1 bit balance) but a) the transformation costs two
bits to make one bit, so it slows the generation process; and b) it
leaves open the point as to whether >>higher<< order bit correlations
are correctly (randomly) distributed, e.g. do 00, 01, 10, 11 occur in
the right proportions?  How about 000, 001, 010, etc?  Many of the
supposedly "random" physical processes that generate the bits have (if
you look closely) the right first order correlations but the wrong
second or higher order correlations.  Even supposedly purely random
quantum processes such as photon counting from a fluorescent source can
fail, because while single photon emission from an atom known to be
precisely in the excited state is random enough, photon emission in
fluorescence is antibunched.  Not even quantum processes are TRULY
random; at best they are unpredictable.

Testing generators is a highly nontrivial process.  As I am rewriting
Marsaglia's diehard and the NIST STS tests (from their descriptions in
his documentation and/or the literature, as most of them come from the
literature) fairly actively into a GPL shell tightly integrated with the
GSL) it is becoming clear that even these tests are not enough.  Most of
the diehard implementations of these tests date from when an IBM PC was
an adequate testing platform and/or when the actual tests were run on an
IBM mainframe from compiled OLD fortran code.  They were designed to be
run on sequences of a few thousand rands -- the biggest tests on a few
million -- read in from a file.  As I scale some of the tests up, I'm
learning things about even the "good" GSL generators that suggest that
they aren't "perfect" -- in many cases it may not be whether a test
fails diehard tests, but at what POINT it fails a diehard test scaled up
to a level appropriate to test a generator that might have to produce
10^18+ rands over the course of a single computation (where samples of
only a few thousand or even million only represent a tiny fraction of
the sequence).

Anyway, I spent a LOT of hours working on rand_rate over the weekend,
and added three diehard tests (runs, birthdays, and 3dsphere) to the
small list of bitlevel tests I already had implemented.  This required
that I write a couple of Kolmogorov-Smirnov tests for the uniformity of
the p's produced in these tests over independent runs (the regular test
for max|observed - expected| in the cumulative sorted list and the
Kuiper variant that sums the largest positive deviation and the largest
negative deviation that treats the tails better).  I still need to do
Anderson-Darling (which is what Marsaglia actually used) and also need
to do all the variants that test a vector supposedly drawn from an
arbitrary known distribution against the actual distribution.

Today I'll probably add the diehard "minimum distance test" which is
really "2D spheres" (same test as 3d spheres in 2D).  These tests make
me a bit nervous because the test (target) distributions aren't
necessarily known analytically and the test statistic therefore contains
a number that is itself evaluated by "extensive simulation".  This begs
a variety of interesting questions about whether these tests CAN be
safely cranked up -- at some point one expects that a modern "good"
generator such as the mt19937 will end up revealing the failure of the
older generators used to numerically determine the target distributions
used in the test code and not the other way around...

I added a major.minor version number e.g. [0].[4.31] to the project
abstract link for rand_rate so people who are playing with this along
with me can see when I add a test (incrementing minor by 0.1), fix an
important bug (incrementing minor by 0.01), or do a major structural
rewrite to incorporate ad hoc tools into a uniform interface available
to all the tests (bump minor by 1).  I probably won't bump major until I
"finish" incorporating all the existing diehard and sts tests.

I've also got a few other ideas for extending the already tight
integration between the tests and the GSL, including implementing a
"distribution test" sequence that basically generates a random sample
drawn from a GSL-supported distribution and performs a KS test on it,
for a given/selected GSL rng.  This is actually a very simple test of
both the GSL routine generating the distribution and the underlying rng
(most of the tests of rngs are of exactly this structure, but for some
exotic distributions).  They are in the abstract.  The man page and
usage documentation will likely lag development for a while as I add
tests, BTW.

If anybody else is interested in participating in this, please let me
know.  I've got one more major structural rewrite planned -- the need
for it has become evident as I implemented the diehard tests, which use
a slightly different validation process (using e.g. KS) compared to the
STS tests (that typically do not use KS).  The planned interface will
permit one to use a variety of ways of looking at result histograms of
pvalues to determine if a low p is just Marsaglia's "p happens" or is a
real signal of rng weakness.  Of course, by the time I'm done it will be
a true miracle if a weak rng fails only one test of the battery that
will be available, and since each test has a variety of adjustable
parameters (where possible or appropriate) in the rand_rate frame
already one can also crank the test up until a failure that is marginal
at the existing diehard or sts levels is glaringly apparent.  I've had
great fun doing that with some of the tests already implemented with
known-bad tests such as randu, which can actually not obviously fail
some of the diehard or sts tests at their existing parameterization,
sometimes, but which will generally fail spectacularly if pushed at a
higher level.

    rgb

> 
> 
> Kenneth Tan
> -----------------------------------------------------------------------
> News: OptimaNumerics Increases Performance of World's Fastest IBM
>       eServer BladeCenter JS20 Supercomputer!
> -----------------------------------------------------------------------
> C. J. Kenneth Tan, Ph.D.
> OptimaNumerics Ltd.
> E-mail: cjtan@OptimaNumerics.com      Telephone: +44 798 941 7838    
> Web: http://www.OptimaNumerics.com    Facsimile: +44 289 066 3015 
> -----------------------------------------------------------------------
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

Follow-Ups:
- Re: True Random Numbers
  - From: C J Kenneth Tan -- OptimaNumerics

References:
- Re: True Random Numbers
  - From: Rodney Sparapani
- Re: True Random Numbers
  - From: C J Kenneth Tan -- OptimaNumerics

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]