This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.



Re: [Fwd: NaNs in the GSL]


Here's my problem: NaNs. Most real-world data that one would interrogate
is filled with them. The typical stats package has a global switch
named something like rm_NaNs; if rm_NaNs==0, then most functions (min, max,
variance, et cetera) will return NaN if any element of the input is NaN, and
if rm_NaNs==1, then these functions auto-prune, by prepending every use
of x with something like
    if (!rm_NaNs || !gsl_isnan(x)) {
        /* use x */
    }

So has the GSL team considered including such a flag in the GSL? As above,
fixing the code in most cases would be a trivial one-line insertion,
but are there other reasons for not adding a global gsl_rm_NaNs variable?

Hello,


Interesting subject. In GSL we generally do not interpret NaN as a missing value, because of the risk of confusing the two meanings. In GSL a NaN always indicates a numerical error, and it should always be propagated so that it is not lost or hidden in some way.
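As a small illustration of that propagation (a sketch only; plain_mean is a hypothetical helper written for this example, not a GSL function), a mean computed with no special NaN handling returns NaN as soon as one input element is NaN:

```c
#include <assert.h>
#include <math.h>    /* NAN and isnan() for callers of plain_mean */
#include <stddef.h>

/* Hypothetical helper, not part of GSL: arithmetic mean with no special
   treatment of NaN, so any NaN in the input reaches the result. */
double plain_mean(const double *data, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(data != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += data[i];          /* NaN plus anything is NaN */
    return sum / (double) n;
}
```

A caller can then test the result with isnan() and treat a NaN as a sign that something went wrong upstream, rather than as a value to be silently skipped.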

I have looked at adding support for an "NA" (not available) value in the past, as in R and Octave, but decided against it. ("NA" is a NaN with a specific bit pattern in an otherwise unused part of the IEEE fields.) The problems with it were:

1) It is a non-standard usage. This creates problems when operations on NA and NaN values convert between the two, e.g. x+NA could come out as either NaN or NA.

2) It only works for floating point, whereas we'd really want a uniform interface for all types. For integers R uses INT_MIN as NA, but that is not really an option for GSL.

I think these limitations are less of a problem in an application like R or Octave, where all the data is under the control of the environment itself, but they make the approach unsuitable for a general C library.
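To make the first limitation concrete, here is a rough sketch of an "NA" encoded as a NaN with a marker in its low bits. The names make_na and is_na and the payload value are assumptions made for this illustration; they are not GSL functions, and the exact bit pattern used by R or Octave may differ.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption: an arbitrary marker value chosen for this sketch. */
#define NA_PAYLOAD UINT64_C(1954)

/* Build a quiet NaN whose low bits carry NA_PAYLOAD. */
double make_na(void)
{
    uint64_t bits = UINT64_C(0x7FF8000000000000) | NA_PAYLOAD;
    double x;

    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Return 1 only for NaNs that carry NA_PAYLOAD in the low 32 bits. */
int is_na(double x)
{
    uint64_t bits;

    if (!isnan(x))
        return 0;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0xFFFFFFFF)) == NA_PAYLOAD;
}
```

Both an ordinary NaN and this NA satisfy isnan(); only is_na() tells them apart. Whether the result of make_na() + 1.0 still satisfies is_na() depends on the hardware and compiler, since preserving the payload through arithmetic is not guaranteed. That is the ambiguity described in point 1.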

In terms of adding support for missing values in GSL, I can see one way that would fit with another missing feature, namely online updating of statistics. If there were functions for online updating of means, standard deviations, etc. from individual data points, the user could control which values were passed or discarded, at the cost of some function call overhead. The alternative would be to pass an additional user-defined selection function as an argument, as in the n-tuples module, which could be used to drop selected values (and could also be useful for trimming tails, etc.).
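Both ideas can be sketched in a few lines of C. The following is only an illustration under stated assumptions: the running_stats type and its functions are hypothetical names, not an existing GSL interface, and the update rule used is Welford's method, which is one common choice for online means and variances. The selection callback is loosely modelled on the idea of a user-supplied selection function and is also an assumption.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical accumulator for online statistics; not part of GSL. */
typedef struct {
    size_t n;      /* number of values accepted so far */
    double mean;   /* running mean */
    double m2;     /* running sum of squared deviations from the mean */
} running_stats;

void running_stats_init(running_stats *rs)
{
    assert(rs != NULL);
    rs->n = 0;
    rs->mean = 0.0;
    rs->m2 = 0.0;
}

/* Add one data point (Welford's update). */
void running_stats_add(running_stats *rs, double x)
{
    double delta;

    rs->n += 1;
    delta = x - rs->mean;
    rs->mean += delta / (double) rs->n;
    rs->m2 += delta * (x - rs->mean);
}

double running_stats_mean(const running_stats *rs)
{
    return rs->n > 0 ? rs->mean : NAN;
}

/* Sample variance (divides by n - 1); NaN if fewer than two values. */
double running_stats_variance(const running_stats *rs)
{
    return rs->n > 1 ? rs->m2 / (double) (rs->n - 1) : NAN;
}

/* Alternative: a user-defined selection function decides which values
   are passed on. It returns nonzero to keep x. */
typedef int (*select_fn)(double x, void *params);

void running_stats_add_selected(running_stats *rs, const double *data,
                                size_t len, select_fn keep, void *params)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (keep(data[i], params))
            running_stats_add(rs, data[i]);
}

/* Example policy: treat NaN as "missing" and drop it. */
int keep_not_nan(double x, void *params)
{
    (void) params;   /* unused in this policy */
    return !isnan(x);
}
```

With this layout the library never has to decide what a NaN means. A caller who wants missing-value semantics passes keep_not_nan (or calls running_stats_add only for the values they want), while a caller who wants NaNs propagated adds every value unconditionally. A different selection function could trim tails by rejecting values outside a range passed through params.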

--
Brian Gough

Network Theory Ltd,
Publishing the GSL Manual - http://www.network-theory.co.uk/gsl/manual/

