This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: gsl container designs


On Wed, 6 Jan 2010, Gerard Jungman wrote:

In fact, I don't even see a need to sacrifice efficiency. I think

I recall from my benchmarking days that -- depending on compiler -- there is a small dereferencing penalty for packed matrices (vectors packed into dereferencing **..* pointers) compared to doing the offset arithmetic via brute force inline or via a macro. I ran two different implementations of stream (one with direct pointer traversal of the vectors, one with vector offsets repacked into **matrices to do the tests. The size of this penalty did decrease as gcc advanced; I haven't run the benchmark recently and don't know how large it currently is. It was never so large that it stopped me from using repacked pointers for code clarity.

this goal is directly achievable at essentially optimal efficiency.
My attitude about C is that it is all about lining up the bytes.
A good C design is just some syntactic layer on top of a
rudimentary layer, where-in you are careful to line up all
the bytes. After that, optimal efficiency is pretty much
automatic. C is very friendly, once you are careful to
line everything up. I hope you see my small but non-vacuous
technical meaning.

Agreed. And I can see no reason that modern compilers may not have eliminated most of the dereferencing penalty, especially if one aligns things carefully. However, some of my beowulfish friends who work(ed) for compiler companies argued that one of Fortran's longstanding advantages in numerical code efficiency is simply because of the fact that matrices in Fortran are written in stone and so compiler writers can optimize the hell out of them. C pointers, OTOH -- well, one can make triangular arrays, rectangular arrays, one can start the indices at 0 or 1 or -100 or 47 as you like and run to wherever you like. How can anyone pre-optimize a compiler to handle all of that?

Truthfully, I can't see why they CAN'T, but I'm not a compiler writer
and empirically there has been a small penalty between C in general and
fortran and C with direct addressing and C with dereferenced addressing
as well.  IIRC there is one more layer of address resolution in indirect
dereferencing, which is the fundamental origin of the latter.

What I want is extremely simple:

* A way to construct/allocate a matrix of arbitrary dimension (out to some
(un)reasonable max, e.g. 10

Check.



but I want to be able to use it
just like m[i][j][k]...[r][s] when all is said and done.

This is the only part which I wonder about. I thought about this before and came to the conclusion that it is not easy to get right. Probably possible, but very tricky.

I've written code to do so many times -- Numerical Recipes provides the basic idea in its matrix-packing routines. Simply allocate e.g.

double **..*m,*v;

Then malloc v long enough to hold the actual data.  Then malloc m to
hold the first index, malloc vectors long enough to hold the second
index and assign them to these indices in a first-index loop (recurse
until the next to last index) and finally pack the appropriately offset
addresses from the actual vector v into the next to last index so that
the last index actually traverses v.  In this scheme the last index is
always the linear/fast index in the actual data vector, the other
indices are all there to permit the compiler to handle the
dereferencing.

To free the matrix, one has to work backwards -- free each successive
index one malloc'd from the right back to the left (you need the
leftmost indices to know the addresses to free at each level), and
finally free m and v themselves.  All very tedious and mechanical, but
all fully automatable and debuggable as well.  The biggest problem with
NR is that each type has an individual allocation/free command, and it
only goes to 2 dimensions.  To make a single routine pair handle more
dimensions and arbitrary types is more work and requires void typing and
casting (and/or switches to handle the unique aspects of individual
tyees internally).

The most annoying aspect of this is that these little problems
are trivial in C++. But maybe just beyond the capability of C.

How could that be? People write operating systems in C. I hardly think that allocating sensible matrices is beyond it.

I'm not trying to be argumentative or flame, by the way.  Could you be
more explicit about why packing pointers to permit the dereferencing of
an arbitrary matrix is technically difficult (as opposed to merely
tedious) in C and would be easier in C++ (which I thought discouraged
the use of pointers altogether)?

* The specific dimension bounds need to be user specified.  If I want to
make a 6 dimensional triple triangular array to hold e.g. 3j symbols, I
don't want to have to allocate a rectangular array most of which is
empty wasted space.

See the earlier comment by Tuomo Keskitalo, about banded/packed storage. Not something I have thought about yet, but probably not hard.

I'm not sure about "banded", but packed is (as noted) simple. It is just a matter of picking the lengths of the successive ***, **, * length vectors one mallocs to fill the toplevel e.g. **** pointer, plus the use of appropriate offsets. In particular, NR contains examples of how to do e.g.:

**dmatrix(l1,u1,l2,u2)

for arbitrary lower/upper indices in the two slots (as well as the
associated free) and it is straightforward to generalize the basic
algorithm to higher dimensions and allow for arbitrary data types.  I
don't think twice about allocating a vector of structs of more or less
arbitrary length (including structs that contain vectors or matrices
themselves) and dereferencing it via a packed, cast or typed, matrix.

This is one of the ADVANTAGES of C -- one can just do what one wants to
do in the most natural way via pointers, mallocs and suitable typedefs
and/or structs.  Without a safety net, admittedly, and you're
responsible for both code integrity and efficiency with little compiler
help, but OTOH you can quite easily visualize what the underlying
assembler is going to (approximately) look like, so one CAN tweak it in
reasonable ways for efficiency without resorting to actual assembler.

* Access to the underlying actual data vector (non-opaque, in other
words) so it can be passed to e.g. ODE solvers and the like that can
only be sensibly written in a general way for a vector.

Check.



My problem in this regard with the GSL is that so far, the matrix and
vector constructs are pretty much useless -- they are far away from the
standard way one usually does this sort of thing in C using pointers or
just allocating a vector or matrix.

I find them useless as well, for exactly these reasons.



One of the ADVANTAGES of C is that it permits one to work pointer magic
to improve the readability (and sometimes execution efficiency) of code.
Opaque types and set/get calls are dark evil in both regards.

Right. The layer which defines the "types" should be essentially syntactic, just there to make it easier to organize/understand your code. The underlying implementation should be a brutally simple pile of bytes, and it should be visible.

Amen, my brother. C is a THIN veneer of upper-level language
sensibilities on top of raw assembler, and its ability to work "close to
the metal" is its power. This power should be preserved as much as
possible in the GSL, in the process of providing a human readable
interface. I actually wrote all of this, in a more or less GSL--like
form, several years ago. Some of the code was ugly, but it worked
charmlike, at the cost of requiring a cast of a void type to tell the
compiler how to dereference the outermost (actual) vector index.
Internally, all the dereferencing pointer arithmetic is more or less the
same for a void (universal) type, but the pointer into the actual data vector has to be advanced with the correct stride, and the compiler has to know what that stride should be.


I can probably dig this code out and post it here (or post a link to it)
if you want to look at it.  I think I went up to 9 or 10 dimensions (way
more than I needed) and did a lot of things via brute force to make the
purely mechanical recursion obvious; I planned an eventual rewrite into
a more compact form that I never got around to.

rgb



--
G. Jungman



Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]