This is the mail archive of the guile@sources.redhat.com mailing list for the Guile project.



Re: OK, what about some resolution (Re: GUILE's GC - why we struggling to solve already solved problems?)


dirk@ida.ing.tu-bs.de writes:
> Hello together!

Good post! Let me just comment on a few small bits.

> * a garbage collector as Boehm's requires that pointers always look like
>   pointers, because only pointers that look like pointers to the collector
>   are treated as references to memory regions that have to be kept.  In
>   guile, currently pointers do not always look like pointers.  For
>   example, in gloc cells there is an offset added, thus obfuscating the
>   original pointer.  This may not be a problem, since the resulting value
>   still looks like a pointer _inside_ of the memory region.  Thus, in this
>   special case it can be assumed that we would be on the safe

Boehm's GC (BGC) deals with this: pointers into the interior of a block
keep the block alive. You can explicitly switch off this behavior, but
according to the comments, it only speeds things up if the blocks are
large.

Further, IIRC, glocs and structs are messy parts of guile that need
cleanup.

>   side.  However, a real problem is pointers on systems where _UNICOS is
>   defined, as the following excerpt from gc.h shows:

>   #ifdef _UNICOS
>   #  define SCM2PTR(x) ((SCM_CELLPTR) (SCM_UNPACK (x) >> 3))
>   #  define PTR2SCM(x) (SCM_PACK (((scm_bits_t) (x)) << 3))
>   #else
>   #  define SCM2PTR(x) ((SCM_CELLPTR) (SCM_UNPACK (x)))
>   #  define PTR2SCM(x) (SCM_PACK ((scm_bits_t) (x)))
>   #endif /* def _UNICOS */

I have the strong suspicion (given the amount of tweaking needed to get
typedef void * SCM into Scheme, and the various hacks I encounter)
that there is lots of code that does not follow this convention.  When
was the UNICOS port last seen alive?

> * guile requires that all of its cells are 8 byte aligned, because 
>   the lower 3 bits of the address are needed for type information.  When
>   replacing SCM_NEWCELL with a call to a standard-style malloc, then we
>   always have to over-allocate in order to be able to guarantee the 8 byte
>   alignment for the actual cell data.  If there is no support for this
>   kind of requirement, then with Boehm's collector we will always have a
>   memory usage overhead of at least 50% (assuming that the result of
>   malloc is 4 byte aligned we would have to allocate 12 instead of 8
>   bytes to be able to provide the required alignment.)

I think that BGC has separate pools for small objects (sizes from 1 up to
MAXOBJSZ words). I can't imagine that it would be difficult to have these
pools aligned. In fact, I'm pretty sure they are, otherwise my
quick-and-dirty hack would have grotesquely failed.

> * the fact that guile's cell heap and the system's general malloc heap
>   are separated gives us a variety of debugging possibilities.  For
>   example, the kind of checks enabled by compiling guile and extension
>   code with SCM_DEBUG_CELL_ACCESSES enabled _might_ become difficult to
>   provide with a system where there is simply no information about _where_
>   in memory cells may occur:  If cell memory is obtained via some malloc
>   function that is also used to obtain memory for other purposes, there
>   isn't any distinction any more.

Does SCM_DEBUG_CELL_ACCESSES really solve any problems? With strict
SCM typing enabled, it will be hard to put garbage into cells.

> * when cells are collected, we need to provide our own finalization code.  
>   It may be possible that Boehm provides this feature.

That's available, but GUILE doesn't have much finalization code, does
it? I don't think it's a problem.

> * the separation of the gc bits from the type tags.  This step may bring
>   performance improvements due to reduced page faults.  It may also bring
>   performance improvements due to the fact that it reduces the pressure on
>   guile's type bits, because we might be able to use more efficient type
>   encodings in some places.

We tried that some time ago. The net effect was very small.

> Separating the gc bits from the type tags will allow us to fully separate the
> gc subsystem from the rest of guile and put it into a separate gc library that
> then may be used also from code other than guile.  The nice thing is, that
> this library can be made almost completely independent of the type system that
> is used.  This allows everybody to experiment with gc, without the need to

Yep. I guess that the GC system should provide the following
primitives in order to plug in Boehm nicely:

	   SCM scm_tagged_cons (long tag, void *ptr);
	   SCM scm_tagged_double_cons (long tag, [contents]);
	   SCM scm_atomic_vector (size_t size);  // for strings, bit vects, float vects etc.
	   SCM scm_scm_vector (int n);           // for Scheme vectors
	   SCM scm_normal_cons (SCM a, SCM b);

(I might be mistaken about the tags. I'm not very familiar with
GUILE's tag structure.)

-- 

Han-Wen Nienhuys   |   hanwen@cs.uu.nl    | http://www.cs.uu.nl/~hanwen/

