This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: cons expensive? (was Re: DHARMI project)


Telford Tendys <telford@triangle.triode.net.au> writes:

> Well I looked at trying to speed up the garbage collector but
> (to the extent of my understanding) there's nothing much to be improved --

[ ... ]

Guile currently allocates local environments in cons cells; for
typical programs this represents most of the consing done.  In the
later versions of SCM there is a small cache of cells used only for
environments; this cache is managed by a copying collector that moves
live cells to the usual Scheme heap.  Since most environment cells
become garbage almost immediately this saves the time that would be
used in the sweep phase of a mark/sweep collector and also results in
better locality of reference.
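
To make the idea concrete, here is a minimal sketch of such a cache in
C.  It is not SCM's code: the names (env_cell, env_cache, env_extend,
evacuate, add_env_root) are all hypothetical, malloc stands in for the
usual Scheme heap, and it assumes heap cells never point back into the
cache.  When the cache fills, only cells reachable from the registered
roots are copied out; everything else is discarded by resetting one
index, with no sweep.

```c
#include <stdint.h>
#include <stdlib.h>

#define ENV_CACHE_SIZE 4   /* tiny on purpose, so the cache fills quickly */
#define MAX_ROOTS 8

typedef struct env_cell {
    long value;                /* stand-in for a binding */
    struct env_cell *next;     /* enclosing environment */
    struct env_cell *forward;  /* heap copy, once evacuated */
} env_cell;

static env_cell env_cache[ENV_CACHE_SIZE];
static size_t env_cache_used = 0;
static env_cell **env_roots[MAX_ROOTS];
static size_t env_root_count = 0;
static size_t cells_promoted = 0;

/* True if c points into the cache rather than the ordinary heap. */
static int in_cache(const env_cell *c)
{
    uintptr_t p = (uintptr_t)c, lo = (uintptr_t)env_cache;
    return p >= lo && p < lo + sizeof env_cache;
}

/* Register a variable that may hold a live environment pointer. */
static void add_env_root(env_cell **slot)
{
    if (env_root_count == MAX_ROOTS) abort();
    env_roots[env_root_count++] = slot;
}

/* Copy a live cache cell, and the chain it points to, to the heap. */
static env_cell *evacuate(env_cell *c)
{
    if (c == NULL || !in_cache(c)) return c;     /* already on the heap */
    if (c->forward != NULL) return c->forward;   /* already copied */
    env_cell *copy = malloc(sizeof *copy);       /* assumption: malloc is "the heap" */
    if (copy == NULL) abort();
    copy->value = c->value;
    copy->forward = NULL;
    c->forward = copy;
    copy->next = evacuate(c->next);
    cells_promoted++;
    return copy;
}

/* Allocate a new environment cell in the cache, flushing it when full. */
static env_cell *env_extend(long value, env_cell *enclosing)
{
    if (env_cache_used == ENV_CACHE_SIZE) {
        enclosing = evacuate(enclosing);         /* argument is a temporary root */
        for (size_t i = 0; i < env_root_count; i++)
            *env_roots[i] = evacuate(*env_roots[i]);
        env_cache_used = 0;                      /* whatever is left was garbage */
    }
    env_cell *c = &env_cache[env_cache_used++];
    c->value = value;
    c->next = enclosing;
    c->forward = NULL;
    return c;
}
```

With one rooted environment and a run of short-lived extensions of it,
only the rooted cell is ever copied; the short-lived cells cost nothing
beyond the index reset.  The sketch never frees promoted cells, which a
real system would leave to the main collector.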

Using this copying collector required changes to the evaluator.
However, there is not really that much code that deals directly with
local environments.  Environment caching appealed to me because
it required only incremental changes to SCM;  with some more work
one might also allocate most local bindings directly on a stack.

> Since I use a lot of floating point numbers, there is an implicit
> cons for every calculation of every number (they are stored indirectly,
> actually scm_cons() is not used but SCM_NEWCELL() is and the result
> is much the same). Beating this would involve completely reworking the
> framework of types that guile uses, a massive task and potentially
> leading to more problems than it fixes <sigh>.

A substantial part of the cost of allocating floating point numbers
is in the malloc and free calls for each double.  This part of the cost
can be reduced greatly without changing much:  allocate each double
or complex (just the double, not the cell header) in a dedicated heap
and collect by copying at mark time.  I have this working for floats
in SCM, and am working on using it for bignums.  Bignums are more
difficult because the arithmetic functions assume that pointers
to big digits remain valid across potential garbage collections.
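
Roughly, the scheme for doubles looks like the following sketch.  This
is an illustration under stated assumptions, not the SCM code: the
names (flo_cell, flo_init, flo_mark, gc_doubles) and the fixed-size
two-space layout are hypothetical.  The cell header stays where it is;
only the double lives in the dedicated heap, and marking a cell copies
its double into the other space, so unmarked doubles are reclaimed
without any per-object free.

```c
#include <stddef.h>

#define DHEAP_SIZE 8   /* doubles per space; small for illustration */

typedef struct flo_cell {
    double *payload;   /* points into the dedicated double heap */
    int marked;
} flo_cell;

static double space_a[DHEAP_SIZE], space_b[DHEAP_SIZE];
static double *from_space = space_a, *to_space = space_b;
static size_t from_used = 0, to_used = 0;

/* Bump-allocate a double for cell c.
   Returns 0 on success, -1 if the space is full and a GC is needed. */
static int flo_init(flo_cell *c, double x)
{
    if (from_used == DHEAP_SIZE) return -1;
    from_space[from_used] = x;
    c->payload = &from_space[from_used++];
    c->marked = 0;
    return 0;
}

/* Mark a cell; as a side effect, copy its double to the other space. */
static void flo_mark(flo_cell *c)
{
    if (c->marked) return;
    c->marked = 1;
    to_space[to_used] = *c->payload;
    c->payload = &to_space[to_used++];
}

/* Mark the given live cells, then flip spaces.  Doubles that were not
   copied are dropped simply by abandoning the old space. */
static void gc_doubles(flo_cell **live, size_t n)
{
    for (size_t i = 0; i < n; i++)
        flo_mark(live[i]);
    double *old = from_space;
    from_space = to_space;
    from_used = to_used;
    to_space = old;
    to_used = 0;
    for (size_t i = 0; i < n; i++)
        live[i]->marked = 0;   /* reset mark bits for the next cycle */
}
```

Note that every collection changes the payload address of each
surviving cell.  That is harmless for floats, but it is exactly the
property that makes bignums harder: code holding a raw digit pointer
across a collection would be left pointing into the abandoned space.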

> One thing that I do notice (from pairs.h):
> 
> #define SCM_NEWCELL(_into) \
> 	{ \
> 	  if (SCM_IMP(scm_freelist)) \
> 	     _into = scm_gc_for_newcell();\
> 	  else \
> 	    { \
> 	       _into = scm_freelist; \
> 	       scm_freelist = SCM_CDR(scm_freelist);\
> 	       ++scm_cells_allocated; \
> 	    } \
> 	}
> 
> What does scm_cells_allocated do? Can it be removed from this
> macro? I know it seems insignificant in terms of overhead but
> this macro is called LOTS of times.

It allows the REPL to print out the allocation cost of each command,
which is useful if you are trying to minimize that cost.
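
As a small illustration of that use, the sketch below keeps a counter
next to a freelist and reports the difference across one "command".
The names here (new_cell, make_list, cells_used_by, and the fixed pool)
are hypothetical and much simpler than Guile's real allocator; the
point is only that the per-command cost is a subtraction of two counter
readings.

```c
#include <stddef.h>

typedef struct cell { struct cell *car, *cdr; } cell;

#define POOL_SIZE 16
static cell pool[POOL_SIZE];
static cell *freelist = NULL;
static unsigned long cells_allocated = 0;

/* Thread every pool cell onto the freelist through its cdr. */
static void init_freelist(void)
{
    freelist = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].cdr = freelist;
        freelist = &pool[i];
    }
}

/* Pop one cell and count it, in the spirit of the quoted macro. */
static cell *new_cell(void)
{
    if (freelist == NULL) return NULL;   /* a real system would collect here */
    cell *c = freelist;
    freelist = c->cdr;
    ++cells_allocated;
    return c;
}

/* Stand-in for evaluating a command: build a list of n cells. */
static cell *make_list(size_t n)
{
    cell *head = NULL;
    while (n-- > 0) {
        cell *c = new_cell();
        if (c == NULL) break;
        c->car = NULL;
        c->cdr = head;
        head = c;
    }
    return head;
}

/* Run a command and return how many cells it allocated. */
static unsigned long cells_used_by(cell *(*command)(size_t), size_t arg)
{
    unsigned long before = cells_allocated;
    (void)command(arg);
    return cells_allocated - before;
}
```

The increment is a single add on a global, so its cost per allocation
is small next to the freelist update it sits beside.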