This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: gc notes available


Jim Blandy <jimb@red-bean.com> writes:

> > > This will likely depend upon memory protection services in the host
> > > os, but any os that doesn't provide this probably isn't running
> > > guile in the first place. 
> > 
> > This is bad news for us, if it catches on.
> > 
> > We are putting guile to work in our router product, and that has no
> > fancy memory protection. Don't forget embedded systems, even if we are
> > only a few (one?) currently.
> 
> 
> Here's an excerpt from a message I just sent Greg Harvey.  I have some
> tentative problems with using memory protection:
> 
> -----
> 
> I actually think that memory protection will be much slower than doing it
> all in software.  Here are the steps involved in handling a page
> fault:
> 
>     1) The kernel gets the page fault, puts together a signal context,
>        and invokes the handler.
>     2) The handler does the bookkeeping which is the real point of the
>        exercise.
>     3) Now the handler has to actually do the write.  In order for
>        that write to succeed without causing yet another page fault,
>        you have to somehow:
> 	a) unprotect the page
> 	b) do the write
> 	c) re-protect the page, to catch the next write
> 
> So you're talking signal handling overhead, and at least two system
> calls.  Notice also that the handler can't really find the SCM object
> being modified; it only gets the address being written to, and
> probably the value being written to it.  But you can't find the cell
> at the head of the vector, given the address of some word in the
> malloc'ed array of elements.
>   
> Suppose we tell people that they must call SCM_WRITE_BARRIER (obj,
> new) after changing any field in OBJ to point to NEW.  So, for
> example:
> 
> 	SCM_VELTS (vector)[5] = some_object;
> 	scm_write_barrier (vector, some_object);
> 
> The thing is, they *know* when they write the code that they're going to
> do a write, so they might as well call the write barrier directly.
> Then you've got function call overhead (which is less than signal
> dispatch overhead) and bookkeeping.  No system calls.  And you also
> get the information you actually want: the object being modified.  So
> you can check the generations of the objects, and if vector is older
> than some_object, then you stick vector in a list of objects to be
> used as roots when collecting younger generations.
> 
> The only weirdness here is: what if someone doesn't know they're
> mutating an object?  What if they've just got a pointer to an SCM, and
> they're storing something in it?  What if they've just got a void *,
> and they're copying bytes from another void * into it?  I think the
> answer here is that whoever gave them that pointer must have had the
> real SCM, and it's the SCM holder's job to make sure the write barrier
> gets called appropriately.  Thus:
> 
> 	frob (SCM_CDRLOC (pair));
> 	scm_write_barrier (pair, SCM_NEWEST);
> 
> SCM_NEWEST would be a magic value which tells the write barrier, "I
> don't know what I stored; I may or may not have pointers to younger
> objects now."
> 

Here are the relevant bits from my response (dueling emails ;). 

[on the performance]

Actually, I've been thinking that the best way to deal with this on
the heap level is to only trap on one fault, set a dirty bit, and
unprotect the memory (I'm not even sure that it's possible to get the
value that we were trying to write without getting really gross, I'd
just assumed it was possible when I did the notes). It does mean
scanning through the info on that heap when collecting, but since the
new heap segments are going to be very small, it's a short romp
through an array with bit checking, really. It still loses on user
objects, though. In truth, as I've started actual work on the gc, I'm
much less happy with using a write barrier, and it might not be
necessary (see below).

[later, on the write barrier in general]
The thing I don't like about this is that it starts putting the same
sort of requirements on the c authors as keeping track of roots by
hand, and requires the programmer to know and understand the details
of the gc intimately. In general, I feel it should be non-obtrusive if
there's another way.

I'm starting to lean more and more towards doing a full scan (which
would be going in there as an option, anyway, for systems where the
memory protection doesn't exist/work the way it should), particularly
since my work so far with the current gc showed a dramatic speed
increase just by reducing the size of heap segments (the reported
gc-time is from 2.5 to 5 times smaller on startup, the smaller
improvements come once you get into loading a config file, since it
ends up with as much sweeping work as the current collector at that
point, though it never lost the initial benefits on the stress tests I
threw at it... used more memory, though).
 
If you toss in only sweeping a small portion of existing segments on
most gcs, it's seeming less and less likely to me that the additional
memory and overhead of keeping track of the intergenerational pointers
will be a great advantage over scanning the set of live objects,
because the sweep phase really seems to be where we can make up the
most ground. In fact, with the improvements that I'm making to the
current gc, it might not even make a lot of sense to go completely
generational at all, just age the segments according to how times an
object has survived collection there, to give the gc a few more hints
on which segments to sweep.

-- 
Greg