This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: gc notes available


Tel <telford@eng.uts.edu.au> writes:

> > In this case, I think that a function or macro that does
> > SCM_ASSIGN(loc, val) should be used. I don't think this is bad, and if
> > they aren't using the write barrier, it can just expand into loc=val
> > The one break that we do get with this is that we don't have to worry too
> > much about our possible roots being inside an object, it just means a
> > little special care to remove those roots when the object dies (which
> > won't be a problem, because they've registered their memory with us).
> 
> As I asked before, if it's not too much to ask for the C user
> to register each and every write, then why do we bother with conservative
> GC stack scanning when we can ask the user to register SCM variables
> on the stack? In fact, if we go for SCM_ASSIGN(loc, val) then
> we can throw out stack scanning then and there because the C user
> is implicitly announcing the existence of a stack variable as soon
> as he/she assigns anything to it.

And if the stack shrinks and grows during that period? Then you get
much futzing about to figure out what does and doesn't require
unprotecting, and so on. It is a hassle. 

> As the world moves to C++ and operator overloading gets taken for granted,
> this sort of thing will seem a silly argument but I thought that there
> was this intention to take pressure off the C user -- we can't seem
> to trust them to even declare the top of the stack but we can trust
> them to declare all write operations. Is my sanity leaking like an
> old engine sump or is there a priority inversion occurring?

Possibly. I'm just trying to work out how the write barrier will work
(my personal opinion: let the user provide the stack top, and accept
the responsibility).

This just proposes that users change x = y to SCM_ASSIGN(x, y). You
could train a text editor to do this.
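
For concreteness, here is a minimal sketch of what such a macro could
look like. Everything except the name SCM_ASSIGN is an assumption made
up for illustration: the SCM typedef is a stand-in for the real one,
and scm_note_write and the fixed-size remembered list are hypothetical,
not anything guile provides.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for guile's SCM type, just so the sketch compiles alone. */
typedef void *SCM;

/* Set to 0 to compile the barrier away; SCM_ASSIGN then is plain loc = val. */
#define USE_WRITE_BARRIER 1

/* Hypothetical remembered list: locations written since the last collect. */
#define REMEMBERED_MAX 64
static SCM *remembered[REMEMBERED_MAX];
static size_t remembered_count = 0;

/* Hypothetical barrier hook: record the location that is being written. */
static void scm_note_write(SCM *loc)
{
    assert(loc != NULL);
    if (remembered_count < REMEMBERED_MAX)
        remembered[remembered_count++] = loc;
    /* A real collector would deal with overflow, e.g. by forcing a gc. */
}

#if USE_WRITE_BARRIER
/* Note the write, then store. loc is evaluated twice, so it must not
   have side effects (no SCM_ASSIGN(*p++, v)). */
#define SCM_ASSIGN(loc, val) (scm_note_write(&(loc)), (loc) = (val))
#else
#define SCM_ASSIGN(loc, val) ((loc) = (val))
#endif
```

With the barrier switched off the macro costs nothing, which is the
whole point of hiding the assignment behind it.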

Even better would be to require that a user either use an abstracted
interface in the places where they would currently be directly
modifying things (better for more reasons than making the write
barrier work, IMO), or to have some idea of what they have to do to
make the gc work if they want to directly modify things (like creating
smobs that deal with scheme values, functions that directly operate on
chunks of memory, and so on). I'd think that most of guile already
does this. Lacking thorough docs, and any impetus to write C code
accessible from guile (I like lisp much better, obviously), I don't
really know for sure if this is the general case... things like the
ability to directly get at a vector come to mind as places where you
should either use a system function, or need an idea about how to make
the gc play along.
 
> > > What if they've just got a void *,
> > > and they're copying bytes from another void * into it?  
> > 
> > The simplest possible answer: They shouldn't do that! Is there any
> > really good reason that we should be allowing (or rather, endorsing)
> > this sort of thing? I've been trying to think of one since you brought
> > this up, and I'm stumped.
> 
> Oh yeah, so memmove() and memcpy() are right out. Yes, that's right
> good boy users just shouldn't touch those nasty memory management routines.
> What if I have a large block of SCM objects and want to insert a
> new one into the middle? Do I write a loop of SCM_ASSIGN() or just
> do a memmove() ?

This isn't what I was talking about. If you are messing with chunks of
scheme values in such a way, it can be assumed that you'll have an
idea of what you need to do to keep the gc sane. This could involve
something like scm_memmove_protected, or just a function to let the gc
know that its view of that particular piece of the world is a bit
screwed up. The gc at this point already knows what it considers
important to that block, but if you move it behind its back, it can't
be expected to keep up.
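
As a rough sketch of the insert-into-the-middle case, assuming a gc
that only needs to be told which slots changed: scm_memmove_protected
is just the name floated above, and its signature, scm_gc_note_range,
scm_block_insert and the dirty-range table are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *SCM;   /* stand-in for guile's SCM type */

/* Hypothetical table of slot ranges the gc must look at again. */
struct dirty_range { SCM *start; size_t count; };
#define DIRTY_MAX 32
static struct dirty_range dirty[DIRTY_MAX];
static size_t dirty_count = 0;

/* Hypothetical notification: "count slots starting at start changed". */
static void scm_gc_note_range(SCM *start, size_t count)
{
    assert(dirty_count < DIRTY_MAX);
    dirty[dirty_count].start = start;
    dirty[dirty_count].count = count;
    dirty_count++;
}

/* memmove for blocks of SCM values that also tells the gc about it.
   count is in SCM slots, not bytes. */
static SCM *scm_memmove_protected(SCM *dst, const SCM *src, size_t count)
{
    memmove(dst, src, count * sizeof(SCM));
    scm_gc_note_range(dst, count);
    return dst;
}

/* Insert val at index pos of a block holding len values.
   The caller guarantees room for len + 1 values. */
static void scm_block_insert(SCM *block, size_t len, size_t pos, SCM val)
{
    assert(pos <= len);
    scm_memmove_protected(block + pos + 1, block + pos, len - pos);
    block[pos] = val;
    scm_gc_note_range(block + pos, 1);
}
```

One bulk move plus one notification per range, rather than a loop of
SCM_ASSIGN over every shifted slot.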

It's not about limiting useful functionality like you described, but
limiting how much we are willing to support practices that aren't
particularly good ways of going about things.

Copying from one void * to another with no idea about what you're
copying is not a particularly good way of going about things, in or
outside of guile.  Depending on what's going on with the gc and how
it's configured, a memcpy of 5 bytes from some anonymous address to
another anonymous address might work. If it doesn't, I don't think
that's a big deal. If you start writing random integers to the heap,
it will likely kill guile. I don't think that's a big deal,
either. There are lots of ways you can abuse C, but I don't see any
need to accommodate them.


> > The changes I can see that will be required to support a sane
> > interface (this outside of the write barrier itself):
> > 
> > 1) SCM_ASSIGN(SCM x, val); this [is described earlier (sorry 'bout
> >    that Jim, I messed this up in the email)]
> > 
> > 2) smobs will have to add a copy(SCM from) function. Not big (I'd
> >    think most smobs would have something like this), but it means
> >    there's a sane way of getting at things from the end user's point
> >    of view.
> 
> Yes, I agree that this sounds useful, however then you get into
> issues of deep vs shallow copy and what exactly you mean by copying
> an SMOB anyhow. Guidelines are required.

The assurance it should be giving is that, if the original smob dies,
previous returns from copy(original) won't be affected. As long as
that holds, the smob can do whatever it wants to oblige. If the smob
provides functions that do shallow and deep copying when necessary, so
much the better. This is not so much a gc requirement, as one that
requires nice smobs to create a more abstract interface (no necessity
to fiddle with bytes that could require special attention in normal
use). A deep copy might be more useful if the gc wants to move things,
but I don't think it will. 
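
To illustrate the guarantee, here is a sketch of a copy function for a
made-up smob payload that owns a byte buffer. The buf_smob type and
its functions are hypothetical; a real smob would hang this off
whatever smob interface guile ends up with.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical smob payload: a byte buffer that the smob owns. */
struct buf_smob {
    size_t len;     /* string length, not counting the trailing NUL */
    char *bytes;
};

static struct buf_smob *buf_smob_make(const char *s)
{
    struct buf_smob *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->len = strlen(s);
    b->bytes = malloc(b->len + 1);
    if (b->bytes == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->bytes, s, b->len + 1);
    return b;
}

/* The copy owns its own storage, so it is unaffected when the
   original dies. Here that happens to mean a deep copy. */
static struct buf_smob *buf_smob_copy(const struct buf_smob *from)
{
    assert(from != NULL);
    return buf_smob_make(from->bytes);
}

static void buf_smob_free(struct buf_smob *b)
{
    if (b != NULL) {
        free(b->bytes);
        free(b);
    }
}
```

A smob sharing immutable data between copies (with a reference count,
say) would satisfy the same guarantee without copying any bytes.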

I'm not seeing this as a general problem, not having done any real
smob programming, but more of a warning that there should be a way to
get full functionality out of an smob, without the user having to know
the actual physical layout. If not, it either breaks, or everyone who
uses it has to know about the gc, which is something that would be
great to avoid.

> > 3) (if a copying/moving collector is enabled) smobs need a
> >    relocate(SCM current, SCM new) (not strictly necessary, but it
> >    shouldn't be an entirely big deal to have it, and it could be
> >    useful)
> 
> Yes, ideally it would be nice to move things around to repack memory.
> That would give scheme one solid advantage over just about every compiled
> language and would give massive improvement to the generational GC
> performance. It would also be an incredible number of pointers to track.

I'm not big on copying, period. It has to do a lot of work, and has as
many drawbacks (particularly with a conservative gc) as it has
benefits. With a fairly sensible allocator, and some smart object
placement on the heap, fragmentation isn't the problem it's made out
to be, for the most part. Occasional copying might have its place,
and it can certainly be implemented as an option, but life won't be
too difficult if copying isn't going on.

> I'm running B-trees containing approx 10000 SCM data values which are
> a mix of symbols, integers and floats (and the odd string and SMOB).
> I'm hoping to push this further by a factor of at least 10 before I could
> consider the system getting into the workable region.
> 
> As I said above, if we really do go with SCM_ASSIGN() then we have
> already declared every live stack value so we can even move objects that are
> pointed to from the stack (dangerous but fun!)

Not really, we wouldn't know if they were dead or alive, even if we
were treating stack values with something like SCM_ASSIGN, which isn't
too likely.

> > 4) (if using a memprotect based barrier) scm_malloc_protected,
> >    scm_realloc_protected, scm_free_protected: like the scm_must_xxx
> >    functions, these providing a chunk that will be stored in a mem
> >    protected region. Also, a scm_register_mem(SCM cell, void
> >    *prot_chunk); might prove useful, but not strictly necessary.
> 
> Hmmm, if the user is grabbing protected slabs of memory then
> what happens when one of them faults? Can the user also add hooks
> to the fault handling code? This seems like a can of worms.

Preferably we'd have a way to know whether it faulted on mem that could
contain a pointer. If we don't know, we just put address foo on the
list for the next collect of generation x if foo points to an object
there (this with the `software' write barrier, which is mostly the
motivation for these... with the hardware, we just mark it dirty and
futz around when collection comes).
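
The "mark it dirty and futz around later" half can be sketched with a
plain table of dirty flags, one per fixed-size chunk of the region.
This only shows the bookkeeping, not the fault handler itself; the
chunk size, the static array standing in for the protected region, and
all the names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define CARD_SHIFT 9                       /* hypothetical 512-byte chunks */
#define HEAP_BYTES (16 * 1024)
#define CARD_COUNT (HEAP_BYTES >> CARD_SHIFT)

static unsigned char heap[HEAP_BYTES];     /* stand-in for the protected region */
static unsigned char card_dirty[CARD_COUNT];

/* Map an address inside the region to the index of its chunk. */
static size_t card_index(const void *addr)
{
    const unsigned char *p = addr;
    assert(p >= heap && p < heap + HEAP_BYTES);
    return (size_t)(p - heap) >> CARD_SHIFT;
}

/* What the barrier (or fault handler) would do on a write to addr. */
static void note_write(const void *addr)
{
    card_dirty[card_index(addr)] = 1;
}

/* At collection time: hand each dirty chunk to visit (if given),
   clear its flag, and return how many chunks were dirty. */
static size_t scan_dirty_cards(void (*visit)(unsigned char *start, size_t bytes))
{
    size_t n = 0;
    for (size_t i = 0; i < CARD_COUNT; i++) {
        if (card_dirty[i]) {
            if (visit != NULL)
                visit(heap + (i << CARD_SHIFT), (size_t)1 << CARD_SHIFT);
            card_dirty[i] = 0;
            n++;
        }
    }
    return n;
}
```

The write path stays cheap (one store into the table), and all the
real work is deferred to the next collect.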

The user almost certainly won't be able to add a hook to the fault
handling, partly because that would be a bit of a mess (yeah, it might
be already, but it's not a mess that gives me the willies), and partly
because there probably isn't a sensible way that they can handle the
guts of the write barrier if it's possible to use one or the other.
 
> > 5) smobs need a possible_pointers function, that can communicate to us
> >    where we can find pointers. This is closely related to mark, and
> >    shouldn't be difficult to provide. (really hairy smobs might make
> >    life painful here).
> 
> Better to just define a marking function that passes the pointer
> to the SCM rather than the SCM itself. OK, in many cases it is a double
> indirection but this way forcing a mark also forces declaration of all
> the pointers. Besides SCM_ASSIGN has already declared pointers!

Actually, the point is that smobs actually know what smobs are doing,
and whether any particular bit of memory is (or can be) a scheme
value.  I'm making a (possibly too simplistic, there are probably
places where they overlap) distinction between end-user code, and smob
code. I think that smob code can be considered enough of a
modification to the core of guile that knowing a bit about how the gc
wants things to work isn't an awful lot to ask. For the most part,
this just boils down to using some specific functions for certain
types of memory, and probably providing us with a bit more information
about what can be put in the memory (a performance win, if nothing
else).
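
As an illustration of the kind of information I mean, here is a sketch
of a possible_pointers function for a made-up smob that mixes scheme
values with raw data. The visitor-callback signature and every name
here are hypothetical; the idea is only that the smob reports the
address of each slot that can hold a scheme value, and nothing else.

```c
#include <assert.h>
#include <stddef.h>

typedef void *SCM;   /* stand-in for guile's SCM type */

/* Hypothetical smob payload: two scheme values and some raw data. */
struct pair_smob {
    SCM key;
    long hits;       /* raw data, never a scheme value */
    SCM value;
};

/* Called once per slot that may hold a scheme value. */
typedef void (*scm_slot_visitor)(SCM *slot, void *closure);

/* The smob knows its own layout, so it reports exactly its SCM slots
   and skips the raw field. */
static void pair_smob_possible_pointers(struct pair_smob *p,
                                        scm_slot_visitor visit,
                                        void *closure)
{
    visit(&p->key, closure);
    visit(&p->value, closure);
}

/* Example visitor: just collects the reported slot addresses. */
struct slot_list { SCM *slots[8]; size_t count; };

static void collect_slot(SCM *slot, void *closure)
{
    struct slot_list *list = closure;
    assert(list->count < 8);
    list->slots[list->count++] = slot;
}
```

Because the visitor gets the address of the slot rather than its
contents, the same function could serve marking or, with a collector
that moves things, updating the slot in place.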

For end user code, on the other hand, they just have to use
SCM_ASSIGN, rather than =.  The only motivation for this at all is the
cases where the guts of any particular object are exposed to the user
(like vectors), and it may be better to have and use an abstracted
interface to these sorts of things, anyway. Direct modification should
either be happening behind the scenes to most users, or be implemented
with some knowledge of the stickier details.

> The real trick is that when you want to repack memory, you force the
> SMOB to do a mark operation and you tweak each SCM while it is marking.
> Naturally, any const declarations are out the window but guile pretty
> much is that way at the moment. Using this means that the SMOBs are doing
> the pointer tracking for you, all you have to actively track is the
> stack pointers.
> 
> > 6) a lynch mob, for everyone who wants the unlimited freedom to copy
> >    unknown objects from void * to void * without breaking anything ;)
> 
> Any day dude! If you think I'm giving up my memmove, you better offer
> me some REALLY good reasons. Basically, the only reason that I would
> accept is an equivalent performance improvement in GC and dynamic memory
> repacking.
> 

Providing the functionality of the various mem handling functions with
bits that play nice with the system isn't a problem. Trying to make
everything work when someone is chucking around anonymous pointers
isn't worth the effort (of course, neither is a lynch mob, they're so
bloody expensive nowadays!).

I'm sure I probably missed something important in there. It's late,
and this is long.
-- 
Greg, even the cost of quality rope is killer!