This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Tel <telford@eng.uts.edu.au> writes:

> > In this case, I think that a function or macro that does
> > SCM_ASSIGN(loc, val) should be used. I don't think this is bad, and if
> > they aren't using the write barrier, it can just expand into loc=val.
> > The one break that we do get with this is that we don't have to worry
> > too much about our possible roots being inside an object, it just
> > means a little special care to remove those roots when the object dies
> > (which won't be a problem, because they've registered their memory
> > with us).
>
> As I asked before, if it's not too much to ask for the C user
> to register each and every write, then why do we bother with conservative
> GC stack scanning when we can ask the user to register SCM variables
> on the stack? In fact, if we go for SCM_ASSIGN(loc, val) then
> we can throw out stack scanning then and there because the C user
> is implicitly announcing the existence of a stack variable as soon
> as he/she assigns anything to it.

And if the stack shrinks and grows during that period? Then you get
much futzing about to figure out what does and doesn't require
unprotecting, and so on. It is a hassle.

> As the world moves to C++ and operator overloading gets taken for granted,
> this sort of thing will seem a silly argument but I thought that there
> was this intention to take pressure off the C user -- we can't seem
> to trust them to even declare the top of the stack but we can trust
> them to declare all write operations. Is my sanity leaking like an
> old engine sump or is there a priority inversion occurring?

Possibly. I'm just trying to work out how the write barrier will work
(my personal opinion: let the user provide the stack top, and accept
the responsibility). This is just proposing that users change x = y
to SCM_ASSIGN(x, y). You could train a text editor to do this.

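
For concreteness, here is a rough sketch of what such a macro could
look like. It is not guile code: the SCM typedef is only a stand-in
for the real type, and USE_WRITE_BARRIER, gc_note_write and the
fixed-size remembered list are all names made up for illustration.
With the switch off, the macro collapses to a plain assignment, as
described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for guile's SCM type (assumption; the real one is in libguile). */
typedef uintptr_t SCM;

/* Hypothetical build-time switch: 1 = software write barrier enabled. */
#define USE_WRITE_BARRIER 1

/* Hypothetical remembered list: locations written since the last collection. */
#define REMEMBERED_MAX 1024
static SCM *remembered[REMEMBERED_MAX];
static size_t remembered_count = 0;

/* Record a written location so a later collection can rescan it. */
static void gc_note_write(SCM *loc)
{
    /* A real collector would collect or grow the list instead of aborting. */
    assert(remembered_count < REMEMBERED_MAX);
    remembered[remembered_count++] = loc;
}

#if USE_WRITE_BARRIER
/* Store the value, then tell the gc which location changed. */
#define SCM_ASSIGN(loc, val)            \
    do {                                \
        SCM *scm_loc_ = &(loc);         \
        *scm_loc_ = (val);              \
        gc_note_write(scm_loc_);        \
    } while (0)
#else
/* No barrier in use: this is just loc = val. */
#define SCM_ASSIGN(loc, val) ((loc) = (val))
#endif
```

User code would then write SCM_ASSIGN(x, y) wherever it currently
writes x = y into something the gc cares about.
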
Even better would be to require that a user either use an abstracted
interface in the places where they would currently be directly
modifying things (better for more reasons than making the write
barrier work, IMO), or have some idea of what they have to do to make
the gc work if they want to directly modify things (like creating
smobs that deal with scheme values, functions that directly operate on
chunks of memory, and so on). I'd think that most of guile already
does this. Without thorough docs, and lacking any impetus to write C
code accessible from guile (I like lisp much better, obviously), I
don't really know for sure if this is the general case... things like
the ability to directly get at a vector come to mind as places where
you should either use a system function, or need an idea about how to
make the gc play along.

> > > What if they've just got a void *,
> > > and they're copying bytes from another void * into it?
> >
> > The simplest possible answer: They shouldn't do that! Is there any
> > really good reason that we should be allowing (or rather, endorsing)
> > this sort of thing? I've been trying to think of one since you brought
> > this up, and I'm stumped.
>
> Oh yeah, so memmove() and memcpy() are right out. Yes, that's right
> good boy users just shouldn't touch those nasty memory management routines.
> What if I have a large block of SCM objects and want to insert a
> new one into the middle? Do I write a loop of SCM_ASSIGN() or just
> do a memmove() ?

This isn't what I was talking about. If you are messing with chunks of
scheme values in such a way, it can be assumed that you'll have an
idea of what you need to do to keep the gc sane. This could involve
something like scm_memmove_protected, or just a function to let the gc
know that its view of that particular piece of the world is a bit
screwed up.

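
A minimal sketch of the "move it, then tell the gc" idea, applied to
the insert-into-a-block case from the quoted question. Only the name
scm_memmove_protected comes from this thread; its signature, the
gc_note_range hook, the dirty list and scm_block_insert are
assumptions invented for the example, and SCM is again just a
stand-in type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for guile's SCM type (assumption). */
typedef uintptr_t SCM;

/* Hypothetical list of ranges the gc must rescan at the next collection. */
struct dirty_range { SCM *start; size_t count; };
#define DIRTY_MAX 64
static struct dirty_range dirty[DIRTY_MAX];
static size_t dirty_count = 0;

/* Hypothetical hook: tell the gc its view of [start, start + count) is stale. */
static void gc_note_range(SCM *start, size_t count)
{
    assert(dirty_count < DIRTY_MAX);
    dirty[dirty_count].start = start;
    dirty[dirty_count].count = count;
    dirty_count++;
}

/* memmove over SCM slots, followed by a single notification to the gc. */
static void *scm_memmove_protected(SCM *dst, const SCM *src, size_t count)
{
    memmove(dst, src, count * sizeof(SCM));
    gc_note_range(dst, count);
    return dst;
}

/* Insert val at index pos of block[0..len); block needs room for len + 1. */
static void scm_block_insert(SCM *block, size_t len, size_t pos, SCM val)
{
    assert(pos <= len);
    /* Shift the tail up by one slot in one move, not a loop of assignments. */
    scm_memmove_protected(block + pos + 1, block + pos, len - pos);
    block[pos] = val;
    gc_note_range(block + pos, 1);
}
```

The user keeps the single memmove, and the gc hears about the affected
range once rather than once per slot.
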
The gc at this point already knows what it considers important to that
block, but if you move it behind its back, it can't be expected to
keep up. It's not about limiting useful functionality like you
described, but limiting how much we are willing to support usage that
isn't a particularly good way of going about things. Copying from one
void * to another with no idea about what you're copying is not a
particularly good way of going about things, in or outside of guile.
Depending on what's going on with the gc and how it's configured, a
memcpy of 5 bytes from some anonymous address to another anonymous
address might work. If it doesn't, I don't think that's a big deal. If
you start writing random integers to the heap, it will likely kill
guile. I don't think that's a big deal, either. There are lots of ways
you can abuse C, but I don't see any need to accommodate them.

> > The changes I can see that will be required to support a sane
> > interface (this outside of the write barrier itself):
> >
> > 1) SCM_ASSIGN(SCM x, val); this [is described earlier (sorry 'bout
> > that Jim, I messed this up in the email)]
> >
> > 2) smobs will have to add a copy(SCM from) function. Not big (I'd
> > think most smobs would have something like this), but it means
> > there's a sane way of getting at things from the end user's point
> > of view.
>
> Yes, I agree that this sounds useful, however then you get into
> issues of deep vs shallow copy and what exactly you mean by copying
> an SMOB anyhow. Guidelines are required.

The assurance it should be giving is that, if the original smob dies,
previous returns from copy(original) won't be affected. As long as
that holds, the smob can do whatever it wants to oblige. If the smob
provides functions that do shallow and deep copying when necessary, so
much the better.

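
As an illustration of that guarantee only, here is a sketch using a
made-up buffer object in place of a real smob. The struct and the
buffer_smob_* functions are hypothetical and do not use guile's smob
interface. This version satisfies the guarantee with a deep copy, so
the copy owns its own bytes; a smob could meet it some other way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical smob payload: a heap-allocated byte buffer. */
struct buffer_smob {
    size_t len;
    char *bytes;
};

/* Allocate a new object holding its own copy of the given bytes. */
static struct buffer_smob *buffer_smob_make(const char *bytes, size_t len)
{
    struct buffer_smob *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->bytes = malloc(len ? len : 1);   /* avoid a zero-sized allocation */
    if (s->bytes == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->bytes, bytes, len);
    s->len = len;
    return s;
}

/* copy(): the result shares no storage with the original. */
static struct buffer_smob *buffer_smob_copy(const struct buffer_smob *from)
{
    assert(from != NULL);
    return buffer_smob_make(from->bytes, from->len);
}

/* Release an object; earlier copies of it are unaffected. */
static void buffer_smob_free(struct buffer_smob *s)
{
    if (s == NULL)
        return;
    free(s->bytes);
    free(s);
}
```

Freeing the original after calling buffer_smob_copy leaves the copy
fully usable, which is the property asked for above.
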
This is not so much a gc requirement as one that requires nice smobs
to create a more abstract interface (no necessity to fiddle with bytes
that could require special attention in normal use). A deep copy might
be more useful if the gc wants to move things, but I don't think it
will. I'm not seeing this as a general problem, not having done any
real smob programming, but more as a warning that there should be a
way to get full functionality out of a smob without the user having
to know the actual physical layout. If not, it either breaks, or
everyone who uses it has to know about the gc, which is something that
would be great to avoid.

> > 3) (if a copying/moving collector is enabled) smobs need a
> > relocate(SCM current, SCM new) (not strictly necessary, but it
> > shouldn't be an entirely big deal to have it, and it could be
> > useful)
>
> Yes, ideally it would be nice to move things around to repack memory.
> That would give scheme one solid advantage over just about every compiled
> language and would give massive improvement to the generational GC
> performance.

It would also be an incredible number of pointers to track. I'm not
big on copying, period. It has to do a lot of work, and has as many
drawbacks (particularly with a conservative gc) as it has benefits.
With a fairly sensible allocator, and some smart object placement on
the heap, fragmentation isn't the problem it's made out to be, for the
most part. Occasional copying might have its place, and it can
certainly be implemented as an option, but life won't be too difficult
if copying isn't going on.

> I'm running B-trees containing approx 10000 SCM data values which are
> a mix of symbols, integers and floats (and the odd string and SMOB).
> I'm hoping to push this further by a factor of at least 10 before I could
> consider the system getting into the workable region.
>
> As I said above, if we really do go with SCM_ASSIGN() then we have
> already declared every live stack value so we can even move objects that
> are pointed to from the stack (dangerous but fun!)

Not really, we wouldn't know if they were dead or alive, even if we
were treating stack values with something like SCM_ASSIGN, which isn't
too likely.

> > 4) (if using a memprotect based barrier) scm_malloc_protected,
> > scm_realloc_protected, scm_free_protected: like the scm_must_xxx
> > functions, these providing a chunk that will be stored in a mem
> > protected region. Also, a scm_register_mem(SCM cell, void
> > *prot_chunk); might prove useful, but not strictly necessary.
>
> Hmmm, if the user is grabbing protected slabs of memory then
> what happens when one of them faults? Can the user also add hooks
> to the fault handling code? This seems like a can of worms.

Preferably we'd have a way to know if it faulted on mem that could
contain a pointer. If we don't know, we just put address foo on the
list for the next collect of generation x if foo points to an object
there (this with the `software' write barrier, which is mostly the
motivation for these... with the hardware, we just mark it dirty and
futz around when collection comes). The user almost certainly won't be
able to add a hook to the fault handling, partly because that would be
a bit of a mess (yeah, it might be already, but it's not a mess that
gives me the willies), and partly because there probably isn't a
sensible way that they can handle the guts of the write barrier if
it's possible to use one or the other.

> > 5) smobs need a possible_pointers function, that can communicate to us
> > where we can find pointers. This is closely related to mark, and
> > shouldn't be difficult to provide. (really hairy smobs might make
> > life painful here).
>
> Better to just define a marking function that passes the pointer
> to the SCM rather than the SCM itself.
> OK, in many cases it is a double
> indirection but this way forcing a mark also forces declaration of all
> the pointers. Besides SCM_ASSIGN has already declared pointers!

Actually, the point is that smobs know what smobs are doing, and
whether any particular bit of memory is (or can be) a scheme value.
I'm making a distinction between end-user code and smob code (possibly
too simplistic a one, there are probably places where they overlap). I
think that smob code can be considered enough of a modification to the
core of guile that knowing a bit about how the gc wants things to work
isn't an awful lot to ask. For the most part, this just boils down to
using some specific functions for certain types of memory, and
probably providing us with a bit more information about what can be
put in the memory (a performance win, if nothing else). End user code,
on the other hand, just has to use SCM_ASSIGN rather than =. The only
motivation for this at all is the cases where the guts of any
particular object are exposed to the user (like vectors), and it may
be better to have and use an abstracted interface to these sorts of
things, anyway. Direct modification should either be happening behind
the scenes for most users, or be implemented with some knowledge of
the stickier details.

> The real trick is that when you want to repack memory, you force the
> SMOB to do a mark operation and you tweak each SCM while it is marking.
> Naturally, any const declarations are out the window but guile pretty
> much is that way at the moment. Using this means that the SMOBs are doing
> the pointer tracking for you, all you have to actively track is the
> stack pointers.
>
> > 6) a lynch mob, for everyone who wants the unlimited freedom to copy
> > unknown objects from void * to void * without breaking anything ;)
>
> Any day dude! If you think I'm giving up my memmove, you better offer
> me some REALLY good reasons.
> Basically, the only reason that I would
> accept is an equivalent performance improvement in GC and dynamic memory
> repacking.
>

Providing the functionality of the various mem handling functions with
bits that play nice with the system isn't a problem. Trying to make
everything work when someone is chucking around anonymous pointers
isn't worth the effort (of course, neither is a lynch mob, they're so
bloody expensive nowadays!).

I'm sure I missed something important in there. It's late, and this is
long.

-- 
Greg, even the cost of quality rope is killer!