This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Re: SCM_ASSIGN( Re: gc notes available )

To: telford@eng.uts.edu.au
Subject: Re: SCM_ASSIGN( Re: gc notes available )
From: Greg Harvey <Greg.Harvey@thezone.net>
Date: 20 Nov 1998 01:52:45 -0330
Cc: guile@cygnus.com
References: <m34ssq4y0d.fsf@thezone.net> <xy7lnlxbkan.fsf@mdj.nada.kth.se> <13885.27253.605489.589650@chl> <wwnpvb35iuc.fsf@totoro.red-bean.com> <19981109091521.09186@localhost> <m3iuggd2gf.fsf@thezone.net> <19981116131540.25482@localhost> <m390hccaky.fsf@thezone.net> <19981119094759.54799@localhost>

Tel <telford@eng.uts.edu.au> writes:

> > > In fact, if we go for SCM_ASSIGN(loc, val) then
> > > we can throw out stack scanning then and there because the C user
> > > is implicitly announcing the existence of a stack variable as soon
> > > as he/she assigns anything to it.
> > 
> > And if the stack shrinks and grows during that period? Then you get
> > much futzing about to figure out what does and doesn't require
> > unprotecting, and so on. It is a hassle. 
> 
> OK, to properly replace stack scanning the user would have to announce
> when they were finished with the SCM value, which is a major annoyance.
> Otherwise the same stack memory might be used for something else by the
> time gc comes around. Again, targeting the C++ user is easier than the
> C user because the compiler knows when the local scope is complete --
> dunno about the situation with longjmp() and C++ compilers, my guess
> is that the destructors are not called (look at the way g++ does exception
> handling, it's a nightmare).
>
> > This is just proposing that users change x = y to SCM_ASSIGN(x,
> > y). You could train a text editor to do this. 
> 
> Reading everything in capitals gets tiring.

Yeah, I was just thinking it might be a macro. 
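
A rough sketch of what such an assignment macro could look like as a
software write barrier. Everything here is hypothetical: the SCM
typedef is a stand-in for Guile's own, and toy_record_write, the
toy_remembered array and toy_assign are invented for illustration.
The lowercase inline function is one possible answer to the "capitals"
complaint.

```c
#include <assert.h>
#include <stddef.h>

typedef void *SCM;               /* stand-in for Guile's SCM type (assumption) */

#define TOY_REMEMBERED_MAX 64
static SCM *toy_remembered[TOY_REMEMBERED_MAX];  /* locations written so far */
static size_t toy_remembered_count = 0;

/* Tell the (imaginary) collector that *loc was just overwritten.
   A real barrier would deduplicate and would not use a fixed array. */
static void toy_record_write(SCM *loc)
{
    assert(toy_remembered_count < TOY_REMEMBERED_MAX);
    toy_remembered[toy_remembered_count++] = loc;
}

/* Macro form: store the value, then record the location.
   loc is evaluated exactly once. */
#define SCM_ASSIGN(loc, val)              \
    do {                                  \
        SCM *toy_loc_ = &(loc);           \
        *toy_loc_ = (val);                \
        toy_record_write(toy_loc_);       \
    } while (0)

/* Function form, for people who get tired of reading capitals. */
static inline void toy_assign(SCM *loc, SCM val)
{
    *loc = val;
    toy_record_write(loc);
}
```

With this in place, x = y becomes SCM_ASSIGN(x, y) or
toy_assign(&x, y); the store itself is unchanged, the collector just
gets told about it.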

> 
> > Even better would be to require that a user either use an abstracted
> > interface in the places where they would currently be directly
> > modifying things (better for more reasons than making the write
> > barrier work, IMO)
> 
> Well a suitable abstract layer works as its own documentation system
> (to some extent), presuming you can build in ways to make the compiler
> warn about misuse of the interface functions.

Documentation is nice, though. Actually, this shouldn't be as strict a
requirement as I was originally thinking, since for an smob to be
useful it has to have functions to implement the required behaviour
from scheme, anyway. 

> > This isn't what I was talking about. If you are messing with chunks of
> > scheme values in such a way, it can be assumed that you'll have an
> > idea of what you need to do to keep the gc sane. This could involve
> > something like scm_memmove_protected, or just a function to let the gc
> > know that its view of that particular piece of the world is a bit
> > screwed up. The gc at this point already knows what it considers
> > important to that block, but if you move it behind its back, it can't
> > be expected to keep up.
> 
> I'm happy enough with the idea of some scheme functions that replace
> memmove, etc -- that shouldn't be too rough. If you are sure that the
> memory contains no SCM values then use the normal memmove, otherwise
> use the special one. A little bit intrusive but livable.

That's the optimal goal, I think.
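
As a minimal sketch of the "special memmove" idea: do the ordinary
move, then tell the collector which destination range now holds scheme
values. The names toy_memmove_protected and toy_gc_note_range are made
up here, the SCM typedef is a stand-in, and the "collector" only
remembers the last range it was told about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *SCM;               /* stand-in for Guile's SCM type (assumption) */

struct toy_dirty_range {
    SCM *start;                  /* first slot that may have changed */
    size_t count;                /* number of slots in the range */
};

static struct toy_dirty_range toy_last_dirty;  /* last range reported */
static size_t toy_dirty_calls;                 /* how many reports so far */

/* Hypothetical hook: the gc learns that its view of this block is stale. */
static void toy_gc_note_range(SCM *start, size_t count)
{
    toy_last_dirty.start = start;
    toy_last_dirty.count = count;
    toy_dirty_calls++;
}

/* Move count SCM slots (ranges may overlap), then notify the collector. */
static SCM *toy_memmove_protected(SCM *dst, const SCM *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    memmove(dst, src, count * sizeof *dst);
    toy_gc_note_range(dst, count);
    return dst;
}
```

Memory that is known to hold no SCM values would keep using plain
memmove, as described above.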

> > There are lots of ways you can abuse c, but I don't see any
> > need to accommodate them.
> 
> That's what Nick Wirth said when he made Pascal.
> Thankfully, the world got over it.

Ugh... if I had to pick one family of languages that I'd sooner never
have seen, that would be it (and I sort of echoed Wirth... I feel so
dirty!). His
solution, though, was to remove the good with the bad; I think the
solution lies more with keeping the good and letting the bad blow
their feet off (i.e. the c way ;). The uses in guile will be a little
different, but then, so are most things.

> > The assurance it should be giving is that, if the original smob dies,
> > previous returns from copy(original) won't be affected. As long as
> > that holds, the smob can do whatever it wants to oblige.
> 
> OK, that sounds clear enough, just wanted to see it in writing.

I'm going to write most of this up and put it with the gc notes, as
well. This was mostly a bare-bones sort of thing to see if it would
irk anyone ;)

> > > Yes, ideally it would be nice to move things around to repack memory.
> > > That would give scheme one solid advantage over just about every compiled
> > > language and would give a massive improvement to the generational GC
> > > performance. It would also be an incredible number of pointers to track.
> > 
> > I'm not big on copying, period.
> 
> Try using memcpy() and memmove(), enjoy the difference :-)

Not in general, just in the gc (admittedly, I've done some pretty gross
things with pointers & memmove, as well ;).

> My B-trees use blocks of items so that a small tree may actually
> fit inside a single block and not be a tree at all. Inserting within
> a block is a copying operation more like an array than a tree. The
> block size is a compile time constant. Thus, trees made from big blocks
> do more array operations and a few big mallocs, trees made from small
> blocks do more pointer operations and many small mallocs.
> 
> I tried some speed tests on a Cyrix processor and found that a block
> size of 100 items is considerably better than small blocks of 5 or 10
> items. Performance stays pretty flat from 100 to 300 and is getting
> notably worse around 500 items per block (but 500 is still better than 5).
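
The in-block insertion described above can be sketched roughly as
follows. This is not Tel's code: the struct layout, the names, and the
use of plain int keys instead of SCM values are all assumptions, and
only the block size of 100 comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 100       /* compile-time constant block size */

struct toy_block {
    size_t used;                 /* number of items currently stored */
    int items[TOY_BLOCK_SIZE];   /* kept sorted in ascending order */
};

/* Insert key into a sorted block. Returns 0 on success, or -1 if the
   block is full (a real tree would split the block at that point). */
static int toy_block_insert(struct toy_block *b, int key)
{
    size_t lo = 0, hi;

    assert(b != NULL);
    if (b->used == TOY_BLOCK_SIZE)
        return -1;

    /* Binary search for the first position whose item is >= key. */
    hi = b->used;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (b->items[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* The array-style part: shift the tail up by one slot. */
    memmove(&b->items[lo + 1], &b->items[lo],
            (b->used - lo) * sizeof b->items[0]);
    b->items[lo] = key;
    b->used++;
    return 0;
}
```

Bigger blocks mean longer memmove calls but fewer pointers to chase,
which is the trade-off the timings above are measuring.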

The small items case could probably be dramatically improved by
pre-mallocing larger chunks of memory and doling them out by hand. The
new gc does exactly this for heap segments, and I think the gnu c lib
supplies something like this (I'm probably thinking of obstacks here,
which are sort of similar, but flexible in terms of block sizes, at
the cost of runtime). 
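
A small sketch of the "pre-malloc a big chunk and dole it out by hand"
approach, for fixed-size items. This is neither the new gc's
heap-segment code nor the obstack API; all of the toy_ names are
invented, and the rounding rule is just one simple way to keep slots
aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Slot sizes are rounded up to a multiple of this union's size so that
   every slot stays suitably aligned for the listed types. */
union toy_align { void *p; long long ll; double d; long double ld; };
#define TOY_ALIGN sizeof(union toy_align)

struct toy_chunk { struct toy_chunk *next; };  /* header of each big malloc */

struct toy_pool {
    size_t slot_size;            /* rounded-up size of one item */
    size_t slots_per_chunk;      /* items carved out of each chunk */
    void *free_list;             /* list threaded through the free slots */
    struct toy_chunk *chunks;    /* every chunk, so they can be released */
    size_t chunk_count;
};

static size_t toy_round_up(size_t n)
{
    return (n + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
}

static void toy_pool_init(struct toy_pool *p, size_t item_size, size_t slots)
{
    assert(item_size > 0 && slots > 0);
    p->slot_size = toy_round_up(item_size);
    p->slots_per_chunk = slots;
    p->free_list = NULL;
    p->chunks = NULL;
    p->chunk_count = 0;
}

/* One big malloc, carved into slots that all go onto the free list. */
static int toy_pool_grow(struct toy_pool *p)
{
    size_t header = toy_round_up(sizeof(struct toy_chunk));
    char *raw = malloc(header + p->slot_size * p->slots_per_chunk);
    struct toy_chunk *c = (struct toy_chunk *)raw;

    if (raw == NULL)
        return -1;
    c->next = p->chunks;
    p->chunks = c;
    p->chunk_count++;
    for (size_t i = 0; i < p->slots_per_chunk; i++) {
        void *slot = raw + header + i * p->slot_size;
        *(void **)slot = p->free_list;
        p->free_list = slot;
    }
    return 0;
}

/* Pop a slot; only touches malloc when the free list has run dry. */
static void *toy_pool_alloc(struct toy_pool *p)
{
    void *slot;

    if (p->free_list == NULL && toy_pool_grow(p) != 0)
        return NULL;
    slot = p->free_list;
    p->free_list = *(void **)slot;
    return slot;
}

/* Push a slot back; the memory stays with the pool. */
static void toy_pool_free(struct toy_pool *p, void *slot)
{
    *(void **)slot = p->free_list;
    p->free_list = slot;
}

/* Release every chunk at once. */
static void toy_pool_destroy(struct toy_pool *p)
{
    while (p->chunks != NULL) {
        struct toy_chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
    }
    p->free_list = NULL;
    p->chunk_count = 0;
}
```

For small tree blocks, this swaps many small mallocs for one malloc
per slots_per_chunk items, at the cost of never handing memory back
until the whole pool is destroyed.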

> The implication of this is that you can do quite a lot of copying
> and still be faster than tracing through pointer lists. Remember that
> most of the big memory usage is inside SMOBs but each SMOB still has
> a cons cell that acts as scheme's handle to that SMOB. Only the cons cell
> needs shuffling, the main bulk of SMOB memory can sit where it likes.

If a relocate function per smob isn't too much hassle, then it can be
mostly painless to the gc (and allow a lightweight representation of
baby objects). Again, this puts a bit more on the heads of smob
authors (we could also do this generically by just searching through
the smob for self-references, but the ability of an smob to relocate
itself would likely save a ton of time with large smobs).
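
To make the per-smob relocate idea concrete, here is a toy version. No
such hook exists in guile as far as this sketch is concerned: the
toy_relocate_fn type, toy_gc_move and the toy_buffer object are all
hypothetical. The object holds a pointer into its own storage, which
is exactly the sort of self-reference a generic search would otherwise
have to hunt for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy smob body with a self-reference: cursor points into inline_data. */
typedef struct toy_buffer {
    char inline_data[16];
    char *cursor;
} toy_buffer;

/* Hypothetical per-smob hook, called after the bytes have been copied. */
typedef void (*toy_relocate_fn)(void *new_addr, const void *old_addr);

/* The smob author knows where the self-references are and fixes them. */
static void toy_buffer_relocate(void *new_addr, const void *old_addr)
{
    toy_buffer *nb = new_addr;
    const toy_buffer *ob = old_addr;
    ptrdiff_t offset = ob->cursor - ob->inline_data;

    nb->cursor = nb->inline_data + offset;
}

/* What the collector side might do when moving an object: a bitwise
   copy, then let the smob patch itself. dst and src must not overlap. */
static void toy_gc_move(void *dst, const void *src, size_t size,
                        toy_relocate_fn relocate)
{
    assert(dst != NULL && src != NULL && dst != src);
    memcpy(dst, src, size);
    if (relocate != NULL)
        relocate(dst, src);
}
```

The fix-up here is one subtraction and one addition, regardless of how
large the smob is, which is where the time saving over a generic scan
would come from.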

(Copying beef)
It might still leave us with a second scan to fix up forwarded
pointers, though. There may be a way around this, by leaving the
forwarded pointers after the scan, forwarding pointers when we look at
them (at run time [and I must say: ugh!] and the next mark), and then
removing the forwarded pointers during the next gc, at which point all
of the objects that pointed there should be fixed up.
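
A sketch of the lazy fix-up, under the assumption that every object
has room for a forwarding field. The toy_obj layout and the toy_move,
toy_resolve and toy_read names are invented for illustration; the
check in toy_read is the run time cost complained about above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_obj {
    struct toy_obj *forward;     /* non-NULL once the object has moved */
    int payload;
} toy_obj;

/* Move from into to and leave a forwarding pointer behind in from. */
static void toy_move(toy_obj *from, toy_obj *to)
{
    assert(from != NULL && to != NULL && from != to);
    to->payload = from->payload;
    to->forward = NULL;
    from->forward = to;
}

/* Follow forwarding pointers until reaching the live copy. */
static toy_obj *toy_resolve(toy_obj *o)
{
    while (o->forward != NULL)
        o = o->forward;
    return o;
}

/* Read through a slot, repairing the slot as a side effect. Once every
   slot has been read (or marked) this way, nothing points at the old
   copy any more and the next gc can reclaim it. */
static toy_obj *toy_read(toy_obj **slot)
{
    toy_obj *target = toy_resolve(*slot);

    *slot = target;
    return target;
}
```

The second scan goes away, but every pointer read now pays for the
forwarding check until the old copies are finally removed.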

> > > I'm running B-trees containing approx 10000 SCM data values which are
> > > a mix of symbols, integers and floats (and the odd string and SMOB).
> > > I'm hoping to push this further by a factor of at least 10 before I could
> > > consider the system getting into the workable region.
> 
> I made it to a bit past 20000 data values yesterday, some of which are
> themselves large matrices containing several thousand elements --
> total disc file length of 15M, consuming 30M when loaded into core.
> Unfortunately it takes over a minute to load it all up, but once in
> memory, operations are quite quick. Even GC is less than a second.
> The matrices don't store their contents as SCM values so from the GC
> point of view each matrix is one item.
> 
> > > As I said above, if we really do go with SCM_ASSIGN() then we have
> > > already declared every live stack value so we can even move objects that are
> > > pointed to from the stack (dangerous but fun!)
> > 
> > Not really, we wouldn't know if they were dead or alive, even if we
> > were treating stack values with something like SCM_ASSIGN, which isn't
> > too likely.
> 
> Yup, you are right, items pointed to by the stack can't be moved,
> there is just no way to know when the stack is no longer using the item.

Editing error (I wrote this bit first, then went back and elaborated
at the top... I wasn't aiming for the rub it in effect ;).

> > The user almost certainly won't be able to add a hook to the fault
> > handling, partly because that would be a bit of a mess (yeah, it might
> > be already, but it's not a mess that gives me the willies), and partly
> > because there probably isn't a sensible way that they can handle the
> > guts of the write barrier if it's possible to use one or the other.
> 
> I agree that user fault handling is a real mess. I was kinda thinking about
> the idea of using memmap to save my database directly in tree form.
> Then I could access database files directly rather than have the time
> taken to load and save. I suspect that if I did that then I couldn't
> use SCM values in the tree anymore because they would be all different
> next time the database was loaded, so I haven't considered it too hard.
> Anyhow, I just might need access to memory protection in order to
> implement such a device (it might be better off in its own process space
> since copying each item would probably be inevitable).

I'm pretty sure that the software barrier is going to be the default,
for a couple of reasons, not the least of which is that it doesn't
restrict the usage of user memory mappings like these (there may be
other places as well that want control of SIGSEGV & SIGBUS,
particularly if you're talking about tying in to a large c
application).

Weighing both the existing literature on the relative costs of write
barriers, and what we can actually expect in terms of a memprotect
based barrier, the software barrier generally wins (even better when
we have a compiler for guile, since it may be able to optimize away
unneeded checks).
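
For a feel of what the software barrier costs per store, here is a
toy generational check. The cell layout, the generation numbering
(bigger means older) and the names are all assumptions, not guile's
representation. The if is the check a compiler could drop when it can
prove the stored value is not younger than the object.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_cell {
    int generation;              /* bigger number = older (assumption) */
    int remembered;              /* already known to point at younger data */
    struct toy_cell *car;
} toy_cell;

static size_t toy_barrier_hits;  /* how often the slow path was taken */

/* Store val into obj->car. Only an older object that starts pointing
   at a younger one needs to be remembered for the next minor gc. */
static void toy_set_car(toy_cell *obj, toy_cell *val)
{
    assert(obj != NULL);
    obj->car = val;
    if (val != NULL && obj->generation > val->generation
        && !obj->remembered) {
        obj->remembered = 1;
        toy_barrier_hits++;
    }
}
```

No signal handlers are involved, so user code stays free to do its
own memory mappings and to own SIGSEGV and SIGBUS.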

> > Actually, the point is that smobs actually know what smobs are doing,
> > and whether any particular bit of memory is (or can be) a scheme
> > value.  I'm making a (possibly too simplistic, there are probably
> > places where they overlap) distinction between end-user code, and smob
> > code. I think that smob code can be considered enough of a
> > modification to the core of guile that knowing a bit about how the gc
> > wants things to work isn't an awful lot to ask.
> 
> A lot of users want guile as an extension language and really do
> need SMOBs. Expect users to mostly write SMOBs. Knowing a bit about
> gc is OK provided there are fill-in-the-blanks style examples
> for people to follow. Most people don't get too shirty about
> using an abstraction layer provided it's not too cumbersome and
> restrictive. Some people even prefer the feeling of safety from
> doing things through an abstraction layer.

This shouldn't be difficult to provide, since I seem to be documenting
pretty much everything else ;). I can probably work this around a port
of an svgalib interface I wrote for elk, which should provide examples
for most of the stuff that'll be required (I'll add a few chunks
of scheme values into the mix to make things interesting).

A bit of 'forced abstraction' could make it a bit easier to program
from the c side, since you're heavily promoting an interface close to
the scheme one, rather than requiring a greater knowledge of the
actual representation of objects. This is mostly the case now, but
might need to be slightly extended to cover system objects, as well.

-- 
Greg
