This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: regexp profiling hell....



Okay, that certainly suggests a plan of action.

> Conclusions:
> 
> 1. The biggest help would be to use gawk's regexp.c instead of the system
> regexp calls, at least for this particular regexec call.  Regular
> expression testing is complicated, so one would have to do many more
> tests for a variety of arguments (and a variety of platforms) before
> being certain about it.

Okay.  If you submit patches to make Guile use GAWK's regexp.c, I will
apply them.  I think it's dumb for Guile to be in the regexp business,
but every other interpreter I know of has its own, probably for
reasons similar to the ones you've uncovered here.

You need to change the regexp sources to be namespace-clean; that is,
all externally visible symbols, either in the library or the header
files, must begin with scm_ or SCM_.  This gives us some hope of not
squashing the user's variables.
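
For illustration only, here is a minimal sketch of one common way to get
there: a private renaming macro used while building the library, an SCM_
prefix on public macros, and `static` on file-local helpers.  Every name
in it (re_common_prefix, scm_re_common_prefix, SCM_RE_NO_MATCH,
same_char) is hypothetical and does not come from gawk's regexp.c or
from Guile.

```c
#include <stddef.h>

/* Private renaming layer (hypothetical): it belongs in a header that is
   included only while building the library, so the unprefixed macro
   name never reaches user code. */
#define re_common_prefix scm_re_common_prefix

/* Public macro: anything visible in an installed header takes SCM_. */
#define SCM_RE_NO_MATCH ((size_t) -1)

/* Public prototype, as it would appear in the installed header. */
size_t scm_re_common_prefix(const char *a, const char *b);

/* File-local helper: `static` keeps it out of the link namespace,
   so it needs no prefix. */
static int same_char(char a, char b)
{
    return a == b;
}

/* The source keeps its original name, but the #define above makes the
   linker see scm_re_common_prefix.  Returns the length of the common
   prefix of a and b, or SCM_RE_NO_MATCH if either pointer is NULL. */
size_t re_common_prefix(const char *a, const char *b)
{
    size_t n = 0;

    if (a == NULL || b == NULL)
        return SCM_RE_NO_MATCH;
    while (a[n] != '\0' && b[n] != '\0' && same_char(a[n], b[n]))
        n++;
    return n;
}
```

The renaming macro keeps the diff against the upstream sources small,
which should make it easier to merge later fixes; renaming every symbol
by hand would work just as well.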

> 2. Overhead in the regexp_exec call is significant, but not
> substantial.  Two things could be done to reduce it.  The easiest
> would be to malloc the space needed for matching in the rgx
> structure.  Then the calls to scm_must_malloc & scm_must_free could be
> skipped.  The other would be to skip the call to SCM_COERCE_SUBSTR, if
> that's possible.

I'm not so interested in this change, but if you want to implement
something that does this, and supports all the public operations in
regex.scm, I'll apply it.
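
For reference, the first suggestion in point 2 might look roughly like
the sketch below.  It uses the POSIX <regex.h> calls rather than Guile's
actual code; the scm_rgx_sketch structure and its three functions are
hypothetical names, and plain malloc/free stand in for whatever
allocator the real change would use.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rgx structure. */
typedef struct {
    regex_t     re;       /* compiled pattern */
    size_t      nmatch;   /* whole match plus one slot per subexpression */
    regmatch_t *matches;  /* allocated once, reused by every exec call */
} scm_rgx_sketch;

/* Compile the pattern and allocate the match buffer up front.
   Returns 0 on success, a nonzero regcomp-style code on failure. */
int scm_rgx_sketch_init(scm_rgx_sketch *r, const char *pattern)
{
    int err = regcomp(&r->re, pattern, REG_EXTENDED);

    if (err != 0)
        return err;
    r->nmatch = r->re.re_nsub + 1;
    r->matches = malloc(r->nmatch * sizeof *r->matches);
    if (r->matches == NULL) {
        regfree(&r->re);
        return REG_ESPACE;
    }
    return 0;
}

/* Match without any per-call allocation.  Returns 1 on a match and 0
   otherwise; the offsets are left in r->matches. */
int scm_rgx_sketch_exec(scm_rgx_sketch *r, const char *text)
{
    return regexec(&r->re, text, r->nmatch, r->matches, 0) == 0;
}

/* Release the match buffer together with the compiled pattern. */
void scm_rgx_sketch_free(scm_rgx_sketch *r)
{
    free(r->matches);
    r->matches = NULL;
    regfree(&r->re);
}
```

The cost of this design is that each exec call overwrites the offsets
from the previous one, so a caller that needs to keep them has to copy
them out before matching again.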

> 3. Even with these changes, deleting startup time, and commenting
> everything out of scm_regexp_exec, gawk is still fundamentally faster:
> gawk runtime is .8 seconds vs 3.19 seconds for guile - 4x slower!
> Guile's also slower than STk & scm, even though it's descended from
> scm.  In particular, scm manages to run the trivial function loop
> about 2x faster than guile - startup time.  Has the interpreter been
> changed so much that it's substantially slower now?  Another
> possibility is that guile's sucking in so much stuff initially that
> the gc scans of memory are killing us, but I don't see an easy way to
> check this.

There have been substantial speedups in SCM recently, although I don't
know whether those made it into your RPM.

Chris Hanson (one of the RnRS authors, and one of the MIT Scheme guys)
has volunteered to write a bytecode compiler/interpreter for Guile, as
well as a front end for the debugger.  That should improve performance
some, and give us low-overhead debugging as well.  This should also
reduce the GC overhead, by placing most variable bindings on a stack,
rather than in the heap.


> A general testing issue - Doing benchmarking is a pain because of the
> need to include (debug-disable 'debug) at the top of files means I
> need different files for other scheme implementations.  How about also
> -nodebug & -debug command line args & make the (configurable) default
> -nodebug?

If you put a file init.scm somewhere in your GUILE_LOAD_PATH, then
that should be loaded whenever Guile starts up.  You can put the
(debug-disable 'debug) there.