This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
I compiled -O2 -g. I gave up on looking at profiling output & just started hacking out pieces of scm_regexp_exec one at a time to see where the time was spent. (BTW, this is under linux, rh4.2, gcc 2.7.2.1 & on a pentium II 300. My other tests were on a similar linux setup but on a pentium 120). Here're the results: guile-test1.scm: (debug-disable 'debug) (define r (make-regexp "^r")) (do ((i 0 (1+ i))) ((>= i 300000)) (regexp-exec r "asdr")) guile-test2.scm: (debug-disable 'debug) (define r (make-regexp "^r")) (define (f a b) #f) (do ((i 0 (1+ i))) ((>= i 300000)) (f r "asdr")) guile-test4.scm: (debug-disable 'debug) (define r (make-regexp "^r")) (do ((i 0 (1+ i))) ((>= i 300000))) Base case: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -c '(exit)' 0.35user 0.24system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (140major+108minor)pagefaults 0swaps We've got a .6 second startup & exit time. Empty case: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test4.scm 2.12user 0.27system 0:02.38elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (149major+111minor)pagefaults 0swaps hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test4.scm This is the best we can hope for - 2.38 seconds elapsed. Minimal function case: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test2.scm 3.39user 0.30system 0:03.68elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (149major+111minor)pagefaults 0swaps hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test2.scm This is the best we can reasonably expect - 3.68 seconds elapsed. That one defun + executing it 300,000 times costs 1.3 seconds. How much the work really costs: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.std -s guile-test1.scm 8.92user 0.28system 0:09.21elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (153major+113minor)pagefaults 0swaps We're 5.5 seconds over the minimal function case, so it looks like regexp-exec itself costs 5.5 seconds. How much of this is overhead: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.noregex -s guile-test1.scm 3.81user 0.31system 0:04.11elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (149major+110minor)pagefaults 0swaps I commented out the call to regexec & ran it again. Comparing this to the normal regexp case shows that the system regexec is taking most of the time - 5.1 seconds, and scm_regexp_exec has .43 seconds of overhead. How a different regexp behaves in this test: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.gawk -s guile-test1.scm 5.13user 0.25system 0:05.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (141major+109minor)pagefaults 0swaps This is with gawk's regexp.c compiled in. It's 1.29 over the case with regexec commented out, so for this test, gawk's regexp takes 1.29 seconds vs the system regexp which takes 5.1 seconds. Cutting back the overhead: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.noregex.nomalloc -s guile-test1.scm 3.54user 0.30system 0:03.83elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (149major+110minor)pagefaults 0swaps This is with the regexec call & the calls to scm_must_malloc & scm_must_free commented out. This cuts out .18 seconds. Cutting back more overhead: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.noregex.nomalloc.nocoerce -s guile-test1.scm 3.48user 0.30system 0:03.77elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (149major+110minor)pagefaults 0swaps As above with SCM_COERCE_SUBSTR removed. Everything out: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time libguile/guile.np.nothing -s guile-test1.scm 3.47user 0.32system 0:03.78elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (149major+110minor)pagefaults 0swaps Everything except for variable declarations & return commented out from scm_regexp_exec. This is the real minimal case, and it's slightly slower than the trival fcn case - .1 seconds. Evidentally dispatching regexp_exec is measurably (but slightly) slower than running a trivial fcn. Maybe because of the 2 optional arguments? In any case, comparing this to the case with the call to regexec commented out gives the real measure of overhead, namely .33 seconds - 3% of the time. gawk version: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time gawk 'BEGIN { for (i=0; i<300000; i++) "asdr" ~ /^r/; }' </dev/null 0.80user 0.01system 0:00.80elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (108major+13minor)pagefaults 0swaps The equivalent of test1 for STk, namely: (define r (string->regexp "^r")) (do ((i 0 (1+ i))) ((>= i 300000)) (r "asdr")) hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test1.scm 3.16user 0.01system 0:03.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (114major+47minor)pagefaults 0swaps This is lower than guile's with the body of scm_regexp_exec commented out, and is even faster than this minus the guile startup time (albet just barely). STk with the trivial fcn: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test2.scm 3.05user 0.01system 0:03.05elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (114major+47minor)pagefaults 0swaps Looks like STk's regexp package is faster than gawk's! STk with empty do loop: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test3.scm 2.43user 0.01system 0:02.44elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (114major+47minor)pagefaults 0swaps hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time snow -file stk-test3.scm Now somehow STk is slower than the corresponding guile test! I'd guess that guile did 1 less gc call when removing the body of the do loop, and STk didn't. Just goes to show the difficulty of benchmarking... I'd like to compare to scm, but the ... person who built the scm rpm didn't seem to include the regex feature. Timing scm on the empty fcn, i.e. - (define (f x y) #f) (do ((i 0 (1+ i))) ((>= i 300000)) (f 3 "asdr")) yields: hjstein@blinky:~/remote-cvs-pkgs/anon/guile-core$ time scm -f scm-test1.scm 1.48user 0.00system 0:01.48elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (100major+69minor)pagefaults 0swaps This is substantially faster than guile even with startup time deleted. Conclusions: 1. The biggest help would be to use gawk's regexp.c instead of the system regexp calls, at least for this particular regexec call. Regular expression testing is complicated, so one would have to do many more tests for a variety of arguments (and a variety of platforms) before being certain about it. 2. Overhead in the regexp_exec call is significant, but not substantial. Two things could be done to reduce it. The easiest would be to malloc the space needed for matching in the rgx structure. Then the calls to scm_must_malloc & scm_must_free could be skipped. The other would be to skip the call to SCM_COERCE_SUBSTR, if that's possible. 3. Even with these changes, deleting startup time, and commenting everything out of scm_regexp_exec, gawk is still fundamentally faster gawk runtime is .8 seconds vs 3.19 seconds for guile - 4x slower! Guile's also slower than STk & scm, even though it's descended from scm. In particular, scm manages to run the trivial function loop about 2x faster than guile - startup time. Has the interpreter been changed so much that it's substantially slower now? Another possibility is that guile's sucking in so much stuff initially that the gc scans of memory are killing us, but I don't see an easy way to check this. I think the fundamental questions are really wrt points 3 & 4 - Why is guile so much slower than gawk, STk & scm? I can think of 2 things wrt gawk - gawk doesn't gc (i.e. - is just 1 big memory leak), and guile's interpreter is just slow. This would be in line with the comparisons to STk & scm, although, again, it could be the gc scanning of all that stuff that's scarfed up at the beginning. This leaves two questions: 1. How much does guile's eval differ from scm's? 2. How much more time is guile spending in gc because it's initially sucking up so much code? If the problem really is in 2, then it means that either we need a generational gc for guile (hard), or it *must* ship with compiled init files (not hard). The latter would also help startup time & would make the stuff in the init files run faster. I really don't understand the resistance to doing this. I know there are things that *should* be done with the startup files that would speed things up, but why not just compile them anyway? At least people wouldn't have to live through the startup times waiting for the day when things are cleaned up. A general testing issue - Doing benchmarking is a pain because of the need to include (debug-disable 'debug) at the top of files means I need different files for other scheme implementations. How about also -nodebug & -debug command line args & make the (configurable) default -nodebug? -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il