This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Some benchmarking notes/results using extract.scm (appended below). 1. I moved almost all defines to the top level because I discovered that they were being evaled for each fcn invocation. 2. Timing readline on all the C code in scwm gives ~1.8 seconds to read all the code: guile> (define (test-read r p) (if (eof-object? (r p)) #t (test-read r p))) guile> (define (test-read-f r f) (call-with-input-file f (lambda (p) (test-read r p)))) guile> (with-profiling (test-read-f) (for-each (lambda (f) (test-read-f read-line f)) testfilelist)) Function Called Time ------------------- ---------- --------- test-read-f 45.0 1.770 3. Using %read-line instead shaves off about .7 seconds: guile> (define (mrl p) (car (%read-line p))) guile> (with-profiling (test-read-f) (for-each (lambda (f) (test-read-f mrl f)) testfilelist)) Function Called Time ------------------- ---------- --------- test-read-f 45.0 1.060 4. gawk does it in .2 seconds: hjstein@bacall:~/remote-cvs-pkgs/scwm/scwm$ time awk 'END {print NR}' *.c 26032 0.16user 0.05system 0:00.21elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (304major+243minor)pagefaults 0swaps 5. Perl takes .3 seconds: hjstein@bacall:~/remote-cvs-pkgs/scwm/scwm$ time perl -e 'while (<>) { } ; print "$.\n";' *.c 26032 0.27user 0.02system 0:00.29elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (170major+63minor)pagefaults 0swaps Thus, although %read-line would need to be ~7x faster to catch awk, or ~3x faster to catch perl, it's still not horrible. 6. The extraction portion of extract.scm seems to be spending a large chunk of it's time in regexp-exec. I'm effectively calling 2 regexps on each line - 1 to see if it's a SCWM_PROC & the 2nd when the 1st fails (which is usually) to see if it's an embedded document item. The calls to regexp-exec seem to be taking the bulk of the processing time: guile> (with-profiling (extract-docs-from-port) (define d (apply extract-docs-from-files testfilelist))) Function Called Time ------------------- ---------- --------- extract-docs-from-port 45.0 9.130 guile> (with-profiling (extract-docs-from-port regexp-exec) (define d (apply extract-docs-from-files testfilelist))) Function Called Time ------------------- ---------- --------- regexp-exec 51506.0 9.430 extract-docs-from-port 45.0 16.450 There's about 1 second of profiling overhead/10,000 calls in the parent, & about .5 seconds/10,000 calls in the routine itself, so about 6 second out of 9 are spent in regexp-exec. 7. Doing similar work with awk yields: hjstein@bacall:~/remote-cvs-pkgs/scwm/scwm$ time awk '/^[ \t]*SCWM_PROC[ \t]*\(/ {print $0} /^[ \t]*SCWM_PROC[ \t]*\(/ {print $0}' *.c >/dev/null 0.65user 0.06system 0:00.70elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (309major+451minor)pagefaults 0swaps According to this, guile regexps are >15x slower than awk's. Of course, the scheme version is doing much more work, so: 8. Doing the same as the above awk gives: guile> (define (for-each-line f p) (let loop ((l (car (%read-line p)))) (if (eof-object? l) #t (begin (f l) (loop (car (%read-line p))))))) guile> (define (for-each-line-f f file) (call-with-input-file file (lambda (p) (for-each-line f p)))) guile> (with-profiling (for-each-line-f) (for-each (lambda (f) (for-each-line-f (lambda (l) (regexp-exec proc-start-match l)) f)) testfilelist)) Function Called Time ------------------- ---------- --------- for-each-line-f 45.0 3.980 guile> (with-profiling (for-each-line-f) (for-each (lambda (f) (for-each-line-f (lambda (l) (regexp-exec proc-start-match l) (regexp-exec doc-start-match l)) f)) testfilelist)) Function Called Time ------------------- ---------- --------- for-each-line-f 45.0 6.910 So, of the 9 seconds it's taking extract.scm to read & parse the input files, about 7 seconds is devoted to reading each line (~1 second) and applying two regular expressions. This makes guile about 10x slower than awk. Putting the regexps together gives a decent improvement: guile> (with-profiling (for-each-line-f) (for-each (lambda (f) (for-each-line-f (lambda (l) (regexp-exec proc-or-doc-start l)) f)) testfilelist)) Function Called Time ------------------- ---------- --------- for-each-line-f 45.0 4.560 Can this be improved? Please? -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il