This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Re: grep


    In short, some of this will be fixed, but you'll never beat grep.

A couple of months back I wrote some code that parses large (10MB+) logfiles
on a very frequent basis (every 30 minutes) and produces tabular results
from them.

Because many of the fields in this file have a tendency to have partially
broken data in them whose values are still very important, I ended up having
to use the POSIX regex interface and large regex tables with multiple
patterns, etc.

This code does several hundred regcomp's when the program starts, and then
upwards of a hundred or so regexec's per record (approx. 5k-10k records per
file).
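
For anyone who wants to picture the compile-once, match-many pattern described
above, here is a minimal sketch using the POSIX regex interface. It is not the
actual program: the two patterns and the names compile_table, match_field and
free_table are hypothetical, made up for illustration, and the real tables
were far larger.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical pattern table; the real tables held several hundred entries. */
static const char *const PATTERNS[] = {
    "^([0-9]+)-([0-9]+)$",   /* numeric range, e.g. "10-20" */
    "^([A-Za-z]+)=(.*)$",    /* key=value pair, e.g. "host=foo" */
};
#define NPATTERNS (sizeof PATTERNS / sizeof PATTERNS[0])

static regex_t compiled[NPATTERNS];

/* Compile every pattern once at startup.
 * Returns 0 on success, or the regcomp() error code of the first failure. */
int compile_table(void)
{
    for (size_t i = 0; i < NPATTERNS; i++) {
        int rc = regcomp(&compiled[i], PATTERNS[i], REG_EXTENDED);
        if (rc != 0) {
            while (i > 0)
                regfree(&compiled[--i]);   /* undo the ones that succeeded */
            return rc;
        }
    }
    return 0;
}

/* Try each compiled pattern against one field of a record.
 * Returns the index of the first pattern that matches and fills m[] with
 * the sub-match offsets, or returns -1 if nothing matches. */
int match_field(const char *field, regmatch_t *m, size_t nmatch)
{
    assert(field != NULL);
    for (size_t i = 0; i < NPATTERNS; i++) {
        if (regexec(&compiled[i], field, nmatch, m, 0) == 0)
            return (int)i;
    }
    return -1;
}

/* Release all compiled patterns at shutdown. */
void free_table(void)
{
    for (size_t i = 0; i < NPATTERNS; i++)
        regfree(&compiled[i]);
}
```

The point of the layout is that regcomp's cost is paid once per pattern, while
regexec runs once per pattern per field, so the regexec speed of the library
dominates the run time.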

The standard libc regex was a bit slow. gnu rx-1.5 didn't work properly. It
would consistently crash, but when it worked it was a little faster...

Because the rx-1.5 stuff crashed I just used the default cruft in libc until
I had time to have a play with the 'regex' directory in the rx-1.5
distribution. This is a slightly more complete and much more robust version
of the rx-1.5 code (which is built in the top level directory of rx-1.5).

This code (which has had much of its recent work done by some character
calling himself Jim Blandy, an unlikely name if ever I heard one) is about 7
times faster than what I previously used. It is really very fast. It brought
the run times down from 2m30s to about 20 seconds.

This code rules. I thoroughly recommend it. It's very fast and reliable
compared to what my default libc cruft is. I'm not sure why it's hidden
inside the gnu rx-1.5 distribution though.

grep, however, is apparently still _much_ faster again. Presumably it doesn't
need all of the other cruft required to maintain the location sub-matches,
etc.
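
The POSIX interface does let a caller say that it only wants a yes/no answer,
via the REG_NOSUB flag to regcomp. The sketch below shows that usage; the
function name line_matches is hypothetical, and whether skipping sub-matches
accounts for grep's speed is only my guess, since I haven't looked at how grep
does its matching.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Yes/no match with no sub-match bookkeeping requested.
 * Returns 1 if line matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);

    /* REG_NOSUB: only report success or failure of the match. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOSUB the nmatch and pmatch arguments are ignored. */
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

This compiles the pattern on every call to keep the example short; in a real
filter the regcomp would be hoisted out of the per-line loop.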

gnu grep kicks butt.



-Chris

P.S. If it matters, this code was/is run on a Sparc something 110 (Solaris
     2.5.1) and a PPro200 (linux 2.1.x, w/libc-5.4.33).