This is the mail archive of the guile@cygnus.com mailing list for the guile project.
In short, some of this will be fixed, but you'll never beat grep.

A couple of months back I wrote some code that parses large (10MB+) logfiles on a very frequent basis (every 30 minutes) and produces tabular results from them. Because many of the fields in these files tend to contain partially broken data whose values are still very important, I ended up having to use the POSIX regex interface and large regex tables with multiple patterns, etc. This code does several hundred regcomp's when the program starts, and then upwards of a hundred or so regexec's per record (approx. 5k-10k records per file).

The standard libc regex was a bit slow. GNU rx-1.5 didn't work properly. It would consistently crash, but when it worked it was a little faster...

Because the rx-1.5 stuff crashed I just used the default cruft in libc until I had time to have a play with the 'regex' directory in the rx-1.5 distribution. This is a slightly more complete and much more robust version of the rx-1.5 code (which is built in the top level directory of rx-1.5). This code (which has had much of its recent work done by some character calling himself Jim Blandy, an unlikely name if ever I heard one) is about 7 times faster than what I previously used. It is really very fast. It brought the run times down from 2m30s to about 20 seconds.

This code rules. I thoroughly recommend it. It's very fast and reliable compared to my default libc cruft. I'm not sure why it's hidden inside the GNU rx-1.5 distribution though.

grep, however, is apparently still _much_ faster again. Presumably it doesn't need all of the other cruft required to maintain the locations of sub-matches, etc. GNU grep kicks butt.

-Chris

P.S. If it matters, this code was/is run on a Sparc something 110 (Solaris 2.5.1) and a PPro200 (linux 2.1.x, w/libc-5.4.33).
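
For readers unfamiliar with the pattern described in the message (many regcomp calls at startup, then repeated regexec calls per record, with sub-match locations kept), here is a minimal sketch using the POSIX regex interface. It is not the code from the original program: the table contents, the struct, and the helper names compile_table, match_record and free_table are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pattern table; names and patterns are illustrative only. */
struct pattern {
    const char *name;
    const char *source;
    regex_t     compiled;
};

static struct pattern table[] = {
    { "status", "status=([0-9]+)" },
    { "user",   "user=([A-Za-z_]+)" },
};

/* Compile every pattern once, at startup. Returns 0 on success, -1 on error.
   (A sketch: patterns compiled before a failure are not freed here.) */
int compile_table(struct pattern *t, size_t n)
{
    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        int rc = regcomp(&t[i].compiled, t[i].source, REG_EXTENDED);
        if (rc != 0) {
            char msg[128];
            regerror(rc, &t[i].compiled, msg, sizeof msg);
            fprintf(stderr, "%s: %s\n", t[i].name, msg);
            return -1;
        }
    }
    return 0;
}

/* Try each compiled pattern against one record. On the first match, copy
   the first parenthesised sub-match into out (truncating if needed) and
   return the pattern's index. Returns -1 if nothing matches. */
int match_record(struct pattern *t, size_t n, const char *record,
                 char *out, size_t outsz)
{
    regmatch_t m[2]; /* m[0] = whole match, m[1] = first sub-match */

    if (outsz == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (regexec(&t[i].compiled, record, 2, m, 0) == 0 &&
            m[1].rm_so != -1) {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len >= outsz)
                len = outsz - 1;
            memcpy(out, record + m[1].rm_so, len);
            out[len] = '\0';
            return (int)i;
        }
    }
    return -1;
}

/* Release the compiled patterns when the program is done with them. */
void free_table(struct pattern *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regfree(&t[i].compiled);
}
```

A caller would run compile_table once, call match_record for every record read from the logfile, and call free_table at exit. Asking regexec for sub-match offsets (the regmatch_t array) is the bookkeeping that the message suggests grep can skip; passing REG_NOSUB to regcomp is the POSIX way to tell the library that only a yes/no answer is needed.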