1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Thomas Wolff
towo@towo.net
Fri Nov 6 13:13:00 GMT 2009
Christopher Faylor wrote:
> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>
>> aputerguy wrote:
>>
>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>> (on a 2nd machine).
>>>
>> I've seen nasty behavior with grep that isn't cygwin specific. Try
>> "pcregrep" and see if you have the same issue.
>>
>> I found it to be about ~100 times faster under _some_ searches though
>> 2-3x is more typical. The gnu re-parser isn't real efficient under
>> some circumstances.
>>
>> If you find a big difference, you might also want to report it to the
>> bug-grep@gnu.org mailing list, but last time I did, they told me
>> "that's the way it is" due to some posix conformance thing...
>>
>
> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
> suggest that this isn't a grep problem.
>
This is likely to be triggered by the transition to UTF-8 as a default
charset. The same problem is observed on Linux, with grep as well as
with sed.
That's why I have changed most of my shell scripts to use something like
LC_ALL=C grep or LC_ALL=C sed
where possible. Please try this.
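A minimal sketch of that workaround, for anyone who wants to check it quickly. The temporary file, the pattern 'foo' and the variable names are made up for illustration; substitute your own file and pattern. It only shows that both invocations report the same number of matches on plain ASCII input; it does not measure speed.

```shell
#!/bin/sh
# Hypothetical test data: 1000 ASCII lines, each containing "foo".
tmpfile=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i foo bar"
    i=$((i + 1))
done > "$tmpfile"

# Force the C locale for this one command only, so grep matches bytes
# instead of decoding multibyte (UTF-8) characters.
count_c=$(LC_ALL=C grep -c 'foo' "$tmpfile")

# Same search under whatever locale the environment provides.
count_default=$(grep -c 'foo' "$tmpfile")

echo "C locale: $count_c matches, default locale: $count_default matches"
rm -f "$tmpfile"
```

To compare timings on a real file in bash, something like "time LC_ALL=C grep -c pattern file" against "time grep -c pattern file" should show whether the locale is the culprit. Note that under the C locale non-ASCII characters are treated as raw bytes, so patterns that rely on multibyte characters may match differently.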
The problem *is* with grep (and sed), however: there is no good reason
that UTF-8 should make most search operations 100 times slower. This is
just poor programming in grep and sed.
Thomas