[1.7] BUG - GREP slows to a crawl with large number of matches on a single file

Fri Nov 6 13:13:00 GMT 2009

Christopher Faylor wrote:
> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>   
>> aputerguy wrote:
>>     
>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>> (on a 2nd machine).
>>>       
>> I've seen nasty behavior with grep that isn't cygwin specific.  Try
>> "pcregrep" and see if you have the same issue.
>>
>> I found pcregrep to be ~100 times faster under _some_ searches, though
>> 2-3x is more typical.  The GNU regex parser isn't very efficient under
>> some circumstances.
>>
>> If you find a big difference, you might also want to report it to the
>> bug-grep@gnu.org mailing list, but last time I did, they told me
>> "that's the way it is" due to some posix conformance thing...
>>     
>
> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
> suggest that this isn't a grep problem.
>   
This is likely triggered by the transition to UTF-8 as the default
charset. The same problem is observed on Linux, with grep as well as
with sed.
That's why I have changed most of my shell scripts to use something like
LC_ALL=C grep or LC_ALL=C sed
where possible. Please try this.
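
A minimal sketch of that workaround, assuming a POSIX shell with awk and
grep available. The sample file and the pattern are made up for
illustration and are not the original poster's data:

```shell
# Hypothetical sample data: 100,000 lines that all match the pattern.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print "match line " i }' > "$sample"

# Count matches under whatever locale the environment currently sets.
count_default=$(grep -c 'match' "$sample")

# Same search with the locale forced to C for this one command only.
count_c=$(LC_ALL=C grep -c 'match' "$sample")

echo "current locale: $count_default matches"
echo "C locale:       $count_c matches"

rm -f "$sample"
```

Both counts should agree for plain ASCII data like this. Timing each grep
invocation separately (for example with the time keyword in bash) shows
whether the locale makes a difference on a given system. Note that under
LC_ALL=C grep works on bytes rather than multibyte characters, so patterns
containing non-ASCII text may match differently.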

The problem *is* with grep (and sed), however, because there is no good
reason that UTF-8 should carry a penalty of being 100 times slower on
most search operations. This is just poor programming in grep and sed.

Thomas
