grepping a large file through a pipe takes eons

Jim Reisert AD1C jjreisert@alum.mit.edu
Sat Aug 31 15:59:11 GMT 2024


Hi Folks,

Something has changed in the last month or two.  I have a very large
file I am trying to grep (465 MB):

-rwxrw----+ 1 jjrei jjrei 465092052 Aug 31 09:39 all_spots.txt


If I grep for something near the end of the file, the results return right away:

# time grep -n N0FUL all_spots.txt

17027336:N0FUL,20240615,20240615,1
17027337:N0FUL,20240629,20240629,1

real    0m0.190s
user    0m0.078s
sys     0m0.078s


If I pipe the file through cat, grep takes much longer (over a minute instead of a fraction of a second):

# time cat all_spots.txt | grep -n N0FUL

17027336:N0FUL,20240615,20240615,1
17027337:N0FUL,20240629,20240629,1


real    1m4.934s
user    0m0.031s
sys     0m0.124s
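
One way to narrow this down is to run the same grep three ways: on the file by name, on stdin via shell redirection (which involves no pipe at all), and through the cat pipe. The bash function below is a minimal sketch, not part of Cygwin. "compare_reads" is a hypothetical name, and its default file and pattern are simply taken from the commands above. If only the cat pipe is slow, the pipe itself would be the more likely suspect than grep.

```shell
#!/usr/bin/env bash
# compare_reads FILE PATTERN  (hypothetical helper; needs bash for `time`)
# Runs the same grep three ways so the cost of the pipe can be separated
# from the cost of grep itself. grep -c prints only a match count, which
# keeps the output short; `time` writes its report to stderr.
compare_reads() {
    local file=${1:-all_spots.txt}
    local pattern=${2:-N0FUL}

    echo "== grep reading the file by name"
    time grep -c -- "$pattern" "$file"

    echo "== grep reading stdin via redirection (no pipe)"
    time grep -c -- "$pattern" < "$file"

    echo "== grep reading stdin through a pipe from cat"
    time cat -- "$file" | grep -c -- "$pattern"
}
```

Usage: "compare_reads all_spots.txt N0FUL". All three runs should print the same count; only the timings are expected to differ.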


BACKGROUND:  The large file is really about 400 smaller files appended
together.  Normally I simply cat these smaller files and pipe the
output to grep.  This is the command I noticed had slowed down quite a bit:

# cat spots/*/*.txt | grep N0FUL

I wanted to see if reading the 400 files with "cat" was the culprit, so
I created the large file ahead of time.  Piping that single file is just
as slow, so "cat" does not seem to be the issue.
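
To check whether the slowdown is in the pipe rather than in grep, the data can be pushed through a pipe with no grep involved. This is a minimal bash sketch under that assumption. "pipe_check" is a hypothetical name, and "wc -c" is used only as a cheap reader on the far end of the pipe (it also confirms that every byte arrived).

```shell
#!/usr/bin/env bash
# pipe_check FILE  (hypothetical helper; needs bash for `time`)
# Times cat with and without a pipe, with no grep involved.
pipe_check() {
    local file=${1:-all_spots.txt}

    echo "== cat to /dev/null (no pipe)"
    time cat -- "$file" > /dev/null

    echo "== cat through a pipe into wc -c (prints the byte count)"
    time cat -- "$file" | wc -c
}
```

If the first run is fast but the second takes about as long as the "cat | grep" run above, that would point at the pipe rather than at grep.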


What could explain this difference?  I have only noticed the problem
in the last month or two, and the size of the appended file has changed
by less than 1% over that time.

I'm running Windows 11 Pro (64-bit):

    *  Version 10.0.22635.4145
    *  12th Gen Intel(R) Core(TM) i7-12700   2.10 GHz
    *  64 GB of RAM
    *  Samsung SSD 990 Pro 1TB (M.2)

I realize there are several moving parts here - Cygwin, Windows, etc.
Any advice would be appreciated.

I like to copy/paste the output of "grep", but running grep directly on
all 400 files prefixes each matching line with its file name, which I
would then have to remove.  So I prefer a solution where the file name
is not present.  I now realize I can use "grep -h" to suppress the file
names, but I still prefer my original approach.  Mostly, I want to know
whether something in Cygwin (or Windows) has changed.
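
For reference, the "grep -h" alternative can be wrapped in a small bash function. "search_spots" and its "spots" directory default are illustrative names; the only grep options used are -h (omit the file-name prefix) and --. Because grep visits the files in the same glob order that cat would, the output should match the "cat ... | grep" version line for line, assuming each file ends with a newline. One caveat: with several files, "grep -n" numbers lines within each file, not across the combined stream.

```shell
#!/usr/bin/env bash
# search_spots PATTERN [DIR]  (hypothetical helper)
# Greps every DIR/*/*.txt file directly instead of piping them through cat.
# -h suppresses the "filename:" prefix grep adds when given several files.
search_spots() {
    local pattern=$1
    local dir=${2:-spots}
    grep -h -- "$pattern" "$dir"/*/*.txt
}
```

Usage: "search_spots N0FUL" runs the same search as "grep -h N0FUL spots/*/*.txt".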

-- 
Jim Reisert AD1C, <jjreisert@alum.mit.edu>, https://ad1c.us
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygcheck.out
Type: application/octet-stream
Size: 107064 bytes
Desc: not available
URL: <https://cygwin.com/pipermail/cygwin/attachments/20240831/39233432/attachment-0001.obj>

