grepping a large file through a pipe takes eons
Jim Reisert AD1C
jjreisert@alum.mit.edu
Sat Aug 31 15:59:11 GMT 2024
Hi Folks,
Something has changed in the last month or two. I have a very large
file I am trying to grep (465 MB):
-rwxrw----+ 1 jjrei jjrei 465092052 Aug 31 09:39 all_spots.txt
If I grep for something near the end of the file, the results return right away:
# time grep -n N0FUL all_spots.txt
17027336:N0FUL,20240615,20240615,1
17027337:N0FUL,20240629,20240629,1
real 0m0.190s
user 0m0.078s
sys 0m0.078s
If I pipe the file through cat, grep takes much longer:
# time cat all_spots.txt | grep -n N0FUL
17027336:N0FUL,20240615,20240615,1
17027337:N0FUL,20240629,20240629,1
real 1m4.934s
user 0m0.031s
sys 0m0.124s
BACKGROUND: The large file is really about 400 smaller files appended
together. Normally I cat the smaller files and pipe the result into
grep. This is the command that I noticed had slowed down considerably:
# cat spots/*/*.txt | grep N0FUL
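I wanted to see if "cat" concatenating 400 files was the culprit, so I
created the large file ahead of time. Piping that single file through
"cat" is just as slow, so the concatenation itself does not seem to be
the issue.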
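
A possible way to narrow this down further is to time each stage of the pipeline separately. The following is only a sketch, assuming bash (for the "time" keyword) and a generated stand-in file; the temporary file and its contents are hypothetical, and the real all_spots.txt would be substituted to reproduce the problem. If the third command is also slow, that would point at the pipe rather than at grep, but that is an inference and not something confirmed here.

```shell
# Hypothetical stand-in for all_spots.txt; substitute the real file to reproduce.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
seq 1 200000 | sed 's/^/K1ABC,20240615,20240615,/' > "$f"
echo 'N0FUL,20240629,20240629,1' >> "$f"

# 1. grep opens the file itself (the fast case).
time grep -n N0FUL "$f"

# 2. grep reads stdin through a redirection: no pipe, no second process.
time grep -n N0FUL < "$f"

# 3. cat feeds a pipe, but the reader is wc instead of grep.
time cat "$f" | wc -l

# 4. cat feeds a pipe into grep (the slow case).
time cat "$f" | grep -n N0FUL
```

Comparing cases 2 and 4 separates "grep reading stdin" from "grep reading a pipe", and case 3 shows whether any reader on the pipe is slow.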
What could explain this difference? I have only noticed the problem
in the last month or two. The appended file has not changed
significantly in size (less than 1%).
I'm running Windows 11 Pro (64-bit):
* Version 10.0.22635.4145
* 12th Gen Intel(R) Core(TM) i7-12700 2.10 GHz
* 64 GB of RAM
* Samsung SSD 990 Pro 1TB (M.2)
I realize there are several moving parts here - Cygwin, Windows, etc.
Any advice would be appreciated.
I like to copy/paste the results of "grep". Running grep directly over
all 400 files prepends the file name to each result line, which I
would then have to remove, so I prefer a solution where the file name
is not present. I now realize I can use "grep -h" for this, but I
still prefer the original pipeline, and I really want to know if
something in Cygwin (or Windows) may have changed.
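
For reference, the "grep -h" workaround can be sketched as follows. The demo directory and its two small files are hypothetical stand-ins for spots/*/*.txt; only the -h option (suppress the file name prefix) comes from the text above.

```shell
# Hypothetical demo data; the real files live under spots/*/*.txt.
demo=$(mktemp -d)
mkdir -p "$demo/spots/a" "$demo/spots/b"
printf 'N0FUL,20240615,20240615,1\nK1ABC,20240615,20240615,2\n' > "$demo/spots/a/1.txt"
printf 'N0FUL,20240629,20240629,1\n' > "$demo/spots/b/2.txt"

# -h suppresses the file name prefix when several files are searched.
with_h=$(grep -h N0FUL "$demo"/spots/*/*.txt)

# The original cat-into-grep pipeline, for comparison.
with_cat=$(cat "$demo"/spots/*/*.txt | grep N0FUL)

printf '%s\n' "$with_h"
[ "$with_h" = "$with_cat" ] && echo "same output"

rm -rf "$demo"
```

Both forms print the matching lines without file names. The -h form skips the pipe entirely, which avoids the slowdown but does not explain it.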
--
Jim Reisert AD1C, <jjreisert@alum.mit.edu>, https://ad1c.us
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygcheck.out
Type: application/octet-stream
Size: 107064 bytes
Desc: not available
URL: <https://cygwin.com/pipermail/cygwin/attachments/20240831/39233432/attachment-0001.obj>