This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: locate and updatedb


On 2/16/2016 5:55 PM, Buchbinder, Barry (NIH/NIAID) [E] wrote:
Linda Walsh sent the following at Saturday, February 13, 2016 7:15 AM
Marco Atzeri wrote: ---
On 11/02/2016 19:33, Byron Boulton wrote:
On 2/11/2016 1:18 PM, cyg Simple wrote:
On 2/11/2016 9:00 AM, Byron Boulton wrote:
Does anyone here have success using `updatedb` and `locate` in
cygwin? I use `locate` heavily on my Linux machines, but every time
I've tried to run `updatedb` on cygwin I've given up and killed the
process because it is taking too long.
There's a reason why on Linux it is usually set to run when you are asleep.  ;-)

  Is there something wrong with cygwin's implementation of
`updatedb` making it not work at all or making it slower than on my
Linux machines? Or are there others who have success using it on
cygwin?

But it might have to do with disk speed and memory. Laptop drives are
usually among the slowest.

I ran it just now (this is with MS's Home Essentials real-time
protection turned on).
locate / >/tmp/all
wc /tmp/all
  1479146   4014375 133322318 /tmp/all
df .

law.Bliss/bin> time index_files.sh
670592 (process ID) old priority 0, new priority 19
44.21sec 15.06usr 28.30sys (98.09% cpu)
Filesystem      Size  Used Avail Use% Mounted on
C:              949G  585G  365G  62% /
----

So ~1.4 million files... Using the following exclusions:

---(index_files.sh)----
renice +19 $$
Local="/"
if [[ -d /windows/sysnative/. ]]; then
  Local+=" /windows/sysnative/."
fi
Prunepaths='/.usr /proc /C /B /H /I /M /D /P /System[[:space:]]Volume[[:space:]]Information /Windows/CSC /pagefile.sys /Music /Pictures /Share /Media /home /Doc /$RECYCLE.BIN /cygdrive'

/bin/updatedb --findoptions=-noleaf --localpaths="$Local" \
  --prunepaths="$Prunepaths" --netpaths="$Net"
----

Most of those pruned files are pruned either due to redundancy or
being on a local network server...

That's fairly fast vs. the MS-Home Essentials full malware scan I
run once a week that takes ~8-16 hours (it scans a few of my network
directories, as well).

Processing every file on the drive will be slow just because it's
Windows.  Initializing the database with updatedb will require a large
amount of time.  There are processes such as AntiVirus intrusion
protection that might make it even slower.

Hmmm, the reason the slowness is particuarly strange to me is that in
place of using `locate` from my cygwin terminal, I have to use a program
called "Everything Search Engine" available at www.voidtools.com. The
first time I install it, it takes maybe a few minutes to index the hard
drive, then every once in a while when I open the program it takes a few
seconds to update the index, but in general the performance for indexing
and searching the index if comparable to `updatedb` and `locate` on a
Linux machine, so it's possible to do on Windows.

Byron


The time taken by updatedb is mainly due to
the execution time of "find" on the disks.

It takes ~ 70 minutes for my 500 GB of data,
and likely the AV is impacting the execution.

I suspect voidtools is using MS disk indexing
to speed things up.
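To check how much of updatedb's run time is the directory walk itself, one can time a bare find over the same tree with the same exclusions. A minimal sketch, assuming bash; `find_walk_count` is a hypothetical helper name (not part of findutils), and the prune list stands in for whatever you would pass to `--prunepaths`. It is a simplification: updatedb's own find invocation adds further tests.

```shell
#!/bin/bash
# Hypothetical helper (not part of updatedb): walk ROOT, skipping each
# PRUNE path, and print how many entries were listed. This is only a
# simplified stand-in for updatedb's find phase. Wrap it in `time`.
# Usage: time find_walk_count ROOT [PRUNE...]
find_walk_count() {
    local root=$1
    shift
    local prune=() p
    # Build the expression: -path P1 -o -path P2 -o ...
    for p in "$@"; do
        if [ "${#prune[@]}" -gt 0 ]; then
            prune+=( -o )
        fi
        prune+=( -path "$p" )
    done
    if [ "${#prune[@]}" -gt 0 ]; then
        # Matching directories are neither printed nor descended into.
        find "$root" \( "${prune[@]}" \) -prune -o -print | wc -l
    else
        find "$root" -print | wc -l
    fi
}
```

For example, `time find_walk_count / /proc /cygdrive` times a walk of the Cygwin root that skips /proc and the /cygdrive mounts. Comparing that with a full `time updatedb ...` run gives a rough idea of how much of the total is the walk, with antivirus scanning affecting both.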

This is technically OT since it involves a non-cygwin tool.

find is slow compared with a non-Cygwin tool, specifically dir (cmd.exe).

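Compare find with cmd.exe's dir.  The first dir run warms the cache
for the two runs after it (compare the 1st and 3rd times: 1.326s vs.
0.874s).  Even with that benefit, find (2.465s) takes about twice as
long as the first, uncached dir.  Comparing cached times (2nd vs. 3rd),
dir is about 3X faster (2.465s vs. 0.874s).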

$ time cmd /c dir /s /b 'C:\usr' > /dev/null ; \
time find /c/usr > /dev/null ; \
time cmd /c dir /s /b 'C:\usr' > /dev/null

real    0m1.326s
user    0m0.000s
sys     0m0.047s

real    0m2.465s
user    0m0.280s
sys     0m2.184s

real    0m0.874s
user    0m0.000s
sys     0m0.031s

(Note: c:\usr has nothing to do with /usr.)

Here's how I use dir *in the abstract* for drives C: and D:.  (Note: the
/a: option of dir lists all files, including hidden ones; /o:n sorts by
name.)

for D in /c /d
do
     "$(cygpath "${COMSPEC}")" /c dir /s /b /a: /o:n "$(cygpath -w "$D")"
done | \
tr -s '\r\n' '\n' | \
cygpath -u -f - | \
sed -e '/^$/d' -e 's,/\+,/,g' | \
sort -u | \
/usr/libexec/frcode > /tmp/updatedb.tmp
chmod --reference /var/locatedb /tmp/updatedb.tmp
mv /tmp/updatedb.tmp /var/locatedb
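The text-cleanup stages of that pipeline can be pulled into a function and checked on sample input without cmd.exe. A minimal sketch; `normalize_listing` is a hypothetical name, it assumes GNU tr, sed and sort (`\+` in the sed expression is a GNU extension), and it leaves out the `cygpath -u -f -` step that sits between tr and sed in the pipeline above.

```shell
#!/bin/bash
# Hypothetical helper mirroring the cleanup stages of the dir pipeline:
#   tr   - turn CR and LF into LF and squeeze runs (drops CRLF blank lines)
#   sed  - delete any remaining empty lines, collapse repeated slashes
#   sort - sort and drop duplicates (frcode expects a sorted list)
normalize_listing() {
    tr -s '\r\n' '\n' |
    sed -e '/^$/d' -e 's,/\+,/,g' |
    sort -u
}
```

For example, `printf '/c/usr//bin\r\n\r\n/c/usr/lib\r\n' | normalize_listing` prints `/c/usr/bin` and `/c/usr/lib`, one per line.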

What I actually do (attached) is more complicated.  My script chooses
which directories are scanned, does them in parallel, and prints pretty
messages.  I get error messages for very long paths (> ~250 bytes).  It
works well enough for me; YMMV.
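The attached script is not reproduced in this archive. Purely as an illustration of the "does them in parallel" idea, here is a hypothetical sketch (`scan_roots_parallel` is an invented name, not Barry's code) that lists several roots as background jobs and merges the results. It swaps in find for dir so it runs anywhere; on Cygwin the dir invocation from the loop above could take the place of the find line.

```shell
#!/bin/bash
# Hypothetical helper: list each ROOT in its own background job,
# wait for all of them, then merge into one sorted, de-duplicated list.
# Usage: scan_roots_parallel ROOT... > listing.txt
scan_roots_parallel() {
    [ "$#" -gt 0 ] || return 1
    local tmpdir root i=0
    tmpdir=$(mktemp -d) || return 1
    for root in "$@"; do
        # One job per root; errors (e.g. unreadable dirs) are discarded.
        find "$root" 2>/dev/null > "$tmpdir/part.$i" &
        i=$((i + 1))
    done
    wait    # waits for all background jobs of this shell
    sort -u "$tmpdir"/part.*
    rm -rf "$tmpdir"
}
```

Note that a bare `wait` also waits for any other background jobs the calling shell started earlier.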

- Barry
   Disclaimer: Statements made herein are not made on behalf of NIAID.



--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


Barry,

Are you using dir in some sort of custom way to build the database used by locate? Or are you saying that rather than ever using the find command to find files, you use a custom script which uses dir?

Byron



