This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: FAQ update suggestion for "I'm having basic problems with find. Why?"


On 2004-07-08, Larry Hall wrote:
> At 10:02 AM 7/8/2004, you wrote:
> >I have been using *ixy-type systems on and off for what must now be
> >16 years, including using "find".
> >
> >I was using "find" today on a UDF/ISO format DVD-R, and was
> >perplexed by it seemingly missing out large chunks of the hierarchy at
> >random.
> >
> >It seems that "find" has an optimisation relating to the hard link
> >count on directories and the presence or otherwise of the "." and
> >".." objects.
> >
> >If the filesystem you are finding on doesn't have the "." and ".."
> >objects then "find" will fail silently(!)
> >
> >To get it to work, you need to turn the optimisation off with the
> >"-noleaf" option.
> >
> >This is documented in the man page, but when you come to the symptoms
> >cold, it looks more like a subsystem issue than an application issue, so
> >it didn't occur to me to look in the documentation for "find".
> >
> >The problem here is that the route to discovery of the solution is
> >somewhat tricky.
> >
> >(In fact you could say that it is a dangerous optimisation in find.
> >If the optimisation is not valid, there are no error messages and it
> >fails silently.  I guess I should be looking to see if this issue has
> >already come up on the upstream version of find.)
> 
> Right.  I'd agree with this notion.
> 
> 
> >My point is this:
> >
> >Whilst this is not an issue with Cygwin per se, the nature of Cygwin
> >means that this issue will tend to arise commonly with Cygwin, and tend
> >not to arise under traditional unixes.
> 
> 
> Why's that?

Traditional unixes have been around for longer.

Cygwin is largely about joining together components that originate in
different paradigms, so you are likely to see more problems with edge
cases.

> >Perhaps it would be a good idea to mention this issue in the Cygwin FAQ?
> >
> >Possibly as a second point under the existing heading of 
> >"I'm having basic problems with find. Why?"
> >
> >We could have an extra paragraph that goes something like this:
> >
> >If find does not seem to be producing enough results, or seems to be
> >missing out some directories, you may be experiencing a problem with one
> >of find's optimisations.  See the documentation for the option '-noleaf'
> >in the man page.
> 
> That seems to be reasonable wording.  But my inclination would be to get
> the results of more research into the 'find' issue before adding this to the 
> Cygwin doc somewhere (not sure if the FAQ is quite the right spot given that
> we haven't seen a lot of questions about it - at least not yet ;-) ).  Would
> you be able to look into this further?

On Windows XP over NTFS, "find" apparently worked fine without the
"-noleaf" option.

On Windows XP over three DVD-R discs, each containing a distinct data
set, all laid out in UDF/ISO format using Ahead Nero 5.5, "find"
required the "-noleaf" option in order to find all the objects as
expected.

The three discs were an archive copy of a hard disk from a notebook PC.
The files were spread over the three DVDs, with each group of files
assigned to a disc according to a rough estimate of its usefulness.

Discs 1 and 2 contained only one directory at the top level of the
hierarchy called "Documents and Settings".  All other objects were below
that directory.

Disc 3 contained multiple directories and files at the top level of the
hierarchy.

The command lines containing "find" were:

  cd /cygdrive/e

  # default (leaf optimisation on)
  find .         -type f -print0 | xargs -0 md5sum | sort -k 2 > somefile

  # leaf optimisation off
  find . -noleaf -type f -print0 | xargs -0 md5sum | sort -k 2 > somefile

(Where /cygdrive/e refers to a DVD reading drive.)

The idea is to get a file containing a list of MD5 sums of all files in
the hierarchy.

The number of search hits returned was as follows:

  Disc 1
    without -noleaf:     0 hits
    with    -noleaf:  7379 hits
  
  Disc 2
    without -noleaf:     0 hits
    with    -noleaf:  4325 hits

  Disc 3
    without -noleaf: 17618 hits
    with    -noleaf: 37973 hits
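A quick way to check whether a particular disc is affected is to run both
forms and compare the counts, as above. The following is a minimal sketch
of that check, assuming a POSIX shell with GNU find, tr and wc (as shipped
with Cygwin); the function name count_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count regular files under DIR with and without
# find's leaf optimisation. Counting NUL terminators from -print0 avoids
# miscounting names that contain newlines.
count_files() {
    dir=$1
    without=$(find "$dir" -type f -print0 | tr -dc '\0' | wc -c | tr -d ' ')
    with=$(find "$dir" -noleaf -type f -print0 | tr -dc '\0' | wc -c | tr -d ' ')
    echo "without -noleaf: $without"
    echo "with    -noleaf: $with"
}
```

For example, after sourcing the file, "count_files /cygdrive/e". A higher
count with -noleaf suggests the filesystem does not follow the Unix
directory link-count convention that the optimisation relies on.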

I also made a list of MD5 sums from the original notebook hard disk
using a similar command line.

I combined the MD5 sum lists from the DVD-R discs, sorted the result and
compared it with the MD5 sum list from the original hard disk. The
results were as follows:

DVD-R discs read without "-noleaf":

  Huge numbers of files completely missing from DVD-R set
  
DVD-R discs read with "-noleaf":

  Minor differences which could all be explained by:
  (a) Filename truncation on DVD-R filesystem
  (b) Permissions issues on the hard disk NTFS filesystem
  (c) Windows XP modifying files in the meantime (e.g. desktop.ini)
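The comparison step can be sketched as follows. This assumes each list was
produced by md5sum relative to equivalent roots, so that the same file has
the same path in every list; the function name and output file names are
hypothetical. comm needs both inputs sorted in the same collation, so the
lists are re-sorted on the whole line under LC_ALL=C rather than by name.

```shell
#!/bin/sh
# Hypothetical helper: compare the hard-disk list (first argument) with the
# combined disc lists (remaining arguments). Writes two files in the
# current directory:
#   only-on-hdd.txt  lines missing from the discs (absent or changed files)
#   only-on-dvd.txt  lines found only on the discs (e.g. truncated names)
compare_md5_lists() {
    hdd=$1
    shift
    # Byte-wise collation so that sort and comm agree on ordering.
    LC_ALL=C sort -u "$@" > dvd.sorted
    LC_ALL=C sort -u "$hdd" > hdd.sorted
    LC_ALL=C comm -23 hdd.sorted dvd.sorted > only-on-hdd.txt
    LC_ALL=C comm -13 hdd.sorted dvd.sorted > only-on-dvd.txt
}
```

For example, "compare_md5_lists hdd.md5 disc1.md5 disc2.md5 disc3.md5". A
file whose contents changed appears once in each output file, under the
same name but with a different sum.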

So, from this my conclusion was that "-noleaf" was necessary when
reading from a DVD-R filesystem made as described above.

The result of zero hits when the top level directory only contains one
directory is consistent with the behaviour described in the "find"
documentation under the "-noleaf" option.
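As the find man page describes under -noleaf, without that option find
assumes a directory contains two fewer subdirectories than its hard link
count (one link for its entry in the parent, one for ".", and one for each
subdirectory's ".."). A minimal sketch for checking whether a directory
follows that convention, assuming GNU stat and GNU find; the name
check_leaf is hypothetical, and it inspects only the one directory given.

```shell
#!/bin/sh
# Hypothetical helper: does DIR satisfy nlink == subdirectories + 2, the
# assumption behind find's leaf optimisation?
check_leaf() {
    dir=$1
    # Hard link count of the directory as reported by the filesystem.
    nlink=$(stat -c '%h' "$dir")
    # Count immediate subdirectories; -noleaf keeps this count independent
    # of the optimisation being checked. (Names containing newlines would
    # be miscounted by wc -l.)
    subdirs=$(find "$dir" -noleaf -mindepth 1 -maxdepth 1 -type d | wc -l)
    expected=$((subdirs + 2))
    if [ "$nlink" -eq "$expected" ]; then
        echo "ok: $dir (nlink=$nlink, subdirs=$subdirs)"
    else
        echo "mismatch: $dir (nlink=$nlink, subdirs=$subdirs); use -noleaf"
    fi
}
```

For example, "check_leaf /cygdrive/e". A mismatch suggests the
filesystem does not follow the convention, in which case -noleaf is the
safe choice.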

I had Googled for a while to try and find the answer, but to no avail.

My search did lead me to the Cygwin mailing list, which led me first to
a suggestion that I read the Cygwin FAQ.

The Cygwin FAQ mentions an issue with "find" but it is not the issue I
had.

That is how I arrived at the suggestion that the issue be mentioned in
the Cygwin FAQ.

I suppose even if it doesn't get into the FAQ, it's possible that this
thread will be archived and be indexed by Google.

Bill

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

