This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: key64_t? ino64_t?


Robert Collins wrote:
> Well, thats why it was pseudo code :}. Yes, it should return -1 when it
> hits the wall.

OK.

>>Implementation details of FTOKNode::operator++.  

> We don't need such an operator - each node is created once. We have a
> static counter which is simply long (NextKey).

OK.

>>But is that the only place
>>that needs guards?  Etc.  
> 
> 
> Correct, it's the only place that needs guarding, as everything else
> accesses it via ftok().

 
> Well, another tweak to the pseudo code is needed - the canonical_path
> (which is a little theorectical, there is a routine that will do the
> job, but I don't recall it's name offhand) should have it's result
> checked for statability(). Then you'll need 1.6Million unique files -
> which should be more of a problem :}.

Especially on my overcrowded partition. 

> Whoa. I never imagined that there was a non-CS version of hash, other
> than 'hash brown'.

Look at it this way: perl coders use hashtables all the time.  Do you
think they all know what it is, under the surface? </me ducks>  There's a
colloquial 'hash' for you... 

(yes, yes, I know that the 'hash' in HASHtable only refers to how the key
is hashed to form an index into a storage structure, and technically we
should be discussing the key->index hash, and not the 'retreived =
storage[key->index]' lookup implied by a 'hashTABLE')
(yes, yes YES I know that it's not really an index into a static array,
but the lookup is performed in a dynamically allocated list-type
structure...MAN I hate guarding these rabbit holes.  It's like my
recurring nightmare of a thesis defense gone bad...]

> The key is in the FTOKNode, assigned in FTOKNode::FTOKNode.

[more on this point, below]

>>Now, about your quote from P1003.1:

> I am quoting a subsection, to move the discussion to the spec, not to
> the linux man page. The return value "(key_t) -1" is reserved for such a
> condition.

Understood.
 
>> cwilson says: blah blah Linux blah blah

> I've no idea. It may be that the needed conditions are so low it doesn't
> matter in a pragmatic sense. OTOH, care to grab the solaris source and
> do some clean room engineering? 

That's actually a very good idea -- but one for which I can't spare the
time right now, unfortunately.

> I'd expect it to accomodate hard links
> better than Linux for this...

Mebbe so.

>>But back to a minor point: how can both of the following (which you have
>>asserted concerning a Tree/Trie implementation) be true?
>>
>>(1) returned key value is solely dependent on the data in a node.  
>>
>>Since the data in the node is the pathname, which can be (at most) a
>>PATHMAX array of chars (where each char -- let's be charitable and leave
>>out diacritics -- can have one of about 100 different values.  That's a
>>domain space of 100^PATHMAX == 10^(2PATHMAX)  Now, the output range space
>>is 2^32 ~ 10^9 which is MUCH smaller than the domain.  So, IFF the output
>>key is dependent SOLELY on the input data, then we WILL hash from 10^512
>>down to 10^9.
> 
> 
> Are you using hash literally there, or as a placeholder for some other
> concept? Because we aren't hashing. 

Sorry.  Placeholder for 'fit big domain into small range'.   We have
10^bignumber possible filenames, that must all be uniquely represented by
2^32 key values.  Ain't gonna happen without collision.  Your tree
implementation insures that it will never happen until the 2^32 + 1'th
key -- but it WILL happen eventually (assuming you don't run out of
memory).  A hash makes no such guarantees -- your first two keys might
collide.  Although, with the hash used in my patch, the odds of collision
in any realistic scenario are very very low -- lower than on Linux --
unless I've screwed up.

>>(2) returned keys will be unique, as long as less than MAXKEYS (2^32)
>>keys are issued.
>>
>>Logically, these can't be true -- because (2) implies that the returned
>>key depends on node position.
> 
> 
> (2) implies that the returned key for previously unseen pathss depends
> on the number of previously seen paths - it does not imply that the
> returned key depends on the position of the node in our internal storage
> structure.

Ah...

When I saw : 'key = NextKey++' in your implementation, I just *knew* that
NextKey couldn't be an integer type.  Because that would be awful.  No,
NextKey was a FTOKNode, with a specially overridden assignment-to-integer
operator and operator++.  Because simply assigning keys based on the
order of insertion couldn't POSSIBLY be what your were proposing, because
it violates the determinism principle (see below).

And now I find that you were, in fact, proposing that.

>>I just don't get it.  :-(

>>That is, in this case, the returned value is not dependent on the node
>>data AT ALL -- you get the last key value regardless of what your
>>pathname or node data is.
> 
> 
> The returned value is 100% dependent on the node data. The node data is
> dependent on creation order, not on storage location.

See, I thought that node data == pathname.  Not 'This pathname was the
712th one inserted'.  Arg.

> Thats the key to this:
> If you return your storage data (be that hash value, array index, node
> pointer), then your system is fragile if that value is either mutable or
> subject to collisions from the domain.
> 
> So what I am suggesting we do is:
> Assign a unique key prefix 24 bits long each time a new node is needed.
> Store that node *any old how* so we can find it again in an efficient
> manner.
> Return the key prefix from the node matching the requested path || the
> requested 8 bit id.

Yes, of course -- that makes perfect sense, if you use insertion order as
your key value (or 24bits of your key value; whatever).

But it means that the returned key number IS dependent on something OTHER
than the fully expanded pathname.  It's also dependent on the execution
order of all other IPC processes on the system.

Surely that's not good.  You once said that 
"hashs collide. key_t's can not collide under any circumstance, and must
be deterministic (i.e. not dependent on currently issued keys)."

The word 'deterministic' can mean many things: (1) predictable by someone
with complete knowledge of past system inputs and initial system state,
or (2) predictable using only the current input.  (e.g. does
'determinism' include memory?)  [There's a name for this distinction from
finite-state-automata theory, but I don't remember it right now]

However, your parenthetical explanation "not dependent on currently
issued keys" nails that down to the second, memory-less determinism.

But simply counting all previously issued keys is deterministic only in
the (1) sense, not in the memory-less (2) sense.

Soooo.... ???

--Chuck
--
  Charles Wilson
  cygwin at removespam cwilson dot fastmail dot fm


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]