This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: avl-trees vs. hashes


> Also, I just realized that all the complexity arguments about resizing
> were wrong.  One doesn't resize the hash table every power of 2
> insertions.  One resizes when the max bucket size reaches some fixed
> constant.

One can do either.

> In the worst case, all insertions will be into the same bucket,
> in which case there'll be a resize every max-bucket-size 
> insertions, making insertions O(n) in the worst case.

Of course.  In the worst case, your hash function returns the
same value for every key.  Hashing is all about *expected*
behavior.

> O(1) insertions are actually the best case behavior, and probably
> average case behavior also, given reasonable distribution assumptions
> on the hashing function.

Yes.  All of these points are inherent in the whole concept of
hashing.  Re-sizing has nothing to do with it, as long as the
expected bucket size is O(1), which it *has* to be, for hashing
to "work".

As an aside, since it seems many people are unaware of this
implementation option, let me also point out that you can
implement hashtables so each bucket always has a single
entry.  Thus you don't need to chain entries together.
The way you handle collisions is that you probe the
hashtable for other entries in some deterministic order,
until you either find a match, or an empty slot.
The simplest way is to just scan succeeding slots, but
that is susceptible to clustering problems.  This approach
requires more slots in the hashtable (there *must* be more
slots than entries, and for performace there should many
empty slots).  However, it saves the space for linking together
the entries that are in the same bucket.

I wrote this illustrative algorithm in Java in response to
private email;  it should be easy to understand the basics.
The appliation here is mapping integer character values to
interned Char objects.  We use the character value i as the hash.

  static Char[] table = new Char[256];

  Char make (int i)
  {
    int hash = i % table.length;
    if (table[i] == null) /* an empty slot */
      return table[i] = new Char(i); /* fill it with new entry */
    if (table[i].val == i)  /* found a match */
      return table[i]; /* return it */
    /* set skip to a number relatively prime to table.length */
    /* If table.length is a power of two, any odd number will do */
    /* However, using a constant leads to clustering */
    int skip = (i << 1) + 1;
    for (; ; )
      {
        i = (i + skip) % table.length;
        if (table[i] == null) /* an empty slot */
	  return table[i] = new Char(i); /* fill it with new entry */
        if (table[i].val == i) /* found a match */
	  return table[i]; /* return it */
      }
  }

Note the example does not show the re-hashing nor does it handle
deletions.  (If you delete en entry, you have to leave a mark,
so the code can distinguish between an empty slot, and a deleted
one.  Deleted slots can be re-used, but they don't stop the search
for a match.)

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner