This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
> Also, I just realized that all the complexity arguments about resizing > were wrong. One doesn't resize the hash table every power of 2 > insertions. One resizes when the max bucket size reaches some fixed > constant. One can do either. > In the worst case, all insertions will be into the same bucket, > in which case there'll be a resize every max-bucket-size > insertions, making insertions O(n) in the worst case. Of course. In the worst case, your hash function returns the same value for every key. Hashing is all about *expected* behavior. > O(1) insertions are actually the best case behavior, and probably > average case behavior also, given reasonable distribution assumptions > on the hashing function. Yes. All of these points are inherent in the whole concept of hashing. Re-sizing has nothing to do with it, as long as the expected bucket size is O(1), which it *has* to be, for hashing to "work". As an aside, since it seems many people are unaware of this implementation option, let me also point out that you can implement hashtables so each bucket always has a single entry. Thus you don't need to chain entries together. The way you handle collisions is that you probe the hashtable for other entries in some deterministic order, until you either find a match, or an empty slot. The simplest way is to just scan succeeding slots, but that is susceptible to clustering problems. This approach requires more slots in the hashtable (there *must* be more slots than entries, and for performace there should many empty slots). However, it saves the space for linking together the entries that are in the same bucket. I wrote this illustrative algorithm in Java in response to private email; it should be easy to understand the basics. The appliation here is mapping integer character values to interned Char objects. We use the character value i as the hash. static Char[] table = new Char[256]; Char make (int i) { int hash = i % table.length; if (table[i] == null) /* an empty slot */ return table[i] = new Char(i); /* fill it with new entry */ if (table[i].val == i) /* found a match */ return table[i]; /* return it */ /* set skip to a number relatively prime to table.length */ /* If table.length is a power of two, any odd number will do */ /* However, using a constant leads to clustering */ int skip = (i << 1) + 1; for (; ; ) { i = (i + skip) % table.length; if (table[i] == null) /* an empty slot */ return table[i] = new Char(i); /* fill it with new entry */ if (table[i].val == i) /* found a match */ return table[i]; /* return it */ } } Note the example does not show the re-hashing nor does it handle deletions. (If you delete en entry, you have to leave a mark, so the code can distinguish between an empty slot, and a deleted one. Deleted slots can be re-used, but they don't stop the search for a match.) --Per Bothner Cygnus Solutions bothner@cygnus.com http://www.cygnus.com/~bothner