This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
On 28 Oct 1998, Harvey J. Stein wrote: > Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes: > > > 1. not storing the key's hash in the entry would be a big win for > > deletion in certain, relatively common situations. For example, if the > > keys and values you use are small (relative to the size of the [big] > > integer hash object), like small strings, integers, booleans, then > > > > (hash (key . value)) > > > > is much larger than > > > > (key . value) > > > > and guile will have a much easier time gc'ing the smaller of the two. > > What do you mean, "for deletion"? Because of the GC, more memory = > more CPU = slower in general (because more memory gets scanned), so > including the hash value makes the program slower in general. > > OTOH, all hash operations requiring looking for items might be faster > because you can test for hash equality before testing key equality & > the hash equality test might be faster. > > > 2. not storing the hash would be a big loss for resize since it's > > certainly faster (err, at least I hope it is) to do this > > > > index = (scm_num2long(SCM_CAR(entry), "foo", "bar")) % vec_len; > > > > rather than > > > > hash = my_fave_hasher(SCM_CAR(entry)); > > index = hash % vec_len; > > > > but then again, the speed of the hasher depends on the size of the key > > you're hashing. So if you've small keys... > > I can't imagine the hasher being faster than just looking up the > number, so it should always be faster to look them up than to > recompute all the hashes. However, it might be a minor point for > small keys because of the other work needed when rehashing. > > But, using the above trick it can make all ops that require locating > an object faster (as mentioned above). I think the best solution is to _not_ store the hash by default. Reason: The user could do so if he wishes. In a generic hashtable implementation, which allows to provide a hash-function and an equality predicate, you could do the following: * Keys are pairs (hash . real-key) * The hash-function which is provided to make-hashtable is simply 'car. * The equality function first compares the cars, then performs the real equality test on the real-keys: (define (my-hash-equal? k1 k2) (and (eq? (car k1) (car k2)) (real-equality-test (cdr k1) (cdr k2)))) * Instead of (hashtable-set! table key value) you do (hashtable-set! table (cons (real-hasher key) key) value) * make-hashtable is called as follows (or similarly): (make-hashtable car my-hash-equal?) By _not_ adding the hash you have the possibility to get the best of all worlds. Jay, you have done a great job providing the auto-resizing hashtables. What do you think of my suggestion? Best regards, Dirk Herrmann