This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: hash?

To: Lars Arvestad <arve at nada dot kth dot se>
Subject: Re: hash?
From: Greg Harvey <Greg dot Harvey at thezone dot net>
Date: 30 Dec 1999 12:32:17 -0330
Cc: Miroslav Silovic <silovic at zesoi dot fer dot hr>, hanwen at cs dot uu dot nl, guile at sourceware dot cygnus dot com
References: <14435.21234.279317.759162@dokkum.cs.uu.nl> <m3ogbggwev.fsf@behemoth.dethfart.org> <7eg0wrt3qe.fsf@zesoi.fer.hr> <yueu2l4zpwp.fsf@turing.nada.kth.se>

Lars Arvestad <arve@nada.kth.se> writes:

> I am surprised only four characters are used. It can't be that
> expensive to use more. In fact, if I just picked a ready-made
> hash-function, I would be disappointed if not the whole string was
> used to compute the key. You want a fast hash function? Then make one
> yourself, because you know the data! 

scm_strhash is used for hashing guile's symbol tables as well, so
there is a motivation for wanting it to be really fast. It likely
shouldn't be the function used for arbitrary string hashes, though.

> 
> >> Looking at the code I'm curious about the selection of initial hash as
> >> 264 % table size; I'd think it'd be better to make the generated hash
> >> totally dependant on the value being hashed, rather than a fixed start
> >> value for each hash for a particular size table (264 might not be the
> >> best value to go there, either, but it's been a while since I've
> >> looked deeply at hashes, so I might just be being prime-centric).
> 
> MS> True - making hash independant on the table size would also allow
> MS> storing the full (nondivided) hashes into the table for dynamic
> MS> resize/rehash.
> 
> 264 seems to be totally arbitrary and only chosen to offset long
> strings from the short ones. I don't see why 256 was not chosen, or
> why a prime value would make a difference. 

I was thinking it'd be less likely to end up with a poor initial value
in the small table case; however, it's hard to say what, exactly, a
poor initial value would be ;). Using an actual character from the
string still might be a more sensible choice than 264.

> Regarding the code, isn't it possible to simply cast a block of four
> characters as a long? I realize that there can be word alignment
> problems, but given a string pointer, that should not be an
> issue. This way, four bytes/characters could be processed in one
> read. 

Yep, but the ease of getting the characters probably isn't as much the
reason that it only uses a small number of characters than the desire
to keep the total number of other operations down.

-- 
Gregh

References:
- hash?
  - From: Han-Wen Nienhuys
- Re: hash?
  - From: Greg Harvey
- Re: hash?
  - From: Miroslav Silovic
- Re: hash?
  - From: Lars Arvestad

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]