This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Hashtables in guile.


> > they are fixed because they are implemented in user space as arrays.
> > anyone is free to write `rehash'.
> 
> Well, this isn't the best answer possible.  Table expansion should
> happen transparently, but you can't really change the size of an
> existing vector, so you need to construct a new one.  So anyone
> holding a pointer to the original vector gets left behind.

The trouble with a hash table is that it needs to modulo the hash numbers
by the table size, when you change the table size you change the modulo
and suddenly the objects need to go into different slots. Thus you
`rehash' the whole table into a new table. The standard Java library
has an implementation of a hash table that calculates it's own sparsity
and rehashes itself when it feels that it needs to.

Hash tables are fast to pull things out of and fast to put things
into but slow to resize -- them's the breaks.

> (Actually, I think you used to be able to resize vectors in Guile, but
> we either removed that, or have discouraged people from doing it.)

Resizing vectors can be good for other purposes. Any particular reason
why it should be avoided?

> The current implementation has the following nice properties:
> a) it's an ordinary Scheme data structure, just a vector of lists
> b) you can choose your equality predicate
> c) you can choose weak or double-weak references
> 
> It would be nice to have something which added:
> d) tables are automatically resized
> e) equality predicate is stored with the table, so you can just use
>    generic hash-ref/hash-set!/... functions
> 
> One might want to sacrifice a) and define a new opaque type to provide
> better type-checking, and ensure consistency, but you'd need to be
> sure to provide a full suite of accessors, so the opacity doesn't get
> in people's way too much.
> 
> Volunteers are welcome.  Go dust off that copy of Knuth and post your
> best hash table implementation!

Well my B-Tree index is reasonably stable, it is available for people
to play around with (but it is bundled together with some other stuff,
I'm not going to even try to unbundle it until the guile module system
gets its act together, in the meantime you will have to settle for
cut, paste, diff and patch).

In general, B-Trees are not as fast to do an insert or find as what a 
hash table is. However, they do dynamically resize and never need rehashing.
Also, you need more than an equality function, you need a function that
makes your keys ordinal (i.e. (test X Y) gives greater than zero if
X is greater than Y, less than zero if X is less than Y and zero if they
are equal). You can convert B-Trees to lists and you can convert regions
of the key to lists as well. You can use B-Trees as tolerably fast sorting
engines too (guaranteed faster than linear list scanning).

http://www.progsoc.uts.edu.au/~telford/samples/guile/

Contains not the latest but a workable archive. Only guaranteed to work
on i386 linux but I can give you a hand getting it working on other
systems too.

Implementation wise, it checks against the above points like so:

a) NO  - my B-Trees are opaque, but you can store any SCM type.
b) YES - the comparison function is a scheme function.
c) ??  - not quite sure how to apply this to B-Trees,
         would appreciate if someone explained the issue of weak
         and double-weak keys to me.
d) YES - you get this at no cost with B-Trees.
e) YES - storing it with the table is the only safe way to allow
         internal shuffles in the tree. However, when doing a `find'
         the tree is read-only so theoretically the `find' can specify
         a different function to `insert'. I'm still playing with
         this concept. If you need it then ask (or do the mods yourself).

	- Tel