This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: avl-trees vs. hashes


On 26 Oct 1998, Harvey J. Stein wrote:

> Also, I just realized that shrinking the table can cause substantially
> worse than O(1) behavior.  It can be O(n) if done improperly - if you
> add an element causing the hash table to double in size & then you
> delete said item & the code decides because of the deletion to shrink
> the hash table again.  Then you'll get O(n).  In general, it's
> important to only do these resizing operations every power of 2
> insertions to preserve the O(1) insertion/deletion speed.
> 

in practice, this just never happens, but just in case:

if (flag == -1 &&
    inumber_entries / (double)inumber_nonempty_buckets
    > MAX_MEAN_NONEMPTY_BUCKETS_SIZE)
{
	fprintf(stderr, "PANIC: downsized hash table prematurely.  "
		        "disabling auto-shrink.\n");
	fflush(stderr);
	new_header_elts[AUTO_SHRINK_FLAG_INDEX] = SCM_BOOL_F;
}

> Also, I just realized that all the complexity arguments about resizing
> were wrong.  One doesn't resize the hash table every power of 2
> insertions.  One resizes when the max bucket size reaches some fixed
> constant.

no, you can also resize when the mean nonempty bucket size exceeds a fixed
limit (3 seems to work well).  This works better than the max bucket size
approach since the expected max bucket size is proportional to
log(number of entries).  Of course, you could always modify the max
bucket size to reflect this.

> In the worst case, all insertions will be into the same
> bucket, in which case there'll be a resize every max-bucket-size
> insertions, making insertions O(n) in the worst case.  O(1) insertions
> are actually the best case behavior, and probably average case
> behavior also, given reasonable distribution assumptions on the
> hashing function.
> 

basically incorrect assumptions for Guile's current hasher.

	Jay
	jglascoe@jay.giss.nasa.gov