This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: hash table subtleties


On Wed, 28 Oct 1998, Tel wrote:

> > I'd like to also repeat what I mentioned elsewhere - namely if you
> > shrink the tables on deletions you can easily get the bad behavior of
> > resizing every few operations (on a mix of insertions & deletions)
> > causing the hash tables to give O(n) behavior instead of O(1).
> > They'll be slower than an alist, let alone a balanced tree.
> 
> Put some hysteresis into the grow and shrink. For example, when
> utilisation is greater than 3, grow, when it is less than 1, shrink.
> This means that resize cannot possibly occur every few operations.

my tables grow (double the number of buckets) when the mean nonempty
bucket size exceeds 3, and shrink (halve the number of buckets) when the
mean nonempty bucket size is less than 3 * 119/256 = 1.395 (3 * 1/2 is too
big, 3 * 1/4 is too small, ...  3 * 951/2048 is very close)

The magic ratio is chosen so that the expected mean nonempty bucket size
after upsize is the same as it is after downsize.  (and then, e.g. if a
user inserts just enough entries for the table to grow to 4096 buckets, he
must delete half of them before it will shrink back down to 2048). 

I look at the number of nonempty buckets rather than the total number of
buckets since empty buckets have no effect on lookup, re-insertion, and
deletion (assuming you're using keys that already exist in the table).

I do the following test

	  (set! mytab (make-proc))
	  (for-each (insert-entry mytab insert-proc) lines) 
 	  (for-each (delete-entry mytab delete-proc) lines) 
	  (for-each (insert-entry mytab insert-proc) lines) 
 	  (for-each (delete-entry mytab delete-proc) lines)
	  (gc)


insert 10,000 unique random numbers (both tables claw their way up from 4
buckets to 4,096 buckets), then delete them all (the auto-shrink enabled
hash table plunges back down to 4 buckets), then re-insert them all, and
again delete them all.  Run this test 10 times and get

auto-shrink enabled
*time: 945* 
(gc-time-taken . 499)
(cells-allocated . 36006)
(cell-heap-size . 98304)
(bytes-malloced . 93495)
(gc-malloc-threshold . 150000)
(cell-heap-segments (537251368 . 536989224) (537962488 . 537438200))

auto-shrink disabled
*time: 893* 
(gc-time-taken . 495)
(cells-allocated . 36006)
(cell-heap-size . 98304)
(bytes-malloced . 109865)
(gc-malloc-threshold . 150000)
(cell-heap-segments (537251368 . 536989224) (537909144 . 537384856))


The auto-shrink performance is so good that I'm seriously tempted to turn
it on by default.

> This same problem occurs in automatic gearboxes when you get onto a
> hill of just the right slope.
> 
cool analogy

> ...
<lots of cool stuff snipped, I'm still trying to digest it>