This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Maciej Stachowiak <mstachow@mit.edu> writes:

 > If you grow by a factor of K>1 only whenever the hashtable has more
 > than E/F (where E is the maximum number of elements and F >= 1)
 > elements, and shrink by a factor of M>1 whenever the hashtable has
 > fewer than E/G elements where G>F, I believe you will get amortized
 > constant time (assuming your hash function distributes well enough
 > that collisions are not a significant factor). F and G should
 > obviously be picked such that the difference between thresholds is
 > considerably less than one element for common sizes, or you will get
 > crappy performance anyway for the sizes you care about, regardless of
 > the asymptotic complexity.

I take it you meant to grow the table by a factor of K > 1 when

    N_elements/N_buckets = avg_bucket_size > MAX_AVG_BUCKET_SIZE

and to shrink it by K when

    N_elements/N_buckets = avg_bucket_size < MIN_AVG_BUCKET_SIZE

and that we want (obviously):

    MIN_AVG_BUCKET_SIZE < MAX_AVG_BUCKET_SIZE

It looks like one should have:

    MIN_AVG_BUCKET_SIZE * K^2 = MAX_AVG_BUCKET_SIZE

Otherwise, after shrinking by K the average bucket size might be close
enough to MAX_AVG_BUCKET_SIZE to cause the table to grow again after
insertion of a small number of items. (After a shrink the average
bucket size is MIN_AVG_BUCKET_SIZE * K; after a grow it is
MAX_AVG_BUCKET_SIZE / K. Requiring these to be equal gives the relation
above, so both kinds of resize land the table in the same state.)

Example 1 - MIN_AVG_BUCKET_SIZE = 2, MAX_AVG_BUCKET_SIZE = 4, K = 2.
Suppose a deletion has caused the hash table to have 256 items & 128
buckets, yielding an average bucket size of 2. The hash table will
resize to 64 buckets, and the average bucket size will be 4. The next
insertion will make the average bucket size greater than
MAX_AVG_BUCKET_SIZE, causing the hash table to double in size again,
growing to 128 buckets. This can repeat forever, thus causing resizing
for every pair of deletions/insertions at any given hash table size.
This is bad.

Example 2 - MIN_AVG_BUCKET_SIZE = 1, MAX_AVG_BUCKET_SIZE = 4, K = 2.
Now when MIN_AVG_BUCKET_SIZE is hit, halving the size of the hash
table will yield an average bucket size of 2. The hash table is in the
same state as after the previous doubling of table size. The hash
table won't resize from insertions until (2 x number_of_buckets)
insertions. This is good.

And, yes - when sizing based on max bucket size it's probably much
easier to get into a bad state as in Example 1.

--
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il
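The oscillation in Example 1 and the stability in Example 2 can be checked with a small simulation. This is a minimal Python sketch of the resizing policy under discussion, not Guile's actual hash table code; the names `Table` and `churn` and the workload are illustrative. It only tracks counts (items, buckets, resizes), since bucket contents don't matter for the resizing behavior.

```python
# Sketch of the grow/shrink policy: grow the bucket array by K when the
# average bucket size exceeds MAX_AVG_BUCKET_SIZE, shrink by K when it
# drops below MIN_AVG_BUCKET_SIZE. Illustrative only, not Guile's code.

class Table:
    def __init__(self, min_avg, max_avg, k, buckets, items):
        self.min_avg, self.max_avg, self.k = min_avg, max_avg, k
        self.buckets, self.items = buckets, items
        self.resizes = 0  # number of grow/shrink events so far

    def _maybe_resize(self):
        avg = self.items / self.buckets
        if avg > self.max_avg:                          # too crowded: grow
            self.buckets *= self.k
            self.resizes += 1
        elif avg < self.min_avg and self.buckets > 1:   # too sparse: shrink
            self.buckets //= self.k
            self.resizes += 1

    def insert(self):
        self.items += 1
        self._maybe_resize()

    def delete(self):
        self.items -= 1
        self._maybe_resize()

def churn(table, cycles):
    # Repeat a small delete/insert pattern that returns the item count
    # to its starting value, and report how many resizes it triggered.
    for _ in range(cycles):
        table.delete()
        table.delete()
        table.insert()
        table.insert()
    return table.resizes

# Example 1's parameters (MIN=2, MAX=4, K=2): every cycle crosses both
# thresholds, so the table shrinks and regrows over and over.
bad = Table(min_avg=2, max_avg=4, k=2, buckets=128, items=257)
print(churn(bad, 100))   # 2 resizes per cycle -> 200

# Example 2's parameters (MIN=1, MAX=4, K=2) satisfy MIN * K^2 = MAX:
# the same workload never crosses either threshold.
good = Table(min_avg=1, max_avg=4, k=2, buckets=128, items=257)
print(churn(good, 100))  # -> 0
```

With the bad parameters, each four-operation cycle performs two O(number_of_buckets) resizes, destroying the amortized constant bound; with MIN * K^2 = MAX, the table sits safely between the thresholds and never resizes on this workload.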