Basic Definition
Basically, hash tables are like ini files, storing information in the following format:
Tablename
item1=data1
item2=data2
...
However, hash tables are a lot faster than ini or text files, which is especially noticeable when you have a lot of entries. One disadvantage is that you cannot easily see all entries: with a file you can simply open it and look at every entry, but a hash table cannot be opened like that. To still get an overview, you can use the following script:
; let's make a little alias called showhash. In the end, it will be used as /showhash <name>
alias showhash {
  ; echo the table name and a kind of "table header"
  echo -a $1
  echo -a item => data
  var %i = 1
  ; loop through all items. $hget($1,0).item returns the total number of items.
  while (%i <= $hget($1,0).item) {
    echo -a $hget($1,%i).item => $hget($1,%i).data
    ; increase the loop variable
    inc %i
  }
}
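Assuming you have a table called mytable (just an example name), you would then display its contents with:
/showhash mytable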
Modifying hash tables
You can create a new hash table using the /hmake command and delete one using the /hfree command. Using /hadd and /hdel, you can add and remove the data stored in your hash table. Because hash tables are not saved automatically, you have to use the /hsave and /hload commands to keep their contents across sessions.
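As a small sketch of how these commands fit together (the table name mytable, its size, and the file name table.dat are only examples):
; create a table with 100 buckets and add two items
/hmake mytable 100
/hadd mytable item1 data1
/hadd mytable item2 data2
; save the table to a file and remove it from memory
/hsave mytable table.dat
/hfree mytable
; later: recreate the table and load the saved items back in
/hmake mytable 100
/hload mytable table.dat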
Retrieving data
After storing data in a hash table, you can of course retrieve it again. To do so, use the $hget and $hfind identifiers.
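For example, with the mytable table from above (the item name and wildcard pattern are just placeholders):
; $hget(table,item) returns the data stored under an item name
echo -a item1 contains: $hget(mytable,item1)
; $hfind(table,text,N,w) returns the Nth item name matching a wildcard pattern
echo -a first item matching item*: $hfind(mytable,item*,1,w)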
More detailed information
The following explanations aren't necessary for using hash tables in mIRC scripts; however, these facts could help you make your scripts faster, especially if you are going to use hash tables for a lot of data.
A hash table is basically an array of N buckets (N being the size from /hmake), where each bucket is just a pointer (4 bytes) to a linked list of items. In principle you could enlarge the number of buckets later on by resizing the array, but that would change which bucket each item maps to: the bucket number for an item, i.e. the index into the array of pointers, is determined by the formula "hashfunc(itemname) modulo N". Changing N on the fly essentially means you have to reassign most of the items to other buckets, which is a nightmare with large hash tables.
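To make the formula concrete, here is a toy sketch in mIRC script. The character-code sum is not mIRC's actual hash function; it only illustrates how "hashfunc(itemname) modulo N" picks a bucket:
; toy illustration only: sum the character codes of the item name,
; then take the result modulo N to get the bucket index
alias bucketdemo {
  var %name = $1, %N = $2, %sum = 0, %i = 1
  while (%i <= $len(%name)) {
    inc %sum $asc($mid(%name,%i,1))
    inc %i
  }
  var %bucket = $calc(%sum % %N)
  echo -a item %name would land in bucket %bucket of %N buckets
}
Calling /bucketdemo item1 10 would then show which of 10 buckets this toy function assigns to item1.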
This means that $hget will indeed be slower if you take a smaller hash table and fill it with the same number of items as before: with M items spread over N buckets, a lookup traverses roughly M/(2*N) items in a bucket before finding a match, so $hget takes much longer on a big file loaded into a small hash table than on the same file loaded into a big hash table.
A benchmark would show this, although you would of course have to repeat $hget a large number of times, as the time taken by a single $hget is still negligible.
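A rough benchmark sketch (the table mytable and the item item1 are assumed to exist already; the names are only examples):
; time 100000 lookups of the same item and report the elapsed milliseconds
alias hashbench {
  var %i = 1, %t = $ticks
  while (%i <= 100000) {
    var %x = $hget(mytable,item1)
    inc %i
  }
  echo -a 100000 lookups took $calc($ticks - %t) ms
}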
On the other hand, if your file contains M entries, /hload does M implicit $hgets to ensure that item names won't be added twice. If the ratio M:N (i.e. items:size) is 10:1, those lookups traverse about 5 items each, for roughly 5 * M item comparisons in total; if it is 100:1, that becomes roughly 50 * M, and so on. For large values of M, this difference is noticeable, even with one single /hload. So you should choose N carefully before issuing a /hload.
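One possible approach (the file name data.txt, the table name mydata, and the sizing rule are only a rough sketch) is to pick N from the file's size before loading:
; create the table with a size based on the number of lines in the file,
; so the items:size ratio stays small, then load the file into it
alias loaddata {
  var %lines = $lines(data.txt)
  ; never go below 10 buckets; otherwise aim for roughly 10 lines per bucket
  hmake mydata $iif(%lines > 100, $int($calc(%lines / 10)), 10)
  hload mydata data.txt
}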
Note that there is no such thing as an "overflow" in hash tables; buckets point to linked lists that can be of arbitrary length. If you make a hash table of size 1, you are essentially using a single linked list of items, and everything will still work fine (but $hget would be just as slow as $hfind in that case).