Category:Hash Table


Basic Definition

Basically, hash tables are like ini files, storing information in the following format:

Tablename
item1=data1
item2=data2
...

However, hash tables are much faster than ini or text files, which is especially noticeable when you have a lot of entries. One disadvantage is that you cannot directly see all entries: an ini or text file can simply be opened to look at every entry, but a hash table cannot. To still get an overview, you can use the following script:

; let's make a little alias called showhash; usage: /showhash <name>
alias showhash {
 ; echo the table name and a kind of "table header"
 echo -a $1
 echo -a item => data
 var %i = 1
 ; loop through all items; $hget($1,0).item returns the total amount of items
 while (%i <= $hget($1,0).item) {
  echo -a $hget($1,%i).item => $hget($1,%i).data
  ; increase the loop variable
  inc %i
 }
}

Modifying hash tables

You can create a new hash table with the /hmake command and delete one with /hfree. Using /hadd and /hdel, you can add and remove the data stored in your hash table. Hash tables are not saved automatically, so to keep their contents across sessions you have to use the /hsave and /hload commands.
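
A minimal sketch of these commands (the table name demo, size 10, item names, and file name demo.hsh are just placeholders; note that /hload requires the table to already exist):

; a small demo alias: create a table, modify it, save it, free it, and reload it
alias hashdemo {
 hmake demo 10
 hadd demo item1 data1
 hadd demo item2 data2
 ; remove one item again
 hdel demo item1
 ; save the remaining items to a file, then free the table
 hsave demo demo.hsh
 hfree demo
 ; /hload needs an existing table, so re-create it before loading
 hmake demo 10
 hload demo demo.hsh
}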

Retrieving data

After storing data in a hash table, you can of course retrieve it again. For that, use the $hget and $hfind identifiers.
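
For example, assuming the demo table from the sketch above is still loaded (the search strings are just placeholders):

alias finddemo {
 ; $hget(table,item) returns the data stored for an item
 echo -a $hget(demo,item2)
 ; $hfind searches item names; the w switch makes the text a wildcard
 echo -a $hfind(demo,item*,1,w)
 ; with the .data property, $hfind matches against the data instead,
 ; but still returns the name of the matching item
 echo -a $hfind(demo,data*,1,w).data
}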


More detailed information

The following explanations aren't necessary for using hash tables in mIRC scripts; however, these facts could help you make your scripts faster, especially if you are going to use hash tables for a lot of data.


A hash table is basically an array of N buckets (N being the size given to /hmake), where each bucket is just a pointer (4 bytes) to a linked list of items. In principle the number of buckets could be enlarged later by resizing the array, but that would change the hash values for the items: the bucket number for an item, i.e. the index into the array of pointers, is determined by the formula "hashfunc(itemname) modulo N". Changing N on the fly essentially means reassigning most of the items to other buckets, which is a nightmare with large hash tables.
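
mIRC doesn't expose its internal hash function, but purely as an illustration of the idea, here is a hypothetical sketch using the built-in $hash identifier as a stand-in (the real internal function differs):

; NOT mIRC's actual internal function, just an illustration of
; "hashfunc(itemname) modulo N": a 32-bit hash of the item name,
; taken modulo the table size N, picks the bucket index
alias bucketof {
 ; $1 = item name, $2 = table size N
 return $calc($hash($1,32) % $2)
}

You could then try it with //echo -a $bucketof(item1,100).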


This means that $hget will indeed be slower if you take a smaller hash table and fill it with the same amount of items as before: with M items spread over N buckets, each bucket holds about M/N items, and a successful lookup traverses about half of them on average. $hget therefore takes much longer on a big file loaded into a small hash table than on the same file loaded into a big one. A benchmark would show this, although you'd have to repeat $hget a large number of times, as the speed of an individual $hget is still negligible.
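
A rough benchmark sketch along those lines (the table name, item name, and repeat count are assumptions):

; time a large number of lookups of a single item using $ticks
alias hashbench {
 var %i = 1, %t = $ticks
 while (%i <= 100000) {
  noop $hget(demo,item2)
  inc %i
 }
 echo -a 100000 lookups took $calc($ticks - %t) ms
}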


On the other hand, if your file contains M entries, /hload does M implicit $hget lookups to ensure that item names won't be added twice. If the ratio M:N (i.e. items:size) is 10:1, those lookups traverse about 5 * M items in total; if it is 100:1, about 50 * M items, and so on. For large values of M this difference is noticeable even with one single /hload, so you should choose N very carefully before issuing a /hload.
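
For example, a sketch for loading a big file (the entry count, file name, and table size are assumptions):

; for a file with roughly 100000 entries, a table size of 10000 keeps
; the items:size ratio near 10:1, so the buckets stay short
alias loadbig {
 hmake bigtable 10000
 hload bigtable bigfile.hsh
}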


Note that there is no such thing as an "overflow" in hash tables; buckets point to linked lists that can be of arbitrary length. If you make a hash table of size 1, you are essentially using a single linked list of items, and everything will still work fine (but $hget would then be just as slow as $hfind).
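
For instance, this degenerate table still works (a hypothetical sketch; names and values are placeholders):

; a table of size 1 has a single bucket, i.e. one linked list;
; every command still behaves normally, lookups are just linear
alias tinydemo {
 hmake tiny 1
 hadd tiny a 1
 hadd tiny b 2
 echo -a $hget(tiny,b)
 hfree tiny
}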
