Difference between revisions of "Category:Hash Table"
(typos and clarification) |
|||
(6 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
== Basic Defintion == | == Basic Defintion == | ||
− | Basically, hash tables are like [[:Category:Ini|ini files]], storing information in following format: | + | Basically, hash tables are like [[:Category:Ini|ini files]], storing information in the following format: |
<b>Tablename</b><br /> | <b>Tablename</b><br /> | ||
Line 7: | Line 7: | ||
... | ... | ||
− | However, hash tables are a lot faster than ini- or text-files, which especially takes effect if you have a lot of entries. A kind of disadvantage is the missing possibility to ''see'' all entries. With some files, you can just open them and take a look | + | However, hash tables are a lot faster than ini- or text-files, which especially takes effect if you have a lot of entries. A kind of disadvantage is the missing possibility to ''see'' all entries. With some files, you can just open them and take a look at every entry, but with hash tables, you can't just ''open'' them. To have the chance to still get an overview, you can use the following script: |
− | ; | + | ; let's make a little alias called ''showhash''. Its usage is ''/showhash <tablename>'' |
[[alias]] showhash { | [[alias]] showhash { | ||
; echo the name and a kind of "table header" | ; echo the name and a kind of "table header" | ||
− | [[echo]] -a $1 | + | [[echo]] -a [[$1-|$1]] |
echo -a item => data | echo -a item => data | ||
[[var]] %i = 1 | [[var]] %i = 1 | ||
; lets loop through all items. $hget($1,0).item will return the total amount of items. | ; lets loop through all items. $hget($1,0).item will return the total amount of items. | ||
− | [[while]] (%i <= $hget($1,0).item) { | + | [[while]] (%i [[If-Then-Else#<=|<=]] [[$hget]]($1,0).item) { |
echo -a $hget($1,%i).item => $hget($1,%i).data | echo -a $hget($1,%i).item => $hget($1,%i).data | ||
; increase looping-variable | ; increase looping-variable | ||
Line 24: | Line 24: | ||
=== Modifying hash tables === | === Modifying hash tables === | ||
− | You can make a new hashtable using the [[Hmake|/hmake]], | + | You can make a new hashtable using the [[Hmake|/hmake]], respectively delete one using the [[Hfree|/hfree]] command. By using [[Hdel|/hdel]] and [[Hadd|/hadd]], you can modify the data saved in your hashtable. Hash tables are lost whenever mIRC is closed, so if you want to keep the same hast table in your next mIRC session, you have to use the [[Hsave|/hsave]] and [[Hload|/hload]] commands upon respectively exiting and starting your mIRC. |
=== Receiving data === | === Receiving data === | ||
− | After saving data in hash tables, you can of course receive them. | + | After saving data in hash tables, you can of course receive them. Therefore, use the [[$hget]] and [[$hfind]] identifiers. |
+ | == More detailed information == | ||
+ | The following explanations aren't necessary for using hashtables in mIRC scripts, however, these facts could help you make your scripts faster, especially if you are going to use hash tables for a lot of data. | ||
+ | A hashtable is basically an array of N buckets (N being the size from /hmake) where each bucket is just a pointer (4 bytes) to a linked list of items. In principle you could enlarge the number of buckets later on by resizing the array, but that means the hash values for items change: the bucket number for an item, i.e. the index into the array of pointers, is determined by the formula "hashfunc(itemname) modulo N". Changing N on the fly essentially means you have to reassign most of the items to other buckets, which is a nightmare with large hashtables. | ||
+ | |||
+ | |||
+ | This means that $hget will indeed be slower if you take a smaller hashtable and fill it with the same amount of items as before, since the average amount of items to traverse in a bucket when looking for a match is N/2, which would mean that $hget takes much longer on a big file loaded in a small hash table, compared to a big file loaded in a big hash table. | ||
+ | A benchmark would show this, although you'd have to repeat $hget a large number of times of course, as the speed of individual $hgets is still negligible. | ||
+ | |||
+ | |||
+ | On the other hand, if your file contains M entries, /hload does M implicit $hgets to ensure that item names won't be added twice. If the factor M:N (i.e. items:size) is 10:1, then it will do (on average) 5 * M lookups, if it is 100:1, then it will do 50 * M lookups and so on. For large values of M, this difference is noticeable, even with one single /hload. So you should choose N very carefully before issuing a /hload. | ||
+ | |||
+ | |||
+ | Note that there is no such thing as an "overflow" in hashtables; buckets point to linked lists that can be of arbitrary length. If you make a hashtable of size 1, you are essentially using a single linked list of items, and everything will still work fine (but $hget would be just as slow as $hfind in that case). | ||
+ | |||
+ | |||
+ | == Deleting unwanted items from a hash table == | ||
+ | |||
+ | In some cases, you'll want to loop over ''all'' the items in the hashtable, deleting the ones you do not want anymore. It may be a bit unclear at first how to do this: deleting items while looping over them is kind of like removing the ground under your own feet; how can you be sure you don't end up skipping over items, for example? | ||
+ | |||
+ | Fortunately, there's one fundamental fact that you can use to do this: if you loop over N items using $hget().item, and then delete the N+1'th item, then those N items will remain the same. The N+1'th item, though, will be a new item (typically the N+2'th item). For example, if $hget().item returns A, B, C and D for respectively N values 1, 2, 3 and 4. and then you delete item C, then you can be sure that $hget().item with N values 1 and 2 will still return A and B after deleting C, respectively. Item D, however, will now be item 3 instead of 4, because there will never be any "gaps" between item numbers. | ||
+ | |||
+ | The following alias shows how to put this into practice, by looping over all the items in a given hashtable, and calling a custom identifier $want-to-delete on each of them, deleting the items for which $want-to-delete returns $true. The loop will always complete (no infinite loop!), and after it has completed, ''all'' the items that were to be deleted, will indeed be gone. | ||
+ | |||
+ | ; usage: /purgeitems <hashtable> | ||
+ | [[alias]] purgeitems { | ||
+ | ; our counter: we start off at the first item | ||
+ | [[var]] %i = 1 | ||
+ | |||
+ | ; as long as the %i-th item exists, continue looping | ||
+ | [[while]] ([[$hget]]([[$1-|$1]],%i).item != [[$null]]) { | ||
+ | ; temporarily store the name of this item in a variable | ||
+ | ; $v1 refers to the "$hget($1,%i).item" bit above | ||
+ | [[var]] %t = [[$v1]] | ||
+ | |||
+ | ; now perform a check to see if we still want to keep this item; | ||
+ | ; $want-to-delete is a custom alias (not included) and returns | ||
+ | ; either $true or $false, indicating whether we should delete %t | ||
+ | [[if]] ($want-to-delete(%t)) { | ||
+ | ; yes, we want to delete this item; use its item name (%t) to | ||
+ | ; actually delete it using /hdel | ||
+ | [[hdel]] $1 %t | ||
+ | |||
+ | ; the important part is that we do NOT increase %i here; | ||
+ | ; we just deleted the %i-th item, so the %i-th item is now a | ||
+ | ; new item. due to the way hashtable iteration works, we also | ||
+ | ; know for sure that we have not seen the %i-th item yet! | ||
+ | } | ||
+ | [[else]] { | ||
+ | ; we're leaving this item in, so let's increase %i so that we | ||
+ | ; move on to the next item. | ||
+ | [[inc]] %i | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | |||
+ | '''''Note:''''' This is a very expensive (slow) operation, you should not use $hget().item unless you have to. If you can, use [[hdel|/hdel]] with the -w (wildcard) switch instead! | ||
+ | |||
+ | For more information about hash tables, see http://en.wikipedia.org/wiki/Hash_table | ||
[[Category:MIRC Help]] | [[Category:MIRC Help]] |
Latest revision as of 12:28, 25 April 2006
Contents
Basic Defintion
Basically, hash tables are like ini files, storing information in the following format:
Tablename
item1=data1
item2=data2
...
However, hash tables are a lot faster than ini- or text-files, which especially takes effect if you have a lot of entries. A kind of disadvantage is the missing possibility to see all entries. With some files, you can just open them and take a look at every entry, but with hash tables, you can't just open them. To have the chance to still get an overview, you can use the following script:
; let's make a little alias called showhash. Its usage is /showhash <tablename> alias showhash { ; echo the name and a kind of "table header" echo -a $1 echo -a item => data var %i = 1 ; lets loop through all items. $hget($1,0).item will return the total amount of items. while (%i <= $hget($1,0).item) { echo -a $hget($1,%i).item => $hget($1,%i).data ; increase looping-variable inc %i } }
Modifying hash tables
You can make a new hashtable using the /hmake, respectively delete one using the /hfree command. By using /hdel and /hadd, you can modify the data saved in your hashtable. Hash tables are lost whenever mIRC is closed, so if you want to keep the same hast table in your next mIRC session, you have to use the /hsave and /hload commands upon respectively exiting and starting your mIRC.
Receiving data
After saving data in hash tables, you can of course receive them. Therefore, use the $hget and $hfind identifiers.
More detailed information
The following explanations aren't necessary for using hashtables in mIRC scripts, however, these facts could help you make your scripts faster, especially if you are going to use hash tables for a lot of data.
A hashtable is basically an array of N buckets (N being the size from /hmake) where each bucket is just a pointer (4 bytes) to a linked list of items. In principle you could enlarge the number of buckets later on by resizing the array, but that means the hash values for items change: the bucket number for an item, i.e. the index into the array of pointers, is determined by the formula "hashfunc(itemname) modulo N". Changing N on the fly essentially means you have to reassign most of the items to other buckets, which is a nightmare with large hashtables.
This means that $hget will indeed be slower if you take a smaller hashtable and fill it with the same amount of items as before, since the average amount of items to traverse in a bucket when looking for a match is N/2, which would mean that $hget takes much longer on a big file loaded in a small hash table, compared to a big file loaded in a big hash table.
A benchmark would show this, although you'd have to repeat $hget a large number of times of course, as the speed of individual $hgets is still negligible.
On the other hand, if your file contains M entries, /hload does M implicit $hgets to ensure that item names won't be added twice. If the factor M:N (i.e. items:size) is 10:1, then it will do (on average) 5 * M lookups, if it is 100:1, then it will do 50 * M lookups and so on. For large values of M, this difference is noticeable, even with one single /hload. So you should choose N very carefully before issuing a /hload.
Note that there is no such thing as an "overflow" in hashtables; buckets point to linked lists that can be of arbitrary length. If you make a hashtable of size 1, you are essentially using a single linked list of items, and everything will still work fine (but $hget would be just as slow as $hfind in that case).
Deleting unwanted items from a hash table
In some cases, you'll want to loop over all the items in the hashtable, deleting the ones you do not want anymore. It may be a bit unclear at first how to do this: deleting items while looping over them is kind of like removing the ground under your own feet; how can you be sure you don't end up skipping over items, for example?
Fortunately, there's one fundamental fact that you can use to do this: if you loop over N items using $hget().item, and then delete the N+1'th item, then those N items will remain the same. The N+1'th item, though, will be a new item (typically the N+2'th item). For example, if $hget().item returns A, B, C and D for respectively N values 1, 2, 3 and 4. and then you delete item C, then you can be sure that $hget().item with N values 1 and 2 will still return A and B after deleting C, respectively. Item D, however, will now be item 3 instead of 4, because there will never be any "gaps" between item numbers.
The following alias shows how to put this into practice, by looping over all the items in a given hashtable, and calling a custom identifier $want-to-delete on each of them, deleting the items for which $want-to-delete returns $true. The loop will always complete (no infinite loop!), and after it has completed, all the items that were to be deleted, will indeed be gone.
; usage: /purgeitems <hashtable> alias purgeitems { ; our counter: we start off at the first item var %i = 1 ; as long as the %i-th item exists, continue looping while ($hget($1,%i).item != $null) { ; temporarily store the name of this item in a variable ; $v1 refers to the "$hget($1,%i).item" bit above var %t = $v1 ; now perform a check to see if we still want to keep this item; ; $want-to-delete is a custom alias (not included) and returns ; either $true or $false, indicating whether we should delete %t if ($want-to-delete(%t)) { ; yes, we want to delete this item; use its item name (%t) to ; actually delete it using /hdel hdel $1 %t ; the important part is that we do NOT increase %i here; ; we just deleted the %i-th item, so the %i-th item is now a ; new item. due to the way hashtable iteration works, we also ; know for sure that we have not seen the %i-th item yet! } else { ; we're leaving this item in, so let's increase %i so that we ; move on to the next item. inc %i } } }
Note: This is a very expensive (slow) operation, you should not use $hget().item unless you have to. If you can, use /hdel with the -w (wildcard) switch instead!
For more information about hash tables, see http://en.wikipedia.org/wiki/Hash_table