Category:Hash Table

Basic Definition

Basically, hash tables are like ini files, storing information in the following format:

Tablename
item1=data1
item2=data2
...

However, hash tables are a lot faster than ini or text files, which is especially noticeable when you have a lot of entries. One disadvantage is that you cannot simply look at all entries at once: a plain file can be opened in an editor to inspect every entry, whereas a hash table cannot be opened like that. To still get an overview, you can use the following script:

; let's make a little alias called showhash; usage: /showhash <name>
alias showhash {
 ; echo the table name and a kind of "table header"
 echo -a $1
 echo -a item => data
 var %i = 1
 ; loop through all items; $hget($1,0).item returns the total number of items
 while (%i <= $hget($1,0).item) {
  echo -a $hget($1,%i).item => $hget($1,%i).data
  ; increase the loop variable
  inc %i
 }
}
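
For example, assuming you already have a filled hash table called mytable (the name is only an example), you would display its contents with:

/showhash mytable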

Modifying hash tables

You can create a new hash table using the /hmake command and delete one using /hfree. Using /hadd and /hdel, you can add and remove the data stored in your hash table. Because hash tables are not saved automatically, you have to use the /hsave and /hload commands to keep their contents between sessions.
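
The following is a minimal sketch of that lifecycle; the table name (quotes), item name (greeting) and file name (quotes.dat) are only examples:

; create a table with 100 buckets
hmake quotes 100
; add an item; calling /hadd again with the same item name overwrites its data
hadd quotes greeting Hello there
; remove the item again
hdel quotes greeting
; save the table to a file and load it back in later
hsave quotes quotes.dat
hload quotes quotes.dat
; free the table when you no longer need it
hfree quotes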

Retrieving data

After storing data in a hash table, you can of course retrieve it again. For this, use the $hget and $hfind identifiers.
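
A few small examples, reusing the hypothetical table and item names from above; $hget looks up an item directly, while $hfind searches item names (or, with the .data property, the stored data):

; look up the data stored for the item "greeting"
echo -a $hget(quotes,greeting)
; check whether the table exists at all
if ($hget(quotes)) echo -a the table quotes exists
; find the first item whose name matches the wildcard greet*
echo -a $hfind(quotes,greet*,1,w)
; find the first item whose data matches the wildcard *Hello*
echo -a $hfind(quotes,*Hello*,1,w).data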


More detailed information

The following explanations aren't necessary for using hash tables in mIRC scripts; however, these facts could help you make your scripts faster, especially if you are going to use hash tables for a lot of data.


A hashtable is basically an array of N buckets (N being the size from /hmake) where each bucket is just a pointer (4 bytes) to a linked list of items. In principle you could enlarge the number of buckets later on by resizing the array, but that means the hash values for items change: the bucket number for an item, i.e. the index into the array of pointers, is determined by the formula "hashfunc(itemname) modulo N". Changing N on the fly essentially means you have to reassign most of the items to other buckets, which is a nightmare with large hashtables.
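
mIRC does not expose its internal hash function or bucket layout, but the idea of "hashfunc(itemname) modulo N" can be illustrated with the generic $hash identifier (which is not necessarily the same function the hash tables use internally); the alias name and bucket count below are only examples:

; illustration only: map an item name to one of 10 imaginary buckets
alias bucketdemo {
  var %buckets = 10
  echo -a item $1 would land in bucket $calc($hash($1,32) % %buckets)
}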


This means that $hget will indeed be slower if you take a smaller hash table and fill it with the same number of items as before: with M items spread over N buckets, the average number of items to traverse in a bucket when looking for a match is roughly M/(2N), so $hget takes much longer on a big file loaded into a small hash table than on the same file loaded into a big hash table. A benchmark would show this, although you'd have to repeat $hget a large number of times of course, as the speed of an individual $hget is still negligible.
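
A minimal benchmark sketch along those lines; the alias name is made up, $1 is the table name, $2 is an item name that exists in it, and $ticks measures elapsed time in milliseconds:

alias hgetbench {
  var %i = 1, %start = $ticks
  ; repeat the lookup many times so the total time becomes measurable
  while (%i <= 100000) {
    var %dummy = $hget($1,$2)
    inc %i
  }
  echo -a 100000 lookups took $calc($ticks - %start) ms
}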


On the other hand, if your file contains M entries, /hload does M implicit $hget lookups to ensure that item names won't be added twice. If the ratio M:N (i.e. items:size) is 10:1, each lookup traverses about 5 items on average, so /hload does roughly 5 * M item comparisons; if it is 100:1, it does roughly 50 * M, and so on. For large values of M this difference is noticeable, even with one single /hload, so you should choose N carefully before issuing a /hload.
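
For example, if the saved file is known to hold roughly 10,000 entries (the table and file names below are only examples), creating the table with a size in the same order of magnitude keeps the buckets short:

hmake bigtable 10000
hload bigtable bigtable.dat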


Note that there is no such thing as an "overflow" in hashtables; buckets point to linked lists that can be of arbitrary length. If you make a hashtable of size 1, you are essentially using a single linked list of items, and everything will still work fine (but $hget would be just as slow as $hfind in that case).


Deleting unwanted items from a hash table

In some cases, you'll want to loop over all the items in the hashtable, deleting the ones you do not want anymore. It may be a bit unclear at first how to do this: deleting items while looping over them is kind of like removing the ground under your own feet; how can you be sure you don't end up skipping over items, for example?

Fortunately, there's one fundamental fact that you can use: if you loop over the first N items using $hget().item and then delete the (N+1)th item, those first N items will remain the same. The (N+1)th item, though, will be a new item (typically what used to be the (N+2)th item). For example, if $hget().item returns A, B, C and D for index values 1, 2, 3 and 4, and you then delete item C, you can be sure that $hget().item with index values 1 and 2 will still return A and B respectively. Item D, however, will now be item 3 instead of 4, because there are never any "gaps" between item numbers.

The following alias shows how to put this into practice: it loops over all the items in a given hash table and calls a custom identifier $want-to-delete on each of them, deleting the items for which $want-to-delete returns $true. The loop will always complete (no infinite loop!), and after it has completed, all the items that were to be deleted will indeed be gone.

; usage: /purgeitems <hashtable>
alias purgeitems {
  ; our counter: we start off at the first item
  var %i = 1
  ;
  ; as long as the %i-th item exists, continue looping
  while ($hget($1,%i).item != $null) {
    ; temporarily store the name of this item in a variable
    ; $v1 refers to the "$hget($1,%i).item" bit above
    var %t = $v1
    ;
    ; now perform a check to see if we still want to keep this item;
    ; $want-to-delete is a custom alias (not included) and returns
    ; either $true or $false, indicating whether we should delete %t
    if ($want-to-delete(%t)) {
      ; yes, we want to delete this item; use its item name (%t) to
      ; actually delete it using /hdel
      hdel $1 %t
      ;
      ; the important part is that we do NOT increase %i here;
      ; we just deleted the %i-th item, so the %i-th item is now a
      ; new item. due to the way hashtable iteration works, we also
      ; know for sure that we have not seen the %i-th item yet!
    }
    else {
      ; we're leaving this item in, so let's increase %i so that we
      ; move on to the next item.
      inc %i
    }
  }
}

Note that this is a very expensive (slow) operation; you should not use $hget().item unless you have to.
