Big sockets tutorial

From Scriptwiki

This is an advanced socket tutorial covering TCP sockets, using an HTTP connection as the example. It explains everything that is going on and assumes you have no clue what a socket really is. In addition, this tutorial uses binary variables to make sure everything works correctly. You may use normal variables if you know what you are doing and understand what is going on.

In this tutorial we will create a script that connects to youtube.com and finds the title of any YouTube link pasted in a channel.

What Are Sockets, How Do Networks Work?

First, let's de-romanticize sockets. The idea is rather simple, but most scripters go into them thinking they are magical and difficult. Sockets (read: TCP sockets) are the method programs use to communicate over a network, such as the internet.

A TCP socket connects to a computer using two things: an IP address, which specifies where the computer is on the network, and a port, a number between 1 and 65535 that identifies which program on that computer you wish to communicate with.

Once a connection is established, you can read (receive data) and write (send data) through it, until it is disconnected.
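To make this concrete, here is a minimal sketch in mIRC script, the language used throughout this tutorial. The alias and socket name sockdemo are hypothetical and not part of the final script: it connects to whatever host and port you give it, reports the IP address and port it actually reached, and disconnects.

```mirc
alias sockdemo {
  ; Hypothetical demo alias: /sockdemo <host or IP> <port>
  if ($0 != 2) { echo -a * /sockdemo: usage /sockdemo <host> <port> | halt }
  ; Only one demo connection at a time: close an old one if it is still open
  if ($sock(sockdemo)) { sockclose sockdemo }
  sockopen sockdemo $1 $2
}

on *:SOCKOPEN:sockdemo: {
  ; $sockerr is set if the connection could not be made
  if ($sockerr) { echo -a * sockdemo: could not connect ( $+ $sock($sockname).wsmsg $+ ) | halt }
  echo -a * sockdemo: connected to $sock($sockname).ip on port $sock($sockname).port
  ; Here we could sockwrite and sockread; for this demo we simply disconnect
  sockclose $sockname
}
```

For example, /sockdemo www.youtube.com 80 should print the IP address that host name led to, and port 80.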

Script Setup

First, let us set up our script. We will catch YouTube links in any channel or query and call an alias, "youtubelookup", with the location of the page. Take this link as an example: http://www.youtube.com/watch?v=dF184_T_eWw&feature=sub

The HOST (which counts as the IP address, in a manner of speaking) is www.youtube.com
The LOCATION is /watch?v=dF184_T_eWw&feature=sub
The PROTOCOL is http
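If you want to pull these three parts out of a URL in your own scripts, a rough sketch could look like the identifier below. $urlpart is a hypothetical name, and it assumes the URL starts with protocol:// (the on TEXT event in this tutorial uses a simpler trick instead).

```mirc
alias urlpart {
  ; Hypothetical helper: $urlpart(url,protocol), $urlpart(url,host) or $urlpart(url,location)
  ; Assumes the URL starts with "protocol://"
  ; Everything before the first : is the protocol (58 is the character code of :)
  var %protocol = $gettok($1,1,58)
  ; Skip past "protocol://", i.e. the protocol plus 3 characters
  var %rest = $mid($1,$calc($len(%protocol) + 4))
  ; The host runs up to the first / (47 is the character code of /)
  var %host = $gettok(%rest,1,47)
  ; The location is the first / and everything after it; use / if there is no path at all
  var %slash = $pos(%rest,/,1)
  var %location = $iif(%slash,$mid(%rest,%slash),/)
  if ($2 == protocol) { return %protocol }
  if ($2 == host) { return %host }
  if ($2 == location) { return %location }
}
```

For example, //echo -a $urlpart(http://www.youtube.com/watch?v=dF184_T_eWw&feature=sub,location) should echo /watch?v=dF184_T_eWw&feature=sub.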

on *:TEXT:*youtube.com/*:*: {
 ; Get the first youtube.com link from what the user said. $mid skips the first 9 characters, so we don't have to worry about the // in http:// confusing our $pos below
 var %link = $mid($wildtok($1-,*youtube.com/*,1,32),10)
 ; Get the location by returning everything after and including the first /
 var %location = $mid(%link,$pos(%link,/))
 ; Call alias youtubelookup
 youtubelookup %location
}

Now that the dirty business is done, let's get to the sockets.

Making the Connection

Now we must establish a connection to youtube's servers. I said before that you need an IP address and a port, but there's a bit of a catch: rather than making us remember IP addresses such as 74.125.95.93, the internet has a more practical system called host names. www.youtube.com is a host name. mIRC makes things easy for us: we can give it either a host name and a port, or an IP address and a port.
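You can watch mIRC turn a host name into an IP address yourself with its /dns command, which reports back through the on DNS event. A minimal sketch; the alias name resolve is hypothetical:

```mirc
alias resolve {
  ; Hypothetical demo alias: /resolve <host name>
  dns $1
}

on *:DNS: {
  ; $naddress is the host name that was looked up, $iaddress the IP address it resolved to
  if ($iaddress) { echo -a * $naddress resolves to $iaddress }
  else { echo -a * DNS lookup failed }
}
```

Try /resolve www.youtube.com. Large sites often have several addresses, so the IP you get may well differ from the one above.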

So clearly our host name is www.youtube.com, but what is our port? Some ports are reserved for particular protocols; here is a list of them: Reserved Port Numbers. What you need to know now is that the reserved port for HTTP, the protocol the web is built on, is 80. There are others, but 80 is the most common, and it is the default port browsers use for http:// addresses.
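Note that http is not the only protocol on the web: https addresses use port 443 by default and an encrypted (SSL/TLS) connection, which mIRC's sockopen opens when you add the -e switch. The helper below is a hypothetical sketch that picks the port from the protocol; it assumes your copy of mIRC has SSL support available. The rest of this tutorial sticks to plain http on port 80.

```mirc
alias webopen {
  ; Hypothetical helper: /webopen <sockname> <protocol> <host>
  ; http uses port 80; https uses port 443 and needs SSL, which the -e switch turns on
  if ($0 != 3) { echo -a * /webopen: usage /webopen <sockname> <http or https> <host> | halt }
  if ($2 == https) { sockopen -e $1 $3 443 }
  else { sockopen $1 $3 80 }
}
```

Your own SOCKOPEN event would then send the request exactly as it would for a plain connection.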

We will use the sockopen command to connect. In the interest of time I will not explain every command or event; instead I will provide links, so please use them to learn more.

In mIRC, sockets are referred to by their names, so we will need to give this connection a unique name. We could use a name such as 'youtubecheck', but then we could only do one lookup at a time. Instead, let us make a unique name by appending $ticks (the number of milliseconds since your system started): youtubelookup $+ $ticks
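Two lookups started within the same tick would get the same name; the alias that follows checks for this and gives up with an error. An alternative, sketched here under the assumption that a counter suffix is acceptable, is to keep counting until a free name turns up. $newsockname is a hypothetical helper, not a built-in identifier.

```mirc
alias newsockname {
  ; Hypothetical helper: $newsockname(prefix) returns prefix1, prefix2, ... whichever
  ; is not already used by an open socket or by a leftover file with the same name
  var %i = 1
  while ($sock($1 $+ %i) || $isfile($1 $+ %i)) { inc %i }
  return $1 $+ %i
}
```

In youtubelookup you could then write var %name = $newsockname(youtubelookup) instead of using $ticks. It also skips names with a leftover file, since this tutorial stores each reply in a file named after its socket.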

alias youtubelookup {
  ; We make sure the location was given to us; otherwise we echo an error and stop
  if ($0 != 1) { echo -a * /youtubelookup: invalid parameters | halt }
  ; We generate our unique name and make absolutely certain it isn't taken.
  ; Since our on text event works for both channels and queries, we use an $iif to pick where to reply
  var %name = youtubelookup $+ $ticks
  if ($sock(%name)) { msg $iif(#,#,$nick) Youtube lookup error, name was in use ( $+ %name $+ ) | halt }
  ; Delete any leftover file with this name, since the reply will be appended to it
  if ($isfile(%name)) { .remove %name }
  ; Open a TCP connection to youtube's web server on port 80, the HTTP port
  sockopen %name www.youtube.com 80
  ; We will need a way to tell which channel or nick to respond to once we get our reply from youtube's servers. We use sockmark for this.
  ; Sockmark is a simple way to store text data related to a socket. Please click the link for more information
  sockmark %name $iif(#,#,$nick) $1
}

Sending Data

Now we have opened a connection to youtube's servers. A protocol is the set of rules by which two things communicate. In this case, it tells us what we (the client) and youtube (the server) should say, and when. The HTTP protocol says that once connected, the client should send a request line followed by headers, a list of information about what we want the server to send us.
Headers look like this:
GET /mypage.html HTTP/1.1
Host: mysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)
Connection: close


You can look them up at the link, but the basic gist is that we want to get the location /mypage.html from mysite.com, and once we have it, we want the server to close the connection. The User-Agent line claims we are browsing with Firefox on Windows. Live HTTP Headers is a Firefox plugin that lets you see the headers Firefox sends. If you have trouble viewing a site with your sockets, send every header that Firefox sends, just in case.

Let us now send our request to youtube once we connect:

on *:SOCKOPEN:youtubelookup*: {
 ; If the connection failed, $sockerr is set. We tell the user and stop.
 if ($sockerr) { msg $gettok($sock($sockname).mark,1,32) Could not connect to youtube: $sock($sockname).wsmsg | halt }
 ; We set a variable to the sockwrite command so we don't have to type it every time. -n ends each line with the CRLF that HTTP expects.
 var %n = sockwrite -n $sockname
 ; We GET the location we parsed out in our on text event. If you recall, we stored it as the second word of our sockmark in the alias above.
 %n GET $gettok($sock($sockname).mark,2,32) HTTP/1.1
 ; Some IPs host many web sites. We need to tell the web server that www.youtube.com is the host we want.
 %n Host: www.youtube.com
 ; Some web sites only allow certain web browsers, so we lie a bit here and tell youtube we are Firefox.
 %n User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)
 ; We ask youtube to close the connection as soon as it has sent everything we asked for
 %n Connection: close
 ; An empty line is the HTTP protocol's way of saying we are done sending headers, so we send a bare CRLF
 sockwrite $sockname $crlf
}
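The -n switch is what turns each of those sockwrite commands into a proper HTTP line: it appends a carriage return and line feed ($crlf). As a sketch of what goes over the wire, here is a hypothetical /rawget alias that writes the same kind of request with explicit $crlf and then echoes each line of the reply using a normal variable instead of a binary one. It fetches / from any host you name on port 80; all the names are made up for this example.

```mirc
alias rawget {
  ; Hypothetical demo alias: /rawget <host>
  var %name = rawget $+ $ticks
  sockopen %name $1 80
  sockmark %name $1
}

on *:SOCKOPEN:rawget*: {
  if ($sockerr) { echo -a * rawget: could not connect to $sock($sockname).mark | halt }
  ; Each line ends with a carriage return and line feed ($crlf); this is what sockwrite -n adds for us
  sockwrite $sockname GET / HTTP/1.1 $+ $crlf
  sockwrite $sockname Host: $sock($sockname).mark $+ $crlf
  sockwrite $sockname Connection: close $+ $crlf
  ; A line containing nothing but $crlf ends the headers
  sockwrite $sockname $crlf
}

on *:SOCKREAD:rawget*: {
  if ($sockerr) { halt }
  ; Text mode: sockread fills %line with one complete line, without its $crlf
  var %line
  sockread %line
  while ($sockbr) {
    ; echo cannot print an empty line, so skip the blank line after the headers
    if (%line != $null) { echo -a %line }
    sockread %line
  }
}
```

Try /rawget www.example.com. Lines are only echoed once they are complete, so anything left over without a final $crlf when the server closes the connection is not shown in this sketch.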

Using the Reply

Youtube's server will reply to what we asked for. There are many possible replies: the page may not be found, and the server will tell us as much (404 Not Found); the page may have moved (a 301 or 302 redirect); or the page may be just fine (200 OK). Whatever the case, the server explains it in the headers it sends first. If the page exists, those headers are followed by the page's HTML (plus JavaScript, CSS, etc., the stuff the web site is written in).
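Since the status comes first in the reply, you can check it before trying to parse the page. A minimal sketch, assuming the reply has already been saved to a file the way this tutorial does it; $httpstatus is a hypothetical helper name:

```mirc
alias httpstatus {
  ; Hypothetical helper: $httpstatus(file) returns the status code (such as 200, 301 or 404)
  ; from the first line of an HTTP reply saved in that file
  ; Nothing to check if the file is missing or empty
  if (!$file($1).size) { return }
  ; Read at most the first 64 bytes; the status line is at the very start
  var %size = $file($1).size
  if (%size > 64) { %size = 64 }
  bread $qt($1) 0 %size &head
  ; The first line looks like "HTTP/1.1 200 OK"; the code is its second space-separated word
  return $gettok($bvar(&head,1,%size).text,2,32)
}
```

For example, in the SOCKCLOSE event you could test if ($httpstatus($sockname) != 200) and tell the user the page was not found or has moved. Many sites now answer plain-http requests with a 301 redirect to https, which this check would catch.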

Now, we do not receive the whole web page at once! Our internet speed isn't unlimited: data arrives in pieces, at most as fast as our connection allows, so we must deal with it as it comes in. In mIRC the easiest way to do this is to read each piece into a binary variable and append it to a file. Once we are done receiving data (once the server closes the connection), we can work with all of it at once. Let us do this now.

The SOCKREAD event is triggered every time more data is received.

on *:SOCKREAD:youtubelookup*: {
 ; If there was an error, $sockerr will contain the error number. We tell the user about the error, close the socket, delete any partial file, then stop.
 if ($sockerr) {
  msg $gettok($sock($sockname).mark,1,32) There was an error while verifying youtube link: $sock($sockname).wsmsg
  sockclose $sockname
  if ($isfile($sockname)) { .remove $sockname }
  halt
 }
 ; Read the data waiting for us into the binary variable &data
 sockread &data
 ; $sockbr (sock bytes read) contains the number of bytes we received this time. We want to keep reading until we can't read anymore!
 ; Note: once $sockbr is 0, this doesn't mean the server has nothing more to send! It just means we haven't gotten any more yet; this event triggers again when we do.
 while ($sockbr > 0) {
  ; Write the data to the end of the file with the same name as the socket. A position of -1 means the end of the file, and a length of -1 means all of &data.
  bwrite $sockname -1 -1 &data
  sockread &data
 }
}

Using the Data

Congratulations! You are done with sockets. We have asked youtube for information and received it. Now we need to do something with it. This isn't related to sockets, but since I took you this far, I'll take you to the end. Remember: we have been writing all the data we received to a file with the same name as the socket. Now we will read that data back. First, let's add this bit of practice code that just opens the file in Notepad so we can see it in all its glory :)

Recall: we told the server to close the connection when it is done (Connection: close). When it does, mIRC triggers the SOCKCLOSE event.

on *:SOCKCLOSE:youtubelookup*: { run notepad $qt($mircdir $+ $sockname) }

I used the same link as above to test it, saying "http://www.youtube.com/watch?v=dF184_T_eWw&feature=sub" in a channel I am in. If you did too, the file starts with HTTP/1.1 200 OK, telling us the page is fine and the server sent it to us, followed by a few headers setting some cookies and other things.

Then you will see an empty line followed by what may look like gibberish to you. This is the web page: HTML, CSS and JavaScript. I'm certainly not going to teach you HTML in this tutorial (if you're interested, see w3schools.com); what's important is that everything on the web page is represented here in one form or another. What I want my script to do is tell the person who pasted the link the title of the page. Press Ctrl+F in Notepad and search the document for <title>. You should see this:

<title>YouTube - Congressmen Submit Emergency 3 AM Bill Demanding IHOP Stay Open All Night</title>

It will not be as pretty, though. Youtube and most big sites have messy code; they do not care as long as it works. There will be line breaks and spaces everywhere, and we are going to fix that. This part is somewhat advanced and is not related to sockets. You may continue if you wish; otherwise, I encourage you to use what you learned here and experiment: try web pages that don't exist or that redirect and see what the server says. Once you have what you need, read it from the file you downloaded and send it to a channel or user, and do not forget to delete the file when you are done reading from it! If you do continue, replace the notepad SOCKCLOSE event above with the one below, because mIRC only triggers the first matching event in a script file.

Parsing The Data

on *:SOCKCLOSE:youtubelookup*: {
 ; The place to send the data
 var %target = $gettok($sock($sockname).mark,1,32)
 ; If we received nothing, there is no file to read
 if (!$file($sockname).size) { msg %target Youtube did not send anything back. | halt }
 ; Read the whole file into the binary variable &data. File positions start at 0.
 bread $sockname 0 $file($sockname).size &data
 ; Get the position of <title> in the document
 var %start = $bfind(&data,1,<title>).text
 ; If we could not find the tag, the page likely is not a valid video page; clean up and stop
 if (!%start) { msg %target Invalid youtube link. | .remove $sockname | halt }
 ; Get the position of </title>, searching from where <title> was found
 var %end = $bfind(&data,%start,</title>).text
 if (!%end) { msg %target Invalid youtube link. | .remove $sockname | halt }
 ; Remove new lines: turn line feeds (10) and carriage returns (13) into spaces. The size of &data does not change, so %start and %end stay valid.
 breplace &data 10 32
 breplace &data 13 32
 ; Get the text between %start and %end, skipping the 7 characters of <title>
 var %title = $bvar(&data,$calc(%start + 7),$calc(%end - %start - 7)).text
 ; Tell target
 msg %target Youtube Link: %title
 ; Delete the file
 .remove $sockname
}
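Titles are HTML text, so characters such as & and quotes may arrive as entities like &amp; and &quot;. The title in this example has none, but a hedged sketch of a decoder for a few common ones might look like this; $htmldecode is a hypothetical helper and only handles the five entities listed in it:

```mirc
alias htmldecode {
  ; Hypothetical helper: $htmldecode(text) turns a few common HTML entities back into characters
  ; &amp; is replaced last, so text like &amp;lt; becomes &lt; instead of <
  return $replace($1,&quot;,",&#39;,',&lt;,<,&gt;,>,&amp;,&)
}
```

You could call it just before messaging the target, for example var %title = $htmldecode(%title).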