Today, we're going to continue our study of the internet. We'll look briefly at how computers talk to each other, and then we'll go into more detail about how the internet actually works. Let's get started with a quick talk about protocols.
In the earlier tutorials, we talked about the client-server model and how communication on the internet happens. However, the model as we introduced it is a little simplistic. For instance, how do the computers know how to speak the same language? The answer is a standardized protocol, defined by a specification that everyone follows. All the programmers, that is; OS makers don't need to implement the protocols themselves. They just give programmers a way in and out of the computer: ports.
A single computer doesn't speak just one language. All computers are quite multilingual, but imagine if you had one mouth and wanted to say different things in several different languages at once. With ports, one ethernet cord can service around 65,535 different conversations at once. Each port is really conceptual; there aren't actually 65,535 separate lines inside one ethernet cable. Rather, ports are what OS makers give programmers as a way to say 'this conversation is going to be about this specific thing'. For instance, communication on port 80 is expected to be about http requests and responses, or simply about surfing the web. In the same way, if you use a local mail client like Outlook, port 25 is set aside for sending mail, and ports 110 and 143 for fetching it. If you make a program that uses such ports for other purposes, be prepared for some complaints1.
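To make the idea concrete, here is a minimal sketch in Python of how a program asks the operating system for a port and then waits for a conversation on it (the port number 8080 is just a stand-in for this example):

import socket

# Ask the OS to set aside a port for this program. 8080 is just an
# example; a real web server would ask for port 80.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 8080))
server.listen(1)                 # tell the OS we're ready to take a conversation

conn, addr = server.accept()     # waits until some client connects
print("talking to", addr)        # addr is the (ip, port) of the other side
conn.close()
server.close()

Everything that arrives on that port is handed to this one program, which is exactly what "this conversation is about this specific thing" means in practice.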
So let's say that we were talking to a webserver on port 80. We would be using the http protocol, which, in a simplified version, consists of:
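GET /index.html HTTP/1.1
Host: www.google.com

HTTP/1.1 200 OK
Content-Type: text/html

...the page itself...

That is, the client sends a request line and a few headers, and the server answers with a status line, its own headers, and then the page.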
HTTP 1.1 is the (currently) latest version of the HTTP protocol, in case you were wondering.
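If you want to watch that conversation happen yourself, here's a rough sketch in Python (www.google.com is just an example host; any web server will do) that opens port 80 and speaks raw HTTP by hand:

import socket

host = "www.google.com"          # example host; any web server works
sock = socket.create_connection((host, 80))

# Send the request exactly as the protocol spells it out.
request = "GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n"
sock.sendall(request.encode("ascii"))

# Read back the status line, headers and page until the server hangs up.
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk
sock.close()

print(response[:300].decode("ascii", errors="replace"))  # peek at the start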
Another protocol you might use is FTP, which is used for transferring files over the internet. If http can be used to transfer everything, you might wonder why ftp is needed. Http is a one-shot deal: can I have a page? yeah, here it is! Ftp, on the other hand, is connect, then do things with files until the client wants to end the connection. Ftp is usually used to upload webpages, because trying to upload files through http is tedious and slow. This protocol usually involves two ports: port 21, and a random port larger than 1024. Port 21 is the port through which you talk to the ftp server, while the actual file transfer happens simultaneously on the other port.
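Python's standard ftplib module does that two-port dance for you. A rough sketch looks like this (the server name, login, and filename below are placeholders, not a real account):

from ftplib import FTP

# Placeholders -- substitute a real server and account.
ftp = FTP("ftp.example.com")            # control connection on port 21
ftp.login("username", "password")

print(ftp.nlst())                       # list files; the data itself travels
                                        # over the second, high-numbered port

with open("index.html", "rb") as f:
    ftp.storbinary("STOR index.html", f)   # upload a page the same way

ftp.quit()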
Once again, we have given you a half-baked model. How exactly does a computer know where another computer sits in relation to it? The quick answer is that your request is sent out to your isp's servers, which ask a DNS server to look up the address, and the request is then passed from router to router until it reaches the right place. If you didn't catch that, that's okay. We'll take it bit by bit now.
We'll start with your computer, where you make an http request to your LAN router. You want to know where "google.com" is2. Your LAN router doesn't have a clue where google.com is either (it's not in the LAN), so it asks your internet provider's (isp's) servers. They might know, but chances are they don't know exactly either. So they forward your request to a DNS server, or domain name server.
Before we move on, let's take a quick look at the url, or the human-readable address system of the internet. A traditional url is formatted like this: http://www.google.com. Now, let's take it apart:
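http:// is the protocol, the language the conversation will be held in; www names the particular machine (or group of machines) you want; google is the domain, the name someone registered; and .com is the top-level domain, the broad category the name lives under.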
If your isp's servers forwarded the request to the right domain name server, the DNS server would look through the list of names it serves and match google.com up with the ip address 64.233.167.99.
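You can do the same lookup yourself; Python's socket module, for instance, asks your configured DNS servers on your behalf (the number you get back today may well differ from the one above):

import socket

# Ask the DNS system what number "google.com" maps to right now.
print(socket.gethostbyname("google.com"))   # e.g. 64.233.167.99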
An ip address is the computer-friendly way of stating the same thing as the url. It consists of 4 numbers separated by dots, which, as a general rule, go from general to specific, left to right.
Once an ip address has been put to the request, the DNS servers' job is done; they don't know how to get there. The request is handed off toward one of the core routers of the internet. This router knows what it's doing, so it sends the request in the right direction, where a smaller router also knows where to send it and finally delivers it to google's server. Inside the server, the google machine searches, comes up with a bunch of results, and sends them back along the router path. Finally, you get your search results for bananas back, and you are happy. Or are you?
If you ever check your ip, you might notice that it changes every time you reconnect to your service. If you don't ask for a static ip, your isp will just give you a new ip address each time you reconnect, because manually allocating ip addresses is tedious and error-prone: they would have to keep track of your ip address, and you would have to make sure your computer was configured to match it. Automatically assigning ip addresses is just plain easier. However, if you were going to set up a server and serve your own website, you would definitely want a static ip address so people could find you.
Currently, humans are using up more than 3/4 of the ip addresses that can ever exist. Within a few years, the last ip addresses will be snatched up, which leaves us with no more internet real estate. It's kind of like the exhaustion of our natural resources, if you will. A new system, ipv6, will solve the crunch in a big way: it would provide around 5x10^28 ip addresses for each person on earth, certainly more than enough. Each mobile device and each computer behind a router could have its own ip address, unlike today, where a router presents only one ip address for an entire LAN.
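The arithmetic behind that figure: ipv6 addresses are 128 bits long, so there are 2^128 of them, and dividing by a world population of roughly 6.5 billion (a round number, just for the estimate) gives about 5x10^28 each:

total_addresses = 2 ** 128        # ipv6 addresses are 128 bits long
people = 6_500_000_000            # rough world population, just for the estimate
print(total_addresses)            # about 3.4 x 10^38 addresses in total
print(total_addresses / people)   # about 5.2 x 10^28 per person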
These core routers aren't much like your LAN router; they're behemoths of routers, because they keep the internet running. Your local router mostly splits up bandwidth among, and transfers information between, the handful of computers on your own network, while the core routers shuttle traffic between entire networks.
If you use hotmail or gmail, you might have noticed issues with certificates pop up once in a while. You might have also clicked on through the warnings because, gosh darn it, you just want to read your mail! Certificates are little files that guarantee that the site you think you're looking at is the site you're actually looking at. This way, you'll know if someone is spoofing a connection, that is, telling your browser that one site is a different site, usually one that requires you to give a password. For a certificate to mean anything, it must be signed by a certificate authority. You might have seen the "signed by verisign" animations around the internet. If it's not signed correctly, your browser will complain, even though most of the time it's a harmless mistake. If nothing looks terribly wrong, it should be safe to click through the warnings.
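If you're curious what your browser is actually checking, here's a small sketch in Python (www.google.com is just an example host; any https site works) that makes a secure connection and prints out who the certificate belongs to and who signed it:

import socket
import ssl

host = "www.google.com"              # example host; any https site works
ctx = ssl.create_default_context()   # trusts the usual certificate authorities

with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()
        print("issued to:", cert["subject"])   # who the certificate claims to be
        print("signed by:", cert["issuer"])    # the authority that vouches for it
# If the certificate didn't check out, wrap_socket would raise an error instead.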
If you have the guts to use some command line tools, let's take a look at ping and tracert (short for trace route). Go to Start > Run, type "cmd", and you'll get a black box. Now, type in:
ping 64.233.167.99
And you should get something like:
Pinging 64.233.167.99 with 32 bytes of data:

Reply from 64.233.167.99: bytes=32 time=99ms TTL=245
...

Ping statistics for 64.233.167.99:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
...
If you got something different, oh well. What ping did was say "hey, are you there?" and you should have gotten a response, because those were google's servers. You can do the same thing, but with a domain name, like:
ping google.com
Tracert does something similar to ping, but it essentially pings every node that it passes through. Let's try tracert on google.com:
tracert google.com
And after a few seconds, we should have a list of hops. The first few entries should be servers that belong to your isp. After that, I got a couple of random-looking ip addresses, which are probably routers, and finally ended up at en-in-f99.google.com. Whatever that means.
Bittorrent? You know? The file sharing system? Okay, so bittorrent is a different system for downloading things. Instead of one central server with a bunch of clients all downloading the same file from it, the clients download from each other.
Specifically, a client is told two things: what to download, and the tracker that keeps a list of all the downloaders downloading that certain file. The bittorrent client (A) links up with the tracker and gets a list of who's downloading the file. Then the client goes around to the other clients and figures out who has what, because not everyone has everything. Then the client finally goes to another client (B) that has a connection to spare and starts downloading the file bit by bit. If that other client (B) goes offline, client (A) merely has to connect to another client (C) to finish downloading the file. While all of this is happening, new clients (X, Y) are downloading from the clients (A, C) that have more of the file than they do.
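A toy model of that bookkeeping, nothing like the real wire protocol but enough to show the idea, might look like this: the tracker hands out a peer list, each peer advertises which pieces it has, and a downloader picks another peer that has a piece it's still missing.

# Toy model only -- the real bittorrent wire protocol is far more involved.
tracker = {"some_file": ["A", "B", "C"]}     # tracker: file -> list of peers

pieces_held = {                              # which pieces each peer has so far
    "A": {0, 1},
    "B": {0, 1, 2, 3},
    "C": {2, 3},
}

def pick_peer(me, wanted_piece):
    """Find any other peer in the swarm that already has the missing piece."""
    for peer in tracker["some_file"]:
        if peer != me and wanted_piece in pieces_held[peer]:
            return peer
    return None

print(pick_peer("A", 2))   # -> 'B'; if B dropped off the tracker's list, C would be chosen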
Just looking at the model, one can see a potential issue: if everyone logs off after they're done downloading, how can anyone download a full copy of the file? Good-hearted people, though, leave their bittorrent clients up after they're done downloading to upload to, or seed, new clients until they have a share ratio of 100% (meaning they uploaded as much as they downloaded)3.
If you've heard of bittorrent, you might also have heard of gnutella. This is another file sharing system, but this one works not by grouping clients around the files they download, but by 'permanently' linking clients together. For instance, if you do a search using gnutella, your client has the ip addresses of a couple of other computers that use the gnutella network: we'll say 5 in this case. Now, you want to search for a file, so your gnutella client asks the computers it's linked to if they have any files fitting that query. If they do, they send back a list of the files they have. Otherwise, they ask the computers they're connected to the same query, and so on for a predetermined number of hops. Eventually, a large list is aggregated, and you, the user, can peruse it at your leisure.
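Here's a toy sketch of that hop-limited search (the peers, files, and hop limit are all made up for illustration): each peer reports its own matches and, if hops remain, passes the query on to its neighbours.

# Toy gnutella-style search -- the peers, files and hop limit are made up.
neighbours = {                   # who each peer is 'permanently' linked to
    "me": ["p1", "p2"],
    "p1": ["p3"],
    "p2": ["p4"],
    "p3": [],
    "p4": [],
}
shared = {                       # what each peer is sharing
    "p1": ["holiday.jpg"],
    "p3": ["song.mp3"],
    "p4": ["notes.txt", "song.mp3"],
}

def search(peer, query, hops_left):
    """Ask a peer for matches and, if hops remain, ask its neighbours too."""
    results = [f for f in shared.get(peer, []) if query in f]
    if hops_left > 0:
        for n in neighbours.get(peer, []):
            results += search(n, query, hops_left - 1)
    return results

print(search("me", "song", hops_left=2))   # ['song.mp3', 'song.mp3']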
If you've been following the computing world, you might have heard about the $100 laptop project. It's a project to provide laptops to kids in the third world, hence the small price tag, while not skimping on quality. One feature of these laptops is the ability to form mesh networks: without internet access (probable in the third world), kids should still be able to network, reasoned the folks at MIT who thought up the project. Thus, whenever a laptop is started up, it looks around for other laptops on a wireless network and joins them. If there aren't any laptops around, the lonely laptop sits on a network all by itself. Then another laptop may power up within range of the first. The two will find each other wirelessly and make a new network. If the second laptop is in range of two laptops that aren't in range of each other, though, then it can act as a router for the two others, making a little network of three computers.
And that's why it's called a mesh network: each node (laptop) can connect to every other node in its range, and they all act as clients and servers for each other. They're all 'equal'.
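As a sketch of why the relaying matters, here is a toy mesh of three laptops (the names and ranges are invented) where laptops a and c can't hear each other directly but can still exchange messages through b:

from collections import deque

# Toy mesh: which laptops are in radio range of which (made-up names).
in_range = {
    "a": ["b"],           # a can only hear b
    "b": ["a", "c"],      # b can hear both ends, so it can relay
    "c": ["b"],
}

def route(start, goal):
    """Breadth-first search for a chain of laptops linking start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in in_range[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(route("a", "c"))   # ['a', 'b', 'c'] -- the middle laptop acts as the router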
Now that you have a little more solid grasp of networks, let's move on to how to secure yourself in such a networked world.