Web Tricks And How To Spot Them
There are some cool things possible with web servers. Quite a few of these come from the roots of the protocols that the web is built on.
The complete form of a Uniform Resource Locator (URL) is:
protocol://username:password@host.domain.tld:port/directory/filename
Most of these are optional. If you leave out the protocol in a web browser it will likely default to HTTP (web). If you leave out the directory, then it uses the default directory. If you leave out the filename, then it uses the default. The username/password options aren't often used because they are in plain sight of those looking over your shoulder. If the web server isn't checking the user name, it just discards it.
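To make the optional parts concrete, here is a minimal sketch that splits a URL into its pieces using Python's standard urllib.parse module. The example URL and the describe_url helper are made up for illustration and are not part of the original article.

```python
from urllib.parse import urlsplit

# Hypothetical URL that uses every optional part described above.
EXAMPLE_URL = "http://username:password@www.example.com:81/directory/filename.htm"


def describe_url(url):
    """Return the named parts of a URL; missing parts come back as None or ''."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


if __name__ == "__main__":
    for name, value in describe_url(EXAMPLE_URL).items():
        print(f"{name:9} {value}")
```

Calling describe_url on a shorter URL such as http://www.example.com shows which parts were left out: the username, password and port come back as None and the path is empty.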
host.domain.tld - Connecting with the server.
Most people know that a domain name (www.ifxgroup.net) turns into an IP address like 188.8.131.52 before the web browser is able to connect to the web server. But few people know that an IPv4 address is really a 32-bit number that works the same in decimal form as it does in dotted IP address form.
Give it a try:
- http://www.ifxgroup.net/ - This is the domain name form. It must be translated into an IP address by a DNS resolver before it can be used.
- http://24.129.137.242/ - This is the dotted IP address of the web server above. It can be used directly by any web browser.
- http://411142642/ - This is the decimal version of the 32 bit number represented in dotted form above. It can also be used directly by any web browser but this is often a hard number for humans to memorize so it is not commonly used.
All three of the above links go to the same page on the same web server. Try clicking each of them.
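The conversion between the dotted form and the single decimal number can be checked with a short sketch. This one relies on Python's standard ipaddress module; the two helper names are chosen here for illustration. Each of the four dotted values is one byte of the 32-bit number, so the decimal form is simply a*256^3 + b*256^2 + c*256 + d.

```python
import ipaddress


def decimal_to_dotted(number):
    """Turn a 32-bit integer into dotted IPv4 notation."""
    return str(ipaddress.IPv4Address(number))


def dotted_to_decimal(dotted):
    """Turn dotted IPv4 notation into the equivalent 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


if __name__ == "__main__":
    # The loopback address serves as a neutral, well-known example.
    print(dotted_to_decimal("127.0.0.1"))   # 2130706433
    print(decimal_to_dotted(2130706433))    # 127.0.0.1
    # The decimal number from the list above can be converted the same way.
    print(decimal_to_dotted(411142642))
```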
username:password@ - Passing login information.
But it gets more interesting. Some web servers require a login with a username and password. It is not widely known that this information can be included on the URL line in a form that looks like an email address (e.g. username:password@www.example.com).
If your web server does not require a login, this extra information should be ignored by the server. This means you can have a URL with a user name that looks like the domain name of another web server. An example of this is below. Please note that all four URL addresses point to the same page on the same web server.
You can see that when the real domain name or dotted IP address is hidden, it is easy to fool people into thinking they are going someplace else. This has already been done for both humor and abuse. Spammers often hide their web address information from the novice user with these tricks. Once you know the tricks, you can better protect yourself from them.
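One way to see through this trick is to let a URL parser separate the user information from the real server. The sketch below uses Python's standard urllib.parse module; the deceptive URL is a made-up example that combines host names mentioned in this article, and real_destination is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Made-up example: the part before the @ looks like a server name and port,
# but it is really just a user name and password.
DECEPTIVE_URL = "http://www.example.com:80@www.ifxgroup.net/"


def real_destination(url):
    """Return (user_info, server) so the two cannot be confused."""
    parts = urlsplit(url)
    user_info = parts.username
    if parts.password is not None:
        user_info = f"{user_info}:{parts.password}"
    return user_info, parts.hostname


if __name__ == "__main__":
    user_info, server = real_destination(DECEPTIVE_URL)
    print("user information:", user_info)   # www.example.com:80
    print("server contacted:", server)      # www.ifxgroup.net
```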
Did you notice all of the percent (%) signs in the fourth example above? This is another interesting artifact of the world wide web. There are a lot of different operating systems each with very different abilities and conventions for file reference and naming. To help bridge the gap between operating systems that may not be able to handle characters supported on another, a method of quoting the special characters was created using the percent sign. This simply takes the character value (ASCII on most Intel compatible hardware platforms) and turns it into a two digit hexadecimal number and adds the percent sign. This means %20 is the same as a space. With this knowledge, can you now make some sense of the fourth line? (Hint: the quoted characters translate to the same as the previous line.)
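The percent quoting described above is easy to try out. This minimal sketch uses quote and unquote from Python's standard urllib.parse module; percent_encode_all is a hypothetical helper written here to quote every character, including ones that do not need it, which is how such disguised addresses are typically built.

```python
from urllib.parse import quote, unquote


def percent_encode_all(text):
    """Quote every ASCII character as % followed by two hex digits."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


if __name__ == "__main__":
    print(quote(" "))                 # %20
    print(repr(unquote("%20")))       # ' '
    print(percent_encode_all("www"))  # %77%77%77
    print(unquote("%77%77%77"))       # www
```

Running unquote on a quoted address reveals the plain text a browser will actually act on.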
There is a known issue with some older Microsoft Internet Explorer versions that causes it to fail to correctly display all of the characters on the address bar when low ASCII characters are used in quoted format. This means if a URL has %01 right before the at (@) sign the address bar does not display the rest of the real address to the user. You can check your browser by using this simple test. The security bug does not appear to be a problem for any of the non-Microsoft web browsers. In fact, modern browsers like Firefox notify the user about this kind of deception attempt.
One of the easiest ways to protect yourself is to know how to read a URL. The first step is to be able to locate the identifying characters: slash (/), at (@) and colon (:). These are the same characters the web browser uses to find the server.
Some simple rules to remember:
- All URL addresses must start with a protocol identifier. This is normally the technical acronym of the protocol followed by a colon and two slash marks. (e.g. http://) Most browsers will default to http:// if the server portion of the URL starts with
- If there is an AT (@) symbol, ignore the user information until after the target server has been contacted. (e.g. email@example.com)
- If there is a slash (/) then ignore the remainder of the line until after the target server has been contacted. (e.g. www.example.com/default.htm) If no slash is given, one is implied at the end of the URL and will sometimes even show up on the URL line when you connect to some servers.
- If there is a colon (:) as part of the server address portion, use that port instead of the default one for the specified protocol. (e.g. http://www.example.com:81) The default TCP port number for HTTP is 80.
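The rules above can be sketched as a small hand-written routine. This is a simplified illustration, not how any particular browser works: locate_server and the DEFAULT_PORTS table are names invented here, and the sketch ignores cases such as IPv6 addresses in square brackets.

```python
# Assumed default ports for a few common protocols (illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def locate_server(url):
    """Apply the reading rules and return (protocol, host, port)."""
    # Protocol rule: take the protocol before "://", or assume http.
    if "://" in url:
        protocol, rest = url.split("://", 1)
    else:
        protocol, rest = "http", url
    protocol = protocol.lower()

    # Slash rule: everything from the first slash onward is ignored for now.
    authority = rest.split("/", 1)[0]

    # At-sign rule: everything up to the last @ is user information.
    server = authority.rsplit("@", 1)[-1]

    # Colon rule: a colon in the server portion names a non-default port.
    host, _, port_text = server.partition(":")
    port = int(port_text) if port_text else DEFAULT_PORTS.get(protocol)
    return protocol, host, port


if __name__ == "__main__":
    print(locate_server("http://www.example.com:81"))
    print(locate_server("www.example.com/default.htm"))
    print(locate_server("http://www.example.com@www.ifxgroup.net/"))
```

The slash rule is applied before the at-sign rule so that an @ appearing later in the directory or filename is not mistaken for user information.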
The above rules have implied defaults. That means they do not always have to be typed. If you are still unsure how all of this works and how to decode complex URL addresses, check out the tools section of this site for the official IFX Group URL translator.
It is becoming more popular to paste URL addresses into email messages so others can quickly reach the same location. The problem is that some URL addresses are so long that email client programs are forced to split them across multiple lines (called word-wrapping) in order to fit the text into the message display window. This wrapping often prevents the receiving email client program from telling which lines are part of the URL and which lines are just part of the email message text.
The easy way to fix this is to enclose the URL in angle brackets, < at the start and > at the end, or in parentheses, ( at the start and ) at the end. This allows most modern email client programs to correctly locate all of the parts of the URL, even when it is split across multiple lines.
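To show why the brackets help, here is a minimal sketch of how a program could recover a wrapped URL once it is enclosed in angle brackets. It is an illustration only and not the method used by any particular email client; extract_bracketed_urls is a hypothetical helper, and it handles only the angle bracket form.

```python
import re
from typing import List


def extract_bracketed_urls(message: str) -> List[str]:
    """Find <...> spans, remove the line breaks inside, keep those that look like URLs."""
    urls = []
    for match in re.finditer(r"<([^<>]+)>", message):
        # Joining on whitespace undoes any word-wrapping inside the brackets.
        candidate = "".join(match.group(1).split())
        # Keep only text that starts with a protocol identifier such as http://
        if re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", candidate):
            urls.append(candidate)
    return urls


if __name__ == "__main__":
    wrapped = (
        "See <http://www.example.com/a/very/long/\n"
        "path/file.htm> for details."
    )
    print(extract_bracketed_urls(wrapped))
```

Because the closing bracket marks exactly where the URL ends, the program does not have to guess whether the next line belongs to the address or to the surrounding text.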