The Internet Domain Name System: from .com to the new TLDs.

This article will explore the basic concepts of the Internet domain addressing system and its role among Internet services. We will briefly outline the history of symbolic names and their use when accessing Web services, the principles of constructing a domain name, and the hierarchy of domain names.

Companies that do business with each other often exchange business cards, and these cards usually list both an email address and a website address. Forms and contacts likewise ask for an “Internet address” and an email address. In all of these cases, we are dealing with domain names.

In an email address, the domain name is the part written after the “at” sign (@). For example, in user@example.com, the domain name of the mail node is example.com.

A website's name is its domain name. For example, Microsoft's website has the domain name microsoft.com.

Most of the time, when we look for information on the Internet, we either type domain names or follow links that contain them.

The term “domain address” is quite often used interchangeably with “Internet address”. Strictly speaking, neither concept exists in TCP/IP networks. What exists is numeric addressing, based on IP addresses (in IPv4, a group of four numbers from 0 to 255 separated by dots), and the Internet Domain Name System (DNS).

Numeric addressing is convenient for computers processing routing tables, but (with some exaggeration) unusable for people: sets of numbers are much harder to memorize than meaningful mnemonic names.
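To make the “convenient for computers” point concrete, here is a minimal Python sketch, assuming IPv4 only. It turns a dotted-quad address into the single 32-bit integer that routing code works with, and then does a routing-table-style prefix comparison with a bit mask. The function names and the address 192.0.2.10 (from the RFC 5737 documentation range) are illustrative choices, not taken from the article; the standard `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def dotted_quad_to_int(address: str) -> int:
    """Convert an IPv4 dotted-quad string such as '192.0.2.10' to one 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {address!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range in {address!r}: {octet}")
        value = (value << 8) | octet  # shift earlier octets left, append this one
    return value


def in_network(address: str, network: str, prefix_len: int) -> bool:
    """Routing-style check: do the first `prefix_len` bits of both addresses match?"""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return (dotted_quad_to_int(address) & mask) == (dotted_quad_to_int(network) & mask)


addr = "192.0.2.10"  # documentation range (RFC 5737), used only as an example
print(dotted_quad_to_int(addr))  # 3221225994
print(dotted_quad_to_int(addr) == int(ipaddress.IPv4Address(addr)))  # True
print(in_network(addr, "192.0.2.0", 24))  # True
```

A machine compares such integers with one AND and one equality test; a person would much rather remember a name.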

Internet connections are established using IP addresses. The symbolic names of the domain name system exist to serve a lookup service that finds the IP addresses of the network nodes needed to establish a connection.

For users, it is the domain name that acts as the address of an information resource. Network administrators often hear users complain that a particular site is unreachable or that pages load slowly. The cause may not be that the network segment has lost its connection to the rest of the network, but that DNS is working poorly: no IP address means no connection.
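An administrator can separate the two causes by doing the name lookup and the connection as distinct steps. The Python sketch below does this with the standard `socket` module: `getaddrinfo` asks the system resolver (which, depending on configuration, may consult the hosts file and DNS), and only then is a TCP connection attempted. The function name, the returned messages, and the 3-second timeout are hypothetical choices for illustration.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Tell a name-resolution failure apart from a connectivity failure."""
    try:
        # Step 1: ask the system resolver for the IP addresses behind the name.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "name resolution failed: no IP address, no connection"
    # Step 2: try each returned address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return f"ok: connected to {sockaddr[0]}"
        except OSError:
            continue  # this address failed; try the next one
    return "name resolved, but no address accepted a connection"
```

Under these assumptions, a result beginning with “name resolution failed” points at DNS (or the hosts file) rather than at the network link.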

DNS did not exist when TCP/IP networks were born. At first, to make it easier to work with remote information resources on the Internet, tables mapping numeric addresses to machine names were used.

The authorship of these tables is attributed to Dr. Jon Postel (author of many RFCs, or Requests for Comments). He was the first to maintain the hosts.txt file, which could be downloaded via FTP.

Modern operating systems still support tables mapping IP addresses to machine names (more precisely, host names): these are files named hosts. On a Unix-like system, this file is /etc/hosts and looks like this:

127.0.0.1 localhost
220.218.0 amazon.com

A user can reach a computer by its IP address, by its name, or by an alias. A hosts file may list several synonyms for one address, and the same name may also be specified for different IP addresses.
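The hosts file format described above (an address followed by a name and optional aliases, with “#” starting a comment) can be read with a few lines of Python. This is a simplified sketch, not how any particular operating system's resolver parses the file; the function name and the sample entries (using example.com and documentation addresses) are hypothetical. It keeps every address listed for a name, although how a real resolver treats a name listed twice depends on the system.

```python
from collections import defaultdict


def parse_hosts(text: str) -> dict[str, list[str]]:
    """Map each host name or alias to the IP addresses listed for it, in file order."""
    table: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()  # first field: address; the rest: name, aliases
        for name in names:
            key = name.lower()  # host names are case-insensitive
            if address not in table[key]:
                table[key].append(address)
    return dict(table)


SAMPLE = """
127.0.0.1   localhost loopback    # one address, several synonyms
192.0.2.10  www.example.com www
192.0.2.11  www.example.com       # same name, second address
"""

print(parse_hosts(SAMPLE)["www.example.com"])  # ['192.0.2.10', '192.0.2.11']
```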

Remember that a mnemonic name by itself does not give access to a resource: it must first be translated into an IP address.

The procedure for using a name is as follows:

first, the IP address is found by looking up the name in the hosts file;
then a connection with the remote information resource is established using that IP address.
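The two steps can be sketched in Python with a plain dictionary standing in for the hosts file, so no DNS is involved. The table contents, the function name, and the timeout are hypothetical; `socket.create_connection` is given an IP literal, so the connection is made to the address, not to the name.

```python
import socket

# Hypothetical hosts-style table: name -> IP address.
HOSTS_TABLE = {"localhost": "127.0.0.1"}


def connect_by_name(name: str, port: int, table: dict[str, str]) -> socket.socket:
    # Step 1: the name alone gives no access; translate it into an IP address.
    try:
        ip = table[name.lower()]
    except KeyError:
        raise LookupError(f"{name!r} is not in the hosts table") from None
    # Step 2: establish the connection using the IP address.
    return socket.create_connection((ip, port), timeout=3.0)
```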

Hosts files are still used quite successfully in local networks. Almost all operating systems, from the various Unix-like systems to the latest versions of Windows, support this way of mapping IP addresses to host names.

However, this way of using symbolic names worked only while the Internet was small. As the network grew, it became difficult to keep large, consistent lists of names on every computer. The main problem was not even the size of the list but keeping all of its copies synchronized. DNS was invented to solve this problem.

DNS was described by Paul Mockapetris in 1983 in two documents, RFC-882 and RFC-883, which were later superseded by RFC-1034 and RFC-1035. Mockapetris also wrote an early DNS implementation, the JEEVES program for the TOPS-20 operating system, and RFC-1031 proposed that administrators of TOPS-20 machines on the MILNET network switch to that software. We will not go into the details of RFC-1034 and RFC-1035 here, but we will look at their basic concepts.

The role of the domain name in establishing a connection stays the same: its main purpose is to obtain an IP address. In keeping with this role, any DNS implementation is an application running on top of the TCP/IP protocol stack. The IP address thus remains the basic element of addressing in TCP/IP networks, and domain naming is an auxiliary service.

The domain name system is hierarchical; more precisely, it follows the principle of nested sets. The top of the hierarchy is called the root, and it has no written designation: according to RFC-1034, its name is empty.

It is often written that the root domain is designated by the symbol “.”, but this is not quite so. The dot is the separator between the components of a domain name, and because the root domain has an empty name, a fully qualified domain name (FQDN) ends with a dot. Nevertheless, “.” is firmly entrenched in the literature as the designation of the root domain. This is partly because the configuration files of DNS servers put this very character in the domain name field (the NAME field in RFC-1035 terms) of resource records that refer to the root domain.
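The empty root label is easiest to see in the way RFC 1035 (section 3.1) encodes names in messages: each label is written as a length octet followed by its characters, and the name ends with the zero-length root label, that is, a single zero octet. The Python sketch below follows that rule; the function names are hypothetical, and it handles only plain ASCII labels.

```python
def labels(fqdn: str) -> list[str]:
    """Split a fully qualified name into labels; the last label is the empty root."""
    if not fqdn.endswith("."):
        raise ValueError(f"{fqdn!r} is not fully qualified: no trailing dot")
    if fqdn == ".":
        return [""]  # the root domain on its own
    parts = fqdn.split(".")  # 'relay.amazon.com.' -> ['relay', 'amazon', 'com', '']
    if "" in parts[:-1]:
        raise ValueError(f"empty label inside {fqdn!r}")
    return parts


def to_wire(fqdn: str) -> bytes:
    """Encode a name as length-prefixed labels (RFC 1035, section 3.1).

    The empty root label becomes a single zero octet at the end.
    """
    out = bytearray()
    for label in labels(fqdn):
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label longer than 63 octets: {label!r}")
        out.append(len(raw))
        out += raw
    if len(out) > 255:
        raise ValueError("encoded name longer than 255 octets")
    return bytes(out)


print(labels("relay.amazon.com."))   # ['relay', 'amazon', 'com', '']
print(to_wire("relay.amazon.com."))  # b'\x05relay\x06amazon\x03com\x00'
print(to_wire("."))                  # b'\x00'
```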

The root corresponds to the set of all hosts on the Internet. This set is divided into first-level, or top-level, domains (TLDs). The .ca domain, for example, corresponds to the hosts in the Canadian part of the Internet. Top-level domains are in turn divided into smaller domains, for example, corporate ones.

In the 1980s, the first top-level domains were defined, among them gov, mil, edu, com, net, and org. Later, as the network spread beyond the borders of the United States, national domains such as uk, jp, au, and ch appeared.

As already mentioned, the top-level domains are followed by domains that identify either regions (such as msk) or organizations (such as kiae). These days, almost any organization can obtain its own second-level domain: it submits an application to a registrar or provider and receives a registration notice.

Below them come further levels of the hierarchy, which can be assigned either to small organizations or to divisions of large ones.
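The nested-set structure means every name sits inside a chain of ever larger domains, ending at the root. A small Python sketch, with a hypothetical function name, lists that chain for a name; it tolerates names written without the final dot and prints the root by the conventional single dot.

```python
def enclosing_domains(name: str) -> list[str]:
    """Return the name and every domain that contains it, ending with the root '.'."""
    name = name.rstrip(".").lower()  # accept names written with or without the final dot
    parts = name.split(".") if name else []
    chain = [".".join(parts[i:]) + "." for i in range(len(parts))]
    chain.append(".")  # the root, written by convention as a single dot
    return chain


print(enclosing_domains("relay.amazon.com"))
# ['relay.amazon.com.', 'amazon.com.', 'com.', '.']
```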

Part of the domain naming tree can be represented as follows:

Fig. 1. An example of a portion of the domain name tree.

The root of the tree has no label, so its name is the empty string, written “”. All other nodes of the tree have labels. Each node corresponds to either a domain or a host; in this tree, a host is a leaf, that is, a node with no other nodes below it.
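A portion of such a tree can be built in Python as nested dictionaries keyed by labels, walking each name from the root end, and the hosts in the sense just defined are simply the leaves. This is an illustrative sketch with hypothetical function names and sample names; a node that has children is not reported as a leaf, even if a name is also assigned to it.

```python
def build_tree(fqdns: list[str]) -> dict:
    """Build nested dicts keyed by labels; the outer dict stands for the unnamed root."""
    root: dict = {}
    for name in fqdns:
        node = root
        # Names are written leaf-first, so walk the labels in reverse (root-first).
        for label in reversed(name.rstrip(".").lower().split(".")):
            node = node.setdefault(label, {})
    return root


def leaves(tree: dict, suffix: str = "") -> list[str]:
    """Return fully qualified names of nodes with nothing below them (hosts)."""
    found = []
    for label, children in tree.items():
        name = f"{label}.{suffix}" if suffix else f"{label}."
        if children:
            found.extend(leaves(children, name))
        else:
            found.append(name)
    return found


tree = build_tree(["relay.amazon.com.", "www.amazon.com.", "www.example.ca."])
print(sorted(leaves(tree)))  # ['relay.amazon.com.', 'www.amazon.com.', 'www.example.ca.']
```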

A host can be referred to by either a partial name or a full name. A fully qualified host name lists, from left to right, the labels of all nodes on the path between the leaf and the root of the domain name tree, starting with the leaf and ending with the root, for example:

relay.amazon.com.

A partial name lists only some of the node names, not all of them.

Note that partial (incomplete) names do not end with a period. In practice, DNS client software expands partial names into fully qualified names before asking name servers for an IP address.
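How that expansion proceeds can be sketched in Python as a simplified model of the `search` and `ndots` options of a Unix resolver's resolv.conf: a name with a trailing period is used as is; otherwise, if it contains fewer dots than `ndots`, the search-list suffixes are tried before the name itself as an absolute name, and the other way round if it has at least `ndots` dots. The function name and the search lists in the examples are hypothetical, and real resolvers have further rules not modeled here.

```python
def candidate_fqdns(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Fully qualified names to try, in order, for a possibly partial name."""
    if name.endswith("."):
        return [name]  # already fully qualified: no expansion
    absolute = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as absolute first
    return expanded + [absolute]      # few dots: try the search list first


print(candidate_fqdns("relay", ["amazon.com"]))  # ['relay.amazon.com.', 'relay.']
```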

The word “host” is not a full synonym for “computer”, although it is often oversimplified that way. First, a computer can have several IP addresses, each of which can be associated with one or more domain names. Second, one domain name can be associated with several different IP addresses, which, in turn, can be assigned to different computers.

Note again that names are written from left to right, from the most specific name (the leaf) to the root. Take, for example, the fully qualified domain name relay.amazon.com. The host name is “relay”, and the host belongs to the domain amazon.com, which is part of the com domain.

The name relay.amazon.com is itself a domain name: it denotes the set of hosts whose names end in relay.amazon.com. An IP address can also be assigned directly to the name relay.amazon.com, in which case the name denotes a host as well as a domain. This technique is often used to give short, meaningful addresses in email systems.
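Deciding whether a name belongs to a domain's set of hosts has to be done label by label, not by plain string suffix: a naive text check would wrongly place relay.notamazon.com inside amazon.com. A minimal Python sketch, with a hypothetical function name:

```python
def in_domain(name: str, domain: str) -> bool:
    """True if `name` equals `domain` or lies below it in the tree (label-wise)."""
    name_labels = name.rstrip(".").lower().split(".")
    domain_labels = domain.rstrip(".").lower().split(".")
    if domain_labels == [""]:
        return True  # every name is inside the root domain
    return name_labels[-len(domain_labels):] == domain_labels


print(in_domain("relay.amazon.com", "amazon.com"))     # True
print(in_domain("relay.notamazon.com", "amazon.com"))  # False
print("relay.notamazon.com".endswith("amazon.com"))    # True: why plain suffix checks fail
```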

In this notation, host and domain names are separated by the “.” character. A fully qualified domain name must end with a “.”: this final period separates the empty root name from the top-level domain name. In the literature and in applications, this period is often omitted, so the name is written in partial-name notation even when all node names from the leaf to the root are listed.

Keep in mind that in real life domain names are not rigidly tied to IP addresses, and even less so to the physical objects (computers, routers, switches, printers, etc.) connected to the network.

A computer physically installed and connected to the network in the USA can easily have a name from a Japanese corporate domain. Conversely, a computer or router in the Japanese segment can have a name from the .com domain.

Moreover, the same computer can have several domain names. Conversely, several IP addresses can be assigned to one domain name, with those addresses belonging to different servers that handle the same type of request. The correspondence between domain names and IP addresses is therefore not one-to-one but many-to-many.
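The many-to-many correspondence can be shown in Python by indexing a set of (name, address) pairs in both directions. The records below are hypothetical, using the reserved example.com domain and documentation addresses; the function name is also illustrative.

```python
from collections import defaultdict

# Hypothetical (name, IPv4 address) pairs.
RECORDS = [
    ("www.example.com", "192.0.2.10"),
    ("www.example.com", "192.0.2.11"),   # one name served by two machines
    ("mail.example.com", "192.0.2.10"),  # a second name for the first machine
]


def index_both_ways(records):
    """Build name -> addresses and address -> names indexes."""
    by_name, by_ip = defaultdict(set), defaultdict(set)
    for name, ip in records:
        by_name[name].add(ip)
        by_ip[ip].add(name)
    return dict(by_name), dict(by_ip)


by_name, by_ip = index_both_ways(RECORDS)
print(sorted(by_name["www.example.com"]))  # ['192.0.2.10', '192.0.2.11']
print(sorted(by_ip["192.0.2.10"]))         # ['mail.example.com', 'www.example.com']
```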

These remarks show that the hierarchy of the domain name system is strictly observed only in the names themselves: it reflects the nesting of names and the areas of responsibility of the administrators of the corresponding domains, not where hosts are physically located.

Canonical domain names also deserve mention. The concept arises when describing the configuration of subdomains and the areas of responsibility of individual name servers. From the point of view of the tree, domain names are not divided into canonical and non-canonical ones, but for administrators, servers, and email systems the distinction is essential.

A canonical name is the primary name of a host, the one to which an IP address is explicitly mapped. A non-canonical name, or alias, is a synonym for a canonical name; in DNS, such aliases are described by CNAME (canonical name) records.
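Resolving an alias therefore means following alias links until a canonical name is reached and only then taking its addresses. The Python sketch below models this with two hypothetical dictionaries standing in for alias (CNAME-like) and address records; the names, addresses, function name, and hop limit are assumptions for illustration, and the loop check guards against misconfigured aliases that point at each other.

```python
# Hypothetical zone data: alias -> target name, and canonical name -> addresses.
ALIASES = {
    "www.example.com.": "relay.example.com.",
    "ftp.example.com.": "www.example.com.",
}
ADDRESSES = {"relay.example.com.": ["192.0.2.25"]}


def canonicalize(name, aliases, addresses, max_hops=8):
    """Follow alias links to the canonical name, then return (canonical, addresses)."""
    seen = []
    while name in aliases:
        if name in seen or len(seen) >= max_hops:
            raise RuntimeError(f"alias loop or chain too long: {seen + [name]}")
        seen.append(name)
        name = aliases[name]
    return name, addresses.get(name, [])


print(canonicalize("ftp.example.com.", ALIASES, ADDRESSES))
# ('relay.example.com.', ['192.0.2.25'])
```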

The most popular implementation of the domain name system is the Berkeley Internet Name Domain (BIND), but it is not the only one. Windows NT 4.0, for example, has its own domain name server that supports the DNS specification.

Nevertheless, even Windows administrators benefit from knowing how BIND works and how to configure it, since this is the software that has traditionally served the domain name system from the root down to the top-level domains (TLDs).