We have a lot to cover here, but due to the size constraints of this book, I have organized a summary of the most important aspects of today’s network jargon and explained them briefly. I hope you find some new information to help your automation work.
Protocol layers
It’s important to note that there are several different standards for protocol layers; the most academic one is the OSI model defined by the ISO, which has seven layers. However, we are going to consider only the five layers defined in the TCP/IP protocol stack, which is used on the internet. Here is a short summary of each of the layers:
- Physical layer: This layer covers the technologies involved in the physical connection itself, where bits and bytes are transformed into signals on the physical medium, such as light in fiber optics, electricity in a cable, and radio waves from antennas. At this layer, physical checks can be implemented on the node input, such as power levels, collision, noise, and signal distortion, among others.
- Data link layer: Here, the information is called a frame, and it has a bounded size, known as the maximum transmission unit (MTU). The reason is that a frame is a representation of data in bytes that has to move from one node to another reliably and without interruption. At this level, frame queues are present; the queues are used to place frames on the physical layer in sequential or priority order. Some data link devices can prioritize certain types of frames, moving them to the front of the queue. At the data link layer, some checks are done within the frame itself, such as a CRC or checksum. In addition, source and destination addresses can be added to the frame to differentiate destinations on a shared medium. The information in the frame is normally used locally within the same organization. This layer is also known as the Ethernet layer.
- Network layer: This is also known as the IP layer or the router layer. Here, the information is called a packet, and it carries data between nodes that are beyond the layer 2 domain (the Ethernet layer described previously). This level is where routing protocols operate, network address translation (NAT) does its job, some access control lists (ACLs) are applied, and control packets are exchanged, among other functions. A packet at this level has enough information to know where it came from and where it has to go. This layer is also responsible for fragmenting a packet into multiple packets if the frame MTU is smaller than the IP packet. The main information carried in the packet is the IP address, for both the source and the destination.
- Transport layer: The transport layer deals with data units called segments. On today’s internet, only two protocols are used here: the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP). The idea is that one provides more confirmation and control than the other. TCP has traffic flow control, packet loss detection, and packet retransmission, among other functions. UDP, on the other hand, is just the IP packet plus a little more information. The idea behind TCP is to enhance communication over the unreliable internet, so the application has a guaranteed transport method. TCP has more overhead, with a larger header, and might be slower in some cases than UDP. The transport layer adds a port number to the segment, which is carried inside every packet at the IP layer. The port number is used for two reasons: to designate which application is using the transport layer, such as port 80 for HTTP communication, and to associate the segment with a communication socket on the host. Port numbers are required for both the source and the destination, and they are used to designate the correct socket to communicate with on each host.
- Application layer: This is the top of the layers, normally referred to by my professor as the cherry on the cake. The application layer is associated with a socket on the host, where data is sent and received. The application normally handles the content of the data, such as page requests over HTTP. The software that we produce in this book uses this layer to automate the network. A minimal socket sketch follows this list to show how the transport and application layers fit together.
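To make the transport and application layers a little more concrete, here is a minimal sketch (written in Python, the language assumed for the short examples here; example.com is used purely as an illustrative host). It opens a TCP connection to destination port 80 and sends a plain HTTP request, so the port number, the socket, and the application payload can all be seen together:

```python
# Minimal sketch: the transport layer (TCP, destination port 80) and the
# application layer (an HTTP request) as seen from a host's socket API.
# The host example.com is only an illustration.
import socket

with socket.create_connection(("example.com", 80), timeout=5) as sock:
    # Application-layer payload: a plain HTTP/1.1 GET request
    sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    response = b""
    while True:
        chunk = sock.recv(4096)  # TCP delivers a reliable, ordered byte stream
        if not chunk:
            break
        response += chunk

print(response.split(b"\r\n")[0].decode())  # status line, e.g., HTTP/1.1 200 OK
```

The operating system picks a source port automatically; together with the destination port, it identifies the socket on each end of the connection.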
LAN, WAN, internet, and intranet
LAN, or local area network, refers to networks that are local. Nowadays, it means networks that use the data link layer as the main communication, such as Ethernet. The reason the name relates more to the communication layer than to geography is that technology has evolved, allowing Ethernet switches to communicate over thousands of kilometers. So, a LAN normally designates a topology inside the same organization using Ethernet, but not necessarily in the same geographic location.
WAN, or wide area network, refers to networks that are remotely connected, or technologies that allow nodes to be far apart, such as the now-obsolete X.25, Frame Relay, and Asynchronous Transfer Mode (ATM). Today, the term WAN normally designates interfaces or networks that are connected to different networks, or in other words, networks that are not in the same organization, data link layer, or Ethernet domain.
Information
For more information about ATM, please refer to the article Technology and Applications in SSRN Electronic Journal, June 1998, by Jeffrey Scott Ray.
The internet is what you already know: the gigantic network interconnecting everybody worldwide.
The term intranet arose when corporations started using internet protocols to communicate internally on their networks. The reason is that other technologies were competing with the TCP/IP protocol stack at the time, such as SNA and IPX, so the term intranet simply stated that the corporate network used TCP/IP. Nowadays, intranet refers to a network that is within the same organization and not connected to external nodes. Therefore, the network is safe from external interference.
Point-to-point connections
A point-to-point (P2P) connection is used to interconnect two nodes. A link between two nodes is normally a P2P connection (as shown in Figure 1.1), unless using media such as satellite or broadcast antennas. This connection can either be back to back or not. The term back to back is normally used to indicate that the nodes are connected directly without any other physical layer between them, such as repeaters. Therefore, back-to-back connections have limited distances due to the noise and distortion introduced in the connection as the wiring gets longer. Depending on the speed and the technology used, the distances are limited to within the same room or building.
Figure 1.1 – A P2P connection
Star or hub-spoke topologies
Star or hub-spoke topologies are used in small and medium companies, where one office is the main distributor and the other locations are consumers. The topology looks like a star, and the network elements are smaller and simpler at the remote locations, while being larger and more complex at the main distributor (see the example in Figure 1.2).
Normally, these types of topologies can scale up to hundreds of nodes, but depending on the traffic requirements, they can scale to thousands. Let’s look at two examples that illustrate the scale of these topologies.
For instance, in a bank, automated teller machines are distributed across remote locations, while the main computer is located in the main branch. This can scale to thousands of remote machines, as the traffic requirements of a teller machine are small in terms of bytes transferred.
On the other hand, if you have a supermarket chain using a star topology, it won’t scale to thousands of remote machines, as each supermarket requires a large amount of data transfer to handle all transactions and employees.
So, the use of star topologies is limited by the amount of traffic the central node can handle. In a star topology, there are two device functions: a device is either at a remote location or in the main office.
Network capacity planning is trivial when dealing with star topologies, as only the main office node needs to be upgraded as the network grows.
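As a quick illustration of why the planning is simple, the hub only has to be sized for the sum of the spoke demands. The spoke count and per-spoke demand below are illustrative assumptions:

```python
# Star topology capacity planning sketch: size the hub for the sum of the
# spoke demands. The figures are illustrative assumptions.
spokes = 200
avg_spoke_demand_mbps = 5

hub_capacity_mbps = spokes * avg_spoke_demand_mbps
print(f"hub must handle at least {hub_capacity_mbps} Mbps")  # 1000 Mbps
```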
Figure 1.2 – A star topology
Hierarchical or tree topologies
Hierarchical topologies are used to optimize traffic, where larger nodes aggregate traffic from smaller nodes in a hierarchical manner (see the example in Figure 1.3). These topologies can scale to thousands of nodes; however, because of the number of nodes in the path, they can introduce undesirable latency and extra node costs.
An internet service provider normally uses a hierarchical topology to concentrate customer traffic in certain remote locations before aggregating even more in other locations.
There is no limit on the number of nodes in this type of topology, and it’s one of the foundations of the internet’s global infrastructure.
In hierarchical topologies, we have multiple device functions: customer premises equipment (CPE), aggregators, distributors, core, and peering, among others.
Depending on the size of this topology, it can introduce a longer path, which will add significant latency. For instance, in Figure 1.3, A1 has to cross five hosts to reach A7.
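As a rough back-of-the-envelope illustration of that extra latency, the following sketch estimates a one-way delay from the hop count; the per-hop and distance figures are assumptions used only for illustration:

```python
# Rough one-way latency estimate for a path through a hierarchical topology.
# The per-hop and propagation figures are illustrative assumptions, not measurements.
hops = 5                     # e.g., A1 to A7 in Figure 1.3
per_hop_forwarding_us = 50   # assumed queuing plus forwarding delay per node
propagation_us_per_km = 5    # roughly 5 microseconds per km in fiber
distance_km = 200            # assumed total fiber distance

latency_us = hops * per_hop_forwarding_us + distance_km * propagation_us_per_km
print(f"one-way latency is roughly {latency_us / 1000:.2f} ms")  # about 1.25 ms
```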
Network capacity planning is focused on the aggregation points, and augmenting the network is not that difficult.
Figure 1.3 – A hierarchical or tree topology
Clos topologies
This type of topology is also known as a Clos network or fabric. It is used to increase the number of ports without compromising latency and throughput, and it is often used in data centers. The topology is composed of at least three stages. Note that there is no oversubscription or aggregation as in hierarchical topologies; a Clos topology provides the same amount of available bandwidth on the input and output. The stages are normally called spines and leaves. The spines are always in the center and only have connections to other Clos nodes, while the leaves are used to connect to external devices or networks.
Figure 1.4 shows an example of a 16-port Clos network. Note that normally, all connections between a spine node and a leaf node are back to back:
Figure 1.4 – A Clos topology
Why are these topologies used? To increase the number of ports available without compromising throughput. This kind of topology is also used inside a router to provide connectivity between interface cards. Some companies build Clos fabrics out of small devices to increase the number of ports offered without raising costs, as smaller devices are normally cheaper.
Important note
One additional characteristic of the Clos network is that any two external ports are the same distance apart (in terms of nodes in the path); therefore, the latency under normal conditions is the same. For instance, in Figure 1.4, the latency between an external port on node L1 and an external port on L4 or E1 is the same.
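The equal-distance property from the note above can be checked with a tiny sketch that models a leaf-spine fabric and counts hops between leaves. The 2-spine, 4-leaf sizing is an illustrative assumption, not the exact fabric of Figure 1.4:

```python
# Sketch: in a leaf-spine (Clos) fabric, every leaf is the same number of hops
# away from every other leaf. The 2-spine/4-leaf sizing is an assumption.
from itertools import combinations

spines = ["S1", "S2"]
leaves = ["L1", "L2", "L3", "L4"]
# Every leaf connects to every spine; leaves never connect directly to each other.
links = {leaf: set(spines) for leaf in leaves}
for spine in spines:
    links[spine] = set(leaves)

def hops(src, dst):
    """Breadth-first search returning the hop count between two nodes."""
    frontier, visited, distance = [src], {src}, 0
    while frontier:
        if dst in frontier:
            return distance
        frontier = [n for node in frontier for n in links[node] if n not in visited]
        visited.update(frontier)
        distance += 1
    return None

for a, b in combinations(leaves, 2):
    print(f"{a} -> {b}: {hops(a, b)} hops")  # always 2: leaf -> spine -> leaf
```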
Important note
More information on Clos networks can be found in an interesting paper from Google called Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network – ACM SIGCOMM Computer Communication Review, Volume 45, Issue 4, October 2015.
Mixed topologies
A mixed topology is used in large corporations where both latency and traffic are important to take care of. Normally, star and P2P topologies are used to shorten paths and reduce latency, hierarchical topologies are used to optimize and aggregate traffic, and finally, Clos networks are used to increase the number of ports.
Modern cloud service providers are migrating to a more complex topology, with direct connections between elements where latency matters and aggregation device functions where traffic volume matters.
Network capacity planning is normally harder because connections are not totally hierarchical and aggregation points are not necessarily part of all traffic paths. An example of this kind of mixed topology is shown in Figure 1.5:
Figure 1.5 – A mixed topology
Interface speeds
A very important point that some engineers get confused about is the interface speed representation. 1 KB in memory representation is 2^10 or 1,024 bytes and 1 GB is 2^30, which is 1,073,741,824 bytes. For interface speeds, the same does not apply, and 1 Kbps is actually 1,000 bits/second, while 1 Gbps is 1,000,000,000 bits/second (more details can be found at https://en.wikipedia.org/wiki/Data-rate_units).
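A quick worked example makes the difference visible: transferring a 1 GiB file over a 1 Gbps link takes well over one second, even before any protocol overhead is counted:

```python
# Memory sizes use powers of two, interface speeds use powers of ten.
file_bytes = 2 ** 30             # 1 GiB = 1,073,741,824 bytes
link_bits_per_second = 10 ** 9   # 1 Gbps = 1,000,000,000 bits/second

transfer_seconds = file_bytes * 8 / link_bits_per_second
print(f"{transfer_seconds:.2f} s")  # about 8.59 s, ignoring protocol overhead
```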
Device types and functions
Network devices used to have specific functions, as CPU and memory were scarce and expensive. Nowadays, network devices can have multiple functions when required. In large networks, devices have fewer functions, as they tend to get overloaded more easily when traffic demands increase. Here are some of the functions a device can have:
- Hub: This is a very old term to designate a device that only repeats the physical signal.
- Switch: A device that works only on the data link layer. It is normally used in LANs, and it works by switching frames. The most common protocol used on these devices to control paths is the Spanning Tree Protocol (STP).
- Router: A device that works on the network layer, or IP layer. It is used to interconnect multiple LANs or create long-haul remote connections. A router forwards packets and uses a routing protocol to exchange route information with other routers. Some routers can also switch frames or work as a switch.
- NAT: A NAT device replaces source and destination IP addresses to allow the use of private IP addresses or to isolate internal traffic from external traffic.
- Firewall: A device that controls the traffic passing through it by looking into the content of the frame or packet. There are several different types of firewalls, and some can be quite complex, including encrypting and decrypting traffic.
- Load balancers: When servers can’t handle too many clients because of hardware limitations, load balancers can be used to deal with client demand by sharing client requests among several servers. These devices also look into the packet content to determine which server should get the traffic (a minimal selection sketch follows this list).
- Network server: A computer used to provide some sort of service to the network, for instance, an authentication server, an NTP server, or a Syslog collector.
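Here is the minimal load-balancing selection sketch mentioned in the list above, using a plain round-robin policy. The backend addresses are hypothetical, and a real load balancer would also consider request content and server health:

```python
# Minimal round-robin server selection, the simplest load-balancing policy.
from itertools import cycle

servers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical backend pool
next_server = cycle(servers)

def pick_server() -> str:
    """Return the backend that should receive the next client request."""
    return next(next_server)

for request_id in range(5):
    print(f"request {request_id} -> {pick_server()}")
```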
Oversubscription
In network jargon, this term is used to describe nodes or links in the network that aggregate traffic from other parts of the network and take statistical advantage of it. For instance, a provider may have a 1 Gbps interface to connect to the internet and 1,000 customers with 10 Mbps interfaces using the service, which is an oversubscription ratio of 10:1 (10 Gbps of subscribed capacity over 1 Gbps of uplink). This practice is quite normal and is only possible because the characteristics of the clients’ traffic allow such aggregation without degradation. There are lots of mathematical models and papers on the internet describing this behavior and how to use it in your favor.
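The arithmetic of that ratio is simple enough to check in a couple of lines:

```python
# Oversubscription ratio: subscribed capacity divided by available uplink capacity.
customers = 1_000
customer_rate_mbps = 10
uplink_mbps = 1_000  # 1 Gbps

subscribed_mbps = customers * customer_rate_mbps  # 10,000 Mbps
ratio = subscribed_mbps / uplink_mbps
print(f"oversubscription ratio: {ratio:.0f}:1")  # 10:1
```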
But some traffic can’t be aggregated without being degraded. In a data center, the traffic that can’t be oversubscribed is the traffic between servers, such as remote disk access, data transfers, and database replication. In this scenario, the best approach is to interconnect the servers without oversubscription, for example, by using a non-blocking Clos topology.
Browsing web pages, watching videos, and receiving messages form most of the traffic on the internet, which easily allows the aggregation technique without degradation.
Important note
More information on oversubscription can be found in the paper Evaluating Impacts of Oversubscription on Future Internet Business Models by A. Raju, V. Gonçalves, and P. Ballon – Published in Networking Workshops, 25 May 2012 – Computer Science.
In this section, we went over the basic components of computer networks, including protocols, topology types, interface speeds, and device types. By now, you should be able to identify these terms more easily and be familiar with their meanings, because we are going to use them throughout this book. Moving on, we are going to review more terms related to network architecture.