High Availability & Failover In Distributed Systems: Address Resolution Protocol

Jan 13, 2024

High availability has become a fundamental requirement for enterprises in every sector that build distributed systems. In today's fast-moving, unpredictable digital world, downtime can result in severe financial losses, danger to life, and reputational harm.

Higher-level systems rest on a handful of low-level foundations, and as software engineers, understanding some of them helps us reason about how we architect our systems.

Address resolution is one such foundation that every software developer should understand, and it is the focus of this issue.

In this week’s issue, we will be explaining the following:

What it means for a system to be highly available
The concept of failover in distributed systems
The Address Resolution Protocol as a fundamental concept in the mix

High availability is a measure of how accessible and dependable a system is to its users. It is critical in distributed systems because it ensures that the system remains functional despite failures or increasing demand, and it is the foundation that allows organizations to deliver uninterrupted services to their customers when unanticipated events occur.

Building an infrastructure with seamless, automatic failover is one technique for achieving a highly available system with little or no planned or unplanned downtime.

Failover is the ability to switch automatically and seamlessly to a reliable backup system. When a key component fails, a redundant component or a standby operating mode should take over, reducing or eliminating the impact on users.

This means that a redundant or standby node should be ready to replace the active node if it terminates or fails abnormally. Because failover is essential to disaster recovery, the backup nodes themselves must be resilient to failure.

But where does the Address Resolution Protocol (ARP) come in? Before we explain, let’s understand what ARP is.

To understand the purpose of ARP, first grasp the distinction between IP addresses and physical addresses. IP addresses are logical addresses assigned to devices on a network, whereas physical addresses, also known as Media Access Control (MAC) addresses, are hardware addresses assigned to network interface controllers.

An IP address can change, for example when a device joins a different network or is assigned a new address, whereas a MAC address normally stays fixed to its network interface.

The Address Resolution Protocol (ARP) is the protocol used on a local area network (LAN) to map an Internet Protocol (IP) address to the physical machine address, the MAC address, of the device that currently holds it.

This mapping is what allows nodes on the same network to communicate. It is needed because the two kinds of address have different formats and serve different purposes: an IPv4 address is 32 bits long, while a MAC address is 48 bits long, and frames on the LAN are delivered by MAC address. A sender that knows only the destination IP address must therefore resolve it first. ARP performs this IP-to-MAC resolution; the reverse lookup, from MAC to IP, is handled by a separate protocol (Reverse ARP) rather than by ARP itself.
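To make the two address sizes concrete, here is a minimal Python sketch. The helper `mac_to_bytes` and both example addresses are made up for illustration; only the standard-library `ipaddress` module is used.

```python
import ipaddress


def mac_to_bytes(mac: str) -> bytes:
    """Parse a colon-separated MAC address into its 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}")
    return bytes(int(part, 16) for part in parts)


# Both addresses below are illustrative values, not taken from a real network.
ip = ipaddress.IPv4Address("192.168.1.10")
mac = mac_to_bytes("00:1a:2b:3c:4d:5e")

ip_bits = len(ip.packed) * 8   # an IPv4 address packs into 4 bytes
mac_bits = len(mac) * 8        # a MAC address is 6 bytes

print(ip_bits, mac_bits)  # prints: 32 48
```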

ARP requests are broadcast, and the protocol is usually described as operating at the boundary between the OSI model's data link layer and network layer. When a device needs to communicate with another device on the same network, it first checks its ARP cache to see whether it already knows the target device's MAC address.

If the MAC address is not found in the cache, the device broadcasts an ARP request to every device on the local network, asking for the MAC address of the device with the given IP address. This process is referred to as ARP resolution.
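The cache-first lookup can be sketched in a few lines of Python. This is a toy model under stated assumptions: `resolve`, `fake_broadcast`, and the `lan` dictionary are hypothetical names standing in for the operating system's resolver and the real network, and no packets are sent.

```python
from typing import Callable, Dict, List, Optional


def resolve(
    ip: str,
    cache: Dict[str, str],
    broadcast: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the MAC for `ip`, consulting the cache before broadcasting."""
    if ip in cache:
        return cache[ip]           # cache hit: no network traffic needed
    mac = broadcast(ip)            # cache miss: ask every device on the LAN
    if mac is not None:
        cache[ip] = mac            # remember the answer for next time
    return mac


# A stand-in for the LAN: which (made-up) MAC answers for which IP.
lan = {"10.0.0.2": "aa:bb:cc:00:00:02"}
broadcasts: List[str] = []


def fake_broadcast(ip: str) -> Optional[str]:
    """Record the broadcast and return the owner's MAC, if any."""
    broadcasts.append(ip)
    return lan.get(ip)


cache: Dict[str, str] = {}
first = resolve("10.0.0.2", cache, fake_broadcast)
second = resolve("10.0.0.2", cache, fake_broadcast)
print(first, second, len(broadcasts))
```

The second call is answered from the cache, so only one broadcast is recorded.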

How ARP Works

The address resolution protocol works by using two main messages: the ARP request and the ARP reply.

An ARP request is a message that one machine broadcasts to every machine on the same network, asking for the MAC address of the machine that holds a given IP address. The request always includes the IP and MAC addresses of the sender as well as the IP address of the machine it is trying to reach.

When a machine receives an ARP request, it checks whether the target IP address in the request matches its own. If it does, the machine responds with an ARP reply that carries its own MAC address; otherwise, it ignores the request.
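The request and reply exchange can be illustrated with the 28-byte ARP payload used for IPv4 over Ethernet. The sketch below only builds and parses the bytes in memory; it does not send anything on a network. The function names `build_arp` and `handle_arp` and all addresses are hypothetical.

```python
import socket
import struct
from typing import Optional

# hardware type, protocol type, hw length, proto length, operation,
# sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_REQUEST, ARP_REPLY = 1, 2


def build_arp(oper: int, sender_mac: bytes, sender_ip: str,
              target_mac: bytes, target_ip: str) -> bytes:
    """Pack an ARP payload for Ethernet (type 1) and IPv4 (0x0800)."""
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, oper,
        sender_mac, socket.inet_aton(sender_ip),
        target_mac, socket.inet_aton(target_ip),
    )


def handle_arp(packet: bytes, my_mac: bytes, my_ip: str) -> Optional[bytes]:
    """Reply only if the packet is a request for our own IP; else ignore it."""
    _, _, _, _, oper, sender_mac, sender_ip, _, target_ip = struct.unpack(
        ARP_FORMAT, packet
    )
    if oper != ARP_REQUEST or target_ip != socket.inet_aton(my_ip):
        return None
    # The reply is addressed back to the original sender.
    return build_arp(ARP_REPLY, my_mac, my_ip,
                     sender_mac, socket.inet_ntoa(sender_ip))


asker_mac = bytes.fromhex("aabbcc000001")   # made-up addresses
owner_mac = bytes.fromhex("aabbcc000002")

# The target MAC is unknown in a request, so it is filled with zeros.
request = build_arp(ARP_REQUEST, asker_mac, "10.0.0.1", b"\x00" * 6, "10.0.0.2")
reply = handle_arp(request, owner_mac, "10.0.0.2")
ignored = handle_arp(request, owner_mac, "10.0.0.3")

print(len(request), reply is not None, ignored is None)
```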

Let’s assume our distributed system has two machines, each with its own IP address and MAC address. Software running on both machines lets them exchange a periodic signal with each other; this signal is called a heartbeat.

Heartbeat detection is a technique for monitoring the availability of nodes in a distributed system. Each node transmits a signal, known as a heartbeat, at regular intervals to a monitor (a central monitoring system or, as in this example, its peer) to confirm that it is still alive and functioning properly.

Of these two nodes, one acts as the master and the other as the backup. They decide between themselves who will lead and who will follow. They also agree on a single virtual IP address, one that is not permanently tied to either machine's physical interface.

When a client's ARP request for the virtual IP address arrives, both machines receive it. The backup machine ignores the request, because answering is the master node's responsibility. The master node replies with its own MAC address, so the client sends its traffic for the virtual IP address to the master.
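A heartbeat check boils down to remembering when each node was last heard from and comparing that against a timeout. The sketch below is a simplified model: `HeartbeatMonitor` is a hypothetical class, the node name is made up, and timestamps are passed in explicitly rather than read from a clock so the behaviour is easy to follow.

```python
from typing import Dict


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that go quiet."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record that `node` sent a heartbeat at time `now` (seconds)."""
        self.last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        """A node is alive if its last heartbeat is within the timeout."""
        last = self.last_seen.get(node)
        return last is not None and (now - last) <= self.timeout


monitor = HeartbeatMonitor(timeout=3.0)   # assumed timeout of 3 seconds
monitor.beat("master", now=0.0)

alive_at_2s = monitor.is_alive("master", now=2.0)   # heartbeat still fresh
alive_at_5s = monitor.is_alive("master", now=5.0)   # no heartbeat for 5s

print(alive_at_2s, alive_at_5s)  # prints: True False
```

In a real deployment the timestamps would typically come from a monotonic clock, and the timeout would be tuned against the heartbeat interval.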

Essentially, the two machines are in continual communication via the heartbeat, and when the client requests come in, they are always processed by the master node by default.

However, when the master node fails and is unable to reply to a client request, the backup node detects this through the heartbeat and promptly assumes the master node's place.

This failover procedure relies on the address resolution protocol. The client broadcasts a request on the network, looking for the machine with a certain IP address in the hope of obtaining a MAC address so that communication can begin. Through a mechanism such as the Virtual Router Redundancy Protocol (VRRP), the master and backup nodes share the same virtual IP address.

In this scenario the master node is offline, so the backup node picks up the request and answers with its own MAC address. Communication proceeds without the client noticing that anything changed.
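The whole failover flow can be simulated in memory. This is a toy model, not an implementation of VRRP: the `Node` class, `promote_if_master_down`, `arp_broadcast`, the virtual IP, and the MAC addresses are all hypothetical, and the heartbeat is reduced to a simple `alive` flag.

```python
from typing import List, Optional

VIRTUAL_IP = "10.0.0.100"  # hypothetical virtual IP shared by both nodes


class Node:
    def __init__(self, name: str, mac: str, is_master: bool) -> None:
        self.name = name
        self.mac = mac
        self.is_master = is_master
        self.alive = True

    def answer_arp(self, target_ip: str) -> Optional[str]:
        """Only a live master answers requests for the virtual IP."""
        if self.alive and self.is_master and target_ip == VIRTUAL_IP:
            return self.mac
        return None


def promote_if_master_down(master: Node, backup: Node) -> bool:
    """Stand-in for the heartbeat check: promote the backup if needed."""
    if not master.alive and not backup.is_master:
        backup.is_master = True
        return True
    return False


def arp_broadcast(target_ip: str, nodes: List[Node]) -> Optional[str]:
    """Deliver the request to every node; return the first reply, if any."""
    for node in nodes:
        mac = node.answer_arp(target_ip)
        if mac is not None:
            return mac
    return None


master = Node("node-a", "aa:bb:cc:00:00:0a", is_master=True)
backup = Node("node-b", "aa:bb:cc:00:00:0b", is_master=False)
nodes = [master, backup]

before = arp_broadcast(VIRTUAL_IP, nodes)   # answered by the master
master.alive = False                        # the master fails
promote_if_master_down(master, backup)      # the backup takes over
after = arp_broadcast(VIRTUAL_IP, nodes)    # answered by the backup

print(before, after)
```

The client asks for the same virtual IP address both times; only the MAC address in the answer changes.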

ARP Cache

An ARP table, also known as an ARP cache, is a table that stores a list of IP addresses and the MAC addresses that correspond to them. When a device communicates with another device on the same network, it adds the target device's MAC address to its ARP cache.

The device can then use the MAC address from the cache to communicate with the destination device without having to perform ARP resolution again.

All operating systems on an IPv4 Ethernet network maintain ARP caches. When a device needs a MAC address in order to deliver data to another device on the LAN, it checks its ARP cache to see whether the IP-to-MAC mapping has already been resolved.

If the mapping already exists, there is no need to make a new request. If it has not yet been resolved, the device issues an ARP request.

The size of an ARP cache is intentionally limited, and entries typically stay in the cache for only a few minutes after they are added. Entries are purged regularly to free up space. Expiring entries also limits how long a stale or spoofed mapping can remain in use, and it lets the cache follow changes on the network, since the IP address assigned to a given MAC address can change over time.

Unused entries are removed during the purging process, as are entries left over from failed attempts to reach machines that are not connected to the network or are not even switched on.
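A size-limited cache whose entries expire can be sketched as follows. `ArpCache`, its time-to-live of 60 seconds, and its two-entry limit are illustrative assumptions; real operating systems use their own timeouts and eviction policies. Timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional, Tuple


class ArpCache:
    """IP-to-MAC table with a maximum size and per-entry expiry."""

    def __init__(self, ttl: float, max_entries: int) -> None:
        self.ttl = ttl
        self.max_entries = max_entries
        # ip -> (mac, expiry time); dicts keep insertion order.
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, ip: str, mac: str, now: float) -> None:
        self._entries.pop(ip, None)  # re-insert so this entry is the newest
        if len(self._entries) >= self.max_entries:
            oldest = next(iter(self._entries))  # evict the oldest entry
            del self._entries[oldest]
        self._entries[ip] = (mac, now + self.ttl)

    def get(self, ip: str, now: float) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if now >= expires_at:        # expired: drop it and report a miss
            del self._entries[ip]
            return None
        return mac

    def purge(self, now: float) -> int:
        """Remove every expired entry and return how many were removed."""
        expired = [ip for ip, (_, exp) in self._entries.items() if now >= exp]
        for ip in expired:
            del self._entries[ip]
        return len(expired)


arp_cache = ArpCache(ttl=60.0, max_entries=2)
arp_cache.put("10.0.0.2", "aa:bb:cc:00:00:02", now=0.0)

fresh = arp_cache.get("10.0.0.2", now=30.0)   # within the 60s lifetime
stale = arp_cache.get("10.0.0.2", now=60.0)   # lifetime has elapsed

print(fresh, stale)  # prints: aa:bb:cc:00:00:02 None
```

After a miss like the second lookup, the device would fall back to a fresh ARP request.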

Conclusion

ARP is necessary because the software address (IP address) of the host or computer connected to the network needs to be translated to a hardware address (MAC address). Without ARP, a host would not be able to figure out the hardware address of another host.

This issue highlighted only the role of ARP, one of the networking components that help make distributed systems highly available. There are other components, such as virtual IP (VIP) addresses and the Virtual Router Redundancy Protocol (VRRP), some of which we will discuss in upcoming issues.

Happy learning!

SkylineCodes
