A Redundant Virtual Service Redirector for Computer Networks

Yinong Chen (1)
Department of Computer Science
University of the Witwatersrand, Johannesburg

The reliability and availability of network servers, including internet data caching servers, web servers, DNS servers and firewalls, are becoming a concern for service providers and their clients. Many service providers address this problem by adding redundant servers to their systems as backup spares. Current implementations of redundant servers are:

  • User Configuration. The redundant servers, each with a different IP address, are all connected to the network. Users have to enter the IP addresses of the primary and backup servers in their client software configuration. When the primary server is not available, the client software falls back to a backup server's IP address.
    The drawback of this implementation is that the service provider has to rely on users to set up their configuration correctly.
  • Manual Switch. The primary and backup servers have the same IP address. When the primary server is down, a backup spare is manually switched on to replace it.
  • Hard Service Redirector. The IP address of the server is in fact that of the redirector, which redirects each client request to one of the available servers, as shown in fig.1.

Currently, service redirectors are implemented in hardware and cost about 10 times as much as a typical server, which is not acceptable to many service providers. Another drawback is that a single point of failure remains: the failure of the redirector makes the entire service unavailable.

The aim of this research is to explore, based on our previous research results [Chen 98], an alternative way to implement service redirection that overcomes the problems of current techniques. The objectives of the research are to achieve low cost, high availability and high reliability through

  • automatic fault detection and system reconfiguration among the redundant servers,
  • elimination of the single point of failure in the system, or a reduction in its probability,
  • flexible configuration to cope with different kinds of applications and different levels of dependability requirements.

The proposed approach achieves these aims and objectives by means of software-implemented fault-tolerant protocols in distributed systems [Chen 93]. As shown in fig.2, all redundant servers are given the same IP address, so every request is received by all servers. However, each request is handled by only a subset of the available servers, in the form of task replication. A hash algorithm in each server decides which servers will handle an incoming request. A comparison protocol running among the servers checks whether the outputs of the replicated servers agree. The agreements and disagreements are logged in a syndrome table for fault diagnosis. The syndrome table, together with the results of a heartbeat protocol, can be used by a reconfiguration protocol to localise faults. Servers localised as faulty can then be excluded from the collaborative operation.
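The text does not specify the hash algorithm, so the following is only a minimal sketch of how each server could decide locally, without exchanging messages, whether it belongs to the subset that handles a request. It uses rendezvous (highest-random-weight) hashing as a stand-in technique; the function names, the request key format and the server names are assumptions made for illustration.

```python
import hashlib


def replica_set(request_key: bytes, live_servers: list[str], k: int) -> list[str]:
    """Return the k servers that should handle the request.

    Every server evaluates this function on the same inputs, so all
    servers arrive at the same subset independently.
    """
    if not 1 <= k <= len(live_servers):
        raise ValueError("k must be between 1 and the number of live servers")

    def score(server: str) -> bytes:
        # Hash the (request, server) pair; the ranking differs per request,
        # which spreads the load over the live servers.
        return hashlib.sha256(request_key + b"|" + server.encode()).digest()

    # The k servers with the lowest scores form the replica set.
    return sorted(live_servers, key=score)[:k]


def should_handle(me: str, request_key: bytes, live_servers: list[str], k: int) -> bool:
    """Local decision taken by server `me` when a request arrives."""
    return me in replica_set(request_key, live_servers, k)
```

When a faulty server is excluded, the remaining servers only need to agree on the new list of live servers; the same function then redistributes the requests among them.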

The set of protocols that distributes requests to a subset of fault-free servers and handles faults has the same function as a physical redirector and is called a virtual redirector. The advantages of the virtual redirector are:

  • It has a generic architecture that can be used for various kinds of service redirection, e.g., for network data caching, web servers, DNS servers and firewalls.
  • It has a flexible configuration that can manage two or more servers.
  • Automatic fault detection and system reconfiguration are implemented by a comparison protocol, a heartbeat protocol and a reconfiguration protocol, which reduce the delay of fault detection and fault removal.
  • There is no single point of failure in a system with more than two servers.
  • The entire redundant system is not significantly more expensive than the servers themselves.
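The interplay of the comparison protocol, the heartbeat protocol and the syndrome table can be sketched as follows. This is an illustrative sketch, not the implemented protocols: the class name, the threshold and the diagnosis rule (a server is suspected when it repeatedly disagrees with at least two different peers, or repeatedly misses heartbeats) are assumptions.

```python
from collections import defaultdict
from itertools import combinations


class SyndromeTable:
    """Logs pairwise comparison results and heartbeat misses (hypothetical design)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.disagreements = defaultdict(int)      # frozenset({a, b}) -> count
        self.missed_heartbeats = defaultdict(int)  # server -> count (cumulative)

    def record_outputs(self, outputs):
        """outputs maps each replica that handled one request to its output."""
        for a, b in combinations(sorted(outputs), 2):
            if outputs[a] != outputs[b]:
                self.disagreements[frozenset((a, b))] += 1

    def record_heartbeat_miss(self, server):
        self.missed_heartbeats[server] += 1

    def suspects(self, threshold=3):
        """Return the servers that a reconfiguration step could exclude."""
        faulty = set()
        for s in self.servers:
            # A single disagreeing pair cannot tell which side is wrong, so
            # require disagreement with at least two distinct peers.
            noisy_peers = [
                p for p in self.servers
                if p != s and self.disagreements[frozenset((s, p))] >= threshold
            ]
            if self.missed_heartbeats[s] >= threshold or len(noisy_peers) >= 2:
                faulty.add(s)
        return faulty
```

Under this rule, comparison results alone can localise a fault only when at least three servers take part; with two servers, the sketch relies on the heartbeat counts.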

Currently, a fault-tolerant firewall for internet-intranet communication has been chosen for experimentation [Chen 98]. The system in fig.2 is implemented on three SGI stations connected by an Ethernet, as shown in fig.3. Two extra stations are used: one simulates an internet from which requests are sent to the firewalls, and the other simulates an intranet that allows only controlled access from the internet.

Fig.4 gives more details of the software structure of the entire system. The fault-tolerant firewall is implemented on a rather generic hierarchy of layers. The firewall programs are at the application layer. Below them are the fault-tolerant protocols, which handle various fault conditions to ensure that the application layer above works correctly. The fault injector is used only in the testing stage, so that the system can be tested under all fault conditions that it should tolerate [Chen 93]. Since an Ethernet does not guarantee real-time communication, a token bus has been implemented in the system [Mdakane 97].

A service redirector does not need hard real-time properties. However, soft real-time behaviour, which allows occasional missed deadlines but keeps real-time responses in most cases, is desirable. Furthermore, the synchronisation among replicated tasks requires all results to be available within a time constraint, so that a slow response can be distinguished from one that never arrives. These requirements motivated us to put a real-time scheduling layer and a real-time communication layer in the system to ensure that the deadlines of real-time tasks are met. The other three layers, i.e., the UNIX operating system, the Ethernet communication and the hardware system, are off-the-shelf components.
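The role of the time constraint in replica synchronisation can be illustrated with a small sketch. It assumes, purely for illustration, that replica results arrive as (replica, output) pairs on an in-process queue; the function name and the deadline value are hypothetical. A replica whose result has not arrived when the deadline expires is reported as missing, which is the information a comparison or heartbeat step would need.

```python
import queue
import time


def collect_results(inbox, expected, deadline_s):
    """Gather one result per expected replica until the deadline expires.

    Returns (results, missing): results maps replica -> output for replies
    that arrived in time; missing is the set of replicas without a reply.
    """
    results = {}
    end = time.monotonic() + deadline_s
    while len(results) < len(expected):
        remaining = end - time.monotonic()
        if remaining <= 0:
            break  # deadline reached; stop waiting for slow replicas
        try:
            replica, output = inbox.get(timeout=remaining)
        except queue.Empty:
            break  # nothing more arrived before the deadline
        if replica in expected:
            results[replica] = output
    return results, set(expected) - set(results)
```

A usage example under the same assumptions:

```python
inbox = queue.Queue()
inbox.put(("fw-a", "ALLOW"))
results, missing = collect_results(inbox, {"fw-a", "fw-b"}, deadline_s=0.05)
# fw-a replied in time; fw-b is reported as missing after 50 ms.
```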

The research project investigating dependable real-time distributed systems has been sponsored by the South African Foundation for Research Development (FRD) since 1996. The experimental system was first implemented in 1995, improved in 1996 and tested in 1997 with the help of Honours and Masters students in the Department of Computer Science at the University of the Witwatersrand. The system is used as the laboratory environment for the course Reliability of Computer Systems.

The application of the techniques developed in the project to dependable virtual network service redirectors is motivated by our industry partner, which requires such systems not only for firewalls but also for data caching, web servers and DNS servers. Further research and implementation work is under way to meet these requirements.

References

[Chen 98] Y. Chen, On Development of a Dependable Local Area Network, IFIP International Workshop on Dependable Computing and its Applications, Johannesburg, January 1998, pp. 83-96.

[Chen 93] Y. Chen, Testing and Evaluating Fault-Tolerant Protocols by Deterministic Fault Injection, VDI Series 10, No. 260, VDI Verlag, Duesseldorf, 1993.

[Mdakane 97] S. Mdakane, Token bus protocol on ethernet, Honours Research Report, Department of Computer Science, University of the Witwatersrand, 1997.


(1) Author contact: Department of Computer Science, University of the Witwatersrand, Johannesburg, 2050 Wits, SOUTH AFRICA. Email: yinong@cs.wits.ac.za, Tel.: +27-11-716 3304, Fax: +27-11-339 3513