Using The ATMS for Telecommunication Network Fault Management

Nigel Wells      Weiru Liu      Ken Adamson
School of Information & Software Engineering
University of Ulster, Shore Road, Jordanstown,
BT36 8QB, Northern Ireland.
Telephone (01232) 368209.
E-mail: nt.wells, w.liu, k.adamson {@ulst.ac.uk}

The management of faults in a small homogeneous network is fairly straightforward. However, as networks grow in size and complexity, fault management becomes more difficult and time-consuming. Many different approaches to telecommunications network fault management currently exist. They include, among many others, expert systems [Todd, 1988], evidential theory [Dawes, Altoft, Pagurek, 1995], neural networks [Koivo, 1994] and model-based approaches [Frank, 1992]. Each of these methods has its own merits in resolving network faults, but all suffer from weaknesses in implementation which can limit their effectiveness in telecommunications network fault management.

In this abstract, we present the intelligent fault management system for telecommunications networks that we are developing. The system is based upon the Assumption-based Truth Maintenance System (ATMS) [de Kleer, 1986], but the ATMS structures have been modified to support the requirements of telecommunications network fault management [Liu, 1998]. The design has been made as generic as possible so that the system can be deployed in as wide a range of telecommunication networks as possible with minimal change to its structure.
The general structure of the network fault management system may be visualised by referring to Figure 1.

 
 
The main features of the system are:

  • Simple Network Configuration Database. The network configuration database, which maintains details of the network devices and the network connection topology, has a minimalist structure. By means of the adapted ATMS structures, each network node is configured with knowledge of only the immediate neighbour nodes to which it is directly connected, unlike many ANMs, in which each node has detailed knowledge of the whole network structure. The use of a minimalist configuration database permits rapid diagnosis of network faults and easy maintenance of the network configuration.
  • High level control program (HLCP). This program has overall control of the fault management system and communicates directly with monitor nodes, which are strategically positioned around the telecommunications network. Each monitor node oversees a section of the network and is responsible for monitoring the status and health of all network devices within its section. The HLCP polls the monitor nodes periodically and receives in return status messages which together indicate the status of all devices in the network. When the HLCP receives a status message indicating a fault within the network, it invokes the fault diagnostic algorithm to act upon the fault. Diagnostic information received from the fault diagnostic algorithm is then relayed to the network operator for further action. When a number of fault messages are received, the HLCP prioritises the fault reports in order of importance or urgency and acts on the higher priority faults first.
  • Incorporation of Uncertainty Management. The failure of a device can rarely be determined with total accuracy, especially in large complex systems. In telecommunication networks, this can be due to a number of factors, such as network congestion, which may prevent a device from reporting its status, or unreliability of the network monitor itself, which may result in incorrect status messages being sent to the HLCP. The incorporation of uncertainty management into the fault management system should permit the fault diagnostic system to evaluate fault reports with respect to the known reliability measures which exist for each device and connecting link in the network. Thus it should be possible to generate more intuitive fault diagnostics which deal more accurately with imprecise fault reports.
  • Graphical Interface. A graphical user interface is being developed through which the network operator will interact with the fault management system in all aspects of network maintenance and fault management. The network operator will be able to set system parameters which determine how the management system responds to network faults, dynamically re-configure the network to add or remove network nodes, and request clarification of diagnosis decisions made by the fault diagnostics engine.
  • Parallel Implementation. The early prototype version was developed as a sequential implementation, and this provided an ideal test-bed for the concepts and ideas proposed for the system [Wells, Liu, Adamson, 1998]. This sequential system is now being re-engineered to produce a parallel version of the fault management system, implemented on a network of INMOS transputers. The parallel system will handle multiple network faults concurrently, the only limitation being the number of processors in the transputer network. It is anticipated that this will lead to a significant speedup in the diagnosis of multiple faults in a network.
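
The neighbour-only configuration database and the prioritisation of fault reports described above can be illustrated with a minimal sketch. It is not the adapted ATMS structure used in the system; the names `add_node`, `remove_node`, `FaultReport` and `FaultQueue`, the example topology, and the convention that a lower number means a more urgent fault are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical neighbour-only configuration: each node records only the
# nodes it is directly connected to, never the whole topology.
EXAMPLE_CONFIG = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B"},
}

def add_node(config, node, neighbours):
    """Add a node; only the entries of its direct neighbours are touched."""
    config[node] = set(neighbours)
    for n in neighbours:
        config[n].add(node)

def remove_node(config, node):
    """Remove a node; again only its direct neighbours are updated."""
    for n in config.pop(node):
        config[n].discard(node)

@dataclass(order=True)
class FaultReport:
    priority: int                      # assumed: lower value = more urgent
    device: str = field(compare=False)

class FaultQueue:
    """Holds pending fault reports and releases the most urgent first."""
    def __init__(self):
        self._heap = []

    def submit(self, report):
        heapq.heappush(self._heap, report)

    def next_fault(self):
        return heapq.heappop(self._heap)
```

Because each entry refers only to direct neighbours, adding or removing a node changes a number of entries proportional to that node's degree rather than to the size of the network, which is one way to read the "easy maintenance" claim.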

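The abstract does not state which uncertainty formalism is used to weigh fault reports against reliability measures. As one possible illustration only, the sketch below applies Bayes' rule, which is an assumed stand-in and not necessarily the method adopted in the system. The function name `posterior_failure` and its three parameters (a prior failure probability for the device, the rate at which the monitor reports a real failure, and the rate at which it raises a false alarm) are hypothetical.

```python
def posterior_failure(prior_failure, detect_rate, false_alarm_rate):
    """Return P(device failed | fault reported) using Bayes' rule.

    prior_failure    -- assumed prior probability that the device has failed
    detect_rate      -- assumed P(fault reported | device failed)
    false_alarm_rate -- assumed P(fault reported | device healthy)
    """
    evidence = (detect_rate * prior_failure
                + false_alarm_rate * (1.0 - prior_failure))
    if evidence == 0.0:
        raise ValueError("a fault report is impossible under these parameters")
    return detect_rate * prior_failure / evidence
```

With a rarely failing device and a monitor that occasionally raises false alarms, the posterior can remain well below certainty, which is the kind of imprecise report the uncertainty management component is intended to handle.
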
References

[Dawes, Altoft, Pagurek, 1995]  Dawes, N., Altoft, J., Pagurek, B., Network Diagnosis by Reasoning in Nested Evidence Spaces, IEEE Transactions on Communications, Vol. 32, No. 2/3/4, February, March, April, 1995, pp. 466-76.

[de Kleer, 1986]  de Kleer, J., An Assumption-based TMS, Artificial Intelligence, Vol. 28, 1986, pp. 127-61.

[Frank, 1992]  Frank, P.M., Principles of Model-Based Fault Detection, Proceedings of IFAC Artificial Intelligence in Real-Time Control, 1992, pp. 213-220.

[Koivo, 1994]  Koivo, H. N., Artificial Neural Networks in Fault Diagnosis and Control, Control Engineering Practice, Vol. 2, No. 1, 1994, pp. 89-101.

[Liu, 1998]  Liu, W., A Domain Independent Data Structure For Telecommunications Using Adapted ATMS, to appear at IPMU-98, July 1998, Paris.

[Todd, 1988]  Todd, E. Marques  A Symptom-Driven Expert System for Isolating and Correcting Network Faults, IEEE Communications Magazine, March 1988, Vol. 26, No. 3

[Wells, Liu, Adamson, 1998]  Wells, N.T., Liu, W., Adamson, K., Using the ATMS for Fault Management in Telecommunications Networks, to appear at IPMU-98, July 1998, Paris.