Fast Abstracts Archives

Constructing Reliable Network Management Systems Using CORBA

Tong Luo 1       Anthony Confrey 2
GTE Labs, Waltham, USA
 
Kishor S. Trivedi 3 
Duke University, Durham, USA

Background and Motivation

Network management provides the central nervous system for the networks of telecommunications providers. A telco's Network Management System (NMS) must support uninterrupted management of complex networks. Even a short downtime of the NMS can cause customer dissatisfaction and revenue losses, and may even jeopardize life. The telecommunications industry is adopting CORBA as an underlying architecture in order to expedite the process of transforming technological capabilities into services and to shorten development cycles. However, neither the CORBA specification [7] nor the CORBA services [8] currently provide direct support for fault-tolerant objects. Consequently, NMS developers using CORBA must provide their own fault-tolerance mechanisms for mission-critical objects.

Recently, several different approaches have been proposed to build reliable distributed systems with CORBA. A "warm standby" idea is proposed by Sheu et al. [10]. Since it handles only two replications of an object, and the protocol for handling failed objects is not transparent to the client object, this approach does not scale to a larger number of replications. An "integrated" approach, which extends and modifies the standard Object Request Broker (ORB) with group communication mechanisms, is adopted in Orbix+Isis [2] and Electra [6]. This approach keeps the replication of objects transparent to clients. Its drawback is that it is ORB dependent (implemented with IONA's Orbix), and it does not comply with CORBA's philosophy that the architecture should be generic and simple, with special requirements added as separate services. Yet another approach is the "service" approach [1], which provides the group communication mechanism on top of a standard ORB. This approach also keeps the replication of objects transparent to the client. It is ORB independent and follows CORBA's modularity philosophy. Its drawback is that no software product supports the service at this time. Under the pressure of budget constraints and short project time frames, GTE is reluctant to invest in building its own group communication service, which has a potentially long development cycle.

We identify three key issues involved in constructing reliable CORBA-based software systems and present our solutions to them:

  • How to make the fault-tolerance behavior of server objects transparent to client objects.
  • How to keep the replications of the server objects consistent with each other.
  • How to make the fault-tolerance mechanism scalable to multiple object replications.

  Figure 1:  Architecture of GTE's New NMS

Architecture of GTE's New NMS

The component architecture of GTE's next generation NMS, which is based on the Telecommunications Management Network (TMN) [4, 5] layered model, is shown in Figure 1. At the element layer, a set of IEMS (Integrated Element Management System) objects manage network elements of different technologies, vendors, and protocols, and convert their proprietary information models into the generic information model of the system. At the network layer is a set of NeMoW (Network Management On the Web) objects, which perform functions such as service assurance, service provisioning, inventory management, testing and fault isolation, and ticketing. Each NeMoW object is responsible for a partitioned subnet, or a set of subnets, and may talk to several IEMS objects in order to provide services for the network elements managed by those IEMS objects. A NeMoW object finds out which IEMS objects it should talk to through the IEMS_Locator object. The CORBA naming server provides the naming service for the whole system, and CORBA event channels provide the event service for upstream alarm dispatching.

If any one of the IEMS, NeMoW, or IEMS_Locator objects fails, the related management functionality will become unavailable to internal or external customers and to upper layer objects. If the naming server object or an event channel object fails, the system will be unable to resolve and bind object references or to deliver alarms for some switches. We focus our discussion on the fault-tolerance mechanisms for the naming server, event channels, and critical business objects.

Our Fault-tolerance Approaches

For various business and technical reasons, we have adopted the VisiBroker ORB from Inprise as our primary ORB for development. However, the VisiBroker ORB does not support multicast or mirroring [13], so we cannot use "hot standby" approaches similar to Orbix+Isis [2] and Electra [6]. Also, the tight project delivery date does not allow us to develop our own group communication service as proposed in [1]. On the other hand, the VisiBroker ORB provides some facilities that can be used to implement our fault-tolerant objects.

Our fault-tolerant naming service consists of two naming servers running on two different hosts. Each naming server has its own logfile, database, Interceptor, and db_sync process. The db_sync process is reused from TONICS, GTE's current NMS [3, 11, 12], and keeps the two databases consistently updated.
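The naming-service arrangement can be illustrated with a minimal sketch. It is not the actual implementation: the class names NamingServer and FailoverResolver, the in-memory log and database, and the one-directional db_sync function are all assumptions made for illustration, and the real Interceptor and db_sync process are VisiBroker and TONICS components whose interfaces are not described here.

```python
class NamingServer:
    """One naming server with its own log and database (both in memory here)."""

    def __init__(self, host):
        self.host = host
        self.alive = True
        self.database = {}  # name -> object reference (a plain string here)
        self.logfile = []   # ordered (sequence, name, reference) update records

    def bind(self, name, reference):
        if not self.alive:
            raise ConnectionError(f"naming server on {self.host} is down")
        self.database[name] = reference
        self.logfile.append((len(self.logfile) + 1, name, reference))

    def resolve(self, name):
        if not self.alive:
            raise ConnectionError(f"naming server on {self.host} is down")
        return self.database[name]


def db_sync(source, target, applied):
    """Replay the source's log records that the target has not applied yet.

    `applied` is the number of records already replayed; the new count is
    returned so the caller can pass it to the next round.
    """
    for _, name, reference in source.logfile[applied:]:
        target.database[name] = reference
    return len(source.logfile)


class FailoverResolver:
    """Hypothetical stand-in for the Interceptor: retries on the standby."""

    def __init__(self, primary, standby):
        self.servers = [primary, standby]

    def resolve(self, name):
        for server in self.servers:
            try:
                return server.resolve(name)
            except ConnectionError:
                continue  # try the next naming server
        raise ConnectionError("no naming server available")


primary = NamingServer("host-a")
standby = NamingServer("host-b")
primary.bind("NeMoW/subnet-1", "IOR:nemow-1")
applied = db_sync(primary, standby, 0)  # standby database now matches

primary.alive = False                   # simulate a failure of the primary
resolver = FailoverResolver(primary, standby)
result = resolver.resolve("NeMoW/subnet-1")
```

In this sketch the client only ever calls the resolver, so the switch to the standby server is invisible to it, which is the transparency property the design aims for.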

Our fault-tolerant event channel is achieved by having every supplier maintain a FIFO event backup queue and stamp an increasing sequence number on each event it delivers. The supplier also needs to implement its own BindInterceptor class, which forces the supplier to retransmit the events in the backup queue when its event channel fails.
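The supplier-side mechanism can be sketched as follows. This is a simplified model under stated assumptions, not the real code: Supplier, EventChannel, Consumer, the on_rebind callback, and the bounded queue size are hypothetical names and choices, and the consumer-side use of sequence numbers to drop retransmitted duplicates is an assumption about how the numbers are used. The sketch also assumes a single supplier per consumer; with several suppliers the consumer would need to track the last sequence number for each one.

```python
from collections import deque


class Consumer:
    """Receives events and drops duplicates using the sequence number."""

    def __init__(self):
        self.last_sequence = 0
        self.received = []

    def receive(self, sequence, payload):
        if sequence <= self.last_sequence:
            return  # duplicate caused by a retransmission
        self.last_sequence = sequence
        self.received.append(payload)


class EventChannel:
    """Forwards events to one consumer; raises when the channel is down."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.alive = True

    def push(self, sequence, payload):
        if not self.alive:
            raise ConnectionError("event channel is down")
        self.consumer.receive(sequence, payload)


class Supplier:
    """Stamps events with increasing sequence numbers and keeps a FIFO backup."""

    def __init__(self, channel, backup_size=100):
        self.channel = channel
        self.next_sequence = 1
        self.backup = deque(maxlen=backup_size)  # oldest events fall off first

    def send(self, payload):
        event = (self.next_sequence, payload)
        self.next_sequence += 1
        self.backup.append(event)
        try:
            self.channel.push(*event)
        except ConnectionError:
            pass  # the event stays in the backup queue until a rebind

    def on_rebind(self, new_channel):
        """Role played by the BindInterceptor callback: resend the backup queue."""
        self.channel = new_channel
        for sequence, payload in self.backup:
            new_channel.push(sequence, payload)


consumer = Consumer()
channel = EventChannel(consumer)
supplier = Supplier(channel)

supplier.send("alarm-1")                    # delivered normally
channel.alive = False                       # the event channel fails
supplier.send("alarm-2")                    # lost in transit, kept in the backup
supplier.on_rebind(EventChannel(consumer))  # replacement channel, queue resent
```

After the rebind, the consumer has seen each alarm exactly once and in order, even though "alarm-1" was sent twice.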

Fault tolerance for our critical business objects, such as the NeMoW, IEMS, and IEMS_Locator objects, is designed and implemented in a more consistent architecture using a CORBA Persistent State Service (PSS) [9] built in-house.

Generally speaking, our fault-tolerance approaches for the naming server, event channels, and in-house critical business objects belong to the "warm standby" category. Consistency between the working and protection objects is maintained in all of these approaches. For the naming server and the critical business objects, the fault-tolerance behavior is completely transparent to clients. For event channels, the fault-tolerance protocol is not completely transparent: the client needs to implement its own BindInterceptor class and the call-back functions invoked by the BindInterceptor, though the programming workload is minimal compared with the approach by Sheu et al. [10]. All of our approaches can be easily scaled to multiple backup objects (i.e., "N : 1 warm standby") by enhancing the db_sync process to synchronize multiple databases. Our approaches require no modification of the VisiBroker ORB, although they depend on some VisiBroker-specific features, such as the osagent and the Interceptor classes.
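A minimal sketch of the "N : 1 warm standby" idea follows, assuming that state is copied to every live protection object after each update and that the first protection object in the list is promoted on failure. The names Replica and WarmStandbyGroup and the promotion order are hypothetical; the real system relies on db_sync and VisiBroker facilities that are not modeled here.

```python
class Replica:
    """One copy of a server object; `state` stands in for its database."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.state = {}


class WarmStandbyGroup:
    """One working object plus N protection objects kept consistent with it."""

    def __init__(self, working, protections):
        self.working = working
        self.protections = list(protections)

    def update(self, key, value):
        self._ensure_working()
        self.working.state[key] = value
        self.sync()

    def read(self, key):
        self._ensure_working()
        return self.working.state[key]

    def sync(self):
        """Copy the working state to every live protection object."""
        for replica in self.protections:
            if replica.alive:
                replica.state = dict(self.working.state)

    def _ensure_working(self):
        """Promote the next protection object while the working one is down."""
        while not self.working.alive:
            if not self.protections:
                raise RuntimeError("no live replica left")
            self.working = self.protections.pop(0)


group = WarmStandbyGroup(
    Replica("working"),
    [Replica("protect-1"), Replica("protect-2")],
)
group.update("subnet-1", "IEMS-A")  # state is replicated to both backups

group.working.alive = False         # simulate a failure of the working object
value = group.read("subnet-1")      # served by the promoted protection object
```

Callers use only update and read on the group, so adding more protection objects changes the list passed to the constructor and nothing on the client side.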

References

[1] P. Felber, B. Garbinato, and R. Guerraoui. "The design of a CORBA group communication service". Proceedings of the 15th Symposium on Reliable Distributed Systems, pages 150--159, 1996.

[2] IONA and Isis. An introduction to Orbix+Isis. IONA Technologies Ltd. and Isis Distributed Systems, Inc., 1994.

[3] S. Kheradpir, W. Stinson, J. Vucetic, and A. Gersht. "Real-time management of telephone operating company networks: issues and approaches". IEEE Journal on Selected Areas in Communications, pages 1385--1403, 1993.

[4] ITU-T Recommendation M.3000. Overview of TMN Standards. ITU-T, March 1993.

[5] ITU-T Recommendation M.3010. Principles for a Telecommunications Management Network. ITU-T, 1992.

[6] S. Maffeis. "Run-time support for object oriented distributed programming". Ph.D. thesis, University of Zurich (Switzerland), February 1995.

[7] OMG. The Common Object Request Broker (CORBA): Architecture and Specification, v 2.0. Object Management Group Inc., 1995.

[8] OMG. Common Object Services Specification. Object Management Group Inc., 1995.

[9] OMG. Persistent State Service RFP. OMG, 1995/1997-06-07.

[10] G. W. Sheu, Y. S. Chang, D. Liang, S. M. Yuan, and W. Lo. "A fault-tolerant object service on CORBA". Proceedings of the 17th International Conference on Distributed Computing, pages 393--400, 1997.

[11] W. Stinson and S. Kheradpir. "A state-based approach to real-time telecommunications network management". Proceedings of NOMS'92, Memphis, TN, 1992.

[12] W. Stinson, S. Kheradpir, and F. Ebrahimi. "Design and deployment of an integrated network management system for a large telco network". Proceedings of NOMS'94, Orlando, FL, 1994.

[13] Visigenic. VisiBroker for Java: programmer's guide, v3.2. Visigenic, 1998.


1.  Author contact: GTE Labs, 40 Sylvan Rd., Waltham MA 02454, USA.
Phone: 781-466-4293. Fax: 781-466-2941. E-Mail: tluo@gte.com

2.  Author contact: GTE Labs, 40 Sylvan Rd, Waltham MA 02454, USA.
Phone: 781-466-2889. Fax: 781-466-2941. E-Mail: aconfrey@gte.com

3.  Author contact: Dept. of Electrical Engineering,
Duke University, Durham NC 27708, USA.
Phone: 919-660-5269. Fax: 919-660-5293. E-Mail: kst@ee.duke.edu