Constructing Reliable Network Management Systems Using CORBA
Tong Luo, Anthony Confrey
GTE Labs, Waltham, USA
Kishor S. Trivedi
Duke University, Durham, USA
Background and Motivation
Network Management provides the central nervous system for
the networks of telecommunications providers. A Telco's Network
Management System (NMS) needs to support uninterrupted management
functionality of complex networks. Even a short NMS outage can
cause customer dissatisfaction and revenue losses, and may even
endanger lives. The telecommunications industry is adopting
CORBA as an underlying architecture in order to expedite the
process of transforming technological capabilities into services
and to shorten development cycles. However, neither the CORBA
specification [7] nor the CORBA services
[8] currently provide direct support
for fault-tolerant objects. Consequently, NMS developers using
CORBA must provide their own fault-tolerance mechanisms for mission-critical
objects.
Recently, several different approaches have been proposed
to build reliable distributed systems with CORBA. A "warm standby"
approach is proposed by Sheu et al. [10].
Because it handles only two replicas of an object, and the
protocol for handling failed objects is not transparent to the
client object, this approach does not scale to a larger number
of replicas. An "integrated" approach, which extends and
modifies the standard Object Request Broker (ORB) with group
communication mechanisms, is adopted in Orbix+Isis [2]
and Electra [6]. This approach keeps
the replication of objects transparent to clients. A drawback
of this approach is that it is ORB-dependent (Orbix+Isis, for
example, is implemented with IONA's Orbix), and it does not comply
with CORBA's philosophy that the architecture should be generic
and simple, with special requirements added as separate services.
Yet another approach is the "service" approach [1],
which provides the group communication mechanism on top of a
standard ORB. This approach keeps the replication of objects
transparent to the client. It is ORB-independent and follows
CORBA's modularity philosophy. Its drawback is that no software
product supports the service at this time. Under budget
constraints and short project time frames, GTE is reluctant to
invest in building its own group communication service, which
would potentially require a long development cycle.
We identify three key issues involved in constructing reliable
CORBA-based software systems and present our solutions to them:
- How to make the fault-tolerance behavior of server objects
transparent to client objects.
- How to keep the replicas of the server objects consistent
with each other.
- How to make the fault-tolerance mechanism scalable to multiple
object replicas.
Figure 1: Architecture of GTE's New NMS
Architecture of GTE's New NMS
The component architecture of GTE's next generation NMS, which
is based on the Telecommunications Management Network (TMN)[4, 5] layered
model, is shown in Figure 1. At the element layer, a set of IEMS
(Integrated Element Management System) objects manage network
elements of different technologies, vendors, and protocols, and convert
their proprietary information models into the generic information
model of the system. At the network layer is a set of NeMoW (Network
Management On the Web) objects, which perform functions such
as service assurance, service provisioning, inventory management,
testing and fault isolation, and ticketing. Each NeMoW object
is responsible for a partitioned subnet, or a set of subnets,
and may talk to several IEMS objects in order to provide services
for the network elements managed by those IEMS objects. A NeMoW object
finds out which IEMS objects it should talk to through the IEMS_Locator
object. The CORBA naming server provides the naming service for
the whole system, and CORBA event channels provide the event
service for upstream alarm dispatching.
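The lookup a NeMoW object performs through the IEMS_Locator can be sketched in a few lines. This is a hypothetical, in-process illustration rather than GTE's interface: the class and method names (`IEMSLocator`, `register`, `locate`, `iems_peers`) and the subnet and IEMS names are invented for the example, and in the real system these would be CORBA objects reached through the naming service.

```python
from dataclasses import dataclass, field


@dataclass
class IEMSLocator:
    """Hypothetical stand-in for the IEMS_Locator object: maps each subnet
    to the names of the IEMS objects that manage its network elements."""
    _registry: dict = field(default_factory=dict)

    def register(self, iems_name: str, subnets: list) -> None:
        # An IEMS object announces which subnets' elements it manages.
        for subnet in subnets:
            self._registry.setdefault(subnet, set()).add(iems_name)

    def locate(self, subnets: list) -> list:
        # Every IEMS a NeMoW must talk to for the given subnets,
        # sorted so the result is deterministic.
        found = set()
        for subnet in subnets:
            found |= self._registry.get(subnet, set())
        return sorted(found)


@dataclass
class NeMoW:
    """Hypothetical network-layer object responsible for a set of subnets."""
    subnets: list
    locator: IEMSLocator

    def iems_peers(self) -> list:
        return self.locator.locate(self.subnets)


locator = IEMSLocator()
locator.register("iems-sonet-vendorA", ["boston-core", "waltham-edge"])
locator.register("iems-atm-vendorB", ["waltham-edge"])
nemow = NeMoW(subnets=["waltham-edge"], locator=locator)
print(nemow.iems_peers())  # ['iems-atm-vendorB', 'iems-sonet-vendorA']
```

Routing every lookup through one locator keeps NeMoW objects independent of how subnets are partitioned among IEMS objects, which is also why the locator itself must be made fault-tolerant.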
If any one of the IEMS, NeMoW, or IEMS_Locator objects fails,
the related management functionality will become unavailable
to internal or external customers and upper-layer objects. If
the naming server object or an event channel object fails, the
system will be unable to resolve and bind object references,
or to deliver alarms for some switches. We focus our discussion
on the fault-tolerance mechanisms for the naming server, event
channels, and critical business objects.
Our Fault-tolerance Approaches
For various business and technical reasons, we have adopted
the VisiBroker ORB from Inprise as our primary ORB for development.
However, the VisiBroker ORB does not support multicast or mirroring
[13], so we cannot use "hot standby"
approaches similar to Orbix+Isis [2]
and Electra [6]. Also, the tight project
delivery date does not allow us to develop our own group communication
service as proposed in [1]. On the
other hand, the VisiBroker ORB provides some facilities that can
be used to implement our fault-tolerant objects.
Our fault-tolerant naming service consists of two naming servers
running on two different hosts. Each naming server has its own
log file, database, Interceptor, and db_sync process. The db_sync
process is reused from TONICS, GTE's current NMS [3, 11, 12],
and keeps the two databases consistent as they are updated.
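A minimal sketch of how a db_sync-style process could keep two naming-server databases consistent, assuming log shipping: the working server appends every bind/unbind to its log, and db_sync replays, in order, the records the backup has not yet applied. The source does not describe db_sync's internals, so the classes (`NamingServer`, `DbSync`), the log format, and the `IOR:placeholder-*` strings are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NamingServer:
    """Hypothetical naming-server replica: a name -> object-reference
    database plus an append-only log of every update applied to it."""
    host: str
    db: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    up: bool = True

    def apply(self, op: str, name: str, ref: Optional[str] = None) -> None:
        if op == "bind":
            self.db[name] = ref
        elif op == "unbind":
            self.db.pop(name, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.log.append((op, name, ref))

    def resolve(self, name: str) -> str:
        if not self.up:
            raise ConnectionError(f"naming server on {self.host} is down")
        return self.db[name]


class DbSync:
    """Hypothetical db_sync: replays, in log order, the source-log records
    the target has not applied yet."""

    def __init__(self, source: NamingServer, target: NamingServer) -> None:
        self.source = source
        self.target = target
        self.cursor = 0  # index of the next source-log record to ship

    def run_once(self) -> int:
        pending = self.source.log[self.cursor:]
        for op, name, ref in pending:
            self.target.apply(op, name, ref)
        self.cursor += len(pending)
        return len(pending)


primary, backup = NamingServer("hostA"), NamingServer("hostB")
sync = DbSync(primary, backup)
primary.apply("bind", "NMS/IEMS_Locator", "IOR:placeholder-1")
primary.apply("bind", "NMS/NeMoW/east", "IOR:placeholder-2")
primary.apply("unbind", "NMS/NeMoW/east")
shipped = sync.run_once()  # replays all 3 records onto the backup
primary.up = False         # the working naming server fails
print(shipped, backup.resolve("NMS/IEMS_Locator"))  # 3 IOR:placeholder-1
```

Replaying the log in order, rather than copying the database wholesale, preserves the effect of an unbind that follows a bind; the backup also logs what it replays, so the roles could be reversed after the failed host returns.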
Our fault-tolerant event channel works by having every
supplier maintain a FIFO event backup queue and stamp an increasing
sequence number on each event it delivers.
The supplier also needs to implement a BindInterceptor class,
which forces the supplier to retransmit the events in the
backup queue when its event channel fails.
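The supplier-side protocol can be sketched as follows, under stated assumptions. `on_rebind` plays the role the text assigns to the BindInterceptor, resending the backup queue once the supplier is bound to a standby channel. The acknowledgment step that trims the queue and the consumer that drops duplicates by sequence number are assumptions added to make the example complete, since the text does not say when events leave the queue or how duplicates are handled. All class and method names are hypothetical.

```python
from collections import deque


class EventChannel:
    """Hypothetical event channel; raises ConnectionError once it has failed."""

    def __init__(self):
        self.delivered = []  # (seq, payload) pairs accepted for consumers
        self.up = True

    def push(self, seq, payload):
        if not self.up:
            raise ConnectionError("event channel down")
        self.delivered.append((seq, payload))


class Supplier:
    def __init__(self, channel):
        self.channel = channel
        self.next_seq = 1
        self.backup = deque()  # FIFO backup queue of (seq, payload)

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.backup.append((seq, payload))  # queue before pushing
        try:
            self.channel.push(seq, payload)
        except ConnectionError:
            pass  # stays in the backup queue until the supplier rebinds

    def acknowledge_up_to(self, seq):
        # Assumption: events leave the queue once known to be delivered.
        while self.backup and self.backup[0][0] <= seq:
            self.backup.popleft()

    def on_rebind(self, new_channel):
        # Stands in for the BindInterceptor callback: rebind, then
        # retransmit the whole backup queue in FIFO order.
        self.channel = new_channel
        for seq, payload in self.backup:
            new_channel.push(seq, payload)


class Consumer:
    """Assumption: consumers discard events whose sequence number they have seen."""

    def __init__(self):
        self.last_seq = 0
        self.alarms = []

    def receive(self, events):
        for seq, payload in events:
            if seq <= self.last_seq:
                continue  # duplicate caused by retransmission
            self.last_seq = seq
            self.alarms.append(payload)


working = EventChannel()
supplier, consumer = Supplier(working), Consumer()
supplier.send("alarm-1")
supplier.send("alarm-2")
consumer.receive(working.delivered)
supplier.acknowledge_up_to(1)   # alarm-2 not yet confirmed, stays queued
working.up = False              # the event channel fails
supplier.send("alarm-3")        # push fails; kept in the backup queue
standby = EventChannel()
supplier.on_rebind(standby)     # resends alarm-2 and alarm-3
consumer.receive(standby.delivered)
print(consumer.alarms)  # ['alarm-1', 'alarm-2', 'alarm-3']
```

Retransmitting the whole queue can deliver an event twice (alarm-2 above); the increasing sequence numbers are what let a consumer detect that. With several suppliers, a consumer would track the last sequence number per supplier rather than a single counter.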
Fault tolerance for our critical business objects, such
as the NeMoW, IEMS, and IEMS_Locator objects, is designed and implemented
in a more consistent architecture using an in-house implementation
of the CORBA Persistent State Service (PSS) [9].
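One way to picture PSS-based warm standby is a write-through sketch: the working object commits its state to persistent storage before replying, and a standby activated after a failure restores the last committed state. `StateStore` is a hypothetical stand-in for the in-house PSS (an in-memory dict of JSON snapshots, which in a real deployment would itself need to be durable and replicated), and `NeMoWServant` with its trouble-ticket state is invented for illustration.

```python
import json


class StateStore:
    """Hypothetical stand-in for the in-house PSS: keeps one JSON snapshot
    per object id. A real store would be durable and itself replicated."""

    def __init__(self):
        self._records = {}

    def save(self, object_id, state):
        self._records[object_id] = json.dumps(state, sort_keys=True)

    def load(self, object_id):
        raw = self._records.get(object_id)
        return {} if raw is None else json.loads(raw)


class NeMoWServant:
    """Hypothetical business object whose state is written through to the
    store on every update, so a warm standby can take over from it."""

    def __init__(self, object_id, store):
        self.object_id = object_id
        self.store = store
        # On activation (first start or takeover), restore the last committed state.
        self.tickets = store.load(object_id)

    def open_ticket(self, ticket_id, element):
        self.tickets[ticket_id] = element
        self.store.save(self.object_id, self.tickets)  # commit before replying

    def close_ticket(self, ticket_id):
        self.tickets.pop(ticket_id, None)
        self.store.save(self.object_id, self.tickets)


store = StateStore()
working = NeMoWServant("NeMoW/east", store)
working.open_ticket("T-100", "switch-12")
working.open_ticket("T-101", "switch-7")
working.close_ticket("T-100")
# The working object's host fails; a standby is activated against the same store.
standby = NeMoWServant("NeMoW/east", store)
print(standby.tickets)  # {'T-101': 'switch-7'}
```

Because every mutation is committed before the call returns, a client never observes a state that the standby would not also see after takeover.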
Generally speaking, our fault-tolerance approaches for the naming
server, event channels, and in-house critical business objects
belong to the "warm standby" category. Consistency between
the working and protection objects is maintained in all these
approaches. The fault-tolerance behavior is completely transparent
to the clients of the naming server and of the critical business
objects. For event channels, the fault-tolerance protocol is
not completely transparent to the client: the client (i.e., the
event supplier) needs to implement a BindInterceptor class and
the callback functions to be invoked by it, though the programming
workload is minimal compared with the approach of Sheu et al.
[10]. All of our approaches can be
easily scaled to multiple backup objects (i.e., "N:1
warm standby") by enhancing the db_sync process to synchronize
multiple databases. No modification to the VisiBroker ORB is
needed by our approaches, although they depend on some VisiBroker-specific
features, such as the osagent and the Interceptor classes.
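Once db_sync keeps N databases identical, hiding failures from the client across N replicas reduces to rebinding to the next reachable replica. The sketch below assumes the copies are already synchronized and simulates the rebinding with an ordered replica list; in the described system this role is played by VisiBroker's osagent and Interceptors, which are not modeled here. `Replica` and `FailoverProxy` are hypothetical names.

```python
class Replica:
    """Hypothetical replica holding a copy of the same name database."""

    def __init__(self, name, db):
        self.name = name
        self.db = db
        self.up = True

    def resolve(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} unreachable")
        return self.db[key]


class FailoverProxy:
    """The client only calls proxy.resolve(); which replica answers is hidden."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.replicas = list(replicas)  # working object first, then backups
        self.current = 0                # replica the client is bound to

    def resolve(self, key):
        for _ in range(len(self.replicas)):
            try:
                return self.replicas[self.current].resolve(key)
            except ConnectionError:
                # Rebind to the next replica, wrapping around the list.
                self.current = (self.current + 1) % len(self.replicas)
        raise ConnectionError("all replicas unreachable")


# Assumption: db_sync has already made all N copies identical.
snapshot = {"NMS/IEMS_Locator": "IOR:placeholder-1"}
replicas = [Replica(f"ns{i}", dict(snapshot)) for i in range(3)]
proxy = FailoverProxy(replicas)
replicas[0].up = False   # working object fails
replicas[1].up = False   # first backup fails too
print(proxy.resolve("NMS/IEMS_Locator"), proxy.current)  # IOR:placeholder-1 2
```

Keeping `current` after a failover means later calls go straight to the replica that last answered, instead of probing the failed working object on every request.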
References
[1] P. Felber, B. Garbinato, and R. Guerraoui. "The design of a CORBA group communication service". Proceedings of the 15th Symposium on Reliable Distributed Systems, pages 150-159, 1996.
[2] IONA and Isis. An Introduction to Orbix+Isis. IONA Technologies Ltd. and Isis Distributed Systems, Inc., 1994.
[3] S. Kheradpir, W. Stinson, J. Vucetic, and A. Gersht. "Real-time management of telephone operating company networks: issues and approaches". IEEE Journal on Selected Areas in Communications, pages 1385-1403, 1993.
[4] ITU-T Recommendation M.3000. Overview of TMN Standards. ITU-T, March 1993.
[5] ITU-T Recommendation M.3010. Principles for a Telecommunications Management Network. ITU-T, 1992.
[6] S. Maffeis. "Run-time support for object-oriented distributed programming". Ph.D. thesis, University of Zurich (Switzerland), February 1995.
[7] OMG. The Common Object Request Broker (CORBA): Architecture and Specification, v2.0. Object Management Group Inc., 1995.
[8] OMG. Common Object Services Specification. Object Management Group Inc., 1995.
[9] OMG. Persistent State Service RFP. OMG, 1995/1997-06-07.
[10] G. W. Sheu, Y. S. Chang, D. Liang, S. M. Yuan, and W. Lo. "A fault-tolerant object service on CORBA". Proceedings of the 17th International Conference on Distributed Computing Systems, pages 393-400, 1997.
[11] W. Stinson and S. Kheradpir. "A state-based approach to real-time telecommunications network management". Proceedings of NOMS'92, Memphis, TN, 1992.
[12] W. Stinson, S. Kheradpir, and F. Ebrahimi. "Design and deployment of an integrated network management system for a large telco network". Proceedings of NOMS'94, Orlando, FL, 1994.
[13] Visigenic. VisiBroker for Java: Programmer's Guide, v3.2. Visigenic, 1998.
Author contact: GTE Labs, 40 Sylvan Rd., Waltham, MA 02454, USA
Phone: (office) 781-466-4293; (fax) 781-466-2941; e-mail: tluo@gte.com