Simulation Study for a SONET Restoration Protocol
- Gheorghe Spiride and V. S. S. Nair
1
- Department of Computer Science and Engineering
- Southern Methodist University, Dallas, TX 75275, USA
-
In this abstract we describe
a simulation framework for modeling dynamic distributed restoration protocols
for SONET mesh networks ([1], [2]).
Along with the simulation model, we describe a new proactive distributed
restoration protocol that we propose for restoring single and multiple independent
link failures in such networks. Link failure restoration has been studied
in great detail over the past decade, due to the ever increasing need for
reliable transport protocols. Our work makes several contributions: we propose
a new distributed restoration protocol for SONET meshes, we develop a modeling
framework that is modular and can accommodate multiple restoration protocols,
and we use this tool to study and validate the performance of our proposed
protocol.
A self healing SONET network is an optical fiber
network in which digital cross-connect switches are used as transmission
hubs. The DCS performs switching, fault detection, and recovery functions.
The switches in the network are connected through point-to-point synchronous
links, similar to the structure presented in Figure 1.

We assume that once a link failure appears, the
end-points of the failed link are notified instantly, thus the detection
delay is considered to be negligible. More importantly, we assume that there
is no global information about the network topology. The initial network
configuration is such that all communication nodes have information about
the identities of its direct neighbors. The lack of global topology information
and which alternative a given protocol chooses to solve this problem is
a dominant factor in the performance of a restoration scheme.
For our proactive restoration protocol, we propose that each node
keeps track of topology related information while in the normal operation
phase. We distinguish between 3 phases of the protocol:
- Initialization : run once by every node in the network upon
startup or reset.
- Propagation : a distributed parallel algorithm that iteratively
constructs global topology information on each network node. This phase
is run whenever a topology change occurs.
- Restoration : a distributed parallel algorithm that selects
and restores along candidate paths determined in the Propagation phase.
Every node derives a local view of the global
topology information while running the propagation phase of our algorithm.
This phase of the protocol takes place in the absence of any facility failures;
this represents the proactive component of our protocol. All the nodes in
the network execute asynchronously the same algorithm, which iteratively
builds paths from each node to all other nodes in the network. These paths
are to be considered as candidates for restoration paths once a link failure
occurs in the network.
The number of such potential restoration paths
grows exponentially. We arbitrarily impose a restriction on the total number
of paths to be determined, by limiting the number of hops. Our experiments
show that for typical networks, the network diameter is a good heuristic
for the hop count. Other heuristics can be applied for pruning the candidate
paths, and we are currently studying their impact on algorithm performance.
Longer paths are less desirable in the event of a failure, since the propagation
delay becomes the dominant factor in the restoration time, as the path length
increases.
The proactive restoration scheme that we describe
differs from pre-planning or pre-configuration of restoration paths that
other authors propose, in the sense that it dynamically adapts to
network topology changes. Another advantage of our proposed approach is
that we decouple the problem of finding restoration paths from the problem
of restoring a failure in the network. The penalty of enumerating a potentially
large number of paths is considered to be less important than being able
to immediately use this set of candidate paths once a failure is detected.
The physical structure of the OPNET simulation
model at the highest Network level of abstraction consists of a network
similar to the one shown in Figure 1. Two types of
nodes are included in our model:
- Controller Node: there is one Controller per subnetwork; this
node is responsible for initializing the model and the configuration file
that describes the failures that occur in the network.
- Mesh Node : there could be as many Mesh nodes as desired
per subnetwork; these nodes model the DCS devices that are used as transmission
hubs in the network. In our model, these nodes run the distributed protocol
we propose.
Mesh Nodes communicate over point-to-point links
using several types of control messages:
- Propagation Messages : are used to spread path information through
the network according to the description presented in the previous Section.
Message propagation is asynchronous and new candidate paths are generated
until a fixed hop-count limit is attained. This limit is fixed as a simulation
parameter.
- Restoration Messages : are used to reroute the lost working
capacity across some of the candidate paths determined in the previous
protocol phase.

The bandwidth requirements for all types of
control messages are not very large due to a clever path encoding that ensures
that all paths are uniquely described by a fixed length representation.
Similarly, the storage requirements on the network nodes are low, since
paths are stored using the same mechanism.
The results of several experiments we conducted
are presented in Figure 3 and Figure
2 for the network shown in Figure 1. By decoupling
the path identification phase and the restoration phase, our proposed protocol
achieved an average restoration time of under 100 ms for this network, as
shown in Figure 2.
Figure 3 plots the time
it takes the propagation phase to converge for different link speed
rates (triangles indicate 64 Kbps, others 576 Kbps). The ordinate
axis indicates the number of paths found.

These results are very promising for further study,
because they open up the possibility of effectively competing with the restoration
times for other methods, such as APS, which need a much higher spare capacity
allocation than mesh networks.
References
[1] G. Spiride, G. Deprez, and S. Nair.
A proactive distributed network restoration protocol. Technical Report 97-CSE-01,
Southern Methodist University, Department of Computer Science and Engineering,
January 1997.
[2] G. Spiride and S. Nair. Proactive distributed
restoration protocol for SONET mesh networks. In Proc. of the 4th INFORMS
Telecommunications Meeting, March 1998. Author contact: Department of Computer Science and
Engineering Southern Methodist University, PO BOX 750122, Dallas,
TX 752755-0122, USA. Phone: (214) 768 2005. Fax: (214) 768 3085. E-mail:
gspiride@seas.smu.edu, nair@seas.smu.edu
|