Distributed Agreement in a Security Application
Axel W. Krings
Computer Science Dept.
University of Idaho, Moscow ID, USA
krings@cs.uidaho.edu
Miles A. McQueen
INEEL
Idaho Falls, ID, USA
amm@inel.gov
The INEEL's (Idaho National Engineering and Environmental Laboratories)
participation in national security programs occasionally leads to problem
domains in computing that have not been extensively studied due to their
unique nature. One of these that has become important recently is the application
of remote distributed monitoring networks to aid in the determination of
whether weapons, materials, or chemical agents are being moved inappropriately
from their approved storage locations. The uniqueness lies in the requirement
that the network reside on foreign soil and be completely physically unsecure.
As part of our effort to address this problem, we are investigating whether
agreement algorithms may be applied to the physically unsecure network in
such a fashion as to keep the system in a self-consistent state and to detect
and log apparent malicious tampering for later interrogation and evaluation.
Algorithms for Byzantine and approximate agreement have been studied
extensively. However, most algorithms make assumptions about network topology,
correct message delivery, authentication of the sender, and detectability
of message omissions. In implementations of agreement algorithms, these
assumptions may not match the given features of the underlying networking
technology and environment. Since the INEEL and other Department of Energy
laboratories have extensive experience building Echelon based monitoring
networks, which are being considered as the standard for the application
mentioned above, we have chosen to use Echelon with its LonTalk protocol
as the specific network to use as our testbed. Since Echelon networks are
relatively low bandwidth, bus based, CSMA networks, the layering of agreement
algorithms onto these networks is problematic. Given the technology and
topology limitations, the variety of potential adversarial attacks are puzzling.
The adaptation of agreement algorithms to the peculiar problem domain and
onto the specific network are the two primary issues we have begun to address
with this research.
Mapping agreement algorithms to the problem domain required cooperation
with problem domain experts to develop an abstraction for the networks required
functionality. The most important view to develop from these discussions
is that the network is best viewed as a finite state machine and that it
is absolutely imperative that each non-faulty node of the network be in
a common state at all times. Certainly, depending on external circumstances,
one system state may be more optimal for surveillance than another, but
it is far more important to retain system consistency than to be in the
optimal state. As an example, imagine a large room being monitored with
cameras that are panning back and forth across their physical area of concern.
If an intruder is detected on one side of the room, then we want one of
the cameras to zoom in on the intruder. By itself, this system response
is inadequate since part of the room would no longer be under surveillance.
An appropriate system response must ensure that the entire room stays under
surveillance, even if it is at a reduced level. Consequently, the other
cameras must modify their pan range to guarantee that the entire room is
still covered. If an adversary were able to drive one camera to the zoom
state and keep the others in their normal coverage state, then the system
will have failed. To prevent the system from entering these failed, internally
inconsistent states, some form of agreement is required before every state
change. The type of agreement required depends on the type of attacks one
postulates. Specifically, attacks may manifest themselves to the network
as one of three faults: benign, i.e. self-incriminating or globally self-evident
faults; symmetric, i.e. single-valued faults; or asymmetric, i.e. Byzantine
or two-faced faults [Tha88].
In attempting to subvert the network, adversaries have a variety of basic
attacks at their disposal. Each of these attack scenarios can be categorized
as one of the three fault types mentioned above. If the adversary attaches
an extra, subversive, node to the system and in its messages identifies
itself as one of the good nodes, then every non-faulty node on the network
will be able to detect the fault since every node will receive two different
messages per round from the ``same'' node. This is a single benign fault,
and the node may be excluded from further involvement in system state changes.
If the adversary were to partition the network by snipping the twisted pair
bus, then each partition will detect the absence of messages from nodes
on the other partition, and thus the omission of each such message would
become a benign fault. Note that partitioning is usually considered a violation
of connectivity assumptions and beyond the scope of most agreement algorithms.
If the adversary were to remove a good node and insert a subversive node,
then it could generate symmetric faults by broadcasting messages with intentionally
incorrect values. Even worse, the subversive node could multicast different
values to different groups of nodes and thus generate asymmetric faults.
This scenario is more general, i.e. less restrictive, than considered in
previous work based on reliable broadcast [Bab85]. So it is clear that
the agreement algorithm must be able to cope with benign, symmetric, and
asymmetric faults, which leads us to evaluating Byzantine agreement algorithms
for retaining a consistent system state.
Due to the bandwidth restrictions, agreement algorithms with early stopping
behavior are most feasible, and a specific adaptation closely related to
work in [Ber92] with an algorithmic definition is currently investigated.
In general, the implementation problems of agreement algorithms lay in the
following assumptions about the message passing scheme. First, every message
sent must be delivered correctly. Second, the absence of a message can be
detected. Third, the receiver of a message must know who sent it. The capabilities
of the LonTalk protocol are being evaluated to determine whether these requirements
can be met.
The first requirement, that every message sent is delivered correctly,
may be reasonably met through the data link layer and transport layer of
the LonTalk protocol. The data link layer generates a CRC using the CCITT
CRC-16 standard, and the transport layer provides acknowledged messaging
or unacknowledged-repeated messaging. With appropriate configuration these
two layers may provide a high probability that every message sent is delivered
correctly.
The second requirement, that the absence of a message can be detected,
may be met through the transport layer messaging system and watchdog timers,
but may require the use of clock synchronization. The transport layer provides
acknowledged messaging, and the retry wait time for receipt of acknowledgment
is selectable by the network administrator. Further, by implementing clock
synchronization in the application layer, rounds may be synchronized and
the absence of a message detected at the termination of each messaging round.
Clock synchronization has the beneficial result of not requiring acknowledged
messaging and thus saving precious network bandwidth. The third requirement,
that the receiver of a message knows who sent it, is problematic in the
unsecured physical environment in which we are working. While each LonTalk
network node has a ``...unique Neuron Id that is permanently written into
the EEPROM during manufacture'' it may in fact not be assumed permanent
and is susceptible to change in an unsecured environment. Even worse, for
standard message addressing the source node address is specified using entries
in the sending nodes domain table. These entries are modifiable by the node's
application program! Consequently, if desired, a single malicious node can
masquerade as any and all nodes. Also, since the LonTalk protocol is open,
it would be possible to develop specialized nodes that have none of the
restrictions that the neuron chip and firmware imposes on itself. So there
is no direct method for preventing mimicry. Consequently, the third requirement
is not met. However, there is a work around. Rather then prevent mimicry,
which is impossible, we have chosen to investigate methodologies to detect
it. We believe that we are close to developing a mechanism that will allow
us to reasonably argue that the third requirement is met.
Current research focuses on low-overhead mechanisms to meet the third
requirement for using agreement algorithms. Furthermore, we are investigating
the modification of agreement algorithms to adaptively change their agreement
groups. This would potentially be useful for enhanced performance under
the partitioned network scenario, but foremost, it could provide partial
or degraded functionality of the system in case of link damage. Finally,
implementation of basic agreement primitives is planned to establish timing
parameters of agreement on an Echelon network.
References
- Bab85
- Babaoglu, O., and R. Drummond, ``Streets of Byzantium: Network Architectures
for Fast Reliable Broadcasts'', IEEE Transactions on Software Engineering,
Vol. SE-11, No. 6, pp. 546-554, June 1985.
- Ber92
- Berman, Piotr, J.A. Garay, and K.J. Perry, ``Optimal Early Stopping
in Distributed Consensous'', Proc. 6th International Workshop on Distributed
Algorithms (WDAG), LNCS 947, pp. 221-237, Nov. 1992.
- Tha88
- Thambidurai, P., and You-Keun Park, ``Interactive Consistency with
Multiple Failure Modes'', 7th Reliable Distributed Systems Symposium,
Columbus, OH, pp. 93-100, Oct. 1988.
|