Distributed Agreement in a Security Application

Axel W. Krings
Computer Science Dept.
University of Idaho, Moscow ID, USA
krings@cs.uidaho.edu

Miles A. McQueen
INEEL
Idaho Falls, ID, USA
amm@inel.gov

The INEEL's (Idaho National Engineering and Environmental Laboratories) participation in national security programs occasionally leads to problem domains in computing that have not been extensively studied due to their unique nature. One of these that has become important recently is the application of remote distributed monitoring networks to aid in the determination of whether weapons, materials, or chemical agents are being moved inappropriately from their approved storage locations. The uniqueness lies in the requirement that the network reside on foreign soil and be completely physically unsecure. As part of our effort to address this problem, we are investigating whether agreement algorithms may be applied to the physically unsecure network in such a fashion as to keep the system in a self-consistent state and to detect and log apparent malicious tampering for later interrogation and evaluation.

Algorithms for Byzantine and approximate agreement have been studied extensively. However, most algorithms make assumptions about network topology, correct message delivery, authentication of the sender, and detectability of message omissions. In implementations of agreement algorithms, these assumptions may not match the given features of the underlying networking technology and environment. Since the INEEL and other Department of Energy laboratories have extensive experience building Echelon based monitoring networks, which are being considered as the standard for the application mentioned above, we have chosen to use Echelon with its LonTalk protocol as the specific network to use as our testbed. Since Echelon networks are relatively low bandwidth, bus based, CSMA networks, the layering of agreement algorithms onto these networks is problematic. Given the technology and topology limitations, the variety of potential adversarial attacks are puzzling. The adaptation of agreement algorithms to the peculiar problem domain and onto the specific network are the two primary issues we have begun to address with this research.

Mapping agreement algorithms to the problem domain required cooperation with problem domain experts to develop an abstraction for the networks required functionality. The most important view to develop from these discussions is that the network is best viewed as a finite state machine and that it is absolutely imperative that each non-faulty node of the network be in a common state at all times. Certainly, depending on external circumstances, one system state may be more optimal for surveillance than another, but it is far more important to retain system consistency than to be in the optimal state. As an example, imagine a large room being monitored with cameras that are panning back and forth across their physical area of concern. If an intruder is detected on one side of the room, then we want one of the cameras to zoom in on the intruder. By itself, this system response is inadequate since part of the room would no longer be under surveillance. An appropriate system response must ensure that the entire room stays under surveillance, even if it is at a reduced level. Consequently, the other cameras must modify their pan range to guarantee that the entire room is still covered. If an adversary were able to drive one camera to the zoom state and keep the others in their normal coverage state, then the system will have failed. To prevent the system from entering these failed, internally inconsistent states, some form of agreement is required before every state change. The type of agreement required depends on the type of attacks one postulates. Specifically, attacks may manifest themselves to the network as one of three faults: benign, i.e. self-incriminating or globally self-evident faults; symmetric, i.e. single-valued faults; or asymmetric, i.e. Byzantine or two-faced faults [Tha88].

In attempting to subvert the network, adversaries have a variety of basic attacks at their disposal. Each of these attack scenarios can be categorized as one of the three fault types mentioned above. If the adversary attaches an extra, subversive, node to the system and in its messages identifies itself as one of the good nodes, then every non-faulty node on the network will be able to detect the fault since every node will receive two different messages per round from the ``same'' node. This is a single benign fault, and the node may be excluded from further involvement in system state changes. If the adversary were to partition the network by snipping the twisted pair bus, then each partition will detect the absence of messages from nodes on the other partition, and thus the omission of each such message would become a benign fault. Note that partitioning is usually considered a violation of connectivity assumptions and beyond the scope of most agreement algorithms. If the adversary were to remove a good node and insert a subversive node, then it could generate symmetric faults by broadcasting messages with intentionally incorrect values. Even worse, the subversive node could multicast different values to different groups of nodes and thus generate asymmetric faults. This scenario is more general, i.e. less restrictive, than considered in previous work based on reliable broadcast [Bab85]. So it is clear that the agreement algorithm must be able to cope with benign, symmetric, and asymmetric faults, which leads us to evaluating Byzantine agreement algorithms for retaining a consistent system state.

Due to the bandwidth restrictions, agreement algorithms with early stopping behavior are most feasible, and a specific adaptation closely related to work in [Ber92] with an algorithmic definition is currently investigated. In general, the implementation problems of agreement algorithms lay in the following assumptions about the message passing scheme. First, every message sent must be delivered correctly. Second, the absence of a message can be detected. Third, the receiver of a message must know who sent it. The capabilities of the LonTalk protocol are being evaluated to determine whether these requirements can be met.

The first requirement, that every message sent is delivered correctly, may be reasonably met through the data link layer and transport layer of the LonTalk protocol. The data link layer generates a CRC using the CCITT CRC-16 standard, and the transport layer provides acknowledged messaging or unacknowledged-repeated messaging. With appropriate configuration these two layers may provide a high probability that every message sent is delivered correctly.

The second requirement, that the absence of a message can be detected, may be met through the transport layer messaging system and watchdog timers, but may require the use of clock synchronization. The transport layer provides acknowledged messaging, and the retry wait time for receipt of acknowledgment is selectable by the network administrator. Further, by implementing clock synchronization in the application layer, rounds may be synchronized and the absence of a message detected at the termination of each messaging round. Clock synchronization has the beneficial result of not requiring acknowledged messaging and thus saving precious network bandwidth. The third requirement, that the receiver of a message knows who sent it, is problematic in the unsecured physical environment in which we are working. While each LonTalk network node has a ``...unique Neuron Id that is permanently written into the EEPROM during manufacture'' it may in fact not be assumed permanent and is susceptible to change in an unsecured environment. Even worse, for standard message addressing the source node address is specified using entries in the sending nodes domain table. These entries are modifiable by the node's application program! Consequently, if desired, a single malicious node can masquerade as any and all nodes. Also, since the LonTalk protocol is open, it would be possible to develop specialized nodes that have none of the restrictions that the neuron chip and firmware imposes on itself. So there is no direct method for preventing mimicry. Consequently, the third requirement is not met. However, there is a work around. Rather then prevent mimicry, which is impossible, we have chosen to investigate methodologies to detect it. We believe that we are close to developing a mechanism that will allow us to reasonably argue that the third requirement is met.

Current research focuses on low-overhead mechanisms to meet the third requirement for using agreement algorithms. Furthermore, we are investigating the modification of agreement algorithms to adaptively change their agreement groups. This would potentially be useful for enhanced performance under the partitioned network scenario, but foremost, it could provide partial or degraded functionality of the system in case of link damage. Finally, implementation of basic agreement primitives is planned to establish timing parameters of agreement on an Echelon network.



 

References

 

Bab85
Babaoglu, O., and R. Drummond, ``Streets of Byzantium: Network Architectures for Fast Reliable Broadcasts'', IEEE Transactions on Software Engineering, Vol. SE-11, No. 6, pp. 546-554, June 1985.
Ber92
Berman, Piotr, J.A. Garay, and K.J. Perry, ``Optimal Early Stopping in Distributed Consensous'', Proc. 6th International Workshop on Distributed Algorithms (WDAG), LNCS 947, pp. 221-237, Nov. 1992.
Tha88
Thambidurai, P., and You-Keun Park, ``Interactive Consistency with Multiple Failure Modes'', 7th Reliable Distributed Systems Symposium, Columbus, OH, pp. 93-100, Oct. 1988.