On Automating Abstraction Proofs

Rob Sumners and Jacob A Abraham
Computer Engineering Research Center
The University of Texas at Austin

Introduction

Model checking methods have become a popular technique for the removal of faults by checking correctness of temporal logic properties in finite state hardware and software models. The difficulty with this approach is the familiar state explosion problem. A common technique for alleviating the state explosion problem is state space abstraction where irrelevant state information is quotiented out of the state space. Abstraction is often done manually and while abstract model analysis may provide important feedback, the analysis does not necessarily correlate to the original system unless the abstraction is shown to preserve the behavior of the original system. In a sense, abstraction takes a behavioral correctness proof for some system and breaks it into two steps, (1) abstract the atomic state model to reasonably finite abstract state model and (2) check resulting abstract model for behavioral correctness. Unless step (1) is guaranteed, the entire ``verification'' is in question. The problem is then how to show that step (1) is correct. For common rich system implementation languages, it is undecidable and some form of deduction may be required. Others have attempted to use theorem provers to verify the abstraction[HS96], but the abstraction proofs can often be as difficult if not more so than a complete proof of the property on the original system to begin with. We present some simple techniques to simplify these obligations so that a more-directed theorem prover like ACL2[BM97], Otter, etc. can be used to automatically prove abstractions correct and alleviate much of the need for user interaction. We have built a simple preprocessing tool which takes a LISP definition of a systems transition function and abstraction function and outputs (1) a simplified proof script for ACL2 and (2) a finite model definition which can be model checked. We have used this tool to automatically prove the abstraction for an infinite state 2-cache system given in the figure below. The example system consists of two caches with processor interfaces -- the latched address, data, command -- and a pending bit used to denote that a load or store command is waiting to be performed. Each processor has an unbounded cache and use a common shared memory. A property we wish to show is that for any address $A$ and data value D, if we perform the command store,A,D for some processor and perform no other stores to A, then the next $load$ should return the value D.

Abstractions

A state model is defined by an initial state q in Q and a next-state transition function step : Q X I --> Q which takes the current state from state set Q and an input from the set I and returns a next state. An abstraction is defined by a total idempotent function rep : Q --> Q which for every state returns a representative. Intuitively, a representative state is chosen such that it has representative behavior for all states which map to it relative to the other representative states chosen. We also need to choose representative inputs, so we will also assume an abstraction function repI : I --> I is defined. Formally, we would like to show:

(1) for any states s,t and inputs i,j, (rep(s) = rep(t) and repI(i) = repI(j)) implies (rep(step(s,i)) = rep(step(t,j))).

This would guarantee that states s and t were bisimilar and we could safely coalesce these states together assuming the resulting model -- defined by $rep(q)$ and next state function rep(step(rep(s),repI(i))) -- would measure the state propositions in our temporal logic formula. In practice it is difficult to get bisimulations with a finite number of representatives, so we may need to weaken the definition by including another abstraction function repN : Q --> Q, where repN(s) = repN(t) implies rep(s) = rep(t), which defines representatives which agree on the next rep states for repI equivalent inputs. Formally we need to show, for all states s,t and inputs i,j, (repN(s) = repN(t) and repI(i) = repI(j)) implies (rep(step(s,i)) = rep(step(t,j))). This allows the abstract model on rep states to hide enough information to be finite (with some cost of model correlation). We will focus on the first case, nrep = rep, for presentation simplicity.

Proof Automation

We want to prove property (1) using a theorem prover. Proving this directly will most likely require some amount of human-prover interaction. This interaction will consist of doing some case splitting on the relationship between s,t,i,j and in each of these cases use some reasoning about equality, induction, etc. to show that for this case, the theorem holds. Intuitively, though, we have defined all the cases in the abstract model. So a viable alternative would be to somehow recast (1) such that the case splitting required is defined by the abstract model. Assuming a finite number of reachable representatives and a finite number of representative inputs, we can construct the abstract model defined by initial state rep(q) in R within Q and next state function rstep : R X E --> R, where E within I and rstep := rep(step(rep(s),repI(i))) by simply executing over representatives. With rstep, instead of proving (1) monolithically, we can alternatively prove a simplified theorem for each edge in rstep:

(2) for all edges (v,k,w) rstep: for all states s and inputs i, (rep(s) = v and repI(i) = k) implies rep(step(s,i)) = w

More aggressive theorem provers such as ACL2 can deal much more effectively with theorems of the form in (2) rather than (1). In fact, for our cache example, ACL2 is able to prove every theorem in the set defined by (2). Unfortunately, in practice, while the abstract model may be an effective case splitter for automating proofs, the number of edges in abstract models can make the number of proofs required too large. For many cases though, including the cache example, we can decompose the problem of abstraction proofs to specific types used in the function definition. Finite types (such as booleans and {load,store}) do not need to be abstracted. The non-finite types -- address, data, cache, memory -- do need to be abstracted. Above these types though are composite types whose abstractions simply compose the abstractions of the underlying types. It is preferable to push the abstraction proof obligations down through the composite types to the non-finite types. We can do this using simple variants of the following observation. Given models step1 : Q1 X I --> Q1 and step2 : Q2 X I --> Q2 with proven abstractions rep1,rep2 respectively for input abstraction repI. Then, for input abstraction repI, the function rep : Q1 X Q2 --> Q1 X Q2 defined by rep(<s,t&gt) = <rep1(s),rep2(t) &gt is an abstraction for the model step : Q1 X Q2 X I --> Q1 X Q2 defined by step(<s,t&gt,i) = <step1(s,i),step2(t,i)&gt.

Conclusions

We wrote a simple tool which took a LISP definition of a next state transition function and generated a set of proof obligations. The tool attempted to find simple composite types from which it could push down proof obligations. For the simple cache example in figure 1, the resulting proof obligations were localized to the cache, memory, data, and address types -- the rest of the types are finite composites or preserved completely by the abstraction. The resulting 392 proof obligations are proven automatically using ACL2. The resulting model can then be used to check our property of interest. Our short presentation here is an attempt to show the usefulness in abstract models not only for model checking but for efficient methods of case analysis. Future work will include attempts to extend these constructions with well-foundedness arguments and more efficient representations.

References

[HS96]K. Havelund, and N. Shankar. ``Experiments in Theorem Proving and Model Checking for Protocol Verification,'' in FME 96.

[BM97]R.S.Boyer, and J.S. Moore. {\em A Computational Logic Handbook, 2nd ed.} Academic Press, 1997.