Fast Abstracts Archives . .

FastAbstracts


WHAT IS a
FastAbstract

The History

Archives of
FastAbstracts

ISSRE 2003
ISSRE 2002
ISSRE 2001
ISSRE 2000
ISSRE 1999
ISSRE 1998
FTCS 1999
FTCS 1998



 

 

 

 

Safe Reuse of Source to Intermediate Language Compilations

Andreas Heberle, Welf Löwe, and Martin Trapp 1 
Institut für Programmstrukturen und Datenorganisation
Universität Karlsruhe
 

 

Safety critical applications require software to be verified. Correctness proofs of programs usually assume that programs are written in a high level language. However, it is the object code which is executed on the target machine. Therefore the correctness depends on the correctness of the compiler. The construction of a correct compiler consists of two parts: (i) the verification of the transformation specification wrt. the semantics specification of source and target language and (ii) the verification of the implementation wrt. the transformation specification. 
Assumed that a fronted generates an AST representation for statically correct programs, the construction of a verified source to intermediate code compiler can be performed in the following way: 

  1. Give semantics to each node type of the abstract syntax. We specify semantics operationally by an abstract state machine (ASM) [Gur95]. 
  2. For each node type of the abstract syntax of the source language define a transformation into intermediate language terms . 
  3. Verify the transformation wrt. the specified semantics. Let ASM(N) be the semantics of an abstract syntax node N and ASM(T) be the semantics of T which is the intermediate representation of N. The latter is assembled by the ASMs of the respective node types of the intermediate language building up T. Then prove that ASM(N) is simulated by ASM(T)
  4. Finally, implement the compiler and verify the implementation wrt. the (already verified) specification of the transformation. 

All these steps have to be performed for each single pair of source and intermediate languages. Unfortunately, at least from the compiler constructor's point of view, neither source nor intermediate languages seem to converge however. Because source languages are designed wrt. applications and intermediate languages wrt. hardware architectures, such a convergence can not be expected in the future. 

The first step to ease these tasks is the design of yet another (intermediate intermediate) language, say AL, between the source and the intermediate languages. We could then specify the transformation from any source language to AL, and further specify the transformation from AL to any intermediate language. The advantages are obvious: 

  • We are able to reuse the parts of the transformation specification and their verification if we add a new source and intermediate language, respectively. 
  • New source and intermediate languages could be added independently. 

AL is not a universal compiler language and we could think of arbitrary levels of intermediate languages. Therefore, we define a kernel language AL, which is a very restricted subset of ASMs, but still suitable for the definition of programs' executions. Additionally, we provide a library of higher concepts each defined in terms of AL. Since this library can never be complete, we choose design restriction for such a library allowing compiler constructors to insert new constructs which define new semantical concepts.  AL  and the extensions of AL can easily be implemented by an object-oriented library. New terms are modeled by new classes, their reduction is modeled by a distinguished method that maps the new to already existing classes. This construction follows the principle of substitutability that guarantees correctness by construction. For each construct of terms in normal form we define a transformation to constructs of the intermediate language in question. This definition follows the same construction principles. Since normal form terms are low level, this transformation is often a one-to-one relation or supported by standard transformations, e.g. for memory mapping or building of basic blocks. Our intention is that the library accumulates to a tool supporting major parts of compilers for various source and intermediate languages.
Example: Assumed the programming language has a while statement defined as usual. The representation of an AST node is the following: 
 
class C-While is 
   expr : C-IntExpr;
   stats : C-Stats;
   ASM::semantics (in next_task: Task; out I, T: Task);
   define (in next_task: Task; out I, T: Task); 
end;
 

Figure 1: Montage for a C while loop

 

The attributes expr and stats define successor nodes in the AST representation, semantics defines the semantics of C-While in terms of ASMs modeled by an external class ASM. Together this can be considered as an object-oriented definition for the Montage [KP97] in Figure 1. It defines abstract syntax, control flow information (solid edges in the graph), data flow information (dashed edges), and dynamic semantics of the while loop. I and T are used to integrate the statement into its context. The dynamic semantics of the oval elements (tasks of a certain type) is defined in the lower part of the Montage. All together they define an interpreter using an abstract program counter ct. According to the rules computations are performed and ct is updated. The semantics of the boxes must be defined by other Montages. These elements are plugged in (using I and T) at the respective positions. Hence, a Montage can be seen as a generic ASM definition of language elements where the generic parameters (boxed elements) are itself ASMs. For each syntactically correct program the Montages specification of a language defines a task graph (tasks are nodes, control and data flow information are edges). Its semantics is defined by the interpreter described above. 


  Figure 2: Montage for the AL while loop

 

In addition to this specification, define describes the transformation to the while statement of AL. which is defined by the Montage in Figure 2 and is in turn represented by a class AL-While.  This transformation maps the do command to the One command of AL, C-Stats to AL-Stats and adapts C-IntExpr to AL-BoolExpr. Therefore we introduce a wrapper that converts the result of C-IntExpr to the required Boolean value.  This wrapper is defined in Figure 3. It is an instantiation of the Boolean expression in AL which gets as a generic parameter an AL-IntExpr. The C-IntExpr can now be mapped to this AL-IntExpr. It can be proved that AL-While together with the wrapped AL-IntExpr simulates C-While
 

Figure 3: Wrapper Int to Bool  

Each transformation step has to be verified by showing that the resulting ASM simulates the original one. For each AL construct there exists either a transformation to lower AL features which follows the same principles as described above or a  definition in terms of the intermediate representation. However, the transformations within AL are verified only once and can then safely be reused. The compiler constructor needs only proof correct the transformation of the source language to high level AL constructs.   

References

[Gur95] Y. Gurevich "Evolving Algebras: Lipari Guide", Specification and Validation Methods, Oxford University Press, 1995.
[KP97] P.W. Kutter and A. Pierantonio "Montages: Specifications of Realistic Programming Languages", Journal of Universal Computer Science, Vol. 3, Nr. 5, 1997. 

1. Author contact: Institut für Programmstrukturen und Datenorganisation, Universität Karlsruhe, Zirkel 2, 76128 Karlsruhe, Germany. Phone +49 721 608 7400. Fax: +49 721 30047. E-Mail: {heberle | loewe | trapp}@ipd.info.uni-karlsruhe.de.