The CompCert C compiler
CompCert C is a compiler for the C programming language. Its intended use is the compilation of life-critical and mission-critical software written in C and meeting high levels of assurance. It accepts almost all of the ISO~C~90 / ANSI~C language, with a few exceptions and several extensions. It produces machine code for the PowerPC, ARM, and IA32 (x86 32-bits) architectures. Performance of the generated code is decent but not outstanding: on PowerPC, about 90% of the performance of GCC version 4 at optimization level 1.
What sets CompCert C apart from any other production compiler, is that it is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation issues. In other words, the executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This level of confidence in the correctness of the compilation process is unprecedented and contributes to meeting the highest levels of software assurance. In particular, using the CompCert C compiler is a natural complement to applying formal verification techniques (static analysis, program proof, model checking) at the source code level: the correctness proof of CompCert C guarantees that all safety properties verified on the source code automatically hold as well for the generated executable.
CompCert C supports all of ISO C 90, also known as ANSI C, with the following exceptions:
switchstatements must be structured as in MISRA-C; unstructured "switch", as in Duff's device, is not supported.
- Unprototyped, old-style function types are not supported. All functions must be prototyped.
- Variable-argument functions cannot be defined.
setjmpare not guaranteed to work.
Consequently, CompCert supports all of the MISRA-C 2004 subset of C, plus many features excluded by MISRA (such as recursive functions and dynamic heap memory allocation).
Several extensions to ISO C 90 are supported:
//" single-line comments, as in ISO C99.
- Declarations in the middle of blocks and within
forloops, as in ISO C99.
_Booltype from ISO C99.
_Alignofoperator from ISO C11.
- Pragmas and attributes to control alignment and section placement of global variables.
Part 1: Parsing, type-checking, and pre-simplifications. This first part converts C source code into abstract syntax trees of the CompCert C language. Some constructs not natively supported by CompCert C are expanded away. For example, block-scoped local variables are renamed and lifted to function-local scope; and bit fields are emulated in terms of bit-level operations. Some other unsupported constructs, such as variable-arity function declarations, are rejected.
This part of CompCert (transformation of C source text to CompCert C
abstract syntax trees) is not formally verified. However, CompCert C is a
subset of C, and the compiler can output the generated
CompCert C code in C concrete syntax (flag
therefore the result of this transformation can be manually inspected.
Moreover, most static analysis and program verification tools for C
operate on a simplified C language similar to CompCert C. By conducting
the analysis or the program verification directly on the CompCert C form,
bugs potentially introduced by this first part of the compiler can be
Part 2: Compilation of CompCert C AST to assembly AST. This part is the bulk of the compiler and the one that is proved correct in Coq. It is structured in 14 passes and uses 11 intermediate language, as depicted on the following diagram.
All intermediate languages are given formal semantics, and each of the transformation passes is proved to preserve semantics.
Part 3: Assembling and linking.
The abstract syntax tree for PowerPC or ARM or x86 assembly language
produced by part 2 is printed in concrete assembly syntax. The
system's assembler and linker are then called to produce object files
and executable files, respectively. This part is not yet formally
verified. A benefit of using the standard assembler and linker is
that object files produced by CompCert can be linked with existing
libraries compiled with
gcc. This is convenient for
testing, although the formal guarantees of semantic preservation apply
only to whole programs that have been compiled as a whole by CompCert
The executable for CompCert C is called
It has the standard command-line interface for a Unix C compiler.
For instance, to compile the single-file program
and create an executable called
exec, just do
ccomp -o exec src.c
To compile a two-file program
ccomp -c src1.c ccomp -c src2.c ccomp -o exec src1.o src2.o
To see the CompCert C code and the assembly code generated for
ccomp -dc -S src1.c
The generated assembly code is left in file
and the generated CompCert C code in file
For more details, see the CompCert C user's manual.
On PowerPC and ARM, the code generated by CompCert runs at least twice
as fast as the code generated by GCC without optimizations
gcc -O0), and approximately 10% slower than GCC 4 at optimization
level 1 (
gcc -O1), 15% slower at optimization level 2
gcc -O2) and 20% slower at optimization level 3 (
gcc -O3 ). These numbers were obtained on the homemade benchmark mix shown in
the graph below. By lack of aggressive loop optimizations,
performance is lower on HPC codes involving lots of matrix
(The sources for the test programs are available in the
test/c subdirectory of the CompCert distribution.
Measurements were done on an IBM Power7 running Linux.)