The CompCert C compiler

      

CompCert C is a compiler for the C programming language. Its intended use is the compilation of life-critical and mission-critical software written in C and meeting high levels of assurance. It accepts most of the ISO C 99 language, with some exceptions and a few extensions. It produces machine code for the PowerPC, ARM, RISC-V and x86 (32 and 64 bits) architectures. Performance of the generated code is decent but not outstanding: on PowerPC, about 90% of the performance of GCC version 4 at optimization level 1.

What sets CompCert C apart from any other production compiler, is that it is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation issues. In other words, the executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This level of confidence in the correctness of the compilation process is unprecedented and contributes to meeting the highest levels of software assurance. In particular, using the CompCert C compiler is a natural complement to applying formal verification techniques (static analysis, program proof, model checking) at the source code level: the correctness proof of CompCert C guarantees that all safety properties verified on the source code automatically hold as well for the generated executable.

The subset of C supported

CompCert C supports all of ISO C 99, with the following exceptions:

  • By default, switch statements must be structured as in MISRA-C. Support for unstructured switch, as in Duff's device, can be activated via the command-line option -funstructured-switch.
  • longjmp and setjmp are not guaranteed to work.
  • Variable-length arrays are not supported.

Consequently, CompCert supports all of the MISRA-C 2004 subset of C, plus many features excluded by MISRA (such as recursive functions and dynamic heap memory allocation).

A number of ISO C 2011 features are also supported:

  • The _Alignof operator and the _Alignas attribute.
  • Anonymous structures and unions.
  • _Static_assert statements and _Generic expressions.
  • Unicode string literals and character constants.

CompCert also supports some extensions to ISO C taken from the GNU and Diab compilers:

  • Pragmas and attributes to control alignment and section placement of global variables.
  • Attributes to control struct packing, function inlining, etc.
  • Built-in functions to give access to some processor-specific instructions.
  • Inline assembly code with GNU-style arguments.

Architecture of the compiler

Part 1: Parsing, type-checking, and pre-simplifications. This first part converts C source code into abstract syntax trees of the CompCert C language. Some constructs not natively supported by CompCert C are expanded away. For example, block-scoped local variables are renamed and lifted to function-local scope; and unstructured switch statements are rewritten to structured switch statements with goto statements. Some other unsupported constructs, such as variable-length arrays, are rejected.

This part of CompCert (transformation of C source text to CompCert C abstract syntax trees) is not formally verified. However, CompCert C is a subset of C, and the compiler can output the generated CompCert C code in C concrete syntax (flag -dc), therefore the result of this transformation can be manually inspected. Moreover, most static analysis and program verification tools for C operate on a simplified C language similar to CompCert C. By conducting the analysis or the program verification directly on the CompCert C form, bugs potentially introduced by this first part of the compiler can be detected.

Part 2: Compilation of CompCert C AST to assembly AST. This part is the bulk of the compiler and the one that is proved correct in Coq. It is structured in 16 passes and uses 10 intermediate language, as depicted on the following diagram.

CompCert compiler passes

All intermediate languages are given formal semantics, and each of the transformation passes is proved to preserve semantics.

Part 3: Assembling and linking. The abstract syntax tree for PowerPC or ARM or RISC-V or x86 assembly language produced by part 2 is printed in concrete assembly syntax. The system's assembler and linker are then called to produce object files and executable files, respectively. This part is not yet formally verified. A benefit of using the standard assembler and linker is that object files produced by CompCert can be linked with existing libraries compiled with gcc. This is convenient for testing, although the formal guarantees of semantic preservation apply only to whole programs that have been compiled as a whole by CompCert C.

Using the compiler

The executable for CompCert C is called ccomp. It has the standard command-line interface for a Unix C compiler. For instance, to compile the single-file program src.c and create an executable called exec, just do

       ccomp -o exec src.c

To compile a two-file program src1.c and src2.c, do

       ccomp -c src1.c
       ccomp -c src2.c
       ccomp -o exec src1.o src2.o

To see the CompCert C code and the assembly code generated for src1.c, do

       ccomp -dc -S src1.c

The generated assembly code is left in file src1.s and the generated CompCert C code in file src1.compcert.c.

For more details, see the CompCert C user's manual.

Performance of the generated code

On PowerPC and ARM, the code generated by CompCert runs at least twice as fast as the code generated by GCC without optimizations (gcc -O0), and approximately 10% slower than GCC 4 at optimization level 1 (gcc -O1), 15% slower at optimization level 2 (gcc -O2) and 20% slower at optimization level 3 (gcc -O3 ). These numbers were obtained on the homemade benchmark mix shown in the graph below. By lack of aggressive loop optimizations, performance is lower on HPC codes involving lots of matrix computations.

CompCert benchmark results

(The sources for the test programs are available in the test/c subdirectory of the CompCert distribution. Measurements were done on an IBM Power7 running Linux.)