The CompCert verified compiler

Commented Coq development

Version 3.13, 2023-07-04

Introduction

CompCert is a compiler that generates ARM, PowerPC, RISC-V and x86 assembly code from CompCert C, a large subset of the C programming language. The particularity of this compiler is that it is written mostly within the specification language of the Coq proof assistant, and its correctness --- the fact that the generated assembly code is semantically equivalent to its source program --- was entirely proved within the Coq proof assistant.

High-level descriptions of the CompCert compiler and its proof of correctness can be found in the following papers (in increasing order of technical details):

Xavier Leroy, Formal verification of a realistic compiler. Communications of the ACM 52(7), July 2009.
Xavier Leroy, A formally verified compiler back-end. Journal of Automated Reasoning 43(4):363-446, 2009.

This Web site gives a commented listing of the underlying Coq specifications and proofs. Proof scripts are folded by default, but can be viewed by clicking on "Proof". Some modules (written in italics below) differ between the five target architectures. The PowerPC versions of these modules are shown below; the AArch64, ARM, x86 and RISC-V versions can be found in the source distribution.

This development is a work in progress; some parts have substantially changed since the overview papers above were written.

The complete sources for CompCert can be downloaded from the Git repository or the CompCert Web site.

This document and the CompCert sources are copyright Institut National de Recherche en Informatique et en Automatique (INRIA) and AbsInt Angewandte Informatik GmbH, and are distributed under the terms of the following license.

General-purpose libraries, data structures and algorithms

Coqlib: addendum to the Coq standard library.
Maps: finite maps.
Integers: machine integers.
Floats: machine floating-point numbers.
Iteration: various forms of "while" loops.
Ordered: construction of ordered types.
Lattice: construction of semi-lattices.
Kildall: resolution of dataflow inequations by fixpoint iteration.
UnionFind: a persistent union-find data structure.
Postorder: postorder numbering of a directed graph.

Definitions and theorems used in many parts of the development

Errors: the Error monad.
AST: identifiers, whole programs and other common elements of abstract syntaxes.
Linking: generic framework to define syntactic linking over the CompCert languages.
Values: run-time values.
Events: observable events and traces.
Memory: memory model.
See also: Memdata (in-memory representation of data).
Globalenvs: global execution environments.
Smallstep: tools for small-step semantics.
Behaviors: from small-step semantics to observable behaviors of programs.
Determinism: determinism properties of small-step semantics.
Op: operators, addressing modes and their semantics.
Builtins: semantics of built-in functions.
See also: Builtins0 (target-independent part), Builtins1 (target-dependent part).
Unityping: a solver for atomic unification constraints.

Source, intermediate and target languages: syntax and semantics

The CompCert C source language: syntax and semantics and determinized semantics and type system.
See also: type expressions and operators (syntax and semantics).
See also: reference interpreter.
Clight: a simpler version of CompCert C where expressions contain no side-effects.
Csharpminor: low-level structured language.
Cminor: low-level structured language, with explicit stack allocation of certain local variables.
CminorSel: like Cminor, with machine-specific operators and addressing modes.
RTL: register transfer language (3-address code, control-flow graph, infinitely many pseudo-registers).
See also: Registers (representation of pseudo-registers).
LTL: location transfer language (3-address code, control-flow graph of basic blocks, finitely many physical registers, infinitely many stack slots).
See also: Locations (representation of locations) and Machregs (description of processor registers).
Linear: like LTL, but the CFG is replaced by a linear list of instructions with explicit branches and labels.
Mach: like Linear, with a more concrete view of the activation record.
Asm: abstract syntax for PowerPC assembly code.

Compiler passes

Pass	Source & target	Compiler code	Correctness proof
Pulling side-effects out of expressions; fixing an evaluation order	CompCert C to Clight	SimplExpr	SimplExprspec SimplExprproof
Pulling non-adressable scalar local variables out of memory	Clight to Clight	SimplLocals	SimplLocalsproof
Simplification of control structures; explication of type-dependent computations	Clight to Csharpminor	Cshmgen	Cshmgenproof
Stack allocation of local variables whose address is taken; simplification of switch statements	Csharpminor to Cminor	Cminorgen	Cminorgenproof
Recognition of operators and addressing modes; if-conversion	Cminor to CminorSel	Selection SelectOp SelectLong SelectDiv SplitLong	Selectionproof SelectOpproof SelectLongproof SelectDivproof SplitLongproof
Construction of the CFG, 3-address code generation	CminorSel to RTL	RTLgen	RTLgenspec RTLgenproof
Recognition of tail calls	RTL to RTL	Tailcall	Tailcallproof
Function inlining	RTL to RTL	Inlining	Inliningspec Inliningproof
Postorder renumbering of the CFG	RTL to RTL	Renumber	Renumberproof
Constant propagation	RTL to RTL	Constprop ConstpropOp	Constpropproof ConstproppOproof
Common subexpression elimination	RTL to RTL	CSE CombineOp	CSEproof CombineOpproof
Redundancy elimination	RTL to RTL	Deadcode	Deadcodeproof
Removal of unused static globals	RTL to RTL	Unusedglob	Unusedglobproof
Register allocation (validation a posteriori)	RTL to LTL	Allocation	Allocproof
Branch tunneling	LTL to LTL	Tunneling	Tunnelingproof
Linearization of the CFG	LTL to Linear	Linearize	Linearizeproof
Removal of unreferenced labels	Linear to Linear	CleanupLabels	CleanupLabelsproof
Synthesis of debugging information	Linear to Linear	Debugvar	Debugvarproof
Laying out the activation records	Linear to Mach	Stacking Bounds Stacklayout	Stackingproof Separation
Emission of assembly code	Mach to Asm	Asmgen	Asmgenproof0 Asmgenproof1 Asmgenproof

All together

Compiler: composing the passes together; whole-compiler semantic preservation theorems.
Complements: interesting consequences of the semantic preservation theorems.

Static analyses

The following static analyses are performed over the RTL intermediate representation to support optimizations such as constant propagation, CSE, and dead code elimination.

Liveness: liveness analysis.
ValueAnalysis: value and alias analysis
See also: ValueDomain: the abstract domain for value analysis.
See also: ValueAOp: processor-dependent parts of value analysis.
Deadcode: neededness analysis
See also: NeedDomain: the abstract domain for neededness analysis.
See also: NeedOp: processor-dependent parts of neededness analysis.

Type systems

The type system of CompCert C is fully formalized. For some intermediate languages of the back-end, simpler type systems are used to statically capture well-formedness conditions.

Ctyping: typing for CompCert C + type-checking functions.
RTLtyping: typing for RTL + type reconstruction.
Lineartyping: typing for Linear.

xavier.leroy@college-de-france.fr