Previous Up Next

Chapter 5  The CompCert C language

This chapter describes the dialect of the C programming language that is implemented by the CompCert C compiler and reference interpreter. It follows very closely the ISO C99 standard [4]. A few features of C99 are not supported at all; some other are supported only if the appropriate language support options are selected on the command line. On the other hand, some extensions to C99 are supported, borrowed from the ISO C2011 standard [5].

In this chapter, we describe both the restrictions and the extensions of CompCert C with respect to the C99 standard. We also document how CompCert implements the behaviors specified as implementation-dependent in C99. The description follows the structure of the C99 standard document [4]. In particular, section numbers (e.g. “§5.1.2.2”) correspond to those of the C99 standard document.

§5 Environment

§5.1.2.2 Hosted environment.
CompCert C follows the hosted environment model. The function called at program startup is named main. According to the formal semantics, it must be defined without parameters: int main(void) { ... }. The CompCert C compiler also supports the two-parameter form int main(int argc, char *argv[]).
§5.2.1.2 Multibyte characters.
Multibyte characters in program sources are not supported.
§5.2.4.2 Numerical limits.
Integer types follow the “ILP32LL64” model:
TypeSizeRange of values
unsigned char1 byte0 to 255
signed char1 byte−128 to 127
char1 bytelike signed char on x86
  like unsigned char on PowerPC and ARM
unsigned short2 bytes0 to 65535
signed short2 bytes−32768 to 32767
short2 bytes−32768 to 32767
unsigned int4 bytes0 to 4294967295
signed int4 bytes−2147483648 to 2147483647
int4 bytes−2147483648 to 2147483647
unsigned long4 bytes0 to 4294967295
signed long4 bytes−2147483648 to 2147483647
long4 bytes−2147483648 to 2147483647
unsigned long long8 bytes0 to 18446744073709551615
signed long long8 bytes−9223372036854775808 to 9223372036854775807
long long8 bytes−9223372036854775808 to 9223372036854775807
_Bool1 byte0 or 1

Floating-point types follow the IEEE 754-2008 standard [10]:

TypeRepresentationSizeMantissaExponent
floatIEEE 754 single precision4 bytes23 bits−126 to 127
 (binary32)   
doubleIEEE 754 double precision8 bytes52 bits−1022 to 1023
 (binary64)   
long doublenot supported by default; with -flongdouble option, like double

During evaluation of floating-point operations, the floating-point format used is that implied by the type, without excess precision and range. This corresponds to a FLT_EVAL_METHOD equal to 0.

§6 Language

§6.2 Concepts
 
§6.2.5 Types
CompCert C supports all the types specified in C99, with the following exceptions:
  • The long double type is not supported by default. If the -flongdouble option is set, it is treated as a synonym for double.
  • Complex types (double _Complex, etc) are not supported.
  • The result type and argument types of a function type must not be a structure or union type, unless the -fstruct-passing option is active (section 3.2.8).
  • Variable-length arrays are not supported. The size N of an array declarator T[N] must always be a compile-time constant expression.
§6.2.6 Representation of types

Signed integers use two’s-complement representation.

§6.3 Conversions
Conversion of an integer value to a signed integer type is always defined as reducing the integer value modulo 2N to the range of values representable by the N-bit signed integer type.

Pointer values can be converted to any pointer type. A pointer value can also be converted to the integer types int, unsigned int, long and unsigned long and back to the original pointer type: the result is identical to the original pointer value.

Conversions from double to float rounds the floating-point number to the nearest floating-point number representable in single precision. Conversions from integer types to floating-point types round to the nearest representable floating-point number.

§6.4 Lexical elements
 
§6.4.1 Keywords.
The following tokens are reserved as additional keywords:
    _Alignas      _Alignof       __attribute__    __attribute
    __const       __const__      __inline         __inline__
    __restrict    __restrict__   __packed__
    asm           __asm          __asm__
     _Noreturn
§6.4.2 Identifiers
All characters of an identifier are significant, whether it has external linkage or not. Case is significant even in identifiers with external linkage. The “$” (dollar) character is accepted in identifiers.
§6.4.3 Universal character names
Universal character names are supported in character constants and string literals. They are not supported within identifiers.
§6.5 Expressions

CompCert C supports all expression operators specified in C99, with the restrictions and extensions described below.

Overflow in arithmetic over signed integer types is defined as taking the mathematically-exact result and reducing it modulo 232 or 264 to the range of representable signed integers. Bitwise operators (&, |, ^, ~, <<, >>) over signed integer types are interpreted following two’s-complement representation.

Floating-point operations round their results to the nearest representable floating point number, breaking ties by rounding to an even mantissa. If the program changes the rounding mode at run-time, it must be compiled with flag -ffloat-const-prop 0 (section 3.2.3). Otherwise, the compiler will perform compile-time evaluations of floating-point operations assuming round-to-nearest mode.

Floating-point intermediate results are computed in single precision if they are of type float (i.e. all arguments have integer or float types) and in double precision if they are of type double (i.e. one argument has type double). This corresponds to FLT_EVAL_METHOD equal to 0.

An integer or floating-point value stored in (part of) an object can be accessed by any lvalue having integer or floating-point type. The effect of such an access is defined taking into account the bit-level representation of the types (two’s complement for integers, IEEE 754 for floats) and the endianness of the target platform (big-endian for PowerPC, little-endian for x86, and configuration dependent endianness for ARM). In contrast, a pointer value stored in an object can only be accessed by an lvalue having pointer type or integer type similar to intptr_t and uintptr_t. In other words, while the bit-level, in-memory representation of integers and floats is fully exposed by the CompCert C semantics, the in-memory representation of pointers is kept opaque and cannot be examined at any granularity other than a pointer-sized word.

§6.5.2 Postfix operators
If a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is as described in the last paragraph above: the operation is well defined as long as it does not entail accessing a stored pointer value with a type other than a pointer type or an integer type similar to intptr_t and uintptr_t. For example, the declaration
    union u { double d; unsigned int i[2]; unsigned char c[8]; };
supports accessing any member after any other member has been stored. On the other hand, consider
    union u { char * ptr; unsigned char c[4]; };
If a pointer value is stored in the ptr member, accesses to the elements of the c member are not defined.
§6.5.3 Unary operators
Symmetrically with the sizeof operator, CompCert C supports the _Alignof operator from C2011, which can also be written __alignof__ as in GNU C. This operator applies either to a type or to an expression. It returns the natural alignment, in bytes, of this type or this expression’s type.

The type size_t of the result of sizeof and _Alignof is taken to be unsigned long.

§6.5.4 Casts
See the comments on point §6.3 (“Conversions”) above concerning pointer casts and casts to signed integer types.
§6.5.5 Multiplicative operators
Division and remainder are undefined if the second argument is zero. Signed division and remainder are also undefined if the first argument is the smallest representable signed integer (−2147483648 for type int) and the second argument is −1 (the only case where division overflows).
§6.5.6 Additive operators
Adding a pointer and an integer, or subtracting an integer from a pointer, are always defined even if the resulting pointer points outside the bounds of the underlying object. The byte offset with respect to the base of the underlying object is treated modulo 232 or 264 (depending on the bitsize of pointers on the target processor). Such out-of-bounds pointers cause undefined behavior when dereferenced or compared with other pointers.
§6.5.7 Bitwise shift operators
The right shift operator >> applied to a negative signed integer is specified as shifting in “1” bits from the left.
§6.5.8 Relational operators
Comparison between two non-null pointers is always defined if both pointers fall within the bounds of the underlying objects. Additionally, comparison is also defined if one of the pointers is “one past the end of an object” and the other pointer is identical to the first pointer or falls within the bounds of the same object. Comparison between a pointer “one past” the end of an object and a pointer within a different object is undefined behavior.
§6.5.9 Equality operators
Same remark as in §6.5.8 concerning pointer comparisons.
§6.6 Constant expressions
No differences with C99.
§6.7 Declarations
CompCert C supports all declarations specified in C99, with the restrictions and extensions described below.
§6.7.2 Type specifiers
Complex types (the _Complex specifier) are not supported.
§6.7.2.1 Structure and union specifiers
Bit fields in structures and unions are not supported by default, but are supported through source-to-source program transformation if the -fbitfields option is selected (section 3.2.8).

Bit fields of “plain” type int are treated as signed. In accordance with the ELF Application Binary Interfaces, bit fields within an integer are allocated most significant bits first on the PowerPC platform, and least significant bits first on the ARM and x86 platforms. Bit fields never straddle an integer boundary.

Bit fields can be of enumeration type, e.g. enum e x: 2. Such bit fields are treated as unsigned if this enables all values of the enumeration to be represented exactly in the given number of bits, and as signed otherwise.

The members of a structure are laid out consecutively in declaration order, with enough bytes of padding inserted to guarantee that every member is aligned on its natural alignment. The natural alignment of a member can be modified by the _Alignas qualifier. Different layouts can be obtained if the -fpacked-structs option is set (section 3.2.8) and the packed attribute (section 6.2) is used.

Anonymous structures and anonymous unions are supported as in C2011. (See the C2011 standard, §6.7.2.1 paragraph 13 for a description.)

§6.7.2.2 Enums
The values of an enumeration type have type int.
§6.7.3 Type qualifiers
The const and volatile qualifiers are honored, with the restriction below on volatile composite types. The restrict qualifier is accepted but ignored.

Accesses to objects of a volatile-qualified scalar type are treated as described in paragraph 6 of section 6.7.3: every assignment and dereferencing is treated as an observable event by the CompCert C formal semantics, and therefore is not subject to optimization by the CompCert compiler. Accesses to objects of a volatile-qualified composite type (struct or union type) are treated as regular, non-volatile accesses: no observable event is produced, and the access can be optimized away. The CompCert compiler emits a warning in the latter case.

Following ISO C2011, CompCert supports the _Alignas construct as a type qualifier. This qualifier comes in two forms: _Alignas(N), where N is a compile-time constant integer expression that evaluates to a power of two; and _Alignas(T), where T is a type. The latter form is equivalent to _Alignas(_Alignof(T)).

The effect of the _Alignas(N) qualifier is to change the alignment of the type being qualified, setting the alignment to N. In particular, this affects the layout of struct fields. For instance:

    struct s { char c; int _Alignas(8) i; };

The Alignas(8) qualifier changes the alignment of field i from 4 (the natural alignment of type int) to 8. This causes 7 bytes of padding to be inserted between c and i, instead of the normal 3 bytes. This also increases the size of struct s from 8 to 12, and the alignment of struct s from 4 to 8.

The alignment N given in the _Alignas(N) qualifier should normally be greater than or equal to the natural alignment of the modified type. For target platforms that support unaligned memory accesses (x86 and PowerPC, but not ARM), N can also be smaller than the natural alignment.

Finally, CompCert C provides limited support for GCC-style attributes (__attribute keyword) used in type qualifier position. See section 6.2.

§6.7.4 Function specifiers
Two function specifiers are supported: inline from ISO C99 and _Noreturn from ISO C2011.
§6.7.5.2 Array declarators
Variable-length arrays are not supported. The only supported array declarators are those of ISO C90, namely [] for an incomplete array type, and [N] where N is a compile-time constant expression for a complete array type.
§6.7.5.3 Function declarators
The result type of a function must not be a structure or union type, unless the -fstruct-return option is active (section 3.2.8).
§6.7.8 Initialization
Both traditional (ISO C90) and designated initializers are supported, conforming with ISO C99.
§6.8 Statements and blocks
All forms of statements specified in C99 are supported in CompCert C, with the exception described below.

The asm statement (a popular extension for inline assembly) is not supported by default, but is supported if option -finline-asm is set. See section 6.5 for a complete description of the syntax and semantics of asm statements.

§6.8.4.2 The switch statement
The switch statement in CompCert C is restricted to the “structured” form present in Java and mandated by Misra-C. Namely, the switch statement must have the following form:
    switch (expression) {
        case expr1: ...
        case expr2: ...
        ...
        default: ...
    }
In other words, the case and default labels that pertain to a switch must occur at the top-level of the block following the switch. They cannot occur in nested blocks or under other control structures such as if, while or for. In particular, the infamous Duff’s device is not supported.

As an extension to Java and Misra-C, the default case of a switch statement can appear in the middle of the cases, not necessarily last.

§6.9 External definitions
Function definitions should be written in modern, prototype form. The compiler accepts traditional, non-prototype function definitions but converts them to prototype form. In particular, T f() {...} is automatically converted to T f(void) {...}.

Functions with a variable number of arguments, as denoted by an ellipsis ... in the prototype, are supported.

The result type of the function must not be a structure or union type, unless the -fstruct-return option is active (section 3.2.8).

§6.10 Preprocessing directives
The CompCert C compiler does not perform preprocessing itself, but delegates this task to an external C preprocessor, such as that of GCC. The external preprocessor is assumed to conform to the C99 standard.

§7 Library

The CompCert C compiler does not provide its own implementation of the C standard library. It provides a few standard headers and relies on the standard library of the target system for the others. CompCert has been successfully used in conjunction with the GNU glibc standard library. Note, however, the following points:

§7.6 Floating-point environment <fenv.h>
The CompCert formal semantics and optimization passes assume round-to-nearest behavior in floating-point arithmetic. If the program changes rounding modes during execution using the fesetround function, it must be compiled with option -ffloat-const-prop 0 to turn off certain floating-point optimizations.
§7.12 Mathematics <math.h>
Many versions of <math.h> include long double versions of the math functions. These functions cannot be called by CompCert-compiled code by lack of ABI-conformant support for the long double type.
§7.13 Non-local jumps <setjmp.h>
The CompCert C compiler has no special knowledge of the setjmp and longjmp functions, treating them as ordinary functions that respect control flow. It is therefore not advised to use these two functions in CompCert-compiled code. To prevent misoptimisation, it is crucial that all local variables that are live across an invocation of setjmp be declared with volatile modifier.

Previous Up Next