Previous Up Next

Chapter 6  Language extensions

This chapter describes several extensions to the C99 standard implemented by CompCert: compiler pragmas (section 6.1), attributes (section 6.2), built-in functions (section 6.3), code annotation mechanisms (sections 6.4 and 6.5), and GCC-style extended inline assembly (section 6.6).

6.1  Pragmas

This section describes the pragmas supported by CompCert C. The compiler emits a warning for an unrecognized pragma.

#pragma reserve_register reg-name
Ensure that all subsequent function definitions do not use register reg-name in their compiled code, and therefore preserve the value of this register. The register must be a callee-save register. The following register names are allowed:
On PowerPC:R14, R15, …, R31 (general-purpose registers)
 F14, F15, …, F31 (float registers)
On ARM:R4, R5, …, R11 (integer registers)
 F8, F9, …, F15 (float registers)
On x86-32:EBX, ESI, EDI, EBP (32-bit integer registers)
On x86-64:RBX, RBP, R12, R13, R14, R15 (64-bit integer registers)
On RISC-V:X8, X9; X18, X19, …, X27 (integer registers)
 F8, F9; F18, F19, …, F27 (float registers)
#pragma section ident "iname" "uname" addr-mode access-mode
Define a new linker section, or modify the characteristics of an existing linker section. The parameters are as follows:
ident
The compiler internal name for the section.
iname
The linker section name to be used for initialized variables and for function code.
uname
The linker section name to be used for uninitialized variables.
addr-mode
The addressing mode used to access variables located in this section. On PowerPC, setting addr-mode to near-data denotes a small data area, which is addressed by 16-bit offsets relative to the corresponding small data area register. On PowerPC, setting addr-mode to far-data denotes a relocatable data area, which is addressed by 32-bit offsets relative to the corresponding register. Any other value of addr-mode denotes standard absolute addressing.
access-mode
One or several of R to denote a read-only section, W to denote a writable section, and X to denote an executable section.

Functions and global variables can be explicitly placed into user-defined sections using the #pragma use_section directive (see below). Another, more indirect, less recommended way is to modify the characteristics of the default sections in which the compiler automatically place function code, global variables, and auxiliary compiler-generated data. These default sections are as follows:

Internal nameWhat is put there
DATAglobal, non-const variables of size greater than N bytes. N defaults to 0 and can be set using the -fsmall-data command-line option.
CONSTglobal, const variables of size greater than N bytes. N defaults to 0 and can be set using the -fsmall-const command-line option.
SDATAglobal, non-const variables of size less than or equal to N bytes. N defaults to 0 and can be set using the -fsmall-data command-line option.
SCONSTglobal, const variables of size less than or equal to N bytes. N defaults to 0 and can be set using the -fsmall-const command-line option.
STRINGstring literals
CODEmachine code for function definitions
LITERALconstants (e.g. floating-point literals) referenced from function code
JUMPTABLEjump tables generated for switch statements

A simpler, alternate way to control placement into sections is to use the section attribute.

Example:

explicit placement into user-defined sections. We first define four new compiler sections.

    #pragma section MYDATA "mydata_i" "mydata_u" standard RW
    #pragma section MYCONST "myconst" "myconst" standard R
    #pragma section MYSDA "mysda_i" "mysda_u" near-data RW
    #pragma section MYCODE "mycode" "mycode" standard RX

We then use the #pragma use_section to place variables and functions in these sections, then define the variables and functions in question.

    #pragma use_section MYDATA a b
    int a;                // uninitialized; goes into linker section "mydata_u"
    double b = 3.1415;    // initialized; goes into linker section "mydata_i"
    #pragma use_section MYCONST c
    const short c = 42;   // goes into linker section "myconst"
    #pragma use_section MYSDA d e
    int d;                // goes into linker section "mysda_u"
    double e = 2.718;     // goes into linker section "mysda_i"
    #pragma use_section MYCODE f
    int f(void) { return a + d; }
                          // goes into linker section "mycode"
                          // accesses d via small data relative addressing
                          // accesses a via absolute addressing
Example:

implicit placement by modifying the compiler default sections. In this example, we assume that the options -fsmall-data 8 and -fsmall-const 0 are given.

    #pragma section DATA "mydata_i" "mydata_u" standard RW
    #pragma section CONST "myconst" "myconst" standard R
    #pragma section SDATA "mysda_i" "mysda_u" near-data RW
    #pragma section CODE "mycode" "mycode" standard RX
    #pragma section LITERAL "mylit" "mylit" standard R

    int a;            // small data, uninitialized; goes into "mysda_u"
    char b[16];       // big data, uninitialized; goes into "mydata_u"
    const int c = 42; // big const data, initialized; goes into "myconst"
    double f(void) { return c + 3.14; }
                      // code goes into "mycode"
                      // literal 3.14 goes into "mylit"
Caveat:

when using non-standard sections, the linker must, in general, be informed of these sections and how to place them in the final executable, typically by providing an appropriate mapping file to the linker.

#pragma use_section ident var
Explicitly place the global variables or functions var, … in the compiler section named ident. The use_section pragma must occur before any declaration or definition of the variables and functions it mentions. See #pragma section above for additional explanations and examples.

6.2  Attributes

Like the GCC compiler, the CompCert C compiler allows the programmer to attach attributes to various parts of C source code.

An attribute qualifier is of the form __attribute((attribute-list)) or __attribute__((attribute-list)), where attribute-list is a possibly empty, comma-separated lists of attributes. Each attribute is of the following form:

attribute::=ident
 ident(attr-arg,,attr-arg)
attr-arg::=ident
 "string-literal"
 const-expr

Each attribute carries a name and zero or more arguments, which can be identifiers, string literals, or C expressions of integer types that are compile-time constants.

Attribute names can be specified with __ (two underscore characters) preceding and following each name. For instance, __aligned__(N) and aligned(N) denote the same attribute.

For compatibility with other C compilers, the keyword __packed__ is also recognized as an attribute:

__packed__()is equivalent to__attribute((packed))
__packed__(params)is equivalent to __attribute((packed(params)))

In C source files, attribute qualifiers can occur anywhere a standard type qualifier (const, volatile) can occur, and also just after the struct and union keywords. For partial compatibility with GCC, the CompCert parser allows attributes to occur in several other places, but may silently ignore them.

Warning. Some standard C libraries, when used in conjunction with CompCert, deactivate the __attribute__ keyword: the standard includes, or CompCert itself, define __attribute__ as a macro that erases its argument. This is the case for the Glibc standard library under Linux, and the Xcode header files under macOS. For this reason, please use the __attribute keyword in preference to the __attribute__ keyword.

The following attributes are recognized by CompCert. For unrecognized attributes CompCert issues a warning.

aligned(N)
(type attribute) Specify the alignment to use for a variable or a struct member. The argument N is the desired alignment, in bytes. It must be a power of 2. This attribute is equivalent to the qualifier _Alignas(N).
noinline
(function attribute) Prevents a function from being considered for inlining.
noreturn
(function attribute) Indicate that a function never returns normally. This information is currently not used for code optimization, but only to produce more precise warnings. This attribute is similar to the _Noreturn function specifier from ISO C 2011, but it can also appear in function pointer types.
packed(max-member-alignment, min-struct-alignment, byte-swap)
(structure attribute) This attribute is recognized only if the -fpacked-structs option is active (section 3.2.9).

The packed attribute applies to a struct declarations and affects the memory layout of the members of this struct. Zero, one, two or three integer arguments can be provided.

If the max-member-alignment parameter is provided, the natural alignment of every member (field) of the structure is reduced to at most max-member-alignment. In particular, if max-member-alignment = 1, members are not aligned, and no padding is ever inserted between members.

If the min-struct-alignment parameter is provided, the natural alignment of the whole structure is increased to at least min-struct-alignment.

If the byte-swap parameter is provided and equal to 1, accesses to structure members of integer or pointer types are performed using the opposite endianness than that of the target platform. For PowerPC, accesses are done in little-endian mode instead of the natural big-endian mode; for x86 and RISC-V accesses are done in big-endian mode instead of the natural little-endian mode; for ARM — where the endianness is configuration dependent — accesses are done in big-endian mode when configured as little-endian and vice versa.

Examples:
    struct __attribute__((packed(1))) s1 {    // suppress all padding
      char c;           // at offset 0
      short s;          // at offset 1
      int i;            // at offset 3
    };                  // total size is 7, structure alignment is 1

    struct __attribute__((packed(4,16,1))) s2 {
      char c;           // at offset 0
      short s;          // at offset 4 (because 4-aligned); byte-swap at access
      int i;            // at offset 8 (because 4-aligned); byte-swap at access
    };                  // total size is 16, structure alignment is 16

    struct s3 {         // default layout
      char c;           // at offset 0
      short s;          // at offset 2 (because 2-aligned)
      int i;            // at offset 4
    };                  // total size is 12, structure alignment is 4
Limitations:

For a byte-swapped structure, all members should be of integer or pointer types, or arrays of those types.

Reduced member alignment should not be used on the ARM platform, since unaligned memory accesses on ARM can crash the program or silently produce incorrect results. Only the PowerPC, x86 and RISC-V platforms support unaligned memory accesses.

packed
(structure attribute) This form of the packed attribute is equivalent to packed(1). It causes the the structure it modifies to be laid out with no alignment padding between the members. The size of the resulting structure is therefore the sum of the sizes of its members. The alignment of the resulting structure is 1.
section("section-name")
(name attribute) Specify the linker section section-name where to place functions and global variables whose types carry this section attribute. The linker section is declared as executable read-only if the attribute applies to a function definition; as read-only if the attribute applies to a const variable definition; and as read-write if the attribute applies to a non-const variable definition.

The #pragma section and #pragma use_section directives can be used to obtain finer control on user-defined sections (section 6.1).

unused
(variable or parameter attribute) This attribute, attached to a variable or function parameter, means that the variable/parameter is meant to be possibly unused. CompCert will not produce a warning for it.
Interpretation of attributes.

Attributes are attached to parts of the program in a way that depends on the kind of the attribute:

For example, consider

  __attribute((section("foo"), aligned(8))) int * p;

The section attributes applies to variable p while the aligned attributes modifies type int. The meaning, therefore, is that p is a variable in section foo that has type pointer to 8-aligned ints.

  __attribute((noreturn)) void (*fptr) (int x);

The noreturn attribute applies to the function type, not to the variable fptr. The meaning is that fptr is a variable of type “pointer to non-returning functions from int to void”.

6.3  Built-in functions

Built-in functions are functions that are predefined in the initial environment — the environment in which the compilation of a source file starts. In other words, these built-in functions are always available: there is no need to include header files.

Built-in functions give access to interesting capabilities of the target processor that cannot be exploited in standard C code. For example, most processors provide an instruction to quickly compute the absolute value of a floating-point number. In CompCert C, this instruction is made available to the programmer via the __builtin_fabs function. It provides a faster alternative to the fabs library function from <math.h>.

Invocations of built-in functions are automatically inlined by the compiler at point of use. It is a compile-time error to take a pointer to a built-in function.

Some built-in functions are available on all target platforms supported by CompCert. Others are specific to a particular platform.

6.3.1  Common built-in functions

Integer arithmetic:

unsigned int __builtin_bswap(unsigned int x)
Swap the bytes of x to change its endianness. If x is of the form 0xaabbccdd, the result is 0xddccbbaa.
unsigned int __builtin_bswap32(unsigned int x)
A synonym for __builtin_bswap.
unsigned short __builtin_bswap16(unsigned short x)
Swap the bytes of x to change its endianness. If x is of the form 0xaabb, the result is 0xbbaa.
unsigned long long __builtin_bswap64(unsigned long long x)
Reverses the order of the 8 bytes of x. Currently only available for architectures RiscV and x86/AMD64.
int __builtin_clz(unsigned int x)
int __builtin_clzl(unsigned long x)
Count the number of consecutive zero bits in x, starting with the most significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 31 inclusive. On PowerPC and ARM architectures, the result is between 0 and 32 inclusive. This builtin is currently not available for RiscV.
int __builtin_clzll(unsigned long long x)
Count the number of consecutive zero bits in x, starting with the most significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 63 inclusive. On PowerPC and ARM architectures, the result is between 0 and 64 inclusive. This builtin is currently not available for RiscV.
int __builtin_ctz(unsigned int x)
int __builtin_ctzl(unsigned long x)
Count the number of consecutive zero bits in x, starting with the least significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 31 inclusive. On PowerPC and ARM architectures, the result is between 0 and 32 inclusive. If defined, the result is the number of the least significant bit that is set in x. This builtin is currently not available for RiscV.
int __builtin_ctzll(unsigned long long x)
Count the number of consecutive zero bits in x, starting with the least significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 63 inclusive. On PowerPC and ARM architectures, the result is between 0 and 64 inclusive. If defined, the result is the number of the least significant bit that is set in x. This builtin is currently not available for RiscV.

Floating-point arithmetic:

double __builtin_fabs(double x)
Return the floating-point absolute value of its argument, or NaN if the argument is NaN. This function is equivalent to the fabs() function from the <math.h> standard library, but executes faster.

Block copy with known size and alignment:

void __builtin_memcpy_aligned
(void * dst, const void * src, size_t sz, size_t al)
Copy sz bytes from the memory area at src to the memory area at dst. The source and destination memory areas must be either disjoint or identical; the behavior is undefined if they overlap. The pointers src and dst must be aligned on an al byte boundary, where al is a power of 2. The sz and al arguments must be compile-time constants. A typical invocation is
    __builtin_memcpy_aligned(&dst, &src, sizeof(dst), _Alignof(type_dst));
where dst and src are two objects of the same complete type type_dst. An invocation of __builtin_memcpy_aligned(dst,src,sz,al) behaves like memcpy(dst,src,sz) as defined in the <string.h> standard library. Knowing the size and alignment at compile-time enables the compiler to generate very efficient inline code for the copy.

Memory accesses with reversed endianness (currently not available for RiscV):

unsigned short __builtin_read16_reversed(const unsigned short * ptr)
Read a 16-bit integer at address ptr and reverse its endianness by swapping the two bytes of the result.
unsigned int __builtin_read32_reversed(const unsigned int * ptr)
Read a 32-bit integer at address ptr and reverse its endianness by swapping the four bytes of the result, as __builtin_bswap does.
void __builtin_write16_reversed(unsigned short * ptr, unsigned short x)
Reverse the endianness of x by swapping its two bytes, then write the 16-bit result at address ptr.
void __builtin_write32_reversed(unsigned int * ptr, unsigned int x)
Reverse the endianness of x by swapping its four bytes, then write the 32-bit result at address ptr.

Synchronization:

void __builtin_membar(void)
Software memory barrier. Prevent the compiler from moving memory loads and stores across the call to __builtin_membar. No processor instructions are generated, hence the hardware can still reorder memory accesses. To generate hardware memory barriers, see the “synchronization” built-in functions specific to each processor.

6.3.2  PowerPC built-in functions

Integer arithmetic:

int __builtin_mulhw(int x, int y)
Return the high 32 bits of the full 64-bit product of two signed integers.
unsigned int __builtin_mulhwu(unsigned int x, unsigned int y)
Return the high 32 bits of the full 64-bit product of two unsigned integers.
long long __builtin_mulhd(long long x, long long y)
Return the high 64 bits of the full 128-bit product of two signed long longs (only available for 32/64-bit hybrid PowerPC).
unsigned long long __builtin_mulhdu
(unsigned long long x, unsigned long long y)
Return the high 64 bits of the full 128-bit product of two unsigned long longs (only available for 32/64-bit hybrid PowerPC).
int __builtin_isel(_Bool cond, int iftrue, int iffalse)
unsigned int __builtin_uisel
(_Bool cond, unsigned int iftrue, unsigned int iffalse)
Return iftrue if cond is true, iffalse if cond is false. Expands to an isel instruction on Freescale EREF processors, and to a branch-free instruction sequence on other PowerPC processors.

Floating-point arithmetic:

double __builtin_fmadd(double x, double y, double z)
Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y.
double __builtin_fmsub(double x, double y, double z)
Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y.
double __builtin_fnmadd(double x, double y, double z)
Fused multiply-add-negate. Compute -(x*y + z) without rounding the intermediate product x*y.
double __builtin_fnmsub(double x, double y, double z)
Fused multiply-sub-negate. Compute -(x*y - z) without rounding the intermediate product x*y.
double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library. The corresponding PowerPC instruction is optional and not supported by all processors.
double __builtin_frsqrte(double x)
Compute an estimate (with relative accuracy 1/32) of 1 / √x, the reciprocal of the square root of x. The corresponding PowerPC instruction is optional and not supported by all processors.
float __builtin_fres(float x)
Compute an estimate (with relative accuracy 1/256) of the single-precision reciprocal of x. The corresponding PowerPC instruction is optional and not supported by all processors.
double __builtin_fsel(double x, double y, double z)
Return y if x is greater or equal to 0.0. Return z if x is less than 0.0 or is NaN. The corresponding PowerPC instruction is optional and not supported by all processors.
int __builtin_fcti(double x)
Round the given floating-point number x to an integer and return this integer. The difference with the standard C conversion (int)x is that the latter rounds x towards zero, while __builtin_fcti(x) rounds x according to the current rounding mode (by default: to the nearest integer, ties round to even).

Memory accesses with reversed endianness:

unsigned long long __builtin_read64_reversed
(const unsigned long long * ptr)
Read a 64-bit integer at address ptr and reverse its endianness by swapping the eight bytes of the result (currently only available for 32/64-bit hybrid PowerPC).
void __builtin_write64_reversed
(unsigned long long * ptr, unsigned long long x)
Reverse the endianness of x by swapping its eight bytes, then write the 64-bit result at address ptr (currently only available for 32/64-bit hybrid PowerPC).

Synchronization instructions:

void __builtin_eieio(void)
Issue an eieio barrier.
void __builtin_sync(void)
Issue a sync barrier.
void __builtin_isync(void)
Issue an isync barrier.
void __builtin_lwsync(void)
Issue an lwsync barrier.
void __builtin_mbar(int level)
Issue a mbar barrier. The integer argument must be 0 or 1.
void __builtin_trap(void)
Abort the program on an unconditional trap instruction.

Cache control instructions:

void __builtin_dcbf(void * addr)
void __builtin_dcbi(void * addr)
void __builtin_dcbtls(void * addr, int level)
void __builtin_dcbz(void * addr)
void __builtin_icbi(void * addr)
void __builtin_icbtls(void * addr, int level)
Issue the corresponding cache control instruction. addr is the address of the cache block affected. level must be the constant 0 (L1 cache) or 2 (L2 cache).
void __builtin_prefetch(void * addr, int for_write, int level)
Issue a dcbt instruction if for_write is 0 and a dcbtst instruction if for_write is 1.

Access to special registers:

unsigned int __builtin_get_spr(int spr)
unsigned long long __builtin_get_spr64(int spr)
void __builtin_set_spr(int spr, unsigned int value)
void __builtin_set_spr64(int spr, unsigned long long value)
The spr argument is the number of the special register accessed. It must be a compile-time constant.

Atomic (sequentially-consistent) memory operations:

void __builtin_atomic_exchange(int * addr, int * new, int * old)
Store the current value of *addr in *old and set *addr to the value *new.
void __builtin_atomic_load(int * p, int * val)
Read the current value of *p and store it into *val.
_Bool __builtin_atomic_compare_exchange
(int * p, int * expected, int * desired)
Compare the current value of *p with *expected. If equal, set *p to the value *desired and return 1. If different, set *expected to the current value of *p and return 0.
int __builtin_sync_fetch_and_add(int * p, int delta)
Add delta to the contents of *p. Return the initial value of *p before the addition.

6.3.3  x86 built-in functions

Floating-point arithmetic:

double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library.
double __builtin_fmax(double x, double y)
Return the greater of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmin(double x, double y)
Return the smaller of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmadd(double x, double y, double z)
Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fmsub(double x, double y, double z)
Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fnmadd(double x, double y, double z)
Fused multiply-negate-add. Compute -(x*y) + z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fnmsub(double x, double y, double z)
Fused multiply-negate-sub. Compute -(x*y) - z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.

6.3.4  ARM built-in functions

Floating-point arithmetic:

double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library.

Synchronization instructions:

void __builtin_dmb(void)
Issue a dmb (data memory) barrier.
void __builtin_dsb(void)
Issue a dsb (data synchronization) barrier.
void __builtin_isb(void)
Issue an isb (instruction synchronization) barrier.

6.3.5  RISC-V built-in functions

Floating-point arithmetic:

double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library.
double __builtin_fmax(double x, double y)
Return the greater of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmin(double x, double y)
Return the smaller of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmadd(double x, double y, double z)
Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y.
double __builtin_fmsub(double x, double y, double z)
Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y.
double __builtin_fnmadd(double x, double y, double z)
Fused multiply-add-negate. Compute -(x*y + z) without rounding the intermediate product x*y.
double __builtin_fnmsub(double x, double y, double z)
Fused multiply-sub-negate. Compute -(x*y - z) without rounding the intermediate product x*y.

6.4  Embedded program annotations for a3

CompCert C provides a mechanism to annotate C source code with information for AbsInt’s a3 framework (a3 is an acronym for AbsInt Advanced Analyzer) combining various analyzers modules like aiT, TimingProfiler, StackAnalyzer, or ValueAnalyzer.

The analyzer modules of a3 typically operate on machine code in fully linked executables and usually expect external information for improved analysis precision to be also presented on this rather low level.

With CompCert’s annotation mechanism it is now possible to reliably annotate on C source code level and reason about C variables instead of using code addresses or processor registers. Furthermore it allows to add annotations in the AIS specification language of a3 (refer [4]). These annotations are automatically carried through the compilation chain and the linked executable into a3 without the need of external annotation files.

The annotation mechanism is available for all target architectures with ELF binary format.

Concept

CompCert collects all annotations of a compilation unit and stores them in encoded form in a special section named __compcert_ais_annotations in the object file. a3 can automatically extract them from this section and utilize them in analyses.

For CompCert, annotations via __builtin_ais_annot look like a call to an external variadic function similar to printf: The first argument contains the AIS annotation and is also a format string. It can contain format specifiers (section 6.4.2) that are replaced with the AIS representation of the additional arguments to __builtin_ais_annot. In contrast to printf most format specifiers are tagged with numbers and refer to a specific argument independent of the order. It is also possible to refer to an argument more than once.

In the context of optimizations, the builtin is also a robust mechanism to attach annotations at specific code locations. It does not rely on the rather ambiguous line information of the DWARF debug information but rather utilizes the label mechanism of the assembler and linker to generate annotations with actual code addresses.

With the builtin-mechanism it is possible for CompCert to e.g. remove unreachable code together with the contained annotations or do code transformations like reordering code blocks without breaking the annotations. Consider the following C snippet:

    static void func(int count)
    {
      int i;
      for (int i = 0; i < count; i++) {
        __builtin_ais_annot("try loop %here bound: %e1;", count);
        ...
      }
      ...
    }

If constant propagation can prove that count is always zero, CompCert can remove the whole loop since it will never be executed. In such a situation the annotation will also be removed. In contrast to this, a conventional source code annotation via special formatted C comments (e.g. // ai: …) would remain visible and probably cause problems since a3 collects such annotations by scanning the source code. The same is true, when source code uses the C preprocessor for conditional compilation: CompCert can remove not used annotations while conventional source code annotations will remain visible for a3.

6.4.1  Examples

The following paragraphs depict some motivating examples on how to use the annotation mechanism with a3 in, e.g. a software library that will be integrated in an embedded system.

Label definitions

AIS labels can be used to create robust references to code locations, e.g. for the location of the different branches in an if-statement:

    void funcExecTask(int)
    {
      __builtin_ais_annot("label 'funcExecTask_before_cond': %here;");

      if (condition) {
        __builtin_ais_annot("label 'funcExecTask_cond_then': %here;");
        ...
      } else {
        __builtin_ais_annot("label 'funcExecTask_cond_else': %here;");
        ...
      }
      ...
    }

A user of a3 can then easily use the defined label references for his analysis specific annotations:

    ## User ais annotations for specific analyses

# annotations for code within then and else branches try instruction label('funcExecTask_cond_then') ...; try instruction label('funcExecTask_cond_else') ...;

# annotation for the actual branch instruction try instruction label('funcExecTask_before_cond') -> conditional(1) ...;

Excluding critical code snippets

Labels can also be used to exclude critical code snippets from analysis. First a label is defined to reference the beginning of a critical code section. Behind the critical code, a second annotation excludes the code snippet between the defined label and the current address (%here) from analysis:

    void funcExecTask(int taskNo)
    {
      ...

      // Label before critical code section
      __builtin_ais_annot("label 'execTask_begin_critical': %here;");

      // The critical code, not suitable for analysis
      ...

      // End of critical code: cut it away and annotate execution time, etc.
      // depending on function parameter
      __builtin_ais_annot("try instruction label('execTask_begin_critical')\n"
                          " snippet {\n"
                          " continue at: %here;\n"
                          " not analyzed;\n"
                          " takes: %e1 * 42 cycles;\n"
                          " ...\n"
                          " }", taskNo);

      ...
    }
Loop and recursion annotations

A (probably overestimated) annotation can be specified in the source code of a common library routine to ensure that a3 can compute reasonable results without annotation effort or to increase analysis precision at specific code loations. If necessary, a user of a3 can improve this annotation by giving more specific annotations for the actual analysis context. For example:

Specifying a bound for a data dependent loop:

    int strncmp_x(char s[], char t[], int len)
    {
      int i;
      for (i = 0; i < len && ...; i++) {
        __builtin_ais_annot("try loop %here bound: 0..%e1;", len);
        ...
      }
      return 0;
    }

Providing unrolling hints for loops for improved precision in a3:

    void strcpy_x(char s[], char t[])
    {
      int i = 0;
      while (( s[i] = t[i] ) != '\0') {
        __builtin_ais_annot("try loop %here mapping { default unroll: 50; }");
        ...
      }
    }

Specifying a time bound for a busy waiting loop:

    void openCanSocket(volatile struct device_t* device)
    {
      ...
      // Busy wait for ACK. Assume a worst-case timing of 23 us
      while(device->bus_data != 0x00) {
        __builtin_ais_annot("try loop %here takes: 23 us;");
      }
      ...
    }

Specifying a recursion bound and an incarnation bound for a recursive routine.

    void errorHook(unsigned char err_code)
    {
      __builtin_ais_annot("try routine %e1 {\n"
                          " recursion bound: 1;\n"
                          " incarnation limit: 1;\n"
                          "}", &errorHook);
      ...
    }
Refining values

In this example a double value with a known range is converted to an integer and used as an index, e.g. to access a data array. a3 has currently no knowledge of floating point values and needs help to restrict the range of i (and derived from it, a memory access) to a small range:

    double func(double x)
    {
      double data[10] = { ... };

      // x is known to be always >= 0.0 and < 10.0
      int i = x;

      // Refine the value in the location holding variable i
      __builtin_ais_annot("try instruction %here { enter with: %l1 = 0..9; };",
                          i);

      return data[i];
    }

This will result in the following PowerPC assembly code, where \%here will be replaced by CompCert together with the assembler/linker with the address of label .L118:

    ; int i = x;
      fctiwz  f13, f1
      stfdu   f13, -8(r1)
      lwz     r5, 4(r1)
      addi    r1, r1, 8

    .L117:
    ; __builtin_ais_annot("try instruction %here { enter with: %l1 = 0..9; };",
    ;                     i);

    .L118: .L119:
    ; return data[i];
      addi    r3, r1, 16
      rlwinm  r4, r5, 3, 0, 28 ; 0xfffffff8
      lfdx    f1, r3, r4

    .L120:
      addi    r1, r1, 96

    .L121:
      blr

    ...

    .section  "__compcert_ais_annotations",,n
    .ascii "# file:test.c line:25 function:func\ntry instruction "
    .byte 7,4
    .4byte .L118
    .ascii " { enter with: reg("r5") = 0..9; };"
    .ascii "\n"

The annotation, as extracted by a3 will then look as follows:

    # test.c line:25 function:func
instruction 0x10013c { enter with: reg("r5") = 0..9; };

Another possibility for refinement of values and improved analysis precision in a3 is to insert assert annotations about known value ranges of variables or function parameters. a3 utilizes those assertions and may also be able to report violations.

    int func(int a, int b, int c)
    {
      __builtin_ais_annot("try instruction %here {\n"
                          " assert always enter with: %e1 in (0..7);\n"
                          " assert always enter with: %e2 in (0..7);\n"
                          " assert always enter with: %e3 in (0..7);\n"
                          "};", a, b, c);
      ...
    }
Areas

The contents of memory areas can also be specified for a3.

The following example assumes an embedded device that is rather slow when executing code located in ROM, and faster when running code from RAM. It is a common pattern in such a context to dynamically copy (e.g. once at system boot time) tiny hotspot functions from ROM into RAM buffers and execute the code from those buffers instead of calling the actual routine. Without further annotations a3 usually cannot know which code will be executed and hence may report unresolved computed branches in function ISR1_hwcheck.

A source level annotation can specify for a3 which code is copied into RAM buffers:

    typedef unsigned char(*CryptFktPtr_t)(unsigned char);

    // Assume this routine to be a hotspot and copied to crypt_ram buffer
    unsigned char crypt_function (unsigned char input)
    {
      return (input ^ 0xCAFE);
    }

    // Code buffer for execution from RAM
    volatile char crypt_ram[50];

    // Setup routine at boot time
    void init_crypt_ram()
    {
      __builtin_ais_annot("copy area %e1 width %e2 from %e3;",
                          &crypt_ram,
                          sizeof(crypt_ram),
                          &crypt_function);

      memcpy((void *)crypt_ram, &crypt_function, 50);
    }

    void ISR1_hwcheck(void)
    {
      CryptFktPtr_t f = (CryptFktPtr_t)((char *)crypt_ram);
      ...
      cs = f(s);
    }

Memory areas that are used for communication with external devices can be marked accordingly:

    volatile char* buffer_in[128];

    void init_device()
    {
      __builtin_ais_annot("area %e1 width %e2 {\n"
                          " readable: true;\n"
                          " writable: false;\n"
                          " volatile;\n"
                          "};", &buffer_in, sizeof(buffer_in));

      ...
    }

6.4.2  Reference description

Format specifiers

__builtin_ais_annot supports the following format specifiers that are replaced with information in AIS-Syntax on export:

Extraction order

When using the __builtin_ais_annot feature, CompCert collects all annotations contained in a compilation unit and stores them in encoded form in a special section of the object file (__compcert_ais_annotations). While creating the final executable, the linker collects all annotation sections from the object files, concatenates them, and stores the result in the executable.

The order in which annotations are exported into the final executable is explicitly undefined. This applies to the extraction within each compilation unit as well as the merging process done by the linker. It is therefore not possible to rely on a specific order in which a3 will see the annotations.

Semantics

Semantically, CompCert treats __builtin_ais_annot as a call to an external function. CompCert will not generate actual code for a call to an external function, but the parameters of the builtin will be evaluated, as it is mandatory in C semantics.

If all additional arguments are non-volatile C variables or compile-time constant expressions, it is guaranteed that no additional code will be generated for __builtin_ais_annot. Moreover, the location displayed as a replacement for the %e or %l sequence is guaranteed to be the location where the corresponding variable resides. If one or several additional arguments are complex expressions (neither non-volatile variables nor constant expressions), useless code is generated to compute their values and leave them in temporary registers or stack locations. The location displayed as a replacement for the %e or %l sequence is that of the corresponding temporary.

It should also be noted that using symbols in parameter expressions to __builtin_ais_annot may also have an effect on the liveness of those symbols and hence prevent some optimizations. Furthermore, since __builtin_ais_annot is considered a call to an external function it also acts as a barrier for many optimizations.

In the current implementation, __builtin_ais_annot can only be used at places where C statements are valid, i.e. within a function definition, but not on the toplevel.

Lastly, CompCert has no knowledge about the AIS syntax and does neither a syntactical nor a semantical check on the annotations.

6.4.3  Best practices

Avoiding side effects

Annotations with __builtin_ais_annot should be pure code, i.e. side effects with the application code should be avoided in the parameter expressions of the builtin. This improves the clarity of the application code and also allows compatibility with other compilers, e.g. via conditional compilation.

    // Bad style: a builtin-call modifying 'len'
    __builtin_ais_annot("loop %here bound: 0..%e1;", ++len);"
Multiline annotations

AIS annotations often span multiple lines. The simplest way to reproduce this visually with the builtin is by utilizing the automatic concatenation of string literals in C. Newline characters have to be inserted as an escape sequence \n. For example, consider the annotation:

    __builtin_ais_annot("# a comment"
                        " and "
                        "another comment\n"
                        "annotation line 1\n"
                        "annotation line 2");

This is equivalent to an AIS file containing:

    # a comment and another comment
annotation line 1
annotation line 2
String literals and quoting

AIS annotations itself can also contain string literals, e.g. to refer to symbol names. Translating this directly to a string literal for use with __builtin_ais_annot would require quoting of all special characters. It is therefore recommended to use single quoted string literals in AIS annotations where possible. Example:

    routine "name" { ... };

Directly translating this would result in:

    __builtin_ais_annot("routine \"name\" { ... };");

Note the escape characters \ which are necessary to encode " into the C string literal. Using single quotes on AIS-level results in a more readable annotation:

    __builtin_ais_annot("routine 'name' { ... };");

There are two levels of quoting: the first one on C-level to encode the AIS content as a string literal. The second one is the quoting on AIS-level to encode special characters in the AIS-syntax.

Separate analyses of tasks

Currently a3 can extract either all annotations embedded in an executable or none. Analyses that cover only a portion of the binary code – e.g. when doing a separate analysis for each task of the executable – may therefore issue warnings for annotations of unreachable program points. The try keyword of AIS can be used to suppress such warnings:

    __builtin_ais_annot("try instruction 'name' -> computed(1) { ... };");

If name is not reachable from the starting point of a specific analysis, no warning will be generated by a3.

6.5  General program annotations

In addition to the a3 specific program annotations described in chapter 6.4 CompCert C provides a more general but lower-level mechanism to attach free-form annotations (text messages possibly mentioning the values of variables) to C program points, and have these annotations transported throughout compilation, all the way to the generated assembly code. Analysis tools can extract the annotation information from comments in the generated assembly code and process them further for their needs.

The general annotation mechanism is presented as a pseudo built-in function called __builtin_annot, taking as arguments a string literal and zero or more local variables. For example, the binary search routine bsearch in the following C code snippet can be annotated with two important pieces of information: First, the while loop executes at most ⌈ log2(100) ⌉ = 7 iterations. Second, the array index m is always between 0 and 99, ensuring that the memory access tbl[m] is always within bounds.

    int bsearch(int tbl[100], int v)
    {
      int l = 0, u = 99;
      __builtin_annot("loop bound: 0..7");
      while (l <= u) {
        int m = (l + u) / 2;
        __builtin_annot("%1 = 0..99", m);
        if (tbl[m] < v) l = m + 1;
        else if (tbl[m] > v) u = m - 1;
        else return m;
      }
      return -1;
    }

After compiling this function using ccomp -Wa,-a -c bsearch.c the generated assembly listing looks as follows:

    ...
     4                    bsearch:
     5 0000 9421FFF0              stwu    r1, -16(r1)
     6 0004 7C0802A6              mflr    r0
     7 0008 90010008              stw     r0, 8(r1)
     8 000c 39200000              addi    r9, r0, 0
     9 0010 39400063              addi    r10, r0, 99
    10                    # annotation: loop bound: 0..7
    11                    .L100:
    12 0014 7C095000              cmpw    cr0, r9, r10
    13 0018 41810040              bt      1, .L101
    14 001c 7CE95214              add     r7, r9, r10
    15 0020 7CE50E70              srawi   r5, r7, 1
    16 0024 7CA50194              addze   r5, r5
    17                    # annotation: r5 = 0..99
    18 0028 54A8103A              rlwinm  r8, r5, 2, 0, 29 # 0xfffffffc
    19 002c 7CC3402E              lwzx    r6, r3, r8
    20 0030 7C062000              cmpw    cr0, r6, r4
    21 0034 4180001C              bt      0, .L102
    ...

One can see that CompCert does not generate machine code for the two __builtin_annot source statements. Instead, CompCert produces assembly comments at the exact program points corresponding to those in the source function. These comments consist of the string literal carried by the annotation, where positional parameters %1, %2, etc, are replaced by the locations (processor registers or stack slots) of the corresponding variable arguments.

Generalizing from the example above, we see that __builtin_annot statements offer a flexible mechanism to mark program points and local variables in C source files, then track them all the way to assembly language. Besides helping with static analysis at the machine code level, this mechanism also facilitates manual traceability analysis between C and assembly code.

Reference description

CompCert’s annotation mechanism is presented by the following two pseudo built-in functions.

void __builtin_annot(const char * msg, ...)
The first argument must be a string literal. It can be followed by arbitrarily many additional arguments, of integer, pointer or floating-point types. In the intended uses described above, the additional arguments are names of local variables, but arbitrary expressions are allowed.

The formal semantics of __builtin_annot is that of a pro forma “print” statement: the compiler assumes that every time a __builtin_annot is executed, the message and the values of the additional arguments are displayed on an hypothetical output device. In other words, an invocation of __builtin_annot is treated as an observable event in the program execution. The compiler, therefore, guarantees that __builtin_annot statements are never removed, duplicated, nor reordered; instead, they always execute at the times prescribed by the semantics of the source program, and in the same order relative to other observable events such as calls to I/O functions or volatile memory accesses.

As described in the motivational example above, the actual effect of a __builtin_annot statement is simply to generate a comment in the assembly code. This comment consists of the message carried by the annotation, where %n sequences are replaced by the machine location containing the value of the n-th additional argument, or by its value if the n-th additional argument is a compile-time constant expression of numerical type.

The location of an argument is either a machine register, an integer or floating-point constant, the name of a global variable, or a stack memory location of the form mem(sp + offset, size) where sp is the stack pointer register, offset a byte offset relative to the value of sp, and size the size in bytes of the argument. For example, on the PowerPC, register locations are R3R31 and F0F31, and stack locations are of the form mem(R1 + offset, size).

If all additional arguments are non-volatile C variables or compile-time constant expressions, it is guaranteed that no code (other than the assembly comment) will be generated for __builtin_annot. Moreover, the location displayed as a replacement for the %n sequence is guaranteed to be the location where the corresponding variable resides.

If one or several additional arguments are complex expressions (neither non-volatile variables nor constant expressions), useless code is generated to compute their values and leave them in temporary registers or stack locations. The location displayed as a replacement for the %n sequence is that of the corresponding temporary. This behavior is not particularly useful for static analysis at the machine level, and can generate useless code. It is therefore highly recommended to use only non-volatile variable names or constant expressions as additional parameters to __builtin_annot.

int __builtin_annot_intval(const char * msg, int x)
In contrast with __builtin_annot, which is used as a statement, __builtin_annot_intval is intended to be used within expressions, to track the location containing the value of a subexpression and return this value unchanged. A typical use is within array indexing expressions, to express an assertion over the array index:
    int x = tbl[__builtin_annot_intval("%1 = 0..99", (lo + hi)/ 2)];

The formal semantics of __builtin_annot_intval is also the pro forma effect of displaying the message msg and the value of the integer argument x on a hypothetical output device. In addition, the value of the second argument x is returned unchanged as the result of __builtin_annot_intval.

In the compiled code, __builtin_annot_intval evaluates its argument x to some temporary register, then inserts an assembly comment equal to the text msg where occurrences of %1 are replaced by the name of this register.

6.6  Extended inline assembly

Like in GCC and Clang, inline assembly code using the asm statement can be parameterized by C expressions as operands. The actual locations of the operands (registers and memory locations), as determined during compilation, are inserted in the given assembly code fragment.

Warning: Indiscriminate use of asm statements can ruin all the semantic preservation guarantees of CompCert. For this reason, support for asm statements is turned off by default and must be activated through the -finline-asm or -fall options. For the generated code to behave properly, it is the responsibility of the programmer to list (in the asm statement) the registers and memory that are modified by the assembly code, and to avoid modifying memory that is private to the compiled code, such as return addresses stored in the stack.

Examples

Here is how to use the PowerPC mulhw instruction to compute the high 32 bits of the 64-bit integer product x * y:

    int prod;
    asm ("mulhw %0,%1,%2" : "=r"(prod) : "r"(x), "r"(y));

The two arguments x and y are evaluated into registers, which get substituted for %1 and %2 (respectively) in the assembly template. Likewise, %0 is substituted by the register associated with variable prod. The three r constraints indicate that all three operands are expected in registers. The = constraint in =r means that the operand is an output.

To designate operands, symbolic names can be used instead of operand numbers. Using named operands, the mulhw example above can be written as:

    asm ("mulhw %[res],%[arg1],%[arg2]"
         : [res]"=r"(prod)
         : [arg1]"r"(x), [arg2]"r"(y));

Sometimes, inline assembly code has no result value, but modifies memory. An example is the dcba instruction of PowerPC, which allocates a cache block without reading its contents from main memory:

    asm volatile ("dcba 0, %[addr]" : : [addr]"r"(p) : "memory");

Note the absence of output operand, the volatile modifier indicating that the assembly code has side effects, and the "memory" annotation indicating that the assembly code modifies memory in ways that are not predictable by the compiler. CompCert treats all asm statements as volatile and clobbering memory, but other compilers need these annotations to produce correct code.

Some instructions operate over memory locations instead of registers. In this case, the m constraint should be used instead of r. For example, the x86 instruction fnstsw stores the FP control word in a memory location:

    unsigned short cw;
    asm ("fnstsw %0" : "=m" (cw));

Note that this asm statement has no inputs, hence the second colon can be omitted.

Other instructions demand arguments that are integer constants, instead of registers or memory locations. In this case, the i constraint should be used, and the corresponding argument must be a compile-time constant expression. For example, here is how to use the PowerPC mfspr and mtspr instructions to read and write special registers:

    #define read_spr(regno,res) \
       asm ("mfspr %0,%1" : "=r"(res) : "i"(regno))
    #define write_spr(regno,val) \
       asm volatile ("mtspr %0,%1" : : "i"(regno), "r"(val))

We already used the "memory" annotation telling the compiler that the assembly code “clobbers” (modifies unpredictably) the memory state. Likewise, if the assembly code modifies registers other than that of the output operand, the names of those registers must be given in the clobber list. For example, here is an implementation of atomic test-and-set using the x86 locked-exchange instruction:

    int testandset(int * p)
    {
      // store 1 in *p and return the previous value of *p
      int res;
      asm volatile
        ("movl $1, %%eax\n\t"
         "xchgl (%[addr]), %%eax\n\t"
         "movl %%eax, %[oldval]"
         : [oldval]"=r"(res) : [addr]"r"(p) : "eax", "memory");
      return res;
    }

Note that the assembly template can contain several instructions, using \n\t in the template to separate the instructions. Also note the use of %% in the assembly template to stand for a single % character in the generated assembly code.

Since the template uses eax as a temporary register, we must list register eax as clobbered. Besides informing the compiler that the previous contents of eax are destroyed, this clobber annotation also ensures that none of the asm operands (res and p) are allocated to eax.

On 32-bit target platforms, register operands of type long long or unsigned long long (64-bit integers) need special handling, since CompCert allocates them into pairs of 32-bit registers. The assembly template must use %R to refer to the most significant half of the register pair, and %Q to refer to the least significant half. For example, here is how to use the x86 rdtsc instruction to read the time stamp counter as an unsigned 64-bit integer:

    unsigned long long rdtsc(void)
    {
      unsigned long long res;
      asm("rdtsc\n\t"
          "movl %%eax, %Q0\n\t"
          "movl %%edx, %R0\n\t"
          : "=r" (res) : : "eax", "edx");
      return res;
    }
Reference description

The syntax of extended inline assembly is as follows:

statement::=… ∣ asm   (const|volatile)*   (   "template"   asm-operands   )
asm-operands::=: asm-output :  asm-inputs :  asm-clobbers 
 : asm-output :  asm-inputs 
 : asm-output 
  
asm-outputs::=asm-arg ?
asm-inputs::=asm-arg ,  … ,  asm-arg
asm-clobbers::="resource" ,  … ,  "resource"
asm-arg::=([ name ])?  "constraint" (  expr  )

An asm statement carries an assembly template (a string literal) and up to 3 lists of operands separated by colons: a list of zero or one output expressions (results produced by the assembly code); a list of zero, one or several input expressions (arguments used by the assembly code); and a list of zero, one or several resources (e.g. processor registers) that are “clobbered” by the assembly code.

The assembly template is a string literal possibly containing placeholders marked with a % (percent) character: either numbered placeholders (%0, %1, etc) or named placeholders (%[name]). During compilation, the placeholders are replaced by the locations of the corresponding operands, then the resulting text is given to the assembler as is. A %% sequence in the template is translated to a single % character in the text. Operands are numbered consecutively starting with %0 for the first output operand (if any) and continuing with the input operands. Instead of numbers, the template can also refer to operands by their optional names, using the %[name] notation.

The assembly outputs and the inputs are comma-separated, possibly empty lists of C expressions preceded by an optional name between brackets and a mandatory constraint (a string literal). CompCert can handle zero or one output; two outputs or more cause a compile-time error. For inputs, the following constraints are supported:

Input constraintMeaning
"r"Register. The expression is evaluated into a processor register, whose name is inserted in the assembly template.
"m"Memory location. The expression is evaluated into a memory location, whose address is inserted (as a valid processor addressing mode) in the assembly template.
"i"Integer immediate. The expression must be a compile-time constant. Its value is inserted in the assembly template as a decimal literal.

For outputs, the C expressions must be l-values, and the following constraints are supported:

Input constraintMeaning
"=r"Register. This is either the register allocated to the output expression (if it is a local variable allocated to a register), or a temporary register chosen by CompCert, whose value is assigned to the expression just after the assembly code.
"=m"Memory location. This is the memory location of the output expression. Its address is inserted (as a valid processor addressing mode) in the assembly template.

The fourth, optional component of an asm statement is a comma-separated list of resources that are “clobbered” by the assembly code fragment, i.e. set to unpredictable values. The resources, given as string literals, are either processor register names or the special resources "memory" and "cc", denoting unpredictable changes to the memory and the processor’s condition codes, respectively. CompCert always assumes that inline assembly can modify memory and condition codes in unpredictable ways, even if the "memory" and "cc" clobbers are not specified. The register names are case-insensitive and depend on the target processor, as follows:

ProcessorRegister names
ARMinteger registers: "r0", "r1", …, "r12", "r14"
 VFP registers: "f0", "f1", …, "f15"
PowerPCinteger registers: "r0", "r3", …, "r12", "r14", …, "r31"
 FP registers: "f0", "f1", "f2", …, "f31"
x86 32 bitsinteger registers: "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"
 XMM registers: "xmm0", "xmm1", …, "xmm7"
x86 64 bitsinteger registers: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
 integer registers: "r8", …, "r15"
 XMM registers: "xmm0", "xmm1", …, "xmm15"
RISC-Vinteger registers: "x5", "x6", …, "x29", "x30"
 FP registers: "f0", "f1", …, "f29", "f30"

Registers not listed above are reserved for use by the compiler and must not be modified by inline assembly code. For example, the stack pointer register (r13 for ARM, r1 for PowerPC, esp/rsp for x86, x2 for RISC-V) must be preserved.


Previous Up Next