Chapter 6 Language extensions

This chapter describes several extensions to the C99 standard implemented by CompCert: compiler pragmas (section 6.1), attributes (section 6.2), built-in functions (section 6.3), code annotation mechanisms (sections 6.4 and 6.5), and GCC-style extended inline assembly (section 6.6).

6.1 Pragmas

This section describes the pragmas supported by CompCert C. The compiler emits a warning for an unrecognized pragma.

#pragma reserve_register reg-name

Ensure that all subsequent function definitions do not use register reg-name in their compiled code, and therefore preserve the value of this register. The register must be a callee-save register. The following register names are allowed:

On PowerPC:	R14, R15, …, R31 (general-purpose registers)
	F14, F15, …, F31 (float registers)
On ARM:	R4, R5, …, R11 (integer registers)
	F8, F9, …, F15 (float registers)
On x86-32:	EBX, ESI, EDI, EBP (32-bit integer registers)
On x86-64:	RBX, RBP, R12, R13, R14, R15 (64-bit integer registers)
On RISC-V:	X8, X9; X18, X19, …, X27 (integer registers)
	F8, F9; F18, F19, …, F27 (float registers)
On AArch64:	X19, X20, …, X29 (integer registers)
	D8, D9, …, D15 (float registers)

#pragma section ident "iname" "uname" addr-mode access-mode

Define a new linker section, or modify the characteristics of an existing linker section. The parameters are as follows:

ident: The compiler internal name for the section.
iname: The linker section name to be used for initialized variables and for function code.
uname: The linker section name to be used for uninitialized variables.
addr-mode: The addressing mode used to access variables located in this section. On PowerPC, setting addr-mode to near-data denotes a small data area, which is addressed by 16-bit offsets relative to the corresponding small data area register. On PowerPC, setting addr-mode to far-data denotes a relocatable data area, which is addressed by 32-bit offsets relative to the corresponding register. Any other value of addr-mode denotes standard absolute addressing.
access-mode: One or several of R to denote a read-only section, W to denote a writable section, and X to denote an executable section.

Functions and global variables can be explicitly placed into user-defined sections using the #pragma use_section directive (see below). Another, more indirect, less recommended way is to modify the characteristics of the default sections in which the compiler automatically place function code, global variables, and auxiliary compiler-generated data. These default sections are as follows:

Internal name	What is put there
DATA	global, non-const variables of size greater than N bytes. N defaults to 0 and can be set using the -fsmall-data command-line option.
CONST	global, const variables of size greater than N bytes. N defaults to 0 and can be set using the -fsmall-const command-line option.
SDATA	global, non-const variables of size less than or equal to N bytes. N defaults to 0 and can be set using the -fsmall-data command-line option.
SCONST	global, const variables of size less than or equal to N bytes. N defaults to 0 and can be set using the -fsmall-const command-line option.
STRING	string literals
CODE	machine code for function definitions
LITERAL	constants (e.g. floating-point literals) referenced from function code
JUMPTABLE	jump tables generated for switch statements

A simpler, alternate way to control placement into sections is to use the section attribute.

Example: explicit placement into user-defined sections. We first define four new compiler sections.

    #pragma section MYDATA "mydata_i" "mydata_u" standard RW
    #pragma section MYCONST "myconst" "myconst" standard R
    #pragma section MYSDA "mysda_i" "mysda_u" near-data RW
    #pragma section MYCODE "mycode" "mycode" standard RX

We then use the #pragma use_section to place variables and functions in these sections, then define the variables and functions in question.

    #pragma use_section MYDATA a b
    int a;                // uninitialized; goes into linker section "mydata_u"
    double b = 3.1415;    // initialized; goes into linker section "mydata_i"
    #pragma use_section MYCONST c
    const short c = 42;   // goes into linker section "myconst"
    #pragma use_section MYSDA d e
    int d;                // goes into linker section "mysda_u"
    double e = 2.718;     // goes into linker section "mysda_i"
    #pragma use_section MYCODE f
    int f(void) { return a + d; }
                          // goes into linker section "mycode"
                          // accesses d via small data relative addressing
                          // accesses a via absolute addressing

Example: implicit placement by modifying the compiler default sections. In this example, we assume that the options -fsmall-data 8 and -fsmall-const 0 are given.

    #pragma section DATA "mydata_i" "mydata_u" standard RW
    #pragma section CONST "myconst" "myconst" standard R
    #pragma section SDATA "mysda_i" "mysda_u" near-data RW
    #pragma section CODE "mycode" "mycode" standard RX
    #pragma section LITERAL "mylit" "mylit" standard R

    int a;            // small data, uninitialized; goes into "mysda_u"
    char b[16];       // big data, uninitialized; goes into "mydata_u"
    const int c = 42; // big const data, initialized; goes into "myconst"
    double f(void) { return c + 3.14; }
                      // code goes into "mycode"
                      // literal 3.14 goes into "mylit"

Caveat: when using non-standard sections, the linker must, in general, be informed of these sections and how to place them in the final executable, typically by providing an appropriate mapping file to the linker.

#pragma use_section ident var …

Explicitly place the global variables or functions var, … in the compiler section named ident. The use_section pragma must occur before any declaration or definition of the variables and functions it mentions. See #pragma section above for additional explanations and examples.

6.2 Attributes

Like the GCC compiler, the CompCert C compiler allows the programmer to attach attributes to various parts of C source code.

An attribute qualifier is of the form __attribute((attribute-list)) or __attribute__((attribute-list)), where attribute-list is a possibly empty, comma-separated lists of attributes. Each attribute is of the following form:

attribute	::=	ident
	∣	ident(attr-arg,…,attr-arg)
attr-arg	::=	ident
	∣	"string-literal"
	∣	const-expr

Each attribute carries a name and zero or more arguments, which can be identifiers, string literals, or C expressions of integer types that are compile-time constants.

Attribute names can be specified with __ (two underscore characters) preceding and following each name. For instance, __aligned__(N) and aligned(N) denote the same attribute.

For compatibility with other C compilers, the keyword __packed__ is also recognized as an attribute:

__packed__(params)

is equivalent to

__attribute((packed(params)))

In C source files, attribute qualifiers can occur anywhere a standard type qualifier (const, volatile) can occur, and also just after the struct and union keywords. For partial compatibility with GCC, the CompCert parser allows attributes to occur in several other places, but may silently ignore them.

Warning. Some standard C libraries, when used in conjunction with CompCert, deactivate the __attribute__ keyword: the standard includes, or CompCert itself, define __attribute__ as a macro that erases its argument. This is the case for the Glibc standard library under Linux, and the Xcode header files under macOS. For this reason, please use the __attribute keyword in preference to the __attribute__ keyword.

The following attributes are recognized by CompCert. For unrecognized attributes CompCert issues a warning.

aligned(N)

(name attribute) Specify the alignment to use for a variable or a struct member. The argument N is the desired alignment, in bytes. It must be a power of 2. This attribute is equivalent to the qualifier _Alignas(N).

noinline

(function attribute) Prevents a function from being considered for inlining.

noreturn

(function attribute) Indicate that a function never returns normally. This information is currently not used for code optimization, but only to produce more precise warnings. This attribute is similar to the _Noreturn function specifier from ISO C 2011, but it can also appear in function pointer types.

packed(max-member-alignment, min-struct-alignment, byte-swap)

(structure attribute) This attribute is recognized only if the -fpacked-structs option is active (section 3.2.9).

The packed attribute applies to a struct declarations and affects the memory layout of the members of this struct. Zero, one, two or three integer arguments can be provided.

If the max-member-alignment parameter is provided, the natural alignment of every member (field) of the structure is reduced to at most max-member-alignment. In particular, if max-member-alignment = 1, members are not aligned, and no padding is ever inserted between members.

If the min-struct-alignment parameter is provided, the natural alignment of the whole structure is increased to at least min-struct-alignment.

If the byte-swap parameter is provided and equal to 1, accesses to structure members of integer or pointer types are performed using the opposite endianness than that of the target platform. For PowerPC, accesses are done in little-endian mode instead of the natural big-endian mode; for x86 and RISC-V accesses are done in big-endian mode instead of the natural little-endian mode; for ARM — where the endianness is configuration dependent — accesses are done in big-endian mode when configured as little-endian and vice versa.

Examples:

    struct __attribute__((packed(1))) s1 {    // suppress all padding
      char c;           // at offset 0
      short s;          // at offset 1
      int i;            // at offset 3
    };                 // total size is 7, structure alignment is 1

    struct __attribute__((packed(4,16,1))) s2 {
      char c;           // at offset 0
      short s;          // at offset 4 (because 4-aligned); byte-swap at access
      int i;            // at offset 8 (because 4-aligned); byte-swap at access
    };                 // total size is 16, structure alignment is 16

    struct s3 {        // default layout
      char c;           // at offset 0
      short s;          // at offset 2 (because 2-aligned)
      int i;            // at offset 4
    };                 // total size is 8, structure alignment is 4

Limitations: For a byte-swapped structure, all members should be of integer or pointer types, or arrays of those types.

Reduced member alignment should not be used on the ARM platform, since unaligned memory accesses on ARM can crash the program or silently produce incorrect results. Only the PowerPC, x86 and RISC-V platforms support unaligned memory accesses.

Bit fields are not supported inside byte-swapped structures and reduced-alignment structures.

packed

(structure attribute) This form of the packed attribute is equivalent to packed(1). It causes the the structure it modifies to be laid out with no alignment padding between the members. The size of the resulting structure is therefore the sum of the sizes of its members. The alignment of the resulting structure is 1.

section("section-name")

(object attribute) Specify the linker section section-name where to place functions and global variables whose types carry this section attribute. The linker section is declared as executable read-only if the attribute applies to a function definition; as read-only if the attribute applies to a const variable definition; and as read-write if the attribute applies to a non-const variable definition.

The #pragma section and #pragma use_section directives can be used to obtain finer control on user-defined sections (section 6.1).

unused

(object attribute) This attribute, attached to a variable or function parameter, means that the variable/parameter is meant to be possibly unused. CompCert will not produce a warning for it.

Interpretation of attributes. Attributes are attached to parts of the program in a way that depends on the kind of the attribute:

Type attributes are treated like the standard type qualifiers const, volatile, etc.
Structure attributes such as packed apply to struct, union and enum types.
Name attributes such as aligned apply to the name being defined: variable, function, type name (typedef), struct, union, enum, or member of struct or union.
Function attributes such as noreturn apply to function declarations or definitions.
Object attributes such as section apply to the variable or function being declared or defined.

For example, consider

  __attribute((noreturn, section("foo"))) void (*fptr) (int x);

The section attributes applies to variable fptr while the noreturn attribute applies to the function type. The meaning, therefore, is that fptr is a variable in section foo that has type “pointer to non-returning functions from int to void”.

  __attribute((aligned(16))) char * p;
  typedef __attribute((aligned(16))) char char16;
  char16 * q;

The aligned attribute applies to the variable name p and to the type name char16. Consequently, p is a 16-aligned pointer to naturally-aligned char, while q is a naturally-aligned pointer to 16-aligned char.

6.3 Built-in functions

Built-in functions are functions that are predefined in the initial environment — the environment in which the compilation of a source file starts. In other words, these built-in functions are always available: there is no need to include header files.

Built-in functions give access to interesting capabilities of the target processor that cannot be exploited in standard C code. For example, most processors provide an instruction to quickly compute the absolute value of a floating-point number. In CompCert C, this instruction is made available to the programmer via the __builtin_fabs function. It provides a faster alternative to the fabs library function from <math.h>.

Invocations of built-in functions are automatically inlined by the compiler at point of use. It is a compile-time error to take a pointer to a built-in function.

Some built-in functions are available on all target platforms supported by CompCert. Others are specific to a particular platform.

6.3.1 Common built-in functions

Integer arithmetic

unsigned int __builtin_bswap32(unsigned int x): Return x with the order of the 4 bytes reversed. If x is of the form 0xaabbccdd, the result is 0xddccbbaa.
unsigned int __builtin_bswap(unsigned int x): A synonym for __builtin_bswap32.
unsigned short __builtin_bswap16(unsigned short x): Return x with the order of the 2 bytes reversed. If x is of the form 0xaabb, the result is 0xbbaa.
unsigned long long __builtin_bswap64(unsigned long long x): Return x with the order of the 8 bytes reversed. If x is of the form 0xaabbccddeeffgghh, the result is 0xhhggffeeddccbbaa.
int __builtin_clz(unsigned int x)
int __builtin_clzl(unsigned long x)
int __builtin_clzll(unsigned long long x): Count the number of consecutive zero bits in x, starting with the most significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 31 inclusive if the type of x is 32-bit wide, or between 0 and 63 inclusive if the type of x is 64-bit wide. On other target architectures, the result is between 0 and 32 inclusive if the type of x is 32-bit wide, or between 0 and 64 inclusive if the type of x is 64-bit wide.
int __builtin_ctz(unsigned int x)
int __builtin_ctzl(unsigned long x)
int __builtin_ctzll(unsigned long long x): Count the number of consecutive zero bits in x, starting with the least significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 31 inclusive if the type of x is 32-bit wide, or between 0 and 63 inclusive if the type of x is 64-bit wide. On other target architectures, the result is between 0 and 32 inclusive if the type of x is 32-bit wide, or between 0 and 64 inclusive if the type of x is 64-bit wide.

Floating-point arithmetic

double __builtin_fabs(double x)
float __builtin_fabsf(float x): Return the floating-point absolute value of its argument, or NaN if the argument is NaN. The __builtin_fabs function is equivalent to the fabs() function from the <math.h> standard library, and __builtin_fabsf is equivalent to fabsf(), but the built-in functions execute faster.
double __builtin_fsqrt(double x)
double __builtin_sqrt(double x): Return the square root of x, like the sqrt function from the <math.h> standard library. On PowerPC the corresponding instruction is optional and not supported by all processors. Note: PowerPC e5500 does not support the fsqrt instruction.

Conditionals

TYPE __builtin_sel(_Bool cond, TYPE iftrue, TYPE iffalse): Return iftrue if cond is true, iffalse if cond is false. TYPE may be any integer, enumeration, pointer, or floating-point type. Expands to a branch-free instruction sequence if supported by the target architecture.
int __builtin_constant_p(EXPR): Return 1 if the argument expression EXPR is a compile-time constant (as defined by ISO C 99, section 6.6), 0 otherwise. The argument can be of any type.

Block copy with known size and alignment

void __builtin_memcpy_aligned (void * dst, const void * src, size_t sz, size_t al)

Copy sz bytes from the memory area at src to the memory area at dst. The source and destination memory areas must be either disjoint or identical; the behavior is undefined if they overlap. The pointers src and dst must be aligned on an al byte boundary, where al is a power of 2. The sz and al arguments must be compile-time constants. A typical invocation is

    __builtin_memcpy_aligned(&dst, &src, sizeof(dst), _Alignof(type_dst));

where dst and src are two objects of the same complete type type_dst. An invocation of __builtin_memcpy_aligned(dst,src,sz,al) behaves like memcpy(dst,src,sz) as defined in the <string.h> standard library. Knowing the size and alignment at compile-time enables the compiler to generate very efficient inline code for the copy.

Synchronization

void __builtin_membar(void): Software memory barrier. Prevent the compiler from moving memory loads and stores across the call to __builtin_membar. No processor instructions are generated, hence the hardware can still reorder memory accesses. To generate hardware memory barriers, see the “synchronization” built-in functions specific to each processor.

Optimization hints

long __builtin_expect(long exp, long cst)

Inform the compiler that the expression exp is likely to evaluate to value cst (usually a compile-time constant). The return value is that of exp. For example:

    if (__builtin_expect(p == NULL, 0)) { ... } else { ... }

This tells the compiler that p is usually not NULL and, therefore, that the else branch is more likely to be executed than the then branch.

void __builtin_unreachable(void)

Inform the compiler that the point of the __builtin_unreachable is never reached during program execution. (If control flow does reach this point, the program is undefined.) For example, __builtin_unreachable() can be added after a call to a function that does not return but was not declared _Noreturn.

6.3.2 PowerPC built-in functions

Integer arithmetic

int __builtin_mulhw(int x, int y): Return the high 32 bits of the full 64-bit product of two signed integers.
unsigned int __builtin_mulhwu(unsigned int x, unsigned int y): Return the high 32 bits of the full 64-bit product of two unsigned integers.
long long __builtin_mulhd(long long x, long long y): Return the high 64 bits of the full 128-bit product of two signed long longs (only available for 32/64-bit hybrid PowerPC).
unsigned long long __builtin_mulhdu (unsigned long long x, unsigned long long y): Return the high 64 bits of the full 128-bit product of two unsigned long longs (only available for 32/64-bit hybrid PowerPC).
int __builtin_isel(_Bool cond, int iftrue, int iffalse)
unsigned int __builtin_uisel (_Bool cond, unsigned int iftrue, unsigned int iffalse)
_Bool __builtin_bsel(_Bool cond, _Bool iftrue, _Bool iffalse): Return iftrue if cond is true, iffalse if cond is false. Expands to an isel instruction on Freescale EREF processors or a simple mr instruction in the case that iftrue and iffalse are in the same register, and to a branch-free instruction sequence on other PowerPC processors.
signed long long __builtin_isel64 (_Bool cond, signed long long iftrue, signed long long iffalse)
unsigned long long __builtin_uisel64 (_Bool cond, unsigned long long iftrue, unsigned long long iffalse): Return iftrue if cond is true, iffalse if cond is false. Expands to an isel instruction on Freescale EREF processors, or a simple mr instruction in the case that iftrue and iffalse are in the same register, and to a branch-free instruction sequence on other PowerPC processors (only available for 32/64-bit hybrid PowerPC).

Floating-point arithmetic

double __builtin_fmadd(double x, double y, double z): Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y.
double __builtin_fmsub(double x, double y, double z): Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y.
double __builtin_fnmadd(double x, double y, double z): Fused multiply-add-negate. Compute -(x*y + z) without rounding the intermediate product x*y.
double __builtin_fnmsub(double x, double y, double z): Fused multiply-sub-negate. Compute -(x*y - z) without rounding the intermediate product x*y.
double __builtin_frsqrte(double x): Compute an estimate (with relative accuracy 1/32) of 1 / √x, the reciprocal of the square root of x. The corresponding PowerPC instruction is optional and not supported by all processors (PowerPC e5500 supports this instruction).
float __builtin_fres(float x): Compute an estimate (with relative accuracy 1/256) of the single-precision reciprocal of x. The corresponding PowerPC instruction is optional and not supported by all processors (PowerPC e5500 supports this instruction).
double __builtin_fsel(double x, double y, double z): Return y if x is greater or equal to 0.0. Return z if x is less than 0.0 or is NaN. The corresponding PowerPC instruction is optional and not supported by all processors (PowerPC e5500 supports this instruction).
int __builtin_fcti(double x): Round the given floating-point number x to an integer and return this integer. The difference with the standard C conversion (int)x is that the latter rounds x towards zero, while __builtin_fcti(x) rounds x according to the current rounding mode (by default: to the nearest integer, ties round to even).

Memory accesses with reversed endianness

unsigned short __builtin_read16_reversed(const unsigned short * ptr): Read a 16-bit integer at address ptr and reverse its endianness by swapping the two bytes of the result, as __builtin_bswap16 does.
unsigned int __builtin_read32_reversed(const unsigned int * ptr): Read a 32-bit integer at address ptr and reverse its endianness by swapping the four bytes of the result, as __builtin_bswap32 does.
unsigned long long __builtin_read64_reversed (const unsigned long long * ptr): Read a 64-bit integer at address ptr and reverse its endianness by swapping the eight bytes of the result, as __builtin_bswap64 does. (Currently only available for 32/64-bit hybrid PowerPC.)
void __builtin_write16_reversed(unsigned short * ptr, unsigned short x): Reverse the endianness of x by swapping its two bytes, then write the 16-bit result at address ptr.
void __builtin_write32_reversed(unsigned int * ptr, unsigned int x): Reverse the endianness of x by swapping its four bytes, then write the 32-bit result at address ptr.
void __builtin_write64_reversed (unsigned long long * ptr, unsigned long long x): Reverse the endianness of x by swapping its eight bytes, then write the 64-bit result at address ptr. (Currently only available for 32/64-bit hybrid PowerPC.)

Synchronization instructions

void __builtin_eieio(void): Issue an eieio barrier.
void __builtin_sync(void): Issue a sync barrier.
void __builtin_isync(void): Issue an isync barrier.
void __builtin_lwsync(void): Issue an lwsync barrier.
void __builtin_mbar(int level): Issue a mbar barrier. The integer argument must be 0 or 1.
void __builtin_trap(void): Abort the program on an unconditional trap instruction.

Cache control instructions

void __builtin_dcbf(void * addr)
void __builtin_dcbi(void * addr)
void __builtin_dcbtls(void * addr, int level)
void __builtin_dcbz(void * addr)
void __builtin_icbi(void * addr)
void __builtin_icbtls(void * addr, int level): Issue the corresponding cache control instruction. addr is the address of the cache block affected. level must be the constant 0 (L1 cache) or 2 (L2 cache).
void __builtin_prefetch(void * addr, int for_write, int level): Issue a dcbt instruction if for_write is 0 and a dcbtst instruction if for_write is 1.

Access to special registers

unsigned int __builtin_get_spr(int spr)
unsigned long long __builtin_get_spr64(int spr)
void __builtin_set_spr(int spr, unsigned int value)
void __builtin_set_spr64(int spr, unsigned long long value): The spr argument is the number of the special register accessed. It must be a compile-time constant.

Atomic (sequentially-consistent) memory operations

void __builtin_atomic_exchange(int * addr, int * new, int * old): Store the current value of *addr in *old and set *addr to the value *new.
void __builtin_atomic_load(int * p, int * val): Read the current value of *p and store it into *val.
_Bool __builtin_atomic_compare_exchange (int * p, int * expected, int * desired): Compare the current value of *p with *expected. If equal, set *p to the value *desired and return 1. If different, set *expected to the current value of *p and return 0.
int __builtin_sync_fetch_and_add(int * p, int delta): Add delta to the contents of *p. Return the initial value of *p before the addition.

Note: CompCert does not model multi-threaded execution and only gives the same guarantees for the atomic memory operations as it does for all other built-in functions. The built-in itself expand to a sequence of PowerPC instructions that are suitable to implement a sequentially-consistent variant of these atomic memory operations.

Note: CompCert uses the lwarx and stwcx. instructions in the expansion of the atomic builtins. Therefore, memory areas accessed by the builtins must be naturally aligned and should be in pages that are marked as WIMGE = 001xx. Otherwise, exceptions may occur.

6.3.3 x86 built-in functions

Floating-point arithmetic

double __builtin_fmax(double x, double y): Return the greater of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmin(double x, double y): Return the smaller of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmadd(double x, double y, double z): Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fmsub(double x, double y, double z): Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fnmadd(double x, double y, double z): Fused multiply-negate-add. Compute -(x*y) + z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fnmsub(double x, double y, double z): Fused multiply-negate-sub. Compute -(x*y) - z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.

6.3.4 ARM built-in functions

Synchronization instructions

void __builtin_dmb(void): Issue a dmb (data memory) barrier.
void __builtin_dsb(void): Issue a dsb (data synchronization) barrier.
void __builtin_isb(void): Issue an isb (instruction synchronization) barrier.

6.3.5 RISC-V built-in functions

Floating-point arithmetic

double __builtin_fmax(double x, double y): Return the greater of x and y. If both x and y are NaN, the default NaN is returned. If x is NaN but not y, y is returned. If y is NaN but not x, x is returned.
double __builtin_fmin(double x, double y): Return the smaller of x and y. If both x and y are NaN, the default NaN is returned. If x is NaN but not y, y is returned. If y is NaN but not x, x is returned.
double __builtin_fmadd(double x, double y, double z): Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y.
double __builtin_fmsub(double x, double y, double z): Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y.
double __builtin_fnmadd(double x, double y, double z): Fused multiply-add-negate. Compute -(x*y) - z without rounding the intermediate product x*y.
double __builtin_fnmsub(double x, double y, double z): Fused multiply-sub-negate. Compute -(x*y) + z without rounding the intermediate product x*y.

6.3.6 AArch64 built-in functions

Integer arithmetic

int __builtin_cls(int x)
int __builtin_clsl(long x)
int __builtin_clsll(long long x): Count the number of consecutive bits equal to the sign bit in x, starting with the most significant bit. The count does not include the sign bit itself, so the result will be in the range 0 to 31 (for __builtin_cls) and in the range 0 to 63 (for __builtin_clsl and __builtin_clsll).

Floating-point arithmetic

double __builtin_fmax(double x, double y): Return the greater of x and y. If x or y are NaN, a NaN is returned.
double __builtin_fmin(double x, double y): Return the smaller of x and y. If x or y are NaN, a NaN is returned.
double __builtin_fmadd(double x, double y, double z): Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y.
double __builtin_fmsub(double x, double y, double z): Fused multiply-sub. Compute (-x)*y + z without rounding the intermediate product (-x)*y.
double __builtin_fnmadd(double x, double y, double z): Fused multiply-add-negate. Compute (-x)*y + (-z) without rounding the intermediate product (-x)*y.
double __builtin_fnmsub(double x, double y, double z): Fused multiply-sub-negate. Compute x*y + (-z) without rounding the intermediate product x*y.

6.4 Embedded program annotations for a³

CompCert C provides a mechanism to annotate C source code with information for AbsInt’s a³ framework (a³ is an acronym for AbsInt Advanced Analyzer) combining various analyzers modules like aiT, TimingProfiler, StackAnalyzer, or ValueAnalyzer.

The analyzer modules of a³ typically operate on machine code in fully linked executables and usually expect external information for improved analysis precision to be also presented on this rather low level.

With CompCert’s annotation mechanism it is now possible to reliably annotate on C source code level and reason about C variables instead of using code addresses or processor registers. Furthermore it allows to add annotations in the AIS specification language of a³ (refer [4]). These annotations are automatically carried through the compilation chain and the linked executable into a³ without the need of external annotation files.

The annotation mechanism is available for all target architectures with ELF binary format.

Concept CompCert collects all annotations of a compilation unit and stores them in encoded form in a special section named __compcert_ais_annotations in the object file. a³ can automatically extract them from this section and utilize them in analyses.

For CompCert, annotations via __builtin_ais_annot look like a call to an external variadic function similar to printf: The first argument contains the AIS annotation and is also a format string. It can contain format specifiers (section 6.4.2) that are replaced with the AIS representation of the additional arguments to __builtin_ais_annot. In contrast to printf most format specifiers are tagged with numbers and refer to a specific argument independent of the order. It is also possible to refer to an argument more than once.

In the context of optimizations, the builtin is also a robust mechanism to attach annotations at specific code locations. It does not rely on the rather ambiguous line information of the DWARF debug information but rather utilizes the label mechanism of the assembler and linker to generate annotations with actual code addresses.

With the builtin-mechanism it is possible for CompCert to e.g. remove unreachable code together with the contained annotations or do code transformations like reordering code blocks without breaking the annotations. Consider the following C snippet:

    static void func(int count)
    {
      int i;
      for (int i = 0; i < count; i++) {
        __builtin_ais_annot("try loop %here bound: %e1;", count);
        ...
      }
      ...
    }

If constant propagation can prove that count is always zero, CompCert can remove the whole loop since it will never be executed. In such a situation the annotation will also be removed. In contrast to this, a conventional source code annotation via special formatted C comments (e.g. // ai: …) would remain visible and probably cause problems since a³ collects such annotations by scanning the source code. The same is true, when source code uses the C preprocessor for conditional compilation: CompCert can remove not used annotations while conventional source code annotations will remain visible for a³.

6.4.1 Examples

The following paragraphs depict some motivating examples on how to use the annotation mechanism with a³ in, e.g. a software library that will be integrated in an embedded system.

Label definitions AIS labels can be used to create robust references to code locations, e.g. for the location of the different branches in an if-statement:

    void funcExecTask(int)
    {
      __builtin_ais_annot("label 'funcExecTask_before_cond': %here;");

      if (condition) {
        __builtin_ais_annot("label 'funcExecTask_cond_then': %here;");
        ...
      } else {
        __builtin_ais_annot("label 'funcExecTask_cond_else': %here;");
        ...
      }
      ...
    }

A user of a³ can then easily use the defined label references for his analysis specific annotations:

    ## User ais annotations for specific analyses# annotations for code within then and else branches
try instruction label('funcExecTask_cond_then') ...;
try instruction label('funcExecTask_cond_else') ...;
# annotation for the actual branch instruction
try instruction label('funcExecTask_before_cond') -> conditional(1) ...;

Excluding critical code snippets Labels can also be used to exclude critical code snippets from analysis. First a label is defined to reference the beginning of a critical code section. Behind the critical code, a second annotation excludes the code snippet between the defined label and the current address (%here) from analysis:

    void funcExecTask(int taskNo)
    {
      ...

      // Label before critical code section
      __builtin_ais_annot("label 'execTask_begin_critical': %here;");

      // The critical code, not suitable for analysis
      ...

      // End of critical code: cut it away and annotate execution time, etc.
      // depending on function parameter
      __builtin_ais_annot("try instruction label('execTask_begin_critical')\n"
                          " snippet {\n"
                          " continue at: %here;\n"
                          " not analyzed;\n"
                          " takes: %e1 * 42 cycles;\n"
                          " ...\n"
                          " }", taskNo);

      ...
    }

Loop and recursion annotations A (probably overestimated) annotation can be specified in the source code of a common library routine to ensure that a³ can compute reasonable results without annotation effort or to increase analysis precision at specific code loations. If necessary, a user of a³ can improve this annotation by giving more specific annotations for the actual analysis context. For example:

Specifying a bound for a data dependent loop:

    int strncmp_x(char s[], char t[], int len)
    {
      int i;
      for (i = 0; i < len && ...; i++) {
        __builtin_ais_annot("try loop %here bound: 0..%e1;", len);
        ...
      }
      return 0;
    }

Providing unrolling hints for loops for improved precision in a³:

    void strcpy_x(char s[], char t[])
    {
      int i = 0;
      while (( s[i] = t[i] ) != '\0') {
        __builtin_ais_annot("try loop %here mapping { default unroll: 50; }");
        ...
      }
    }

Specifying a time bound for a busy waiting loop:

    void openCanSocket(volatile struct device_t* device)
    {
      ...
      // Busy wait for ACK. Assume a worst-case timing of 23 us
      while(device->bus_data != 0x00) {
        __builtin_ais_annot("try loop %here takes: 23 us;");
      }
      ...
    }

Specifying a recursion bound and an incarnation bound for a recursive routine.

    void errorHook(unsigned char err_code)
    {
      __builtin_ais_annot("try routine %e1 {\n"
                          " recursion bound: 1;\n"
                          " incarnation limit: 1;\n"
                          "}", &errorHook);
      ...
    }

Refining values In this example a double value with a known range is converted to an integer and used as an index, e.g. to access a data array. a³ has currently no knowledge of floating point values and needs help to restrict the range of i (and derived from it, a memory access) to a small range:

    double func(double x)
    {
      double data[10] = { ... };

      // x is known to be always >= 0.0 and < 10.0
      int i = x;

      // Refine the value in the location holding variable i
      __builtin_ais_annot("try instruction %here { enter with: %l1 = 0..9; };",
                          i);

      return data[i];
    }

This will result in the following PowerPC assembly code, where \%here will be replaced by CompCert together with the assembler/linker with the address of label .L118:

    ; int i = x;
      fctiwz  f13, f1
      stfdu   f13, -8(r1)
      lwz     r5, 4(r1)
      addi    r1, r1, 8

    .L117:
    ; __builtin_ais_annot("try instruction %here { enter with: %l1 = 0..9; };",
    ;                     i);

    .L118: .L119:
    ; return data[i];
      addi    r3, r1, 16
      rlwinm  r4, r5, 3, 0, 28 ; 0xfffffff8
      lfdx    f1, r3, r4

    .L120:
      addi    r1, r1, 96

    .L121:
      blr

    ...

    .section  "__compcert_ais_annotations",,n
    .ascii "# file:test.c line:25 function:func\ntry instruction "
    .byte 7,4
    .4byte .L118
    .ascii " { enter with: reg("r5") = 0..9; };"
    .ascii "\n"

The annotation, as extracted by a³ will then look as follows:

    # test.c line:25 function:func
instruction 0x10013c { enter with: reg("r5") = 0..9; };

Another possibility for refinement of values and improved analysis precision in a³ is to insert assert annotations about known value ranges of variables or function parameters. a³ utilizes those assertions and may also be able to report violations.

    int func(int a, int b, int c)
    {
      __builtin_ais_annot("try instruction %here {\n"
                          " assert always enter with: %e1 in (0..7);\n"
                          " assert always enter with: %e2 in (0..7);\n"
                          " assert always enter with: %e3 in (0..7);\n"
                          "};", a, b, c);
      ...
    }

Areas The contents of memory areas can also be specified for a³.

The following example assumes an embedded device that is rather slow when executing code located in ROM, and faster when running code from RAM. It is a common pattern in such a context to dynamically copy (e.g. once at system boot time) tiny hotspot functions from ROM into RAM buffers and execute the code from those buffers instead of calling the actual routine. Without further annotations a³ usually cannot know which code will be executed and hence may report unresolved computed branches in function ISR1_hwcheck.

A source level annotation can specify for a³ which code is copied into RAM buffers:

    typedef unsigned char(*CryptFktPtr_t)(unsigned char);

    // Assume this routine to be a hotspot and copied to crypt_ram buffer
    unsigned char crypt_function (unsigned char input)
    {
      return (input ^ 0xCAFE);
    }

    // Code buffer for execution from RAM
    volatile char crypt_ram[50];

    // Setup routine at boot time
    void init_crypt_ram()
    {
      __builtin_ais_annot("copy area %e1 width %e2 from %e3;",
                          &crypt_ram,
                          sizeof(crypt_ram),
                          &crypt_function);

      memcpy((void *)crypt_ram, &crypt_function, 50);
    }

    void ISR1_hwcheck(void)
    {
      CryptFktPtr_t f = (CryptFktPtr_t)((char *)crypt_ram);
      ...
      cs = f(s);
    }

Memory areas that are used for communication with external devices can be marked accordingly:

    volatile char* buffer_in[128];

    void init_device()
    {
      __builtin_ais_annot("area %e1 width %e2 {\n"
                          " readable: true;\n"
                          " writable: false;\n"
                          " volatile;\n"
                          "};", &buffer_in, sizeof(buffer_in));

      ...
    }

6.4.2 Reference description

Format specifiers __builtin_ais_annot supports the following format specifiers that are replaced with information in AIS-Syntax on export:

%here is replaced with the absolute address of the annotation location in the final executable. For example consider an annotation like:
```
    __builtin_ais_annot("instruction %here -> call(1) { ... };");
```
a³ will see this as:
```
    # file:... line:... function:...
instruction 0x1000 -> call(1) { ... };
```
Note, that no debug information is necessary for the %here-replacement. Instead it relies only on the label mechanism of the assembler and linker.
Expressions, i.e. %e1, %e2, …: they are replaced with an AIS expression for the value of the first, second, … additional argument. The resulting AIS expressions have the form:
- An integer constant if the argument is a constant expression or could be optimized to one. This can e.g. happen for C variables that are used as constants within the code.
- A register value, e.g. reg("r1"). This may be generated for values of variables that are currently held in single registers.
- An expression for a composed value, e.g. (reg("r1")*0x10000000)+reg("r2"). This is the output for the value of 64bit variables that are currently held in a register pair.
- A value in a stack cell, e.g. mem(reg("r1") + 8, 4). This is generated for values of variables that are located on the stack.
- An address in the local stackframe, e.g. (reg("r1") + 8). Using the address-of operator on a local variable or a function parameter forces CompCert to locate that variable in the stack frame instead of a register.
- An address in global memory, e.g. (0x1234 + 0), where 0x1234 is the start address of a global symbol.
- A value in a global memory cell, e.g. mem(0x1234 + 0, 4).
- The sum of two AIS expressions, e.g. to compose a complex address (expr₁ + expr₂).
L-values, i.e. %l1, %l2, …: similar to expressions, they are replaced with an AIS expression for the value of the first, second, … additional argument. The difference is that %ln is intended for use in an L-value position within an AIS expression, which reduces the available AIS syntax to three cases:
- A register value, e.g. reg("r1").
- A stack cell, e.g. mem(reg("r1") + 8, 4)
- A global memory cell, e.g. mem(0x1234 + 0, 4)
For parameters that cannot be expressed in one of the three forms, CompCert will issue a warning and ignore the annotation. Apart from this restriction, %l and %e produce the same output.

Extraction order When using the __builtin_ais_annot feature, CompCert collects all annotations contained in a compilation unit and stores them in encoded form in a special section of the object file (__compcert_ais_annotations). While creating the final executable, the linker collects all annotation sections from the object files, concatenates them, and stores the result in the executable.

The order in which annotations are exported into the final executable is explicitly undefined. This applies to the extraction within each compilation unit as well as the merging process done by the linker. It is therefore not possible to rely on a specific order in which a³ will see the annotations.

Semantics Semantically, CompCert treats __builtin_ais_annot as a call to an external function. CompCert will not generate actual code for a call to an external function, but the parameters of the builtin will be evaluated, as it is mandatory in C semantics.

If needed a architecture specific nop instructions will be inserted in order to prevent merging of a generated label with a following label. For example if an annotation is directly followed by a do-while and the annotation label would be otherwise merged with the loop head.

If all additional arguments are non-volatile C variables or compile-time constant expressions, it is guaranteed that no additional code will be generated for __builtin_ais_annot. Moreover, the location displayed as a replacement for the %e or %l sequence is guaranteed to be the location where the corresponding variable resides. If one or several additional arguments are complex expressions (neither non-volatile variables nor constant expressions), useless code is generated to compute their values and leave them in temporary registers or stack locations. The location displayed as a replacement for the %e or %l sequence is that of the corresponding temporary.

It should also be noted that using symbols in parameter expressions to __builtin_ais_annot may also have an effect on the liveness of those symbols and hence prevent some optimizations. Furthermore, since __builtin_ais_annot is considered a call to an external function it also acts as a barrier for many optimizations.

In the current implementation, __builtin_ais_annot can only be used at places where C statements are valid, i.e. within a function definition, but not on the toplevel.

Lastly, CompCert has no knowledge about the AIS syntax and does neither a syntactical nor a semantical check on the annotations.

6.4.3 Best practices

Avoiding side effects Annotations with __builtin_ais_annot should be pure code, i.e. side effects with the application code should be avoided in the parameter expressions of the builtin. This improves the clarity of the application code and also allows compatibility with other compilers, e.g. via conditional compilation.

    // Bad style: a builtin-call modifying 'len'
    __builtin_ais_annot("loop %here bound: 0..%e1;", ++len);"

Multiline annotations AIS annotations often span multiple lines. The simplest way to reproduce this visually with the builtin is by utilizing the automatic concatenation of string literals in C. Newline characters have to be inserted as an escape sequence \n. For example, consider the annotation:

    __builtin_ais_annot("# a comment"
                        " and "
                        "another comment\n"
                        "annotation line 1\n"
                        "annotation line 2");

This is equivalent to an AIS file containing:

    # a comment and another comment
annotation line 1
annotation line 2

String literals and quoting AIS annotations itself can also contain string literals, e.g. to refer to symbol names. Translating this directly to a string literal for use with __builtin_ais_annot would require quoting of all special characters. It is therefore recommended to use single quoted string literals in AIS annotations where possible. Example:

    routine "name" { ... };

Directly translating this would result in:

    __builtin_ais_annot("routine \"name\" { ... };");

Note the escape characters \ which are necessary to encode " into the C string literal. Using single quotes on AIS-level results in a more readable annotation:

    __builtin_ais_annot("routine 'name' { ... };");

There are two levels of quoting: the first one on C-level to encode the AIS content as a string literal. The second one is the quoting on AIS-level to encode special characters in the AIS-syntax.

Separate analyses of tasks Currently a³ can extract either all annotations embedded in an executable or none. Analyses that cover only a portion of the binary code – e.g. when doing a separate analysis for each task of the executable – may therefore issue warnings for annotations of unreachable program points. The try keyword of AIS can be used to suppress such warnings:

    __builtin_ais_annot("try instruction 'name' -> computed(1) { ... };");

If name is not reachable from the starting point of a specific analysis, no warning will be generated by a³.

6.5 General program annotations

In addition to the a³ specific program annotations described in chapter 6.4 CompCert C provides a more general but lower-level mechanism to attach free-form annotations (text messages possibly mentioning the values of variables) to C program points, and have these annotations transported throughout compilation, all the way to the generated assembly code. Analysis tools can extract the annotation information from comments in the generated assembly code and process them further for their needs.

The general annotation mechanism is presented as a pseudo built-in function called __builtin_annot, taking as arguments a string literal and zero or more local variables. For example, the binary search routine bsearch in the following C code snippet can be annotated with two important pieces of information: First, the while loop executes at most ⌈ log₂(100) ⌉ = 7 iterations. Second, the array index m is always between 0 and 99, ensuring that the memory access tbl[m] is always within bounds.

    int bsearch(int tbl[100], int v)
    {
      int l = 0, u = 99;
      __builtin_annot("loop bound: 0..7");
      while (l <= u) {
        int m = (l + u) / 2;
        __builtin_annot("%1 = 0..99", m);
        if (tbl[m] < v) l = m + 1;
        else if (tbl[m] > v) u = m - 1;
        else return m;
      }
      return -1;
    }

After compiling this function using ccomp -Wa,-a -c bsearch.c the generated assembly listing looks as follows:

    ...
     4                    bsearch:
     5 0000 9421FFF0              stwu    r1, -16(r1)
     6 0004 7C0802A6              mflr    r0
     7 0008 90010008              stw     r0, 8(r1)
     8 000c 39200000              addi    r9, r0, 0
     9 0010 39400063              addi    r10, r0, 99
    10                    # annotation: loop bound: 0..7
    11                    .L100:
    12 0014 7C095000              cmpw    cr0, r9, r10
    13 0018 41810040              bt      1, .L101
    14 001c 7CE95214              add     r7, r9, r10
    15 0020 7CE50E70              srawi   r5, r7, 1
    16 0024 7CA50194              addze   r5, r5
    17                    # annotation: r5 = 0..99
    18 0028 54A8103A              rlwinm  r8, r5, 2, 0, 29 # 0xfffffffc
    19 002c 7CC3402E              lwzx    r6, r3, r8
    20 0030 7C062000              cmpw    cr0, r6, r4
    21 0034 4180001C              bt      0, .L102
    ...

One can see that CompCert does not generate machine code for the two __builtin_annot source statements. Instead, CompCert produces assembly comments at the exact program points corresponding to those in the source function. These comments consist of the string literal carried by the annotation, where positional parameters %1, %2, etc, are replaced by the locations (processor registers or stack slots) of the corresponding variable arguments.

Generalizing from the example above, we see that __builtin_annot statements offer a flexible mechanism to mark program points and local variables in C source files, then track them all the way to assembly language. Besides helping with static analysis at the machine code level, this mechanism also facilitates manual traceability analysis between C and assembly code.

Reference description CompCert’s annotation mechanism is presented by the following two pseudo built-in functions.

void __builtin_annot(const char * msg, ...)

The first argument must be a string literal. It can be followed by arbitrarily many additional arguments, of integer, pointer or floating-point types. In the intended uses described above, the additional arguments are names of local variables, but arbitrary expressions are allowed.

The formal semantics of __builtin_annot is that of a pro forma “print” statement: the compiler assumes that every time a __builtin_annot is executed, the message and the values of the additional arguments are displayed on an hypothetical output device. In other words, an invocation of __builtin_annot is treated as an observable event in the program execution. The compiler, therefore, guarantees that __builtin_annot statements are never removed, duplicated, nor reordered; instead, they always execute at the times prescribed by the semantics of the source program, and in the same order relative to other observable events such as calls to I/O functions or volatile memory accesses.

As described in the motivational example above, the actual effect of a __builtin_annot statement is simply to generate a comment in the assembly code. This comment consists of the message carried by the annotation, where %n sequences are replaced by the machine location containing the value of the n-th additional argument, or by its value if the n-th additional argument is a compile-time constant expression of numerical type.

The location of an argument is either a machine register, an integer or floating-point constant, the name of a global variable, or a stack memory location of the form mem(sp + offset, size) where sp is the stack pointer register, offset a byte offset relative to the value of sp, and size the size in bytes of the argument. For example, on the PowerPC, register locations are R3…R31 and F0…F31, and stack locations are of the form mem(R1 + offset, size).

If all additional arguments are non-volatile C variables or compile-time constant expressions, it is guaranteed that no code (other than the assembly comment) will be generated for __builtin_annot. Moreover, the location displayed as a replacement for the %n sequence is guaranteed to be the location where the corresponding variable resides.

If one or several additional arguments are complex expressions (neither non-volatile variables nor constant expressions), useless code is generated to compute their values and leave them in temporary registers or stack locations. The location displayed as a replacement for the %n sequence is that of the corresponding temporary. This behavior is not particularly useful for static analysis at the machine level, and can generate useless code. It is therefore highly recommended to use only non-volatile variable names or constant expressions as additional parameters to __builtin_annot.

int __builtin_annot_intval(const char * msg, int x)

In contrast with __builtin_annot, which is used as a statement, __builtin_annot_intval is intended to be used within expressions, to track the location containing the value of a subexpression and return this value unchanged. A typical use is within array indexing expressions, to express an assertion over the array index:

    int x = tbl[__builtin_annot_intval("%1 = 0..99", (lo + hi)/ 2)];

The formal semantics of __builtin_annot_intval is also the pro forma effect of displaying the message msg and the value of the integer argument x on a hypothetical output device. In addition, the value of the second argument x is returned unchanged as the result of __builtin_annot_intval.

In the compiled code, __builtin_annot_intval evaluates its argument x to some temporary register, then inserts an assembly comment equal to the text msg where occurrences of %1 are replaced by the name of this register.

6.6 Extended inline assembly

Like in GCC and Clang, inline assembly code using the asm statement can be parameterized by C expressions as operands. The actual locations of the operands (registers and memory locations), as determined during compilation, are inserted in the given assembly code fragment.

Warning: Indiscriminate use of asm statements can ruin all the semantic preservation guarantees of CompCert. For this reason, support for asm statements is turned off by default and must be activated through the -finline-asm or -fall options. For the generated code to behave properly, it is the responsibility of the programmer to list (in the asm statement) the registers and memory that are modified by the assembly code, and to avoid modifying memory that is private to the compiled code, such as return addresses stored in the stack.

Examples Here is how to use the PowerPC mulhw instruction to compute the high 32 bits of the 64-bit integer product x * y:

    int prod;
    asm ("mulhw %0,%1,%2" : "=r"(prod) : "r"(x), "r"(y));

The two arguments x and y are evaluated into registers, which get substituted for %1 and %2 (respectively) in the assembly template. Likewise, %0 is substituted by the register associated with variable prod. The three r constraints indicate that all three operands are expected in registers. The = constraint in =r means that the operand is an output.

To designate operands, symbolic names can be used instead of operand numbers. Using named operands, the mulhw example above can be written as:

    asm ("mulhw %[res],%[arg1],%[arg2]"
         : [res]"=r"(prod)
         : [arg1]"r"(x), [arg2]"r"(y));

Sometimes, inline assembly code has no result value, but modifies memory. An example is the dcba instruction of PowerPC, which allocates a cache block without reading its contents from main memory:

    asm volatile ("dcba 0, %[addr]" : : [addr]"r"(p) : "memory");

Note the absence of output operand, the volatile modifier indicating that the assembly code has side effects, and the "memory" annotation indicating that the assembly code modifies memory in ways that are not predictable by the compiler. CompCert treats all asm statements as volatile and clobbering memory, but other compilers need these annotations to produce correct code.

Some instructions operate over memory locations instead of registers. In this case, the m constraint should be used instead of r. For example, the x86 instruction fnstsw stores the FP control word in a memory location:

    unsigned short cw;
    asm ("fnstsw %0" : "=m" (cw));

Note that this asm statement has no inputs, hence the second colon can be omitted.

Other instructions demand arguments that are integer constants, instead of registers or memory locations. In this case, the i constraint should be used, and the corresponding argument must be a compile-time constant expression. For example, here is how to use the PowerPC mfspr and mtspr instructions to read and write special registers:

    #define read_spr(regno,res) \
       asm ("mfspr %0,%1" : "=r"(res) : "i"(regno))
    #define write_spr(regno,val) \
       asm volatile ("mtspr %0,%1" : : "i"(regno), "r"(val))

We already used the "memory" annotation telling the compiler that the assembly code “clobbers” (modifies unpredictably) the memory state. Likewise, if the assembly code modifies registers other than that of the output operand, the names of those registers must be given in the clobber list. For example, here is an implementation of atomic test-and-set using the x86 locked-exchange instruction:

    int testandset(int * p)
    {
      // store 1 in *p and return the previous value of *p
      int res;
      asm volatile
        ("movl $1, %%eax\n\t"
         "xchgl (%[addr]), %%eax\n\t"
         "movl %%eax, %[oldval]"
         : [oldval]"=r"(res) : [addr]"r"(p) : "eax", "memory");
      return res;
    }

Note that the assembly template can contain several instructions, using \n\t in the template to separate the instructions. Also note the use of %% in the assembly template to stand for a single % character in the generated assembly code.

Since the template uses eax as a temporary register, we must list register eax as clobbered. Besides informing the compiler that the previous contents of eax are destroyed, this clobber annotation also ensures that none of the asm operands (res and p) are allocated to eax.

On 32-bit target platforms, register operands of type long long or unsigned long long (64-bit integers) need special handling, since CompCert allocates them into pairs of 32-bit registers. The assembly template must use %R to refer to the most significant half of the register pair, and %Q to refer to the least significant half. For example, here is how to use the x86 rdtsc instruction to read the time stamp counter as an unsigned 64-bit integer:

    unsigned long long rdtsc(void)
    {
      unsigned long long res;
      asm("rdtsc\n\t"
          "movl %%eax, %Q0\n\t"
          "movl %%edx, %R0\n\t"
          : "=r" (res) : : "eax", "edx");
      return res;
    }

Reference description The syntax of extended inline assembly is as follows:

statement	::=	… ∣ asm (const\|volatile)* ( "template" asm-operands )
asm-operands	::=	: asm-output : asm-inputs : asm-clobbers
	∣	: asm-output : asm-inputs
	∣	: asm-output
	∣
asm-outputs	::=	asm-arg ?
asm-inputs	::=	asm-arg , … , asm-arg
asm-clobbers	::=	"resource" , … , "resource"
asm-arg	::=	([ name ])? "constraint" ( expr )

An asm statement carries an assembly template (a string literal) and up to 3 lists of operands separated by colons: a list of zero or one output expressions (results produced by the assembly code); a list of zero, one or several input expressions (arguments used by the assembly code); and a list of zero, one or several resources (e.g. processor registers) that are “clobbered” by the assembly code.

The assembly template is a string literal possibly containing placeholders marked with a % (percent) character: either numbered placeholders (%0, %1, etc) or named placeholders (%[name]). During compilation, the placeholders are replaced by the locations of the corresponding operands, then the resulting text is given to the assembler as is. A %% sequence in the template is translated to a single % character in the text. Operands are numbered consecutively starting with %0 for the first output operand (if any) and continuing with the input operands. Instead of numbers, the template can also refer to operands by their optional names, using the %[name] notation.

The assembly outputs and the inputs are comma-separated, possibly empty lists of C expressions preceded by an optional name between brackets and a mandatory constraint (a string literal). CompCert can handle zero or one output; two outputs or more cause a compile-time error. For inputs, the following constraints are supported:

Input constraint	Meaning
"r"	Register. The expression is evaluated into a processor register, whose name is inserted in the assembly template.
"m"	Memory location. The expression is evaluated into a memory location, whose address is inserted (as a valid processor addressing mode) in the assembly template.
"i"	Integer immediate. The expression must be a compile-time constant. Its value is inserted in the assembly template as a decimal literal.

For outputs, the C expressions must be l-values, and the following constraints are supported:

Output constraint	Meaning
"=r"	Register. This is either the register allocated to the output expression (if it is a local variable allocated to a register), or a temporary register chosen by CompCert, whose value is assigned to the expression just after the assembly code.
"=m"	Memory location. This is the memory location of the output expression. Its address is inserted (as a valid processor addressing mode) in the assembly template.

The fourth, optional component of an asm statement is a comma-separated list of resources that are “clobbered” by the assembly code fragment, i.e. set to unpredictable values. The resources, given as string literals, are either processor register names or the special resources "memory" and "cc", denoting unpredictable changes to the memory and the processor’s condition codes, respectively. CompCert always assumes that inline assembly can modify memory and condition codes in unpredictable ways, even if the "memory" and "cc" clobbers are not specified. The register names are case-insensitive and depend on the target processor, as follows:

Processor	Register names
ARM	integer registers: "r0", "r1", …, "r12", "r14"
	VFP registers: "f0", "f1", …, "f15"
PowerPC	integer registers: "r0", "r3", …, "r12", "r14", …, "r31"
	FP registers: "f0", "f1", "f2", …, "f31"
x86 32 bits	integer registers: "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"
	XMM registers: "xmm0", "xmm1", …, "xmm7"
x86 64 bits	integer registers: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
	integer registers: "r8", …, "r15"
	XMM registers: "xmm0", "xmm1", …, "xmm15"
RISC-V	integer registers: "x5", "x6", …, "x29", "x30"
	FP registers: "f0", "f1", …, "f30", "f31"
AArch64	integer registers: "x0", "x1", …, "x30"
	FP registers: "d0", "d1", …, "d31"

Registers not listed above are reserved for use by the compiler and must not be modified by inline assembly code. For example, the stack pointer register (r13 for ARM, r1 for PowerPC, esp/rsp for x86, x2 for RISC-V) must be preserved.

Chapter 6 Language extensions

6.1 Pragmas

6.2 Attributes

6.3 Built-in functions

6.3.1 Common built-in functions

6.3.2 PowerPC built-in functions

6.3.3 x86 built-in functions

6.3.4 ARM built-in functions

6.3.5 RISC-V built-in functions

6.3.6 AArch64 built-in functions

6.4 Embedded program annotations for a3

6.4.1 Examples

6.4.2 Reference description

6.4.3 Best practices

6.5 General program annotations

6.6 Extended inline assembly

6.4 Embedded program annotations for a³