Previous Up Next

Chapter 6  Language extensions

This chapter describes several extensions to the C99 standard implemented by CompCert: compiler pragmas (section 6.1), attributes (section 6.2), built-in functions (section 6.3), a code annotation mechanism (section 6.4), and GCC-style extended inline assembly (section 6.5).

6.1  Pragmas

This section describes the pragmas supported by CompCert C. The compiler emits a warning for an unrecognized pragma.

#pragma reserve_register reg-name
Ensure that all subsequent function definitions do not use register reg-name in their compiled code, and therefore preserve the value of this register. The register must be a callee-save register. The following register names are allowed:
On PowerPC:R14, R15, …, R31 (general-purpose registers)
 F14, F15, …, F31 (float registers)
On ARM:R4, R5, …, R11 (integer registers)
 F8, F9, …, F15 (float registers)
On x86-32:EBX, ESI, EDI, EBP (32-bit integer registers)
On x86-64:RBX, RBP, R12, R13, R14, R15 (64-bit integer registers)
#pragma section ident "iname" "uname" addr-mode access-mode
Define a new linker section, or modify the characteristics of an existing linker section. The parameters are as follows:
ident
The compiler internal name for the section.
iname
The linker section name to be used for initialized variables and for function code.
uname
The linker section name to be used for uninitialized variables.
addr-mode
The addressing mode used to access variables located in this section. On PowerPC, setting addr-mode to near-data denotes a small data area, which is addressed by 16-bit offsets relative to the corresponding small data area register. On PowerPC, setting addr-mode to far-data denotes a relocatable data area, which is addressed by 32-bit offsets relative to the corresponding register. Any other value of addr-mode denotes standard absolute addressing.
access-mode
One or several of R to denote a read-only section, W to denote a writable section, and X to denote an executable section.

Functions and global variables can be explicitly placed into user-defined sections using the #pragma use_section directive (see below). Another, more indirect, less recommended way is to modify the characteristics of the default sections in which the compiler automatically place function code, global variables, and auxiliary compiler-generated data. These default sections are as follows:

Internal nameWhat is put there
DATAglobal, non-const variables of size greater than N bytes. N defaults to 0 and can be set using the -fsmall-data command-line option.
CONSTglobal, const variables of size greater than N bytes. N defaults to 0 and can be set using the -fsmall-const command-line option.
SDATAglobal, non-const variables of size less than or equal to N bytes. N defaults to 0 and can be set using the -fsmall-data command-line option.
SCONSTglobal, const variables of size less than or equal to N bytes. N defaults to 0 and can be set using the -fsmall-const command-line option.
STRINGstring literals
CODEmachine code for function definitions
LITERALconstants (e.g. floating-point literals) referenced from function code
JUMPTABLEjump tables generated for switch statements

A simpler, alternate way to control placement into sections is to use the section attribute.

Example:

explicit placement into user-defined sections. We first define four new compiler sections.

    #pragma section MYDATA "mydata_i" "mydata_u" standard RW
    #pragma section MYCONST "myconst" "myconst" standard R
    #pragma section MYSDA "mysda_i" "mysda_u" near-access RW
    #pragma section MYCODE "mycode" "mycode" standard RX

We then use the #pragma use_section to place variables and functions in these sections, then define the variables and functions in question.

    #pragma use_section MYDATA a b
    int a;                // uninitialized; goes into linker section "mydata_u"
    double b = 3.1415;    // initialized; goes into linker section "mydata_i"
    #pragma use_section MYCONST c
    const short c = 42;   // goes into linker section "myconst"
    #pragma use_section MYSDA d e
    int d;                // goes into linker section "mysda_u"
    double e = 2.718;     // goes into linker section "mysda_i"
    #pragma use_section MYCODE f
    int f(void) { return a + d; }
                          // goes into linker section "mycode"
                          // accesses d via small data relative addressing
                          // accesses a via absolute addressing
Example:

implicit placement by modifying the compiler default sections. In this example, we assume that the options -fsmall-data 8 and -fsmall-const 0 are given.

    #pragma section DATA "mydata_i" "mydata_u" standard RW
    #pragma section CONST "myconst" "myconst" standard R
    #pragma section SDATA "mysda_i" "mysda_u" near-access RW
    #pragma section CODE "mycode" "mycode" standard RX
    #pragma section LITERAL "mylit" "mylit" standard R

    int a;            // small data, uninitialized; goes into "mysda_u"
    char b[16];       // big data, uninitialized; goes into "mydata_u"
    const int c = 42; // big const data, initialized; goes into "myconst"
    double f(void) { return c + 3.14; }
                      // code goes into "mycode"
                      // literal 3.14 goes into "mylit"
Caveat:

when using non-standard sections, the linker must, in general, be informed of these sections and how to place them in the final executable, typically by providing an appropriate mapping file to the linker.

#pragma use_section ident var
Explicitly place the global variables or functions var, … in the compiler section named ident. The use_section pragma must occur before any declaration or definition of the variables and functions it mentions. See #pragma section above for additional explanations and examples.

6.2  Attributes

Like the GCC compiler, the CompCert C compiler allows the programmer to attach attributes to various parts of C source code.

An attribute qualifier is of the form __attribute((attribute-list)) or __attribute__((attribute-list)), where attribute-list is a possibly empty, comma-separated lists of attributes. Each attribute is of the following form:

attribute::=ident
 ident(attr-arg,,attr-arg)
attr-arg::=ident
 "string-literal"
 const-expr

Each attribute carries a name and zero, one or several arguments, which can be identifiers, string literals, or C expressions of integer types that are compile-time constants.

Attribute names can be specified with __ (two underscore characters) preceding and following each name. For instance, __aligned__(N) and aligned(N) denote the same attribute.

For compatibility with other C compilers, the keyword __packed__ is also recognized as an attribute:

__packed__()is equivalent to__attribute((packed))
__packed__(params)is equivalent to __attribute((packed(params)))

In C source files, attribute qualifiers can occur anywhere a standard type qualifier (const, volatile) can occur, and also just after the struct and union keywords. For partial compatibility with GCC, the CompCert parser allows attributes to occur in several other places, but silently ignores them.

Warning. Some standard C libraries, when used in conjunction with CompCert, deactivate the __attribute__ keyword: the standard includes, or CompCert itself, define __attribute__ as a macro that erases its argument. This is the case for the Glibc standard library under Linux, and the Xcode header files under OS X. For this reason, please use the __attribute keyword in preference to the __attribute__ keyword.

The following attributes are recognized by CompCert. Unrecognized attributes are silently ignored.

aligned(N)
(type attribute) Specify the alignment to use for a variable or a struct member. The argument N is the desired alignment, in bytes. It must be a power of 2. This attribute is equivalent to the qualifier _Alignas(N).
noreturn
(function attribute) Indicate that a function never returns normally. This information is currently not used for code optimization, but only to produce more precise warnings. This attribute is similar to the _Noreturn function specifier from ISO C 2011, but it can also appear in function pointer types.
packed(max-member-alignment, min-struct-alignment, byte-swap)
(structure attribute) This attribute is recognizedonly if the -fpacked-structs option is active (section 3.2.8).

The packed attribute applies to a struct declarations and affects the memory layout of the members of this struct. Zero, one, two or three integer arguments can be provided.

If the max-member-alignment parameter is provided, the natural alignment of every member (field) of the structure is reduced to at most max-member-alignment. In particular, if max-member-alignment = 1, members are not aligned, and no padding is ever inserted between members.

If the min-struct-alignment parameter is provided, the natural alignment of the whole structure is increased to at least min-struct-alignment.

If the byte-swap parameter is provided and equal to 1, accesses to structure members of integer or pointer types are performed using the opposite endianness than that of the target platform. For PowerPC, accesses are done in little-endian mode instead of the natural big-endian mode; for x86 accesses are done in big-endian mode instead of the natural little-endian mode; for ARM — where the endianness is configuration dependent — accesses are done in big-endian mode when configured as little-endian and vice versa.

Examples:
    struct __attribute__((packed(1))) s1 {    // suppress all padding
      char c;           // at offset 0
      int i;            // at offset 1
      short s;          // at offset 5
    };                  // total size is 7, structure alignment is 1

    struct __attribute__((packed(4,16,1))) s2 {
      short s;          // at offset 0; byte-swap at access
      int i;            // at offset 4 (because 4-aligned); byte-swap at access
    };                  // total size is 8, structure alignment is 16

    struct s3 {         // default layout
      char c;           // at offset 0
      int i;            // at offset 4 (because 4-aligned)
      short s;          // at offset 8
    };                  // total size is 12, structure alignment is 4
Limitations:

For a byte-swapped structure, all members should be of integer or pointer types, or arrays of those types.

Reduced member alignment should not be used on the ARM platform, since unaligned memory accesses on ARM can crash the program or silently produce incorrect results. Only the PowerPC and x86 platforms support unaligned memory accesses.

packed
(structure attribute) This form of the packed attribute is equivalent to packed(1). It causes the the structure it modifies to be laid out with no alignment padding between the members. The size of the resulting structure is therefore the sum of the sizes of its members. The alignment of the resulting structure is 1.
section("section-name")
(name attribute) Specify the linker section section-name where to place functions and global variables whose types carry this section attribute. The linker section is declared as executable read-only if the attribute applies to a function definition; as read-only if the attribute applies to a const variable definition; and as read-write if the attribute applies to a non-const variable definition.

The #pragma section and #pragma use_section directives can be used to obtain finer control on user-defined sections (section 6.1).

Interpretation of attributes.

Attributes are attached to parts of the program in a way that depends on the kind of the attribute:

For example, consider

  __attribute((section("foo"), aligned(8))) int * p;

The section attributes applies to variable p while the aligned attributes modifies type int. The meaning, therefore, is that p is a variable in section foo that has type pointer to 8-aligned chars.

  __attribute((noreturn)) void (*fptr) (int x);

The noreturn attribute applies to the function type, not to the variable fptr. The meaning is that fptr is a variable of type “pointer to non-returning functions from int to void”.

6.3  Built-in functions

Built-in functions are functions that are predefined in the initial environment — the environment in which the compilation of a source file starts. In other words, these built-in functions are always available: there is no need to include header files.

Built-in functions give access to interesting capabilities of the target processor that cannot be exploited in standard C code. For example, most processors provide an instruction to quickly compute the absolute value of a floating-point number. In CompCert C, this instruction is made available to the programmer via the __builtin_fabs function. It provides a faster alternative to the fabs library function from <math.h>.

Invocations of built-in functions are automatically inlined by the compiler at point of use. It is a compile-time error to take a pointer to a built-in function.

Some built-in functions are available on all target platforms supported by CompCert. Others are specific to a particular platform.

6.3.1  Common built-in functions

Integer arithmetic:

unsigned int __builtin_bswap(unsigned int x)
Swap the bytes of x to change its endianness. If x is of the form 0xaabbccdd, the result is 0xddccbbaa.
unsigned int __builtin_bswap32(unsigned int x)
A synonym for __builtin_bswap.
unsigned short __builtin_bswap16(unsigned short x)
Swap the bytes of x to change its endianness. If x is of the form 0xaabb, the result is 0xbbaa.
int __builtin_clz(unsigned int x)
int __builtin_clzl(unsigned long x)
Count the number of consecutive zero bits in x, starting with the most significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 31 inclusive. On PowerPC and ARM architectures, the result is between 0 and 32 inclusive.
int __builtin_clzll(unsigned long long x)
Count the number of consecutive zero bits in x, starting with the most significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 63 inclusive. On PowerPC and ARM architectures, the result is between 0 and 64 inclusive.
int __builtin_ctz(unsigned int x)
int __builtin_ctzl(unsigned long x)
Count the number of consecutive zero bits in x, starting with the least significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 31 inclusive. On PowerPC and ARM architectures, the result is between 0 and 32 inclusive. If defined, the result is the number of the least significant bit that is set in x.
int __builtin_ctzll(unsigned long long x)
Count the number of consecutive zero bits in x, starting with the least significant bit. On x86 the result is undefined if x is 0, otherwise the result is between 0 and 63 inclusive. On PowerPC and ARM architectures, the result is between 0 and 64 inclusive. If defined, the result is the number of the least significant bit that is set in x.

Floating-point arithmetic:

double __builtin_fabs(double x)
Return the floating-point absolute value of its argument, or NaN if the argument is NaN. This function is equivalent to the fabs() function from the <math.h> standard library, but executes faster.

Block copy with known size and alignment:

void __builtin_memcpy_aligned
(void * dst, const void * src, size_t sz, size_t al)
Copy sz bytes from the memory area at src to the memory area at dst. The source and destination memory areas must be either disjoint or identical; the behavior is undefined if they overlap. The pointers src and dst must be aligned on an al byte boundary, where al is a power of 2. The sz and al arguments must be compile-time constants. A typical invocation is
    __builtin_memcpy_aligned(&dst, &src, sizeof(dst), _Alignof(dst));
where dst and src are two objects of the same complete type. An invocation of __builtin_memcpy_aligned(dst,src,sz,al) behaves like memcpy(dst,src,sz) as defined in the <string.h> standard library. Knowing the size and alignment at compile-time enables the compiler to generate very efficient inline code for the copy.

Memory accesses with reversed endianness:

unsigned short __builtin_read16_reversed(const unsigned short * ptr)
Read a 16-bit integer at address ptr and reverse its endianness by swapping the two bytes of the result.
unsigned int __builtin_read32_reversed(const unsigned int * ptr)
Read a 32-bit integer at address ptr and reverse its endianness by swapping the four bytes of the result, as __builtin_bswap does.
void __builtin_write16_reversed(unsigned short * ptr, unsigned short x)
Reverse the endianness of x by swapping its two bytes, then write the 16-bit result at address ptr.
void __builtin_write32_reversed(unsigned int * ptr, unsigned int x)
Reverse the endianness of x by swapping its four bytes, then write the 32-bit result at address ptr.

Synchronization:

void __builtin_membar(void)
Software memory barrier. Prevent the compiler from moving memory loads and stores across the call to __builtin_membar. No processor instructions are generated, hence the hardware can still reorder memory accesses. To generate hardware memory barriers, see the “synchronization” built-in functions specific to each processor.

6.3.2  PowerPC built-in functions

Integer arithmetic:

int __builtin_mulhw(int x, int y)
Return the high 32 bits of the full 64-bit product of two signed integers.
unsigned int __builtin_mulhwu(unsigned int x, unsigned int y)
Return the high 32 bits of the full 64-bit product of two unsigned integers.
int __builtin_isel(_Bool cond, int iftrue, int iffalse)
unsigned int __builtin_uisel
(_Bool cond, unsigned int iftrue, unsigned int iffalse)
Return iftrue if cond is true, iffalse if cond is false. Expands to an isel instruction on Freescale EREF processors, and to a branch-free instruction sequence on other PowerPC processors.

Floating-point arithmetic:

double __builtin_fmadd(double x, double y, double z)
Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y.
double __builtin_fmsub(double x, double y, double z)
Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y.
double __builtin_fnmadd(double x, double y, double z)
Fused multiply-add-negate. Compute -(x*y + z) without rounding the intermediate product x*y.
double __builtin_fnmsub(double x, double y, double z)
Fused multiply-sub-negate. Compute -(x*y - z) without rounding the intermediate product x*y.
double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library. The corresponding PowerPC instruction is optional and not supported by all processors.
double __builtin_frsqrte(double x)
Compute an estimate (with relative accuracy 1/32) of 1 / √x, the reciprocal of the square root of x. The corresponding PowerPC instruction is optional and not supported by all processors.
float __builtin_fres(float x)
Compute an estimate (with relative accuracy 1/256) of the single-precision reciprocal of x. The corresponding PowerPC instruction is optional and not supported by all processors.
double __builtin_fsel(double x, double y, double z)
Return y if x is greater or equal to 0.0. Return z if x is less than 0.0 or is NaN. The corresponding PowerPC instruction is optional and not supported by all processors.
int __builtin_fcti(double x)
Round the given floating-point number x to an integer and return this integer. The difference with the standard C conversion (int)x is that the latter rounds x towards zero, while __builtin_fcti(x) rounds x according to the current rounding mode (by default: to the nearest integer, ties round to even).

Synchronization instructions:

void __builtin_eieio(void)
Issue an eieio barrier.
void __builtin_sync(void)
Issue a sync barrier.
void __builtin_isync(void)
Issue an isync barrier.
void __builtin_lwsync(void)
Issue an lwsync barrier.
void __builtin_mbar(int level)
Issue a mbar barrier. The integer argument must be 0 or 1.
void __builtin_trap(void)
Abort the program on an unconditional trap instruction.

Cache control instructions:

void __builtin_dcbf(void * addr)
void __builtin_dcbi(void * addr)
void __builtin_dcbtls(void * addr, int level)
void __builtin_dcbz(void * addr)
void __builtin_icbi(void * addr)
void __builtin_icbtls(void * addr, int level)
Issue the corresponding cache control instruction. addr is the address of the cache block affected. level must be the constant 0 (L1 cache) or 2 (L2 cache).
void __builtin_prefetch(void * addr, int for_write, int level)
Issue a dcbt instruction if for_write is 0 and a dcbtst instruction if for_write is 1.

Access to special registers:

unsigned int __builtin_get_spr(int spr)
unsigned long long __builtin_get_spr64(int spr)
void __builtin_set_spr(int spr, unsigned int value)
void __builtin_set_spr64(int spr, unsigned long long value)
The spr argument is the number of the special register accessed. It must be a compile-time constant.

Atomic (sequentially-consistent) memory operations:

void __builtin_atomic_exchange(int * addr, int * new, int * old)
Store the current value of *addr in *old and set *addr to the value *new.
void __builtin_atomic_load(int * p, int * val)
Read the current value of *p and store it into *val.
_Bool __builtin_atomic_compare_exchange
(int * p, int * expected, int * desired)
Compare the current value of *p with *expected. If equal, set *p to the value *desired and return 1. If different, set *expected to the current of *p and return 0;
int __builtin_sync_fetch_and_add(int * p, int delta)
Add delta to the contents of *p. Return the initial value of *p before the addition.

6.3.3  x86 built-in functions

Floating-point arithmetic:

double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library.
double __builtin_fmax(double x, double y)
Return the greater of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmin(double x, double y)
Return the smaller of x and y. If x or y are NaN, the result is either x or y, but it is unspecified which.
double __builtin_fmadd(double x, double y, double z)
Fused multiply-add. Compute x*y + z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fmsub(double x, double y, double z)
Fused multiply-sub. Compute x*y - z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fnmadd(double x, double y, double z)
Fused multiply-negate-add. Compute -(x*y) + z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.
double __builtin_fnmsub(double x, double y, double z)
Fused multiply-negate-sub. Compute -(x*y) - z without rounding the intermediate product x*y. Requires a processor that supports the FMA3 extensions.

6.3.4  ARM built-in functions

Floating-point arithmetic:

double __builtin_fsqrt(double x)
Return the square root of x, like the sqrt function from the <math.h> standard library.

Synchronization instructions:

void __builtin_dmb(void)
Issue a dmb (data memory) barrier.
void __builtin_dsb(void)
Issue a dsb (data synchronization) barrier.
void __builtin_isb(void)
Issue an isb (instruction synchronization) barrier.

6.4  Program annotations

CompCert C provides a general mechanism to attach free-form annotations (text messages possibly mentioning the values of variables) to C program points, and have these annotations transported throughout compilation, all the way to the generated assembly code. Before describing this annotation mechanism, we motivate it by an example.

Motivating example

Several static analysis tools operate not at the level of C source code, but directly on executable machine code. In particular, this is the case for the computation of Worst Case Execution Times (WCET) by static analysis, as performed for instance by the aiT WCET analyzer from AbsInt.

To analyze machine code with sufficient precision, these tools sometimes require the programmers to provide additional information that cannot easily be reconstructed from the machine code alone. Consider for example the following binary search routine, whose WCET we would like to estimate:

    int bsearch(int tbl[100], int v) {
      int l = 0, u = 99;
      while (l <= u) {
        int m = (l + u) / 2;
        if (tbl[m] < v) l = m + 1;
        else if (tbl[m] > v) u = m - 1;
        else return m;
      }
      return -1;
    }

To compute a tight upper bound on the WCET, the aiT analyzer must be given two additional pieces of information. First, the while loop executes at most ⌈ log2(100) ⌉ = 7 iterations. Second, the m array index is always between 0 and 99, ensuring that the memory access tbl[m] always hits in data memory and never, for instance, in memory-mapped devices. Using aiT, this information must be provided as annotations on the machine code, for instance via a separate text file containing

    loop at "bsearch" + 0x14 max 7
    instruction at "bsearch" + 0x28 is entered with r4 = (0,99)

Note that these annotations are expressed in terms of machine code addresses ("bsearch" + 0x14) and machine registers (r4), not in terms of source program points and variables.

Enters CompCert’s source annotation mechanism. It is presented as a pseudo built-in function called __builtin_annot, taking as arguments a string literal and zero, one or several local variables. Let us use this mechanism to annotate the bsearch routine above.

    int bsearch(int tbl[100], int v) {
      int l = 0, u = 99;
      __builtin_annot("loop max 7");
      while (l <= u) {
        int m = (l + u) / 2;
        __builtin_annot("entered with %1 = (0,99)", m);
        if (tbl[m] < v) l = m + 1;
        else if (tbl[m] > v) u = m - 1;
        else return m;
      }
      return -1;
    }

We then compile this function using ccomp -Wa,-a -c bsearch.c and look at the generated assembly listing:

    ...
     4                    bsearch:
     5 0000 9421FFF0              stwu    r1, -16(r1)
     6 0004 7C0802A6              mflr    r0
     7 0008 90010008              stw     r0, 8(r1)
     8 000c 39200000              addi    r9, 0, 0
     9 0010 39400063              addi    r10, 0, 99
    10                    # annotation: loop max 7
    11                    .L100:
    12 0014 7C095000              cmpw    cr0, r9, r10
    13 0018 41810040              bt      1, .L101
    14 001c 7CE95214              add     r7, r9, r10
    15 0020 7CE50E70              srawi   r5, r7, r1
    16 0024 7CA50194              addze   r5, r5
    17                    # annotation: entered with r5 = (0,99)
    18 0028 54A8103A              rlwinm  r8, r5, 2, 0, 29 # 0xfffffffc
    19 002c 7CC3402E              lwzx    r6, r3, r8
    20 0030 7C062000              cmpw    cr0, r6, r4
    21 0034 4180001C              bt      0, .L102
    ...

We see that CompCert generates no machine code for the two __builtin_annot source statements. Instead, it produces assembly comments at the exact program points corresponding to those in the source function. These comments consist of the string literal carried by the annotation, where positional parameters %1, %2, etc, are replaced by the locations (processor registers or stack slots) of the corresponding variable arguments.

From the assembly listing above, an ad-hoc tool (not provided with CompCert) can easily generate a machine-level annotation file usable by the aiT WCET analyzer. (Future directions for CompCert include facilitating the exploitation of annotations, for instance by saving them in special debug sections of the generated executables.)

Generalizing from the example above, we see that __builtin_annot statements offer a general and flexible mechanism to mark program points and local variables in C source files, then track them all the way to assembly language. Besides helping with static analysis at the machine code level, this mechanism also facilitates manual traceability analysis between C and assembly.

Reference description

CompCert’s annotation mechanism is presented by the following two pseudo built-in functions.

void __builtin_annot(const char * msg, ...)
The first argument must be a string literal. It can be followed by arbitrarily many additional arguments, of integer, pointer or floating-point types. In the intended uses described above, the additional arguments are names of local variables, but arbitrary expressions are allowed.

The formal semantics of __builtin_annot is that of a pro forma “print” statement: the compiler assumes that every time a __builtin_annot is executed, the message and the values of the additional arguments are displayed on an hypothetical output device. In other words, an invocation of __builtin_annot is treated as an observable event in the program execution. The compiler, therefore, guarantees that __builtin_annot statements are never removed, duplicated, nor reordered; instead, they always execute at the times prescribed by the semantics of the source program, and in the same order relative to other observable events such as calls to I/O functions or volatile memory accesses.

As described in the motivational example above, the actual effect of a __builtin_annot statement is simply to generate a comment in the assembly code. This comment consists of the message carried by the annotation, where %n sequences are replaced by the machine location containing the value of the n-th additional argument, or by its value if the n-th additional argument is a compile-time constant expression of numerical type.

The location of an argument is either a machine register, an integer or floating-point constant, the name of a global variable, or a stack memory location of the form mem(sp + offset, size) where sp is the stack pointer register, offset a byte offset relative to the value of sp, and size the size in bytes of the argument. For example, on the PowerPC, register locations are R3R31 and F0F31, and stack locations are of the form mem(R1 + offset, size).

If all additional arguments are non-volatile C variables or compile-time constant expressions, it is guaranteed that no code (other than the assembly comment) will be generated for __builtin_annot. Moreover, the location displayed as a replacement for the %n sequence is guaranteed to be the location where the corresponding variable resides.

If one or several additional arguments are complex expressions (neither non-volatile variables nor constant expressions), useless code is generated to compute their values and leave them in temporary registers or stack locations. The location displayed as a replacement for the %n sequence is that of the corresponding temporary. This behavior is not particularly useful for static analysis at the machine level, and can generate useless code. It is therefore highly recommended to use only non-volatile variable names or constant expressions as additional parameters to __builtin_annot.

int __builtin_annot_intval(const char * msg, int x)
In contrast with __builtin_annot, which is used as a statement, __builtin_annot_intval is intended to be used within expressions, to track the location containing the value of a subexpression and return this value unchanged. A typical use is within array indexing expressions, to express an assertion over the array index:
    int x = tbl[__builtin_annot_intval("entered with %1=(0,99)",
                                       (lo + hi)/ 2)];

The formal semantics of __builtin_annot_intval is also the pro forma effect of displaying the message msg and the value of the integer argument x on a hypothetical output device. In addition, the value of the second argument x is returned unchanged as the result of __builtin_annot_intval.

In the compiled code, __builtin_annot_intval evaluates its argument x to some temporary register, then inserts an assembly comment equal to the text msg where occurrences of %1 are replaced by the name of this register.

6.5  Extended inline assembly

Like in GCC and Clang, inline assembly code using the asm statement can be parameterized by C expressions as operands. The actual locations of the operands (registers and memory locations), as determined during compilation, are inserted in the given assembly code fragment.

Warning: indiscriminate use of asm statements can ruin all the semantic preservation guarantees of CompCert. For this reason, support for asm statements is turned off by default and must be activated through the -finline-asm or -fall options. For the generated code to behave properly, it is the responsibility of the programmer to list (in the asm statement) the registers and memory that are modified by the assembly code, and to avoid modifying memory that is private to the compiled code, such as return addresses stored in the stack.

Examples

Here is how to use the PowerPC mulhw instruction to compute the high 32 bits of the 64-bit integer product x * y:

    int prod;
    asm ("mulhw %0,%1,%2" : "=r"(prod) : "r"(x), "r"(y));

The two arguments x and y are evaluated into registers, which get substituted for %1 and %2 (respectively) in the assembly template. Likewise, %0 is substituted by the register associated with variable prod. The three r constraint indicates that all three operands are expected in registers. The = constraint in =r means that the operand is an output.

To designate operands, symbolic names can be used instead of operand numbers. Using named operands, the mulhw example above can be written as:

    asm ("mulhw %[res],%[arg1],%[arg2]"
         : [res]"=r"(prod)
         : [arg1]"r"(x), [arg2]"r"(y));

Sometimes, inline assembly code has no result value, but modifies memory. An example is the dcba instruction of PowerPC, which allocates a cache block without reading its contents from main memory:

    asm volatile ("dcba 0, %[addr]" : : [addr]"r"(p) : "memory");

Note the absence of output operand, the volatile modifier indicating that the assembly code has side effects, and the "memory" annotation indicating that the assembly code modifies memory in ways that are not predictable by the compiler. CompCert treats all asm statements as volatile and clobbering memory, but other compilers need these annotations to produce correct code.

Some instructions operate over memory locations instead of registers. In this case, the m constraint should be used instead of r. For example, the x86 instruction fnstsw stores the FP control word in a memory location:

    unsigned short cw;
    asm ("fnstsw %0" : "=m" (cw));

Note that this asm statement has no inputs, hence the second colon can be omitted.

Other instructions demand arguments that are integer constants, instead of registers or memory locations. In this case, the i constraint should be used, and the corresponding argument must be a compile-time constant expression. For example, here is how to use the PowerPC mfspr and mtspr instructions to read and write special registers:

    #define read_spr(regno,res) \
       asm ("mfspr %0,%1" : "=r"(res) : "i"(regno))
    #define write_spr(regno,val) \
       asm volatile ("mtspr %0,%1" : : "i"(regno), "r"(val))

We already used the "memory" annotation telling the compiler that the assembly code “clobbers” (modifies unpredictably) the memory state. Likewise, if the assembly code modifies registers other than that of the output operand, the names of those registers must be given in the clobber list. For example, here is an implementation of atomic test-and-set using the x86 locked-exchange instruction:

    int testandset(int * p)
    { // store 1 in *p and return the previous value of *p
      int res;
      asm volatile
        ("movl $1, %%eax\n\t"
         "xchgl (%[addr]), %%eax\n\t"
         "movl %%eax, %[oldval]"
         : [oldval]"=r"(res) : [addr]"r"(p) : "eax", "memory");
      return res;

Note that the assembly template can contain several instructions, using \n\t in the template to separate the instructions. Also note the use of %% in the assembly template to stand for a single % character in the generated assembly code.

Since the template uses eax as a temporary register, we must list register eax as clobbered. Besides informing the compiler that the previous contents of eax are destroyed, this clobber annotation also ensures that none of the asm operands (res and p) are allocated to eax.

On 32-bit target platforms, register operands of type long long or unsigned long long (64-bit integers) need special handling, since CompCert allocates them into pairs of 32-bit registers. The assembly template must use %R to refer to the most significant half of the register pair, and %Q to refer to the least significant half. For example, here is how to use the x86 rdtsc instruction to read the time stamp counter as an unsigned 64-bit integer:

    unsigned long long rdtsc(void)
    {
      unsigned long long res;
      asm("rdtsc\n\t"
          "movl %%eax, %Q0\n\t"
          "movl %%edx, %R0\n\t"
          : "=r" (res) : : "eax", "edx");
      return res;
    }
Reference description

The syntax of extended inline assembly is as follows:

statement::=… ∣ asm   (const|volatile)*   (   "template"   asm-operands   )
asm-operands::=: asm-output :  asm-inputs :  asm-clobbers 
 : asm-output :  asm-inputs 
 : asm-output 
  
asm-outputs::=asm-arg ?
asm-inputs::=asm-arg ,  … ,  asm-arg
asm-clobbers::="resource" ,  … ,  "resource"
asm-arg::=([ name ])?  "constraint" (  expr  )

An asm statement carries an assembly template (a string literal) and up to 3 lists of operands separated by colons: a list of zero or one output expressions (results produced by the assembly code); a list of zero, one or several input expressions (arguments used by the assembly code); and a list of zero, one or several resources (e.g. processor registers) that are “clobbered” by the assembly code.

The assembly template is a string literal possibly containing placeholders marked with a % (percent) character: either numbered placeholders (%0, %1, etc) or named placeholders (%[name]). During compilation, the placeholders are replaced by the locations of the corresponding operands, then the resulting text is given to the assembler as is. A %% sequence in the template is translated to a single % character in the text. Operands are numbered consecutively starting with %0 for the first output operand (if any) and continuing with the input operands. Instead of numbers, the template can also refer to operands by their optional names, using the %[name] notation.

The assembly outputs and the inputs are comma-separated, possibly empty lists of C expressions preceded by an optional name between brackets and a mandatory constraint (a string literal). CompCert can handle zero or one output; two outputs or more cause a compile-time error. For inputs, the following constraints are supported:

Input constraintMeaning
"r"Register. The expression is evaluated into a processor register, whose name is inserted in the assembly template.
"m"Memory location. The expression is evaluated into a memory location, whose address is inserted (as a valid processor addressing mode) in the assembly template.
"i"Integer immediate. The expression must be a compile-time constant. Its value is inserted in the assembly template as a decimal literal.

For outputs, the C expressions must be l-values, and the following constraints are supported:

Input constraintMeaning
"=r"Register. This is either the register allocated to the output expression (if it is a local variable allocated to a register), or a temporary register chosen by CompCert, whose value is assigned to the expression just after the assembly code.
"=m"Memory location. This is the memory location of the output expression. Its address is inserted (as a valid processor addressing mode) in the assembly template.

The fourth, optional component of an asm statement is a comma-separated list of resources that are “clobbered” by the assembly code fragment, i.e. set to unpredictable values. The resources, given as string literals, are either processor register names or the special resources "memory" and "cc", denoting unpredictable changes to the memory and the processor’s condition codes, respectively. CompCert always assumes that inline assembly can modify memory and condition codes in unpredictable ways, even if the "memory" and "cc" clobbers are not specified. The register names are case-insensitive and depend on the target processor, as follows:

ProcessorRegister names
ARMinteger registers: "r0", "r1", …, "r12", "r14"
 VFP registers: "f0", "f1", …, "f15"
PowerPCinteger registers: "r0", "r3", …, "r12", "r14", …, "r31"
 FP registers: "f0", "f1", "f2", …, "f31"
x86 32 bitsinteger registers: "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"
 XMM registers: "xmm0", "xmm1", …, "xmm7"
x86 64 bitsinteger registers: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "r8", …"r15"
 XMM registers: "xmm0", "xmm1", …, "xmm15"

Registers not listed above are reserved for use by the compiler and must not be modified by inline assembly code. For example, the stack pointer register (r13 for ARM, r1 for PowerPC, esp/rsp for x86) must be preserved.


Previous Up Next