Pre-processed assembler and C integer literals

The Problem

It is very convenient to run the C preprocessor before the assembler, so that constants can be shared between C code and assembly code. GCC makes this very easy: it preprocesses assembler sources when asked to on the command line (-x assembler-with-cpp) or when the file extension is “.S” (“.s” is not preprocessed).

A problem arises if the constants defined in the header file use C syntax to mark an integer literal as unsigned, long, etc. In embedded programming, it is very common to add the “UL” suffix to address literals or when building bit masks. This prevents problems caused by unwanted signed comparisons or by the undefined behavior of left shifts (<<) of signed values. The suffix syntax (e.g. “0xdeadbeefUL”, “1234U” or “1234L”) is incompatible with most assemblers, including the GNU assembler. After preprocessing, these constants appear in the assembler source with their suffixes, which yields confusing error messages.
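To make the C side of this concrete, here is a minimal sketch of what the suffix buys you. The helper names bit_mask() and is_below() are made up for this illustration and do not come from any real header.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: builds a mask with only bit `bit` set.
 * 1UL has type unsigned long, so shifting into the top bit is well
 * defined. With a plain 1 (a signed int), `1 << 31` is undefined
 * behavior when int is 32 bits wide. */
unsigned long bit_mask(unsigned int bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << bit;
}

/* Hypothetical helper: returns 1 if addr is strictly below limit.
 * Both operands are unsigned, so a high address such as 0x80000000UL
 * is never treated as a negative number. */
int is_below(unsigned long addr, unsigned long limit)
{
    return addr < limit;
}
```

For instance, bit_mask(31) gives 0x80000000UL, and is_below(0x80000000UL, 0x1000UL) is 0, which is what you would expect when comparing addresses.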

Example of problematic constants:

#define PTEHI_V                 (0x80000000UL)
#define PTEHI_VSID_MASK         (0x7FFFFF80UL)

In the assembly code (PowerPC in this case):

addis   r4,0,(PTEHI_VSID_MASK >> 16)
ori     r4,r4,(PTEHI_VSID_MASK & 0xffff)

After preprocessing, the source becomes:

addis   r4,0,((0x7FFFFF80UL) >> 16)
ori     r4,r4,((0x7FFFFF80UL) & 0xffff)

The error messages (quite valid) from the assembler:

test.s:1: Error: missing ')'
test.s:1: Error: missing ')'
test.s:1: Error: operand out of range (0x7fffff80 is not between 0xffff0000 and 0x0000ffff)
test.s:1: Error: syntax error; found `U' but expected `,'
test.s:1: Error: junk at end of line: `UL)>>16)'
test.s:2: Error: missing ')'
test.s:2: Error: missing ')'
test.s:2: Error: operand out of range (0x7fffff80 is not between 0x00000000 and 0x0000ffff)
test.s:2: Error: syntax error; found `U' but expected `,'
test.s:2: Error: junk at end of line: `UL)&0xffff)'

The Solution

It might seem like this is trivial to fix:

#if !defined(__ASSEMBLER__)
#define PTEHI_V                 (0x80000000UL)
#define PTEHI_VSID_MASK         (0x7FFFFF80UL)
#else
#define PTEHI_V                 (0x80000000)
#define PTEHI_VSID_MASK         (0x7FFFFF80)
#endif /* !defined(__ASSEMBLER__) */

I’ve seen this solution a few times. However, it needlessly duplicates every value and is error-prone, since the two copies can drift apart.

Another option is to “forget” about the suffixes because you “know” that ints have the same size as longs on your platform, or some other similar argument. Believe it or not, this is very common :-|. You may not actually be sure that the assumption holds everywhere the header ends up being used.
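Here is a sketch of how that assumption can bite, assuming a platform where int is 32 bits and long is 64 bits (LP64, as on most 64-bit Unix-like systems). The function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: tries to clear bit 31 using an unsuffixed mask.
 * With a 32-bit int, 0x80000000 has type unsigned int, so the ~ is
 * evaluated in 32 bits (giving 0x7FFFFFFF) and only then widened.
 * Where long is wider than int, the upper bits of v are cleared too. */
unsigned long clear_bit31_unsuffixed(unsigned long v)
{
    return v & ~0x80000000;
}

/* Same operation with the UL suffix: the ~ is evaluated at the full
 * width of unsigned long, so only bit 31 is cleared. */
unsigned long clear_bit31_suffixed(unsigned long v)
{
    return v & ~0x80000000UL;
}

/* Self-check of the two helpers; the second assertion assumes a
 * 32-bit int and only applies where long is wider than int. */
void check_clear_bit31(void)
{
    assert(clear_bit31_suffixed(ULONG_MAX) == ULONG_MAX - 0x80000000UL);
    if (sizeof(long) > sizeof(int))
        assert(clear_bit31_unsuffixed(ULONG_MAX) == 0x7FFFFFFFUL);
}
```

Where int and long have the same size, both functions agree, which is exactly why the missing suffix goes unnoticed until the code is built for a different target.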

My middle-ground solution is to define macros that add the suffixes only in C. It is actually a very common method (not at all my invention), used often in the Linux kernel, for instance. Here are the macros:

#if defined(__ASSEMBLER__)

/* Assembler: emit the bare literal, without any suffix */
#if !defined(_UL)
#define _U(x) x
#define _L(x) x
#define _UL(x) x
#endif /* !defined(_UL) */

#else

/* C: paste the suffix onto the literal */
#if !defined(_UL)
#define _U(x) x ## U
#define _L(x) x ## L
#define _UL(x) x ## UL
#endif /* !defined(_UL) */

#endif /* defined(__ASSEMBLER__) */

This way, you can write the following in your header:

#define PTEHI_V                 (_UL(0x80000000))
#define PTEHI_VSID_MASK         (_UL(0x7FFFFF80))

and the preprocessor deals with the rest.
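As a quick sanity check on the C side, the sketch below repeats the idea for _UL() only and verifies the values that the addis/ori pair would use. The helper functions are hypothetical, and the type check relies on _Generic, so that part assumes a C11 compiler.

```c
#include <assert.h>

/* Same idea as the macros above, reduced to _UL() for brevity. */
#if !defined(_UL)
#if defined(__ASSEMBLER__)
#define _UL(x) x
#else
#define _UL(x) x ## UL
#endif
#endif /* !defined(_UL) */

#define PTEHI_V                 (_UL(0x80000000))
#define PTEHI_VSID_MASK         (_UL(0x7FFFFF80))

/* Hypothetical helpers mirroring the operands of the addis/ori pair. */
unsigned long vsid_mask_hi(void) { return PTEHI_VSID_MASK >> 16; }
unsigned long vsid_mask_lo(void) { return PTEHI_VSID_MASK & 0xffff; }

/* Returns 1 if PTEHI_V expanded to a constant of type unsigned long. */
int ptehi_v_is_unsigned_long(void)
{
    return _Generic(PTEHI_V, unsigned long: 1, default: 0);
}

/* Self-check of the expansions in a C translation unit. */
void check_ptehi_constants(void)
{
    assert(vsid_mask_hi() == 0x7FFFUL);
    assert(vsid_mask_lo() == 0xFF80UL);
    assert(ptehi_v_is_unsigned_long());
}
```

When the same header is preprocessed for an assembler source, GCC predefines __ASSEMBLER__, so the constants expand to plain (0x80000000) and (0x7FFFFF80), which the assembler accepts.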

The macro definitions for _U(), _L() and _UL() can be put in a global configuration header that is included from the compiler’s command line, for example with GCC’s -include option (like Linux’s former config.h).
