Assembling Span-Dependent Instructions

Some instruction sets support encoding immediate operands with different sizes. E.g., on AMD64 the assembly language statement

movl 8(%rdi),%rax

can be encoded as

8b 47 08

or as

8b 87 08 00 00 00

The first form can encode only offsets in the range -128..127, while the latter can encode offsets in the range -2147483648..2147483647.

Architectures with fixed-size instructions tend to split operations with large immediate operands into more instructions than operations with small immediate operands, so the following also applies to them.

If the immediate operand is a constant in the assembly language source code, it is straightforward to just select the smallest encoding for which the operand fits.

However, if the immediate operand refers to labels in the assembly program, e.g., for a branch instruction, things become more complicated: the minimum size of the operand may depend on how the program is assembled, and how the program is assembled depends on the minimum size of the operand.

The common case is a relative branch to some label, with some code between. In this case, making the code larger can only make the minimum size of the operand larger, and this is relatively benign, as we will see below.

However, one can also construct cases where making the code larger can reduce the minimum size of the immediate operand, e.g.:

foo:
        movl foo+133-bar(%rdi),%eax
bar:

If you choose the large encoding, the offset is 127 and would seem to fit in the small encoding. But if you choose the small encoding, the offset is 130 and does not fit in the small encoding. So in this case, only the large encoding is a valid solution.

In general, such expressions involving labels may depend on how other instructions are encoded, which may depend on yet other instructions, with possible dependency cycles. In general, as Szymanski proved in 1978, this problem is NP-complete.

Now, as soon as anybody mentions NP-completeness, many people become very gullible; at least I see no other explanation for the amount of misunderstandings propagated about this topic, which prompted me to write this web page.

Szymanski's algorithm

Szymanski himself mentions the approaches discussed below, then describes a more sophisticated graph-based algorithm (read the paper for details). For dealing with cases like the second movl instruction above, he gives criteria for identifying whether an instruction may suffer from this condition, and suggests that they should be unconditionally compiled in the long encoding (and gas actually does that, even if foo+133-bar is replaced with foo-bar, for which the shorter encoding is sufficient in either case). Given that such cases are exceedingly rare, that's a good approach.

Start big and shrink

A simpler algorithm is to start by using the largest encoding of all instructions, determine the values of the labels, then make those instructions smaller that can be made smaller. This can be repeated until a steady-state is reached, or it can be stopped earlier to reduce assembling time.

This algorithm can fail if the program contains nasty instructions like our second movl: after the first pass, it decides to use the smaller encoding for the movl, but that does not work.

If you then want to make this instruction larger again, the algorithm is no longer guaranteed to terminate (see the end of section 2 for an example that starts small).

One solution is to identify the nasty instructions as described by Szymanski, and always keep these big.

A disadvantage of this algorithm is that its steady state may be worse than for the algorithm below.

Start small and grow

Another simple algorithm is to start by using the smallest encoding of all instructions, determine the values of the labels, then make those instructions larger that need to be larger. Repeat until a steady-state is reached. Note that you never must shrink an instruction again, otherwise the algorithm is no longer guaranteed to terminate.

Compared to the start-big approach, this algorithm is simpler (no need to identify nasty instructions, the algorithm grows them just like the others) and can produce better results, but typically need one additional pass.

Compared to Szymanski's algorithm, this algorithm is simpler (no need to identify nasty instructions) and runs slower (one pass more). Compared to Szymanski's algorithm with a gas-like imprecise identification of nasty instructions, the algorithm may produce smaller code; but given that such instructions do not really occur in real code, that's not a practical advantage.

Misunderstandings

Over the years, a number of misunderstandings of this topic have been posted. Comparing a recent one to the paper, I can imagine how the paper spawned them: The paper presents the start-big algorithm as just working, then presents a start-small-then-grow-or-shrink algorithm, and an example that does not converge (i.e., it does not terminate). It does not say that the same example will not terminate if you start-big-and-shrink-or-grow or fail if you start-big-and-shrink. The paper also explains that the general problem is NP-complete, and also that this is not practically relevant. So if you do not think about it too much, you might think that start-big is the safe choice, while start-small has problems with termination (and a few decades later that is conflated with the NP-completeness).

References

Thomas G. Szymanski: Assembling Code for Machines with Span-Dependent Instructions, CACM 21(4), 1978, p. 300-308

Anton Ertl