movl 8(%rdi),%raxcan be encoded as
8b 47 08or as
8b 87 08 00 00 00The first form can encode only offsets in the range -128..127, while the latter can encode offsets in the range -2147483648..2147483647.
Architectures with fixed-size instructions tend to split operations with large immediate operands into more instructions than operations with small immediate operands, so the following also applies to them.
If the immediate operand is a constant in the assembly language source code, it is straightforward to just select the smallest encoding for which the operand fits.
However, if the immediate operand refers to labels in the assembly program, e.g., for a branch instruction, things become more complicated: the minimum size of the operand may depend on how the program is assembled, and how the program is assembled depends on the minimum size of the operand.
The common case is a relative branch to some label, with some code between. In this case, making the code larger can only make the minimum size of the operand larger, and this is relatively benign, as we will see below.
However, one can also construct cases where making the code larger can reduce the minimum size of the immediate operand, e.g.:
foo: movl foo+133-bar(%rdi),%eax bar:If you choose the large encoding, the offset is 127 and would seem to fit in the small encoding. But if you choose the small encoding, the offset is 130 and does not fit in the small encoding. So in this case, only the large encoding is a valid solution.
In general, such expressions involving labels may depend on how other instructions are encoded, which may depend on yet other instructions, with possible dependency cycles. In general, as Szymanski proved in 1978, this problem is NP-complete.
Now, as soon as anybody mentions NP-completeness, many people become very gullible; at least I see no other explanation for the amount of misunderstandings propagated about this topic, which prompted me to write this web page.
movl
instruction above, he gives criteria for
identifying whether an instruction may suffer from this condition, and
suggests that they should be unconditionally compiled in the long
encoding (and gas actually does that, even if foo+133-bar
is replaced with foo-bar
, for which the shorter encoding
is sufficient in either case). Given that such cases are exceedingly
rare, that's a good approach.
This algorithm can fail if the program contains nasty instructions
like our second movl
: after the first pass, it decides
to use the smaller encoding for the movl
, but that does
not work.
If you then want to make this instruction larger again, the algorithm is no longer guaranteed to terminate (see the end of section 2 for an example that starts small).
One solution is to identify the nasty instructions as described by Szymanski, and always keep these big.
A disadvantage of this algorithm is that its steady state may be worse than for the algorithm below.
Compared to the start-big approach, this algorithm is simpler (no need to identify nasty instructions, the algorithm grows them just like the others) and can produce better results, but typically need one additional pass.
Compared to Szymanski's algorithm, this algorithm is simpler (no need to identify nasty instructions) and runs slower (one pass more). Compared to Szymanski's algorithm with a gas-like imprecise identification of nasty instructions, the algorithm may produce smaller code; but given that such instructions do not really occur in real code, that's not a practical advantage.