Compilation of Stack-Based Languages, status April 1999
This project was supported by the FWF for two years starting in
February 1996 (research project P11231). Since February 1998, the
project has stalled, but I expect to continue it some time in the
future.
Goal
The goal of the project Compilation of Stack-Based Languages
is to advance the state of the art in compiling languages like Forth,
PostScript, and JavaVM to fast machine code. These languages have
properties that make traditional compilation techniques ill-suited
to them:
- Stack items
- have to be allocated effectively to registers to get fast machine code.
- Very fast compilation
- is required, because of the interactive
nature of these languages (Forth, PostScript), or because of the
compile-and-run infrastructure (JavaVM just-in-time compilation).
Current Results
We currently have a working compiler from Forth to machine code,
written in Forth:
- It produces executable machine code for the MIPS architecture.
- It can compile itself, repeatedly.
- It runs on Gforth, an
interpretive (direct threaded code) implementation of ANS Forth. The
native code compiler is fully integrated with the interpreter, i.e.,
the interpreter can call native code and vice versa.
- It uses tree parsing (of DAGs) for instruction selection,
backwards list scheduling for instruction scheduling, and a simple
basic block register allocation algorithm (with a new, simple, but
effective register shuffling algorithm). Implemented naively, these
phases would require nine passes; we integrate them into four passes,
for increased compilation speed.
- We use a linear intermediate representation that is quickly
executable, because it is threaded code [bell73,kogge82]; this
representation is ideal for code replication optimizations like
inlining.
The main results up to now are:
- The speedup over the interpreted Gforth is 3.32-4.46 for small
benchmarks and 1.41 for the self-compilation benchmark. The
self-compilation speedup is a little disappointing; we think that the
main reason for this is that even the RAFTS-compiled version of RAFTS
executes a significant number of threaded-code words.
- Compilation of Forth via C [ertl&maierhofer95] gives code that runs
0.89-1.97 times as fast as the code produced by our native code
compiler. The lower performance of our native code compiler on most
benchmarks is caused by the lack of interprocedural register
allocation combined with very short basic blocks (which are due to a
very high call frequency).
- Compilation currently takes 2.9 times as long as with
Gforth's Forth-to-threaded-code compiler. A slowdown is to be
expected, because RAFTS has to perform the same I/O and dictionary
lookup tasks as Gforth (and currently most of these tasks actually use
the same threaded code), but has a much more complicated and
time-consuming code generator; still, we think that compilation speed
can be improved.
Publications
Our final report [ertl&pirker98] is a
short paper written for a computer science audience. For more
specific information, read our EuroForth papers [ertl&pirker96, ertl&pirker97] (but they assume a Forth
background). There is also a position paper that gives an overview of
the ideas [ertl92]. Finally, my dissertation
[ertl96diss] discusses RAFTS and other Forth
implementation techniques.
Future Work
- Interprocedural register allocation
- We will probably first implement
a standard graph colouring register allocator. Since it may be too slow,
we are also considering a fast register allocation method that exploits
the nice liveness properties of stack items. While this method may
not produce code as good as established graph colouring techniques, we
hope that the improved compile time will be worth the price.
- Code replication
- The main purpose of code replication is to
produce longer basic blocks, which allow our basic block code
generation techniques to be more effective. Currently the plan is to
combine a conservative full inliner with an aggressive partial inliner
and a control flow join delaying optimization. This should be pretty
easy with our new linear intermediate representation.
- Tail-call optimization
- turns a call followed by an exit into a
jump. It should work even better in combination with partial inlining
(because partial inlining turns some calls into tail-calls).
References
- [bell73] James R. Bell.
-
Threaded code.
Communications of the ACM, 16(6):370-372, 1973.
- [ertl92] M. Anton Ertl.
-
A new approach to Forth native code generation.
In EuroForth '92, pages 73-78, Southampton, England, 1992.
MicroProcessor Engineering.
- [ertl96diss] M. Anton Ertl.
-
Implementation of Stack-Based Languages on Register Machines.
Dissertation, Technische Universität Wien, Austria, 1996.
- [ertl&maierhofer95] M. Anton Ertl and Martin
Maierhofer.
-
Translating Forth to efficient C.
In EuroForth '95 Conference Proceedings, Schloß Dagstuhl,
Germany, 1995.
- [ertl&pirker96] M. Anton Ertl and Christian
Pirker.
-
RAFTS for basic blocks: A progress report on Forth native code
compilation.
In EuroForth '96 Conference Proceedings, St. Petersburg, Russia,
1996.
- [ertl&pirker97] M. Anton Ertl and Christian
Pirker.
-
The structure of a Forth native code compiler.
In EuroForth '97 Conference Proceedings, pages 107-116, Oxford,
1997.
- [ertl&pirker98] M. Anton Ertl and Christian
Pirker.
-
Compilation of stack-based languages (Abschlußbericht).
Final report to the FWF for research project P11231, Institut für
Computersprachen, Technische Universität Wien, 1998.
- [kogge82] Peter M. Kogge.
-
An architectural trail to threaded-code systems.
Computer, pages 22-32, March 1982.
Anton Ertl