comp.lang.forth Frequently Asked Questions (1/6): General/Misc: Miscellaneous

5.1 Where can I find a C-to-Forth compiler?

We (CodeGen, Inc.) sell a C-to-Fcode compiler. Well, it actually generates IEEE-1275 Forth that must then be run through a tokenizer.
Really, it generates pretty ugly Forth code. It's easy to generate lousy Forth, but it's very difficult to generate nice clean optimized Forth. C and stack-based languages don't mix too well. I end up faking a C variable stack-frame using a Forth $frame variable for local vars.

Stephen Pelc writes:

MPE has produced a C to stack-machine compiler. This generates tokens for a 2-stack virtual machine. The code quality is such that the token space used by compiled programs is better than that of the commercial C compilers we have tested against. This is a consequence of the virtual machine design. However, to achieve this the virtual machine design has local variable support.
The tokens can then be interpreted by a back end, or translated to a Forth system. The translator can be written in high level Forth, and is largely portable, except for the target architecture sections.

These are not shareware tools, and were written to support a portable binary system.

5.2 Where can I find a Forth-to-C compiler?

An unsupported prototype Forth-to-C compiler is available at http://www.complang.tuwien.ac.at/forth/forth2c.tar.gz. It is described in the EuroForth'95 paper http://www.complang.tuwien.ac.at/papers/ertl&maierhofer95.ps.gz. Another Forth-to-C compiler is supplied with Rob Chapman's Timbre system.

Many packages for data structuring facilities like Pascal's RECORDs and C's structs have been posted, e.g., the structures of the Forth Scientific Library (http://www.taygeta.com/fsl/fsl_structs.html) or the structures supplied with Gforth (http://www.complang.tuwien.ac.at/forth/struct.fs).
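As a rough illustration of what such packages provide, here is a minimal sketch of a field-defining word in standard Forth. The names OFFSET-FIELD, POINT-X, POINT-Y, /POINT and P are made up for this example; this is not the interface of either package mentioned above, and real packages also deal with alignment.

```forth
\ Hypothetical minimal structure support (not the FSL or Gforth interface).
: offset-field ( offset size "name" -- offset' )
  create over , +           \ store this field's offset, advance by size
  does> ( addr -- addr' )
  @ + ;                     \ add the stored offset to the structure address

0                           \ running offset starts at zero
  1 cells offset-field point-x
  1 cells offset-field point-y
constant /point             \ total size of the structure

create p /point allot       \ one instance of the structure
3 p point-x !
4 p point-y !
p point-x @ p point-y @ + . \ prints the sum of the two fields
```

A field word simply adds its offset to a structure address, so fields are used with the ordinary @ and ! words.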

5.4 Why does THEN finish an IF structure?

Some people find the way THEN is used in Forth unnatural, others do not.

According to Webster's New Encyclopedic Dictionary, "then" (adv.) has the following meanings:

... 2b: following next after in order ... 3d: as a necessary consequence (if you were there, then you saw them).

Forth's THEN has the meaning 2b, whereas THEN in Pascal and other programming languages has the meaning 3d.

If you don't like to use THEN in this way, you can easily define ENDIF as a replacement:

: ENDIF  POSTPONE THEN ; IMMEDIATE
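The following small sketch shows both spellings side by side; MY-ABS and MY-ABS2 are names invented for this example. Read the first definition as "if n is negative, negate it; then carry on".

```forth
\ Absolute value: if n is negative, negate it; THEN continue.
: my-abs ( n -- u )
  dup 0< if negate then ;

\ The same definition using ENDIF, defined as shown above.
: ENDIF  POSTPONE THEN ; IMMEDIATE
: my-abs2 ( n -- u )
  dup 0< if negate endif ;
```

Both definitions compile to the same code, since ENDIF just performs the compilation semantics of THEN.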

5.5 What is threaded code? What are the differences between the different threading techniques?

Threaded code is a way of implementing virtual machine interpreters. You can find a more in-depth explanation at http://www.complang.tuwien.ac.at/forth/threaded-code.html.
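To get a feel for the idea, the following sketch models a thread in Forth itself: a sequence of execution tokens in memory, and a loop that fetches and executes them one after another. This is only a model; real threaded-code systems implement the dispatch in machine code, and the names THREAD, #THREAD and RUN-THREAD are invented here.

```forth
\ A "thread": a sequence of execution tokens laid out in memory.
create thread  ' dup ,  ' * ,  ' 1+ ,
3 constant #thread              \ number of tokens in the thread

\ Dispatch loop: fetch each token in turn and execute it.
: run-thread ( i*x addr n -- j*x )
  cells over + swap ?do
    i @ execute
  1 cells +loop ;

5 thread #thread run-thread .   \ performs 5 dup * 1+ and prints the result
```

The loop parameters live on the return stack, so the executed words see the data stack exactly as they would in an ordinary definition.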

5.6 Has anyone written a Forth which compiles to Java bytecode?

Paul Curtis writes:

The JVM, although a stack machine, can't really be used to compile Forth efficiently. Why? Well, there are a number of reasons:

The maximum stack depth of a called method must be known in advance. [JVM Spec, p. 111]
JVM methods can only return a single object to the caller. Thus, a stack effect ( n1 n2 -- n3 n4 ) just isn't possible.
There is no direct support for unsigned quantities.
CATCH and THROW can't be resolved easily; you need to catch exceptions using exception tables. This doesn't match Forth's model too well. [JVM Spec, p. 112]
You'd need to extend Forth to generate the attributes required for Java methods.
There is no such thing as pointer arithmetic.
You can't take one thing on the stack and recast it to another type.
You can't manufacture objects out of raw bytes. This is a security issue.
There is no support for the return stack.
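The point about return values is easy to see from the Forth side, where words that leave several results on the stack are commonplace. A small sketch (SUM-DIFF is a name invented for this example; /MOD is a standard word):

```forth
\ A word with stack effect ( n1 n2 -- n3 n4 ): returns sum and difference.
: sum-diff ( n1 n2 -- sum diff )
  2dup + >r - r> swap ;

10 3 sum-diff . .     \ prints the difference, then the sum
17 5 /mod . .         \ standard word: prints the quotient, then the remainder
```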

That said, it is possible to write something Forth-like using JVM bytecodes, but you can't use the JVM stack to implement the Forth stack. ...

If you're serious, try getting Jasmin and programming directly on the JVM.

5.7 What about translating Java bytecode to Forth?

Some of the non-trivial pieces that we have identified in translating JavaVM to Forth are:

garbage collection
threads
control structures (branches->ANS Forth's seven universal control structure words)
exceptions
subroutines (JavaVM does not specify that a subroutine returns to its caller)
JavaVM makes the same mistake as Forth standards up to Forth-83: It specifies type sizes (e.g., a JavaVM int is always 32-bit). A few operators have to be added to support this.
The native libraries (without them JavaVM can do nothing).
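As an example of the extra operators needed for fixed type sizes, here is a sketch of 32-bit wraparound addition on a system whose cells may be wider than 32 bits. It assumes two's-complement arithmetic and cells of at least 32 bits; the names I32-WRAP and IADD are invented here (IADD after the JavaVM instruction of the same name).

```forth
hex
\ Reduce a cell to a sign-extended 32-bit value (no change on 32-bit cells).
: i32-wrap ( n -- n32 )
  FFFFFFFF and                  \ keep the low 32 bits
  dup 80000000 and if           \ bit 31 set: negative as a 32-bit int
    FFFFFFFF invert or          \ sign-extend into the upper bits
  then ;
decimal

\ Hypothetical counterpart of the JavaVM iadd instruction.
: iadd ( n1 n2 -- n3 )  + i32-wrap ;

2147483647 1 iadd .    \ wraps around to the most negative 32-bit int
```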

5.8 How is Postscript related to Forth?

Postscript is similar to Forth in having a data stack, being interactive, and supporting wordlists. Postscript differs from Forth in using run-time name binding, run-time typing for type-checking and overloading resolution, implementing control structures through words that take anonymous definitions as parameters, in terminology (I have used Forth terminology here), and in other respects.

Concerning the question of whether Forth influenced Postscript, the Postscript manual (first edition) claims that Postscript and its predecessors were conceived and developed independently of Forth. However, according to John Warnock, Postscript's "syntax looks a little bit like Forth, because it is derived from Forth". Jim Bowery's Genesis of Postscript mentions Forth.

5.9 How about running Forth without an OS?

A Forth system running on the bare hardware is also known as a native system (in contrast to a hosted system, which runs on an OS). Don't confuse this with native-code systems (which means that the system compiles Forth code to machine code); hosted native-code systems exist as well as native threaded-code systems.

In the beginning Forth systems were native and performed the functions of an OS (from talking to hardware to multi-user multi-tasking). On embedded controllers Forth systems are usually still native. For servers and desktops most Forth systems nowadays are hosted, because this avoids the necessity to write drivers for the wide variety of hardware available for these systems, and because it makes it easier for the user to use both Forth and his favourite other software on the host OS. Notable exceptions to this trend are the native systems from Athena.

5.10 How about writing an OS in Forth?

Native Forth systems can be seen as OSs written in Forth, so it is certainly possible. Projects to write an OS in Forth have been proposed several times. Other posters mentioned the following reasons why they do not participate in such a project:

If you want to write an OS in Forth for desktop or server systems, the problems are the same as for native Forth systems (and any other effort to write a new OS): the need to write drivers for a wide variety of hardware, and few applications running on the OS.

To get around the application problem, some posters have suggested writing an OS that is API or even ABI compatible with an existing OS like Linux. If the purpose of the project is to provide an exercise, the resulting amount of work seems excessively large; if the purpose is to get an OS, this variant would be pretty pointless, as there is already the other OS. And if the purpose is to show off Forth (e.g., by having smaller code size), there are easier projects for that, the compatibility requirement eliminates some of the potential advantages, and not that many people care about the code size of an OS kernel enough to be impressed.

5.11 What is a tethered/umbilical Forth system?

A tethered Forth system is a cross-development environment where the host and the target are connected at run-time (during development), allowing full interactive use of the target system without requiring all the space that a full-blown Forth system would require on the target. E.g., the headers can be kept completely in the host. Tethered systems may also provide the compilation speed and some of the conveniences of a full-blown Forth system on the host.

Tethered systems are also called umbilical systems.

5.12 How about interpreting by compiling and immediately executing?

Such ideas have been proposed several times, to allow using control structures interpretively, among other benefits. They have also been implemented in some systems (e.g., Christophe Lavarenne's Free-Forth). In most proposals a line would be compiled and then executed.

However, such systems behave quite differently from ordinary Forth systems in some respects, in particular when dealing with parsing words. E.g., consider:

 ' + .
: my-' ' ;
my-' + .

In classical Forth ' parses + in both cases. This behaviour is hard to achieve in a compile-then-execute Forth system, unless it works a word at a time, but then it would have none of the benefits, either.
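In a classical system, the usual way to get the effect of interpretive control structures is to wrap the code in an anonymous definition and execute it right away. A sketch, using :NONAME from the Core extension word set:

```forth
\ Sum the integers 1..10 from the command line. DO ... LOOP has no
\ standard interpretation semantics, so compile it into an anonymous
\ definition and execute that immediately.
:noname ( -- n )  0 11 1 do i + loop ; execute .
```

This keeps the ordinary behaviour of parsing words, at the cost of a little extra typing.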

5.13 Why does a decimal point not indicate floating-point?

In the old days Forth did not have floating-point numbers; instead, fixed-point arithmetic was used, usually on double-cell numbers. So, a decimal point indicated a double number (the position of the decimal point was stored in the variable DPL for potential use by fixed-point software).

In ANS Forth, a decimal point at the end indicates a double-cell number, and an E in the number indicates a floating-point number (when BASE is decimal).

All other ways to write numbers are system-dependent. However, most systems still interpret decimal points within a number as indicating double-cell numbers.
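The standard forms look like this in practice (a small sketch; the last line assumes that the Floating-Point word set is present):

```forth
decimal
1234.  d.     \ trailing decimal point: double-cell integer
1234   .      \ no decimal point: single-cell integer
1.5E3  f.     \ E in the number: floating-point value
```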

5.14 eForth: Who wrote it, and how do the versions differ?

eForth was written by Bill Muench, and was originally metacompiled. On request from C. H. Ting he also produced a version that was written in MASM, and had many words removed (the user should add them back in as an educational exercise).

5.15 Why is there a separate stack for FP?

More specifically, why do we put doubles, addresses, string descriptors etc. on the data stack? Why not FP values as well?

There is little overlap between integer and FP operations, basically just the data-movement stuff @ ! DUP SWAP etc. By contrast, we use, e.g., +, -, and u< on signed and unsigned integers, and on addresses, and being able to access the individual cells of doubles allows implementing various more refined operations (e.g., triple-cell integers in M*/). Having the individual cells of a c-addr u string descriptor accessible is the foundation for a number of string operations, e.g., checking whether a string is a prefix of another string. If we put these data on separate stacks, we would have to introduce and then use all kinds of words for transferring cells between these stacks; that's not the case for FP data.
How many cells should an FP value have on the data stack? 1? 2? 4? None of these answers is good for all systems. Or maybe an FP value should be an opaque type with an unspecified size? Then you cannot access anything below it.
FP hardware usually works on separate FP registers, and transfers between FP registers and integer registers are often pretty expensive (sometimes they require a memory store and memory load). Many Forth systems keep the top-of-data-stack in an integer register. On such a system with a shared integer+FP-stack an FP operation would imply a move from the integer registers to the FP registers and back. By contrast, with a separate FP stack, the system can keep the top-of-FP-stack in an FP register. On the 387 the whole Forth FP stack is often kept in the 387 FP stack.
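Two small sketches may make the first point more concrete. The first is the prefix check mentioned above, written with ordinary stack words acting on the individual cells of two string descriptors (COMPARE is from the String word set). The second shows an integer and an FP value being used together without any stack juggling. The names STRING-PREFIX? and FSCALE are chosen for this example, and the stack comments assume a separate FP stack.

```forth
\ Is c-addr2 u2 a prefix of c-addr1 u1?
: string-prefix? ( c-addr1 u1 c-addr2 u2 -- flag )
  tuck 2>r min 2r> compare 0= ;  \ compare the first u2 chars, if there are that many

\ Multiply the FP value r by the integer n.
: fscale ( n -- ) ( F: r -- r*n )
  s>d d>f f* ;                   \ convert n to FP, then multiply
```

In the first word, the lengths and addresses are shuffled independently of each other; in the second, n never gets in the way of r.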

