Semantics

This text explains the concepts of semantics in ANS Forth, where they are discussed in the standard, and how they relate to traditional implementations.

What are semantics in ANS Forth?

The semantics of a word is its meaning. As far as this standard is concerned, the semantics of a Forth word is the action or behaviour that happens when the word occurs in a specific context.

In ANS Forth, (named) words have two semantics: interpretation semantics and compilation semantics.

The standard also talks about execution, run-time and initiation semantics, but these are just used to define interpretation and/or compilation semantics.

The default interpretation and compilation semantics of all words are defined in Sections 3.4.3.2 and 3.4.3.3. Many standard words have non-default semantics, which are specified in the respective glossary entries. Moreover, IMMEDIATE changes the compilation semantics of a word. (Anonymous) definitions defined with :NONAME have only one semantics, represented by their execution token (xt).
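The following is a minimal sketch of these three cases. All names in it (square, fourth-power, notes, note, quiet, double-xt) are made up for illustration and do not come from the standard.

```forth
\ Default semantics: interpreting SQUARE executes it, compiling it appends it.
: square ( n -- n*n )  DUP * ;
: fourth-power ( n -- n^4 )  square square ;

\ IMMEDIATE changes the compilation semantics: NOTE is now executed,
\ not appended, when the text interpreter meets it in compile state.
VARIABLE notes  0 notes !
: note ( -- )  1 notes +! ; IMMEDIATE
: quiet ( -- )  note note ;   \ NOTE runs twice here; QUIET itself is empty

\ :NONAME yields only an xt, to be used with EXECUTE or COMPILE,
:NONAME ( n -- 2n )  2* ;  CONSTANT double-xt
```

After loading this, 3 fourth-power leaves 81, notes @ leaves 2 (and still 2 after running quiet), and 7 double-xt EXECUTE leaves 14.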

When are the various semantics used?

The semantics are used in the following contexts:
text interpreter, interpret state
perform interpretation semantics [3.4 b) 1.]
text interpreter, compile state
perform compilation semantics [3.4 b) 2.]
', [']
xt represents interpretation semantics [6.1.0070, last sentence].
POSTPONE
compile compilation semantics.
[COMPILE]
compile non-default or perform default compilation semantics.
FIND
unclear, RFI 8 pending.
SEARCH-WORDLIST
unclear, should be clarified with RFI 8.
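The first four contexts can be tried out with a word that has default semantics. This is only a sketch; the names square, twenty-five, cube, square-xt, [square] and eighth-power are hypothetical.

```forth
: square ( n -- n*n )  DUP * ;

\ Text interpreter, interpret state: perform interpretation semantics.
5 square CONSTANT twenty-five

\ Text interpreter, compile state: perform compilation semantics
\ (by default: append SQUARE to the current definition).
: cube ( n -- n^3 )  DUP square * ;

\ ' : the xt represents the interpretation semantics.
' square CONSTANT square-xt

\ POSTPONE: compile the compilation semantics; they are performed later,
\ when [SQUARE] runs in compile state.
: [square] ( -- )  POSTPONE square ; IMMEDIATE
: eighth-power ( n -- n^8 )  [square] [square] [square] ;
```

Here 3 cube leaves 27, 6 square-xt EXECUTE leaves 36, and 2 eighth-power leaves 256.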

What is the relation between semantics and STATE?

There is no a priori relationship between STATE and semantics. In particular, the interpretation semantics can be performed in any state, using the words ', ['], EXECUTE and COMPILE, [RFI7]. However, performing compilation semantics in interpret state is an ambiguous condition [Philip Preston's RFI].
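As a small illustration (a sketch with the hypothetical names square, [49] and forty-nine), an immediate word can perform the interpretation semantics of another word while the system is in compile state:

```forth
: square ( n -- n*n )  DUP * ;

\ [49] is immediate, so its body runs while FORTY-NINE is being compiled,
\ i.e., in compile state. The xt obtained with ['] still squares the number.
: [49] ( compilation: -- ; run-time: -- 49 )
    7 ['] square EXECUTE  POSTPONE LITERAL ; IMMEDIATE
: forty-nine ( -- 49 )  [49] ;
```

forty-nine leaves 49: the multiplication happened at compile time, and only the literal was compiled.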

Implementation strategies

Many implementations are possible, and several have been proposed and implemented. The following descriptions of two possible implementations are intended to aid understanding.

Straightforward implementation: Dual-xt words

The most straightforward implementation of these concepts would be to have two xts per named word, one for interpretation semantics and one for compilation semantics [1]. A more practical implementation strategy uses a double-cell compilation token instead of a single-cell compilation xt; you can read about this in some postings. Most of this page was written before those postings, so this section describes the pure dual-xt approach.

Terminological note: The difference between words (names) and definitions (possibly anonymous procedures identified by xts) is especially important in this section, because here the correspondence between these concepts is not 1:1.

How can such an implementation implement various operations?

(Operations accessing) interpretation semantics
Access the interpretation xt.
(Operations accessing) compilation semantics
Access the compilation xt. E.g., the text interpreter in compile state EXECUTEs the compilation xt; POSTPONE COMPILE,s the compilation xt.
[COMPILE]
is a bit difficult to implement, because it requires knowing whether the compilation semantics is default. One solution is to have a flag or to make default compilation semantics easy to recognize; another is not to implement [COMPILE].

How can such an implementation implement the standard words?

Words with default semantics (e.g., + and user-defined non-immediate words)
The interpretation xt is the usual xt (what you want to get with '). The compilation xt is the xt of a definition that COMPILE,s the usual xt.
Immediate words (e.g., \ and user-defined immediate words)
i.e., words with the same interpretation and compilation semantics. The interpretation xt is the usual xt; the compilation xt is the same xt as the interpretation xt.
Words without standard interpretation semantics (e.g., IF, EXIT)
The interpretation xt can be anything, e.g., a special value indicating the absence of interpretation semantics, the xt of a definition performing -14 THROW, or the same as the compilation xt (the latter is quite practical if you have to deal with a lot of non-standard code that ticks such words). The compilation xt is the xt of a definition that performs the compilation semantics.
Words with interpretation semantics and non-default, non-immediate compilation semantics (combined words) (S", TO)
are the most general case. The interpretation xt is the xt of a definition that performs the interpretation semantics. The compilation xt is the xt of a definition that performs the compilation semantics of the word.
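The dual-xt idea can be modelled in standard Forth with a two-cell record per word. This is only a sketch under that assumption: the record layout and all names (dual, perform-int, perform-comp, postpone-dual, and the example words) are hypothetical and not taken from any real system.

```forth
\ Hypothetical two-cell record: interpretation xt, then compilation xt.
: dual, ( int-xt comp-xt -- )  SWAP , , ;
: perform-int  ( i*x dual -- j*x )  @ EXECUTE ;        \ interpret state
: perform-comp ( i*x dual -- j*x )  CELL+ @ EXECUTE ;  \ compile state
: postpone-dual ( dual -- )  CELL+ @ COMPILE, ;        \ what POSTPONE does

\ Word with default semantics: the compilation xt COMPILE,s the usual xt.
: square ( n -- n*n )  DUP * ;
:NONAME ( -- )  ['] square COMPILE, ;   ( comp-xt )
CREATE square-dual  ' square SWAP dual,

\ Immediate word: both cells hold the same xt.
VARIABLE notes  0 notes !
: note ( -- )  1 notes +! ;
CREATE note-dual  ' note DUP dual,

\ Stand-in for the text interpreter meeting both words in compile state.
: [both] ( -- )  square-dual perform-comp  note-dual perform-comp ; IMMEDIATE
: sixteen ( -- 16 )  4 [both] ;

\ Stand-in for POSTPONE SQUARE inside an immediate word.
: [postpone-square] ( -- )  square-dual postpone-dual ; IMMEDIATE
: compile-square ( -- )  [postpone-square] ; IMMEDIATE
: thirty-six ( -- 36 )  6 compile-square ;
```

With this loaded, sixteen leaves 16, notes @ leaves 1 (note ran once, at compile time), thirty-six leaves 36, and 5 square-dual perform-int leaves 25.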

Relation with traditional implementations

A popular implementation scheme is to associate one execution token (xt, traditionally implemented as CFA) and an immediate flag with each named word.

This implementation can be seen as follows in standard terms: The interpretation semantics of such a word is to EXECUTE the xt; the compilation semantics of non-immediate words is to COMPILE, the xt; the compilation semantics of immediate words is to EXECUTE the xt.
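This reading of the traditional scheme can be sketched in standard Forth. The record layout (xt followed by an immediate flag) and all names below are assumptions made for illustration; real systems keep this information in their headers.

```forth
\ Hypothetical record: xt, then immediate flag.
: word-record, ( xt immediate? -- )  SWAP , , ;

\ Interpretation semantics: EXECUTE the xt.
: perform-int ( i*x rec -- j*x )  @ EXECUTE ;

\ Compilation semantics: EXECUTE if immediate, COMPILE, otherwise.
: perform-comp ( i*x rec -- j*x )
    DUP @ SWAP CELL+ @     ( xt immediate? )
    IF EXECUTE ELSE COMPILE, THEN ;

: square ( n -- n*n )  DUP * ;
CREATE square-rec  ' square FALSE word-record,

VARIABLE notes  0 notes !
: note ( -- )  1 notes +! ;
CREATE note-rec  ' note TRUE word-record,

\ Stand-in for the text interpreter meeting both words in compile state.
: [both] ( -- )  square-rec perform-comp  note-rec perform-comp ; IMMEDIATE
: nine ( -- 9 )  3 [both] ;
```

nine leaves 9 because square was appended to it, and notes @ leaves 1 because note was executed while nine was being compiled.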

How can such an implementation implement the standard words?

Words with default semantics (e.g., + and user-defined non-immediate words)
are implemented as non-immediate words.
Immediate words (e.g., \ and user-defined immediate words)
i.e., words with the same interpretation and compilation semantics are implemented as immediate words.
Words without interpretation semantics and with non-default compilation semantics (e.g., IF)
can be implemented as immediate words.
Words without interpretation semantics and with default compilation semantics (e.g., EXIT)
can be implemented as non-immediate; you cannot implement them as immediate words if you support [COMPILE].
Words with interpretation semantics and non-default, non-immediate compilation semantics (combined words) (S", TO)
are the most general case. One option is to hope that ticking these words will become non-standard, and to implement them as STATE-smart immediate words.

Alternatively, you can use the technique used in combined.zip to implement a solution that works correctly even when these words are ticked. This technique works like this: Such words are coded like STATE-smart immediate words, but ' (and [']) recognize such words and supply an xt for the correct (non-STATE-smart) interpretation semantics. This can be as simple as:

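: ' ( "name" -- xt )
  \ implement smart ' in terms of dumb ' and [']
  \ S"-int and TO-int are assumed to be defined elsewhere; they perform
  \ the (non-STATE-smart) interpretation semantics of S" and TO
  ' CASE
    ['] S" OF ['] S"-int ENDOF
    ['] TO OF ['] TO-int ENDOF
    DUP  \ default: keep the xt (ENDCASE drops the copy)
  ENDCASE ;
: ['] ( compilation: "name" -- ; run-time: -- xt )
  \ uses the smart ' defined above
  ' POSTPONE LITERAL ; IMMEDIATE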

How can such an implementation implement various operations?

(Operations accessing) interpretation semantics
For most word classes listed above, the interpretation semantics is simply represented by the xt. The exception is the last class (combined words), and the solution is given above.
(Operations accessing) compilation semantics
If the immediate flag is clear, the compilation semantics is to COMPILE, the xt; if the immediate flag is set, the compilation semantics is to EXECUTE the xt. Performing compilation semantics is only legal in compilation state [Philip Preston's RFI], so there is no need to special-case combined words, unlike with interpretation semantics; you still may want to do so to cater for programs that were standard before that RFI.
Text interpretation in interpret state
It is possible to optimize this operation: Because the xt is guaranteed to be used in interpret state, the text interpreter can use the raw (STATE-smart) xt for combined words and does not need to replace it with a STATE-insensitive xt.
[COMPILE]
This operation can be implemented as COMPILE,ing the xt, regardless of the immediate flag. This implementation is much simpler than what you get by literally translating the specification. Thus you can see that [COMPILE] fits this implementation model quite well.
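In standard Forth the idea can be sketched as follows, under the hypothetical name my-[compile] (to avoid redefining the system's own [COMPILE]). The sketch matches [COMPILE] only on a system that follows the one-xt-plus-immediate-flag model; the other names are also made up.

```forth
\ COMPILE, the xt of the next word, regardless of its immediate flag.
: my-[compile] ( "name" -- )  ' COMPILE, ; IMMEDIATE

VARIABLE notes  0 notes !
: note ( -- )  1 notes +! ; IMMEDIATE     \ immediate flag set
: square ( n -- n*n )  DUP * ;            \ immediate flag clear

: note-later ( -- )  my-[compile] note ;  \ NOTE is compiled, not executed
: also-square ( n -- n*n )  my-[compile] square ;
```

After loading, notes @ is still 0; it becomes 1 only once note-later is executed. 4 also-square leaves 16.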

What's the deal with STATE-smartness?

The problems with STATE-smart words like
: 2DUP ( n1 n2 -- n1 n2 n1 n2 )
    STATE @ IF
        POSTPONE OVER POSTPONE OVER
    ELSE
        OVER OVER
    THEN ; IMMEDIATE
are: [RFI7]

I.e., you must not implement any standard words that can be ticked (i.e., that have interpretation semantics) as STATE-smart words, and you must not implement user-defined words (all of which can be ticked), such as constants, as STATE-smart words either [RFI7] (unless the user explicitly writes them as STATE-smart, of course).
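One way to observe the trouble with ticking is the following experiment. It is an illustration, not a quotation from RFI 7; smart-2dup is the STATE-smart definition from above under a hypothetical name, and cells-left, probe, victim and the two constants are made up as well.

```forth
\ STATE-smart word, as in the text but under a hypothetical name.
: smart-2dup ( n1 n2 -- n1 n2 n1 n2 )
    STATE @ IF
        POSTPONE OVER POSTPONE OVER
    ELSE
        OVER OVER
    THEN ; IMMEDIATE

VARIABLE cells-left   \ stack cells present after executing the ticked xt

\ Push 1 2, execute the ticked xt, record how many cells are on the
\ stack afterwards, then clean up.
: probe ( -- )
    DEPTH >R
    1 2 ['] smart-2dup EXECUTE
    DEPTH R> -  DUP cells-left !
    0 ?DO DROP LOOP ; IMMEDIATE

probe  cells-left @ CONSTANT in-interpret-state   \ 4: the pair was duplicated

\ PROBE now runs in compile state: the xt appends OVER OVER to VICTIM
\ instead of duplicating 1 2.
: victim ( -- )  probe ;
cells-left @ CONSTANT in-compile-state            \ 2: nothing was duplicated
```

The same xt thus behaves differently depending on STATE at the time it is executed, which is not what an xt representing interpretation semantics should do. As a side effect, victim now contains an unintended OVER OVER.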

Moreover, if you support [COMPILE], you must not implement any standard words with default compilation semantics (e.g., EXIT and COMPILE,) as STATE-smart (or otherwise immediate) words.

For a longer discussion of the issue, read "State-smartness: Why it is Evil and How to Exorcise it".

Footnotes

[1] If you have heard about multiple-code-field (MCF) words, that's a little different: they have several xts per word, but these xts are not independent; they implement different operations on the same data.
Anton Ertl