Semantics
This text explains the concepts of semantics in ANS Forth, where they
are discussed in the standard, and how they relate to traditional
implementations.
What are semantics in ANS Forth?
The semantics of a word is its meaning. As far as the standard is
concerned, the semantics of a Forth word is the action or behaviour
that happens when the word occurs in a specific context.
In ANS Forth, (named) words have two semantics: interpretation
semantics and compilation semantics.
The standard also talks about execution, run-time and initiation
semantics, but these are just used to define interpretation and/or
compilation semantics.
The default interpretation and compilation semantics of all words
are defined in Sections 3.4.3.2 and 3.4.3.3. Note that many standard words
have non-default semantics that are specified in the respective
glossary entries. Moreover, IMMEDIATE changes the compilation semantics
of a word.
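A minimal sketch of the difference between default semantics and IMMEDIATE. The words SQUARE, MARK and CUBE and the variable MARKS are hypothetical examples. SQUARE keeps the default semantics, so inside a colon definition the text interpreter compiles a call to it. MARK is IMMEDIATE, so its compilation semantics is the same as its interpretation semantics and it runs while CUBE is being compiled.

```forth
\ Hypothetical example words, not part of the standard.
VARIABLE marks  0 marks !

\ Default semantics: interpreting SQUARE executes it; compiling it
\ appends its execution semantics to the current definition.
: square ( n -- n*n ) DUP * ;

\ IMMEDIATE: the compilation semantics becomes the same as the
\ interpretation semantics, so MARK runs even in compile state.
: mark ( -- ) 1 marks +! ; IMMEDIATE

\ MARK executes once, now, while CUBE is compiled; it is not part of CUBE.
: cube ( n -- n*n*n ) mark DUP square * ;
```

After loading this, MARKS holds 1 even though CUBE has never been executed, and executing CUBE does not change it.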
Note that (anonymous) definitions defined with :NONAME have only one
semantics, represented by their execution token (xt).
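A short sketch of an anonymous definition. Because :NONAME gives the definition no name, nothing can look it up to pick interpretation or compilation semantics; it can only be used through its xt. DOUBLE-XT and QUADRUPLE are hypothetical names.

```forth
\ An anonymous definition: no name, just an execution token.
:NONAME ( n -- 2n ) 2* ; CONSTANT double-xt

\ The xt can be EXECUTEd directly...
7 double-xt EXECUTE .   \ prints 14

\ ...or from inside another definition.
: quadruple ( n -- 4n ) double-xt EXECUTE double-xt EXECUTE ;
```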
When are the various semantics used?
The semantics are used in the following contexts:
- text interpreter, interpret state - perform interpretation semantics [3.4 b) 1.]
- text interpreter, compile state - perform compilation semantics [3.4 b) 2.]
- ', ['] - the xt represents the interpretation semantics [6.1.0070, last sentence].
- POSTPONE - compile the compilation semantics.
- [COMPILE] - compile non-default or perform default compilation semantics.
- FIND - unclear, RFI 8 pending.
- SEARCH-WORDLIST - unclear, should be clarified with RFI 8.
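A sketch of three of these contexts, using a hypothetical word SQUARE (and hypothetical helpers SQUARE-XT, GET-SQUARE-XT, COMPILE-SQUARE and FOURTH). ' and ['] both deliver the xt that represents the interpretation semantics of SQUARE. POSTPONE appends the compilation semantics of SQUARE to COMPILE-SQUARE; since SQUARE has default semantics, performing them compiles a call to SQUARE.

```forth
: square ( n -- n*n ) DUP * ;

\ ' (in interpret state) and ['] (inside a definition) yield the xt
\ that represents the interpretation semantics of SQUARE.
' square CONSTANT square-xt
: get-square-xt ( -- xt ) ['] square ;

\ POSTPONE appends the compilation semantics of SQUARE to COMPILE-SQUARE.
\ When the immediate COMPILE-SQUARE runs while FOURTH is compiled, it
\ performs them, i.e., it compiles a call to SQUARE into FOURTH.
: compile-square ( -- ) POSTPONE square ; IMMEDIATE
: fourth ( n -- n^4 ) compile-square compile-square ;
```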
What is the relation between semantics and STATE?
There is no a priori relationship between STATE and semantics. In
particular, the interpretation semantics can be performed in any
state, via ', ['], EXECUTE and COMPILE, [RFI7]. However, it is an
ambiguous condition to perform compilation semantics in interpret state
[Philip Preston's RFI].
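A sketch of performing the interpretation semantics of + in compile state. PLUS-NOW, FIVE, COMPILE-PLUS and ADD3 are hypothetical names. PLUS-NOW is IMMEDIATE, so it runs while FIVE is being compiled (STATE is true) and EXECUTEs the xt of + at that moment; COMPILE-PLUS instead COMPILE,s the same xt into ADD3.

```forth
\ Runs in compile state (it is IMMEDIATE) and EXECUTEs the xt of +,
\ i.e., performs the interpretation semantics of + at compile time.
: plus-now ( n1 n2 -- n3 ) ['] + EXECUTE ; IMMEDIATE
: five ( -- 5 ) [ 2 3 ] plus-now LITERAL ;

\ COMPILE, appends the execution semantics of the same xt
\ to the definition being compiled.
: compile-plus ( -- ) ['] + COMPILE, ; IMMEDIATE
: add3 ( n1 n2 n3 -- sum ) compile-plus compile-plus ;
```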
Implementation strategies
Many implementation strategies are possible, and several have been
proposed and implemented. The following descriptions of two possible
implementations are intended to enhance understanding.
Straightforward implementation: Dual-xt words
The most straightforward implementation of these concepts would be to
have two xts per named word, one for interpretation semantics and one
for compilation semantics [1]. A more practical implementation strategy
uses a double-cell compilation token instead of a single-cell
compilation xt; you can read about this in some postings. Most of this
page was written before that, so this section describes the pure
dual-xt approach.
Terminological note: The difference between words (names)
and definitions (possibly anonymous procedures identified by
xts) is especially important in this section, because here the
correspondence between these concepts is not 1:1.
How can such an implementation implement various operations?
- (Operations accessing) interpretation semantics - Access the interpretation xt.
- (Operations accessing) compilation semantics - Access the compilation
xt. E.g., the text interpreter in compile state EXECUTEs the compilation
xt; POSTPONE COMPILE,s the compilation xt.
- [COMPILE] - is a bit difficult to implement, because it requires
knowing whether the compilation semantics is default. One solution is
to have a flag or to make default compilation semantics easy to
recognize; another is not to implement [COMPILE].
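A minimal sketch of the dual-xt idea in standard Forth, assuming the two xts are kept in a CREATEd data structure rather than in the system's dictionary headers. DUAL-WORD, INT-XT, COMP-XT, INTERPRET-WORD, POSTPONE-WORD, PLUS-COMP-XT and MY+ are hypothetical names; a real dual-xt system would store both xts in the header and do the dispatch inside its text interpreter and POSTPONE.

```forth
\ Hypothetical dual-xt word: cell 0 holds the interpretation xt,
\ cell 1 holds the compilation xt.
: dual-word ( int-xt comp-xt "name" -- ) CREATE SWAP , , ;
: int-xt  ( word -- xt ) @ ;
: comp-xt ( word -- xt ) CELL+ @ ;

\ Text interpreter: EXECUTE the interpretation xt in interpret state,
\ the compilation xt in compile state.
: interpret-word ( i*x word -- j*x )
  STATE @ IF comp-xt ELSE int-xt THEN EXECUTE ;

\ POSTPONE: COMPILE, the compilation xt into the current definition.
: postpone-word ( word -- ) comp-xt COMPILE, ;

\ Example: a dual-xt version of + with default semantics.
:NONAME ( -- ) ['] + COMPILE, ; CONSTANT plus-comp-xt
' + plus-comp-xt dual-word my+
```

Here `2 3 my+ interpret-word` in interpret state leaves 5, while the same call made during compilation (from an immediate word) compiles + into the current definition.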
How can such an implementation implement the standard words?
- Words with default semantics (e.g., + and user-defined non-immediate
words) - The interpretation xt is the usual xt (what you want to get
with '). The compilation xt is the xt of a definition that COMPILE,s
the usual xt.
- Immediate words (e.g., \ and user-defined immediate words), i.e.,
words with the same interpretation and compilation semantics - The
interpretation xt is the usual xt; the compilation xt is the same xt as
the interpretation xt.
- Words without standard interpretation semantics (e.g., IF, EXIT) -
The interpretation xt can be anything, e.g., a special value indicating
the absence of interpretation semantics, the xt of a definition
performing -14 THROW, or the same as the compilation xt (the latter is
quite practical if you have to deal with lots of (non-standard) code
that ticks such words). The compilation xt is the xt of a definition
that performs the compilation semantics.
- Words with interpretation semantics and non-default, non-immediate
compilation semantics (combined words) (S", TO) - are the most general
case. The interpretation xt is the xt of a definition that performs
the interpretation semantics. The compilation xt is the xt of a
definition that performs the compilation semantics of the word.
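A sketch of how each class above might get its pair of xts, again with the xts in a hypothetical CREATEd record. All names (DUAL-WORD, SQ-WORD, HIT-WORD, EXIT-WORD, SQ-LIT-WORD and the helpers) are illustrative assumptions. SQ-LIT stands in for a combined word: when interpreted it squares a number at once, and when compiled it squares the number at compile time and compiles the result as a literal.

```forth
\ Hypothetical dual-xt record: cell 0 = interpretation xt,
\ cell 1 = compilation xt.
: dual-word ( int-xt comp-xt "name" -- ) CREATE SWAP , , ;
: int-xt  ( word -- xt ) @ ;
: comp-xt ( word -- xt ) CELL+ @ ;

\ Default semantics: the usual xt, plus a definition that COMPILE,s it.
: sq ( n -- n*n ) DUP * ;
' sq  :NONAME ( -- ) ['] sq COMPILE, ;  dual-word sq-word

\ Immediate word: the compilation xt is the interpretation xt.
VARIABLE hits  0 hits !
: hit ( -- ) 1 hits +! ;
' hit DUP dual-word hit-word

\ No interpretation semantics (like EXIT): here the interpretation xt
\ THROWs -14 (interpreting a compile-only word).
:NONAME ( -- ) -14 THROW ;
:NONAME ( -- ) POSTPONE EXIT ;  dual-word exit-word

\ Combined word: separate definitions for the two semantics.
:NONAME ( n -- n*n ) DUP * ;
:NONAME ( n -- ) DUP * POSTPONE LITERAL ;  dual-word sq-lit-word
```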
Relation with traditional implementations
A popular implementation scheme is to associate one execution token
(xt, traditionally implemented as CFA) and an immediate flag with each
named word.
This implementation can be seen as follows in standard terms: the
interpretation semantics of such a word is to EXECUTE the xt; the
compilation semantics of non-immediate words is to COMPILE, the xt;
the compilation semantics of immediate words is to EXECUTE the xt.
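The mapping above can be written down directly. This is a sketch with hypothetical names PERFORM-INTERPRETATION and PERFORM-COMPILATION; the immediate flag is passed explicitly here, whereas a traditional system would read it from the word's header.

```forth
\ Traditional model: one xt plus an immediate flag per named word.

\ Interpretation semantics: EXECUTE the xt.
: perform-interpretation ( i*x xt -- j*x ) EXECUTE ;

\ Compilation semantics: EXECUTE the xt of an immediate word,
\ COMPILE, the xt of a non-immediate word.
: perform-compilation ( i*x xt immediate? -- j*x )
  IF EXECUTE ELSE COMPILE, THEN ;
```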
How can such an implementation implement the standard words?
- Words with default semantics (e.g., + and user-defined non-immediate
words) - are implemented as non-immediate words.
- Immediate words (e.g., \ and user-defined immediate words), i.e.,
words with the same interpretation and compilation semantics - are
implemented as immediate words.
- Words without interpretation semantics and with non-default
compilation semantics (e.g., IF) - can be implemented as immediate
words.
- Words without interpretation semantics and with default compilation
semantics (e.g., EXIT) - can be implemented as non-immediate words; you
cannot implement them as immediate words if you support [COMPILE].
- Words with interpretation semantics and non-default, non-immediate
compilation semantics (combined words) (S", TO) - are the most general
case.
You can either hope that ticking these words will become non-standard
and implement them as STATE-smart immediate words, or you can use the
technique used in combined.zip to implement a solution that works
correctly even when these words are ticked. This technique works like
this: such words are coded like STATE-smart immediate words, but ' (and
[']) recognize such words and supply an xt for the correct
(non-STATE-smart) interpretation semantics. This can be as simple as:
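: ' ( "name" -- xt )
  \ implement smart ' in terms of dumb ' and [']
  \ S"-int and TO-int must be defined beforehand; they perform the
  \ (non-STATE-smart) interpretation semantics of S" and TO
  ' CASE
    ['] S" OF ['] S"-int ENDOF
    ['] TO OF ['] TO-int ENDOF
    DUP  \ any other word: keep its xt (ENDCASE drops the copy)
  ENDCASE ;
: ['] ( compilation: "name" -- ) ( run-time: -- xt )
  ' POSTPONE LITERAL ; IMMEDIATE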
How can such an implementation implement various operations?
- (Operations accessing) interpretation semantics - For most word
classes listed above, the interpretation semantics is simply
represented by the xt. The exception is the last class (combined
words), and the solution is given above.
- (Operations accessing) compilation semantics - If the immediate flag
is clear, the compilation semantics is to COMPILE, the xt; if the
immediate flag is set, the compilation semantics is to EXECUTE the xt.
Performing compilation semantics is only legal in compile state
[Philip Preston's RFI], so there is no need to special-case combined
words, unlike with interpretation semantics; you may still want to do
so to cater for programs that were standard before that RFI.
- Text interpretation in interpret state - It is possible to optimize
this operation: because the xt is guaranteed to be used in interpret
state, the text interpreter can use the raw (STATE-smart) xt for
combined words and does not need to replace it with a
STATE-insensitive xt.
- [COMPILE] - This operation can be implemented as COMPILE,ing the xt,
regardless of the immediate flag.
This implementation is much simpler than what you get by literally
translating the specification. Thus you can see that [COMPILE] fits
this implementation model quite well.
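A sketch of [COMPILE] for a single-xt-plus-immediate-flag system, under the assumption that FIND returns the word's one raw xt, as traditional implementations do. RAW-[COMPILE] is a hypothetical name chosen to avoid redefining the system's [COMPILE]. It uses FIND rather than a combined-word-aware ', so combined words yield their raw STATE-smart xt, which is what gets compiled.

```forth
\ Hypothetical [COMPILE]: look the name up and COMPILE, the raw xt,
\ ignoring the immediate flag. -13 is the standard THROW code for
\ an undefined word.
: raw-[compile] ( "name" -- )
  BL WORD FIND 0= IF -13 THROW THEN
  COMPILE, ; IMMEDIATE
```

With an immediate word HIT, `: uses-hit raw-[compile] hit ;` compiles a call to HIT instead of running HIT during compilation.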
What's the deal with STATE-smartness?
Consider a STATE-smart word like
: 2DUP ( n1 n2 -- n1 n2 n1 n2 )
STATE @ IF
POSTPONE OVER POSTPONE OVER
ELSE
OVER OVER
THEN ; IMMEDIATE
Such a word works in the common cases:
- It compiles correctly.
- It interprets correctly.
- (What it compiles) executes correctly.
Its problems are:
- Its tick, when EXECUTEd, is correct if the system is in interpret
state at the time EXECUTE is invoked, but is incorrect if it is in
compile state at that time.
- A definition into which its tick is COMPILE,d runs correctly if the
definition runs in interpret state, but fails if it is run in compile
state.
- [COMPILE] does not work correctly with it [RFI7].
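A sketch that makes the tick problem observable. It renames the STATE-smart 2DUP to SMART-2DUP to avoid redefining the system's 2DUP; SMART-2DUP-XT, GROWTH, MEASURE, MEASURE-NOW and VICTIM are hypothetical names. MEASURE records how many cells EXECUTEing the ticked xt adds to the data stack: 2 in interpret state, as intended, but 0 in compile state, where the same xt compiles OVER OVER into the definition being built instead.

```forth
\ The STATE-smart 2DUP, renamed.
: smart-2dup ( n1 n2 -- n1 n2 n1 n2 )
  STATE @ IF POSTPONE OVER POSTPONE OVER ELSE OVER OVER THEN ; IMMEDIATE

\ The tick of a word is supposed to represent its interpretation semantics.
' smart-2dup CONSTANT smart-2dup-xt

\ Record how many cells EXECUTEing the xt added to the data stack.
VARIABLE growth
: measure ( n1 n2 -- n1 n2 n1 n2 | n1 n2 )
  DEPTH >R  smart-2dup-xt EXECUTE  DEPTH R> -  growth ! ;

\ Interpret state: the xt duplicates, GROWTH becomes 2.
1 2 measure 2DROP 2DROP

\ Compile state (inside an IMMEDIATE word): nothing is duplicated;
\ OVER OVER is compiled into VICTIM instead, and GROWTH becomes 0.
: measure-now ( -- ) 1 2 measure 2DROP ; IMMEDIATE
: victim ( n1 n2 -- n1 n2 n1 n2 ) measure-now ;
```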
I.e., you must not implement any standard word that can be ticked
(i.e., that has interpretation semantics) as a STATE-smart word; and
you must not implement user-defined words such as constants (all of
which can be ticked) as STATE-smart words either [RFI7] (unless the
user explicitly writes them as STATE-smart, of course).
Moreover, if you support [COMPILE], you must not implement any standard
words with default compilation semantics (e.g., EXIT and COMPILE,) as
STATE-smart (or otherwise immediate) words.
For a longer discussion of the issue, read State-smartness: Why it is
Evil and How to Exorcise it.
Footnotes
[1] If you have heard about multiple-code-field (MCF) words, that's a
little different: they have several xts per word, but these xts are
not independent; they implement different operations on the same data.
Anton Ertl