From anton Tue Oct  1 19:09:57 1991
Date: Tue, 1 Oct 91 19:06:04 +0100
From: anton (Anton Martin Ertl)
To: Mitch.Bradley@Eng.Sun.COM, NER034%tp.ac.uk@cunyvm.cuny.edu,
        anton@mips.complang.tuwien.ac.at, nick@kyron.sw.stratus.com,
        pl@lsi.usp.br
Subject: internals wordset
Status: R

I think that now all reactions to my postings on FIGI-L and
comp.lang.forth are in. You all showed some interest in working on the
internals wordset (not all of you "volunteered officially"):

Mitch Bradley (Mitch.Bradley@Eng.Sun.COM)

Peter Knaggs
+-----------------------------+-----------------------------------------------+
! School of Comp. & Maths.,   !    Janet: NER034 @ uk.ac.tees-poly            !
! Teesside Polytechnic,       !   Bitnet: NER034 % tp.ac.uk @ UKACRL          !
! Middlesbrough,              ! Internet: NER034 % tp.ac.uk @ cunyvm.cuny.edu !
! Cleveland, England. TS1 3BA !     Uucp: NER034 % tpoly.ac.uk @ ukc.uucp     !
!-----------------------------+-----------------------------------------------!

==========================================================================
Pedro Luis Prospero Sanchez      internet: pl@lsi.usp.br (PREFERRED)
University of Sao Paulo          uunet:    uunet!vme131!pl        
Dept. of Electronics Engineering  hepnet:   psanchez@uspif1.hepnet
phone: (055)(11)211-4574  home: (055)(11)914-9756 fax: (055)(11)815-4272
==========================================================================

nick@kyron.sw.stratus.com (Nicolas Tamburri)

RAY BROHINSKY <RAYBRO%HOLON@utrc.utc.com> encouraged us, but he
cannot work on it.

so here's the mailing list:
alias internals Mitch.Bradley@Eng.Sun.COM NER034%tp.ac.uk@cunyvm.cuny.edu pl@lsi.usp.br nick@kyron.sw.stratus.com anton@mips.complang.tuwien.ac.at


I think it is time for a short introduction: I learned Forth in late
1983 (a fig-Forth pretending to be a Forth-79 on a Commodore 64) and
since then I have done some programming in Forth, mostly for fun, but
I could use some of it for the courses I took. What do I do for a
living? I am the software toolsmith of a Viennese VAR.


So what are we going to do?
1) collect ideas
I have forwarded, or will forward, to you a paper by Mitch Bradley, which
he produced for the 1985 FORML (I think) and which contains some ideas.

I see the internals wordset divided into two parts:
a) defining access to the system's internal data structures
This should be quite straightforward. The data structures are:
   The Forth word (the general parts)
   word lists
   Some of the things with mysterious stack pictures in the ANSI standard
   Specialized parts of words, especially for:
      Colon definitions
      Words defined with CREATE/DOES>
      vocabularies
What did I miss?
b) defining hooks where user code can be brought into the system, e.g.
   for tracing
This seems much harder, because I don't know when it's complete. Also,
defining hooks may force some structure onto the Forth system.

As test cases for the completeness of the word set some applications
should be implemented. What comes to my mind immediately is a debugger
and WORDS.

2) transform the ideas into standard-like wording
Also, we need to decide which ideas we want in the wordset and which of
the words are extensions.
In my opinion most of the word set should be easy to implement on
"conventional" (i.e. threaded, fig-like) Forth systems. It might be hard
to implement on a system that produces native code.

3) The resulting text must be marketed
The simplest way would be to get the internals word set into the ANSI
standard through the review process. If we cannot achieve this,
marketing will cost more effort. In this case I think we should publish
the wordset in the Forth journals, develop implementations of it for a
few popular Forth systems, and write some really neat applications
that everyone wants to have, so there's an incentive for implementing
the word set on other systems. The applications would be the hardest
part, but I think they would be fun.

From Mitch.Bradley@Eng.Sun.COM Wed Oct  2 00:47:14 1991
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1)
        id AA10105; Tue, 1 Oct 91 16:47:06 PDT
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1)
        id AA17239; Tue, 1 Oct 91 16:46:57 PDT
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1)
        id AA08358; Tue, 1 Oct 91 16:46:06 PDT
Message-Id: <9110012346.AA08358@mitch.Eng.Sun.COM>
To: anton@mips.complang.tuwien.ac.at
Subject: Re: internals wordset
Date: 01 Oct 91 16:46:05 PDT (Tue)
From: Mitch.Bradley@Eng.Sun.COM
Status: R

Here is a proposed wordlist for vocabulary hacking, very similar to
what I use now in Open Boot.

Data types:
   wid     - ANS Forth wordlist id
   nid     - "name" id ("handle" for a word name)
   xt      - execution token
   adr len - string

immediate?  ( xt -- flag )       True if word is immediate

create-word  ( adr len wid -- )  Create the named word in the vocabulary "xt"

remove-word  ( nid wid -- )      Remove the name nid from the wordlist wid.

name>string  ( nid -- adr len )  Return string representation of name

>name  ( xt -- nid )             Return a name of the word xt, or 0 if that
                                 word has no names.

name>  ( nid -- xt )             Return the execution token of the name nid.

next-word  ( nid1 wid -- nid2 )  nid2 is the name preceding nid1
                                 in the wordlist "wid".  If nid1 is 0, nid2
                                 is the first word in wid.  If nid2 is 0,
                                 there are no more words in wid.
 
Note that nid's and xt's have to be separate, because many systems
have alias mechanisms and headerless words, thus the mapping between
names and execution tokens is not one-to-one.  Every name has exactly
one execution token, but a particular execution token may have zero,
one, or several names.
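
A minimal sketch of how next-word, name>string and name> could fit
together, run on a toy dictionary built in data space rather than on any
real system's headers.  Only the three stack effects come from the list
above; the record layout and the names toy-wid, add-name and toy-words
are assumptions made up for illustration.  The two entries share one xt
to show why nid and xt are kept apart.

```forth
\ Toy model only: a "wordlist" is one cell holding the newest name record.
\ Name record: link cell, xt cell, then the name as a counted string.
variable toy-wid   0 toy-wid !

: add-name ( xt c-addr u wid -- )      \ hypothetical helper
   align here over @ ,  swap !         \ link to old head; head := this record
   rot ,                               \ store the xt
   dup c,                              \ store the count
   here swap dup chars allot cmove ;   \ copy the name characters

: next-word   ( nid1 wid -- nid2 )  swap ?dup if nip @ else @ then ;
: name>string ( nid -- adr len )    2 cells + count ;
: name>       ( nid -- xt )         cell+ @ ;

: greet ( -- ) ." hello" ;
' greet s" GREET" toy-wid add-name
' greet s" HI"    toy-wid add-name     \ second name for the same xt

: toy-words ( wid -- )                 \ type every name in the toy wordlist
   0 begin over next-word ?dup while
      dup name>string type space
   repeat drop ;

toy-wid toy-words
```

Walking the list visits two nids, while name> applied to either of them
yields the same xt.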

Mitch.Bradley@Eng.Sun.COM

From NER034@prime-a.tees-poly.ac.uk Thu Oct 10 13:54:11 1991
Received: from eros.uknet.ac.uk by mcsun.EU.net with SMTP;
        id AA24099 (5.65a/CWI-2.115); Thu, 10 Oct 1991 12:50:33 +0100
Message-Id: <9110101150.AA24099@mcsun.EU.net>
Received: from kestrel.ukc.ac.uk by eros.uknet.ac.uk via UKIP with SMTP (PP) 
          id <21177-0@eros.uknet.ac.uk>; Thu, 10 Oct 1991 12:42:24 +0100
Received: from tp.ac.uk by kestrel.Ukc.AC.UK via Janet (UKC CAMEL FTP) 
          id aa16678; 10 Oct 91 10:24 BST
Date: Thu, 10 Oct 91 10:29:38 BST
From: NER034@prime-a.tees-poly.ac.uk
To: ANTON <ANTON@mips.complang.tuwien.ac.at>
Subject: Re: Internals Wordset
Status: R

> Most of your renamings are good

Gee, thanks.
> except for .ID .  .ID already exists in a lot of systems, and it usually
> takes a link field address.  It is better to use a different name for a
> different word.  Instead of C.ID, I now use .NAME .

As I said before, I reckon that .ID is as near to the old .ID as we are
going to be able to get.  However, I don't see any objection to calling
it .NAME .  Indeed it is probably the only way it will get through the TSC.


> DOES? is used in a decompiler.  Inside the defining word, some systems
> compile the same run-time tokens for DOES> and ;CODE .

Oh, I never said that I don't understand why you would want the word.  I
just don't think that I would ever use it.  I am not of the opinion that all
Forth words should be de-compilable.  Indeed in a system that provides fast
and efficient compiled code this would be impossible.  But then, would you
implement this wordset for such a system?


> FOLLOW and ANOTHER? are convenient to use, but they have a theoretical
> problem.  They maintain "hidden" information about the state of the
> search.  This causes problems with multitasking, reentrancy, and
> nestability.

Good Point.  May I suggest the following definitions:

FOLLOW  ( wid -- state )
        Initializes system dependent values (state) in preparation for
        scanning the given wordlist (wid).  The system dependent values
        (state) may be any number of cells in length.  FOLLOW is used in
        conjunction with ANOTHER?, and UNFOLLOW.

        Example Usage: See ANOTHER?

        See also: ANOTHER?; UNFOLLOW.

ANOTHER? ( state -- state' xt true )                    "Another Query"
         ( state -- false )
        Extracts the execution token of the next word in the wordlist
        being scanned (indicated by state, as initialized by FOLLOW).  A
        true flag is returned along with the next execution token.  The system
        dependent values incorporated in state are modified to reflect the
        current position in the scanning of the wordlist.  A false indicates
        the end of the wordlist.

        Example usage:
                : WORDS ( -- )
                  CONTEXT @ FOLLOW      \ Follow the current wordlist
                  BEGIN
                     ANOTHER?           \ Get next xt
                  WHILE                 \ For all of the wordlist or until
                                        \ the user presses a key
                     .NAME              \ Display the name of xt
                     SPACE
                     KEY? IF            \ Has the user pressed a key
                          UNFOLLOW      \ Yes => Drop search state
                          EXIT          \        Leave WORDS
                     THEN
                  REPEAT
                ;

        See also: FOLLOW; UNFOLLOW.

UNFOLLOW ( state -- )
        Used to remove the system dependent values (state) from the stack
        after aborting an ANOTHER? based loop before its natural
        termination.

        Example usage: See ANOTHER?

        See also: FOLLOW; ANOTHER?.
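
A minimal sketch of the three words on a toy wordlist, assuming the
simplest possible case where state is a single cell pointing at the next
record.  The record layout and the names toy-list, add-xt, run-all and
first-xt are made up for illustration; a real system may need more cells
of state.  Because the state lives on the data stack, two scans can be
nested without disturbing each other.

```forth
\ Toy model only: record = link cell, xt cell; the wid holds the newest record.
variable toy-list   0 toy-list !

: add-xt ( xt wid -- )                 \ hypothetical helper
   align here over @ ,  swap !  , ;    \ link, update head, store xt

: follow   ( wid -- state )  @ ;
: another? ( state -- state' xt true | false )
   dup if  dup @  swap cell+ @  true  then ;  \ a zero state doubles as false
: unfollow ( state -- )  drop ;

: one 1 ;   : two 2 ;
' one toy-list add-xt
' two toy-list add-xt

: run-all ( wid -- i*x )               \ execute every word, newest first
   follow begin another? while
      swap >r execute r>               \ keep the state out of the word's way
   repeat ;

: first-xt ( wid -- xt | 0 )           \ stop after the first word
   follow another? if swap unfollow else 0 then ;
```

Here toy-list run-all leaves 2 1 on the stack, and first-xt shows the
early exit that UNFOLLOW exists for.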


> Also, ANOTHER? should not return an "xt" because some systems have
> a low-level alias mechanism, and the mapping from names to "xt" may
> be many-to-one.  If ANOTHER? returns "xt", it will not be able to
> distinguish the names.

This is an interesting point.  However, if we were to introduce a new id
(namely "name-id" or "nid") we would also have to provide a way of converting
between nid and xt.  The possibility of getting this through the TSC will be
pretty remote in my view.

I am prepared to accept that aliases may exist, however for the purposes of
displaying a name, the name associated with the original definition should be
used.

: FU FOO ;                 \ A standard way of aliasing.
ALIAS FOO BAR              \ Define a "low level" alias.

' FU  .NAME ( gives ) FU   \ Display the name of FU, as this is a colon
                           \ definition FU is displayed and not FOO.

' BAR .NAME ( gives ) FOO  \ As this is a "low level" alias the name
                           \ associated with the colon definition is given.
                           \ Ie., FOO.

This will not allow us to re-construct the original definition when such
aliases have been used.  However, the definition we create will be
functionally the same, as we have simply resolved the aliasing.  I consider
this to be a pain, but I don't see any way around it.

Thus the new definition of .ID reads:

.NAME   ( xt -- )                                       "Dot Name"
        Displays the name of the word associated with the given execution
        token (xt).

        If the full name of the word has not been stored then an
        approximation is required.  I.e., if the word INTEGER is defined in
        a system that only stores the first three letters then INT---- is
        an acceptable display.

        If the word has been defined as headerless then the name must
        take on the form H-nnnn where the H indicates that the word is
        headerless, and the nnnn is a representation of the execution
        token.

        If there are several names associated with the single execution
        token (i.e., the name has aliases) then the name associated with the
        original definition is displayed.
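
A minimal sketch of the headerless case only, assuming that "nnnn" means
the low four hexadecimal digits of the execution token.  The definition
above leaves the representation open, so the digit count, the use of hex
and the name h-name are all assumptions.

```forth
\ Build "H-nnnn" for an xt in the pictured numeric output buffer.
\ The string is transient; copy it if it must outlive further output.
: h-name ( xt -- c-addr u )
   base @ >r hex                       \ save BASE, format in hex
   0 <# # # # #                        \ low four digits of the token
      [char] - hold  [char] H hold     \ prepend the "H-" marker
   #>
   r> base ! ;                         \ restore the caller's BASE

' dup h-name type                      \ digits depend on the system
```

Replacing the four # with #s would show the whole token instead of the
low four digits.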

Note that the only changes are the name of the word, changed from .ID to
.NAME, and the addition of the last paragraph to cater for aliasing.

> Mitch

I see that you make no comment on the other alterations and new words that I
have added to your list.  I therefore assume (a) you agree with them, or (b)
they are to be the subject of another mailing.

Peter Knaggs
+-----------------------------+-----------------------------------------------+
! School of Comp. & Maths.,   !    Janet: NER034 @ uk.ac.tees-poly            !
! Teesside Polytechnic,       !   Bitnet: NER034 % tp.ac.uk @ UKACRL          !
! Middlesbrough,              ! Internet: NER034 % tp.ac.uk @ cunyvm.cuny.edu !
! Cleveland, England. TS1 3BA !     Uucp: NER034 % tpoly.ac.uk @ ukc.uucp     !
!-----------------------------+-----------------------------------------------!
! It is not enough to do the right thing; one must also do it the right way.  !
+-----------------------------------------------------------------------------+

From nick@kyron.sw.stratus.com Fri Oct 11 15:36:02 1991
Received: from lectroid.sw.stratus.com (lectroid-gw.sw.stratus.com) by transfer.stratus.com (4.1/2.0-jjm)
        id AA05528; Fri, 11 Oct 91 10:34:40 EDT
Received: from kyron.sw.stratus.com.sw.stratus.com by lectroid.sw.stratus.com (4.1/2.1-jjm)
        id AA20552; Fri, 11 Oct 91 10:36:12 EDT
Received: by kyron.sw.stratus.com.sw.stratus.com (4.1/SMI-4.1)
        id AA04757; Fri, 11 Oct 91 10:36:09 EDT
Date: Fri, 11 Oct 91 10:36:09 EDT
From: nick@kyron.sw.stratus.com (Nicolas Tamburri)
Message-Id: <9110111436.AA04757@kyron.sw.stratus.com.sw.stratus.com>
To: NER034@prime-a.tees-poly.ac.uk
Subject: comments on wordset
Cc: Mitch.Bradley@Eng.Sun.COM, NER034%tp.ac.uk@cunyvm.cuny.edu,
        anton@mips.complang.tuwien.ac.at, nick@kyron.sw.stratus.com,
        pl@lsi.usp.br
Status: R

Sorry it's taken so long to respond.  I've been trying to reconcile
your wordset with the way I've been thinking about this problem, as I
began outlining in a post to c.l.f.  Since I've not received any comments
on my post, I assume that it wasn't well received as a starting point for
discussion.  Nevertheless, my mind is set to that way of thinking and,
involuntarily, I've been trying to compare the 2 schemes to try to
see which has the more potential for problems.

Some general comments:

Our basic differences stem from the fact that your scheme (and Mitch's)
relies on knowing about addresses, whereas mine tries to avoid native
machine addresses and relies on Dictionary Tokens (DT) for passing header
information.  Generally speaking this is only a matter of semantics, and in
most cases the DT would be kept as an address anyway.  Conceptually however,
freeing the vendor from having to supply me with addresses removes a limit
which may be important in the future.  This is important IMO.  I believe
(but will go along with the majority opinion on this) that we are not
abstracting data enough.  That was the intent of the project, was it not?

Below are some specific comments.  They assume we wish to keep this
addressing scheme.

>TOKEN@  ( a.addr -- xt )                                "Token Fetch"
>        xt is the system dependent execution token stored at the address
>        aligned a.addr.  The execution token is of a form that can be
>        passed to EXECUTE.  An ambiguous condition exists when a.addr
>        does not reference an execution token.

Why would I ever want to use this, when LOCATE not only returns me the
xt, but tells me if it is valid as well?  (At least I assume it does from
its description.)  Is it ever possible for me to test an xt to see if it
is valid?

>+TOKEN  ( a.addr1 -- a.addr2 )                          "Plus Token"
>        Moves the given address aligned a.addr1 past the token stored at
>        a.addr1 to the address of the next token (a.addr2).  An ambiguous
>        condition exists if a.addr1 does not point to an execution token.
>

I assume this also does +STRs as appropriate.  What happens if a.addr1
points to the last xt of a definition?  I think it should return 0.

>Notes:  The definitions of TOKEN@ and TOKEN! are more or less the same as
>        Mitch's.  However I believe that /TOKEN can not be used as some
>        systems (subroutine threaded) may use differing sizes of token,
>        dependent on circumstances.  Therefore the compromise word
>        +TOKEN is given to counter this possibility.  I.e., to read two
>        tokens you would write:  DUP TOKEN@ SWAP +TOKEN TOKEN@
>                 as opposed to:  DUP TOKEN@ SWAP /TOKEN + TOKEN@

I agree.  In general, I'd rather have the system do the pointer arithmetic
when it comes to working with internal structures.

>        In the same respect I also believe that a.addr should point to the
>        start of the token.  Hence on a subroutine threaded system a.addr
>        will point to the subroutine call instruction.

Don't pin it down.  a.addr is an address which TOKEN@ uses to return
an xt.  I should not have to care about what it points to.

>>TARGET ( a.addr1 -- a.addr2 )                          "To Target"
>        a.addr2 is the destination address of the branch instruction
>        located at a.addr1.  a.addr1 is the address of the branch
>        instruction and not its operand.  An ambiguous condition exists
>        if the instruction pointed to by a.addr1 is not a branch
>        instruction.

This is where a totally address-less scheme such as mine breaks down.
In this case, you really do need an address.  At least, I can't think of
a way to avoid them.

>BRANCH? ( a.addr -- flag )                              "Branch Query"
>        Returns True if the branch instruction at a.addr is an
>        unconditional branch, and False if it is a conditional branch.
>        An ambiguous condition exists if the instruction pointed to by
>        a.addr is not a branch instruction.

How do you find out if it is a branch instruction in the first place?
How about returning 0 if non-branch, negative if conditional branch and
positive if unconditional branch?  Branch instructions include LOOP
and friends as well I assume.

>STR@    ( c.addr1 -- c.addr2 )                          "String Fetch"
>        Fetches the string literal compiled at the given c.addr1.  A
>        copy of the counted string is made available at c.addr2.

A copy of the counted string?  So this does not work the way COUNT does,
which merely adjusts the input address to point to the beginning of the
string.  The problem with this is that it implies that the system allocates
space at c.addr2 to store the string into.  This has 2 problems:

        1. I don't see any word that deallocates the space.

        2. If the user has a unique memory management mechanism, a hidden
           and uncontrollable call to ALLOCATE by the system would not
           be appreciated.

Solution:

STR@    ( c.addr1 length c.addr2 -- actualLength )
        Move the string from c.addr1 to c.addr2. Do not exceed the provided
        length count if the string is longer than provided for.  Return the
        actual length of the string.
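
A minimal sketch of this caller-supplied-buffer version, assuming that
c.addr1 points directly at a counted string and that length is the buffer
size in characters.  How the literal is really stored is system dependent,
and src and buf are names made up for the demonstration.

```forth
\ Copy at most "length" characters; the caller owns the buffer, so the
\ system never allocates anything behind the user's back.
: str@ ( c-addr1 length c-addr2 -- actual-length )
   >r >r count            \ addr u            R: c-addr2 length
   dup r> min             \ addr u n          n = characters to copy
   rot swap r> swap       \ u addr c-addr2 n
   cmove ;                \ u                 full length of the string

create src  5 c,  char h c, char e c, char l c, char l c, char o c,
create buf  3 chars allot

src 3 buf str@ .          \ reports 5; buf now holds the first 3 characters
```

Comparing the returned length with the buffer size tells the caller
whether the string was truncated.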

BTW: If the STR operators are to be used only in conjunction with string
     literals, meaning they are compiled by " , ." etc. then I believe the
     input address should always point to the xt of the runtime components
     of the aforementioned string literal handlers.

>>DATA   ( xt -- a.addr )                                "To Data"
>        a.addr is the address of the data storage area associated with
>        the given execution token (xt).  Ie., for variables and user
>        created items, >DATA is equivalent to >BODY, for user variables
>        >DATA returns an address in the user area, etc.

It might be useful to also return the length of the data area in bytes.

>.ID     ( xt -- )                                       "Dot I D"
>        Displays the name of the word associated with the given execution
>        token (xt).

I think we need an analogous way to retrieve the name as a string, rather
than something that types the name out.  This allows things like debuggers
to work, as well as friendly tools like command line name completers.

>LITERAL@ ( a.addr1 -- a.addr2 x )                       "Literal Fetch"
>        Reads the value compiled by a literal instruction at a.addr1.  The
>        address of the next instruction is returned (a.addr2) in
>        addition to the value of the literal instruction.  An ambiguous
>        condition exists if a.addr1 does not point to a literal
>        instruction.

This seems unnecessarily complex.  It seems to me that this can easily
be implemented as >DATA +TOKEN.

Maybe we should have a LITERAL? word, which specifies that this is a
word which is followed by inline data.

[I now switch to your response to Mitch's comments.]

>FOLLOW  ( wid -- state )
>        Initializes system dependent values (state) in preparation for
>        scanning the given wordlist (wid).  The system dependent values
>        (state) may be any number of cells in length.  FOLLOW is used in
>        conjunction with ANOTHER?, and UNFOLLOW.

This has the same problems I mentioned above for STR@ . I suggest a
similar solution.  UNFOLLOW does provide the deallocation mechanism, but
the problem of FOLLOW possibly messing up a user's memory management
routines still exists.  Let the user do the allocation.

>> Also, ANOTHER? should not return an "xt" because some systems have
>> a low-level alias mechanism, and the mapping from names to "xt" may
>> be many-to-one.  If ANOTHER? returns "xt", it will not be able to
>> distinguish the names.
>
>This is an interesting point.  However, if we were to introduce a new id
>(namely "name-id" or "nid") we would also have to provide a way of converting
>between nid and xt.  The possibility of getting this through the TSC will be
>pretty remote in my view.

You have a good point, but I agree with Mitch.  For some, the use of aliases
is so pervasive that they may forget they are using them.  A decompilation
to the original code may simply confuse them, or make them think the
decompiler has a bug in it.

>This will not allow us to re-construct the original definition when such
>aliases have been used.  However, the definition we create will be
>functionally the same, as we have simply resolved the aliasing.  I consider
>this to be a pain, but I don't see any way around it.

The way around it is to use DTs instead of XTs.  For most of the words we are
talking about,  this would work just fine.

>Thus the new definition of .ID reads:
>
>.NAME   ( xt -- )                                       "Dot Name"
>        Displays the name of the word associated with the given execution
>        token (xt).
>
>        If the full name of the word has not been stored then an
>        approximation is required.  I.e., if the word INTEGER is defined in
>        a system that only stores the first three letters then INT---- is
>        an acceptable display.
>
>        If the word has been defined as headerless then the name must
>        take on the form H-nnnn where the H indicates that the word is
>        headerless, and the nnnn is a representation of the execution
>        token.

Last call for a NAME@ defined as:

NAME@   ( xt length c.addr -- )

        works just like my definition for STR@, but is specific to working
        with XTs.  (Works even better with DTs.)

>        If there are several names associated with the single execution
>        token (i.e., the name has aliases) then the name associated with the
>        original definition is displayed.

Too restrictive.  If some enterprising vendor can figure out how to
return me an alias, then don't prevent it.

Summary:

Not having attempted to write anything with this word-set, I believe I can
live with it.  At the very least, I believe it is a good start.

I believe that it would be better if we implemented most words which
work with names to use DTs, and use XTs when traversing through the
parameter list of a word definition.   Would anyone be interested if
I try to come up with such a word-set,  or is this direction OK with
everyone?

Food for thought:

Are we concentrating at too low a level here?  It seems to me that the
above word-set has been defined to handle the most common features of
current forths.  Yet, we are defining a new word set, possibly for a
standard.  Can't we expect changes to current forths to support this
word-set?  Specifically, what I would love to have, to really do this
right, is a hook into the dictionary header which will allow me to
vector a data return routine. This way, if I have a complex data
structure, I can call a routine which will return the value of each
field in sequence, so I can type them out.  I know this is of limited
value in a production environment, but everything we are talking about
here is geared toward the development environment.  The finished
program would not have to include this extra code of course.

The words presented here don't break existing code, and don't seem to
be too hard to implement on most of the systems I've seen.  IMO a common
word set like this would benefit all vendors because it would free them
from having to develop their own environment, or sell a proprietary
environment for other vendors' platforms.  Would it be too much to expect
rudimentary support for this word-set from any vendor who wants these
benefits?  (Mitch?)

Comments are welcome of course...

                                                        /nt

From pl@lsi2 Fri Oct 11 19:19:21 1991
Received: from lsi2 (lsi2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC/4.1/LSI-1.0)
        id AA01544
Posted-Date: Fri, 11 Oct 91 15:13:11-030
Received-Date: Fri, 11 Oct 91 15:13:05 EST
Received: by lsi2 (4.0/SMI-4.0)
        id AA02961; Fri, 11 Oct 91 15:13:11-030
Date: Fri, 11 Oct 91 15:13:11-030
From: pl@lsi2 (Pedro Sanchez)
Message-Id: <9110111813.AA02961@lsi2>
To: anton@mips.complang.tuwien.ac.at
Subject: A discussion list for Internals Wordset?
Status: R

Hi Anton,

I think that a discussion list on the Internals wordset subject
would be a good idea. At least everybody would receive all the messages
and could follow the thread ( now I can't, because I do not receive all
messages).

Since you started the discussion on the subject, I am asking you what
you think about this.

I volunteer to provide the resources and to take care of the list.

Regards,
        Pedro.

==========================================================================
Pedro Luis Prospero Sanchez       internet: pl@lsi.usp.br (PREFERRED)
University of Sao Paulo           uunet:    uunet!vme131!pl        
Dept. of Electronics Engineering  hepnet:   psanchez@uspif1.hepnet
phone: (055)(11)211-4574  home: (055)(11)914-9756 fax: (055)(11)815-4272
==========================================================================


From anton Tue Oct 15 20:45:04 1991
Date: Tue, 15 Oct 91 20:41:31 +0100
From: anton (Anton Martin Ertl)
To: Mitch.Bradley@Eng.Sun.COM, NER034%tp.ac.uk@cunyvm.cuny.edu,
        anton@mips.complang.tuwien.ac.at, nick@kyron.sw.stratus.com,
        pl@lsi.usp.br
Subject: Re: Internals wordset
Status: R

Sorry for the late reply - I have not had much time lately.

In <9110012346.AA08358@mitch.Eng.Sun.COM> Mitch writes

>immediate?  ( xt -- flag )       True if word is immediate
shouldn't it be ( nid -- flag ) ?

>create-word  ( adr len wid -- )  Create the named word in the vocabulary "xt"
how do xt's and immediate come in?
note that this word suggests, but does not force, that a word (nid) is in
only one word list (no objection, just to make you aware)

>name>string  ( nid -- adr len )  Return string representation of name
I think that we have to restrict this word to allow implementation on
all systems. I.e. restrict the life of the string returned.

>next-word  ( nid1 wid -- nid2 )         nid2 is the name preceding nid1
>                                in the wordlist "wid".  If nid1 is 0, nid2
>                                is the first word in wid.  If nid2 is 0,
>                                there are no more words in wid.

Some Forth implementations might be very unhappy with this stack
effect: If the same nid can appear twice in a wordlist, ( nid wid )
does not identify the current position in the word list. This stack
effect can also have a bad effect on efficiency. A remedy would be to
use a word-list position, resulting in a stack effect like ( wlpos1 --
wlpos2 nid ). This word-list position is Peter Knaggs' state in the
FOLLOW/ANOTHER?/UNFOLLOW words. Another word to get the first
word-list position from the word list would be required (unless the
wid is the first word-list position), e.g. Peter's FOLLOW. I think the
size of the word-list position should be defined (unlike state) to
make it usable. A one-cell wlpos might be impossible to implement
without changing the internal structure of some implementations. Is
two-cell acceptable?
(note that I have not read Peter's message thoroughly and I don't know
what he replies to)
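
A minimal sketch of a two-cell word-list position.  It assumes a toy
wordlist split into several hash threads, which is one plausible reason
(not stated above) why one cell may not be enough: the position has to
remember both the thread and the place within it.  The layout, the names
#threads, toy-wl, add-rec, first-pos, next-name and count-names, and the
use of a zero nid to signal the end are all assumptions.

```forth
\ Toy model only: #threads linked lists; record = link cell, payload cell.
\ A record's address plays the role of the nid.
4 constant #threads
create toy-wl  #threads cells allot   toy-wl #threads cells erase

: add-rec ( x thread# -- )             \ hypothetical helper
   cells toy-wl +                      \ address of that thread's head
   align here over @ ,  swap !  , ;    \ link, update head, store payload

: first-pos ( -- thread# rec )  0 toy-wl @ ;

: next-name ( thread# rec -- thread#' rec' nid | 0 )
   begin dup 0= while                  \ current thread exhausted
      drop 1+ dup #threads = if drop 0 exit then
      dup cells toy-wl + @             \ move on to the next thread
   repeat
   dup @ swap ;                        \ advance, return this record as nid

10 0 add-rec   20 0 add-rec   30 2 add-rec

: count-names ( -- n )
   0 >r  first-pos
   begin next-name while  r> 1+ >r  repeat
   r> ;
```

With the three records above count-names returns 3, visiting thread 0
first and skipping the empty threads.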

From mips!anton@relay.EU.net Wed Oct 30 20:20:11 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00454; Wed, 30 Oct 91 17:05:12 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA04029; Wed, 30 Oct 91 17:04:57 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00451; Wed, 30 Oct 91 17:05:06 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA01882; Wed, 30 Oct 91 16:02:17-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP;
        id AA07893 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:57:41 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP;
        id AA12194 (5.65b+/CAN-1.15); Wed, 30 Oct 91 20:00:34 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C)
        id AA00291; Wed, 30 Oct 91 19:20:12 +0100
Date: Wed, 30 Oct 91 19:20:12 +0100
From: mips!anton@relay.EU.net (Anton Martin Ertl)
Message-Id: <9110301820.AA00291@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: decompilation and tracing
Status: R

Nick writes on c.l.f (message <8570@lectroid.sw.stratus.com>):
>Mach2 for the Mac is a JSR threaded Forth which allows inline machine
>code.  Its debugger seems to do a pretty good job of decompiling words,
>JSR calls and machine code.  If the machine code has been generated by
>inline expansion of a Forth word, it manages to identify the word that
>did it in all of the examples I've seen.  (There may be instances where
>it cannot.)  User code is simply displayed as assembler code, and/or
>hex values.
>
>Is there any language implementation/processor architecture which would
>specifically prevent this type of functionality?

There are optimizations which are impossible to undo (information loss).
Of course additional information can be stored and used for decompiling.
For debugging (i.e. stepping and tracing) the problem is worse: You not
only want to see the word decompiled, but also want it to execute
piecewise and see the stack effect.

A good optimizer can do the following transformations, making debugging
difficult:
   rearrange actions (e.g. instruction scheduling, or loop fusion)
   combine several actions (words) into one instruction (instruction
      selection, peephole optimization)
   distribute an action into several instructions, which can then be
      rearranged with other instructions (inlining)
   optimize code away (constant folding, induction variable elimination)

BTW, undoing inlining (in Mach2) is quite a feat. I would not have
tried it.

Fortunately there are no (to my knowledge) Forth systems with high
levels of optimization. They all do a bit of inlining and some peephole
optimizations (to reduce pushing and popping).

However, since we do not want to prevent systems with such optimizers,
we have to take them into consideration. The basic question is: How
much correspondence with the source do we want?  This is even an issue
on a completely conventional (e.g. fig-)Forth.  They all have
immediate words which compile to something different (or nothing at
all, e.g. '('). Is it satisfactory to decompile into Forth code that
is equivalent (on the specific system) to the original code? If yes, a
decompiler can be done on any system. If the user wants better
source-decompilation correspondence, he can turn the optimization
level down.

The usual approach in other languages is to keep additional debugging
information (making the compiled program twice as big) and to display
the source code (stored in the source files, not in the executable).
Most compilers disallow debugging optimized code; those that allow it
offer reduced functionality (e.g. they are unable to display the values
of variables that are optimized away). Also, some optimizations are
turned off when compiling for debugging anyway, and you have to turn on
compilation for debugging explicitly.

The situation in Forth is harder because of the finer granularity
(word instead of statement).

- anton

From mips!anton@relay.EU.net Wed Oct 30 20:20:32 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00463; Wed, 30 Oct 91 17:05:59 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA04038; Wed, 30 Oct 91 17:05:43 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00458; Wed, 30 Oct 91 17:05:53 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA01881; Wed, 30 Oct 91 16:02:16-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP;
        id AA07908 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:57:51 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP;
        id AA12199 (5.65b+/CAN-1.15); Wed, 30 Oct 91 20:00:42 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C)
        id AA00297; Wed, 30 Oct 91 19:22:51 +0100
Date: Wed, 30 Oct 91 19:22:51 +0100
From: mips!anton@relay.EU.net (Anton Martin Ertl)
Message-Id: <9110301822.AA00297@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: framework for "vocabulary hacking"
Status: R


%I hope you can process LaTeX; if you want a PostScript version, mail me.
\documentstyle[12pt]{article}

\newcommand{\function}[4]{
\item[#1]\hfill{#2}

{#3}

{#4}}
\newcommand{\forth}[1]{{\tt #1}}
\newcommand{\stack}[1]{{\it #1}}
\newcommand{\stackcomment}[2]{(\stack{#1} --- \stack{#2})}
\newcommand{\word}[3]{\forth{#1} \stackcomment{#2}{#3}}
\newenvironment{functions}{\begin{description}\setlength{\parskip}{0mm}}{\end{description}}

\title{Internals Wordset Framework (draft)}
\author{M. Anton Ertl}
\date{October 25, 1991}
\begin{document}
\maketitle

This document lists functions that might be included in the internals
wordset. This listing is not yet complete. Since the discussion
currently revolves around wordlists and words in general, I have
attacked these themes first. For completeness and as food for thought
I have also listed functions that I would not provide for in the word
set. If you can think of any functions that I have missed, please let
me know.

I have entered the words you or the ANSI TC have proposed. If I forgot
anything, please inform me. Note that there need not be a 1:1 relation
between functions and words.

The functions are classified with respect to implementability:
\begin{functions}
\item[1]
The function is already performed by some word in the standard (i.e. the
function is a factor of the word). The word is indicated in
parentheses.
\item[2]
Functions that can be implemented on current Forth systems.
\item[3]
Functions that can be implemented but require changing data structures
of existing Forth systems.
\end{functions}
Of course my classifications may be wrong. Note that a word performing
a function can be in a higher class than the function.

The entries have the format:

\begin{functions}
\function
{function}
{class}
{comments}
{proposed word(s) [origin]}
\end{functions}

\section{Wordlists}

\begin{functions}
\function
{create a wordlist}
{1 (\forth{WORDLIST})}
{}
{\word{WORDLIST}{}{wid} \cite[15.1.2460]{basis15}}

\function
{insert a word into a wordlist}
{1 (\forth{CREATE})}
{}
{\word{create-word}{addr len wid}{} \cite{mb91b} (see also: create name)}

\function
{delete a word from a wordlist}
{2}
{}
{\word{remove-word}{nid wid}{} \cite{mb91b}}

\function
{change word in a wordlist}
{3}
{}
{}

\function
{enumerate the words in a wordlist/traverse the wordlist}
{1 (\forth{WORDS})}
{}
{\word{FOLLOW}{wid}{}, \word{ANOTHER?}{}{nid true {\rm or} false}
\cite{mb85};

\word{FOLLOW}{wid}{wordlist-pos},
\word{ANOTHER?}{wordlist-pos1}{wordlist-pos2 nid true {\rm or} false},
\word{UNFOLLOW}{wordlist-pos}{} \cite{pk91a}

I have renamed the stack items for clarity (whether we unify
\stack{xt} and \stack{nid} is a separate issue).

\word{next-word}{nid1 wid}{nid2} \cite{mb91a}}

\end{functions}

\section{Name (nid)}
standard-visible data: name string, immediate flag, associated
execution token (xt)

\begin{functions}
\function
{create name}
{1 (\forth{CREATE} etc.)}
{}
{\word{create-word}{addr len wid}{} \cite{mb91b} (see also: insert a
word into a wordlist)}

\function
{get name string}
{1 (\forth{WORDS})}
{the result should be an approximation sufficient for searching}
{\word{name>string}{nid}{adr len} \cite{mb91b}}

\function
{get immediate flag}
{1 (outer interpreter/compiler)}
{}
{\word{immediate?}{nid}{flag} \cite{mb91b} (I have changed the stack effect)}

\function
{get xt}
{1 (\forth{'})}
{}
{\word{name>}{nid}{xt} \cite{mb91b}}

\function
{get wordlist}
{3}
{}
{}

\function
{get position in word list}
{2}
{}{}

\function
{change name string}
{3}
{}
{}

\function
{change immediate flag}
{1 (\forth{IMMEDIATE})}
{}{}

\function
{change xt}
{3}
{}{}
\end{functions}

\section{Execution token (xt)}

\begin{functions}
\function
{create execution token}
{1 (\forth{:NONAME})}
{}{}

\function
{get definition token}
{1 (\forth{EXECUTE})}
{The definition token says what kind of word the xt is (i.e. colon
def, var, ...)}
{\word{DEFINER}{xt1}{xt2} \cite{mb91a}}

\function
{check if the word has a name (or if it is headerless)}
{3}
{}
{\word{>name}{xt}{nid} \cite{mb91b}}

\function
{get the nid if there is one}
{2}
{}
{\word{>name}{xt}{nid} \cite{mb91b}}

\function
{get the memory block associated with the xt}
{address 1-2, size 3}
{former parameter field}
{\word{>DATA}{xt}{addr} \cite{mb85} for the address (no size)}

\end{functions}

%for the next time
%there are the following definition words in Basis 15
%2constant, 2variable, :, code, constant, create, fconstant,
%fvariable, marker, value, variable
%(local) is not listed as defining word

\begin{thebibliography}{9}
%may be slightly wrong, I don't have the papers right now
\bibitem{basis15} ANSI~X3J14 Technical Committee.
\newblock{\em Basis~15}, 1991.

\bibitem{mb85} Mitch Bradley.
\newblock Self-Understanding Programs.
\newblock {\em FORML Proceedings}, 1985.

\bibitem{mb91a} Mitch Bradley.
\newblock How to make a portable decompiler.
\newblock Email message 9109050216.AA13686@mitch.Eng.Sun.COM,
September 4, 1991.

\bibitem{mb91b} Mitch Bradley.
%\newblock {\em Re: internals wordset}.
\newblock Internals posting, email message 9110012346.AA08358@mitch.Eng.Sun.COM, 1991.

\bibitem{pk91a} Peter Knaggs.
%\newblock {\em Re: Internals Wordset}.
\newblock internals posting, October 10, 1991.

\end{thebibliography}
\end{document}

From mips!anton@relay.EU.net Wed Oct 30 20:33:56 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00524; Wed, 30 Oct 91 17:07:29 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA04042; Wed, 30 Oct 91 17:07:12 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00499; Wed, 30 Oct 91 17:07:21 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA01885; Wed, 30 Oct 91 16:05:51-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP;
        id AA04978 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:18:28 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP;
        id AA11486 (5.65b+/CAN-1.15); Wed, 30 Oct 91 19:21:20 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C)
        id AA00285; Wed, 30 Oct 91 19:18:15 +0100
Date: Wed, 30 Oct 91 19:18:15 +0100
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Message-Id: <9110301818.AA00285@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: internals
Status: R

Peter Knaggs writes (on Oct 10th)

>As I said before, I reckon that .ID is as near to the old .ID as we are
>going to be able to get.  However, I don't see any objection to calling
>it .NAME .  Indeed it is probably the only way it will get through the TSC.

.NAME should be easy to define with the rest of the internals wordset,
e.g.
: .NAME ( nid -- )
 NAME>STRING TYPE ;

>Oh, I never said that I don't understand why you would want the word.  I
>just don't think that I would ever use it.  I am not of the opinion that all
>Forth words should be de-compilable.  Indeed in a system that provides fast
>and efficient compiled code this would be impossible.  But then, would you
>implement this wordset for such a system ?
>
I would, although what "this" is, is not yet fully determined.

>This is an interesting point.  However, if we were to introduce a new id
>(namely "name-id" or "nid") we would also have to provide a way of converting
>between nid and xt.  The possibility of getting this through the TSC will be
>pretty remote in my view.

Let us make it right! They will change it anyway. According to Mitch,
the chances of an internals wordset passing are pretty small.

>        If the full name of the word has not been stored then an
>        approximation is required.  I.e., if the word INTEGER is defined in
>        a system that only stores the first three letters then INT---- is
>        an acceptable display.

The name produced by NAME>STRING and/or .NAME should be usable as
input for wordlist searching words.

>        If the word has been defined as headerless then the name must
>        take on the form H-nnnn where the H indicates that the word is
>        headerless, and the nnnn is a representation of the execution
>        token.

Make that a suggestion, but not a requirement.

- anton

From Mitch.Bradley@Eng.Sun.COM Wed Oct 30 20:37:10 1991
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1)
        id AA28989; Wed, 30 Oct 91 11:36:52 PST
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1)
        id AA17297; Wed, 30 Oct 91 11:36:06 PST
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1)
        id AA02889; Wed, 30 Oct 91 11:35:10 PST
Message-Id: <9110301935.AA02889@mitch.Eng.Sun.COM>
To: anton@mips.complang.tuwien.ac.at
Cc: internals@lsi17.lsi.usp.br
Subject: Re: internals
Date: 30 Oct 91 11:35:09 PST (Wed)
From: Mitch.Bradley@Eng.Sun.COM
Status: R

 
> For TOKEN@ etc.? Then the DT (= nid?) would have to be in the compiled
> code (or the additional info). Would this not be a bit too
> expensive (in terms of constraints on the implementation) just for
> decompiling aliases well?

I am not worried about the problem of properly decompiling the aliases.
I am worried about the problem of enumerating the word names in a
vocabulary.  If you use the XT to refer to a word, you can't enumerate
a vocabulary that contains 2 aliases for the same XT, because you
can't find the successor of the second such alias.

Mitch.Bradley@Eng.Sun.COM

From mips!anton@relay.EU.net Wed Oct 30 20:44:21 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00585; Wed, 30 Oct 91 17:17:51 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA04084; Wed, 30 Oct 91 17:17:35 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00582; Wed, 30 Oct 91 17:17:45 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA01902; Wed, 30 Oct 91 16:16:34-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP;
        id AA07879 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:57:33 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP;
        id AA12187 (5.65b+/CAN-1.15); Wed, 30 Oct 91 20:00:25 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C)
        id AA00294; Wed, 30 Oct 91 19:20:49 +0100
Date: Wed, 30 Oct 91 19:20:49 +0100
From: mips!anton@relay.EU.net (Anton Martin Ertl)
Message-Id: <9110301820.AA00294@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: xt vs. nid
Status: R

Should wordlist-position, name-id and execution token be kept
separate, or should some or all of these concepts be unified?

In many existing systems these concepts are related to each other in a
1:1 fashion (except for headerless words), so they could be unified
(i.e. use the xt for all purposes). If we unify them, or at least some
of them, as Peter Knaggs has suggested, we lose functionality on
systems that support many:1 relationships. E.g., implementing WORDS
with nid replaced by xt gives unexpected results for aliases.
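
The alias problem can be tried out on a toy model in standard Forth. This
is only a sketch: the record layout and all toy- names are hypothetical
and do not describe the dictionary of any real system. The address of a
name record plays the role of the nid.

```forth
\ A name record is:  link cell | xt cell | count char | name chars
VARIABLE toy-list   0 toy-list !

: toy-name, ( xt c-addr u -- )         \ add a name for xt to the list
   ALIGN HERE >R
   toy-list @ ,                        \ link to the previous record
   R> toy-list !                       \ this record is the new head
   ROT ,                               \ xt field
   DUP C,                              \ count
   0 ?DO COUNT C, LOOP DROP ALIGN ;    \ copy the name characters

: toy>next   ( nid -- nid'|0 )    @ ;
: toy>xt     ( nid -- xt )        CELL+ @ ;
: toy>string ( nid -- c-addr u )  2 CELLS + COUNT ;

: square ( n -- n*n )  DUP * ;
' DUP    S" DUP"    toy-name,
' square S" SQUARE" toy-name,
' square S" SQ"     toy-name,          \ alias: second name, same xt

\ Enumeration by nid: every record is visited exactly once.
: toy-words ( -- )
   toy-list @
   BEGIN DUP WHILE DUP toy>string TYPE SPACE toy>next REPEAT DROP ;

\ Enumeration by xt needs xt -> nid, which can only pick one alias.
: toy-xt>nid ( xt -- nid|0 )
   toy-list @
   BEGIN DUP WHILE
      2DUP toy>xt = IF NIP EXIT THEN
      toy>next
   REPEAT NIP ;

: toy-xt-successor ( xt -- xt'|0 )     \ 0 means end of list
   toy-xt>nid DUP IF toy>next DUP IF toy>xt THEN THEN ;

toy-words                              \ displays: SQ SQUARE DUP
```

In this model ' square toy-xt-successor returns the xt of square again,
because toy-xt>nid always finds SQ first, so a walk by xt never gets
past the aliases to DUP.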

On the other hand, keeping them separate makes words for conversion
necessary.

Also, we have to take care not to constrain the implementations in
either way.

Currently I prefer keeping them separate. It gives better
functionality and seems cleaner.

BTW, what about multiple code fields? Are they in use? Should we
consider them?

- anton


From Mitch.Bradley@Eng.Sun.COM Wed Oct 30 21:25:39 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00684; Wed, 30 Oct 91 17:46:39 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA04093; Wed, 30 Oct 91 17:46:23 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00681; Wed, 30 Oct 91 17:46:34 EDT
Received: from Sun.COM by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA01980; Wed, 30 Oct 91 16:43:40-030
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1)
        id AA28335; Wed, 30 Oct 91 11:33:15 PST
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1)
        id AA17072; Wed, 30 Oct 91 11:32:28 PST
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1)
        id AA02880; Wed, 30 Oct 91 11:31:31 PST
Message-Id: <9110301931.AA02880@mitch.Eng.Sun.COM>
To: mips!anton@relay.EU.net
Cc: internals@lsi17.lsi.usp.br
Subject: Re: xt vs. nid
Date: 30 Oct 91 11:31:30 PST (Wed)
From: Mitch.Bradley@Eng.Sun.COM
Status: R

 
> Should wordlist-position, name-id and execution token be kept
> separate, or should some or all of these concepts be unified?

I for one don't want to unify them because that doesn't work
on my system, which has first-class aliases and headerless words.

Mitch.Bradley@Eng.Sun.COM

From mips!anton@relay.EU.net Thu Oct 31 13:49:59 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00578; Wed, 30 Oct 91 17:16:18 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA04079; Wed, 30 Oct 91 17:16:03 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00575; Wed, 30 Oct 91 17:16:13 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA01897; Wed, 30 Oct 91 16:12:48-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP;
        id AA06194 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:34:28 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP;
        id AA11499 (5.65b+/CAN-1.15); Wed, 30 Oct 91 19:21:54 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C)
        id AA00288; Wed, 30 Oct 91 19:18:49 +0100
Date: Wed, 30 Oct 91 19:18:49 +0100
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Message-Id: <9110301818.AA00288@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: internals
Status: R

In <9110111436.AA04757@kyron.sw.stratus.com.sw.stratus.com> Nick
writes (in reply to Peter Knaggs):

>Since I've not received any comments
>on my post, I assume that it wasn't well received as a starting point for
>discussion.
Usually your posts are so neat that I have nothing to add.

>I believe,
>(but will go along with the majority opinion on this,) that we are not
>abstracting data enough.  That was the intent of the project, was it not?
My intent was to make certain programming techniques standard Forth.
Abstraction is the means. I, too, see a need to be more abstract than
Mitch's FORML paper to ensure implementability on native code systems.

Below are some specific comments. They assume we wish to keep this
addressing scheme.

>>        In the same respect I also believe that a.addr should point to the
>>        start of the token.  Hence on a subroutine threaded system a.addr
>>        will point to the subroutine call instruction.
>
>Don't pin it down.  a.addr is an address which TOKEN@ uses to return
>an xt.  I should not have to care about what it points to.

Or if it is a pointer at all: In native code one instruction can
represent several words.

>STR@    ( c.addr1 length c.addr2 -- actualLength )
>        Move the string from c.addr1 to c.addr2. Do not exceed the provided
>        length count if the string is longer than provided for.  Return the
>        actual length of the string.

Why move it at all? Leave it in place, the user can copy it if (s)he
really wants to. Or are there address space problems (8088 et al.)?
Then copy it into a buffer (only one string at a time). Not good, but
better than having to allocate space for something I get, use and
immediately throw away. If you really must have preallocation, divide
the word in two - one for getting the length (for allocating), the other
for copying the string.
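
The two-word split could look roughly like this (a sketch only;
name-length and name-copy are hypothetical names, and a counted string
stands in for whatever a system really stores for a name):

```forth
\ Stand-in for a stored name: a counted string in data space.
\ The address of the count byte plays the role of the name identifier.
CREATE toy-nid  7 C,
   CHAR I C, CHAR N C, CHAR T C, CHAR E C, CHAR G C, CHAR E C, CHAR R C,

: name-length ( nid -- u )        C@ ;                      \ size query only
: name-copy   ( nid c-addr -- )   >R COUNT R> SWAP CMOVE ;  \ copy the chars

\ Usage: ask for the length, provide that much storage, then copy.
CREATE name-buf  32 CHARS ALLOT
toy-nid name-buf name-copy
name-buf toy-nid name-length TYPE    \ displays INTEGER
```

If the string may simply stay in place, toy-nid COUNT yields the same
address-and-length pair in this model with no copy at all.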

BTW, uses of ALLOCATE should be fully transparent, so what's your
problem with it (apart from deallocation)?

>>This is an interesting point.  However, if we were to introduce a new id
>>(namely "name-id" or "nid") we would also have to provide a way of converting
>>between nid and xt.  The possibility of getting this through the TSC will be
>>pretty remote in my view.
>
>You have a good point, but I agree with Mitch.  For some, the use of aliases
>is so pervasive that they may forget they are using them.  A decompilation
>to the original code may simply serve to confuse them, or make them think
>the decompiler has a bug in it.
>
>>This will not allow us to re-construct the original definition when such
>>aliases have been used.  However, the definition we create will be
>>functionally the same, as we have simply resolved the aliasing.  I consider
>>this to be a pain, but I don't see any way around it.
>
>The way around it is to use DTs instead of XTs.  For most of the words we are
>talking about,  this would work just fine.

For TOKEN@ etc.? Then the DT (= nid?) would have to be in the compiled
code (or the additional info). Would this not be a bit too
expensive (in terms of constraints on the implementation) just for
decompiling aliases well?

>I believe that it would be better if we implemented most words which
>work with names to use DTs, and use XTs when traversing through the
>parameter list of a word definition.   Would anyone be interested if
>I try to come up with such a word-set,  or is this direction OK with
>everyone?

I might be interested if I knew what you meant. If DT = nid, what do
you want to do that Mitch has not done (in
<9110012346.AA08358@mitch.Eng.Sun.COM>)?

>Food for thought:
>
>Are we concentrating at too low a level here?  It seems to me that the
>above word-set has been defined to handle the most common features of
>current forths.  Yet, we are defining a new word set, possibly for a
>standard.  Can't we expect changes to current forths to support this
>word-set?

It depends on how we market this wordset. If it's in the standard then
we can expect these changes. However, if the wordset requires such
changes, it will hardly get into the standard.

I see several levels of implementability of internals words:
1) factors of standard words (implementable on every standard system)
2) words implementable on current systems
3) words that require changes in the data structures of current
systems

Examples:
1) NAME>STRING
2) simple decompilation
3) detecting whether a word is headerless

I think that words on levels 2 and 3 should be in the internals
extensions wordset.

>Specifically, what I would love to have, to really do this
>right, is a hook into the dictionary header which will allow me to
>vector a data return routine. This way, if I have a complex data
>structure, I can call a routine which will return the value of each
>field in sequence, so I can type them out.

You mean, especially for taking created words apart, with the routine
supplied by the user? Nice idea.
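
One way to picture such a hook (purely a sketch; object:, fields and
point-fields are hypothetical names, and a real system would presumably
keep the hook in the dictionary header rather than in the body as done
here):

```forth
\ Each object created by object: keeps, in its first body cell, the xt
\ of a user-supplied routine that returns the object's fields.
: object: ( xt "name" -- )   CREATE , ;

\ Run the object's own routine on the data that follows the hook cell.
: fields ( addr -- i*x )     DUP CELL+ SWAP @ EXECUTE ;

\ User-supplied routine for a two-cell point: returns x then y.
: point-fields ( addr -- x y )   DUP @ SWAP CELL+ @ ;

' point-fields object: corner   3 , 4 ,

corner fields . .    \ displays 4 3 (y is on top of the stack)
```

A decompiler or dump utility would then call fields without knowing
anything about the layout of the data behind it.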

>I know this is of limited value in a production environment

I would not bet on it.

- anton

From mips!anton@relay.EU.net Fri Nov  1 13:04:57 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00183; Fri, 1 Nov 91 09:39:09 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA00133; Fri, 1 Nov 91 09:38:51 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01)
        id AA00162; Fri, 1 Nov 91 09:39:02 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1)
        id AA03152; Thu, 31 Oct 91 19:03:58-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP;
        id AA03023 (5.65a/CWI-2.120); Thu, 31 Oct 1991 18:39:05 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP;
        id AA14694 (5.65b+/CAN-1.15); Thu, 31 Oct 91 18:41:45 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C)
        id AA04422; Thu, 31 Oct 91 18:38:46 +0100
Date: Thu, 31 Oct 91 18:38:46 +0100
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Message-Id: <9110311738.AA04422@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: decompiling and tracing
Status: RO

On decompiling and tracing (optimized native) code:

Currently I see three alternatives:

1) The "functional equivalence approach"
The program is decompiled into code that is functionally equivalent to
the original code. Every instruction or chunk of instructions
decompiles into one or more Forth words. Since the decompiler already
produces a register-stack mapping for its own purposes (regenerating
stack words), displaying the stack between two steps or at a
breakpoint should be easy. The regenerated program could look very
different from the original one.

2) The "source approach"
The system keeps additional information that enables mapping of
executable instructions to the Forth source. When tracing, the decompiler
just shows in the source which word(s) is/are executed in the next
step. Words that are optimized away (e.g. stack manipulation) are
never shown. (This contrasts with approach 1, where such words are
(re)generated.) Displaying stack values is more difficult in this
approach, since you want to display the stack of the source program
(in 1, the stack of the regenerated program is displayed):
a) Since decompiling one instruction may yield multiple words at a
time, which need not be adjacent in the source program, for which
source position do you display the stack?
b) Not all stack values of a source position can be displayed (optimized
away, not available due to reordering ...). Are those that can be
displayed sufficient for the user?
c) However, values for other source positions can be displayed. Can
this info be organized in a way useful for the user?
This is the approach taken by other languages (like C).

3) The "virtual execution approach"
Similar to the "source approach", but when tracing the system does not
execute the optimized code, but executes (or simulates the execution
of) code that corresponds to the Forth source. Therefore the stack can
always be displayed. But there are problems when switching from real
to virtual execution (e.g. if an exception occurs, or at a breakpoint)
and back. I have not found a solution to this problem yet. Also, due
to portability bugs in the user's code or bugs in the optimizer the
results of real and virtual execution might differ.

The "functional equivalence approach" and the "virtual execution
approach" seem to provide more power: They can not only be used for
debugging, but also for purposes like abstract interpretation.

With the current compilers the questions outlined above (Are the
regenerated programs understandable? Are the stack displays useful?)
do not arise. We will get the answers when (if?) highly
optimizing compilers appear.

So what alternative will our wordset support? (We could support more
than one)

I think that we should support the "functional equivalence approach"
for the following reasons:
1) It provides more power than the "source approach"
2) It does not have the real/virtual switching problems

The "source approach" becomes interesting as soon as a source or
editor interface is designed.

- anton

From cmr02@scm.tees-poly.ac.uk Thu Nov  7 02:18:01 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA01209; Wed, 6 Nov 91 23:08:31 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA00909; Wed, 6 Nov 91 23:08:09 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA01206; Wed, 6 Nov 91 23:05:35 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET 
          with NIFTP id <2825-0@sun2.nsfnet-relay.ac.uk>;
          Wed, 6 Nov 1991 08:31:08 +0000
From: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Date: Tue, 5 Nov 91 17:45:20 GMT
Message-Id: <24897.9111051745@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals Wordset
Status: RO

Ok,

Well, I have been out of the fray, so to speak, since the change of address
and all that has caused large amounts of confusion at this end.  Most of
which has now been cleaned up.

However, I can see from the mailings that the rest of the group wants to
abstract a bit.  I shall relent and will have to agree with you all on that
one.  There are some other general comments regarding Mitch Bradley's
wordset (with my alterations).  I have just printed off all the Internals
mailings that I have, and hope to be back with you, with some more detailed
answers on this and other questions later on.  Probably next week.

Peter Knaggs            School of Computing and Maths, Teesside Polytechnic,
pjk @ scm.tp.ac.uk      Middlesbrough, England.    +44 (642) 342673

From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:05:37 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA03273; Tue, 26 Nov 91 01:54:06 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA01337; Tue, 26 Nov 91 01:53:27 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AB03270; Tue, 26 Nov 91 01:51:24 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET 
          with NIFTP id <5639-0@sun2.nsfnet-relay.ac.uk>;
          Mon, 25 Nov 1991 10:42:27 +0000
From: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Date: Mon, 25 Nov 91 10:24:18 GMT
Message-Id: <15620.9111251024@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: ...
Status: R

Finally I have some new ideas with regard to TOKEN@, LITERAL@ and the ilk.
However I think we had better get this lot sorted out before I give you those.

Peter Knaggs            School of Computing and Maths, Teesside Polytechnic,
pjk @ scm.tp.ac.uk      Middlesbrough, England.         +44 (642) 342673


From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:06:27 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA03276; Tue, 26 Nov 91 01:55:34 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA01342; Tue, 26 Nov 91 01:54:55 EDT
Received: from  by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AB03270; Tue, 26 Nov 91 01:54:04 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET 
          with NIFTP id <5639-1@sun2.nsfnet-relay.ac.uk>;
          Mon, 25 Nov 1991 10:43:08 +0000
From: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Date: Mon, 25 Nov 91 10:22:44 GMT
Message-Id: <15606.9111251022@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: NAME@, .NAME
Status: R

This brings me onto the problem of .NAME etc.

Can we at least agree on a name for the word that returns a string?  To keep
with the rest of the wordset may I suggest:


NAME@   ( nid -- c.addr n )                                     "Name Fetch"
        c.addr is the character aligned address of n characters that
        represent the name associated with the name identifier nid.

        If the original name of the word can not be reproduced then a system
        dependent representation is returned.  An ambiguous condition exists
        if nid is not a valid name identifier.


Anton's request for this word to return something that will compile to the
same nid is impossible, as you can not protect against possible name clashes
(Well it is impossible in my system at least).

Normally I don't agree with returning an address and count, although I think
that it is right for this case.

The name can be left in situ where possible, or copied into a buffer (say
PAD) as required.


.NAME   ( xt -- )                                               "Dot Name"
        Display the name (or a system dependent representation of the name)
        corresponding to the execution token xt.


.NAME can be defined as:

        : .NAME ( xt -- ; Display name for xt )
          X>N           \ Convert xt to nid
          NAME@         \ Get string for nid
          TYPE          \ Display name
        ;

Now that we have the word NAME@ I believe that we should not dump .NAME, but
leave it there for assistance in debugging.  It is for this reason that
I have defined .NAME to use the xt rather than a nid.  Thus you can give the
phrase:

        ' WORD .NAME    word Ok

or perhaps more importantly:

        'EMIT @ .NAME   word Ok

Peter Knaggs            School of Computing and Maths, Teesside Polytechnic
pjk @ scm.tp.ac.uk      Middlesbrough, England.         +44 (642) 342673


From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:06:52 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA03279; Tue, 26 Nov 91 01:56:48 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA01346; Tue, 26 Nov 91 01:56:10 EDT
Received: from  by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AB03270; Tue, 26 Nov 91 01:55:32 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET 
          with NIFTP id <5639-2@sun2.nsfnet-relay.ac.uk>;
          Mon, 25 Nov 1991 10:44:04 +0000
From: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Date: Mon, 25 Nov 91 10:22:02 GMT
Message-Id: <15584.9111251022@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: nid/xt
Status: R

The introduction of name identifiers (nid) or a dictionary token (DT) requires
words to transfer between the new nid and the only other thing in the standard
that comes close to the abstract type we require, an execution token (xt).  I
suggest the following definitions (in keeping with the conversion words
already in the standard).

N>X     ( nid -- xt )                                           "n to x"
        xt is the execution token associated with the word indicated by the
        name identifier nid.

Note: There may be a 1:1 or n:1 relation between nid and xt.  This definition
      does not inhibit such a relationship.

X>N     ( xt -- nid )                                           "x to n"
        nid is a name identifier of a word associated with the execution token
        xt.

Note: This definition is worded such that a system with 1:1 or n:1 nid:xt
      relationship can be implemented.  In the n:1 case it is up to the system
      as to which nid is returned (the original or most recently defined).
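
The n:1 case can be sketched in standard Forth under a toy dictionary
model.  Everything here is an assumption made for illustration: the node
layout (a link cell followed by an xt cell), the TOY- names, and the
choice of returning the most recently defined nid (or 0 if the xt has no
name, which the definition above does not specify).

```forth
\ Toy model, not any real system's header layout:
\ a name node is two cells: link to the previous node, then the xt.
VARIABLE TOY-LATEST   0 TOY-LATEST !

: TOY-NAME, ( xt -- nid )          \ add a name node for xt, return its nid
   ALIGN HERE >R  TOY-LATEST @ ,  ,  R@ TOY-LATEST !  R> ;

: TOY-N>X ( nid -- xt )  CELL+ @ ;

: TOY-X>N ( xt -- nid | 0 )        \ newest nid whose xt matches, 0 if none
   TOY-LATEST @
   BEGIN  DUP WHILE                ( xt nid )
      2DUP TOY-N>X = IF  NIP EXIT  THEN
      @                            \ follow the link to the older node
   REPEAT  NIP ;

' DUP TOY-NAME, CONSTANT NID-A     \ two names for one xt: an n:1 relation
' DUP TOY-NAME, CONSTANT NID-B
NID-A TOY-N>X  NID-B TOY-N>X  = .  \ prints -1: both nids give the same xt
' DUP TOY-X>N  NID-B = .           \ prints -1: the newest nid is returned
```

Returning the original nid instead would only require scanning to the end
of the list and remembering the last match.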

Peter Knaggs            School of Computing and Maths, Teesside Polytechnic,
pjk @ scm.tp.ac.uk      Middlesbrough, England.         +44 (642) 342673


From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:07:43 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA03285; Tue, 26 Nov 91 02:00:06 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA01350; Tue, 26 Nov 91 01:59:27 EDT
Received: from  by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AB03270; Tue, 26 Nov 91 01:56:47 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET 
          with NIFTP id <5639-3@sun2.nsfnet-relay.ac.uk>;
          Mon, 25 Nov 1991 10:44:54 +0000
From: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Date: Mon, 25 Nov 91 10:20:55 GMT
Message-Id: <15560.9111251020@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: Follow/Another?/Unfollow
Status: R

I sent this mail some time ago, but am not sure that it got out of the
maillist.  Hence, I am sending it again.  If you have already received a
copy, could you let me know, as I am still not totally happy with the mail
system on our Sun4s.

-----------------------------  Cut Here  -------------------------------------

Over the weekend I was deep into a bottle of whisky, sorry though.  I have now
been convinced that we should use an abstract token for wordsets.

Now, with this in mind I have been thinking about the wordlist scanning words
(FOLLOW, ANOTHER?, and UNFOLLOW).

o  In the previous definitions of these words I declared state to be unsized.
   In my system (for speed, and simplicity) I would store seventeen items
   in the state.  This causes its own problems.  If I wanted to follow more
   than one wordlist at a time, I would have to know how large the 'state' is
   in order to manipulate the stack correctly.

To this end may I recommend the following (new) definitions:

FOLLOW  ( wid1 ... widn n -- state )
        Initialise a system dependent stack structure (state) in preparation
        for scanning the given n wordlists (indicated by wid1 ... widn).  The
        system dependent value (state) may be of any length.  FOLLOW is used
        in conjunction with ANOTHER?, and UNFOLLOW.

        See also: ANOTHER?; UNFOLLOW.

ANOTHER? ( state1 -- state2 nid flag )                  "Another Query"
        Extracts the name identifier (nid) of the next entry in the
        wordlist(s) being scanned (indicated by the system dependent stack
        structure state1, as initialised by FOLLOW).  If a word is found
        the nid of the word is returned, in addition to an updated search
        status (state2) and a true flag.  If no more words are found in the
        search then a false flag is returned and nid is not valid.

        ANOTHER? is used in conjunction with FOLLOW and UNFOLLOW.

        Example usage:
                : (WORDS) ( state -- ; List all the words in the search )
                  BEGIN
                    ANOTHER?
                    KEY? 0=
                    AND
                  WHILE
                    NAME@ TYPE SPACE
                  REPEAT
                  UNFOLLOW
                ;

                : WORDS ( -- ; Display all words in current word list )
                  GET-ORDER 1- 0 ?DO NIP LOOP 1 FOLLOW (WORDS) ;

                : VLIST ( -- ; Display all words in search order )
                  GET-ORDER FOLLOW (WORDS) ;

        See also: FOLLOW; UNFOLLOW.


UNFOLLOW ( state nid -- )
        Removes the system dependent stack structure (state) initialised by
        FOLLOW, and the nid returned by ANOTHER?.

        UNFOLLOW is normally used when exiting an ANOTHER? based loop.

        See also: FOLLOW; ANOTHER?.


You may notice that I have also changed the action of ANOTHER? and UNFOLLOW.
This is basically because I tried to define WORDS, and VLIST using the
previous stack effects, and ended up deciding that the new stack effects would
make such definitions a lot simpler.

This also means that those of you who want to allocate some memory in FOLLOW
to store the state, can deallocate the memory using UNFOLLOW as you are now
required to always have an UNFOLLOW matching a FOLLOW otherwise there will be
a (possibly very large) stack imbalance.

If we were to force state to be of a given size then I for one would have to
implement a memory allocation/deallocation system to implement these words.

Peter Knaggs            School of Computing and Maths, Teesside Polytechnic,
pjk @ scm.tp.ac.uk      Middlesbrough, England.         +44 (642) 342673


From Mitch.Bradley%eng.sun.com@sun.com Sat Nov 30 02:38:53 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA02000; Fri, 29 Nov 91 22:27:46 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA03709; Fri, 29 Nov 91 22:27:04 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA01996; Fri, 29 Nov 91 22:22:25 EDT
Received: from vax.nsfnet-relay.ac.uk by sun2.nsfnet-relay.ac.uk 
          with SMTP inbound id <29902-220@sun2.nsfnet-relay.ac.uk>;
          Fri, 29 Nov 1991 03:10:41 +0000
Received: from sun.com by vax.NSFnet-Relay.AC.UK via NSFnet with SMTP 
          id aa23630; 29 Nov 91 1:53 GMT
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) 
          id AA26091; Tue, 26 Nov 91 11:45:46 PST
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA13269;
          Tue, 26 Nov 91 11:44:28 PST
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1) id AA07507;
          Tue, 26 Nov 91 11:45:34 PST
Message-Id: <9111261945.AA07507@mitch.Eng.Sun.COM>
To: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Cc: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Re: Internals: Follow/Another?/Unfollow
Date: 26 Nov 91 11:45:34 PST (Tue)
From: Mitch.Bradley@Eng.Sun.com
Status: R

> FOLLOW        ( wid1 ... widn n -- state )

Why should we scan all the wordlists simultaneously?  I would prefer
a more primitive function that scans only one wordlist at a time.

>       Initialise a system dependent stack structure (state) in preparation
> etc
> ANOTHER? ( state1 --- state2 nid flag )                       "Another Query"

How about just:

        ANOTHER-WORD?  ( nid1 wid -- false | nid2 true )

    Finds the successor "nid2" of the word "nid1" in the wordlist "wid", 
   or the first word in that wordlist if "nid1" is zero.

I submit that any kind of search state that you need to maintain is the
implementation's problem.  The way I do it is to keep a single global
state array that "caches" the most recent search state.  I "tag" that
cache with the last nid and wid, and rebuild it if I get a "miss".

Example usage:

        : (WORDS)  ( wid -- )
           >R  0
           BEGIN  R@ ANOTHER-WORD?  WHILE   ( nid )  ( r: wid )
              DUP NAME>STRING TYPE SPACE    ( nid )
           REPEAT                           ( )
           R> DROP
        ;
        : GET-CONTEXT  ( -- wid )
           GET-ORDER  1- 0 ?DO  NIP  LOOP
        ;
        : WORDS ( -- ; Display all words in current word list )
           GET-CONTEXT (WORDS)
        ;
        : VLIST ( -- ; Display all words in search order )
           GET-ORDER  0 ?DO  (WORDS)  LOOP
        ;
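
A minimal sketch of the stateless interface, assuming a toy wordlist
model rather than any real dictionary: a wid is the address of a cell
holding the newest node, and a node is a link cell followed by a counted
name.  All TOY- names are hypothetical.  In a plain linked list the nid
itself is enough to find the successor, so no cache is needed; the
tagged cache described above would only matter for hashed or tree-based
wordlists.

```forth
: TOY-WORDLIST ( "name" -- )  CREATE 0 , ;

: TOY-NAME, ( c-addr u wid -- )      \ add a name to the toy wordlist
   ALIGN HERE >R  DUP @ ,            \ link field := old head
   R> SWAP !                         ( c-addr u )  \ head := new node
   DUP C,  HERE SWAP DUP ALLOT MOVE  \ counted copy of the name
   ALIGN ;

: TOY-NAME>STRING ( nid -- c-addr u )  CELL+ COUNT ;

: TOY-ANOTHER-WORD? ( nid1 wid -- false | nid2 true )
   SWAP ?DUP IF  NIP  THEN  @        \ successor of nid1, or head if nid1 = 0
   DUP IF  TRUE  THEN ;

: TOY-WORDS ( wid -- )               \ same loop shape as (WORDS) above
   >R  0
   BEGIN  R@ TOY-ANOTHER-WORD?  WHILE
      DUP TOY-NAME>STRING TYPE SPACE
   REPEAT
   R> DROP ;

TOY-WORDLIST ANIMALS
S" CAT" ANIMALS TOY-NAME,
S" DOG" ANIMALS TOY-NAME,
ANIMALS TOY-WORDS                    \ prints DOG CAT (newest first)
```

The interpreted use of S" assumes a system that provides it (File word
set or later standards).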

From <@sun.com:Mitch.Bradley@eng.sun.com> Sat Nov 30 09:30:30 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA02082; Sat, 30 Nov 91 06:25:03 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA03751; Sat, 30 Nov 91 06:24:20 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA02079; Sat, 30 Nov 91 06:24:35 EDT
Received: from vax.nsfnet-relay.ac.uk by sun2.nsfnet-relay.ac.uk 
          with SMTP inbound id <24752-115@sun2.nsfnet-relay.ac.uk>;
          Sat, 30 Nov 1991 01:14:25 +0000
Received: from sun.nsfnet-relay.ac.uk by vax.NSFnet-Relay.AC.UK via Ethernet 
          with SMTP id ab11051; 29 Nov 91 16:52 GMT
Received: from vax.nsfnet-relay.ac.uk by sun.NSFnet-Relay.AC.UK Via Ethernet 
          with SMTP id om26722; 29 Nov 91 13:11 GMT
Received: from sun.com by vax.NSFnet-Relay.AC.UK via NSFnet with SMTP 
          id aa00443; 29 Nov 91 6:10 GMT
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) 
          id AA19432; Tue, 26 Nov 91 11:16:09 PST
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA11356;
          Tue, 26 Nov 91 11:14:51 PST
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1) id AA07467;
          Tue, 26 Nov 91 11:15:58 PST
Message-Id: <9111261915.AA07467@mitch.Eng.Sun.COM>
To: Peter Knaggs (Research) <cmr02@scm.tees-poly.ac.uk>
Cc: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Re: Internals: nid/xt
Date: 26 Nov 91 11:15:57 PST (Tue)
From: Mitch.Bradley@eng.sun.com
Status: R

>  N>X  ( nid -- xt )                                           "n to x"
>  X>N  ( xt -- nid )                                           "x to n"

The Forth-83 names for these functions are NAME> and >NAME , as described
in an experimental proposal included in the Forth-83 document.

I see nothing wrong with those "old" names, because they already
do the "right thing" if you establish the correspondence:

        Abstract        Actual Type in Some
          Type            Implementations
        --------        --------------------

           xt                   cfa
           nid                  nfa

Thus, we can simply generalize the traditional words, changing their
definitions to use "opaque" data types instead of particular addresses.

I don't like the terseness of X>N and N>X ; terseness is good for
frequently-used functions, but infrequently-used functions should
"spell out" their function.  XT>NAME and NAME>XT would be even better,
except that NAME> and >NAME are already familiar and accepted in some
circles.  Familiarity is a big advantage when you are trying to convince
people to go along with a new scheme.

Mitch

From anton%mips@AWITUW64.BITNET Sat Dec  7 22:27:15 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA09925; Sat, 7 Dec 91 17:39:54 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA00978; Sat, 7 Dec 91 17:39:04 EDT
Received: from [143.108.254.245] by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AB09919; Sat, 7 Dec 91 17:39:41 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with
 PMDF#10108; Fri, 6 Dec 1991 21:05 -0300+1
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 06 Dec 91 19:44:34 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id
 AA25524; Fri, 6 Dec 91 20:43:48 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA11057; Fri, 6
 Dec 91 20:43:20 +0100
Date: Fri, 6 Dec 91 20:43:20 +0100
From: anton@mips.lsi.usp.br.
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals: name@, next-word and names
To: internals@lsi11.lsi.usp.br.lsi.usp.br.
Message-Id: <9112061943.AA11057@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: R

Concerning Peter's and Mitch's recent postings:
 
>NAME@   ( nid -- c.addr n )                                     "Name Fetch"
>
>Anton's request for this word to return something that will compile to the
>same nid is impossible, as you can not protect against possible name clashes
>(Well it is impossible in my system at least).
You are right. Let's try a new wording:
The returned string should produce the same nid on searches through
the wordlist, unless the name is shadowed (by a more recent
definition).
BTW, what does dpANS say about shadowing? I did not find anything in
Basis 15.
Also, the description should include the minimal life expectancy of
the resulting string. Will it live at 'here', in the pad, or in its own
buffer? What can the user assume? Proposed wording:
The returned string may be situated in the pad.
 
 
The longer I think about FOLLOW/ANOTHER?/UNFOLLOW, the more I dislike
it. On implementations using trees a straightforward approach would
use unlimited space for the state/wordlist position. This causes
either an unknown size on stack or use of memory allocation words. In
both cases the state is not a first class data type (e.g. you cannot
copy it like an integer), which causes more troubles. (We could add
further words to make it first class again, e.g. a word for copying,
but I do not think anybody would like this)
 
So I support Mitch's solution. (It has one problem: It restricts the
system to have a nid only once in a wordlist. But I think that all
systems implement this restriction without being forced, so it should
not make trouble.)
 
However I like his older
NEXT-WORD ( nid1 wid -- nid2 )
better than
ANOTHER-WORD?  ( nid1 wid -- false | nid2 true ),
both the name and the stack effect. I think that the stack effect
( wid nid1 -- wid nid2 ) would be even nicer, resulting in:
 
        : (WORDS)  ( wid -- )
           0
           BEGIN  NEXT-WORD ?DUP WHILE      ( wid nid )
              DUP NAME>STRING TYPE SPACE    ( wid nid )
           REPEAT                           ( )
           DROP
        ;
 
 
Concerning the names, I agree with Mitch.
 
In addition, I think we should adapt some of the proposed names for
conversion words to the usual scheme to make the names easy to remember.
NAME>IMMEDIATE instead of IMMEDIATE? (should we keep the '?')
XT>DEFINER or >DEFINER instead of DEFINER
Also, I favor Mitch's NAME>STRING over Peter's NAME@

From anton%mips@AWITUW64.BITNET Sat Dec  7 22:34:57 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AA09946; Sat, 7 Dec 91 17:41:28 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1)
        id AA00983; Sat, 7 Dec 91 17:40:39 EDT
Received: from [143.108.254.245] by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
        id AB09936; Sat, 7 Dec 91 17:41:14 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with
 PMDF#10108; Sat, 7 Dec 1991 02:13 -0300+1
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 06 Dec 91 19:44:57 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id
 AA25539; Fri, 6 Dec 91 20:44:36 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA11061; Fri, 6
 Dec 91 20:44:08 +0100
Date: Fri, 6 Dec 91 20:44:08 +0100
From: anton@mips.lsi.usp.br.
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals: words i miss
To: internals@lsi11.lsi.usp.br.lsi.usp.br.
Message-Id: <9112061944.AA11061@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: R

Some words for manipulating words and wordlists I missed in the
discussion or would do differently:
 
Motivation for the new words:
Make it easier and more portable to build programs that build, change
or analyse programs (metaprograms).
 
 
Data types:
definer-id: identifies the kind of word (colon def, variable, ...).
In conventional systems this could be the code address (the content of
the cfa). Different does> actions have a different definer-id. It
needs one cell (I hope this suffices). What is the definer-id of code
words?  Implementation-dependent? I think it would be more useful if
they returned a unique value (for all code words). But it would be
harder to implement.
 
 
Words:
Mitch's CREATE-WORD ( addr len wid -- ) has no source for the immediate
flag and the xt of the word. Therefore I propose to extend it in the
following way:
CREATE-WORD ( addr len wid flag definer-id -- nid )
creates a word with the name given by addr len, of type definer-id
with immediateness given by flag, and inserts it into the wordlist
wid.
 
Note, that a stack effect like ( addr len wid flag xt -- nid ) would
be more flexible, but would force the system to implement a n:1
relation of nids to xts.
 
 
NAME-IMMEDIATE! ( flag nid -- )
changes immediateness of name to flag
 
 
CREATE-XT ( definer-id -- xt )
creates an anonymous word of type definer-id
 
 
Mitch's DEFINER ( xt1 -- xt2 ), which returns the xt of the defining
word of the word xt1, is useful, but hard or impossible to
implement, e.g.
: a DOES> ... ;
: b CREATE ... a ;
: c CREATE ... ;
: d c a ;
: e d ;
b b1
d d1
e e1
What are the defining words of b1, d1 and e1? Returning b, d and e
resp. might be quite useful, but IMO is impossible to
implement. Returning `a' can be implemented (even that's quite hard),
but has no advantage over returning a definer-id. So I propose:
XT>DEFINER ( xt -- definer-id )
Gets the definer-id of the xt.
 
It could also be called >DEFINER to stay consistent with other words,
that do not mention the xt, or DEFINER to stay consistent with Mitch's
existing usage.
 
 
>NAME? ( xt -- flag )
returns true if the xt is associated with a name, otherwise false.
 
This can be hard to implement on many systems (e.g. fig). Mitch's
>NAME includes this function, Peter's X>N seems not to. Since the
function "get the name assuming there is one" is usually much easier
to implement, IMO these functions should be separated, to avoid
implementors dropping both, if they do not implement this one.
Alternative: make >NAME (X>N) ambiguous, if the xt does not have a
name, but if the system can detect this, it should return 0.
(half-ambiguous situation)
 
 
XT-DEFINER! ( definer-id xt -- )
changes the definer of xt to definer-id.
 
CREATE-DEFINER ( xt -- definer-id )
creates a definer-id for words that when called push their body
address and then execute the xt
This is best explained by example:
  ' x CREATE-DEFINER
  CREATE y  ' y  XT-DEFINER!
is equivalent to
  : cx CREATE DOES> x ;
  cx y
 
These two words are factors of Bill Ragsdale's DOES, that Mitch
discussed on comp.lang.forth some weeks ago.
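
The conventional half of that equivalence can be tried in standard
Forth today.  In this sketch the action X and the "," that gives each
child one cell of body are illustrative assumptions; CREATE-DEFINER and
XT-DEFINER! themselves are only proposals and are not used.

```forth
: X ( a-addr -- )  @ . ;             \ hypothetical action: print the body cell

: CX ( n "name" -- )                 \ defining word: children run X on their body
   CREATE ,  DOES> X ;

42 CX Y
Y                                    \ pushes Y's body address, then runs X: prints 42
```

With the proposed words, the same Y would be built by a plain CREATE
and then given the definer-id obtained from ' X CREATE-DEFINER.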
 
 
Just for discussion:
NAME>WID ( nid -- wid )
Get a wordlist the name is in.
 
Might be hard to implement. How useful is it?
 
 
>SIZE ( xt -- u )
returns size of memory block allocated after the xt.
 
This is usually hard to implement. How useful is it?
 
 
NONAME ( -- xt )
After this word is called, the next defining-word does not consume the
input stream, but instead creates an anonymous word. The returned
execution token is the execution token of that word, if DP is not
changed between noname and the call of the defining word.
 
THIS-NAME ( addr len -- )
After this word is called, the next defining-word does not consume the
input stream, but instead creates a word with the name given by addr
len.
 
These two words are really ugly, but I could not find a nicer way. Or
should we just use create-xt and create-word respectively and
replicate the data structure building code? Then you might have to go
into other people's code, which you might not even have in source.
Adding words to make the other implicit parameters to defining words
(e.g.  current) more explicit is not necessary, since they can be
saved and restored.

From anton%mips@AWITUW64.BITNET Fri Jan 24 17:52:22 1992
Received: from fpsp.fapesp.br ([143.108.254.245]) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA02321; Fri, 24 Jan 92 14:46:51 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with
 PMDF#10108; Fri, 24 Jan 1992 14:46 -0200(C)
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 24 Jan 92 15:47:44 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id
 AA08191; Fri, 24 Jan 92 16:43:38 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA01888; Fri, 24
 Jan 92 16:42:14 +0100
Date: Fri, 24 Jan 92 16:42:14 +0100
From: anton@mips.lsi.usp.br
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: Words I miss
To: internals@lsi11.lsi.usp.br.lsi.usp.br
Message-Id: <9201241542.AA01888@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: RO

Another word I miss:
 
LOOKUP-NAME ( addr len wid -- nid )
  searches for the name given by addr len. If the wordlist wid contains
  the name, LOOKUP-NAME returns its nid, otherwise 0. This word is a
  factor of SEARCH-WORDLIST.
  (Is SEARCH-NAME or FIND-NAME better?)
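
A rough sketch of the factoring, assuming a toy wordlist model rather
than a real dictionary: a wid is a cell holding the newest node, and a
node is a link cell, an xt cell, an immediate-flag cell and a counted
name.  All TOY- names are hypothetical, and the name match is
case-sensitive because it uses COMPARE from the String word set.

```forth
: TOY-WORDLIST ( "name" -- )  CREATE 0 , ;

: TOY-NAME, ( xt imm? c-addr u wid -- )
   ALIGN HERE >R  DUP @ ,  R> SWAP !   ( xt imm? c-addr u )  \ link in new node
   2SWAP SWAP , ,                      ( c-addr u )          \ xt, then flag
   DUP C,  HERE SWAP DUP ALLOT MOVE  ALIGN ;

: TOY-NAME>XT        ( nid -- xt )        CELL+ @ ;
: TOY-NAME>IMMEDIATE ( nid -- flag )      2 CELLS + @ ;
: TOY-NAME>STRING    ( nid -- c-addr u )  3 CELLS + COUNT ;

: TOY-LOOKUP-NAME ( c-addr u wid -- nid | 0 )
   @
   BEGIN  DUP WHILE                    ( c-addr u nid )
      >R  2DUP R@ TOY-NAME>STRING COMPARE 0=
      IF  2DROP R> EXIT  THEN          \ found: return the nid
      R> @                             \ otherwise try the older node
   REPEAT  NIP NIP ;

: TOY-SEARCH-WORDLIST ( c-addr u wid -- 0 | xt 1 | xt -1 )
   TOY-LOOKUP-NAME DUP IF
      DUP TOY-NAME>XT  SWAP TOY-NAME>IMMEDIATE IF  1  ELSE  -1  THEN
   THEN ;

TOY-WORDLIST TOYS
' DUP FALSE S" COPY" TOYS TOY-NAME,
5 S" COPY" TOYS TOY-SEARCH-WORDLIST .  EXECUTE * .   \ prints -1 25
```

The search word adds nothing beyond converting the nid, which is the
sense in which the lookup is a factor of it.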
 
Concerning THIS-NAME:
 
I realized that, in addition to being ugly THIS-NAME is unnecessary.
Its function can be achieved by EVALUATE using an even uglier
hack. This applies to NONAME, too (just use a dummy name and a scratch
wordlist)
- anton

From anton%mips@AWITUW64.BITNET Sat Feb  1 13:51:05 1992
Received: from [143.108.254.245] by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AB00167; Sat, 1 Feb 92 10:49:00 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with
 PMDF#10108; Sat, 1 Feb 1992 09:46 -0200(C)
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 28 Jan 92 17:59:55 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id
 AA21718; Tue, 28 Jan 92 18:59:28 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA08434; Tue, 28
 Jan 92 18:58:37 +0100
Date: Tue, 28 Jan 92 18:58:37 +0100
From: anton@mips.lsi.usp.br
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: colon definition internals words
To: internals@lsi11.lsi.usp.br.lsi.usp.br
Cc: anton@mips.lsi.usp.br
Message-Id: <9201281758.AA08434@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: RO

This is a repost, since the first try seems to have got lost.
 
My proposals for words for accessing the body of colon defs:
 
I think that the words proposed in Mitch's paper are not abstract
enough to be universally implementable. It seems that there has been
some discussion on this before, but I missed part of it, so if I am
redoing things, I am sorry. Anyway, let's get on with the wordset
while the iron is hot.
 
As for the descriptions of the words, I think I have overdone them -
they sound precise, but are probably hard to understand, sorry. Feel
free to ask when something seems unclear.
 
Motivation:
These words are useful for applications like a decompiler, structure
charting, abstract interpretation.
 
Data types:
code position:
identifies a specific place in the colon definition. Its size is two
cells (Is this enough?). Code positions can be compared using double
operators; (or should there be an extra operator? or double unsigned?)
In conventional systems this will be a pointer into the actual
threaded code. This is Mitch's xadr.
 
Words:
The code that can be accessed through these words is equivalent to
the original code, but need not have any other similarity to the
original code or conform to the standard. The code can also be
different from the one produced at previous accesses (I am thinking of
native code implementations which might recreate different code when
they start decompiling in the middle instead of at the beginning)
 
XT>CODEPOS ( xt -- codepos )
returns the codepos of the first word of the colon definition xt. If
xt is not a colon definition, 0. is returned.
 
DEFINER>CODEPOS ( definer-id -- codepos )
returns the codepos of the first word of the DOES>-part of the word
that creates words with the definer definer-id. If definer-id does not
belong to a DOES>-defining word, the effect is ambiguous. (Defining
DEFINER>XT and using DEFINER>XT XT>CODEPOS would be nicer, but would
not be implementable without changing the structure of many systems
(e.g. fig-derived))
 
NEXT-CODEPOS ( codepos1 -- codepos2 )
Codepos2 is the position of the word sequentially executed after the
word at codepos1, if the word at codepos1 is not a branching word.
codepos2 is greater than codepos1. (Is this too restrictive?) It is
ensured that, starting at the beginning, by stepping through the colon
definition with NEXT-CODEPOS every word is seen exactly once.
If codepos1 is the position of the last word in the colon definition,
the result is ambiguous. (It would be best to return 0.)
 
TOKEN@ ( codepos -- xt )
(or should we rename it, because the stack effect has changed?)
 
TOKEN! ( xt codepos -- )
If the word at codepos has inline arguments, the effect of TOKEN! is
ambiguous. (This word is very hard to implement on native code systems)
 
CODEPOS-COMPILE ( codepos -- )
append execution semantics of the code at codepos to the current
definition. (In plain English: Compile the word at codepos and its
inline arguments). An ambiguous condition exists if the code is a
branching word.
 
CODEPOS>STRING ( codepos -- addr len )
Returns a string representation of the word at codepos and its inline
argument(s). (This word is useful only for the decompiler. Should we
include it? Are there other operations on inline arguments that might
be interesting? What's the lifetime of the string?)
 
CODEPOS>TARGETS ( codepos -- n*codepos n )
 
returns the code positions of the words that can be executed
immediately after the word at codepos. n*codepos are the possible
targets, n is their number; the targets can include the word returned
by NEXT-CODEPOS, in case of nonbranching words it will be the only
target. For words using targets supplied at run-time (EXIT, THROW)
only the statically determined targets are returned (e.g. none for
EXIT). (I hate words with variable stack effects - is there a better
solution with less than three words?)
 
BREAKPOINT! ( xt codepos -- )
causes the execution of the breakpoint handler xt, just before the
word at codepos is executed. If there already is a breakpoint at
codepos, it is replaced.  0 for the xt removes a possible breakpoint
(or the xt of NOOP instead of 0?). Before executing the handler
codepos is pushed. (BREAKPOINT! could be implemented using TOKEN!, but
it is probably easier to implement BREAKPOINT! than TOKEN! in native
code systems)
 
BREAKPOINT@ ( codepos -- xt )
This is included to enable programs to be well-behaved, i.e. so a
module can avoid treading on another module's feet.  (Should we add to
the description of BREAKPOINT! that it works only on the task that
executed the BREAKPOINT!)
 
TRACING! ( xt -- )
behaves like a BREAKPOINT! on the whole code in the system. To
avoid an endless loop tracing is turned off while executing xt.
(This cannot be implemented using BREAKPOINT! because of things like
EXIT and THROW).
 
TRACING@ ( -- xt )
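
A minimal sketch of stepping through a definition with TOKEN@ and
NEXT-CODEPOS, under simplifying assumptions that are not part of the
proposal: the "colon definition" is a hand-built table of execution
tokens ended by a 0 cell, a code position is a single cell address (the
proposal uses two cells), and there are no inline arguments or
branches.  All TOY- names are hypothetical.

```forth
CREATE TOY-BODY  ' DUP ,  ' * ,  ' . ,  0 ,   \ toy body meaning: DUP * .

: TOY-TOKEN@       ( codepos -- xt )            @ ;
: TOY-NEXT-CODEPOS ( codepos1 -- codepos2 )     CELL+ ;

: TOY-RUN ( i*x codepos -- j*x )     \ execute the body one token at a time
   BEGIN  DUP TOY-TOKEN@ ?DUP  WHILE ( i*x codepos xt )
      SWAP >R  EXECUTE  R>           \ keep codepos off the data stack
      TOY-NEXT-CODEPOS
   REPEAT  DROP ;

7 TOY-BODY TOY-RUN                   \ prints 49
```

A tracer or decompiler has the same loop shape; it would call a handler
or print a name where this sketch calls EXECUTE.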
 
- anton (anton@mips.complang.tuwien.ac.at)
(Note that the header is a bit scrambled, so mail explicitly)
(If you have mailed anything to me on my last postings (Dec 6 and Jan
25), I did not receive it)

From internals@lsiserv2.lsi.usp.br Mon Feb 24 14:41:48 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA12732; Mon, 24 Feb 92 10:32:52 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA00987; Mon, 24 Feb 92 10:21:27 EST
Date: Mon, 24 Feb 92 10:21:26 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202171911.AA14560@mips.complang.tuwien.ac.at>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: internals dpANS comment
Status: RO

 
Should we write a public review comment urging the TC to include an
internals wordset and promising to supply a wordset as baseline for
their discussion later?
As far as I know, the deadline is February 25th.
 
If yes, who will do it?
 
Is there still anybody interested? (I ask because I have only heard
myself on this mailing list since December)
 
 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen


From nick@pwllheli.sw.stratus.com Mon Feb 24 14:56:15 1992
Received: from lectroid.sw.stratus.com (lectroid-gw.sw.stratus.com) by transfer.stratus.com (4.1/3.8-jjm)
	id AA08529; Mon, 24 Feb 92 08:55:21 EST
Received: from pwllheli.sw.stratus.com by lectroid.sw.stratus.com (4.1/3.7-jjm)
	id AA03589; Mon, 24 Feb 92 08:55:39 EST
Received: by pwllheli.sw.stratus.com (4.1/SMI-4.0)
	id AA01716; Mon, 24 Feb 92 08:55:36 EST
Date: Mon, 24 Feb 92 08:55:36 EST
From: nick@pwllheli.sw.stratus.com (Nicolas Tamburri)
Message-Id: <9202241355.AA01716@pwllheli.sw.stratus.com>
To: anton@mips.complang.tuwien.ac.at
Subject: Re:  internals dpANS comment
Status: RO


I regret that I have not been able to participate more fully in this
task.  Since the mailing list was formed my official work load has
increased to the point where it has not been possible for me to
compose coherent responses to any of the mailings, and so I have not
even tried beyond the first couple of mailings.  (I'm sure this has a
familiar ring to everyone else on this list.)

I appreciate the work you've done, and regret that I have not been
able to participate more. But, until things lighten up a little at
work, I don't believe I'll be able to contribute to this task beyond
reading.

							/nt

From internals@lsiserv2.lsi.usp.br Tue Feb 25 04:56:25 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA14968; Tue, 25 Feb 92 00:35:23 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA01603; Tue, 25 Feb 92 00:23:56 EST
Date: Tue, 25 Feb 92 00:23:55 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202250121.AA06929@mitch.Eng.Sun.COM>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: Mitch.Bradley@Eng.Sun.COM
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: Re: internals dpANS comment
Status: RO

> Should we write a public review comment urging the TC to include an
> internals wordset and promising to supply a wordset as baseline for
> their discussion later?

My feeling is that it is much too late for the TC to consider such
a massive undertaking.  Debate over such a wordset would delay the
standard by at least 6 months, and I don't think that is in anybody's
best interests.

The internals stuff we have been discussing would be an appropriate
topic for the TC to consider as an extension *after* the standard is
approved.  The ANSI process does provide for such ongoing work.

Mitch


From internals@lsiserv2.lsi.usp.br Tue Feb 25 22:52:52 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA00674; Tue, 25 Feb 92 18:39:01 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA02990; Tue, 25 Feb 92 18:27:30 EST
Date: Tue, 25 Feb 92 18:27:30 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202252137.AA05874@lsi3>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: pjk@scm.tp.ac.uk
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: internals dpANS comment
Status: RO

On the face of it, this seems like a good idea.  For a number of reasons
I would ask Mitch to do the job.  However, I think that we are all agreed
that the Internals wordset would not make it into this revision.  We must
work out a full wordset, with the intent of getting it accepted into the
standard at its next revision (in five years or so).  It may be a good
idea to write a letter to the TC stating this intent. (or is this what you
said?)

Peter J. Knaggs.        School of Computing and Maths, Teesside Polytechnic,
pjk @ scm.tp.ac.uk      Middlesbrough, England.         +44 (642) 342673



From internals@lsiserv2.lsi.usp.br Wed Feb 26 01:14:41 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA01144; Tue, 25 Feb 92 21:06:33 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA03496; Tue, 25 Feb 92 20:55:02 EST
Date: Tue, 25 Feb 92 20:55:01 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202252321.AA08219@mitch.Eng.Sun.COM>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: Mitch.Bradley@Eng.Sun.COM
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: Re: internals dpANS comment
Status: RO

> On the face of it, this seems like a good idea.  For a number of reasons
> I would ask Mitch to do the job.

As before, I am not volunteering to, and will not be roped into, carrying
the ball on this issue.  I agreed to participate in technical discussions
but disclaimed interest in driving the issue in committee.

The committee battle on such a wordset will be long and bloody (or perhaps
short and fatal), and I'm insufficiently motivated to carry the banner,
lead the charge, and take the arrows.

Mitch


From internals@lsiserv2.lsi.usp.br Wed Feb 26 01:52:18 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA01148; Tue, 25 Feb 92 21:37:07 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA03530; Tue, 25 Feb 92 21:25:35 EST
Date: Tue, 25 Feb 92 21:25:35 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202260035.AA05955@lsi3>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: pl@lsiserv2.lsi.usp.br (Pedro Sanchez)
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: Re: internals dpANS comment
Status: RO

>The committee battle on such a wordset will be long and bloody (or perhaps
>short and fatal), and I'm insufficiently motivated to carry the banner,
>lead the charge, and take the arrows.

 Oh, Mitch.  Where is your idealism?   :-)  
 Anyway, your words are very descriptive of the situation.
 
==========================================================================
Pedro Luis Prospero Sanchez       internet: pl@lsi.usp.br (PREFERRED)
University of Sao Paulo           uunet:    uunet!vme131!pl        
Dept. of Electronic Engineering   hepnet:   psanchez@uspif1.hepnet
phone: (055)(11)211-4574  home: (055)(11)914-9756 fax: (055)(11)815-4272
==========================================================================


From internals@lsiserv2.lsi.usp.br Wed Feb 26 04:26:08 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA01176; Wed, 26 Feb 92 00:21:56 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA03605; Wed, 26 Feb 92 00:10:24 EST
Date: Wed, 26 Feb 92 00:10:23 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202260319.AA08383@mitch.Eng.Sun.COM>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: Mitch.Bradley@Eng.Sun.COM (Mitch Bradley)
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: Re: internals dpANS comment
Status: RO

> Oh, Mitch.  Where is your idealism?   :-)

Hmmm, where did I put that idealism?  I know I had it when I came in...

Seriously, it got beaten out of me 'round about my second or third ANS Forth
committee meeting.  Once it was gone, I started to actually have some
impact on the committee.  I learned what it takes to persuade a diverse
group of people to see things your way, or at least to vote with you.

Mitch


From internals@lsiserv2.lsi.usp.br Sun Jun 21 19:34:05 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA10641; Sun, 21 Jun 92 14:31:18 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA02854; Sun, 21 Jun 92 14:09:30 EST
Date: Sun, 21 Jun 92 14:09:30 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9206191629.AA08486@mips.complang.tuwien.ac.at>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: internals progress report
Status: OR

 
Somebody on comp.lang.forth has asked for a progress report of the
internals mailing list. I have put a short one together. Unless you
complain, I'll post it on June 24th (Wednesday).
 
To have some progress to report in the future (8-), I would like to see
a bit of discussion on the unresolved issues (NEXT-WORD vs.
FOLLOW/ANOTHER?/UNFOLLOW) and on the words not discussed until now (my
proposals for decompilation and debugging). Shall I do a summary of
these things so you don't have to wade through all the old postings?
If the reason for your silence is lack of time, how about concentrating
on the dictionary stuff for now?
 
- anton
 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
 
------------------ progress report --------------------------
Somebody on comp.lang.forth has asked for a progress report of the
internals mailing list. Here's a short one:
 
The goal of our work is the internals wordset: It should provide access
to internals of the Forth system without restricting the implementation.
 
We have made good progress on dictionary access, but this has not yet
resulted in a document that can be presented to the public.
 
There have also been proposals for words necessary for
decompilation/debugging, but they have not been discussed yet.
 
In the last months the mailing list was quiet. Maybe we need some fresh
blood. Mail to pl@lsi.usp.br (Pedro Sanchez) to participate.


From internals@lsiserv2.lsi.usp.br Tue Jun 16 02:36:01 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01)
	id AA02422; Mon, 15 Jun 92 21:32:25 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from  (loopback) by lsiserv2 (4.1/SMI-4.1)
	id AA01888; Mon, 15 Jun 92 21:11:15 EST
Date: Mon, 15 Jun 92 21:11:14 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9206151829.AA04871@mips.complang.tuwien.ac.at>
Comment:  Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: <internals@lsiserv2.lsi.usp.br>
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Subject: Re: What should the Standard include?
Status: OR

In article <3728.UUL1.3#5129@willett.pgh.pa.us> Doug Phillips writes:
|> Perhaps the internals group could post
|> occasional progress reports to ForthNet for those of us who are interested
|> in kind of knowing what is going on, but who cannot participate directly?
 
I have written up something. I'll post it on Thursday, unless
somebody complains:
 
-----------------------------------------------------------------
This is a short summary of the work of the internals group until now.
 
Our work until now covers two areas: dictionary access and things
necessary for decompiling and debugging.
 
Taking Mitch Bradley's previous work as a basis, we are quite far along
in the dictionary access discussion.
 
Proposals for decompiler and debugging words have been made, but they
have not been discussed yet.
 
In the last few months there was no activity on the mailing list,
since we all seem to have too little time. Perhaps fresh blood would
bring a little life into the discussion. Mail to Pedro Sanchez
(pl@lsi.usp.br) if you want to participate.
-----------------------------------------------------------------
 
BTW, I would like to hear some opinions on the things that have not
yet been discussed (the debugging words) and on the things that have
not yet been resolved (FOLLOW/ANOTHER?/UNFOLLOW vs. NEXT-WORD). Shall I
make a summary of the words up to now so that you don't have to wade
through all those old mails?
 
Also, what do you think about including words for accessing stacks and
locals? (The locals thing would be hard, IMO.)
 
- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen


From internals@lsiserv2.lsi.usp.br Tue Jul 21 02:15:10 1992
Received: by email.tuwien.ac.at (5.65b/1.34)
	id AA21830; Tue, 21 Jul 92 02:15:12 +0200
Received: From aearn.bitnet By awituw64.bitnet ; 21 Jul 92 00:15:11 GMT
Received: from brfapesp.bitnet by AEARN.EDVZ.Uni-Linz.AC.AT (Mailer R2.07) with
 BSMTP id 7956; Tue, 21 Jul 92 02:14:35 CDT
Received: from lsi11.lsi.usp.br by brfapesp.bitnet with PMDF#10108; Mon, 20 Jul
 1992 21:14 BSC (-0300 C)
Received: from lsiserv2.lsi.usp.br.lsi.usp.br by lsi11.lsi.usp.br (4.1/SMI-4.1)
 id AA16401; Mon, 20 Jul 92 21:12:07 EST
Received: from  ([127.0.0.1]) by lsiserv2.lsi.usp.br.lsi.usp.br (4.1/SMI-4.1)
 id AA06206; Mon, 20 Jul 92 20:47:09 EST
Date: Mon, 20 Jul 92 20:47:08 EST
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals dictionary access summary
Sender: internals@lsiserv2.lsi.usp.br
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Errors-To: pl@lsiserv2.lsi.usp.br
Errors-To: pl@lsiserv2.lsi.usp.br
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: internals@lsiserv2.lsi.usp.br
Message-Id: <9207200752.AA14303@mips.complang.tuwien.ac.at>
X-Envelope-To: anton@mips.complang.tuwien.ac.at
Comment:  Forth Internals Distribution List
Originator: internals
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
Status: OR

This is a rerepost. I hope it gets through this time.

This is a summary of the proposals/discussions on dictionary
access (I have left out debugging/decompilation for now). I mostly
present the original proposals. Comments on them in other
postings are paraphrased in brackets, alternative proposals are
usually presented in full.


DATA TYPES:

name-id (nid)

The name-id uniquely identifies a header. The execution token does not
suffice because some systems have alias mechanisms and headerless
words. There has been some discussion whether to have separate
name-ids or to use xts. I think it has been resolved in favor of
keeping them separate (I have replaced xt by name-id in the proposals
where it seemed to be appropriate). In fig-Forth the name-id can be
represented by the NFA.

definer-id:

identifies the kind of word (colon def, variable, ...).
In conventional systems this could be the code address (the content of
the cfa). Different does> actions have different definer-ids. It
needs one cell (I hope this suffices). What is the definer-id of code
words?  Implementation-dependent? I think it would be more useful if
they returned a unique value (for all code words). But it would be
harder to implement.
[The xt of the defining word has been proposed for this function]

ANSI Forth data types used subsequently

wid     - ANS Forth wordlist id
xt      - execution token
adr len - string


WORDS

next-word  ( nid1 wid -- nid2 )  nid2 is the name preceding nid1 in the
                                 wordlist "wid".  If nid1 is 0, nid2
                                 is the first word in wid.  If nid2 is 0,
                                 there are no more words in wid.
[An alternative proposed stack effect is ( wid nid1 -- wid nid2 )]
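
[Illustration, not from the original postings: a minimal sketch of the
iteration protocol that next-word implies (start with nid 0, stop when 0
comes back). It assumes a toy linked-list "wordlist", not a real
dictionary; TOY-WL, TOY-WORD and TOY-WORDS are hypothetical names, and
next-word and name>string are defined here only for that toy model.]

```forth
\ Toy model, NOT a real dictionary.  A "wordlist" is one cell holding the
\ newest node; a node is  | link to older node | counted name | .
variable toy-wl   0 toy-wl !

: toy-word ( c-addr u wid -- )    \ hypothetical helper: add a name to wid
   align here >r                  \ r: address of the new node
   dup @ ,                        \ link field: previous newest node
   r> swap !                      \ wid now points at the new node
   dup c,  here swap dup allot move ;   \ lay down the counted name

: next-word ( nid1 wid -- nid2 )  \ toy version of the proposed word
   over if  drop @  else  nip @  then ;

: name>string ( nid -- c-addr u )  cell+ count ;

: toy-words ( wid -- )            \ walk wid until next-word returns 0
   >r 0
   begin  r@ next-word  dup while
      dup name>string type space
   repeat
   drop r> drop ;

s" dup"  toy-wl toy-word
s" swap" toy-wl toy-word
toy-wl toy-words                  \ lists the newest name first
```

[With this stack effect the caller has to keep the wid around itself (here
on the return stack); the ( wid nid1 -- wid nid2 ) variant would leave it
in place between calls.]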

alternative proposal:
ANOTHER-WORD?  ( nid1 wid -- false | nid2 true )
    Finds the successor "nid2" of the word "nid1" in the wordlist "wid",
   or the first word in that wordlist if "nid1" is zero.

alternative proposal: follow/another?/unfollow described below
FOLLOW  ( wid1 ... widn n -- state )
        Initialise a system dependent stack structure (state) in preparation
        for scanning the given n wordlists (indicated by wid1 ... widn).  The
        system dependent value (state) may be of any length.  FOLLOW is used
        in conjunction with ANOTHER?, and UNFOLLOW.

        See also: ANOTHER?; UNFOLLOW.
        [Originally proposed using only one wid]

ANOTHER? ( state1 -- state2 nid flag )                  "Another Query"
        Extracts the name identifier (nid) of the next entry in the
        wordlist(s) being scanned (indicated by the system dependent stack
        structure state1, as initialised by FOLLOW).  If a word is found,
        the nid of the word is returned, in addition to an updated search
        status (state2) and a true flag.  If no more words are found in the
        search then a false flag is returned and nid is not valid.

        ANOTHER? is used in conjunction with FOLLOW and UNFOLLOW.

        Example usage:
                : (WORDS) ( state -- ; List all the words in the search )
                  BEGIN
                    ANOTHER?
                    KEY? 0=
                    AND
                  WHILE
                    NAME@ TYPE SPACE
                  REPEAT
                  UNFOLLOW
                ;

                : WORDS ( -- ; Display all words in current word list )
                  GET-ORDER 1- 0 ?DO DROP LOOP 1 FOLLOW (WORDS) ;

                : VLIST ( -- ; Display all words in search order )
                  GET-ORDER FOLLOW (WORDS) ;

        See also: FOLLOW; UNFOLLOW.

UNFOLLOW ( state nid -- )
        Removes the system dependent stack structure (state) initialised by
        FOLLOW, and the nid returned by ANOTHER?.

        UNFOLLOW is normally used when exiting an ANOTHER? based loop.

        See also: FOLLOW; ANOTHER?.


LOOKUP-NAME ( addr len wid -- nid )
  searches for the name given by addr len. If the wordlist wid contains
  the name, LOOKUP-NAME returns its nid, otherwise 0. This word is a
  factor of SEARCH-WORDLIST.
  (Is SEARCH-NAME or FIND-NAME better?)
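
[Illustration, not from the original postings: a minimal sketch of how
SEARCH-WORDLIST could be factored through LOOKUP-NAME, NAME> and
IMMEDIATE?. It assumes a toy linked-list "wordlist" rather than a real
dictionary; TOY-WL, TOY-WORD and TOY-SEARCH-WORDLIST are hypothetical
names, and the proposed words are defined here only for that toy model.]

```forth
\ Toy model, NOT a real dictionary.  Node layout (hypothetical):
\   | link to older node | xt | immediate flag | counted name |
variable toy-wl   0 toy-wl !

: toy-word ( xt flag c-addr u wid -- )  \ hypothetical helper: add a node
   align here >r  dup @ ,  r> swap !    \ link the new node into wid
   2swap swap , ,                       \ store xt, then the flag
   dup c,  here swap dup allot move ;   \ lay down the counted name

: name>       ( nid -- xt )         cell+ @ ;
: immediate?  ( nid -- flag )       2 cells + @ ;
: name>string ( nid -- c-addr u )   3 cells + count ;

: lookup-name ( c-addr u wid -- nid | 0 )
   @
   begin  dup while
      >r  2dup r@ name>string compare 0=   \ same name as this node?
      if  2drop r> exit  then
      r> @                                 \ no: follow the link
   repeat
   nip nip ;                               \ not found: leave 0

: toy-search-wordlist ( c-addr u wid -- 0 | xt 1 | xt -1 )
   lookup-name dup if
      dup name>  swap immediate? if  1  else  -1  then
   then ;

' dup  false  s" dup"  toy-wl toy-word
' (    true   s" ("    toy-wl toy-word

s" dup"  toy-wl toy-search-wordlist  2drop   \ leaves xt -1 (not immediate)
s" nope" toy-wl toy-search-wordlist  drop    \ leaves 0 (not found)
```

[The point of the factoring is that the nid stays available to the caller,
which SEARCH-WORDLIST alone does not provide.]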

create-word  ( adr len wid -- )  Create the named word in the wordlist "wid"

alternative proposal:
CREATE-WORD ( addr len wid flag definer-id -- nid )
creates a word with the name given by addr len, of type definer-id
with immediateness given by flag, and inserts it into the wordlist
wid.

XT>DEFINER ( xt -- definer-id )
Gets the definer-id of the xt.
[alternative names: DEFINER and >DEFINER]
[I don't include the other proposals for working with xts and
definer-ids, as they are not directly related to dictionary access]

remove-word  ( nid wid -- )      Remove the name nid from the wordlist wid.


immediate?  ( name-id -- flag )       True if word is immediate
[alternative names NAME>IMMEDIATE and NAME>IMMEDIATE?]

NAME-IMMEDIATE! ( flag nid -- )
changes immediateness of name to flag


>name  ( xt -- nid )             Return a name of the word xt, or 0 if that
                                 word has no names.
alternative proposal:
X>N     ( xt -- nid )                                           "x to n"
        nid is a name identifier of a word associated with the execution token
        xt.

name>  ( nid -- xt )             Return the execution token of the name nid.
alternative proposal:
N>X     ( nid -- xt )                                           "n to x"
        xt is the execution token associated with the word indicated by the
        name identifier nid.
[>name and name> are already in Forth-83 (as experimental proposals)
with the corresponding meaning]

>NAME? ( xt -- flag )
returns true if the xt is associated with a name, otherwise false.
[This can be hard to implement on many systems. Alternative: >NAME
(X>N) returns 0 if it can detect this, otherwise it is ambiguous]

name>string  ( nid -- adr len )  Return string representation of name

alternative proposal:
NAME@   ( nid -- c.addr n )                                     "Name Fetch"
        c.addr is the character aligned address of n characters that
        represent the name associated with the name identifier nid.

        If the original name of the word can not be reproduced then a system
        dependent representation is returned.  An ambiguous condition exists
        if nid is not a valid name identifier.
[The returned string should produce the same nid on searches through
the wordlist, unless the name is shadowed (by a more recent
definition).]
[The lifetime of the string needs to be specified (e.g. say that it
may reside in PAD)]

.NAME   ( xt -- )                                       "Dot Name"
        Display the name (or a system dependent representation of the name)
        corresponding to the execution token xt.
        [The name produced by NAME>STRING and/or  .NAME should be
        usable as input for wordlist searching words]

        .NAME can be defined as:
        : .NAME ( xt -- ; Display name for xt )
          X>N           \ Convert xt to nid
          NAME@         \ Get string for nid
          TYPE          \ Display name
        ;

        [This word is convenient for debugging]

M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen


From internals@lsiserv2.lsi.usp.br Wed Jul 22 07:13:29 1992
Received: by email.tuwien.ac.at (5.65b/1.34)
	id AA08937; Wed, 22 Jul 92 07:13:42 +0200
Received: From aearn.bitnet By awituw64.bitnet ; 22 Jul 92 05:13:41 GMT
Received: from brfapesp.bitnet by AEARN.EDVZ.Uni-Linz.AC.AT (Mailer R2.07) with
 BSMTP id 5052; Wed, 22 Jul 92 07:13:23 CDT
Received: from lsi11.lsi.usp.br by brfapesp.bitnet with PMDF#10108; Wed, 22 Jul
 1992 02:13 BSC (-0300 C)
Received: from lsiserv2.lsi.usp.br.lsi.usp.br by lsi11.lsi.usp.br (4.1/SMI-4.1)
 id AA19195; Tue, 21 Jul 92 17:23:45 EST
Received: from  ([127.0.0.1]) by lsiserv2.lsi.usp.br.lsi.usp.br (4.1/SMI-4.1)
 id AA09088; Tue, 21 Jul 92 16:58:44 EST
Date: Tue, 21 Jul 92 16:58:44 EST
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals discussions
Sender: internals@lsiserv2.lsi.usp.br
To: Multiple recipients of list <internals@lsiserv2.lsi.usp.br>
Errors-To: pl@lsiserv2.lsi.usp.br
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: internals@lsiserv2.lsi.usp.br
Message-Id: <9207211958.AA09088@lsiserv2.lsi.usp.br.lsi.usp.br>
X-Envelope-To: anton@mips.complang.tuwien.ac.at
Comment:  Forth Internals Distribution List
Originator: internals
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
Status: OR

While summarizing the dictionary words I thought about some of the
problems of our discussion and how to solve them.

Many things have been proposed several times (usually with different
names, stack effect and descriptions). Proposals often don't reference
alternative proposals. Nobody announces support for proposals. Nobody
explicitly withdraws proposals (But I guess presenting something new
for the same purpose counts as withdrawal).

I think that we need a working document to solve these problems. Then
there's something concrete to discuss. This also helps newcomers
(there's the danger of rehashing discussions, but that's no problem as
they can read the old postings in such a case). Should somebody do it?
(Please mail me a yes or no, I will summarize (don't remain silent)).
I volunteer. How did you like "Internals Wordset Framework (draft)"?
Should I do it along those lines? Should I do it in ASCII or in LaTeX
(and distribute it as LaTeX and PostScript files)? (LaTeX is more
beautiful and easier to edit, if you know it).

Also, I think we need some way to come to conclusions. We probably
need a voting mechanism. Any ideas?

- anton

(Note that replying will post to internals instead of mailing me).
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen