From anton Tue Oct 1 19:09:57 1991
Date: Tue, 1 Oct 91 19:06:04 +0100
From: anton (Anton Martin Ertl)
To: Mitch.Bradley@Eng.Sun.COM, NER034%tp.ac.uk@cunyvm.cuny.edu, anton@mips.complang.tuwien.ac.at, nick@kyron.sw.stratus.com, pl@lsi.usp.br
Subject: internals wordset

I think that all reactions to my postings on FIGI-L and comp.lang.forth
are now in. You all showed some interest in working on the internals
wordset (not all of you "volunteered officially"):

Mitch Bradley (Mitch.Bradley@Eng.Sun.COM)

Peter Knaggs
+-----------------------------+-----------------------------------------------+
! School of Comp. & Maths.,   ! Janet: NER034 @ uk.ac.tees-poly               !
! Teesside Polytechnic,       ! Bitnet: NER034 % tp.ac.uk @ UKACRL            !
! Middlesbrough,              ! Internet: NER034 % tp.ac.uk @ cunyvm.cuny.edu !
! Cleveland, England. TS1 3BA ! Uucp: NER034 % tpoly.ac.uk @ ukc.uucp         !
!-----------------------------+-----------------------------------------------!

==========================================================================
Pedro Luis Prospero Sanchez         internet: pl@lsi.usp.br (PREFERRED)
University of Sao Paulo             uunet:    uunet!vme131!pl
Dept. of Electronics Engineering    hepnet:   psanchez@uspif1.hepnet
phone: (055)(11)211-4574            home:     (055)(11)914-9756
fax:   (055)(11)815-4272
==========================================================================

nick@kyron.sw.stratus.com (Nicolas Tamburri)

RAY BROHINSKY encouraged us, but he cannot work on it.

So here's the mailing list:

alias internals Mitch.Bradley@Eng.Sun.COM NER034%tp.ac.uk@cunyvm.cuny.edu pl@lsi.usp.br nick@kyron.sw.stratus.com anton@mips.complang.tuwien.ac.at

I think it is time for a short introduction: I learned Forth in late
1983 (a fig-Forth pretending to be a Forth-79 on a Commodore 64) and
since then I have done some programming in Forth, mostly for fun, but I
could use some of it for the courses I took. What do I do for a living?
I am the software toolsmith of a Viennese VAR.

So what are we going to do?
1) collect ideas

I have forwarded or will forward to you a paper by Mitch Bradley, which
he wrote for the 1985 FORML (I think) and which contains some ideas.

I see the internals wordset divided into two parts:

a) defining access to the system's internal data structures

This should be quite straightforward. The data structures are:

   The Forth word (the general parts)
   word lists
   Some of the things with mysterious stack pictures in the ANSI standard
   Specialized parts of words, especially for:
      Colon definitions
      Words defined with CREATE/DOES>
      vocabularies

What did I miss?

b) defining hooks where user code can be brought into the system,
   e.g. for tracing

This seems much harder, because I don't know when it's complete. Also,
defining hooks may force some structure onto the Forth system.

As test cases for the completeness of the word set some applications
should be implemented. What comes to my mind immediately is a debugger
and WORDS.

2) transform the ideas into standard-like wording

Also, we need to decide which ideas we want in the wordset and which of
the words are extensions. In my opinion most of the word set should be
easy to implement on "conventional" (i.e. threaded, fig-like) Forth
systems. It might be hard to implement on a system that produces native
code.

3) The resulting text must be marketed

The simplest way would be to get the internals word set into the ANSI
standard through the review process. If we cannot achieve this,
marketing will cost more effort. In this case I think we should publish
the wordset in the Forth journals, develop implementations of it for a
few popular Forth systems, and write some really neat applications that
everyone wants to have, so there's an incentive for implementing the
word set on other systems. The applications would be the hardest part,
but I think they would be fun.

From Mitch.Bradley@Eng.Sun.COM Wed Oct 2 00:47:14 1991
Message-Id: <9110012346.AA08358@mitch.Eng.Sun.COM>
To: anton@mips.complang.tuwien.ac.at
Subject: Re: internals wordset
Date: 01 Oct 91 16:46:05 PDT (Tue)
From: Mitch.Bradley@Eng.Sun.COM

Here is a proposed wordlist for vocabulary hacking, very similar to what
I use now in Open Boot.

Data types:

wid      - ANS Forth wordlist id
nid      - "name" id ("handle" for a word name)
xt       - execution token
adr len  - string

immediate?   ( xt -- flag )       True if word is immediate

create-word  ( adr len wid -- )   Create the named word in the
                                  vocabulary "xt"

remove-word  ( nid wid -- )       Remove the name nid from the
                                  wordlist wid.

name>string  ( nid -- adr len )   Return string representation of name

>name        ( xt -- nid )        Return a name of the word xt, or 0 if
                                  that word has no names.

name>        ( nid -- xt )        Return the execution token of the
                                  name nid.

next-word    ( nid1 wid -- nid2 ) nid2 is the name preceding nid1 in
                                  the wordlist "wid". If nid1 is 0, nid2
                                  is the first word in wid. If nid2 is
                                  0, there are no more words in wid.

Note that nid's and xt's have to be separate, because many systems have
alias mechanisms and headerless words, thus the mapping between names
and execution tokens is not one-to-one. Every name has exactly one
execution token, but a particular execution token may have zero, one,
or several names.
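
The traversal protocol of next-word (pass 0 to get the first name, stop
when 0 comes back) can be tried out on a toy dictionary. What follows
is only a sketch in standard Forth, not how any real system stores its
headers: the record layout (link cell, xt cell, counted string), the
helper add-name, the variable toy-wid and the word list-names are all
assumptions made up for the illustration. Only the names and stack
effects of next-word, name>string and name> come from the list above.

```forth
\ Toy model only: a wordlist is one cell holding the newest name record.
\ Each record: link cell, xt cell, counted string.  (Hypothetical layout.)
\ These definitions shadow same-named system words, if any exist.
variable toy-wid   0 toy-wid !

: add-name ( xt c-addr u wid -- )       \ hypothetical helper
   align here >r                        \ R: nid of the new record
   dup @ ,                              \ link field: previous newest name
   r> swap !                            \ the wordlist now starts here
   rot ,                                \ xt field
   dup c,  here swap dup allot move ;   \ name as a counted string

: name>string ( nid -- c-addr u )  2 cells + count ;
: name>       ( nid -- xt )        cell+ @ ;

: next-word ( nid1 wid -- nid2 )        \ 0 in: first name; 0 out: done
   over if  drop @  else  nip @  then ;

: list-names ( wid -- )                 \ a WORDS for the toy wordlist
   0                                    \ 0 asks for the first name
   begin  over next-word  dup while
      dup name>string type space
   repeat  2drop ;

' dup  s" dup"  toy-wid add-name
' swap s" swap" toy-wid add-name
' dup  s" copy" toy-wid add-name        \ second name for the same xt

toy-wid list-names                      \ newest first: copy swap dup
```

The third add-name line gives one execution token two names, which is
the case that makes a separate nid necessary: name> maps both records
to the xt of dup, but nothing can map that xt back to a single name.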
Mitch.Bradley@Eng.Sun.COM

From NER034@prime-a.tees-poly.ac.uk Thu Oct 10 13:54:11 1991
Message-Id: <9110101150.AA24099@mcsun.EU.net>
Date: Thu, 10 Oct 91 10:29:38 BST
From: NER034@prime-a.tees-poly.ac.uk
To: ANTON
Subject: Re: Internals Wordset

> Most of your renamings are good

Gee, thanks.

> except for .ID . .ID already exists in a lot of systems, and it usually
> takes a link field address. It is better to use a different name for a
> different word. Instead of C.ID, I now use .NAME .

As I said before, I reckon that .ID is as near to the old .ID as we are
going to be able to get. However, I don't see any objection to calling
it .NAME . Indeed it is probably the only way it will get through the
TSC.

> DOES? is used in a decompiler. Inside the defining word, some systems
> compile the same run-time tokens for DOES> and ;CODE .

Oh, I never said that I don't understand why you would want the word. I
just don't think that I would ever use it. I am not of the opinion that
all Forth words should be de-compilable. Indeed, in a system that
provides fast and efficient compiled code this would be impossible. But
then, would you implement this wordset for such a system?

> FOLLOW and ANOTHER? are convenient to use, but they have a theoretical
> problem. They maintain "hidden" information about the state of the
> search. This causes problems with multitasking, reentrancy, and
> nestability.

Good point. May I suggest the following definitions:

FOLLOW      ( wid -- state )
        Initializes system dependent values (state) in preparation for
        scanning the given wordlist (wid).
        The system dependent values (state) may be any number of cells
        in length. FOLLOW is used in conjunction with ANOTHER? and
        UNFOLLOW.

        Example usage: See ANOTHER?
        See also: ANOTHER?; UNFOLLOW.

ANOTHER?    ( state -- state' xt true )             "Another Query"
            (       -- false )
        Extracts the execution token of the next word in the wordlist
        being scanned (indicated by state, as initialized by FOLLOW).
        A true is returned along with the next execution token. The
        system dependent values incorporated in state are modified to
        reflect the current position in the scanning of the wordlist.
        A false indicates the end of the wordlist.

        Example usage:

        : WORDS ( -- )
           CONTEXT @ FOLLOW        \ Follow the current wordlist
           BEGIN
              ANOTHER?             \ Get next xt
           WHILE                   \ For all of the wordlist or until
                                   \ the user presses a key
              .NAME SPACE          \ Display the name of xt
              KEY? IF              \ Has the user pressed a key
                 UNFOLLOW EXIT     \ Yes => Drop search state and leave
              THEN
           REPEAT ;

        See also: FOLLOW; UNFOLLOW.

UNFOLLOW    ( state -- )
        Used to remove the system dependent values (state) from the
        stack after aborting an ANOTHER? based loop before its natural
        termination.

        Example usage: See ANOTHER?
        See also: FOLLOW; ANOTHER?.

> Also, ANOTHER? should not return an "xt" because some systems have
> a low-level alias mechanism, and the mapping from names to "xt" may
> be many-to-one. If ANOTHER? returns "xt", it will not be able to
> distinguish the names.

This is an interesting point. However, if we were to introduce a new id
(namely "name-id" or "nid") we would also have to provide a way of
converting between nid and xt. The possibility of getting this through
the TSC will be pretty remote in my view.

I am prepared to accept that aliases may exist; however, for the
purposes of displaying a name, the name associated with the original
definition should be used.

: FU FOO ;                     \ A standard way of aliasing.
ALIAS FOO BAR                  \ Define a "low level" alias.
' FU .NAME ( gives ) FU        \ Display the name of FU; as this is a colon
                               \ definition, FU is displayed and not FOO.
' BAR .NAME ( gives ) FOO      \ As this is a "low level" alias, the name
                               \ associated with the colon definition is
                               \ given, i.e., FOO.

This will not allow us to reconstruct the original definition when such
aliases have been used. However, the definition we create will be
functionally the same, as we have simply resolved the aliasing. I
consider this to be a pain, but I don't see any way around it.

Thus the new definition of .ID reads:

.NAME       ( xt -- )                               "Dot Name"
        Displays the name of the word associated with the given
        execution token (xt).

        If the full name of the word has not been stored then an
        approximation is required. I.e., if the word INTEGER is defined
        in a system that only stores the first three letters then
        INT---- is an acceptable display.

        If the word has been defined as headerless then the name must
        take on the form H-nnnn where the H indicates that the word is
        headerless, and the nnnn is a representation of the execution
        token.

        If there are several names associated with the single execution
        token (i.e., the name has aliases) then the name associated with
        the original definition is displayed.

Note that the only changes are the name of the word (changed from .ID
to .NAME) and the addition of the last paragraph to cater for aliasing.

> Mitch

I see that you make no comment on the other alterations and new words
that I have added to your list. I therefore assume (a) you agree with
them, or (b) they are to be the subject of another mailing.

Peter Knaggs
+-----------------------------+-----------------------------------------------+
! School of Comp. & Maths.,   ! Janet: NER034 @ uk.ac.tees-poly               !
! Teesside Polytechnic,       ! Bitnet: NER034 % tp.ac.uk @ UKACRL            !
! Middlesbrough,              ! Internet: NER034 % tp.ac.uk @ cunyvm.cuny.edu !
! Cleveland, England. TS1 3BA ! Uucp: NER034 % tpoly.ac.uk @ ukc.uucp         !
!-----------------------------+-----------------------------------------------!
! It is not enough to do the right thing; one must also do it the right way.  !
+-----------------------------------------------------------------------------+

From nick@kyron.sw.stratus.com Fri Oct 11 15:36:02 1991
Date: Fri, 11 Oct 91 10:36:09 EDT
From: nick@kyron.sw.stratus.com (Nicolas Tamburri)
Message-Id: <9110111436.AA04757@kyron.sw.stratus.com.sw.stratus.com>
To: NER034@prime-a.tees-poly.ac.uk
Subject: comments on wordset
Cc: Mitch.Bradley@Eng.Sun.COM, NER034%tp.ac.uk@cunyvm.cuny.edu, anton@mips.complang.tuwien.ac.at, nick@kyron.sw.stratus.com, pl@lsi.usp.br

Sorry it's taken so long to respond. I've been trying to reconcile your
wordset with the way I've been thinking about this problem, as I began
outlining in a post to c.l.f. Since I've not received any comments on my
post, I assume that it wasn't well received as a starting point for
discussion. Nevertheless, my mind is set to that way of thinking and,
involuntarily, I've been trying to compare the 2 schemes to see which
has more potential for problems.

Some general comments:

Our basic differences stem from the fact that your scheme (and Mitch's)
relies on knowing about addresses, whereas mine tries to avoid native
machine addresses and relies on Dictionary Tokens (DT) for passing
header information. Generally speaking this is only a matter of
semantics, and in most cases the DT would be kept as an address anyway.
Conceptually, however, freeing the vendor from having to supply me with
addresses removes a limit which may be important in the future. This is
important IMO. I believe (but will go along with the majority opinion on
this) that we are not abstracting data enough.
That was the intent of the project, was it not?

Below are some specific comments. They assume we wish to keep this
addressing scheme.

>TOKEN@      ( a.addr -- xt )                            "Token Fetch"
>       xt is the system dependent execution token stored at the
>       aligned address a.addr. The execution token is of a form that
>       can be passed to EXECUTE. An ambiguous condition exists when
>       a.addr does not reference an execution token.

Why would I ever want to use this, when LOCATE not only returns me the
xt, but tells me if it is valid as well? (At least I assume it does from
its description.) Is it ever possible for me to test an xt to see if it
is valid?

>+TOKEN      ( a.addr1 -- a.addr2 )                      "Plus Token"
>       Moves the given aligned address a.addr1 past the token stored
>       at a.addr1 to the address of the next token (a.addr2). An
>       ambiguous condition exists if a.addr1 does not point to an
>       execution token.
>

I assume this also does +STRs as appropriate. What happens if a.addr1
points to the last xt of a definition? I think it should return 0.

>Notes: The definition of TOKEN@ and TOKEN! are more or less the same as
>       Mitch's. However I believe that /TOKEN can not be used as some
>       systems (subroutine threaded) may use differing sizes of token,
>       dependent on circumstances. Therefore the compromise word
>       +TOKEN is given to counter this possibility. I.e., to read two
>       tokens you would write:
>            DUP TOKEN@ SWAP +TOKEN TOKEN@
>       as opposed to:
>            DUP TOKEN@ SWAP /TOKEN + TOKEN@

I agree. In general, I'd rather have the system do the pointer
arithmetic when it comes to working with internal structures.

>       In the same respect I also believe that a.addr should point to
>       the start of the token. Hence on a subroutine threaded system
>       a.addr will point to the subroutine call instruction.

Don't pin it down. a.addr is an address which TOKEN@ uses to return an
xt. I should not have to care about what it points to.

>>TARGET     ( a.addr1 -- a.addr2 )                       "To Target"
>       a.addr2 is the destination address of the branch instruction
>       located at a.addr1. a.addr1 is the address of the branch
>       instruction and not its operand. An ambiguous condition exists
>       if the instruction pointed to by a.addr1 is not a branch
>       instruction.

This is where a totally address-less scheme such as mine breaks down. In
this case, you really do need an address. At least, I can't think of a
way to avoid them.

>BRANCH?     ( a.addr -- flag )                        "Branch Query"
>       Returns True if the branch instruction at a.addr is an
>       unconditional branch, and False if it is a conditional branch.
>       An ambiguous condition exists if the instruction pointed to by
>       a.addr is not a branch instruction.

How do you find out if it is a branch instruction in the first place?
How about: returns 0 if non-branch, negative if conditional branch and
positive if unconditional branch? Branch instructions include LOOP and
friends as well, I assume.

>STR@        ( c.addr1 -- c.addr2 )                    "String Fetch"
>       Fetches the string literal compiled at the given c.addr1. A
>       copy of the counted string is made available at c.addr2.

A copy of the counted string? So this does not work the way COUNT does,
which merely adjusts the input address to point to the beginning of the
string. The problem with this is that it implies that the system
allocates space at c.addr2 to store the string into. This has 2
problems:

1. I don't see any word that deallocates the space.
2. If the user has a unique memory management mechanism, a hidden and
   uncontrollable call to ALLOCATE by the system would not be
   appreciated.

Solution:

STR@ ( c.addr1 length c.addr2 -- actualLength )
        Move the string from c.addr1 to c.addr2. Do not exceed the
        provided length count if the string is longer than provided
        for. Return the actual length of the string.

BTW: If the STR operators are to be used only in conjunction with string
literals, meaning they are compiled by " , ." etc.
then I believe the input address should always point to the xt of the
runtime components of the aforementioned string literal handlers.

>>DATA       ( xt -- a.addr )                               "To Data"
>       a.addr is the address of the data storage area associated with
>       the given execution token (xt). I.e., for variables and user
>       created items, >DATA is equivalent to >BODY, for user variables
>       >DATA returns an address in the user area, etc.

It might be useful to also return the length of the data area in bytes.

>.ID         ( xt -- )                                      "Dot I D"
>       Displays the name of the word associated with the given
>       execution token (xt).

I think we need an analogous way to retrieve the name as a string,
rather than something that types the name out. This allows things like
debuggers to work, as well as friendly tools like command line name
completers.

>LITERAL@    ( a.addr1 -- a.addr2 x )                 "Literal Fetch"
>       Reads the value compiled by a literal instruction at a.addr1.
>       The address of the next instruction is returned (a.addr2) in
>       addition to the value of the literal instruction. An ambiguous
>       condition exists if a.addr1 does not point to a literal
>       instruction.

This seems unnecessarily complex. It seems to me that this can easily be
implemented as >DATA +TOKEN. Maybe we should have a LITERAL? word, which
specifies that this is a word which is followed by inline data.

[I now switch to your response to Mitch's comments.]

>FOLLOW      ( wid -- state )
>       Initializes system dependent values (state) in preparation for
>       scanning the given wordlist (wid). The system dependent values
>       (state) may be any number of cells in length. FOLLOW is used in
>       conjunction with ANOTHER?, and UNFOLLOW.

This has the same problems I mentioned above for STR@ . I suggest a
similar solution. UNFOLLOW does provide the deallocation mechanism, but
the problem of FOLLOW possibly messing up a user's memory management
routines still exists. Let the user do the allocation.

>> Also, ANOTHER?
>> should not return an "xt" because some systems have
>> a low-level alias mechanism, and the mapping from names to "xt" may
>> be many-to-one. If ANOTHER? returns "xt", it will not be able to
>> distinguish the names.
>
>This is an interesting point. However, if we were to introduce a new id
>(namely "name-id" or "nid") we would also have to provide a way of
>converting between nid and xt. The possibility of getting this through
>the TSC will be pretty remote in my view.

You have a good point, but I agree with Mitch. For some, the use of
aliases is so pervasive that they may forget they are using them. A
decompilation to the original code may simply confuse them, or make them
think the decompiler has a bug in it.

>This will not allow us to reconstruct the original definition when such
>aliases have been used. However, the definition we create will be
>functionally the same, as we have simply resolved the aliasing. I
>consider this to be a pain, but I don't see any way around it.

The way around it is to use DTs instead of XTs. For most of the words we
are talking about, this would work just fine.

>Thus the new definition of .ID reads:
>
>.NAME       ( xt -- )                               "Dot Name"
>       Displays the name of the word associated with the given
>       execution token (xt).
>
>       If the full name of the word has not been stored then an
>       approximation is required. I.e., if the word INTEGER is defined
>       in a system that only stores the first three letters then
>       INT---- is an acceptable display.
>
>       If the word has been defined as headerless then the name must
>       take on the form H-nnnn where the H indicates that the word is
>       headerless, and the nnnn is a representation of the execution
>       token.

Last call for a NAME@ defined as:

NAME@ ( xt length c.addr -- )
        works just like my definition for STR@, but is specific to
        working with XTs. (Works even better with DTs.)
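
To make the "let the user do the allocation" convention concrete, here
is a minimal sketch in standard Forth of a STR@ with the stack effect
suggested above. It is only an assumption of how such a word might
behave: it treats c.addr1 as the address of an ordinary counted string,
whereas a real system would have to locate the compiled string literal
first. The buffers src and dst exist only for the demonstration.

```forth
\ Sketch only: copy a counted string into a caller-supplied buffer.
\ At most u characters are copied; the full string length is returned,
\ so the caller can detect truncation (actual > u).
: str@ ( c-addr1 u c-addr2 -- actual )
   rot count            ( u c-addr2 src actual )
   dup >r               \ remember the real length
   2swap                ( src actual u c-addr2 )
   >r min               ( src n )   \ n = characters that fit
   r> swap move         \ MOVE takes ( src dst n )
   r> ;

\ Demonstration data (hypothetical): the counted string "hello"
\ and a deliberately short three-character buffer.
create src  5 c,  char h c, char e c, char l c, char l c, char o c,
create dst  3 chars allot

src 3 dst str@ .        \ prints 5, the actual length
dst 3 type              \ prints hel, the part that fitted
```

Because the caller owns dst, no hidden allocation happens and nothing
needs to be released afterwards. A NAME@ along the same lines would
differ only in how it finds the source string from the xt (or DT).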

>       If there are several names associated with the single execution
>       token (i.e., the name has aliases) then the name associated with
>       the original definition is displayed.

Too restrictive. If some enterprising vendor can figure out how to
return me an alias, then don't prevent it.

Summary:

Not having attempted to write anything with this word-set, I believe I
can live with it. At the very least, I believe it is a good start. I
believe that it would be better if we implemented most words which work
with names to use DTs, and use XTs when traversing through the parameter
list of a word definition. Would anyone be interested if I try to come
up with such a word-set, or is this direction OK with everyone?

Food for thought:

Are we concentrating at too low a level here? It seems to me that the
above word-set has been defined to handle the most common features of
current Forths. Yet, we are defining a new word set, possibly for a
standard. Can't we expect changes to current Forths to support this
word-set? Specifically, what I would love to have, to really do this
right, is a hook into the dictionary header which will allow me to
vector a data return routine. This way, if I have a complex data
structure, I can call a routine which will return the value of each
field in sequence, so I can type them out. I know this is of limited
value in a production environment, but everything we are talking about
here is geared toward the development environment. The finished program
would not have to include this extra code of course.

The words presented here don't break existing code, and don't seem to be
too hard to implement on most of the systems I've seen. IMO a common
word set like this would benefit all vendors because it would free them
from having to develop their own environment, or sell a proprietary
environment for other vendors' platforms. Would it be too much to expect
rudimentary support for this word-set from any vendor who wants these
benefits? (Mitch?)

Comments are welcome of course...

/nt

From pl@lsi2 Fri Oct 11 19:19:21 1991
Date: Fri, 11 Oct 91 15:13:11-030
From: pl@lsi2 (Pedro Sanchez)
Message-Id: <9110111813.AA02961@lsi2>
To: anton@mips.complang.tuwien.ac.at
Subject: A discussion list for Internals Wordset?

Hi Anton,

I think that a discussion list on the Internals wordset subject would be
a good idea. At least everybody would receive all the messages and could
follow the thread (now I can't, because I do not receive all messages).

Since you started the discussion on the subject, I am asking you what
you think about this.

I volunteer to provide the resources and to take care of the list.

Regards,

Pedro.

==========================================================================
Pedro Luis Prospero Sanchez         internet: pl@lsi.usp.br (PREFERRED)
University of Sao Paulo             uunet:    uunet!vme131!pl
Dept. of Electronics Engineering    hepnet:   psanchez@uspif1.hepnet
phone: (055)(11)211-4574            home:     (055)(11)914-9756
fax:   (055)(11)815-4272
==========================================================================

From anton Tue Oct 15 20:45:04 1991
Date: Tue, 15 Oct 91 20:41:31 +0100
From: anton (Anton Martin Ertl)
To: Mitch.Bradley@Eng.Sun.COM, NER034%tp.ac.uk@cunyvm.cuny.edu, anton@mips.complang.tuwien.ac.at, nick@kyron.sw.stratus.com, pl@lsi.usp.br
Subject: Re: Internals wordset

Sorry for the late reply; I have not had much time lately.

In <9110012346.AA08358@mitch.Eng.Sun.COM> Mitch writes
>immediate? ( xt -- flag )     True if word is immediate

Shouldn't it be ( nid -- flag )?

>create-word ( adr len wid -- )  Create the named word in the vocabulary "xt"

How do xt's and immediate come in?
Note that this word suggests, but does not force, that a word (nid) is
in only one word list (no objection, just to make you aware).

>name>string ( nid -- adr len )  Return string representation of name

I think that we have to restrict this word to allow implementation on
all systems, i.e. restrict the lifetime of the string returned.

>next-word ( nid1 wid -- nid2 )  nid2 is the name preceding nid1 in
>                                the wordlist "wid". If nid1 is 0, nid2
>                                is the first word in wid. If nid2 is 0,
>                                there are no more words in wid.

Some Forth implementations might be very unhappy with this stack effect:
if the same nid can appear twice in a wordlist, ( nid wid ) does not
identify the current position in the word list. This stack effect can
also have a bad effect on efficiency.

A remedy would be to use a word-list position, resulting in a stack
effect like ( wlpos1 -- wlpos2 nid ). This word-list position is Peter
Knaggs' state in the FOLLOW/ANOTHER?/UNFOLLOW words. Another word to get
the first word-list position from the word list would be required
(unless the wid is the first word-list position), e.g. Peter's FOLLOW.

I think the size of the word-list position should be defined (unlike
state) to make it usable. A one-cell wlpos might be impossible to
implement without changing the internal structure of some
implementations. Is two-cell acceptable?
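
As a rough illustration of why two cells may be needed, consider a toy
wordlist that is split into several hash threads. The sketch below is
standard Forth but entirely hypothetical: the thread table, the record
layout (link cell, counted string), add-name and list-all are invented
for the example, and FOLLOW here takes no wid because the model has only
one wordlist. The position is the pair ( thread# nid ), and ANOTHER?
uses Peter's flag convention to signal the end.

```forth
\ Toy model only: one wordlist made of 2 hash threads.
\ A word-list position (wlpos) is two cells: thread number and next nid.
2 constant #threads
create threads  #threads cells allot
threads #threads cells erase            \ all threads start empty

: add-name ( c-addr u thread# -- )      \ hypothetical helper
   cells threads + >r                   \ R: address of the thread head
   align here  r@ @ ,  r> !             \ link to old head, become head
   dup c,  here swap dup allot move ;   \ name as a counted string

: follow ( -- thread# nid )             \ first position in the wordlist
   0 threads @ ;

: another? ( thread# nid -- thread#' nid' nid true | false )
   begin  dup 0= while                  \ this thread is exhausted
      drop 1+                           \ move on to the next thread
      dup #threads = if  drop false exit  then
      dup cells threads + @
   repeat
   dup @ swap true ;                    \ new position, found nid, flag

: list-all ( -- )
   follow
   begin  another? while
      cell+ count type space            \ print the name of this nid
   repeat ;

s" dup"  0 add-name
s" swap" 1 add-name
s" drop" 0 add-name
list-all
```

With a single linked list the thread number is not needed and one cell
suffices; the pair is what lets a hashed dictionary resume a traversal
without hidden state.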

(Note that I have not read Peter's message thoroughly and I don't know
what he replies to.)

From mips!anton@relay.EU.net Wed Oct 30 20:20:11 1991
Date: Wed, 30 Oct 91 19:20:12 +0100
From: mips!anton@relay.EU.net (Anton Martin Ertl)
Message-Id: <9110301820.AA00291@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: decompilation and tracing

Nick writes on c.l.f (message <8570@lectroid.sw.stratus.com>):
>Mach2 for the Mac is a JSR threaded Forth which allows inline machine
>code. Its debugger seems to do a pretty good job of decompiling words,
>JSR calls and machine code. If the machine code has been generated by
>inline expansion of a Forth word, it manages to identify the word that
>did it in all of the examples I've seen. (There may be instances where
>it cannot.) User code is simply displayed as assembler code, and/or
>hex values.
>
>Is there any language implementation/processor architecture which would
>specifically prevent this type of functionality?

There are optimizations which are impossible to undo (information loss).
Of course additional information can be stored and used for decompiling.

For debugging (i.e.
stepping and tracing) the problem is worse: you not only want to see the
word decompiled, but also want it to execute piecewise and see the stack
effect. A good optimizer can do the following transformations, which
make debugging difficult:

   rearrange actions (e.g. instruction scheduling, or loop fusion)

   combine several actions (words) into one instruction (instruction
   selection, peephole optimization)

   distribute an action into several instructions, which can then be
   rearranged with other instructions (inlining)

   optimize code away (constant folding, induction variable elimination)

BTW, undoing inlining (in Mach2) is quite a feat. I would not have tried
it.

Fortunately there are (to my knowledge) no Forth systems with high
levels of optimization. They all do a bit of inlining and some peephole
optimizations (to reduce pushing and popping). However, since we do not
want to prevent systems with such optimizers, we have to take them into
consideration.

The basic question is: how much correspondence with the source do we
want? This is an issue even on a completely conventional (e.g. fig-)
Forth. They all have immediate words which compile to something
different (or nothing at all, e.g. '('). Is it satisfactory to decompile
into Forth code that is equivalent (on the specific system) to the
original code? If yes, a decompiler can be done on any system. If the
user wants better source-decompilation correspondence, he can turn the
optimizer down.

The usual approach in other languages is to keep additional information
(making the compiled program twice as big) and to display the source
code (stored in the source files, not in the executable). Most compilers
disallow debugging optimized code; those that allow it have reduced
functionality (e.g. they are unable to display the values of variables
that are optimized away). Also, some optimizations are turned off by
debugging anyway. You have to turn on compilation for debugging
explicitly.
The situation in Forth is harder because of the finer granularity (word
instead of statement).

- anton

From mips!anton@relay.EU.net Wed Oct 30 20:20:32 1991
Date: Wed, 30 Oct 91 19:22:51 +0100
From: mips!anton@relay.EU.net (Anton Martin Ertl)
Message-Id: <9110301822.AA00297@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: framework for "vocabulary hacking"

%I hope you can process Latex; If you want a postscript version, mail me.
\documentstyle[12pt]{article}
\newcommand{\function}[4]{ \item[#1]\hfill{#2} {#3} {#4}}
\newcommand{\forth}[1]{{\tt #1}}
\newcommand{\stack}[1]{{\it #1}}
\newcommand{\stackcomment}[2]{(\stack{#1} --- \stack{#2})}
\newcommand{\word}[3]{\forth{#1} \stackcomment{#2}{#3}}
\newenvironment{functions}{\begin{description}\setlength{\parskip}{0mm}}{\end{description}}

\title{Internals Wordset Framework (draft)}
\author{M. Anton Ertl}
\date{October 25, 1991}

\begin{document}
\maketitle

This document lists functions that might be included in the internals
wordset. This listing is not yet complete. Since the discussion
currently revolves around wordlists and words in general, I have
attacked these themes first.
For completeness and as food for thought I have also listed functions that I would not provide for in the word set. If you can think of any functions that I have missed, you are welcome to add them. I have entered the words you or the ANSI TC have proposed. If I forgot anything, please inform me.

Note that there need not be a 1:1 relation between functions and words.

The functions are classified with respect to implementability:
\begin{functions}
\item[1] The function is already performed by some word in the standard (i.e. the function is a factor of the word). The word is indicated in parentheses.
\item[2] Functions that can be implemented on current Forth systems.
\item[3] Functions that can be implemented but require changing data structures of existing Forth systems.
\end{functions}
Of course my classifications may be wrong. Note that a word performing a function can be in a higher class than the function.

The entries have the format:
\begin{functions}
\function {function} {class} {comments} {proposed word(s) [origin]}
\end{functions}

\section{Wordlists}
\begin{functions}
\function {create a wordlist} {1 (\forth{WORDLIST})} {}
 {\word{WORDLIST}{}{wid} \cite[15.1.2460]{basis15}}
\function {insert a word into a wordlist} {1 (\forth{CREATE})} {}
 {\word{create-word}{addr len wid}{} \cite{mb91b} (see also: create name)}
\function {delete a word from a wordlist} {2} {}
 {\word{remove-word}{nid wid}{} \cite{mb91b}}
\function {change word in a wordlist} {3} {} {}
\function {enumerate the words in a wordlist/traverse the wordlist} {1 (\forth{WORDS})} {}
 {\word{FOLLOW}{wid}{}, \word{ANOTHER?}{}{nid true {\rm or} false} \cite{mb85};
 \word{FOLLOW}{wid}{wordlist-pos},
 \word{ANOTHER?}{wordlist-pos1}{wordlist-pos2 nid true {\rm or} false},
 \word{UNFOLLOW}{wordlist-pos}{} \cite{pk91a}
 I have renamed the stack items for clarity (whether we unify \stack{xt} and \stack{nid} is a separate issue).
 \word{next-word}{nid1 wid}{nid2} \cite{mb91a}}
\end{functions}

\section{Name (nid)}
standard-visible data: name string, immediate flag, associated execution token (xt)
\begin{functions}
\function {create name} {1 (\forth{CREATE} etc.)} {}
 {\word{create-word}{addr len wid}{} \cite{mb91b} (see also: insert a word into a wordlist)}
\function {get name string} {1 (\forth{WORDS})}
 {the result should be an approximation sufficient for searching}
 {\word{name>string}{nid}{adr len} \cite{mb91b}}
\function {get immediate flag} {1 (outer interpreter/compiler)} {}
 {\word{immediate?}{nid}{flag} \cite{mb91b} (I have changed the stack effect)}
\function {get xt} {1 (\forth{'})} {}
 {\word{name>}{nid}{xt} \cite{mb91b}}
\function {get wordlist} {3} {} {}
\function {get position in word list} {2} {}{}
\function {change name string} {3} {} {}
\function {change immediate flag} {1 (\forth{IMMEDIATE})} {}{}
\function {change xt} {3} {}{}
\end{functions}

\section{Execution token (xt)}
\begin{functions}
\function {create execution token} {1 (\forth{:NONAME})} {}{}
\function {get definition token} {1 (\forth{EXECUTE})}
 {The definition token says what kind of word the xt is (i.e. colon def, var, ...)}
 {\word{DEFINER}{xt1}{xt2} \cite{mb91a}}
\function {check if the word has a name (or if it is headerless)} {3} {}
 {\word{>name}{xt}{nid} \cite{mb91b}}
\function {get the nid if there is one} {2} {}
 {\word{>name}{xt}{nid} \cite{mb91b}}
\function {get the memory block associated with the xt} {address 1-2, size 3}
 {former parameter field}
 {\word{>DATA}{xt}{addr} \cite{mb85} for the address (no size)}
\end{functions}

%for the next time
%there are the following definition words in Basis 15
%2constant, 2variable, :, code, constant, create, fconstant,
%fvariable, marker, value, variable
%(local) is not listed as defining word

\begin{thebibliography}{9}
%may be slightly wrong, I don't have the papers right now
\bibitem{basis15}
ANSI~X3J14 Technical Committee.
\newblock{\em Basis~15}, 1991.
\bibitem{mb85}
Mitch Bradley.
\newblock Self-Understanding Programs.
\newblock {\em FORML Proceedings}, 1985.
\bibitem{mb91a}
Mitch Bradley.
\newblock How to make a portable decompiler.
\newblock Email message 9109050216.AA13686@mitch.Eng.Sun.COM, September 4, 1991.
\bibitem{mb91b}
Mitch Bradley.
%\newblock {\em Re: internals wordset}.
\newblock 9110012346.AA08358@mitch.Eng.Sun.COM internals posting, 1991.
\bibitem{pk91a}
Peter Knaggs.
%\newblock {\em Re: Internals Wordset}.
\newblock internals posting, October 10, 1991.
\end{thebibliography}
\end{document}

From mips!anton@relay.EU.net Wed Oct 30 20:33:56 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00524; Wed, 30 Oct 91 17:07:29 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA04042; Wed, 30 Oct 91 17:07:12 EDT
Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00499; Wed, 30 Oct 91 17:07:21 EDT
Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1) id AA01885; Wed, 30 Oct 91 16:05:51-030
Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP; id AA04978 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:18:28 +0100
Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP; id AA11486 (5.65b+/CAN-1.15); Wed, 30 Oct 91 19:21:20 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA00285; Wed, 30 Oct 91 19:18:15 +0100
Date: Wed, 30 Oct 91 19:18:15 +0100
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Message-Id: <9110301818.AA00285@mips.complang.tuwien.ac.at>
To: internals@lsi17.lsi.usp.br
Subject: internals
Status: R

Peter Knaggs writes (on Oct 10th)
>As I said before, I reckon that .ID is as near to the old .ID that we are
>going to be able to get. However, I don't see any objection in calling
>it .NAME . Indeed it is probably the only way it will get through the TSC.

.NAME should be easy to define with the rest of the internals wordset, e.g.
: .NAME ( nid -- ) NAME>STRING TYPE ;

>Oh, I never said that I don't understand why you would want the word. I
>just don't think that I would ever use it. I am not of the opinion that all
>Forth words should be de-compilable. Indeed in a system that provides fast
>and efficient compiled code this would be impossible. But then, would you
>implement this wordset for such a system ?
>
I would, although what "this" is, is not yet fully determined.

>This is an interesting point. However, if we were to introduce a new id
>(namely "name-id" or "nid") we would also have to provide a way of converting
>between nid and xt. The possibility of getting this through the TSC will be
>pretty remote in my view.

Let us make it right! They will change it anyway. According to Mitch, the chances of an internals wordset passing are pretty small.

> If the full name of the word has not been stored then an
> approximation is required. I.e., if the word INTEGER is defined in
> a system that only stores the first three letters then INT---- is
> an acceptable display.

The name produced by NAME>STRING and/or .NAME should be usable as input for the wordlist searching words.

> If the word has been defined as headerless then the name must
> take on the form H-nnnn where the H indicates that the word is
> headerless, and the nnnn is a representation of the execution
> token.

Make that a suggestion, but not a requirement.
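The requirement that the string from NAME>STRING be usable as input for the wordlist searching words can be turned into a small check. What follows is only a sketch under stated assumptions: it assumes NAME>STRING ( nid -- c-addr u ) as proposed in this thread (Forth-2012 later standardized a word of that name that takes a name token), it uses the search-order word SEARCH-WORDLIST, and the name name-findable? is made up for the illustration.

```forth
\ Sketch only. NAME>STRING ( nid -- c-addr u ) is the word proposed in
\ this thread; name-findable? is a hypothetical helper name.
\ SEARCH-WORDLIST ( c-addr u wid -- 0 | xt 1 | xt -1 )

: name-findable? ( nid wid -- flag )
  >R NAME>STRING R>        \ ( c-addr u wid )
  SEARCH-WORDLIST          \ ( 0 | xt 1 | xt -1 )
  DUP IF NIP THEN          \ drop the xt when something was found
  0<> ;                    \ true if the name string finds a word
```

Note that this only checks that some word is found under the returned string; when names clash it need not be the word the nid belongs to.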
- anton From Mitch.Bradley@Eng.Sun.COM Wed Oct 30 20:37:10 1991 Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA28989; Wed, 30 Oct 91 11:36:52 PST Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA17297; Wed, 30 Oct 91 11:36:06 PST Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1) id AA02889; Wed, 30 Oct 91 11:35:10 PST Message-Id: <9110301935.AA02889@mitch.Eng.Sun.COM> To: anton@mips.complang.tuwien.ac.at Cc: internals@lsi17.lsi.usp.br Subject: Re: internals Date: 30 Oct 91 11:35:09 PST (Wed) From: Mitch.Bradley@Eng.Sun.COM Status: R > For TOKEN@ etc.? Then the DT (= nid?) would have to be in the compiled > code (or the additional info). Would this not be a bit too > expensive (in terms of constraints on the implementation) just for > decompiling aliases well? I am not worried about the problem of properly decompiling the aliases. I am worried about the problem of enumerating the word names in a vocabulary. If you use the XT to refer to a word, you can't enumerate a vocabulary that contains 2 aliases for the same XT, because you can't find the successor of the second such alias. 
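The enumeration problem can be made concrete with a toy model. This is a sketch only, in plain Forth, with a made-up record layout and made-up names (add-name, rec>link, rec>xt, rec>string, build, .names, xt>rec, next-xt); it does not describe the dictionary of any real system.

```forth
\ Hypothetical model of one wordlist, newest record first.
\ Record layout: link | xt | name address | name length
VARIABLE last-name   0 last-name !

: add-name ( xt c-addr u -- )    \ append one name record
  ALIGN HERE >R
  last-name @ ,                  \ link to the previous record
  ROT ,                          \ execution token
  SWAP , ,                       \ name string: address, then length
  R> last-name ! ;

: rec>link   ( nid -- nid'|0 )   @ ;
: rec>xt     ( nid -- xt )       CELL+ @ ;
: rec>string ( nid -- c-addr u ) 2 CELLS + DUP @ SWAP CELL+ @ ;

: build ( -- )                   \ SWAP, DUP, and an alias CLONE for DUP
  ['] SWAP S" SWAP"  add-name
  ['] DUP  S" DUP"   add-name
  ['] DUP  S" CLONE" add-name ;
build

: .names ( -- )                  \ enumerate by nid: each name appears once
  last-name @
  BEGIN ?DUP WHILE
    DUP rec>string TYPE SPACE  rec>link
  REPEAT ;

: xt>rec ( xt -- nid|0 )         \ newest record that carries this xt
  last-name @
  BEGIN DUP WHILE
    2DUP rec>xt = IF NIP EXIT THEN
    rec>link
  REPEAT NIP ;

: next-xt ( xt -- xt'|0 )        \ "successor" when only the xt is known
  xt>rec DUP IF rec>link THEN
  DUP IF rec>xt THEN ;
```

In this model .names walks the records and shows all three names, while ' DUP next-xt yields the xt of DUP again: the step from the alias CLONE lands on DUP, whose xt is the same, so a loop that only carries the xt never gets to SWAP.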
Mitch.Bradley@Eng.Sun.COM From mips!anton@relay.EU.net Wed Oct 30 20:44:21 1991 Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00585; Wed, 30 Oct 91 17:17:51 EDT Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA04084; Wed, 30 Oct 91 17:17:35 EDT Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00582; Wed, 30 Oct 91 17:17:45 EDT Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1) id AA01902; Wed, 30 Oct 91 16:16:34-030 Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP; id AA07879 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:57:33 +0100 Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP; id AA12187 (5.65b+/CAN-1.15); Wed, 30 Oct 91 20:00:25 +0100 Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA00294; Wed, 30 Oct 91 19:20:49 +0100 Date: Wed, 30 Oct 91 19:20:49 +0100 From: mips!anton@relay.EU.net (Anton Martin Ertl) Message-Id: <9110301820.AA00294@mips.complang.tuwien.ac.at> To: internals@lsi17.lsi.usp.br Subject: xt vs. nid Status: R Should wordlist-position, name-id and execution token be kept separate, or should some or all of these concepts be unified? In many existing systems these concepts are related to each other in a 1:1 fashion (except for headerless words), so they could be unified (i.e. use the xt for all purposes). If we unify them, or at least some of them, as Peter Knaggs has suggested, we lose functionality on systems that support many:1 relationships. E.g., implementing WORDS with nid replaced by xt gives unexpected results for aliases. On the other hand, keeping them separate makes words for conversion necessary. Also, we have to take care not to constrain the implementations in either way. Currently I prefer keeping them separate. It gives better functionality and seems cleaner. BTW, what about multiple code fields? Are they in use? Should we consider them? 
- anton From Mitch.Bradley@Eng.Sun.COM Wed Oct 30 21:25:39 1991 Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00684; Wed, 30 Oct 91 17:46:39 EDT Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA04093; Wed, 30 Oct 91 17:46:23 EDT Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00681; Wed, 30 Oct 91 17:46:34 EDT Received: from Sun.COM by lsi17.lsi.usp.br (4.0/SMI-4.1) id AA01980; Wed, 30 Oct 91 16:43:40-030 Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA28335; Wed, 30 Oct 91 11:33:15 PST Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA17072; Wed, 30 Oct 91 11:32:28 PST Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1) id AA02880; Wed, 30 Oct 91 11:31:31 PST Message-Id: <9110301931.AA02880@mitch.Eng.Sun.COM> To: mips!anton@relay.EU.net Cc: internals@lsi17.lsi.usp.br Subject: Re: xt vs. nid Date: 30 Oct 91 11:31:30 PST (Wed) From: Mitch.Bradley@Eng.Sun.COM Status: R > Should wordlist-position, name-id and execution token be kept > separate, or should some or all of these concepts be unified? I for one don't want to unify them because that doesn't work on my system, which has first-class aliases and headerless words. 
Mitch.Bradley@Eng.Sun.COM From mips!anton@relay.EU.net Thu Oct 31 13:49:59 1991 Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00578; Wed, 30 Oct 91 17:16:18 EDT Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA04079; Wed, 30 Oct 91 17:16:03 EDT Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00575; Wed, 30 Oct 91 17:16:13 EDT Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1) id AA01897; Wed, 30 Oct 91 16:12:48-030 Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP; id AA06194 (5.65a/CWI-2.120); Wed, 30 Oct 1991 19:34:28 +0100 Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP; id AA11499 (5.65b+/CAN-1.15); Wed, 30 Oct 91 19:21:54 +0100 Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA00288; Wed, 30 Oct 91 19:18:49 +0100 Date: Wed, 30 Oct 91 19:18:49 +0100 From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl) Message-Id: <9110301818.AA00288@mips.complang.tuwien.ac.at> To: internals@lsi17.lsi.usp.br Subject: internals Status: R In <9110111436.AA04757@kyron.sw.stratus.com.sw.stratus.com> Nick writes (in reply to Peter Knaggs): >Since I've not received any comments >on my post, I assume that it wasn't well received as a starting point for >discussion. Usually your posts are so neat that I have nothing to add. >I believe, >(but will go along with the majority opinion on this,) that we are not >abstracting data enough. That was the intent of project, was it not? My intent was to make certain programming techniques standard Forth. Abstraction is the means. I, too, see a need to be more abstract than Mitch's FORML paper to ensure implementability on native code systems. Below are some specific comments. It assumes we wish to keep this addressing scheme. >> In the same respect I also beleve that a.addr should point to the >> start of the token. Hence on a subroutine threded system a.addr >> will point to the subroiutne call instruction. 
>
>Don't pin point it down. a.addr returns an address which TOKEN@
>uses to return an xt. I should not have to care about what it points to.

Or if it is a pointer at all: In native code one instruction can represent several words.

>STR@ ( c.addr1 length c.addr2 -- actualLength )
> Move the string from c.addr1 to c.addr2. Do not exceed the provided
> length count if the string is longer than provided for. Return the
> actual length of the string.

Why move it at all? Leave it in place, the user can copy it if (s)he really wants to. Or are there address space problems (8088 et al.)? Then copy it into a buffer (only one string at a time). Not good, but better than having to allocate space for something I get, use and immediately throw away. If you really must have preallocation, divide the word in two: one for getting the length (for allocating), the other for copying the string. BTW, uses of ALLOCATE should be fully transparent, so what's your problem with it (apart from deallocation)?

>>This is an interesting point. However, if we were to introduce a new id
>>(namely "name-id" or "nid") we would also have to provide a way of converting
>>between nid and xt. The possibility of getting this through the TSC will be
>>pretty remote in my view.
>
>You have a good point, but I agree with Mitch. For some, the use of aliases
>is so pervasive that they may forget they are using them. A decompilation
>to the original code may simply serve to confuse, or make them think the
>decompiler has a bug in it.
>
>>This will not allow us to re-construct the original definition when such
>>aliases have been used. However, the definition we create will be
>>functionally the same, as we have simply resolved the aliasing. I consider
>>this to be a pain, but I don't see any way around it.
>
>The way around it is to use DTs instead of XTs. For most of the words we are
>talking about, this would work just fine.

For TOKEN@ etc.? Then the DT (= nid?)
would have to be in the compiled code (or the additional info). Would this not be a bit too expensive (in terms of constraints on the implementation) just for decompiling aliases well? >I believe that it would be better if we implemented most words which >work with names to use DTs, and use XTs when traversing through the >parameter list of a word definition. Would anyone be interested if >I try to come up with such a word-set, or is this direction OK with >everyone? I might be interested if I knew what you meant. If DT = nid, what do you want to do that Mitch has not done (in <9110012346.AA08358@mitch.Eng.Sun.COM>)? >Food for thought: > >Are we concentrating at too low a level here? It seems to me that the >above word-set has been defined to handle the most common features of >current forths. Yet, we are defining a new word set, possibly for a >standard. Can't we expect changes to current forths to support this >word-set. It depends on how we market this wordset. If it's in the standard then we can expect these changes. However, if the wordset requires such changes, it will hardly get into the standard. I see several levels of implementability of internals words: 1) factors of standard words (implementable on every standard system) 2) words implementable on current systems 3) words that require changes in the data structures of current systems Examples: 1) NAME>STRING 2) simple decompilation 3) detecting whether a word is headerless I think that words on levels 2 and 3 should be in the internals extensions wordset >Specifically, what I would love to have, to really do this >right, is a hook into the dictionary header which will allow me to >vector a data return routine. This way, if I have a complex data >structure, I can call a routine which will return the value of each >field in sequence, so I can type them out. You mean, especially for taking created words apart, with the routine supplied by the user? 
Nice idea >I know this is of limited value in a production environment I would not bet on it. - anton From mips!anton@relay.EU.net Fri Nov 1 13:04:57 1991 Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00183; Fri, 1 Nov 91 09:39:09 EDT Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA00133; Fri, 1 Nov 91 09:38:51 EDT Received: from lsi17.lsi.usp.br by lsi11.lsi.usp.br (4.1/LSI-16OCT91-01) id AA00162; Fri, 1 Nov 91 09:39:02 EDT Received: from mcsun.EU.net by lsi17.lsi.usp.br (4.0/SMI-4.1) id AA03152; Thu, 31 Oct 91 19:03:58-030 Received: from tuvie.can.ac.at by mcsun.EU.net with SMTP; id AA03023 (5.65a/CWI-2.120); Thu, 31 Oct 1991 18:39:05 +0100 Received: from mips.complang.tuwien.ac.at by tuvie.can.ac.at with SMTP; id AA14694 (5.65b+/CAN-1.15); Thu, 31 Oct 91 18:41:45 +0100 Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA04422; Thu, 31 Oct 91 18:38:46 +0100 Date: Thu, 31 Oct 91 18:38:46 +0100 From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl) Message-Id: <9110311738.AA04422@mips.complang.tuwien.ac.at> To: internals@lsi17.lsi.usp.br Subject: decompiling and tracing Status: RO On decompiling and tracing (optimized native) code: Currently I see three alternatives: 1) The "functional equivalence approach" The Program is decompiled into code that is functionally equivalent to the original code. Every instruction or chunk of instructions decompiles into one or more Forth words. Since the decompiler already produces a register-stack mapping for its own purposes (regenerating stack words), displaying the stack between two steps or at a breakpoint should be easy. The regenerated program could look very different from the original one. 2) The "source approach" The system keeps additional information that enables mapping of executable instructions to the Forth source. When tracing, the decompiler just shows in the source, which word(s) is/are executed in the next step. 
   Words that are optimized away (e.g. stack manipulation) are never shown. (This contrasts with approach 1, where such words are (re)generated.)
   Displaying stack values is more difficult in this approach, since you want to display the stack of the source program (in 1, the stack of the regenerated program is displayed):
   a) Since decompiling one instruction may yield multiple words at a time, which need not be adjacent in the source program, for which source position do you display the stack?
   b) Not all stack values of a source position can be displayed (optimized away, not available due to reordering ...). Are those that can be displayed sufficient for the user?
   c) However, values for other source positions can be displayed. Can this info be organized in a way useful for the user?
   This is the approach taken by other languages (like C).

3) The "virtual execution approach"
   Similar to the "source approach", but when tracing the system does not execute the optimized code, but executes (or simulates the execution of) code that corresponds to the Forth source. Therefore the stack can always be displayed. But there are problems when switching from real to virtual execution (e.g. if an exception occurs, or at a breakpoint) and back. I could not find a solution to this problem yet. Also, due to portability bugs in the user's code or bugs in the optimizer, the results of real and virtual execution might differ.

The "functional equivalence approach" and the "virtual execution approach" seem to provide more power: They can not only be used for debugging, but also for purposes like abstract interpretation.

With the current compilers the questions outlined above (Are the regenerated programs understandable? Are the stack displays useful?) do not pose themselves. We will get the answers when (if?) highly optimizing compilers appear.

So what alternative will our wordset support?
(We could support more than one.) I think that we should support the "functional equivalence approach" for the following reasons:
1) It provides more power than the "source approach"
2) It does not have the real/virtual switching problems

The "source approach" becomes interesting as soon as a source or editor interface is designed.

- anton

From cmr02@scm.tees-poly.ac.uk Thu Nov 7 02:18:01 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA01209; Wed, 6 Nov 91 23:08:31 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA00909; Wed, 6 Nov 91 23:08:09 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA01206; Wed, 6 Nov 91 23:05:35 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <2825-0@sun2.nsfnet-relay.ac.uk>; Wed, 6 Nov 1991 08:31:08 +0000
From: Peter Knaggs (Research)
Date: Tue, 5 Nov 91 17:45:20 GMT
Message-Id: <24897.9111051745@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals Wordset
Status: RO

OK, well, I have been out of the fray, so to speak, since the change of address and all that has caused large amounts of confusion at this end, most of which has been cleaned up.

However, I can see from the mailings that the rest of the group wants to abstract a bit. I shall relent and will have to agree with you all on that one.

There are some other general comments regarding Mitch Bradley's wordset (with my alterations). I have just printed off all the Internals mailings that I have, and hope to be back with you with some more detailed answers on this and other questions later on. Probably next week.

Peter Knaggs
School of Computing and Maths, Teesside Polytechnic, pjk @ scm.tp.ac.uk
Middlesbrough, England.
+44 (642) 342673

From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:05:37 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA03273; Tue, 26 Nov 91 01:54:06 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA01337; Tue, 26 Nov 91 01:53:27 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB03270; Tue, 26 Nov 91 01:51:24 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <5639-0@sun2.nsfnet-relay.ac.uk>; Mon, 25 Nov 1991 10:42:27 +0000
From: Peter Knaggs (Research)
Date: Mon, 25 Nov 91 10:24:18 GMT
Message-Id: <15620.9111251024@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: ...
Status: R

Finally I have some new ideas with regard to TOKEN@, LITERAL@ and the ilk. However I think we had better get this lot sorted out before I give you those.

Peter Knaggs
School of Computing and Maths, Teesside Polytechnic, pjk @ scm.tp.ac.uk
Middlesbrough, England.
+44 (642) 342673

From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:06:27 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA03276; Tue, 26 Nov 91 01:55:34 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA01342; Tue, 26 Nov 91 01:54:55 EDT
Received: from by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB03270; Tue, 26 Nov 91 01:54:04 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <5639-1@sun2.nsfnet-relay.ac.uk>; Mon, 25 Nov 1991 10:43:08 +0000
From: Peter Knaggs (Research)
Date: Mon, 25 Nov 91 10:22:44 GMT
Message-Id: <15606.9111251022@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: NAME@, .NAME
Status: R

This brings me onto the problem of .NAME etc. Can we at least agree on a name for the word that returns a string?
To keep with the rest of the wordset may I suggest:

NAME@ ( nid -- c.addr n ) "Name Fetch"
    c.addr is the character aligned address of n characters that represent the name associated with the name identifier nid. If the original name of the word can not be reproduced then a system dependent representation is returned.

    An ambiguous condition exists if nid is not a valid name identifier.

Anton's request for this word to return something that will compile to the same nid is impossible, as you can not protect against possible name clashes (well, it is impossible in my system at least).

Normally I don't agree with returning an address and count, although I think that it is right for this case. The name can be left in situ where possible, or copied into a buffer (say PAD) as required.

.NAME ( xt -- ) "Dot Name"
    Display the name (or a system dependent representation of the name) corresponding to the execution token xt.

    .NAME can be defined as:

    : .NAME ( xt -- ; Display name for xt )
        X>N     \ Convert xt to nid
        NAME@   \ Get string for nid
        TYPE    \ Display name
    ;

Now that we have the word NAME@ I believe that we should not dump .NAME, but leave it there for assistance in debugging. It is for this reason that I have defined .NAME to use the xt rather than a nid. Thus you can give the phrase:

    ' WORD .NAME word Ok

or perhaps more importantly:

    'EMIT @ .NAME word Ok

Peter Knaggs
School of Computing and Maths, Teesside Polytechnic pjk @ scm.tp.ac.uk
Middlesbrough, England.
+44 (642) 342673

From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:06:52 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA03279; Tue, 26 Nov 91 01:56:48 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA01346; Tue, 26 Nov 91 01:56:10 EDT
Received: from by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB03270; Tue, 26 Nov 91 01:55:32 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <5639-2@sun2.nsfnet-relay.ac.uk>; Mon, 25 Nov 1991 10:44:04 +0000
From: Peter Knaggs (Research)
Date: Mon, 25 Nov 91 10:22:02 GMT
Message-Id: <15584.9111251022@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: nid/xt
Status: R

The introduction of name identifiers (nid) or a dictionary token (DT) requires words to convert between the new nid and the only other thing in the standard that comes close to the abstract type we require, an execution token (xt). I suggest the following definitions (in keeping with the conversion words already in the standard).

N>X ( nid -- xt ) "n to x"
    xt is the execution token associated with the word indicated by the name identifier nid.

    Note: There may be a 1:1 or n:1 relation between nid and xt. This definition does not inhibit such a relationship.

X>N ( xt -- nid ) "x to n"
    nid is a name identifier of a word associated with the execution token xt.

    Note: This definition is worded such that a system with a 1:1 or n:1 nid:xt relationship can be implemented. In the n:1 case it is up to the system which nid is returned (the original or the most recently defined).

Peter Knaggs
School of Computing and Maths, Teesside Polytechnic, pjk @ scm.tp.ac.uk
Middlesbrough, England.
+44 (642) 342673

From cmr02@scm.tees-poly.ac.uk Tue Nov 26 05:07:43 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA03285; Tue, 26 Nov 91 02:00:06 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA01350; Tue, 26 Nov 91 01:59:27 EDT
Received: from by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB03270; Tue, 26 Nov 91 01:56:47 EDT
Received: from scm.tees-poly.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <5639-3@sun2.nsfnet-relay.ac.uk>; Mon, 25 Nov 1991 10:44:54 +0000
From: Peter Knaggs (Research)
Date: Mon, 25 Nov 91 10:20:55 GMT
Message-Id: <15560.9111251020@scm.tp.ac.uk>
To: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Internals: Follow/Another?/Unfollow
Status: R

I sent this mail some time ago, but am not sure that it got out of the maillist. Hence, I am sending it again. If you have already received a copy, could you let me know, as I am still not totally happy with the mail system on our Sun4s.

----------------------------- Cut Here -------------------------------------

Over the weekend I was deep into a bottle of whisky, sorry, thought. I have now been convinced that we should use an abstract token for wordsets. Now, with this in mind I have been thinking about the wordlist scanning words (FOLLOW, ANOTHER?, and UNFOLLOW).

o In the previous definitions of these words I declared state to be unsized. In my system (for speed and simplicity) I would store seventeen items in the state. This causes its own problems. If I wanted to follow more than one wordlist at a time, I would have to know how large the 'state' is in order to manipulate the stack correctly.

To this end may I recommend the following (new) definitions:

FOLLOW ( wid1 ... widn n -- state )
    Initialise a system dependent stack structure (state) in preparation for scanning the given n wordlists (indicated by wid1 ... widn). The system dependent value (state) may be of any length.
    FOLLOW is used in conjunction with ANOTHER? and UNFOLLOW.

    See also: ANOTHER?; UNFOLLOW.

ANOTHER? ( state1 -- state2 nid flag ) "Another Query"
    Extracts the name identifier (nid) of the next entry in the wordlist(s) being scanned (indicated by the system dependent stack structure state1, as initialised by FOLLOW). If a word is found, the nid of the word is returned, in addition to an updated search status (state2) and a true flag. If no more words are found in the search then a false flag is returned and nid is not valid.

    ANOTHER? is used in conjunction with FOLLOW and UNFOLLOW.

    Example usage:

    : (WORDS) ( state -- ; List all the words in the search )
        BEGIN ANOTHER? KEY? 0= AND WHILE
            NAME@ TYPE SPACE
        REPEAT UNFOLLOW
    ;

    : WORDS ( -- ; Display all words in current word list )
        GET-ORDER 1- 0 ?DO NIP LOOP   ( keep wid1, the wordlist searched first )
        1 FOLLOW (WORDS)
    ;

    : VLIST ( -- ; Display all words in search order )
        GET-ORDER FOLLOW (WORDS)
    ;

    See also: FOLLOW; UNFOLLOW.

UNFOLLOW ( state nid -- )
    Removes the system dependent stack structure (state) initialised by FOLLOW, and the nid returned by ANOTHER?. UNFOLLOW is normally used when exiting an ANOTHER? based loop.

    See also: FOLLOW; ANOTHER?.

You may notice that I have also changed the action of ANOTHER? and UNFOLLOW. This is basically because I tried to define WORDS and VLIST using the previous stack effects, and ended up deciding that the new stack effects would make such definitions a lot simpler.

This also means that those of you who want to allocate some memory in FOLLOW to store the state can deallocate the memory using UNFOLLOW, as you are now required to always have an UNFOLLOW matching a FOLLOW; otherwise there will be a (possibly very large) stack imbalance.

If we were to force state to be of a given size then I for one would have to implement a memory allocation/deallocation system to implement these words.

Peter Knaggs
School of Computing and Maths, Teesside Polytechnic, pjk @ scm.tp.ac.uk
Middlesbrough, England.
+44 (642) 342673

From Mitch.Bradley%eng.sun.com@sun.com Sat Nov 30 02:38:53 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA02000; Fri, 29 Nov 91 22:27:46 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA03709; Fri, 29 Nov 91 22:27:04 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA01996; Fri, 29 Nov 91 22:22:25 EDT
Received: from vax.nsfnet-relay.ac.uk by sun2.nsfnet-relay.ac.uk with SMTP inbound id <29902-220@sun2.nsfnet-relay.ac.uk>; Fri, 29 Nov 1991 03:10:41 +0000
Received: from sun.com by vax.NSFnet-Relay.AC.UK via NSFnet with SMTP id aa23630; 29 Nov 91 1:53 GMT
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA26091; Tue, 26 Nov 91 11:45:46 PST
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA13269; Tue, 26 Nov 91 11:44:28 PST
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1) id AA07507; Tue, 26 Nov 91 11:45:34 PST
Message-Id: <9111261945.AA07507@mitch.Eng.Sun.COM>
To: Peter Knaggs (Research)
Cc: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Re: Internals: Follow/Another?/Unfollow
Date: 26 Nov 91 11:45:34 PST (Tue)
From: Mitch.Bradley@Eng.Sun.com
Status: R

> FOLLOW ( wid1 ... widn n -- state )

Why should we scan all the wordlists simultaneously? I would prefer a more primitive function that scans only one wordlist at a time.

> Initialise a system dependent stack structure (state) in preparation
> etc
> ANOTHER? ( state1 --- state2 nid flag ) "Another Query"

How about just:

    ANOTHER-WORD? ( nid1 wid -- false | nid2 true )

    Finds the successor "nid2" of the word "nid1" in the wordlist "wid", or the first word in that wordlist if "nid1" is zero.

I submit that any kind of search state that you need to maintain is the implementation's problem. The way I do it is to keep a single global state array that "caches" the most recent search state.
I "tag" that cache with the last nid and wid, and rebuild it if I get a "miss".

Example usage:

    : (WORDS) ( wid -- )
        >R 0
        BEGIN R@ ANOTHER-WORD? WHILE    ( nid ) ( r: wid )
            DUP NAME>STRING TYPE SPACE  ( nid )
        REPEAT                          ( )
        R> DROP ;

    : GET-CONTEXT ( -- wid ) GET-ORDER 1- 0 ?DO NIP LOOP ;

    : WORDS ( -- ; Display all words in current word list )
        GET-CONTEXT (WORDS) ;

    : VLIST ( -- ; Display all words in search order )
        GET-ORDER 0 ?DO (WORDS) LOOP ;

From <@sun.com:Mitch.Bradley@eng.sun.com> Sat Nov 30 09:30:30 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA02082; Sat, 30 Nov 91 06:25:03 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA03751; Sat, 30 Nov 91 06:24:20 EDT
Received: from sun2.nsfnet-relay.ac.uk by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA02079; Sat, 30 Nov 91 06:24:35 EDT
Received: from vax.nsfnet-relay.ac.uk by sun2.nsfnet-relay.ac.uk with SMTP inbound id <24752-115@sun2.nsfnet-relay.ac.uk>; Sat, 30 Nov 1991 01:14:25 +0000
Received: from sun.nsfnet-relay.ac.uk by vax.NSFnet-Relay.AC.UK via Ethernet with SMTP id ab11051; 29 Nov 91 16:52 GMT
Received: from vax.nsfnet-relay.ac.uk by sun.NSFnet-Relay.AC.UK Via Ethernet with SMTP id om26722; 29 Nov 91 13:11 GMT
Received: from sun.com by vax.NSFnet-Relay.AC.UK via NSFnet with SMTP id aa00443; 29 Nov 91 6:10 GMT
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA19432; Tue, 26 Nov 91 11:16:09 PST
Received: from mitch.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA11356; Tue, 26 Nov 91 11:14:51 PST
Received: by mitch.Eng.Sun.COM (4.1/SMI-4.1) id AA07467; Tue, 26 Nov 91 11:15:58 PST
Message-Id: <9111261915.AA07467@mitch.Eng.Sun.COM>
To: Peter Knaggs (Research)
Cc: internals <<@nsfnet-relay.ac.uk:internals@lsi.usp.br>>
Subject: Re: Internals: nid/xt
Date: 26 Nov 91 11:15:57 PST (Tue)
From: Mitch.Bradley@eng.sun.com
Status: R

> N>X ( nid -- xt ) "n to x"
> X>N ( xt -- nid ) "x to n"

The Forth-83 names for these functions
are NAME> and >NAME , as described in an experimental proposal included in the Forth-83 document. I see nothing wrong with those "old" names, because they already do the "right thing" if you establish the correspondence:

    Abstract    Actual Type in Some
    Type        Implementations
    --------    --------------------
    xt          cfa
    nid         nfa

Thus, we can simply generalize the traditional words, changing their definitions to use "opaque" data types instead of particular addresses.

I don't like the terseness of X>N and N>X ; terseness is good for frequently-used functions, but infrequently-used functions should "spell out" their function. XT>NAME and NAME>XT would be even better, except that NAME> and >NAME are already familiar and accepted in some circles. Familiarity is a big advantage when you are trying to convince people to go along with a new scheme.

Mitch

From anton%mips@AWITUW64.BITNET Sat Dec 7 22:27:15 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA09925; Sat, 7 Dec 91 17:39:54 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA00978; Sat, 7 Dec 91 17:39:04 EDT
Received: from [143.108.254.245] by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB09919; Sat, 7 Dec 91 17:39:41 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with PMDF#10108; Fri, 6 Dec 1991 21:05 -0300+1
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 06 Dec 91 19:44:34 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id AA25524; Fri, 6 Dec 91 20:43:48 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA11057; Fri, 6 Dec 91 20:43:20 +0100
Date: Fri, 6 Dec 91 20:43:20 +0100
From: anton@mips.lsi.usp.br.
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals: name@, next-word and names
To: internals@lsi11.lsi.usp.br.lsi.usp.br.
Message-Id: <9112061943.AA11057@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: R

Concerning Peter's and Mitch's recent postings:

>NAME@ ( nid -- c.addr n ) "Name Fetch"
>
>Anton's request for this word to return something that will compile to the
>same nid is impossible, as you can not protect against possible name clashes
>(Well it is impossible in my system at least).

You are right. Let's try a new wording: The returned string should produce the same nid on searches through the wordlist, unless the name is shadowed (by a more recent definition). BTW, what does dpANS say about shadowing? I did not find anything in Basis 15.

Also, the description should include the minimal life expectancy of the resulting string. Will it live at 'here', in the pad, or in its own buffer? What can the user assume? Proposed wording: The returned string may be situated in the pad.

The longer I think about FOLLOW/ANOTHER?/UNFOLLOW, the more I dislike it. On implementations using trees a straightforward approach would use unlimited space for the state/wordlist position. This causes either an unknown size on the stack or use of memory allocation words. In both cases the state is not a first class data type (e.g. you cannot copy it like an integer), which causes more trouble. (We could add further words to make it first class again, e.g. a word for copying, but I do not think anybody would like this.)

So I support Mitch's solution. (It has one problem: It restricts the system to have a nid only once in a wordlist. But I think that all systems implement this restriction without being forced, so it should not cause trouble.) However, I like his older NEXT-WORD ( nid1 wid -- nid2 ) better than ANOTHER-WORD? ( nid1 wid -- false | nid2 true ), both the name and the stack effect.
I think that the stack effect ( wid nid1 -- wid nid2 ) would be even nicer, resulting in:

    : (WORDS) ( wid -- )
        0
        BEGIN NEXT-WORD ?DUP WHILE      ( wid nid )
            DUP NAME>STRING TYPE SPACE  ( wid nid )
        REPEAT                          ( wid )
        DROP ;

Concerning the names, I agree with Mitch. In addition, I think we should adapt some of the proposed names for conversion words to the usual scheme to make the names easy to remember.

    NAME>IMMEDIATE instead of IMMEDIATE? (should we keep the '?')
    XT>DEFINER or >DEFINER instead of DEFINER

Also, I favor Mitch's NAME>STRING over Peter's NAME@.

From anton%mips@AWITUW64.BITNET Sat Dec 7 22:34:57 1991
Received: from lsi7.lsi.usp.br by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA09946; Sat, 7 Dec 91 17:41:28 EDT
Received: from lsi11.lsi.usp.br by lsi7.lsi.usp.br (4.1/SMI-4.1) id AA00983; Sat, 7 Dec 91 17:40:39 EDT
Received: from [143.108.254.245] by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB09936; Sat, 7 Dec 91 17:41:14 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with PMDF#10108; Sat, 7 Dec 1991 02:13 -0300+1
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 06 Dec 91 19:44:57 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id AA25539; Fri, 6 Dec 91 20:44:36 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA11061; Fri, 6 Dec 91 20:44:08 +0100
Date: Fri, 6 Dec 91 20:44:08 +0100
From: anton@mips.lsi.usp.br.
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals: words i miss
To: internals@lsi11.lsi.usp.br.lsi.usp.br.
Message-Id: <9112061944.AA11061@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: R

Some words for manipulating words and wordlists I missed in the discussion or would do differently:

Motivation for the new words: Make it easier and more portable to build programs that build, change or analyse programs (metaprograms).

Data types:

definer-id: identifies the kind of word (colon def, variable, ...).
In conventional systems this could be the code address (the content of the cfa). Different does> actions have a different definer-id. It needs one cell (I hope this suffices). What is the definer-id of code words? Implementation-dependent? I think it would be more useful if they returned a unique value (for all code words). But it would be harder to implement.

Words:

Mitch's CREATE-WORD ( addr len wid -- ) has no source for the immediate flag and the xt of the word. Therefore I propose to extend it in the following way:

CREATE-WORD ( addr len wid flag definer-id -- nid )
    creates a word with the name given by addr len, of type definer-id with immediateness given by flag, and inserts it into the wordlist wid. Note that a stack effect like ( addr len wid flag xt -- nid ) would be more flexible, but would force the system to implement an n:1 relation of nids to xts.

NAME-IMMEDIATE! ( flag nid -- )
    changes immediateness of name to flag

CREATE-XT ( definer-id -- xt )
    creates an anonymous word of type definer-id

Mitch's DEFINER ( xt1 -- xt2 ), which returns the xt of the defining word of the word xt1, is useful, but hard or impossible to implement, e.g.

    : a DOES> ... ;
    : b CREATE ... a ;
    : c CREATE ... ;
    : d c a ;
    : e d ;
    b b1
    d d1
    e e1

What are the defining words of b1, d1 and e1? Returning b, d and e resp. might be quite useful, but IMO is impossible to implement. Returning `a' can be implemented (even that's quite hard), but has no advantage over returning a definer-id. So I propose:

XT>DEFINER ( xt -- definer-id )
    Gets the definer-id of the xt. It could also be called >DEFINER to stay consistent with other words that do not mention the xt, or DEFINER to stay consistent with Mitch's existing usage.

>NAME? ( xt -- flag )
    returns true if the xt is associated with a name, otherwise false. This can be hard to implement on many systems (e.g. fig). Mitch's >NAME includes this function, Peter's X>N seems not to.
    Since the function "get the name assuming there is one" is usually much easier to implement, IMO these functions should be separated, to avoid implementors dropping both if they do not implement this one. Alternative: make >NAME (X>N) ambiguous if the xt does not have a name, but if the system can detect this, it should return 0. (half-ambiguous situation)

XT-DEFINER! ( definer-id xt -- )
    changes the definer of xt to definer-id.

CREATE-DEFINER ( xt -- definer-id )
    creates a definer-id for words that, when called, push their body address and then execute the xt.

This is best explained by example:

    ' x CREATE-DEFINER
    CREATE y
    ' y XT-DEFINER!

is equivalent to

    : cx CREATE DOES> x ;
    cx y

These two words are factors of Bill Ragsdale's DOES, which Mitch discussed on comp.lang.forth some weeks ago.

Just for discussion:

NAME>WID ( nid -- wid )
    Get a wordlist the name is in. Might be hard to implement. How useful is it?

>SIZE ( xt -- u )
    returns the size of the memory block allocated after the xt. This is usually hard to implement. How useful is it?

NONAME ( -- xt )
    After this word is called, the next defining word does not consume the input stream, but instead creates an anonymous word. The returned execution token is the execution token of that word, if DP is not changed between NONAME and the call of the defining word.

THIS-NAME ( addr len -- )
    After this word is called, the next defining word does not consume the input stream, but instead creates a word with the name given by addr len.

These two words are really ugly, but I could not find a nicer way. Or should we just use create-xt and create-word respectively and replicate the data structure building code? Then you might have to go into other people's code, which you might not even have in source.

Adding words that make the other implicit parameters of defining words (e.g. current) more explicit is not necessary, since they can be saved and restored.
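
A minimal sketch of the conventional side of the equivalence above, in standard Forth only. CREATE-DEFINER and XT-DEFINER! exist here only as proposals, so they are not used; the names x, cx and y follow the posting, while make-x, z, the body of x and the stored value 42 are assumptions added for illustration. The second half attaches the DOES> action after the word has been created, which is roughly the job XT-DEFINER! would do with a definer-id.

```forth
\ Shared action: receives the body address of the child word.
: x ( addr -- ) @ . ;

\ One-step form from the posting; "42 ," is added so the child has a body cell.
: cx ( "name" -- ) CREATE 42 , DOES> x ;
cx y
y    \ pushes the body address of y, then runs x

\ Two-step form: create the word first, attach the action afterwards.
\ make-x is a hypothetical helper in the style of the word a in the DEFINER example.
: make-x ( -- ) DOES> x ;
CREATE z 42 ,
make-x
z    \ behaves like y
```

On an ANS-style system both y and z should print 42. The two-step form only works while z is still the most recent definition, which is one reason an explicit ( definer-id xt -- ) interface might be preferable.
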
From anton%mips@AWITUW64.BITNET Fri Jan 24 17:52:22 1992
Received: from fpsp.fapesp.br ([143.108.254.245]) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA02321; Fri, 24 Jan 92 14:46:51 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with PMDF#10108; Fri, 24 Jan 1992 14:46 -0200(C)
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 24 Jan 92 15:47:44 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id AA08191; Fri, 24 Jan 92 16:43:38 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA01888; Fri, 24 Jan 92 16:42:14 +0100
Date: Fri, 24 Jan 92 16:42:14 +0100
From: anton@mips.lsi.usp.br
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: Words I miss
To: internals@lsi11.lsi.usp.br.lsi.usp.br
Message-Id: <9201241542.AA01888@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: RO

Another word I miss:

LOOKUP-NAME ( addr len wid -- nid )
    searches for the name given by addr len. If the wordlist wid contains the name, LOOKUP-NAME returns its nid, otherwise 0. This word is a factor of SEARCH-WORDLIST. (Is SEARCH-NAME or FIND-NAME better?)

Concerning THIS-NAME: I realized that, in addition to being ugly, THIS-NAME is unnecessary. Its function can be achieved by EVALUATE using an even uglier hack.
This applies to NONAME, too (just use a dummy name and a scratch wordlist).

- anton

From anton%mips@AWITUW64.BITNET Sat Feb 1 13:51:05 1992
Received: from [143.108.254.245] by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AB00167; Sat, 1 Feb 92 10:49:00 EDT
Return-Path: anton@mips
Received: from AWITUW64.BITNET (MAILER@AWITUW64) by brfapesp.bitnet with PMDF#10108; Sat, 1 Feb 1992 09:46 -0200(C)
Received: From AWITUW64.BITNET By AWITUW64.BITNET ; 28 Jan 92 17:59:55 GMT
Received: from mips.complang.tuwien.ac.at by email.tuwien.ac.at (5.65b/1.34) id AA21718; Tue, 28 Jan 92 18:59:28 +0100
Received: by mips.complang.tuwien.ac.at (5.57/Ultrix3.0-C) id AA08434; Tue, 28 Jan 92 18:58:37 +0100
Date: Tue, 28 Jan 92 18:58:37 +0100
From: anton@mips.lsi.usp.br
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: colon definition internals words
To: internals@lsi11.lsi.usp.br.lsi.usp.br
Cc: anton@mips.lsi.usp.br
Message-Id: <9201281758.AA08434@mips.complang.tuwien.ac.at>
X-Envelope-To: internals@lsi.usp.br
Status: RO

This is a repost, since the first try seems to have got lost.

My proposals for words for accessing the body of colon defs:

I think that the words proposed in Mitch's paper are not abstract enough to be universally implementable. It seems that there has been some discussion on this before, but I missed part of it, so if I am redoing things, I am sorry. Anyway, let's get on with the wordset while the iron is hot.

As for the descriptions of the words, I think I have overdone them - they sound precise, but are probably hard to understand, sorry. Feel free to ask when something seems unclear.

Motivation: These words are useful for applications like a decompiler, structure charting, abstract interpretation.

Data types:

code position: identifies a specific place in the colon definition. Its size is two cells (Is this enough?). Code positions can be compared using double operators (or should there be an extra operator? or double unsigned?).
In conventional systems this will be a pointer into the actual threaded code. This is Mitch's xadr.

Words:

The code that can be accessed through these words is equivalent to the original code, but need not have any other similarity to the original code or conform to the standard. The code can also be different from the one produced at previous accesses (I am thinking of native code implementations which might recreate different code when they start decompiling in the middle instead of at the beginning).

XT>CODEPOS ( xt -- codepos )
    returns the codepos of the first word of the colon definition xt. If xt is not a colon definition, 0. is returned.

DEFINER>CODEPOS ( definer-id -- codepos )
    returns the codepos of the first word of the DOES>-part of the word that creates words with the definer definer-id. If definer-id does not belong to a DOES>-defining word, the effect is ambiguous. (Defining DEFINER>XT and using DEFINER>XT XT>CODEPOS would be nicer, but would not be implementable without changing the structure of many systems (e.g. fig-derived).)

NEXT-CODEPOS ( codepos1 -- codepos2 )
    Codepos2 is the position of the word sequentially executed after the word at codepos1, if the word at codepos1 is not a branching word. codepos2 is greater than codepos1. (Is this too restrictive?) It is ensured that, starting at the beginning, by stepping through the colon definition with NEXT-CODEPOS every word is seen exactly once. If codepos1 is the position of the last word in the colon definition, the result is ambiguous. (It would be best to return 0.)

TOKEN@ ( codepos -- xt )
    (or should we rename it, because the stack effect has changed?)

TOKEN! ( xt codepos -- )
    If the word at codepos has inline arguments, the effect of TOKEN! is ambiguous. (This word is very hard to implement on native code systems.)

CODEPOS-COMPILE ( codepos -- )
    appends the execution semantics of the code at codepos to the current definition. (In plain English: Compile the word at codepos and its inline arguments.)
    An ambiguous condition exists if the code is a branching word.

CODEPOS>STRING ( codepos -- addr len )
    Returns a string representation of the word at codepos and its inline argument(s). (This word is useful only for the decompiler. Should we include it? Are there other operations on inline arguments that might be interesting? What's the lifetime of the string?)

CODEPOS>TARGETS ( codepos -- n*codepos n )
    returns the code positions of the words that can be executed immediately after the word at codepos. n*codepos are the possible targets, n is their number; the targets can include the word returned by NEXT-CODEPOS; in the case of nonbranching words it will be the only target. For words using targets supplied at run-time (EXIT, THROW) only the statically determined targets are returned (e.g. none for EXIT). (I hate words with variable stack effects - is there a better solution with less than three words?)

BREAKPOINT! ( xt codepos -- )
    causes the execution of the breakpoint handler xt, just before the word at codepos is executed. If there already is a breakpoint at codepos, it is replaced. 0 for the xt removes a possible breakpoint (or the xt of NOOP instead of 0?). Before executing the handler, codepos is pushed. (BREAKPOINT! could be implemented using TOKEN!, but it is probably easier to implement BREAKPOINT! than TOKEN! in native code systems.)

BREAKPOINT@ ( codepos -- xt )
    This is included to enable programs to be well-behaved, i.e. so a module can avoid treading on another module's feet. (Should we add to the description of BREAKPOINT! that it works only on the task that executed the BREAKPOINT!?)

TRACING! ( xt -- )
    behaves like a BREAKPOINT! on the whole code in the system. To avoid an endless loop, tracing is turned off while executing xt. (This cannot be implemented using BREAKPOINT! because of things like EXIT and THROW.)
TRACING@ ( -- xt )

- anton (anton@mips.complang.tuwien.ac.at)
(Note that the header is a bit scrambled, so mail explicitly.)
(If you have mailed anything to me on my last postings (Dec 6 and Jan 25), I did not receive it.)

From internals@lsiserv2.lsi.usp.br Mon Feb 24 14:41:48 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA12732; Mon, 24 Feb 92 10:32:52 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA00987; Mon, 24 Feb 92 10:21:27 EST
Date: Mon, 24 Feb 92 10:21:26 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202171911.AA14560@mips.complang.tuwien.ac.at>
Comment: Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To:
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
To: Multiple recipients of list
Subject: internals dpANS comment
Status: RO

Should we write a public review comment urging the TC to include an internals wordset and promising to supply a wordset as baseline for their discussion later? As far as I know, the deadline is February 25th. If yes, who will do it?

Is there still anybody interested? (I ask because I have only heard myself on this mailing list since December.)

M.
Anton Ertl Some things have to be seen to be believed anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen From nick@pwllheli.sw.stratus.com Mon Feb 24 14:56:15 1992 Received: from lectroid.sw.stratus.com (lectroid-gw.sw.stratus.com) by transfer.stratus.com (4.1/3.8-jjm) id AA08529; Mon, 24 Feb 92 08:55:21 EST Received: from pwllheli.sw.stratus.com by lectroid.sw.stratus.com (4.1/3.7-jjm) id AA03589; Mon, 24 Feb 92 08:55:39 EST Received: by pwllheli.sw.stratus.com (4.1/SMI-4.0) id AA01716; Mon, 24 Feb 92 08:55:36 EST Date: Mon, 24 Feb 92 08:55:36 EST From: nick@pwllheli.sw.stratus.com (Nicolas Tamburri) Message-Id: <9202241355.AA01716@pwllheli.sw.stratus.com> To: anton@mips.complang.tuwien.ac.at Subject: Re: internals dpANS comment Status: RO I regret that I have not been able to participate more fully in this task. Since the mailing list was formed my official work load has increased to the point where it has not been possible for me to compose coherent responses to any of the mailings, and so I have not even tried beyond the first couple of mailings. (I'm sure this has a familiar ring to everyone else on this list.) I appreciate the work you've done, and regret that I have not been able to participate more. But, until things lighten up a little at work, I don't believe I'll be able to contribute to this task beyond reading. 
/nt From internals@lsiserv2.lsi.usp.br Tue Feb 25 04:56:25 1992 Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA14968; Tue, 25 Feb 92 00:35:23 EST Errors-To: pl@lsiserv2.lsi.usp.br Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA01603; Tue, 25 Feb 92 00:23:56 EST Date: Tue, 25 Feb 92 00:23:55 EST Errors-To: pl@lsiserv2.lsi.usp.br Message-Id: <9202250121.AA06929@mitch.Eng.Sun.COM> Comment: Forth Internals Distribution List Originator: internals Errors-To: pl@lsiserv2.lsi.usp.br Reply-To: Sender: internals@lsiserv2.lsi.usp.br Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas From: Mitch.Bradley@Eng.Sun.COM To: Multiple recipients of list Subject: Re: internals dpANS comment Status: RO > Should we write a public review comment urging the TC to include an > internals wordset and promising to supply a wordset as baseline for > their discussion later? My feeling is that it is much too late for the TC to consider such a massive undertaking. Debate over such a wordset would delay the standard by at least 6 months, and I don't think that is in anybody's best interests. The internals stuff we have been discussing would be an appropriate topic for the TC to consider as an extension *after* the standard is approved. The ANSI process does provide for such ongoing work. 
Mitch

From internals@lsiserv2.lsi.usp.br Tue Feb 25 22:52:52 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA00674; Tue, 25 Feb 92 18:39:01 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA02990; Tue, 25 Feb 92 18:27:30 EST
Date: Tue, 25 Feb 92 18:27:30 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9202252137.AA05874@lsi3>
Comment: Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To:
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: pjk@scm.tp.ac.uk
To: Multiple recipients of list
Subject: internals dpANS comment
Status: RO

On the face of it, this seems like a good idea. For a number of reasons I would ask Mitch to do the job.

However, I think that we are all agreed that the Internals wordset would not make it into this revision. We must work out a full wordset, with the intent of getting it accepted into the standard at its next revision (in five years or so). It may be a good idea to write a letter to the TC stating this intent. (Or is this what you said?)

Peter J. Knaggs.
School of Computing and Maths, Teesside Polytechnic, pjk @ scm.tp.ac.uk
Middlesbrough, England.
+44 (642) 342673 .
From internals@lsiserv2.lsi.usp.br Wed Feb 26 01:14:41 1992 Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA01144; Tue, 25 Feb 92 21:06:33 EST Errors-To: pl@lsiserv2.lsi.usp.br Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA03496; Tue, 25 Feb 92 20:55:02 EST Date: Tue, 25 Feb 92 20:55:01 EST Errors-To: pl@lsiserv2.lsi.usp.br Message-Id: <9202252321.AA08219@mitch.Eng.Sun.COM> Comment: Forth Internals Distribution List Originator: internals Errors-To: pl@lsiserv2.lsi.usp.br Reply-To: Sender: internals@lsiserv2.lsi.usp.br Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas From: Mitch.Bradley@Eng.Sun.COM To: Multiple recipients of list Subject: Re: internals dpANS comment Status: RO > On the face of it, this seames like a good idea. For a number of reasions > I would ask Mitch to do the job. As before, I am not volunteering to, and will not be roped into, carrying the ball on this issue. I agreed to participate in technical discussions but disclaimed interest in driving the issue in committee. The committee battle on such a wordset will be long and bloody (or perhaps short and fatal), and I'm insufficiently motivated to carry the banner, lead the charge, and take the arrows. 
Mitch From internals@lsiserv2.lsi.usp.br Wed Feb 26 01:52:18 1992 Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA01148; Tue, 25 Feb 92 21:37:07 EST Errors-To: pl@lsiserv2.lsi.usp.br Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA03530; Tue, 25 Feb 92 21:25:35 EST Date: Tue, 25 Feb 92 21:25:35 EST Errors-To: pl@lsiserv2.lsi.usp.br Message-Id: <9202260035.AA05955@lsi3> Comment: Forth Internals Distribution List Originator: internals Errors-To: pl@lsiserv2.lsi.usp.br Reply-To: Sender: internals@lsiserv2.lsi.usp.br Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas From: pl@lsiserv2.lsi.usp.br (Pedro Sanchez) To: Multiple recipients of list Subject: Re: internals dpANS comment Status: RO >The committee battle on such a wordset will be long and bloody (or perhaps >short and fatal), and I'm insufficiently motivated to carry the banner, >lead the charge, and take the arrows. Oh, Mitch. Where is your idealism? :-) Anyway, your words are very descriptive of the situation. ========================================================================== Pedro Luis Prospero Sanchez internet: pl@lsi.usp.br (PREFERRED) University of Sao Paulo uunet: uunet!vme131!pl Dept. 
of Electronic Engineering hepnet: psanchez@uspif1.hepnet phone: (055)(11)211-4574 home: (055)(11)914-9756 fax: (055)(11)815-4272 ========================================================================== From internals@lsiserv2.lsi.usp.br Wed Feb 26 04:26:08 1992 Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA01176; Wed, 26 Feb 92 00:21:56 EST Errors-To: pl@lsiserv2.lsi.usp.br Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA03605; Wed, 26 Feb 92 00:10:24 EST Date: Wed, 26 Feb 92 00:10:23 EST Errors-To: pl@lsiserv2.lsi.usp.br Message-Id: <9202260319.AA08383@mitch.Eng.Sun.COM> Comment: Forth Internals Distribution List Originator: internals Errors-To: pl@lsiserv2.lsi.usp.br Reply-To: Sender: internals@lsiserv2.lsi.usp.br Version: 5.4 -- Copyright (c) 1991/92, Anastasios Kotsikonas From: Mitch.Bradley@Eng.Sun.COM (Mitch Bradley) To: Multiple recipients of list Subject: Re: internals dpANS comment Status: RO > Oh, Mitch. Where is your idealism? :-) Hmmm, where did I put that idealism? I know I had it when I came in... Seriously, it got beaten out of me 'round about my second or third ANS Forth committee meeting. Once it was gone, I started to actually have some impact on the committee. I learned what it takes to persuade a diverse group of people to see things your way, or at least to vote with you. 
Mitch

From internals@lsiserv2.lsi.usp.br Sun Jun 21 19:34:05 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA10641; Sun, 21 Jun 92 14:31:18 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA02854; Sun, 21 Jun 92 14:09:30 EST
Date: Sun, 21 Jun 92 14:09:30 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9206191629.AA08486@mips.complang.tuwien.ac.at>
Comment: Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To:
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
To: Multiple recipients of list
Subject: internals progress report
Status: OR

Somebody on comp.lang.forth has asked for a progress report of the internals mailing list. I have put a short one together. Unless you complain, I'll post it on June 24th (Wednesday).

To have some progress to report in the future (8-), I would like to see a bit of discussion on the unresolved issues (NEXT-WORD vs. FOLLOW/ANOTHER?/UNFOLLOW) and on the words not discussed until now (my proposals for decompilation and debugging). Shall I do a summary of these things so you don't have to wade through all the old postings? If the reason for your silence is lack of time, how about concentrating on the dictionary stuff for now?

- anton

M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen

------------------ progress report --------------------------

Somebody on comp.lang.forth has asked for a progress report of the internals mailing list. Here's a short one:

The goal of our work is the internals wordset: It should provide access to internals of the Forth system without restricting the implementation.
We have made good progress on dictionary access, but this has not yet resulted in a document that can be presented to the public. There have also been proposals for words necessary for decompilation/debugging, but they have not been discussed yet. In the last months the mailing list was quiet. Maybe we need some fresh blood. Mail to pl@lsi.usp.br (Pedro Sanchez) to participate.

From internals@lsiserv2.lsi.usp.br Tue Jun 16 02:36:01 1992
Received: from lsiserv2 (lsiserv2.lsi.usp.br) by lsi11.lsi.usp.br (SUN-IPC(LSI)/4.1/SMI-4.01) id AA02422; Mon, 15 Jun 92 21:32:25 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Received: from (loopback) by lsiserv2 (4.1/SMI-4.1) id AA01888; Mon, 15 Jun 92 21:11:15 EST
Date: Mon, 15 Jun 92 21:11:14 EST
Errors-To: pl@lsiserv2.lsi.usp.br
Message-Id: <9206151829.AA04871@mips.complang.tuwien.ac.at>
Comment: Forth Internals Distribution List
Originator: internals
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To:
Sender: internals@lsiserv2.lsi.usp.br
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
To: Multiple recipients of list
Subject: Re: What should the Standard include?
Status: OR

In article <3728.UUL1.3#5129@willett.pgh.pa.us> Doug Phillips writes:
|> Perhaps the internals group could post
|> occasional progress reports to ForthNet for those of us who are interested
|> in kind of knowing what is going on, but who cannot participate directly?

I have written up something. I'll post it on Thursday, unless somebody complains:

-----------------------------------------------------------------

This is a short summary of the work of the internals group until now.

Our work until now covers two areas: dictionary access and things necessary for decompiling and debugging. Taking Mitch Bradley's previous work as a basis, we are quite far along in the dictionary access discussion. Proposals for decompiler and debugging words were made, but they have not been discussed yet.
In the last few months there was no activity on the mailing list, since we all seem to have too little time. Perhaps fresh blood would bring a little life into the discussion. Mail to Pedro Sanchez (pl@lsi.usp.br) if you want to participate.
-----------------------------------------------------------------

BTW, I would like to hear some opinions on the things that have not yet been discussed (the debugging words) and on the things that have not yet been resolved (FOLLOW/ANOTHER?/UNFOLLOW vs. NEXT-WORD). Shall I make a summary of the words up to now so that you don't have to wade through all those old mails?

Also, what do you think about including words for accessing stacks and locals (the locals thing would be hard, IMO)?

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen

From internals@lsiserv2.lsi.usp.br Tue Jul 21 02:15:10 1992
Received: by email.tuwien.ac.at (5.65b/1.34) id AA21830; Tue, 21 Jul 92 02:15:12 +0200
Received: From aearn.bitnet By awituw64.bitnet ; 21 Jul 92 00:15:11 GMT
Received: from brfapesp.bitnet by AEARN.EDVZ.Uni-Linz.AC.AT (Mailer R2.07) with BSMTP id 7956; Tue, 21 Jul 92 02:14:35 CDT
Received: from lsi11.lsi.usp.br by brfapesp.bitnet with PMDF#10108; Mon, 20 Jul 1992 21:14 BSC (-0300 C)
Received: from lsiserv2.lsi.usp.br.lsi.usp.br by lsi11.lsi.usp.br (4.1/SMI-4.1) id AA16401; Mon, 20 Jul 92 21:12:07 EST
Received: from ([127.0.0.1]) by lsiserv2.lsi.usp.br.lsi.usp.br (4.1/SMI-4.1) id AA06206; Mon, 20 Jul 92 20:47:09 EST
Date: Mon, 20 Jul 92 20:47:08 EST
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals dictionary access summary
Sender: internals@lsiserv2.lsi.usp.br
To: Multiple recipients of list
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: internals@lsiserv2.lsi.usp.br
Message-Id: <9207200752.AA14303@mips.complang.tuwien.ac.at>
X-Envelope-To:
anton@mips.complang.tuwien.ac.at
Comment: Forth Internals Distribution List
Originator: internals
Version: 5.5 -- Copyright (c) 1991/92, Anastasios Kotsikonas
Status: OR

This is a re-repost. I hope it gets through this time.

This is a summary of the proposals/discussions on dictionary access (I have left out debugging/decompilation for now). I mostly present the original proposals. Comments on them in other postings are paraphrased in brackets; alternative proposals are usually presented in full.

DATA TYPES:

name-id (nid)
The name-id uniquely identifies a header. The execution token does not suffice because some systems have alias mechanisms and headerless words. There has been some discussion whether to have separate name-ids or to use xts. I think it has been resolved in favor of keeping them separate (I have replaced xt by name-id in the proposals where it seemed to be appropriate). In fig-Forth the name-id can be represented by the NFA.

definer-id:
identifies the kind of word (colon def, variable, ...). In conventional systems this could be the code address (the content of the cfa). Different does> actions have different definer-ids. It needs one cell (I hope this suffices). What is the definer-id of code words? Implementation-dependent? I think it would be more useful if they returned a unique value (for all code words). But it would be harder to implement.
[The xt of the defining word has been proposed for this function]

ANSI Forth data types used subsequently
wid     - ANS Forth wordlist id
xt      - execution token
adr len - string

WORDS

next-word ( nid1 wid -- nid2 )
nid2 is the name preceding nid1 in the wordlist "wid". If nid1 is 0, nid2 is the first word in wid. If nid2 is 0, there are no more words in wid.
[An alternative proposed stack effect is ( wid nid1 -- wid nid2 )]

alternative proposal:
ANOTHER-WORD?
( nid1 wid -- false | nid2 true )
Finds the successor "nid2" of the word "nid1" in the wordlist "wid", or the first word in that wordlist if "nid1" is zero.

alternative proposal: follow/another?/unfollow described below

FOLLOW ( wid1 ... widn n -- state )
Initialise a system dependent stack structure (state) in preparation for scanning the given n wordlists (indicated by wid1 ... widn). The system dependent value (state) may be of any length. FOLLOW is used in conjunction with ANOTHER? and UNFOLLOW.
See also: ANOTHER?; UNFOLLOW.
[Originally proposed using only one wid]

ANOTHER? ( state1 --- state2 nid flag ) "Another Query"
Extracts the name identifier (nid) of the next entry in the wordlist(s) being scanned (indicated by the system dependent stack structure state1, as initialised by FOLLOW). If a word is found, the nid of the word is returned, in addition to an updated search status (state2) and a true flag. If no more words are found in the search, then a false flag is returned and nid is not valid. ANOTHER? is used in conjunction with FOLLOW and UNFOLLOW.
Example usage:

    : (WORDS) ( state -- ; List all the words in the search )
       BEGIN ANOTHER? KEY? 0= AND WHILE NAME@ TYPE SPACE REPEAT UNFOLLOW ;
    : WORDS ( -- ; Display all words in current word list )
       GET-ORDER 1- 0 ?DO DROP LOOP 1 FOLLOW (WORDS) ;
    : VLIST ( -- ; Display all words in search order )
       GET-ORDER FOLLOW (WORDS) ;

See also: FOLLOW; UNFOLLOW.

UNFOLLOW ( state nid -- )
Removes the system dependent stack structure (state) initialised by FOLLOW, and the nid returned by ANOTHER?. UNFOLLOW is normally used when exiting an ANOTHER? based loop.
See also: FOLLOW; ANOTHER?.

LOOKUP-NAME ( addr len wid -- nid )
searches for the name given by addr len. If the wordlist wid contains the name, LOOKUP-NAME returns its nid, otherwise 0. This word is a factor of SEARCH-WORDLIST. (Is SEARCH-NAME or FIND-NAME better?)
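
To make the proposed semantics of next-word and LOOKUP-NAME concrete, here is a minimal sketch over a toy dictionary written in standard Forth. Everything about the layout is an assumption made for illustration: a toy wordlist is one cell holding the nid of its newest entry (0 if empty), and a header is a link cell followed by a counted name. The helpers toy-wordlist, toy-add and toy-words are hypothetical names, not proposed words, and the name>string defined here is only a stand-in for the proposed word (on a system that already has a word of that name, this definition shadows it).

```forth
\ Toy model only: real systems are free to lay out headers differently.
: toy-wordlist ( "name" -- )  create 0 , ;   \ wid = address of head cell

: toy-add ( c-addr u wid -- )   \ append a name to the toy wordlist
  align here                    ( c-addr u wid nid )
  over @ ,                      \ link field: previous newest entry
  swap !                        \ this header becomes the newest entry
  dup c,                        \ count byte
  here swap dup allot move      \ copy the name text
  align ;

: name>string ( nid -- c-addr u )  cell+ count ;

\ nid1 = 0 starts the scan; nid2 = 0 means no more words.
: next-word ( nid1 wid -- nid2 )
  swap ?dup if nip @ else @ then ;

: toy-words ( wid -- )          \ list all names, newest first
  0 begin over next-word dup while
    dup name>string type space
  repeat 2drop ;

: lookup-name ( c-addr u wid -- nid | 0 )   \ case-sensitive search
  >r 0 begin r@ next-word dup while
    >r 2dup r@ name>string compare 0=
    if 2drop r> r> drop exit then
    r>
  repeat nip nip r> drop ;

toy-wordlist demo
s" dup"  demo toy-add
s" swap" demo toy-add
demo toy-words                          \ lists swap before dup
s" dup" demo lookup-name name>string type
```

Because next-word takes the previous nid rather than a hidden iterator state, the loop in toy-words needs nothing beyond the two stack items, which is the main attraction of this variant compared with FOLLOW/ANOTHER?/UNFOLLOW.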

create-word ( adr len wid -- )
Create the named word in the vocabulary "wid"

alternative proposal:
CREATE-WORD ( addr len wid flag definer-id -- nid )
creates a word with the name given by addr len, of type definer-id with immediateness given by flag, and inserts it into the wordlist wid.

XT>DEFINER ( xt -- definer-id )
Gets the definer-id of the xt.
[alternative names: DEFINER and >DEFINER]
[I don't include the other proposals for working with xts and definer-ids, as they are not directly related to dictionary access]

remove-word ( nid wid -- )
Remove the name nid from the wordlist wid.

immediate? ( name-id -- flag )
True if word is immediate
[alternative names NAME>IMMEDIATE and NAME>IMMEDIATE?]

NAME-IMMEDIATE! ( flag nid -- )
changes immediateness of name to flag

>name ( xt -- nid )
Return a name of the word xt, or 0 if that word has no names.

alternative proposal:
X>N ( xt -- nid ) "x to n"
nid is a name identifier of a word associated with the execution token xt.

name> ( nid -- xt )
Return the execution token of the name nid.

alternative proposal:
N>X ( nid -- xt ) "n to x"
xt is the execution token associated with the word indicated by the name identifier nid.
[>name and name> are already in Forth-83 (as experimental proposals) with the corresponding meaning]

>NAME? ( xt -- flag )
returns true if the xt is associated with a name, otherwise false.
[This can be hard to implement on many systems. Alternative: >NAME (X>N) returns 0 if it can detect this, otherwise it is ambiguous]

name>string ( nid -- adr len )
Return string representation of name

alternative proposal:
NAME@ ( nid -- c.addr n ) "Name Fetch"
c.addr is the character aligned address of n characters that represent the name associated with the name identifier nid. If the original name of the word cannot be reproduced, then a system dependent representation is returned. An ambiguous condition exists if nid is not a valid name identifier.
[The returned string should produce the same nid on searches through the wordlist, unless the name is shadowed (by a more recent definition).]
[The lifetime of the string needs to be specified (e.g. say that it may reside in PAD)]

.NAME ( xt -- ) "Dot Name"
Display the name (or a system dependent representation of the name) corresponding to the execution token xt.
[The name produced by NAME>STRING and/or .NAME should be usable as input for wordlist searching words]
.NAME can be defined as:

    : .NAME ( xt -- ; Display name for xt )
       X>N      \ Convert xt to nid
       NAME@    \ Get string for nid
       TYPE     \ Display name
    ;

[This word is convenient for debugging]

M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen

From internals@lsiserv2.lsi.usp.br Wed Jul 22 07:13:29 1992
Received: by email.tuwien.ac.at (5.65b/1.34) id AA08937; Wed, 22 Jul 92 07:13:42 +0200
Received: From aearn.bitnet By awituw64.bitnet ; 22 Jul 92 05:13:41 GMT
Received: from brfapesp.bitnet by AEARN.EDVZ.Uni-Linz.AC.AT (Mailer R2.07) with BSMTP id 5052; Wed, 22 Jul 92 07:13:23 CDT
Received: from lsi11.lsi.usp.br by brfapesp.bitnet with PMDF#10108; Wed, 22 Jul 1992 02:13 BSC (-0300 C)
Received: from lsiserv2.lsi.usp.br.lsi.usp.br by lsi11.lsi.usp.br (4.1/SMI-4.1) id AA19195; Tue, 21 Jul 92 17:23:45 EST
Received: from ([127.0.0.1]) by lsiserv2.lsi.usp.br.lsi.usp.br (4.1/SMI-4.1) id AA09088; Tue, 21 Jul 92 16:58:44 EST
Date: Tue, 21 Jul 92 16:58:44 EST
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: internals discussions
Sender: internals@lsiserv2.lsi.usp.br
To: Multiple recipients of list
Errors-To: pl@lsiserv2.lsi.usp.br
Reply-To: internals@lsiserv2.lsi.usp.br
Message-Id: <9207211958.AA09088@lsiserv2.lsi.usp.br.lsi.usp.br>
X-Envelope-To: anton@mips.complang.tuwien.ac.at
Comment: Forth Internals Distribution List
Originator: internals
Version: 5.5 -- Copyright (c) 1991/92, Anastasios
Kotsikonas
Status: OR

While summarizing the dictionary words I thought about some of the problems of our discussion and how to solve them.

Many things have been proposed several times (usually with different names, stack effects and descriptions).
Proposals often don't reference alternative proposals.
Nobody announces support for proposals.
Nobody explicitly withdraws proposals (but I guess presenting something new for the same purpose counts as withdrawal).

I think that we need a working document to solve these problems. Then there's something concrete to discuss. This also helps newcomers (there's the danger of rehashing discussions, but that's no problem as they can read the old postings in such a case).

Should somebody do it? (Please mail me a yes or no, I will summarize (don't remain silent)). I volunteer. How did you like "Internals Wordset Framework (draft)"? Should I do it along those lines? Should I do it in ASCII or in LaTeX (and distribute it as LaTeX and PostScript files)? (LaTeX is more beautiful and easier to edit, if you know it.)

Also, I think we need some way to come to conclusions. We probably need a voting mechanism. Any ideas?

- anton

(Note that replying will post to internals instead of mailing me).

M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen