While this is possible in many cases without resorting to deep
stacks and without using pick and roll, in
other cases I failed to find such a factoring, and I posted some of
these cases as a challenge to those who hold the anti-locals position.
I finally collected the challenges and the solutions posted up to now
on this page.
Note that Chuck Moore uses variables to reduce the
stack depth. An advantage of this approach over using locals is
that the word can be split into factors without ado. It also has
disadvantages: The words involved now have parameters and/or results
that are not passed on the stack, which makes testing of individual
words harder; there may be non-obvious data flow between words,
which makes the program harder to understand; and the resulting code
is not reentrant.
In my EuroForth 2011 paper I give an overview of techniques programmers can use to reduce the stack depth.
The challenges are:
The challenge is to implement the words without using locals, using the stacks instead, or maybe temporarily allocated memory. I.e., no USERs or VARIABLEs at least for F>BUF-RDP-TRY; for comp-i.fs VARIABLEs are less of a problem, but to me still a sign that stacks are not the answer. And the result should be at least as understandable and maintainable as the original; using factoring to achieve the latter is probably a good idea.
If you can make it more readable by factoring while still using locals, this would show the merits of factoring, but not the demerits of using locals.
A problem with these challenges is that they are taken from real code, call non-standard words (you have to look up the stack effects of these), and are unwieldy in other ways. This is in some way unavoidable: A designed (rather than real-world) example would appear contrived.
F>BUF-RDP-TRYf>buf-rdp-try ( rf c-addr ur nd up um1 -- um2 ) is a
factor of f.rdp, a floating-point output word. It tries
to convert the FP number rf into a string in the buffer c-addr ur.
The result is in either fixed-point or exponential notation, depending
on the FP number and the other parameters (see f.rdp
documentation). ur corresponds to the nr parameter
of f.rdp, nd to the nd parameter, and up to the np
parameter. um1 is the mantissa length to try, um2 is the proper
mantissa length as determined by this try. Make another try with that
mantissa length, and you get the proper conversion. The source code
with its surroundings
are here.
\ uses the following non-standard non-obvious words:
\ push-right ( c-addr u1 u2 cfill -- )
\ move string at c-addr u1 right by u2 chars (without exceeding
\ the original bound); fill the gap with cfill
\ <<# ( -- )
\ Start a hold area that ends with '#>>'.
\ #>> ( -- )
\ Release the hold area started with '<<#'.
: f>buf-rdp-try { f: rf c-addr ur nd up um1 -- um2 }
\ um1 is the mantissa length to try, um2 is the actual mantissa length
c-addr ur um1 /string '0 fill
rf c-addr um1 represent if { nexp fsign }
nd nexp + up >= up 0= or
ur nd - 1- dup { beforep } fsign + nexp 0 max >= and if
\ fixed-point notation
c-addr ur beforep nexp - dup { befored } '0 push-right
befored 1+ ur >= if \ <=1 digit left, will be pushed out by '.'
rf fabs f2* 0.1e nd s>d d>f f** f> if \ round last digit
'1 c-addr befored + 1- c!
endif
endif
c-addr beforep 1- befored min dup { beforez } 0 max bl fill
fsign if
'- c-addr beforez 1- 0 max + c!
endif
c-addr ur beforep /string 1 '. push-right
nexp nd +
else \ exponential notation
c-addr ur 1 /string 1 '. push-right
fsign if
c-addr ur 1 '- push-right
endif
nexp 1- s>d tuck dabs <<# #s rot sign 'E hold #> { explen }
ur explen - 1- fsign + { mantlen }
mantlen 0< if \ exponent too large
drop c-addr ur '* fill
else
c-addr ur + 0 explen negate /string move
endif
#>> mantlen
endif
else \ inf or nan
\ don't rely on REPRESENT result
2drop
rf f0< if s" -Inf" else rf f0>= if s" Inf" else s" NaN" endif endif
c-addr ur rot umin dup >r move c-addr ur r> /string blank
ur
endif
1 max ur min ;
The problem is the big number of values around. One could factor out
the three cases (fixed-point, exponential, and inf/nan), but the
number of values around would stay the same, and we would still need
locals.
comp-i.fsThe code uses knowledge about the image file format, which is not
really documented; this makes the code hard to grok. There are two
long words with locals in this
code: compare-images
and comp-image.
\ non-standard words used:
\ image-data ( i-field expected-offset -- base offset )
\ image1 ( -- addr )
\ image2 ( -- addr )
\ write-cell ( x file-id -- ior )
\ reloc-bits ( -- addr )
\ set-bit ( u addr -- )
\ write-symbol ( acell mask file-id u -- )
: compare-images { size file-id -- }
\G compares image1 and image2 (of size cells) and sets reloc-bits.
\G offset is the difference for relocated addresses
\ this definition is certainly to long and too complex, but is
\ hard to factor.
cr ." code" cell 26 cells image-data { cbase coffset }
." xt" 13 cells 22 cells image-data { xbase xoffset }
." label" 14 cells 18 cells image-data { lbase loffset }
size 0 u+do
image1 i th @ image2 i th @ { cell1 cell2 }
case
cell1 cell2 = ?of
cell1 file-id write-cell throw endof
cell1 im-sects1 sect-reloc cell2 im-sects2 sect-reloc over = ?of
file-id write-cell throw
i reloc-bits set-bit endof
drop
cell1 coffset + cell2 = ?of
cell1 cbase - $4000 file-id i write-symbol endof
cell1 xoffset + cell2 = ?of
cell1 xbase - 0 file-id i write-symbol endof
cell1 loffset + cell2 = ?of
cell1 lbase - $8000 file-id i write-symbol endof
cell1 file-id write-cell throw
cell1 cell2 <> if
0 i th 9 u.r cell1 17 u.r cell2 17 u.r cr
endif
0 endcase
loop ;
Again, there is a large number of values in play here.
\ additional non-standard words used:
\ slurp-file ( c-addr1 u1 -- c-addr2 u2 )
\ size1 ( -- u )
\ size2 ( -- u )
\ prepare-sections ( -- )
\ reloc-size ( -- u )
\ reloc-bits ( -- addr )
: comp-image ( "image-file1" "image-file2" "new-image" -- )
parse-name slurp-file { file1 fsize1 }
file1 fsize1 s" Gforth5" search 0= abort" not a Gforth image"
drop 8 + file1 - { header-offset }
file1 fsize1 header-offset /string to size1 to image1
size1 aligned size1 <> abort" unaligned image size"
parse-name slurp-file header-offset /string to size2 to image2
size1 size2 <> abort" image sizes differ"
prepare-sections
parse-name ( "new-image" ) w/o bin create-file throw { outfile }
size1 1- cell/ bits/au / 1+ to reloc-size
reloc-size allocate throw to reloc-bits
reloc-bits reloc-size erase
file1 header-offset outfile write-file throw
base @ hex
size1 aligned cell/ outfile compare-images
base !
reloc-bits reloc-size outfile write-file throw
outfile close-file throw ;
This word is just a straight-line word, which makes it relatively easy
to understand despite its length (if you understand the components).
map-array (mostly solved)
: map-array ( ... addr u xt -- ... )
\ executes xt ( ... x -- ... ) for every element of the array starting
\ at addr and containing u elements
{ xt }
cells over + swap ?do
i @ xt execute
1 cells +loop ;
Here the standard restrictions of ?do...+loop on return
stack usage do not allow us to keep xt on the return stack the whole
time, while we can keep it in a local the whole time. It is possible
to push the xt on the return stack only during the EXECUTE:
: map-array ( ... addr u xt -- ... )
rot rot cells over + swap ?do ( xt )
i @ swap dup >r execute r>
1 cells +loop
drop ;
But that would mean quite a bit of stack busy work; not really an
example that shows the superiority of not using
locals. Andrew
Haley suggests using
macros instead:
: (map) ( a n - a a') cells over + swap ; : map[ postpone (map) postpone ?do ; immediate : ]map postpone cell postpone +loop ; immediateThis is used like
map[ i @ + ]map instead of ['] +
map-array and can only be used in a colon definition.