Friday, September 7, 2007

A survey of stack shufflers

Factor is almost Lisp. Aside from the libraries, there's just one tiny difference: Factor has no named variables1. Factor is a concatenative language. This means that programs are assembled by composing other operations. Trial and error has found that the only workable structure for functions to pass data between each other is a stack2 All words in Factor work, in theory, by taking the stack, doing something to it, and returning a new stack, which can be passed on to the next word. In reality, most words only modify the top few items.

For those of you who slept through CS 101 (or, like me, never took it), here's an overview of stacks. We have just three operations: push, pop, and empty. The operation push puts a datum on the end of the stack. The operation pop deletes the top item from the stack, returning it. The operation empty tests whether the stack has any items in it. Initially, we say the stack is empty. Stacks are most commonly implemented as linked lists or growable arrays. Wikipedia has more details.

For our purposes, we'll assume that the empty operation is off-limits3. Generally, we should know whether or not the stack is empty--it has all of our data on it, after all. If a word tries to use more things on the stack then exist, the program crashes with a stack underflow error. This is a good thing, reflecting a real error with the structure of the program which needs to be fixed rather than ignored.

The basics: constants and drop

So, how can we make push and pop work with concatenative languages? We have no place to get the arguments for these, or return the results of these functions except the stack itself. Instead of using push and pop directly, slightly more complicated operations are used. These provide not a way to mess around with the stack, but also the data flow of the language.

First, the easy case: pushing constant values. Say you want to push 1 onto the stack. This is represented in Factor as, simply, 1. Think about it: when you have a literal, and when all data is stored on the stack, what else would you want to do with it besides push it onto that stack? In this way, we can actually think of 1 as a function from stacks to stacks, which takes one stack and returns another which has 1 pushed on to the front of it.

It's so much fun to push these values on the stack! But what if we want to get them off? We want to perform pop on the stack, but ignore the result. We can imagine this implemented as

_ = pop(stack)

You can think of it like dropping the top of the stack. In fact, that's what we'll call this operation, drop.

We can try this all out at the Factor listener4. Open up Factor and push a bunch of numbers to the stack. To see what's there, you can use the command .s. The stack is printed from bottom to top order. To delete the item on the top of the stack, use the command drop.

Building blocks: dup, swap and the retain stack

Unfortunately, there's not much you can do with just those two things, pushing literals and dropping them, even if you add in primitive functions and functions derived from those. One useful action is to duplicate the top of the stack. Since "duplicate" is way too long to type so many times, we'll call it dup instead. dup can be implemented as

item = pop(stack)
push(stack, item)
push(stack, item)

dup is useful if you want to do two things with one piece of data. It's also useful if you want to apply the same piece of data twice to the same function. For example, let's look at the definition of sq, which squares the number on the top of the stack5:

: sq ( x -- x^2 )
dup * ;

This means that the one input to sq is sent as both arguments to *. The part in parentheses is called the stack signature, a listing of inputs and outputs to the word. It is basically a comment; the names included are not significant, except to the reader5. To give you an idea of what's going on here, the equivalent function in Haskell would be

sq x = x*x


Another basic building block is swap. This switches the top two things on the stack. Say you want to negate a number. The easiest way to do this, given a function - ( x y -- x-y ), is something like this:

neg x = 0-x

If we translate 0-x into Factor as 0 -, we get the wrong answer; this is equivalent to x-0, an identity function. So, what we need to do is push 0 onto the stack, swap it with x, and then perform a subtraction:

: neg ( x -- -x )
0 swap - ;


The third basic operation is dipping under the top of the stack to do stuff below. For example, say you want to calculate, given x, y and z, (x+y)*z. The easiest way to do this is to dip under the top of the stack, add x and y, then rise back up and multiply that result by z. In Factor, the dipping down operation is >r and the rising up operation is r>. All together, the Factor code for this function comes out as:

: (x+y)*z ( x y z -- (x+y)*z )
>r + r> * ;

(Yes, in Factor, (x+y)*z is a valid word name, since words are just whitespace-delimited chunks. And I was feeling a little unimaginative.)

Derived operations

Using just those five basic operations--dup, drop, swap, >r and r>, we can implement any other stack shuffling operation6. Here are a few that turn out to be useful. The code here is extremely simple, but resist the temptation to do your own stack shuffling with the above operators and not the below: using derived operations makes your code look far cleaner and clearer to future readers.

swapd swaps the two items under the top of the stack. The suffix "d" indicates that it does this underneath.

: swapd ( x y z -- y x z )
>r swap r> ;


rot rotates the third item on the stack up to the top, pushing the top two down.

: rot ( x y z -- y z x )
swapd swap ;


-rot rotates the top item on the stack down to the third position, pushing the two below up. It is the inverse of rot.

: -rot ( x y z -- z x y )
swap swapd ;


nip is like drop, but it deletes the second item on the stack.

: nip ( x y -- x )
>r drop r> ;


dupd duplicates the item second from the top of the stack, and puts the duplicate underneath. This is useful if you want to do something with the top two things on the stack, but retain what's second.

: dupd ( x y -- x x y )
>r dup r> ;


over takes the second item on the stack and puts a copy of it over the top.

: over ( x y -- x y x )
dupd swap ;


tuck tucks the top item on the stack under the second one

: tuck ( x y -- y x y )
swap over ;


pick picks the third item on the stack and places a copy of it on top

: pick ( x y z -- x y z x )
>r over r> swap ;


In Factor, these are all implemented as primitives, but the only reason for this is performance. Note that, in the above descriptions, when I said "copy" I meant copying a pointer, not copying data (which can be done with clone).

Keep

There's a really cool operator known as keep, which often makes code far more elegant. Strictly speaking, it's not a stack shuffler; it takes a quotation as an argument, making it a combinator, or higher order function. Basically, keep encapsulates a simple, common pattern:

dup >r ... r>

replacing it with

[ ... ] keep

There's also 2keep, which is equivalent to

2dup >r >r ... r> r>

These, with the analogous 3keep, are more useful than they first seem. They are used for any instance where you want to process the top one or more items on the stack, together with something underneath, while retaining those items on the top for further use.

What's this all for?

This list of functions probably looks daunting to memorize, and it is difficult at first. But it's useful, and they soon become second nature. In Factor and other concatenative languages, stack shufflers are the glue used to tie the output of certain functions to the input of other functions. No matter what programming language you're working in, everything ends up being a (perhaps convoluted) stringing together of different functions. Once you get used to it, using the stack and its associated transformations can be even more natural than using tons of named variables.



1 OK, technically, you could use locals for named variables. But it's all stack manipulation underneath. Also, there's dynamically scoped variables, but those aren't used the way lexically scoped locals typically are; they're used for passing data over greater distances.

2 Experimental languages have used a queue rather than, or in addition to, a stack. But this generally leads to more difficult to use--and compile--models. The original point-free language, FP, used function composition without a stack, but the way it uses lists is trivially equivalent to the stack. However, it ends up being somewhat awkward in practice. If you have an idea for how to improve either of these, or of another data structure for concatenative languages, I'd love to hear it. Please comment.

3 Strictly speaking, this isn't the case in Factor. In fact, a function to test whether the stack is empty is trivial to implement:

: stack-empty ( -- ? )
datastack empty? ;

But it's a bad idea to use this, since words should only care about their arguments, and they should take a fixed (or easy to determine, in the case of certain macros) number of arguments.

4 Here's how to get and open Factor. If you're on Windows or Mac, download the appropriate package from the Factor website and unpack it somewhere. On either platform, double click on the F icon, and a Factor listener window will pop up. On Linux or other Unixes, download the source, then make with the appropriate target (run without arguments to see options), and run ./factor. See the readme for more details. No matter how you get it started, once you have the listener open, you can just type in words and press enter to execute them.

5 The structure of stack comments is fairly simple. Before the -- is the inputs, and after is the outputs. Both sides are written from the bottom of the stack to the top of the stack. For example, ( a b -- c d ) indicates a word which takes a second from the top of the stack, and b on the top of the stack, pops those off, and returns c second from the top and
d
on the top. The names should indicate something about the arguments, ideally more informative than their type (though that is often used). If the same name is used on the left and right sides of the --, that implies that the same object exists before and after the function is called. This occurs most often in stack shufflers.

6 Actually, there's a set of two combinators, cake and k, which can do anything, including any stack shuffle, and call, and curry. They're described in a fun paper by Brent Kerby, The Theory of Concatenative Combinators.




Note: Another way to look at the stack shufflers is through a functional implementation of stacks in terms of linked lists with pattern matching. We'll call the front of the list the top of the stack Here's a few stack shufflers in Haskell, a language where : means cons:

drop (x:stack) = stack
swap (x:y:stack) = y:x:stack
dup (x:stack) = x:x:stack
over (x:y:stack) = y:x:y:stack
tuck (x:y:stack) = x:y:x:stack
rot (x:y:z:stack) = z:x:y:stack
dip f (x:stack) = x:f (stack)
keep f stack = dip f (dup stack)

This could be the basis for a dynamically typed concatenative language in Haskell, though it wouldn't work with static typing very well. (You could use nested tuples, with statically known types, are used instead. This requires a whole bunch of polymorphism and type classes. But with this, it is impossible to have a statically unknown stack depth. This makes it impossible to do certain types of recursion.)

Another way to think of it in Haskell is in terms of higher order functions:

drop f x = f -- same as const
swap f x y = f y x -- same as flip
dup f x = f x x

This, in my mind, more closely resembles the real data flow manipulation involved here. However, they are hard to compose in real programs.

7 comments:

wtanksley said...

You mention experimental queue-based languages; I did some exploring in that area, and I think they're more interesting than you indicate.

A 'normal' stack-based concatenative language actually isn't purely stack based. It has a stack, plus another unspecified data structure that holds the code. A stack+queue language specifies that code being executed is held on a queue.

Check out http://www.nsl.com/k/xy/xy.htm for an example.

Unknown said...

Toy languages which use a queue rather than a stack are basically impossible to use. XY is a cool idea, but I think it'd be difficult to efficiently compile it (though you could still get reasonable performance if you use the optimized K functions a lot). But research is always interesting, and new models of concatenative languages are never a bad thing. Sorry if I was a little off-putting.

Logan Capaldo said...

I think the stack effect comment for dupd is incorrect. it should be ( x y -- x y y ) no?

Slava Pestov said...

Logan, the stack effect is correct. In a stack effect, the top of the stack is at the right.

Logan Capaldo said...

I think I have stack effect dyslexia. This is not the first time I've made that mistake ;)

Anonymous said...

I had the same idea and wrote a forth like syntax in lisp, with functions taking lists and returning lists. it's in my blog, but it's in spanish.

Unknown said...

anonymous,

I know a little bit of Spanish, and Google knows more. Could you give me a link to your blog? Joy represents stacks as linked lists, so I wonder how similar the two ideas are. It's an interesting comment.