Recursion and accumulators
Recursion is perhaps the most important pattern in functional programming. Recursive functions are more practical in Haskell than in imperative languages, due to referential transparency and laziness. Referential transparency allows the compiler to optimize the recursion away into a tight inner loop, and laziness means that we don't have to evaluate the whole recursive expression at once.
Next we will look at a few useful idioms related to recursive definitions: the worker/wrapper transformation, guarded recursion, and keeping accumulator parameters strict.
The worker/wrapper idiom
Worker/wrapper transformation is an optimization that GHC sometimes does, but worker/wrapper is also a useful coding idiom. The idiom consists of a (locally defined, tail-recursive) worker function and a (top-level) function that calls the worker. As an example, consider the following naive primality test implementation:
-- file: worker_wrapper.hs
isPrime :: Int -> Bool
isPrime n
    | n <= 1    = False
    | n <= 3    = True
    | otherwise = worker 2
  where
    worker i
        | i >= n       = True
        | mod n i == 0 = False
        | otherwise    = worker (i+1)
Here, isPrime is the wrapper and worker is the worker function. This style has two benefits. First, you can rest assured it will compile into optimal code. Second, the worker/wrapper style is both concise and flexible: notice how we performed preliminary checks in the wrapper before invoking the worker, and how the argument n is conveniently in the worker's scope.
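As a quick sanity check, the wrapper can be exercised directly. Here is a minimal, self-contained sketch that reproduces isPrime from above and filters a small range:

```haskell
-- isPrime reproduced from the text so that the sketch is self-contained.
isPrime :: Int -> Bool
isPrime n
    | n <= 1    = False
    | n <= 3    = True
    | otherwise = worker 2
  where
    worker i
        | i >= n       = True
        | mod n i == 0 = False
        | otherwise    = worker (i+1)

main :: IO ()
main = print (filter isPrime [1..20])  -- [2,3,5,7,11,13,17,19]
```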
Guarded recursion
In strict languages, tail-call optimization is often a concern with recursive functions. A function f is tail-recursive if the result of a recursive call to f is itself the result of f. In a lazy language such as Haskell, tail-call "optimization" is guaranteed by the evaluation strategy. In fact, because evaluation in Haskell normally proceeds only up to WHNF (the outermost data constructor), we have something more general than just tail calls, called guarded recursion. Consider this simple moving average implementation:
-- file: sma.hs
sma :: [Double] -> [Double]
sma (x0:x1:xs) = (x0 + x1) / 2 : sma (x1:xs)
sma xs         = xs
The sma function is not tail-recursive, but it nonetheless won't build up a huge stack like an equivalent in some other language might. In sma, the recursive call is guarded by the (:) data constructor. Evaluating the first element of a call to sma does not yet make a single recursive call to sma. Asking for the second element initiates the first recursive call, the third the second, and so on.
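The payoff is that sma can consume even an infinite list, as long as we only demand a finite prefix of the result. A minimal sketch, reproducing sma from above:

```haskell
-- Guarded recursion in action: each recursive call is guarded by (:),
-- so taking a prefix of the result only forces as much input as needed.
sma :: [Double] -> [Double]
sma (x0:x1:xs) = (x0 + x1) / 2 : sma (x1:xs)
sma xs         = xs

main :: IO ()
main = print (take 4 (sma [1..]))  -- [1.5,2.5,3.5,4.5]
```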
As a more involved example, let's build a reverse Polish notation (RPN) calculator. RPN is a notation where operands precede their operator, so that, for example, (3 1 2 + *) in RPN corresponds to (3 * (1 + 2)). To make our program easier to understand, we wish to separate parsing the input from performing the calculation:
-- file: rpn.hs
data Lex = Number Double Lex
         | Plus Lex
         | Times Lex
         | End

lexRPN :: String -> Lex
lexRPN = go . words
  where
    go ("*":rest) = Times (go rest)
    go ("+":rest) = Plus (go rest)
    go (num:rest) = Number (read num) (go rest)
    go []         = End
The Lex datatype represents a formula in RPN and is similar to the standard list type. The lexRPN function reads a formula from string format into our own datatype. Let's add an evalRPN function, which evaluates a parsed RPN formula:
evalRPN :: Lex -> Double
evalRPN = go []
  where
    go stack         (Number num rest) = go (num : stack) rest
    go (o1:o2:stack) (Plus rest)       = let r = o1 + o2
                                         in r `seq` go (r : stack) rest
    go (o1:o2:stack) (Times rest)      = let r = o1 * o2
                                         in r `seq` go (r : stack) rest
    go [res]         End               = res
We can test this implementation to confirm that it works:
> :load rpn.hs
> evalRPN $ lexRPN "5 1 2 + 4 * *"
60.0
The RPN expression (5 1 2 + 4 * *) is (5 * ((1 + 2) * 4)) in infix, which is indeed equal to 60.
Note how the lexRPN function makes use of guarded recursion when producing the intermediate structure. It reads the input string incrementally and yields the structure one element at a time. The evaluation function evalRPN consumes the intermediate structure from left to right and is tail-recursive, so we keep a minimal amount of state in memory at all times.
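We can make the incremental production concrete: when only the first token is demanded, the rest of the input string is never inspected. A minimal sketch, reproducing the Lex type and lexRPN from above:

```haskell
-- Lex and lexRPN reproduced from the text so the sketch is self-contained.
data Lex = Number Double Lex | Plus Lex | Times Lex | End

lexRPN :: String -> Lex
lexRPN = go . words
  where
    go ("*":rest) = Times (go rest)
    go ("+":rest) = Plus (go rest)
    go (num:rest) = Number (read num) (go rest)
    go []         = End

main :: IO ()
main =
    -- The input beyond the first token is undefined, yet demanding only
    -- the first Number succeeds: lexRPN never forces the rest.
    case lexRPN ("1 " ++ undefined) of
        Number d _ -> print d  -- 1.0
        _          -> putStrLn "unexpected"
```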
Note
Linked lists equipped with guarded recursion (and lazy I/O) actually provide a lightweight streaming facility – for more on streaming see Chapter 6, I/O and Streaming.
Accumulator parameters
In our examples so far, we have encountered a few functions that used some kind of accumulator. mySum2 had an Int that increased on every step. The go worker function in evalRPN passed on a stack (a linked list). The former had a space leak, because we didn't require the accumulator's value until the very end, at which point it had grown into a huge chain of pointers. The latter case was fine, because the stack didn't grow in size indefinitely and the parameter was sufficiently strict in the sense that we didn't unnecessarily defer its evaluation. The fix we applied in mySum2' was to force the accumulator to WHNF at every iteration, even though the result was not, strictly speaking, required in that iteration.
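mySum2 and mySum2' were defined earlier in the chapter; the following is a hypothetical reconstruction of the pattern (the names sumLazy and sumStrict are illustrative, not from the text), contrasting a leaky lazy accumulator with a seq-forced one:

```haskell
-- A lazy accumulator: go builds the thunk (((0 + x1) + x2) + ...) and only
-- collapses it at the very end -- a space leak on long inputs.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- The fix: force the accumulator to WHNF on every iteration with seq,
-- so it stays a plain evaluated Int throughout.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = acc + x
                    in acc' `seq` go acc' xs

main :: IO ()
main = print (sumStrict [1..100])  -- 5050
```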
The final lesson is that you should pay special attention to your accumulator's strictness properties. If the accumulator must always be fully evaluated in order to continue to the next step, then you're automatically safe. But if there is a danger of an unnecessary chain of thunks being constructed due to a lazy accumulator, then adding a seq (or a bang pattern; see Chapter 2, Choose the Correct Data Structures) is more than just a good idea.
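With the BangPatterns extension, the same strictness can be expressed declaratively rather than with explicit seq calls; a minimal sketch (sumBang is an illustrative name):

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang on acc forces it to WHNF before each recursive step,
-- which has the same effect as forcing it explicitly with seq.
sumBang :: [Int] -> Int
sumBang = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs

main :: IO ()
main = print (sumBang [1..100])  -- 5050
```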