Simulating a call stack using arrays
A call stack is a data structure that is built when a program runs. As function calls keep coming in, the information regarding their code is arranged in frames, that is, a frame per call or variable evaluation. And these frames are stacked up. The program execution is then a matter of "unwinding" these frames, that is, after a frame at the top of the stack has been evaluated, it is unstacked and the process resumes at the new frame that is now at the top of the call stack.
Here we will observe a simple rule to unwind: as the execution goes, if we unstack a variable, we store it, and when we encounter a function call to unstack, we store the return value of its call and pass to it the parameters that we've stored so far. The next figure explains this process:
How to do it...
- First of all, let's define our
ns
(namespace) incorporating all Clojure facilities that we will use:(ns recipe4.core (:require [instaparse.core :as insta]) ;;=> To parse our programs (:require [clojure.zip :as z]) ;;=> To walk and process parse trees (:require [clojure.pprint :refer :all]) ;;=> To pretty print results (:require [clojure.walk :as walk])) ;;=> To transform some nodes ;; in our programs' parse trees
- We'll also alias
clojure.pprint/pprint
so that we can easily pretty-print the results of our computations:(def p pprint)
- We'll design a minimal language that will be parsed with
instaparse
.Note
Instaparse (https://github.com/engelberg/instaparse) is a parser generator written in Clojure. Explaining the mechanism of Instaparse is beyond the scope of this book, but you should know it handles context-free grammars (CFG), and generates parse trees of your input programs according to these grammar concepts.
- Our language will only be able to understand function calls. You can think of it as a kind of Lisp, but with no prefix notation; you can write your functions using the old mathematical way in this. Besides, our language is able to understand function declarations. Here is an example of what a program in this language looks like:
decl-fn f(x,y){ plus(x,y); }; plus(f(1,2),f(3,4));
- The functions without declarations are considered
primitive
orlibrary
functions in our programs. - Here is the
instaparse
grammar that is able to parse programs written in our minimal language:(def r4-language "S = ((FN-CALL|FN-DECL) <FN-SEP>)* FN-CALL = <optional-whitespace> ID <optional-whitespace> <left-paren> PARAMS-LIST <right-paren> PARAMS-LIST = <optional-whitespace> (ID|FN-CALL) (<optional-whitespace> <PARAMS-SEP> <optional-whitespace> (ID|FN-CALL))* FN-DECL = <optional-whitespace> 'decl-fn' <whitespace> ID <optional-whitespace> <left-paren> ARGS-LIST <right-paren> <optional-whitespace> <left-curly> FN-DECL-BODY <right-curly> ARGS-LIST = <optional-whitespace> ID (<optional-whitespace> <PARAMS-SEP> <optional-whitespace> ID)* FN-DECL-BODY = (FN-CALL <FN-SEP>)* left-paren = '(' right-paren = ')' left-curly = '{' right-curly = '}' ID = #'[a-zA-Z0-9]+' whitespace = #'\\s+' optional-whitespace = #'\\s*' FN-SEP = <optional-whitespace> ';' <optional-whitespace> PARAMS-SEP = <optional-whitespace> ',' <optional-whitespace>")
- Note that identifiers between angle brackets will not be shown in the parse tree, so there's no use of referring to
white-space
tags, for instance. - Let's see what the parse tree of the program we previously wrote looks like. Issue the following code in your REPL:
(p (insta/parse (insta/parser r4-language) " decl-fn f(x,y){ plus(x,y); }; plus(f(1,2),f(3,4));"))
- After this step, you'll get the following output:
[:S [:FN-DECL "decl-fn" [:ID "f"] [:ARGS-LIST [:ID "x"] [:ID "y"]] [:FN-DECL-BODY [:FN-CALL [:ID "plus"] [:PARAMS-LIST [:ID "x"] [:ID "y"]]]]] [:FN-CALL [:ID "plus"] [:PARAMS-LIST [:FN-CALL [:ID "f"] [:PARAMS-LIST [:ID "1"] [:ID "2"]]] [:FN-CALL [:ID "f"] [:PARAMS-LIST [:ID "3"] [:ID "4"]]]]]]
- Now we'll use the
instaparse
andtransform
functions to provide a more convenient representation of our parsed program.transform
function replaces particular tags in the parse tree, applying a function to the rest of elements in the vector that contains those tags. Here is how we want to transform the parse trees:(defn gen-program [parser program] (into [] (insta/transform {:S (fn [ & args] args) :FN-CALL (fn [fn-id params] [:FN-CALL (fn-id 1) params]) :PARAMS-LIST (fn[& params] (into [] params) ) :FN-DECL (fn [_ decl-fn-id args body] [:FN-DECL (decl-fn-id 1) args body]) :ARGS-LIST (fn [& args] (into [] args)) :FN-DECL-BODY (fn [& body] (into [] body))} (parser program))))
- To better understand what this function does you can refer to its output, which is as follows. Input the following code in to your REPL:
(p (gen-program (insta/parser r4-language) "decl-fn f(x,y){ plus(x,y); }; plus(f(1,2),f(3,4));" ))
- After completing this step, you'll get the following output:
[[:FN-DECL "f" [[:ID "x"] [:ID "y"]] [[:FN-CALL "plus" [[:ID "x"] [:ID "y"]]]]] [:FN-CALL "plus" [[:FN-CALL "f" [[:ID "1"] [:ID "2"]]] [:FN-CALL "f" [[:ID "3"] [:ID "4"]]]]]]
- With this representation of our program, we first need to know which functions are declared:
(defn get-fn-decls [program] (->> program (filter #(= :FN-DECL (get % 0))) ;;=> Take only instructions with :FN-DECL tag (into [])))
- Complementary to this function, we need a function that tells us which instructions (function calls) we have in our program:
(defn get-instructions [program] (->> program (filter #(not= :FN-DECL (get % 0))) ;;=> Take only instructions with no :FN-DECL tag. (into [])))
- Now we will focus on how to translate declared function calls. We need to exchange the reference to such calls with the bodies of declaration, in which we inject the parameters passed along with the call. Let's first see the declaration of a particular function:
(defn get-fn-id-decl [fn-decls fn-id] (->> fn-decls (filter #(= (get % 1) fn-id)) ;;=> Returns the fn-decl that matches the passed fn-id. (first))) ;;=> This function will return 'nil' if there is no ;; declaration found for it.
- Now we are going to implement
call-fn
, which is a function that does the actual translation of a function call using its declaration (if we ever find any) and passed parameters:(defn call-fn [fn-decl fn-call] (let [decl-args-list (fn-decl 2) ;;=> we get the args in the declaration decl-body (fn-decl 3) ;;=> We get the body of the declaration. fn-call-params (fn-call 2)] ;;=> We get the passed parameters (if (not (= (count decl-args-list) (count fn-call-params))) [:arity-error-in-calling (fn-decl 1 )] ;;=> If the count of parameters and args mismatch, we say we have an arity error (let [replacement-map (zipmap decl-args-list fn-call-params)] ;;=> we prepare a replacement map for 'postwalk-replace': ;; zipmap builds a map containing keys from the first seq ;; 'decl-args-list' and vals from the second one 'fn-call-params'. (walk/postwalk-replace replacement-map decl-body))))) ;;=> 'postwalk-replace' will then change in 'decl-body' the ;; arguments 'decl-args-list' by corresponding paramters in ;; 'fn-call-params'
- Next, we will do the actual translation of the declared function calls and leave the non-declared functions as they are, assuming that they are
primitive
orlibrary
functions. This is why we called theexpand-to-primitive-calls
function:(defn expand-to-primitive-calls [program] ;;=> A program generated with 'gen-program' (let [fn-decls (get-fn-decls program) instructions (get-instructions program) ;;=> preparing function declarations and instructions . zpr (z/vector-zip instructions)] ;;=> A zipper to walk instructions. (loop [result instructions ;;=> We initially have our result set to be our instructions. loc (-> zpr z/down)] (if (-> loc z/end?) result ;;=> end of recursion. If no more nodes to visit, we emit result. (let [current-node (-> loc z/node)] ;;=> We store current node (if (= (get current-node 0 :FN-CALL)) ;;=> If it is a function call (if-let [the-decl (get-fn-id-decl fn-decls (get current-node 1))] ;;=> and it has a declaration associated with it (recur (walk/postwalk-replace {(-> loc z/node) (call-fn the-decl current-node)} result ) (-> loc z/next)) ;;=> we recur replacing this current-nod with ;; the function declaration along with the parameters. (recur result (-> loc z/next))) ;;=> else we recur leaving the function as is considering it ;; to be 'primitive'. (recur result (-> loc z/next)))))))) ;;=> or we recur leaving the instruction as is, because here ;; we only have a variable evaluation.
- At this particular point we are able to construct a call stack for an instruction:
(defn a-call-stack [a-call] (let [zpr (z/vector-zip a-call)] ;;=> A zipper to walk our call. (loop [result [] loc (-> zpr z/down)] (if (-> loc z/end?) result ;;=> End of the recursion, we emit result. (let [current-node (-> loc z/node)] ;=> we store the current node. (recur (if (and (not (vector? current-node)) (not= :FN-CALL current-node) (not= :ID current-node)) ;;=> If this is a literal, that is, not a vector, and not a tag, (conj result {(-> loc z/left z/node) current-node}) ;;=> I add it to the stack, along with the node at its left; ;;=> for instance, we'll have {:ID a-value} ;; or {:FN-CALL a value} result) ;; => Else we leave the stack as is. (-> loc z/next))))))) ; and we go to the next node.
- Finally, we will get to construct a stack for every instruction:
(defn program-call-stack [prog] (into [] (map a-call-stack (expand-to-primitive-calls prog))))
Let's see how it works. Type the following in to your REPL:
(p (program-call-stack (gen-program (insta/parser r4-language) "decl-fn f(x,y){ plus(x,y); }; plus(f(1,2),f(3,4)); f(4,5);" )))
The result of this would be:
[[{:FN-CALL "plus"} {:FN-CALL "plus"} {:ID "1"} {:ID "2"} {:FN-CALL "plus"} {:ID "3"} {:ID "4"}] [{:FN-CALL "plus"} {:ID "4"} {:ID "5"}]]
Here, the stack top comes last, as vectors in Clojure are way more efficiently accessed from the tail. This stack would be unwinded as follows:
- This stack processes instruction 1.
- Then it stores the value
4
. - Stores the values
3
,4
. - Stores the value of
plus("3","4)
. - Stores the values of
2
,plus("3","4)
. - Stores the values of
1
,2
,plus("3","4)
. - Stores the values of
plus("1","2")
,plus("3","4)
. - Stores the values of
plus(plus("1","2")
,plus("3","4))
. - Instruction 1 finishes returning
plus(plus("1","2")
,plus("3","4))
. - Then it processes instruction 2.
- Stores the value
5.
- Stores the values
4
,5
. - Stores the value of
plus ("4","5")
. - Instruction 2 finishes returning the value of
plus ("4","5")
.