Sequences are a truly ubiquitous abstraction in Clojure. The primary motivation behind using sequences is that any domain with sequence-like data in it can be easily modelled using the standard functions that operate on sequences. This well-known quote from the Lisp world, adapted from one of Alan Perlis's epigrams, reflects this design:
"It is better to have 100 functions operate on one data abstraction than 10 functions on 10 data structures."
A sequence can be constructed using the cons function. We must provide an element and another sequence as arguments to the cons function. The first function is used to access the first element in a sequence, and similarly the rest function is used to obtain the other elements in the sequence, shown as follows:
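A minimal REPL-style sketch of cons, first, and rest. The name xs is illustrative, and the ;=> comments show the values returned at the REPL:

```clojure
;; xs is an illustrative name: a sequence built from nested cons calls
(def xs (cons 1 (cons 2 (cons 3 ()))))

xs          ;=> (1 2 3)
(first xs)  ;=> 1
(rest xs)   ;=> (2 3)

;; the second argument may be any seqable collection, such as a vector
(cons 0 [1 2]) ;=> (0 1 2)
```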
Note
The first and rest functions in Clojure are equivalent to the car and cdr functions, respectively, from traditional Lisps. The cons function retains its traditional name.
In Clojure, an empty list is represented by the literal (). An empty list is considered a truthy value and does not equate to nil. This rule is true for any empty collection. An empty list does indeed have a type: it's a list. On the other hand, the nil literal signifies the absence of a value of any type, and is not a truthy value. The second argument that is passed to cons could be empty, in which case the resulting sequence would contain a single element:
An interesting quirk is that nil can be treated as an empty collection, but the converse is not true. We can use the empty? and nil? functions to test for an empty collection and a nil value, respectively. Note that (empty? nil) returns true, shown as follows:
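The following sketch uses illustrative values. It contrasts the empty list with nil in a conditional, shows cons onto an empty list, and applies empty? and nil? to both:

```clojure
;; An empty list is truthy and is not equal to nil
(if () :truthy :falsey)  ;=> :truthy
(if nil :truthy :falsey) ;=> :falsey
(= () nil)               ;=> false

;; cons onto an empty list yields a single-element sequence
(cons 1 ())              ;=> (1)

;; nil counts as empty, but an empty list is not nil
(empty? nil)             ;=> true
(empty? ())              ;=> true
(nil? ())                ;=> false
(nil? nil)               ;=> true
```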
Note
By a truthy value, we mean a value that tests positive in a conditional expression such as an if or a when form. In Clojure, every value except nil and false is truthy.
The rest function will return an empty list when supplied an empty list. Thus, the value returned by rest is always truthy. The seq function can be used to obtain a sequence from a given collection. It will return nil for an empty list or collection. Hence, the first, rest, and seq functions can be used to iterate over a sequence. The next function can also be used for iteration, and the expression (seq (rest coll)) is equivalent to (next coll), shown as follows:
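The sketch below first shows rest, seq, and next. It then defines a loop that walks a collection using only first, rest, and seq. The sum-all function is a hypothetical helper written for illustration, not a standard function:

```clojure
(rest ())        ;=> ()   ; still an empty, truthy list
(rest [1])       ;=> ()
(seq [])         ;=> nil
(seq [1 2])      ;=> (1 2)
(next [1])       ;=> nil
(seq (rest [1])) ;=> nil  ; same result as (next [1])

;; sum-all is a hypothetical helper that walks a collection
;; using only first, rest, and seq
(defn sum-all [coll]
  (loop [s (seq coll), acc 0]
    (if s
      (recur (seq (rest s)) (+ acc (first s)))
      acc)))

(sum-all [1 2 3 4]) ;=> 10
```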
The sequence function can be used to coerce a collection into a sequence. Unlike seq, it never returns nil. For example, nil can be converted into an empty list using the expression (sequence nil). In Clojure, the seq? function is used to check whether a value implements the sequence interface, namely clojure.lang.ISeq. Among the standard collection types, only lists implement this interface directly. Other data structures, such as vectors, sets, and maps, have to be converted into a sequence by using the seq function. Hence, among these collections, seq? will return true only for lists (as well as for any sequence obtained by calling seq). Note that the list?, vector?, map?, and set? functions can be used to check the concrete type of a given collection. The behavior of the seq? function with lists and vectors can be described as follows:
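A sketch of sequence and seq? with illustrative values. Note that seq? is true for the result of calling seq on a vector, even though it is false for the vector itself:

```clojure
(sequence nil)       ;=> ()
(seq nil)            ;=> nil

(seq? '(1 2 3))      ;=> true
(seq? [1 2 3])       ;=> false
(seq? (seq [1 2 3])) ;=> true  ; seq returns an ISeq view of the vector

;; checking concrete collection types
(list? '(1 2))       ;=> true
(vector? [1 2])      ;=> true
(map? {:a 1})        ;=> true
(set? #{1 2})        ;=> true
```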
Only lists and vectors provide a guarantee of sequential ordering among elements. In other words, lists and vectors keep their elements in a well-defined positional order that depends only on how they were constructed. This is in contrast to maps and sets, which can reorder their elements as needed. We can use the sequential? function to check whether a collection provides sequential ordering:
The associative? function can be used to determine whether a collection or sequence associates a key with a particular value. Among the standard collections, this function returns true only for maps and vectors:
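The following sketch checks sequential? and associative? against a list, a vector, a set, and a map (illustrative values only):

```clojure
(sequential? '(1 2))  ;=> true
(sequential? [1 2])   ;=> true
(sequential? #{1 2})  ;=> false
(sequential? {:a 1})  ;=> false

(associative? {:a 1}) ;=> true
(associative? [1 2])  ;=> true
(associative? '(1 2)) ;=> false
(associative? #{1 2}) ;=> false
```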
The behavior of the associative? function is fairly obvious for a map, since a map is essentially a collection of key-value pairs. The fact that a vector is also associative is well justified too, as a vector has an implicit key for each element, namely the index of the element in the vector. For example, the [:a :b] vector has two implicit keys, 0 and 1, for the elements :a and :b, respectively. This brings us to an interesting consequence: vectors and maps can be treated as functions that take a single argument, that is a key, and return the associated value, shown as follows:
Although they are not associative, sets are also functions. A set returns its argument if that argument is contained in the set, and nil otherwise, shown as follows:
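A minimal sketch of vectors, maps, and sets used as functions, with illustrative values:

```clojure
([:a :b] 0)       ;=> :a   ; a vector looks up an index
({:a 1 :b 2} :b)  ;=> 2    ; a map looks up a key
({:a 1} :c)       ;=> nil  ; missing key
(#{1 2 3} 2)      ;=> 2    ; a set returns the element if present
(#{1 2 3} 4)      ;=> nil
```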
Now that we have familiarized ourselves with the basics of sequences, let's have a look at the many functions that operate over sequences.
There are several ways to create sequences other than using the cons function. We have already encountered the conj function in the earlier examples of this chapter. The conj function takes a collection as its first argument, followed by any number of arguments to add to the collection. We must note that conj behaves differently for lists and vectors. When supplied a list, the conj function adds the other arguments at the head, or start, of the list, one at a time, so the last argument ends up first. In the case of a vector, the conj function inserts the other arguments at the tail, or end, of the vector:
The concat function can be used to join or concatenate any number of sequences in the order in which they are supplied, shown as follows:
A given sequence can be reversed using the reverse function, shown as follows:
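The sketch below shows conj with a list and a vector, followed by concat and reverse (illustrative values):

```clojure
(conj '(1 2) 3 4)         ;=> (4 3 1 2)   ; lists grow at the head
(conj [1 2] 3 4)          ;=> [1 2 3 4]   ; vectors grow at the tail
(concat [1 2] '(3) [4 5]) ;=> (1 2 3 4 5)
(reverse [1 2 3])         ;=> (3 2 1)
```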
The range function can be used to generate a sequence of values within a given integer range. The most general form of the range function takes three arguments: the start of the range, the end of the range, and the step of the range. The end of the range is exclusive. The step of the range defaults to 1, and the start of the range defaults to 0, as shown here:
We must note that, with a positive step, the range function expects the start of the range to be less than the end of the range. If the start of the range is greater than the end of the range and the step of the range is positive, the range function will return an empty list. For example, (range 15 10) will return (). A negative step can be used to produce a descending range instead. Also, the range function can be called with no arguments, in which case it returns a lazy and infinite sequence starting at 0.
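A sketch of the range arities described above, including a descending range with a negative step (values are illustrative):

```clojure
(range 5)        ;=> (0 1 2 3 4)  ; start defaults to 0, step to 1
(range 5 10)     ;=> (5 6 7 8 9)  ; the end is exclusive
(range 0 10 3)   ;=> (0 3 6 9)
(range 15 10)    ;=> ()
(range 15 10 -2) ;=> (15 13 11)   ; a negative step counts down
(take 3 (range)) ;=> (0 1 2)      ; (range) is lazy and infinite
```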
The take and drop functions can be used to take or drop elements in a sequence. Both functions take two arguments, representing the number of elements to take or drop from a sequence, and the sequence itself, as follows:
To obtain an item at a particular position in the sequence, we should use the nth function. This function takes a sequence as its first argument, followed by the zero-based position of the item to be retrieved as the second argument:
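A sketch of take, drop, and nth on an illustrative vector. The optional third argument to nth is a not-found value. Without it, an out-of-range position throws an IndexOutOfBoundsException:

```clojure
(take 2 [1 2 3 4 5])     ;=> (1 2)
(drop 2 [1 2 3 4 5])     ;=> (3 4 5)
(nth [:a :b :c] 1)       ;=> :b     ; positions are zero-based
(nth [:a :b :c] 5 :none) ;=> :none  ; not-found value instead of an exception
```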
To repeat a given value, we can use the repeat function. This function takes two arguments and repeats the second argument the number of times indicated by the first argument:
The repeat function evaluates the expression passed as its second argument once, and repeats the resulting value. To call a function a number of times, we can use the repeatedly function, as follows:
In this example, the repeat form first evaluates the (rand-int 100) form, before repeating it. Hence, a single value will be repeated several times. Note that the rand-int function simply returns a random integer between 0 (inclusive) and the supplied value (exclusive). On the other hand, the repeatedly function invokes the supplied function a number of times, thus producing a new value every time the rand-int function is called.
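The difference between repeat and repeatedly can be sketched as follows. The names same and fresh are hypothetical. Because the numbers are random, only the shape of the results can be predicted:

```clojure
(repeat 3 :x) ;=> (:x :x :x)

;; same (hypothetical name): (rand-int 100) is evaluated once,
;; so one random number is repeated five times
(def same (repeat 5 (rand-int 100)))

;; fresh (hypothetical name): the function is called five times,
;; so the values will usually differ
(def fresh (repeatedly 5 #(rand-int 100)))

(apply = same) ;=> true
```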
A sequence can be repeated an infinite number of times using the cycle function. As you might have guessed, this function returns a lazy, infinite sequence. The take function can be used to obtain a limited number of values from the resulting infinite sequence, shown as follows:
The interleave function can be used to combine any number of sequences. This function returns a sequence of the first item in each collection, followed by the second item in each collection, and so on. This combination of the supplied sequences is repeated until the shortest sequence is exhausted of values. Hence, we can easily combine a finite sequence with an infinite one to produce another finite sequence using the interleave function:
Another function that performs a similar operation is the interpose function. The interpose function inserts a given element between the adjacent elements of a given sequence:
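A brief sketch of cycle, interleave, and interpose with illustrative values. The interleave call pairs a finite vector with the infinite (range) and still terminates:

```clojure
(take 5 (cycle [1 2]))          ;=> (1 2 1 2 1)
(interleave [:a :b :c] (range)) ;=> (:a 0 :b 1 :c 2)  ; stops with the shortest seq
(interpose :sep [1 2 3])        ;=> (1 :sep 2 :sep 3)
```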
The iterate function can also be used to create an infinite sequence. Note that we have already used the iterate function to create a lazy sequence in Example 1.7. This function takes a function f and an initial value x as its arguments. The value returned by the iterate function will have x as the first element, (f x) as the second element, (f (f x)) as the third element, and so on. We can use the iterate function with any other function that takes a single argument, as follows:
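A sketch showing that iterate yields x itself first, followed by (f x), (f (f x)), and so on:

```clojure
(take 5 (iterate inc 0))      ;=> (0 1 2 3 4)  ; x comes first
(take 4 (iterate #(* 2 %) 1)) ;=> (1 2 4 8)
```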
There are also several functions to convert sequences into different representations or values. One of the most versatile of such functions is the map function. This function maps a given function over a given sequence, that is, it applies the function to each element in the sequence. Also, the value returned by map is lazy. The function to be applied to each element must be the first argument to map, and the sequence on which the function must be applied is the next argument:
Note that map can accept any number of collections or sequences as its arguments. In this case, the resulting sequence is obtained by passing the first items of the sequences as arguments to the given function, then passing the second items of the sequences to the given function, and so on, until any of the supplied sequences is exhausted. For example, we can sum the corresponding elements of two sequences using the map and + functions, as shown here:
The mapv function has the same semantics as map, but eagerly returns a vector instead of a lazy sequence, as shown here:
Another variant of the map function is the map-indexed function. This function expects that the supplied function will accept two arguments: one for the index of a given element and another for the actual element in the sequence:
In this example, the function supplied to map-indexed simply returns its arguments as a vector. An interesting point that we can observe from the preceding example is that a string can be treated as a sequence of characters.
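A sketch of map over one and two collections, mapv, and map-indexed over a string (illustrative values):

```clojure
(map inc [1 2 3])                   ;=> (2 3 4)
(map + [1 2 3] [10 20 30 40])       ;=> (11 22 33)  ; stops at the shorter collection
(mapv inc [1 2 3])                  ;=> [2 3 4]
(map-indexed (fn [i x] [i x]) "ab") ;=> ([0 \a] [1 \b])  ; a string is a seq of chars
```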
The mapcat function is a combination of the map and concat functions. This function maps a given function over a sequence and then concatenates the resulting collections into a single sequence:
In this example, we use the split function from the clojure.string namespace to split a string using a regular expression, shown as #"\d". The split function will return a vector of strings, and hence the mapcat function returns a sequence of strings instead of a sequence of vectors like the map function.
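A sketch contrasting map and mapcat with clojure.string/split. It assumes clojure.string is required under the alias string. The input strings are illustrative:

```clojure
;; the alias string is an assumption for this sketch
(require '[clojure.string :as string])

;; map keeps one vector per input string
(map #(string/split % #"\d") ["a1b" "c2d"])    ;=> (["a" "b"] ["c" "d"])
;; mapcat concatenates those vectors into one sequence
(mapcat #(string/split % #"\d") ["a1b" "c2d"]) ;=> ("a" "b" "c" "d")
```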
The reduce function is used to combine or reduce a sequence of items into a single value. The reduce function requires a function as its first argument and a sequence as its last argument. The function supplied to reduce must accept two arguments. The supplied function is first applied to the first two elements in the given sequence, then to the previous result and the third element in the sequence, and so on until the sequence is exhausted. The reduce function also has a second arity, which accepts an initial value between the function and the sequence. In this case, the supplied function is applied to the initial value and the first element in the sequence as the first step. The reduce function can be considered equivalent to loop-based iteration in imperative programming languages. For example, we can compute the sum of all elements in a sequence using reduce, as follows:
In this example, when the reduce function is supplied an empty collection and no initial value, it returns 0. This is because reduce calls the supplied function with no arguments, and (+) evaluates to 0. Hence, if the collection may be empty and no initial value is given, the supplied function must also accept zero arguments. When an initial value of 1 is supplied along with an empty collection, the reduce function simply returns 1 without calling + at all.
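A sketch of both reduce arities, including the empty-collection cases described above, together with a roughly equivalent explicit loop:

```clojure
(reduce + [1 2 3 4])   ;=> 10
(reduce + 100 [1 2 3]) ;=> 106  ; initial value 100
(reduce + [])          ;=> 0    ; calls (+) with no arguments
(reduce + 1 [])        ;=> 1    ; returns the initial value; + is never called

;; roughly what (reduce + 0 [1 2 3 4]) does, written as an explicit loop
(loop [acc 0, xs [1 2 3 4]]
  (if (seq xs)
    (recur (+ acc (first xs)) (rest xs))
    acc))                ;=> 10
```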
A list comprehension can be created using the for macro. A for form produces a lazy sequence, much like an expression that uses the map function. However, the macro expands into its own lazy-seq based code rather than into a call to map. The for macro needs to be supplied a vector of bindings to any number of collections, and an expression in the body. This macro binds each supplied symbol to each element in its corresponding collection and evaluates the body for each element. Note that the for macro also supports a :let clause to bind local names, and a :when clause to filter out values:
The for macro can also be used over a number of collections. Unlike map, which walks several collections in parallel, for iterates over multiple bindings in a nested fashion, with the rightmost binding varying fastest, as shown here:
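A sketch of for with :let and :when clauses, and of nested iteration over two collections (values are illustrative):

```clojure
(for [x (range 6)
      :let [y (* x x)]   ; bind a local name
      :when (odd? x)]    ; keep only odd x
  y)                     ;=> (1 9 25)

;; two bindings iterate like nested loops
(for [x [1 2] y [:a :b]]
  [x y])                 ;=> ([1 :a] [1 :b] [2 :a] [2 :b])
```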
The doseq macro has semantics similar to that of for, except that it is eager and always returns nil. This macro simply evaluates the body expression for all of the items in the given bindings. This is useful for forcing evaluation of an expression with side effects for all the items in a given collection:
As shown in the preceding example, both the first and second doseq forms return nil. However, the second form prints the value of the expression (* x x), which is a side effect, for all items in the sequence (range 3 7).
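A sketch of the two kinds of doseq forms described above, one without and one with a side effect:

```clojure
;; evaluates (* x x) for each item, discards the results, returns nil
(doseq [x (range 3 7)] (* x x))            ;=> nil

;; prints 9, 16, 25 and 36 on separate lines, then returns nil
(doseq [x (range 3 7)] (println (* x x)))
```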
The into function can be used to easily convert between types of collections. This function requires two collections to be supplied to it as arguments, and returns the first collection filled with all the items in the second collection. For example, we can convert a sequence of vectors into a map, and vice versa, using the into function, shown here:
We should note that the into function is essentially a composition of the reduce and conj functions. As conj is used to fill the first collection, the value returned by the into function will depend on the type of the first collection. The into function will behave similarly to conj with respect to lists and vectors, shown here:
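A sketch of into converting between vectors and maps, and of its conj-like behavior with lists and vectors (illustrative values):

```clojure
(into {} [[:a 1] [:b 2]]) ;=> {:a 1, :b 2}
(into [] {:a 1 :b 2})     ;=> [[:a 1] [:b 2]]
(into '(1 2) [3 4])       ;=> (4 3 1 2)  ; like conj, adds at the head of a list
(into [1 2] '(3 4))       ;=> [1 2 3 4]  ; adds at the tail of a vector
```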
A sequence can be partitioned into smaller ones using the partition, partition-all, and partition-by functions. Both the partition and partition-all functions take two arguments: the number of items n in each partition and the sequence to be partitioned. However, the partition function drops any leftover items that do not fill a complete partition, whereas the partition-all function returns them as a final, shorter partition, shown here:
The partition and partition-all functions also accept a step argument, which defaults to the supplied number of items in each partition, shown as follows:
The partition function also takes a second sequence as an optional argument, which is used to pad the sequence to be partitioned in case there are items that are not partitioned. This padding sequence has to be supplied after the step argument to the partition function. Note that the padding sequence is only used to create a single partition with the items that have not been partitioned, and the rest of the padding sequence is discarded. If the padding sequence is too short, this last partition may contain fewer than n items. Also, the padding sequence is only used if there are any items that have not been partitioned. This can be illustrated in the following example:
In this example, we first provide a padding sequence in the second statement as (range 11 12), which comprises only a single element. In the next statement, we supply a larger padding sequence, as (range 11 15), but only the first item 11 from the padding sequence is actually used. In the last statement, we also supply a padding sequence, but it is never used, as the (range 11) sequence is partitioned into sequences of 3 elements each with a step of 4, which leaves no remaining items.
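The partitioning rules above can be sketched as follows, using (range 11) and padding sequences similar to those mentioned in the text (an illustrative sketch):

```clojure
(partition 3 (range 11))
;=> ((0 1 2) (3 4 5) (6 7 8))            ; 9 and 10 are dropped
(partition-all 3 (range 11))
;=> ((0 1 2) (3 4 5) (6 7 8) (9 10))
(partition 3 2 (range 7))
;=> ((0 1 2) (2 3 4) (4 5 6))            ; explicit step of 2
(partition 3 3 (range 11 12) (range 11))
;=> ((0 1 2) (3 4 5) (6 7 8) (9 10 11))  ; one padding item fills the gap
(partition 3 3 (range 11 15) (range 11))
;=> ((0 1 2) (3 4 5) (6 7 8) (9 10 11))  ; only 11 is taken from the padding
(partition 3 4 (range 11 15) (range 11))
;=> ((0 1 2) (4 5 6) (8 9 10))           ; no leftovers, so no padding is used
```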
The partition-by function requires a function to be supplied to it as the first argument. It partitions the items in the supplied sequence based on the value returned by applying the given function to each element in the sequence. In other words, partition-by starts a new partition whenever the given function returns a new value, as shown here:
In this example, the second statement partitions the given sequence into sequences that each contain a single item, as we have used the identity function, which simply returns its argument. For the [-2 -1 0 1 2] sequence, the identity function returns a new value for each item in the sequence, and hence the resulting partitioned sequences all have a single element.
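A sketch of partition-by with pos? and identity (illustrative values):

```clojure
(partition-by pos? [-2 -1 0 1 2])     ;=> ((-2 -1 0) (1 2))
(partition-by identity [-2 -1 0 1 2]) ;=> ((-2) (-1) (0) (1) (2))
(partition-by identity [1 1 2 2 2 3]) ;=> ((1 1) (2 2 2) (3))
```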
The sort function can be used to change the ordering of elements in a sequence. The general form of this function requires a function to compare items and a sequence of items to sort. The comparison function is optional and defaults to the compare function, whose behavior changes depending on the actual type of the items being compared:
If we intend to apply a particular function to each item in a sequence before performing the comparison in a sort form, we should consider using the sort-by function for a more concise expression. The sort-by function also accepts an optional function to perform the actual comparison, similar to the sort function. The sort-by function can be demonstrated as follows:
In this example, the first and second statements both compare items after applying the first function to each item in the given sequence. The last statement passes the > function to the sort-by function, which returns the reverse of the sequence returned by the first two statements.
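A sketch of sort with and without a comparator, and of sort-by with the first function. The last form shows the equivalent sort call with an explicit comparator:

```clojure
(sort [3 1 2])                           ;=> (1 2 3)
(sort > [3 1 2])                         ;=> (3 2 1)
(sort ["b" "a" "c"])                     ;=> ("a" "b" "c")  ; compare handles strings
(sort-by first [[2 :b] [1 :a] [3 :c]])   ;=> ([1 :a] [2 :b] [3 :c])
(sort-by first > [[2 :b] [1 :a] [3 :c]]) ;=> ([3 :c] [2 :b] [1 :a])

;; the same ordering as the first sort-by form, written with sort
(sort #(compare (first %1) (first %2)) [[2 :b] [1 :a] [3 :c]])
;=> ([1 :a] [2 :b] [3 :c])
```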
Sequences can also be filtered, that is, transformed by removing some elements from the sequence. There are several standard functions to perform this task. The keep function requires a function and a sequence to be passed to it. It applies the given function to each item in the sequence and returns a lazy sequence of the results, omitting any nil results, as shown here:
In this example, the first statement removes all even numbers from the given sequence. In the second statement, the seq function is used to remove all empty collections from the given sequence.
A map or a set can also be passed as the first argument to the keep function, since they can be treated as functions, as shown here:
The filter function can also be used to remove some elements from a given sequence. The filter function expects a predicate function to be passed to it along with the sequence to be filtered. The items for which the predicate function does not return a truthy value are removed from the result. The filterv function is identical to the filter function, except that it returns a vector instead of a lazy sequence:
Both the filter and keep functions have similar semantics. However, the primary distinction is that the filter function returns a subset of the original elements, whereas keep returns a sequence of the non-nil values that are returned by the function supplied to it, as shown in the following example:
Note that in this example, if we passed the odd? function to the keep form, it would return a list of true and false values, as these values are returned by the odd? function. Since false is not nil, keep does not remove it.
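A sketch of keep and filter that also shows the difference between them. filter returns original elements, while keep returns the non-nil results of the function, so false values survive (illustrative values):

```clojure
(keep #(when (odd? %) %) (range 10)) ;=> (1 3 5 7 9)
(keep seq [[1] [] [2 3] ()])         ;=> ((1) (2 3))  ; empty collections give nil
(keep {:a 1 :b 2} [:a :c :b])        ;=> (1 2)        ; a map as the function
(keep #{1 2} [0 1 2 3])              ;=> (1 2)        ; a set as the function

(filter odd? (range 10))  ;=> (1 3 5 7 9)  ; original elements
(filterv odd? (range 10)) ;=> [1 3 5 7 9]
(keep odd? (range 5))     ;=> (false true false true false)  ; results of odd?
```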
Also, a for macro with a :when clause filters its input much like the filter function, and hence a for form can also be used to remove elements from a sequence:
A vector can be sliced using the subvec function. By sliced, we mean that a smaller vector is selected from the original vector depending on the values passed to the subvec function. The subvec function takes a vector as its first argument, followed by the index indicating the start of the sliced vector, and finally an optional index that indicates the end of the sliced vector. The end index is exclusive and defaults to the length of the vector, as shown here:
Maps can be filtered by their keys using the select-keys function. This function requires a map as the first argument and a sequence of keys, typically a vector, as the second argument. The keys passed to this function indicate the key-value pairs to be included in the resulting map, as shown here:
Another way to select a key-value pair from a map is to use the find function, which returns the map entry for a given key, or nil if the key is not present, as shown here:
The take-while and drop-while functions are analogous to the take and drop functions, but require a predicate to be passed to them instead of the number of elements to take or drop. The take-while function takes elements as long as the predicate function returns a truthy value, and similarly the drop-while function drops elements as long as the predicate returns a truthy value:
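A sketch of subvec, select-keys, find, take-while, and drop-while with illustrative data:

```clojure
(subvec [0 1 2 3 4] 2)                 ;=> [2 3 4]
(subvec [0 1 2 3 4] 1 3)               ;=> [1 2]      ; the end index is exclusive
(select-keys {:a 1 :b 2 :c 3} [:a :c]) ;=> {:a 1, :c 3}
(find {:a 1 :b 2} :b)                  ;=> [:b 2]     ; a map entry
(find {:a 1 :b 2} :z)                  ;=> nil
(take-while neg? [-2 -1 0 1 -3])       ;=> (-2 -1)
(drop-while neg? [-2 -1 0 1 -3])       ;=> (0 1 -3)
```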
The lazy-seq and lazy-cat macros are the most elementary constructs to create lazy sequences. The value returned by these macros will always have the type clojure.lang.LazySeq. The lazy-seq macro wraps an expression that produces a sequence, and defers evaluating it until the sequence is first accessed. It is typically used to wrap the second argument of a cons form, so that the rest of the sequence created by the cons form is lazily computed. For example, the lazy-seq macro can be used to construct a lazy sequence representing the Fibonacci sequence, as shown in Example 1.8:
Example 1.8: A lazy sequence created using lazy-seq
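A minimal sketch consistent with the description of fibo-cons that follows. The seeds 0 and 1 are assumed. The BigInt seeds 0N and 1N in the last form are used only to avoid long overflow when many elements are realized:

```clojure
;; sketch of fibo-cons: a is emitted now, the rest is computed lazily
(defn fibo-cons [a b]
  (cons a (lazy-seq (fibo-cons b (+ a b)))))

(take 10 (fibo-cons 0 1))        ;=> (0 1 1 2 3 5 8 13 21 34)
(last (take 10 (fibo-cons 0 1))) ;=> 34

;; realizing many elements does not overflow the stack
(count (take 10000 (fibo-cons 0N 1N))) ;=> 10000
```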
The fibo-cons function requires two initial values, a and b, to be passed to it. It returns a lazy sequence comprising the first value a and a lazily computed expression that uses the next two values in the sequence, that is, b and (+ a b). In this case, the cons form will return a sequence whose rest is lazy, and which can be handled using the take and last functions, as shown here:
Note that the fibo-cons function from Example 1.8 recursively calls itself without an explicit recur form, and yet it does not consume any stack space. This is because the recursive call is wrapped in lazy-seq, and is therefore not evaluated inside the enclosing call. Each step is computed only when the sequence is realized up to that point, after the previous call has already returned. The realized values are allocated on the heap rather than kept on the call stack.
Another way to define a lazy Fibonacci sequence is by using the lazy-cat macro. This macro essentially concatenates all the sequences it is supplied in a lazy fashion. For example, consider the definition of the Fibonacci sequence in Example 1.9:
Example 1.9: A lazy sequence created using lazy-cat
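A minimal sketch consistent with the description of fibo-seq that follows, assuming the initial values are supplied as the vector [0 1]:

```clojure
;; sketch of fibo-seq: each later element is the sum of the sequence
;; and its own rest, computed lazily on demand
(def fibo-seq
  (lazy-cat [0 1] (map + fibo-seq (rest fibo-seq))))

(take 10 fibo-seq) ;=> (0 1 1 2 3 5 8 13 21 34)
(nth fibo-seq 10)  ;=> 55
```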
The fibo-seq variable from Example 1.9 essentially calculates the Fibonacci sequence using a lazy composition of the map, rest, and + functions. Also, fibo-seq is itself a sequence whose initial values are supplied as a sequence, rather than a function that takes the initial values as arguments, like fibo-cons from Example 1.8. We can use the nth function to obtain a number from this sequence as follows:
As shown previously, fibo-cons and fibo-seq are concise and idiomatic representations of the infinite series of numbers in the Fibonacci sequence. Both of these definitions return identical values and do not cause a stack overflow.
An interesting fact is that most of the standard functions that return sequences, such as map and filter, are inherently lazy. Any expression that is built using these functions is lazy, and hence is not evaluated until needed. For example, consider the following expression that uses the map function:
In this example, the println function is not called when we define the xs variable. However, once we try to print it in the REPL, the sequence is evaluated and the numbers are printed out by calling the println function. Note that xs evaluates to (nil nil nil), as the println function always returns nil.
Sometimes, it is necessary to eagerly evaluate a lazy sequence. The doall and dorun functions are used for this exact purpose. The doall function forces evaluation of an entire lazy sequence, along with any side effects of the evaluation, and returns the sequence itself, now fully realized. For example, let's wrap the map expression from the previous example in a doall form, shown as follows:
Now, the numbers are printed out as soon as xs is defined, as we force evaluation using the doall function. The dorun function has similar semantics to the doall function, but it always returns nil. Hence, we can use the dorun function instead of doall when we are only interested in the side effects of evaluating the lazy sequence, and not the actual values in it. Another way to call a function with side effects over all values in a collection is by using the run! function, which must be passed a function to call and a collection. The run! function always returns nil, just like the dorun function.
Now that we are well versed with sequences, let's briefly examine zippers. Zippers are data structures that help in traversing and manipulating trees. In Clojure, any collection that contains nested collections can be treated as a tree. A zipper can be thought of as a structure that holds a tree along with information about a location within it. Zippers are not an extension of trees, but rather a way to traverse and update a tree.
Note
The following namespaces must be included in your namespace declaration for the upcoming examples:
The following examples can be found in src/m_clj/c1/zippers.clj of the book's source code.
We can define a simple tree using vector literals, as shown here:
The vector tree is a tree composed of the nodes :a, [1 2 3], :b, and :c. We can use the vector-zip function to create a zipper from the vector tree as follows:
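A sketch of the tree and root definitions, assuming clojure.zip is required under the alias z:

```clojure
;; the alias z is an assumption for this sketch
(require '[clojure.zip :as z])

(def tree [:a [1 2 3] :b :c])
(def root (z/vector-zip tree))

(z/node root)    ;=> [:a [1 2 3] :b :c]
(z/branch? root) ;=> true  ; vectors are branch nodes in a vector zipper
```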
The variable root defined previously is a zipper and contains location information for traversing the given tree. Note that the vector-zip function is built on the more general zipper function from the clojure.zip namespace, and treats vectors as branch nodes. Similarly, the seq-zip function treats sequences as branch nodes. Hence, for trees that are represented as sequences, we should use the seq-zip function instead. Also, all other functions in the clojure.zip namespace expect their first argument to be a zipper.
To traverse the zipper, we must use the clojure.zip/next function, which returns a zipper at the next location in a depth-first traversal of the tree. We can easily iterate over all the nodes in the zipper using a composition of the iterate and clojure.zip/next functions, as shown here:
As shown previously, the first node of the zipper represents the original tree itself. Also, the zipper will contain some extra information, other than the value contained in the current node, which is useful in navigating across the given tree. In fact, the return value of the next function is also a zipper. Once we have completely traversed the given tree, the next function returns a zipper pointing to the root of the tree, marked as the end of the traversal, which can be detected using the end? function. Note that some information in a zipper has been truncated from the preceding REPL output for the sake of readability.
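A sketch of a complete depth-first walk, assuming the alias z for clojure.zip. Using end? in take-while stops the iteration once next signals the end of the traversal. Mapping node over the zippers hides their extra location data:

```clojure
(require '[clojure.zip :as z]) ; the alias z is an assumption

(def root (z/vector-zip [:a [1 2 3] :b :c]))

(->> root
     (iterate z/next)                  ; zippers in depth-first order
     (take-while (complement z/end?))  ; stop when the walk is complete
     (map z/node))                     ; keep only the node values
;=> ([:a [1 2 3] :b :c] :a [1 2 3] 1 2 3 :b :c)
```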
To navigate to the adjacent nodes in a given zipper, we can use the down, up, left, and right functions. All of these functions return a zipper, or nil if the requested move is not possible, as shown here:
The down, up, left, and right functions change the location of the root zipper in the [:a [1 2 3] :b :c] tree, as shown in the following illustration:
The preceding diagram shows a zipper at three different locations in the given tree. Initially, the location of the zipper is at the root of the tree, which is the entire vector. The down function moves the location to the first (leftmost) child of the current node. The left and right functions move the location of the zipper to other nodes at the same level or depth in the tree. The up function moves the zipper to the parent of the node pointed to by the zipper's current location.
To obtain the node representing the current location of a zipper in a tree, we must use the node function, as follows:
To navigate to the leftmost or rightmost sibling of the current node, we can use the leftmost and rightmost functions, respectively, as shown here:
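A sketch of the navigation functions on the same tree, assuming the alias z for clojure.zip. Each form ends with node to show the value at the resulting location:

```clojure
(require '[clojure.zip :as z]) ; the alias z is an assumption

(def root (z/vector-zip [:a [1 2 3] :b :c]))

(-> root z/down z/node)                        ;=> :a
(-> root z/down z/right z/node)                ;=> [1 2 3]
(-> root z/down z/right z/down z/node)         ;=> 1
(-> root z/down z/right z/right z/left z/node) ;=> [1 2 3]
(-> root z/down z/right z/down z/up z/node)    ;=> [1 2 3]
(-> root z/down z/rightmost z/node)            ;=> :c
(-> root z/down z/rightmost z/leftmost z/node) ;=> :a
(-> root z/down z/left)                        ;=> nil  ; no left sibling
```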
The lefts and rights functions return the nodes that are present to the left and right, respectively, of the current location of a given zipper, as follows:
As the :a node is the leftmost element in the tree, the rights function will return all of its siblings, namely [1 2 3], :b, and :c, when passed a zipper that has :a as the current location. Similarly, the lefts function for the zipper at the :a node will return an empty value, that is nil.
The root function can be used to obtain the root of a given zipper. It will return the original tree used to construct the zipper, including any changes made through the zipper, as shown here:
The path function can be used to obtain the path from the root element of a tree to the current location of a given zipper, as shown here:
In the preceding example, the path of the 1 node in tree is represented by a vector containing the entire tree and the subtree [1 2 3]. This means that to get to the 1 node, we must pass through the root and the subtree [1 2 3].
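A sketch of lefts, rights, root, and path, assuming the alias z for clojure.zip. The names at-a and at-1 are hypothetical names for zippers located at :a and at 1:

```clojure
(require '[clojure.zip :as z]) ; the alias z is an assumption

(def root (z/vector-zip [:a [1 2 3] :b :c]))
(def at-a (z/down root))                   ; hypothetical: zipper at :a
(def at-1 (-> root z/down z/right z/down)) ; hypothetical: zipper at 1

(z/lefts at-a)  ;=> nil
(z/rights at-a) ;=> ([1 2 3] :b :c)
(z/root at-1)   ;=> [:a [1 2 3] :b :c]
(z/path at-1)   ;=> [[:a [1 2 3] :b :c] [1 2 3]]
```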
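Now that we have covered the basics of navigating across trees, let's see how we can modify the original tree. The insert-child function can be used to insert a given element as the leftmost child of the node at the zipper's current location, as follows:
We can also remove a node from the zipper using the remove function. Also, the replace function can be used to replace the node at the zipper's current location. These functions return a new zipper and leave the original tree unchanged. The modified tree can be obtained using the root function: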
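A sketch of insert-child, remove, and replace, assuming the alias z for clojure.zip. Each form calls root at the end to show the resulting tree:

```clojure
(require '[clojure.zip :as z]) ; the alias z is an assumption

(def tree [:a [1 2 3] :b :c])
(def root (z/vector-zip tree))

;; insert-child adds the item as the leftmost child of the current node
(-> root (z/insert-child :x) z/root)                 ;=> [:x :a [1 2 3] :b :c]
(-> root z/down z/right (z/insert-child 0) z/root)   ;=> [:a [0 1 2 3] :b :c]
;; remove deletes the node at the current location
(-> root z/down z/remove z/root)                     ;=> [[1 2 3] :b :c]
;; replace swaps in a new node at the current location
(-> root z/down z/right (z/replace :d) z/root)       ;=> [:a :d :b :c]

tree ;=> [:a [1 2 3] :b :c]  ; the original value is unchanged
```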
One of the most noteworthy examples of tree-like data is XML. Since zippers are great at handling trees, they also allow us to easily traverse and modify XML content. Note that Clojure already provides the xml-seq function to convert XML data into a sequence. However, treating an XML document as a flat sequence has several drawbacks.
One of the main disadvantages of using xml-seq
is that there is no easy way to get to the root of the document from a node if we are iterating over a sequence. Also, xml-seq
only helps us iterate over the XML content; it doesn't deal with modifying it. These limitations can be overcome using zippers, as we will see in the upcoming example.
For example, consider the following XML document:
The document shown above contains countries and cities represented as XML nodes. Each country has a number of cities, and a single city as its capital. Some information, such as the name of the country and a flag indicating whether a city is a capital, is encoded in the XML attributes of the nodes.
Note
The following example expects the XML content shown previously to be present in the resources/data/sample.xml file, relative to the root of your Leiningen project.
Let's define a function to find out all the capital cities in the document, as shown in Example 1.10:
Example 1.10: Querying XML with zippers
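The following is a sketch consistent with the description of Example 1.10 below. It assumes the aliases xml and z for clojure.xml and clojure.zip. It also parses a hypothetical inline document, sample-xml, from an InputStream instead of a file, since clojure.xml/parse accepts a File, an InputStream, or a string naming a URI:

```clojure
(require '[clojure.xml :as xml]   ; aliases are assumptions
         '[clojure.zip :as z])

;; true when node is a <city> element whose capital attribute is "true"
(defn is-capital-city? [node]
  (and (= :city (:tag node))
       (= "true" (:capital (:attrs node)))))

;; source may be a File, an InputStream, or a string naming a URI
(defn find-capitals [source]
  (->> (xml/parse source)                ; map with :tag, :attrs, :content
       z/xml-zip                         ; zipper over the parsed document
       (iterate z/next)                  ; depth-first walk
       (take-while (complement z/end?))  ; stop at the end of the traversal
       (map z/node)                      ; zippers -> nodes
       (filter is-capital-city?)
       (mapcat :content)))               ; text content of each capital

;; sample-xml is a hypothetical document used for illustration
(def sample-xml
  "<countries>
     <country name=\"Germany\">
       <city capital=\"true\">Berlin</city>
       <city>Munich</city>
     </country>
     <country name=\"France\">
       <city>Lyon</city>
       <city capital=\"true\">Paris</city>
     </country>
   </countries>")

(find-capitals
  (java.io.ByteArrayInputStream. (.getBytes sample-xml "UTF-8")))
;=> ("Berlin" "Paris")
```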
Firstly, we must note that the parse function from the clojure.xml namespace reads an XML document and returns a map representing the document. Each node in this map is another map with the :tag, :attrs, and :content keys, associated with the XML node's tag name, attributes, and content, respectively.
In Example 1.10, we first define a simple function, is-capital-city?, to determine whether a given XML node has the city tag, represented as :city. The is-capital-city? function also checks whether the XML node contains the capital attribute, represented as :capital. If the value of the capital attribute of a given node is the "true" string, then the is-capital-city? function returns true.
The find-capitals function performs most of the heavy lifting in this example. This function first parses the XML document present at the supplied path file-path, and then converts it into a zipper using the xml-zip function. We then iterate over the zipper using the next function until the traversal arrives back at the root node, which is checked in the take-while form. We then map the node function over the resulting sequence of zippers using the map function, and apply the filter function to find the capital cities among all the nodes. Finally, we use the mapcat function to obtain the XML content of the filtered nodes and flatten the resulting sequence of vectors into a single sequence.
When supplied a file containing the XML content we described earlier, the find-capitals function returns the names of all capital cities in the document:
As demonstrated previously, zippers are apt for dealing with trees and hierarchical data such as XML. More generally, sequences are a great abstraction for collections and several forms of data, and Clojure provides us with a huge toolkit for dealing with sequences. There are several more functions that handle sequences in the Clojure language, and you are encouraged to explore them on your own.