Operations on collections
This section shows how concisely and expressively collections can be manipulated in Scala.
Transforming collections containing primitive types
The REPL is a great tool to try out the powerful operations that we can apply to the collection elements. Let's go back to our interpreter prompt:
scala> val numbers = List(1,2,3,4,5,6)
numbers: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> val reversedList = numbers.reverse
reversedList: List[Int] = List(6, 5, 4, 3, 2, 1)
scala> val onlyAFew = numbers drop 2 take 3
onlyAFew: List[Int] = List(3, 4, 5)
The drop method discards the first two elements of the list, and the take method keeps only the first three elements of what remains after the drop.
This last command is interesting for two reasons:
- Since every method call is evaluated to an expression, we can chain several method calls at once (here, take is invoked on the result of drop).
- As already stated before, the syntactic sugar added to the Scala syntax makes it equivalent to write numbers drop 2 instead of the more traditional Java-style numbers.drop(2).
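The two points above can be checked side by side. The following is a minimal sketch; the object name InfixSketch and the value names are made up for illustration:

```scala
object InfixSketch {
  val numbers = List(1, 2, 3, 4, 5, 6)

  // Traditional dot notation, as one would write it in Java
  val withDots = numbers.drop(2).take(3)

  // Infix notation: the dots and the parentheses are omitted
  val infix = numbers drop 2 take 3
}
```

Both values hold the same list, since the infix form is only a different way of writing the same two chained method calls.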
Another way of building a list is to use the :: method, generally referred to in the Scala documentation as the "cons operator". This alternative syntax looks like the following expression:
scala> val numbers = 1 :: 2 :: 3 :: 4 :: 5 :: 6 :: Nil
numbers: List[Int] = List(1, 2, 3, 4, 5, 6)
If you are wondering why there is a Nil value at the end of this expression, this is because there is a simple rule in Scala that says that a method whose last character is : (that is, a colon) is applied on its right side rather than the left side (such a method is called right-associative). So, the evaluation of 6 :: Nil is not equivalent to 6.::(Nil) in that case, but rather to Nil.::(6). We can demonstrate this in the REPL as follows:
scala> val simpleList = Nil.::(6)
simpleList: List[Int] = List(6)
The evaluation of 5 :: 6 :: Nil is therefore done by applying the :: method on the simpleList that we saw earlier, which is List(6):
scala> val twoElementsList = List(6).::(5)
twoElementsList: List[Int] = List(5, 6)
In this case, 5 was prepended to the list, in front of 6. Repeating this operation several times will give you the final List(1,2,3,4,5,6).
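To make the right-to-left evaluation concrete, here is a small sketch that builds the same three-element list in three ways. The object and value names are illustrative only:

```scala
object ConsSketch {
  // Operator notation: each :: is applied to the list on its right
  val viaOperators = 1 :: 2 :: 3 :: Nil

  // The same expression with the implicit grouping made explicit
  val viaParentheses = 1 :: (2 :: (3 :: Nil))

  // The same list, spelling out the method calls from right to left
  val viaMethodCalls = Nil.::(3).::(2).::(1)
}
```

All three values are equal to List(1, 2, 3); the first form is simply the most readable.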
This convenient way of expressing lists is not just for simple values such as integers but can be applied to any type. Moreover, we can concatenate two List instances by using the ::: method in a similar way:
scala> val concatenatedList = simpleList ::: twoElementsList
concatenatedList: List[Int] = List(6, 5, 6)
We can even mix elements of various types in the same List, for example, integers and Booleans, as shown in the following code snippet:
scala> val things = List(0,1,true)
things: List[AnyVal] = List(0, 1, true)
However, as you probably noticed, the element type AnyVal chosen by the compiler in that case is the nearest common supertype of integers and Booleans in the type hierarchy. As a consequence, retrieving only the Boolean element (at index two in the list) will return a value typed as AnyVal rather than as Boolean:
scala> things(2)
res6: AnyVal = true
Now, if we put an element of type String within the list as well, we will get a different common type:
scala> val things = List(0,1,true,"false")
things: List[Any] = List(0, 1, true, false)
The reason for that can be directly visualized by looking at the hierarchy of Scala types. Classes representing primitive types such as Int, Byte, Boolean, or Char are value types and extend scala.AnyVal, whereas String, Vector, List, or Set are reference types and extend scala.AnyRef. Both AnyVal and AnyRef are subclasses of the common type Any, as shown in the following diagram:
The full hierarchy of Scala types is given in the official Scala documentation at http://docs.scala-lang.org/tutorials/tour/unified-types.html.
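One practical consequence of the widened element type is that the precise type of an element has to be recovered explicitly. The sketch below does so with the standard collect method and a type pattern, neither of which has been introduced in this section yet, so treat it as a preview rather than as the recommended approach. The object and value names are illustrative:

```scala
object CommonTypeSketch {
  val things: List[Any] = List(0, 1, true, "false")

  // collect keeps only the elements matched by the pattern,
  // and the resulting list regains the precise element type
  val booleans: List[Boolean] = things.collect { case b: Boolean => b }
  val ints: List[Int] = things.collect { case i: Int => i }
  val strings: List[String] = things.collect { case s: String => s }
}
```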
Collections of more complex objects
Let's manipulate objects that are more complex than integers. We can, for instance, create some collections of Money instances that we made earlier and experiment with them:
scala> val amounts = List(Money(10,"USD"),Money(2,"EUR"),Money(20,"GBP"),Money(75,"EUR"),Money(100,"USD"),Money(50,"USD"))
amounts: List[Money] = List(Money(10,USD), Money(2,EUR), Money(20,GBP), Money(75,EUR), Money(100,USD), Money(50,USD))
scala> val first = amounts.head
first: Money = Money(10,USD)
scala> val amountsWithoutFirst = amounts.tail
amountsWithoutFirst: List[Money] = List(Money(2,EUR), Money(20,GBP), Money(75,EUR), Money(100,USD), Money(50,USD))
Filter and partition
Filtering elements of a collection is one of the most common operations and can be written as follows:
scala> val euros = amounts.filter(money => money.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
The parameter given to the filter method is a function that takes a Money item as the input and returns a Boolean value (that is, a predicate), which is the result of evaluating money.currency=="EUR".
The filter method iterates over the collection items and applies the function to each element, keeping only the elements for which the function returns true. Such function literals are called lambda expressions or anonymous functions, because the function itself has no name. The name of its input argument is arbitrary: we could use, for example, x instead of the money used previously, and still get the same output:
scala> val euros = amounts.filter(x => x.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
A slightly shorter way of writing this one-liner uses the _ sign, a character that one encounters often when reading Scala code and that might seem awkward to a Java developer at first sight. It simply means "that thing", or "the current element". It can be thought of as the blank to be filled in on a paper form. Other languages that deal with anonymous functions provide an implicit parameter name for the same purpose, such as it in Groovy. The previous lambda example can be rewritten with the short underscore notation as follows:
scala> val euros = amounts.filter(_.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
A filterNot method also exists to keep the elements for which the function returns false. Moreover, a partition method is available to combine both the filter and filterNot methods into one single call that returns two collections, one containing the elements for which the function returns true and the other its complement, as shown in the following code snippet:
scala> val allAmounts = amounts.partition(amt =>
     |   amt.currency=="EUR")
allAmounts: (List[Money], List[Money]) = (List(Money(2,EUR), Money(75,EUR)),List(Money(10,USD), Money(20,GBP), Money(100,USD), Money(50,USD)))
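The relationship between the three methods can be sketched as follows. The Money case class is redefined here only to keep the example self-contained; its fields (an Int amount and a String currency) are an assumption based on the REPL output above, and the names PartitionSketch and isEuro are made up for illustration:

```scala
object PartitionSketch {
  // Assumed shape of the Money class defined earlier in the book
  case class Money(amount: Int, currency: String)

  val amounts = List(Money(10, "USD"), Money(2, "EUR"), Money(20, "GBP"), Money(75, "EUR"))

  // The predicate is stored once so that all three calls share it
  val isEuro: Money => Boolean = _.currency == "EUR"

  val euros = amounts.filter(isEuro)
  val others = amounts.filterNot(isEuro)

  // partition returns both results at once, as a pair
  val both = amounts.partition(isEuro)
}
```

The pair returned by partition contains exactly the two lists produced by filter and filterNot.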
Dealing with tuples
Notice the return type of the partition result, (List[Money],List[Money]). Scala supports the concept of tuples. The preceding parenthesis notation denotes a Tuple type, which is part of the standard Scala library and is useful for manipulating several elements at once without having to create a more complex type to encapsulate them. In our case, allAmounts is a Tuple2 pair containing two lists of Money. To access only one of the two collections, we just need to type the following expressions:
scala> val euros = allAmounts._1
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
scala> val everythingButEuros = allAmounts._2
everythingButEuros: List[Money] = List(Money(10,USD), Money(20,GBP), Money(100,USD), Money(50,USD))
A cleaner and more natural syntax assigns both parts of the partition result in a single line, without referring to ._1 and ._2, as shown in the following code snippet:
scala> val (euros,everythingButEuros) = amounts.partition(amt =>
     |   amt.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
everythingButEuros: List[Money] = List(Money(10,USD), Money(20,GBP), Money(100,USD), Money(50,USD))
This time, as a result, we get two variables, euros and everythingButEuros, which we can reuse individually:
scala> euros
res2: List[Money] = List(Money(2,EUR), Money(75,EUR))
Introducing Map
Another elegant usage of tuples is related to the definition of a Map collection, another structure that is part of the Scala collections. Similar to Java, the Map collection stores key-value pairs. In Java, a trivial HashMap definition that populates and retrieves elements of a Map collection with a couple of values can be written with a few lines of code:
import java.util.HashMap;
import java.util.Map;

public class MapSample {
    public static void main(String[] args) {
        // Typed declaration, so no casts are needed when reading values
        Map<String, Integer> amounts = new HashMap<>();
        amounts.put("USD", 10);
        amounts.put("EUR", 2);

        Integer euros = amounts.get("EUR");
        Integer pounds = amounts.get("GBP"); // no such key: get returns null

        System.out.println("Euros: " + euros);
        System.out.println("Pounds: " + pounds);
    }
}
Since no amount of GBP currency has been inserted into the Map collection, running this sample will print a null value for the pounds variable:
Euros: 2
Pounds: null
Populating a Map collection in Scala can be elegantly written as follows:
scala> val wallet = Map( "USD" -> 10, "EUR" -> 2 )
wallet: scala.collection.immutable.Map[String,Int] = Map(USD -> 10, EUR -> 2)
The "USD" -> 10 expression is a convenient way of specifying a key-value pair and is equivalent to the definition of a Tuple2[String,Int] object in this case, as illustrated directly in the REPL (which could infer the type automatically):
scala> val tenDollars = "USD" -> 10
tenDollars: (String, Int) = (USD,10)
scala> val tenDollars = ("USD",10)
tenDollars: (String, Int) = (USD,10)
The process of adding and retrieving an element is very straightforward. Since this Map is immutable, the + operator returns a new Map and leaves wallet unchanged:
scala> val updatedWallet = wallet + ("GBP" -> 20)
updatedWallet: scala.collection.immutable.Map[String,Int] = Map(USD -> 10, EUR -> 2, GBP -> 20)
scala> val someEuros = wallet("EUR")
someEuros: Int = 2
However, accessing an element that is not included in the Map collection will throw an exception, as follows:
scala> val somePounds = wallet("GBP")
java.util.NoSuchElementException: key not found: GBP
(followed by a full stacktrace)
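The behaviour shown in the last two snippets can be sketched in one place. The example below assumes the same wallet as above; the names WalletSketch, hadPoundsBefore, hasPoundsNow, and failedLookup are made up for illustration, and scala.util.Try is used only as a way to capture the exception instead of letting it escape:

```scala
import scala.util.Try

object WalletSketch {
  val wallet = Map("USD" -> 10, "EUR" -> 2)

  // + builds a new map; the original wallet is left untouched
  val updatedWallet = wallet + ("GBP" -> 20)

  // contains checks for a key without risking an exception
  val hadPoundsBefore = wallet.contains("GBP")
  val hasPoundsNow = updatedWallet.contains("GBP")

  // Wrapping the lookup in Try captures the NoSuchElementException
  val failedLookup = Try(wallet("GBP"))
}
```

Here wallet still has no GBP entry after the addition, which is why looking it up in the original map fails.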
Introducing the Option construct
A safer way to retrieve an element from the Map collection that was introduced in the previous section is to invoke its get method, which returns an object of type Option instead of throwing an exception (Java has offered a similar type, java.util.Optional, since Java 8). Basically, an Option wraps the result of the lookup into an object that is either None if the key is absent, or Some(value) otherwise. Let's enter this in the REPL:
scala> val mayBeSomeEuros = wallet.get("EUR")
mayBeSomeEuros: Option[Int] = Some(2)
scala> val mayBeSomePounds = wallet.get("GBP")
mayBeSomePounds: Option[Int] = None
A glimpse at pattern matching
Returning an Option instead of throwing an exception makes it convenient to continue handling the flow of an algorithm as an evaluated expression. It not only gives the programmer the freedom to chain operations on Option values without having to check for the existence of a value, but also enables one to handle the two different cases via pattern matching:
scala> val status = mayBeSomeEuros match {
     |   case None => "Nothing of that currency"
     |   case Some(value) => "I have "+value+" Euros"
     | }
status: String = I have 2 Euros
Pattern matching is an essential and powerful feature of the Scala language. We will look at more examples of it later on.
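The chaining of Option values mentioned above can be sketched as follows. The describe helper is hypothetical and not part of the original example; map, filter, and getOrElse are standard methods of Option:

```scala
object OptionChainingSketch {
  val wallet = Map("USD" -> 10, "EUR" -> 2)

  // Hypothetical helper: each step runs only if a value is present,
  // so no explicit existence check is needed along the way
  def describe(currency: String): String =
    wallet.get(currency)
      .filter(amount => amount > 0)
      .map(amount => "I have " + amount + " " + currency)
      .getOrElse("Nothing of that currency")
}
```

Calling describe("EUR") yields "I have 2 EUR", whereas describe("GBP") falls through to the default text because the lookup returns None.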
The filter and partition methods were just two examples of the so-called "higher-order" functions on lists, that is, functions that take another function as a parameter (here, the predicate) or return one as a result.
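Higher-order functions are not limited to the collection library. As a minimal sketch, the hypothetical applyTwice method below takes a function as its first parameter, in the same way that filter takes a predicate:

```scala
object HigherOrderSketch {
  // Takes a function f and applies it twice to x
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  val addThree = (x: Int) => x + 3

  // Passing a named function value
  val viaNamedFunction = applyTwice(addThree, 10)

  // Passing an anonymous function with the underscore notation
  val viaLambda = applyTwice(_ * 2, 5)
}
```

The first call evaluates to 16 and the second to 20.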
The map method
Among the collection methods that cannot be overlooked is the map method (not to be confused with the Map object). Basically, it applies a function to every element of a collection, but instead of returning Unit as the foreach method does, it returns a collection of a similar container type (for example, mapping over a List returns a List of the same size) that contains the result of transforming each element through the function. A very simple example is shown in the following code snippet:
scala> List(1,2,3,4).map(x => x+1)
res6: List[Int] = List(2, 3, 4, 5)
In Scala, you may define standalone functions as follows:
scala> def increment = (x:Int) => x + 1
increment: Int => Int
We have declared an increment function that takes an Int value as the input (denoted by x) and returns another Int value (x+1).
The previous List transformation can be rewritten in a slightly different manner, as shown in the following code snippet:
scala> List(1,2,3,4).map(increment)
res7: List[Int] = List(2, 3, 4, 5)
Using a bit of syntactic sugar, the . sign in the method call, as well as the parentheses around the function parameter, can be omitted for readability, which leads to the following concise one-liner:
scala> List(1,2,3,4) map increment
res8: List[Int] = List(2, 3, 4, 5)
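As a small sketch of the points above, the example below applies map to different containers and passes both a function value and an ordinary method. The names MapSketch and incrementMethod are made up for illustration:

```scala
object MapSketch {
  // A function value, usable wherever an Int => Int is expected
  val increment = (x: Int) => x + 1

  // An ordinary method; the compiler converts it to a function when it is passed to map
  def incrementMethod(x: Int): Int = x + 1

  // The container type is preserved: List in, List out; Vector in, Vector out
  val fromList = List(1, 2, 3, 4).map(increment)
  val fromVector = Vector(1, 2, 3, 4).map(incrementMethod)

  // The element type may change: here String elements become Int lengths
  val lengths = List("USD", "euro").map(_.length)
}
```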
Going back to our initial list of the Money amounts, we can, for example, transform them into strings as follows:
scala> val printedAmounts =
     |   amounts map(m=> ""+ m.amount + " " + m.currency)
printedAmounts: List[String] = List(10 USD, 2 EUR, 20 GBP, 75 EUR, 100 USD, 50 USD)
Looking at String Interpolation
In Java, concatenating strings using a + operator, as we did in the previous snippet, is a very common operation. In Scala, a more elegant and efficient way to deal with the presentation of strings is a feature named String Interpolation. Available since Scala Version 2.10, this syntax involves prepending an s character to the string literal, as shown in the following code snippet:
scala> val many = 10000.2345
many: Double = 10000.2345
scala> val amount = s"$many euros"
amount: String = 10000.2345 euros
Any variable in scope can be processed and embedded in a string. Formatting can be made even more precise by using an f interpolator instead of s. In that case, the syntax follows the same style as that of the printf method of other languages, where, for instance, %4d formats an integer in a field at least four characters wide, and %12.2f formats a floating-point number with two digits after the decimal point in a field at least twelve characters wide, padded with leading spaces:
scala> val amount = f"$many%12.2f euros"
amount: String = "    10000.23 euros"
Moreover, the String Interpolation syntax enables us to embed the full evaluation of an expression, that is, a full block of code performing a calculation. The following is an example, where we want to display twice the value of our many variable:
scala> val amount = s"${many*2} euros"
amount: String = 20000.469 euros
The preceding block of code obeys the same rules as any method or function evaluation, meaning that the last statement in the block is the result. Although here we have a very simple computation, it is perfectly valid to include a multiline algorithm if needed.
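The following sketch illustrates both the minimum-width formatting of the f interpolator and a block containing more than one statement. The values count and prices are made up for illustration, and the block is kept on one line with a semicolon for brevity:

```scala
object InterpolationSketch {
  val count = 42

  // %5d pads the integer with leading spaces up to a width of five characters
  val padded = f"$count%5d items"

  val prices = List(10, 2, 20)

  // The block holds two statements; the value of the last one is embedded
  val summary = s"Total: ${ val total = prices.sum; total * 2 } euros"
}
```

Here padded contains three leading spaces before 42, and summary evaluates to "Total: 64 euros".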
Knowing the interpolation syntax, we can rewrite our previous amounts as follows:
scala> val printedAmounts =
     |   amounts map(m=> s"${m.amount} ${m.currency}")
printedAmounts: List[String] = List(10 USD, 2 EUR, 20 GBP, 75 EUR, 100 USD, 50 USD)
The groupBy method
Another convenient operation is the groupBy method that transforms a collection into a Map collection:
scala> val sortedAmounts = amounts groupBy(_.currency)
sortedAmounts: scala.collection.immutable.Map[String,List[Money]] = Map(EUR -> List(Money(2,EUR), Money(75,EUR)), GBP -> List(Money(20,GBP)), USD -> List(Money(10,USD), Money(100,USD), Money(50,USD)))
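A typical follow-up to groupBy is to reduce each group to a single value. The sketch below computes a total per currency by combining groupBy with map and sum. The Money case class is redefined only to keep the example self-contained, and its fields are an assumption based on the REPL output above:

```scala
object GroupBySketch {
  // Assumed shape of the Money class defined earlier in the book
  case class Money(amount: Int, currency: String)

  val amounts = List(Money(10, "USD"), Money(2, "EUR"), Money(20, "GBP"),
    Money(75, "EUR"), Money(100, "USD"), Money(50, "USD"))

  // Step 1: one entry per currency, each holding the matching amounts
  val byCurrency: Map[String, List[Money]] = amounts.groupBy(_.currency)

  // Step 2: replace each group by the sum of its amounts
  val totals: Map[String, Int] =
    byCurrency.map { case (currency, group) => currency -> group.map(_.amount).sum }
}
```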
The foldLeft method
One last method that we would like to introduce here is the foldLeft method, which propagates some state from one element to the next. For instance, to sum elements in a list, you need to accumulate them and keep track of the intermediate counter from one element to the next:
scala> val sumOfNumbers = numbers.foldLeft(0) { (total,element) =>
     |   total + element
     | }
sumOfNumbers: Int = 21
The value 0 given as the first argument to foldLeft is the initial value (which means total=0 when applying the function to the first List element). The (total,element) notation is the parameter list of a two-argument function, where total is the value accumulated so far and element is the current list element. Note, however, that for summation, the Scala API provides a sum method, so the last statement could have been written as follows:
scala> val sumOfNumbers = numbers.sum
sumOfNumbers: Int = 21
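The accumulated state does not have to be a number. As a sketch, the example below uses foldLeft with three different accumulators; the object and value names are illustrative only:

```scala
object FoldLeftSketch {
  val numbers = List(1, 2, 3, 4, 5, 6)

  // Evaluated as (((((0 + 1) + 2) + 3) + 4) + 5) + 6
  val sum = numbers.foldLeft(0)((total, element) => total + element)

  // The accumulator is a list: prepending each element reverses the order
  val reversed = numbers.foldLeft(List.empty[Int])((acc, element) => element :: acc)

  // The accumulator tracks the largest element seen so far
  val largest = numbers.foldLeft(Int.MinValue)((best, element) => math.max(best, element))
}
```

The three results are 21, List(6, 5, 4, 3, 2, 1), and 6 respectively.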