Chapter 3. Data Structures and Manipulation
Most of the time that you spend in programming, you do something to manipulate data. You process properties of data, derive conclusions based on the data, and change the nature of the data. In this chapter, we will take an exhaustive look at various data structures and data manipulation techniques in JavaScript. With the correct usage of these expressive constructs, your programs will be correct, concise, easy to read, and most probably faster. This will be explained with the help of the following topics:
- Regular expressions
- Exact match
- Match from a class of characters
- Repeated occurrences
- Beginning and end
- Backreferences
- Greedy and lazy quantifiers
- Arrays
- Maps
- Sets
- A matter of style
Regular expressions
If you are not familiar with regular expressions, I request you to spend time learning them. Learning and using regular expressions effectively is one of the most rewarding skills that you will gain. During most of the code review sessions, the first thing that I comment on is how a piece of code can be converted to a single line of regular expression (or RegEx). If you study popular JavaScript libraries, you will be surprised to see how ubiquitous RegEx are. Most seasoned engineers rely on RegEx primarily because once you know how to use them, they are concise and easy to test. However, learning RegEx will take a significant amount of effort and time. A regular expression is a way to express a pattern to match strings of text. The expression itself consists of terms and operators that allow us to define these patterns. We'll see what these terms and operators consist of shortly.
In JavaScript, there are two ways to create a regular expression: with a regular expression literal or by constructing an instance of a RegExp object.
For example, if we wanted to create a RegEx that matches the string test exactly, we could use the following RegEx literal:
var pattern = /test/;
RegEx literals are delimited using forward slashes. Alternatively, we could construct a RegExp instance, passing the RegEx as a string:
var pattern = new RegExp("test");
Both of these formats result in the same RegEx being created in the variable pattern. In addition to the expression itself, there are three flags that can be associated with a RegEx:
- i: This makes the RegEx case-insensitive, so /test/i matches not only test, but also Test, TEST, tEsT, and so on.
- g: This matches all the instances of the pattern as opposed to the default of local, which matches the first occurrence only. More on this later.
- m: This enables multiline matching, so that the ^ and $ anchors match at the start and end of each line rather than only of the whole string. This is useful for text such as the value of a textarea element.
These flags are appended to the end of the literal (for example, /test/ig) or passed in a string as the second parameter to the RegExp constructor (new RegExp("test", "ig")).
The following example illustrates the various flags and how they affect the pattern match:
var pattern = /orange/;
console.log(pattern.test("orange")); // true
var patternIgnoreCase = /orange/i;
console.log(patternIgnoreCase.test("Orange")); // true
var patternGlobal = /orange/ig;
console.log(patternGlobal.test("Orange Juice")); // true
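The m flag is easier to see in action. The following is a minimal sketch, assuming a made-up two-line string; it uses the ^ anchor (covered later in this chapter), and the variable names are only illustrative:

```javascript
// Assumed sample input: two lines separated by a newline character.
var text = "first line\nsecond line";

// Without the m flag, ^ anchors only at the very start of the string.
var withoutM = /^second/.test(text);

// With the m flag, ^ also matches right after each line break.
var withM = /^second/m.test(text);

console.log(withoutM); // false
console.log(withM);    // true
```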
It isn't very exciting if we can just test whether the pattern matches a string. Let's see how we can express more complex patterns.
Exact match
Any sequence of characters that's not a special RegEx character or operator represents a character literal:
var pattern = /orange/;
Here we mean o followed by r followed by a followed by n, and so on. We rarely use exact match when using RegEx because a plain string comparison or substring search does the same job. Exact match patterns are sometimes called simple patterns.
Match from a class of characters
If you want to match against a set of characters, you can place the set inside []. For example, [abc] would mean any character a, b, or c:
var pattern = /[abc]/;
console.log(pattern.test('a')); //true
console.log(pattern.test('d')); //false
You can specify that you want to match anything but the listed characters by adding a ^ (caret sign) at the beginning of the set:
var pattern = /[^abc]/;
console.log(pattern.test('a')); //false
console.log(pattern.test('d')); //true
One critical variation of this pattern is a range of values. If we want to match against a sequential range of characters or numbers, we can use the following pattern:
var pattern = /[0-5]/;
console.log(pattern.test(3)); //true
console.log(pattern.test(12345)); //true
console.log(pattern.test(9)); //false
console.log(pattern.test(6789)); //false
console.log(/[0123456789]/.test("This is year 2015")); //true
Special characters such as $ and period (.) either represent matches to something other than themselves or act as operators that qualify the preceding term. In fact, we've already seen how the [, ], -, and ^ characters are used to represent something other than their literal values.
How do we specify that we want to match a literal [ or $ or ^ or some other special character? Within a RegEx, the backslash character escapes whatever character follows it, making it a literal match term. So \[ specifies a literal match to the [ character rather than the opening of a character class expression. A double backslash (\\) matches a single backslash.
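As a small sketch of escaping in practice (the sample strings and variable names here are invented for illustration), note that a pattern passed to the RegExp constructor as a string needs its backslashes doubled, because the string literal consumes one level of escaping:

```javascript
// A backslash turns a special character into a literal match.
var literalBracket = /\[/;
var hasBracket = literalBracket.test("arr[0]"); // true
var noBracket = literalBracket.test("arr");     // false

// An unescaped period matches any character except a newline;
// an escaped period matches only a literal ".".
var anyChar = /a.c/.test("abc");      // true
var literalDot = /a\.c/.test("abc");  // false
var literalDot2 = /a\.c/.test("a.c"); // true

// Inside a string passed to RegExp, the backslash itself must be doubled.
var fromString = new RegExp("\\[");
var sameResult = fromString.test("arr[0]"); // true
```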
In the preceding examples, we saw the test() method that returns true or false based on the pattern matched. There are times when you want to access occurrences of a particular pattern. The exec() method comes in handy in such situations.
The exec() method takes a string as an argument and returns an array describing the first match it finds (the matched text, followed by any captured groups), or null if there is no match. Consider the following example:
var strToMatch = 'A Toyota! Race fast, safe car! A Toyota!';
var regExAt = /Toy/;
var arrMatches = regExAt.exec(strToMatch);
console.log(arrMatches);
The output of this snippet would be ['Toy']. Adding the g (global) flag does not make a single exec() call return every match. Instead, the RegEx remembers where the last match ended (in its lastIndex property), so each further call returns the next occurrence:
var strToMatch = 'A Toyota! Race fast, safe car! A Toyota!';
var regExAt = /Toy/g;
console.log(regExAt.exec(strToMatch)); // ["Toy"], found at index 2
console.log(regExAt.exec(strToMatch)); // ["Toy"], found at index 33
console.log(regExAt.exec(strToMatch)); // null, no more matches
Each call returns the next occurrence of Toy from the original text, and null once there are no more. The String object contains the match() method that has similar functionality to the exec() method. The match() method is called on a String object and the RegEx is passed to it as a parameter. Consider the following example:
var strToMatch = 'A Toyota! Race fast, safe car! A Toyota!';
var regExAt = /Toy/;
var arrMatches = strToMatch.match(regExAt);
console.log(arrMatches);
In this example, we are calling the match() method on the String object. We pass the RegEx as a parameter to the match() method. Without the g flag, the result is the same as that of exec(); with the g flag, match() returns an array of all the matched substrings.
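To collect every occurrence of a pattern, two common approaches are a loop over exec() with a global RegEx, or a single match() call with a global RegEx. The following is a minimal sketch of both, reusing the sample sentence from above; the variable names positions, found, and allMatches are illustrative:

```javascript
var strToMatch = 'A Toyota! Race fast, safe car! A Toyota!';

// With the g flag, each exec() call resumes from regEx.lastIndex
// and returns null when no further match exists.
var regEx = /Toy/g;
var positions = [];
var found;
while ((found = regEx.exec(strToMatch)) !== null) {
  positions.push(found.index);
}
console.log(positions); // [ 2, 33 ]

// match() with a global RegEx returns all matched substrings at once.
var allMatches = strToMatch.match(/Toy/g);
console.log(allMatches); // [ 'Toy', 'Toy' ]
```

The exec() loop is the better fit when you need the position of each match; match() is shorter when only the matched text matters.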
The other String object method is replace(). It replaces the first occurrence of a pattern with a different string, or every occurrence if the RegEx has the g flag:
var strToMatch = 'Blue is your favorite color ?';
var regExAt = /Blue/;
console.log(strToMatch.replace(regExAt, "Red")); //Output- "Red is your favorite color ?"
It is possible to pass a function as the second parameter of the replace() method. This callback function takes the matching text as a parameter and returns the text that is used as a replacement:
var strToMatch = 'Blue is your favorite color ?';
var regExAt = /Blue/;
console.log(strToMatch.replace(regExAt, function(matchingText){
  return 'Red';
})); //Output- "Red is your favorite color ?"
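The callback form becomes more useful when the replacement depends on the matched text. Here is a small sketch, using an invented sentence, that also contrasts a non-global and a global RegEx:

```javascript
var sentence = 'blue sky, blue sea';

// Without the g flag, only the first occurrence is replaced.
var firstOnly = sentence.replace(/blue/, 'red');
console.log(firstOnly); // "red sky, blue sea"

// With the g flag, the callback runs once per match and
// receives the matched text as its first argument.
var shouted = sentence.replace(/blue/g, function (matchingText) {
  return matchingText.toUpperCase();
});
console.log(shouted); // "BLUE sky, BLUE sea"
```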
The String object's split() method also takes a RegEx parameter and returns an array containing all the substrings generated after splitting the original string:
var sColor = 'sun,moon,stars';
var reComma = /,/;
console.log(sColor.split(reComma)); //Output - ["sun", "moon", "stars"]
The comma needs no backslash here because it has no special meaning in a RegEx outside a quantifier such as {n,m}. Writing /\,/ behaves the same way, but the escape is redundant.
Using simple character classes, you can match multiple patterns. For example, if you want to match cat, bat, and fat, the following snippet shows you how to use simple character classes:
var strToMatch = 'wooden bat, smelly Cat,a fat cat';
var re = /[bcf]at/gi;
var arrMatches = strToMatch.match(re);
console.log(arrMatches); //["bat", "Cat", "fat", "cat"]
As you can see, this variation opens up possibilities to write concise RegEx patterns. Take the following example:
var strToMatch = 'i1,i2,i3,i4,i5,i6,i7,i8,i9';
var re = /i[0-5]/gi;
var arrMatches = strToMatch.match(re);
console.log(arrMatches); //["i1", "i2", "i3", "i4", "i5"]
In this example, we are matching the numeric part of the string with the range [0-5], which accepts anything from i0 to i5. The string contains i1 to i5, so those are the matches returned. You can also use the negation class ^ to match everything outside the range instead:
var strToMatch = 'i1,i2,i3,i4,i5,i6,i7,i8,i9';
var re = /i[^0-5]/gi;
var arrMatches = strToMatch.match(re);
console.log(arrMatches); //["i6", "i7", "i8", "i9"]
Observe how we are negating only the range clause and not the entire expression.
Several character groups have shortcut notations. For example, the shortcut \d means the same thing as [0-9]:
| Notation | Meaning |
|---|---|
| \d | Any digit character |
| \w | An alphanumeric character (word character) |
| \s | Any whitespace character (space, tab, newline, and similar) |
| \D | A character that is not a digit |
| \W | A non-alphanumeric character |
| \S | A non-whitespace character |
| . | Any character except for newline |
These shortcuts are valuable in writing concise RegEx. Consider this example:
var strToMatch = '123-456-7890';
var re = /[0-9][0-9][0-9]-[0-9][0-9][0-9]/;
var arrMatches = strToMatch.match(re);
console.log(arrMatches); //["123-456"]
This expression definitely looks a bit strange. We can replace [0-9] with \d and make this a bit more readable:
var strToMatch = '123-456-7890';
var re = /\d\d\d-\d\d\d/;
var arrMatches = strToMatch.match(re);
console.log(arrMatches); //["123-456"]
However, you will soon see that there are even better ways to do something like this.
Repeated occurrences
So far, we saw how we can match fixed characters or numeric patterns. Most often, you also want to handle repetition in patterns. For example, if I want to match four a's, I can write /aaaa/, but what if I want to specify a pattern that can match any number of a's?
Regular expressions provide you with a wide variety of repetition quantifiers. Repetition quantifiers let us specify how many times a particular pattern can occur. We can specify fixed values (characters should appear n times) and variable values (characters should appear at least n times and at most m times). The following list shows the various repetition quantifiers:
- ?: Either 0 or 1 occurrence (marks the occurrence as optional)
- *: 0 or more occurrences
- +: 1 or more occurrences
- {n}: Exactly n occurrences
- {n,m}: Between n and m occurrences
- {n,}: At least n occurrences
- {0,n}: 0 to n occurrences (JavaScript does not treat {,n} as a quantifier, so the lower bound must be written explicitly)
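The quantifiers in this list can be tried out with a few short patterns. The following sketch uses invented sample strings and revisits the phone-number style pattern from the previous section in a more compact form:

```javascript
// {n}: exactly n occurrences of the preceding term.
var phone = /\d{3}-\d{3}-\d{4}/;
var phoneMatch = '123-456-7890'.match(phone)[0];
console.log(phoneMatch); // "123-456-7890"

// {n,m}: between n and m occurrences (takes as many as allowed).
var upToThree = 'aaaa'.match(/a{2,3}/)[0];
console.log(upToThree); // "aaa"

// {n,}: at least n occurrences.
var atLeastTwo = 'aaaa'.match(/a{2,}/)[0];
console.log(atLeastTwo); // "aaaa"

// ?: the preceding term is optional.
var optionalU = /colou?r/.test('color') && /colou?r/.test('colour');
console.log(optionalU); // true
```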
In the following example, we create a pattern where the character u is optional (has 0 or 1 occurrence):
var str = /behaviou?r/;
console.log(str.test("behaviour")); // true
console.log(str.test("behavior")); // true
It helps to read the u? part of the /behaviou?r/ expression as 0 or 1 occurrences of character u. The repetition quantifier follows the character that we want to repeat. Let's try out some more examples:
console.log(/'\d+'/.test("'123'")); // true
You should read and interpret the /'\d+'/ expression as follows: ' is a literal character match, \d matches characters [0-9], the + quantifier allows one or more occurrences of the preceding \d, and the closing ' is again a literal character match.
You can also group character expressions using (). Observe the following example:
var heartyLaugh = /Ha+(Ha+)+/i;
console.log(heartyLaugh.test("HaHaHaHaHaHaHaaaaaaaaaaa")); //true
Let's break the preceding expression into smaller chunks to understand what is going on in here:
- H: literal character match
- a+: 1 or more occurrences of character a
- (: start of the expression group
- H: literal character match
- a+: 1 or more occurrences of character a
- ): end of expression group
- +: 1 or more occurrences of expression group (Ha+)
Now it is easier to see how the grouping is done. If we have to interpret the expression, it is sometimes helpful to read out the expression, as shown in the preceding example.
Often, you want to match a sequence of letters or numbers on their own and not just as a substring. This is a fairly common use case when you are matching words that are not just part of any other words. We can specify the word boundaries by using the \b pattern. The word boundary with \b matches the position where one side is a word character (letter, digit, or underscore) and the other side is not. Consider the following examples.
The following is a simple literal match. This match will also be successful if cat is part of a substring:
console.log(/cat/.test('a black cat')); //true
However, in the following example, we define a word boundary by indicating \b before the word cat. This means that we want to match only where cat starts a word and is not preceded by other word characters. The boundary is established before cat, and hence a match is found on the text, a black cat:
console.log(/\bcat/.test('a black cat')); //true
When we use the same boundary with the word tomcat, we get a failed match because there is no word boundary before cat in the word tomcat:
console.log(/\bcat/.test('tomcat')); //false
There is a word boundary after the string cat in the word tomcat, hence the following is a successful match:
console.log(/cat\b/.test('tomcat')); //true
In the following example, we define the word boundary before and after the word cat to indicate that we want cat to be a standalone word with boundaries before and after:
console.log(/\bcat\b/.test('a black cat')); //true
Based on the same logic, the following match fails because there are no boundaries before and after cat in the word concatenate:
console.log(/\bcat\b/.test("concatenate")); //false
The exec() method is useful in getting information about the match found because it returns an object with information about the match. The object returned from exec() has an index property that tells us where the successful match begins in the string. This is useful in many ways:
var match = /\d+/.exec("There are 100 ways to do this");
console.log(match); // ["100"]
console.log(match.index); // 10
Alternatives – OR
Alternatives can be expressed using the | (pipe) character. For example, /a|b/ matches either the a or b character, and /(ab)+|(cd)+/ matches one or more occurrences of either ab or cd.
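A short sketch of alternation follows, with made-up words as input. It also shows how grouping limits the scope of the | operator to part of the pattern:

```javascript
// Either alternative may match anywhere in the string.
var pet = /cat|dog/;
var foundDog = pet.test('hot dog'); // true
var foundNone = pet.test('bird');   // false

// Parentheses confine the alternation to a single position.
var grayOrGrey = /gr(a|e)y/;
var british = grayOrGrey.test('grey');  // true
var american = grayOrGrey.test('gray'); // true
var neither = grayOrGrey.test('groy');  // false
```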
Beginning and end
Frequently, we may wish to ensure that a pattern matches at the beginning of a string or perhaps at the end of a string. The caret character, when used as the first character of the RegEx, anchors the match at the beginning of the string such that /^test/ matches only if the test substring appears at the beginning of the string being matched. Similarly, the dollar sign ($) signifies that the pattern must appear at the end of the string: /test$/.
Using both ^ and $ indicates that the specified pattern must encompass the entire candidate string: /^test$/.
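The three anchored patterns just described can be checked with a few test() calls. This is a minimal sketch with invented input strings:

```javascript
// ^ anchors the match at the beginning of the string.
var startsWith = /^test/.test('testing');    // true
var notAtStart = /^test/.test('a test');     // false

// $ anchors the match at the end of the string.
var endsWith = /test$/.test('a test');       // true

// Both anchors together require the whole string to match.
var wholeString = /^test$/.test('test');     // true
var extraChars = /^test$/.test('testing');   // false
```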
Backreferences
After an expression is evaluated, each group is stored for later use. These values are known as backreferences. Backreferences are created and numbered by the order in which opening parenthesis characters are encountered going from left to right. You can think of backreferences as the portions of a string that are successfully matched against terms in the regular expression.
The notation for a backreference is a backslash followed by the number of the capture to be referenced, beginning with 1, such as \1, \2, and so on.
An example could be /^([XYZ])a\1/, which matches a string that starts with any of the X, Y, or Z characters followed by an a and followed by whatever character matched the first capture. This is very different from /[XYZ]a[XYZ]/. The character following a can't be just any of X, Y, or Z, but must be whichever one of those triggered the match for the first character.
Captured groups can also be referenced in String's replace() method using the special character sequences $1, $2, and so on. Suppose that you want to change the 1234 5678 string to 5678 1234. The following code accomplishes this:
var orig = "1234 5678";
var re = /(\d{4}) (\d{4})/;
var modifiedStr = orig.replace(re, "$2 $1");
console.log(modifiedStr); //outputs "5678 1234"
In this example, the regular expression has two groups each with four digits. In the second argument of the replace() method, $2 is equal to 5678 and $1 is equal to 1234, corresponding to the order in which they appear in the expression.
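The in-pattern form of a backreference (\1) discussed earlier in this section can be sketched as follows. The test strings are invented, and the second pattern is included only to show the contrast with a plain character class:

```javascript
// \1 must repeat exactly what the first group captured.
var sameLetter = /^([XYZ])a\1/;
var repeated = sameLetter.test('XaX');  // true
var different = sameLetter.test('XaY'); // false

// Two independent character classes accept any combination.
var anyLetter = /^[XYZ]a[XYZ]/;
var mixed = anyLetter.test('XaY');      // true
```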
Greedy and lazy quantifiers
All the quantifiers that we discussed so far are greedy. A greedy quantifier first consumes as much of the string as it can. If the rest of the pattern then fails to match, it gives back the last character and reattempts the match. If a match is not found again, another character is given back, and the process is repeated until a match is found or there is nothing left to give back.
The \d+ pattern, for example, will match one or more digits. If your string is 123, the candidate matches are 1, 12, and 123. As \d+ is greedy, it will match as many digits as possible and hence the match would be 123. Similarly, the greedy pattern h.+l would match hell in the string hello, which is the longest possible match.
In contrast to greedy quantifiers, a lazy quantifier matches as few of the quantified tokens as possible. You can add a question mark (?) after a quantifier to make it lazy. The lazy pattern h.+?l would match hel in the string hello, which is the shortest possible match.
The \w*?X pattern will match zero or more word characters and then match an X. However, a question mark after * indicates that as few characters as possible should be matched. For an abcXXX string, the match can be abcX, abcXX, or abcXXX. Which one should be matched? As *? is lazy, as few characters as possible are matched and hence the match is abcX.
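The difference between greedy and lazy matching is easy to verify by comparing the matched text. The following sketch reuses the hello and abcXXX strings from the discussion above; the variable names are illustrative:

```javascript
// Greedy: .+ takes as much as it can, then backs off to find the last l.
var greedyHello = 'hello'.match(/h.+l/)[0];
console.log(greedyHello); // "hell"

// Lazy: .+? takes as little as it can, stopping at the first l.
var lazyHello = 'hello'.match(/h.+?l/)[0];
console.log(lazyHello); // "hel"

// The same contrast with \w* and \w*? before an X.
var greedyX = 'abcXXX'.match(/\w*X/)[0];
var lazyX = 'abcXXX'.match(/\w*?X/)[0];
console.log(greedyX); // "abcXXX"
console.log(lazyX);   // "abcX"
```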
With this necessary information, let's try to solve some common problems using regular expressions.
Removing extra white space from the beginning and end of a string is a very common use case. As a String object did not have the trim() method until recently, several JavaScript libraries provide and use an implementation of string trimming for older browsers that don't have the String.trim() method. The most commonly used approach looks something like the following code:
function trim(str) {
  return (str || "").replace(/^\s+|\s+$/g, "");
}
console.log("--"+trim(" test ")+"--"); //"--test--"
What if we want to replace repeated whitespace characters with a single space?
var re = /\s+/g;
console.log('There   are    a lot      of spaces'.replace(re, ' ')); //"There are a lot of spaces"
In the preceding snippet, we are matching each sequence of one or more whitespace characters and replacing it with a single space.
As you can see, regular expressions can prove to be a Swiss army knife in your JavaScript arsenal. Careful study and practice will be extremely rewarding for you in the long run.
Arrays
An array is an ordered set of values. You can refer to the array elements with a name and index. These are the three ways to create arrays in JavaScript:
var arr = new Array(1,2,3);
var arr = Array(1,2,3);
var arr = [1,2,3];
When these values are specified, the array is initialized with them as the array's elements. An array's length property is equal to the number of arguments. The bracket syntax is called an array literal. It's a shorter and preferred way to initialize arrays.
You have to use the array literal syntax if you want to initialize an array with a single element and the element happens to be a number. If you pass a single number value to the Array() constructor or function, JavaScript considers this parameter as the length of the array, not as a single element:
var arr = [10];
var arr = Array(10); // Creates an array with no element, but with arr.length set to 10
// The above code is equivalent to
var arr = [];
arr.length = 10;
JavaScript does not have an explicit array data type. However, you can use the predefined Array object and its methods to work with arrays in your applications. The Array object has methods to manipulate arrays in various ways, such as joining, reversing, and sorting them. It has a property to determine the array length and other properties for use with regular expressions.
You can populate an array by assigning values to its elements:
var days = [];
days[0] = "Sunday";
days[1] = "Monday";
You can also populate an array when you create it:
var myCustomValue = 42; // placeholder so the example runs; any value works here
var arr_generic = new Array("A String", myCustomValue, 3.14);
var fruits = ["Mango", "Apple", "Orange"];
In most languages, the elements of an array are all required to be of the same type. JavaScript allows an array to contain any type of values:
var arr = [ 'string', 42.0, true, false, null, undefined, ['sub', 'array'], {object: true}, NaN ];
You can refer to elements of an Array using the element's index number. For example, suppose you define the following array:
var days = ["Sunday", "Monday", "Tuesday"];
You then refer to the first element of the array as days[0] and the second element of the array as days[1]. The index of the elements starts with 0.
JavaScript internally stores array elements as standard object properties, using the array index as the property name. The length property is different. The length property always returns the index of the last element plus one. As we discussed, JavaScript array indexes are 0-based: they start at 0, not 1. This means that the length property will be one more than the highest index stored in the array:
var colors = [];
colors[30] = ['Green'];
console.log(colors.length); // 31
You can also assign to the length property. Writing a value that is shorter than the number of stored items truncates the array; writing 0 empties it entirely:
var colors = ['Red', 'Blue', 'Yellow'];
console.log(colors.length); // 3
colors.length = 2;
console.log(colors); // ["Red","Blue"] - Yellow has been removed
colors.length = 0;
console.log(colors); // [] the colors array is empty
colors.length = 3;
console.log(colors); // three empty slots; reading any of them gives undefined
If you query a non-existent array index, you get undefined.
A common operation is to iterate over the values of an array, processing each one in some way. The simplest way to do this is as follows:
var colors = ['red', 'green', 'blue'];
for (var i = 0; i < colors.length; i++) {
  console.log(colors[i]);
}
The forEach() method provides another way of iterating over an array:
var colors = ['red', 'green', 'blue'];
colors.forEach(function(color) {
  console.log(color);
});
The function passed to forEach() is executed once for every item in the array, with the array item passed as the argument to the function. Unassigned values are not iterated in a forEach() loop.
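The point about unassigned values can be seen with a sparse array. The following is a small sketch with an invented array; it compares what forEach() visits with what an index-based for loop visits:

```javascript
// Only indexes 0 and 3 are assigned; 1 and 2 are unassigned slots.
var sparse = [];
sparse[0] = 'red';
sparse[3] = 'blue';

// forEach() skips the unassigned slots.
var visited = [];
sparse.forEach(function (color, index) {
  visited.push(index + ':' + color);
});
console.log(visited); // [ '0:red', '3:blue' ]

// An index-based loop still runs once per index up to length.
var loopCount = 0;
for (var i = 0; i < sparse.length; i++) {
  loopCount++;
}
console.log(loopCount); // 4
```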
The Array object has a bunch of useful methods. These methods allow the manipulation of the data stored in the array.
The concat() method joins two arrays (or appends individual values to an array) and returns a new array:
var myArray = new Array("33", "44", "55");
myArray = myArray.concat("3", "2", "1");
console.log(myArray); // ["33", "44", "55", "3", "2", "1"]
The join() method joins all the elements of an array into a string. This can be useful while processing a list. The default delimiter is a comma (,):
var myArray = new Array('Red','Blue','Yellow');
var list = myArray.join(" ~ ");
console.log(list); //"Red ~ Blue ~ Yellow"
The pop() method removes the last element from an array and returns that element. This is analogous to the pop() method of a stack:
var myArray = new Array("1", "2", "3");
var last = myArray.pop(); // myArray = ["1", "2"], last = "3"
The push() method adds one or more elements to the end of an array and returns the resulting length of the array:
var myArray = new Array("1", "2");
myArray.push("3"); // myArray = ["1", "2", "3"]
The shift() method removes the first element from an array and returns that element:
var myArray = new Array ("1", "2", "3");
var first = myArray.shift(); // myArray = ["2", "3"], first = "1"
The unshift() method adds one or more elements to the front of an array and returns the new length of the array:
var myArray = new Array ("1", "2", "3");
myArray.unshift("4", "5"); // myArray = ["4", "5", "1", "2", "3"]
The reverse() method reverses or transposes the elements of an array: the first array element becomes the last and the last becomes the first:
var myArray = new Array ("1", "2", "3");
myArray.reverse(); // transposes the array so that myArray = [ "3", "2", "1" ]
The sort() method sorts the elements of an array:
var myArray = new Array("A", "C", "B");
myArray.sort(); // sorts the array so that myArray = [ "A","B","C" ]
The sort() method can optionally take a callback function to define how the elements are compared. The function compares two values and returns a negative number, zero, or a positive number, depending on whether the first value should come before, alongside, or after the second. The Array object also has methods for searching. Let us study the following functions:
- indexOf(searchElement[, fromIndex]): This searches the array for searchElement and returns the index of the first match:
var a = ['a', 'b', 'a', 'b', 'a','c','a'];
console.log(a.indexOf('b')); // 1
// Now try again, starting from after the last match
console.log(a.indexOf('b', 2)); // 3
console.log(a.indexOf('1')); // -1, '1' is not found
- lastIndexOf(searchElement[, fromIndex]): This works like indexOf(), but searches backwards:
var a = ['a', 'b', 'c', 'd', 'a', 'b'];
console.log(a.lastIndexOf('b')); // 5
// Now try again, starting from before the last match
console.log(a.lastIndexOf('b', 4)); // 1
console.log(a.lastIndexOf('z')); // -1
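Returning briefly to sort(): the optional compare callback matters most with numbers, because the default ordering compares elements as strings. The following is a minimal sketch with an invented list of numbers; slice() is used only to avoid modifying the original array:

```javascript
var numbers = [10, 9, 1, 100];

// Default sort compares the elements as strings, so "100" sorts before "9".
var defaultOrder = numbers.slice().sort();
console.log(defaultOrder); // [ 1, 10, 100, 9 ]

// A compare function returning negative, zero, or positive
// gives a true numeric ordering.
var numericOrder = numbers.slice().sort(function (a, b) {
  return a - b;
});
console.log(numericOrder); // [ 1, 9, 10, 100 ]
```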
Now that we have covered JavaScript arrays in depth, let me introduce you to a fantastic library called Underscore.js (http://underscorejs.org/). Underscore.js provides a bunch of exceptionally useful functional programming helpers to make your code even more clear and functional.
We will assume that you are familiar with Node.js; in this case, install Underscore.js via npm:
npm install underscore
As we are installing Underscore as a Node module, we will test all the examples by typing them in a .js file and running the file on Node.js. You can install Underscore using Bower also.
Like jQuery's $ module, Underscore comes with a _ module defined. You will call all functions using this module reference.
Type the following code in a text file and name it test_.js:
var _ = require('underscore');
function print(n){
console.log(n);
}
_.each([1, 2, 3], print);
//prints 1 2 3
This can be written as follows, without using the each() function from the Underscore library:
var myArray = [1,2,3];
var arrayLength = myArray.length;
for (var i = 0; i < arrayLength; i++) {
  console.log(myArray[i]);
}
What you see here is a powerful functional construct that makes the code much more elegant and concise. You can clearly see that the traditional approach is verbose. Many languages such as Java suffer from this verbosity. They are slowly embracing functional paradigms. As JavaScript programmers, it is important for us to incorporate these ideas into our code as much as possible.
The each() function we saw in the preceding example iterates over a list of elements, yielding each to an iteratee function in turn. Each invocation of iteratee is called with three arguments (element, index, and list). In the preceding example, the each() function iterates over the array [1,2,3], and for each element in the array, the print function is called with the array element as the parameter. This is a convenient alternative to the traditional looping mechanism to access all the elements in an array.
The range() function creates lists of integers. The start value, if omitted, defaults to 0 and step defaults to 1. If you'd like a negative range, use a negative step:
var _ = require('underscore');
console.log(_.range(10)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
console.log(_.range(1, 11)); //[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
console.log(_.range(0, 30, 5)); //[ 0, 5, 10, 15, 20, 25 ]
console.log(_.range(0, -10, -1)); //[ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9 ]
console.log(_.range(0)); //[]
By default, range() populates the array with integers, but with a little trick, you can populate other data types also:
console.log(_.range(3).map(function () { return 'a' }) ); // [ 'a', 'a', 'a' ]
This is a fast and convenient way to create and initialize an array with values. We frequently do this by traditional loops.
The map() function produces a new array of values by mapping each value in the list through a transformation function. Consider the following example:
var _ = require('underscore');
console.log(_.map([1, 2, 3], function(num){ return num * 3; }));
//[3,6,9]
The reduce() function reduces a list of values to a single value. The initial state (memo) is passed to reduce() as its last argument, and each successive state is returned by the iteratee. The following example shows the usage:
var _ = require('underscore');
var sum = _.reduce([1, 2, 3], function(memo, num){
  console.log(memo, num);
  return memo + num;
}, 0);
console.log(sum);
In this example, the line, console.log(memo,num);, is just to make the idea clear. The output will be as follows:
0 1
1 2
3 3
6
The final output is the sum 1+2+3=6. As you can see, two values are passed to the iteratee function. On the first iteration, we call the iteratee function with two values (0,1): the initial value of memo is the 0 supplied in the call to the reduce() function and 1 is the first element of the list. In the function, we sum memo and num and return the intermediate sum, which is passed to the iteratee function as the memo parameter on the next iteration. Eventually, memo will have the accumulated sum. This concept is important to understand how the intermediate states are used to calculate eventual results.
The filter() function iterates through the entire list and returns an array of all the elements that pass the condition. Take a look at the following example:
var _ = require('underscore');
var evens = _.filter([1, 2, 3, 4, 5, 6], function(num){ return num % 2 == 0; });
console.log(evens); //[ 2, 4, 6 ]
The filter() function's iteratee function should return a truthy or falsy value. The resulting evens array contains all the elements that satisfy the truth test.
The opposite of the filter() function is reject(). As the name suggests, it iterates through the list and ignores elements that satisfy the truth test:
var _ = require('underscore');
var odds = _.reject([1, 2, 3, 4, 5, 6], function(num){ return num % 2 == 0; });
console.log(odds); //[ 1, 3, 5 ]
We are using the same code as the previous example but using the reject() method instead of filter(), so the result is exactly the opposite.
The contains() function is a useful little function that returns true if the value is present in the list; otherwise, it returns false:
var _ = require('underscore');
console.log(_.contains([1, 2, 3], 3)); //true
One very useful function that I have grown fond of is invoke(). It calls a specific function on each element in the list. I can't tell you how many times I have used it since I stumbled upon it. Let us study the following example:
var _ = require('underscore');
console.log(_.invoke([[5, 1, 7], [3, 2, 1]], 'sort')); //[ [ 1, 5, 7 ], [ 1, 2, 3 ] ]
In this example, the sort() method of the Array object is called for each element in the array. Note that this would fail:
var _ = require('underscore');
console.log(_.invoke(["new","old","cat"], 'sort')); //[ undefined, undefined, undefined ]
This is because the sort method is not part of the String object. This, however, would work perfectly:
var _ = require('underscore');
console.log(_.invoke(["new","old","cat"], 'toUpperCase')); //[ 'NEW', 'OLD', 'CAT' ]
This is because toUpperCase() is a String object method and all elements of the list are of the String type.
The uniq()
function returns the array after removing all duplicates from the original one:
var _ = require('underscore'); var uniqArray = _.uniq([1,1,2,2,3]); console.log(uniqArray); //[1,2,3]
The partition() function splits the array into two: one whose elements satisfy the predicate and the other whose elements don't:
var _ = require('underscore');
function isOdd(n){
  return n%2 !== 0;
}
console.log(_.partition([0, 1, 2, 3, 4, 5], isOdd)); //[ [ 1, 3, 5 ], [ 0, 2, 4 ] ]
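As a rough illustration of what partition() does, the following sketch splits a list in a single pass. The partition and isNegative functions here are hypothetical helpers written for this example, not Underscore's implementation.

```javascript
// Hypothetical helper: a simplified stand-in for _.partition.
function partition(list, predicate) {
  var pass = [];
  var fail = [];
  list.forEach(function (item) {
    // Route each element to one of the two result arrays.
    (predicate(item) ? pass : fail).push(item);
  });
  // Elements that satisfy the predicate always come first.
  return [pass, fail];
}

// Hypothetical predicate for the example.
function isNegative(n) {
  return n < 0;
}

var parts = partition([-2, 3, -1, 0, 5], isNegative);
console.log(parts); // [ [ -2, -1 ], [ 3, 0, 5 ] ]
```

The order of the elements inside each half follows their order in the original list.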
The compact() function returns a copy of the array without all falsy values (false, null, 0, "", undefined, and NaN):
var _ = require('underscore');
console.log(_.compact([0, 1, false, 2, '', 3])); //[ 1, 2, 3 ]
This snippet removes all falsy values and returns a new array with the elements [1,2,3]. It is a helpful way to strip out values such as null and undefined that could cause runtime exceptions when the list is processed later.
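The same effect can be sketched without the library by passing the built-in Boolean function to the native filter(). This is an illustrative equivalent under that assumption, not a description of how compact() is implemented.

```javascript
// Assumption: native filter(Boolean) stands in for _.compact here.
var mixed = [0, 1, false, 2, '', 3, null, undefined, NaN];

// Boolean(value) is false for every falsy value, so only truthy ones survive.
var compacted = mixed.filter(Boolean);

console.log(compacted);    // [ 1, 2, 3 ]
console.log(mixed.length); // 9 (the original array is left untouched)
```

Keep in mind that 0 and the empty string are also removed, which may not be what you want when they are legitimate values in your data.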
The without() function returns a copy of the array with all instances of the specific values removed:
var _ = require('underscore');
console.log(_.without([1,2,3,4,5,6,7,8,9,0,1,2,0,0,1,1],0,1,2)); //[ 3, 4, 5, 6, 7, 8, 9 ]
Maps
ECMAScript 6 introduces maps. A map is a collection of key-value pairs that iterates its elements in the order of their insertion. The following snippet shows some methods of the Map type and their usage:
var founders = new Map();
founders.set("facebook", "mark");
founders.set("google", "larry");
founders.size; // 2
founders.get("twitter"); // undefined
founders.has("yahoo"); // false
for (var [key, value] of founders) {
  console.log(key + " founded by " + value);
}
// "facebook founded by mark"
// "google founded by larry"
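Unlike plain objects, a Map accepts any value as a key, including objects, and compares object keys by identity. The sketch below illustrates this together with overwriting and deleting entries; the visits map and the alice and bob objects are hypothetical names used only for this example.

```javascript
var visits = new Map();
var alice = { name: 'alice' }; // hypothetical key objects
var bob = { name: 'bob' };

visits.set(alice, 3);
visits.set(bob, 1);
visits.set(alice, 4); // overwrites the value, keeps alice's original position

console.log(visits.size);       // 2
console.log(visits.get(alice)); // 4
// A different object with the same shape is a different key.
console.log(visits.get({ name: 'alice' })); // undefined

// Keys come back in insertion order.
var order = Array.from(visits.keys()).map(function (user) {
  return user.name;
});
console.log(order); // [ 'alice', 'bob' ]

visits.delete(bob);
console.log(visits.has(bob)); // false
```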
Sets
ECMAScript 6 introduces sets. Sets are collections of values and can be iterated in the order of the insertion of their elements. An important characteristic of sets is that a value can occur only once in a set.
The following snippet shows some basic operations on sets:
var mySet = new Set();
mySet.add(1);
mySet.add("Howdy");
mySet.add("foo");
mySet.has(1); // true
mySet.delete("foo");
mySet.size; // 2
for (let item of mySet) console.log(item);
// 1
// "Howdy"
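Because a value can occur only once, a set is a convenient way to remove duplicates from an array. The following is a small sketch of that idea; the tags data is made up for illustration.

```javascript
// Duplicates in the input are silently dropped.
var tags = new Set(['js', 'regex', 'js', 'map', 'regex']);
console.log(tags.size); // 3

// Adding an existing value has no effect.
tags.add('js');
console.log(tags.size); // 3

// Convert back to an array; the order of first insertion is preserved.
var uniqueTags = Array.from(tags);
console.log(uniqueTags); // [ 'js', 'regex', 'map' ]
```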
We discussed briefly that JavaScript arrays are not really arrays in the traditional sense. In JavaScript, arrays are objects that have the following characteristics:
- The length property
- The functions that inherit from Array.prototype (we will discuss this in the next chapter)
- Special handling for keys that are numeric keys
When we write an array index as a number, it is converted to a string: arr[0] internally becomes arr["0"]. Due to this, there are a few things that we need to be aware of when we use JavaScript arrays:
- Accessing array elements by an index is not guaranteed to be a constant time operation as it is in, say, C. As arrays are conceptually key-value maps, the access time depends on how the engine lays out the elements. Modern engines typically store dense arrays contiguously, so this matters mostly for sparse arrays.
- JavaScript arrays can be sparse, which means that an array can have gaps (indexes that hold no element) in it. To understand this, look at the following snippet:
var testArr=new Array(3);
console.log(testArr);
Depending on the environment, you will see the output as [undefined, undefined, undefined] or as three empty slots. Nothing is actually stored in these elements; undefined is simply the value you get when you read a gap.
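The string keys and the gaps can both be observed directly. The following sketch compares an array created with new Array(3) against one that explicitly stores undefined; the variable names are chosen for illustration only.

```javascript
var arr = ['a', 'b', 'c'];
// Numeric and string indexes refer to the same property.
console.log(arr['0'] === arr[0]); // true
console.log(Object.keys(arr));    // [ '0', '1', '2' ]

var holes = new Array(3);
console.log(holes.length); // 3
console.log(0 in holes);   // false (no property exists at index 0)
console.log(holes[0]);     // undefined (reading a gap yields undefined)

var explicit = [undefined, undefined, undefined];
console.log(0 in explicit); // true (the value undefined is actually stored)

console.log(Object.keys(holes).length);    // 0
console.log(Object.keys(explicit).length); // 3
```

The in operator and Object.keys() are the simplest ways to tell a gap apart from a stored undefined, since reading either one produces the same value.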
Consider the following example:
var testArr=[];
testArr[3] = 10;
testArr[10] = 3;
console.log(testArr);
// [undefined, undefined, undefined, 10, undefined, undefined, undefined, undefined, undefined, undefined, 3]
// (newer consoles print the gaps as empty slots instead)
You can see that there are gaps in this array. Only two indexes hold elements and the rest are gaps that read as undefined. Knowing this helps you avoid a few surprises. For instance, using the for...in loop to iterate over an array can produce unexpected results. Consider the following example:
var a = [];
a[5] = 5;
for (var i=0; i<a.length; i++) {
  console.log(a[i]);
}
// Iterates over numeric indexes from 0 to 5
// [undefined,undefined,undefined,undefined,undefined,5]
for (var x in a) {
  console.log(x);
}
// Shows only the explicitly set index of "5", and ignores 0-4
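The different iteration constructs treat gaps and extra properties differently. The sketch below collects what each one visits for the same sparse array; the label property is a hypothetical addition used to show that for...in also picks up non-index keys.

```javascript
var a = [];
a[5] = 5;
a.label = 'extra'; // hypothetical non-index property on the array object

// for...in visits every enumerable key, as a string, and skips the gaps.
var forInKeys = [];
for (var key in a) {
  forInKeys.push(key);
}
console.log(forInKeys); // [ '5', 'label' ]

// forEach() visits only indexes that actually hold an element.
var forEachValues = [];
a.forEach(function (value) {
  forEachValues.push(value);
});
console.log(forEachValues); // [ 5 ]

// for...of walks every index from 0 to length - 1, reading gaps as undefined.
var forOfValues = [];
for (var item of a) {
  forOfValues.push(item);
}
console.log(forOfValues);
// [ undefined, undefined, undefined, undefined, undefined, 5 ]
```

If you need every index, use a counting loop or for...of; if you only care about stored elements, forEach() is usually the safer choice than for...in.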
A matter of style
As in the previous chapters, we will spend some time discussing style considerations when creating arrays.
- Use the literal syntax for array creation:
// bad
const items = new Array();
// good
const items = [];
- Use Array#push instead of a direct assignment to add items to an array:
const stack = [];
// bad
stack[stack.length] = 'pushme';
// good
stack.push('pushme');
Summary
As JavaScript matures as a language, its tool chain also becomes more robust and effective. It is rare to see seasoned programmers staying away from libraries such as Underscore.js. As we move on to more advanced topics, we will continue to explore more such versatile libraries that can make your code compact, more readable, and performant. We looked at regular expressions, which are first-class objects in JavaScript. Once you start understanding RegExp, you will soon find yourself using more of them to make your code concise. In the next chapter, we will look at JavaScript Object notation and how JavaScript prototypal inheritance is a new way of looking at object-oriented programming.