Mastering Node.js

Understanding the Node Environment

Introduction – JavaScript as a systems language

When John Bardeen, Walter Brattain, and William Shockley invented the transistor in 1947, they changed the world in ways we are still discovering today. From their revolutionary building block, engineers could design and manufacture digital circuits far more complex than those possible earlier. Each decade that followed has seen a new generation of these devices: smaller, faster, and cheaper, often by orders of magnitude.

By the 1970s, corporations and universities could afford mainframe computers small enough to fit in a single room, and powerful enough that they could serve multiple users simultaneously. The minicomputer, a new and different kind of device, needed new and different kinds of technologies to help users get the most out of the machine. Ken Thompson and Dennis Ritchie at Bell Labs developed the operating system Unix, and the programming language C to write it. They built constructs into their system, like processes, threads, streams, and the hierarchical filesystem. Today, these constructs are so familiar, that it's hard to imagine a computer working any other way. However, they're just constructs, made up by these pioneers, with the goal of helping people like us understand the otherwise inscrutable patterns of data in memory and storage inside the machine.

C is a systems language, and it is a safe and powerful shorthand alternative for developers familiar with keying in assembly instructions. Given its familiar setting of a microprocessor, C makes low-level system tasks easy. For instance, you can search a block of memory for a byte of a specific value:

// find-byte.c 
int find_byte(const char *buffer, int size, const char b) {
   for (int i = 0; i < size; i++) {
         if (buffer[i] == b) {
               return i;
         }
   }
   return -1; 
}

By the 1990s, what we could build with transistors had evolved again. A personal computer (PC) was light and cheap enough to be found on workplace and dormitory desktops. Increased speed and capacity allowed users to boot from a character-only teletype to graphical environments, with pretty fonts and color images. And with an Ethernet card and cable, your computer got a static IP address on the internet, where network programs could connect to send and receive data with any other computer on the planet.

It was within this landscape of technology that Sir Tim Berners-Lee invented the World Wide Web, and Brendan Eich created JavaScript. Designed for coders familiar with HTML tags, JavaScript was a way to move beyond static pages of text with animation and interactivity. Given its familiar setting of a webpage, JavaScript makes high-level tasks easy. Web pages are filled with text and tags, so combining two strings is easy:

// combine-text.js
const s1 = "first string";
const s2 = "second string";
let s3 = s1 + s2;

Now, let's port each program to the other language and platform. First, from the preceding combine-text.js, let's write combine-text.c:

// combine-text.c 
const char *s1 = "first string";
const char *s2 = "second string";
int size = strlen(s1) + strlen(s2);
char *buffer = (char *)malloc(size + 1); // One more for the 0x00 byte that terminates strings 
strcpy(buffer, s1);
strcat(buffer, s2);
free(buffer); // Never forget to free memory!

The two string literals are easy to define, but after that, it gets a lot harder. Without automatic memory management, it's your responsibility as a developer to determine how much memory you need, allocate it from the system, write to it without overwriting the buffer, and then free it afterwards.

Secondly, let's attempt the reverse: from the find-byte.c code prior, let's write find-byte.js. Before Node, it was not possible to use JavaScript to search a block of memory for a specific byte. In the browser, JavaScript can't allocate a buffer, and doesn't even have a type for byte. But with Node, it's both possible and easy:

// find-byte.js
function find_byte(buffer, b) {
  let i;
  for (i = 0; i < buffer.length; i++) {
    if (buffer[i] == b) {
      return i;
    }
  }
  return -1; // Not found
}
let buffer = Buffer.from("ascii A is byte value sixty-five", "utf8");
let r = find_byte(buffer, 65); // Find the first byte with value 65
console.log(r); // 6 bytes into the buffer

Emerging from generations of computing decades apart, when both computers and what people were doing with them were wildly different, there's no real reason the design, purpose, or use that drives these two languages, C and JavaScript, should necessarily come together. But they did, because in 2008 Google released Chrome, and in 2009, Ryan Dahl wrote Node.js.

Applying design principles previously only considered for operating systems. Chrome uses multiple processes to render different tabs, ensuring their isolation. Chrome was released open source and built on WebKit, but one part inside was completely new. Coding from scratch in his farmhouse in Denmark, Lars Bak's V8 used hidden class transitions, incremental garbage collection, and dynamic code generation to execute (not interpret) JavaScript faster than ever before.

With V8 under the hood, how fast can Node run JavaScript? Let's write a little program to show execution speed:

// speed-loop.js
function main() {
  const cycles = 1000000000;
  let start = Date.now();
  for (let i = 0; i < cycles; i++) {
    /* Empty loop */
  }
  let end = Date.now();
  let duration = (end - start) / 1000;
  console.log("JavaScript looped %d times in %d seconds", cycles, duration);
}
main();

The following is the output for speed-loop.js:

$ node --version
v9.3.0
$ node speed-loop.js
JavaScript looped 1000000000 times in 0.635 seconds

There's no code in the body of the for loop, but your processor is busy incrementing i, comparing it to cycles, and repeating the process. It's late 2017 as I write this, typing on a MacBook Pro with a 2.8 GHz Intel Core i7 processor. Node v9.3.0 is current, and takes less than a second to loop a billion times.

How fast is pure C? Let's see:

/* speed-loop.c */
#include <stdio.h>
#include <time.h>
int main() {
  int cycles = 1000000000;
  clock_t start, end;
  double duration;
  start = clock();
  for (int i = 0; i < cycles; i++) {
    /* Empty loop */
  }
  end = clock();
  duration = ((double)(end - start)) / CLOCKS_PER_SEC;
  printf("C looped %d times in %lf seconds\n", cycles,duration);
  return 0;
}

The following is the output for speed-loop.c:

$ gcc --version
Apple LLVM version 8.1.0 (clang-802.0.42)
$ gcc speed-loop.c -o speed-loop
$ ./speed-loop
C looped 1000000000 times in 2.398294 seconds

For additional comparison, let's try an interpreted language, like Python:

# speed-loop.py

import time

def main():

  cycles = 1000000000
  start = time.perf_counter()

  for i in range(0, cycles):
    pass # Empty loop

  end = time.perf_counter()
  duration = end - start
  print("Python looped %d times in %.3f seconds" % (cycles, duration))

main()

The following is the output for speed-loop.py:

$ python3 --version
Python 3.6.1
$ python3 speed-loop.py
Python looped 1000000000 times in 31.096 seconds

Node runs code fast enough so that you don't have to worry that your application might be slowed down by the execution speed. You'll still have to think about performance, of course, but constrained by factors beyond language and platform choice, such as algorithms, I/O, and external processes, services, and APIs. As V8 compiles JavaScript rather than interpreting it, Node lets you enjoy high-level language features like automatic memory management and dynamic types, without having to give up the performance of a natively-compiled binary. Earlier, you had to choose one or the other; but now, you can have both. It's great.

Computing in the 1970s was about the microprocessor, and computing in the 1990s was about the web page. Today, in 2017, another new generation of physical computing technology has once again changed our machines. The smartphone in your pocket communicates wirelessly with scalable, pay-as-you-go software services in the cloud. Those services run on virtualized instances of Unix, which in turn run on physical hardware in data centers, some of which are so large they were strategically placed to draw current from a neighboring hydroelectric dam. With such new and different machines as these, we shouldn't be surprised that what's possible for users and what's necessary for developers is also new and different, once again.

Node.js imagines JavaScript as a systems language, like C. On the page, JavaScript can manipulate headers and styles. As a systems language, JavaScript can manipulate memory buffers, processes and streams, and files and sockets. This anachronism, made possible by the performance V8 gives the language, sends it back two decades, transplanting it from the web page to the microprocessor die.

"Node's goal is to provide an easy way to build scalable network programs."

– Ryan Dahl, creator of Node.js

In this book, we will study the techniques professional Node developers use to tackle the software challenges of today. By mastering Node, you are learning how to build the next generation of software. In this chapter, we will explore how a Node application is designed, the shape and texture of its footprint on a server, and the powerful base set of tools and features Node provides for developers. Throughout, we will examine progressively more intricate examples demonstrating how Node's simple, comprehensive, and consistent architecture solves many difficult problems well.

The Unix design philosophy

As a network application scales, the volume of information it must recognize, organize, and maintain increases. This volume, in terms of I/O streams, memory usage, and processor load, expands as more clients connect. This expansion of information volume also burdens the software developer. Scaling issues appear, usually demonstrating a failure to accurately predict the behavior of a large system from the behavior of its smaller predecessors:

Can a data layer designed for storing a few thousand records accommodate a few million?
Are the algorithms used to search a handful of records efficient enough to search many more?
Can this server handle 10,000 simultaneous client connections?

The edge of innovation is sharp and cuts quickly, presenting less time for deliberation precisely when the cost of error is magnified. The shape of objects comprising the whole of an application becomes amorphous and difficult to understand, particularly as ad hoc modifications are made, reactively, in response to dynamic tension in the system. What is described in a specification as a small subsystem may have been patched into so many other systems, that its actual boundaries are misunderstood. When this happens, it becomes impossible to accurately trace the outline of the composite parts of the whole.

Eventually, an application becomes unpredictable. It is dangerous when one cannot predict all future states of an application, or the side effects of change. Any number of servers, programming languages, hardware architectures, management styles, and so on, have attempted to subdue the intractable problem of risk following growth, of failure menacing success. Oftentimes, systems of even greater complexity are sold as the cure. The hold that any one person has on information is tenuous. Complexity follows scale; confusion follows complexity. As resolution blurs, errors happen.

Node chose clarity and simplicity instead, echoing a philosophy from decades earlier:

"Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface."

-Peter H. Salus, A Quarter-Century of Unix, 1994

From their experiences creating and maintaining Unix, Ken Thompson and Dennis Ritchie came up with a philosophy for how people should best build software. Using this philosophy as his guide, Ryan Dahl made a number of decisions in the design of Node:

Node's design favors simplicity over complexity
Node uses familiar POSIX APIs, rather than attempting an improvement
Node does everything with events, and doesn't need threads
Node leverages the existing C libraries, rather than trying to reimplement their functionality
Node favors text over binary formats

Text streams are the language of Unix programs. JavaScript got good at manipulating text from its beginning as a web scripting language. It's a natural fit.

POSIX

POSIX, the Portable Operating System Interface, defines the standard APIs for Unix. It's adopted in Unix-based operating systems and beyond. The IEEE created and maintains the POSIX standard to enable systems from different manufacturers to be compatible. Write your C program using POSIX APIs on your laptop running macOS, and you'll have an easier time later building it on a Raspberry Pi.

As a common denominator, POSIX is old, simple, and most importantly, well-known to developers of all stripes. To make a new directory in a C program, use this API:

int mkdir(const char *path, mode_t mode);

And here it is in Node:

fs.mkdir(path[, mode], callback)

The Node documentation for the filesystem module starts out by telling the developer, there's nothing new here:

File I/O is provided by simple wrappers around standard POSIX functions.
https://nodejs.org/api/fs.html

For Node, Ryan Dahl implemented proven POSIX APIs, rather than trying to come up with something on his own. While such an attempt might be better in some ways, or some situations, it would lose the instant familiarity that POSIX gives to new Node developers trained in other systems.

In choosing POSIX for the API, Node is in no way limited to the standards from the 1970s. It's easy for anyone to write their own module that calls down to Node's API, while presenting a different one upwards. These fancier alternatives can then compete in a Darwinian quest to prove themselves better than POSIX.

Events for everything

If a program asks the operating system to open a file on the disk, that task might complete right away. Or, it might take a moment for the disk to spin up, or for other file system activity the operating system is working on to finish before it can perform this new request. Tasks that go beyond manipulating the memory of our application's process space to more distant hardware in the computer, network, and internet are not fast or reliable enough to program in the same way. Software designers needed a way to code these tasks, which can be slow and unreliable, without making their applications slow and unreliable as a whole. For systems programmers using languages like C and Java, the standard and accepted tool to use to solve this problem is the thread.

pthread_t my_thread;
int x = 0;
/* Make a thread and have it run my_function(&x) */
pthread_create(&my_thread, NULL, my_function, &x);

If a program asks the user a question, the user might respond right away. Or, the user may take a moment to think before clicking Yes or No. For web developers using HTML and JavaScript, the way to do this is the event as follows:

<button onclick="myFunction()">Click me</button>

At first glance, these two scenarios may seem completely distinct:

In the first, a low-level system is shuttling blocks of memory from program to program, with delays milliseconds can be too big to measure
In the second, the very top surface of a huge stack of software is asking the user a question

Conceptually, however, they're the same. Node's design realizes this, and uses events for both. In Node, there is one thread, bound to an event loop. Deferred tasks are encapsulated, entering and exiting the execution context via callbacks. I/O operations generate evented data streams, and these are piped through a single stack. Concurrency is managed by the system, abstracting thread pools, and simplifying shared access to memory.

Node showed us that JavaScript doesn't need threads to be useful as a systems language. Additionally, by not having threads, JavaScript and Node avoid concurrency issues that create performance and reliability challenges that developers expert in a code base can still have difficulty reasoning about. In Chapter 2, Understanding Asynchronous Event-Driven Programming, we'll go deeper into events, and the event loop.

Extending JavaScript

When he designed Node, JavaScript was not Ryan Dahl's original language choice. Yet, upon exploration, he found a good modern language without opinions on streams, the filesystem, handling binary objects, processes, networking, and other capabilities one would expect to exist in a systems language. JavaScript, strictly limited to the browser, had no use for, and had not implemented, these features.

Guided by the Unix philosophy, Dahl was guided by a few rigid principles:

A Node program/process runs on a single thread, ordering execution through an event loop
Web applications are I/O intensive, so the focus should be on making I/O fast
Program flow is always directed through asynchronous callbacks
Expensive CPU operations should be split off into separate parallel processes, emitting events as results arrive
Complex programs should be assembled from simpler programs

The general principle is, operations must never block. Node's desire for speed (high concurrency) and efficiency (minimal resource usage) demands the reduction of waste. A waiting process is a wasteful process, especially when waiting for I/O.

JavaScript's asynchronous, event-driven design fits neatly into this model. Applications express interest in some future event, and are notified when that event occurs. This common JavaScript pattern should be familiar to you:

Window.onload = function() {
  // When all requested document resources are loaded,
  // do something with the resulting environment
}
element.onclick = function() {
  // Do something when the user clicks on this element
}

The time it will take for an I/O action to complete is unknown, so the pattern is to ask for notification when an I/O event is emitted, whenever that may be, allowing other operations to be completed in the meantime.

Node adds an enormous amount of new functionality to JavaScript. Primarily, the additions provide evented I/O libraries offering the developer system access not available to browser-based JavaScript, such as writing to the filesystem or opening another system process. Additionally, the environment is designed to be modular, allowing complex programs to be assembled out of smaller and simpler components.

Let's look at how Node imported JavaScript's event model, extended it, and used it in the creation of interfaces to powerful system commands.

Events

Many functions in the Node API emit events. These events are instances of events.EventEmitter. Any object can extend EventEmitter, providing Node developers with a simple and uniform way to build tight, asynchronous interfaces to object methods.

The following code sets Node's EventEmitter object as the prototype of a function constructor we define. Each constructed instance has the EventEmitter object exposed to its prototype chain, providing a natural reference to the event API. The counter instance methods emit events, and code after that listens for them. After making a Counter, we listen for the incremented event, specifying a callback Node will call when the event happens. Then, we call the increment twice. Each time, our Counter increments the internal count it holds, and then emits the incremented event. This calls our callback, giving it the current count, which our callback logs:

// File counter.js
// Load Node's 'events' module, and point directly to EventEmitter there
const EventEmitter = require('events').EventEmitter;
// Define our Counter function
const Counter = function(i) { // Takes a starting number
  this.increment = function() { // The counter's increment method
    i++; // Increment the count we hold
    this.emit('incremented', i); // Emit an event named incremented
  }
}
// Base our Counter on Node's EventEmitter
Counter.prototype = new EventEmitter(); // We did this afterwards, not before!
// Now that we've defined our objects, let's see them in action
// Make a new Counter starting at 10
const counter = new Counter(10);
// Define a callback function which logs the number n you give it
const callback = function(n) {
  console.log(n);
}
// Counter is an EventEmitter, so it comes with addListener
counter.addListener('incremented', callback);
counter.increment(); // 11
counter.increment(); // 12

The following is the output for counter.js:

$ node counter.js
11
12

To remove the event listeners bound to counter, use this:

counter.removeListener('incremented', callback).

For consistency with browser-based JavaScript, counter.on and counter.addListener are interchangeable.

Node brought EventEmitter to JavaScript and made it an object your objects can extend. This greatly increases the possibilities available to developers. With EventEmitter, Node can handle I/O data streams in an event-oriented manner, performing long-running tasks while keeping true to Node's principles of asynchronous, non-blocking programming:

// File stream.js
// Use Node's stream module, and get Readable inside
let Readable = require('stream').Readable;
// Make our own readable stream, named r
let r = new Readable;
// Start the count at 0
let count = 0;
// Downstream code will call r's _read function when it wants some data from r
r._read = function() {
  count++;
  if (count > 10) { // After our count has grown beyond 10
    return r.push(null); // Push null downstream to signal we've got no more data
  }
  setTimeout(() => r.push(count + '\n'), 500); // A half second from now, push our count on a line
};
// Have our readable send the data it produces to standard out
r.pipe(process.stdout);

The following is the output for stream.js:

$ node stream.js
1
2
3
4
5
6
7
8
9
10

This example creates a readable stream r, and pipes its output to the standard out. Every 500 milliseconds, code increments a counter and pushes a line of text with the current count downstream. Try running the program yourself, and you'll see the series of numbers appear on your terminal.

On what would be the 11^th count, r pushes null downstream, indicating that it has no more data to send. This closes the stream, and with nothing more to do, Node exits the process.

Subsequent chapters will explain streams in more detail. Here, just note how pushing data onto a stream causes an event to fire, how you can assign a custom callback to handle this event, and how the data flows downstream.

Node consistently implements I/O operations as asynchronous, evented data streams. This design choice enables Node's excellent performance. Instead of creating a thread (or spinning up an entire process) for a long-running task like a file upload that a stream may represent, Node only needs to commit the resources to handle callbacks. Additionally, in the long stretches of time in between the short moments when the stream is pushing data, Node's event loop is free to process other instructions.

As an exercise, re-implement stream.js to send the data r produces to a file instead of the terminal. You'll need to make a new writable stream w, using Node's fs.createWriteStream:

// File stream2file.js
// Bring in Node's file system module
const fs = require('fs');
// Make the file counter.txt we can fill by writing data to writeable stream w
const w = fs.createWriteStream('./counter.txt', { flags: 'w', mode: 0666 });
...
// Put w beneath r instead
r.pipe(w);

Modularity

In his book, The Art of Unix Programming, Eric Raymond proposed the Rule of Modularity:

"Developers should build a program out of simple parts connected by well-defined interfaces, so problems are local, and parts of the program can be replaced in the future versions to support new features. This rule aims to save time on debugging complex code that is complex, long, and unreadable."

Large systems are hard to reason about, especially when the boundaries of internal components are fuzzy, and the interactions between them are complex. This principle of building large systems out of small, simple, and loosely-coupled pieces is a good idea for software and beyond. Physical manufacturing, management theory, education, and government, all have benefited from this design philosophy.

When developers began employing JavaScript for larger and more complex software challenges, they encountered this challenge. There was not yet a good way (and later, no common standard way) to assemble a JavaScript program from smaller ones. For example, you've probably seen HTML pages with tags like these at the top:

<head>
<script src="fileA.js"></script>
<script src="fileB.js"></script>
<script src="fileC.js"></script>
<script src="fileD.js"></script>
...
</head>

This works, but leads to a number of problems:

The page must declare all potential dependencies before any are needed or used. If, while running, your program encounters a situation where it needs an additional dependency, dynamically loading another module is possible, but a separate hack.
The scripts are not encapsulated. Code in every file writes to the same global object. Adding a new dependency may break an earlier one because of a name collision.
fileA cannot address fileB as a collection. An addressable context like fileB.function1 isn't available.

The <script> tag would be a nice place for useful module services such as dependency awareness and version control, but it doesn't have these features.

These difficulties and dangers made creating and using JavaScript modules feel more treacherous than effortless. A good module system with features like encapsulation and versioning can reverse this, encouraging code organization and sharing, and leading to a robust ecosystem of high-quality open source software components.

JavaScript needed a standard way to load and share discreet program modules, and found one in 2009 with the CommonJS Modules specification. Node follows this specification, making it easy to define and share bits of reusable code called modules or packages.

Choosing a delightfully simple design, a package is just a directory of JavaScript files. Metadata about the package, such as its name, version, and software license, lives in an additional file named package.json. The JSON contents of this file are easily both human and machine-readable. Let's take a look:

{
  "name": "mypackage1",
  "version": "0.1.2",
  "dependencies": {
    "jquery": "^3.1.0",
    "bluebird": "^3.4.1",
  },
  "license": "MIT"
}

This package.json defines a package named mypackage1, which depends on two other packages: jQuery and Bluebird. Alongside the package names is a version number. Version numbers follow the Semantic Versioning (SemVer) rules, with a pattern like Major.Minor.Patch. Looking at the incremented version numbers of a package your code has been using, here's what that means:

Major: There's a change in the purpose or outcome of the API. If your code calls an updated function, it may break or produce an unintended result. Figure out what's changed, and determine if it affects your code.
Minor: The package has added functionality, but remains compatible. Run all your tests, and you're good to go. Check out the documentation if you're curious, as there might be new, more advanced parts of the API alongside the functions and objects you're familiar with.
Patch: The package fixed a bug, improved performance, or refactored a little. Run all your tests, and you're good to go.

Packages enable the construction of large systems from many small, interdependent systems. Perhaps even more importantly, packages encourage sharing. More detailed information about SemVer is available in Appendix A, Organizing Your Work Into Modules, where npm and packages are discussed in more depth.

"What I'm describing here is not a technical problem. It's a matter of people getting together and making a decision to step forward and start building up something bigger and cooler together."

– Kevin Dangoor, creator of CommonJS

Not just about modules, CommonJS is actually a whole collection of standards founded with the goal of removing everything that was holding JavaScript back from world domination, open source developer Kris Kowal explained in a 2009 post evangelizing the initiative. He names the first of these impediments as the absence of a good module system. The second? The absence of a standard library, including such systems-level fundamentals as access to the filesystem, manipulation of I/O streams, and types for bytes and blocks of binary data. Today, CommonJS is known for giving JavaScript a module system, while Node is what gave JavaScript systems-level access:

https://arstechnica.com/information-technology/2009/12/commonjs-effort-sets-javascript-on-path-for-world-domination/

CommonJS gave JavaScript packages. With packages, the next thing JavaScript needed was a package manager. Node provided one with npm.

A registry of packages, npm is accessible in two ways. First, at the website www.npmjs.com, you can link to and search for packages, essentially shopping for the right one. Stats that count how many times a package has been downloaded in the last day, week, and month show popularity and usage. Most packages link to a developer profile page and open source code on GitHub, so you can see the code, visualize recent development activity, and judge the reputations of the authors and contributors.

The second way to access npm is through the command-line tool npm, which is installed with Node. Using npm as a traditional package manager for your workstation, you can install packages globally, creating new command-line tools on your shell's path. npm also knows how to create, read, and edit package.json files, and can start you out with a new, empty Node package, add the dependencies it needs, download all the code, and keep everything up to date.

Along with Git and GitHub, npm is now achieving a dream of software development identified in the 1970s: that code could be reused more often, and software projects would be written entirely from scratch less frequently.
Earlier attempts at reaching this goal through version control systems like CVS and Subversion, and open source code sharing websites like SourceForge.net, focused on bigger units of both code and people, and didn't achieve as much.

GitHub and npm took a different approach in two important ways:

Favoring individual developers working alone over communities meeting and discussing, developers could focus more on code and less on conversation
Favoring small, atomic software components over complete applications, encapsulated composition started happening not just at a micro-level of subroutines and objects, but at the more important macroscale of application design

Even documentation is better with the new approach: in a monolithic software application, documentation was too often the afterthought that may or may not have happened after the product shipped.
With components, great documentation is necessary to sell your package to the world, getting it a larger public daily download count, and the social media accounts you keep as a developer of more followers.

In no small part, Node's success is due to the number and quality of packages available to you as a Node developer.

More extensive information on creating and managing Node packages can be found in Appendix A, Organizing Your Work into Modules.

The key design philosophy to follow is this: build programs out of packages where possible, and share those packages when possible. The shape of your applications will be clearer and easier to maintain. Importantly, the efforts of thousands of other developers can be linked into applications via npm, directly by inclusion, and indirectly as shared packages are tested, improved, refactored, and repurposed by members of the Node community.

Contrary to popular belief, npm is not an abbreviation for Node Package Manager, and should never be used or explained as an acronym:
https://docs.npmjs.com/policies/trademark

The network

I/O in the browser is mercilessly hobbled, for very good reasons - if the JavaScript on any given website could access your filesystem, for instance, users could only click links to new sites they trusted, rather than ones they simply wanted to try out. Keeping pages in a limited sandbox, the design of the web made navigating from thing1.com to thing2.com not have the consequences of double-clicking thing1.exe and thing2.exe.

Node, of course, recasts JavaScript in the role of a systems language, giving it direct and unfettered access to operating system kernel objects such as files, sockets, and processes. This lets Node create scalable systems with high I/O requirements. It's likely the first thing you coded in Node was a HTTP server.

Node supports standard network protocols in addition to HTTP, such as TLS/SSL, and UDP. With these tools we can easily build scalable network programs, moving far beyond the comparatively limited AJAX solutions JavaScript developers know from the browser.

Let's write a simple program that sends a UDP packet to another node:

const dgram = require('dgram');
let client = dgram.createSocket("udp4");
let server = dgram.createSocket("udp4");
let message = process.argv[2] || "message";
message = Buffer.from(message);
server
.on('message', msg => {
  process.stdout.write(`Got message: ${msg}\n`);
  process.exit();
})
.bind(41234);
client.send(message, 0, message.length, 41234, "localhost");

Go ahead and open two terminal windows and navigate each to your code bundle for Chapter 8, Scaling Your Application, under the /udp folder. We're now going to run a UDP server in one window, and a UDP client in another.

In the right window, run receive.js with a command like the following:

$ node receive.js

On the left, run send.js with a command, as follows:

$ node send.js

Executing that command will cause the message to appear on the right:

$ node receive.js
Message received!

A UDP server is an instance of EventEmitter, emitting a message event when messages are received on the bound port. With Node, you can use JavaScript to write your application at the I/O level, moving packets and streams of binary data with ease.

Let's continue to explore I/O, the process object, and events. First, let's dig into the machine powering Node's core, V8.

V8, JavaScript, and optimizations

V8 is Google's JavaScript engine, written in C++. It compiles and executes JavaScript code inside of a VM (Virtual Machine). When a webpage loaded into Google Chrome demonstrates some sort of dynamic effect, like automatically updating a list or news feed, you are seeing JavaScript, compiled by V8, at work.

V8 manages Node's main process thread. When executing JavaScript, V8 does so in its own process, and its internal behavior is not controlled by Node. In this section, we will investigate the performance benefits that can be had by playing with these options, learning how to write optimizable JavaScript, and the cutting-edge JavaScript features available to users of the latest Node versions (such as 9.x, the version we use in this book).

Flags

There are a number of settings available to you for manipulating the Node runtime. Try this command:

$ node -h

In addition to standards such as --version, you can also flag Node to --abort-on-uncaught-exception.

You can also list the options available for v8:

$ node --v8-options

Some of these settings can save the day. For example, if you are running Node in a restrained environment like a Raspberry Pi, you might want to limit the amount of memory a Node process can consume, to avoid memory spikes. In that case, you might want to set the --max_old_space_size (by default ~1.5GB) to a few hundred MB.

You can use the -e argument to execute a Node program as a string; in this case, logging out of the version of V8 your copy of Node contains:

$ node –e "console.log(process.versions.v8)"

It's worth your time to experiment with Node/V8 settings, both for their utility and the path, to give you a slightly stronger understanding of what is happening (or might happen) under the hood.

Optimizing your code

The simple optimizations of smart code design can really help you. Traditionally, JavaScript developers working in browsers did not need to concern themselves with memory usage optimizations, having quite a lot to use for what were typically uncomplicated programs. On a server, this is no longer the case. Programs are generally more complicated, and running out of memory takes down your server.

The convenience of a dynamic language is in avoiding the strictness that compiled languages impose. For example, you need not explicitly define object property types, and can actually change those property types at will. This dynamism makes traditional compilation impossible, but opens up some interesting new opportunities for exploratory languages such as JavaScript. Nevertheless, dynamism introduces a significant penalty in terms of execution speeds when compared to statically compiled languages. The limited speed of JavaScript has regularly been identified as one of its major weaknesses.

V8 attempts to achieve the sorts of speeds one observes for compiled languages for JavaScript. V8 compiles JavaScript into native machine code, rather than interpreting bytecode, or using other just-in-time techniques. Because the precise runtime topology of a JavaScript program cannot be known ahead of time (the language is dynamic), compilation consists of a two-stage, speculative approach:

Initially, a first-pass compiler (the full compiler) converts your code into a runnable state as quickly as possible. During this step, type analysis and other detailed analysis of the code is deferred, prioritizing fast compilation – your JavaScript can begin executing as close to instantly as possible. Further optimizations are accomplished during the second step.
Once the program is up and running, an optimizing compiler then begins its job of watching how your program runs, and attempting to determine its current and future runtime characteristics, optimizing and re-optimizing as necessary. For example, if a certain function is being called many thousands of times with similar arguments of a consistent type, V8 will re-compile that function with code optimized on the optimistic assumption that future types will be like the past types. While the first compile step was conservative with as-yet unknown and un-typed functional signature, this hot function's predictable texture impels V8 to assume a certain optimal profile and re-compile based on that assumption.

Assumptions help us make decisions more quickly, but can lead to mistakes. What if the hot function V8's compiler just optimized against a certain type signature is now called with arguments violating that optimized profile? V8 has no choice, in that case: it must de-optimize the function. V8 must admit its mistake and roll back the work it has done. It will re-optimize in the future if a new pattern is seen. However, if V8 must again de-optimize at a later time, and if this optimize/de-optimize binary switching continues, V8 will simply give up, and leave your code in a de-optimized state.

Let's look at some ways to approach the design and declaration of arrays, objects, and functions, so that you are helping, rather than hindering the compiler.

Numbers and tracing optimization/de-optimization

The ECMA-262 specification defines the Number value as a "primitive value corresponding to a double-precision 64-bit binary format IEEE 754 value". The point is that there is no Integer type in JavaScript; there is a Number type defined as a double-precision floating-point number.

V8 uses 32-bit numbers for all values internally, for performance reasons that are too technical to discuss here. It can be said that one bit is used to point to another 32-bit number, should greater width be needed. Regardless, it is clear that there are two types of values tagged as numbers by V8, and switching between these types will cost you something. Try to restrict your needs to 31-bit signed Integers where possible.
Because of the type ambiguity of JavaScript, switching the types of numbers assigned to a slot is allowed. For example, the following code does not throw an error:

let a = 7;
a = 7.77;

However, a speculative compiler like V8 will be unable to optimize this variable assignment, given that its guess that a will always be an Integer turned out to be wrong, forcing de-optimization.

We can demonstrate the optimization/de-optimization process by setting some powerful V8 options, executing V8 native commands in your Node program, and tracing how v8 optimizes/de-optimizes your code.

Consider the following Node program:

// program.js
let someFunc = function foo(){}
console.log(%FunctionGetName(someFunc));

If you try to run this normally, you will receive an Unexpected Token error – the modulo (%) symbol cannot be used within an identifier name in JavaScript. What is this strange method with a % prefix? It is a V8 native command, and we can turn on execution of these types of functions by using the --allow-natives-syntax flag:

node --allow-natives-syntax program.js
// 'someFunc', the function name, is printed to the console.

Now, consider the following code, which uses native functions to assert information about the optimization status of the square function, using the %OptimizeFunctionOnNextCall native method:

let operand = 3;
function square() {
    return operand * operand;
}
// Make first pass to gather type information
square();
// Ask that the next call of #square trigger an optimization attempt;
// Call
%OptimizeFunctionOnNextCall(square);
square();

Create a file using the previous code, and execute it using the following command: node --allow-natives-syntax --trace_opt --trace_deopt myfile.js. You will see something like the following returned:

 [deoptimize context: c39daf14679]
 [optimizing: square / c39dafca921 - took 1.900, 0.851, 0.000 ms]

We can see that V8 has no problem optimizing the square function, as operand is declared once and never changed. Now, append the following lines to your file and run it again:

%OptimizeFunctionOnNextCall(square);
operand = 3.01;
square();

On this execution, following the optimization report given earlier, you should now receive something like the following:

**** DEOPT: square at bailout #2, address 0x0, frame size 8
 [deoptimizing: begin 0x2493d0fca8d9 square @2]
 ...
 [deoptimizing: end 0x2493d0fca8d9 square => node=3, pc=0x29edb8164b46, state=NO_REGISTERS, alignment=no padding, took 0.033 ms]
 [removing optimized code for: square]

This very expressive optimization report tells the story very clearly: the once-optimized square function was de-optimized following the change we made in one number's type. You are encouraged to spend some time writing code and testing it using these methods.

Objects and arrays

As we learned when investigating numbers, V8 works best when your code is predictable. The same holds true with arrays and objects. Nearly all of the following bad practices are bad for the simple reason that they create unpredictability.

Remember that in JavaScript, an object and an array are very similar under the hood (resulting in strange rules that provide no end of material for those poking fun at the language!). We won't be discussing those differences, only the important similarities, specifically in terms of how both these data constructs benefit from similar optimization techniques.

Avoid mixing types in arrays. It is always better to have a consistent data type, such as all integers or all strings. As well, avoid changing types in arrays, or in property assignments after initialization if possible. V8 creates blueprints of objects by creating hidden classes to track types, and when those types change the optimization, blueprints will be destroyed and rebuilt—if you're lucky. Visit https://github.com/v8/v8/wiki/Design%20Elements for more information.

Don't create arrays with gaps, such as the following:

let a = [];
a[2] = 'foo';
a[23] = 'bar';

Sparse arrays are bad for this reason: V8 can either use a very efficient linear storage strategy to store (and access) your array data, or it can use a hash table (which is much slower). If your array is sparse, V8 must choose the least efficient of the two. For the same reason, always start your arrays at the zero index. As well, do not ever use delete to remove elements from an array. You are simply inserting an undefined value at that position, which is just another way of creating a sparse array. Similarly, be careful about populating an array with empty values—ensure that the external data you are pushing into an array is not incomplete.

Try not to preallocate large arrays—grow as you go. Similarly, do not preallocate an array and then exceed that size. You always want to avoid spooking V8 into turning your array into a hash table. V8 creates a new hidden class whenever a new property is added to an object constructor. Try to avoid adding properties after an object is instantiated. Initialize all members in constructor functions in the same order. Same properties + same order = same object.

Remember that JavaScript is a dynamic language that allows object (and object prototype) modifications after instantiation. Since the shape and volume of an object can, therefore, be altered after the fact, how does V8 allocate memory for objects? It makes some reasonable assumptions. After a set number of objects are instantiated from a given constructor (I believe 8 is the trigger amount), the largest of these is assumed to be of the maximum size, and all further instances are allocated that amount of memory (and the initial objects are similarly resized). A total of 32 fast property slots, inclusive, are then allocated to each instance based on this assumed maximum size. Any extra properties are slotted into a (slower) overflow property array, which can be resized to accommodate any further new properties.

With objects, as with arrays, try to define as much as possible the shape of your data structures in a futureproof manner, with a set number of properties, of types, and so on.

Functions

Functions are typically called often, and should be one of your prime optimization focuses. Functions containing try-catch constructs are not optimizable, nor are functions containing other unpredictable constructs, like with or eval. If, for some reason, your function is not optimizable, keep its use to a minimum.

A very common optimization error involves the use of polymorphic functions. Functions that accept variable function arguments will be de-optimized. Avoid polymorphic functions.

An excellent explanation of how V8 performs speculative optimization can be found here: https://ponyfoo.com/articles/an-introduction-to-speculative-optimization-in-v8

Optimized JavaScript

The JavaScript language is in constant flux, and some major changes and improvements have begun to find their way into native compilers. The V8 engine used in the latest Node builds supports nearly all of the latest features. Surveying all of these is beyond the scope of this chapter. In this section, we'll mention a few of the most useful updates and how they might be used to simplify your code, helping to make it easier to understand and reason about, to maintain, and perhaps even become more performant.

We will be using the latest JavaScript features throughout this book. You can use Promises, Generators, and async/await constructs as of Node 8.x, and we will be using those throughout the book. These concurrency operators will be discussed at depth in Chapter 2, Understanding Asynchronous Event-Driven Programming, but a good takeaway for now is that the callback pattern is losing its dominance, and the Promise pattern in particular is coming to dominate module interfaces.

In fact, a new method util.promisify was recently added to Node's core, which converts a callback-based function to a Promise-based one:

const {promisify} = require('util');
const fs = require('fs');

// Promisification happens here
let readFileAsync = promisify(fs.readFile);

let [executable, absPath, target, ...message] = process.argv;

console.log(message.length ? message.join(' ') : `Running file ${absPath} using binary ${executable}`);

readFileAsync(target, {encoding: 'utf8'})
.then(console.log)
.catch(err => {
  let message = err.message;
  console.log(`
    An error occurred!
    Read error: ${message}
  `);
});

Being able to easily promisify fs.readFile is very useful.

Did you notice any other new JavaScript constructs possibly unfamiliar to you?

Help with variables

You'll be seeing let and const throughout this book. These are new variable declaration types. Unlike var, let is block scoped; it does not apply outside of its containing block:

let foo = 'bar';

if(foo == 'bar') {
    let foo = 'baz';
    console.log(foo); // 1st
}
console.log(foo); // 2nd

// baz
// bar
// If we had used var instead of let:
// baz
// baz

For variables that will never change, use const, for constant. This is helpful for the compiler as well, as it can optimize more easily if a variable is guaranteed never to change. Note that const only works on assignment, where the following is illegal:

const foo = 1;
foo = 2; // Error: assignment to a constant variable

However, if the value is an object, const doesn't protect members:

const foo = { bar: 1 }
console.log(foo.bar) // 1
foo.bar = 2;
console.log(foo.bar) // 2

Another powerful new feature is destructuring, which allows us to easily assign the values of arrays to new variables:

let [executable, absPath, target, ...message] = process.argv;

Destructuring allows you to rapidly map arrays to variable names. Since process.argv is an array, which always contains the path to the Node executable and the path to the executing file as the first two arguments, we can pass a file target to the previous script by executing node script.js /some/file/path, where the third argument is assigned to the target variable.

Maybe we also want to pass a message with something like this:

node script.js /some/file/path This is a really great file!

The problem here is that This is a really great file! is space-separated, so it will be split into the array on each word, which is not what we want:

[... , /some/file/path, This, is, a, really, great, file!]

The rest pattern comes to the rescue here: the final argument ...message collapses all remaining destructured arguments into a single array, which we can simply join(' ') into a single string. This also works for objects:

let obj = {
    foo: 'foo!',
    bar: 'bar!',
    baz: 'baz!'
};

// assign keys to local variables with same names
let {foo, baz} = obj;

// Note that we "skipped" #bar
console.log(foo, baz); // foo! baz!

This pattern is especially useful for processing function arguments. Prior to rest parameters, you might have been grabbing function arguments in this way:

function (a, b) {
    // Grab any arguments after a & b and convert to proper Array
    let args = Array.prototype.slice.call(arguments, f.length);
}

This was necessary previously, as the arguments object was not a true Array. In addition to being rather clumsy, this method also triggers de-optimization in compilers like V8.

Now, you can do this instead:

function (a, b, ...args) {
    // #args is already an Array!
}

The spread pattern is the rest pattern in reverse—you expand a single variable into many:

const week = ['mon','tue','wed','thur','fri'];
const weekend = ['sat','sun'];

console.log([...week, ...weekend]); // ['mon','tue','wed','thur','fri','sat','sun']

week.push(...weekend);
console.log(week); // ['mon','tue','wed','thur','fri','sat','sun']

Arrow functions

Arrow functions allow you to shorten function declarations, from function() {} to simply () => {}. Indeed, you can replace a line like this:

SomeEmitter.on('message', function(message) { console.log(message) });

To:

SomeEmitter.on('message', message => console.log(message));

Here, we lose both the brackets and curly braces, and the tighter code works as expected.

Another important feature of arrow functions is they are not assigned their own this—arrow functions inherit this from the call site. For example, the following code does not work:

function Counter() {
    this.count = 0;

    setInterval(function() {
        console.log(this.count++);
    }, 1000);
}

new Counter();

The function within setInterval is being called in the context of setInterval, rather than the Counter object, so this does not have any reference to count. That is, at the function call site, this is a Timeout object, which you can check yourself by adding console.log(this) to the prior code.

With arrow functions, this is assigned at the point of definition. Fixing the code is easy:

setInterval(() => { // arrow function to the rescue!
  console.log(this);
  console.log(this.count++);
}, 1000);
// Counter { count: 0 }
// 0
// Counter { count: 1 }
// 1
// ...

String manipulation

Finally, you will see a lot of backticks in the code. This is the new template literal syntax, and along with other things, it (finally!) makes working with strings in JavaScript much less error-prone and tedious. You saw in the example how it is now easy to express multiline strings (avoiding 'First line\n' + 'Next line\n' types of constructs). String interpolation is similarly improved:

let name = 'Sandro';
console.log('My name is ' + name);
console.log(`My name is ${name}`);
// My name is Sandro
// My name is Sandro

This sort of substitution is especially effective when concatenating many variables, and since the contents of each ${expression} can be any JavaScript code:

console.log(`2 + 2 = ${2+2}`)  // 2 + 2 = 4

You can also use repeat to generate strings: 'ha'.repeat(3) // hahaha.

Strings are now iterable. Using the new for...of construct, you can pluck apart a string character by character:

for(let c of 'Mastering Node.js') {
    console.log(c);
    // M
    // a
    // s
    // ...
}

Alternatively, use the spread operator:

console.log([...'Mastering Node.js']);
// ['M', 'a', 's',...]

Searching is also easier. New methods allow common substring seeks without much ceremony:

let targ = 'The rain in Spain lies mostly on the plain';
console.log(targ.startsWith('The', 0)); // true
console.log(targ.startsWith('The', 1)); // false
console.log(targ.endsWith('plain')); // true
console.log(targ.includes('rain', 5)); // false

The second argument to these methods indicates a search offset, defaulting to 0. The is found at position 0, so beginning the search at position 1 fails in the second case.

Great, writing JavaScript programs just got a little easier. The next question is what's going on when that program is executed within a V8 process?

The process object

Node's process object provides information on and control over the current running process. It is an instance of EventEmitter is accessible from any scope, and exposes very useful low-level pointers. Consider the following program:

const size = process.argv[2];
const n = process.argv[3] || 100;
const buffers = [];
let i;
for (i = 0; i < n; i++) {
  buffers.push(Buffer.alloc(size));
  process.stdout.write(process.memoryUsage().heapTotal + "\n");
}

Have Node run process.js with a command like this:

$ node process.js 1000000 100

The program gets the command-line arguments from process.argv, loops to allocate memory, and reports memory usage back to standard out. Instead of logging back to the terminal, you could stream output to another process, or a file:

$ node process.js 1000000 100 > output.txt

A Node process begins by constructing a single execution stack, with the global context forming the base of the stack. Functions on this stack execute within their own local context (sometimes referred to as scope), which remains enclosed within the global context. This way of keeping the execution of a function together with the environment the function runs in is called closure. Because Node is evented, any given execution context can commit the running thread to handling an eventual execution context. This is the purpose of callback functions.

Consider the following schematic of a simple interface for accessing the filesystem:

If we were to instantiate Filesystem and call readDir, a nested execution context structure would be created:

(global (fileSystem (readDir (anonymous function) ) ) )

Inside Node, a C library named libuv creates and manages the event loop. It connects to low-level operating system kernel mode objects that can produce events, such as timers that go off, sockets that receive data, files that open for reading, and child processes that complete. It loops while there are still events to process, and calls callbacks associated with events. It does this at a very low level, and with a very performant architecture. Written for Node, libuv is now a building block of a number of software platforms and languages.

The concomitant execution stack is introduced to Node's single-process thread. This stack remains in memory until libuv reports that fs.readdir has completed, at which point the registered anonymous callback fires, resolving the sole pending execution context. As no further events are pending, and the maintenance of closures no longer necessary, the entire structure can be safely torn down (in reverse, beginning with anonymous), and the process can exit, freeing any allocated memory. This method of building up and tearing down a single stack is what Node's event loop is ultimately doing.

Filter reviews by

All

Amazon verified reviews

Darrel O'Pry Feb 01, 2018

This in depth book on Node.js 9.x provides a takes a deep dive into the history, internals, use cases, and API's of Node.KS. More importantly, the updated edition adds insight into the new features of 9.x such as async/await, gives an excellent overview of micro-services architecture, walks through working with Kubernetes locally to speed up development and prepare for production deployments, and takes a run at debugging and testing. This is a great read for new Node.JS developers, and experienced developers looking to get up to speed with Node 9.x and current best practices.

Amazon Verified review

Cheung Wai Lok Oct 10, 2018

It seems to be the best nodejs book i have ever discovered

poplinr Feb 01, 2018

I remember buying the first edition of this book back in 2013 when I didn’t know what the hell I was doing the software world. Reading through a few chapters of this second edition, I was impressed at the level of details, concepts and the fact that this book continues to focus on mastering the core platform that is Node.js before moving into higher level concepts on top of the platform.I will also note that this Packt title is written damn well and flows properly which is a requirement for my expectations at this point.

Arthur Jose Germano Apr 09, 2018

This is a great book, full of what I think is the edge of Node coding and way of think.The only two things that is not great is this book are:- First: It doesn't go all the way with examples. I had to download the code. But I think when you are following a way of thinking it's best to show you all the way, not just part of it. Because if I am lost I can turn back and start over. Which was not possible in several examples.- Second: The use of paid examples. I don't have accounts in amazon, twillio, or digital ocean, which makes the examples provided hard to follow. As I said previosly in the first point, I would prefer to follow it all through.But that is just my opinion. Besides that, I recommend it very much. Very rich and the autor show his incredible mastery of Node.If you want to improve your knowlodge in Node... go for it buying it !

gzelenak36 Dec 16, 2019

I liked topic about modules it gave me plus info though other JS articles describe the theme very well.I found this book very useful.

Mastering Node.js: Build robust and scalable real-time server-side web applications efficiently , Second Edition

What do you get with eBook?

Mastering Node.js

Understanding the Node Environment

Introduction – JavaScript as a systems language

The Unix design philosophy

POSIX

Events for everything

Standard libraries

Extending JavaScript

Events

Modularity

The network

V8, JavaScript, and optimizations

Flags

Optimizing your code

Numbers and tracing optimization/de-optimization

Objects and arrays

Functions

Optimized JavaScript

Help with variables

Arrow functions

String manipulation

The process object

Page 1 of 8

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with eBook?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the authors

FAQs

Mastering Node.js: Build robust and scalable real-time server-side web applications efficiently , Second Edition

What do you get with eBook?

Contact Details

Billing Address

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with eBook?

Contact Details

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the authors

FAQs