Understanding Node's unique design
I/O operations (disk and network) are clearly more expensive. The following table shows clock cycles consumed by typical system tasks (from Ryan Dahl's original presentation of Node—https://www.youtube.com/watch?v=ztspvPYybIY):
L1-cache | 3 cycles
L2-cache | 14 cycles
RAM | 250 cycles
Disk | 41,000,000 cycles
Network | 240,000,000 cycles
The reasons are clear enough: a disk is a physical device, a spinning metal platter, and storing and retrieving data on it is much slower than moving data between solid-state devices (such as microprocessors and memory chips) or indeed optimized on-chip L1/L2 caches. Similarly, data does not move from point to point on a network instantaneously. Light itself needs roughly 0.13 seconds to circle the globe! In a network used by many billions of people regularly interacting across great distances at speeds much slower than the speed of light, with many detours and few straight lines, this sort of latency builds up.
When our software ran on personal computers on our desks, little or no communication was happening over the network. Delays or hiccups in our interactions with a word processor or spreadsheet had to do with disk access time. Much work was done to improve disk access speeds. Data storage and retrieval became faster, software became more responsive, and users now expect this responsiveness in their tools.
With the advent of cloud computing and browser-based software, your data has left the local disk and exists on a remote disk, and you access this data via a network—the Internet. Data access times have slowed down again, dramatically. Network I/O is slow. Nevertheless, more companies are migrating sections of their applications into the cloud, with some software being entirely network-based.
Node is designed to make I/O fast. It is designed for this new world of networked software, where data is in many places and must be assembled quickly. Many of the traditional frameworks to build web applications were designed at a time when a single user working on a desktop computer used a browser to periodically make HTTP requests to a single server running a relational database. Modern software must anticipate tens of thousands of simultaneously connected clients concurrently altering enormous, shared data pools via a variety of network protocols on any number of unique devices. Node is designed specifically to help those building that kind of network software.
What do concurrency, parallelism, asynchronous execution, callbacks, and events mean to the Node developer?
Concurrency
Running code procedurally, or in order, is a reasonable idea. We tend to do that when we execute tasks and, for a long time, programming languages were naturally procedural. Clearly, at some point, the instructions you send to a processor must be executed in a predictable order. If I want to multiply 8 by 6, divide that result by 144 divided by 12, and then add the total result to 10, the order of those operations must proceed sequentially:
( (8x6) / (144/12) ) + 10
The order of operations must not be as follows:
(8x6) / ( (144/12) + 10 )
This is logical and easy to understand. Early computers typically had one processor, and processing one instruction blocked the processing of subsequent instructions. But things did not stay that way, and we have moved far beyond single-core computers.
If you think about the previous example, it should be obvious that calculating 144/12 and 8x6 can be done independently; one need not wait for the other. A problem can be divided into smaller problems and distributed across a pool of available people or workers to work on in parallel, and the results can be combined into a correctly ordered final calculation.
Multiple processes, each solving one part of a single mathematical problem simultaneously, are an example of parallelism.
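As a rough sketch of that decomposition in JavaScript, the two independent subexpressions can be started together and combined once both are done. The subtask helper is hypothetical and only simulates an independent worker with setImmediate; nothing here actually runs on a second core, so it illustrates the structure of the problem rather than true parallelism:

```javascript
// Hypothetical helper: simulates a worker that completes a subtask later.
function subtask(fn) {
  return new Promise(function (resolve) {
    setImmediate(function () {
      resolve(fn());
    });
  });
}

// 8x6 and 144/12 do not depend on each other, so both start right away.
const total = Promise.all([
  subtask(function () { return 8 * 6; }),
  subtask(function () { return 144 / 12; })
]).then(function (parts) {
  // Combine the partial results in the required order: ((8x6) / (144/12)) + 10
  return parts[0] / parts[1] + 10;
});

total.then(function (value) {
  console.log(value); // 14
});
```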
Rob Pike, co-inventor of Google's Go programming language, defines concurrency in this way:
"Concurrency is a way to structure a thing so that you can, maybe, use parallelism to do a better job. But parallelism is not the goal of concurrency; concurrency's goal is a good structure."
Concurrency is not parallelism. A system demonstrating concurrency allows developers to compose applications as if multiple independent processes are simultaneously executing many possibly related things. Successful high-concurrency application development frameworks provide an easy-to-reason-about vocabulary to describe and build such a system.
Node's design suggests that achieving its primary goal—to provide an easy way to build scalable network programs—includes simplifying how the execution order of coexisting processes is structured and composed. Node helps a developer reasoning about a program, within which many things are happening at once (such as serving many concurrent clients), to better organize his or her code.
Let's take a look at the differences between parallelism and concurrency, threads and processes, and the special way that Node absorbs the best parts of each into its own unique design.
Parallelism and threads
The following diagram describes how a traditional microprocessor might execute the simple program discussed previously:
The program is broken up into individual instructions that are executed in order. This works but does require that instructions be processed in a serial fashion, and, while any one instruction is being processed, subsequent instructions must wait. This is a blocking process—executing any one segment of this chain blocks the execution of subsequent segments. There is a single thread of execution in play.
However, there is some good news. The processor has (literally) total control of the board, and there is no danger of another processor nulling memory or overriding any other state that this primary processor might manipulate. Speed is sacrificed for stability and safety.
We do like speed; however, the model discussed earlier rapidly became obsolete as chip designers and systems programmers worked to introduce parallel computing. Rather than having one blocking thread, the goal was to have multiple cooperating threads.
This improvement definitely increased the speed of calculation but introduced some problems, as described in the following schematic:
This diagram illustrates cooperating threads executing in parallel within a single process, which reduces the time necessary to perform the given calculation. Distinct threads are employed to break apart, solve, and compose a solution. As many subtasks can be completed independently, the overall completion time can be reduced dramatically.
Threads provide parallelism within a single process. A single thread represents a single sequence of (serially executed) instructions. A process can contain any number of threads.
Difficulties arise out of the complexity of thread synchronization. It is very difficult to model highly concurrent scenarios using threads, especially models in which the state is shared. It is difficult to anticipate all the ways in which an action taken in one thread will affect all the others if it is never clear when an asynchronously executing thread will complete:
- The shared memory and the locking behavior this requires lead to systems that are very difficult to reason about as they grow in complexity.
- Communication between tasks requires the implementation of a wide range of synchronization primitives, such as mutexes and semaphores, condition variables, and so on. An already challenging environment requires highly complex tools, expanding the level of expertise necessary to complete even relatively simple systems.
- Race conditions and deadlocks are a common pitfall in these sorts of systems. Contemporaneous read/write operations within a shared program space lead to problems of sequencing, where two threads may be in an unpredictable race for the right to influence a state, event, or other key system characteristic.
- Because maintaining dependable boundaries between threads and their states is so difficult, ensuring that a library (for Node, it would be a package or module) is thread safe occupies a great deal of the developer's time. Can I know that this library will not destroy some part of my application? Guaranteeing thread safety requires great diligence on the part of a library's developer, and these guarantees may be conditional: for example, a library may be thread safe when reading—but not when writing.
We want the power of parallelization provided by threads but could do without the mind-bending world of semaphores and mutexes. In the Unix world, there is a concept that is sometimes referred to as the Rule of Simplicity: Developers should design for simplicity by looking for ways to break up program systems into small, straightforward cooperating pieces. This rule aims to discourage developers' affection for writing 'intricate and beautiful complexities' that are, in reality, bug-prone programs.
Concurrency and processes
Parallelism within a single process is a complicated illusion that is achieved deep within mind-bendingly complex chipsets and other hardware. The question is really about appearances—about how the activity of the system appears to, and can be programmed by, a developer. Threads offer hyper-efficient parallelism, but make concurrency difficult to reason about.
Rather than have the developer struggle with this complexity, Node itself manages I/O threads, simplifying this complexity by demanding only that control flow be managed between events. There is no need to micromanage I/O threading; one simply designs an application to establish data availability points (callbacks) and the instructions to be executed once the said data is available. A single stream of instructions that explicitly takes and relinquishes control in a clear, collision-free, and predictable way aids development:
- Instead of concerning themselves with arbitrary locking and other collisions, developers can focus on constructing execution chains, the ordering of which is predictable.
- Parallelization is accomplished through the use of multiple processes, each with an individual and distinct memory space, due to which communication between processes remains uncomplicated—via the Rule of Simplicity, we achieve not only simple and bug-free components, but also easier interoperability.
- The state is not (arbitrarily) shared between individual Node processes. A single process is automatically protected from surprise visits from other processes bent on memory reallocation or resource monopolization. Communication is through clear channels using basic protocols, all of which make it very hard to write programs that make unpredictable changes across processes.
- Thread safety is one less concern for developers to waste time worrying about. Because single-threaded concurrency obviates the collisions present in multithreaded concurrency, development can proceed more quickly and on surer ground.
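The points above about separate memory spaces and clear channels can be sketched with Node's child_process module. This is a minimal illustration under stated assumptions, not a recommended layout: the child's source is passed inline with -e purely to keep the example in one file (a separate script started with child_process.fork is the more usual arrangement), and the message shape { total: ... } is invented for this example. The child keeps its own counter, and the parent can only learn its value by exchanging messages:

```javascript
const spawn = require('child_process').spawn;

// Source for the child process. Its count variable lives in the child's
// own memory space; the parent cannot touch it directly.
const childSource = [
  "let count = 0;",
  "process.on('message', function (n) {",
  "  count += n;",
  "  process.send({ total: count });",
  "});"
].join('\n');

// The 'ipc' entry in stdio opens a message channel between the processes.
const child = spawn(process.execPath, ['-e', childSource], {
  stdio: ['ignore', 'inherit', 'inherit', 'ipc']
});

const replies = [];
child.on('message', function (msg) {
  replies.push(msg.total);
  if (replies.length === 2) {
    console.log(replies); // [ 5, 12 ]
    child.disconnect();   // close the channel so both processes can exit
  }
});

child.send(5);
child.send(7);
```

All state changes in the child happen in response to an explicit message, which is what makes the interaction easy to follow.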
A single thread describing asynchronous control flow efficiently managed by an event loop brings stability, maintainability, readability, and resilience to Node programs. The big news is that Node continues to deliver the speed and power of multithreading to its developers—the brilliance of Node's design makes such power transparent, reflecting one part of Node's stated aim of bringing the most power to the most people with the least difficulty.
Events
Many JavaScript extensions in Node emit events. These are instances of events.EventEmitter. Any object can extend EventEmitter, which gives the developer an elegant toolkit to build tight, asynchronous interfaces to their object methods.
Work through this example demonstrating how to set an EventEmitter object as the prototype of a function constructor. As each constructed instance now has the EventEmitter object exposed to its prototype chain, this provides a natural reference to the event's Application Programming Interface (API). The counter instance methods can, therefore, emit events, and these can be listened for. Here, we emit the latest count whenever the counter.increment method is called and bind a callback to the "incremented" event, which simply prints the current counter value to the command line:
var EventEmitter = require('events').EventEmitter;
var util = require('util');

var Counter = function(init) {
  this.increment = function() {
    init++;
    this.emit('incremented', init);
  }
}

util.inherits(Counter, EventEmitter);

var counter = new Counter(10);

var callback = function(count) {
  console.log(count);
}
counter.addListener('incremented', callback);

counter.increment(); // 11
counter.increment(); // 12
To remove the event listeners bound to counter, use counter.removeListener('incremented', callback).
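A short sketch of listener removal follows. It uses a plain EventEmitter instance rather than the Counter constructor, and the seen array and onIncremented name are illustrative. Note that removal requires a reference to the same function that was registered:

```javascript
const EventEmitter = require('events').EventEmitter;

const emitter = new EventEmitter();
const seen = [];

// Keep a reference to the listener so that it can be removed later.
function onIncremented(count) {
  seen.push(count);
}

emitter.addListener('incremented', onIncremented);
emitter.emit('incremented', 11); // listener runs

emitter.removeListener('incremented', onIncremented);
emitter.emit('incremented', 12); // no listener is bound any more

console.log(seen);                                 // [ 11 ]
console.log(emitter.listenerCount('incremented')); // 0
```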
EventEmitter, as an extensible object, adds to the expressiveness of JavaScript. For example, it allows I/O data streams to be handled in an event-oriented manner in keeping with Node's principle of asynchronous, nonblocking programming:
var stream = require('stream');
var Readable = stream.Readable;
var util = require('util');

var Reader = function() {
  Readable.call(this);
  this.counter = 0;
}

util.inherits(Reader, Readable);

Reader.prototype._read = function() {
  if(++this.counter > 10) {
    return this.push(null);
  }
  this.push(this.counter.toString());
};

// When a #data event occurs, display the chunk.
//
var reader = new Reader();
reader.setEncoding('utf8');
reader.on('data', function(chunk) {
  console.log(chunk);
});
reader.on('end', function() {
  console.log('--finished--');
});
In this program, we have a Readable stream pushing out a set of numbers, with listeners on that stream's data event catching numbers as they are emitted and logging them, and finishing with a message when the stream has ended. It is plain that the listener is called once per number, which means that running this set did not block the event loop. Because Node's event loop need only commit resources to handling callbacks, many other instructions can be processed in the downtime of each event.
The event loop
The code seen in non-networked software is often synchronous or blocking. I/O operations in the following pseudo-code are also blocking:
variable = produceAValue()
print variable
// some value is output when #produceAValue is finished.
The following iterator will read one file at a time, dump its contents, and then read the next until it is done:
fileNames = ['a','b','c']
while(filename = fileNames.shift()) {
  fileContents = File.read(filename)
  print fileContents
}
// > a
// > b
// > c
This is a fine model for many cases. However, what if these files are very large? If each takes 1 second to fetch, all will take 3 seconds to fetch. Each retrieval must wait for the previous one to finish, which is inefficient and slow. Using Node, we can initiate file reads on all files simultaneously:
var fs = require('fs');
var fileNames = ['a','b','c'];

fileNames.forEach(function(filename) {
  fs.readFile(filename, {encoding:'utf8'}, function(err, content) {
    console.log(content);
  });
});
// > b
// > a
// > c
The Node version will read all three files at once, each call to fs.readFile returning its result at some unknowable point in the future. This is why we can't always expect the files to be returned in the order they were arrayed. We can expect that all three will be returned in roughly the time it took for one to be retrieved, something less than 3 seconds. We have traded a predictable execution order for speed, and, as with threads, achieving synchronization in concurrent environments requires extra work. How do we manage and describe unpredictable data events so that our code is both easy to understand and efficient?
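One sketch of that extra work is shown here: the reads are still started together, but each result is stored by its position in the input array, so the final output follows the original order regardless of which read finishes first. The readAllInOrder helper and the temporary files are assumptions made for illustration; they are not part of Node's fs API:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create three small files in a temporary directory (illustration only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'reads-'));
const fileNames = ['a', 'b', 'c'].map(function (name) {
  const full = path.join(dir, name);
  fs.writeFileSync(full, name);
  return full;
});

// Hypothetical helper: reads every file concurrently and calls done once,
// with the contents arranged in the same order as the names array.
function readAllInOrder(names, done) {
  const results = new Array(names.length);
  let pending = names.length;
  let failed = false;

  names.forEach(function (name, index) {
    fs.readFile(name, { encoding: 'utf8' }, function (err, content) {
      if (failed) { return; }
      if (err) { failed = true; return done(err); }
      results[index] = content; // slot by position, not by arrival time
      if (--pending === 0) { done(null, results); }
    });
  });
}

readAllInOrder(fileNames, function (err, contents) {
  if (err) { throw err; }
  console.log(contents); // [ 'a', 'b', 'c' ]
});
```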
The key design choice made by Node's designers was the implementation of an event loop as a concurrency manager. The following description of event-driven programming (taken from http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Event-driven_programming.html) not only describes the event-driven paradigm clearly, but also introduces us to how events are handled in Node and how JavaScript is an ideal language for such a paradigm:
"In computer programming, event-driven programming or event-based programming is a programming paradigm in which the flow of the program is determined by events—that is, sensor outputs or user actions (mouse clicks, key presses) or messages from other programs or threads.
Event-driven programming can also be defined as an application architecture technique in which the application has a main loop that is clearly divided down to two sections: the first is event selection (or event detection), and the second is event handling […]
Event-driven programs can be written in any language although the task is easier in languages that provide high-level abstractions, such as closures."
As we saw earlier, single-threaded execution environments block and can, therefore, run slowly. V8 provides a single thread of execution for JavaScript programs.
How can this single thread be made more efficient?
Node makes a single thread more efficient by delegating many blocking operations to OS subsystems to process, bothering the main V8 thread only when there is data available for use. The main thread (your executing Node program) expresses interest in some data (such as via fs.readFile) by passing a callback and is notified when that data is available. Until that data arrives, no further burden is placed on V8's main JavaScript thread. How? Node delegates I/O work to libuv, as quoted at http://nikhilm.github.io/uvbook/basics.html#event-loops:
"In event-driven programming, an application expresses interest in certain events and responds to them when they occur. The responsibility of gathering events from the operating system or monitoring other sources of events is handled by libuv, and the user can register callbacks to be invoked when an event occurs."
The user in the preceding quote is the Node process executing a JavaScript program. Callbacks are JavaScript functions, and managing callback invocation for the user is accomplished by Node's event loop. Node manages a queue of I/O requests populated by libuv, which is responsible for polling the OS for I/O data events and handing off the results to JavaScript callbacks.
Consider the following code:
var fs = require('fs');

fs.readFile('foo.js', {encoding:'utf8'}, function(err, fileContents) {
  console.log('Then the contents are available', fileContents);
});

console.log('This happens first');
This program will result in the following output:
> This happens first
> Then the contents are available [file contents shown]
Here's what Node does when executing this program:
- Node loads the fs module. This provides access to fs.binding, which is a static type map defined in src/node.cc that provides glue between C++ and JS code. (https://groups.google.com/forum/#!msg/nodejs/R5fDzBr0eEk/lrCKaJX_6vIJ).
- The fs.readFile method is passed instructions and a JavaScript callback. Through fs.binding, libuv is notified of the file read request and is passed a specially prepared version of the callback sent by the original program.
- libuv invokes the OS-level functions necessary to read a file within its own thread pool.
- The JavaScript program continues, printing This happens first. Because there is a callback outstanding, the event loop continues to spin, waiting for that callback to resolve.
- When the file descriptor has been fully read by the OS, libuv (via internal mechanisms) is informed and the callback passed to libuv is invoked, which essentially prepares the original JavaScript callback for re-entrance into the main (V8) thread.
- The original JavaScript callback is pushed onto the event loop queue and is invoked on the next tick of the loop.
- The file contents are printed to the console.
- As there are no further callbacks in flight, the process exits.
Here, we see the key ideas that Node implements to achieve fast, manageable, and scalable I/O. If, for example, there were 10 read calls made for 'foo.js' in the preceding program, the execution time would, nevertheless, remain roughly the same. The calls would be dispatched to worker threads in the libuv thread pool (four threads by default) rather than run one after another on the main thread. Even though we wrote our code "in JavaScript", we are actually deploying a very efficient multithreaded execution engine while avoiding the difficulties of thread management.
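The ten-read scenario can be sketched as follows. The readMany helper is hypothetical, and a temporary file stands in for 'foo.js' so that the example is self-contained. No timing figures are claimed here; the point is only that every request is issued before any callback runs:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// A stand-in for foo.js, written to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'foo-'));
const target = path.join(dir, 'foo.js');
fs.writeFileSync(target, '// sample contents');

// Hypothetical helper: issues the requested number of reads at once and
// calls done with the number of reads that completed.
function readMany(file, times, done) {
  let completed = 0;
  let failed = false;
  for (let i = 0; i < times; i++) {
    fs.readFile(file, { encoding: 'utf8' }, function (err) {
      if (failed) { return; }
      if (err) { failed = true; return done(err); }
      if (++completed === times) { done(null, completed); }
    });
  }
}

readMany(target, 10, function (err, completed) {
  if (err) { throw err; }
  console.log(completed + ' reads finished');
});

// This line runs before any of the read callbacks.
console.log('All requests issued');
```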
Let's close with more details on how exactly libuv results are returned into the main thread's event loop.
When data becomes available on a socket or other stream interface, we cannot simply execute our callback immediately. JavaScript is single threaded, so results must be synchronized. We can't suddenly change the state in the middle of an event loop tick—this would create some of the classic multithreaded application problems of race conditions, memory access conflicts, and so on.
Upon entering an event loop, Node (in effect) makes a copy of the current instruction queue (also known as stack), empties the original queue, and executes its copy. The processing of this instruction queue is referred to as a tick. If libuv, asynchronously, receives results while the chain of instructions copied at the start of this tick is being processed on the single main thread (V8), these results (wrapped as callbacks) are queued. Once the current queue is emptied and its last instruction has completed, the queue is again checked for instructions to execute on the next tick. This pattern of checking and executing the queue will repeat (loop) until the queue is emptied, and no further data events are expected, at which point the Node process exits.
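A small sketch makes the queueing behavior visible. A timer stands in here for a result arriving from libuv (an assumption made to keep the example free of real I/O): although its callback is ready almost immediately, it cannot interrupt the synchronous code that is already running and is only invoked once that code has finished:

```javascript
const order = [];

// Queue a callback that becomes ready almost immediately.
setTimeout(function () {
  order.push('queued callback');
  console.log(order);
}, 0);

// Synchronous work on the main thread runs to completion first.
for (let i = 1; i <= 3; i++) {
  order.push('sync step ' + i);
}
order.push('sync work finished');

// Logged by the timer callback:
// [ 'sync step 1', 'sync step 2', 'sync step 3',
//   'sync work finished', 'queued callback' ]
```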
Note
This discussion at https://github.com/joyent/node/issues/5798 among some core Node developers about the process.nextTick and setImmediate implementations offers very precise information on how the event loop operates.
The following are the sorts of events fed into the queue:
- Execution blocks: These are blocks of JavaScript code comprising the Node program; they could be expressions, loops, functions, and so on. This includes EventEmitter events emitted within the current execution context.
- Timers: These are callbacks deferred to a time in the future specified in milliseconds, such as setTimeout and setInterval.
- I/O: These are prepared callbacks returned to the main thread after being delegated to Node's managed thread pool, such as filesystem calls and network listeners.
- Deferred execution blocks: These are mainly the functions slotted on the stack according to the rules of setImmediate and process.nextTick.
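The relative ordering of some of these categories can be observed with a short sketch. The 'ping' event name and the order array are illustrative. Timers are deliberately left out, because the order of a setTimeout callback relative to a setImmediate callback scheduled from the main module can vary between runs:

```javascript
const EventEmitter = require('events').EventEmitter;

const emitter = new EventEmitter();
const order = [];

emitter.on('ping', function () {
  order.push('emitted event');
});

// Deferred execution blocks.
setImmediate(function () {
  order.push('setImmediate');
  console.log(order);
});
process.nextTick(function () {
  order.push('nextTick');
});

// Execution block: emit calls its listeners synchronously, right here.
emitter.emit('ping');
order.push('end of execution block');

// Logged by the setImmediate callback:
// [ 'emitted event', 'end of execution block', 'nextTick', 'setImmediate' ]
```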
There are two important things to remember:
- You don't start and/or stop the event loop. The event loop starts as soon as a process starts and ends when no further callbacks remain to be performed. The event loop may, therefore, run forever.
- The event loop executes on a single thread but delegates I/O operations to libuv, which manages a thread pool that parallelizes these operations, notifying the event loop when results are available. An easy-to-reason-about single-threaded programming model is reinforced with the efficiency of multithreading.
To learn more about how Node is bound to libuv and other core libraries, parse through the fs module code at https://github.com/joyent/node/blob/master/lib/fs.js. Compare the fs.read and the fs.readSync methods to observe the difference between how synchronous and asynchronous actions are implemented; note the wrapper callback that is passed to the native binding.read method in fs.read.
To take an even deeper dive into the very heart of Node's design, including the queue implementation, read through the Node source at https://github.com/joyent/node/tree/master/src. Follow MakeCallback within fs_event_wrap.cc and node.cc. Investigate the req_wrap class, a wrapper for the V8 engine, deployed in node_file.cc and elsewhere and defined in req_wrap.h.